About Us
We're a well-funded stealth startup backed by proven unicorn founders, building the next generation of AI-powered consumer hardware. We're assembling a small, elite team to create revolutionary products that integrate cutting-edge voice, vision, and AI technologies. If you're excited about optimizing AI models for real-world deployment and shipping world-changing technology—we'd love to talk.
The Role
Join as our AI inference specialist to optimize and deploy the models that power our device. You'll work directly with founders who've built unicorn companies and know how to ship fast. This is ML optimization at its core: converting, compressing, and serving models in production at scale with low latency.
What You'll Do
- Optimize and convert AI models for production inference engines
- Work with serving frameworks like vLLM, SGLang, and similar systems
- Optimize text-to-speech (TTS), speech-to-text (STT), vision, and multimodal models for deployment
- Convert models to efficient runtime formats (ONNX, TensorRT engines, etc.)
- Build and tune inference pipelines for low-latency requirements
- Quantize and compress models while maintaining quality
- Benchmark and profile model performance
- Integrate optimized models with our device infrastructure
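For a flavor of the benchmarking work above: latency tuning usually starts with a percentile benchmark harness around the model's forward pass. A minimal stdlib-only sketch (the `infer` callable and `inputs` are stand-ins for a real model and sample batch, not part of our stack):

```python
import time
import statistics

def benchmark(infer, inputs, warmup=10, runs=100):
    """Measure per-call latency (ms) of an inference callable."""
    # Warm-up iterations let caches, JIT compilation, and kernels settle
    # before timing begins.
    for _ in range(warmup):
        infer(inputs)
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(inputs)
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    # Tail percentiles (p95/p99) matter more than the mean for
    # interactive, low-latency devices.
    return {
        "p50_ms": latencies[len(latencies) // 2],
        "p95_ms": latencies[int(len(latencies) * 0.95)],
        "p99_ms": latencies[int(len(latencies) * 0.99)],
        "mean_ms": statistics.mean(latencies),
    }
```

In practice you would wrap the real model call (PyTorch, ONNX Runtime, a vLLM client, etc.) in `infer` and compare percentiles before and after each optimization pass.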
Requirements
- 4+ years of experience with ML model optimization and deployment
- Strong experience with inference engines (vLLM, SGLang, TensorRT, etc.)
- Deep knowledge of model conversion and optimization (ONNX, quantization, pruning)
- Experience optimizing TTS, STT, vision, or multimodal models
- Strong Python and C++ programming skills
- Understanding of GPU optimization and CUDA
- Track record of deploying AI models in production
- Proficiency with PyTorch, TensorFlow, or similar ML frameworks
Why Join
- Build an ambitious product with real-world impact from prototypes to mass production
- Work alongside founders with a track record of building unicorn companies and shipping fast
- Competitive compensation, equity, and the chance to shape the product—and the company
Small empowered team. No bureaucracy. Big upside.