A well-funded AI startup (Series A, backed by top-tier investors and led by ex-DeepMind / Anthropic / OpenAI engineers) is building next-generation agentic systems — intelligent, autonomous software agents that can reason, plan, and act across browsers, operating systems, and enterprise environments.
The Role
As a Research Engineer on the AI Architecture team, you will design, prototype, and rigorously evaluate novel model architectures and training strategies that push the boundaries of efficiency, scaling, and model capability. Your work will directly influence the organization’s next-generation pretraining runs, and you’ll collaborate closely with the pretraining and systems teams to productionize your research.
This role is ideal for someone who thrives in fast-paced research environments, has strong intuition for promising ideas, and enjoys taking concepts from sketch → prototype → thorough experimental validation.
What You’ll Do
- Research, design, and test new model architectures and training methods aimed at improving loss-per-FLOP, loss-per-parameter, and overall modeling efficiency
- Identify and solve bottlenecks in contemporary architectures
- Rapidly prototype and iterate on ideas, running rigorous experiments, ablations, and hypothesis tests
- Collaborate closely with pretraining engineers to integrate successful approaches into large-scale training pipelines
- Work in a highly collaborative research environment where strong taste, curiosity, and creativity are valued
What We’re Looking For
- Strong research intuition and the ability to take a project from concept → experimentation → write-up
- Ability to prototype quickly and operate independently in a fast-moving research environment
- Curiosity, creativity, and a genuine interest in understanding intelligence
- Excellent collaboration skills in high-velocity research teams
Qualifications
- Research experience designing or analyzing novel architectures (e.g., state-space models, diffusion models, MoEs, long-context models)
- Experience with long-term memory systems, retrieval/RAG, dynamic or adaptive computation, or alternative credit-assignment methods
- Background with reinforcement learning, control theory, or signal processing
- Demonstrated comfort exploring unconventional or “crazy” ideas and evaluating them rigorously
- Understanding of large-scale training pipelines and GPU hardware constraints
- Strong experimental methodology (ablations, controls, statistical rigor)
- High proficiency in PyTorch and Python
- Ability to navigate and contribute to large, complex codebases
- Published ML research at reputable venues (NeurIPS, ICML, ICLR, CVPR, etc.)
- Postgraduate degree in CS, EE/EECS, Math, Physics, or related scientific field
Total compensation: 500,000-1,000,000 (including equity; base salary of 200,000-300,000)