Welcome to late 2025, the dawn of a new category of commodity: RL environments. RL envs are like evals - they're simulations in which AI agents can attempt to perform work.
When an AI agent attempts to perform work in one of these simulations, its action trajectory and whether it succeeded or failed are recorded and used to update the weights of its neural network, through a process called reinforcement learning.
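For the curious, here is a minimal sketch of that loop in Python. It is a toy illustration, not our actual tooling: the three-step task, the `TARGET` answer key, the function names, and the simple REINFORCE-style update with a running baseline are all assumptions chosen to keep the example small. Real RL envs for coding agents are far richer, and real training setups differ.

```python
import math
import random

# Hypothetical toy task: the "correct" action at each of three steps.
TARGET = [1, 0, 1]

def softmax(logits):
    """Turn raw scores into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def run_episode(weights, rng):
    """Let the agent attempt the task; return (trajectory, reward)."""
    trajectory = []
    for step_logits in weights:
        probs = softmax(step_logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        trajectory.append(action)
    # The env records success (1.0) or failure (0.0) for the whole attempt.
    reward = 1.0 if trajectory == TARGET else 0.0
    return trajectory, reward

def update(weights, trajectory, reward, baseline, lr=0.5):
    """REINFORCE-style step: make actions from better-than-baseline
    attempts more likely, and actions from worse attempts less likely."""
    advantage = reward - baseline
    for step_logits, action in zip(weights, trajectory):
        probs = softmax(step_logits)
        for a in range(len(step_logits)):
            # Gradient of log pi(action) with respect to logit a.
            grad = (1.0 if a == action else 0.0) - probs[a]
            step_logits[a] += lr * advantage * grad

def train(episodes=2000, seed=0):
    """Run many attempts, updating the weights after each one."""
    rng = random.Random(seed)
    weights = [[0.0, 0.0] for _ in TARGET]  # one pair of logits per step
    baseline = 0.0  # running average of recent rewards
    for _ in range(episodes):
        trajectory, reward = run_episode(weights, rng)
        update(weights, trajectory, reward, baseline)
        baseline += 0.05 * (reward - baseline)
    return weights

def success_probability(weights):
    """Probability that the current policy completes the task."""
    p = 1.0
    for step_logits, target in zip(weights, TARGET):
        p *= softmax(step_logits)[target]
    return p
```

An untrained agent here succeeds with probability 0.125 (three coin flips). Under these assumptions, `success_probability(train())` should come out well above that, which is the whole point: the env turns attempts into a training signal.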
At Idler, we are developing a data factory factory factory: a factory that produces the factory that produces data factories.
data factory - an RL env is a data factory: it generates training data that improves the agentic performance of LLMs
data factory factory - our in-house tools and automations constitute a data factory factory, by virtue of being an abstract assembly line for data factories
data factory factory factory - Idler is an organization that assembles data factory factories
Are you interested in co-creating a system with fourth-order consequences? Do you want exposure to the bleeding edge of AI model training and to the world's leading AI researchers? Would you like your work to directly make a number go up? Are you a software developer with dogged persistence?
Then this job is for you!
Responsibilities:
develop and maintain software systems that automate the process of creating realistic training environments for coding AI agents
develop and maintain realistic scenarios and evaluations for coding AI agents
engage with and deeply understand the needs of frontier AI researchers
develop and maintain quality assurance systems for internal and crowdsourced work
Compensation Range: $100K - $500K