Founding RL Engineer
Built Different Talent
Job Description
Founding RL Engineer | San Francisco | $300K + Equity
Python · PyTorch · RL Environments · Verifiers · Reward Models
The best AI engineers aren't waiting for a job ad. They're waiting for the right moment.
This might be it.
Anthropic leadership has discussed spending over $1 billion on RL environments in the next year alone. The market is moving fast, and at scale. One pre-seed company just closed a round at a valuation most Series A companies would envy, with frontier AI labs already on board as paying clients and acquisition interest on the table.
The reason? They cracked something the rest of the market hasn't.
Most teams build RL environments from synthetic data: easy to demo, easy to commoditise, and brittle. This team mines real human behavioural data, capturing how domain experts actually reason, decide, and solve complex tasks across long-horizon workflows. Environments of 10-100+ steps. Closed-loop systems where environments, data, training, and evaluation are tightly integrated. Not proxies. Not shortcuts.
25-30% uplift in model task success rates. 50-65% more training signals. Evals that reflect how humans actually work.
The funding just landed. Second-time founders. The founding team is being built right now, and there are very few seats. Are you going to be in one of them?
$300K base + founding equity at a $20-30M pre-seed valuation
Frontier AI labs as paying clients from day one
Environment design treated as a first-class problem, not an afterthought
San Francisco; remote is an option, but founding-team energy is a must
Get in now, before this role disappears and you're watching from the outside
What you'll be doing
Building closed-loop RL infrastructure: environment harness, verifiers, reward models, and gym infrastructure
Designing and extending long-horizon RL environments grounded in real expert behavioural data
Turning raw human workflows into clean, production-ready training signals
Running training experiments to prove the gyms produce real capability gains
Making architecture decisions that will define the platform for years, alongside second-time founders who've built and shipped at scale
You'll love this if...
✅ You've built RL environments, gyms, or training infrastructure at a serious RL or AI company
✅ You think environment design is the most underrated problem in AI right now
✅ Long-horizon RL excites you more than short-form RLHF
✅ You move fast, think from first principles, and thrive without a playbook
✅ Python and PyTorch are second nature, and you know what makes a great training environment, not just a functional one
Roles like this don't stay open. If it fits, move.
Because great teams are Built Different.