About Mindbeam
We are building next-generation AI infrastructure for open source and enterprise. Our work is deeply research-oriented, and we are passionate about developing ground-breaking innovations that push state-of-the-art AI applications further.
What drives us is not only advancing technology, but empowering the people behind it. We are a community of researchers, engineers, and visionaries who believe that collaboration, curiosity, and openness fuel progress. If you’re motivated by impact and inspired to build tools that others can build upon, you’ll be in the right place.
Mission
Engineer robust data pipelines and systems that ensure efficient, reliable, and scalable access to high-quality training data.
Role Expectations
- Design and maintain large-scale data ingestion, preprocessing, and storage pipelines for AI training.
- Optimize data systems for performance, throughput, and cost efficiency.
- Implement quality checks, deduplication, and labeling processes to improve dataset integrity.
- Collaborate with ML engineers and researchers to deliver curated datasets for experiments and production training.
- Ensure compliance, governance, and security in handling diverse data sources.
Background
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field, or equivalent experience.
- 2+ years of experience building large-scale data pipelines or infrastructure.
- Strong coding skills in Python and familiarity with data frameworks (Spark, Ray, Dask, or similar).
- Knowledge of cloud data storage (S3, BigQuery, etc.) and distributed computing.
- Familiarity with ML workflows and the role of data quality in model performance.
- Experience with observability, logging, and monitoring in data systems.
About You
You’re detail-oriented, data-driven, and thrive on building systems that scale. You understand that high-quality data is the backbone of successful AI, and you enjoy collaborating across teams to make it accessible, reliable, and efficient.
Compensation Range: $150K–$190K