Title: Senior Data Engineer
Department: Baseball Research and Development
Reports to: Manager, Software Engineering
Status: Regular Full-Time
Location: Philadelphia, PA; Open to Remote
Position Overview:
The work of a Data Engineer at the Phillies extends well beyond merely coding. Phillies Engineering takes a product-centered approach in creating the platform, systems, and tooling that empower our entire organization to spend more time thinking about baseball. We turn data into information into action.
As a member of our team, you will play a central role in improving how data flows throughout the Phillies organization, from ingestion to modeling to application. You will help design, build, and optimize the data infrastructure that powers predictive models, research, dashboards, reports, and internal applications. Your work will directly impact data-driven decision-making for Phillies Baseball Operations and help shape the evolution of the Phillies' analytics systems.
You'll work with a wide range of data sources, including ball and player tracking data (Statcast), biomechanical time-series data, player biographical and contract information, and internally developed datasets.
Responsibilities:
- Design, develop, and maintain scalable data pipelines and systems that support predictive modeling, reporting, and internal applications.
- Architect and optimize cloud-based data platforms and backend databases to ensure performance, reliability, and cost-efficiency.
- Build systems that enable the flow of curated, high-quality data across the organization.
- Promote best practices in data engineering, including testing, monitoring (Datadog, Prometheus), documentation, and code review.
- Cultivate a deep familiarity with the internal and external data sources used at the Phillies and continuously evaluate whether they provide value to the organization.
- Collaborate with data science and engineering to design, implement, and maintain real-time data pipelines that serve as the foundation for analysis and decision-making.
- Collaborate with data scientists, application engineers, and product teams to build new features.
Required Qualifications:
- At least 5 years of professional experience in data engineering or software development, working with large-scale data systems.
- Fluency with Python and SQL.
- Experience with modern data engineering tools and cloud platforms (e.g. Google Cloud, AWS, Azure).
- Experience with workflow orchestration tools such as Apache Airflow, Dagster, or Prefect.
- Familiarity with containerization and deployment technologies such as Docker and Kubernetes.
- Solid understanding of data pipelines, ETL/ELT workflows, and performance optimization.
- Experience with data modeling, data warehousing concepts, and/or relational database optimization for backend systems.
- Experience with event-driven systems, streaming data, and real-time architectures (e.g. Kafka, Redpanda).
- Excellent problem-solving and communication skills, with a collaborative mindset.
- Demonstrated leadership and self-direction.
Preferred Qualifications:
- Experience with dbt or similar transformation frameworks.
- Experience working with spatiotemporal data, particularly player tracking and biomechanical data.
- Passion for working with sports data and research.
To be considered, all candidates must submit a response for the prompt below:
Describe an example of a mistake you personally made (design choice or code implementation) that affected a production data pipeline. What actions did you take to resolve it? (Please keep responses to 500 words or fewer.)
We are an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, sex, sexual orientation, age, disability, gender identity, marital or veteran status, or any other protected class.