Data Engineer
San Francisco, CA
Hybrid role
No C2C / 1099
Role Overview
The Data Engineer will be responsible for collecting, processing, managing, analyzing, and visualizing large datasets to transform raw information into actionable insights. This role involves building secure, scalable, and repeatable data pipelines that support multiple users and business needs across diverse platforms.
Qualifications
- Bachelor’s degree in Computer Science, Information Systems, or related field, or equivalent professional experience.
- 3+ years of hands-on experience with Python and PySpark.
- Proficiency with Jupyter Notebooks, including development and unit testing.
- Proven expertise with both relational and NoSQL databases, including data modeling techniques (e.g., star schema, dimensional modeling).
- 2+ years working with modern data stacks: object stores (e.g., S3), Spark, Airflow, Lakehouse architectures, and real-time databases.
- Experience with cloud data warehouses such as Redshift or Snowflake.
- Broad knowledge of data engineering across ETL and Big Data technologies, in both on-premises and cloud environments.
- Strong experience with AWS data engineering services (e.g., CFS2/EDS), including a detailed understanding of the tools and services used.
- Background in building end-to-end pipelines to ingest and process unstructured and semi-structured data using Spark.
Additional Requirement
- Due to the nature of the data handled, this role requires candidates to be “Protected Individuals.” Eligible applicants include U.S. citizens, U.S. nationals, lawful permanent residents, and permanent residents eligible or intending to apply for naturalization within the required timeframe.
Key Responsibilities
- Design, develop, and maintain reliable data pipelines for ingestion, transformation, cataloging, and delivery of curated, high-quality datasets into the Common Data Platform (CDP).
- Participate actively in Agile ceremonies and follow Scaled Agile (SAFe) practices defined by the CDP Program team.
- Deliver data products and services that meet enterprise standards for quality, scalability, and security.
- Monitor and troubleshoot data pipelines and stores, implementing automated monitoring, alerting, and remediation to maximize system reliability.
- Apply a security-first, test-driven, and automation-focused approach, following industry best practices.
- Collaborate with product managers, data scientists, analysts, and business stakeholders to gather requirements and deliver infrastructure and tools tailored to their needs.
- Stay current with emerging tools, frameworks, and technologies; recommend solutions to enhance efficiency and performance of data engineering workflows.