T+S
Independent candidates only; please target USC/GC candidates.
Hybrid from day 1.
Location: JC, onsite 2x/week.
Final round is onsite in JC; please confirm candidates are cleared for this. The first round is a technical screen.
Photo ID required.
Data Engineer – Databricks & Python (ETL Testing Focus)
Location: Hybrid / Remote / Onsite (as per project requirements)
Job Summary
We are seeking a Data Engineer with strong hands-on experience in Databricks, Python, and ETL Testing to support our enterprise data initiatives. The ideal candidate will be responsible for designing, developing, and validating data pipelines and analytics workflows, ensuring data integrity, accuracy, and performance across large-scale distributed environments.
This role blends data engineering and data quality automation, leveraging Python (Pandas, PySpark) to perform functional, regression, and data validation testing of ETL workflows built in Databricks.
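To make the testing focus concrete, below is a minimal, hypothetical sketch of the kind of Pandas + PyTest data validation check this role involves; the DataFrame contents, column names, and business rules are illustrative assumptions only, not project specifics.

```python
# Minimal, hypothetical sketch of a Pandas + PyTest data validation check.
# Column names and rules are illustrative assumptions, not project requirements.
import pandas as pd
import pytest


@pytest.fixture
def transformed_orders() -> pd.DataFrame:
    # In a real pipeline this would read the transformed layer (e.g. a Delta table);
    # a small in-memory frame keeps the example self-contained.
    return pd.DataFrame(
        {
            "order_id": [1, 2, 3],
            "amount": [10.0, 25.5, 40.0],
            "status": ["NEW", "SHIPPED", "SHIPPED"],
        }
    )


def test_no_null_or_duplicate_keys(transformed_orders: pd.DataFrame) -> None:
    # The primary-key column must be fully populated and unique.
    assert transformed_orders["order_id"].notna().all()
    assert transformed_orders["order_id"].is_unique


def test_amount_is_positive(transformed_orders: pd.DataFrame) -> None:
    # Example business rule: order amounts must be strictly positive after transformation.
    assert (transformed_orders["amount"] > 0).all()


def test_status_in_allowed_set(transformed_orders: pd.DataFrame) -> None:
    # Transformation logic should only emit known status codes.
    allowed = {"NEW", "SHIPPED", "CANCELLED"}
    assert set(transformed_orders["status"]).issubset(allowed)
```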
Key Responsibilities
- Design, build, and maintain scalable ETL pipelines using Azure Databricks and PySpark for ingestion, transformation, and loading of structured and semi-structured data.
- Develop and execute ETL test cases to validate data accuracy, transformation logic, and end-to-end data flow.
- Implement automated data validation frameworks using Python (Pandas, PyTest, Great Expectations, or similar tools).
- Collaborate with data architects, analysts, and business users to ensure high-quality data delivery for analytics and reporting.
- Perform data reconciliation and source-to-target validation between raw data and transformed layers (a reconciliation sketch follows this list).
- Optimize Databricks notebooks and Spark jobs for performance and cost efficiency.
- Implement CI/CD integration for ETL testing using Azure DevOps / GitHub Actions / Jenkins.
- Maintain data quality metrics and monitor ETL job performance and reliability.
- Work with Azure Data Factory, Delta Lake, and Azure Blob Storage for pipeline orchestration and data lake management.
- Document test cases, test results, and data lineage for audit and compliance.
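As referenced in the reconciliation bullet above, here is a minimal sketch of source-to-target reconciliation in PySpark on Databricks. The table names (`raw.orders`, `curated.orders`), key column, and amount column are assumptions for illustration only.

```python
# Hypothetical sketch of source-to-target reconciliation in PySpark on Databricks.
# Table and column names are assumptions for illustration only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-reconciliation").getOrCreate()

source = spark.table("raw.orders")      # assumed raw/source layer
target = spark.table("curated.orders")  # assumed transformed/target layer

# 1. Row-count reconciliation between layers.
source_count = source.count()
target_count = target.count()
assert source_count == target_count, (
    f"Row count mismatch: source={source_count}, target={target_count}"
)

# 2. Key-level reconciliation: keys present in source but missing from target.
missing_keys = source.select("order_id").exceptAll(target.select("order_id"))
assert missing_keys.count() == 0, "Keys present in source are missing from target"

# 3. Column-level check: aggregate totals should match after transformation.
source_total = source.agg(F.sum("amount")).first()[0] or 0.0
target_total = target.agg(F.sum("amount")).first()[0] or 0.0
# A small tolerance avoids false failures from floating-point aggregation.
assert abs(source_total - target_total) < 1e-6, (
    f"Amount totals differ: source={source_total}, target={target_total}"
)
```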
Required Skills and Experience
- 3–6 years of hands-on experience in Data Engineering or ETL Testing roles.
- Strong proficiency in Python, including libraries such as Pandas, NumPy, and PyTest.
- Hands-on experience with Databricks (Azure or AWS) and PySpark for ETL development and validation.
- Solid understanding of data transformation, schema validation, and data quality assurance (a minimal schema-check sketch follows this list).
- Experience writing complex SQL queries for data validation and reconciliation.
- Working knowledge of Azure Data Factory or other orchestration tools.
- Familiarity with Delta Lake, Parquet, and distributed data storage concepts.
- Experience in version control and CI/CD practices (Git, Azure DevOps, Jenkins).
- Strong analytical and problem-solving skills with attention to detail.
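As noted in the schema validation bullet above, the following is a minimal sketch of a schema check against a Delta table in PySpark; the table name (`curated.orders`) and expected schema are hypothetical assumptions for illustration.

```python
# Hypothetical sketch of a schema validation check for a Delta table in PySpark.
# The table name and expected schema are illustrative assumptions only.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, LongType, DoubleType, StringType

spark = SparkSession.builder.appName("schema-validation").getOrCreate()

# Expected schema for the curated layer (assumed for this example).
expected_schema = StructType(
    [
        StructField("order_id", LongType(), nullable=False),
        StructField("amount", DoubleType(), nullable=True),
        StructField("status", StringType(), nullable=True),
    ]
)

actual_schema = spark.table("curated.orders").schema

# Compare column names and types; nullability is often environment-dependent,
# so this check deliberately ignores it.
expected_cols = [(f.name, f.dataType) for f in expected_schema.fields]
actual_cols = [(f.name, f.dataType) for f in actual_schema.fields]

assert actual_cols == expected_cols, (
    f"Schema drift detected:\nexpected={expected_cols}\nactual={actual_cols}"
)
```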