SUMMARY:
We are seeking an experienced and results-driven Data Engineer II to join our team. This role is designed for professionals with 3+ years of experience who excel in designing, building, and optimizing robust data pipelines and systems. As a Data Engineer II, you will leverage your advanced knowledge of Azure, Azure Synapse, Databricks, and modern CI/CD frameworks to implement data solutions that meet organizational needs. In addition to batch processing, you will be responsible for developing and maintaining streaming and real-time data ingestion pipelines, enabling the organization to react to data as it arrives. You will work collaboratively with cross-functional teams and serve as a mentor to junior data engineers, ensuring high-quality data engineering practices across projects.
WHAT YOU’LL BE DOING:
•Pipeline Development and Optimization: Building, optimizing, and maintaining reliable ETL/ELT pipelines that efficiently process large volumes of structured and unstructured data, using tools such as Azure Data Factory, Azure Synapse, Databricks, and other Azure services (a minimal sketch follows this list).
•Collaboration with AI and Stakeholders: Partnering with AI/ML teams, technical peers, and non-technical stakeholders to gather requirements, design data solutions, and deliver pipelines that enable intelligent, data-driven product features and insights.
•Data Modeling: Developing and maintaining data models for structured and unstructured datasets to ensure data integrity and accessibility.
•Data Quality, Governance, and Documentation: Establishing and maintaining processes to ensure data accuracy, consistency, and compliance with organizational standards, and documenting those processes for the wider team.
•Mentorship and Knowledge Sharing: Providing technical guidance to junior team members and contributing to team learning and development.
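For a flavor of the pipeline work described above, here is a minimal batch ELT sketch in PySpark on Databricks. The paths and table names are hypothetical, and `spark` is the session Databricks provides in notebooks; this illustrates the pattern, not our production code.

```python
from pyspark.sql import functions as F

# Hypothetical landing path and target table; adjust to your workspace.
RAW_PATH = "/mnt/landing/orders/"
TARGET_TABLE = "lakehouse.silver.orders"

# Read raw JSON dropped by an upstream process (e.g., an Azure Data Factory copy activity).
raw = spark.read.json(RAW_PATH)

# Light cleanup: normalize column names, drop exact duplicates, stamp the load time.
cleaned = (
    raw.select([F.col(c).alias(c.lower()) for c in raw.columns])
       .dropDuplicates()
       .withColumn("loaded_at", F.current_timestamp())
)

# Write a Delta table so downstream consumers get ACID guarantees and time travel.
cleaned.write.format("delta").mode("overwrite").saveAsTable(TARGET_TABLE)
```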
YOU’VE GOT WHAT IT TAKES IF YOU HAVE/ARE:
- 3+ years of relevant work experience in Data Engineering.
- Bachelor’s degree in Computer Science or a related field.
- Technical Expertise:
- Proficiency in Azure Synapse, Azure Data Factory, Databricks, and related tools.
- Hands-on experience with streaming data pipelines in Databricks, including the use of Structured Streaming, Delta Live Tables, and event-driven architectures (see the streaming sketch at the end of this posting).
- Familiarity with Unity Catalog for centralized data governance, access control, and metadata management across the Databricks Lakehouse platform.
- Strong programming skills in Python, SQL, or Spark SQL.
- Experience designing and optimizing scalable data lake and medallion architectures.
- Knowledge of distributed systems and data processing frameworks (e.g., Spark).
- Experience with source control tools such as Git and CI/CD pipelines.
- Proven ability to design and implement data ingestion pipelines from API sources, including REST and GraphQL endpoints, and to handle data in formats such as JSON and XML (see the Azure Functions sketch at the end of this posting).
- Familiarity with designing and deploying Azure Functions for serverless data processing and event-driven workflows.
- Analytical and Problem-Solving Skills:
- Ability to diagnose and resolve complex data-related issues.
- Strong aptitude for performance tuning and optimization of data pipelines.
- Collaboration and Communication:
- Effective at working with cross-functional teams to translate business requirements into technical solutions.
- Strong interpersonal skills with the ability to communicate complex technical concepts clearly to non-technical stakeholders.
- Leadership and Mentorship:
- Experience mentoring junior team members and fostering a culture of collaboration.
- Ability to take ownership of projects and drive them to successful completion.
- Continuous Learning:
- Commitment to staying current with emerging data engineering technologies and best practices.
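To illustrate the streaming and medallion items above, here is a minimal Structured Streaming sketch using Databricks Auto Loader to land raw events in a bronze table and refine them into silver. The paths, table names, and `event_id` column are assumptions, and `spark` is the notebook-provided session.

```python
from pyspark.sql import functions as F

# Bronze: ingest raw JSON events incrementally with Auto Loader (cloudFiles).
bronze = (
    spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/mnt/checkpoints/events_schema")  # hypothetical path
        .load("/mnt/landing/events/")                                           # hypothetical path
        .withColumn("ingested_at", F.current_timestamp())
)

(bronze.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/events_bronze")
    .trigger(availableNow=True)  # process everything available, then stop
    .toTable("lakehouse.bronze.events"))

# Silver: stream from bronze, apply basic quality filters and deduplication.
# (A long-running stream would add a watermark to bound the dedup state.)
silver = (
    spark.readStream.table("lakehouse.bronze.events")
        .filter(F.col("event_id").isNotNull())
        .dropDuplicates(["event_id"])
)

(silver.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/events_silver")
    .trigger(availableNow=True)
    .toTable("lakehouse.silver.events"))
```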
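And for the API ingestion and Azure Functions items, here is a sketch of a timer-triggered Azure Function (Python v2 programming model) that pulls paginated JSON from a REST endpoint and lands it in blob storage. The endpoint URL, cursor-based pagination, response shape, and the DATA_LAKE_CONNECTION app setting are all illustrative assumptions.

```python
import json
import logging

import azure.functions as func
import requests  # must be listed in requirements.txt for the function app

app = func.FunctionApp()

API_URL = "https://api.example.com/v1/orders"  # hypothetical REST endpoint


def fetch_all(url: str) -> list[dict]:
    """Follow an assumed cursor-based pagination scheme and collect all records."""
    records, params = [], {"limit": 500}
    while True:
        resp = requests.get(url, params=params, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        records.extend(payload["data"])      # assumed response shape
        cursor = payload.get("next_cursor")  # assumed pagination field
        if not cursor:
            return records
        params["cursor"] = cursor


@app.timer_trigger(schedule="0 0 * * * *", arg_name="timer")  # hourly
@app.blob_output(arg_name="outblob",
                 path="landing/orders/{DateTime}.json",       # {DateTime} is a system binding expression
                 connection="DATA_LAKE_CONNECTION")           # app-setting name, assumed
def ingest_orders(timer: func.TimerRequest, outblob: func.Out[str]) -> None:
    records = fetch_all(API_URL)
    logging.info("Fetched %d records from %s", len(records), API_URL)
    outblob.set(json.dumps(records))
```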