Project requirements mandate that this role is open only to US Citizens or US Green Card holders with a minimum of 3 years on a Green Card. IRS MBI clearance is a plus; an active Secret or Top Secret clearance is a plus. All candidates will have to go through the clearance process before being able to start on the project (no exceptions to this requirement).
Job Description
- Infobahn Solutions is hiring Databricks Data Engineering professionals in the Washington DC Metro Area for a US Government federal project with the Department of the Treasury.
- The Data Engineers will be part of a Data Migration & Conversion Team working on a large data lake being implemented on AWS GovCloud.
- Data will be migrated from on-premises mainframe and legacy database systems to the AWS landing zone on S3 using Informatica PowerCenter.
- Further conversion will be done in AWS using Databricks (PySpark); a minimal illustrative sketch follows this list.
- The Data Engineer should have prior data migration experience and understand the intricacies of developing data integration routines that move data from multiple source systems to a new target system with a different data model.
- The Data Engineer should have experience in converting Oracle PL/SQL and/or Greenplum code to Databricks.
- Must-have experience: data migrations and conversions using Databricks.
- Experience using Databricks on AWS and managing a Databricks production system is critical and a must-have for this project.
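As an illustration of the migration-and-conversion flow described above, the following is a minimal PySpark sketch, not the project's actual code: it reads a landing-zone extract from S3 and writes a curated Delta table. The S3 paths, column names, and data types are hypothetical placeholders.

```python
# Minimal PySpark sketch: read landing-zone data from S3 and write a curated Delta table.
# Paths, column names, and types below are hypothetical placeholders, not project values.
from pyspark.sql import SparkSession, functions as F

# On Databricks a `spark` session is already provided; this line is for standalone runs.
spark = SparkSession.builder.appName("landing-zone-conversion").getOrCreate()

# Hypothetical landing-zone path populated by Informatica PowerCenter extracts.
landing_path = "s3://example-landing-zone/legacy_system/accounts/"

raw_df = (
    spark.read
    .option("header", "true")
    .csv(landing_path)
)

# Example transformation: rename columns and cast types to fit the new target data model.
curated_df = (
    raw_df
    .withColumnRenamed("ACCT_NO", "account_id")
    .withColumn("balance", F.col("BALANCE").cast("decimal(18,2)"))
    .withColumn("load_ts", F.current_timestamp())
)

# Write the result to a hypothetical curated zone as a Delta table.
(
    curated_df.write
    .format("delta")
    .mode("overwrite")
    .save("s3://example-curated-zone/accounts/")
)
```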
What you’ll be doing:
- Databricks Environment Setup: Configure and maintain Databricks clusters, ensuring optimal performance and scalability for big data processing and analytics.
- ETL (Extract, Transform, Load): Design and implement ETL processes using Databricks notebooks or jobs to process and transform raw data into a usable format for analysis (a minimal notebook-style sketch follows this list).
- Data Lake Integration: Work with data lakes and data storage systems to efficiently manage and access large datasets within the Databricks environment.
- Data Processing and Analysis: Develop and optimize Spark jobs for data processing, analysis, and machine learning tasks using Databricks notebooks.
- Collaboration: Collaborate with data scientists, data engineers, and other stakeholders to understand business requirements and implement solutions.
- Performance Tuning: Identify and address performance bottlenecks in Databricks jobs and clusters to optimize data processing speed and resource utilization.
- Security and Compliance: Implement and enforce security measures to protect sensitive data within the Databricks environment, ensuring compliance with relevant regulations.
- Documentation: Maintain documentation for Databricks workflows, configurations, and best practices to facilitate knowledge sharing and team collaboration.
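The sketch below illustrates the ETL responsibility above in the style of a Databricks notebook cell, where a `spark` session is already provided. The source table, target table, and business rules shown are hypothetical placeholders, not project specifics.

```python
# Minimal ETL sketch in the style of a Databricks notebook cell.
# `spark` is the session Databricks provides in a notebook; table names are placeholders.
from pyspark.sql import functions as F

# Extract: read a raw table (assumed to be registered in the metastore).
raw = spark.table("raw_db.transactions")

# Transform: filter bad records, deduplicate, and derive analysis-ready columns.
clean = (
    raw
    .filter(F.col("amount").isNotNull())
    .dropDuplicates(["transaction_id"])
    .withColumn("txn_date", F.to_date("txn_timestamp"))
)

# Load: write the curated result as a Delta table for downstream analysis.
(
    clean.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("curated_db.transactions")
)
```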
Skills:
- Apache Spark: Strong expertise in Apache Spark, which is the underlying distributed computing engine in Databricks.
- Databricks Platform: In-depth knowledge of the Databricks platform, including its features, architecture, and administration.
- Programming Languages: Proficiency in languages such as Python or Scala for developing Spark applications within Databricks.
- SQL: Strong SQL skills for data manipulation, querying, and analysis within Databricks notebooks.
- ETL Tools: Experience with ETL tools and frameworks for efficient data processing and transformation.
- Data Lake and Storage: Familiarity with data lakes and storage systems, such as Delta Lake, AWS S3, or Azure Data Lake Storage.
- Collaboration and Communication: Effective communication and collaboration skills to work with cross-functional teams and stakeholders.
- Problem Solving: Strong problem-solving skills to troubleshoot issues and optimize Databricks workflows.
- Version Control: Experience with version control systems (e.g., Git) for managing and tracking changes to Databricks notebooks and code.
Role Requirements:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- 7-8+ years of development experience with ETL tools (4+ years with Databricks is a must-have)
- 5+ years of experience as a Databricks Engineer or in a similar role.
- Strong expertise in Apache Spark and hands-on experience with Databricks.
- More than 7 years of experience performing data reconciliation, data validation, and ETL testing; deploying ETL packages and automating ETL jobs; and developing reconciliation reports (a minimal reconciliation sketch follows this list).
- Working knowledge of message-oriented middleware and streaming data technologies such as Kafka or Confluent
- Proficiency in programming languages such as Python or Scala for developing Spark applications.
- Solid understanding of ETL processes and data modeling concepts.
- Experience with data lakes and storage systems, such as Delta Lake, AWS S3, or Azure Data Lake Storage.
- Strong SQL skills for data manipulation and analysis.
- Good experience with shell scripting and AutoSys
- Strong Data Modeling Skills
- Strong analytical skills applied to business software solutions maintenance and/or development
- Must be able to work with a team to write code, review code, and work on system operations.
- Past project experience with Data Conversion and Data Migration
- Ability to communicate analysis, results, and ideas to key decision makers, including business and technical stakeholders.
- Experience in developing and deploying data ingestion, processing, and distribution systems with AWS technologies
- Experience using AWS data stores, including RDS Postgres, S3, or DynamoDB
- DevOps experience using Git, including developing and deploying code to production
- Proficient in using AWS cloud services for data engineering tasks
- Proficient in Python, shell, or other scripting languages for data movement
- Eligible for a US Government-issued IRS MBI (candidates with an active IRS MBI will be preferred)
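As a small illustration of the data reconciliation work referenced above, here is a hedged PySpark sketch comparing row counts and key coverage between a source extract and its migrated target. The table names and key column are assumptions, and `spark` is the session a Databricks notebook provides.

```python
# Minimal reconciliation sketch: compare row counts and key coverage between
# a source extract and its migrated target. Table names and the key column
# ("account_id") are hypothetical placeholders; `spark` is the notebook session.
source = spark.table("landing_db.accounts_extract")
target = spark.table("curated_db.accounts")

# Row-count comparison between source and target.
row_counts = spark.createDataFrame(
    [(source.count(), target.count())],
    ["source_rows", "target_rows"],
)
row_counts.show()

# Keys present in the source extract but missing from the migrated target.
missing_keys = source.select("account_id").subtract(target.select("account_id"))
print(f"Keys missing from target: {missing_keys.count()}")
```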
Preferred Qualifications
- Cloud Data Migration and Conversion projects
- Experience on AWS
- Databricks industry certifications
Job Types: Full-time, Contract
Pay: $90,000.00 - $130,000.00 per year
Benefits:
- Dental insurance
- Flexible schedule
- Health insurance
- Life insurance
- Paid time off
- Vision insurance
License/Certification:
- Databricks Certified Data Engineer Professional (Required)
Work Location: Remote