Role: Data Engineer, Full-time role
Location: New York, NY or Fort Mill, SC
This is an on-site role initially, with the possibility of transitioning to a hybrid model later.
Duration: 1+ year
Experience: 9+ years, lead role
****Currently, we are unable to offer sponsorship. Candidates with independent work authorization are encouraged to apply.****
Key Responsibilities:
- Collaborate with cross-functional teams, including Data Scientists, Analysts, and Engineers, to gather data requirements and build scalable data solutions.
- Design, develop, and maintain complex ETL pipelines using AWS Glue and PySpark, ensuring efficient data processing across batch and streaming workloads.
- Ensure data integrity, quality, and security across data pipelines, applying best practices for encryption, IAM, and compliance.
- Monitor and troubleshoot pipeline issues, continuously optimizing for cost and performance across AWS services.
- Stay current with advancements in AWS Glue, PySpark, and data infrastructure tools, and recommend improvements where applicable.
Qualifications:
- Deep understanding of Spark architecture, distributed processing, and performance tuning techniques.
- Strong grasp of data modeling, schema design, and data warehouse concepts.
- Experience with the AWS data ecosystem, including S3, Lambda, and the Glue Data Catalog.
- Proficiency in Python (PySpark) for data transformation and automation tasks.
- Familiarity with CI/CD practices and infrastructure-as-code tools such as Terraform is a plus.
- Excellent communication and problem-solving skills, with the ability to work independently and in a team environment.