ECU Health
About ECU Health
ECU Health is a mission-driven, 1,708-bed academic health care system serving more than 1.4 million people in 29 eastern North Carolina counties. The not-for-profit system comprises 13,000 team members, nine hospitals, and a physician group of more than 1,100 academic and community providers practicing in more than 180 primary and specialty clinics across more than 130 locations.
The flagship ECU Health Medical Center, a Level I Trauma Center, and ECU Health Maynard Children's Hospital serve as the primary teaching hospitals for the Brody School of Medicine at East Carolina University. ECU Health and the Brody School of Medicine share a combined academic mission to improve the health and well-being of eastern North Carolina through patient care, education and research.
Position Summary
The AI Data Engineer will be responsible for designing, developing, and maintaining data pipelines and infrastructure to support AI and machine learning applications. This role involves collaborating with various departments to ensure data is readily available, clean, and formatted for optimal use by AI models. The AI Data Engineer will play a crucial role in bridging the gap between raw data and actionable insights, contributing to the organization's digital transformation and innovation efforts.
Responsibilities
- Design and implement ETL processes to extract, transform, and load data from diverse sources into central data storage systems such as data warehouses and data lakes, and build batch and real-time data pipelines using Azure Data Factory, Azure Synapse Pipelines, and Azure Stream Analytics.
- Develop scalable data pipelines that handle large volumes of high-velocity, varied data within an Epic EHR environment.
- Ensure data quality and integrity by implementing data validation and cleansing procedures for data ingested from sources (e.g., SQL Server, Blob Storage, Event Hubs, IoT Hub, APIs) into Azure Data Lake Gen2 or Synapse Analytics.
- Select and manage appropriate data storage solutions, including SQL, NoSQL, and cloud-based data warehouses, and use Delta Lake and Parquet formats to enable performant, versioned data storage for ML training.
- Build reusable, modular pipeline components using Azure Data Factory's Data Flows and custom Azure Functions.
- Apply compliance frameworks (e.g., HIPAA) and security controls such as RBAC, Private Link, Key Vault integration, and data masking to secure sensitive data in AI pipelines.
- Configure and maintain data processing platforms such as Databricks and Microsoft Fabric.
- Automate data infrastructure operations, including pipeline deployment, monitoring, and maintenance.
- Collaborate with data scientists, machine learning engineers, and other stakeholders to support AI model development and deployment.
- Monitor and optimize the performance of data pipelines and infrastructure to ensure efficient data processing and storage.
- Stay up to date with the latest advancements in AI and data engineering technologies and methodologies.
- Azure Services: Data Factory, Synapse, Data Lake Gen2, Stream Analytics, Event Hubs, Azure ML, Key Vault, Purview
- Big Data & Processing: Azure Databricks, PySpark, Delta Lake
- Languages: Python, SQL, Scala (optional)
- CI/CD & IaC: Azure DevOps, GitHub Actions, Terraform, Bicep
- Monitoring & Logging: Azure Monitor, Log Analytics, Application Insights
- Governance & Cataloging: Microsoft Purview, Azure Policy
Minimum Requirements
- Bachelor's degree or higher in computer science, data science, engineering, mathematics, or a related field with 3 years of experience, including 1 year of work experience in an environment where HIPAA compliance is demonstrated; or a high school diploma or higher with 5 years of equivalent practical work experience, including 2 years of work experience in an environment where HIPAA compliance is demonstrated.
- Proven experience in infrastructure as code (e.g., Terraform, CloudFormation).
- Proven experience in data pipeline development and ETL processes.
- Expertise in data storage systems, including SQL, NoSQL, data lakes, and data warehouses.
- Proficiency in tools such as Apache Kafka, Apache Spark, Airflow, and similar platforms.
- Cloud platform expertise, including Azure, AWS, and Google Cloud.
- Excellent problem-solving skills and attention to detail.
- Strong communication and collaboration skills to work effectively with cross-functional teams.
- Experience in a healthcare environment, including familiarity with healthcare data management and regulations.
- Understanding of Health Insurance Portability and Accountability Act (HIPAA) compliance.
- Ability to work collaboratively in a team environment.
Preferred certifications include, but are not limited to:
- Microsoft DP-203 (Azure Data Engineer)
- Microsoft AI-102 (Azure AI Engineer)
General Statement
It is the goal of ECU Health and its entities to employ the most qualified individual who best matches the requirements for the vacant position.
Offers of employment are subject to successful completion of all pre-employment screenings, which may include an occupational health screening, criminal record check, education, reference, and licensure verification.
We value diversity and are proud to be an equal opportunity employer. Employment decisions are made based on business needs, job requirements, and applicants' qualifications without regard to race, color, religion, gender, national origin, disability status, protected veteran status, genetic information and testing, family and medical leave, sexual orientation, gender identity or expression, or any other status protected by law. We prohibit retaliation against individuals who bring forth any complaint, orally or in writing, to the employer, or against any individuals who assist or participate in the investigation of any complaint.
#LI-REMOTE