LatentView Analytics is a leading global analytics and decision sciences provider, delivering solutions that help companies drive digital transformation and use data to gain a competitive advantage. With analytics solutions that provide a 360-degree view of the digital consumer, fuel machine learning capabilities, and support artificial intelligence initiatives, LatentView Analytics enables leading global brands to predict new revenue streams, anticipate product trends and popularity, improve customer retention rates, optimize investment decisions, and turn unstructured data into valuable business assets.
We are seeking a talented and passionate Data Engineering Lead specializing in Databricks to join our data engineering team. In this role, you will help architect, implement, and manage cloud-based data solutions using Databricks and associated technologies (Azure, AWS). The ideal candidate will have experience in developing, optimizing, and maintaining data pipelines, analytics workflows, and data lakes.
Responsibilities:
Data Pipeline Development: Build and maintain scalable and optimized data pipelines using Apache Spark, Databricks, and other cloud-based tools to ensure efficient data flow across systems.
Cloud Infrastructure: Work with cloud providers (Azure, AWS) to design and implement cloud-native solutions for data storage, processing, and analytics. Experience with Databricks in a cloud-based environment is essential.
Collaboration with Data Scientists/Analysts: Collaborate with data scientists, analysts, and business stakeholders to transform business requirements into data solutions and deliver meaningful insights.
Performance Optimization: Continuously optimize Spark and Databricks workflows for performance and cost efficiency.
Stakeholder Management: Collaborate with business stakeholders to understand their data needs, and communicate project status, risks, and mitigation plans effectively.
Exposure to Gen AI (LLMs, Agentic AI): Stay updated on Gen AI, LLMs, and Agentic AI; evaluate their potential and make recommendations for adoption.
ETL & Data Integration: Implement ETL processes to integrate data from various sources (SQL, NoSQL, REST APIs) into Databricks environments and data lakes.
Data Governance: Ensure data security, privacy, and compliance policies are followed, including managing data access and monitoring usage.
Documentation & Best Practices: Document solutions, architectures, and workflows while enforcing coding best practices within the team.
Skills:
- Hands-on experience with Databricks (Apache Spark) for large-scale data processing and analytics.
- Strong experience working with cloud platforms such as Azure or AWS (Azure Databricks is highly preferred).
- Proficiency in SQL, Python, Scala, or Java for data processing and automation.
- Familiarity with data storage solutions such as Delta Lake, data lakes, Azure Data Lake Storage, or AWS S3.
- Proficiency in building data pipelines with tools such as Apache Spark, Databricks, and Airflow.
- Experience with DevOps and CI/CD processes for data workflows. Familiarity with containerization technologies (e.g., Docker, Kubernetes) is a plus.
- Familiarity with machine learning pipelines and frameworks (e.g., MLflow, TensorFlow, scikit-learn) is a plus.
- Strong problem-solving skills and the ability to work independently as well as collaboratively.
- Excellent communication skills and the ability to articulate complex technical solutions to non-technical stakeholders.
- Ability to liaise with team members, management, and clients to ensure projects are completed to standard.