• 5+ years’ experience as a data scientist, with an understanding of MLOps.
• Responsible for developing models and prototypes in AWS SageMaker notebooks using Python and PySpark.
• Perform detailed analysis of data sources using AWS Athena, Redshift, or Databricks SQL for data quality and feature engineering. Expert in SQL querying, including window functions.
• Perform data assessment and other data preparation needed for the use case.
• Good understanding of linear and nonlinear model algorithms such as Random Forest and XGBoost; aware of different model evaluation techniques.
• Responsible for performing MLOps operations end to end (i.e. development, validation, deployment & visualization) using AWS services such as Lambda and Step Functions, along with Jenkins and Terraform.
• Good understanding of cloud computing and different cloud services.
• Proficient in Python, with experience using libraries such as NumPy, Pandas, SciPy, NLTK, scikit-learn, and other data-wrangling packages.
• Good to have: hands-on experience with visualization tools such as Power BI or Tableau.
• Expert-level knowledge of advanced data mining and statistical modelling: predictive modelling, classification, clustering, and text-mining techniques.
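As a minimal sketch of the model development and evaluation work described above, the following trains and scores a Random Forest with scikit-learn (the synthetic dataset and hyperparameters are illustrative assumptions, not part of this role description):

```python
# Minimal sketch: train and evaluate a Random Forest classifier.
# Synthetic data and hyperparameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder dataset standing in for features prepared via Athena/Databricks.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Evaluate with two common techniques: accuracy and ROC AUC.
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]
print("accuracy:", accuracy_score(y_test, pred))
print("ROC AUC:", roc_auc_score(y_test, proba))
```

The same pattern carries over to a SageMaker notebook, with the synthetic data swapped for features engineered from the actual sources.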
Candidate MUST HAVE:
- Hands-on model building with Python / PySpark ML
- AWS experience (SageMaker, Athena, etc.)
- Databricks experience
- Experience with TensorFlow or PyTorch (Transformer models)
- Experience with the scikit-learn Python package
- Expert in SQL querying, including window functions
- GitHub experience
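To make the window-function requirement concrete, here is a small self-contained sketch using Python's built-in sqlite3 module (requires SQLite ≥ 3.25; the table and column names are assumptions made up for the example):

```python
# Illustrative window-function query: rank each customer's orders by amount
# and compute a per-customer running total column.
# In-memory SQLite; table and column names are assumptions for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('alice', 120.0), ('alice', 80.0),
        ('bob', 200.0), ('bob', 50.0), ('bob', 75.0);
""")

rows = conn.execute("""
    SELECT customer,
           amount,
           RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk,
           SUM(amount)  OVER (PARTITION BY customer) AS customer_total
    FROM orders
""").fetchall()

for row in rows:
    print(row)
```

The same `OVER (PARTITION BY … ORDER BY …)` syntax applies in Athena, Redshift, and Databricks SQL.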
Good to have:
- Airflow
- Docker
- Amazon Bedrock
- Power BI
- Generative AI (GenAI) experience