Contract
" "
Responsibilities:
• Participate in team activities, design discussions, stand-up meetings, and planning reviews with the team.
• Provide Snowflake database technical support in developing reliable, efficient, and scalable solutions for various projects on Snowflake.
• Ingest the existing data, frameworks, and programs from the ODM EDW IOP big data environment into the ODM EDW Snowflake environment using best practices.
• Design and develop Snowpark features in Python; understand the requirements and iterate.
• Interface with the open-source community and contribute to Snowflake's open-source libraries, including Snowpark Python and the Snowflake Python Connector.
• Create, monitor, and maintain role-based access controls, virtual warehouses, Tasks, Snowpipe, and Streams on Snowflake databases to support different use cases.
• Tune the performance of Snowflake queries and procedures; recommend and document Snowflake best practices.
• Explore new capabilities of Snowflake, perform POCs, and implement them based on business requirements.
• Create and maintain the Snowflake technical documentation, ensuring compliance with data governance and security policies.
• Implement Snowflake user/query log analysis, history capture, and user email alert configuration.
• Enable data governance in Snowflake, including row/column-level data security using secure views and dynamic data masking features (a hedged Snowpark sketch covering ingestion and masking follows this list).
• Perform data analysis, data profiling, data quality checks, and data ingestion in various layers using big data/Hadoop/Hive/Impala queries, PySpark programs, and UNIX shell scripts.
• Follow the organization's coding standards document; create mappings, sessions, and workflows as per the mapping specification document.
• Perform gap and impact analysis of ETL and IOP jobs for new requirements and enhancements.
• Create mock-up data, perform unit testing, and capture the result sets against the jobs developed in the lower environment.
• Update the production support run book and Control-M schedule document as per the production release.
• Create and update design documents; provide detailed descriptions of workflows after every production release.
• Continuously monitor the production data loads, fix issues, update the tracker document with the issues, and identify performance issues.
• Tune long-running ETL/ELT jobs by creating partitions, enabling full loads, and applying other standard approaches.
• Perform quality assurance checks and reconciliation after data loads, and communicate with the vendor to receive fixed data.
• Participate in ETL/ELT code reviews and design reusable frameworks.
• Create change requests, work plans, test results, and BCAB checklist documents for code deployment to the production environment, and perform code validation post-deployment.
• Work with the Snowflake admin, Hadoop admin, ETL, and SAS admin teams for code deployments and health checks.
• Create a reusable Audit Balance Control framework to capture reconciliation, mapping parameters, and variables, serving as a single point of reference for workflows.
• Create Snowpark and PySpark programs to ingest historical and incremental data.
• Create Sqoop scripts to ingest historical data from the EDW Oracle database to Hadoop IOP, and create Hive table and Impala view creation scripts for dimension tables.
• Participate in meetings to continuously upgrade functional and technical expertise.
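Illustrative example (not part of the requirements): the ingestion and data-governance responsibilities above could be approached with Snowpark Python roughly as sketched below. This is a minimal sketch under assumptions; all object names (EDW, STAGING, CLAIMS_RAW, MEMBER_SSN, the landing stage, and the PII_ANALYST role) are hypothetical placeholders, and connection parameters would come from the project's own configuration.

from snowflake.snowpark import Session

# Hypothetical connection parameters; in practice these come from a secrets manager.
connection_parameters = {
    "account": "<account_identifier>",
    "user": "<service_user>",
    "password": "<password>",
    "role": "ETL_DEVELOPER",
    "warehouse": "LOAD_WH",
    "database": "EDW",
    "schema": "STAGING",
}
session = Session.builder.configs(connection_parameters).create()

# Ingest incremental Parquet files that were landed in a stage.
incremental_df = session.read.parquet("@landing_stage/claims/")
incremental_df.write.mode("append").save_as_table("CLAIMS_RAW")

# Column-level governance: mask PII unless the querying role is entitled to see it.
session.sql("""
    CREATE MASKING POLICY IF NOT EXISTS pii_mask AS (val STRING) RETURNS STRING ->
        CASE WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN val ELSE '***MASKED***' END
""").collect()
session.sql(
    "ALTER TABLE CLAIMS_RAW MODIFY COLUMN MEMBER_SSN SET MASKING POLICY pii_mask"
).collect()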
REQUIRED Skill Sets:
• Proficiency in data warehousing, data migration, and Snowflake is essential for this role.
• Strong experience in the implementation, execution, and maintenance of data integration technology solutions.
• Minimum 4-6 years of hands-on experience with cloud databases.
• Minimum 2-3 years of hands-on data migration experience from a big data environment to a Snowflake environment.
• Minimum 2-3 years of hands-on experience with the Snowflake platform, along with Snowpipe and Snowpark.
• Strong experience with SnowSQL and PL/SQL, and expertise in writing Snowflake procedures using SQL, Python, or Java.
• Experience with optimizing Snowflake database performance and real-time monitoring.
• Strong database architecture, critical thinking, and problem-solving abilities.
• Experience with AWS platform services.
• Snowflake certification is highly desirable.
• Snowpark with Python is the preferred approach for building data pipelines.
• 8+ years of experience with big data and Hadoop on data warehousing or data integration projects.
• Analysis, design, development, support, and enhancement of ETL/ELT in a data warehouse environment with Cloudera big data technologies (minimum of 8-9 years' experience in Hadoop, MapReduce, Sqoop, PySpark, Spark, HDFS, Hive, Impala, StreamSets, Kudu, Oozie, Hue, Kafka, Yarn, Python, Flume, Zookeeper, Sentry, and Cloudera Navigator), along with Oracle SQL/PL-SQL, Unix commands, and shell scripting.
• Strong development experience (minimum of 8-9 years) in creating Sqoop scripts, PySpark programs, HDFS commands, HDFS file formats (Parquet, Avro, ORC, etc.), StreamSets pipeline creation, job scheduling, Hive/Impala queries, Unix commands, and shell scripting.
• Writing Hadoop/Hive/Impala scripts (minimum of 8-9 years' experience) for gathering statistics on tables post data loads (see the PySpark sketch after this list).
• Strong SQL experience (Oracle and Hadoop, including Hive/Impala).
• Writing complex SQL queries and performing tuning based on Hadoop/Hive/Impala explain plan results.
• Experience building data sets and familiarity with PHI and PII data.
• Expertise implementing complex ETL/ELT logic.
• Accountable for ETL/ELT design documentation.
• Basic knowledge of UNIX/Linux shell scripting.
• Utilize ETL/ELT standards and practices toward establishing and following a centralized metadata repository.
• Good experience working with Visio, Excel, PowerPoint, Word, etc.
• Effective communication, presentation, and organizational skills.
• Familiarity with project management methodologies such as Waterfall and Agile.
• Ability to establish priorities and follow through on projects, paying close attention to detail with minimal supervision.
• Required education: BS/BA degree or a combination of education and experience.
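As a hedged illustration of the "stats on tables post data loads" requirement, a PySpark job along the following lines could compute basic reconciliation metrics and refresh Hive table statistics after a load. The table and column names (edw.claims_fact, claim_id, load_dt) are placeholders for illustration, not names from this engagement.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("post_load_stats")
    .enableHiveSupport()   # needed so spark.table() can resolve Hive tables
    .getOrCreate()
)

# Hypothetical fact table populated by the nightly ETL/ELT job.
df = spark.table("edw.claims_fact")

# Basic post-load reconciliation metrics.
stats = df.agg(
    F.count("*").alias("row_count"),
    F.countDistinct("claim_id").alias("distinct_claims"),
    F.max("load_dt").alias("latest_load_dt"),
)
stats.show(truncate=False)

# Refresh table statistics so the Hive optimizer has current cardinalities.
spark.sql("ANALYZE TABLE edw.claims_fact COMPUTE STATISTICS")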
DESIRED Skill Sets:
• In addition to overall Snowflake experience, the candidate should have development experience with both Snowpipe and Snowpark.
• Experience with data migration from a big data environment to a Snowflake environment.
• Strong understanding of Snowflake capabilities such as Snowpipe, Streams, and Tasks (see the Streams/Tasks sketch after this list).
• Knowledge of security (SAML, SCIM, OAuth, OpenID, Kerberos, policies, entitlements, etc.).
• Experience with system DRP for Snowflake systems.
• Demonstrated effective leadership, analytical, and problem-solving skills.
• Excellent written and oral communication skills with technical and business teams are required.
• Ability to work independently as well as part of a team.
• Stay abreast of current technologies in the assigned IT area.
• Establish facts and draw valid conclusions.
• Recognize patterns and opportunities for improvement throughout the entire organization.
• Ability to discern critical from minor problems and innovate new solutions.
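For the Streams/Tasks item above, one possible shape of a change-data-capture setup, issued from Snowpark Python, is sketched here under assumptions: the stream, task, warehouse, table, and column names are hypothetical, and the session would be created as in the earlier sketch.

from snowflake.snowpark import Session

def create_cdc_pipeline(session: Session) -> None:
    # Stream captures row changes on the hypothetical raw table.
    session.sql(
        "CREATE STREAM IF NOT EXISTS claims_raw_stream ON TABLE CLAIMS_RAW"
    ).collect()
    # Task runs every 15 minutes, but only when the stream actually has data.
    # Column names (claim_id, member_id, claim_amt, load_dt) are placeholders.
    session.sql("""
        CREATE TASK IF NOT EXISTS merge_claims_task
            WAREHOUSE = LOAD_WH
            SCHEDULE = '15 MINUTE'
            WHEN SYSTEM$STREAM_HAS_DATA('CLAIMS_RAW_STREAM')
        AS
            INSERT INTO CLAIMS_CURATED (claim_id, member_id, claim_amt, load_dt)
            SELECT claim_id, member_id, claim_amt, load_dt
            FROM claims_raw_stream
            WHERE METADATA$ACTION = 'INSERT'
    """).collect()
    # Tasks are created suspended; resume to start the schedule.
    session.sql("ALTER TASK merge_claims_task RESUME").collect()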
Skill | Required/Desired | Amount of Experience
Experience in Hadoop, MapReduce, Sqoop, PySpark, Spark, HDFS, Hive, Impala, StreamSets, Kudu, Oozie, Hue, Kafka, Yarn, Python, Flume, Zookeeper, Sentry | Required | 9 Years
Strong development experience in creating Sqoop scripts, PySpark programs, HDFS commands, HDFS file formats | Required | 9 Years
Writing Hadoop/Hive/Impala scripts for gathering stats on tables post data loads | Required | 9 Years
Hands-on experience with cloud databases | Required | 6 Years
Hands-on data migration experience from the big data environment to the Snowflake environment | Required | 3 Years
Hands-on experience with the Snowflake platform along with Snowpipe and Snowpark | Required | 3 Years
BS/BA degree or combination of education & experience | Required | 0