Description:
We are looking for a highly skilled NLP Data Scientist / Developer to design and implement natural language processing solutions for real-world problems. You will extract insights from unstructured text data, build language models, and deploy intelligent applications that understand and process human language. This role blends data science, machine learning, and software development, with Python and LLMs at the core.
Key Responsibilities:
- Develop and implement NLP pipelines to process, analyze, and extract insights from structured and unstructured text data.
- Build and fine-tune models for text classification, named entity recognition, summarization, sentiment analysis, topic modeling, and related tasks.
- Work with state-of-the-art language models and NLP tooling (e.g., BERT/DeBERTa, spaCy, LLM APIs) and apply transfer learning techniques.
- Clean, tokenize, and normalize large text corpora in various formats (PDFs, HTML, etc.).
- Collaborate with cross-functional teams to integrate NLP features into software tools and customer-facing applications.
- Create REST APIs or services to serve models in production using frameworks like FastAPI or Flask.
- Optimize performance, accuracy, and scalability of NLP systems.
- Document technical approaches, experiment results, and development procedures for internal and external stakeholders.
What We Offer:
- Competitive salary and benefits package
- Flexible remote work options
- Access to GPU resources and cloud infrastructure
- Opportunities to work on cutting-edge NLP problems
- A collaborative, forward-thinking AI/ML team
Requirements:
Required Qualifications:
- 2+ years of experience with NLP development and Python packages.
- Strong knowledge of NLP libraries such as spaCy and Transformers (Hugging Face).
- Solid understanding of text preprocessing, vectorization (TF-IDF, word embeddings), and classification techniques.
- Experience with machine learning libraries like TensorFlow/PyTorch.
- Strong knowledge of hybrid models incorporating LLMs/genAI and traditional ML approaches.
- Experience with PDF text extraction.
- Must currently possess or be eligible to obtain a Public Trust clearance.
Preferred Qualifications:
- Bachelor’s or Master’s degree in Data Science, Computational Linguistics, Machine Learning, Applied Mathematics, Statistics, Computer Science, or a related field.
- Experience with LLMs (Large Language Models) and prompt engineering.
- Knowledge of data privacy, redaction, and PII detection in text.
- Background in information retrieval or question-answering systems.
- Prior work with government, legal, healthcare, or enterprise document processing is a plus.
- Experience working with cloud platforms (AWS, Azure, GCP) and containerization (Docker).
- Familiarity with REST APIs, FastAPI/Flask, and deploying models to production.
- Proficiency with version control (Git) and collaborative development workflows.