Position: Data Scientist, Diagnostics
Location: Boston, MA (On-site)
Department: R&D
Employment Type: Full-time
To Apply: Email CV/resume to contact@cellensinc.com
CELLENS OVERVIEW
Cellens is a venture-backed oncology diagnostics company developing a first-in-class platform for minimal residual disease (MRD) detection, beginning with bladder cancer, one of the cancers most likely to recur. Our technology uses proprietary, patented AI-driven computational models that analyze biophysical properties of cells measured through atomic force microscopy (AFM) and other advanced microscopy techniques. Our mission is to transform how cancer recurrence is detected and monitored, moving beyond genomic and proteomic markers to identify recurrence through fundamental cellular biophysical differences. By capturing subtle mechanical and structural changes associated with malignancy, we aim to provide clinicians with a powerful approach to disease surveillance.
We are seeking a Data Scientist with strong expertise in feature extraction, data cleaning, and pipeline optimization to advance our next-generation diagnostics platform. You will play a critical role in refining our software tools and developing robust quality-control (QC) pipelines to ensure reproducibility, consistency, and reliability of all datasets used for downstream machine learning and clinical applications.
RESPONSIBILITIES OVERVIEW
Feature Extraction & Data Processing
- Develop and refine data pipelines to convert high-dimensional AFM outputs into clean, reproducible features.
- Extract, validate, and optimize quantitative features (e.g., surface texture, topography, statistical descriptors) from imaging datasets.
- Ensure features are normalized, reproducible, and consistent across runs, instruments, protocols, operators, and clinical sites.
- Apply advanced preprocessing techniques (baseline correction, denoising, drift correction) to microscopy data.
Data Quality & Statistical Analysis
- Build and implement QC metrics to identify, flag, and resolve inconsistent or low-quality data.
- Perform statistical validation analyses (e.g., confidence intervals, intraclass correlation coefficient (ICC), Bland–Altman analysis, limit of blank/limit of detection (LoB/LoD)) to assess the reliability of extracted features.
- Conduct power analyses and help define technical success criteria.
Machine Learning Readiness
- Clean, normalize, and denoise data to prepare ML-ready datasets that integrate multi-channel and multi-condition data streams.
- Conduct exploratory data analysis, hypothesis generation, ablations, and error analyses.
- Perform literature scans on surface topography, biophysical features, feature stability and importance analysis, and comparative model studies (Random Forest, XGBoost, deep learning).
- Collaborate closely with ML engineers to align feature extraction workflows with classifier requirements.
- Monitor and improve the efficiency, scalability, and robustness of data pipelines for expanding datasets.
Communication & Cross-functional Collaboration
- Partner with ML engineers to define feature schemas, interfaces, data structures, and data contracts.
- Write clear, reproducible SOPs and documentation.
- Present findings to cross-functional teams including R&D, QA/CLIA operations, clinical operations, and external collaborators.
REQUIRED QUALIFICATIONS
- 5+ years in data science or scientific computing (or MS/PhD with equivalent project depth).
- Strong hands-on Python skills (NumPy, Pandas, SciPy) and statistical data analysis experience.
- Demonstrated experience in feature engineering for imaging, time-series, or 3D surface data.
- Proven skills in signal or image preprocessing (denoising, normalization, artifact detection, QC metrics).
- Experience working with scientific, clinical, or diagnostics datasets.
- Excellent communication skills and ability to collaborate across disciplines.
- Demonstrated leadership experience within data science or technical teams.
DESIRED SKILLS
- Experience with multimodal data integration and large-scale computational workflows.
- Background in biological data analysis, biophysics, or clinical diagnostics.
- Familiarity with ML classifiers (Random Forest, XGBoost, SVM).
- Experience with surface texture analysis and ISO-standard feature characterization.
WHAT WE OFFER
- The opportunity to shape a cutting-edge AI-driven biophysical diagnostics platform with broad potential in oncology and beyond.
- Competitive compensation including salary, equity, and comprehensive benefits.
- A collaborative, mission-driven environment focused on improving cancer patient outcomes through innovation.