Data Scientist Intern — Predictive ML, Time Series & Deep Learning
We’re looking for a curious, hands-on Data Scientist intern to help build and evaluate predictive machine-learning solutions for time-series problems. You’ll work closely with product and engineering teams to design experiments, preprocess real-world time-series data, and prototype models ranging from classical statistical methods to deep-learning architectures (RNNs, LSTMs). This is a learning-first role with real impact: your models will help drive business decisions and product features.
Key responsibilities
- Collect, clean, and explore time-series and panel datasets; handle missing data, irregular sampling, and seasonality.
- Design and implement statistical and gradient-boosted predictive models (ARIMA, SARIMAX, Prophet, XGBoost/LightGBM, etc.) as well as deep-learning models (RNN, LSTM, GRU).
- Build end-to-end model pipelines for training, validation, and evaluation (time-series cross-validation, rolling-window backtesting).
- Engineer time-series features: lag features, rolling/window aggregates, trend/seasonality decomposition, calendar/epoch features.
- Evaluate models with appropriate metrics for forecasting and classification (MAE, RMSE, MAPE, precision/recall, ROC AUC where applicable).
- Collaborate with engineers to productionize prototypes or create reproducible experiments (notebooks, scripts, model checkpoints).
- Document experiments, assumptions, and results; present concise findings to stakeholders.
- Stay current with the literature and experiment with state-of-the-art architectures and strategies (attention mechanisms, sequence-to-sequence models, ensembling, etc.).
Required qualifications
- Currently pursuing (or recently completed) a BS/MS in Data Science, Computer Science, Statistics, Applied Math, or a related field.
- Strong fundamentals in statistics, probability, and predictive modeling.
- Practical experience with Python and key libraries: NumPy, pandas, scikit-learn.
- Familiarity with deep-learning frameworks (TensorFlow/Keras or PyTorch) and experience implementing RNN/LSTM models.
- Hands-on experience with time-series modeling concepts (stationarity, differencing, seasonality, autocorrelation).
- Experience with data visualization and exploratory analysis (Matplotlib, Seaborn, Plotly).
- Good communication skills and the ability to explain technical work to non-technical stakeholders.