About Us
ECCO is a leading entertainment streaming platform dedicated to delivering personalized, engaging content to millions of users worldwide. We leverage cutting-edge AI to enhance user experiences through intelligent recommendations, interactive features, and dynamic content generation. Join our innovative team to shape the future of entertainment with AI!
Role Overview
We’re seeking an AI Engineer with expertise in model deployment, LLM fine-tuning on GPU clusters, and prompt design and engineering to optimize and scale our AI-driven entertainment solutions. You’ll fine-tune and deploy generative AI models, refine prompts for conversational and recommendation systems, and ensure seamless integration into our streaming platform.
Key Responsibilities
AI Model Deployment & Fine-Tuning:
- Deploy, fine-tune (including large-scale LLM fine-tuning on GPU clusters), and optimize generative AI models (e.g., LLMs, diffusion models) for production environments.
- Collaborate with MLOps teams to ensure scalable, low-latency inference pipelines.
- Monitor and improve model performance, reliability, and cost-efficiency.
- Implement distributed training strategies for efficient fine-tuning on multi-GPU/TPU clusters.
Prompt Design & Engineering:
- Develop and iterate on prompts for AI-driven features (e.g., personalized recommendations, search engines, content summaries).
- Experiment with techniques like chain-of-thought prompting, retrieval-augmented generation (RAG), and few-shot learning.
- Align AI outputs with brand voice and user engagement goals.
Cross-functional Collaboration:
- Partner with product, data science, and engineering teams to integrate AI into user-facing features.
- Translate creative and business requirements into technical AI solutions.
Qualifications
Must-Have:
- 3+ years of experience in AI/ML, with a focus on model deployment, LLM fine-tuning, and generative AI (e.g., GPT, Claude, Stable Diffusion).
- Proficiency in Python, PyTorch/TensorFlow, Hugging Face Transformers, and distributed training frameworks (DeepSpeed, FSDP, or similar).
- Hands-on experience with LLM fine-tuning on GPU clusters, prompt engineering, and API integrations.
- Familiarity with MLOps tools (Docker, Kubernetes, MLflow, etc.) and cloud platforms (AWS/GCP/Azure).
- Experience optimizing model training/inference for high-performance GPU/TPU environments.
Nice-to-Have:
- Background in NLP, recommendation systems, or entertainment/content platforms.
- Knowledge of multimodal AI (text + video/audio) for streaming applications.
- Passion for movies, TV shows, or interactive media!
Job Type: Full-time
Pay: $100,000 - $150,000 per year
Benefits:
- Flexible schedule
- Paid time off
Ability to Commute:
- Irvine, CA 92614 (Required)
Ability to Relocate:
- Irvine, CA 92614: Relocate before starting work (Required)
Work Location: In person