About Us
ECCO is a leading entertainment streaming platform dedicated to delivering personalized, engaging content to millions of users worldwide. We leverage cutting-edge AI to enhance user experiences through intelligent recommendations, interactive features, and dynamic content generation. Join our innovative team to shape the future of entertainment with AI!
Role Overview
We’re seeking an AI Engineer with expertise in model deployment, LLM fine-tuning on GPU clusters, and prompt design and engineering to optimize and scale our AI-driven entertainment solutions. You’ll fine-tune and deploy generative AI models, refine prompts for conversational and recommendation systems, and ensure seamless integration into our streaming platform.
Key Responsibilities
AI Model Deployment & Fine-Tuning:
- Deploy, fine-tune (including large-scale LLM fine-tuning on GPU clusters), and optimize generative AI models (e.g., LLMs, diffusion models) for production environments.
- Collaborate with MLOps teams to ensure scalable, low-latency inference pipelines.
- Monitor and improve model performance, reliability, and cost-efficiency.
- Implement distributed training strategies for efficient fine-tuning on multi-GPU/TPU clusters.
Prompt Design & Engineering:
- Develop and iterate on prompts for AI-driven features (e.g., personalized recommendations, search engines, content summaries).
- Experiment with techniques like chain-of-thought prompting, retrieval-augmented generation (RAG), and few-shot learning.
- Align AI outputs with brand voice and user engagement goals.
Cross-functional Collaboration:
- Partner with product, data science, and engineering teams to integrate AI into user-facing features.
- Translate creative and business requirements into technical AI solutions.
Qualifications
Must-Have:
- 3+ years of experience in AI/ML, with a focus on model deployment, LLM fine-tuning, and generative AI (e.g., GPT, Claude, Stable Diffusion).
- Proficiency in Python, PyTorch/TensorFlow, Hugging Face Transformers, and distributed training frameworks (DeepSpeed, FSDP, or similar).
- Hands-on experience with LLM fine-tuning on GPU clusters, prompt engineering, and API integrations.
- Familiarity with MLOps tools (Docker, Kubernetes, MLflow, etc.) and cloud platforms (AWS/GCP/Azure).
- Experience optimizing model training/inference for high-performance GPU/TPU environments.
Nice-to-Have:
- Background in NLP, recommendation systems, or entertainment/content platforms.
- Knowledge of multimodal AI (text + video/audio) for streaming applications.
- Passion for movies, TV shows, or interactive media!
Job Type: Full-time
Pay: $100,000 - $150,000 per year
Benefits:
- Flexible schedule
- Paid time off
Ability to Commute:
- Irvine, CA 92614 (Required)
Ability to Relocate:
- Irvine, CA 92614: Relocate before starting work (Required)
Work Location: In person