Job Title: Senior HPC Support Engineer
Location: Seattle, WA / Westford, MA / Durham, NC / Santa Clara, CA
Duration: Direct Hire Permanent Role
JOB DESCRIPTION:
We are seeking a motivated Senior HPC Technical Support Engineer - AI Infrastructure, focusing on InfiniBand, NVLink and AI GPU Cluster technology and passionate about data centre and networking technologies, to provide comprehensive solutions for sophisticated installations, maintenance, or operations across a broad scope of groundbreaking networking products. You will be a primary point of contact for our customers, assisting them with technical questions and debugging and resolving their issues. As a member of our Technical Support team, you are a conscientious, proficient communicator who is fundamentally interested in taking ownership of issue resolution while ensuring that a high level of customer satisfaction is maintained. A significant part of the role is also to interact regularly with Engineering, Marketing, and Support teams on technical issues.
ROLES AND RESPONSIBILITIES:
- Ability to resolve sophisticated customer concerns and technical issues through meticulous research, reproduction, and solving problems for customers installing our products and supporting systems using Linux Operating Systems (Multi-distro), with the focus on client InfiniBand, NVLink and GPU Technology and our End-to-End Solutions
- Responding to customer product support inquiries via telephone, email, or conference calls
- Resolving customer issues during installation, operation, maintenance or product application or interoperability with other vendors
- Participating in multi-functional team meetings and giving feedback to engineering and marketing regarding product requirements, customer experience, support tools, etc.
- Acting as a technical resource: developing, redefining and documenting standard methodologies to share with internal teams (Support/R&D) for support processes and improvements
- Site visits and conference calls with customers
MINIMUM QUALIFICATIONS:
- 5+ years of experience providing in-depth Customer Support and debugging for hardware and software products.
- Exceptional interpersonal skills with the ability to maintain and own the overall resolution for any critical issue raised by our customers, under all circumstances.
- Linux OS, including System Administration and Networking, at an LFCS/RHCSA level
- Networking Technology, protocols and routing, including IP, L2 and L3, at a CCNP/CompTIA Network+ and Cloud+ level
- Containerized solutions experience at the level of DCA and/or CKA, plus Virtualization (KVM/ESXi) and Cloud Infrastructure (AWS/OCI) technologies
- Able to debug networking protocols using tools such as TCPDUMP and Wireshark or similar packet generation and analysis tools
- Bash/Python scripting abilities
- Strong organizational skills and able to prioritize/multi-task easily with limited supervision
- Experience integrating AI tools (Cursor, Gemini, ChatGPT, Copilot, Glean, etc.) into your daily workflow.
- Four-year degree from an accredited University, College, or equivalent experience in Computer Science, or Electrical or Computer Engineering
WAYS TO STAND OUT FROM THE CROWD:
- Certifications related to AI Infrastructure, Operations and Networking
- InfiniBand, RDMA, NVLink and NVIDIA GPU Technology
- Clustering or HPC Data-Centre technologies including Upper Layer Protocols (e.g., MPI, NCCL)
- Additional Operating Systems such as Microsoft Windows, VMware, Unix
- Configuration and operational expertise with traditional network switch/router and Open platforms
PRIMARY SKILLS:
- Customer Support & Debugging, Linux System Administration & Networking, Networking Technologies & Protocols (IP, L2, L3), Cloud Infrastructure (AWS, OCI), Containerization & Virtualization (Docker, Kubernetes, KVM, ESXi), Network Debugging Tools (TCPDUMP, Wireshark), Bash & Python Scripting, AI Tools Integration
Job Type: Full-time
Pay: $100,000.00 - $200,000.00 per year
Benefits:
- Dental insurance
- Health insurance
- Paid time off
- Vision insurance
Experience:
- HPC (High Performance Computing): 5 years (Required)
- Clustering or HPC Data-Centre technologies: 5 years (Required)
- Bash/Python Scripting: 5 years (Required)
Work Location: In person