Senior DevOps Lead
Our client is seeking a Senior DevOps Lead to drive the infrastructure automation, deployment, observability, and operational scalability of its cloud-native payout platform. This individual will be hands-on in building and maintaining modern DevOps pipelines and systems, and will also act as a technical leader, advising on architecture, tools, and process improvements, and mentoring other DevOps team members.
We need an extraordinary team member who thrives as part of a fast-paced team and takes pride in their ability to succeed while delivering value to our customers. Be challenged by innovation and grow professionally by solving one of the most interesting challenges impacting businesses across the globe.
The Opportunity
You will be solving outstanding and meaningful problems in all aspects of deployment, monitoring, and operationalization for verification and payment products and services! Your work will help deliver high SLA attainment, low cost, ease of operation, and continuous integration and continuous delivery/deployment in Azure cloud environments.
Responsibilities:
· Build and manage a DevOps team
· Provide guidance and mentoring of team members
· Develop strategies and assign responsibilities to team members
· Design, implement, and optimize CI/CD pipelines using tools such as GitLab CI/CD, Argo CD, and Terraform to support the secure, automated deployment of containerized applications to Azure Kubernetes Service (AKS).
· Lead efforts in infrastructure-as-code development using Terraform modules and Ansible playbooks, ensuring version control, reusability, and scalability across environments.
· Define and enforce Git branching strategies aligned with environment promotion and automated deployment processes (e.g., Docker image tagging, GitLab runners).
· Champion and implement best practices for blue/green and canary deployments, including traffic management via Istio and Ingress/Gateway configurations in Kubernetes.
· Oversee the implementation and automation of PKI, TLS, and mutual TLS (mTLS) within microservice environments, leveraging tools like Azure Key Vault and CSI drivers.
· Automate rollback and cluster upgrade strategies (e.g., AKS node pool rotation, Helm rollbacks), ensuring zero-downtime deployments and rollback readiness.
· Lead incident response and root cause analysis for infrastructure and deployment issues, using monitoring and logging tools such as Prometheus, Grafana, App Insights, and Azure Monitor.
· Support Linux system administration, including daemon configuration, init system debugging, and kernel module management.
· Design and enforce container security strategies, including Docker capabilities (cap-add), privilege controls, and restricted socket access.
· Act as a trusted advisor to technical leadership on DevOps tooling, observability strategies, and cloud architecture decisions.
· Mentor DevOps engineers on the team, promote knowledge sharing, and establish coding standards and documentation practices.
· Identify infrastructure and process gaps proactively and raise technical recommendations to senior stakeholders for strategic improvement.
· Work closely with other engineering teams and with the product, marketing, operations, information security, and customer support teams to deliver end-to-end solutions
Qualifications:
· Experience in building and managing teams
· 8+ years of experience in DevOps or Site Reliability Engineering roles
· Strong hands-on experience with Azure, particularly AKS (Azure Kubernetes Service), and related cloud-native tools
· Proficiency with Kubernetes, including Helm, rollout strategies, mTLS, and pod lifecycle management
· Demonstrated expertise with Terraform for infrastructure provisioning, including module design and versioning strategies
· Experience with Ansible for configuration management and post-deployment automation
· Strong understanding of CI/CD pipelines, preferably with GitLab, including environment-based deployment strategies and Docker tagging
· Proficiency in Python, especially for building lightweight services (e.g., FastAPI), with working knowledge of async programming, Uvicorn, and GIL considerations
· Solid background in Linux system administration, including systemd, kernel modules, daemon management, and file system familiarity
· Familiarity with Azure Key Vault, secret management, and runtime credential injection (e.g., via CSI driver)
· Working knowledge of PKI and certificate-based authentication, including setting up a CA and managing cert lifecycles
· Familiarity with Git-based deployment practices and infrastructure branching models (e.g., trunk-based development)
· Experience with SOC 2 compliance controls and audit preparation, including automation of evidence collection and secure configuration management practices.
· Familiarity with FedRAMP requirements, particularly around cloud infrastructure security, identity/access management, and continuous monitoring in a government-regulated environment
· Extensive experience in system administration of Linux-like operating systems.
· Bachelor's degree in Computer Science or a related engineering field, OR equivalent professional experience.
Desirable Qualifications:
· Experience with Argo CD or other GitOps tools for multi-cluster Kubernetes state management
· Exposure to OAuth2 or OIDC-based authentication for multi-tenant apps (with or without Istio integration)
· Basic familiarity with cloud networking and protocols (e.g., DNS resolution, NFS, netcat, telnet alternatives)
· Experience configuring and troubleshooting SFTP servers and key-based authentication
· Understanding of container security concepts, including Docker privileges, cap-add, and bind mounts
· Familiarity with file system types such as XFS and Btrfs, and with tools such as showmount and lsmod
· Exposure to Helm rollback, AKS upgrade automation, or scripting for Kubernetes cluster rotation
· Awareness of monitoring and logging best practices (e.g., pod health checks, logs, resource limits, alerting)
Job Type: Contract
Pay: $75.00 - $95.00 per hour
Expected hours: 40 – 50 per week
Experience:
- DevOps or SRE: 8 years (Required)
Ability to Commute:
- McLean, VA 22101 (Preferred)
Ability to Relocate:
- McLean, VA 22101: Relocate before starting work (Preferred)
Work Location: Hybrid remote in McLean, VA 22101