Senior Infrastructure Engineer - AI/ML

Jobgether
United States
On-site
Full-time
Posted about 1 month ago

Job Description

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Infrastructure Engineer - AI/ML in the United States.

This fully remote role offers the chance to design, implement, and optimize cutting-edge AI/ML infrastructure that empowers organizations to maintain full control over their data and compute resources. You will work on modular, cloud-native, and reusable infrastructure components supporting model training, inference serving, experiment tracking, and data pipelines. This high-impact position combines hands-on engineering with strategic influence, allowing you to shape scalable, secure, and observable systems while collaborating with a globally distributed team. The ideal candidate has strong experience in Kubernetes, cloud platforms, and Infrastructure-as-Code, and thrives in a culture that values autonomy, open source, and innovative thinking.

Accountabilities:

· Design, implement, and maintain modular, composable infrastructure components for AI/ML workflows, including training, inference, and experiment tracking.

· Contribute to open-source MLOps tooling and Kubernetes ecosystem projects that enable data sovereignty and client-controlled AI platforms.

· Optimize large-scale AI/ML workloads for performance, cost efficiency, reliability, and observability on client-owned cloud and hybrid infrastructure.

· Collaborate with ML engineers, cross-functional teams, and clients to deploy, configure, and maintain sovereign AI infrastructure.

· Mentor junior engineers, contribute to technical initiatives, and provide feedback to uphold engineering excellence.

· Participate in designing CI/CD pipelines, GitOps workflows, and automation processes for scalable AI/ML systems.

Requirements:

· 4+ years of hands-on infrastructure/platform/DevOps experience with production systems.

· Strong experience with Kubernetes, including troubleshooting, optimization, and production deployment.

· Proficiency with Infrastructure-as-Code tools such as Terraform, Helm, Pulumi, or Ansible.

· Experience with at least one major cloud platform (AWS, Azure, GCP), including networking, compute, and security.

· Strong programming skills in Python and/or Go for maintainable infrastructure code.

· Understanding of CI/CD practices, GitOps workflows, and automation principles.

· Ability to work independently in distributed teams and communicate effectively across time zones.

· Experience contributing to technical initiatives or mentoring junior engineers.

· Bonus experience: MLOps pipelines, model training and serving, monitoring tools (Prometheus, Grafana), GPU infrastructure, ML workflow orchestration (Kubeflow, MLflow, Airflow), service meshes, cost optimization, and secure deployment environments.
