Founding MLOps Engineer
Our client, a venture-backed AI startup, is hiring a talented Founding MLOps Engineer to join their team in San Francisco. The successful candidate will act as the key link between cutting-edge ML research and secure, scalable production systems, helping build the foundational infrastructure that enables agents and models to be trained, evaluated, deployed and governed across multiple tenants and real-world applications.
Responsibilities
- Design and own the infrastructure powering training, deployment, evaluation and governance of large-scale AI systems.
- Build end-to-end pipelines for LLM fine-tuning (SFT, LoRA, RLHF, DPO).
- Develop dynamic RAG embedding systems with continuous data updates.
- Implement model packaging, quantization and high-performance inference deployment.
- Manage GPU-intensive hybrid infrastructure (cloud + on-prem) using Kubernetes, Ray and Terraform.
- Build CI/CD workflows for models using Docker, GitHub Actions and ArgoCD.
- Establish comprehensive model governance covering versioning, lineage, reproducibility and evaluation tracking.
- Implement observability for latency, token usage, drift detection and performance analytics.
- Enable secure multi-tenant deployments using policy-as-code (OPA, RBAC, ABAC).
Skillset
- Minimum of 4 years of experience in MLOps, ML infrastructure or backend/platform engineering.
- Proven experience deploying and scaling LLM systems in production environments.
- Strong expertise in Kubernetes, cloud infrastructure and CI/CD pipelines.
- Hands-on experience with model lifecycle tools such as MLflow, DVC and Weights & Biases.
- Familiarity with RAG pipelines, vector databases and agent orchestration frameworks.
- Experience with SOC 2, HIPAA or GovCloud environments, or prior startup/founding-team experience, is a bonus.
Benefits
- Salary: circa $250k.
- Equity.