ML / AI Research Engineer

San Francisco, California

Machine Learning

Permanent

Our client, a venture-backed AI startup, is hiring a talented ML/AI Research Engineer to join their team in San Francisco. The successful candidate will lead the design, training, evaluation and optimization of agent-native AI systems, working across LLMs, vector search, graph reasoning and reinforcement learning to build the intelligence layer on top of the company's enterprise data fabric.

Responsibilities

Fine-tune and evaluate open-source LLMs (e.g. Llama 3, Mistral, Falcon, Mixtral) for enterprise-grade applications.
Build and optimize RAG pipelines using tools such as LangChain, LangGraph, LlamaIndex or Dust.
Develop and iterate on agent architectures (ReAct, AutoGPT, BabyAGI, OpenAgents) using real-world enterprise workflows.
Design embedding-based memory systems with efficient, high-performance retrieval strategies.
Implement reinforcement learning and preference-optimization pipelines (RLHF with PPO, DPO) to improve agent behavior and decision-making.
Create scalable evaluation frameworks, including synthetic evaluations, trace capture and explainability tooling.
Own model observability, drift detection and alignment strategies across production systems.
Optimize inference latency and GPU utilization across cloud and on-premise infrastructure.

Skillset

Strong experience fine-tuning and serving open-source LLMs using tools and techniques such as Hugging Face, DeepSpeed, FSDP, LoRA/QLoRA and vLLM.
Hands-on experience with modern alignment techniques, including SFT, RLHF and DPO pipelines.
Proven ability to build high-quality training datasets and robust evaluation frameworks for LLM systems.
Deep understanding of scaling and optimization trade-offs, including batching, context windows, precision and quantization.
Experience building and deploying production-grade RAG systems.
Familiarity with orchestration and retrieval tools such as LangChain, LangGraph and LlamaIndex, and with vector stores and search libraries such as Weaviate, Qdrant and FAISS.
Experience working across structured (SQL, graph) and unstructured data sources.
Experience designing agent-based systems with memory, tool use and multi-step reasoning.
Strong understanding of agent workflows (e.g. Plan-Act-Reflect), including self-correction and multi-agent systems.
Expertise in inference and retrieval optimization, including chunking strategies, reranking and low-latency deployment (e.g. vLLM, TGI).