Why This Role Matters
Agentic platforms are the third wave of AI adoption, letting organisations delegate complex, multi-step work to autonomous, LLM-powered "knowledge robots." Winning teams pair retrieval-augmented generation (RAG) with low-latency inference to deliver factual answers at scale. You will own that stack (research, optimisation, and production deployment) so we ship features that feel like magic to end users.
Position Overview
- Own the agentic & RAG roadmap: design, prototype, and launch LLM agents (planner-executor, multi-agent, tool-calling) that hit sub-second p95 latency in production.
- Invent and productionise RAG pipelines: embedding strategy, vector DB design (Weaviate, Pinecone), hybrid search, evaluations, guardrails.
- Fine-tune frontier models with PEFT/LoRA, RLHF, and safety alignment; publish research that moves the needle.
- Optimise inference: quantisation (INT4/INT8), speculative decoding, TensorRT-LLM/vLLM or Ray Serve to cut cost per token by 40%.
- Lead and mentor a small, high-agency team; codify MLOps, CI/CD, observability, and data-governance best practices.
- Partner with product and design to turn research into delightful user features that deliver 10x customer ROI.
Core Qualifications
- Experience: 5+ years in software/ML, including 2+ years shipping LLM/NLP products at scale.
- Deep learning stack: expert in Python and PyTorch (TensorFlow/JAX welcome); CUDA or Triton kernels a plus.
- Agentic & RAG frameworks: hands-on with LangChain, LlamaIndex, CrewAI; vector DBs such as Weaviate, Pinecone, Qdrant.
- Model optimisation: quantisation, distillation, AWS Neuron or GPU kernel tuning.
- Cloud & MLOps: Kubernetes, Ray, SageMaker or GCP Vertex AI; Terraform/Pulumi IaC; structured observability.
- Communication & leadership: writes crisp design docs and guides cross-functional teams.
Bonus Skills
• Multimodal agent systems (vision-language, audio-language).
• Privacy-preserving ML (federated learning, differential privacy).
• OSS contributions to LangChain, Weaviate, Pinecone, Triton, or vLLM.
Success Metrics (First 6 Months)
- Ship the v1 RAG pipeline with
- Cut inference cost per 1K tokens by 40%.
- Publish a white paper/blog post on agent orchestration that improves tool reliability by 25%.
- Build and mentor a team of 3-5 engineers; institute automated eval harnesses and CI/CD for model releases.
Tech Stack You'll Own
Python, PyTorch, JAX, Ray Serve, Kubernetes, LangChain/LlamaIndex, Weaviate/Pinecone, vLLM/TensorRT-LLM, AWS Bedrock/SageMaker, pgvector, Prometheus + Grafana
Compensation & Benefits
• Base: ₹30-40 LPA (India).
• Remote-first flexibility, with quarterly on-sites.
Hiring Process
- Intro chat: vision and culture fit.
- Deep dive: solve an open-ended agent/RAG problem in our codebase.
- Research case: present past optimisation work or a roadmap proposal.
- Values & leadership interview with the founders.
- Offer: we aim to close within 2 weeks.
Ready to build the future of agentic AI? Email us with:
• A resume or LinkedIn URL.
• One or two flagship projects that prove you operate in the top 3% of AI/ML engineers (a one-sentence highlight for each).
• GitHub or demo links if the work is open source.