Senior Manager, Machine Learning Engineer - ML Ops
United States, California, San Jose
170 W Tasman Dr
Applications are accepted until further notice

Who We Are
Cisco's AI team consists of AI researchers and software developers who collaborate to build innovative products and platforms for Cisco. We are motivated by t

About the Role
We are seeking a highly experienced Senior Engineering Manager to lead teams building, deploying, and optimizing Large Language Model (LLM)-based applications, with a strong emphasis on LLMOps (LLM operations), Retrieval-Augmented Generation (RAG) pipelines, and scalable production systems. This role involves managing cross-functional engineers, collaborating closely with product, ML research, and infrastructure teams, and ensuring the successful delivery of robust, secure, and efficient AI-powered systems.

Key Responsibilities

Team Leadership & Management
- Lead and grow a high-performing engineering team focused on LLM applications and infrastructure.
- Foster a culture of engineering excellence, continuous learning, and innovation.
- Drive team performance through mentoring, goal-setting, and technical guidance.

LLMOps & Platform Engineering
- Design and oversee scalable LLMOps pipelines, including fine-tuning, evaluation, deployment, monitoring, and optimization of large language models.
- Work closely with ML researchers to transition experimental models into production.
- Manage model lifecycle tooling (e.g., LangChain, MLflow, Weights & Biases, Hugging Face, Ray).

Retrieval-Augmented Generation (RAG)
- Oversee the design and implementation of RAG pipelines, including vector database management, chunking strategies, embedding selection, retrieval tuning, and relevance evaluation.
- Optimize latency, accuracy, and context window handling for high-traffic LLM services.

Architecture & Scalability
- Own architectural decisions for high-availability, low-latency systems powering generative AI applications.
- Collaborate with infrastructure and DevOps teams on scaling inference workloads (e.g., with GPU clusters, model quantization, caching, and sharding).

Cross-Functional Collaboration
- Work with product, design, and data science to define requirements, translate business needs into engineering tasks, and prioritize effectively.
- Maintain high communication standards across teams, ensuring alignment and transparency.

Quality, Security, and Governance
- Champion model observability, incident response, prompt versioning, and feedback loops.
- Ensure responsible AI practices and data governance are followed.

Qualifications

Required
- 8+ years of software engineering experience, with 3+ years in engineering management or technical leadership roles.
- Proven track record of shipping production-grade ML/LLM systems.
- Strong understanding of LLMs, fine-tuning, prompt engineering, vector databases (e.g., Pinecone, Weaviate, FAISS), and RAG patterns.
- Experience with cloud-native architectures (AWS, GCP, or Azure) and container orchestration (Kubernetes).
- Proficiency in Python and familiarity with AI/ML frameworks such as PyTorch, Transformers, LangChain, or similar.

Preferred
- Experience managing or working with multi-modal or multi-agent systems.
- Exposure to regulatory or compliance frameworks for ML systems (e.g., GDPR, SOC 2).
- Hands-on experience with observability and evaluation tools for LLMs.

#WeAreCisco
#WeAreCisco where every individual brings their unique skills and perspectives together to pursue our purpose of powering an inclusive future for all. Our passion is connection: we celebrate our employees' diverse set of backgrounds and focus on unlocking potential. Cisconians often experience one company, many careers where learning and development are encouraged and supported at every stage. Our technology, tools, and culture pioneered hybrid work trends, allowing all to not only give their best, but be their best.
We understand our outstanding opportunity to bring communities together and at the heart of that is our people. One-third of Cisconians collaborate in our 30 employee resource organizations, called Inclusive Communities, to connect, foster belonging, learn to be informed allies, and make a difference. Dedicated paid time off to volunteer (80 hours each year) allows us to give back to causes we are passionate about, and nearly 86% do!

Our purpose, driven by our people, is what makes us the worldwide leader in technology that powers the internet. Helping our customers reimagine their applications, secure their enterprise, transform their infrastructure, and meet their sustainability goals is what we do best. We ensure that every step we take is a step towards a more inclusive future for all. Take your next step and be you, with us.

#CiscoAIJobs