Senior ML Engineer - Agentic AI

SS&C Technologies, Inc.
parental leave, paid time off, 401(k)
United States, Massachusetts, Waltham
Apr 16, 2026
As a leading financial services and healthcare technology company based on revenue, SS&C is headquartered in Windsor, Connecticut, and has 27,000+ employees in 35 countries. Some 20,000 financial services and healthcare organizations, from the world's largest companies to small and mid-market firms, rely on SS&C for expertise, scale, and technology.

Job Description

Senior ML Engineer - Agentic AI
Location: Waltham, MA (Hybrid)

About the Role

You'll be joining a collaborative, fast-moving team of data scientists, AI engineers, machine learning engineers, and data engineers who work together to tackle complex, high-impact problems at the intersection of AI and enterprise software. We operate with a genuinely agile mindset - shipping iteratively, challenging assumptions, and staying close to the cutting edge. The team is proactive about research, consistently evaluating and adopting state-of-the-art methodologies, and we encourage everyone to experiment, share findings, and bring new ideas to the table. If you thrive in an environment where intellectual curiosity is the norm and the work is always evolving, you'll fit right in.

Why Join SS&C

SS&C combines proprietary technology with deep industry expertise to support complex financial and health care operations. Our teams design, implement, and operate solutions that help clients manage data, automate processes, and scale their businesses with confidence. You will work with industry experts, modern platforms, and evolving technologies, gaining exposure to real-world operational challenges and large-scale enterprise environments.

How You Will Make an Impact

Agent Design & Implementation
Build and iterate on AI agents end-to-end - from defining goals, personas, and constraints to wiring LLM reasoning with tool execution.
Design and build end-to-end autonomous AI agents that take user input, reason across multi-step workflows via large language models, and execute actions through APIs, tools, and enterprise data sources.
Implement planning strategies including chain-of-thought, ReAct loops, and hierarchical task decomposition to enable agents to solve multi-step problems autonomously.
Design and refine system prompts, few-shot examples, and guardrails that shape agent behavior, tone, and decision-making boundaries.
Implement advanced planning and memory systems - including chain-of-thought loops, ReAct patterns, deep planning, and hierarchical planners - that allow agents to decompose complex tasks, retain context, and improve over time using vector database memory.

Terminal-Based AI Coding & Development
Work extensively inside AI-powered coding terminals (OpenCode, DeepCode, Hermes Agent) as both a user and a builder - understanding how these tools orchestrate LLM calls, file edits, and shell commands.
Contribute to the development of custom coding agent workflows that automate code generation, review, refactoring, and testing tasks.
Evaluate and benchmark terminal agent behaviors: accuracy of code edits, hallucination rates, context utilization, and multi-file reasoning.

Model Context Protocol (MCP) & Tool Integration
Build and extend MCP servers (using FastMCP and similar frameworks) that expose internal tools, databases, and APIs as structured capabilities for agents.
Design tool schemas, descriptions, and invocation patterns that LLMs can reliably discover and call.
Integrate agents with external services - REST APIs, vector stores, graph databases, and internal SDKs - through well-defined MCP interfaces.

CI/CD Pipeline
Containerize and deploy production-grade agent services on Kubernetes, including building CI/CD pipelines, autoscaling configurations, and infrastructure-as-code with Docker and Helm.
Define evaluation frameworks and observability pipelines (LangFuse) to measure agent performance - covering retrieval accuracy, tool-selection correctness, and hallucination rate - and use insights to drive continuous improvement via A/B testing and audits.
Embed security and compliance into every layer: IAM tokens, Connect tokens, prompt injection mitigations, etc.
Partner closely with data scientists, ML engineers, product managers, and security teams to translate complex business needs into robust, scalable agentic solutions.

LLM Experimentation & Evaluation
Experiment with a range of open-source LLMs (Qwen, DeepSeek, fine-tuned domain-specific models) to evaluate reasoning quality, latency, cost, and tool-use reliability.
Explore inference optimizations such as speculative decoding, constrained decoding, structured outputs, and router-mode orchestration.
Build and run evaluation pipelines to measure retrieval accuracy, tool-selection precision, hallucination rate, and end-to-end task completion.

Memory & Retrieval Systems
Integrate agents with a broad ecosystem of external systems: vector stores (PgVector, Milvus), relational and graph databases, REST APIs, and internal microservices, all managed through secure, least-privilege access patterns.
Design and test memory architectures - short-term (conversation context), long-term (vector-stored interaction history), and episodic (task-specific recall) - to improve agent continuity and personalization.

Required Experience

Strong Python Engineering - Expert-level Python skills across OOP, async programming, testing, and packaging, with the ability to write clean, modular, production-grade code. Familiarity with JavaScript/TypeScript is a plus for UI integrations or edge agent work.
AI & LLM Expertise - A solid foundation in how large language models work, including transformer architecture, tokenization, context windows, and prompting strategies, paired with hands-on experience using LLM APIs from providers like OpenAI, Anthropic, Hugging Face, and Ollama. You understand embeddings, vector similarity search, and RAG pipelines and know how to apply them in production systems.
Agentic AI Development - Proven experience designing and building autonomous AI agents using frameworks such as LangChain, LangGraph, LlamaIndex, CrewAI, AutoGen, and Semantic Kernel. You're fluent in agent design patterns - ReAct loops, chain-of-thought planning, hierarchical task decomposition, and multi-agent coordination - and can translate complex business workflows into reliable agentic solutions.
MCP & Tool Integration - Familiarity with Model Context Protocol architecture and the ability to build and extend MCP servers using FastMCP to expose enterprise tools, databases, and APIs as structured, agent-callable capabilities with well-defined schemas and invocation patterns.
Framework-Level Customization & Source Code Engineering - The ability to go beyond standard library usage and work directly at the source code level of AI frameworks. This includes forking and modifying open-source agent libraries, fusing capabilities across frameworks - such as integrating LangGraph's ReAct agent architecture with the deep planning capabilities of OpenCode-style coding agents to produce a hybrid base agent - and packaging those modifications cleanly for team-wide reuse. You know how to navigate unfamiliar codebases quickly, make targeted changes without breaking upstream compatibility, and contribute back where appropriate.
Custom Agent Packaging & CI/CD Deployment - Experience containerizing heavily customized agent builds - including forked or modified third-party libraries - into reproducible Docker images and shipping them through automated CI/CD pipelines (GitHub Actions, Jenkins). You can manage dependency pinning, versioning, and environment parity across dev, staging, and production, ensuring that bespoke framework modifications are treated as first-class, production-grade software rather than one-off hacks.
Memory & Retrieval Systems - Hands-on experience with vector databases (PgVector, Milvus) and the ability to design short-term, long-term, and episodic memory architectures that improve agent continuity and personalization over time.
Data Engineering & Databases - Working knowledge of relational (PostgreSQL), NoSQL (MongoDB), and graph databases (Neo4j), along with experience building ETL pipelines to maintain agent knowledge bases.
DevOps & Deployment - Comfort containerizing and deploying agent services using Docker and Kubernetes, building CI/CD pipelines with Jenkins, and working with cloud ML platforms such as AWS.
Evaluation & Observability - Experience defining and running evaluation frameworks - using tools like LangFuse or custom harnesses - to measure retrieval accuracy, tool-selection precision, hallucination rates, and end-to-end task completion, then using those insights to drive continuous improvement.
Security & Governance - Awareness of secure coding practices, IAM and token-based authentication, prompt injection mitigations, and data encryption, with a mindset of embedding compliance into every layer of agent design.
Curiosity & Learning Agility - A genuine habit of staying at the cutting edge - reading research, experimenting with emerging models, and building side projects. You're comfortable with ambiguity, thrive in fast-moving environments, and bring new ideas to the table without waiting to be asked.

Hands-on experience with the stack below:

Agent Frameworks (e.g., OpenCode, Claude Code, LangChain, LangGraph, LlamaIndex, CrewAI, AutoGen) provide abstractions for prompt chaining, tool integration, planning loops, and multi-agent coordination, making them essential for building complex agent workflows.
Models / LLMs - including open-source LLMs with speculative decoding, constrained decoding, router mode, and similar techniques - serve as the reasoning engine for agents. The right choice depends on cost, latency, context size, and task complexity, with fine-tuned or domain-specific models available for specialized use cases.
Memory Stores such as Milvus and PgVector are vector databases that store embeddings for retrieval-augmented generation (RAG), alongside document stores like ElasticSearch and S3 for raw data persistence.
Databases including PostgreSQL, Milvus, MongoDB, MySQL, and Neo4j serve as structured and graph data sources for agent lookups, status checks, and relationship-aware reasoning.
Tool Access / Services - spanning MCPs, FastMCP, REST APIs, AWS Lambda, Microservices, and Internal SDKs - form the agent action layer, triggering business logic, external APIs, and microservices through unified access via Model Context Protocol (MCP) wrappers.
Orchestration, handled by a Custom Orchestration Framework, manages multi-step workflows and multi-agent coordination with retry logic and reliable execution.
Deployment technologies like Docker, Kubernetes, and Serverless host and scale agents, with serverless suited for stateless or bursty tasks and Kubernetes for stateful or GPU workloads.
CI/CD & Infra tools - Jenkins, Helm, and Git - automate the testing and deployment of agent services while managing infrastructure and secrets as code.
Monitoring solutions such as LangFuse and Custom SDKs capture logs, metrics, and traces of agent behavior, providing observability into prompt outputs, tool usage, latencies, and errors.

Single-Agent Pipeline is the simplest pattern, where one agent handles an end-to-end task (input → LLM → tools → output), often as an iterative RAG setup. It is ideal for narrow, well-defined tasks like FAQ bots or basic assistants, and is the easiest pattern to develop and debug.
Multi-Agent Orchestration involves multiple specialized agents coordinating together - one retrieves data, another analyzes it, and a supervisor routes tasks - in either sequential or hierarchical arrangements. This pattern suits complex workflows such as data analysis combined with report generation, and improves modularity, scalability, and maintainability.
Serverless (Function) Agents deploy lightweight agent logic in serverless functions like AWS Lambda or Cloud Run, scaling automatically with demand. They are best for stateless or event-driven tasks requiring quick API responses, offering rapid deployment with minimal infrastructure overhead.
Containerized Microservices run each agent in its own container managed on Kubernetes, enabling persistent, stateful agents with custom runtimes and GPU access. This pattern is suited for heavy ML workloads, GPU acceleration, low-latency requirements, and complex dependency management.
On-Premises / Edge Deployment hosts agents on private hardware or edge devices where data cannot leave the environment. This is required for sensitive industries such as healthcare and defense, and reduces cloud costs for constant, high-throughput workloads.
Hybrid Model combines cloud infrastructure for scale with on-premises or edge deployment for data privacy and compliance. It is ideal when sensitive data must remain on premises while still leveraging cloud resources for heavy computation.

Join SS&C, where innovation meets global opportunities.

#LI-PE1 #LI-HYBRID

Unless explicitly requested or approached by SS&C Technologies, Inc. or any of its affiliated companies, the company will not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services.

SS&C Technologies offers a comprehensive total rewards package designed to support your wellbeing, growth, and future. Our benefits include medical, dental, and vision coverage; a 401(k) plan with company match; paid time off, holidays, and parental leave; and professional development reimbursement opportunity.
Actual base salary will vary based on several factors, including but not limited to relevant skills, prior experience, education, demonstrated performance, and geographic location. Massachusetts: The expected base salary for the position is between 140,000 USD and 150,000 USD. Applications will be accepted on an ongoing basis until the position is filled.

SS&C Technologies is an Equal Employment Opportunity employer and does not discriminate against any applicant for employment or employee on the basis of race, color, religious creed, gender, age, marital status, sexual orientation, national origin, disability, veteran status or any other classification protected by applicable discrimination laws.