Senior Software Development Engineer

Product

Menlo Park, CA (Hybrid) ∙ New York, NY (US) ∙ Toronto

$150,000 - $250,000 + Equity + Benefits

Full-Time

About Vijil

Vijil is the trust infrastructure enterprises need to deploy AI agents with confidence. Our platform — Diamond, Dome, and Darwin — helps organizations evaluate agents before deployment, enforce governance at runtime, and continuously harden them using production telemetry. Founded in 2023 by senior leaders from AWS and backed by Brightmind, Gradient, and Mayfield, Vijil’s platform bridges the gap between capability and trust, enabling AI agents to move from concept to production seamlessly.

With $23M in recent funding and a Gartner® Cool Vendor designation, Vijil is building the future of autonomous resilience, helping companies move from prototype to production in days rather than months.

The Role

Vijil is looking for a senior backend engineer (10+ years of experience) to own the platform end-to-end: cloud infrastructure, CI/CD, production operations, and the AI agents and libraries that make up the product. You will work across multiple codebases on a small team, ship to customers every week, and be the engineer others turn to when something breaks or a new service needs to go out. Two things matter most: deep, hands-on cloud development experience, and real production experience with agentic AI systems.

What You Will Own

Own production end-to-end: cloud infrastructure, deployments, observability, on-call.
Own CI/CD: build, test, release, and image promotion across every repository we ship.
Build, deploy, and maintain the AI agents and libraries that make up our product.
Write production-quality backend code in Python.
Set the engineering bar: clean architecture, real tests against real systems, no silent failures.
Cut through cross-repo work that no one else has the context to finish.

This is not a research role. You will work alongside applied scientists who ship algorithms; your job is to make those algorithms run reliably for paying customers.

What We Are Looking For

You have spent the last decade shipping backend systems and running them in production. You have built or operated AI agents that other software relies on, so you know what these systems look like when they go wrong and what it takes to keep them running. You tend to read code before you write it, and you debug by tracing through the call chain rather than guessing. You hold strong architectural opinions and put them in writing, but once the team has decided, you build what was decided. You are comfortable working across many repositories at once because you have done it before. You are looking for a place where your work reaches customers quickly and your judgment carries weight from day one.

Minimum Qualifications:

10+ years of professional software engineering, primarily backend.
Core cloud development competency. You have designed, deployed, and operated production services on a major cloud. You understand networking, identity (IAM or equivalent), storage (S3 or equivalent), and compute (EC2/EKS or equivalent) at the level needed to debug them under pressure. This is non-negotiable.
Hands-on experience building or operating agentic AI systems in production — LLM-backed agents that call tools, coordinate with other agents, and run as services. You can speak concretely about what failed in production and how you fixed it. This is non-negotiable.
Strong Python, including modern async, type hints, and production frameworks such as FastAPI and Pydantic.
Ownership of CI/CD pipelines you built or rebuilt yourself (GitHub Actions, AWS CodeBuild, or equivalent).
Container-based deployment and orchestration at production depth: Docker, Kubernetes, and Helm.
Working fluency with relational databases (PostgreSQL preferred), including writing migrations and debugging slow queries.
Disciplined Git workflow, code review hygiene, and semantic commit history. GitHub link appreciated.
Strong written communication. You write design docs that other engineers actually read.

Preferred Qualifications:

Familiarity with agent interoperability standards such as MCP (Model Context Protocol) or A2A (Agent-to-Agent).
Experience with agent development frameworks such as Google ADK, LangGraph, or equivalent.
Production observability experience with OpenTelemetry or equivalent.
Experience operating a multi-service platform owned by a small team.
Open-source maintainer or contributor history.
Prior work on AI infrastructure, agent platforms, or evaluation systems.
Comfort working directly with applied scientists and translating research into production code.

Why This Role

End-to-end ownership of production infrastructure and AI systems
Direct exposure to agentic AI and AI security at production scale
A high-impact engineering position within an early-stage company
The ability to influence architecture, operational standards, and engineering culture
Close collaboration with applied scientists turning cutting-edge AI research into customer-facing products
The opportunity to work alongside experienced leaders who previously built core AWS SageMaker infrastructure and enterprise AI systems