Founding Site Reliability Engineer

Relevance AI
San Francisco, United States
On-site
Full-time
Posted 25 days ago
Engineering

Job Description

Location 📍: San Francisco, USA (Hybrid 3 days/week)

About Us 🚀

At Relevance AI, we’re building the home of the AI workforce.

Our mission is simple: empower every team to delegate meaningful work to AI agents that think, act, and collaborate like experts.

With Relevance AI, anyone can create and manage intelligent agents that handle workflows, decisions, and collaboration - all within one unified platform. Our technology already powers industry leaders such as Canva, Databricks, Confluent, Autodesk, Lightspeed, Rakuten, Aveva, Qualified, and Activision Blizzard, helping them scale excellence across operations, marketing, and sales.

We’re backed by Bessemer Venture Partners, Insight Partners, Peak XV, and King River Capital, and raised our Series B in April 2025 to accelerate growth and push the boundaries of agentic automation.

Headquartered in San Francisco and Sydney, we operate on a hybrid model and thrive on curiosity, collaboration, and execution - we move fast, think big, and win together.

This year, we were proud to be named LinkedIn’s #1 Startup in Australia.

If you want to define how the world works with AI, join us.

The Role 🧠

We’re looking for a Founding Site Reliability Engineer to join us as our first SRE hire in San Francisco. We are open to hiring at Senior, Lead, or Principal level, and the level will be candidate-led. This role is perfect for someone ready to establish and scale the SRE discipline from the ground up in one of the fastest-growing AI companies globally.

You’ll own the reliability, scalability, and security of our platform as we power tens of thousands of multi-agent workloads across multiple regions. You’ll partner closely with our founders, engineering leads, and product teams to define our reliability culture, shape long-term strategy, and build world-class infrastructure for enterprise scale.

What You’ll Do 💪

  • Own SRE, establishing best practices, tooling, and culture

  • Tackle reliability challenges unique to multi-agent orchestration at enterprise scale

  • Guarantee >99.9% uptime of production systems, ensuring reliability at global scale

  • Architect and automate AWS infrastructure with Terraform and CI/CD pipelines

  • Design observability systems across microservices, APIs, and vector infrastructure (metrics, tracing, logging)

  • Drive down incidents and MTTR through runbooks, alerting, and incident response excellence

  • Help scale infra to support hundreds of thousands of agents and billions of API calls

  • Partner with engineering teams to embed SRE principles into the SDLC and shape org-wide reliability strategy

  • Act as a founding voice in our SF office, influencing product direction and engineering culture

What We’re Looking For 🧠

  • 5+ years in SRE/DevOps/Infrastructure roles, with experience in enterprise SaaS environments.

  • Deep AWS expertise (EC2, ECS/EKS, Lambda, RDS, VPC, IAM).

  • Proven track record with Infrastructure as Code (Terraform, Kubernetes/EKS, CDK, or CloudFormation).

  • Hands-on with observability stacks (CloudWatch, Grafana, Prometheus, Datadog).

  • Incident management experience in production SaaS systems, including on-call, postmortems, and reliability improvements.

  • Bonus: Prior exposure to AI/ML platforms, data-heavy systems, or multi-agent workloads.

Tech Stack 🧰

AWS, Kubernetes/EKS, Terraform, GitHub Actions, Postgres/Mongo, Prometheus/Grafana, CloudWatch, PagerDuty/BetterStack

Benefits ✨

  • 🩺 Health Insurance Contribution – Relevance AI contributes to the cost of individual medical, dental, and vision insurance for employees.

  • 🚍 Commuter Benefits – Save on your commute with pre-tax deductions for transit and parking expenses

  • 🏖️ Unlimited Annual Leave – Flexible time off policy to rest, recharge, and take care of what matters most

  • 📈 ESOP – Employee Stock Ownership Plan so you can grow with the company

  • 🤖 AI Productivity Benefit – Get up to $1,200 USD/year to spend on AI tools, courses, and learning resources that help you work smarter and grow your skills

  • 👶 Parental Leave – We offer 12 weeks of paid parental leave for all eligible new parents, and an additional 6 weeks for the birthing parent

  • 🎉 Milestone Merch – Celebrate your work anniversaries with customised Relevance AI swag

  • 🍿 Food, Drinks & Community – Stay energised with free breakfasts, healthy snacks, and a fully stocked fridge of drinks. Enjoy team lunches provided every Thursday and Friday, plus Uber Eats dinners and regular catered office meals throughout the week. As the home of the AI workforce, we also host vibrant community events featuring thought leaders, industry partners, and the wider tech community.

  • 🪩 Quarterly Team Events – Build stronger connections through fun, meaningful team bonding experiences every quarter

  • 🏓 Social Clubs – Share your hobbies and interests by joining or starting a club with your teammates. From hiking and chess to board game nights and social committee activities, there’s something for everyone!

  • 🧠 Sonder EAP – Access 24/7 mental health and wellbeing support through Sonder, our Employee Assistance Program
