Senior Site Reliability Engineer - APAC
Tyk Technologies
Job Description
Who are Tyk, and what do we do?
The Tyk API Management platform is helping to drive the connected world and power new products and services. We’re changing the way that organisations connect any number of their systems and services. Whether the systems are internal, external, public or highly encrypted, Tyk helps businesses drive value across the retail, finance, telecoms, healthcare, and media industries (to name just a few!).
If you’ve banked online, used an app to check the news, or perhaps even driven a connected car, APIs, and by extension Tyk, make that possible. Founded in 2015 with offices in London - UK, London - Ontario, Atlanta and Singapore, we have many thousands of users of our B2B platform across the globe. Brands using Tyk range from Lotte, Bell and T-Mobile to RBS, Capital One and Vinci. We have a varied user base hailing from every continent – even Antarctica.
Our Mission
Tyk is on a mission to connect every system in the world. We’ve started by building an API Management platform.
Total flexibility, default remote, radical responsibility
We offer unlimited paid holidays and remote working from anywhere in the world, for everyone. Why? Tyk was founded on the principle of offering flexibility and autonomy to our employees; we believe this allows them to achieve their best results. It also means we can build the best possible team, because location and working hours are no barrier.
If this sounds like an environment that could work for you, then read on to find out more.
The role:
At Tyk, we’re obsessed with building software that solves problems. We count on our Site Reliability Engineers (SREs) to give users the rich feature set, high availability, and stellar performance they need to pursue their missions.
Our customer base is growing, so we’re seeking an experienced Senior SRE to optimise, automate, and improve our performance, using insights from massive-scale data in real time. We want an original thinker, a challenger, a technical legend, an opinionated collaborator who wants to make things better.
Requirements
- Lead hands-on maintenance and optimization of our global Cloud platform within SL(A/I/O)s you'll help define
- Collaborate to shape SRE strategy, then translate it into actionable technical plans coordinated through Scrum
- Identify reliability issues, drive root cause analysis, and implement solutions alongside your squad
- Lead performance tuning and fault finding through analysis of OS and application metrics
- Design and implement automation for common operational tasks and cloud-operations workflows
- Develop proactive alerting, a monitoring roadmap, and relevant dashboards; define and track KPIs
- Participate in on-call rotation, ensuring effective incident response and resolution within SLAs
- Conduct blame-free postmortems, document findings, and maintain operational runbooks
- Drive multi-region and multi-cloud platform expansion with focus on scalability and automation
- Optimize infrastructure performance and cost efficiency without impacting service delivery
- Engage with commercial teams on growth plans and translate them into technical SRE strategies
- Coordinate penetration testing through provider liaison, technical setup, and environment configuration
- Champion continuous improvement across processes, communication, and team practices
- Model excellence in software design and knowledge sharing
- Plan and execute software upgrades to enhance cloud services
Experience required:
- Experience in an SRE role
- Strong knowledge of cloud technologies and SLA, SLO, and SLI management
- Excellent communication and leadership skills
- Ability to analyze and improve operational processes and performance metrics
- Experience in software design, automation, and root cause analysis
- On-call support experience and customer-focused mindset
- Collaborative attitude with commercial and technical teams
- Launching and operating production Kubernetes clusters
- Designing and operating infrastructure on AWS and other providers
- Operating MongoDB (or other document database) clusters
- Operating Redis (or other key-value storage) clusters
- Administering Linux servers
- Operating Prometheus and Grafana
- Operating log collection and analysis systems
- Participating in the on-call rotation (04:00 - 16:00 UTC)
Skills:
- Kubernetes (administrator)
- Go and/or Python (advanced)
- AWS/EKS (advanced)
- Linux (advanced)
- Terraform and IaC in general (proficient)
- Helm (proficient)
- MongoDB (or similar)
- Redis (or similar)
- Monitoring – Prometheus, Grafana, Thanos (familiar)
- Grasp of networking concepts (subnets, routing, peering, load balancing, NAT, etc.)
- Common networking protocols (DNS, TCP/IP, HTTP, TLS, UDP)
- Proactive, energetic, innovative and change-oriented
- A desire to lead/mentor a team
