Senior Incident Manager (Remote - US)

Jobgether

United States • On-site • Full-time • Posted 8 months ago

Job Description

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Incident Manager in the United States.

This role offers a critical leadership opportunity in managing high-impact incidents for cloud-based services. You will coordinate cross-functional teams during major incidents, ensuring swift resolution while maintaining clear, accurate, and timely communication with stakeholders and customers. The position combines operational leadership, technical expertise, and strong communication skills to drive reliability, root cause analysis, and continuous improvement. You will mentor peers, improve incident response processes, and influence how complex distributed systems are monitored and maintained. This role is ideal for someone passionate about operational excellence, proactive problem solving, and driving confidence in technical systems during high-pressure events.

Accountabilities:

Lead critical production incidents, coordinating multi-disciplinary response teams to mitigate impact and restore operations rapidly.
Drive root cause analysis and collaborate with engineering teams to implement long-term reliability improvements.
Summarize key learnings from incidents, communicate action items, and ensure follow-through on technical and procedural improvements.
Own incident communications, providing timely, accurate updates to internal stakeholders and empathetic notifications to customers.
Mentor and train colleagues in incident management, communication best practices, and technical response strategies to elevate overall team performance.
Continuously refine incident response processes, playbooks, and automation to improve efficiency and reduce downtime.

Requirements

5+ years of experience in incident management, site reliability engineering, or production operations for large-scale, cloud-native systems.
Proven ability to lead high-severity incidents, identify impacts, isolate fault domains, and coordinate multi-team responses.
Strong knowledge of cloud infrastructure (AWS, Azure, or GCP) including compute, networking, storage, and observability.
Hands-on experience with log analysis, debugging, and observability systems (Datadog, Elasticsearch, Splunk, Prometheus, Grafana, OpenTelemetry, etc.).
Proficiency in at least one programming or scripting language (Python, Go, Bash) for diagnostics and automation.
Experience creating and maintaining incident playbooks and communication templates for consistent, high-quality updates.
Exceptional communication and writing skills to summarize complex technical situations for both technical and business audiences.
Bachelor’s, Master’s, or other advanced degree in Computer Science, Computer Engineering, or a related technical field.

Similar Jobs

Field Engineer - High Voltage (Remote - US)

Sr. Project Manager (Remote - US)

Senior Software Engineer - Backend - Growth Platform (Remote - US)

Senior Application Security Engineer (Remote - US)

Engineering Manager - CAD/3D Research and Novel Algorithms (Remote - US)

Implementation Engineer (Remote - US)

Senior Data Engineer (Remote - US)

Staff Mobile Engineer (Android) (Remote - US)

IoT Security Consultant- Remote (Anywhere in the U.S.)

Senior Software Engineer (TypeScript) - AI/ML (Remote - US)

Design Director (Remote - US)

Senior Product Manager, Reporting & Analytics (Remote - US)

Firefox OS Integration Engineer, Mac OS Engineering (Remote - US)

Photonics Engineer (Remote - US)

Sr. Security Program Manager (Remote - US)