Site Reliability Engineer
Levi Strauss & Co.
📍 Ciudad de México, Ciudad de México, Mexico
Job Description
About the Job
An opportunity to grow your SRE craft in a fast-paced, collaborative environment on Google Cloud Platform, with exposure to multi‑cloud technologies and modern data engineering.
Reliability & Incident Response
- Monitor production systems using observability tooling — dashboards, alerts, and logs — to detect and triage issues before they impact end users
- Participate in on‑call rotations, respond to incidents following established runbooks, and escalate appropriately when needed
- Contribute to blameless post‑mortems, documenting root causes and follow‑up action items to prevent recurrence
- Help maintain and improve SLO dashboards and alerting thresholds to ensure platform health is visible and measurable
Toil Reduction & Automation
- Identify repetitive manual tasks and build automation to eliminate them, reducing toil for yourself and the broader team
- Write and maintain scripts, ...