Site Reliability Engineer (SRE)
Michael Page Colombia
📍 Work from home, Colombia
Job Description
- Build reliable, scalable systems through automation and engineering.
- Improve service stability using SLOs, monitoring and incident response.
About our client
A U.S.-based e-commerce organization specializing in personalized products, operating high-volume digital platforms supported by global teams. The company emphasizes technology-driven operations, strong customer experience, and scalable infrastructure to support rapid growth and large production capacity.
Description
Reliability & Performance
- Define and manage SLIs, SLOs, and error budgets.
- Improve system reliability, scalability, and resilience.
- Lead reliability reviews and prevent incidents proactively.
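To illustrate the error-budget arithmetic behind the SLO work listed above, here is a minimal Python sketch. It is a generic example and says nothing about the client's actual tooling. It assumes a request-based availability SLI, where the SLI is the ratio of successful requests to total requests over a fixed window. The names `ErrorBudget` and `allowed_downtime_minutes`, the 99.9% target, and the 30-day window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical request-based error budget for one SLO window."""

    slo: float            # target success ratio, e.g. 0.999 (assumed value)
    total_requests: int   # requests observed in the SLO window
    failed_requests: int  # requests that counted as SLI failures

    def allowed_failures(self) -> float:
        # The budget is the fraction of requests the SLO permits to fail.
        return (1.0 - self.slo) * self.total_requests

    def remaining_fraction(self) -> float:
        # 1.0 means the budget is untouched; a negative value means it is overspent.
        allowed = self.allowed_failures()
        if allowed <= 0:
            return 0.0  # an SLO of 100% leaves no budget to spend
        return (allowed - self.failed_requests) / allowed


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Time-based view: total minutes of full outage the SLO tolerates per window."""
    return (1.0 - slo) * window_days * 24 * 60


if __name__ == "__main__":
    # A 99.9% SLO over 30 days tolerates roughly 43.2 minutes of downtime.
    print(round(allowed_downtime_minutes(0.999), 1))
    # 250 failures out of 1,000,000 requests spends a quarter of a 99.9% budget.
    budget = ErrorBudget(slo=0.999, total_requests=1_000_000, failed_requests=250)
    print(round(budget.remaining_fraction(), 2))
```

In practice, the remaining budget fraction is what teams typically use to decide whether to keep shipping changes or prioritize reliability work.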
Observability & Monitoring
- Build and maintain monitoring, logging, and alerting.
- Ensure actionable alerts and effective dashboards.
- Implement distributed tracing. ...