Job Description
Responsibilities:
What Will Be Expected of You:
- Design, develop, and maintain reliability solutions and SRE utilities to reduce toil, improve cloud platform reliability, and industrialize SRE practices across the system
- Build and optimize Infrastructure as Code (IaC) using Terraform to manage AWS resources related to SRE solutions, incorporating cost-efficient design principles
- Develop CI/CD pipelines and automated testing to ensure code quality, reliability, and rapid delivery of the solutions
- Define SRE standards, best practices, and guidelines for adoption across teams; establish SRE metrics such as SLIs and SLOs
- Participate in incident management and the on-call rotation, providing technical support for SRE tools, troubleshooting production issues, and collaborating with teams to reduce incident recurrence through proactive detection and pattern analysis
- Stay current with emerging AWS services, SRE methodologies, and ...