Job Description
Job Title: Site Reliability Engineer
Job Location: Remote
Job Type: Contract
Responsibilities:
- Operate and maintain infrastructure platforms to ensure stability, availability, and performance
- Monitor platform health and performance metrics using observability and alerting tools
- Respond to operational incidents and platform issues as part of a 24/7 on-call rotation
- Execute routine operations, deployments, and maintenance tasks using configuration management tools
- Troubleshoot and resolve platform issues following documented procedures and runbooks
- Create and maintain operational documentation, including runbooks and troubleshooting guides
- Implement automation scripts to reduce manual operational tasks
- Collaborate with team members to ensure infrastructure availability and reliability
- Participate in change management processes following established procedures
- ...