Senior Site Reliability Engineer

Oracle

📍 United States

Regular Employee · Computer Occupations · Posted March 01, 2026

Job Description

Role Summary 

As a Senior Site Reliability Engineer (SRE), you will play a key role in ensuring the reliability, performance, and scalability of modern cloud-based AI applications for OCI Operations. This position involves close collaboration with development, operations, and security teams to automate processes, develop SRE standards, monitor system health, and maintain optimal uptime for critical AI applications. You will leverage your technical expertise to design, automate, and maintain AI services supporting mission-critical AI and ML initiatives.


What You'll Do

  • Design, implement, and maintain scalable, secure cloud infrastructure for AI applications on OCI
  • Collaborate with engineering teams to develop robust automation for building, deploying, and scaling resilient systems
  • Implement site reliability engineering best practices tailored for Applications: SLO/SLI definition, error budgeting, automated monitoring, data integrity validat...