Senior Data Platform Reliability Engineer
NISPI
📍 Mandaluyong, Metro Manila, Philippines
Job Description
Key Responsibilities
Operate & Scale Managed Services
- Run multi-tenant data and AI platforms (Spark, Airflow, Flink, Jupyter) with clearly defined SLAs, SLIs, and SLOs
- Own capacity planning, cost optimization, and usage guardrails across AWS/GCP and Kubernetes
- Ensure predictable, reliable operations across multiple customers and environments
Reliability & Incident Leadership
- Be the face of reliability, leading incidents end-to-end
- Own customer communications, status updates, and post-incident reviews (RCAs) with clear, visible action items
- Drive continuous improvement based on incident learnings
Customer Experience & Enablement
- Partner with data scientists and customer teams to reduce failed or slow jobs
- Improve time-to-data, pipeline reliability, and overall platform performance
- Optimize platform usage and costs so customers experience faster pipelines and fewer surp...