Senior Data Platform Reliability Engineer

NISPI

📍 Mandaluyong, Metro Manila, Philippines

Full-time | Other-General | Posted March 03, 2026

Job Description

Key Responsibilities

Operate & Scale Managed Services

  • Run multi-tenant data and AI platforms (Spark, Airflow, Flink, Jupyter) with clearly defined SLAs, SLIs, and SLOs
  • Own capacity planning, cost optimization, and usage guardrails across AWS/GCP and Kubernetes
  • Ensure predictable, reliable operations across multiple customers and environments

Reliability & Incident Leadership

  • Be the face of reliability, leading incidents end-to-end
  • Own customer communications, status updates, and post-incident reviews, including root cause analyses (RCAs) with clear, visible action items
  • Drive continuous improvement based on incident learnings

Customer Experience & Enablement

  • Partner with data scientists and customer teams to reduce failed or slow jobs
  • Improve time-to-data, pipeline reliability, and overall platform performance
  • Optimize platform usage and costs so customers experience faster pipelines and fewer surp...