Job Description
- Design, implement, and maintain data pipelines to ingest and process OpenShift telemetry (metrics, logs, traces) at scale.
- Stream OpenShift telemetry via Kafka (producers, topics, schemas) and build resilient consumer services for transformation and enrichment.
- Engineer data models and routing for multi-tenant observability; ensure data lineage and quality, and meet SLAs across the streaming layer.
- Integrate processed telemetry into Splunk for dashboards, alerting, and analytics, advancing the platform toward Observability Level 4 (proactive insights).
- Implement schema management (Avro/Protobuf), governance, and versioning for telemetry events.
- Build automated validation, replay, and backfill mechanisms to ensure data reliability and support recovery.
- Instrument services with OpenTelemetry, and standardize tracing, metrics, and structured logging across platforms.
- Use LLMs to enhance observability capabilities (e.g., query assistance, anomaly ...