Job Description
We are looking for an AI/ML Ops Engineer to support the deployment, monitoring, and operational reliability of AI-powered systems in production environments.

This role combines elements of DevOps, cloud engineering, and AI system support. The ideal candidate should be comfortable working with cloud infrastructure, monitoring tools, and modern AI workflows, while collaborating closely with engineering and AI teams.

Key Responsibilities
- Support deployment and operational management of AI/ML applications and services
- Monitor AI systems using logs, metrics, tracing, and observability tools
- Troubleshoot and debug AI workflows, pipelines, and runtime failures
- Assist in maintaining scalable, secure, and reliable cloud infrastructure
- Support prompt experimentation, version tracking, and A/B testing activities
- Collaborate with engineering teams to improve system reliability, performance, and automation
- Maintain CI/CD workflows and deployment pipelines for AI services
- Participate in incident investigation...