Ai infrastructure engineer

HCLTech

📍 Noida, Uttar Pradesh, India

Full-time Computer Occupations Posted March 02, 2026

Job Description

AI Infrastructure Engineer- L3
The Role
The AI Infrastructure Engineer (L3) provides advanced engineering and architectural expertise for high‑performance AI and ML infrastructure. This role focuses on building, optimizing, and scaling GPU/accelerator environments and distributed systems for large‑scale training and inference workloads.
Competency Focus: High‑performance computing (HPC), distributed systems, Kubernetes, GPU orchestration, cloud optimization
Keywords: Nvidia GPU Infrastructure, Kubernetes, GPU Cluster Administrator, Infrastructure SME, RCA
Responsibilities:
Deploy, configure, and manage GPU and AI accelerator platforms (NVIDIA A100/H100/L40, AMD Instinct, TPU).
Troubleshot GPU hardware and software issues, including failures, thermal throttling, PCIe/NVLink topology, and driver conflicts.
Install, upgrade, and maintain GPU software stacks, including drivers, CUDA, cu DNN, Tensor RT, and firmware.
Perform capacity planning and resource optimiza...