Senior Site Reliability Engineer - AI/ML optimized GPU clusters
The Next Chapter W&S
📍 Amsterdam, North Holland, Netherlands
Job Description
We focus on job opportunities in the Netherlands for IT and engineering professionals. We share relevant tips and tricks with jobseekers, and we can support employers with relocation, work permit rules, the 30% ruling, et cetera. We value transparency, honesty and a no-nonsense approach based on our extensive technical and international recruitment expertise.
The organization
Our client operates one of the largest GPU infrastructures in the world: 90,000+ GPUs and 10 InfiniBand fabrics across five global data centers. Their infrastructure doubles in size every year. We’re looking for engineers who love getting deep into Linux systems, pushing hardware and software to their limits, and making the world’s fastest AI and HPC workloads run even faster. They develop their own proprietary stack and cloud environment, fully optimized for AI/ML applications.
The role
Your responsibilities will include:
Ensure fault-tolerance, ...