KPMG Nederland

Machine Learning Engineer – GPU Acceleration & Distributed Training

2 December 2024
£79,000 - £148,000 / year

Job Description

 

Are you passionate about how technology can make a real impact in cancer? Join us at kaiko.ai in building the state-of-the-art Data & AI platform, enabling large-scale training of multi-modal foundation models, and transforming the clinical workflow to deliver better patient outcomes.

Our culture

At Kaiko, we have an open, creative and non-hierarchical work atmosphere which offers continuous learning and direct impact in return for accountability and team spirit.

We offer flexibility, for instance through remote working, alongside the expectation that you manage and deliver your own goals. Our team’s ownership, passion and shared commitment to improving health outcomes through data set us apart.

At the intersection of healthcare and data, we recognize the implications for wellbeing and trust and approach our work with the utmost sensitivity. Data privacy, compliance and security are core to everything we do. Our open, creative environment gives talented people room to explore new ideas, and we reward this with an attractive package and opportunities for further personal development.

 

About the Role

As a Machine Learning Engineer specializing in GPU acceleration and distributed training, you will focus on making Transformers, State Space Models (SSMs) and other architectures handle very long sequence lengths more efficiently, using CUDA, Triton and Torch. You will also scale training across multi-node distributed systems to ensure robust and efficient model development, working closely with our ML Research teams to build and maintain high-performance training pipelines.

How you’ll contribute

  • Efficiency Optimization: Leverage CUDA, Triton and Torch to improve the efficiency of Transformers, SSMs and other architectures for very long sequence lengths.
  • Distributed Training: Scale custom machine learning training pipelines efficiently across multi-node GPU clusters.
  • Collaboration: Work with ML Researchers and Engineering teams to integrate optimized training solutions into the development lifecycle.


What you’ll bring

  • Master’s degree in Computer Science, Engineering, or a related field. A Ph.D. is a plus.
  • Proficient in Python with extensive experience in PyTorch.
  • Deep expertise with CUDA and/or Triton for optimizing GPU performance, specifically for large-scale sequence processing.
  • Proven experience in scaling machine learning training to multi-node distributed GPU environments.
  • Strong understanding of Transformers, State Space Models (SSMs) and other common architectures, and of how to optimize them.
  • Skilled in performance tuning and profiling for both software and hardware in machine learning contexts.
  • Ability to diagnose and resolve complex technical challenges related to GPU acceleration and distributed training.
  • Excellent communication skills and ability to work effectively within a multidisciplinary team.
  • Capable of managing multiple projects simultaneously and adapting to evolving priorities in a fast-paced environment.

Nice to Have

  • Experience with containerization technologies, such as Docker or Kubernetes.
  • Experience with cloud computing platforms, such as Azure, AWS or GCP.


Additional Information

This position is full-time and requires residency in either the Netherlands or Switzerland, a valid work permit, and proximity to our offices in Amsterdam or Zürich. A Certificate of Conduct will be necessary upon finalizing the employment contract due to the handling of sensitive data.