Singapore Institute of Management

Senior Platform Engineer – Pricing Platform

11 October 2024
Deadline date:
£96,000 - £179,000 / year

Job Description

Your day to day

–    Translate incident management and root cause analysis recommendations into improvement points for the platform.
–    Work with the latest (automation) tooling with a strong focus on performance, reliability, observability and security.
–    Define platform lifecycle management, resilience patterns, architecture and roadmaps together with solution, domain and enterprise architects.
–    Align expected platform changes with stakeholders and financial controllers, and report on platform volumes to the area lead.
–    Present platform and automation best practices to the team and at internal and external engineering events.
–    Report on the state of IT risk and security controls on the platform as per the ING Information Security Management Policy.
–    Apply CI/CD using Azure DevOps as well as remote operations on the platform.
–    Through Agile/Scrum, collaborate with the other engineers to bring new sprint releases live to Acceptance and Production every 2-4 weeks.
–    You stay up to date with the latest developments in HPC and cloud technology; participating in relevant workshops, conferences, and training programs is part of your nature.
–    You meet frequently with product managers, analysts and researchers to gather and incorporate stakeholder feedback to improve HPC services. 

What you’ll bring to the team
Experience: 5+ years of software engineering / operations experience

Tech stack / knowledge:
Mandatory:

–    IT Operations/Support experience combined with analytical skills to identify root causes in incidents (data, technical, functional).
–    Strong understanding of high-performance computing environments, including HPC (GPU) clusters, and of parallel and distributed computing principles and techniques.
–    Strong understanding of using GPU technology as a computational accelerator and proficiency in cluster management and job scheduling systems (e.g., DataSynapse, Slurm, PBS, LSF).
–    Knowledge of GPU architectures and technologies (e.g., NVIDIA CUDA, AMD ROCm).
–    Experience with deploying and maintaining parallel programming models and libraries (e.g., MPI, OpenMP, CUDA), middleware and supporting software.
–    Ability to identify and contribute to resolving performance bottlenecks in HPC applications via monitoring / observability practices.
–    Knowledge of CI/CD, experience with Git, Python, Ansible, Shell scripting and working experience with monitoring practices and alerting tools.
–    Strong Linux (RHEL 8 or 9) and Azure DevOps experience, pipeline and Ansible skills, and experience working with certificates / encryption technology.
–    Strong experience in translating computational requirements to IT concepts like system sizing.
–    Experience with Grafana and alerting tools such as Prometheus, as well as a strong understanding of complex subsystem monitoring and alerting.
–    Good understanding of the ELK Stack and how to interact with it.
–    Experience in mentoring junior engineers and providing technical guidance.
–    Thoroughness in testing and validating configurations, optimizations, system reliability, and performance.
–    Education at Master's level with a strong analytical background in Computer/Data Science, Cybernetics, Software Engineering, Financial Engineering or equivalent.
–    Due to the cross-border nature of IT teams at ING, we ask that English (advanced) is part of your skillset.

Nice to have:

–    Familiarity with Oracle 12c/19c and PL/SQL.
–    Experience with cloud-based HPC solutions (e.g., AWS, Azure) and understanding of hybrid HPC environments and cloud integration is a plus.
–    Affinity with (GPU) programming languages and frameworks (e.g., CUDA, OpenCL, PyTorch) is a plus.
–    Familiarity with in-memory caching tools (Apache Ignite (GridGain), Redis, et al.).
–    Familiarity with shared storage configuration and design.
–    Programming skills in languages such as C, C++, and Java are a plus.
–    Familiarity with Docker and container orchestration (Kubernetes, OpenShift, et al.).
–    Good Linux networking skills.