Tenstorrent

Staff Engineer, HPC Systems Software

4 December 2025
£100000 - £500000 / year

Job Description

Tenstorrent is leading the industry in cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists has developed a high-performance RISC-V CPU from scratch, and shares a passion for AI and a deep desire to build the best AI platform possible.

We value collaboration, curiosity, and a commitment to solving hard problems. We are growing our team and looking for contributors of all seniorities. We are seeking an HPC Systems Engineer to architect and maintain the operating system foundation that powers our global hardware design infrastructure. You’ll own bare-metal provisioning pipelines, configuration-as-code systems, and OS lifecycle management across hundreds of compute nodes, ensuring hardware engineers have consistent, performant, and reliable systems.

This role requires deep Linux expertise, automation mastery, and the ability to solve complex infrastructure problems at scale in a rapidly evolving startup environment. This role is hybrid, based out of Austin, TX; Santa Clara, CA; or Toronto, Canada.

We welcome candidates at various experience levels for this role. During the interview process, candidates will be assessed for the appropriate level, and offers will align with that level, which may differ from the one in this posting.

Who You Are

Design and maintain automated OS deployment pipelines for bare-metal HPC clusters globally.

Manage large-scale configuration management using Ansible to ensure consistency across compute infrastructure. Deploy RHEL and Ubuntu systems and manage their lifecycle across diverse hardware platforms. Implement infrastructure-as-code for repeatable, version-controlled system configurations.

Troubleshoot OS-level issues, optimize kernel parameters, and resolve system performance bottlenecks. Collaborate with hardware design teams to standardize system configurations, toolchains, and development environments.

Build automation and tooling to streamline provisioning, patching, and system updates at scale.

What You Bring

Experienced in RHEL and Ubuntu administration in HPC or large-scale compute environments. Highly skilled in Ansible for automation and configuration management across hundreds of nodes. Proficient with bare-metal provisioning systems (MAAS, Foreman, Cobbler, Warewulf, or similar).

Deep understanding of Linux system internals, networking, kernel tuning, and performance troubleshooting. Familiar with HPC cluster architecture, workflows, and infrastructure-as-code practices. Capable of diagnosing and resolving complex infrastructure issues independently in fast-paced environments.

Nice to Have

Hands-on experience with IBM Spectrum LSF or similar HPC workload managers. Integration with commercial HPC storage platforms (Pure Storage, Weka, NetApp, DDN, Vast Data). Deep exposure to EDA tools and hardware design workflows in semiconductor development.

Container technologies (Docker, Singularity, Podman) for reproducible compute environments. Cluster monitoring and observability at scale using Prometheus, Grafana, and custom tooling. Advanced provisioning techniques including PXE boot, kickstart, cloud-init, and BMC/IPMI integration.

