Hybrid Hardware & Software Support Engineer - HPC
Remote Home, GB
About Bull
Bull is the Atos Group brand for high-performance computing, artificial intelligence and quantum innovations, with 2,500 employees. Built on an open, end-to-end and trusted foundation, Bull designs, deploys and runs hardware and software while providing strategic services that unlock enterprise value, accelerate scientific research and drive society forward. Driven by world-class R&D with 1,500 patents, manufacturing capabilities and data science, Bull enables nations and industries to fully control their AI and data, advancing progress for the benefit of the planet.
About Atos Group
Atos Group is a global leader in digital transformation with c. 63,000 employees and annual revenue of c. €8 billion, operating in 61 countries under two brands — Atos for services and Eviden for products. European number one in cybersecurity, cloud and high-performance computing, Atos Group is committed to a secure and decarbonized future and provides tailored AI-powered, end-to-end solutions for all industries. Atos Group is the brand under which Atos SE (Societas Europaea) operates. Atos SE is listed on Euronext Paris.
The purpose of Atos Group is to help design the future of the information space. Its expertise and services support the development of knowledge, education and research in a multicultural approach and contribute to the development of scientific and technological excellence. Across the world, the Group enables its customers, employees and members of societies at large to live, work and develop sustainably, in a safe and secure information space.
Location
Primarily on-site at a customer facility near Reading, Berkshire, with occasional support for additional HPC installations across Europe.
Requirement: Must be eligible for UK DV Security Clearance.
About the Role
Bull’s High-Performance Computing (HPC), Artificial Intelligence & Quantum Business Unit is seeking a Hybrid Hardware & Software Support Engineer to join our HPC Services team. This is a highly visible, customer-facing operational role supporting advanced HPC infrastructures in the UK.
You will work across computing, storage, and networking layers, ensuring the deployment, stability, and performance of large-scale Linux-based systems. Prior HPC experience is an advantage but not mandatory: strong Linux and infrastructure engineers eager to grow into HPC & AI are encouraged to apply.
Key Responsibilities:
Deployment & System Bring‑Up
- Install, configure, and integrate HPC cluster components (compute, storage, networking).
- Perform system installation, initial configuration, and operational readiness checks.
- Apply patches, updates, and conduct routine maintenance activities.
Hybrid Hardware & Software Support
- Provide Level 1 and Level 2 operational support for HPC environments.
- Diagnose and resolve issues involving:
  - Linux operating systems
  - Enterprise server hardware
  - High-speed interconnects
  - Storage subsystems
- Conduct root cause analysis and implement corrective actions.
- Escalate appropriately within the global support organisation when needed.
Operations & Incident Handling
- Monitor system health and respond to incidents proactively.
- Perform troubleshooting in secure, mission-critical environments.
- Maintain detailed and accurate documentation of incidents and resolutions.
Customer Interface
- Act as the primary technical contact on-site.
- Communicate effectively regarding incidents, planned maintenance, and system status.
- Build trusted relationships with customer technical stakeholders.
- Represent Bull professionally in sensitive and high-profile environments.
Core Technical Skills
What you must know to perform the role successfully:
- Strong Linux expertise (Red Hat and/or Debian-based environments)
- Solid understanding of enterprise server hardware (CPU, memory, storage, diagnostics)
- Scripting skills in Bash and/or Python
- Strong networking fundamentals (TCP/IP, routing, switching, security basics)
- Hands-on experience with infrastructure deployment, configuration, and maintenance
- Excellent troubleshooting and analytical abilities
- Proactive mindset and ability to work independently
Desirable Skills & Experience
Valuable, but not mandatory:
- Experience with HPC clusters
- High-speed networking (40/100GbE, InfiniBand)
- Virtualisation technologies (KVM, OpenStack)
- Storage systems (Ceph, SAN/NAS)
- Parallel filesystems (Lustre, GPFS, BeeGFS)
- Containers (Docker, Podman, Kubernetes)
- Configuration management (Ansible, Puppet)
- Monitoring and observability tools (Prometheus, Grafana, Icinga)
- Workload managers (Slurm, PBS Pro)
- Git version control
Candidate Profile
We are looking for someone who:
- Is hands-on, operationally focused, and detail-oriented
- Thrives in secure, mission-critical environments
- Approaches troubleshooting methodically, even under pressure
- Communicates clearly with both technical and non-technical stakeholders
- Takes full ownership of incidents through to resolution
- Is motivated to learn continuously and expand their technical expertise
Education & Experience
Option 1:
- Degree in Computer Science, Engineering, or related field + at least 2 years of relevant experience
Option 2:
- 5+ years of relevant industry experience
Strong early-career candidates with solid technical foundations will also be considered.
Benefits:
- Working on advanced HPC and digital infrastructure projects
- Continuous learning and technical skill development
- Career growth within a global technology organisation
- Participation in internal initiatives and community-focused activities
What happens next?
- Your application will be reviewed (1-2 business days)
- Short-listed candidates will be contacted for a discussion with HR
- Interview with management team
- Feedback (1-10 business days after the interview)
Let’s grow together.