Job Title: HPC Network Architect – Data Center Specialist
Location: Anywhere in the EU
Employment Type: Full-time / Part-time / Contract
Department: IT Infrastructure / High Performance Computing
Position Summary:
We are seeking a highly skilled HPC Network Architect with strong experience in data center infrastructure to design, implement, and optimize high-performance networking solutions. The ideal candidate will have deep technical knowledge of HPC workloads, low-latency, high-bandwidth networking (e.g., InfiniBand, RoCE), and scalable data center architectures. You will collaborate with system architects, data scientists, and IT operations teams to deliver robust, high-throughput computing environments for demanding applications in research, AI/ML, simulation, and data analytics.
Key Responsibilities:
- Design and architect end-to-end HPC network infrastructures, including interconnects, fabric topologies, and routing strategies.
- Define and implement high-speed, low-latency networks (e.g., InfiniBand, RoCE, 100/400GbE).
- Integrate HPC clusters into modern data center environments with high availability and scalability.
- Develop and maintain network policies, security models, and monitoring tools for HPC environments.
- Evaluate emerging networking technologies and recommend enhancements to improve performance and efficiency.
- Collaborate with data center engineers on power, cooling, cabling, and space planning for HPC systems.
- Troubleshoot and resolve network bottlenecks and performance issues in large-scale compute environments.
- Work with vendors and OEMs to ensure proper support and integration of HPC networking equipment.
- Document network designs, configurations, and procedures.
Qualifications:
Required:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 8+ years of experience in network engineering, including 3+ years in HPC environments.
- Strong hands-on experience with InfiniBand, Ethernet (40/100/400GbE), and RDMA, including RoCE.
- Expertise in network design, fabric management, and traffic optimization for HPC workloads.
- Experience with data center design, power/cooling planning, and rack integration.
- Familiarity with Linux system administration and automation (e.g., Ansible, Python, Bash).
- Solid understanding of TCP/IP, BGP, OSPF, multicast, QoS, and software-defined networking.
Preferred:
- Experience with Slurm, PBS, or other HPC job schedulers.
- Familiarity with cloud and hybrid HPC models (e.g., AWS ParallelCluster, Azure CycleCloud).
- Certifications such as CCNP/CCIE, HCIA/HCIP, or equivalent.