Squarepoint Capital Logo
Squarepoint Capital
Platform Specialist - HPC
🌎London, New York
1w ago

Job Description

 

Squarepoint is a global investment management firm that utilizes a diversified portfolio of systematic and quantitative strategies across financial markets that seeks to achieve high quality, uncorrelated returns for our clients. We have deep expertise in trading, technology and operations and attribute our success to rigorous scientific research. As a technology and data-driven firm, we design and build our own cutting-edge systems, from high performance trading platforms to large scale data analysis and compute farms. With offices around the globe, we emphasize true, global collaboration by aligning our investment, technology, and operations teams functionally around the world.

Position: HPC Architect 

Business Area: Technology Infrastructure

The HPC Architect will be part of a talented global team focused on enterprise or low latency solutions. The candidate must demonstrate superb technical competency in delivering a mission critical infrastructure and ensuring the highest levels of availability, performance, and security. 

The candidate will be responsible for research, design, L3 support, and documentation for Squarepoint’s HPC Platform. This will involve collaborating with our business partners, application owners, clients, vendors, and internal teams (Platform, Network, Application Support, Application Development, Quants, etc.) to deliver end to end solutions that can meet the needs for today and scale to meet the needs for tomorrow.  This candidate will be flexible and flourish working in a high paced and challenging environment with capacity to grow and learn from peers. 

Position Overview:

  • Design, document, and enhance platform related services including servers, storage, and cloud.
  • Leverage modern computer architectures including but not limited to GPU, new CPU architectures, and modern HPC storage platforms.
  • Identify inefficient use of compute and storage resources and provide solutions to eliminate them.
  • Provide concise and professional documentation.
  • Efficiently measure HPC system performance using quantitative metrics to show usage and improvements over time.
  • Project delivery/management collaborating with internal and external partners.
  • Provide L3 escalation support to remediate performance and availability issues.
  • Identify areas of improvement before they become a problem.
  • Ability to customize solutions based on evolving requirements.

Required Qualifications:

  • 10+ years working with Linux (RHEL/Rocky/CentOS/OEL preferred) in an enterprise environment with the following areas of focus: operations, systems engineering and systems performance.
  • System tuning (memory/CPU/network) for high bandwidth compute infrastructure.
  • Experience with identifying low-level performance bottlenecks: induced by the OS, from software architecture, HPC storage, or on the network layer.
  • Full understanding of network protocols such as TCP, UDP, RDMA and how to properly tune servers and network for each
  • Physical server architecture understanding differences between CPU chipsets and when is the right time for each (Intel/AMD/ARM).
  • HPC job schedulers (Slurm, RunAI, Bright Cluster Manager)
  • Experience working with applications written in Python and/or C++
  • Well-organized, proactive, resourceful, able to handle a fast-paced environment, question the status quo, accountable and possesses an ownership mindset.
  • Critical thinking and problem-solving skills to tackle troubleshooting the unknown, glitches and the obscure.
  • Strong communication: verbal and written.
  • Degree in Engineering, Computer Science, or related Information Technology experience.

Nice to have:

  • Experience with configuration management tools i.e. Ansible, Chef, and Terraform.
  • Familiarity with different network switch vendors and different switch architectures.
  • Experience with KDB (Q).
  • Ability to debug and enhance applications using at least one of the frameworks:  XGBoost, LightGBM, PyTorch, Tensorflow
  • Kubernetes and how to integrate HPC workflows into it.

 

The minimum base salary for this role is $60,000 if located in New York. This expectation is based on available information at the time of posting. This role may be eligible for discretionary bonuses, which could constitute a significant portion of total compensation. This role may also be eligible for benefits, such as health, dental, and other wellness plans, as well as 401(k) contributions. Successful candidates’ compensation and benefits will be determined in consideration of various factors.