Citadel is seeking a Kubernetes Site Reliability Engineer (SRE) to join our central Compute team, which is evolving the compute platform for the firm. In this role, you will be instrumental in shaping and executing our containerization strategy, optimizing resource utilization, and ensuring robust disaster recovery and business continuity. The ideal candidate has experience working in high-performance environments and has worked on Kubernetes internals. The Kubernetes engineer at Citadel will work with application teams and fellow infrastructure teams to design solutions and troubleshoot issues.
Key Responsibilities:
This role is hands-on, requiring direct interaction with platform users to understand their requirements. The ideal candidate will translate these requirements into effective solutions and then build and configure them. We prioritize self-documenting, version-controlled code and configurations over traditional wiki pages and written documentation.
- Design and drive standards for compute usage, encompassing virtual machines and containers.
- Establish processes to ensure applications are host-neutral and deployable across various data center and office environments.
- Support firmwide disaster recovery and business continuity initiatives.
- Leverage data to inform strategies and decision-making.
Required Qualifications:
- Bachelor's degree in computer science or a related technical discipline, or equivalent experience.
- Experience building and running production Kubernetes clusters.
- Deep understanding of Linux and its network stack.
- Experience with observability techniques including logs, metrics, traces, and profiles.
- Experience deploying and managing services on the Google Cloud Platform (GCP).
- Experience writing production-grade code in Go, Python, or Rust.
- Experience developing with Git, issue tracking, code reviews, and CI/CD pipelines.
Preferred Qualifications:
- Proficiency in infrastructure provisioning/management tools (e.g. Ansible, Puppet, Terraform, Packer).
- Experience with ArgoCD, Helm, and eBPF.
- Advanced knowledge of TCP/IP networking, architecture, and core technologies (such as DNS, DHCP, HTTP, routing, and VPN).
- Ability to manage and implement large-scale infrastructure projects.
In accordance with New York City’s Pay Transparency Law, the base salary range for this role is $105,000 to $300,000. Base salary does not include other forms of compensation or benefits.