At JFrog, we’re reinventing DevOps to help the world’s greatest companies innovate -- and we want you along for the ride. This is a special place with a unique combination of brilliance, spirit and just all-around great people. Here, if you’re willing to do more, your career can take off. And since software plays a central role in everyone’s lives, you’ll be part of an important mission. Thousands of customers, including the majority of the Fortune 100, trust JFrog to manage, accelerate, and secure their software delivery from code to production -- a concept we call “liquid software.” Wouldn't it be amazing if you could join us on our journey?
JFrog seeks a highly skilled Site Reliability Engineer to join our team! In this role, you will drive best practices, optimize operational workflows, and mentor junior engineers, fostering a culture of collaboration and innovation. This is an exciting opportunity for someone passionate about building and integrating the services and systems that ensure the availability, performance, and reliability of JFrog SaaS environments. You will work closely with P&E engineering and Cloud teams to build and maintain scalable, resilient infrastructure while championing best practices for automation, monitoring, and incident response. If you're eager to make a significant impact in a fast-paced, high-growth environment, we encourage you to apply.
As a Site Reliability Engineer at JFrog you will…
- Help build and manage the scalable, reliable services and infrastructure that underpin JFrog SaaS services
- Drive the reliability, performance, and availability of our SaaS products, ensuring service-level objectives are met or exceeded
- Apply SRE best practices, including incident management, performance and capacity planning, and disaster recovery flows
- Adhere to the incident management framework, ensuring timely identification, escalation, and resolution of incidents
- Develop and manage large-scale systems with CI/CD in mind, to support multiple production environments and use cases
- Tackle large-scale production issues and bring out-of-the-box thinking to the table
- Implement SRE tools, technologies, and methodologies that help meet JFrog’s SaaS uptime and reliability goals
To be a Site Reliability Engineer at JFrog you need...
- 2+ years of relevant DevOps or SRE experience in large-scale production environments
- 1+ years of infrastructure automation, configuration management, or container orchestration using Kubernetes, Docker, Terraform, and Ansible
- 1+ years in Python or any other advanced programming language
- Excellent communication and collaboration skills, with the ability to work effectively across globally distributed teams
- Experience managing container and infrastructure orchestration tools (e.g. Kubernetes, Terraform)
- Hands-on experience administering public clouds (AWS, GCP, or Azure)
- Experience with building CI/CD pipelines for applications and microservices (Jenkins/ArgoCD)
- Experience with chaos engineering, alerting, and observability tools (Gremlin, PagerDuty, Opsgenie, New Relic, Coralogix)