LearnUpon is looking for an SRE Manager to join our team in Ireland.
About LearnUpon:
LearnUpon LMS helps organisations train their employees, partners, and customers. Businesses can manage, track, and achieve their unique learning goals — all through a single, powerful solution.
With offices in Dublin (our HQ), Philadelphia, Belgrade, and Sydney, we are a team that puts our customers' experience at the heart of everything we do. We're always striving for the best solution (not the easy one), and we go the extra mile to deliver work we're proud of.
Our culture fosters open, collaborative environments where team and individual accomplishments are celebrated and encouraged. At LearnUpon, we work together as a friendly, supportive team, always putting the customer at the heart of everything we do. We lead with curiosity, act like owners, and believe in open, honest, and constructive communication.
About the Role:
In this role, you will be responsible for driving the technical strategy, execution, and operations of our infrastructure and ensuring the reliability, scalability, and performance of our systems. You will manage a team of skilled engineers, fostering their growth and ensuring alignment with the company’s objectives. You will create growth opportunities within and outside of the team, actively listen to gather ideas and perspectives, and support an environment that encourages diverse opinions and ideas.
As the SRE Manager, you will play a crucial role in overseeing initiatives aimed at modernising our infrastructure, optimising our processes, and managing infrastructure costs effectively. You will collaborate closely with cross-functional teams to drive these initiatives forward, ensuring alignment with business goals and objectives.
The SRE team sits inside LearnUpon’s Platform Engineering group within Engineering. We are key consultants for the entire company on matters of infrastructure management, observability, and incident management, among others. The SRE team’s focus is on maintaining and expanding our cloud infrastructure and app services to ensure platform scalability and site availability as we look to grow threefold over the next few years.
What will I be doing?
The main responsibilities are:
- Lead and mentor the SRE team, providing guidance and support, and fostering a culture of collaboration, innovation, and constructive communication.
- Develop and execute the technical strategy for infrastructure and Site Reliability initiatives, ensuring alignment with business objectives.
- Drive the implementation of best practices for SRE including infrastructure as code, configuration management, observability, SLOs/SLIs & incident management.
- Oversee the design, implementation, and maintenance of cloud-based infrastructure primarily on AWS.
- Drive observability practices, implementing monitoring, logging, and tracing solutions to ensure visibility into system performance and reliability.
- Collaborate with development teams to optimise application performance, scalability, and reliability in a microservices architecture.
- Implement automation solutions to streamline processes, increase efficiency, and reduce manual intervention.
- Drive incident management processes, ensuring timely resolution of incidents, implementing measures to prevent recurrence, and fostering a culture where learning from failures is encouraged.
- Stay abreast of industry trends, emerging technologies, and best practices in SRE and cloud infrastructure.
- Create an environment where team members feel empowered to take ownership, make decisions, and learn from failures.
What skills do I need?
- Excellent leadership, communication, and interpersonal skills, with the ability to influence and collaborate effectively across teams and encourage diverse ideas, perspectives, and experiences.
- 10+ years of experience working with SaaS products at scale within an SRE/DevOps role, with at least 2 years in a management position.
- Strong proficiency with cloud technologies, particularly AWS.
- Experience with containerisation technologies such as Docker and container orchestration with Kubernetes.
- Strong understanding of observability principles and experience with tools such as Prometheus, Grafana, the ELK stack, New Relic, and Datadog.
- Solid understanding of infrastructure as code (IaC) principles and experience with tools such as Terraform, CloudFormation, or similar.
- Strong problem-solving skills and the ability to troubleshoot complex technical issues under pressure.
- Ability to make data-driven decisions and effectively communicate insights and rationale to both technical and non-technical stakeholders.
- Strong experience with Agile/Scrum methodologies and DevOps/SRE practices.
- Strong commitment to quality, reliability, and continuous improvement.
You don’t need to tick every box in order to apply; we’re always happy to review applications and take all experience into consideration. We do our best to provide feedback where we can!
Why work with us?
- Work in a fun and supportive environment with regular team events.
- Excellent career progression - take LearnUpon where you think it can go.
- Structured learning environment.
- Competitive salary and company ESOP.
- Employer Contributed Pension.
- Private health insurance.
- 25 days annual leave + 1 Company day off.
- Flexible Working Arrangements.
What is the Hiring Process?
Applicants for the position can expect the following hiring process:
- Qualified applicants will be invited to schedule a 30-minute call.
- Successful candidates will then be invited to a series of practical interviews.
- Finally, candidates will have a short interview with our CEO/CTO.
- Successful candidates will be contacted with an offer to join our team.
Visit our Careers site to find out more about working for LearnUpon, and check us out on Instagram.