We're hiring a Software Development Engineer II to contribute to our Monitoring & Detection engineering efforts as part of the incident response program for Amazon's worldwide retail websites. As we reimagine incident management and response for Amazon's rapidly evolving retail operations, we need skilled engineers to help us keep pace. In this role, you will play an important part in developing and implementing key components of our strategic platform for the central incident response team. Your work will directly impact the decisions made by Amazon teams during critical incidents, where every minute counts. You'll collaborate with senior team members to analyze post-incident data, identify improvement opportunities, and address potential blind spots in our systems. This position requires a mix of technical problem-solving skills and the ability to work in a fast-paced, complex environment. While you may not lead major initiatives, you'll be deeply involved in the technical aspects of our incident management capabilities, contributing significantly to the stability and reliability of Amazon's retail platforms.
Key job responsibilities
As a Software Development Engineer II on our team, you will play an important role in building and integrating key performance indicators for various services into our incident management platform. Working within Amazon's complex architectural landscape, you'll collaborate with service owners across the organization to develop and maintain software features for our monitoring systems. Your responsibilities will include designing scalable solutions that support the monitoring of numerous services, with guidance from senior team members to ensure alignment with long-term strategies. You'll be deeply involved in the full software development lifecycle, from scoping and design to coding, testing, deployment, and maintenance. Collaborating with stakeholders to understand business and customer value will be crucial as you work to deliver appropriate solutions. You'll contribute to documentation, participate actively in code reviews, and demonstrate operational excellence in all aspects of your work. Balancing new feature development with operational needs, you'll make effective priority trade-offs and help resolve root causes of issues. As you grow in this role, you'll have opportunities to mentor junior engineers and support new team members. This position requires a passion for understanding Amazon's retail business and providing real-time visibility into its operational health. You should be comfortable working in a dynamic environment, leveraging your problem-solving skills to tackle complex challenges, and collaborating across the Amazon ecosystem.
A day in the life
A day in the life of a Software Development Engineer II on our team is filled with the challenge of navigating Amazon's complex, semi-connected systems. The scale of the company's operations presents unique technical problems that require creative problem-solving and persistence. One moment, you might be collaborating with service owners to design a scalable monitoring solution for a critical service. The next, you could be diving deep into post-incident analysis, uncovering root causes and identifying areas for improvement. Throughout the day, you'll work closely with your teammates, contributing to code reviews, documenting systems, and mentoring junior engineers. While the challenges may not be easy, you'll find immense satisfaction in knowing that your efforts directly contribute to enhancing the monitoring capabilities that are crucial for safeguarding the seamless operation of Amazon's retail experiences. By embracing these complexities and leveraging your technical expertise, you'll play a vital role in the central reliability and response efforts, helping to improve Amazon's operational resilience and responsiveness.
About the team
The Incident Command Systems team at Amazon is responsible for envisioning and building programs that consistently improve remediation times for outages. This group consists of multiple 2-pizza teams (teams of 6-10 engineers) that each own software components for monitoring, for detecting anomalies that degrade the website, and for managing incidents during these outages.
Basic qualifications
- 3+ years of non-internship professional software development experience
- 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
- Experience programming with at least one software programming language
Preferred qualifications
- Experience with incident response or on-call support
- Familiarity with monitoring and alerting systems
- Knowledge of AWS services and infrastructure
Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.
Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company’s reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.
Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $129,300/year in our lowest geographic market up to $223,600/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.