Amazon
Senior Technical Program Manager, Chaos Engineering Red Team
🌎New York, New York, USA
2 months ago

Job Description

On-site
As a Sr. TPM on the Central Reliability and Response Engineering (CRRE) Red Team, you will spearhead cross-functional programs, products, and problem spaces that impact hundreds of teams, thousands of engineers, and millions of customers. The primary charter of the team is chaos engineering, which means you will be involved in the design and execution of production-based experiments that validate our software resilience and fault tolerance. During peak sales events, the Amazon retail website handles tens of thousands of customer orders and tens of millions of requests per minute! What if we didn’t have to wait for these events to see how the thousands of microservices behind the website would behave? What if we knew how these services might act if an availability zone went down? What if we could create misbehaving clients on multi-tenant platforms to ensure tenant isolation works? These are not hypotheticals: our team is designing and executing chaos experiments to simulate these types of scenarios and more! You’ll get the chance to experience Chaos Engineering at an unrivaled scale that only a few companies in the world deal with. We’re fundamentally reshaping how Amazon maintains an always-ready resilience posture by helping to expose issues before our customers do!

You will join a team of 20+ engineers and growing who are building tooling to allow these experiments to be run with safety, repeatability, and scalability. As a Sr. TPM, you will be involved in technical experiment design, client outreach, data analysis, system architecture, risk mitigation, and evangelizing chaos engineering culture across Amazon. You will influence hundreds of teams as well as key decision makers like org-wide directors and VPs.

The problems we are solving for have a fair amount of ambiguity and technical complexity, and you will help to simplify this and distill it into incremental and achievable deliverables. We’re looking for someone who can help lead our teams in identifying problems proactively, direct software-based initiatives, and turn any manual work we’re doing into repeatable processes that our software engineers can effectively automate. You will be getting into the technical weeds with our engineers; as a reliability engineering team, it’s important that we can dive deep into reliability issues that pose risks to the global retail websites.

This role requires strong writing skills to complement Amazon’s doc-focused culture. You will be expected to draft documents that draw on data-driven analysis, demonstrate an understanding of system architectures, and present arguments from multiple viewpoints to help stakeholders understand our problem space and influence key decision makers. You will frequently be discussing and evangelizing these concepts across hundreds of Amazon teams. Given this, we are interested in candidates who can demonstrate experience working across large organizational structures and effectively managing expectations across varied team cultures.

We often don’t know exactly what we’ll find when we run these chaos experiments, so we place a heavy emphasis on rapid prototypes and agility. We deal with high levels of ambiguity and convert head-tilting ideas into streamlined workflows that are repeatable and safe. We’re building multiple systems from the ground up that focus on chaos engineering at scale and safety mechanisms that help us detect and mitigate real customer impact in production environments. Our work is intentionally provocative and centered around challenging the status quo inside of Amazon. We find this work exciting, rewarding, and fulfilling, and we hope you will too! Our leaders are invested in this work and our teams, and we’re excited to explore your passions as an individual and as a technologist!

- 5+ years of technical product or program management experience
- 7+ years of experience working directly with engineering teams
- 3+ years of software development experience
- 5+ years of technical program management experience working directly with software engineering teams
- Experience managing programs across cross-functional teams, building processes, and coordinating release schedules
- Excellent oral and written communication skills with both technical and non-technical stakeholders

- 5+ years of experience in project management disciplines, including scope, schedule, budget, and quality, along with risk and critical path management
- Experience managing projects across cross-functional teams, building sustainable processes, and coordinating release schedules
- Experience defining KPIs/SLAs used to drive multi-million-dollar businesses and reporting to senior leadership

Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation, please visit https://www.amazon.jobs/en/disability/us.

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $133,900/year in our lowest geographic market up to $231,400/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.