Netflix is one of the world's leading entertainment services, with 283 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time.
The Media Infrastructure Platform team (MIP) provides foundational infrastructure services to other engineering teams at Netflix. We create solutions with high leverage that multiply the productivity of other teams. Our platforms act as an enabler for media processing teams.
One of our main products is Stratum, a large-scale, next-generation serverless function platform designed to handle media-specific computational tasks. Stratum is the foundation of the Cosmos platform for media processing. Our team’s products are critical to Netflix: every video in the Netflix catalog has been processed by Stratum or one of its predecessors. Stratum is the primary compute platform for most engineers at Netflix in the media processing space. Stratum uses another complementary product that we develop: MezzFS, a FUSE-based solution for efficiently accessing large files in S3. We also develop Nirvana, an observability solution for media processing workloads that run on top of Stratum. Because MIP operates at such high scale, even minor improvements to the efficiency or developer experience of our products have an enormous impact.
MIP is part of a media-focused engineering group that provides highly available infrastructure for content production and processing across all Netflix productions and licensed content. Infrastructure pieces like massive-scale media processing platforms, workflows (Conductor), media asset management, collaboration, reporting, data movement, and data processing are some of the key services we build. All of this is custom-built on top of Amazon Web Services (AWS) infrastructure.
About the role
As an L6 engineer on MIP, you will help us build and grow innovative solutions in the media compute space. You’ll work on resource scheduling in a distributed polyglot compute platform running at massive scale. You’ll gain exposure to building observable, efficient, highly available, and fault-tolerant systems. In this role you will have the opportunity to drive direction, own development end-to-end, manage stakeholder relationships, provide actionable feedback and insights to colleagues, and create technical solutions at scale.
Specifically, this role will be part of the Cosmos Compute team within MIP, which builds Stratum, our high-scale, serverless media processing compute system.
About you
You are self-motivated and can work independently. You can also partner closely with other engineers on a project.
You are passionate about building quality products and want to own development and operations end-to-end, leading with the right architecture and following sound engineering principles to deliver a maintainable, performant, and predictable experience.
You are a problem solver and like to challenge yourself, but you are not afraid to reach out when you need help, and enjoy helping other engineers.
Experience
Strong experience leading engineering efforts: You should be able to work with other engineers on the team to lead large, impactful technical initiatives that may span multiple teams and orgs. This includes putting together designs, pushing the project forward to make progress, working with your manager to highlight and overcome roadblocks, and rallying your fellow engineers around the work.
Strong background in distributed systems: The majority of your time on the team will be spent working on Stratum and related services, which are part of large-scale distributed systems.
Experience operating a tier-one production system to a high degree of operational excellence: As our platforms have matured, we need to achieve a higher degree of operational excellence than before. For example, we are looking at developing SLOs and monitoring our system for adherence. Someone with skills in this space would be highly additive to our team and would make a big impact.
Strong experience with containers and/or serverless: You should be familiar with Docker and ideally Kubernetes or another container orchestrator (as a power user building on top of it). Familiarity with serverless paradigms such as Lambda, Cloud Run, etc. is also very helpful.
Interesting facts
We manage the largest compute clusters in Netflix.
Because many of our workloads are different from typical Netflix microservices, we drive a lot of innovation in the compute and runtime platform space. For example, we initiated and helped design the batch compute abstraction for Netflix in partnership with the team that runs Titus, Netflix's K8s-based container orchestrator.
Our functions have wildly varying durations: some run in seconds or minutes, but others in days or weeks!
We have a spectrum of different workloads. On one end of the spectrum, there are workloads that we want to run as cheaply as possible and that have no strict due dates. On the other end of the spectrum, some workloads are servicing humans in the loop waiting on results, so they must run with predictable latency to meet SLOs.
Our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top-of-market compensation within the market range, we rely on market indicators and consider your specific job family, background, skills, and experience. The range for this role is $230,000 - $960,000.
Inclusion is a Netflix value and we strive to host a meaningful interview experience for all candidates. If you want an accommodation/adjustment for a disability or any other reason during the hiring process, please send a request to your recruiting partner.
We are an equal-opportunity employer and celebrate diversity, recognizing that diversity builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.
This job is open for no less than 7 days and will be removed when the position is filled.