The company
Nebius AI is an AI cloud platform with one of the largest GPU capacities in Europe. Launched in November 2023, the Nebius AI platform provides high-end, training-optimized infrastructure for AI practitioners. As an NVIDIA preferred cloud service provider, Nebius AI offers a variety of NVIDIA GPUs for training and inference, as well as a set of tools for efficient multi-node training.
Nebius AI owns a data center in Finland, built from the ground up by the company's R&D team and showcasing its commitment to sustainability. The data center is home to ISEG, the most powerful commercially available supercomputer in Europe and the 16th most powerful globally (TOP500 list, November 2023).
Nebius AI's headquarters are in Amsterdam, the Netherlands, with teams working out of R&D hubs across Europe and the Middle East.
Nebius AI is built by more than 500 highly skilled engineers with a proven track record in developing sophisticated cloud and ML solutions and designing cutting-edge hardware. This allows every layer of the Nebius AI cloud, from hardware to UI, to be built in-house, clearly differentiating Nebius AI from the majority of specialized clouds: Nebius customers get a true hyperscaler-cloud experience tailored for AI practitioners.
The role
We are seeking a Senior Technical Product Manager, ML/AI Lifecycle Services to join our team. In this role, you will oversee the planning and prioritization of services across the ML/AI lifecycle, including training, fine-tuning, experiment tracking, monitoring, and inference. You will deliver products for leading AI companies that use thousands of GPUs within a single cluster built on cutting-edge hardware. We also provide room for creativity, empowering you to take the initiative and build what you believe is best.
Responsibilities:
Requirements:
Distributed training across dozens of hosts or more, using Slurm, Ray, or MosaicML (see the Ray sketch after this list)
Organizing ML infrastructure according to MLOps best practices, with tools such as MLflow, W&B, MosaicML, Kubeflow, ClearML, Azure ML, SageMaker, or Vertex AI (see the MLflow sketch below)
Maintaining and optimizing a large inference cluster with KServe, vLLM, Triton, RunAI, or Seldon (see the vLLM sketch below)
Building a product on top of LLMs that leverages techniques such as RAG, fine-tuning, and function calling, with an understanding of continuous quality evaluation (see the RAG sketch below)
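To make the first area concrete, here is a minimal sketch of fanning training work out across workers with Ray's task API. The train_shard function, the dummy loss, and the worker count are illustrative assumptions for this sketch, not a description of Nebius infrastructure.

    import ray

    # Start Ray locally for this sketch; on a real multi-host cluster you
    # would join it with ray.init(address="auto") and request GPUs per task
    # via @ray.remote(num_gpus=1).
    ray.init()

    @ray.remote
    def train_shard(shard_id: int) -> float:
        # Stand-in for one worker's training step; returns a dummy loss.
        return 1.0 / (shard_id + 1)

    # Fan one task out per worker and gather the results.
    losses = ray.get([train_shard.remote(i) for i in range(8)])
    print(f"mean loss: {sum(losses) / len(losses):.3f}")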
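For the MLOps item, a minimal MLflow experiment-tracking sketch: parameters and metrics logged per run. The experiment name, parameter values, and loss curve are placeholders.

    import mlflow

    # Logs go to a local ./mlruns store by default; in production you would
    # point at a shared server with mlflow.set_tracking_uri(...) (any URI
    # shown here would be an assumption, so it is omitted).
    mlflow.set_experiment("llm-finetune-demo")

    with mlflow.start_run():
        mlflow.log_param("learning_rate", 3e-4)
        mlflow.log_param("num_hosts", 32)
        for step in range(5):
            # Dummy decreasing loss, standing in for real training metrics.
            mlflow.log_metric("train_loss", 2.0 - 0.3 * step, step=step)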
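For the inference item, a minimal vLLM offline-generation sketch. The model name is an assumption; any model vLLM supports can be substituted, and a GPU host is required.

    from vllm import LLM, SamplingParams

    # Model name is an assumption for this sketch.
    llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
    params = SamplingParams(temperature=0.7, max_tokens=64)

    outputs = llm.generate(
        ["Summarize why multi-node training needs fast interconnects."],
        params,
    )
    print(outputs[0].outputs[0].text)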
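And for the LLM-product item, a toy retrieval-augmented-generation flow: embed documents, retrieve the best match for a query by cosine similarity, and assemble a grounded prompt. The embed() helper is a hypothetical stand-in for a real embedding model, and the final LLM call is omitted.

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Hypothetical embedding helper: a deterministic random unit vector.
        # A real system would call an embedding model here.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.standard_normal(8)
        return v / np.linalg.norm(v)

    docs = ["Slurm schedules batch jobs.", "vLLM serves LLMs efficiently."]
    doc_vecs = np.stack([embed(d) for d in docs])

    query = "How do we serve models?"
    scores = doc_vecs @ embed(query)         # cosine similarity (unit vectors)
    context = docs[int(np.argmax(scores))]   # retrieve the best-matching doc

    # A real system would send this prompt to an LLM; here we just print it.
    print(f"Context: {context}\nQuestion: {query}\nAnswer:")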
Ideal Candidate:
Does all that sound like your kind of challenge? Then join us!