Mountain View, CA
At Google DeepMind, we value diversity of experience, knowledge, backgrounds and perspectives and harness these qualities to create extraordinary impact. We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy, or related condition (including breastfeeding) or any other basis as protected by applicable law. If you have a disability or additional need that requires accommodation, please do not hesitate to let us know.
We are a team of research/software engineers, research scientists, and machine learning experts, working together to enable superhuman understanding of the visual world. With our latest Gemini embedding effort (which is a top priority for the PIU unit in GDM), we aim to train the most powerful omnimodal embedding model, which can be used for retrieval and other agentic use cases in Google products. This unified omnimodal model will replace the unimodal image embedding models (also developed by our team) and text embedding models that are currently widely adopted. We are also working on leveraging Gemini to build agentic solutions for media understanding.
As part of the Media Understanding team at Google DeepMind, you will have the opportunity to advance state-of-the-art research in embedding/representation models in the context of large language models. You will also have the chance to build agentic solutions on top of Gemini for media understanding. You'll be at the forefront of developing and serving models that power Google products used by billions of people worldwide. Your work will directly impact how these products understand and interact with diverse media, including text, images, audio, and video. This is a unique opportunity to shape the future of multimodal AI and its applications in a dynamic and impactful environment.
You'll work on the productionisation of the next SOTA models for multimodal understanding. Your work will include building agentic Gemini demos with Gemini embedding, enhancing the serving stack for scalability, working with PAs to adopt our models, and improving the models based on PA feedback.
As a member of the media understanding team, you will be responsible for conducting core and applied research in computer vision and language understanding to support a multitude of Google products and use cases. Your job responsibilities will include:
We are an applied research team that takes on challenging real-world problems and thrives on finding solutions in the presence of ambiguity. In order to set you up for success as a Research Engineer/Scientist at Google DeepMind, we look for the following skills and experience:
In addition, the following would be an advantage:
The US base salary range for this full-time position is between $182,000 and $215,000 + bonus + equity + benefits. Your recruiter can share more about the specific salary range for your targeted location during the hiring process.
Application deadline: March 17, 2025
Note: In the event your application is successful and an offer of employment is made to you, any offer of employment will be conditional on the results of a background check, performed by a third party acting on our behalf. For more information on how we handle your data, please see our Applicant and Candidate Privacy Policy.