At Scribd (pronounced “scribbed”), our mission is to spark human curiosity. Join our team as we create a world of stories and knowledge, democratize the exchange of ideas and information, and empower collective expertise through our three products: Everand, Scribd, and Slideshare.
We support a culture where our employees can be real and be bold; where we debate and commit as we embrace plot twists; and where every employee is empowered to take action as we prioritize the customer.
Our flexible work benefit, Scribd Flex, enables employees, in partnership with their manager, to choose the daily work style that best suits their individual needs. As an organization, we prioritize collaboration and intentional in-person moments to build culture and connection. For this reason, occasional in-person attendance is required for all Scribd employees, regardless of their location.
About the Team:
The ML Data Engineering team is at the heart of metadata extraction and enrichment for all of our brands. We manage and process hundreds of millions of documents and billions of images while serving millions of users. We operate at an unparalleled scale, handling diverse datasets that include user-generated content (UGC) documents, ebooks, audiobooks, and more. Our goal is to build robust systems that drive content discovery, trust, and structured metadata across our platforms.
Role Overview:
We are seeking a Software Engineer II with a strong background in data engineering, software development, and scalable systems. As part of the ML Data Engineering team, you will work on designing, building, and optimizing systems that extract, enrich, and process metadata at scale. You’ll collaborate closely with machine learning teams, product managers, and other engineers to ensure the smooth integration and processing of vast amounts of structured metadata.
Tech Stack:
Our team uses a variety of technologies. The ones we rely on regularly are Python, Scala, Ruby on Rails, Airflow, Databricks, Spark, HTTP APIs, AWS (Lambda, ECS, SQS, ElastiCache, SageMaker, CloudWatch), Datadog, and Terraform.