Dialpad

Software Engineer, Developer Experience & MLOps

Kitchener, Canada

Job Description

About Dialpad

Dialpad is the leading Ai-powered customer communications platform creating human-first, Ai-enhanced solutions that will drive the next wave of how businesses communicate with and serve their customers. Enterprise customers like Randstad, Remax, Mizuho, Cigna, T-Mobile, Johns Hopkins, Motorola, Warby Parker, Panera Bread, and Netflix use Dialpad and its Ai capabilities to deliver amazing customer experiences. Supported by notable investors such as Andreessen Horowitz, Google Ventures, and ICONIQ Capital, Dialpad is a dynamic force in Ai technology with a rapidly expanding presence. Visit dialpad.com to learn more.

About the team

Dialpad’s Ai Engineering team works centrally alongside Data Science, Telephony, and Product Engineering teams to produce The Good Ai. In this role, you’ll leverage and acquire a broad skill set spanning Distributed Systems Engineering, DevOps, MLOps, and Data Engineering to deliver functionality essential to powering Dialpad’s Ai products.

Your role

As a Software Engineer – AI Developer Experience & ML Platform, you will design, build, and optimize the infrastructure, tooling, and workflows that enable engineers and data scientists to efficiently develop, deploy, and scale AI-powered applications. Your role will be multifaceted, spanning both developer experience (DevEx)—streamlining development, testing, and deployment—and MLOps & ML platform engineering—ensuring scalable, reliable, and high-performance AI/ML workloads.

For DevEx, you’ll focus on improving the productivity and happiness of engineers and data scientists by building robust development environments, automating workflows, and enhancing observability. You’ll work with tools such as Kubernetes, Grafana, Terraform, and CircleCI, leveraging GCP services like GKE, Cloud Workstations, Cloud Run, and BigQuery, to create a seamless developer experience.

For MLOps, you’ll architect and maintain scalable AI infrastructure, enabling real-time analytics, efficient model training, and optimized inference. You’ll work with vLLM, Apache Beam, SQL, and Kubeflow, using GCP services like Vertex AI, Dataflow, BigQuery, and GKE to build and maintain end-to-end AI pipelines. Your contributions will directly impact the scalability, performance, and reliability of Dialpad’s AI-driven insights, ensuring that AI models and analytics run efficiently at scale.

What you’ll do [i.e., Responsibilities]

First Week

  • Merge your first PR & learn the review process: Make a small contribution, go through the code review process, and get familiar with team coding standards.
  • Learn CI/CD workflows & deployment process: Understand how changes move from development to production, including CircleCI, Terraform, and GitOps workflows.
  • Test, deploy, and monitor a change in production: Push a minor update, observe logs, metrics, traces and alerts, and ensure smooth rollout.
  • Meet the team & key stakeholders: Get to know your immediate team and cross-functional teams (ML engineers, data scientists, platform engineers).

First 3 Months

  • Work directly on Dialpad’s AI/ML pipelines, Vertex AI and Kubernetes-based dev environments to enhance platform performance.
  • Optimize developer workflows, including CI/CD pipelines (CircleCI) and infrastructure (Terraform), to accelerate AI and ML deployments.
  • Strengthen observability and debugging (Grafana, Loki, OpenTelemetry) for better insights and faster issue resolution.
  • Collaborate with cross-functional teams to identify bottlenecks, ship quick wins, and demonstrate measurable improvements.

First 6 Months

  • Streamline ML deployments, data ingestion, and environment setup through internal CLI tools, templates, and dashboards to empower self-serve developer workflows.
  • Refine CI/CD for AI/ML by automating ML model testing and deployment rollbacks, including improving GitOps workflows with CircleCI and Terraform for increased reliability.
  • Enhance AI/ML observability and monitoring by expanding logging, tracing, and real-time metrics using Grafana, Loki, OpenTelemetry, and Vertex AI Model Monitoring.
  • Optimize Kubernetes and cloud workflows by improving GKE-based AI workloads, autoscaling policies, and resource efficiency to address growing ML and data pipeline demands.

First 12 Months

  • Optimize AI compute cost and efficiency by implementing autoscaling, spot instance scheduling, and GPU/TPU resource optimization to balance performance and cost.
  • Build a self-serve AI infrastructure by developing internal developer tooling, dashboards, or APIs that enable engineers and data scientists to easily deploy models and manage data pipelines.
  • Enable AI-driven analytics at scale by ensuring real-time AI insights power customer-facing features with sub-second query latencies in Pinot, BigQuery, and Dataflow.
  • Automate infrastructure provisioning by expanding GitOps-driven automation for deploying and managing AI workloads, Kubernetes clusters, and cloud resources.

Technologies you know

  • Kubernetes & Cloud Infrastructure – Managing GKE, Terraform, Cloud Workstations, and IAM for scalable AI/ML workloads. Related: AWS, Azure, Docker
  • CI/CD & GitOps – Automating deployments with CircleCI, Terraform, and Cloud Build to streamline AI/ML workflows. Related: ArgoCD, Jenkins, GitLab
  • ML Pipeline & Data Processing – Working with Vertex AI Pipelines, MLflow, Apache Beam, Spark, Dataflow, Pub/Sub, and BigQuery to enable real-time AI analytics. Related: Athena, Kafka, Flink, Redshift, Snowflake, Databricks
  • Observability & Monitoring – Implementing Grafana, Loki, OpenTelemetry, and Vertex AI Model Monitoring for debugging and tracking AI performance. Related: Prometheus, Jaeger
  • Model Deployment & Serving – Understanding Kubeflow, TensorFlow Serving, Triton Inference Server, and strategies for scalable ML inference. Related: ONNX, TorchServe

Skills you’ll bring

  • You have a Bachelor’s Degree in Computer Science, Software Engineering, Mathematics, or a related field, or equivalent work experience.
  • You have 3+ years of experience in DevOps, MLOps, Developer Experience, or related roles.
  • You have strong fundamentals in software engineering, cloud infrastructure, and distributed systems.
  • You thrive in a collaborative, distributed team and can work effectively across time zones.
  • You have experience building and maintaining AI/ML infrastructure, CI/CD pipelines, or developer tooling.
  • You enjoy automating and optimizing development workflows, from CI/CD pipelines to AI/ML deployments.
  • You take a data-driven approach to system reliability, ensuring observability, monitoring, and performance tracking.
  • You believe in choosing the right tool for the job, balancing scalability, efficiency, and maintainability.
  • You are comfortable working across infrastructure, AI/ML pipelines, and developer tooling to support high-scale applications.
  • You enjoy continuous learning and knowledge-sharing, improving both your skills and your team's capabilities.
  • You are fluent in English and communicate complex technical concepts clearly.

Bonus Points For

  • A track record of Open Source contributions in DevOps, MLOps, or AI tooling.
  • Experience in the Python ecosystem and related ML/DevOps libraries.
  • Hands-on expertise with cloud providers such as Google Cloud Platform (GCP) or AWS.
  • Experience with GitOps workflows and tools like ArgoCD, Flux, or Terraform.
  • Familiarity with AI/ML observability, model monitoring, and real-time inference optimization.

Benefits, time-off, and wellness

An apple a day keeps the doctor away, and it doesn’t hurt that we offer flexible time off and great options for medical, dental, and vision plans for all employees. Employees also receive a monthly stipend to help cover their cell phone and home internet bills, and we reimburse gym membership costs, a variety of wellness events, and more!

Professional development

Dialpad offers reimbursement for expenses related to professional development, up to an annual limit.

Culture

We’ve been named a Top Workplace seven times, and a big part of that is our collaborative culture that elevates our teammates, celebrates wins, and brings together passion and talent.

Compensation

Teamwork makes the dream work, and Dialpad offers competitive salaries because each and every Dialer participates in our success.

Diversity, Equity, and Inclusion (DEI) at Dialpad

At Dialpad, we are passionate about Doing the Right Thing. This means we are committed to building a values-driven culture that celebrates identity, inclusion and belonging. As a global company, it’s our responsibility to come together to create a culture where all Dialers can Work Beautifully, Delight Our Users, and Innovate Continuously to bring our world-class product to life.

Every Voice Matters at Dialpad. We build community through our Employee Resource Groups, company-wide celebrations, service days, and a robust internal learning & development program focused on the success of our Dialers.

Don’t meet every single requirement? Studies have shown that women and marginalized groups are less likely to apply to jobs unless they meet every single qualification. At Dialpad, we are dedicated to building an inclusive and authentic workplace, so if you’re excited about this role but your past experience doesn’t align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right candidate for this or other roles.

Dialpad is an equal-opportunity employer. We are dedicated to creating a community of inclusion and an environment free from discrimination or harassment.
