The team
The Vonage AI group is looking for a sharp DevOps engineer to work on AI/ML-based services specializing in audio, text, and conversational AI systems. These services enhance existing products and drive innovation through new products within the company.
What will you do?
- Drive operational excellence with automation of business processes that enhance productivity of Engineering teams
- Work closely with Data Science and Development teams on performance optimization
- Proactively identify opportunities for business process improvements, recommending and driving implementation of tools and technologies
- Collaborate closely with cross-functional teams to integrate tools and services for Security and Compliance
- Build and maintain internally developed and vendor tools utilized to support business operations
- Design and implement central observability dashboards assimilating data from various Monitoring, Alert Management and Analytics platforms
- Develop and maintain tools to support Incident Management with integration across Okta, Slack, Jira, Confluence and Google Docs
- Develop secure and compliant CI/CD pipeline frameworks that can be replicated and adopted by cross-functional engineering teams
- Build tools to generate periodic reports on service availability, performance, top incidents and other key organizational metrics
- Provide training and support to engineering teams on business support tools
What You Will Bring:
- Willingness and ability to learn Tools & Technologies quickly and apply them to improve business processes
- Experience building internal platform engineering tools with Python or Go
- Expertise working with Docker, Kubernetes, Helm and Argo
- Experience with IaC tools such as Terraform or Pulumi
- Experience with code repositories and code deployment tools such as GitHub, GitHub Actions, Azure DevOps and Argo CD
- Extensive experience with AWS components and services like EKS, EC2, VPC, CloudWatch, S3, IAM, Lambda, API Gateway, SQS
- Experience developing integrations with Collaboration and Issue tracking tools like Slack, Jira, Confluence, Google Docs, Google Sheets
- Experience using and developing integrations with Monitoring, Alert Management and Analytics platforms like Opsgenie, Nagios, Grafana, Prometheus, AWS CloudWatch, Elasticsearch, Kibana, Tableau
- Expertise with Linux Operating Systems and good understanding of TCP/IP networking, IT Security concepts
- Knowledge of user identity management, single sign-on and role-based access concepts
- Ability to present and lead technical discussions with cross-functional Development, IT and Security teams
- Self-directed and able to work independently, with the attitude that everything can be automated
Good to have:
- MLOps / LLMOps experience
- Experience working on AI/ML projects, including building infrastructure and deploying and running self-hosted models