Cohere

GPU Infrastructure Engineer at Cohere

📍 Toronto, ON, Canada ⏰ Full-time

Description

Join Cohere as a Software Engineer focused on GPU infrastructure and high-performance computing (HPC). Apply your expertise in ML tooling, Kubernetes, and cloud-native environments to drive AI innovation.

In this role, you'll be part of a specialized internal infrastructure team dedicated to building top-notch tools for training and evaluating foundational AI models. Collaborate closely with AI researchers to address complex technical needs while focusing on stability and scalability. Your contributions will involve deploying superclusters and optimizing HPC infrastructure to accelerate model development for Cohere's platform.

Key Responsibilities:
• Build and operate Kubernetes GPU/TPU superclusters across multiple clouds
• Optimize infrastructure for AI training using RDMA and high-speed interconnects
• Troubleshoot and resolve performance bottlenecks efficiently
• Design self-service tools for researchers to manage training jobs
• Advocate for best practices in obser...