Computing Cluster Operations Lead

About Mila

Founded by Professor Yoshua Bengio of the University of Montreal, Mila brings together researchers specializing in artificial intelligence (AI), particularly machine learning. Globally recognized for its significant contributions to deep learning and reinforcement learning, Mila has distinguished itself in areas such as language modeling, machine translation, object recognition, and generative models. Since 2017, Mila has been a collaboration between the University of Montreal and McGill University, in close partnership with Polytechnique Montreal and HEC Montreal.

Mila’s mission is to be a global hub for scientific advancements, inspiring innovation and the growth of artificial intelligence for the benefit of all.

The Role

Mila is seeking a highly experienced and visionary Head of Infrastructure to lead and evolve our critical computing infrastructure. This individual will be responsible for the strategic planning, design, implementation, and operation of Mila's high-performance computing (HPC/AI) clusters, data centers, and network infrastructure. The successful candidate will play a pivotal role in ensuring that our researchers and students have access to state-of-the-art computing resources to push the boundaries of AI.

Responsibilities

  • Strategic Leadership: Develop and execute a comprehensive infrastructure strategy aligned with Mila's research goals, including future needs for growth and emerging technologies.
  • HPC Cluster Management: Oversee the architecture, deployment, maintenance, and optimization of HPC clusters, ensuring high availability, performance, and scalability.
  • Vendor Management & Procurement: Lead the RFP process for the procurement of new HPC clusters and other infrastructure components, ensuring cost-effectiveness and alignment with technical requirements.
  • Team Leadership: Lead, mentor, and grow a team of skilled infrastructure engineers and administrators.
  • Operations & Reliability: Establish and enforce best practices for infrastructure operations, monitoring, troubleshooting, and incident response to maintain a highly reliable environment.
  • Budget Management: Plan, track, and manage infrastructure budgets.
  • Security & Compliance: Ensure the security and compliance of all infrastructure components, implementing robust security measures and data protection protocols.
  • Collaboration: Work closely with researchers, faculty, and other departments to understand their computing needs and provide tailored solutions.
  • Innovation: Stay abreast of the latest advancements in computing infrastructure and AI hardware, proposing and implementing innovative solutions to enhance Mila's capabilities.

Qualifications

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • 10+ years of experience in IT infrastructure, with at least 5 years in a leadership role managing complex computing environments.
  • Deep expertise in HPC cluster architecture, design, and operations, including experience with schedulers (e.g., Slurm), high-speed interconnects (e.g., InfiniBand), and parallel file systems (e.g., Lustre, BeeGFS).
  • Proven experience managing data centers, network infrastructure, and storage solutions.
  • Strong understanding of virtualization and containerization technologies (e.g., Proxmox, Docker, Podman).
  • Experience with infrastructure as code (e.g., Ansible, Terraform) and automation tools.
  • Excellent leadership, communication, and interpersonal skills, with the ability to articulate complex technical concepts to both technical and non-technical audiences.
  • Demonstrated ability to manage projects, prioritize tasks, and work effectively in a fast-paced research environment.
  • A passion for contributing to cutting-edge AI research and a commitment to Mila's mission.

Desirable skills

  • Experience with GPU-accelerated computing and deep learning frameworks.
  • Knowledge of research computing environments and the specific challenges faced by AI researchers.
  • Familiarity with open-source technologies and community contributions.

Why join Mila?

  • The opportunity to contribute to a unique mission with a major impact;
  • A comprehensive group insurance program (health, dental, disability, life, travel and extended benefits);
  • An employee and family assistance program;
  • Access to a telemedicine service;
  • A vacation policy offering a base of 20 days' vacation upon hiring;
  • A retirement savings plan with a minimum employer contribution of 4%;
  • A generous flexible package allowing you to tailor your benefits to what contributes to your well-being. You can select and combine options to suit your needs, including lifestyle credits, enhanced insurance, extra vacation days and increased pension contributions;
  • Flexible working hours, a summer schedule and the possibility of telecommuting;
  • A work environment in the heart of Little Italy, in the trendy Mile-Ex district, close to public transportation;
  • A team of passionate experts in their field;
  • A collaborative and inclusive work environment.

We want to know you

At Mila, diversity is important to us. We value a work environment that is fair, open, and respectful of differences. If you want to work in a constantly evolving, stimulating ecosystem and help define and apply a healthy, inclusive culture, we encourage you to apply.

Please note that only selected candidates will be contacted.

https://mila.quebec/fr/protection-de-la-vie-privee

6666 Rue Saint-Urbain Suite 200, Montreal, QC H2S 3H1, Canada