McGill University is seeking a Senior Systems Administrator to take a significant role in the operation and maintenance of existing systems and services, as well as in planning future initiatives in the area of Advanced Research Computing (ARC). Reporting to the Associate Director, Operations, of the McGill High Performance Computing Centre (“HPC Centre”), the incumbent will work within the Calcul Québec (CQ) organization and join a vibrant team of HPC systems administrators and analysts across several Québec institutions. It is an opportunity to work with leading-edge technology and with a team that has installed and operated supercomputers in the Top 500.
McGill is a founding member of CQ, a consortium of Québec universities whose objective is to provide advanced research computing services and capabilities to the research community, including leading-edge HPC data centres and highly qualified computing experts. More than 600 research groups use the resources made available by CQ to conduct research in various fields. CQ is a Regional Partner of Compute Canada (CC), the non-profit organization in charge of coordinating ARC efforts throughout Canada.
The ARC environment includes an HPC cluster consisting of over one thousand nodes with a mixture of CPU and GPU processors, a cloud environment, parallel filesystems (Lustre), virtualization and container environments, as well as a multi-petabyte disk and tape storage environment with backup and archive capabilities.
Position location: ETS – École de technologie supérieure
Primary Responsibilities:
The Senior Systems Administrator will be responsible for the core operations, maintenance, and growth planning of the HPC Centre’s network, as well as for the architecture development and operational aspects of cyber-security. The incumbent will be responsible for both operational activities and specific tasks within development projects.
Specific tasks:
Manage the configuration of routers, switches, firewalls, and probes, and plan the interconnectivity of new equipment;
Together with the cyber-security team, and following the security directives, plan the configuration of the network equipment and define and manage the firewall rules;
Develop and document the architecture of the new technical solutions;
Analyse the existing networks and system interconnections, and propose improvements reflecting the required configurations;
Manage the monitoring equipment (perfSONAR and others) and, together with the monitoring group, perform network monitoring;
Detect service interruptions, troubleshoot them, and replace defective equipment together with the other sysadmins;
Establish and maintain clear and thorough documentation (diagrams, address mapping, routing protocol, version control) for all clusters;
Collaborate with IT staff from other institutions and from regional and national organizations to ensure oversight of the activities and solutions;
Propose, plan and manage software and hardware upgrades within the HPC Centre environment;
Maintain and upgrade the software (e.g. Cumulus);
Report operational statistics and performance metrics data to management.
Other Qualifying Skills and/or Abilities
At least 3 years of experience in a large, enterprise environment containing hundreds of server, storage and network elements operating in a cluster setup using Linux, InfiniBand and Ethernet.
Demonstrated expertise in:
Network
VLAN, VXLAN (EVPN), BGP (eBGP, iBGP, ECMP), VRF, LAG, DHCP, IPv4/IPv6, QoS, NTP, DNS, proxy, etc.;
InfiniBand, Fibre Channel, Cumulus Linux.
Firewall
VyOS environment;
VRRP, ACL, zoning.
Automation
Ansible, Git (GitLab), Puppet.
Monitoring
sFlow, SNMP, ELK (Elasticsearch-Logstash-Kibana).
Architecture
Experience with the design and implementation of a network architecture, following the security and technology best practices, covering performance and capacity planning.
Experience planning, designing, and upgrading network installations.
Other required skills:
Attention to detail, with pride in and a sense of ownership of the successful operation, availability, and reliability of the systems under their administration, in support of all research users.
Advanced problem-solving skills.
Good oral and written communication skills in both French and English.
Ability to work in complex technical environments.
Ability to work effectively under pressure, with multiple concurrent tasks and priorities, to achieve successful outcomes and results.
Ability to take supervisory and management direction and to complete tasks effectively with little direct supervision.
Ability to work effectively with a distributed team in a collaborative environment across Québec and Canada.
Ability to work cooperatively with a diverse team of professionals, acting as a technical resource for others in the team, and to work together with other staff on projects of significant importance and value to the organization and to the clients we serve.
Ability to identify problems and resolve issues in a complex environment.
A demonstrated aptitude for learning new technologies.
Minimum Education and Experience:
Bachelor’s degree and 5 years of related experience
Annual Salary:
(MPEX Grade 07) $79,200.00 – $118,800.00
Hours per Week:
33.75 (Full time)
Supervisor:
Associate Director Operations
Position End Date (If applicable):
12/15/2023
Deadline to Apply:
04/26/2021
McGill University hires on the basis of merit and is strongly committed to equity and diversity within its community. We welcome applications from racialized persons/visible minorities, women, Indigenous persons, persons with disabilities, ethnic minorities, and persons of minority sexual orientations and gender identities, as well as from all qualified candidates with the skills and knowledge to productively engage with diverse communities. McGill implements an employment equity program and encourages members of designated groups to self-identify. Persons with disabilities who anticipate needing accommodations for any part of the application process may contact, in confidence, [email protected] or 514-398-3711.
To help us with our recruitment effort, please indicate in your email or cover letter where you saw this job posting (vacanciesincanada.ca).