Computing Engineer

£6560.12 - £9367.41 per month
15 Feb 2018
15 Mar 2018
CERN European

The CERN IT Department is completing the re-engineering of the monitoring and alarms infrastructure for the IT Data Centres and for the WLCG Grid Infrastructure. To this end, the monitoring team is designing, developing and deploying the unified monitoring and alarms infrastructure that is progressively replacing the IT-specific and WLCG-specific solutions. This unified monitoring covers all phases of the dataflow, such as collecting metrics and logs, live online data streaming, large-scale analytics and comprehensive alarms.

As a Computing Engineer in the IT-CM Group, you will contribute to the development and operation of the CERN unified monitoring and alarms infrastructure across the wide range of platforms and technologies needed to stream, analyse and store metrics collected at a rate of 80k metrics/sec from:
  • The CERN Data Centres compute and storage resources (e.g. 35k VMs, > 8k hosts);
  • The WLCG world-wide activities for data transfers, job execution and resource availability on more than 300 grid sites.

The unified monitoring infrastructure is fully deployed and managed at CERN by the Monitoring team and is based on established open source technologies, such as Collectd, Kafka, Spark, ElasticSearch, HDFS, InfluxDB and Grafana.

Your main activities will consist of:
  • Key contribution to the development and improvement of the monitoring service.
  • Operation and support of the monitoring infrastructure service.
  • User support on monitoring for other IT services and participation in the regular support rota for IT Data Centres and WLCG Grid users.
  • Advice and solution design to IT service managers and LHC experiments staff on the use of the unified monitoring infrastructure and of the monitoring data.
  • Management of the monitoring services, which includes third-level user support, deployment of software packages, and operation of the service with verification of performance and security.
  • Automation using Agile/DevOps tools such as Puppet to support monitoring within the standard CERN IT procedures.
  • Collaboration in the CERN IT change management procedures.
  • Definition, documentation and implementation of procedures following the standards of the IT Department.
  • Close collaboration with other IT groups, departments and physics experiments at CERN.
  • Interaction with and contribution to Open Source communities such as Grafana and Collectd.

Qualification required
Master's degree or PhD or equivalent relevant experience in the field of Computer Science or related field.

The experience required for this post is:
  • Proven experience in software development, with knowledge of established open source technologies such as Collectd, Kafka, Spark, ElasticSearch, InfluxDB and Grafana.
  • Managing large-scale server deployments in a complex environment such as OpenStack cloud services or virtualisation systems such as KVM.
  • Performing automation tasks using scripting languages such as Python and deploying services using configuration system solutions such as Puppet.
  • Analysing performance in order to scale the monitoring infrastructure to an ever-growing workload.

The technical competencies required for this post are:
  • Programming/Software development: commitment to work in an agile development environment in a young and motivated team of engineers.
  • IT operations: deliver and support properly engineered IT services and products to meet the needs of CERN.
  • Design and selection of methods and tools: analyse sub-optimum procedures, streamline and document new ones; optimisation of existing tools and selection of the most appropriate one f.
  • Technical advice and guidance: handling of special user requests; development of technical expertise within the team.
  • Change management: coordination of changes within and outside the group.
  • Release management: organisation of large scale deployments in a complex environment.

The language competencies required are:
  • Spoken and written English: ability to draw up technical specifications, documentation and/or scientific reports. Basic knowledge of French or an undertaking to acquire it rapidly.

This vacancy will be filled as soon as possible, and applications should normally reach us no later than 12.03.2018.

We offer a limited-duration contract for a period of 5 years.

These functions require:

  • Participation in a regular stand-by duty, including nights, Sundays and public holidays.
  • Stand-by duty, when required by the needs of the Organization.
  • Work during nights, Sundays and official holidays, when required by the needs of the Organization.