A characteristic common to major physics experiments is an ever-increasing need for computing resources to process experimental data and generate simulated data. The IN2P3 Computing Center (CC-IN2P3) provides its 2,500 users with about 35,000 cores and processes millions of jobs every month. To schedule such a workload under specific constraints, the CC-IN2P3 relied for 20 years on an in-house job and resource management system, complemented by an operations team that can directly act on the decisions made by the job scheduler and modify them. This system was replaced in 2011, but legacy rules of thumb remained. Combined with other rules motivated by production constraints, they may work against the job scheduler's optimizations and force the operators to apply more corrective actions than they should.
In this talk, I will present a characterization of the large HTC workload executed at the CC-IN2P3 and describe the decisions made since the end of 2016 to either transfer some of the actions performed by operators to the job scheduler or make these actions unnecessary. The preliminary but promising results of these modifications mark the beginning of a long-term effort to change the operation procedures applied to the computing infrastructure of the IN2P3 Computing Center.
Date: August 14, 2018
Time: 11am PDT / 2pm EDT
Location: 11th floor Conference Room 1135, Information Sciences Institute, Marina del Rey, CA, USA
Frédéric Suter, Ph.D. (CNRS, IN2P3, France)
Frédéric Suter has been a tenured CNRS researcher at the IN2P3 Computing Center in Lyon, France, since 2008. His research interests include scheduling, Grid computing, and platform and application simulation. He is one of the main developers of the SimGrid toolkit. He obtained his M.S. from the Université Jules Verne, Amiens, France, in 1999, his Ph.D. from the Ecole Normale Supérieure de Lyon, France, in 2002, and his Habilitation Thesis from the Ecole Normale Supérieure de Lyon, France, in 2014.