4–6 Nov 2021
Iasi

Implementation of an email-based alert system for large-scale system resources

Speaker

Mr Robert POENARU (Horia Hulubei National Institute of Physics and Nuclear Engineering)

Description

Tackling current research problems in physics often requires extensive computer simulations. Identifying and preventing unusual behavior in the system resources that carry out large-scale calculations is a crucial part of system administration. It can improve the run-time performance of the resources themselves and help physicists obtain the required results faster. This work presents a simple Pythonic implementation that 1) monitors a given computing architecture (i.e., its system resources such as CPU and memory usage), and 2) alerts a custom team of administrators via e-mail in near real time when certain thresholds are exceeded. Built on existing Python packages, the implementation can send e-mails to a predefined list of recipients, containing detailed information about any machine running outside its "normal" parameters.
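The monitor-and-alert cycle described in the abstract can be illustrated with a short Python sketch. This is a minimal, hypothetical example, not the author's implementation: the abstract does not name the packages used, so the optional `psutil` sampling, the threshold values, the function names (`collect_readings`, `check_thresholds`, `build_alert_email`, `send_alert`) and the SMTP settings are all assumptions. Message composition and delivery use only the standard library's `email` and `smtplib` modules.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical thresholds in percent; the abstract does not state the real values.
THRESHOLDS = {"cpu": 90.0, "memory": 85.0}


def collect_readings():
    """Sample CPU and memory usage (assumes the third-party psutil package)."""
    import psutil  # imported lazily so the rest of the sketch works without it

    return {
        "cpu": psutil.cpu_percent(interval=1),  # % CPU over a 1 s window
        "memory": psutil.virtual_memory().percent,  # % of RAM in use
    }


def check_thresholds(host, readings, thresholds=THRESHOLDS):
    """Return one alert line per metric whose reading exceeds its threshold."""
    alerts = []
    for metric, limit in thresholds.items():
        value = readings.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{host}: {metric} at {value:.1f}% (threshold {limit:.1f}%)")
    return alerts


def build_alert_email(alerts, sender, recipients):
    """Compose a plain-text alert message addressed to every recipient."""
    msg = EmailMessage()
    msg["Subject"] = f"[ALERT] {len(alerts)} resource threshold(s) exceeded"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("\n".join(alerts))
    return msg


def send_alert(msg, smtp_host="localhost", smtp_port=25):
    """Deliver the message through an SMTP relay (assumed to be reachable)."""
    with smtplib.SMTP(smtp_host, smtp_port) as server:
        server.send_message(msg)
```

For example, readings of `{"cpu": 97.5, "memory": 42.0}` for a host named `node01` would yield a single CPU alert and no memory alert. To approximate near-real-time behavior, such a check would typically run periodically (for instance from cron, or in a loop with `time.sleep`), calling `send_alert` only when `check_thresholds` returns a non-empty list, so that administrators are not e-mailed while every machine is within its normal parameters.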

Author

Mr Robert POENARU (Horia Hulubei National Institute of Physics and Nuclear Engineering)
