name: inverse
layout: true
class: center, middle, inverse
---

# Galaxy Monitoring

.footnote[Tip: press `P` to view the presenter notes]

???

Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.

Press `P` again to switch presenter notes off

---

### <i class="fa fa-question-circle" aria-hidden="true"></i><span class="visually-hidden">question</span> Questions

---

## Manage Jobs

An admin interface that lists currently unfinished jobs and finished jobs of a certain age.

* You can stop unfinished jobs
* You can show details of old jobs
* You can lock the server to prevent it from spawning new jobs (e.g. for maintenance)

---

# Log Files

- Galaxy logs (`/srv/galaxy/log/*`)
  - Web (uWSGI)
  - Handler
- nginx logs (`/var/log/nginx/*`)
- supervisor logs (`/var/log/supervisor/*`)

---

# Analytics

Can we make better walltime decisions?

`scripts/runtime_stats.py`: Database-driven job runtime statistics

---

# Reports

Galaxy ships with its own app that reports usage (numbers of users, jobs, data, etc.)

---

# Nagios

[Nagios](https://www.nagios.com/) is a general-purpose tool for monitoring systems and services.

Galaxy-specific check in `contrib/nagios/`: runs Galaxy jobs

---

# Sentry

* Motto: *"Stop hoping your users will report errors"*
* Error tracking and analysis tool.
* Galaxy has Sentry middleware that you can enable in configuration.

---

# Job Metrics

Galaxy can collect metrics on each job through configurable plugins in `job_metrics_conf.xml`.

Some plugins:

- `core`: captures Galaxy slots, start and end of job, runtime
- `cpuinfo`: processor count for each job
- `env`: dumps the environment for each job
- `collectl`: monitors a wide array of system performance data

---

# Telegraf, InfluxDB, and Grafana

General-purpose tools for monitoring systems and services.

Tool | Use
--- | ---
[Telegraf](https://github.com/influxdata/telegraf) | plugin-driven server agent for collecting & reporting metrics
[InfluxDB](https://github.com/influxdata/influxdb/) | purpose-built time series database
[Grafana](https://grafana.com/) | dashboard for beautiful analytics and monitoring

Dataflow:

- Galaxy produces data
- Telegraf consumes and buffers it, before sending it to
- InfluxDB, which stores the data
- Grafana is then used to visualise it

---

# Infrastructure for Grafana

* Everything is captured in the Galaxy Ansible [infrastructure-playbook](https://github.com/galaxyproject/infrastructure-playbook/) repository.
* Ansible [playbook](https://github.com/dj-wasabi/ansible-telegraf) to install Telegraf.
* Ansible [tasks](https://github.com/galaxyproject/infrastructure-playbook/blob/master/roles/stats/tasks/redhat.yml) for installing InfluxDB and Grafana.

---

# Grafana showcase

* usegalaxy.eu [public server](https://stats.usegalaxy.eu)
* usegalaxy.org.au [public server](https://stats.genome.edu.au)
* usegalaxy.org private server

If you see a dashboard you like, you can export its configuration and load it into your own Grafana with your own data. Copy away!

---

## Thank you!

This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://wiki.galaxyproject.org/Teach/GTN) and all the contributors!
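
---

# Extra: what a metric looks like

The Telegraf, InfluxDB, and Grafana dataflow moves individual measurements from Galaxy to the database. The sketch below shows one plausible way to format such a measurement as InfluxDB line protocol, a text format that Telegraf can ingest. The measurement name `galaxy_queue`, its tags and fields, and the helper names are hypothetical and chosen only for illustration; they are not taken from Galaxy itself.

```python
# Minimal sketch: format one measurement as InfluxDB line protocol.
# All metric names below (galaxy_queue, state, destination, count)
# are assumptions for illustration, not names defined by Galaxy.


def _escape(value, chars):
    """Backslash-escape each character of `chars` found in `value`."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):  # check bool first, bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Build `measurement,tag=value field=value timestamp` as one line."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys give a stable output
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        f"{_escape(key, ',= ')}={format_field(value)}"
        for key, value in fields.items()
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line(
    "galaxy_queue",
    {"state": "running", "destination": "slurm"},
    {"count": 12},
    1700000000000000000,
))
# prints: galaxy_queue,destination=slurm,state=running count=12i 1700000000000000000
```

Tags (here `state` and `destination`) are indexed by InfluxDB and are what a Grafana dashboard would typically group or filter on; fields carry the measured values.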