Galaxy Training Network

Galaxy Monitoring with Telegraf and Grafana

Tip: press P to view the presenter notes

1 / 12

question Questions

  • How can I monitor Galaxy with Telegraf?

  • How do I set up InfluxDB?

  • How can I make graphs in Grafana?

  • How can I best alert on important metrics?

2 / 12

objectives Objectives

  • Set up InfluxDB

  • Set up Telegraf

  • Set up Grafana

  • Create several charts

3 / 12

Telegraf, InfluxDB, and Grafana

General purpose tools for monitoring systems and services.

Tool      Use
Telegraf  plugin-driven server agent for collecting & reporting metrics
InfluxDB  purpose-built time series database
Grafana   dashboard for beautiful analytics and monitoring

Dataflow:

  • Galaxy produces data
  • Telegraf consumes and buffers it, before sending it to
  • InfluxDB which stores the data
  • And Grafana is used to visualise it
4 / 12
  • Monitoring in Galaxy is easy to set up.
  • Galaxy produces data, which is consumed by Telegraf.
  • Telegraf sends the data to InfluxDB.
  • This data is visualized in Grafana.
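
The first hop of this dataflow can be sketched from the producing side. This is only an illustration, assuming Telegraf has a StatsD listener enabled on UDP port 8125 of the local machine (the slides do not state how Galaxy hands its data to Telegraf). The metric name used in the comment is hypothetical.

```python
import socket


def format_statsd_timing(name: str, millis: int) -> str:
    """Render a timing metric in the plain StatsD text format: name:value|ms."""
    return f"{name}:{int(millis)}|ms"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric line over UDP (fire-and-forget, no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Example call (hypothetical metric name; host and port are assumptions):
# send_metric(format_statsd_timing("galaxy.example.route_time", 12))
```

UDP is used here because the sender never blocks on the monitoring stack: if the listener is down, the metric is simply lost and the producer carries on.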

Grafana showcase

If you see a dashboard you like, you can export its configuration and import it into your own Grafana to use with your own data. Copy away!

5 / 12
  • We have several public Grafana servers.
  • If you like any of our graphs, you can copy them.
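
Copying a dashboard can be sketched as follows. This assumes the dashboard was exported as JSON and will be re-created through Grafana's HTTP API, which takes a request body containing a `dashboard` object. The sample export below is hypothetical and heavily trimmed.

```python
import copy
import json


def build_import_payload(exported: dict, overwrite: bool = False) -> dict:
    """Wrap an exported dashboard so it can be created as a new dashboard."""
    dashboard = copy.deepcopy(exported)  # leave the caller's dict untouched
    # A null id asks Grafana to create a new dashboard instead of updating one.
    dashboard["id"] = None
    return {"dashboard": dashboard, "overwrite": overwrite}


# Hypothetical, trimmed export; real exports contain many more keys.
exported = {"id": 42, "uid": "abc123", "title": "Galaxy", "panels": []}
payload = build_import_payload(exported)
print(json.dumps(payload, indent=2))
```

The data source names inside the panels usually need adjusting as well, since they refer to the original server's configuration.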

Galaxy dashboard showing route timings, user counts, job counts, etc.

6 / 12
  • We have built numerous dashboards for monitoring Galaxy.
  • These include scripts, playbooks, and configuration for everything.
  • Here is the EU Galaxy dashboard, showing active users, running and unscheduled jobs, etc.

node detail dashboard with filesystem usage, process states, cpu, memory, load, network, etc.

7 / 12
  • However, sometimes we notice something going wrong with our infrastructure.
  • We use the node detail dashboard to begin our investigation.
  • It gives us a very fast overview of the server.
  • This can help pinpoint issues efficiently.

DB dashboard showing transactions, tuples fetched/modified, and index sizes for each database

8 / 12
  • We also monitor the database heavily.
  • All of this monitoring is built into Telegraf.
  • We need to be able to correlate latency with autovacuums or contention.
  • We monitor table size changes to check for anomalies.
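
One simple way to check table sizes for anomalies is to flag sudden relative growth between consecutive samples. This is a minimal sketch, not the check used on the dashboards: the 50% threshold is an arbitrary assumption and the sample values are made up for illustration.

```python
def flag_size_jumps(sizes, max_growth=0.5):
    """Return indices where a sample grew by more than max_growth (a fraction)
    relative to the previous sample."""
    flagged = []
    for i in range(1, len(sizes)):
        prev, cur = sizes[i - 1], sizes[i]
        if prev > 0 and (cur - prev) / prev > max_growth:
            flagged.append(i)
    return flagged


# Made-up table sizes in MB, one sample per day.
samples = [100, 104, 110, 240, 245]
print(flag_size_jumps(samples))  # prints [3]
```

In practice the same kind of rule would more likely be expressed as an alert condition in Grafana than as a standalone script.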

User statistics page for EU with 23k users, 30k workflows, 400k histories, 13M jobs, and 30M datasets. Additional breakdowns are provided for years of compute time on various clusters, including 1k years on the de.NBI cloud.

9 / 12
  • Our staff often needs to report numbers for their grants.
  • We produced this user statistics dashboard to help them.
  • Now they can answer their own questions, and make their own graphs, without admin help.

cvmfs dashboard showing which repos each server supports in green, and missing ones in white. ~90% of repos are supported

10 / 12
  • We don't just monitor Galaxy though.
  • We also monitor CVMFS, and the availability of repositories on each server.
  • This can give a good view of which repositories are replicated.

keypoints Key points

  • Telegraf provides an easy solution to monitor servers

  • Galaxy can send metrics to Telegraf

  • Telegraf can run arbitrary commands like gxadmin, which provides Influx-formatted output

  • InfluxDB can collect metrics from Telegraf

  • Use Grafana to visualise these metrics, and monitor their values
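
To make "Influx-formatted output" concrete, here is a minimal sketch of a script that prints one metric in InfluxDB line protocol, the kind of text a command run by Telegraf could emit. The measurement, tag, and field names are hypothetical and are not taken from gxadmin.

```python
def escape_key(text: str) -> str:
    """Escape commas, equals signs and spaces in tag keys, tag values and field keys."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_field(value) -> str:
    """Render a field value: booleans, integers (i suffix), floats, quoted strings."""
    if isinstance(value, bool):  # must be checked before int, since bool is an int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns=None) -> str:
    """Build one line: measurement,tag=value field=value [timestamp]."""
    head = measurement.replace(",", "\\,").replace(" ", "\\ ")
    for key in sorted(tags):
        head += f",{escape_key(key)}={escape_key(str(tags[key]))}"
    body = ",".join(f"{escape_key(k)}={format_field(v)}" for k, v in fields.items())
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line_protocol("example_queue", {"state": "running"}, {"count": 7}))
# prints: example_queue,state=running count=7i
```

When the timestamp is omitted, as in the example call, the receiving side is expected to assign the current time.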

11 / 12

Thank you!

This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors!

Galaxy Training Network

This material is licensed under the Creative Commons Attribution 4.0 International License.

12 / 12
