Galaxy Monitoring with Telegraf and Grafana
How to monitor Galaxy with Telegraf
How do I set up InfluxDB
How can I make graphs in Grafana?
How can I best alert on important metrics?
Create several charts
Telegraf, InfluxDB, and Grafana
General purpose tools for monitoring systems and services.
|Telegraf||plugin-driven server agent for collecting & reporting metrics|
|Influxdb||purpose built time series database|
|Grafana||dashboard for beautiful analytics and monitoring|
- Galaxy produces data
- Telegraf consumes and buffers it, before sending it to
- InfluxDB which stores the data
- And Grafana is used to visualise it
- Monitoring in Galaxy is easy to setup.
- Galaxy produces data, which is consumed by telegraf.
- telegrafends data to Influx DB.
- This data is visualized in Grafana.
If you see a dashboard you can export its configuration and put it on your Grafana with your data. Copy away!
- We have several public Grafana servers.
- If you like any of our graphs, you can copy them.
- We have built numerous dashboards for monitoring Galaxy.
- These include scripts and playbooks and configuration for everything.
- Here is EU’s galaxy dashboard showing active users, running and unscheduled jobs, etc.
- However sometimes we notice something going wrong with our infrastructure.
- We use the node detail dashboard to begin our investigation.
- It gives us a very fast overview of the server.
- This can help efficiently pinpoint isuses.
- We also monitor the database heavily.
- All of this monitoring is built into telegraf.
- We need to be able to correlate latency with autovacuums or contention.
- We monitor table size changes to check for anomalies.
- Our staff often needs to report numbers for their grants.
- We produced this user statistics dashboard to help them.
- Now they can answer their own questions, and make their own graphs, without admin help.
- We don’t just monitor Galaxy though.
- We also monitor CVMFS, and the availability of repositories in each server.
- This can give a good view of which repositories are replicated.
- Telegraf provides an easy solution to monitor servers
- Galaxy can send metrics to Telegraf
- Telegraf can run arbitrary commands like `gxadmin`, which provides influx formatted output
- InfluxDB can collect metrics from Telegraf
- Use Grafana to visualise these metrics, and monitor their values