name: inverse
layout: true
class: center, middle, inverse
---

# Using Heterogeneous Compute Resources

---

# What are heterogeneous compute resources?

Differences in:
- Operating system or version
- Users/groups
- Data accessibility
- Administrative control
- Physical location (e.g. different cities)

Galaxy expects:
- One OS and version (so tool dependencies behave consistently)
- A shared filesystem with fixed paths

---

# Example - Australia

![australia_locations.png](../../images/australia_locations.png)

---

# Partial solution - CLI job runner

SSH to the remote resource and submit jobs with CLI tools (`sbatch`, `qsub`, etc.)

Still depends on a shared filesystem

---

# Pulsar

![pulsar_logo.png](../../images/pulsar_logo.png)

Galaxy's remote job management system

* Can run jobs on any(?) OS, including Windows
* Multiple modes of operation to suit different environments

---

# Pulsar - Architecture

* Pulsar server runs on the remote resource (e.g. a cluster head node)
* Galaxy's Pulsar job runner acts as the Pulsar client
* Transport is HTTP or AMQP; messages are encoded as JSON

---

# Pulsar - Architecture

![pulsar_schematic.png](../../images/pulsar_schematic.png)

---

# Pulsar Transports - RESTful

Pulsar server listens over HTTP(S)

Pulsar client initiates connections to the server

Good for:
- Environments where firewalls and open ports are not a concern
- Avoiding external dependencies (no AMQP server needed)

---

# Pulsar Transports - AMQP

Pulsar server and client both connect to an AMQP server

Good for:
- Firewalled/NATed remote compute
- Networks with poor connectivity

---

# Pulsar Transports - Embedded

Galaxy runs the Pulsar server internally

Good for:
- Manipulating paths
- Copying input datasets from a non-shared filesystem

---

# Pulsar - Job file staging

When using the RESTful transport, Pulsar can be configured to *push* or *pull*:

- Push
  - Galaxy sends job inputs and metadata to Pulsar over HTTP
  - Upon a completion signal from Pulsar, Galaxy pulls the outputs from Pulsar over HTTP
- Pull
  - Upon a setup signal, Pulsar pulls job inputs and metadata from Galaxy over HTTP
  - Upon completion, Pulsar pushes the outputs to Galaxy over HTTP

Pulsar can use libcurl for more robust transfers with resume capability

AMQP is pull-only because the Pulsar server does not run an HTTP server

---

# Pulsar - Dependency management

Pulsar does not provide Tool Shed tool dependency management. But:
- It has a dependency resolver configuration similar to Galaxy's
- It can auto-install **conda** dependencies
- It can use containers too!

---

# Pulsar - Job management

Pulsar "managers" provide job running interfaces:
- `queued_python`: Run locally on the Pulsar server
- `queued_drmaa`: Run on a cluster with DRMAA
- `queued_cli`: Run on a cluster with local `qsub`, `sbatch`, etc.
- `queued_condor`: Run on HTCondor

---

# Pulsar Australia

![pulsar_australia.png](../../images/pulsar_australia.png)

---

# Resources

* Pulsar on Read the Docs
  * [https://pulsar.readthedocs.io/en/latest/index.html](https://pulsar.readthedocs.io/en/latest/index.html)
* Pulsar on galaxyproject.org
  * [https://galaxyproject.org/admin/config/pulsar/](https://galaxyproject.org/admin/config/pulsar/)
* Pulsar on GitHub
  * [https://github.com/galaxyproject/pulsar](https://github.com/galaxyproject/pulsar)
* Pulsar Ansible role
  * [https://github.com/galaxyproject/ansible-pulsar](https://github.com/galaxyproject/ansible-pulsar)

---

## Thank you!

This material is the result of collaborative work. Thanks to the [Galaxy Training Network](https://wiki.galaxyproject.org/Teach/GTN) and all the contributors (Nate Coraor, Simon Gladman)!
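
---

# Pulsar - Job file staging (sketch)

The push/pull staging rules can be modeled in a few lines to show which side opens each HTTP connection and which side must therefore accept inbound connections. This is a hypothetical sketch, not Pulsar's API: `Transfer`, `staging_plan` and `inbound_http_sides` are invented names, and the model assumes exactly one input-staging and one output-staging transfer per job.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical model of Pulsar job file staging; not part of Pulsar's API.


@dataclass(frozen=True)
class Transfer:
    initiator: str  # side that opens the HTTP connection
    source: str     # side that currently holds the files
    dest: str       # side that receives the files
    payload: str


def staging_plan(transport: str, mode: str) -> List[Transfer]:
    """Return the assumed input and output transfers for one job."""
    if transport not in ("rest", "amqp"):
        raise ValueError(f"unknown transport: {transport}")
    if mode not in ("push", "pull"):
        raise ValueError(f"unknown mode: {mode}")
    if transport == "amqp" and mode == "push":
        # Galaxy has no Pulsar HTTP endpoint to send files to.
        raise ValueError("AMQP is pull-only: Pulsar does not run an HTTP server")
    if mode == "push":
        # Push: Galaxy drives both transfers.
        return [
            Transfer("galaxy", "galaxy", "pulsar", "inputs and metadata"),
            Transfer("galaxy", "pulsar", "galaxy", "outputs"),
        ]
    # Pull: Pulsar drives both transfers.
    return [
        Transfer("pulsar", "galaxy", "pulsar", "inputs and metadata"),
        Transfer("pulsar", "pulsar", "galaxy", "outputs"),
    ]


def inbound_http_sides(plan: List[Transfer]) -> Set[str]:
    """Sides that must accept incoming HTTP connections (the non-initiating peer)."""
    return {t.source if t.initiator == t.dest else t.dest for t in plan}


if __name__ == "__main__":
    for transport, mode in [("rest", "push"), ("rest", "pull"), ("amqp", "pull")]:
        plan = staging_plan(transport, mode)
        print(transport, mode, sorted(inbound_http_sides(plan)))
```

Under this model, push requires Pulsar to be reachable from Galaxy, while pull only requires Galaxy to be reachable from Pulsar, which is consistent with AMQP (the transport suited to firewalled remote compute) being pull-only.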