name: inverse layout: true class: center, middle, inverse
--- # Galaxy from an administrator point of view .footnote[Tip: press `P` to view the presenter notes] ??? Presenter notes contain extra information which might be useful if you intend to use these slides for teaching. Press `P` again to switch presenter notes off --- !(../images/use-resource-banner.png) --- # Where can Galaxy run? * Cloud (SaaS) - [usegalaxy.org](https://usegalaxy.org) - [Public Galaxy Servers](https://wiki.galaxyproject.org/PublicGalaxyServers) - [Amazon EC2](https://wiki.galaxyproject.org/CloudMan) - Semi-private cloud (e.g.: [Genomics Virtual Lab](https://www.genome.edu.au/), [Jetstream](http://jetstream-cloud.org/)) * Private cloud (build your own Galaxy SaaS) * Cloud (IaaS) - Any cloud * Scalable Local Server - Dedicated or shared compute cluster(s) - Cloud compute resources * Standalone Local Server --- # Choosing where to run [Resource Directory](https://galaxyproject.org/use/) !(../images/options-to-choose.png) --- ## Reasons to Install Your Own Galaxy - You want to run a local [production Galaxy](https://galaxyproject.org/admin/config/performance/production-server/) - You want to develop [Galaxy tools](https://galaxyproject.org/admin/tools/add-tool-tutorial) - You to [Develop Galaxy](https://galaxyproject.org/develop/) itself - [Install](https://galaxyproject.org/admin/tools/add-tool-tutorial) and use tools unavailable on [public Galaxies](https://galaxyproject.org/public-galaxy-servers) - Use sensitive data (e.g. clinical) - Process large datasets that are too big for public Galaxies - [plug-in](https://galaxyproject.org/admin/internals/data-sources) new datasources --- # Software Requirements Required: - Galaxy is written in Python and depends on **Python 2.7** - All major distros in wide circulation have 2.7, *except* RHEL<sup></sup> 6 - See: Software Collections for [RHEL](https://access.redhat.com/solutions/472793), [CentOS](https://wiki.centos.org/AdditionalResources/Repositories/SCL), [Scientific Linux](http://linux.web.cern.ch/linux/scl/) Optional (but not really): - [PostgreSQL](https://galaxyproject.github.io/dagobah-training/2018-oslo/03-production-basics/databases.html) - [Reverse proxy server](https://galaxyproject.github.io/dagobah-training/2018-oslo/03-production-basics/webservers.html) (NGINX, Apache) --- # Hardware Requirements This depends: - What do you intend to run? - Where do you intend to run it? If possible, run the Galaxy server **separate** from Galaxy jobs **Storage** will be the bigger concern --- # Hardware Estimates Based on concurrent user count and assuming separate compute for jobs: Users | Resource estimate ----------|------------------- 1 - 5 | 1 core, 1GB, 10 TB 5 - 20 | 2 cores, 2 GB, 40 TB 20 - 40 | 8 cores, 8 GB, 200 TB 40+ | multiple hosts, 16 GB/host, 500 TB, dedicated DB host Storage is the big variable since it, like compute, is **analysis** and **policy** dependent --- template: left-aligned # On Storage... The Philosophy of Galaxy: - Foster transparency and reproducibility - Data is always created, *never overwritten* - Data is never deleted unless *explicitly instructed* - Even deleted data can be undeleted unless *forcibly purged* Additionally, tools can produce large amounts of transient data while running. --- template: left-aligned # Storage Requirements An "average" NGS analysis (by Anton): **66 GB** 10 users, 10 histories: **> 6 TB** Solutions: - Quotas - Clean up deleted data (aggressively) - Forced removal based on age available --- # Compute Requirements This depends: - What tools will your users be using? - What are their requirements? - In general, the most commonly used tools use a single core - But can use lots of memory! - Some compute-intensive tools use multiple cores usegalaxy.org allocates from **8 GB/core** to **16 GB/core** Connecting Galaxy to clusters/HPC is covered in the advanced section. --- # How is Galaxy deployed? Current: - Ansible / `git clone https://github.com/galaxyproject/galaxy.git` - *Framework dependencies* provided as Python "wheels", fetched at first startup with `pip` - *Tool dependencies* provided as Conda packages or legacy Tool Shed packages Future: - RPM, Debian packages - Galaxy wheel in PyPI - Watch: [Pull Request #921](https://github.com/galaxyproject/galaxy/pull/921) --- # Making Plans **Before** deploying your first Galaxy server: - Get PostgreSQL (you do not want to switch later) - Figure out where Galaxy will be stored - Make sure it will be accessible to any eventual compute - Figure out where data will be stored - Make sure it will be accessible to any eventual compute --- # Deployment Best Practices **Use configuration management** - [Ansible](http://docs.ansible.com) ([tutorial](https://galaxyproject.github.io/dagobah-training/2018-oslo/14-ansible/ansible-introduction.html)) - [Chef](https://www.chef.io/) - [Puppet](https://puppet.com) - [SaltStack](https://saltstack.com) - [CFEngine](https://cfengine.com) **Use configuration management** but if you don't, *record every change you make somehow.* - Large, complex deployments grow organically - If you don't know what you did you can't do it again - "My context switching penalty is high and my process isolation is not what it used to be." - Elon Musk --- template: left-aligned # System Administration Best Practices Take security seriously - OS security best practices (not covered yet) - **Update Galaxy when security updates are released** - We put a lot of effort into these! Privilege separate if you can - Separate code and job/data ownership Write protect Galaxy and data if you can - Read-only cluster mounts Back up everything (except that which is managed by configuration management) --- ## Related tutorials --- ## Thank you! This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://wiki.galaxyproject.org/Teach/GTN) and all the contributors!