name: inverse layout: true class: center, middle, inverse
--- # Deployment and Platform options --- ### <i class="fa fa-question-circle" aria-hidden="true"></i><span class="visually-hidden">question</span> Questions - What options to deploy Galaxy do I have? - Which platforms are supported by Galaxy? - What requirements does Galaxy have? --- ### <i class="fa fa-bullseye" aria-hidden="true"></i><span class="visually-hidden">objectives</span> Objectives - Learn about different options about Galaxy deployment. - Make an educated decision about your preferred deployment model. --- layout: true name: left-aligned class: left, middle --- layout: true class: center, middle --- # Platform Options and Requirements .footnote[\#usegalaxy / @galaxyproject] --- # Where can Galaxy run? * Cloud (SaaS) - [usegalaxy.org](https://usegalaxy.org) - [Public Galaxy Servers](https://wiki.galaxyproject.org/PublicGalaxyServers) - [Amazon EC2](https://wiki.galaxyproject.org/CloudMan) - Semi-private cloud (e.g.: [Genomics Virtual Lab](https://www.genome.edu.au/), [Jetstream](http://jetstream-cloud.org/)) * Private cloud (build your own Galaxy SaaS) * Cloud (IaaS) - Any cloud * Scalable Local Server - Dedicated or shared compute cluster(s) - Cloud compute resources * Standalone Local Server --- # Choosing where to run __Public Prebuilt SaaS (usegalaxy.org, public servers)__ - Quickest to use today, but probably not why you're here... - Institutional/Protected data a concern - Not covered in this training __Private Prebuilt SaaS (EC2, GVL, Jetstream) or build your own__ - Great choices for people needing access to compute for a fixed time analysis - Not as conducive to collaboration, publishing - [tutorial](https://galaxyproject.github.io/dagobah-training/2018-oslo/18-clouds/clouds.html) __Build your own Galaxy SaaS__ - Personalized Galaxy instances for beginner-to-intermediate users - Requires a lot of infrastructure building - [tutorial](https://galaxyproject.github.io/dagobah-training/2018-oslo/18-clouds/clouds.html) --- # Choosing where to run __Scalable Local Server__ - Permanent Galaxy server - Flexible compute scalability - Full privacy control - [tutorial](http://galaxyproject.github.io/training-material/topics/admin/tutorials/connect-to-compute-cluster/slides.html) __Standalone Local Server__ - Permanent Galaxy server - Full privacy control - Should only consider this in cases of expected light usage - Get a beefy server --- # Software Requirements Required: - Galaxy is written in Python and depends on **Python 2.7** - All major distros in wide circulation have 2.7, *except* RHEL<sup></sup> 6 - See: Software Collections for [RHEL](https://access.redhat.com/solutions/472793), [CentOS](https://wiki.centos.org/AdditionalResources/Repositories/SCL), [Scientific Linux](http://linux.web.cern.ch/linux/scl/) Optional (but not really): - [PostgreSQL](https://galaxyproject.github.io/dagobah-training/2018-oslo/03-production-basics/databases.html) - [uWSGI](https://galaxyproject.github.io/dagobah-training/2018-oslo/10-uwsgi/uwsgi.html) (will come with Galaxy from 18.01 onwards) - [Reverse proxy server](https://galaxyproject.github.io/dagobah-training/2018-oslo/03-production-basics/webservers.html) (nginx, Apache) .footnote[<sup></sup> Point of order: Unless stated otherwise, "RHEL" refers to RHEL and derivatives (CentOS, Scientific Linux, etc.)] --- # What can run Galaxy UNIX-like operating system: - **Linux (any distribution)** - **OS X / macOS** - Windows under MinGW (maybe?) - Other architectures (maybe?) --- template: left-aligned # Hardware Requirements This depends: - What do you intend to run? - Where do you intend to run it? If possible, run the Galaxy server **separate** from Galaxy jobs **Storage** will be the bigger concern --- # Hardware Estimates Based on concurrent user count and assuming separate compute for jobs: Users | Resource estimate ----------|------------------- 1 - 5 | 1 core, 1GB, 10 TB 5 - 20 | 2 cores, 2 GB, 40 TB 20 - 40 | 8 cores, 8 GB, 200 TB 40+ | multiple hosts, 16 GB/host, 500 TB, dedicated DB host Storage is the big variable since it, like compute, is **analysis** and **policy** dependent --- template: left-aligned # On Storage... The Philosophy of Galaxy: - Foster transparency and reproducibility - Data is always created, *never overwritten* - Data is never deleted unless *explicitly instructed* - Even deleted data can be undeleted unless *forcibly purged* Additionally, tools can produce large amounts of transient data while running. --- template: left-aligned # Storage Requirements An "average" NGS analysis (by Anton): **66 GB** 10 users, 10 histories: **> 6 TB** Solutions: - Quotas - Clean up deleted data (aggressively) - Forced removal based on age available --- template: left-aligned # Compute Requirements This depends: - What tools will your users be using? - What are their requirements? - In general, the most commonly used tools use a single core - But can use lots of memory! - Some compute-intensive tools use multiple cores usegalaxy.org allocates from **8 GB/core** to **16 GB/core** Connecting Galaxy to clusters/HPC is covered in the advanced section. --- # Deployment Options and Best Practices .footnote[\#usegalaxy / @galaxyproject] --- template: left-aligned # How is Galaxy deployed? Current: - `git clone https://github.com/galaxyproject/galaxy.git` - *Framework dependencies* provided as Python "wheels", fetched at first startup with `pip` - *Tool dependencies* provided as Conda packages or legacy Tool Shed packages Future: - RPM, Debian packages - Galaxy wheel in PyPI - Watch: [Pull Request #921](https://github.com/galaxyproject/galaxy/pull/921) --- # Making Plans **Before** deploying your first Galaxy server: - Get PostgreSQL (you do not want to switch later) - Figure out where Galaxy will be stored - Make sure it will be accessible to any eventual compute - Figure out where data will be stored - Make sure it will be accessible to any eventual compute --- # Deployment Best Practices **Use configuration management** - [Ansible](http://docs.ansible.com) ([tutorial](https://galaxyproject.github.io/dagobah-training/2018-oslo/14-ansible/ansible-introduction.html)) - [Chef](https://www.chef.io/) - [Puppet](https://puppet.com) - [SaltStack](https://saltstack.com) - [CFEngine](https://cfengine.com) **Use configuration management** but if you don't, *record every change you make somehow.* - Large, complex deployments grow organically - If you don't know what you did you can't do it again - "My context switching penalty is high and my process isolation is not what it used to be." - Elon Musk --- template: left-aligned # System Administration Best Practices Take security seriously - OS security best practices (not covered yet) - **Update Galaxy when security updates are released** - We put a lot of effort into these! Privilege separate if you can - Separate code and job/data ownership Write protect Galaxy and data if you can - Read-only cluster mounts Back up everything (except that which is managed by configuration management) --- # Galaxy Server Styles - Public, anonymous allowed - Public, registration conditionally required (e.g. usegalaxy.org) - Public, self registration required - Public, admin registration required - Private, all of the above - Private, registration/access controlled externally (upstream proxy) - Private, registration/access controlled externally (Galaxy pluggable auth) --- ### <i class="fa fa-key" aria-hidden="true"></i><span class="visually-hidden">keypoints</span> Key points - Galaxy is scalable from personal computers to huge HPC and cloud-instances. - Amount of expected users and storage capabilities does have a big impact on the deployment. --- ## Thank you! This material is the result of a collaborative work. Thanks the [Galaxy Training Network](https://wiki.galaxyproject.org/Teach/GTN) and all the contributors (Nate Coraor, Björn Grüning) !