name: inverse layout: true class: center, middle, inverse
# Galaxy Tool Management with Ephemeris
Marius van den Beek
Press `P` to view the presenter notes |
Use arrow keys to move between slides
??? Presenter notes contain extra information which might be useful if you intend to use these slides for teaching. Press `P` again to switch presenter notes off Press `C` to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other. Useful when presenting. --- ### <i class="far fa-question-circle" aria-hidden="true"></i><span class="visually-hidden">question</span> Questions - How are tools configured on a Galaxy instance? - What is the Galaxy Tool Shed? - How are Galaxy tools installed? - What is ephemeris and how can it be used to manage tools on a Galaxy instance? --- ## Galaxy tools * A Galaxy `tool` or `wrapper` is an XML file describing how some software program works * allows Galaxy to display the tool interface and execute the software * describes all software requirements, tests, inputs and outputs, help text, citations * The underlying software packages needed to execute the tool command are called `requirements` or `dependencies` * A toolshed `repository` is a code archive in Tool Shed containing Galaxy tool(s) ??? A Galaxy `tool` or `wrapper` is an XML file describing how some software program works The wrapper xml file contains definitions of the tool input form and instructions to translate form entries into a command to execute the tool Underlying software packages could be packages like samtools or biopython. A repository in the Galaxy Tool Shed is a versioned code archive containing one or more Galaxy tools --- ## The tool panel .left-column50[ ![Tool panel](../../images/tool-management-02-tool-panel.png) ] .right-column50[ - Panel on left-hand side of Galaxy UI - Contains Galaxy tools organised into sections - Some tools are distributed together with Galaxy ("built-in tools") - Tools can be installed from the Galaxy Tool Shed. - Customisable: admins can choose which tools are installed and how the tool panel is organised ] ??? 
The tool panel on the left hand side of a Galaxy site contains all of the tools available on that Galaxy instance, arranged into sections. In this picture the Text Manipulation section has been expanded. There are tools that come with the galaxy code such as the uploader, but the vast majority of tools on a public galaxy have been installed from the toolshed. The contents and layout of the tool panel are customisable. --- class: left ## Tool configuration files `tool_conf.xml` (default `tool_conf.xml.sample`) - Contains built-in galaxy tools (tools in galaxy codebase) - Can contain manually added tools `shed_tool_conf.xml` - Contains tools downloaded from the tool shed - Managed by galaxy `integrated_tool_panel.xml` - Contains all tools from `tool_conf.xml` and `shed_tool_conf.xml` - Automatically generated by Galaxy - Can be edited to change the layout of the tool panel ??? The contents of the tool panel are defined by three configuration files The tool_conf file contains galaxy built in tools and manually added tools The shed_tool_conf file is managed by galaxy and contains all tools installed from the tool shed The integrated_tool_panel file contains all of the tools from tool_conf and shed_tool_conf and can be edited to change the layout of the tool panel --- class: left `tool_conf.xml` ```xml <toolbox monitor="true"> <section id="getext" name="Get Data"> <tool file="data_source/upload.xml"/> <tool file="data_source/ucsc_tablebrowser.xml"/> ... 
``` `shed_tool_conf.xml` ```xml <toolbox tool_path="../shed_tools"> <!-- path to installed repositories --> <section id="assembly" name="Assembly" version=""> <tool file="toolshed.g2.bx.psu.edu/repos/iuc/shovill/196a599ec43d/shovill/shovill.xml" guid="toolshed.g2.bx.psu.edu/repos/iuc/shovill/shovill/0.8.0"> <tool_shed>toolshed.g2.bx.psu.edu</tool_shed> <repository_name>shovill</repository_name> <repository_owner>iuc</repository_owner> <installed_changeset_revision>196a599ec43d</installed_changeset_revision> <id>toolshed.g2.bx.psu.edu/repos/iuc/shovill/shovill/0.8.0</id> <version>0.8.0</version> </tool> ... ``` ??? The tool_conf file contains paths to local tool wrapper XML files. Each tool element is within a section, telling Galaxy where to put the tool in the panel. The shed_tool_conf file also contains paths to tool wrapper XML files where they are installed (a complex directory structure). It also contains some metadata for each tool. --- ## Tool Shed - Galaxy "App Store" - It is a free service that hosts repositories containing Galaxy tools. - The Tool Shed is a hosting (not a development) platform. - Each repository should link to its development repository. - https://toolshed.g2.bx.psu.edu/ ??? The tool shed is Galaxy's app store. It's a free service that hosts repositories containing Galaxy tools and can be found at this URL. It's not a development platform: tools are usually maintained in open source GitHub repos and uploaded to the toolshed. --- #####https://toolshed.g2.bx.psu.edu/ ![Toolshed](../../images/tool-management-01-toolshed.png) ??? The Galaxy tool shed has thousands of tools (almost 8000 as of October 2020). You can go to this site to browse toolshed tools by category, or search for them by name. --- class: left ## Ways to add tools You can add tools to Galaxy either * Manually - useful for tool development. * From the Tool Shed * Through the admin UI in Galaxy * Using **ephemeris** (recommended) ??? Tools can be added manually or from the toolshed. 
Toolshed tools can be installed through the admin UI but using ephemeris is recommended. --- ## How to add tools manually - To add a local tool by hand: - Add an entry to `tool_conf.xml` pointing to the local tool's XML file - if `tool_conf.xml` does not exist it can be copied from `tool_conf.xml.sample` - To add local tools with the [galaxyproject.galaxy Ansible role][galaxy-role]: - Set `galaxy_local_tools` to the local tool paths - Run the playbook - Tool dependencies need to be installed separately, unless `conda_auto_install: true` is set in `galaxy.yml` (not recommended for production) [galaxy-role]: https://galaxy.ansible.com/galaxyproject/galaxy ??? Tools can be added manually to the tool_conf.xml file. If this file doesn't exist it can be copied from the default file tool_conf.xml.sample. The Galaxy Ansible role also has a variable, `galaxy_local_tools`, for adding local tools. The dependencies need to be installed separately. --- ## How to install Tool Shed tools through the UI - From the Galaxy admin UI, select "Tool Management > Install and Uninstall" - Find the repository with the tool(s) you want to install - Install the tool repository into Galaxy ??? An administrator can look up any tool from the main toolshed in the admin panel and click the 'Install' button. --- ## Advantages of using the Tool Shed - Installing from the Tool Shed takes care of - dependencies - reference data tables - configuration files - Multiple versions ("installable revisions") of a repository can be installed to preserve reproducibility ??? Installing from the tool shed will install tool dependencies. This is typically a conda environment containing every package that the tool requires. A toolshed tool might have multiple revisions. This is important for reproducibility. If you use a tool in an analysis on a public Galaxy server it will be there forever. 
If you need to rerun your analysis in a year's time, the tool you have used will still be there even if a newer revision of the tool has been installed. --- ## Which Tool Shed? - The Main Tool Shed ( https://toolshed.g2.bx.psu.edu ) serves all Galaxies worldwide. - Everybody can create a repository. - Repositories are public, including their whole history. - The Test Tool Shed ( https://testtoolshed.g2.bx.psu.edu ) can be used for repositories not yet production-ready. - Local sheds can be run e.g. for private or custom-licensed tools. - The list of available Tool Sheds for a Galaxy instance is defined in `tool_sheds_conf.xml`: .reduce90[ ```xml <?xml version="1.0"?> <tool_sheds> <tool_shed name="Galaxy Main Tool Shed" url="https://toolshed.g2.bx.psu.edu/"/> <!-- Test Tool Shed should be used only for testing purposes. <tool_shed name="Galaxy Test Tool Shed" url="https://testtoolshed.g2.bx.psu.edu/"/> --> </tool_sheds> ``` ] ??? Repositories in the test tool shed are also public. The toolshed is a web app backed by a database and anybody can run one, but we discourage running a local Tool Shed. By default, Galaxy only accepts tools from the main toolshed, but `tool_sheds_conf.xml` also contains a commented-out entry for the test toolshed. --- ## What happens when installing a repo from the Tool Shed * The repository is downloaded * If needed, the tool's dependencies are installed * If needed, reference data tables are installed * An entry for each tool is created in the Galaxy database * The tools are added to `shed_tool_conf.xml` ??? The repository is downloaded. The tool's dependencies are installed if needed (they may already be there). If needed, reference data tables are installed. An entry for each tool is created in the Galaxy database (or the tool install database, depending on the configuration of Galaxy). The tools are added to `shed_tool_conf.xml`. --- ## How to install with Ephemeris * Find the repository you want to install. 
* Run `shed-tools install` with the repository details ??? Find a repository to install from the tool shed and install it with the ephemeris `shed-tools` command. --- # Ephemeris Small Python library for Galaxy Management * Can install: - Tools - Reference data - Workflows - Data libraries * Can also test tools * `$ pip install ephemeris` * https://github.com/galaxyproject/ephemeris * https://ephemeris.readthedocs.io ??? Ephemeris is a Python library for Galaxy management. It can be used to install tools, reference data, workflows and data libraries onto a Galaxy instance. It can also be used to run tool tests. Ephemeris can be installed with pip. Ephemeris manages tools through the Galaxy API, so there is no need to run ephemeris commands on the server that runs Galaxy (though you can). --- ## Get installed tool list for a Galaxy instance ```console get-tool-list [-g GALAXY] [-u USER] [-p PASSWORD] [-a API_KEY] [-h] [-v] -o OUTPUT [--include_tool_panel_id] [--skip_tool_panel_name] [--skip_changeset_revision] [--get_data_managers] # admin only [--get_all_tools] # admin only ``` ??? Ephemeris can be used to get a list of installed tools for any public Galaxy instance. An API key is not required for this but some options are not available unless an admin API key is provided. --- ```yaml tools: - name: 'column_maker' owner: 'devteam' tool_panel_section_label: 'Text Manipulation' revisions: - '464b9305180e' # 1.2.0 tool_shed_url: 'toolshed.g2.bx.psu.edu' - name: 'bwa' owner: 'devteam' revisions: - '051eba708f43' # 0.7.15.2 - '4d82cf59895e' # 0.7.16.2 tool_panel_section_label: 'Mapping' tool_shed_url: 'toolshed.g2.bx.psu.edu' - name: 'tabular_to_fasta' owner: 'devteam' revisions: - '0b4e36026794' # v1.1.0 tool_panel_section_label: 'Convert Formats' tool_shed_url: 'toolshed.g2.bx.psu.edu' ``` ??? The output contains all of the information we would need to install tools on a different Galaxy instance. The revisions correspond to tool versions. 
The comment next to each revision hash gives the matching tool version. There are two different revisions of bwa corresponding to two different versions. --- ## Install/Update/Test tools ```console shed-tools install [-h] [-v] [-g GALAXY] [-u USER] [-p PASSWORD] [-a API_KEY] [--log_file LOG_FILE] [-t TOOL_LIST_FILE] [-y TOOL_YAML] [--name NAME] [--owner OWNER] [--revisions [REVISIONS [REVISIONS ...]]] [--toolshed TOOL_SHED_URL] [--install_tool_dependencies] [--skip_install_resolver_dependencies] [--skip_install_repository_dependencies] [--test] [--test_existing] [--test_json TEST_JSON] [--test_user_api_key TEST_USER] [--test_user TEST_USER] [--section TOOL_PANEL_SECTION_ID] [--section_label TOOL_PANEL_SECTION_LABEL] [--latest] ``` * Ansible role: https://github.com/galaxyproject/ansible-galaxy-tools * Sample playbook: https://github.com/afgane/galaxy-tools-playbook ??? A Galaxy administrator can install tools by providing their administrator API key. They can specify the name, owner and section label, or provide a YAML list of tools (TOOL_LIST_FILE). --- ## Example: Installing circos .left[ (1) ```console shed-tools install -g <galaxy url> -a <api key> \ --name circos --owner iuc --section_label 'Graph/Display Data' ``` (2) ```console shed-tools install -g <galaxy url> -a <api key> -t tools.yml ``` `tools.yml` ```yaml tools: - name: circos owner: iuc tool_panel_section_label: Graph/Display Data ``` ] ??? shed-tools can be used to install a tool from command line arguments or from a YAML file containing one or more tools. These two examples are equivalent to each other. The advantage of the second approach is that many tools can be listed in tools.yml and installed at the same time. The argument 'revisions' can also be provided to install a specific revision or more than one revision of the repository. In the absence of a 'revisions' argument, shed-tools will install the latest revision of the tool. 
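For presenters who want to show how the second approach scales, here is a minimal Python sketch that renders a tool list in the `tools.yml` layout shown on the slide and assembles the matching `shed-tools install` command. It only builds text and an argument list; it does not call ephemeris or contact a Galaxy server. The helper names `render_tool_list` and `build_install_command` and the URL `https://galaxy.example.org` are hypothetical, and the hand-written YAML assumes that labels contain no single quotes.

```python
import shlex


def render_tool_list(tools):
    """Render tool dicts as YAML text in the tools.yml layout from the slide.

    Hypothetical helper: written by hand to avoid a YAML dependency, so it
    assumes that section labels contain no single quotes.
    """
    lines = ["tools:"]
    for tool in tools:
        lines.append(f"- name: {tool['name']}")
        lines.append(f"  owner: {tool['owner']}")
        lines.append(f"  tool_panel_section_label: '{tool['tool_panel_section_label']}'")
        revisions = tool.get("revisions")
        if revisions:
            # Optional: pin one or more revisions instead of taking the latest.
            lines.append("  revisions:")
            lines.extend(f"  - '{revision}'" for revision in revisions)
    return "\n".join(lines) + "\n"


def build_install_command(galaxy_url, api_key, tool_list_path):
    """Build the argument list for example (2); uses only the -g, -a and -t flags."""
    return ["shed-tools", "install", "-g", galaxy_url, "-a", api_key, "-t", tool_list_path]


tools = [
    {"name": "circos", "owner": "iuc", "tool_panel_section_label": "Graph/Display Data"},
    {
        "name": "bwa",
        "owner": "devteam",
        "tool_panel_section_label": "Mapping",
        "revisions": ["051eba708f43", "4d82cf59895e"],
    },
]

tool_list_yaml = render_tool_list(tools)
# Placeholder URL and API key: replace both before running the command for real.
command = build_install_command("https://galaxy.example.org", "<api key>", "tools.yml")

print(tool_list_yaml)
print(shlex.join(command))
```

Saving the printed YAML as `tools.yml` and running the printed command with a real Galaxy URL and admin API key would correspond to example (2) on the slide.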
--- ## Test tools ```console shed-tools test [-h] [-v] [--log_file LOG_FILE] [-g GALAXY] [-u USER] [-p PASSWORD] [-a API_KEY] [-t TOOL_LIST_FILE] [-y TOOL_YAML] [--name NAME] [--owner OWNER] [--revisions [REVISIONS [REVISIONS ...]]] [--toolshed TOOL_SHED_URL] [--test_json TEST_JSON] [--test_user_api_key TEST_USER_API_KEY] [--test_user TEST_USER] [--parallel_tests PARALLEL_TESTS] ``` ??? A good tool comes with tests: instructions within the wrapper to run the tool with test input and see whether the tool produces the expected output. You need to be an administrator to install tools but any Galaxy user with an API key can run tool tests. --- ## Tool test output - List of tests that have passed and tests that have failed - `tool_test_output.json` file with details of all of the test jobs including their standard outputs - planemo (https://planemo.readthedocs.io/en/latest/) can be used to generate test reports from `tool_test_output.json` (`pip install planemo`) ??? Running tool tests with ephemeris will yield - a list of tool tests that have passed and failed and - a more detailed file of data from the test jobs that is useful for debugging. The Python library planemo (also part of the Galaxy project) can be used to generate a user-friendly report from the JSON data. --- ## List tools from a Galaxy workflow ```console workflow-to-tools -w WORKFLOW_FILES [WORKFLOW_FILES ...] A space separated list of galaxy workflow description files in json format -o OUTPUT_FILE The output file with a yml tool list -l PANEL_LABEL The name of the panel where the tools will show up in Galaxy. If not specified: "Tools from workflows" ``` ??? From a downloaded workflow file, workflow-to-tools generates a YAML list of all toolshed tools required to run that workflow. --- ## Setup data libraries ```console setup-data-libraries [-h] [-v] [-g GALAXY] [-u USER] [-p PASSWORD] [-a API_KEY] -i INFILE [--training] [--legacy] ``` ??? 
An administrator can use the ephemeris command setup-data-libraries to upload shared data files --- ```yaml destination: type: library name: "Cool Training Library" description: "A longer description." synopsis: "Optional - does anyone ever set this?" items: - name: "Test Folder 1" description: "Description of what is in Test Folder 1" items: - url: https://example.org/cliques-high-representatives.fa src: url ext: fasta info: "A cool longer description." dbkey: "hg19" - name: "Test data segmentation-fold" items: - url: https://example.org/tests/test-data/workflow-test_cd-box_kturns.xml name: workflow-test_cd-box_kturns.xml info: Downloaded from https://example.org/ src: url ext: xml ``` ??? A yaml file describing two folders with one file each to upload to galaxy's shared data. Contains instructions to download the file contents from public URLs --- ## Wait for Galaxy ```console $ galaxy-wait -g http://localhost:8080 -v  Galaxy not up yet... HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /api/version (Caused  Galaxy not up yet... HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /api/version (Caused  Galaxy not up yet... HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /api/version (Caused  Galaxy not up yet... HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /api/version (Caused  Galaxy not up yet... HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /api/version (Caused  Galaxy not up yet... HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /api/version (Caused Galaxy Version: 17.05 ``` ??? galaxy-wait sends an API request to a galaxy server to check whether it is running and able to accept the request. If the server is ready it will return straight away. If not it will keep sending requests. 
This is useful if you want to run any of the other commands such as shed-tools install and you don't know whether Galaxy will be ready. --- # More on Tool Management - Repository structure - Reference data - Dependencies --- class: normal ### Simple tool shed repository (`remove_beginning`) ``` . ├── remove_beginning.pl # optional accompanying script ├── remove_beginning.xml # tool wrapper ├── .shed.yml # metadata file └── test-data # subdirectory for test data ├── 1.bed # test input file └── eq-removebeginning.dat # test output file ``` ??? A tool shed repository containing one tool: remove_beginning. The repository contains - a Perl script that is executed from code in the wrapper - test input and output for the tool's tests - a metadata file for the toolshed. --- class: normal ## `.shed.yml` file Contains the repository's metadata. ```yaml categories: - Text Manipulation description: Remove lines from the beginning of a file. long_description: | This tool removes the specified number of lines from the beginning of the input dataset. name: remove_beginning owner: devteam remote_repository_url: https://github.com/galaxyproject/tools-devteam/tree/master/tools/remove_beginning type: unrestricted ``` ??? The tool name and owner are set in the metadata file. The file also contains the development URL for the tool. The development URL is displayed in the toolshed as a link to the tool's files within its development environment. This is the GitHub repo you would go to in order to raise an issue with the tool or make a pull request to improve the tool. --- class: normal ## Complex tool shed repository (`varscan`) ``` . ├── macros.xml # defines tool macros ├── .shed.yml ├── test-data │ ├── control_chrM.bam │ ├── fasta_indexes.loc # Loc file for tests ... 
│ └── varscan_mpileup_result1.vcf ├── tool-data │ └── fasta_indexes.loc.sample # Sample loc file ├── tool_data_table_conf.xml.sample # Sample data table ├── tool_data_table_conf.xml.test # Data table for tests ├── varscan_copynumber.xml # wrapper for tool #1 ├── varscan_mpileup.xml # wrapper for tool #2 ├── varscan.py └── varscan_somatic.xml # wrapper for tool #3 ``` ??? This is a more complex toolshed repository containing three tool wrappers related to the same software. Installing this repository will add three tools to the tool panel. --- ## Data tables and loc files - Several tools need to access **reference data**, e.g. genome sequences or indexes for aligners - A *loc file* is a tab-separated file containing metadata and paths for a set of reference data - A *data table* describes the columns of a loc file used by a tool - Installing CVMFS provides access to a large amount of reference data. There are also data manager tools for installing reference data. - You can read more about this in the [Reference Genomes in Galaxy slides](../reference-genomes/slides.html) - See also: [Reference Data with CVMFS](../cvmfs/slides.html) ??? Sometimes tools will need reference data such as genomes. Loc files and data tables are used to link tools with reference files. If CVMFS is installed a lot of your reference data needs will be taken care of. There are data manager tools in the toolshed for installing reference data for tools. These can be run from the admin panel or using ephemeris. --- ## Suite repositories - Suites can be used to split up a set of tools into multiple repositories - A suite is a single repository that 'depends' on many others - When you install the suite, all 'dependency repositories' will be installed too - With a tweak to `.shed.yml`, [Planemo](https://planemo.readthedocs.io) can upload a directory of tools to the Tool Shed as a suite of separate tools - Examples: suite_samtools, suite_hicexplorer ??? 
There are also repositories in the Tool Shed that are suite repositories. Suite repositories can be used to install multiple tool shed repositories at once. For example: there are many tools associated with samtools owned by the IUC, such as samtools_view or samtools_mpileup. Installing suite_samtools will result in all of these samtools repositories being installed. --- ## Tool dependency resolution * We aim to make Galaxy resolver-independent, with a preference for [conda](https://conda.io/) and containers * Which resolver is used for a tool's dependencies is determined at runtime according to the order specified in `dependency_resolvers_conf.xml`. ```xml <dependency_resolvers> <tool_shed_packages /> <galaxy_packages /> <conda /> <galaxy_packages versionless="true" /> <conda versionless="true" /> <!-- other resolvers <lmod /> <lmod versionless="true" /> <modules modulecmd="/opt/Modules/3.2.9/bin/modulecmd" /> <modules modulecmd="/opt/Modules/3.2.9/bin/modulecmd" versionless="true" default_indicator="default" /> <tool_shed_tap /> <homebrew /> --> </dependency_resolvers> ``` [Full documentation](https://docs.galaxyproject.org/en/master/admin/dependency_resolvers.html) ??? Without going into detail, at runtime Galaxy will look for installed dependencies in an order determined by the `dependency_resolvers_conf.xml` file. The file shown on this slide is the default configuration. Given a set of requirements, Galaxy will look first for an installed toolshed package that meets those requirements. If Galaxy finds this it will source the package and look no further. If Galaxy does not find this it will look for - a Galaxy package with the required version, - followed by a conda package with the required version, - followed by a Galaxy package of any version, - followed by a conda package of any version. Typically the packages to run the tool will be conda packages. 
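The lookup order described above can be illustrated with a small Python model. This is a sketch of the "first resolver that matches wins" idea only, not Galaxy's implementation: the `Resolver` class, the `resolve` function and the `installed` mapping are hypothetical names, and the versionless fallback simply picks the lexically last installed version as a stand-in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resolver:
    """One active entry from dependency_resolvers_conf.xml (simplified model)."""
    kind: str
    versionless: bool = False


# Same order as the uncommented entries in the XML on the slide.
RESOLVER_ORDER = [
    Resolver("tool_shed_packages"),
    Resolver("galaxy_packages"),
    Resolver("conda"),
    Resolver("galaxy_packages", versionless=True),
    Resolver("conda", versionless=True),
]


def resolve(name, version, installed):
    """Return (resolver, version) for the first resolver that matches, else None.

    `installed` is a hypothetical mapping of resolver kind to
    {package name: set of installed versions}.
    """
    for resolver in RESOLVER_ORDER:
        versions = installed.get(resolver.kind, {}).get(name, set())
        if resolver.versionless:
            if versions:
                # Any installed version is accepted; which one is picked here
                # is an arbitrary choice made for the sake of the example.
                return resolver, sorted(versions)[-1]
        elif version in versions:
            return resolver, version
    return None


installed = {
    "conda": {"samtools": {"1.9"}},
    "galaxy_packages": {"bwa": {"0.7.15"}},
}

exact = resolve("samtools", "1.9", installed)    # exact version found by the conda resolver
fallback = resolve("bwa", "0.7.17", installed)   # only a versionless Galaxy package matches
missing = resolve("circos", "0.69", installed)   # nothing installed, so no resolver matches

print(exact, fallback, missing, sep="\n")
```

Reordering `RESOLVER_ORDER` in this model has the same effect as reordering the entries in the XML file: an earlier entry that matches always takes precedence over a later one.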
--- ## Using containers for Tool Dependencies - Galaxy can also use **Docker** or **Singularity** containers to resolve dependencies - You can read more about this in the [Tool Dependencies and Containers slides](../../../dev/tutorials/containers/slides.html) ??? A more recent development in Galaxy is the use of Docker or Singularity containers to resolve dependencies and there is some further reading on this. --- ### <i class="fas fa-key" aria-hidden="true"></i><span class="visually-hidden">keypoints</span> Key points - The Galaxy Tool Shed contains thousands of tools that can be installed on a Galaxy instance - Galaxy administrators can choose which tools are installed and how they are arranged - Ephemeris can be used to manage tools on a Galaxy instance - Tool installation with ephemeris is best practice and allows for the automation of tool management tasks --- ## Thank You! This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!
Marius van den Beek
This material is licensed under the Creative Commons Attribution 4.0 International License