name: inverse layout: true class: center, middle, inverse
--- # Reference Data with CVMFS .footnote[Tip: press `P` to view the presenter notes] ??? Presenter notes contain extra information which might be useful if you intend to use these slides for teaching. Press `P` again to switch presenter notes off Press `C` to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other. Useful when presenting. --- # Built in Data ![List_of_data.png](../../images/i06-List_of_data.png) --- # Data, what data? .large[ * Some genomes are large! Human, Mouse, Coral * Some tools require indices of the genomes. * The indices take a long time to build! * Better to pre-build the indices. ] --- # Data schematics in Galaxy ![schematic](../../images/data_managers_schematic_overview.png) --- # Using reference data in a tool #### bwa.xml ```xml <conditional name="reference_source"> <param name="reference_source_selector" type="select" label="Will you select a reference genome from your history or use a built-in index?" help="Built-ins were indexed using default options. See 'Indexes' section of help below"> <option value="cached">Use a built-in genome index</option> <option value="history">Use a genome from history and build index</option> </param> <when value="cached"> <param name="ref_file" type="select" label="Using reference genome" help="Select genome from the list"> <options from_data_table="bwa_mem_indexes"> <filter type="sort_by" column="2" /> <validator type="no_options" message="No indexes are available" /> </options> <validator type="no_options" message="A built-in reference genome is not available for the build associated with the selected input file"/> </param> </when> <when value="history"> ``` --- # Where are the data tables? #### tool_data_table_conf.xml (Usually located in `galaxy/config/`) ```xml <tables> <!-- Locations of indexes in the BWA mapper format --> <table name="bwa_mem_indexes" comment_char="#" allow_duplicate_entries="False"> <columns>value, dbkey, name, path</columns> <file path="tool-data/bwa_index.loc" /> </table> </tables> ``` --- # "loc" files - Short for location! bwa_index.loc ```text # #<unique_build_id> <dbkey> <display_name> <file_path> # bosTau7 bosTau7 Cow (bosTau7) /genomes/bosTau7/bwa_mem_index/bosTau7/bosTau7.fa ce10 ce10 C. elegans (ce10) /genomes/ce10/bwa_mem_index/ce10/ce10.fa danRer7 danRer7 Zebrafish (danRer7) /genomes/danRer7/bwa_mem_index/danRer7/danRer7.fa dm3 dm3 D. melanogaster Apr. 2006 (BDGP R5/dm3) (dm3) /genomes/dm3/bwa_mem_index/dm3/dm3.fa hg19 hg19 Human (hg19) /genomes/hg19/bwa_mem_index/hg19/hg19.fa hg38 hg38 Human (hg38) /genomes/hg38/bwa_mem_index/hg38/hg38.fa mm10 mm10 Mouse (mm10) /genomes/mm10/bwa_mem_index/mm10/mm10.fa ``` --- # Some Problems! .large[ * Time consuming! * ~30 minutes work just to add a new genome to 1 tool! * Administrator needs to know: * how to index **every** tool * expected format of the reference data * format of the .loc file * Some parts solved by Data Managers * But there's an easier way! ] --- # There's a lot of reference data .large[ (and it's hard to keep up with) ] ![ref_data_prob_flow.png](../../images/ref_data_prob_flow.png) --- # CernVM-FS to the rescue * Needed a method of sharing reference data across country efficiently * **CVMFS** is an efficient method for read only data sharing between systems * Originally designed for distributed software installation at Cern * Turns out it's really useful for read only data sets as well * HTTP-based, firewall friendly * All nodes of Galaxy Main get their reference genomes and indices from CVMFS * Shared via mirroring and caching across the country * It's also really useful to share data **globally** * The **usegalaxy.*** initiative has taken full advantage of this. --- .widen_image[ ![cvmfs_server_distribution.png](../../images/cvmfs_server_distribution.png) ] --- # CVMFS Global Structure .widen_image[ ![cvmfs_global_structure.png](../../images/cvmfs_global_structure.png) ] --- ## Thank you! This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://wiki.galaxyproject.org/Teach/GTN) and all the contributors!
This material is licensed under the
Creative Commons Attribution 4.0 International License
. .footnote[Found a typo? Something is wrong in this tutorial?
Edit it on [GitHub](https://github.com/galaxyproject/training-material/tree/master/topics/admin/tutorials/cvmfs/slides.html)]