name: inverse
layout: true
class: center, middle, inverse
---

# Reference Data with CVMFS

.footnote[Tip: press `P` to view the presenter notes]

???

Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.

Press `P` again to switch presenter notes off.

---

# Built-in Data

![List_of_data.png](../../images/i06-List_of_data.png)

---

# Data, what data?

.large[
* Some genomes are large! Human, Mouse, Coral
* Some tools require indices of the genomes.
* The indices take a long time to build!
* Better to pre-build the indices.
]

---

# Data schematics in Galaxy

![schematic](../../images/data_managers_schematic_overview.png)

---

# Using reference data in a tool

#### bwa.xml

```xml
<conditional name="reference_source">
  <param name="reference_source_selector" type="select" label="Will you select a reference genome from your history or use a built-in index?" help="Built-ins were indexed using default options. See 'Indexes' section of help below">
    <option value="cached">Use a built-in genome index</option>
    <option value="history">Use a genome from history and build index</option>
  </param>
  <when value="cached">
    <param name="ref_file" type="select" label="Using reference genome" help="Select genome from the list">
      <options from_data_table="bwa_mem_indexes">
        <filter type="sort_by" column="2" />
        <validator type="no_options" message="No indexes are available" />
      </options>
      <validator type="no_options" message="A built-in reference genome is not available for the build associated with the selected input file"/>
    </param>
  </when>
  <when value="history">
```

---

# Where are the data tables?

#### tool_data_table_conf.xml

(Usually located in `galaxy/config/`)

```xml
<tables>
    <!-- Locations of indexes in the BWA mapper format -->
    <table name="bwa_mem_indexes" comment_char="#" allow_duplicate_entries="False">
        <columns>value, dbkey, name, path</columns>
        <file path="tool-data/bwa_index.loc" />
    </table>
</tables>
```

---

# "loc" files - Short for location!

bwa_index.loc

```text
...
#
#<unique_build_id>	<dbkey>	<display_name>	<file_path>
#
...
bosTau7	bosTau7	Cow (bosTau7)	/mnt/galaxyIndices/genomes/bosTau7/bwa_mem_index/bosTau7/bosTau7.fa
ce10	ce10	C. elegans (ce10)	/mnt/galaxyIndices/genomes/ce10/bwa_mem_index/ce10/ce10.fa
danRer7	danRer7	Zebrafish (danRer7)	/mnt/galaxyIndices/genomes/danRer7/bwa_mem_index/danRer7/danRer7.fa
dm3	dm3	D. melanogaster Apr. 2006 (BDGP R5/dm3) (dm3)	/mnt/galaxyIndices/genomes/dm3/bwa_mem_index/dm3/dm3.fa
hg19	hg19	Human (hg19)	/mnt/galaxyIndices/genomes/hg19/bwa_mem_index/hg19/hg19.fa
hg38	hg38	Human (hg38)	/mnt/galaxyIndices/genomes/hg38/bwa_mem_index/hg38/hg38.fa
mm10	mm10	Mouse (mm10)	/mnt/galaxyIndices/genomes/mm10/bwa_mem_index/mm10/mm10.fa
...
```

---

# Some Problems!

.large[
* Time-consuming!
* ~30 minutes of work just to add a new genome to 1 tool!
* Administrator needs to know:
  * how to build the index for **every** tool
  * expected format of the reference data
  * format of the .loc file
* Some parts solved by Data Managers
* But there's an easier way!
]

---

# There's a lot of reference data

.large[
(and it's hard to keep up with)
]

![ref_data_prob_flow.png](../../images/ref_data_prob_flow.png)

---

# CernVM-FS to the rescue

.largeish[
* Needed a method of sharing reference data across the country efficiently
* **CVMFS** is an efficient method for read-only data sharing between systems
* Originally designed for distributed software installation at CERN
* Turns out it's really useful for read-only data sets as well
* All nodes of Galaxy Main get their reference genomes and indices from CVMFS
* Shared via mirroring and caching across the country
* It's also really useful to share data **globally**
* The **usegalaxy.*** initiative has taken full advantage of this.
]

---

.widen_image[
![cvmfs_server_distribution.png](../../images/cvmfs_server_distribution.png)
]

---

# CVMFS Global Structure

.widen_image[
![cvmfs_global_structure.png](../../images/cvmfs_global_structure.png)
]

---

## Thank you!

This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://wiki.galaxyproject.org/Teach/GTN) and all the contributors!
.footnote[Found a typo? Something is wrong in this tutorial?
Edit it on [GitHub](https://github.com/galaxyproject/training-material/tree/master/topics/admin/tutorials/cvmfs/slides.html)]
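
---

# Backup: reading a "loc" file

The `bwa_index.loc` format shown earlier can be read with a few lines of Python. This is a minimal sketch, not Galaxy's own loader: it assumes the file is tab-separated, that comment lines start with the table's `comment_char` (`#`), and that the columns follow the order declared for `bwa_mem_indexes` (`value, dbkey, name, path`). The names `parse_loc`, `COLUMNS`, and `SAMPLE` are hypothetical and exist only for illustration.

```python
# Assumed column order, taken from the <columns> element of the
# bwa_mem_indexes table in tool_data_table_conf.xml.
COLUMNS = ["value", "dbkey", "name", "path"]


def parse_loc(text, columns=COLUMNS, comment_char="#"):
    """Return one dict per data line of a tab-separated .loc file."""
    entries = []
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith(comment_char):
            continue
        fields = line.split("\t")
        # Skip lines that do not have a field for every declared column.
        if len(fields) < len(columns):
            continue
        entries.append(dict(zip(columns, fields)))
    return entries


# Two entries copied from the bwa_index.loc slide, with explicit tabs.
SAMPLE = (
    "#<unique_build_id>\t<dbkey>\t<display_name>\t<file_path>\n"
    "hg38\thg38\tHuman (hg38)\t"
    "/mnt/galaxyIndices/genomes/hg38/bwa_mem_index/hg38/hg38.fa\n"
    "mm10\tmm10\tMouse (mm10)\t"
    "/mnt/galaxyIndices/genomes/mm10/bwa_mem_index/mm10/mm10.fa\n"
)

if __name__ == "__main__":
    for entry in parse_loc(SAMPLE):
        print(entry["dbkey"], "->", entry["path"])
```

Splitting on tabs rather than whitespace matters because display names such as `Human (hg38)` contain spaces.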