Reference Data with CVMFS

Contributors

Daniel Blankenberg, Simon Gladman, Helena Rasche

Objectives

Last modification: Apr 6, 2021

Built in Data

[Image: List_of_data.png]


Data, what data?


Data schematics in Galaxy

[Image: schematic]


Index Generation with Data Manager


“loc” files - Short for location!

#
#<unique_build_id>   <dbkey>   <display_name>   <file_path>
#
bosTau7 bosTau7 Cow (bosTau7)   /genomes/bosTau7/bwa_mem_index/bosTau7/bosTau7.fa
ce10    ce10    C. elegans (ce10)       /genomes/ce10/bwa_mem_index/ce10/ce10.fa
danRer7 danRer7 Zebrafish (danRer7)     /genomes/danRer7/bwa_mem_index/danRer7/danRer7.fa
dm3     dm3     D. melanogaster Apr. 2006 (BDGP R5/dm3) (dm3)   /genomes/dm3/bwa_mem_index/dm3/dm3.fa
hg19    hg19    Human (hg19)    /genomes/hg19/bwa_mem_index/hg19/hg19.fa
hg38    hg38    Human (hg38)    /genomes/hg38/bwa_mem_index/hg38/hg38.fa
mm10    mm10    Mouse (mm10)    /genomes/mm10/bwa_mem_index/mm10/mm10.fa
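As a rough illustration of how such a file can be read, here is a minimal Python sketch. It assumes the fields are tab-separated and that lines starting with `#` are comments; the function name `parse_loc` and the `SAMPLE` text are hypothetical and are not part of Galaxy.

```python
# Column names taken from the header comment of the .loc file above.
LOC_COLUMNS = ["unique_build_id", "dbkey", "display_name", "file_path"]


def parse_loc(text, columns=LOC_COLUMNS, comment_char="#"):
    """Return one dict per data row of a tab-separated .loc file."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith(comment_char):
            continue
        fields = line.split("\t")
        rows.append(dict(zip(columns, fields)))
    return rows


# Two entries copied from the example above (tabs written explicitly).
SAMPLE = (
    "#\n"
    "#<unique_build_id>\t<dbkey>\t<display_name>\t<file_path>\n"
    "#\n"
    "hg19\thg19\tHuman (hg19)\t/genomes/hg19/bwa_mem_index/hg19/hg19.fa\n"
    "mm10\tmm10\tMouse (mm10)\t/genomes/mm10/bwa_mem_index/mm10/mm10.fa\n"
)

entries = parse_loc(SAMPLE)
for entry in entries:
    print(entry["dbkey"], "->", entry["file_path"])
```

Each row becomes a small record keyed by column name, which is roughly the information a tool needs to locate an index on disk.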


Where are the data tables?

(Usually located in galaxy/config/tool_data_table_conf.xml)

  <tables>
    <!-- Locations of indexes in the BWA mapper format -->
    <table name="bwa_mem_indexes" comment_char="#" allow_duplicate_entries="False">
      <columns>value, dbkey, name, path</columns>
      <file path="tool-data/bwa_index.loc" />
    </table>
  </tables>
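To make the link between a data table and its loc file concrete, the following Python sketch reads a configuration like the one above with the standard library. It is only an illustration, not Galaxy's own loader; the helper `read_tables` and the inline `CONF` string are hypothetical.

```python
import xml.etree.ElementTree as ET

# Inline copy of the example configuration shown above.
CONF = """
<tables>
  <table name="bwa_mem_indexes" comment_char="#" allow_duplicate_entries="False">
    <columns>value, dbkey, name, path</columns>
    <file path="tool-data/bwa_index.loc" />
  </table>
</tables>
"""


def read_tables(xml_text):
    """Map each table name to its column names and .loc file paths."""
    tables = {}
    root = ET.fromstring(xml_text.strip())
    for table in root.findall("table"):
        # Columns are given as one comma-separated string.
        columns = [c.strip() for c in table.findtext("columns", "").split(",")]
        files = [f.get("path") for f in table.findall("file")]
        tables[table.get("name")] = {"columns": columns, "files": files}
    return tables


tables = read_tables(CONF)
print(tables["bwa_mem_indexes"])
```

The table name (`bwa_mem_indexes`) is the key that tools use to refer to the data, while the `file` element points at the loc file holding the actual rows.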


Using reference data in a tool

bwa.xml

<param name="ref_file" type="select" label="Using reference genome" help="Select genome from the list">
  <options from_data_table="bwa_mem_indexes">
    <filter type="sort_by" column="2" />
    <validator type="no_options" message="No indexes are available" />
  </options>
  <validator type="no_options" message="A built-in reference genome is not available for the build associated with the selected input file"/>
</param>
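The effect of this parameter definition can be approximated in a few lines of Python. The sketch below is not Galaxy's implementation: it assumes that the `sort_by` column index is zero-based, so that column 2 is the display name, and that an empty table triggers the `no_options` message. The names `ROWS` and `build_options` are hypothetical.

```python
# Rows as (value, dbkey, name, path), mirroring the data table columns.
ROWS = [
    ("mm10", "mm10", "Mouse (mm10)", "/genomes/mm10/bwa_mem_index/mm10/mm10.fa"),
    ("bosTau7", "bosTau7", "Cow (bosTau7)",
     "/genomes/bosTau7/bwa_mem_index/bosTau7/bosTau7.fa"),
    ("hg19", "hg19", "Human (hg19)", "/genomes/hg19/bwa_mem_index/hg19/hg19.fa"),
]


def build_options(rows, sort_column=2):
    """Return (label, value) pairs for a select box, sorted by one column."""
    if not rows:
        # Plays the role of the no_options validator in the tool XML.
        raise ValueError("No indexes are available")
    # Assumption: the column index is zero-based, so 2 is the display name.
    ordered = sorted(rows, key=lambda row: row[sort_column])
    return [(row[2], row[0]) for row in ordered]


options = build_options(ROWS)
for label, value in options:
    print(label, "=>", value)
```

The user sees the display name in the list, while the tool receives the matching `value`, which it can then use to look up the index path in the same table.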


Some Problems!


There’s a lot of reference data

(and it’s hard to keep up with)

[Image: ref_data_prob_flow.png]


CernVM-FS to the rescue


IDC


CVMFS Global Structure

[Image: cvmfs_global_structure.png]


[Image: cvmfs_server_distribution.png]


Thank you!

This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors!

This material is licensed under the Creative Commons Attribution 4.0 International License.