Reference Genomes in Galaxy

name: inverse
layout: true
class: center, middle, inverse

</span></div>

</span></div>

---

# Reference Genomes in Galaxy

<div class="contributors-line">
		
	
<ul class="text-list">
			
			<li>
				<a href="/training-material/hall-of-fame/blankenberg/" class="contributor-badge contributor-blankenberg"><img src="/training-material/assets/images/orcid.png" alt="orcid logo" width="36" height="36"/><img src="https://avatars.githubusercontent.com/blankenberg?s=36" alt="Daniel Blankenberg avatar" width="36" class="avatar" />
    Daniel Blankenberg</a>
			<li>
				<a href="/training-material/hall-of-fame/slugger70/" class="contributor-badge contributor-slugger70"><img src="/training-material/assets/images/orcid.png" alt="orcid logo" width="36" height="36"/><img src="https://avatars.githubusercontent.com/slugger70?s=36" alt="Simon Gladman avatar" width="36" class="avatar" />
    Simon Gladman</a></li>
</ul>

</div>

<div class="footnote" style="bottom: 8em;">
  <i class="far fa-calendar" aria-hidden="true"></i><span class="visually-hidden">last_modification</span> Updated:   
  <i class="fas fa-fingerprint" aria-hidden="true"></i><span class="visually-hidden">purl</span><abbr title="Persistent URL">PURL</abbr>: <a href="https://gxy.io/GTN:S00021">gxy.io/GTN:S00021</a>
</div>

<div class="footnote" style="bottom: 5em;">

<i class="fas fa-file-alt" aria-hidden="true"></i><span class="visually-hidden">text-document</span><a href="slides-plain.html"> Plain-text slides</a> |

</div>

<div class="footnote" style="bottom: 2em;">
    <strong>Tip: </strong>press <kbd>P</kbd> to view the presenter notes
    | <i class="fa fa-arrows" aria-hidden="true"></i><span class="visually-hidden">arrow-keys</span> Use arrow keys to move between slides

</div>

???
Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.

Press `P` again to switch presenter notes off

Press `C` to create a new window where the same presentation will be displayed.
This window is linked to the main window. Changing slides on one will cause the
slide to change on the other.

Useful when presenting.

---

# Overview

.large[
* **Intro to built in datasets**
* Built in data hierarchy
* Some problems
* Data Managers
* There's just so much of it!
]

---

# Built in Data

![List_of_data.png](../../images/i06-List_of_data.png)

---

# Data, what data?

.large[
* Some genomes are large! Human, Mouse, Coral
* Some tools require indices of the genomes.
* The indices take a long time to build!
* Better to pre-build the indices.
]

---

# Overview

.large[
* Intro to built in datasets
* **Built in data hierarchy**
* Some problems
* Data Managers
* There's just so much of it!
]

---

# Data schematics in Galaxy

![schematic](../../images/data_managers_schematic_overview.png)

---

# Using reference data in a tool

#### bwa.xml

```xml
<conditional name="reference_source">
      <param name="reference_source_selector" type="select" label="Will you select a reference genome from your history or use a built-in index?" help="Built-ins were indexed using default options. See 'Indexes' section of help below">
        <option value="cached">Use a built-in genome index</option>
        <option value="history">Use a genome from history and build index</option>
      </param>
      <when value="cached">
        <param name="ref_file" type="select" label="Using reference genome" help="Select genome from the list">
          <options from_data_table="bwa_mem_indexes">
            <filter type="sort_by" column="2" />
            <validator type="no_options" message="No indexes are available" />
          </options>
          <validator type="no_options" message="A built-in reference genome is not available for the build associated with the selected input file"/>
        </param>
      </when>
      <when value="history">

```

---

# Where are the data tables?

#### tool_data_table_conf.xml

(Usually located in `galaxy/config/`)

```xml
  <tables>
    
    <table name="bwa_mem_indexes" comment_char="#" allow_duplicate_entries="False">
      <columns>value, dbkey, name, path</columns>
      <file path="tool-data/bwa_index.loc" />
    </table>
  </tables>
```

---

# "loc" files - Short for location!

bwa_index.loc

```text
#
#<unique_build_id>   <dbkey>   <display_name>   <file_path>
#
bosTau7 bosTau7 Cow (bosTau7)   /genomes/bosTau7/bwa_mem_index/bosTau7/bosTau7.fa
ce10    ce10    C. elegans (ce10)       /genomes/ce10/bwa_mem_index/ce10/ce10.fa
danRer7 danRer7 Zebrafish (danRer7)     /genomes/danRer7/bwa_mem_index/danRer7/danRer7.fa
dm3     dm3     D. melanogaster Apr. 2006 (BDGP R5/dm3) (dm3)   /genomes/dm3/bwa_mem_index/dm3/dm3.fa
hg19    hg19    Human (hg19)    /genomes/hg19/bwa_mem_index/hg19/hg19.fa
hg38    hg38    Human (hg38)    /genomes/hg38/bwa_mem_index/hg38/hg38.fa
mm10    mm10    Mouse (mm10)    /genomes/mm10/bwa_mem_index/mm10/mm10.fa
```

---

# Overview

.large[
* Intro to built in datasets
* Built in data hierarchy
* **Some problems**
* Data Managers
* There's just so much of it!
]

---

# Some Problems!

.large[
* Time consuming!
  * ~30 minutes work just to add a new genome to 1 tool!

* Administrator needs to know:
  * how to index **every** tool
  * expected format of the reference data
  * format of the .loc file
]

---

# Typical conversation

> Hi,
> We have a local install of galaxy andI'm trying to add the reference index files for bwa using the information provided in the following link:
>
> ...
>
> I have modified the bwa_index.loc file present in the ../tool-data directory by adding the path to where the index is on our server (Also attached). However, even after restarting the server, the ference genome does not show when choosing the "use a built in index option". I'm not sure whether the loc file is correctly created, and whether any other configuration file needs to be changed/updated. Help in the matter greatly appreciated
>
> Thanks,
> Aarti

---

# Typical conversation

> Hi Aarti,
> Check the name of your rf file. If it is hg19.fa, then modified the loc file as
> "hg19  hg19  HG19_BWA  /root/Ref_INDEX/HG19BWAIndex/base/hg19.fa"
>
> Avik Datta

---

# Typical conversation

> Also make sure you are using TABs to separate the fields in the .loc file, this has bitten me several times in the past. My vim config places 4 spaces instead of TAB, to deactivate this optin you can do ":set noexpandtab"
>
> Hope it helps,
> Carlos

---

# Typical conversation

> Hello Carlos,
> Thanks a lot for the tip. The tab trick has fixed the problem.
>
> Regards,
> Aarti

---

# Other concerns

.large[
* **Accessible?**
  * Manually download genome FASTA files
  * Download, compile, run bwa index; which options?
* **Reproducible?**
  * Only if the person performing manual steps keeps good notes
* **Transparent?**
  * Send email to sysadmin asking for notes
  * Restart Galaxy server for new entries
]

---

# Overview

.large[
* Intro to built in datasets
* Built in data hierarchy
* Some problems
* **Data Managers**
* There's just so much of it!
]

---

# Data Managers

.large[
* Allows for the **creation of built-in** (reference) data
	* underlying data
	* data tables
	* \*.loc files

* Specialized Galaxy tools that can only be accessed by an admin

* Defined **locally** or installed from **ToolShed**
]

---

# Data Managers

.large[
* **Flexible** framework
  * Not just genomic data
  * Run Data Managers through UI
  * Workflow compatible
  * API
* Examples
  * Adding new genome builds (dbkeys)
  * Fetching genome (fasta) sequences
  * Building short read mapper indices for genomes
]

---

# Special class of Galaxy tool

Looks just like a normal Galaxy tool!

![Data-manager-ui.png](../../images/Data-manager-ui.png)

---

# What does it do?

The output of the data manager is a JSON description of the new data table entry

![data_table_JSON.png](../../images/data_table_JSON.png)

This gets turned into a new data table entry

![data_table_entry.png](../../images/data_table_entry.png)

The index files themselves get placed in the appropriate location.

---

# Data Managers Admin

.large[
* Located on the Galaxy's Admin Tab under **Local Data**
]
![data_managers_tool_list.png](../../images/data_managers_tool_list.png)

---

# Data Managers Admin

.large[
* UI tools to fetch reference genomes/build indices
* View progress of index build jobs
* View contents of tool data tables
]
![data_table_ui.png](../../images/data_table_ui.png)

---

# Resources / further reading

.large[
* Galaxy Wiki Page on Data Managers
  * Details
  * Building
  * Examples

https://galaxyproject.org/admin/tools/data-managers/
]

---

# Exercise Time!

---

# Overview

.large[
* Intro to built in datasets
* Built in data hierarchy
* Some problems
* Data Managers
* **There's just so much of it!**
]

---

# There's a lot of reference data

.large[
(and it's hard to keep up with)
]
![ref_data_prob_flow.png](../../images/ref_data_prob_flow.png)

---

# CernVM-FS to the rescue

* Needed a method of sharing reference data across country efficiently
* **CVMFS** is an efficient method for read only data sharing between systems
    * Originally designed for distributed software installation at Cern
    * Turns out it's really useful for read only data sets as well
    * HTTP-based, firewall friendly
* All nodes of Galaxy Main get their reference genomes and indices from CVMFS
    * Shared via mirroring and caching across the country
* It's also really useful to share data **globally**
    * The **usegalaxy.*** initiative has taken full advantage of this.

---

.widen_image[
![cvmfs_server_distribution.png](../../images/cvmfs_server_distribution.png)
]

---

# CVM-FS Global Structure

.widen_image[
![cvmfs_global_structure.png](../../images/cvmfs_global_structure.png)
]

---

# Exercise #2:

.large[
**Connect our instances to CVMFS for reference data**
]

---

## Thank You!

This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!

<div class="contributors-line">
		
<table class="contributions">
	
	<tr>
		<td><abbr title="These people wrote the bulk of the tutorial, they may have done the analysis, built the workflow, and wrote the text themselves.">Author(s)</abbr></td>
		<td>
			<a href="/training-material/hall-of-fame/blankenberg/" class="contributor-badge contributor-blankenberg"><img src="/training-material/assets/images/orcid.png" alt="orcid logo" width="36" height="36"/><img src="https://avatars.githubusercontent.com/blankenberg?s=36" alt="Daniel Blankenberg avatar" width="36" class="avatar" />
    Daniel Blankenberg</a><a href="/training-material/hall-of-fame/slugger70/" class="contributor-badge contributor-slugger70"><img src="/training-material/assets/images/orcid.png" alt="orcid logo" width="36" height="36"/><img src="https://avatars.githubusercontent.com/slugger70?s=36" alt="Simon Gladman avatar" width="36" class="avatar" />
    Simon Gladman</a>
		</td>
	</tr>

<tr class="reviewers">
		<td><abbr title="These people reviewed this material for accuracy and correctness">Reviewers</abbr></td>
		<td>
			<a href="/training-material/hall-of-fame/martenson/" class="contributor-badge contributor-badge-small contributor-martenson"><img src="https://avatars.githubusercontent.com/martenson?s=36" alt="Martin Čech avatar" width="36" class="avatar" /></a><a href="/training-material/hall-of-fame/natefoo/" class="contributor-badge contributor-badge-small contributor-natefoo"><img src="https://avatars.githubusercontent.com/natefoo?s=36" alt="Nate Coraor avatar" width="36" class="avatar" /></a><a href="/training-material/hall-of-fame/nsoranzo/" class="contributor-badge contributor-badge-small contributor-nsoranzo"><img src="https://avatars.githubusercontent.com/nsoranzo?s=36" alt="Nicola Soranzo avatar" width="36" class="avatar" /></a><a href="/training-material/hall-of-fame/hexylena/" class="contributor-badge contributor-badge-small contributor-hexylena"><img src="https://avatars.githubusercontent.com/hexylena?s=36" alt="Helena Rasche avatar" width="36" class="avatar" /></a><a href="/training-material/hall-of-fame/bgruening/" class="contributor-badge contributor-badge-small contributor-bgruening"><img src="https://avatars.githubusercontent.com/bgruening?s=36" alt="Björn Grüning avatar" width="36" class="avatar" /></a><a href="/training-material/hall-of-fame/shiltemann/" class="contributor-badge contributor-badge-small contributor-shiltemann"><img src="https://avatars.githubusercontent.com/shiltemann?s=36" alt="Saskia Hiltemann avatar" width="36" class="avatar" /></a></td>
	</tr>

</table>

</div>

</div>

Tutorial Content is licensed under <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.<br/>