name: inverse
layout: true
class: center, middle, inverse
# Getting data into Galaxy
Updated: Jul 9, 2021
Press `P` to view the presenter notes
???

Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.

Press `P` again to switch presenter notes off.

Press `C` to create a new window where the same presentation will be displayed. This window is linked to the main window. Changing slides on one will cause the slide to change on the other.

Useful when presenting.

---

### <i class="far fa-question-circle" aria-hidden="true"></i><span class="visually-hidden">question</span> Questions

- How do I get my data into Galaxy?
- How do I get public data into Galaxy?

---

# Getting data into Galaxy

???

-> Pressing **P** will toggle presenter mode.

---

## Many ways to get data into your workspace

1. Import using **Get Data** sources e.g. UCSC, SRA
2. Import from a Galaxy **Data Library**
3. Import using **Upload File**
   - Import from your computer
   - Directly enter text
   - Import from a URL
   - Import using FTP
   - Import directly into Collection
   - Import using Rule Builder

???

- To do analysis in Galaxy you first need data to work on.
- There are many ways and sources for getting data into your history.
- This tutorial will cover all of the techniques listed here.

---

### Best method depends on where the data is, and how big it is

![flowchart for getting data into galaxy. SRA datasets should use the upload tool, if you have many or big datasets use FTP, if they're from the web use the URL upload.](../../images/get_data_upload_logic.png)

.footnote[[Source: Galaxy Community Hub](https://galaxyproject.org/tutorials/upload/)]

---

# 1. The ***Get Data*** toolbox section

---

- *Click* on the **Get Data** section in the toolbox (the left panel)

![Click on Get Data to expand it](../../images/get_data_boxed.png)

---

.pull-right[.image-75[![A typical list of data sources](../../images/get_data_expanded.png)]]

- Expands to show data sources
  - E.g. UCSC, NCBI, Uniprot, ..
- The specific data sources available on your Galaxy instance are determined by the server's administrator
- All of these data sources can bring datasets (files) into your Galaxy workspace (history)

???

This shows the list of data sources that were available on usegalaxy.org in mid 2017.

---

Two large data sources you can access through Galaxy are UCSC and SRA

.pull-right[![Screenshot of toolbox with ucsc entered in search](../../images/get_data_ucsc.png)]

.pull-left[![galaxy toolbox with sra entered in search box.](../../images/get_data_sra.png)]

---

# 2. Import from Shared Data Library

---

.pull-left[
- Top menu bar -> Shared Data -> Data Libraries
- Configured by a Galaxy Administrator
- Can be imported directly into your history
- Example: all GTN tutorial data
]

.pull-right[![galaxy top menu dropdown shared data, showing Data Libraries](../../images/get_data_data_libraries.png)]

---

You can select the files you want and send them to your history as datasets or as a collection

.image-75[![data library screenshot with a number of datasets selected and export to history menu open](../../images/get_data_import_library.png)]

---

# 3. Upload from your computer

---

.image-50[![click on upload button](../../images/upload_button.png)]

![Upload file form](../../images/upload_file_form_empty.png)

???

- The **Upload File** data source can import data:
  - from your computer
  - by directly entering text
  - using a URL
  - and via FTP

This is probably the most commonly used tool for bringing data into Galaxy, and it is installed on almost every Galaxy server.

---

# Choose files

![Options for importing files from your laptop](../../images/upload_file_form_empty_local_choices.png)

???

- Drag and drop is supported
- as is the standard file selection using your browser.

---

# Set Metadata

- **Datatype** (e.g. FastQ, VCF, BAM, tabular, ..)
  - Galaxy will autodetect by default (sometimes guesses wrong)
- **Genome Build** (e.g. hg19, mm9, ..)
  - must be set manually (can be done later as well)

![upload dialog from galaxy with a number of files queued.](../../images/upload_file_file_list.png)

???

- Here we have imported 13 files
  - one with genome annotation in GTF format
  - 12 paired end read files from an RNA-Seq experiment\*
- We could import them now and have Galaxy guess their file types.

\* From UC Davis Training Material.

---

- Can be set for all files at once:

![Set datatype for all imported datasets](../../images/upload_file_set_all_datatype.png)

---

- Or per file:

![Manually set datatype for one dataset](../../images/upload_file_set_individual_file_type.png)

???

- Here we are manually setting the first dataset's datatype to GTF, a common genome annotation format.

---

# Start upload process:

- Once everything is ready, click the **Start** button

![Ready to upload files. Click on start](../../images/upload_file_ready.png)

???

- Data transfer does not start until you click Start.

---

You can then close the form

![Ready to upload files. Click on start](../../images/upload_file_in_progress.png)

???

---

All the items will appear in your history

![Files are loaded into your current history.](../../images/files_uploaded_into_history.png)

and are ready to use when green.

---

## Directly enter text

---

- Sometimes it's useful to enter file content directly.
- This is only practical if your dataset is tiny
- Choose **Paste/Fetch data**

![Select Paste/Fetch data](../../images/upload_file_form_empty_paste.png)

---

Enter the data by typing (or pasting) it in the input box:

![Select Paste/Fetch data](../../images/upload_paste_direct.png)

You can also set the datatype and build.

*Click* **Start**, and then **Close**, and the new item shows up as **Pasted Entry** in your history.

---

## Import using URL

---

The data might already be available on a web server somewhere.

To avoid downloading data to your computer and uploading to Galaxy in two steps, you can instruct Galaxy to directly fetch the data from a given URL.
![Select Paste/Fetch data](../../images/upload_file_form_empty_paste.png)

Select **Paste/Fetch data**

---

Enter the URLs (one per line) into the input box:

![Select Paste/Fetch data](../../images/fetch_from_url.png)

*Click* **Start**, and then **Close**, and the new items show up in your history with the URL as their name.

---

# Import using FTP

---

- Why use FTP?
  - Older Galaxy versions did not support uploading files larger than 2 GB
  - Many people are very comfortable using FTP to upload large datasets, and you can sometimes resume interrupted uploads.
- How to use FTP
  - The Galaxy server's administrator must have [enabled FTP](https://galaxyproject.org/admin/config/upload-via-ftp/) on the server
  - You will need to create an account on that Galaxy server
  - You will need to install FTP software, or to run FTP from the shell
  - See https://galaxyproject.org/ftp-upload/

---

## Make sure you have an FTP client installed

.pull-right[.image-25[![FileZilla](../../images/FileZilla_logo.png)]]

- [FileZilla](https://filezilla-project.org/) is a free FTP client that is available on [Windows](https://filezilla-project.org/download.php?platform=win64), [macOS](https://filezilla-project.org/download.php?platform=osx), and [Linux](https://filezilla-project.org/download.php?platform=linux)
- There are many other options
- If you don't already have an FTP client, download and install FileZilla.

---

## Establish FTP connection to your Galaxy server

- Provide
  - the instance's FTP server name (e.g. usegalaxy.org, ftp.usegalaxy.eu)
  - your full **username** (usually an email address) and **password**

![FTP Connection Params](../../images/ftp_client_connect.png)

---

![Successfully connect](../../images/ftp_client_connected.png)

Successfully connected

---

![Navigate to the files you want to transfer](../../images/ftp_client_upload.png)

Right click on the files and upload them.

---

![FTP transfer in progress](../../images/ftp_client_transferring.png)

FTP Transfer in progress...
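
???

For audiences who prefer a scripted transfer over a graphical client, the FileZilla steps could be sketched in Python with the standard-library `ftplib` module. This is only a minimal sketch and not part of the original material: `upload_files` and `upload_to_galaxy` are hypothetical helper names, and it assumes the server accepts explicit FTP over TLS (a plain-FTP server would need `ftplib.FTP` instead).

```python
import ftplib
from pathlib import Path


def upload_files(ftp, paths):
    """Send each local file to the FTP server's current directory.

    `ftp` is any connected, logged-in object with an ftplib-style
    `storbinary` method. Returns the remote names that were sent.
    """
    sent = []
    for path in map(Path, paths):
        with path.open("rb") as handle:
            # STOR stores the file under its base name on the server.
            ftp.storbinary(f"STOR {path.name}", handle)
        sent.append(path.name)
    return sent


def upload_to_galaxy(host, user, password, paths):
    """Connect, log in, upload, and disconnect (hypothetical helper)."""
    # Assumption: the server accepts explicit FTP over TLS; use
    # ftplib.FTP here instead if it only offers plain FTP.
    with ftplib.FTP_TLS(host) as ftp:
        ftp.login(user=user, passwd=password)
        ftp.prot_p()  # also encrypt the data channel
        return upload_files(ftp, paths)
```

A call such as `upload_to_galaxy("ftp.usegalaxy.eu", "you@example.org", "your-password", ["reads_1.fastq"])` (placeholder credentials and file name) would send one file, which should then be listed under **Choose FTP files** in the upload dialog.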
---

![FTP transfer complete](../../images/ftp_client_transferred.png)

... and transfer complete.

---

# Where did my files go?

- File Upload menu -> **Choose FTP files**

![choose FTP files](../../images/upload_ftp_import.png)

---

- Select files to import into your history
- Click **Start**

![choose FTP files](../../images/upload_ftp_import2.png)

???

As you can see, this dialog gives connection settings too

---

# Import directly into Collection

---

- Select **Collection** tab at top of upload menu
- Add files as before (upload from computer, paste/fetch, FTP)

![Direct collection Start](../../images/get_data_direct_collection1.png)

---

- Choose collection type (at bottom)
- Set metadata (file type, genome build)
- Click **Build**

![Direct collection Build](../../images/get_data_direct_collection2.png)

---

- Name your collection
- Click **Create** button

![Direct collection Name](../../images/get_data_direct_collection3.png)

---

- Collection is now imported in your history
- Click on it to expand it and view all files in collection

![Direct collection History](../../images/get_data_direct_collection4.png)

---

# Import using Rule Based uploader

---

- When you want to import many files from URLs or Accession IDs directly into collection(s)
- Supports advanced "rules" for creating collections from sample sheets
- Click **Rule-based** tab at top of file upload window

![Rule Uploader](../../images/get_data_rule_uploader.png)

---

# Import using Rule Based uploader

Learn how to use it in the dedicated [Rule Based Uploader tutorial](https://galaxyproject.github.io//training-material/topics/galaxy-interface/tutorials/upload-rules/tutorial.html)

---

## Thank You!

This material is the result of collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!
This material is licensed under the Creative Commons Attribution 4.0 International License