NCBI BLAST+ against the MAdLand
What is MAdLand DB?
How can we perform Blast analysis on Galaxy?
Load FASTA sequence into Galaxy
Perform NCBI-Blast+ analysis against MAdLandDB
Time estimation: 15 minutes
Last modification: May 18, 2023
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00238
MAdLandDB is a protein database comprising a comprehensive collection of fully sequenced plant and algal genomes, with a particular emphasis on non-seed plants and streptophyte algae. For comparative analysis, the database also includes genomes from various other organisms such as fungi, animals, the SAR group, bacteria, and archaea. The database is actively developed and maintained by the Rensing lab and released in the MAdLand setting. Each species is abbreviated with a 5-letter code, constructed from the first three letters of the genus and the first two letters of the species name, for example, CHABR for Chara braunii. Each entry also carries a gene ID and supplementary information such as the encoding source of the gene: plastome encoded (pt), or transcriptome-based (tr) in cases where a genome is not yet available. The key advantages of this database are that it is non-redundant and that its sequences come predominantly from genome projects, which increases their reliability.
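The species abbreviation rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (first three letters of the genus plus first two letters of the species name); the function name `species_code` is hypothetical, and the sketch does not cover any exceptions or collision handling the database itself may apply.

```python
def species_code(binomial: str) -> str:
    """Build a 5-letter code from a 'Genus species' name.

    Illustrative helper only: follows the rule stated in the text and
    ignores any special cases the real database might handle.
    """
    parts = binomial.split()
    if len(parts) < 2:
        raise ValueError(f"expected 'Genus species', got {binomial!r}")
    genus, species = parts[0], parts[1]
    if len(genus) < 3 or len(species) < 2:
        raise ValueError(f"name too short to abbreviate: {binomial!r}")
    # First three letters of the genus + first two of the species, upper-cased.
    return (genus[:3] + species[:2]).upper()


print(species_code("Chara braunii"))  # CHABR
```

The `CHABR` result matches the example given in the text; other species names follow the same pattern.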
In this tutorial, we will deal with:
Hands-on: Data Upload
Create a new history for this tutorial and give it a proper name
Click the new-history icon at the top of the history panel.
If the new-history icon is missing:
- Click on the galaxy-gear icon (History options) on the top of the history panel
- Select the option Create New from the menu
- Click on galaxy-pencil (Edit) next to the history name (which by default is “Unnamed history”)
- Type the new name
- Click on Save
If you do not have the galaxy-pencil (Edit) next to the history name:
- Click on Unnamed history (or the current name of the history) (Click to rename history) at the top of your history panel
- Type the new name
- Press Enter
Import the file
- Copy the link location
Click galaxy-upload Upload Data at the top of the tool panel
- Select galaxy-wf-edit Paste/Fetch Data
Paste the link(s) into the text field
- Close the window
We just imported a FASTA file into Galaxy. The next step is to perform the BLAST analysis against MAdLandDB.
Perform NCBI Blast+ on Galaxy
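Since MAdLandDB is a collection of protein sequences, you can search it with either the BLASTp (Tool: toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastp_wrapper/2.10.1+galaxy2) or the BLASTx (Tool: toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastx_wrapper/2.10.1+galaxy2) tool. Use BLASTp for protein queries and BLASTx for nucleotide queries, which are translated before being compared with the database.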
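If you are unsure whether a FASTA file holds protein or nucleotide sequences, a rough check of the sequence alphabet can suggest which of the two tools fits. The following is a minimal heuristic sketch, not part of Galaxy or BLAST+: the function names `read_fasta` and `suggest_tool` and the 0.9 threshold are assumptions chosen for illustration, and short or unusual sequences can be misclassified.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")


def read_fasta(text: str) -> dict:
    """Parse FASTA text into {sequence_id: sequence}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


def suggest_tool(sequence: str, threshold: float = 0.9) -> str:
    """Return 'blastx' for nucleotide-like input, 'blastp' otherwise.

    Heuristic: if at least `threshold` of the letters are A/C/G/T/U/N,
    treat the sequence as nucleotide. The threshold is an assumption.
    """
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "blastx" if fraction >= threshold else "blastp"


# Made-up example records, for illustration only.
example = """>query_nt
ATGGCGTTAGCTAGCTAA
>query_aa
MSTNPKPQRKTKRNTNRRPQDVKFPGG
"""
for name, seq in read_fasta(example).items():
    print(name, suggest_tool(seq))
```

Treat the result as a hint only; the sequence source remains the most reliable indicator of its type.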
Hands-on: Similarity search against MAdLand Database
- BLASTp Tool: toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastp_wrapper/2.10.1+galaxy2 OR BLASTx Tool: toolshed.g2.bx.psu.edu/repos/devteam/ncbi_blast_plus/ncbi_blastx_wrapper/2.10.1+galaxy2 with the following parameters:
- “Protein query sequence(s)”:
Amino acid input sequence (in case of BLASTp) OR
- “Translated nucleotide query sequence(s)”:
Translated nucleotide input sequence (in case of BLASTx)
- “Subject database/sequences”:
Locally installed BLAST database
- “Protein BLAST database”:
MadLandDB (Genome zoo) plant and algal genomes with a focus on non-seed plants and streptophyte algae (22 Dec 2022)
- “Set expectation value cutoff”:
- “Output format”:
- In “Output Options”:
Tabular (extended 25 columns)
The BLAST output will be in tabular format (you can select the desired output format from the drop-down menu) and include the following fields:
| Column | NCBI name | Description |
|--------|-----------|-------------|
| 1 | qseqid | Query Seq-id (ID of your sequence) |
| 2 | sseqid | Subject Seq-id (ID of the database hit) |
| 3 | pident | Percentage of identical matches |
| 4 | length | Alignment length |
| 5 | mismatch | Number of mismatches |
| 6 | gapopen | Number of gap openings |
| 7 | qstart | Start of alignment in query |
| 8 | qend | End of alignment in query |
| 9 | sstart | Start of alignment in subject (database hit) |
| 10 | send | End of alignment in subject (database hit) |
| 11 | evalue | Expectation value (E-value) |
The fields are separated by tabs, and each row represents a single hit. For more details on BLAST analysis and output, we recommend following the Similarity-searches-blast tutorial.
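Once the tabular output is downloaded from Galaxy, it can be processed with a short script. The sketch below assumes the first twelve columns follow the standard BLAST tabular order (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and ignores any further columns of the extended format. The function names and the example rows are invented for illustration and are not real MAdLandDB results.

```python
import io

# Standard BLAST tabular column order (assumed for the first 12 columns).
STD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_hits(handle):
    """Read tab-separated BLAST hits into a list of dicts."""
    hits = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hit = dict(zip(STD_COLUMNS, fields))  # extra columns are ignored
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hits.append(hit)
    return hits


def best_hit_per_query(hits, max_evalue):
    """Keep the lowest-E-value hit for each query, below a cutoff."""
    best = {}
    for hit in hits:
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["evalue"] < current["evalue"]:
            best[hit["qseqid"]] = hit
    return best


# Invented rows for illustration only.
example = (
    "query1\tsubjectA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-120\t400\n"
    "query1\tsubjectB\t45.0\t180\t90\t4\t10\t190\t20\t199\t2e-10\t60\n"
    "query2\tsubjectC\t30.0\t50\t35\t1\t1\t50\t1\t50\t5.0\t20\n"
)
hits = parse_hits(io.StringIO(example))
for query, hit in best_hit_per_query(hits, max_evalue=1e-5).items():
    print(query, hit["sseqid"], hit["pident"], hit["evalue"])
```

The `max_evalue` cutoff here is a placeholder for the example; choose a value that suits your own analysis.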
More Similarity Search Tools on Galaxy
- Diamond: Diamond (Tool: toolshed.g2.bx.psu.edu/repos/bgruening/diamond/bg_diamond/2.0.15+galaxy0) is a high-throughput program for aligning large-scale data sets. It aligns sequences against a compressed version of the reference sequences (a Diamond database), which is faster to read and saves computational time. It is reported to run at up to ~20,000 times the speed of BLASTx while retaining high sensitivity.
See Buchfink et al. 2014 for more discussion.