Building and Annotating Metagenome-Assembled Genomes (MAGs) from Metagenomics Reads

PURL: https://gxy.io/GTN:
Comment: What is a Learning Pathway?
We recommend you follow the tutorials in the order presented on this page. They have been selected to fit together and build up your knowledge step by step. If a lesson has both slides and a tutorial, we recommend you start with the slides, then proceed with the tutorial.

This learning path will guide you through the process of constructing and analyzing Metagenome-Assembled Genomes (MAGs) using the Galaxy platform. You will explore the key steps involved in transforming raw metagenomic data into high-quality MAGs, from preprocessing to functional annotation.

By the end of this path, you will be able to:

This path is designed to equip you with both the theoretical knowledge and practical skills needed to confidently construct, evaluate, and analyze MAGs in your research.

Module 0: Introduction to Galaxy – Navigating the Platform and Performing Your First Analysis

Before diving into metagenomics, it’s essential to become comfortable with the tools you’ll be using. This module is designed to introduce you to the Galaxy platform—a user-friendly, web-based environment for bioinformatics analysis.

Through a combination of video tutorials and hands-on exercises, you will:

By the end of this module, you’ll be ready to tackle more advanced analyses in subsequent modules.

Time estimation: 1 hour 40 minutes

Learning Objectives
  • Learn how to upload a file
  • Learn how to use a tool
  • Learn how to view results
  • Learn how to view histories
  • Learn how to extract and run a workflow
  • Learn how to share a history
  • Familiarize yourself with the basics of Galaxy
  • Learn how to obtain data from external sources
  • Learn how to run tools
  • Learn how histories work
  • Learn how to create a workflow
  • Learn how to share your work
Lesson Slides Hands-on Recordings
A short introduction to Galaxy
Galaxy Basics for genomics

Module 1: Quality Control – Ensuring High-Quality Metagenomic Data

High-quality data is the foundation of reliable metagenomic analysis. Poor-quality reads—whether due to low base-calling accuracy, adapter contamination, or insufficient length—can introduce errors, bias assemblies, and compromise your results.
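Base-calling accuracy is recorded per base in FASTQ files. Each quality character encodes a Phred score Q, and the probability that the call is wrong is 10^(-Q/10). Most current Illumina data use an ASCII offset of 33. The sketch below decodes such a string, trims a low-quality 3' end and drops reads that become too short. The trimming rule is modelled on the BWA-style algorithm described in the Cutadapt documentation, simplified here (no early stop). The function names and default thresholds are illustrative assumptions, not Cutadapt's API.

```python
from __future__ import annotations


def phred33(qual_line: str) -> list[int]:
    """Decode a FASTQ quality string that uses the ASCII offset 33."""
    return [ord(c) - 33 for c in qual_line]


def quality_trim_3prime(quals: list[int], cutoff: int) -> int:
    """Return how many leading bases to keep after 3' quality trimming.

    Subtract the cutoff from every quality, sum from each position to the
    end of the read, and cut where that suffix sum is smallest. If no suffix
    sum drops below zero, the whole read is kept.
    """
    keep = len(quals)  # cutting at len(quals) means "trim nothing"
    best = 0           # suffix sum of the empty suffix
    running = 0
    for i in range(len(quals) - 1, -1, -1):
        running += quals[i] - cutoff
        if running < best:
            best = running
            keep = i
    return keep


def trim_and_filter(seq: str, qual_line: str, cutoff: int = 20,
                    min_len: int = 20) -> tuple[str, str] | None:
    """Trim the 3' end, then discard the read if it is shorter than min_len."""
    keep = quality_trim_3prime(phred33(qual_line), cutoff)
    if keep < min_len:
        return None  # too short to be useful downstream
    return seq[:keep], qual_line[:keep]


# Example: qualities that start high and collapse towards the 3' end.
quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
qual_line = "".join(chr(q + 33) for q in quals)
print(trim_and_filter("ACGTACGTAC", qual_line, cutoff=10, min_len=3))  # ('ACGT', 'KI;<')
```

In practice Cutadapt does this for you. The point is only to show why a poor 3' tail is removed while a single low-quality base in the middle of an otherwise good read is left in place.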

In this module, you will:

By the end of this module, you’ll be equipped to confidently prepare your metagenomic data for assembly and other advanced analyses.

Time estimation: 1 hour 30 minutes

Learning Objectives
  • Assess short-read FASTQ quality using FASTQE 🧬😎 and FastQC
  • Assess long-read FASTQ quality using NanoPlot and PycoQC
  • Perform quality correction with Cutadapt (short reads)
  • Summarise quality metrics with MultiQC
  • Process single-end and paired-end data
Lesson Slides Hands-on Recordings
Quality Control

Module 3: Assembly – Reconstructing and Assessing Contigs from Metagenomic Reads

The foundation of MAG reconstruction lies in assembly—the computational process of piecing together fragmented metagenomic reads into longer genomic sequences called contigs. Think of it as solving a complex jigsaw puzzle: your goal is to identify reads that “fit together” by detecting overlapping sequences.
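Assembly tools based on de Bruijn graphs, which this module's objectives mention, avoid comparing every read against every other read. The toy sketch below shows the core idea under strong simplifying assumptions: error-free reads, a single strand and no coverage information. Reads are cut into k-mers, each distinct k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and contigs are spelled out as maximal non-branching paths. The function names are hypothetical. Real metagenome assemblers typically also handle sequencing errors, reverse complements, several k values and repeats.

```python
from __future__ import annotations

from collections import defaultdict


def build_debruijn(reads: list[str], k: int) -> dict[str, list[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some distinct k-mer."""
    graph: dict[str, list[str]] = defaultdict(list)
    seen: set[str] = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:  # overlapping reads share k-mers; keep one edge
                continue
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def contigs_from_graph(graph: dict[str, list[str]]) -> list[str]:
    """Spell out maximal non-branching paths (isolated cycles are ignored)."""
    indeg: dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for node in targets:
            indeg[node] += 1
    nodes = set(graph) | set(indeg)
    outdeg = {n: len(graph.get(n, [])) for n in nodes}

    contigs = []
    for start in sorted(nodes):
        if indeg[start] == 1 and outdeg[start] == 1:
            continue  # interior of a path, not a starting point
        for nxt in graph.get(start, []):
            contig = start + nxt[-1]
            node = nxt
            # keep extending while the path is unambiguous
            while indeg[node] == 1 and outdeg[node] == 1:
                node = graph[node][0]
                contig += node[-1]
            contigs.append(contig)
    return contigs


reads = ["ACGTTGCA", "TTGCATGG"]  # two reads overlapping on "TTGCA"
print(contigs_from_graph(build_debruijn(reads, k=4)))  # ['ACGTTGCATGG']
```

Where reads disagree (for example two variants after a shared prefix), the graph branches and the contig stops at the branch point, which is one reason metagenomic assemblies are fragmented.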

In this module, you will:

By the end of this module, you’ll be equipped to transform your cleaned metagenomic data into contiguous sequences and evaluate their quality, setting the stage for successful MAG reconstruction.
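One of the statistics QUAST reports for an assembly is N50: the length of the shortest contig such that contigs of that length or longer together contain at least half of the assembled bases. A higher N50 generally indicates a less fragmented assembly. A minimal sketch of the calculation, with a hypothetical function name and toy contig lengths:

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of a set of contig lengths (0 for an empty assembly)."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test for "at least half"
            return length
    return 0


lengths = [100, 80, 50, 30, 20]  # toy contig lengths in bp
print(n50(lengths))  # 80
```

N50 alone says nothing about correctness: an assembly with misjoined contigs can have a higher N50 than an accurate but more fragmented one.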

Time estimation: 2 hours

Learning Objectives
  • Describe what an assembly is.
  • Explain the difference between co-assembly and individual assembly.
  • Explain the difference between reads, contigs and scaffolds.
  • Explain how tools based on de Bruijn graphs work.
  • Evaluate the quality of the assembly with QUAST, Bowtie2, and CoverM-Contig.
  • Construct and apply simple assembly pipelines on short-read data.
Lesson Slides Hands-on Recordings
Assembly of metagenomic sequencing data

Module 4: Binning – From Contigs to Refined Microbial Genomes

Metagenomic binning is the process of grouping assembled contigs into discrete bins, each representing a potential microbial genome. By analyzing sequence composition, coverage, and similarity, binning allows researchers to reconstruct individual genomes from complex microbial communities.
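Sequence composition is commonly summarised as tetranucleotide frequencies (TNF). Fragments of the same genome tend to have similar 4-mer profiles, and MetaBAT 2, used in this module, combines composition with coverage. The sketch below illustrates only the composition signal. It counts canonical 4-mers, merging each 4-mer with its reverse complement because a contig may be assembled from either strand. It then normalises the counts and compares contigs by Euclidean distance. The names are hypothetical and this is not MetaBAT 2's actual scoring.

```python
import math
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Merge a k-mer with its reverse complement (strand-independent)."""
    return min(kmer, reverse_complement(kmer))


# 256 tetranucleotides collapse to 136 canonical ones.
CANONICAL_4MERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})
INDEX = {kmer: i for i, kmer in enumerate(CANONICAL_4MERS)}


def tnf(seq: str) -> list[float]:
    """Normalised canonical tetranucleotide frequencies of one contig."""
    counts = [0] * len(CANONICAL_4MERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        if set(window) <= set("ACGT"):  # skip windows containing N etc.
            counts[INDEX[canonical(window)]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


contig_a = "ATGCGCGCATATGCGCGTACGCGCAT"
contig_b = "ATATTTAAATATTAAATTTATAATTA"
print(distance(tnf(contig_a), tnf(reverse_complement(contig_a))))  # 0.0
print(distance(tnf(contig_a), tnf(contig_b)))
```

Because canonical 4-mers are used, a contig and its reverse complement get identical profiles, so strand orientation does not split a genome across bins.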

However, initial bins often contain fragmented, redundant, or contaminated sequences, which can compromise downstream analyses. To address this, bin refinement and de-replication are essential steps to improve the quality, completeness, and non-redundancy of your Metagenome-Assembled Genomes (MAGs).
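CheckM estimates each bin's completeness and contamination as percentages from lineage-specific marker genes. A widely used way to turn these numbers into quality tiers is the MIMAG standard. High-quality drafts have more than 90% completeness and less than 5% contamination. Medium-quality drafts have at least 50% completeness and less than 10% contamination. Low-quality drafts have less than 50% completeness and less than 10% contamination. The sketch applies only these two thresholds. Full MIMAG high-quality status also requires rRNA genes and tRNAs, which it ignores. The function name and the example bin values are hypothetical.

```python
def mimag_tier(completeness: float, contamination: float) -> str:
    """Classify a bin from CheckM-style percentages using MIMAG thresholds.

    Simplification: the rRNA/tRNA requirements for "high" are not checked.
    """
    if contamination >= 10:
        return "reject"  # too contaminated for any MIMAG tier
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50:
        return "medium"
    return "low"


# Hypothetical (completeness %, contamination %) values for three bins.
bins = {"bin.1": (96.2, 1.3), "bin.2": (72.5, 6.8), "bin.3": (88.0, 14.0)}
for name, (comp, cont) in bins.items():
    print(name, mimag_tier(comp, cont))
```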

In this module, you will:

By the end of this module, you’ll be able to reconstruct, refine, and validate high-quality MAGs, ensuring they are ready for taxonomic and functional analysis.

Time estimation: 2 hours

Learning Objectives
  • Describe what metagenomic binning is.
  • Describe common challenges in metagenomic binning.
  • Perform metagenomic binning using MetaBAT 2 software.
  • Evaluate MAG quality and completeness using CheckM software.
Lesson Slides Hands-on Recordings
Binning of metagenomic sequencing data

Module 6: Functional Annotation of MAGs – Applying Genomic Approaches to Metagenome-Assembled Genomes

Functional annotation is a fundamental process in genomic analysis, whether you’re working with microbial isolates or Metagenome-Assembled Genomes (MAGs). By applying the same robust approaches used for isolates, you can identify and characterize genes in MAGs, revealing their roles in metabolic pathways, environmental interactions, and ecological functions.
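Identifying genes is the first step of annotation. Dedicated gene prediction tools use far richer models, but the basic notion of an open reading frame (ORF) can be sketched in a few lines. The sketch scans all six reading frames for an ATG start codon followed in-frame by a stop codon (TAA, TAG or TGA). This is a toy: only ATG is treated as a start codon, although bacteria also use alternatives such as GTG and TTG. The function name and the minimum-length threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) for ATG...stop ORFs in all six frames.

    Coordinates are 0-based and end-exclusive; for the "-" strand they
    refer to positions on the reverse-complemented sequence.
    """
    seq = seq.upper()
    strands = (("+", seq), ("-", seq.translate(_COMPLEMENT)[::-1]))
    orfs = []
    for strand, s in strands:
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    # codons from ATG up to (not including) the stop
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # wait for the next ATG
    return orfs


toy = "ATG" + "GCT" * 5 + "TAA"
print(find_orfs(toy, min_codons=5))  # [('+', 0, 21)]
```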

In this module, you will:

By the end of this module, you’ll be able to analyze MAGs with the same confidence and precision as microbial isolates, gaining deeper insights into their ecological roles and functional potential—including their resistance profiles.

Time estimation: 5 hours

Learning Objectives
  • Run a series of tools to annotate a draft bacterial genome for different types of genomic components
  • Evaluate the annotation
  • Process the outputs to format them for visualization
  • Visualize a draft bacterial genome and its annotations
  • Run a series of tools to assess the presence of antimicrobial resistance genes (ARGs)
  • Get information about ARGs
  • Visualize the ARGs and plasmid genes in their genomic context
Lesson Slides Hands-on Recordings
Bacterial Genome Annotation
Identification of AMR genes in an assembled bacterial genome

Editorial Board

This material is reviewed by our Editorial Board:

Bérénice Batut, Paul Zierep

Funding

These individuals or organisations provided funding support for the development of this resource