
Building Reliable Machine Learning Models with PyCaret: A Case Study on the LORIS Model

Contributors

Questions

Objectives

Published: May 5, 2025
Last Updated: May 5, 2025

Introduction to PyCaret and Galaxy

Speaker Notes

PyCaret simplifies machine learning by automating tasks like data preprocessing, model training, and evaluation. In this tutorial, we will explore how PyCaret can be used within Galaxy to build reliable models, using the LORIS LLR6 model as a case study.


Use Case: The LORIS LLR6 Model (Chang et al., 2024)

Schema of the process to train and test the LORIS model


Dataset: LORIS PanCancer


Data Preparation for PyCaret

PyCaret in Galaxy


Running PyCaret Model Comparison

  1. Upload Data:
  2. Run PyCaret Model Comparison:
    • Train Dataset (CSV or TSV): Training dataset (Chowell_train_Response.tsv)
    • Test Dataset (CSV or TSV): Testing dataset (Chowell_test_Response.tsv)
    • Select the target column: Response (C22)
    • Task: Classification
    • Only Select Classification Models if you don’t want to compare all models: Logistic Regression
  3. Evaluate Model:
    • Use PyCaret’s report to assess performance.
    • Compare with the original LORIS LLR6 metrics reported by Chang et al., 2024.
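
For readers who want to see roughly what these steps correspond to outside Galaxy, here is a minimal sketch using the PyCaret Python API directly. It is an illustration under stated assumptions, not a description of what the Galaxy tool runs internally. It assumes PyCaret 3.x and pandas are installed, that both files are tab-separated, and that the target column is named `Response`. The function name `train_logistic_regression` and the `seed` parameter are hypothetical, introduced only for this example.

```python
from typing import Any, Tuple


def train_logistic_regression(
    train_path: str,
    test_path: str,
    target: str = "Response",
    seed: int = 42,
) -> Tuple[Any, Any]:
    """Train a logistic regression with PyCaret; return (model, test_metrics).

    Hypothetical helper for illustration; assumes PyCaret 3.x and pandas.
    """
    # Local imports so this module can be loaded without PyCaret installed.
    import pandas as pd
    from pycaret.classification import create_model, predict_model, pull, setup

    # Assumption: the tutorial files are tab-separated (.tsv).
    train = pd.read_csv(train_path, sep="\t")
    test = pd.read_csv(test_path, sep="\t")

    # Use the supplied test file as the hold-out set instead of a random split.
    # index=False resets the row index, since train and test both start at 0.
    setup(
        data=train,
        target=target,
        test_data=test,
        index=False,
        session_id=seed,
        verbose=False,
    )

    # "lr" is PyCaret's model ID for logistic regression.
    model = create_model("lr", verbose=False)

    # Score the trained model on the hold-out (test) data, then fetch
    # the resulting metrics table that PyCaret just produced.
    predict_model(model)
    test_metrics = pull()
    return model, test_metrics
```

Calling `train_logistic_regression("Chowell_train_Response.tsv", "Chowell_test_Response.tsv")` would return the fitted model together with a table of hold-out metrics, which can then be set side by side with the values reported by Chang et al., 2024. To compare several classifiers instead of only logistic regression, PyCaret's `compare_models` could be used in place of `create_model`.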

PyCaret Model Report


Model Evaluation Metrics


Conclusion


Galaxy Training Resources

GTN stats


Thank you!

This material is the result of collaborative work. Thanks to the Galaxy Training Network and all the contributors! Galaxy Training Network tutorial content is licensed under the Creative Commons Attribution 4.0 International License.