SQL Educational Game - Murder Mystery

Overview

Questions:
  • Who did the crime?

Objectives:
  • Explore SQL City and discover who committed the murder

  • Reinforce your experiences with SQL such as querying, filtering, and joining data.

Requirements:
Time estimation: 2 hours
Level: Intermediate Intermediate
Supporting Materials:
Last modification: Oct 19, 2021
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License The GTN Framework is licensed under MIT

Best viewed in a Jupyter Notebook

This tutorial is best viewed in a Jupyter notebook! You can load this notebook in Jupyter on one of the UseGalaxy.* servers

Launching the notebook in Jupyter in Galaxy

  1. Instructions to Launch JupyterLab
  2. Open a Terminal in JupyterLab with File -> New -> Terminal
  3. Run wget https://training.galaxyproject.org/training-material/topics/data-science/tutorials/sql-game/tutorial.md.ipynb
  4. Select the notebook that appears in the list of files on the left.

Downloading the notebook

  1. Right click this link: tutorial.ipynb
  2. Save Link As..

This is not a tutorial like most GTN content but a fun exercise for you to play around and learn a bit about SQL in a more ‘practical’, and hopefully re-inforce the skills you covered in Basic and Advanced SQL skills. It makes use of the NUKnightLab/sql-mysteries SQL murder mystery project and released under open licenses:

Original code for NUKnightLab/sql-mysteries is released under the MIT License. Original text and other content is released under Creative Commons CC BY-SA 4.0.

Download the database and connector:

# This preamble sets up the sql "magic" for jupyter. Use %%sql in your cells to write sql!
!python3 -m pip install ipython-sql sqlalchemy
!wget -c https://github.com/NUKnightLab/sql-mysteries/raw/master/sql-murder-mystery.db

Setup the database connection:

import sqlalchemy
engine = sqlalchemy.create_engine("sqlite:///sql-murder-mystery.db")
%load_ext sql
%sql sqlite:///sql-murder-mystery.db
%config SqlMagic.displaycon=False

Tables

Which tables are available to you? What columns do they contain? Here’s a handy reference for you:

import pandas as pd
from sqlalchemy import MetaData
m = MetaData()
m.reflect(engine)
results = []
for table in m.tables.values():
    results.append([table.name, ', '.join([c.name for c in table.c])])
pd.set_option('display.max_colwidth', None)
pd.DataFrame(results, columns=["Table", "Columns"])

Begin your search for the truth

A crime has taken place and the police are both useless and corrupt and it is up to you and your community to solve the mystery. They failed to secure their database, and now their crime scene reports are public, it is time to figure out who the murderer was.

You know that the crime was a murder that occurred sometime on Jan.15, 2018 and that it took place in SQL City.

All the clues to this mystery are buried in a huge database, and you need to use SQL to navigate through this vast network of information. Your first step to solving the mystery is to retrieve the corresponding crime scene report from the police department’s database. From there, you can use your SQL skills to find the murderer.

%%sql
select * from crime_scene_report limit 8;

Try using the ‘Insert → Cell Below’ functionality to keep track of important query results as you go!

Solution

Write the following queries in your SQL environment to check whether you’ve found the right murderer:

%%sql
INSERT INTO solution VALUES (1, "Insert the name of the person you found here");
SELECT value FROM solution;

Key points

  • Learning SQL can be fun!

Frequently Asked Questions

Have questions about this tutorial? Check out the FAQ page for the Foundations of Data Science topic to see if your question is listed there. If not, please ask your question on the GTN Gitter Channel or the Galaxy Help Forum

Feedback

Did you use this material as an instructor? Feel free to give us feedback on how it went.
Did you use this material as a learner or student? Click the form below to leave feedback.

Click here to load Google feedback frame

Citing this Tutorial

  1. Helena Rasche, NU Knight Lab, 2021 SQL Educational Game - Murder Mystery (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/data-science/tutorials/sql-game/tutorial.html Online; accessed TODAY
  2. Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

details BibTeX

@misc{data-science-sql-game,
author = "Helena Rasche and NU Knight Lab",
title = "SQL Educational Game - Murder Mystery (Galaxy Training Materials)",
year = "2021",
month = "10",
day = "19"
url = "\url{https://training.galaxyproject.org/training-material/topics/data-science/tutorials/sql-game/tutorial.html}",
note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
    doi = {10.1016/j.cels.2018.05.012},
    url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
    year = 2018,
    month = {jun},
    publisher = {Elsevier {BV}},
    volume = {6},
    number = {6},
    pages = {752--758.e1},
    author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
    title = {Community-Driven Data Analysis Training for Biology},
    journal = {Cell Systems}
}
                

Congratulations on successfully completing this tutorial!