Galactic Database

name: inverse
layout: true
class: center, middle, inverse


---

# Galactic Database

<div class="contributors-line">
		
	
<ul class="text-list">
			
			<li>
				<a href="/training-material/hall-of-fame/martenson/" class="contributor-badge contributor-martenson"><img src="/training-material/assets/images/orcid.png" alt="orcid logo" width="36" height="36"/><img src="https://avatars.githubusercontent.com/martenson?s=36" alt="Martin Čech avatar" width="36" class="avatar" />
    Martin Čech</a>
			<li>
				<a href="/training-material/hall-of-fame/nsoranzo/" class="contributor-badge contributor-nsoranzo"><img src="/training-material/assets/images/orcid.png" alt="orcid logo" width="36" height="36"/><img src="https://avatars.githubusercontent.com/nsoranzo?s=36" alt="Nicola Soranzo avatar" width="36" class="avatar" />
    Nicola Soranzo</a>
			<li>
				<a href="/training-material/hall-of-fame/natefoo/" class="contributor-badge contributor-natefoo"><img src="/training-material/assets/images/orcid.png" alt="orcid logo" width="36" height="36"/><img src="https://avatars.githubusercontent.com/natefoo?s=36" alt="Nate Coraor avatar" width="36" class="avatar" />
    Nate Coraor</a>
			<li>
				<a href="/training-material/hall-of-fame/hexylena/" class="contributor-badge contributor-hexylena"><img src="/training-material/assets/images/orcid.png" alt="orcid logo" width="36" height="36"/><img src="https://avatars.githubusercontent.com/hexylena?s=36" alt="Helena Rasche avatar" width="36" class="avatar" />
    Helena Rasche</a></li>
</ul>

</div>

<div class="footnote" style="bottom: 8em;">
  <i class="far fa-calendar" aria-hidden="true"></i><span class="visually-hidden">last_modification</span> Updated:   
  <i class="fas fa-fingerprint" aria-hidden="true"></i><span class="visually-hidden">purl</span><abbr title="Persistent URL">PURL</abbr>: <a href="https://gxy.io/GTN:S00008">gxy.io/GTN:S00008</a>
</div>

<div class="footnote" style="bottom: 5em;">

<i class="far fa-play-circle" aria-hidden="true"></i><span class="visually-hidden">video-slides</span> <a href="/training-material/videos/watch.html?v=/admin/tutorials/database/slides">Video slides</a> |

<i class="fas fa-file-alt" aria-hidden="true"></i><span class="visually-hidden">text-document</span><a href="slides-plain.html"> Plain-text slides</a> |

</div>

<div class="footnote" style="bottom: 2em;">
    <strong>Tip: </strong>press <kbd>P</kbd> to view the presenter notes
    | <i class="fa fa-arrows" aria-hidden="true"></i><span class="visually-hidden">arrow-keys</span> Use arrow keys to move between slides

</div>

???
Presenter notes contain extra information which might be useful if you intend to use these slides for teaching.

Press `P` again to switch presenter notes off

Press `C` to create a new window where the same presentation will be displayed.
This window is linked to the main window. Changing slides on one will cause the
slide to change on the other.

Useful when presenting.

---

# Galactic Database

.left[Galaxy uses a database for:]
* Galaxy objects and all their relations (users, histories, datasets, workflows)
* Job state persistence, job dispatching

.left[Galaxy does not use a database for:]
* Dataset **contents**: files on disk

???

- Galaxy objects like users, histories, datasets, and workflows are stored in the database.
- All job information is likewise stored there.
- User and reference data, however, are stored outside of the database.
- You will want to back up both user data and your database.

---

# Defaults

* Galaxy uses the [SQLAlchemy](http://www.sqlalchemy.org/) database abstraction layer. This allows for different database servers to be plugged in.
* By default Galaxy automatically creates and uses an [SQLite](https://sqlite.org/) database during the first startup.
  * The database is in the file `database/universe.sqlite`

???

- Galaxy uses SQLAlchemy for interacting with databases.
- This allows Galaxy to transparently use SQLite or PostgreSQL.

---
# Choices

* SQLite
  * Useful for single-user Galaxy or development.
* **PostgreSQL**
  * The recommended standard for anything serious.
* ~~MySQL~~
  * Supported by SQLAlchemy but Galaxy is not tested against it.

???

- SQLite is fine in development, but should not be used for production.
- PostgreSQL is the best choice for any production server.
- Avoid MySQL if possible: Galaxy is not tested against it.

---
# Sizing

Galaxy rarely deletes from the database; most objects are only *marked* deleted.

Allocate at least 20 GB of disk to start, 50+ GB if expanding would be difficult.

8-16 GB memory should be sufficient.

Recommended: Run PostgreSQL on a different server for resource isolation.

???

- Data is rarely removed from the database, so plan for it to keep growing.
- Allocate at least 20 GB of disk to start, and 50+ GB if expanding later is difficult.
- The RAM usage is usually not significant.
- We recommend running the database on a separate server from Galaxy for better isolation.
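
The "marked deleted" behaviour can be illustrated with a small soft-delete sketch. The table and column names below are a toy schema chosen for this example, not Galaxy's actual model; the point is only that the row count, and therefore the database size, does not shrink when an object is deleted.

```python
import sqlite3


def soft_delete_demo():
    """Show that a soft delete keeps the row and only flips a flag."""
    conn = sqlite3.connect(":memory:")
    # Toy schema (hypothetical): a table with a boolean-like 'deleted' column.
    conn.execute(
        "CREATE TABLE history ("
        "id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO history (name) VALUES (?)", [("a",), ("b",), ("c",)]
    )

    # "Deleting" an object marks it instead of removing the row.
    conn.execute("UPDATE history SET deleted = 1 WHERE name = ?", ("b",))

    total = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
    visible = conn.execute(
        "SELECT COUNT(*) FROM history WHERE deleted = 0"
    ).fetchone()[0]
    conn.close()
    return total, visible


total_rows, visible_rows = soft_delete_demo()
print(total_rows, visible_rows)  # 3 rows stored, 2 still visible
```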

---
# Configuration

`database_connection` is specified as a [database URL](http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls) in `galaxy.yml`:

* Default SQLite:
  `sqlite:///./database/universe.sqlite?isolation_level=IMMEDIATE`

* Local PostgreSQL (socket):
  `postgresql:///<db_name>[?host=/var/run/postgresql]`

* Network PostgreSQL:
  `postgresql://[user][:password]@<host>[:5432]/<db_name>`

???

- The `database_connection` parameter tells Galaxy where the database is.
- There are several styles, depending on how you connect to your database.
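
The three URL styles can be sketched as small helper functions. This is an illustrative sketch using only the Python standard library; the function names, the database name `galaxy`, and the host `db.example.org` are assumptions made for the example and are not part of Galaxy. User names and passwords are percent-encoded because characters such as `@` or `/` would otherwise be misread as URL delimiters.

```python
from urllib.parse import quote


def sqlite_url(path="./database/universe.sqlite"):
    """Default-style SQLite URL: three slashes followed by a relative path."""
    return f"sqlite:///{path}?isolation_level=IMMEDIATE"


def postgres_socket_url(db_name, socket_dir=None):
    """Local PostgreSQL over a Unix socket; the socket directory is optional."""
    url = f"postgresql:///{db_name}"
    if socket_dir:
        url += f"?host={socket_dir}"
    return url


def postgres_network_url(db_name, host, user=None, password=None, port=None):
    """Network PostgreSQL; user, password and port are all optional."""
    auth = ""
    if user:
        auth = quote(user, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    netloc = host if port is None else f"{host}:{port}"
    return f"postgresql://{auth}{netloc}/{db_name}"


# Hypothetical values, for illustration only.
print(sqlite_url())
print(postgres_socket_url("galaxy", "/var/run/postgresql"))
print(postgres_network_url("galaxy", "db.example.org", "galaxy", "p@ss/word", 5432))
```

The resulting string is what would be set as the value of `database_connection`.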

---
# New Database

On first startup with an empty database, Galaxy creates its schema.

???

- On first startup with an empty database, Galaxy creates its schema.

---
# Migrations

Changes in the Galaxy DB model (when upgrading Galaxy) are captured incrementally in the form of [atomic scripts](https://github.com/galaxyproject/galaxy/tree/dev/lib/galaxy/model/migrations/alembic/versions_gxy) that use [Alembic](https://alembic.sqlalchemy.org/).
        
Each script can both upgrade and downgrade a DB.

```console
$ ./manage_db.sh upgrade release_25.0
$ ./manage_db.sh downgrade release_24.1
```
Please see the [Galaxy documentation](https://docs.galaxyproject.org/en/master/admin/db_migration.html) for more details.
        
???

- Whenever Galaxy is upgraded, there may be changes to the database schema.
- These are stored in migration scripts, and you can use the `manage_db.sh` script to upgrade or downgrade.

---
# Tuning - Pool

Increase these values if the server logs errors about not having enough database pool connections.

| Galaxy config option                  | default value | usegalaxy.org value |
| ------                                | ------------- | ------------------- |
| `database_engine_option_pool_size`    | `5`           | `10`                |
| `database_engine_option_max_overflow` | `10`          | `20`                |

[Values for usegalaxy.org](https://github.com/galaxyproject/usegalaxy-playbook/blob/master/env/main/group_vars/all/galaxy_config_vars.yml)

???

- Databases have a limited number of connection slots.
- Galaxy can pool connections, and re-use existing connections from a pool when it needs to query the DB.
- There are a couple of options for controlling pooling.
- The defaults are generally fine until you see errors.
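
When raising these values it helps to estimate how many connections the database server may see. With SQLAlchemy's queue-based pool, one pool can open at most `pool_size + max_overflow` connections at a time. The sketch below assumes that each Galaxy process keeps its own pool; the process count of 4 is a hypothetical figure, not a usegalaxy.org setting.

```python
def max_db_connections(pool_size, max_overflow, processes=1):
    """Upper bound on simultaneous connections opened by pooled processes."""
    return (pool_size + max_overflow) * processes


# Galaxy defaults, a single process.
print(max_db_connections(5, 10))  # 15

# usegalaxy.org values with 4 hypothetical processes.
print(max_db_connections(10, 20, processes=4))  # 120
```

The total should stay below the connection limit configured on the database server (`max_connections` in PostgreSQL).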

---
# Tuning - Server-side cursors

If large database query results are causing memory or response time issues in the Galaxy process, enable server-side cursors so that result sets are kept on the database server and fetched incrementally
(PostgreSQL only, recommended).

| Galaxy config option                         | default value | usegalaxy.org value |
| ------                                       | ------------- | ------------------- |
| `database_engine_option_server_side_cursors` | `false`       | `true`              |

???

- Large queries may slow down Galaxy or the DB.
- You can enable server side cursors to help with this.

---
# Tuning - Slow query logging

Queries slower than this threshold (in seconds) will be logged at debug level.

| Galaxy config option       | default value | usegalaxy.org value |
| ------                     | ------------- | ------------------- |
| `slow_query_log_threshold` | `0`           | `2`                 |

???

- If you notice slow responses, you can enable slow query logging.
- This will print a message in your Galaxy log if a query takes more than a specified time period.
- This can be useful to help you debug issues.
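
The idea behind a slow query threshold can be sketched in a few lines. This is a standalone illustration using SQLite and the standard library, not Galaxy's implementation; the function name `timed_query` and the logger name are made up for the example, and treating a threshold of `0` as "disabled" is an assumption of this sketch.

```python
import logging
import sqlite3
import time

log = logging.getLogger("slow_query_demo")


def timed_query(conn, sql, params=(), threshold=2.0):
    """Run a query and log it at debug level if it exceeds the threshold (seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    # Assumption of this sketch: a threshold of 0 switches the check off.
    is_slow = threshold > 0 and elapsed > threshold
    if is_slow:
        log.debug("Slow query (%.3f s): %s", elapsed, sql)
    return rows, is_slow


conn = sqlite3.connect(":memory:")
rows, was_slow = timed_query(conn, "SELECT 1", threshold=2.0)
print(rows, was_slow)
```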

---
# Tuning - TS install database

Galaxy can track Tool Shed data in a separate DB.

| Galaxy config option          | default value                  | usegalaxy.org value |
| ------                        | -------------                  | ------------------- |
| `install_database_connection` | value of `database_connection` | SQLite DB in CVMFS  |

All other database config options are also available with an `install_` prefix.

.left[This allows:]
* Bootstrapping fresh Galaxy instances with prebuilt/tested tool sets
* Atomic installation/rollback (esp. with SQLite: backup and restore DB file)

???

- Galaxy can track Tool Shed data in a separate database.
- This can enable deploying Galaxies with prebuilt tool sets.
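
The backup-and-restore approach to atomic installation can be sketched as follows. This is a minimal illustration with a toy `tool` table and hypothetical function names; it is not how Galaxy installs tools, and it assumes no other process has the SQLite file open while it is being copied.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def install_with_rollback(db_path, install_step):
    """Copy the SQLite file, run install_step, and restore the copy on failure."""
    db_path = Path(db_path)
    backup_path = db_path.with_name(db_path.name + ".bak")
    shutil.copy2(db_path, backup_path)  # backup before changing anything
    try:
        install_step(db_path)
    except Exception:
        shutil.copy2(backup_path, db_path)  # rollback: restore the old file
        return False
    return True


def count_tools(db_path):
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM tool").fetchone()[0]
    finally:
        conn.close()


def failing_install(db_path):
    """Simulated install that commits a row and then fails."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("INSERT INTO tool (name) VALUES ('half-installed')")
        conn.commit()
        raise RuntimeError("simulated install failure")
    finally:
        conn.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "install.sqlite"
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE tool (name TEXT)")
        conn.commit()
        conn.close()
        ok = install_with_rollback(db_path, failing_install)
        # After the rollback the half-installed row is gone again.
        return ok, count_tools(db_path)


print(demo())
```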

---
# Access through model

Python script to access Galaxy's database layer **via the Galaxy model**.

```console
(venv)$ python -i scripts/db_shell.py --config config/galaxy.yml
>>> new_user = User('foo@example.org', 'secret')
>>> new_user.username = 'foo'
>>> sa_session.add(new_user)
>>> sa_session.flush()
>>> sa_session.query(User).all()
```

???

- You can use the Galaxy Python models to interact with the database.
- After activating the Galaxy virtual environment, you can use the DB shell script to work with them interactively.
- This can allow scripting tasks like resetting passwords.

---
# Useful queries

Captured in [gxadmin](/training-material/topics/admin/tutorials/gxadmin/slides.html) ([tutorial](/training-material/topics/admin/tutorials/gxadmin/tutorial.html))

???

- Many useful DB queries are captured in gxadmin.
- Look into this if you need to query the database for information like running jobs or recent users.

---

## Thank You!

This material is the result of collaborative work. Thanks to the [Galaxy Training Network](https://training.galaxyproject.org) and all the contributors!

<div class="contributors-line">
		
<table class="contributions">
	
	<tr>
		<td><abbr title="These people wrote the bulk of the tutorial, they may have done the analysis, built the workflow, and wrote the text themselves.">Author(s)</abbr></td>
		<td>
			<a href="/training-material/hall-of-fame/martenson/" class="contributor-badge contributor-martenson"><img src="/training-material/assets/images/orcid.png" alt="orcid logo" width="36" height="36"/><img src="https://avatars.githubusercontent.com/martenson?s=36" alt="Martin Čech avatar" width="36" class="avatar" />
    Martin Čech</a><a href="/training-material/hall-of-fame/nsoranzo/" class="contributor-badge contributor-nsoranzo"><img src="/training-material/assets/images/orcid.png" alt="orcid logo" width="36" height="36"/><img src="https://avatars.githubusercontent.com/nsoranzo?s=36" alt="Nicola Soranzo avatar" width="36" class="avatar" />
    Nicola Soranzo</a><a href="/training-material/hall-of-fame/natefoo/" class="contributor-badge contributor-natefoo"><img src="/training-material/assets/images/orcid.png" alt="orcid logo" width="36" height="36"/><img src="https://avatars.githubusercontent.com/natefoo?s=36" alt="Nate Coraor avatar" width="36" class="avatar" />
    Nate Coraor</a><a href="/training-material/hall-of-fame/hexylena/" class="contributor-badge contributor-hexylena"><img src="/training-material/assets/images/orcid.png" alt="orcid logo" width="36" height="36"/><img src="https://avatars.githubusercontent.com/hexylena?s=36" alt="Helena Rasche avatar" width="36" class="avatar" />
    Helena Rasche</a>
		</td>
	</tr>

<tr class="reviewers">
		<td><abbr title="These people reviewed this material for accuracy and correctness">Reviewers</abbr></td>
		<td>
			<a href="/training-material/hall-of-fame/slugger70/" class="contributor-badge contributor-badge-small contributor-slugger70"><img src="https://avatars.githubusercontent.com/slugger70?s=36" alt="Simon Gladman avatar" width="36" class="avatar" /></a><a href="/training-material/hall-of-fame/nsoranzo/" class="contributor-badge contributor-badge-small contributor-nsoranzo"><img src="https://avatars.githubusercontent.com/nsoranzo?s=36" alt="Nicola Soranzo avatar" width="36" class="avatar" /></a><a href="/training-material/hall-of-fame/hexylena/" class="contributor-badge contributor-badge-small contributor-hexylena"><img src="https://avatars.githubusercontent.com/hexylena?s=36" alt="Helena Rasche avatar" width="36" class="avatar" /></a><a href="/training-material/hall-of-fame/martenson/" class="contributor-badge contributor-badge-small contributor-martenson"><img src="https://avatars.githubusercontent.com/martenson?s=36" alt="Martin Čech avatar" width="36" class="avatar" /></a><a href="/training-material/hall-of-fame/shiltemann/" class="contributor-badge contributor-badge-small contributor-shiltemann"><img src="https://avatars.githubusercontent.com/shiltemann?s=36" alt="Saskia Hiltemann avatar" width="36" class="avatar" /></a><a href="/training-material/hall-of-fame/bgruening/" class="contributor-badge contributor-badge-small contributor-bgruening"><img src="https://avatars.githubusercontent.com/bgruening?s=36" alt="Björn Grüning avatar" width="36" class="avatar" /></a></td>
	</tr>

</table>

</div>


Tutorial Content is licensed under <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.<br/>