# Regular Expressions 101

Regular expressions are a standardized way of describing patterns in textual data. They can be extremely useful for tasks such as finding and replacing data. They can be a bit tricky to master, but learning even just a few of the basics can help you get the most out of Galaxy.

Finding

Below are just a few examples of basic expressions:

| Regular expression | Matches |
|---|---|
| `abc` | an occurrence of `abc` within your data |
| `(abc\|def)` | `abc` or `def` |
| `[abc]` | a single character which is either `a`, `b`, or `c` |
| `[^abc]` | a single character that is NOT `a`, `b`, or `c` |
| `[a-z]` | any lowercase letter |
| `[a-zA-Z]` | any letter (upper or lower case) |
| `[0-9]` | any digit from 0 to 9 |
| `\d` | any digit (same as `[0-9]`) |
| `\D` | any non-digit character |
| `\w` | any alphanumeric character (in most flavours also the underscore `_`) |
| `\W` | any character not matched by `\w` |
| `\s` | any whitespace |
| `\S` | any non-whitespace character |
| `.` | any single character |
| `\.` | a literal `.` (period) |
| `{x,y}` | between x and y repetitions of the preceding item |
| `^` | the beginning of the line |
| `$` | the end of the line |

Note: Characters such as `*`, `?`, `.`, and `+` have a special meaning in a regular expression. To match one of these characters literally, escape it with a backslash. For example, `\?` matches a literal question mark.
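To illustrate the effect of escaping, here is a minimal sketch using Python's built-in `re` module. Python is only one regular expression flavour and is used here as an assumption for demonstration; the sample strings are made up.

```python
import re

# An unescaped "." matches any character, so this pattern also matches "1x0".
loose = re.findall(r"1.0", "1.0 and 1x0")

# An escaped "\." matches only a literal period.
strict = re.findall(r"1\.0", "1.0 and 1x0")

# re.escape() inserts the backslashes needed to match a string literally.
literal = re.escape("chr1?")
question = re.findall(literal, "chr1? chr12")

print(loose)     # ['1.0', '1x0']
print(strict)    # ['1.0']
print(literal)   # chr1\?
print(question)  # ['chr1?']
```

Without the backslash, the `?` in `chr1?` would make the `1` optional instead of matching a question mark.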

Examples

| Regular expression | Matches |
|---|---|
| `\d{4}` | 4 digits (e.g. a year) |
| `chr\d{1,2}` | `chr` followed by 1 or 2 digits |
| `.*abc$` | anything with `abc` at the end of the line |
| `^$` | an empty line |
| `^>.*` | a line starting with `>` (e.g. a FASTA header) |
| `^[^>].*` | a line not starting with `>` (e.g. a FASTA sequence) |
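The example patterns can be tried out on a small piece of text. The sketch below uses Python's `re` module (an assumption; Galaxy tools may use a different flavour) and an invented FASTA-like snippet, testing each line separately so that `^` and `$` refer to the start and end of that line.

```python
import re

# Hypothetical FASTA-like input, for illustration only.
fasta = ">seq1 chr14 2019\nACGT\n\n>seq2 chrX\nTTGA\n"
lines = fasta.splitlines()

# Lines starting with ">" (headers).
headers = [line for line in lines if re.search(r"^>.*", line)]

# Lines whose first character is not ">" (sequences).
sequences = [line for line in lines if re.search(r"^[^>].*", line)]

# Empty lines: beginning of line immediately followed by end of line.
empty = [line for line in lines if re.search(r"^$", line)]

# Four consecutive digits, and "chr" followed by 1 or 2 digits.
years = re.findall(r"\d{4}", fasta)
chroms = re.findall(r"chr\d{1,2}", fasta)

print(headers)    # ['>seq1 chr14 2019', '>seq2 chrX']
print(sequences)  # ['ACGT', 'TTGA']
print(empty)      # ['']
print(years)      # ['2019']
print(chroms)     # ['chr14']
```

Note that `chrX` is not matched by `chr\d{1,2}`, because `X` is not a digit.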

Replacing

Sometimes you need to capture the exact value you matched in order to use it in your replacement. We do this using capture groups `(...)`, which we can refer to using `\1`, `\2`, etc. for the first and second captured values. If you want to refer to the whole match, use `&`.

| Regular expression | Input | Captures |
|---|---|---|
| `chr(\d{1,2})` | `chr14` | `\1 = 14` |
| `(\d{2}) July (\d{4})` | `24 July 1984` | `\1 = 24`, `\2 = 1984` |
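The captures in the table can be checked with a short sketch in Python's `re` module (chosen here as an assumption for illustration). In Python, `group(1)` and `group(2)` correspond to `\1` and `\2`, and `group(0)` is the whole match.

```python
import re

# One capture group: the digits after "chr".
chrom = re.search(r"chr(\d{1,2})", "chr14")
chrom_number = chrom.group(1)

# Two capture groups: day and year.
date = re.search(r"(\d{2}) July (\d{4})", "24 July 1984")
day = date.group(1)
year = date.group(2)
whole = date.group(0)  # the entire matched text

print(chrom_number)  # 14
print(day, year)     # 24 1984
print(whole)         # 24 July 1984
```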

An expression like `s/find/replacement/g` is a substitution expression: it substitutes (`s`) any occurrence of `find` with `replacement`. It does this globally (`g`), which means it doesn't stop after the first match.

Example: `s/chr(\d{1,2})/CHR\1/g` will replace `chr14` with `CHR14` etc.
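As a rough equivalent of this substitution, here is a sketch using `re.sub` from Python's `re` module (an assumption for illustration; the `s/../../g` syntax itself is not Python). `re.sub` replaces all matches by default, which corresponds to the `g` flag, and the `count` argument limits the number of replacements.

```python
import re

text = "chr1 chr14 chrX"  # made-up sample input

# Replace every match, reusing the captured digits via \1.
renamed = re.sub(r"chr(\d{1,2})", r"CHR\1", text)

# Replace only the first match, similar to omitting the "g" flag.
first_only = re.sub(r"chr(\d{1,2})", r"CHR\1", text, count=1)

print(renamed)     # CHR1 CHR14 chrX
print(first_only)  # CHR1 chr14 chrX
```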

You can also use replacement modifiers, such as `\L` to convert to lower case or `\U` to convert to upper case. Example: `s/.*/\U&/g` will convert the whole text to upper case.
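Not every flavour supports these modifiers. Python's `re` module, for example, has no `\U`, `\L`, or `&` in replacement strings, so a comparable result needs a replacement function, with `\g<0>` standing in for the whole match. The following is a sketch under that assumption, with invented sample text.

```python
import re

text = "acgt\nttga"  # made-up sample input

# A function receives each match and returns its upper-cased text,
# which plays the role of \U& in the expression above.
upper = re.sub(r".*", lambda m: m.group(0).upper(), text)

# \g<0> refers to the whole match, similar to & in the expression above.
bracketed = re.sub(r"chr\d+", r"[\g<0>]", "chr14")

print(upper)      # prints ACGT and TTGA on two lines
print(bracketed)  # [chr14]
```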

Note: In Galaxy, you are often asked to provide the find and replacement expressions separately, so you don’t have to use the `s/../../g` structure.

There is a lot more you can do with regular expressions, and there are a few different flavours in different tools/programming languages, but these are the most important basics that will already allow you to do many of the tasks you might need in your analysis.

Tip: RegexOne is a nice interactive tutorial to learn the basics of regular expressions.

Tip: Regex101.com is a great resource for interactively testing and constructing your regular expressions. It even provides an explanation of a regular expression if you provide one.

Tip: Cyrilex is a visual regular expression tester.

##### Persistent URL
PURL: https://gxy.io/GTN:F00011