{ "metadata": { }, "nbformat": 4, "nbformat_minor": 5, "cells": [ { "id": "metadata", "cell_type": "markdown", "source": "
This module provides a recap of everything covered by the modular Python Introductory level curriculum. It serves as a graduation into the Intermediate tutorials, which cover more advanced topics.
\n\n
This recapitulates the main points from all of the previous modular tutorials.
\nMath in Python works a lot like math in real life (from algebra onwards). Variables can be assigned, and worked with in place of numbers.
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-1", "source": [ "x = 1\n", "y = 2\n", "z = x * y" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">We can use familiar math operators:
\nOperator | Operation
---|---
`+` | Addition
`-` | Subtraction
`*` | Multiplication
`/` | Division (`//` for floor division, which rounds down to an integer)

And some familiar operations require the use of the `math` module:
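A minimal sketch of what this can look like. The specific operations shown (a power and a square root) are chosen here for illustration:

```python
import math

# Exponentiation and square roots live in the math module.
power = math.pow(2, 3)   # 2 raised to the power 3
root = math.sqrt(16)     # square root of 16

print(power)  # 8.0
print(root)   # 4.0
```

Both functions return floats, even when the inputs are integers.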
Functions are similarly analogous to algebra and mathematics: we can express `f(x) = x * 3` in Python as:
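A minimal sketch of such a function, assuming a second argument `y` that defaults to 3 (so that calling `f(x)` still computes `x * 3`):

```python
def f(x, y=3):
    # y is an assumed keyword argument; with its default value this is f(x) = x * 3.
    return x * y

print(f(4))       # 12
print(f(4, y=2))  # 8
```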
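There are a few basic parts of a function:

- `def` starts a function definition
- the name of the function (`f`)
- a positional argument (`x`)
- an argument with a default value (`y=3`)
- `return`, which sends a value back to the caller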
We can also nest functions using function composition, just like in math. In math, function composition is written `f(g(x))`, and in Python it is exactly the same:
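A small sketch of nested calls, with illustrative input values of 2 and 4 picked here as an assumption:

```python
import math

# The innermost call runs first: math.pow(2, 4) gives 16.0,
# math.sqrt(16.0) gives 4.0, and print displays the result.
print(math.sqrt(math.pow(2, 4)))  # 4.0
```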
Here we’ve nested three different functions (print is a function!). To read this we start in the middle (math.pow) and move outwards (math.sqrt, print).
\nThere are lots of different datatypes in Python! The basic types are `bool`, `int`, `float`, `str`. Then we have more complex datatypes like `list` and `dict` which can contain the basic types (as well as other nested lists/dicts).

Data type | Examples | When to use it | When not to use it
---|---|---|---
Boolean (`bool`) | `True`, `False` | If there are only two possible states, true or false | If your data is not binary
Integer (`int`) | 1, 0, -1023, 42 | Countable, singular items. How many patients are there, how many events did you record, how many variants are there in the sequence | If doubling or halving the value would not make sense: do not use for e.g. patient IDs, or phone numbers. If these are integers you might accidentally do math on the value.
Float (`float`) | 123.49, 3.14159, -3.33334 | If you need more precision or partial values. Recording distance between places, height, mass, etc. |
Strings (`str`) | ‘patient_12312’, ‘Jane Doe’, ‘火锅’ | To store free text, identifiers, sequence IDs, etc. | If it’s truly a numeric value you can do calculations with, like adding or subtracting or doing statistics.
List / Array (`list`) | `['A', 1, 3.4, ['Nested']]` | If you need to store a list of items, like sequences from a file. Especially if you’re reading in a table of data from a file. | If you want to retrieve individual values, and there are clear identifiers it might be better as a dict.
Dictionary / Associative Array / map (`dict`) | `{\"weight\": 3.4, \"age\": 12, \"name\": \"Fluffy\"}` | When you have identifiers for your data, and want to look them up by that value. E.g. looking up sequences by an identifier, or data about students based on their name. Counting values. | If you just have a list of items without identifiers, it makes more sense to just use a list.

There are a few more datatypes we didn’t cover in detail: `set`, `tuple`, `None`, `enum`, and `bytes`, all of which can be read about in Python’s documentation.
We have a few comparators available to use specifically for numeric values:

- `>`: greater than
- `<`: less than
- `>=`: greater than or equal to
- `<=`: less than or equal to

And a couple that can be used with numbers and strings (or other values!):

- `==`: equal to
- `!=`: does not equal

In Python, there is a class of things which can be easily looped over, called “iterables”. All of the following are examples of iterable items:
\n- `range(10)`
- `'abcd'`, a string
- `['a', 'b', 'c', 'd']`, a list

Basic flow control looks like `if`, `elif`, `else`:
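A minimal sketch of such a chain. The function name `describe` and the checks it performs are hypothetical, chosen only to show the three keywords together:

```python
def describe(n):
    # Hypothetical helper: classify a number with if / elif / else.
    if n < 0:
        return 'negative'
    elif n == 0:
        return 'zero'
    else:
        return 'positive'

for value in (-5, 0, 7):
    print(value, describe(value))
```

Only the first matching branch runs; the `else` catches everything that the `if` and `elif` did not.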
We must start with an `if`, then we can have zero or more `elif`s, and zero or one `else` to end our clause. If there is no `else` and neither the `if` nor any of the `elif`s match, nothing happens by default.
We can use `and` to check if both conditions are true, and `or` to check if at least one condition is true.
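As a sketch, the full truth table for both operators can be built by looping over every combination of two boolean values (the variable names here are illustrative):

```python
# Collect (a, b, a and b, a or b) for every combination of True/False.
table = []
for a in (True, False):
    for b in (True, False):
        table.append((a, b, a and b, a or b))
        print(f'{a} AND {b} => {a and b}')
        print(f'{a} OR {b} => {a or b}')
```

`and` is only true in the first row, while `or` is only false in the last one.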
And if we need to, we can invert conditions with `not`.
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-13", "source": [ "# Not\n", "for i in (True, False):\n", " print(f\"NOT {i} => {not i}\")" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">All of these components (if/elif/else
, `and`, `or`, `not`, numerical and value comparators) let us build up
Loops let us loop over the contents of something iterable (a string, a list, lines in a file). We write
\nfor loopVariable in myIterable:\n # Do something\n print(loopVariable)\n
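As a concrete instance of this pattern, here is a small sketch that loops over a range and over a string. The variable names are chosen here for illustration:

```python
# Sum the numbers 0 to 4 by looping over a range.
total = 0
for number in range(5):
    total = total + number
print(total)  # 10

# Loop over the characters of a string and collect uppercase copies.
letters = []
for letter in 'abcd':
    letters.append(letter.upper())
print(letters)  # ['A', 'B', 'C', 'D']
```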
Each loop has:

- `for`, a keyword to start the loop
- `loopVariable`, which is set automatically every iteration of the loop
- `in`, a keyword used in a loop

In Python you must `open()` a file handle, using one of the three modes (read, write, or append). Normally you must also later `close()` that file, but your life can be a bit
There are several basic parts:

- `with`, to indicate we want to use the context manager for file opening (this is what automatically closes the file afterwards)
- `open(path, mode)` opens a file
- `as` is a keyword
- `handle` is the name of a file handle, something that represents the file which we can write to, or read from.

Additionally, if you need a newline in your file, you must write it yourself with a `\\n`.
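Put together, these parts can be sketched as follows. The `encoding='utf-8'` argument is an assumption added here so that the non-ASCII text is written the same way on every platform:

```python
# Open out.txt for writing; the file is closed automatically
# as soon as the indented block ends.
with open('out.txt', 'w', encoding='utf-8') as handle:
    handle.write('Здравствуйте ')
    handle.write('世界!\n')
    handle.write('Welcome!\n')

# Can no longer write: the handle was closed when the with block exited.
print(handle.closed)  # True
```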
The code above is equivalent to the following, which is not recommended: it is a bit harder to read, and it is very common to forget to close files.
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-19", "source": [ "handle = open('out.txt', 'w')\n", "handle.write(\"Здравствуйте \")\n", "handle.write(\"世界!\\n\")\n", "handle.write(\"Welcome!\\n\")\n", "handle.close()\n", "\n", "# Can no longer write." ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">You can also read from a file:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-21", "source": [ "# Read the entire file as one giant string\n", "with open('out.txt', 'r') as handle:\n", " print(handle.read())\n", "\n", "# Or read it as separate lines.\n", "with open('out.txt', 'r') as handle:\n", " print(handle.readlines())" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Sometimes things go wrong!
\n- Dividing by zero (`ZeroDivisionError`)
- Combining values of incompatible types (`TypeError`)
- Indenting code inconsistently (`IndentationError`)

If you expect that something will go wrong, you can guard against it with a `try`/`except`.
Some of the most common reasons to do this are when you’re processing user input data. Users often input invalid data, unfortunately.
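A guard of this kind, applied to user input, might look like the sketch below. The function `parse_age` is hypothetical, and it catches `ValueError`, which is what `int()` raises for text that is not a number:

```python
def parse_age(text):
    # Hypothetical helper: convert user-supplied text to an integer,
    # returning None instead of crashing when the text is not a number.
    try:
        return int(text)
    except ValueError:
        return None

print(parse_age('42'))      # 42
print(parse_age('twelve'))  # None
```

Catching only the specific exception you expect keeps other, unexpected errors visible.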
\nWrite a program that computes the sum of an alternating series where each element of the series is an expression of the form
\n\n\\[4\\cdot\\sum_{k=1}^{N} \\dfrac{(-1)^{k+1}}{2k-1}\\]\nUse that expression and calculate the sum for various values of N like `10`, `1000`, `1000000`.
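One possible approach, sketched under the assumption that a plain `for` loop over `k` is acceptable; the function name `alternating_series` is made up here. The comparison against `math.pi` is included because this particular series converges to π:

```python
import math

def alternating_series(n):
    # Sum (-1)^(k+1) / (2k - 1) for k = 1..n, then multiply by 4.
    total = 0.0
    for k in range(1, n + 1):
        total += (-1) ** (k + 1) / (2 * k - 1)
    return 4 * total

for n in (10, 1000, 1000000):
    value = alternating_series(n)
    print(n, value, abs(value - math.pi))
```

The difference from π shrinks as N grows, but slowly: each extra digit needs roughly ten times more terms.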
You can use a Monte Carlo simulation to estimate the value of π. The easy way to do this is to take the region `x = [0, 1], y = [0, 1]` and fill it with random points. For each point, calculate the distance to the origin; a point counts as inside the quarter circle if that distance is at most 1. Calculate the ratio of the inside points to the total points, and multiply the value by 4 to estimate π.
You can use the `random` module to generate random values:
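For instance, a minimal sketch that draws one random point and measures its distance to the origin (the variable names are illustrative):

```python
import random

# random.random() returns a float in the half-open range [0.0, 1.0).
x = random.random()
y = random.random()

# Distance from the point (x, y) to the origin (0, 0).
distance = (x ** 2 + y ** 2) ** 0.5
print(x, y, distance)
```

The values change on every run, since nothing here fixes the random seed.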
Using `random.random()` to generate x and y coordinates, write a function that:

- generates random points and counts those with `distance<=1`
- divides that count by the number of points `N`, and multiplies by `4`
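One way this could be sketched is shown below. The function name `estimate_pi` and the optional `seed` parameter are assumptions added here; the seed only makes runs repeatable, and `random.Random(seed).random()` behaves like `random.random()`:

```python
import random

def estimate_pi(n, seed=None):
    # A private generator so that passing a seed gives repeatable results.
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        distance = (x ** 2 + y ** 2) ** 0.5
        if distance <= 1:
            inside += 1
    # Ratio of inside points to all points, scaled by 4.
    return 4 * inside / n

print(estimate_pi(100000))
```

The estimate is noisy: with 100000 points it is typically accurate to about two decimal places.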
\n\nSixpack is an old EMBOSS program which takes in a DNA sequence and then, for every frame on both strands, emits every Open Reading Frame (ORF) that it sees.
\n G R G F W C L G G K A A K N Y R E K S V D V A G Y D X F1\n G V A S G A W A V K R Q K T T V K S R W M W R V M M F2\n A W L L V P G R * S G K K L P * K V G G C G G L * X F3\n1 GGGCGTGGCTTCTGGTGCCTGGGCGGTAAAGCGGCAAAAAACTACCGTGAAAAGTCGGTGGATGTGGCGGGTTATGATG 79\n ----:----|----:----|----:----|----:----|----:----|----:----|----:----|----:----\n1 CCCGCACCGAAGACCACGGACCCGCCATTTCGCCGTTTTTTGATGGCACTTTTCAGCCACCTACACCGCCCAATACTAC 79\n P R P K Q H R P P L A A F F * R S F D T S T A P * S F6\n X A H S R T G P R Y L P L F S G H F T P P H P P N H H F5\n P T A E P A Q A T F R C F V V T F L R H I H R T I I F4\n
Here we see a DNA sequence `GGGCGTGGCTTCTGGTGCCTGGGCGGTAAAGCGGCAAAAAACTACCGTGAAAAGTCGGTGGATGTGGCGGGTTATGATG` which you’ll use as input. Above it is the translation of the sequence to protein for each of the three forward frames (F1-F3). Below it is the complementary strand, followed by the three-frame translation of the reverse complement (F4-F6).
What sixpack does is:
\norfs = []\n\nfor sequence in [forward, reverse_complement(forward)]:\n for frame in [sequence, sequence[1:], sequence[2:]]:\n # Remembering\n for potential start_codon:\n # accumulate until it sees a stop codon\n # and append it to the orfs array once it does.\n
Here are some variables for your convenience:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-31", "source": [ "start_codons = ['TTG', 'CTG', 'ATG']\n", "stop_codons = ['TAA', 'TAG', 'TGA']\n", "\n", "# And some convenience functions\n", "def is_start_codon(codon):\n", " return codon in start_codons\n", "\n", "def is_stop_codon(codon):\n", " return codon in stop_codons" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">It’s a good exercise to rewrite sixpack
in a very simplified version without most of the features in sixpack:
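One possible simplified version is sketched below. It makes several assumptions that are not taken from sixpack itself: ORFs are reported as DNA rather than protein, the stop codon is left out of each ORF, ORFs that never reach a stop codon are dropped, and the names `reverse_complement` and `find_orfs` are made up here.

```python
# Codon lists copied from the convenience cell above.
start_codons = ['TTG', 'CTG', 'ATG']
stop_codons = ['TAA', 'TAG', 'TGA']

# Hypothetical helper: complement each base, reading the strand backwards.
COMPLEMENT = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}

def reverse_complement(sequence):
    return ''.join(COMPLEMENT[base] for base in reversed(sequence))

def find_orfs(forward):
    orfs = []
    for sequence in [forward, reverse_complement(forward)]:
        for frame in [sequence, sequence[1:], sequence[2:]]:
            current = None  # codons of the ORF being accumulated, if any
            # Step through the frame one complete codon at a time.
            for i in range(0, len(frame) - 2, 3):
                codon = frame[i:i + 3]
                if current is None:
                    if codon in start_codons:
                        current = [codon]
                elif codon in stop_codons:
                    # Stop codon reached: store the ORF and start over.
                    orfs.append(''.join(current))
                    current = None
                else:
                    current.append(codon)
    return orfs

print(find_orfs('ATGAAATAG'))  # ['ATGAAA']
```

In the example, only the first forward frame contains a start codon followed by a stop codon, so a single ORF is reported.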