{ "metadata": { }, "nbformat": 4, "nbformat_minor": 5, "cells": [ { "id": "metadata", "cell_type": "markdown", "source": "
A for loop tells Python to execute some statements once for each value in a list, a character string, or some other collection: “for each thing in this group, do these operations”.
\n\n\nComment\nThis tutorial is significantly based on the Carpentries Programming with Python and Plotting and Programming in Python, which are licensed CC-BY 4.0.
\nAdaptations have been made to make this work better in a GTN/Galaxy environment.
\n
Which of these would you rather write?
\n\n\n\n\nInput: Manually\n\nprint(2)\nprint(3)\nprint(5)\nprint(7)\nprint(11)\n
\n\nOutput: With Loops\n\nfor number in [2, 3, 5, 7, 11]:\n    print(number)\n
It may be less clear here, since you only need to do one operation (`print`), but what if you had to do two operations, or three, or more?
A `for` loop is made up of a collection, a loop variable, and a body.
\n- `number` is the loop variable. It’s a new variable that is assigned each value from the collection in turn. It does not need to be defined before the loop.
\n- `[2, 3, 5]` is the collection: a list of numbers, which we can tell from the square brackets used, `[` and `]`.
\n\n\n\nInput: The loop\n\nfor number in [2, 3, 5]:\n    doubled = number * 2\n    print(f\"{number} doubled is {doubled}\")\n
\n\nOutput: What's really happening internally\n\n# First iteration, number = 2\ndoubled = number * 2\nprint(f\"{number} doubled is {doubled}\")\n# Second iteration, number = 3\ndoubled = number * 2\nprint(f\"{number} doubled is {doubled}\")\n# Third iteration, number = 5\ndoubled = number * 2\nprint(f\"{number} doubled is {doubled}\")\n
Writing loops saves us time and helps keep our code accurate: the body is written only once, so we can’t accidentally introduce a typo into one of many repeated copies.
\nYou can loop over characters in a string:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-3", "source": [ "dna_string = 'ACTGGTCATCG'\n", "for base in dna_string:\n", " print(base)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">You can loop over lists:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-5", "source": [ "cast = ['Elphaba', 'Glinda', 'Fiyero', 'Nessarose']\n", "for character in cast:\n", " print(character)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">The first line of the for
loop must end with a colon, and the body must be indented. Four spaces per level is the usual convention; whatever you choose must be consistent within a block. Many editors do this automatically for you and even convert tabs into four spaces.
\n\n\nThe colon at the end of the first line signals the start of a block of statements.
\n\nfor x in y:\n    print(x)\n
or
\n\nif x > 10:\n    print(x)\n
or even further nesting is possible:
\n\nfor x in y:\n    if x > 10:\n        print(x)\n
The indentation is, in fact, required. Notice how this fails:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-7", "source": [ "#Fix me!\n", "for number in [2, 3, 5]:\n", "print(number)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">And, likewise, this:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-9", "source": [ "patient1 = \"z2910\"\n", " patient2 = \"y9583\"" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Loop variables can be called anything, i
, j
, and k
are very common defaults due to their long history of use in other programming languages.\nAs with all variables, loop variables are: Created on demand, and Meaningless; their names can be anything at all.
But meaningless is bad for variable names, and whenever possible, we should strive to pick useful, accurate variable names that help us remember what’s going on:
\nfor sequence in sequences:\n    print()\nfor patient in clinic_patients:\n    print()\nfor nucleotide in dna_sequence:\n    print()\n
You can use `range` to iterate over a sequence of numbers. This is a built-in function (check `help(range)`!), so it’s always available even if you don’t `import` anything. The range produced excludes its end value: `range(N)` is the numbers `0` to `N-1`, so the result will be exactly the length you requested.
\n\n\nIn Python, `range` is a special type of iterable: none of the numbers are created until we need them.\n\nprint(range(5))\nprint(range(-3, 8)[0:4])\n
The easiest way to see what numbers are actually in there is to convert it to a `list`:\n\nprint(list(range(5)))\nprint(list(range(-3, 8)))\nprint(list(range(0, 10, 2)))\n
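
A common use of `range` is to loop over the positions of a list rather than its values. The following is a minimal sketch of that idea; it reuses the `cast` list from earlier in this tutorial, and the variable names `numbers` and `position` are only illustrative.

```python
# A range object describes the numbers without storing them all.
numbers = range(5)
print(numbers)        # shows the range object itself, not its contents
print(len(numbers))   # a range still knows how many values it holds
print(list(numbers))  # converting to a list reveals the values

# range(len(...)) gives every valid index of a list, starting at 0.
cast = ['Elphaba', 'Glinda', 'Fiyero', 'Nessarose']
for position in range(len(cast)):
    print(f'{position}: {cast[position]}')
```

Because `range(len(cast))` stops at `len(cast) - 1`, every `position` is a valid index into the list.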
In programming you’ll often want to accumulate some values, for example counting things or adding them up. The pattern consists of creating a variable to store your result, running a loop over some data, and in that loop, adding to the variable for your result.
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-15", "source": [ "# Sum the first 10 integers.\n", "total = 0\n", "for number in range(1, 11):\n", " total = total + (number)\n", "print(f\" final: {total}\")" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">But how did we get that result? We can add some “debugging” lines to the above code to figure out how we got to that result. Try adding the following line in the above loop:
\nprint(f'Currently {number}, our total is {total}')\n
You can add it before you update total
, after it, or both! Compare the outputs to understand what’s happening on each line.
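
As one possible way to try this, the sketch below puts a debugging line both before and after the update. The `Before`/`After` labels are just an illustrative choice to tell the two lines apart.

```python
# Sum the first 10 integers, printing the state around each update.
total = 0
for number in range(1, 11):
    print(f'Before: number is {number}, total is {total}')
    total = total + number
    print(f'After:  number is {number}, total is {total}')
print(f'final: {total}')
```

Reading the output top to bottom shows that `total` only changes on the update line, and that each `Before` value equals the previous iteration’s `After` value.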
\n\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-17", "source": [ "# Test break and continue here" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">There are multiple ways to control your loop if you need to.\nTwo of them are the built-in Python statements continue and break.
\nWhen Python encounters continue in your loop, it skips the rest of the current iteration and moves on to the next one.
\n\nfor letter in 'Galaxy':\n    if letter == 'l':\n        continue\n    print(f'The letters are: {letter}')\n
With break, Python stops the loop entirely and carries on with the code that follows it.
\n\nfor letter in 'Galaxy':\n    if letter == 'l':\n        break\n    print(f'The letters are: {letter}')\nprint('Done')\n
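
The two statements can also be combined in one loop. The sketch below is an illustration only: the `read` string is made up, and treating `*` as an end marker is an assumption chosen for this example, not a convention from the tutorial.

```python
# Made-up sequence: 'N' marks an unknown base, '*' is a hypothetical end marker.
read = 'ACGNNTGA*CCT'
kept = ''
for base in read:
    if base == 'N':
        continue  # skip this base and move on to the next one
    if base == '*':
        break     # stop the loop completely
    kept = kept + base
print(kept)
```

Here `continue` drops the two `N` characters, while `break` means nothing after the `*` is ever looked at.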
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-19", "source": [ "#Test your code here!" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Tracing Execution\nCreate a table showing the numbers of the lines that are executed when this program runs,\nand the values of the variables after each line is executed.
\n\ntotal = 0\nfor char in \"tin\":\n    total = total + 1\n
\n👁 View solution
\n\n\n\n
\n\n| Line | Variables |\n|------|-----------|\n| 1 | total = 0 |\n| 2 | total = 0, char = 't' |\n| 3 | total = 1, char = 't' |\n| 2 | total = 1, char = 'i' |\n| 3 | total = 2, char = 'i' |\n| 2 | total = 2, char = 'n' |\n| 3 | total = 3, char = 'n' |\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-21", "source": [ "# Test your code here!\n", "original = \"stressed\"\n", "result = ____\n", "for char in original:\n", " result = ____\n", "print(result)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Reversing a String\nFill in the blanks in the program below so that it prints “desserts”\n(the reverse of the original character string “stressed”).
\n\noriginal = \"stressed\"\nresult = ____\nfor char in original:\n    result = ____\nprint(result)\n
\n👁 View solution
\n\n\noriginal = \"stressed\"\nresult = \"\"\nfor char in original:\n    result = char + result\nprint(result)\n
\n\nQuestion: Practice Accumulating\nFill in the blanks in each of the programs below\nto produce the indicated result.
\n\n# Total length of the strings in the list: [\"red\", \"green\", \"blue\"] => 12\ntotal = 0\nfor word in [\"red\", \"green\", \"blue\"]:\n    ____ = ____ + len(word)\nprint(total)\n
\n👁 View solution
\n\n\ntotal = 0\nfor word in [\"red\", \"green\", \"blue\"]:\n    total = total + len(word)\nprint(total)\n
\n# List of word lengths: [\"red\", \"green\", \"blue\"] => [3, 5, 4]\nlengths = ____\nfor word in [\"red\", \"green\", \"blue\"]:\n    lengths.____(____)\nprint(lengths)\n
\n👁 View solution
\n\n\nlengths = []\nfor word in [\"red\", \"green\", \"blue\"]:\n    lengths.append(len(word))\nprint(lengths)\n
\n# Concatenate all words: [\"red\", \"green\", \"blue\"] => \"redgreenblue\"\nwords = [\"red\", \"green\", \"blue\"]\nresult = ____\nfor ____ in ____:\n    ____\nprint(result)\n
\n👁 View solution
\n\n\nwords = [\"red\", \"green\", \"blue\"]\nresult = \"\"\nfor word in words:\n    result = result + word\nprint(result)\n
\nCreate an acronym: Starting from the list [\"red\", \"green\", \"blue\"], create the acronym \"RGB\" using a for loop.
\nHint: You may need to use a string method to properly format the acronym.
\n👁 View solution
\n\n\nacronym = \"\"\nfor word in [\"red\", \"green\", \"blue\"]:\n    acronym = acronym + word[0].upper()\nprint(acronym)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-25", "source": [ "# Test your code here!" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Cumulative Sum
\nReorder and properly indent the lines of code below\nso that they print a list with the cumulative sum of data.\nThe result should be `[1, 3, 5, 10]`.\n\ncumulative.append(total)\nfor number in data:\ncumulative = []\ntotal += number\ntotal = 0\nprint(cumulative)\ndata = [1,2,2,5]\n
\n👁 View solution
\n\n\ntotal = 0\ndata = [1,2,2,5]\ncumulative = []\nfor number in data:\n total += number\n cumulative.append(total)\nprint(cumulative)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-27", "source": [ "# Do a FizzBuzz" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: A classic programmer test: Fizz Buzz\nFizzBuzz is a classic “test” question that is used in some job interviews to remove candidates who really do not understand programming. Your task is this:
\nWrite a for loop that loops over the numbers 1 to 50.
\n\n
\n- If the number is divisible by 3, write Fizz instead of the number
\n- If the number is divisible by 5, write Buzz instead of the number
\n- If the number is divisible by both 3 and 5, write FizzBuzz instead of the number
\n- Otherwise, write the number itself.
\n\n👁 View solution
\n\n\nfor i in range(1, 51):\n    if i % 3 == 0 and i % 5 == 0:\n        print(\"FizzBuzz\")\n    elif i % 3 == 0:\n        print(\"Fizz\")\n    elif i % 5 == 0:\n        print(\"Buzz\")\n    else:\n        print(i)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-29", "source": [ "# Fix me!\n", "seasons = ['Spring', 'Summer', 'Fall', 'Winter']\n", "print(f'My favorite season is {seasons[4]}')" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Identifying Item Errors\n\n
\n- Read the code below and try to identify what the errors are\nwithout running it.
\n- Run the code, and read the error message. What type of error is it?
\n- Fix the error.
\n\nseasons = ['Spring', 'Summer', 'Fall', 'Winter']\nprint(f'My favorite season is {seasons[4]}')\n
\n👁 View solution
\n\nThis list has 4 elements and, because indexing starts at 0, the index of the last element is `3`. Asking for `seasons[4]` raises an `IndexError`.\n\nseasons = ['Spring', 'Summer', 'Fall', 'Winter']\nprint(f'My favorite season is {seasons[3]}')\n
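
If you would rather not count the elements by hand, the last index can be computed instead. The sketch below shows two options, `len(seasons) - 1` and the negative index `-1`, and also catches the error from the original code so the message can be inspected. The `last_index` and `message` names are only illustrative.

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']

# The last valid index is always one less than the length of the list.
last_index = len(seasons) - 1
print(f'My favorite season is {seasons[last_index]}')

# Negative indices count from the end, so -1 is the last element.
print(f'My favorite season is {seasons[-1]}')

# Index 4 does not exist in a 4-element list; catch the error to see its message.
message = ''
try:
    print(seasons[4])
except IndexError as error:
    message = str(error)
    print(f'IndexError: {message}')
```

Both approaches keep working if more items are later added to the list, whereas a hard-coded `3` would not.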
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-31", "source": [ "# This code accidentally lost it's indentation! Can you fix it?\n", "data = [1, 3, 5, 9]\n", "acc = 0\n", "for i in data:\n", "if i < 4:\n", "acc = acc + i * 2\n", "else:\n", "acc = acc + i\n", "print(f'The value at {i} is {acc}')\n", "print(f'The answer is {acc}')" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Correct the errors\nThis code is completely missing indentation, it needs to be fixed. Can you make some guesses at how indented each line should be?
\n\ndata = [1, 3, 5, 9]\nacc = 0\nfor i in data:\nif i < 4:\nacc = acc + i * 2\nelse:\nacc = acc + i\nprint(f'The value at {i} is {acc}')\nprint(f'The answer is {acc}')\n
\n👁 View solution
\n\n\ndata = [1, 3, 5, 9]\nacc = 0\n# There is a : character at the end of this line, so you KNOW the next line\n# must be indented.\nfor i in data:\n    # Same here, another :\n    if i < 4:\n        acc = acc + i * 2\n    # And again! Another :\n    else:\n        acc = acc + i\n# But what about these lines?\nprint(f'The value at {i} is {acc}')\nprint(f'The answer is {acc}')\n
This code is actually ambiguous: we don’t know how far the two prints should be indented. This very synthetic example lacks good context, but there are three places each print could be, with three different effects.
\nThere are two bits of knowledge we can use, however:
\n\n
\n- the first print uses `i`, so it must be within the loop
\n- the second print cannot be indented more than the first print (Why? It would require a block like `for ... :` or `if ... :` to indent further.)
\n\nThe first option, at the same level as the if and else, prints out the value once per loop iteration, which seems good:
\n\n[...]\n    else:\n        acc = acc + i\n    print(f'The value at {i} is {acc}')\n
The second option, indented to match the body of the else, prints out the value only in the else case, not otherwise:
\n\nelse:\n    acc = acc + i\n    print(f'The value at {i} is {acc}')\n
So that’s probably wrong, and we should take the first option. That leaves two options for the final print, no indentation, or at the same level as our first print statement. We can guess that we probably want to print out the final result of the loop, and that it should not be indented.
\n\ndata = [1, 3, 5, 9]\nacc = 0\nfor i in data:\n    if i < 4:\n        acc = acc + i * 2\n    else:\n        acc = acc + i\n    print(f'The value at {i} is {acc}')\nprint(f'The answer is {acc}')\n
\n\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-33", "source": [ "# We've got a Read\n", "read = \"\"\"\n", "@SEQ_ID\n", "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT\n", "+\n", "55CCF>>>>>>CCCCCCC65!''*((((***+))%%%++)(%%%%).1***-+*''))**\n", "\"\"\".strip().split('\\n')\n", "\n", "def quality_to_percent(q):\n", " return 100 * (1 - (10 ** (q / -10)))\n", "\n", "# Extract the sequence\n", "sequence = read[1]\n", "# And the quality scores, and map those to the correct values.\n", "quality_scores = [ord(x) - 33 for x in read[3]]\n", "\n", "# Write something here\n", "# That loops over BOTH the sequence and Quality Scores.\n", "# And prints them out\n", "# If the quality scores are `<15`, then break and quit printing.\n", "for i in ..." ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Trimming a FASTQ string\nGiven a FASTQ string, and a list with quality scores, use
\n`break` to print out just the good part of the DNA and its quality scores.\n\n# We've got a Read\nread = \"\"\"\n@SEQ_ID\nGATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT\n+\n55CCF>>>>>>CCCCCCC65!''*((((***+))%%%++)(%%%%).1***-+*''))**\n\"\"\".strip().split('\\n')\n\ndef quality_to_percent(q):\n    return 100 * (1 - (10 ** (q / -10)))\n\nsequence = read[1]\nquality_scores = [ord(x) - 33 for x in read[3]]\n\nfor i in ... # TODO\n
\n👁 View solution
\n\nThere are two ways to do this, one you might be able to guess, and one that might be new:
\n\n
\n- Loop over a `range()` using `len(sequence)`. Since `len(sequence) == len(quality_scores)`, when we access the Nth position of either, they match up.
\n- `zip(sequence, quality_scores)` will loop over both of these together. It yields one pair per position; collected into a list, the pairs look like `[('G', 20), ('A', 20), ('T', 34)]`.
\n\n👁 View solution
\n\nThe naïve solution is quite easy and readable:
\n\nfor i in range(len(sequence)):\n    if quality_scores[i] < 15:\n        break\n    print(f'Base {i} = {sequence[i]} with {quality_to_percent(quality_scores[i])}% accuracy')\n
But we can make this a bit prettier using the `zip()` function:\n\nfor base, score in zip(sequence, quality_scores):\n    if score < 15:\n        break\n    print(f'Base = {base} with {quality_to_percent(score)}% accuracy')\n
But note that we don’t have the position in the list anymore, so we remove it from the print statement.
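
If the position is still wanted, one option (not part of the solution above) is to wrap `zip()` in the built-in `enumerate()`, which yields a counter alongside each pair. The sketch below uses a short, made-up `sequence` and `quality_scores` so that it runs on its own; `good_bases` is an illustrative name, and `quality_to_percent` is copied from the exercise.

```python
def quality_to_percent(q):
    return 100 * (1 - (10 ** (q / -10)))

# Made-up example data, much shorter than the read in the exercise.
sequence = 'GATTTGGG'
quality_scores = [20, 20, 34, 34, 37, 0, 6, 6]

good_bases = ''
# enumerate() numbers each (base, score) pair produced by zip(), starting at 0.
for position, (base, score) in enumerate(zip(sequence, quality_scores)):
    if score < 15:
        break  # stop at the first low-quality base
    good_bases = good_bases + base
    print(f'Base {position} = {base} with {quality_to_percent(score):.2f}% accuracy')
print(good_bases)
```

The parentheses around `(base, score)` unpack each pair from `zip()`, while `position` receives the counter from `enumerate()`.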
\n