
Introduction to Python iterators and generators

by Alan

In this post I’m going to cover the fundamentals of Python iterators and generators. To understand what generators are, you must first understand what iterators are, and to understand what iterators are, we’ll need to start with the iterator pattern!

Iterators

Python iterators are implementations of a more general design pattern called the iterator pattern [1].

What design problem does the iterator pattern solve?

The iterator pattern is a design pattern [2] used to solve two key problems:

  1. Access elements sequentially from a container without knowing the container’s underlying implementation or representation.
  2. Add new ways of accessing elements sequentially from a container without changing the container’s interface.

The word container refers to a concrete data structure such as an array, a linked list, a tree, or something you invented.

How does the iterator pattern solve these two key problems?

It solves them by pulling the traversal logic out of the container and into a separate iterator object that exposes a small, uniform interface for fetching the next element.

A picture is worth a thousand words, so here’s a common UML (Unified Modeling Language) diagram of the pattern from Wikipedia:

[Figure: UML diagram of the iterator pattern]

In the diagram above, the difference between a concrete aggregate and a plain aggregate is that the aggregate is the interface, whereas the concrete aggregate is the implementation.
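
To make those roles concrete, here’s a minimal sketch of the pattern in plain Python. NumberStack and NumberStackIterator are made-up names for illustration: NumberStack plays the concrete aggregate, NumberStackIterator plays the concrete iterator, and the __iter__ / __next__ methods are Python’s spelling of the pattern, which we’ll dig into next.

class NumberStack:
    """Concrete aggregate: hides a plain Python list behind an interface."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        # Hand out a fresh iterator; callers never touch self._items.
        return NumberStackIterator(self._items)


class NumberStackIterator:
    """Concrete iterator: walks the stack from top (last pushed) to bottom."""

    def __init__(self, items):
        self._items = items
        self._index = len(items) - 1

    def __iter__(self):
        return self

    def __next__(self):
        if self._index < 0:
            raise StopIteration
        value = self._items[self._index]
        self._index -= 1
        return value

>>> for n in NumberStack([1, 2, 3]):
...     print(n)
...
3
2
1

Client code loops over the stack without ever seeing the underlying list, and we could swap in a bottom-up iterator without changing NumberStack’s interface - exactly the two problems from the list above.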

Iterators in Python

In Python, every built-in container is a concrete aggregate. For example, lists, dictionaries, and sets all know how to create an iterator through the same interface. In Python, that interface method is called __iter__; it plays the role of the iterator method in the UML diagram above.

A quick aside: the leading and trailing double underscores in Python method names indicate special / built-in methods. These are also known as “dunder methods”.

Here’s an example of calling __iter__ on a list object:

>>> [].__iter__()
<list_iterator object at 0x10d4a9828>

Calling __iter__ returns a concrete iterator for a list. This list iterator object has the methods __iter__ and __next__. __iter__ returns the iterator object itself and __next__ retrieves the next element from the list.
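
Here’s what driving that list iterator by hand looks like. Note that the iter() and next() built-ins are the idiomatic way to invoke __iter__ and __next__:

>>> it = iter([1, 2])    # same as [1, 2].__iter__()
>>> next(it)             # same as it.__next__()
1
>>> next(it)
2
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Once the list is exhausted, __next__ raises StopIteration to signal that there’s nothing left.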

Do I have to call __iter__ and __next__ myself every time?

Nope! Luckily, you rarely have to call __next__ yourself, because Python’s for loops invoke __iter__ on the container and perform the sequential access for you.

>>> numbers = [1, 2, 3]
>>> for x in numbers:
...     print(x)
...
1
2
3

That’s what’s so great about iterators in Python - you can almost pretend they don’t exist. Looping through any built-in container is just a matter of using a for loop.
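
Under the hood, a for loop is roughly equivalent to the following (a simplified sketch of what the interpreter does for you):

numbers = [1, 2, 3]
it = iter(numbers)       # for invokes numbers.__iter__()
while True:
    try:
        x = next(it)     # ... then calls it.__next__() each pass
    except StopIteration:
        break            # iterator exhausted; the loop ends quietly
    print(x)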

Iterable vs Iterator ... what’s the difference?

An iterable is any object that implements the __iter__ method which returns an iterator. An iterator in python also implements the __iter__ method (which makes it an iterable), but it has an additional __next__ method as well.

Therefore, every iterator is an iterable, because every iterator returns itself through __iter__. The reverse doesn’t hold: most iterables, like lists, are not iterators themselves.

Too much word salad? Here are some examples:

[] # iterable only (no __next__ method)
{} # iterable only
() # iterable only
[].__iter__() # iterator AND iterable (has both __next__ and __iter__)
[].__iter__().__iter__() # iterator AND iterable (we can keep going!)
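
If you’d rather not memorize which dunder methods to look for, the abstract base classes in collections.abc can do the checking for you:

>>> from collections.abc import Iterable, Iterator
>>> isinstance([], Iterable)
True
>>> isinstance([], Iterator)
False
>>> it = iter([])
>>> isinstance(it, Iterable) and isinstance(it, Iterator)
True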

Generators

Now that you have some basic knowledge of iterators, let’s switch gears to generators.

Generators produce values as they’re needed and you can retrieve these values through the same iterator interface (__iter__ and __next__). I like to call them lazy iterators.

Let’s look at some examples. Here’s a definition of a function that returns a single value:

def foo(): 
	return 1

When we call this function, we expect to get back a number. Not too surprising.

>>> foo()
1

Now replace the return keyword with the yield keyword:

def foo(): 
	yield 1

Invoking this function returns an object of type generator:

>>> foo()
<generator object foo at 0x7f26280da0f8>

Weird! OK, let’s nail down a couple of terms before moving forward.

  1. Any function definition containing the yield keyword is a generator function.
  2. Calling a generator function returns a generator object.

Both objects (the function and the thing the function returns) are colloquially referred to as “generators”, so you will need to use context to infer which one people mean.
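
When context isn’t enough, the inspect module can tell the two apart:

>>> import inspect
>>> inspect.isgeneratorfunction(foo)   # the function definition
True
>>> inspect.isgenerator(foo)
False
>>> inspect.isgenerator(foo())         # the object a call returns
True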

Now let’s look closer at what we can do with the generator object.

The generator object

The generator object implements both __iter__ and __next__. Therefore, this object is an iterator!

Calling __next__ is a way of retrieving values from an iterator. Here’s an example of calling __next__ multiple times on this generator object:

>>> my_gen = foo()
>>> my_gen
<generator object foo at 0x7f26280da150>
>>> my_gen.__next__()
1
>>> my_gen.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Each time we call __next__, the generator function executes the statements in its body, just like a typical function call. Unlike a typical function call, though, execution is resumable.

When the body reaches a yield statement, execution pauses and the yielded value is handed back to the caller. The next call to __next__ resumes execution right after that yield. If the body finishes without reaching another yield, StopIteration is raised - exactly what we saw above.

This also makes generator functions lazy - the work between two yields is delayed until the next call to __next__.

Here’s an example of returning more than one value from a generator:

def foo(): 
	print('about to yield 1 ...')
	yield 1 
	print('about to yield 2 ...')
	yield 2 
	print('about to yield 3 ...')
	yield 3

>>> my_gen = foo()
>>> my_gen.__next__()
about to yield 1 ...
1
>>> my_gen.__next__()
about to yield 2 ...
2
>>> my_gen.__next__()
about to yield 3 ...
3
>>> my_gen.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

We can also transform this into a loop to avoid writing yield multiple times:

def foo():
	for i in range(1, 4):
		yield i

>>> my_gen = foo()
>>> my_gen.__next__()
1
>>> my_gen.__next__()
2
>>> my_gen.__next__()
3
>>> my_gen.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
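
As a side note, the next() built-in is the usual way to call __next__, and it accepts an optional default to return instead of raising StopIteration:

>>> my_gen = foo()
>>> next(my_gen)
1
>>> next(my_gen)
2
>>> next(my_gen)
3
>>> next(my_gen, 'done')    # default returned instead of StopIteration
'done'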

Generators in practice

While calling next repeatedly is a way to retrieve values from generator iterators, I’m only doing that for teaching purposes.

In practice, generator iterators are often used in for loops just like any other iterator. For loops accept any iterable and take care of repeatedly calling __next__ and handling the StopIteration exception.

If we rewrite our snippet as a for loop, we see all the numbers printed without having to call __next__ ourselves or handle the StopIteration exception.

>>> for i in foo():
...     print(i)
... 
1
2
3

Looking strictly at the code above, it’s impossible to tell whether we’re iterating through a container or a generator object. I think that’s fantastic! As a client of a generator object, I don’t need the details of how I’m getting the values I need. Thanks to a standard interface by way of __iter__ and __next__, the abstraction benefits of the iterator pattern extend to generator objects too.
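
To drive the point home, here’s a hypothetical helper (total is a made-up name) written purely against the iterator interface - it can’t tell, and doesn’t care, what kind of iterable it receives:

def total(numbers):
    # Works for any iterable: a list, a generator object, a set, ...
    result = 0
    for n in numbers:
        result += n
    return result

>>> total([1, 2, 3])
6
>>> total(foo())
6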

Why bother with generators though? Why not just store these elements in a list?

Generators are great for producing elements lazily. Consider a function that returns the first count Fibonacci numbers:

def fib(count):
    results = []
    a, b = 0, 1
    for n in range(0, count):
        if n == 0: 
            results.append(a)
        elif n == 1:
            results.append(b)
        else:
            a, b = b, a + b 
            results.append(b)
    return results

Executing this function returns a list of Fibonacci numbers:

>>> fib(10)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

What if you have too many numbers to hold in memory? Or - forget about Fibonacci numbers for a second - what if you’re not dealing with a large list of integers but with custom objects that have much higher memory footprints?

My answer is of course “GENERATORS!” but I think it’s worth looking at how this problem can be solved without generators. I think you’ll appreciate them more that way :D

Life without generators

One common approach before Python introduced generators was to perform the operation as the numbers become available. This means putting the client’s operation in the inner loop, right where the elements are created.

Here’s how to modify the original fib function to do that:

def fib(count):
    a, b = 0, 1
    for n in range(0, count):
        if n == 0: 
            do_something_with_fib_number(a)
        elif n == 1:
            do_something_with_fib_number(b)
        else:
            a, b = b, a + b 
            do_something_with_fib_number(b)

Great, the numbers are no longer being accumulated in a list. However, we’re now coupling do_something_with_fib_number to the fib function, which reduces the reusability of fib. A quick way to solve this is to parameterize the function:

def fib(count, fn):
    a, b = 0, 1
    for n in range(0, count):
        if n == 0: 
            fn(a)
        elif n == 1:
            fn(b)
        else:
            a, b = b, a + b 
            fn(b)

This may be sufficient for simple cases, but things get a lot messier if you have to track state - for example, keeping a count of all the even Fibonacci numbers processed. There are ways around this, such as using globals or closures, but those approaches often lead to less readable and maintainable code.
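
For a taste of that messiness, here’s a rough sketch of the closure route for the even-count example (make_even_counter and its names are invented for illustration):

def make_even_counter():
    evens = 0                  # state shared between the two closures

    def consume(n):
        nonlocal evens
        if n % 2 == 0:
            evens += 1

    def report():
        return evens

    return consume, report

>>> consume, report = make_even_counter()
>>> fib(10, consume)           # the parameterized fib from above
>>> report()
4

It works, but the bookkeeping now lives far away from the loop that produces the numbers.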

Life with generators

Now let’s look at how this can be solved with generators. All we need to do is modify the original fib function to yield each number instead of appending it to a list:

def fib(count):
    a, b = 0, 1
    for n in range(0, count):
        if n == 0: 
            yield a
        elif n == 1:
            yield b 
        else:
            a, b = b, a + b 
            yield b

Now we can use fib without any of the memory overhead!

>>> for x in fib(10):
...     print(x)
... 
0
1
1
2
3
5
8
13
21
34
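
Because fib now hands back an iterator, it also composes directly with built-ins that accept any iterable - still without materializing a list:

>>> sum(fib(10))
88
>>> max(fib(10))
34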

References

  1. https://en.wikipedia.org/wiki/Iterator_pattern 

  2. https://en.wikipedia.org/wiki/Software_design_pattern 
