Python-like generator functions in Go

Ever since I stumbled upon David Beazley's excellent tutorial on Python generator functions, I have been very fond of them.

A generator function is basically a function that, rather than returning a complete list of items, produces a sequence of items, handing over the next one each time the caller asks for it.

This allows them to be chained together into long pipelines that don't create any temporary data in memory between the different functions. Instead, only one item at a time is "drawn" through the chain of generator functions (typically by a loop over the last function in the chain). This means you can process arbitrary amounts of data with roughly constant RAM usage, which can often give a slight speedup too.

After getting to know the Go programming language, I was happy to see that implementing generator-like functions in Go is quite easy too. And if they are implemented as goroutines, they even run concurrently by default, and on a multi-core machine potentially in parallel! Below I show how to implement them.
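Whether goroutines actually execute in parallel is governed by the Go runtime's GOMAXPROCS setting, which since Go 1.5 defaults to the number of CPUs available to the program. A small sketch for checking this on your own machine (the helper name `parallelism` is just for illustration):

```go
package main

import (
	"fmt"
	"runtime"
)

// parallelism reports how many OS threads may execute Go code at the
// same time. Passing 0 to runtime.GOMAXPROCS only queries the current
// setting; it does not change it.
func parallelism() int {
	return runtime.GOMAXPROCS(0)
}

func main() {
	fmt.Println("CPUs:", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:", parallelism())
}
```

The setting can be overridden with the `GOMAXPROCS` environment variable; with a value of 1, goroutines still run concurrently, but not in parallel.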

Generators in Python

First, a short look at Python generator functions. They can be implemented in (at least) two ways. For simplicity, say I want a generator function that takes a string and returns its letters uppercased, one at a time. The classic way to implement this in Python is to write a normal function, but replace the return statement with a yield statement, like so:

def generate_uppercase_letters(input_string):
    for letter in input_string:
        yield letter.upper()

Then, to use this function, you can simply loop over it:

for letter_upper in generate_uppercase_letters("hej"):
    print(letter_upper)

Which would produce:

H
E
J

Python also has a much shorter syntax for creating a generator, the generator expression, which is worth mentioning even though it has no Go counterpart. It is basically Python's list comprehension with the square brackets replaced by normal parentheses:

(letter.upper() for letter in input_string)

... which makes chaining together even easier:

for letter_upper in (letter.upper() for letter in "hej"):
    print(letter_upper)

Mimicking generators in Go

Now let's implement the same thing in Go.

package main
 
import (
    "fmt"
    "strings"
)
 
func generateUpperCaseLetters(inputString string) <-chan string {
    // Create a channel on which to send output. It is unbuffered, so
    // every send waits until a receiver is ready to take the value.
    outputChannel := make(chan string)
    // Launch an anonymous function in a separate goroutine that does
    // the actual processing.
    go func() {
        // Loop over the letters (runes) in inputString
        for _, letter := range inputString {
            // Send an uppercased letter to the output channel
            outputChannel <- strings.ToUpper(string(letter))
        }
        // Close the output channel, so anything that loops over it
        // will know that it is finished.
        close(outputChannel)
    }()
    // Hand the channel back as receive-only, so callers can only read from it.
    return outputChannel
}
 
func main() {
    // Loop over the letters communicated over the channel returned
    // from generateUpperCaseLetters() and print them
    for letter := range generateUpperCaseLetters("hej") {
        fmt.Println(letter)
    }
}

So what does this program do? Basically, in the generateUpperCaseLetters() function, it:

  • Creates a channel that it can send output to
  • Launches an anonymous function in a separate goroutine (this is what the go func() { ... }() statement does)
    • ... and this function, running in its own goroutine, loops over the input string, sends the results back one letter at a time on the output channel, and when done, closes the channel to signal that it is indeed finished.
  • Returns the output channel.
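The last step, closing the channel, is what lets range know when to stop. As a sketch of what range does under the hood, the hypothetical helper `collect` below drains the channel by hand using the two-value receive form, whose second result `ok` becomes false once the channel is closed and empty. The generator is essentially the one from the listing above:

```go
package main

import (
	"fmt"
	"strings"
)

// generateUpperCaseLetters is essentially the generator from the listing above.
func generateUpperCaseLetters(inputString string) <-chan string {
	outputChannel := make(chan string)
	go func() {
		for _, letter := range inputString {
			outputChannel <- strings.ToUpper(string(letter))
		}
		close(outputChannel)
	}()
	return outputChannel
}

// collect (hypothetical) reads from ch without range. Building a slice
// defeats the purpose of a generator; it is only done here to show the
// end-of-sequence check.
func collect(ch <-chan string) []string {
	var result []string
	for {
		letter, ok := <-ch
		if !ok {
			// The generator closed the channel: the sequence is finished.
			return result
		}
		result = append(result, letter)
	}
}

func main() {
	fmt.Println(collect(generateUpperCaseLetters("hej"))) // prints [H E J]
}
```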

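Now, when generateUpperCaseLetters() is called, what you get back is not the result but a channel. The goroutine starts right away, but it blocks on its very first send until someone reads from the channel, so apart from preparing that first letter, no work gets done yet. This is very close to how generator functions behave in Python, where nothing in the function body runs until the first value is requested.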
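One consequence of that blocking is worth knowing about: if the consumer stops reading before the channel is closed (say, it breaks out of the loop early), the goroutine stays blocked on its send forever, which is a goroutine leak. An abandoned Python generator is simply garbage-collected, so there is no direct Python counterpart to this problem. A common remedy is to give the generator a done channel (or a context.Context) that the consumer closes when it is finished. A minimal sketch, where `generateUpperCaseLettersWithDone` and `firstN` are hypothetical names:

```go
package main

import (
	"fmt"
	"strings"
)

// generateUpperCaseLettersWithDone is a hypothetical variant of the
// generator that also watches done, so its goroutine can exit if the
// consumer stops reading early.
func generateUpperCaseLettersWithDone(done <-chan struct{}, inputString string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, letter := range inputString {
			select {
			case out <- strings.ToUpper(string(letter)):
			case <-done:
				return // consumer gave up: stop instead of blocking forever
			}
		}
	}()
	return out
}

// firstN takes at most n letters, then closes done (via defer) so the
// generator's goroutine can finish.
func firstN(s string, n int) []string {
	if n <= 0 {
		return nil
	}
	done := make(chan struct{})
	defer close(done)
	var result []string
	for letter := range generateUpperCaseLettersWithDone(done, s) {
		result = append(result, letter)
		if len(result) == n {
			break
		}
	}
	return result
}

func main() {
	fmt.Println(firstN("generator", 3)) // prints [G E N]
}
```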

Only when you start ranging over the channel in the main() function does this "generator-like" function really get going with converting characters to uppercase. This is due to how channels work: they synchronize and communicate at the same time. On an unbuffered channel like this one, a send does not complete until a receiver takes the value, so the goroutine can never get more than one letter ahead of the loop consuming it. You can read much more about this at golang.org.
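To see that synchronization in action, here is a small sketch (the names `countingGenerator` and `maxLead` are hypothetical, and `atomic.Int64` needs Go 1.19 or later). The producer counts how many values it has started sending, and after each receive the consumer checks how far ahead the producer is:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// countingGenerator sends 0..n-1 on an unbuffered channel and records
// in produced how many values it has started to send.
func countingGenerator(n int, produced *atomic.Int64) <-chan int {
	out := make(chan int) // unbuffered: each send waits for a receiver
	go func() {
		defer close(out)
		for i := 0; i < n; i++ {
			produced.Add(1)
			out <- i
		}
	}()
	return out
}

// maxLead drains the generator and returns the largest gap observed
// between values produced and values consumed.
func maxLead(n int) int64 {
	var produced atomic.Int64
	var consumed, lead int64
	for range countingGenerator(n, &produced) {
		consumed++
		if d := produced.Load() - consumed; d > lead {
			lead = d
		}
	}
	return lead
}

func main() {
	fmt.Println("max lead:", maxLead(1000))
}
```

With the unbuffered channel the lead is always 0 or 1. Creating the channel with a buffer instead, e.g. make(chan int, 10), lets the producer run further ahead: up to the buffer size plus the one value it is waiting to send.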

So, in conclusion: you can quite easily implement Python-generator-like functions in Go too. They are definitely not as short and succinct as in Python, but for a concurrent, compiled program, it's not too bad at all, IMO!
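Some of the boilerplate can be factored out, though. As a sketch, assuming Go 1.18 or later for generics, a hypothetical helper `mapChan` plays roughly the role of Python's generator expression (f(x) for x in xs), and stages chain by nesting calls. `fromString` is likewise a hypothetical source generator:

```go
package main

import (
	"fmt"
	"strings"
)

// fromString sends each letter of s on the returned channel, then closes it.
func fromString(s string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, r := range s {
			out <- string(r)
		}
	}()
	return out
}

// mapChan applies f to every value received from in and forwards the
// result, one value at a time, in its own goroutine.
func mapChan[T, U any](in <-chan T, f func(T) U) <-chan U {
	out := make(chan U)
	go func() {
		defer close(out)
		for v := range in {
			out <- f(v)
		}
	}()
	return out
}

func main() {
	// Chain two stages: uppercase each letter, then wrap it in brackets.
	upper := mapChan(fromString("hej"), strings.ToUpper)
	bracketed := mapChan(upper, func(s string) string { return "[" + s + "]" })
	for s := range bracketed {
		fmt.Println(s) // prints [H], [E] and [J], one per line
	}
}
```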

Now, to see how this can be used in practice to speed up a chain of line-by-line processing functions by up to 65%, have a look at this previous blog post!
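The code from that post is not reproduced here, but the general shape of such a chain might look like the following sketch. The stages `readLines`, `skipBlank` and `upperCase` (and the wrapper `runPipeline`) are hypothetical examples, each running in its own goroutine so that the stages can work concurrently while only a line or so is in flight between them:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readLines sends each line read from r, one at a time.
// Errors from scanner.Err() are ignored here for brevity.
func readLines(r io.Reader) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		scanner := bufio.NewScanner(r)
		for scanner.Scan() {
			out <- scanner.Text()
		}
	}()
	return out
}

// skipBlank drops lines that are empty or whitespace-only.
func skipBlank(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for line := range in {
			if strings.TrimSpace(line) != "" {
				out <- line
			}
		}
	}()
	return out
}

// upperCase uppercases every line it receives.
func upperCase(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for line := range in {
			out <- strings.ToUpper(line)
		}
	}()
	return out
}

// runPipeline chains the three stages over the given text.
func runPipeline(text string) <-chan string {
	return upperCase(skipBlank(readLines(strings.NewReader(text))))
}

func main() {
	for line := range runPipeline("first line\n\nsecond line\n") {
		fmt.Println(line) // prints FIRST LINE, then SECOND LINE
	}
}
```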
