More-automatic parallelism in Python — November 29, 2015

More-automatic parallelism in Python

A friend asked a probability question today (viz., if you roll six dice, what’s the probability that at least one of them comes up a 1 or a 5?), so I answered it analytically and then wrote a quick Python simulation to test my analytical answer. That’s all fine, but what annoys me is how serial my code is. It’s serial for a couple reasons:

  1. The Python GIL.
  2. Even if the GIL magically disappeared tomorrow, I’ve got a for-loop in there that’s going to necessarily run serially. I’m running 10 million serial iterations of the “roll six dice” experiment; if I could use all the cores on my quad-core MacBook Pro, this code would run four times as fast — or, better yet, I could run four times as many trials in the same amount of time. More trials is more better.

Most of the code uses list comprehensions as $DEITY intended, and I always imagine that a list comprehension is a poor man’s SQL — i.e., it’s Python’s way of having you explain what you want rather than how you want to get it. If the GIL disappeared, I like to think that the Python SufficientlySmartCompiler would turn all those list comprehensions parallel.

Last I knew, the state of the art in making Python actually use all your cores was to spawn separate processes using the multiprocessing library. Is that still the hotness?

I want parallelism built in at the language level, à la list comprehensions, so that I don’t need to fuss with multiprocessing. “This needs to spawn off a separate process, because of the GIL” is one of the implementation details I’m looking to ignore when I write list comprehensions. I’d have no problem writing some backend multiprocessing code, if it gets buried so far down that I don’t need to think about the backend details in everyday coding, but what I really want is to bake in parallel idioms from the ground up.

Thinking about what you want rather than how you want to obtain it is why I love SQL, and it’s why LINQ seems like a really good idea (though I’ve never used it). But even the versions of SQL that I work with require a bit more fussing with implementation details than I’d like. For instance, inner joins are expensive, so we only join two tables at a time. So if I know that I want tables A, B, and C, I need to create two sub-tables: one that joins A and B, and another that joins A-and-B with C. And for whatever reason, the SQL variants I use need me to be explicit about all the pairwise join conditions — i.e., I need to do

select foo
from A, B, C
where =
    and =
    and =

even though that final and-condition follows logically from the first two. And I can’t just do “select foo” here, or SQL would complain that ‘”foo” is ambiguous’. But if and are equivalent — as the SELECT statement says — then it doesn’t matter whether my mentioning “foo” means “” or “”.

The extent of my knowledge of declarative programming is basically everything I wrote above. I don’t even know if “declarative programming” captures all and only the things I’m interested in. I want optimization with limited effort on my part (e.g., SQL turns my query into a query plan that it then turns into an optimized set of instructions). I also want minimal overhead — a minimal gap between what I’m thinking and what I have to type as code; that’s why I hate Java. Granted, if adding overhead in the form of compiler hints will lead to optimizations, then great; I’d hardly even call that “overhead.”

At a practical level, I’d like to know how to implement all of this in Python — or, hell, in bash or Perl, if it’s easy enough.

Python SIGPIPE — September 25, 2015


I ran into the same Python problem as this fellow. Namely: he’s written a script that dumps lines to stdout, and then runs | head

and gets this:

Traceback (most recent call last):
  File "/home/slaniel/bin/", line 25, in 
  File "/home/slaniel/bin/", line 22, in main
    print "".join(["%s %s\n" % (value, key) for key, value in sorted_list])
IOError: [Errno 32] Broken pipe

I.e., Python still has data in the pipe, ready to go to stdout, but it can’t send it because head(1) exited. So gets SIGPIPE, and Python traps that as an IOError exception. The solution is straightforward:

from signal import signal, SIGPIPE, SIG_DFL

This DFL thing is new to me:

This is one of two standard signal handling options; it will simply perform the default function for the signal. For example, on most systems the default action for SIGQUIT is to dump core and exit, while the default action for SIGCHLD is to simply ignore it.

If I’m reading that right, Python replaces the default SIGPIPE behavior with a thrown exception. To make the signal yield the system default, you need to tell Python explicitly to do that.

Two questions:

  1. Why would Python do this? Is the general logic that it’s trying to “internalize” systemwide behaviors? Maybe it wants a script to be “write once, run anywhere”, so it can’t just accept the systemwide behavior. Instead, it has to turn external system events (like SIGPIPE) into internal program behaviors (like exceptions). Is that the idea?
  2. I don’t want to have to tell every one of my scripts to exit silently when it receives SIGPIPE. So I would prefer not to write
    from signal import signal, SIGPIPE, SIG_DFL

    in every script that I ever produce. Do people have a general approach here? E.g., every script does an import Steve_lib (or your own equivalent) that sets up the expected signal-handling defaults?
Three progressively better ways to generate a random string in Python — March 23, 2010

Three progressively better ways to generate a random string in Python

Version 1 here is what I started with. Version 2 comes from a colleague, version 3 from another colleague.
Version 4 just makes things a little more succinct.

I enjoyed watching this simple function get cleaner as the days went by.

import random
import string

def random_str_1(n):
Generate a pseudorandom string of
length n.
out_str = “”
letters = [‘a’,’b’,’c’,’d’,’e’,’f’,’g’,’h’,’i’,’j’,’k’,
for i in xrange(n):
out_str += random.choice(letters)
return out_str

def random_str_2(n):
An improvement, using the fact that Python
strings are iterables that iterate over their
out_str = “”
letters = ‘abcdefghijklmnopqrstuvwxyz’
for i in xrange(n):
out_str += random.choice(letters)
return out_str

def random_str_3(n):
A further improvement, using an existing
library attribute.
out_str = “”
for i in xrange(n):
out_str += random.choice(string.ascii_lowercase)
return out_str

def random_str_4(n):
Adding a bit of concision.
return “”.join([random.choice(string.ascii_lowercase)
for x in xrange(n)])

def test_all():
Not really much of a unit test. Just confirms that
each of the generated strings is as long as specified.
Doesn’t test that strings are sufficiently random.
methods = [random_str_1, random_str_2, random_str_3, random_str_4]
for n in xrange(40):
for method in methods:
out_str = method(n)
assert len(out_str) == n

Automatic memoization: cleverness to solve the wrong problem — January 23, 2010

Automatic memoization: cleverness to solve the wrong problem

This is the first time in my career that I’ve used JavaScript extensively, so I’m trying to learn some best practices. A few people have recommended Crockford’s [book: JavaScript: The Good Parts], so I picked it up. While skimming through it looking for something else, I ran into his description (on page 44) of using memoization (essentially, caching) to speed up the computation of certain functions.

The example people always use here — and Crockford is no exception — is the Fibonacci sequence. The shortest way to implement it is recursively. As Crockford points out, the recursive solution doesn’t scale: the number of recursive calls necessary to compute the nth Fibonacci number is proportional to the nth power of the golden ratio. (From what I can see, the constant of proportionality converges very rapidly to about 1.11803. That number must have some special name.) I’ve coded both versions up in Python; the recursive version keeps track of how many function calls it had to make.

So then Crockford’s solution, and the solution written in lots of places ([book: Higher-Order Perl], for instance) is to use memoization: cache the results of fib(n) for smaller n, then use the cached results rather than computing those values anew when you need them later on.

This isn’t really a solution, though, as my friend Seth pointed out to me some months ago. It’s certainly clever, and languages like Perl or JavaScript make it very easy. In Perl, you can automatically memoize any of your functions with the Memoize module: just do

use Memoize;

and voilà: all of your calls to `fib()` will be memoized from then on. Pretty cool, really. Manipulating the symbol table is kind of neat.

But this has only temporarily disguised the problem. Let’s look at the time and space requirements for all three of our `fib(n)` implementations:

__Recursive, non-memoized__: exponentially many function calls, hence exponentially many stack frames, hence exponential memory usage. Linear running time.

__Recursive, memoized__: linear memory usage (one cache entry for every [math: i], for [math: i] less than or equal to [math: n]). Linear running time.

__Iterative, non-memoized__: constant memory usage (must keep the [math: n]th, [math: (n-1)]st, and [math: (n-2)]th values of the sequence in memory at all times, but that’s it), linear running time.

By using some language cleverness, you’ve made the problem harder than you need to: memoization increases your memory usage from constant to linear, and does nothing to your running time. By thinking harder about the problem, you can improve both performance aspects and not need any clever language business.

Seth has told me for quite a long time that recursion and higher-order programming (closures and so forth) are interesting but make debugging a lot harder. His contention would probably be that you can often replace a clever recursive solution with a non-clever, easier-to-debug, better-performing one.

That said, at least some higher-order-programming tools make my life easier. In Python, I love list comprehensions:

print [x**2 for x in xrange(1,11) if (x**2 % 2 == 0)]

or the older-school but perhaps more familiar `map()` and `filter()`:

print filter(lambda x : x % 2 == 0, map(lambda x : x**2, xrange(1,11)))

(because I favor readability over concision, in practice I would expand this into several lines: a `map()` line, and a `filter()` line, and so forth)

As I’ve mentioned, I lament the absence of first-class functions in Java. Then again, I’ve seen enough people shoot themselves (and me) in the foot with unreadable Perl or Python, using all the cleverness that the languages provide to them, that I think I’d be okay with a less expressive language that forces programmers to be really boring.