More-automatic parallelism in Python — November 29, 2015

More-automatic parallelism in Python

A friend asked a probability question today (viz., if you roll six dice, what’s the probability that at least one of them comes up a 1 or a 5?), so I answered it analytically and then wrote a quick Python simulation to test my analytical answer. That’s all fine, but what annoys me is how serial my code is. It’s serial for a couple reasons:

  1. The Python GIL.
  2. Even if the GIL magically disappeared tomorrow, I’ve got a for-loop in there that’s going to necessarily run serially. I’m running 10 million serial iterations of the “roll six dice” experiment; if I could use all the cores on my quad-core MacBook Pro, this code would run four times as fast — or, better yet, I could run four times as many trials in the same amount of time. More trials is more better.

Most of the code uses list comprehensions as $DEITY intended, and I always imagine that a list comprehension is a poor man’s SQL — i.e., it’s Python’s way of having you explain what you want rather than how you want to get it. If the GIL disappeared, I like to think that the Python SufficientlySmartCompiler would turn all those list comprehensions parallel.

Last I knew, the state of the art in making Python actually use all your cores was to spawn separate processes using the multiprocessing library. Is that still the hotness?

I want parallelism built in at the language level, à la list comprehensions, so that I don’t need to fuss with multiprocessing. “This needs to spawn off a separate process, because of the GIL” is one of the implementation details I’m looking to ignore when I write list comprehensions. I’d have no problem writing some backend multiprocessing code, if it gets buried so far down that I don’t need to think about the backend details in everyday coding, but what I really want is to bake in parallel idioms from the ground up.
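For what it's worth, here is a minimal sketch of what the multiprocessing route looks like for the dice experiment, written in modern Python; the function names and trial counts are my own invention, not code from the original simulation:

```python
import random
from multiprocessing import Pool

def trial(_):
    # One experiment: roll six dice; success if any die shows a 1 or a 5.
    return any(random.randint(1, 6) in (1, 5) for _ in range(6))

def estimate(n_trials, n_procs=4):
    # Farm the trials out across n_procs worker processes.
    with Pool(n_procs) as pool:
        hits = sum(pool.map(trial, range(n_trials), chunksize=1000))
    return hits / n_trials

if __name__ == "__main__":
    # Analytic answer: 1 - (4/6)**6, about 0.912
    print(estimate(100_000))
```

Even in this small sketch, the "how" leaks through: you have to know that `trial` must be a picklable top-level function, and you have to pick a process count and a chunk size by hand.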

Thinking about what you want rather than how you want to obtain it is why I love SQL, and it’s why LINQ seems like a really good idea (though I’ve never used it). But even the versions of SQL that I work with require a bit more fussing with implementation details than I’d like. For instance, inner joins are expensive, so we only join two tables at a time. So if I know that I want tables A, B, and C, I need to create two sub-tables: one that joins A and B, and another that joins A-and-B with C. And for whatever reason, the SQL variants I use need me to be explicit about all the pairwise join conditions — i.e., I need to do

select foo
from A, B, C
where A.foo = B.foo
    and B.foo = C.foo
    and A.foo = C.foo

even though that final and-condition follows logically from the first two. And I can’t just do “select foo” here, or SQL would complain that ‘”foo” is ambiguous’. But if A.foo and B.foo are equivalent — as the SELECT statement says — then it doesn’t matter whether my mentioning “foo” means “A.foo” or “B.foo”.

The extent of my knowledge of declarative programming is basically everything I wrote above. I don’t even know if “declarative programming” captures all and only the things I’m interested in. I want optimization with limited effort on my part (e.g., SQL turns my query into a query plan that it then turns into an optimized set of instructions). I also want minimal overhead — a minimal gap between what I’m thinking and what I have to type as code; that’s why I hate Java. Granted, if adding overhead in the form of compiler hints will lead to optimizations, then great; I’d hardly even call that “overhead.”

At a practical level, I’d like to know how to implement all of this in Python — or, hell, in bash or Perl, if it’s easy enough.

Python SIGPIPE — September 25, 2015

Python SIGPIPE
I ran into the same Python problem as this fellow. Namely: he’s written a script that dumps lines to stdout, pipes its output through head(1), and gets this:

Traceback (most recent call last):
  File "/home/slaniel/bin/", line 25, in 
  File "/home/slaniel/bin/", line 22, in main
    print "".join(["%s %s\n" % (value, key) for key, value in sorted_list])
IOError: [Errno 32] Broken pipe

I.e., Python still has data in the pipe, ready to go to stdout, but it can’t send it because head(1) exited. So the script gets SIGPIPE, and Python traps that as an IOError exception. The solution is straightforward:

from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE, SIG_DFL)

This DFL thing is new to me:

This is one of two standard signal handling options; it will simply perform the default function for the signal. For example, on most systems the default action for SIGQUIT is to dump core and exit, while the default action for SIGCHLD is to simply ignore it.

If I’m reading that right, Python replaces the default SIGPIPE behavior with a thrown exception. To make the signal yield the system default, you need to tell Python explicitly to do that.

Two questions:

  1. Why would Python do this? Is the general logic that it’s trying to “internalize” systemwide behaviors? Maybe it wants a script to be “write once, run anywhere”, so it can’t just accept the systemwide behavior. Instead, it has to turn external system events (like SIGPIPE) into internal program behaviors (like exceptions). Is that the idea?
  2. I don’t want to have to tell every one of my scripts to exit silently when it receives SIGPIPE. So I would prefer not to write
    from signal import signal, SIGPIPE, SIG_DFL
    signal(SIGPIPE, SIG_DFL)

    in every script that I ever produce. Do people have a general approach here? E.g., every script does an import Steve_lib (or your own equivalent) that sets up the expected signal-handling defaults?
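As a sketch of that shared-module idea (calling the module steve_lib after the example above; the name is arbitrary, and this is my guess at an implementation, not anyone's actual library):

```python
# steve_lib.py -- hypothetical shared module. Importing it anywhere
# restores the OS default SIGPIPE behavior, so the script dies quietly
# when the reader end of its pipe goes away, instead of raising IOError.
from signal import signal, SIGPIPE, SIG_DFL

signal(SIGPIPE, SIG_DFL)
```

Then every script just starts with `import steve_lib`, and the handler is installed as a side effect of the import.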

Does Heartbleed mean that C should die? — April 12, 2014

Does Heartbleed mean that C should die?

The “Does that pretty much wrap it up for C?” piece (via my man Jamie Forrest) is interesting, but I think he needs to talk it out a bit more. I mean, at *some* level, *someone* is going to have to do memory allocation on bare metal. And what do we do then? And there are always going to be functions that need high performance, because they’re in the middle of some tight inner loop. Or in the SSL case, *someone* is going to need to do very specific things with memory, like making sure it’s not holding any sensitive data.

My understanding of modern malloc implementations is that they include all kinds of sophisticated ways to prevent buffer-overflow attacks. When you request a block of memory, they set it up such that requests past the end of your block cause a segfault. Or they randomize the blocks they give you, so that you can’t just grab the next few bytes and expect there to be anything there.

I’m not a C programmer (I really need to know it, I think, to be a complete programmer), but all of this says a couple things to me:

1. If you use the right libraries, you should be protected against a lot of stupid behavior. Makes you wonder, for instance, why the OpenSSL team wasn’t using tcmalloc or ptmalloc. I’m sure there’s a reason; I just don’t know the problem space well enough to say.
2. Any serious software system, whether down at the bare metal like C or higher up like Python, is going to require lots of testing, regardless of whether it’s got compile-time type safety. There should be lots of unit tests. Ideally, the unit tests would also be able to simulate other components, using mock objects and whatnot. And then you need integration tests to see how well your component integrates with others. And then, in the case of a secure system, you probably need to bombard it with very focused buffer-overflow attacks, written by dudes who know the code inside and out. (Sort of like penetration testing within a company, on the assumption that you’re most vulnerable to your own employees.) And for performance reasons, you should also test it by bombarding it with millions of requests per second and seeing where it breaks. Testing is hard. QA is hard, and is very often not respected as a peer of engineering. Engineering is sexier. If you’re really good at QA, you’re spending your time writing systems to test many thousands of cases rather than just grinding out the same manual test over and over, and you’d probably rather be off building something new. Engineers also feel this way: they’d rather be writing new versions of the code than maintaining the old stuff.
3. An ideal team will learn from its mistakes and build systems that prevent the same bug — or similar bugs — from reappearing.
4. Building good software requires a good organization and good management (whether by “management” we mean someone who’s controlling the work product of his direct reports, or something broader like “group structure”). This is a variant of Conway’s Law: “Organizations which design systems are constrained to produce systems which are copies of the communications structures of these organizations.”

Let me be clear that I say all of this with absolutely no understanding of the OpenSSL code base, much less an understanding of the OpenSSL team’s structure. But it just strikes me that blaming an OpenSSL bug on the C language doesn’t really get at the problem. A successful software system will fix this mistake and ensure that it never happens again. A successful *open-source* software system will take community direction to build such a resilient system, and will do it all with a fully open process. That goes beyond narrow issues of language choice.

A realization about writing and programming — March 29, 2011

A realization about writing and programming

Level-zero programmers obsess over essentially unimportant details of programming: whether you should write blocks like

void someFunction(void) {
    printf("Yes\n");
}

or like

void someFunction(void)
{
    printf("Yes\n");
}

, for instance, or whether to use vim or emacs to edit your code. These are unimportant for at least a couple reasons. First, you can solve these sorts of style problems in an entirely automated way. Second, there are many, many things that are more important than these sorts of nits; they don’t even really impact your code’s readability, for one. These details don’t even count as ‘style’. They’re purely mechanical.

A level-zero English-language writer pays attention to Strunk & White. What’s odd about Strunk & White is that, far from being about [book: The Elements of Style] like its name suggests, it’s really [book: The Elements of Grammar]. True, grammar does matter. Inasmuch as “style” means “writing to appeal to your audience,” and inasmuch as your audience cares about little grammar nits, grammar is important. Grammar, in this view, is arbitrary but important. It’s like the location of the silverware at dinner, or like not chewing with your mouth open: it’s an arbitrary convention that certain groups of people pay attention to. It’s a class identifier. Using “whom” properly, or not ending a sentence with a preposition, is a class identifier. It’s a way to signal to people of your class that you’re one of them. Others will completely miss the signal, which doesn’t make them rubes; it just means that they don’t subscribe to your particular class signals. So call it [book: Elements of Writing For The Wealthy], perhaps.

But even with that caveat out of the way, and even if you don’t agree with Language Log that Strunk & White is a pile of trash, it’s still the case that [book: The Elements of Style] is wildly, comically unimportant to the act of writing readable text. Whether you use “whom” properly (and I’m a dues-paying member of the Whom-Using Board Of Pedants) has practically nothing to do with whether people will read your writing all the way through to the end.

Grammar is important to get you off the ground, in a certain sense. Works riddled with typos are often hard to get through. Using commas where you mean to use semicolons will sound wrong in your reader’s ear *if he’s trained to read them that way* (so again, the rule “know your audience” is logically prior to the rule “use punctuation properly”).

But that’s just the point: this is level-zero stuff. These are the rules you pay attention to because they’re rote and mechanical, and thereby easier to remember and implement than “grab your reader with a good hook” or “use lots of examples when you’re arguing an abstract point.” They’re high-school rules; they’re not adult rules.

Three progressively better ways to generate a random string in Python — March 23, 2010

Three progressively better ways to generate a random string in Python

Version 1 here is what I started with. Version 2 comes from a colleague, version 3 from another colleague.
Version 4 just makes things a little more succinct.

I enjoyed watching this simple function get cleaner as the days went by.

import random
import string

def random_str_1(n):
    """Generate a pseudorandom string of
    length n."""
    out_str = ""
    letters = ['a','b','c','d','e','f','g','h','i','j','k',
               'l','m','n','o','p','q','r','s','t','u','v',
               'w','x','y','z']
    for i in xrange(n):
        out_str += random.choice(letters)
    return out_str

def random_str_2(n):
    """An improvement, using the fact that Python
    strings are iterables that iterate over their
    characters."""
    out_str = ""
    letters = 'abcdefghijklmnopqrstuvwxyz'
    for i in xrange(n):
        out_str += random.choice(letters)
    return out_str

def random_str_3(n):
    """A further improvement, using an existing
    library attribute."""
    out_str = ""
    for i in xrange(n):
        out_str += random.choice(string.ascii_lowercase)
    return out_str

def random_str_4(n):
    """Adding a bit of concision."""
    return "".join([random.choice(string.ascii_lowercase)
                    for x in xrange(n)])

def test_all():
    """Not really much of a unit test. Just confirms that
    each of the generated strings is as long as specified.
    Doesn't test that strings are sufficiently random."""
    methods = [random_str_1, random_str_2, random_str_3, random_str_4]
    for n in xrange(40):
        for method in methods:
            out_str = method(n)
            assert len(out_str) == n
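(For what it's worth, more recent Python — 3.6 and later, well after this post was written — allows an even shorter version of the same idea; this is my addition, not part of the original series:)

```python
import random
import string

def random_str_5(n):
    # random.choices draws n elements with replacement in one call.
    return "".join(random.choices(string.ascii_lowercase, k=n))
```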

Automatic memoization: cleverness to solve the wrong problem — January 23, 2010

Automatic memoization: cleverness to solve the wrong problem

This is the first time in my career that I’ve used JavaScript extensively, so I’m trying to learn some best practices. A few people have recommended Crockford’s [book: JavaScript: The Good Parts], so I picked it up. While skimming through it looking for something else, I ran into his description (on page 44) of using memoization (essentially, caching) to speed up the computation of certain functions.

The example people always use here — and Crockford is no exception — is the Fibonacci sequence. The shortest way to implement it is recursively. As Crockford points out, the recursive solution doesn’t scale: the number of recursive calls necessary to compute the nth Fibonacci number is proportional to the nth power of the golden ratio. (From what I can see, the constant of proportionality converges very rapidly to about 1.11803. That number must have some special name.) I’ve coded both versions up in Python; the recursive version keeps track of how many function calls it had to make.
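The original code isn't reproduced here, but a sketch of the two versions might look like this (the call-counting mechanism is my own guess at how the original kept its tally):

```python
def fib_recursive(n, counter):
    # Exponential-time version; counter[0] tallies the function calls.
    counter[0] += 1
    if n < 2:
        return n
    return fib_recursive(n - 1, counter) + fib_recursive(n - 2, counter)

def fib_iterative(n):
    # Linear-time, constant-memory version.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

calls = [0]
print(fib_recursive(20, calls), calls[0])  # the call count grows like phi**n
```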

So then Crockford’s solution, and the solution written in lots of places ([book: Higher-Order Perl], for instance) is to use memoization: cache the results of fib(n) for smaller n, then use the cached results rather than computing those values anew when you need them later on.

This isn’t really a solution, though, as my friend Seth pointed out to me some months ago. It’s certainly clever, and languages like Perl or JavaScript make it very easy. In Perl, you can automatically memoize any of your functions with the Memoize module: just do

use Memoize;
memoize('fib');

and voilà: all of your calls to `fib()` will be memoized from then on. Pretty cool, really. Manipulating the symbol table is kind of neat.
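The closest Python analogue I know of is functools.lru_cache, which was added in Python 3.2, after this post was written; the principle is the same — wrap the function and cache its results:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Same naive recursive definition; the decorator supplies the cache.
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```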

But this has only temporarily disguised the problem. Let’s look at the time and space requirements for all three of our `fib(n)` implementations:

__Recursive, non-memoized__: exponentially many function calls, hence exponential running time. The call stack only ever gets about [math: n] frames deep, though, so linear memory usage.

__Recursive, memoized__: linear memory usage (one cache entry for every [math: i], for [math: i] less than or equal to [math: n]). Linear running time.

__Iterative, non-memoized__: constant memory usage (must keep the [math: n]th, [math: (n-1)]st, and [math: (n-2)]nd values of the sequence in memory at all times, but that’s it), linear running time.

By using some language cleverness, you’ve made the problem harder than you need to: memoization increases your memory usage from constant to linear, and does nothing to your running time. By thinking harder about the problem, you can improve both performance aspects and not need any clever language business.

Seth has told me for quite a long time that recursion and higher-order programming (closures and so forth) are interesting but make debugging a lot harder. His contention would probably be that you can often replace a clever recursive solution with a non-clever, easier-to-debug, better-performing one.

That said, at least some higher-order-programming tools make my life easier. In Python, I love list comprehensions:

print [x**2 for x in xrange(1,11) if (x**2 % 2 == 0)]

or the older-school but perhaps more familiar `map()` and `filter()`:

print filter(lambda x : x % 2 == 0, map(lambda x : x**2, xrange(1,11)))

(Because I favor readability over concision, in practice I would expand this into several lines: a `map()` line, a `filter()` line, and so forth.)
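The multi-line expansion described there might look like this (written for modern Python, where `map()` and `filter()` return lazy iterators):

```python
# Each transformation gets its own named, readable step.
squares = map(lambda x: x ** 2, range(1, 11))
even_squares = filter(lambda x: x % 2 == 0, squares)
print(list(even_squares))
```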

As I’ve mentioned, I lament the absence of first-class functions in Java. Then again, I’ve seen enough people shoot themselves (and me) in the foot with unreadable Perl or Python, using all the cleverness that the languages provide to them, that I think I’d be okay with a less expressive language that forces programmers to be really boring.