Type-checking in Python: to prevent you from doing something stupid

The Python community is never too pleased, for some reason, when people ask for extra strong-typing features to be added to the language. The gripe seems to be that strong typing is for languages that are more static than Python, and that strong typing therefore is a betrayal of everything that the language stands for.

Be that as it may, strong typing is there to prevent you from doing something stupid. The dream, I take it, is that if your program makes it through compilation, it will run properly. In that sense, the dream of strong typing is that all problems of properly running code can be reduced to problems of syntax. This is a fine thing to desire from a language.

Just now I was reminded of why it’s sad that Python doesn’t think this way. What I wanted to do was something like this:

some_list = ['a', 'a', 'b', 'c', 'c', 'c', 'd', 'e']
some_dict = dict()

for item in some_list:
    if item in some_dict:
        some_dict[item] += 1
    else:
        some_dict[item] = 1
for (item, count) in sorted(some_dict.iteritems(), key=lambda pair: pair[1], reverse=True):
    print "%s:\t%s" % (count, item)

It’s just supposed to take a list of items and display them in descending order of how frequently they appear. In this code, ‘a’ appears twice, ‘b’, ‘d’, and ‘e’ once, and ‘c’ three times, so the output should be

3:  c
2:  a
1:  b
1:  e
1:  d

Instead what I was getting was an error:

(10:50) slaniel@slaniel-laptop:~/python_test$ python ./dict_destruction_test.py
Traceback (most recent call last):
  File "./dict_destruction_test.py", line 5, in <module>
    if item in some_dict:
TypeError: argument of type 'int' is not iterable

I couldn’t figure out why it was telling me this. Turns out that instead of writing

if item in some_dict:
    some_dict[item] += 1
else:
    some_dict[item] = 1

I had written

if item in some_dict:
    some_dict[item] += 1
else:
    some_dict = 1

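which wiped out some_dict and turned it into the integer 1 on the very first pass through the loop. On the second pass, the membership test was asking whether 'a' was in the integer 1, hence the error.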
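A minimal Python 3 reproduction follows. The name count_with_typo is made up for illustration. It shows that nothing complains at the typo itself, because rebinding a name to a value of another type is legal. With a one-item list the bug would never surface at all.

```python
def count_with_typo(items):
    """Hypothetical reproduction of the buggy counting loop."""
    some_dict = dict()
    for item in items:
        if item in some_dict:       # raises TypeError once some_dict is an int
            some_dict[item] += 1
        else:
            some_dict = 1           # the typo: rebinds the name, no error here
    return some_dict


# With a single item the `in` test never runs against the int, so the
# typo goes unnoticed and the function quietly returns 1.
print(count_with_typo(['a']))       # 1

# With a second item, the membership test runs against the int and fails.
try:
    count_with_typo(['a', 'a', 'b'])
except TypeError as exc:
    print("TypeError:", exc)
```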

A strongly-typed language would declare that dicts are dicts and ints are ints, and never the twain shall meet. It wouldn’t allow me to squash the whole dictionary. Regardless of your ideological priors for programming languages (and it is, by the way, remarkable to me that people do often have such priors), this seems like a desirable outcome. Unless I’m missing something, Python won’t let you protect yourself in this way.

Those four lines of Python could be reduced to this one line:

some_dict[item] = some_dict.get(item, 0) + 1

Inasmuch as reducing the amount of code reduces its expected bug count, this shortening may reduce errors. But it doesn’t solve the larger problem.
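The standard library goes one step further. collections.Counter (available since Python 2.7) does the get-and-increment itself, so no hand-written dict update is left to mistype. Here is a short Python 3 sketch of the whole program in that style:

```python
from collections import Counter

some_list = ['a', 'a', 'b', 'c', 'c', 'c', 'd', 'e']

# Counter tallies each distinct item; no manual dict bookkeeping needed.
counts = Counter(some_list)

# most_common() with no argument returns every (item, count) pair,
# highest count first; ties keep the order items were first seen.
for item, count in counts.most_common():
    print("%s:\t%s" % (count, item))
```

On Python 3.7 or later, this prints c, a, b, d, e in that order. The tie order among b, d, and e can differ from the Python 2 output shown earlier, because Python 2 dicts had no guaranteed order.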

What I really want, what I really really want, is to specify argument types in Python signatures — e.g.,

def some_func( int my_int, str my_str ):
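Python 3 later added optional annotations quite close to this: PEP 3107 introduced the syntax and PEP 484 the type-hint conventions. The type comes after the name rather than before it. In the brief sketch below, CPython stores the annotations on the function but does not check them at call time, so the swapped-argument call still runs. Enforcement is left to separate static checkers such as mypy, which read the annotations before the program runs. For example, if some_dict in the loop above were annotated as dict[str, int], mypy should report the some_dict = 1 line as an incompatible assignment.

```python
# Python 3: "name: type" in the signature, "-> type" for the return value.
def some_func(my_int: int, my_str: str) -> str:
    return my_str * my_int


# The annotations are recorded as plain data on the function object...
print(some_func.__annotations__)

# ...but not enforced: swapping the arguments still runs, because
# 2 * "ab" and "ab" * 2 are both valid Python.
print(some_func(2, "ab"))   # abab
print(some_func("ab", 2))   # abab, no error from the annotations
```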

If I’m writing library code that others are going to use, I end up writing my own garish, hackish type-checking into the function:

def some_func(my_int, my_str):
    if not isinstance(my_int, int):
        raise TypeError("my_int must be of type 'int'; got type '%s' instead" % type(my_int).__name__)
    if not isinstance(my_str, str):
        raise TypeError("my_str must be of type 'str'; got type '%s' instead" % type(my_str).__name__)

This could be abstracted a bit, and could be made more concise with Python decorators, but it’s still not what I want — namely, compiler-level checking that breaks as early as possible if my_int isn’t an int and my_str isn’t a str. Again, I want type violations to be syntax errors, not runtime errors.
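Here is roughly what the decorator version might look like in Python 3, reading the expected types from annotations instead of repeating them in the function body. The decorator name typechecked is an illustrative choice. The sketch only handles annotations that are plain classes, not forms like list[int] or Optional[str]. As noted above, it still fails at call time rather than at compile time.

```python
import functools
import inspect


def typechecked(func):
    """Hypothetical decorator: isinstance()-check annotated arguments per call.

    Sketch only: assumes annotations are plain classes and skips
    *args/**kwargs and any unannotated parameters.
    """
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)  # raises TypeError on bad arity
        for name, value in bound.arguments.items():
            param = sig.parameters[name]
            if param.annotation is inspect.Parameter.empty:
                continue  # unannotated parameters are not checked
            if param.kind in (inspect.Parameter.VAR_POSITIONAL,
                              inspect.Parameter.VAR_KEYWORD):
                continue  # *args/**kwargs are left unchecked in this sketch
            if not isinstance(value, param.annotation):
                raise TypeError(
                    "%s must be of type '%s'; got type '%s' instead"
                    % (name, param.annotation.__name__, type(value).__name__)
                )
        return func(*args, **kwargs)

    return wrapper


@typechecked
def some_func(my_int: int, my_str: str):
    return my_str * my_int


print(some_func(2, "ab"))   # abab
try:
    some_func("ab", 2)
except TypeError as exc:
    print(exc)   # my_int must be of type 'int'; got type 'str' instead
```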

The usual response in the Python community is that they believe in using unit tests rather than type-checking. I don’t understand why one has to choose.

I also don’t understand the harm in making argument-type specification optional. Right now, function prototypes look like

def func_name( arg1, arg2, ..., argN )

Now they’d just look like

def func_name( arg1_type arg1, arg2_type arg2, ..., argN_type argN )

Those look completely different, so far as the compiler is concerned; looking at a bit of source code, it could tell whether the types were specified or not, and could decide whether or not to complain on the basis of a compiler directive — something like

use type_checking

at the top of the file. It seems to me that we can have our type checking and eat our backwards compatibility, too.
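Python 3's annotations did turn out to be optional in much this way. It is also easy to show that a tool can tell typed signatures from untyped ones just by reading the source. The small sketch below uses the standard ast module (Python 3.9+ for ast.unparse). The function annotated_args and the sample source are hypothetical. A checker gated on something like a type_checking directive could start from a table of this kind.

```python
import ast

# Hypothetical module source mixing annotated and unannotated signatures.
SOURCE = '''
def untyped(arg1, arg2):
    pass

def typed(arg1: int, arg2: str):
    pass

def mixed(arg1: int, arg2):
    pass
'''


def annotated_args(source):
    """Map each function name to {argument name: annotation text or None}.

    Only ordinary positional-or-keyword arguments are inspected; the source
    is parsed, never executed.
    """
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            result[node.name] = {
                arg.arg: ast.unparse(arg.annotation)
                if arg.annotation is not None else None
                for arg in node.args.args
            }
    return result


for name, args in annotated_args(SOURCE).items():
    print(name, args)
```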

7 thoughts on “Type-checking in Python: to prevent you from doing something stupid”

  1. Not to be a Lisp weenie, but that’s what I worked on for the last two years . . . so Lisp allows optional type declarations both in argument lists and for all variables, and I agree that it’s a cool and useful feature. I believe you get compile time warnings if there is an apparent violation and run time errors if there actually is a violation. Even cooler, these optional type declarations can be used by the compiler to improve performance. I had this discussion with Martin Martin when I interviewed at ITA, and he seemed to think that while other languages allow for optional type declarations (including Groovy, which he liked a lot), Lisp was the only one that currently used them for performance optimization.

  2. Google for “Optional Static Typing In Python”. This has been proposed for a long time, but, in general, it just can’t seem to get any steam going in the Python community. I mean, the top article was written by Guido van Rossum himself!

    Part of this, I think, has to do with the fact that Python isn’t really compiled. In fact, the Python interpreter doesn’t even look at some code blocks in modules that are loaded until execution time. So even if you had static types, you may not know about a type error until later anyway!

    For static types to help you, there would need to be a Python compiler, but since Python is extremely dynamic, that turns out to be a hard problem. There are attempts at this. Check out Shed Skin, which actually converts a restricted subset of Python to C++ for compilation. I don’t know if it does any additional checking or if it just tries to convert it to semantically equivalent C++ or what.

    Python is just not well built for static typing to help you. It’s well built for rapid prototyping… which is where dynamism and weak to non-existent typing can be useful. If you want a more strongly typed language, there are plenty out there.

  3. I always hear that Python is extremely dynamic, but it’s not entirely clear to me what that means. I don’t think I use the dynamism that I’m supposed to be using, and I wonder how many people do. Types of objects aren’t supposed to change in the middle of the program, are they?

    In Perl it was supposed to be a virtue that you could do something like

    my $some_int = 2;
    $some_int .= " plus three equals 5";
    print $some_int, "\n";

    and have it print “2 plus three equals 5”. The type of $some_int is dynamic: at one moment it’s an int, and at another it’s a string. I never understood why this was supposed to be a good thing.

    Python won’t let you do that:

    >>> some_int = 2
    >>> some_int += " plus three equals 5"
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for +=: 'int' and 'str'

    You need to explicitly cast some_int:

    >>> some_int = 2
    >>> some_int = str(some_int) + " plus three equals 5"
    >>> print some_int
    2 plus three equals 5

    I trust you and Guido that it’s a hard problem to have objects carry around their types in Python, but I’ve not been entirely convinced. Why, in principle, is it any harder for the interpreter to decide that

    some_dict = dict()
    some_dict = 1

    is illegal than it is for the interpreter to decide that

    some_int = 1
    some_int += " is the loneliest number"

    is illegal?

    Finally, as to the choice of an alternative, more-strongly-typed language: one of the virtues of Python is that it’s a great glue language, with a lot of libraries for performing standard sysadmin tasks. Languages like ML or Haskell can’t, I take it, say the same thing. Can you think of a strongly-typed language that performs the same function just as well?

    Actually, the thing about ML or Haskell is that I just don’t see a lot of platforms in wide use that use them. Python’s got Django for the web, for instance; I don’t think ML and Haskell do, though I’d love to be proved wrong: I’ve been looking for an excuse to learn one of those languages forever. In fact, I’ve been looking for that excuse ever since I read an amazing presentation by Mark-Jason Dominus on C, Perl, and ML.

  4. Matt: Martin was wrong. Look at Dylan, for example. Also, I find Lisp’s syntax for type declarations[1] to be very cumbersome compared to Dylan’s. I think in Common Lisp you get run-time errors IF you didn’t (declare (safety 0)) or some such thing.

    mrz & mjd: I think Steve’s point is that he wants a language that can do both. I agree with your contention that Python has too many dynamic features for that to work well. You’d have to be willing to restrict yourself to a subset of the language. I always wondered if the dynamic reputation of Common Lisp was one of the reasons it didn’t catch on in a big way. Given that Python is even more dynamic than CL, I’m gonna have to say that wasn’t the issue. :)

    [1] (let ((foo 1)) (declare (fixnum foo)) (+ foo bar))

  5. """ Why, in principle, is it any harder for the interpreter to decide that

    some_dict = dict()
    some_dict = 1

    is illegal than it is for the interpreter to decide that

    some_int = 1
    some_int += " is the loneliest number"

    is illegal? """

    The first is just associating values with a name and is essentially no different from

    some_dict[some_key] = {}
    some_dict[some_key] = 1

    The second is actually operating on the values. Which isn’t to say that (in principle) the interpreter couldn’t mandate that only values of certain types be associated with a name, but it’s a different principle, at least. After all, the interpreter only decides that the second is illegal when it tries to do it and the code for the int object objects, though I believe there are other tools that can catch these things earlier; the PyPy people probably have one.

    These do actually seem like (semi-)independent issues to me: on the one hand you might want to ensure that certain names only ever hold values within a certain type range, and on the other, you might want to ensure that the operations on a name are consistent with whatever it actually holds, without having to find out at run time. You could have the latter without the former. In the present example, for instance, the most certain you can be about some_dict’s type is that it is either a dict or an int, but ints aren’t iterable, so things will go boom at the if-branch. Probably you want a type-checker that does both, of course.

    There is at least one web framework for Haskell, but I have no idea how mature it is.

  6. Types of objects aren’t supposed to change in the middle of the program, are they?

    It depends, but it’s really less about suddenly changing an object’s type than about things like functions taking specific types. I believe the Python perspective on this is to go with duck typing and not worry about it much beyond that. This is because Python is trying to be more of a prototyping language you can “play” with. That is, they take the perspective of catching type errors with testing because, from their perspective, if you’re jerking around worrying about types, you’re not spending that time working on the functioning part of the code.

    However, I take some issue with this. It would be like saying, why not have an untyped C and just catch all problems using asserts or something. The amount of testing to cover all necessary paths this way gets completely out of hand very quickly.

    Why, in principle, is it any harder for the interpreter to decide that…

    As Ben mentions above, this is because you’re actually operating on the values. Also, keep in mind that Python ignores stuff it’s not actually executing. This Python program prints “Hello”:

    foo = 2
    bar = " is the loneliest number"
    if foo == 1:
        baz = foo + bar
    print "Hello"

    And that’s because Python never evaluates the contents of the if. So, you figure, “why not just look for stuff like this in a first pass?” Well, you could, but that’s not how Python interprets code, probably for performance reasons. It could probably try to do this sort of thing, but be aware that this sort of thing can get really complex. Again, it can be done, but Python is not designed to help with this, because they explicitly seem to make it a non-goal to care about typing that much.

    Finally, as to the choice of an alternative, more-strongly-typed language: one of the virtues of Python is that it’s a great glue language, with a lot of libraries for performing standard sysadmin tasks.

    This is probably due to the fact that Python is such a great prototyping language with a fat set of libraries and it interfaces well with C in both directions. Basically, it lets programmers “bang stuff out” in an easy and convenient way without completely devolving into unreadable madness like Perl.

    Languages like ML or Haskell can’t, I take it, say the same thing. Can you think of a strongly-typed language that performs the same function just as well?

    Well, let’s face it, people seem to hate functional languages in general for whatever reason. But issues like that aside, it seems at least possible to do plenty of web programming if you want in these languages. Just google for “Haskell Web Programming” and you’ll get hits. But, yeah, because these languages are foreign to most people, the level of integration isn’t as great as you might find with something like Python.

    I suppose if you want something that’s “popular”, suitable for web programming, and that has stronger typing than Python, you could always give Java a whirl. Though it seems like the word on the street is that nobody really wants to program in Java anymore. They just do it to put food on the table.
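The example in comment 6 can be checked directly. In the Python 3 sketch below, compile() accepts the int + str line because compiling only catches syntax-level problems. The type error surfaces only when that branch actually executes. The names SOURCE and run are illustrative, and the final print is replaced by an assignment so the result can be inspected.

```python
# Comment 6's program, ported to Python 3.
SOURCE = '''
foo = 2
bar = " is the loneliest number"
if foo == 1:
    baz = foo + bar
result = "Hello"
'''


def run(source):
    """Compile, then execute, source; return the resulting namespace."""
    code = compile(source, "<example>", "exec")  # no type checking here
    namespace = {}
    exec(code, namespace)
    return namespace


# The branch is never taken, so the int + str line never runs.
print(run(SOURCE)["result"])   # Hello

# Take the branch and the very same line fails, but only at run time.
try:
    run(SOURCE.replace("foo = 2", "foo = 1"))
except TypeError as exc:
    print("TypeError:", exc)
```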
