Quick, anonymous lists in Java?

slaniel | Java | Sunday, November 29th, 2009

Suppose I have a function in Python that can take either True or False as an argument, à la

import sys
def doSomethingWith(myBool):
    if myBool is True:
        print "You have experienced profound Truth."
    elif myBool is False:
        sys.stdout.write("You have brought disrepute upon your "
            "venerable ancestors.\n")

Now suppose I wanted to unit-test this function through all possible code paths. I could do something like so:

def testDoSomethingWith():
    for boolean in [True, False]:
        doSomethingWith(boolean)

How do I make an anonymous list like that in Java? The shortest way I know to do this, so far, is

List<Boolean> trueAndFalse = new ArrayList<Boolean>();
trueAndFalse.add(Boolean.TRUE);
trueAndFalse.add(Boolean.FALSE);
for( Boolean bool : trueAndFalse ) {
    doSomethingWith(bool);
}

Granted, this is doing a bit more than the Python does: it’s declaring that trueAndFalse is a list of Booleans, which is nice. But this would get really untenable if, instead of just Boolean.TRUE and Boolean.FALSE, I were testing doSomethingWith() over a range of integers.

So: is there some way to quickly iterate over an anonymous typed list in Java?

Does anything actually improve programmer productivity?

slaniel | Programming languages | Saturday, November 28th, 2009

I make a stink on here periodically about strong static typing in programming languages — namely the contention that

  1. All objects should carry around their “types” (e.g., an object is an integer, or a string, or an EmployeeRecord, or an ArticleOfImpeachment, or an ArianeRocketCurrentAltitude, or some other type that tries to model a piece of the real world)
  2. Types can become other types only through some piece of syntax which makes clear that that’s what’s happening.
  3. The only operations which can be performed on a given object are those actions which are appropriate to that object’s type.
  4. As much checking as possible should happen at compile time.

Python does 1) through 3), but not 4). So for instance, if your Python program looks like so:

#!/usr/bin/python

my_int = 3
my_str = "the largest number I can conceive of is "
my_str += my_int

and you try to run it, Python will smack you with a quickness:

(15:31) slaniel@Steve-Laniels-MacBook-Pro:~/python_test$ python ./concatenate_str_and_int.py 
Traceback (most recent call last):
  File "./concatenate_str_and_int.py", line 5, in <module>
    my_str += my_int
TypeError: cannot concatenate 'str' and 'int' objects

Python is telling you here that if you want to put a string together with an integer, you have to tell the interpreter to turn the integer into a string first, à la

my_str += str(my_int)

whence you get the perfectly fine output “the largest number I can conceive of is 3”.

Python doesn’t bother checking the parts of your code that never get executed; hence if you stick the offending line inside a function and never call that function

#!/usr/bin/python

def some_func():
    my_int = 3
    my_str = "the largest number I can conceive of is "
    my_str += my_int

and try to run it, Python will not complain at all. The Python interpreter doesn’t consider the types of the values involved until the line that uses them actually runs. Which is to say that if you then call some_func(), Python will complain:

#!/usr/bin/python

def some_func():
    my_int = 3
    my_str = "the largest number I can conceive of is "
    my_str += my_int

def main():
    some_func()

main()

Running this produces the same TypeError that we got up above.
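
The split between compile time and run time can be made more explicit. Here is a minimal sketch, written for Python 3 rather than the Python 2 used above; the names SOURCE and check_stages are made up for illustration. It byte-compiles the function, defines it, and only then calls it, recording which of those stages succeeds.

```python
# The same function as above, held as source text so each stage can be run separately.
SOURCE = '''
def some_func():
    my_int = 3
    my_str = "the largest number I can conceive of is "
    my_str += my_int
'''


def check_stages(source):
    """Report whether compiling, defining, and calling some_func succeed."""
    stages = {}

    # Byte-compilation checks syntax only; it does no type checking.
    code = compile(source, "<example>", "exec")
    stages["compiled"] = True

    # Executing the module-level code runs the def statement, not the body.
    namespace = {}
    exec(code, namespace)
    stages["defined"] = callable(namespace["some_func"])

    # Only now does the body run, and only now does the type error appear.
    try:
        namespace["some_func"]()
        stages["called"] = True
    except TypeError:
        stages["called"] = False

    return stages


if __name__ == "__main__":
    print(check_stages(SOURCE))
```

The first two stages go through without complaint; the call is the only one that fails.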

That’s just a bit of ground-clearing, to differentiate between strongly-typed languages, in which every object has a type and conversions between types must be explicit, and statically-typed ones, in which as much checking as possible happens at compile time. Python is strongly typed but not statically typed.

The typical response of Python programmers — and it is an eminently reasonable response — is that you need unit tests. That is, every possible path into and out of your code should be exercised. That forces the Python interpreter to test our some_func(), and would reveal our problems before they reach production. Here I would ask why we can’t tell the interpreter to explicitly check inside all functions, whether or not they lie in an accessed code path, but that’s not really germane to my argument.
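
A minimal sketch of such a test, written for Python 3 with the standard unittest module. The return statement in some_func and the names check_some_func and run_check are additions for illustration; the version of some_func above returns nothing.

```python
import unittest


def some_func():
    my_int = 3
    my_str = "the largest number I can conceive of is "
    my_str += my_int  # the type error hides here until the line runs
    return my_str     # assumption: added so the test has something to check


def check_some_func():
    """The check a unit test would make: call the function, inspect the result."""
    assert some_func() == "the largest number I can conceive of is 3"


def run_check():
    """Run the check through unittest and return the collected result."""
    result = unittest.TestResult()
    unittest.FunctionTestCase(check_some_func).run(result)
    return result


if __name__ == "__main__":
    result = run_check()
    print("tests run:", result.testsRun)
    print("errors:", len(result.errors))
    for _test, traceback_text in result.errors:
        print(traceback_text)
```

Because the test calls some_func(), the bad line finally executes, and unittest records the TypeError as an error instead of letting it wait for production.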

But in any case, Python programmers tell you that you need unit tests. If you have unit tests covering your code, you can be confident that it will run properly in the wild; if you don’t, you can’t be. Therefore static typing is at best superfluous and likely will slow you down to no valid end.

There’s a whole suite of practices that modern programming insists on, among which are

  • unit testing
  • code review (i.e., another pair of eyes has gone over your code)
  • static type checking
  • object-oriented programming
  • agile programming of one form or another
  • Higher-level languages. Over time, we’ve moved from assembly language — manually putting things into and taking things out of registers — to languages like C which abstract away some of the details of the hardware but still require you to allocate and free memory, to languages like Java that “garbage-collect” memory for you. (You can take this further, to languages like SQL that let you specify what you want and take care of the details of how you get what you want.)

My question is: have any of these demonstrably improved software development? You can measure that in any number of ways:

  • average number of bugs per line of code
  • time from initial design meeting to code being released
  • customer satisfaction along various measures (e.g., the famous tire-swing demonstration of the software-design process)
  • time through one iteration, from design through development to release through bug fixing
  • developer productivity, measured by lines of code per day, features per day, etc.
  • corporate income, corporate profits, etc.
  • security: how vulnerable is the product to attack?
  • domain-specific metrics: if your product is subscription-based, one key metric is what fraction of customers choose to stop using it every month.

You can see the defects in any of these measures just as well as I can, of course. Corporate income depends on a lot of things other than a company’s software-development system; companies can fail because of macroeconomic indicators as well as on their own merits. (A statistician here would compare companies in similar industries, subject to the same macroeconomic forces, differing only in how they produce software.) Measuring productivity by lines of code per day gets at part of what you want, but it tends to minimize the value of well-tested lines of code. If you measure security by “vulnerabilities,” you need to differentiate between a serious vulnerability (one which allows an attacker to get root on a machine, get credit-card numbers off a database, etc.) and a less-serious one (denial-of-service, etc.). And so on.

So pick your favorite set of metrics. Certainly you’re going to need some metrics if you want to judge the success of a software system. You could always be qualitative about such things and measure success “by gut instinct”; while quantitative metrics have their problems, trusting people’s guts probably has just as many. Among other things, guts don’t scale; you need to pick a measure of success whose measurement doesn’t depend on choosing the right manager with the right guts.

Now, my big question is: is there any set of metrics for which any of these software-development techniques has proved unambiguously good? I’ve gotten a few answers to this question. One is that a lot depends on the type of people you hire. My colleagues have told me that if you work with brilliant software developers, you tend to reject static type-checking; your colleagues have a low-enough error rate as it is (or at least, their errors tend not to be the sort that a type system catches) that the type checking only reduces productivity without an offsetting decrease in the error rate. But there comes a point where your company is too large to know how everyone else is going to use your code; there, perhaps, is where you start strictly defining the interfaces between your components, enforcing rigid compile-time checks on arguments, and so forth.

Maybe, then, we should divide our question into cases by company size. Or we might want to divide software-development tasks by size. It may well be that small products don’t need compile-time type checking, but large products do. Certainly there comes a point when products need to be designed in a way that they weren’t when they were smaller.

In The Mythical Man-Month, for which I still need to write a review, Fred Brooks predicts that there is no innovation in software-development practice which will improve programmer productivity by 10x [1] over the subsequent decade. In the epilogue, he’s hopeful that object-oriented programming will help with this; like a lot of people in the early 90s, Brooks subscribed to the dream that OOP would turn software into pluggable modules with well-defined interfaces, which we’d mix and match into larger systems. I remember one such dream was that you’d pick a spell-checker from one company, a grammar checker from another, a text editor from a third, and stick ‘em all together to make your own Microsoft Word variant. It didn’t work out that way. Like anything else, OOP is a tool, and the early adopters of that tool tend to think that it will do everything short of cure cancer. Give any tool a decade, and it becomes boring. Good-boring, though: the kind of boring that lets you solve little problems every day without thinking much about The Revolutionary Potential Of What You’re Holding In Your Hand. Eventually OOP just becomes another kind of hammer.

What I’m looking for, then, are some empirical results on all the various new churches of software-development practice. Which of them has demonstrably made life better for software organizations? I say “organizations” deliberately: any non-trivial programming task will require more than one programmer, so it’s not interesting to look at individual programmer productivity in isolation.

I also wonder how any of these software-development techniques compare in importance to lifestyle and to corporate structure. You may have brilliant developers, but will they produce good code if they hate the physical location of their office buildings? How will they do in cubicles versus separate offices? What if your company is dysfunctionally organized, so that no one knows what he’s doing and information gets stopped at certain critical nexuses? (A friend has a great term for those people who claim a project for their teams but never actually do anything with those projects; he calls them “cookie-lickers.”) No matter how good your software-development practice is, you’ll still end up in cases like this embodying the structure of your company in the structure of your code.

Your company can put up with a lot of dysfunction if it has a lot of money. I have no idea what Microsoft is like internally, but it makes an insane amount of cash because everyone needs operating systems and word processors. It could probably fail to improve Windows or Office for years, continue to rake in the cash, and in the meantime furiously scramble to re-engineer itself. I’ve seen versions of this on a smaller scale.

My hypothesis, then, prior to any data gathering, is that the productivity of a software-development team owes much more to corporate management and employee lifestyle than it does to whether the code is sufficiently covered by unit tests, whether the company uses OOP, or other such technical matters. I’d even suggest that developer productivity is mostly an effect of employee happiness rather than of development methodology.

Finally, there’s a second-order problem to my entire question, akin to the Red Queen’s race:

“Well, in our country,” said Alice, still panting a little, “you’d generally get to somewhere else — if you run very fast for a long time, as we’ve been doing.”

“A slow sort of country!” said the Queen. “Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!”

Suppose there were some change in process that unambiguously improved everyone’s productivity, as measured (say) by bug-free features per programmer-month. My company would either immediately start using this process, or — all else being equal — would go out of business as its competitors got on the cluetrain. So now the new standard for productivity would be twice the old standard, and all the companies left in the market would be back where they started. Absolute productivity — features per programmer-month — isn’t necessarily what we care about. Rather, we care about how productive our company is relative to its competitors.

[1] – Note to techie folks: “10x” is far shorter and clearer than “an order of magnitude.” Though “10x” does make you sound less sophisticated.

Julia Angwin, Stealing MySpace: The Battle to Control the Most Popular Website in America

You know you will run into trouble when a book about MySpace tells you that the service “had 7.5 million users and offered slightly more storage space — 300 Mb (megabits)”. Yes, the use of “bits” there is [sic]. Your hackles go up even further when Julia Angwin refers to “the program ColdFusion.” ColdFusion is a programming language. When she explains that Metcalfe’s Law “holds that the value of a network grows faster than the number of its users,” she is technically correct, but that’s sort of like asserting that a Ferrari goes from zero to sixty in “less than a year.” What Metcalfe’s Law actually says is that the value of a network grows with the square of the number of people in it. [1]

If you’re of a specific cast of mind, the final coup de grâce will be delivered when Angwin approvingly quotes Herbert Simon’s dictum that “a wealth of information creates a poverty of attention,” but doesn’t cite the original source for that quote (dictum at the bottom of page 6). Instead she chooses to cite USA Today, which in turn quotes Simon. This is somewhat akin to not quoting the Buddha, but instead quoting the fortune cookie you had after General Tso’s chicken.

So Stealing MySpace isn’t really a work intended for a technical audience, or even one with a technical focus, but that may well be the point. MySpace was, from the moment of its conception, not a Silicon Valley company; it was a Hollywood company. It has its roots in Intermix, a company haphazardly bringing together the shadiest parts of the Internet. All the websites selling you wrinkle cream, or books on How To Get Girls, or atrociously-designed social networking? Those were Intermix’s bread and butter. Facebook was the site designed by engineers; MySpace was the one designed by business executives.

At first, Stealing MySpace sides with the executives: MySpace lives by the seat of its pants, adding features as rapidly as possible to meet insatiable customer demand. If a feature turns out not to interest anyone, it’s dropped or orphaned. Angwin compares MySpace favorably, in this regard, to Friendster: Friendster spent untold quantities of engineering time figuring out how to compute statistics on graphs that only an engineer would really care about; MySpace actually solved problems that users cared about. And they pushed out code, no matter how broken, just so that they could get something out there in front of users; they “iterated often,” in the jargon.

But eventually the complexity of the problem before them required real computer science and real large-scale systems-administration talent that a company based in Los Angeles just cannot assemble. Facebook, meanwhile, is filled with Silicon Valley geeks who grew the company steadily and with relatively few speed bumps. Facebook’s frustratingly successful (he is 25 years old) founder, Mark Zuckerberg, appears sporadically throughout Stealing MySpace as the zen-like naif who scorns potential buyers, looks lazily upon MySpace’s meteoric rise, and patiently builds something which has cleaned MySpace’s clock by the end.

By the end of Stealing MySpace, it seems pretty clear that business executives are exactly the wrong focal point for the book. Software developers are the ones doing the real work, and social-networking sites live or die by their labors. We’ve just spent 250 pages, first watching petty businessmen peddle wrinkle cream, then eventually watching Rupert Murdoch decide how many billions to spend on MySpace. Murdoch doesn’t have any control over whether the site brings its users what they want. We’re watching people like Murdoch because they ostensibly make MySpace dance like a marionette; but the real action happens much further down.

When she’s not writing as an amateur business fetishist, Angwin is quoting passages from Danah Boyd that sound as though they could have been written by David “Wal-Mart People Do X, While Crate & Barrel People Do Y” Brooks:

The goodie two shoes, jocks, athletes, or other “good” kids are now going to Facebook. These kids tend to come from families who emphasize education and going to college. They are part of what we’d call hegemonic society. They are primarily white, but not exclusively. They are in honors classes, looking forward to the prom, and live in a world dictated by after school activities.

MySpace is still home for Latino/Hispanic teens, immigrant teens, “burnouts,” “alternative kids,” “art fags,” punks, emos, goths, gangstas, queer kids, and other kids who didn’t play into the dominant high school popularity paradigm. These are kids whose parents didn’t go to college, who are expected to get a job when they finish high school. These are the teens who plan to go into the military immediately after schools. Teens who are really into music or in a band are also on MySpace. MySpace has most of the kids who are socially ostracized at school because they are geeks, freaks, or queers.

There is a better book about MySpace that spends more time watching the hackers, less time caring about who spent a billion wasteful dollars on what, and even less time in amateur sociology. Julia Angwin is not the person to write that book. Stealing MySpace can be safely ignored.

[1] — Angwin doesn’t care much about this detail, so she won’t care about Odlyzko and Tilly’s refutation. One quick example demonstrating their point. First, consider two networks of similar size, and call that size “n”. Now, if Metcalfe is right, each of those networks has value proportional to the square of n; hence the total value of those networks, considered separately, is proportional to twice the square of n. But if they merge, they form a new network of size 2n, which means the value of the merged network is proportional to four times the square of n. Which means that by merging, they’d instantly double their total value. A lot of far-fetched economic assumptions would have to be true to keep these networks from merging. But then why do we still have separate instant-messaging networks like MSN, AOL, Jabber, etc., etc.? Why haven’t Verizon and AT&T already merged?

Or you could use Thoreau’s line, also quoted in the Odlyzko/Tilly paper:

We are in great haste to construct a magnetic telegraph from Maine to Texas; but Maine and Texas, it may be, have nothing important to communicate.

Which is to say: suppose someone in Texas joins AT&T’s wireless network. Metcalfe’s Law assumes that this additional Texas user is just as valuable to me as a new Massachusetts user. But that’s obviously not true.
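
The merger arithmetic in this footnote is easy to check numerically. A minimal Python sketch follows; the function names and the proportionality constant k are assumptions for illustration, not anything taken from Metcalfe or from Odlyzko and Tilly.

```python
def metcalfe_value(users, k=1.0):
    """Value of a network under Metcalfe's Law: k times the square of its size."""
    return k * users ** 2


def merger_gain(n):
    """Ratio of the merged network's value to the two separate networks' total value."""
    separate = 2 * metcalfe_value(n)   # two networks of n users each: 2 * n^2
    merged = metcalfe_value(2 * n)     # one network of 2n users: (2n)^2 = 4 * n^2
    return merged / separate


if __name__ == "__main__":
    for n in (10, 1000, 1000000):
        print(n, merger_gain(n))
```

The ratio comes out to 2 for any n, which is the doubling described above: under Metcalfe’s Law, two equal-sized networks would always be worth twice as much merged as apart.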

This town has a problem with signage

slaniel | Boston | Friday, November 27th, 2009

…and not just the well-documented problem of its streets. For those of you who’ve never been to Boston, you’ve never experienced the comically, bewilderingly, surreally unlabeled streets. You can go for miles on certain major streets without once seeing a sign telling you where you are. I’ve heard it said — but have not confirmed — that it’s actually enshrined in legislation: major streets shall not have signs on them. Yet the cross streets are all perfectly well labeled. It is bizarre.

(My favorite experience trying to figure out where I was, back when I first moved to Boston: I walked up to a random fellow on the street and asked where Commonwealth Avenue was. Only then did I realize that the fellow I had chosen was drunk, and homeless. Nonetheless, he gave me a very clear answer to a question I hadn’t asked: “The streets down in this part of Boston are alphabetical: Arlington, Berkeley, Clarendon, Dartmouth, Exeter, Fairfield, Gloucester, Hereford.” I thanked him, then asked, “Right, but where is Comm. Ave.?” to which he replied by turning, pointing, and saying, “It’s right there.”)

But the signage problem extends well past the roads. Whenever there is construction, or anything vaguely out of the ordinary here that requires a detour, no one realizes this ahead of time and prints up a nice, professional-looking sign — no matter how nice and professional the city officials are. There are typically two approaches to signage around here:

  1. A hand-written sign that lingers for months, curling at the edges and getting riddled with graffiti in the meantime.
  2. A police officer or other official standing there looking bewildered. Some common commuter route is closed, so you ask the police officer where to go; he replies as though he didn’t expect anyone to ask such a thing.

A sign on a T kiosk: PRESS _CREDIT_ IF USING A _DEBIT_ CARD

The latter happened to me today as I was transferring from the orange line to the red line at Downtown Crossing. The one stairwell that will take you down to the Alewife-bound track was closed. T officials were standing there looking bewildered. There weren’t any signs. They directed me to the tunnel that connects Park to Downtown Crossing; if I hadn’t known that the red line was there, their pointing would have confused me. It still confused me, because I’m not accustomed to walking to another station to catch the train.

The signage problems extend still further. I remember going to the 1369 Café years ago with my friend Jason, who doesn’t keep his frustrations to himself as much as I do. The 1369’s line was flowing to the right away from the counter where the barista was taking orders, and the barista yelled to everyone to swing around to the left. Jason very acidly commented, “If you want people to move that way, how about putting up a goddamned sign?” Putting up a sign would not be in this city’s nature. Nor would something more permanent and effective, like setting up a rope line that follows the path you want your customers to take. That sort of structure, I contend, makes the audience feel more at ease, because it knows exactly what it should do; Boston’s typical way of setting things up is to assume that you’re a local and that you know how things work.

I could extend this to what seems a very Boston habit: set up your architecture in a broken way, then pile hacks over top to make it work, approximately. To me the pinnacles of this are the walkways over Storrow Drive near the Hatch Shell, and the route you have to follow if you want to walk on the Longfellow Bridge from MGH. With absolutely no evidence one way or another, it seems to me that neither of these was built with pedestrians in mind.

If it sounds like I’m bitching about the city I love, it’s true: I am. But I bitch out of love. Those of us who love the place, I think, alternate between exasperation and a wry smirk when we see how busted it can be. Like watching a close friend, in many ways upper-class and refined and erudite, who nonetheless can’t help chewing with his mouth open.

White House party crashers never made the place insecure

slaniel | Terrorism and psychopathology thereof | Thursday, November 26th, 2009

says the Secret Service.

So if not being on a guest list, but going through all the standard security procedures, never put The Leader Of The Free World in danger, “because the party crashers went through the same security screening for weapons as the 300-plus people actually invited to the dinner,” why should we be required to present ID when boarding an airplane after passing through multiple layers of screening?

Programming-language question of the moment: functions with multiple return types

slaniel | C/C++/C++0x; Programming languages | Tuesday, November 24th, 2009

I noticed today that Java has Integer.getInteger and Boolean.getBoolean methods, which look up a system property by name and parse its string value. You call something like

Integer myInt = Integer.getInteger("com.microsoft.office.word.maxDocsOpen");

or

Boolean myBool = Boolean.getBoolean("com.microsoft.office.isSuperUser");

This violates my sense of aesthetics. Those are both tiny specializations of getProperty. What we’d like to do is write

Integer myInt = getProperty("someConfigOpt");
Boolean myBool = getProperty("someOtherConfig");

and have the compiler figure it out. But of course, compilers have to know at compile time what type each expression will have. Languages descended from C don’t let you overload a function on its return type alone: you can’t have several functions with the same name and the same argument types that differ only in what they return. So in this case you’d typically use outparams:

getProperty("someConfigOpt", myInt);
getProperty("someOtherConfig", myBool);

But that makes the flow of the program somewhat less clear. You have to know that myInt and myBool are outparams, rather than always looking for the thing on the left side of the equals sign to guide your knowledge of program flow.

My question is: why can’t the compiler figure out that when you write

Integer myInt = getProperty("someConfigOpt");

you mean the variant of getProperty which takes a string argument and returns an integer? I’m sure there are cases when it’s not so easy for the compiler to infer the type; in those cases, couldn’t the compiler just throw an error and ask for further type hints or an explicit cast? There are probably cases when even an explicit cast wouldn’t clarify which function signature is the desired one; in those cases I don’t know offhand what to do, which may be why C/C++/Java lack this feature.

First of all, what do you call this language feature of having multiple return types? Googling for ‘multiple return types’ returns me a rogues’ gallery of results, which is the standard sign that you’re not searching for the right vocab term. If you find me the right terminology, I will do the googling.

But secondly, I’m curious about the compiler issues involved in implementing such a thing. I presume Stroustrup discusses this in The Design and Evolution of C++, but I appear to have lost my copy. Bastard animals.

I really resent John McCain

slaniel | Palin, Sarah; Stupid-people media | Sunday, November 22nd, 2009

Without John McCain, Sarah Palin would still be several thousand miles away from my consciousness.

Without John McCain, the media would not be required to grant Palin even a modicum of respect. Her vapid utterances would be vapid utterances, and that would be that. Her book would not currently be #1 on Amazon.

Without John McCain, I would not know that there is a person named Levi Johnston.

Without John McCain, I would not know that Levi Johnston is posing for Playgirl.

Without John McCain, no friends of mine would have bothered to ask — as one did today — whether Johnston carries a tattoo with his own name.

I want to step back and observe the lunacy — the sheer unholy raving madness — of this. Levi Johnston is the father of the grandchild of a defeated former presidential candidate’s running mate. And vital neurons of mine — neurons that could have been spent understanding the non-commutativity of certain quantum-mechanical linear operators in an infinite-dimensional linear vector space, for instance — now have been stamped indelibly with certain facts about this human being. I resent this more deeply than I can convey. The bile is burning my keyboard.

Facebook Connect, Twitter, and this blog

slaniel | site admin | Sunday, November 22nd, 2009

The workflow on this blog is that I post something here, WordTwit (with my patches) sends it along to my Twitter profile, and finally the Twitter app in Facebook pulls the Twitter messages over into Facebook.

It’s a fine, mostly maintenance-free system. My only problem with it is that there are now three ways that people can post comments to these posts: either in the blog itself, as response messages (“@stevereads: LOL U R GAY”) within Twitter, or as Facebook comments. If only so that everyone can see everyone else’s comments, I’d like to get all of these linked together. I’m not entirely sure what the best way to do that is. Facebook Connect looks like it might be the way to do this, but I’m not sure. I invite ideas, and I’ll do some research on my own.

Funny scenes from Geoffrey Lewis’s Turkish Grammar

slaniel | Turkish Grammar (2nd ed.) | Saturday, November 21st, 2009

An introductory word must be said about agglutination, as it is this feature which English-speakers find most alien, although it does occur in English to a limited extent in such a word as carelessness. But in Turkish the process of adding suffix to suffix can result in huge words which may be the equivalent of a whole English phrase, clause, or sentence: sokaktakiler, ‘the people in the street’; gelirlerken, ‘while they are coming’; avrupalılaştırıverilemeyebilenlerdenmişsiniz ‘I gather that you are one of those who may be incapable of being speedily Europeanized’.

[G]roup-accent and sentence-accent (i.e. intonation) both override word-accent so completely that some authorities have denied the existence of word-accent altogether. An English parallel will make this clearer. If one were asked to mark where the word-accent comes in machine, one would naturally put it on the second syllable: machíne. But if the word is used as the second element of a compound noun its accent is lost and the group-accent prevails: séwing-machine. If a manufacturer of sewing-machines tells his wife that he has bought one for her, her reply may well be an incredulous ‘You’ve bought a sewing-machine?’ with both word- and group-accent lost and the sentence-accent on ‘bought’ prevailing.

Let’s not penalize most failure (a riff off Google’s Chrome OS)

slaniel | Google; Mythical Man-Month, The | Saturday, November 21st, 2009

My friend Jamie notes that, on the basis of what he saw in a demo of Google’s Chrome OS yesterday, it’s going nowhere. I think this is the wrong way to look at it, for two reasons: first, it’s important to get something out there, and second, more generally: we, as a society, overly penalize failure.

Before I start, I should note that I know essentially nothing about the Chrome OS. I haven’t watched any demos of it. I know that it’s a stripped-down OS for use on netbooks. I’ve read John Gruber’s perfectly sensible point that many of us have one primary computer — a laptop or a desktop — along with a telephone (like the iPhone) that looks like a crippled computer if you squint at it right. As Gruber puts it: maybe you don’t need two cars; maybe you just need a car and a bicycle. People get along very well with a car and a bicycle.

My point here has little to do, though, with the substantive claim against Chrome OS. I don’t actually care a bit about the Chrome OS. Jamie links to some pundit’s piece, wherein the pundit claims that Chrome OS is “doom[ed] … to the dustbin of history.” Pundits need to say things like this; their jobs are to be provocative. I think it’s quite silly to take a position like that, however, when the record of pundit prognostication is so poor. Hell, the first version of Google’s Android phone interested approximately no one. Compared to the iPhone, the G1 was a flop. We’re on to the Droid, now, which people really seem to love. To suggest that the Chrome OS is dead on arrival is to suggest that it won’t improve. Windows 1.0, anyone?

Which gets to my real point, which is that you have to start somewhere. What I’ve learned from working at a startup, and from reading Stealing MySpace, is that it’s far better to get something out there, essentially no matter how broken it is, than to take forever to produce something stellar. A few reasons:

  1. By setting a firm, soon-to-arrive release date for a product, you force yourself to get something done. As they say: If it weren’t for the last minute, nothing would happen. Get something out there, then improve it.

  2. By offering a real, tangible product, you give your customers or potential customers a basis for criticism and comment. Now instead of dreaming about an ideal Google OS that they can attach all their hopes and dreams to, people have the real thing in front of them and can ask for specific improvements. (This is a point that lies somewhere within The Mythical Man-Month, which I need to review.)

    There’s a related point in here, by the way, about how to manage software organizations: if you design in a vacuum, with no actual customers to examine your product, you’re going to build something that no one wants. If you design for one customer, you’re going to find that the second customer wants something different from the first and you’ll need to redesign anyway. The Mythical Man-Month argues that you’re going to throw away your first design anyway, so don’t bother over-designing it. All of which suggests that if Chrome OS is undercooked — and again, I don’t know whether it is, and don’t care — that’s exactly as it should be.

  3. I’ll argue this next point by way of an example from my own life. I had been considering upgrading my iPhone to the latest, greatest, highest-capacity version from the 8-gig 3G I have now. Google’s Droid and Palm’s Pre aren’t good enough yet, so far as I can tell, to make me switch away from the iPhone, but they are making me delay my upgrade decision. Maybe there aren’t any phones I want to upgrade to right now, but do I want to lock myself into another two-year AT&T contract when Google or Palm might produce something really stellar before that contract would expire?

    Maybe the Chrome OS isn’t good enough to sway many purchasing decisions right now, but it’s out there now and will probably improve over time. As it does so, it may drive a wedge into the market: people will hold off until they see what Chrome OS 2.0 or 3.0 is all about. This is the strategy that Microsoft — and, I presume, most any smart software company — has been using for years; it’s called “vaporware” when a rival is doing it, “good marketing” when you’re doing it. Chrome OS may be strategic vaporware, and Google would be entirely right to create such a thing.

  4. There’s also the notion of a “disruptive technology.” I’m told that The Innovator's Dilemma has noticed a classic pattern with certain technologies; here I find MySQL a convenient example to keep in mind, though it breaks down when Sun buys MySQL and Oracle buys Sun. The pattern goes like this: there’s some entrenched player (think Oracle) that makes a massive, hardened, massively supported behemoth of a product that people pay premium prices for. Then along comes the little guy, producing a product that is — from the big player’s perspective — crippled and puny and not worth worrying about. Even better from the big player’s perspective, the little guy appeals to the big player’s most troublesome customers — those that don’t generate a lot of revenue and that generate a ton of support calls. So the Oracles of the world gladly dispense with their little customers. (Think of MySQL back when it only had MyISAM tables which didn’t guarantee that your data would be there after a power outage, didn’t support foreign-key constraints, and generally only functioned as a fast indexing engine on top of a bare filesystem.)

    Now the little guy has some customers. They’re little customers, but they’re customers. So now the little guy can build a product based on feedback — which it happily and quickly does, because there isn’t much code to change or much of an organizational battleship to turn. So now the little guy improves his product a bit and shaves off a little more of the big guy’s customers. The big guy still doesn’t notice; the little guy remains beneath his radar. Bit by bit, the little guy cuts into the big guy’s market; by the time the big guy notices, it’s too late.

    We’ve been thinking about Defeating Microsoft Windows for a very long time. It’s pretty clear to me, by now, that that’s just the wrong way to think about it. If I had to wager, I would suspect the Chrome OS is Google’s way of acknowledging that that’s the wrong way to think about it. Google isn’t going to destroy Windows with the Chrome OS, but maybe they’ll take a little bit away from the low end of the market. They probably won’t defeat IE with their Chrome browser, but they’ll insert a little wedge in that market; whatever happens, Google cannot be locked out of the browser market. Microsoft may lose a few customers here and there who decide that they don’t need a desktop computer and can do all they need with a browser, an email client, and a mobile phone. Little by little, companies pick off little corners of the computer market. Maybe Microsoft learns how to respond to these: maybe Windows Mobile goes somewhere; maybe IE becomes a capable browser; or maybe it doesn’t. But the point is that the direct assault on Windows has been tried and has failed. One promising approach to defeating Microsoft is to attack indirectly.

For all these reasons, even if Chrome OS is a failure, it may be valuable. As a society, we take a hard line on failure. We venerate Apple and excoriate Xerox. We praise Facebook and condemn Friendster. In the intellectual realm, I’ve seen praise of Gödel and condemnation of Bertrand Russell. In some senses it’s just that we do so; in many others, it’s not. Each generation of an idea learns from the failures that preceded it. The generation that succeeds generally could not have known where to step had it not watched the missteps of preceding, failing generations. We know that Newton succeeded by standing on the shoulders of giants, but don’t always realize that they were giant failures. And we’re better because they were.

Joseph White and Aaron Wildavsky, The Deficit and the Public Interest: The Search for Responsible Budgeting in the 1980s

Red border around the outside one inch. Then a black background beneath the inner rectangle. The title of the book is in white text over the black background, other than 'Deficit' in red.

First, let me note that my expectations about a book shouldn’t be relevant to anyone at all. More to the point, y’all shouldn’t care when I make a mistake about what I’m getting into. So maybe more for my benefit than for anyone else’s, let me note that this is not a book that will teach you anything about Senate procedure. It will not teach you about the Byrd rule. It will not help you understand what everyone’s talking about nowadays with “reconciliation.”

What it does do is describe the late-70s-to-mid-80s budget deficit, and how alarming a grip it held over the U.S. government. It paralyzed everyone. But — and the authors are at pains to make this clear — it didn’t paralyze them because they’re bad people, or because they wanted to hold the government hostage. It paralyzed them because, to the contrary, everyone thought that the budget deficit was a really serious problem that needed to be addressed. At the same time, everyone had very deeply felt beliefs about how the government ought to spend its money and who ought to pay taxes (or who had been suffering under the burden of heavy taxes for too long). True, there are the occasional people (Phil Gramm being one of them) who come out of The Deficit and the Public Interest looking like they just intend to score points against the other guy. But most people are doing exactly what we want them to: fighting for what they believe. Liberals (Tip O’Neill is the towering figure here) are trying to unburden the poor. Republicans are trying to slash taxes and reduce the size of government.

Ronald Reagan comes out of TDATPI looking great, actually. We may not agree with him, and his knowledge of economics may be rather flimsy, but he has his principles and he sticks to them. He doesn’t especially care what sort of compromises others think he ought to make; his belief in smaller government comes from a place of deep principle.

But the Reagan Revolution, if we read TDATPI, seems to have mostly ended by 1986. The outlandish increases in defense spending had been halted, and we can all see that Social Security is still with us. White and Wildavsky’s book, in this way, is a long affirmation of The Politics Presidents Make: we get a new revolution every few presidents (Jefferson, Jackson, Lincoln, Teddy Roosevelt, FDR, Reagan), but the revolutions get smaller every time. Every revolution brings in its train a whole set of new institutions; in Reagan’s case, those institutions were the constituencies around Social Security and Medicare, which — try as he might — he couldn’t hope to destroy. He soon enough realized he couldn’t destroy them.

Apart from the difficulty of the budget problems politicians were trying to solve, the main message I took from The Deficit and the Public Interest was a deeply skeptical one: do not depend on the government to help you out. Anyone who knows me knows that I am a firm defender of the welfare state, and would like to see it expanded to help the least fortunate much more. But White and Wildavsky’s account of the sausage-making that goes into major legislation made me strongly doubt that those benefits will be there when the poorest people need them. Congress never out-and-out destroyed Social Security, but they did pick at it as much as they could; every now and again, they would “delay a cost-of-living increase,” which is a maddening phrase. “Delaying a cost-of-living increase” means “cutting payments by the rate of inflation.” In these cases Congress is relying on Americans to not pay attention to the real value of their money.

The fundamental trouble is that there are only a few big pools of money to contend with in the federal budget: defense, Medicare and Medicaid, Social Security, and interest on the existing federal debt. If you’ve committed to double-digit real increases in defense, you are left to cut entitlements or debt service. You can’t really cut debt service; that is called “default.” So Congressmen will focus on the entitlements when they’re not just performing accounting chicanery. (The chicanery is why we talk about ten-year horizons for health reform. If you’re just going year-by-year, you can push this year’s expenses into next year; it’s harder to play games when you price things out over a longer horizon. Though you can still push some of the tenth year’s expenses into the eleventh; this sort of problem is with you no matter what.) I’m just not convinced that they’ll keep their hands off poor people’s benefits forever.

Now, in pursuit of these political realizations, we have 600 pages of the very minutest minutiae that man has ever committed to paper. I routinely found myself absolutely lost on details of who was supporting which bill, and where we were in Congressional procedure. This book was absolutely not made for me. In a year I am not going to remember who shifted $120 billion from where to where by offsetting some of the provisions of a particular tax shelter. Yet that kind of detail is precisely what The Deficit and the Public Interest is about. There may be some people who really need this sort of detail. Maybe political scientists? Whoever they might be, I am not one of them.

C.J. Date, Database in Depth: Relational Theory for Practitioners

slaniel | Database in Depth: Relational Theory for Practitioners | Sunday, November 1st, 2009

Cover of _Database in Depth_: a stylized, colorful version of the standard Database Diagram -- namely, a lot of tables with pointers between them, indicating foreign-key dependencies.

(Attention conservation notice: 2300 words about the foundations of databases. It would have been much less if I hadn’t felt obliged to explain everything from the ground up to people who know nothing about databases. I particularly invite comment from that part of my audience. How did I do?)

I’m glad I read this book when I did, before I knew very much about SQL or about databases generally. It’s an attempt to get people to think abstractly about databases, and therefore think more expansively about what databases can do for us. A database, in C.J. Date’s telling — and he’s got nearly (I emphasize nearly) all the weight of logic and evidence behind him — is a mathematical object that nearly exists in a Platonic realm; it only comes down to earth periodically to represent a collection of employees or a parts database or whatnot.

So then. What do most people think of when they think of a database? Under the influence of SQL, they imagine tables that look like so:

Employee ID   First name   Last name   Title     Salary
1             Jim          Smith       CEO       $9E9
2             Mark         Johnson     Janitor   $10,000

Then we might have another table that lists addresses. “Why not put addresses in the same table as the employee-ID/first-name/last-name/salary table?” you might ask. Well, multiple employees can live at a single address (think of a husband and wife working at the same company), and a single employee can have multiple addresses (think vacation home and regular home, though naturally you might wonder why an employee database would list vacation homes). This is what’s called a “many-to-many relation,” and it needs to be treated in a special way. The specialness comes down to a desire not to introduce duplicate data into the database: you should only have to enter an address once, and you don’t want to risk mistyping the husband’s or the wife’s address. This is called “normalization,” and in a few specific forms it’s a tool to help keep the database consistent.

There are other accepted ways to keep databases consistent. One is to attach a type to each field. A “start date” field, for instance, must be of type DATETIME; “janitor” is not a valid start date. This is here to prevent humans from making mistakes; it would be all too easy to think you’re typing in the “job type” field, when in fact you’re typing in the “start date” field, and make some computer code further along down the line explode. True, you could write computer code to perform this same check, but enforcing a type for each field is the accepted way to do this sort of thing [1], and leaving this up to the application programmer is a recipe for an inconsistent database.

We define other constraints on the data as well. Let’s say, for instance, that we add a “Manager” field to our employee record. A manager is another type of employee, so this field should just include the manager’s employee ID. We add the constraint that the number under “manager’s employee ID” must exist under the “employee ID” field somewhere else in the database. This is called a “foreign-key constraint.” It’s another way to prevent errors. And when we point to managers by way of their employee IDs, we prevent duplicate records: I should only enter a manager’s full name once, and from then on only refer to the manager by way of his employee ID. Entering an employee record twice means twice the opportunities to introduce typing errors.
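To make the foreign-key idea concrete, here is a minimal sketch using Python 3's standard sqlite3 module. The table layout, the column names, and the helper functions are my own inventions for illustration, and SQLite is just a convenient stand-in for whatever database you actually run.

```python
import sqlite3

def make_employee_db():
    """Build an in-memory table whose manager_id must name an existing employee."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """
        CREATE TABLE employee (
            employee_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            manager_id  INTEGER REFERENCES employee(employee_id)
        )
        """
    )
    return conn

def try_insert(conn, employee_id, name, manager_id):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO employee VALUES (?, ?, ?)",
            (employee_id, name, manager_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_employee_db()
ceo_ok = try_insert(conn, 1, "Jim", None)      # no manager recorded: allowed
janitor_ok = try_insert(conn, 2, "Mark", 1)    # manager 1 exists: allowed
orphan_ok = try_insert(conn, 3, "Ann", 129)    # manager 129 does not exist: rejected
```

The third insert fails inside the database itself, so no application code has to remember to perform the check.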

Now we have a collection of tables, constrained in certain ways. Here’s where C.J. Date takes this perspective and reframes it in a really compelling way: a database isn’t just a collection of data; it is a collection of true propositions. We’ve gone through all this effort to constrain our data because we want our database to express only truths. A row containing employee ID 10, with name “Andy Goldsworthy”, address “123 3rd St, Cambridge MA 02138”, and manager ID 129, is in fact a statement: “Employee ID 10 has name Andy Goldsworthy, etc.” If you’ve entered a typo, you have introduced a false statement into this database, and everything else is potentially broken: from a false statement, anything follows, and your database has now brought falsehood upon the land. You have darkened your neighbor’s door, and you shall be forced to atone.

If you think that this overstates the severity of the mistake you’ve just made, you are clearly not C.J. Date. The man takes his databases very seriously, because he views them as logical objects that just happen to be instantiated on physical computers.

This focus on the logic of databases leads him to despise certain parts of the modern database toolkit. The idea of a null value is his most-despised enemy, for one. A null indicates that we don’t know the answer to something. Let’s take the “salary” field in our employee database, for instance. Maybe this is currently unknown, because HR manages the database and hasn’t yet gotten that information from the person responsible for salary negotiations. HR needs to complete a report on employee compensation, though, so it needs to compute amounts like the average employee salary. The average is the total salary for all employees divided by the number of employees. But the total salary is unknown, because certain employees’ salaries are unknown. If you’re not careful, you’ll end up with an average that’s too low: you’ll divide a salary total for all those with non-null salaries by the total count of all employees. And you can’t expect query languages to help you here: sometimes you’ll want to divide by the real total, and sometimes by the non-null total.
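Here is a small sketch of the two averages, again with Python 3's sqlite3 module. The salaries are made-up round numbers and the table is a hypothetical one-column version of the employee table; the behavior shown is SQLite's, though standard SQL aggregates skip nulls in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, 90000.0), (2, 10000.0), (3, None)],  # employee 3's salary is not yet known
)

# AVG() silently skips NULLs, so this is (90000 + 10000) / 2.
avg_known = conn.execute("SELECT AVG(salary) FROM employee").fetchone()[0]

# Dividing the same total by every row treats the unknown salary as zero:
# (90000 + 10000) / 3.
avg_all_rows = conn.execute(
    "SELECT SUM(salary) / COUNT(*) FROM employee"
).fetchone()[0]

# COUNT(salary) counts only the non-NULL salaries; COUNT(*) counts all rows.
known_count, row_count = conn.execute(
    "SELECT COUNT(salary), COUNT(*) FROM employee"
).fetchone()
```

Neither number is wrong in itself; the trouble is that the query author has to know which one the report actually wants.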

The use of nulls, though, seems largely unavoidable. You could split the salary field off into a table containing just employee ID and salary, for instance, and include no row at all when the salary is unknown. But eventually you’ll need to put this salary back with the rest of your employees’ information, using what we call a “join” (specifically, we’re “joining on employee ID”). In that joined table, sometimes you’ll want all employees to show up, whether or not their salaries are defined (an “outer join”), and sometimes you’ll only want those with salaries to show up (“inner join”). In the former case, you’re going to end up with nulls.
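A sketch of that split-table design, using Python 3's sqlite3 module with hypothetical table and column names, shows how the null comes back the moment you ask for every employee:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE salary (employee_id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Jim"), (2, "Mark")])
conn.execute("INSERT INTO salary VALUES (2, 10000.0)")  # no row at all for employee 1

# Inner join: only employees that have a salary row appear.
inner_rows = conn.execute(
    "SELECT e.name, s.amount FROM employee e "
    "JOIN salary s ON s.employee_id = e.employee_id "
    "ORDER BY e.employee_id"
).fetchall()

# Left outer join: every employee appears, and a missing salary surfaces as NULL,
# which Python's sqlite3 module hands back as None.
outer_rows = conn.execute(
    "SELECT e.name, s.amount FROM employee e "
    "LEFT JOIN salary s ON s.employee_id = e.employee_id "
    "ORDER BY e.employee_id"
).fetchall()
```

Neither stored table contains a null (both columns are declared NOT NULL), yet the outer join's result does.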

Or at least, C.J. Date doesn’t really give us much help in getting around this problem. It’s not clear what he would advocate in this situation. His theoretical cast of mind tells him that a database is a collection of true statements, period, full stop. Since a null introduces the possibility of false statements, it is to be rejected. It must be noted that he has a great many other practical reasons for rejecting nulls, among them that they make query optimization harder and require the user to go through hoops — like being careful about defining what “average” means, as above — that they just shouldn’t have to go through.

Viewing a database as a collection of true statements, it follows that the database should at no point contain false assertions; it must be carried from one true statement to another. This principle leads Date to reject many principles that normally guide database design. For one, he rejects the idea that a database should only be in a consistent state across transaction boundaries; every single statement that changes the state of a database, whether inside a transaction or not, must change it to another true state.

To explain what this means, we’ll use the classic example of database transactions: a bank. When you transfer money from your checking to your savings account, typically something like the following is supposed to happen within the bank’s computers:

START TRANSACTION;
Deposit $500 in savings;
Deduct $500 from checking;
COMMIT TRANSACTION;

(Semicolons indicate the end of a statement.)

If someone trips over the database’s power cable between the deposit and the deduction, a badly designed system will have just given you 500 extra dollars; such a system is said to not be “transactional.” A properly designed system, on the other hand, will start back up when the power cable goes back in, will note that the database is in an inconsistent state, and will “roll back the transaction” by withdrawing that $500 from savings. And even though that money may have ended up in savings before being rolled back, no one in the world would ever see it; so far as anyone viewing the database knows, no money leaves checking until it reaches savings. This is called “consistency across a transaction boundary”: inside the transaction, the checking account may temporarily have too much money, but outside the transaction the database contains a true picture of the world.
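The rollback behavior can be sketched with Python 3's sqlite3 module. The account table and the transfer function are hypothetical, and a raised exception stands in for the power failure; a real crash is recovered by the database's own journal on restart rather than by Python code, so treat this as an illustration of the visible result only.

```python
import sqlite3

def transfer(conn, amount, fail_midway=False):
    """Move `amount` from checking to savings, all or nothing."""
    try:
        with conn:  # commits if the block finishes, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'savings'",
                (amount,),
            )
            if fail_midway:
                raise RuntimeError("simulated failure between deposit and deduction")
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'checking'",
                (amount,),
            )
    except RuntimeError:
        pass  # the half-finished deposit has already been rolled back

def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [("checking", 1000), ("savings", 0)]
    )

transfer(conn, 500, fail_midway=True)
after_failure = balances(conn)   # the deposit was undone

transfer(conn, 500)
after_success = balances(conn)   # both halves of the transfer took effect
```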

C.J. Date rejects this kind of consistency, again because a database is a collection of true statements about the world. Consistency, then, must be enforced at statement boundaries rather than transaction boundaries.

How would statement-level consistency actually be enforced? Date uses a new bit of syntax to enforce consistency at statement boundaries when the transaction involves multiple tables; instead of the transactional syntax from above, our checking-to-savings transfer would be written like this:

deposit $500 in savings, deduct $500 from checking;

This syntax puzzles me, because it doesn’t actually solve the problem that Date was trying to solve. The problem is that, in a real physical computer system, some actions happen after other actions. In our case, the deduction happens at a different time from the deposit, no matter what the syntax says. So there will be a time T when savings already holds its extra 500 dollars and checking has not yet given them up. Date’s syntax serves to mask this problem rather than solve it. The fundamental reason that he thinks this syntax solves his problem is that his head is focused on the logical basis of databases; that logical basis is timeless, in that collections of statements transform from one state to another instantaneously. His Platonic form of a database is literally timeless. He’s less concerned with physical databases, in which transactions take finite time to complete.

Date’s syntax woes pervade all of Database in Depth. He really, really, really hates SQL. He hates it because it provides limited syntactic support for fairly common use cases. The trouble is that he doesn’t provide a real alternative to SQL. Yes, he provides his own language called Tutorial D (the bolding is [sic], as throughout Database in Depth), but Tutorial D is undercooked even by Date’s own standards: at points throughout Database in Depth, he invents his own ad hoc syntax, which tells us that the language certainly wasn’t ready for prime time before he started writing the book.

Even if Tutorial D were entirely ready for action, Date makes clear that he’s never implemented it in a real computer system. As its name implies, it is a language for use in the classroom. So far as I know, Tutorial D is not in use in any commercial database system anywhere. Certainly Date gives no reason to think that it has actually been implemented anywhere, which lets SQL win by default.

Anyone who reads Database in Depth from a relative newbie perspective will, I trust, be puzzled for the same reasons that I was: if SQL is so bad, then why hasn’t something like Tutorial D supplanted it? Is it another case of path dependence, where the world’s earlier mistaken adoption of SQL makes it harder to cast off SQL in the future (think “Microsoft Windows”)? Or is SQL the best real-world instantiation of the relational algebra that we could hope for? Likewise with nulls, and transactions, and all the other things to which Date is religiously opposed: if they’re so bad, why do we keep using them? Date never really says, which lets SQL win by default. Likewise, if Tutorial D is so expressive, and allows so many SQL constructs to be expressed in far fewer words, then how come no one — including Date himself! — has implemented the language in a commercial product or deployed it in a large database?

So it’s best to read Database in Depth as a summary of the relational algebra, and perhaps less of a good idea to read it as a damning verdict levied against SQL. You’ll learn the relational basis of all the SQL that you know and love. You’ll learn that foreign-key constraints and all the rest of a database’s integrity checks are just syntactic sugar on top of the relational algebra. You’ll learn how to reduce any of these sugary bits to their relational axioms. You’ll learn how a database query optimizer can transform your query into a faster one and know, beyond a shadow of a doubt, that the transformed query will return the same result as your original. You’ll learn the difficulties that SQL causes for those same query optimizers by not acting like a full relational algebra. Those of you with a mathematical cast of mind who like to understand how your systems work under the covers should read and reread Database in Depth.

[1] — This gets back to my question about inserting an IP address into MySQL. People raised the reasonable point that SQL itself shouldn’t be defining lots of different types; there are millions of reasonable types, and it’s not the language’s responsibility to provide all types to all people. While true, I think this misses the point that a modern language provides an extensible type system: C++ and Java don’t define a primitive date/time type, but they do allow programmers to compose their own date/time type from the more primitive types that the language makes available. It should be possible, in any modern language, to enforce a particular type for a particular field, and the type enforcement should take relatively few characters. Python, I’m looking at you here: I shouldn’t have to do

def some_func(foo, bar):
    if not isinstance(foo, str):
        raise TypeError("First arg to some_func must be a string")
    if not isinstance(bar, int):
        raise TypeError("Second arg to some_func must be an integer")
    [rest of function]

I should be able to do

def some_func(string foo, int bar):
    [rest of function]

Of course, just to be clear, Python does allow user-defined types. You can create your own classes to represent any complicated real-world thing. It just doesn’t allow shorthand type enforcement the way SQL does.
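For what it’s worth, Python 3 lets you get part of the way there with function annotations, though the interpreter does not enforce them by itself. What follows is only a sketch: enforce_types is a hypothetical helper of my own, not anything the language or its standard library provides, and it handles only plain classes as annotations.

```python
import functools
import inspect

def enforce_types(func):
    """Hypothetical helper: isinstance()-check every annotated argument at call time."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = func.__annotations__.get(name)
            # Only plain classes are checked; richer annotations are ignored here.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    "%s must be %s, not %s"
                    % (name, expected.__name__, type(value).__name__)
                )
        return func(*args, **kwargs)

    return wrapper

@enforce_types
def some_func(foo: str, bar: int):
    return "%s x %d" % (foo, bar)
```

Calling some_func("a", 2) goes through, while some_func(1, 2) raises a TypeError before the function body runs. The check still happens at run time, of course, not at compile time.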

It looks like some variants of SQL do, indeed, allow user-defined types. See, e.g., PostgreSQL, though this is the first I’ve read about it.
