Having installed a new hard drive a while back, I’ve been ripping all my CDs to disk. Now that I’m doing it, I’m starting to wonder how we ever got along with such unreliable media. Right now I’m in the middle of ripping a Frank Sinatra CD that I’ve had for something like 15 years, and the ripping software is having a hell of a time working through all the minor scratches. Tracks on other CDs have failed to rip altogether because their scratches were just too deep. Had I been forward-thinking, I would have ripped every CD as soon as I bought it, then never touched the disc again. Alas.
Now that disk space is so cheap, it’s worth asking whether we should just rip straight to WAV and skip the encoding altogether. At 700 megs per CD, 1000 CDs (pretty much the upper limit of what anyone I know owns) would take up about 700 gigs. At $150 or so for a 200-gig disk, that’s three and a half disks’ worth, or about $525, which isn’t terrible at all. Give it a couple of years, and that price should be at least halved.
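To check the arithmetic, here’s a quick sketch using the post’s own rough figures (700 MB per CD, $150 per 200-gig drive) and assuming 1 GB = 1000 MB. The helper names are just illustrative. It also shows the catch that you can’t actually buy half a drive.

```python
import math

# Rough figures from the paragraph above (assumes 1 GB = 1000 MB).
MB_PER_CD = 700      # uncompressed audio per CD, in megabytes
DISK_GB = 200        # capacity of one drive
DISK_PRICE = 150     # dollars per drive


def collection_size_gb(num_cds: int, mb_per_cd: float = MB_PER_CD) -> float:
    """Total size of a ripped WAV collection, in gigabytes."""
    return num_cds * mb_per_cd / 1000


def storage_cost(size_gb: float, disk_gb: float = DISK_GB,
                 price: float = DISK_PRICE) -> tuple[float, float]:
    """Return (prorated cost, cost when you must buy whole drives)."""
    prorated = size_gb / disk_gb * price
    whole_drives = math.ceil(size_gb / disk_gb) * price
    return prorated, whole_drives


size = collection_size_gb(1000)
print(size, storage_cost(size))
```

If drive prices halve as expected, `storage_cost(700, price=75)` brings the prorated figure down to $262.50.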
The real trouble, of course, comes when you want your friends to download your music collection: if the compression ratio for most lossy digital-music encodings (MP3, Ogg and the like) is around 15:1, then a raw WAV file takes 15 times as long to transfer as an encoded one, so we’ll have to wait until our bandwidth has increased by a factor of 15 before trading straight WAV rips is as practical as trading encoded files is today. Bandwidth isn’t increasing all that fast, so we may have a while to wait.
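The factor of 15 falls straight out of the fact that transfer time scales linearly with file size. A small sketch, assuming a hypothetical 1.5 Mbit/s line, decimal megabytes, no protocol overhead, and the rough 15:1 ratio above:

```python
CD_MB = 700              # one uncompressed CD, in megabytes
COMPRESSION_RATIO = 15   # rough figure for lossy encodings, from the text
LINK_MBPS = 1.5          # hypothetical broadband line, in megabits per second


def transfer_hours(size_mb: float, link_mbps: float) -> float:
    """Hours to move size_mb megabytes over a link_mbps line (no overhead)."""
    return size_mb * 8 / link_mbps / 3600


wav_hours = transfer_hours(CD_MB, LINK_MBPS)
encoded_hours = transfer_hours(CD_MB / COMPRESSION_RATIO, LINK_MBPS)
print(f"WAV: {wav_hours:.2f} h, encoded: {encoded_hours * 60:.1f} min")

# A line COMPRESSION_RATIO times faster brings a WAV download down to
# what an encoded download takes today.
faster = LINK_MBPS * COMPRESSION_RATIO
print(f"WAV on a {faster:g} Mbit/s line: "
      f"{transfer_hours(CD_MB, faster) * 60:.1f} min")
```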
Digital files are just so much more convenient than CDs. For one thing, backing up your entire music collection from one disk to another is phenomenally easy: just attach a slave drive alongside your primary hard drive, do a disk-to-disk copy, detach the slave, and . . . I dunno, put it in a safe deposit box somewhere.
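If you’d rather copy files than image the whole drive, the backup step is just a recursive copy plus a check that the copies match the originals. A minimal sketch, assuming the music lives under one directory; the paths and the `backup_and_verify` helper are hypothetical:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MB chunks so large WAVs don't have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_and_verify(src: Path, dst: Path) -> list[str]:
    """Copy everything under src into dst, then return the relative paths
    of any files whose copies don't match (an empty list means success)."""
    shutil.copytree(src, dst, dirs_exist_ok=True)
    mismatches = []
    for original in src.rglob("*"):
        if original.is_file():
            rel = original.relative_to(src)
            if sha256_of(original) != sha256_of(dst / rel):
                mismatches.append(str(rel))
    return mismatches
```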
Better yet, I’m looking forward to the day when offsite backup services over high-bandwidth lines are common. Then filesystem-level backup would be really neat: every time you change a file, the change trickles over the wire to your backup provider. (More realistically, of course, the filesystem would buffer changes and send them only in 100-megabyte chunks, or once an hour, or somesuch.)
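The buffering rule in that parenthetical (ship changes once 100 MB pile up or an hour has passed, whichever comes first) is easy to sketch. The `send` callback stands in for a hypothetical backup provider’s upload API, and the class name is made up; nothing here talks to a real service.

```python
import time
from typing import Callable, List, Tuple

Change = Tuple[str, bytes]  # (path, new file contents)


class BufferedBackup:
    """Queue file changes and upload them in batches, flushing when the
    queue reaches max_bytes or max_age_s seconds have passed since the
    last flush. `send` is a stand-in for a hypothetical provider API."""

    def __init__(self, send: Callable[[List[Change]], None],
                 max_bytes: int = 100 * 1024 * 1024,
                 max_age_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.send = send
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self.clock = clock
        self.pending: List[Change] = []
        self.pending_bytes = 0
        self.last_flush = clock()

    def record_change(self, path: str, data: bytes) -> None:
        """Called whenever a file changes; queues it and maybe flushes."""
        self.pending.append((path, data))
        self.pending_bytes += len(data)
        too_big = self.pending_bytes >= self.max_bytes
        too_old = self.clock() - self.last_flush >= self.max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Upload whatever is queued and reset the buffer."""
        if self.pending:
            self.send(self.pending)
        self.pending = []
        self.pending_bytes = 0
        self.last_flush = self.clock()
```

One simplification: the hourly check only runs when a new change arrives, so a real filesystem hook would also want a timer that calls `flush()` on its own.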
In the meantime, something like a RAID setup makes a lot of sense, given the amount of data we’re accumulating: every change you make to your disk is also written, in some clever way, across a number of other local disks, either as an exact mirror (RAID 1) or spread out with parity information (RAID 5), so a single dead drive doesn’t take your data with it. Corporations already use RAID, but it needs to come to the desktop.
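The “clever way” in parity-based levels such as RAID 5 is XOR: the parity block is the XOR of the data blocks, so any single lost block equals the XOR of everything that survived. A toy sketch, assuming equal-length blocks; real RAID 5 works on stripes and rotates the parity block across drives, and RAID 1 skips all this and simply writes identical copies.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Parity block written to the extra disk."""
    return xor_blocks(data_blocks)


def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(surviving + [parity])


disks = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(disks)
print(rebuild([disks[0], disks[2]], parity))  # the lost middle disk comes back
```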
Relatedly, I’ve been thinking about the soon-to-arrive day when no one throws away anything on his computer at all. We’ll keep all our music files, all our movies, all our photographs, every document, every draft of every document, and on and on. We’ll need smarter filesystems to handle all this. For one thing, the idea of a filename is really a holdover from an earlier day: I shouldn’t have to think up a name for a file. I should write a bunch of bits (a video, a word-processing document, or whatever), and those bits should have a bunch of metadata attached to them. Then my computer should decide how to display that metadata. For instance, when I rip a CD, the software automatically pulls the album’s title, artist and so forth from the FreeDB, then names each file according to some template. Say the name becomes “Artist—Album—TrackNum—SongTitle.Extension,” where “Extension” depends on the file’s encoding format (MP3, Ogg, etc.).
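For comparison, the template-driven naming that ripping software does today amounts to little more than string formatting. A sketch with a hyphen-separated template and made-up metadata; the field names and the list of forbidden characters are assumptions, not any particular ripper’s behavior:

```python
import re

# Hypothetical template; real rippers let you configure something similar.
TEMPLATE = "{artist} - {album} - {track:02d} - {title}.{ext}"


def safe(text: str) -> str:
    """Replace characters that common filesystems reject in file names."""
    return re.sub(r'[\\/:*?"<>|]', "_", text).strip()


def filename_for(track: dict, ext: str, template: str = TEMPLATE) -> str:
    """Flatten a track's metadata into a single file name."""
    fields = {k: safe(v) if isinstance(v, str) else v for k, v in track.items()}
    return template.format(ext=ext, **fields)


# Made-up metadata, shaped like what a CD lookup might return.
meta = {"artist": "Some Artist", "album": "Live/Studio", "track": 3, "title": "Why?"}
print(filename_for(meta, "mp3"))
```

Everything the lookup returned gets squeezed into one string, and whatever doesn’t fit the template is simply lost, which is the problem the next paragraph picks up.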
That’s all very backwards, if you think about it. Yes, there’s metadata attached to that album. But sometimes I’ll want to look for a song by a jazz artist that’s longer than 10 minutes and sounds like Charlie Parker. The filesystem should be able to help me with this. As it stands, I’m expected to categorize the album on my own: put it in a jazz directory, then a Charlie Parker directory, then look through the Charlie Parker directory and run some special tool to extract each song’s length. That’s just silly.
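The first two criteria are exactly the kind of question a metadata database answers in one line. A sketch using Python’s built-in `sqlite3` module with a hypothetical `songs` table and made-up rows; “sounds like Charlie Parker” would need audio-similarity features and isn’t attempted here.

```python
import sqlite3


def build_library() -> sqlite3.Connection:
    """In-memory stand-in for a filesystem's metadata store (made-up rows)."""
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE songs (
        path TEXT PRIMARY KEY, title TEXT, artist TEXT,
        genre TEXT, seconds INTEGER)""")
    db.executemany("INSERT INTO songs VALUES (?, ?, ?, ?, ?)", [
        ("a.ogg", "Long Blues", "Artist A", "jazz", 725),
        ("b.ogg", "Short Tune", "Artist A", "jazz", 190),
        ("c.ogg", "Long Raga", "Artist B", "world", 1500),
    ])
    return db


def long_songs(db: sqlite3.Connection, genre: str = "jazz",
               min_seconds: int = 600) -> list[str]:
    """Titles in `genre` longer than min_seconds (10 minutes by default)."""
    rows = db.execute(
        "SELECT title FROM songs WHERE genre = ? AND seconds > ? ORDER BY title",
        (genre, min_seconds))
    return [title for (title,) in rows]


print(long_songs(build_library()))
```

No directories, no special length-extraction tool: genre and duration are just columns.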
One reasonable approach, it seems to me, is to keep a central on-disk database of the metadata appropriate to each media type (say, “album,” “artist,” “genre” and others for songs; “director,” “producer,” “actors” and so forth for films). Then I could configure songs to be displayed however I liked: the format could be “genre—artist—album—song” or any other arrangement.
Because I want to do away with directories, the next step is to turn the filesystem into a model-view-controller setup: each file just becomes another row, with a lot of metadata attached, in a very large database table, and I can choose to present all that data in whatever way I want. I wouldn’t have an “MP3s” folder and a “video” folder anymore; instead, each “folder” would simply be the set of objects returned by a particular database query. Maybe today I want to organize all of my Charlie Parker objects (not just his albums, but also, say, the movie Bird) under one view. Tomorrow I may want to organize my data under “bop,” “free jazz,” “cool jazz,” and so forth; under that view, Charlie Parker and Thelonious Monk might appear in the same “folder.” It all depends on what I want. The filesystem shouldn’t force me to move all my Charlie Parker MP3s into a different directory every time I think about my data in a different way.
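Here’s a rough sketch of “folders as query results”: each object is just a bag of metadata, and a view groups objects by whichever field I pick, without moving anything. The field names (`kind`, `people`, `style`) and the song titles are hypothetical; a multi-valued field lets one object show up in several folders at once.

```python
from collections import defaultdict

# Every file is just a row of metadata; nothing lives "in" a directory.
OBJECTS = [
    {"kind": "song", "title": "Track A", "people": ["Charlie Parker"], "style": "bop"},
    {"kind": "song", "title": "Track B", "people": ["Thelonious Monk"], "style": "bop"},
    {"kind": "film", "title": "Bird", "people": ["Charlie Parker"], "style": None},
    {"kind": "song", "title": "Track C", "people": ["Some Other Artist"], "style": "cool jazz"},
]


def view(objects: list[dict], key: str) -> dict[str, list[str]]:
    """Build folders on the fly: one folder per distinct value of `key`.
    Objects lacking the field are left out; list values fan out."""
    folders: dict[str, list[str]] = defaultdict(list)
    for obj in objects:
        values = obj.get(key)
        if values is None:
            continue
        if not isinstance(values, list):
            values = [values]
        for value in values:
            folders[value].append(obj["title"])
    return dict(folders)


print(view(OBJECTS, "people"))  # the film sits next to the songs
print(view(OBJECTS, "style"))   # Parker and Monk share a "bop" folder
```

Switching from one view to the other is a different query over the same rows, not a reshuffling of files.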
The point is simply that when we’ve got this much data, we’re going to spend absurd amounts of time just organizing it unless our systems change with us. We have a wonderful distributed system (see the FreeDB) for attaching metadata to objects. Now it’s a question of how we use that metadata to get ourselves better organized. I think filesystems are precisely where we need to start.