Helping out links

slaniel | Uncategorized | Monday, May 5th, 2003

The biggest problem with the Net, it seems to me, is sorting through the morass of links to find the really credible information. This is not a new observation; Google is an attempt to make the Net “speak for itself”: pages that many others link to appear high on the list, and a link counts for more when it comes from a page that is itself highly linked. And so on, recursively. This is a way of expressing trust relationships: if I link to you, I trust what you say, in some general sense. Those with a lot of links are highly trusted (by definition). If you’re highly trusted and you link to a page, your vote counts more than a less-trustworthy person’s vote. This makes sense, and maps to how ordinary human relationships work.
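The recursive idea can be sketched as the textbook PageRank power iteration. This is a minimal illustration, not Google’s actual ranking code; the damping factor of 0.85 is the value commonly quoted for the published algorithm, and the five-page link graph is made up.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among the pages it
                # links to, so a vote from a highly ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: everyone links to "hub", which links only to "expert".
web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
    "hub": ["expert"],
    "expert": ["hub"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:7s} {score:.3f}")
```

In this toy graph, "a" and "expert" each have exactly one inbound link, but the one pointing at "expert" comes from the heavily linked "hub", so "expert" ends up far higher.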

It seems that the linker could do more to help the search engines, though. For instance, what if I want to make clear that I trust someone’s statements about mathematics, but not his or her statements on anthropology? If I link to someone’s Website, and that person happens to talk about both subjects, all Google knows is that I generally trust what this person says. Something more specific would help.

So it would make sense to develop a set of XML tags that both categorize information into a taxonomy and label links according to that taxonomy. For instance, the syntax might look like this:

 <link url="http://www.philosophy.duq.edu/tophtml/faculty.html#rockmore"> <topic> <main> Philosophy </main> <subhead> Continental </subhead> </topic> </link> 

This syntax isn’t exactly what we want, because we’d want arbitrarily nestable tags (Philosophy -> Continental Philosophy -> Hegel -> Hegel Scholarship Since 1950 -> . . . ), and a fixed main/subhead pair like the one above wouldn’t scale very well in that direction. But you get the idea.
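A minimal sketch of one nestable alternative, assuming a hypothetical `<topic name="...">` element that may contain a narrower `<topic>` of its own; the element and attribute names are invented for illustration and are not any standard. Python’s `xml.etree.ElementTree` then walks the chain from broadest to narrowest.

```python
import xml.etree.ElementTree as ET

# Hypothetical nestable syntax: each <topic> may contain one narrower <topic>.
LINK_XML = """
<link url="http://www.philosophy.duq.edu/tophtml/faculty.html#rockmore">
  <topic name="Philosophy">
    <topic name="Continental">
      <topic name="Hegel">
        <topic name="Hegel Scholarship Since 1950"/>
      </topic>
    </topic>
  </topic>
</link>
"""


def topic_path(link):
    """Follow the chain of nested <topic> elements, broadest first."""
    path = []
    node = link.find("topic")
    while node is not None:
        path.append(node.get("name"))
        node = node.find("topic")
    return path


link = ET.fromstring(LINK_XML.strip())
print(link.get("url"))
print(" -> ".join(topic_path(link)))
```

Because each level is just another `<topic>`, the taxonomy can go as deep as needed without a fixed vocabulary of level names like main and subhead.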

A search engine like Google could then take my proposed subject labeling of a page (“this is a page about Hegel”) and decide whether I’m a trustworthy judge of that subject. The way it would decide my “subjectworthiness” is analogous to how it decides trustworthiness overall: by how many people link to my site on that same subject. Highly ranked philosophers’ “votes” would count more than low-ranked philosophers’, and so forth.
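One way to sketch “subjectworthiness” is to run the same kind of recursive ranking once per subject, counting only the links whose linker attached that subject label. This is a hypothetical illustration: the `(source, target, topics)` link format, the page names, and the choice of plain PageRank with damping 0.85 are all assumptions, not how any search engine is known to work.

```python
from collections import defaultdict


def topic_rank(labeled_links, topic, damping=0.85, iterations=100):
    """PageRank computed over only the links labeled with `topic`.

    labeled_links: list of (source, target, topics) triples, where topics is
    the set of subject labels the linker attached to that link (hypothetical format).
    """
    pages = set()
    graph = defaultdict(list)
    for source, target, topics in labeled_links:
        pages.update((source, target))
        if topic in topics:
            graph[source].append(target)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = graph.get(page)
            if targets:
                # A topic-labeled link passes along a share of the linker's
                # rank on this topic, not the linker's overall standing.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # No links on this topic: spread this page's rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


links = [
    ("alice", "bob", {"Hegel"}),
    ("carol", "bob", {"Hegel"}),
    ("dave", "bob", {"Hegel"}),
    ("bob", "erin", {"Hegel"}),
    ("frank", "mallory", {"Linux"}),
    ("gina", "mallory", {"Linux"}),
    ("hank", "mallory", {"Linux"}),
    ("mallory", "zed", {"Hegel"}),
]
hegel = topic_rank(links, "Hegel")
linux = topic_rank(links, "Linux")
```

Here "erin" and "zed" each receive one Hegel-labeled link, but erin’s comes from "bob", whom three sites vouch for on Hegel, while zed’s comes from "mallory", who is trusted only on Linux. So erin ranks well above zed for Hegel, even though mallory tops the Linux ranking.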

Possible extensions: if pages were aware of their linkers, this process would become much more interesting. If a number of pages link to me, say on the topics of “the Internet,” “Linux,” and “Hitchcock,” then my server can publish those as “taxonomies that I’m aware of.” When someone links to me, software on the linker’s machine could ask the linker, “This page has already been categorized as a Linux page, a Hitchcock page, and a page about the Internet. Do any of these subjects fit your new link? If not, would you like to add a new topic?” We wouldn’t want taxonomies to become self-reinforcing, with users lazily picking “Hitchcock” even when their link is actually about Howard Hawks. But you get the idea: users could very quickly categorize their own links, in effect creating a distributed Yahoo! that does the job much, much better than Yahoo! does.
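A rough sketch of the “taxonomies that I’m aware of” step, under the assumption that the server already knows its inbound links as hypothetical `(source, target, topics)` triples. The function names and the wording of the prompt are illustrative only.

```python
from collections import Counter


def known_taxonomies(labeled_links, page):
    """Count the subject labels other sites attached to links pointing at `page`."""
    counts = Counter()
    for source, target, topics in labeled_links:
        if target == page:
            counts.update(topics)
    return counts


def suggestion_prompt(counts):
    """Compose the question the linker's software might ask."""
    if not counts:
        return "No one has categorized this page yet. What topic is your link about?"
    # Most-used labels first; ties broken alphabetically so the order is stable.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    listed = ", ".join(
        f"{t} ({counts[t]} link{'s' if counts[t] != 1 else ''})" for t in ordered
    )
    return (
        f"This page has already been categorized as: {listed}. "
        "Do any of these fit your new link? If not, would you like to add a new topic?"
    )


links = [
    ("a.example", "me.example", {"Linux"}),
    ("b.example", "me.example", {"Linux", "the Internet"}),
    ("c.example", "me.example", {"Hitchcock"}),
    ("d.example", "other.example", {"Howard Hawks"}),
]
print(suggestion_prompt(known_taxonomies(links, "me.example")))
```

Showing the counts is a design choice that may well encourage the lazy self-reinforcement described above; hiding them, or always leading with the “add a new topic” option, would be an equally reasonable choice.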

One big moral of the above is simply this: the fundamental language of the Web is the hyperlink. Authors have had their own versions of the hyperlink since well before the Internet, be they academic footnotes or literary allusions. The Web just makes linking easier and richer than it has ever been before.

It also seems that my linking to another site often expresses disdain for that site, rather than a vote of confidence in it; often I link so that I can show others how much of a fool I think the linkee is. Google counts this as just another link, so the most highly linked fools end up at the top of the list for exactly the wrong reason. So again, there should be a syntax to express the degree of distrust for the person we link to. Or at the very least, there should be a Google-specific set of tags that says, “don’t count my vote here!”
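A minimal sketch of what a signed vote might look like, assuming a hypothetical `vote` attribute on each link (`for` by default, `against` for disdain, `none` for “don’t count my vote”) and assuming each linker’s trust score has already been computed somehow. It simply sums trust-weighted votes; it does not plug negative votes into a recursive ranking.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stances: a plain link is a vote for; "against" subtracts; "none" abstains.
VOTE_WEIGHTS = {"for": 1.0, "against": -1.0, "none": 0.0}


def tally_votes(pages, trust):
    """Sum each linker's vote for every URL, scaled by how trusted the linker is.

    pages: XML strings, each a <page author="..."> holding <link> elements.
    trust: maps an author to a trust score (unknown authors count as 0).
    """
    scores = defaultdict(float)
    for page_xml in pages:
        root = ET.fromstring(page_xml.strip())
        weight = trust.get(root.get("author"), 0.0)
        for link in root.iter("link"):
            stance = link.get("vote", "for")
            scores[link.get("url")] += weight * VOTE_WEIGHTS[stance]
    return dict(scores)


pages = [
    """<page author="me.example">
         <link url="http://expert.example/"/>
         <link url="http://fool.example/" vote="against"/>
         <link url="http://neutral.example/" vote="none"/>
       </page>""",
    """<page author="you.example">
         <link url="http://fool.example/" vote="against"/>
       </page>""",
]
trust = {"me.example": 0.5, "you.example": 2.0}
scores = tally_votes(pages, trust)
print(scores)
```

The “fool” now ends up with a negative score, and the abstaining link contributes nothing at all, which is the behavior the paragraph above asks for.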

The architecture of expressing trust relationships on the Net is in its infancy, but it’s dreadfully important that we make it work. Traditional media maintain their hold over the Net (nytimes.com, cnn.com, abcnews.com, etc.) because they have carried over the public’s trust onto the Web. And yet the great promise of the Net is that more ground-level media such as Weblogs will displace the huge corporations. This won’t happen unless there are reliable ways to measure trust on the Net.

Huh

slaniel | Uncategorized | Monday, May 5th, 2003

Totally coincidentally, I discovered that someone has recently written about the same idea I wrote about earlier tonight. I think his version is less well-developed, however, and not quite so extensible.

Though let me make clear that I’m sure my idea is unoriginal; distributed taxonomies are, I suspect, a rather intuitive idea.

Linking math

slaniel | Uncategorized | Monday, May 5th, 2003

In the coming months, I hope to put some mathematical documents on the Web. Math seems as though it would be the perfect subject for publication on the Web. One simple example: let’s say I label something as follows:

 <definition term="compact"> A set S is said to be compact if every open cover of S has a finite subcover. </definition> 

Now if I use the term “compact” elsewhere in the mathematical world, I should be able to hold my mouse over the word and see the above definition of “compact.” If I hold the mouse over a particular mathematical assertion, maybe a bubble will appear with a proof of that assertion. Mathematics is an inherently linked field, to a degree that few others are. The Web should start exploiting this.
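A small sketch of the hover idea, assuming documents marked up with the `<definition term="...">` element above inside a hypothetical `<document>` wrapper. It collects definitions into a glossary, then wraps each later use of a defined term in an HTML `<span title="...">`, since browsers show an element’s `title` text when the mouse rests on it. Matching is by whole word only, so inflected forms such as “compactness” would not be caught.

```python
import html
import re
import xml.etree.ElementTree as ET

DOCUMENT = """
<document>
  <definition term="compact"> A set S is said to be compact if every open cover of S has a finite subcover. </definition>
</document>
"""


def build_glossary(xml_text):
    """Map each lowercased term to its definition text, whitespace collapsed."""
    root = ET.fromstring(xml_text.strip())
    return {
        d.get("term").lower(): " ".join("".join(d.itertext()).split())
        for d in root.iter("definition")
    }


def annotate(text, glossary):
    """Wrap defined terms in <span title="..."> so a browser shows the definition on hover."""
    if not glossary:
        return html.escape(text)
    # Longest terms first, so a multi-word term wins over a shorter one inside it.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    pieces, last = [], 0
    for m in pattern.finditer(text):
        pieces.append(html.escape(text[last:m.start()]))
        definition = glossary[m.group(0).lower()]
        pieces.append(f'<span title="{html.escape(definition)}">{html.escape(m.group(0))}</span>')
        last = m.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)


glossary = build_glossary(DOCUMENT)
sentence = "The closed interval [0, 1] is compact, so every continuous function on it is bounded."
print(annotate(sentence, glossary))
```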

To that end, I hope to get permission sometime soon to publish a richly linked Web version of Lester Dubins and L.J. Savage’s How to Gamble If You Must: Inequalities for Stochastic Processes. Stay tuned.