Tuesday, October 7, 2008

The God Entity

Here's a seemingly innocuous hex string:

One may wonder why Google returns about 16,600 results for this highly-entropic hex-string (Oct. 2008).

Upon further investigation, one may again wonder why the results are all FOAF...

And with a few more clicks, one may wonder why the value is so popular for foaf:mbox_sha1sum...

And then one might wonder why one might care...

The aforementioned hash is that of the empty 'mailto:' string, presumably produced by FOAF exporters from empty email input forms. Unfortunately, foaf:mbox_sha1sum is inverse functional, meaning that it should be a unique identifier for an entity: in this case a person. Now, from a reasoner's perspective, only one person can have that particular value for the property: therefore if you find two they must be the same person! Now, we have a problem. All of the descriptions for these people get merged into one super-description for this super-person. A reasoner will now see one person, with tens of thousands of names, interests, emails, etc.

Of course, there are other such values which contribute:

...not to talk about other inverse functional properties such as foaf:weblog which is oft used for defining shared weblogs (anyone who shares one is the same person).

To clarify, perhaps, this is not a criticism of FOAF but perhaps moreso an observation that people will not stick to the semantics hidden away in an RDFS/OWL description. They will see a label for a property or class, project their needs onto it and use it, although it doesn't fit the bill.

The problem becomes a serious issue where the identity of what is described is at stake. More specifically, problems with identity -- relating to assignment of URIs, lack or mis-use of same-as, inverse-functional, functional or cardinality of 1 properties -- are one of the largest stubling blocks at the moment for building a "web of entities".

In human language, a word's definition follows it's usage to a certain extent. The question is, should FOAF change the definition of their words to match how people use them? Should they loosen definitions to say that foaf:weblog can apply to communal weblogs?

Finally, where would this post be without one of the finest examples of the chaos in RDF web data.

EDIT (15/10/08): Indeed, I am new to blogging (and indeed reluctantly at that), and I missed an opportunity for flagrant self-promotion! For more on the issue of identity on the Web and smushing through inverse-functional properties, see this paper from 2007:

Aidan Hogan, Andreas Harth, Stefan Decker. "Performing Object Consolidation on the Semantic Web Data Graph". Proceedings of I3: Identity, Identifiers, Identification. Workshop at 16th International World Wide Web Conference (WWW2007), Banff, Alberta, Canada, 2007.


George Bondurant said...

I have seen and read out all pages as well as all articles and I think your post is exceptionally fascinating and generally, I continue searching for like this sort of sites where I learn or get new idea. so I need to suggest you one thought you will review the task custom essay writing service and furthermore review the rule of remark. I have to thank you for your minute because of this inconceivable read! A good blog always comes-up with new and energizing data and keeping in mind that understanding I have feel that this blog is truly have all those quality that qualify a blog to be a good one.

Colin Cowdrey said...

In any case, indeed, more utilization of them is likewise bad for wellbeing. A decent consideration must be taken for the capacity of tea and espresso so uncommon sorts of sacks came in picture. food plastic packaging manufacturers