Wednesday, October 15, 2008

OWL On What Property?

What do you do when you find a restriction that has more than one owl:onProperty value attached? Indeed, what do you do with a restriction that has multiple owl:someValuesFrom values attached? Until further notice, I'm going with the highly underrated throw-it-out approach.
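For the curious, the throw-it-out approach might look something like the following. This is only a minimal sketch, assuming triples are plain (subject, predicate, object) tuples of strings; the function name drop_ambiguous_restrictions and the sample identifiers are hypothetical and not taken from any real system.

```python
from collections import defaultdict

OWL = "http://www.w3.org/2002/07/owl#"
ON_PROPERTY = OWL + "onProperty"
SOME_VALUES_FROM = OWL + "someValuesFrom"


def drop_ambiguous_restrictions(triples):
    """Discard every triple that mentions an ambiguous restriction node.

    A node counts as ambiguous when it has more than one distinct value
    for owl:onProperty or for owl:someValuesFrom. (Hypothetical helper.)
    """
    values = defaultdict(set)  # (subject, predicate) -> distinct objects
    for s, p, o in triples:
        if p in (ON_PROPERTY, SOME_VALUES_FROM):
            values[(s, p)].add(o)

    # Nodes with two or more values for either predicate get thrown out.
    bad = {s for (s, _), objs in values.items() if len(objs) > 1}

    # Drop triples with a bad node as subject or as object.
    return [t for t in triples if t[0] not in bad and t[2] not in bad]
```

Dropping triples that point at the bad node (as well as those that describe it) is a design choice here; a gentler variant could keep the node and only ignore the conflicting values.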

Apologies to whoever lovingly crafted this data.

Tuesday, October 7, 2008

The God Entity

Here's a seemingly innocuous hex string:

One may wonder why Google returns about 16,600 results for this high-entropy hex string (Oct. 2008).

Upon further investigation, one may again wonder why the results are all FOAF...

And with a few more clicks, one may wonder why the value is so popular for foaf:mbox_sha1sum...

And then one might wonder why one might care...

The aforementioned hash is the SHA-1 of the bare string 'mailto:' (that is, an empty email address), presumably produced by FOAF exporters from empty email input forms. Unfortunately, foaf:mbox_sha1sum is inverse functional, meaning that its value should uniquely identify an entity: in this case, a person. From a reasoner's perspective, only one person can have that particular value for the property; therefore, if you find two, they must be the same person! And so we have a problem: all of the descriptions for these people get merged into one super-description for one super-person. A reasoner will now see one person with tens of thousands of names, interests, emails, etc.
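To make the merge concrete, here is a rough sketch of how such a hash arises and how naive smushing on an inverse-functional value collapses unrelated people. It assumes each person is just an identifier paired with a foaf:mbox_sha1sum value; the names mbox_sha1sum, smush and the blacklist parameter are hypothetical, and the union-find grouping is one simple way to do this, not necessarily what any given reasoner does.

```python
import hashlib


def mbox_sha1sum(email):
    """SHA-1 hex digest of the mailto: URI for an email address."""
    return hashlib.sha1(("mailto:" + email).encode("utf-8")).hexdigest()


# What an exporter emits when the email form field is left empty.
EMPTY_MBOX_HASH = mbox_sha1sum("")


def smush(pairs, blacklist=frozenset()):
    """Group person ids that share an inverse-functional value.

    pairs: iterable of (person_id, value). Values in `blacklist` are
    ignored, so they never cause a merge. Returns a list of sets.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # value -> first person seen with that value
    for person, value in pairs:
        find(person)  # register the person even if nothing merges
        if value in blacklist:
            continue
        if value in first_seen:
            parent[find(person)] = find(first_seen[value])
        else:
            first_seen[value] = person

    groups = {}
    for person in list(parent):
        groups.setdefault(find(person), set()).add(person)
    return list(groups.values())
```

With two people who both left the email field blank, smush(pairs) puts them in a single group, whereas smush(pairs, blacklist={EMPTY_MBOX_HASH}) keeps them apart. Blacklisting known junk values is only one possible mitigation, offered here as an assumption rather than a recommendation from the FOAF specification.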

Of course, there are other such values which contribute:

...not to mention other inverse-functional properties such as foaf:weblog, which is often used for shared weblogs (so everyone who shares one is inferred to be the same person).

To clarify, this is not a criticism of FOAF but more an observation that people will not stick to the semantics hidden away in an RDFS/OWL description. They will see a label for a property or class, project their needs onto it and use it, even though it doesn't fit the bill.

The problem becomes serious where the identity of what is described is at stake. More specifically, problems with identity -- relating to the assignment of URIs and the lack or misuse of same-as, inverse-functional, functional or cardinality-1 properties -- are among the largest stumbling blocks at the moment for building a "web of entities".

In human language, a word's definition follows its usage to a certain extent. The question is, should FOAF change the definitions of its terms to match how people use them? Should it loosen the definitions to say that foaf:weblog can apply to communal weblogs?

Finally, where would this post be without one of the finest examples of the chaos in RDF web data?

EDIT (15/10/08): I am new to blogging (and reluctantly at that), and I missed an opportunity for flagrant self-promotion! For more on the issue of identity on the Web and smushing through inverse-functional properties, see this paper from 2007:

Aidan Hogan, Andreas Harth, Stefan Decker. "Performing Object Consolidation on the Semantic Web Data Graph". Proceedings of I3: Identity, Identifiers, Identification. Workshop at 16th International World Wide Web Conference (WWW2007), Banff, Alberta, Canada, 2007.