Thursday, November 27, 2008

Who's to blame?


Testing a new data-level validator for RDF and just found that Dan Brickley has a problem with his LiveJournal FOAF export, which states that he has the empty literal "" as a value for the InverseFunctionalProperty foaf:icqChatID.

So Dan, if you're reading this, I don't blame you (although I do perhaps blame LiveJournal for not having more rigorous data-checks... they do export millions of FOAF files after all).

Overall though, I blame the lack of a tool that you can use to check for such problems. If you get people hacking away on their RDF documents, problems will arise in the data (I'm as much to blame as anyone else). You need a tool to assist you in debugging your RDF on the data-level.
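To give a flavour of the kind of data-level check meant here, the following is only a rough sketch and not the validator itself. It assumes a toy triple model (plain tuples, with a hypothetical `Literal` wrapper for literal values) and a hand-picked set of inverse-functional FOAF properties; a real tool would read the owl:InverseFunctionalProperty declarations from the vocabulary instead.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical wrapper marking an object as a literal rather than a URI."""
    value: str


# Assumption: hard-coded for illustration; a real checker would discover
# these from the ontology rather than listing them by hand.
INVERSE_FUNCTIONAL = {"foaf:icqChatID", "foaf:mbox_sha1sum", "foaf:weblog"}


def find_empty_ifp_values(triples):
    """Return the triples that give a blank literal as the value of an
    inverse-functional property."""
    problems = []
    for s, p, o in triples:
        if p in INVERSE_FUNCTIONAL and isinstance(o, Literal) and not o.value.strip():
            problems.append((s, p, o))
    return problems


# Made-up example data.
triples = [
    ("ex:alice", "foaf:icqChatID", Literal("")),
    ("ex:alice", "foaf:nick", Literal("alice")),
    ("ex:bob", "foaf:icqChatID", Literal("12345")),
]

problems = find_empty_ifp_values(triples)
for s, p, o in problems:
    print(f"{s}: empty value for inverse-functional property {p}")
```

Only the first triple is flagged: an empty foaf:nick is harmless to a reasoner, whereas an empty foaf:icqChatID is an identity claim.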

We're working on it.

Wednesday, November 19, 2008

The knowledge-base sapper, and his four explosive triples.

Okay, so here's some non-standard use of RDFS/OWL that Axel Polleres came up with:

rdfs:subClassOf rdfs:subPropertyOf rdfs:Resource .
rdfs:subClassOf rdfs:subPropertyOf rdfs:subPropertyOf .
rdf:type rdfs:subPropertyOf rdfs:subClassOf .
rdfs:subClassOf rdf:type owl:SymmetricProperty .

Full reasoning on this according to some ruleset (say pD*, i.e., OWL-Horst: rules rdf1, rdfs4a, rdfs4b, rdfs7x, rdfp3) gives everything. I'll let you work that one out. By everything, I mean every possible (albeit finite) combination of identifiers that constitute a valid RDF triple: the number of resulting triples equals the number of unique identifiers, cubed. Stick those four statements into a web-crawl, do some happy-go-lucky rule-based reasoning and you have problems. Of course this is only one such example. Watch this space.
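For anyone who would rather let a machine work it out, here is a minimal forward-chaining sketch restricted to the five rules named above. It is a naive fixpoint loop over tuples, not a real reasoner, and the extra triple `ex:a ex:p ex:b` is made-up data added to stand in for the rest of the crawl. The sketch makes no attempt to check the total triple count; it only shows one ordinary statement being dragged into the blast.

```python
TYPE = "rdf:type"
PROPERTY = "rdf:Property"
RESOURCE = "rdfs:Resource"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
SYMMETRIC = "owl:SymmetricProperty"


def closure(triples):
    """Apply rdf1, rdfs4a, rdfs4b, rdfs7x and rdfp3 until nothing new appears."""
    facts = set(triples)
    while True:
        # Index the schema-like triples that the two join rules need.
        super_props = {}
        for s, p, o in facts:
            if p == SUBPROP:
                super_props.setdefault(s, set()).add(o)
        symmetric = {s for s, p, o in facts if p == TYPE and o == SYMMETRIC}

        new = set()
        for s, p, o in facts:
            new.add((p, TYPE, PROPERTY))      # rdf1
            new.add((s, TYPE, RESOURCE))      # rdfs4a
            new.add((o, TYPE, RESOURCE))      # rdfs4b
            for q in super_props.get(p, ()):  # rdfs7x
                new.add((s, q, o))
            if p in symmetric:                # rdfp3
                new.add((o, p, s))
        if new <= facts:
            return facts
        facts |= new


# The four triples from the post.
SAPPER = [
    (SUBCLASS, SUBPROP, RESOURCE),
    (SUBCLASS, SUBPROP, SUBPROP),
    (TYPE, SUBPROP, SUBCLASS),
    (SUBCLASS, TYPE, SYMMETRIC),
]
# Hypothetical innocent bystander.
DATA = [("ex:a", "ex:p", "ex:b")]

baseline = closure(DATA)
exploded = closure(SAPPER + DATA)
terms = {t for triple in exploded for t in triple}

# Every term in the graph now relates ex:a to ex:b.
print(all(("ex:a", t, "ex:b") in exploded for t in terms))
```

Without the four triples, `ex:a` and `ex:b` are related by `ex:p` alone. With them, the same pair is related by every term in the graph, owl:SymmetricProperty included, and the relation also holds in reverse for rdfs:subClassOf.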

...oh, and before I go, it doesn't even take four triples (pD*: rdf1, rdfs4a, rdfs4b, rdfp6, rdfp7, rdfp9, rdfp10, rdfp11).

rdf:type owl:sameAs owl:sameAs .


Friday, November 14, 2008

RSS 1.0... old and broken.

I guess it's a bit old hat, but RSS 1.0 uses the exact same URI for image both as a property (relating a channel to an image) and as a class. Seems to stem from trying to create an RDF spec which closely resembles some older XML version... at the cost of the RDF spec (RSS isn't the only victim of an XML porting hangover).
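This sort of clash is easy enough to spot mechanically. The sketch below is only an illustration over toy tuples: it collects every URI that appears in predicate position and every URI that appears as the object of rdf:type, and reports the overlap. The `ex:` resources are made up; the RSS 1.0 namespace is the real one.

```python
RDF_TYPE = "rdf:type"
RSS = "http://purl.org/rss/1.0/"


def terms_used_as_both(triples):
    """Return URIs used both as a predicate and as a class (object of rdf:type)."""
    predicates = {p for _, p, _ in triples}
    classes = {o for _, p, o in triples if p == RDF_TYPE}
    return predicates & classes


# Made-up feed data in the shape RSS 1.0 produces.
triples = [
    ("ex:channel", RDF_TYPE, RSS + "channel"),
    ("ex:channel", RSS + "image", "ex:logo"),
    ("ex:logo", RDF_TYPE, RSS + "image"),
    ("ex:logo", RSS + "url", "http://example.org/logo.png"),
]

clashes = terms_used_as_both(triples)
print(sorted(clashes))
```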

No RDF(S)/OWL document exists for the spec (it is quite old -- from 2000) but the above issue also pretty much precludes the possibility of one being created.

Maybe time for a half-decent (and maybe even less obtuse) replacement. A quick scan of this RSS 1.1 document and it seems to be a candidate. In fairness, it's hardly rocket science... and that's a good thing.

Plus, they capitalise Channel. What's not to like?

Wednesday, October 15, 2008

OWL On What Property?

What do you do when you find a restriction that has more than one owl:onProperty value attached? Indeed, what do you do with a restriction that has multiple owl:someValuesFrom values attached? Until further notice, I'm going with the highly underrated throw-it-out approach.
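For the record, the throw-it-out approach amounts to something like the following sketch. It assumes triples as plain tuples and made-up blank-node labels, and it only looks at the two properties mentioned above; other restriction properties would need the same treatment.

```python
from collections import defaultdict

# Properties that should carry a single value per restriction node.
# Assumption: only the two discussed here; extend as needed.
SINGLE_VALUED = ("owl:onProperty", "owl:someValuesFrom")


def split_restrictions(triples):
    """Return (kept, discarded) restriction nodes; a node is discarded if it
    has more than one distinct value for any single-valued property."""
    values = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p in SINGLE_VALUED:
            values[s][p].add(o)

    kept, discarded = set(), set()
    for node, by_property in values.items():
        if any(len(objs) > 1 for objs in by_property.values()):
            discarded.add(node)
        else:
            kept.add(node)
    return kept, discarded


# Made-up example data.
triples = [
    ("_:r1", "owl:onProperty", "ex:hasPart"),
    ("_:r1", "owl:someValuesFrom", "ex:Wheel"),
    ("_:r2", "owl:onProperty", "ex:p"),
    ("_:r2", "owl:onProperty", "ex:q"),
    ("_:r2", "owl:someValuesFrom", "ex:C"),
]

kept, discarded = split_restrictions(triples)
print("kept:", sorted(kept), "discarded:", sorted(discarded))
```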

Apologies to whoever lovingly crafted this data.

Tuesday, October 7, 2008

The God Entity

Here's a seemingly innocuous hex string:

One may wonder why Google returns about 16,600 results for this highly-entropic hex-string (Oct. 2008).

Upon further investigation, one may again wonder why the results are all FOAF...

And with a few more clicks, one may wonder why the value is so popular for foaf:mbox_sha1sum...

And then one might wonder why one might care...

The aforementioned hash is that of the empty 'mailto:' string, presumably produced by FOAF exporters from empty email input forms. Unfortunately, foaf:mbox_sha1sum is inverse functional, meaning that it should be a unique identifier for an entity: in this case a person. Now, from a reasoner's perspective, only one person can have that particular value for the property: therefore if you find two they must be the same person! Now, we have a problem. All of the descriptions for these people get merged into one super-description for this super-person. A reasoner will now see one person, with tens of thousands of names, interests, emails, etc.
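The merge is easy to reproduce in miniature. The sketch below computes foaf:mbox_sha1sum the way FOAF defines it (the SHA1 of the mailto: URI), so the hash of the bare 'mailto:' string falls out of the same function. It then groups made-up people by that value, once naively and once with the degenerate value ignored. It is a simplification: it assumes one hash per person, whereas real consolidation has to cope with several identifying values each.

```python
import hashlib
from collections import defaultdict


def mbox_sha1sum(mailbox_uri):
    """SHA1 hex digest of a mailto: URI, as used for foaf:mbox_sha1sum."""
    return hashlib.sha1(mailbox_uri.encode("utf-8")).hexdigest()


# What an exporter produces when the email field was left empty.
EMPTY_MBOX_HASH = mbox_sha1sum("mailto:")


def smush(people, ignore=frozenset()):
    """Group person ids that share a foaf:mbox_sha1sum value.

    `people` is a list of (person_id, sha1sum) pairs; values listed in
    `ignore` are not treated as identifying.
    """
    groups = defaultdict(set)
    for person_id, sha1 in people:
        if sha1 in ignore:
            groups[("unmerged", person_id)].add(person_id)
        else:
            groups[("sha1", sha1)].add(person_id)
    return sorted(sorted(group) for group in groups.values())


# Made-up example data.
alice_hash = mbox_sha1sum("mailto:alice@example.org")
people = [
    ("ex:alice", alice_hash),
    ("ex:alice2", alice_hash),
    ("ex:bob", EMPTY_MBOX_HASH),
    ("ex:carol", EMPTY_MBOX_HASH),
]

naive = smush(people)
guarded = smush(people, ignore={EMPTY_MBOX_HASH})
print(naive)
print(guarded)
```

The naive pass merges bob and carol into one person purely because both left the form blank; the guarded pass keeps the legitimate merge of the two alice descriptions and leaves the other two alone.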

Of course, there are other such values which contribute:

...not to talk about other inverse functional properties such as foaf:weblog which is oft used for defining shared weblogs (anyone who shares one is the same person).

To clarify, this is not a criticism of FOAF but more an observation that people will not stick to the semantics hidden away in an RDFS/OWL description. They will see a label for a property or class, project their needs onto it and use it, even if it doesn't fit the bill.

The problem becomes a serious issue where the identity of what is described is at stake. More specifically, problems with identity -- relating to assignment of URIs, lack or mis-use of same-as, inverse-functional, functional or cardinality of 1 properties -- are among the largest stumbling blocks at the moment for building a "web of entities".

In human language, a word's definition follows its usage to a certain extent. The question is, should FOAF change the definition of their words to match how people use them? Should they loosen definitions to say that foaf:weblog can apply to communal weblogs?

Finally, where would this post be without one of the finest examples of the chaos in RDF web data?

EDIT (15/10/08): I am new to blogging (and reluctantly at that), and I missed an opportunity for flagrant self-promotion! For more on the issue of identity on the Web and smushing through inverse-functional properties, see this paper from 2007:

Aidan Hogan, Andreas Harth, Stefan Decker. "Performing Object Consolidation on the Semantic Web Data Graph". Proceedings of I3: Identity, Identifiers, Identification. Workshop at 16th International World Wide Web Conference (WWW2007), Banff, Alberta, Canada, 2007.

Wednesday, June 4, 2008

First some preliminaries...


weblog (wěb'lôg', -lŏg')
n. A website that displays in chronological order the postings by one or more individuals and usually has links to comments on specific postings.