Sunday 26 October 2008

ISWC08 - Sunday a.m.

International Semantic Web Conference 2008
Tutorial - Introduction to the Semantic Web

Sunday 26th October - Morning

It's Sunday, the clocks have just changed and the sun is shining. I've survived my first night in a small room at the Hotel Barbarossa that usually houses chain smokers - my throat is sore and I've got a long day ahead of me (and the prospect of the same stale cigarette smell for the next 4 nights - the hotel is full and they have no non-smoking rooms).

I'm fairly new to Semantic Web technology, which is why I booked into the introduction tutorial, but I've read "Semantic Web for the Working Ontologist" and have been working with RDF(S)/OWL for a few months now, so I didn't want a simple re-hash of online documentation. I needn't have worried. The day is split into 9 sessions from 10 speakers and paints a wide canvas of what's going on and where we might be heading.

The session is opened by Jim Hendler with his "Introduction to the Introduction to the Semantic Web". Jim is an engaging presenter, with a style that reminded me a little of Jim Coplien (must be that first name :-). His main theme is that there are 2 different interpretations of the Semantic Web, one originating in the AI community (heavyweight ontologies, processing time less relevant than correctness/completeness of answers) and the other from the internet/web community (lightweight ontologies, speed of response much more important than correctness/completeness). These views, he claims, are not irreconcilable, but there remains a large space between the two that still needs to be explored.

Sean Bechhofer takes over with an "Introduction to OWL". In 45 minutes he's never going to cover RDF, RDFS and OWL in any depth, but he does sketch out the landscape all the way from motivating the need for semantics through inference, OWL dialects and DL. He even manages to fit in brief comments on necessary/sufficient conditions, the Open World Assumption and the Unique Name Assumption. Not surprisingly he doesn't get through all 67 slides and skips SKOS and OWL 2... I don't know about him, but I needed that coffee.
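
(For my own reference: here's a minimal sketch of the sort of statements Sean was describing, written in Python with rdflib against an invented example namespace - a small RDFS class hierarchy plus an OWL object property. It's purely illustrative, not anything from his slides.)

    # Minimal RDFS/OWL sketch using rdflib; the ex: namespace is made up.
    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF, RDFS, OWL

    EX = Namespace("http://example.org/conf#")
    g = Graph()
    g.bind("ex", EX)

    # RDFS: a small class hierarchy
    g.add((EX.Event, RDF.type, OWL.Class))
    g.add((EX.Tutorial, RDF.type, OWL.Class))
    g.add((EX.Tutorial, RDFS.subClassOf, EX.Event))

    # OWL: an object property with a domain and a range
    g.add((EX.presentedAt, RDF.type, OWL.ObjectProperty))
    g.add((EX.presentedAt, RDFS.domain, EX.Tutorial))
    g.add((EX.presentedAt, RDFS.range, EX.Event))

    print(g.serialize(format="turtle"))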

Half an hour later we hear from Asun Gómez-Pérez about "Ontology Engineering Methodologies". She covered the NeOn methodology for developing ontologies, which defines a process and some process artefacts to assist ontology developers. There's an ontology workflow with 9 different scenarios defined, a card-based system for capturing relevant data and some guidelines on how to reuse existing resources. I found Asun hard to follow, due to a combination of her accent, the speed with which she went through the material and some problems with the microphone. However, I think that I'll be looking closer at the NeOn project when I get back home.

Asun was followed by Aldo Gangemi's "Ontology Design". He covered the motivation for ontologies in general, contrasted them with controlled terminologies and reiterated that ontology design is about the non-trivial task of matching a solution (the ontology) to the problem at hand. His main interest appears to be discovering and documenting ontology design patterns, which he divides into categories such as 'Logical Patterns' and 'Content Patterns'. A website has been launched as a repository of ontology patterns at http://www.ontologydesignpatterns.org. He then introduced the concept of unit testing ontologies, which was something I hadn't seen elsewhere, and illustrated it by demonstrating some fundamental errors in the conference's own ontology, such as missing inverse properties, missing disjointness axioms and missing property transitivity.
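
(The unit-testing idea stuck with me, so here's a rough sketch of what such checks might look like in Python with rdflib. The ontology file name is hypothetical, and whether a given property should actually have an inverse, or two classes should be disjoint, is still a modelling judgement - the code can only flag what's missing.)

    # Rough sketch of 'unit testing' an ontology for the kinds of omissions
    # Aldo pointed out; the file name is made up.
    from itertools import combinations
    from rdflib import Graph
    from rdflib.namespace import RDF, OWL

    g = Graph()
    g.parse("conference-ontology.ttl")  # hypothetical local copy of the ontology

    # Object properties with no declared inverse
    for prop in g.subjects(RDF.type, OWL.ObjectProperty):
        if (prop, OWL.inverseOf, None) not in g and (None, OWL.inverseOf, prop) not in g:
            print(f"No inverse declared for {prop}")

    # Pairs of classes with no disjointness axiom between them
    classes = list(g.subjects(RDF.type, OWL.Class))
    for a, b in combinations(classes, 2):
        if (a, OWL.disjointWith, b) not in g and (b, OWL.disjointWith, a) not in g:
            print(f"No disjointness axiom between {a} and {b}")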

We still haven't reached lunchtime! Fabio Ciravegna takes us through "Technologies for Capturing, Sharing and Reusing Knowledge". There are a lot of resources in the world; some of them are text-only, but most are not, which makes automated markup really hard. However, the amount of data out there precludes manual-only annotation in most situations. Fabio talks about hybrid methods of annotating text documents, where the system learns how to do markup by tracking a human annotating a sample of the documents to be marked up. There are simpler automated markup techniques, such as Named Entity Recognition, Terminology Recognition or Known Name Recognition, which can achieve precision/recall of up to 80%-95%. However, when more useful (and complex) techniques are attempted, such as capturing links between elements in a document, precision/recall hit a plateau of around 60%/70% in 1998 and have stayed there since. And that's without trying to automate annotation of multimedia content. Once the data is annotated it becomes easier to share, but there are still hurdles to overcome - there's a lot of research on searching/querying: keyword-based, semantic or hybrid. Again, hybrid appears to win, but even merging the results from a hybrid search is tricky. Look at k-now.co.uk, a spin-off from Sheffield University, which has been highly rated by Rolls-Royce, no less.
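
(Since those precision/recall figures cropped up so often, here's a toy worked example in Python, over invented annotations, just to remind myself what the numbers actually measure.)

    # Toy precision/recall/F1 calculation over made-up entity annotations.
    gold = {("Rolls-Royce", "Org"), ("Sheffield", "Loc"), ("Fabio Ciravegna", "Person")}
    predicted = {("Rolls-Royce", "Org"), ("Sheffield", "Org"), ("Fabio Ciravegna", "Person")}

    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted)  # how much of what we marked up was correct
    recall = true_positives / len(gold)          # how much of what was there did we find
    f1 = 2 * precision * recall / (precision + recall)

    print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")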

My blood sugar level dipped too low during Fabio's talk, and looking back over the 100+ slides I wonder why I can only remember a fraction of them. Did he skip them or was I dreaming of lunch?
