Tuesday 28 October 2008

ISWC08 Monday a.m.

Another day, another tutorial. It's raining, but deep in the bowels of the Conference Hall there are no windows... just slide after slide....

Today the tutorial is "Reasoning for Ontology Engineering". It's a big topic and even with my little experience of the subject I know that there are many issues whenever the theory comes face to face with the real world. The tutorial is divided into 4 parts: Introduction, Bottom-Up Approach to Designing Ontologies, Understanding and Repairing Inferences, and Data Integration through Ontologies. Here goes (and when I fail to make sense you can check out the slides).

"Introduction" - Ralf Moller
Ralf's Introduction is theoretical in nature. He introduces the formal notation of a Description Logic language, ALCQ, which, though I'm sure it's useful to some, seems superfluous for my grasp of the problem. He describes a top-down approach to ontology design that seems entirely sensible to a seasoned OO designer like myself. It's all phrased in formal terms (TBox, ABox, Generalized Concept Inclusions (GCI), Grounded Conjunctive Queries (GCQ)) but the intention is clear _despite_ that :-). To paraphrase (and probably distort): the TBox contains the 'type system' while the ABox contains the instances.

OK, I'll quote from the presentation:
"A TBox is a set of generalised concept inclusions" - translation: it's a class hierarchy
"An interpretation satisfies a GCI, C is subsumed by D, if all members of C are also members of D" - translation: C is a subclass of D
"An interpretation is a model of a TBox if it satisfies all GCIs in the TBox" - translation: a model must conform to the type system
"A concept, C, is satisfiable w.r.t. a TBox, T, if there exists a model, I, of T such that the set C w.r.t. I is not empty" - translation: is there any way that there can be members of the set?
etc.
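
To make the quoted definitions a bit more concrete, here is a minimal Python sketch. It is a toy, not anything shown in the tutorial: it assumes an interpretation is just a dictionary from concept names to sets of individuals, that every GCI has a plain concept name on both sides (real GCIs allow complex concepts), and all the names (Student, Person, alice, ...) are made up.

```python
# Toy finite interpretation: each concept name maps to the set of its members.
# All concept and individual names below are invented for illustration.

def satisfies_gci(interp, sub, sup):
    """A GCI 'sub is subsumed by sup' holds if every member of sub is in sup."""
    return interp.get(sub, set()) <= interp.get(sup, set())

def is_model(interp, tbox):
    """An interpretation is a model of a TBox if it satisfies every GCI in it."""
    return all(satisfies_gci(interp, sub, sup) for sub, sup in tbox)

def witnesses_satisfiability(interp, tbox, concept):
    """True if interp is a model of tbox in which concept is non-empty."""
    return is_model(interp, tbox) and bool(interp.get(concept))

# The TBox as a list of (subclass, superclass) pairs.
tbox = [("Student", "Person"), ("Person", "Agent")]

good = {
    "Student": {"alice"},
    "Person": {"alice", "bob"},
    "Agent": {"alice", "bob", "acme"},
}
# Here alice is a Student but not a Person, so the first GCI fails.
bad = {
    "Student": {"alice"},
    "Person": {"bob"},
    "Agent": {"alice", "bob"},
}

print(is_model(good, tbox))
print(is_model(bad, tbox))
print(witnesses_satisfiability(good, tbox, "Student"))
```

Note that satisfiability only needs one model with a non-empty extension to exist. The sketch checks a single candidate interpretation; a real reasoner has to search for one (or prove there is none).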

Ralf covers the Unique Name and Open World assumptions and demonstrates how standard reasoning (based on RDFS & OWL) can be used to validate design decisions. He shows how Concept membership can be defined using restrictions (something that takes the OO head a while to get used to), and how that can be the base of GCQs. GCQs can be reduced to (standard) instance tests (i.e. simple triple queries), but non-trivial optimization techniques are required, such as those implemented by RacerPro. The number of individuals that can be efficiently queried (using sound & complete reasoning) has increased by orders of magnitude over the past 5 years (from 100 to 10,000 or 100,000 today).
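
As a rough illustration of how a grounded conjunctive query reduces to instance tests, here is a naive Python sketch. It rests on loud assumptions: the ABox is a pair of plain sets, the 'instance test' is a simple membership check on asserted facts (a reasoner such as RacerPro would check entailment against the TBox instead), and the names (Student, attends, dl101, ...) are hypothetical. It also enumerates every binding, which is exactly the brute force that the optimization techniques are there to avoid.

```python
from itertools import product

# Hypothetical ABox: concept assertions and role assertions as plain sets.
concept_assertions = {("Student", "alice"), ("Student", "bob"), ("Course", "dl101")}
role_assertions = {("attends", "alice", "dl101")}
individuals = {"alice", "bob", "dl101"}

def instance_test(atom, binding):
    """Check one query atom under a binding against the asserted facts only."""
    pred, *args = atom
    # Variables are replaced by their bound individual; constants pass through.
    ground = tuple(binding.get(arg, arg) for arg in args)
    if len(ground) == 1:
        return (pred, ground[0]) in concept_assertions
    return (pred, ground[0], ground[1]) in role_assertions

def answer_gcq(variables, atoms):
    """Grounded: variables range over named individuals only."""
    answers = []
    for values in product(sorted(individuals), repeat=len(variables)):
        binding = dict(zip(variables, values))
        if all(instance_test(atom, binding) for atom in atoms):
            answers.append(values)
    return answers

# "Which students attend which courses?"
query = [("Student", "?x"), ("attends", "?x", "?y"), ("Course", "?y")]
result = answer_gcq(["?x", "?y"], query)
print(result)
```

With n individuals and v variables this tries n^v bindings, which hints at why querying 100,000 individuals is not trivial.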


"A Bottom-Up Approach to Designing Ontologies" - Anni-Yasmin Turhan
Anni has a cold but she delivers an interesting session. It has been shown that Domain Experts (as opposed to Knowledge Engineers) don't necessarily have a sufficient grasp of OWL (or Description Logics in general) to be able to design ontologies. Instead the team have developed a methodology by which the Domain Expert creates instances/individuals in the ABox. A Racer Pro/Porter plugin called Sonic implements Most Specific Concept (MSC) and Least Common Subsumer (LCS) functionality.

The idea is that the Domain Expert creates instances (in the ABox), and then uses the MSC algorithm to automatically generate a Concept (class) in the TBox that closely matches each instance - this turns properties into restrictions, for example. They then select several of these MSCs that they 'know' describe 'similar' instances, and use the LCS algorithm to generate a Concept (class) that subsumes all of the MSCs as closely as possible. The tool allows the user to edit/rename the generated Concepts to remove incorrect or unnecessary assertions.

Most Specific Concept:
- the input individual is an instance
- the output is the best-fitting concept description for the input
- available for 'unfoldable' TBoxes
- only appropriate for acyclic ABoxes (but for ALE Description Logics you can compute k-approximations)
So, the first two bullets above make perfect sense. The next two require some formal knowledge that I don't really have, so here's what I've dug up.

Unfoldable: For a given TBox, T:
- All axioms are definitional (subclass or equivalence relationships)
- Axioms in T are unique (it's more complicated than this, but enough is enough)
- T is acyclic
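
The three conditions above can be sketched as a small check. This is only my simplified reading: it assumes each axiom is given as a pair of (defined name, names used on the right-hand side), so 'definitional' is baked into the representation, and the example names are invented.

```python
def is_unfoldable(axioms):
    """axioms: list of (defined_name, names_used_on_the_right_hand_side)."""
    defined = [name for name, _ in axioms]
    if len(defined) != len(set(defined)):
        return False  # some name is defined more than once

    uses = {name: set(rhs) for name, rhs in axioms}
    visiting, done = set(), set()

    def has_cycle(name):
        # Depth-first search; names without a definition are primitive leaves.
        if name in done:
            return False
        if name in visiting:
            return True  # we came back to a name we are still expanding
        visiting.add(name)
        cyclic = any(has_cycle(used) for used in uses.get(name, ()))
        visiting.discard(name)
        done.add(name)
        return cyclic

    return not any(has_cycle(name) for name in uses)

ok = [("Parent", ["Person", "hasChild"]), ("Grandparent", ["Parent"])]
duplicated = [("A", ["B"]), ("A", ["C"])]
cyclic = [("A", ["B"]), ("B", ["A"])]

print(is_unfoldable(ok), is_unfoldable(duplicated), is_unfoldable(cyclic))
```

The name comes from the fact that in such a TBox every defined name can be replaced ('unfolded') by its definition until only primitive names remain, and the acyclicity is what guarantees that the replacement stops.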

Description Logic naming conventions
- AL: Attributive language, a base language which allows: Atomic negation, concept intersection, universal restrictions, limited existential quantification
- E: Full existential quantification

k-approximation: an approximation algorithm. I can't even paraphrase this one, so check it out for yourself.
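
For what it's worth, my understanding is that the k-approximation works by following role assertions from the individual, but only to depth k, so that a cycle in the ABox can't send the algorithm round forever. The Python sketch below is a loose illustration of that idea only. It assumes an ABox made of asserted types and role assertions, ignores the TBox entirely, only produces conjunctions and existential restrictions, and uses made-up names and a made-up output syntax; it is not the Sonic implementation.

```python
def msc_k(individual, k, types, roles):
    """Describe an individual as a concept string, unrolled to role depth k."""
    parts = sorted(types.get(individual, ()))
    if k > 0:
        for role, target in sorted(roles.get(individual, ())):
            # Each role assertion becomes an existential restriction whose
            # filler is the description of the target, one level shallower.
            filler = msc_k(target, k - 1, types, roles)
            parts.append(f"some {role}.({filler})")
    return " and ".join(parts) if parts else "Thing"

# A cyclic ABox: alice knows bob, and bob knows alice.
types = {"alice": {"Person"}, "bob": {"Person", "Student"}}
roles = {"alice": [("knows", "bob")], "bob": [("knows", "alice")]}

print(msc_k("alice", 0, types, roles))
# Person
print(msc_k("alice", 1, types, roles))
# Person and some knows.(Person and Student)
print(msc_k("alice", 2, types, roles))
# Person and some knows.(Person and Student and some knows.(Person))
```

Without the depth bound the description of alice would contain the description of bob, which contains the description of alice, and so on, which is presumably why the exact MSC is only appropriate for acyclic ABoxes.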

Least Common Subsumer:
- the output is a Concept (class) that subsumes (is superclass of) each of the input Concepts
- the output is the best-fitting Concept description for the input Concepts
- available for 'unfoldable' TBoxes
- not available for logic more expressive than ALEN (N: Cardinality restrictions)
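
To get a feel for what an LCS computation does, here is a Python sketch for the simplest case I could manage: concepts built only from atomic names, conjunction and existential restrictions, with all definitions already unfolded. That is far less expressive than ALEN, so treat it as an illustration of the idea rather than the algorithm from the tutorial. The representation and the names (Person, hasChild, ...) are my own assumptions.

```python
def concept(atoms=(), exists=()):
    """A concept: a conjunction of atomic names and (role, filler) restrictions."""
    return (frozenset(atoms), frozenset(exists))

def lcs(c, d):
    """Keep what both concepts share: common atoms, and for every pair of
    restrictions on the same role, the LCS of their fillers."""
    atoms_c, exists_c = c
    atoms_d, exists_d = d
    shared_atoms = atoms_c & atoms_d
    shared_exists = {
        (role_c, lcs(filler_c, filler_d))
        for role_c, filler_c in exists_c
        for role_d, filler_d in exists_d
        if role_c == role_d
    }
    return (shared_atoms, frozenset(shared_exists))

father = concept({"Person", "Male"}, {("hasChild", concept({"Person"}))})
mother = concept({"Person", "Female"}, {("hasChild", concept({"Person"}))})

parent = lcs(father, mother)
print(parent == concept({"Person"}, {("hasChild", concept({"Person"}))}))
```

The result says 'a Person with some child who is a Person', which is what you would hope for from Father and Mother. The pairwise combination of restrictions can produce redundant conjuncts on bigger inputs, so a real tool would also simplify the output.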

Are you still here? Fine.

If you have a populated TBox, but with a flat hierarchy, you can use LCS to deepen your class hierarchy, by applying it to 'similar' sibling Concepts. The reason you might want to do this is that if you have a large number of siblings, it makes it harder to navigate & query the data.

Now, there are many ontologies that use more expressive DLs, such as ALC (C: Complex concept negation). The upshot appears to be that if you apply LCS in this context you end up with a lengthy disjunction (union) which is not very useful. This can be handled by either an approximation-based or customization-based approach.

Approximation-based: eliminate disjunction from input concepts (while preserving as much information as possible) and then compute the LCS. This amounts to a translation from the more expressive DL to a less expressive DL.

Customization-based: import the more expressive ontology into your ontology as 'background' terminology, and refine it using terms from a less expressive DL. You refer to concept names from the background ontology, but there's no feasible way to actually use this! So, the proposal is to try using "subsumption-based common subsumer".

I was well lost on this last point, so it was lucky that it was lunchtime.
