Friday 8 October 2010

First Lego League

This week the Newlands Primary School team completed their first challenge of the competition. They programmed the robot to release the syringe and get it back to base - with the robot returning to a well-defined position on the board. There's still some refinement needed to the construction of the tool that pushes the release handle, but that should be our first 25 points in the bag. Only another 375 points to go......

The boys don't seem very interested in the project element of the challenge. Some of the girls came up with a list of possible research areas, and the vote seemed to settle on "constipation". I wonder if they will come up with any solid ideas about how to design an innovative solution. Hmmm.

Thursday 23 September 2010

Bookshelf blues

My bookshelves filled up years ago, but there's no end to my acquisition of books. There's a steady flow of technical books to the attic, where they gather dust and are never thought of again. Fiction, on the other hand, gets lent out almost as quickly as it arrives.

This week my Kindle 3 arrived, heralding respite for bookshelf and attic (and a less open lending regime for fiction). First impressions are good - it's small, light, really easy to read. The Wi-Fi and 3G work as advertised and the web browser is basic, but usable.

The cost of Kindle books isn't significantly different from the cost of the print versions, though, so it's not financially feasible to replicate my reference collection on the Kindle. It'd be nice if you could exchange your print copy for an electronic version - I'd even be happy to pay a modest upgrade price. Could this be a business model waiting to be developed?

Tuesday 28 October 2008

ISWC08 Monday p.m.

I've had lunch and coffee and I'm ready for more. This afternoon promises the most relevant sessions so far. Will I sink or swim?

"Moduralisation and Explanation" a.k.a. "Understanding and repairing inferences"
Matthew Horridge & Uli Sattler

A customised version of Protege 4 is used in this session and is available at the tutorial web site, along with a couple of example ontologies to go with it. Specifically, the customised Protege 4 comes with some plug-ins that are part of the TONES project.

Uli starts by leading us in gently. There are lots of ontologies out there & if one contains concepts that you want to model you should reuse them. This is sound engineering technique & makes perfect sense. However, you might want to reuse only a part of an ontology, so how do you go about extracting the useful bit? Well, you can probably identify the concepts that you want to reuse, but they will themselves depend on other concepts within the borrowed ontology & you'll need to import those as well - this property is called 'coverage'. The bad news is that guaranteeing coverage is, in general, undecidable; the good news is that there is a syntactic approximation (phew) that guarantees coverage, though not minimal size.
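To get my head around coverage I scribbled the sketch below. It only captures the syntactic flavour of the idea (keep pulling in everything the concepts you want refer to, until nothing new turns up) and is not the tutorial's actual algorithm; the concept names and the dependency map are invented.

```python
# A naive sketch of syntactic module extraction: the transitive closure of the
# "refers to" relation. Real approaches analyse the axioms themselves.
def syntactic_module(wanted, depends_on):
    """depends_on maps each concept to the concepts its axioms refer to."""
    module = set(wanted)
    frontier = list(wanted)
    while frontier:
        concept = frontier.pop()
        for dep in depends_on.get(concept, ()):
            if dep not in module:
                module.add(dep)
                frontier.append(dep)
    return module

deps = {
    "Pizza": {"PizzaBase", "PizzaTopping"},
    "PizzaTopping": {"Food"},
    "PizzaBase": {"Food"},
}
print(syntactic_module({"Pizza"}, deps))
# {'Pizza', 'PizzaBase', 'PizzaTopping', 'Food'} - coverage, but not necessarily minimal
```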

Unfortunately (again) you can't guarantee the 'safety' of the imported terms. There may be clashes or inconsistencies with your ontology, and you'll need to check for these manually. This is related to "Conservative extensions" (whatever they may be) and is the subject of ongoing research.

Now Matthew takes over, and we get into some gritty inference stuff. Using a reasoner plugin he identifies a number of Root & Derived Unsatisfiable classes in the ontology. These are generally 'Bad Things' (TM) - after all, what's the point of a class that nothing CAN belong to? (Note this is different from an empty class, which is satisfiable but has no members.) The plugin lists the unsatisfiable classes and shows which are Root classes - these need to be addressed first, because fixing them might well fix the unsatisfiability of the Derived classes.
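Just to convince myself of the empty-versus-unsatisfiable distinction, here's a brute-force scribble of my own (nothing to do with the plugin). The class names are invented, and a real reasoner obviously doesn't work by enumerating interpretations.

```python
# Enumerate every interpretation over a two-element domain and check the axioms:
# Vegetarian and MeatEater are disjoint; ConfusedDiner must sit inside both,
# so no model can give it a member. Unicorn has no axioms forcing it empty.
from itertools import combinations

domain = {"x", "y"}

def subsets(s):
    s = list(s)
    return [set(c) for r in range(len(s) + 1) for c in combinations(s, r)]

satisfiable = {"Unicorn": False, "ConfusedDiner": False}

for veg in subsets(domain):
    for meat in subsets(domain):
        if veg & meat:                      # axiom: Vegetarian and MeatEater are disjoint
            continue
        for unicorn in subsets(domain):
            confused = veg & meat           # ConfusedDiner can be at most this (empty) set
            if unicorn:
                satisfiable["Unicorn"] = True        # some model gives it members
            if confused:
                satisfiable["ConfusedDiner"] = True  # never happens

print(satisfiable)  # {'Unicorn': True, 'ConfusedDiner': False} - empty is not unsatisfiable
```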

In a large ontology (or any ontology) it can be hard for a mere human to determine _why_ a class is unsatisfiable. Lucky for us, reasoners can be induced to tell us. What they can give us is 'justifications', which are "minimal subsets of an ontology that are sufficient for a given entailment (inference) to hold". They are a sort of explanation of how an inference was reached (and are also known as MUPS or MinAs). Each justification is made up of one or more axioms, and the same axiom may occur in more than one justification. For a given inference (such as the unsatisfiability of a class) there may be multiple justifications. To 'repair' an unsatisfiability all you need to do is delete a single axiom from each of the justifications. Of course, nothing is that simple - how do you know which axiom(s) to delete? Well, you need to analyse the axioms and decide which ones represent logical errors.
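The "delete one axiom from each justification" step is really a hitting-set problem, so here's a toy greedy version I sketched to convince myself. It has nothing to do with Matthew's plugin, and the axiom strings are placeholders; it finds *a* repair, not necessarily the right one - deciding which axioms are genuine logical errors is still down to a human.

```python
# Each justification is a set of axioms; a repair must remove at least one axiom
# from every justification. Greedily pick the axiom that breaks the most of them.
def greedy_repair(justifications):
    remaining = [set(j) for j in justifications]
    to_delete = set()
    while remaining:
        counts = {}
        for just in remaining:
            for axiom in just:
                counts[axiom] = counts.get(axiom, 0) + 1
        victim = max(counts, key=counts.get)   # appears in the most justifications
        to_delete.add(victim)
        remaining = [j for j in remaining if victim not in j]
    return to_delete

justs = [
    {"A subClassOf B", "B subClassOf not C"},
    {"A subClassOf C", "B subClassOf not C"},
]
print(greedy_repair(justs))  # {'B subClassOf not C'} breaks both justifications at once
```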

So far, so good. Now things begin to get a bit hairy.

Some justifications include 'superfluous' axioms - the same inference would have been reached if those axioms didn't exist. This might point to problems in the ontology, and there's a plugin to help find these. The superfluous nature of these justifications can be concealed by both 'Internal' and 'External' masking, which I'm not even going to try and explain. Then there are 'Fine-grained' justifications that come in two flavours: 'Laconic' and 'Precise'. The plug-in lets you look at the 'Laconic' ones, which have no superfluous axioms and in which all axioms are as weak as possible. 'Precise' justifications are a subset of 'Laconic' justifications and are primarily geared towards repair.

Well, I'm glad that's all clear.


"Data Integration through Ontologies"
Diego Calvanese & Giuseppe De Giacomo

Still to come....

ISWC08 Monday a.m.

Another day, another tutorial. It's raining, but deep in the bowels of the Conference Hall there are no windows... just slide after slide....

Today the tutorial is "Reasoning for Ontology Engineering". It's a big topic and even with my little experience of the subject I know that there are many issues whenever the theory comes face to face with the real world. The tutorial outline is divided into 4 parts: Introduction, Bottom-Up Approach to Designing Ontologies, Understanding and Repairing Inferences, and Data Integration through Ontologies. Here goes (and when I fail to make sense you can check out the slides).

"Introduction" - Ralf Moller
Ralf's Introduction is theoretical in nature. He introduces the formal notation of a Description Logic language, ALCQ, which, though I'm sure it's useful to some, seems superfluous for my grasp of the problem. He describes a top-down approach to ontology design that seems entirely sensible to a seasoned OO designer like myself. It's all phrased in formal terms - TBox, ABox, Generalized Concept Inclusions (GCI), Grounded Conjunctive Queries (GCQ) - but the intention is clear _despite_ that :-). To paraphrase (and probably distort): the TBox contains the 'type system' while the ABox contains the instances.

OK, I'll quote from the presentation:
"A TBox is a set of generalised concept inclusions" - translation: it's a class hierarchy
"An interpretation satisfies a GCI, C is subsumed by D, if all members of C are also members of D" - translation: C is a subclass of D
"An interpretation is model of a TBox if it satisfies all GCIs in the TBox" - translation: a model must conform to the type system
"A concept, C, is satisfiable w.r.t. a TBox, T, if there exists a model, I of T, such that the set C w.r.t. I that is not empty" - translation: is there any way that there can be members of the set.
etc.
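To check I'd understood the translations, here's a back-of-the-envelope version in Python: model each concept as a plain set, and an interpretation is a model of the TBox if every GCI's subset check holds. The domain and concept names are mine, not Ralf's.

```python
# An 'interpretation' assigns a set of individuals to each concept name.
interpretation = {
    "Student": {"alice", "bob"},
    "Person":  {"alice", "bob", "carol"},
    "Unicorn": set(),              # empty here, but could have members in another model
}

tbox = [("Student", "Person")]     # one GCI: Student is subsumed by Person

def is_model(interp, gcis):
    """The interpretation is a model of the TBox if every C is a subset of its D."""
    return all(interp[c] <= interp[d] for c, d in gcis)

print(is_model(interpretation, tbox))  # True: every Student is also a Person
```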

Ralf covers the Unique Name and Open World assumptions and demonstrates how standard reasoning (based on RDFS & OWL) can be used to validate design decisions. He shows how Concept membership can be defined using restrictions (something that takes the OO head a while to get used to), and how that can form the basis of GCQs. GCQs can be reduced to (standard) instance tests (i.e. simple triple queries), but non-trivial optimization techniques are required, such as those implemented by RacerPro. The number of individuals that can be efficiently queried (using sound & complete reasoning) has increased by orders of magnitude over the past 5 years (from 100 to 10,000 or 100,000 today).
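To make the GCQ idea concrete for myself, here's the sort of grounded conjunctive query I have in mind, expressed as SPARQL over a tiny in-memory graph using the rdflib library. The data and vocabulary are invented, and real systems like RacerPro do far more than this naive triple matching.

```python
# Two conjuncts: ?who must be a Student AND must attend ISWC08.
from rdflib import Graph

data = """
@prefix ex: <http://example.org/> .
ex:alice a ex:Student ; ex:attends ex:ISWC08 .
ex:bob   a ex:Student .
"""

g = Graph()
g.parse(data=data, format="turtle")

q = """
PREFIX ex: <http://example.org/>
SELECT ?who WHERE {
    ?who a ex:Student .
    ?who ex:attends ex:ISWC08 .
}
"""
for row in g.query(q):
    print(row[0])   # only ex:alice satisfies both conjuncts
```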


"A Bottom-Up Approach to Designing Ontologies" - Anni-Yasmin Turhan
Anni has a cold but she delivers an interesting session. It has been shown that Domain Experts (as opposed to Knowledge Engineers) don't necessarily have a sufficient grasp of OWL (or Description Logics in general) to be able to design ontologies. Instead the team have developed a methodology by which the Domain Expert creates instances/individuals in the ABox. A Racer Pro/Porter plugin called Sonic implements Most Specific Concept (MSC) and Least Common Subsumer (LCS) functionality.

The idea is that the Domain Expert creates instances (in the ABox), and then uses the MSC algorithm to automatically generate a Concept (class) in the TBox that closely matches each instance - this turns properties into restrictions, for example. They then select several of these MSCs, which they 'know' describe 'similar' instances, and use the LCS algorithm to generate a Concept (class) that subsumes all of the MSCs as closely as possible. The tool allows the user to edit/rename the generated Concepts to remove incorrect or unnecessary assertions.
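To get a feel for what "subsumes as closely as possible" means, I knocked up the crudest possible version: treating the LCS as the nearest common ancestor in an existing class tree. The real algorithm works on concept descriptions (restrictions and all) and is far cleverer; the pizza hierarchy below is invented.

```python
# Toy "least common subsumer" over a plain class tree: walk up from one concept
# and return the first ancestor that also subsumes the other.
parents = {
    "Margherita": "VegetarianPizza",
    "QuattroFormaggi": "VegetarianPizza",
    "Hawaiian": "MeatyPizza",
    "VegetarianPizza": "Pizza",
    "MeatyPizza": "Pizza",
    "Pizza": "Food",
}

def ancestors(concept):
    chain = [concept]
    while concept in parents:
        concept = parents[concept]
        chain.append(concept)
    return chain

def lcs(a, b):
    b_ancestors = set(ancestors(b))
    for concept in ancestors(a):
        if concept in b_ancestors:
            return concept

print(lcs("Margherita", "QuattroFormaggi"))  # VegetarianPizza
print(lcs("Margherita", "Hawaiian"))         # Pizza
```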

Most Specific Concept:
- the input individual is an instance
- the output is the best-fitting concept description for the input
- available for 'unfoldable' TBoxes
- only appropriate for acyclic ABoxes (but for ALE Description Logics you can compute k-approximations)
So, the first two bullets above make perfect sense, but the next 2 require some formal knowledge that I don't really have, so here's what I've dug up.

Unfoldable: For a given TBox, T:
- All axioms are definitional (subclass or equivalence relationships)
- Axioms in T are unique (it's more complicated than this, but enough is enough)
- T is acyclic

Description Logic naming conventions
- AL: Attributive language, a base language which allows: Atomic negation, concept intersection, universal restrictions, limited existential quantification
- E: Full existential qualification

k-approximation: an approximation algorithm. I can't even paraphrase this one, so check it out for yourself.

Least Common Subsumer:
- the output is a Concept (class) that subsumes (is superclass of) each of the input Concepts
- the output is the best-fitting Concept description for the input Concepts
- available for 'unfoldable' TBoxes
- not available for logic more expressive than ALEN (N: Cardinality restrictions)

Are you still here? Fine.

If you have a populated TBox, but with a flat hierarchy, you can use LCS to deepen your class hierarchy by applying it to 'similar' sibling Concepts. The reason you might want to do this is that a large number of siblings makes it harder to navigate & query the data.

Now, there are many ontologies that use more expressive DLs, such as ALC (C: Complex concept negation). The upshot appears to be that if you apply LCS in this context you end up with a lengthy disjunction (union) which is not very useful. This can be handled by either an approximation-based or customization-based approach.

Approximation-based: eliminate disjunction from input concepts (while preserving as much information as possible) and then compute the LCS. This amounts to a translation from the more expressive DL to a less expressive DL.

Customization-based: import the more expressive ontology into your ontology as 'background' terminology, and refine it using terms from a less expressive DL. You refer to concept names from the background ontology, but there's no feasible way to actually use this! So, the proposal is to try using "subsumption-based common subsumer".

I was well lost on this last point, so it was lucky that it was lunchtime.

Sunday 26 October 2008

ISWC08 - Sunday p.m.

International Semantic Web Conference 2008
Tutorial - Introduction to the Semantic Web

Sunday 26th October - Afternoon

"Semantic Interoperability" - Jerome Euzenat & Natasha Noy
Much of the promise of the Semantic Web stems from the claim that we will be able to query heterogeneous data sources. For this to work the ontologies from the data sources need to be 'aligned', and this is not simple. It can be partially automated, but requires human intervention both to identify missed matches and to confirm the correctness of automated matching. Once the ontologies are aligned a transformation between them can be automatically generated. The transformation may translate queries/triples, or simply create new assertions that make axioms in the data sources equivalent. Combining different techniques can help, though this then introduces the need to aggregate, filter and trim results. In tests precision/recall vary from under 50% to over 80% if the ontologies are 'relatively similar'. The best performers in the Ontology Alignment Evaluation Initiative (OAEI) tests are Falcon and RiMOM.
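As an aside, the simplest possible matcher is just string similarity over class names, which gives a feel for why human intervention is needed. This is my own crude sketch with invented ontologies; Falcon and RiMOM use far more sophisticated structural and linguistic techniques.

```python
# Score every pair of class names from two ontologies by string similarity and
# keep the confident pairs for a human to review.
from difflib import SequenceMatcher

onto_a = ["Person", "Publication", "ConferencePaper"]
onto_b = ["Human", "Paper", "Conference_Paper", "Publications"]

def matches(a_terms, b_terms, threshold=0.8):
    found = []
    for a in a_terms:
        for b in b_terms:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                found.append((a, b, round(score, 2)))
    return found

print(matches(onto_a, onto_b))
# finds ('Publication', 'Publications', ...) and ('ConferencePaper', 'Conference_Paper', ...)
# but misses 'Person'/'Human' entirely - exactly the kind of gap a human has to fill
```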

"Semantic Web Services" - John Dominingue & David Martin
We start with an overview of Web Services today. They are syntactic, most integration tasks need to be carried out by developers and they can have problems scaling. The goal of Semantic Web Services is to "automate all aspects of application development through reuse". The idea is that we can build clients that can analyse a query and choreograph/orchestrate the interaction with one or more web services to provide potential solutions. David and John describe OWL-S and WSMO respectively. They both provide mechanisms that extend the description of web services through added semantics to make the mediation between client and service more automatable. David describes Fujitsu's Task Computing and John covers eMerge, a system developed to assist with emergency situations in Essex. They then describe the W3C recommendation for SAWSDL that allows traditional 3 extensibility elements for WSDL: modelReference, liftingSchemaMapping, loweringSchemaMapping.

"Linked Data - the Dark Side of the Semantic Web" - Jim Hendler
Darth Vader reared his head at the beginning of this session, but was quickly dispelled. Jim is talking about the unseen side of the Semantic Web, the ability to link data dynamically. An example he gives is the (largely theoretical) wine chooser application that downloads the menu for the restaurant you are eating at, prompts you to pick the dishes that you and your companions are choosing and, based upon your (& their) preferences, the downloaded wine list and some online service that matches foods to wine characteristics, recommends what wine(s) to choose. I'm not entirely convinced - my heuristic of pick a colour and don't choose any whose price makes you perspire seems to work fine. He also talked about deployed websites such as Twine, LiveJournal, Freebase and DBpedia and described the huge online RDF resource at the W3C SWEO Community Project LinkingOpenData. A brief discussion of Semantic Gridding/Seeded Tagging followed, along with the assertion that many of the larger commercial companies are entering the Web 3.0 arena in the belief that it is only a matter of time before it provides winners to join Google (Web 1.0) and Facebook (Web 2.0) in the Hall of Fame. This is exemplified by Microsoft's acquisition of Powerset earlier this year.

"Using the Semantic Web" - Mathieu d'Aquin
Mathieu was (if this is possible) even more excited than Jim. He had been hoping to demonstrate several applications that make use of Semantic Web APIs, but the connectivity at the conference centre is pretty poor. Consequently we had to make do with static images, but he was still convincing. He is the developer of Watson, an online Semantic Web query service (http://watson.kmi.open.ac.uk) that enables you to quickly search all marked-up data on the web to discover relevant resources. You can then issue SPARQL queries against the resource, or use some of the pre-canned API calls, such as subclass/superclass, and he showed how he quickly knocked up a search engine, Wahoo (derived from Yahoo), that used Watson to populate a sidebar with specialisations/generalisations of the search terms entered. There are other services out there too, such as OpenCalais SemanticProxy and Hakia. Also worth a look, apparently, is the Talis platform, which will actually store your data for you!
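I haven't tried the Watson API itself yet, so as a stand-in here's the kind of subclass/superclass lookup that Wahoo does, expressed as SPARQL with rdflib over a tiny invented local graph rather than the live service.

```python
# Find specialisations (subclasses) and generalisations (superclasses) of a term.
from rdflib import Graph

data = """
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:Jaguar rdfs:subClassOf ex:BigCat .
ex:Lion   rdfs:subClassOf ex:BigCat .
ex:BigCat rdfs:subClassOf ex:Animal .
"""

g = Graph()
g.parse(data=data, format="turtle")

specialisations = g.query("""
    PREFIX ex:   <http://example.org/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?c WHERE { ?c rdfs:subClassOf ex:BigCat . }
""")
generalisations = g.query("""
    PREFIX ex:   <http://example.org/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?c WHERE { ex:BigCat rdfs:subClassOf ?c . }
""")

print([str(r[0]) for r in specialisations])  # Jaguar and Lion
print([str(r[0]) for r in generalisations])  # Animal
```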