Sunday, 26 October 2008

ISWC08 - Sunday a.m.

International Semantic Web Conference 2008
Tutorial - Introduction to the Semantic Web

Sunday 26th October - Morning

It's Sunday, the clocks have just changed and the sun is shining. I've survived my first night in a small room at the Hotel Barbarossa that usually houses chain smokers - my throat is sore and I've got a long day ahead of me (and the prospect of the same stale cigarette smell for the next 4 nights - the hotel is full and they have no non-smoking rooms).

I'm fairly new to Semantic Web technology, which is why I booked into the introduction tutorial, but I've read "Semantic Web for the Working Ontologist" and have been working with RDF(S)/OWL for a few months now, so I didn't want a simple re-hash of the online documentation. I needn't have worried. The day is split into 9 sessions from 10 speakers and paints a wide canvas of what's going on and where we might be heading.

The session is opened by Jim Hendler with his "Introduction to the Introduction to the Semantic Web". Jim is an engaging presenter, with a style that reminded me a little of Jim Coplien (must be that first name :-). His main theme is that there are 2 different interpretations of the Semantic Web, one originating in the AI community (heavyweight ontologies, processing time less relevant than correctness/completeness of answers) and the other from the internet/web community (lightweight ontologies, speed of response much more important than correctness/completeness). These views, he claims, are not irreconcilable, but there remains a large space between the two that still needs to be explored.

Sean Bechhofer takes over with an "Introduction to OWL". In 45 minutes he's never going to cover RDF, RDFS and OWL in any depth, but he does sketch out the landscape all the way from motivating the need for semantics, through inference, to OWL dialects and DL. He even manages to fit in brief comments on Necessary/Sufficient Conditions, the Open World Assumption and the Unique Name Assumption. Not surprisingly, he doesn't get through all 67 slides and skips SKOS and OWL 2... I don't know about him, but I needed that coffee.

Half an hour later we hear from Asun Gómez-Pérez about "Ontology Engineering Methodologies". She covers the NeOn methodology for developing ontologies, which defines a process and some process artefacts to assist ontology developers. There's an ontology workflow with 9 defined scenarios, a card-based system for capturing relevant data and some guidelines on how to reuse existing resources. I found Asun hard to follow, due to a combination of her accent, the speed at which she went through the material and some problems with the microphone. However, I think I'll be looking more closely at the NeOn project when I get back home.

Asun was followed by Aldo Gangemi's "Ontology Design". He covered the motivation for ontologies in general, contrasted them with controlled terminologies and reiterated that ontology design is about the non-trivial task of matching a solution (the ontology) to the problem at hand. His main interest appears to be discovering and documenting ontology design patterns, which he divides into categories such as 'Logical Patterns' and 'Content Patterns'. A website has been launched as a repository of ontology patterns at http://www.ontologydesignpatterns.org. He then introduced the concept of unit testing ontologies, which I hadn't seen elsewhere, and illustrated it by exposing some fundamental errors in the conference's own ontology, such as missing inverse properties, missing disjointness axioms and missing property transitivity.
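To get the idea of "unit testing an ontology" straight in my own head, here's a minimal Python sketch of the three kinds of check mentioned above. It is not Aldo's tooling: the ontology is modelled as a plain set of (subject, predicate, object) triples, only *asserted* axioms are checked (no reasoner is involved), and all the `ex:` names are hypothetical rather than taken from the real conference ontology.

```python
"""Sketch: ontology "unit tests" as plain checks over asserted triples.

Assumptions: the ontology is a set of (subject, predicate, object) string
tuples, and every ex: name below is hypothetical.
"""

OWL_INVERSE = "owl:inverseOf"
OWL_DISJOINT = "owl:disjointWith"
RDF_TYPE = "rdf:type"
OWL_TRANSITIVE = "owl:TransitiveProperty"


def has_inverse(triples, p, q):
    """True if p and q are declared inverses, in either direction."""
    return (p, OWL_INVERSE, q) in triples or (q, OWL_INVERSE, p) in triples


def are_disjoint(triples, a, b):
    """True if classes a and b are declared disjoint, in either direction."""
    return (a, OWL_DISJOINT, b) in triples or (b, OWL_DISJOINT, a) in triples


def is_transitive(triples, p):
    """True if property p is typed as an owl:TransitiveProperty."""
    return (p, RDF_TYPE, OWL_TRANSITIVE) in triples


def run_checks(triples, expectations):
    """Return human-readable failures; an empty list means every test passed."""
    failures = []
    for p, q in expectations.get("inverses", []):
        if not has_inverse(triples, p, q):
            failures.append(f"missing inverse: {p} / {q}")
    for a, b in expectations.get("disjoint", []):
        if not are_disjoint(triples, a, b):
            failures.append(f"missing disjointness: {a} / {b}")
    for p in expectations.get("transitive", []):
        if not is_transitive(triples, p):
            failures.append(f"missing transitivity: {p}")
    return failures


# A hypothetical, deliberately incomplete conference ontology.
ontology = {
    ("ex:hasAuthor", OWL_INVERSE, "ex:isAuthorOf"),
    ("ex:Person", OWL_DISJOINT, "ex:Paper"),
}

# What the ontology designer expects to be true (the "test cases").
expectations = {
    "inverses": [("ex:hasAuthor", "ex:isAuthorOf"), ("ex:hasPart", "ex:isPartOf")],
    "disjoint": [("ex:Person", "ex:Paper"), ("ex:Paper", "ex:Session")],
    "transitive": ["ex:isPartOf"],
}

if __name__ == "__main__":
    for failure in run_checks(ontology, expectations):
        print(failure)
```

Running it reports one failure of each kind (the missing `ex:hasPart`/`ex:isPartOf` inverse, the missing `ex:Paper`/`ex:Session` disjointness and the non-transitive `ex:isPartOf`). A real test suite would more likely load the ontology into an RDF store and query it, or ask a reasoner, but the shape is the same: state what you expect, then assert it.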

We still haven't reached lunchtime! Fabio Ciravegna takes us through "Technologies for Capturing, Sharing and Reusing Knowledge". There are a lot of resources in the world; some of them are text-only, but most are not, which makes automated markup really hard. However, the sheer amount of data out there rules out manual-only annotation in most situations. Fabio talks about hybrid methods of annotating text documents, where the system learns how to do markup by tracking a human annotating a sample of the documents to be marked up. There are simpler automated markup techniques, such as Named Entity Recognition, Terminology Recognition or Known Name Recognition, which can reach precision/recall of 80%-95%. However, when more useful (and complex) techniques are attempted, such as capturing links between elements in a document, precision/recall hit a plateau of 60%/70% in 1998 and have stayed there. And that's without trying to automate the annotation of multimedia content.

Once the data is annotated it becomes easier to share, but there are still hurdles to overcome: there's a lot of research into searching/querying, whether keyword-based, semantic or hybrid. Again, hybrid appears to win, but even merging the results from a hybrid search is tricky. Look at k-now.co.uk, a spin-off from Sheffield University, which has been highly rated by Rolls-Royce, no less.
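Since precision and recall come up so often in these annotation talks, here's a minimal Python sketch of what the two numbers measure for an entity recogniser. The annotations are made-up (span, type) pairs for illustration only; an annotation counts as correct only if both the span and the type match the human "gold" annotation.

```python
def precision_recall(predicted, gold):
    """Precision = correct / predicted; recall = correct / gold (as fractions)."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # annotations that match span AND type
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical human ("gold") annotations for one short document.
gold = {
    ("Fabio Ciravegna", "Person"),
    ("Sheffield University", "Organisation"),
    ("Rolls-Royce", "Organisation"),
    ("Sheffield", "Location"),
}

# Hypothetical system output: one wrong type, one entity missed entirely.
predicted = {
    ("Fabio Ciravegna", "Person"),
    ("Rolls-Royce", "Organisation"),
    ("Sheffield", "Organisation"),
}

if __name__ == "__main__":
    p, r = precision_recall(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Here 2 of the 3 system annotations are right (precision 0.67) and 2 of the 4 gold annotations were found (recall 0.50), which shows why the two figures are usually quoted as a pair: a system can trivially improve one at the expense of the other.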

My blood sugar level dipped too low during Fabio's talk, and looking back over the 100+ slides I wonder why I can only remember a fraction of them. Did he skip them or was I dreaming of lunch?

Tuesday, 6 May 2008

RichEd20.dll

Recently, I have been dealing with a number of defects that all come home to roost with Microsoft's RichEd20.dll - or more accurately the variation between versions of the DLL.

The first issue was spotted when copying an OLE object from Word and pasting it into a Rich Edit control hosted by my client's application. It worked fine when Office 2000 was installed, but the formatting was broken once Office 2007 was installed. (The pasted object was rendered in two parts: an embedded object and a static metafile.)

And bizarrely, copying and pasting an embedded image from one Rich Edit control to another crashes when using the RichEd20.dll that ships with XP, but works fine when using the version that ships with Office 2007.

In both cases, the presence of some text on the clipboard alongside the OLE object significantly altered the behaviour of the paste operation.

Joyfully I searched the web, but didn't find much help, except for this blog entry from Murray Sargent that lists the various RichEd20.dll versions and where they may be found.

So, if you experience weird formatting (or worse) when pasting an OLE object into a rich edit control, check out what version(s) of RichEd20.dll you've got installed.