monkeyiq

Monday, January 25, 2010

KOffice and RDF: Say it with Style...

This scattered series of posts has been about the RDF support I'm working on for KOffice. The ODF document format lets you store RDF/XML data inside the document file, which in turn lets both a human reader and a computer know about things that comprise an office document. You can refer to a person, place, or time and have the computer know what you are saying without having to resort to heuristics.

Having RDF support in document formats means you can send somebody a single file containing exact information about real world events. The RDF can contain details which can be pulled up in the formatting of the text that you see. For example, for a given contact you might know his phone number, home page, normal business location, email address etc. You might only want to see a small fraction of this information at one place in a document, but perhaps for a header you want to know the postal address too. Stylesheets are what I'm working on right now to let that happen.

At the start of the video below, you can see James, Joyce and Mark. As I click on these contacts, the RDF docker tells me information about them. As you can see, there is more information known to KOffice than is shown in the document (first & last name). However, for Mark, we also know where he is and that is shown in the RDF docker.

James is mentioned in the second paragraph, and the document is talking about giving him a call to verify something. Instead of hunting down his phone number, you can set a semantic stylesheet for that particular reference to him so that his phone number is included inline in the document text. The added advantage here is that if you edit his phone number via the RDF docker, all the places in the document text that cite the phone number are updated for you. KOffice knows that those digits are James' phone number, so it can modify them for you.
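
To make the idea concrete, here is a minimal Python sketch of a "semantic stylesheet". It is not KOffice's implementation: the contact record, the stylesheet names, and the `render` helper are all hypothetical, and the data that KOffice keeps in RDF is stood in for by a plain dictionary. It only shows why editing the phone number once updates every place that cites it.

```python
from string import Template

# Hypothetical contact record; in KOffice this data lives in the document's RDF.
contact = {"name": "James", "phone": "555-0100", "homepage": "http://example.com/james"}

# Hypothetical stylesheets: each one decides which fields show up in the text.
STYLESHEETS = {
    "name": Template("$name"),
    "name+phone": Template("$name ($phone)"),
}

def render(contact, stylesheet):
    """Format one reference to a contact using the named stylesheet."""
    return STYLESHEETS[stylesheet].substitute(contact)

def render_all(contact, references):
    """Re-render every reference to the contact in document order."""
    return [render(contact, name) for name in references]

# Two places in the document cite the same contact with different stylesheets.
references = ["name", "name+phone"]

before = render_all(contact, references)
contact["phone"] = "555-0199"  # edit once, as you would through the RDF docker
after = render_all(contact, references)

print(before)
print(after)
```

Because the text is generated from the record rather than typed by hand, the second reference picks up the new number on the next render while the first, which only shows the name, is unchanged.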

Later on we again cite the event itself, just saying it's "next weekend", which isn't an ideal description of when a specific event is happening. Luckily, we have cited the RDF event, so it shows up in the RDF docker and the stylesheets are available to reformat the text. In this case I want to see the summary and when it starts.

I'm working on adding user specified stylesheets now too, as the Format menu shows in the video. When you create a user stylesheet it is also saved in RDF, so the stylesheets you make become part of the document itself. They will be available when some other KOffice user loads the document.

The File/Document Information widget has a new RDF section which lets you see and directly edit the RDF triples if that's your thing. The semantic tab shows you all the higher level things that KOffice has seen in the document, like people, places, and events. Finally, the stylesheets page lets you nominate how you want things formatted by default. For example, you might want to see a person's name and phone number. Setting that as the default lets you then drag and drop some contacts from kaddressbook into the document and see the phone number as part of the document text.

Of course, you can drag and drop items from the RDF docker into kaddressbook and korganizer. These pieces of information should be able to be moved into and out of an ODF file using KOffice without thinking about it. If you want to add Fred to the text, pick him up from your kaddressbook and drop him into the RDF docker. Your default contact stylesheet is then used to insert some text into the document at the current cursor location showing you the Fred contact. Quick and simple... Let's make RDF something everybody uses but nobody needs to learn about (unless they want to).

KOffice and RDF: Say it with Style... from Ben Martin on Vimeo.

Saturday, January 2, 2010

KOffice & RDF: Who, What, When, Where?

As mentioned in a previous post, ODF documents can contain one or more RDF/XML files. These files allow you to unambiguously encode information for both computer and human consumption. So you can describe a person in a way that tells you their phone number and also lets the computer know that these digits are a specific person's home phone number. Common data formats like vcard and ical have some encodings in RDF and soon a KOffice near you will understand these pieces of data from ODF files.

KOffice currently understands some of the FOAF vocabulary (storing contact data), and the rdfical format (for events). There are a few ways to encode longitude and latitude in RDF. The current patch supports two of them, with optional linking to rdfical. This is one of the major strengths of RDF: you can say who, where and when, and also link these things together so an event carries not only a time but its location information too.

The video below shows the new RDF docker in action. As you click on text that has associated RDF, the docker shows you the interesting information. Frodo and Sam are associated with both traditional contact data and a location. The items in the RDF docker let you import them into your system (into kaddressbook or korganizer) or export them to well known desktop formats like vcard and ical. Editing locations is done with Marble and there is only a minimal set of information for contacts and events currently. Note that midway through, when I edit an event, timezones are respected. If the RDF describes an event as being in Tokyo, that timezone's offset from your current local time is respected.
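
As a small illustration of what "respecting the timezone" means, the Python sketch below converts an event start given in Tokyo time to a viewer's local time. It says nothing about how KOffice does this internally. The event date and the viewer's timezone (a fixed UTC+10 offset) are assumptions made up for the example, and fixed offsets are used only because neither zone here observes daylight saving.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep the sketch self-contained; Japan does not observe DST.
TOKYO = timezone(timedelta(hours=9), "JST")
# Assumed viewer timezone for the example (UTC+10, no DST).
VIEWER = timezone(timedelta(hours=10), "AEST")

def to_local(event_start, local_tz):
    """Express a timezone-aware event start in the viewer's timezone."""
    return event_start.astimezone(local_tz)

# Hypothetical event: starts at 14:00 Tokyo time.
start_in_tokyo = datetime(2010, 1, 30, 14, 0, tzinfo=TOKYO)
start_local = to_local(start_in_tokyo, VIEWER)

print(start_in_tokyo.isoformat())
print(start_local.isoformat())
```

Both values describe the same instant; only the wall-clock presentation differs, which is the behaviour you want when an event described in one timezone is edited in another.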

Towards the end of the video I show that contacts can be simply dragged and dropped between KOffice and kaddressbook. This also works for events to/from Evolution, but I had some issues with korganizer for events. D&D makes KOffice and ODF quite a convenient format for transmitting semantic information to colleagues in a single, self-describing file.

KOffice & RDF: Who, What, When, Where? from Ben Martin on Vimeo.

Friday, December 18, 2009

This blog post message was filtered by the Australian Government.

Saturday, December 5, 2009

Office documents that mean something?

I've been hacking on the development branch of KOffice to add RDF support. Thanks to KO GmbH for this chance! This means that while a document can posit a linear progression of characters and words like normal, the software can start to understand a bit of what you are talking about too.

The ODF document specification allows RDF to be included in the document zip file either inline in the main content.xml file, or using manifest.rdf and other RDF/XML files in the zip. The RDF is free to refer to XML elements in the content.xml, so you can add metadata to names, places, times etc so that a computer can work out unambiguously what you are talking about. This makes an ODF file a very powerful container format for transmitting meaning to people.
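
Since an ODF file is a zip archive, you can peek at the RDF it carries with nothing more than a zip library. The Python sketch below is only an illustration: it builds a tiny stand-in archive in memory (the member contents are placeholders, and `contacts.rdf` is a made-up file name) and lists the members that look like RDF/XML files. A real document would list its metadata files in manifest.rdf, which this sketch does not parse.

```python
import io
import zipfile

def rdf_members(odf_file):
    """Return the names of zip members that look like RDF/XML files."""
    with zipfile.ZipFile(odf_file) as odf:
        return sorted(name for name in odf.namelist() if name.endswith(".rdf"))

def build_demo_odf():
    """Build a stand-in ODF-like zip in memory; contents are placeholders."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as odf:
        odf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        odf.writestr("content.xml", "<placeholder/>")
        odf.writestr("manifest.rdf", "<placeholder/>")
        odf.writestr("contacts.rdf", "<placeholder/>")  # hypothetical extra RDF file
    buf.seek(0)
    return buf

print(rdf_members(build_demo_odf()))
```

Passing the path of a real .odt file to `rdf_members` instead of the in-memory demo should work the same way, since `zipfile.ZipFile` accepts either a path or a file-like object.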

At its heart, RDF represents all information in triples. Bob knows Alice, etc. See lwn.net or other sources if you want to know more about RDF. But you don't have to know it to use it with KOffice ;)
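
If you have not met triples before, this toy Python sketch may help. It is not a real RDF library: the names are plain strings rather than URIs, and `match` is a hypothetical helper. Each fact is a (subject, predicate, object) tuple, and a query is a pattern in which `None` means "anything".

```python
# Each fact is one (subject, predicate, object) triple.
triples = {
    ("Bob", "knows", "Alice"),
    ("Alice", "knows", "Carol"),
    ("Bob", "phone", "555-0100"),
}

def match(triples, s=None, p=None, o=None):
    """Return the triples matching the pattern; None is a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Who does Bob know?
print(match(triples, s="Bob", p="knows"))
# Everything we know about Bob.
print(match(triples, s="Bob"))
```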

Below you can see that I've hacked the "Document Information" window to let you see the raw RDF if you want to know exactly what is going on:

If, on the other hand, you are not a developer and/or don't really care about triples or RDF, there is the Semantic view. Right now I only have support for Contact information which is drawn from the FOAF RDF vocabulary. If the latter doesn't mean anything to you, it's just a way to describe people and their relationships using RDF. As you can see, you don't have to care about RDF here, you just see people. Right clicking on a person lets you do things like import them into your contacts database, or, in the future, other things like email them or phone them.

While it is nice to be able to have contact information extracted from the document, it is much nicer to have KOffice know exactly where that information relates in the document. For this I have a new Docker for RDF as seen in the below video. First you see that the addressbook is empty. Then I start moving the cursor around in the document. Notice that the docker picks up when the cursor is on the name of a person that has RDF metadata. You see more information about Frodo, and then I import Sam. The contact entry in the addressbook created for the import brings his phone number in too, which was part of the RDF but not explicitly shown in the document.

KOffice starts getting RDF from Ben Martin on Vimeo.

I hope to add support for other things like times, locations, and relationships. This way you can send somebody a document describing a meeting and they can import the time into their smart phone directly from the document... can we make retyping such information a vague memory like punch cards?

Thursday, November 12, 2009

Clawmotia, look and feel and portrait mode

Clawmotia is a remote control for MythTV using Qt and qedje. The claw works on maemo and desktop systems. I have updated submenus to fade better, added hotkey support, and improved images and icons since 0.2. Grab it at my maemo repository.

In the shot of it running on the device shown below, the euro sign is for commercial skip forward and back. Having the skip back in the top right of the device makes it very easy to hit if an auto skip has gone too far. It is no accident that pause, mute, and cancel are placed in the other corners. The icons below change the volume up and down. In the middle of the top is an icon to bring up a submenu letting you set the aspect ratio and stretch the video in various ways to get maximal use from your screen.

I have also been tinkering with allowing switching between landscape and portrait mode. The edje is ready; there are just a few things to iron out before it works as expected. Of course, such switching will be far more useful on an n900 where it can happen automatically as you rotate the device. But for n8x0 users a button will be available to trigger a rotate...

Sunday, November 8, 2009

Improving the CLAW!

Clawmotia is a MythTV remote using edje and Qt. It works well on maemo, desktop and other devices. It requires a MythTV Web setup to talk to. See previous posts for how to set it up.

I got rid of the rendering artifacts on maemo, added hotkey support (volume, fullscreen) and some of the buttons on the main remote screen now bring up submenus. This paves the way for more advanced remote control configurations where intricate but rarely used controls are still available on the remote. For example, aspect ratio and image fill are unlikely to be needed on a well set up MythTV installation, but they are in a submenu of the remote should they be needed. I know the background for the submenu panel is not really nice, but with the drop shadow and pan-in effect (which also needs tweaking) you can easily see it's a submenu.

Grab it here. I've included the x86_64 and armel binaries as well as the edj "edje" file. You'll need QtCore and qedje and you're in business. For maemo you'll still have to make a menu shortcut yourself, and of course, see the README.

Better artwork very welcome. If you have skills, I should be able to take an xcf file and turn out a theme. Conversely, take a look at the edc file and you shouldn't have much trouble adding/removing buttons and changing the images.

Saturday, October 31, 2009

White lightning in triplicate

Recently I started hacking on a memory mapped, multi_index soprano backend. While adding triples and using listStatements() should work fine, implementing SPARQL is making for interesting times.

I started out allowing a single triple match with a filter(regex()) to restrict results. And this worked rather well, making the first one free as they say. So, noticing the little white rabbit that seemed to disappear into the SPARQL bushes, I decided to join in the high tea and mercury sniffing that so induces sanity. Over the course of version 0.0.1 to 0.0.5 the SPARQL code is becoming better, little by little. The code is up at my sf.net page. But don't blame me if your SPARQL is not implemented yet or your triples somehow disappear.
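
For readers who want to see what "a single triple match with a filter(regex())" amounts to, here is a toy Python sketch. It is not the boostmmap code: the triples are made-up stand-ins (the `ex:` names are hypothetical) and `select_label_regex` is an invented helper. It matches one pattern with a fixed predicate, then keeps only the rows whose object matches the regular expression anywhere in the string.

```python
import re

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Made-up data: (subject, predicate, object) with plain strings as objects.
triples = [
    ("ex:feature1", RDFS_LABEL, "yawned excites deflower"),
    ("ex:feature2", RDFS_LABEL, "quiet label"),
    ("ex:feature3", "ex:comment", "excites, but under another predicate"),
]

def select_label_regex(triples, predicate, pattern):
    """Match '?what <predicate> ?lab' and filter ?lab with an unanchored regex."""
    rx = re.compile(pattern)
    return [(s, o) for s, p, o in triples if p == predicate and rx.search(o)]

print(select_label_regex(triples, RDFS_LABEL, "excites"))
```

`re.search` is used rather than `re.match` because the filter should succeed when the pattern occurs anywhere in the label, not only at its start.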

Anyway, here is a little benchmark session. I'm using the data set generator and queries found here. To make the data I use


$ cd /usr/local/java/bsbmtools
$ cat run.sh
#!/bin/bash
java -cp bin:lib/ssj.jar benchmark.generator.Generator "$@"
$ ./run.sh -fc -pc 1000 -s nt
$ mv dataset.nt  thousand-prods.nt
$ mkdir -p /tmp/RDFBENCH
$ cd /tmp/RDFBENCH
$ mkdir mmap redland

Queries are run multiple times to ensure a hot disk cache. This is on a 3 disk RAID-5 and an Intel Q6600 with 8 GB RAM.
The last query is not optimized properly in boostmmap yet, so it's far slower than it rightly should be. For benchmarking the boostmmap backend...


$ cd /tmp/RDFBENCH/mmap
$ time sopranocmd --backend boostmmap \
  --serialization ntriples \
  import /usr/local/java/bsbmtools/thousand-prods.nt >|out 2>&1

real    1m49.642s
210M     triples.mmap*

$ time sopranocmd \
  --backend boostmmap \
  list "" '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>' \
  '<http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/Product>' \
   >| /tmp/out 2>&1

real    0m0.103s
grep Product /tmp/out | wc -l
1001

## based on Query 6
$ time sopranocmd \
  --backend boostmmap query \
"
select ?what ?lab
where
{
  ?what <http://www.w3.org/2000/01/rdf-schema#label> ?lab .
  filter( regex( str( ?lab ), 'excites' ))
}"
?lab -> <yawned%20excites%20deflower>;
  ?what -> <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/productfeature295>
?lab -> <goofs%20excites%20enigmata>;
  ?what -> <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/productfeature3276>

real    0m0.091s


$ time sopranocmd --backend boostmmap query \
"
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix dc: <http://purl.org/dc/elements/1.1/>
select ?offer ?price
where {
    ?offer  bsbm:product <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromProducer1/Product5> .
    ?offer  bsbm:vendor ?vendor .
    ?vendor bsbm:country <http://downlode.org/rdf/iso-3166/countries#ES> .
    ?offer  dc:publisher ?vendor .
    ?offer  bsbm:price ?price .
}"
0.93sec

Note that this 0.9 seconds is shameful and needs to be optimized back to <0.1 sec.

For redland,


$ cd /tmp/RDFBENCH/redland
$ time sopranocmd --backend redland \
 --serialization ntriples \
 import /usr/local/java/bsbmtools/thousand-prods.nt \
 >|/tmp/out 2>&1

real    38m34.735s
480mb

$ time sopranocmd --backend redland \
  list "" \
  '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>' \
  '<http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/Product>'  \
  >| /tmp/out 2>&1

real    0m0.096s
grep Product /tmp/out | wc -l
1000

So for just listStatements(), redland and mmap are fairly equal in performance, which you might expect for a single indexed lookup. In libferris I had restricted RDF usage to raw triple probes like this because I used redland directly prior to version 1.4.x of libferris.
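
To see why a single indexed lookup is cheap for any backend, consider this toy Python sketch. It is only an analogy: the real backend is described above as memory mapped and multi_index based, whereas here an ordinary dictionary keyed on (predicate, object) stands in for one index, and the class and data are made up for the example.

```python
from collections import defaultdict

class TinyTripleStore:
    """Toy store with one secondary index keyed on (predicate, object)."""

    def __init__(self):
        self.triples = set()
        self.by_po = defaultdict(set)  # (predicate, object) -> set of subjects

    def add(self, s, p, o):
        self.triples.add((s, p, o))
        self.by_po[(p, o)].add(s)

    def subjects(self, p, o):
        """Answer '? p o' with one dictionary probe instead of a full scan."""
        return sorted(self.by_po.get((p, o), ()))

store = TinyTripleStore()
store.add("ex:product1", "rdf:type", "bsbm:Product")
store.add("ex:product2", "rdf:type", "bsbm:Product")
store.add("ex:vendor1", "rdf:type", "bsbm:Vendor")

print(store.subjects("rdf:type", "bsbm:Product"))
```

A probe like this costs about the same whatever the store, which fits the near-identical list timings; the differences show up once a query needs filters or joins.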

So for SPARQL,


## based on Query 6
$ time sopranocmd --backend redland query \
"
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?what ?lab
where
{
  ?what rdfs:label ?lab .
  filter( regex( str( ?lab ), 'excites' ))
}"
what -> <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/productfeature295>;
   lab -> "yawned excites deflower"
what -> <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/productfeature3276>;
   lab -> "goofs excites enigmata"
real    0m3.855s

Gah, and I didn't slip up and put the 3 on the left side of the dot there. We are talking about 0.1 seconds for boostmmap against 3.86 seconds for redland.


$ time sopranocmd --backend redland query \
"
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix dc: <http://purl.org/dc/elements/1.1/>
select ?offer ?price
where {
      ?offer bsbm:product <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromProducer1/Product5> .
      ?offer bsbm:price ?price .
      ?offer bsbm:vendor ?vendor .
      ?offer dc:publisher ?vendor .
      ?vendor bsbm:country <http://downlode.org/rdf/iso-3166/countries#ES> .
}"
real    0m7.134s

Since this query doesn't work well on boostmmap, it only goes from 1 to 7 seconds. But I think I can resolve it in much less than 1 second. This is not meant to make redland look bad; its SPARQL implementation is much more complete than boostmmap's will likely be any time soon. Creating an optimal query plan for the full SPARQL language will be an interesting challenge.
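
To give a feel for why the order of triple patterns matters, here is a toy Python sketch of a greedy plan. It is not how boostmmap or redland plan queries, and it is nowhere near an optimal planner. The data, the unprefixed names, and every function are hypothetical. The heuristic is simply to run next whichever pattern has the most terms already fixed (constants or variables bound by an earlier pattern), so that each step starts from as few candidate rows as possible.

```python
def is_var(term):
    return term.startswith("?")

def match_pattern(triples, pattern, binding):
    """Yield every extension of `binding` that makes `pattern` match a triple."""
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new

def bound_count(pattern, bound_vars):
    """Number of terms that are constants or already-bound variables."""
    return sum(1 for t in pattern if not is_var(t) or t in bound_vars)

def evaluate(triples, patterns):
    """Greedy join: always run the most constrained remaining pattern next."""
    solutions = [{}]
    remaining = list(patterns)
    bound = set()
    while remaining:
        remaining.sort(key=lambda p: bound_count(p, bound), reverse=True)
        pattern = remaining.pop(0)
        solutions = [b2 for b in solutions
                     for b2 in match_pattern(triples, pattern, b)]
        bound.update(t for t in pattern if is_var(t))
    return solutions

# Made-up data shaped loosely like the offer/vendor query above.
triples = [
    ("offer1", "product", "Product5"), ("offer1", "vendor", "vendorES"),
    ("offer1", "price", "10"),
    ("offer2", "product", "Product5"), ("offer2", "vendor", "vendorDE"),
    ("offer2", "price", "12"),
    ("vendorES", "country", "ES"), ("vendorDE", "country", "DE"),
]
patterns = [
    ("?offer", "price", "?price"),
    ("?offer", "vendor", "?vendor"),
    ("?vendor", "country", "ES"),
    ("?offer", "product", "Product5"),
]

result = evaluate(triples, patterns)
print(result)
```

With this ordering the two patterns that carry a constant object run first, so the price and vendor patterns are only checked against the few rows that survive, rather than against every offer in the store.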

Development might be bursty as I don't know what time I can spare for improving the SPARQL completeness in the short term.