RDF

Sensible wiki titles on RDF import with "pseudo RDF namespaces"

This week I finished the last remaining items on my to-do list for my Google Summer of Code project (which is available in the form of the RDFIO MediaWiki extension). Those items, which I also mentioned in my last blog post, were to:

  • Add ability to use ("pseudo") namespaces for general RDF entities (non-properties) in order to choose wiki titles for them on RDF import.
  • Add a screen that shows URIs that lack a namespace prefix to abbreviate them with.

Regarding the first point, its usefulness might not be easy to see at once, so I created a screencast to show the difference between using it and not:

It demonstrates the problem of choosing sensible wiki titles for general RDF entities when no good property for naming (such as rdfs:label) is available, since "entity" URIs often consist of nothing but nonsensical IDs, and often no namespace prefixes are defined for them. RDFIO lets you add "pseudo" namespaces (using a simplified splitting pattern, not necessarily consistent with the XML namespace specs) to get around this problem.
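To make the idea a bit more concrete, here is a minimal Python sketch of abbreviating a URI with a pseudo namespace. It is not RDFIO's actual (PHP) code; the table `PSEUDO_NAMESPACES`, the function `uri_to_title`, and the example.org URIs are all made up for illustration.

```python
# Hypothetical pseudo-namespace table: base URI -> prefix.
# Both the URIs and the prefixes are invented for this example.
PSEUDO_NAMESPACES = {
    "http://example.org/data/compound/": "cmpd",
    "http://example.org/data/": "exdata",
}


def uri_to_title(uri, namespaces=PSEUDO_NAMESPACES):
    """Abbreviate a URI to 'prefix:rest' using the longest matching base URI."""
    # Try longer base URIs first so the most specific pseudo namespace wins.
    for base in sorted(namespaces, key=len, reverse=True):
        if uri.startswith(base) and len(uri) > len(base):
            return f"{namespaces[base]}:{uri[len(base):]}"
    # No pseudo namespace matched: fall back to the unabbreviated URI.
    return uri


print(uri_to_title("http://example.org/data/compound/10016"))
```

With a table like this, an entity whose URI ends in a bare ID still gets a short, recognizable title instead of the full URI.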

  • The new functionality is so far only available in the svn trunk
  • More info, install instructions etc on the extension page

Hopefully I'll find time to also demonstrate the second point above, as well as the "filter by ontology" feature for the SPARQL endpoint, with screencasts early next week.

Otherwise, I'll use the coming week to refactor the currently quite unmanageable code and add comments, and hopefully also to add the feature to filter RDF export by a [[Export RDF::false]] SMW property (which was the "if time permits" item on my TODO list).

RDFIO 0.3.0 released

I just created a new release of the RDFIO MediaWiki extension. A somewhat detailed list of the changes can be found in the change log. The relevant links:

The filter by ontology / vocabulary feature


New in this release is an "export by ontology" feature that, when possible, restricts the URIs used for a wiki page to only those that appear in an ontology that the user points to. To give an idea of this feature, see the following screenshot:

On the page "Samuel", I have one fact:

[[has blog::http://saml.rilspace.org]]

... and on the page "Property:has_blog", there are a number of facts, including:

[[Equivalent URI::http://xmlns.com/foaf/0.1/weblog]]
[[Equivalent URI::http://example.org/ExampleOntology/weblog]]

What happens when submitting the form in the screenshot is this: if I enable only "Output Equivalent URIs", both of the above facts will be exported, but if I also enable "Filter by Vocabulary" and set the URL to FOAF's definition file, the export will be filtered to contain only the first fact, which is included in the FOAF definition.
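The filtering step can be sketched in a few lines of Python. This is only an illustration of the idea, not the extension's code: the function name `filter_by_vocabulary` is made up, and the set `foaf_terms` is a hard-coded stand-in for whatever terms would be read from the vocabulary file the user points to.

```python
def filter_by_vocabulary(equivalent_uris, vocabulary_terms):
    """Keep only the equivalent URIs that are defined in the given vocabulary."""
    return [uri for uri in equivalent_uris if uri in vocabulary_terms]


# Equivalent URIs recorded on the page "Property:has_blog" (from the example).
equivalent_uris = [
    "http://xmlns.com/foaf/0.1/weblog",
    "http://example.org/ExampleOntology/weblog",
]

# Stand-in for the terms parsed from the FOAF definition file (assumption:
# a real implementation would load these from the ontology URL).
foaf_terms = {
    "http://xmlns.com/foaf/0.1/weblog",
    "http://xmlns.com/foaf/0.1/name",
}

filtered = filter_by_vocabulary(equivalent_uris, foaf_terms)
print(filtered)
```

Only the FOAF URI survives the filter; the example.org one is dropped because it is not among the vocabulary's terms.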

Current GSoC status

Of the items on the "remaining TODO list" in my last blog post, the following are finished with this release:

  1. In the SPARQL endpoint, enable querying by any URI specified as equivalent URI
  2. For RDF export, implement an "export by ontology" option that, when possible, restricts the URIs used for a wiki page to only those that appear in an ontology that the user points to.

The remaining items are now:

  1. Create an HTML interface for interactively configuring how wiki titles should be chosen for RDF entities for which no preferred "wiki title property" (such as rdfs:label, dc:title etc.) was found.
  2. Add "pseudo namespaces" as an option for choosing wiki titles from general RDF URIs (not only properties!), i.e., the possibility to abbreviate a part of a URI into a pseudo namespace, making the URI more fit for use as a wiki title. (For properties, there is often a well-known abbreviation for the base URI of the corresponding vocabulary/ontology, but this is often not the case for general RDF entities, which may come from user-defined data etc.)
  3. If time permits:
  • Implement filter by "export rdf" property.

Screencast: Installing Semantic MediaWiki and RDFIO from scratch on Ubuntu

In a previous blog post I demonstrated the RDFIO extension for Semantic MediaWiki with a screencast, but showed nothing about installation.

By testing I realized that the install procedure was VERY painful. I have now (with much valuable help from Oleg Simakoff) corrected a number of errors in the instructions and the code, and added command-line snippets for Linux/Ubuntu to the install instructions. I also created a screencast which goes through the steps from scratch (except Apache/MySQL/PHP setup) in a little more than 5 minutes. Hope this makes things easier for you testers! (And if you try it out, please report any bugs or issues in the issue tracker!)

Sorry for the low volume level! Didn't realize that while recording ... :/

Screencast: RDF Import and SPARQL "Update" in Semantic MediaWiki

So, for those of you who might think the Install instructions for the RDFIO Semantic MediaWiki extension I'm working on are a bit daunting but would like a glimpse of what my GSoC project is up to anyway, I created a short (3:20) screencast demonstrating (ARC2 based) RDF Import and SPARQL "Update" functionality for some example data. (Sorry for the lame speaking ... :P ... didn't sleep for a looong time )

The screencast shows how you can import RDF/XML into Semantic MediaWiki and then use the SPARQL endpoint to insert or remove data to/from articles, even using the original format of the RDF that you imported earlier.

(For those of you who decide to try installing it, please have a look at the error fixing happening in this thread.)

RDF properties to use for wiki titles on import - suggestions?

One of the things we try to do in my GSoC project is to select suitable wiki titles when importing arbitrary RDF triples into Semantic MediaWiki (the full RDF URIs are very ugly to use as wiki titles!).

Simply shortening the namespace in the entity's URI to its prefix (as specified in the import data) could be a general fallback (see screenshot), but for many types of data it would be nice to use a property that provides a "natural language" label for the entity instead. We came up with this list of properties so far:

More suggestions?
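For illustration, the selection logic could look roughly like the Python sketch below. It assumes triples are plain `(subject, predicate, object)` tuples and that the candidate properties are tried in a fixed priority order; the function `choose_title`, the ordering, and the example data are assumptions, and only rdfs:label and dc:title are used as candidates here.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

# Prioritized list of "title" properties; the order is an assumption.
TITLE_PROPERTIES = [RDFS_LABEL, DC_TITLE]


def choose_title(subject, triples, prefixes):
    """Pick a wiki title for `subject` from its triples, with fallbacks."""
    # Map each property of this subject to a value (the last one seen wins).
    values = {p: o for s, p, o in triples if s == subject}
    for prop in TITLE_PROPERTIES:
        if prop in values:
            return values[prop]
    # Fallback: shorten the namespace part of the URI to its prefix.
    for base, prefix in prefixes.items():
        if subject.startswith(base):
            return f"{prefix}:{subject[len(base):]}"
    # Last resort: the full URI.
    return subject


triples = [
    ("http://example.org/mol/1", DC_TITLE, "Caffeine"),
    ("http://example.org/mol/2", "http://example.org/onto#mass", "180.16"),
]
prefixes = {"http://example.org/mol/": "mol"}

print(choose_title("http://example.org/mol/1", triples, prefixes))
print(choose_title("http://example.org/mol/2", triples, prefixes))
```

The first entity gets its dc:title as wiki title, while the second has no label-like property and falls back to the prefixed form.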

Even nicer with more prefixes

 

OK, things look even nicer now than before (blog post 1 and post 2), after allowing custom namespace prefixes to be introduced with '?' and '=' as separators too. (I don't know if that is valid for QNames, though. Does anyone know?)
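As a rough sketch of the separator idea (in Python, not the actual extension code), a URI can be split into a namespace part and a local part at the last separator character. That '#' and '/' are the other separators in use is an assumption here, as are the function name `split_uri` and the example URI.

```python
# Characters at which a URI may be split into namespace and local name.
# '#' and '/' are assumed defaults; '?' and '=' are the additions.
SEPARATORS = "#/?="


def split_uri(uri):
    """Split a URI into (namespace, local_name) at the last separator."""
    cut = max(uri.rfind(sep) for sep in SEPARATORS)
    # No separator found, or nothing after it: treat the URI as unsplittable.
    if cut == -1 or cut == len(uri) - 1:
        return uri, ""
    return uri[: cut + 1], uri[cut + 1:]


print(split_uri("http://example.org/molecule?id=10016"))
```

For a query-style URI like the one above, the namespace part ends at the '=' sign, so a prefix can stand in for everything up to and including `?id=`.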

Code (see code repo) is still quite ugly though. Just typed together to get things working. Will look at structuring things more as soon as I have slept a little better :)

I also got to thinking about these random issues, which are mostly questions to myself and might not be fully thought through yet, but I'm jotting them down here for possible feedback (hoping it's readable ^^):

  • Imports will have to be done according to a special "mapping configuration", so that the same config can be used on import and export, if wanted. Such a config should be generated on import based on the RDF namespaces used. It should also be manually expandable with a prioritized list of properties (like dc:title) to use as substitute for URIs. Maybe it could be stored in a (timestamped?) wiki article?
  • When using "natural language" substitutes for URIs, such as "dc:title", one has to put a property such as swivt:originalURI or similar into the page, so that it can be exported again with its original URI. (Or there is already the swivt:importedFrom, or something like that ... is it better to use that?)
  • The SPARQL endpoint will need to be accessed according to the same kinds of mapping configs if the wiki should be able to be queried in the same format as the import data ...
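To illustrate the round-trip idea from the points above, here is a small Python sketch in which the chosen title and the original URI are recorded together on import, so that export can restore the URI. The class `TitleMapping` and the property name "Original URI" are hypothetical; where such a mapping would actually be stored (for example in a wiki article) is left open.

```python
# Hypothetical name for the property that stores the original URI on a page.
ORIGINAL_URI_PROPERTY = "Original URI"


class TitleMapping:
    """Remembers which URI each wiki title was created from on import."""

    def __init__(self):
        self._uri_by_title = {}

    def record(self, title, uri):
        """Register a title for a URI, refusing ambiguous titles."""
        existing = self._uri_by_title.get(title)
        if existing is not None and existing != uri:
            raise ValueError(f"title {title!r} already maps to {existing}")
        self._uri_by_title[title] = uri

    def fact_for_page(self, title):
        """Return the SMW fact to place on the page so export can find the URI."""
        return f"[[{ORIGINAL_URI_PROPERTY}::{self._uri_by_title[title]}]]"

    def export_uri(self, title):
        """Return the original URI for a title, or None if it was never imported."""
        return self._uri_by_title.get(title)


mapping = TitleMapping()
mapping.record("Caffeine", "http://example.org/mol/1")
print(mapping.fact_for_page("Caffeine"))
print(mapping.export_uri("Caffeine"))
```

Raising an error on a clashing title is just one possible choice; a real import would probably need a rule for disambiguating titles instead.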

Using RDF namespace prefixes in wiki titles

 

As you might have seen in my previous blog post, using the full RDF URIs as wiki titles on import is not optimal. I have now managed to use the RDF namespace prefixes submitted at import time to make the titles nicer. It definitely improves things a bit. At least the properties are no longer all treated as the same one, as they were in the previous blog post.

But there is still a bit to go. Not all URIs have prefixes, and for some things you would probably want to use the corresponding dc:title or similar instead of an abbreviated URI, etc.

 

Populating SMW from RDF/XML (First test)

I have now managed (using ARC2 and SMWWriter in a MediaWiki extension) to populate Semantic MediaWiki pages with triples from a snippet of RDF/XML (thanks to Egon Willighagen for the RDF-ized NMRShiftDB data, submitted further below), yay! But as you can see, using the full URIs as wiki titles is not a good idea. URIs as wiki titles are ugly, and the predicates in this case even got truncated and were all treated as the same one, since they contained hashes, which MediaWiki doesn't allow in titles.

That is the whole background to our talking so much about the "equivalent URI handler" (mentioned here, for example), which is meant to be a configurable handler of mappings from URI patterns to (sensible) wiki titles. Optimally, the same pattern can then be used on both import and export so that the format is kept, which would allow SMW to be used as a (collaborative) RDF editor (one of the main motivations for my GSoC project).

Well, the hard bit (URI -> wiki title mapping) remains. Diving into it now ...

Working ARC2 RDF Store connector committed

I now have a working RDF store connector for Semantic MediaWiki that uses ARC2's RDF store rather than SMW's built-in store. This will allow us to take advantage of functionality in ARC2, such as the possibility to set up a SPARQL endpoint.

Thanks to Alfredas Chmieliauskas for the Joseki store connector in the SparqlExtension for SMW, which this connector is heavily based upon.

The ARC2 connector implements the same amount of the SMWStore API as the JosekiStore, but I'm not yet sure if more needs to be implemented, for the things we want to do (general RDF import/export). Gotta figure that out.

The code is available in the google code repository trunk, and install instructions on the gcode wiki.

Feel free to try it out, but be warned that it has only been very briefly tested so far!

GSoC Proposal: "General RDF export/import in Semantic MediaWiki"

This is a slightly shortened version of the full proposal, initially posted on my user page on MediaWiki.org, and then in final form on the GSoC app site.