I blogged it earlier, but better to get everything in one post, so taking the summary again:
After doing my GSoC project for Wikimedia foundation / Semantic MediaWiki in 2010, resulting in the RDFIO Extension, I finally could make it to the Semantic MediaWiki Conference, which was in Berlin in September.
Now, the video of my talk, "hooking up Semantic MediaWiki with external tools via SPARQL" (such as Bioclipse and R), is out on YouTube, so please find it below. For your convenience, you can find the slides below the video, as well as the relevant links to the different stuff shown (click "read more" to see it all on same page).
From today evening, I'm taking one week off from work to make a sprint to try to finalize the RDFIO extension, for RDF import and export in Semantic MediaWiki.
This will be a required step for finalizing the vision described in my SMWCon fall 2011 talk, the other month.
I developed RDFIO as part of Google Summer of Code 2010, and it got into a working proof-of-concept state. Some issues, such as with performance, never were resolved though. Also it depended upon two other modules, which, after Semantic MediaWiki changed a lot of it's internals in version 1.6, have still not been updated to support these, leaving RDFIO in a state where it does not support SMW 1.6.
So, a little sprint is definitely needed, to get RDFIO in working condition again. At the same time I hope to look at it with fresh eyes, after having a lot more coding experience now, than when I coded RDFIO, after one year of quite some Java and Python development in my work at UPPMAX.
Things I plan to have a look at (or at least ponder):
I would love to get some feedback and input to the project during this intensive week, so don't hesitate to drop in at #semantic-mediawiki on irc.freenode.net (IRC chat) or in the SMW-devel mailing list! My contact options, summarized:
Looking forward to your input during this week!
After doing my GSoC project for Wikimedia foundation / Semantic MediaWiki in 2010, resulting in the RDFIO Extension, I finally could make it to the Semantic MediaWiki Conference, which was in Berlin this week.
While I write up a longer review of the many interesting talks, you can in the meantime find the slides from my talk below, on "hooking up Semantic MediaWiki with external tools" (such as Bioclipse and R):
The original use case behind the RDFIO Semantic MediaWiki Extension which I developed as part of "Google Summer of Code 2010", and which was to hook up SMW with Bioclipse, is now concretizising. By using the new Bioclipse SMW Module (code here) it is now for the first time possible to add and remove SMW facts from inside Bioclipse, using a little Bioclipse JS Script:
var wikiURL = "http://drugmet.rilspace.org/wiki/"; smw.addTriple( "w:Caffeine", "w:is_a", "w:Molecule", wikiURL );
Removing triples is similar:
var wikiURL = "http://drugmet.rilspace.org/wiki/"; smw.removeTriple( "w:Caffeine", "w:is_a", "w:Molecule", wikiURL );
Well, you can use full URI:s also, but using the "w" prefix references wiki article titles directly. Thus you can view the result of the addition at http://drugmet.rilspace.org/wiki/Caffeine
What does this mean? Well, one thing is that with Bioclipse you can edit facts in SMW with the ease and power of javascript! This could enable scenarios where an SMW gets prepopulated with data for subsequent community editing, whereafter data can be transferred back to Bioclipse again, (as blogged about by Egon Willighagen already), possibly making community editing of scientific data mainstream!
There's a little convenience method for getting all RDF data from the SMW too:
rdfStore = rdf.createInMemoryStore(); rdfStore = smw.getRDF( "http://drugmet.rilspace.org/wiki/" );
... which you can then query locally with SPARQL, using the rdf manager:
result = rdf.sparql( rdfStore, "SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 10" ) js.print( result );
...getting some output like so:
[["p"], ["http://www.w3.org/1999/02/22-rdf-syntax-ns#type"], ["http://www.w3.org/2000/01/rdf-schema#domain"], ["http://www.w3.org/2000/01/rdf-schema#range"], ["http://www.w3.org/2000/01/rdf-schema#subPropertyOf"], ["http://www.w3.org/2000/01/rdf-schema#subClassOf"], ["http://semantic-mediawiki.org/swivt/1.0#wikiPageModificationDate"], ["http://semantic-mediawiki.org/swivt/1.0#wikiNamespace"], ["http://www.w3.org/2000/01/rdf-schema#isDefinedBy"], ["http://semantic-mediawiki.org/swivt/1.0#page"], ["http://www.w3.org/2000/01/rdf-schema#label"] ]
That's it! Oh, well, I demonstrated the same thing with a screencast as well:
Yay! :) As getting this to work, we ran into a number of bugs in RDFIO, so that also resulted in a new release
Version 0.5.0 of RDFIO, the MediaWIki extension providing PHP-based SPARQL endpoint and RDF import capabilities to to Semantic MediaWiki (and previously developed as part of my GSoC 2010 project ), is now released.
The 0.5.0 release fixes numerous bugs that were encountered as me and Egon Willighagen were working to hook up SMW with Bioclipse, as I blogged about earlier. We have now got this connection up running, so hopefully we tracked down most of the relevant bugs. A shortlist of changes can be found in the changelog. Links to download plus install instructions to be found on the extension page.
I hope to blog/screencast about the new Bioclipse->SMW editing functionality shortly.
Now that the autumn is here I have some new projects starting, and running for approximately this month (before jumping out into the real world and trying to get a real job :) ). Good thing is, it all builds upon previous work.
First thing is, I'll work with Egon Willighagen to hook up Bioclipse with Semantic MediaWiki, via the RDFIO extension, which I developed as part of GSoC this year, mentored by Denny Vrandecic. I'm excited about putting RDFIO into some real world usage!
Second thing is I'll work part time at the Bioclipse group to improve the ways user documentation for plugins is authored and published, enabling some automation of publishing content to the Bioclipse website as well as the wiki etc. This will hopefully make it a lot easier for end users to find their desired extra functionality, and make more users see the value of a research platform with an open and modular structure, as is Bioclipse!
With the release of RDFIO 0.4.0, my GSoC 2010 project is now over!
I want to thank especially my mentor Denny Vrandečić, and also the SMW community at large for a great time! I also want to sincerely thank my masters project mentor Egon Willighagen who mentioned about, and encouraged me to apply to the program. Without this encouragement, I'd never taken the step. It has been a good time and rewarding, and I've much enjoyed to have time to get a bit into MediaWiki/SMW extensions development, as well as to provide some new functionality to these great bits of software.
The main GSoC coding is now over, and I will need to take a little break for an exam this friday, but surely I'll continue to refine the RDFIO extension later, especially as Egon and me are looking into using it to integrate Bioclipse with SMW in order to make RDF data in Bioclipse "Community editable" (could turn out to be some real useful stuff!).
This new release brings a lot of refactoring and reworkings under the hood, as well as quite a few minor bugfixes and improvements here and there, so upgrading is recommended. We also got the issues in SMWWriter fixed now, so patching it is no longer needed, which hopefully will make installing easier!
Of the more notable changes are the improved selection of wiki titles on import, as described in this blog post. Another important fix is that the default output format (if not specifying any) is now "SPARQL Resultset XML" which now makes the SPARQL endpoint fully "SPARQL compliant" and queryable from typical SPARQL tools like Jena. It is a remaining topic though how to allow update operations without leaving the endpoint wide open ... i.e. how to implement some form of user rights checking, when used as a webservice.
A little technical note also is that RDFIO now takes the $wgDBprefix parameter into account, so if you are using RDFIO with table prefixes in the database, you will need to regenerate the tables and the triples in the store (can be done at the Special:ARC2Admin and Special:SMWAdmin pages respectively).
I should not end without a note about some great bits of existing code that I've had the pleasure to make use of:
This week I just finished the last remaining items on my todo list for my Google Summer of Code project, (which is available in the form of the RDFIO MediaWiki extension). Those things, which I also mentioned in my last blog post were to:
Regarding the first point, it might not be overly easy to see the usefulness of it at once, so I just created a screencast to show the difference between using it and not:
It demonstrates the problem of choosing sensible wiki titles for general RDF entities in case no good property for naming is available, (such as rdfs:label etc) ... since "entity" URIs often just consist of nonsensible id:s and often no namespace prefixes are defined for them. RDFIO lets you add "pseudo" namespaces (using a simplified splitting pattern, not necessarily consistent with XMLns specs), in order to come around this problem.
Hopefully I'll find time to also demonstrate the second point above, as well as the "filter by ontology" feature for the SPARQL endpoint, with screencasts early next week.
Otherwise, the coming week I'll use for doing some refactoring of the currently quite unmanageable code, as well as add commenting, and hopefully also add the feature to filter RDF export by a [[Export RDF::false]] SMW property (which was the "it time permits" item of my TODO list).
I just created a new release of the RDFIO MediaWiki extension. A somewhat detailed list of the changes can be found in the change log. The relevant links:
New for this release is a "export by ontology" feature, that - when possible - restricts the URIs used for a wiki page to only those that appears in an ontology that the user points to. To give an idea of this feature see the following screenshot:
On the page "Samuel", I have one fact:
[[has blog::http://saml.rilspace.org]]
... and on the page "Property:has_blog", there are a number of facts, including:
[[Equivalent URI::http://xmlns.com/foaf/0.1/weblog]]
[[Equivalent URI::http://example.org/ExampleOntology/weblog]]
From the "remaining TODO list" from my last blog post, the following are finished with this release:
The remaining items ones are now:
Just to inform that I created a new release of the RDFIO MediaWiki extension. It contains important security fixes, by adding at least some basic checking of user rights and CSRFs (Cross site request forgeries) to the SPARQL endpoint, RDF import form etc. Thus, it's highly recommended to upgrade if you are using the extension on a public wiki!
Also, you might already have seen:
Otherwise, me and Denny just confirmed the remaining TODO list for my GSoC project, which is what I start working on now:
There are also some extra additions that I'll look into if time permits, like adding support for filtering the output on export with a property such as "RDF export::False", as suggested by Daniel Herzig.