The Semantic MediaWiki conference Fall 2011 in Berlin is over, so time to summarize some thoughts and impressions.
It was my first SMWCon ever. A bit late, considering that I did a Google Summer of Code project for SMW in 2010, but my finances were pretty much nonexistent then. Happy to get the chance now, though.
Before turning to some of the individual talks, just a note on two general things I found interesting (I wish I had time to review every talk, since there was so much interesting stuff...):
Note that you can now find slides and videos for most of the talks online:
The conference started with one tutorial day, followed by the main days for talks. The tutorials turned out to be so interesting, though, that most people seemed to attend them as well.
Find below my very brief notes/impressions on some talks that I found especially interesting, for my use cases and interests:
Daniel Hansch from OntoPrise showed off their SMW+ Community Edition package, which includes SMW, Halo and other extensions. This is quite cool stuff, with really helpful and slick UIs, so let's hope it will remain open source! :) (Slides, Video)
Markus Krötzsch talked on "Saving CO2: Top SMW Performance Issues and How to Address Them". The slides are crammed full of links to more detailed information, so I'll have to study them more closely. (Slides, Video)
As mentioned, there were lots of RDF talks at the conference. One topic was the reworking of the SMW internals to better support the RDF and SPARQL models. Markus Krötzsch gave an overview of the relationship between SMW and RDF in his talk "Connecting SMW to RDF Databases: Why, What, and How?" (Slides)
Jeroen De Dauw presented a new extension, "Semantic Watchlist", to replace a number of (in Jeroen's opinion, somewhat lacking) extensions. Looks very well done! (Slides, Video)
William Smith, Christian Becker and Andreas Schultz presented some very cool stuff: "Neurowiki: How we integrated large datasets into SMW with R2R and Silk / LDIF". The LDIF framework seems to do a lot of what RDFIO does, but in a much bigger and more capable framework. Really cool stuff! (Slides 1, Slides part 2, Video)
Yaron Koren presented an idea to store "classes" in SMW as XML/JSON in one single location, rather than as now, in three different places (Category, Template and Form). (Slides, Video)
Denny Vrandečić and Daniel Kinzler presented the WikiData extension, as part of their work to make a "Semantic Wikipedia" a reality ... and based on another nice demo-project they did: Shortipedia. This is gonna be hot stuff! (Slides, Video)
Anja Jentzsch from the LODD presented Linked Data, and the best practices for how to publish RDF data, so that you really "get connected" to the evolving Semantic Web. An increasingly important topic, as shown by the increased interest in connecting SMW with the outside world. (Slides, Video)
My talk ... (as blogged earlier)
Michael Erdmann also presented how they do data integration with SMW+ and OntoBroker. Interestingly, they use a strategy similar to RDFIO's to make wiki page titles nicer. Interesting! ... maybe there is a way to consolidate all these efforts in a reusable way? (Slides, Video)
Jeff Pan talked on "Tractable Reasoning". Very interesting! They focus on "making reasoning reasonable", that is, computationally feasible ... and seem to have succeeded as well: their REL reasoner has been shown to totally outperform reasoners such as Pellet for more or less any kind of ontology, cool! (I found out earlier that it's quite easy to beat Pellet, though, with SPARQL/ARC, and especially with Prolog.) (Video)
Markus Krötzsch and Jeroen De Dauw talked about the next steps for Semantic MediaWiki. Many great things happening: Foundation started, ... (Slides, Video)
Benedikt Kaempgen demonstrated some supercool stuff, something like a pivot browser for Semantic MediaWiki, in order to get more "Excel-like" statistics in SMW. They use the Spark extension by Jeroen to query SPARQL endpoints and similar stuff from JavaScript. Supercool! (Slides, Video)
Benedikt also showed off their Semantic Web browser, which only requires "Equivalent URIs" to be defined for pages, and then lets you browse the data in the wiki in its original format. Interesting, since that matches perfectly with RDFIO: RDFIO complements this with RDF export and querying in the original format, using basically the same strategy! (Slides, Video)
Mike Cariaso talked about what's new with SNPedia ... lots of cool stuff (apart from how cool SNPedia is just in itself!) (Video)
Not to forget, there were also a whole bunch of very interesting lightning talks ... too many for me to have time to cover here now. One thing you really should not miss, though, is the SPARK extension by Jeroen De Dauw, to query SPARQL endpoints via JavaScript, for visualizations. Wow! (Slides).
Also, for you bio people reading this: you'll probably find the SNPedia + GeneWiki mashup interesting! (Video)
Well, you better watch them all anyway, they are only 5 minutes each, and there's just too much good stuff there, so I can't cover it all:
After doing my GSoC project for the Wikimedia Foundation / Semantic MediaWiki in 2010, resulting in the RDFIO Extension, I could finally make it to the Semantic MediaWiki Conference, which was in Berlin in September.
Now, the video of my talk, "hooking up Semantic MediaWiki with external tools via SPARQL" (such as Bioclipse and R), is out on YouTube, so please find it below. For your convenience, you can find the slides below the video, as well as the relevant links to the different stuff shown (click "read more" to see it all on the same page).
Starting this evening, I'm taking one week off from work for a sprint to try to finalize the RDFIO extension, for RDF import and export in Semantic MediaWiki.
This will be a required step for finalizing the vision described in my SMWCon fall 2011 talk, the other month.
I developed RDFIO as part of Google Summer of Code 2010, and it got into a working proof-of-concept state. Some issues, such as performance, were never resolved, though. It also depended on two other modules which have still not been updated after Semantic MediaWiki changed a lot of its internals in version 1.6, leaving RDFIO in a state where it does not support SMW 1.6.
So, a little sprint is definitely needed to get RDFIO into working condition again. At the same time, I hope to look at it with fresh eyes, since I have a lot more coding experience now than when I wrote RDFIO, after a year of quite some Java and Python development in my work at UPPMAX.
Things I plan to have a look at (or at least ponder):
I would love to get some feedback and input on the project during this intensive week, so don't hesitate to drop in at #semantic-mediawiki (IRC chat) or on the SMW-devel mailing list! My contact options, summarized:
Looking forward to your input during this week!
The original use case behind the RDFIO Semantic MediaWiki Extension, which I developed as part of "Google Summer of Code 2010", was to hook up SMW with Bioclipse, and it is now taking concrete shape. By using the new Bioclipse SMW Module (code here) it is now for the first time possible to add and remove SMW facts from inside Bioclipse, using a little Bioclipse JS script:
var wikiURL = "";
smw.addTriple( "w:Caffeine", "w:is_a", "w:Molecule", wikiURL );
Removing triples is similar:
var wikiURL = "";
smw.removeTriple( "w:Caffeine", "w:is_a", "w:Molecule", wikiURL );
Well, you can use full URIs too, but the "w" prefix references wiki article titles directly. Thus you can view the result of the addition at
What does this mean? Well, one thing is that with Bioclipse you can edit facts in SMW with the ease and power of JavaScript! This could enable scenarios where an SMW gets prepopulated with data for subsequent community editing, after which the data can be transferred back to Bioclipse again (as blogged about by Egon Willighagen already), possibly making community editing of scientific data mainstream!
There's a little convenience method for getting all RDF data from the SMW too:
rdfStore = rdf.createInMemoryStore();
rdfStore = smw.getRDF( "" );
... which you can then query locally with SPARQL, using the rdf manager:
result = rdf.sparql( rdfStore, "SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 10" );
js.print( result );
...getting some output like so:
[["p"], [""], [""], [""], [""], [""], [""], [""], [""], [""], [""] ]
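If the result is available as a plain nested array shaped like the printout above (first row holding the variable names, the remaining rows holding the values), it could be turned into a list of objects keyed by variable name. This is only a sketch under that assumption: rowsToBindings is a hypothetical helper, not part of Bioclipse, and the example URIs are made up.

```javascript
// Hypothetical helper, not part of Bioclipse: converts a header-first
// table such as [["p"], ["..."], ...] into objects keyed by variable name.
function rowsToBindings(table) {
  if (table.length === 0) {
    return [];
  }
  var header = table[0];
  return table.slice(1).map(function (row) {
    var binding = {};
    header.forEach(function (name, i) {
      binding[name] = row[i];
    });
    return binding;
  });
}

// Made-up example data in the same shape as the output above.
var exampleResult = [
  ["p"],
  ["http://example.org/wiki/Property:Is_a"],
  ["http://example.org/wiki/Property:Has_mass"]
];
var bindings = rowsToBindings(exampleResult);
```

Each entry in bindings then exposes the value for a variable by name, for example bindings[0].p.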
That's it! Oh, well, I demonstrated the same thing with a screencast as well:
Yay! :) While getting this to work, we ran into a number of bugs in RDFIO, which also resulted in a new release.
Version 0.5.0 of RDFIO, the MediaWiki extension providing a PHP-based SPARQL endpoint and RDF import capabilities to Semantic MediaWiki (and previously developed as part of my GSoC 2010 project), is now released.
The 0.5.0 release fixes numerous bugs that were encountered while Egon Willighagen and I were working to hook up SMW with Bioclipse, as I blogged about earlier. We have now got this connection up and running, so hopefully we tracked down most of the relevant bugs. A shortlist of changes can be found in the changelog. Download links and install instructions can be found on the extension page.
I hope to blog/screencast about the new Bioclipse->SMW editing functionality shortly.
Now that the autumn is here I have some new projects starting, and running for approximately this month (before jumping out into the real world and trying to get a real job :) ). Good thing is, it all builds upon previous work.
First thing is, I'll work with Egon Willighagen to hook up Bioclipse with Semantic MediaWiki, via the RDFIO extension, which I developed as part of GSoC this year, mentored by Denny Vrandečić. I'm excited about putting RDFIO into some real-world usage!
Second thing is, I'll work part time at the Bioclipse group to improve the way user documentation for plugins is authored and published, enabling some automation of publishing content to the Bioclipse website as well as the wiki etc. This will hopefully make it a lot easier for end users to find the extra functionality they want, and make more users see the value of a research platform with an open and modular structure, such as Bioclipse!
With the release of RDFIO 0.4.0, my GSoC 2010 project is now over!
I especially want to thank my mentor Denny Vrandečić, and also the SMW community at large, for a great time! I also want to sincerely thank my master's project mentor Egon Willighagen, who told me about the program and encouraged me to apply. Without this encouragement, I'd never have taken the step. It has been a good and rewarding time, and I've much enjoyed having time to get a bit into MediaWiki/SMW extension development, as well as providing some new functionality to these great bits of software.
The main GSoC coding is now over, and I will need to take a little break for an exam this Friday, but surely I'll continue to refine the RDFIO extension later, especially as Egon and I are looking into using it to integrate Bioclipse with SMW in order to make RDF data in Bioclipse "community editable" (could turn out to be some really useful stuff!).
This new release brings a lot of refactoring and reworkings under the hood, as well as quite a few minor bugfixes and improvements here and there, so upgrading is recommended. We also got the issues in SMWWriter fixed now, so patching it is no longer needed, which hopefully will make installing easier!
Among the more notable changes is the improved selection of wiki titles on import, as described in this blog post. Another important fix is that the default output format (if none is specified) is now "SPARQL Resultset XML", which makes the SPARQL endpoint fully "SPARQL compliant" and queryable from typical SPARQL tools like Jena. It remains an open question, though, how to allow update operations without leaving the endpoint wide open ... i.e. how to implement some form of user rights checking when it is used as a webservice.
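Since the endpoint answers with standard SPARQL result XML by default, a generic client mainly needs to send the query in the query parameter defined by the SPARQL protocol. Here is a minimal sketch of building such a request URL. The endpoint address is a made-up placeholder, and buildSparqlQueryUrl is a hypothetical helper, not part of RDFIO.

```javascript
// Hypothetical helper: builds a SPARQL protocol GET URL by appending the
// URL-encoded query as the "query" parameter.
function buildSparqlQueryUrl(endpointUrl, sparqlQuery) {
  // Use "&" if the endpoint URL already carries a query string.
  var separator = endpointUrl.indexOf("?") === -1 ? "?" : "&";
  return endpointUrl + separator + "query=" + encodeURIComponent(sparqlQuery);
}

// Placeholder endpoint address, an assumption for illustration only.
var exampleEndpoint = "http://example.org/sparql";
var exampleQuery = "SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 10";
var exampleUrl = buildSparqlQueryUrl(exampleEndpoint, exampleQuery);
```

The resulting URL can be fetched with any HTTP client; no RDFIO-specific parameters are assumed here.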
A little technical note also is that RDFIO now takes the $wgDBprefix parameter into account, so if you are using RDFIO with table prefixes in the database, you will need to regenerate the tables and the triples in the store (can be done at the Special:ARC2Admin and Special:SMWAdmin pages respectively).
I should not end without a note about some great bits of existing code that I've had the pleasure to make use of:
This week I finished the last remaining items on my todo list for my Google Summer of Code project (which is available in the form of the RDFIO MediaWiki extension). Those things, which I also mentioned in my last blog post, were to:
Regarding the first point, it might not be overly easy to see the usefulness of it at once, so I just created a screencast to show the difference between using it and not:
It demonstrates the problem of choosing sensible wiki titles for general RDF entities when no good property for naming is available (such as rdfs:label etc.) ... since "entity" URIs often just consist of nonsensical IDs, and often no namespace prefixes are defined for them. RDFIO lets you add "pseudo" namespaces (using a simplified splitting pattern, not necessarily consistent with XMLns specs) in order to get around this problem.
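To illustrate the idea of pseudo namespaces, here is a rough sketch of how a wiki title could be derived from a URI. It is not RDFIO's actual code: the mapping, the titleFromUri function and the example URIs are all assumptions made for illustration.

```javascript
// Hypothetical mapping from URI base strings to pseudo-namespace prefixes.
var pseudoNamespaces = {
  "http://example.org/compounds/": "cmpd"
};

// Hypothetical helper: derives a title from a URI using the mapping above.
function titleFromUri(uri, namespaces) {
  // Prefer the longest matching base so nested bases resolve predictably.
  var bestBase = "";
  Object.keys(namespaces).forEach(function (base) {
    if (uri.indexOf(base) === 0 && base.length > bestBase.length) {
      bestBase = base;
    }
  });
  if (bestBase !== "") {
    return namespaces[bestBase] + ":" + uri.slice(bestBase.length);
  }
  // Fallback: use whatever follows the last "#" or "/".
  var cut = Math.max(uri.lastIndexOf("#"), uri.lastIndexOf("/"));
  return uri.slice(cut + 1);
}

var withPrefix = titleFromUri("http://example.org/compounds/C00042", pseudoNamespaces);
var withoutPrefix = titleFromUri("http://other.example/data#x17", pseudoNamespaces);
```

With a pseudo namespace defined, the otherwise meaningless local ID at least gets a recognizable prefix; without one, only the bare ID is left as the title.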
Hopefully I'll find time to also demonstrate the second point above, as well as the "filter by ontology" feature for the SPARQL endpoint, with screencasts early next week.
Otherwise, I'll use the coming week for some refactoring of the currently quite unmanageable code, as well as adding comments, and hopefully also adding the feature to filter RDF export by a [[Export RDF::false]] SMW property (which was the "if time permits" item of my TODO list).
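The planned export filter could work roughly like the following sketch, which drops all triples whose subject page carries the property Export RDF with the value false. The triple representation, the filterExportable function and the example pages are assumptions for illustration, not the implementation in RDFIO.

```javascript
// Hypothetical helper: removes all triples belonging to subjects that are
// marked with the property "Export RDF" set to "false".
function filterExportable(triples) {
  var excluded = {};
  triples.forEach(function (t) {
    if (t.p === "Export RDF" && t.o === "false") {
      excluded[t.s] = true;
    }
  });
  return triples.filter(function (t) {
    return !Object.prototype.hasOwnProperty.call(excluded, t.s);
  });
}

// Made-up example triples: "Draft page" opts out of the export.
var exampleTriples = [
  { s: "Caffeine", p: "is_a", o: "Molecule" },
  { s: "Draft page", p: "Export RDF", o: "false" },
  { s: "Draft page", p: "is_a", o: "Molecule" }
];
var exportable = filterExportable(exampleTriples);
```

Only the Caffeine triple remains in exportable, since both triples about the opted-out page are removed.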