Semantic MediaWiki

New Google+ communities: Bioclipse, Cheminformatics, Semantic MediaWiki

After spending countless hours on Google+, I have realized how useful its new group feature is for sharing and finding interesting stuff happening around different technologies. Its nice handling of previews of movies, images, webpages etc. makes it so much easier to spot interesting stuff. IMO it works FAR better than e.g. Twitter for this.

E.g. by subscribing to groups for the topics I am interested in (I have over 50...), I get tons of interesting stuff on my Google+ home page all the time.

While Google+ groups definitely don't replace mailing lists and IRC, which are superior for discussions, they are a great complement for sharing interesting stuff happening around a technology.

With this in mind, during the last week or so I've tried to make sure that a few of my favourite software projects and topics have groups, which resulted in a few new ones: Bioclipse, Cheminformatics and Semantic MediaWiki.

... so make sure to join the ones you like, and post some interesting stuff there! :)

SMWCon Fall 2011 impressions

(This post has been in draft for too long. Time to publish it, as is.)

The Semantic MediaWiki conference Fall 2011 in Berlin is over, so time to summarize some thoughts and impressions.

It was my first SMWCon ever. A bit late, considering that I did a Google Summer of Code project for SMW in 2010, but my finances were pretty much nonexistent back then. Happy to get the chance now, though.

Before turning to some of the individual talks, just a note on two general things I found interesting (I wish I had time to review each talk, since there was so much interesting stuff...):

  1. There was a remarkable number of talks on connecting SMW with the rest of the Semantic Web, through RDF, SPARQL etc. Cool, SMW is seemingly becoming a natural choice of platform for SemWeb publishing!
  2. The proportion of bio-people was also a bit remarkable. Apart from SNPedia founder Mike Cariaso, there were a whole bunch of others, including Salvadore from the GeneWiki project, Toni .... (and me) .... I guess it reflects how well SMW handles the need to give structure to very heterogeneous datasets, so typical of the Life Sciences.

The talks

Note that slides and videos for most of the talks are now available online:

The conference started with one tutorial day, followed by the main days for talks. The tutorials turned out to be so interesting, though, that most people seemed to attend them as well.

Below are my very brief notes/impressions on some talks that I found especially interesting, for my use cases and interests:

Very nice UIs

Daniel Hansh from OntoPrise showed off their SMW+ Community Edition package, which includes SMW, Halo and other extensions. This is quite cool stuff, with really helpful and slick UIs, so let's hope it will remain open source! :) (Slides, Video)

Performance optimizations in SMW

Markus Krötzsch talked about "Saving CO2: Top SMW Performance Issues and How to Address Them". The slides are crammed full of links to more detailed information, so I'll have to study them more closely. (Slides, Video)

SMW reworkings for better RDF support

As said, there were lots of RDF talks at the conference. One of them covered reworkings of the SMW internals to better support the RDF and SPARQL models. Markus Krötzsch gave an overview of the relationship between SMW and RDF in his talk "Connecting SMW to RDF Databases: Why, What, and How?" (Slides)

Keeping track of changes, even at the fact level!

Jeroen De Dauw presented a new extension, "Semantic Watchlist", to replace a number of extensions that are, in Jeroen's opinion, somewhat lacking. Looks very well done! (Slides, Video)

Powerful on-demand transformation of RDF data

William Smith, Christian Becker and Andreas Schultz presented some very cool stuff: "Neurowiki: How we integrated large datasets into SMW with R2R and Silk / LDIF". The LDIF framework seems to do a lot of what RDFIO does, but is much bigger and more capable. Really cool stuff! (Slides part 1, Slides part 2, Video)

SMW classes as XML

Yaron Koren presented an idea to store "classes" in SMW as XML/JSON in one single location, rather than, as now, in three different places (Category, Template and Form). (Slides, Video)

Towards a Semantic Wikipedia!

Denny Vrandečić and Daniel Kinzler presented the WikiData extension, as part of their work to make a "Semantic Wikipedia" a reality ... and based on another nice demo-project they did: Shortipedia. This is gonna be hot stuff! (Slides, Video)

Linked Data - increasingly important topic for SMW

Anja Jentzsch from the LODD presented Linked Data, and the best practices for how to publish RDF data, so that you really "get connected" to the evolving Semantic Web. An increasingly important topic, as shown by the increased interest in connecting SMW with the outside world. (Slides, Video)

RDFIO / Hooking up SMW with external tools via SPARQL

My talk ... (as blogged earlier)

More RDF to Wiki Page title mapping strategies

Michael Erdmann also presented how they do data integration with SMW+ and OntoBroker. Interestingly, they use a strategy similar to RDFIO's to produce nicer wiki page titles. Interesting! ... maybe there is a way to consolidate all these efforts in a reusable way? (Slides, Video)

Tractable reasoning

Jeff Pan talked about "Tractable Reasoning". Very interesting! They focus on "making reasoning reasonable", that is, computationally feasible ... and seem to have succeeded, too: their REL reasoner has been shown to totally outperform reasoners such as Pellet for more or less any kind of ontology, cool! (I found out earlier, though, that it's quite easy to beat Pellet with SPARQL/ARC, and especially with Prolog.) (Video)

Next steps for Semantic MediaWiki

Markus Krötzsch and Jeroen De Dauw talked about the next steps for Semantic MediaWiki. Many great things are happening: a foundation has been started, ... (Slides, Video)

Excel-like statistics in SMW

Benedikt Kaempgen demonstrated some supercool stuff, something like a pivot browser for Semantic MediaWiki, in order to get more "Excel-like" statistics in SMW. They use Jeroen's Spark extension to query SPARQL endpoints and similar sources from JavaScript. Supercool! (Slides, Video)

SMW as a semantic browser

Benedikt also showed off their Semantic Web browser, which only requires "Equivalent URIs" to be defined for pages, and then lets you browse the data in the wiki in its original format. Interesting, since this matches RDFIO perfectly: RDFIO complements it with RDF export and querying in the original format, using basically the same strategy! (Slides, Video)

SNPedia

Mike Cariaso talked about what's new with SNPedia ... lots of cool stuff (apart from how cool SNPedia is in itself!) (Video)

Lightning talks

Not to forget, there were also a whole bunch of very interesting lightning talks ... too many for me to have time to cover here. One thing you really should not miss, though, is the SPARK extension by Jeroen De Dauw, which queries SPARQL endpoints via JavaScript for visualizations. Wow! (Slides)
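Querying SPARQL endpoints from JavaScript usually means handling results in the W3C SPARQL Query Results JSON format. As a rough sketch (not Spark's actual code, which I haven't reproduced here), this is how such a result could be flattened into rows ready for a chart or table; the function name `bindingsToRows` and the sample data are made up for illustration.

```javascript
// Flatten a SPARQL Query Results JSON document into simple rows.
// The input shape (head.vars, results.bindings[].<var>.value) follows
// the W3C JSON results format; unbound variables become null.
function bindingsToRows(json) {
  const vars = json.head.vars;
  return json.results.bindings.map((binding) =>
    vars.map((v) => (binding[v] ? binding[v].value : null))
  );
}

// Hypothetical sample result with one unbound "label" in the second row
const jsonSample = {
  head: { vars: ["page", "label"] },
  results: {
    bindings: [
      {
        page: { type: "uri", value: "http://example.org/wiki/Caffeine" },
        label: { type: "literal", value: "Caffeine" },
      },
      { page: { type: "uri", value: "http://example.org/wiki/Water" } },
    ],
  },
};
console.log(bindingsToRows(jsonSample));
```

Keeping unbound variables as `null` (rather than dropping them) keeps every row aligned with `head.vars`, which is what most charting code expects.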

Also, bio-people reading this will probably find the SNPedia + GeneWiki mashup interesting! (Video)

Well, you'd better watch them all anyway: they are only 5 minutes each, and there's just too much good stuff there for me to cover it all:

My SMWCon Fall 2011 Talk now on YouTube

I blogged about it earlier, but it's better to have everything in one post, so here is the summary again:

After doing my GSoC project for the Wikimedia Foundation / Semantic MediaWiki in 2010, resulting in the RDFIO Extension, I finally made it to the Semantic MediaWiki Conference, which was in Berlin in September.

The video of my talk, "Hooking up Semantic MediaWiki with external tools via SPARQL" (such as Bioclipse and R), is now out on YouTube, so please find it below. For your convenience, the slides are below the video, along with links to the different things shown.

One week off to work on RDFIO

Starting this evening, I'm taking one week off work for a sprint to try to finalize the RDFIO extension, for RDF import and export in Semantic MediaWiki.

This is a required step towards realizing the vision described in my SMWCon Fall 2011 talk the other month.

I developed RDFIO as part of Google Summer of Code 2010, and it reached a working proof-of-concept state. Some issues, such as performance, were never resolved, though. It also depends on two other modules that have still not been updated for the many internal changes in Semantic MediaWiki 1.6, which leaves RDFIO unable to support SMW 1.6.

So, a little sprint is definitely needed to get RDFIO into working condition again. At the same time, I hope to look at it with fresh eyes: after a year of quite a lot of Java and Python development in my work at UPPMAX, I have much more coding experience now than when I wrote RDFIO.

Things I plan to have a look at (or at least ponder):

  • Look at the possibility of using the Wiki Object Model instead of the Page Object Model / SMWWriter combo (which unfortunately is no longer SMW 1.6 compliant)
  • See if/how I can make more use of the new infrastructure in SMW, summarized by Markus in this post on the SMW-devel mailing list.
  • Take an overall fresh look at the architecture of the code ... try to follow Domain Driven Design principles much better, to get clean, maintainable code.
  • Make much more use of existing MediaWiki features, such as the HTMLForm form builder class (for special pages and the like).
    (More suggestions like this highly welcome!)
  • Import ARC2 library via the ARCLibrary extension rather than with a separate import.
  • Use the existing "Equivalent URI" special property, instead of the custom "Original URI" (don't remember why I created a custom one...)
  • Run big imports as jobs?
  • OWL class import (as categories) ?
  • Allow updating Wiki articles from any connected store, by using the new SMW Internals?
  • Other things?

I would love to get some feedback and input on the project during this intensive week, so don't hesitate to drop in at #semantic-mediawiki on irc.freenode.net (IRC chat) or on the SMW-devel mailing list! My contact options, summarized:

Looking forward to your input during this week!

My SMWCon Fall 2011 Talk

After doing my GSoC project for the Wikimedia Foundation / Semantic MediaWiki in 2010, resulting in the RDFIO Extension, I finally made it to the Semantic MediaWiki Conference, which was in Berlin this week.

While I write up a longer review of the many interesting talks, you can in the meantime find the slides from my talk below, on "hooking up Semantic MediaWiki with external tools" (such as Bioclipse and R):

Links

  • For the SMW/Bioclipse Hookup there is a status update on my blog.
  • ... with a demo screencast.
  • More info on the RDFIO extension available on the Extension page
  • Code for the Bioclipse SMW module is available at github
  • Bioclipse website is at bioclipse.net
  • ... and the (SMW) Bioclipse wiki
  • The SMW/R hookup is not yet published in any journal, but this is what is available:
    • Egon Willighagen, who did it, has blogged about it.
    • Also, the rrdf package he wrote is available on CRAN, and there's a PDF available describing it.

Editing Semantic MediaWiki from Bioclipse (with Screencast)

The original use case behind the RDFIO Semantic MediaWiki extension, which I developed as part of Google Summer of Code 2010, was to hook up SMW with Bioclipse, and this is now becoming reality. Using the new Bioclipse SMW module (code here), it is for the first time possible to add and remove SMW facts from inside Bioclipse, with a small Bioclipse JavaScript script:

var wikiURL = "http://drugmet.rilspace.org/wiki/";
smw.addTriple( "w:Caffeine", "w:is_a", "w:Molecule", wikiURL );

Removing triples is similar:

var wikiURL = "http://drugmet.rilspace.org/wiki/";
smw.removeTriple( "w:Caffeine", "w:is_a", "w:Molecule", wikiURL );

You can also use full URIs, but the "w" prefix references wiki article titles directly, so you can view the result of the addition at http://drugmet.rilspace.org/wiki/Caffeine
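To make the "w" prefix concrete, here is a minimal JavaScript sketch of how a prefixed name might be expanded into a wiki page URI. The function `expandWikiName` is hypothetical, not part of the Bioclipse SMW module, and it only covers ordinary page titles: the real module may map properties such as `w:is_a` differently.

```javascript
// Hypothetical helper (not the Bioclipse SMW module's code): expand a
// "w:"-prefixed name into a full wiki page URI under the given wiki URL.
function expandWikiName(name, wikiURL) {
  const prefix = "w:";
  if (!name.startsWith(prefix)) {
    return name; // assume anything else is already a full URI
  }
  // MediaWiki URLs use underscores in place of spaces in page titles
  const title = name.slice(prefix.length).replace(/ /g, "_");
  const base = wikiURL.endsWith("/") ? wikiURL : wikiURL + "/";
  return base + title;
}

const demoWikiURL = "http://drugmet.rilspace.org/wiki/";
console.log(expandWikiName("w:Caffeine", demoWikiURL));
// prints http://drugmet.rilspace.org/wiki/Caffeine
```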

What does this mean? Well, one thing is that with Bioclipse you can edit facts in SMW with the ease and power of JavaScript! This could enable scenarios where an SMW instance is prepopulated with data for subsequent community editing, after which the data can be transferred back to Bioclipse again (as Egon Willighagen has already blogged about), possibly making community editing of scientific data mainstream!

There's a little convenience method for getting all RDF data from the wiki, too:

// Fetch all RDF data exported by the wiki into a store
// (a separately created in-memory store would just be overwritten here)
var rdfStore = smw.getRDF( "http://drugmet.rilspace.org/wiki/" );

... which you can then query locally with SPARQL, using the rdf manager:

// List up to 10 distinct predicates used in the wiki's RDF
var result = rdf.sparql( rdfStore,
                         "SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 10" );
js.print( result );

...getting some output like so:

[["p"],
["http://www.w3.org/1999/02/22-rdf-syntax-ns#type"],
["http://www.w3.org/2000/01/rdf-schema#domain"],
["http://www.w3.org/2000/01/rdf-schema#range"],
["http://www.w3.org/2000/01/rdf-schema#subPropertyOf"],
["http://www.w3.org/2000/01/rdf-schema#subClassOf"],
["http://semantic-mediawiki.org/swivt/1.0#wikiPageModificationDate"],
["http://semantic-mediawiki.org/swivt/1.0#wikiNamespace"],
["http://www.w3.org/2000/01/rdf-schema#isDefinedBy"],
["http://semantic-mediawiki.org/swivt/1.0#page"],
["http://www.w3.org/2000/01/rdf-schema#label"]
]
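The printed result is a table: a header row of variable names, followed by one row per solution. If you want to post-process it in plain JavaScript, a small sketch like the one below could turn it into objects and, for example, pick out SMW's own SWIVT predicates. It assumes the result is available as a JavaScript array of arrays shaped like the output above; what `rdf.sparql` actually returns inside Bioclipse may be a different object type, and `rowsToObjects` is a hypothetical helper.

```javascript
// Convert a header-row-first table into one object per row,
// keyed by the variable names in the header.
function rowsToObjects(result) {
  const [header, ...rows] = result;
  return rows.map((row) => {
    const obj = {};
    header.forEach((varName, i) => {
      obj[varName] = row[i];
    });
    return obj;
  });
}

// A shortened copy of the output above
const resultSample = [
  ["p"],
  ["http://www.w3.org/1999/02/22-rdf-syntax-ns#type"],
  ["http://semantic-mediawiki.org/swivt/1.0#page"],
  ["http://www.w3.org/2000/01/rdf-schema#label"],
];
const SWIVT = "http://semantic-mediawiki.org/swivt/1.0#";

// Keep only predicates from SMW's own SWIVT vocabulary
const swivtPredicates = rowsToObjects(resultSample)
  .map((row) => row.p)
  .filter((p) => p.startsWith(SWIVT));
console.log(swivtPredicates);
```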

That's it! Oh, well, I demonstrated the same thing with a screencast as well:

Yay! :) While getting this to work, we ran into a number of bugs in RDFIO, which also resulted in a new release.

RDFIO 0.5.0 released

Version 0.5.0 of RDFIO, the MediaWiki extension providing a PHP-based SPARQL endpoint and RDF import capabilities to Semantic MediaWiki (originally developed as part of my GSoC 2010 project), is now released.

The 0.5.0 release

The 0.5.0 release fixes numerous bugs that Egon Willighagen and I encountered while working to hook up SMW with Bioclipse, as I blogged about earlier. We now have this connection up and running, so hopefully we have tracked down most of the relevant bugs. A shortlist of changes can be found in the changelog. Download links and install instructions can be found on the extension page.

I hope to blog/screencast about the new Bioclipse->SMW editing functionality shortly.

Autumn is here - new projects starting

Now that autumn is here, I have some new projects starting, running for roughly this month (before jumping out into the real world and trying to get a real job :) ). The good thing is, it all builds upon previous work.

First, I'll work with Egon Willighagen to hook up Bioclipse with Semantic MediaWiki via the RDFIO extension, which I developed as part of GSoC this year, mentored by Denny Vrandečić. I'm excited about putting RDFIO to some real-world use!

Second, I'll work part-time at the Bioclipse group to improve the way user documentation for plugins is authored and published, enabling some automation of publishing content to the Bioclipse website as well as the wiki etc. This will hopefully make it a lot easier for end users to find the extra functionality they want, and make more users see the value of a research platform with an open and modular structure, such as Bioclipse!

RDFIO 0.4.0 released - GSoC Finished!

With the release of RDFIO 0.4.0, my GSoC 2010 project is now over!

I especially want to thank my mentor Denny Vrandečić, and also the SMW community at large, for a great time! I also want to sincerely thank my master's project mentor, Egon Willighagen, who told me about the program and encouraged me to apply. Without this encouragement, I would never have taken the step. It has been a good and rewarding time, and I've very much enjoyed having time to get a bit into MediaWiki/SMW extension development, as well as to provide some new functionality to these great bits of software.

The main GSoC coding is now over, and I will need to take a little break for an exam this Friday, but I'll surely continue to refine the RDFIO extension later, especially as Egon and I are looking into using it to integrate Bioclipse with SMW in order to make RDF data in Bioclipse "community editable" (could turn out to be some really useful stuff!).

The 0.4.0 release

This new release brings a lot of refactoring and reworkings under the hood, as well as quite a few minor bugfixes and improvements here and there, so upgrading is recommended. We also got the issues in SMWWriter fixed now, so patching it is no longer needed, which hopefully will make installing easier!

Among the more notable changes is the improved selection of wiki titles on import, as described in this blog post. Another important fix is that the default output format (if none is specified) is now "SPARQL Resultset XML", which makes the SPARQL endpoint fully "SPARQL compliant" and queryable from typical SPARQL tools like Jena. One remaining open question, though, is how to allow update operations without leaving the endpoint wide open ... i.e. how to implement some form of user rights checking when it is used as a web service.
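Since the default output is now SPARQL results XML, a client can talk to the endpoint with the standard SPARQL protocol, where a query is sent as the `query` parameter of an HTTP GET. The JavaScript sketch below only builds such a request; the endpoint URL is a placeholder (not necessarily where RDFIO mounts its endpoint), and update operations are deliberately left out, since access control for them is still an open question.

```javascript
// Build a SPARQL protocol GET request: the query goes in the "query"
// URL parameter, and the client asks for SPARQL results XML.
function buildSparqlRequest(endpoint, query) {
  const url = new URL(endpoint);
  url.searchParams.set("query", query); // percent/form-encodes the query
  return {
    url: url.toString(),
    headers: { Accept: "application/sparql-results+xml" },
  };
}

// Placeholder endpoint URL, for illustration only
const req = buildSparqlRequest(
  "http://example.org/wiki/Special:SPARQLEndpoint",
  "SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 10"
);
console.log(req.url);
// In a browser or Node 18+, the request could then be sent with:
// fetch(req.url, { headers: req.headers }).then((res) => res.text());
```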

A small technical note: RDFIO now takes the $wgDBprefix parameter into account, so if you are using RDFIO with table prefixes in the database, you will need to regenerate the tables and the triples in the store (this can be done on the Special:ARC2Admin and Special:SMWAdmin pages, respectively).

I should not end without a note about some great bits of existing code that I've had the pleasure to make use of:

  • ARC, the PHP RDF library I've been using, and on which RDFIO depends very heavily. I found it very powerful and super convenient to work with!
  • I enjoyed making use of the SMWWriter and PageObjectModel extensions, which also definitely made life easier for me and saved me tons of work.

Sensible wiki titles on RDF import with "pseudo RDF namespaces"

This week I finished the last remaining items on the todo list for my Google Summer of Code project (which is available in the form of the RDFIO MediaWiki extension). Those items, which I also mentioned in my last blog post, were to:

  • Add ability to use ("pseudo") namespaces for general RDF entities (non-properties) in order to choose wiki titles for them on RDF import.
  • Add a screen that shows URIs lacking a namespace prefix to abbreviate it with.

Regarding the first point, its usefulness might not be obvious at first, so I created a screencast showing the difference between using it and not:

It demonstrates the problem of choosing sensible wiki titles for general RDF entities when no good property for naming (such as rdfs:label) is available ... since "entity" URIs often consist of meaningless IDs, and often no namespace prefixes are defined for them. RDFIO lets you add "pseudo" namespaces (using a simplified splitting pattern, not necessarily consistent with the XML namespaces spec) to get around this problem.
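As an illustration of the idea, here is a minimal JavaScript sketch of pseudo-namespace abbreviation, plus a helper that lists URIs no prefix covers (the kind of list the "URIs lacking a namespace prefix" screen is meant to show). It is not RDFIO's PHP code: the prefix map, the plain string-prefix matching, the longest-match rule and the "prefix:localpart" title format are all assumptions made for the example.

```javascript
// Abbreviate a URI using user-defined "pseudo namespaces".
// Matching is a plain string-prefix test (no XML-namespace validation),
// and the longest matching base URI wins.
function abbreviate(uri, pseudoNamespaces) {
  const entries = Object.entries(pseudoNamespaces).sort(
    (a, b) => b[1].length - a[1].length // longest base first
  );
  for (const [prefix, base] of entries) {
    if (uri.startsWith(base) && uri.length > base.length) {
      return prefix + ":" + uri.slice(base.length);
    }
  }
  return null; // no pseudo namespace covers this URI
}

// List the URIs that no pseudo namespace covers.
function unprefixedUris(uris, pseudoNamespaces) {
  return uris.filter((uri) => abbreviate(uri, pseudoNamespaces) === null);
}

// Hypothetical pseudo namespace for entities identified by numeric IDs
const demoNamespaces = { drug: "http://example.org/db/drugs/" };
console.log(abbreviate("http://example.org/db/drugs/12345", demoNamespaces));
// prints drug:12345
```

Preferring the longest base URI means a more specific pseudo namespace (say, one for a sub-path) takes precedence over a broader one defined for the same site.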

  • The new functionality is so far only available in the svn trunk
  • More info, install instructions etc on the extension page

Hopefully I'll find time to also demonstrate the second point above, as well as the "filter by ontology" feature for the SPARQL endpoint, with screencasts early next week.

Otherwise, I'll use the coming week to do some refactoring of the currently quite unmanageable code, add comments, and hopefully also add the feature to filter the RDF export by an [[Export RDF::false]] SMW property (which was the "if time permits" item on my TODO list).
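The planned export filter is simple to state: leave out everything about pages annotated with [[Export RDF::false]]. A minimal JavaScript sketch of that rule might look like the following. The property URI `EXPORT_RDF`, the `{s, p, o}` triple shape and the string value "false" are all hypothetical; a real implementation would read the annotation from SMW's own data.

```javascript
// Hypothetical URI standing in for the "Export RDF" SMW property
const EXPORT_RDF = "http://example.org/wiki/Property:Export_RDF";

// Drop all triples about pages annotated with [[Export RDF::false]].
// Triples are assumed to be plain {s, p, o} objects.
function filterExport(triples) {
  // Collect the subjects that opt out of export
  const hidden = new Set(
    triples
      .filter((t) => t.p === EXPORT_RDF && t.o === "false")
      .map((t) => t.s)
  );
  // The opt-out annotation itself is dropped too, since its subject is hidden
  return triples.filter((t) => !hidden.has(t.s));
}

const demoTriples = [
  { s: "w:Caffeine", p: "w:is_a", o: "w:Molecule" },
  { s: "w:Draft", p: "w:is_a", o: "w:Molecule" },
  { s: "w:Draft", p: EXPORT_RDF, o: "false" },
];
console.log(filterExport(demoTriples).length); // 1
```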