planet bioclipse

New Google+ communities: Bioclipse, Cheminformatics, Semantic MediaWiki

After spending countless hours on Google+, I have realized how useful its new group feature is for sharing and finding interesting stuff happening around different technologies. Its nice handling of previews of movies, images, web pages etc. makes it so much easier to spot interesting stuff. IMO it works FAR better than e.g. Twitter for this.

By subscribing to groups for the topics you are interested in (I have over 50...), you get tons of interesting stuff on your Google+ home page all the time.

While Google+ groups definitely don't replace mailing lists and IRC, which are superior for discussions, they are a great complement for sharing interesting stuff happening around a technology.

With this in mind, during the last week or so, I've tried to make sure that a few of my favourite software projects and topics have groups, which resulted in a few new ones:

... so make sure to join the ones you like, and post some interesting stuff there! :)

My SMWCon Fall 2011 Talk now on YouTube

I blogged about it earlier, but it is better to have everything in one post, so here is the summary again:

After doing my GSoC project for the Wikimedia Foundation / Semantic MediaWiki in 2010, resulting in the RDFIO Extension, I could finally make it to the Semantic MediaWiki Conference, which was in Berlin in September.

Now the video of my talk, "hooking up Semantic MediaWiki with external tools via SPARQL" (such as Bioclipse and R), is out on YouTube, so please find it below. For your convenience, the slides are below the video, along with the relevant links to the different things shown (click "read more" to see it all on the same page).

My SMWCon Fall 2011 Talk

After doing my GSoC project for the Wikimedia Foundation / Semantic MediaWiki in 2010, resulting in the RDFIO Extension, I could finally make it to the Semantic MediaWiki Conference, which was in Berlin this week.

While I write up a longer review of the many interesting talks, you can in the meantime find the slides from my talk below, on "hooking up Semantic MediaWiki with external tools" (such as Bioclipse and R):

Links

  • For the SMW/Bioclipse Hookup there is a status update on my blog.
  • ... with a demo screencast.
  • More info on the RDFIO extension is available on the Extension page.
  • Code for the Bioclipse SMW module is available at GitHub.
  • Bioclipse website is at bioclipse.net
  • ... and the (SMW) Bioclipse wiki
  • The SMW/R hookup is not yet published in any journal, but this is what is available:
    • Egon Willighagen, who did it, has blogged it.
    • Also, the rrdf package he wrote is available on CRAN, and there's a PDF available describing it.

HPC Client Screencast: Experimental Job Config Wizard

My work at UPPMAX on the Bioclipse-based HPC Client is progressing, slowly but steadily. I just screencasted an experimental version of the job configuration wizard, which loads command line tool definitions from the Galaxy workbench and uses them to generate a GUI for configuring the parameters of the command line tool in question, as well as the parameters for the Slurm resource manager (used at UPPMAX). Have a look if you want :) :

The Wizard obviously has quite some rough edges still. My current TODO is as follows:

  • Set sensible default values in widgets (e.g. when there is just one alternative)
  • Use checkboxes and radio buttons for select fields with few options
  • Show a progress bar between wizard pages that take time to load
  • Decide how to take care of the Cheetah #if/#else/#endif syntax, used in some Galaxy tool config files.
  • Add validation
  • Use a time widget for the job time
  • Add a custom view with just a "connect" button, and showing only remote files for the configured host.
  • More modular loading of modules (hierarchical etc.)
  • More advanced parsing of options (e.g. allowing params to be omitted, rather than just saying "no" to them).

Etc ... More suggestions? :)
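
As a rough illustration of the last step of such a wizard, here is a minimal sketch of turning the collected values into a Slurm batch script. The function name, the field names on the job object and the example values are all made up for this sketch and are not taken from the HPC Client code; only the sbatch options themselves (-A, -p, -n, -t, -J) are standard Slurm.

```javascript
// Hypothetical helper (not from the HPC Client): build the text of a
// Slurm batch script from values collected in a job config wizard.
function buildSbatchScript(job) {
  var lines = ["#!/bin/bash"];
  // Standard sbatch options: account, partition, task count, time limit, job name
  lines.push("#SBATCH -A " + job.account);
  lines.push("#SBATCH -p " + job.partition);
  lines.push("#SBATCH -n " + job.tasks);
  lines.push("#SBATCH -t " + job.time);
  lines.push("#SBATCH -J " + job.name);
  lines.push("");
  // The command line is the tool name followed by its parameters
  lines.push([job.command].concat(job.args).join(" "));
  return lines.join("\n");
}

// Example values are placeholders, not real UPPMAX settings
var script = buildSbatchScript({
  account: "my_project",
  partition: "core",
  tasks: 1,
  time: "01:00:00",
  name: "example_job",
  command: "mytool",
  args: ["--input", "reads.fastq"]
});
```

Note that the sketch does no shell quoting of the arguments, which a real wizard would need for file paths containing spaces.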

FIMS Project Status update - Thinking about CLI wrapper XML formats

Time for a slightly more overview-like status update on the Bioclipse HPC Client part of the FIMS Project I'm working on for UPPNEX at UPPMAX.

Right now I'm hacking away on the batch job config wizard (just fixed a "select remote file" dialog, for file path parameters, which I actually even screencasted :=)).

Otherwise, I'm getting to the stage where I need to make a decision about command line tool wrapper formats.

So far I've tried to use the Galaxy (bioinformatics portal) XML format, hoping to take advantage of the vast number of already wrapped bioinformatics tools (actually, I'm using the format now: that is what drives the wizard in the Bioclipse screencast above).

Unfortunately though, I figured out that most (if not all?) tool configs in Galaxy do not wrap the command line tool itself, but rather an accompanying script file (Python/Bash/Perl) that does some additional stuff (different for each tool), which makes it hard to use the tool configs right away outside the Galaxy platform.

So, realizing this, I'm considering whether we should go for something even more general, for this light-weight batch config wizard (which is not trying to be a complete replacement for Galaxy).

I actually just got to know about another such format (via a question on Stack Overflow). Apparently the docbook package contains such a format! So if it turns out that there are lots of ready-made DocBook definitions for a bunch of CLI tools already, then this could be quite interesting ... so that's something I'll look into now. Otherwise, I might as well stick with the Galaxy format (I have to admit, though, that the DocBook one feels like the more general choice ... or what do you think?).

Then one could of course also have converters between the DocBook and Galaxy XML formats ... that should be pretty straightforward with XSLT.

Ok, so that's where I am, and what I'm thinking about right now! Feel free to drop a line of feedback!

Got "select remote file" to work in the Cluster Job Config Wizard

As hinted at in this blog post, I'm working on a cluster batch job configuration wizard for Bioclipse, making heavy use of the TM / RSE components for Eclipse.

In the wizard, I of course wanted to be able to fire up a file selection dialog for filling in fields for file paths on the cluster. The only problem was that the RSE API is (as is usual for Java projects) not of the simplest kind, and as a newcomer I found it a bit challenging to figure out where to start.

As pointed out in this short blog post, I finally found the (simple) solution, and here we go (the final code needed some additions though, but in principle it was simple):

Some more info can be found at the project's wiki page.

Opening a remote file selection dialog with the RSE for Eclipse

This was easier than expected. Helped by the RSE File UI API Docs and this forum post, I figured out how to do it:

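SystemRemoteFileDialog dialog = new SystemRemoteFileDialog(SystemBasePlugin.getActiveWorkbenchShell());
dialog.open();

// The selection may not be an IRemoteFile (e.g. if the dialog was cancelled), so check before casting
Object selected = dialog.getSelectedObject();
if (selected instanceof IRemoteFile) {
    IRemoteFile file = (IRemoteFile) selected;
    System.out.println("Selected file's absolute path: " + file.getAbsolutePath());
}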

Now also committed!

Update: Using proper interface for dealing with remote files (commit).

A graphical client for running bioinformatics tools on HPC clusters

This blog has been silent for a while and someone might wonder what I've been doing.

One answer is: developing a graphical client that lets users without Linux experience connect securely to a computer cluster and configure batch jobs for common bioinformatics software. The project is financed within the UPPNEX project, so the focus is foremost on analysis of Next Generation Sequencing data, but the client will be fully usable with any software installed on the cluster.

The client meets a rising need in the next generation sequencing community, since biologists generally have far less experience with *nix systems and programming than, say, physicists, while the vast amounts of sequencing data increase the need to use large scale computing resources such as the ones provided in the UPPNEX project.

I demonstrated a proof-of-concept version at the UPPNEX-SciLifeLab bioinformatics forum on Feb 22nd, and the slides are now available:

As can be seen in the slides, the client is based on the very capable Bioclipse platform.

Looping over a List<String> in Bioclipse's JS console

I had some trouble finding out how to loop over the results of a manager method in Bioclipse that returns a List<String> to Bioclipse's JavaScript console. Since I didn't find it documented anywhere (probably it is, somewhere?), I wanted to document the snippet here:

var strings = myManager.methodReturningListOfStrings(someParams);
for (var i=0; i<strings.size(); i++) {
  js.say(strings.get(i));
}
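
If you would rather work with an ordinary JavaScript array afterwards, the same size()/get(i) loop can be wrapped in a small helper. This is only a sketch: toJsArray is a name I made up, and fakeList is a plain JavaScript stand-in for the manager result so that the snippet runs outside Bioclipse too.

```javascript
// Copy anything with Java-style size()/get(i) methods into a JavaScript array.
function toJsArray(javaList) {
  var result = [];
  for (var i = 0; i < javaList.size(); i++) {
    // String(...) makes sure each entry ends up as a JavaScript string
    result.push(String(javaList.get(i)));
  }
  return result;
}

// Stand-in for a manager result (an assumption, only here to make the sketch runnable)
var fakeList = {
  items: ["a", "b", "c"],
  size: function () { return this.items.length; },
  get: function (i) { return this.items[i]; }
};

var names = toJsArray(fakeList);
```

Inside Bioclipse, the idea would be to pass the manager's return value to toJsArray instead of fakeList.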

Editing Semantic MediaWiki from Bioclipse (with Screencast)

The original use case behind the RDFIO Semantic MediaWiki Extension, which I developed as part of "Google Summer of Code 2010", was to hook up SMW with Bioclipse, and that is now becoming concrete. By using the new Bioclipse SMW Module (code here), it is now for the first time possible to add and remove SMW facts from inside Bioclipse, using a little Bioclipse JS script:

var wikiURL = "http://drugmet.rilspace.org/wiki/";
smw.addTriple( "w:Caffeine", "w:is_a", "w:Molecule", wikiURL );

Removing triples is similar:

var wikiURL = "http://drugmet.rilspace.org/wiki/";
smw.removeTriple( "w:Caffeine", "w:is_a", "w:Molecule", wikiURL );

Well, you can use full URIs also, but using the "w" prefix references wiki article titles directly. Thus you can view the result of the addition at http://drugmet.rilspace.org/wiki/Caffeine
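
To make the relation between a "w:" name and its wiki page explicit, here is a minimal sketch of a helper that builds the page URL to look at after adding a triple. The function wikiPageUrl is hypothetical and only mirrors the Caffeine example above; it is not how the SMW module resolves names internally.

```javascript
// Hypothetical helper: map a "w:"-prefixed name to the URL of its wiki page.
function wikiPageUrl(prefixedName, wikiURL) {
  var prefix = "w:";
  if (prefixedName.indexOf(prefix) !== 0) {
    // No "w:" prefix: assume it is a full URI already and leave it untouched
    return prefixedName;
  }
  var title = prefixedName.substring(prefix.length);
  // Escape characters that are not safe in a URL path segment
  return wikiURL + encodeURIComponent(title);
}

var url = wikiPageUrl("w:Caffeine", "http://drugmet.rilspace.org/wiki/");
```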

What does this mean? Well, one thing is that with Bioclipse you can edit facts in SMW with the ease and power of JavaScript! This could enable scenarios where an SMW gets prepopulated with data for subsequent community editing, after which the data can be transferred back to Bioclipse again (as blogged about by Egon Willighagen already), possibly making community editing of scientific data mainstream!

There's a little convenience method for getting all RDF data from the SMW too:

rdfStore = rdf.createInMemoryStore();
rdfStore = smw.getRDF( "http://drugmet.rilspace.org/wiki/" );

... which you can then query locally with SPARQL, using the rdf manager:

result = rdf.sparql( rdfStore, 
                     "SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 10" )
js.print( result );

...getting some output like so:

[["p"],
["http://www.w3.org/1999/02/22-rdf-syntax-ns#type"],
["http://www.w3.org/2000/01/rdf-schema#domain"],
["http://www.w3.org/2000/01/rdf-schema#range"],
["http://www.w3.org/2000/01/rdf-schema#subPropertyOf"],
["http://www.w3.org/2000/01/rdf-schema#subClassOf"],
["http://semantic-mediawiki.org/swivt/1.0#wikiPageModificationDate"],
["http://semantic-mediawiki.org/swivt/1.0#wikiNamespace"],
["http://www.w3.org/2000/01/rdf-schema#isDefinedBy"],
["http://semantic-mediawiki.org/swivt/1.0#page"],
["http://www.w3.org/2000/01/rdf-schema#label"]
]
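
The printed result is a table with the variable names in the first row and one row per solution. Assuming it can be read as a JavaScript array of arrays (an assumption on my part: inside Bioclipse the object returned by rdf.sparql may need converting first), a small hypothetical helper can pull out the values of one column:

```javascript
// Hypothetical helper: collect the values of one column from a result table
// whose first row holds the variable names.
function columnValues(rows, columnName) {
  var index = rows[0].indexOf(columnName);
  if (index === -1) {
    return []; // no such variable in the header row
  }
  var values = [];
  for (var i = 1; i < rows.length; i++) {
    values.push(rows[i][index]);
  }
  return values;
}

// Two rows copied from the output above, used as sample data
var sample = [
  ["p"],
  ["http://www.w3.org/1999/02/22-rdf-syntax-ns#type"],
  ["http://www.w3.org/2000/01/rdf-schema#label"]
];

var predicates = columnValues(sample, "p");
```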

That's it! Oh, well, I demonstrated the same thing with a screencast as well:

Yay! :) While getting this to work, we ran into a number of bugs in RDFIO, so that also resulted in a new release.