Calling Java from Python without the JVM startup latency: NailGun and JPype

Even though Python is a great platform with tons of libraries available, a whole universe of functionality is still available only in the Java world, so you will sometimes need to bridge the two worlds.

The easiest way to do this is a simple subprocess call that invokes the Java code via a system call. This does not need any external libraries at all. The downside is the startup time of the Java Virtual Machine (JVM), which is at least around 100 milliseconds even for a simple hello-world type of program.

What to do then?

How to force update of the layout of an Eclipse JFace Wizard?

Add this to its onEnterPage() method:

getShell().layout(true, true);

For more info, see this StackOverflow question:

Looping over ArrayList in Java

for (String str : someStringArrayList) {
    System.out.println("String: " + str);
}
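
Besides the enhanced for loop, the same list can be walked by index, or with an explicit Iterator when elements have to be removed while looping. A minimal sketch; the class name LoopDemo, its helper methods and the sample values are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LoopDemo {
    // Index-based loop: handy when the position of each element matters.
    public static String joinWithIndex(List<String> items) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(i).append('=').append(items.get(i));
        }
        return sb.toString();
    }

    // Removes empty strings in place. Iterator.remove() avoids the
    // ConcurrentModificationException that calling list.remove() inside
    // a for-each loop can trigger.
    public static void dropEmpty(List<String> items) {
        for (Iterator<String> it = items.iterator(); it.hasNext();) {
            if (it.next().isEmpty()) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<String> someStringArrayList = new ArrayList<>(Arrays.asList("a", "", "b"));
        dropEmpty(someStringArrayList);
        for (String str : someStringArrayList) {
            System.out.println("String: " + str);
        }
        System.out.println(joinWithIndex(someStringArrayList));
    }
}
```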


(E)BNF parser for (parts of) the Galaxy ToolConfigs with ANTLR

As blogged earlier, I'm currently parsing the syntax of definitions for the parameters and other properties of command-line tools. As said in the linked blog post, I was pondering whether to use the Galaxy ToolConfig format or the DocBook CmdSynopsis format. It turned out, though, that CmdSynopsis lacks the option to specify a list of valid choices for a parameter, as is possible in the Galaxy ToolConfig format (see here). Such a list can be used to generate drop-down lists in wizards etc., which is basically what I want to do ... so now I'm going with the Galaxy format after all.

Enter the Galaxy format then. Look at an example code snippet:

<tool id="sam_to_bam" name="SAM-to-BAM" version="1.1.1">
  <description>converts SAM format to BAM format</description>
    <requirement type="package">samtools</requirement>
  <command interpreter="python">
      #if $source.index_source == "history":
      #end if
    <conditional name="source">
      <param name="index_source" type="select" label="Choose the source for the reference list">
        <option value="cached">Locally cached</option>
        <option value="history">History</option>
      <when value="cached">
      ... cont ...

Here I've got some challenges. XML parsing is easy, even in Java (I use the Java XPath libs for that). But look inside the <command> tag ... that's some really non-XML stuff, no? (It is instructions for a Python-based template library used in Galaxy.) I have to parse this though, in order to replicate its logic ... so what to do? Well, I turned to the ANTLR parser generator.

ANTLRWorks works nicely out of the box

I had heard a lot of good things about ANTLR, such as that it is more easily debugged than typical BNF parsers, so the choice wasn't that hard. I tried ANTLR for Eclipse, but though it looks nice, it was quite buggy, and I couldn't get it to work properly in either Eclipse 3.5 or 3.6. So, finally I went with the easy option and developed my EBNF grammar in ANTLRWorks, which is an integrated Java app with the correct ANTLR lib already installed. It turned out to work really well!

The grammar I have come up with so far (only for the syntax inside the <command> tag, though!) is available on GitHub ... and below (in condensed syntax to save some space), for your convenience :)

grammar GalaxyToolConfig;
options {output=AST;}
command    : binary (ifstatement param+ (ELSE param+)? ENDIF | param)*;
binary     : WORD;
WORD    : ('a'..'z'|'A'..'Z')('a'..'z'|'A'..'Z'|'.'|'_'|'0'..'9')*;
VARIABLE: '$'('{')?WORD('}')?;
STRING  : '"'('a'..'z'|'A'..'Z')+'"';
IF      : '#if';
ELSE    : '#else';
ENDIF   : '#end if';
EQ      : '=';
EQTEST  : '==';
DBLDASH : '--';
COLON   : ':';
WS      : (' '|'\t'|'\r'|'\n') {$channel=HIDDEN;};

Suggestions for improvements? :) ... Then go ahead and mail me (samuel dot lampa at gmail dot com).

Also, see a little screenshot from ANTLRWorks below:

ANTLRWorks Screenshot

As you can see in the screenshot, the different parts have correctly been identified as "param", "if statement" and so forth. You can also see how I can click in the test syntax to see where in the parse tree that part appears.

When done, I just exported the resulting parser code in ANTLRWorks with "Generate > Generate Code", copied the code from the "output" folder into my Eclipse project, added the antlr-3.3 jar to its build path, and then ran the file that comes with the output.

I wanted to do a little more parsing in my test though, so I ended up with this little test code:

package net.bioclipse.uppmax.galaxytoolconfigparser;
import org.antlr.grammar.v3.*;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.TokenStream;
import org.antlr.runtime.tree.CommonTree;
import org.antlr.runtime.tree.DOTTreeGenerator;
import org.antlr.runtime.tree.Tree;
import org.antlr.runtime.tree.TreeAdaptor;
import org.antlr.stringtemplate.StringTemplate;
public class ParseTest {
    // Generated stuff from ANTLR, which I can use to recognize token types   
    public static final int EOF=-1;
    public static final int ELSE=4;
    public static final int ENDIF=5;
    public static final int WORD=6;
    public static final int IF=7;
    public static final int STRING=8;
    public static final int VARIABLE=9;
    public static final int EQTEST=10;
    public static final int COLON=11;
    public static final int DBLDASH=12;
    public static final int EQ=13;
    public static final int WS=14;
    public static void main(String[] args) throws RecognitionException {
        String testString = "" 
                + "      --input1=$source.input1\n"
                + "      --dbkey=${input1.metadata.dbkey}\n"
                + "      #if $source.index_source == \"history\":\n"
                + "        --ref_file=$source.ref_file\n" 
                + "      #else\n"
                + "        --ref_file=\"None\"\n" 
                + "      #end if\n"
                + "      --output1=$output1\n"
                + "      --index_dir=${GALAXY_DATA_INDEX_DIR}\n"; 
        CharStream charStream = new ANTLRStringStream(testString);
        GalaxyToolConfigLexer lexer = new GalaxyToolConfigLexer(charStream);
        TokenStream tokenStream = new CommonTokenStream(lexer);
        GalaxyToolConfigParser parser = new GalaxyToolConfigParser(tokenStream, null);
        System.out.println("Starting ...");
        // GalaxyToolConfigParser.command_return command = parser.command();
        CommonTree tree = (CommonTree)parser.command().getTree();
        System.out.println("Done executing command ...");
        // Print the text and token type of each direct child of the root node
        for (int i = 0; i < tree.getChildCount(); i++) {
            Tree subTree = tree.getChild(i);
            System.out.println("Subtree text: " + subTree.getText() + ", (Token type: " + subTree.getType() + ")");
        }
        // Generate DOT Syntax tree
        //DOTTreeGenerator gen = new DOTTreeGenerator();
        //StringTemplate st = gen.toDOT(tree);
        //System.out.println("Tree: \n" + st);
    }
}

... generating this output:

Starting ...
Done executing command ...
Subtree text:, (Token type: 6)
Subtree text: --, (Token type: 12)
Subtree text: input1, (Token type: 6)
Subtree text: =, (Token type: 13)
Subtree text: $source.input1, (Token type: 9)
Subtree text: --, (Token type: 12)
Subtree text: dbkey, (Token type: 6)
Subtree text: =, (Token type: 13)
Subtree text: ${input1.metadata.dbkey}, (Token type: 9)
Subtree text: #if, (Token type: 7)
Subtree text: $source.index_source, (Token type: 9)
Subtree text: ==, (Token type: 10)
Subtree text: "history", (Token type: 8)
Subtree text: :, (Token type: 11)
Subtree text: --, (Token type: 12)
Subtree text: ref_file, (Token type: 6)
Subtree text: =, (Token type: 13)
Subtree text: $source.ref_file, (Token type: 9)
Subtree text: #else, (Token type: 4)
Subtree text: --, (Token type: 12)
Subtree text: ref_file, (Token type: 6)
Subtree text: =, (Token type: 13)
Subtree text: "None", (Token type: 8)
Subtree text: #end if, (Token type: 5)
Subtree text: --, (Token type: 12)
Subtree text: output1, (Token type: 6)
Subtree text: =, (Token type: 13)
Subtree text: $output1, (Token type: 9)
Subtree text: --, (Token type: 12)
Subtree text: index_dir, (Token type: 6)
Subtree text: =, (Token type: 13)
Subtree text: ${GALAXY_DATA_INDEX_DIR}, (Token type: 9)

... seemingly I have the stuff I need, for doing some logic parsing now! :)
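
For readers without ANTLR at hand, the token stream above can be roughly approximated with plain java.util.regex. This is only a sketch of what the lexer rules do, not the generated ANTLR lexer: the class name RegexLexerSketch is made up, tokens are reported by name instead of by number, and unlike ANTLR, characters that match no rule are silently skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLexerSketch {
    // Token names in the order the alternatives are tried below.
    private static final String[] NAMES = {
        "ENDIF", "ELSE", "IF", "VARIABLE", "STRING", "WORD",
        "EQTEST", "EQ", "DBLDASH", "COLON"
    };

    // Same character classes as the WORD rule in the grammar.
    private static final String WORD = "[a-zA-Z][a-zA-Z._0-9]*";

    // One named group per token type. Longer alternatives come first,
    // so that "==" is not read as two "=" tokens.
    private static final Pattern TOKEN = Pattern.compile(
          "(?<ENDIF>#end if)"
        + "|(?<ELSE>#else)"
        + "|(?<IF>#if)"
        + "|(?<VARIABLE>\\$\\{?" + WORD + "\\}?)"
        + "|(?<STRING>\"[a-zA-Z]+\")"
        + "|(?<WORD>" + WORD + ")"
        + "|(?<EQTEST>==)"
        + "|(?<EQ>=)"
        + "|(?<DBLDASH>--)"
        + "|(?<COLON>:)");

    // Returns one "NAME text" entry per token; whitespace is skipped
    // because find() simply moves on to the next match.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            for (String name : NAMES) {
                if (m.group(name) != null) {
                    tokens.add(name + " " + m.group(name));
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "#if $source.index_source == \"history\":\n"
                    + "  --ref_file=$source.ref_file\n"
                    + "#end if\n";
        for (String token : tokenize(line)) {
            System.out.println(token);
        }
    }
}
```

A regex lexer like this only produces the flat token list; grouping the tokens into "param" and "if statement" subtrees is the part that still needs a real parser such as the one ANTLR generates.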

Some words about BNF

ANTLR is an (E)BNF parser generator. I had heard a little about BNF before, and was more or less scared off from the topic, thinking it looked too advanced, but really, I found it isn't that hard at all!

It strikes me that BNF is pretty much RegEx with named rules ("functions") added. Since rules can refer to each other and to themselves, they allow recursive pattern matching, which you'll need for anything more advanced, such as nested braces or XML tags. As you can see in the example above, though, much of the pattern matching syntax is very similar to RegEx.
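
To make the point about recursion concrete, here is a minimal hand-written sketch of the single rule group : '{' group* '}' ; as a Java method that calls itself. The rule and the class name NestedBraces are made up for illustration and are not part of the Galaxy grammar; a parser generated by ANTLR is considerably more elaborate.

```java
public class NestedBraces {
    private final String input;
    private int pos = 0;

    private NestedBraces(String input) {
        this.input = input;
    }

    // group : '{' group* '}' ;
    // The method calls itself for every nested opening brace, which is
    // exactly the recursion a plain regular expression cannot express.
    private boolean group() {
        if (pos >= input.length() || input.charAt(pos) != '{') {
            return false;
        }
        pos++; // consume '{'
        while (pos < input.length() && input.charAt(pos) == '{') {
            if (!group()) {
                return false;
            }
        }
        if (pos >= input.length() || input.charAt(pos) != '}') {
            return false;
        }
        pos++; // consume '}'
        return true;
    }

    // True if the whole input is exactly one well-nested group.
    public static boolean matches(String input) {
        NestedBraces parser = new NestedBraces(input);
        return parser.group() && parser.pos == input.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("{{}{{}}}")); // well nested
        System.out.println(matches("{{}"));      // missing a closing brace
    }
}
```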

In terms of tutorials, for the (E)BNF/ANTLR combo at least, I'd highly recommend this set of screencasts on using ANTLR in Eclipse. Though I didn't use the Eclipse version, these screencasts quickly give you an idea of how it all works ... I watched at least a bunch of them, and I'm happy I did.

Opening a remote file selection dialog with the RSE for Eclipse

This was easier than expected. Helped by the RSE File UI API Docs and this forum post, I figured out how to do it:

SystemRemoteFileDialog dialog = new SystemRemoteFileDialog(SystemBasePlugin.getActiveWorkbenchShell());
// Show the dialog and wait until the user has made a selection
dialog.open();
IRemoteFile file = (IRemoteFile) dialog.getSelectedObject();
System.out.println("Selected file's absolute path: " + file.getAbsolutePath());

Now also committed!

Update: Using proper interface for dealing with remote files (commit).

3rd Project Update (Integrating SWI-Prolog for Semantic Reasoning in Bioclipse)

I just had my 3rd and last project update presentation (before the final presentation on April 28th), presenting results from comparing the performance of the integrated SWI-Prolog against Jena and Pellet for a spectrum similarity search query. Find the slides below.

Eclipse boot process, classloading, incorporation of native code etc.

This was a very good article on classloading, booting etc. in Bioclipse, with indispensable hints on how best to incorporate native code in Eclipse plugins.


Java/Prolog interface (SWI Prolog JPL) up and running!

Install notes for SWI Prolog JPL on Ubuntu Jaunty (9.04)

JPL is a bi-directional Java/Prolog interface for SWI-Prolog, which I hope to be able to use for integrating Blipkit into Bioclipse, so I'm happy to have got it up and running tonight. Below are some notes from the installation procedure.

NOTE: These are still rather incomplete notes about how to get this to work.

Starting work on creating a Bioclipse manager

Will now start the work of creating a Bioclipse manager for the integration of DR-PROLOG into Bioclipse.

Egon pointed me to a blog post about a feature they've made to make it easier for those of us who are new to Bioclipse development to get started. Nice, thanks for that!