Samuel Lampa's blog

Performance comparison #2, Simple 13C Spectrum Similarity Search


Bioclipse code

var start2 = new Date().getTime();
// js.say(blipkit.queryProlog( [ "findMoleculeWithPeakValuesNear", "100", "[23.3, 23.3, 23.5, 23.5, 26.1, 60.5, 90.0, 132.1, 0]", "Molecules" ] ));
js.say(blipkit.queryProlog( [ "findMoleculeWithPeakValuesNear", "100", "[12.5, 13.8, 23.8, 36.5, 44.3, 78.8, 87.3, 133.8, 0]", "Molecules" ] ));
var elapsed2 = (new Date().getTime() - start2)/1000;
js.say("Total time for finding molecule by shift values (Near-search): " + elapsed2 + " s");

Very good Prolog intro

Found a very good Prolog intro. (I should of course have learned Prolog long ago, but I never found a course for it, and it's only now that I'm starting to have a concrete use for it ...)

It made me understand, for the first time, the meaning of bagof/3 and setof/3, since it compares them to the easier-to-understand findall/3.


Regex to split up intensity and multiplicity in nmrshift data

sed -r 's/(<.*?hasMultiplicity[^<>]*?>)([0-9](\.[0-9]+)?)(\w)(<.*?>)/\1\4\5\n    <nmr:hasIntensity rdf:datatype="xsd:float">\2<\/nmr:hasIntensity>/' nmrshiftdata.r2.rdf.xml > nmrshiftdata.r3.rdf.xml
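To see what the substitution does, here is a small sketch that applies the same pattern in JavaScript to a single line. The input line is made up for illustration (the real nmrshiftdata file may look different). Note also that JavaScript treats `.*?` as a lazy match, whereas sed's extended regex syntax has no lazy quantifiers, so the two can behave differently on lines containing several tags.

```javascript
// Same pattern as in the sed command, written as a JavaScript regex.
// Groups: 1 = opening tag, 2 = intensity value,
//         4 = multiplicity letter, 5 = closing tag.
const pattern = /(<.*?hasMultiplicity[^<>]*?>)([0-9](\.[0-9]+)?)(\w)(<.*?>)/;

// Keep only the letter inside the multiplicity element, and move the
// number into a new hasIntensity element on the following line.
const replacement =
  '$1$4$5\n    <nmr:hasIntensity rdf:datatype="xsd:float">$2</nmr:hasIntensity>';

// Hypothetical input line, for illustration only.
const before = '    <nmr:hasMultiplicity>1.0D</nmr:hasMultiplicity>';
const after = before.replace(pattern, replacement);

console.log(after);
```

On this sample line, the multiplicity element is left holding only the letter, and the number ends up in a separate hasIntensity element on the next line.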

File based RDF storage in Pellet, first tests

As reported in a previous blog post, I ran into Java stack size errors when importing large amounts of data into Pellet. Pellet was using just the in-memory Jena RDF store, which obviously limits the amount of data it can handle.

Jena offers other options for RDF storage though, including SDB for SQL backends, and TDB for a pure Java file based storage.

Initial performance comparison: Pellet vs Prolog in Bioclipse

I started with some initial performance testing for RDF data, comparing Pellet and Prolog, which are now both available integrated in Bioclipse.

Bioclipse manager method to take arbitrary number of arguments

I needed a Bioclipse manager method that could take an arbitrary number of arguments (for a general-purpose Prolog method mapper). Through a useful discussion with jonalv, we figured out that there is at least one way of doing this that works, while a number of other ways work in Java alone but not when called from the Rhino/JavaScript environment.
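One approach that fits the array-style calls to blipkit.queryProlog([...]) seen in the Bioclipse scripts on this blog is to pass all arguments as a single array, so the method itself only ever takes one parameter. Whether this is the exact approach that was settled on is an assumption here. The helper below, buildPrologGoal, is hypothetical and only sketches how such an array might be turned into a Prolog goal string.

```javascript
// Hypothetical helper (not part of Bioclipse or Blipkit): turn
// [predicate, arg1, arg2, ...] into the goal string "predicate(arg1, arg2, ...)".
function buildPrologGoal(parts) {
  if (!Array.isArray(parts) || parts.length === 0) {
    throw new Error("expected a non-empty array: [predicate, arg1, ...]");
  }
  const predicate = parts[0];
  const args = parts.slice(1);
  // A predicate without arguments is written without parentheses.
  return args.length === 0 ? predicate : predicate + "(" + args.join(", ") + ")";
}

const goal = buildPrologGoal([
  "findMoleculeWithPeakValuesNear", "100", "[12.5, 13.8]", "Molecules"
]);
console.log(goal);
```

Since the array can have any length, the same single-parameter method covers predicates of any arity.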

Catching up on NMR Spectroscopy

Looking into some NMR use cases for my project, I realize that my knowledge from the Organic Chemistry course some four years ago has faded quite a lot. Luckily, I found an excellent NMR intro lecture on YouTube.


A usage strategy emerges

A strategy for how to work with the Bioclipse/JPL/Prolog/Blipkit combination I'm setting up is becoming clear.

The main idea with Bioclipse, as well as with having a Prolog engine available in it, is flexible and "interactive" knitting together of knowledge. One of the main questions about using the Bioclipse/JPL/Prolog/Blipkit combination has been where to put the bulk of the knowledge integration/reasoning code. There are in principle three options:

  1. Bioclipse (JavaScript environment)
  2. The Blipkit-Prolog/Bioclipse integration plugin (Java code, a.k.a. "Manager methods")
  3. The Prolog engine (as a Prolog file)

Problems to solve

These are a few technical (and other) problems that I'm now realizing I will have to solve, sooner or later:

  • How to output data? As array/N3/RDF?
  • Output everything, or only one solution at a time? (Use flags, for different options?)
  • How to indicate that a number should be treated as a literal? (For now I'm using the following syntax: [00.0], but it might clash with other uses of the same character.)
  • Make IFile search paths resolvable to full paths, which Prolog can read.
  • How to output results when using more than one variable in a Prolog expression? (Multiple arrays? Multidimensional arrays?)
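As a rough sketch of the "multidimensional array" option from the last point, a result with several variables could be returned as one row per solution, with one column per variable. Everything below (the result shape, the variable names, the values and the column helper) is hypothetical and only meant to show what such an output might look like on the JavaScript side.

```javascript
// Hypothetical result shape: variable names plus one inner array per solution.
// The values are made up for illustration.
const result = {
  variables: ["Molecule", "Shift"],
  rows: [
    ["mol_a", 23.3],
    ["mol_b", 26.1]
  ]
};

// Hypothetical helper: extract all values bound to one variable.
function column(result, variableName) {
  const index = result.variables.indexOf(variableName);
  if (index === -1) {
    throw new Error("unknown variable: " + variableName);
  }
  return result.rows.map(function (row) { return row[index]; });
}

console.log(column(result, "Molecule"));
```

Keeping the variable names next to the rows means a script can look up values by name rather than by position, at the cost of a slightly more nested structure than a flat array.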

Converting RDF predicates to Prolog convenience methods with RegEx

Using SWI-Prolog's semweb package, I had extracted all predicates in an RDF source containing some 1 million triples into the following list: