planet bioclipse
Surprise: Jena/SPARQL outperformed Prolog for spectrum similarity search
I was a bit worried about the performance of the RDF facilities in Bioclipse: a SPARQL query doing NMR spectrum similarity search, including a numerical comparison (against the attached datasets), performed quite unsatisfactorily with Pellet, running some two orders of magnitude slower than some Prolog code I wrote for the same task. (But of course, this might not be the way Pellet is optimized to work.)
Anyway, when testing the same SPARQL query with Jena, I got quite different results, as seen in the graph to the right: Jena in fact outperformed the Prolog code! Interesting...
Test results
Pellet/SPARQL vs SWI-Prolog
So, while Prolog far outperformed Pellet for the task (at least when Pellet was to do it using SPARQL) ...
Jena/SPARQL vs SWI-Prolog
... at the same time Jena outperforms Prolog, running the same SPARQL query above. Interesting!

In Tabular form
| No of spectra | SWI-Prolog | Jena/SPARQL | Pellet/SPARQL |
| 10 | 0.06 | 0.01 | 9.50 |
| 20 | 0.10 | 0.02 | 14.22 |
| 30 | 0.14 | 0.02 | 15.39 |
| 40 | 0.17 | 0.02 | 31.52 |
| 50 | 0.18 | 0.03 | 32.56 |
| 60 | 0.23 | 0.02 | 43.93 |
| 70 | 0.26 | 0.02 | 51.05 |
| 80 | 0.30 | 0.02 | 55.60 |
| 90 | 0.36 | 0.02 | 57.96 |
| 100 | 0.42 | 0.08 | 66.49 |
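To make the "orders of magnitude" claim concrete, here is a minimal sketch that computes the speed ratios from the table rows. The numbers are copied from the table; the variable and function names are only illustrative. With these figures, Pellet/SPARQL comes out roughly 110 to 200 times slower than SWI-Prolog, while SWI-Prolog is roughly 5 to 18 times slower than Jena/SPARQL.

```javascript
// Timings copied from the table above (same units as reported there).
const timings = [
  { spectra: 10,  prolog: 0.06, jena: 0.01, pellet: 9.50 },
  { spectra: 20,  prolog: 0.10, jena: 0.02, pellet: 14.22 },
  { spectra: 30,  prolog: 0.14, jena: 0.02, pellet: 15.39 },
  { spectra: 40,  prolog: 0.17, jena: 0.02, pellet: 31.52 },
  { spectra: 50,  prolog: 0.18, jena: 0.03, pellet: 32.56 },
  { spectra: 60,  prolog: 0.23, jena: 0.02, pellet: 43.93 },
  { spectra: 70,  prolog: 0.26, jena: 0.02, pellet: 51.05 },
  { spectra: 80,  prolog: 0.30, jena: 0.02, pellet: 55.60 },
  { spectra: 90,  prolog: 0.36, jena: 0.02, pellet: 57.96 },
  { spectra: 100, prolog: 0.42, jena: 0.08, pellet: 66.49 },
];

// For one row, how many times slower Pellet is than Prolog,
// and how many times slower Prolog is than Jena.
function speedRatios(row) {
  return {
    spectra: row.spectra,
    pelletVsProlog: row.pellet / row.prolog,
    prologVsJena: row.prolog / row.jena,
  };
}

const ratioTable = timings.map(speedRatios);

for (const r of ratioTable) {
  console.log(
    r.spectra + ' spectra: Pellet/Prolog = ' + r.pelletVsProlog.toFixed(1) +
    ', Prolog/Jena = ' + r.prologVsJena.toFixed(1)
  );
}
```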
Code
SPARQL Query
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX afn: <http://jena.hpl.hp.com/ARQ/function#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX nmr: <http://www.nmrshiftdb.org/onto#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?s1 ?s2 ?s3
WHERE {
?s nmr:hasPeak [ nmr:hasShift ?s1 ] ,
[ nmr:hasShift ?s2 ] ,
[ nmr:hasShift ?s3 ] .
FILTER ( fn:abs(?s1 - 17.6) < 0.3 &&
fn:abs(?s2 - 18.3) < 0.3 &&
fn:abs(?s3 - 22.6) < 0.3 )
} LIMIT 1
Prolog code
findMolWithPeakValsNear( SearchShiftVals, Mols ) :-
% Pick the Mols in 'Mol', that match the pattern:
% listPeakShiftsOfMol( Mol, MolShiftVals ),
% containsListElemsNear( SearchShiftVals, MolShiftVals )
% and collect them in 'Mols'.
setof( Mol,
( listPeakShiftsOfMol( Mol, MolShiftVals ), % A Mol's shift values are collected
containsListElemsNear( SearchShiftVals, MolShiftVals ) ), % ...and compared against the given SearchShiftVals
[Mols|MolTail] ). % In 'Mols', all 'Mol's, for which their shift
% values match the SearchShiftVals, are collected.
% Given a 'Mol', give its shift values in list form, in 'ListOfPeaks'
listPeakShiftsOfMol( Mol, ListOfPeaks ) :-
hasSpectrum( Mol, Spectrum ),
findall( ShiftVal,
( hasPeak( Spectrum, Peak ),
hasShiftVal( Peak, ShiftVal ) ),
ListOfPeaks ).
% Compare two lists to see if list2 has near-matches for each of the values in list1
containsListElemsNear( [ElemHead|ElemTail], List ) :-
memberCloseTo( ElemHead, List ),
( containsListElemsNear( ElemTail, List );
ElemTail == [] ).
%%%%%%%%%%%%%%%%%%%%%%%%
% Recursive construct: %
%%%%%%%%%%%%%%%%%%%%%%%%
% Test first the end criterion:
memberCloseTo( X, [ Y | Tail ] ) :-
closeTo( X, Y ).
% but if the above doesn't validate, then recursively continue with the tail of List2:
memberCloseTo( X, [ Y | Tail ] ) :-
memberCloseTo( X, Tail ).
% Numerical near-match
closeTo( Val1, Val2 ) :-
abs(Val1 - Val2) =< 0.3.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Convenience accessory methods %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
hasShiftVal( Peak, ShiftVal ) :-
rdf_db:rdf( Peak, 'http://www.nmrshiftdb.org/onto#hasShift',
literal(type('http://www.w3.org/2001/XMLSchema#decimal',
ShiftValLiteral))),
atom_number_create( ShiftValLiteral, ShiftVal ).
hasSpectrum( Subject, Predicate ) :-
rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#hasSpectrum', Predicate).
hasPeak( Subject, Predicate ) :-
rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#hasPeak', Predicate).
% Wrapper for the atom_number/2 predicate, which converts atoms (string constants) to numbers.
% The wrapper avoids exceptions on empty atoms, converting them to zero instead.
atom_number_create( Atom, Number ) :-
atom_length( Atom, AtomLength ), AtomLength > 0 -> % IF atom is not empty
atom_number( Atom, Number ); % THEN Convert the atom to a numerical value
atom_number( '0', Number ). % ELSE Convert to a zero
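For readers who don't know Prolog, the matching logic above can be sketched in plain JavaScript. This is only an illustrative port under assumptions, not the code used in the benchmark: the RDF lookups are replaced by a hypothetical in-memory object (`molecules`) that maps a molecule name to its list of shift values, and the function names are adapted from the Prolog predicates. Note that the Prolog code accepts a difference of up to and including 0.3 (`=<`), while the SPARQL FILTER uses a strict `<`.

```javascript
const TOLERANCE = 0.3;

// Numerical near-match, mirroring closeTo/2 (inclusive bound, like =< in the Prolog code).
function closeTo(val1, val2) {
  return Math.abs(val1 - val2) <= TOLERANCE;
}

// True if any value in list is close to x, mirroring memberCloseTo/2.
function memberCloseTo(x, list) {
  return list.some(y => closeTo(x, y));
}

// True if every search value has a near-match in molShiftVals.
// Like the Prolog version, an empty search list does not match.
function containsListElemsNear(searchShiftVals, molShiftVals) {
  return searchShiftVals.length > 0 &&
         searchShiftVals.every(x => memberCloseTo(x, molShiftVals));
}

// Collect the names of all molecules whose shift values match, sorted
// (setof/3 also returns a sorted, duplicate-free list).
function findMolsWithPeakValsNear(searchShiftVals, molecules) {
  return Object.keys(molecules)
    .filter(name => containsListElemsNear(searchShiftVals, molecules[name]))
    .sort();
}

// Hypothetical example data, not taken from the attached datasets.
const molecules = {
  molA: [17.5, 18.4, 22.7, 130.1],
  molB: [17.6, 25.0, 22.6],
};

const matches = findMolsWithPeakValsNear([17.6, 18.3, 22.6], molecules);
console.log(matches);
```

In this toy data only `molA` has a peak near each of the three search values; `molB` lacks anything near 18.3.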
Attached files
Attached are the datasets (nmrshiftdata10...100.rdf.xml) for the different number of spectra, and the Bioclipse JavaScript files for the tests (NMR.Pellet.js, NMR.Jena.js and NMR.Swipl.js, as well as NMR.Pellet.Init, which I used for preparing the RDF stores for both Jena and Pellet).
Screencast: Experimental Prolog integration in Bioclipse
Wed, 2010-03-03 20:51 | by saml
I wanted to try out some screencasting, so I chose to demo the (still experimental) SWI-Prolog integration in Bioclipse, showing how Prolog code (or a "Prolog knowledge base") can conveniently be stored inside Bioclipse's JavaScript environment (in a JS variable), loaded into the Prolog engine, and then queried, all from the JS environment. Finally, the results can be returned to the JavaScript environment for further processing or output.
Note that this is still at the experimental stage, so things are a bit rough around the edges!
Chemical ring and cage structures need rules (seemingly)
Mon, 2010-03-01 23:31 | by saml
During my short stay at EBI, Egon had kindly arranged an opportunity to talk to Janna from the Steinbeck group about some work she has done on searching for cage structures in molecules using Prolog. So we met over a coffee, together with Nico, who is now visiting the Steinbeck group.
Both Nico and Janna kindly gave a lot of good advice about research in general (as I'm currently looking into possibly doing a PhD somewhere), which I highly appreciated. :)
And we talked some about the original topic too :), that is the cage structure problem. The cage structure problem is kind of an extension to the problem of expressing rings, which has previously been reported as a problem for OWL-DL. So because of this, it is interesting that Janna came up with a working solution, using Prolog.
As an illustrative example from the DL side, Michel Dumontier has done some work on representing molecules, including rings. But they also had to use rules, not plain OWL.
So that seems to be the general conclusion: in order to express ring structures (or extensions of them, such as cage structures), you'll need to use rules in some way.
Unfortunately my project is now running out of time, so I might not have much time to look more into this topic as part of my project :(. I will see if I can include this as part of another course I still have to finish ("knowledge based systems in bioinformatics"), but that remains to be seen.
NMR SPARQL search problem solved
Fri, 2010-02-19 22:20 | by saml
The problem I've had, finding a working SPARQL query for similarity search of spectra against a list of peak shift values (documented on this blog: post 1, post 2, and on semanticoverflow), is now finally solved, thanks to helpful advice from Brandon Ibach on the [pellet-users] mailing list.
"Orthogonal expressivity" of Pellet and Prolog?
Wed, 2010-02-17 16:40 | by saml
Found a very interesting quote:
"Both OWL-DL and function-free Horn rules are decidable fragments of first-order logic with interesting, yet orthogonal expressive power"1
"Horn rules" are what Prolog builds upon (Prolog statements are Horn rules, AFAIS), so maybe Prolog fits into the category of "function-free Horn rules"? (Gotta try to figure that out.) OWL-DL, in turn, is the W3C standard for expressive semantics that reasoners like Pellet (which is available in Bioclipse) build upon.
- 1. Motik B, Sattler U, Studer R. Query Answering for OWL-DL with rules. Web Semantics: Science, Services and Agents on the World Wide Web. 2005;3(1):41-60. Available at: http://linkinghub.elsevier.com/retrieve/pii/S157082680500003X.
Automating answering of questions with no answers - by wrapping simulations in semantics
What do you think of that title? :) To me it sounds like one of the (many) natural next steps forward for Bioclipse sometime in future1.
Explicit knowledge is too expensive
There are lots of things that can't be answered by a computer from data alone. Maybe the majority of what we humans perceive as knowledge is inferred from a combination of data (simple fact statements about reality) and rules that tell how facts can be combined together to allow making implicit knowledge (knowledge that is not persisted as facts anywhere, but has to be inferred from other facts and rules) become explicit.
One can easily imagine though, that storing every single piece of knowledge that could be stated, as an explicit fact, would require more storage than can probably ever be made available in this universe.
Simulations can make knowledge explicit, from first principles
It is not too hard to come up with some processes which are just so complex and involve so much variability2 that it is unrealistic to try to capture every imaginable state of that system or process in explicit facts. Instead we must seek the "first principles" that define the process, and through simulations make explicit any knowledge we are looking for, at the time we need it (one can of course cache often accessed knowledge).
Systems biology simulation software for future Bioclipse integration?
Wed, 2010-02-10 02:52 | by saml
While reading up before the UU Cheminformatics journal club, initiated thanks to Egon Willighagen, I stumbled on this ... which seems to be something for Bioclipse, whenever it's time to extend it in the systems biology direction:
An LGPL-licensed, Java-based, stochastic biological system simulator, designed with ease of integration and interoperability in mind. (Stochastic simulations seem to be the ones that currently capture biological behaviour the best.)
Backtracking - Key difference between SPARQL and Prolog?
On something I realized a minute ago ...
Though they are really different types of technologies, it might at first be tempting to compare a SPARQL query with a Prolog rule returning a list of results (or at least it was to me until just a minute ago). In fact, SPARQL queries and Prolog rules that return their results as lists DO share some similarities. For example, in both you provide patterns of RDF statements with variables that are to be bound to each other or to RDF entities, in order to find all queried entities that match the pattern.
But then come some important differences in how SPARQL handles cases where one wants to evaluate looked-up entities with a function, like an arithmetic one. In SPARQL this has to be done (I think) with the FILTER construct, but that also means that backtracking is not done if nothing passes the filter (and that is the true meaning of a filter anyway, isn't it).
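The "match patterns to get variable bindings, then filter" view described above can be sketched in a few lines of JavaScript. This is a toy model under assumptions, not how Jena, Pellet or any real SPARQL engine is implemented: the triple store is a hypothetical array of `[subject, predicate, object]` entries, and `unify` and `matchAll` are illustrative helper names.

```javascript
// Hypothetical toy triple store: [subject, predicate, object].
const triples = [
  ['spec1', 'hasPeak', 'p1'],
  ['p1', 'hasShift', 17.5],
  ['spec1', 'hasPeak', 'p2'],
  ['p2', 'hasShift', 40.0],
  ['spec2', 'hasPeak', 'p3'],
  ['p3', 'hasShift', 99.0],
];

// Terms starting with '?' are treated as variables.
const isVar = term => typeof term === 'string' && term.startsWith('?');

// Try to extend one binding so that pattern matches triple; return null on mismatch.
function unify(pattern, triple, binding) {
  const extended = Object.assign({}, binding);
  for (let i = 0; i < 3; i++) {
    const term = pattern[i];
    if (isVar(term)) {
      if (Object.prototype.hasOwnProperty.call(extended, term)) {
        if (extended[term] !== triple[i]) return null; // conflicts with earlier binding
      } else {
        extended[term] = triple[i];
      }
    } else if (term !== triple[i]) {
      return null; // constant term does not match
    }
  }
  return extended;
}

// Match a list of patterns, pattern by pattern, keeping every consistent binding.
function matchAll(patterns, store) {
  let bindings = [{}];
  for (const pattern of patterns) {
    const next = [];
    for (const binding of bindings) {
      for (const triple of store) {
        const extended = unify(pattern, triple, binding);
        if (extended !== null) next.push(extended);
      }
    }
    bindings = next;
  }
  return bindings;
}

// Step 1: pattern matching produces all candidate bindings.
const candidates = matchAll(
  [['?s', 'hasPeak', '?p'], ['?p', 'hasShift', '?v']],
  triples
);

// Step 2: the filter only discards candidates; it never generates new ones.
const results = candidates.filter(b => Math.abs(b['?v'] - 17.6) < 0.3);
console.log(results);
```

In this model the numeric test runs only on bindings that the patterns have already produced, which is the sense in which a filter differs from a Prolog goal that takes part in the search itself.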
Bioclipse manager method to take arbitrary number of arguments
I needed a Bioclipse manager method that could take an arbitrary number of arguments (for a general-purpose Prolog method mapper). Through a useful discussion with jonalv, we figured out that there is at least one working way of doing this, while a number of other ways work in Java alone but not across both Java and the Rhino/JavaScript environment.
A usage strategy emerges
A strategy for how to work with the Bioclipse/JPL/Prolog/Blipkit combination I'm setting up, is becoming clear.
The main idea with Bioclipse, as well as with having a Prolog engine available in it, is flexible and "interactive" knitting together of knowledge. One of the main questions regarding how to use a Bioclipse/JPL/Prolog/Blipkit combination has been where to put the bulk of the knowledge integration/reasoning code. There would in principle be three options for that:
- Bioclipse (JavaScript environment)
- The Blipkit-Prolog/Bioclipse integration plugin (Java code, a.k.a. "Manager methods")
- The Prolog engine (as a Prolog file)
