PROLOG
Surprise: Jena/SPARQL outperformed Prolog for spectrum similarity search
I was a bit worried about the performance of the RDF facilities in Bioclipse: a SPARQL query for NMR spectrum similarity search, including a numerical comparison (run against the attached datasets), performed quite unsatisfactorily, some two orders of magnitude slower than Prolog code I wrote for the same task. (Of course, this might not be the kind of task Pellet is optimized for.)
Anyway, when testing the same SPARQL query with Jena, I got quite different results, as seen in the graph to the right: Jena in fact outperformed the Prolog code! Interesting...
Test results
Pellet/SPARQL vs SWI-Prolog
So, while Prolog far outperformed Pellet for the task (at least when Pellet was to do it using SPARQL) ...
Jena/SPARQL vs SWI-Prolog
... at the same time Jena outperforms Prolog, running the same SPARQL query above. Interesting!

In tabular form
| No. of spectra | SWI-Prolog | Jena/SPARQL | Pellet/SPARQL |
|---|---|---|---|
| 10 | 0.06 | 0.01 | 9.50 |
| 20 | 0.10 | 0.02 | 14.22 |
| 30 | 0.14 | 0.02 | 15.39 |
| 40 | 0.17 | 0.02 | 31.52 |
| 50 | 0.18 | 0.03 | 32.56 |
| 60 | 0.23 | 0.02 | 43.93 |
| 70 | 0.26 | 0.02 | 51.05 |
| 80 | 0.30 | 0.02 | 55.60 |
| 90 | 0.36 | 0.02 | 57.96 |
| 100 | 0.42 | 0.08 | 66.49 |
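The "two orders of magnitude" remark can be checked directly against the table above. The following is a small Python sketch that only divides the reported timings; the variable and function names are my own, and since the Jena timings are reported at a resolution of 0.01, the Prolog/Jena ratios are necessarily rough.

```python
# Timings copied from the table above, in the units reported there:
# (number of spectra, SWI-Prolog, Jena/SPARQL, Pellet/SPARQL)
rows = [
    (10, 0.06, 0.01, 9.50),
    (20, 0.10, 0.02, 14.22),
    (30, 0.14, 0.02, 15.39),
    (40, 0.17, 0.02, 31.52),
    (50, 0.18, 0.03, 32.56),
    (60, 0.23, 0.02, 43.93),
    (70, 0.26, 0.02, 51.05),
    (80, 0.30, 0.02, 55.60),
    (90, 0.36, 0.02, 57.96),
    (100, 0.42, 0.08, 66.49),
]


def ratios(table):
    """Return (n, Pellet/Prolog, Prolog/Jena) for each row of the table."""
    return [(n, pellet / prolog, prolog / jena)
            for n, prolog, jena, pellet in table]


for n, pellet_vs_prolog, prolog_vs_jena in ratios(rows):
    print(f"{n:>3} spectra: Pellet/Prolog = {pellet_vs_prolog:6.1f}x, "
          f"Prolog/Jena = {prolog_vs_jena:5.1f}x")
```

On these numbers, Pellet is roughly 110 to 200 times slower than SWI-Prolog, while SWI-Prolog is roughly 5 to 18 times slower than Jena.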
Code
SPARQL Query
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX afn: <http://jena.hpl.hp.com/ARQ/function#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX nmr: <http://www.nmrshiftdb.org/onto#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?s1 ?s2 ?s3
WHERE {
  ?s nmr:hasPeak [ nmr:hasShift ?s1 ] ,
                 [ nmr:hasShift ?s2 ] ,
                 [ nmr:hasShift ?s3 ] .
  FILTER ( fn:abs(?s1 - 17.6) < 0.3 &&
           fn:abs(?s2 - 18.3) < 0.3 &&
           fn:abs(?s3 - 22.6) < 0.3 )
} LIMIT 1
Prolog code
% Collect in 'Mols' all molecules 'Mol' that match the pattern:
%   listPeakShiftsOfMol( Mol, MolShiftVals ),
%   containsListElemsNear( SearchShiftVals, MolShiftVals )
findMolWithPeakValsNear( SearchShiftVals, Mols ) :-
    setof( Mol,
           MolShiftVals^( listPeakShiftsOfMol( Mol, MolShiftVals ),                  % A Mol's shift values are collected ...
                          containsListElemsNear( SearchShiftVals, MolShiftVals ) ),  % ... and compared against the given SearchShiftVals
           Mols ).  % MolShiftVals^ keeps setof/3 from returning one result list per distinct shift list
% Given a 'Mol', give its shift values in list form, in 'ListOfPeaks'
listPeakShiftsOfMol( Mol, ListOfPeaks ) :-
    hasSpectrum( Mol, Spectrum ),
    findall( ShiftVal,
             ( hasPeak( Spectrum, Peak ),
               hasShiftVal( Peak, ShiftVal ) ),
             ListOfPeaks ).
% Compare two lists to see if 'List' has a near-match for each of the values in the first list
containsListElemsNear( [ElemHead|ElemTail], List ) :-
    memberCloseTo( ElemHead, List ),
    (   ElemTail == []                            % Last element checked: done
    ->  true
    ;   containsListElemsNear( ElemTail, List )   % Otherwise check the remaining elements
    ).
%%%%%%%%%%%%%%%%%%%%%%%%
% Recursive construct: %
%%%%%%%%%%%%%%%%%%%%%%%%
% Test first the end criterion:
memberCloseTo( X, [ Y | _ ] ) :-
    closeTo( X, Y ).
% but if the above doesn't succeed, then recursively continue with the tail of the list:
memberCloseTo( X, [ _ | Tail ] ) :-
    memberCloseTo( X, Tail ).
% Numerical near-match
% (Note: =< here, whereas the SPARQL FILTER above uses a strict <)
closeTo( Val1, Val2 ) :-
    abs(Val1 - Val2) =< 0.3.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Convenience accessor predicates %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
hasShiftVal( Peak, ShiftVal ) :-
    rdf_db:rdf( Peak, 'http://www.nmrshiftdb.org/onto#hasShift',
                literal(type('http://www.w3.org/2001/XMLSchema#decimal',
                             ShiftValLiteral))),
    atom_number_create( ShiftValLiteral, ShiftVal ).
hasSpectrum( Mol, Spectrum ) :-
    rdf_db:rdf( Mol, 'http://www.nmrshiftdb.org/onto#hasSpectrum', Spectrum ).
hasPeak( Spectrum, Peak ) :-
    rdf_db:rdf( Spectrum, 'http://www.nmrshiftdb.org/onto#hasPeak', Peak ).
% Wrapper predicate for atom_number/2, which converts atoms (string constants) to numbers.
% The wrapper avoids exceptions on empty atoms, converting them to zero instead.
atom_number_create( Atom, Number ) :-
    (   atom_length( Atom, AtomLength ), AtomLength > 0   % IF the atom is not empty
    ->  atom_number( Atom, Number )                       % THEN convert the atom to a numerical value
    ;   Number = 0                                        % ELSE use zero
    ).
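For readers less familiar with Prolog, the matching logic of the code above can be restated in a few lines of Python. This is only a sketch on in-memory data: the dictionary layout, the molecule names and the shift values are made up for illustration, and nothing here touches an RDF store. As in the Prolog version, each search value only needs some near-match among a molecule's shifts, and two search values may match the same peak.

```python
TOLERANCE = 0.3  # same threshold as closeTo/2 in the Prolog code


def close_to(val1, val2, tol=TOLERANCE):
    """Numerical near-match, mirroring closeTo/2 (inclusive bound)."""
    return abs(val1 - val2) <= tol


def contains_elems_near(search_vals, mol_vals):
    """True if every search value has a near-match among mol_vals."""
    return all(any(close_to(s, m) for m in mol_vals) for s in search_vals)


def find_mols_with_peaks_near(search_vals, shifts_by_mol):
    """Return the sorted names of all molecules whose shifts match."""
    return sorted(mol for mol, vals in shifts_by_mol.items()
                  if contains_elems_near(search_vals, vals))


# Hypothetical example data: molecule name -> list of peak shift values
spectra = {
    "mol_a": [17.5, 18.4, 22.7, 40.1],
    "mol_b": [17.6, 30.0, 22.6],
    "mol_c": [18.2, 17.8, 22.4, 55.0],
}

print(find_mols_with_peaks_near([17.6, 18.3, 22.6], spectra))
# → ['mol_a', 'mol_c']
```

One difference worth keeping in mind: this sketch returns an empty list when nothing matches, whereas setof/3 fails in that case.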
Attached files
Attached are the datasets (nmrshiftdata10...100.rdf.xml) for the different number of spectra, and the Bioclipse JavaScript files for the tests (NMR.Pellet.js, NMR.Jena.js and NMR.Swipl.js, as well as NMR.Pellet.Init, which I used for preparing the RDF stores for both Jena and Pellet).
Screencast: Experimental Prolog integration in Bioclipse
Wed, 2010-03-03 20:51 | by saml
I wanted to try out some screencasting, so I chose to demo the (still experimental) SWI-Prolog integration in Bioclipse. The screencast shows how Prolog code (or a "Prolog knowledge base") can conveniently be stored in a JS variable inside Bioclipse's JavaScript environment, loaded into the Prolog engine, and then queried, all from the JS environment. Finally, the results can be returned to the JavaScript environment for further processing or output.
Note that this is still at the experimental stage, so things are a bit rough around the edges!
Chemical ring and cage structures need rules (seemingly)
Mon, 2010-03-01 23:31 | by saml
During my short stay at EBI, Egon had kindly arranged an opportunity to talk to Janna from the Steinbeck group about some work she has done on searching for cage structures in molecules using Prolog. So we met over a coffee, together with Nico, who is now visiting the Steinbeck group.
Both Nico and Janna kindly gave a lot of good advice about research in general (as I'm currently looking into possibly doing a PhD somewhere), which I highly appreciated. :)
And we talked some about the original topic too :), that is the cage structure problem. The cage structure problem is kind of an extension to the problem of expressing rings, which has previously been reported as a problem for OWL-DL. So because of this, it is interesting that Janna came up with a working solution, using Prolog.
As an illustrative example from the DL side, Michel Dumontier has done some work on representing molecules, including rings. But they also had to use rules, not plain OWL.
So that seems to be the general conclusion: in order to express ring structures (or extensions of them, such as cage structures), you'll need to use rules in some way.
Unfortunately my project is now running out of time, so I might not have much time to look more into this topic as part of my project :(. I will see if I can include this as part of another course I still have to finish ("knowledge based systems in bioinformatics"), but that remains to be seen.
Think I finally got what Horn clauses are good for
Thu, 2010-02-18 17:01 | by saml
The Wikipedia article on Horn clauses states the following: "Horn clauses are relevant to theorem proving by first-order resolution, in that the resolution of two Horn clauses is itself a Horn clause."
That seems to explain why Horn clauses are so foundational for Prolog, since in Prolog one can compose goals as compounds of other goals.
(I realize my lack of basic knowledge of Prolog, and heavily regret not having had any formal training in logic programming / Prolog at my uni :( (they removed the Prolog part of the course where I hoped to learn it, just before I entered that course).)
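The closure property in the quote can be illustrated with a tiny Python sketch. It is a propositional simplification (no variables or unification, unlike real first-order resolution), and the clause encoding, a set of (atom, is_positive) pairs, is just my own choice for the example.

```python
def is_horn(clause):
    """A clause is a Horn clause if it has at most one positive literal."""
    return sum(1 for _, positive in clause if positive) <= 1


def resolve(clause_pos, clause_neg, atom):
    """Resolve two clauses on 'atom'.

    clause_pos must contain the atom positively, clause_neg negatively.
    The resolvent is the union of both clauses minus the resolved pair.
    """
    if (atom, True) not in clause_pos or (atom, False) not in clause_neg:
        raise ValueError("clauses cannot be resolved on this atom")
    return (clause_pos - {(atom, True)}) | (clause_neg - {(atom, False)})


# The Prolog rule  q :- p.     corresponds to the clause {q, not p}
# The Prolog rule  r :- q, s.  corresponds to the clause {r, not q, not s}
rule1 = frozenset({("q", True), ("p", False)})
rule2 = frozenset({("r", True), ("q", False), ("s", False)})

resolvent = resolve(rule1, rule2, "q")
# The resolvent {r, not p, not s} corresponds to the rule  r :- p, s.
print(sorted(resolvent), is_horn(resolvent))
```

The reason this always works: the first clause's only positive literal is the one resolved away, so the resolvent can inherit at most the single positive literal of the second clause.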
"Orthogonal expressivity" of Pellet and Prolog?
Wed, 2010-02-17 16:40 | by saml
Found a very interesting quote:
"Both OWL-DL and function-free Horn rules are decidable fragments of first-order logic with interesting, yet orthogonal expressive power" [1]
Horn rules are what Prolog builds upon (Prolog statements are Horn rules, AFAIS), so maybe Prolog fits into the category of "function-free Horn rules"? (Gotta try to figure that out.) And OWL-DL is the W3C standard for expressive semantics that reasoners like Pellet (which is available in Bioclipse) build upon.
[1] Motik B, Sattler U, Studer R. Query Answering for OWL-DL with rules. Web Semantics: Science, Services and Agents on the World Wide Web. 2005;3(1):41-60. Available at: http://linkinghub.elsevier.com/retrieve/pii/S157082680500003X.
Idea: How to store Prolog rules in Bioclipse scripts
A very simple way of storing Prolog code inside Bioclipse scripts just struck me, one that avoids the need for a separate file containing the Prolog code. This might be very useful when using Prolog as a kind of "query language", somewhat analogous to how SPARQL is used.
(Update: Prolog in fact turns out to be more powerful than SPARQL in this regard, as shown by the observation in this blog post that SPARQL doesn't support backtracking.)
The idea would be to simply store the Prolog code in a Bioclipse JS variable, and to create a special manager method that writes such Prolog query code to a temporary file in the workspace and then tells Prolog to "consult" that file, thereby "feeding the Prolog engine" with the logic to use, from inside Bioclipse scripts.
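The write-to-temp-file-then-consult idea can be sketched as follows. Note that this is Python rather than a Bioclipse JavaScript manager method, that both function names are hypothetical, and that the sketch stops at building the `consult/1` goal as a string: how that goal is handed to the Prolog engine (for instance via JPL) is left out.

```python
import os
import tempfile


def write_prolog_to_temp_file(prolog_code):
    """Write Prolog source text to a temporary .pl file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".pl", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(prolog_code)
    return path


def consult_goal(path):
    """Build the Prolog goal that would load the given file.

    Backslashes and single quotes are escaped so that the path is a
    valid quoted Prolog atom.
    """
    quoted = path.replace("\\", "\\\\").replace("'", "\\'")
    return f"consult('{quoted}')"


# Prolog code held in an ordinary string variable (made-up example fact)
knowledge_base = "hasShiftVal(peak1, 17.6).\n"

kb_path = write_prolog_to_temp_file(knowledge_base)
print(consult_goal(kb_path))  # the goal one would pass to the Prolog engine
os.remove(kb_path)            # clean up the temporary file
```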
Initial performance comparison: Pellet vs Prolog in Bioclipse
I started with some initial performance testing for RDF data, comparing Pellet and Prolog, which are now both available integrated in Bioclipse.
A usage strategy emerges
A strategy for how to work with the Bioclipse/JPL/Prolog/Blipkit combination I'm setting up is becoming clear.
The main idea with Bioclipse, as well as with having a Prolog engine available in it, is flexible and "interactive" knitting together of knowledge. One of the main questions regarding how to use a Bioclipse/JPL/Prolog/Blipkit combination has been where to put the bulk of the knowledge integration/reasoning code. There would in principle be three options for that:
- Bioclipse (JavaScript environment)
- The Blipkit-Prolog/Bioclipse integration plugin (Java code, a.k.a. "Manager methods")
- The Prolog engine (as a Prolog file)
Converting RDF predicates to Prolog convenience methods with RegEx
Using SWI-Prolog's semweb package, I had extracted all predicates in an RDF source containing some 1 million triples into the following list:
Nice intro to RDF in Prolog (by Pellet author)
Wed, 2009-11-04 11:00 | by saml
I found a nice introduction to the use of RDF in Prolog (SWI-Prolog). It contains short primers for both RDF and Prolog, so it should be accessible to anyone with a minimal programming background:
