SPARQL

Surprise: Jena/SPARQL outperformed Prolog for spectrum similarity search

I was a bit worried about the performance of the RDF facilities in Bioclipse, since a SPARQL query for NMR spectrum similarity search, including a numerical comparison (run against the attached datasets), performed quite unsatisfactorily with Pellet: some two orders of magnitude slower than Prolog code I wrote for the same task. (Of course, this might not be the kind of work Pellet is optimized for.)

Anyway, when testing the same SPARQL query with Jena, I got quite different results, as seen in the graph to the right: Jena in fact outperformed the Prolog code! Interesting ...

Test results

Pellet/SPARQL vs SWI-Prolog

So, while Prolog far outperformed Pellet for the task (at least when Pellet was to do it using SPARQL) ...

Jena/SPARQL vs SWI-Prolog

... at the same time Jena outperforms Prolog, running the same SPARQL query above. Interesting!

In Tabular form

 

No. of spectra   SWI-Prolog   Jena/SPARQL   Pellet/SPARQL
            10         0.06          0.01            9.50
            20         0.10          0.02           14.22
            30         0.14          0.02           15.39
            40         0.17          0.02           31.52
            50         0.18          0.03           32.56
            60         0.23          0.02           43.93
            70         0.26          0.02           51.05
            80         0.30          0.02           55.60
            90         0.36          0.02           57.96
           100         0.42          0.08           66.49
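
To put the numbers in perspective, the ratios between the columns can be computed directly. Here is a small Python sketch of that arithmetic. Python and the names `timings` and `ratios` are just my choice for illustration and were not part of the original tests; the values are copied from the table above, in whatever unit the measurements were recorded.

```python
# Timings copied from the table above (unit as in the original measurements).
# Columns: number of spectra, SWI-Prolog, Jena/SPARQL, Pellet/SPARQL.
timings = [
    (10, 0.06, 0.01, 9.50),
    (20, 0.10, 0.02, 14.22),
    (30, 0.14, 0.02, 15.39),
    (40, 0.17, 0.02, 31.52),
    (50, 0.18, 0.03, 32.56),
    (60, 0.23, 0.02, 43.93),
    (70, 0.26, 0.02, 51.05),
    (80, 0.30, 0.02, 55.60),
    (90, 0.36, 0.02, 57.96),
    (100, 0.42, 0.08, 66.49),
]


def ratios(rows):
    """Return (n, Pellet/Prolog, Prolog/Jena) for each row of the table."""
    return [(n, pellet / prolog, prolog / jena)
            for n, prolog, jena, pellet in rows]


for n, pellet_vs_prolog, prolog_vs_jena in ratios(timings):
    print(f"{n:>3} spectra: Pellet/Prolog = {pellet_vs_prolog:6.1f}x, "
          f"Prolog/Jena = {prolog_vs_jena:5.1f}x")
```

Across the ten datasets, Pellet comes out roughly 110 to 200 times slower than SWI-Prolog, in line with the "two orders of magnitude" mentioned above, while Jena is between about 5 and 18 times faster than Prolog.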

Code

SPARQL Query
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX afn: <http://jena.hpl.hp.com/ARQ/function#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX nmr: <http://www.nmrshiftdb.org/onto#> 
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?s1 ?s2 ?s3
WHERE {
  ?s nmr:hasPeak [ nmr:hasShift ?s1 ] ,
                 [ nmr:hasShift ?s2 ] ,
                 [ nmr:hasShift ?s3 ] .
FILTER ( fn:abs(?s1 - 17.6) < 0.3 &&
         fn:abs(?s2 - 18.3) < 0.3 &&
         fn:abs(?s3 - 22.6) < 0.3 )
} LIMIT 1


Prolog code

findMolWithPeakValsNear( SearchShiftVals, Mols ) :-
  % Pick each 'Mol' that matches the pattern:
  %   listPeakShiftsOfMol( Mol, MolShiftVals ),
  %   containsListElemsNear( SearchShiftVals, MolShiftVals )
  % and collect them in 'Mols'.
  % 'MolShiftVals^' keeps setof/3 from splitting the results per distinct shift list.
  setof( Mol,
         MolShiftVals^( listPeakShiftsOfMol( Mol, MolShiftVals ),                 % A Mol's shift values are collected
                        containsListElemsNear( SearchShiftVals, MolShiftVals ) ), % ...and compared against the given SearchShiftVals
         Mols ).                                                                  % In 'Mols', all 'Mol's whose shift values
                                                                                  % match the SearchShiftVals are collected.

% Given a 'Mol', give its shift values in list form, in 'ListOfPeaks'
listPeakShiftsOfMol( Mol, ListOfPeaks ) :-
  hasSpectrum( Mol, Spectrum ),
  findall( ShiftVal,     
           ( hasPeak( Spectrum, Peak ),
             hasShiftVal( Peak, ShiftVal ) ),     
             ListOfPeaks ).    
 
% Compare two lists to see if list2 has near-matches for each of the values in list1
containsListElemsNear( [ElemHead|ElemTail], List ) :-    
  memberCloseTo( ElemHead, List ),
  ( containsListElemsNear( ElemTail, List );    
    ElemTail == [] ).  
 
%%%%%%%%%%%%%%%%%%%%%%%%
% Recursive construct: %
%%%%%%%%%%%%%%%%%%%%%%%%
% First test the end criterion:
memberCloseTo( X, [ Y | _Tail ] ) :-
  closeTo( X, Y ).
% but if the above doesn't hold, then recursively continue with the tail of the list:
memberCloseTo( X, [ _Y | Tail ] ) :-
  memberCloseTo( X, Tail ).
 
% Numerical near-match    
closeTo( Val1, Val2 ) :-
  abs(Val1 - Val2) =< 0.3.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Convenience accessory methods %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
hasShiftVal( Peak, ShiftVal ) :-
  rdf_db:rdf( Peak, 'http://www.nmrshiftdb.org/onto#hasShift',
              literal(type('http://www.w3.org/2001/XMLSchema#decimal', ShiftValLiteral)) ),
  atom_number_create( ShiftValLiteral, ShiftVal ).
hasSpectrum( Mol, Spectrum ) :-
  rdf_db:rdf( Mol, 'http://www.nmrshiftdb.org/onto#hasSpectrum', Spectrum ).
hasPeak( Spectrum, Peak ) :-
  rdf_db:rdf( Spectrum, 'http://www.nmrshiftdb.org/onto#hasPeak', Peak ).

% Wrapper for the atom_number/2 predicate, which converts atoms (string constants) to numbers.
% The wrapper avoids exceptions on empty atoms, instead converting them into a zero.
atom_number_create( Atom, Number ) :-
  (  atom_length( Atom, AtomLength ), AtomLength > 0   % IF atom is not empty
  -> atom_number( Atom, Number )                       % THEN convert the atom to a numerical value
  ;  atom_number( '0', Number ) ).                     % ELSE convert to a zero

Attached files

Attached are the datasets (nmrshiftdata10...100.rdf.xml) for the different number of spectra, and the Bioclipse JavaScript files for the tests (NMR.Pellet.js, NMR.Jena.js and NMR.Swipl.js, as well as NMR.Pellet.Init, which I used for preparing the RDF stores for both Jena and Pellet).

NMR SPARQL search problem solved

The problem I've had with finding a working SPARQL query for similarity search of spectra against a list of peak shift values (documented on this blog: post 1, post 2, and on semanticoverflow) is now finally solved, thanks to helpful advice from Brandon Ibach on the [pellet-users] mailing list.

Great SPARQL FAQ

This is a great SPARQL FAQ, and seemingly a good starting point for finding out anything SPARQL.

Tricky reasoning problem for SPARQL and OWL: Lists of property chains containing numerical value constraints

EDIT (19/2): The problem of doing this with a SPARQL query is now solved! See this blog post.


Thought I should write up my NMRSpectrum similarity search problem, which I've got quite stuck on ... even after trying to get some advice from Semantic Overflow. So ... here we go.


I have a problem that I can't seem to express successfully either in pure SPARQL or with OWL class descriptions, seemingly because the problem combines lists, property chains, and numerical value constraints in a troublesome mix.

Backtracking - Key difference between SPARQL and Prolog?

On something I realized a minute ago ...

Though they are really different types of technologies, it might at first be tempting to compare a SPARQL query with a Prolog rule returning a list of results (or at least it was to me until just a minute ago). In fact, SPARQL queries and Prolog rules that return their results as lists DO share some similarities. For example, in both you provide patterns of RDF statements with variables that are to be bound to each other or to RDF entities, in order to find all queried entities that match the pattern.

But then come some important differences in how SPARQL handles cases where one wants to evaluate looked-up entities with a function, such as an arithmetic one. In SPARQL this has to be done (I think) with the FILTER construct, but that also means that no backtracking is done if nothing passes the filter (and that is the true meaning of a filter anyway, isn't it?).
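
To make the two ways of thinking concrete, here is a toy Python sketch of my mental model. It assumes made-up data and hypothetical names (`SPECTRA`, `filter_style`, `backtracking_style`), and it says nothing about how Jena, Pellet or SWI-Prolog actually evaluate things internally; real SPARQL engines are free to reorder and optimize. The first function binds the whole pattern and applies the numerical test afterwards, the way I picture FILTER. The second tests each value as soon as it is bound and backs up on failure, the way I picture a Prolog rule.

```python
from itertools import product

# Toy data, made up for illustration: spectrum name -> peak shift values.
SPECTRA = {
    "spectrumA": [17.5, 18.4, 22.7, 40.1],
    "spectrumB": [17.6, 25.0, 30.2],
}
TARGETS = [17.6, 18.3, 22.6]
TOLERANCE = 0.3


def close_to(a, b):
    """Numerical near-match, as in the FILTER expression of the query."""
    return abs(a - b) < TOLERANCE


def filter_style(spectra, targets):
    """Bind the full pattern first, then filter: enumerate every combination
    of peaks the pattern could bind and keep those that pass the test."""
    results = []
    for name, shifts in spectra.items():
        for combo in product(shifts, repeat=len(targets)):
            if all(close_to(s, t) for s, t in zip(combo, targets)):
                results.append((name, combo))
    return results


def backtracking_style(spectra, targets):
    """Test while binding: choose a peak for the first target, check it at
    once, then move on; when a choice fails, back up and try the next peak."""
    def search(shifts, remaining, chosen):
        if not remaining:
            yield tuple(chosen)
            return
        for s in shifts:
            if close_to(s, remaining[0]):
                yield from search(shifts, remaining[1:], chosen + [s])

    results = []
    for name, shifts in spectra.items():
        for combo in search(shifts, targets, []):
            results.append((name, combo))
    return results


print(filter_style(SPECTRA, TARGETS))
print(backtracking_style(SPECTRA, TARGETS))
```

On this toy data both styles find the same single match in spectrumA; what differs is when the numerical test is applied, and therefore how many candidate bindings get built before being thrown away.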