Backtracking - Key difference between SPARQL and Prolog?

On something I realized a minute ago ...

Though being really different types of technologies, it might at first be tempting to compare a SPARQL query with a Prolog rule returning a list of results (Or at least it was to me until just a minute ago). In fact, SPARQL queries and prolog rules that return their results as lists, DO share some similarities. For example, in both you provide patterns of RDF statements with variables that are to be bound to each other or to RDF entities, in order to find all queried entities that match the pattern.

But then comes some important differences, in how SPARQL handles cases where one wants to evaluate looked-up entities with a function, like an arithmetic one. In SPARQL this has to be done (I think) with the FILTER construct, but that also means that backtracking is not done if nothing passes the filter (and that is the true meaning of a filter anyway, isn't it).

So, while a simple case might be possible to implement in both, like the following query for finding a spectrum with only one peak of a certain shift. That could look like this in Prolog (Items starting with a Capital letter are variables):

findSpectrumWithPeakOfShiftValue( Spectrum, Peakshiftvalue ) :-
  hasPeak( Spectrum, Peak ),
  hasShift( Peak, Shift ),
  valueNear( Shift, PeakShiftValue ).
 
valueNear( Val1, Val2 ) :-
  abs(Val1 - Val2) =< 0.3.

...and like this in SPARQL:
PREFIX onto: <http://www.nmrshiftdb.org/onto#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
SELECT distinct ?spectrum
WHERE {
  ?spectrum onto:hasPeak ?peak .
  ?peak onto:hasShift ?shift .
  FILTER (
    fn:abs(?shift - [SomeValue]) < [SomeTresholdValue]
  )
}"; 

But the problems come when you want to do more complicated searches, like finding spectra based on a list of peak shift values, which should find near-matches in the spectra to be returned. So for the following prolog code, which does exactly that, I could find no way to implement it solely in SPARQL:

findMoleculeWithPeakValuesNear( SearchShiftValues, Molecules ) :-
% Pick the [Molecule]s, that match the pattern:
%   listPeakShiftsOfMolecule( Molecule, MoleculeShiftValues ),
%   containsListElementsNear( SearchShiftValues, MoleculeShiftValues )
% and collect them in [Molecules].
  setof( Molecule,  
         ( listPeakShiftsOfMolecule( Molecule, MoleculeShiftValues ),            % A Molecule's shift values are collected
           containsListElementsNear( SearchShiftValues, MoleculeShiftValues ) ), % ...and compared against the given SearchShiftValues
         Molecules ).                                                            % In [Molecules], all [Molecule]s, for which their shift
                                                                                 % values match the SearchShiftValues, are collected.
 
% Given a [Molecule], give it's shiftvalues in list form, in [MoleculeShiftValues]
listPeakShiftsOfMolecule( Molecule, MoleculeShiftValues ) :-
  hasSpectrum( Molecule, Spectrum ),               % Given a Molecule, pick it's Spectrum
  findall( ShiftValue,                                                         % Find all [ShiftValue]s of [Molecule]
           ( hasPeak( Spectrum, Peak ),            % and collect them in [MoleculeShiftValues]
             hasShiftValue( Peak, ShiftValue ) ),
           MoleculeShiftValues ).
 
% containsListElementsNear( List1, List2 ) :-
% Compare two lists to see if list2 has near-matches for each of the values in list1
containsListElementsNear( [ElemHead|ElemTail], List ) :-
  memberCloseTo( ElemHead, List ),               % See if first element of List1 has a near-match in List2
  ( containsListElementsNear( ElemTail, List );  % Recursively do the same test, now with the rest of List1
    ElemTail == [] ).                            % (That is, List1 - it's first element), [ElemTail].
 
% Recursive construct:
% -----------------------------
% Test:
memberCloseTo( List1, [ List2Head | List2Tail ] ) :-
  closeTo( List1, List2Head ).
% but if the above doesn't validate, then recursively continue with the tail of List2
memberCloseTo( List1, [ List2Head | List2Tail ] ) :-
  memberCloseTo( List1, List2Tail ).
 
% Numerical near-match
closeTo( Val1, Val2 ) :-
  abs(Val1 - Val2) =< 0.3.
 
% Some convenience methods
 
hasSpectrumId( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#spectrumId', Predicate).
 
hasShiftValue( Peak, ShiftValue ) :-
  rdf_db:rdf( Peak, 'http://www.nmrshiftdb.org/onto#hasShift', literal(type('nmr:ppm', ShiftValueLiteral))),
  atom_number_create( ShiftValueLiteral, ShiftValue ).
 
hasPeak( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#hasPeak', Predicate).

So... probably I will have to perform the search by constructing a custom OWL class instead, but then comes questions like: where do I specify the list of shifts that I want to match against? Does that have to be "hard-coded" in the OWL class defenition, or can I submit it in the SPARQL query in some way? That will hopefully be more clear soon.

Comments

prolog style

Hi Samuel

You might want to try documenting your code using the pldoc style:

%% findSpectrumWithPeakOfShiftValue( ?Spectrum, +Peakshiftvalue ) is nondet

I am guessing the mode here, as it seems that your predicate would fail if Peakshiftvalue is unbound.

Also, predicate names with "find" in term are very un-prolog like and reflect a procedural mindset. I advocate naming predicates like relations (and not using camel case!)

I'd also advocate switching the order such that the bound variable comes first

Your second code snippet is also quite procedural - I don't think you need so many findall and member predicates. But it's hard to see what you're doing. Do you have example datasets?

Good hints

Thanks! These are good hints.

I'll try to improve the problem description shortly, providing sample data etc.

Sample data

I posted a short sample data extract at a post on semanticoverflow, but I'll try add it here as well.

(... and I should definitely comment the prolog code!)

Probably due to list processing

Chris Mungall wrote:
"Your second code snippet is also quite procedural - I don't think you need so many findall and member predicates. But it's hard to see what you're doing. Do you have example datasets?"

True. I guess it is be due to all the list processing, since I'm taking a list of shift values as "input".

If creating a prolog clause with hard coded shift values to search for (using it a bit like a SPARQL query), I guess I had not needed that, and could probably have gone something like (for example):

find_spectra_with_hardcoded_shift_values( Spectra ) :-
  hasPeak( Spectrum, Peak1 ),
  hasShiftValue( Peak1, Shift1Value ),
  closeTo( Shift1Value, '12.8' ),
 
  hasPeak( Spectrum, Peak2 ),
  hasShiftValue( Peak2, Shift2Value ),
  closeTo( Shift2Value, '12.8' ),
 
  ... more peaks ...
 
  hasPeak( Spectrum, Peak8 ),
  hasShiftValue( Peak8, Shift8Value ),
  closeTo( Shift8Value, '122.1' ).

Maybe this is the way to go (that could possibly be a useful way of using prolog, from Bioclipse). But I guess, if wanting to take a list, of shift peaks, it is not so easily done in another way?

Solved!

The problem with doing this with a SPARQL query, is now solved! See: