Initial performance comparison: Pellet vs Prolog in Bioclipse

I have started some initial performance testing on RDF data, comparing Pellet and Prolog, which are now both integrated in Bioclipse.

Importing data

Total time for importing nmrshiftdata with Pellet: 56.791 s

Loading the same data into Prolog, with the Pellet data already loaded:

Total time for importing nmrshiftdata with Prolog: 49.091 s

Listing all predicates

Pellet

Bioclipse JS Script

// JavaScript
var nmrShiftDBStore = pellet.createStore();
 
rdf.importFile(nmrShiftDBStore, "runningbioclipse/nmrshiftdata.100.R2.rdf.xml", "RDF/XML");
 
var start = new Date().getTime();
var sparql = "SELECT distinct ?predicate WHERE {   ?x ?predicate ?y. }";
js.say(rdf.sparql(nmrShiftDBStore, sparql));
var elapsed = (new Date().getTime() - start)/1000;
js.say("Total time for retreiving all predicates, with Pellet: " + elapsed + " s");

Result

[[http://www.nmrshiftdb.org/onto#hasShift],
[http://www.w3.org/2002/07/owl#sameAs],
[http://www.w3.org/1999/02/22-rdf-syntax-ns#type],
[http://www.nmrshiftdb.org/onto#hasSpectrum],
[http://www.nmrshiftdb.org/onto#moleculeId],
[http://www.blueobelisk.org/chemistryblogs/inchikey],
[http://purl.org/dc/elements/1.1/title],
[http://xmlns.com/foaf/0.1/homepage],
[http://www.blueobelisk.org/chemistryblogs/inchi],
[http://www.nmrshiftdb.org/onto#spectrumId],
[http://www.nmrshiftdb.org/onto#hasPeak],
[http://www.nmrshiftdb.org/onto#field],
[http://www.nmrshiftdb.org/onto#solvent],
[http://www.nmrshiftdb.org/onto#temperature],
[http://www.nmrshiftdb.org/onto#spectrumType],
[http://www.w3.org/2000/01/rdf-schema#subPropertyOf],
[http://www.w3.org/2002/07/owl#equivalentProperty],
[http://www.blueobelisk.org/chemistryblogs/casnumber],
[http://www.w3.org/2002/07/owl#disjointWith],
[http://www.w3.org/2002/07/owl#equivalentClass],
[http://www.w3.org/2000/01/rdf-schema#subClassOf],
[http://www.w3.org/2002/07/owl#complementOf]]
Total time for retreiving all predicates, with Pellet: 111.68 s

Note that six of the last seven predicates (all except casnumber, which also shows up in the Prolog result further down) are specific to Pellet. That is, they are built-in RDFS/OWL predicates, not coming from the NMRShift data.
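To check which predicates come from Pellet rather than from the data, the two result lists can be diffed. The following is a minimal sketch in plain JavaScript (it does not use any Bioclipse manager); the two arrays are copied by hand from the Pellet result above and the Prolog result further down in this post, and the variable names are only illustrative.

```javascript
// Predicates returned by the Pellet SPARQL query (copied from the result above).
var pelletPredicates = [
  "http://www.nmrshiftdb.org/onto#hasShift",
  "http://www.w3.org/2002/07/owl#sameAs",
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
  "http://www.nmrshiftdb.org/onto#hasSpectrum",
  "http://www.nmrshiftdb.org/onto#moleculeId",
  "http://www.blueobelisk.org/chemistryblogs/inchikey",
  "http://purl.org/dc/elements/1.1/title",
  "http://xmlns.com/foaf/0.1/homepage",
  "http://www.blueobelisk.org/chemistryblogs/inchi",
  "http://www.nmrshiftdb.org/onto#spectrumId",
  "http://www.nmrshiftdb.org/onto#hasPeak",
  "http://www.nmrshiftdb.org/onto#field",
  "http://www.nmrshiftdb.org/onto#solvent",
  "http://www.nmrshiftdb.org/onto#temperature",
  "http://www.nmrshiftdb.org/onto#spectrumType",
  "http://www.w3.org/2000/01/rdf-schema#subPropertyOf",
  "http://www.w3.org/2002/07/owl#equivalentProperty",
  "http://www.blueobelisk.org/chemistryblogs/casnumber",
  "http://www.w3.org/2002/07/owl#disjointWith",
  "http://www.w3.org/2002/07/owl#equivalentClass",
  "http://www.w3.org/2000/01/rdf-schema#subClassOf",
  "http://www.w3.org/2002/07/owl#complementOf"
];

// Predicates returned by the Prolog query (copied from the Prolog result in this post).
var prologPredicates = [
  "http://purl.org/dc/elements/1.1/title",
  "http://www.blueobelisk.org/chemistryblogs/casnumber",
  "http://www.blueobelisk.org/chemistryblogs/inchi",
  "http://www.blueobelisk.org/chemistryblogs/inchikey",
  "http://www.nmrshiftdb.org/onto#field",
  "http://www.nmrshiftdb.org/onto#hasPeak",
  "http://www.nmrshiftdb.org/onto#hasShift",
  "http://www.nmrshiftdb.org/onto#hasSpectrum",
  "http://www.nmrshiftdb.org/onto#moleculeId",
  "http://www.nmrshiftdb.org/onto#solvent",
  "http://www.nmrshiftdb.org/onto#spectrumId",
  "http://www.nmrshiftdb.org/onto#spectrumType",
  "http://www.nmrshiftdb.org/onto#temperature",
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
  "http://www.w3.org/2002/07/owl#sameAs",
  "http://xmlns.com/foaf/0.1/homepage"
];

// Keep only the predicates that Pellet reports but Prolog does not.
var pelletOnly = pelletPredicates.filter(function (p) {
  return prologPredicates.indexOf(p) === -1;
});

console.log(pelletOnly.length + " predicates appear only in the Pellet result:");
pelletOnly.forEach(function (p) {
  console.log("  " + p);
});
```

Everything this prints lives in the RDFS or OWL namespaces, which fits the idea that Pellet contributes these predicates itself.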

Prolog

Prolog function

listAllPredicates(Ps) :-
  setof(P, rdf_db:S^O^rdf( S, P, O ), Ps ).

(This code is placed in the file NMRShiftReasoner.pl)

Bioclipse JS Script

blipkit.init();
blipkit.loadRDFToProlog("/home/samuel/bioclipse-workspace/runningbioclipse/nmrshiftdata.100.R2.rdf.xml");
blipkit.consult('/home/samuel/bioclipse-workspace/runningbioclipse/NMRShiftReasoner.pl');
 
var start = new Date().getTime();
js.say(blipkit.queryProlog( [ "listAllPredicates", "100", "Ps" ] ));
var elapsed = (new Date().getTime() - start)/1000;
js.say("Total time for retreiving all predicates with Prolog: " + elapsed + " s");

Result

[['.'('http://purl.org/dc/elements/1.1/title',
'.'('http://www.blueobelisk.org/chemistryblogs/casnumber',
'.'('http://www.blueobelisk.org/chemistryblogs/inchi',
'.'('http://www.blueobelisk.org/chemistryblogs/inchikey',
'.'('http://www.nmrshiftdb.org/onto#field',
'.'('http://www.nmrshiftdb.org/onto#hasPeak',
'.'('http://www.nmrshiftdb.org/onto#hasShift',
'.'('http://www.nmrshiftdb.org/onto#hasSpectrum',
'.'('http://www.nmrshiftdb.org/onto#moleculeId',
'.'('http://www.nmrshiftdb.org/onto#solvent',
'.'('http://www.nmrshiftdb.org/onto#spectrumId',
'.'('http://www.nmrshiftdb.org/onto#spectrumType',
'.'('http://www.nmrshiftdb.org/onto#temperature',
'.'('http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
'.'('http://www.w3.org/2002/07/owl#sameAs',
'.'('http://xmlns.com/foaf/0.1/homepage', []))))))))))))))))]]
Total time for retreiving all predicates with Prolog: 0.023 s

Note that this Prolog method returns a single list, printed as nested '.'(Head, Tail) terms, rather than a set of separate instances/atoms, which explains the difference in output.
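If a flat list is easier to work with on the JavaScript side, the quoted atoms can be pulled out of the printed term. This is only a sketch in plain JavaScript: `extractAtoms` is a hypothetical helper, not part of blipkit, the sample string is a shortened version of the output above, and it assumes that no atom contains an escaped single quote.

```javascript
// Shortened sample of the nested '.'(Head, Tail) output shown above.
var prologOutput =
  "[['.'('http://purl.org/dc/elements/1.1/title', " +
  "'.'('http://www.nmrshiftdb.org/onto#hasPeak', " +
  "'.'('http://xmlns.com/foaf/0.1/homepage', [])))]]";

// Hypothetical helper: collect every single-quoted atom in the text,
// skipping the list constructor '.' itself.
function extractAtoms(text) {
  var atoms = [];
  var pattern = /'([^']*)'/g;
  var match;
  while ((match = pattern.exec(text)) !== null) {
    if (match[1] !== ".") {
      atoms.push(match[1]);
    }
  }
  return atoms;
}

var predicates = extractAtoms(prologOutput);
console.log(predicates.join("\n"));
```

A regular expression is enough here because only the atoms are of interest; anything that needs the full term structure would call for a real Prolog term parser.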

There is obviously a problem with Pellet here: it takes 111.68 s to retrieve all predicates, whereas Prolog does the same thing in 0.023 s, which is roughly 4,900 times faster. Talking to Egon, we figured out that it is most probably related to the fact that Pellet/Jena keeps the whole RDF store in memory only, so the thing to do would be to implement a database backend or similar for the RDF store.

  • This seems to be a good starting point.
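To put the numbers side by side, here is a small plain-JavaScript sketch that computes the ratios from the timings reported in this post. The object layout and names are my own, and these are single-run measurements, with the Prolog import done while the Pellet data was still loaded, so the ratios are only a rough indication.

```javascript
// Timings in seconds, copied from the measurements reported in this post.
var timings = {
  importData:     { pellet: 56.791, prolog: 49.091 },
  listPredicates: { pellet: 111.68, prolog: 0.023 }
};

// Ratio of Pellet time to Prolog time; values above 1 mean Prolog was faster.
function pelletToPrologRatio(entry) {
  return entry.pellet / entry.prolog;
}

var importRatio = pelletToPrologRatio(timings.importData);
var queryRatio = pelletToPrologRatio(timings.listPredicates);

console.log("Import: Pellet/Prolog time ratio = " + importRatio.toFixed(2));
console.log("List predicates: Pellet/Prolog time ratio = " + Math.round(queryRatio));
```

The import times are close to each other, so the large difference is specific to the predicate query rather than to loading the data.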