I started with some initial performance testing for RDF data, comparing Pellet and Prolog, which are now both available integrated in Bioclipse.
Total time for importing nmrshiftdata with Pellet: 56.791 s
Loading into Prolog, with the Pellet data already loaded:
Total time for importing nmrshiftdata with Prolog: 49.091 s
// JavaScript
// Create a Pellet-backed store and import the RDF/XML file into it
var nmrShiftDBStore = pellet.createStore();
rdf.importFile(nmrShiftDBStore, "runningbioclipse/nmrshiftdata.100.R2.rdf.xml", "RDF/XML");
// Timing starts after the import, so only the query is measured
var start = new Date().getTime();
var sparql = "SELECT distinct ?predicate WHERE { ?x ?predicate ?y. }";
js.say(rdf.sparql(nmrShiftDBStore, sparql));
var elapsed = (new Date().getTime() - start)/1000;
js.say("Total time for retreiving all predicates, with Pellet: " + elapsed + " s");
[[http://www.nmrshiftdb.org/onto#hasShift],
[http://www.w3.org/2002/07/owl#sameAs],
[http://www.w3.org/1999/02/22-rdf-syntax-ns#type],
[http://www.nmrshiftdb.org/onto#hasSpectrum],
[http://www.nmrshiftdb.org/onto#moleculeId],
[http://www.blueobelisk.org/chemistryblogs/inchikey],
[http://purl.org/dc/elements/1.1/title],
[http://xmlns.com/foaf/0.1/homepage],
[http://www.blueobelisk.org/chemistryblogs/inchi],
[http://www.nmrshiftdb.org/onto#spectrumId],
[http://www.nmrshiftdb.org/onto#hasPeak],
[http://www.nmrshiftdb.org/onto#field],
[http://www.nmrshiftdb.org/onto#solvent],
[http://www.nmrshiftdb.org/onto#temperature],
[http://www.nmrshiftdb.org/onto#spectrumType],
[http://www.w3.org/2000/01/rdf-schema#subPropertyOf],
[http://www.w3.org/2002/07/owl#equivalentProperty],
[http://www.blueobelisk.org/chemistryblogs/casnumber],
[http://www.w3.org/2002/07/owl#disjointWith],
[http://www.w3.org/2002/07/owl#equivalentClass],
[http://www.w3.org/2000/01/rdf-schema#subClassOf],
[http://www.w3.org/2002/07/owl#complementOf]]
Total time for retreiving all predicates, with Pellet: 111.68 s
listAllPredicates(Ps) :- setof(P, rdf_db:S^O^rdf( S, P, O ), Ps ).
// Start the Prolog engine, load the RDF file, and consult the predicate definitions
blipkit.init();
blipkit.loadRDFToProlog("/home/samuel/bioclipse-workspace/runningbioclipse/nmrshiftdata.100.R2.rdf.xml");
blipkit.consult('/home/samuel/bioclipse-workspace/runningbioclipse/NMRShiftReasoner.pl');
// Timing starts after loading, so only the query is measured
var start = new Date().getTime();
js.say(blipkit.queryProlog( [ "listAllPredicates", "100", "Ps" ] ));
var elapsed = (new Date().getTime() - start)/1000;
js.say("Total time for retreiving all predicates with Prolog: " + elapsed + " s");
[['.'('http://purl.org/dc/elements/1.1/title',
'.'('http://www.blueobelisk.org/chemistryblogs/casnumber',
'.'('http://www.blueobelisk.org/chemistryblogs/inchi',
'.'('http://www.blueobelisk.org/chemistryblogs/inchikey',
'.'('http://www.nmrshiftdb.org/onto#field',
'.'('http://www.nmrshiftdb.org/onto#hasPeak',
'.'('http://www.nmrshiftdb.org/onto#hasShift',
'.'('http://www.nmrshiftdb.org/onto#hasSpectrum',
'.'('http://www.nmrshiftdb.org/onto#moleculeId',
'.'('http://www.nmrshiftdb.org/onto#solvent',
'.'('http://www.nmrshiftdb.org/onto#spectrumId',
'.'('http://www.nmrshiftdb.org/onto#spectrumType',
'.'('http://www.nmrshiftdb.org/onto#temperature',
'.'('http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
'.'('http://www.w3.org/2002/07/owl#sameAs',
'.'('http://xmlns.com/foaf/0.1/homepage', []))))))))))))))))]]
Total time for retreiving all predicates with Prolog: 0.023 s
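The two result lists are not the same length: the Pellet output has 22 predicates and the Prolog output has 16. As a minimal sketch in plain JavaScript, with both lists copied by hand from the output above and a `difference` helper that is purely illustrative (it is not part of Bioclipse), the predicates that differ can be listed like this:

```javascript
// Predicates copied from the Pellet (SPARQL) output above
var pelletPredicates = [
  "http://www.nmrshiftdb.org/onto#hasShift",
  "http://www.w3.org/2002/07/owl#sameAs",
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
  "http://www.nmrshiftdb.org/onto#hasSpectrum",
  "http://www.nmrshiftdb.org/onto#moleculeId",
  "http://www.blueobelisk.org/chemistryblogs/inchikey",
  "http://purl.org/dc/elements/1.1/title",
  "http://xmlns.com/foaf/0.1/homepage",
  "http://www.blueobelisk.org/chemistryblogs/inchi",
  "http://www.nmrshiftdb.org/onto#spectrumId",
  "http://www.nmrshiftdb.org/onto#hasPeak",
  "http://www.nmrshiftdb.org/onto#field",
  "http://www.nmrshiftdb.org/onto#solvent",
  "http://www.nmrshiftdb.org/onto#temperature",
  "http://www.nmrshiftdb.org/onto#spectrumType",
  "http://www.w3.org/2000/01/rdf-schema#subPropertyOf",
  "http://www.w3.org/2002/07/owl#equivalentProperty",
  "http://www.blueobelisk.org/chemistryblogs/casnumber",
  "http://www.w3.org/2002/07/owl#disjointWith",
  "http://www.w3.org/2002/07/owl#equivalentClass",
  "http://www.w3.org/2000/01/rdf-schema#subClassOf",
  "http://www.w3.org/2002/07/owl#complementOf"
];

// Predicates copied from the Prolog (setof) output above
var prologPredicates = [
  "http://purl.org/dc/elements/1.1/title",
  "http://www.blueobelisk.org/chemistryblogs/casnumber",
  "http://www.blueobelisk.org/chemistryblogs/inchi",
  "http://www.blueobelisk.org/chemistryblogs/inchikey",
  "http://www.nmrshiftdb.org/onto#field",
  "http://www.nmrshiftdb.org/onto#hasPeak",
  "http://www.nmrshiftdb.org/onto#hasShift",
  "http://www.nmrshiftdb.org/onto#hasSpectrum",
  "http://www.nmrshiftdb.org/onto#moleculeId",
  "http://www.nmrshiftdb.org/onto#solvent",
  "http://www.nmrshiftdb.org/onto#spectrumId",
  "http://www.nmrshiftdb.org/onto#spectrumType",
  "http://www.nmrshiftdb.org/onto#temperature",
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
  "http://www.w3.org/2002/07/owl#sameAs",
  "http://xmlns.com/foaf/0.1/homepage"
];

// Illustrative helper: items of `a` that do not occur in `b`
function difference(a, b) {
  return a.filter(function (item) {
    return b.indexOf(item) === -1;
  });
}

var onlyInPellet = difference(pelletPredicates, prologPredicates);
var onlyInProlog = difference(prologPredicates, pelletPredicates);
```

Here `onlyInPellet` holds the six RDFS/OWL terms (`subPropertyOf`, `equivalentProperty`, `disjointWith`, `equivalentClass`, `subClassOf`, `complementOf`) and `onlyInProlog` is empty. That suggests, though it is not tested here, that the Pellet query also covers triples the Prolog query never sees, so the two timings may not measure exactly the same work.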
There are obviously some problems with Pellet here, in that it takes 111.68 s to retrieve all predicates, whereas Prolog answers the corresponding query in 0.023 s. Talking to Egon, we figured out it is most probably related to the fact that Pellet/Jena keeps the whole RDF store in memory only, so the thing to do would be to implement a database backend or similar for the RDF store.
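Both measurements above repeat the same start/elapsed pattern. If the test is rerun later, for example against a database-backed store, a small timing wrapper keeps the runs comparable. The following is only a sketch: `timed` is a hypothetical helper, not a Bioclipse function, and the workload is a stand-in so that the snippet runs outside Bioclipse.

```javascript
// Hypothetical helper (not part of Bioclipse): run `fn` once and
// report its result together with the wall-clock time in seconds.
function timed(label, fn) {
  var start = new Date().getTime();
  var result = fn();
  var seconds = (new Date().getTime() - start) / 1000;
  return { label: label, result: result, seconds: seconds };
}

// Stand-in workload; in Bioclipse the function body would instead wrap
// the rdf.sparql(...) or blipkit.queryProlog(...) call shown earlier.
var run = timed("dummy query", function () {
  var sum = 0;
  for (var i = 0; i < 1000; i++) {
    sum += i;
  }
  return sum;
});
```

Inside Bioclipse, `run.seconds` could then be reported with `js.say(run.label + ": " + run.seconds + " s")`, in the same way as the scripts above.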