I have started some initial performance testing on RDF data, comparing Pellet and Prolog, which are now both available integrated in Bioclipse.
Total time for importing nmrshiftdata with Pellet: 56.791 s
Loading into Prolog, with the Pellet data already loaded:
Total time for importing nmrshiftdata with Prolog: 49.091 s
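The script behind these import timings is not shown. Below is a minimal sketch of how such a measurement could be wrapped, using the same start/elapsed pattern as the query scripts in this post. `timeIt` is a hypothetical helper, not a Bioclipse function, and the workload is a stand-in so that the snippet runs in any JavaScript engine:

```javascript
// Hypothetical helper (not part of Bioclipse): runs `action`, reports the
// elapsed wall-clock time through `say`, and returns the time in seconds.
function timeIt(label, action, say) {
    var start = new Date().getTime();
    action();
    var elapsed = (new Date().getTime() - start) / 1000;
    say("Total time for " + label + ": " + elapsed + " s");
    return elapsed;
}

// Stand-in workload so the sketch runs outside Bioclipse. Inside Bioclipse,
// the action would wrap the import call (for example rdf.importFile(...))
// and `say` would be js.say.
var messages = [];
var seconds = timeIt("importing nmrshiftdata with a stand-in", function () {
    var sum = 0;
    for (var i = 0; i < 100000; i++) { sum += i; }
}, function (msg) { messages.push(msg); });
```

Returning the elapsed time as a number, in addition to printing it, makes it easy to repeat a measurement a few times and compare runs.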
// JavaScript
var nmrShiftDBStore = pellet.createStore();
rdf.importFile(nmrShiftDBStore, "runningbioclipse/nmrshiftdata.100.R2.rdf.xml", "RDF/XML");
var start = new Date().getTime();
var sparql = "SELECT distinct ?predicate WHERE { ?x ?predicate ?y. }";
js.say(rdf.sparql(nmrShiftDBStore, sparql));
var elapsed = (new Date().getTime() - start)/1000;
js.say("Total time for retreiving all predicates, with Pellet: " + elapsed + " s");

[[http://www.nmrshiftdb.org/onto#hasShift], [http://www.w3.org/2002/07/owl#sameAs], [http://www.w3.org/1999/02/22-rdf-syntax-ns#type], [http://www.nmrshiftdb.org/onto#hasSpectrum], [http://www.nmrshiftdb.org/onto#moleculeId], [http://www.blueobelisk.org/chemistryblogs/inchikey], [http://purl.org/dc/elements/1.1/title], [http://xmlns.com/foaf/0.1/homepage], [http://www.blueobelisk.org/chemistryblogs/inchi], [http://www.nmrshiftdb.org/onto#spectrumId], [http://www.nmrshiftdb.org/onto#hasPeak], [http://www.nmrshiftdb.org/onto#field], [http://www.nmrshiftdb.org/onto#solvent], [http://www.nmrshiftdb.org/onto#temperature], [http://www.nmrshiftdb.org/onto#spectrumType], [http://www.w3.org/2000/01/rdf-schema#subPropertyOf], [http://www.w3.org/2002/07/owl#equivalentProperty], [http://www.blueobelisk.org/chemistryblogs/casnumber], [http://www.w3.org/2002/07/owl#disjointWith], [http://www.w3.org/2002/07/owl#equivalentClass], [http://www.w3.org/2000/01/rdf-schema#subClassOf], [http://www.w3.org/2002/07/owl#complementOf]]
Total time for retreiving all predicates, with Pellet: 111.68 s

% Prolog: Ps is the sorted, duplicate-free list of every predicate P that occurs in some triple
listAllPredicates(Ps) :- setof(P, rdf_db:S^O^rdf( S, P, O ), Ps ).
blipkit.init();
blipkit.loadRDFToProlog("/home/samuel/bioclipse-workspace/runningbioclipse/nmrshiftdata.100.R2.rdf.xml");
blipkit.consult('/home/samuel/bioclipse-workspace/runningbioclipse/NMRShiftReasoner.pl');
var start = new Date().getTime();
js.say(blipkit.queryProlog( [ "listAllPredicates", "100", "Ps" ] ));
var elapsed = (new Date().getTime() - start)/1000;
js.say("Total time for retreiving all predicates with Prolog: " + elapsed + " s");

[['.'('http://purl.org/dc/elements/1.1/title',
'.'('http://www.blueobelisk.org/chemistryblogs/casnumber',
'.'('http://www.blueobelisk.org/chemistryblogs/inchi',
'.'('http://www.blueobelisk.org/chemistryblogs/inchikey',
'.'('http://www.nmrshiftdb.org/onto#field',
'.'('http://www.nmrshiftdb.org/onto#hasPeak',
'.'('http://www.nmrshiftdb.org/onto#hasShift',
'.'('http://www.nmrshiftdb.org/onto#hasSpectrum',
'.'('http://www.nmrshiftdb.org/onto#moleculeId',
'.'('http://www.nmrshiftdb.org/onto#solvent',
'.'('http://www.nmrshiftdb.org/onto#spectrumId',
'.'('http://www.nmrshiftdb.org/onto#spectrumType',
'.'('http://www.nmrshiftdb.org/onto#temperature',
'.'('http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
'.'('http://www.w3.org/2002/07/owl#sameAs',
'.'('http://xmlns.com/foaf/0.1/homepage', []))))))))))))))))]]
Total time for retreiving all predicates with Prolog: 0.023 s

There are obviously some problems with Pellet here: it takes 111.68 s to retrieve all predicates, whereas Prolog does the same thing in 0.023 s. Talking to Egon, we figured out that this is most probably related to the fact that Pellet/Jena keeps the whole RDF store in memory only, so the thing to do would be to implement a database backend, or similar, for the RDF store.
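
To put the numbers side by side, here is a small worked calculation of how much longer Pellet took than Prolog in each step. The timings are copied from the single runs reported in this post. The `timings` object and the `slowdown` function are illustrative names only, and `console.log` assumes a JavaScript engine that provides it (in the Bioclipse console, `js.say` would take its place):

```javascript
// Timings in seconds, copied from the single runs reported in this post.
var timings = {
    importing: { pellet: 56.791, prolog: 49.091 },
    listingPredicates: { pellet: 111.68, prolog: 0.023 }
};

// Ratio of Pellet time to Prolog time; values above 1 mean Prolog was faster.
function slowdown(entry) {
    return entry.pellet / entry.prolog;
}

var importRatio = slowdown(timings.importing);
var queryRatio = slowdown(timings.listingPredicates);

console.log("Import: Pellet took " + importRatio.toFixed(2) + " times as long as Prolog");
console.log("Query: Pellet took about " + Math.round(queryRatio) + " times as long as Prolog");
```

The import times are of the same order (a ratio of about 1.16), while the predicate query differs by a factor of roughly 4856. Since each figure comes from one run, these ratios are only a rough indication.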