As reported in a previous blog post, I ran into Java stack size errors when importing large amounts of data into Pellet. Pellet was using just the in-memory Jena RDF store, which obviously limits the amount of data it can handle.
Jena offers other options for RDF storage though, including SDB for SQL backends and TDB for pure-Java, file-based storage. The latter is said to be the faster and easier to set up, which makes it suitable for the Pellet/RDF Bioclipse plugin, so I went ahead and implemented it (which was indeed very easy, once all the Eclipse classpath horror had been sorted out). I will have to refine it with better handling of file paths, good method naming etc., before hopefully committing a patch tomorrow or so.
Meanwhile, I compared the time for the two operations from the last blog post (loading ~1 million triples and extracting all unique predicates) between Pellet and Prolog. Now the results are more similar (Pellet's performance has increased around 1000-fold :) ), though Prolog is still the faster one, with Pellet taking roughly 45% longer to load the data and 25% longer to retrieve the predicates:
| Operation | Pellet | Prolog |
|---|---|---|
| Loading ~1 million triples | 71.703 s | 49.519 s |
| Retrieving all unique predicates | 4.371 s | 3.508 s |
Total time for importing nmrshiftdata with Pellet: 71.703 s

[[http://www.nmrshiftdb.org/onto#moleculeId], [http://xmlns.com/foaf/0.1/homepage], [http://www.blueobelisk.org/chemistryblogs/casnumber], [http://www.w3.org/2002/07/owl#sameAs], [http://www.blueobelisk.org/chemistryblogs/inchi], [http://www.blueobelisk.org/chemistryblogs/inchikey], [http://www.nmrshiftdb.org/onto#hasSpectrum], [http://purl.org/dc/elements/1.1/title], [http://www.nmrshiftdb.org/onto#spectrumId], [http://www.nmrshiftdb.org/onto#spectrumType], [http://www.nmrshiftdb.org/onto#hasPeak], [http://www.nmrshiftdb.org/onto#temperature], [http://www.nmrshiftdb.org/onto#solvent], [http://www.nmrshiftdb.org/onto#field], [http://www.w3.org/1999/02/22-rdf-syntax-ns#type], [http://www.nmrshiftdb.org/onto#hasShift], [http://purl.org/dc/elements/1.1/source], [http://purl.org/ontology/bibo/doi]]

Total time for retreiving all predicates, with Pellet: 4.371 s

Total time for importing nmrshiftdata with Prolog: 49.519 s

[['.'('http://purl.org/dc/elements/1.1/source', '.'('http://purl.org/dc/elements/1.1/title', '.'('http://purl.org/ontology/bibo/doi', '.'('http://www.blueobelisk.org/chemistryblogs/casnumber', '.'('http://www.blueobelisk.org/chemistryblogs/inchi', '.'('http://www.blueobelisk.org/chemistryblogs/inchikey', '.'('http://www.nmrshiftdb.org/onto#field', '.'('http://www.nmrshiftdb.org/onto#hasPeak', '.'('http://www.nmrshiftdb.org/onto#hasShift', '.'('http://www.nmrshiftdb.org/onto#hasSpectrum', '.'('http://www.nmrshiftdb.org/onto#moleculeId', '.'('http://www.nmrshiftdb.org/onto#solvent', '.'('http://www.nmrshiftdb.org/onto#spectrumId', '.'('http://www.nmrshiftdb.org/onto#spectrumType', '.'('http://www.nmrshiftdb.org/onto#temperature', '.'('http://www.w3.org/1999/02/22-rdf-syntax-ns#type', '.'('http://www.w3.org/2002/07/owl#sameAs', '.'('http://xmlns.com/foaf/0.1/homepage', []))))))))))))))))))]]

Total time for retreiving all predicates with Prolog: 3.508 s
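For readers wondering what the second timed operation amounts to, here is a minimal sketch in plain Java of collecting the unique predicates from a set of triples and timing it. This is not the code behind the numbers above: the class `UniquePredicates`, the helper `uniquePredicates` and the sample triples are all made up for illustration, and it works on simple N-Triples style lines rather than on a Jena or Prolog store. In a Jena setup the same thing would more likely be expressed as a SPARQL `SELECT DISTINCT ?p` query.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: the names and sample data are assumptions,
// not taken from the Bioclipse plugin.
final class UniquePredicates {

    /**
     * Collects the distinct predicates from simple N-Triples style lines
     * of the form "<subject> <predicate> <object> .".
     */
    static Set<String> uniquePredicates(List<String> lines) {
        Set<String> predicates = new TreeSet<>(); // sorted, duplicates dropped
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            // Split into subject, predicate and the rest of the line.
            String[] parts = trimmed.split("\\s+", 3);
            if (parts.length < 3) {
                continue; // not a complete triple
            }
            String predicate = parts[1];
            // Strip the surrounding angle brackets of an IRI, if present.
            if (predicate.startsWith("<") && predicate.endsWith(">")) {
                predicate = predicate.substring(1, predicate.length() - 1);
            }
            predicates.add(predicate);
        }
        return predicates;
    }

    public static void main(String[] args) {
        // Made-up sample triples that reuse predicate names from the output above.
        List<String> sample = Arrays.asList(
            "<http://example.org/mol1> <http://www.nmrshiftdb.org/onto#moleculeId> \"1\" .",
            "<http://example.org/mol1> <http://www.nmrshiftdb.org/onto#hasSpectrum> <http://example.org/spec1> .",
            "<http://example.org/mol2> <http://www.nmrshiftdb.org/onto#moleculeId> \"2\" ."
        );

        long start = System.nanoTime();
        Set<String> result = uniquePredicates(sample);
        double seconds = (System.nanoTime() - start) / 1e9;

        System.out.println(result);
        System.out.printf("Total time for retrieving all predicates: %.3f s%n", seconds);
    }
}
```

The `TreeSet` both removes duplicates and keeps the predicates sorted, so the result comes out in alphabetical order, much like the Prolog list above.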
To be continued...