File-based RDF storage in Pellet, first tests
As reported in a previous blog post, I ran into Java stack size errors when importing large amounts of data into Pellet. Pellet was using just the in-memory Jena RDF store, which obviously limits the amount of data it can handle.
Jena offers other options for RDF storage, though, including SDB for SQL backends and TDB for pure-Java, file-based storage. The latter is said to be the faster and the easier to set up, which makes it suitable for the Pellet/RDF Bioclipse plugin, so I went ahead and implemented it (which was indeed very easy, once all the Eclipse classpath horror had been sorted out). I still have to refine it with better handling of file paths, good method naming, etc., before hopefully committing a patch tomorrow or so.
First comparable performance results: Pellet vs. Prolog
Meanwhile, I compared the time for the two operations from the last blog post (loading ~1 million triples and extracting all unique predicates) in Pellet and Prolog. The results are now much closer (Pellet's performance has improved around 1000-fold :) ), though Prolog is still the faster one:
Comparison results
| Operation | Pellet | Prolog |
|---|---|---|
| Loading ~1 million triples | 71.703 s | 49.519 s |
| Retrieving all unique predicates | 4.371 s | 3.508 s |
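For readers wondering what the second operation boils down to: it is just collecting the distinct predicate of every triple. Below is a minimal sketch in plain Java, assuming the triples are available as simple N-Triples lines. It does not use Pellet, Jena or Prolog, and the class name `UniquePredicates`, the method `uniquePredicates` and the `example.org` sample triples are hypothetical names made up for illustration, so the timing it prints says nothing about the numbers in the table.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniquePredicates {

    /**
     * Returns the distinct predicates found in the given N-Triples lines,
     * in the order they are first seen. Assumes one triple per line.
     */
    public static Set<String> uniquePredicates(List<String> ntriplesLines) {
        Set<String> predicates = new LinkedHashSet<>();
        for (String line : ntriplesLines) {
            String trimmed = line.trim();
            // Skip blank lines and comments.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // Subject and predicate never contain whitespace in N-Triples,
            // so splitting into at most three parts leaves the object intact.
            String[] parts = trimmed.split("\\s+", 3);
            if (parts.length < 3) {
                continue; // malformed line, ignored in this sketch
            }
            String predicate = parts[1];
            // Strip the angle brackets around the IRI.
            if (predicate.startsWith("<") && predicate.endsWith(">")) {
                predicate = predicate.substring(1, predicate.length() - 1);
            }
            predicates.add(predicate);
        }
        return predicates;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; subjects and objects are made up.
        List<String> sample = List.of(
            "<http://example.org/mol/1> <http://www.nmrshiftdb.org/onto#hasSpectrum> <http://example.org/spec/1> .",
            "<http://example.org/spec/1> <http://www.nmrshiftdb.org/onto#hasPeak> <http://example.org/peak/1> .",
            "<http://example.org/mol/1> <http://purl.org/dc/elements/1.1/title> \"some molecule\" .",
            "<http://example.org/mol/2> <http://www.nmrshiftdb.org/onto#hasSpectrum> <http://example.org/spec/2> ."
        );

        long start = System.nanoTime();
        Set<String> predicates = uniquePredicates(sample);
        double seconds = (System.nanoTime() - start) / 1e9;

        System.out.println(predicates);
        System.out.printf("Total time for retrieving all predicates: %.3f s%n", seconds);
    }
}
```

A `LinkedHashSet` keeps the first-seen order, which is roughly how the Pellet output looks; switching to a `TreeSet` would give alphabetically sorted output like the Prolog list.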
Pellet
Total time for importing nmrshiftdata with Pellet: 71.703 s
[[http://www.nmrshiftdb.org/onto#moleculeId],
[http://xmlns.com/foaf/0.1/homepage],
[http://www.blueobelisk.org/chemistryblogs/casnumber],
[http://www.w3.org/2002/07/owl#sameAs],
[http://www.blueobelisk.org/chemistryblogs/inchi],
[http://www.blueobelisk.org/chemistryblogs/inchikey],
[http://www.nmrshiftdb.org/onto#hasSpectrum],
[http://purl.org/dc/elements/1.1/title],
[http://www.nmrshiftdb.org/onto#spectrumId],
[http://www.nmrshiftdb.org/onto#spectrumType],
[http://www.nmrshiftdb.org/onto#hasPeak],
[http://www.nmrshiftdb.org/onto#temperature],
[http://www.nmrshiftdb.org/onto#solvent],
[http://www.nmrshiftdb.org/onto#field],
[http://www.w3.org/1999/02/22-rdf-syntax-ns#type],
[http://www.nmrshiftdb.org/onto#hasShift],
[http://purl.org/dc/elements/1.1/source], [http://purl.org/ontology/bibo/doi]]
Total time for retrieving all predicates with Pellet: 4.371 s
Note that Pellet returns some extra, built-in OWL predicates.
Prolog
Total time for importing nmrshiftdata with Prolog: 49.519 s
[['.'('http://purl.org/dc/elements/1.1/source',
'.'('http://purl.org/dc/elements/1.1/title',
'.'('http://purl.org/ontology/bibo/doi',
'.'('http://www.blueobelisk.org/chemistryblogs/casnumber',
'.'('http://www.blueobelisk.org/chemistryblogs/inchi',
'.'('http://www.blueobelisk.org/chemistryblogs/inchikey',
'.'('http://www.nmrshiftdb.org/onto#field',
'.'('http://www.nmrshiftdb.org/onto#hasPeak',
'.'('http://www.nmrshiftdb.org/onto#hasShift',
'.'('http://www.nmrshiftdb.org/onto#hasSpectrum',
'.'('http://www.nmrshiftdb.org/onto#moleculeId',
'.'('http://www.nmrshiftdb.org/onto#solvent',
'.'('http://www.nmrshiftdb.org/onto#spectrumId',
'.'('http://www.nmrshiftdb.org/onto#spectrumType',
'.'('http://www.nmrshiftdb.org/onto#temperature',
'.'('http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
'.'('http://www.w3.org/2002/07/owl#sameAs',
'.'('http://xmlns.com/foaf/0.1/homepage', []))))))))))))))))))]]
Total time for retrieving all predicates with Prolog: 3.508 s
To be continued...