As reported in a previous blog post, I ran into Java stack size errors when importing large amounts of data into Pellet. Pellet was using just the in-memory Jena RDF store, which obviously limits the amount of data it can handle.
Jena offers other options for RDF storage though, including SDB for SQL backends, and TDB for pure-Java, file-based storage.
I started with some initial performance testing for RDF data, comparing Pellet and Prolog, which are now both available integrated in Bioclipse.
I needed a Bioclipse manager method that could take an arbitrary number of arguments (for a general-purpose Prolog method mapper). Through a useful discussion with jonalv, we figured out that there is at least one working way of doing this, while a number of other approaches work in Java alone but not across the Rhino/JavaScript boundary.
Looking into some NMR use cases for my project, I realize that my knowledge from the Organic Chemistry course some four years ago has largely faded away. Luckily I found this excellent NMR intro lecture on YouTube:
A strategy for how to work with the Bioclipse/JPL/Prolog/Blipkit combination I'm setting up is becoming clear.
The main idea with Bioclipse, as well as with having a Prolog engine available in it, is flexible and "interactive" knitting together of knowledge. One of the main questions regarding how to use a Bioclipse/JPL/Prolog/Blipkit combination has been where to put the bulk of the knowledge integration/reasoning code. There would in principle be three options for that:
These are a few technical (and other) problems that I'm now realizing I will have to solve, sooner or later:
, but it might clash with other uses of the same character.
Using SWI-Prolog's semweb package, I had extracted all predicates in an RDF source, containing some 1 million triples, into the following list:
I had the problem that in JPL (the Java Prolog API) you cannot put a namespace prefix in front of term names (atoms, variables, etc.), like so:
prologFunction( ns:'atom' ).
The best solution would be to have some kind of "namespace-like" support in the JS console of Bioclipse instead. One easy thing to do is to create a simple function that prepends the long URL prefix, so a JS example could be:
function molid ( term ) {
    return "http://pele.farmbio.uu.se/nmrshiftdb/?moleculeId=" + term;
}
blipkit.queryRDF(molid("234"),"X","Y");
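The same idea can be generalised a little, as a rough sketch: one lookup table of prefixes and a single function that expands "prefix:local" strings. The names `prefixes` and `expand` are hypothetical and not part of Bioclipse or Blipkit; only the nmrshiftdb URL is taken from the example above.

```javascript
// Hypothetical prefix table; extend with whatever namespaces are needed.
var prefixes = {
    molid: "http://pele.farmbio.uu.se/nmrshiftdb/?moleculeId="
};

// Expand "prefix:local" into a full URL. Strings without a known prefix
// (Prolog variables like "X", or already complete URLs) are returned as-is.
function expand(term) {
    var i = term.indexOf(":");
    if (i < 0) {
        return term;
    }
    var prefix = term.substring(0, i);
    if (!prefixes.hasOwnProperty(prefix)) {
        return term;
    }
    return prefixes[prefix] + term.substring(i + 1);
}

// Intended use in the Bioclipse JS console (assumes the blipkit manager):
// blipkit.queryRDF(expand("molid:234"), "X", "Y");
```

Since unknown prefixes pass through untouched, every argument can be wrapped in `expand` without checking first whether it is a variable or a full URL.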
I had problems running rdf_db from inside Bioclipse, and kept getting errors similar to this one:
Running JavaScript...
org.mozilla.javascript.WrappedException: Wrapped java.lang.RuntimeException: Failed to run method (line: #9)
jpl.PrologException: PrologException: error(existence_error(procedure, /(rdf_load, 1)), context(:(system, /('$c_call_prolog', 0)), _0))
JavaScript done.
This was solved by adding the following line to the blipstart.pl file:
:- use_module(library('semweb/rdf_db')).
So the last part of the file now looks like:
I found a nice introduction to the use of RDF in Prolog (SWI-Prolog). It contains short primers for both RDF and Prolog, so it should be accessible to anyone with a minimal programming background: