This blog is currently used mostly to document my degree project at the Dept. of Pharmaceutical Sciences at Uppsala University, named "Integrating Blipkit/BioProlog for semantic reasoning in Bioclipse"
(Supervisor: Egon Willighagen).

I'm jotting down pretty much everything that I need to remember here, so pardon the mess of small posts about this and that. For less frequent posts, see the Planet Bioclipse tagged posts.

-- Samuel Lampa - firstname.lastname@rilnet.com

Project on hold (Course being taken)

As I'm taking a four-week course (molecular control systems) right now, my thesis project is on hold until the 17th of February. The project presentation will also be delayed until the end of April.

Looking forward to being back on track soon!

Backtracking - Key difference between SPARQL and Prolog

On something I realized a minute ago ...

Though they are really different types of technologies, it might at first be tempting to compare a SPARQL query with a Prolog rule returning a list of results (or at least it was to me until just a minute ago). In fact, SPARQL queries and Prolog rules that return their results as lists do share some similarities. For example, in both you provide patterns of RDF statements with variables that are to be bound to each other or to RDF entities, in order to find all queried entities that match the pattern.

But then come some important differences in how SPARQL handles cases where one wants to evaluate looked-up entities with a function, such as an arithmetic one. In SPARQL this has to be done (I think) with the FILTER construct, but that also means that backtracking is not done if nothing passes the filter (and that is the true meaning of a filter anyway, isn't it?).

[solved] User defined datatypes not working in OWL 1.X

I seemingly ran into the problem that user-defined datatypes do not work in OWL 1.X (which is seemingly what the version of Pellet used in Bioclipse supports?).

Idea: How to store Prolog rules in Bioclipse scripts

A very simple way of storing Prolog code inside Bioclipse scripts just struck me, one that avoids the need for a separate file containing the Prolog code. This might be very useful when using Prolog as a kind of "query language", somewhat analogous to how SPARQL is used.
(Update: Prolog in fact turns out to be more powerful than SPARQL in this regard, as shown by the observation in this blog post, that SPARQL doesn't support backtracking).

The idea would be to simply store the Prolog code in a Bioclipse JS variable, and create a special manager method that can write such Prolog query code (containing variables) to a temporary file in the workspace and then just tell Prolog to "consult" that file, thereby "feeding the Prolog engine" with the logic to use, from inside Bioclipse scripts.
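To make the idea a bit more concrete, here is a minimal sketch of what it could look like. It is written as plain Node.js JavaScript rather than against the actual Bioclipse scripting environment, and the function name writeAndBuildConsultGoal, the file name rules.pl and the example rule are all made up for illustration; a real manager method would write to the Bioclipse workspace and pass the goal on to the Prolog engine.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Prolog code kept inline in the script as a plain string.
// The rule itself is a hypothetical example, not taken from the project.
const prologCode = [
  "% hypothetical example rule",
  "heavy(X) :- mass(X, M), M > 100.",
].join("\n");

// Hypothetical stand-in for the manager method: writes the Prolog code to a
// temporary file and returns the goal a Prolog engine would be asked to run.
function writeAndBuildConsultGoal(code) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "prolog-"));
  const file = path.join(dir, "rules.pl");
  fs.writeFileSync(file, code + "\n", "utf8");
  // Use forward slashes and escape single quotes so that the path
  // is a valid quoted Prolog atom.
  const atom = file.replace(/\\/g, "/").replace(/'/g, "\\'");
  return { file: file, goal: "consult('" + atom + "')" };
}

const result = writeAndBuildConsultGoal(prologCode);
console.log(result.goal);
```

The sketch only builds the consult goal as a string; actually handing it to a running Prolog engine (for example through JPL) is left out, since that part depends on how the manager is implemented.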

File based RDF storage in Pellet, first tests

As reported in a previous blog post, I ran into Java stack size errors when importing large amounts of data into Pellet. Pellet was using just the in-memory Jena RDF store, which obviously puts limits on the amount of data it can handle.

Jena offers other options for RDF storage though, including SDB for SQL backends, and TDB for pure Java file-based storage.

Initial performance comparison: Pellet vs Prolog in Bioclipse

I started with some initial performance testing on RDF data, comparing Pellet and Prolog, which are now both available integrated in Bioclipse.

Bioclipse manager method to take arbitrary number of arguments

I needed a Bioclipse manager method that could take an arbitrary number of arguments (for a general-purpose Prolog method mapper). Through a useful discussion with jonalv, we figured out that there exists at least one working way of doing this, while there are a number of ways that do not work from the Rhino/JavaScript environment, though they work in Java alone.

A usage strategy emerges

A strategy for how to work with the Bioclipse/JPL/Prolog/Blipkit combination I'm setting up is becoming clear.

The main idea with Bioclipse, as well as with having a Prolog engine available in it, is flexible and "interactive" knitting together of knowledge. One of the main questions regarding how to use a Bioclipse/JPL/Prolog/Blipkit combination has been where to put the bulk of the knowledge integration/reasoning code. There would in principle be three options for that:

  1. Bioclipse (Javascript environment)
  2. The Blipkit-Prolog/Bioclipse integration plugin (Java code, a.k.a. "Manager methods")
  3. The prolog engine (As a prolog file)

Problems to solve

These are a few technical (and other types of) problems that I'm now realizing I will have to solve, sooner or later:

  • How to output data? As array/N3/RDF?
  • Output everything, or only one solution at a time? (Use flags, for different options?)
  • How to indicate that a number should be treated as a literal? (For now I'm using the following syntax: [00.0], but it might clash with other uses of the same character.)
  • Make IFile search paths resolvable to full paths, which Prolog can read.
  • How to output results if you are using more than one variable in a Prolog expression? (Multiple arrays? Multidimensional arrays?)

Converting RDF predicates to Prolog convenience methods with RegEx

Using SWI-Prolog's semweb package, I had extracted all predicates in an RDF source, containing some 1 million triples, into the following list: