Bioclipse

Final version of degree project report

The last administrative details of my thesis project are now finished, and the report is available in its final form as a PDF download on this page (no. 14 in the list), or via this direct link. (The title of the project was "SWI-Prolog as a Semantic Web Tool for semantic querying in Bioclipse: Integration and performance benchmarking".)

Report approved - preliminary version for download

My degree project, titled "SWI-Prolog as a Semantic Web tool for semantic querying in Bioclipse", is nearing completion. My report has now been approved by the Scientific Reviewer (thanks, Prof. Mats Gustafsson), so I wanted to make it available here (Download PDF). Reports of typos are welcome, of course! :)

NSF soon requiring data management plans

Interesting: "Scientists Seeking NSF Funding Will Soon Be Required to Submit Data Management Plans". Is this a new trend? If so, it should be an area where Bioclipse can help, I guess, not least through the planned ability to export semantic data to a Semantic MediaWiki. (That will require me to finish my GSoC first, though.)

3rd Project Update (Integrating SWI-Prolog for Semantic Reasoning in Bioclipse)

I just had my 3rd and last project update presentation (before the final presentation on April 28th), presenting results from comparing the performance of the integrated SWI-Prolog against Jena and Pellet for a spectrum similarity search query. Find the slides below.

Correction of flawed results: Close competition between Jena and Prolog

UPDATE 29/3: See new results here

I reported in a previous blog post (with a bit of surprise) that Jena clearly outperformed SWI-Prolog for an NMR spectrum similarity search run inside Bioclipse. I have now realized that these previous results were indeed flawed, for a number of reasons.

Querying multiple SPARQL endpoints from single query, with Jena SERVICE extension

Egon pointed to an interesting blog post about a feature that is available as an extension to Jena, the semantic web framework available in Bioclipse. It makes it very easy to query multiple SPARQL endpoints from a single SPARQL query (using the SERVICE keyword), and to use variables bound at one endpoint when querying the next.

This is very useful in general. I was also thinking of the specific scenario (along the lines we have partly been thinking already) of using multiple Semantic MediaWikis as community-maintained databanks that can be queried back into Bioclipse. Being able to use multiple MediaWiki installs is very useful because it is hard to incorporate a very efficient access restriction system in MediaWiki (due to the nature of how it works, with template calls and all), so it is better to be able to have separate wikis for content which needs special restrictions.
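To make the SERVICE idea a bit more concrete, here is a minimal sketch of what such a federated query could look like, built as a plain string in Python. The endpoint URLs, the `ex:` prefix and the predicates are all made up for illustration; they are not taken from the blog post Egon pointed to, nor from any real wiki.

```python
# Hypothetical endpoints and vocabulary, for illustration only.
ENDPOINT_A = "http://example.org/wiki-a/sparql"
ENDPOINT_B = "http://example.org/wiki-b/sparql"


def build_federated_query(endpoint_a: str, endpoint_b: str) -> str:
    """Return a SPARQL query with two SERVICE blocks.

    ?compound is bound by the first endpoint and reused as a
    join variable when the second endpoint is queried.
    """
    return f"""
PREFIX ex: <http://example.org/schema#>
SELECT ?compound ?spectrum
WHERE {{
  SERVICE <{endpoint_a}> {{
    ?compound a ex:Compound .
  }}
  SERVICE <{endpoint_b}> {{
    ?spectrum ex:measuredFor ?compound .
  }}
}}
""".strip()


query = build_federated_query(ENDPOINT_A, ENDPOINT_B)
print(query)
```

With one wiki per access level, each SERVICE block would simply point at a different wiki's endpoint, while the query itself stays a single unit.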

Screencast: Experimental Prolog integration in Bioclipse

I wanted to test out some screencasting, so I chose to demo the (still experimental) SWI-Prolog integration in Bioclipse. The demo shows how Prolog code (or a "Prolog knowledge base") can conveniently be stored inside Bioclipse's JavaScript environment (in a JS variable), loaded into the Prolog engine, and then queried, all from the JS environment. Finally, the results can be returned to the JavaScript environment as well, for further processing or output.

Note that this is still at the experimental stage, so things are a bit rough around the edges!

"Orthogonal expressivity" of Pellet and Prolog?

Found a very interesting quote:

"Both OWL-DL and function-free Horn rules are decidable fragments of first-order logic with interesting, yet orthogonal expressive power"

Motik B, Sattler U, Studer R. Query Answering for OWL-DL with rules. Web Semantics: Science, Services and Agents on the World Wide Web. 2005;3(1):41-60. Available at: http://linkinghub.elsevier.com/retrieve/pii/S157082680500003X.

"Horn rules" are what Prolog builds upon (Prolog statements are Horn rules, AFAIS), so maybe Prolog fits into the category of "function-free Horn rules"? (Gotta try to figure that out.) OWL-DL, in turn, is the W3C standard for expressive semantics that reasoners like Pellet (which is available in Bioclipse) build upon.

Automating answering of questions with no answers - by wrapping simulations in semantics

What do you think of that title? :) To me it sounds like one of the (many) natural next steps forward for Bioclipse sometime in the future [1].

Explicit knowledge is too expensive

There are lots of things that can't be answered by a computer from data alone. Maybe the majority of what we humans perceive as knowledge is inferred from a combination of data (simple fact statements about reality) and rules that tell how facts can be combined together to allow making implicit knowledge (knowledge that is not persisted as facts anywhere, but has to be inferred from other facts and rules) become explicit.

One can easily imagine though, that storing every single piece of knowledge that could be stated, as an explicit fact, would require more storage than can probably ever be made available in this universe.
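The difference between explicit facts and implicit knowledge can be illustrated with a tiny sketch. The facts and the single rule below are made up for the example; in Prolog the rule would read `grandparent(X, Z) :- parent(X, Y), parent(Y, Z).`, and here it is simply applied by brute force in Python.

```python
from itertools import product

# Explicit facts, stored as (predicate, subject, object) triples.
# The names are invented for this example.
facts = {
    ("parent", "anna", "bo"),
    ("parent", "bo", "cia"),
}


def infer_grandparents(known):
    """Apply one rule: parent(X, Y) and parent(Y, Z) imply grandparent(X, Z)."""
    derived = set()
    for (p1, x, y1), (p2, y2, z) in product(known, repeat=2):
        if p1 == "parent" and p2 == "parent" and y1 == y2:
            derived.add(("grandparent", x, z))
    return derived


# Knowledge that was never stored as a fact, but follows from facts + rule.
implicit = infer_grandparents(facts) - facts
print(implicit)
```

Only two facts are stored, yet the grandparent relation can be produced on demand. Storing the rule is much cheaper than storing every statement it could ever generate.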

Simulations can make knowledge explicit, from first principles

It is not too hard to come up with processes that are so complex, and involve so much variability [2], that it is unrealistic to try to capture every imaginable state of the system or process in explicit facts. Instead we must seek the "first principles" that define the process, and through simulations make explicit any knowledge we are looking for, at the time we need it (one can of course cache often-accessed knowledge).

[solved] User defined datatypes not working in OWL 1.X

I seemingly ran into the trouble that user-defined datatypes do not work in OWL 1.X (which is seemingly what the version of Pellet used in Bioclipse supports?).