Egon pointed to an interesting blog post about a feature available as an extension to Jena, the semantic web framework available in Bioclipse. It makes it very easy to query multiple SPARQL endpoints from a single SPARQL query (using the SERVICE keyword), and to use variables bound from one endpoint when querying the next.
This is very useful in general. I was also thinking of a specific scenario, along lines we have partly been exploring already: using multiple Semantic MediaWikis as community-maintained databanks that can be queried from Bioclipse. Being able to use multiple MediaWiki installs matters because it is hard to build an efficient access-restriction system into MediaWiki (due to how it works, with template calls and all), so it is better to keep content that needs special restrictions in separate wikis.
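A minimal Python sketch of what "using variables bound from one endpoint when querying the next" means. It is not Jena/ARQ code and talks to no real endpoint. Each hypothetical "endpoint" (here two made-up wikis, `wiki_a` and `wiki_b`) is just an in-memory list of (subject, predicate, object) triples, and terms starting with `?` are variables. Each step receives the bindings produced so far, which is the idea behind a bind join over chained SERVICE clauses.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable must agree with any value bound by an earlier step.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def query_endpoint(endpoint: List[Triple], pattern: Triple, binding: Binding) -> List[Binding]:
    """Evaluate one triple pattern at one (in-memory) endpoint."""
    results = []
    for triple in endpoint:
        extended = match(pattern, triple, binding)
        if extended is not None:
            results.append(extended)
    return results


def federated_query(steps: List[Tuple[List[Triple], Triple]]) -> List[Binding]:
    """Run (endpoint, pattern) steps in order, passing bindings forward."""
    bindings: List[Binding] = [{}]
    for endpoint, pattern in steps:
        bindings = [b2 for b in bindings for b2 in query_endpoint(endpoint, pattern, b)]
    return bindings


# Made-up data: wiki_a links compounds to IDs, wiki_b holds properties per ID.
wiki_a = [("ex:compound1", "ex:dbId", "db:C001"), ("ex:compound2", "ex:dbId", "db:C002")]
wiki_b = [("db:C001", "db:meltingPoint", "135"), ("db:C003", "db:meltingPoint", "80")]

result = federated_query([
    (wiki_a, ("?mol", "ex:dbId", "?id")),
    (wiki_b, ("?id", "db:meltingPoint", "?mp")),
])
print(result)  # [{'?mol': 'ex:compound1', '?id': 'db:C001', '?mp': '135'}]
```

In a real federated query each step would be a remote SERVICE call, and the query engine decides how to ship bindings between endpoints. The nested loop here is only the simplest possible version of that idea.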
Just found out that there is a Spatial Indexing package available for SWI-Prolog! How cool! That would come in extremely handy for modeling the embryological processes of developmental biology with semantics (as I dream of doing :) ) ... and of course for combining that with the ontologies and formats of the BioModels initiative.
A very good review of Semantic Web technologies in the Life Sciences:
Interestingly, they also discuss "Semantic Systems Biology", concluding that Systems Biology has been predicted, and shown, to be one of the main adopters of semantic technologies within the life sciences, due to its strong need to integrate knowledge from diverse scientific fields.
Surprisingly, they do not mention any of the projects of the BioModels initiative, which I just blogged about, such as the Systems Biology Ontology. Shouldn't those be essential?!
EDIT (19/2): The problem of doing this with a SPARQL query is now solved! See this blog post.
Thought I should write up my NMRSpectrum similarity search problem, which I've gotten quite stuck on ... even after trying to get some advice from Semantic Overflow. So ... here we go.
I have a problem that I can't seem to express successfully in either pure SPARQL or OWL class descriptions, seemingly because it combines lists, property chains, and numerical value constraints in a troublesome mix.
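The post does not show the actual data model, so the following Python sketch assumes one. Spectra link to peaks via a hypothetical `ex:hasPeak` property, and peaks link to a chemical shift via `ex:hasShift`. A spectrum counts as a hit if every shift in the query list has some peak within a made-up tolerance of 0.3 ppm. Written imperatively, the list, the property chain and the numerical constraint each take only a line or two. The "for every query peak there exists a close-enough stored peak" quantification is presumably the part that is hard to state declaratively.

```python
# Hypothetical data: spectrum -ex:hasPeak-> peak -ex:hasShift-> shift (ppm).
TRIPLES = [
    ("ex:spec1", "ex:hasPeak", "ex:p1"), ("ex:p1", "ex:hasShift", 17.6),
    ("ex:spec1", "ex:hasPeak", "ex:p2"), ("ex:p2", "ex:hasShift", 18.3),
    ("ex:spec1", "ex:hasPeak", "ex:p3"), ("ex:p3", "ex:hasShift", 22.6),
    ("ex:spec2", "ex:hasPeak", "ex:p4"), ("ex:p4", "ex:hasShift", 17.6),
    ("ex:spec2", "ex:hasPeak", "ex:p5"), ("ex:p5", "ex:hasShift", 40.2),
]


def shifts_of(spectrum, triples):
    """Follow the property chain hasPeak / hasShift from one spectrum."""
    peaks = {o for s, p, o in triples if s == spectrum and p == "ex:hasPeak"}
    return [o for s, p, o in triples if s in peaks and p == "ex:hasShift"]


def matching_spectra(query_shifts, triples, tolerance=0.3):
    """Spectra where EVERY query shift has SOME stored peak within tolerance."""
    spectra = sorted({s for s, p, _ in triples if p == "ex:hasPeak"})
    hits = []
    for spectrum in spectra:
        shifts = shifts_of(spectrum, triples)
        if all(any(abs(q - x) <= tolerance for x in shifts) for q in query_shifts):
            hits.append(spectrum)
    return hits


print(matching_spectra([17.5, 22.7], TRIPLES))  # ['ex:spec1']
```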
Just learned that there is something called "OWL (2) profiles", which basically seem to be sets of restrictions imposed on what you can express, in order to make certain usage patterns possible (in particular, more efficient reasoning).
OWL 2 RL seems particularly interesting for my purposes, since it is (according to the above link) meant to be "a syntactic subset of OWL 2 which is amenable to implementation using rule-based technologies", and rule-based technologies are exactly what I'm looking at with SWI-Prolog and BLIPKIT.
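The OWL 2 Profiles specification writes the OWL 2 RL semantics as if-then rules over triples, with names such as cax-sco, scm-sco and prp-trp. The Python sketch below applies those three rules by naive forward chaining until nothing new is derived. It only illustrates the rule-based style. It is not how SWI-Prolog or BLIPKIT implement anything, and the class and property names are made up.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
TRANSITIVE = "owl:TransitiveProperty"


def apply_rules(triples):
    """Apply three OWL 2 RL rules once; return only the newly derived triples."""
    derived = set()
    transitive = {s for s, p, o in triples if p == RDF_TYPE and o == TRANSITIVE}
    for s1, p1, o1 in triples:
        for s2, p2, o2 in triples:
            # cax-sco: (c1 subClassOf c2), (x type c1) => (x type c2)
            if p1 == SUBCLASS_OF and p2 == RDF_TYPE and o2 == s1:
                derived.add((s2, RDF_TYPE, o1))
            # scm-sco: (c1 subClassOf c2), (c2 subClassOf c3) => (c1 subClassOf c3)
            if p1 == SUBCLASS_OF and p2 == SUBCLASS_OF and o1 == s2:
                derived.add((s1, SUBCLASS_OF, o2))
            # prp-trp: p is transitive, (x p y), (y p z) => (x p z)
            if p1 in transitive and p2 == p1 and o1 == s2:
                derived.add((s1, p1, o2))
    return derived - triples


def materialize(triples):
    """Forward-chain until no rule produces anything new (a fixpoint)."""
    closure = set(triples)
    while True:
        new = apply_rules(closure)
        if not new:
            return closure
        closure |= new


facts = {
    ("ex:NMRSpectrum", SUBCLASS_OF, "ex:Spectrum"),
    ("ex:Spectrum", SUBCLASS_OF, "ex:Measurement"),
    ("ex:spec1", RDF_TYPE, "ex:NMRSpectrum"),
    ("ex:partOf", RDF_TYPE, TRANSITIVE),
    ("ex:a", "ex:partOf", "ex:b"),
    ("ex:b", "ex:partOf", "ex:c"),
}
closure = materialize(facts)
for triple in sorted(closure - facts):
    print(triple)
```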
Just discovered the Semantic Science Portal today (though I have read, with keen interest, papers by some of the people behind it, and already knew SADI and Bio2RDF, since Egon W told me about them).
On the portal I found some interesting new things though, including:
Found a very interesting quote:
"Both OWL-DL and function-free Horn rules are decidable fragments of first-order logic with interesting, yet orthogonal expressive power"
"Horn rules" are what Prolog builds upon (Prolog clauses are Horn clauses, AFAIS), so maybe Prolog fits into the category of "function-free Horn rules"? (Gotta try to figure that out.) OWL-DL, on the other hand, is the W3C standard for expressive semantics that reasoners like Pellet (which is available in Bioclipse) build upon.
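On the Prolog question: full Prolog allows function symbols (compound terms such as `s(X)`), so it is not function-free in general. The function-free fragment of Horn rules is usually called Datalog. Without function symbols, rules can only recombine constants that already occur in the facts, so the set of derivable facts is finite and bottom-up evaluation must stop. The Python sketch below shows that naive bottom-up evaluation. It uses a made-up `ancestor` example and Prolog's convention that capitalized names are variables. Prolog itself answers queries top-down (SLD resolution), so this is not how Prolog runs a query.

```python
def is_var(term):
    """Prolog convention: names starting with an uppercase letter are variables."""
    return term[:1].isupper()


def unify(atom, fact, binding):
    """Match a body atom such as ("parent", "X", "Y") against a ground fact."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    extended = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def solve(body, facts, binding):
    """Yield every binding that satisfies all atoms in `body`."""
    if not body:
        yield binding
        return
    for fact in facts:
        extended = unify(body[0], fact, binding)
        if extended is not None:
            yield from solve(body[1:], facts, extended)


def evaluate(facts, rules):
    """Naive bottom-up evaluation: apply every rule until a fixpoint."""
    known = set(facts)
    while True:
        derived = set()
        for head, body in rules:
            for binding in solve(body, known, {}):
                # Rules must be safe: every head variable also occurs in the body.
                derived.add(tuple(binding[t] if is_var(t) else t for t in head))
        if derived <= known:
            return known
        known |= derived


facts = {("parent", "ann", "bob"), ("parent", "bob", "cid"), ("parent", "cid", "dan")}
rules = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
model = evaluate(facts, rules)
print(sorted(f for f in model if f[0] == "ancestor"))
```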
What do you think of that title? :) To me it sounds like one of the (many) natural next steps forward for Bioclipse, sometime in the future.[1]
There are lots of questions that a computer can't answer from data alone. Maybe the majority of what we humans perceive as knowledge is inferred from a combination of data (simple factual statements about reality) and rules that tell how facts can be combined to make implicit knowledge (knowledge that is not stored as facts anywhere, but has to be inferred from other facts and rules) explicit.
One can easily imagine, though, that storing every single piece of knowledge that could be stated as an explicit fact would require more storage than could probably ever be made available in this universe.
It is not too hard to come up with processes that are so complex, and involve so much variability[2], that it is unrealistic to try to capture every imaginable state of that system or process as explicit facts. Instead we must seek the "first principles" that define the process, and through simulations make explicit whatever knowledge we are looking for, at the time we need it (one can of course cache frequently accessed knowledge).
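A small Python sketch with made-up data puts a number on the storage argument. A chain of N explicit parent facts implies N(N+1)/2 ancestor facts, all derivable from one rule ("a parent, or a parent's ancestor, is an ancestor"). Instead of storing those facts, the function below infers them when asked and keeps a bounded cache of recent answers, in the spirit of caching frequently accessed knowledge. The names `PARENT` and `ancestors_of` are hypothetical.

```python
from functools import lru_cache

N = 200
# Explicit facts (made up): the parent of node n{i} is n{i+1}.
PARENT = {f"n{i}": f"n{i + 1}" for i in range(N)}


@lru_cache(maxsize=128)  # bounded cache: only recently used answers are kept
def ancestors_of(node):
    """Infer all ancestors of `node` from the parent facts, on demand."""
    parent = PARENT.get(node)
    if parent is None:
        return frozenset()
    return frozenset({parent}) | ancestors_of(parent)


stored = len(PARENT)
implied = sum(len(ancestors_of(f"n{i}")) for i in range(N + 1))
print(stored, implied, N * (N + 1) // 2)  # 200 20100 20100
```

Returning immutable `frozenset`s means cached answers can be shared safely between callers. If an answer has been evicted from the cache, it is simply inferred again.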
As work has now started on setting up bioclipse.net on Drupal (more on that in a while), it's good to know that Drupal is into the semantic web.
I was wondering what it looks like (since I have not had time to play with it), and to me this screenshot was clarifying: it shows how you can map fields (core fields, or your own custom ones) to RDF types. (In Drupal you can create both custom fields and custom content types, which contain many fields.)
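A rough Python sketch of what such a mapping does when content is exported. Each field of a content item becomes an RDF triple using the property it is mapped to, and the content type becomes an `rdf:type` statement. The dictionary layout, the field names and `node_to_triples` are made up for illustration and are not Drupal's actual API. The Dublin Core and SIOC URIs are real vocabulary terms.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping for one content type: an RDF class plus field -> property.
ARTICLE_MAPPING = {
    "rdftype": "http://rdfs.org/sioc/ns#Item",
    "fields": {
        "title": "http://purl.org/dc/terms/title",                # core field
        "field_summary": "http://purl.org/dc/terms/description",  # custom field
    },
}


def node_to_triples(node_uri, node, mapping):
    """Turn a content item (field name -> value) into (s, p, o) triples."""
    triples = [(node_uri, RDF_TYPE, mapping["rdftype"])]
    for field, value in node.items():
        predicate = mapping["fields"].get(field)
        if predicate is not None:  # unmapped fields are simply not exported
            triples.append((node_uri, predicate, value))
    return triples


node = {"title": "Bioclipse on Drupal", "field_summary": "Work in progress", "field_internal": "x"}
for triple in node_to_triples("http://example.org/node/1", node, ARTICLE_MAPPING):
    print(triple)
```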
Using SWI-Prolog's semweb package, I had extracted all the predicates in an RDF source of some 1 million triples into the following list: