Converting RDF predicates to Prolog convenience methods with RegEx
Using SWI-Prolog's semweb package, I extracted all predicates in an RDF source containing some 1 million triples, which gave the following list:
'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
'http://www.nmrshiftdb.org/onto#moleculeId'
'http://www.blueobelisk.org/chemistryblogs/inchi'
'http://www.blueobelisk.org/chemistryblogs/inchikey'
'http://www.w3.org/2002/07/owl#sameAs'
'http://www.nmrshiftdb.org/onto#hasShift'
'http://www.nmrshiftdb.org/onto#hasPeak'
'http://www.blueobelisk.org/chemistryblogs/casnumber'
'http://purl.org/dc/elements/1.1/title'
'http://www.nmrshiftdb.org/onto#spectrumId'
'http://www.nmrshiftdb.org/onto#spectrumType'
'http://www.nmrshiftdb.org/onto#temperature'
'http://xmlns.com/foaf/0.1/homepage'
'http://www.nmrshiftdb.org/onto#hasSpectrum'
'http://www.nmrshiftdb.org/onto#solvent'
'http://www.nmrshiftdb.org/onto#field'
'http://purl.org/dc/elements/1.1/source'
'http://purl.org/ontology/bibo/doi'
In SWI-Prolog, these predicates can be used to find subjects and objects by running a query like the one below (assuming you have first loaded the RDF data with rdf_db:rdf_load(RDF_file)):
rdf_db:rdf( Subject, 'http://purl.org/ontology/bibo/doi', Object ).
(Note: In Prolog, a term that starts with an upper-case letter, in this case Subject and Object, is a variable, to be resolved by Prolog.)
Now, that is very tedious to type while you are trying to tie your knowledge together, so I wanted to create a convenience method for each of the predicates.
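For reference, the extraction and lookup described above can be sketched roughly as follows. This is a minimal sketch, not necessarily how the list in this post was produced: distinct_predicates/1 and print_predicates/1 are hypothetical helper names, and the file name in the usage note is made up.

```prolog
:- use_module(library(semweb/rdf_db)).

% distinct_predicates(-Predicates)
% Collect every predicate URI used in the loaded triples.
% sort/2 orders the list and removes duplicates.
distinct_predicates(Predicates) :-
    findall(P, rdf(_, P, _), Ps),
    sort(Ps, Predicates).

% print_predicates(+File)
% Load File into the triple store and print one quoted URI per line.
print_predicates(File) :-
    rdf_load(File),
    distinct_predicates(Predicates),
    forall(member(P, Predicates),
           ( writeq(P), nl )).
```

Calling print_predicates('nmrshiftdb.rdf') (a hypothetical file name) would print a quoted list like the one above, after which a query such as rdf(Subject, 'http://purl.org/ontology/bibo/doi', Object) enumerates the matching triples on backtracking.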
Solved with Regexes
Of course, regular expressions were the tool of choice :), and you will find the regexes I used below. They were created in, and work in, the Kate text editor (I have installed it on my Ubuntu, even though it is a KDE program, since it is the only editor with decent RegEx support, AFAIK).
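The two find/replace patterns below boil down to one rule: take the part of the URI after the last '#', or, if there is no '#', after the last '/', and wrap it in a has...( Subject, Predicate ) clause. For readers who would rather stay inside Prolog, that rule can be sketched as follows. This is an alternative illustration, not what was done for this post, and local_name/2 and clause_text/2 are hypothetical names.

```prolog
% local_name(+URI, -Local)
% Local is the text after the last '#' in URI, or, when URI
% contains no '#', the text after the last '/'.
local_name(URI, Local) :-
    atomic_list_concat(Parts, '#', URI),
    Parts = [_, _|_],              % at least one '#' was found
    !,
    last(Parts, Local).
local_name(URI, Local) :-
    atomic_list_concat(Parts, '/', URI),
    last(Parts, Local).

% clause_text(+URI, -Text)
% Text is the clause source for one predicate URI, laid out like
% the replacement patterns do. ~q writes the URI with its quotes.
clause_text(URI, Text) :-
    local_name(URI, Local),
    format(atom(Text),
           "has~w( Subject, Predicate ) :-~n rdf_db:rdf( Subject, ~q, Predicate).",
           [Local, URI]).
```

A query such as clause_text('http://www.nmrshiftdb.org/onto#moleculeId', T), writeln(T) prints a clause headed hasmoleculeId. Like the regexes, the sketch leaves the capitalisation after 'has', and any doubled 'has' prefix, to be fixed by hand.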
RegEx 1
Find pattern:
^(\'.*\#)(.*)(\')$
Replacement pattern:
has\2( Subject, Predicate ) :-\n rdf_db:rdf( Subject, \0, Predicate).
RegEx 2
Find pattern:
^(\'.*\/)([^/]*)(\')$
Replacement pattern:
has\2( Subject, Predicate ) :-\n rdf_db:rdf( Subject, \0, Predicate).
Result
After running these regexes, I had the following neat list of convenience Prolog methods (I only had to manually capitalise the first letter of each predicate name after 'has', and remove a few doubled 'has' prefixes):
hasType( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', Predicate).
hasMoleculeId( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#moleculeId', Predicate).
hasInchi( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://www.blueobelisk.org/chemistryblogs/inchi', Predicate).
hasInchikey( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://www.blueobelisk.org/chemistryblogs/inchikey', Predicate).
sameAs( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://www.w3.org/2002/07/owl#sameAs', Predicate).
hasShift( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#hasShift', Predicate).
hasPeak( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#hasPeak', Predicate).
hasCasnumber( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://www.blueobelisk.org/chemistryblogs/casnumber', Predicate).
hasTitle( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://purl.org/dc/elements/1.1/title', Predicate).
hasSpectrumId( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#spectrumId', Predicate).
hasSpectrumType( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#spectrumType', Predicate).
hasTemperature( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#temperature', Predicate).
hasHomepage( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://xmlns.com/foaf/0.1/homepage', Predicate).
hasSpectrum( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#hasSpectrum', Predicate).
hasSolvent( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#solvent', Predicate).
hasField( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#field', Predicate).
hasSource( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://purl.org/dc/elements/1.1/source', Predicate).
hasDoi( Subject, Predicate ) :-
    rdf_db:rdf( Subject, 'http://purl.org/ontology/bibo/doi', Predicate).
Special cases
Now, that's not the whole story. In some of the triples the object is stored as a more complex Prolog term (i.e. a compound term such as literal(...)), and querying those could be simplified, so the following methods were created to address those cases:
hasInchiKey( MoleculeID, InchiKey ) :-
    rdf_db:rdf( MoleculeID, 'http://www.blueobelisk.org/chemistryblogs/inchikey', literal(InchiKey)).
hasTitle( MoleculeID, Title ) :-
    rdf_db:rdf( MoleculeID, 'http://purl.org/dc/elements/1.1/title', literal(Title) ).
hasShift( Peak, ShiftValueLiteral ) :-
    rdf_db:rdf( Peak, 'http://www.nmrshiftdb.org/onto#hasShift', literal(type('nmr:ppm', ShiftValueLiteral))).
hasMoleculeID( MoleculeID, MoleculeIDValue ) :-
    rdf_db:rdf( MoleculeID, 'http://www.nmrshiftdb.org/onto#moleculeId', literal(MoleculeIDLiteral)),
    atom_number_create( MoleculeIDLiteral, MoleculeIDValue).
% Succeeds for peaks whose shift lies within 0.0001 of ShiftValue (which must be bound).
hasShiftCloseTo( Peak, ShiftValue ) :-
    rdf_db:rdf( Peak, 'http://www.nmrshiftdb.org/onto#hasShift', literal(type('nmr:ppm', ValueLiteral))),
    atom_number_create( ValueLiteral, Value ),
    Value >= ShiftValue - 0.0001,
    Value =< ShiftValue + 0.0001.
...where atom_number_create/2 is the following wrapper method, which takes care of cases with empty literals:
% Convert Atom to Number; an empty atom is treated as 0.
atom_number_create( Atom, Number ) :-
    (   atom_length( Atom, AtomLength ), AtomLength > 0
    ->  atom_number( Atom, Number )
    ;   atom_number( '0', Number )
    ).
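To see how the tolerance query and the empty-literal wrapper behave together, here is a small self-contained sketch. The three peaks are made-up sample data under example.org, load_sample_peaks/0 is a hypothetical helper, and the two predicates are copied from above only so that the block loads on its own.

```prolog
:- use_module(library(semweb/rdf_db)).

% Copy of the wrapper from the post: an empty atom counts as 0.
atom_number_create(Atom, Number) :-
    (   atom_length(Atom, Length), Length > 0
    ->  atom_number(Atom, Number)
    ;   atom_number('0', Number)
    ).

% Copy of the tolerance query from the post.
hasShiftCloseTo(Peak, ShiftValue) :-
    rdf_db:rdf(Peak, 'http://www.nmrshiftdb.org/onto#hasShift',
               literal(type('nmr:ppm', ValueLiteral))),
    atom_number_create(ValueLiteral, Value),
    Value >= ShiftValue - 0.0001,
    Value =< ShiftValue + 0.0001.

% Hypothetical sample data: three invented peaks, one of them
% with an empty shift literal.
load_sample_peaks :-
    forall(member(Peak-Shift,
                  [ 'http://example.org/peak1'-'7.26',
                    'http://example.org/peak2'-'128.5',
                    'http://example.org/peak3'-''
                  ]),
           rdf_assert(Peak, 'http://www.nmrshiftdb.org/onto#hasShift',
                      literal(type('nmr:ppm', Shift)))).
```

After load_sample_peaks, the query hasShiftCloseTo(Peak, 7.26) should match only peak1. Note a side effect of the wrapper: because an empty literal becomes 0, hasShiftCloseTo(Peak, 0) matches peak3, so a search near 0 cannot tell a real zero shift from a missing value.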
Comments
another trick is to use an RDFS schema to generate prolog predicates.
You can do this in blip as follows:
See biopax3_db and biopax3_bridge_from_rdf in the blip distro, and also:
http://blipkit.wordpress.com/2009/11/26/exploring-pathway-data/
for a discussion of native prolog predicates compared to prolog access via the semweb library
Thanks a lot for the tips, will look at it!