Converting RDF predicates to Prolog convenience methods with RegEx

Using SWI-Prolog's semweb package, I extracted all the predicates in an RDF source containing some 1 million triples, which gave the following list:

'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
'http://www.nmrshiftdb.org/onto#moleculeId'
'http://www.blueobelisk.org/chemistryblogs/inchi'
'http://www.blueobelisk.org/chemistryblogs/inchikey'
'http://www.w3.org/2002/07/owl#sameAs'
'http://www.nmrshiftdb.org/onto#hasShift'
'http://www.nmrshiftdb.org/onto#hasPeak'
'http://www.blueobelisk.org/chemistryblogs/casnumber'
'http://purl.org/dc/elements/1.1/title'
'http://www.nmrshiftdb.org/onto#spectrumId'
'http://www.nmrshiftdb.org/onto#spectrumType'
'http://www.nmrshiftdb.org/onto#temperature'
'http://xmlns.com/foaf/0.1/homepage'
'http://www.nmrshiftdb.org/onto#hasSpectrum'
'http://www.nmrshiftdb.org/onto#solvent'
'http://www.nmrshiftdb.org/onto#field'
'http://purl.org/dc/elements/1.1/source'
'http://purl.org/ontology/bibo/doi'
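
A list like this can be collected from within Prolog itself. The following is only a minimal sketch, assuming the triples are already in the rdf_db store; all_predicates/1 and demo_predicates/0 are names made up for this illustration, and the example.org IRIs and values are toy data, not from the real source:

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(lists)).

% all_predicates(-Predicates): sorted, duplicate-free list of every
% predicate IRI that occurs in at least one triple in the store.
% (Hypothetical helper; fails if the store is empty.)
all_predicates(Predicates) :-
    setof(P, S^O^rdf(S, P, O), Predicates).

% Toy data so the sketch runs without an RDF file.
demo_predicates :-
    rdf_assert('http://example.org/mol/1',
               'http://purl.org/dc/elements/1.1/title',
               literal('Example molecule')),
    rdf_assert('http://example.org/mol/1',
               'http://purl.org/ontology/bibo/doi',
               literal('10.1000/example')),
    all_predicates(Ps),
    % writeq/1 prints each IRI as a quoted atom, as in the list above.
    forall(member(P, Ps), (writeq(P), nl)).
```

setof/3 is used here because it both sorts the result and removes duplicates, which matters when the same predicate occurs in many triples.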

In SWI-Prolog, these predicates can be used to find subjects and objects with a query like the one below (assuming you have already loaded the RDF data with rdf_db:rdf_load(RDF_file)):

rdf_db:rdf( Subject, 'http://purl.org/ontology/bibo/doi', Object ).

(Note: in Prolog, a name that starts with an upper-case letter, here 'Subject' and 'Object', is a variable, which Prolog will try to bind to matching values.)
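
To make the variable binding concrete, here is a small sketch that collects every answer to that query. It assumes nothing beyond rdf_db; dois/1 and demo_dois/0 are hypothetical helper names, and the example.org subject and the DOI value are toy data:

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(lists)).

% dois(-Pairs): collect every Subject-Object pair linked by bibo:doi.
dois(Pairs) :-
    findall(Subject-Object,
            rdf_db:rdf(Subject, 'http://purl.org/ontology/bibo/doi', Object),
            Pairs).

% Toy triple so that the query has something to find.
demo_dois :-
    rdf_db:rdf_assert('http://example.org/mol/1',
                      'http://purl.org/ontology/bibo/doi',
                      literal('10.1000/example')),
    dois(Pairs),
    forall(member(S-O, Pairs), format("~w -> ~q~n", [S, O])).
```

At the interactive prompt you can also run the bare rdf_db:rdf/3 query and press ; to step through the solutions one at a time.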

Now, typing the full predicate URI every time is very tedious when you are trying to tie your knowledge together, so I wanted to create a convenience method for each of the predicates.

Solved with Regexes

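Of course, regular expressions were the tool of choice :), and you will find the regexes I used below. They were created in, and work in, the Kate text editor (I have installed it on my Ubuntu, even though it is a KDE program, since it is the only one with decent RegEx support, AFAIK). RegEx 1 handles the URIs whose local name follows a '#', and RegEx 2 those whose local name follows the last '/'. Run them in that order, since RegEx 2 would otherwise also match the '#' URIs and produce names like 'onto#moleculeId'. In the replacement patterns, \2 is the local name and \0 is the whole matched line.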

RegEx 1

Find pattern:

^(\'.*\#)(.*)(\')$

Replacement pattern:

has\2( Subject, Predicate ) :-\n  rdf_db:rdf( Subject, \0, Predicate).

RegEx 2

Find pattern:

^(\'.*\/)([^/]*)(\')$

Replacement pattern:

has\2( Subject, Predicate ) :-\n  rdf_db:rdf( Subject, \0, Predicate).

Result

After running these regexes, I had the following neat list of convenience Prolog methods (the only manual work was to capitalise the first letter of the predicate name after 'has' and to remove some doubled 'has' prefixes):

hasType( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', Predicate).
hasMoleculeId( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#moleculeId', Predicate).
hasInchi( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.blueobelisk.org/chemistryblogs/inchi', Predicate).
hasInchikey( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.blueobelisk.org/chemistryblogs/inchikey', Predicate).
sameAs( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.w3.org/2002/07/owl#sameAs', Predicate).
hasShift( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#hasShift', Predicate).
hasPeak( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#hasPeak', Predicate).
hasCasnumber( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.blueobelisk.org/chemistryblogs/casnumber', Predicate).
hasTitle( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://purl.org/dc/elements/1.1/title', Predicate).
hasSpectrumId( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#spectrumId', Predicate).
hasSpectrumType( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#spectrumType', Predicate).
hasTemperature( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#temperature', Predicate).
hasHomepage( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://xmlns.com/foaf/0.1/homepage', Predicate).
hasSpectrum( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#hasSpectrum', Predicate).
hasSolvent( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#solvent', Predicate).
hasField( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#field', Predicate).
hasSource( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://purl.org/dc/elements/1.1/source', Predicate).
hasDoi( Subject, Predicate ) :-
  rdf_db:rdf( Subject, 'http://purl.org/ontology/bibo/doi', Predicate).
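
The manual capitalisation step could probably be avoided by generating the clauses from within Prolog instead of with an editor. The sketch below is not the regex approach used above but a swapped-in alternative built on plain atom handling; local_name/2, capitalise/2, method_name/2 and print_method/1 are all hypothetical names invented for this illustration:

```prolog
:- use_module(library(lists)).

% local_name(+IRI, -Local): the text after the last '#', and within
% that, the text after the last '/'.
local_name(IRI, Local) :-
    atomic_list_concat(HashParts, '#', IRI),
    last(HashParts, AfterHash),
    atomic_list_concat(SlashParts, '/', AfterHash),
    last(SlashParts, Local).

% capitalise(+Atom, -Capitalised): upper-case the first character only.
% Fails for the empty atom.
capitalise(Atom, Capitalised) :-
    sub_atom(Atom, 0, 1, _, First),
    sub_atom(Atom, 1, _, 0, Rest),
    upcase_atom(First, Upper),
    atom_concat(Upper, Rest, Capitalised).

% method_name(+Local, -Name): prefix 'has' unless the local name already
% starts with it (a crude check that avoids names like hasHasShift).
method_name(Local, Name) :-
    (   atom_concat(has, _, Local)
    ->  Name = Local
    ;   capitalise(Local, Cap),
        atom_concat(has, Cap, Name)
    ).

% print_method(+IRI): print one convenience clause in the same layout
% as the regex replacement pattern produces.
print_method(IRI) :-
    local_name(IRI, Local),
    method_name(Local, Name),
    format("~w( Subject, Predicate ) :-~n  rdf_db:rdf( Subject, ~q, Predicate).~n",
           [Name, IRI]).
```

Calling print_method('http://purl.org/ontology/bibo/doi') prints the hasDoi clause in the form listed above. Note that this sketch would name the owl#sameAs method hasSameAs, so that one would still need a manual touch to match the list.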

Special cases

Now, that's not the whole story: some of the triples store their objects as more complex Prolog terms (i.e. compound terms such as literal(...)), and querying those could be simplified, so I created the following methods to address those cases:

hasInchiKey( MoleculeID, InchiKey ) :-
  rdf_db:rdf( MoleculeID, 'http://www.blueobelisk.org/chemistryblogs/inchikey', literal(InchiKey)).
 
hasTitle( MoleculeID, Title ) :-
  rdf_db:rdf( MoleculeID, 'http://purl.org/dc/elements/1.1/title', literal(Title) ).
 
hasShift( Peak, ShiftValueLiteral ) :-
  rdf_db:rdf( Peak, 'http://www.nmrshiftdb.org/onto#hasShift', literal(type('nmr:ppm', ShiftValueLiteral))).
 
hasMoleculeID( MoleculeID, MoleculeIDValue ) :- 
  rdf_db:rdf( MoleculeID, 'http://www.nmrshiftdb.org/onto#moleculeId', literal(MoleculeIDLiteral)),
  atom_number_create( MoleculeIDLiteral, MoleculeIDValue).
 
hasShiftCloseTo( Peak, ShiftValue ) :-
  rdf_db:rdf( Peak, 'http://www.nmrshiftdb.org/onto#hasShift', literal(type('nmr:ppm', ValueLiteral))),
  atom_number_create( ValueLiteral, Value ),
  Value >= ShiftValue - 0.0001, 
  Value =< ShiftValue + 0.0001. 

...where atom_number_create/2 is the following wrapper method, which takes care of empty literals by mapping them to 0:

atom_number_create( Atom, Number ) :-
  (   atom_length( Atom, AtomLength ), AtomLength > 0
  ->  atom_number( Atom, Number )
  ;   atom_number( '0', Number )
  ).
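
To show how these pieces fit together, here is a small self-contained sketch. It repeats atom_number_create/2 (written with explicit parentheses) and hasShiftCloseTo/2 from above and runs them against two toy peaks; the example.org IRIs, the shift values and demo_shift/0 are made up for this illustration:

```prolog
:- use_module(library(semweb/rdf_db)).

% Empty atoms become 0; non-empty atoms are parsed as numbers.
% A non-empty atom that is not a number makes the call fail.
atom_number_create( Atom, Number ) :-
  (   atom_length( Atom, AtomLength ), AtomLength > 0
  ->  atom_number( Atom, Number )
  ;   atom_number( '0', Number )
  ).

% True if Peak has a shift within 0.0001 ppm of ShiftValue.
hasShiftCloseTo( Peak, ShiftValue ) :-
  rdf_db:rdf( Peak, 'http://www.nmrshiftdb.org/onto#hasShift',
              literal(type('nmr:ppm', ValueLiteral)) ),
  atom_number_create( ValueLiteral, Value ),
  Value >= ShiftValue - 0.0001,
  Value =< ShiftValue + 0.0001.

% Toy data: two peaks with typed literal shift values.
demo_shift :-
  rdf_db:rdf_assert( 'http://example.org/peak/1',
                     'http://www.nmrshiftdb.org/onto#hasShift',
                     literal(type('nmr:ppm', '17.6')) ),
  rdf_db:rdf_assert( 'http://example.org/peak/2',
                     'http://www.nmrshiftdb.org/onto#hasShift',
                     literal(type('nmr:ppm', '100.0')) ),
  % Only peak/1 lies within the tolerance of 17.6.
  forall( hasShiftCloseTo( Peak, 17.6 ), ( writeq(Peak), nl ) ).
```

The tolerance is needed because the shift is compared as a floating-point number after conversion from the literal, so an exact equality test would be fragile.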

Comments

Another trick is to use an RDFS schema to generate Prolog predicates.

You can do this in blip as follows:

 blip -i biopax-level3.owl ontol-schema  -local biopax3_db -ns "http://www.biopax.org/release/biopax-level2.owl#"

See biopax3_db and biopax3_bridge_from_rdf in the blip distro, and also:

http://blipkit.wordpress.com/2009/11/26/exploring-pathway-data/

for a discussion of native Prolog predicates compared to Prolog access via the semweb library.

Thanks a lot for the tips, will look at it!