Using SWI-Prolog's semweb package, I extracted all predicates from an RDF source of some 1 million triples into the following list:
'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
'http://www.nmrshiftdb.org/onto#moleculeId'
'http://www.blueobelisk.org/chemistryblogs/inchi'
'http://www.blueobelisk.org/chemistryblogs/inchikey'
'http://www.w3.org/2002/07/owl#sameAs'
'http://www.nmrshiftdb.org/onto#hasShift'
'http://www.nmrshiftdb.org/onto#hasPeak'
'http://www.blueobelisk.org/chemistryblogs/casnumber'
'http://purl.org/dc/elements/1.1/title'
'http://www.nmrshiftdb.org/onto#spectrumId'
'http://www.nmrshiftdb.org/onto#spectrumType'
'http://www.nmrshiftdb.org/onto#temperature'
'http://xmlns.com/foaf/0.1/homepage'
'http://www.nmrshiftdb.org/onto#hasSpectrum'
'http://www.nmrshiftdb.org/onto#solvent'
'http://www.nmrshiftdb.org/onto#field'
'http://purl.org/dc/elements/1.1/source'
'http://purl.org/ontology/bibo/doi'
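One way to produce such a list, not necessarily the way it was done here, is sketched below. It assumes the data is already loaded into the rdf_db store. rdf_current_predicate/1 enumerates the predicates the store knows about, and the once(rdf(_, P, _)) check keeps only those that actually have triples. The names used_predicates/1 and print_predicates/0 are hypothetical.

```prolog
:- use_module(library(semweb/rdf_db)).

% used_predicates(-Predicates): sorted list of predicate URIs that
% occur in at least one triple of the loaded store.
used_predicates(Predicates) :-
    findall(P,
            ( rdf_current_predicate(P),
              once(rdf(_, P, _))          % skip predicates without triples
            ),
            Ps),
    sort(Ps, Predicates).                 % sort/2 also removes duplicates

% print_predicates: write each URI quoted, one per line, which is the
% line shape the find patterns further down are written for.
print_predicates :-
    used_predicates(Ps),
    forall(member(P, Ps), format("~q~n", [P])).
```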
In SWI-Prolog, these predicates can be used to find subjects and objects with a query like the following (assuming you have loaded the RDF data with rdf_db:rdf_load(+RDF_file)):
rdf_db:rdf( Subject, 'http://purl.org/ontology/bibo/doi', Object ).
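To try this query without the full data set, here is a minimal sketch that puts a single triple into the in-memory store with rdf_assert/3. The subject URI and the DOI value are invented for illustration, and add_demo_triple/0 is a hypothetical helper.

```prolog
:- use_module(library(semweb/rdf_db)).

% Hypothetical example data: one paper with one DOI.
% Neither the subject URI nor the DOI refers to a real record.
add_demo_triple :-
    rdf_assert('http://example.org/paper1',
               'http://purl.org/ontology/bibo/doi',
               literal('10.1000/example')).
```

After ?- add_demo_triple. the query above should bind Subject to 'http://example.org/paper1' and Object to literal('10.1000/example'): plain string objects come back wrapped in literal/1, which becomes relevant for the more specialised predicates later in this post.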
Typing that quickly gets tedious when you are trying to tie your knowledge together, so I wanted to create a convenience method for each of the predicates.
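For comparison, SWI-Prolog's semweb library also offers a lighter alternative to typing full URIs (a different technique from the wrapper methods built in this post): register a namespace prefix once and write Prefix:Local in rdf/3 calls. A minimal sketch, assuming a SWI-Prolog version that provides rdf_register_prefix/2; the prefix name nmr and the predicate peak_shift/2 are hypothetical.

```prolog
:- use_module(library(semweb/rdf_db)).

% Hypothetical prefix name: 'nmr' is chosen for this sketch and is not
% something the data set itself defines.
:- rdf_register_prefix(nmr, 'http://www.nmrshiftdb.org/onto#').

% rdf/3 is declared with rdf_meta, so nmr:hasShift is expanded to the
% full URI 'http://www.nmrshiftdb.org/onto#hasShift' when this clause
% is compiled.
peak_shift(Peak, Shift) :-
    rdf(Peak, nmr:hasShift, Shift).
```

Because the expansion happens at compile time, the prefix should be registered before the clauses that use it are loaded.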
Regular expressions were, of course, the tool of choice :), and the ones I used are listed below. I wrote and ran them in the Kate text editor (I installed it on my Ubuntu machine even though it is a KDE program, since AFAIK it is the only editor with decent regex support).
Find pattern (URIs whose local name follows a '#'):
^(\'.*\#)(.*)(\')$
Replace pattern:
has\2( Subject, Object ) :-\n rdf_db:rdf( Subject, \0, Object).
Find pattern (run after the first one; URIs whose local name follows the last '/'):
^(\'.*\/)([^/]*)(\')$
Replace pattern:
has\2( Subject, Object ) :-\n rdf_db:rdf( Subject, \0, Object).
After running these regexes, I had the following neat list of convenience Prolog methods (I only had to upper-case the first letter after 'has' by hand, and remove a few doubled 'has' prefixes):
hasType( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', Object ).
hasMoleculeId( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#moleculeId', Object ).
hasInchi( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://www.blueobelisk.org/chemistryblogs/inchi', Object ).
hasInchikey( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://www.blueobelisk.org/chemistryblogs/inchikey', Object ).
sameAs( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://www.w3.org/2002/07/owl#sameAs', Object ).
hasShift( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#hasShift', Object ).
hasPeak( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#hasPeak', Object ).
hasCasnumber( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://www.blueobelisk.org/chemistryblogs/casnumber', Object ).
hasTitle( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://purl.org/dc/elements/1.1/title', Object ).
hasSpectrumId( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#spectrumId', Object ).
hasSpectrumType( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#spectrumType', Object ).
hasTemperature( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#temperature', Object ).
hasHomepage( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://xmlns.com/foaf/0.1/homepage', Object ).
hasSpectrum( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#hasSpectrum', Object ).
hasSolvent( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#solvent', Object ).
hasField( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://www.nmrshiftdb.org/onto#field', Object ).
hasSource( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://purl.org/dc/elements/1.1/source', Object ).
hasDoi( Subject, Object ) :-
    rdf_db:rdf( Subject, 'http://purl.org/ontology/bibo/doi', Object ).
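The editor round trip could also be replaced by Prolog itself. The following is a minimal sketch that uses plain atom built-ins instead of regular expressions (a different technique from the regexes above). It takes the text after the last '#', or after the last '/' if there is no '#', upper-cases its first letter, prefixes has unless the name already starts with has, and asserts a wrapper clause. Unlike the hand-edited list, it yields hasSameAs for owl:sameAs. All predicate names in the sketch are hypothetical.

```prolog
:- use_module(library(lists)).
:- use_module(library(semweb/rdf_db)).

% local_name(+URI, -Local): text after the last '#', or after the
% last '/' when the URI contains no '#' (the split the two regexes make).
local_name(URI, Local) :-
    atomic_list_concat(HashParts, '#', URI),
    (   HashParts = [_, _|_]                  % at least one '#'
    ->  last(HashParts, Local)
    ;   atomic_list_concat(SlashParts, '/', URI),
        last(SlashParts, Local)
    ).

% capitalise(+Atom, -Capitalised): upper-case the first character only.
capitalise(Atom, Capitalised) :-
    sub_atom(Atom, 0, 1, _, First),
    sub_atom(Atom, 1, _, 0, Rest),
    upcase_atom(First, UpFirst),
    atom_concat(UpFirst, Rest, Capitalised).

% wrapper_name(+URI, -Name): 'has' + capitalised local name, without
% doubling a 'has' that is already there (hasShift stays hasShift).
wrapper_name(URI, Name) :-
    local_name(URI, Local),
    (   sub_atom(Local, 0, _, _, has)
    ->  Name = Local
    ;   capitalise(Local, Capitalised),
        atom_concat(has, Capitalised, Name)
    ).

% define_wrapper(+URI): assert Name(Subject, Object) :- rdf_db:rdf(Subject, URI, Object).
define_wrapper(URI) :-
    wrapper_name(URI, Name),
    Head =.. [Name, Subject, Object],
    assertz((Head :- rdf_db:rdf(Subject, URI, Object))).
```

For example, ?- define_wrapper('http://purl.org/ontology/bibo/doi'). defines hasDoi/2 as a dynamic predicate. Asserting the clauses avoids keeping a generated source file in sync with the data, at the cost of the wrappers no longer being visible in the source.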
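That is not the whole story, though. Some objects are stored as compound Prolog terms, such as literal(Value) or literal(type(Type, Value)), and querying them becomes simpler if those terms are unwrapped, so I created the following methods for those cases. Note that hasTitle/2 and hasShift/2 also appear in the generic list above; if both versions are loaded together, each query returns both the wrapped and the unwrapped form.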
% Variants that unwrap the literal(...) terms stored as objects.
hasInchiKey( MoleculeID, InchiKey ) :-
    rdf_db:rdf( MoleculeID, 'http://www.blueobelisk.org/chemistryblogs/inchikey', literal(InchiKey) ).
hasTitle( MoleculeID, Title ) :-
    rdf_db:rdf( MoleculeID, 'http://purl.org/dc/elements/1.1/title', literal(Title) ).
hasShift( Peak, ShiftValueLiteral ) :-
    rdf_db:rdf( Peak, 'http://www.nmrshiftdb.org/onto#hasShift', literal(type('nmr:ppm', ShiftValueLiteral)) ).

% The moleculeId literal is converted to a number.
hasMoleculeID( MoleculeID, MoleculeIDValue ) :-
    rdf_db:rdf( MoleculeID, 'http://www.nmrshiftdb.org/onto#moleculeId', literal(MoleculeIDLiteral) ),
    atom_number_create( MoleculeIDLiteral, MoleculeIDValue ).

% True if Peak has a shift within 0.0001 ppm of ShiftValue.
hasShiftCloseTo( Peak, ShiftValue ) :-
    rdf_db:rdf( Peak, 'http://www.nmrshiftdb.org/onto#hasShift', literal(type('nmr:ppm', ValueLiteral)) ),
    atom_number_create( ValueLiteral, Value ),
    Value >= ShiftValue - 0.0001,
    Value =< ShiftValue + 0.0001.

% Convert an atom to a number; an empty atom yields 0.
atom_number_create( Atom, Number ) :-
    (   atom_length( Atom, AtomLength ), AtomLength > 0
    ->  atom_number( Atom, Number )
    ;   Number = 0
    ).
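Methods like these pay off once they are chained to answer a question. A minimal sketch, assuming that a molecule links to its spectra through hasSpectrum, a spectrum to its peaks through hasPeak, and a peak to its shift through hasShift; the direction of these links is inferred from the property names, not checked against the data. To stay self-contained it calls rdf/3 directly and turns the fixed 0.0001 window of hasShiftCloseTo/2 into a Tolerance parameter. The names molecule_with_shift_near/3 and shift_number/2 are hypothetical.

```prolog
:- use_module(library(semweb/rdf_db)).

% shift_number(+Lexical, -Number): lexical form to number; an empty
% atom gives 0, mirroring atom_number_create/2 above.
shift_number(Lexical, Number) :-
    (   atom_length(Lexical, Len), Len > 0
    ->  atom_number(Lexical, Number)
    ;   Number = 0
    ).

% molecule_with_shift_near(+Target, +Tolerance, -Molecule)
% ASSUMPTION: molecule -hasSpectrum-> spectrum -hasPeak-> peak -hasShift-> literal.
molecule_with_shift_near(Target, Tolerance, Molecule) :-
    rdf(Peak, 'http://www.nmrshiftdb.org/onto#hasShift',
        literal(type('nmr:ppm', Lexical))),
    shift_number(Lexical, Value),
    abs(Value - Target) =< Tolerance,     % numeric filter before the joins
    rdf(Spectrum, 'http://www.nmrshiftdb.org/onto#hasPeak', Peak),
    rdf(Molecule, 'http://www.nmrshiftdb.org/onto#hasSpectrum', Spectrum).
```

A molecule with several matching peaks is returned once per peak; wrapping the call in distinct/2 or setof/3 gives each molecule only once.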
Comments
Another trick is to use an RDFS schema to generate Prolog predicates.
You can do this in blip as follows:
See biopax3_db and biopax3_bridge_from_rdf in the blip distro, and also:
http://blipkit.wordpress.com/2009/11/26/exploring-pathway-data/
for a discussion of native Prolog predicates compared with Prolog access via the semweb library.
Thanks a lot for the tips, will look at it!