I needed to convert a bunch of chemical compound names into International Chemical Identifiers (InChIs), to make it easy to create links to various web services and databases that take InChIs as input, such as ChEMBL.
I found the very useful Chemical Translation Service, which has nice GUIs for doing this manually. To do it in a more automated fashion for many compounds, though, I realized I'd have to script it up a bit (in Python, of course).
I decided to make use of the XML format of the translation service. I have had mixed experiences with both messing with URLs and parsing XML in Python before, so I was very happy to get to know two new Python packages that focus on providing a straightforward API that is "usable to humans": requests and xmltodict.
They turned out to be a great combination, and IMO the conversion becomes a quite readable bunch of code lines:
    import requests
    import xmltodict

    # The compound name to look up
    query_compound_name = "phenobarbital"

    # Base URL of the Chemical Translation Service
    base_url = "http://cts.fiehnlab.ucdavis.edu/transform/transform"

    # Create a dictionary with the query parameters
    query_params = {"format": "xml",
                    "extension": "xml",
                    "to": "inchikey",
                    "idValue": query_compound_name,
                    "from": "name"}

    # Execute the query
    response = requests.get(base_url, params=query_params)

    # Parse the XML into a nested Python dict structure
    xmldict = xmltodict.parse(response.text)

    # Extract the InChIKey from the dict structure
    chem_data = xmldict['compoundResultSets']['compoundResultSet']
    inchi_key = chem_data['inchiHashKey']
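One thing to watch for with xmltodict: an element that appears once is parsed into a dict, repeated sibling elements become a list, and an empty element becomes None. So the direct lookup above could break if the service returns several result sets, or none. Here is a minimal sketch of a more defensive extraction, assuming the same element names as above. The helper name `extract_inchi_keys` is my own invention, and the sample dicts are hand-written stand-ins for xmltodict output, not real responses from the service:

```python
def extract_inchi_keys(xmldict):
    """Return all InChIKeys found in a parsed response, as a list.

    Assumes the structure used above:
    compoundResultSets -> compoundResultSet -> inchiHashKey
    """
    root = xmldict.get('compoundResultSets') or {}
    result_sets = root.get('compoundResultSet') or []
    # xmltodict yields a dict for a single element and a list for repeats
    if isinstance(result_sets, dict):
        result_sets = [result_sets]
    return [rs['inchiHashKey'] for rs in result_sets
            if rs.get('inchiHashKey')]


# Hand-written stand-ins for what xmltodict.parse() would return
single = {'compoundResultSets': {'compoundResultSet':
          {'inchiHashKey': 'DDBREPKUVSBGFI-UHFFFAOYSA-N'}}}
empty = {'compoundResultSets': None}

print(extract_inchi_keys(single))
print(extract_inchi_keys(empty))
```

Always getting a list back means the calling code does not need to care how many hits there were; it can just check whether the list is empty.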
And, why not make it complete with command line flags and stuff:
    import requests
    import xmltodict
    import optparse
    import sys


    def main():
        # Get the query compound name from the command line parameters
        opts = parse_commandline_opts()
        query_compound_name = opts.query_string

        # Base URL of the Chemical Translation Service
        base_url = "http://cts.fiehnlab.ucdavis.edu/transform/transform"

        # Create a dictionary with the query parameters
        query_params = {"format": "xml",
                        "extension": "xml",
                        "to": "inchikey",
                        "idValue": query_compound_name,
                        "from": "name"}

        # Execute the query
        response = requests.get(base_url, params=query_params)

        # Parse the XML into a nested Python dict structure
        xmldict = xmltodict.parse(response.text)

        # Extract the InChIKey from the dict structure
        chem_data = xmldict['compoundResultSets']['compoundResultSet']
        inchi_key = chem_data['inchiHashKey']

        # Parenthesized form works on both Python 2 and Python 3
        print("Inchi key: %s" % inchi_key)


    def parse_commandline_opts():
        op = optparse.OptionParser()
        op.add_option("-q", "--query-string",
                      help="Query string. Should be a chemical name " +
                           "such as 'phenobarbital'")
        opts, args = op.parse_args()

        if not opts.query_string:
            sys.exit("You have to specify a chemical name! " +
                     "Use -h flag to view options")

        return opts


    # Execute main function if this file is the starting point
    if __name__ == '__main__':
        main()
Then I can call the script like so, to get an InChIKey from a chemical name:
    [samuel inchi_conv]$ python inchi_conv.py -q phenobarbital
    Inchi key: DDBREPKUVSBGFI-UHFFFAOYSA-N
    [samuel inchi_conv]$ python inchi_conv.py -h
    Usage: inchi_conv.py [options]

    Options:
      -h, --help            show this help message and exit
      -q QUERY_STRING, --query-string=QUERY_STRING
                            Query string. Should be a chemical name such as
                            'phenobarbital'
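Since the whole point was to convert many compounds, the script could be wrapped in a small batch loop. The following is only a rough sketch: `convert_names` and `fake_lookup` are hypothetical names of my own, and the lookup function is passed in as a parameter so that the example runs without network access. In real use, the lookup would be a function that does the requests/xmltodict query shown earlier and returns the key.

```python
import time


def convert_names(names, lookup, delay=0.0):
    """Map each compound name to its InChIKey, or None if lookup fails."""
    results = {}
    for name in names:
        try:
            results[name] = lookup(name)
        except Exception as err:
            # Keep going; one bad name should not abort the whole batch
            print("Could not convert %r: %s" % (name, err))
            results[name] = None
        # Optional pause between requests, to be gentle on the service
        time.sleep(delay)
    return results


# Stand-in lookup so the example runs without network access
def fake_lookup(name):
    known = {"phenobarbital": "DDBREPKUVSBGFI-UHFFFAOYSA-N"}
    return known[name]  # raises KeyError for unknown names


print(convert_names(["phenobarbital", "not-a-compound"], fake_lookup))
```

Recording None for failed names, rather than stopping, makes it easy to see afterwards which compounds need a manual look in the GUI.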