Using the Chemical Translation Service with Python to get InChIKeys from chemical names

I needed to convert a bunch of chemical compound names into International Chemical Identifiers (InChIs), to make it easy to create links to various web services and databases that take InChIs as input, such as ChEMBL.

I found the very useful Chemical Translation Service, which has nice GUIs for doing this manually. To do it in a more automated fashion for many compounds, though, I realized I'd have to script it up a bit (in Python, of course).

I decided to make use of the XML format of the translation service. I have had mixed experiences with both messing with URLs and parsing XML in Python before, so I was very happy to get to know two Python packages that focus on providing a straightforward API that is "usable by humans": requests and xmltodict.

They turned out to be a great combination, and IMO the conversion becomes a quite readable bunch of code lines:

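import requests
import xmltodict

# Example compound name to look up
query_compound_name = "phenobarbital"

# Base URL of the Chemical Translation Service
base_url = "http://uranus.fiehnlab.ucdavis.edu:8080/cts/transform/transform"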
 
# Create a dictionary with the query parameters
query_params = {"format": "xml",
                "extension": "xml",
                "to": "inchikey",
                "idValue": query_compound_name,
                "from": "name"}
 
# Execute the query
response = requests.get(base_url, params=query_params)
 
# Parse the XML into a python dict (array) structure
xmldict = xmltodict.parse(response.text)
 
# Extract the Inchi key from the array structure
chem_data = xmldict['compoundResultSets']['compoundResultSet']
inchi_key = chem_data['inchiHashKey']
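
The two indexing lines above assume that the service returns exactly one result. As a minimal sketch (not part of the original script), a small helper could cope with zero or several matches instead of raising an error. It works on the dict that xmltodict.parse returns and relies on xmltodict turning repeated elements into a list. The helper name extract_inchi_keys and the hand-built sample dicts are made up for illustration, and the real response may well carry more fields than the three element names used here.

```python
def extract_inchi_keys(xmldict):
    """Return all InChIKeys found in a parsed CTS response as a list.

    Hypothetical helper: assumes only the element names used in the
    snippet above (compoundResultSets, compoundResultSet, inchiHashKey).
    """
    root = xmldict.get('compoundResultSets') or {}
    result_sets = root.get('compoundResultSet')
    if result_sets is None:
        return []  # no match for the query
    # xmltodict gives a dict for one element and a list for several
    if not isinstance(result_sets, list):
        result_sets = [result_sets]
    return [rs['inchiHashKey'] for rs in result_sets
            if isinstance(rs, dict) and rs.get('inchiHashKey')]


# Hand-built stand-ins for what xmltodict.parse might return
single = {'compoundResultSets': {'compoundResultSet':
          {'inchiHashKey': 'DDBREPKUVSBGFI-UHFFFAOYSA-N'}}}
empty = {'compoundResultSets': None}

print(extract_inchi_keys(single))
print(extract_inchi_keys(empty))
```

Returning a list in every case means the calling code does not have to care whether a name matched one compound, several, or none.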

And why not make it complete with command-line flags and stuff:

import requests
import xmltodict
import optparse
import sys
 
def main():
    # Get the query compound name from the command line parameters
    opts = parse_commandline_opts()
    query_compound_name = opts.query_string
 
    # Base URL of the Chemical Translation Service
    base_url = "http://uranus.fiehnlab.ucdavis.edu:8080/cts/transform/transform"
 
    # Create a dictionary with the query parameters
    query_params = {"format": "xml",
                    "extension": "xml",
                    "to": "inchikey",
                    "idValue": query_compound_name,
                    "from": "name"}
 
    # Execute the query
    response = requests.get(base_url, params=query_params)
 
    # Parse the XML into a python dict (array) structure
    xmldict = xmltodict.parse(response.text)
 
    # Extract the Inchi key from the array structure
    chem_data = xmldict['compoundResultSets']['compoundResultSet']
    inchi_key = chem_data['inchiHashKey']
 
    print("Inchi key: %s" % inchi_key)
 
def parse_commandline_opts():
    op = optparse.OptionParser()
    op.add_option("-q", "--query-string",
                  help="Query string. Should be a chemical name " +
                       "such as 'phenobarbital'")
    opts, args = op.parse_args()
    if not opts.query_string:
        sys.exit("You have to specify a chemical name! " +
                 "Use the -h flag to view options")
    return opts
 
# Execute main function if this file is the starting point
if __name__ == '__main__':
    main()

Then I can call the script like so, to get an InChIKey from a chemical name:

[samuel inchi_conv]$ python inchi_conv.py -q phenobarbital
Inchi key: DDBREPKUVSBGFI-UHFFFAOYSA-N
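
Since the goal was to convert many compounds, the single lookup can be wrapped in a loop. What follows is only a sketch under assumptions: convert_names and its lookup argument are hypothetical names, and lookup stands for any function that performs the request and parsing steps of the script for one name. A stand-in lookup backed by a fixed table is used here so that the example runs without contacting the service.

```python
def convert_names(names, lookup):
    """Map each chemical name to an InChIKey, or None if the lookup fails.

    `lookup` is any callable taking a name and returning a key; in
    practice it would wrap the requests/xmltodict steps of the script.
    """
    results = {}
    for name in names:
        name = name.strip()
        if not name:
            continue  # skip blank lines, e.g. from a text file
        try:
            results[name] = lookup(name)
        except (KeyError, TypeError):
            # Raised when the parsed response lacks the expected fields
            results[name] = None
    return results


# Stand-in lookup backed by a fixed table instead of the web service
known = {"phenobarbital": "DDBREPKUVSBGFI-UHFFFAOYSA-N"}
keys = convert_names(["phenobarbital", "", "not-a-compound"],
                     known.__getitem__)
for name, key in keys.items():
    print("%s\t%s" % (name, key))
```

Recording None for a failed name, rather than stopping, lets one bad entry in a long list pass without losing the rest of the run.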

... and of course it includes a nice help screen by default (created by the optparse module):

[samuel inchi_conv]$ python inchi_conv.py -h
Usage: inchi_conv.py [options]
 
Options:
  -h, --help            show this help message and exit
  -q QUERY_STRING, --query-string=QUERY_STRING
                        Query string. Should be a chemical name such as
                        'phenobarbital'
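
The standard library also ships argparse, a newer alternative to optparse. As a rough sketch of how the same option could look there (this is not the script's code; the description text is invented, and the argv parameter is added only so the function can be called without a real command line), argparse can enforce the required flag by itself:

```python
import argparse


def parse_commandline_opts(argv=None):
    """Parse the -q/--query-string option; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Look up the InChIKey for a chemical name")
    parser.add_argument("-q", "--query-string", required=True,
                        help="Query string. Should be a chemical name "
                             "such as 'phenobarbital'")
    return parser.parse_args(argv)


opts = parse_commandline_opts(["-q", "phenobarbital"])
print(opts.query_string)
```

With required=True, a missing -q makes argparse print a usage message and exit, so the manual check with sys.exit is no longer needed.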