
Using the Chemical Translation Service with Python to get InChIs from chemical names

I needed to convert a bunch of chemical compound names into International Chemical Identifiers (InChIs), to make it easy to create links to various web services and databases that take InChIs as input, such as ChEMBL.

I found the very useful Chemical Translation Service, which has nice GUIs for doing this manually. To do this in a more automated fashion for many compounds, though, I realized I'd have to script it up a bit (in Python, of course).

I decided to make use of the XML format of the translation service. I have had mixed experiences with both messing with URLs and parsing XML in Python before, so I was very happy to get to know two new Python packages that focus on providing a straightforward API that is "usable to humans": requests and xmltodict.
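To give a rough idea of the URL fiddling that requests takes off your hands, here is a minimal sketch that assembles a query URL by hand with the standard library's urllib.parse.urlencode. The base URL and parameter names are the ones used in the query further down; the compound name "acetic acid" and the helper name build_query_url are just placeholders for illustration.

```python
from urllib.parse import urlencode


def build_query_url(base_url, params):
    """Append URL-encoded query parameters to a base URL."""
    return base_url + "?" + urlencode(params)


# Same service URL and parameter names as in the query further down;
# "acetic acid" is only an example value for the compound name.
base_url = "http://cts.fiehnlab.ucdavis.edu/transform/transform"
params = {
    "format": "xml",
    "extension": "xml",
    "to": "inchikey",
    "idValue": "acetic acid",
    "from": "name",
}

url = build_query_url(base_url, params)
print(url)
```

Note how the space in the compound name has to be escaped. With requests, passing the same dictionary as params= does this kind of encoding behind the scenes.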

The two turned out to be a great combination, and IMO the conversion becomes a quite readable bunch of code lines:

import requests
import xmltodict

# Name of the compound to look up (example value)
query_compound_name = "caffeine"

# Base URL of the Chemical Translation Service
base_url = "http://cts.fiehnlab.ucdavis.edu/transform/transform"

# Create a dictionary with the query parameters
query_params = {"format": "xml",
                "extension": "xml",
                "to": "inchikey",
                "idValue": query_compound_name,
                "from": "name"}

# Execute the query and fail early on HTTP errors
response = requests.get(base_url, params=query_params)
response.raise_for_status()

# Parse the XML into a nested Python dict structure
xmldict = xmltodict.parse(response.text)

# Extract the InChI key from the dict structure
chem_data = xmldict['compoundResultSets']['compoundResultSet']
inchi_key = chem_data['inchiHashKey']
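One thing to watch out for in the extraction step: xmltodict gives you a single dict when an element occurs once, a list of dicts when it is repeated, and None when it is empty, so a name that matches several compounds (or none) would presumably trip up the plain lookup above. Here is a minimal sketch of a more forgiving extraction, assuming the response layout is the one used above. The helper name extract_inchi_keys and the sample keys are made up for illustration, and plain dicts stand in for the output of xmltodict.parse.

```python
def extract_inchi_keys(xmldict):
    """Return a list of InChI keys from a parsed response (possibly empty)."""
    result_sets = xmldict.get("compoundResultSets") or {}
    results = result_sets.get("compoundResultSet")
    if results is None:
        # No match at all
        return []
    if isinstance(results, dict):
        # A single match arrives as a dict, not as a one-element list
        results = [results]
    return [r["inchiHashKey"] for r in results
            if isinstance(r, dict) and "inchiHashKey" in r]


# Hypothetical parsed responses (placeholder keys, not real InChI keys)
single = {"compoundResultSets": {"compoundResultSet": {"inchiHashKey": "KEY-ONE"}}}
multiple = {"compoundResultSets": {"compoundResultSet": [
    {"inchiHashKey": "KEY-ONE"},
    {"inchiHashKey": "KEY-TWO"},
]}}
empty = {"compoundResultSets": None}

print(extract_inchi_keys(single))    # ['KEY-ONE']
print(extract_inchi_keys(multiple))  # ['KEY-ONE', 'KEY-TWO']
print(extract_inchi_keys(empty))     # []
```

Always getting a list back means the calling code does not need to care how many compounds matched the name.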

And, why not make it complete with command line flags and stuff: