I needed to convert a bunch of chemical compound names into International Chemical Identifiers (InChIs), to make it easy to create links to various web services and databases that take InChIs as input, such as ChEMBL.
I found the very useful Chemical Translation Service, which has nice GUIs for doing this manually. To do this in a more automated fashion for many compounds, though, I realized I'd have to script it up a bit (in Python, of course).
I decided to use the XML format of the translation service. I have had mixed experiences with both handling URLs and parsing XML in Python before, so I was very happy to get to know two Python packages that focus on providing a straightforward API that is "usable to humans": requests and xmltodict.
They turned out to be a great combination, and IMO the conversion becomes a quite readable bunch of code lines:
    import requests
    import xmltodict

    # Name of the compound to look up (example value)
    query_compound_name = "aspirin"

    # Base URL of the Chemical Translation Service
    base_url = "http://cts.fiehnlab.ucdavis.edu/transform/transform"

    # Create a dictionary with the query parameters
    query_params = {
        "format": "xml",
        "extension": "xml",
        "to": "inchikey",
        "idValue": query_compound_name,
        "from": "name",
    }

    # Execute the query
    response = requests.get(base_url, params=query_params)

    # Parse the XML into a nested Python dict structure
    xmldict = xmltodict.parse(response.text)

    # Extract the InChIKey from the dict structure
    chem_data = xmldict['compoundResultSets']['compoundResultSet']
    inchi_key = chem_data['inchiHashKey']
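One thing to watch for with xmltodict: a single child element is parsed into a dict, repeated sibling elements are parsed into a list, and an empty element becomes None. I am only assuming here that the service might return zero or several compoundResultSet elements for some names; the helper below (extract_inchi_keys is a made-up name, not part of either package) is a minimal sketch that normalizes those cases so the lookup does not break on them:

```python
def extract_inchi_keys(xmldict):
    """Return all InChIKeys found in a parsed response, as a list.

    Assumes the same element names as the snippet above
    (compoundResultSets / compoundResultSet / inchiHashKey).
    """
    # An empty root element is parsed by xmltodict as None
    root = xmldict.get("compoundResultSets") or {}
    result_sets = root.get("compoundResultSet") or []

    # A single element comes back as a dict, repeated elements as a list
    if isinstance(result_sets, dict):
        result_sets = [result_sets]

    return [rs["inchiHashKey"] for rs in result_sets if "inchiHashKey" in rs]


# Hand-written stand-in for a parsed response with one result
parsed = {
    "compoundResultSets": {
        "compoundResultSet": {"inchiHashKey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"}
    }
}
keys = extract_inchi_keys(parsed)
```

With this in place, the last two lines of the snippet above could be swapped for a call to extract_inchi_keys(xmldict), taking the first entry if the list is not empty.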
And why not make it complete with command-line flags and stuff: