Company
Date Published
Author
Tom Nijhof
Word count
1388
Language
English
Hacker News points
None

Summary

The author, a back-end developer at CytoSMART, sets out to update a graph database with chemical synonyms from PubChem. The author has already downloaded 197M nodes of chemical synonyms but finds that this is still too few, and wants to connect compounds from the NCI (National Cancer Institute) to the database by using NSC numbers as synonyms. To do this, the author uses two endpoints provided by PubChem: PUG (Power User Gateway) and RDF (Resource Description Framework). A back-end function queries both APIs and combines the results into a list of compounds with their corresponding synonyms. The author then uses this list to update the database, removing incorrect synonyms and adding correct ones. Finally, the method is applied to the NCI60 data, which contains 56,685 unique NSC numbers; after the update, 55,578 of the 56,685 NSC numbers match at least one synonym of a compound.
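
The update and matching steps described above can be illustrated with a minimal Python sketch. It is not the author's code: the function names `diff_synonyms` and `nsc_coverage`, the in-memory sets and dictionary standing in for the graph database, and the sample identifiers are all hypothetical, and the sketch assumes the synonyms have already been fetched from PubChem.

```python
from typing import Dict, Iterable, Set, Tuple


def diff_synonyms(stored: Set[str], fetched: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare the synonyms stored for one compound with freshly fetched ones.

    Returns (to_add, to_remove): synonyms missing from the database and
    synonyms in the database that the fetched data no longer supports.
    """
    to_add = fetched - stored
    to_remove = stored - fetched
    return to_add, to_remove


def nsc_coverage(nsc_numbers: Iterable[str], synonym_index: Dict[str, int]) -> Tuple[int, int]:
    """Count how many unique NSC numbers match at least one known synonym.

    synonym_index is a hypothetical stand-in for the database: it maps a
    synonym string to a compound identifier.
    Returns (matched, total_unique).
    """
    unique = set(nsc_numbers)
    matched = sum(1 for nsc in unique if nsc in synonym_index)
    return matched, len(unique)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    to_add, to_remove = diff_synonyms({"name-a", "stale-name"}, {"name-a", "NSC-1"})
    print(sorted(to_add), sorted(to_remove))

    matched, total = nsc_coverage(["NSC-1", "NSC-2", "NSC-1"], {"NSC-1": 10})
    print(f"{matched} of {total} NSC numbers matched ({matched / total:.1%})")
```

Applied to the figures in the summary, the same ratio gives 55,578 / 56,685, or roughly 98.0% of the unique NSC numbers matched.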