Frequently Asked Questions
  1. Withdrawn 2.0 in numbers
  2. System Requirements
  3. What is the ATC code?
  4. How can I use the structure search options?
  5. How was the side-effect mapping done?
  6. How were the drug synonym lists for the FAERS mapping obtained?
  7. How is the toxicity class of a compound determined?
  8. How are the activity values for the mechanism of action determined?
  9. How was the pathway enrichment analysis performed?
    1. What is the p-value?
    2. What is the e-value?
  10. Why do mechanism of action prediction and pathway enrichment analysis sometimes give different ChEMBL compounds as results?
  11. How large is the overlap from Withdrawn 2.0 to ChEMBL withdrawn compounds?
  12. Is there a bulk download?

Withdrawn 2.0 in numbers

Withdrawn 2.0 is a database containing 626 withdrawn drugs, along with information regarding their withdrawal history, toxicity, side-effects etc.
Additionally, 409 unique targets of withdrawn drugs were extracted from ChEMBL assays and filtered regarding their reliability and minimal activity to withdrawn compounds, leading to 1628 unique drug-target-relationships.
For 275 KEGG pathways containing druggable targets, a pathway enrichment analysis is available, and the ATC tree contains 512 withdrawn drugs, for which a level 5 ATC code is available.

System Requirements


We recommend a recent version of Mozilla Firefox, Microsoft Internet Explorer or Google Chrome. For the ChemDoodle web component (used on pages where a similarity search is performed), Internet Explorer 9 or newer is recommended. With an older version of Internet Explorer, Google Chrome Frame is required for ChemDoodle to display correctly. In addition, JavaScript has to be enabled.

What is the ATC code?

The Anatomical Therapeutic Chemical (ATC) classification system is used for the classification of drugs. It is published by the World Health Organization (WHO). The classification into groups is based on therapeutic and chemical characteristics of the drugs. Each ATC code is divided into 5 levels:

  • 1st level: Anatomical main group
  • 2nd level: Therapeutic main group
  • 3rd level: Therapeutic/pharmacological subgroup
  • 4th level: Chemical/therapeutic/pharmacological subgroup
  • 5th level: Chemical substance

Substances or combinations of substances in the 5th level refer to a single indication. Drugs with more than one indication are assigned more than one ATC code. Aspirin, for example, has 3 ATC codes assigned.
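The five levels can be read directly from the characters of a level 5 code. The following is a minimal sketch, assuming the standard ATC layout of one letter, two digits, two letters and two digits; the function name `split_atc` is made up for this illustration, and B01AC06 is used as an example of a code commonly listed for acetylsalicylic acid.

```python
import re

# Assumed layout of a level 5 ATC code: letter, 2 digits, letter, letter, 2 digits.
ATC_PATTERN = re.compile(r"^([A-Z])(\d{2})([A-Z])([A-Z])(\d{2})$")

# Level names as given in the list above.
LEVEL_NAMES = [
    "Anatomical main group",
    "Therapeutic main group",
    "Therapeutic/pharmacological subgroup",
    "Chemical/therapeutic/pharmacological subgroup",
    "Chemical substance",
]


def split_atc(code):
    """Map each level name to the cumulative prefix of a level 5 ATC code."""
    match = ATC_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a level 5 ATC code: {code!r}")
    prefixes = []
    current = ""
    for part in match.groups():
        current += part  # each level extends the previous prefix
        prefixes.append(current)
    return dict(zip(LEVEL_NAMES, prefixes))


if __name__ == "__main__":
    for name, prefix in split_atc("B01AC06").items():
        print(f"{prefix:<8}{name}")
```

Each prefix identifies one node of the ATC tree, so two drugs share a branch down to the level at which their prefixes first differ.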


How can I use the structure search options?


The option to use a user-provided molecular structure is available in several tabs (drug search, mechanism of action and pathways). As a first step, a molecular structure has to be loaded in the ChemDoodle web interface. Structures can be obtained by entering a PubChem name, entering a SMILES string, loading a structure file or drawing with the provided tools (see below).
Once a structure is loaded, it can be modified further. When you are satisfied with the result, use the "Start Similarity Search" button to start the search/predictions.

How was the side-effect mapping done?


The FAERS database is a reporting system where medical professionals, patients, lawyers and others can submit side-effects that occurred during the use of drugs. For the displayed side-effects, all case reports in FAERS were filtered in two steps: first, only reports submitted by medical professionals were kept, and second, only cases where the specific drug was the primary suspect for the occurring reaction. Side-effects were assigned to withdrawn drugs by collecting every report concerning a marketed drug that contains the specific withdrawn compound. Lastly, the occurring reactions were mapped to MedDRA terms to filter out duplicates.
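The filtering steps described above can be sketched on toy data as follows. This is only an illustration under stated assumptions: the field names, reporter labels, product names and the reaction-to-term table are all hypothetical placeholders and do not reflect the actual FAERS or MedDRA files or the code used for the database.

```python
# Hypothetical labels for reports submitted by medical professionals.
MEDICAL_REPORTERS = {"physician", "pharmacist"}

# Hypothetical mapping from marketed product to its ingredients.
PRODUCT_INGREDIENTS = {
    "brand_x": {"compound_a"},
    "combo_y": {"compound_a", "compound_b"},
}

# Hypothetical mapping from reported reaction text to a preferred term.
REACTION_TERMS = {"nausea": "Nausea", "feeling sick": "Nausea", "rash": "Rash"}


def side_effects_for(compound, reports):
    """Collect de-duplicated reaction terms for one withdrawn compound."""
    terms = set()
    for report in reports:
        if report["reporter"] not in MEDICAL_REPORTERS:
            continue  # step 1: keep medical professionals only
        if report["role"] != "primary_suspect":
            continue  # step 2: keep primary-suspect drugs only
        if compound not in PRODUCT_INGREDIENTS.get(report["product"], set()):
            continue  # step 3: product must contain the compound
        term = REACTION_TERMS.get(report["reaction"].lower())
        if term is not None:
            terms.add(term)  # step 4: mapping to terms removes duplicates
    return sorted(terms)


TOY_REPORTS = [
    {"reporter": "physician", "product": "brand_x", "role": "primary_suspect", "reaction": "Nausea"},
    {"reporter": "physician", "product": "combo_y", "role": "primary_suspect", "reaction": "feeling sick"},
    {"reporter": "consumer", "product": "brand_x", "role": "primary_suspect", "reaction": "Rash"},
    {"reporter": "pharmacist", "product": "brand_x", "role": "concomitant", "reaction": "Rash"},
    {"reporter": "pharmacist", "product": "brand_x", "role": "primary_suspect", "reaction": "Rash"},
]

if __name__ == "__main__":
    print(side_effects_for("compound_a", TOY_REPORTS))
    print(side_effects_for("compound_b", TOY_REPORTS))
```

In the toy data, the two differently worded nausea reports collapse into one term, while the consumer report and the concomitant-drug report are discarded.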

How were the drug synonym lists for the FAERS mapping obtained?


To map withdrawn drugs to FAERS drugs, a collection of drug names, synonyms, products and brands was created and mapped to the corresponding ingredients (in the case of combination drugs). Synonym data was collected from TTD (single drugs and drugs identified as combinations), DrugBank (drugs, synonyms, external identifiers, products, mixtures, international brands), NDA (tradenames) and a FAERS list of explicit combinations of drug names that often occur in FAERS data.
As an example, you can download the resulting synonym list for the drug Ethinylestradiol.

How is the toxicity class of a compound determined?

ProTox-II is the first freely available web server for toxicity prediction based on chemical similarity and the identification of toxic fragments and demonstrates good performance in comparison to available QSAR-based methods. ProTox predicts the median oral lethal doses (LD50 values) and toxicity classes in rodents. In addition to the oral toxicity prediction, the web server indicates possible toxicity targets based on a collection of protein-ligand-based pharmacophores ('toxicophores') and therefore provides suggestions for the mechanism of toxicity development. However, the absence of such toxicity prediction or alert for a compound should not be taken as an indication of safety.

How are the activity values for the mechanism of action determined?

From the ChEMBL 32 database, all interactions between ChEMBL compounds and human targets were extracted. Furthermore, activities were normalized to nanomolar values and for each unique compound-target pair, the minimum activity value was determined.
Afterwards, a similarity comparison to Withdrawn 2.0 compounds was performed, using Morgan fingerprints of length 128 and radius 2.
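The normalization and minimum step can be illustrated with a short sketch. It assumes that activities arrive as (compound, target, value, unit) records and that only molar concentration units are converted; the unit table, the function name `min_activity_nm` and the toy identifiers are assumptions made for this example, and the fingerprint comparison is not covered here.

```python
# Assumed conversion factors from common molar units to nanomolar.
UNIT_TO_NM = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "mM": 1e6, "M": 1e9}


def min_activity_nm(records):
    """Return the minimum activity in nM for each (compound, target) pair.

    Records with units that cannot be converted to nM are skipped.
    """
    best = {}
    for compound, target, value, unit in records:
        factor = UNIT_TO_NM.get(unit)
        if factor is None:
            continue  # e.g. percentages cannot be normalized to nM
        value_nm = value * factor
        key = (compound, target)
        if key not in best or value_nm < best[key]:
            best[key] = value_nm
    return best


# Toy records with placeholder identifiers.
TOY_RECORDS = [
    ("COMPOUND_A", "TARGET_1", 0.5, "uM"),
    ("COMPOUND_A", "TARGET_1", 120, "nM"),
    ("COMPOUND_A", "TARGET_2", 2, "uM"),
    ("COMPOUND_B", "TARGET_1", 30, "%"),
]

if __name__ == "__main__":
    for pair, value in sorted(min_activity_nm(TOY_RECORDS).items()):
        print(pair, value)
```

Taking the minimum keeps the strongest reported activity for each pair, since a lower concentration value means a more potent interaction.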

How was the pathway enrichment analysis performed?

The resources used were the ChEMBL 32 database, the UniProt database, KEGG pathways and a list of druggable genes from IDG, further enriched with 43 genes from TTD that are targets of approved drugs.
First, a table was created containing data from assays in which the organism is human. This table was filtered for real binders, taking into account a combination of binding strength, IC50/EC50 values and confidence scores. The ChEMBL IDs of the targets were mapped to UniProtKB IDs and then to KEGG IDs. Neither mapping is unique: a ChEMBL ID can be connected to more than one UniProtKB ID, and a UniProtKB ID can be mapped to more than one KEGG ID. In addition, not every target ChEMBL ID has a UniProtKB ID. The resulting table contains the unique relations between ChEMBL molecules and KEGG IDs of human targets.
All human KEGG IDs derived from the UniProt table were then submitted as a query to KEGG Mapper, which yielded a table of all relations between KEGG pathways and human genes. This table was filtered for KEGG pathways related to diseases and infections, leaving only human KEGG IDs that appear within disease pathways. For each of these entries, HGNC mapping was used to add the information whether the gene is druggable by small molecules.
From these data, the number of distinct druggable KEGG IDs related to both molecule and pathway was determined for each molecule and each disease pathway. Additionally, the number of distinct druggable KEGG IDs for each molecule, the number for each pathway, and the number appearing in any disease-related pathway were determined. These counts were then used to evaluate the probability that a specific compound acts on the evaluated pathways.
This strategy is not useful when searching for molecules for a specific pathway, because it returns a large number of molecules with the same number of targets within that pathway, so an additional score was needed for ranking. For this purpose, all molecules binding to any druggable target within the pathway were evaluated and ranked according to the binding value.

What is the p-value?

To calculate the p-value, the number of interactions of a compound with druggable genes and the share of these that fall within the given pathway are compared with the corresponding numbers for all druggable genes. The p-value gives the probability of observing at least that many interactions within the given pathway by chance alone. This probability is calculated from the cumulative hypergeometric distribution using R. In this way, a binding within a pathway containing only two druggable genes is ranked higher than a few bindings within pathways with many druggable genes, and a single binding in one pathway by a compound with many interactions is ranked lower.
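As an illustration, the upper tail of the hypergeometric distribution can be computed directly from binomial coefficients. This is a minimal sketch, not the R code used for the database; the function and parameter names are chosen for this example, and the gene counts in the demo are invented. One equivalent R call would be `phyper(overlap - 1, pathway_genes, total_genes - pathway_genes, compound_targets, lower.tail = FALSE)`, although the exact call used is not stated here.

```python
from math import comb


def enrichment_p_value(total_genes, pathway_genes, compound_targets, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    total_genes      -- druggable genes in any disease-related pathway
    pathway_genes    -- druggable genes in the pathway of interest
    compound_targets -- druggable genes the compound interacts with
    overlap          -- compound targets that lie in the pathway
    """
    if not (0 <= pathway_genes <= total_genes and 0 <= compound_targets <= total_genes):
        raise ValueError("pathway_genes and compound_targets must lie in [0, total_genes]")
    upper = min(pathway_genes, compound_targets)
    if not 0 <= overlap <= upper:
        raise ValueError("overlap must lie in [0, min(pathway_genes, compound_targets)]")
    outside = total_genes - pathway_genes
    # Sum the probabilities of seeing overlap, overlap + 1, ... hits in the pathway.
    tail = sum(
        comb(pathway_genes, i) * comb(outside, compound_targets - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, compound_targets)


if __name__ == "__main__":
    # Invented numbers: 100 druggable genes overall.
    print(enrichment_p_value(100, 2, 1, 1))    # one target, small pathway
    print(enrichment_p_value(100, 50, 1, 1))   # one target, large pathway
    print(enrichment_p_value(100, 2, 20, 1))   # promiscuous compound, small pathway
```

With these invented counts, a single hit in a two-gene pathway gives p = 0.02, the same hit in a fifty-gene pathway gives p = 0.5, and a compound with twenty targets hitting the two-gene pathway at least once gives p of about 0.36, which matches the ranking behaviour described above.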

What is the e-value?

The e-value gives the probability of finding an interaction within a pathway by chance when searching across a number of pathways. The e-values were calculated using the number of pathways with druggable targets and binding data in ChEMBL (275 pathways). For very small p-values, the p-value can simply be multiplied by the number of pathways; for larger values, the e-value has to be calculated with the formula e = 1 - (1 - p)^275.
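The relation between the exact formula and the small-p approximation can be checked numerically. The sketch below assumes the formula is read as one minus (1 - p) raised to the power of 275; the function names are made up for this example, and `log1p`/`expm1` are used only to keep very small p-values numerically accurate.

```python
import math

N_PATHWAYS = 275  # number of pathways stated in the text


def e_value(p, n_pathways=N_PATHWAYS):
    """Exact form: e = 1 - (1 - p) ** n_pathways."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 1.0:
        return 1.0
    # Equivalent to 1 - (1 - p) ** n, but stable for tiny p.
    return -math.expm1(n_pathways * math.log1p(-p))


def e_value_small_p(p, n_pathways=N_PATHWAYS):
    """Approximation for very small p: e is roughly p * n_pathways."""
    return p * n_pathways


if __name__ == "__main__":
    for p in (1e-6, 1e-4, 1e-2):
        print(p, e_value(p), e_value_small_p(p))
```

For p = 1e-6 both functions agree closely, whereas for p = 0.01 the product p * 275 already exceeds 1 and is no longer a probability, which is why the exact formula is needed for larger p-values.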

Why do mechanism of action prediction and pathway enrichment analysis sometimes give different ChEMBL compounds as results?

For the mechanism of action (MOA) prediction, associated human targets are taken from ChEMBL if they have a binding activity value of at most 10 000 nM and a sufficient confidence score. In contrast, pathway searches are performed on a subset of the MOA structures that must additionally have such activities against at least one druggable target. Therefore, it is possible that the pathway enrichment analysis has fewer hits than the MOA prediction.

How large is the overlap from Withdrawn 2.0 to ChEMBL withdrawn compounds?

Comparing the datasets of Withdrawn 2.0 and ChEMBL is a complex problem. While Withdrawn 2.0 uses only one entry for each drug, ChEMBL often has several entries for the same drug. For instance, ChEMBL contains METHAPYRILENE (CHEMBL1411979), METHAPYRILENE FUMARATE (CHEMBL3187246), and METHAPYRILENE HYDROCHLORIDE (CHEMBL1255739), while Withdrawn 2.0 only lists METHAPYRILENE (CHEMBL1411979). Furthermore, there are several levels of withdrawn structures within ChEMBL: structures having only the tag withdrawn_flag = 1 within the table “molecule_dictionary”, as well as structures with additional information within the table “drug_warning”, including data for withdrawal country, year and description. For each compound with the “withdrawn_flag” that is not contained in the table “drug_warning”, a different compound with an identical or near-identical structure is contained instead, which further complicates the mapping of withdrawn structures to additional withdrawal information. Last but not least, some entries are marked as approved and subsequently withdrawn although they are not actually approved drugs. For instance, METABROMSALAN (CHEMBL1801800) is a pesticide and DIBROMSALAN (CHEMBL21869) is a disinfectant used in soaps. Even more manual curation would therefore be needed to fully integrate the data.
Generally, ChEMBL 32 lists 4194 IDs with max_phase 4.0 as approved drugs, of which 3215 also have the year of first approval assigned. 262 of these IDs have the flag “withdrawn_flag = 1”, of which Withdrawn 2.0 lists 171 with the same ChEMBL ID. 225 different structures in the table “drug_warning” have the tag “withdrawn” - 141 of these structures are listed with the same ID in Withdrawn 2.0. In summary, at least 171 compounds of Withdrawn 2.0 are also listed as withdrawn in ChEMBL, but among the 262 ChEMBL structures with “withdrawn_flag” (which correspond to no more than 225 different drugs) there are further copies of structures that are contained in Withdrawn 2.0 under another ChEMBL ID.

Is there a bulk download?

Information on all 626 withdrawn drugs, including chemical properties such as SMILES, InChI and molecular formula, as well as withdrawal and toxicity information, is available as a CSV file download.