Estimating synthetic accessibility of drug-like molecules
2009 June 24
In a recent Journal of Cheminformatics paper, Peter Ertl and Ansgar Schuffenhauer propose a new score, SAscore (synthetic accessibility score), which estimates the ease of synthesis of drug-like molecules from fragment contributions and the molecular complexity of the given molecule.
In this study:
Fragment contributions have been calculated based on the analysis of one million representative molecules from PubChem and therefore one can say that they capture historical synthetic knowledge stored in this database. The molecular complexity score takes into account the presence of non-standard structural features, such as large rings, non-standard ring fusions, stereocomplexity and molecule size. The method has been validated by comparing calculated SAscores with ease of synthesis as estimated by experienced medicinal chemists for a set of 40 molecules. The agreement between calculated and manually estimated synthetic accessibility is very good with r2 = 0.89.
Several other computational approaches, either complexity-based or retrosynthesis-based, already exist for estimating the synthetic accessibility (SA) of molecules. The biggest bottleneck is validating their predictions, because no experimental measure defines SA. The only way to validate predicted SA values is to correlate them with a ranking of ease of synthesis provided by experienced medicinal chemists. Unfortunately, how chemists rank molecules or estimate their synthetic accessibility depends on human factors and varies from chemist to chemist. Several past studies have found that rankings of the same set of molecules by different experienced chemists are highly variable, which indicates that even experienced chemists differ in their estimates of ease of synthesis. From the validation point of view this new approach is not very different from the others: it also relies on rankings provided by experienced chemists. In this case nine chemists ranked 40 test molecules, and the agreement among them is quite good: r2 ranges between 0.450 and 0.892, with an average r2 over all “chemist pairs” of 0.718. The purpose of the new study is to develop a better score that exploits most of the available information about the molecule.
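To make the “chemist pairs” statistic concrete, here is a minimal Python sketch of how an r2 value for every pair of chemists, and the average over all pairs, could be computed. It assumes r2 means the squared Pearson correlation between two chemists' scores for the same molecules; the function names and the toy scores are made up for illustration and are not taken from the paper.

```python
from itertools import combinations
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def pairwise_r_squared(scores_by_chemist):
    """Return {(chemist_a, chemist_b): r2} for every unordered pair."""
    names = sorted(scores_by_chemist)
    return {
        (a, b): r_squared(scores_by_chemist[a], scores_by_chemist[b])
        for a, b in combinations(names, 2)
    }


# Hypothetical scores: three chemists rating the same four molecules.
toy_scores = {
    "A": [1, 2, 3, 4],
    "B": [1, 3, 2, 4],
    "C": [4, 3, 2, 1],
}
toy_pairs = pairwise_r_squared(toy_scores)
toy_average = mean(toy_pairs.values())
```

With nine chemists the same function would produce 36 pairs, and averaging their r2 values gives a single figure of the kind quoted above.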
SAscore is a hybrid complexity-based approach that lies somewhere between fast complexity-based methods and resource-intensive full retrosynthetic approaches. It is important to note that
Complexity-based methods use sets of rules to estimate complexity of target structures (features like presence of spiro-rings, non-standard ring fusions, or large number of stereocenters) which is then directly related to SA. The second group of methods is based on the full retrosynthetic approach when the complete synthetic tree leading to the molecules needs to be processed. Such a procedure is quite time consuming, because the size of the synthetic tree grows exponentially with the number of required steps. Additionally, retrosynthetic methods rely on reaction databases as well as lists of available reagents, which both need to be kept up-to-date.
and,
Pure complexity-based approaches, however, have known deficiencies: they do not take into account easy availability of complex reagents, which allows us to introduce some complex features to molecules relatively easily, neither the fact that some simple reactions can produce quite complex structures (condensation reactions, cycloadditions, various cyclizations).
SAscore is calculated as a combination of two factors:
SAscore = fragmentScore – complexityPenalty
where,
fragmentScore accounts for background synthetic knowledge and is calculated as a sum of contributions of all fragments in the molecule divided by the number of fragments in this molecule,
complexityPenalty captures the complex structural features of the molecule and is calculated as a combination of ringComplexityScore, stereoComplexityScore, macrocyclePenalty and sizePenalty.
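The formula and the two definitions above can be sketched in a few lines of Python. This is only an illustration: the post says the complexity term is "a combination" of four components without giving the exact form, so the plain sum used here is an assumption, and the function names and input numbers are hypothetical rather than values from the paper.

```python
def fragment_score(contributions):
    """Sum of fragment contributions divided by the number of fragments."""
    if not contributions:
        raise ValueError("molecule has no fragments")
    return sum(contributions) / len(contributions)


def complexity_penalty(ring, stereo, macrocycle, size):
    """Combine the four complexity components.

    Assumption: a plain sum. The text only says "a combination".
    """
    return ring + stereo + macrocycle + size


def sa_score(contributions, ring, stereo, macrocycle, size):
    """SAscore = fragmentScore - complexityPenalty."""
    return fragment_score(contributions) - complexity_penalty(
        ring, stereo, macrocycle, size
    )


# Hypothetical molecule: three fragments and small complexity components.
example = sa_score([0.5, 1.5, -0.5], ring=0.25, stereo=0.5,
                   macrocycle=0.0, size=0.25)
```

In this toy case the fragment score is 0.5 and the penalty is 1.0, so the raw score is -0.5. A real implementation would look up fragment contributions in a table derived from PubChem, as described in the paper.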
As mentioned earlier, the correlation between the calculated SAscore and the average of the chemists' ranks is very good, with r2 = 0.89. The distribution of SAscore across different types of molecules (natural products, bioactive molecules and catalogue molecules) shows that natural products are much more difficult to synthesize than “standard” organic molecules, which agrees with prior understanding. The SAscore of bioactive molecules lies in the middle, between natural products and catalogue molecules.
References:
Ertl, P., & Schuffenhauer, A. (2009). Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. Journal of Cheminformatics, 1(1). DOI: 10.1186/1758-2946-1-8