- Research
- Open Access
- Published:
Computational evaluation of some compounds as potential anti-breast cancer agents
Future Journal of Pharmaceutical Sciences volume 7, Article number: 167 (2021)
Abstract
Background
The emergence of high resistance and toxicity of the existing anti-breast cancer drugs have demanded the need to design new drugs with improved activities against breast cancer. A computational technique incorporating quantitative structure–activity relationship and virtual template-based design was carried out to evaluate thirty-four compounds from derivatives of thiophene, pyrimidine, coumarin, pyrazole and pyridine with anti-breast cancer activities. The chemical structures of the compounds were drawn with chem draw v.12.0.2 and they were optimized using Spartan 14 software. The molecular descriptors were calculated with the aid of PaDel descriptor software. The dataset was curated and then divided into training and test set that was used to generate and validate the model.
Results
The first out of the four models generated was chosen as the paramount model with statistical validations of R2 = 0.9847, \(R_{{{\text{adj}}}}^{2}\) = 0.9814, \(Q_{{{\text{cv}}}}^{2}\) = 0.9763, min expt. error for non-significant LOF (95%) = 0.0679, an external validation \(R_{{{\text{test}}}}^{2}\) of 0.8240 and coefficient of Y-randomization (\({\text{cR}}_{{\text{p}}}^{2}\)) = 0.8200, which confirm the robustness of the model.
Conclusions
The high predictive power of the generated model describes the models’ reliability and the designed compounds pointed out compound 2 with pGI50 = 4.2504 as the best designed compound to inhibit breast cancer, compared to its co-designed compounds and the template. The results of this research provide vital information to the pharmaceutical chemists and the pharmacologist in the course of developing new breast cancer drugs.
Background
Cancer is a word used to describe the unusual growth of the cells leading to one of the most dangerous health problems for humans all over the world [1]. Despite the availability of improved drugs targeting cancer therapies, the worldwide cancer burden is expected to increase to 19.3 million new cancer cases, and nearly 10 million cancer deaths were observed in the year 2020 [2].
Breast cancer is the most common cancer among women all over the world and impermanence from breast cancer is commonly due to tumour metastasis [3]. It constitutes a major public health issue globally, with over 1 million new cases diagnosed annually; resulting in over 400,000 annual deaths and about 4.4 million women living with the disease [4]. The mortality rate of breast cancer among Nigerian women is about 16% [5].
Amino-thiophene derivatives were known to be one of the most important groups of heterocyclic compounds with a wide spectrum of biological activities such as antitumor [6] anti-mitotic [7] and antiviral [8]. Furthermore, thieno [2, 3-d] pyrimidine derivatives show anti-proliferative activity [9] while pyrazole derivatives have a specific effect with favourable antitumor activity [10]. Coumarin scaffold turn out to be an attractive subject due to their broad spectrum of pharmacological activities, its derivative is extensively explored for anticancer activities as it possesses minimum side effect along with multi-drug reversal activity [11]. Most pyridine derivatives had been synthesized as potentially biologically active compounds and had a multitude of pharmacological characteristics, in particular, anti-cancer activity [12,13,14].
Quantitative Structure Activity Relationship (QSAR) is one of the commonly used computational method for predicting the activities/properties of molecules in drug design as it saves time and lesser cost [15]. Generating a good QSAR model depends on factors such as: the quality of biological data, the choice of descriptors, variable selection, statistical methods and validations.
The aim of this research is to develop a good QSAR model for predicting the activity of some selected compounds against breast cancer and also design new compounds with better activities against breast cancer.
Methods
Data collection
The dataset used in this work was collected from the literature [16] and were reported as fifty percent growth inhibition (GI50) concentrations in (mmol L−1). These reported inhibitory activities were converted to logarithm scale to have a well-defined range with the help of Eq. (1) shown below.
Compounds sketching, optimization and descriptors calculations
The two-dimensional structure (2D) of the compounds were sketched using ChemDraw software version 12.0.2 [17], they were imported into Spartan 14 V.1.1.4 software to obtain the optimized three-dimensional spatial conformer (3D) at Density Functional Theory (DFT) level applying B3LYP 6-31G* basis set [18]. The optimized compounds in Spartan format were converted to SD file format and later imported into the PaDEL software to calculate the models’ descriptors.
Dataset normalization and pre-treatment
To give the descriptors equal chance of occurrence, the compounds were normalized using Eq. (2), [19]. The normalized data were pre-treated using the data pre-treatment software obtained from Drug Theoretic and Cheminformatics Laboratory (DTC Lab) to remove all empty columns and some useless descriptors [20].
where Xi in the equation is the value of each descriptor for a given molecule and \(X_{\max }\) and \(X_{\min }\) are the maximum and minimum values for each column of descriptors X respectively.
Model generation and validation
In other to generate a good QSAR model, the pre-treated dataset was divided into training and test set in the ratio 7:3 by the means of data division software of DTC Lab [20]. The model was built using the training set, employing GFA-MLR method from the material studio. The test set was then used to validate the built model [21]. The suitability notch of the generated model was assessed using the lack of fit (LOF) [22], as in Eq. (3).
SEE being the Standard Error of Estimation, C is the number of terms in the model, d is a user-defined smoothing parameter, P is the total number of descriptors in the model and M is the number of training dataset. SEE can be expressed as:
where \(Y_{{{\text{exp}}}}\) and \(Y_{{{\text{pre}}}}\) are the experimental activity and the predicted activity in the training set respectively [22].
The squared correlation coefficient (R2) is a validation test used to match the predicted and experimental activities. The model would be considered robust with an R2 value close to 1. R2 is expressed as:
where \(Y_{{{\text{exp}}}}\), \(Y_{{{\text{pred}}}}\) and \(\overline{ Y}_{{\text{training }}}\), were respectively the experimental activity, the predicted activity, and the mean experimental activity of the samples in the training set. The validity of the model cannot be based on R2 only, therefore an adjustment in the R2 would give a more reliable model. The adjusted R2 is givens by:
where d is the number of descriptors in the model and n is the number of training set compounds.The predictive power of the model is usually determined by the Cross-validation \((Q_{{{\text{cv}}}}^{2} )\) and the external validation test as expressed in Eqs. (7) and (8) respectively.
where \(Y_{{{\text{pred}}_{{{\text{test}}}} }}\) is the predicted activity, \(Y_{{{\text{exp}}_{{{\text{test}}}} }}\) is the experimental activity of the test set and \(\overline{Y}_{{{\text{training}}}}\) is the mean activity of the training set [21].
Y-randomization
Y-randomization is an external validation test performed to generate a new model from the bogus dataset so as to improve the models’ efficacy. For a good model, the randomized squared correlation coefficient (\({\text{cR}}_{{\text{p}}}^{2}\)) must be greater than 0.5, and is expressed as:
where \({\text{cR}}_{{\text{p}}}^{2}\) is the Y-randomization coefficient and \(R_{{\text{r}}}\) is the average ‘R’ of random models [19].
Applicability domain (AD)
Applicability domain is a theoretical region of the chemical space that is defined by the model descriptors, model response and nature of the training set. The leverage approach was employed to measure the data within the AD [23], any dataset that lies outside the AD would be treated as an outlier. Equation (10) is normally used to calculate the AD.
where \(l_{i}\) is the leverage of each compound, \(X_{i}\) is the descriptor row-vector of the query compound i, and X is the (m × n) descriptor matrix of the training set compounds used in building the model. The critical value (l*) is defined by Eq. (11).
where p is the number of descriptors in the model and n is the number of objects used to develop the model.
Mean effect (ME) and variance inflation factor (VIF)
The mean effect is used to elucidate the comparative importance of each descriptor in the model while the VIF is used to determine the linearity between the descriptors in the model. VIF value of 1 show no linearity among the descriptors and value above 10 indicates a bad model. The ME and VIF are respectively calculated using Eqs. (12) and (13).
where \(B_{j}\) is the coefficient of the descriptor j in the model, \(D_{j}\) is the value of each descriptor in the data matrix for each of the training set data, m and n are respectively the number of descriptors that appears in the model and the number of molecules in the training set
where R2 is the multiple regression correlation coefficient between the variables in the model [24].
Molecular design
An In-silico approach of template-based design was employed to design new compounds with enhance activity against breast cancer. This method has been hired frequently to screen and modelled compounds with better-quality activity by relating the experimental activities of the compounds with their structures [25]. Henceforth, compound with the highest activity would be defined as the template to design new compounds with enhanced activities.
Results
All the tables and figures that describes the outcome of the built model and the designed compounds are presented in this section.
Discussion
All the thirty-four compounds used in this study were first sketched by ChemDraw to obtain the 2D structures, they were imported to the spartan 14 software to obtain their 3D optimised structures. The optimized dataset was normalized, pre-treated and the molecular descriptors were calculated with the help of PaDEL descriptor software. A large number of 1874 of molecular descriptors that are responsible for encrypting the important features of the structures were calculated.
The 2D structures and activities of the studied compounds were presented in Table 1. The Genetic Function Approximation (GFA), was used to generate four models, the first model out of the four models was selected as the optimum model since it best agrees with the minimum criteria for generating good QSAR model, reported in Table 2. Table 3 display the validation parameters for the generated models. Table 4 present the Y-randomization test used to affirm the strength of a model. This test was carried out on the training set by keeping the independent variable constant and randomizing the dependent variables. The low values of R, R2 and Q2 indicate the robustness of the generated model and the coefficient of Y-randomization (\({\text{cR}}_{{\text{p}}}^{2}\) = 0.8200) confirmed the generated model was not gotten by chance.
Table 5 displays the correlation matrix, VIF and the ME of the four descriptors used to build the models. The low value of the Pearson’s correlation indicates that there is no significant connection between the descriptors, this means that each descriptor gives different information that influenced the model. The relative importance of each of the descriptor in the model was measured with the low value of the Variance Inflation Factor (VIF) and since the VIF value were all less than 2, henceforth, the descriptors in the model were rightfully selected and the model is therefore said to be statistically satisfactory [24]. Meanwhile descriptor MaxHBd with highest positive ME value indicates its prominence in the models’ activity, as such, the descriptor was made the focal point when designing new enhanced compounds. The descriptor (MaxHBd), means Maximum E-States for (strong) Hydrogen Bond donors.
Descriptive analysis was carried out to back up the evidence that the dataset was well divided into a new set (training set and test set). Table 6 present the maximum, minimum and standard deviation values for both training and test sets were very close suggesting no significant difference in them, as a result, we deduce that the training set is extrapolative within the test set, this confirm the fit of the Kennard and stone method employed in the data division.
Table 7 present the details of the descriptors used to build the model. The first two descriptors were 2D and the last two being 3D. The equations generated from the material studio software displayed below, indicates Eq. (1) as the best model when compared to the standard validation parameters for generating a good QASR model in Table 2.
Model 1
pGI50 = 0.709363893 * GATS8c − 4.252846824 * maxHBd − 0.063150018 * TDB10p − 0.153565552 * RNCS + 4.211504042;
Model 2
pGI50 = 0.772397640 * GATS8c − 4.191643628 * maxHBd − 0.000418849 * TDB10v − 0.153328698 * RNCS + 4.006452472;
Model 3
pGI50 = 1.245271529 * MATS7c − 0.826297651 * SpMax5_Bhi − 3.598436568 * maxHBd − 0.135122003 * RNCS + 6.680738880;
Model 4
pGI50 = 0.723219873 * GATS8c − 4.266084558 * maxHBd − 0.061174372 * Kier1 − 0.155935615 * RNCS + 4.726811020.
The difference between the predicted activity and the reported activity is the residual activity, which is presented in Table 8. The low residual values indicate that the predicted activities lie within the experimental activities, accounting for the high predicting power of the model. Figure 1 and 2 below shows the graphical plot of experimental activity against the predicted activity for both training and test set respectively, the R2 value of the two plots are satisfactory when compared to the recommended R2 value of a good QSAR model reported in Table 2. The plot of standardize residual versus experimental activity in Fig. 3, was used to check for any systematic error in the built model, it was found that the built model was free of systematic error since all it standardizes value lies within ± 2 unit. Figure 4 shows the Williams plot, the plot help to determine compounds that are either influential or outliers. Four compounds were found to be outliers because their leverage values were greater that the critical leverage (l* = 0.6) and those compounds shall not be considered while designing a new anti-breast cancer agent.
In other to design more potent anti-breast cancer compounds, compound 10 (Fig. 5) with the highest reported activity (4.0458) was endorsed as the template. The most influential descriptor maxHBd (maximum E-state for Hydrogen bond donor), with mean effect of 0.8382 was investigated. To raise the hydrogen bond donor, H-bond acceptor and strong electronegative atoms (F, O and N) were attached to the appropriate positions, which lead to the design of six new compounds with enhanced 50% growth inhibitory activity as displayed in Table 9.
Conclusion
This research has effectively built a good QSAR model with high predictive power, using the descriptors maxHBd, GATS8c, TDB10p and RNCS. The Williams plot, outlined four compounds (outliers) that should not be considered for further computational study. The validation parameters used to generate the model as discussed above all passed the minimum recommendation for building a valid QSAR model. Descriptor maxHBd with positive mean effect value of 0.8382 was found to mostly influence the optimum model, and was chosen as the template that was then used to design six new compounds with better inhibitory activities. Three out of the six designed compounds were found to have pIC50 value (4.2118, 4.1688 and 4.2504) greater than the template and the rest of the design compounds. Conclusively, the research aim was achieved and the results of this work would serve as first-hand information to the pharmaceutical chemist, pharmacist and pharmacologist in the course of producing new drug against breast cancer.
Availability of data and materials
The data used in this work was collected from the work of Albratty, M, El-Sharkawy, K. A., and Alam, S., (2017), Synthesis and Antitumor Activity of Some Novel Thiophene, Pyrimidine, Coumarine, Pyrazole and Pyridine Derivatives. Acta Pharm. 67, 15–33, https://doi.org/10.1515/acph-2017-0004. All the data and materials generated during the current study are included in this published article and are available upon request.
Abbreviations
- QSAR:
-
Quantitative structure–activity relationships
- VIF:
-
Variance inflation factor
- ME:
-
Mean effect
- DTC Lab:
-
Drug Theoretic and Cheminformatics Laboratory
- GI50 :
-
50% Growth inhibition
- DFT:
-
Density functional theory
References
Idris MO, AbechiShallagwa SEGA, Uzairu A (2020) QSAR and molecular docking studies of novel thiophene, pyrimidine, coumarin, pyrazole and pyridine derivatives as potential anti-breast cancer agent. Turk Comput Theo Chem (TC&TC) 4(1):12–23
Ferlay J, Ervik M, Lam F, Colombet M, Mery L, Piñeros M et al (2021) Global cancer observatory: cancer today. International Agency for Research on Cancer, Lyon. https://gco.iarc.fr/today. Accessed Feb 2021
Jemal A, Siegel R, Ward E, Murray T, Xu J, Thun MJ (2007) Cancer statistics. CA Cancer J Clin 57:43–66
GLOBOCAN (2018) Latest Global Cancer data: cancer Burden Rises to 18.1 million new cases and 9.6 million cancer deaths in 2020. IARC Global Cancer Observatory
Okobia MN, Bunker CH, Okonofua FE, Osime U (2006) Knowledge, attitude and practice of Nigerian women towards breast cancer: a cross-sectional study. World J Surg Oncol 4:11–15
Romagnoli R, Baraldi PG, Lopez-Cara C, Salvador MK, Preti D, Tabrizi MA, Balzarini V, Nussbaumer P, Bassetto M, Brancale A, Fu XH, Gao Y, Li J, Zhang SZ, Hamel E, Bortolozzi R, Basso G, Viola G (2014) Design synthesis and biological evaluation of 3, 5-disubstituted 2-aminothiophene derivatives as a novel class of antitumor agents. Bioorg Med Chem 22:5097–5109. https://doi.org/10.1016/j.bmc.2013.12.030
Romagnoli R, Baraldi PG, Carrion MD, Cara CL, Perti D, Fruttarolo F, Pavani MG, Tabrizi MA, Tolomio M, Grimaudo S, Di-Cristina A, Balzarini J, Hadfield JA, Bracale A, Hamel E (2007) Synthesis and biological evaluation of 2- and 3-aminobenzo[b] thiophene derivatives as antimitotic agents and inhibitors of tubulin polymerization. J Med Chem 50:2273–2277. https://doi.org/10.1021/jm070050f
Stephens CE, Felder TM, Sowell JW, Andrei G, Balzarini J, Snoeck R, De-Clerq E (2001) Synthesis and antiviral/antitumor evaluation of 2-amino-2-carboxamido-3-aryl-sulfonyl thiophenes and related compounds as a new class of diarylsulfones. Bio Org Med Chem 9:1123–1132. https://doi.org/10.1016/S0968-0896(00)00333-3
Hansch C (1990). In: Ramsden CA (ed) Comprehensive medicinal chemistry, vol 4. Pergamon Press, New York, pp 5–8
Li Z, Wan H, Shi Y, Ouyang P (2004) Personal experience with four kinds of chemical structure drawing software: review on ChemDraw, ChemWindow, ISIS/Draw, and ChemSketch. J Chem Inf Comput Sci 44:1886–1890
Dandriyal J, Singla R, Kumar M, Jaitak V (2016) Recent developments of C-4 substituted coumarin derivatives as anticancer agents. Eur J Med Chem 119:141–168. https://doi.org/10.1016/j.ejmech.2016.03.087
El-Borai MA, Rizk HF, Beltagy DM et al (2013) Microwave-assisted synthesis of some new pyrazolopyridines and their antioxidant, anti-tumor and antimicrobial activities. Eur J Med Chem 66:415–422
Salem MS, Ali MAM (2016) Novel Pyrazolo[3,4-b] pyridine derivatives: synthesis, characterization, antimicrobial and antiproliferative profile. Biol Pharm Bull 39:473–483
Chavva K, Pillalamarri S, Banda V et al (2013) Synthesis and biological evaluation of novel alkyl amide functionalized trifluoromethyl substituted pyrazolo[3,4-b] pyridine derivatives as potential anti-cancer agents. Bioorg Med Chem Lett 23:5893–5895
Becke AD (1993) Becke’s three parameter hybrid method using the LYP correlation functional. J Chem Phys 98:5648–5652
Albratty M, El-Sharkawy KA, Alam S (2017) Synthesis and antitumor activity of some novel thiophene, pyrimidine, coumarine, pyrazole and pyridine derivatives. Acta Pharm 67:15–33. https://doi.org/10.1515/acph-2017-0004
Singh P (2013) Quantitative structure-activity relationship study of substituted-[1, 2, 4] oxadiazoles as S1P1 agonists. J Curr Chem Pharm Sci 3:334–345
Kennard RW, Stone LA (1969) Computer aided design of experiments: technometrics. J Sci Res 11:137–148. https://doi.org/10.1080/00401706.1969.10490666
Tropsha A, Gramatica P, Gombar VK (2003) The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. Mol Inform 22:69–77. https://doi.org/10.1002/qsar.200390007
Khaled KF (2011) Modelling corrosion inhibition of iron in acid medium by genetic function approximation method: a QSAR model. Corros Sci 53:3457–3465
Friedman JH (1991) Multivariate adaptive regression splines. Ann Stat 19:1–67
Adeniji ES, Uba S, Uzairu A (2018) Theoretical modelling for predicting the activities of some active compounds as potent inhibitors against Mycobacterium tuberculosis using GFA-MLR approach. J King Saud Univ Sci 32:575–586. https://doi.org/10.1016/j.jksus.2018.08.010
Veerasamy R, Rajak H, Jain A, Sivadasan VCP, Agrawal RK (2011) Validation of QSAR models-strategies and importance. Int J Drug Des Discov 3:511–519
Idris MO, Abechi SE, Shallangwa GA, Uzairu A (2020) Insilico elucidation of some quinoline derivative with potent anti-breast cancer activities. J Eng Exact Sci 6(1):265–274
Zakari YI, Uzairu A, Shallangwa GA, Abechi SE (2021) Molecular modelling and design of some β-amino alcohol grafted 1,4,5-trisubstituted 1,2,3-triazoles derivatives against chloroquine sensitive, 3D7 strain of Plasmodium falciparum. Heliyon 7:e05924
Acknowledgements
The authors really appreciate the scholarly guidance of Mr. Adeniji Elijah Shola of Ahmadu Bello University (ABU) Zaria-Nigeria and Mr. Abdullahi Mustapha of Kaduna state University (KASU) Nigeria.
Funding
No any form of funding applicable to this work.
Author information
Authors and Affiliations
Contributions
GAS collected the Data and sketched all through to optimizations. SEA pre-prosed the data, built and validates the model. MOI analysed and discussed the result and also designed the new compounds with improved activity. All authors have read and approved the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interest
The authors declare that there was no competing interest regarding this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Idris, M.O., Abechi, S.E. & Shallangwa, G.A. Computational evaluation of some compounds as potential anti-breast cancer agents. Futur J Pharm Sci 7, 167 (2021). https://doi.org/10.1186/s43094-021-00315-2
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s43094-021-00315-2
Keywords
- Anti-breast cancer
- Dataset
- Dataset division
- Model validations
- Template
- Design compounds