
Computational modeling and ligand-based design of some novel hypothetical compound as prominent inhibitors against Mycobacterium tuberculosis



The time and expense involved in discovering and synthesizing new drug candidates with improved biological activity remain a major challenge in the treatment of multi-drug-resistant strains of Mycobacterium tuberculosis (TB). To address this problem, the quantitative structure-activity relationship (QSAR) approach is used to discover novel agents with better biological activity against M. tuberculosis.


A validated QSAR model was developed in this study to predict the biological activities of some anti-tubercular compounds and to design new hypothetical drugs. The model is governed by the molecular descriptors AATS7s, VR1_Dzi, VR1_Dzs, SpMin7_Bhe, and TDB8e, and was validated through internal and external validation tests. Owing to its high anti-tubercular activity, compound 17 served as the template structure for designing compounds with improved activity. Among the designed compounds, 17i, 17j, and 17n showed improved anti-tubercular activities, ranging from 8.8981 to 9.0377 pBA.


Pharmaceutical and medicinal chemists are encouraged to synthesize the proposed compounds and to carry out in vivo and in vitro screening in order to substantiate the computational findings.


Multi-drug-resistant Mycobacterium tuberculosis (TB) poses a challenge to the treatment of tuberculosis worldwide. The World Health Organization (2018) reported 9.0 million people infected with tuberculosis, 360,000 HIV patients living with tuberculosis, the deaths of 230,000 children, and 1.6 million deaths worldwide [1]. Notable commercially available drugs administered to tuberculosis patients include isoniazid (INH), pyrazinamide (PZA), rifampicin (RMP), and para-aminosalicylic acid (PAS). The emergence of multi-drug-resistant strains of M. tuberculosis against these drugs has prompted the search for new, faster, and more precise approaches to developing novel compounds with improved biological activity against M. tuberculosis.

Advances in computational chemistry have supported the development of new drugs. Computational methods, which reduce the cost of evaluating large virtual databases of chemical compounds, are currently employed in designing new drugs. Such methods include complex network theory, quantitative structure–activity relationship (QSAR) models, machine learning (ML), and artificial neural network (ANN) analysis. At present, QSAR is a widely used theoretical and computational method for predicting and designing new hypothetical drug candidates [2]. Applying the QSAR technique to this problem has the potential to minimize the effort and time required to discover new compounds or to improve the efficiency of existing ones. A multivariate QSAR model is a mathematical expression that relates the physical, chemical, biological, or environmental activities of interest to measurable or computable parameters, such as physicochemical, topological, stereochemical, or electronic indices, called molecular descriptors. Several researchers [3,4,5,6] have successfully established QSAR models relating derivatives of triazole, chalcone, quinolone, 7-methyljuglone, and pyrrole to their respective biological activities. Hence, this research aimed to build a robust QSAR model with high predictability and to design new potent hypothetical compounds with better predicted anti-tubercular activity.


Data collection

The 2,4-disubstituted quinoline derivatives reported as anti-Mycobacterium tuberculosis agents and used in this study were obtained from the literature [7]. These compounds and their biological activities are presented in Table 1.

Table 1 Geometrical structures of inhibitory compounds as anti-tubercular agents

Biological activities

The biological activities of the 2,4-disubstituted quinoline derivatives as potent anti-tubercular agents were initially expressed as percentages (%) and then converted to logarithmic units using Eq. 1, in order to improve linearity and bring the activity values closer to a normal distribution. The observed structures and biological activities of these compounds are presented in Table 1 [4].

$$ \left[\mathrm{pBA}=\log \left(\frac{{\mathrm{molecular}\ \mathrm{weight}}_{\left(\mathrm{g}/\mathrm{mol}\right)}}{{\mathrm{Dose}}_{\left(\mathrm{g}/\mathrm{mol}\right)}}\right)\left(\frac{\mathrm{percentage}\ \left(\%\right)\ }{100-\mathrm{percentage}\ \left(\%\right)\ }\right)\right] $$
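The conversion in Eq. 1 can be sketched in a few lines of Python. This is only an illustration under two assumptions that the text does not state explicitly: the logarithm is taken to base 10, and it is applied to the whole product of the two bracketed ratios. The function and parameter names are hypothetical.

```python
import math


def percent_to_pba(percentage, molecular_weight, dose):
    """Convert a percentage activity to pBA.

    Assumed reading of Eq. 1 (not confirmed by the text):
    pBA = log10((molecular_weight / dose) * (percentage / (100 - percentage)))
    """
    if not 0.0 < percentage < 100.0:
        # 0 % gives log(0) and 100 % divides by zero, so both are rejected.
        raise ValueError("percentage must lie strictly between 0 and 100")
    if molecular_weight <= 0 or dose <= 0:
        raise ValueError("molecular_weight and dose must be positive")
    odds = percentage / (100.0 - percentage)
    return math.log10((molecular_weight / dose) * odds)
```

A percentage of exactly 0 or 100 has no finite pBA under this reading, which is why the sketch rejects those inputs instead of returning a value.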

Molecular optimization

Spartan 14 software (version 1.1.4) was used to optimize all the inhibitory compounds so that they attain a stable conformation at minimum energy. Strain energy was first removed using the MMFF molecular mechanics force field, and full optimization was then carried out with density functional theory (DFT) at the B3LYP level [4].

Generation of molecular descriptor

A molecular descriptor is a numerical value that encodes a property of a molecule and makes it possible to relate molecular structure to biological activity. Descriptors for all the inhibitory molecules were calculated with the PaDEL descriptor software version 2.20, giving a total of 1879 molecular descriptors.

Normalization and pretreatment of data

To give every variable (descriptor) the same opportunity to influence the QSAR model at the outset, the descriptor values generated by the PaDEL descriptor software version 2.20 were normalized using Eq. 2 [8].

$$ D=\frac{d_1-{d}_{\mathrm{min}}}{d_{\mathrm{max}}-{d}_{\mathrm{min}}} $$

where \( d_{\mathrm{max}} \) and \( d_{\mathrm{min}} \) are the maximum and minimum values in each descriptor column, \( d_1 \) is the descriptor value for a given molecule, and D is the resulting normalized value. After normalization, the data were subjected to pretreatment to remove redundant descriptors.
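As an illustration, Eq. 2 can be applied column by column as in the sketch below. The descriptor values are hypothetical, and dropping constant columns is only one possible pretreatment rule, assumed here because Eq. 2 is undefined when the maximum and minimum coincide. The text does not specify which redundancy criteria were actually used.

```python
def min_max_normalize(column):
    """Scale one descriptor column to [0, 1] following Eq. 2."""
    d_min, d_max = min(column), max(column)
    if d_max == d_min:
        # A constant column carries no information and Eq. 2 is undefined for it.
        return None
    return [(d - d_min) / (d_max - d_min) for d in column]


def normalize_descriptors(table):
    """Normalize every descriptor column and drop the constant ones.

    `table` maps a descriptor name to its list of values, one per molecule.
    """
    normalized = {}
    for name, column in table.items():
        scaled = min_max_normalize(column)
        if scaled is not None:
            normalized[name] = scaled
    return normalized


# Hypothetical values for three molecules (not taken from the study).
example = {"AATS7s": [2.0, 4.0, 6.0], "constant": [1.0, 1.0, 1.0]}
print(normalize_descriptors(example))  # {'AATS7s': [0.0, 0.5, 1.0]}
```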

Generation of training and test sets

The compounds in the data set were divided into a training set and a test set in a 70:30 proportion using the Kennard and Stone algorithm as implemented in the DTC Lab software. The QSAR model was developed and internally validated on the training set, while the test set was used to confirm the developed model.
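The Kennard and Stone selection rule can be sketched as follows. This is a minimal illustration using Euclidean distance on the descriptor vectors, not the DTC Lab implementation; the function names are hypothetical. The rule seeds the training set with the two most distant compounds and then repeatedly adds the compound that lies farthest from its nearest already-selected neighbour.

```python
import math


def kennard_stone(points, n_select):
    """Return the indices of `n_select` points chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Seed the selection with the two most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]

    # Repeatedly add the point farthest from its nearest selected neighbour.
    while len(selected) < n_select:
        best = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


def split_train_test(points, train_fraction=0.7):
    """Split point indices into Kennard-Stone training indices and the rest."""
    n_train = round(train_fraction * len(points))
    train = kennard_stone(points, n_train)
    test = [i for i in range(len(points)) if i not in train]
    return train, test
```

With 36 compounds and a training fraction of 0.7, this sketch selects 25 training compounds and leaves 11 for the test set.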

Building of QSAR models and internal validation test

The QSAR models were built using the Genetic Function Approximation (GFA) technique incorporated in the Material Studio software version 8.0 to select the optimum descriptors for the training set. Multi-linear regression (MLR) was used as the modeling tool to develop the multivariate equations: the activity data were placed in the last column of a Microsoft Excel 2013 spreadsheet, which was then imported into Material Studio 8.0 to generate the QSAR model. The internal validation test, which assesses whether the built model is robust and has high predictability, was also performed in Material Studio 8.0 and reported.

Evaluation of leverage values (applicability domain)

Influential and outlier molecules in both the training and test sets were identified using the applicability domain approach. The leverage \( h_i \), defined in Eq. 3, was used to define the applicability domain space, with standardized residuals beyond ±3 marking outlier molecules [9].

$$ {h}_i={M}_i\ {\left({M}^TM\right)}^{-1}{M}_i^T $$

where \( {M}_i \) is the descriptor row vector of compound i, M is the n × d descriptor matrix of the training set, \( {M}^T \) is the transpose of M, and \( {M}_i^T \) is the transpose of \( {M}_i \). The warning leverage h*, defined in Eq. 4, is the boundary used to check for influential molecules.

$$ h\ast =3\ \frac{\left(d+1\right)}{N} $$

where d is the total number of descriptors present in the built model, and N is the total number of compounds that made up the training set.
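Eqs. 3 and 4 can be evaluated without any external library, as in the sketch below. It is an illustration only: the helper names are hypothetical, the descriptor matrix is used exactly as written in Eq. 3 (no intercept column is added), and the small linear solver stands in for the explicit matrix inverse.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("M^T M is singular; descriptors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def leverages(m_train, m_query=None):
    """Return h_i = M_i (M^T M)^-1 M_i^T for each query row (Eq. 3).

    M^T M is always built from the training matrix; the query rows default
    to the training rows but may also be test or designed compounds.
    """
    m_query = m_train if m_query is None else m_query
    d = len(m_train[0])
    mtm = [[sum(row[j] * row[k] for row in m_train) for k in range(d)]
           for j in range(d)]
    # solve(mtm, row) gives (M^T M)^-1 M_i^T without forming the inverse.
    return [sum(x * y for x, y in zip(row, solve(mtm, row))) for row in m_query]


def warning_leverage(n_descriptors, n_training):
    """Return h* = 3 (d + 1) / N (Eq. 4)."""
    return 3 * (n_descriptors + 1) / n_training
```

A useful sanity check is that the training-set leverages sum to the number of descriptor columns when those columns are linearly independent.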

Y-randomization validation test

The Y-randomization test is one of the external validation criteria that must be considered in order to ascertain that the developed model was not obtained by chance [10]. Random shuffling of the data was performed on the training set following the procedure described in [11]. The activity data (dependent variable) were shuffled while the descriptors (independent variables) were kept unchanged, and a new multi-linear regression (MLR) model was generated for each shuffle. For the developed QSAR model to pass the Y-randomization test, the \( R^2 \) and \( Q^2 \) values of the random models must be significantly low over a number of trials, while the Y-randomization coefficient (\( \mathrm{c}{R}_p^2 \)), shown in Eq. 5, must be ≥ 0.5 in order to establish the robustness of the model.

$$ \mathrm{c}{R}_p^2=R\times \sqrt{R^2-{\left({R}_r\right)}^2} $$

where \( \mathrm{c}{R}_p^2 \) is the Y-randomization coefficient, R is the correlation coefficient of the built model, and \( {R}_r \) is the average R of the random models.
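The Y-randomization coefficient can be computed as in the sketch below, which assumes the commonly cited square-root form, cRp² = R × √(R² − Rr²), with Rr taken as the mean R of the random models as defined in the text. The helper that shuffles the activity column and both function names are hypothetical; refitting the regression on each shuffled copy is left to whatever modeling tool is used.

```python
import math
import random


def c_rp2(r, random_rs):
    """Return cRp^2 = R * sqrt(R^2 - Rr^2), with Rr the mean R of the random models."""
    r_r = sum(random_rs) / len(random_rs)
    gap = r * r - r_r * r_r
    if gap < 0:
        raise ValueError("random models correlate better than the real model")
    return r * math.sqrt(gap)


def shuffled_activities(activities, n_trials, seed=0):
    """Yield `n_trials` shuffled copies of the activity column.

    The descriptor matrix is left untouched; only the dependent variable moves.
    """
    rng = random.Random(seed)
    for _ in range(n_trials):
        copy = list(activities)
        rng.shuffle(copy)
        yield copy
```

A model passes the criterion stated above when `c_rp2(...)` is at least 0.5.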

Affirmation of the built model

The internal and external validation parameters reported for the training and test sets were compared with the generally accepted threshold values for QSAR models shown in Table 6 [9,10,11,12] in order to affirm the reliability, fitting, stability, robustness, and predictability of the developed models.


Model 1

pBA = −7.836545646 × AATS7s + 0.201962934 × VR1_Dzi + 0.087893211 × VR1_Dzs − 4.204663658 × SpMin7_Bhe + 0.674915710 × TDB8e + 29.11653208

Model 2

pBA = −4.790218643 × AATS5e + 0.082643756 × VR3_Dzv − 3.953009651 × SpMin7_Bhe + 0.094784839 × TDB7e + 0.024520722 × RDF90i + 41.534742802
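A linear model of this kind is straightforward to evaluate in code. The sketch below uses the coefficients of Model 1 as printed above; the function name is hypothetical, and the descriptor values passed in must be on the same scale as those used to fit the model, which is an assumption the caller has to satisfy.

```python
MODEL_1 = {
    "AATS7s": -7.836545646,
    "VR1_Dzi": 0.201962934,
    "VR1_Dzs": 0.087893211,
    "SpMin7_Bhe": -4.204663658,
    "TDB8e": 0.674915710,
}
MODEL_1_INTERCEPT = 29.11653208


def predict_pba(descriptors, coefficients=MODEL_1, intercept=MODEL_1_INTERCEPT):
    """Evaluate a linear QSAR model: intercept + sum(coefficient * descriptor)."""
    missing = set(coefficients) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptor values: {sorted(missing)}")
    return intercept + sum(c * descriptors[name] for name, c in coefficients.items())
```

Passing a different coefficient dictionary and intercept evaluates Model 2 with the same function.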

SEE is the standard error of estimation, w is the total number of terms present in the built model except the constant term, j is the number of descriptors contained in the built model, q is a user-defined factor, and N is the number of compounds in the training set. Yobs, \( {\overline{Y}}_{training} \), and Ypred are the observed activity, the mean observed activity of the training compounds, and the predicted activity, respectively. \( r^2 \) is the correlation coefficient of the plot of observed activity against predicted activity, \( r_0^2 \) is the correlation coefficient of the same plot at zero intercept, and \( r_0^{\prime 2} \) is the correlation coefficient of the plot of predicted activity against observed activity at zero intercept [7, 9, 10].

Discussion on designed compounds


QSAR studies

An optimum QSAR model for predicting the activity of 2,4-disubstituted quinoline derivatives against M. tuberculosis was obtained by combining computational and theoretical methods. The data set of 36 compounds was partitioned into a training set of 25 compounds and a test set of 11 compounds using the Kennard and Stone algorithm. The 25 training compounds were used to derive the QSAR model by multi-linear regression and also served as the data set for the internal validation test, while the external validation of the derived model was conducted on the test set.

The observed activities reported in the literature, the activities calculated for all the anti-tubercular compounds, the leverage values, and the residual values are presented in Table 1. The residual, that is, the difference between the observed and calculated activity, was found to be low [13,14,15], which indicates that the model has good predictive ability.

The optimum (2D and 3D) descriptors that efficiently describe the anti-tubercular compounds in relation to their biological activities were selected by the GFA approach. The characterization and structural information carried by each descriptor are reported in Table 2.

Table 2 Descriptors name and class in model 1

Various statistical analyses were conducted on the calculated descriptors in order to check the validity of the built model, as reported in Table 3. The variance inflation factor (VIF) was evaluated for all the descriptors in order to determine the degree of correlation between them. Generally, a VIF value equal to 1, or between 1 and 5, signifies that there is no problematic inter-correlation among the descriptors, whereas a VIF value greater than 10 signifies that the developed model is unstable and should be re-checked. The VIF values of all the descriptors were found to be less than 5 (Table 3), which affirms that the descriptors are sufficiently orthogonal to each other. The degree of contribution of each descriptor to the built model was evaluated by determining the standard regression coefficient (\( {b}_j^s \)) and the mean effect (ME). The magnitudes and signs of the \( {b}_j^s \) and ME values reported in Table 4 indicate the strength and direction with which each descriptor influences the activity model. The relationship between the descriptors and the biological activity of each compound was assessed by one-way analysis of variance (ANOVA). The probability value of each descriptor at the 95% confidence level was found to be p < 0.05, as presented in Table 3. The null hypothesis, which proposes no direct relationship between the biological activity of each compound and the descriptors in the built model, is therefore rejected in favour of the alternative hypothesis that such a relationship exists. To further justify the descriptors in the activity model, Pearson's correlation statistic was also used to check for inter-correlation between the descriptors. The correlation coefficients between all pairs of descriptors reported in Table 4 were below 0.8 in absolute value, which implies that the descriptors are free of multicollinearity.
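Both collinearity checks can be reproduced with the standard library, as sketched below. The sketch relies on the standard identity that the VIF of descriptor j equals the j-th diagonal element of the inverse of the Pearson correlation matrix; the function names are hypothetical and the input is a list of descriptor columns.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations between descriptor columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Return one VIF per descriptor: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]
```

For two descriptors with correlation r, this reduces to VIF = 1 / (1 − r²), so r = 0.8 corresponds to a VIF of about 2.78.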

Table 3 Statistical analysis and validation of descriptors model 1
Table 4 Coefficient of Pearson’s correlation for descriptor in model

The internal and external validation results, which assess the reliability and robustness of the built models, are presented in Table 5. These results are in full agreement with the general validation criteria presented in Table 5, which supports the stability and robustness of the model. Based on these validation results, model 1 was selected as the best model and was used to predict the biological activities of 2,4-disubstituted quinolines against M. tuberculosis.

Table 5 Internal and external validation parameters for each model

The built QSAR model and the results obtained in this research were compared with recent models reported in the literature [3, 6], as shown below:

pBA = −0.307001458 (MATS2s) + 1.528715398 (nHBint3) + 3.976720227 (maxtsC) + 0.016199645 (TDB9e) + 0.089381479 (RDF90i) − 0.107407822 (RDF110s) + 4.057082751, R2 = 0.92024, Radj = 0.9102, \( {Q}_{\mathrm{cv}}^2 \) = 0.8954, and R2pred = 0.8842 [3]

pIC50 = −2.040810634 (nCl) − 19.024890361 (MATS2m) + 1.855704759 (RDF140s) + 6.739013671 = 27, R2 = 0.9480, Radj = 0.9350, \( {Q}_{\mathrm{cv}}^2 \) = 0.8799, and R2pred = 0.7690 [6]

The validation parameters reported in this work and those reported in the literature are all in agreement with the validation criteria presented in Table 5, which confirms that the generated model is predictive and robust.

The Y-randomization coefficient (\( \mathrm{c}{R}_p^2 \)) of 0.7362, which is greater than the threshold value of 0.5 reported in Table 5, provides reasonable support that the built model is robust and was not obtained by chance.

The correlation between the calculated and observed activities of the training and test sets is shown graphically in Figs. 1 and 2. The correlation coefficient (R2) values of 0.9183 for the training set and 0.8052 for the test set show a high correlation between the calculated and observed activities, in agreement with the accepted QSAR threshold values reported in Table 6.

Fig. 1 Plot of predicted activity against observed activity of training set

Fig. 2 Plot of predicted activity against observed activity of test set

Table 6 Binding affinity, hydrogen bond, and hydrophobic interaction of the ligands (2,4-disubstituted quinolone derivatives) with M. tuberculosis target (DNA gyrase)

The residual plot in Fig. 3 gives no indication of systematic error or inaccuracy in the derived QSAR model, as all the standardized residual values for both the training and test sets lie within the boundary of ±2 on the standardized residual axis.

Fig. 3 Plot of standardized residual activity versus observed activity

The Williams plot showing the applicability domain (AD) space is given in Fig. 4. Compound 30 has a leverage value greater than the warning leverage (h* = 0.60) and can therefore be inferred to be an influential molecule. All the compounds fall within the defined standardized residual range of ±3, which indicates that no compound is an outlier.

Fig. 4 The Williams plot of the standardized residuals versus the leverage value
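The way a Williams plot is read can be expressed as a small classification rule, sketched below. The function name and the example values are hypothetical and are not taken from Table 1; the ±3 residual limit and the warning leverage of 0.60 follow the text.

```python
def classify_compound(std_residual, leverage, h_star, residual_limit=3.0):
    """Label one compound from its position on a Williams plot."""
    is_outlier = abs(std_residual) > residual_limit
    is_influential = leverage > h_star
    if is_outlier and is_influential:
        return "outlier and influential"
    if is_outlier:
        return "outlier"
    if is_influential:
        return "influential"
    return "inside domain"


# Hypothetical (standardized residual, leverage) pairs for three compounds.
compounds = {"A": (0.4, 0.20), "B": (-1.1, 0.75), "C": (3.4, 0.10)}
labels = {
    name: classify_compound(residual, leverage, h_star=0.60)
    for name, (residual, leverage) in compounds.items()
}
print(labels)  # {'A': 'inside domain', 'B': 'influential', 'C': 'outlier'}
```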

In silico design for new derivatives based on Lead compound 17

A ligand-based approach was employed to design new compounds with better anti-tubercular activities by modifying the template through deletion, insertion, and substitution of active substituent(s). The template chosen for this study was (E)-N-benzyl-2-(2-benzylidenehydrazinyl)quinoline-4-carboxamide (i.e., molecule 17 in Table 1), owing to its relatively high anti-tubercular activity and because it falls within the model applicability domain (AD) space shown in Fig. 4. The modifications were made around the N-ethylacetamide and 2-methylhydrazine moieties of the template, at positions 16 and 23 shown in Fig. 5. The QSAR model indicated that an increase in the values of the descriptors VR3_Dzp, VR1_Dzi, and VR1_Dzs influences the activity positively, that is, increasing these descriptor values increases the predicted activity. Varying the substituents at positions 16 and 23 of the template structure with alkyl groups, benzene derivatives, and substituted alkyl amines led to fourteen compounds with better anti-tubercular activities, reported in Table 7. The leverage values predicted for the designed compounds were used to check whether these compounds lie within the model AD. Each compound in Table 7 has a low leverage value compared with the warning leverage h = 0.64 shown in Fig. 5, which signifies that all the designed compounds fall within the model AD space. Among the designed compounds, 17i, 17j, and 17n showed the best anti-tubercular activities. This results from substitution at positions 16 and 23 of the template structure with N-substituted alkyl amines, which act as electron-releasing groups via the positive inductive effect (+I). Owing to the +I effect of the alkyl groups attached to the template structure, the electron density on the nitrogen increases, so the lone pair of electrons on the N atom is more readily available. The steric hindrance of the bulky alkyl groups of the tertiary (3°) amine in compound 17j accounts for its lower reactivity compared with compound 17i (1° amine) and compound 17n (2° amine). The decreasing order of basicity of the amines, (CH3)2NH > CH3NH2 > (CH3)3N > NH3, suggests why compound 17n showed the most prominent activity.

Fig. 5 a The lead compound (17) for 2,4-disubstituted quinoline. b The general formula of the lead compound (17) for 2,4-disubstituted quinoline as a design template

Table 7 Designed molecules, predicted descriptors, and calculated activities for template 17 of the 2,4-quinoline derivatives


This work addresses the quantitative structure-activity relationship (QSAR) between quinoline derivatives and their biological activities against Mycobacterium tuberculosis. A QSAR model was established to predict the reported experimental activities of 2,4-disubstituted quinoline derivatives against M. tuberculosis using the optimum descriptors AATS7s, VR1_Dzi, VR1_Dzs, SpMin7_Bhe, and TDB8e. The lead compound (compound 17), which has high anti-tubercular activity, was used as a structural template to design new hypothetical drug candidates. Among the designed compounds, 17i, 17j, and 17n showed improved anti-tubercular activities, ranging from 8.8981 to 9.0377 pBA. Pharmaceutical and medicinal chemists are encouraged to synthesize the proposed compounds and to carry out in vivo and in vitro screening in order to substantiate the computational findings.

Availability of data and materials

It has been reported and cited in the methodology part of the manuscript.


  1. W.H.O (2018)

  2. Hansch C, Kurup A, Garg R, Gao H (2001) Chem-bioinformatics and QSAR: a review of QSAR lacking positive hydrophobic terms. Chem Rev 101:619–672

  3. Ogadimma AI, Adamu U (2016) Analysis of selected chalcone derivatives as Mycobacterium tuberculosis inhibitors. Open Access Library J 3:1–13

  4. Adeniji SE, Uba S, Uzairu A (2018) QSAR Modeling and molecular docking analysis of some active compounds against Mycobacterium tuberculosis receptor (Mtb CYP121). J Pathog 2018

  5. Adeniji SE, Uba S, Uzairu A (2018) A Novel QSAR Model for the evaluation and prediction of (E)-N’-Benzylideneisonicotinohydrazide derivatives as the potent anti-mycobacterium tuberculosis antibodies using genetic function approach. Phys Chem Res 6:479–492

  6. Adeniji SE, Uba S, Uzairu A (2019) A derived QSAR model for predicting some compounds as potent antagonist against M. tuberculosis: a theoretical approach. Hindawi Adv Prev Med:5173786

  7. Nayyar A, Jain R (2008) Synthesis and anti-tuberculosis activity of 2, 4-disubstituted quinolines. Ind J Chemist Sect B Org Med Chem 47:117–128

  8. Singh P (2013) Quantitative structure-activity relationship study of substituted-[1, 2, 4] oxadiazoles as S1P1 agonists. J Curr Chem Pharm Sci 3:64–79

  9. Veerasamy R, Rajak H, Jain A, Sivadasan S, Varghese CP, Agrawal RK (2011) Validation of QSAR models-strategies and importance. Int J Drug Des Discov 3:511–519

  10. Tropsha A, Gramatica P, Gombar VK (2003) The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. Mol Inform 22:69–77

  11. Adeniji SE, Uba S, Uzairu A (2018) Theoretical modeling and molecular docking simulation for investigating and evaluating some active compounds as potent anti-tubercular agents against MTB CYP121 receptor. Future J Pharm Sci 4:284–295

  12. Roy K, Chakraborty P, Mitra I, Ojha PK, Kar S, Das RN (2013) Some case studies on application of “rm2” metrics for judging quality of quantitative structure–activity relationship predictions: emphasis on scaling of response data. J Comput Chem 34:1071–1082

  13. Ibrahim MT, Uzairu A, Shallangwa GA, Uba S (2020) In-silico activity prediction and docking studies of some 2, 9-disubstituted 8-phenylthio/phenylsulfinyl-9 h-purine derivatives as Anti-proliferative agents. Heliyon 6:e03158

  14. Adeniji SE, Uba S, Uzairu A (2020) Multi-linear regression model, molecular binding interactions and ligand-based design of some prominent compounds against M. tuberculosis. Network Model Anal Health Inform Bioinform 9(8):1–18

  15. Abdullahi M, Shallangwa GA, Uzairu A (2020) In silico QSAR and molecular docking simulation of some novel aryl sulfonamide derivatives as inhibitors of H5N1 influenza A virus subtype. Beni-Suef Univ J Basic Appl Sci 9(2):1–13





Author information




SE and OB did the conception and design of the work. SE and OB did the acquisition and analysis of the data. SE interpreted the data. SE drafted the manuscript. SE and OB substantively revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shola Elijah Adeniji.

Ethics declarations

Ethics approval and consent to participate

Not applicable for that section

Consent for publication

Not applicable for that section

Competing interests

Not applicable for that section

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Adeniji, S.E., Adalumo, O.B. Computational modeling and ligand-based design of some novel hypothetical compound as prominent inhibitors against Mycobacterium tuberculosis. Futur J Pharm Sci 6, 15 (2020).
