Computational modeling and ligand-based design of some novel hypothetical compounds as prominent inhibitors against Mycobacterium tuberculosis

The time and expense required to discover and synthesize new hypothetical drugs with improved biological activity remain a major challenge in the treatment of multi-drug-resistant strains of Mycobacterium tuberculosis, the causative agent of tuberculosis (TB). To address this problem, the quantitative structure-activity relationship (QSAR) approach can be used to discover novel agents with better biological activity against M. tuberculosis. In this study, a QSAR model was developed to predict the biological activities of some anti-tubercular compounds and to design new hypothetical drugs. The model is based on the molecular descriptors AATS7s, VR1_Dzi, VR1_Dzs, SpMin7_Bhe, and TDB8e and was validated through internal and external validation tests. Owing to its high anti-tubercular activity, the lead compound, compound 17, served as a template structure for designing compounds with improved activity. Among the designed compounds, 17i, 17j, and 17n showed improved predicted anti-tubercular activities, ranging from 8.8981 to 9.0377 pBA. Pharmaceutical and medicinal chemists are encouraged to synthesize the proposed compounds and to carry out in vivo and in vitro screening in order to substantiate the computational findings.

technique to this problem has the potential to minimize the effort and time required to discover new compounds or to improve the efficiency of current compounds. A multivariate QSAR model is a mathematical expression that relates the physical, chemical, biological, or environmental activities of interest to measurable or computable parameters, such as physicochemical, topological, stereochemical, or electronic indices, called molecular descriptors. Several researchers [3][4][5][6] have successfully established QSAR models that relate derivatives such as triazole, chalcone, quinolone, 7-methyljuglone, and pyrrole to their respective biological activities. Hence, this research aimed to build a robust QSAR model with high predictability and to design new potent hypothetical compounds with better predicted anti-tubercular activity.

Data collection
The 2,4-disubstituted quinoline derivatives reported as anti-Mycobacterium tuberculosis agents and used in this study were obtained from the literature [7]. These compounds and their biological activities are listed in Table 1.

Biological activities
The biological activities of the 2,4-disubstituted quinoline derivatives as potent anti-tubercular agents were initially expressed in percentage (%) and then converted to a logarithmic unit using Eq. 1 in order to increase the linearity of the activity values and bring them closer to a normal distribution. The structures and the observed biological activities of these compounds are presented in Table 1 [4].

Molecular optimization
The Spartan 14 software version 1.1.4 was used to optimize all the inhibitory compounds so that they attain a stable conformation at minimum energy. Strain energy was first removed from the molecules using the molecular mechanics force field (MMFF), and complete optimization was then achieved with density functional theory (DFT) using the B3LYP functional [4].

Generation of molecular descriptors
A molecular descriptor is a numerical value that encodes a property of a molecule derived from its structure, so that the molecular structure can be related to the biological activity. Descriptors for all the inhibitory molecules were calculated with the PaDEL descriptor software version 2.20, and a total of 1879 molecular descriptors were generated.

Normalization and pretreatment of data
To give each variable (descriptor) an equal chance of influencing the QSAR model at the outset, the descriptor values generated by the PaDEL descriptor software version 2.20 were normalized using Eq. 2 [8].
where $d_{\max}$ and $d_{\min}$ are the maximum and minimum values of each descriptor column of D, and $d_1$ is the descriptor value for each molecule. After normalization, the data were subjected to pretreatment in order to remove redundant descriptors.
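The normalization and pretreatment steps can be illustrated with a short sketch. It assumes that Eq. 2 has the usual min-max form, $(d_1 - d_{\min}) / (d_{\max} - d_{\min})$, which is consistent with the symbols defined above. The pretreatment rules (dropping near-constant columns and one of each pair of highly correlated columns) and the thresholds `var_tol` and `corr_cutoff` are hypothetical choices for illustration, not values taken from this study.

```python
import numpy as np

def min_max_normalize(D):
    """Scale each descriptor column of D to [0, 1] via (d - d_min) / (d_max - d_min)."""
    D = np.asarray(D, dtype=float)
    d_min = D.min(axis=0)
    d_max = D.max(axis=0)
    span = d_max - d_min
    # Constant columns have zero span; divide by 1 instead so they map to 0.
    safe_span = np.where(span == 0, 1.0, span)
    return (D - d_min) / safe_span

def pretreat(D, var_tol=1e-4, corr_cutoff=0.9):
    """Return indices of the columns kept after removing near-constant and
    highly inter-correlated descriptors (thresholds are assumptions)."""
    D = np.asarray(D, dtype=float)
    candidates = [j for j in range(D.shape[1]) if D[:, j].var() > var_tol]
    selected = []
    for j in candidates:
        # Drop column j if it is strongly correlated with a column already kept.
        redundant = any(
            abs(np.corrcoef(D[:, j], D[:, k])[0, 1]) > corr_cutoff
            for k in selected
        )
        if not redundant:
            selected.append(j)
    return selected
```

Applied to a descriptor table, `pretreat(min_max_normalize(D))` returns the column indices that would be passed on to model building.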

Generation of training and test sets
The compounds in the data set were divided into a training set and a test set in a proportion of 70% to 30% using the Kennard-Stone algorithm as implemented in the DTC Lab software. The QSAR model was developed and internally validated on the training set, while the developed model was confirmed on the test set.
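A minimal sketch of a Kennard-Stone split is given below for readers who want to see how such a partition is produced. It is a generic textbook version using Euclidean distances in descriptor space and is not the DTC Lab implementation; the function name `kennard_stone_split` is hypothetical.

```python
import numpy as np

def kennard_stone_split(X, n_train):
    """Return (train_idx, test_idx) selected by the Kennard-Stone algorithm."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances between all compounds in descriptor space.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start from the two most distant compounds.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # Distance from each remaining compound to its nearest selected compound;
        # the compound with the largest such distance is added next.
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining
```

For a data set of 36 compounds, calling `kennard_stone_split(X, 25)` would give 25 training and 11 test indices, which corresponds to the roughly 70/30 proportion described above.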

Building of QSAR models and internal validation test
The QSAR models were built using the Genetic Function Approximation (GFA) technique in the Material Studio software version 8.0 to select the optimum descriptors for the training set. Multi-linear regression (MLR) was used as the modeling tool to develop the multivariate equations: the activity data were placed in the last column of a Microsoft Excel 2013 spreadsheet, which was then imported into the Material Studio software version 8.0 to generate the QSAR model. The internal validation test, which checks that the built model is robust and has high predictability, was also performed in the Material Studio software version 8.0 and reported.
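The regression and internal validation steps can be sketched outside Material Studio as follows. This is an ordinary least-squares illustration with leave-one-out cross-validation, assuming the descriptors have already been selected; it does not reproduce the GFA descriptor search, and all function names are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bd]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predicted activity for each row of X."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

def r_squared(y_obs, y_pred):
    """Coefficient of determination R^2."""
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - np.mean(y_obs)) ** 2)
    return 1.0 - ss_res / ss_tot

def q2_loo(X, y):
    """Leave-one-out cross-validated R^2 (Q^2) on the training set."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i          # leave compound i out
        coef = fit_mlr(X[mask], y[mask])  # refit on the remaining compounds
        preds[i] = predict_mlr(coef, X[i:i + 1])[0]
    return 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)
```

Here `X` is the training descriptor matrix as a NumPy array (one row per compound) and `y` the activity vector. For a least-squares model, Q² from this procedure can never exceed the fitted R², so a large gap between the two is a sign of overfitting.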

Evaluation of leverage values (applicability domain)
Influential and outlier molecules present in the training and test sets were identified using the applicability domain approach. The leverage $h_i$ defined in Eq. 3 was used to define the applicability domain space, with a standardized residual boundary of ±3 for outlier molecules [9].
where $M_i$ is the descriptor row vector of compound i, $M$ is the n × d descriptor matrix of the training set, $M^T$ is the transpose of $M$, and $M_i^T$ is the transpose of $M_i$. The warning leverage $h^*$ defined in Eq. 4 is the boundary used to check for influential molecules.
where d is the total number of descriptors in the built model, and N is the total number of compounds in the training set.
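The leverage calculation can be sketched as below. From the symbol definitions above, the sketch takes Eq. 3 to be $h_i = M_i (M^T M)^{-1} M_i^T$. The form of the warning leverage is an assumption: the commonly used expression $h^* = 3(d + 1)/N$ is shown, and a different convention for Eq. 4 would give a different threshold. The function names are hypothetical.

```python
import numpy as np

def leverages(M_train, M_query):
    """h_i = M_i (M^T M)^-1 M_i^T for each row M_i of M_query,
    where M is the training descriptor matrix."""
    M_train = np.asarray(M_train, dtype=float)
    M_query = np.asarray(M_query, dtype=float)
    # Pseudo-inverse guards against a singular M^T M.
    core = np.linalg.pinv(M_train.T @ M_train)
    # Row-wise quadratic form: sum_jk M_i[j] * core[j, k] * M_i[k].
    return np.einsum("ij,jk,ik->i", M_query, core, M_query)

def warning_leverage(d, N):
    """Assumed form of the warning leverage: h* = 3(d + 1) / N."""
    return 3.0 * (d + 1) / N

def influential(h, h_star):
    """Indices of compounds whose leverage exceeds the warning leverage."""
    return [int(i) for i in np.flatnonzero(np.asarray(h) > h_star)]
```

Passing the test set or the designed compounds as `M_query` gives their leverages relative to the training set, which is how a Williams plot places new compounds inside or outside the applicability domain.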

Y-randomization validation test
The Y-randomization test is one of the external validation criteria that must be considered in order to ascertain that the developed model was not obtained by chance [10]. Random shuffling of the data was performed on the training set following the principle laid down in [11]. The activity data (dependent variable) were shuffled while the descriptors (independent variables) were kept unchanged, and a multi-linear regression (MLR) model was generated for each shuffle. For the developed QSAR model to pass the Y-randomization test, the $R^2$ and $Q^2$ values of the random models must be significantly low over a number of trials, while the Y-randomization coefficient ($cR_p^2$) given in Eq. 5 must be ≥ 0.5 in order to establish the robustness of the model.
where $cR_p^2$ is the Y-randomization coefficient, $R$ is the correlation coefficient, and $R_r$ is the average $R$ of the random models.
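The shuffling procedure can be sketched as follows. The sketch assumes that Eq. 5 has the widely used form $cR_p^2 = R \sqrt{R^2 - R_r^2}$, with $R_r$ computed as the average $R$ of the random models as defined above. The number of trials and the seed are arbitrary illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def r2_fit(X, y):
    """R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return float(np.clip(r2, 0.0, 1.0))

def y_randomization(X, y, n_trials=50, seed=0):
    """Return (R^2 of the true model, average R of random models, cR2p)."""
    rng = np.random.default_rng(seed)
    r2 = r2_fit(X, y)
    # Shuffle only the activities; the descriptor matrix stays unchanged.
    r2_random = np.array([r2_fit(X, rng.permutation(y)) for _ in range(n_trials)])
    r_r = float(np.sqrt(r2_random).mean())
    # Assumed form of Eq. 5: cR2p = R * sqrt(R^2 - Rr^2).
    c_r2p = float(np.sqrt(r2) * np.sqrt(max(r2 - r_r ** 2, 0.0)))
    return r2, r_r, c_r2p
```

A model passes this check when the random models fit poorly and the returned `c_r2p` is at least 0.5.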

Affirmation of the built model
The internal and external validation parameters reported for the training and test sets were compared with the generally accepted threshold values for any QSAR model shown in Table 6 [9][10][11][12] in order to affirm the reliability, fitting, stability, robustness, and predictability of the developed models. Here, SEE is the standard error of estimation, w is the total number of terms in the built model excluding the constant term, j is the number of descriptors in the built model, q is a user-defined factor, and N is the number of compounds in the training set. $Y_{obs}$, $\bar{Y}_{training}$, and $Y_{pred}$ are the observed activity, the mean observed activity of the training compounds, and the predicted activity, respectively. $r^2$ is the correlation coefficient of the plot of observed activity against predicted activity values, $r_o^2$ is the correlation coefficient of the plot of observed activity against predicted activity values at zero intercept, and $r_o'^2$ is the correlation coefficient of the plot of predicted activity against observed activity at zero intercept [7,9,10].
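Two of the validation parameters can be sketched in a few lines. The validation equations are not reproduced in this text, so the sketch assumes the standard definitions of the adjusted $R^2$ and of the external $R^2_{pred}$, the latter built from the $Y_{obs}$, $Y_{pred}$, and $\bar{Y}_{training}$ terms defined above. The function names are hypothetical.

```python
import numpy as np

def r2_adjusted(r2, n_compounds, n_descriptors):
    """Adjusted R^2: penalizes R^2 for the number of descriptors in the model."""
    return 1.0 - (1.0 - r2) * (n_compounds - 1) / (n_compounds - n_descriptors - 1)

def r2_pred(y_obs_test, y_pred_test, y_train_mean):
    """External R^2: test-set prediction error relative to the spread of the
    test activities around the mean activity of the training set."""
    y_obs_test = np.asarray(y_obs_test, dtype=float)
    y_pred_test = np.asarray(y_pred_test, dtype=float)
    num = np.sum((y_pred_test - y_obs_test) ** 2)
    den = np.sum((y_obs_test - y_train_mean) ** 2)
    return 1.0 - num / den
```

With these definitions, a model that predicts the test set no better than the training mean gives an external value of 0, and a perfect prediction gives 1.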

QSAR studies
An optimum QSAR model for predicting the activity of 2,4-disubstituted quinoline derivatives against M. tuberculosis was obtained by combining computational and theoretical methods. The data set of 36 compounds was partitioned into a training set of 25 compounds and a test set of 11 compounds using the Kennard-Stone algorithm. The 25 training set compounds were used to derive the QSAR model by multi-linear regression and also served as the data set for the internal validation test, while the external validation test for the derived model was conducted on the test set.
The observed activities reported in the literature, the activities calculated for all the anti-tubercular compounds, the leverage values, and the residual values are presented in Table 1. The residual is the difference between the observed and calculated activities, and the residuals were found to be low [13][14][15]. The low residual values indicate that the built model has good predictive ability.
The optimum (2D and 3D) descriptors that efficiently describe the anti-tubercular compounds in relation to their biological activities were selected by the GFA approach. The structural information encoded by each descriptor for the anti-tubercular agents is given numerically in Table 2.
Various statistical analyses were conducted on the calculated descriptors in order to check the validity of the built model, as reported in Table 3. The variance inflation factor (VIF) was evaluated for all the descriptors in order to determine the degree of correlation between them. Generally, a VIF value equal to 1, or falling between 1 and 5, signifies the absence of inter-correlation among the descriptors. However, a VIF value greater than 10 signifies that the developed model is unstable and should be re-checked. The VIF values of all the descriptors were less than 5, as reported in Table 3, which affirms that the descriptors were significantly orthogonal to each other since there is no inter-correlation between them. The contribution of each descriptor to the built model was evaluated by determining the standard regression coefficient ($b_j^s$) and the mean effect (ME). The magnitudes and signs of the $b_j^s$ and ME values reported in Table 4 indicate the strength and direction with which each descriptor influences the activity model. The relationship between the descriptors and the biological activity of each compound was assessed by one-way analysis of variance (ANOVA). The probability value of each descriptor at the 95% confidence level was found to be p < 0.05, as presented in Table 3. Therefore, the null hypothesis, which proposes no direct relationship between the biological activity of each compound and the descriptors in the built model, is rejected, and the alternative hypothesis, that there is a direct relationship, is accepted. To further validate the descriptors in the activity model, Pearson's correlation analysis was conducted to check for inter-correlation between the descriptors. The correlation coefficients between the descriptors reported in Table 4 were all less than 0.8 in absolute value, which implies that the descriptors are free of multicollinearity.
Validation results for both the internal and external assessments, which ensure that the built models are reliable and robust, are presented in Table 5. These results were all in full agreement with the general validation criteria presented in Table 5, confirming the stability and robustness of the model. On the basis of these validation results, model one was selected as the prime model and was used to predict the biological activities of 2,4-disubstituted quinolines against M. tuberculosis.
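The VIF and Pearson correlation checks described above can be sketched as follows, assuming the standard definition $VIF_j = 1 / (1 - R_j^2)$, where $R_j^2$ is obtained by regressing descriptor j on the remaining descriptors. The function names are hypothetical, and the sketch is not the routine used to produce Tables 3 and 4.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each descriptor column of X."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    out = np.empty(d)
    for j in range(d):
        # Regress descriptor j on all other descriptors (with intercept).
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2_j = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2_j)
    return out

def pearson_matrix(X):
    """Pearson correlation coefficients between all pairs of descriptor columns."""
    return np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
```

Descriptors with VIF values near 1 and pairwise correlations well below 0.8 in absolute value would satisfy the criteria used in this section.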
The built QSAR model and the results obtained in this research were compared with recent models reported in the literature [3,6]. The validation parameters are in agreement with the criteria in Table 5, which confirms that the generated model is predictive and robust.
The Y-randomization coefficient ($cR_p^2$) of 0.7362, which is greater than the threshold value of 0.5 reported in Table 5, supports the conclusion that the built model is robust and was not obtained by chance. The correlation between the calculated and observed activities of the training and test sets is shown graphically in Figs. 1 and 2. The correlation coefficient ($R^2$) values of 0.9183 for the training set and 0.8052 for the test set show that there is a high correlation between the calculated and observed activities, and both values are in agreement with the accepted QSAR threshold values reported in Table 6.
The residual plot shown in Fig. 3 gives no indication of computational inaccuracy in the derived QSAR model, as all the standardized residual values for both the training and test sets lie within the defined boundary of ±2 on the standardized residual axis.
The Williams plot showing the applicability domain (AD) space is given in Fig. 4. Compound 30 was found to have a leverage value greater than the warning leverage (h* = 0.60); it can therefore be inferred that compound 30 is an influential molecule. In addition, all the compounds fall within the defined space of ±3, which indicates that no compound is an outlier.

In silico design for new derivatives based on Lead compound 17
A ligand-based approach was employed to design new compounds with better anti-tubercular activities by modifying the template through deletion, insertion, and substitution of active substituent(s). The template chosen in this study was (E)-N-benzyl-2-(2-benzylidenehydrazinyl)quinoline-4-carboxamide (i.e., molecule 17 in Table 1) because of its relatively high anti-tubercular activity and because it falls within the model applicability domain (AD) space shown in Fig. 4. The modifications were made around the N-ethylacetamide and 2-methylhydrazine moieties of the template at positions 16 and 23, as shown in Fig. 5. The built QSAR model indicated that an increase in the values of the descriptors VR3_Dzp, VR1_Dzi, and VR1_Dzs influences the activity positively; that is, increasing the values of these descriptors increases the activity. Variation of the substituents at positions 16 and 23 of the template structure with alkyl groups, benzene derivatives, and substituted alkyl amines led to the generation of fourteen compounds with better anti-tubercular activities, reported in Table 7. The leverage values predicted for the designed compounds were used to check whether these compounds were within the model AD, and they indicated that all the designed compounds fall within the model AD space. Among the designed compounds, 17i, 17j, and 17n showed the best predicted anti-tubercular activities. This resulted from substitution at positions 16 and 23 of the template structure with N-substituted alkyl amines, which act as electron-releasing groups via the positive inductive effect (+I). Because of the +I effect of the alkyl group attached to the template structure, the electron density on the nitrogen increases, so the lone pair of electrons on the N atom is more readily available.
The steric hindrance of the bulky alkyl groups (3° amine) in compound 17j accounts for its lower reactivity compared with compounds 17i (1° amine) and 17n (2° amine). The decreasing order of amine basicity, (CH3)2NH > CH3NH2 > (CH3)3N > NH3, suggests why compound 17n showed the most prominent activity.

Conclusion
This work addresses the quantitative structure-activity relationship (QSAR) between quinoline derivatives and their biological activities against Mycobacterium tuberculosis. A QSAR model was established by computational modeling to predict the reported experimental activities of 2,4-disubstituted quinoline derivatives against M. tuberculosis, based on the optimum descriptors AATS7s, VR1_Dzi, VR1_Dzs, SpMin7_Bhe, and TDB8e. The lead compound (compound 17), which has high anti-tubercular activity, was used as a structural template to design new hypothetical drug candidates. Among the designed compounds, 17i, 17j, and 17n showed improved predicted anti-tubercular activities, ranging from 8.8981 to 9.0377 pBA. Pharmaceutical and medicinal chemists are encouraged to synthesize the proposed compounds and to carry out in vivo and in vitro screening in order to substantiate the computational findings.