Computational virtual screening and structure-based design of some epidermal growth factor receptor inhibitors

Lung cancer is the foremost cause of cancer mortality worldwide. It is divided into small cell lung cancer and non-small cell lung cancer (NSCLC). The latter is the main type, accounting for about 90% of lung cancer cases and an estimated 25% of cancer deaths worldwide each year. NSCLC affects about 1.5 million patients and has a survival rate of less than 20%. Overexpression of EGFR tyrosine kinase has been recognized as a cause of NSCLC. Because mutation of the receptor leads to drug resistance, there is a need to develop more EGFR inhibitors. Computational virtual screening of some epidermal growth factor receptor inhibitors (EGFR L858R/T790M inhibitors, or NSCLC therapeutic agents) against their target protein (EGFR tyrosine kinase receptor, PDB entry 3IKA) was performed via molecular docking simulation and pharmacokinetics prediction to identify hit compounds with a promising affinity toward their target. The hit compounds discovered were compound 22 (−9.8 kcal/mol), 24 (−9.7 kcal/mol), 17 (−9.7 kcal/mol), and 19 (−9.5 kcal/mol). These lead compounds were further subjected to drug-likeness and ADME prediction and found to be orally bioavailable. Six new EGFR L858R/T790M inhibitors were designed using compound 22, which has the highest binding affinity, as a template. The six newly designed EGFR L858R/T790M inhibitors were found to have a better binding affinity than both the template used in the design process and AZD9291 (the positive control). None of the designed compounds was found to violate more than the permissible limit set by Lipinski's rule of five (RO5), which predicts easy transportation, absorption, and diffusion. Moreover, the designed compounds were found to have good synthetic accessibility, which indicates that they can be easily synthesized in the laboratory.


Background
Lung cancer is one of the leading cancers worldwide. It is reported to cause many deaths every year, an estimated one-third of all cancer deaths. Non-small cell lung cancer (NSCLC) is the main subset of lung cancers, accounting for about 85% of cases [1]. Overexpression of epidermal growth factor receptor (EGFR) kinase has been identified as a common cause of NSCLC.
Among NSCLC patients, the reported incidence is about 10-15% in Caucasian populations and 30-40% in Asian populations [1].
The discovery of NSCLC therapeutic agents that target EGFR tyrosine kinase is one of the major challenges facing medicinal chemists [2]. Targeting EGFR tyrosine kinase to manage NSCLC has become an urgent therapeutic necessity [3].
NSCLC therapeutic agents show a very high response rate in patients with activating mutations of EGFR. NSCLC therapeutic agents, or EGFR inhibitors, are classified into reversible inhibitors (first generation), of which gefitinib and erlotinib are examples, and irreversible inhibitors (second and third generation), of which afatinib and osimertinib are examples. Unfortunately, the period over which the first-generation inhibitors remain potent is limited by the development of drug resistance through the secondary T790M mutation [4]. To overcome this resistance, second-generation irreversible EGFR inhibitors such as afatinib and canertinib were subsequently developed to treat NSCLC carrying the EGFR T790M mutation [5]. Yet, owing to severe side effects such as skin rash and diarrhea, these second-generation inhibitors offer no satisfactory advantage over the first-generation reversible inhibitors. It is believed that their activity against wild-type EGFR limits the activity achievable in patients with the T790M mutation [2, 6-8].
Molecular docking is a molecular modeling technique used in structure-based design to screen a library of compounds. It identifies compounds with a higher affinity toward their target protein by using the 3D structures of both to elucidate their mode of interaction [13]. Prediction of the pharmacokinetic and drug-likeness properties of hit compounds also plays a vital role in structure-based design, because it establishes the pharmacokinetic profile of the hit compounds at an early stage of the drug pipeline [14].
This work aims to carry out computational virtual screening of some EGFR L858R/T790M inhibitors using molecular docking to identify hit compounds with a promising affinity for their target receptor (EGFR tyrosine kinase receptor), to confirm their bioavailability via their pharmacokinetic and drug-likeness properties, and to design new potent EGFR L858R/T790M inhibitors that have a better binding affinity than the template.

Method
This computational work was done on a Dell laptop with the following specifications: Intel® Core™ i7 Dual CPU M330 @ 2.75 GHz and 8 GB of RAM. The following software was used: PyRx virtual screening software, UCSF Chimera, PyMOL, Discovery Studio, and SwissADME, an online web tool.

Source and sketching of dataset under investigation
Twenty-eight (28) EGFR L858R/T790M inhibitors were obtained from the work of Hu et al. [15] and used in this research. After retrieval of the data, the 2D structures of all the molecules under investigation were drawn using ChemDraw software [16]. Table 1 presents the structures of all the compounds in the data set.

Determination of the optimum structures under investigation
The most stable (optimum) geometry of each molecule on the potential energy surface (PES) was determined using Spartan 14 software (Wavefunction). The search for the optimum structures was carried out with density functional theory (DFT) at the B3LYP/6-311G* level of theory [17].

Ligands, EGFR enzyme preparation and execution of the molecular docking simulation
Ligand preparation is vital in any molecular docking study. The ligands in this work were therefore prepared from the optimum geometry of each ligand obtained in the preceding section before their binding interactions and binding poses in the EGFR enzyme were elucidated [18]. Figure 1 shows the 3D geometry of a prepared EGFR L858R/T790M inhibitor (ligand) under investigation. The EGFR enzyme with Protein Data Bank code 3IKA was retrieved from the RCSB Protein Data Bank. After retrieval, the enzyme was prepared for the molecular docking simulation using Discovery Studio Visualizer. During preparation, the co-crystallized ligand and the water molecules present in the structure were deleted; polar hydrogens had been added before this step. Figure 2 shows the 3D structure of the prepared EGFR enzyme.
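The deletion of the co-crystallized ligand and water molecules described above can be illustrated with a minimal Python sketch. It is not the Discovery Studio Visualizer procedure used in this work; it only assumes the standard PDB layout, in which hetero groups are stored as HETATM records with the residue name in columns 18-20. The function name `strip_heteroatoms` and the toy records (including the residue name LIG and all coordinates) are hypothetical.

```python
def strip_heteroatoms(pdb_text):
    """Remove HETATM records (ligands, waters, ions) from PDB-format text.

    Returns the cleaned text and the set of residue names that were removed.
    Caveat: modified residues stored as HETATM would also be dropped.
    """
    kept_lines = []
    removed_residues = set()
    for line in pdb_text.splitlines():
        record = line[:6].strip()          # record name occupies columns 1-6
        if record == "HETATM":
            removed_residues.add(line[17:20].strip())  # residue name, columns 18-20
            continue
        if record == "CONECT":
            continue  # connectivity records mostly describe hetero atoms
        kept_lines.append(line)
    return "\n".join(kept_lines) + "\n", removed_residues


# Toy input: made-up records, not taken from entry 3IKA.
toy_pdb = (
    "ATOM      1  N   GLY A 696      10.000  10.000  10.000  1.00 20.00           N\n"
    "ATOM      2  CA  GLY A 696      11.000  10.000  10.000  1.00 20.00           C\n"
    "HETATM    3  C1  LIG A1001      12.000  10.000  10.000  1.00 20.00           C\n"
    "HETATM    4  O   HOH A2001      13.000  10.000  10.000  1.00 20.00           O\n"
    "END\n"
)

cleaned, removed = strip_heteroatoms(toy_pdb)
print(cleaned)
print("removed residue names:", sorted(removed))
```

Returning the removed residue names makes it easy to confirm that only the intended ligand and water (HOH) records were dropped before the receptor is passed on to docking.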
The ligands were docked into the binding pocket of the EGFR enzyme using AutoDock Vina as implemented in PyRx virtual screening software [19]. Because PyRx was used, the docked ligand and the receptor had to be re-coupled after docking for further investigation; this was done with UCSF Chimera. PyMOL and Discovery Studio were then used to visualize the re-coupled complexes and to view the nature of the interactions between the ligand and the receptor.
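As an illustration of how docking scores can be collected after a run, the sketch below reads the binding affinities that AutoDock Vina writes into its output PDBQT file as `REMARK VINA RESULT` lines (affinity in kcal/mol, followed by two RMSD values). It is a minimal sketch, not part of the PyRx workflow used here; the function name `best_vina_score` is hypothetical, and in the toy text only the first affinity matches the score reported for compound 22, while the second mode and the RMSD values are made up.

```python
import re

# Matches e.g. "REMARK VINA RESULT:      -9.8      0.000      0.000"
VINA_RESULT = re.compile(r"^REMARK VINA RESULT:\s+(-?\d+(?:\.\d+)?)")


def best_vina_score(pdbqt_text):
    """Return the lowest (most favorable) affinity found in Vina output text."""
    scores = []
    for line in pdbqt_text.splitlines():
        match = VINA_RESULT.match(line)
        if match:
            scores.append(float(match.group(1)))
    if not scores:
        raise ValueError("no 'REMARK VINA RESULT' lines found")
    return min(scores)


# Toy output with two binding modes (illustrative values, see lead-in).
toy_output = (
    "MODEL 1\n"
    "REMARK VINA RESULT:      -9.8      0.000      0.000\n"
    "ENDMDL\n"
    "MODEL 2\n"
    "REMARK VINA RESULT:      -9.1      1.532      2.417\n"
    "ENDMDL\n"
)

print(best_vina_score(toy_output))
```

Taking the minimum rather than the first value keeps the function correct even if the modes in a file are not sorted by affinity.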

Drug-likeness and ADME properties prediction
The pharmacokinetics and drug-likeness of the EGFR L858R/T790M inhibitors under investigation were determined using SwissADME, a free online web tool [20]. Lipinski's rule of five was the criterion used to assess the drug-likeness of the molecules; it states that a small molecule that violates more than 2 of its criteria might not be orally bioavailable [21].
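The rule-of-five filter can be written out as a short Python sketch. The four criteria used are the standard ones (molecular weight at most 500 g/mol, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The permitted number of violations follows the statement above and is left as a parameter, since stricter readings of the rule tolerate only one violation. The class and function names, and the example descriptor values, are hypothetical and do not correspond to any compound in this study; which logP estimate to supply is also left to the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    mol_weight: float       # molecular weight in g/mol
    log_p: float            # octanol-water partition coefficient (any estimate)
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(d):
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_ro5(d, max_violations=2):
    """True if the molecule does not exceed the permitted number of violations."""
    return ro5_violations(d) <= max_violations


# Hypothetical molecule: only the molecular-weight criterion is violated.
example = Descriptors(mol_weight=520.6, log_p=4.2, h_bond_donors=2, h_bond_acceptors=8)
print(ro5_violations(example), passes_ro5(example))
```

Keeping the violation count separate from the pass/fail decision allows the same descriptors to be re-evaluated under a stricter threshold, for example `passes_ro5(example, max_violations=1)`.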

Design
Structure-based drug design is a robust and useful technique. Also called direct design, it involves acquiring information on the three-dimensional structure of the molecular target (protein) through methods such as X-ray crystallography, NMR spectroscopy, or homology modeling, followed by the design of suitable drug candidates based on their binding affinity and selectivity for the target. Structure-based drug design comprises several steps, such as protein structure retrieval and preparation, ligand library preparation, docking, and manual design of new compounds [22].

Molecular docking simulation
The results of the molecular docking simulation are presented in Table 2 and Figs. 3 and 4.

Drug-likeness and ADME properties prediction
The results of the drug-likeness and ADME properties prediction are presented in Tables 3 and 4 and Figs. 5 and 6.

Molecular docking of designed compounds
The results of the molecular docking of the designed compounds are presented in Tables 5 and 6 and Fig. 7.

Drug-likeness and ADME properties prediction
The results of the drug-likeness and ADME properties prediction are presented in Tables 7 and 8.

Molecular docking simulation
Molecular docking simulation was used to screen twenty-eight (28) EGFR L858R/T790M inhibitors by investigating their binding interactions in the binding pocket of the EGFR receptor (3IKA), in order to identify hit compounds that could be used to design new EGFR L858R/T790M inhibitors (Table 2). The four best hit compounds, those with the lowest docking scores (highest binding affinity), are considered here. Compound 22 was the best of the four, with the lowest docking score of −9.8 kcal/mol, owing to its larger number of interactions in the binding pocket of the enzyme. Its interactions in the binding pocket were examined with Discovery Studio Visualizer. It interacted with the MET790 (2.65 Å), LYS745 (2.67 Å), ASP855 (3.21 Å), GLY857 (3.69 Å), and PHE723 (2.63 Å) residues in the active site of the EGFR receptor via both conventional and carbon-hydrogen bonds. Besides these, it also bound to the LEU844, PHE723, LEU718 (3), ALA743, and LEU844 residues via Pi-Sigma, Pi-Pi Stacked, Alkyl, and Pi-Alkyl hydrophobic interactions. A Pi-Anion electrostatic interaction with ASP855 and a Pi-Sulfur interaction with MET790 were also observed.

Drug-likeness and ADME properties prediction of the studied compounds
Table 3 presents the computed drug-likeness of the compounds under investigation. None of the molecules violated more than the maximum permissible limit of Lipinski's criteria, which suggests that all of them are likely to be pharmacologically active. All of the molecules are predicted to have good absorption, a low toxicity level, oral bioavailability, and good permeability, except molecule 28, whose WLOGP value (a predicted lipophilicity descriptor) is greater than 5. The bioavailability radar of the four selected molecules further confirms their drug-likeness properties (Fig. 5).
The compounds under investigation can therefore be said to be orally bioavailable. Table 4 presents the gastrointestinal (GI) absorption, blood-brain barrier (BBB) permeation, P-gp substrate, and CYP isoform inhibition properties of all the molecules under investigation. All the molecules have high GI absorption and none is BBB permeant. Some were found to be able to permeate the skin and some were not. All were observed to inhibit the CYP isoforms except CYP1A2. The BOILED-Egg plot was generated to further confirm the GI absorption and BBB permeation properties of the four hit compounds (Fig. 6). The plot confirms that none of them crosses the BBB and that all lie within the GI absorption region.

Molecular docking of designed compounds
Six new EGFR L858R/T790M inhibitors were designed using compound 22, which has the highest binding affinity (−9.8 kcal/mol), as the template (Table 5). Based on the interactions of compound 22 with the EGFR receptor, the template was modified by adding substituents to its piperazin-1-yl moiety and isopropyl phenyl ring.
The addition of an acetyl group to the piperazin-1-yl moiety and of two chlorine atoms at the meta positions of the isopropyl phenyl ring of the template significantly increased the interaction of the designed compound (D3) with the EGFR receptor, giving a binding energy of −10.2 kcal/mol. D3 was found to bind to the EGFR receptor through conventional and carbon-hydrogen bonds, hydrophobic, electrostatic, and other interactions (Table 6). Four conventional hydrogen bonds were observed between different parts of the ligand and the residues ASP855, MET790, LYS745, and LYS745, with bond distances of 2.9622 Å, 2.49526 Å, 2.61911 Å, and 2.38759 Å, respectively, as depicted in Fig. 7a. Carbon-hydrogen bonds were also observed in the binding pocket between the ligand and two residues, ASP855 (3.24379 Å) and PHE723 (2.57647 Å). The ten (10) amino acid residues in the binding pocket of the enzyme that interacted with the ligand via hydrophobic interactions were LEU844 (2), MET790, PHE723, LEU718 (3), LEU792, CYS797, and ALA743 (2). Besides these interactions, an electrostatic interaction was observed between the ligand (D3) and the ASP855 residue in the binding pocket of the receptor. The only residue that interacted via a Pi-Sulfur (other) interaction was MET790. The addition of only the acetyl group to the piperazin-1-yl moiety of the template also produced a significant change in the interaction of the designed compound (D5) with the EGFR receptor, with a very good binding affinity of −10.1 kcal/mol (Table 5). Designed compound D5 bound to the EGFR receptor via hydrogen bonds, hydrophobic interactions, and other interactions, as shown in Table 6. The same number of the conventional   (2) respectively.
Besides these interactions, ASP855 was the only residue in the binding pocket of the receptor that formed an electrostatic interaction with the ligand, and MET790 was the only residue that interacted via a Pi-Sulfur (other) interaction. The smaller number of hydrophobic interactions compared with D3 may be a result of the absence of halogens in designed compound D5. The other designed compounds (D1, D2, D4, and D6) also showed good interactions and high binding affinity in the binding pocket of the EGFR tyrosine kinase receptor (Table 6). They interacted with the binding pocket of the enzyme via the same conventional hydrogen bond, carbon-hydrogen bond, hydrophobic, electrostatic, and Pi-Sulfur (other) interactions, except D4, which showed no Pi-Sulfur (other) interaction. Furthermore, AZD9291 was used as a positive control to validate the docking process and was then compared with the designed compounds. The designed compounds were found to be better than AZD9291, which has a binding affinity of −8.1 kcal/mol as a result of its fewer interactions compared with the designed compounds. The 2D structures of designed compounds D3 and D5 are presented in Fig. 7a and b.
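The comparison of the designed compounds with the template and the positive control can be made explicit with a small Python sketch. It uses only the binding affinities quoted in the text (compound 22, D3, D5, and AZD9291); the scores of D1, D2, D4, and D6 are reported in Table 6 and are not reproduced here. The function names are hypothetical.

```python
# Binding affinities in kcal/mol, as quoted in the text.
scores = {
    "compound 22 (template)": -9.8,
    "D3": -10.2,
    "D5": -10.1,
    "AZD9291 (control)": -8.1,
}


def rank_by_affinity(score_table):
    """Sort compounds from most to least favorable (most negative first)."""
    return sorted(score_table.items(), key=lambda item: item[1])


def gain_over(score_table, name, reference):
    """Score difference versus a reference; negative means stronger binding."""
    return round(score_table[name] - score_table[reference], 1)


for name, score in rank_by_affinity(scores):
    print(f"{name:<24} {score:6.1f}")

for designed in ("D3", "D5"):
    print(
        designed,
        "vs template:", gain_over(scores, designed, "compound 22 (template)"),
        "| vs control:", gain_over(scores, designed, "AZD9291 (control)"),
    )
```

On these values, D3 and D5 score 0.4 and 0.3 kcal/mol lower than the template and 2.1 and 2.0 kcal/mol lower than AZD9291, which is the basis for describing them as better binders.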

Drug-likeness and ADME prediction of designed compounds
Using Lipinski's rule of five as a standard filter for small molecules, the drug-likeness of the designed compounds was also predicted, as presented in Table 7. None of the designed compounds violated more than the permissible limit set by Lipinski's rule of five, which predicts their easy transportation, absorption, and diffusion [23,24]. The ADME properties of the designed compounds were also predicted and are presented in Table 8. All were observed to have low gastrointestinal absorption, and none was observed to permeate the blood-brain barrier. All designed compounds have a low bioavailability score of 0.17. Based on the synthetic accessibility scores (Table 8), they can all be synthesized in the laboratory [25,26].

Conclusion
In conclusion, molecular docking simulation carried out on the twenty-eight (28) EGFR L858R/T790M inhibitors identified four hit compounds with a higher binding affinity toward their target. The hit compounds discovered were compound 22 (−9.8 kcal/mol), 24 (−9.7 kcal/mol), 17 (−9.7 kcal/mol), and 19 (−9.5 kcal/mol). These lead compounds were further subjected to drug-likeness and ADME prediction and found to be orally bioavailable, with good absorption, a low toxicity level, and good permeability. The best of the hit compounds was retained as a template and used to design six new EGFR L858R/T790M inhibitors with a better binding affinity than the template and AZD9291 (the positive control). None of the designed compounds was found to violate more than the permissible limit set by Lipinski's rule of five (RO5), which predicts their easy transportation, absorption, and diffusion. Moreover, the designed compounds were found to have good synthetic accessibility, which indicates that they can be synthesized in the laboratory.