Assessment of computational approaches in the prediction of spectrogram and chromatogram behaviours of analytes in pharmaceutical analysis: an assessment review

Today, artificial intelligence-based computational approaches are facilitating multitasking and interdisciplinary analytical research. For example, the spectral and chromatographic data gathered during an analytical research project can be reused in predictive experimental research. Spectral and chromatographic information plays a crucial role in pharmaceutical research, especially in instrumental analytical approaches, which consume time, manpower, and money. Hence, predictive analysis would be beneficial, especially in resource-limited settings. Computational approaches can verify data at an early phase of the research process. Several in silico techniques for predicting an analyte's spectral and chromatographic characteristics have recently been developed. Understanding these tools may help researchers accelerate their work with boosted confidence and prevent them from being misled by incorrect analytical data. In this communication, the properties of chemical compounds and their relation to chromatographic retention are discussed, as well as prediction techniques for UV, IR, Raman, and NMR spectrograms. This review examined reference data of chemical compounds to compare the predictive ability of in silico tools along with their percentage error, limitations, and advantages. The computational prediction of analytical characteristics offers a wide range of applications in academic research, bioanalytical method development, computational chemistry, analytical method development, data analysis approaches, material characterization, and validation processes.


Background
The use of computational chemistry in research has been well acknowledged in recent years and has afforded significant research outcomes [1,2]. There are literature reports on computer code for analysing models, replicating processes, predicting models, and interpreting chemical compounds [3]. Unlike in the drug discovery area, the validity of computational techniques in analytical chemistry has yet to be explored as a comprehensive tool [4][5][6]. The computational approach is important in analytical research because simulations of an analyte's chemical behaviour are needed to model the analyte-response relationship in instrumental methods. Indeed, it can be viewed as a visual representation of the connection between analytical experiment and theoretical prediction [4,7].
In this era, new chemical entity research is needed in the drug discovery process for treatment, diagnostic, and biomarker research. At this juncture, spectroscopy and chromatography techniques play a vital role in the purification, identification, and characterization of the targeted chemical compound [8,9]. In general, understanding and interpreting the spectrograms and chromatographic retention times of new compounds is quite difficult for beginners, particularly if the researcher is a non-chemist [10]. Yet knowledge of spectrograms and chromatography is essential for researchers and plays a crucial role in the process of developing new drugs. Indeed, expertise in, and awareness of, the accuracy of computational tools could help researchers speed up their experiments with partial validity of the analytical data [11]. In the current scenario, predatory journals still publish data sets that are not reliable because they are not verified [12]. Here, researchers may utilize computational tools to verify such data before citing it in their research [4,7]. Prediction tools for various spectrograms, such as UV-visible, infrared (IR), Raman, nuclear magnetic resonance (NMR), and mass spectra, are now widely accessible to researchers. Similarly, in silico approaches exist to predict the chromatographic behaviour of an analyte in chromatographic techniques like HPLC and GC [13,14]. The prediction of retention time (tR) in chromatography is gaining much importance in analytical method development research. Several computational prediction approaches have been reported, including artificial neural networks (ANNs), response surface methodology (RSM), analytical quality by design (AQbD) [15], design of experiments (DoE), chemometrics, and quantitative structure retention relationship (QSRR) methods [16]. Although knowledge about artificial intelligence software is limited, several artificial neural network-based programmes are widely available these days.
Many researchers spend a significant amount of time on experimental work, even when they have shortcomings in computational chemistry. The AQbD and QSRR approaches explore the scientific understanding of critical method variables and method responses in chromatography [17,18]. These methods are recommended in pharmaceutical method development because they allow regulatory flexibility [19]. In the AQbD approach [20], the tool used in model development is DoE. In chromatographic research, QSRR is a reliable in silico method for predicting molecular systems [21,22]; it can be used to evaluate complex physicochemical features of analytes in chromatographic analyses and to predict chromatographic retention parameters [23,24].
Considering the above discussion, the present assessment review focuses on the various prediction tools that are available and accessible to resource-limited research setups. We have also explored the predictive ability of different in silico tools with examples pertaining to reference spectral libraries. Thus, this review can assist researchers in assessing a tool's reliability on a case-by-case basis.

Problems involving the analytical methods
Today, the difficulties in analytical laboratories are much the same as those experienced in the past, despite advances in analytical technology. Analytical laboratories experience difficulties related to the growth and preservation of expertise, maintenance of equipment sensitivity, and the introduction of novel methodologies [25]. There are many reports of analytical issues with analytes, including poor method performance [26], a lack of regulatory flexibility [27], complex chemical processes [28], out-of-trend (OOT) results [29], and out-of-specification (OOS) results [30,31]. These problems arise mainly in three stages: the pre-analytical, development, and post-analytical phases. They can be overcome by utilizing modern and advanced computational methods.

Pre-analytical phase
One of the crucial stages in the analysis of a sample is the pre-analytical phase, which includes literature gathering, sampling, sample preparation, transport, and storage. This entire process is the most time-consuming and can occasionally lead to errors [32]. It is widely acknowledged that a degraded sample cannot produce good results. It is always important to conduct a literature review before beginning any research on an analyte. There are many databases, books, journals, and websites, but in some instances, information on a new analyte may not be available owing to a lack of studies on the analyte, newly synthesized materials, or a lack of source availability [33]. Next, for new analytical method development, sample preparation is a critical step. A sample processing method is unique to each type of sample, including biological matrices, food products, active compounds, excipients, and pesticides. A given procedure cannot be applied to a different type of analyte without complete revalidation of the method [34]. Unfortunately, this rule is regularly ignored. Finally, several issues affect analyte storage and transportation: temperature, humidity control, data storage maintenance, and a lack of advancement [35].

Development phase
The selection of the method, procedure, principle, technology, and appropriate recommendations are the main problems that arise throughout the development phase. Unfortunately, it must be acknowledged that no method has yet been developed that satisfies all of these criteria and is appropriate for all classes of analytes. This always places restrictions on analytical chemists. It is also crucial to understand whether the objective of the analysis is merely screening or accurate quantification. In developing chromatographic methods, optimization includes temperature, flow rate, the choice of mobile and stationary phases, separation efficiency, internal standard selection, and validation. Thus, re-optimization is a difficult task if the method fails during method transfer [36]. In the last decade, new chromatographic techniques for the detection of bio-analytes have emerged. One of these is liquid chromatography-tandem mass spectrometry (LC-MS/MS), which offers advantages such as high selectivity and sensitivity but has disadvantages such as expensive equipment, the need for experienced operators, and more challenging method development [37,38]. In the development of electrochemistry-supported instruments, the general settings for resolution, the path of the composite electrochemical response examination, and the optimal path for analysing the multidimensional data are complicated [39].

Post-analytical phase
In this phase, the key challenge is the collection and interpretation of data from analytical techniques, particularly in clinical research, proteomics, and metabolomics. Additionally, certain sophisticated computations raise problems during data analysis. In general, manual calculations can produce inaccurate findings. In pre- and post-data analyses for chromatographic methods, the common troubles to be addressed are unwanted background signals, baseline drift, unresolved peaks, shifting retention times, data comparison errors, and improper retention time alignments [40]. In spectroscopic analyses, systematic undesirable signal changes are typically rectified with specific mathematical transformations, frequently created for a particular experimental approach. Baseline shifts (offsets), horizontal shifts, drifts (slope changes), and global intensity effects are some of these systematic signal fluctuations. The significant alteration of signal profiles produced by the derivative transform can mislead the interpretation of final results [41]. Overall, the scheme of application of computational methods is shown in Fig. 1.
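As a concrete illustration of one such correction, the sketch below removes a linear baseline drift from a chromatographic trace by fitting a straight line through known signal-free points and subtracting it. All signal values are synthetic, and the approach is a minimal example rather than any specific published algorithm.

```python
import numpy as np

def correct_linear_baseline(signal, baseline_idx):
    """Fit a straight line through known signal-free points and subtract it."""
    x = np.asarray(baseline_idx, dtype=float)
    y = signal[baseline_idx]
    slope, intercept = np.polyfit(x, y, 1)          # least-squares line
    baseline = slope * np.arange(len(signal)) + intercept
    return signal - baseline

# Synthetic trace: one Gaussian peak riding on a drifting baseline
t = np.arange(100, dtype=float)
peak = 5.0 * np.exp(-0.5 * ((t - 50) / 4) ** 2)
drift = 0.02 * t + 1.0
trace = peak + drift

# Points near both ends of the run are assumed to be signal-free
corrected = correct_linear_baseline(trace, baseline_idx=[0, 1, 2, 97, 98, 99])
```

After subtraction, the corrected trace retains the peak while the drift is removed; in practice the signal-free regions would be chosen by inspecting the chromatogram.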

Prediction of 13C-NMR and 1H-NMR
NMR is a significant tool for detecting carbon and hydrogen atoms in organic compounds. In the pharmaceutical industry, 13C-NMR and 1H-NMR are used to assess drug purity, composition, and the chemical shifts of diverse organic molecules. NMR parameters are now calculated by utilizing computational methods in association with chemical structures. Several AI-based software tools (e.g. ChemDraw, ChemAxon) are now used to predict chemical shifts in 1H-NMR and 13C-NMR and offer net intensity, quality, and spectrograms.

Machine learning approach in NMR prediction
Machine learning (ML) approaches are more beneficial and, in most cases, faster than database-driven prediction methods such as HOSE codes. The database approach works by finding structural similarities and averaging the experimental data for matching chemical structures; the accuracy of the prediction is therefore limited by the similarity between the new and known HOSE codes. The well-established structure determination approach formerly relied on quantum chemical calculation-based methods such as DFT calculations. These methods are accurate for 1H and 13C chemical shift predictions but are considerably more time-consuming and expensive. Today, software tools have been designed to speed up the procedure. NMR signal characteristics can be visualized more accurately using a machine learning method called automatic structure verification (ASV), which accounts for variables such as temperature, solvent, pH, salt content, and concentration that affect chemical shifts in laboratory studies. All of these parameters are considered, so that the chemical shift can be predicted for an unknown structure. Other prediction algorithms take only some of these into account, and those systems still produce variable values. The ASV system, however, is capable of properly dealing with overlapping peaks. This is especially important when the compound's relevant peaks are quite close to other signals, such as significant solvent peaks [42][43][44][45]. A few researchers have used this approach, including Jia et al. [46], who developed a method for extracting data from previously examined 13C and 1H NMR spectra in order to recognize the NMR spectrum. Min Lin and colleagues predicted chemical shifts using cutting-edge machine learning [47].
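The database idea can be made concrete with a toy sketch: encode each carbon's local environment as a code string, store experimental shifts per code, and predict by averaging the stored values for a matching code. The environment codes and shift values below are invented placeholders, not real HOSE codes or measured data.

```python
# Toy illustration of database-style shift prediction (the HOSE-code idea):
# look up structurally similar environments and average their recorded shifts.
# The environment codes and shift values below are invented placeholders.
from statistics import mean

shift_db = {
    "C(=O)(O)": [170.1, 171.4, 169.8],   # carboxyl-like carbons (ppm)
    "C(ar)":    [128.2, 129.0, 127.5],   # aromatic CH carbons (ppm)
    "C(H3)":    [14.1, 13.9, 14.4],      # terminal methyl carbons (ppm)
}

def predict_shift(env_code):
    """Average the recorded shifts for the matching environment code."""
    if env_code not in shift_db:
        raise KeyError(f"no reference data for environment {env_code!r}")
    return mean(shift_db[env_code])

print(round(predict_shift("C(ar)"), 2))  # → 128.23
```

A real HOSE-code predictor additionally falls back to progressively coarser environment spheres when no exact match exists, which is where its accuracy degrades.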

Software handling for NMR signal prediction
The user can either draw the chemical structure of the test molecule in the software application or download it and paste it into the software. After clicking the calculation button, the user will find the predicted 13C-NMR and 1H-NMR spectra within 1-5 min. After the prediction, the user can optionally alter the spectrometer frequency over the range of 60.0 to 1000 MHz. Finally, a PDF document will be generated that includes the substance's chemical shifts, peak intensities, peak qualities, molecular positions, and coupling constant values [48]. A typical 1H-NMR spectrum for zidovudine is shown in Fig. 2.

Prediction of UV-visible spectra
The UV-Vis absorption spectrum of an organic substance is a key component of its physical makeup. Predicting UV-Vis spectra from molecular structural formulas is generally of great interest for designing new materials, finding potentially phototoxic chemicals, and estimating missing spectroscopic data for known molecules [49]. In a recent study, Chan et al. [50] utilized a TD-DFT computational approach for rapid ultraviolet-visible spectrum prediction, and Urbina et al. [51] developed a neural network-based computation to predict UV-visible spectrograms.

Time-dependent density functional theory (TD-DFT)
For a TD-DFT calculation, the software should be able to analyse the energies of the chemical structure in its excited states and the probability of transitions between energy levels of the molecule. For example, the ORCA programme contains several methods for accurately determining excited state properties, of which the TD-DFT technique is the most effective. For precise results, an optimized geometry file of the chemical structure is required; to optimize the structure, the user might utilize the "IQmol" software package or another. After that, the user can use Notepad++ to create the input file, with the function code "! B3LYP def2-TZVP", the "RIJCOSX" code to speed up the process, the "%TDDFT" block to generate the excited state calculation, the "NROOTS" flag to determine how many excited states are to be added, and "MAXDIM" to determine the maximum dimension of the expansion space. To simulate the solvent employed in the experiment, CPCM may be used as a solvation model for both the ground and excited states. In the coordinate line, the first number ("0") denotes the charge, whereas the second denotes the multiplicity. Finally, save this file in "inp" format (tddft.inp) in the working folder. The user may then go to the folder and enter "orca tddft.inp > tddft.out" followed by "Enter" to execute the computation at the command prompt (CMD). Depending on the molecules involved, it might take some time (10 min-2 h). After the computation is completed, the programme creates an output file in the same folder that contains all of the data [52,53].
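Putting these keywords together, a minimal ORCA input file of the kind described might look like the following sketch; the solvent choice (water), the number of roots, and the geometry filename tddft.xyz are illustrative assumptions, not values from the original procedure.

```text
! B3LYP def2-TZVP RIJCOSX CPCM(Water)
%tddft
  NRoots 10     # number of excited states to compute
  MaxDim 5      # maximum dimension of the expansion space
end
* xyzfile 0 1 tddft.xyz
```

The "0 1" on the coordinate line gives the charge and multiplicity, and tddft.xyz holds the previously optimized geometry.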

Visualization of UV-visible spectra
The UV-visible spectrum of an unknown analyte can be obtained instantly using a graphical interface. The raw prediction appears as thin spectral lines, so some line broadening is required to make the predicted spectrum match the experimental one. This is easily accomplished by selecting "Advanced > >" and then, on the "Infrared Spectra Settings" tab, adjusting the "Peak Width" to 10-30 cm−1 [54][55][56][57][58]. Figure 3 shows the generated spectrogram of the zidovudine compound.
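The broadening step itself is just a convolution of the predicted "stick" transitions with peak-shaped functions. The sketch below does this with Gaussians on a wavelength grid; the transition positions and strengths are made-up values, and the Gaussian (rather than Lorentzian) line shape is an assumption.

```python
import numpy as np

def broaden(positions, intensities, grid, width):
    """Sum one Gaussian per predicted transition over the spectral grid."""
    spectrum = np.zeros_like(grid, dtype=float)
    for pos, inten in zip(positions, intensities):
        spectrum += inten * np.exp(-0.5 * ((grid - pos) / width) ** 2)
    return spectrum

grid = np.linspace(200.0, 400.0, 2001)   # wavelength grid (nm)
sticks_nm = [265.0, 310.0]               # predicted transitions (illustrative)
strengths = [1.0, 0.4]                   # relative intensities (illustrative)
spectrum = broaden(sticks_nm, strengths, grid, width=10.0)
```

Increasing `width` plays the same role as raising the "Peak Width" setting in the graphical interface: it smooths the sticks into bands that resemble a measured spectrum.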

IR/Raman predictions
Both infrared (IR) and Raman spectroscopy continue to be essential tools for chemical characterization and identification. Recently, McGill et al. [59] developed an IR spectrum prediction procedure using a neural network-based approach. IR and Raman spectra may also be predicted using the ORCA software, with "Avogadro" or "IQmol" used to compute the molecular frequencies. The 3D structure of the analyte is analysed and optimized, and the ORCA programme can then create the output on its own. The user must create a new folder containing the optimized geometry structure and the input file, similar to the UV-visible computations. Here "! B3LYP DEF2-SVP" is the function code, while "OPT FREQ" requests a geometry optimization followed by a frequency calculation; the charge and multiplicity are again given on the coordinate line. Finally, save the file in "inp" format in the same location so that the user may navigate to the folder and execute "orca foscarnet.inp > foscarnet.out" followed by "Enter" to perform the computation. The output file is created in the same folder when the operation is finished [7],[54][55][56][57][58]. Figure 4 shows the predicted IR spectrum of foscarnet generated by Avogadro.
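For the frequency job, the corresponding minimal input sketch is shorter than the TD-DFT one; the geometry filename foscarnet.xyz is an illustrative assumption.

```text
! B3LYP DEF2-SVP Opt Freq
* xyzfile 0 1 foscarnet.xyz
```

"Opt Freq" first optimizes the geometry and then computes the vibrational frequencies at that optimized structure, which is what the plotting tools read to draw the IR spectrum.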

Plotting a spectrum
Using Avogadro as a graphical user interface, the IR spectrum may be generated rapidly. To view the visual spectrum in a new window, the user can open the saved output file and click "Show Spectra". Although it displays narrow spectral lines, some line widening is necessary to bring the predicted spectrum as close to the observed one as possible. This can readily be performed by selecting "Advanced > >" and then changing the "Peak Width" to 30-130 cm−1 on the "Infrared Spectra Settings" tab [54][55][56].
Fig. 3 Predicted CD and UV spectrum of zidovudine generated by Avogadro

Mass spectrometry predictions
The molecular weight of an analyte in pharmaceutical studies is determined by mass spectrometry (MS). In electron ionization mass spectrometry (EI-MS), an electron beam positively ionizes and fragments the molecules [60]. The mass spectrum is a distribution of the frequency or intensity of each type of ion according to its mass-to-charge (m/z) ratio [61]. Prediction models calculate the chance of each bond breaking under ionization and the frequency of each ion fragment by using quantum mechanics calculations [62] or machine learning [63]. For large molecules, a model's prediction can take a few minutes, depending on the molecule's size, because these techniques must either utilize sophisticated computations to determine molecular orbital energies with high accuracy or stochastically mimic the fragmentation of the molecule. Jennifer N. Wei and colleagues studied a neural network termed neural electron ionization mass spectrometry (NEIMS), which predicts the electron ionization mass spectrum for a given small molecule. They found that the forward-only model fails to adequately capture fragmentation events, whereas the bidirectional prediction mode does [64], because it directly predicts spectra rather than bond-breaking probabilities. As a result, this model is significantly faster than previously reported methods.
Wang et al. utilized the recently developed quantum chemical programme QCEIMS (Quantum Chemical Electron Ionization Mass Spectrometry). QCEIMS can theoretically calculate the spectrum for any given chemical structure. However, in order to make quick predictions, approximations and parameter estimations are required, and these are important for the precision of QCEIMS predictions. For the molecular dynamics (MD) trajectories, fragment ions are calculated by QCEIMS using Born-Oppenheimer MD over picosecond reaction durations with femtosecond intervals. With this approach, they discovered that tweaking QCEIMS's parameters was not a practical way to enhance simulation outcomes [65,66]. One of the best tools for in silico mass-spectrum-to-compound identification is CFM-ID, which Wang et al. used to predict more accurate ESI-MS/MS spectra. They added a new method for modelling ring cleavage that treats the process as a series of straightforward chemical bond dissociations, and they expanded their hand-written rule-based predictor to cover more chemical classes of analytes [67]. They also listed parameters derived from molecular topology.

Fluorescence spectroscopy predictions
Fluorescence spectroscopy measures a target analyte's fluorescence upon excitation by a laser beam (often UV absorption) [68]. The prediction of an analyte's fluorescence features, including the type of fluorescence and the emission and excitation wavelengths [69], can be employed to examine solvent effects. This approach has been used to predict the spectra of a variety of fluorescent compounds [70]. The majority of the predicted spectra are for molecules of molecular mass 228 or below; for larger molecules, the DFT technique can be used to calculate emission spectra with solvent effects.
The characterization of electronic excited states depends on the accuracy of the simulated molecular absorption and/or emission spectra and on precise techniques like the equation-of-motion coupled cluster singles and doubles (EOM-CCSD) method [71,72]. In order to improve emission spectrum quality, Caricato et al. [73] combined the EOM-CCSD and polarizable continuum (PCM) models and reported that the predicted vertical emission energies are in good accord with the available experimental data. Later, DFT was used by Powell et al. [74] to demonstrate the capability of predicted spectra for generating libraries of fluorescence spectra in digital format. Ye et al. concluded that the statistical requirements for the numerically predicted wavelength were satisfied by the Lasso-RF (random forest descriptor) model; four conjugated-bonding-related characteristics were found by the model to contribute primarily to the predicted emission wavelength [75]. Furthermore, Shams-Nateri et al. [76] investigated the link between absorption and emission spectra using the PCA chemometric approach and found that the accuracy of emission spectrum prediction improved with the addition of more principal components.

Electrochemistry predictions
Because of growing interest in electrochemistry for exploring potential drug core structures and for developing organic photovoltaic materials, the field has recently experienced a huge comeback, providing valuable prediction, filtering, and active learning. This includes promising optimization of the electrochemical properties of analytes, investigation of intrinsic electron deficiency, and rendering of the connection between electronic characteristics and substituent effects [77]. Predicting the electrochemistry of compounds using quantum mechanical calculations provides a quick and accurate method for such research. For instance, DFT is regarded as the "workhorse" of recent theoretical investigations in electrochemistry and physics [78].
Electrochemical systems are studied using the popular electrochemical impedance spectroscopy (EIS) characterization approach, although the significance of this method is still constrained by several issues. EIS is extensively employed in the development of sensors [79,80], in health care [81], drug release [82], testing, and biology [83], because it makes it possible to characterize such systems and helps in identifying crucial variables like conductivities [84], resistances [85], and capacitances [86]. The computational Gaussian processes (GPs) used in this context face significant challenges, including noise, impeded spectrum regression, polarization resistance, and probed frequencies that are not always ideal. A GP is an infinite or finite collection of random variables such that the joint distribution of any finite subset displays multivariate Gaussian behaviour. GPs may then regress and predict data using a prior distribution and a set of assumptions on the characteristics of the observed unknown function [87]. Regression and prediction uncertainty can be measured using GPs, which have so far been used to filter data, predict parameters in diverse situations [88], and enhance experiments in the active learning domain. Liu and Ciucci [89,90] used GPs to deconvolve the distribution of relaxation times, a novel approach for EIS analysis. Then, using a finite GP approximation, Maradesa et al. extended this framework to constrain the distribution of relaxation times to be non-negative. Additionally, Py et al. [91,92] created and validated a method to assess the quality of EIS spectra using GPs that comply with the Hilbert transform.
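The GP machinery itself is compact: with an RBF kernel and a Gaussian noise model, the posterior mean at new points has a closed form. The sketch below regresses synthetic noisy data, not real impedance measurements, and the kernel hyperparameters are arbitrary illustrative choices.

```python
import numpy as np

# Minimal GP regression sketch: an RBF kernel prior plus a Gaussian noise
# model gives a closed-form posterior mean at new input points.

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential covariance between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

rng = np.random.default_rng(0)
x_train = np.linspace(-3, 3, 25)
y_train = np.sin(x_train) + 0.05 * rng.standard_normal(25)  # synthetic data
x_test = np.array([0.0, 1.5])

K = rbf(x_train, x_train) + 0.05**2 * np.eye(25)  # Gram matrix + noise
K_s = rbf(x_test, x_train)                        # test/train covariances
mean = K_s @ np.linalg.solve(K, y_train)          # GP posterior mean
```

The same posterior algebra also yields a variance term, which is what makes GPs attractive for quantifying uncertainty in EIS-style regression.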
Kiss et al. [93] predicted substituent effects on the electrochemical properties of analytes and examined the influence of substituents on the character of the electronic transition and the transition density matrices (TDMs). This procedure gives access to the distribution of electrons and holes in the excited state and determines their delocalization, making it possible to reveal electronic excitation processes such as charge transfer [94]. Imbalance in the TDMs is caused by electron-donating and electron-withdrawing groups interacting with the hole. The location of the hole is altered when an electron-donating moiety uses mesomeric effects to donate electron density to the hole; in this instance, the TDM can be described as mesomerically affected rather than just inductively affected. In contrast, an inductively dominated TDM lacks any localization owing to the absence of major TDM elements on the analyte. The polarity difference has a significant impact on the mesomeric contribution to the TDM. This made it easier to spot the impacts of charge transfer and substitution.
The next field of research addressed exciton binding energies, which reflect the Coulomb attraction between the exciton quasiparticles (electron and hole). The binding energy is a measure of the exciton's separability into free charges, and it has a direct impact on how effectively current is produced in optoelectronics [95]. More details of the electronic structure characteristics are revealed by analysing the HOMO and LUMO energies (EHOMO and ELUMO) [96]. In order to optimize the electrochemical characteristics of an analyte, Min et al. [97] developed and verified a machine learning (ML) approach for electrochemistry. Both output variables (such as initial capacity and cycle life) and a few input variables (synthesis parameters, ICP-MS data, and X-ray diffraction (XRD) results) were used to build several experimental datasets for the analyte [98]. When distributing these variables across the entire dataset while building the ML model, a number of primary variables were chosen to serve as suggestions for the optimal experimental parameters.

Quantitative structure retention relationship (QSRR)
QSRR is a computational approach for linking chemical structural variables to chromatographic retention behaviour. Here, Y-variables are employed as dependent variables for predictive or explanatory purposes, whereas X-variables are utilized as independent variables. In QSRR, the Y-variables are thus connected to solute chromatographic retention, whereas the X-variables encode solute molecular structure. QSRR was first used to characterize columns by quantitatively comparing their separation qualities or to supply knowledge for predicting retention mechanisms in various chromatographic settings [22]. A typical QSRR study includes building a retention database of compounds with known chemical structures, computing molecular descriptors for each structure, choosing descriptors, creating a QSRR model, and validation. Figure 5 illustrates the QSRR methodology and workflow.
The most popular methods for expressing chemical structures are 1D, 2D, and 3D molecular descriptors. 2D descriptors are computed from the chemical structure of the solutes of interest, represented as a connection table or a molecular graph, whereas 1D descriptors provide simple chemical information about a solute, such as molecular weight or the number of oxygen atoms in the structure. A molecular descriptor that describes both the general surfaces and/or volumes of molecules and the 3D arrangement of structural attributes is known as a 3D molecular descriptor [23].
Depicting the molecular structure is one of the key concerns in QSRR modelling. Molecular descriptors that describe chemical structures are typically categorized as physicochemical, quantum chemical, topological, and other descriptors [99]. Physicochemical descriptors correlate positively with solute retention on chromatographic columns. Quantum chemical descriptors shed light on the process of chromatographic retention at the molecular level, although their link to solute retention is frequently poor and the calculation is laborious. With today's computational technologies, topological descriptors are easily constructed, but they are largely unrelated to retention phenomena [24]. There are two QSRR methods, viz., the direct mapping method and the direct comparison method.

Prediction of retention time by the direct mapping method
The direct mapping method is a simple way of predicting a compound's retention time on a chromatographic column. It is a web-based solution that allows users to predict retention by submitting their data and receiving expected retention values. A retention prediction database is available, and the experiment involves the following steps.
The user creates a CSV file that includes the compound name, the experimental retention time, and compound identifiers such as PubChem CIDs or InChIs, along with stereochemical parameters. The user must be able to upload retention data and get new retention predictions easily through a web interface. On the website, the user is first asked to create a new chromatographic system. Each system is described by: (1) a name, (2) a column type (for example, RP or HILIC), (3) a column description (for example, a Waters Symmetry C18 column), (4) an eluent system (for instance, 95:5 methanol/water), (5) the eluent's pH (for example, acidic or alkaline), and (6) eluent additives (for example, 0.1 per cent trifluoroacetic acid). In the next phase, the user submits a CSV file containing retention times for chemicals derived from their own studies or from Google Scholar. Finally, the user may obtain the estimated retention time by clicking "get a prediction" [58].

Prediction of retention time by direct comparison method
QSRR Automator, a Python-based software package, can be used to predict retention by the direct comparison method. Molecular descriptors can be determined with Mordred, a software package that uses the rdkit package, and machine learning operations may be performed with the scikit-learn package. The QSRR Automator workflow is as follows. The user creates the training data, which contain the name of each chemical, the structure in the form of a simplified molecular input line entry system (SMILES) text string, and the retention time. The programme creates a template and simplifies the input file on its own. After that, the user may submit the training data (chemical descriptions, SMILES, compound name, and actual retention time) and select the structural and electronic descriptors to be utilized. Structural properties include functional groups, hybridizations, the number of carbon atoms, and the ring system; electronic properties include aromaticity and numerous electronegativity calculations. All of these calculations are simple; unlike more complex fingerprint feature combinations, they can all be done using the Mordred package, which calculates over 1500 features [100]. Recent data on QSRR-based methods are listed in Table 1.
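The direct comparison idea can be reduced to a minimal sketch: a matrix of molecular descriptors is fitted against known retention times, and the fitted model predicts retention for a new compound. The descriptor and retention values below are synthetic placeholders; a real workflow would compute the descriptors with Mordred and fit the model with scikit-learn, as described above.

```python
import numpy as np

# Columns: carbon count, ring count, a polarity index (all synthetic).
X_train = np.array([
    [6, 1, 0.2],    # hypothetical compound A
    [8, 1, 0.1],    # hypothetical compound B
    [4, 0, 0.6],    # hypothetical compound C
    [10, 2, 0.05],  # hypothetical compound D
], dtype=float)
t_r = np.array([5.1, 7.0, 2.4, 9.6])  # synthetic retention times (min)

# Least-squares linear model with an intercept column
A = np.hstack([X_train, np.ones((4, 1))])
coef, *_ = np.linalg.lstsq(A, t_r, rcond=None)

def predict_rt(descriptors):
    """Predict retention time for a new descriptor vector."""
    return np.append(np.asarray(descriptors, float), 1.0) @ coef

rt_new = predict_rt([7, 1, 0.15])  # hypothetical query compound
```

The validation step of a real QSRR study would test such a model on held-out compounds rather than on its own training data.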

Chemometrics in chromatography
The chemometric approach is widely used in separation science to predict peak asymmetry, peak overlapping, and peak optimization. Co-elution of multiple analytes in chromatography significantly complicates quantification of the target analyte owing to interference caused by incorrect method optimization. At this juncture, chemometric methods such as principal component analysis (PCA) are widely used in separation science and have now been extended to LC-HRMS analysis for proteomics and metabolomics. In addition, artificial neural networks (ANN), factorial design (FD), partial least squares (PLS), and cluster analysis (CA) are also in place [113,114].

Chemometrics in one-and two-dimensional chromatography
In two-dimensional (2D) chromatography, the entire first-dimension (1D) effluent is divided into many fractions, each of which is subjected to a 2D separation; two-dimensional liquid chromatography (LC × LC) is created by combining the results of 1D separations. The positions of the spots provide qualitative information, while their intensities provide quantitative information. However, extracting information from extremely complex samples like protein digests, metabolic extracts, and oil mixes can be problematic, and even with modern high-resolution chromatography, recovering the full information content of a complex matrix remains a challenging task. Many researchers are therefore constantly working to improve the efficiency of chemometric data-processing strategies. In chromatography, chemometrics is a valuable tool for pre- and post-data analysis to resolve undesired background signals, baseline drift, unresolved peaks, and shifting retention times. Chemometric-based data interpretation, information extraction, and data pre-processing can significantly increase the analytical performance of an existing technique. The chemometric approaches used in chromatography include penalized partial least squares (PPLS), multivariate curve resolution and orthogonal subspace projection for background correction, the local minimum value approach, baseline estimation and denoising using sparsity, retention-time-alignment strategies, peak clustering, and principal component analysis (PCA). These methods make chemometrics the most rapidly progressing in silico approach in 1D and 2D chromatography and spectroscopy [115].

Chemometrics in unsupervised and supervised techniques
For understanding the dissimilarity or variance in a data matrix, PCA, independent component analysis (ICA), and cluster analysis (CA) are used. The resulting "calibration sets" may be defined as loading vectors and used to project unknown data. If the data do not cluster against any objective criterion, then supervised procedures such as multivariate calibration methods are applied. A regression model may be built using a number of PCA variables; this approach is referred to as principal component regression (PCR), and its analysis of the data matrix is based mainly on variance. The partial least squares (PLS) method, also known as projection to latent structures, is the most commonly used linear supervised method. It finds the direction through the data matrix that maximizes the covariance between the matrix and the predicted variable and then creates a regression model [116].
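The PCR procedure described above can be sketched with NumPy: project the centred data onto the leading principal components, then regress the property of interest on the resulting scores. The simulated spectra and the two-factor structure below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated spectra: 20 samples x 50 wavelengths driven by 2 latent factors.
scores_true = rng.normal(size=(20, 2))
loadings_true = rng.normal(size=(2, 50))
X = scores_true @ loadings_true + 0.01 * rng.normal(size=(20, 50))
y = scores_true @ np.array([1.5, -0.7])   # property depends on the factors

# Principal component regression: project onto the top-k PCA loadings,
# then regress y on the resulting scores.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
T = Xc @ Vt[:k].T                          # scores on first k components
b, *_ = np.linalg.lstsq(T, y - y.mean(), rcond=None)
y_hat = T @ b + y.mean()
```

PLS differs from this sketch only in how the components are chosen: PCR picks directions of maximal variance in X, while PLS picks directions of maximal covariance between X and y.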

Software tools in chemometrics and their workflow
Chemometric software (for example, BWIQ) is available for on- and off-line quantitative and qualitative spectral measurements to identify principal components. The software assigns a sample to the group with the shortest calculated Mahalanobis distance (a measure of the distance between a point P and a distribution D). The workflow is described in the following section.
Once the software is started, click "file", open the data, and import it into the software; the complete spectrum is then presented on the screen. Spectral files in BWIQ can be designated in several ways, including calibration, validation, and ignored files; the dropdown button in the "usage" column is used to designate a spectrum manually. The algorithm parameters are chosen in the algorithm properties tab. In the property panel, one can select the sampling method and adjust the calibration-to-validation file ratio, for example, 60:40 (calibration:validation). In the pre-processing step, any variation in the data sets that is unrelated to chemical differences, and instead arises from scattering, instrumental fluctuations, spectral noise, or background differences, is then removed. Because the model can analyse the full spectrum, it is sensitive to contaminants or changes in the samples that add signals in other spectral regions; excluding non-informative or noisy regions from the analysis is therefore an advantage. A chemometric method such as PCA-Mahalanobis distance (MD) can then be applied. In principal component space, the scores plot illustrates the sample clusters, and the result shows clusters matching the different classes of principal components. Additional graphs, such as loading and variance plots, are also available [117].
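The shortest-Mahalanobis-distance classification rule can be sketched as follows; the two-dimensional "score" clusters and the unknown sample are synthetic stand-ins for real PCA scores.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two calibration classes in PCA-score space (hypothetical 2-D scores).
class_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
class_b = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))

def mahalanobis(x, group):
    """Distance of point x from the distribution of `group`."""
    mu = group.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(group, rowvar=False))
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

# An unknown sample is assigned to the class with the shortest distance.
x_unknown = np.array([2.8, 3.2])
label = "A" if mahalanobis(x_unknown, class_a) < mahalanobis(x_unknown, class_b) else "B"
```

Unlike Euclidean distance, the Mahalanobis distance accounts for the spread and correlation of each class, which is why it is the natural metric in principal-component score space.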

Different types of chemometric approaches

Penalized partial least squares approach (PPLS)
This method was initially developed by Whittaker in 1922 to address signal smoothing issues [118]. The goal of penalized least squares is to approximate the observed data while balancing fidelity to the original data against the roughness of the fitted curve [119]. Fidelity and roughness are combined in a balanced way in Eq. (1):

Q = F + λR = Σ_i (v_i − z_i)² + λ Σ_i (z_i − z_{i−1})²   (1)

where z is the fitting vector and v is a vector representing the analyte spectrum, both of length n. The fitted z should balance the roughness of the fitted vector against fidelity to v. F, the fidelity to the analyte spectrum v, is the sum of squared differences between the elements of v and z; R, the roughness of the fitting vector z, is the sum of squared differences between each element of z and its neighbour. A user-adjustable parameter λ balances fidelity against roughness; a greater λ favours a smoother fitted vector.
To use penalized least squares to estimate the background, a weight vector w is added to the fidelity term; its element w_i may be thought of as a weight representing the reliability of point i as part of the background. To solve the minimization of Eq. (1), the partial derivative of Q is set to zero, ∂Q/∂z = 0, and the resulting linear system, in matrix form, determines the fit (Eq. (2)):

(W + λDᵀD) z = W v   (2)

where W is the diagonal matrix of weights and D is the difference operator. To use this approach for baseline correction, as done by Zhang et al. and Cobas, one must first identify the locations of the chromatogram's peaks. Once these peak points are known, a binary mask or weight matrix can be generated to indicate whether a data point in the chromatogram belongs to the background or to a peak [120,121].
Additionally, Eilers et al. [122] created asymmetric least squares (asLS), which introduces an asymmetry parameter to address this problem: the weights assigned to positive and negative deviations from the baseline can now be small and large, respectively. Remaining baseline issues motivated the introduction of adaptive iteratively reweighted penalized least squares (airPLS) [123], which allows some baseline regions to be weighted more heavily than others. airPLS develops its weight vector by iteratively solving a weighted penalized least squares problem.
Once the difference d_t between the signal and the fitted vector is less than one-thousandth of the original signal, an accurate weight vector is assumed to have been established, and the penalized least squares approach satisfies this termination criterion.
In some situations, both approaches overestimate the baseline when a matrix is present. Baek et al. created the asymmetrically reweighted penalized least squares (arPLS) method as a solution [124]; MairPLS is another technique built on similar concepts. Long Chen et al. [125] reported that collaborative PLS gave better results than the prior techniques for Raman spectral background correction.
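A minimal dense-matrix sketch of the asLS-style baseline correction discussed above, assuming a synthetic chromatogram (linear drift plus one Gaussian peak); λ, p, and the iteration count are illustrative choices, and a production implementation would use sparse matrices.

```python
import numpy as np

def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline (Eilers-style): minimise
    sum w_i (y_i - z_i)^2 + lam * sum (second difference of z)^2,
    with small weights on points above the baseline (peaks) and
    large weights below it."""
    n = len(y)
    D = np.diff(np.eye(n), 2, axis=0)         # second-difference operator
    w = np.ones(n)
    z = y.copy()
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(w) + lam * D.T @ D, w * y)
        w = np.where(y > z, p, 1 - p)         # asymmetric reweighting
    return z

# Synthetic chromatogram: drifting baseline plus one Gaussian peak.
x = np.linspace(0, 10, 200)
baseline = 0.5 + 0.1 * x
signal = baseline + np.exp(-((x - 5.0) ** 2) / 0.1)
corrected = signal - asls_baseline(signal)
```

The second-difference penalty leaves linear trends unpenalized, which is why the drifting baseline here is absorbed into z while the peak survives in the corrected trace.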

Multivariate curve resolution-alternating least squares (MCR-ALS)
With MCR-ALS, estimates of the chemically significant profiles of the relevant chemical species can be obtained from mixed experimental data using a bilinear decomposition [126]. Strategies for determining the optimal number of components in an MCR-ALS model commonly require building several models and assessing their quality of fit and the interpretability of the resolved chemical information [127]. The data set includes complex, heterogeneous samples of unknown composition: (1) spatially resolved chemical images and (2) the associated resolved spectra of the individual, pure chemical components. MCR-ALS specifically decomposes an experimental data matrix DM [128] according to Eq. (5):

DM = C Sᵀ + E   (5)

where Sᵀ is the resolved spectrum matrix, E is the residual error matrix, and C is the concentration profile matrix. Three-dimensional experimental data produced by spectroscopic techniques contain spectral (λ or ν) and spatial (x and y) information; before MCR-ALS, the 2D experimental data matrix DM, which combines the spatial (x and y together) and spectral (λ or ν) information, is generated from the three-dimensional data. This approach is applied for baseline correction and quantitative purposes, and also to correct the local minima of the least squares errors obtained by other methods such as singular value decomposition (SVD) or PCA [129].

Principal component analysis (PCA)
Principal component analysis is a popular unsupervised learning technique for reducing the dimensionality of data; it was invented in 1901 by Pearson [130]. In chromatography, PCA is frequently used to examine the outcomes of complicated samples [131], with uncorrelated variables fitted linearly across the data set. The first component represents the largest variation in the data, the second component describes the next-largest variance, and so on. This chemometrics tool can be particularly helpful for interpreting highly dimensional data.
The PCA method may be used for interference factor removal, interference factor extraction, and data compression. PCA can be carried out with the singular value decomposition (SVD) method to obtain orthogonal principal components (PCs) [132], as in Eq. (6):

D = U Σ Vᵀ   (6)

where the three matrices U, Σ, and Vᵀ denote scores, singular values, and loadings, with sizes of m × m, m × n, and n × n, respectively, and D stands for the raw data matrix of size m × n to be decomposed [133].
In chromatography, Soares et al. [134] applied PCA in combination with correlation optimized warping (COW); an interesting use is the comparison of columns. Before performing PCA, the chromatograms are first aligned with the COW technique to improve the probability (p-) values. It is possible to determine whether there are significant differences between chromatograms by computing the Mahalanobis distances and converting them to p-values. Although this method decreases noise and raises the signal-to-noise ratio (S/N), there is a possibility that several components become convoluted and chemical information is lost; according to one report, the ideal bin size depends on the sample [135]. This method can be used to classify samples in complicated or multidimensional data sets.
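PCA by SVD, as in Eq. (6), can be sketched directly in NumPy; the rank-two data matrix below is a synthetic stand-in for a set of chromatographic runs.

```python
import numpy as np

rng = np.random.default_rng(2)

# Chromatographic data matrix D: 30 runs x 100 time points, driven by
# two underlying components plus a little noise.
C_true = rng.random((30, 2))
S_true = rng.random((2, 100))
D = C_true @ S_true + 0.01 * rng.normal(size=(30, 100))

# PCA via singular value decomposition of the centred data: Dc = U diag(s) Vt.
Dc = D - D.mean(axis=0)
U, s, Vt = np.linalg.svd(Dc, full_matrices=False)
explained = s**2 / np.sum(s**2)   # fraction of variance per component
```

Because the simulated data have only two underlying components, virtually all the variance lands in the first two PCs; real chromatographic data would show a slower decay of `explained`.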

Parallel factor analysis (PARAFAC)
PARAFAC reduces the dimensionality of the data collection through a factor analysis similar to PCA. PARAFAC presents the data as trilinear, with three modes, namely spectra, chromatograms, and concentrations [136], whereas PCA is essentially a dimension reduction approach. As a result, PARAFAC discovers not only a subspace but also the orientations of the vectors [137]. PARAFAC2, which was employed by Khakimov et al. [138], can additionally handle slight changes in retention time. A three-way array X of dimensions I, J, and K can be described by the PARAFAC decomposition.
The PARAFAC decomposition is given by Eq. (7):

x_ijk = Σ_{f=1}^{F} a_if b_jf c_kf + e_ijk   (7)

where F stands for the number of factors, while x_ijk, a_if, b_jf, c_kf, and e_ijk are elements of X, A, B, C, and E, respectively. The loading matrices A, B, and C have dimensions of I × F, J × F, and K × F, respectively, and E denotes the residual three-way array of dimensions I × J × K [139].
The uniqueness of the PARAFAC model is that it establishes not only the subspace but also the location of the axes defining it. Additionally, the PARAFAC model offers the second-order advantage of allowing chemical components to be analysed even in the presence of unidentified interferences [140]. Tatjana et al. and Na Peng et al. [141] both applied PARAFAC to fluorescence analysis and found that the fluorescence model had the capacity to quantify and analyse fluorophores in analytes and to classify the various types of fluorophores. Another study recommended combining PARAFAC with fluorescence regional integration to better characterize analytes and understand their functionality [142].
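The trilinear model of Eq. (7) can be verified numerically: build random loading matrices A, B, and C and form the three-way array element-wise. The dimensions and factor count below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Trilinear PARAFAC model with F = 2 factors: loading matrices for the
# sample (A), chromatographic (B), and spectral (C) modes.
I, J, K, F = 5, 40, 30, 2
A = rng.random((I, F))
B = rng.random((J, F))
C = rng.random((K, F))

# x_ijk = sum over f of a_if * b_jf * c_kf (noise-free three-way array).
X = np.einsum('if,jf,kf->ijk', A, B, C)
```

Fitting PARAFAC is the inverse problem: given a measured X, recover A, B, and C by alternating least squares; the uniqueness property means the recovered loadings are (up to scaling and permutation) the true ones.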

Partial least squares (PLS)-based methods
PLS-DA, also known as discriminant partial least squares (D-PLS), is a classification method based on partial least squares analysis. The technique was first developed by Barker and Rayens [143]. Dimension reduction and the construction of a predictive model are the two major components of PLS-DA modelling. It gives a linear delimiter using partial least squares (PLS) regression, with the response variables being binary class membership indices (e.g. 0 and 1) for each class. The PLS-2 algorithm, which enables the prediction of a matrix of response variables in multiple components, is used when more than two classes are involved.
In the ordinary variant of PLS-DA, the components must be orthogonal to one another; the model can be formulated through the non-singular eigenvectors of the covariance matrix C [144].
In Eq. (8), y is the class label vector, C_n is the n × n centring matrix, and X is the loading matrix. The loading vectors a_1, …, a_d, which denote the relevance of each feature in that component, are computed iteratively. The objective for iteration h is formulated with X_1 = X, where y_h and X_h are the residual (error) matrices after transformation with the prior h − 1 components, and b_h is the loading for the label vector y_h.
PLS-DA has been used mostly in biomarker and drug discovery research, for example in an LC-MS/MS and NMR study of advanced-stage melanoma in blood [145]. Using LC-MS data, Lambrecht et al. [146] employed PLS-DA to classify black rice according to its place of origin. PLS was used by Eleni et al. [147] to predict the diffusion of substances in artificial membranes.
Additionally, orthogonal partial least squares discriminant analysis (OPLS-DA) is designed to distinguish between the discriminating and non-discriminating dimensions [148]. Using a set of metabolites identified by LC-MS/MS, Zhang et al. [149] applied OPLS-DA to confirm the authenticity of fruit juices. Shurui et al. [150] used a similar strategy, applying OPLS-DA to an HRMS study for non-target metabolomics.

Support vector machines (SVM)
Support vector machines (SVM) are a set of pattern-recognition techniques developed to handle nonlinear data distributions effectively; they are among the machine learning methods of chemometrics. The fundamental component of SVM is the projection of data points into a space with added dimensions, which serves as a means of identifying linear functions capable of modelling the data [151]. Such modelling functions can be projected back into the space of the original predictors, producing functions that are higher in complexity (often nonlinear) but lower in dimension. The use of SVM in discriminant classification is conventional; nevertheless, several authors have offered adjustments relevant to class modelling. One of the most popular strategies is the support vector domain description (SVDD) method of Songfeng Zheng [152], which uses hyperspheres to describe the class spaces. Numerous researchers have used this strategy in a variety of analytical studies, including laser-induced breakdown spectroscopy [153], ATR-FT-IR spectroscopy [154], tandem mass spectrometry (MS/MS) [155], and HPLC [156].
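The core SVM idea, projecting data into a higher-dimensional space where a linear rule works, can be shown with a deliberately simple hypothetical feature map. This is the kernel idea in miniature, not a full SVM solver.

```python
import numpy as np

# Points in one dimension: class 1 sits inside [-1, 1], class 0 outside.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([0, 0, 1, 1, 1, 0, 0])

# No single threshold on x separates the classes. Mapping each point into
# a higher-dimensional space, phi(x) = (x, x^2), makes them linearly
# separable there: the rule x^2 < 2 is a hyperplane in the new space.
phi = np.column_stack([x, x**2])
pred = (phi[:, 1] < 2.0).astype(int)
```

A real SVM never builds `phi` explicitly for high-dimensional maps; the kernel trick evaluates inner products in the mapped space directly, which is what makes nonlinear kernels such as the RBF practical.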

Artificial neural networks (ANNs)
ANNs are multilayer networks of linked mathematical operators (neurons). The feed-forward neural network is the most common ANN: each neuron computes a weighted sum of the input data, or of the outputs of the preceding layer, modified by an activation function (typically a linear or logistic function). The algorithm learns from a data set to predict event outcomes [157].
In the last decade, artificial neural networks (ANNs) have been developed to determine the retention index or retention time for 1D-GC, 1D-LC, 2D-LC, and 2D-GC separations [158,159]. ANNs are computer programs that "learn" to carry out tasks by considering multiple examples: given enough input, an ANN can detect traits and patterns in data, which are then used to make predictions in novel conditions. ANNs have been employed in a variety of analytical research studies, such as LC-MS/MS determination [160], GC-MS [161], and HPLC [162]. The chemometric methods used in analytical techniques are listed in Table 2.
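A feed-forward pass of the kind described, weighted sums followed by a logistic activation, can be sketched in a few lines; the network size, the random weights, and the descriptor-style input are illustrative assumptions (a trained ANN would have fitted weights).

```python
import numpy as np

def forward(x, weights, biases):
    """Feed-forward pass: each layer is a weighted sum of the previous
    layer's outputs passed through a logistic activation."""
    a = x
    for W, b in zip(weights, biases):
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))   # logistic activation
    return a

# Hypothetical 3-feature input (e.g. molecular descriptors) mapped through
# one hidden layer of 4 neurons to a single output, such as a retention
# time scaled to (0, 1). Weights are random here for illustration only.
rng = np.random.default_rng(4)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(4, 1))]
biases = [np.zeros(4), np.zeros(1)]
out = forward(np.array([0.5, 1.0, -0.3]), weights, biases)
```

Training consists of adjusting `weights` and `biases` by backpropagation so that `out` matches the measured retention values for the training compounds.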

Table 2 List of chemometric methods used in analytical techniques
LC-NMR Liquid chromatography nuclear magnetic resonance, LC-MS Liquid chromatography-mass spectrometry, GC-MS Gas chromatography-mass spectrometry, FT-IR Fourier transform infrared, HPLC High-performance liquid chromatography, UPLC Ultra-performance liquid chromatography

Analytical quality by design (AQbD)
Analytical quality by design (AQbD) is an approach for developing robust analytical methods and is appropriate for regulatory flexibility in pharmaceutical submissions to the FDA. AQbD is widely used in the development of various analytical methods such as UV-visible, FT-IR, Raman, NIR, fluorimetric, HPLC, UHPLC, LC-MS, GC-MS, HPTLC, and SFC. In the pharmaceutical industry, the AQbD tool is integrated with PAT as a real-time process analyser to monitor any given process or material, which generates massive and complex data sets. There is growing interest in implementing AQbD in new analytical method development procedures for wider applications, including assays, stability studies, and bioanalytical studies. Compared with the one-factor-at-a-time (OFAT) approach, AQbD-based analytical methods have demonstrated a high degree of robustness and method performance. Notably, using these techniques reduces the likelihood of human error. The AQbD approach does not predict any particular chromatogram; instead, it builds scientific understanding through a sequence of method implementation steps, beginning with quality predictions that relate to risk assessment in method choice, then relating method parameters to expected method results, and finally defining a region for a highly robust and cost-effective method [186]. The design of experiments (DoE) is part of the AQbD methodology and represents the interactions among the input factors that ultimately affect the technique's response and outcomes. A typical AQbD methodology therefore starts with an analytical target profile (ATP) and risk and criticality evaluation, then uses DoE to optimize the method variables, creates a method operable design region (MODR), and implements a control plan [187-189]. The relevant works are compiled in Table 3, and the scheme of the methodology is illustrated in Fig. 6.

Assessment of the predictive ability of NMR prediction by Chemaxon
An attempt was made to verify the predicted chemical shift values for the chosen test compounds shown in Fig. 7. The predicted chemical shift values of ten chemically divergent structural compounds were compared with the original experimental chemical shift values. A per cent error (%) for each chemical shift value was obtained, and regression analysis was performed.
The per cent error ranged from −26.52 to 35.98%. The correlation graphs in Figs. 8 and 9 show R² values of 0.959 (1H-NMR) and 0.974 (13C-NMR), indicating the accuracy of the NMR signal prediction. According to the prediction results, in 1H-NMR the aliphatic proton error ranged from −26.52 to 35.98%, whereas the aromatic proton error ranged from −25.47 to 9.21%. In 13C-NMR, the aliphatic carbon error ranged from −14.41 to 27.54%, whereas the aromatic carbon error ranged from −14.95 to 6.49%. We therefore conclude that the aliphatic error was greater than the aromatic error; these data are presented in Table 4.

Fig. 6 A typical AQbD approach in analytical method development

Assessment of the predictive ability of ORCA for UV-visible prediction
Here, the originally obtained wavelength maximum (λmax) values were compared with the predicted wavelength values of fifteen structurally divergent compounds.
A per cent error (%) for each wavelength value was obtained, and regression analysis was performed. The error rate was found to be between −2.27 and 18.69%. The correlation graph in Fig. 10 shows the R² value. With methanol as the solvent, the error ranges from 0.0 to 18.69%, whereas with water it ranges from −2.27 to 11.73%. We therefore conclude that more error is observed when methanol is used as the prediction solvent than water. The resulting data are presented in Table 5.

Fig. 9 Regression for 13C-NMR predicted versus experimental signals

For Raman and infrared
The predicted Raman shift and infrared absorption frequencies for the selected test substances were verified against experimental values. The predicted frequency values of ten chemically divergent structural compounds were compared with the original experimental frequency values, and the % error for each frequency value and a regression analysis were calculated. Furthermore, an attempt was made to verify the predicted infrared absorption frequencies of lamivudine and zidovudine across all functional group frequencies: the predicted frequency values of the structural functional groups of lamivudine and zidovudine were compared with the original experimental frequency values, and the % error and regression analysis were calculated. The % error was observed between −24.26 and 18.89%, and the R² value for the correlation graph in Fig. 13 was 0.970 for the combined lamivudine and zidovudine absorption frequencies. This also demonstrates the reliability of frequency prediction when a single compound is predicted with all of its functional groups; the results are presented in Table 7.

Assessment of the predictive ability of QSRR Automator with reference data
QSRR retention predictions were conducted for antiviral drugs. Reference information was gathered from various research publications, and antiviral drugs eluted on a C18 column were selected. The predicted retention times for the test set of drugs were compared with the published retention time data, and the % error for each retention time and the regression coefficient were calculated. The % error was observed in the range of −20 to 20%, which can be seen clearly in the histogram plots in Figs. 14 and 15, and the R² value for the correlation was 0.947, indicating the reliability of the retention time prediction. Topological and 2D descriptors like MW, AATS, MATS, GATS, Axp, n6aHRing, NsNH2, and SLogP are the most frequently contributing descriptors, and they depend on the analyte's chemical structure.
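The per cent error and regression coefficient used throughout this assessment can be computed as below. The retention times are hypothetical, and R² is taken here as the coefficient of determination; other definitions based on the correlation coefficient give similar values for well-fitting data.

```python
import numpy as np

def percent_error(predicted, experimental):
    """Signed per cent error of each prediction against the reference."""
    predicted = np.asarray(predicted, dtype=float)
    experimental = np.asarray(experimental, dtype=float)
    return 100.0 * (predicted - experimental) / experimental

def r_squared(predicted, experimental):
    """Coefficient of determination of predicted vs experimental values."""
    experimental = np.asarray(experimental, dtype=float)
    residual = np.asarray(predicted, dtype=float) - experimental
    ss_res = np.sum(residual**2)
    ss_tot = np.sum((experimental - experimental.mean())**2)
    return 1.0 - ss_res / ss_tot

# Hypothetical retention times (min) for a small test set.
t_published = [2.1, 3.4, 5.0, 6.2, 8.8]
t_predicted = [2.0, 3.6, 4.8, 6.5, 8.5]
errors = percent_error(t_predicted, t_published)
r2 = r_squared(t_predicted, t_published)
```

The same two quantities underlie the histogram plots and correlation graphs reported for the NMR, UV-visible, IR, and Raman assessments.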

Fig. 12 Regression plot for infrared absorption of predicted versus experimental frequency

Short-time Fourier transform (STFT) method for assessment
The short-time Fourier transform (STFT) study was sufficient to confirm that all the assessment results for 13C-NMR, 1H-NMR, UV-visible, IR, and Raman are accurate, and it has also been used for further evaluation. In the power density spectrogram, yellow or red denotes high power and blue denotes low power. The comparison concentrates on the regions of greatest frequency power in the predicted and experimental data sets: if both spectrograms show the same frequency power, the prediction is close to acceptable, whereas different frequency powers indicate that the prediction was inaccurate.
The 1H-NMR power frequencies of the predicted and experimental results are nearly identical, with highest power indices of 18.25 and 17.39, respectively (Fig. 16), indicating that both outcomes are accurate. For 13C-NMR, the power frequencies of the predicted and experimental data are depicted in Fig. 17; they exhibit nearly identical power index values of 44.17 and 44.01, respectively. This demonstrated the validity of the data, and the spectrogram revealed only very slight frequency differences in the lower power range, visible in the blue peaks. For UV-visible (Fig. 18), the highest frequency indexes are 44.89 and 44.82 for the predicted and experimental results, respectively, at the same frequency index. Finally, the Raman and infrared power frequency indices in Figs. 19 and 20 show that the predicted and experimental data share the same frequency index, and Fig. 21 shows the STFT spectrogram 3D plot for the IR prediction results. Together, these results confirm that the prediction was accurate.
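The STFT comparison itself can be sketched with NumPy: window the signal, take the FFT of each frame, and compare where the power concentrates. The two synthetic "predicted" and "experimental" signals below are illustrative; matching dominant-frequency bins correspond to matching high-power regions in the spectrograms.

```python
import numpy as np

def stft(signal, win_len=64, hop=16):
    """Short-time Fourier transform: Hann-windowed frames -> FFT power."""
    window = np.hanning(win_len)
    frames = [signal[i:i + win_len] * window
              for i in range(0, len(signal) - win_len + 1, hop)]
    # Power spectrogram: one row per frequency bin, one column per frame.
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2

# Compare predicted and experimental traces as 1-D signals: similar
# signals should concentrate power at the same frequency indices.
t = np.linspace(0, 1, 512, endpoint=False)
experimental = np.sin(2 * np.pi * 40 * t)
predicted = np.sin(2 * np.pi * 40 * t + 0.05)    # small phase error only

S_exp = stft(experimental)
S_pred = stft(predicted)
peak_exp = int(np.argmax(S_exp.sum(axis=1)))     # dominant frequency bin
peak_pred = int(np.argmax(S_pred.sum(axis=1)))
```

Because the phase error does not affect the magnitude spectrum, both spectrograms place their power in the same bin, which is exactly the agreement criterion described above.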

Discussion
Our assessment afforded acceptable results, with a few software-related constraints, particularly the 5-20 h required for TD-DFT calculations. Prediction error increases when the data set is small, and the prediction tool must then be tailored accordingly. A large data set is therefore necessary for successful findings; however, in some cases a large data set can also result in inaccurate prediction. For example, a complicated structure with multiple classes of variables takes longer to process, and the impact on the prediction process can ultimately lead to wrong results, which is disappointing for a research study. Careful planning of the data set and systematic prediction are therefore required to produce reliable research findings. While collecting the reference data set, we also encountered cases where some data were not present in the reference library; in that situation, leaving out the compound and switching to another approach may be an option. Likewise, if two distinct spectral results are found for the same chemical in certain reference data, data optimization needs to be performed.
There are several online reference data sources available for mass spectrometry, but fewer for infrared and Raman spectra. All these problems and challenges related to spectrogram prediction are listed in Table 9. In the QSRR approach, more than 50 compounds are needed to predict retention time. During operation, we noticed that whenever a data set was given, the tool predicted values close to those in the set. This may be a limitation of the QSRR Automator program, but more sophisticated software for retention time prediction is already available and can be used for alternative purposes. Chemometric theory is entirely mathematically based, so understanding AQbD and chemometrics is more demanding; a chemist unfamiliar with mathematics will find it harder to develop a prediction process. Each chemometric approach has a unique methodology, so experts are required for both planning and result evaluation. Additionally, we noted in the literature that there is less research on electrochemical spectroscopic prediction with chemometrics; electrochemistry prediction is generally employed in the technological field, and only for a few drugs in discovery and development. Since many variables can influence the results, such as instrument settings, calibration, process, and model selection, some AQbD method failures will certainly occur in the case of method replacements. Nevertheless, this strategy is most effective at minimizing failure rates in method transfer, OOS, and OOT. These points are presented in Table 10.
The differences between the physical and chemical data predictions are also illustrated in Table 11. By comparison, physical data prediction is simpler than chemical data prediction, because the latter requires a larger number of supporting techniques and programs, involves a larger number of descriptors, and is more challenging for beginners and students [193].

AQbD: enhanced method efficiency; fewer trials, resulting in lower method cost; better time utilization; improved levels of compliance; and knowledge of the extremes. Because the technique demonstrates a link between the method variables and performance, the analyst gains confidence in the method's effectiveness, and analytical techniques are re-evaluated regularly to resolve any gaps in method performance. To avoid failures in method transfer, OOS, and OOT, the AQbD methodology may be used; AQbD allows for regulatory flexibility but necessitates a high level of robustness.

Chemometrics: some good laboratory-based analysts are not mathematically minded and do not want to overburden their studies with mathematics; in teaching and learning, chemometrics still has problems integrating itself, since the fundamental body of information is overburdened and any new content must supplant previous subjects. Basic statistical knowledge, such as univariate calibration, precision, accuracy, and uncertainty, is necessary [275]. On the other hand, chemometric technique advancement is continual, rapid, and efficient; with the improvements in exploratory tools, chemometrics provides rich information about chemical systems [276], adaptability for the analysis of complex chemical process data in industry [277], quality control of herbal drugs, food analysis of vegetables [278-280], fruits [281], grains [282,283], proteins [284], etc., environmental chemistry studies [285], and assessment of the results, as well as the development of high-throughput chromatographic and spectroscopic data calculations.

ANN: the retention time of a new congener cannot be predicted, and no information about the relationship between molecular properties and retention behaviour can be obtained [286]; however, ANNs are accurate and reliable, work in the way the human brain does, and can optimize the separation without employing analyte properties.

SVM: may introduce bias in the results, and the class-modelling technique considers only sensitivity when deciding on the ideal conditions for process parameters; however, SVM models exhibit better overall performance, agree well with experimental results, and keep the distribution of non-target samples separated from the target samples.

MCR-ALS: may fail to develop an appropriate QSPR model and gives poorer prediction compared with the SVM model [287]; however, it measures the number of variables and improves selectivity through better separation of chemical information from interference effects and higher signal-to-noise ratios, which improve the visualization of chemical distributions.

Conclusions
In conclusion, the present review of computational approaches in spectrum prediction found acceptable accuracy with the least feasible variation. Overall, students and researchers are making considerable use of in silico tools in computational chemistry, which indicates the reliability of such tools in research. The development and application of computational approaches in analytical research and development are our key objectives. As we observed, computational prediction of analytical behaviour offers a wide range of applications in academic research, bioanalytical method development, computational chemistry, analytical method development, data analysis approaches, material characterization, and validation. Still, the prediction error of these tools needs to be minimized for better accuracy, and they will be explored much more in exploratory research in the future. Even for a combination of strongly overlapping spectra, the accuracy is in the region of 99-101%, and predictions are more reliable with simple mathematical calculation [295].

Table 11 Differences in physical and chemical data predictions

Physical data prediction | Chemical data prediction
Falls under the prediction of mass, capacity, density, response time, and size | Falls under the prediction of electronegativity, ionic potential, bonding energy, electrochemical characteristics, fragmentation, and structural attributes
The process is more straightforward | The process is a little complicated
Fewer descriptors are involved in the prediction process | A large number of descriptors are involved in the prediction process
1D, 2D, and topological descriptors are mostly utilized for the prediction [271] | All topological, electronic, 1D, 2D, and 3D descriptors are utilized for the prediction [272]
The database size is typically smaller | The database is larger and more complex
Canonical SMILES are included in the data set; mostly, a 2D chemical structure is sufficient for the prediction | Data sets used for prediction include canonical SMILES, ChEMBL entries, and 2D and 3D chemical structures
Prediction scores typically produce immediate results | Prediction outcomes are not direct; they are correlated with other outcomes to produce the final results
There are fewer accessible methods and approaches; most are QSPR-based | Numerous techniques and strategies are employed, such as QSAR and QSRR
Only mathematical calculations are involved, such as regression and correlations | Quantum chemical calculations are included, such as TD-DFT and DFT
In some cases, less accurate physical data predictions do not affect the final outcomes | Less accurate chemical data predictions will have an impact on the outcomes

Fig. 4 Predicted IR spectrum of foscarnet generated by Avogadro

Fig. 10 Regression plot for UV-visible of predicted versus experimental frequency

Fig. 11 Regression plot for Raman shift of predicted versus experimental frequency

Fig. 13 Regression plot for infrared absorption of predicted versus experimental frequency of lamivudine and zidovudine

Fig. 14 Regression plot for retention time: predicted versus published
Fig. 15

Fig. 19 STFT spectrogram plot of Raman: predicted versus published results

Table 1
Data on the analytical methods reported based on QSRR. F: variance ratio; R: correlation coefficient; R²: squared correlation coefficient; RMSE: root-mean-square error

Table 3
Data on the analytical methods reported based on AQbD

Table 4
Comparative data for ¹H-NMR and ¹³C-NMR signals: predicted versus experimental. MHz: megahertz; A.no: atom number; M.t: multiplet information; Std: standard; Pred: predicted; Dev: deviation; % Error: percentage error
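The "% Error" columns in the comparative tables can be reproduced with a simple relative-error calculation against the experimental value. The chemical shifts below are hypothetical placeholders, not values from the review's data sets.

```python
# Percentage error of a predicted value relative to the experimental one,
# as used in the predicted-versus-experimental comparison tables.

def percent_error(predicted, experimental):
    return 100.0 * (predicted - experimental) / experimental

# Hypothetical 1H-NMR chemical shifts (ppm): experimental vs predicted
experimental = [7.26, 3.85, 1.22]
predicted    = [7.31, 3.80, 1.25]

errors = [round(percent_error(p, e), 2)
          for p, e in zip(predicted, experimental)]
# e.g. errors -> [0.69, -1.3, 2.46]
```

A signed error (rather than an absolute one) preserves whether the tool over- or under-predicts the signal, which is useful when assessing systematic bias in a prediction method.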

Table 5
Comparative data of UV-visible spectral data: predicted versus experimental

Table 8 presents the results of QSRR-predicted and experimental retention times of antiviral-class drugs.

Table 6
Comparative data on Raman shift and infrared absorption frequency: predicted versus experimental values

Table 7
Comparative infrared data of single compound with functional group frequency: predicted vs experimental values

Table 8
Comparative data of predicted and experimental retention in HPLC using the QSRR approach

Table 9
Limitations and benefits of spectrogram behaviour predictions

Table 10
Limitations and benefits of chromatography behaviour predictions

1 Retention time prediction
Limitations: To accommodate the changes in more complicated compounds, the training set must be bigger; with an equal-sized training set, lipids give better predictions than metabolites; it is insufficient to separate very closely co-eluting compounds; it is difficult to predict retention time when several columns and conditions are used.
Benefits: The predictor places the majority of predictions within one to two minutes of their real value (one minute equals about 5-8% of the retention time); approximately all predictions fall within 2-5 min, about 10-25 per cent depending on the column; it will quickly and easily produce and save a large number of models; in-house data tests yielded comparable results; it can estimate retention times for numerous columns and conditions; it is adequate to enhance confidence in exact identifications, with compounds of the same interest clearly identified.

2 AQbD
Limitations: Issues that are not overcome can result in method failures and, in certain cases, method replacements; this applies to chromatographic methods where a number of analytes require effective separation, since there are many factors that impact the method's outcomes.
Benefits: Applying the AQbD paradigm to analytical methods is justifiable; instrument settings, sample characteristics, procedure parameters, and calibration model selection are examples of these factors.
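The accuracy figures quoted above (a 1 min error corresponding to roughly 5-8% of the retention time) follow from expressing the absolute error as a percentage of the actual retention time. The retention times in this sketch are made up purely to reproduce that range.

```python
# Converting an absolute retention-time prediction error into a percentage
# of the actual retention time, as in Table 10's 5-8% figure.

def rt_error_percent(predicted_min, actual_min):
    """Absolute prediction error as a percentage of the actual retention time."""
    return 100.0 * abs(predicted_min - actual_min) / actual_min

# A 1 min error on a 12.5-20 min retention window spans 8% down to 5%:
low_rt_pct  = rt_error_percent(13.5, 12.5)  # 1 min on 12.5 min -> 8.0%
high_rt_pct = rt_error_percent(21.0, 20.0)  # 1 min on 20 min   -> 5.0%
```

The same conversion explains why the quoted 2-5 min errors translate to roughly 10-25 per cent depending on the column: longer runs make the same absolute error a smaller fraction of the retention time.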