- Review
- Open access
Assessment of computational approaches in the prediction of spectrogram and chromatogram behaviours of analytes in pharmaceutical analysis: assessment review
Future Journal of Pharmaceutical Sciences volume 9, Article number: 86 (2023)
Abstract
Background
Today, artificial intelligence-based computational approaches are facilitating multitasking and interdisciplinary analytical research. For example, data gathered during an analytical research project, such as spectral and chromatographic data, can be used in predictive experimental research. Spectral and chromatographic information plays a crucial role in pharmaceutical research, but obtaining it with instrumental analytical approaches consumes time, manpower, and money. Hence, predictive analysis would be beneficial, especially in resource-limited settings.
Main body
Computational approaches can verify data at an early phase of the research process. Several in silico techniques for predicting an analyte’s spectral and chromatographic characteristics have recently been developed. Understanding these tools may help researchers accelerate their work with greater confidence and avoid being misled by incorrect analytical data. In this communication, the properties of chemical compounds and their relation to chromatographic retention are discussed, as well as prediction techniques for UV/IR/Raman/NMR spectrograms. This review used reference data for chemical compounds to compare the predictive ability of in silico tools, along with their percentage error, limitations, and advantages.
Conclusion
The computational prediction of analytical characteristics offers a wide range of applications in academic research, bioanalytical method development, computational chemistry, analytical method development, data analysis approaches, material characterization, and validation process.
Background
The use of computational chemistry in research has been well acknowledged in recent years and has afforded significant research outcomes [1, 2]. There are literature reports on computer code for analysing models, replicating processes, predicting models, and interpreting chemical compounds [3]. Unlike in the drug discovery area, the validity of computational techniques in analytical chemistry has yet to be explored as a comprehensive tool [4,5,6]. The computational approach is important in analytical research because simulations of the chemical behaviour of an analyte are needed to model the analyte response relationship in instrumental methods. It can also be viewed as a visual representation of the connection between the analytical experiment and the theoretical prediction [4, 7].
In this era, research on new chemical entities is needed in the drug discovery process for treatment, diagnostic, and biomarker research. Here, spectroscopy and chromatography techniques play a vital role in the purification, identification, and characterization of the targeted chemical compound [8, 9]. In general, understanding and interpreting the spectrograms and chromatographic retention times of new compounds is quite difficult for beginners, particularly if the researcher is a non-chemist [10]. However, knowledge of spectrograms and chromatography is essential for researchers and plays a crucial role in the process of developing new drugs. Indeed, expertise in computational tools, and awareness of their accuracy, could help researchers speed up experiments with partial validation of the analytical data [11]. At present, predatory journals still publish data sets that are unreliable unless they are verified [12]. Researchers may therefore use computational tools to verify data before citing them in their research [4, 7]. Prediction tools for various spectrograms, such as UV–visible, infrared (IR), Raman, nuclear magnetic resonance (NMR), and mass spectra, are now widely accessible to researchers. Similarly, in silico approaches exist to predict the chromatographic behaviour of an analyte in techniques such as HPLC and GC [13, 14]. The prediction of retention time (tR) in chromatography is gaining importance in analytical method development research. Several computational prediction approaches have been reported, including artificial neural networks (ANNs), response surface methodology (RSM), analytical quality by design (AQbD) [15], design of experiments (DoE), chemometrics, and quantitative structure retention relationship (QSRR) methods [16].
Although knowledge about artificial intelligence software is limited, several artificial neural network-based programmes are widely available these days. Many researchers spend a significant amount of time on their experimental work, even though there are shortcomings in computational chemistry. The AQbD and QSRR approaches explore the scientific understanding of critical method variables and method response in chromatography [17, 18]. These methods are still recommended in pharmaceutical method development because they allow regulatory flexibility [19]. In the AQbD approach [20], the tool used in model development is DoE. In chromatographic research, QSRR is a reliable in silico method for predicting molecular systems [21, 22]; it can be used to evaluate complex physicochemical features of analytes in chromatographic analyses and to predict chromatographic retention parameters [23, 24].
Considering the above discussion, the present assessment review focuses on the various prediction tools that are available and accessible to resource-limited research setups. We have also explored the predictive ability of the different in silico tools with examples drawn from reference spectral libraries. Thus, this review can assist researchers in assessing a tool’s reliability from case to case.
Main text
Problems involving the analytical methods
Today, analytical laboratories face the same difficulties they experienced in the past, despite advances in analytical technology. These difficulties relate to the growth and preservation of expertise, the maintenance of equipment sensitivity, and the introduction of novel methodologies [25]. There are many reports on previous analytical issues with analytes, including method performance [26], a lack of regulatory flexibility [27], complex chemical processes [28], out-of-trend (OOT) results [29], and out-of-specification (OOS) results [30, 31]. These problems arise mainly in three phases: the pre-analytical, development, and post-analytical phases. They can be overcome by utilizing the most modern and advanced computational methods.
Pre-analytical phase
One of the crucial stages in the analysis of a sample is the pre-analytical phase, which includes gathering of literature, sampling, sample preparation, transport, and storage. This entire process is the most time-consuming and might occasionally lead to errors [32]. It is widely acknowledged that a degraded sample cannot produce good results. It is always important to conduct a literature review before beginning any research on an analyte. There are many databases, books, journals, and websites, but in some instances information on a new analyte may not be available because of a lack of studies on the analyte, because the material is newly synthesized, or because sources are unavailable [33]. Next, for new analytical method development, sample preparation is a critical step. A sample processing method is unique to each type of sample, including biological matrices, food products, active compounds, excipients, and pesticides. A given procedure cannot be applied to a different type of analyte without a complete revalidation of the method [34]. Unfortunately, this rule is regularly ignored. Finally, several issues affect the storage and transportation of an analyte: temperature, humidity control, data storage maintenance, and a lack of advancement [35].
Development phase
The selection of the method, procedure, principle, technology, and appropriate recommendations is the main problem that arises throughout the development phase. Unfortunately, no method has yet been developed that satisfies all of these criteria and is appropriate for all classes of analytes. This always places restrictions on analytical chemists. It is also crucial to understand whether the objective of the analysis is merely screening or accurate quantification. In developing chromatography methods, optimization covers temperature, flow rate, the choice of mobile and stationary phases, separation efficiency, internal standard selection, and validation. Thus, re-optimization is a difficult task if the method fails during method transfer [36]. In the last decade, new chromatographic techniques for the detection of bio-analytes have emerged. One of these is liquid chromatography–tandem mass spectrometry (LC–MS/MS), which has advantages such as high selectivity and sensitivity but also disadvantages such as expensive equipment, the need for experienced operators, and more challenging method development [37, 38]. In the development of electrochemistry-supported instruments, the general settings for resolution, the examination of the composite electrochemical response, and the optimal analysis of the multidimensional data are complicated [39].
Post-analytical phase
In this phase, the key challenge is the collection and interpretation of data from analytical techniques, particularly in clinical research, proteomics, and metabolomics. Certain sophisticated computations also raise problems during data analysis, and manual calculations can produce inaccurate findings. In pre- and post-data analyses in chromatography, the common troubles to be addressed are unwanted background signals, baseline drift, unresolved peaks, shifting retention times, data comparison errors, and improper retention time alignments [40]. In spectroscopic analyses, specific mathematical transformations, frequently created for a certain experimental approach, are typically used to correct systematic undesirable signal changes. Baseline shifts (offsets), horizontal shifts, drifts (slope changes), and global intensity effects are some of these systematic signal fluctuations. The significant alteration of signal profiles produced by the derivative transform can mislead the interpretation of final results [41]. Overall, the scheme of application of computational methods is shown in Fig. 1.
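The baseline corrections described above can be illustrated with a minimal Python sketch. It assumes the drift is a straight line through the first and last points of the trace, which is a simplification chosen for illustration and not a method taken from the cited references; the signal values and function names are hypothetical.

```python
def remove_linear_baseline(x, y):
    """Subtract the straight line joining the first and last points.

    Removes a constant offset and a linear drift, assuming (for this
    sketch only) that both end points lie on the baseline.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    slope = (y[-1] - y[0]) / (x[-1] - x[0])
    return [yi - (y[0] + slope * (xi - x[0])) for xi, yi in zip(x, y)]


def first_derivative(x, y):
    """Finite-difference derivative: removes a constant offset,
    but changes the shape of the signal profile."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(y) - 1)]


# Hypothetical trace: one peak on top of an offset (0.5) and a drift (0.1 per unit).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
peak = [0.0, 0.0, 1.0, 0.0, 0.0]
drifting = [p + 0.5 + 0.1 * xi for p, xi in zip(peak, x)]

corrected = remove_linear_baseline(x, drifting)
derivative = first_derivative(x, drifting)
```

Here `corrected` recovers the original peak shape, whereas `derivative` turns the single peak into a positive and a negative lobe, which is one way a derivative transform can alter a profile and complicate interpretation.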
Prediction of spectrograms
Prediction of C13-NMR and H1-NMR
NMR is a significant tool for detecting carbon and hydrogen atoms in organic compounds. In the pharmaceutical industry, C13-NMR and H1-NMR are used to assess drug purity, composition, and the chemical shifts of diverse organic molecules. NMR parameters are now calculated from chemical structures using computational methods. Several software tools built on AI approaches (e.g. ChemDraw, Chemaxon, etc.) are now used to predict chemical shifts in H1-NMR and C13-NMR and to report net intensity, quality, and spectrograms.
Machine learning approach in NMR prediction
Machine learning (ML) approaches are more beneficial and, in most cases, faster than database-based prediction methods such as HOSE codes. The database approach works by finding structural similarities and averaging the experimental data for chemical structures. The similarity between the new and known HOSE codes has little bearing on the accuracy of the prediction. The well-established structure determination approach formerly relied on quantum chemical calculation-based methods such as topical-based DFT calculation. This method is accurate for H1 and C13 chemical shift predictions, but considerably more time-consuming and expensive. Today, software tools have been designed to speed up the procedure. The NMR signal characteristics can be visualized more accurately using a machine learning method called “Automatic structure verification (ASV)”, which is based on variables such as temperature, solvent, pH, salt content, and concentration that affect chemical shifts in laboratory studies. All of these parameters are considered so that the chemical shift of an unknown structure can be predicted. Other prediction algorithms take only some of them into account, and these systems therefore produce variable values. The ASV system, in contrast, is capable of properly dealing with overlapping peaks. This is especially important when some of the compound’s relevant peaks are quite close to other signals, such as significant solvent peaks [42,43,44,45]. A few researchers have used this approach, including Jia et al. [46], who developed a method for extracting data from previously examined 13C and 1H NMR spectra in order to recognize the NMR spectrum. Min Lin and colleagues predicted chemical shifts using cutting-edge machine learning [47].
Software handling for NMR Signal prediction
The user can either draw the chemical structure of the test molecule in the software application or download the structure and paste it into the software. After clicking the calculation button, the user can view the predicted C13-NMR and H1-NMR spectra within 1–5 min. The user can optionally alter the spectrometer frequency within the range 60.0 to 1000 MHz after the prediction. Finally, a PDF document is generated that includes the substance’s chemical shifts, peak intensity, peak quality, molecular location, and coupling constant values [48]. A typical H1-NMR signal for Zidovudine is shown in Fig. 2.
Prediction of UV–Visible Spectra
The UV–Vis absorption spectrum of an organic substance is one of its key physical properties. Predicting UV–Vis spectra from molecular structural formulas is of considerable interest for designing new materials, finding potentially phototoxic chemicals, and estimating missing spectroscopic data for known molecules [49]. In a recent study, Chan et al. [50] utilized a TD-DFT computation approach for rapid ultraviolet–visible spectrum prediction. Urbina et al. [51] developed a method that uses neural network-based computation to predict UV–visible spectrograms.
Time-dependent density functional theory (TD-DFT)
For a TD-DFT calculation, the software should be able to analyse the energy of the chemical structure in its excited states and the probability of transition between energy levels of the molecule. For example, the ORCA programme contains several methods for determining excited state properties. Among these, the TD-DFT technique is considered the most effective. For precise results, an optimized geometry file of the chemical structure is required. To optimize the structure, the user might utilize the “IQmol” software package or another. After that, the user can use Notepad++ to create the input file, with the functional and basis set keywords “! B3LYP def2-TZVP”, the “RIJCOSX” keyword to speed up the process, the “%TDDFT” block to request the excited state calculation, the “NROOTS” flag to set how many excited states are computed, and “MAXDIM” to set the maximum dimension of the expansion space. To mimic the solution conditions of the experiment, CPCM may be used as a solvation model for both the ground and excited states. In the coordinate line, the first number (“0”) denotes the charge, whereas the second number denotes the multiplicity. Finally, save this file in the same folder in “inp” format (tddft.inp). The user may then go to the folder and enter the command “orca tddft.inp > tddft.out” followed by “Enter” to execute the computation on the CMD line (command prompt). Depending on the molecules involved, it might take some time (10 min–2 h). After the computation is completed, the programme creates an output file in the same folder that contains all of the data [52, 53].
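As a rough illustration of the input file described above, the following Python sketch assembles the text of a TD-DFT input from the keywords named in the paragraph. The helper name `build_tddft_input`, the default values for the number of roots and expansion dimension, the solvent choice, and the water geometry are all assumptions made for this example; keyword spellings should be checked against the ORCA manual for the installed version.

```python
def build_tddft_input(xyz_lines, charge=0, multiplicity=1,
                      nroots=10, maxdim=5, solvent=None):
    """Return the text of an ORCA-style TD-DFT input file.

    xyz_lines: list of strings such as "O  0.000  0.000  0.117"
    solvent:   optional solvent name for the CPCM solvation keyword
    """
    keywords = ["B3LYP", "def2-TZVP", "RIJCOSX"]
    if solvent:
        keywords.append(f"CPCM({solvent})")
    lines = [
        "! " + " ".join(keywords),
        "%tddft",
        f"  nroots {nroots}",   # number of excited states to compute
        f"  maxdim {maxdim}",   # maximum dimension of the expansion space
        "end",
        f"* xyz {charge} {multiplicity}",  # charge first, then multiplicity
    ]
    lines.extend(xyz_lines)
    lines.append("*")
    return "\n".join(lines) + "\n"


# Placeholder geometry (approximate water coordinates, for illustration only;
# a real run would use the optimized geometry of the analyte).
water = [
    "O   0.0000   0.0000   0.1173",
    "H   0.0000   0.7572  -0.4692",
    "H   0.0000  -0.7572  -0.4692",
]
text = build_tddft_input(water, solvent="water")

# To use it, write the text to tddft.inp and run the command from the paragraph:
#   orca tddft.inp > tddft.out
```

Generating the file from a function rather than typing it by hand makes it easier to keep the charge, multiplicity, and number of roots consistent when several analytes are screened.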
Visualization of UV–visible spectra
The UV–Visible spectrum of an unknown analyte can be obtained instantly using a graphical interface. The interface initially shows narrow spectral lines, so some line broadening is required to make the predicted spectrum match the experimental one. This is easily accomplished by selecting “Advanced > > ” and then, on the “Infrared Spectra Settings” tab, adjusting the “Peak Width” to 10–30 cm−1 [54,55,56,57,58]. Figure 3 shows the generated spectrogram of the Zidovudine compound.
IR/Raman predictions
For chemical characterization and identification, both infrared (IR) and Raman spectroscopy continue to be essential tools. Recently, McGill et al. [59] developed an IR spectrum prediction procedure using a neural network-based approach. IR and Raman spectra may also be predicted using the ORCA software, with “Avogadro” or “IQmol” used to build and optimize the 3D structure of the analyte before the frequencies of the molecule are computed. The ORCA programme creates the output on its own. As in the UV–visible computations, the user must create a new folder and place the optimized geometry structure and input file in it. Here “! B3LYP DEF2-SVP” gives the functional and basis set, while “OPT FREQ” requests a geometry optimization followed by a frequency calculation. Finally, save the file in “inp” format in the same location so that the user may navigate to the folder and execute “orca foscarnet.inp > foscarnet.out” followed by “Enter” to perform the computation. The output file is created in the same folder when the operation is finished [7, 54,55,56,57,58]. Figure 4 shows the predicted IR spectrum of foscarnet generated by Avogadro.
Plotting a spectrum
Using Avogadro as a graphical user interface, the IR spectrum may be generated rapidly. To view the visual spectra in a new window, the user can open the saved output file and click “Show Spectra”. Although it displays narrow spectral lines, some line widening is necessary to bring the predicted spectra as close to the observed one as possible. This can be readily performed by selecting “Advanced > > ” and then changing the “Peak Width” to 30–130 cm−1 on the “Infrared Spectra Settings” tab [54,55,56].
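The effect of the “Peak Width” setting can be sketched in a few lines of Python. The example below assumes a Gaussian line shape whose full width at half maximum (FWHM) plays the role of the peak width; the line shape actually used by Avogadro may differ, and the stick positions and intensities are hypothetical values rather than computed frequencies.

```python
import math


def broaden(sticks, grid, fwhm):
    """Convert a stick spectrum into a continuous curve.

    sticks: list of (position, intensity) pairs, e.g. in cm^-1
    grid:   positions at which the broadened curve is evaluated
    fwhm:   full width at half maximum of each Gaussian, same units
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    curve = []
    for x in grid:
        total = 0.0
        for position, intensity in sticks:
            total += intensity * math.exp(-((x - position) ** 2) / (2.0 * sigma ** 2))
        curve.append(total)
    return curve


# Hypothetical vibrational lines (wavenumber in cm^-1, relative intensity).
sticks = [(1050.0, 0.4), (1700.0, 1.0), (2950.0, 0.6)]
grid = [float(v) for v in range(400, 4001, 5)]

narrow = broaden(sticks, grid, fwhm=30.0)
wide = broaden(sticks, grid, fwhm=130.0)
```

Comparing `narrow` and `wide` against a measured spectrum is one simple way to choose a width within the range quoted above.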
Mass spectroscopy predictions
The molecular weight of an analyte in pharmaceutical studies is determined by mass spectrometry (MS). In electron ionization mass spectrometry (EI-MS), an electron beam positively ionizes and fragments the molecules [60]. The mass spectrum is a distribution of the frequency or intensity of each type of ion according to its mass-to-charge (m/z) ratio [61]. The prediction models calculate the chance of each bond breaking under ionization and the frequency of each ion fragment by using quantum mechanics calculations [62] or machine learning [63]. For large molecules, a model’s prediction can take a few minutes, depending on the molecule’s size. This is due to the fact that these techniques must either utilize sophisticated computations to determine molecular orbital energies with high accuracy or stochastically mimic the fragmentation of the molecule. A neural network termed neural electron ionization mass spectrometry (NEIMS), studied by Jennifer N. Wei and colleagues, predicts the electron ionization mass spectrum for a particular small molecule. They found that the forward-only model fails to adequately capture the fragmentation events, but the bidirectional prediction mode does [64]. Because the model directly predicts spectra rather than bond breaking probabilities, it is significantly faster than previously reported methods.
Wang et al. utilized the recently developed quantum chemical programme QCEIMS (Quantum Chemical Electron Ionization Mass Spectrometry). QCEIMS can theoretically calculate the spectra for any given chemical structure. However, in order to make quick predictions, approximations and parameter estimations are required, and these are important for the precision of QCEIMS predictions. QCEIMS calculates fragment ions from Born–Oppenheimer molecular dynamics (MD) trajectories over picosecond reaction durations with femtosecond intervals. With this approach, they found that tweaking QCEIMS’s parameters was not a practical way to enhance simulation outcomes [65, 66]. One of the best tools for in silico mass-spectrum-to-compound identification is CFM-ID, which Wang et al. used to predict more accurate ESI–MS/MS spectra. They added a new method for modelling ring cleavage that treats the process as a series of straightforward chemical bond dissociations, and they expanded their handwritten rule-based predictor to cover more chemical classes of analytes [67]. They also listed parameters derived from molecular topology.
Fluorescence spectroscopy predictions
Fluorescence spectroscopy measures the fluorescence of a target analyte upon excitation by a laser beam (often in the UV range) [68]. The prediction of an analyte’s fluorescence features, including the type of fluorescence and the emission and excitation wavelengths [69], can be employed to examine solvent effects. It has been used to predict the spectra of a variety of fluorescent compounds [70]. The majority of the compounds with predicted spectra have molecular masses of 228 or below. For compounds of larger molecular weight, the DFT technique can be used to calculate emission spectra with solvent effects.
The characterization of electronic excited states depends on accurate simulation of molecular absorption and/or emission spectra and on precise techniques such as the equation of motion coupled cluster singles and doubles (EOM-CCSD) method [71, 72]. In order to improve the quality of emission spectra, Caricato et al. [73] combined the EOM-CCSD and polarizable continuum (PCM) models and reported that the predicted values of vertical emission energies are in good accord with the available experimental data. Later, DFT was used by Powell et al. [74] to demonstrate the capability of predicted spectra in generating libraries of fluorescence spectra in a digital format. Ye et al. concluded that the statistical requirements for the numerically predicted wavelength were satisfied by the Lasso-RF (Random Forest descriptor) model. The model found that four conjugated bonding-related characteristics contribute primarily to the predicted emission wavelength [75]. Furthermore, Shams-Nateri et al. [76] investigated the link between absorption and emission spectra using the PCA chemometric approach, and they found that the accuracy of emission spectra prediction improved with the addition of more principal components.
Electrochemistry predictions
Because of the growing interest in electrochemistry for potential drug core structures and for the development of organic photovoltaic materials, the field has recently experienced a huge comeback and has provided valuable prediction, filtering, and active learning. This includes promising optimization of the electrochemical properties of analytes, investigation of intrinsic electron deficiency, and description of the connection between electronic characteristics and substituent effects [77]. Predicting the electrochemical properties of compounds using quantum mechanical calculations provides a quick and accurate method for research. For instance, DFT is regarded as the “workhorse” of recent theoretical investigations in electrochemistry and physics [78].
Electrochemical systems are studied using the popular electrochemical impedance spectroscopy (EIS) characterization approach, although the significance of this method is still constrained by several issues. EIS is extensively employed in the development of sensors [79, 80], in health care [81], drug release [82], testing, and biology [83] because it makes it possible to characterize such systems and helps in identifying crucial variables like conductivities [84], resistances [85], and capacitances [86]. The computational Gaussian processes (GPs) used with this method face significant challenges, including noise, impeded spectrum regression, polarization resistance, and probed frequencies that are not always ideal. An infinite or finite collection of random variables is referred to as a GP if the joint distribution of any finite subset is multivariate Gaussian. A GP can then regress and predict an unknown function using a prior distribution and a set of assumptions on the characteristics of the observed function [87]. GPs can quantify regression and prediction uncertainty and have so far been used to filter data, predict parameters in diverse situations [88], and enhance experiments in the active learning domain. Liu and Ciucci [89, 90] used GPs to deconvolve the distribution of relaxation times (DRT), a novel approach for EIS analysis. Then, using a finite GP approximation, Maradesa et al. extended this framework to constrain the DRT to be non-negative. Additionally, Py et al. [91, 92] created and validated the method that Ciucci used to assess the quality of EIS spectra using GPs that complied with the Hilbert transform.
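The GP definition above can be made concrete with a small regression sketch in plain Python. It is a generic textbook-style GP with a squared-exponential kernel and a zero-mean prior, not the DRT or Hilbert-transform methods of the cited works; the kernel choice, noise level, and data values are assumptions for illustration.

```python
import math


def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_predict(x_train, y_train, x_new, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at x_new."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(x_new, xi) for xi in x_train]
    alpha = solve(K, y_train)                      # K^-1 y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)                           # K^-1 k*
    variance = rbf(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, variance


# Hypothetical noisy observations of an unknown response.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [0.1, 0.8, 0.9, 0.2, -0.7]

mean_near, var_near = gp_predict(x_train, y_train, 2.0)
mean_far, var_far = gp_predict(x_train, y_train, 10.0)
```

The predicted variance is small near the observed points and returns to the prior variance far from them, which is the uncertainty measure the paragraph refers to.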
Kiss et al. [93] predicted the substituent effects on the electrochemical properties of the analyte and examined the influence of substituents on the character of the electronic transition and on the transition density matrices (TDMs). This procedure makes it possible to access the distribution of electrons and holes in the excited state and to determine their delocalization, which in turn reveals electronic excitation processes such as charge transfer [94]. The imbalance in the TDMs is caused by electron-donating and electron-withdrawing groups interacting with the hole. The location of the hole is altered when an electron-donating moiety uses mesomeric effects to donate electron density to the hole. In this case, the TDM can be described as mesomerically affected rather than merely inductively affected. On the other hand, the inductively dominated TDM lacks any localization because there are no major TDM elements on the analyte. The polarity difference has a significant impact on the mesomeric contribution to the TDM. This made it easier to identify the effects of charge transfer and substitution.
The next field of research addressed the exciton binding energies, which reflect the Coulomb attraction between the exciton quasiparticles (electron and hole). The binding energy is a measure of how readily the exciton separates into free charges, and it has a direct impact on how effectively a current is produced in optoelectronics [95]. More details on the characteristics of the electronic structure are revealed by analysing the HOMO and LUMO energies (EHOMO and ELUMO) [96]. In order to optimize the electrochemical characteristics of an analyte, Min et al. [97] developed and verified a machine learning (ML) approach for electrochemistry. Both output variables (such as initial capacity and cycle life) and a few input variables (synthesis parameters, ICP-MS data, and X-ray diffraction (XRD) results) were used to build several experimental datasets for the analyte [98]. When these variables were distributed across the entire dataset while building the ML model, a number of primary variables were chosen to serve as suggestions for the optimal experimental parameters.
Prediction of chromatographic retention behaviour
Quantitative structure retention relationship (QSRR)
QSRR is a computational approach for linking chemical structural variables to retention behaviour on a chromatographic column. Here, Y-variables are employed as dependent variables for predictive or explanatory purposes, whereas X-variables are utilized as independent variables. In QSRR, the Y-variables are connected to solute chromatographic retention, whereas the X-variables encode solute molecular structure. QSRR was first used to characterize columns by quantitatively comparing their separation qualities or to supply knowledge for predicting retention mechanisms in various chromatographic settings [22]. A typical QSRR study includes building a retention database of compounds with known chemical structures, computing molecular descriptors for each structure, choosing descriptors, creating a QSRR model, and validation. Figure 5 illustrates a QSRR methodology and workflow.
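The workflow just described can be reduced to a toy example: one X-variable (a single descriptor) and one Y-variable (retention time), a least-squares fit on a training set, and a check on a held-out compound. This is a minimal sketch under strong simplifying assumptions (a linear relationship and a single descriptor); the descriptor values and retention times are hypothetical and are not taken from the review.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical retention database: descriptor value (X) and retention time in min (Y).
train_x = [0.5, 1.0, 1.5, 2.0]
train_y = [2.1, 3.0, 4.2, 4.9]
test_x, test_y = 2.5, 6.1          # compound held out for validation

slope, intercept = fit_line(train_x, train_y)
fit_quality = r_squared(train_x, train_y, slope, intercept)

predicted = slope * test_x + intercept
percent_error = abs(predicted - test_y) / test_y * 100.0
```

Real QSRR models use many descriptors and more flexible learners, but the split between fitting and external validation shown here stays the same.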
The most popular methods for expressing chemical structures are 1D, 2D, and 3D molecular descriptors. 1D descriptors provide simple chemical information about a solute, such as molecular weight or the number of oxygen atoms in the structure, whereas 2D descriptors are computed from the chemical structure of the solute represented as a connection table or a molecular graph. A 3D molecular descriptor describes both the general surfaces and/or volumes of molecules and the 3D arrangement of structural attributes [23].
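The two 1D descriptors named above, molecular weight and oxygen count, can be computed from a molecular formula with a short Python sketch. The parser handles only simple formulas without brackets or charges, the element table is deliberately small, and the function names are illustrative; the formula of zidovudine (C10H13N5O4) is used as the example.

```python
import re

# Standard atomic masses (rounded) for a few common elements only.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007,
               "O": 15.999, "P": 30.974, "S": 32.06}


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C10H13N5O4'.

    Brackets, charges, and isotopes are not supported in this sketch.
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum of atomic masses; raises KeyError for elements not in the table."""
    return sum(ATOMIC_MASS[s] * n for s, n in parse_formula(formula).items())


zidovudine = "C10H13N5O4"
descriptors = {
    "molecular_weight": molecular_weight(zidovudine),
    "oxygen_count": parse_formula(zidovudine).get("O", 0),
}
```

Descriptors of this kind need no structural drawing at all, which is why they are the cheapest inputs to a QSRR model.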
Representing the molecular structure is one of the key concerns in QSRR modelling. Molecular descriptors that describe chemical structures are typically categorized as physicochemical, quantum chemical, topological, and other descriptors [99]. Physicochemical descriptors correlate positively with solute retention on chromatographic columns. Quantum chemical descriptors shed light on the process of chromatographic retention at the molecular level, although the link to solute retention is frequently poor and the calculation is laborious. With today’s computational technologies, topological descriptors are easily constructed, but they are unrelated to retention phenomena [24]. There are two methods in the QSRR approach, viz., the direct mapping method and the direct comparison method.
Prediction of retention time by the direct mapping method
Direct mapping is a simple method for predicting compound retention time on a chromatographic column. It is a web-based solution that allows users to predict retention by submitting their data and receiving expected retention values. PredRet is one such available database, and the procedure has four steps, as follows.
The user creates a CSV file that includes the compound name, the measured retention time, an identifier such as the PubChem CID or InChI, and stereochemical parameters. The web interface lets the user upload retention data and obtain new retention predictions easily. On the website, the user is first asked to create a new chromatographic system. Each system is described by six fields: (1) a name, (2) a column type (for example, RP or HILIC), (3) a column description (for example, Waters and Symmetry C18 columns), (4) the eluent system (for instance, 95:5 methanol/water), (5) the eluent’s pH (for example, acidic or alkaline), and (6) eluent additives (for example, 0.1 per cent trifluoroacetic acid). In the following phase, the user submits the CSV file containing retention times for chemicals, taken from their own studies or from the literature (for example, via Google Scholar). Finally, the user may obtain the estimated retention time by clicking “get a prediction” [58].
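Preparing such an upload file can be scripted. The sketch below writes a small retention table to CSV text and checks it before submission; the column headers, compound names, identifiers, and retention times are placeholders invented for this example, and the headers actually expected by the web service must be taken from its own documentation.

```python
import csv
import io

# Hypothetical column headers; the real service may require different names.
FIELDS = ["compound_name", "pubchem_cid", "retention_time_min"]

rows = [
    {"compound_name": "compound_A", "pubchem_cid": "CID_PLACEHOLDER_1",
     "retention_time_min": 3.42},
    {"compound_name": "compound_B", "pubchem_cid": "CID_PLACEHOLDER_2",
     "retention_time_min": 7.05},
]


def to_csv(records):
    """Serialize the retention table to CSV text (header plus one line per compound)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def check(csv_text):
    """Return a list of (line number, problem) found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):
        try:
            if float(row["retention_time_min"]) <= 0:
                problems.append((line_no, "non-positive retention time"))
        except ValueError:
            problems.append((line_no, "retention time is not a number"))
        if not row["pubchem_cid"]:
            problems.append((line_no, "missing identifier"))
    return problems


csv_text = to_csv(rows)
issues = check(csv_text)
```

Running a check of this kind before upload catches missing identifiers and malformed retention times, which would otherwise degrade the predictions returned for the whole chromatographic system.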
Prediction of retention time by direct comparison method
QSRR Automator, a Python-based software, can be used to predict retention using the direct comparison method. Mordred, a software package that uses the RDKit package, can be used to determine molecular descriptors. Machine learning operations may be performed with the scikit-learn package. The QSRR Automator workflow is as follows. The user creates the training data, which contain the name of each chemical, the structure in the form of a simplified molecular input line entry system (SMILES) text string, and the retention time. The programme creates a template and simplifies the input file on its own. After that, the user may submit the training data (chemical descriptions, SMILES, compound name, and actual retention time). Structural and electronic descriptors are then selected for use. Functional groups, hybridizations, the number of carbon atoms, and the ring system are all structural properties. Aromaticity and numerous electronegativity calculations are two electronic properties. All of these calculations are simple; unlike more complex fingerprint feature combinations, they can all be done using the Mordred software package, which calculates over 1500 features [100]. Recent data on QSRR-based methods are listed in Table 1.
Chemometrics in chromatography
The chemometric approach is widely used in separation science to predict peak asymmetry and peak overlap and to optimize peaks. Co-elution of multiple analytes in chromatography significantly complicates quantification of the target analyte because of interference caused by inadequate method optimization. For this purpose, chemometric methods such as principal component analysis (PCA) are widely used in separation science and have now been extended to LC-HRMS analysis for proteomics and metabolomics. In addition, artificial neural networks (ANN), factorial design (FD), partial least squares (PLS), and cluster analysis (CA) are also in use [113, 114].
Chemometrics in one- and two-dimensional chromatography
In two-dimensional (2D) chromatography, the entire first-dimension effluent is divided into many fractions, each of which is subjected to a second-dimension separation. The two-dimensional chromatogram is created by combining the results of the individual liquid chromatography separations (LC × LC). The positions of the spots provide qualitative information, while the intensities of the spots provide quantitative information. However, extracting information from extremely complex samples such as protein digests, metabolic extracts, and oil mixtures can be problematic. Even with modern high-resolution chromatography, extracting the entire information of a complex matrix remains a challenging task. Many researchers are constantly working to improve the efficiency of chemometric data processing strategies.
In chromatography, chemometrics is a valuable tool for pre- and post-data analysis to resolve undesired background signals, baseline drift, unresolved peaks, and shifting retention times. Chemometric-based data interpretation, information extraction, and data pre-processing can significantly increase the analytical performance of an existing technique. The chemometric approaches used in chromatography include penalized partial least squares (PPLS) approaches, multivariate curve resolution and orthogonal subspace projection for background correction, the local minimum value approach, baseline estimation and denoising using sparsity, retention-time-alignment strategies, peak clustering, and principal component analysis (PCA). These methods make chemometrics the most rapidly progressing in silico approach in 1D and 2D chromatography and spectroscopy [115].
Chemometrics in unsupervised and supervised techniques
For understanding the dissimilarity or variance in the data matrix, PCA, independent component analysis (ICA), and cluster analysis (CA) are used. The resulting loading vectors can then be treated as “calibration sets” and used to project unknown data. If the data do not cluster against any objective criterion, supervised procedures such as multivariate calibration methods are applied. A regression model may be built on a number of PCA variables; this approach is referred to as principal component regression (PCR). PCR analysis of the data matrix is based mainly on variance. The partial least squares (PLS) method, also known as projection to latent structures, is a commonly used linear supervised method. It finds the direction through the data matrix that maximizes the covariance between the matrix and the predicted variable and then creates a regression model [116].
Software tools in chemometrics and their workflow
Chemometric software (for example, BWIQ) is available for on- and off-line quantitative and qualitative spectral measurements to identify principal components. The software classifies the sample as belonging to the group with the shortest calculated Mahalanobis distance (a measure of the distance between a point P and a distribution D). The workflow is described in the following paragraph.
The complete spectrum is displayed on the screen once the user starts the software, clicks “file”, opens the data, and imports it. Spectral files can be designated in BWIQ in several ways, including as calibration, validation, or ignored files; the drop-down button in the “usage” column is used to designate each spectrum manually. The algorithm parameters are chosen in the algorithm properties tab. In the property panel, the sampling method can be selected and the calibration-to-validation file ratio adjusted, for example, to 60:40 (calibration:validation). Next, the pre-processing steps remove variation in the data sets that is unrelated to chemical differences and instead arises from scattering, instrumental fluctuations, spectral noise, or background differences. When the model analyses the full spectrum, it is more sensitive to contaminants or changes in the samples that add signals in other spectral regions; however, excluding non-informative or noisy regions from the analysis is also advantageous. A chemometric method such as PCA-Mahalanobis distance (MD) can then be applied. In principal component space, the scores plot illustrates the sample clusters, and the result shows clusters matching the different classes. Additional graphs, such as loading and variance plots, are also available [117].
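The classification rule described above (assign a sample to the group with the shortest Mahalanobis distance) can be illustrated with a small, dependency-free Python sketch. It is not BWIQ's implementation: the two-dimensional "scores" are invented stand-ins for the first two principal component scores of calibration spectra, and all names are hypothetical.

```python
import math


def mean_and_cov(points):
    """Mean and sample covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, mean, cov):
    """sqrt(d^T S^-1 d) for a 2x2 covariance matrix S, inverted in closed form."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical calibration scores (PC1, PC2) for two classes.
calibration = {
    "A": [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8)],
    "B": [(4.0, 4.1), (4.2, 3.9), (3.8, 4.0), (4.1, 4.2), (3.9, 3.8)],
}
stats = {name: mean_and_cov(pts) for name, pts in calibration.items()}


def classify(point):
    """Return the class whose distribution is nearest in Mahalanobis distance."""
    return min(stats, key=lambda name: mahalanobis(point, *stats[name]))


print(classify((1.3, 1.0)), classify((3.7, 4.1)))
```

Unlike a plain Euclidean distance, the Mahalanobis distance scales each direction by the spread of the class, so an elongated cluster tolerates larger deviations along its long axis.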
Different types of chemometrics approaches
Penalized partial least squares approach (PPLS)
This method was initially developed by Whittaker in 1922 to address signal smoothing issues [118]. The goal of penalized least squares (abbreviated PLS in this subsection) is to approximate the observed data with a fitted vector that balances fidelity to the original data against the roughness of the fit [119]. Fidelity and roughness are combined in a balanced way in Eq. (1):
\[ Q = F + \lambda R = \sum_{i=1}^{n} \left( v_{i} - z_{i} \right)^{2} + \lambda \sum_{i=2}^{n} \left( z_{i} - z_{i-1} \right)^{2} \quad (1) \]
where z is the fitting vector and v is a vector representing the analyte spectrum, both of which have a length of n elements. The fitted z should be smooth while remaining faithful to v. F, which stands for fidelity to the analyte spectrum v, is the sum of squared differences between the elements of v and z; R, which stands for the roughness of the fitting vector z, is the sum of squared differences between each element of z and its neighbour. A user-adjustable parameter, λ, sets the balance between fidelity and roughness. A greater λ favours a smoother fitted vector.
To use PLS to estimate the background, a weight vector w is added to the fidelity term. Its element \(w_{i}\) may be thought of as a weight that represents the reliability of point i as a component of the background. To solve the minimization problem of Eq. (1), the partial derivatives of Q are set to zero \(\left( \partial Q / \partial z = 0 \right)\). The matrix form of the resulting linear system is then used to determine the fit (Eq. 2), where W is the diagonal matrix of the weights \(w_{i}\) and D is the first-order difference matrix:
\[ \left( W + \lambda D^{T} D \right) z = W v \quad (2) \]
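As a minimal sketch of how the weighted penalized least squares fit can be computed, the following pure-Python function solves the linear system (W + λDᵀD) z = W v, assuming a first-order difference penalty so that the system is tridiagonal. The solver choice (Thomas algorithm), the function name, and the toy signal are assumptions made for illustration, not taken from the cited works. Setting the weights to zero at peak positions makes the fit follow the background only.

```python
def penalized_fit(v, w, lam):
    """Solve (W + lam * D^T D) z = W v for a first-order difference matrix D.

    D^T D is tridiagonal with diagonal [1, 2, ..., 2, 1] and off-diagonals -1,
    so the system is solved with the Thomas algorithm.
    """
    n = len(v)
    diag = [w[i] + lam * (1 if i in (0, n - 1) else 2) for i in range(n)]
    off = -lam
    rhs = [w[i] * v[i] for i in range(n)]
    # Forward elimination.
    for i in range(1, n):
        m = off / diag[i - 1]
        diag[i] -= m * off
        rhs[i] -= m * rhs[i - 1]
    # Back substitution.
    z = [0.0] * n
    z[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        z[i] = (rhs[i] - off * z[i + 1]) / diag[i]
    return z


# Toy chromatogram: flat background of 1.0 with one peak at indices 4-6.
signal = [1.0, 1.0, 1.0, 1.0, 5.0, 9.0, 5.0, 1.0, 1.0, 1.0, 1.0]
# Binary mask: weight 0 on peak points, 1 on background points.
weights = [0.0 if 4 <= i <= 6 else 1.0 for i in range(len(signal))]

baseline = penalized_fit(signal, weights, lam=10.0)
corrected = [s - b for s, b in zip(signal, baseline)]
print([round(b, 3) for b in baseline])
```

Because the peak points carry zero weight, the fitted baseline bridges the peak region smoothly, and subtracting it leaves the peak on a zero background.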
To use this PLS approach for baseline correction, as done by Zhang et al. and Cobas, one must first identify the locations of the chromatogram’s peaks. Once these peak points are known, a binary mask or weight matrix can be generated that indicates whether each data point in the chromatogram belongs to the background or to a peak [120, 121].
Additionally, Eilers et al. [122] created the asymmetric least squares (asLS) method, which introduces an asymmetry parameter in an effort to address this problem. The weights assigned to positive and negative deviations from the baseline can now be smaller and larger, respectively. However, asLS also has problems with the baseline, which motivated the introduction of adaptive iteratively reweighted penalized least squares (airPLS) [123], which allows some baseline regions to be penalized more than others. By iteratively solving a weighted penalized least squares problem, airPLS develops a weight vector.
Once the difference between the signal and the fitted vector, \(\left| {{\text{d}}^{t} } \right|\), is less than one-thousandth of the original signal, it is assumed that an accurate weight vector has been established. The iteration stops when the following termination criterion is satisfied:
\[ \left| {\text{d}}^{t} \right| < 0.001 \times \left| v \right| \]
In some situations, both approaches overestimate the baseline when a matrix is present. Baek et al. created the asymmetrically reweighted penalized least squares (arPLS) method as a solution [124]. MairPLS is another technique built on similar concepts. Long Chen et al. [125] reported that collaborative PLS gave better background correction of Raman spectra than the earlier techniques.
Multivariate curve resolution-alternating least squares (MCR-ALS)
With MCR-ALS, estimates of the chemically significant profiles of the relevant chemical species can be obtained from mixed experimental data using a bilinear decomposition [126]. Determining the optimal number of components in the MCR-ALS model commonly requires building many MCR-ALS models while investigating the quality of fit and the interpretability of the resolved chemical information [127]. The data sets may include complex, heterogeneous samples of unknown composition, spatially resolved chemical images, and the associated resolved analyte spectra of the individual, pure chemical components. MCR-ALS specifically decomposes an experimental data matrix (DM) as follows [128]:
\[ DM = C S^{T} + E \quad (5) \]
where in Eq. (5), the resolved spectrum matrix is \(S^{T}\), the residual error matrix is E, and the concentration profile matrix is C. Three-dimensional experimental data produced by spectroscopic techniques contain spectral (λ or v) and spatial (x and y) information. The 2D experimental data matrix, DM, which contains integrated spatial (both x and y together) and spectral (λ or v) information, is generated from the three-dimensional experimental data before MCR-ALS. This approach is applied for baseline correction and for quantitative purposes, and also to correct local minima of the least squares errors obtained by other methods such as singular value decomposition (SVD) or PCA [129].
Principal component analysis (PCA)
Principal component analysis is a popular unsupervised learning technique for reducing the dimensionality of data. PCA was invented in 1901 by Pearson [130]. In chromatography, PCA is frequently used to examine the results obtained for complicated samples [131]; uncorrelated variables (the principal components) are fitted linearly across the data set. The first component represents the largest variation in the data, the second component describes the second-largest variance, and so on. This chemometric tool can be particularly helpful when interpreting high-dimensional data.
The PCA method may be used for interference factor removal, interference factor extraction, and data compression. Equation (6) shows how the singular value decomposition (SVD) method is used to carry out PCA and obtain orthogonal principal components (PCs) [132]:
\[ D = U \Sigma V^{T} \quad (6) \]
where in Eq. (6), the three matrices \(U\), \(\Sigma\), and \(V^{T}\) denote scores, singular values, and loadings with sizes of \(m \times m\), \(m \times n\), and \(n \times n\), respectively, and \(D\) stands for the raw data matrix of size \(m \times n\) that is decomposed [133].
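To make the idea of a first principal component concrete, the sketch below finds the leading loading vector of a small data matrix. As an assumption made to keep the example dependency-free, it uses power iteration on the covariance matrix of the mean-centred data instead of a full SVD; for centred data the result matches the first right singular vector up to sign. The data and names are hypothetical.

```python
import math


def first_principal_component(data, iterations=200):
    """Return (loading vector, scores, explained variance) of the first PC."""
    m, n = len(data), len(data[0])
    means = [sum(row[j] for row in data) / m for j in range(n)]
    centred = [[row[j] - means[j] for j in range(n)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (m - 1) for b in range(n)]
           for a in range(n)]
    # Power iteration: repeated multiplication converges to the dominant
    # eigenvector, provided the start vector is not orthogonal to it.
    vec = [1.0] * n
    for _ in range(iterations):
        new = [sum(cov[a][b] * vec[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    scores = [sum(r[j] * vec[j] for j in range(n)) for r in centred]
    variance = sum(s * s for s in scores) / (m - 1)
    return vec, scores, variance


# Hypothetical data: two perfectly correlated variables (second = 2 x first).
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
loading, scores, variance = first_principal_component(data)
print([round(x, 4) for x in loading], round(variance, 4))
```

Here all of the variance lies along the direction (1, 2), so a single component describes the data completely; real chromatographic data would need several components.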
In chromatography, Soares et al. [134] applied PCA in combination with correlation optimized warping (COW); an interesting use of this combination is the comparison of columns. Before PCA is performed, the chromatograms are first aligned with the COW technique to increase the probability (p-) values. Whether there are significant differences between chromatograms can be determined by computing the Mahalanobis distances and converting them to p-values. Although this method decreases noise and raises the signal-to-noise ratio (S/N), there is a possibility that numerous components become convoluted and that chemical information is lost. According to a report, the ideal bin size depends on the sample [135]. This method may be used to classify samples in complicated or multidimensional data sets.
Parallel factor analysis (PARAFAC)
PARAFAC reduces the dimensionality of the data set and, as a factor analysis method, is similar to PCA. However, PARAFAC treats the data as trilinear, with three modes, namely spectra, chromatograms, and concentrations [136], whereas PCA is essentially a dimension reduction approach. As a result, PARAFAC discovers not only a subspace, but also the vector orientations [137]. PARAFAC2, as applied by Khakimov et al. [138], can additionally handle slight changes in retention time. The three-way array X of dimensions I, J, and K can be described by the PARAFAC decomposition in Eq. (7):
\[ x_{ijk} = \sum_{f=1}^{F} a_{if} \, b_{jf} \, c_{kf} + e_{ijk} \quad (7) \]
In Eq. (7), F stands for the number of factors, while \(x_{ijk}\), \(a_{if}\), \(b_{jf}\), \(c_{kf}\), and \(e_{ijk}\) are, respectively, elements of X, A, B, C, and E. The loading matrices A, B, and C have dimensions of \(I \times F\), \(J \times F\), and \(K \times F\), respectively. E denotes the three-way residual array of dimensions \(I \times J \times K\) [139].
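The trilinear model can be made concrete with a short Python sketch that rebuilds the model part of a three-way array from its loading matrices, so that each element is the sum over factors of a_if · b_jf · c_kf. The loading values are invented, and the sketch shows only the reconstruction step, not the fitting of A, B, and C.

```python
def parafac_reconstruct(A, B, C):
    """Build the I x J x K array with elements sum_f A[i][f] * B[j][f] * C[k][f]."""
    n_factors = len(A[0])
    return [[[sum(A[i][f] * B[j][f] * C[k][f] for f in range(n_factors))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


# Hypothetical loadings for F = 2 factors.
A = [[1.0, 0.0], [0.0, 2.0]]              # I = 2 (e.g. samples)
B = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]  # J = 3 (e.g. elution times)
C = [[1.0, 2.0], [3.0, 1.0]]              # K = 2 (e.g. wavelengths)

X_model = parafac_reconstruct(A, B, C)
print(X_model[0][0][0], X_model[1][2][0])
```

Subtracting this model array from the measured array element by element gives the residual array E.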
The distinctive feature of the PARAFAC model is that it establishes not only the subspace, but also the location of the axes defining it. Additionally, the PARAFAC model offers the second-order advantage of allowing chemical components to be analysed even in the presence of unidentified interferences [140]. Tatjana et al. and Na Peng et al. [141] both applied PARAFAC to fluorescence analysis and found that the fluorescence model was able to quantify and analyse the quality of fluorophores in analytes and to classify the various types of fluorophores. Another study recommended combining PARAFAC with fluorescence regional integration to better characterize analytes and understand their functionality [142].
Partial least squares (PLS)-based methods
PLS-DA, also known as discriminant partial least squares (D-PLS), is a classification method based on partial least squares. The technique was first developed by Barker and Rayens [143]. Dimension reduction and the construction of a predictive model are the two major components of PLS-DA modelling. It gives a linear delimiter using partial least squares (PLS) regression, with the response variables being binary class membership indices (e.g. 0 and 1) for each class. The PLS-2 algorithm, which enables the prediction of a matrix of response variables in multiple components, is used when more than two classes are involved.
In the ordinary variant of PLS-DA, the components must be orthogonal to one another. The method can be formulated in terms of the eigenvectors of the non-singular portion of the covariance matrix C [144]:
\[ C = \frac{1}{\left( n - 1 \right)^{2}} X^{T} C_{n} \, y \, y^{T} C_{n} X \quad (8) \]
where in Eq. (8), \(y\) is the class label vector, \(C_{n}\) is the \(n \times n\) centring matrix, and \(X\) is the data matrix. The loading vectors \(a_{1}, \ldots, a_{d}\), which denote the relevance of each feature in that component, are computed iteratively. The objective for iteration h is as follows:
\[ \mathop{\max}\limits_{\left( a_{h}, b_{h} \right)} \; {\text{cov}} \left( X_{h} a_{h}, \; y_{h} b_{h} \right) \quad (9) \]
where \(X_{1} = X\), \(y_{h}\) and \(X_{h}\) are the residual (error) matrices following transformation with the prior h − 1 components, and \(b_{h}\) is the loading for the label vector \(y_{h}\).
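For the first iteration and a single binary label vector, the unit-length loading vector that maximizes the covariance between the projected data and the labels is proportional to the product of the centred data matrix (transposed) and the centred labels. The sketch below computes only this first component on invented data; it is an illustration under that assumption, not a full PLS-DA implementation, and all names are hypothetical.

```python
import math


def first_pls_loading(X, y):
    """First loading vector a1, proportional to Xc^T yc, and the scores Xc a1."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [value - y_mean for value in y]
    raw = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(value * value for value in raw))
    loading = [value / norm for value in raw]
    scores = [sum(Xc[i][j] * loading[j] for j in range(p)) for i in range(n)]
    return loading, scores


# Hypothetical data: feature 1 separates the classes, feature 2 is constant.
X = [[1.0, 5.0], [2.0, 5.0], [8.0, 5.0], [9.0, 5.0]]
y = [0, 0, 1, 1]  # binary class membership

loading, scores = first_pls_loading(X, y)
print(loading, scores)
```

The uninformative second feature receives zero loading, and the scores of the two classes fall on opposite sides of zero, which is the linear delimiter that PLS-DA exploits.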
PLS-DA has been used mostly in biomarker and drug discovery research, for example in an LC–MS/MS and NMR study of advanced-stage melanoma in blood [145]. Using LC–MS data, Lambrecht et al. [146] employed PLS-DA to classify black rice according to its place of origin. PLS was used by Eleni et al. [147] to predict the diffusion of substances in artificial membranes.
Additionally, orthogonal partial least squares discriminant analysis (OPLS-DA) is designed to distinguish between the discriminating and non-discriminating dimensions [148]. Using a set of metabolites identified by LC–MS/MS, Zhang et al. [149] applied OPLS-DA to confirm the authenticity of fruit juices. Shurui et al. [150] used a similar strategy when they applied OPLS-DA in an HRMS study for non-target metabolomics.
Support vector machines (SVM)
Support vector machines (SVM) are a set of pattern-recognition techniques developed to handle nonlinear data distributions effectively, and they are among the machine learning methods used in chemometrics. The fundamental idea of SVM is to project the data points into a space with added dimensions, in which linear functions capable of modelling the data can be identified [151]. Such modelling functions can be projected back into the space of the original predictors, where the resulting functions are higher in complexity (often nonlinear) but lower in dimension. SVM is conventionally used for discriminant classification; nevertheless, several authors have proposed adjustments relevant to class modelling. One of the most popular strategies is the support vector domain description (SVDD) method by Songfeng Zheng [152], which uses hyperspheres to describe the class spaces. Numerous researchers have used this strategy in a variety of analytical studies, including laser-induced breakdown spectroscopy [153], ATR-FT-IR spectroscopy [154], tandem mass spectrometry (MS/MS) [155], and HPLC [156].
Artificial neural networks (ANNs)
ANNs are multilayer networks of linked mathematical operators (neurons). The feed-forward neural network is the most common ANN. Here, each neuron computes a weighted sum of the input data or of the outputs of the preceding layer, which is then modified by an activation function (typically a linear or logistic function). Such algorithms learn from a data set to predict event outcomes [157].
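The forward pass of a feed-forward network, in which each neuron takes a weighted sum of the previous layer's outputs and applies an activation function, can be written in a few lines of Python. The weights and biases below are arbitrary illustrative values, not a trained retention model, and the function names are hypothetical.

```python
import math


def logistic(x):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(x):
    """Linear (identity) activation."""
    return x


def layer(inputs, weights, biases, activation):
    """One layer: activation(weighted sum of inputs + bias) for each neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hypothetical network: 2 inputs -> 2 logistic hidden neurons -> 1 linear output.
hidden_weights = [[1.0, -1.0], [0.5, 0.5]]
hidden_biases = [0.0, 0.0]
output_weights = [[2.0, 2.0]]
output_biases = [1.0]

inputs = [0.0, 0.0]
hidden = layer(inputs, hidden_weights, hidden_biases, logistic)
output = layer(hidden, output_weights, output_biases, linear)
print(hidden, output)
```

Training consists of adjusting the weights and biases so that the output matches known values (for example, measured retention times); only the prediction step is shown here.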
In the last decade, artificial neural networks (ANNs) have been developed to determine the retention index or retention time for 1D-GC, 1D-LC, 2D-LC, and 2D-GC separations [158, 159]. ANNs are computer programs that “learn” to carry out tasks by considering multiple cases. As long as enough input is given, an ANN can detect traits and patterns in data, which are then used to make predictions under new conditions. ANNs have been employed in a variety of analytical research studies, such as LC–MS/MS determination [160], GC–MS [161], and HPLC [162]. The chemometric methods used in analytical techniques are listed in Table 2.
Analytical quality by design (AQbD)
Analytical quality by design (AQbD) is an approach for developing robust analytical methods that supports regulatory flexibility in pharmaceutical submissions to the FDA. AQbD is widely used in the development of various analytical methods such as UV–visible, FT-IR, Raman, NIR, fluorimetric, HPLC, UHPLC, LC–MS, GC–MS, HPTLC, and SFC. In the pharmaceutical industry, the AQbD tool is integrated with PAT as a real-time process analyser to monitor any given process or material, which generates massive and complex data sets. There is growing interest in implementing AQbD in new analytical method development for wider applications, including assays, stability studies, and bioanalytical studies. Compared with the one-factor-at-a-time (OFAT) approach, AQbD-based analytical methods have demonstrated a high degree of robustness and method performance. Notably, using these techniques reduces the likelihood of human error. The AQbD approach does not predict a chromatogram; instead, it builds scientific understanding across the sequence of method implementation, beginning with risk assessment in method choice, then the relationship between method parameters and expected method results, and finally the identification of a region in which the method is highly robust and cost-effective [186]. The design of experiments (DoE) is part of the AQbD methodology and represents the interaction among the input factors that ultimately affect the method response and outcomes. A typical AQbD methodology therefore starts with an analytical target profile (ATP) and a risk and criticality evaluation, then uses DoE to optimize the method variables, establishes a method operable design region (MODR), and implements a control plan [187,188,189]. Published works are compiled in Table 3, and the scheme of the methodology is illustrated in Fig. 6.
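As a small illustration of the DoE step, the following Python sketch enumerates a full factorial design, in which every combination of factor levels becomes one experimental run. The factors and levels (flow rate, column temperature, acetonitrile fraction) are hypothetical examples chosen for an HPLC method; real AQbD studies often use fractional or response-surface designs instead.

```python
from itertools import product

# Hypothetical method factors and the levels to be tested.
factors = {
    "flow_rate_mL_min": [0.8, 1.0, 1.2],
    "column_temp_C": [25, 35],
    "acetonitrile_pct": [30, 40],
}

# Full factorial design: one run per combination of levels (3 x 2 x 2 = 12).
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for number, run in enumerate(runs, start=1):
    print(number, run)
```

The measured responses for these runs (for example, resolution or tailing factor) would then be modelled to locate the method operable design region.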
Assessments of prediction ability of prediction software
Assessment of the predictive ability of NMR prediction by Chemaxon
An attempt was made to verify the predicted chemical shift values for the chosen test compounds shown in Fig. 7. In this experiment, the predicted chemical shift values of ten chemically diverse compounds were compared with the original experimental values. A per cent error (%) was obtained for each chemical shift value, and regression analysis was performed. The per cent error ranged from − 26.52 to 35.98%. The correlation graphs in Figs. 8 and 9 show \(R^{2}\) values of 0.959 (H1-NMR) and 0.974 (C13-NMR), which indicates the accuracy of NMR signal prediction. In H1-NMR, the aliphatic proton error ranged from − 26.52 to 35.98%, whereas the aromatic proton error ranged from − 25.47 to 9.21%. In C13-NMR, the aliphatic carbon error ranged from − 14.41 to 27.54%, whereas the aromatic carbon error ranged from − 14.95 to 6.49%. We conclude that the aliphatic error was greater than the aromatic error. All of these data are presented in Table 4.
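The two statistics used throughout these assessments, the per cent error of each predicted value and the squared correlation coefficient between predicted and experimental values, can be computed as in the sketch below. The sign convention (predicted minus experimental, divided by experimental) is an assumption, since the text does not state it, and the chemical shift values are invented for illustration rather than taken from Table 4.

```python
def percent_error(predicted, experimental):
    """Signed per cent error; the sign convention is an assumption."""
    return (predicted - experimental) / experimental * 100.0


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient between two series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical chemical shifts in ppm (not data from this review).
experimental = [1.0, 2.0, 4.0]
predicted = [1.1, 1.9, 4.0]

errors = [percent_error(p, e) for p, e in zip(predicted, experimental)]
r2 = r_squared(experimental, predicted)
print([round(e, 2) for e in errors], round(r2, 4))
```

A high correlation can coexist with large individual errors, which is why both the error range and the regression result are reported for each technique.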
Assessment of the predictive ability of ORCA
For UV–Visible prediction
Here, the experimentally obtained wavelength maximum (λmax) values of fifteen structurally diverse compounds were compared with the predicted values. A per cent error (%) was obtained for each wavelength value, and regression analysis was performed. The error was found to be between − 2.27 and 18.69%. The correlation graph in Fig. 10 shows an \(R^{2}\) value of 0.926, which demonstrates the accuracy of UV–visible prediction. When methanol is used as the prediction solvent, the error ranges from 0.0 to 18.69%, whereas for water it ranges from − 2.27 to 11.73%. We conclude that more error is observed when methanol, rather than water, is used as the solvent for prediction. The resulting data are presented in Table 5.
For Raman and infrared
The predicted Raman shifts and infrared absorption frequencies for the selected test substances were verified against experimental values. Here, the predicted frequency values of ten chemically diverse compounds were compared with the original experimental frequency values. The % error for each frequency value was calculated, and regression analysis was performed. The % error was between − 30.04 and 29.26%. The \(R^{2}\) value (Figs. 11 and 12) for the correlation was 0.946 for both the Raman shift and the infrared absorption frequency, which indicates the reliability of Raman shift and infrared absorption frequency prediction. The error ranged from − 20.02 to 29.26% for aliphatic single-bond compounds, from − 4.20 to 13.14% for double- and triple-bonded compounds, from − 21.01 to 3.55% for hydroxyl functional group compounds, and from − 11.43 to 18.07% for aromatic ring compounds. The aliphatic single-bond compound error was therefore greater than the other errors. The comparative data are presented in Table 6.
Furthermore, an attempt was made to verify the predicted infrared absorption frequencies of all functional groups of lamivudine and zidovudine. In this study, the predicted functional group frequencies of lamivudine and zidovudine were compared with the original experimental frequency values. The % error for each frequency value was calculated, and regression analysis was performed. The % error was between − 24.26 and 18.89%. The \(R^{2}\) value for the correlation graph in Fig. 13 was 0.970 for both lamivudine and zidovudine absorption frequencies. This also demonstrates the reliability of frequency prediction for a single compound across all of its functional groups; the data are presented in Table 7.
Assessment of predictive ability of QSRR Automator with reference data
QSRR retention predictions were carried out for antiviral drugs. Reference information was gathered from various research publications, and different antiviral drugs eluted on C18 columns were selected. The predicted retention times for the test set of drugs were compared with the published retention time data. The % error for each retention time and the regression coefficient were calculated. The % error was in the range of − 20 to 20%, as can be seen in the histogram plots in Figs. 14 and 15, and the \(R^{2}\) value for the correlation was 0.947. This indicates the reliability of retention time prediction. Topological and 2D descriptors such as MW, AATS, MATS, GATS, Axp, n6aHRing, NsNH2, and SLogP are the most frequently contributing descriptors, and they depend on the chemical structure of the analyte. Table 8 presents the QSRR-predicted and experimental retention times of antiviral class drugs.
Short-time Fourier transform (STFT) method for assessment
The short-time Fourier transform (STFT) was used as a further evaluation of the assessment results for C13-NMR, H1-NMR, UV–visible, IR, and Raman prediction, and it indicated that all of these results are accurate. In the power spectral density spectrogram, yellow or red denotes high power and blue denotes low power. The comparison focused on the highest frequency power of the predicted and experimental data sets: if the two spectrograms show the same or nearly the same frequency power, the prediction is considered acceptable, whereas different frequency powers indicate that the prediction was inaccurate.
The H1-NMR power frequencies of the predicted and experimental results are nearly identical, with highest power indices of 18.25 and 17.39, respectively (Fig. 16), which indicates that both outcomes are accurate. For C13-NMR, the power frequencies of the predicted and experimental data are depicted in Fig. 17; they exhibit nearly identical power index values of 44.17 and 44.01, respectively. This demonstrates the validity of the data, and the spectrogram reveals only very slight frequency differences in the lower power range, which are visible as blue peaks. For UV–visible prediction (Fig. 18), the highest frequency indices are 44.89 and 44.82 for the predicted and experimental results, respectively, at the same frequency index. Finally, the Raman and infrared power frequency indices in Figs. 19 and 20 show that the predicted and experimental data share the same frequency index, and Fig. 21 shows the 3D STFT spectrogram plot for the IR prediction results. Together, these results confirm that the predictions were accurate.
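The text does not give the exact STFT settings or how the "power index" was derived, so the following is only a generic sketch of how a power spectrogram is computed: the signal is cut into overlapping windowed frames, each frame is Fourier transformed, and the squared magnitudes are kept. The Hann window, frame length, hop size, and test signal are all assumptions, and a direct DFT is used to avoid external dependencies.

```python
import cmath
import math


def stft_power(signal, window_size, hop):
    """Return power[frame][bin] for Hann-windowed frames of the signal."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / window_size)
              for n in range(window_size)]
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(window_size)]
        spectrum = []
        for k in range(window_size // 2 + 1):  # non-negative frequencies only
            coeff = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                        for n in range(window_size))
            spectrum.append(abs(coeff) ** 2)
        frames.append(spectrum)
    return frames


def peak_bins(frames):
    """Index of the highest-power frequency bin in each frame."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in frames]


# Hypothetical test signal: a sinusoid with 4 cycles per 32-sample window.
signal = [math.sin(2.0 * math.pi * 4.0 * n / 32.0) for n in range(128)]
frames = stft_power(signal, window_size=32, hop=16)
print(len(frames), peak_bins(frames))
```

Comparing the highest-power bins and their power values for a predicted and an experimental series is one way to check whether the two share the same dominant frequency content.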
Discussion
Our assessment gave acceptable results; however, there were a few software-related constraints, particularly the 5–20 h required for TD-DFT calculations. Prediction errors are larger when the data set is small and when the prediction tool must be unique to the task. A large data set is therefore necessary for successful prediction; in some cases, however, a large data set can also result in inaccurate prediction. For example, a complicated structure with multiple classes of variables takes longer to process, and the impact on the prediction process ultimately leads to wrong results, which is disappointing for a research study. Careful planning of the data set and systematic prediction are therefore required to produce reliable research findings. While collecting the reference data set, we also found that some data were not present in the reference library. In that situation, omitting the compound or switching to another approach might be an option. Similarly, if two distinct spectra for the same chemical are found in the reference data, data optimization needs to be performed. Several online reference data sources are available for mass spectroscopy; however, there are fewer for infrared and Raman spectra. These problems and challenges related to spectrogram prediction are listed in Table 9.
In the QSRR approach, more than 50 compounds are needed for the prediction of retention time. During operation, we noticed that, for a given data set, the predictions tended to fall close to the supplied values. This may be a problem with the QSRR Automator programme, but more sophisticated software for retention time prediction is already available and can be used as an alternative. Because chemometric theory is entirely mathematically based, understanding AQbD and chemometrics is critical. If a chemist is not familiar with the mathematics, it will be harder to develop a prediction process. Each approach in chemometrics has a unique methodology, so experts are required for both planning and result evaluation. Additionally, we noted in the literature study that there is less research on electrochemical and spectroscopic prediction with chemometrics. Electrochemical prediction is generally employed in the technological field, and only to a limited extent in drug discovery and development. Since many variables might influence the results, such as instrument settings, calibration, process, and model selection, some AQbD method failures will certainly occur in the case of method replacements. However, this strategy is most effective at minimizing method transfer, OOS, and OOT failure rates. These points are presented in Table 10.
The differences between the physical and chemical data predictions are illustrated in Table 11. By comparison, physical data prediction is simpler than chemical data prediction because the latter requires a larger number of supporting techniques and programmes. In addition, it requires a larger number of descriptors and is more challenging for beginners and students.
Conclusions
The present review of computational approaches in spectrum prediction found that the tools assessed gave acceptable accuracy with limited variation. Overall, students and researchers are making considerable use of in silico tools in computational chemistry, which indicates the reliability of such tools in research. The development and application of computational approaches in analytical research and development are our key objectives. As we observed, computational prediction of analytical behaviour offers a wide range of applications in academic research, bioanalytical method development, computational chemistry, analytical method development, data analysis approaches, material characterization, and validation. Still, the prediction error of these tools needs to be minimized for better accuracy, and this will be explored further in future research.
Availability of data and materials
Not applicable. The manuscript does not contain any data.
Abbreviations
- UV: Ultraviolet
- IR: Infrared
- NMR: Nuclear magnetic resonance
- GC: Gas chromatography
- ANNs: Artificial neural networks
- RSM: Response surface methodology
- AQbD: Analytical quality by design
- DoE: Design of experiments
- QSRR: Quantitative structure retention relationship
- OOT: Out of trend
- OOS: Out of specification
- AI: Artificial intelligence
- ASV: Automatic structure verification
- TD-DFT: Time-dependent density functional theory
- EI-MS: Electron ionization mass spectrometry
- NEIMS: Neural electron ionization mass spectrometry
- QCEIMS: Quantum chemical electron ionization mass spectrometry
- MD: Molecular dynamics
- ESI–MS/MS: Electrospray ionization tandem mass spectrometry
- EOM-CCSD: Equation of motion coupled cluster singles and doubles
- PCM: Polarizable continuum model
- QSAR: Quantitative structure activity relationship
- Lasso-RF: Random forest descriptor
- EIS: Electrochemical impedance spectroscopy
- GPs: Computational Gaussian processes
- TDMs: Transition density matrices
- ICP-MS: Inductively coupled plasma mass spectrometry
- XRD: X-ray diffraction
- CSV: Comma-separated values
- SMILES: Simplified molecular input line entry system
- RMSE: Root-mean-squared error
- FD: Factorial design
- PLS: Partial least squares
- CA: Cluster analysis
- PCR: Principal component regression
- TLRC: Trilinear regression calibration
- MLRC: Multi-linear regression calibration
- PLSR: Partial least squares regression
- ILS: Inverse least squares
- PCA: Principal component analysis
- OPA: Orthogonal projection analysis
- EFA: Evolving factor analysis
- MCR-ALS: Multivariate curve resolution by alternating least squares
- SFA: Sub-window factor analysis
- PARAFAC: Parallel factor analysis
- GRAM: Generalized rank annihilation
- PLS-DA: Partial least squares discriminant analysis
- HCA: Hierarchical cluster analysis
- SIMCA: Soft independent modelling of class analogy
- RAFA: Rank annihilation factor analysis
- LC–NMR: Liquid chromatography nuclear magnetic resonance
- LC–MS: Liquid chromatography–mass spectrometry
- GC–MS: Gas chromatography–mass spectrometry
- FT-IR: Fourier transform infrared
- HPLC: High-performance liquid chromatography
- UPLC: Ultra-performance liquid chromatography
- HPTLC: High-performance thin-layer chromatography
- SFC: Supercritical fluid chromatography
- PAT: Process analytical technology
- OFAT: One-factor-at-a-time
- FR: Flow rate
- CT: Column temperature
- AQ: Aqueous
- ACN: Acetonitrile
- RSD: Relative standard deviation
- TBAH: Tetrabutylammonium hydroxide
- PDOP: Potassium dihydrogen orthophosphate
- TEA: Triethylamine
- STFT: Short-time Fourier transform
- QSPR: Quantitative structure property relationship
References
Nova A, Maseras F (2013) Enantioselective synthesis. In: Comprehensive inorganic chemistry II (second edition): from elements to applications, pp 807–831
Genheden S, Reymer A, Saenz-Méndez P, Eriksson LA (2017) Computational chemistry and molecular modelling basics
Polanski J, Gasteiger J (2016) Computer representation of chemical compounds. J Puzyn T Eds. https://doi.org/10.1007/978-94-007-6169-8_50-1
Gerlich M, Neumann S (2013) MetFusion: integration of compound identification strategies. J Mass Spectrom 48(3):291–298
Wolf S, Schmidt S, Müller-Hannemann M, Neumann S (2010) In silico fragmentation for computer assisted identification of metabolite mass spectra. BMC Bioinf 11(1):1–12
Peironcely JE, Rojas-Chertó M, Tas A, Vreeken R, Reijmers T, Coulier L, Hankemeier T (2013) Automated pipeline for de novo metabolite identification using mass-spectrometry-based metabolomics. Anal Chem 85(7):3576–3583
Snyder HD, Kucukkal TG (2021) Computational chemistry activities with Avogadro and ORCA. J Chem Educ 98(4):1335–1341. https://doi.org/10.1021/acs.jchemed.0c00959
Kotha RR, Natarajan S, Wang D, Luthria DL (2019) Compositional analysis of non-polar and polar metabolites in 14 soybeans using spectroscopy and chromatography tools. Foods 8(11):557
Kaleta M, Oklestkova J, Novák O, Strnad M (2021) Analytical methods for the determination of neuroactive steroids. Biomolecules 11(4):553
Vandierendonck A (2017) A comparison of methods to combine speed and accuracy measures of performance: a rejoinder on the binning procedure. Behav Res Methods 49(2):653–673
Paul D, Sanap G, Shenoy S, Kalyane D, Kalia K, Tekade RK (2021) Artificial intelligence in drug discovery and development. Drug Discov Today 26(1):80
Udayakumar V, Periandy S, Ramalingam S (2011) Experimental (FT-IR and FT-Raman) and theoretical (HF and DFT) investigation, IR intensity, Raman activity and frequency estimation analyses on 1-bromo-4-chlorobenzene. Spectrochim Acta A Mol Biomol Spectrosc 79(5):920–927. https://doi.org/10.1016/j.saa.2011.03.049
ICH Harmonised Tripartite Guideline (2017) Technical and regulatory considerations for pharmaceutical product lifecycle management Q12. Paper presented at the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use
FDA (2011) Food and Drug Administration. Pharmaceutical quality system (ICH Q10) conference. Accessed Jul 2021 from https://www.fda.gov/regulatory-information/search-fda-guidance-documents/q10-pharmaceutical-quality-system
Patel KY, Dedania ZR, Dedania RR, Patel U (2021) QbD approach to HPLC method development and validation of ceftriaxone sodium. Future J Pharm Sci 7(1):141. https://doi.org/10.1186/s43094-021-00286-4
Peraman R, Bhadraya K, Padmanabha Reddy Y (2015) Analytical quality by design: a tool for regulatory flexibility and robust analytics. Int J Anal Chem. https://doi.org/10.1155/2015/868727
Agatonovic-Kustrin S, Zecevic M, Zivanovic L, Tucker I (1998) Application of artificial neural networks in HPLC method development. J Pharm Biomed Anal 17(1):69–76
Webb R, Doble P, Dawson M (2009) Optimisation of HPLC gradient separations using artificial neural networks (ANNs): application to benzodiazepines in post-mortem samples. J Chromatogr B 877(7):615–620
Chatterjee S (2013) QBD considerations for analytical methods—FDA perspective. Paper presented at the US IFPAC annual meeting
Burnett K, Harrington B, Graul T, Fanalis S, Haddad P, Poole C (2013) QbD in liquid chromatographic applications. Elsevier
Kaliszan R (2000) Chapter 11 recent advances in quantitative structure-retention relationships (QSRR). In: Valkó K (ed) Handbook of analytical separations. Elsevier Science, pp 503–534
Héberger K (2007) Quantitative structure–(chromatographic) retention relationships. J Chromatogr A 1158(1):273–305. https://doi.org/10.1016/j.chroma.2007.03.108
Amos RIJ, Haddad PR, Szucs R, Dolan JW, Pohl CA (2018) Molecular modeling and prediction accuracy in quantitative structure-retention relationship calculations for chromatography. TrAC, Trends Anal Chem 105:352–359. https://doi.org/10.1016/j.trac.2018.05.019
De Matteis CI, Simpson DA, Doughty SW, Euerby MR, Shaw PN, Barrett DA (2010) Chromatographic retention behaviour of n-alkylbenzenes and pentylbenzene structural isomers on porous graphitic carbon and octadecyl-bonded silica studied using molecular modelling and QSRR. J Chromatogr A 1217(44):6987–6993
MacNeil JD (2012) Analytical difficulties facing today’s regulatory laboratories: issues in method validation. Drug Test Anal 4(Suppl 1):17–24. https://doi.org/10.1002/dta.1358
Volta ESL, Gonçalves R, Menezes JC, Ramos A (2021) Analytical method lifecycle management in pharmaceutical industry: a review. AAPS PharmSciTech 22(3):128. https://doi.org/10.1208/s12249-021-01960-9
Yang W, Qian W, Yuan Z, Chen B (2022) Perspectives on the flexibility analysis for continuous pharmaceutical manufacturing processes. Chin J Chem Eng 41:29–41. https://doi.org/10.1016/j.cjche.2021.12.005
Akash MSH, Rehman K (2020) Introduction to pharmaceutical analysis. In: Akash MSH, Rehman K (eds) Essentials of pharmaceutical analysis. Springer Nature Singapore, Singapore, pp 1–18
Cadinoska M, Popstefanova N, Ilievska M, Karadzinska E, Jovanoska M, Glavas Dodov M (2019) Trending and out-of-trend results in pharmaceutical industry. Maced Pharm Bull 65:39–60. https://doi.org/10.33320/maced.pharm.bull.2019.65.01.005
Appleton T, Bryan P, Contos D et al (2012) Nonclinical dose formulation: out of specification investigations. AAPS J 14(3):523–529. https://doi.org/10.1208/s12248-012-9347-4
Martinez Calatayud J (2005) Spectrophotometry | pharmaceutical applications. In: Worsfold P, Townshend A, Poole C (eds) Encyclopedia of analytical science, 2nd edn. Elsevier, Oxford, pp 373–383
Simundic AM, Lippi G (2012) Preanalytical phase–a continuous challenge for laboratory professionals. Biochem Med 22(2):145–149. https://doi.org/10.11613/bm.2012.017
Paré GKS (2017) Handbook of ehealth evaluation: an evidence-based approach. University of Victoria, Victoria
Redrup MJ, Igarashi H, Schaefgen J et al (2016) Sample management: recommendation for best practices and harmonization from the global bioanalysis consortium harmonization team. AAPS J 18(2):290–293. https://doi.org/10.1208/s12248-016-9869-2
Piskunov DP, Danilova LA, Pushkin AS, Rukavishnikova SA (2020) Influence of exogenous and endogenous factors on the quality of the preanalytical stage of laboratory tests (review of literature). Klin Lab Diagn 65(12):778–784. https://doi.org/10.18821/0869-2084-2020-65-12-778-784
Krčmová LK, Melichar B, Švec F (2020) Chromatographic methods development for clinical practice: requirements and limitations. Clin Chem Lab Med 58(11):1785–1793. https://doi.org/10.1515/cclm-2020-0517
Patil R, Bhaskar R, Ola M, Pingale D, Chalikwar SS (2019) Bioanalytical method development and method validation in human plasma by using LC MS/MS
Khamis MM, Adamko DJ, El-Aneed A (2021) Strategies and challenges in method development and validation for the absolute quantification of endogenous biomarker metabolites using liquid chromatography-tandem mass spectrometry. Mass Spectrom Rev 40(1):31–52. https://doi.org/10.1002/mas.21607
Ragoisha G (2020) Challenge for electrochemical impedance spectroscopy in the dynamic world. J Solid State Electrochem 24:2171–2172
Pierce KM, Trinklein TJ, Nadeau JS, Synovec RE (2021) Chapter 20 - data analysis methods for gas chromatography. In: Poole CF (ed) Gas chromatography, 2nd edn. Elsevier, Amsterdam, pp 525–546
Oliveri P, Malegori C, Simonetti R, Casale M (2019) The impact of signal pre-processing on the final interpretation of analytical outcomes - a tutorial. Anal Chim Acta 1058:9–17. https://doi.org/10.1016/j.aca.2018.10.055
Cobas C (2020) NMR signal processing, prediction, and structure verification with machine learning techniques. Magn Reson Chem 58(6):512–519
Ito K, Xu X, Kikuchi J (2021) Improved prediction of carbonless NMR spectra by the machine learning of theoretical and fragment descriptors for environmental mixture analysis. Anal Chem 93(18):6901–6906
Kern S, Liehr S, Wander L, Bornemann-Pfeiffer M, Müller S, Maiwald M, Kowarik S (2020) Artificial neural networks for quantitative online NMR spectroscopy. Anal Bioanal Chem 412(18):4447–4459
Jonas E, Kuhn S (2019) Rapid prediction of NMR spectral properties with quantified uncertainty. J Cheminf 11(1):1–7
Jia W, Yang Z, Yang M, Cheng L, Lei Z, Wang X (2021) Machine learning enhanced spectrum recognition based on computer vision (SRCV) for intelligent NMR data extraction. J Chem Inf Model 61(1):21–25. https://doi.org/10.1021/acs.jcim.0c01046
Lin M, Xiong J, Su M et al (2022) A machine learning protocol for revealing ion transport mechanisms from dynamic NMR shifts in paramagnetic battery materials. Chem Sci 13(26):7863–7872. https://doi.org/10.1039/D2SC01306A
Chemaxon (2021) Chemaxon software solution service for chemistry and biology. Accessed May 2021 from https://chemaxon.com/products/calculators-and-predictors
Mamede R, Pereira F, Aires-de-Sousa J (2021) Machine learning prediction of UV–vis spectra features of organic compounds related to photoreactive potential. Sci Rep 11(1):23720. https://doi.org/10.1038/s41598-021-03070-9
Chan B, Hirao K (2020) Rapid prediction of ultraviolet-visible spectra from conventional (non-time-dependent) density functional theory calculations. J Phys Chem Lett 11(18):7882–7885. https://doi.org/10.1021/acs.jpclett.0c02146
Urbina F, Batra K, Luebke KJ et al (2021) UV-advisor: attention-based recurrent neural networks to predict UV–vis spectra. Anal Chem 93(48):16076–16085. https://doi.org/10.1021/acs.analchem.1c03741
ORCA (2021) Orca. Basis sets. Orca input library. Accessed May 2021 from https://sites.google.com/site/orcainputlibrary/basis-sets
(2021) UV-vis spectroscopy orca tutorials 5.0 documentation. Accessed May 2021 from https://www.orcasoftware.de/tutorials_orca/spec/uvvis.html
Orca A (2021) Avogadro orca: an open-source molecular builder and visualization tool, version 4.2. Accessed May 2021 from https://avogadro.cc/
Neese F (2012) The ORCA program system. Wiley Interdiscip Rev Comput Mol Sci 2(1):73–78
Avogadro (2021) Avogadro program manual. Accessed Jun 2021 from https://avogadro.cc/docs
Neese F, Wennmohs F, Becker U, Riplinger C (2020) The ORCA quantum chemistry program package. J Chem Phys 152(22):224108. https://doi.org/10.1063/5.0004608
Stanstrup J, Neumann S, Vrhovšek U (2015) PredRet: prediction of retention time by direct mapping between multiple chromatographic systems. Anal Chem 87(18):9421–9428. https://doi.org/10.1021/acs.analchem.5b02287
McGill C, Forsuelo M, Guan Y, Green WH (2021) Predicting infrared spectra with message passing neural networks. J Chem Inf Model 61(6):2594–2609. https://doi.org/10.1021/acs.jcim.1c00055
Maciel EVS, Pereira Dos Santos NG, Vargas Medina DA, Lanças FM (2022) Electron ionization mass spectrometry: Quo vadis? Electrophoresis 43(15):1587–1600. https://doi.org/10.1002/elps.202100392
Smith RW (2013) Mass spectrometry. In: Siegel JA, Saukko PJ, Houck MM (eds) Encyclopedia of forensic sciences, 2nd edn. Academic Press, Waltham, pp 603–608
Bauer CA, Grimme S (2016) How to compute electron ionization mass spectra from first principles. J Phys Chem A 120(21):3755–3766. https://doi.org/10.1021/acs.jpca.6b02907
Zhou Z, Zare RN (2017) Personal information from latent fingerprints using desorption electrospray ionization mass spectrometry and machine learning. Anal Chem 89(2):1369–1372. https://doi.org/10.1021/acs.analchem.6b04498
Wei JN, Belanger D, Adams RP, Sculley D (2019) Rapid prediction of electron-ionization mass spectrometry using neural networks. ACS Cent Sci 5(4):700–708. https://doi.org/10.1021/acscentsci.9b00085
Wang S, Kind T, Tantillo DJ, Fiehn O (2020) Predicting in silico electron ionization mass spectra using quantum chemistry. J Cheminform 12(1):63. https://doi.org/10.1186/s13321-020-00470-3
Allen F, Pon A, Greiner R, Wishart D (2016) Computational prediction of electron ionization mass spectra to assist in GC/MS compound identification. Anal Chem 88(15):7689–7697. https://doi.org/10.1021/acs.analchem.6b01622
Wang F, Liigand J, Tian S, Arndt D, Greiner R, Wishart DS (2021) CFM-ID 4.0: more accurate ESI-MS/MS spectral prediction and compound identification. Anal Chem 93(34):11692–11700. https://doi.org/10.1021/acs.analchem.1c01465
Kannan R, Solaimalai A, Jayakumar M, Surendran U (2022) Chapter 26 - advance molecular tools to detect plant pathogens. In: Rakshit A, Meena VS, Abhilash PC, Sarma BK, Singh HB, Fraceto L, Parihar M, Singh AK (eds) Biopesticides. Woodhead Publishing, pp 401–416
Zhu G, Bian Y, Hursthouse AS et al (2017) Application of 3-D fluorescence: characterization of natural organic matter in natural water and water purification systems. J Fluoresc 27(6):2069–2094. https://doi.org/10.1007/s10895-017-2146-7
Tomasi J, Mennucci B, Cammi R (2005) Quantum mechanical continuum solvation models. Chem Rev 105(8):2999–3093. https://doi.org/10.1021/cr9904009
Shavitt I, Bartlett RJ (2009) Many-body methods in chemistry and physics: MBPT and coupled-cluster theory
Pavošević F, Hammes-Schiffer S (2019) Multicomponent equation-of-motion coupled cluster singles and doubles: theory and calculation of excitation energies for positronium hydride. J Chem Phys 150(16):161102. https://doi.org/10.1063/1.5094035
Caricato M (2012) Absorption and emission spectra of solvated molecules with the EOM–CCSD–PCM method. J Chem Theory Comput 8(11):4494–4502. https://doi.org/10.1021/ct3006997
Powell J, Heider EC, Campiglia A, Harper JK (2016) Predicting accurate fluorescent spectra for high molecular weight polycyclic aromatic hydrocarbons using density functional theory. J Mol Spectrosc 328:37–45. https://doi.org/10.1016/j.jms.2016.06.015
Ye Z-R, Huang I-S, Chan Y-T et al (2020) Predicting the emission wavelength of organic molecules using a combinatorial QSAR and machine learning approach. RSC Adv 10:23834–23841
Shams-Nateri A, Piri N (2016) Prediction of emission spectra of fluorescence materials using principal component analysis. Color Res Appl 41(1):16–21. https://doi.org/10.1002/col.21959
Mai S, Atkins AJ, Plasser F, González L (2019) The influence of the electronic structure method on intersystem crossing dynamics. The case of thioformaldehyde. J Chem Theory Comput 15(6):3470–3480
Ohto T, Dodia M, Xu J et al (2019) Accessing the accuracy of density functional theory through structure and dynamics of the water-air interface. J Phys Chem Lett 10(17):4914–4919. https://doi.org/10.1021/acs.jpclett.9b01983
Wang S, Zhang J, Gharbi O, Vivier V, Gao M, Orazem ME (2021) Electrochemical impedance spectroscopy. Nat Rev Methods Prim 1:41
Magar HS, Hassan RYA, Mulchandani A (2021) Electrochemical impedance spectroscopy (EIS): Principles, construction, and biosensing applications. Sensors. https://doi.org/10.3390/s21196578
Krukiewicz K (2020) Electrochemical impedance spectroscopy as a versatile tool for the characterization of neural tissue: a mini review. Electrochem Commun 116:106742. https://doi.org/10.1016/j.elecom.2020.106742
Pasqual JAR, Freisleben LC, Colpo JC, Egea JRJ, dos Santos LAL, de Sousa VC (2021) In situ drug release measuring in α-TCP cement by electrochemical impedance spectroscopy. J Mater Sci Mater Med 32(4):38. https://doi.org/10.1007/s10856-021-06507-9
Heijne A et al (2018) Quantification of bio-anode capacitance in bioelectrochemical systems using electrochemical impedance spectroscopy. J Power Sources 400:533
Vadhva P, Hu JX, Johnson MJ, Stocker R, Braglia M, Brett DJL, Rettie AJE (2021) Electrochemical impedance spectroscopy for all-solid-state batteries: theory, methods and future outlook. ChemElectroChem 8(11):1930–1947
Meyer Q, Zeng Y, Zhao C (2019) Electrochemical impedance spectroscopy of catalyst and carbon degradations in proton exchange membrane fuel cells. J Power Sources 437:226922
Pajkossy T, Jurczakowski R (2017) Electrochemical impedance spectroscopy in interfacial studies. Curr Opin Electrochem 1(1):53–58. https://doi.org/10.1016/j.coelec.2017.01.006
Maradesa A, Py B, Quattrocchi E, Ciucci F (2022) The probabilistic deconvolution of the distribution of relaxation times with finite Gaussian processes. Electrochim Acta 413:140119. https://doi.org/10.1016/j.electacta.2022.140119
Schulz E, Speekenbrink M, Krause A (2018) A tutorial on Gaussian process regression: modelling, exploring, and exploiting functions. J Math Psychol 85:1–16. https://doi.org/10.1016/j.jmp.2018.03.001
Liu J, Ciucci F (2020) The Gaussian process distribution of relaxation times: a machine learning tool for the analysis and prediction of electrochemical impedance spectroscopy data. Electrochim Acta 331:135316. https://doi.org/10.1016/j.electacta.2019.135316
Lu Y, Zhao CZ, Huang J, Zhang QK (2022) The timescale identification decoupling complicated kinetic processes in lithium batteries. Joule. 6:1172
Liu J, Wan TH, Ciucci F (2020) A Bayesian view on the Hilbert transform and the Kramers-Kronig transform of electrochemical impedance data: probabilistic estimates and quality scores. Electrochim Acta 357:136864
Ciucci F (2020) The Gaussian process Hilbert transform (GP-HT): testing the consistency of electrochemical impedance spectroscopy data. J Electrochem Soc 167:126503
Kiss FL, Corbet BP, Simeth NA, Feringa BL, Crespi S (2021) Predicting the substituent effects in the optical and electrochemical properties of N,N′-substituted isoindigos. Photochem Photobiol Sci 20(7):927–938. https://doi.org/10.1007/s43630-021-00071-5
Li Y, Ullrich CA (2011) Time-dependent transition density matrix. Chem Phys 391:157–163
Li H-W, Guan Z, Cheng Y, Lui T, Yang Q, Lee C-S, Tsang S-W (2016) On the study of exciton binding energy with direct charge generation in photovoltaic polymers. Adv Electr Mater. https://doi.org/10.1002/aelm.201600200
Albuquerque LS, Arias JJR, Santos BPS, Marques M et al (2020) Synthesis and characterization of novel conjugated copolymers for application in third generation photovoltaic solar cells. J Mater Res Technol 9:7975–7988
Min K, Choi B, Park K, Cho E (2018) Machine learning assisted optimization of electrochemical properties for Ni-rich cathode materials. Sci Rep 8(1):15778. https://doi.org/10.1038/s41598-018-34201-4
Min K, Park K, Park SY, Seo S-W, Choi B, Cho E (2017) Improved electrochemical properties of LiNi0.91Co0.06Mn0.03O2 cathode material via Li-reactive coating with metal phosphates. Sci Rep 7(1):7151. https://doi.org/10.1038/s41598-017-07375-6
Jalili-Jahani N, Zeraatkar E (2021) Fuzzy wavelet network based on extended Kalman filter training algorithm combined with least square weight estimation: efficient and improved chromatographic QSRR/QSPR models. Chemom Intell Lab Syst 208:104191. https://doi.org/10.1016/j.chemolab.2020.104191
Naylor BC, Catrow JL, Maschek JA, Cox JE (2020) QSRR Automator: a tool for automating retention time prediction in lipidomics and metabolomics. Metabolites 10(6):237
Wen Y, Amos RIJ, Talebi M, Szucs R, Dolan JW, Pohl CA, Haddad PR (2018) Retention index prediction using quantitative structure-retention relationships for improving structure identification in nontargeted metabolomics. Anal Chem 90(15):9434–9440. https://doi.org/10.1021/acs.analchem.8b02084
Daghir-Wojtkowiak E, Studzińska S, Buszewski B, Kaliszan R, Markuszewski MJ (2014) Quantitative structure–retention relationships of ionic liquid cations in characterization of stationary phases for HPLC. Anal Methods 6(4):1189–1196. https://doi.org/10.1039/C3AY41805G
Goryński K, Bojko B, Nowaczyk A, Buciński A, Pawliszyn J, Kaliszan R (2013) Quantitative structure-retention relationships models for prediction of high performance liquid chromatography retention time of small molecules: endogenous metabolites and banned compounds. Anal Chim Acta 797:13–19. https://doi.org/10.1016/j.aca.2013.08.025
Bodzioch K, Durand A, Kaliszan R, Baczek T, Vander Heyden Y (2010) Advanced QSRR modeling of peptides behavior in RPLC. Talanta 81(4–5):1711–1718. https://doi.org/10.1016/j.talanta.2010.03.028
Filipic S, Nikolic K, Krizman M, Danica A (2008) The quantitative structure–retention relationship (QSRR) analysis of some centrally acting antihypertensives and diuretics. QSAR Comb Sci 27:1036–1044. https://doi.org/10.1002/qsar.200710161
Bahmani A, Saaidpour S, Rostami A (2017) Quantitative structure–retention relationship modeling of morphine and its derivatives on OV-1 column in gas–liquid chromatography using genetic algorithm. Chromatographia 80(4):629–636. https://doi.org/10.1007/s10337-017-3273-7
Zhang DX, Si HZ, Liu X (2014) Quantitative structure-retention time relationship for retention time of coffee flavor compounds. Adv Mater Res 926–930:1010–1013. https://doi.org/10.4028/www.scientific.net/AMR.926-930.1010
Paritala J, Peraman R, Kondreddy VK, Subrahmanyam CVS, Ravichandiran V (2021) Quantitative structure retention relationship (QSRR) approach for assessment of chromatographic behavior of antiviral drugs in the development of liquid chromatographic method. J Liq Chromatogr Relat Technol 44(13–14):637–648. https://doi.org/10.1080/10826076.2022.2025827
Akbar J, Iqbal S, Batool F, Karim A, Chan KW (2012) Predicting retention times of naturally occurring phenolic compounds in reversed-phase liquid chromatography: a quantitative structure-retention relationship (QSRR) approach. Int J Mol Sci 13(11):15387–15400. https://doi.org/10.3390/ijms131115387
Parinet J (2021) Prediction of pesticide retention time in reversed-phase liquid chromatography using quantitative-structure retention relationship models: a comparative study of seven molecular descriptors datasets. Chemosphere 275:130036. https://doi.org/10.1016/j.chemosphere.2021.130036
Maljurić N, Golubović J, Otašević B, Zečević M, Protić A (2018) Quantitative structure–retention relationship modeling of selected antipsychotics and their impurities in green liquid chromatography using cyclodextrin mobile phases. Anal Bioanal Chem 410(10):2533–2550. https://doi.org/10.1007/s00216-018-0911-3
Ji C, Li Y et al (2009) Quantitative structure-retention relationships for mycotoxins and fungal metabolites in LC-MS/MS. J Sep Sci 32:3967–3979. https://doi.org/10.1002/jssc.200900441
Szucs R, Brown R, Brunelli C, Heaton JC, Hradski J (2021) Structure driven prediction of chromatographic retention times: applications to pharmaceutical analysis. Int J Mol Sci 22(8):3848
Duarte A, Capelo S (2006) Application of chemometrics in separation science. J Liq Chromatogr Relat Technol 29(7–8):1143–1176
Bos TS, Knol WC, Molenaar SR, Niezen LE, Schoenmakers PJ, Somsen GW, Pirok BW (2020) Recent applications of chemometrics in one-and two-dimensional chromatography. J Sep Sci 43(9–10):1678–1727
Komsta Ł (2012) Chemometrics in fingerprinting by means of thin layer chromatography. Chromatogr Res Int 2012:893246. https://doi.org/10.1155/2012/893246
BWIQ (2021) Chemometric tool BWIQ-software package. Accessed Jun 2021 from https://bwtek.com/support and https://bwtek.com/videos-applications
Whittaker ET (1922) On a new method of graduation. Proc Edinb Math Soc 41:63–75
Suzuki T, Yoshida N (2020) Penalized least squares approximation methods and their applications to stochastic processes. Jpn J Stat Data Sci 3(2):513–541. https://doi.org/10.1007/s42081-019-00064-w
Carlos Cobas J, Bernstein MA, Martín-Pastor M, Tahoces PG (2006) A new general-purpose fully automatic baseline-correction procedure for 1d and 2d NMR data. J Magn Reson 183(1):145–151. https://doi.org/10.1016/j.jmr.2006.07.013
Zhang Z, Chen S, Liang Y et al (2010) An intelligent background-correction algorithm for highly fluorescent samples in Raman spectroscopy. J Raman Spectrosc 41:659–669
Eilers PH (2003) A perfect smoother. Anal Chem 75(14):3631–3636. https://doi.org/10.1021/ac034173t
Zhang ZM, Chen S, Liang YZ (2010) Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst 135(5):1138–1146. https://doi.org/10.1039/b922045c
Baek SJ, Park A, Ahn YJ, Choo J (2015) Baseline correction using asymmetrically reweighted penalized least squares smoothing. Analyst 140(1):250–257. https://doi.org/10.1039/c4an01061b
Chen L, Wu Y, Li T, Chen Z (2018) Collaborative penalized least squares for background correction of multiple Raman spectra. J Anal Methods Chem 2018:9031356. https://doi.org/10.1155/2018/9031356
Pérez Y, Casado M, Raldúa D et al (2020) MCR-ALS analysis of 1H NMR spectra by segments to study the zebrafish exposure to acrylamide. Anal Bioanal Chem 412(23):5695–5706. https://doi.org/10.1007/s00216-020-02789-0
Felten J, Hall H, Jaumot J, Tauler R, de Juan A, Gorzsás A (2015) Vibrational spectroscopic image analysis of biological material using multivariate curve resolution-alternating least squares (MCR-ALS). Nat Protoc 10(2):217–240. https://doi.org/10.1038/nprot.2015.008
Smith JP, Holahan EC, Smith FC, Marrero V, Booksh KS (2019) A novel multivariate curve resolution-alternating least squares (MCR-ALS) methodology for application in hyperspectral Raman imaging analysis. Analyst 144(18):5425–5438. https://doi.org/10.1039/C9AN00787C
Nagai Y, Sohn WY, Katayama K (2019) An initial estimation method using cosine similarity for multivariate curve resolution: application to NMR spectra of chemical mixtures. Analyst 144:5986
Pearson K (1901) LIII. On lines and planes of closest fit to systems of points in space. Philos Magaz J Sci 2(11):559–572
Cserháti T (2010) Data evaluation in chromatography by principal component analysis. Biomed Chromatogr 24(1):20–28. https://doi.org/10.1002/bmc.1294
Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemom Intell Lab Syst 2(1–3):37–52
Pang T, Zhang H, Wen L et al (2021) Quantitative analysis of a weak correlation between complicated data on the basis of principal component analysis. J Anal Methods Chem 2021:8874827. https://doi.org/10.1155/2021/8874827
Soares EJ, Clifford AJ, Brown CD, Dean RR, Hupp AM (2019) Balancing resolution with analysis time for biodiesel–diesel fuel separations using GC, PCA, and the Mahalanobis distance. Separations 6(2):28
Sudol PE, Gough DV, Prebihalo SE, Synovec RE (2020) Impact of data bin size on the classification of diesel fuels using comprehensive two-dimensional gas chromatography with principal component analysis. Talanta 206:120239
Cook DW, Rutan SC (2014) Chemometrics for the analysis of chromatographic data in metabolomics investigations. J Chemom 28:681–687
Smilde AK, Bro R, Geladi P (2004) Multi-way analysis with applications in the chemical sciences. John Wiley & Sons
Khakimov B, Amigo JM, Bak S, Engelsen SB (2012) Plant metabolomics: resolution and quantification of elusive peaks in liquid chromatography–mass spectrometry profiles of complex plant extracts using multi-way decomposition methods. J Chromatogr A 1266:84–94. https://doi.org/10.1016/j.chroma.2012.10.023
Kumar K (2019) Optimizing parallel factor (PARAFAC) assisted excitation-emission matrix fluorescence (EEMF) spectroscopic analysis of multifluorophoric mixtures. J Fluoresc 29(3):683–691. https://doi.org/10.1007/s10895-019-02379-z
Kumar K, Kumar Mishra A (2015) Parallel factor (PARAFAC) analysis on total synchronous fluorescence spectroscopy (TSFS) data sets in excitation–emission matrix fluorescence (EEMF) layout: certain practical aspects. Chemom Intell Lab Syst 147:121–130. https://doi.org/10.1016/j.chemolab.2015.08.008
Dramićanin T, Zeković I, Periša J, Dramićanin MD (2019) The parallel factor analysis of beer fluorescence. J Fluoresc 29(5):1103–1111. https://doi.org/10.1007/s10895-019-02421-0
Peng N, Wang K, Tu N, Liu Y, Li Z (2020) Fluorescence regional integration combined with parallel factor analysis to quantify fluorescencent spectra for dissolved organic matter released from manure biochars. RSC Adv 10(52):31502–31510. https://doi.org/10.1039/D0RA02706E
Barker M, Rayens W (2003) Partial least squares for discrimination. J Chemom 17(3):166–173. https://doi.org/10.1002/cem.785
Ruiz-Perez D, Guan H, Madhivanan P, Mathee K, Narasimhan G (2020) So you think you can PLS-DA? BMC Bioinf 21(1):2. https://doi.org/10.1186/s12859-019-3310-7
Bayci AWL, Baker DA, Somerset AE et al (2018) Metabolomic identification of diagnostic serum-based biomarkers for advanced stage melanoma. Metabolomics 14(8):105. https://doi.org/10.1007/s11306-018-1398-9
Dittgen CL, Hoffmann JF, Chaves FC, Rombaldi CV, Filho JMC, Vanier NL (2019) Discrimination of genotype and geographical origin of black rice grown in Brazil by LC-MS analysis of phenolics. Food Chem 288:297–305. https://doi.org/10.1016/j.foodchem.2019.03.006
Tsanaktsidou E, Karavasili C, Zacharis CK, Fatouros DG, Markopoulou CK (2020) Partial least square model (PLS) as a tool to predict the diffusion of steroids across artificial membranes. Molecules. https://doi.org/10.3390/molecules25061387
Wang X, Wang P, Zhang A, Sun H (2015) Chapter 11 - metabolic profiling and potential biomarkers of Shenyinxu syndrome and the therapeutic effect of liuweidihuang wan. In: Wang X, Zhang A, Sun H (eds) Chinmedomics. Academic Press, Boston, pp 175–194
Zhang J, Yu Q, Cheng H, Ge Y-Q, Liu H, Ye X, Chen Y (2018) Metabolomic approach for the authentication of berry fruit juice by liquid chromatography quadrupole time-of-flight mass spectrometry coupled to chemometrics. J Agric Food Chem 66(30):8199–8208
Cao S, Du H, Tang B, Xi C, Chen Z (2021) Non-target metabolomics based on high-resolution mass spectrometry combined with chemometric analysis for discriminating geographical origins of Rhizoma Coptidis. Microchem J 160:105685. https://doi.org/10.1016/j.microc.2020.105685
Pisner DA, Schnyer DM (2020) Chapter 6 - support vector machine. In: Mechelli A, Vieira S (eds) Machine learning. Academic Press, pp 101–121
Zheng S (2016) Smoothly approximated support vector domain description. Pattern Recogn 49:55–64. https://doi.org/10.1016/j.patcog.2015.07.003
Képeš E, Vrábel J, Adamovsky O, Střítežská S, Modlitbová P, Pořízka P, Kaiser J (2022) Interpreting support vector machines applied in laser-induced breakdown spectroscopy. Anal Chim Acta 1192:339352. https://doi.org/10.1016/j.aca.2021.339352
Khanmohammadi Khorrami M et al (2021) Genetic algorithm based support vector machine regression for prediction of SARA analysis in crude oil samples using ATR-FTIR spectroscopy. Spectrochim Acta A Mol Biomol Spectrosc 245:118945. https://doi.org/10.1016/j.saa.2020.118945
Hwang H, Jeong HK, Lee HK et al (2020) Machine learning classifies core and outer fucosylation of N-glycoproteins using mass spectrometry. Sci Rep 10(1):318. https://doi.org/10.1038/s41598-019-57274-1
Usman AG et al (2020) Artificial intelligence-based models for the qualitative and quantitative prediction of a phytochemical compound using HPLC method. Turk J Chem 44(5):1339–1351. https://doi.org/10.3906/kim-2003-6
Mendez KM, Broadhurst DI, Reinke SN (2020) Migrating from partial least squares discriminant analysis to artificial neural networks: a comparison of functionally equivalent visualisation and feature contribution tools using Jupyter notebooks. Metabolomics 16(2):17. https://doi.org/10.1007/s11306-020-1640-0
D’Archivio AA, Incani A, Ruggieri F (2011) Retention modelling of polychlorinated biphenyls in comprehensive two-dimensional gas chromatography. Anal Bioanal Chem 399:903–913
Malenović A, Jančić-Stojanović BS, Kostić N, Ivanović D, Medenica M (2011) Optimization of artificial neural networks for modeling of atorvastatin and its impurities retention in micellar liquid chromatography. Chromatographia 73:993–998
Xu Y, Chen J, Yang D, Hu Y et al (2021) Development of LC-MS/MS determination method and backpropagation artificial neural networks pharmacokinetic model of febuxostat in healthy subjects. J Clin Pharm Ther 46(2):333–342. https://doi.org/10.1111/jcpt.13285
Huang S, Liu Y, Sun X, Li J (2021) Application of artificial neural network based on traditional detection and GC-MS in prediction of free radicals in thermal oxidation of vegetable oil. Molecules. https://doi.org/10.3390/molecules26216717
Mert Ozupek N, Cavas L (2021) Modelling of multilinear gradient retention time of bio-sweetener rebaudioside A in HPLC analysis. Anal Biochem 627:114248. https://doi.org/10.1016/j.ab.2021.114248
Dinç E, Yücesoy C, Onur F (2002) Simultaneous spectrophotometric determination of mefenamic acid and paracetamol in a pharmaceutical preparation using ratio spectra derivative spectrophotometry and chemometric methods. J Pharm Biomed Anal 28(6):1091–1100. https://doi.org/10.1016/S0731-7085(02)00031-6
Dinç E (2003) Linear regression analysis and its application to the multivariate spectral calibrations for the multiresolution of a ternary mixture of caffeine, paracetamol and metamizol in tablets. J Pharm Biomed Anal 33(4):605–615. https://doi.org/10.1016/s0731-7085(03)00260-7
Niazi A, Goodarzi M (2008) Orthogonal signal correction-partial least squares method for simultaneous spectrophotometric determination of cypermethrin and tetramethrin. Spectrochim Acta A Mol Biomol Spectrosc 69(4):1165–1169. https://doi.org/10.1016/j.saa.2007.06.017
Dinç E, Baydan E, Kanbur M, Onur F (2002) Spectrophotometric multicomponent determination of sunset yellow, tartrazine and allura red in soft drink powder by double divisor-ratio spectra derivative, inverse least-squares and principal component regression methods. Talanta 58(3):579–594. https://doi.org/10.1016/S0039-9140(02)00320-X
Chen K, Park J, Li F, Patil SM, Keire DA (2018) Chemometric methods to quantify 1D and 2D NMR spectral differences among similar protein therapeutics. AAPS PharmSciTech 19(3):1011–1019. https://doi.org/10.1208/s12249-017-0911-1
Wasim M, Brereton RG (2005) Application of multivariate curve resolution methods to on-flow LC-NMR. J Chromatogr A 1096(1–2):2–15. https://doi.org/10.1016/j.chroma.2005.05.101
Gargallo R, Tauler R, Cuesta-Sánchez F, Massart DL (1996) Validation of alternating least-squares multivariate curve resolution for chromatographic resolution and quantitation. TrAC Trends Anal Chem 15(7):279–286. https://doi.org/10.1016/0165-9936(96)00048-9
Peré-Trepat E, Lacorte S, Tauler R (2005) Solving liquid chromatography mass spectrometry coelution problems in the analysis of environmental samples by multivariate curve resolution. J Chromatogr A 1096(1):111–122. https://doi.org/10.1016/j.chroma.2005.04.089
Bylund D, Danielsson R, Malmquist G, Markides KE (2002) Chromatographic alignment by warping and dynamic programming as a pre-processing tool for PARAFAC modelling of liquid chromatography–mass spectrometry data. J Chromatogr A 961(2):237–244. https://doi.org/10.1016/S0021-9673(02)00588-5
Deng X, Liao Q, Xu X et al (2014) Analysis of essential oils from cassia bark and cassia twig samples by GC-MS combined with multivariate data analysis. Food Anal Methods 7(9):1840–1847. https://doi.org/10.1007/s12161-014-9821-y
Fraga CG (2003) Chemometric approach for the resolution and quantification of unresolved peaks in gas chromatography–selected-ion mass spectrometry data. J Chromatogr A 1019(1–2):31–42. https://doi.org/10.1016/s0021-9673(03)01329-3
Lukitaningsih E et al (2012) Quantitative analysis of lard in cosmetic lotion formulation using FTIR spectroscopy and partial least square calibration. J Am Oil Chem Soc 89(8):1537–1543. https://doi.org/10.1007/s11746-012-2052-8
Rohman A, Che Man YB (2011) Application of Fourier transform infrared (FT-IR) spectroscopy combined with chemometrics for authentication of cod-liver oil. Vib Spectrosc 55(2):141–145. https://doi.org/10.1016/j.vibspec.2010.10.001
Yan F, Liang Z, Jianna C, Zhengtao W, Losahan X, Zhengxing Z (2001) Analysis of Cnidium monnieri fruits in different regions of China. Talanta 53(6):1155–1162. https://doi.org/10.1016/S0039-9140(00)00594-4
Comas E, Gimeno RA, Ferré J, Marcé RM, Borrull F, Rius FX (2004) Quantification from highly drifted and overlapped chromatographic peaks using second-order calibration methods. J Chromatogr A 1035(2):195–202. https://doi.org/10.1016/j.chroma.2004.02.069
Detroyer A, Schoonjans V, Questier F, Vander Heyden Y, Borosy AP, Guo Q, Massart DL (2000) Exploratory chemometric analysis of the classification of pharmaceutical substances based on chromatographic data. J Chromatogr A 897(1):23–36. https://doi.org/10.1016/S0021-9673(00)00803-7
Lu X-F, Bi K-S, Zhao X, Chen X-H (2012) Authentication and distinction of Shenmai injection with HPLC fingerprint analysis assisted by pattern recognition techniques. J Pharm Anal 2(5):327–333. https://doi.org/10.1016/j.jpha.2012.07.009
Abdollahi H, Nazari F (2003) Rank annihilation factor analysis for spectrophotometric study of complex formation equilibria. Anal Chim Acta 486:109–123. https://doi.org/10.1016/S0003-2670(03)00471-9
Kong W-J, Zhao Y-L, Xiao X-H, Jin C, Li Z-L (2009) Quantitative and chemical fingerprint analysis for quality control of Rhizoma Coptidischinensis based on UPLC-PAD combined with chemometrics methods. Phytomedicine 16(10):950–959. https://doi.org/10.1016/j.phymed.2009.03.016
O’Connell M-L, Ryder AG, Leger MN, Howley T (2010) Qualitative analysis using Raman spectroscopy and chemometrics: a comprehensive model system for narcotics analysis. Appl Spectrosc 64(10):1109–1121
Porfire A, Tomuta I, Tefas L, Leucuta SE, Achim M (2012) Simultaneous quantification of l-α-phosphatidylcholine and cholesterol in liposomes using near infrared spectrometry and chemometry. J Pharm Biomed Anal 63:87–94. https://doi.org/10.1016/j.jpba.2012.01.017
Candolfi A, De Maesschalck R, Massart DL, Hailey PA, Harrington AC (1999) Identification of pharmaceutical excipients using NIR spectroscopy and SIMCA. J Pharm Biomed Anal 19(6):923–935. https://doi.org/10.1016/s0731-7085(98)00234-9
Rodríguez Cáceres MI, Durán Merás I, Ornelas Soto NE, López de Alba PL, López Martinez L (2008) Determination of anticarcinogenic and rescue therapy drugs in urine by photoinduced spectrofluorimetry using multivariate calibration: comparison of several second-order methods. Anal Bioanal Chem 391(4):1119–1127. https://doi.org/10.1007/s00216-008-2069-x
Ter Horst JP, Turimella SL, Metsers F, Zwiers A (2021) Implementation of quality by design (QbD) principles in regulatory dossiers of medicinal products in the European Union (EU) between 2014 and 2019. Ther Innov Regul Sci 55(3):583–590. https://doi.org/10.1007/s43441-020-00254-9
Gurrala S, Raj S, Cvs S, Anumolu PD (2022) Quality-by-design approach for chromatographic analysis of metformin, empagliflozin and linagliptin. J Chromatogr Sci 60(1):68–80
Zacharis CK, Vastardi E (2018) Application of analytical quality by design principles for the determination of alkyl p-toluenesulfonates impurities in aprepitant by HPLC. Validation using total-error concept. J Pharm Biomed Anal 150:152–161
Almeida J, Bezerra M, Markl D, Berghaus A, Borman P, Schlindwein W (2020) Development and validation of an in-line API quantification method using AQbD principles based on UV-vis spectroscopy to monitor and optimise continuous hot melt extrusion process. Pharmaceutics 12(2):150
Žigart N, Časar Z (2020) Development of a stability-indicating analytical method for determination of venetoclax using AQbD principles. ACS Omega 5(28):17726–17742
Bandopadhyay S, Beg S, Katare O, Sharma T, Singh B (2020) Integrated analytical quality by design (AQbD) approach for the development and validation of bioanalytical liquid chromatography method for estimation of valsartan. J Chromatogr Sci 58(7):606–621
Bossunia MTI, Urmi KF, Chironjit Kumar S (2017) Quality-by-design approach to stability indicating RP-HPLC analytical method development for estimation of canagliflozin API and its validation. Pharm Methods 8:2
Peraman R, Bhadraya K, Reddy YP, Reddy CS, Lokesh T (2015) Analytical quality by design approach in RP-HPLC method development for the assay of etofenamate in dosage forms. Indian J Pharm Sci 77(6):751–757. https://doi.org/10.4103/0250-474x.174971
Tumpa A, Stajić A, Jančić-Stojanović B, Medenica M (2017) Quality by design in the development of hydrophilic interaction liquid chromatography method with gradient elution for the analysis of olanzapine. J Pharm Biomed Anal 134:18–26
Peraman R, Kalva B, Shanka S, Reddy YP (2014) Analytical quality by design (AQbD) approach to liquid chromatographic method for quantification of acyclovir and hydrocortisone in dosage forms. Anal Chem Lett 4(5–6):329–342
Peraman R, Bandi J, Kondreddy VK et al (2021) Analytical quality by design approach versus conventional approach: development of HPLC-DAD method for simultaneous determination of etizolam and propranolol hydrochloride. J Liq Chromatogr Relat Technol 44(3–4):197–209
Palakurthi AK, Dongala T, Katakam LNR (2020) QbD based development of HPLC method for simultaneous quantification of telmisartan and hydrochlorothiazide impurities in tablets dosage form. Pract Lab Med 21:e00169
Krishna MV, Dash RN, Reddy BJ, Venugopal P, Sandeep P, Madhavi G (2016) Quality by design (QbD) approach to develop HPLC method for eberconazole nitrate: application oxidative and photolytic degradation kinetics. J Saudi Chem Soc 20:S313–S322
Thakur D, Jain A, Ghoshal G, Shivhare U, Katare O (2017) RP-HPLC method development using analytical QbD approach for estimation of cyanidin-3-O-glucoside in natural biopolymer based microcapsules and tablet dosage form. J Pharm Investig 47(5):413–427
Sandhu PS, Beg S, Kumar R, Katare O, Singh B (2017) Analytical QbD-based systematic bioanalytical HPLC method development for estimation of quercetin dihydrate. J Liq Chromatogr Relat Technol 40(10):506–516
Alruwaili NK (2021) Analytical quality by design approach of reverse-phase high-performance liquid chromatography of atorvastatin: method development, optimization, validation, and the stability-indicated method. Int J Anal Chem. https://doi.org/10.1155/2021/8833900
Fink DW (1988) Ivermectin. In: Analytical profiles of drug substances. Elsevier, pp 155–184
Abdel-Moety EM, Al-Khamees HA (1990) Analytical profile of azintamide. In: Analytical profiles of drug substances, vol 18, pp 1–32. Elsevier
Zubair MU, Hassan MM, Al-Meshal IA (1986) Caffeine. In: Analytical profiles of drug substances, vol 15, pp 71–150. Elsevier
Florey K (1979) Aspirin. In: Analytical profiles of drug substances, vol 8, pp 1–46. Elsevier
Townley ER (1979) Griseofulvin. In: Analytical profiles of drug substances, vol 8, pp 219–249. Elsevier
Pitré D (1986) Iodamide. In: Analytical profiles of drug substances, vol 15, pp 337–365. Elsevier
Wishart DS, Knox C, Guo AC, Eisner R, Young N, Gautam B et al (2009) HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res 37(suppl_1):D603–D610
SDBSWeb (2014) SDBSWeb: National Institute of Advanced Industrial Science and Technology. Accessed May 2021 from http://sdbs.riodb.aist.go.jp
Hassan MM, Elazzouny AA (1982) Clofibrate. In: Analytical profiles of drug substances, vol 11, pp 197–224. Elsevier
Piskorik HG (1985) Tripelennamine hydrochloride. In: Analytical profiles of drug substances, vol 14, pp 107–133. Elsevier
Pitrè D (1985) Iopanoic acid. In: Analytical profiles of drug substances, vol 14, pp 181–206. Elsevier
Brittain HG (2001) Malic acid. Profiles Drug Subst Excip Relat Methodol 28:153–195
Al-Badr AA, El-Obeid HA (1988) Analytical profile of primidone. In: Analytical profiles of drug substances, vol 17, pp 749–795. Elsevier
Fairbrother JE (1974) Acetaminophen. In: Analytical profiles of drug substances, vol 3, pp 1–109. Elsevier
Ali SL (1983) Benzocaine. In: Analytical profiles of drug substances, vol 12, pp 73–104. Elsevier
Eisner U, Gore PH (1958) The light absorption of pyrroles. Part I. Ultraviolet spectra. J Chem Soc 186:922–927. https://doi.org/10.1039/JR9580000922
Filpponen I, Sadeghifar H, Argyropoulos DS (2011) Photoresponsive cellulose nanocrystals. Nanomater Nanotechnol 1:7. https://doi.org/10.5772/50949
Arayne MS, Sultana N, Zuberi MH, Siddiqui FA (2009) Spectrophotometric quantitation of metformin in bulk drug and pharmaceutical formulations using multivariate technique. Indian J Pharm Sci 71(3):331–335. https://doi.org/10.4103/0250-474x.56022
Dunn DL, Jones WJ, Dorsey ED (1983) Analysis of chlorobutanol in ophthalmic ointments and aqueous solutions by reverse-phase high-performance liquid chromatography. J Pharm Sci 72(3):277–280. https://doi.org/10.1002/jps.2600720317
PubChem (2022) PubChem database. Accessed May 2021 from https://pubchem.ncbi.nlm.nih.gov
Fairbrother JE (1973) Chloral hydrate. In: Florey K (ed) Analytical profiles of drug substances. Academic Press, pp 85–143
Daley RD (1972) Halothane. In: Florey K (ed) Analytical profiles of drug substances. Academic Press, pp 119–147
Orzech CE, Nash NG, Daley RD (1979) Halcinonide. In: Florey K (ed) Analytical profiles of drug substances. Academic Press, pp 283–314
Stober HC (1986) Lithium carbonate. In: Florey K (ed) Analytical profiles of drug substances. Academic Press, pp 367–391
Chao MKC, Albert KS, Fusari SA (1978) Phenobarbital. In: Florey K (ed) Analytical profiles of drug substances. Academic Press, pp 359–399
Newman AW, Vitez IM, Mueller RL, Kiesnowski CC et al (1999) Sorbitol. In: Brittain HG (ed) Analytical profiles of drug substances and excipients. Academic Press, pp 459–502
Manius GJ (1978) Trimethoprim. In: Florey K (ed) Analytical profiles of drug substances. Academic Press, pp 445–475
Coates J (2000) Interpretation of infrared spectra: a practical approach. John Wiley & Sons Ltd, Chichester
Vilas S, Thilagar S (2021) Formulation and optimisation of lamivudine-loaded Eudragit® S 100 polymer-coated pectin microspheres for colon-specific delivery. IET Nanobiotechnol 15(1):90–99. https://doi.org/10.1049/nbt2.12010
Bansal R, Guleria A, Acharya PC (2013) FT-IR method development and validation for quantitative estimation of zidovudine in bulk and tablet dosage form. Drug Res 63(4):165–170. https://doi.org/10.1055/s-0032-1333297
Anbazhagan S, Indumathy N, Shanmugapandiyan P, Sridhar SK (2005) Simultaneous quantification of stavudine, lamivudine and nevirapine by UV spectroscopy, reverse phase HPLC and HPTLC in tablets. J Pharm Biomed Anal 39(3–4):801–804
Chandramowli B, Kumar BS, Bhikshapathi D, Rajkamal BB (2018) A new quantitative analytical method development and validation for the analysis of boceprevir in bulk and marketed formulation. Int J Pharm Sci Drug Res 10(3):201–205
Snehal DJ, Prafulla SC, Vishal SM (2018) HPLC method development of cidofovir as bulk drug and its formulation. Int J Eng Devel Res 6(2):870–874
Abdul K, Milon M, Mohammad A, Parvin M, Rafiquzzaman M, Kundu S (2017) Development and validation of a new RP-HPLC method for the determination of daclatasvir dihydrochloride in bulk and pharmaceutical dosage forms. Int J Pharm 7:7–13
Satyanarayana L, Naidu S, Rao MN, Ayyanna C, Kumar A (2011) The estimation of raltigravir in tablet dosage form by RP-HPLC. Asian J Pharm Anal 1(3):56–58
Tiruveedhi VBG, Battula VR, Bonige KB (2021) RP-HPLC (stability-indicating) based assay method for the simultaneous estimation of doravirine, tenofovir disoproxil fumarate and lamivudine. Int J Appl Pharm 13(1):153–159
Tejaswi KD et al (2019) Reverse-phase high-performance liquid chromatography method development and validation for simultaneous estimation and forced degradation studies of emtricitabine, rilpivirine, and tenofovir alafenamide in solid dosage form. Asian J Pharm Clin Res 12:112. https://doi.org/10.22159/ajpcr.2018.v12i1.28765
Mastanamma S, Chandini S, Reehana S, Saidulu P (2018) Development and validation of stability indicating RP-HPLC method for the simultaneous estimation of sofosbuvir and ledipasvir in bulk and their combined dosage form. Future J Pharm Sci 4(2):116–123
Dey S, Patro SS, Babu NS, Murthy P, Panda S (2017) Development and validation of a stability-indicating RP–HPLC method for estimation of atazanavir sulfate in bulk. J Pharm Anal 7(2):134–140
Rathnasamy R, Karuvalam R, Pakkath R, Kamalakannan P, Sivasubramanian A (2018) RP-HPLC method development and method validation of lopinavir and ritonavir in pharmaceutical dosage form. Am J Clin Microbiol Antimicrob 1(1):1002
Ezzeldin E, Abo-Talib NF, Tammam MH, Asiri YA, Amr AE-GE, Almehizia AA (2020) Validated reversed-phase liquid chromatographic method with gradient elution for simultaneous determination of the antiviral agents: sofosbuvir, ledipasvir, daclatasvir, and simeprevir in their dosage forms. Molecules 25(20):4611
Ahmad SAR, Patil L, Usman MRM, Imran M, Akhtar R (2018) Analytical method development and validation for the simultaneous estimation of abacavir and lamivudine by reversed-phase high-performance liquid chromatography in bulk and tablet dosage forms. Pharmacognosy Res 10(1):92–97. https://doi.org/10.4103/pr.pr_96_17
Haneef M, Rajkamal B, Goud VM (2013) Development and validation by RP-HPLC method for estimation of zidovudine in bulk and its pharmaceutical dosage form. Asian J Res Chem 6(4):341–344
Godela R, Gummadi S (2021) A simple stability indicating RP-HPLC-DAD method for concurrent analysis of tenofovir disoproxil fumarate, doravirine and lamivudine in pure blend and their combined film coated tablets. Ann Pharm Fr
Harikrishnan N, Prasad VV et al (2019) Stability indicating RP-HPLC method development and validation for the simultaneous estimation of pibrentasvir and glecaprevir in bulk and pharmaceutical dosage form. Int J Res Pharm Sci 10(3):1841–1846
Ramya V et al (2019) Simultaneous estimation of amantadine hydrochloride and oseltamivir phosphate using precolumn derivatization technique. Int J Pharm Sci Res 10(12):5443–5449. https://doi.org/10.13040/IJPSR.0975-8232.10(12).5443-49
García J, Márquez A, Ruiz R, López LF, Claro C, Lucero MJ (2006) Determination of foscarnet (trisodium phosphonoformate) in pharmaceutical preparations by high-performance liquid chromatography with ultraviolet detection. Biomed Chromatogr 20(10):1024–1027
Ramesh P, Basavaiah K, Vinay K, Xavier CM (2012) Development and validation of RP-HPLC method for the determination of ganciclovir in bulk drug and in formulations. Int Schol Res Not. https://doi.org/10.5402/2012/894965
Nekkala K, Kumar JS, Ramachandran D (2019) Stability indicating RP-HPLC method for quantification of impurities in fosamprenavir calcium drug substance. J Pharm Sci Res 11(3):712–717
Bulduk İ (2021) HPLC-UV method for quantification of favipiravir in pharmaceutical formulations. Acta Chromatogr 33(3):209–215
Asha E, Surendra Babu K (2020) A new selective separation method development and validation of trifluridine and tipiracil and its degradents were characterized by LC-MS/MS/QTOF. J Pharm Sci Res 12(1):199–205
Reddy Y, Harika K, Sowjanya K, Swathi E, Soujanya B, Reddy S (2011) Estimation of zanamivir drug present in tablets using RP-HPLC method. Int J PharmTech Res 3:180–186
Dalmora SL, Sangoi M et al (2010) Validation of a stability-indicating RP-HPLC method for the determination of entecavir in tablet dosage form. J AOAC Int 93(2):523–530
Prakash KV, Rao JV, Raju NA (2007) RP-HPLC method for the estimation of nelfinavir mesylate in tablet dosage form. E J Chem 4(3):302–306
Jing Q, Shen Y, Tang Y, Ren F, Yu X, Hou Z (2006) Determination of nelfinavir mesylate as bulk drug and in pharmaceutical dosage form by stability indicating HPLC. J Pharm Biomed Anal 41(3):1065–1069
Attaluri T, Seru G, Varanasi SNM (2021) Development and validation of a stability-indicating RP-HPLC method for the simultaneous estimation of bictegravir, emtricitabine, and tenofovir alafenamide fumarate. Turk J Pharm Sci 18(4):410–419. https://doi.org/10.4274/tjps.galenos.2020.70962
Jitta SR, Salwa K et al (2021) Development and validation of high-performance liquid chromatography method for the quantification of remdesivir in intravenous dosage form. Assay Drug Dev Technol 19(8):475–483. https://doi.org/10.1089/adt.2021.074
Shuangjin C, Fang F, Han L, Ming M (2007) New method for high-performance liquid chromatographic determination of amantadine and its analogues in rat plasma. J Pharm Biomed Anal 44(5):1100–1105
Kaliszan R (2007) QSRR: quantitative structure-(chromatographic) retention relationships. Chem Rev 107(7):3212–3246
Venkat Kumar C, Anantahakumar D, Rao J (2010) A new validated RP-HPLC method for determination of penciclovir in human plasma. Int J Chem Sci 2:95–102
Raees Ahmad SA, Patil L, Mohammed Usman MR, Imran M, Akhtar R (2018) Analytical method development and validation for the simultaneous estimation of abacavir and lamivudine by reversed-phase high-performance liquid chromatography in bulk and tablet dosage forms. Pharmacognosy Res 10(1):92–97. https://doi.org/10.4103/pr.pr_96_17
Hussain Shah SS, Nasiri MI, Sarwar H, Ali A et al (2021) RP-HPLC method development and validation for quantification of daclatasvir dihydrochloride and its application to pharmaceutical dosage form. Pak J Pharm Sci 34(3):951–956
Hiremath SN, Bhirud CH (2015) Development and validation of a stability indicating HPLC method for the simultaneous analysis of lopinavir and ritonavir in fixed-dose combination tablets. J Taibah Univ Med Sci 10(3):271–277. https://doi.org/10.1016/j.jtumed.2014.11.006
dos Santos JV, de Carvalho LA, Pina ME (2011) Development and validation of a RP-HPLC method for the determination of zidovudine and its related substances in sustained-release tablets. Anal Sci 27(3):283–289. https://doi.org/10.2116/analsci.27.283
Higashi Y, Uemori I, Fujii Y (2005) Simultaneous determination of amantadine and rimantadine by HPLC in rat plasma with pre-column derivatization and fluorescence detection for pharmacokinetic studies. Biomed Chromatogr 19(9):655–662. https://doi.org/10.1002/bmc.492
Mamatha J, Devanna N (2017) Development and validation of a RP-HPLC method for analysis of cidofovir in medicinal form. Indian J Sci Technol 10:1–5. https://doi.org/10.17485/ijst/2017/v10i34/117108
Patel J, Bedi H, Middha A, Prajapati L, Parmar V (2012) RP-HPLC method for estimation of nitazoxanide in oral suspension formulation. Der Pharm Chem 4:1140–1144
Kanagala V, Anusha M, Reddy S (2015) Rapid RP-HPLC method development and validation for analysis of raltegravir in bulk and pharmaceutical dosage form. Asian J Pharm Anal. https://doi.org/10.5958/2231-5675.2015.00003.4
Babu G, Atmakuri LR, Rao J (2015) A rapid RP-HPLC method development and validation for the quantitative estimation ribavirin in tablets. Int J Pharm Pharm Sci 7:60–63
Chandrasekaran B, Abed SN, Al-Attraqchi O, Kuche K, Tekade RK (2018) Chapter 21 - computer-aided prediction of pharmacokinetic (ADMET) properties. In: Tekade RK (ed) Dosage form design parameters. Academic Press, pp 731–755
Wu Y-J, Zhan T, Hou Z, Fang L, Xu Y (2020) Physical and chemical descriptors for predicting interfacial thermal resistance. Sci Data 7:36. https://doi.org/10.1038/s41597-020-0373-2
Zhang C, Idelbayev Y, Roberts N, Tao Y et al (2017) Small molecule accurate recognition technology (SMART) to enhance natural products research. Sci Rep 7(1):14243. https://doi.org/10.1038/s41598-017-13923-x
Stagner WC, Haware RV (2019) QbD innovation through advances in PAT, data analysis methodologies, and material characterization. AAPS PharmSciTech 20(7):295. https://doi.org/10.1208/s12249-019-1506-9
Skoog DA et al (2014) Fundamentals of analytical chemistry, 9th edn. Brooks/Cole, Belmont CA
Booksh KS, Kowalski BR (1994) Theory of analytical chemistry. Anal Chem 66(15):782A-791A. https://doi.org/10.1021/ac00087a718
Wise BM, Gallagher NB (1996) The process chemometrics approach to process monitoring and fault detection. J Process Control 6(6):329–348
Fu H, Yin Q, Xu L, Wang W, Chen F, Yang T (2017) A comprehensive quality evaluation method by FT-NIR spectroscopy and chemometric: fine classification and untargeted authentication against multiple frauds for Chinese Ganoderma lucidum. Spectrochim Acta A Mol Biomol Spectrosc 182:17–25. https://doi.org/10.1016/j.saa.2017.03.074
Rady AM, Guyer DE (2015) Evaluation of sugar content in potatoes using NIR reflectance and wavelength selection techniques. Postharvest Biol Technol 103:17–26. https://doi.org/10.1016/j.postharvbio.2015.02.012
Hong X, Wang J, Qi G (2015) E-nose combined with chemometrics to trace tomato-juice quality. J Food Eng 149:38–43. https://doi.org/10.1016/j.jfoodeng.2014.10.003
Canizo BV, Escudero LB, Pérez MB, Pellerano RG, Wuilloud RG (2017) Intra-regional classification of grape seeds produced in Mendoza province (Argentina) by multi-elemental analysis and chemometrics tools. Food Chem 242:272–278
Liu W, Liu C, Hu X, Yang J, Zheng L (2016) Application of terahertz spectroscopy imaging for discrimination of transgenic rice seeds with chemometrics. Food Chem 210:415–421. https://doi.org/10.1016/j.foodchem.2016.04.117
Barbosa RM, de Paula ES, Paulelli AC, Moore AF, Souza JMO et al (2016) Recognition of organic rice samples based on trace elements and support vector machines. J Food Compos Anal 45:95–100. https://doi.org/10.1016/j.jfca.2015.09.010
Zheng W, Fu X, Ying Y (2014) Spectroscopy-based food classification with extreme learning machine. Chemom Intell Lab Syst 139:42–47. https://doi.org/10.1016/j.chemolab.2014.09.015
Hopke PK (1985) Receptor modeling in environmental chemistry. John Wiley & Sons, New York
Golubović J, Protić A, Otašević B, Zečević M (2016) Quantitative structure-retention relationships applied to development of liquid chromatography gradient-elution method for the separation of sartans. Talanta 150:190–197. https://doi.org/10.1016/j.talanta.2015.12.035
Khosrokhavar R, Ghasemi JB, Shiri F (2010) 2D quantitative structure-property relationship study of mycotoxins by multiple linear regression and support vector machine. Int J Mol Sci 11(9):3052–3068. https://doi.org/10.3390/ijms11093052
Fu T, Sieng IDY (2021) A comparative study between PCR, PLSR, and LW-PLS on the predictive performance at different data splitting ratios. Chem Eng Commun. https://doi.org/10.1080/00986445.2021.1957853
El-Gindy A, Emara S, Mostafa A (2005) HPLC and chemometric-assisted spectrophotometric methods for simultaneous determination of atenolol, amiloride hydrochloride and chlorthalidone. Il Farmaco 60(3):269–278. https://doi.org/10.1016/j.farmac.2004.11.013
Luis ML, Fraga JMG, Jiménez F, Jiménez AI, Arias JJ (2001) Simultaneous spectrophotometric determination of diuretics by using multivariate calibration methods. Talanta 53(4):761–770. https://doi.org/10.1016/S0039-9140(00)00538-5
Cramer RD (1993) Partial least squares (PLS): its strengths and limitations. Perspect Drug Discov Des 1(2):269–278. https://doi.org/10.1007/BF02174528
El-Gindy A, Emara S, Mostafa A (2006) Application and validation of chemometrics-assisted spectrophotometry and liquid chromatography for the simultaneous determination of six-component pharmaceuticals. J Pharm Biomed Anal 41(2):421–430. https://doi.org/10.1016/j.jpba.2005.12.005
Koleini F, Balsini P, Parastar H (2022) Evaluation of partial least-squares regression with multivariate analytical figures of merit for determination of 10 pesticides in milk. Int J Environ Anal Chem 102(8):1900–1910. https://doi.org/10.1080/03067319.2020.1745198
Pate ME, Turner MK, Thornhill NF, Titchener-Hooker NJ (1999) The use of principal component analysis for the modelling of high performance liquid chromatography. Bioprocess Eng 21(3):261–272. https://doi.org/10.1007/s004490050674
Singh VD, Daharwal SJ (2017) Development and validation of multivariate calibration methods for simultaneous estimation of paracetamol, enalapril maleate and hydrochlorothiazide in pharmaceutical dosage form. Spectrochim Acta A Mol Biomol Spectrosc 171:369–375. https://doi.org/10.1016/j.saa.2016.08.028
Acknowledgements
We thank our institution, co-workers, and collaborators for their help in shaping this review; without their enthusiasm and hard work, it would not have been possible. We also thank the developers of QSRR Automator, ORCA, and Avogadro.
Study involving plants/licence for the study
Not applicable.
Funding
Not applicable.
Ethics declarations
Ethical approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no affiliations with or involvement in any organization or entity with any financial or nonfinancial interest in the subject matter or materials discussed in this manuscript.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Malarvannan, M., Kumar, K.V., Reddy, Y.P. et al. Assessment of computational approaches in the prediction of spectrogram and chromatogram behaviours of analytes in pharmaceutical analysis: assessment review. Futur J Pharm Sci 9, 86 (2023). https://doi.org/10.1186/s43094-023-00537-6
DOI: https://doi.org/10.1186/s43094-023-00537-6