### Materials

The materials used were stearic acid, glycerol, and talc (BDH Chemicals Ltd., Poole, England); microcrystalline cellulose (Avicel® PH101) (ATOZ Pharmaceuticals Ltd., Ambattur, India); absolute ethanol and hydrochloric acid (E. Merck, Darmstadt, Germany); xylene (Loba Chemie Laboratory Ltd., Mumbai, India); and sodium hydroxide pellets (Avondale Laboratories Ltd., Banbury, England).

### Starch preparation and modifications

Livingstone potato (*Plectranthus esculentus*) was obtained from the Vom area of Plateau State, Nigeria. The tubers were identified at the herbarium of the Department of Biological Sciences, Ahmadu Bello University, Zaria, Nigeria (voucher number 28448). Starch was extracted by the wet-milling technique and modified by three methods: pregelatinization, ethanol-dehydrated pregelatinization, and acid hydrolysis [3]. The resulting modified starches were labeled pregelatinized starch (PS), ethanol-dehydrated pregelatinized starch (ES), and acid-hydrolyzed starch (AS), respectively. Microcrystalline cellulose (Avicel® PH101), which is commonly used in direct-compression tablet formulations, was adopted as the standard for comparison.

### Tablet compaction studies

Powder samples of the modified starches were compressed into compacts (500 mg each) using a 10.5-mm die and flat-faced punches on an Apex hydraulic hand press (model 184, Apex Construction Ltd., London). Compression pressures of 28–170 MNm^{−2} were applied with a 30-s dwell time. The compacts were stored over silica gel in a desiccator for 1 day to allow elastic recovery and hardening and to prevent low yield values. The tablet thickness, diameter, and weight (*W*) were then determined, and the relative densities (*D*) of the tablets were calculated according to Eq. 1 [3]:

$$ D=\frac{W}{V{\rho}_s} $$

(1)

where *V* is the tablet volume (cm^{3}) and *ρ*_{s} is the particle density (g/cm^{3}) of the compact material. Heckel plots [ln(1/(1 − *D*)) versus compression pressure *P* (MNm^{−2})] and Kawakita plots of applied pressure (*P*) divided by the degree of volume reduction (*C*) [*P*/*C* versus *P*] were generated [3]. The compressibility indices of the materials were also obtained from plots of compact density (g/cm^{3}) against the log of compaction pressure. Bulk, tapped, and true densities were determined by the method described by Khalid et al. [3].
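As an illustrative sketch, the relative-density, Heckel, and Kawakita calculations described above can be carried out as follows; all numerical values (pressures, compact volumes, particle density, initial bed volume) are hypothetical stand-ins, not the study's measurements:

```python
import numpy as np

# Hypothetical illustrative data (not the study's measurements):
P = np.array([28.0, 56.0, 85.0, 113.0, 141.0, 170.0])  # pressure, MN m^-2
V = np.array([0.48, 0.44, 0.41, 0.39, 0.38, 0.37])     # compact volume, cm^3
W = 0.5       # compact weight, g (500 mg)
rho_s = 1.5   # assumed particle (true) density, g/cm^3
V0 = 0.60     # assumed initial bulk volume of the powder bed, cm^3

# Eq. 1: relative density of each compact
D = W / (V * rho_s)

# Heckel plot: ln(1/(1 - D)) versus P; slope K gives mean yield pressure Py = 1/K
heckel_y = np.log(1.0 / (1.0 - D))
K, A = np.polyfit(P, heckel_y, 1)
Py = 1.0 / K

# Kawakita plot: P/C versus P, with C the degree of volume reduction
C = (V0 - V) / V0
slope, intercept = np.polyfit(P, P / C, 1)
a = 1.0 / slope          # maximal degree of volume reduction
b = slope / intercept    # 1/b is the Kawakita pressure term

print(f"Mean yield pressure Py = {Py:.1f} MN m^-2")
print(f"Kawakita a = {a:.3f}, 1/b = {1.0/b:.1f} MN m^-2")
```

With real data, the linear fits would of course be restricted to the linear regions of the respective plots.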

### Proposed methodology for AI modeling

In this work, several data-driven approaches were proposed separately for modeling the performance of these novel excipients. The primary data were collected from our experimental results. Two performance measures of the excipients, tablet density and degree of volume reduction, were selected as the output variables. The drug/excipient ratio (*D*/*E* ratio), friability (%), crushing strength (*C*/strength), compression pressure (*C*/pressure), and log of compression pressure (log *C*/pressure) served as the input variables for the corresponding output parameters of the modified starches PS, ES, and AS, with Avicel® PH101 as the standard for comparison. Three different data-driven models were therefore developed: two non-linear models, namely ANN (the most widely used data-driven model) and ANFIS (a hybrid learning algorithm), and a traditional linear regression model (MLR, the most commonly used linear model). Because different data-intelligence algorithms behave differently on different data sets, it is difficult for modelers to select a single model a priori; this complexity can be addressed by employing several models, including linear data-driven algorithms despite their weakness in handling complex non-linear data. The models were evaluated by applying various performance indices.

### Artificial neural networks (ANN)

ANNs are relatively new computational tools with broad applications in solving many complicated real-world problems. Their attraction originates from outstanding information-processing traits, related mostly to non-linearity, fault and noise tolerance, and learning and generalization abilities [17]. ANNs are also referred to as neural networks (NNs) or connectionist models. An ANN is an algorithmic numerical model that mimics the behavioral characteristics of biological neural networks and performs distributed, parallel data processing.
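As a minimal sketch of the idea (not the architecture or data of this study), a one-hidden-layer feedforward network can be trained by backpropagation; the synthetic inputs below merely stand in for the five input variables, and the layer size, learning rate, and target function are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for five inputs and one output (e.g., tablet density)
X = rng.uniform(0.0, 1.0, size=(40, 5))
y = (X @ np.array([0.3, -0.2, 0.4, 0.5, 0.1]))[:, None] \
    + 0.05 * rng.normal(size=(40, 1))

# One hidden tanh layer, linear output (a common regression setup)
W1 = rng.normal(scale=0.5, size=(5, 8)); b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros((1, 1))
lr = 0.05

for epoch in range(2000):
    # forward pass
    H = np.tanh(X @ W1 + b1)
    y_hat = H @ W2 + b2
    err = y_hat - y
    # backpropagation of the squared-error gradient
    g2 = H.T @ err / len(X)
    gH = (err @ W2.T) * (1 - H**2)
    g1 = X.T @ gH / len(X)
    W2 -= lr * g2; b2 -= lr * err.mean(0, keepdims=True)
    W1 -= lr * g1; b1 -= lr * gH.mean(0, keepdims=True)

mse = float(np.mean((y_hat - y) ** 2))
print(f"training MSE after 2000 epochs: {mse:.4f}")
```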

### Adaptive neuro-fuzzy inference system (ANFIS)

ANNs are among the most widely used AI-based models; inspired by the human brain, they are resilient in capturing highly complex relationships between the inputs and outputs of a data collection [18].

ANFIS has been demonstrated to be a successful approach that incorporates the fuzzy Sugeno model, combining the benefits of fuzzy logic and ANN in one system, and it has recently been used to predict and model complex datasets [15]. ANFIS is also regarded as a universal estimator because of its capacity to approximate real functions. In practice, several membership functions (MFs) are used, including trapezoidal, triangular, sigmoid, and Gaussian, with the Gaussian function being the most frequently applied [19].

Assuming the FIS contains two inputs, *x* and *y*, and one output, *f*, a first-order Sugeno fuzzy model has the following rules:

$$ \mathrm{Rule}\ 1:\ \mathrm{if}\ x\ \mathrm{is}\ {A}_1\ \mathrm{and}\ y\ \mathrm{is}\ {B}_1,\ \mathrm{then}\ {f}_1={p}_1x+{q}_1y+{r}_1 $$

(2)

$$ \mathrm{Rule}\ 2:\ \mathrm{if}\ x\ \mathrm{is}\ {A}_2\ \mathrm{and}\ y\ \mathrm{is}\ {B}_2,\ \mathrm{then}\ {f}_2={p}_2x+{q}_2y+{r}_2 $$

(3)

*A*_{1}, *B*_{1}, *A*_{2}, and *B*_{2} are the membership functions for the inputs *x* and *y*, while *p*_{1}, *q*_{1}, *r*_{1}, *p*_{2}, *q*_{2}, and *r*_{2} are the consequent (output) function parameters. The structure and formulation of ANFIS follow a five-layer neural network arrangement.
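The two rules of Eqs. (2)–(3) can be evaluated with Gaussian MFs as a small sketch of the five-layer forward pass; all membership and consequent parameters below are hypothetical (in ANFIS they would be tuned by the hybrid learning algorithm):

```python
import math

def gauss_mf(x, c, sigma):
    """Gaussian membership function mu(x) = exp(-(x - c)^2 / (2 sigma^2))."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

# Hypothetical parameters for the two rules of Eqs. (2)-(3)
A = [(0.2, 0.3), (0.8, 0.3)]   # (centre, width) for A1, A2 on input x
B = [(0.3, 0.3), (0.7, 0.3)]   # (centre, width) for B1, B2 on input y
pqr = [(1.0, 0.5, 0.1),        # p1, q1, r1
       (0.4, 1.2, -0.2)]       # p2, q2, r2

def sugeno_output(x, y):
    # Layers 1-2: membership grades and rule firing strengths (product T-norm)
    w = [gauss_mf(x, *A[i]) * gauss_mf(y, *B[i]) for i in range(2)]
    # Layer 3: normalized firing strengths
    wsum = sum(w)
    wn = [wi / wsum for wi in w]
    # Layers 4-5: first-order consequents f_i = p_i x + q_i y + r_i, weighted sum
    f = [p * x + q * y + r for (p, q, r) in pqr]
    return sum(wni * fi for wni, fi in zip(wn, f))

print(sugeno_output(0.5, 0.5))  # ≈ 0.725 (both rules fire equally here)
```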

### Multi-linear regression (MLR)

Regression is generally categorized into two major domains, simple and multiple linear regression, each applied according to the purpose of the simulation. A model that estimates the linear relationship between a single input and a single output is known as simple linear regression (SLR), whereas a model that simulates the linear relation between a single output and multiple input parameters is called multiple linear regression (MLR) [20]. MLR is the more commonly used type and involves an analysis in which each input parameter is correlated with the output parameter [21]. In general, MLR estimates the strength of the relationship between the output and two or more input parameters [22]. The general expression of MLR is shown in Eq. (4).

$$ Y={b}_0+{b}_1{x}_1+{b}_2{x}_2+\dots +{b}_n{x}_n $$

(4)

where *x*_{1}, …, *x*_{n} are the values of the predictors, *b*_{0} is the regression constant (intercept), and *b*_{1}, …, *b*_{n} are the coefficients of the predictors.
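Eq. (4) can be fitted by ordinary least squares; the sketch below uses synthetic data in place of the study's five predictors, with arbitrary "true" coefficients chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical inputs standing in for the five predictors of the study
X = rng.uniform(size=(30, 5))
true_b = np.array([0.8, 0.3, -0.5, 0.2, 0.6, 0.1])   # [b0, b1, ..., b5]
y = true_b[0] + X @ true_b[1:] + 0.01 * rng.normal(size=30)

# Eq. (4) in matrix form: Y = Xa b, with a leading column of ones for b0
Xa = np.column_stack([np.ones(len(X)), X])
b, *_ = np.linalg.lstsq(Xa, y, rcond=None)

print("estimated coefficients:", np.round(b, 2))
```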

### Evaluation criteria and validation method for data-driven models

Usually, for any data-driven algorithm, model performance is evaluated using various performance indices that compare the simulated and experimental values. In this work, the coefficient of determination (*R*^{2}) and the correlation coefficient (*R*) were used as goodness-of-fit measures, together with two statistical errors, the root-mean-square error (RMSE) and the mean-square error (MSE), for the evaluation of the models:

$$ {R}^2=1-\frac{\sum_{j=1}^{N}{\left[{Y}_{\mathrm{obs},j}-{Y}_{\mathrm{com},j}\right]}^2}{\sum_{j=1}^{N}{\left[{Y}_{\mathrm{obs},j}-{\overline{Y}}_{\mathrm{obs}}\right]}^2} $$

(5)

$$ R=\frac{\sum_{i=1}^{N}\left({Y}_{\mathrm{obs},i}-{\overline{Y}}_{\mathrm{obs}}\right)\left({Y}_{\mathrm{com},i}-{\overline{Y}}_{\mathrm{com}}\right)}{\sqrt{\sum_{i=1}^{N}{\left({Y}_{\mathrm{obs},i}-{\overline{Y}}_{\mathrm{obs}}\right)}^2}\sqrt{\sum_{i=1}^{N}{\left({Y}_{\mathrm{com},i}-{\overline{Y}}_{\mathrm{com}}\right)}^2}} $$

(6)

$$ \mathrm{RMSE}=\sqrt{\frac{\sum_{i=1}^{N}{\left({Y}_{\mathrm{obs},i}-{Y}_{\mathrm{com},i}\right)}^2}{N}} $$

(7)

$$ \mathrm{MSE}=\frac{1}{N}\sum_{i=1}^{N}{\left({Y}_{\mathrm{obs},i}-{Y}_{\mathrm{com},i}\right)}^2 $$

(8)

where *N*, *Y*_{obs,i}, \( {\overline{Y}}_{\mathrm{obs}} \), and *Y*_{com,i} are the number of data points, the observed data, the mean of the observed data, and the computed values, respectively.
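Eqs. (5)–(8) translate directly into code; the observed/computed values below are hypothetical numbers used only to exercise the formulas:

```python
import numpy as np

def evaluate(y_obs, y_com):
    """Goodness-of-fit and error metrics of Eqs. (5)-(8)."""
    y_obs, y_com = np.asarray(y_obs, float), np.asarray(y_com, float)
    ss_res = np.sum((y_obs - y_com) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                                   # Eq. (5)
    r = np.sum((y_obs - y_obs.mean()) * (y_com - y_com.mean())) / (
        np.sqrt(np.sum((y_obs - y_obs.mean()) ** 2))
        * np.sqrt(np.sum((y_com - y_com.mean()) ** 2)))          # Eq. (6)
    mse = np.mean((y_obs - y_com) ** 2)                          # Eq. (8)
    rmse = np.sqrt(mse)                                          # Eq. (7)
    return r2, r, rmse, mse

# Hypothetical observed vs. computed values for illustration
obs = [1.10, 1.25, 1.32, 1.40, 1.48]
com = [1.12, 1.22, 1.35, 1.38, 1.50]
r2, r, rmse, mse = evaluate(obs, com)
print(f"R2={r2:.3f}  R={r:.3f}  RMSE={rmse:.4f}  MSE={mse:.5f}")
```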

For validation, different methods can be applied, such as cross-validation (e.g., *k*-fold cross-validation), holdout, and leave-one-out. In this work, *k*-fold cross-validation is used, a procedure employed to reduce the problem of overfitting [22]. In this technique, the initial data set is partitioned into *k* equally sized subsets [23].
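The partitioning step of *k*-fold cross-validation can be sketched as follows; the fold count and sample size here are arbitrary, and a model would be retrained once per fold on the training indices and scored on the held-out test indices:

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once, then partition them into k folds."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, k)

# Each fold serves once as the test set while the rest form the training set
folds = k_fold_indices(20, k=5)
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    print(f"fold {i}: train={len(train_idx)} test={len(test_idx)}")
```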