| Title: | Partial Least Squares and Chemometrics Engine |
|---|---|
| Description: | A portable Partial Least Squares engine for chemometrics: the slim, PLS-focused distribution carved from the 'nirs4all-methods' library. It ships every method built on the shared PLS core (NIPALS, SIMPLS, SVD, kernel, wide-kernel, orthogonal-scores, power, randomized SVD, PCR): regression variants (sparse SIMPLS, CPPLS, weighted, robust, ridge, continuum, multi-block, GLM, MIR), adaptive AOM-PLS / POP-PLS operator selection, variable-selection methods (SPA, CARS, GA, random frog, stability selection, VIP), PLS diagnostics (Hotelling T2, Q residuals, DModX), and calibration transfer (PDS, DS). The spectroscopy-specific surface (spectral preprocessing, augmentation, sample filters, signal-type detection) lives in the full 'nirs4all-methods' distribution. The same C++17 numerical core powers both; here it is vendored and compiled from source at install time, with no external system libraries required. |
| Authors: | Gregory Beurier [aut, cre], pls4all contributors [ctb] |
| Maintainer: | Gregory Beurier <[email protected]> |
| License: | CeCILL (== 2.1) |
| Version: | 0.99.0 |
| Built: | 2026-06-12 07:27:45 UTC |
| Source: | https://github.com/GBeurier/nirs4all-methods |
aom_pls() (alias aompls()) runs global Adaptive Operator Mixture PLS selection: one preprocessing operator is selected for the whole PLS model.
pop_pls() (alias poppls()) runs per-component operator selection, where each retained PLS component may use a different preprocessing operator.
aom_pls(X, Y, max_components = 3L, n_operators = 9L, cv = 3L)
aompls(X, Y, max_components = 3L, n_operators = 9L, cv = 3L)
pop_pls(X, Y, max_components = 3L, n_operators = 9L, cv = 3L)
poppls(X, Y, max_components = 3L, n_operators = 9L, cv = 3L)
X |
Numeric matrix of spectra, with rows as samples and columns as wavelengths. |
Y |
Numeric vector or matrix of responses, with one row per sample. |
max_components |
Maximum number of latent PLS components. |
n_operators |
Number of compact AOM bank operators to expose. |
cv |
Number of contiguous cross-validation folds. |
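As a rough illustration of what "contiguous" folds mean for the 'cv' argument, the sketch below assigns consecutive runs of samples to each fold. The helper name 'contiguous_folds' is hypothetical and the exact split used by the native selector may differ.

```r
# Hypothetical helper (not part of the package): contiguous fold labels.
contiguous_folds <- function(n, k) {
  # Sample i goes to block ceiling(i * k / n), so each fold is a consecutive run.
  as.integer(ceiling(seq_len(n) * k / n))
}

folds <- contiguous_folds(40L, 3L)
table(folds)
```

Contiguous folds keep neighbouring samples together, which matters when rows are ordered (for example by acquisition time or by batch).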
The operator bank mirrors the compact public nirs4all AOM bank:
identity, Savitzky-Golay smoothers, Savitzky-Golay derivatives, polynomial
detrending, and finite difference. The implementation is provided by the
native n4m AOM selector ABI.
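To make the kinds of operators listed above concrete, here is a minimal base-R sketch of a four-member bank. The operator names, the 5-point quadratic Savitzky-Golay window, and the linear detrend are illustrative assumptions; they are not the package's actual bank or its parameters.

```r
# Illustrative operator bank; names and settings are assumptions.
sg5 <- c(-3, 12, 17, 12, -3) / 35   # 5-point quadratic Savitzky-Golay smoother

operators <- list(
  identity    = function(x) x,
  # Centered convolution; the two points at each edge come back as NA.
  sg_smooth   = function(x) as.numeric(stats::filter(x, sg5, sides = 2)),
  # First finite difference (one column shorter than the input).
  finite_diff = function(x) diff(x),
  # Linear detrending: residuals of a straight-line fit over the column index.
  detrend     = function(x) {
    idx <- seq_along(x)
    as.numeric(residuals(lm(x ~ idx)))
  }
)

# Apply one operator to every spectrum (row) of X.
apply_operator <- function(X, op) t(apply(X, 1, op))

set.seed(1)
X <- matrix(rnorm(5 * 12), nrow = 5)
transformed <- lapply(operators, function(op) apply_operator(X, op))
sapply(transformed, dim)
```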
A named list. Both functions return predictions, operator_kinds,
component-selection diagnostics, cross-validation scores, and selected
component metadata.
set.seed(1)
X <- matrix(rnorm(40 * 20), nrow = 40)
Y <- as.numeric(X[, 1] + rnorm(40, sd = 0.1))
fit <- aom_pls(X, Y, max_components = 2L)
dim(fit$predictions)
fit2 <- pop_pls(X, Y, max_components = 2L)
dim(fit2$predictions)
Fits the AOM preprocessing pipeline (operator bank + gating) over 'X' and returns a 'n4m_method_fit' object carrying the selected operators, the per-component gating weights, and the transformed spectra ready to feed into a downstream regression solver.
aom_preprocess(X, Y = NULL, n_operators = 1L, gating_mode = 0L)
X |
numeric matrix of spectra (rows = samples, cols = wavelengths). |
Y |
optional numeric vector of supervisory targets. When 'NULL', the unsupervised gating path is used (a zero target vector is substituted internally). |
n_operators |
number of operators in the AOM bank (default '1L'). |
gating_mode |
integer code selecting the gating strategy: '0L' = hard-select per component, '1L' = soft-mixture (default '0L'). |
A 'n4m_method_fit' object. Use 'predict()' for inference on new spectra and 'coef()' to extract the gating coefficients.
set.seed(1)
X <- matrix(rnorm(40 * 10), nrow = 40)
Y <- as.numeric(X[, 1] + rnorm(40, sd = 0.1))
fit <- aom_preprocess(X, Y)
class(fit)
For each component count k in 1..max_components, fits SIMPLS and approximates PRESS via leverage-inflated in-sample residuals.
approximate_press(X, Y, max_components)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
max_components |
Integer; maximum number of components to scan. |
A list with 'press_per_component', 'rmse_per_component', and 'selected_n_components' (the argmin of PRESS, returned as a 1-based integer of length 1).
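The idea of leverage-inflated residuals can be sketched in base R. The version below swaps SIMPLS for principal-component scores from an SVD (so it is a PCR-style stand-in, not the package's solver) and divides each in-sample residual by one minus its leverage. The function name 'approx_press_sketch' and the RMSE definition sqrt(PRESS / n) are assumptions.

```r
# Sketch only: PCR scores stand in for SIMPLS scores.
approx_press_sketch <- function(X, y, max_components) {
  n  <- nrow(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)
  yc <- y - mean(y)
  U  <- svd(Xc)$u                          # orthonormal score directions
  press <- numeric(max_components)
  for (k in seq_len(max_components)) {
    Uk     <- U[, seq_len(k), drop = FALSE]
    fitted <- Uk %*% crossprod(Uk, yc)     # projection of y on the first k scores
    h      <- 1 / n + rowSums(Uk^2)        # leverage, including the intercept
    press[k] <- sum(((yc - fitted) / (1 - h))^2)
  }
  list(press_per_component   = press,
       rmse_per_component    = sqrt(press / n),
       selected_n_components = which.min(press))
}

set.seed(1)
X <- matrix(rnorm(40 * 10), nrow = 40)
y <- as.numeric(X[, 1] + rnorm(40, sd = 0.1))
res <- approx_press_sketch(X, y, max_components = 3L)
res$selected_n_components
```

Because the scores are not recomputed with each sample left out, the result approximates PRESS rather than reproducing a true leave-one-out run.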
Bagging PLS — formula entry point.
bagging_pls(formula, data, ncomp = 2L, n_estimators = 50L, seed = 0L, na.action = stats::na.omit)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
n_estimators |
Integer >= 1. Number of bootstrap / boosting / random-subspace estimators. |
seed |
Integer. Random seed for reproducibility. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
A 'n4m_method_fit' object carrying the fitted model, in-sample predictions, training RMSE, and method-specific metadata. Use 'predict()' for inference and 'coef()' to extract regression coefficients.
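For readers who want to see how a formula and a data frame typically become the X and Y matrices consumed by a matrix-level fit, here is a sketch built only on standard 'stats' functions. The helper 'formula_to_xy' is hypothetical; the package's wrappers may do this differently.

```r
# Hypothetical helper: turn formula + data into predictor matrix and response.
formula_to_xy <- function(formula, data, na.action = stats::na.omit) {
  mf <- stats::model.frame(formula, data = as.data.frame(data),
                           na.action = na.action)
  Y <- stats::model.response(mf)
  X <- stats::model.matrix(attr(mf, "terms"), mf)
  # Drop the intercept column; centering is left to the solver.
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  list(X = X, Y = Y)
}

d <- data.frame(y  = c(1, 2, NA, 4),
                x1 = c(0.5, 1.5, 2.5, 3.5),
                x2 = c(1, NA, 3, 4))
xy <- formula_to_xy(y ~ ., d)
dim(xy$X)   # rows 2 and 3 are dropped by na.omit
```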
Bagging PLS (bootstrap aggregation of PLS regressors).
bagging_pls_fit(X, Y, n_components, n_estimators = 50L, seed = 0L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_estimators |
Integer >= 1. Number of bootstrap / boosting / random-subspace estimators. |
seed |
Integer. Random seed for reproducibility. |
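Bootstrap aggregation itself is simple to sketch. The example below uses an ordinary least-squares base learner in place of PLS, purely to keep the code short; 'bagging_sketch' is a hypothetical name and the way the package consumes 'seed' is not documented here.

```r
# Sketch only: least squares replaces the PLS base learner.
bagging_sketch <- function(X, y, n_estimators = 50L, seed = 0L) {
  set.seed(seed)
  n  <- nrow(X)
  Xi <- cbind(1, X)                                   # add an intercept column
  preds <- matrix(NA_real_, n, n_estimators)
  for (b in seq_len(n_estimators)) {
    idx  <- sample.int(n, n, replace = TRUE)          # bootstrap resample of rows
    beta <- qr.solve(Xi[idx, , drop = FALSE], y[idx]) # least-squares fit
    preds[, b] <- Xi %*% beta                         # predict on all training rows
  }
  rowMeans(preds)                                     # aggregate by averaging
}

set.seed(1)
X <- matrix(rnorm(40 * 3), nrow = 40)
y <- as.numeric(X[, 1] + rnorm(40, sd = 0.1))
pred <- bagging_sketch(X, y, n_estimators = 20L, seed = 1L)
```

Fixing the seed makes the bootstrap draws, and therefore the averaged predictions, reproducible.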
biPLS — backward interval PLS.
bipls_select(X, Y, n_components, interval_width = 10L, min_intervals = 1L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
interval_width |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
min_intervals |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
Boosting PLS — formula entry point.
boosting_pls(formula, data, ncomp = 2L, n_estimators = 50L, learning_rate = 0.1, na.action = stats::na.omit)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
n_estimators |
Integer >= 1. Number of bootstrap / boosting / random-subspace estimators. |
learning_rate |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
A 'n4m_method_fit' object carrying the fitted model, in-sample predictions, training RMSE, and method-specific metadata. Use 'predict()' for inference and 'coef()' to extract regression coefficients.
Boosting PLS (stage-wise refit with learning_rate).
boosting_pls_fit(X, Y, n_components, n_estimators = 50L, learning_rate = 0.1)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_estimators |
Integer >= 1. Number of bootstrap / boosting / random-subspace estimators. |
learning_rate |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
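The stage-wise refit with a learning rate can be illustrated as follows. Each stage fits a weak learner to the current residuals and adds a shrunken copy of its prediction. The weak learner here is a single-predictor least-squares fit, not PLS, and 'boosting_sketch' is a hypothetical name, so treat this as a sketch of the general scheme only.

```r
# Sketch only: a one-variable least-squares learner replaces the PLS stage.
boosting_sketch <- function(X, y, n_estimators = 50L, learning_rate = 0.1) {
  Xc   <- scale(X, center = TRUE, scale = FALSE)
  pred <- rep(mean(y), length(y))
  rmse <- numeric(n_estimators)
  for (m in seq_len(n_estimators)) {
    r      <- y - pred                                    # current residuals
    slopes <- as.numeric(crossprod(Xc, r)) / colSums(Xc^2)
    sse    <- sapply(seq_len(ncol(Xc)),
                     function(j) sum((r - slopes[j] * Xc[, j])^2))
    j      <- which.min(sse)                              # best single predictor
    pred   <- pred + learning_rate * slopes[j] * Xc[, j]  # shrunken update
    rmse[m] <- sqrt(mean((y - pred)^2))
  }
  list(predictions = pred, rmse_per_stage = rmse)
}

set.seed(1)
X <- matrix(rnorm(40 * 3), nrow = 40)
y <- as.numeric(X[, 1] + rnorm(40, sd = 0.1))
fit <- boosting_sketch(X, y, n_estimators = 50L, learning_rate = 0.1)
```

A smaller learning rate needs more stages to reach the same training error but makes each individual stage less influential.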
BVE-PLS.
bve_select(X, Y, n_components, n_steps = 10L, min_features = 5L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_steps |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
min_features |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
CARS (competitive adaptive reweighted sampling) variable selector. It fits on the supplied data with a built-in 5-fold validation plan (the default C-side fallback). For a custom plan, use the lower-level Python binding or extend r_methods.c.
cars_select(X, Y, n_components, n_iterations = 50L, min_features = 5L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_iterations |
Number of CARS iterations (typical 50-100). |
min_features |
Lower bound on the final subset size. |
A list with 'selected_indices' (1-based) and 'best_rmse'.
Returns the '(p x q)' coefficient matrix ('p' predictors by 'q' targets) read from the fitted libn4m model via the 'n4m_model_get_array' C ABI (tag 'N4M_MODEL_COEFFICIENTS'). Rows are named after the predictors when their names are available from the model terms.
## S3 method for class 'n4m_fit'
coef(object, ...)
object |
A 'n4m_fit' returned by [pls()]. |
... |
Ignored (for S3 generic compatibility). |
A numeric 'p x q' matrix of regression coefficients.
Coefficient-magnitude ranker.
coefficient_select(model, X, top_k)
model |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
X |
Numeric matrix used for the fit (re-passed for diagnostics). |
top_k |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
A list with 'scores' (|coef| sums) and 'selected_indices'.
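The ranking rule described by the return value (sum of absolute coefficients per predictor, then keep the largest) can be written in a few lines. The function 'coefficient_rank_sketch' is hypothetical and works directly on a coefficient matrix instead of a fitted model handle.

```r
# Sketch: rank predictors by summed absolute coefficient across targets.
coefficient_rank_sketch <- function(coefficients, top_k) {
  scores <- rowSums(abs(coefficients))
  list(scores = scores,
       selected_indices = order(scores, decreasing = TRUE)[seq_len(top_k)])
}

# 3 predictors x 2 targets
B <- matrix(c(0.1, -2, 0.5,
              0.3,  1, -0.2), nrow = 3)
ranked <- coefficient_rank_sketch(B, top_k = 2L)
ranked$selected_indices   # 1-based positions of the two largest scores
```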
Continuum regression — formula entry point.
continuum_regression(formula, data, ncomp = 2L, tau = 0.5, na.action = stats::na.omit)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
tau |
Numeric in [0, 1]. Continuum regression mixing parameter. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
A 'n4m_method_fit' object carrying the fitted model, in-sample predictions, training RMSE, and method-specific metadata. Use 'predict()' for inference and 'coef()' to extract regression coefficients.
Continuum regression (tau in [0, 1]).
continuum_regression_fit(X, Y, n_components, tau = 0.5)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
tau |
Numeric in [0, 1]. Continuum regression mixing parameter. |
Canonical Powered PLS — formula entry point.
cppls(formula, data, ncomp = 2L, gamma = 0.5, na.action = stats::na.omit)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
gamma |
Numeric. CPPLS / kernel band parameter. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
A 'n4m_method_fit' object carrying the fitted model, in-sample predictions, training RMSE, and method-specific metadata. Use 'predict()' for inference and 'coef()' to extract regression coefficients.
Canonical Powered PLS fit (Indahl 2005).
cppls_fit(X, Y, n_components, gamma = 0.5)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
gamma |
Power exponent in [0, 1]. 0 recovers SIMPLS, 1 is fully power-rescaled. |
A list with 'coefficients', 'predictions', 'x_mean', 'y_mean', 'rmse'.
Domain-invariant PLS – formula entry point.
di_pls(formula, data, ncomp = 2L, X_target, di_lambda = 1, na.action = stats::na.omit)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
X_target |
Numeric matrix for the target domain. |
di_lambda |
Numeric DI-PLS penalty. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
A 'n4m_method_fit' object carrying the fitted model and in-sample predictions.
Domain-Invariant PLS (Nikzad-Langerodi 2018).
di_pls_fit(X_source, Y_source, n_components, X_target, di_lambda = 1)
X_source |
Numeric matrix of source-domain predictors (rows = samples, cols = features). |
Y_source |
Numeric matrix or vector of source-domain responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
X_target |
Numeric matrix for the target domain. |
di_lambda |
Numeric DI-PLS penalty. |
Direct Standardization (calibration transfer).
ds_fit(X_source, X_target)
X_source |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
X_target |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
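Direct Standardization is usually described as estimating a transfer matrix by least squares between paired spectra from two instruments. The sketch below assumes the mapping runs from source to target, that is X_target ≈ X_source %*% F, and uses an SVD-based pseudoinverse. Both the direction and the helper names 'pinv' and 'ds_sketch' are assumptions, not the package's documented behaviour.

```r
# Moore-Penrose pseudoinverse via SVD (base R only).
pinv <- function(A, tol = 1e-10) {
  s    <- svd(A)
  keep <- s$d > tol * s$d[1]
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

# Sketch: least-squares transfer matrix, assuming X_target ~ X_source %*% F.
ds_sketch <- function(X_source, X_target) pinv(X_source) %*% X_target

set.seed(1)
X_source <- matrix(rnorm(30 * 5), nrow = 30)
A        <- matrix(rnorm(5 * 5), nrow = 5)   # known transfer used to simulate data
X_target <- X_source %*% A
F_hat    <- ds_sketch(X_source, X_target)
max(abs(F_hat - A))                           # close to zero for noise-free data
```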
Elastic Component Regression — formula entry point.
ecr(formula, data, ncomp = 2L, alpha = 0.5, na.action = stats::na.omit)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
alpha |
Numeric in [0, 1]. Elastic-net / penalty mixing parameter. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
A 'n4m_method_fit' object carrying the fitted model, in-sample predictions, training RMSE, and method-specific metadata. Use 'predict()' for inference and 'coef()' to extract regression coefficients.
Elastic Component Regression (Liu 2009/2010).
ecr_fit(X, Y, n_components, alpha = 0.5)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
alpha |
Numeric in [0, 1]. Elastic-net / penalty mixing parameter. |
EMCUVE — ensemble Monte Carlo UVE.
emcuve_select(X, Y, n_components, noise_features = NULL, noise_seed = 0L, n_ensembles = 5L, vote_threshold = 0.5)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
noise_features |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
noise_seed |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
n_ensembles |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
vote_threshold |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
Fused-sparse PLS (L1 + adjacent-coef smoothing).
fused_sparse_pls_fit(X, Y, n_components, l1_lambda = 0.05, fusion_lambda = 0.05)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
l1_lambda |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
fusion_lambda |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
GA-PLS — genetic algorithm variable selection.
ga_select(X, Y, n_components, n_generations = 50L, population_size = 50L, min_features = NULL, max_features = NULL, mutation_rate = 0.01, seed = 0L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_generations |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
population_size |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
min_features |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
max_features |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
mutation_rate |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
seed |
Integer. Random seed for reproducibility. |
Gaussian Process Regression on PLS scores (single-target Y).
gpr_pls_fit(X, Y, n_components, length_scale = 1, noise_level = 1e-04, seed = 0L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
length_scale |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
noise_level |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
seed |
Integer. Random seed for reproducibility. |
Group-sparse PLS (group L1 across feature groups).
group_sparse_pls_fit(X, Y, n_components, group_assignment, group_lambda = 0.05)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
group_assignment |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
group_lambda |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
Interval selector (iPLS).
interval_select(X, Y, n_components, interval_width = 10L, step = 1L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
interval_width |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
step |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
IPW-PLS.
ipw_select(X, Y, n_components, n_iterations = 10L, top_k = 10L, damping = 0.5, weight_floor = 1e-06)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_iterations |
Integer >= 1. Number of outer-loop iterations. |
top_k |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
damping |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
weight_floor |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
IRF — Interval Random Frog.
irf_select(X, Y, n_components, n_iterations = 100L, window_size = 10L, initial_intervals = 10L, top_k = 5L, seed = 0L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_iterations |
Integer >= 1. Number of outer-loop iterations. |
window_size |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
initial_intervals |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
top_k |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
seed |
Integer. Random seed for reproducibility. |
IRIV — Iteratively Retains Informative Variables.
iriv_select(X, Y, n_components, max_rounds = 20L, seed = 0L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
max_rounds |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
seed |
Integer. Random seed for reproducibility. |
Non-linear kernel PLS (Rosipal & Trejo 2001).
kernel_pls_fit(X, Y, n_components, kernel_type = 1L, gamma = 0, coef0 = 1, degree = 3L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
kernel_type |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
gamma |
Numeric. CPPLS / kernel band parameter. |
coef0 |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
degree |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
Locally-weighted PLS (Næs & Centner 1998).
lw_pls_fit(X, Y, n_components, n_neighbors = 30L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_neighbors |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
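Locally-weighted methods start by picking, for each query sample, the training samples closest to it. The sketch below shows that neighbour-selection step with plain Euclidean distance. The distance metric and the helper name 'nearest_neighbors' are assumptions; the package may measure distance in score space or weight the neighbours.

```r
# Sketch: indices of the n_neighbors training rows closest to one query sample.
nearest_neighbors <- function(X_train, x_new, n_neighbors) {
  d <- sqrt(rowSums(sweep(X_train, 2, x_new)^2))   # Euclidean distance per row
  order(d)[seq_len(n_neighbors)]
}

X_train <- rbind(c(0, 0), c(1, 0), c(5, 5), c(0.1, 0.1))
nn <- nearest_neighbors(X_train, x_new = c(0, 0), n_neighbors = 2L)
nn   # rows 1 and 4 are nearest to the origin
```

A local model would then be fitted on 'X_train[nn, ]' only, once per query sample.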
Multi-block PLS — formula entry point.
mb_pls(formula, data, ncomp = 2L, block_sizes, na.action = stats::na.omit)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
block_sizes |
Integer vector summing to the number of predictors. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
Multi-block PLS (block-weighted SIMPLS).
mb_pls_fit(X, Y, n_components, block_sizes)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
block_sizes |
Integer vector; must sum to ncol(X). |
A list with 'coefficients', 'predictions', 'x_mean', 'x_scale', 'intercept', 'block_weights', 'rmse'.
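The 'block_sizes' convention (consecutive column blocks whose sizes sum to ncol(X)) can be checked and unpacked as below. The helper 'split_blocks' is hypothetical and only illustrates the layout; it does not call the package.

```r
# Sketch: split X into consecutive column blocks described by block_sizes.
split_blocks <- function(X, block_sizes) {
  stopifnot(sum(block_sizes) == ncol(X))   # sizes must cover every column
  ends   <- cumsum(block_sizes)
  starts <- ends - block_sizes + 1L
  Map(function(s, e) X[, s:e, drop = FALSE], starts, ends)
}

X <- matrix(seq_len(24), nrow = 4)          # 4 samples x 6 predictors
blocks <- split_blocks(X, block_sizes = c(2L, 3L, 1L))
sapply(blocks, ncol)
```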
MIR-PLS — formula entry point.
mir_pls(formula, data, ncomp = 2L, na.action = stats::na.omit)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
A 'n4m_method_fit' object carrying the fitted model, in-sample predictions, training RMSE, and method-specific metadata. Use 'predict()' for inference and 'coef()' to extract regression coefficients.
MIR-PLS — Multivariate Inverse Regression PLS.
mir_pls_fit(X, Y, n_components)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
A list with 'coefficients', 'predictions', 'x_mean', 'y_mean', 'rmse'.
Missing-aware NIPALS — formula entry point.
missing_aware_nipals(formula, data, ncomp = 2L, na.action = stats::na.pass)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
na.action |
What to do with 'NA's. Default: 'na.pass', which keeps rows containing missing values. |
A 'n4m_method_fit' object carrying the fitted model, in-sample predictions, training RMSE, and method-specific metadata. Use 'predict()' for inference and 'coef()' to extract regression coefficients.
Missing-aware NIPALS PLS (Nelson 1996).
missing_aware_nipals_fit(X, Y, n_components)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
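Missing-data NIPALS variants generally replace each inner product by a regression over the observed entries only. The sketch below shows that idea for the score step: each sample's score is computed from the columns it actually has. The exact update used by the package is not documented here, and 'masked_scores' is a hypothetical helper.

```r
# Sketch: scores t_i = sum_j(x_ij * w_j) / sum_j(w_j^2), over observed j only.
masked_scores <- function(X, w) {
  obs <- !is.na(X)
  X0  <- X
  X0[!obs] <- 0                                   # missing cells contribute nothing
  as.numeric(X0 %*% w) / as.numeric((obs * 1) %*% w^2)
}

w <- c(1, 2, 3)
X <- rbind(2 * w,            # complete row, true score 2
           c(2, NA, 6),      # same row with one missing cell
           -1 * w)           # complete row, true score -1
masked_scores(X, w)
```

Rescaling the denominator by the observed weights is what lets the second row recover the same score as the first despite its missing cell.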
N-PLS (3-way tensor) regression. 'X_flat' is the flattened (n, mode_j*mode_k) matrix.
n_pls_fit(X_flat, Y, n_components, mode_j, mode_k)
X_flat |
Numeric matrix. The flattened 3-way design tensor (rows = samples, cols = mode_j * mode_k). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
mode_j |
Integer. Size of the first non-sample tensor mode. |
mode_k |
Integer. Size of the second non-sample tensor mode. |
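A 3-way array can be flattened to an n x (mode_j * mode_k) matrix in two column orders, and this page does not state which one the native code expects. The sketch shows both so the layout can be checked against the package before use; the variable names are illustrative.

```r
n <- 4L; J <- 3L; K <- 2L
arr <- array(seq_len(n * J * K), dim = c(n, J, K))

# mode_j varies fastest: element (i, j, k) lands in column j + (k - 1) * J
X_j_fast <- matrix(arr, nrow = n)

# mode_k varies fastest: element (i, j, k) lands in column k + (j - 1) * K
X_k_fast <- matrix(aperm(arr, c(1, 3, 2)), nrow = n)

dim(X_j_fast)
dim(X_k_fast)
```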
Loaded ABI version as an integer vector (major, minor, patch).
n4m_abi_version()
An integer vector of length 3.
Accepts a numeric matrix X (n x p) and a numeric vector or matrix Y (n x q). Both are coerced to double precision and row-major contiguous before being passed across the C boundary.
n4m_fit(X, Y, algo, n_components, store_scores = FALSE, center_x = TRUE, scale_x = TRUE, center_y = TRUE, scale_y = TRUE)
X |
Numeric matrix, n x p. |
Y |
Numeric matrix or vector. |
algo |
Character. Solver name (see Details). |
n_components |
Integer >= 1. |
store_scores |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
center_x |
Logical preprocessing flag: center X before fitting (default 'TRUE'). |
scale_x |
Logical preprocessing flag: scale X before fitting (default 'TRUE'). |
center_y |
Logical preprocessing flag: center Y before fitting (default 'TRUE'). |
scale_y |
Logical preprocessing flag: scale Y before fitting (default 'TRUE'). |
'algo' selects the solver. Recognized values: "pls_nipals", "pls_orthogonal_scores", "pls_simpls", "pls_kernel_algorithm", "pls_wide_kernel", "pls_svd", "pls_power", "pls_randomized_svd", "pcr_svd", "opls_nipals".
An external pointer wrapping the fitted model handle. Pass it to [n4m_predict()] to obtain predictions. The model is freed automatically when the external pointer is garbage-collected.
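The coercion mentioned for n4m_fit() (double precision, row-major contiguous) matters because R stores matrices column-major. A minimal sketch of what such a conversion looks like in plain R follows; how the package performs it internally is not shown on this page.

```r
X <- matrix(1:6, nrow = 2)        # rows are (1, 3, 5) and (2, 4, 6)

# R's native storage walks down columns:
as.double(X)                      # 1 2 3 4 5 6

# Transposing first yields the row-major sequence a C routine would read:
row_major <- as.double(t(X))
row_major                         # 1 3 5 2 4 6

# A response vector becomes an n x 1 double matrix:
Y <- as.matrix(as.double(c(10L, 20L)))
dim(Y)
```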
Low-level n4m method dispatcher.
n4m_method(algo, X, Y, n_components, params = list(), center_x = TRUE, scale_x = FALSE, center_y = TRUE, scale_y = FALSE)
algo |
Character; algorithm name (see Details). |
X |
Numeric matrix or NULL (for one_se_rule_compute). |
Y |
Numeric matrix or vector. Pass an n x 1 placeholder for classifier-style fits where labels go in 'params$y_labels'. |
n_components |
Positive integer. |
params |
Named list of algorithm-specific parameters (sparsity_lambda, sample_weights, block_sizes, X_target, y_labels, alpha_thresholds, ...). |
center_x, scale_x, center_y, scale_y |
Boolean preprocessing flags (default: centering, no scaling; matches the Python tier-1 defaults). |
Supported algorithm names:
MethodResult fits (33): "sparse_simpls" "cppls" "ecr" "di_pls" "weighted_pls" "robust_pls" "ridge_pls" "continuum_regression" "recursive_pls" "n_pls" "kernel_pls" "o2pls" "sparse_pls_da" "group_sparse_pls" "fused_sparse_pls" "so_pls" "on_pls" "rosa" "bagging_pls" "boosting_pls" "random_subspace_pls" "gpr_pls" "pls_glm" "pls_qda" "pls_cox" "pds" "ds" "mir_pls" "missing_aware_nipals" "mb_pls" "lw_pls" "pls_lda" "pls_logistic"
Selectors (25): "spa_select" "cars_select" "interval_select" "stability_select" "uve_select" "random_frog_select" "scars_select" "ga_select" "pso_select" "vissa_select" "shaving_select" "bve_select" "t2_select" "wvc_select" "wvc_threshold_select" "emcuve_select" "randomization_select" "bipls_select" "sipls_select" "rep_select" "ipw_select" "st_select" "iriv_select" "irf_select" "vip_spa_select"
Diagnostics (2 via dispatcher; the other 2 stay in r_methods.c): "approximate_press_compute" "one_se_rule_compute"
A named list with the MethodResult arrays + scalars. Index fields ('selected_indices', 'top_k_intervals', ...) are 1-based.
Predict with a fitted n4m model.
n4m_predict(model, X)
model |
External pointer returned by [n4m_fit()]. |
X |
Numeric matrix, n_new x p. |
Numeric matrix, n_new x n_targets.
Runtime version string of the loaded libn4m.
n4m_version()
A character scalar like "0.67.0+abi.1.1.0".
O2-PLS — formula entry point (uses n_predictive for component count).
o2pls(formula, data, n_predictive = 2L, n_x_orthogonal = 1L, n_y_orthogonal = 1L, na.action = stats::na.omit)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
n_predictive |
Integer. Number of predictive components (used as the component count). |
n_x_orthogonal |
Integer. Number of X-orthogonal components. |
n_y_orthogonal |
Integer. Number of Y-orthogonal components. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
An 'n4m_method_fit' object carrying the fitted model, in-sample predictions, training RMSE, and method-specific metadata. Use 'predict()' for inference and 'coef()' to extract regression coefficients.
O2-PLS (bi-directional OPLS).
o2pls_fit(X, Y, n_predictive = 2L, n_x_orthogonal = 1L, n_y_orthogonal = 1L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_predictive |
Integer. Number of predictive components. |
n_x_orthogonal |
Integer. Number of X-orthogonal components. |
n_y_orthogonal |
Integer. Number of Y-orthogonal components. |
OnPLS — Orthogonal multi-block PLS (joint + unique loadings).
on_pls_fit(X, Y, n_joint, n_unique_per_block, block_sizes)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_joint |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
n_unique_per_block |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
block_sizes |
Integer vector. Per-block feature counts for multi-block PLS. |
One-SE rule from a (max_components × n_folds) fold RMSE matrix.
one_se_rule(fold_rmse_matrix)
fold_rmse_matrix |
Numeric matrix of per-fold RMSE values with max_components rows and n_folds columns. |
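The one-SE rule can be illustrated in base R. The sketch below is not the package's implementation: it assumes rows index component counts and columns index folds, and 'one_se_sketch' is a hypothetical helper name. It picks the smallest component count whose mean RMSE lies within one standard error of the minimum.

```r
# Illustrative one-SE rule; a sketch, not the code behind one_se_rule().
one_se_sketch <- function(fold_rmse) {
  stopifnot(is.matrix(fold_rmse), ncol(fold_rmse) >= 2L)
  mean_rmse <- rowMeans(fold_rmse)
  # Standard error of the mean across folds, per component count.
  se_rmse <- apply(fold_rmse, 1L, sd) / sqrt(ncol(fold_rmse))
  best <- which.min(mean_rmse)
  cutoff <- mean_rmse[best] + se_rmse[best]
  # Smallest component count whose mean RMSE is within one SE of the minimum.
  chosen <- which(mean_rmse <= cutoff)[1L]
  list(best = best, chosen = chosen, cutoff = cutoff)
}

# Toy matrix: 4 component counts (rows) x 3 folds (columns).
fold_rmse <- rbind(
  c(1.00, 1.10, 0.90),
  c(0.52, 0.60, 0.56),
  c(0.50, 0.58, 0.54),
  c(0.55, 0.62, 0.57)
)
res <- one_se_sketch(fold_rmse)
```

Here the minimum mean RMSE is at three components, but two components fall inside the one-SE band, so the more parsimonious model is preferred.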
Uses the same formula/S3 return contract as [pls()], with the model API configured as 'algo = "opls_nipals"'.
opls(formula, data, ncomp = 2L, na.action = stats::na.omit)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
An object of class 'c("opls_fit", "n4m_fit", "pls_fit")'.
Piecewise Direct Standardization (calibration transfer).
pds_fit(X_source, X_target, window_half_width = 2L)
X_source |
Numeric matrix measured in the source domain (rows = samples, cols = features). |
X_target |
Numeric matrix measured in the target domain, with the same samples as 'X_source'. |
window_half_width |
Integer. Half-width of the local window, in feature columns. |
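For orientation, piecewise direct standardization can be sketched in base R. This is an assumption-laden illustration, not the package's code: it assumes centered data, paired rows in both matrices, a plain least-squares fit in each window, and a source-to-target mapping direction. 'pds_sketch' is a hypothetical name.

```r
# Illustrative PDS: one local least-squares fit per target channel.
pds_sketch <- function(X_source, X_target, window_half_width = 2L) {
  stopifnot(all(dim(X_source) == dim(X_target)))
  p <- ncol(X_source)
  F_mat <- matrix(0, p, p)
  for (j in seq_len(p)) {
    win <- max(1L, j - window_half_width):min(p, j + window_half_width)
    # Regress target channel j on the neighbouring source channels.
    F_mat[win, j] <- qr.solve(X_source[, win, drop = FALSE], X_target[, j])
  }
  F_mat  # banded transfer matrix: X_source %*% F_mat approximates X_target
}

set.seed(42)
X_source <- matrix(rnorm(30 * 8), 30, 8)
# Toy target: each channel blends the source channel with its right neighbour.
B_true <- diag(0.9, 8)
B_true[cbind(2:8, 1:7)] <- 0.1
X_target <- X_source %*% B_true
F_mat <- pds_sketch(X_source, X_target, window_half_width = 2L)
max_err <- max(abs(X_source %*% F_mat - X_target))
```

Because the toy distortion stays inside the window, the banded matrix reproduces the target exactly up to rounding. Real spectra are usually collinear inside a window, which is why practical PDS variants tend to use PCR or PLS for the local fits.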
Tier-2 idiomatic R interface for tier-1 [n4m_fit()] / [n4m_predict()]. Provides a formula entry point and S3 generics so the returned object plays well with 'predict()', 'summary()', 'coef()', 'print()', 'cv.glmnet'-style workflows, and any caret model that supports a 'predict(object, newdata)' adapter.
When the first argument is not a formula, 'pls(x, y, ...)' dispatches to the matrix-oriented [pls_mdatools()] compatibility facade.
pls(
  formula, data,
  ncomp = 2L, algo = "pls_nipals",
  na.action = stats::na.omit, ...
)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
algo |
Character. One of '"pls_nipals"', '"pls_simpls"', '"pls_svd"', '"pls_power"', '"pls_kernel_algorithm"', '"pls_wide_kernel"', '"pls_orthogonal_scores"', '"pls_randomized_svd"', '"pcr_svd"', '"opls_nipals"'. Defaults to '"pls_nipals"' to match the R 'pls' package's default. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
... |
For matrix-style calls, forwarded to [pls_mdatools()]. |
An object of class 'c("n4m_fit", "pls_fit")' with components:
* 'model': external pointer to the libn4m model handle
* 'formula': the call's formula
* 'terms': the 'terms()' object describing the model
* 'xlevels': factor levels of the predictors, for newdata coercion
* 'call': the original call (for 'print' / 'summary')
* 'ncomp': components used
* 'algo': solver used
* 'n_features_in': predictor count
* 'response_name': left-hand side of the formula (string)
## Not run:
set.seed(0)
df <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  x3 = rnorm(100)
)
df$y <- 2 * df$x1 - df$x2 + rnorm(100, sd = 0.1)
fit <- pls(y ~ ., data = df, ncomp = 3)
summary(fit)
preds <- predict(fit, newdata = df)
## End(Not run)
PLS-Cox proportional hazards.
pls_cox_fit(X, n_components, survival_times, event_indicators)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
n_components |
Integer. Number of latent components. |
survival_times |
Numeric vector of observed survival (or censoring) times, one per sample. |
event_indicators |
Vector with one entry per sample indicating whether the event was observed. |
PLS diagnostics: T², Q, DModX from a fitted model.
pls_diagnostics(model, X)
model |
External pointer from 'n4m_fit()'. |
X |
Numeric matrix to score (typically the training matrix). |
A list with 't2', 'q', 'dmodx' — each is a 1-row matrix of length nrow(X).
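To make the T² and Q statistics concrete, here is a base-R sketch built on a minimal single-response NIPALS fit. It is only an illustration: 'pls1_sketch' and 't2_q_sketch' are hypothetical helpers, DModX is left out, and the scaling conventions used by 'pls_diagnostics()' may differ.

```r
# Minimal PLS1 (NIPALS) fit, only to obtain weights and loadings.
pls1_sketch <- function(X, y, ncomp) {
  x_mean <- colMeans(X)
  E <- sweep(X, 2L, x_mean)
  f <- y - mean(y)
  W <- P <- matrix(0, ncol(X), ncomp)
  for (a in seq_len(ncomp)) {
    w <- crossprod(E, f)
    w <- w / sqrt(sum(w^2))
    sc <- E %*% w                          # X-scores for component a
    p_a <- crossprod(E, sc) / sum(sc^2)    # X-loadings for component a
    f <- f - sc * sum(f * sc) / sum(sc^2)  # deflate the response
    E <- E - tcrossprod(sc, p_a)           # deflate X
    W[, a] <- w
    P[, a] <- p_a
  }
  # R maps centered X directly to scores: T = Xc %*% R.
  list(R = W %*% solve(crossprod(P, W)), P = P, x_mean = x_mean)
}

t2_q_sketch <- function(fit, X_train, X_new) {
  center <- function(X) sweep(X, 2L, fit$x_mean)
  scores_train <- center(X_train) %*% fit$R
  scores_new <- center(X_new) %*% fit$R
  score_var <- apply(scores_train, 2L, var)
  # Hotelling T2: squared scores scaled by the training score variance.
  t2 <- rowSums(sweep(scores_new^2, 2L, score_var, "/"))
  # Q: squared X-residual left after projecting onto the latent space.
  resid <- center(X_new) - tcrossprod(scores_new, fit$P)
  list(t2 = t2, q = rowSums(resid^2))
}

set.seed(1)
X <- matrix(rnorm(40 * 6), 40, 6)
y <- X[, 1] - X[, 2] + rnorm(40, sd = 0.1)
fit <- pls1_sketch(X, y, ncomp = 2L)
diag_train <- t2_q_sketch(fit, X, X)
```

On the training rows, the mean T² under this definition equals ncomp * (n - 1) / n, which gives a quick sanity check of the scaling.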
PLS-GLM — formula entry point. Default is Gaussian; set 'family = "poisson"' for Poisson IRLS.
pls_glm(
  formula, data,
  ncomp = 2L, family = "gaussian",
  na.action = stats::na.omit
)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
family |
Character. '"gaussian"' (default) or '"poisson"' for Poisson IRLS. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
An 'n4m_method_fit' object carrying the fitted model, in-sample predictions, training RMSE, and method-specific metadata. Use 'predict()' for inference and 'coef()' to extract regression coefficients.
PLS-GLM — Gaussian (default) or Poisson IRLS.
pls_glm_fit(X, Y, n_components, poisson = FALSE)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
poisson |
Logical; TRUE selects the Poisson-link path. |
A list with 'coefficients', 'intercept', 'predictions', 'x_mean', 'rmse'.
PLS-LDA — Linear Discriminant Analysis on PLS scores.
pls_lda_fit(X, y_labels, n_components, n_classes = NULL)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
y_labels |
Integer vector. Class labels. |
n_components |
Integer. Number of latent components. |
n_classes |
Integer >= 2. Number of class labels. |
Multinomial logistic regression on PLS scores.
pls_logistic_fit(X, y_labels, n_components, n_classes = NULL)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
y_labels |
Integer vector. Class labels. |
n_components |
Integer. Number of latent components. |
n_classes |
Integer >= 2. Number of class labels. |
Matrix-oriented PLS facade modelled after mdatools::pls(x, y, ...)
for NIRS/chemometrics workflows. The top-level pls(x, y, ...)
function dispatches here when its first argument is not a formula.
pls_mdatools(
  x, y,
  ncomp = min(nrow(x) - 1L, ncol(x), 20L),
  center = TRUE, scale = FALSE, cv = NULL,
  exclcols = NULL, exclrows = NULL,
  x.test = NULL, y.test = NULL,
  method = c("simpls", "cppls", "kernelpls", "widekernelpls", "oscorespls", "nipals", "pcr"),
  info = "", ncomp.selcrit = "min", lim.type = "ddmoments",
  alpha = 0.05, gamma = 0.5, cv.scope = "local",
  prep = NULL, fit_components = TRUE, ...
)
x |
Numeric predictor matrix. |
y |
Numeric response vector or matrix. |
ncomp |
Maximum number of components. |
center |
Logical; center predictors and response. |
scale |
Logical; standardize predictors and response. |
cv |
|
exclcols |
Optional predictor columns to exclude. |
exclrows |
Optional rows to exclude before fitting. |
x.test |
Optional external test predictor matrix. |
y.test |
Optional external test response vector or matrix. |
method |
PLS/PCR algorithm selector. |
info |
Free-form model label. |
ncomp.selcrit |
Stored for compatibility. |
lim.type |
Stored for compatibility. |
alpha |
Stored for compatibility. |
gamma |
CPPLS gamma when |
cv.scope |
Stored for compatibility. |
prep |
Optional preprocessing function applied to |
fit_components |
Logical; fit component prefixes 1:ncomp. |
... |
Reserved for compatibility. |
An n4m_mdatools_pls object with calibration results in
calres; optional cross-validation and test results are stored in
cvres and testres.
## Not run:
set.seed(1)
X <- matrix(rnorm(80), 20, 4)
y <- X[, 1] - X[, 2] + rnorm(20, sd = 0.05)
fit <- pls_mdatools(X, y, ncomp = 3, cv = 5)
predict(fit, x = X)
fit$calres$rmse
fit$cvres$rmse
## End(Not run)
Computes phase-1 thresholds on 'X_reference', then evaluates 'X_monitor' against those thresholds and reports per-row alarms.
pls_monitoring(model, X_reference, X_monitor, alpha = 0.95)
model |
External pointer from 'n4m_fit()'. |
X_reference |
Phase-1 numeric matrix (used to set thresholds). |
X_monitor |
Phase-2 numeric matrix (rows are evaluated). |
alpha |
Confidence level (e.g. 0.95). Thresholds correspond to the (1 - alpha) quantile. |
A list with 't2', 'q', 't2_alarms', 'q_alarms', 'any_alarms', 't2_threshold', 'q_threshold'.
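The two-phase logic can be sketched in base R. Several things here are assumptions rather than facts about 'pls_monitoring()': PCA loadings stand in for the fitted PLS model, thresholds are empirical quantiles of the reference statistics, and alpha = 0.95 is read as "about 95% of reference rows fall below the threshold". 'monitor_sketch' is a hypothetical name.

```r
# Illustrative phase-1 / phase-2 monitoring with a PCA stand-in model.
monitor_sketch <- function(X_reference, X_monitor, ncomp = 2L, alpha = 0.95) {
  mu <- colMeans(X_reference)
  V <- svd(sweep(X_reference, 2L, mu))$v[, seq_len(ncomp), drop = FALSE]
  project <- function(X) {
    Z <- sweep(X, 2L, mu)
    sc <- Z %*% V
    list(sc = sc, q = rowSums((Z - tcrossprod(sc, V))^2))
  }
  ref <- project(X_reference)
  score_var <- apply(ref$sc, 2L, var)
  t2_of <- function(sc) rowSums(sweep(sc^2, 2L, score_var, "/"))
  # Phase 1: thresholds from the reference rows only.
  t2_threshold <- unname(quantile(t2_of(ref$sc), alpha))
  q_threshold <- unname(quantile(ref$q, alpha))
  # Phase 2: flag monitored rows that exceed either threshold.
  mon <- project(X_monitor)
  t2_alarms <- t2_of(mon$sc) > t2_threshold
  q_alarms <- mon$q > q_threshold
  list(t2_alarms = t2_alarms, q_alarms = q_alarms,
       any_alarms = t2_alarms | q_alarms,
       t2_threshold = t2_threshold, q_threshold = q_threshold)
}

set.seed(11)
X_reference <- matrix(rnorm(100 * 5), 100, 5)
# Five in-control rows followed by one strongly shifted row.
X_monitor <- rbind(matrix(rnorm(5 * 5), 5, 5), rep(10, 5))
res <- monitor_sketch(X_reference, X_monitor)
```

The shifted last row is far from the reference cloud, so it must exceed at least one of the two thresholds. Parametric limits (F or chi-squared based) are a common alternative to empirical quantiles.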
PLS-QDA (Quadratic Discriminant Analysis on PLS scores).
pls_qda_fit(X, y_labels, n_components)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
y_labels |
Integer vector. Class labels. |
n_components |
Integer. Number of latent components. |
These functions provide the familiar formula-oriented surface of the
R pls package without importing it. Computation is backed by
n4m; the facade is intended for NIRS/chemometrics scripts
that already use plsr(), pcr(), predict(),
RMSEP() and selectNcomp() patterns.
plsr(
  formula, ncomp = 2L, data, subset, na.action = stats::na.omit,
  method = c("kernelpls", "widekernelpls", "simpls", "oscorespls", "cppls", "nipals"),
  scale = FALSE, validation = c("none", "CV"), segments = 10L, center = TRUE,
  fit_components = TRUE, gamma = 0.5, sparsity_lambda = 0, ...
)
pcr(
  formula, ncomp = 2L, data, subset, na.action = stats::na.omit,
  method = c("svdpc", "pcr"),
  scale = FALSE, validation = c("none", "CV"), segments = 10L, center = TRUE,
  fit_components = TRUE, ...
)
mvr(
  formula, ncomp = 2L, data, subset, na.action = stats::na.omit,
  method = c("kernelpls", "widekernelpls", "simpls", "oscorespls", "cppls", "nipals", "pcr"),
  scale = FALSE, validation = c("none", "CV"), segments = 10L, center = TRUE,
  model = c("plsr", "pcr"), fit_components = TRUE, gamma = 0.5, sparsity_lambda = 0, ...
)
MSEP(object, ...)
RMSEP(object, ...)
R2(object, ...)
selectNcomp(object, method = c("min"), estimate = c("CV", "train"), ...)
formula |
A two-sided model formula. |
ncomp |
Number of latent components. |
data |
Data frame containing formula variables. |
subset |
Optional row subset. |
na.action |
Missing-value action. |
method |
PLS/PCR algorithm selector. |
scale |
Logical; standardize predictors and response. |
validation |
|
segments |
Number of contiguous CV segments, or a list of assessment row indices. |
center |
Logical; center predictors and response. |
fit_components |
Logical; fit component prefixes 1:ncomp for component-wise prediction and metrics. |
gamma |
CPPLS gamma. |
sparsity_lambda |
Sparse-SIMPLS soft threshold for the dense SIMPLS-compatible path. |
model |
|
object |
A |
estimate |
Metric source, "CV" or "train". |
... |
Reserved for compatibility. |
plsr(), pcr() and mvr() return an
n4m_mvr object. MSEP(), RMSEP() and
R2() return a small mvrVal-compatible list with a
val array. selectNcomp() returns an integer component
count.
## Not run:
set.seed(1)
df <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
df$y <- 2 * df$x1 - df$x2 + rnorm(40, sd = 0.05)
fit <- plsr(y ~ ., data = df, ncomp = 2, method = "simpls")
predict(fit, df, ncomp = 1:2)
RMSEP(fit)
selectNcomp(fit)
## End(Not run)
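The 'segments' argument is described as a number of contiguous CV segments. One way such segments could be formed is sketched below in base R. The exact boundaries used by the facade are not documented here, so treat the rounding rule and the helper name 'contiguous_segments' as assumptions.

```r
# Illustrative contiguous CV segments: consecutive rows share a fold.
contiguous_segments <- function(n, segments = 10L) {
  stopifnot(segments >= 1L, segments <= n)
  # Fold label per row, e.g. n = 8, segments = 4 -> 1 1 2 2 3 3 4 4.
  fold_id <- ceiling(seq_len(n) * segments / n)
  split(seq_len(n), fold_id)
}

segs <- contiguous_segments(40L, segments = 10L)
# Each list element holds the assessment rows of one fold.
train_rows_fold1 <- setdiff(seq_len(40L), segs[[1]])
```

Contiguous folds keep neighbouring rows together, which matters when consecutive rows are replicates or time-ordered measurements.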
Predict from a [pls()]-fitted model.
## S3 method for class 'n4m_fit'
predict(object, newdata = NULL, ...)
object |
An 'n4m_fit' returned by [pls()]. |
newdata |
A 'data.frame' (or matrix). If 'NULL', predictions on the training matrix are returned. |
... |
Ignored (for S3 generic compatibility). |
Numeric vector (for single-target regression) or matrix (multi-target).
Predict from a MethodResult-based n4m fit.
## S3 method for class 'n4m_method_fit'
predict(object, newdata = NULL, ...)
object |
An 'n4m_method_fit' object. |
newdata |
Optional new data to predict on. Defaults to 'NULL'. |
... |
Further arguments accepted for S3 generic compatibility. |
PSO-PLS (Binary Particle Swarm Optimization).
pso_select(
  X, Y, n_components,
  n_swarm = 30L, n_iterations = 50L,
  w = 0.729, c1 = 1.494, c2 = 1.494, v_max = 4,
  seed = 0L
)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_swarm |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
n_iterations |
Integer >= 1. Number of outer-loop iterations. |
w |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
c1 |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
c2 |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
v_max |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
seed |
Integer. Random seed for reproducibility. |
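The parameter names 'w', 'c1', 'c2' and 'v_max' match the textbook binary PSO update, so a single particle step is sketched below in base R. Whether 'pso_select()' uses exactly this sigmoid transfer rule is an assumption, and 'bpso_step' is a hypothetical helper.

```r
# One textbook binary-PSO step for a single particle (feature mask).
bpso_step <- function(pos, vel, pbest, gbest,
                      w = 0.729, c1 = 1.494, c2 = 1.494, v_max = 4) {
  n <- length(pos)
  r1 <- runif(n)
  r2 <- runif(n)
  # Inertia plus pulls toward the personal-best and global-best masks.
  vel <- w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
  vel <- pmin(pmax(vel, -v_max), v_max)   # clamp velocity to [-v_max, v_max]
  prob <- 1 / (1 + exp(-vel))             # sigmoid transfer function
  # Each feature is switched on with probability prob.
  list(pos = as.integer(runif(n) < prob), vel = vel)
}

set.seed(7)
n_features <- 12L
pos <- rbinom(n_features, 1L, 0.5)
vel <- rep(100, n_features)               # deliberately large to show the clamp
pbest <- pos
gbest <- rep(1L, n_features)
step <- bpso_step(pos, vel, pbest, gbest)
```

In a full selector, each particle's mask would be scored by a cross-validated PLS error, and 'pbest' and 'gbest' would be updated after every step.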
Random Frog variable selection.
random_frog_select(
  X, Y, n_components,
  n_iterations = 100L, initial_size = 30L,
  min_size = NULL, max_size = NULL,
  top_k = 10L, seed = 0L
)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_iterations |
Integer >= 1. Number of outer-loop iterations. |
initial_size |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
min_size |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
max_size |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
top_k |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
seed |
Integer. Random seed for reproducibility. |
Random-subspace PLS — formula entry point.
random_subspace_pls(
  formula, data,
  ncomp = 2L, n_estimators = 50L, features_per_subspace = 10L,
  seed = 0L, na.action = stats::na.omit
)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
n_estimators |
Integer >= 1. Number of bootstrap / boosting / random-subspace estimators. |
features_per_subspace |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
seed |
Integer. Random seed for reproducibility. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
An 'n4m_method_fit' object carrying the fitted model, in-sample predictions, training RMSE, and method-specific metadata. Use 'predict()' for inference and 'coef()' to extract regression coefficients.
Random-subspace PLS (Ho 1998).
random_subspace_pls_fit(
  X, Y, n_components,
  n_estimators = 50L, features_per_subspace = 10L, seed = 0L
)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_estimators |
Integer >= 1. Number of bootstrap / boosting / random-subspace estimators. |
features_per_subspace |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
seed |
Integer. Random seed for reproducibility. |
Randomization test selector.
randomization_select(
  X, Y, n_components,
  n_permutations = 100L, randomization_seed = 0L, alpha = 0.05
)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_permutations |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
randomization_seed |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
alpha |
Numeric in [0, 1]. Significance level for the randomization test. |
Recursive PLS stores in-sample predictions rather than a reusable coefficient model. 'predict()' therefore accepts only the fitted training design.
recursive_pls(
  formula, data,
  ncomp = 2L, window_size = 50L,
  na.action = stats::na.omit
)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
window_size |
Integer moving-window size. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
Moving-window recursive PLS.
recursive_pls_fit(X, Y, n_components, window_size = 50L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
window_size |
Integer moving-window size. |
REP-PLS.
rep_select(
  X, Y, n_components,
  n_steps = 10L, min_features = 5L, remove_count = 1L
)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_steps |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
min_features |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
remove_count |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
Ridge PLS — formula entry point.
ridge_pls(
  formula, data,
  ncomp = 2L, ridge_lambda = 1,
  na.action = stats::na.omit
)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
ridge_lambda |
Numeric >= 0. Ridge regularisation strength. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
An 'n4m_method_fit' object carrying the fitted model, in-sample predictions, training RMSE, and method-specific metadata. Use 'predict()' for inference and 'coef()' to extract regression coefficients.
L2-augmented PLS regression.
ridge_pls_fit(X, Y, n_components, ridge_lambda = 1)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
ridge_lambda |
Numeric >= 0. Ridge regularisation strength. |
Robust PLS — formula entry point.
robust_pls(
  formula, data,
  ncomp = 2L, huber_k = 1.345, max_irls_iter = 20L,
  na.action = stats::na.omit
)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
huber_k |
Numeric >= 0. Huber loss tuning constant. |
max_irls_iter |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
An 'n4m_method_fit' object carrying the fitted model, in-sample predictions, training RMSE, and method-specific metadata. Use 'predict()' for inference and 'coef()' to extract regression coefficients.
Robust PLS via Huber IRLS.
robust_pls_fit(X, Y, n_components, huber_k = 1.345, max_irls_iter = 20L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
huber_k |
Numeric >= 0. Huber loss tuning constant. |
max_irls_iter |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
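The Huber IRLS idea behind 'huber_k' and 'max_irls_iter' can be sketched in base R. To stay self-contained, the example uses weighted least squares ('lm.wfit') as a stand-in for the PLS step and a MAD residual scale. Both choices are assumptions, not a description of 'robust_pls_fit()'.

```r
# Huber weights: 1 inside the threshold, shrinking as k / |u| outside it.
huber_weights <- function(resid, k = 1.345) {
  s <- mad(resid)                 # robust residual scale (assumed choice)
  u <- abs(resid) / s
  ifelse(u <= k, 1, k / u)
}

# IRLS loop; weighted least squares stands in for the weighted PLS fit.
irls_sketch <- function(X, y, k = 1.345, max_iter = 20L) {
  Xd <- cbind(1, X)               # add an intercept column
  wts <- rep(1, nrow(X))
  for (i in seq_len(max_iter)) {
    beta <- lm.wfit(Xd, y, wts)$coefficients
    wts <- huber_weights(y - drop(Xd %*% beta), k)
  }
  list(coefficients = beta, weights = wts)
}

set.seed(3)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50, sd = 0.1)
y[1:3] <- y[1:3] + 20             # three gross outliers
fit <- irls_sketch(cbind(x), y)
slope <- unname(fit$coefficients[2])
```

The three contaminated rows end up with weights close to zero, so the slope stays near its true value of 2. The constant 1.345 is the usual choice giving about 95% efficiency under Gaussian noise.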
ROSA — Response-Oriented Sequential Alternation.
rosa_fit(X, Y, n_components, block_sizes)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
block_sizes |
Integer vector. Per-block feature counts for multi-block PLS. |
SCARS — Stability + CARS.
scars_select(
  X, Y, n_components,
  n_iterations = 50L, min_features = 5L,
  sample_fraction = 0.8, seed = 0L
)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_iterations |
Integer >= 1. Number of outer-loop iterations. |
min_features |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
sample_fraction |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
seed |
Integer. Random seed for reproducibility. |
Selectivity-ratio ranker.
selectivity_ratio_select(model, X, top_k)
model |
External pointer from 'n4m_fit()'. |
X |
Numeric matrix used for the fit (re-passed for diagnostics). |
top_k |
Integer; number of features to return. |
A list with 'scores' and 'selected_indices'.
Shaving selector.
shaving_select(
  X, Y, n_components,
  n_steps = 10L, min_features = 5L, shave_fraction = 0.1
)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_steps |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
min_features |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
shave_fraction |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
siPLS — synergistic interval PLS.
sipls_select(X, Y, n_components, interval_width = 10L, combination_size = 2L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
interval_width |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
combination_size |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
Sequential & Orthogonalised multi-block PLS (Næs et al. 2011). 'block_sizes' is an integer vector summing to ncol(X); 'n_components_per_block' is an integer vector of the same length.
so_pls_fit(X, Y, block_sizes, n_components_per_block)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
block_sizes |
Integer vector. Per-block feature counts for multi-block PLS. |
n_components_per_block |
Integer vector with one component count per block (same length as 'block_sizes'). |
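Since 'block_sizes' must sum to ncol(X), a quick way to see how a concatenated matrix maps to blocks is to split it by hand. The base-R sketch below assumes blocks are consecutive column ranges in the order given; that layout and the helper name 'split_blocks' are assumptions, not confirmed package behaviour.

```r
# Split a column-concatenated matrix into blocks of the given sizes.
split_blocks <- function(X, block_sizes) {
  stopifnot(sum(block_sizes) == ncol(X))
  # Block label for every column, e.g. sizes c(2, 3) -> 1 1 2 2 2.
  block_id <- rep(seq_along(block_sizes), times = block_sizes)
  lapply(split(seq_len(ncol(X)), block_id),
         function(cols) X[, cols, drop = FALSE])
}

X <- matrix(seq_len(4 * 5), nrow = 4, ncol = 5)
blocks <- split_blocks(X, block_sizes = c(2L, 3L))
```

Running the 'stopifnot()' check before fitting catches the most common mistake, a block size vector that does not cover every column.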
Fit-data selector: pass (X, Y) and the desired number of features.
spa_select(X, Y, n_components, top_k)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
top_k |
Number of features to select. |
A list with 'selected_indices' (1-based) and 'best_rmse'.
Sparse SIMPLS — formula entry point.
sparse_pls(
  formula, data,
  ncomp = 2L, sparsity_lambda = 0.05,
  na.action = stats::na.omit
)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
sparsity_lambda |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
An 'n4m_method_fit' object carrying the fitted model, in-sample predictions, training RMSE, and method-specific metadata. Use 'predict()' for inference and 'coef()' to extract regression coefficients.
Sparse PLS-DA classifier ('y_labels' is an integer vector of class IDs).
sparse_pls_da_fit(X, y_labels, n_components, sparsity_lambda = 0.05)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
y_labels |
Integer vector. Class labels. |
n_components |
Integer. Number of latent components. |
sparsity_lambda |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
Sparse SIMPLS fit.
sparse_simpls_fit(X, Y, n_components, sparsity_lambda = 0.05)
X |
Numeric matrix. |
Y |
Numeric matrix or vector. |
n_components |
Integer >= 1. |
sparsity_lambda |
Soft-threshold magnitude per component (>= 0). |
A list with 'coefficients', 'predictions', 'x_mean', 'y_mean', 'weights_w', 'rmse'.
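The soft threshold named in 'sparsity_lambda' is the standard shrinkage operator, shown below in base R. How the package scales the weight vector before thresholding is not stated here, so the sketch applies the operator to raw values. 'soft_threshold' is a hypothetical helper.

```r
# Soft-thresholding: shrink every entry toward zero by lambda,
# and set entries with magnitude below lambda to exactly zero.
soft_threshold <- function(w, lambda) {
  stopifnot(lambda >= 0)
  sign(w) * pmax(abs(w) - lambda, 0)
}

w <- c(-0.30, -0.04, 0.00, 0.02, 0.10)
w_sparse <- soft_threshold(w, lambda = 0.05)
```

With lambda = 0.05 the three small entries become zero and the two large ones move 0.05 closer to zero, which is how a larger 'sparsity_lambda' yields sparser weight vectors.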
ST-PLS — score-threshold selector.
st_select(X, Y, n_components, thresholds, min_selected = NULL)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
thresholds |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
min_selected |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
Stability selector (coefficient stability, MCUVE-style).
stability_select(X, Y, n_components, top_k = 10L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
top_k |
Integer. Number of features to select. |
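The idea behind an MCUVE-style stability score can be sketched in base R: refit the model on many random subsamples, then rank each feature by the ratio of the mean to the standard deviation of its coefficient. The sketch below is an illustration under assumptions, not the package's algorithm: ordinary least squares stands in for the PLS fit, and 'n_runs', 'frac' and the simulated data are hypothetical.

```r
set.seed(7)
n <- 80; p <- 6
X <- matrix(rnorm(n * p), n, p)
# Only features 1 and 4 carry signal in this simulated example.
y <- 2 * X[, 1] - 1.5 * X[, 4] + rnorm(n, sd = 0.5)

n_runs <- 200   # number of Monte Carlo subsamples (assumed value)
frac   <- 0.8   # fraction of samples drawn per run (assumed value)
B <- matrix(0, n_runs, p)

for (r in seq_len(n_runs)) {
  idx <- sample.int(n, size = floor(frac * n))
  Xc  <- scale(X[idx, ], center = TRUE, scale = FALSE)
  yc  <- y[idx] - mean(y[idx])
  # OLS stand-in for a PLS fit on the subsample.
  B[r, ] <- solve(crossprod(Xc), crossprod(Xc, yc))
}

# Stability: mean coefficient divided by its standard deviation across runs.
stability <- colMeans(B) / apply(B, 2, sd)

# Keep the top_k features by absolute stability (1-based indices).
top_k <- 2L
selected <- order(abs(stability), decreasing = TRUE)[seq_len(top_k)]
```

Features whose coefficients keep the same sign and size across subsamples get a large absolute score; features that fluctuate around zero get a small one.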
T2-PLS - sweep over alpha thresholds.
t2_select(X, Y, n_components, alpha_thresholds, min_selected = NULL)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
alpha_thresholds |
Numeric vector. Alpha thresholds to sweep over. |
min_selected |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
UVE — Uninformative Variable Elimination.
uve_select(X, Y, n_components, noise_features = NULL, noise_seed = 0L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
noise_features |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
noise_seed |
Integer. Random seed for the generated noise features. |
Operates on an already-fitted n4m model handle (from 'n4m_fit()'). Returns the top 'top_k' features ranked by VIP score.
vip_select(model, X, top_k)
model |
External pointer from 'n4m_fit()'. |
X |
Numeric matrix used for the fit (re-passed for diagnostics). |
top_k |
Integer; number of features to return. |
A list with 'scores' (length p VIP scores) and 'selected_indices' (1-based, length top_k).
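For readers who want to see how VIP scores and the top 'top_k' indices relate, here is a self-contained base R sketch. It assumes the standard VIP definition for a single response and uses a small hand-written NIPALS fit; 'pls1_nipals()' and 'vip_scores()' are hypothetical helpers, not functions of this package, and the engine's own computation may differ in detail.

```r
# Minimal single-response NIPALS PLS (hypothetical helper), used only to
# obtain the weights W, the scores and the y-loadings q.
pls1_nipals <- function(X, y, ncomp) {
  X <- scale(X, center = TRUE, scale = FALSE)
  y <- y - mean(y)
  W <- matrix(0, ncol(X), ncomp)
  S <- matrix(0, nrow(X), ncomp)
  q <- numeric(ncomp)
  for (a in seq_len(ncomp)) {
    w <- crossprod(X, y)
    w <- w / sqrt(sum(w^2))               # unit-norm weight vector
    sc <- X %*% w                         # scores for component a
    ss_t <- sum(sc^2)
    x_load <- crossprod(X, sc) / ss_t     # X loading
    q[a] <- sum(y * sc) / ss_t            # y loading
    X <- X - tcrossprod(sc, x_load)       # deflate X
    y <- y - sc * q[a]                    # deflate y
    W[, a] <- w
    S[, a] <- sc
  }
  list(W = W, scores = S, q = q)
}

# VIP_j = sqrt(p * sum_a(SS_a * w_ja^2 / ||w_a||^2) / sum_a(SS_a)),
# where SS_a = q_a^2 * t_a' t_a is the y-variance explained by component a.
vip_scores <- function(fit) {
  p  <- nrow(fit$W)
  ss <- fit$q^2 * colSums(fit$scores^2)
  w2 <- sweep(fit$W^2, 2, colSums(fit$W^2), "/")
  sqrt(p * drop(w2 %*% ss) / sum(ss))
}

set.seed(42)
n <- 200; p <- 8
X <- matrix(rnorm(n * p), n, p)
y <- 3 * X[, 2] - 2 * X[, 5] + rnorm(n, sd = 0.1)

fit <- pls1_nipals(X, y, ncomp = 2L)
vip <- vip_scores(fit)

# 1-based indices of the top_k features, highest VIP first.
top_k <- 3L
selected_indices <- order(vip, decreasing = TRUE)[seq_len(top_k)]
```

By construction the squared VIP scores sum to the number of features, which is why a score above 1 is commonly read as "more important than average".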
VIP-SPA hybrid selector.
vip_spa_select(X, Y, n_components, vip_threshold = 0.3, top_k = 10L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
vip_threshold |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
top_k |
Integer. Number of features to select. |
VISSA — Variable Iterative Space Shrinkage Approach.
vissa_select(X, Y, n_components, n_iterations = 20L, n_submodels = 100L, ratio_kept = 0.1, threshold = 0.5, floor_probability = 0.01, seed = 0L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
n_iterations |
Integer >= 1. Number of outer-loop iterations. |
n_submodels |
Integer >= 1. Number of inner sub-models per VISSA-style iteration. |
ratio_kept |
Numeric in (0, 1]. Fraction of features kept per iteration. |
threshold |
Numeric. Convergence / pruning threshold. |
floor_probability |
Numeric in [0, 0.5). Lower bound on per-feature retention probability. |
seed |
Integer. Random seed for reproducibility. |
Sample-weighted PLS — formula entry point.
weighted_pls(formula, data, ncomp = 2L, weights, na.action = stats::na.omit)
formula |
A two-sided formula (e.g. 'y ~ .' or 'y ~ x1 + x2 + x3'). Response on the left, predictors on the right. |
data |
A 'data.frame' (or anything 'as.data.frame'-coercible) containing the response and predictor columns referenced by 'formula'. |
ncomp |
Integer. Number of latent components. |
weights |
Numeric vector of length nrow(data) with sample weights. |
na.action |
What to do with 'NA's. Default: 'na.omit'. |
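Because 'weights' must match 'nrow(data)' while 'na.action' may drop rows, it helps to see how a formula, a data frame and a weight vector can be kept aligned. The base R sketch below shows one plausible way a formula front end builds its inputs; it is an assumption about the general pattern, not a description of how 'weighted_pls()' is implemented, and the data are made up.

```r
# Hypothetical data with missing values in the response and one predictor.
df <- data.frame(
  y  = c(1.2, 2.3, NA, 4.1, 5.0),
  x1 = c(0.1, 0.4, 0.5, NA, 0.9),
  x2 = c(1, 2, 3, 4, 5)
)
w <- c(1, 1, 2, 2, 1)   # one weight per row of df

# Build the model frame; na.omit drops rows 3 and 4.
mf <- model.frame(y ~ ., data = df, na.action = stats::na.omit)

# Response vector and predictor matrix (intercept column removed).
y <- model.response(mf)
X <- model.matrix(attr(mf, "terms"), mf)[, -1, drop = FALSE]

# Drop the weights of the rows that na.action removed, so that
# length(w_kept) equals nrow(X).
dropped <- attr(mf, "na.action")
w_kept  <- if (is.null(dropped)) w else w[-as.integer(dropped)]
```

The resulting 'X', 'y' and 'w_kept' have matching row counts and are in the shape that a matrix interface such as 'weighted_pls_fit(X, Y, n_components, sample_weights)' expects.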
Sample-weighted PLS (sqrt(w)-prescaled SIMPLS).
weighted_pls_fit(X, Y, n_components, sample_weights)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
sample_weights |
Numeric vector of length nrow(X), strictly positive finite weights. |
A list with 'coefficients', 'predictions', 'x_mean', 'y_mean', 'rmse'.
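The sqrt(w) prescaling named in the description can be checked in base R on a least-squares problem: centering with weighted means and multiplying each row by sqrt(w) turns a weighted fit into an ordinary one. The sketch uses least squares rather than SIMPLS so the equivalence is easy to verify; the data are simulated, and the use of weighted means for centering is an assumption about the engine, not something stated in this reference.

```r
set.seed(1)
n <- 30; p <- 3
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(1, -2, 0.5)) + rnorm(n, sd = 0.1)
w <- runif(n, 0.5, 2)            # strictly positive, finite sample weights

# Weighted means (X * w scales row i by w[i]).
x_mean <- colSums(X * w) / sum(w)
y_mean <- sum(y * w) / sum(w)

# Center with the weighted means, then scale each row by sqrt(w).
Xs <- sweep(X, 2, x_mean) * sqrt(w)
ys <- (y - y_mean) * sqrt(w)

# Ordinary (unweighted) least squares on the prescaled data ...
b_prescaled <- drop(solve(crossprod(Xs), crossprod(Xs, ys)))

# ... gives the same slopes as weighted least squares on the raw data.
b_weighted <- unname(coef(lm(y ~ X, weights = w))[-1])
same_fit <- isTRUE(all.equal(b_prescaled, b_weighted))
```

The same prescaled matrices can be handed to any unweighted PLS routine, which is the sense in which a weighted fit reduces to a standard one.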
WVC-PLS — weighted vector correlation top-k selector.
wvc_select(X, Y, n_components, top_k = 10L, normalize = TRUE)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
top_k |
Integer. Number of features to select. |
normalize |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
WVC-threshold selector.
wvc_threshold_select(X, Y, n_components, normalize = TRUE, threshold = 0, threshold_factor = 1, min_selected = 1L)
X |
Numeric matrix of predictors (rows = samples, cols = features). |
Y |
Numeric matrix or vector of responses, with one row per sample. |
n_components |
Integer. Number of latent components. |
normalize |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
threshold |
Numeric. Convergence / pruning threshold. |
threshold_factor |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |
min_selected |
Method-specific parameter. See the underlying '*_fit()' function for the exact semantics. |