Dimensionality Reduction via Regression

An S4 Class implementing Dimensionality Reduction via Regression (DRR).

Details

DRR is a non-linear extension of PCA that uses Kernel Ridge regression.

Slots

fun: A function that does the embedding and returns a dimRedResult object.
stdpars: The standard parameters for the function.

General usage

Dimensionality reduction methods are S4 Classes that can be used in one of two ways. They can be used directly, in which case they have to be initialized and a full list of parameters has to be handed to the @fun() slot. Alternatively, the method name can be passed to the embed function and parameters given via ..., in which case missing parameters are replaced by the ones in @stdpars.
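The two calling styles described above can be sketched as follows. This is a minimal illustration, not a recommended setup: the reduced lambda and sigma grids are arbitrary values chosen only to keep cross-validation short, and the sketch assumes that the dimRed, DRR and kernlab packages are installed.

```r
library(dimRed)

# Run only if both backend packages are available
if (all(vapply(c("kernlab", "DRR"), requireNamespace, logical(1), quietly = TRUE))) {
  dat <- loadDataSet("variable Noise Helix", n = 100)

  # Style 1: pass the method name to embed(); anything not given here
  # is filled in from the @stdpars slot.
  emb1 <- embed(dat, "DRR", ndim = 3,
                lambda = c(1e-3, 1e-1),                  # illustrative grid
                kernel.pars = list(sigma = c(0.1, 1)),   # illustrative grid
                verbose = FALSE)

  # Style 2: initialize the class and hand a complete parameter list
  # to the @fun slot. Starting from @stdpars keeps the list complete.
  drr <- DRR()
  pars <- drr@stdpars
  pars$ndim <- 3
  pars$lambda <- c(1e-3, 1e-1)
  pars$kernel.pars <- list(sigma = c(0.1, 1))
  pars$verbose <- FALSE
  emb2 <- drr@fun(dat, pars)
}
```

Starting from drr@stdpars in the second style is a convenience: @fun() expects the full list, so modifying a copy of the defaults avoids leaving a parameter out.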

Parameters

DRR can take the following parameters:

ndim: The number of dimensions.
lambda: The regularization parameter for the ridge regression.
kernel: The kernel to use for KRR, defaults to "rbfdot".
kernel.pars: A list with kernel parameters, elements depend on the kernel used, "rbfdot" uses "sigma".
pca: logical, should an initial PCA step be performed. Defaults to TRUE.
pca.center: logical, should the data be centered before the PCA step. Defaults to TRUE.
pca.scale: logical, should the data be scaled before the PCA step. Defaults to FALSE.
fastcv: logical, should fastCV from the CVST package be used instead of normal cross-validation.
fastcv.test: If fastcv = TRUE, separate test data set for fastcv.
cv.folds: If fastcv = FALSE, specifies the number of folds for cross-validation.
fastkrr.nblocks: integer, higher values sacrifice numerical accuracy for speed and less memory, see below for details.
verbose: logical, should the cross-validation results be printed out.

Implementation

Wraps around drr; see there for details. DRR is a non-linear extension of principal components analysis using Kernel Ridge Regression (KRR; for details see constructKRRLearner and constructFastKRRLearner). Non-linear regression is used to explain more variance than PCA. DRR provides an out-of-sample extension and a backward projection.
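To make the KRR building block concrete, here is a minimal base R sketch of the textbook formulation with a Gaussian kernel, k(x, y) = exp(-sigma * ||x - y||^2). The helper names rbf_kernel, krr_fit and krr_predict are hypothetical and are not part of DRR, CVST or kernlab; the actual learners may differ in details such as how lambda is scaled.

```r
# Gaussian kernel matrix between the rows of a and the rows of b
rbf_kernel <- function(a, b, sigma) {
  sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
  exp(-sigma * pmax(sq, 0))   # clamp tiny negative distances from rounding
}

# Fit: solve (K + lambda * I) alpha = y for the dual coefficients alpha
krr_fit <- function(x, y, lambda, sigma) {
  K <- rbf_kernel(x, x, sigma)
  alpha <- solve(K + lambda * diag(nrow(x)), y)
  list(x = x, alpha = alpha, sigma = sigma)
}

# Predict: kernel between new and training points, times alpha
krr_predict <- function(fit, newx) {
  drop(rbf_kernel(newx, fit$x, fit$sigma) %*% fit$alpha)
}

# Toy data: a noisy sine curve
set.seed(1)
x <- matrix(seq(-3, 3, length.out = 50), ncol = 1)
y <- sin(x[, 1]) + rnorm(50, sd = 0.1)

fit <- krr_fit(x, y, lambda = 0.1, sigma = 1)
pred <- krr_predict(fit, x)
```

The solve() call on an n by n matrix is the step that dominates the cost, which is why the number of samples and the size of the parameter grid matter for run time.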

The most expensive computations are matrix inversions, so the implementation profits a lot from a multithreaded BLAS library. The best parameters for each KRR are determined by cross-validation over all parameter combinations of lambda and kernel.pars; using fewer parameter values will reduce computation time. Calculation of KRR can be accelerated by increasing fastkrr.nblocks at the cost of some accuracy; it should be smaller than $n^{1/3}$, for details see constructFastKRRLearner. Another way to speed up the computation is to use pars$fastcv = TRUE, which might provide a more efficient way to search the parameter space but may also miss the global optimum; I have not run tests on the accuracy of this method.
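A rough way to see how the grid size drives the cost is to count parameter combinations, and to compute the bound on fastkrr.nblocks for a given sample size. The grids below are assumed values chosen for illustration, and the count is per cross-validated regression, not for a whole DRR run.

```r
# Illustrative search grids (assumed values)
full  <- expand.grid(lambda = c(0, 10^(-3:2)), sigma = 10^(-3:4))
small <- expand.grid(lambda = c(1e-3, 1e-1),   sigma = 10^(-1:1))

cv.folds <- 5

# Each combination is fitted once per fold
fits_full  <- nrow(full)  * cv.folds
fits_small <- nrow(small) * cv.folds

# fastkrr.nblocks should stay below n^(1/3); for n = 200 samples
# this leaves values up to 5
n <- 200
max_blocks <- floor(n^(1/3))
```

Here the smaller grid needs 30 fits instead of 280, so trimming lambda and kernel.pars is often the simplest way to shorten a run.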

References

Laparra, V., Malo, J., Camps-Valls, G., 2015. Dimensionality Reduction via Regression in Hyperspectral Imagery. IEEE Journal of Selected Topics in Signal Processing 9, 1026-1036. doi:10.1109/JSTSP.2015.2417833

Examples

if (FALSE) { # \dontrun{
# requireNamespace() expects a single package name, so check each one
if (all(vapply(c("kernlab", "DRR"), requireNamespace, logical(1), quietly = TRUE))) {

dat <- loadDataSet("variable Noise Helix", n = 200)[sample(200)]

emb <- embed(dat, "DRR", ndim = 3)

plot(dat, type = "3vars")
plot(emb, type = "3vars")

# There is also a function to reconstruct the data; it also works
# when only the first few dimensions are used
rec <- inverse(emb, getData(getDimRedData(emb))[, 1, drop = FALSE])
plot(rec, type = "3vars")
}

} # }