Title: | Multivariate Kernel Density Estimation with Vine Copulas |
---|---|
Description: | Implements the vine copula based kernel density estimator of Nagler and Czado (2016) <doi:10.1016/j.jmva.2016.07.003>. The estimator does not suffer from the curse of dimensionality and is therefore well suited for high-dimensional applications. |
Authors: | Thomas Nagler [aut, cre] |
Maintainer: | Thomas Nagler <[email protected]> |
License: | GPL-3 |
Version: | 0.4.5 |
Built: | 2025-01-09 03:39:40 UTC |
Source: | https://github.com/tnagler/kdevine |
This package implements a vine copula based kernel density estimator. The estimator does not suffer from the curse of dimensionality and is therefore well suited for high-dimensional applications (see Nagler and Czado, 2016).
The multivariate kernel density estimator is implemented by the kdevine function. It combines a kernel density estimator for the margins (kde1d) and a kernel estimator of the vine copula density (kdevinecop). The package is built on top of the copula density estimators in the kdecopula package (kdecopula::kdecopula-package) and lets you choose from all of its implemented methods. Optionally, the vine copula can be estimated parametrically, in which case only the margins are estimated nonparametrically.
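The combination of margins and copula rests on Sklar's theorem: a joint density factorizes into the marginal densities times a copula density evaluated at the marginal cdfs. The base R sketch below illustrates that factorization in two dimensions. It does not call kdevine; the Gaussian copula, the standard normal margins, and the helper names `gauss_copula_density`, `joint_density`, and `bvn_density` are assumptions made for illustration, standing in for the kernel estimates the package would produce.

```r
# Density of a bivariate Gaussian copula with correlation rho
# (illustrative stand-in for an estimated copula density).
gauss_copula_density <- function(u, v, rho) {
  a <- qnorm(u)
  b <- qnorm(v)
  exp(-(rho^2 * (a^2 + b^2) - 2 * rho * a * b) / (2 * (1 - rho^2))) /
    sqrt(1 - rho^2)
}

# Joint density = copula density at the marginal cdfs * marginal densities.
# Standard normal margins are assumed here.
joint_density <- function(x, y, rho) {
  gauss_copula_density(pnorm(x), pnorm(y), rho) * dnorm(x) * dnorm(y)
}

# Reference: closed-form bivariate normal density with correlation rho
bvn_density <- function(x, y, rho) {
  exp(-(x^2 - 2 * rho * x * y + y^2) / (2 * (1 - rho^2))) /
    (2 * pi * sqrt(1 - rho^2))
}

joint_density(0.3, -1.2, rho = 0.5)
bvn_density(0.3, -1.2, rho = 0.5)   # agrees with the line above
```

In the package, the marginal pieces would come from kde1d and the copula piece from kdevinecop (or a parametric vine) instead of these closed-form stand-ins.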
Thomas Nagler
Nagler, T., Czado, C. (2016) Evading the curse of dimensionality in nonparametric density estimation with simplified vine copulas. Journal of Multivariate Analysis 151, 69-89 (doi:10.1016/j.jmva.2016.07.003)
Nagler, T., Schellhase, C. and Czado, C. (2017) Nonparametric estimation of simplified vine copula models: comparison of methods. arXiv:1701.00845
Nagler, T. (2017) A generic approach to nonparametric function estimation with mixed data. arXiv:1704.07457
Contour plots of pair copula kernel estimates
## S3 method for class 'kdevinecop'
contour(x, tree = "ALL", xylim = NULL, cex.nums = 1, ...)
x |
a |
tree |
|
xylim |
numeric vector of length 2; sets |
cex.nums |
numeric; expansion factor for font of the numbers. |
... |
arguments passed to |
data(wdbc, package = "kdecopula")                     # load data
u <- VineCopula::pobs(wdbc[, 5:7], ties = "average")  # rank-transform

# estimate density
fit <- kdevinecop(u)

# contour matrix
contour(fit)
The density, cdf, or quantile function of a kernel density estimate can be evaluated at arbitrary points with dkde1d, pkde1d, and qkde1d, respectively. rkde1d simulates from the estimate.
dkde1d(x, obj)
pkde1d(x, obj)
qkde1d(x, obj)
rkde1d(n, obj, quasi = FALSE)
x |
vector of evaluation points. |
obj |
a |
n |
integer; number of observations. |
quasi |
logical; the default ( |
The density or cdf estimate evaluated at x.
data(wdbc)                  # load data
fit <- kde1d(wdbc[, 5])     # estimate density
dkde1d(1000, fit)           # evaluate density estimate
pkde1d(1000, fit)           # evaluate corresponding cdf
qkde1d(0.5, fit)            # quantile function
hist(rkde1d(100, fit))      # simulate
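To make the relationship between the density, cdf, and quantile function concrete, here is a minimal base R sketch for a plain Gaussian kernel estimate. It is not the kde1d implementation: the rule-of-thumb bandwidth, the absence of boundary correction, and the helper names `kde_pdf`, `kde_cdf`, and `kde_quantile` are assumptions made for illustration.

```r
set.seed(1)
x  <- rnorm(200, mean = 10, sd = 2)   # toy sample
bw <- bw.nrd0(x)                      # rule-of-thumb bandwidth

# Gaussian-kernel estimates of the density and the cdf
kde_pdf <- function(q) sapply(q, function(z) mean(dnorm(z, mean = x, sd = bw)))
kde_cdf <- function(q) sapply(q, function(z) mean(pnorm(z, mean = x, sd = bw)))

# Quantile function: numerically invert the cdf on a bracketing interval
kde_quantile <- function(p) {
  lower <- min(x) - 10 * bw
  upper <- max(x) + 10 * bw
  sapply(p, function(pp) {
    uniroot(function(z) kde_cdf(z) - pp, c(lower, upper), tol = 1e-10)$root
  })
}

med <- kde_quantile(0.5)
kde_cdf(med)   # close to 0.5
```

The cdf is the average of the kernel cdfs centered at the observations, and the quantile function is obtained by root finding, so evaluating the cdf at an estimated quantile returns the requested probability up to numerical error.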
Evaluate the density of a kdevine object
dkdevine(x, obj)
x |
( |
obj |
a |
The density estimate evaluated at x.
# load data
data(wdbc)

# estimate density (use xmin to indicate positive support)
fit <- kdevine(wdbc[, 5:7], xmin = rep(0, 3))

# evaluate density estimate
dkdevine(c(1000, 0.1, 0.1), fit)
kdevinecop object
A vine copula density estimate (stored in a kdevinecop object) can be evaluated at arbitrary points with dkdevinecop. Furthermore, you can simulate from the estimated density with rkdevinecop.
dkdevinecop(u, obj, stable = FALSE)
rkdevinecop(n, obj, U = NULL, quasi = FALSE)
u |
|
obj |
|
stable |
logical; option for stabilizing the estimator: the estimated
pair copula density is cut off at |
n |
integer; number of observations. |
U |
(optional) |
quasi |
logical; the default ( |
A numeric vector of the density/cdf or a matrix of
simulated data.
Thomas Nagler
Nagler, T., Czado, C. (2016) Evading the curse of dimensionality in nonparametric density estimation with simplified vine copulas. Journal of Multivariate Analysis 151, 69-89 (doi:10.1016/j.jmva.2016.07.003)
Dissmann, J., Brechmann, E. C., Czado, C., and Kurowicka, D. (2013).
Selecting and estimating regular vine copulae and application to financial returns.
Computational Statistics & Data Analysis, 59(0):52–69.
kdevinecop, dkdecop, rkdecop, ghalton
data(wdbc, package = "kdecopula")                     # load data
u <- VineCopula::pobs(wdbc[, 5:7], ties = "average")  # rank-transform
fit <- kdevinecop(u)                                  # estimate density
dkdevinecop(c(0.1, 0.1, 0.1), fit)                    # evaluate density estimate
Discrete variables are convolved with the uniform distribution (see Nagler, 2017). If a variable should be treated as discrete, declare it as ordered().
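One simple way to picture the convolution of a discrete variable with a uniform distribution is jittering: independent uniform noise is added to the integer codes, which yields a continuous variable that a kernel estimator can handle. The base R sketch below is only an illustration of that idea; the noise range of (-0.5, 0.5) and the variable names are assumptions, and the noise distribution used by the package may differ.

```r
set.seed(42)
# A discrete variable, declared as ordered
x <- ordered(sample(c("low", "mid", "high"), 100, replace = TRUE),
             levels = c("low", "mid", "high"))

codes    <- as.integer(x)                             # integer codes 1, 2, 3
jittered <- codes + runif(length(codes), -0.5, 0.5)   # add uniform noise

# The jittered values are continuous, and rounding recovers the original codes
all(round(jittered) == codes)
```

Because the noise never moves a value past the midpoint between two neighboring codes, no information about the original levels is lost.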
kde1d(x, mult = 1, xmin = -Inf, xmax = Inf, bw = NULL, bw_min = 0, ...)
x |
vector of length |
mult |
numeric; the actual bandwidth used is |
xmin |
lower bound for the support of the density. |
xmax |
upper bound for the support of the density. |
bw |
bandwidth parameter; has to be a positive number or |
bw_min |
minimum value for the bandwidth. |
... |
unused. |
If xmin or xmax is finite, the density estimate will be 0 outside of [xmin, xmax]. Mirror-reflection is used to correct for boundary bias. Discrete variables are convolved with the uniform distribution (see Nagler, 2017).
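The mirror-reflection idea can be sketched in a few lines of base R. This is a generic textbook version under stated assumptions (Gaussian kernel, rule-of-thumb bandwidth, a single lower bound at 0), not the code used by kde1d; the name `reflected_kde` is hypothetical.

```r
set.seed(1)
x    <- rexp(300)      # toy sample supported on [0, Inf)
xmin <- 0
bw   <- bw.nrd0(x)

# Gaussian KDE with reflection at xmin: each observation X_i also
# contributes a mirrored copy at 2 * xmin - X_i; the estimate is 0 below xmin.
reflected_kde <- function(q) {
  sapply(q, function(z) {
    if (z < xmin) return(0)
    mean(dnorm(z, mean = x, sd = bw) + dnorm(z, mean = 2 * xmin - x, sd = bw))
  })
}

reflected_kde(c(-0.5, 0, 1))

# Total mass on the support stays (numerically) equal to 1
integrate(reflected_kde, lower = xmin, upper = max(x) + 10 * bw)$value
```

Without the mirrored copies, part of each kernel near the boundary would spill below xmin and the estimate would be biased downward there; reflection folds that mass back into the support.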
An object of class kde1d.
Nagler, T. (2017). A generic approach to nonparametric function estimation with mixed data. arXiv:1704.07457
dkde1d, pkde1d, qkde1d, rkde1d, plot.kde1d, lines.kde1d
data(wdbc, package = "kdecopula")  # load data
fit <- kde1d(wdbc[, 5])            # estimate density
dkde1d(1000, fit)                  # evaluate density estimate
Implements the vine-copula based estimator of Nagler and Czado (2016). The marginal densities are estimated by kde1d, the vine copula density by kdevinecop. Discrete variables are convolved with the uniform distribution (see Nagler, 2017). If a variable should be treated as discrete, declare it as ordered(). Factors are expanded into binary dummy codes.
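As an illustration of what expanding a factor into binary dummy codes can look like, the following base R sketch uses model.matrix. Treating the first level as a reference category (and therefore dropping its column) is an assumption of this sketch; the exact coding used inside kdevine is not stated here.

```r
# A nominal variable with three levels
grp <- factor(c("a", "b", "c", "a", "b"))

# Expand into binary dummy columns (the first level acts as the reference)
dummies <- model.matrix(~ grp)[, -1, drop = FALSE]
dummies
```

Each remaining column is a 0/1 indicator for one level, so a factor with k levels becomes k - 1 binary variables under this coding.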
kdevine(x, mult_1d = NULL, xmin = NULL, xmax = NULL, copula.type = "kde", ...)
x |
( |
mult_1d |
numeric; all bandwidths for marginal kernel density estimation
are multiplied with |
xmin |
numeric vector of length d; see |
xmax |
numeric vector of length d; see |
copula.type |
either |
... |
further arguments passed to |
An object of class kdevine.
Nagler, T., Czado, C. (2016) Evading the curse of
dimensionality in nonparametric density estimation with simplified vine
copulas. Journal of Multivariate Analysis 151, 69-89
(doi:10.1016/j.jmva.2016.07.003)
Nagler, T. (2017). A generic approach to nonparametric function
estimation with mixed data. arXiv:1704.07457
# load data
data(wdbc, package = "kdecopula")

# estimate density (use xmin to indicate positive support)
fit <- kdevine(wdbc[, 5:7], xmin = rep(0, 3))

# evaluate density estimate
dkdevine(c(1000, 0.1, 0.1), fit)

# plot simulated data
pairs(rkdevine(nrow(wdbc), fit))
The function estimates a vine copula density using kernel estimators for the pair copulas (based on the kdecopula package).
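To show how a vine copula density is assembled from pair copulas, the base R sketch below builds a three-dimensional simplified vine with structure 1-2, 2-3, and 1,3 given 2. Gaussian pair copulas are an assumption made so the example is self-contained; kdevinecop would use kernel estimates from kdecopula in their place. The helper names `gauss_cop`, `gauss_h`, and `vine_density` are hypothetical.

```r
# Bivariate Gaussian copula density with correlation rho
gauss_cop <- function(u, v, rho) {
  a <- qnorm(u)
  b <- qnorm(v)
  exp(-(rho^2 * (a^2 + b^2) - 2 * rho * a * b) / (2 * (1 - rho^2))) /
    sqrt(1 - rho^2)
}

# Conditional distribution function C(u | v) of the Gaussian copula
# (often called the h-function)
gauss_h <- function(u, v, rho) {
  pnorm((qnorm(u) - rho * qnorm(v)) / sqrt(1 - rho^2))
}

# Three-dimensional vine copula density:
#   tree 1: pairs (1,2) and (2,3)
#   tree 2: pair (1,3) conditional on 2, evaluated at the h-function values
vine_density <- function(u, rho12, rho23, rho13_2) {
  c12   <- gauss_cop(u[1], u[2], rho12)
  c23   <- gauss_cop(u[2], u[3], rho23)
  c13_2 <- gauss_cop(gauss_h(u[1], u[2], rho12),
                     gauss_h(u[3], u[2], rho23),
                     rho13_2)
  c12 * c23 * c13_2
}

vine_density(c(0.2, 0.6, 0.9), rho12 = 0.5, rho23 = -0.3, rho13_2 = 0.4)
```

The density is a product of bivariate pieces only, which is why each piece can be estimated with a two-dimensional kernel estimator regardless of the overall dimension.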
kdevinecop(
  data,
  matrix = NA,
  method = "TLL2",
  renorm.iter = 3L,
  mult = 1,
  test.level = NA,
  trunc.level = NA,
  treecrit = "tau",
  cores = 1,
  info = FALSE
)
data |
( |
matrix |
R-Vine matrix ( |
method |
see |
renorm.iter |
see |
mult |
see |
test.level |
significance level for independence test. If you provide a
number in |
trunc.level |
integer; the truncation level. All pair copulas in trees above the truncation level will be set to independence. |
treecrit |
criterion for structure selection; defaults to |
cores |
integer; if |
info |
logical; if |
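The treecrit and test.level arguments both relate to pairwise dependence. As a rough illustration, the base R sketch below computes pairwise Kendall's tau (whose absolute values can serve as edge weights when selecting a tree structure) and runs an independence test for one pair. It uses stats::cor and stats::cor.test as stand-ins; the procedure actually used by kdevinecop (see BiCopIndTest) may differ in its details, and the toy data are an assumption.

```r
set.seed(7)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # strongly dependent on x1
x3 <- rnorm(n)                  # independent of x1 and x2
dat <- cbind(x1, x2, x3)

# Pairwise Kendall's tau; absolute values can serve as edge weights
tau <- cor(dat, method = "kendall")
round(abs(tau), 2)

# Independence test for one pair, based on Kendall's tau
p_12 <- cor.test(x1, x2, method = "kendall")$p.value
p_12 < 0.05   # TRUE here: independence is rejected for the dependent pair
```

With a significance level supplied, a pair whose test does not reject independence could be modeled by the independence copula, which saves estimation effort in higher trees.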
An object of class kdevinecop. That is, a list containing
T1, T2, ... |
lists of the estimated pair copulas in each tree, |
matrix |
the structure matrix of the vine, |
info |
additional information about the fit (if |
Nagler, T., Czado, C. (2016)
Evading the curse of
dimensionality in nonparametric density estimation with simplified vine
copulas.
Journal of Multivariate Analysis 151, 69-89
(doi:10.1016/j.jmva.2016.07.003)
Nagler, T., Schellhase, C. and Czado, C. (2017)
Nonparametric
estimation of simplified vine copula models: comparison of methods
arXiv:1701.00845
Dissmann, J., Brechmann, E. C., Czado, C., and Kurowicka, D. (2013).
Selecting and estimating regular vine copulae and application to financial
returns.
Computational Statistics & Data Analysis, 59(0):52–69.
dkdevinecop, kdecop, BiCopIndTest, foreach
data(wdbc, package = "kdecopula")

# rank-transform to copula data (margins are uniform)
u <- VineCopula::pobs(wdbc[, 5:7], ties = "average")

fit <- kdevinecop(u)                 # estimate density
dkdevinecop(c(0.1, 0.1, 0.1), fit)   # evaluate density estimate
contour(fit)                         # contour matrix (Gaussian scale)
pairs(rkdevinecop(500, fit))         # plot simulated data
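The rank-transform in the example above can be reproduced in base R. The sketch below divides the column-wise ranks by n + 1 so that all values fall strictly inside the unit interval; the helper name `pseudo_obs` and the toy data are assumptions made for illustration.

```r
set.seed(3)
x <- cbind(a = rexp(50), b = rnorm(50))   # toy data on arbitrary scales

# Rank-transform each column and rescale to the open unit interval
pseudo_obs <- function(x) {
  apply(x, 2, rank, ties.method = "average") / (nrow(x) + 1)
}

u <- pseudo_obs(x)
range(u)   # all values lie strictly between 0 and 1
```

Dividing by n + 1 rather than n keeps the largest observation away from 1, which matters because copula densities can be unbounded at the corners of the unit cube.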
Plotting kde1d objects
## S3 method for class 'kde1d'
plot(x, ...)

## S3 method for class 'kde1d'
lines(x, ...)
x |
|
... |
further arguments passed to |
data(wdbc)                               # load data
fit <- kde1d(wdbc[, 7])                  # estimate density
plot(fit)                                # plot density estimate

fit2 <- kde1d(as.ordered(wdbc[, 1]))     # discrete variable
plot(fit2, col = 2)
Simulate from a kdevine object
rkdevine(n, obj, quasi = FALSE)
n |
number of observations. |
obj |
a |
quasi |
logical; the default ( |
An n x d matrix of simulated data from the kdevine object.
# load data
data(wdbc)

# estimate density
fit <- kdevine(wdbc[, 5:7], xmin = rep(0, 3))

# plot simulated data
pairs(rkdevine(nrow(wdbc), fit))
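A common way to simulate from a model built from margins and a copula is to draw copula data first and then apply marginal quantile functions. The base R sketch below follows that two-step recipe under loud assumptions: a Gaussian copula with correlation 0.6 replaces the estimated vine copula, and empirical quantiles replace qkde1d. It is meant to convey the idea, not to describe how rkdevine is implemented.

```r
set.seed(11)
# Toy data with positive, skewed margins
dat <- cbind(x1 = rexp(500), x2 = rgamma(500, shape = 2))

# Step 1: simulate copula data; a Gaussian copula with correlation 0.6
# is assumed here in place of an estimated vine copula.
n   <- 1000
rho <- 0.6
z1  <- rnorm(n)
z2  <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
u   <- cbind(pnorm(z1), pnorm(z2))

# Step 2: map each column back to the data scale with a marginal
# quantile function (empirical quantiles stand in for qkde1d).
sim <- sapply(1:2, function(j) quantile(dat[, j], probs = u[, j], names = FALSE))
dim(sim)
```

The simulated columns inherit their marginal shape from the data and their dependence from the copula sample.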
The data contain measurements on cells in suspicious lumps in a woman's breast. Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. All samples are classified as either benign or malignant.
data(wdbc)
wdbc is a data.frame with 31 columns. The first column indicates whether the sample is classified as benign (B) or malignant (M). The remaining columns contain measurements for 30 features.
Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
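As a small worked example of the compactness definition in the list above (perimeter^2 / area - 1.0), the sketch below evaluates it for two simple shapes. The function name `compactness` is hypothetical and the shapes are chosen only for illustration; they are not part of the data set.

```r
# Compactness as defined in the list above
compactness <- function(perimeter, area) perimeter^2 / area - 1

# A circle of radius r has perimeter 2 * pi * r and area pi * r^2,
# so its compactness is 4 * pi - 1 regardless of r.
r <- 3
compactness(2 * pi * r, pi * r^2)

# A square with side s has perimeter 4 * s and area s^2, giving 16 - 1 = 15.
s <- 2
compactness(4 * s, s^2)
```

The measure does not depend on the size of the shape, only on its form, and it grows as the outline becomes less circular.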
The references listed below contain detailed descriptions of how these features are computed.
The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features.
This breast cancer database was obtained from the University of Wisconsin Hospitals, Madison from Dr. William H. Wolberg.
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.
O. L. Mangasarian and W. H. Wolberg: "Cancer diagnosis via linear programming", SIAM News, Volume 23, Number 5, September 1990, pp 1 & 18.
William H. Wolberg and O.L. Mangasarian: "Multisurface method of pattern separation for medical diagnosis applied to breast cytology", Proceedings of the National Academy of Sciences, U.S.A., Volume 87, December 1990, pp 9193-9196.
K. P. Bennett & O. L. Mangasarian: "Robust linear programming discrimination of two linearly inseparable sets", Optimization Methods and Software 1, 1992, 23-34 (Gordon & Breach Science Publishers).
data(wdbc)
str(wdbc)