Title: | Kernel Smoothing for Bivariate Copula Densities |
---|---|
Description: | Provides fast implementations of kernel smoothing techniques for bivariate copula densities, in particular density estimation and resampling. |
Authors: | Thomas Nagler [aut, cre], Kuangyu Wen [ctb] |
Maintainer: | Thomas Nagler <[email protected]> |
License: | GPL-3 |
Version: | 0.9.1 |
Built: | 2024-11-02 04:13:10 UTC |
Source: | https://github.com/tnagler/kdecopula |
The optimal size of knots is chosen by a rule of thumb adapted from Rose (2015).
bw_bern(udata)
bw_bern(udata)
udata |
data. |
The formula is
where is the empirical Spearman's rho of the data.
optimal order of the Bernstein polynomials.
The bandwidth is selected by minimizing the MISE using the Frank copula as the reference family. The copula parameter is set by inversion of Kendall's tau. See Nagler (2014) for details.
bw_beta(udata)
bw_beta(udata)
udata |
data. |
To speed things up, optimal bandwidths have been pre-calculated on a grid of tau values.
A scalar bandwidth parameter.
Nagler, T. (2014). Kernel Methods for Vine Copula Estimation. Master's Thesis, Technische Universitaet Muenchen, https://mediatum.ub.tum.de/node?id=1231221
The bandwidth is selected by minimizing the MISE using the Frank copula as the reference family. The copula parameter is set by inversion of Kendall's tau. See Nagler (2014) for details.
bw_mr(udata)
bw_mr(udata)
udata |
data. |
To speed things up, optimal bandwidths have been pre-calculated on a grid of tau values.
A scalar bandwidth parameter.
Nagler, T. (2014). Kernel Methods for Vine Copula Estimation. Master's Thesis, Technische Universitaet Muenchen, https://mediatum.ub.tum.de/node?id=1231221
The bandwidth is selected by a rule of thumb. It approximately minimizes the MISE of the Gaussian copula on the transformed domain. The usual normal reference matrix is multiplied by 1.25 to account for the higher variance on the copula level.
bw_t(udata)
bw_t(udata)
udata |
data. |
The formula is
where is empirical covariance matrix of the transformed
random vector.
A 2 x 2
bandwidth matrix.
Nagler, T. (2014). Kernel Methods for Vine Copula Estimation. Master's Thesis, Technische Universitaet Muenchen, https://mediatum.ub.tum.de/node?id=1231221
The bandwidth is selected by a rule of thumb similar to bw_t
.
bw_tll(udata, deg)
bw_tll(udata, deg)
udata |
data. |
deg |
degree of the polynomial. |
The formula is
where is empirical covariance matrix of the transformed
random vector and
for
TLL1
and for
TLL2
.
A 2 x 2
bandwidth matrix.
The smoothing parameters is selected by the method of Geenens et al. (2017). It uses principal components for the rotation matrix and selects the nearest neighbor fraction along each principal direction by approximate least-squares cross-validation.
bw_tll_nn(udata, deg)
bw_tll_nn(udata, deg)
udata |
data. |
deg |
degree of the polynomial. |
A list with entries:
B
rotation matrix,
alpha
nearest neighbor fraction (this one is multiplied
with mult
in kdecop()
),
kappa
correction factor for differences in roughness along the axes,
see Geenens et al. (2017).
Geenens, G., Charpentier, A., and Paindaveine, D. (2017). Probit transformation for nonparametric kernel estimation of the copula density. Bernoulli, 23(3), 1848-1873.
The smoothing parameters are selected by the method of Wen and Wu (2015).
bw_tt_pi(udata, rho.add = TRUE) bw_tt_cv(udata, rho.add = T)
bw_tt_pi(udata, rho.add = TRUE) bw_tt_cv(udata, rho.add = T)
udata |
data. |
rho.add |
logical; whether a rotation (correlation) parameter shall be included. |
Optimal smoothing parameters as in Wen and Wu (2015): a numeric
vector of length 4; entries are .
Kuangyu Wen
Wen, K. and Wu, X. (2015). Transformation-Kernel Estimation of the Copula Density, Working paper, http://agecon2.tamu.edu/people/faculty/wu-ximing/agecon2/public/copula.pdf
kdecop()
fitCalculates several dependence measures derived from the copula density. All
measures except "blomqvist"
are computed by quasi Monte Carlo methods
(see rkdecop()
.
dep_measures(object, measures = "all", n_qmc = 10^3, seed = 5)
dep_measures(object, measures = "all", n_qmc = 10^3, seed = 5)
object |
an object of class |
measures |
which measures to compute, see Details. |
n_qmc |
the number of quasi Monte Carlo samples. |
seed |
the seed for quasi Monte Carlo integration. |
A named vector of dependence measures.
The following measures are available:
"kendall"
Kendall's , see Nelsen (2007); computed as the
sample version of a quasi Monte Carlo sample.
"spearman"
Spearman's , see Nelsen (2007); computed as the
sample version of a quasi Monte Carlo sample.
"blomqvist"
Blomqvist's , see Nelsen (2007); computed
as
.
"gini"
Gini's , see Nelsen (2007); computed by quasi
Monte Carlo integration.
"vd_waerden"
van der Waerden's coefficient, see Genest and Verret (2005); computed as the sample version of a quasi Monte Carlo sample.
"minfo"
mutual information, see Joe (1989); computed by quasi Monte Carlo integration.
"linfoot"
Linfoot's correlation coefficient, see Joe (1989); computed by quasi Monte Carlo integration.
Nelsen, R. (2007). An introduction to copulas. Springer Science & Business Media, 2007.
Genest, C., and Verret, F. (2005). Locally most powerful rank tests of independence for copula models. Journal of Nonparametric Statistics, 17(5)
Joe, H. (1989). Relative Entropy Measures of Multivariate Dependence. Journal of the American Statistical Association, 84(405)
## load data and transform with empirical cdf data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) ## estimate copula density and calculate dependence measures fit <- kdecop(udat[, 5:6]) dep_measures(fit)
## load data and transform with empirical cdf data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) ## estimate copula density and calculate dependence measures fit <- kdecop(udat[, 5:6]) dep_measures(fit)
kdecopula
objectsThe function kdecop()
stores it's result in object of class kdecopula
.
The density estimate can be evaluated on arbitrary points with dkdecop()
;
the cdf with pkdecop()
. Furthermore, synthetic data can be simulated with
rkdecop()
.
dkdecop(u, obj, stable = FALSE) pkdecop(u, obj) rkdecop(n, obj, quasi = FALSE)
dkdecop(u, obj, stable = FALSE) pkdecop(u, obj) rkdecop(n, obj, quasi = FALSE)
u |
|
obj |
|
stable |
logical; option for stabilizing the estimator: the estimated
density is cut off at |
n |
integer; number of observations. |
quasi |
logical; the default ( |
A numeric vector of the density/cdf or a n x 2
matrix of
simulated data.
Thomas Nagler
Geenens, G., Charpentier, A., and Paindaveine, D. (2017). Probit
transformation for nonparametric kernel estimation of the copula density.
Bernoulli, 23(3), 1848-1873.
Nagler, T. (2014). Kernel Methods for
Vine Copula Estimation. Master's Thesis, Technische Universitaet Muenchen,
https://mediatum.ub.tum.de/node?id=1231221
Cambou, T., Hofert,
M., Lemieux, C. (2015). A primer on quasi-random numbers for copula models,
arXiv:1508.03483
kdecop
,
plot.kdecopula
,
ghalton
## load data and transform with empirical cdf data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) ## estimation of copula density of variables 5 and 6 fit <- kdecop(udat[, 5:6]) plot(fit) ## evaluate density estimate at (u1,u2)=(0.123,0.321) dkdecop(c(0.123, 0.321), fit) ## evaluate cdf estimate at (u1,u2)=(0.123,0.321) pkdecop(c(0.123, 0.321), fit) ## simulate 500 samples from density estimate plot(rkdecop(500, fit))
## load data and transform with empirical cdf data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) ## estimation of copula density of variables 5 and 6 fit <- kdecop(udat[, 5:6]) plot(fit) ## evaluate density estimate at (u1,u2)=(0.123,0.321) dkdecop(c(0.123, 0.321), fit) ## evaluate cdf estimate at (u1,u2)=(0.123,0.321) pkdecop(c(0.123, 0.321), fit) ## simulate 500 samples from density estimate plot(rkdecop(500, fit))
kdecop()
fits.Simply calls predict(object, object$udata, what)
.
## S3 method for class 'kdecopula' fitted(object, what = "pdf", ...)
## S3 method for class 'kdecopula' fitted(object, what = "pdf", ...)
object |
an object of class |
what |
what to predict, one of |
... |
unused. |
predict.kdecopula()
data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) fit <- kdecop(udat[, 5:6]) all.equal(fitted(fit), predict(fit, fit$udata))
data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) fit <- kdecop(udat[, 5:6]) all.equal(fitted(fit), predict(fit, fit$udata))
kdecop()
fitEvaluates the h-function (or its inverse) corresponding to a kdecopula
object. H-functions are conditional distribution functions obtained by
integrating the copula density w.r.t. to one of its arguments (see also
VineCopula::BiCopHfunc()
.
hkdecop(u, obj, cond.var, inverse = FALSE)
hkdecop(u, obj, cond.var, inverse = FALSE)
u |
|
obj |
|
cond.var |
integer; |
inverse |
logical; indicates whether the h-function or its inverse shall be calculated. |
A length vector of the (inverse) h-function evaluated at
u
.
Thomas Nagler
## load data and transform with empirical cdf data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) ## estimation of copula density of variables 5 and 6 fit <- kdecop(udat[, 5:6]) plot(fit) ## evaluate h-function estimate and its inverse at (u1|u2) = (0.123 | 0.321) hkdecop(c(0.123, 0.321), fit, cond.var = 2) hkdecop(c(0.123, 0.321), fit, cond.var = 2, inverse = TRUE)
## load data and transform with empirical cdf data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) ## estimation of copula density of variables 5 and 6 fit <- kdecop(udat[, 5:6]) plot(fit) ## evaluate h-function estimate and its inverse at (u1|u2) = (0.123 | 0.321) hkdecop(c(0.123, 0.321), fit, cond.var = 2) hkdecop(c(0.123, 0.321), fit, cond.var = 2, inverse = TRUE)
Based on samples from a bivariate copula, the copula density is estimated. The user can choose between different methods. If no bandwidth is provided by the user, it will be set by a method-specific automatic selection procedure. The related (d/p/r)kdecop functions evaluate the density and cdf or simulate synthetic data, respectively.
kdecop(udata, bw = NA, mult = 1, method = "TLL2nn", knots = 30, renorm.iter = 3L, info = TRUE)
kdecop(udata, bw = NA, mult = 1, method = "TLL2nn", knots = 30, renorm.iter = 3L, info = TRUE)
udata |
|
bw |
bandwidth specification; if |
mult |
bandwidth multiplier, has to be positive; useful for making estimates more/less smooth manually. |
method |
|
knots |
integer; number of knots in each dimension for the spline approximation. |
renorm.iter |
integer; number of iterations for the renormalization procedure (see Details). |
info |
logical; if |
We use a Gaussian product kernel function for all methods
except the beta kernel and Bernstein estimators. For details on bandwidth
selection for a specific method, see: bw_t
,
bw_tll
, bw_tll_nn
, bw_tt_pi
,
bw_tt_cv
, bw_mr
, bw_beta
,
bw_bern
.
Kernel estimates are usually no proper copula densities. In particular, the
estimated marginal densities are not uniform. We mitigate this issue by
a renormalization procedure. The number of iterations of the
renormalization algorithm can be specified with the renorm.iter
argument. Typically, a very small number of iterations is sufficient.
The function kdecop
returns an
object of class kdecopula
that contains all information necessary for
evaluation of the estimator. If no bandwidth was provided in the function
call, the automatically selected value can be found in the variable
object$bw
. If info=TRUE
, also the following will be available
under object$info
:
likvalues |
Estimator evaluated in sample points |
loglik |
Log likelihood |
effp |
Effective number of parameters |
AIC |
Akaike information criterion |
cAIC |
Bias-corrected version of Akaike information criterion |
BIC |
Bayesian information criterion. |
The density estimate can be evaluated on arbitrary points with
dkdecop
; the cdf with
pkdecop
. Furthermore, synthetic data can be
simulated with rkdecop
, and several plotting
options are available with plot
and contour
.
The implementation of the tapered transformation estimator ("TTPI"/"TTCV") was kindly provided by Kuangyu Wen.
Thomas Nagler
Geenens, G., Charpentier, A., and Paindaveine, D. (2017).
Probit transformation for nonparametric kernel estimation of the copula
density.
Bernoulli, 23(3), 1848-1873.
Wen, K. and Wu, X. (2015).
Transformation-Kernel Estimation of the Copula Density,
Working paper,
http://agecon2.tamu.edu/people/faculty/wu-ximing/agecon2/public/copula.pdf
Gijbels, I. and Mielniczuk, J. (1990).
Estimating the density of a copula function.
Communications in Statistics - Theory and Methods, 19(2):445-464.
Charpentier, A., Fermanian, J.-D., and Scaillet, O. (2006).
The estimation of copulas: Theory and practice.
In Rank, J., editor, Copulas: From theory to application in finance. Risk Books.
Weiss, G. and Scheffer, M. (2012).
Smooth Nonparametric Bernstein Vine Copulas.
arXiv:1210.2043
Nagler, T. (2014).
Kernel Methods for Vine Copula Estimation.
Master's Thesis, Technische Universitaet Muenchen,
https://mediatum.ub.tum.de/node?id=1231221
kdecopula
,
plot.kdecopula
,
predict.kdecopula
,
fitted.kdecopula
,
simulate.kdecopula
,
dkdecop
,
pkdecop
,
rkdecop
## load data and transform with empirical cdf data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) ## estimation of copula density of variables 5 and 6 fit <- kdecop(udat[, 5:6]) summary(fit) plot(fit) contour(fit) ## evaluate density estimate at (u1,u2)=(0.123,0.321) dkdecop(c(0.123, 0.321), fit) ## evaluate cdf estimate at (u1,u2)=(0.123,0.321) pkdecop(c(0.123, 0.321), fit) ## simulate 500 samples from density estimate plot(rkdecop(500, fit)) # pseudo-random plot(rkdecop(500, fit, quasi = TRUE)) # quasi-random
## load data and transform with empirical cdf data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) ## estimation of copula density of variables 5 and 6 fit <- kdecop(udat[, 5:6]) summary(fit) plot(fit) contour(fit) ## evaluate density estimate at (u1,u2)=(0.123,0.321) dkdecop(c(0.123, 0.321), fit) ## evaluate cdf estimate at (u1,u2)=(0.123,0.321) pkdecop(c(0.123, 0.321), fit) ## simulate 500 samples from density estimate plot(rkdecop(500, fit)) # pseudo-random plot(rkdecop(500, fit, quasi = TRUE)) # quasi-random
This package provides fast implementations of kernel estimators for the copula density. Due to its several plotting options it is particularly useful for the exploratory analysis of dependence structures. It can be further used for flexible nonparametric estimation of copula densities and resampling.
The function kdecop
can be used to estimate a copula density
with a number of popular kernel estimators. The density estimate can be
evaluated on arbitrary points with dkdecop
;
the cdf with pkdecop
. Furthermore, synthetic
data can be simulated with rkdecop
, and
several plot options are provided by
plot.kdecopula
.
Thomas Nagler
Gijbels, I. and Mielniczuk, J. (1990).
Estimating the density of a copula function.
Communications in Statistics - Theory and Methods, 19(2):445-464.
Charpentier, A., Fermanian, J.-D., and Scaillet, O. (2006).
The estimation of copulas: Theory and practice.
In Rank, J., editor, Copulas: From theory to application in finance. Risk Books.
Geenens, G., Charpentier, A., and Paindaveine, D. (2017).
Probit transformation for nonparametric kernel estimation of the copula
density.
Bernoulli, 23(3), 1848-1873.
Nagler, T. (2014).
Kernel Methods for Vine Copula Estimation.
Master's Thesis, Technische Universitaet Muenchen,
https://mediatum.ub.tum.de/node?id=1231221
Wen, K. and Wu, X. (2015).
Transformation-Kernel Estimation of the Copula Density,
Working paper,
http://agecon2.tamu.edu/people/faculty/wu-ximing/agecon2/public/copula.pdf
## load data and transform with empirical cdf data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x)/(length(x)+1)) ## estimation of copula density of variables 5 and 6 dens.est <- kdecop(udat[, 5:6]) plot(dens.est) ## evaluate density estimate at (u1,u2)=(0.123,0.321) dkdecop(c(0.123, 0.321), dens.est) ## evaluate cdf estimate at (u1,u2)=(0.123,0.321) pkdecop(c(0.123, 0.321), dens.est) ## simulate 500 samples from density estimate rkdecop(500, dens.est)
## load data and transform with empirical cdf data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x)/(length(x)+1)) ## estimation of copula density of variables 5 and 6 dens.est <- kdecop(udat[, 5:6]) plot(dens.est) ## evaluate density estimate at (u1,u2)=(0.123,0.321) dkdecop(c(0.123, 0.321), dens.est) ## evaluate cdf estimate at (u1,u2)=(0.123,0.321) pkdecop(c(0.123, 0.321), dens.est) ## simulate 500 samples from density estimate rkdecop(500, dens.est)
kdecopula
objectLog-Likelihood of a kdecopula
object
## S3 method for class 'kdecopula' logLik(object, ...)
## S3 method for class 'kdecopula' logLik(object, ...)
object |
an object of class |
... |
not used. |
Returns an object of class logLik
containing the log-
likelihood, number of observations and effective number of parameters ("df").
Thomas Nagler
## load data and transform with empirical cdf data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) ## estimation of copula density of variables 5 and 6 fit <- kdecop(udat[, 5:6]) ## compute fit statistics logLik(fit) AIC(fit) BIC(fit)
## load data and transform with empirical cdf data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) ## estimation of copula density of variables 5 and 6 fit <- kdecop(udat[, 5:6]) ## compute fit statistics logLik(fit) AIC(fit) BIC(fit)
kdecopula
objectsProduces perspective or contour plots for a kdecopula
object.
## S3 method for class 'kdecopula' plot(x, type = "surface", margins, size, ...) ## S3 method for class 'kdecopula' contour(x, margins = "norm", size = 100L, ...)
## S3 method for class 'kdecopula' plot(x, type = "surface", margins, size, ...) ## S3 method for class 'kdecopula' contour(x, margins = "norm", size = 100L, ...)
x |
|
type |
plot type; either |
margins |
|
size |
integer; the plot is based on values on a |
... |
Thomas Nagler
## load data and transform with empirical cdf data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x)/(length(x)+1)) ## estimation of copula density of variables 5 and 6 obj <- kdecop(udat[, 5:6]) ## plots plot(obj) # surface plot of copula density contour(obj) # contour plot with standard normal margins contour(obj, margins = "unif") # contour plot of copula density
## load data and transform with empirical cdf data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x)/(length(x)+1)) ## estimation of copula density of variables 5 and 6 obj <- kdecop(udat[, 5:6]) ## plots plot(obj) # surface plot of copula density contour(obj) # contour plot with standard normal margins contour(obj, margins = "unif") # contour plot of copula density
kdecop()
fitsPredicts the pdf, cdf, or (inverse) h-functions by calling dkdecop()
,
pkdecop()
, or hkdecop()
.
## S3 method for class 'kdecopula' predict(object, newdata, what = "pdf", stable = FALSE, ...)
## S3 method for class 'kdecopula' predict(object, newdata, what = "pdf", stable = FALSE, ...)
object |
an object of class |
newdata |
evaluation point(s), a length two vector or a matrix with two columns. |
what |
what to predict, one of |
stable |
only used for |
... |
unused. |
A numeric vector of predictions.
data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) fit <- kdecop(udat[, 5:6]) all.equal(predict(fit, c(0.1, 0.2)), dkdecop(c(0.1, 0.2), fit)) all.equal(predict(fit, udat, "hfunc1"), hkdecop(udat, fit, cond.var = 1))
data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) fit <- kdecop(udat[, 5:6]) all.equal(predict(fit, c(0.1, 0.2)), dkdecop(c(0.1, 0.2), fit)) all.equal(predict(fit, udat, "hfunc1"), hkdecop(udat, fit, cond.var = 1))
kdecop()
fit.See rkdecop()
.
## S3 method for class 'kdecopula' simulate(object, nsim = 1, seed = NULL, quasi = FALSE, ...)
## S3 method for class 'kdecopula' simulate(object, nsim = 1, seed = NULL, quasi = FALSE, ...)
object |
an object of class |
nsim |
integer; number of observations. |
seed |
integer; |
quasi |
logical; the default ( |
... |
unused. |
Simulated data from the fitted kdecopula
model.
data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) fit <- kdecop(udat[, 5:6]) plot(simulate(fit, 500))
data(wdbc) udat <- apply(wdbc[, -1], 2, function(x) rank(x) / (length(x) + 1)) fit <- kdecop(udat[, 5:6]) plot(simulate(fit, 500))
The data contain measurements on cells in suspicious lumps in a woman's breast. Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. All samples are classified as either benign or malignant.
wdbc
is a data.frame
with 31 columns. The first column
indicates whether the sample is classified as benign (B
) or malignant
(M
). The remaining columns contain measurements for 30 features.
Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b)
texture (standard deviation of gray-scale values)
c) perimeter
d)
area
e) smoothness (local variation in radius lengths)
f)
compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave
portions of the contour)
h) concave points (number of concave portions
of the contour)
i) symmetry
j) fractal dimension ("coastline
approximation" - 1)
The references listed below contain detailed descriptions of how these features are computed.
The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features.
This breast cancer database was obtained from the University of Wisconsin Hospitals, Madison from Dr. William H. Wolberg.
http://mlr.cs.umass.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
Bache, K. & Lichman, M. (2013).
UCI Machine Learning Repository.
Irvine, CA: University of California, School of Information and Computer
Science.
O. L. Mangasarian and W. H. Wolberg: "Cancer diagnosis via
linear programming",
SIAM News, Volume 23, Number 5, September 1990, pp 1
& 18.
William H. Wolberg and O.L. Mangasarian: "Multisurface method of pattern
separation for medical diagnosis applied to breast cytology",
Proceedings of the National Academy of Sciences, U.S.A., Volume 87, December
1990, pp 9193-9196.
K. P. Bennett & O. L. Mangasarian: "Robust linear programming discrimination
of two linearly inseparable sets",
Optimization Methods and Software 1,
1992, 23-34 (Gordon & Breach Science Publishers).
data(wdbc) str(wdbc)
data(wdbc) str(wdbc)