This package provides tools for testing the equality of two population covariance matrices in ultra-high dimensions. It implements SY2010 [1], LC2012 [2], CLX2013 [3], HC2018 [4] and a newly proposed test by Ding, Hu and Wang (DHW2023) [5].

library(UHDtst)

We use a simple example to show how these functions work. We generate i.i.d. Gaussian samples \(X\) and \(Y\), each of sample size 100 and dimension 200. That is, \[x_{ij},y_{ij}\stackrel{\text{i.i.d.}}{\sim}\mathcal{N}(0,1),\] so both populations have the identity covariance matrix and the null hypothesis of equal covariance matrices holds.

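n1 <- 100   # sample size of X
n2 <- 100   # sample size of Y
p  <- 200   # dimension
# Each row is one observation with i.i.d. N(0, 1) entries
X <- matrix(rnorm(n1 * p), ncol = p)
Y <- matrix(rnorm(n2 * p), ncol = p)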

SY2010, LC2012 and CLX2013 share the same simple syntax: each takes the two sample matrices as arguments.

SY2010(X, Y)
## $Q2
## [1] 0.03296582
## 
## $pvalue
## [1] 0.8559242
LC2012(X, Y)
## $statistic
## [1] 0.7282392
## 
## $pvalue
## [1] 0.2332336
CLX2013(X, Y)
## $TSvalue
## [1] 2.120086
## 
## $pvalue
## [1] 0.06677129

Each function returns the test statistic and the corresponding p-value. At significance level 0.05, none of the three tests rejects the null hypothesis, since all three p-values exceed 0.05.
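The samples above are generated under the null hypothesis. As a rough sketch of how one might build data under an alternative, the following base-R code creates a second sample whose covariance is \(2I\) instead of \(I\). The name Y_alt, the seed and the scaling factor are illustrative assumptions and are not part of the package.

```r
set.seed(1)   # illustrative seed; not the one behind the outputs shown in this document
n2 <- 100
p  <- 200
# Multiplying N(0, 1) entries by sqrt(2) gives covariance 2 * I instead of I
Y_alt <- sqrt(2) * matrix(rnorm(n2 * p), ncol = p)
# The average sample variance across the p coordinates should be close to 2
mean(diag(cov(Y_alt)))
```

Y_alt can then be passed in place of Y, for example SY2010(X, Y_alt), and one would expect the tests to return small p-values in that case.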

HC2018 is a multiple testing procedure, so we need to specify the size of the test; the default is 0.05. We also need to choose the number of super-diagonals; the default is \(\lfloor p^{0.7}\rfloor\).
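As a quick check of the default, the number of super-diagonals for the example dimension \(p = 200\) can be computed directly in base R; the result agrees with the $N value in the HC2018 output for this example.

```r
p <- 200
# Default number of super-diagonals stated above: floor(p^0.7)
floor(p^0.7)
## [1] 40
```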

HC2018(X, Y)
## $reject
## [1] 0
## 
## $pvalues
##  [1] 0.01258424 0.81108555 0.32788619 0.39929033 0.04521339 0.38235345
##  [7] 0.14337943 0.76582997 0.57831070 0.39019748 0.84341088 0.98453290
## [13] 0.80384539 0.72632591 0.21964800 0.45255329 0.79649849 0.12153013
## [19] 0.66461830 0.16648744 0.38069693 0.17555943 0.94075624 0.35581851
## [25] 0.61388496 0.28958792 0.75689870 0.37551798 0.15565095 0.22573675
## [31] 0.04623414 0.64256622 0.23495541 0.49062623 0.88271767 0.54555658
## [37] 0.74713805 0.39005512 0.95264554 0.53042765 0.61839862
## 
## $N
## [1] 40

The function returns the number of rejections in the multiple testing procedure ($reject); if it is larger than zero, we reject the null hypothesis. It also returns the p-values of all the individual tests ($pvalues), so if you want to use a different test size you can work from these p-values and do not need to run the function again.
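A minimal sketch of reusing the returned p-values at a different size is given below. The helper count_rejections is hypothetical, and the Bonferroni correction it applies is an assumption made only for illustration; the multiplicity rule that HC2018 uses internally may differ.

```r
# p-values copied from the HC2018(X, Y) output above
pvalues <- c(
  0.01258424, 0.81108555, 0.32788619, 0.39929033, 0.04521339, 0.38235345,
  0.14337943, 0.76582997, 0.57831070, 0.39019748, 0.84341088, 0.98453290,
  0.80384539, 0.72632591, 0.21964800, 0.45255329, 0.79649849, 0.12153013,
  0.66461830, 0.16648744, 0.38069693, 0.17555943, 0.94075624, 0.35581851,
  0.61388496, 0.28958792, 0.75689870, 0.37551798, 0.15565095, 0.22573675,
  0.04623414, 0.64256622, 0.23495541, 0.49062623, 0.88271767, 0.54555658,
  0.74713805, 0.39005512, 0.95264554, 0.53042765, 0.61839862
)

# Hypothetical helper: number of rejections under a Bonferroni correction.
# Bonferroni is an assumed stand-in, not necessarily the rule used by HC2018.
count_rejections <- function(pvalues, alpha = 0.05) {
  sum(p.adjust(pvalues, method = "bonferroni") < alpha)
}

count_rejections(pvalues, alpha = 0.05)
count_rejections(pvalues, alpha = 0.10)
```

Under this assumed rule both calls give zero rejections, which is in line with the $reject value reported at the default size.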

For DHW2023 (the test of Ding, Hu and Wang, implemented as TwoSampleTest), we can either let the function tune its parameter automatically or supply a known value. Again, we need to specify the size of the test, and 0.05 is the default. With automatic tuning, the syntax is as follows.

TwoSampleTest(X, Y)
## $decision
## [1] 0
## 
## $df
## [1] 100
## 
## $reject
## [1] 5

If the tuning parameter is known, it can be supplied through the const argument.

TwoSampleTest(X, Y, const=0.5)
## $decision
## [1] 0
## 
## $df
## [1] 100
## 
## $reject
## [1] 3

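In the output, \(\mathrm{decision}=1\) means that we should reject the null hypothesis; in both calls above \(\mathrm{decision}=0\), so we do not reject. \(\mathrm{df}\) is the number of efficient splittings in our algorithm and \(\mathrm{reject}\) is the total number of rejections in our testing. Note that the method is random because it uses bootstrapping, so repeated calls on the same data can return different values, as the two \(\mathrm{reject}\) counts above illustrate.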
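Because the result is random, fixing the seed before a call is the usual way to make a run repeatable in R. The sketch below illustrates the idea with a hypothetical stand-in function, boot_mean, written in base R. Whether set.seed also fixes the output of TwoSampleTest depends on the package drawing its random numbers from R's generator, which is assumed here and not confirmed.

```r
# Hypothetical stand-in for a bootstrap-based procedure (not part of UHDtst)
boot_mean <- function(x, B = 200) {
  # Resample x with replacement B times and average the bootstrap means
  mean(replicate(B, mean(sample(x, replace = TRUE))))
}

x <- rnorm(50)

set.seed(2023); r1 <- boot_mean(x)
set.seed(2023); r2 <- boot_mean(x)
identical(r1, r2)    # TRUE: the same seed reproduces the same resamples

r3 <- boot_mean(x)   # no reseeding, so a new set of resamples is drawn
```

Under the same assumption, calling set.seed with a fixed value immediately before TwoSampleTest(X, Y) would be the analogous way to make a reported decision repeatable.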

References

[1] Srivastava, M. S., & Yanagihara, H. (2010). Testing the equality of several covariance matrices with fewer observations than the dimension. Journal of Multivariate Analysis, 101(6), 1319-1329.

[2] Li, J., & Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40(2), 908-940.

[3] Cai, T., Liu, W., & Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 108(501), 265-277.

[4] He, J., & Chen, S. X. (2018). High-dimensional two-sample covariance matrix testing via super-diagonals. Statistica Sinica, 28(4), 2671-2696.

[5] Ding, X. C., Hu, Y. C., & Wang, Z. G. (2023). Two sample test for covariance matrices in ultra-high dimension. Available on arXiv.


  1. UC Davis

  2. UC Davis