sphunif: Uniformity Tests on the Circle, Sphere, and Hypersphere

Suppose that you want to test if a sample of circular data is uniformly distributed. For example, the following circular uniform sample in radians:
set.seed(987202226)
cir_data <- runif(n = 30, min = 0, max = 2 * pi)
Then you can call the main function in the sphunif package, unif_test, specifying the type of test to be performed. For example, the Watson (omnibus) test:
library(sphunif)
#> Loading required package: Rcpp
unif_test(data = cir_data, type = "Watson") # An htest object
#>
#> Watson test of circular uniformity
#>
#> data: cir_data
#> statistic = 0.03701, p-value = 0.8584
#> alternative hypothesis: any alternative to circular uniformity
By default, the calibration of the test statistic is done by Monte Carlo. This can be changed with p_value = "asymp"
to use the asymptotic distribution:
unif_test(data = cir_data, type = "Watson", p_value = "MC") # Monte Carlo
#> Loading required package: foreach
#> Loading required package: future
#>
#> Watson test of circular uniformity
#>
#> data: cir_data
#> statistic = 0.03701, p-value = 0.8592
#> alternative hypothesis: any alternative to circular uniformity
unif_test(data = cir_data, type = "Watson", p_value = "asymp") # Asymp. distr.
#>
#> Watson test of circular uniformity
#>
#> data: cir_data
#> statistic = 0.03701, p-value = 0.8584
#> alternative hypothesis: any alternative to circular uniformity
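The precision of the Monte Carlo calibration can be tuned through the number of replicates. As a minimal sketch (assuming the M argument used with unif_test later in this section applies equally here), a larger M gives a more stable p-value at a higher computational cost:
# Sketch: more Monte Carlo replicates for a more stable p-value (slower);
# M is the same argument used in the unif_test() and unif_stat_MC() calls below
unif_test(data = cir_data, type = "Watson", p_value = "MC", M = 1e4)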
You can perform several tests within a single call to unif_test. Choose among the available circular uniformity tests, listed in avail_cir_tests:
avail_cir_tests
#> [1] "Ajne" "Bakshaev" "Bingham" "Cressie"
#> [5] "CCF09" "FG01" "Gine_Fn" "Gine_Gn"
#> [9] "Gini" "Gini_squared" "Greenwood" "Hermans_Rasson"
#> [13] "Hodges_Ajne" "Kuiper" "Log_gaps" "Max_uncover"
#> [17] "Num_uncover" "PAD" "PCvM" "PRt"
#> [21] "Pycke" "Pycke_q" "Range" "Rao"
#> [25] "Rayleigh" "Riesz" "Rothman" "Vacancy"
#> [29] "Watson" "Watson_1976"
For example, you can use the Projected Anderson–Darling test (García-Portugués, Navarro-Esteban, and Cuesta-Albertos 2020; also an omnibus test) together with the Watson test:
# A *list* of htest objects
unif_test(data = cir_data, type = c("PAD", "Watson"))
#> Series truncated from 10000 to 10000 terms (difference <= 0 with the HBE tail probability; last weight = 1.137e-08).
#> $PAD
#>
#> Projected Anderson-Darling test of circular uniformity
#>
#> data: cir_data
#> statistic = 0.54217, p-value = 0.8403
#> alternative hypothesis: any alternative to circular uniformity
#>
#>
#> $Watson
#>
#> Watson test of circular uniformity
#>
#> data: cir_data
#> statistic = 0.03701, p-value = 0.8584
#> alternative hypothesis: any alternative to circular uniformity
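When several tests are requested, unif_test returns a list of htest objects, so the p-values can be collected programmatically with standard base R. A minimal sketch (the name cir_tests is just for illustration):
# Collect the p-values of the tests run above from the list of htest objects
cir_tests <- unif_test(data = cir_data, type = c("PAD", "Watson"))
sapply(cir_tests, function(h) h$p.value)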
García-Portugués and Verdebout (2018) gives a review of different tests of uniformity on the circle. Section 5.1 in Pewsey and García-Portugués (2021) also gives an overview of recent advances.
Suppose now that you want to test if a sample of spherical data is uniformly distributed. Consider a non-uniformly generated[1] sample of directions in Cartesian coordinates, such as:
# Sample data on S^2
set.seed(987202226)
theta <- runif(n = 30, min = 0, max = 2 * pi)
phi <- runif(n = 30, min = 0, max = pi)
sph_data <- cbind(cos(theta) * sin(phi), sin(theta) * sin(phi), cos(phi))
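The Cartesian coordinates passed to unif_test are expected to be unit vectors. A quick base-R sanity check (just a sketch, not part of the package workflow) confirms that the rows of sph_data lie on S^2:
# Each row of sph_data should have unit norm, up to floating-point error
summary(rowSums(sph_data^2))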
The available spherical uniformity tests:
avail_sph_tests
#> [1] "Ajne" "Bakshaev" "Bingham" "CJ12" "CCF09"
#> [6] "Gine_Fn" "Gine_Gn" "PAD" "PCvM" "PRt"
#> [11] "Pycke" "Rayleigh" "Rayleigh_HD" "Riesz"
See again García-Portugués and Verdebout (2018) for a review of tests of uniformity on the sphere, and the complementary overview in Section 5.1 of Pewsey and García-Portugués (2021).
The default type = "all" is equivalent to type = avail_sph_tests, which might be more than needed for a standard analysis:
unif_test(data = sph_data, type = "all", p_value = "MC", M = 1e3)
#> $Ajne
#>
#> Ajne test of spherical uniformity
#>
#> data: sph_data
#> statistic = 0.057937, p-value = 0.991
#> alternative hypothesis: any non-axial alternative to spherical uniformity
#>
#>
#> $Bakshaev
#>
#> Bakshaev (2010) test of spherical uniformity
#>
#> data: sph_data
#> statistic = 1.0215, p-value = 0.622
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $Bingham
#>
#> Bingham test of spherical uniformity
#>
#> data: sph_data
#> statistic = 17.573, p-value = 0.003
#> alternative hypothesis: scatter matrix different from constant
#>
#>
#> $CJ12
#>
#> Cai and Jiang (2012) test of spherical uniformity
#>
#> data: sph_data
#> statistic = 19.442, p-value = 0.78
#> alternative hypothesis: unclear consistency
#>
#>
#> $CCF09
#>
#> Cuesta-Albertos et al. (2009) test of spherical uniformity with k = 50
#>
#> data: sph_data
#> statistic = 1.1355, p-value = 0.764
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $Gine_Fn
#>
#> Gine's Fn test of spherical uniformity
#>
#> data: sph_data
#> statistic = 1.5391, p-value = 0.392
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $Gine_Gn
#>
#> Gine's Gn test of spherical uniformity
#>
#> data: sph_data
#> statistic = 1.3074, p-value = 0.003
#> alternative hypothesis: any axial alternative to spherical uniformity
#>
#>
#> $PAD
#>
#> Projected Anderson-Darling test of spherical uniformity
#>
#> data: sph_data
#> statistic = 0.91653, p-value = 0.5
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $PCvM
#>
#> Projected Cramer-von Mises test of spherical uniformity
#>
#> data: sph_data
#> statistic = 0.12769, p-value = 0.622
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $PRt
#>
#> Projected Rothman test of spherical uniformity with t = 0.333
#>
#> data: sph_data
#> statistic = 0.15523, p-value = 0.666
#> alternative hypothesis: any alternative to spherical uniformity if t is irrational
#>
#>
#> $Pycke
#>
#> Pycke test of spherical uniformity
#>
#> data: sph_data
#> statistic = 0.042839, p-value = 0.299
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $Rayleigh
#>
#> Rayleigh test of spherical uniformity
#>
#> data: sph_data
#> statistic = 0.13345, p-value = 0.98
#> alternative hypothesis: mean direction different from zero
#>
#>
#> $Rayleigh_HD
#>
#> HD-standardized Rayleigh test of spherical uniformity
#>
#> data: sph_data
#> statistic = -1.1703, p-value = 0.98
#> alternative hypothesis: mean direction different from zero
#>
#>
#> $Riesz
#>
#> Warning! This is an experimental test not meant to be used
#>
#> data: sph_data
#> statistic = 1.0215, p-value = 0.622
#> alternative hypothesis: unclear, experimental test
unif_test(data = sph_data, type = "Rayleigh", p_value = "asymp")
#>
#> Rayleigh test of spherical uniformity
#>
#> data: sph_data
#> statistic = 0.13345, p-value = 0.9875
#> alternative hypothesis: mean direction different from zero
The hyperspherical setting is treated analogously to the spherical setting, and the available tests are exactly the same (avail_sph_tests). Here is an example of testing uniformity with a sample of size 30 on the \(9\)-sphere:
# Sample data on S^9
hyp_data <- r_unif_sph(n = 30, p = 10)
# Test
unif_test(data = hyp_data, type = "Rayleigh", p_value = "asymp")
#>
#> Rayleigh test of spherical uniformity
#>
#> data: hyp_data
#> statistic = 14.081, p-value = 0.1693
#> alternative hypothesis: mean direction different from zero
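Note that r_unif_sph returns its draws as an n x p x M array (this layout is inferred from the slicing samps_sph[, , 1] used further below), so individual samples can be extracted from the third dimension. A small sketch under that assumption:
# hyp_data stores one n x p sample per slice of the third dimension
dim(hyp_data)

# A single sample can be passed to unif_test as an n x p matrix
unif_test(data = hyp_data[, , 1], type = "Rayleigh", p_value = "asymp")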
The following snippet partially reproduces the data application in García-Portugués, Navarro-Esteban, and Cuesta-Albertos (2021) on testing the uniformity of known Venusian craters. Incidentally, it also illustrates how to monitor the progress of a Monte Carlo simulation when p_value = "MC"
using progressr.
# Load spherical data
data(venus)
head(venus)
#> name diameter theta phi
#> 1 Janice 10.0 4.5724136 1.523672
#> 2 HuaMulan 24.0 5.8939769 1.514946
#> 3 Tatyana 19.0 3.7070793 1.490511
#> 4 Landowska 33.0 1.2967796 1.476025
#> 5 Ruslanova 44.3 0.2897247 1.465029
#> 6 Sveta 21.0 4.7684140 1.439181
nrow(venus)
#> [1] 967
# Compute Cartesian coordinates from polar coordinates
venus$X <- cbind(cos(venus$theta) * cos(venus$phi),
                 sin(venus$theta) * cos(venus$phi),
                 sin(venus$phi))
# Test uniformity using the Projected Cramér-von Mises and Projected
# Anderson-Darling tests on S^2, both using asymptotic distributions
unif_test(data = venus$X, type = c("PCvM", "PAD"), p_value = "asymp")
#> Series truncated from 10000 to 10000 terms (difference <= 0 with the HBE tail probability; last weight = 6.249e-14).
#> Series truncated from 10000 to 10000 terms (difference <= 0 with the HBE tail probability; last weight = 4.999e-13).
#> $PCvM
#>
#> Projected Cramer-von Mises test of spherical uniformity
#>
#> data: venus$X
#> statistic = 0.25844, p-value = 0.1272
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $PAD
#>
#> Projected Anderson-Darling test of spherical uniformity
#>
#> data: venus$X
#> statistic = 1.5077, p-value = 0.1149
#> alternative hypothesis: any alternative to spherical uniformity
# Define a handler for progressr
require(progress)
#> Loading required package: progress
require(progressr)
#> Loading required package: progressr
handlers(handler_progress(
format = ":spin [:bar] :percent Total: :elapsedfull End \u2248 :eta",
clear = FALSE))
# Test uniformity using Monte-Carlo approximated null distributions
with_progress(
unif_test(data = venus$X, type = c("PCvM", "PAD"),
p_value = "MC", chunks = 1e2, M = 5e2, cores = 2)
)
#> $PCvM
#>
#> Projected Cramer-von Mises test of spherical uniformity
#>
#> data: venus$X
#> statistic = 0.25844, p-value = 0.116
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $PAD
#>
#> Projected Anderson-Darling test of spherical uniformity
#>
#> data: venus$X
#> statistic = 1.5077, p-value = 0.11
#> alternative hypothesis: any alternative to spherical uniformity
unif_stat allows computing several statistics for different samples within a single call, thus facilitating Monte Carlo experiments:
# M samples of size n on S^2
samps_sph <- r_unif_sph(n = 30, p = 3, M = 10)
# Apply all statistics to the M samples
unif_stat(data = samps_sph, type = "all")
#> Ajne Bakshaev Bingham CJ12 CCF09 Gine_Fn Gine_Gn
#> 1 0.47609328 2.2119954 2.382230 18.36313 1.5714737 2.2702163 0.3658432
#> 2 0.05231638 0.5266412 4.984710 21.26823 0.8961785 0.7372848 0.5280193
#> 3 0.21601031 1.0974739 3.752754 18.42823 1.4807671 1.2776073 0.4135661
#> 4 0.16994745 0.8920866 2.710392 26.76415 1.1073186 1.0173293 0.3375395
#> 5 0.24320463 1.2717785 3.607297 23.14723 1.3761164 1.4079420 0.4351235
#> 6 0.31846184 1.5920190 4.297304 21.63874 1.3718298 1.7129064 0.4390590
#> 7 0.29956640 1.5662143 4.377052 21.03363 1.5190590 1.7113360 0.5130704
#> 8 0.24576739 1.1694726 1.151481 22.37936 1.1051464 1.2430711 0.2600015
#> 9 0.30644656 1.5440129 4.982531 21.16927 1.4369394 1.6971020 0.4713157
#> 10 0.29664792 1.5808092 5.911191 24.17573 1.4809133 1.7620336 0.5754419
#> PAD PCvM PRt Pycke Rayleigh Rayleigh_HD
#> 1 1.5396354 0.27649943 0.39262094 0.157845184 6.50306437 1.43012004
#> 2 0.4863028 0.06583015 0.08353385 -0.140634304 0.09011958 -1.18795371
#> 3 0.8766092 0.13718424 0.16400787 0.019631134 1.87776213 -0.45815169
#> 4 0.6948689 0.11151082 0.14774812 -0.103164830 1.75644170 -0.50768055
#> 5 0.9503006 0.15897231 0.21864541 -0.006193974 2.95653601 -0.01774410
#> 6 1.1442056 0.19900238 0.27062129 0.007535607 4.18444451 0.48354745
#> 7 1.1468233 0.19577679 0.27158541 0.055528018 3.88963126 0.36319044
#> 8 0.8602975 0.14618407 0.19759185 -0.066992075 2.93592539 -0.02615835
#> 9 1.1340723 0.19300161 0.25856393 0.037310492 3.76467886 0.31217884
#> 10 1.1687326 0.19760115 0.27044780 0.065156710 3.75199894 0.30700228
#> Riesz
#> 1 2.2119954
#> 2 0.5266412
#> 3 1.0974739
#> 4 0.8920866
#> 5 1.2717785
#> 6 1.5920190
#> 7 1.5662143
#> 8 1.1694726
#> 9 1.5440129
#> 10 1.5808092
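The statistics returned by unif_stat can be turned into Monte Carlo p-values by comparing them against statistics simulated under uniformity, which is essentially what unif_stat_MC and unif_test automate. A minimal sketch, assuming that unif_stat with a single type returns a data frame with one column named after the test:
# Observed Rayleigh statistics for the M = 10 samples above
obs_rayleigh <- unif_stat(data = samps_sph, type = "Rayleigh")$Rayleigh

# Rayleigh statistics simulated under uniformity on S^2
null_rayleigh <- unif_stat(data = r_unif_sph(n = 30, p = 3, M = 1e3),
                           type = "Rayleigh")$Rayleigh

# Upper-tail Monte Carlo p-values: proportion of null statistics >= observed
sapply(obs_rayleigh, function(Tn) mean(null_rayleigh >= Tn))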
Additionally, unif_stat_MC is a utility for performing the previous simulation through a convenient parallel wrapper:
# Break the simulation in 10 chunks of tasks to be divided between 2 cores
sim <- unif_stat_MC(n = 30, type = "all", p = 3, M = 1e4, cores = 2,
                    chunks = 10)
# Critical values for test statistics
sim$crit_val_MC
#> Ajne Bakshaev Bingham CJ12 CCF09 Gine_Fn Gine_Gn PAD
#> 0.1 0.4469816 2.186773 9.161085 26.00228 1.651559 2.327829 0.7642920 1.542838
#> 0.05 0.5440321 2.606217 10.817409 26.78061 1.763648 2.701988 0.8673248 1.801016
#> 0.01 0.7533937 3.516652 14.560299 27.96065 1.989172 3.582482 1.1082368 2.389605
#> PCvM PRt Pycke Rayleigh Rayleigh_HD Riesz
#> 0.1 0.2733466 0.3787205 0.1484198 6.161694 1.290756 2.186773
#> 0.05 0.3257771 0.4554343 0.2132590 7.659209 1.902114 2.606217
#> 0.01 0.4395814 0.6296545 0.3562978 11.083608 3.300119 3.516652
# Test statistics
head(sim$stats_MC)
#> Ajne Bakshaev Bingham CJ12 CCF09 Gine_Fn Gine_Gn PAD
#> 1 0.1095132 0.7945384 4.802204 18.99292 1.186288 1.0109726 0.5729197 0.6776932
#> 2 0.1330324 1.0832080 11.866309 17.15186 1.173641 1.4474474 0.9153178 0.9097490
#> 3 0.1570914 1.0143378 6.772900 18.85335 1.228233 1.2455681 0.6172027 0.8126579
#> 4 0.1227879 0.6856030 3.188910 19.24447 1.037321 0.8544891 0.3633373 0.5906101
#> 5 0.2680238 1.2598328 2.211225 12.27737 1.367386 1.3410810 0.2689859 0.9149259
#> 6 0.1171362 0.8103943 7.611415 25.07804 1.038336 1.0689460 0.6004014 0.6906607
#> PCvM PRt Pycke Rayleigh Rayleigh_HD Riesz
#> 1 0.09931730 0.13195058 -0.04782339 0.8419743 -0.8810103 0.7945384
#> 2 0.13540100 0.16156212 0.03973539 1.0884959 -0.7803683 1.0832080
#> 3 0.12679222 0.16485102 -0.04383765 1.6056949 -0.5692227 1.0143378
#> 4 0.08570037 0.09593621 -0.09811272 0.6702957 -0.9510978 0.6856030
#> 5 0.15747910 0.21016107 -0.03650681 3.2666485 0.1088588 1.2598328
#> 6 0.10129928 0.11489275 -0.08651426 0.7350144 -0.9246765 0.8103943
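A quick consistency check between the two components: under uniformity, the proportion of simulated statistics exceeding a critical value should be close to its nominal level. A small sketch, assuming the data-frame layout printed above (rows of crit_val_MC named by significance level):
# Empirical rejection rate of the Rayleigh test at the 5% level; ~0.05 expected
mean(sim$stats_MC$Rayleigh > sim$crit_val_MC["0.05", "Rayleigh"])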
# Power computation using a pre-built sampler for the alternative
pow <- unif_stat_MC(n = 30, type = "all", p = 3, M = 1e4, cores = 2,
                    chunks = 10, r_H1 = r_alt, crit_val = sim$crit_val_MC,
                    alt = "vMF", kappa = 1)
pow$power_MC
#> Ajne Bakshaev Bingham CJ12 CCF09 Gine_Fn Gine_Gn PAD PCvM PRt
#> 0.1 0.8262 0.8202 0.1380 0.0906 0.7812 0.8110 0.1358 0.8157 0.8202 0.8242
#> 0.05 0.7230 0.7190 0.0793 0.0449 0.6695 0.7156 0.0786 0.7166 0.7190 0.7224
#> 0.01 0.4813 0.4765 0.0195 0.0096 0.4147 0.4656 0.0187 0.4634 0.4765 0.4706
#> Pycke Rayleigh Rayleigh_HD Riesz
#> 0.1 0.7853 0.8268 0.8268 0.8202
#> 0.05 0.6881 0.7264 0.7264 0.7190
#> 0.01 0.4443 0.4791 0.4791 0.4765
# How to use a custom sampler according to ?unif_stat_MC
r_H1 <- function(n, p, M, l = 1) {

  samp <- array(dim = c(n, p, M))
  for (j in 1:M) {

    samp[, , j] <- mvtnorm::rmvnorm(n = n, mean = c(l, rep(0, p - 1)),
                                    sigma = diag(rep(1, p)))
    samp[, , j] <- samp[, , j] / sqrt(rowSums(samp[, , j]^2))

  }
  return(samp)

}
pow <- unif_stat_MC(n = 30, type = "all", p = 3, M = 1e2, cores = 2,
                    chunks = 5, r_H1 = r_H1, crit_val = sim$crit_val_MC)
pow$power_MC
#> Ajne Bakshaev Bingham CJ12 CCF09 Gine_Fn Gine_Gn PAD PCvM PRt Pycke
#> 0.1 0.99 0.99 0.46 0.04 0.99 0.99 0.47 0.99 0.99 0.99 0.99
#> 0.05 0.99 0.99 0.35 0.02 0.99 0.99 0.32 0.99 0.99 0.99 0.99
#> 0.01 0.97 0.97 0.15 0.00 0.93 0.98 0.16 0.98 0.97 0.97 0.95
#> Rayleigh Rayleigh_HD Riesz
#> 0.1 0.99 0.99 0.99
#> 0.05 0.99 0.99 0.99
#> 0.01 0.97 0.97 0.97
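A custom sampler passed through r_H1 appears to be expected to return an n x p x M array of unit vectors, mirroring the output format of r_unif_sph (this is inferred from the sketch above, not from the package documentation). A quick check of its dimensions:
# The sampler should produce one n x p sample per slice of the third dimension
dim(r_H1(n = 30, p = 3, M = 2))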
unif_stat_MC can be used to construct the Monte Carlo calibration required by unif_test, either by providing a rejection rule based on $crit_val_MC or by approximating the \(p\)-value with $stats_MC.
# Using precomputed critical values
ht1 <- unif_test(data = samps_sph[, , 1], type = "Rayleigh",
                 p_value = "crit_val", crit_val = sim$crit_val_MC)
ht1
#>
#> Rayleigh test of spherical uniformity
#>
#> data: samps_sph[, , 1]
#> statistic = 6.5031, p-value = NA
#> alternative hypothesis: mean direction different from zero
ht1$reject
#> 0.1 0.05 0.01
#> FALSE TRUE TRUE
# Using precomputed Monte Carlo statistics
ht2 <- unif_test(data = samps_sph[, , 1], type = "Rayleigh",
                 p_value = "MC", stats_MC = sim$stats_MC)
ht2
#>
#> Rayleigh test of spherical uniformity
#>
#> data: samps_sph[, , 1]
#> statistic = 6.5031, p-value = 0.9993
#> alternative hypothesis: mean direction different from zero
ht2$reject
#> 0.1 0.05 0.01
#> TRUE TRUE TRUE
# Faster than
unif_test(data = samps_sph[, , 1], type = "Rayleigh", p_value = "MC")
#>
#> Rayleigh test of spherical uniformity
#>
#> data: samps_sph[, , 1]
#> statistic = 6.5031, p-value = 0.09281
#> alternative hypothesis: mean direction different from zero
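The speedup from reusing stats_MC can be quantified with a simple timing comparison; a minimal sketch using base R's system.time (actual timings vary with the machine and with the number of Monte Carlo replicates):
# Reusing the precomputed Monte Carlo statistics skips the null simulation
system.time(unif_test(data = samps_sph[, , 1], type = "Rayleigh",
                      p_value = "MC", stats_MC = sim$stats_MC))

# Simulating the null distribution from scratch on each call
system.time(unif_test(data = samps_sph[, , 1], type = "Rayleigh",
                      p_value = "MC"))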
[1] Uniformly distributed polar coordinates do not translate into uniformly distributed Cartesian coordinates!