Application of the BLE to the Stratified Simple Random Sample design

(From Section 2.3.2 of Gonçalves, Moura and Migon, “Bayes linear estimation for finite population with emphasis on categorical data”)

In a simple model with no auxiliary variable, where a Stratified Simple Random Sample has been taken from the population, the BLE_SSRS() function computes the Bayes Linear Estimator for the individuals of each stratum of the population. It receives the following parameters:

\(y_s\) - a vector containing either the observed values (aggregated by stratum) or the sample mean of each stratum (the \(\sigma\) parameter is required in this case);
\(h\) - a vector containing the number of observations from each stratum in the sample;
\(N\) - a vector containing the total size of each stratum;
\(m\) - a vector containing the prior mean of each stratum. If NULL, the sample mean of each stratum will be used (non-informative prior);
\(v\) - a vector containing the prior variance of an element from each stratum (\(v_i> \sigma_i^2\) for each stratum \(i\)). If NULL, it will tend to infinity (non-informative prior);
\(\sigma\) - a vector containing the prior estimate of variability (standard deviation) within each stratum. If NULL, the sample variance of each stratum will be used.
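
To make the roles of \(m\), \(v\) and \(\sigma\) more concrete, the sketch below computes a precision-weighted combination of the prior mean and the sample mean of each stratum. It rests on an assumption, not on the package source: that the prior variance of the stratum mean is \(v_i - \sigma_i^2\). The helper name ble_beta_sketch is hypothetical. Under that assumption the sketch agrees with the est.beta and Vest.beta values printed in the help-page example of this document.

```r
# Illustrative sketch only: 'ble_beta_sketch' is a hypothetical helper,
# not a function exported by the package.
ble_beta_sketch <- function(ybar, n, m, v, sigma) {
  v_beta <- v - sigma^2                   # assumed prior variance of the stratum mean
  precision <- 1 / v_beta + n / sigma^2   # prior precision plus sample precision
  est <- (m / v_beta + n * ybar / sigma^2) / precision
  list(est.beta = est, Vest.beta = 1 / precision)
}

# Help-page inputs, summarised as one sample mean per stratum
ybar <- c(mean(c(2, -1, 1.5)), mean(c(6, 10)), mean(c(8, 8)))
res <- ble_beta_sketch(ybar, n = c(3, 2, 2), m = c(0, 9, 8),
                       v = c(3, 8, 1), sigma = c(1, 2, 0.5))
res$est.beta
res$Vest.beta
```

The requirement \(v_i> \sigma_i^2\) keeps v_beta positive in this sketch, which is one way to read the constraint stated for \(v\).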

Examples

We will use the BigCity dataset from the TeachingSampling package for this example (in practice we first take a sample of size \(10000\) from this dataset so that R can perform the calculations). Suppose we want to estimate the mean or the total Expenditure of this population, knowing that mean expenditure differs between rural and urban individuals. After taking a stratified simple random sample of 30 individuals from each area, we want to estimate the true expenditure means by combining the sample information with an expert's expectation (the prior mean will be \(280\) for the rural area and \(420\) for the urban area).

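library(BayesSampling)     # provides BLE_SSRS()
library(TeachingSampling)  # provides the BigCity dataset
data(BigCity)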
end <- dim(BigCity)[1]
s <- seq(from = 1, to = end, by = 1)

set.seed(3)
samp <- sample(s, size = 10000, replace = FALSE)
ordered_samp <- sort(samp)
BigCity_red <- BigCity[ordered_samp,]

Rural <- BigCity_red[which(BigCity_red$Zone == "Rural"),]
Rural_Exp <- Rural$Expenditure
length(Rural_Exp)
#> [1] 4757

Rural_ys <- sample(Rural_Exp, size = 30, replace = FALSE)

Urban <- BigCity_red[which(BigCity_red$Zone == "Urban"),]
Urban_Exp <- Urban$Expenditure
length(Urban_Exp)
#> [1] 5243

Urban_ys <- sample(Urban_Exp, size = 30, replace = FALSE)
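
The two draws above can also be written generically for any number of strata. The following is a minimal sketch on a made-up toy population (the values in pop are simulated for illustration and are not the BigCity data); the names pop, n_h and strat_sample are hypothetical.

```r
# Toy population with simulated values, for illustration only
set.seed(1)
pop <- data.frame(
  Zone = rep(c("Rural", "Urban"), times = c(40, 60)),
  Expenditure = c(rnorm(40, mean = 300, sd = 50),
                  rnorm(60, mean = 450, sd = 80))
)

n_h <- c(Rural = 5, Urban = 5)                 # sample size per stratum
by_zone <- split(pop$Expenditure, pop$Zone)    # one vector per stratum

# Simple random sample without replacement inside each stratum
strat_sample <- Map(function(y, n) sample(y, size = n, replace = FALSE),
                    by_zone[names(n_h)], n_h)

ys <- unlist(strat_sample, use.names = FALSE)  # observations aggregated by stratum
h  <- lengths(strat_sample)                    # observations per stratum
N  <- lengths(by_zone[names(n_h)])             # stratum sizes
```

The resulting ys, h and N follow the same ordering convention as the arguments described earlier: observations grouped stratum by stratum, with one entry per stratum in h and N.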

The true expenditure means are the quantities we want to estimate. In this example their values are known:

mean(Rural_Exp)
#> [1] 291.978
mean(Urban_Exp)
#> [1] 449.0023

Our design-based estimator for the mean will be the sample mean of each stratum:

mean(Rural_ys)
#> [1] 302.5523
mean(Urban_ys)
#> [1] 477.8243
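
If an estimate for the whole population is wanted, the stratum sample means can be combined with weights proportional to the stratum sizes, which is the usual design-based stratified estimator. The sketch below simply reuses the numbers printed above; the object names are hypothetical.

```r
# Stratum sizes and sample means as printed above
N_h    <- c(Rural = 4757, Urban = 5243)
ybar_h <- c(Rural = 302.5523, Urban = 477.8243)

W_h <- N_h / sum(N_h)            # stratum weights N_h / N
mean_strat  <- sum(W_h * ybar_h) # stratified estimate of the population mean
total_strat <- sum(N_h * ybar_h) # stratified estimate of the population total

mean_strat
total_strat
```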

Incorporating prior information about the population can bring the estimates closer to the true means, especially when only a small sample is available:

ys <- c(Rural_ys, Urban_ys)
h <- c(30,30)
N <- c(length(Rural_Exp), length(Urban_Exp))
m <- c(280, 420)
v <- c(4*(10.1^4), 10.1^5)
sigma <- c(sqrt(4*10^4), sqrt(10^5))

Estimator <- BLE_SSRS(ys, h, N, m, v, sigma)

Our Bayes Linear Estimator for the mean expenditure of each stratum:

Estimator$est.beta
#>       Beta
#> 1 292.3850
#> 2 454.9716
Estimator$Vest.beta
#>         V1       V2
#> 1 732.2238    0.000
#> 2   0.0000 2015.967
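
As a quick check on this particular sample, the estimates can be set against the known stratum means. The sketch below copies the values printed in this example by hand (it does not call the package) and compares the absolute errors of the sample means and of the Bayes linear estimates; it also takes the square root of the diagonal of Vest.beta as a rough measure of uncertainty. A single sample does not establish a general ranking of the two estimators.

```r
# Values copied from the output shown in this example
true_mean   <- c(Rural = 291.978,  Urban = 449.0023)
sample_mean <- c(Rural = 302.5523, Urban = 477.8243)
ble_mean    <- c(Rural = 292.3850, Urban = 454.9716)
ble_var     <- c(Rural = 732.2238, Urban = 2015.967)  # diagonal of Vest.beta

err_design <- abs(sample_mean - true_mean)  # error of the design-based estimate
err_ble    <- abs(ble_mean - true_mean)     # error of the Bayes linear estimate
ble_sd     <- sqrt(ble_var)                 # standard deviation implied by Vest.beta

comparison <- data.frame(err_design, err_ble, ble_sd)
comparison
```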

Example from the help page

ys <- c(2,-1,1.5, 6,10, 8,8)
h <- c(3,2,2)
N <- c(5,5,3)
m <- c(0,9,8)
v <- c(3,8,1)
sigma <- c(1,2,0.5)

Estimator <- BLE_SSRS(ys, h, N, m, v, sigma)
Estimator
#> $est.beta
#>        Beta
#> 1 0.7142857
#> 2 8.3333333
#> 3 8.0000000
#> 
#> $Vest.beta
#>          V1       V2        V3
#> 1 0.2857143 0.000000 0.0000000
#> 2 0.0000000 1.333333 0.0000000
#> 3 0.0000000 0.000000 0.1071429
#> 
#> $est.mean
#>      y_nots
#> 1 0.7142857
#> 2 0.7142857
#> 3 8.3333333
#> 4 8.3333333
#> 5 8.3333333
#> 6 8.0000000
#> 
#> $Vest.mean
#>          V1        V2       V3       V4       V5        V6
#> 1 1.2857143 0.2857143 0.000000 0.000000 0.000000 0.0000000
#> 2 0.2857143 1.2857143 0.000000 0.000000 0.000000 0.0000000
#> 3 0.0000000 0.0000000 5.333333 1.333333 1.333333 0.0000000
#> 4 0.0000000 0.0000000 1.333333 5.333333 1.333333 0.0000000
#> 5 0.0000000 0.0000000 1.333333 1.333333 5.333333 0.0000000
#> 6 0.0000000 0.0000000 0.000000 0.000000 0.000000 0.3571429
#> 
#> $est.tot
#> [1] 68.92857
#> 
#> $Vest.tot
#> [1] 27.5
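
The printed components are related in a simple way, which the sketch below reconstructs from the (rounded) values shown above. It assumes that est.mean holds one prediction per non-sampled unit, that est.tot adds the observed values to those predictions, and that Vest.tot is the sum of all entries of Vest.mean; these assumptions match the numbers in this output but are not taken from the package source.

```r
ys    <- c(2, -1, 1.5, 6, 10, 8, 8)
h     <- c(3, 2, 2)
N     <- c(5, 5, 3)
sigma <- c(1, 2, 0.5)

# Rounded values copied from $est.beta and the diagonal of $Vest.beta
est_beta  <- c(0.7142857, 8.3333333, 8.0000000)
Vest_beta <- c(0.2857143, 1.333333, 0.1071429)

n_out <- N - h                             # non-sampled units per stratum: 2, 3, 1
est_mean <- rep(est_beta, times = n_out)   # one prediction per non-sampled unit

# Total = observed part + predicted part
est_tot <- sum(ys) + sum(est_mean)

# Within a stratum, Vest.mean has Vest_beta + sigma^2 on the diagonal and
# Vest_beta off the diagonal, so its entries sum to n_out^2 * Vest_beta + n_out * sigma^2
Vest_tot <- sum(n_out^2 * Vest_beta + n_out * sigma^2)

est_tot
Vest_tot
```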

Example from the help page, but supplying sample means instead of sample observations

y1 <- mean(c(2,-1,1.5))
y2 <- mean(c(6,10))
y3 <- mean(c(8,8))
ys <- c(y1, y2, y3)
h <- c(3,2,2)
N <- c(5,5,3)
m <- c(0,9,8)
v <- c(3,8,1)
sigma <- c(1,2,0.5)

Estimator <- BLE_SSRS(ys, h, N, m, v, sigma)
#> sample means informed instead of sample observations, parameter 'sigma' will be necessary
Estimator
#> $est.beta
#>        Beta
#> 1 0.7142857
#> 2 8.3333333
#> 3 8.0000000
#> 
#> $Vest.beta
#>          V1       V2        V3
#> 1 0.2857143 0.000000 0.0000000
#> 2 0.0000000 1.333333 0.0000000
#> 3 0.0000000 0.000000 0.1071429
#> 
#> $est.mean
#>      y_nots
#> 1 0.7142857
#> 2 0.7142857
#> 3 8.3333333
#> 4 8.3333333
#> 5 8.3333333
#> 6 8.0000000
#> 
#> $Vest.mean
#>          V1        V2       V3       V4       V5        V6
#> 1 1.2857143 0.2857143 0.000000 0.000000 0.000000 0.0000000
#> 2 0.2857143 1.2857143 0.000000 0.000000 0.000000 0.0000000
#> 3 0.0000000 0.0000000 5.333333 1.333333 1.333333 0.0000000
#> 4 0.0000000 0.0000000 1.333333 5.333333 1.333333 0.0000000
#> 5 0.0000000 0.0000000 1.333333 1.333333 5.333333 0.0000000
#> 6 0.0000000 0.0000000 0.000000 0.000000 0.000000 0.3571429
#> 
#> $est.tot
#> [1] 68.92857
#> 
#> $Vest.tot
#> [1] 27.5
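
When the observations are already stored in a single vector aggregated by stratum, the per-stratum sample means used in this last example do not have to be typed by hand. A minimal base R sketch, with hypothetical object names, builds a stratum label for each observation from h and averages within labels:

```r
ys_obs <- c(2, -1, 1.5, 6, 10, 8, 8)  # observations aggregated by stratum
h      <- c(3, 2, 2)                  # observations per stratum

# Stratum label for every observation: 1 1 1 2 2 3 3
stratum <- rep(seq_along(h), times = h)

# Sample mean of each stratum
ys_means <- as.numeric(tapply(ys_obs, stratum, mean))
ys_means
```

This relies on ys_obs being ordered stratum by stratum, in the same order as h.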