Replicate weights

Replicate weights are a method for estimating the standard errors of survey statistics in complex sample designs.

The basic idea behind replicate weights is to create multiple versions of the original sample weights, each a perturbed copy produced by a resampling scheme. The survey statistic of interest, such as a mean or total, is then computed once with each set of replicate weights. The variance of the statistic is estimated from the variability of these replicate estimates.

Currently, the Rao-Wu bootstrap^[1] and the jackknife^[2] are the only methods in the package for generating replicate weights. In the future, the package will support additional inference methods, which will be specified when creating a ReplicateDesign object.

The bootweights function generates a ReplicateDesign from a SurveyDesign using the Rao-Wu bootstrap method. For example:

julia> using Survey
julia> apistrat = load_data("apistrat")
200×40 DataFrame
 Row │ Column1  cds             stype    name             sname                ⋯
     │ Int64    Int64           String1  String15         String               ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │       1  19647336097927  E        Open Magnet: Ce  Open Magnet: Center  ⋯
   2 │       2  19647336016018  E        Belvedere Eleme  Belvedere Elementary
   3 │       3  19648816021505  E        Altadena Elemen  Altadena Elementary
   4 │       4  19647336019285  E        Soto Street Ele  Soto Street Elementa
   5 │       5  56739406115430  E        Walnut Canyon E  Walnut Canyon Elemen ⋯
   6 │       6  56726036084917  E        Atherwood Eleme  Atherwood Elementary
   7 │       7  56726036055800  E        Township Elemen  Township Elementary
   8 │       8  15633216109078  E        Thorner (Dr. Ju  Thorner (Dr. Juliet)
  ⋮  │    ⋮           ⋮            ⋮            ⋮                         ⋮    ⋱
 194 │     194  19650526022933  E        Emperor Element  Emperor Elementary   ⋯
 195 │     195   1612426001572  E        Alvarado Elemen  Alvarado Elementary
 196 │     196  19647336018568  E        One Hundred Twe  One Hundred Twelfth
 197 │     197  33670333331600  H        Corona Senior H  Corona Senior High
 198 │     198   4755076003164  M        Sycamore Middle  Sycamore Middle      ⋯
 199 │     199  56724626055016  E        Larsen (Ansgar)  Larsen (Ansgar) Elem
 200 │     200  31669513134657  H        Lincoln High (C  Lincoln High (Char)
                                                 36 columns and 185 rows omitted
julia> dstrat = SurveyDesign(apistrat; strata=:stype, weights=:pw)
SurveyDesign:
data: 200×44 DataFrame
strata: stype
    [E, E, E  …  H]
cluster: none
popsize: [4420.9999, 4420.9999, 4420.9999  …  755.0]
sampsize: [100, 100, 100  …  50]
weights: [44.21, 44.21, 44.21  …  15.1]
allprobs: [0.0226, 0.0226, 0.0226  …  0.0662]
julia> bstrat = bootweights(dstrat; replicates = 10)
ReplicateDesign{BootstrapReplicates}:
data: 200×54 DataFrame
strata: stype
    [E, E, E  …  H]
cluster: none
popsize: [4420.9999, 4420.9999, 4420.9999  …  755.0]
sampsize: [100, 100, 100  …  50]
weights: [44.21, 44.21, 44.21  …  15.1]
allprobs: [0.0226, 0.0226, 0.0226  …  0.0662]
type: bootstrap
replicates: 10
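
To make the bootstrap step concrete, here is a minimal sketch of how one set of Rao-Wu replicate weights can be built for a stratified design in which every row is its own sampling unit. It is written in plain Julia and is not the package's implementation: the helper name raowu_replicate and the toy weights and strata are assumptions made for illustration. Within each stratum of n_h units, n_h - 1 units are drawn with replacement, and each original weight is multiplied by the number of times its unit was drawn and by n_h / (n_h - 1).

```julia
using Random

# Hypothetical helper, not part of Survey.jl: one Rao-Wu replicate weight vector
# for a stratified design without clustering (each row is its own sampling unit).
function raowu_replicate(rng::AbstractRNG, weights::Vector{Float64}, strata::Vector)
    rep = zeros(length(weights))
    for h in unique(strata)
        idx = findall(==(h), strata)          # rows belonging to stratum h
        nh = length(idx)
        nh > 1 || error("stratum $h needs at least two units")
        # Draw nh - 1 units with replacement and count how often each is drawn.
        counts = zeros(Int, nh)
        for _ in 1:(nh - 1)
            counts[rand(rng, 1:nh)] += 1
        end
        # Rescale the original weights by the draw counts and by nh / (nh - 1).
        rep[idx] .= weights[idx] .* counts .* (nh / (nh - 1))
    end
    return rep
end

# Toy data (assumed): two strata with constant weights inside each stratum.
rng = MersenneTwister(1234)
w = [10.0, 10.0, 10.0, 10.0, 5.0, 5.0, 5.0]
s = ["E", "E", "E", "E", "H", "H", "H"]
reps = [raowu_replicate(rng, w, s) for _ in 1:10]   # ten replicate weight vectors
```

Because the weights in this toy example are constant within each stratum, every replicate vector sums to the same total as the original weights; units that were not drawn get a replicate weight of zero.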

The jackknifeweights function generates a ReplicateDesign from a SurveyDesign using the jackknife method. For example:

julia> using Survey
julia> apistrat = load_data("apistrat")
200×40 DataFrame
 Row │ Column1  cds             stype    name             sname                ⋯
     │ Int64    Int64           String1  String15         String               ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │       1  19647336097927  E        Open Magnet: Ce  Open Magnet: Center  ⋯
   2 │       2  19647336016018  E        Belvedere Eleme  Belvedere Elementary
   3 │       3  19648816021505  E        Altadena Elemen  Altadena Elementary
   4 │       4  19647336019285  E        Soto Street Ele  Soto Street Elementa
   5 │       5  56739406115430  E        Walnut Canyon E  Walnut Canyon Elemen ⋯
   6 │       6  56726036084917  E        Atherwood Eleme  Atherwood Elementary
   7 │       7  56726036055800  E        Township Elemen  Township Elementary
   8 │       8  15633216109078  E        Thorner (Dr. Ju  Thorner (Dr. Juliet)
  ⋮  │    ⋮           ⋮            ⋮            ⋮                         ⋮    ⋱
 194 │     194  19650526022933  E        Emperor Element  Emperor Elementary   ⋯
 195 │     195   1612426001572  E        Alvarado Elemen  Alvarado Elementary
 196 │     196  19647336018568  E        One Hundred Twe  One Hundred Twelfth
 197 │     197  33670333331600  H        Corona Senior H  Corona Senior High
 198 │     198   4755076003164  M        Sycamore Middle  Sycamore Middle      ⋯
 199 │     199  56724626055016  E        Larsen (Ansgar)  Larsen (Ansgar) Elem
 200 │     200  31669513134657  H        Lincoln High (C  Lincoln High (Char)
                                                 36 columns and 185 rows omitted
julia> dstrat = SurveyDesign(apistrat; strata=:stype, weights=:pw)
SurveyDesign:
data: 200×44 DataFrame
strata: stype
    [E, E, E  …  H]
cluster: none
popsize: [4420.9999, 4420.9999, 4420.9999  …  755.0]
sampsize: [100, 100, 100  …  50]
weights: [44.21, 44.21, 44.21  …  15.1]
allprobs: [0.0226, 0.0226, 0.0226  …  0.0662]
julia> jstrat = jackknifeweights(dstrat);

Unlike bootweights, jackknifeweights does not accept a replicates keyword, so it is called with the design alone.
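
The jackknife differs from the bootstrap in that its replicates are deterministic. As a minimal sketch, assuming the usual delete-one jackknife for a stratified design in which every row is its own sampling unit, each replicate drops one row and inflates the weights of the remaining rows in the same stratum by n_h / (n_h - 1). The helper name jackknife_replicates and the toy data are assumptions for illustration; this is not the package's code.

```julia
# Hypothetical helper, not part of Survey.jl: delete-one jackknife replicate
# weights for a stratified design where each row is its own sampling unit.
function jackknife_replicates(weights::Vector{Float64}, strata::Vector)
    n = length(weights)
    reps = Matrix{Float64}(undef, n, n)         # column j drops row j
    for j in 1:n
        col = copy(weights)
        same = findall(==(strata[j]), strata)   # rows in the stratum of row j
        nh = length(same)
        nh > 1 || error("stratum $(strata[j]) needs at least two units")
        col[same] .*= nh / (nh - 1)             # inflate that stratum's weights
        col[j] = 0.0                            # drop row j itself
        reps[:, j] = col                        # other strata stay unchanged
    end
    return reps
end

# Toy data (assumed): two strata with constant weights inside each stratum.
w = [10.0, 10.0, 10.0, 10.0, 5.0, 5.0, 5.0]
s = ["E", "E", "E", "E", "H", "H", "H"]
R = jackknife_replicates(w, s)
```

In this sketch the number of replicates equals the number of sampled rows, which is why no replicate count needs to be supplied.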

The DataFrame of a ReplicateDesign has one additional column per replicate. Each column is named replicate_ followed by the replicate number.

julia> names(bstrat.data)
54-element Vector{String}:
 "Column1"
 "cds"
 "stype"
 "name"
 "sname"
 "snum"
 "dname"
 "dnum"
 "cname"
 "cnum"
 ⋮
 "replicate_2"
 "replicate_3"
 "replicate_4"
 "replicate_5"
 "replicate_6"
 "replicate_7"
 "replicate_8"
 "replicate_9"
 "replicate_10"

Here, replicate_1 through replicate_10 are the replicate weight columns.

A SurveyDesign can be used to estimate a statistic. For example:

julia> mean(:api00, dstrat)
1×1 DataFrame
 Row │ mean    
     │ Float64 
─────┼─────────
   1 │ 662.287

A ReplicateDesign can additionally be used to compute the standard error of the statistic. For example:

julia> mean(:api00, bstrat)
1×2 DataFrame
 Row │ mean     SE      
     │ Float64  Float64 
─────┼──────────────────
   1 │ 662.287  11.2546

The statistic is recomputed once per replicate, each time using that replicate's weight column in place of the original weights. The standard deviation of these replicate estimates is the standard error of the estimate.
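
The computation described above can be sketched in plain Julia. The helper names weighted_mean and replicate_se and the toy data are assumptions for illustration, and the divisor used here (the number of replicates, with deviations taken about the full-sample estimate) is one common convention for bootstrap replicates; the package's exact formula may differ.

```julia
# Hypothetical helpers, not part of Survey.jl.
weighted_mean(y, w) = sum(y .* w) / sum(w)

function replicate_se(y::Vector{Float64}, w::Vector{Float64}, reps::Matrix{Float64})
    theta = weighted_mean(y, w)                                  # full-sample estimate
    thetas = [weighted_mean(y, col) for col in eachcol(reps)]    # one estimate per replicate
    # Root mean squared deviation of the replicate estimates about theta.
    se = sqrt(sum((thetas .- theta) .^ 2) / length(thetas))
    return theta, se
end

# Toy data (assumed): four observations and two replicate weight columns.
y = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 1.0, 1.0]
reps = [2.0 0.0;
        0.0 2.0;
        1.0 1.0;
        1.0 1.0]
est, se = replicate_se(y, w, reps)   # est == 2.5, se == 0.25
```

The two replicate means are 2.25 and 2.75, each 0.25 away from the full-sample mean of 2.5, which gives the standard error of 0.25. Jackknife replicates are typically combined with a different scaling factor, so this sketch applies to the bootstrap case only.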

