Replicate weights

Replicate weights are a method for estimating the standard errors of survey statistics in complex sample designs.

The basic idea behind replicate weights is to create multiple versions of the original sample weights, each a perturbed copy of the original (randomly resampled for the bootstrap, systematically deleted for the jackknife). Each version is then used to compute the survey statistic of interest, such as a mean or total, as if on a separate replicate sample. The variance of the survey statistic is then estimated from the variability of the estimates across the replicate samples.
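
To make the idea concrete, here is a minimal sketch that builds replicate weights with a plain with-replacement bootstrap. It is only an illustration: it is not the Rao-Wu method and not the package's implementation, and the helper name naive_bootweights is hypothetical.

```julia
using Random

# Hypothetical helper: naive with-replacement bootstrap, NOT the Rao-Wu method.
# Returns an n×R matrix whose column r holds the weights for replicate r.
function naive_bootweights(w::AbstractVector{<:Real}, R::Integer; rng=Random.default_rng())
    n = length(w)
    repw = zeros(Float64, n, R)
    for r in 1:R
        # Draw n unit indices with replacement; a unit drawn k times gets k * w[i].
        for i in rand(rng, 1:n, n)
            repw[i, r] += w[i]
        end
    end
    return repw
end

w = fill(10.0, 5)   # five sampled units, each with weight 10
repw = naive_bootweights(w, 4; rng=MersenneTwister(1))
size(repw)          # one row per unit, one column per replicate
```

Units that are not drawn in a replicate get weight zero there, and units drawn several times get a multiple of their original weight.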

Currently, the Rao-Wu bootstrap [1] and the jackknife [2] are the only methods in the package for generating replicate weights. In the future, the package will support additional inference methods, which will be specified when creating a ReplicateDesign object.

The bootweights function of the package can be used to generate a ReplicateDesign using the Rao-Wu bootstrap method from a SurveyDesign. For example:

julia> using Survey
julia> apistrat = load_data("apistrat")
200×40 DataFrame
 Row │ Column1  cds             stype    name             sname                 ⋯
     │ Int64    Int64           String1  String15         String                ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │       1  19647336097927  E        Open Magnet: Ce  Open Magnet: Center   ⋯
   2 │       2  19647336016018  E        Belvedere Eleme  Belvedere Elementary
   3 │       3  19648816021505  E        Altadena Elemen  Altadena Elementary
   4 │       4  19647336019285  E        Soto Street Ele  Soto Street Elementa
   5 │       5  56739406115430  E        Walnut Canyon E  Walnut Canyon Elemen  ⋯
   6 │       6  56726036084917  E        Atherwood Eleme  Atherwood Elementary
   7 │       7  56726036055800  E        Township Elemen  Township Elementary
   8 │       8  15633216109078  E        Thorner (Dr. Ju  Thorner (Dr. Juliet)
  ⋮  │    ⋮           ⋮            ⋮            ⋮                  ⋮            ⋱
 194 │     194  19650526022933  E        Emperor Element  Emperor Elementary    ⋯
 195 │     195   1612426001572  E        Alvarado Elemen  Alvarado Elementary
 196 │     196  19647336018568  E        One Hundred Twe  One Hundred Twelfth
 197 │     197  33670333331600  H        Corona Senior H  Corona Senior High
 198 │     198   4755076003164  M        Sycamore Middle  Sycamore Middle       ⋯
 199 │     199  56724626055016  E        Larsen (Ansgar)  Larsen (Ansgar) Elem
 200 │     200  31669513134657  H        Lincoln High (C  Lincoln High (Char)
                                                 36 columns and 185 rows omitted
julia> dstrat = SurveyDesign(apistrat; strata=:stype, weights=:pw)
SurveyDesign:
data: 200×44 DataFrame
strata: stype
    [E, E, E  …  H]
cluster: none
popsize: [4420.9999, 4420.9999, 4420.9999  …  755.0]
sampsize: [100, 100, 100  …  50]
weights: [44.21, 44.21, 44.21  …  15.1]
allprobs: [0.0226, 0.0226, 0.0226  …  0.0662]
julia> bstrat = bootweights(dstrat; replicates = 10)
ReplicateDesign{BootstrapReplicates}:
data: 200×54 DataFrame
strata: stype
    [E, E, E  …  H]
cluster: none
popsize: [4420.9999, 4420.9999, 4420.9999  …  755.0]
sampsize: [100, 100, 100  …  50]
weights: [44.21, 44.21, 44.21  …  15.1]
allprobs: [0.0226, 0.0226, 0.0226  …  0.0662]
type: bootstrap
replicates: 10

The jackknifeweights function of the package can be used to generate a ReplicateDesign using the Jackknife method from a SurveyDesign. For example:

julia> using Survey
julia> apistrat = load_data("apistrat")
200×40 DataFrame
 Row │ Column1  cds             stype    name             sname                 ⋯
     │ Int64    Int64           String1  String15         String                ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │       1  19647336097927  E        Open Magnet: Ce  Open Magnet: Center   ⋯
   2 │       2  19647336016018  E        Belvedere Eleme  Belvedere Elementary
   3 │       3  19648816021505  E        Altadena Elemen  Altadena Elementary
   4 │       4  19647336019285  E        Soto Street Ele  Soto Street Elementa
   5 │       5  56739406115430  E        Walnut Canyon E  Walnut Canyon Elemen  ⋯
   6 │       6  56726036084917  E        Atherwood Eleme  Atherwood Elementary
   7 │       7  56726036055800  E        Township Elemen  Township Elementary
   8 │       8  15633216109078  E        Thorner (Dr. Ju  Thorner (Dr. Juliet)
  ⋮  │    ⋮           ⋮            ⋮            ⋮                  ⋮            ⋱
 194 │     194  19650526022933  E        Emperor Element  Emperor Elementary    ⋯
 195 │     195   1612426001572  E        Alvarado Elemen  Alvarado Elementary
 196 │     196  19647336018568  E        One Hundred Twe  One Hundred Twelfth
 197 │     197  33670333331600  H        Corona Senior H  Corona Senior High
 198 │     198   4755076003164  M        Sycamore Middle  Sycamore Middle       ⋯
 199 │     199  56724626055016  E        Larsen (Ansgar)  Larsen (Ansgar) Elem
 200 │     200  31669513134657  H        Lincoln High (C  Lincoln High (Char)
                                                 36 columns and 185 rows omitted
julia> dstrat = SurveyDesign(apistrat; strata=:stype, weights=:pw)
SurveyDesign:
data: 200×44 DataFrame
strata: stype
    [E, E, E  …  H]
cluster: none
popsize: [4420.9999, 4420.9999, 4420.9999  …  755.0]
sampsize: [100, 100, 100  …  50]
weights: [44.21, 44.21, 44.21  …  15.1]
allprobs: [0.0226, 0.0226, 0.0226  …  0.0662]
julia> jstrat = jackknifeweights(dstrat)

Unlike bootweights, jackknifeweights takes only the design and does not accept a replicates keyword.
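
For intuition, the textbook delete-one jackknife for an unstratified sample can be sketched in a few lines of plain Julia. This is an illustrative assumption about the general technique, not the package's implementation (which also has to handle strata and clusters); the helper names jk1_weights and weighted_mean are hypothetical.

```julia
# Hypothetical helper: textbook delete-one jackknife for an unstratified sample.
# Column i drops unit i (weight 0) and scales the remaining weights by n/(n-1).
function jk1_weights(w::AbstractVector{<:Real})
    n = length(w)
    repw = repeat(float.(w) .* (n / (n - 1)), 1, n)
    for i in 1:n
        repw[i, i] = 0.0
    end
    return repw
end

weighted_mean(y, w) = sum(y .* w) / sum(w)

y = [3.0, 5.0, 8.0, 4.0]
w = fill(2.0, 4)
repw = jk1_weights(w)

θ  = weighted_mean(y, w)                                       # full-sample estimate
θr = [weighted_mean(y, repw[:, i]) for i in 1:size(repw, 2)]   # one estimate per replicate
n  = length(y)
se_jk = sqrt((n - 1) / n * sum((θr .- θ) .^ 2))                # delete-one jackknife SE
```

With equal weights, this jackknife standard error of the mean coincides with the classical sample standard deviation divided by the square root of n.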

For each replicate, the DataFrame of ReplicateDesign has an additional column. The name of the column is replicate_ followed by the replicate number.

julia> names(bstrat.data)
54-element Vector{String}:
 "Column1"
 "cds"
 "stype"
 "name"
 "sname"
 "snum"
 "dname"
 "dnum"
 "cname"
 "cnum"
 ⋮
 "replicate_2"
 "replicate_3"
 "replicate_4"
 "replicate_5"
 "replicate_6"
 "replicate_7"
 "replicate_8"
 "replicate_9"
 "replicate_10"

The columns replicate_1 through replicate_10 are the replicate weight columns.
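
Because the replicate columns share a common prefix, they can be picked out by name. The sketch below works on a shortened, made-up list of column names rather than on the real design, so it runs without the package.

```julia
# Illustrative, shortened list of column names (not the full 54 columns).
cols = ["Column1", "cds", "stype", "pw", "replicate_1", "replicate_2", "replicate_3"]

# Keep only the names that start with the "replicate_" prefix.
repcols = filter(startswith("replicate_"), cols)
```

The same filter could be applied to names(bstrat.data) to list the replicate weight columns of a design.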

A SurveyDesign can be used to estimate a statistic. For example:

julia> mean(:api00, dstrat)
1×1 DataFrame
 Row │ mean
     │ Float64
─────┼─────────
   1 │ 662.287

The ReplicateDesign can be used to compute the standard error of the statistic. For example:

julia> mean(:api00, bstrat)
1×2 DataFrame
 Row │ mean     SE
     │ Float64  Float64
─────┼──────────────────
   1 │ 662.287  11.2546

For each replicate weight column, the statistic is recomputed using that column in place of the original weights. The standard deviation of these replicate estimates is the standard error of the estimate.
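
The computation described above can be sketched on a toy example. The helper replicate_se is hypothetical and the replicate weights are made up by hand; the sketch also assumes the usual sample standard deviation (dividing by the number of replicates minus one), which may differ from the exact scaling the package uses.

```julia
using Statistics

weighted_mean(y, w) = sum(y .* w) / sum(w)

# Hypothetical helper: point estimate from the full weights,
# standard error from the spread of the replicate estimates.
function replicate_se(y, w, repw)
    estimate = weighted_mean(y, w)
    reps = [weighted_mean(y, col) for col in eachcol(repw)]
    return (estimate = estimate, SE = std(reps))
end

y = [10.0, 20.0, 30.0]        # observed values
w = [1.0, 1.0, 2.0]           # original weights
repw = [2.0 1.0 0.0;          # each column is one made-up set of replicate weights
        0.0 1.0 2.0;
        2.0 2.0 2.0]

res = replicate_se(y, w, repw)   # (estimate = 22.5, SE = 2.5)
```

Here the three replicate means are 20, 22.5 and 25, so their standard deviation, 2.5, is reported as the standard error.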

References