Frequentist Regression Models
CRRao.FrequentistRegression — Type
FrequentistRegression{RegressionType}
Type representing frequentist regression models returned by fit functions; it is used internally by the package for all frequentist regression models. RegressionType is a Symbol representing the model class.
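For illustration, the model class a container wraps can be read off its type parameter. A minimal sketch (assuming the mtcars dataset used throughout this page; the printed type may carry additional detail depending on the package version):

```julia
using CRRao, RDatasets, StatsModels

# Fit any frequentist model; the container records the model class
# in the FrequentistRegression type parameter.
df = dataset("datasets", "mtcars")
container = fit(@formula(MPG ~ HP + WT + Gear), df, LinearRegression())

# The type parameter is the Symbol :LinearRegression.
typeof(container)
```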
Linear Regression
StatsAPI.fit — Method
fit(formula::FormulaTerm, data::DataFrame, modelClass::LinearRegression; kwargs...)
Fit an OLS Linear Regression model on the input data. Uses the lm method from the GLM package under the hood. Returns an object of type FrequentistRegression{:LinearRegression}. Supports the same keyword arguments as lm.
Example
julia> using CRRao, RDatasets, StatsPlots, StatsModels
julia> df = dataset("datasets", "mtcars")
32×12 DataFrame
Row │ Model MPG Cyl Disp HP DRat WT QSec VS AM Gear Carb
│ String31 Float64 Int64 Float64 Int64 Float64 Float64 Float64 Int64 Int64 Int64 Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ Mazda RX4 21.0 6 160.0 110 3.9 2.62 16.46 0 1 4 4
2 │ Mazda RX4 Wag 21.0 6 160.0 110 3.9 2.875 17.02 0 1 4 4
3 │ Datsun 710 22.8 4 108.0 93 3.85 2.32 18.61 1 1 4 1
4 │ Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5 │ Hornet Sportabout 18.7 8 360.0 175 3.15 3.44 17.02 0 0 3 2
6 │ Valiant 18.1 6 225.0 105 2.76 3.46 20.22 1 0 3 1
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
27 │ Porsche 914-2 26.0 4 120.3 91 4.43 2.14 16.7 0 1 5 2
28 │ Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
29 │ Ford Pantera L 15.8 8 351.0 264 4.22 3.17 14.5 0 1 5 4
30 │ Ferrari Dino 19.7 6 145.0 175 3.62 2.77 15.5 0 1 5 6
31 │ Maserati Bora 15.0 8 301.0 335 3.54 3.57 14.6 0 1 5 8
32 │ Volvo 142E 21.4 4 121.0 109 4.11 2.78 18.6 1 1 4 2
20 rows omitted
julia> container = fit(@formula(MPG ~ HP + WT + Gear), df, LinearRegression())
Model Class: Linear Regression
Likelihood Mode: Gauss
Link Function: Identity
Computing Method: Optimization
────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────────────────
(Intercept) 32.0137 4.63226 6.91 <1e-06 22.5249 41.5024
HP -0.0367861 0.00989146 -3.72 0.0009 -0.0570478 -0.0165243
WT -3.19781 0.846546 -3.78 0.0008 -4.93188 -1.46374
Gear 1.01998 0.851408 1.20 0.2410 -0.72405 2.76401
────────────────────────────────────────────────────────────────────────────
julia> coeftable(container)
────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────────────────
(Intercept) 32.0137 4.63226 6.91 <1e-06 22.5249 41.5024
HP -0.0367861 0.00989146 -3.72 0.0009 -0.0570478 -0.0165243
WT -3.19781 0.846546 -3.78 0.0008 -4.93188 -1.46374
Gear 1.01998 0.851408 1.20 0.2410 -0.72405 2.76401
────────────────────────────────────────────────────────────────────────────
julia> sigma(container)
2.5741691724978972
julia> aic(container)
157.05277871921942
julia> predict(container)
32-element Vector{Float64}:
23.668849952338718
22.85340824320634
25.253556140740894
20.746171762311384
17.635570543830177
20.14663845388644
14.644831040166633
23.61182872351372
⋮
16.340457241090512
27.47793682112109
26.922715039574857
28.11844900519874
17.264981908248554
21.818065399379595
13.374047477198516
23.193986311384343
julia> residuals(container)
32-element Vector{Float64}:
-2.668849952338718
-1.8534082432063386
-2.4535561407408935
0.6538282376886144
1.0644294561698224
-2.0466384538864375
-0.3448310401666319
0.7881712764862776
⋮
2.8595427589094875
-0.1779368211210901
-0.9227150395748573
2.2815509948012576
-1.4649819082485536
-2.1180653993795957
1.6259525228014837
-1.7939863113843444
julia> plot(cooksdistance(container))
StatsAPI.fit — Function
fit(formula::FormulaTerm, data::DataFrame, modelClass::LinearRegression, bootstrap::Boot_Residual, sim_size::Int64 = 1000)
Fit a Bootstrap Regression model on the input data. Uses the lm method from the GLM package under the hood. Returns an object of type DataFrame.
Example
julia> using CRRao, RDatasets, StableRNGs, StatsModels
julia> df = dataset("datasets", "mtcars")
32×12 DataFrame
Row │ Model MPG Cyl Disp HP DRat WT QSec VS AM Gear Carb
│ String31 Float64 Int64 Float64 Int64 Float64 Float64 Float64 Int64 Int64 Int64 Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ Mazda RX4 21.0 6 160.0 110 3.9 2.62 16.46 0 1 4 4
2 │ Mazda RX4 Wag 21.0 6 160.0 110 3.9 2.875 17.02 0 1 4 4
3 │ Datsun 710 22.8 4 108.0 93 3.85 2.32 18.61 1 1 4 1
4 │ Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5 │ Hornet Sportabout 18.7 8 360.0 175 3.15 3.44 17.02 0 0 3 2
6 │ Valiant 18.1 6 225.0 105 2.76 3.46 20.22 1 0 3 1
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
27 │ Porsche 914-2 26.0 4 120.3 91 4.43 2.14 16.7 0 1 5 2
28 │ Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
29 │ Ford Pantera L 15.8 8 351.0 264 4.22 3.17 14.5 0 1 5 4
30 │ Ferrari Dino 19.7 6 145.0 175 3.62 2.77 15.5 0 1 5 6
31 │ Maserati Bora 15.0 8 301.0 335 3.54 3.57 14.6 0 1 5 8
32 │ Volvo 142E 21.4 4 121.0 109 4.11 2.78 18.6 1 1 4 2
20 rows omitted
julia> CRRao.set_rng(StableRNG(123))
StableRNGs.LehmerRNG(state=0x000000000000000000000000000000f7)
julia> container = fit(@formula(MPG ~ HP + WT + Gear), df, LinearRegression(), Boot_Residual())
4×5 DataFrame
Row │ Predictor Coef Std Error Lower 5% Upper 95%
│ String Float64 Float64 Float64 Float64
─────┼─────────────────────────────────────────────────────────────
1 │ (Intercept) 32.1309 4.57528 24.8024 39.9568
2 │ HP -0.0364971 0.00962225 -0.0519917 -0.0201571
3 │ WT -3.22576 0.834607 -4.61517 -1.80358
4 │ Gear 1.00012 0.842335 -0.429382 2.35324
With bootstrap, the linear regression is fitted while the standard errors are estimated from bootstrap statistics.
CRRao.Boot_Residual — Type
Boot_Residual
Type representing the residual bootstrap.
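Per the fit signature above, the number of bootstrap replicates is controlled by the optional sim_size argument (default 1000). A sketch, reusing the mtcars example from this page:

```julia
using CRRao, RDatasets, StableRNGs, StatsModels

df = dataset("datasets", "mtcars")

# Fix the RNG so the bootstrap draws are reproducible.
CRRao.set_rng(StableRNG(123))

# Residual bootstrap with 500 replicates instead of the default 1000.
boot_df = fit(@formula(MPG ~ HP + WT + Gear), df, LinearRegression(), Boot_Residual(), 500)
```

The returned DataFrame has one row per predictor, with bootstrap estimates of the coefficient, standard error, and interval bounds, as in the example above.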
Breusch-Pagan Lagrange Multiplier test for heteroscedasticity
The Breusch-Pagan test checks the homoscedasticity assumption on the residual variance in linear regression.
CRRao.BPTest — Method
BPTest(container::FrequentistRegression, data::DataFrame)
Perform the Breusch-Pagan test. This test only works with linear regression.
Example
using CRRao, RDatasets, StatsModels
# Get the dataset
mtcars = dataset("datasets", "mtcars")
# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())
# Get the BPTest
BPTest(container, mtcars)
Logistic Regression
StatsAPI.fit — Method
fit(formula::FormulaTerm, data::DataFrame, modelClass::LogisticRegression, Link::Logit; kwargs...)
Fit a Logistic Regression model on the input data using the Logit link. Uses the glm method from the GLM package under the hood. Returns an object of type FrequentistRegression{:LogisticRegression}. Supports the same keyword arguments as glm.
Example
julia> using CRRao, RDatasets, StatsModels
julia> turnout = dataset("Zelig", "turnout")
2000×5 DataFrame
Row │ Race Age Educate Income Vote
│ Cat… Int32 Float64 Float64 Int32
──────┼───────────────────────────────────────
1 │ white 60 14.0 3.3458 1
2 │ white 51 10.0 1.8561 0
3 │ white 24 12.0 0.6304 0
4 │ white 38 8.0 3.4183 1
5 │ white 25 12.0 2.7852 1
6 │ white 67 12.0 2.3866 1
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
1995 │ white 22 7.0 0.2364 0
1996 │ white 26 16.0 3.3834 0
1997 │ white 34 12.0 2.917 1
1998 │ white 51 16.0 7.8949 1
1999 │ white 22 10.0 2.4811 0
2000 │ white 59 10.0 0.5523 0
1988 rows omitted
julia> container = fit(@formula(Vote ~ Age + Race + Income + Educate), turnout, LogisticRegression(), Logit())
Model Class: Logistic Regression
Likelihood Mode: Binomial
Link Function: Identity
Computing Method: Optimization
────────────────────────────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────────────────
(Intercept) -3.03426 0.325927 -9.31 <1e-19 -3.67307 -2.39546
Age 0.0283543 0.00346034 8.19 <1e-15 0.0215722 0.0351365
Race: white 0.250798 0.146457 1.71 0.0868 -0.0362521 0.537847
Income 0.177112 0.0271516 6.52 <1e-10 0.123896 0.230328
Educate 0.175634 0.0203308 8.64 <1e-17 0.135786 0.215481
────────────────────────────────────────────────────────────────────────────
julia> coeftable(container)
────────────────────────────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────────────────
(Intercept) -3.03426 0.325927 -9.31 <1e-19 -3.67307 -2.39546
Age 0.0283543 0.00346034 8.19 <1e-15 0.0215722 0.0351365
Race: white 0.250798 0.146457 1.71 0.0868 -0.0362521 0.537847
Income 0.177112 0.0271516 6.52 <1e-10 0.123896 0.230328
Educate 0.175634 0.0203308 8.64 <1e-17 0.135786 0.215481
────────────────────────────────────────────────────────────────────────────
julia> loglikelihood(container)
-1011.9906318515575
julia> aic(container)
2033.981263703115
julia> bic(container)
2061.9857760008254
StatsAPI.fit — Method
fit(formula::FormulaTerm, data::DataFrame, modelClass::LogisticRegression, Link::Probit; kwargs...)
Fit a Logistic Regression model on the input data using the Probit link. Uses the glm method from the GLM package under the hood. Returns an object of type FrequentistRegression{:LogisticRegression}. Supports the same keyword arguments as glm.
StatsAPI.fit — Method
fit(formula::FormulaTerm, data::DataFrame, modelClass::LogisticRegression, Link::Cloglog; kwargs...)
Fit a Logistic Regression model on the input data using the Cloglog (complementary log-log) link. Uses the glm method from the GLM package under the hood. Returns an object of type FrequentistRegression{:LogisticRegression}. Supports the same keyword arguments as glm.
StatsAPI.fit — Method
fit(formula::FormulaTerm, data::DataFrame, modelClass::LogisticRegression, Link::Cauchit; kwargs...)
Fit a Logistic Regression model on the input data using the Cauchit link. Uses the glm method from the GLM package under the hood. Returns an object of type FrequentistRegression{:LogisticRegression}. Supports the same keyword arguments as glm.
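The alternative links are invoked exactly like Logit. A sketch using the turnout data from the Logit example above (coefficient estimates will differ somewhat across links, since each assumes a different latent error distribution):

```julia
using CRRao, RDatasets, StatsModels

turnout = dataset("Zelig", "turnout")
f = @formula(Vote ~ Age + Race + Income + Educate)

# Same model class, different link functions.
probit_fit  = fit(f, turnout, LogisticRegression(), Probit())
cloglog_fit = fit(f, turnout, LogisticRegression(), Cloglog())
cauchit_fit = fit(f, turnout, LogisticRegression(), Cauchit())

# The fitted links can be compared by an information criterion, e.g.:
aic(probit_fit)
```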
Negative Binomial Regression
StatsAPI.fit — Method
fit(formula::FormulaTerm, data::DataFrame, modelClass::NegBinomRegression; kwargs...)
Fit a Negative Binomial Regression model on the input data (with the default link function being the Log link). Uses the negbin method from the GLM package under the hood. Returns an object of type FrequentistRegression{:NegativeBinomialRegression}. Supports the same keyword arguments as negbin.
Example
julia> using CRRao, RDatasets, StatsModels
julia> sanction = dataset("Zelig", "sanction")
78×8 DataFrame
Row │ Mil Coop Target Import Export Cost Num NCost
│ Int32 Int32 Int32 Int32 Int32 Int32 Int32 Cat…
─────┼───────────────────────────────────────────────────────────────────
1 │ 1 4 3 1 1 4 15 major loss
2 │ 0 2 3 0 1 3 4 modest loss
3 │ 0 1 3 1 0 2 1 little effect
4 │ 1 1 3 1 1 2 1 little effect
5 │ 0 1 3 1 1 2 1 little effect
6 │ 0 1 3 0 1 2 1 little effect
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
73 │ 1 3 1 1 1 2 14 little effect
74 │ 0 2 1 0 0 1 2 net gain
75 │ 0 1 3 0 1 2 1 little effect
76 │ 0 4 3 1 0 2 13 little effect
77 │ 0 1 2 0 0 1 1 net gain
78 │ 1 3 1 1 1 2 10 little effect
66 rows omitted
julia> container = fit(@formula(Num ~ Target + Coop + NCost), sanction, NegBinomRegression())
Model Class: Count Regression
Likelihood Mode: Negative Binomial
Link Function: Log
Computing Method: Optimization
──────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|) Lower 95% Upper 95%
──────────────────────────────────────────────────────────────────────────────────
(Intercept) -1.14517 0.480887 -2.38 0.0172 -2.0877 -0.202652
Target 0.00862527 0.145257 0.06 0.9527 -0.276074 0.293324
Coop 1.06397 0.115995 9.17 <1e-19 0.836621 1.29131
NCost: major loss -0.23511 0.511443 -0.46 0.6457 -1.23752 0.7673
NCost: modest loss 1.30767 0.276012 4.74 <1e-05 0.766698 1.84865
NCost: net gain 0.183453 0.275387 0.67 0.5053 -0.356296 0.723202
──────────────────────────────────────────────────────────────────────────────────
Poisson Regression
StatsAPI.fit — Method
fit(formula::FormulaTerm, data::DataFrame, modelClass::PoissonRegression; kwargs...)
Fit a Poisson Regression model on the input data (with the default link function being the Log link). Uses the glm method from the GLM package under the hood. Returns an object of type FrequentistRegression{:PoissonRegression}. Supports the same keyword arguments as glm.
Example
julia> using CRRao, RDatasets, StatsModels
julia> sanction = dataset("Zelig", "sanction")
78×8 DataFrame
Row │ Mil Coop Target Import Export Cost Num NCost
│ Int32 Int32 Int32 Int32 Int32 Int32 Int32 Cat…
─────┼───────────────────────────────────────────────────────────────────
1 │ 1 4 3 1 1 4 15 major loss
2 │ 0 2 3 0 1 3 4 modest loss
3 │ 0 1 3 1 0 2 1 little effect
4 │ 1 1 3 1 1 2 1 little effect
5 │ 0 1 3 1 1 2 1 little effect
6 │ 0 1 3 0 1 2 1 little effect
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
73 │ 1 3 1 1 1 2 14 little effect
74 │ 0 2 1 0 0 1 2 net gain
75 │ 0 1 3 0 1 2 1 little effect
76 │ 0 4 3 1 0 2 13 little effect
77 │ 0 1 2 0 0 1 1 net gain
78 │ 1 3 1 1 1 2 10 little effect
66 rows omitted
julia> container = fit(@formula(Num ~ Target + Coop + NCost), sanction, PoissonRegression())
Model Class: Poisson Regression
Likelihood Mode: Poisson
Link Function: Log
Computing Method: Optimization
─────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|) Lower 95% Upper 95%
─────────────────────────────────────────────────────────────────────────────────
(Intercept) -1.91392 0.261667 -7.31 <1e-12 -2.42678 -1.40106
Target 0.157769 0.0653822 2.41 0.0158 0.0296218 0.285915
Coop 1.15127 0.0561861 20.49 <1e-92 1.04114 1.26139
NCost: major loss -0.324051 0.230055 -1.41 0.1590 -0.774951 0.126848
NCost: modest loss 1.71973 0.100518 17.11 <1e-64 1.52272 1.91674
NCost: net gain 0.463907 0.16992 2.73 0.0063 0.13087 0.796944
─────────────────────────────────────────────────────────────────────────────────
Extended functions from StatsAPI.jl
StatsAPI.coef — Method
coef(container::FrequentistRegression)
Estimated coefficients of the model. Extends the coef method from StatsAPI.jl.
Example
using CRRao, RDatasets, StatsModels
# Get the dataset
mtcars = dataset("datasets", "mtcars")
# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())
# Get table of coefficients
coef(container)
StatsAPI.coeftable — Method
coeftable(container::FrequentistRegression)
Table of coefficients and other statistics of the model. Extends the coeftable method from StatsAPI.jl.
Example
using CRRao, RDatasets, StatsModels
# Get the dataset
mtcars = dataset("datasets", "mtcars")
# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())
# Get table of coefficients
coeftable(container)
StatsAPI.r2 — Method
r2(container::FrequentistRegression)
Coefficient of determination. Extends the r2 method from StatsAPI.jl.
Example
using CRRao, RDatasets, StatsModels
# Get the dataset
mtcars = dataset("datasets", "mtcars")
# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())
# Get r2
r2(container)
StatsAPI.adjr2 — Method
adjr2(container::FrequentistRegression)
Adjusted coefficient of determination. Extends the adjr2 method from StatsAPI.jl.
Example
using CRRao, RDatasets, StatsModels
# Get the dataset
mtcars = dataset("datasets", "mtcars")
# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())
# Get adjr2
adjr2(container)
StatsAPI.loglikelihood — Method
loglikelihood(container::FrequentistRegression)
Log-likelihood of the model. Extends the loglikelihood method from StatsAPI.jl.
Example
using CRRao, RDatasets, StatsModels
# Get the dataset
mtcars = dataset("datasets", "mtcars")
# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())
# Get loglikelihood
loglikelihood(container)
StatsAPI.aic — Method
aic(container::FrequentistRegression)
Akaike's Information Criterion. Extends the aic method from StatsAPI.jl.
Example
using CRRao, RDatasets, StatsModels
# Get the dataset
mtcars = dataset("datasets", "mtcars")
# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())
# Get aic
aic(container)
StatsAPI.bic — Method
bic(container::FrequentistRegression)
Bayesian Information Criterion. Extends the bic method from StatsAPI.jl.
Example
using CRRao, RDatasets, StatsModels
# Get the dataset
mtcars = dataset("datasets", "mtcars")
# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())
# Get bic
bic(container)
CRRao.sigma — Method
sigma(container::FrequentistRegression)
Compute the residual standard error of the model.
Example
using CRRao, RDatasets, StatsModels
# Get the dataset
mtcars = dataset("datasets", "mtcars")
# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())
# Get sigma
sigma(container)
StatsAPI.predict — Method
predict(container::FrequentistRegression)
Predicted response of the model. Extends the predict method from StatsAPI.jl.
Example
using CRRao, RDatasets, StatsModels
# Get the dataset
mtcars = dataset("datasets", "mtcars")
# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())
# Get predicted response
predict(container)
StatsAPI.residuals — Method
residuals(container::FrequentistRegression)
Residuals of the model. Extends the residuals method from StatsAPI.jl.
Example
using CRRao, RDatasets, StatsModels
# Get the dataset
mtcars = dataset("datasets", "mtcars")
# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())
# Get residuals
residuals(container)
StatsAPI.cooksdistance — Method
cooksdistance(container::FrequentistRegression)
Compute Cook's distance for each observation in a linear model. Extends the cooksdistance method from StatsAPI.jl.
Example
using CRRao, RDatasets, StatsModels
# Get the dataset
mtcars = dataset("datasets", "mtcars")
# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())
# Get vector of Cook's distances
cooksdistance(container)