General Interface
Understanding the interface
CRRao exports the fit function, which is used to train all types of models supported by the package. The function currently supports the following signatures.
fit(formula, data, modelClass)
fit(formula, data, modelClass, link)
fit(formula, data, modelClass, prior)
fit(formula, data, modelClass, link, prior)
Not every model class supports every signature. The parameters have the following meanings.
- formula must be a formula of type StatsModels.FormulaTerm. Every formula has an LHS and an RHS. The LHS represents the response variable, and the RHS represents the independent variables.
- data must be a DataFrame. It is the dataset on which the model is trained.
- modelClass represents the type of the statistical model to be used. Currently, CRRao supports four regression models, and the type of modelClass must be one of the following: LinearRegression, LogisticRegression, NegBinomRegression, PoissonRegression.
- link represents the link function, which certain model classes (like Logistic Regression) support. Currently four link functions are supported, and the type of link must be one of the following: Logit, Probit, Cloglog, Cauchit.
- prior represents the prior distribution. CRRao also supports Bayesian models, and the prior to be used can be specified while calling fit. Currently CRRao supports six kinds of priors, and the type of prior must be one of the following: Prior_Gauss, Prior_Ridge, Prior_Laplace, Prior_Cauchy, Prior_TDist, Prior_HorseShoe.
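As a rough illustration of the four signatures, the sketch below calls fit once per form. It assumes the RDatasets package is installed and uses its mtcars dataset (columns MPG, HP, WT, Gear, AM); the dataset, the formulas and the variable names are choices made for this example, not part of CRRao. The two calls that take a prior draw posterior samples and can take noticeably longer than the others.

```julia
using CRRao, RDatasets, StatsModels

# Example data (an assumption of this sketch): mtcars from RDatasets.
mtcars = dataset("datasets", "mtcars")

# fit(formula, data, modelClass)
lm_fit = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())

# fit(formula, data, modelClass, link); AM is a 0/1 response
logit_fit = fit(@formula(AM ~ HP + WT), mtcars, LogisticRegression(), Logit())

# fit(formula, data, modelClass, prior)
ridge_fit = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression(), Prior_Ridge())

# fit(formula, data, modelClass, link, prior)
bayes_logit_fit = fit(@formula(AM ~ HP + WT), mtcars, LogisticRegression(), Logit(), Prior_Ridge())
```

Which combinations are accepted depends on the model class, so a MethodError on one of these calls most likely means that signature is not defined for the chosen class.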
Model Classes and Data Models
CRRao.LinearRegression — Type
LinearRegression
Type representing the Linear Regression model class.
\[y =\alpha + X \beta+ \varepsilon,\]
where
\[\varepsilon \sim N(0,\sigma^2),\]
- $y$ is the response vector of size $n$,
- $X$ is the matrix of predictor variables of size $n \times p$,
- $n$ is the sample size, and $p$ is the number of predictors,
- $\alpha$ is the intercept of the model,
- $\beta$ is the vector of regression coefficients of the model, and
- $\sigma$ is the standard deviation of the noise $\varepsilon$.
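To make the data model concrete, here is a minimal simulation of $y = \alpha + X\beta + \varepsilon$ using only Julia's standard library. The parameter values, the seed and the least-squares check at the end are illustrative assumptions; this is not how CRRao fits the model internally.

```julia
using Random, LinearAlgebra

rng = MersenneTwister(1234)        # seeded so the sketch is reproducible

n, p  = 200, 3                     # sample size and number of predictors
alpha = 1.5                        # intercept α (illustrative value)
beta  = [2.0, -1.0, 0.5]           # regression coefficients β, length p
sigma = 0.3                        # noise standard deviation σ

X     = randn(rng, n, p)           # n × p matrix of predictors
noise = sigma .* randn(rng, n)     # ε ~ N(0, σ²)
y     = alpha .+ X * beta .+ noise # response vector of size n

# Least-squares estimate of (α, β) from the design matrix [1 X].
coef_hat = hcat(ones(n), X) \ y
```

With this much data and this little noise, coef_hat should land close to (alpha, beta...).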
CRRao.LogisticRegression — Type
LogisticRegression
Type representing the Logistic Regression model class.
\[y_i \sim Bernoulli(p_i), \]
where $i=1,2,\cdots,n, 0 < p_i < 1$,
- $\mathbb{E}(y_i)=p_i$,
- $\mathbb{P}(y_i=1) = p_i$ and $\mathbb{P}(y_i=0) = 1-p_i$, such that
\[\mathbb{E}(y_i)= p_i =g(\alpha +\mathbf{x}_i^T\beta),\]
- $g(.)$ is the link-function,
- $y_i$ is the $i^{th}$ element of the response vector $y$,
- $\mathbf{x}_i=(x_{i1},x_{i2},\cdots,x_{ip})$ is the $i^{th}$ row of the design matrix of size $n \times p$,
- $\alpha$ is the intercept of the model, and
- $\beta$ is the vector of regression coefficients of the model.
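The Bernoulli data model can be sketched in the same way. The example below takes $g$ to be the Logit function $z\mapsto 1/(1+\exp(-z))$ documented on this page and draws each $y_i$ by comparing a uniform random number with $p_i$. All numeric values and names are assumptions made for illustration.

```julia
using Random

rng = MersenneTwister(42)

logistic(z) = 1 / (1 + exp(-z))      # g(z) for the Logit link

n, p  = 500, 2
alpha = -0.5                         # intercept α (illustrative value)
beta  = [1.0, -2.0]                  # regression coefficients β

X    = randn(rng, n, p)              # n × p design matrix
eta  = alpha .+ X * beta             # linear predictor α + xᵢᵀβ
prob = logistic.(eta)                # pᵢ = g(α + xᵢᵀβ)
y    = Int.(rand(rng, n) .< prob)    # yᵢ ~ Bernoulli(pᵢ), coded as 0/1
```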
CRRao.NegBinomRegression — Type
NegBinomRegression
Type representing the Negative Binomial Regression model class.
\[y_i \sim NegativeBinomial(\mu_i,\phi), i=1,2,\cdots,n\]
where
\[\mu_i = \exp(\alpha +\mathbf{x}_i^T\beta),\]
- $y_i$ is the $i^{th}$ element of the response vector $y$,
- $\mathbf{x}_i=(x_{i1},x_{i2},\cdots,x_{ip})$ is the $i^{th}$ row of the design matrix of size $n \times p$,
- $\alpha$ is the intercept of the model, and
- $\beta$ is the vector of regression coefficients of the model.
CRRao.PoissonRegression — Type
PoissonRegression
Type representing the Poisson Regression model class.
\[y_i \sim Poisson(\lambda_i), i=1,2,\cdots,n\]
where
\[\lambda_i = \exp(\alpha +\mathbf{x}_i^T\beta),\]
- $y_i$ is the $i^{th}$ element of the response vector $y$,
- $\mathbf{x}_i=(x_{i1},x_{i2},\cdots,x_{ip})$ is the $i^{th}$ row of the design matrix of size $n \times p$,
- $\alpha$ is the intercept of the model, and
- $\beta$ is the vector of regression coefficients of the model.
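Both count models share the same mean function, $\exp(\alpha +\mathbf{x}_i^T\beta)$, which is always positive. The sketch below computes $\lambda_i$ for simulated predictors and draws Poisson counts with a small hand-written sampler (Knuth's multiplication method), so that no package beyond the standard library is needed. The helper rand_poisson and all numeric values are hypothetical and exist only for this example.

```julia
using Random, Statistics

# One Poisson(λ) draw via Knuth's multiplication method; adequate for small λ.
function rand_poisson(rng, lambda)
    limit = exp(-lambda)
    k, acc = 0, rand(rng)
    while acc > limit
        k += 1
        acc *= rand(rng)
    end
    return k
end

rng = MersenneTwister(7)

n, p  = 300, 2
alpha = 0.2                                  # intercept α (illustrative value)
beta  = [0.4, -0.3]                          # regression coefficients β

X      = randn(rng, n, p)                    # n × p design matrix
lambda = exp.(alpha .+ X * beta)             # λᵢ = exp(α + xᵢᵀβ) > 0
y      = [rand_poisson(rng, l) for l in lambda]   # yᵢ ~ Poisson(λᵢ)
```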
Link Functions
CRRao.CRRaoLink — Type
CRRaoLink
Abstract type representing link functions which are used to dispatch to appropriate calls.
CRRao.Logit — Type
Logit <: CRRaoLink
A type representing the Logit link function, which is defined by the formula
\[z\mapsto \dfrac{1}{1 + \exp(-z)}\]
CRRao.Probit — Type
Probit <: CRRaoLink
A type representing the Probit link function, which is defined by the formula
\[z\mapsto \mathbb{P}[Z\le z]\]
where $Z\sim \text{Normal}(0, 1)$.
CRRao.Cloglog — Type
Cloglog <: CRRaoLink
A type representing the Cloglog link function, which is defined by the formula
\[z\mapsto 1 - \exp(-\exp(z))\]
CRRao.Cauchit — Type
Cauchit <: CRRaoLink
A type representing the Cauchit link function, which is defined by the formula
\[z\mapsto \dfrac{1}{2} + \dfrac{\text{atan}(z)}{\pi}\]
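The four formulas above each map a real-valued linear predictor $z$ to a probability in $(0, 1)$. As a quick way to compare them, the sketch below writes each one as a plain Julia function. The function names are hypothetical and are not CRRao's internals. The Probit function is approximated here with Simpson's rule to avoid extra dependencies; a statistics package that provides the normal CDF would normally be used instead.

```julia
# Logit: z ↦ 1 / (1 + exp(-z))
logit_link(z) = 1 / (1 + exp(-z))

# Cloglog: z ↦ 1 - exp(-exp(z))
cloglog_link(z) = 1 - exp(-exp(z))

# Cauchit: z ↦ 1/2 + atan(z) / π
cauchit_link(z) = 1/2 + atan(z) / pi

# Probit: z ↦ P[Z ≤ z] = 1/2 + ∫₀ᶻ φ(t) dt for Z ~ Normal(0, 1),
# approximated with composite Simpson's rule on m (even) subintervals.
function probit_link(z; m = 200)
    phi(t) = exp(-t^2 / 2) / sqrt(2pi)
    h = z / m
    s = phi(0.0) + phi(z)
    for k in 1:(m - 1)
        s += (isodd(k) ? 4 : 2) * phi(k * h)
    end
    return 0.5 + s * h / 3
end

# Evaluate all four on a small grid of linear-predictor values.
zs = -2.0:1.0:2.0
table = [(z, logit_link(z), probit_link(z), cloglog_link(z), cauchit_link(z)) for z in zs]
```

Logit, Probit and Cauchit are symmetric around $z = 0$, where they all equal $1/2$; Cloglog is asymmetric.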
Prior Distributions
CRRao.Prior_Gauss — Type
Prior_Gauss
Type representing the Gaussian Prior. Users can specify the prior mean and standard deviation of $\alpha$ and $\beta$ for the linear regression model.
Prior model
\[\sigma \sim InverseGamma(a_0,b_0),\]
\[\alpha | \sigma \sim Normal(\alpha_0,\sigma_{\alpha_0}),\]
\[\beta | \sigma \sim Normal_p(\beta_0,\sigma_{\beta_0}),\]
Likelihood or data model
\[\mu_i= \alpha + \mathbf{x}_i^T\beta\]
\[y_i \sim N(\mu_i,\sigma),\]
Note: $N()$ is the Gaussian distribution of $y_i$, where
- $\mathbb{E}(y_i)=g(\mu_i)$, and
- $Var(y_i)=\sigma^2$.
CRRao.Prior_Ridge — Type
Prior_Ridge
Type representing the Ridge Prior.
Prior model
\[v \sim InverseGamma(h,h),\]
\[\sigma \sim InverseGamma(a_0,b_0),\]
\[\alpha | \sigma,v \sim Normal(0,v*\sigma),\]
\[\beta | \sigma,v \sim Normal_p(0,v*\sigma),\]
Likelihood or data model
\[\mu_i= \alpha + \mathbf{x}_i^T\beta\]
\[y_i \sim D(\mu_i,\sigma),\]
Note: $D()$ is the appropriate distribution of $y_i$ based on the modelClass, where
- $\mathbb{E}(y_i)=g(\mu_i)$, and
- $Var(y_i)=\sigma^2$.
CRRao.Prior_Laplace — Type
Prior_Laplace
Type representing the Laplace Prior.
Prior model
\[v \sim InverseGamma(h,h),\]
\[\sigma \sim InverseGamma(a_0,b_0),\]
\[\alpha | \sigma,v \sim Laplace(0,v*\sigma),\]
\[\beta | \sigma,v \sim Laplace(0,v*\sigma),\]
Likelihood or data model
\[\mu_i= \alpha + \mathbf{x}_i^T\beta\]
\[y_i \sim D(\mu_i,\sigma),\]
Note: $D()$ is the appropriate distribution of $y_i$ based on the modelClass, where
- $\mathbb{E}(y_i)=g(\mu_i)$, and
- $Var(y_i)=\sigma^2$.
CRRao.Prior_Cauchy — Type
Prior_Cauchy
Type representing the Cauchy Prior.
Prior model
\[\sigma \sim HalfCauchy(0,1),\]
\[\alpha | \sigma \sim Cauchy(0,\sigma),\]
\[\beta | \sigma \sim Cauchy(0,\sigma),\]
Likelihood or data model
\[\mu_i= \alpha + \mathbf{x}_i^T\beta\]
\[y_i \sim D(\mu_i,\sigma),\]
Note: $D()$ is the appropriate distribution of $y_i$ based on the modelClass, where
- $\mathbb{E}(y_i)=g(\mu_i)$, and
- $Var(y_i)=\sigma^2$.
CRRao.Prior_TDist — Type
Prior_TDist
Type representing the T-Distributed Prior.
Prior model
\[v \sim InverseGamma(h,h),\]
\[\sigma \sim InverseGamma(a_0,b_0),\]
\[\alpha | \sigma,v \sim \sigma t(v),\]
\[\beta | \sigma,v \sim \sigma t(v),\]
Likelihood or data model
\[\mu_i= \alpha + \mathbf{x}_i^T\beta\]
\[y_i \sim D(\mu_i,\sigma),\]
Note: $D()$ is the appropriate distribution of $y_i$ based on the modelClass, where
- $\mathbb{E}(y_i)=g(\mu_i)$,
- $Var(y_i)=\sigma^2$, and
- $t(v)$ is the $t$ distribution with $v$ degrees of freedom.
CRRao.Prior_HorseShoe — Type
Prior_HorseShoe
Type representing the HorseShoe Prior.
Prior model
\[\tau \sim HalfCauchy(0,1),\]
\[\lambda_j \sim HalfCauchy(0,1), j=1,2,\cdots,p\]
\[\sigma \sim HalfCauchy(0,1),\]
\[\alpha | \sigma,\tau \sim Normal(0,\tau *\sigma),\]
\[\beta_j | \sigma,\lambda_j ,\tau \sim Normal(0,\lambda_j *\tau *\sigma),\]
Likelihood or data model
\[\mu_i= \alpha + \mathbf{x}_i^T\beta\]
\[y_i \sim D(\mu_i,\sigma), i=1,2,\cdots,n\]
Note: $D()$ is the appropriate distribution of $y_i$ based on the modelClass, where
- $\mathbb{E}(y_i)=g(\mu_i)$,
- $Var(y_i)=\sigma^2$, and
- $\beta=(\beta_1,\beta_2,\cdots,\beta_p)$.
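To get a feel for the hierarchy in the HorseShoe prior model, the sketch below draws one set of parameters from it using only the standard library. It assumes that the second argument of Normal is a standard deviation and that HalfCauchy(0,1) is the absolute value of a standard Cauchy variable. The helper rand_halfcauchy and the value of p are hypothetical. This samples from the prior only; it is not CRRao's posterior sampler.

```julia
using Random

rng = MersenneTwister(2024)

# HalfCauchy(0, 1) draw via the inverse CDF: |tan(π(U - 1/2))|, U ~ Uniform(0, 1).
rand_halfcauchy(rng) = abs(tan(pi * (rand(rng) - 0.5)))

p = 5                                            # number of predictors (illustrative)

tau    = rand_halfcauchy(rng)                    # global shrinkage τ
lambda = [rand_halfcauchy(rng) for _ in 1:p]     # local shrinkage λ_j, j = 1, ..., p
sigma  = rand_halfcauchy(rng)                    # noise scale σ

alpha = tau * sigma * randn(rng)                 # α | σ, τ ~ Normal(0, τσ)
beta  = lambda .* tau .* sigma .* randn(rng, p)  # β_j | σ, λ_j, τ ~ Normal(0, λ_j τ σ)
```

Each coefficient gets its own scale $\lambda_j \tau \sigma$, so a small $\tau$ pulls all coefficients toward zero while a large $\lambda_j$ lets an individual $\beta_j$ escape that shrinkage.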
Setting Random Number Generators
CRRao.set_rng — Function
set_rng(rng)
Set the random number generator. This is useful if you want to work with reproducible results. rng must be a random number generator.
Example
using CRRao, StableRNGs
# Seed the generator used by CRRao so that repeated runs give the same results.
CRRao.set_rng(StableRNG(1234))
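The reason a fixed generator gives reproducible results can be seen without CRRao at all: two generators created with the same seed produce the same stream of numbers. The sketch below uses MersenneTwister from the standard library as a stand-in, purely to avoid an extra dependency; the example above uses StableRNG instead, whose streams are designed to stay the same across Julia versions.

```julia
using Random

draws_a = rand(MersenneTwister(1234), 5)   # five uniform draws from seed 1234
draws_b = rand(MersenneTwister(1234), 5)   # same seed, so the same five draws
draws_c = rand(MersenneTwister(4321), 5)   # different seed, so different draws

same_seed_matches = draws_a == draws_b
other_seed_differs = draws_a != draws_c
```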