Data Analysis - R code

1. Population and Sampling Distributions
Distribution test
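A distribution test can be illustrated by simulating a sampling distribution. The sketch below draws repeated samples, computes their means, and uses a one-sample Kolmogorov-Smirnov test (`ks.test`) to compare those means with the normal distribution that the central limit theorem predicts. The exponential population and both sample sizes are assumptions chosen only for illustration.

```r
set.seed(1)
n_samples   <- 1000  # number of repeated samples (assumed)
sample_size <- 30    # observations per sample (assumed)

# Means of repeated samples from an exponential population (mean 1, sd 1)
sample_means <- replicate(n_samples, mean(rexp(sample_size, rate = 1)))

# CLT: the sample mean is approximately N(1, 1 / sqrt(sample_size))
clt_sd <- 1 / sqrt(sample_size)

# One-sample Kolmogorov-Smirnov test against that normal distribution
ks <- ks.test(sample_means, "pnorm", mean = 1, sd = clt_sd)
print(ks)

# Histogram of the simulated means with the CLT density overlaid
hist(sample_means, breaks = 30, freq = FALSE,
     main = "Sampling distribution of the mean")
curve(dnorm(x, mean = 1, sd = clt_sd), add = TRUE)
```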
2. Bivariate Regression Analysis
Normal Quantile-Quantile (Q-Q) Plot
car::qqPlot( )
car::scatterplot( )
shapiro.test( )
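A base-R sketch of the same checks on simulated data, which are assumed for illustration. `qqnorm()` and `qqline()` draw the plain normal Q-Q plot that `car::qqPlot()` extends with a confidence envelope. `shapiro.test()` tests the null hypothesis that the residuals are normally distributed.

```r
set.seed(2)
x <- runif(100, 0, 10)
y <- 3 + 0.5 * x + rnorm(100)  # simulated linear data (assumed)
fit <- lm(y ~ x)
res <- residuals(fit)

# Normal Q-Q plot of the residuals with a reference line through the quartiles
qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res)

# Shapiro-Wilk test: H0 = residuals come from a normal distribution
sw <- shapiro.test(res)
print(sw)
```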
Box-Cox Transformation
car::powerTransform( )
car::bcPower( )
Gamma: shift (start) parameter, needed when the data contain zero or negative values
Lambda: power parameter used by every transformation in the family
car::symbox( )
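The Box-Cox family is (y^lambda - 1) / lambda, with log(y) at lambda = 0. The sketch below assumes an intercept-only model and picks lambda by maximizing the profile log-likelihood over a grid. `car::powerTransform()` makes a maximum-likelihood estimate of the same kind, also for regression models. `bc_transform()` and `bc_loglik()` are hypothetical helpers, and `gamma` is the start added so that non-positive values become positive before transforming.

```r
# Box-Cox transform; gamma is a start added before transforming (hypothetical helper)
bc_transform <- function(y, lambda, gamma = 0) {
  y <- y + gamma
  stopifnot(all(y > 0))                 # Box-Cox requires positive values
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda for an intercept-only normal model
bc_loglik <- function(y, lambda, gamma = 0) {
  z <- bc_transform(y, lambda, gamma)
  n <- length(y)
  sigma2 <- mean((z - mean(z))^2)       # ML estimate of the variance
  # Second term is the log-Jacobian of the transformation
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(y + gamma))
}

set.seed(3)
y <- rlnorm(500, meanlog = 1, sdlog = 1)  # log-normal data: lambda near 0 expected

lambdas <- seq(-2, 2, by = 0.01)
ll <- sapply(lambdas, function(l) bc_loglik(y, l))
lambda_hat <- lambdas[which.max(ll)]
lambda_hat

# A start of gamma = 2 lets negative and zero values be transformed
bc_transform(c(-1, 0, 2), lambda = 0.5, gamma = 2)
```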
Confidence interval
Regression Line
Points
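These three plot elements can be sketched in base R on simulated data, assumed for illustration: the observed points, the fitted `lm()` line, and a 95% confidence band for the mean response from `predict(..., interval = "confidence")`.

```r
set.seed(4)
x <- runif(50, 0, 10)
y <- 2 + 1.5 * x + rnorm(50, sd = 2)  # simulated data (assumed)
fit <- lm(y ~ x)

# Confidence limits for the mean response over a grid of x values
grid <- data.frame(x = seq(min(x), max(x), length.out = 100))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

plot(x, y, pch = 19, col = "grey40")          # points
abline(fit, lwd = 2)                          # fitted regression line
lines(grid$x, band[, "lwr"], lty = 2)         # lower 95% confidence limit
lines(grid$x, band[, "upr"], lty = 2)         # upper 95% confidence limit
```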
3. Multiple Regression Analysis
car::scatterplotMatrix( )
Conditional Effect Plots
effects::allEffects( )
plot(effects::Effect( ))
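An effect plot shows predictions as one predictor varies while the others are held at typical values. The sketch below reproduces the idea with `predict()` on the built-in `mtcars` data, holding `hp` at its mean. `effects::Effect()` automates this; the model here is only an illustration.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Vary wt over its range, hold hp at its mean
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 50)
newdat <- data.frame(wt = wt_grid, hp = mean(mtcars$hp))
newdat$pred <- predict(fit, newdata = newdat)

plot(newdat$wt, newdat$pred, type = "l", xlab = "wt", ylab = "Predicted mpg",
     main = "Effect of wt with hp held at its mean")
```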
Beta Coefficients
coefplot::coefplot( )
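Standardized (beta) coefficients rescale each slope by sd(x) / sd(y), so predictors measured in different units become comparable. The sketch below uses `mtcars` as assumed example data. It computes the betas directly and checks them against a refit on z-scored variables.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Beta = b * sd(x) / sd(y)
b <- coef(fit)[c("wt", "hp")]
sd_x <- sapply(mtcars[, c("wt", "hp")], sd)
beta_std <- b * sd_x / sd(mtcars$mpg)
beta_std

# Same numbers from a regression on z-scored variables
z <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp")]))
fit_z <- lm(mpg ~ wt + hp, data = z)
coef(fit_z)
```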
Stepwise Regression
MASS::stepAIC( )
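A minimal stepwise sketch with base R's `step()`, which performs the same kind of AIC-based search as `MASS::stepAIC()`. The `mtcars` model is an assumed example. Backward elimination drops one term at a time as long as AIC decreases.

```r
full <- lm(mpg ~ ., data = mtcars)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step printout
final <- step(full, direction = "backward", trace = 0)
formula(final)
```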
Factor Analysis
scatterplot( y~x1 | x2 )
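The `y ~ x1 | x2` formula draws a separate regression for each level of the factor `x2`. A sketch of the equivalent model, assuming `mtcars` with transmission type `am` as the factor: the interaction `wt * am` gives each group its own intercept and slope, which match separate per-group fits.

```r
dat <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# Interaction model: group-specific intercepts and slopes
fit <- lm(mpg ~ wt * am, data = dat)
coef(fit)
slope_auto   <- coef(fit)[["wt"]]
slope_manual <- coef(fit)[["wt"]] + coef(fit)[["wt:ammanual"]]

# Points and one fitted line per group
cols <- c(automatic = "blue", manual = "red")
plot(mpg ~ wt, data = dat, col = cols[as.character(dat$am)], pch = 19)
for (g in levels(dat$am)) {
  abline(lm(mpg ~ wt, data = subset(dat, am == g)), col = cols[[g]])
}
```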
4. Instrumental Variable Regression
AER::ivreg( y ~ endogenous + exogenous | instruments + exogenous )
summary(ivreg, diagnostics=TRUE) reports the weak-instrument, Wu-Hausman and Sargan tests
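Sargan test
Statistic = n * R-squared from regressing the IV residuals on all instruments and exogenous variables, where n is the number of observations
n <- nobs(model)
p-value = 1 - pchisq(statistic, df), with df = number of excluded instruments minus number of endogenous regressors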
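A sketch of two-stage least squares and the Sargan statistic computed by hand on simulated data. All variable names and the data-generating process are assumptions. There are two instruments and one endogenous regressor, so df = 1. The second-stage standard errors from this manual fit are not valid; `AER::ivreg()` reports corrected ones.

```r
set.seed(42)
n  <- 1000
z1 <- rnorm(n); z2 <- rnorm(n)            # instruments (assumed valid)
u  <- rnorm(n)                            # unobserved confounder
x  <- 0.8 * z1 + 0.6 * z2 + u + rnorm(n)  # endogenous regressor
y  <- 1 + 2 * x + u + rnorm(n)            # true slope is 2

# Stage 1: regress the endogenous variable on the instruments
xhat <- fitted(lm(x ~ z1 + z2))
# Stage 2: regress y on the stage-1 fitted values
b_iv <- coef(lm(y ~ xhat))

# IV residuals use the ORIGINAL x, not xhat
e <- y - (b_iv[["(Intercept)"]] + b_iv[["xhat"]] * x)

# Sargan: n * R^2 of the residuals on all instruments
aux <- lm(e ~ z1 + z2)
sargan <- n * summary(aux)$r.squared
df_sargan <- 2 - 1                        # instruments minus endogenous regressors
p_sargan <- 1 - pchisq(sargan, df_sargan)
c(slope = b_iv[["xhat"]], sargan = sargan, p = p_sargan)
```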
Combination of IV estimation
5. Regression Diagnostics
Multicollinearity
car::vif( )
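The variance inflation factor of predictor j is 1 / (1 - R^2_j), where R^2_j comes from regressing x_j on the other predictors. The sketch below uses a hypothetical `vif_manual()` helper on `mtcars`. For models whose terms each have one degree of freedom, `car::vif()` should give the same values.

```r
# Hypothetical helper: VIF_j = 1 / (1 - R^2_j)
vif_manual <- function(X) {
  sapply(colnames(X), function(v) {
    others <- setdiff(colnames(X), v)
    r2 <- summary(lm(X[, v] ~ X[, others]))$r.squared
    1 / (1 - r2)
  })
}

X <- as.matrix(mtcars[, c("wt", "hp", "disp")])
vifs <- vif_manual(X)
vifs
```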
Added-variable (partial regression) plots
car::avPlots(lm)
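An added-variable plot residualizes both the response and x_j on the other predictors. By the Frisch-Waugh-Lovell theorem, the slope in this plot equals x_j's coefficient in the full model. The sketch below checks this on `mtcars`, which serves as an assumed example.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Residualize the response and wt on the remaining predictor (hp)
e_y  <- residuals(lm(mpg ~ hp, data = mtcars))
e_wt <- residuals(lm(wt ~ hp, data = mtcars))
av_fit <- lm(e_y ~ e_wt)

plot(e_wt, e_y, xlab = "wt | others", ylab = "mpg | others",
     main = "Added-variable plot for wt")
abline(av_fit)
```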
Residual plots (curvature tests and Tukey test)
car::residualPlots(lm)
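Tukey's test for nonadditivity adds the squared fitted values to the model and tests their coefficient. The sketch below uses `mtcars` as example data. Comparing the t statistic with the standard normal follows what `car::residualPlots()` is understood to do, so treat that detail as an assumption.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Augment the model with squared fitted values
dat <- mtcars
dat$fitted_sq <- fitted(fit)^2
aug <- lm(mpg ~ wt + hp + fitted_sq, data = dat)

# Tukey statistic and two-sided normal p-value
t_stat <- coef(summary(aug))["fitted_sq", "t value"]
p_val  <- 2 * pnorm(-abs(t_stat))
c(t = t_stat, p = p_val)

# Residuals versus fitted values
plot(fitted(fit), residuals(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```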
Studentized residuals
rstudent(lm)
DFBETAS (scaled DFBETA)
dfbetas(lm)
Cook's distance
cooks.distance(lm)
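The measures above can be combined in one sketch; the `mtcars` model is an assumed example. The flagging cutoffs (|rstudent| > 2, Cook's D > 4/n, leverage > 2p/n, |DFBETAS| > 2/sqrt(n)) are common rules of thumb, not fixed standards.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
n <- nobs(fit)
p <- length(coef(fit))

infl <- data.frame(
  rstudent = rstudent(fit),        # studentized (deleted) residuals
  cooks    = cooks.distance(fit),  # Cook's distance
  hat      = hatvalues(fit)        # leverage
)
dfb <- dfbetas(fit)                # one column per coefficient

# Flag observations exceeding any rule-of-thumb cutoff
flag <- with(infl, abs(rstudent) > 2 | cooks > 4 / n | hat > 2 * p / n) |
        apply(abs(dfb) > 2 / sqrt(n), 1, any)
infl[flag, ]
```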
Leverage Plot
Diagnostics Plot
Bonferroni p-values
car::influenceIndexPlot(lm)
Then use a boxplot
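The Bonferroni outlier test takes the largest absolute studentized residual, computes its two-sided t p-value with n - p - 1 degrees of freedom, and multiplies that p-value by n. The sketch below applies it to `mtcars` (an assumed example) and then draws the boxplot.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
rs <- rstudent(fit)
n <- nobs(fit)
p <- length(coef(fit))

# Largest absolute studentized residual and its Bonferroni p-value
i_max <- which.max(abs(rs))
p_unadj <- 2 * pt(-abs(rs[i_max]), df = n - p - 1)
p_bonf  <- min(1, n * p_unadj)
c(case = names(rs)[i_max], rstudent = round(rs[i_max], 3), p_bonf = signif(p_bonf, 3))

boxplot(rs, main = "Studentized residuals")
```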
6. Spatial Autocorrelation & Heteroscedasticity
Heteroscedasticity
car::ncvTest(lmBase, var.formula=~log(pop), data=Bladder)
lmUpdated <- update(lmBase, weights=1/exp(predLogSigma2))
lmHetero( )
weighted.residuals( )
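A sketch of the heteroscedasticity workflow on simulated data. The data and the variance covariate log(x) are assumptions. The first step is a score test of the form `car::ncvTest()` reports. The second is feasible weighted least squares, as in the `update()` line above: log squared residuals are modeled, and their exponentiated predictions give the weights.

```r
set.seed(5)
n <- 200
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 * x)  # error sd grows with x (assumed)
dat <- data.frame(x = x, y = y)
lm_base <- lm(y ~ x, data = dat)

# Score test: regress scaled squared residuals on the variance covariate
e <- residuals(lm_base)
u <- e^2 / mean(e^2)
aux <- lm(u ~ log(x), data = dat)
score <- sum((fitted(aux) - mean(u))^2) / 2
p_score <- pchisq(score, df = 1, lower.tail = FALSE)
c(chisq = score, p = p_score)

# FGLS: model log squared residuals, then reweight
pred_log_sigma2 <- fitted(lm(log(e^2) ~ log(x), data = dat))
lm_updated <- update(lm_base, weights = 1 / exp(pred_log_sigma2))
summary(lm_updated)
head(weighted.residuals(lm_updated))  # residuals times sqrt(weights)
```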
Spatial Autocorrelation
mapColorQual(prov.shp$REGION, prov.shp, map.title="Italy's Regions", legend.title="Region", add.to.map=T)
Link matrix (queen=F gives rook contiguity: polygons must share an edge)
poly2nb(prov.shp, queen=F)
Row-sum standardized spatial weights
prov.linkW <- nb2listw(prov.link, style="W")
Moran's I test of the regression residuals (W-coding scheme)
lm.morantest(fert.wlm, prov.linkW)
moran.plot(weighted.residuals(fert.wlm),prov.linkW, labels=prov.shp$PROVNAME)
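Moran's I for a variable x with weights W is (n / S0) * z'Wz / z'z, where z = x - mean(x) and S0 is the sum of all weights. With row-standardized weights, S0 = n. The sketch below uses a hypothetical 10 x 10 grid with rook neighbors, standing in for `poly2nb(..., queen=F)` on real polygons. A smooth trend gives strong positive autocorrelation, and a checkerboard gives exactly -1. `lm.morantest()` applies the same statistic to regression residuals and adds inference, which this sketch omits.

```r
# Hypothetical helper: row-standardized rook weights on an nr x nc grid
rook_weights <- function(nr, nc) {
  n <- nr * nc
  W <- matrix(0, n, n)
  idx <- function(i, j) (j - 1) * nr + i   # cell index, i varies fastest
  for (i in seq_len(nr)) for (j in seq_len(nc)) {
    k <- idx(i, j)
    if (i > 1)  W[k, idx(i - 1, j)] <- 1
    if (i < nr) W[k, idx(i + 1, j)] <- 1
    if (j > 1)  W[k, idx(i, j - 1)] <- 1
    if (j < nc) W[k, idx(i, j + 1)] <- 1
  }
  W / rowSums(W)                           # row-sum standardization ("W" style)
}

moran_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(z * (W %*% z)) / sum(z^2)
}

W <- rook_weights(10, 10)
cells <- expand.grid(i = 1:10, j = 1:10)   # i varies fastest, matching idx()
trend   <- cells$i + cells$j               # smooth surface: positive I
checker <- (-1)^(cells$i + cells$j)        # checkerboard: I = -1
moran_i(trend, W)
moran_i(checker, W)
```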
Simultaneous autoregressive model (SAR)
spautolm( ) (package spatialreg; formerly in spdep)
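A sketch of the simultaneous autoregressive error model y = X*beta + u, u = lambda*W*u + e, which `spautolm()` fits by maximum likelihood. The grid, the true lambda = 0.6 and the helper names are assumptions. Lambda is estimated by maximizing the concentrated log-likelihood -n/2 * log(sigma2_hat(lambda)) + log|det(I - lambda*W)|.

```r
# Hypothetical helper: row-standardized rook weights on an nr x nc grid
rook_weights <- function(nr, nc) {
  n <- nr * nc
  W <- matrix(0, n, n)
  idx <- function(i, j) (j - 1) * nr + i
  for (i in seq_len(nr)) for (j in seq_len(nc)) {
    k <- idx(i, j)
    if (i > 1)  W[k, idx(i - 1, j)] <- 1
    if (i < nr) W[k, idx(i + 1, j)] <- 1
    if (j > 1)  W[k, idx(i, j - 1)] <- 1
    if (j < nc) W[k, idx(i, j + 1)] <- 1
  }
  W / rowSums(W)
}

set.seed(6)
W <- rook_weights(20, 20)
n <- nrow(W)
x <- rnorm(n)
lambda_true <- 0.6
# Spatially autocorrelated errors: u = (I - lambda W)^{-1} e
u <- solve(diag(n) - lambda_true * W, rnorm(n))
y <- 1 + 2 * x + u
X <- cbind(1, x)

# Concentrated log-likelihood in lambda (additive constants dropped)
conc_loglik <- function(lambda) {
  A <- diag(n) - lambda * W
  e <- qr.resid(qr(A %*% X), A %*% y)   # OLS residuals of the filtered model
  sigma2 <- sum(e^2) / n
  log_det <- as.numeric(determinant(A, logarithm = TRUE)$modulus)
  -n / 2 * log(sigma2) + log_det
}

opt <- optimize(conc_loglik, interval = c(-0.99, 0.99), maximum = TRUE)
lambda_hat <- opt$maximum
A_hat <- diag(n) - lambda_hat * W
beta_hat <- qr.coef(qr(A_hat %*% X), A_hat %*% y)  # GLS coefficients given lambda_hat
lambda_hat
beta_hat
```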
7. Logistic Regression Analysis
Logit and probit models
glm(y~x,family=binomial(logit))
glm(y~x,family=binomial(probit))
confint(GLM.01, level=0.95, type="Wald",trace = FALSE)
Find the confidence interval of regression coefficients
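A sketch on simulated data (assumed) showing logit and probit fits of the same binary outcome, and Wald 95% intervals computed as estimate +/- 1.96 * SE. Base R's `confint.default()` returns the same Wald intervals. Probit slopes come out smaller in magnitude because the standard normal distribution has a smaller spread than the standard logistic distribution.

```r
set.seed(7)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))  # simulated binary outcome (assumed)

glm_logit  <- glm(y ~ x, family = binomial(link = "logit"))
glm_probit <- glm(y ~ x, family = binomial(link = "probit"))

# Wald 95% intervals: estimate +/- z_{0.975} * SE
se <- sqrt(diag(vcov(glm_logit)))
wald <- cbind(coef(glm_logit) - qnorm(0.975) * se,
              coef(glm_logit) + qnorm(0.975) * se)
wald
confint.default(glm_logit)
```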
Effects Plot
plot(effects::allEffects(GLM.03), type="response", ylim=c(0,1), ask=FALSE)
Predicted probabilities at low and high covariate settings
eff.GLM.low <- effect("lived",GLM.03, given.values=c(educ=20,"contamyes"=0,"hscyes"=0,"nodadyes"=1))
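Predicted probabilities for chosen covariate profiles can also be obtained with `predict(..., type = "response")`, which applies the inverse logit to the linear predictor. The variables, the simulated data and the low/high profiles below are hypothetical.

```r
set.seed(8)
n <- 400
educ <- rnorm(n, mean = 12, sd = 3)   # hypothetical covariates
z    <- rbinom(n, 1, 0.4)
y    <- rbinom(n, 1, plogis(-3 + 0.2 * educ + 0.8 * z))
fit  <- glm(y ~ educ + z, family = binomial)

# Hypothetical "low" and "high" covariate profiles
profiles <- data.frame(educ = c(8, 20), z = c(0, 1))
p_hat <- predict(fit, newdata = profiles, type = "response")  # probabilities
eta   <- predict(fit, newdata = profiles, type = "link")      # linear predictor
p_hat
```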
Residual Exploration
resid.GLM.03 <- residuals(GLM.03, type="response")
pred.GLM.03 <- predict(GLM.03, type="response")
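Response residuals are y minus the fitted probability. Raw 0/1 residuals are hard to read, so the sketch below averages them within deciles of the fitted probability to form a binned residual plot; systematic departures from zero suggest misfit. The simulated data are assumed for illustration.

```r
set.seed(8)
n <- 400
educ <- rnorm(n, mean = 12, sd = 3)   # hypothetical covariates
z    <- rbinom(n, 1, 0.4)
y    <- rbinom(n, 1, plogis(-3 + 0.2 * educ + 0.8 * z))
fit  <- glm(y ~ educ + z, family = binomial)

resid_resp <- residuals(fit, type = "response")  # y - fitted probability
pred_resp  <- predict(fit, type = "response")    # fitted probabilities

# Binned residual plot: mean residual within deciles of the fitted probability
bins <- cut(pred_resp, breaks = quantile(pred_resp, probs = seq(0, 1, 0.1)),
            include.lowest = TRUE)
plot(tapply(pred_resp, bins, mean), tapply(resid_resp, bins, mean),
     xlab = "Mean fitted probability", ylab = "Mean response residual")
abline(h = 0, lty = 2)
```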
8. The Generalized Linear Model
Overdispersion
Standard Poisson model
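Overdispersion can be gauged by the Pearson chi-square statistic divided by its residual degrees of freedom. Values well above 1 mean the Poisson standard errors are too small. The sketch below uses simulated negative-binomial counts (assumed). The quasi-Poisson fit keeps the same coefficients and inflates the standard errors by the square root of that ratio.

```r
set.seed(9)
n  <- 300
x  <- runif(n)
mu <- exp(1 + x)
y  <- rnbinom(n, mu = mu, size = 2)   # overdispersed counts (assumed)

fit_pois  <- glm(y ~ x, family = poisson)
fit_quasi <- glm(y ~ x, family = quasipoisson)

# Estimated dispersion: Pearson chi-square / residual df
phi <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
phi

se_p <- sqrt(diag(vcov(fit_pois)))
se_q <- sqrt(diag(vcov(fit_quasi)))
cbind(poisson = se_p, quasipoisson = se_q)
```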
Likelihood Ratio Test
Logistic Regression
LR <- -2 * (logLik(log2) - logLik(log1))
anova(log2, log1, test = "LRT")
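A sketch on simulated data (assumed) confirming that the hand-computed LR statistic matches `anova(..., test = "LRT")`, with `log2` nested in `log1`. The degrees of freedom are the difference in the number of estimated parameters.

```r
set.seed(10)
n  <- 400
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.3 + 0.8 * x1 + 0.5 * x2))  # simulated data (assumed)

log1 <- glm(y ~ x1 + x2, family = binomial)  # full model
log2 <- glm(y ~ x1,      family = binomial)  # restricted model, nested in log1

LR <- as.numeric(-2 * (logLik(log2) - logLik(log1)))
df_diff <- attr(logLik(log1), "df") - attr(logLik(log2), "df")
p_value <- pchisq(LR, df = df_diff, lower.tail = FALSE)
c(LR = LR, df = df_diff, p = p_value)

anova(log2, log1, test = "LRT")
```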