Regression

easystats

Author

Steen Flammild Harsted

Published

January 10, 2024

1 Regression

Install the easystats package and add a library call to easystats in the code chunk where you call your other libraries.

1.1 The `palmerpenguins` dataset

Install the palmerpenguins package and add a library call to palmerpenguins in the code chunk where you call your other libraries. Read the documentation for palmerpenguins here

1.1.0.1 Using the `penguins` dataset. Let `body_mass_g` be your dependent variable (y), and investigate how changes in the other variables in `penguins` impact `body_mass_g`

Build at least two models
Consider making the formula for one of the models body_mass_g ~ . - What does this do?
Store them as (pen_model1, pen_model2, etc.)
Remember that we are just practicing coding here. Today is about How do we code it. The purpose of this particular exercise is to introduce you to a few very useful functions from the easystats package(s), and to get you started with using lm().

Code

pen_model1 <- penguins %>% lm(body_mass_g ~ sex + species, data = .)
pen_model2 <- penguins %>% lm(body_mass_g ~ bill_depth_mm + bill_length_mm + flipper_length_mm, data = .)
pen_model3 <- penguins %>% lm(body_mass_g ~ ., data = .)

1.1.0.2 Base R

Print a summary of penguins (summary(penguins))
Explore the elements in one of your models using $
- Type pen_model1$ and explore the drop down menu
- coefficients(pen_model1)
- Print a summary() of one of your models

1.1.0.3 `easystats`

Check the assumptions of your models (check_model)
- Discuss if/why some assumptions are violated
- What could be done to remedy this?

Code

check_model(pen_model1)

Code

check_model(pen_model2)

Code

check_model(pen_model3)

Investigate the model parameters of your models model_parameters()

Code

model_parameters(pen_model1)
model_parameters(pen_model2)
model_parameters(pen_model3)

Compare your models compare_models()

Code

compare_models(pen_model1, pen_model2, pen_model3)

Compare the performance of your models compare_performance()

Code

compare_performance(pen_model1, pen_model2, pen_model3)

Plot a comparison of your models compare_performance() %>% plot()

Code

compare_performance(pen_model1, pen_model2, pen_model3) %>% plot()

Make a report of the model you think is the best

Code

report(pen_model3)

1.2 The `PhDPublications` dataset

Install the package AER and load the package (+add library call to your library code chunk)
AER contains >100 different datasets (to see a list: data(package = 'AER')).
Load the dataset PhDPublications, by writing data(PhDPublications)
Read the documentation for PhDPublications

1.2.0.1 Using `PhDPublications`, investigate what influenced the number of `articles` PhD students published during the last 3 years of their PhD.

We want to predict articles what type of data is this?
- ANSWER
  R will tell you this is an integer, but if you read the documentaion you will realize that the numbers represent Count or Rate data (Number of articles within a three year period). We typically need to model this type of data using another approach than linear regression. Lets investigate the distribution of articles.
Investigate the distribution of the articles variable (see plot below).
- Does it look normal?
- Can you recognize the type of distribution?
- ANSWER
  This looks like a possion distribution, and you should not model this using normal linear regression with lm(). Instead you should use glm(, family = poisson())

Code

PhDPublications %>% 
  ggplot(aes(x = articles))+
  geom_density(bw = 0.5, fill = "lightgrey", size = 1)+
  theme_modern()

Build at least two models
Consider making the formula for one of the models articles ~ . - What does this do?
Store them as (phd_model1, phd_model2, etc.)
Remember that we are just practicing coding here. Today is about How do we code it. The purpose of this particular exercise is to show you that you can use the same easystats functions to work with different model types. The output will automagically adjust. This is a big advantage to other packages and programs.

Code

phd_model1 <- PhDPublications %>% glm(articles ~ gender, data = ., family = poisson())
phd_model2 <- PhDPublications %>% glm(articles ~ ., data = ., family = poisson())
phd_model3 <- PhDPublications %>% glm(articles ~ gender + kids + mentor,  data = ., family = poisson())

1.2.0.2 Base R

Print a summary of PhDPublications (summary(PhDPublications))
Explore the elements in one of your models using $
- Type phd_model1$ and explore the drop down menu
- coefficients(phd_model1)
- Print a summary() of one of your models

1.2.0.3 `easystats`

Check the assumptions of your models (check_model)
- Discuss if/why some assumptions are violated
- What could be done to remedy this?

Code

check_model(phd_model1)

Code

check_model(phd_model2)

Code

check_model(phd_model3)

Investigate the model parameters of your models model_parameters()

Code

model_parameters(phd_model1)
model_parameters(phd_model2)
model_parameters(phd_model3)

Compare your models compare_models()

Code

compare_models(phd_model1, phd_model2, phd_model3)

Compare the performance of your models compare_performance()

Code

compare_performance(phd_model1, phd_model2, phd_model3)

Plot a comparison of your models compare_performance() %>% plot()

Code

compare_performance(phd_model1, phd_model2, phd_model3) %>% plot()

Make a report of the model you think is the best

Code

report(phd_model3)

Want More?

For more information on easystats read here.

1.2.0.4 `gtsummary` can also be used to create regressiontables:

tbl_regression()
add_n()
add_glance_source_note()

Code

pen_model2 %>% 
  tbl_regression() %>% 
  add_n() %>% 
  add_glance_source_note()

try to pipe tbl_regression() to a plot()

Code

pen_model2 %>% 
  tbl_regression() %>% 
  plot()

tbl_uvregression() is great for a large number of univariable regressions

Code

penguins %>% 
  tbl_uvregression(y = body_mass_g,
                   method = lm) %>% 
  bold_labels() %>% 
  italicize_levels()

1.2.0.5 `GGally`

Install the GGally package and add a library call to the code chunk where you call your libraries.

GGally has many powerful functions for dataanalysis
- ggpairs() to investigate your data. Try:
  - ggpairs(PhDPublications)
  - ggpairs(PhDPublications, mapping = aes(color = gender, alpha = 0.2))
- ggcoef_model() to plot you coefficients. Try:

Code

models <- list("model 2" = phd_model2, "model 3" = phd_model3)
ggcoef_compare(models = models)

1 Regression

1.1 The palmerpenguins dataset

1.1.0.1 Using the penguins dataset. Let body_mass_g be your dependent variable (y), and investigate how changes in the other variables in penguins impact body_mass_g

1.1.0.2 Base R

1.1.0.3 easystats

1.2 The PhDPublications dataset

1.2.0.1 Using PhDPublications, investigate what influenced the number of articles PhD students published during the last 3 years of their PhD.

1.2.0.2 Base R

1.2.0.3 easystats

Want More?

1.2.0.4 gtsummary can also be used to create regressiontables:

1.2.0.5 GGally

1.1 The `palmerpenguins` dataset

1.1.0.1 Using the `penguins` dataset. Let `body_mass_g` be your dependent variable (y), and investigate how changes in the other variables in `penguins` impact `body_mass_g`

1.1.0.3 `easystats`

1.2 The `PhDPublications` dataset

1.2.0.1 Using `PhDPublications`, investigate what influenced the number of `articles` PhD students published during the last 3 years of their PhD.

1.2.0.3 `easystats`

1.2.0.4 `gtsummary` can also be used to create regressiontables:

1.2.0.5 `GGally`