Hypothesis testing

ggstatsplot

Author

Steen Harsted

Published

January 10, 2024

1 `ggstatsplot`

Install the ggstatsplot and rstatix packages, and add the library calls for these packages to your library code chunk.
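A minimal sketch of what such a library chunk could look like. It assumes that the tidyverse (for `%>%`, `ggplot()`, `filter()` and `fct_lump()`) is already loaded elsewhere in your chunk; the object names `needed` and `installed` are only illustrative. The chunk reports missing packages instead of installing them, so it is safe to re-run.

```r
# Packages this exercise asks for
needed <- c("ggstatsplot", "rstatix")

# TRUE for each package that is already installed
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Print a reminder for anything that is missing (install once, in the console)
if (any(!installed)) {
  message(
    "Run in the console: install.packages(c(",
    paste0('"', needed[!installed], '"', collapse = ", "),
    "))"
  )
}

# Attach the packages that are available
for (pkg in needed[installed]) {
  library(pkg, character.only = TRUE)
}
```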

1.1 The `bugs_long` dataset

bugs_long provides information on the extent to which men and women want to kill arthropods that vary in disgustingness (low, high) and frighteningness (low, high), giving four groups in total. Each participant rated their attitude towards all four kinds of arthropods. bugs_long is a subset of the data reported by Ryan et al. (2013).

Note that this is a repeated measures design because the same participant gave four different ratings across four different conditions (LDLF, LDHF, HDLF, HDHF).
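The repeated measures structure can be checked directly: in long format, every participant should appear once for each condition. The sketch below uses a small made-up data frame (`toy`, with invented values) that mimics the layout of bugs_long; it is not the real data.

```r
# Hypothetical toy data with the same layout as bugs_long (values are made up)
toy <- expand.grid(
  subject   = 1:3,
  condition = c("LDLF", "LDHF", "HDLF", "HDHF"),
  stringsAsFactors = FALSE
)
toy$desire <- c(1, 0, 2,   4, 3, 5,   3, 2, 6,   8, 7, 9)

# One row per subject-condition pair: every cell of this table should be 1
ratings <- table(toy$subject, toy$condition)
ratings

# TRUE when every participant rated every condition exactly once
is_repeated_measures <- all(ratings == 1)
is_repeated_measures
```

On the real data, `table(bugs_long$subject, bugs_long$condition)` gives the same overview.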

desire - The desire to kill an arthropod, rated on a scale from 0 to 10
gender - Male/Female
region - Region of the world the participant is from
condition - The type of arthropod that was rated
- LDLF: low disgustingness and low frighteningness
- LDHF: low disgustingness and high frighteningness
- HDLF: high disgustingness and low frighteningness
- HDHF: high disgustingness and high frighteningness

Picture from Ryan et al. (2013) https://doi.org/10.1016/j.chb.2013.01.024

1.1.0.1 In `bugs_long`, is there a difference within the participants in their `desire` to kill bugs from the four different `conditions`?

Should you use ggwithinstats() or ggbetweenstats() for this test?
Is it reasonable to assume normality?

Code

bugs_long %>% group_by(condition) %>% shapiro_test(desire)

# qqplot
bugs_long %>% 
  ggplot(aes(sample = desire, group = condition)) +
  geom_qq() +
  geom_qq_line()

Code

# Density plot
bugs_long %>% 
  ggplot(aes(x = desire, fill = condition)) +
  geom_density(alpha = 0.2)

Make the appropriate test

Code

bugs_long %>% 
  ggwithinstats(x = condition,
                y = desire,
                type = "nonparametric")

Code

# Note that the ggstatsplot tutorial actually runs this as a "parametric" test

What is the name of the statistical test that was performed?
What is your interpretation?
What is the consequence if you change the type of test?
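As a hint for the questions above: with type = "nonparametric" and more than two within-subject conditions, ggwithinstats() should report a Friedman rank sum test (check the subtitle of your plot to confirm). The sketch below runs that test in base R on simulated data. The data frame `toy`, the vector `effect` and all values are made up for illustration and are not bugs_long.

```r
set.seed(2024)  # make the simulated ratings reproducible

conditions <- c("LDLF", "LDHF", "HDLF", "HDHF")

# Hypothetical long-format data: 12 subjects, each rating all four conditions
toy <- expand.grid(
  subject   = factor(1:12),
  condition = factor(conditions, levels = conditions)
)

# Made-up condition effects plus noise, clipped to the 0-10 rating scale
effect <- c(LDLF = 0, LDHF = 1.5, HDLF = 1, HDHF = 3)
raw <- 4 + unname(effect[as.character(toy$condition)]) + rnorm(nrow(toy))
toy$desire <- pmin(10, pmax(0, round(raw, 1)))

# Friedman rank sum test: desire by condition, blocked on subject
fried <- friedman.test(desire ~ condition | subject, data = toy)
fried
```

The test compares the within-subject ranks of the four conditions, so it does not rely on normally distributed ratings. Switching to type = "parametric" corresponds to a repeated measures ANOVA instead, which does.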

1.1.0.2 Is there a difference between men and women in the `desire` to kill bugs that are LDHF (low disgustingness and high frighteningness)?

Create a filtered data frame of bugs_long

Code

bl_LDHF <- bugs_long %>% filter(condition == "LDHF")

Should you use ggwithinstats() or ggbetweenstats() for this test?
Is it reasonable to assume normality?

Code

bl_LDHF %>% 
  filter(!is.na(gender), !is.na(desire)) %>% 
  group_by(gender) %>% 
  shapiro_test(desire)

# qqplot
bl_LDHF %>% 
  ggplot(aes(sample = desire, color = gender)) +
  geom_qq() +
  geom_qq_line()

Code

# Density plot
bl_LDHF %>% 
  ggplot(aes(x = desire, fill = gender)) +
  geom_density(alpha = 0.2)

Make the appropriate test

Code

bl_LDHF %>% 
  ggbetweenstats(x = gender,
                 y = desire,
                 type = "nonparametric")

What is the name of the statistical test that was performed?
What is your interpretation?
What is the consequence if you change the type of test?

1.1.0.3 Is there a difference in the frequency of men and women between `North America` and the remaining regions?

First, lump region together into two levels (fct_lump()).
Reduce the data to one row per subject ID, and discuss why this is a good idea.

Code

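bl_region <- bugs_long %>%
  # Keep the most frequent region and lump all other regions into "Other"
  mutate(region = fct_lump(region, n = 1)) %>% 
  # Keep only the first row for each participant
  group_by(subject) %>% 
  slice(1) %>% 
  ungroup()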
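To see why one row per subject matters, the sketch below compares a chi-squared test on one-row-per-subject data with the same test on data where every subject is repeated four times, as in a long-format file. The data frames `one_row` and `long` and their counts are made up for illustration. The continuity correction is switched off (`correct = FALSE`) so that the two statistics are directly comparable.

```r
# Hypothetical data: 40 subjects, one row each (counts are made up)
one_row <- data.frame(
  subject = 1:40,
  gender  = rep(c("Female", "Male"), times = c(24, 16)),
  region  = c(rep(c("North America", "Other"), times = c(15, 9)),
              rep(c("North America", "Other"), times = c(6, 10)))
)

# Long format: each subject appears four times, once per condition
long <- one_row[rep(seq_len(nrow(one_row)), each = 4), ]

# Pearson's chi-squared test on both versions of the data
test_one  <- chisq.test(table(one_row$gender, one_row$region), correct = FALSE)
test_long <- chisq.test(table(long$gender, long$region), correct = FALSE)

# The statistic is four times larger when every subject is counted four times
c(one_row = unname(test_one$statistic), long = unname(test_long$statistic))

# ... and the p-value shrinks, although no new participants were added
c(one_row = test_one$p.value, long = test_long$p.value)
```

Gender and region do not change between conditions, so the repeated rows add no information. They only inflate the apparent sample size.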

Should you use ggwithinstats() or ggbetweenstats() or perhaps ggbarstats() for this test?
Should you assess normality?

Code

# Both variables are factors (categorical). Normality is only relevant for continuous data

Make the appropriate test

Code

bl_region %>% 
  ggbarstats(x = gender,
             y = region)

What is the name of the statistical test that was performed? (check the help page under the paired argument)
What is your interpretation?
What is the consequence if you change the type of test?

1.2 The `ToothGrowth` dataset

ToothGrowth gives information on tooth length in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).

len - Tooth length
supp - Supplement type
- VC: Vitamin C as ascorbic acid
- OJ: Orange juice
dose - Dose in milligrams/day (0.5, 1, or 2)

1.2.0.1 Is there a difference in Tooth length based on the type of supplement?

Should you use ggwithinstats() or ggbetweenstats() for this test?
Is it reasonable to assume normality?

Code

ToothGrowth %>% group_by(supp) %>% shapiro_test(len)

# qqplot
ToothGrowth %>% 
  ggplot(aes(sample = len, color = supp)) +
  geom_qq() +
  geom_qq_line()

Code

# Density plot
ToothGrowth %>% 
  ggplot(aes(x = len, fill = supp)) +
  geom_density(alpha = 0.2)
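For comparison, the grouped Shapiro-Wilk check can also be sketched in base R without rstatix. This is only an illustration of what the grouped test does (one test per supplement group); the object name `shapiro_by_supp` is made up.

```r
# Shapiro-Wilk p-value for tooth length within each supplement group
shapiro_by_supp <- tapply(
  ToothGrowth$len,
  ToothGrowth$supp,
  function(x) shapiro.test(x)$p.value
)

# One p-value per group, rounded for reading
round(shapiro_by_supp, 3)
```

A small p-value is evidence against normality in that group, while a large p-value does not prove normality. Read the numbers together with the Q-Q plot and the density plot.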

Make the appropriate test

Code

ToothGrowth %>% 
  ggbetweenstats(x = supp,
                 y = len,
                 type = "robust")

Code

# It is likely completely fine to run this as a "parametric" test

What is the name of the statistical test that was performed?
What is your interpretation?
What is the consequence if you change the type of test?
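To explore the last question, the sketch below runs two base R counterparts of the tests that ggbetweenstats() should report for two independent groups: Welch's t-test for type = "parametric" and the Mann-Whitney U test for type = "nonparametric". The robust option (a test on trimmed means) needs an extra package and is not shown. The object names `welch` and `mann_whitney` are only illustrative.

```r
# Parametric: Welch's t-test (does not assume equal variances)
welch <- t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)

# Nonparametric: Mann-Whitney U test (Wilcoxon rank sum test)
# exact = FALSE avoids the warning R gives when ties prevent an exact p-value
mann_whitney <- wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE)

# Compare the two p-values side by side
c(welch = welch$p.value, mann_whitney = mann_whitney$p.value)
```

If the p-values lead to the same conclusion, the choice of test matters little for this comparison. If they disagree, the normality checks above should guide which result to trust.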
