Naming stuff
1 Naming stuff
1.1 Meaning
Let variable, function, and file names convey meaning.
Look at the code below and suggest better variable names.
# 1
# (%% 2 is 0 for even digits, so 1 is added to get a valid index: 1 = "F", 2 = "M")
d <- d %>%
  mutate(s=factor(c("F", "M"))[as.numeric(substr(id,nchar(id),nchar(id))) %% 2 + 1])
# 2
mData <- d %>% filter(s=="M")
Suggested solution
# 1
# (%% 2 is 0 for even digits, so 1 is added to get a valid index: 1 = "F", 2 = "M")
data <- data %>%
  mutate(sex=factor(c("F", "M"))[as.numeric(substr(id,nchar(id),nchar(id))) %% 2 + 1])
# 2
data_males_only <- data %>% filter(sex=="M")
# or
males_only_data <- data %>% filter(sex=="M")
An even better solution
data <- data %>%
  mutate(sex=cpr2sex(id))
Alas, the function cpr2sex does not exist in base R or the Tidyverse, but we can write it ourselves:
Tip
# Requires a custom function like this -- which could be sourced from file
library(stringr) # for str_sub()
cpr2sex <- function(x) {
  # This function takes a character vector (x), presumed to hold valid Danish CPRs,
  # and returns "F", "M" or NA depending on the last character in each string.
  # If the last CPR character is an even number, it indicates female sex, and
  # an odd number indicates male sex.
  # ifelse() is used instead of if/else so the function works on a whole
  # column inside mutate(), not just on a single string.
  last_char <- str_sub(x, -1, -1)
  ifelse(last_char %in% c("0","2","4","6","8"), "F",
         ifelse(last_char %in% c("1","3","5","7","9"), "M",
                NA_character_)) # NA: last character in CPR is not a digit
}
We could hide this away in a separate file and ‘source’ it, or even put it in a new package.
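As a rough sketch of the ‘source’ route: the file name cpr_helpers.R is hypothetical, the file is written to a temporary directory only to keep the example self-contained, and the helper is a base-R variant of cpr2sex. The ids are made up for illustration and are not real CPR numbers.

```r
# Hypothetical file name; in a real project this file would sit next to the analysis script.
helper_file <- file.path(tempdir(), "cpr_helpers.R")

# Write a base-R variant of the helper to the file
# (normally you would simply create and edit the file by hand).
writeLines(c(
  "cpr2sex <- function(x) {",
  "  last_char <- substr(x, nchar(x), nchar(x))",
  "  ifelse(last_char %in% c('0','2','4','6','8'), 'F',",
  "         ifelse(last_char %in% c('1','3','5','7','9'), 'M', NA_character_))",
  "}"
), helper_file)

# Load the function definition into the current session.
source(helper_file)

# Made-up ids: one ending in an even digit, one in an odd digit, one in a non-digit.
example_ids <- c("0101701234", "0101701233", "010170123X")
example_sex <- cpr2sex(example_ids)
print(example_sex)
```

The analysis script then only needs the single source() line, and the details of how sex is derived from the id stay in one place.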
1.2 Compound names
- Do use under_scores
- Do not use camelCase
- Do not use kebab-case
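One practical reason to avoid kebab-case in R is that the hyphen is parsed as a minus sign, so such a name is not a valid object name unless it is wrapped in backticks. The sketch below checks three spellings of a hypothetical variable name using base R's make.names(), which leaves a name unchanged only if it is already syntactically valid.

```r
# Three spellings of the same hypothetical variable name.
candidate_names <- c("birth_year", "birthYear", "birth-year")

# make.names() rewrites anything that is not a syntactically valid R name,
# so a name that comes back unchanged is valid as it stands.
is_valid_name <- make.names(candidate_names) == candidate_names

print(setNames(is_valid_name, candidate_names))
```

Both under_scores and camelCase pass this check, so choosing between them is a matter of convention; kebab-case fails it.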
1.3 Nouns and verbs
make_larger_by_10 <- function(x) {
  return(x+10)
}
ten_larger <- make_larger_by_10(112)
# For instance:
# selected_data <- data %>% select(..)
1.4 Names
Main points
- Names should be meaningful
- Use under_scores, not camelCase or kebab-case
- Function names should be verbs
- Variable names should be nouns