Setting up your course project
1 Setting up your course project
Follow these instructions step by step:
- Start a new project. Think about the name and where you place the project. You are going to be using this project a lot. (Important: project_and_folder_names_matter)
- Create a folder in your project called
raw_data
- Create a folder in your project called
clean_data
- Create a folder in your project called
scripts
- Place the
soldiers.csv
file from ItsLearning in theraw_data
folder
- Create an R script (File -> New File -> R script), save it in the
scripts
folder and call it01_import.R
- Close the R script you just created and leave it for now.
This file and folder structure is a basic setup that will work for most projects. We will return to this later today, and on day 4 you will learn how to improve this setup even further.
- Discuss with you neighbor what the meaning of this madness can be about?
2 Setting up your course project (continued)
You should not do this before you have completed the wrangling exercises for select()
, filter()
, summarise()
, group_by()
, arrange()
, and mutate()
You now have the skills to continue the work we started in Section 1.
This coming task is important for the rest of your course.
As you have noticed the soldiers
dataset is not perfect when we load it, e.g., Height
is measured in inches, weightkg
is measured in Kg*10, etc..
Therefore, we need to change a few things before we can continue our work with this data.
It is essential in datascience and research that this process is documented and reproducible.
- Open the R script
01_import.R
that you created in Section 1.
- For the rest of this section you are going to work in this R script.
Write the necessary code to import the soldiers.csv
file and update the data
Write comments and explain your code as you solve the steps below
- Add
heightcm
(height in cm) - Fix
weightkg
- Explore the
sex
variable and fix it - Add
BMI
- Add
category
(level of BMI) - Add
race
- Base the values in race on the description below - Remove
Heightin
- Place the variables in an order that you like (use
relocate()
) - Make sure that all changes are assigned to
soldiers
. Your script should provide you with an updated version ofsoldiers
in the environment pane.
race
in soldiers
DODRace
is a variable in the soldiers
dataset. The description is given below:
DODRace
– Department of Defense Race; a single digit indicating a subject’s self-reported preferred single race where selecting multiple races is not an option. This variable is intended to be comparable to the Defense Manpower Data Center demographic data. Where 1 = White, 2 = Black, 3 = Hispanic, 4 = Asian, 5 = Native American, 6 = Pacific Islander, 8 = Other