Setting up your course project

Author

Steen Flammild Harsted

Published

January 10, 2024

  

1 Setting up your course project

Follow these instructions step by step:

  1. Start a new project. Think about the name and where you place the project. You are going to be using this project a lot. (Important: project_and_folder_names_matter)
  2. Create a folder in your project called raw_data
  3. Create a folder in your project called clean_data
  4. Create a folder in your project called scripts
  5. Place the soldiers.csv file from ItsLearning in the raw_data folder
  6. Create an R script (File -> New File -> R Script), save it in the scripts folder, and name it 01_import.R
  7. Close the R script you just created and leave it for now.

    This file and folder structure is a basic setup that will work for most projects. We will return to this later today, and on day 4 you will learn how to improve this setup even further.
  • Discuss with your neighbor: what could be the meaning of this madness?
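The folder structure from the steps above can also be created from the R console rather than by clicking through the Files pane. The following is a minimal sketch, assuming the working directory is the project root (which is the case when the project is open in RStudio). It only creates the empty folders and the empty script; soldiers.csv still has to be placed in raw_data by hand.

```r
# Assumption: the working directory is the project root.
folders <- c("raw_data", "clean_data", "scripts")

# Create each folder; showWarnings = FALSE keeps this quiet if it already exists.
for (folder in folders) {
  dir.create(folder, showWarnings = FALSE)
}

# Create an empty import script, but never overwrite an existing one.
script_path <- file.path("scripts", "01_import.R")
if (!file.exists(script_path)) {
  file.create(script_path)
}
```

Running the code a second time is harmless: existing folders are left alone and an existing 01_import.R is not overwritten.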

2 Setting up your course project (continued)

Important

Do not start this section until you have completed the wrangling exercises for select(), filter(), summarise(), group_by(), arrange(), and mutate().

You now have the skills to continue the work we started in Section 1.
The coming task is important for the rest of the course.

As you have noticed, the soldiers dataset is not perfect when we load it: for example, height is measured in inches and weightkg is recorded in kg × 10.
Therefore, we need to change a few things before we can continue working with this data.
It is essential in data science and research that this process is documented and reproducible.

  • Open the R script 01_import.R that you created in Section 1.
  • For the rest of this section you are going to work in this R script.

 

Write the necessary code to import the soldiers.csv file and update the data

Write comments and explain your code as you solve the steps below

  • Add heightcm (height in cm)
  • Fix weightkg
  • Explore the sex variable and fix it
  • Add BMI
  • Add category (level of BMI)
  • Add race, with values based on the DODRace description below
  • Remove Heightin
  • Place the variables in an order that you like (use relocate())
  • Make sure that all changes are assigned to soldiers. Your script should provide you with an updated version of soldiers in the environment pane.
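Several of the steps above follow the same pattern: a mutate() call that creates or overwrites a column, followed by select() and relocate(). The sketch below shows that pattern on a small made-up tibble so it can run without soldiers.csv. The rows in soldiers_toy are hypothetical, and the BMI cut-offs (18.5, 25, 30) are the common WHO levels, used here as an assumption because the exercise does not specify them. The sex variable is only counted, not fixed, since the fix depends on what you find in the real data.

```r
library(dplyr)

# Hypothetical toy rows; the real data comes from raw_data/soldiers.csv.
soldiers_toy <- tibble(
  Heightin = c(70, 64, 68),        # height in inches
  weightkg = c(815, 600, 1020),    # stored as kg * 10
  sex      = c("Male", "Female", "Male")
)

# Explore a categorical variable before deciding how to fix it.
soldiers_toy %>% count(sex)

soldiers_toy <- soldiers_toy %>%
  mutate(
    heightcm = Heightin * 2.54,            # 1 inch = 2.54 cm
    weightkg = weightkg / 10,              # convert to plain kg
    BMI      = weightkg / (heightcm / 100)^2,
    # Assumed WHO-style levels; adjust to the levels used in the course.
    category = case_when(
      BMI < 18.5 ~ "underweight",
      BMI < 25   ~ "normal",
      BMI < 30   ~ "overweight",
      BMI >= 30  ~ "obese"
    )
  ) %>%
  select(-Heightin) %>%                    # drop the original inches column
  relocate(sex, heightcm, weightkg)        # move these columns to the front
```

In 01_import.R the toy tibble would be replaced by the imported file, for example with readr::read_csv("raw_data/soldiers.csv"), assuming the file is comma-delimited and the script runs from the project root. Note that mutate() evaluates its arguments in order, so BMI is computed from the already corrected weightkg.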

DODRace is a variable in the soldiers dataset. The description is given below: 

DODRace – Department of Defense Race; a single digit indicating a subject’s self-reported preferred single race where selecting multiple races is not an option. This variable is intended to be comparable to the Defense Manpower Data Center demographic data. Where 1 = White, 2 = Black, 3 = Hispanic, 4 = Asian, 5 = Native American, 6 = Pacific Islander, 8 = Other
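One way to turn the numeric codes in the description into labels is case_when() inside mutate(). The sketch below applies it to a hypothetical one-column tibble, dod_codes, that contains each code listed in the description once; in your own script the same mutate() call would be applied to soldiers.

```r
library(dplyr)

# Hypothetical tibble holding each code from the description once.
dod_codes <- tibble(DODRace = c(1, 2, 3, 4, 5, 6, 8))

dod_codes <- dod_codes %>%
  mutate(
    race = case_when(
      DODRace == 1 ~ "White",
      DODRace == 2 ~ "Black",
      DODRace == 3 ~ "Hispanic",
      DODRace == 4 ~ "Asian",
      DODRace == 5 ~ "Native American",
      DODRace == 6 ~ "Pacific Islander",
      DODRace == 8 ~ "Other"
      # Any code not listed above gets NA in race.
    )
  )
```

Because unmatched codes become NA, counting the new column afterwards (for example with count(race)) is a quick check that no unexpected codes are present in the data.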