Data loading using readr

Cleaning Data with tidyverse (What even is tidy data?)

In this tutorial, we will read and parse a CSV file using the read_csv() function from the readr package. CSV (Comma-Separated Values) files store tabular data as plain text, with the values in each row separated by commas. To begin, pass the path of the file to read_csv(). The function returns a tibble, which can be assigned to a variable. The example below reads a file named Example.csv.

R

# load the tidyverse by running this code: 
library(tidyverse) 
  
# create a tibble named rand 
rand <- read_csv("Example.csv") 

Output:

── Column specification ────────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Code, Age_single_years
dbl (2): Census_night_population_count, Census_usually_resident_population_count

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message
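
The message above points to two ways of quieting it: declare the column types yourself, or set show_col_types = FALSE. The following is a minimal sketch of both, assuming readr 2.0 or later. The inline data and the column names code, age and count are hypothetical stand-ins for a real file such as Example.csv.

```r
library(readr)

# Hypothetical inline data standing in for a file on disk.
# I() marks the string as literal data rather than a file path.
csv_text <- I("code,age,count\nA1,0,100\nA2,1,250\n")

# Option 1: declare the column types explicitly.
typed <- read_csv(
  csv_text,
  col_types = cols(
    code  = col_character(),
    age   = col_integer(),
    count = col_double()
  )
)

# Option 2: let readr guess the types, but suppress the message.
guessed <- read_csv(csv_text, show_col_types = FALSE)

# spec() returns the column specification readr ended up using.
spec(guessed)
```

Declaring the types is usually the safer choice for scripts that run unattended, because a change in the input file then surfaces as a parsing problem instead of a silently different column type.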

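read_csv() also accepts CSV text supplied inline as a string, which is handy for experiments and small reproducible examples. The options shown in the rest of this section work the same way when reading a file.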

R

# give inline csv input 
read_csv("a,b,c 
  1,2,3 
  4,5,6") 

Output:

      a     b     c
  <dbl> <dbl> <dbl>
1     1     2     3
2     4     5     6

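By default, read_csv() treats the first line of the input as the column names. When the input does not start with a header, other arguments handle the exceptions: skip = n skips the first n lines, and comment = "#" drops every line that starts with #.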

R

read_csv("first line of metadata 
  second line of metadata 
  a,b,c 
  1,2,3", skip = 2) 

Output:

      a     b     c
  <dbl> <dbl> <dbl>
1     1     2     3

R

# when we need to ignore comments in csv file 
read_csv("#ignore it is a comment 
 #ignore this is another comment 
 x,y,z 
 1,2,3 
 4,5,6", comment = "#") 

Output:

      x     y     z
  <dbl> <dbl> <dbl>
1     1     2     3
2     4     5     6

If the first line contains data rather than column names, set col_names = FALSE:

R

# col_names = FALSE tells read_csv() that the first line 
# is data, not column names. readr then names the  
# columns X1, X2, ... automatically. 
read_csv("1,2,3\n4,5,6", col_names=FALSE) 

Output:

     X1    X2    X3
  <dbl> <dbl> <dbl>
1     1     2     3
2     4     5     6

R

# You can set custom column names 
read_csv("1,2,3\n4,5,6", col_names = c("COLUMN1","COLUMN2","COLUMN3"))

Output:

  COLUMN1 COLUMN2 COLUMN3
    <dbl>   <dbl>   <dbl>
1       1       2       3
2       4       5       6
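
When the input does have a header line but you want different names, skip and col_names can be combined. The following is a minimal sketch, assuming readr 2.0 or later. The data and the replacement names x, y and z are hypothetical.

```r
library(readr)

# Hypothetical inline data with a header line we want to replace.
csv_text <- I("a,b,c\n1,2,3\n4,5,6\n")

# skip = 1 drops the original header line; col_names supplies the new names,
# so the first remaining line is read as data.
renamed <- read_csv(
  csv_text,
  skip = 1,
  col_names = c("x", "y", "z"),
  show_col_types = FALSE
)

names(renamed)
```

Without skip = 1, the original header line would be read as a row of data and every column would be parsed as character.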

R

# use na to specify which strings in the file represent missing values 
read_csv("a,b,c\n1,2,.", na = ".")

Output:

      a     b c    
  <dbl> <dbl> <lgl>
1     1     2 NA
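
The na argument also accepts a character vector, so several missing-value markers can be handled at once. Note that supplying na replaces the default of c("", "NA"), so the defaults must be listed again if they should still count as missing. The following is a minimal sketch, assuming readr 2.0 or later, with hypothetical data.

```r
library(readr)

# Hypothetical inline data using three different markers for missing values:
# ".", "NA" and an empty field.
csv_text <- I("a,b,c\n1,.,x\n2,NA,y\n3,4,\n")

# List every marker explicitly, including the defaults "" and "NA".
cleaned <- read_csv(
  csv_text,
  na = c(".", "NA", ""),
  show_col_types = FALSE
)

# Count the missing values in each column.
colSums(is.na(cleaned))
```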

Loading and Cleaning Data with R and the tidyverse

The tidyverse is a collection of packages that work well together due to shared data representations and API design. The tidyverse package is intended to make it simple to install and load core tidyverse packages with a single command.

To install the tidyverse, run the following code in RStudio:

R

# Install from CRAN 
install.packages("tidyverse") 
  
# to check your installation  
library(tidyverse) 

Output:

── Attaching packages ─────────────────────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

The tidyr package will be used for data cleaning, and the readr package will be used for data loading.
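
To illustrate how the two packages fit together, here is a minimal sketch that loads a small table with readr and reshapes it with tidyr so that each row holds one observation. The data, the column names and the choice of pivot_longer() and drop_na() are assumptions made for illustration, not steps taken from this tutorial.

```r
library(readr)
library(tidyr)

# Hypothetical wide table: one column per year, with one missing count.
wide <- read_csv(
  I("region,2020,2021\nnorth,10,12\nsouth,NA,9\n"),
  show_col_types = FALSE
)

# Reshape so that each row is one region-year observation.
long <- pivot_longer(
  wide,
  cols = -region,
  names_to = "year",
  values_to = "count"
)

# Drop the rows whose count is missing.
tidy <- drop_na(long, count)

tidy
```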
