Data loading using readr

In this tutorial, we will read and parse a CSV file using the read_csv() function from the readr package. CSV (comma-separated values) files store tabular data with the fields separated by commas. To begin, pass the path of the file to read_csv(). The function returns a tibble, which can be assigned to a variable. The example below reads a file named Example.csv.

R




# load the tidyverse by running this code:
library(tidyverse)
  
# create a tibble named rand
rand <- read_csv("Example.csv")


Output:

── Column specification ────────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Code, Age_single_years
dbl (2): Census_night_population_count, Census_usually_resident_population_count

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
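The message above names two ways to quiet it: supplying the column types or setting show_col_types = FALSE. The sketch below shows both. It does not use Example.csv; the temporary file and its column names (Code, Age, Count) are hypothetical and exist only to keep the example self-contained.

```r
library(readr)

# Hypothetical sample data written to a temporary file (not the tutorial's Example.csv)
path <- tempfile(fileext = ".csv")
writeLines(c("Code,Age,Count", "A01,0,120", "A02,1,135"), path)

# Option 1: declare the column types explicitly.
# When col_types is supplied, the column specification message is not printed.
sample_data <- read_csv(
  path,
  col_types = cols(
    Code  = col_character(),
    Age   = col_integer(),
    Count = col_double()
  )
)

# Option 2: keep readr's type guessing but silence the message
sample_guessed <- read_csv(path, show_col_types = FALSE)

# Retrieve the column specification that was used for the first read
spec(sample_data)
```

Declaring types up front is usually the safer choice in scripts, because a change in the input file then produces a parsing problem instead of a silently different column type.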

 

read_csv() also accepts CSV text supplied inline as a string, which is convenient for experimenting. The options shown in the following examples work the same way when reading from a file.

R




# give inline csv input
read_csv("a,b,c
  1,2,3
  4,5,6")


Output:

      a     b     c
  <dbl> <dbl> <dbl>
1     1     2     3
2     4     5     6
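In readr 2.0.0 and later, inline input can also be marked explicitly as literal data by wrapping the string in I(), so it cannot be mistaken for a file path. A minimal sketch, assuming readr 2.0.0 or newer is installed; the variable name inline is only illustrative:

```r
library(readr)

# I() marks the string as literal CSV data rather than a file path
inline <- read_csv(I("a,b,c\n1,2,3\n4,5,6"), show_col_types = FALSE)

inline
```

This form also works for a single line of text with no newline in it, which would otherwise be interpreted as a path.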

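By default, read_csv() treats the first line of the input as the column names. The skip and comment arguments handle input where that is not the case, for example when the data is preceded by metadata or comment lines.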

R




read_csv("first line of metadata
  second line of metadata
  a,b,c
  1,2,3", skip = 2)


Output:

      a     b     c
  <dbl> <dbl> <dbl>
1     1     2     3

R




# when we need to ignore comments in csv file
read_csv("#ignore it is a comment
 #ignore this is another comment
 x,y,z
 1,2,3
 4,5,6", comment = "#")


Output:

      x     y     z
  <dbl> <dbl> <dbl>
1     1     2     3
2     4     5     6

If the first line does not contain column names, set col_names = FALSE:

R




# col_names = FALSE tells read_csv() that the first line is data,
# not column names. readr then names the columns X1, X2, X3, ...
read_csv("1,2,3\n4,5,6", col_names=FALSE)


Output:

     X1    X2    X3
  <dbl> <dbl> <dbl>
1     1     2     3
2     4     5     6

R




# You can set custom column names
read_csv("1,2,3\n4,5,6", col_names = c("COLUMN1","COLUMN2","COLUMN3"))


Output:

  COLUMN1 COLUMN2 COLUMN3
    <dbl>   <dbl>   <dbl>
1       1       2       3
2       4       5       6
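When the input already has a header row that you want to replace, custom names can be combined with skip = 1 so the old header is not read as data. The sketch below uses made-up inline data, and the variable name renamed is only illustrative:

```r
library(readr)

# The input has its own header row (a,b,c) that we want to replace
renamed <- read_csv(
  I("a,b,c\n1,2,3\n4,5,6"),
  skip = 1,                                        # drop the original header line
  col_names = c("COLUMN1", "COLUMN2", "COLUMN3"),  # supply the new names
  show_col_types = FALSE
)

renamed
```

Without skip = 1, the line a,b,c would be read as the first data row and every column would be parsed as character.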

R




# Use na to specify which strings should be treated as missing values
read_csv("a,b,c\n1,2,.", na = ".")


Output:

      a     b c    
  <dbl> <dbl> <lgl>
1     1     2 NA 
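The na argument accepts a character vector, so several missing-value markers can be handled at once. The following sketch uses made-up inline data with two markers, "." and "N/A"; note that the default for na is c("", "NA"), and a custom vector replaces that default.

```r
library(readr)

# Treat both "." and "N/A" as missing values
missing_demo <- read_csv(
  I("a,b,c\n1,.,3\n4,5,N/A"),
  na = c(".", "N/A"),
  show_col_types = FALSE
)

# Count the missing values in each column
colSums(is.na(missing_demo))
```

Because the markers are converted to NA before type guessing, columns b and c are still read as numeric.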
 

Loading and Cleaning Data with R and the tidyverse

The tidyverse is a collection of packages that work well together due to shared data representations and API design. The tidyverse package is intended to make it simple to install and load core tidyverse packages with a single command.

To install the tidyverse, run the following code in the R console (for example, in RStudio):

R




# Install from CRAN
install.packages("tidyverse")
  
# to check your installation 
library(tidyverse)


Output:

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

The tidyr package will be used for data cleaning, and the readr package will be used for data loading.
