Compare Adjacent Rows in R data.table
The data.table package simplifies data manipulation operations such as subsetting, grouping, and updating tables in the R programming language.
To compare adjacent rows, a new column is created that holds the lag, that is, the previous value encountered within the same group. The group is specified with the "by" argument. The new column's values are built with the expression c(NA, x[-.N]), where x is the column to lag and .N is the number of rows in the current group: x[-.N] drops the group's last value, and prepending NA shifts the remaining values down by one row. The first row of each group therefore gets NA, because it has no previous value.
Syntax:
dt[, new_col := c(NA, x[-.N]), by = group_col]
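To see what this expression does within each group, here is a minimal sketch on a small fixed table. The names dt, grp, val and prev are illustrative and do not come from the examples in this article.

```r
library(data.table)

# Small fixed table so the result is reproducible (names are illustrative)
dt <- data.table(grp = c("a", "a", "b", "a", "b"),
                 val = c(10, 20, 30, 40, 50))

# Within each group, drop the last value (val[-.N]) and prepend NA,
# which shifts every value down by one row inside that group
dt[, prev := c(NA, val[-.N]), by = grp]
print(dt)
```

Each row's prev holds the val of the previous row in the same group, and the first row of every group holds NA.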
Example 1: Comparing Adjacent rows in R Data.table
R
# importing required packages
library("data.table")

# declaring the data.table (sample() is random, so values differ between runs)
data_frame <- data.table(col1 = sample(letters[1:4], 12, replace = TRUE),
                         col2 = sample(1:6, 12, replace = TRUE))

print("original data frame")
print(data_frame)

# computing the lag of col2 within each col1 group
data_frame[, lag := c(NA, col2[-.N]), by = col1]

print("modified data frame")
print(data_frame)
Output
[1] "original data frame"
    col1 col2
 1:    b    6
 2:    c    5
 3:    a    1
 4:    d    6
 5:    d    5
 6:    b    6
 7:    b    5
 8:    a    2
 9:    c    6
10:    a    3
11:    a    4
12:    d    1
[1] "modified data frame"
    col1 col2 lag
 1:    b    6  NA
 2:    c    5  NA
 3:    a    1  NA
 4:    d    6  NA
 5:    d    5   6
 6:    b    6   6
 7:    b    5   6
 8:    a    2   1
 9:    c    6   5
10:    a    3   2
11:    a    4   3
12:    d    1   5
Next, the difference between adjacent rows is computed by subtracting the lag column from the existing column x.
Syntax:
dt[, diff_col := x - new_col]
Example 2: Difference between adjacent rows of a data.table in R
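The lag and the subtraction can also be done in a single step. The sketch below compares the two-step approach with data.table's shift() function, which lags a vector by one position and fills the gap with NA by default. The table and the column names (grp, val, prev, diff_two_step, diff_shift) are illustrative assumptions.

```r
library(data.table)

# Small fixed table so the result is reproducible (names are illustrative)
dt <- data.table(grp = c("a", "a", "b", "a", "b"),
                 val = c(10, 25, 30, 45, 50))

# Two-step version: build the lag column, then subtract it
dt[, prev := c(NA, val[-.N]), by = grp]
dt[, diff_two_step := val - prev]

# One-step alternative: shift() lags val by one row within each group
dt[, diff_shift := val - shift(val), by = grp]
print(dt)
```

Both difference columns contain the same values; the shift() form simply avoids keeping the intermediate lag column.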
R
# importing required packages
library("data.table")

# declaring the data.table (sample() is random, so values differ between runs)
data_frame <- data.table(col1 = sample(letters[1:4], 12, replace = TRUE),
                         col2 = sample(1:6, 12, replace = TRUE))

print("original data frame")
print(data_frame)

# computing the lag of col2 within each col1 group
data_frame[, lag := c(NA, col2[-.N]), by = col1]

# computing the difference between each value and its lag;
# := modifies data_frame by reference, so data_mod refers to the same table
data_mod <- data_frame[, difference := col2 - lag]

print("modified data frame")
print(data_mod)
Output
[1] "original data frame"
    col1 col2
 1:    a    1
 2:    d    3
 3:    d    6
 4:    d    3
 5:    d    2
 6:    b    4
 7:    d    5
 8:    c    6
 9:    d    2
10:    b    4
11:    d    1
12:    a    6
[1] "modified data frame"
    col1 col2 lag difference
 1:    a    1  NA         NA
 2:    d    3  NA         NA
 3:    d    6   3          3
 4:    d    3   6         -3
 5:    d    2   3         -1
 6:    b    4  NA         NA
 7:    d    5   2          3
 8:    c    6  NA         NA
 9:    d    2   5         -3
10:    b    4   4          0
11:    d    1   2         -1
12:    a    6   1          5
Example 3: Difference between adjacent rows with a sequential column
R
# importing required packages
library("data.table")

# declaring the data.table (sample() is random, so col1 differs between runs)
data_frame <- data.table(col1 = sample(letters[1:4], 16, replace = TRUE),
                         col2 = 100:115)

print("original data frame")
print(data_frame)

# computing the lag of col2 within each col1 group
data_frame[, col3 := c(NA, col2[-.N]), by = col1]

# computing the difference between each value and its lag
data_mod <- data_frame[, difference := col2 - col3]

print("modified data frame")
print(data_mod)
Output
[1] "original data frame"
    col1 col2
 1:    d  100
 2:    a  101
 3:    b  102
 4:    a  103
 5:    d  104
 6:    d  105
 7:    c  106
 8:    a  107
 9:    b  108
10:    a  109
11:    b  110
12:    d  111
13:    b  112
14:    d  113
15:    c  114
16:    b  115
[1] "modified data frame"
    col1 col2 col3 difference
 1:    d  100   NA         NA
 2:    a  101   NA         NA
 3:    b  102   NA         NA
 4:    a  103  101          2
 5:    d  104  100          4
 6:    d  105  104          1
 7:    c  106   NA         NA
 8:    a  107  103          4
 9:    b  108  102          6
10:    a  109  107          2
11:    b  110  108          2
12:    d  111  105          6
13:    b  112  110          2
14:    d  113  111          2
15:    c  114  106          8
16:    b  115  112          3
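When the goal is a comparison rather than a numeric difference, the same lag column can feed logical tests. The following is a minimal sketch under that assumption; the table and the column names (grp, val, prev, increased, unchanged) are illustrative and not taken from the examples above.

```r
library(data.table)

# Small fixed table so the result is reproducible (names are illustrative)
dt <- data.table(grp = c("a", "a", "b", "a", "b"),
                 val = c(10, 25, 30, 20, 30))

# Lag within each group, using the same c(NA, x[-.N]) expression
dt[, prev := c(NA, val[-.N]), by = grp]

# Logical comparisons with the previous row of the same group;
# the result is NA wherever there is no previous row
dt[, increased := val > prev]
dt[, unchanged := val == prev]
print(dt)
```

The NA entries mark the first row of each group, so they can be filtered out or treated separately before counting changes.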