Data Structures
A data structure is a particular way of organizing data in a computer so that it can be used effectively.
Vectors:
Vectors in R are similar to arrays in C: they hold multiple data values of the same type. One key difference is that vector indexing in R starts at 1, not 0.
Example:
R
# R program to illustrate Vector

# Numeric Vector
N = c(1, 3, 5, 7, 8)

# Character vector
C = c('Geeks', 'For', 'Geeks')

# Logical Vector
L = c(TRUE, FALSE, FALSE, TRUE)

# Printing vectors
print(N)
print(C)
print(L)
Output:
[1] 1 3 5 7 8
[1] "Geeks" "For"   "Geeks"
[1]  TRUE FALSE FALSE  TRUE
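Because a vector holds values of a single type, mixing types in c() makes R coerce every element to one common type. The short sketch below illustrates this, along with 1-based indexing; the variable names are only illustrative.

```r
# Mixing a number, a string, and a logical: everything becomes character
mixed <- c(1, "a", TRUE)
print(class(mixed))

# Mixing numbers and logicals: TRUE and FALSE become 1 and 0
nums <- c(1.5, TRUE, FALSE)
print(nums)

# Indexing starts at 1, so v[1] is the first element
v <- c(10, 20, 30)
first <- v[1]
print(first)
```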
Accessing Vector Elements:
There are several ways to access the elements of a vector. The most common is the subscript operator '[ ]', which accepts a single position, a vector of positions, or a logical condition.
Example:
R
# Accessing elements using
# the position number.
X <- c(2, 9, 8, 0, 5)
print('using Subscript operator')
print(X[2])

# Accessing specific values by passing
# a vector inside another vector.
Y <- c(6, 2, 7, 4, 0)
print('using c function')
print(Y[c(4, 1)])

# Logical indexing
Z <- c(1, 6, 9, 4, 6)
print('Logical indexing')
print(Z[Z > 3])
Output:
[1] "using Subscript operator"
[1] 9
[1] "using c function"
[1] 4 6
[1] "Logical indexing"
[1] 6 9 4 6
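Two further forms of the subscript operator are not covered in the example above: negative positions, which exclude elements, and element names. The sketch below assumes a hypothetical named vector called scores.

```r
# A named numeric vector (names are illustrative)
scores <- c(math = 90, art = 75, music = 82)

# Access by name
by_name <- scores["art"]
print(by_name)

# A negative position drops that element and keeps the rest
without_first <- scores[-1]
print(without_first)
```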
Lists:
A list is a generic object consisting of an ordered collection of objects. Lists are heterogeneous data structures.
Example:
R
# R program to create a List

# The first component is a numeric vector
# containing the employee IDs
empId = c(1, 2, 3, 4)

# The second component is a character vector
# containing the employee names
empName = c("Nisha", "Nikhil", "Akshu", "Sambha")

# The third component is the number of employees,
# which is a single numeric value
numberOfEmp = 4

# The fourth component is the name of the organization,
# which is a single character value
Organization = "GFG"

# Combine these four components of different
# types into one list using list()
empList = list(empId, empName, numberOfEmp, Organization)

print(empList)
Output:
[[1]]
[1] 1 2 3 4

[[2]]
[1] "Nisha"  "Nikhil" "Akshu"  "Sambha"

[[3]]
[1] 4

[[4]]
[1] "GFG"
Accessing List Elements:
- Access components by names: All the components of a list can be named, and those names can be used to access the components with the dollar operator ($).
- Access components by indices: Components can also be accessed by position. Use the double square bracket operator "[[ ]]" to extract a top-level component, and follow it with a single square bracket "[ ]" to reach an element inside that component, as in empList[[1]][2].
Example:
R
# R program to access
# components of a list

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Nisha", "Nikhil", "Akshu", "Sambha")
numberOfEmp = 4
empList = list(
    "ID" = empId,
    "Names" = empName,
    "Total Staff" = numberOfEmp
)
print("Initial List")
print(empList)

# Accessing components by names
cat("\nAccessing name components using $ command\n")
print(empList$Names)

# Accessing top-level components by indices
cat("\nAccessing name components using indices\n")
print(empList[[2]])
print(empList[[1]][2])
print(empList[[2]][4])
Output:
[1] "Initial List"
$ID
[1] 1 2 3 4

$Names
[1] "Nisha"  "Nikhil" "Akshu"  "Sambha"

$`Total Staff`
[1] 4


Accessing name components using $ command
[1] "Nisha"  "Nikhil" "Akshu"  "Sambha"

Accessing name components using indices
[1] "Nisha"  "Nikhil" "Akshu"  "Sambha"
[1] 2
[1] "Sambha"
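The difference between single and double brackets on a list is easy to miss, so here is a minimal sketch of it. The list emp is a shortened, hypothetical version of the one above.

```r
emp <- list(ID = c(1, 2, 3, 4), Names = c("Nisha", "Nikhil"))

# Single brackets return a list that contains the component
sub_list <- emp["ID"]
print(class(sub_list))

# Double brackets return the component itself
element <- emp[["ID"]]
print(class(element))

# Only the extracted component can be indexed further as a vector
second_id <- emp[["ID"]][2]
print(second_id)
```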
Adding and Modifying list elements:
- A list can be modified by accessing a component and assigning a new value to it.
- A new element can be added simply by assigning a value to a name that is not yet in the list.
Example:
R
# R program to add and modify
# components of a list

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Nisha", "Nikhil", "Akshu", "Sambha")
numberOfEmp = 4
empList = list(
    "ID" = empId,
    "Names" = empName,
    "Total Staff" = numberOfEmp
)
print("Initial List")
print(empList)

# Adding a new element
empList[["organization"]] <- "GFG"
cat("\nAfter adding new element\n")
print(empList)

# Modifying a top-level component
empList$"Total Staff" = 5

# Modifying an inner-level component
# (position 5 does not exist yet, so the vector grows)
empList[[1]][5] = 7
cat("\nAfter modification\n")
print(empList)
Output:
[1] "Initial List"
$ID
[1] 1 2 3 4

$Names
[1] "Nisha"  "Nikhil" "Akshu"  "Sambha"

$`Total Staff`
[1] 4


After adding new element
$ID
[1] 1 2 3 4

$Names
[1] "Nisha"  "Nikhil" "Akshu"  "Sambha"

$`Total Staff`
[1] 4

$organization
[1] "GFG"


After modification
$ID
[1] 1 2 3 4 7

$Names
[1] "Nisha"  "Nikhil" "Akshu"  "Sambha"

$`Total Staff`
[1] 5

$organization
[1] "GFG"
Matrices:
A matrix is a rectangular arrangement of values in rows and columns. Matrices are two-dimensional, homogeneous data structures: every element has the same type, most often numeric.
Example:
R
# R program to illustrate a matrix
A = matrix(
    # Taking sequence of elements
    c(1, 4, 5, 6, 3, 8),

    # Number of rows and columns
    nrow = 2, ncol = 3,

    # By default matrices are filled
    # in column-wise order, so this
    # parameter switches to row-wise filling
    byrow = TRUE
)

print(A)
Output:
     [,1] [,2] [,3]
[1,]    1    4    5
[2,]    6    3    8
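To see what the byrow argument changes, the sketch below builds the same values both ways. The names vals, by_col, and by_row are illustrative.

```r
vals <- c(1, 4, 5, 6, 3, 8)

# Default (byrow = FALSE): values fill the first column, then the next
by_col <- matrix(vals, nrow = 2, ncol = 3)
print(by_col)

# byrow = TRUE: values fill the first row, then the next
by_row <- matrix(vals, nrow = 2, ncol = 3, byrow = TRUE)
print(by_row)

# dim() reports rows and columns
print(dim(by_row))
```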
Accessing Matrix Elements:
Matrix elements can be accessed using the matrix name followed by square brackets containing two indices separated by a comma. The value before the comma selects rows and the value after the comma selects columns; leaving either side empty selects all rows or all columns.
Example:
R
# R program to illustrate
# accessing elements of a matrix

# Create a 2x3 matrix
A = matrix(
    c(1, 4, 5, 6, 3, 8),
    nrow = 2, ncol = 3,
    byrow = TRUE
)
cat("The 2x3 matrix:\n")
print(A)

# Accessing single elements by row and column
print(A[1, 1])
print(A[2, 2])

# Accessing first and second row
cat("Accessing first and second row\n")
print(A[1:2, ])

# Accessing first and second column
cat("\nAccessing first and second column\n")
print(A[, 1:2])
Output:
The 2x3 matrix:
     [,1] [,2] [,3]
[1,]    1    4    5
[2,]    6    3    8
[1] 1
[1] 3
Accessing first and second row
     [,1] [,2] [,3]
[1,]    1    4    5
[2,]    6    3    8

Accessing first and second column
     [,1] [,2]
[1,]    1    4
[2,]    6    3
Modifying Matrix Elements:
You can modify the elements of the matrices by a direct assignment.
Example:
R
# R program to illustrate
# editing elements in a matrix

# Create a 2x3 matrix
A = matrix(
    c(1, 4, 5, 6, 3, 8),
    nrow = 2, ncol = 3,
    byrow = TRUE
)
cat("The 2x3 matrix:\n")
print(A)

# Editing the element in the 2nd row
# and 1st column from 6 to 30
# by direct assignment
A[2, 1] = 30

cat("After editing the matrix\n")
print(A)
Output:
The 2x3 matrix:
     [,1] [,2] [,3]
[1,]    1    4    5
[2,]    6    3    8
After editing the matrix
     [,1] [,2] [,3]
[1,]    1    4    5
[2,]   30    3    8
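Direct assignment is not limited to a single cell. As a small sketch beyond the example above, the same syntax can replace a whole row or every element that matches a condition.

```r
A <- matrix(c(1, 4, 5, 6, 3, 8), nrow = 2, ncol = 3, byrow = TRUE)

# Replace the entire first row
A[1, ] <- c(0, 0, 0)

# Replace every element greater than 5
A[A > 5] <- 99

print(A)
```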
DataFrame:
Data frames are generic data objects in R that are used to store tabular data. They are two-dimensional, heterogeneous data structures: each column can have a different type. Internally, a data frame is a list of vectors of equal length, one vector per column.
Example:
R
# R program to illustrate dataframe

# A vector which is a character vector
Name = c("Nisha", "Nikhil", "Raju")

# A vector which is a character vector
Language = c("R", "Python", "C")

# A vector which is a numeric vector
Age = c(40, 25, 10)

# To create a dataframe use the data.frame command
# and pass each of the vectors
# we have created as arguments
# to the function data.frame()
df = data.frame(Name, Language, Age)

print(df)
Output:
    Name Language Age
1  Nisha        R  40
2 Nikhil   Python  25
3   Raju        C  10
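The description of a data frame as a list of equal-length vectors can be checked directly. The following sketch uses a smaller, hypothetical data frame and passes stringsAsFactors = FALSE explicitly so the column types are the same across R versions.

```r
df <- data.frame(
    Name = c("Nisha", "Nikhil", "Raju"),
    Age = c(40, 25, 10),
    stringsAsFactors = FALSE
)

# A data frame is a list under the hood
print(is.list(df))

# Each column is a vector with its own type
col_classes <- sapply(df, class)
print(col_classes)

# dim() gives the number of rows and columns
print(dim(df))
```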
Getting the structure and data from DataFrame:
- The structure of a data frame can be displayed with the str() function.
- A specific column can be extracted from a data frame by its column name, for example with the $ operator.
Example:
R
# R program to get the
# structure of the data frame

# Creating a data frame
friend.data <- data.frame(
    friend_id = c(1:5),
    friend_name = c("Aman", "Nisha", "Nikhil", "Raju", "Raj"),
    stringsAsFactors = FALSE
)

# str() prints the structure itself and returns NULL,
# so it is called directly rather than wrapped in print()
str(friend.data)

# Extracting the friend_name column
result <- data.frame(friend.data$friend_name)
print(result)
Output:
'data.frame':	5 obs. of  2 variables:
 $ friend_id  : int  1 2 3 4 5
 $ friend_name: chr  "Aman" "Nisha" "Nikhil" "Raju" ...
  friend.data.friend_name
1                    Aman
2                   Nisha
3                  Nikhil
4                    Raju
5                     Raj
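Column extraction can return either a plain vector or a one-column data frame, depending on the operator. The sketch below rebuilds the friend.data frame from the example to show the difference; the result variable names are illustrative.

```r
friend.data <- data.frame(
    friend_id = 1:5,
    friend_name = c("Aman", "Nisha", "Nikhil", "Raju", "Raj"),
    stringsAsFactors = FALSE
)

# $ and [[ ]] both return the column as a vector
as_vector <- friend.data$friend_name
same_vector <- friend.data[["friend_name"]]
print(class(as_vector))

# Single brackets with a name return a one-column data frame
as_frame <- friend.data["friend_name"]
print(class(as_frame))
```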
Summary of dataframe:
A statistical summary of each column, together with its type, can be obtained by applying the summary() function.
Example:
R
# R program to get the
# summary of the data frame

# Creating a data frame
friend.data <- data.frame(
    friend_id = c(1:5),
    friend_name = c("Aman", "Nisha", "Nikhil", "Raju", "Raj"),
    stringsAsFactors = FALSE
)

# Using summary()
print(summary(friend.data))
Output:
   friend_id friend_name
 Min.   :1   Length:5
 1st Qu.:2   Class :character
 Median :3   Mode  :character
 Mean   :3
 3rd Qu.:4
 Max.   :5
Arrays:
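Arrays are R data objects that can store data in any number of dimensions, and they are typically used when more than two are needed. Arrays are n-dimensional, homogeneous data structures; a matrix is the two-dimensional special case.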
Example:
R
# R program to illustrate an array
A = array(
    # Taking sequence of elements
    c(2, 4, 5, 7, 1, 8, 9, 2),

    # Creating two rectangular matrices
    # each with two rows and two columns
    dim = c(2, 2, 2)
)

print(A)
Output:
, , 1

     [,1] [,2]
[1,]    2    5
[2,]    4    7

, , 2

     [,1] [,2]
[1,]    1    9
[2,]    8    2
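As a brief sketch of how the dimensions relate to indexing, the code below inspects the same array and picks out one element. It also checks that a matrix counts as a two-dimensional array. The variable names are illustrative.

```r
A <- array(c(2, 4, 5, 7, 1, 8, 9, 2), dim = c(2, 2, 2))

# Number of dimensions
n_dims <- length(dim(A))
print(n_dims)

# Row 1, column 2 of the second matrix
elem <- A[1, 2, 2]
print(elem)

# A matrix is an array with exactly two dimensions
M <- matrix(1:4, nrow = 2)
print(is.array(M))
```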
Accessing arrays:
The arrays can be accessed by using indices for different dimensions separated by commas. Different components can be specified by any combination of elements’ names or positions.
Example:
R
vec1 <- c(2, 4, 5, 7, 1, 8, 9, 2)
vec2 <- c(12, 21, 34)
row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3")
mat_names <- c("Mat1", "Mat2")

# The 11 values are recycled to fill all 12 cells
arr = array(c(vec1, vec2), dim = c(2, 3, 2),
            dimnames = list(row_names, col_names, mat_names))

# Accessing matrix 1 by index value
print("Matrix 1")
print(arr[, , 1])

# Accessing matrix 2 by its name
print("Matrix 2")
print(arr[, , "Mat2"])

# Accessing a column of matrix 1 by index values
print("1st column of matrix 1")
print(arr[, 1, 1])

# Accessing a row of matrix 2 by names
print("2nd row of matrix 2")
print(arr["row2", , "Mat2"])

# Accessing a single element by mixing positions and names
print("2nd row 3rd column matrix 1 element")
print(arr[2, "col3", 1])

# Accessing a single element by names only
print("2nd row 1st column element of matrix 2")
print(arr["row2", "col1", "Mat2"])

# Print both rows of columns 2 and 3 of matrix 1
print(arr[, c(2, 3), 1])
Output:
[1] "Matrix 1"
     col1 col2 col3
row1    2    5    1
row2    4    7    8
[1] "Matrix 2"
     col1 col2 col3
row1    9   12   34
row2    2   21    2
[1] "1st column of matrix 1"
row1 row2
   2    4
[1] "2nd row of matrix 2"
col1 col2 col3
   2   21    2
[1] "2nd row 3rd column matrix 1 element"
[1] 8
[1] "2nd row 1st column element of matrix 2"
[1] 2
     col2 col3
row1    5    1
row2    7    8
Adding elements to array:
Elements can be appended to a one-dimensional array (an ordinary R vector) at different positions. Existing elements keep their order. R offers several built-in ways to add new values:
- c(vector, values): appends the values at the end
- append(vector, values): appends the values at the end, or after a given position when the after argument is supplied
- Assigning to an index beyond the current length, computed with length()
Example:
R
# Creating a one-dimensional array (a vector)
x <- c(1, 2, 3, 4, 5)

# Addition of element using c() function
x <- c(x, 6)
print("Array after 1st modification ")
print(x)

# Addition of element using append function
x <- append(x, 7)
print("Array after 2nd modification ")
print(x)

# Adding an element after computing the length
len <- length(x)
x[len + 1] <- 8
print("Array after 3rd modification ")
print(x)

# Assigning at index len + 3 (len is still 7),
# which leaves a gap that R fills with NA
x[len + 3] <- 9
print("Array after 4th modification ")
print(x)

# Append a vector of values; an 'after' position
# beyond the end simply appends at the end
print("Array after 5th modification")
x <- append(x, c(10, 11, 12), after = length(x) + 3)
print(x)

# Insert new elements after the 3rd index
print("Array after 6th modification")
x <- append(x, c(-1, -1), after = 3)
print(x)
Output:
[1] "Array after 1st modification "
[1] 1 2 3 4 5 6
[1] "Array after 2nd modification "
[1] 1 2 3 4 5 6 7
[1] "Array after 3rd modification "
[1] 1 2 3 4 5 6 7 8
[1] "Array after 4th modification "
 [1]  1  2  3  4  5  6  7  8 NA  9
[1] "Array after 5th modification"
 [1]  1  2  3  4  5  6  7  8 NA  9 10 11 12
[1] "Array after 6th modification"
 [1]  1  2  3 -1 -1  4  5  6  7  8 NA  9 10 11 12
Removing Elements from Array:
- Elements can be removed from arrays in R, either one at a time or several together. A logical condition is used as the index: elements that satisfy the condition are retained and the rest are dropped.
- Another way to remove elements is the %in% operator, which returns TRUE for each element whose value appears in a given set. Negating the result with ! keeps only the elements that are not in that set.
Example:
R
# Creating an array of length 9
m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print("Original Array")
print(m)

# Remove a single value (3) from the array
m <- m[m != 3]
print("After 1st modification")
print(m)

# Keep only the elements that are
# greater than 2 and less than or
# equal to 8
m <- m[m > 2 & m <= 8]
print("After 2nd modification")
print(m)

# Remove a set of values given
# in another array
remove <- c(4, 6, 8)

# Check which elements are in the
# set of values to remove
print(m %in% remove)
print("After 3rd modification")
print(m[!m %in% remove])
Output:
[1] "Original Array"
[1] 1 2 3 4 5 6 7 8 9
[1] "After 1st modification"
[1] 1 2 4 5 6 7 8 9
[1] "After 2nd modification"
[1] 4 5 6 7 8
[1]  TRUE FALSE  TRUE FALSE  TRUE
[1] "After 3rd modification"
[1] 5 7
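The example above removes elements by value. Removal by position is also possible with negative indices; this is not shown in the example, so treat the following as a supplementary sketch with illustrative names.

```r
m <- c(10, 20, 30, 40, 50)

# Drop the element at position 2
drop_one <- m[-2]
print(drop_one)

# Drop the elements at positions 1 and 3
drop_many <- m[-c(1, 3)]
print(drop_many)
```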
Factors:
Factors are the data objects which are used to categorize the data and store it as levels. They are useful for storing categorical data.
Example:
R
# Creating a vector
x <- c("female", "male", "other", "female", "other")

# Converting the vector x into
# a factor named gender
gender <- factor(x)
print(gender)
Output:
[1] female male   other  female other
Levels: female male other
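To make "stored as levels" more concrete, the sketch below looks inside the same factor: the distinct levels, the integer code kept for each element, and a count per level. The variable names lv, codes, and counts are illustrative.

```r
gender <- factor(c("female", "male", "other", "female", "other"))

# The distinct categories, sorted alphabetically by default
lv <- levels(gender)
print(lv)

# Each element is stored as an integer position in the levels
codes <- as.integer(gender)
print(codes)

# Number of observations per level
counts <- table(gender)
print(counts)
```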
Accessing elements of a Factor:
The elements of a factor are accessed in the same way as the elements of a vector, using the subscript operator.
Example:
R
x <- c("female", "male", "other", "female", "other")

# Convert to a factor and access its third element
gender <- factor(x)
print(gender[3])
Output:
[1] other
Levels: female male other
Modifying of a Factor:
After a factor is created, its elements can be modified, but each new value must be one of the factor's existing levels.
Example:
R
x <- c("female", "male", "other", "female", "other")
gender <- factor(x)

# "male" is an existing level, so the assignment is allowed
gender[1] <- "male"
print(gender)
Output:
[1] male   male   other  female other
Levels: female male other
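What happens when the new value is not an existing level is worth seeing once. In the sketch below, the value "unknown" is a hypothetical category: assigning it directly produces NA with a warning, while adding it to the levels first makes the assignment succeed.

```r
gender <- factor(c("female", "male", "other", "female", "other"))

# Assigning an existing level works
gender[1] <- "male"

# Assigning a value outside the levels yields NA
# (R also emits a warning, suppressed here)
rejected <- gender
suppressWarnings(rejected[2] <- "unknown")
print(rejected)

# Extend the levels first, then assign
levels(gender) <- c(levels(gender), "unknown")
gender[2] <- "unknown"
print(gender)
```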
Learn R Programming
R is a programming language used mainly for machine learning, data analysis, and statistical computing. It is an interpreted language and is platform independent, which means it can be used on platforms like Windows, Linux, and macOS.
In this R language tutorial, we will learn R programming from scratch to an advanced level. The tutorial is suitable for both beginners and experienced developers.