How to Convert R Code into C++ for an Rcpp Implementation
When R code runs into performance problems, R alone is sometimes not fast enough. Rcpp, a package for the R programming language, addresses this by integrating C++ code into R, which can bring significant speedups, especially for computationally intensive tasks such as explicit loops. This guide walks through the process of converting R code into C++ using Rcpp.
What is Rcpp?
Rcpp is an R package that provides a simple and efficient way to write high-performance R functions in C++. It gives C++ code direct access to R data structures and functions, making it easier to bridge the gap between R and C++.
Why use Rcpp?
Improved performance
C++ is a compiled, lower-level language, so loop-heavy computations written in C++ and called through Rcpp often run significantly faster than the equivalent R code. The gain is usually largest for code that cannot easily be vectorized in R.
Seamless integration and Developer Friendly API
Rcpp provides a simple syntax for writing C++ code inside R scripts, so developers do not need to leave their R workflow. It also provides C++ classes that mirror R's data structures, such as NumericVector, CharacterVector, List, and DataFrame, so R objects can be handled naturally and efficiently from C++.
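As a small illustration of these classes, the sketch below takes a NumericVector and returns a List. The function name describeC and the list fields n and mean are hypothetical, chosen only for this example.

```r
library(Rcpp)

# Hypothetical helper (not from the original article): summarises a numeric
# vector and returns the result as an R list.
cppFunction('
List describeC(NumericVector x) {
  int n = x.size();      // number of elements
  double m = mean(x);    // Rcpp sugar version of mean()
  return List::create(Named("n") = n, Named("mean") = m);
}')

res <- describeC(c(1, 2, 3))
str(res)
```

The C++ function is called like any other R function, and the List it returns arrives in R as an ordinary named list.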
Installation
R
install.packages("Rcpp")
To use it, you will also need a C++ compiler:
- On Windows, install Rtools.
- On Mac, install Xcode from the App Store.
- On Linux, run sudo apt-get install r-base-dev or the equivalent for your distribution; on some distributions a compiler is already present.
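One quick way to check that the compiler toolchain works is to compile a trivial expression with evalCpp() from Rcpp. This is only a minimal sketch; if no working compiler is found, the call fails with a build error instead of returning a value.

```r
library(Rcpp)

# Compiles and runs a one-line C++ expression; this only succeeds
# if a working C++ compiler is installed.
evalCpp("2 + 2")
```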
Implementation
Rcpp can be used in two ways:
- Inline C++: the C++ code is written in the same R script file, using cppFunction().
- sourceCpp: the C++ code lives in standalone files that are imported into the R script, which keeps the code modular.
Using Inline C++
Example 1
Summing the elements of two vectors using Rcpp
The classes for the most common types of R vectors based on their data type are:
- NumericVector
- IntegerVector
- CharacterVector
- LogicalVector
R
# R code to sum the elements of two vectors
sumTwoR <- function(x, y) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i] + y[i]
  }
  return(total)
}

x <- c(1, 2, 3)
y <- c(4, 5, 6)

# Calling sumTwoR
sumTwoR(x, y)
Output:
[1] 21
First, we create a function called sumTwoR that uses a for loop to add up all the elements of two vectors x and y.
Then we call it as sumTwoR(x, y), passing x = (1, 2, 3) and y = (4, 5, 6) as arguments.
Converted to Rcpp
Next we create a C++ equivalent, sumTwoC(x, y), of the above function inside the R script. To do this we use the cppFunction() function from the Rcpp package.
R
library(Rcpp)

# Code in C++ to calculate the sum of vector values
cppFunction('
double sumTwoC(NumericVector x, NumericVector y) {
  int n = x.size();
  double total = 0;
  for (int i = 0; i < n; ++i) {
    total += x[i] + y[i];
  }
  return total;
}')

# Similarly calling sumTwoC
sumTwoC(c(1, 2, 3), c(4, 5, 6))
Output
[1] 21
First, we load the Rcpp package with library(Rcpp). The parts of the sumTwoC definition are:
- cppFunction: An R function from the Rcpp package used to define and compile C++ code within an R script.
- double: A C++ data type indicating that the function returns a floating-point value.
- sumTwoC: The name of the C++ function defined in this code snippet.
- NumericVector: An Rcpp class representing a vector of numeric values passed as input to the function.
- x and y: Two input parameters of type NumericVector representing two vectors of numeric values.
- int: A C++ data type indicating an integer value.
- n: An integer variable holding the size (length) of the input vector x.
- for loop: A loop that iterates from 0 to n-1, adding the corresponding elements of x and y and accumulating the result in the total variable.
- return: A keyword indicating the value to be returned by the function.
- total: The accumulated sum of corresponding elements from x and y, which is returned as the output of the function.
We call sumTwoC in the same way as sumTwoR. Only the function definition has changed; everything else in the R code stays the same. This is the seamless integration mentioned earlier.
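Note that sumTwoC reads y[i] for every index of x without checking that the two vectors have the same length. A hedged sketch of a stricter variant follows; the name sumTwoCheckedC and the error message are assumptions made for illustration, and stop() is Rcpp's way of raising an R error from C++.

```r
library(Rcpp)

# Hypothetical variant of sumTwoC that validates its inputs first
cppFunction('
double sumTwoCheckedC(NumericVector x, NumericVector y) {
  // Refuse inputs of different lengths instead of reading past the end of y
  if (x.size() != y.size()) {
    stop("x and y must have the same length");
  }
  int n = x.size();
  double total = 0;
  for (int i = 0; i < n; ++i) {
    total += x[i] + y[i];
  }
  return total;
}')

sumTwoCheckedC(c(1, 2, 3), c(4, 5, 6))    # same result as sumTwoC: 21
try(sumTwoCheckedC(c(1, 2, 3), c(4, 5)))  # raises an R error
```

The error surfaces in R like any other error, so it can be caught with try() or tryCatch().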
Example 2: Matrix input
The classes for the most common types of R matrices based on their data type are:
- NumericMatrix
- IntegerMatrix
- CharacterMatrix
- LogicalMatrix
R
multipleMatricesR <- function(A, B) {
  return(A %*% B)
}

# Define two 3x3 matrices
A <- matrix(1:9, nrow = 3, ncol = 3)
B <- matrix(1:9, nrow = 3, ncol = 3)

# Multiply the matrices using the R function
multipleMatricesR(A, B)
Output:
[,1] [,2] [,3]
[1,] 30 66 102
[2,] 36 81 126
[3,] 42 96 150
Converted to Rcpp
R
# Rcpp code to multiply two matrices
library(Rcpp)

cppFunction('
NumericMatrix multiplyMatricesC(NumericMatrix A, NumericMatrix B) {
  int n = A.nrow(), k = A.ncol(), m = B.ncol();
  NumericMatrix C(n, m);
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < m; ++j) {
      for (int l = 0; l < k; ++l) {
        C(i, j) += A(i, l) * B(l, j);
      }
    }
  }
  return C;
}')

# Define two 3x3 matrices
A <- matrix(1:9, nrow = 3, ncol = 3)
B <- matrix(1:9, nrow = 3, ncol = 3)

# Multiply the matrices using the C++ function
multiplyMatricesC(A, B)
Output
[,1] [,2] [,3]
[1,] 30 66 102
[2,] 36 81 126
[3,] 42 96 150
- cppFunction: An R function from the Rcpp library used to define and compile C++ code within an R script.
- NumericMatrix: An Rcpp class representing a matrix of numeric values.
- multiplyMatricesC: The name of the C++ function defined in this code snippet.
- A and B: Input parameters of type NumericMatrix representing the two matrices.
- return: A keyword indicating the value to be returned by the function.
- for loops: Nested loops that iterate over the rows and columns of matrices A, B, and C to perform the matrix multiplication, whose result is stored in matrix C.
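To confirm that the hand-written loops agree with R's built-in %*% operator, the two results can be compared with all.equal(). This sketch repeats the multiplyMatricesC definition so that it runs on its own; the random test matrices and their sizes are arbitrary choices made for illustration.

```r
library(Rcpp)

cppFunction('
NumericMatrix multiplyMatricesC(NumericMatrix A, NumericMatrix B) {
  int n = A.nrow(), k = A.ncol(), m = B.ncol();
  NumericMatrix C(n, m);  // elements start at zero
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < m; ++j) {
      for (int l = 0; l < k; ++l) {
        C(i, j) += A(i, l) * B(l, j);
      }
    }
  }
  return C;
}')

# Non-square inputs exercise all three dimensions (4x3 times 3x2)
set.seed(1)
A <- matrix(runif(12), nrow = 4, ncol = 3)
B <- matrix(runif(6), nrow = 3, ncol = 2)

# all.equal() compares with a small numerical tolerance
all.equal(multiplyMatricesC(A, B), A %*% B)
```

A tolerance-based comparison is used because R's %*% may add the products in a different order than the explicit loops, which can change the last few digits.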
The other way of using Rcpp is to keep the C++ code in standalone files and import them into the R script using sourceCpp().
Using sourceCpp()
The sourceCpp("path to cpp file") function allows us to separate the C++ code from the R script by sourcing the C++ code from a separate file. Instead of including the C++ code directly in the R file, we pass the path of the C++ file to sourceCpp().
This approach promotes modularity and organization, making it easier to manage and maintain the C++ code separately from the R script. By sourcing the C++ code using sourceCpp(), the C++ functions defined in the file become accessible within the R environment, enabling us to utilize their functionality seamlessly within our R code.
Example
We have seen vectors and matrices; now let's work with data frames. The function below counts how many students received each grade.
R
# Create a data frame with id, name and grades
df <- data.frame(
  id = c(1, 2, 3, 4, 5, 6),
  name = c("John", "Smith", "Jane", "Doe", "Peter", "Parker"),
  grades = c("A", "B", "A", "C", "D", "C")
)

# Function to count the number of grades
count_grades <- function(df) {
  grades <- df$grades
  unique_grades <- unique(grades)
  counts <- c()
  for (grade in unique_grades) {
    count <- length(grades[grades == grade])
    counts <- c(counts, count)
  }
  return(data.frame(grade = unique_grades, count = counts))
}

count_grades(df)
Output:
grade count
1 A 2
2 B 1
3 C 2
4 D 1
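The C++ listing for this example does not appear in the text, so the following is a reconstruction assembled from the line-by-line notes that follow, not the author's original file. It writes the source to grades.cpp from R so that the snippet is self-contained. The // [[Rcpp::export]] attribute is an addition that the notes do not mention, but sourceCpp() needs it to expose the function to R.

```r
library(Rcpp)

# Reconstructed C++ source, held in an R string and written to grades.cpp
grades_cpp <- '
#include <Rcpp.h>
#include <map>
using namespace Rcpp;

// [[Rcpp::export]]
DataFrame count_grades_in_cpp(DataFrame df) {
  std::map<std::string, int> grade_count;
  int n = df.nrows();
  CharacterVector grades = df["grades"];

  // Count how often each grade occurs
  for (int i = 0; i < n; i++) {
    grade_count[as<std::string>(grades[i])]++;
  }

  // Copy the map into two vectors for the output data frame
  CharacterVector grade_name;
  IntegerVector grade_count_vec;
  for (auto it = grade_count.begin(); it != grade_count.end(); it++) {
    grade_name.push_back(it->first);
    grade_count_vec.push_back(it->second);
  }

  return DataFrame::create(_["grade"] = grade_name,
                           _["count"] = grade_count_vec);
}
'

writeLines(grades_cpp, "grades.cpp")
sourceCpp("grades.cpp")

# Small check with the same grades as the R example
check_df <- data.frame(grades = c("A", "B", "A", "C", "D", "C"),
                       stringsAsFactors = FALSE)
res <- count_grades_in_cpp(check_df)
res
```

std::map keeps its keys sorted, so the grades come back in alphabetical order. For this data that happens to match the order produced by unique() in the R version.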
Let's go through the C++ code for count_grades_in_cpp (stored in grades.cpp) line by line:
- #include <Rcpp.h>: This line includes the Rcpp header, which allows seamless integration of C++ code with R.
- #include <map>: This line includes the standard C++ header for std::map, which will be used to store the count of grades.
- using namespace Rcpp;: This line brings the Rcpp namespace into the current scope, so we can use Rcpp functions and classes without explicitly specifying the namespace.
- DataFrame count_grades_in_cpp(DataFrame df): This line defines the function count_grades_in_cpp, which takes a DataFrame df as input and returns a DataFrame as output.
- std::map<std::string, int> grade_count;: This line declares a map called grade_count, where the key is a std::string representing the grade name, and the value is an int representing the count of that grade.
- int n = df.nrows();: This line gets the number of rows in the input DataFrame df and stores it in the variable n.
- CharacterVector grades = df["grades"];: This line extracts the column named "grades" from the input DataFrame df and stores it in a CharacterVector called grades.
- for (int i = 0; i < n; i++) { ... }: This is a loop that iterates over each element of the grades vector.
- grade_count[as<std::string>(grades[i])]++;: This line converts the i-th element of the grades vector to a std::string using as<std::string>() and then uses it as a key to access the corresponding value in the grade_count map. The value is then incremented by one.
- CharacterVector grade_name;: This line declares an empty CharacterVector called grade_name, which will store the grade names for the output DataFrame.
- IntegerVector grade_count_vec;: This line declares an empty IntegerVector called grade_count_vec, which will store the grade counts for the output DataFrame.
- for (auto it = grade_count.begin(); it != grade_count.end(); it++) { ... }: This loop iterates over the grade_count map using an iterator it.
- grade_name.push_back(it->first);: This line adds the key (grade name) of the current element pointed to by the iterator it to the grade_name vector.
- grade_count_vec.push_back(it->second);: This line adds the value (grade count) of the current element pointed to by the iterator it to the grade_count_vec vector.
- return DataFrame::create(_["grade"] = grade_name, _["count"] = grade_count_vec);: This line creates a new DataFrame using DataFrame::create() with two named columns: "grade" (using the grade_name vector) and "count" (using the grade_count_vec vector). The created DataFrame is then returned as the output of the function.
Now, create an R script that imports grades.cpp using sourceCpp("grades.cpp"):
R
# Create a data frame with id, name and grades
df <- data.frame(
  id = c(1, 2, 3, 4, 5, 6),
  name = c("John", "Smith", "Jane", "Doe", "Peter", "Parker"),
  grades = c("A", "B", "A", "C", "D", "C")
)

library(Rcpp)
sourceCpp("grades.cpp")

# Call the function
count_grades_in_cpp(df)
Output
grade count
1 A 2
2 B 1
3 C 2
4 D 1
Here we use sourceCpp("grades.cpp") to import the C++ code from the grades.cpp file, which contains the implementation of the count_grades_in_cpp function. The function becomes available in the R environment after sourcing the file. Then we call it directly in our R script using count_grades_in_cpp(df).
Note
If grades.cpp is not in the working directory, replace it with the path to the file, i.e. sourceCpp("path to C++ file").
Benchmarking
To understand the benefit of the C++ version, we'll benchmark both functions, sumTwoR and sumTwoC, using the microbenchmark package. Install it with install.packages("microbenchmark").
R
library(Rcpp)
library(microbenchmark)

# C++ code to calculate the sum of vector values
cppFunction('
double sumTwoC(NumericVector x, NumericVector y) {
  int n = x.size();
  double total = 0;
  for (int i = 0; i < n; ++i) {
    total += x[i] + y[i];
  }
  return total;
}')

# R code to sum the elements of two vectors
sumTwoR <- function(x, y) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i] + y[i]
  }
  return(total)
}

# Benchmarking the two functions on one million random values each
x <- runif(1e6)
y <- runif(1e6)

# Run each function 100 times and print the timings
benchmark <- microbenchmark(sumTwoC(x, y), sumTwoR(x, y), times = 100)
print(benchmark)
Unit: milliseconds
          expr       min        lq      mean    median        uq       max neval
 sumTwoC(x, y)  1.100829  1.142826  1.250264  1.185279  1.231635  2.252383   100
 sumTwoR(x, y) 44.789824 46.995613 49.763670 48.157485 50.708915 73.723904   100
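Here we can see that the minimum time taken by sumTwoC is about 1.10 ms, while sumTwoR takes at least 44.79 ms. The medians (1.19 ms versus 48.16 ms) tell the same story: on this run the C++ version is roughly 40 times faster.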
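Before trusting a speedup, it is worth checking that both versions return the same answer. The sketch below does that with all.equal() and then computes the ratio of median timings. The expression labels r and cpp, the smaller input size, and times = 10 are choices made here to keep the example quick; the actual ratio depends on the machine.

```r
library(Rcpp)
library(microbenchmark)

cppFunction('
double sumTwoC(NumericVector x, NumericVector y) {
  int n = x.size();
  double total = 0;
  for (int i = 0; i < n; ++i) {
    total += x[i] + y[i];
  }
  return total;
}')

sumTwoR <- function(x, y) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i] + y[i]
  }
  return(total)
}

x <- runif(1e5)
y <- runif(1e5)

# The two loops group the additions slightly differently, so compare with a
# tolerance rather than with ==
stopifnot(isTRUE(all.equal(sumTwoC(x, y), sumTwoR(x, y))))

# Named expressions give short labels in the summary table
bm <- microbenchmark(r = sumTwoR(x, y), cpp = sumTwoC(x, y), times = 10)
s <- summary(bm, unit = "ms")
med <- setNames(s$median, as.character(s$expr))

# How many times faster the C++ version is, by median time
unname(med["r"] / med["cpp"])
```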
Plotting the benchmark results
I’m using ggplot2 for plotting the benchmark results.
R
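# Plotting the results
library(ggplot2)

# microbenchmark stores each run's time in nanoseconds; convert to milliseconds
bench_df <- as.data.frame(benchmark)
bench_df$time_ms <- bench_df$time / 1e6

# One bar per function, showing the median time over all runs
ggplot(bench_df, aes(x = expr, y = time_ms, fill = expr)) +
  stat_summary(fun = median, geom = "bar") +
  theme_bw() +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.text = element_text(face = "bold")) +
  labs(x = "Function", y = "Median time (ms)",
       title = "Benchmarking sumTwoC and sumTwoR")

# Save the plot
ggsave("benchmark.png", width = 10, height = 5, dpi = 100)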
Output: a horizontal bar chart comparing the timings of sumTwoC and sumTwoR, saved as benchmark.png.
Conclusion
We have learned that Rcpp is a powerful tool that can be used to implement R code in C++. This can lead to significant performance improvements, as C++ is a compiled language that is typically much faster than R. To learn more you can check out the official website of Rcpp.