Statistics

The word statistics can refer simply to numerical data. As a field, it is the area of applied mathematics concerned with the collection, tabulation, analysis, interpretation and presentation of data. Statistics deals with how data can be used to solve complex problems.

Mean, Median and Mode:

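  • Mean: the sum of all observations divided by the total number of observations.
  • Median: the middle value of the data set once it is sorted. When the number of observations is even, it is the average of the two middle values.
  • Mode: the value with the highest frequency in the data set. R has no built-in function for the statistical mode, so the example below defines one. The base function mode() does something else: it returns the type of an object, such as "numeric".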

Example:

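R

# Create the data
A <- c(17, 12, 8, 53, 1, 12,
       43, 17, 43, 10)

print(mean(A))
print(median(A))

# User-defined function for the statistical mode.
# Named get_mode to avoid masking base R's mode() function.
# If several values tie, the one that appears first in x is returned.
get_mode <- function(x) {
  a <- unique(x)                        # distinct values in order of appearance
  a[which.max(tabulate(match(x, a)))]   # value with the highest count
}

# Calculate the mode using the user-defined function
print(get_mode(A))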


Output:

[1] 21.6
[1] 14.5
[1] 17
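The user-defined mode function above returns only one value, even when several values tie for the highest frequency. In A, the values 12, 17 and 43 each appear twice, and 17 is printed only because it appears first. The sketch below defines a hypothetical helper, all_modes(), which returns every tied value using base R's table(). It also checks by hand that median() averages the two middle values when the number of observations is even, as it is here (10 values).

```r
# Same data as in the example above
A <- c(17, 12, 8, 53, 1, 12,
       43, 17, 43, 10)

# Hypothetical helper: return every value that has the highest frequency
all_modes <- function(x) {
  counts <- table(x)                                 # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])   # all values tied for the maximum
}

print(all_modes(A))   # [1] 12 17 43

# With an even number of observations, the median is
# the mean of the two middle values of the sorted data
sorted <- sort(A)
n <- length(sorted)
manual_median <- (sorted[n / 2] + sorted[n / 2 + 1]) / 2
print(manual_median == median(A))
```

table() sorts its values, so the tied modes come back in increasing order rather than in order of appearance.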

Note: For more information, refer to Mean, Median and Mode in R Programming

Normal Distribution:

The normal distribution describes how the values of a continuous variable are spread symmetrically around their mean, in the familiar bell-shaped curve. Quantities such as the heights of a population, shoe sizes and IQ scores are approximately normally distributed. R has four built-in functions for working with the normal distribution:

  • dnorm() returns the probability density function (PDF) of the distribution at x.
    dnorm(x, mean, sd)
  • pnorm() is the cumulative distribution function (CDF). It returns the probability that a normally distributed random variable X takes a value less than or equal to x.
    pnorm(x, mean, sd)
  • qnorm() is the inverse of pnorm(). It takes a probability p and returns the value (quantile) x for which pnorm(x) equals p.
    qnorm(p, mean, sd)
  • rnorm() generates a vector of n random numbers drawn from the normal distribution.
    rnorm(n, mean, sd)
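The descriptions above can be checked numerically. The sketch below uses the standard normal distribution (mean = 0 and sd = 1, which are the default values of these arguments). It confirms three things:

  • pnorm() at the mean is 0.5.
  • qnorm() undoes pnorm().
  • About 95% of values lie within 1.96 standard deviations of the mean.

set.seed() is added only so that the random part is reproducible. The seed value 42 is arbitrary.

```r
# pnorm() at the mean of a normal distribution is 0.5
p_mean <- pnorm(0, mean = 0, sd = 1)

# qnorm() is the inverse of pnorm(): qnorm(pnorm(x)) gives back x
x <- 1.5
round_trip <- qnorm(pnorm(x))

# Probability of falling within 1.96 standard deviations of the mean (about 0.95)
within_196 <- pnorm(1.96) - pnorm(-1.96)

# dnorm() gives the density; for sd = 1 its peak value is 1 / sqrt(2 * pi)
peak <- dnorm(0)

# rnorm() draws random values; their sample mean should be close to the true mean
set.seed(42)
sample_mean <- mean(rnorm(10000, mean = 5, sd = 2))

print(c(p_mean, round_trip, within_196, peak, sample_mean))
```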

Example:

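R

# Create a sequence of values from -10 to 10
# in steps of 0.1
x <- seq(-10, 10, by = 0.1)

# Density at each value of x
y <- dnorm(x, mean(x), sd(x))
plot(x, y, main = 'dnorm')

# Cumulative probability at each value of x
y <- pnorm(x, mean(x), sd(x))
plot(x, y, main = 'pnorm')

# qnorm() expects probabilities between 0 and 1,
# so use a sequence of probabilities instead of x
p <- seq(0.01, 0.99, by = 0.01)
y <- qnorm(p, mean(x), sd(x))
plot(p, y, main = 'qnorm')

# Draw as many random values as there are elements in x
r <- rnorm(length(x), mean(x), sd(x))
hist(r, breaks = 50, main = 'rnorm')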


 
Output: the code draws four plots, titled dnorm, pnorm, qnorm and rnorm.

Note: For more information, refer to Normal Distribution in R

Binomial Distribution in R Programming:

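The binomial distribution is a discrete probability distribution. It counts the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes (success or failure) and the same probability of success. For example, it can model:

  • whether lottery tickets win or not,
  • whether a drug cures patients or not,
  • the number of heads in a finite number of coin tosses,
  • the number of sixes in several rolls of a die.

R provides four functions for the binomial distribution:

  • dbinom() returns the probability of exactly k successes, P(X = k).
    dbinom(k, n, p)
  • pbinom() returns the cumulative probability of at most k successes, P(X ≤ k).
    pbinom(k, n, p)

Here n is the total number of trials, p is the probability of success, and k is the value at which the probability is computed.

  • qbinom() is the inverse of pbinom(). It returns the smallest k for which P(X ≤ k) is at least P.
    qbinom(P, n, p)

Here P is the probability, n is the total number of trials and p is the probability of success.

  • rbinom() generates n random values, each being the number of successes in N trials.
    rbinom(n, N, p)

Here n is the number of observations, N is the total number of trials and p is the probability of success. R's own argument names for the number of trials and the probability of success are size and prob, as in dbinom(x = 3, size = 10, prob = 1 / 6).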
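The relationships between these functions can be checked directly. The sketch below uses the die example, with n = 10 rolls and p = 1/6 for rolling a six. It shows three things:

  • dbinom() matches the binomial formula choose(n, k) * p^k * (1 - p)^(n - k).
  • pbinom() is the running sum of dbinom().
  • qbinom() returns the smallest k whose cumulative probability reaches P.

```r
n <- 10      # number of trials (rolls of a die)
p <- 1 / 6   # probability of success (rolling a six)
k <- 0:n     # possible numbers of successes

# dbinom() agrees with the binomial formula
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)
by_dbinom  <- dbinom(k, size = n, prob = p)
print(all.equal(by_formula, by_dbinom))

# pbinom() is the cumulative sum of dbinom(): P(X <= k)
print(all.equal(cumsum(by_dbinom), pbinom(k, size = n, prob = p)))

# qbinom() returns the smallest k with P(X <= k) >= P
P <- 0.5
k_median <- qbinom(P, size = n, prob = p)
print(k_median)   # 2, since P(X <= 1) is about 0.48 and P(X <= 2) is about 0.78
print(pbinom(k_median - 1, size = n, prob = p) < P &&
      pbinom(k_median, size = n, prob = p) >= P)
```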

Example:

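R

# Probability of exactly 0 to 10 sixes in 10 rolls of a die
probabilities <- dbinom(x = 0:10, size = 10, prob = 1 / 6)
plot(0:10, probabilities, type = "l", main = 'dbinom')

# Cumulative probability of at most 0 to 10 sixes
probabilities <- pbinom(0:10, size = 10, prob = 1 / 6)
plot(0:10, probabilities, type = "l", main = 'pbinom')

# Quantiles for probabilities from 0 to 1 in steps of 0.1
x <- seq(0, 1, by = 0.1)
y <- qbinom(x, size = 13, prob = 1 / 6)
plot(x, y, type = 'l', main = 'qbinom')

# 8 random counts of successes, each out of 13 trials
samples <- rbinom(8, size = 13, prob = 1 / 6)
hist(samples, main = 'rbinom')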


 
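Output: the code draws three line plots (for dbinom, pbinom and qbinom) and a histogram of the rbinom samples.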

Note: For more information, refer to Binomial Distribution in R Programming

Time Series Analysis:

Time series analysis studies how a quantity changes over a period of time. In R, a time series object is created with the ts() function. It attaches a start time and a frequency (the number of observations per unit of time) to a vector of values.

Example: Take the COVID-19 pandemic. The data vector x holds the total (cumulative) number of positive COVID-19 cases worldwide, recorded weekly from 22 January 2020 to 15 April 2020.

R

# library required for the ymd() and decimal_date() functions
library(lubridate)

# Weekly cumulative COVID-19 positive cases worldwide
# from 22 January 2020 to 15 April 2020
x <- c(580, 7813, 28266, 59287, 75700,
       87820, 95314, 126214, 218843, 471497,
       936851, 1508725, 2072113)

# Create a weekly time series object starting on 22 January 2020
# (365.25 / 7 is the average number of weeks in a year)
mts <- ts(x, start = decimal_date(ymd("2020-01-22")),
          frequency = 365.25 / 7)

# Plot the series
plot(mts, xlab = "Weekly Data",
     ylab = "Total Positive Cases",
     main = "COVID-19 Pandemic",
     col.main = "darkgreen")


Output: a line plot titled "COVID-19 Pandemic" showing the total positive cases for each week.

Note: For more information, refer to Time Series Analysis in R
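The values in x are cumulative totals, so the number of new cases in each week can be obtained with diff(), which also works on ts objects. The sketch below avoids the lubridate dependency. It approximates the start date as week 4 of 2020 with frequency = 52; that approximation is an assumption made for this sketch and differs from the decimal_date() approach used above. The base R functions time() and window() read the time index and extract a sub-period.

```r
# Weekly cumulative COVID-19 cases from the example above
x <- c(580, 7813, 28266, 59287, 75700,
       87820, 95314, 126214, 218843, 471497,
       936851, 1508725, 2072113)

# Assumption: treat 22 January 2020 as week 4 of 2020, with 52 weeks per year
cumulative <- ts(x, start = c(2020, 4), frequency = 52)

# New cases per week = difference between consecutive cumulative totals
new_cases <- diff(cumulative)
print(new_cases)

# Time (as a decimal year) of the week with the largest increase
print(time(new_cases)[which.max(new_cases)])

# Keep only the last four weeks of the cumulative series
print(window(cumulative, start = c(2020, 13)))

plot(new_cases, xlab = "Week", ylab = "New Positive Cases",
     main = "Weekly New COVID-19 Cases")
```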


