Statistics
Statistics is the branch of applied mathematics concerned with the collection, tabulation, analysis, interpretation, and presentation of numerical data. It deals with how data can be used to solve complex problems.
Mean, Median and Mode:
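- Mean: It is the sum of the observations divided by the total number of observations.
- Median: It is the middle value of the sorted data set. When the number of observations is even, it is the average of the two middle values.
- Mode: It is the value that has the highest frequency in the given data set. R does not have a standard built-in function to calculate the statistical mode; the built-in mode() function reports the storage mode of an object instead.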
Example:
R
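# Create the data
A <- c(17, 12, 8, 53, 1, 12, 43, 17, 43, 10)

print(mean(A))
print(median(A))

# User-defined function for the statistical mode. It is named get_mode
# so that it does not mask base R's mode(), which reports the storage mode.
# On a tie, the value that appears first in x is returned.
get_mode <- function(x) {
  a <- unique(x)
  a[which.max(tabulate(match(x, a)))]
}

# Calculate the mode using the user function
print(get_mode(A))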
Output:
[1] 21.6
[1] 14.5
[1] 17
Note: For more information, refer to Mean, Median and Mode in R Programming
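The sample data above has three values tied for the highest frequency (12, 17 and 43 each occur twice), and a which.max() based function reports only the first of them. A minimal sketch of a helper that returns every tied value is shown below; the name all_modes is hypothetical and is not part of base R.

```r
# Hypothetical helper (not part of base R): return every value that is
# tied for the highest frequency, sorted in increasing order.
all_modes <- function(x) {
  u <- unique(x)                    # distinct values, in order of appearance
  freq <- tabulate(match(x, u))     # how often each distinct value occurs
  sort(u[freq == max(freq)])        # keep all values with the top count
}

A <- c(17, 12, 8, 53, 1, 12, 43, 17, 43, 10)
modes <- all_modes(A)
print(modes)
```

For data with a single most frequent value, this sketch returns a vector of length one, so it can stand in for a first-match version where ties matter.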
Normal Distribution:
The normal distribution describes data whose values cluster symmetrically around the mean in a bell-shaped curve. Examples that are commonly modelled this way include the heights of a population, shoe sizes, IQ levels, the sum of many dice rolls, and many more. In R, there are four built-in functions for working with the normal distribution:
- dnorm() function in R programming returns the value of the probability density function of the distribution at x.
dnorm(x, mean, sd)
- pnorm() function is the cumulative distribution function, which measures the probability that a random number X takes a value less than or equal to x.
pnorm(x, mean, sd)
- qnorm() function is the inverse of the pnorm() function. It takes a probability value p and returns the value x (the quantile) whose cumulative probability is p.
qnorm(p, mean, sd)
- rnorm() function in R programming is used to generate a vector of random numbers which are normally distributed.
rnorm(n, mean, sd)
Example:
R
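# Create a sequence of values between -10 and 10 in steps of 0.1
x <- seq(-10, 10, by = 0.1)

# Density at each x
y <- dnorm(x, mean(x), sd(x))
plot(x, y, main = 'dnorm')

# Cumulative probability at each x
y <- pnorm(x, mean(x), sd(x))
plot(x, y, main = 'pnorm')

# qnorm() expects probabilities, so evaluate it on values between 0 and 1
p <- seq(0.01, 0.99, by = 0.01)
y <- qnorm(p, mean(x), sd(x))
plot(p, y, main = 'qnorm')

# Draw as many random values as there are points in x
r <- rnorm(length(x), mean(x), sd(x))
hist(r, breaks = 50, main = 'rnorm')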
Output: four plots, titled 'dnorm', 'pnorm', 'qnorm' and 'rnorm'.
Note: For more information, refer to Normal Distribution in R
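The relationship between the four functions can also be checked numerically. The sketch below uses the standard normal distribution (mean 0, sd 1); the point 1.96, the sample size of 1000 and the seed are arbitrary choices made for illustration.

```r
# Standard normal distribution: mean = 0, sd = 1
p <- pnorm(1.96, mean = 0, sd = 1)   # P(X <= 1.96), roughly 0.975
q <- qnorm(p, mean = 0, sd = 1)      # inverts pnorm(), so q is 1.96 again
d <- dnorm(0, mean = 0, sd = 1)      # density at the mean: 1 / sqrt(2 * pi)

set.seed(42)                         # arbitrary seed, for reproducibility
samples <- rnorm(1000, mean = 0, sd = 1)

print(c(p = p, q = q, d = d))
print(mean(samples <= 1.96))         # empirical share, expected to be near p
```

The last line compares the simulated share of values at or below 1.96 with the theoretical probability returned by pnorm().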
Binomial Distribution in R Programming:
The binomial distribution is a discrete distribution that describes the number of successes in a fixed number of independent trials, each of which has only two outcomes, i.e. success or failure. For example, it can be used to determine whether a particular lottery ticket has won or not, whether a drug is able to cure a person or not, the number of heads or tails in a finite number of coin tosses, or how often a given face comes up when a die is rolled repeatedly. R has four functions for handling the binomial distribution:
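- dbinom() gives the probability of exactly k successes.
dbinom(k, n, p)
- pbinom() gives the cumulative probability of at most k successes.
pbinom(k, n, p)
where n is the total number of trials, p is the probability of success, and k is the value at which the probability is evaluated. In R, the corresponding arguments are named x (or q for pbinom()), size and prob.
- qbinom() is the inverse of pbinom(). It returns the smallest number of successes whose cumulative probability is at least P.
qbinom(P, n, p)
where P is the probability, n is the total number of trials, and p is the probability of success.
- rbinom() generates random numbers of successes drawn from the binomial distribution.
rbinom(n, N, p)
where n is the number of observations, N is the total number of trials, and p is the probability of success.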
Example:
R
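# Probability of exactly 0 to 10 successes in 10 trials
probabilities <- dbinom(x = 0:10, size = 10, prob = 1 / 6)
plot(0:10, probabilities, type = "l", main = 'dbinom')

# Cumulative probability of at most 0 to 10 successes
probabilities <- pbinom(0:10, size = 10, prob = 1 / 6)
plot(0:10, probabilities, type = "l", main = 'pbinom')

# Quantiles for probabilities from 0 to 1
x <- seq(0, 1, by = 0.1)
y <- qbinom(x, size = 13, prob = 1 / 6)
plot(x, y, type = 'l')

# Eight random draws; each is a number of successes in 13 trials
draws <- rbinom(8, size = 13, prob = 1 / 6)
hist(draws)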
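Output: line plots titled 'dbinom' and 'pbinom', an untitled line plot of the quantiles, and a histogram of the random draws.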
Note: For more information, refer to Binomial Distribution in R Programming
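A few single-value calls make the roles of these functions easier to see. The sketch below assumes 10 trials with a success probability of 1/6 (for instance, counting sixes in ten rolls of a fair die); the variable names are chosen for illustration only.

```r
n <- 10        # number of trials (R's `size` argument)
p <- 1 / 6     # probability of success per trial (R's `prob` argument)

exactly_two <- dbinom(2, size = n, prob = p)   # P(X = 2)
at_most_two <- pbinom(2, size = n, prob = p)   # P(X <= 2)

# pbinom() is the running sum of dbinom()
summed <- sum(dbinom(0:2, size = n, prob = p))

# qbinom() returns the smallest k with P(X <= k) >= 0.5, i.e. the median
k <- qbinom(0.5, size = n, prob = p)

set.seed(1)    # arbitrary seed, for reproducibility
draws <- rbinom(5, size = n, prob = p)         # five simulated experiments

print(c(exactly_two = exactly_two, at_most_two = at_most_two, summed = summed))
print(k)
print(draws)
```

Here at_most_two and summed agree, which is the sense in which pbinom() accumulates the probabilities that dbinom() returns.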
Time Series Analysis:
A time series records how a quantity behaves over a period of time. In R, a time series object can be created easily with the ts() function.
Example: Let’s take the COVID-19 pandemic as an example. The data vector holds the worldwide total number of positive COVID-19 cases, recorded weekly from 22 January, 2020 to 15 April, 2020.
R
# Weekly data of COVID-19 positive cases from
# 22 January, 2020 to 15 April, 2020
x <- c(580, 7813, 28266, 59287, 75700, 87820,
       95314, 126214, 218843, 471497, 936851,
       1508725, 2072113)

# library required for decimal_date() function
library(lubridate)

# creating time series object
# from date 22 January, 2020
mts <- ts(x, start = decimal_date(ymd("2020-01-22")),
          frequency = 365.25 / 7)

# plotting the graph
plot(mts, xlab = "Weekly Data",
     ylab = "Total Positive Cases",
     main = "COVID-19 Pandemic",
     col.main = "darkgreen")
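Output: a line plot titled "COVID-19 Pandemic", with "Weekly Data" on the x-axis and "Total Positive Cases" on the y-axis.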
Note: For more information, refer to Time Series Analysis in R
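Once a ts object exists, base R functions such as diff() and window() can be applied to it directly. The sketch below reuses the same weekly totals but, as a simplifying assumption, indexes the series by week number (1 to 13) instead of by calendar date, so it needs no extra packages.

```r
# Same weekly cumulative totals as in the example above
x <- c(580, 7813, 28266, 59287, 75700, 87820,
       95314, 126214, 218843, 471497, 936851,
       1508725, 2072113)

# Assumption: index the series by week number, so time runs from 1 to 13
weekly <- ts(x, start = 1, frequency = 1)

# Week-over-week increase in the cumulative total (new cases per week)
new_cases <- diff(weekly)

# Restrict the series to weeks 10 through 13
last_four <- window(weekly, start = 10)

print(new_cases)
print(last_four)
```

Because the totals are cumulative, diff() turns them into per-week counts, which are often easier to compare than the running total.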