R – Stem and Leaf Plots
A stem and leaf plot displays how often values occur in different ranges of a data set. It represents quantitative data in a graphical, text-based format. Unlike a histogram, which shows only the count in each bin, a stem and leaf plot retains the original data values, typically to two significant figures. The data are also put in order, which eases the move to nonparametric statistics and order-based inference. Let us see how this plotting technique works.
Example:
On World Obesity Day, suppose a school teacher decides to measure the weight of 10 students she feels may be at risk of obesity. She records their weights as follows:
54, 43, 67, 76, 45, 59, 66, 78, 80, 92.
Now the stem and leaf plot on these records will be:
4 | 3 5
5 | 4 9
6 | 6 7
7 | 6 8
8 | 0
9 | 2
Here the records are grouped by their most significant digit: the tens digit is the stem and the units digit is the leaf. The stems are on the left side of the chart and the leaves are on the right. To improve readability, a row is sometimes merged with its immediate next row. Infinite and missing values are discarded.
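The grouping described above can be reproduced by hand in a few lines of R. This is only a minimal sketch of the idea, not how stem() is implemented; the variable names (weights, plot_rows) are chosen for illustration, and it assumes two-digit values so that integer division by 10 gives the stem.

```r
# Weights recorded by the teacher
weights <- c(54, 43, 67, 76, 45, 59, 66, 78, 80, 92)

# Sort first so the leaves come out in order within each stem
sorted <- sort(weights)

# Tens digit is the stem, units digit is the leaf
stems  <- sorted %/% 10
leaves <- sorted %% 10

# Group the leaves by stem
plot_rows <- split(leaves, stems)

# Print one "stem | leaves" row per group
for (s in names(plot_rows)) {
  cat(sprintf("%s | %s\n", s, paste(plot_rows[[s]], collapse = " ")))
}
```

Because the data are sorted before splitting, the plot also lists the values in rank order, which is what makes order-based summaries easy to read from it.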
Stem and Leaf Plot in R
In R, a stem and leaf plot (also known as a stem and leaf diagram) of a quantitative variable, say x, is a textual graph that groups the data items by their most significant digits. The name comes from the way each value is split into a stem, the leading digit or digits, and a leaf, the last digit. For example, if the input value is 94, then 9 is the stem and 4 is the leaf.
Syntax: stem(x, scale = 1, width = 80, atom = 1e-08)
Parameters:
- x: the data on which we want to draw the stem and leaf plot [a numeric vector]
- scale: controls the length of the plot; larger values spread the data over more rows [it is 1 by default]
- width: the desired width of the plot [it is 80 by default]
- atom: a tolerance
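As a small sketch of the call, the example below writes every argument out at its default value and then checks that missing and infinite values are dropped before plotting. The vector weights reuses the ten values from the earlier example, and capture.output() is used here only as a convenient way to compare two printed plots.

```r
weights <- c(54, 43, 67, 76, 45, 59, 66, 78, 80, 92)

# Same as stem(weights): all optional arguments spelled out at their defaults
stem(weights, scale = 1, width = 80, atom = 1e-08)

# Add some values that cannot be plotted
with_gaps <- c(weights, NA, Inf, -Inf)

# Capture the printed text of both plots and compare them line by line
clean_plot <- capture.output(stem(weights))
gappy_plot <- capture.output(stem(with_gaps))
same_plot  <- identical(clean_plot, gappy_plot)
print(same_plot)
```

If the two captured plots are identical, the extra NA and infinite entries had no effect on the output.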
Creating a Stem and Leaf Plot in R
R ships with many built-in data sets. Here let's use the ChickWeight data set and consider its weight column. First, let's see the simplest way to draw a stem and leaf plot, using stem().
Example:
r
# R program to illustrate
# Stem and Leaf Plot
# using stem()
stem(ChickWeight$weight)
Output:
  The decimal point is 1 digit(s) to the right of the |

   2 | 599999999
   4 | 00000111111111111111111112222222222222223333456678888888899999999999+38
   6 | 00111111122222222333334444455555666677777888888900111111222222333334+8
   8 | 00112223344444455555566777788999990001223333566666788888889
  10 | 0000111122233333334566667778889901122223445555667789
  12 | 00002223333344445555667788890113444555566788889
  14 | 11123444455556666677788890011234444555666777777789
  16 | 00002233334444466788990000134445555789
  18 | 12244444555677782225677778889999
  20 | 0123444555557900245578
  22 | 0012357701123344556788
  24 | 08001699
  26 | 12344569259
  28 | 01780145
  30 | 355798
  32 | 12712
  34 | 1
  36 | 13
Explanation:
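Here the $ operator extracts the weight column from the ChickWeight data frame. The stem() function sorts the values and splits each one into a stem and a leaf. The header line says the decimal point is one digit to the right of the |, so a stem of 10 with a leaf of 3 stands for 103. The stems are on the left and the leaves, each the last digit of a value, are on the right. For compactness, each row covers two adjacent stem values: the row labelled 4 holds the weights from 40 to 59. A trailing +38 means that 38 more leaves did not fit within the plot width.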
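To see what the plot is built from, the following sketch extracts the column and counts how many weights fall in each row. It assumes each row of the default plot spans 20 units, which is read off the output above rather than taken from the stem() documentation; the names w, row_start and row_counts are illustrative.

```r
# ChickWeight is a data frame; $ pulls out one column as a plain numeric vector
w <- ChickWeight$weight
is.numeric(w)
length(w)
range(w)

# Assumption: each row of the default plot covers 20 units (20-39, 40-59, ...)
row_start  <- (w %/% 20) * 20
row_counts <- table(row_start)

# One count per row of the plot, including leaves hidden behind a "+n" suffix
print(row_counts)
```

The count for a truncated row should equal the number of leaves printed on it plus the number after the + sign.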
Using the scale argument
Now let's draw the same plot at a different scale. The scale argument of stem() controls the length of the plot, that is, how many rows the data are spread over.
Example:
r
# R program to illustrate
# Stem and Leaf Plot
# Drawing Stem and Leaf Plot after rescaling
stem(ChickWeight$weight, scale = 5)
Output:
  The decimal point is 1 digit(s) to the right of the |

   3 | 599999999
   4 | 000001111111111111111111122222222222222233334
   4 | 5667888888889999999999999
   5 | 00000011111111222233333444
   5 | 5555566667778888899999
   6 | 001111111222222223333344444
   6 | 555556666777778888889
   7 | 001111112222223333344444444
   7 | 6667778889999
   8 | 001122233444444
   8 | 5555556677778899999
   9 | 0001223333
   9 | 566666788888889
  10 | 0000111122233333334
  10 | 5666677788899
  11 | 0112222344
  11 | 5555667789
  12 | 0000222333334444
  12 | 555566778889
  13 | 0113444
  13 | 555566788889
  14 | 111234444
  14 | 5555666667778889
  15 | 0011234444
  15 | 555666777777789
  16 | 000022333344444
  16 | 6678899
  17 | 000013444
  17 | 5555789
  18 | 12244444
  18 | 55567778
  19 | 222
  19 | 5677778889999
  20 | 0123444
  20 | 5555579
  21 | 0024
  21 | 5578
  22 | 00123
  22 | 577
  23 | 01123344
  23 | 556788
  24 | 0
  24 | 8
  25 | 001
  25 | 699
  26 | 12344
  26 | 569
  27 | 2
  27 | 59
  28 | 01
  28 | 78
  29 | 014
  29 | 5
  30 | 3
  30 | 5579
  31 |
  31 | 8
  32 | 12
  32 | 7
  33 | 12
  33 |
  34 | 1
  34 |
  35 |
  35 |
  36 | 1
  36 |
  37 | 3
Explanation:
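Changing the scale does not change the data, only how finely they are grouped. With scale = 5, each row covers five units instead of twenty, so the plot has many more rows; some rows are empty because no weight falls in their range. As before, the stems are on the left side and the leaves are on the right side.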
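One way to check the effect of scale is to count the stem rows in each version of the plot. The sketch below captures the printed output with capture.output() and counts the lines that start with a number followed by |; the helper count_rows() is hypothetical and written only for this illustration.

```r
w <- ChickWeight$weight

# Capture the printed plots as character vectors, one element per line
plot_default <- capture.output(stem(w))
plot_scaled  <- capture.output(stem(w, scale = 5))

# Hypothetical helper: count lines that look like "  12 | ..." (stem rows only,
# which excludes the header line and blank lines)
count_rows <- function(lines) sum(grepl("^\\s*[0-9]+ \\|", lines))

n_default <- count_rows(plot_default)
n_scaled  <- count_rows(plot_scaled)

cat("rows with scale = 1:", n_default, "\n")
cat("rows with scale = 5:", n_scaled, "\n")
```

A larger scale gives a longer plot with fewer leaves per row, which can make the shape of a large data set easier to read.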
Using the width argument
The width argument of stem() sets the desired width of the plot, which determines how many leaves fit on a row before the rest are cut off.
Example:
r
# R program to illustrate
# Stem and Leaf Plot
# Drawing Stem and Leaf Plot by changing the width
stem(ChickWeight$weight, width = 100)
Output:
  The decimal point is 1 digit(s) to the right of the |

   2 | 599999999
   4 | 0000011111111111111111111222222222222222333345667888888889999999999999000000111111112222+18
   6 | 0011111112222222233333444445555566667777788888890011111122222233333444444446667778889999
   8 | 00112223344444455555566777788999990001223333566666788888889
  10 | 0000111122233333334566667778889901122223445555667789
  12 | 00002223333344445555667788890113444555566788889
  14 | 11123444455556666677788890011234444555666777777789
  16 | 00002233334444466788990000134445555789
  18 | 12244444555677782225677778889999
  20 | 0123444555557900245578
  22 | 0012357701123344556788
  24 | 08001699
  26 | 12344569259
  28 | 01780145
  30 | 355798
  32 | 12712
  34 | 1
  36 | 13
Explanation:
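Changing the width from 80 to 100 does not change the data or the stems, only how many leaves are printed on each row. The row for stem 4 now shows 20 more leaves and ends in +18 instead of +38, and the row for stem 6 is printed in full. As before, the stems are on the left and the leaves are on the right-hand side of the chart.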
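The effect of width can be checked in the same spirit by counting how many rows end in a +n suffix at each width. This is a rough sketch; the helper count_truncated() is hypothetical, and it assumes a truncated row ends with a plus sign followed by digits, as in the outputs shown above.

```r
w <- ChickWeight$weight

# Capture both plots as character vectors, one element per printed line
plot_narrow <- capture.output(stem(w, width = 80))
plot_wide   <- capture.output(stem(w, width = 100))

# Hypothetical helper: rows whose leaves were cut off end in "+<number>"
count_truncated <- function(lines) sum(grepl("\\+[0-9]+$", lines))

cat("truncated rows at width 80: ", count_truncated(plot_narrow), "\n")
cat("truncated rows at width 100:", count_truncated(plot_wide), "\n")

# The longest printed line grows with the width, but the number of lines does not
cat("longest line at width 80: ", max(nchar(plot_narrow)), "\n")
cat("longest line at width 100:", max(nchar(plot_wide)), "\n")
```

Choosing a width large enough that no row is truncated keeps every leaf visible, at the cost of longer lines.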
Usage of Stem and Leaf plot
- The stem and leaf plot is very useful for displaying the shape and relative density of data, hence giving the reader or customer a quick overview of the kind of distribution.
- Most of the time, it retains the raw data almost exactly.
- It is a very useful method for highlighting outliers and for finding the mode.
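What the plot suggests visually can be cross-checked numerically. The sketch below finds the most frequent weight with table() and flags possible outliers with boxplot.stats(), which applies the usual 1.5 times interquartile range rule; that rule is a separate technique brought in here only for comparison, not something stem() itself computes.

```r
w <- ChickWeight$weight

# Mode: the weight that occurs most often (first one if there is a tie)
freq <- table(w)
mode_weight <- as.numeric(names(freq)[which.max(freq)])
cat("most frequent weight:", mode_weight, "\n")

# Possible outliers according to the boxplot rule (an assumption of this
# sketch; other definitions of "outlier" would give other answers)
outliers <- boxplot.stats(w)$out
cat("flagged as outliers:", sort(outliers), "\n")
```

The mode should fall on the row of the stem and leaf plot with the longest run of a repeated leaf, and any flagged outliers should appear in the sparse rows at the bottom of the plot.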