Statistics Interview Questions for Intermediate Level

19. How to check if a Distribution is Normal?

There are several ways to check whether a distribution is normal. Some of them are:

Histogram: Plot the data as a histogram. If the plot is bell-shaped, with the highest frequency at the center and frequencies tapering off symmetrically on both sides, the data is likely to be approximately normal.
QQ plot: Plot the sample quantiles against the quantiles of a normal distribution. If the points mostly fall along the straight diagonal line, the data is approximately normal.
Measures of central tendency: For a normal distribution, mean = median = mode. Roughly equal values are consistent with normality but do not prove it, because any symmetric unimodal distribution has the same property.
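
The QQ-plot and central-tendency checks above can also be sketched numerically. The snippet below is a minimal illustration using only the Python standard library; the two samples, the seed, and the helper names `pearson` and `qq_correlation` are assumptions made for this example. It correlates the sorted data with theoretical normal quantiles (a value close to 1 corresponds to points lying near the QQ diagonal) and compares the mean with the median.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, median

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def qq_correlation(data):
    """Correlation between sorted data and theoretical normal quantiles."""
    n = len(data)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n stay strictly between 0 and 1.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return pearson(theoretical, sorted(data))

rng = random.Random(0)  # fixed seed so the illustration is repeatable
normal_sample = [rng.gauss(50, 10) for _ in range(1000)]
skewed_sample = [rng.expovariate(1 / 50) for _ in range(1000)]

r_normal = qq_correlation(normal_sample)   # close to 1: points near the diagonal
r_skewed = qq_correlation(skewed_sample)   # lower: points bend away from it

# Mean and median nearly coincide for the normal sample, not for the skewed one.
gap_normal = abs(fmean(normal_sample) - median(normal_sample))
gap_skewed = abs(fmean(skewed_sample) - median(skewed_sample))
```

In practice a plot is usually inspected alongside such numbers, since a single summary value can hide where the departure from normality occurs.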

20. What is Measure of Position? How is it helpful in Descriptive Statistics?

A measure of position is used when we want to determine where a specific data point or value falls within a sample or distribution. It is sometimes necessary to know the relative standing of data points in terms of their position (for example, the value at the 75th percentile). Some of the common measures of position are:

Percentile: It is a value below which a certain percentage of the data points fall.
Quartiles: Three values (Q1, Q2 and Q3) that divide the ordered data points into 4 quarters: one lower quarter, two middle quarters and one upper quarter.
Five number summary: It includes the lowest value, the three quartiles (Q1, the median and Q3) and the highest value.
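
As a small illustration of these measures, the sketch below computes quartiles, a percentile and the five number summary with `statistics.quantiles` from the Python standard library. The data values are made up for the example, and the `inclusive` method is just one of several common quantile conventions, so other tools may return slightly different cut points.

```python
from statistics import quantiles

data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 18]  # hypothetical observations

# Quartiles: three cut points that split the ordered data into four quarters.
# method="inclusive" interpolates between the observed minimum and maximum.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Percentiles: 99 cut points; index 74 is the 75th percentile.
p75 = quantiles(data, n=100, method="inclusive")[74]

# Five number summary: minimum, Q1, median, Q3, maximum.
five_number_summary = (min(data), q1, q2, q3, max(data))
```

With this data the 75th percentile coincides with Q3, which is exactly what the definitions imply.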

21. What are the different types of Probability Sampling methods?

Probability sampling is a technique of selecting a sample from a population such that every individual has a known, non-zero chance of being selected; in the simplest designs that chance is equal for everyone. This is achieved through random selection. Some known types of probability sampling methods are:

Simple random sampling: Members are drawn at random from the whole population, and every member has an equal chance of being chosen.
Stratified random sampling: You divide the population into groups (strata) that share a characteristic and then randomly select members from each group to be included in the sample.
Cluster random sampling: You divide the whole population into clusters, randomly choose one or more whole clusters, and include every member of the chosen clusters in the sample.
Systematic random sampling: You randomly choose a starting point in an ordered population and then select members at a fixed interval (every k-th member) to be included in the sample.
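
The four methods can be sketched with Python's `random` module. Everything concrete here is an assumption for illustration: a population of 100 numbered members, two strata split at the midpoint, clusters of 10 consecutive members, and a sampling interval of 10.

```python
import random

rng = random.Random(42)              # fixed seed so the illustration is repeatable
population = list(range(1, 101))     # hypothetical member IDs 1..100

# Simple random sampling: draw 10 members from the whole population.
simple = rng.sample(population, 10)

# Stratified random sampling: split into strata, then sample within each one.
strata = {"low": population[:50], "high": population[50:]}
stratified = [m for group in strata.values() for m in rng.sample(group, 5)]

# Cluster random sampling: pick whole clusters and keep every member in them.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [m for cluster in rng.sample(clusters, 2) for m in cluster]

# Systematic random sampling: random start, then every k-th member.
k = 10
start = rng.randrange(k)
systematic = population[start::k]
```

Note that the stratified sample always contains members of both strata, whereas a simple random sample of the same size could miss one of them by chance.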

22. What is Confidence level? How is Confidence level related to Width of Confidence interval?

Confidence Level: It is the long-run proportion of confidence intervals, built by the same procedure from repeated samples, that would contain the true population parameter (for example, 95%).

Confidence Interval: It is a range of values, computed from the sample, that expresses the uncertainty surrounding an estimate.

For a fixed sample and method, the width of the confidence interval increases with the confidence level: a higher confidence level requires a wider interval. The relation is monotonic rather than strictly proportional.
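
To see the relation between confidence level and interval width, the sketch below computes the width of a two-sided z-interval for a mean. It assumes a known population standard deviation, and the numbers (`sigma=15`, `n=100`) and the function name `ci_width` are illustrative choices, not values from a real study.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(confidence, sigma, n):
    """Width of a two-sided z-interval for a mean with known sigma."""
    # Critical value: the standard normal quantile that leaves
    # (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sigma / sqrt(n)

widths = {level: ci_width(level, sigma=15, n=100) for level in (0.90, 0.95, 0.99)}
```

The width grows as the level goes from 90% to 99%, but not in proportion to the level, because the critical value rises faster in the tails.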

23. What are the different types of Probability Distributions?

There are two types of probability distribution:

Discrete probability distribution: It is the probability distribution associated with a discrete random variable, i.e., a variable that has a countable number of possible values. Some commonly known discrete probability distributions are listed below:
- Uniform Distribution: It models situations where every outcome is equally likely.
- Binomial Distribution: It models the number of successes in a fixed number of independent trials, where each trial has only two mutually exclusive outcomes (success or failure) and the same probability of success.
- Bernoulli Distribution: It is the special case of the Binomial distribution with a single trial.
- Poisson Distribution: It models the probability of a given number of events occurring in a fixed interval of time, when events occur independently at a constant average rate.
Continuous probability distribution: It is the probability distribution associated with a continuous random variable, i.e., a variable whose values cannot be counted because it can take any value within a given range. Some known types of continuous probability distribution are:
- Normal Distribution: It is the most common distribution, characterized by its iconic bell-shaped curve with the mean at the center.
- Exponential Distribution: It describes the waiting time between consecutive events in a Poisson process, i.e., a process in which events occur randomly and independently at a constant average rate.
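
A few of these distributions can be written out directly from their textbook formulas. The sketch below is illustrative only; the function names and the parameter values used afterwards are assumptions for this example.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events in an interval with mean event count lam)."""
    return lam**k * exp(-lam) / factorial(k)

def exponential_cdf(t, rate):
    """P(waiting time until the next event <= t) for events arriving at `rate`."""
    return 1 - exp(-rate * t)

# Bernoulli is the Binomial with a single trial.
bernoulli_success = binomial_pmf(1, n=1, p=0.3)

# Poisson and Exponential describe the same process from two angles:
# "no event within time t" equals "waiting time longer than t".
rate, t = 2.0, 0.5
p_no_event = poisson_pmf(0, rate * t)
p_wait_longer = 1 - exponential_cdf(t, rate)
```

The last two quantities agree, which is the sense in which the Exponential distribution belongs to a Poisson process.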

24. What is Hypothesis testing? Where do we use it?

Hypothesis testing is a fundamental part of inferential statistics that is used to decide whether an assumption about a population is supported by the data or not. This is done by testing a sample drawn from that population. Some possible use cases for hypothesis testing are as follows:

Inference: We can draw conclusions about a possible effect in a population.
Quality testing: Hypothesis testing allows us to evaluate whether a product's features and quality meet a stated target.
Scientific research: We use hypothesis testing in scientific work to check the statistical significance of an assumption.
Policy evaluation: We can use hypothesis testing to evaluate the effect of different policies too.
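
As a concrete instance of the quality-testing use case, here is a minimal sketch of a two-sided one-sample z-test. The scenario (a 500 g fill target, a known standard deviation of 5 g, 36 sampled packages with a mean of 502 g) and the 0.05 significance threshold are hypothetical, and the test assumes the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: population mean == mu0, with known sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Probability of a result at least this extreme if H0 were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_sample_z_test(sample_mean=502.0, mu0=500.0, sigma=5.0, n=36)
reject_null = p_value < 0.05  # conventional threshold, chosen for illustration
```

Here the p-value falls below the threshold, so under these assumptions the sample would be taken as evidence that the mean fill differs from the target.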

25. What is Interquartile range?

Interquartile range (IQR) is the difference between the third quartile (Q3, the value below which 75% of the data points fall) and the first quartile (Q1, the value below which 25% of the data points fall), i.e., IQR = Q3 - Q1. It covers the middle 50% of the data and is used as a measure of dispersion, and sometimes as a measure of position as well.
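
A short sketch of the calculation, using made-up data with one extreme value to show why the IQR is often preferred to the full range as a measure of dispersion. The `inclusive` quantile method is an assumption; other conventions give slightly different quartiles.

```python
from statistics import quantiles

data = [1, 3, 5, 7, 9, 11, 13, 15, 100]  # hypothetical values; 100 is extreme

q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                        # spread of the middle 50% of the data
full_range = max(data) - min(data)   # dominated by the single extreme value
```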

26. What is Conditional Probability? How is it related to Bayes Theorem?

Conditional Probability: It is the probability of one event occurring given that another event has already occurred. For P(B) > 0, the formula for conditional probability is:

[Tex]P(A|B) = \frac{P(A\cap B)}{P(B)} [/Tex]

Bayes Theorem: It states that for any two events A and B with P(B) > 0, the probability of A given B is equal to the probability of B given A multiplied by the probability of A, divided by the probability of B. It is given as:

[Tex]P(A|B) = \frac {P(B|A)\cdot{P(A)}}{P(B)} [/Tex]

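Bayes theorem follows directly from the definition of conditional probability: the joint probability P(A ∩ B) can be written either as P(A|B)·P(B) or as P(B|A)·P(A), and equating the two expressions gives the theorem.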
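
A worked example helps to see the formula in action. The sketch below uses a hypothetical screening test: event A is "has the condition" with prior probability 1%, and event B is "test is positive", with P(B|A) = 0.95 and P(B|not A) = 0.05. These numbers and the function name are illustrative assumptions.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from Bayes theorem, with P(B) from the law of total probability."""
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b

posterior = bayes_posterior(prior_a=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05)
```

Under these assumptions the posterior is only about 16%, far below the 95% sensitivity, because the condition is rare and most positives come from the much larger group without it.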

27. Explain the Joint and Marginal Probability.

Joint Probability: It is the probability of two or more events happening together. It can be represented as the probability of the intersection of those events, e.g., P(A ∩ B).

Marginal Probability: It is the probability of a single event or random variable considered on its own, regardless of the outcomes of the other variables. It is obtained by summing (or integrating) the joint probability over all values of the other variables.
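
The relation between the two can be sketched with a small joint probability table. The table below (weather versus commute outcome) is entirely hypothetical, and `marginal` is a helper name introduced for this example.

```python
# Hypothetical joint distribution: P(weather, commute outcome).
joint = {
    ("rain", "late"): 0.15,
    ("rain", "on_time"): 0.15,
    ("dry", "late"): 0.07,
    ("dry", "on_time"): 0.63,
}

def marginal(joint, index, value):
    """Sum the joint probabilities over every outcome of the other variable."""
    return sum(p for outcome, p in joint.items() if outcome[index] == value)

p_rain_and_late = joint[("rain", "late")]   # joint probability
p_rain = marginal(joint, 0, "rain")         # marginal probability of rain
p_late = marginal(joint, 1, "late")         # marginal probability of being late
```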

28. What is the difference between Probability Mass Function and Probability Density Function?

Category	Probability Mass Function (PMF)	Probability Density Function (PDF)
Definition	PMF is used for discrete random variables. It assigns a probability to each individual value of the variable.	PDF is used for continuous random variables. It gives the probability density at each value; probabilities are obtained as the area under the curve over a range of values.
Area under the curve	A discrete distribution is drawn as bars or spikes, and the PMF values over all possible values sum to 1.	A continuous distribution is drawn as a curve, and the total area under that curve equals 1.
Example	Outcome of tossing a fair coin	Height of giraffes
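
The contrast in the table can be checked numerically. In the sketch below, the coin PMF comes from the table's example, while the giraffe height distribution (normal with mean 4.5 m and standard deviation 0.5 m) is an invented assumption used only to have a concrete density to integrate.

```python
from statistics import NormalDist

# PMF: probabilities of the individual outcomes sum to 1.
coin_pmf = {"heads": 0.5, "tails": 0.5}
pmf_total = sum(coin_pmf.values())

# PDF: the area under the curve is 1 (approximated with a simple Riemann sum).
heights = NormalDist(mu=4.5, sigma=0.5)        # hypothetical heights in metres
step = 0.001
grid = [1.5 + i * step for i in range(6001)]   # 1.5 m to 7.5 m, about +/- 6 sigma
pdf_area = sum(heights.pdf(x) * step for x in grid)

# Probabilities for a continuous variable come from intervals, not single points.
p_between = heights.cdf(5.0) - heights.cdf(4.0)

# A density is not a probability: it can exceed 1 for a narrow distribution.
narrow_density = NormalDist(mu=0, sigma=0.1).pdf(0)
```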

29. What is Z-score? How do you calculate it?

A Z-score, or standard score, is a statistical measure of how many standard deviations a data point lies above or below the population mean. It is a measure of position. It is given as:

[Tex]Z = \frac {x-\mu}{\sigma} [/Tex] where,

Z = standard score

x = data point

[Tex]\mu [/Tex] = mean of the distribution

[Tex]\sigma [/Tex] = standard deviation of the distribution.

To calculate the Z-score:

Subtract the mean of the population from the data point.
Then divide the result by the standard deviation of the distribution.

Based on the result, a Z-score indicates:

The data point is above the mean if the Z-score is positive.
The data point is below the mean if the Z-score is negative.
The data point is equal to the mean if the Z-score is 0.
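
The two calculation steps and the three cases translate directly into code. The values below (a distribution with mean 70 and standard deviation 10) are hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

z_above = z_score(85, mu=70, sigma=10)    # positive: above the mean
z_below = z_score(55, mu=70, sigma=10)    # negative: below the mean
z_at_mean = z_score(70, mu=70, sigma=10)  # zero: exactly at the mean
```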

30. What is meant by standardization? Why do we sometimes standardize Normal Distribution?

Standardization refers to the process of transforming data onto a standard scale. It is done by subtracting the mean from each value and then dividing by the standard deviation, so that the transformed data is centered at 0 and has a standard deviation of 1.

A normal distribution is sometimes standardized to turn it into the standard normal distribution. This is done so that:

Values from distributions with different means and scales become directly comparable, since each value is expressed as a number of standard deviations from its mean.
Tests such as the Z-test and T-test can be carried out, since their test statistics are standardized quantities that are compared against a reference distribution.
Outliers are easier to detect, since they show up as values with a large absolute Z-score.
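
A minimal sketch of standardizing a whole dataset follows. The scores are made up, and the population standard deviation (`pstdev`) is used here as an assumption; with sample data one might use the sample standard deviation instead.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Rescale values to have mean 0 and (population) standard deviation 1."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

scores = [52, 61, 70, 79, 88]   # hypothetical raw scores
z_values = standardize(scores)
```

After the transformation the ordering and relative spacing of the values are unchanged; only the location and scale differ.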

31. What are Axioms of Probability?

Axioms of Probability are the basic rules that any assignment of probabilities to events must satisfy. There are 3 axioms of probability:

The probability of any event is a non-negative real number.
The probability of the entire sample space is one.
If [Tex]E_{1} [/Tex] and [Tex]E_{2} [/Tex] are two mutually exclusive events, then:

[Tex]P(E_{1} \cup E_{2}) = P(E_{1}) + P(E_{2}) [/Tex]
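
The three axioms can be checked on a simple model. The sketch below assumes a fair six-sided die with equally likely outcomes; the event names and the `prob` helper are illustrative.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # outcomes of a fair die (assumed equally likely)

def prob(event):
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
one = {1}   # mutually exclusive with `even`: the two sets share no outcome

axiom_1 = prob(even) >= 0                               # non-negativity
axiom_2 = prob(sample_space) == 1                       # total probability is one
axiom_3 = prob(even | one) == prob(even) + prob(one)    # additivity
```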

32. What is Empirical Rule?

Empirical rule states that for a normal distribution:

Approximately 68% of the data falls within one standard deviation of the mean.
Approximately 95% of the data falls within two standard deviations of the mean.
Approximately 99.7% of the data falls within three standard deviations of the mean.
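
These percentages can be reproduced from the standard normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the dictionary name `coverage` is just a label for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

# Probability of falling within k standard deviations of the mean.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
```

The same values hold for any normal distribution, because standardizing maps it onto the standard normal.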

33. What is the difference Between Null Hypothesis and Alternate Hypothesis?

Category	Null Hypothesis	Alternate Hypothesis
Definition	Null Hypothesis ([Tex]H_{0} [/Tex]) is a statement which is assumed to be true unless the data provide enough evidence against it.	Alternate hypothesis ([Tex]H_{a} [/Tex]) is the contradicting statement, which is accepted when the sample provides enough evidence to reject the null hypothesis.
Objective	Represents the default or null assumption that you aim to test against.	Represents the specific research question or hypothesis you want to investigate and support.
Direction of effect	Assumes no relation or no effect.	Usually states a relation or effect, expressed with ≠, < or >.

34. What is Standard Error? How is it related to Variance of a Data Distribution?

Standard Error is the standard deviation of the sampling distribution of the sample mean, i.e., a measure of the variability or uncertainty associated with the sample mean. It tells us how much the sample mean is likely to vary from the population mean if we were to take multiple random samples of the same size from the same population. It is given as:

[Tex]SE = \frac {\sigma}{\sqrt {n}} [/Tex]

where, [Tex]\sigma [/Tex] = population standard deviation

n = the sample size

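Standard error is directly proportional to the standard deviation, i.e., to the square root of the variance: as the variance of the data increases, the standard error increases too. For a fixed variance, it decreases as the sample size n grows.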
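
A short sketch of the formula, with hypothetical values for the population standard deviation and sample size, shows both effects.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

se_base = standard_error(sigma=10, n=25)
se_more_spread = standard_error(sigma=20, n=25)   # doubling sigma doubles the SE
se_more_data = standard_error(sigma=10, n=100)    # quadrupling n halves the SE
```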

Top 50 Plus Interview Questions for Statistics with Answers 2023