What is the standard error of the sample mean?

The standard error of the sample mean measures the variability of sample means around the population mean.

How is the standard error of the sample mean calculated?

It is calculated by dividing the sample's standard deviation by the square root of the sample size.

What is a confidence interval?

A confidence interval provides a range of values that is likely to contain the population parameter of interest.

How do you interpret a 95% confidence interval?

A 95% confidence interval means that 95% of such intervals would contain the true population parameter if the population were sampled repeatedly.

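What is the difference between the z-statistic and t-statistic in hypothesis testing?

The z-statistic is used when the population variance is known, and is an acceptable approximation when the variance is unknown but the sample is large (n ≥ 30). The t-statistic is used when the population variance is unknown and has to be estimated from the sample, and it is the appropriate choice when the sample is small and the population is normal.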

What is a null hypothesis in hypothesis testing?

The null hypothesis is a statement that there is no effect or no difference, and it is tested against an alternative hypothesis.

What are Type I and Type II errors?

A Type I error occurs when the null hypothesis is incorrectly rejected, while a Type II error occurs when the null hypothesis is not rejected when it is false.

How does sample size affect Type I and Type II errors?

For a fixed significance level, increasing the sample size reduces the probability of a Type II error, while the probability of a Type I error stays at the chosen significance level.

What is the power of a test in hypothesis testing?

The power of a test is the probability of correctly rejecting the null hypothesis when it is false.

How is the equality of two means tested?

The equality of two means is tested using a hypothesis test that compares the difference between the sample means to a critical value from a statistical distribution.

Standard Error of Sample Mean

The standard error of the sample mean is the standard deviation of the distribution of the sample means.

σ_x̄ = σ / √n

where σ is the standard deviation of the population and n is the sample size.

It is the same as the standard deviation of the sample mean discussed earlier.

In practice, the population’s standard deviation is usually not known, so the standard error of the sample mean is estimated by dividing the standard deviation of the sample by √n.

s_x̄ = s / √n

where s is the standard deviation of the sample.
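As a minimal sketch of the estimate described above, the snippet below computes s / √n for a small sample. The data and the function name `standard_error` are hypothetical and serve only as an illustration.

```python
import math
import statistics


def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)


# Hypothetical monthly returns (in %), for illustration only
returns = [1.2, -0.4, 0.8, 2.1, -1.0, 0.5, 1.7, 0.3]
print(f"mean = {statistics.mean(returns):.3f}")
print(f"standard error = {standard_error(returns):.3f}")
```

Because the standard error shrinks with √n, quadrupling the sample size only halves it.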

Confidence Intervals

A confidence interval provides a range of values which is likely to contain the population parameter of interest.
Confidence intervals are constructed at a confidence level selected by the user. For example, a 95% confidence level means that if the same population were sampled a very large number of times and an interval were estimated each time, approximately 95% of the resulting intervals would contain the true population parameter.
A simpler, but not completely satisfactory, interpretation (the layman's interpretation) is:
A confidence interval gives the probability that a value will fall between the upper and lower bounds of a probability distribution.
For example, if the 95% confidence interval for stock A’s return over the next year is from -3% to 5%, then you are 95% confident that the return of stock A over the next year will fall between -3% and 5%.
The biggest misconception regarding confidence intervals is that they represent the percentage of data from a given sample that falls between the upper and lower bounds.
A confidence level of 1-α is the complement of the significance level α, i.e., if the significance level is α, then the confidence level is 1-α. For example, a 95% confidence level corresponds to a 5% level of significance.
In general, confidence intervals take on the following form:

point estimate ± (reliability factor × standard error)

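If the population has a normal distribution with a known variance, a confidence interval for the population mean can be calculated as:

x̄ ± z_(α/2) × σ / √n

where x̄ is the sample mean and z_(α/2) is the critical value of the standard normal distribution that leaves a probability of α/2 in the upper tail (1.96 for a 95% confidence level).

If the population variance is unknown, the critical value of the t-statistic (with n-1 degrees of freedom) is preferable as the reliability factor, although for large samples the critical value of the z-statistic is also permissible.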
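The known-variance interval and its repeated-sampling interpretation can be sketched as follows. This is an illustration under stated assumptions: the population (mean 10, standard deviation 2), the function names, and the simulation settings are all hypothetical, and only the z-based interval is shown because the Python standard library has no t-distribution quantile function.

```python
import math
import random
from statistics import NormalDist, mean


def z_confidence_interval(sample, sigma, confidence=0.95):
    """Return (lower, upper) for the population mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # reliability factor
    standard_error = sigma / math.sqrt(len(sample))
    point_estimate = mean(sample)
    return point_estimate - z * standard_error, point_estimate + z * standard_error


def coverage(mu, sigma, n, trials, confidence=0.95, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = z_confidence_interval(sample, sigma, confidence)
        hits += lower <= mu <= upper
    return hits / trials


# Hypothetical normal population: mean 10, standard deviation 2
print(coverage(mu=10, sigma=2, n=30, trials=2000))
```

The printed fraction should be close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes the long-run behaviour of the procedure.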

Test Statistic Selection criteria – z or t

Distribution Type	Population Variance	Sample Size n ≥ 30	Sample Size n < 30
Normal	Known	z-statistic	z-statistic
Non-Normal	Known	z-statistic	NA
Normal	Unknown	t-statistic (or z-statistic)	t-statistic
Non-Normal	Unknown	t-statistic (or z-statistic)	NA
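The selection criteria in the table can be written as a small lookup function. This is only a sketch: the function name and its return convention (`None` for the NA cells) are assumptions made for illustration.

```python
def select_test_statistic(is_normal, variance_known, n):
    """Return 'z', 't', or None (no valid test) following the table above."""
    large_sample = n >= 30
    if not is_normal and not large_sample:
        return None  # NA: non-normal population with a small sample
    if variance_known:
        return "z"
    # Variance unknown: t is preferred; z is also permissible when n >= 30
    return "t"


print(select_test_statistic(is_normal=True, variance_known=False, n=20))   # t
print(select_test_statistic(is_normal=False, variance_known=True, n=50))   # z
print(select_test_statistic(is_normal=False, variance_known=False, n=10))  # None
```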

Hypothesis Testing

Hypothesis testing is a method for testing a claim or hypothesis about a parameter in a population, using data measured in a sample.
It is a way of testing the results of a survey or experiment to assess whether meaningful results have been obtained or not.
It can also be interpreted as a method of determining the validity of the results by estimating the probability that they happened just by chance. If the results may have occurred by chance, the experiment is unlikely to be repeatable and so is of little use.
A hypothesis is a statement about the value of a population parameter that has to be tested.
For example, if the researcher is interested in the mean monthly return of a stock A, then an example of a hypothesis can be – “The mean monthly return of stock A is greater than 10%”.

Hypothesis Testing – Null and Alternate Hypothesis

Although it is not necessary, the null hypothesis, written as H_0, is generally constructed so that the desired result is false. This implies that the null hypothesis is the hypothesis the researcher wants to reject, because it contradicts the researcher's belief. For example, if the researcher believes that the mean return of stock A is greater than 10%, the null hypothesis will be constructed like this:

H_0: μ ≤ 10%

Based on the results of the test, if the sample provides strong enough evidence that the mean return is greater than 10%, then we say that we reject the null hypothesis at the chosen level of significance.

We never accept the null hypothesis; we either reject it or fail to reject it.
The alternate hypothesis, written as H_A, is stated in exactly the opposite terms to the null hypothesis (in the example above, H_A: μ > 10%), and is concluded if there is enough evidence to reject the null hypothesis.

Hypothesis Testing Procedure

1. State the hypothesis
2. State the level of significance, α
3. Select the appropriate test statistic, and specify the critical value of the test statistic based on the significance level stated in step 2
4. State the decision rule regarding the hypothesis, based on the critical value of the test statistic specified in step 3
5. Calculate the value of the sample test statistic
6. Make a decision based on step 4 and step 5
7. Draw a conclusion based on step 1 and step 6
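The procedure above can be sketched for the simplest case, a test of a single mean with known population standard deviation (a z-test). The function name, its parameters, and the numbers in the usage line are hypothetical and chosen for illustration only.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu_0, sigma, n, alpha=0.05, tail="right"):
    """Apply steps 2 to 6 to a hypothesis about a mean with known sigma.

    tail: "right" (H_A: mu > mu_0), "left" (H_A: mu < mu_0),
          or "two" (H_A: mu != mu_0).
    Returns (test_statistic, critical_value, reject_h0).
    """
    if tail not in ("right", "left", "two"):
        raise ValueError("tail must be 'right', 'left', or 'two'")
    std_normal = NormalDist()
    # Step 3: critical value implied by the significance level from step 2
    tail_area = alpha / 2 if tail == "two" else alpha
    critical = std_normal.inv_cdf(1 - tail_area)
    # Step 5: sample test statistic
    z = (sample_mean - mu_0) / (sigma / math.sqrt(n))
    # Steps 4 and 6: decision rule applied to the statistic
    if tail == "right":
        reject = z > critical
    elif tail == "left":
        reject = z < -critical
    else:
        reject = abs(z) > critical
    return z, critical, reject


# Hypothetical numbers, for illustration only
z, critical, reject = one_sample_z_test(sample_mean=10.8, mu_0=10.0, sigma=2.5, n=40)
print(f"z = {z:.3f}, critical value = {critical:.3f}, reject H_0: {reject}")
```

Steps 1 and 7 (stating the hypotheses and wording the conclusion) remain with the analyst; the function only covers the mechanical part.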

Hypothesis Testing Example – One-Tailed Test (Right)

An electric bulb manufacturer claims that the average lifetime of the bulbs it produces is more than 36 months, with a standard deviation of 3 months. A random sample of 50 bulbs was taken, and the average lifetime of those bulbs was found to be 37 months. Test the manufacturer’s claim at the 5% level of significance, and find the p-value.
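One way to work through the bulb example is sketched below. It assumes the stated standard deviation of 3 months is the known population value, so a z-statistic applies, and it sets up the claim as the alternative: H_0: μ ≤ 36 against H_A: μ > 36.

```python
import math
from statistics import NormalDist

mu_0, sigma = 36.0, 3.0    # hypothesised mean and population std. dev. (months)
n, sample_mean = 50, 37.0  # sample size and sample mean (months)
alpha = 0.05

# Right-tailed test: H_0: mu <= 36 against H_A: mu > 36
z = (sample_mean - mu_0) / (sigma / math.sqrt(n))
critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645
p_value = 1 - NormalDist().cdf(z)           # area to the right of z

print(f"z = {z:.3f}, critical value = {critical:.3f}, p-value = {p_value:.4f}")
print("Reject H_0" if z > critical else "Fail to reject H_0")
```

Under these assumptions the statistic is about 2.36, which exceeds the critical value of about 1.645, so the null hypothesis is rejected and the sample supports the manufacturer's claim. The p-value of roughly 0.009 is below 0.05 and leads to the same decision.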

Hypothesis Testing Example – One-Tailed Test (Left)

The owner of a chemist store claims that the average time in which customers are served in his store is less than 15 minutes. A random sample of 100 customers was observed, and the average service time was found to be 14 minutes and 30 seconds, with a sample standard deviation of 45 seconds. Test the owner’s claim at the 5% level of significance.
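A possible working of the chemist store example follows, with all times converted to seconds. Only the sample standard deviation is available, so the t-statistic would be the preferred choice; this sketch assumes the z critical value is an acceptable approximation because n = 100 is large, and because the Python standard library has no t-distribution quantile function.

```python
import math
from statistics import NormalDist

mu_0 = 15 * 60              # hypothesised mean service time (seconds)
sample_mean = 14 * 60 + 30  # 14 minutes 30 seconds, in seconds
s, n = 45.0, 100            # sample standard deviation (seconds), sample size
alpha = 0.05

# Left-tailed test: H_0: mu >= 900 s against H_A: mu < 900 s
test_statistic = (sample_mean - mu_0) / (s / math.sqrt(n))
critical = NormalDist().inv_cdf(alpha)  # about -1.645 (z approximation)

print(f"test statistic = {test_statistic:.3f}, critical value = {critical:.3f}")
print("Reject H_0" if test_statistic < critical else "Fail to reject H_0")
```

The statistic is about -6.67, far below the critical value, so the null hypothesis is rejected in favour of the owner's claim. The t critical value with 99 degrees of freedom is slightly larger in magnitude (about -1.66), which would not change the decision.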

Hypothesis Testing Example – Two Tailed Test

Blood glucose levels for obese patients of Mumbai have traditionally been found to have a mean of 120 with a standard deviation of 18. A diabetologist prescribes a new MF diet and claims that it will mostly have a positive effect on blood glucose levels, but he also warns that for some patients it may have a negative effect. A sample of 81 patients who have tried the new MF diet has a mean glucose level of 110. Test whether the MF diet had an effect, at the 5% level of significance.
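Because the diet may move glucose levels in either direction, the test is two-tailed: H_0: μ = 120 against H_A: μ ≠ 120. The sketch below assumes the traditional standard deviation of 18 still applies to the patients on the diet, so a z-statistic can be used.

```python
import math
from statistics import NormalDist

mu_0, sigma = 120.0, 18.0   # traditional mean and standard deviation
n, sample_mean = 81, 110.0  # sample size and sample mean
alpha = 0.05

# Two-tailed test: H_0: mu = 120 against H_A: mu != 120
z = (sample_mean - mu_0) / (sigma / math.sqrt(n))
critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
p_value = 2 * NormalDist().cdf(-abs(z))         # both tails

print(f"z = {z:.3f}, critical values = ±{critical:.3f}, p-value = {p_value:.2e}")
print("Reject H_0" if abs(z) > critical else "Fail to reject H_0")
```

Here the standard error is 18 / 9 = 2, so z = -10 / 2 = -5. This lies well outside ±1.96, and the null hypothesis of no effect is rejected.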

Type I, Type II Errors and Power of a Test

Type I error: rejecting the null hypothesis when it is actually true.
Type II error: failing to reject the null hypothesis when it is actually false.
The test size (or significance level) is the (maximum) probability of making a Type I error.
The power of a test is the probability of correctly rejecting the null hypothesis when it is false. Hence, the power of a test is actually one minus the probability of making a Type II error.

Inference \ Actual	H₀ is True	H₀ is False
Fail to Reject H₀	Correct Decision; Confidence Level = 1-α	Type II Error; P(Type II Error) = β
Reject H₀	Type I Error; P(Type I Error) = α	Correct Decision; Power = 1-β

Tradeoff Between Type I and Type II Errors

CHOICE OF SIGNIFICANCE LEVEL AND EFFECT OF SAMPLE SIZE

Decreasing the probability of a Type I error increases the probability of a Type II error, and vice versa, if the sample size is held constant.
If the significance level is fixed, the probability of a Type II error can only be decreased by increasing the sample size.
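The tradeoff can be illustrated numerically for a right-tailed z-test. The sketch below reuses the bulb figures (μ_0 = 36, σ = 3) and assumes a hypothetical true mean of 37; that true mean, the function name, and the grid of α and n values are illustrative choices.

```python
import math
from statistics import NormalDist


def type_ii_error(mu_0, mu_true, sigma, n, alpha):
    """beta for a right-tailed z-test of H_0: mu <= mu_0 when the mean is mu_true."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    # Under mu_true the test statistic is normal with this mean and variance 1
    shift = (mu_true - mu_0) / (sigma / math.sqrt(n))
    # H_0 is not rejected when the statistic falls at or below z_alpha
    return std_normal.cdf(z_alpha - shift)


for alpha in (0.10, 0.05, 0.01):
    for n in (25, 50, 100):
        beta = type_ii_error(mu_0=36, mu_true=37, sigma=3, n=n, alpha=alpha)
        print(f"alpha = {alpha:.2f}, n = {n:3d}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Reading down a fixed n, a smaller α gives a larger β. Reading across a fixed α, a larger n gives a smaller β and therefore higher power.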

Testing the Equality of Two Means

Testing whether the means of two series are equal is a common problem. Consider the iid bivariate random variable W_i = [X_i, Y_i]. The component random variables X_i and Y_i are each iid and may be contemporaneously correlated (i.e., Corr[X_i, Y_i] ≠ 0).
Now consider a test of the null hypothesis H₀: μ_X = μ_Y (i.e., that the component random variables X_i and Y_i have equal means). To implement the test, construct a new random variable:

Z_i = X_i – Y_i
If the null hypothesis is true, then:

E[Z_i] = E[X_i] – E[Y_i] = μ_X – μ_Y = 0

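This is a standard hypothesis test of H₀: μ_Z = 0 against the alternative H₁: μ_Z ≠ 0, and the test statistic is constructed as:

T = μ̂_Z / √(σ̂²_Z / n)

where μ̂_Z and σ̂²_Z are the sample mean and sample variance of Z_i, and n is the number of paired observations.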

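The calculated test statistic can be compared to the critical values of a standard normal distribution. This method automatically accounts for any correlation between X_i and Y_i, because the variance of the difference is σ̂²_X + σ̂²_Y – 2σ̂_XY, where σ̂_XY is the sample covariance. Hence the test statistic can be equivalently expressed as:

T = (μ̂_X – μ̂_Y) / √((σ̂²_X + σ̂²_Y – 2σ̂_XY) / n)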

This expression for T shows that the correlation between the two series affects the test statistic. If the two series are positively correlated, then the covariance between X_i and Y_i reduces the variance of the difference, which makes the test statistic larger in magnitude for a given difference in means.
When X_i and Y_i are independent, they are not paired as a bivariate random variable, and so the number of sample points for X_i (n_X) and Y_i (n_Y) may differ. The test statistic for testing that the means are equal (i.e., H₀: μ_X = μ_Y) is then given as:

T = (μ̂_X – μ̂_Y) / √(σ̂²_X / n_X + σ̂²_Y / n_Y)

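The paired and independent versions of the test can be sketched as below. The data and function names are hypothetical, and the sketch assumes the sample variances and covariance use the n-1 denominator; the source's estimators may use a different convention. The first two functions compute the paired statistic in its two equivalent forms.

```python
import math
from statistics import mean, variance


def paired_test(x, y):
    """T built from the differences Z_i = X_i - Y_i."""
    z = [xi - yi for xi, yi in zip(x, y)]
    return mean(z) / math.sqrt(variance(z) / len(z))


def paired_test_expanded(x, y):
    """The same statistic written with variances and the covariance."""
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    return (mean_x - mean_y) / math.sqrt((variance(x) + variance(y) - 2 * cov_xy) / n)


def independent_test(x, y):
    """T for independent samples, which may have different sizes."""
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))


# Hypothetical paired observations, for illustration only
x = [12.1, 9.8, 11.4, 10.9, 13.0, 10.2]
y = [11.0, 9.9, 10.1, 10.5, 12.2, 9.7]

print(f"paired:      {paired_test(x, y):.4f}")
print(f"expanded:    {paired_test_expanded(x, y):.4f}")
print(f"independent: {independent_test(x, y):.4f}")
```

The first two values agree. The third differs because it ignores the covariance, so for positively correlated series it is smaller in magnitude than the paired statistic.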
Testing the Equality of Two Means – Example

Consider two distributions that measure the average rainfall in City X and City Y, respectively. Their means, variances, sample size and correlation are –