sample mean vs sample proportion

Sample mean and sample proportion are both statistics used to summarize data, but they serve different purposes and are calculated differently. Here’s a detailed comparison of the two:

1. Sample Mean

Definition:
The sample mean is the average of a set of numerical values. It provides a measure of central tendency, indicating where the center of the data lies.

Calculation:
The sample mean \( \bar{x} \) is calculated as:

\[
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
\]

where:

\( n \) = number of observations in the sample
\( x_i \) = each individual observation

Characteristics:

Data Type: Used for quantitative (continuous or discrete) data.
Range: Can take any real number value, depending on the data.
Sensitivity: The mean is sensitive to outliers (extreme values can significantly affect the mean).
Distribution: By the Central Limit Theorem, the sampling distribution of the sample mean becomes approximately normal as \( n \) increases, even if the original data are not normally distributed, provided the population has finite variance.
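
The Central Limit Theorem behavior described above can be illustrated with a small simulation. This is a minimal sketch, assuming an exponential population with rate 1 (chosen only because it is clearly non-normal). The function name simulate_sample_means and the parameters num_samples and seed are illustrative and not part of the original discussion.

```python
import random
import statistics


def simulate_sample_means(n, num_samples=2000, seed=0):
    """Draw num_samples samples of size n from an Exponential(1) population
    and return the list of their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


for n in (2, 30, 200):
    means = simulate_sample_means(n)
    # Exponential(1) has mean 1 and standard deviation 1, so the sample
    # means should center near 1 with spread near 1 / sqrt(n).
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

As n grows, the printed spread should shrink roughly like 1 / sqrt(n), and a histogram of the simulated means would look increasingly bell-shaped.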

Example:
If you have a sample of test scores: 85, 90, 75, 88, the sample mean would be:
\[
\bar{x} = \frac{85 + 90 + 75 + 88}{4} = \frac{338}{4} = 84.5
\]
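
The same calculation can be checked in a few lines of Python, along with the outlier sensitivity noted under Characteristics. This is only a sketch: the extra score of 10 is a hypothetical value added for illustration and is not part of the example data.

```python
import statistics

scores = [85, 90, 75, 88]
mean_scores = statistics.fmean(scores)  # (85 + 90 + 75 + 88) / 4

# Hypothetical outlier, added only to show how far one extreme value
# can pull the mean; it is not part of the original sample.
scores_with_outlier = scores + [10]
mean_with_outlier = statistics.fmean(scores_with_outlier)  # 348 / 5

print(mean_scores)        # 84.5
print(mean_with_outlier)  # 69.6
```

A single low score moves the mean by almost 15 points, even though the other four scores are unchanged.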

2. Sample Proportion

Definition:
The sample proportion is the ratio of the number of successes in a sample to the total number of observations in that sample. It is particularly useful when you’re dealing with categorical data.

Calculation:
The sample proportion \( \hat{p} \) is calculated as:

\[
\hat{p} = \frac{x}{n}
\]

where:

\( x \) = number of successes (the count of a certain category or outcome)
\( n \) = total number of observations in the sample

Characteristics:

Data Type: Used for categorical (qualitative) data where outcomes can be categorized (e.g., yes/no, success/failure).
Range: Values range between 0 and 1 (or expressed as a percentage, 0% to 100%).
Interpretation: Represents the fraction of successes observed in the sample, which serves as an estimate of the underlying population proportion.
Distribution: For large samples, the distribution of the sample proportion is approximately normal when both \( np \) and \( n(1-p) \) are greater than 5, where \( p \) is the population proportion. This is a common rule of thumb; some texts require 10 instead.
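
The rule of thumb above is easy to express as a check. The sketch below is illustrative only: the function names normal_approx_ok and standard_error and the threshold parameter are assumptions made for this example. It also computes the standard deviation of the sample proportion, sqrt(p(1 - p) / n), which is the spread used in the normal approximation. Because the population proportion p is usually unknown, the sample proportion is often plugged in for it in practice.

```python
import math


def normal_approx_ok(n, p, threshold=5):
    """Rule-of-thumb check: both n*p and n*(1-p) must exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold


def standard_error(n, p):
    """Standard deviation of the sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


print(normal_approx_ok(100, 0.4))  # expected counts 40 and 60: condition met
print(normal_approx_ok(20, 0.1))   # expected count 2 for successes: not met
print(round(standard_error(100, 0.4), 4))
```

Passing threshold=10 applies the stricter version of the rule.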

Example:
If you conducted a survey of 100 people, and 40 of them said they preferred brand A, the sample proportion would be:
\[
\hat{p} = \frac{40}{100} = 0.40 \text{ (or 40\%)}
\]
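
In code, the survey can be represented as a list of 0/1 indicators. This is a minimal sketch, assuming a hypothetical encoding where 1 means "prefers brand A" and 0 means anything else. It also shows that the sample proportion equals the sample mean of those indicators, which is why the two statistics behave so similarly in large samples.

```python
import statistics

# Hypothetical encoding of the survey: 1 = prefers brand A, 0 = otherwise.
responses = [1] * 40 + [0] * 60

successes = sum(responses)  # x, the number of successes
n = len(responses)          # total number of observations
p_hat = successes / n

# The sample proportion is the sample mean of the 0/1 indicators.
mean_of_indicators = statistics.fmean(responses)

print(p_hat)               # 0.4
print(mean_of_indicators)  # 0.4
```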

Summary

Feature	Sample Mean	Sample Proportion
Definition	Average of numerical data	Ratio of successes in categorical data
Data Type	Quantitative	Categorical
Calculation	\( \bar{x} = \frac{\sum x_i}{n} \)	\( \hat{p} = \frac{x}{n} \)
Range	Any real number	[0, 1] (or percentage)
Sensitivity	Sensitive to outliers	Less sensitive: each observation counts as 0 or 1, so one observation shifts it by at most 1/n
Distribution	Approaches normality (CLT)	Approaches normality (CLT) under certain conditions

Both statistics are essential in inferential statistics and help researchers make conclusions about the larger population based on sample data.