Inference for a Proportion
Let \(\pi\) represent the proportion of bike owners on campus, and suppose \(\pi = 0.15\).
\(p_1\) - Proportion of first sample (n = 100)
[1] 0.17
\(p_2\) - Proportion of second sample (n = 100)
[1] 0.12
\(p_3\) - Proportion of third sample (n = 100)
[1] 0.14
If certain conditions are met then
\[p \sim \text{approximately } N(\pi, \frac{\pi(1-\pi)}{n})\]
\(p \sim \text{approximately } N(\text{mean} = \pi, \text{sd} = \sqrt{\frac{\pi(1-\pi)}{n}})\)
We call the standard deviation of the sampling distribution the standard error of the estimate.
Standard error of a proportion is \(\sqrt{\frac{p(1-p)}{n}}\).
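The normal approximation above can be checked with a small simulation. The sketch below is in Python (the output shown in these slides appears to come from R). The seed, the number of repetitions, and the function name are assumptions made for illustration, not part of the original material.

```python
import math
import random
import statistics

def simulate_sample_proportions(pi, n, reps, seed=1):
    """Draw `reps` samples of size `n` and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    return [sum(rng.random() < pi for _ in range(n)) / n for _ in range(reps)]

pi, n = 0.15, 100
props = simulate_sample_proportions(pi, n, reps=10_000)

# Standard deviation of the sampling distribution: sqrt(pi(1 - pi)/n)
theoretical_sd = math.sqrt(pi * (1 - pi) / n)

print("mean of sample proportions:", round(statistics.mean(props), 4))
print("sd of sample proportions:  ", round(statistics.stdev(props), 4))
print("theoretical sd:            ", round(theoretical_sd, 4))
```

If the approximation holds, the simulated mean should sit close to \(\pi\) and the simulated standard deviation close to \(\sqrt{\frac{\pi(1-\pi)}{n}}\).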
CI = \(\text{point estimate} \pm \text { margin of error}\)
CI = \(\text{point estimate} \pm \text { critical value} \times \text{standard error}\)
CI for a proportion = \(p \pm \text {critical value} \times \text{standard error}\)
CI for a proportion = \(p \pm \text {critical value} \times \sqrt{\frac{p(1-p)}{n}}\)
95% CI for a proportion = \(p \pm 1.96 \times \sqrt{\frac{p(1-p)}{n}}\) because about 95% of a normal distribution lies within 1.96 standard deviations of its mean.
Recall \(p = 0.17\) and \(n = 100\)
95% CI for a proportion = \(p \pm 1.96 \times \sqrt{\frac{p(1-p)}{n}}\)
95% CI = \(0.17 \pm 1.96 \times \sqrt{\frac{0.17(1-0.17)}{100}}\)
95% CI = \(0.17 \pm 1.96 \times 0.03756328\)
95% CI = \(0.17 \pm 0.07362403\)
95% CI = (0.09637597, 0.243624)
We are 95% confident that the true population proportion of bike owners is in this confidence interval.
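The arithmetic of this worked example can be reproduced in a few lines. This is a minimal Python sketch; the function name `proportion_ci` is hypothetical and chosen only for illustration.

```python
import math

def proportion_ci(p, n, critical_value=1.96):
    """Return (lower, upper) for p ± critical_value * sqrt(p(1 - p)/n)."""
    se = math.sqrt(p * (1 - p) / n)   # standard error of the sample proportion
    margin = critical_value * se      # margin of error
    return p - margin, p + margin

lower, upper = proportion_ci(0.17, 100)
print(f"95% CI = ({lower:.8f}, {upper:.8f})")
```

The default critical value of 1.96 corresponds to the 95% level used above; passing a different value would give an interval at another confidence level.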
I have taken 100 samples with \(n = 100\) and calculated the sample proportion, standard error, and 95% CI for each sample.
# A tibble: 100 × 4
       p     SE lower_bound upper_bound
   <dbl>  <dbl>       <dbl>       <dbl>
 1  0.19 0.0392      0.113        0.267
 2  0.21 0.0407      0.130        0.290
 3  0.15 0.0357      0.0800       0.220
 4  0.15 0.0357      0.0800       0.220
 5  0.13 0.0336      0.0641       0.196
 6  0.11 0.0313      0.0487       0.171
 7  0.16 0.0367      0.0881       0.232
 8  0.11 0.0313      0.0487       0.171
 9  0.19 0.0392      0.113        0.267
10  0.16 0.0367      0.0881       0.232
# ℹ 90 more rows
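A table like this one can be generated by simulation, which also shows how many of the intervals actually contain \(\pi\). The Python sketch below assumes the samples are drawn from a population with \(\pi = 0.15\), as in the bike example. The seed and the function name are hypothetical, so the individual rows will not match the table above.

```python
import math
import random

def simulate_intervals(pi, n, n_samples, seed=2024):
    """Return one row (p, SE, lower_bound, upper_bound) per simulated sample."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    rows = []
    for _ in range(n_samples):
        p = sum(rng.random() < pi for _ in range(n)) / n
        se = math.sqrt(p * (1 - p) / n)
        rows.append({"p": p, "SE": se,
                     "lower_bound": p - 1.96 * se,
                     "upper_bound": p + 1.96 * se})
    return rows

pi = 0.15
rows = simulate_intervals(pi, n=100, n_samples=100)

# Count the intervals that contain the true proportion.
covered = sum(r["lower_bound"] <= pi <= r["upper_bound"] for r in rows)
print(f"{covered} of {len(rows)} intervals contain pi = {pi}")
```

With a 95% confidence level, roughly 95 of 100 such intervals would be expected to contain \(\pi\), although any single run can differ.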
Which of the following confidence intervals would be the widest? Why?
CI = \(\text{point estimate} \pm \text { critical value} \times \text{standard error}\)
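The critical value increases with the confidence level while the point estimate and standard error stay the same, so the 99% CI would be the widest.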
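The effect of the confidence level on the width can be sketched numerically. The Python example below assumes the candidate levels are 90%, 95%, and 99% and reuses \(p = 0.17\), \(n = 100\) from the earlier example; both choices are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def critical_value(confidence):
    """Two-sided standard normal critical value for a confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

p, n = 0.17, 100
se = math.sqrt(p * (1 - p) / n)   # same standard error at every level

widths = {}
for confidence in (0.90, 0.95, 0.99):
    z = critical_value(confidence)
    widths[confidence] = 2 * z * se   # full width = 2 * margin of error
    print(f"{confidence:.0%} CI: critical value = {z:.3f}, "
          f"width = {widths[confidence]:.4f}")
```

Only the critical value changes from one level to the next, which is why the width grows with the confidence level.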
Researchers A, B, and C are interested in the proportion of bike ownership. They take separate samples of size 100, 500, and 1000, respectively. Each of them finds a sample proportion of 0.18. What a surprise! Which of the researchers will find the narrowest 95% CI?
Researcher A: 95% CI for a proportion = \(0.18 \pm 1.96 \times \sqrt{\frac{0.18(1-0.18)}{100}}\)
Researcher B: 95% CI for a proportion = \(0.18 \pm 1.96 \times \sqrt{\frac{0.18(1-0.18)}{500}}\)
Researcher C: 95% CI for a proportion = \(0.18 \pm 1.96 \times \sqrt{\frac{0.18(1-0.18)}{1000}}\)
As the sample size increases, the standard error decreases and so does the margin of error, so the confidence interval gets narrower. Researcher C would have the narrowest CI.
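The three margins of error can be compared directly. This is a minimal Python sketch; the helper name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(p, n, critical_value=1.96):
    """Margin of error: critical_value * sqrt(p(1 - p)/n)."""
    return critical_value * math.sqrt(p * (1 - p) / n)

p = 0.18
sample_sizes = (("A", 100), ("B", 500), ("C", 1000))
margins = {name: margin_of_error(p, n) for name, n in sample_sizes}

for name, margin in margins.items():
    print(f"Researcher {name}: {p} ± {margin:.4f}")
```

Because \(n\) sits under a square root, a sample ten times larger narrows the interval by a factor of \(\sqrt{10} \approx 3.16\), not by a factor of 10.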
If these conditions are met then
\(p \sim \text{approximately } N(\pi, \frac{\pi(1-\pi)}{n})\)
Example
According to a Gallup survey of 1017 adults living in the US, 66% of Americans favor legalizing marijuana.
Compute a 95% confidence interval for the population proportion of those who favor legalizing marijuana.
Information on the survey can be found here
\(n = 1017\) and \(p = 0.66\)
\(np = 1017 \cdot 0.66 = 671.22\). There are more than 10 people favoring legalizing marijuana. \(n(1-p) = 1017 \cdot (1-0.66) = 345.78\). There are more than 10 people opposing legalizing marijuana.
CI = \(\text{point estimate} \pm \text { margin of error}\)
CI = \(\text{point estimate} \pm \text { critical value} \times \text{standard error}\)
95% CI for a proportion = \(p \pm 1.96 \times \sqrt{\frac{p(1-p)}{n}}\)
95% CI = \(0.66 \pm 1.96 \times \sqrt{\frac{0.66(1-0.66)}{1017}}\)
95% CI = (0.6308857, 0.6891143)
We are 95% confident that the true proportion of Americans who support legalizing marijuana falls between 0.6308857 and 0.6891143.
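The condition check and the interval for this example can be combined in one short Python sketch. The function name and the choice to raise an error when the counts fall below 10 are assumptions made for illustration.

```python
import math

def checked_proportion_ci(p, n, critical_value=1.96, threshold=10):
    """CI for a proportion, computed only if np and n(1 - p) reach `threshold`."""
    successes, failures = n * p, n * (1 - p)
    if successes < threshold or failures < threshold:
        raise ValueError("fewer than 10 expected successes or failures")
    se = math.sqrt(p * (1 - p) / n)
    return p - critical_value * se, p + critical_value * se

n, p = 1017, 0.66
print(f"np = {n * p:.2f}, n(1 - p) = {n * (1 - p):.2f}")

lower, upper = checked_proportion_ci(p, n)
print(f"95% CI = ({lower:.7f}, {upper:.7f})")
```

Checking the counts before computing the interval keeps the normal approximation from being applied to a sample where it would not be reasonable.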
If we do not find any evidence against the null (a single pink cow), then we fail to reject the null. We can keep searching for more evidence against the null (i.e., keep looking for a pink cow), but we can never say that the null is true, so we never accept the null. We only fail to reject it.
We are searching the sample for evidence against the null.
Do the majority of Americans approve of allowing DACA immigrants to become citizens?
\(H_0: \pi \leq 0.5\)
\(H_A: \pi > 0.5\)
Recall that according to the CLT \(p \sim \text{approximately }N(\pi, \frac{\pi(1-\pi)}{n})\)
If \(H_0: \pi = 0.5\) then the null sampling distribution would be \(N(0.5, \frac{0.5(1-0.5)}{n})\)
According to a Gallup survey of 1520 US adults, 83% approve of allowing DACA immigrants to become citizens.
\(p = 0.83\)
\(n = 1520\)
We said that the null sampling distribution would be \(N(0.5, \frac{0.5(1-0.5)}{n})\) which is
\(N(0.5, \frac{0.5(1-0.5)}{1520})\)
\(N(0.5, 0.0001644737)\)
Any sample proportion that falls really far away from the center of the distribution would count as evidence against the null.
If the null is true, then it would be unlikely to observe extremely high or low sample proportions.
We want to know the probability of observing an extreme sample proportion like ours (p = 0.83) if the \(H_0\) were true.
If our sample proportion is “extreme” then so are 0.90, 0.91, 0.917, 0.9273423, etc.
If the \(H_0\) is true what is the probability that we will observe an extremely high sample proportion?
Probability of observing a sample proportion of 0.83 or higher
The p-value is the probability of observing a sample statistic at least as extreme as the one that has been observed if the null hypothesis were true.
If the p-value is less than 0.05, we reject the null hypothesis.
In the Gallup survey the sample proportion was 0.83.
If the null hypothesis (\(H_0: \pi = 0.5\)) were true, then it would be quite unlikely (p-value = \(2.594241 \times 10^{-146}\)) to observe a sample proportion at least as extreme as the one we observed. We consider this “quite unlikely” since the p-value is less than 0.05.
Since we nonetheless observed a sample proportion of 0.83, we reject \(H_0\).
We conclude that the population proportion of Americans who approve of allowing DACA immigrants to become citizens is higher than 0.5.
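The null sampling distribution and the p-value for this test can be reproduced with the standard library. The Python sketch below computes the upper tail with `math.erfc` because the probability is far too small for `1 - cdf(z)`, which rounds to zero in floating point. The variable names are assumptions made for illustration.

```python
import math

pi_0 = 0.5     # proportion under the null hypothesis
p_hat = 0.83   # observed sample proportion
n = 1520

variance = pi_0 * (1 - pi_0) / n   # variance of the null sampling distribution
sd = math.sqrt(variance)
z = (p_hat - pi_0) / sd            # standard deviations above the null mean

# Upper-tail probability P(Z >= z) for a standard normal Z.
# erfc keeps precision in the far tail, where 1 - cdf(z) would return 0.0.
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(f"null distribution: N(mean = {pi_0}, variance = {variance:.10f})")
print(f"z = {z:.4f}")
print(f"p-value = {p_value:.6e}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The test is one-sided because \(H_A: \pi > 0.5\), so only sample proportions of 0.83 or higher count as at least as extreme.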
The Gallup website indicates that the sample was randomly selected. We will assume independence.
1520 < 10% of the US population