Common Mistakes in Statistics

Posted by Jinjian Mu on October 11, 2021

Confidence intervals for hypothesis testing

Some people may wonder whether we can perform hypothesis tests for group comparisons by comparing two confidence intervals. Strictly speaking, we can’t.

Let’s use the two-sample test with known variances (a two-sample z-test) as an example. Suppose \(X_1, \ldots, X_{n_1} \overset{\text{iid}}{\sim} N(\mu_1, \sigma_1^2)\) and \(Y_1, \ldots, Y_{n_2} \overset{\text{iid}}{\sim} N(\mu_2, \sigma_2^2)\), where \(\sigma_1\) and \(\sigma_2\) are both known. We want to test \(H_0: \mu_1=\mu_2\) at level \(\alpha\). The \(1-\alpha\) confidence interval for \(\mu_1 - \mu_2\) that corresponds to this test is \[\bar{X} - \bar{Y}\pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\] so we can carry out the test by checking whether this confidence interval contains 0: we reject \(H_0\) exactly when it does not.
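As a rough illustration of the interval above, here is a minimal Python sketch that computes the confidence interval for \(\mu_1 - \mu_2\) and checks whether it contains 0. The function names (`diff_ci`, `rejects_h0`) and the summary statistics are hypothetical, chosen only for this example; they are not from the original post.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(xbar, ybar, sigma1, sigma2, n1, n2, alpha=0.05):
    """(1 - alpha) confidence interval for mu1 - mu2 with known sigma1, sigma2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)          # z_{alpha/2}
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)       # standard error of xbar - ybar
    d = xbar - ybar
    return d - z * se, d + z * se


def rejects_h0(ci):
    """Reject H0: mu1 = mu2 when the interval does not contain 0."""
    lo, hi = ci
    return not (lo <= 0 <= hi)


# Hypothetical summary statistics, for illustration only.
ci = diff_ci(xbar=10.5, ybar=10.0, sigma1=1.0, sigma2=1.0, n1=50, n2=50)
print(ci)              # roughly (0.108, 0.892)
print(rejects_h0(ci))  # True: 0 lies outside the interval
```

With these made-up numbers the standard error is 0.2, so the interval sits entirely above 0 and the test rejects \(H_0\) at the 5% level.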

The confidence interval for \(\mu_1\) is \(\bar{X}\pm z_{\alpha/2}\frac{\sigma_1}{\sqrt{n_1}}\), and the confidence interval for \(\mu_2\) is \(\bar{Y}\pm z_{\alpha/2}\frac{\sigma_2}{\sqrt{n_2}}\). A tempting alternative is to test \(H_0\) by checking whether these two confidence intervals overlap. Without loss of generality, we can assume \(\bar{X}>\bar{Y}\). Then we can check whether the two intervals overlap by comparing the lower limit of the interval for \(\mu_1\), \(\bar{X}- z_{\alpha/2}\frac{\sigma_1}{\sqrt{n_1}}\), with the upper limit of the interval for \(\mu_2\), \(\bar{Y}+ z_{\alpha/2}\frac{\sigma_2}{\sqrt{n_2}}\). In other words, the two confidence intervals overlap if \[\left(\bar{X}- z_{\alpha/2}\frac{\sigma_1}{\sqrt{n_1}}\right) - \left(\bar{Y}+ z_{\alpha/2}\frac{\sigma_2}{\sqrt{n_2}}\right) < 0.\]

The difference \(\left(\bar{X}- z_{\alpha/2}\frac{\sigma_1}{\sqrt{n_1}}\right) - \left(\bar{Y}+ z_{\alpha/2}\frac{\sigma_2}{\sqrt{n_2}}\right) = \bar{X} - \bar{Y} - z_{\alpha/2}\left(\frac{\sigma_1}{\sqrt{n_1}} + \frac{\sigma_2}{\sqrt{n_2}}\right)\) is not equal to the lower limit of the confidence interval for \(\mu_1 - \mu_2\), \(\bar{X} - \bar{Y}- z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\), so checking whether the two confidence intervals overlap is not equivalent to the test. However, since \(\frac{\sigma_1}{\sqrt{n_1}}\) and \(\frac{\sigma_2}{\sqrt{n_2}}\) are both positive, \(\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \sqrt{\left(\frac{\sigma_1}{\sqrt{n_1}} + \frac{\sigma_2}{\sqrt{n_2}}\right)^2} = \frac{\sigma_1}{\sqrt{n_1}} + \frac{\sigma_2}{\sqrt{n_2}}\), and hence \[\bar{X} - \bar{Y}- z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} > \bar{X} - \bar{Y} - z_{\alpha/2}\left(\frac{\sigma_1}{\sqrt{n_1}} + \frac{\sigma_2}{\sqrt{n_2}}\right).\] Therefore, if \(\bar{X} - \bar{Y} - z_{\alpha/2}\left(\frac{\sigma_1}{\sqrt{n_1}} + \frac{\sigma_2}{\sqrt{n_2}}\right) > 0\), then \(\bar{X} - \bar{Y}- z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} > 0\). That is, if the two confidence intervals for \(\mu_1\) and \(\mu_2\) don’t overlap, then the confidence interval for \(\mu_1-\mu_2\) doesn’t contain 0, and \(\mu_1\) and \(\mu_2\) are significantly different at level \(\alpha\). The converse does not hold: because the inequality is strict, the two intervals can overlap while the confidence interval for \(\mu_1-\mu_2\) still excludes 0, so overlap alone does not show that the difference is not significant. The conclusion that non-overlapping intervals imply a significant difference is also correct in other cases.
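To make the gap between the two criteria concrete, the following Python sketch compares the overlap rule with the z-test on a few made-up inputs. The function names and all numbers are assumptions introduced for illustration only; the logic simply encodes the two inequalities derived above.

```python
from math import sqrt
from statistics import NormalDist


def intervals_overlap(xbar, ybar, sigma1, sigma2, n1, n2, alpha=0.05):
    """True if the separate CIs for mu1 and mu2 overlap."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Overlap iff |xbar - ybar| <= z * (sigma1/sqrt(n1) + sigma2/sqrt(n2)).
    return abs(xbar - ybar) <= z * (sigma1 / sqrt(n1) + sigma2 / sqrt(n2))


def z_test_rejects(xbar, ybar, sigma1, sigma2, n1, n2, alpha=0.05):
    """True if the CI for mu1 - mu2 excludes 0, i.e. the z-test rejects H0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Reject iff |xbar - ybar| > z * sqrt(sigma1^2/n1 + sigma2^2/n2).
    return abs(xbar - ybar) > z * sqrt(sigma1**2 / n1 + sigma2**2 / n2)


# Hypothetical settings: sigma1 = sigma2 = 1 and n1 = n2 = 50 throughout.
common = dict(sigma1=1.0, sigma2=1.0, n1=50, n2=50)

for xbar in (10.3, 10.5, 10.6):
    overlap = intervals_overlap(xbar, 10.0, **common)
    reject = z_test_rejects(xbar, 10.0, **common)
    print(f"xbar={xbar}: overlap={overlap}, z-test rejects={reject}")
```

In this setup the z-test rejects once the difference in sample means exceeds about 0.392, whereas the intervals stop overlapping only beyond about 0.554. The middle case (a difference of 0.5) therefore has overlapping intervals and a significant difference at the same time, while no input can produce non-overlapping intervals without a rejection.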