# Calculating confidence intervals for proportions

Here are a couple of functions for calculating confidence intervals for proportions.

UPDATE: These confidence intervals, together with many more, have since been implemented in the binom package (binom.confint). Use them instead. For Stata users, CIs for proportions are available with the ci program.
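As a rough sketch of what that call looks like (assuming the binom package is installed; the counts 9 and 30 are only illustrative and correspond to p = 0.3 with n = 30):

```r
# Assumes the binom package is available: install.packages("binom")
res <- NULL
if (requireNamespace("binom", quietly = TRUE)) {
  # x = number of successes, n = number of trials (illustrative values)
  res <- binom::binom.confint(x = 9, n = 30, conf.level = 0.95,
                              methods = c("asymptotic", "wilson"))
  print(res[, c("method", "lower", "upper")])
}
```

Note that binom.confint takes the number of successes rather than the proportion, and its results may differ slightly from the functions in this post depending on whether a continuity correction is applied.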

Firstly, I give you the Simple Asymptotic Method:

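```
# Simple asymptotic interval for a proportion.
# n: sample size, p: observed proportion, z: normal quantile,
# cc: apply the continuity correction?
simpasym <- function(n, p, z=1.96, cc=TRUE){
  out <- list()
  if(cc){
    out$lb <- p - z*sqrt((p*(1-p))/n) - 0.5/n
    out$ub <- p + z*sqrt((p*(1-p))/n) + 0.5/n
  } else {
    out$lb <- p - z*sqrt((p*(1-p))/n)
    out$ub <- p + z*sqrt((p*(1-p))/n)
  }
  out
}
```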

which can be used thusly:

```
simpasym(n=30, p=0.3, z=1.96, cc=TRUE)
$lb
[1] 0.119348
$ub
[1] 0.480652
```

Here n is the sample size, p is the observed proportion, z is the standard normal quantile for the desired confidence level (e.g. 1.96 gives the 95% CI) and cc indicates whether a continuity correction should be applied. The function returns the lower bound ($lb) and the upper bound ($ub).
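If you start from counts and a confidence level rather than from p and z, the two inputs can be derived along these lines (a small sketch; the variable names and the 9-out-of-30 counts are just for illustration):

```r
conf_level <- 0.95

# Two-sided interval: put half of the leftover probability in each tail.
z <- qnorm(1 - (1 - conf_level) / 2)   # roughly 1.96 for a 95% interval

# Observed proportion from a count of events and a sample size.
x <- 9
n <- 30
p <- x / n

c(z = z, p = p)
```

Swapping conf_level for 0.99 gives the z value for a 99% interval instead.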

The second method is the Score method and is defined as follows:

```
# Score interval for a proportion.
# Arguments are the same as for simpasym.
scoreint <- function(n, p, z=1.96, cc=TRUE){
  out <- list()
  q <- 1-p
  zsq <- z^2
  denom <- (2*(n+zsq))
  if(cc){
    numl <- (2*n*p)+zsq-1-(z*sqrt(zsq-2-(1/n)+4*p*((n*q)+1)))
    numu <- (2*n*p)+zsq+1+(z*sqrt(zsq+2-(1/n)+4*p*((n*q)-1)))
    out$lb <- numl/denom
    out$ub <- numu/denom
    # With the correction, pin the bounds at the extremes
    if(p==1) out$ub <- 1
    if(p==0) out$lb <- 0
  } else {
    out$lb <- ((2*n*p)+zsq-(z*sqrt(zsq+(4*n*p*q))))/denom
    out$ub <- ((2*n*p)+zsq+(z*sqrt(zsq+(4*n*p*q))))/denom
  }
  out
}
```

and is used in the same manner as simpasym…

```
scoreint(n=30, p=0.3, z=1.96, cc=TRUE)
$lb
[1] 0.1541262
$ub
[1] 0.4955824
```

These formulae (and a couple of others) are discussed in Newcombe (1998), who suggests that the score method should be more widely available in statistical software packages.
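For what it's worth, base R's prop.test() reports a score-type interval with continuity correction by default, so it can serve as a quick cross-check. A minimal sketch, assuming 9 events in 30 (i.e. p = 0.3, n = 30, as in the example above):

```r
# prop.test() takes counts, not proportions: 9 events out of 30.
ci <- prop.test(x = 9, n = 30, conf.level = 0.95, correct = TRUE)$conf.int

# Should land very close to scoreint(n=30, p=0.3), i.e. about 0.154 and 0.496.
# Tiny differences arise because prop.test() uses qnorm(0.975) rather than 1.96.
as.numeric(ci)
```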

Hope that helps someone!

Reference:

Newcombe, R. G. (1998) Two-sided confidence intervals for the single proportion: comparison of seven methods. Statist. Med., 17: 857-872. doi: 10.1002/(SICI)1097-0258(19980430)17:8<857::AID-SIM777>3.0.C

Hello,

for your information,

the package: binGroup provides a function which calculates several confidence intervals for a single binomial proportion.

Best Regards

Johannes

Thanks for that. I’ll look into it next time I need to do something with proportions.

I may be misunderstanding your eNote, as I deal with a different kind of data, but if you are working with proportional data I assume you are talking about a closed array [?] in which the proportions sum to 1 or 100. If that is so, how can you apply normal classical statistics without transformation? You hit the constant-sum problem and are working in simplex space, not real-number space. See, for example, the last couple of paragraphs at http://www.geol.lsu.edu/Faculty/Hart/enotes/eNotes.html/eNote00 or, more importantly, Aitchison’s works [1981, 1982, 1983, 1984, 1986, 2003] and those of Pawlowsky-Glahn et al. [2006, 2007].

I apologize if I have misread you, but if not it sounds like trouble!

George Hart

Hi George,

Yeah, I think there might be some misunderstanding. I wasn’t particularly talking about arrays of values at all…just individual values.

I’ll briefly explain what I was doing. Using some medical data, I found that 3 of 40 people suffered from a disease, and I wanted a CI for that 7.5%. Newcombe lists seven methods (not an exhaustive list) for calculating such a CI, of which four can be calculated using the functions in the post. This is pretty well-established stuff though, as far as I’m aware…

There are issues with the simple asymptotic method at high and low proportions though: the interval can extend outside the 0-1 range. The Score method doesn’t seem to suffer from this.

```
> simpasym(40,1/40)
$lb
[1] -0.03588362
$ub
[1] 0.08588362
> scoreint(40,.1/40)
$lb
[1] 0.001341018
$ub
[1] 0.113182
```
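The same thing can be seen for the 3-of-40 case mentioned above. This is only a sketch: it writes the two uncorrected formulas out inline (the same expressions as the cc=FALSE branches in the post), so it runs without the functions being loaded.

```r
# 3 events among 40 people, as in the example above (p = 0.075).
x <- 3
n <- 40
p <- x / n
z <- 1.96

# Simple asymptotic interval, no continuity correction.
wald <- p + c(-1, 1) * z * sqrt(p * (1 - p) / n)

# Score interval, no continuity correction.
q <- 1 - p
score <- (2 * n * p + z^2 + c(-1, 1) * z * sqrt(z^2 + 4 * n * p * q)) /
  (2 * (n + z^2))

wald[1] < 0                     # TRUE: the lower bound falls below zero
all(score >= 0 & score <= 1)    # TRUE: both bounds stay inside [0, 1]
```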

In case someone was looking for the link, it’s http://www.geol.lsu.edu/Faculty/Hart/eNotes/eNote00.pdf

You might also be interested in the Agresti-Coull intervals. See:

Agresti, A. and Coull, B. A. (1998). Approximate is better than “exact” for interval estimation of binomial proportions. The American Statistician, 52(2):119–126.

and perhaps:

Agresti, A. and Caffo, B. (2000). Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. The American Statistician, 54(4):280–288.
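For anyone curious, here is a minimal sketch of the Agresti-Coull interval as it is usually stated: add z²/2 successes and z²/2 failures (roughly two of each at the 95% level), then apply the simple asymptotic formula to the adjusted values. The function name agresti_coull and its arguments are made up for this illustration, not taken from any package.

```r
# Hypothetical helper: Agresti-Coull interval from a count x out of n.
agresti_coull <- function(x, n, z = 1.96) {
  n_adj <- n + z^2                  # adjusted sample size
  p_adj <- (x + z^2 / 2) / n_adj    # adjusted proportion
  half  <- z * sqrt(p_adj * (1 - p_adj) / n_adj)
  list(lb = p_adj - half, ub = p_adj + half)
}

# 9 events out of 30, i.e. the p = 0.3, n = 30 example from the post.
agresti_coull(x = 9, n = 30)
```

Like the simple asymptotic interval, this one can still stray slightly outside 0-1 for extreme proportions, so the bounds may need truncating in practice.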

Thank you for your thoughts on confidence intervals of proportions. Your R code has been very helpful.