Calculating confidence intervals for proportions

April 9, 2014

Here are a couple of functions for calculating confidence intervals for proportions.

UPDATE: These confidence intervals, together with many more, have since been implemented in the binom package (binom.confint). Use that instead. For Stata users, CIs for proportions are available with the ci command.
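For anyone who wants to try the package route, here is a minimal sketch. It assumes the binom package is installed from CRAN, and the counts (9 successes out of 30) are made-up example data. Note that binom.confint takes the number of successes x rather than the proportion p. As far as I know, its "asymptotic" and "wilson" methods correspond to the simple asymptotic and score intervals without continuity correction.

```r
# Sketch only: assumes the 'binom' package is installed from CRAN.
if (requireNamespace("binom", quietly = TRUE)) {
  x <- 9   # number of successes (hypothetical example data)
  n <- 30  # sample size
  ci <- binom::binom.confint(x = x, n = n, conf.level = 0.95,
                             methods = c("asymptotic", "wilson"))
  # One row per method, with the lower and upper limits
  print(ci[, c("method", "lower", "upper")])
} else {
  message("Package 'binom' is not installed; try install.packages(\"binom\").")
}
```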

Firstly I give you the Simple Asymptotic Method:

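# Simple asymptotic (Wald) interval for a proportion p from a sample of size n.
# z is the normal quantile (1.96 for 95%); cc toggles the continuity correction.
simpasym <- function(n, p, z=1.96, cc=TRUE){
  out <- list()
  se <- sqrt((p*(1-p))/n)        # standard error of the proportion
  corr <- if(cc) 0.5/n else 0    # continuity correction, 1/(2n)
  out$lb <- p - z*se - corr
  out$ub <- p + z*se + corr
  out
}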

which can be used thusly:

simpasym(n=30, p=0.3, z=1.96, cc=TRUE)
$lb
[1] 0.119348

$ub
[1] 0.480652

 

Here n is the sample size, p is the observed proportion, z is the z value for the desired confidence level (e.g. 1.96 gives the 95% CI) and cc controls whether a continuity correction is applied. The function returns the lower bound ($lb) and the upper bound ($ub).
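If you start from counts and a confidence level rather than from p and z, something like the following sketch gets you the inputs. The helper z_for_level is my own name for illustration (it is not part of the functions in this post), and the counts are made-up example data.

```r
# Hypothetical helper (not part of the original functions): two-sided z value
# for a given confidence level, e.g. 0.95 gives about 1.96.
z_for_level <- function(level) {
  stopifnot(level > 0, level < 1)
  qnorm(1 - (1 - level) / 2)
}

# Hypothetical example data: 9 events observed among 30 subjects.
x <- 9
n <- 30
p <- x / n  # observed proportion, 0.3

z90 <- z_for_level(0.90)
z95 <- z_for_level(0.95)
z99 <- z_for_level(0.99)
round(c(z90 = z90, z95 = z95, z99 = z99), 3)
```

The three values come out at roughly 1.645, 1.960 and 2.576, and any of them can be passed as the z argument together with n and p.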

The second method is the Score method and is defined as follows:

# Score (Wilson) interval for a proportion p from a sample of size n.
# cc=TRUE applies the continuity correction.
scoreint <- function(n, p, z=1.96, cc=TRUE){
  out <- list()
  q <- 1-p
  zsq <- z^2
  denom <- (2*(n+zsq))   # common denominator, 2(n + z^2)
  if(cc){
    numl <- (2*n*p)+zsq-1-(z*sqrt(zsq-2-(1/n)+4*p*((n*q)+1)))
    numu <- (2*n*p)+zsq+1+(z*sqrt(zsq+2-(1/n)+4*p*((n*q)-1)))
    out$lb <- numl/denom
    out$ub <- numu/denom
    # at p = 1 or p = 0 the corrected limit is set to the boundary
    if(p==1) out$ub <- 1
    if(p==0) out$lb <- 0
  } else {
    out$lb <- ((2*n*p)+zsq-(z*sqrt(zsq+(4*n*p*q))))/denom
    out$ub <- ((2*n*p)+zsq+(z*sqrt(zsq+(4*n*p*q))))/denom
  }
  out
}

and is used in the same manner as simpasym:

scoreint(n=30, p=0.3, z=1.96, cc=TRUE)
$lb
[1] 0.1541262

$ub
[1] 0.4955824
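For reference, this is how I read the expressions that scoreint computes, written out as formulas with q = 1 - p. It is a transcription of the code above rather than a quotation from the paper.

```latex
% Score interval without continuity correction, with q = 1 - p:
\[
\frac{2np + z^2 \pm z\sqrt{z^2 + 4npq}}{2(n + z^2)}
\]
% With continuity correction (lower limit L, upper limit U):
\[
L = \frac{2np + z^2 - 1 - z\sqrt{z^2 - 2 - \frac{1}{n} + 4p(nq + 1)}}{2(n + z^2)},
\qquad
U = \frac{2np + z^2 + 1 + z\sqrt{z^2 + 2 - \frac{1}{n} + 4p(nq - 1)}}{2(n + z^2)}
\]
```

In the corrected version, L is set to 0 when p = 0 and U is set to 1 when p = 1.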

These formulae (and a few others) are discussed by Newcombe (1998), who suggests that the score method should be more widely available in statistical software packages.
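As a sanity check, the continuity-corrected score interval can be compared with base R. To the best of my knowledge, the confidence interval reported by prop.test for a single proportion is this same interval, so the sketch below should agree up to floating point error. The function score_cc is a hypothetical standalone copy of the corrected branch of scoreint, repeated so that the block runs on its own, and the counts are made-up example data.

```r
# Continuity-corrected score interval, written out from the same formula
# as scoreint() so that this block runs on its own.
score_cc <- function(n, p, z) {
  q <- 1 - p
  denom <- 2 * (n + z^2)
  lb <- (2*n*p + z^2 - 1 - z * sqrt(z^2 - 2 - 1/n + 4*p*(n*q + 1))) / denom
  ub <- (2*n*p + z^2 + 1 + z * sqrt(z^2 + 2 - 1/n + 4*p*(n*q - 1))) / denom
  c(lb = lb, ub = ub)
}

x <- 9                # hypothetical data: 9 successes
n <- 30               # out of 30 trials
z <- qnorm(0.975)     # unrounded 95% z value rather than 1.96

manual  <- score_cc(n, x / n, z)
builtin <- prop.test(x, n, conf.level = 0.95, correct = TRUE)$conf.int

# Compare the two intervals, ignoring names and attributes
all.equal(unname(manual), as.numeric(builtin))
```

One caveat: prop.test shrinks the correction when the count is very close to n/2, so the two can differ slightly in that case.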

Hope that helps someone!

Reference:

Newcombe, R. G. (1998) Two-sided confidence intervals for the single proportion: comparison of seven methods. Statist. Med., 17: 857-872. doi: 10.1002/(SICI)1097-0258(19980430)17:8<857::AID-SIM777>3.0.C

 


7 Comments
  1. Hello,

    for your information,
    the package: binGroup provides a function which calculates several confidence intervals for a single binomial proportion.

    Best Regards
    Johannes

    • Thanks for that. I’ll look into it next time I need to do something with proportions.

  2. I may be misunderstanding your eNote as I deal with a different kind of data, but if you are working with proportional data I assume you are talking about a closed array [?] in which the proportions sum to 1 or 100. If that is so, how can you apply normal classical statistics without transformation? You hit the constant-sum problem and are working in simplex space, not real-number space. See for example: last couple of paragraphs at http://www.geol.lsu.edu/Faculty/Hart/enotes/eNotes.html/eNote00 or more importantly Aitchison’s works [1981, 1982, 1983, 1984, 1986, 2003], and that of Pawlowsky-Glahn et al. [2006, 2007].

    I apologize if I have mis-read you but if not it sounds like trouble!

    George Hart

    • Hi George,

      Yeah, I think there might be some misunderstanding. I wasn’t particularly talking about arrays of values at all…just individual values.
      I’ll briefly explain what I was doing. Using some medical data I found that 3 out of 40 people suffered from a disease. I also wanted a CI for the 7.5%. Newcombe lists 7 methods (a non-exhaustive set) for calculating such a CI, of which 4 can be calculated using the functions in the post. This is pretty well established stuff though, as far as I’m aware…
      There are issues with the simple asymptotic method at high and low proportions though: the interval can extend outside of the 0-1 range. The score method doesn’t seem to suffer from this.
      > simpasym(40,1/40)
      $lb
      [1] -0.03588362
      $ub
      [1] 0.08588362

      > scoreint(40,.1/40)
      $lb
      [1] 0.001341018
      $ub
      [1] 0.113182

      In case someone was looking for the link, it’s http://www.geol.lsu.edu/Faculty/Hart/eNotes/eNote00.pdf

  3. Alexis

    You might also be interested in the Agresti-Coull intervals. See:

    Agresti, A. and Coull, B. A. (1998). Approximate is better than “exact” for interval estimation of binomial proportions. The American Statistician, 52(2):119–126.

    and perhaps:

    Agresti, A. and Caffo, B. (2000). Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. The American Statistician, 54(4):280–288.

  4. Thank you for your thoughts on confidence intervals of proportions. Your R code has been very helpful.
