Skip to content

Convert between citations formats with R

rOpenSci has recently released a package that should be very handy for people that work with multiple citation managers (e.g. Endnote and a BibTex based manager like JabRef). Here’s the announcement on the rOpenSci blog… looks very simple to use!

BBC release ggplot2 cookbook and style package

The BBC News Graphics guys have made a big step… they’ve released some internal tools (an R package – bbplot) and a cookbook (or in github) for ggplot2. The latter looks like a great resource to complement the official(?) ggplot2 cheatsheet. Here’s a blog post on why they did it…

Extract data from a PNG/TIFF

Sometimes it’s useful to be able to extract data from a published figure. If the figure isn’t a vector based format (for which the numeric data is probably still in the file), it’s possible to digitize the image with R, click the points and extract it that way. The digitize package is simple to use for this purpose…

If you save the top figure from here to a PNG called “Fig2a_Kreners2015” in your working directory, you can digitize the three lines and replicate the figure (or do other stuff if you so wish) as follows

dig10 <- digitize("Fig2a_Kremers2015.png") # do the solid line
dig15 <- digitize("Fig2a_Kremers2015.png") # do the dashed line
dig20 <- digitize("Fig2a_Kremers2015.png") # do the dotted line

# rename the first variable
names(dig10)[1] <- names(dig15)[1] <- names(dig20)[1] <- "age"

plot(dig10, type = "l")
lines(dig15, lty = 2)
lines(dig20, lty = 3)


Easy huh? It just gets a bit laborious with many points… 🙂


lmer vs INLA for variance components

Just for fun, I decided to compare the estimates from lmer and INLA for the variance components of an LMM (this isn’t really something that you would ordinarily do – comparing frequentist and bayesian approaches). The codes are below. A couple of plots are drawn, which show the distribution of the hyperparameters (in this case variances) from INLA, which are difficult to get from the frequentist framework (there’s a link to a presentation by Douglas Bates in the code, detailing why you might not want to do it [distribution is not symmetrical], and how you could do it… but it’s a lot of work).

Note that we’re comparing the precision (tau) rather than the variance or SD… SD = 1/sqrt(tau)

# Compare lmer and inla for LMM
# largely taken from Spatial and spatio-temporal bayesian models with R-INLA (Blangiardo & Cameletti, 2015), section 5.4.2
m <- 10000 # N obs
x <- rnorm(m)
group <- sample(seq(1, 100), size = m, replace = TRUE)
# simulate random intercept
tau.ri <- .25
1/sqrt(tau.ri) #SD
v <- rnorm(length(unique(group)), 0, sqrt(1/tau.ri))
# assign random intercept to individuals
vj <- v[group]
# simulate y
tau <- 3
1/sqrt(tau) #SD
b0 <- 5
beta1 <- 2
y <- rnorm(m, b0 + beta1*x + vj, 1/sqrt(tau))
mod <- lmer(y ~ x + (1|group))
vc <- VarCorr(mod)
form <- y ~ x + f(group, model = "iid", param = c(1, 5e-5))
imod <- inla(form, family = "gaussian",
data = data.frame(y = y, x = x, group = group))
cbind(truth = c(tau, tau.ri), lmer = 1/c(attr(vc, "sc")^2, unlist(vc)), inla = imod$summary.hyperpar$`0.5quant`)
plot.fixed.effects = F,
plot.lincomb = F,
plot.random.effects = F,
plot.predictor = F,
plot.prior = TRUE)
plot(imod$marginals.hyperpar$`Precision for the Gaussian observations`, type = "l")
# the equivalent of this for lmer is not easy to get at all. One would have to profile the deviance function (see
lo <- imod$summary.hyperpar$`0.025quant`[1]
hi <- imod$summary.hyperpar$`0.975quant`[1]
dd <- imod$marginals.hyperpar$`Precision for the Gaussian observations`
dd <- dd[dd[,1] >= lo & dd[,1] <= hi, ]
dd <- rbind(dd, c(hi, 0))
dd <- rbind(dd, c(lo, 0))
polygon(dd, col = "blue")
plot(imod$marginals.hyperpar$`Precision for group`, type = "l")
lo <- imod$summary.hyperpar$`0.025quant`[2]
hi <- imod$summary.hyperpar$`0.975quant`[2]
dd <- imod$marginals.hyperpar$`Precision for group`
dd <- dd[dd[,1] >= lo & dd[,1] <= hi, ]
dd <- rbind(dd, c(hi, 0))
dd <- rbind(dd, c(lo, 0))
polygon(dd, col = "blue")

view raw
hosted with ❤ by GitHub

As you’d hope, the results come pretty close to each other and the truth:

cbind(truth = c(tau, tau.ri), lmer = 1/c(attr(vc, "sc")^2, unlist(vc)), inla = imod$summary.hyperpar$`0.5quant`)
      truth      lmer      inla
       3.00 2.9552444 2.9556383
group  0.25 0.2883351 0.2919622

Code on Github…

anytime – dates in R

I just saw an announcement on R Bloggers about the anytime package. It looks to be a very handy package to convert dates in pretty much any format to Date or POSIX classes, without the need to define the format – it’s guessed by an underlying C++ library.

It certainly seems to be flexible… putting in the same date in 8 different formats all yielded the same result! (FWIW, “15th October 2010” doesn’t work…)

> anytime::anydate(c("2010-10-15", "2010-Oct-15", 
+                    "20101015", "15 Oct 2010", 
+                    "10-15-2010", "15 October 2010", 
+                    "15oct2010", "2010oct15", "15th October 2010"))
[1] "2010-10-15" "2010-10-15" "2010-10-15" "2010-10-15" "2010-10-15" "2010-10-15"
[7] "2010-10-15" "2010-10-15" NA       

There’s an equivalent function for times (anytime instead of anydate). Looks like working with dates and times just got easier!


Estimating Pi

I came across this post which gives a method to estimate Pi by using a circle, it’s circumscribed square and (lots of) random points within said square. Booth used Stata to estimate Pi, but here’s some R code to do the same thing…

x <- 0.5 # center x
y <- 0.5 # center y
n <- 1000 # nr of pts
r <- 0.5 # radius
pts <- seq(0, 2 * pi, length.out = n)
plot(sin(pts), cos(pts), type = 'l', asp = 1) # test

xy <- cbind(x + r * sin(pts), y + r * cos(pts))
sl <- SpatialPolygons(list(Polygons(list(Polygon(xy)), "polygon")))
plot(sl, add=FALSE, col = 'red', axes=T )

# the square
xy <- cbind(c(0, 1, 1, 0), c(0, 0, 1, 1))
sq <- SpatialPolygons(list(Polygons(list(Polygon(xy)), "polygon")))

plot(sq, add = TRUE)

N <- 1e6
x <- runif(N, 0, 1)
y <- runif(N, 0, 1)
sp <- SpatialPoints(cbind(x, y))
plot(sp, add = TRUE, col = "green")

(sim_pi <- (sum(gIntersects(sp, sl, byid = TRUE))/N) *4)
sim_pi - pi

Note the use of sp and rgeos packages to calculate the intersections.

“Where to bird” geostats in R

I just came across a nice little post on acquiring and visualizing geodata in R using the Max Planck Institute of Ornithology as an example. It’s by the rOpenSci guys. Some useful code in there by the look of it… 🙂 Worth a look…

Coloured output in the R console

Just a little fun today… the R console isn’t the most interesting of things… text is typically either black or red (assuming default settings in RStudio). There’s a package though called crayon which allows one to change the style of text in terms of colour, background and some font-type settings. It could be an interesting way to spice up summaries (I’m not recommending it, but it’s a possibility. As far as packages are concerned, it’s just another dependency…)…

  'I am a green line ' %+%
  blue$underline$bold('with a blue substring') %+%
  yellow$italic(' that becomes yellow and italicised!\n')


rOpenSci’s drake package

If you don’t know rOpenSci, then I recommend checking them out. They write a lot of really good packages for R. A relatively new seems to be drake. I’ve not played with it yet, but it looks to be very useful at giving indications about which parts of an analysis are subject to changes, and only rerunning those parts to speed up redoing an analysis (envisage the overlays for some version control systems or dropbox that show the status of files, although it’s more complicated than that). Knitr has caching, which goes some way to handling this, but here you can see where outdated parts are in the scope of the entire analysis…

It looks like a super tool!drake_dep

Rough looking figures from R

A recent blog post regarding data visualization had some barplots I liked the look of (aesthetically…for research purposes, they wouldn’t be suitable). They look as if they’ve be coloured in with a pencil, rather than having solid blocks of colour… I wondered whether it’s possible with R, and indeed it is. There’s a github project called ggrough that interacts with ggplot2.