Skip to content

Convert between citations formats with R

rOpenSci has recently released a package that should be very handy for people that work with multiple citation managers (e.g. Endnote and a BibTex based manager like JabRef). Here’s the announcement on the rOpenSci blog… looks very simple to use!

BBC release ggplot2 cookbook and style package

The BBC News Graphics guys have made a big step… they’ve released some internal tools (an R package – bbplot) and a cookbook (or in github) for ggplot2. The latter looks like a great resource to complement the official(?) ggplot2 cheatsheet. Here’s a blog post on why they did it…

Extract data from a PNG/TIFF

Sometimes it’s useful to be able to extract data from a published figure. If the figure isn’t a vector based format (for which the numeric data is probably still in the file), it’s possible to digitize the image with R, click the points and extract it that way. The digitize package is simple to use for this purpose…

If you save the top figure from here to a PNG called “Fig2a_Kreners2015” in your working directory, you can digitize the three lines and replicate the figure (or do other stuff if you so wish) as follows

dig10 <- digitize("Fig2a_Kremers2015.png") # do the solid line
dig15 <- digitize("Fig2a_Kremers2015.png") # do the dashed line
dig20 <- digitize("Fig2a_Kremers2015.png") # do the dotted line

# rename the first variable
names(dig10)[1] <- names(dig15)[1] <- names(dig20)[1] <- "age"

plot(dig10, type = "l")
lines(dig15, lty = 2)
lines(dig20, lty = 3)


Easy huh? It just gets a bit laborious with many points… 🙂


lmer vs INLA for variance components

Just for fun, I decided to compare the estimates from lmer and INLA for the variance components of an LMM (this isn’t really something that you would ordinarily do – comparing frequentist and bayesian approaches). The codes are below. A couple of plots are drawn, which show the distribution of the hyperparameters (in this case variances) from INLA, which are difficult to get from the frequentist framework (there’s a link to a presentation by Douglas Bates in the code, detailing why you might not want to do it [distribution is not symmetrical], and how you could do it… but it’s a lot of work).

Note that we’re comparing the precision (tau) rather than the variance or SD… SD = 1/sqrt(tau)

As you’d hope, the results come pretty close to each other and the truth:

cbind(truth = c(tau, tau.ri), lmer = 1/c(attr(vc, "sc")^2, unlist(vc)), inla = imod$summary.hyperpar$`0.5quant`)
      truth      lmer      inla
       3.00 2.9552444 2.9556383
group  0.25 0.2883351 0.2919622

Code on Github…

anytime – dates in R

I just saw an announcement on R Bloggers about the anytime package. It looks to be a very handy package to convert dates in pretty much any format to Date or POSIX classes, without the need to define the format – it’s guessed by an underlying C++ library.

It certainly seems to be flexible… putting in the same date in 8 different formats all yielded the same result! (FWIW, “15th October 2010” doesn’t work…)

> anytime::anydate(c("2010-10-15", "2010-Oct-15", 
+                    "20101015", "15 Oct 2010", 
+                    "10-15-2010", "15 October 2010", 
+                    "15oct2010", "2010oct15", "15th October 2010"))
[1] "2010-10-15" "2010-10-15" "2010-10-15" "2010-10-15" "2010-10-15" "2010-10-15"
[7] "2010-10-15" "2010-10-15" NA       

There’s an equivalent function for times (anytime instead of anydate). Looks like working with dates and times just got easier!


Estimating Pi

I came across this post which gives a method to estimate Pi by using a circle, it’s circumscribed square and (lots of) random points within said square. Booth used Stata to estimate Pi, but here’s some R code to do the same thing…

x <- 0.5 # center x
y <- 0.5 # center y
n <- 1000 # nr of pts
r <- 0.5 # radius
pts <- seq(0, 2 * pi, length.out = n)
plot(sin(pts), cos(pts), type = 'l', asp = 1) # test

xy <- cbind(x + r * sin(pts), y + r * cos(pts))
sl <- SpatialPolygons(list(Polygons(list(Polygon(xy)), "polygon")))
plot(sl, add=FALSE, col = 'red', axes=T )

# the square
xy <- cbind(c(0, 1, 1, 0), c(0, 0, 1, 1))
sq <- SpatialPolygons(list(Polygons(list(Polygon(xy)), "polygon")))

plot(sq, add = TRUE)

N <- 1e6
x <- runif(N, 0, 1)
y <- runif(N, 0, 1)
sp <- SpatialPoints(cbind(x, y))
plot(sp, add = TRUE, col = "green")

(sim_pi <- (sum(gIntersects(sp, sl, byid = TRUE))/N) *4)
sim_pi - pi

Note the use of sp and rgeos packages to calculate the intersections.

“Where to bird” geostats in R

I just came across a nice little post on acquiring and visualizing geodata in R using the Max Planck Institute of Ornithology as an example. It’s by the rOpenSci guys. Some useful code in there by the look of it… 🙂 Worth a look…