
anytime – dates in R

I just saw an announcement on R Bloggers about the anytime package. It looks like a very handy package for converting dates in pretty much any format to Date or POSIXct classes, without the need to define the format; it’s guessed by an underlying C++ library.

It certainly seems to be flexible… putting in the same date in 8 different formats all yielded the same result! (FWIW, “15th October 2010” doesn’t work…)

> anytime::anydate(c("2010-10-15", "2010-Oct-15", 
+                    "20101015", "15 Oct 2010", 
+                    "10-15-2010", "15 October 2010", 
+                    "15oct2010", "2010oct15", "15th October 2010"))
[1] "2010-10-15" "2010-10-15" "2010-10-15" "2010-10-15" "2010-10-15" "2010-10-15"
[7] "2010-10-15" "2010-10-15" NA       

There’s an equivalent function for times (anytime instead of anydate). Looks like working with dates and times just got easier!
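To get a feel for what the package saves you, here’s a rough base-R sketch of doing the guessing by hand: try a list of candidate formats in turn until one parses. This is not how anytime works internally (that’s the C++ library’s job). The function name parse_any_date and its list of formats are made up for illustration, and I’ve stuck to numeric formats so the result doesn’t depend on the locale’s month names.

```r
# Hypothetical helper: try each candidate format until the string parses.
# Only an illustration of manual format guessing, not anytime's algorithm.
parse_any_date <- function(x, formats = c("%Y-%m-%d", "%Y%m%d", "%m-%d-%Y")) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)               # elements still unparsed
    if (!any(todo)) break
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

parse_any_date(c("2010-10-15", "20101015", "10-15-2010"))
```

Every new format means another entry in the list (and more room for ambiguity), which is exactly the chore anytime takes off your hands.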

 


Estimating Pi

I came across this post, which gives a method to estimate Pi using a circle, its circumscribed square and (lots of) random points within said square. Booth used Stata to estimate Pi, but here’s some R code to do the same thing…

x <- 0.5 # center x
y <- 0.5 # center y
n <- 1000 # nr of pts
r <- 0.5 # radius
pts <- seq(0, 2 * pi, length.out = n)
plot(sin(pts), cos(pts), type = 'l', asp = 1) # test

library(sp)
# the circle, as a polygon
xy <- cbind(x + r * sin(pts), y + r * cos(pts))
sl <- SpatialPolygons(list(Polygons(list(Polygon(xy)), "polygon")))
plot(sl, add = FALSE, col = 'red', axes = TRUE)


# the square
xy <- cbind(c(0, 1, 1, 0), c(0, 0, 1, 1))
sq <- SpatialPolygons(list(Polygons(list(Polygon(xy)), "polygon")))

plot(sq, add = TRUE)

N <- 1e6
x <- runif(N, 0, 1)
y <- runif(N, 0, 1)
sp <- SpatialPoints(cbind(x, y))
plot(sp, add = TRUE, col = "green")

library(rgeos)
# share of points that fall inside the circle, times 4, approximates pi
(sim_pi <- (sum(gIntersects(sp, sl, byid = TRUE)) / N) * 4)
sim_pi - pi

Note the use of sp and rgeos packages to calculate the intersections.
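The reason this works: the circle has area pi * r^2 and the square has area (2r)^2, so the share of random points landing inside the circle should be close to pi / 4. Since that share is all we need, the intersection step can also be done without any spatial packages, simply by checking each point’s distance from the centre. A minimal sketch, which may be handy if sp and rgeos aren’t available (the seed is my addition, only there to make the run reproducible):

```r
set.seed(42)   # arbitrary seed, assumption for reproducibility only
N <- 1e6       # number of random points
r <- 0.5       # radius of the circle centred on (0.5, 0.5)

x <- runif(N)
y <- runif(N)

# a point is inside the circle if its squared distance from the centre is <= r^2
inside <- (x - 0.5)^2 + (y - 0.5)^2 <= r^2

sim_pi <- 4 * mean(inside)
sim_pi - pi
```

It skips the plotting, but it’s a lot quicker than intersecting a million spatial points with a polygon.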

“Where to bird” geostats in R

I just came across a nice little post on acquiring and visualizing geodata in R, using the Max Planck Institute for Ornithology as an example. It’s by the rOpenSci guys. Some useful code in there by the look of it… 🙂

Coloured output in the R console

Just a little fun today… The R console isn’t the most interesting of things: text is typically either black or red (assuming default settings in RStudio). There’s a package called crayon, though, which lets you change the style of text in terms of colour, background and some font-type settings. It could be an interesting way to spice up summaries. I’m not recommending it (for a package, it’s just another dependency), but it’s a possibility…

devtools::install_github("r-lib/crayon")
library(crayon)
cat(green(
  'I am a green line ' %+%
  blue$underline$bold('with a blue substring') %+%
  yellow$italic(' that becomes yellow and italicised!\n')
))
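For the curious: styling like this generally comes down to ANSI escape sequences, which many terminals (and recent RStudio consoles) interpret as colour and font instructions. Here’s a bare-bones sketch in base R, assuming a console that supports ANSI colours. The helper name ansi_style is made up, and this is not crayon’s actual implementation.

```r
# Hypothetical helper: wrap text in an ANSI SGR code
# (e.g. 32 = green, 34 = blue, 1 = bold) and reset the style (code 0) afterwards.
ansi_style <- function(text, code) {
  paste0("\033[", code, "m", text, "\033[0m")
}

cat(ansi_style("I am a green line", 32), "\n")
cat(ansi_style("I am bold and blue", "1;34"), "\n")
```

In a console without ANSI support you’ll just see the raw escape codes, which is one of the things a package like crayon is there to deal with.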


rOpenSci’s drake package

If you don’t know rOpenSci, then I recommend checking them out. They write a lot of really good packages for R. A relatively new one seems to be drake. I’ve not played with it yet, but it looks to be very useful for indicating which parts of an analysis are affected by changes, and for rerunning only those parts to speed up redoing an analysis (envisage the overlays that some version control systems or Dropbox use to show the status of files, although it’s more complicated than that). knitr has caching, which goes some way to handling this, but here you can see where the outdated parts are in the scope of the entire analysis…

It looks like a super tool!

Rough looking figures from R

A recent blog post regarding data visualization had some barplots I liked the look of (aesthetically… for research purposes, they wouldn’t be suitable). They look as if they’ve been coloured in with a pencil, rather than having solid blocks of colour… I wondered whether it’s possible with R, and indeed it is. There’s a GitHub project called ggrough that interacts with ggplot2.

Beeswarms instead of histograms

Histograms are good, density plots are also good. Violin and bean plots too. Recently someone asked me for a plot where you can see each individual point along a continuum, with the points coloured according to a second variable (similar to the figure). That deviates somewhat from the typical density-type plots. Apparently, they’re called beeplots or beeswarms. And there’s a way to make them in R (of course, there’s probably more than one… ggplot2??).

[Figure: example beeswarm plot]

Here’s one way (slightly modified from the package’s help files)…

library(beeswarm) # assuming you've installed it ;)
data(breast)
beeswarm(time_survival ~ ER, data = breast,
         pch = 16, pwcol = 1 + as.numeric(event_survival),
         xlab = "", ylab = "Follow-up time (months)", horizontal = TRUE, 
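         # method spelled out: a bare "c" can be ambiguous in beeswarm
         # versions that also offer "compactswarm"
         labels = c("ER neg", "ER pos"), method = "center")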
legend("topright", legend = c("Yes", "No"),
       title = "Censored", pch = 16, col = 1:2)

Horizontal is optional of course…

Feel free to comment if you know of other approaches.

See here for more examples.

Hope that helps someone 🙂

UPDATE… I just remembered…these plots are also sometimes referred to as turnip plots…