Skip to content

Coloured output in the R console

Just a little fun today… the R console isn’t the most interesting of things… text is typically either black or red (assuming default settings in RStudio). There’s a package though called crayon which allows one to change the style of text in terms of colour, background and some font-type settings. It could be an interesting way to spice up summaries (I’m not recommending it, but it’s a possibility. As far as packages are concerned, it’s just another dependency…)…

  'I am a green line ' %+%
  blue$underline$bold('with a blue substring') %+%
  yellow$italic(' that becomes yellow and italicised!\n')



rOpenSci’s drake package

If you don’t know rOpenSci, then I recommend checking them out. They write a lot of really good packages for R. A relatively new seems to be drake. I’ve not played with it yet, but it looks to be very useful at giving indications about which parts of an analysis are subject to changes, and only rerunning those parts to speed up redoing an analysis (envisage the overlays for some version control systems or dropbox that show the status of files, although it’s more complicated than that). Knitr has caching, which goes some way to handling this, but here you can see where outdated parts are in the scope of the entire analysis…

It looks like a super tool!drake_dep

Rough looking figures from R

A recent blog post regarding data visualization had some barplots I liked the look of (aesthetically…for research purposes, they wouldn’t be suitable). They look as if they’ve be coloured in with a pencil, rather than having solid blocks of colour… I wondered whether it’s possible with R, and indeed it is. There’s a github project called ggrough that interacts with ggplot2.

Beeswarms instead of histograms

Histograms are good, density plots are also good. Violin and bean plots too. Recently I had someone ask for a plot where you could see each individual point along a continuum, give the points specific colours based on a second variable (similar to the figure), which deviates somewhat from the typical density type plots. Apparently, they’re called beeplots or beeswarms. And there’s a way to make them in R (of course, there’s probably more than one… ggplot2??).


Here’s one way (slightly modified from the packages help files)…

library(beeswarm) # assuming you've installed it ;)
beeswarm(time_survival ~ ER, data = breast,
         pch = 16, pwcol = 1 + as.numeric(event_survival),
         xlab = "", ylab = "Follow-up time (months)", horizontal = TRUE, 
         labels = c("ER neg", "ER pos"), method = "c")
legend("topright", legend = c("Yes", "No"),
       title = "Censored", pch = 16, col = 1:2)

Horizontal is optional of course…

Feel free to comment if you know of other approaches.

See here for more examples.

Hope that helps someone πŸ™‚

UPDATE… I just remembered…these plots are also sometimes referred to as turnip plots…


Merging spatial buffers in R

I’m sure there’s a better way out there, but I struggled to find a way to dissolve polygons that touched/overlapped each other (the special case being buffers). For example,Β  using the osmdata package, we can download the polygons representing hospital buildings in Bern, Switzerland.

library(rgdal) ; library(maptools) ; library(rgeos)

q0 <- opq(bbox = "Bern, Switzerland", timeout = 60)
q1 <- add_osm_feature(q0, key = 'building', value = "hospital")
x <- osmdata_sp(q1)


spChFIDs(x$osm_polygons) <- 1:nrow(x$osm_polygons@data)
cent <- gCentroid(x$osm_polygons, byid = TRUE)
leaflet(cent) %>% addTiles() %>% addCircles()

Here we plot the building centroids.


Each point represents a hospital building. We don’t particularly care about the buildings themselves though. We want to create hospitals. To do so, we try a 150m buffer around each centroid.

buff <- gBuffer(cent, byid = TRUE, width = 0.0015)
leaflet(cent) %>% addTiles() %>% addPolygons(data = buff, col = "red") %>% addCircles()


We then want to merge the buffers into, in this case, four groups. This is the point that doesn’t seem to be implemented anywhere that I could see (I also tried QGIS but that just created one big feature, rather than many small ones). My approach is to get the unique sets of intersections, add them as a variable to the buffer and unify the polygons.

buff <- SpatialPolygonsDataFrame(buff, data.frame(row.names = names(buff), n = 1:length(buff)))
gt <- gIntersects(buff, byid = TRUE, returnDense = FALSE)
ut <- unique(gt)
nth <- 1:length(ut)
buff$n <- 1:nrow(buff)
buff$nth <- NA
for(i in 1:length(ut)){
  x <- ut[[i]]
  buff$nth[x] <- i
buffdis <- gUnaryUnion(buff, buff$nth)
leaflet(cent) %>% addTiles() %>% addPolygons(data = buffdis, col = "red") %>% addCircles()


As you see, it almost worked. The lower left group is composed of three polygons. Doing the same process again clears it (only code shown). Large jobs might need more iterations (or larger buffers). The final job is to get the hospital centroids.

gt <- gIntersects(buffdis, byid = TRUE, returnDense = FALSE)
ut <- unique(gt)
nth <- 1:length(ut)
buffdis <- SpatialPolygonsDataFrame(buffdis, data.frame(row.names = names(buffdis), n = 1:length(buffdis)))
buffdis$nth <- NA
for(i in 1:length(ut)){
  x <- ut[[i]]
  buffdis$nth[x] <- i
buffdis <- gUnaryUnion(buffdis, buffdis$nth)
leaflet(cent) %>% addTiles() %>% addPolygons(data = buffdis, col = "red") %>% addCircles()

buffcent <- gCentroid(buffdis, byid = TRUE

Code here.

Intersecting points and overlapping polygons


I’ve been doing some spatial stuff of late and the next little step will involve intersecting points with possibly many overlapping polygons. The sp package has a function called over which returns the polygons that points intersects with. The catch though, is that it only returns the last (highest numerical value) polygon a point overlaps with. So it’s not so useful if you have many overlapping polygons. A little playing, and I’ve overcome that problem…

Here’s a toy example.

Create a couple of polygons and put them into a SpatialPolygons object.


p1 <- matrix(c(1,1,
 3,2), ncol = 2, byrow = TRUE)
p2 <- matrix(c(2.2,1,
 2.8,3), ncol = 2, byrow = TRUE)
p1s <- Polygons(list(Polygon(p1)), 3)
p2s <- Polygons(list(Polygon(p2)), 4)
sps <- SpatialPolygons(list(p1s, p2s))

Define a few points and put them in a SpatialPoints object

point <- matrix(c(2.5, 1.5,
 3.2, 1.75,
 1.5, 1.25, 
 2.75, 2.5), ncol = 2, byrow = TRUE)
points <- SpatialPoints(point)

We can plot them…(not shown)

plot(sps, border = c("black", "blue"))
plot(points, add = TRUE)

As here we have the issue:

over(points, sps)
 1  2  3  4  5 
 2  1 NA  1  2 

only returning a single “hit” per point (point 1 overlaps with both polygons 1 and 2).

To get around this, we can rip the individual polygons out of the SpatialPolygons object and put them in a list, converting the individual polygons into SpatialPolygons along the way:

sps2 <- lapply(sps@polygons, function(x) SpatialPolygons(list(x)))

Then lapply-ing over sps2 we can see which polygons each point intersects…

lapply(sps2, function(x) over(points, x))
 1  2  3  4  5 
 1  1 NA  1 NA 

 1  2  3  4  5 
 1 NA NA NA  1

And now we see that point one overlaps with both polygons.

Code >>>here<<<


Someone in the comments mentions that over has a returnList argument to do much the same thing. While a couple of others mention that the sf package can do it easily too. You win some…you loose some… it was a little practice with lapply if nothing else.


Tips for great graphics

R is a great program for generating top-notch graphics. But to get the best out of it, you need to put in a little more work. Here are a few tips for adapting your R graphics to make them look a little better.

1) Dont use the “File/Save as…/” menu.

If you set up your graphic in the first place then theres no need to post-process (eg crop, scale etc) the graphic in other software.

Use the graphic devices (jpeg(), tiff(), postscript(), etc), set your height and width to whatever you want the finished product to be and then create the graph.

tiff("~/Manuscript/Figs/Fig1.tiff", width =2, height =2, units ="in", res = 600)
plot(dist ~ speed, cars) # cars is a base R dataset - data(cars)

The first argument to a graphic device such as tiff or jpeg is the filename, to which you can include the path. So that I dont have to worry about the order of arguments, I include the argument name. Width and height specify the width and height of the graphic in the units provided, in this case inches, but pixels, cm or mm can also be used. The resolution tells R how higher quality the graphic should be, the higher the better, but if you go too high, you could find that you have problems running out of space. I find 600 a nice compromise. Nice crisp lines, smooth curves and sharp letters. You can then import that straight into MS Word or whatever other word processor you use, or upload to go with that manuscript youve worked so hard on. Although, you could find yourself with BIG files. A 4×6 inch figure I made recently was 17.1MB! And for the web, 100 or 200 is probably enough.

This technique also provides you with the same output every time, which is not the case if you adjust the size of the default device window produced by plot()

2) Dont be afraid to change the default settings!

Personally, I find that a 1 inch margin at the base of the graphic is a bit generous. I also find the ticks a bit long and the gap between tick and the axis label a bit big. Luckily, these things are easy to change!

jpeg("t1.jpeg", width=3, height=3, units="in", res=100)
plot(dist~speed, cars)

The above code produces this figure.

See what I mean about the margins?

Heres how to change it!

par(mai=c(0.5, 0.5, 0.1, 0.1) )
plot(dist ~ speed, cars, tck = -0.01, las=1, mgp=c(1.4,0.2,0))

That call to par changes the “MArgin in Inches” setting. The tck argument to par deals with TiCK length (negative for outside, positive for inside) while mgp controls on which line certain things are printed (titles are the first argument, labels are the second and the axis itself is the third). The las argument controls the rotation of the labels (1 for horizontal, 2 for perpendicular and 3 for vertical).

This leads me nicely to number 3: Dont be afraid to have separate lines for different parts of your plot.

This allows far more freedom and flexibility than setting par arguments in the plot argument. You can have different mgp settings for each axis for instance.

par(mai=c(0.4, 0.5, 0.1, 0.1))
plot(dist ~ speed, cars, xaxt="n", mgp=c(1.4,0.2,0), las=1, tck=-0.01)
axis(side=1, tck = -0.01, las=1, mgp=c(0.5,0.2,0))
mtext("speed", side=1, line= 1)

This plots the same graph, but allows different distances for the x and y axes, in terms of margin and where the title is situated. The axis function places an axis on the side determined by its side argument and mtext places Margin TEXT, again at the side in its argument and in this case on line 1.