
“Where to bird” geostats in R

I just came across a nice little post on acquiring and visualizing geodata in R using the Max Planck Institute of Ornithology as an example. It’s by the rOpenSci guys. Some useful code in there by the look of it… 🙂 Worth a look…


Coloured output in the R console

Just a little fun today… the R console isn’t the most interesting of things… text is typically either black or red (assuming default settings in RStudio). There’s a package, though, called crayon, which lets you change the style of text: its colour, its background and some font settings (bold, italic, underline). It could be an interesting way to spice up summaries (I’m not recommending it, but it’s a possibility. As far as packages are concerned, it’s just another dependency…)…

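install.packages("crayon")  # or devtools::install_github("r-lib/crayon") for the development version
library(crayon)
# Styles can be chained with $, and %+% concatenates styled strings
cat(green(
  'I am a green line ' %+%
  blue$underline$bold('with a blue substring') %+%
  yellow$italic(' that becomes yellow and italicised!\n')
))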

[Figure: console output of the crayon example]
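Under the hood, coloured console text generally relies on ANSI escape sequences. As a rough sketch of that mechanism (this is not crayon’s code, and `style_ansi()` is a made-up helper), the same message can be styled in base R, assuming your console interprets ANSI codes (recent RStudio consoles and most terminals do):

```r
# Hypothetical helper, not part of crayon: wrap text in ANSI SGR escape codes.
# Codes used here: 1 = bold, 3 = italic, 4 = underline,
# 32 = green, 33 = yellow, 34 = blue; 0 resets all styling afterwards.
style_ansi <- function(text, codes) {
  paste0("\033[", paste(codes, collapse = ";"), "m", text, "\033[0m")
}

msg <- paste0(
  style_ansi("I am a green line ", 32),
  style_ansi("with a blue substring", c(34, 4, 1)),
  style_ansi(" that becomes yellow and italicised!", c(33, 3))
)
cat(msg, "\n")
```

If the console doesn’t support ANSI codes you’ll see the raw escape characters instead, which is one of the things a package like crayon checks for you.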

rOpenSci’s drake package

If you don’t know rOpenSci, then I recommend checking them out. They write a lot of really good packages for R. A relatively new one seems to be drake. I’ve not played with it yet, but it looks very useful: it indicates which parts of an analysis are affected by changes and reruns only those parts, which speeds up redoing an analysis (think of the overlays some version control systems or Dropbox use to show the status of files, although it’s more complicated than that). knitr has caching, which goes some way towards handling this, but with drake you can see where the outdated parts are in the context of the entire analysis…

It looks like a super tool!

[Figure: drake dependency graph]
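The core idea of rerunning only what is out of date can be sketched in a few lines of base R. This illustrates the concept only and is not drake’s API: `make_cached_step()` is a hypothetical helper, and it compares inputs with `identical()`, whereas drake itself works with hashes of commands, dependencies and outputs.

```r
# Hypothetical helper: wrap a function so it only recomputes when its input
# differs from the input used on the previous run.
make_cached_step <- function(fun) {
  last_input <- NULL
  last_output <- NULL
  has_run <- FALSE
  runs <- 0
  step <- function(input) {
    if (!has_run || !identical(input, last_input)) {
      last_output <<- fun(input)   # outdated: recompute and store
      last_input <<- input
      has_run <<- TRUE
      runs <<- runs + 1
    }
    last_output                    # up to date: reuse the stored result
  }
  list(run = step, n_runs = function() runs)
}

slow_mean <- make_cached_step(function(x) { Sys.sleep(0.1); mean(x) })
slow_mean$run(1:10)   # computed
slow_mean$run(1:10)   # input unchanged, cached result reused
slow_mean$run(1:20)   # input changed, recomputed
slow_mean$n_runs()
```

A pipeline tool does this for every step and its dependencies, which is what lets it show which parts of the whole analysis are stale.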

Rough looking figures from R

A recent blog post about data visualization had some barplots I liked the look of (aesthetically… for research purposes, they wouldn’t be suitable). They look as if they’ve been coloured in with a pencil, rather than having solid blocks of colour… I wondered whether that’s possible with R, and indeed it is. There’s a GitHub project called ggrough that works with ggplot2.
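ggrough gives the real pencil effect. As a much simpler base R stand-in (a different technique, not ggrough), `barplot()` can fill bars with shading lines via its `density` and `angle` arguments, which gives a hatched rather than solid look. `hatched_barplot()` and the data are made up for illustration:

```r
# Hypothetical helper: bars filled with shading lines instead of solid colour.
# density = number of shading lines per inch, angle = their slope in degrees;
# with density set, col gives the colour of the lines.
hatched_barplot <- function(heights, labels) {
  barplot(heights, names.arg = labels,
          density = c(10, 15, 20),   # recycled across bars if needed
          angle = c(30, 45, 60),
          col = c("steelblue", "darkorange", "forestgreen"),
          border = "grey30")
}

# barplot() invisibly returns the bar midpoints
mids <- hatched_barplot(c(3, 5, 2), c("A", "B", "C"))
```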

Beeswarms instead of histograms

Histograms are good, density plots are also good. Violin and bean plots too. Recently someone asked me for a plot where you could see each individual point along a continuum, with the points given specific colours based on a second variable (similar to the figure), which deviates somewhat from the typical density-type plots. Apparently, they’re called beeplots or beeswarms. And there’s a way to make them in R (of course, there’s probably more than one… ggplot2??).

[Figure: example beeswarm plot]

Here’s one way (slightly modified from the package’s help files)…

library(beeswarm) # assuming you've installed it ;)
data(breast)
# pwcol sets a colour per point: 1 (black) if censored, 2 (red) if an event occurred
beeswarm(time_survival ~ ER, data = breast,
         pch = 16, pwcol = 1 + as.numeric(event_survival),
         xlab = "", ylab = "Follow-up time (months)", horizontal = TRUE,
         labels = c("ER neg", "ER pos"), method = "center")
legend("topright", legend = c("Yes", "No"),
       title = "Censored", pch = 16, col = 1:2)

Horizontal is optional of course…
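For the curious, here is a minimal sketch of the kind of layout a bee swarm uses. It is a simple greedy algorithm written for illustration (`swarm_offsets()` is hypothetical, not the beeswarm package’s internals): points are placed in order of value, each at the smallest sideways offset that keeps it at least `d` from every point already placed. It assumes both axes are on the same scale, hence `asp = 1` in the plot.

```r
# Hypothetical helper: greedy sideways offsets so no two points are closer than d
swarm_offsets <- function(y, d) {
  x <- numeric(length(y))
  placed <- integer(0)
  step <- d / 4                           # resolution of the sideways search
  for (i in order(y)) {
    k <- 0
    repeat {
      # candidate offsets in the order 0, +step, -step, +2*step, -2*step, ...
      cand <- ((k + 1) %/% 2) * step * (if (k %% 2 == 1) 1 else -1)
      gap2 <- (x[placed] - cand)^2 + (y[placed] - y[i])^2
      if (all(gap2 >= d^2 - 1e-12)) break # no collision with placed points
      k <- k + 1
    }
    x[i] <- cand
    placed <- c(placed, i)
  }
  x
}

set.seed(1)
vals <- rnorm(50)
off <- swarm_offsets(vals, d = 0.2)
plot(off, vals, pch = 16, asp = 1, xlab = "", ylab = "value")
```

Swapping the two coordinates in `plot()` gives the horizontal version.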

Feel free to comment if you know of other approaches.

See here for more examples.

Hope that helps someone 🙂

UPDATE… I just remembered…these plots are also sometimes referred to as turnip plots…

 

Merging spatial buffers in R

I’m sure there’s a better way out there, but I struggled to find a way to dissolve polygons that touch or overlap each other (buffers being a special case). For example, using the osmdata package, we can download the polygons representing hospital buildings in Bern, Switzerland.

library(osmdata)
library(rgdal) ; library(maptools) ; library(rgeos)

# Query OpenStreetMap for buildings tagged building=hospital in Bern
q0 <- opq(bbox = "Bern, Switzerland", timeout = 60)
q1 <- add_osm_feature(q0, key = 'building', value = "hospital")
x <- osmdata_sp(q1)

library(leaflet)

# Give the polygons simple, unique IDs, then take each building's centroid
spChFIDs(x$osm_polygons) <- as.character(1:nrow(x$osm_polygons@data))
cent <- gCentroid(x$osm_polygons, byid = TRUE)
leaflet(cent) %>% addTiles() %>% addCircles()

Here we plot the building centroids.

[Figure: hospital building centroids in Bern]

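Each point represents a hospital building. We don’t particularly care about the individual buildings, though; we want to group them into hospitals. To do so, we try a buffer of roughly 150m around each centroid. The coordinates are longitude/latitude, so the width of 0.0015 is in degrees, not metres.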

buff <- gBuffer(cent, byid = TRUE, width = 0.0015)
leaflet(cent) %>% addTiles() %>% addPolygons(data = buff, col = "red") %>% addCircles()

[Figure: buffers around the hospital building centroids]
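The width of 0.0015 in the buffer call is in degrees (the OSM coordinates are longitude/latitude), so it’s worth checking what it means on the ground. A rough conversion, assuming a spherical Earth with about 111,320m per degree of latitude; `metres_to_degrees()` is a hypothetical helper:

```r
# Hypothetical helper: approximate degrees spanned by a distance in metres.
# Assumes ~111,320 m per degree of latitude (40,075 km / 360); a degree of
# longitude shrinks with the cosine of the latitude.
metres_to_degrees <- function(metres, lat_deg) {
  m_per_deg <- 111320
  c(lat = metres / m_per_deg,
    lon = metres / (m_per_deg * cos(lat_deg * pi / 180)))
}

metres_to_degrees(150, lat_deg = 46.95)   # Bern is at roughly 46.95 N
```

Going the other way, at Bern’s latitude 0.0015 degrees is about 167m north-south but only about 114m east-west, so the buffers are slightly squashed ellipses on the ground. Projecting to a metric CRS first (for Switzerland, e.g. CH1903+/LV95, EPSG:2056) and buffering in metres would avoid that.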

We then want to merge the buffers into, in this case, four groups. This is the step that doesn’t seem to be implemented anywhere that I could find (I also tried QGIS, but that just created one big feature rather than many small ones). My approach is to find, for each buffer, the set of buffers it intersects, give each unique set a group number, store that as a variable on the buffers and then unify the polygons by group.

# Promote the buffers to a SpatialPolygonsDataFrame so they can carry attributes
buff <- SpatialPolygonsDataFrame(buff, data.frame(row.names = names(buff), n = 1:length(buff)))
# For each buffer, the indices of all buffers it intersects (including itself)
gt <- gIntersects(buff, byid = TRUE, returnDense = FALSE)
# Each unique intersection set becomes a group
ut <- unique(gt)
buff$nth <- NA
for(i in 1:length(ut)){
  buff$nth[ut[[i]]] <- i
}
# Dissolve the buffers by group
buffdis <- gUnaryUnion(buff, buff$nth)
leaflet(cent) %>% addTiles() %>% addPolygons(data = buffdis, col = "red") %>% addCircles()

[Figure: merged buffers after one pass]

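As you see, it almost worked. The lower left group is still made up of three polygons. Each intersection set only lists the buffers that directly touch a given buffer, and later sets overwrite earlier labels in the loop, so a chain of overlapping buffers (A touches B, B touches C, but A doesn’t touch C) can end up split across groups. Running the same process again on the merged polygons clears it (code only, no figure). Larger jobs might need more iterations (or larger buffers). The final job is to get the hospital centroids.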
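The repeated passes are needed because the grouping only looks at direct neighbours. Treating the intersections as a graph and taking its connected components groups whole chains in a single pass. A minimal base R sketch of that different technique (breadth-first search; `overlap_groups()` is a hypothetical helper and the toy matrix is made up), assuming a square logical matrix like the dense output of `gIntersects(buff, byid = TRUE)`:

```r
# Hypothetical helper: label the connected components of an overlap graph.
# adj[i, j] is TRUE when feature i intersects feature j.
overlap_groups <- function(adj) {
  n <- nrow(adj)
  group <- rep(NA_integer_, n)
  current <- 0L
  for (start in seq_len(n)) {
    if (!is.na(group[start])) next     # already reached from an earlier start
    current <- current + 1L
    group[start] <- current
    queue <- start
    while (length(queue) > 0) {        # breadth-first search
      v <- queue[1]
      queue <- queue[-1]
      nbrs <- which(adj[v, ] & is.na(group))
      group[nbrs] <- current
      queue <- c(queue, nbrs)
    }
  }
  group
}

# Toy chain: 1 touches 2, 2 touches 3, but 1 does not touch 3; 4 is alone
adj <- matrix(FALSE, 4, 4)
diag(adj) <- TRUE
adj[1, 2] <- adj[2, 1] <- TRUE
adj[2, 3] <- adj[3, 2] <- TRUE
overlap_groups(adj)   # 1 1 1 2
```

In principle the labels could then replace buff$nth, as in gUnaryUnion(buff, overlap_groups(gIntersects(buff, byid = TRUE))), though that line is untested on the Bern data. igraph’s components() would give the same membership if you’d rather not hand-roll it.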

# Second pass: repeat the grouping on the merged polygons
gt <- gIntersects(buffdis, byid = TRUE, returnDense = FALSE)
ut <- unique(gt)
buffdis <- SpatialPolygonsDataFrame(buffdis, data.frame(row.names = names(buffdis), n = 1:length(buffdis)))
buffdis$nth <- NA
for(i in 1:length(ut)){
  buffdis$nth[ut[[i]]] <- i
}
buffdis <- gUnaryUnion(buffdis, buffdis$nth)
leaflet(cent) %>% addTiles() %>% addPolygons(data = buffdis, col = "red") %>% addCircles()

# One centroid per merged buffer, i.e. per hospital
buffcent <- gCentroid(buffdis, byid = TRUE)

Code here.
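gCentroid() does the centroid calculation for us, but for reference, the centroid of a single simple polygon (one ring, no holes, not self-intersecting) follows from the shoelace formula. A sketch in base R with a hypothetical `polygon_centroid()`; on lon/lat data this is a centroid in degree space, just like a planar GEOS centroid, and multipart features would need an area-weighted average of their parts:

```r
# Hypothetical helper: centroid of a simple polygon via the shoelace formula.
# Works for clockwise or anticlockwise rings (the sign of the area cancels).
polygon_centroid <- function(x, y) {
  x2 <- c(x[-1], x[1])             # next vertex, wrapping around
  y2 <- c(y[-1], y[1])
  cross <- x * y2 - x2 * y         # per-edge cross products
  a <- sum(cross) / 2              # signed area
  c(x = sum((x + x2) * cross) / (6 * a),
    y = sum((y + y2) * cross) / (6 * a))
}

polygon_centroid(c(0, 2, 2, 0), c(0, 0, 2, 2))   # x = 1, y = 1
```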

Intersecting points and overlapping polygons

UPDATED…

I’ve been doing some spatial stuff of late, and the next little step will involve intersecting points with possibly many overlapping polygons. The sp package has a function called over, which returns the polygon that each point intersects. The catch, though, is that it only returns one polygon per point: the last (highest-numbered) polygon the point overlaps with. So it’s not so useful if you have many overlapping polygons. A little playing, and I’ve overcome that problem…

Here’s a toy example.

Create a couple of polygons and put them into a SpatialPolygons object.

library(sp)

p1 <- matrix(c(1, 1,
               2, 1,
               4, 2,
               3, 2), ncol = 2, byrow = TRUE)
p2 <- matrix(c(2.2, 1,
               3,   1,
               3,   2,
               3,   3,
               2.8, 3), ncol = 2, byrow = TRUE)
p1s <- Polygons(list(Polygon(p1)), 3)
p2s <- Polygons(list(Polygon(p2)), 4)
sps <- SpatialPolygons(list(p1s, p2s))

Define a few points and put them in a SpatialPoints object.

point <- matrix(c(2.5,  1.5,
                  3.2,  1.75,
                  2,    3,
                  1.5,  1.25,
                  2.75, 2.5), ncol = 2, byrow = TRUE)
points <- SpatialPoints(point)

We can plot them…(not shown)

plot(sps, border = c("black", "blue"))
plot(points, add = TRUE)

And here is the issue:

over(points, sps)
 1  2  3  4  5 
 2  1 NA  1  2 

only returning a single “hit” per point (point 1 overlaps with both polygons 1 and 2).

To get around this, we can rip the individual polygons out of the SpatialPolygons object and put them in a list, converting the individual polygons into SpatialPolygons along the way:

sps2 <- lapply(sps@polygons, function(x) SpatialPolygons(list(x)))

Then lapply-ing over sps2 we can see which polygons each point intersects…

lapply(sps2, function(x) over(points, x))
[[1]]
 1  2  3  4  5 
 1  1 NA  1 NA 

[[2]]
 1  2  3  4  5 
 1 NA NA NA  1

And now we see that point one overlaps with both polygons.
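The same many-to-many check can be done without sp, which shows what is going on underneath. A minimal sketch using the even-odd (ray casting) rule on the toy coordinates from above; `point_in_polygon()` and `all_overlaps()` are hypothetical helpers, and points lying exactly on an edge may be classified differently than by `over()`:

```r
# Even-odd (ray casting) rule: follow a horizontal ray from the point to the
# right and count the polygon edges it crosses; an odd count means inside.
point_in_polygon <- function(px, py, vx, vy) {
  inside <- FALSE
  j <- length(vx)                      # previous vertex (wraps around)
  for (i in seq_along(vx)) {
    crosses <- (vy[i] > py) != (vy[j] > py) &&
      px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i]
    if (crosses) inside <- !inside
    j <- i
  }
  inside
}

# For every point, the indices of ALL polygons containing it
all_overlaps <- function(pts, polys) {
  lapply(seq_len(nrow(pts)), function(k) {
    hits <- vapply(polys, function(p)
      point_in_polygon(pts[k, 1], pts[k, 2], p[, 1], p[, 2]), logical(1))
    which(hits)
  })
}

p1 <- matrix(c(1, 1, 2, 1, 4, 2, 3, 2), ncol = 2, byrow = TRUE)
p2 <- matrix(c(2.2, 1, 3, 1, 3, 2, 3, 3, 2.8, 3), ncol = 2, byrow = TRUE)
point <- matrix(c(2.5, 1.5, 3.2, 1.75, 2, 3, 1.5, 1.25, 2.75, 2.5),
                ncol = 2, byrow = TRUE)
res <- all_overlaps(point, list(p1, p2))
res
```

Point 1 comes back with both polygons, matching the lapply result. Here the list is organised by point rather than by polygon, which is often the handier shape for this kind of lookup.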

Code >>>here<<<

UPDATE…

Someone in the comments mentions that over has a returnList argument that does much the same thing, and a couple of others mention that the sf package can do it easily too. You win some… you lose some… it was a little practice with lapply if nothing else.
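For completeness, a sketch of the returnList route mentioned in the comments, rebuilding the toy objects so it runs on its own (assuming sp is installed). Each list element corresponds to a point and should hold the indices of all polygons that point falls in:

```r
library(sp)

# Rebuild the toy polygons and points from the example
p1 <- matrix(c(1, 1, 2, 1, 4, 2, 3, 2), ncol = 2, byrow = TRUE)
p2 <- matrix(c(2.2, 1, 3, 1, 3, 2, 3, 3, 2.8, 3), ncol = 2, byrow = TRUE)
sps <- SpatialPolygons(list(Polygons(list(Polygon(p1)), "3"),
                            Polygons(list(Polygon(p2)), "4")))
point <- matrix(c(2.5, 1.5, 3.2, 1.75, 2, 3, 1.5, 1.25, 2.75, 2.5),
                ncol = 2, byrow = TRUE)
points <- SpatialPoints(point)

# returnList = TRUE: one element per point, with every matching polygon index
# (an empty vector when a point falls in no polygon)
hits <- over(points, sps, returnList = TRUE)
hits
```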