The R Project for Maps

Fig 1 – interactive Leaflet choropleth of Census ACS household income $60k-$75k using Microsoft Open R

The mainstay of web mapping applications for the last couple of decades has been three-tier: Model – SQL, View – web UI, and Controller – server code. There are many variations on this theme: models residing in image tile pyramids, SQL Server, PostGIS, or Oracle; controller server code in Java, C#, or PHP. The visible action is on the viewer side. HTML5 with ever-expanding JavaScript libraries like jQuery, Bootstrap, and AngularJS makes life interesting, while Node.js is pushing JavaScript upstream to the controller.

For building end-user applications it helps to know all three tiers and have at least one tool in each. With the right tools you can eventually accomplish just about anything spatially interesting. Emphasis is on the word “eventually.” SQL <=> C# <=> HTML5/JavaScript is very powerful, but extravagant for “one off” analytical work.

For ad hoc spatial work it was usually best to stick to a desktop application such as one of the big-dollar Arc___ variations or, better yet, something open source like QGIS. In the early days these generally consisted of modular C/C++ functions threaded together with an all-purpose scripting language. If you wanted to get a little closer to the geo engine, knowledge of a scripting language (PHP, Tcl, Python, or Ruby) helped to script modular toolkits like GDAL/OGR, OSSIM, GEOS, or GMT. This all works fine except for learning and relearning often arcane syntax, while repeatedly discovering and reading data documentation on various public resources from Census, USGS, NOAA, NASA, JPL … you get the idea.

R changes things in the geospatial world. The R project originated as a modular statistics and graphics toolkit. Unless you happen to be a true math prodigy, statistics are best visualized graphically. With powerful graphics libraries, R has evolved into a useful platform for ad hoc spatial analysis.

Coupled with an IDE such as RStudio, or the new Microsoft R Tools for Visual Studio, R wraps a large stable of component libraries into a script interpreter environment, ideal for “one off” analysis. Although learning arcane syntax is still a prerequisite, there is at least a universal environment with a really large contributor community. You can think of it as an open source replacement for Tableau or Power BI, but without proprietary limitations.

Example: networkD3 R library for creating D3 JavaScript network graphs.

# only a few lines of script
library(networkD3)  # provides forceNetwork and the MisLinks/MisNodes sample data
data(MisLinks, MisNodes)
forceNetwork(Links = MisLinks, Nodes = MisNodes, Source = "source",
             Target = "target", Value = "value", NodeID = "name",
             Group = "group", opacity = 0.4)

Community contributions are found in CRAN, the Comprehensive R Archive Network for the R programming language. A search of CRAN or MRAN (Microsoft R Application Network) for the term “spatial” yields a list of 145 R libraries.

Example: dygraphs R library for creating interactive charts.

# only a few lines of script
library(dygraphs)  # provides dygraph and dyRangeSelector
dygraph(nhtemp, main = "New Haven Temperatures") %>%
  dyRangeSelector(dateWindow = c("1920-01-01", "1960-01-01"))

Here are just a few samples of CRAN libraries useful for spatial analysis:

library(rgdal)  # reading spatial files with gdal
library(ggmap)  # simple mapping and more
library(raster)  # defining extents and raster processing
#   brick()  raster cube objects useful for multispectral operations
#   stack()  multilayer raster manipulation
library(sp)  # working with spatial objects
library(leaflet)  # interactive web mapping using Leaflet
library(rgeos)  # R GEOS wrapper
library(tigris)  # downloading Census TIGER geography files
library(FedData)  # downloading federal data NED, NHD, SSURGO, GHCN
library(acs)  # tabular census data (American Community Survey) ACS, SF1, SF3
library(UScensus2010)  # spatial and demographic Census 2010 data county/tract/blkgrp/blk
library(RColorBrewer)  # color palettes for thematic mapping

For example, tigris is a useful library for reading US Census TIGER files. With just a couple of lines of R scripting you can zoom around a polygonal plot of US Census urban areas. The tigris library handles all the details of obtaining the TIGER polygons and loading them into local memory. The leaflet library handles creating the polygons and displaying them over default Leaflet map tiles.

library(tigris)   # TIGER download
library(leaflet)  # interactive mapping
ua <- urban_areas(cb = TRUE)
ua %>% leaflet() %>% addTiles() %>% addPolygons(popup = ~NAME10)

Fig 2 – RStudio Interactive Leaflet plot of Census TIGER urban area extracted with tigris

Fig 3 – RStudio script of Census ACS tract household income percentage for $60k-$75K

These samples follow examples found in Zev Ross’s blog posts which contain a wealth of scripts using R for spatial analytics, including these posts on using FedData and rgdal.

Microsoft R

Microsoft recently entered the R world with several enhanced R tools, including:
Microsoft R Open
RTVS R Tools for Visual Studio
Microsoft R Server
SQL Server R Services
MRAN Microsoft R Application Network
Microsoft Azure R Server
Microsoft R Server for Hadoop

Apparently Data Science is a growth industry and Microsoft has an interest in providing useful tools beyond Power BI.

Microsoft R Open

A free Microsoft version of the R script engine with a couple of enhancements:
• Intel enhanced math library
• multi core support
• multithreading

Fig 4 – slide from Derek Norton webinar on R Server showing relative performance boost with enhanced Microsoft R Open

RTVS R Tools for Visual Studio
The Microsoft R IDE for Visual Studio, using the Data Science R settings. Users of Visual Studio will find all the familiar debug stepping, variable explorer, and IntelliSense editing they use for other development languages.

Microsoft R Server
A licensed enterprise R service that scales by avoiding in-memory data limitations, using parallel chunked data streams.

Fig 5 – slide from Derek Norton webinar showing R Server scale enhancements
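
The chunking idea itself can be illustrated in plain R. The sketch below is not the R Server API; it assumes a hypothetical helper, chunked_mean, that keeps only a running sum and count so that a single chunk needs to be in memory at any time. The chunks here are sliced from an in-memory vector purely to keep the example runnable.

```r
# Hypothetical helper: mean of a numeric vector computed one chunk at a time.
# In a real out-of-memory setting each chunk would come from disk or a
# database; slicing an in-memory vector is only a stand-in for that.
chunked_mean <- function(x, chunk_size = 1000) {
  stopifnot(length(x) > 0, chunk_size >= 1)
  total <- 0
  n <- 0
  for (start in seq(1, length(x), by = chunk_size)) {
    end <- min(start + chunk_size - 1, length(x))
    chunk <- x[start:end]
    total <- total + sum(chunk)   # running sum across chunks
    n <- n + length(chunk)        # running count across chunks
  }
  total / n
}

set.seed(42)
incomes <- runif(10500, min = 20000, max = 150000)  # made-up data
result <- chunked_mean(incomes, chunk_size = 1000)
```

The same pattern extends to any statistic that can be assembled from per-chunk summaries, which is also what makes it easy to spread across cores.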

SQL Server 2016 R Services

SQL R Services – a data ETL and visualization tool inside SQL Server.
The T-SQL R interface keeps the database next to the R code on the same server.

sp_execute_external_script – R code embedding
The stored procedure receives inputs, passes them to the external R runtime, and returns the R results.
Invoke the stored procedure from T-SQL to run R code.
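
On the R side, the embedded script sees the query result as a data frame and hands a data frame back. Here is a minimal sketch of such a script body, assuming the procedure's default variable names InputDataSet and OutputDataSet. The input data frame below is a made-up stand-in for rows that SQL Server would supply, so the snippet can be run outside the database.

```r
# Stand-in for the rows the T-SQL input query would supply (made-up values)
InputDataSet <- data.frame(
  GEOID = c("08031001701", "08031001702"),
  total = c(1200, 800),
  income60kTo75k = c(150, 60),
  stringsAsFactors = FALSE
)

# Body of the embedded script: derive a column and return a data frame
OutputDataSet <- data.frame(
  GEOID = InputDataSet$GEOID,
  percent = 100 * InputDataSet$income60kTo75k / InputDataSet$total,
  stringsAsFactors = FALSE
)
```

Inside SQL Server only the second half would be passed as the script text; the first half is replaced by the input query.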

MRAN Microsoft R Application Network

Fixed-date CRAN snapshots let shared R code point to compatible library versions.
The checkpoint package uses these snapshots for reproducibility.

Fig 6 – RTVS R Visual Studio 2015 Tools Leaflet demographic script

Example R Leaflet demographic script (ref Fig 1 above):

library(tigris)  # TIGER data
library(acs)     # ACS data
library(stringr) # to pad fips codes
library(dplyr)   # data manipulation
library(leaflet) # interactive mapping

# Colorado Front Range counties (FIPS county codes)
counties <- c(1, 5, 13, 31, 35, 39, 41, 59)
tracts <- tracts(state = 'CO', county = counties, cb = TRUE)

api.key.install(key = "<insert your own api key here>")
geo <- geo.make(state = c("CO"),
              county = counties, tract = "*")

income <- acs.fetch(endyear = 2012, span = 5, geography = geo,
                  table.number = "B19001", col.names = "pretty")
attr(income, "acs.colnames")
##  [1] "Household Income: Total:"
## [12] "Household Income: $60,000 to $74,999"  

income_df <- data.frame(paste0(str_pad(income@geography$state, 2, "left", pad = "0"),
                               str_pad(income@geography$county, 3, "left", pad = "0"),
                               str_pad(income@geography$tract, 6, "left", pad = "0")),
                        income@estimate[, c("Household Income: Total:",
                                           "Household Income: $60,000 to $74,999")],
                        stringsAsFactors = FALSE)

income_df <- select(income_df, 1:3)
rownames(income_df) <- 1:nrow(income_df)
names(income_df) <- c("GEOID", "total", "income60kTo75k")
income_df$percent <- 100 * (income_df$income60kTo75k / income_df$total)

income_merged <- geo_join(tracts, income_df, "GEOID", "GEOID")

popup <- paste0("GEOID: ", income_merged$GEOID, "<br>", "Percent of Households $60k-$75k: ", round(income_merged$percent, 2))
pal <- colorNumeric(
  palette = "YlGnBu",
  domain = income_merged$percent)

incomemap <- leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(data = income_merged,
              fillColor = ~pal(percent),
              color = "#b2aeae", # you need to use hex colors
              fillOpacity = 0.7,
              weight = 1,
              smoothFactor = 0.2,
              popup = popup) %>%
  addLegend(pal = pal,
            values = income_merged$percent,
            position = "bottomright",
            title = "Percent of Households<br>$60k-$75k",
            labFormat = labelFormat(suffix = "%"))
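
The GEOID built in the script above is the 2-digit state, 3-digit county, and 6-digit tract code concatenated with leading zeros, which is what lets the ACS table join to the TIGER tracts. Here is a base R sketch of the same padding and percentage steps, using made-up counts and a hypothetical helper make_geoid, with formatC standing in for str_pad:

```r
# Hypothetical helper: zero-pad and concatenate state, county, and tract codes
make_geoid <- function(state, county, tract) {
  paste0(formatC(state,  width = 2, flag = "0", format = "d"),
         formatC(county, width = 3, flag = "0", format = "d"),
         formatC(tract,  width = 6, flag = "0", format = "d"))
}

# Made-up household counts for two tracts
demo <- data.frame(state = c(8L, 8L),
                   county = c(1L, 31L),
                   tract = c(7801L, 1701L),
                   total = c(2000, 1600),
                   income60kTo75k = c(250, 120))

demo$GEOID <- make_geoid(demo$state, demo$county, demo$tract)
demo$percent <- 100 * demo$income60kTo75k / demo$total
```

Without the padding, county 1 and county 31 would produce keys of different lengths and the join against the 11-character TIGER GEOID would silently drop rows.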


Hillshade example using public SRTM 90 data:

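    library(raster)   # getData, terrain, hillShade
    library(leaflet)  # interactive mapping

    alt <- getData('alt', country = 'ITA')    # elevation raster for Italy
    slope <- terrain(alt, opt = 'slope')
    aspect <- terrain(alt, opt = 'aspect')
    hill <- hillShade(slope, aspect, 40, 270) # sun elevation 40, azimuth 270

    leaflet() %>% addProviderTiles("CartoDB.Positron") %>%
      addRasterImage(hill, colors = grey(0:100 / 100), opacity = 0.6)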
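
For readers curious what the hillshade step computes, here is a base R sketch of the usual illumination formula, with slope and aspect in radians (the default unit of terrain) and sun elevation and azimuth in degrees. The function name shade is hypothetical, and the formula is a reading of the standard approach rather than a copy of the raster package source.

```r
# Hypothetical helper: illumination of a surface cell for a given sun position.
# slope and aspect are in radians; angle (sun elevation) and direction
# (sun azimuth) are in degrees.
shade <- function(slope, aspect, angle = 40, direction = 270) {
  zenith  <- (90 - angle) * pi / 180
  azimuth <- direction * pi / 180
  cos(slope) * cos(zenith) + sin(slope) * sin(zenith) * cos(azimuth - aspect)
}

flat   <- shade(slope = 0,      aspect = 0)               # level ground
facing <- shade(slope = pi / 6, aspect = 270 * pi / 180)  # 30 degree slope facing the sun
away   <- shade(slope = pi / 6, aspect = 90 * pi / 180)   # 30 degree slope facing away
```

Slopes that face the light source come out brighter than level ground, and slopes that face away come out darker, which is what gives the rendered terrain its relief.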

Fig 7 – RTVS R Visual Studio 2015 Tools Leaflet Hillshading image


R provides lots of interesting modules that help with spatial analytics. The script engine makes it easy to perform ad hoc visualization and publish the results online. However, there are limitations in performance and extents that make it more of a competitor to desktop GIS products or the newer commercial data visualizers like Tableau or Power BI. For public-facing web applications with generalized extents, three-tier performance using SQL + server code + web UI still makes the most sense.

The advent of Microsoft R Server and SQL Server R Services adds scaling performance that makes R solutions more competitive with the venerable three-tier approach. It will be interesting to see how developers make use of SQL Server R Services. As a method of adding raster functionality to SQL Server, R via sp_execute_external_script overlaps somewhat with PostGIS Raster. Exploring SQL Server 2016 R Services must await a future post.

Example: threejs R library with world flight data