Territorial Turf

Fig 1 – Territories from Census Zip Code Tabulation Areas (ZCTAs) on Bing Maps

An interesting GIS development over the years has been the evolution from monolithic applications to multiple open source plug and play tools. Considering GIS as a three tier system with back end storage, a controller in the middle, and a UI display out front, more and more of the middle tier is migrating to either end.

SQL DBs, such as SQL Server, Oracle Spatial, and especially PostGIS, now implement a multitude of GIS functions originally considered middle tier domain. On the other end, the good folks at turf.js and proj4js continue to push atomic GIS functions out to javascript, where they can fit into the client side UI. The middle tier is getting thinner and thinner as time goes on. Generally the middle is what costs money, so rolling your own middle with less work is a good thing. As a desirable side effect, instead of kitchen sink GIS, very specific user tools can be cobbled together as needed.

Looking for an excuse to experiment with turf.js, I decided to create a Territory Builder utilizing turf.js on the client and some of the US Census Bureau Services on the backend.

US Census Bureau does expose some “useful” services at TigerWeb. I tend to agree with Brian Timoney that .gov should stick to generating data exposed in useful ways. Apps and presentation are fast evolving targets and historically .gov can’t really hope to keep up. Although you can use the census custom TigerWeb applications for some needs, there are many other occasions when you would like to build something less generic. For example a Territory builder over Bing Maps.

Here is the simplified POC work flow:

1. Use Census WMS GetMap to show polygons on a canvas over Bing Maps.
2. Use Census WMS GetFeatureInfo to return a GeoID of selected polygon(s).
3. Use Census REST service and GeoID to return selected polygon vertices.
4. Use Turf.js merge function to merge selected polygons into a single Territory polygon
5. Display territory polygon using Bing Maps Ajax v7 Microsoft.Maps.AdvancedShapes polygon

Fig 2 – Territory polygon with a demographic overlay on top of Bing Maps

Note: Because there doesn’t appear to be an efficient way to join demographic data to geography with current TigerWeb service APIs, the demographic tab of this app uses a custom WMS PostGIS backend, hooking SF1 to TIGER polygons.

1. Census WMS GetMap over Bing Maps

WMS GetMap requests simply return an image. In order to overlay the image on a Bing Map this Territory app uses an html5 canvas and context app.context.drawImage(imageObj, 0, 0); The trick is to stack the canvas in between the Bing Map and the Bing Map navigation, scale, and selector tools. TigerWeb WMS conveniently exposes epsg:3857 which correctly aligns with Bing Map tiles.

    function showCensus() {
        if (app.canvas == null) {
            app.canvas = document.createElement('canvas');
            app.canvas.id = 'censuscanvas';
            app.canvas.style.position = 'relative';
            var mapDiv = app.map.getRootElement();
            //position this canvas under Bing navigation and over map tiles
            mapDiv.insertBefore(app.canvas, mapDiv.childNodes[3]);
            app.context = app.canvas.getContext("2d");
        }
        //match canvas size to map size
        app.canvas.height = app.map.getHeight();
        app.canvas.width = app.map.getWidth();

        var imageObj = new Image();
        //image onload function draws image on context
        imageObj.onload = function () {
            app.canvas.style.opacity = app.alpha / 100;
            app.context.drawImage(imageObj, 0, 0);
        };
        //prepare a TigerWeb WMS GetMap request
        //use proj4.js to transform ll to mercator bbox
        var b = app.map.getBounds();
        var epsg3857 = "+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext  +no_defs";
        var sw = proj4(epsg3857).forward([b.getWest(), b.getSouth()]);
        var ne = proj4(epsg3857).forward([b.getEast(), b.getNorth()]);
        bbox = sw[0] + "," + sw[1] + "," + ne[0] + "," + ne[1];

        var url = "";
        if (tab == "Territory") {
            app.alpha = 100;
            // GetMap request
            url = "http://tigerweb.geo.census.gov/arcgis/services/TIGERweb/tigerWMS_Current/MapServer/WmsServer?REQUEST=GetMap&SERVICE=WMS&VERSION=1.3.0&LAYERS=" + app.censusPolygonType + "&STYLES=&FORMAT=image/png&BGCOLOR=0xFFFFFF&TRANSPARENT=TRUE&CRS=EPSG:3857&BBOX=" + bbox + "&WIDTH=" + app.canvas.width + "&HEIGHT=" + app.canvas.height;
        }
        imageObj.src = url;
    }

2. Census WMS GetFeatureInfo for obtaining GeoID

Unfortunately the WMS GetMap request has no IDs available for polygons, even if requested format is image/svg+xml. SVG is an xml format and could easily contain associated GeoID values for joining with other data resources, but this is contrary to the spirit of OGC WMS service specs. Naturally we must obey OGC Law, which is too bad. Adding a GeoID attribute would allow options such as choropleth fills directly on existing svg paths. For example adding id = “80132″ would allow fill colors by zipcode with a bit of javascript.

http://tigerweb.geo.census.gov/arcgis/services/TIGERweb/tigerWMS_Current/MapServer/WmsServer?REQUEST=GetMap&SERVICE=WMS&VERSION=1.3.0 &LAYERS=2010 Census ZIP Code Tabulation Areas&STYLES=&FORMAT=image/svg+xml&BGCOLOR=0xFFFFFF&TRANSPARENT=TRUE&CRS=EPSG:3857 &BBOX=-11679625.942909468,4709198.547476525,-11645573.246808422,4737900.651597611&WIDTH=891&HEIGHT=751

<g transform="matrix(1.3333333 0 0 -1.3333333 445.5 375.5)">
<path id="80132" d="M300.50809 -258.78592 L302.54407 -257.91829 L303.17977 -257.55608 L304.45554 -256.98889
                                .
				.
L280.30116 -268.55133 L280.7184 -268.34637 L298.33448 -259.95117 L300.50809 -258.78592 "
  stroke="#99454A" fill="none" stroke-opacity="1" stroke-width="1.5"
 stroke-linecap="round" stroke-linejoin="round" stroke-miterlimit="10" />

Instead TigerWeb WMS exposes GetFeatureInfo requests. Using a click event we calculate a lat,lon location and send a GetFeatureInfo request to find a GeoID for the enclosing polygon.

Request:
http://tigerweb.geo.census.gov/arcgis/services/TIGERweb/tigerWMS_Current/MapServer/WmsServer?REQUEST=GetFeatureInfo&SERVICE=WMS&VERSION=1.3.0&CRS=EPSG:4326 &BBOX=13.6972399985862,-128.010213138288,52.2800368298775,-53.444917258507&WIDTH=1061&HEIGHT=549 &INFO_FORMAT=text/xml&QUERY_LAYERS=2010 Census ZIP Code Tabulation Areas&X=388&Y=190

Response:
<?xml version=”1.0″ encoding=”UTF-8″?>
<FeatureInfoResponse xmlns=”http://www.esri.com/wms” xmlns:esri_wms=”http://www.esri.com/wms”>
<FIELDS INTPTLON=”-101.2099848″ INTPTLAT=”+38.9263824″ CENTLON=”-101.2064010″ CENTLAT=”+38.9226272″ STGEOMETRY=”Polygon” OBJECTID=”16553″ AREAWATER=”163771″ AREALAND=”1327709219″ FUNCSTAT=”S” ZCTA5CC=”B5″ MTFCC=”G6350″ NAME=”ZCTA5 67764″ LSADC=”Z5″ BASENAME=”67764″ GEOID=”67764″ ZCTA5=”67764″ OID=”221404258331221″/>
</FeatureInfoResponse>

Since cross browser restrictions come into play it’s necessary to add a bit of server side proxy code to actually return the xml.

Some MVC controller magic:

 function GetCensusFeature(e) {
    var x = e.pageX - $("#map").offset().left;
    var y = e.pageY - $("#map").offset().top;
    var FeatureUrl = {
        url: "http://tigerweb.geo.census.gov/arcgis/services/TIGERweb/tigerWMS_Current/MapServer/WmsServer?REQUEST=GetFeatureInfo&SERVICE=WMS&VERSION=1.3.0&LAYERS=" + app.censusPolygonType + "&STYLES=&FORMAT=image/png&BGCOLOR=0xFFFFFF&TRANSPARENT=TRUE&CRS=EPSG:3857&BBOX=" + bbox + "&WIDTH=" + app.canvas.width + "&HEIGHT=" + app.canvas.height + "&INFO_FORMAT=text/xml&QUERY_LAYERS=" + app.censusPolygonType + "&X=" + x + "&Y=" + y
    }
        app.postit('/api/Operation/GetFeatureInfo', {
            data: JSON.stringify(FeatureUrl),
            success: function (response) {
                if (response.length > 0) {
                    var parser = new DOMParser();
                    var xmlDoc = parser.parseFromString(response, "text/xml");
				          .
				          .

Eventally gets here to a C# Controller which can proxy the GetFeatureInfo request:

        /// <summary>
        /// GetFeatureInfo
        /// </summary>
        /// <param name="FeatureUrl"></param>
        /// <returns>xml doc</returns>
        [HttpPost]
        public async Task<IHttpActionResult> GetFeatureInfo(Models.FeatureUrl feature)
        {
            var wc = new HttpClient();
            var getFeatureInfoRequest = new Uri(feature.url);
            var response = await wc.GetAsync(getFeatureInfoRequest);
            response.EnsureSuccessStatusCode();
            var content = await response.Content.ReadAsStreamAsync();
            var result = "";
            using (var reader = new StreamReader(content))
            {
                while (!reader.EndOfStream)
                {
                    result += reader.ReadLine();
                }
            }
            return Ok(result);
        }

Finally, on success, javascript can parse the xml for GeoID and add a canvas context box with some text information:

     success: function (response) {
              if (response.length > 0) {
                    var parser = new DOMParser();
                    var xmlDoc = parser.parseFromString(response, "text/xml");
                    if (tab == "Territory" && xmlDoc.getElementsByTagName("FeatureInfoResponse")[0].hasChildNodes()) {
                        app.context.beginPath();
                        app.context.rect(x, y, 125, 25);
                        app.context.fillStyle = 'white';
                        app.context.fill();
                        app.context.lineWidth = 1;
                        app.context.strokeStyle = 'black';
                        app.context.stroke();
                        app.context.fillStyle = 'black';
                        app.context.font = '11pt Calibri';
                        var fields = xmlDoc.getElementsByTagName("FIELDS")[0];
                        app.context.fillText(fields.getAttribute('NAME'), x + 5, y + 15);
                        var geoid= fields.getAttribute('GEOID')
					            .
					            .

Fig 3 – building territories from Census Block Groups on Bing Maps

3. Census REST to obtain vertices

The GEOID retrieved from our proxied GetFeatureInfo request allows us to grab vertices with another TigerWeb service, Census REST.

This spec is a little more proprietary and requires some detective work to unravel.
FeatureUrl.url = “http://tigerweb.geo.census.gov/arcgis/rest/services/TIGERweb/PUMA_TAD_TAZ_UGA_ZCTA/MapServer/1/query?where=GEOID%3D” + geoid + “&geometryPrecision=6&outSR=4326&f=pjson”;

There are different endpoint urls for the various polygon types. In this case Zip Codes are found in PUMA_TAD_TAZ_UGA_ZCTA. We don’t need more than 1 meter resolution so precision 6 is good enough and we would like the results in epsg:4326 to avoid a proj4 transform on the client.

This follows the same process as previously sending the REST request through a proxy controller. You can see the geojson geometry result with this sample request:
http://tigerweb.geo.census.gov/arcgis/rest/services/TIGERweb/PUMA_TAD_TAZ_UGA_ZCTA/MapServer/1/query?where=GEOID%3D80004&geometryPrecision=6&outSR=4326&f=pjson

Census REST doesn’t appear to offer a simplify parameter so the coordinates returned are at the highest resolution. Highly detailed polygons can easily return several thousand vertices, which is a problem for performance, but the trade-off is eliminating hosting data ourselves.

4. turf.js merge function to Build Territory

Finally the interesting part, where we get to use turf.js to handle merging multiple polygons into a single territory polygon.

var FeatureUrl = {
url: "http://tigerweb.geo.census.gov/arcgis/rest/services/TIGERweb/PUMA_TAD_TAZ_UGA_ZCTA/MapServer/1/query?where=GEOID%3D" + geoid + "&geometryPrecision=6&outSR=4326&f=pjson";
}
app.postit('/api/Operation/GetFeatureGeo', {
    data: JSON.stringify(FeatureUrl),
    success: function (response) {
        var polygon = JSON.parse(response);
        if (polygon.features.length == 0 || polygon.features[0].geometry.rings.length == 0) {
            $('#loading').addClass('hide');
            alert("No Geo Features returned.")
            return;
        }
        if (app.territoryFeatures == null) {
            app.territoryFeatures = new Array();
            app.territory = turf.polygon(polygon.features[0].geometry.rings);
            app.territoryFeatures.push(app.territory);
        }
        else {
            var tpolygon = turf.polygon(polygon.features[0].geometry.rings);
            app.territoryFeatures.push(tpolygon);
            var fc = turf.featurecollection(app.territoryFeatures);
            try {
                app.territory = turf.merge(fc);
            }
            catch (err) {
                $('#loading').addClass('hide');
                alert("turf.js error: " + err.message);
                return;
            }
        }
        if (app.territory.geometry.coordinates.length > 0) {
            displayTerritory();
            $('#totalDivID').removeClass('hide');
        }

        $('#loading').addClass('hide');
    },
    error: function (response) {
        alert("TigerWeb error:" + response);
    }
});

5. Display Territory Bing Maps Ajax polygon

The final rendering just uses Bing Maps Ajax v7 Microsoft.Maps.AdvancedShapes to add the new territory to the map.

function displayTerritory() {
    app.territoryLayer.clear();
    var coordLen = app.territory.geometry.coordinates.length;
    if (app.territory.geometry.type == 'Polygon') {
        var rings = new Array();
        for (var i = 0; i < coordLen; i++) {
            var vertices = new Array();
            for (var j = 0; j < app.territory.geometry.coordinates[i].length; j++) {
                vertices.push(new Microsoft.Maps.Location(app.territory.geometry.coordinates[i][j][1], app.territory.geometry.coordinates[i][j][0]));
            }
            rings.push(vertices);
        }
        var polygon = new Microsoft.Maps.Polygon(rings, { fillColor: new Microsoft.Maps.Color(100, 100, 0, 100) });
        app.territoryLayer.push(polygon);
    }
    else if (app.territory.geometry.type == 'MultiPolygon') {
        var multi = new Array();
        for (var i = 0; i < coordLen; i++) {
            var ringslen = app.territory.geometry.coordinates[i].length;
            for (var j = 0; j < ringslen; j++) {
                var vertices = new Array();
                for (var k = 0; k < app.territory.geometry.coordinates[i][j].length; k++) {
                    vertices.push(new Microsoft.Maps.Location(app.territory.geometry.coordinates[i][j][k][1], app.territory.geometry.coordinates[i][j][k][0]));
                }
                multi.push(vertices)
            }
        }
        var polygon = new Microsoft.Maps.Polygon(multi, { fillColor: new Microsoft.Maps.Color(100, 100, 0, 100) });
        app.territoryLayer.push(polygon);
    }
    else {
        $('#loading').addClass('hide');
        alert("geometry type is " + app.territory.geometry.type + ". Territory requires a Polygon or MultiPolygon.")
        return;
    }
}

Fig 4 – Territory Total for P0040003 Latino Origin – Map: Quantile population classifier Census Block Groups on Bing Map

Summary

TigerWeb offers some useful data access. With TigerWeb WMS and REST api, developers can customize apps without hosting and maintaining a backend SQL store. However, there are some drawbacks.

Some potential improvements:
1. Adding an svg id=GeoID would really improve the usefulness of TigerWeb WMS image/svg+xml, possibly eliminating steps 2 and 3 of the workflow.

2. Technically it’s possible to use the TigerWeb REST api to query geojson by area, but practically speaking the results are too detailed for useful performance. A helpful option for TigerWeb REST would be a parameter to request simplified polygons and avoid lengthy vertice transfers.

turf.js is a great tool box, however, occasionally the merge function had trouble with complex polygons from TigerWeb.

Fig 6 - turf merge had some rare difficulties with complex polygons from TigerWeb

In the end, territories are useful for query aggregates of SF1 demographic data.

Fig 5 – Territory Total for P0120025 Male 85 Years and older – Map: Jenks population classifier Census Block Group on Bing Map

Visualizing Large Data Sets with Bing Maps Web Apps

Fig 1 - MapD Twitter Map 80M tweets Oct 19 - Oct 30

Visualizing large data sets with maps is an ongoing concern these days. Just ask the NSA, or note this federal vehicle tracking initiative reported at the LA Times. Or, this SPD mesh network for tracking any MAC address wandering by.

“There was of course no way of knowing whether you were being watched at any given moment. How often, or on what system, the Thought Police plugged in on any individual wire was guesswork. It was even conceivable that they watched everybody all of the time. But at any rate they could plug in your wire whenever they wanted to.”

George Orwell, 1984

On a less intrusive note, large data visualization is also of interest to anyone dealing with BI or just fascinated with massive public data sets such as twitter universe. Web maps are the way to go for public distribution and all web apps face the same set of issues when dealing with large data sets.

1. Latency of data storage queries, typically SQL
2. Latency of services for mediating queries and data between the UI and storage.
3. Latency of the internet
4. Latency of client side rendering

All web map javascript APIs have these same issues whether it’s Google, MapQuest, Nokia Here, or Bing Maps. This is a Bing Maps centric perspective on large data mapping, because Bing Maps has been the focus of my experience for the last year or two.

Web Mapping Limitation

Bing Maps Ajax v7 is Microsoft’s javascript API for web mapping applications. It offers typical Point (Pushpin), Polyline, and Polygon vector rendering in the client over three tile base map styles, Road, Aerial, and AerialWithLabels. Additional overlay extensions are also available such as traffic. Like all the major web map apis, vector advantages include client side event functions at the shape object level.

Although this is a powerful mapping API rendering performance degrades with the number of vector entities in an overlay. Zoom and pan navigation performs smoothly on a typical client up to a couple of thousand points or a few hundred complex polylines and polygons. Beyond these limits other approaches are needed for visualizing geographic data sets. This client side limit is necessarily fuzzy as there is a wide variety of client hardware out there in user land, from older desktops and mobile phones to powerful gaming rigs.

Large data Visualization Approaches

1) Tile Pyramid – The Bing Maps Ajax v7 API offers a tileLayer resource that handles overlays of tile pyramids using a quadkey nomenclature. Data resources are precompiled into sets of small images called a tile pyramid which can then be used in the client map as a slippy tile overlay. This is the same slippy tile approach used for serving base Road, Aerial, and AerialWithLabels maps, similar to all web map brands.

Fig 2 - example of quadkey png image names for a tile pyramid

Pro:· Fast performance

  • Server side latency is eliminated by pre-processing tile pyramids
  • Internet streaming is reduced to a limited set of png or jpg tile images
  • Client side rendering is reduced to a small set of images in the overlay

Con: Static data – tile pyramids are pre-processed

  • data cannot be real time
  • Permutations limited – storage and time limitations apply to queries that have large numbers of permutations
  • Storage capacity – tile pyramids require large storage resources when provided for worldwide extents and full 20 zoom level depth

2) Dynamic tiles – this is a variation of the tile pyramid that creates tiles on demand at the service layer. A common approach is to provide dynamic tile creation with SQL or file based caching. Once a tile has been requested it is then available for subsequent queries directly as an image. This allows lower levels of the tile pyramid to be populated only on demand reducing the amount of storage required.

Pro:

  • Can handle larger number of query permutations
  • Server side latency is reduced by caching tile pyramid images (only the first request requires generating the image)
  • Internet streaming is reduced to a limited set of png tile images
  • Client side rendering is reduced to a small set of images in the overlay

Con:

  • Static data – dynamic data must still be refreshed in the cache
  • Tile creation performance is limited by server capability and can be a problem with public facing high usage websites.

3) Hybrid - This approach splits the zoom level depth into at least two sections. The lowest levels with the largest extent contain the majority of a data set’s features and is provided as a static tile pyramid. The higher zoom levels comprising smaller extents with fewer points can utilize the data as vectors. A variation of the hybrid approach is a middle level populated by a dynamic tile service.

Fig 3 – Hybrid architecture

Pro:

  • Fast performance – although not as fast as a pure static tile pyramid it offers good performance through the entire zoom depth.
  • Allows fully event driven vectors at higher zoom levels on the bottom end of the pyramid.

Con:

  • Static data at larger extents and lower zoom levels
  • Event driven objects are only available at the bottom end of the pyramid

Example:
sample site and demo video

tile layer sample

Fig 4 - Example of a tileLayer view - point data for earthquakes and Mile Markers

Fig 5 - Example of same data at a higher zoom using vector data display

4) Heatmap
Heatmaps refer to the use of color gradient or opacity overlays to display data density. The advantage of heat maps is the data reduction in the aggregating algorithm. To determine the color/opacity of a data set at a location the data is first aggregated by either a polygon or a grid cell. The sum of the data in a given grid cell is then applied to the color gradient dot for that cell. If heatmaps are rendered client side they have good performance only up to the latency limits of service side queries, internet bandwidth, and local rendering.

Fig 6 - Example of heatmap canvas over Bing Maps rendered client side

Grid Pyramids – Server side gridding
Hybrid server side gridding offers significant performance advantages when coupled with pre-processed grid cells. One technique of gridding processes a SQL data resource into a quadkey structure. Each grid cell is identified by its unique quadkey and contains the data aggregate at that grid cell. A grid quadkey sort by length identifies all of the grid aggregates at a specific quadtree level. This allows the client to efficiently download the grid data aggregates at each zoom level and render locally on the client in an html5 canvas over the top of a Bing Maps view. Since all grid levels are precompiled, resolution of cells can be adjusted by Zoom Level.

Pro:

  • Efficient display of very large data sets at wide extents
  • Can be coupled with vector displays at higher zoom levels for event driven objects

Con: gridding is pre-processed

  • real time data cannot be displayed
  • storage and time limitations apply to queries that have large numbers of permutations

Fig 7 – Grid Pyramid screen shot of UI showing opacity heatmap of Botnet infected computers

5) Thematic
Thematic maps use spatial regions such as states or zipcodes to aggregate data into color coded polygons. Data is aggregated for each region and color coded to show value. A hierarchy of polygons allows zoom levels to switch to more detailed regions at closer zooms. An example hierarchy might be Country, State, County, Sales territory, Zipcode, Census Block.

Pro:

  • Large data resources are aggregated into meaningful geographic regions.
  • Analysis is often easier using color ranges for symbolizing data variation

Con:

  • Rendering client side is limited to a few hundred polygons
  • Very large data sets require pre-processing data aggregates by region

Fig 8 - thematic map displaying data aggregated over 210 DMA regions using a quantized percentile range

6) Future trends
Big Data visualization is an important topic as the web continues to generate massive amounts of data useful for analysis. There are a couple of technologies on the horizon that help visualization of very large data resources.

A. Leverage of client side GPU

Here is an example WebGL http://www.web-maps.com/WebGLTest using CanvasLayer. ( only Firefox, chrome, IE11 *** Cannot be viewed in IE10 ***)

This sample shows speed of pan zoom rendering of 30,000 random points which would overwhelm typical js shape rendering. Data performance is good up to about 500,000 points per Brendan Kenny. Complex shapes need to be built up from triangle primitives. Tessellation rates for polygon generation approaches 1,000,000 triangles per 1000ms using libtess. Once tessellated the immediate mode graphic pipeline can navigate at up to 60fps. Sample code is available on github.

This performance is achieved by leveraging the client GPU. Because immediate mode graphics is a powerful animation engine, time animations can be used to uncover data patterns and anomalies as well as making some really impressive dynamic maps like this Uber sample. Unfortunately all the upstream latency remains: collecting the data from storage and sending it across the wire. Since we’re talking about larger sets of data this latency is more pronounced. Once data initialization finishes, client side performance is amazing. Just don’t go back to the server for new data very often.

Pro:

  • Good client side navigation performance up to about 500,000 points

Con:

  • requires a webgl enabled browser
  • requires GPU on the client hardware
  • subject to latency issues of server query and internet streaming
  • WebGL tessellation triangle primitives make display of polylines and polygons complex

Fig 9 – test webGL 30,000 random generated points (requires WebGL enabled browser – Firefox, Chrome, IE11)

Note: IE11 added WebGL capability which is a big boost for the web. There are still some glitches, however, and gl_PointSize in shader is broken for simple points like this sample.

Fig 10 – Very interesting WebGL animations of shipping GPS tracks using WebGL Canvas –courtesy Brendan Kenny

B. Leverage of server side GPU
MapD – Todd Mostak has developed a GPU based spatial query system called MapD (Massively Parallel Database)

MapD Synopsis:
  • MapD is a new database in development at MIT, created by Todd Mostak.
  • MapD stands for “massively parallel database.”
  • The system uses graphics processing units (GPUs) to parallelize computations. Some statistical algorithms run 70 times faster compared to CPU-based systems like MapReduce.
  • A MapD server costs around $5,000 and runs on the same power as five light bulbs.
  • MapD runs at between 1.4 and 1.5 teraflops, roughly equal to the fastest supercomputer in 2000.
  • uses SQL to query data.
  • Mostak intends to take the system open source sometime in the next year.
  • Bing Test: http://onterrawms.blob.core.windows.net/bingmapd/index.htm

    Bing Test shows an example of tweet points over Bing Maps and illustrates the performance boost from the MapD query engine. Each zoom or pan results in a GetMap request to the MapD engine that queries millions of tweet point records (81 million tweets Oct 19 – Oct 30), generating a viewport png image for display over Bing Map. The server side query latency is amazing considering the population size of the data. Here are a couple of screen capture videos to give you the idea of the higher fps rates:

    MapDBingTestAerialYellow50ms.wmv
    MapDBingHeatTest.wmv

    Interestingly, IE and FireFox handle cache in such a way that animations up to 100fps are possible. I can set a low play interval of 10ms and the player appears to do nothing. However, 24hr x12 days = 288 images are all being downloaded in just a few seconds. Consequently the next time through the play range the images come from cache and animation is very smooth. Chrome handles local cache differently in Windows 8 and it won’t grab from cache the second time. In the demo case the sample runs at 500ms or 2fps which is kind of jumpy but at least it works in Windows 8 Chrome with an ordinary internet download speed of 8Mbps

    Demo site for MapD: http://mapd.csail.mit.edu/

    Pro:

    • Server side performance up to 70x
    • Internet stream latency reduced to just the viewport image overlay
    • Client side rendering as a single image overlay is fast

    Con:

    • Source code not released, and there may be proprietary license restrictions
    • Most web servers do not include GPU or GPU clusters – especially cloud instances

    Note: Amazon AWS offers GPU Clusters but not cheap.

    Cluster GPU Quadruple Extra Large 22 GiB memory, 33.5 EC2 Compute Units, 2 x NVIDIA Tesla “Fermi” M2050 GPUs, 1690 GB of local instance storage, 64-bit platform, 10 Gigabit Ethernet( $2.100 per Hour)

    NVidia Tesla M2050 – 448 CUDA Cores per GPU and up to 515 Gigaflops of double-precision peak performance in each GPU!

    Fig 11 - Demo displaying public MapD engine tweet data over Bing Maps

    C. Spatial Hadoophttp://spatialhadoop.cs.umn.edu/
    Spatial Hadoop applies the parallelism of Hadoop clusters to spatial problems using the MapReduce technique made famous by Google. In the Hadoop world a problem space is distributed across multiple CPUs or servers. Spatial Hadoop adds a nice collection of spatial objects and indices. Although Azure Hadoop supports .NET, there doesn’t seem to be a spatial Hadoop in the works for .NET projects. Apparently MapD as open source would leap frog Hadoop clusters at least for performance per dollar.

    D. In Memory database (SQL Server 2014 Hekatron in preview release) – Microsoft plans to enhance the next version of SQL Server with in-memory options. SQL server 2014 in-memory options allows high speed queries for very large data sets when deployed to high memory capacity servers.

    Current SQL Server In-Memory OLTP CTP2

    Creating Tables
    Specifying that the table is a memory-optimized table is done using the MEMORY_OPTIMIZED = ON clause. A memory-optimized table can only have columns of these supported datatypes:

    • Bit
    • All integer types: tinyint, smallint, int, bigint
    • All money types: money, smallmoney
    • All floating types: float, real
    • date/time types: datetime, smalldatetime, datetime2, date, time
    • numeric and decimal types
    • All non-LOB string types: char(n), varchar(n), nchar(n), nvarchar(n), sysname
    • Non-LOB binary types: binary(n), varbinary(n)
    • Uniqueidentifier”

    Since geometry and geography data types are not supported with the next SQL Server 2014 in-memory release, spatial data queries will be limited to point (lat,lon) float/real data columns. It has been previously noted that for point data, float/real columns have equivalent or even better search performance than points in a geography or geometry form. In-memory optimizations would then apply primarily to spatial point sets rather than polygon sets.

    Natively Compiled Stored Procedures The best execution performance is obtained when using natively compiled stored procedures with memory-optimized tables. However, there are limitations on the Transact-SQL language constructs that are allowed inside a natively compiled stored procedure, compared to the rich feature set available with interpreted code. In addition, natively compiled stored procedures can only access memory-optimized tables and cannot reference disk-based tables.”

    SQL Server 2014 natively compiled stored procedures will not include any spatial functions. This means optimizations at this level will also be limited to float/real lat,lon column data sets.

    For fully spatialized in-memory capability we’ll probably have to wait for SQL Server 2015 or 2016.

    Pro:

    • Reduce server side latency for spatial queries
    • Enhances performance of image based server side techniques
      • Dynamic Tile pyramids
      • images (similar to MapD)
      • Heatmap grid clustering
      • Thematic aggregation

    Con:

    • Requires special high memory capacity servers
    • It’s still unclear what performance enhancements can be expected from spatially enabled tables

    D. Hybrids

    The trends point to a hybrid solution in the future which addresses the server side query bottleneck as well as client side navigation rendering bottleneck.

    Server side –
    a. In-Memory spatial DB
    b. Or GPU based parallelized queries

    Client side – GPU enhanced with some version of WebGL type functionality that can makes use of client GPU

    Summary

    Techniques are available today that can accommodate large date resources in Bing Maps. Trends indicate that near future technology can really increase performance and flexibility. Perhaps the sweet spot for Big Data map visualization over the next few years will look like a MapD or a GPU Hadoop engine on the server communicating to a WebGL UI over 1 gbps fiber internet.

    Orwell feared that we would become a captive audience. Huxley feared the truth would be drowned in a sea of irrelevance.

    Amusing Ourselves to Death, Neil Postman

    Of course, in America, we have to have the best of both worlds. Here’s my small contribution to irrelevance:

    Fig 12 - Heatmap animation of Twitter from MapD over Bing Maps (100fps)

    Kind of 3D with D3

    Fig 1 – NASA Near Earth Observation - Cloud Optical Thickness using D3js.org Orthographic projection

    Charts and Graphs

    Growing up in the days before graphing calculators (actually before calculators at all), I’ve had a lingering antipathy toward graph and chart art. Painful evenings plotting interminable polynomials one dot at a time across blue lined graph paper left its inevitable scars. In my experience, maps are a quite different thing altogether. Growing up in Ohio at the very edge of Appalachia, I have quite pleasant memories of perusing road atlases, planning escapes to a dimly grasped world somewhere/anywhere else.

    The “I’m outa here” syndrome of a mid-century youth, landed me in Colorado where I quickly found the compelling connection between USGS Topo maps and pleasant weekends climbing in Colorado’s extensive National Forests. I still fondly recall the USGS map desk at Denver’s Lakewood Federal Center where rows of flat metal cabinet drawers slid out to expose deep piles of 1:24000 scale topography maps with their rich linear contour features.

    I bore you with this personal recollection to set the stage for my amazement at discovering d3.js. My enduring prejudice of charts and graphs had already begun to crack a bit after seeing Hans Rosling’s entertaining Ted lectures from 2006. (D3 version of Wealth and Health of Nations). Perhaps my pentient for the concrete posed a barrier to more abstract demands of statistical modeling. At any rate Hans Rosling’s engaging enthusiasm was able to make a dent in my prejudice, and I found myself wondering how his graphs were coded. Apart from the explicit optimism conveyed by Rosling’s energy, the visual effect of primary colored balloons rising in celebratory fashion is quite hopefully contradictory of the harsh Hobbesian dictum.

    “the life of man, solitary, poor, nasty, brutish, and short.”

    The revelation that my childhood experience of the 50s is recapitulated by the modern economic experience of children in Trinidad and Tobago was something of an eye opener. Although no maps are in sight, the dynamic visualization of normally depressing socio-economic statistics lent visual energy to a changing landscape of global poverty.

    Fig 2 – Hans Rosling’s Wealth & Health of Nations implemented in D3

    The application of technology to the human condition seems so inspiring that it’s worth the reminder that technology has its flaws. Can it really be that the computational power in hand on the streets of Lagos Nigeria now exceeds that of Redmond or Palo Alto? Will ready access to MIT courseware on Probability and Statistics actually mitigate the desperation of Nairobi’s slums? A dash of Realpolitik is in order, but the graphics are none the less beautiful and inspiring.

    Fig 3 – Mobile phones transform Lagos

    D3.Geo

    d3js.org is an open source javascript library for data visualization. In d3, data enters, performs, and exits with a novel select pattern geared to dynamic rendering. The houselights dim, the stage is lit, data enters stage left, the data performance is exquisite, data exits stage right. This is a time oriented framework with data on the move. The dynamism of data is the key to compelling statistics and d3 examples illustrate this time after time.

    With d3.js Economics may remain the Dismal Science, but its charts and graphs can now be a thing of beauty. D3 charts are visually interesting, but more than that, chart smarts are leaking into the spatial geography domain. Mike Bostock of NYT fame and Jason Davies are prolific contributors to d3js.org.

    D3 version 3.0, released just a couple of weeks ago, added a wide range of projective geometry to d3.geo. Paths, Projections, and Streams combine to realize a rich html5 rendering library capable of animation effects rarely seen in GIS or on the web.

    Visually these are not your grandfathers’ charts and maps.

    TopoJSON

    There’s more. Sean Gillies recently remarked on the advent of TopoJSON, apparently a minor side project of mbostock, fully supported by D3.Geo.Paths. In the GIS world ESRI has long held the high ground on topological data (the Arc of GIS), and now some blokes from NYT charts and graphs have squashed it into an easy to use javascript library. Wow!

    TopoJSON is still more or less a compressed transport mechanism between OGC Simple Features in a server DB and client side SVG, but I imagine there will shortly be server side conversions for PostGIS and SQL Server to take advantage of major low latency possibilities. Client side simplification using the Visvalingam’s algorithm are virtually instantaneous, so zoom reduction can occur client side quite nicely.

    I think you get the idea. D3.js is powerful stuff and lots of fun.

    Some Experiments

    It’s the New Year’s holiday with spare time to fire up a favorite data source, NASA Neo, and try out a few things. The orthographic projection is an azimuthal projection with the advantage of reproducing the visual perspective of earth from a distant vantage point.

    Simple as:

    var projection = d3.geo.orthographic()
        .scale(245)
        .clipAngle(90);
    

    Fig 4 – Natural earth world data in d3js.org orthographic projection

    In addition to a graticule and circle edge paths, this takes advantage of TopoJSON formatted natural earth world data published on GitHUB by, you guessed it, mbostock.

            d3.json("topojson/world-110m.json", function (error, world) {
                nasa.svg.append("path")
                    .datum(topojson.object(world, world.objects.land))
                    .attr("class", "boundary")
                    .attr("d", nasa.path);
            });
    

    Add some mouse event handlers to whirl the globe:

    mousedown: function () {
            nasa.p0 = [d3.event.pageX, d3.event.pageY];
            nasa.context.clearRect(0, 0, nasa.width, nasa.height);
            d3.event.preventDefault();
        },
        mousemove: function () {
            if (nasa.p0) {
                nasa.p0 = [d3.event.pageX, d3.event.pageY];
                nasa.projection.rotate([nasa.λ(nasa.p0[0]), nasa.φ(nasa.p0[1])]);
                nasa.svg.selectAll("path").attr("d", nasa.path);
            }
        },
        mouseup: function () {
            if (nasa.p0) {
                nasa.mousemove();
                nasa.context.clearRect(0, 0, nasa.width, nasa.height);
                nasa.getOverlay(nasa.overlay);
                nasa.p0 = null;
            }
        },
    

    Interactive orthographic from mbostock has a nice set of handlers adapted for this experiment. Note there are still a few quirks regarding transposing mousedown event locations to be worked out in my experimental adaptation. (My holiday free time is running out so some things have to make do.)

    With mouse action, d3’s orthographic projection becomes a globe. It responds much more smoothly in Chrome than IE, apparently due to a javascript performance advantage in Chrome.

    Fig 5 – javascript performance testing

    This ortho globe feels 3D all because D3 affords a fast refresh through a projection on the vector continent and graticule paths.

    nasa.svg.selectAll("path").attr("d", nasa.path); 

    Vectors are fast, but for deeper information content I turn to NASA’s Near Earth Observation data, available as imagery from this public WMS service. The beauty of this imagery is still compelling after all these years.

    Fig 6 – NASA Land night temperature overlay

    However, geodetic imagery needs to be transformed to orthographic as well. D3 has all the necessary plumbing. All I added was the NASA WMS proxy with a .net WCF service.

    getOverlay: function (overlay) {
            if (overlay != null && overlay.length>0) {
    
                nasa.Showloading();
                nasa.image = new Image;
                nasa.image.onload = nasa.onload;
                nasa.image.src = Constants.ServiceUrl + "/GetMap?url=http://neowms.sci.gsfc.nasa.gov/wms/wms?Service=WMS%26version=1.1.1%26Request=GetMap%26Layers=" + overlay + "%26SRS=EPSG:4326%26BBOX=-180.0,-90,180,90%26width="+nasa.width+"%26height="+nasa.height+"%26format=image/jpeg%26Exceptions=text/xml";
            }
        },
    
        onload: function () {
    
            var dx = nasa.image.width,
                dy = nasa.image.height;
    
            nasa.context.drawImage(nasa.image, 0, 0, dx, dy);
    
            var sourceData = nasa.context.getImageData(0, 0, dx, dy).data,
                target = nasa.context.createImageData(nasa.width, nasa.height),
                targetData = target.data;
    
            for (var y = 0, i = -1; y < nasa.height; ++y) {
                for (var x = 0; x < nasa.width; ++x) {
                    var p = nasa.projection.invert([x, y]), λ = p[0], φ = p[1];
                    if (λ > 180 || λ < -180 || φ > 90 || φ < -90) { i += 4; continue; }
                    var q = ((90 - φ) / 180 * dy | 0) * dx + ((180 + λ) / 360 * dx | 0) << 2;
                    targetData[++i] = sourceData[q];
                    targetData[++i] = sourceData[++q];
                    targetData[++i] = sourceData[++q];
                    targetData[++i] = 255;
                }
            }
            nasa.context.clearRect(0, 0, nasa.width, nasa.height);
            nasa.context.putImageData(target, 0, 0);
            nasa.Hideloading();
        },
    

    Looping through 1,182,720 pixels in javascript is not the fastest, but just to be able to do it at all with only a dozen lines of javascript is very satisfying. There may be some server side options to improve performance and PostGIS Raster is worthy of investigation. However, html5 browsers with enhanced GPU access should eventually supply higher performance raster transforms.

    Fig 7 – NASA Land night temperatures transformed with D3 Orthographic projection.invert

    Selection JsTree

    For this experiment I also made use of JsTree for the layer selections out of the NASA WMS GetCapabilities. Layer choices are extensive and even with a tree expander approach the options overflow the available div space of IE’s jQuery UI draggable accordion panel. Scroll bars work poorly with draggable divs. A future enhancement could be manually allocated multiple expanders following NASA’s Ocean, Atmosphere, Energy, Land, and Life categories. Unfortunately this would invalidate the nice GetCapabilities layer extraction feature of the proxy service. NASA’s WMS also provides LegendURL elements for better understanding of the color ranges, which should be added to the selection panel.

    MouseWheel
    Since d3.geo offers projection.scale(), mousewheel events are a nice to have option that is easily bound to a browser window with jquery-mousewheel plugin.

    $(window).mousewheel(function (event, delta, deltaX, deltaY) {
                var s = nasa.projection.scale();
                if (delta > 0) {
                    nasa.projection.scale(s * 1.1);
                }
                else {
                    nasa.projection.scale(s * 0.9);
                }
                nasa.svg.selectAll("path").attr("d", nasa.path);
                nasa.context.clearRect(0, 0, nasa.width, nasa.height);
                nasa.getOverlay(nasa.overlay);
            });
    

    Tilted Perspective
    Even though NEO data resolution doesn’t really warrant it, using tilted perspective d3.geo.satellite projection affords a space station view point. Incorporating a steerable camera is a more complex project than time allows.

    var projection = d3.geo.satellite()
        .distance(1.1)
        .scale(5500)
        .rotate([76.00, -34.50, 32.12])
        .center([-2, 5])
        .tilt(25)
        .clipAngle(25);
    

    Fig 8 – Tilted perspective d3.geo.satellite projection of NASA BlueMarbleNG-TB

    Possible Extensions:
    Time is the next target, since NASA Neo exposes Time extension features.
    <Extent default=”2012-12-11″ name=”time”>

    Example:
    http://neowms.sci.gsfc.nasa.gov/wms/wms?Service=WMS&version=1.1.1&Request=GetMap&Layers=MOD_LSTAD_D&time=2000-03-06/2000-04-26/P1D&SRS=EPSG:4326&BBOX=-180.0,-85,180,85&width=1000&height=500&format=image/jpeg&Exceptions=text/xml

    One approach would be a time series download as an array of image objects from the proxy service, with a step through display bound to a slider on the UI. Once downloaded and projected the image stack could be serially displayed as an animation sequence. NASA WMS Time domain is somewhat abbreviated with only about 12-14 steps available. For a longer animation affect some type of caching worker role might store images as a continuous set of PostGIS raster data types with an ongoing monthly update. PostGIS has the additional advantage of pre-projected imagery fetching for more efficient time lapse performance.

    Warning:
    IIS requires adding a .json application/json mime type to make use of TopoJSON formatted data.

    Summary:

    Mike Bostock and Jason Davies have done us all a great favor making the power of d3 freely available on GitHUB under BSD License. For sheer prodigality http://bl.ocks.org/mbostock and http://bl.ocks.org/jasondavies examples can hardly be matched. Anyone need a Guyou Hemisphere projection or perhaps a Peirce Quincuncial tessellation? d3.geo has it ready to serve up in web friendly fashion.

    James Fee sums up d3 quite succinctly in his recent post “Why D3 Will Change How We Publish Maps”. I agree. We’re all joining the blokes over in charts and graphs.

    links:
    http://www.jasondavies.com/maps
    Interrupted Bonne
    World Tour
    Rotating EquiRect