Territorial Turf

Fig 1 – Territories from Census Zip Code Tabulation Areas (ZCTAs) on Bing Maps

An interesting GIS development over the years has been the evolution from monolithic applications to multiple open source plug and play tools. Considering GIS as a three tier system with back end storage, a controller in the middle, and a UI display out front, more and more of the middle tier is migrating to either end.

SQL DBs, such as SQL Server, Oracle Spatial, and especially PostGIS, now implement a multitude of GIS functions originally considered middle tier domain. On the other end, the good folks at turf.js and proj4js continue to push atomic GIS functions out to javascript, where they can fit into the client side UI. The middle tier is getting thinner and thinner as time goes on. Generally the middle is what costs money, so rolling your own middle with less work is a good thing. As a desirable side effect, instead of kitchen sink GIS, very specific user tools can be cobbled together as needed.

Looking for an excuse to experiment with turf.js, I decided to create a Territory Builder utilizing turf.js on the client and some of the US Census Bureau Services on the backend.

The US Census Bureau does expose some “useful” services at TigerWeb. I tend to agree with Brian Timoney that .gov should stick to generating data exposed in useful ways. Apps and presentation are fast evolving targets, and historically .gov can’t really hope to keep up. Although you can use the custom TigerWeb applications for some needs, there are many other occasions when you would like to build something less generic – for example, a Territory Builder over Bing Maps.

Here is the simplified POC work flow:

1. Use Census WMS GetMap to show polygons on a canvas over Bing Maps.
2. Use Census WMS GetFeatureInfo to return a GeoID of selected polygon(s).
3. Use Census REST service and GeoID to return selected polygon vertices.
4. Use Turf.js merge function to merge selected polygons into a single Territory polygon.
5. Display territory polygon using Bing Maps Ajax v7 Microsoft.Maps.AdvancedShapes polygon.

Fig 2 – Territory polygon with a demographic overlay on top of Bing Maps

Note: Because there doesn’t appear to be an efficient way to join demographic data to geography with current TigerWeb service APIs, the demographic tab of this app uses a custom WMS PostGIS backend, hooking SF1 to TIGER polygons.

1. Census WMS GetMap over Bing Maps

WMS GetMap requests simply return an image. In order to overlay the image on a Bing Map, this Territory app uses an html5 canvas and its 2d context: app.context.drawImage(imageObj, 0, 0). The trick is to stack the canvas in between the Bing Map and the Bing Map navigation, scale, and selector tools. TigerWeb WMS conveniently exposes epsg:3857, which aligns correctly with Bing Map tiles.

    function showCensus() {
        if (app.canvas == null) {
            app.canvas = document.createElement('canvas');
            app.canvas.id = 'censuscanvas';
            app.canvas.style.position = 'relative';
            var mapDiv = app.map.getRootElement();
            //position this canvas under Bing navigation and over map tiles
            mapDiv.insertBefore(app.canvas, mapDiv.childNodes[3]);
            app.context = app.canvas.getContext("2d");
        }
        //match canvas size to map size
        app.canvas.height = app.map.getHeight();
        app.canvas.width = app.map.getWidth();

        var imageObj = new Image();
        //image onload function draws image on context
        imageObj.onload = function () {
            app.canvas.style.opacity = app.alpha / 100;
            app.context.drawImage(imageObj, 0, 0);
        };
        //prepare a TigerWeb WMS GetMap request
        //use proj4.js to transform ll to mercator bbox
        var b = app.map.getBounds();
        var epsg3857 = "+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext  +no_defs";
        var sw = proj4(epsg3857).forward([b.getWest(), b.getSouth()]);
        var ne = proj4(epsg3857).forward([b.getEast(), b.getNorth()]);
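        //note: bbox is intentionally left global here - GetCensusFeature below reuses it for the GetFeatureInfo request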
        bbox = sw[0] + "," + sw[1] + "," + ne[0] + "," + ne[1];

        var url = "";
        if (tab == "Territory") {
            app.alpha = 100;
            // GetMap request
            url = "http://tigerweb.geo.census.gov/arcgis/services/TIGERweb/tigerWMS_Current/MapServer/WmsServer?REQUEST=GetMap&SERVICE=WMS&VERSION=1.3.0&LAYERS=" + app.censusPolygonType + "&STYLES=&FORMAT=image/png&BGCOLOR=0xFFFFFF&TRANSPARENT=TRUE&CRS=EPSG:3857&BBOX=" + bbox + "&WIDTH=" + app.canvas.width + "&HEIGHT=" + app.canvas.height;
        }
        imageObj.src = url;
    }
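
Presumably showCensus is also re-run whenever the map view changes, so the overlay stays registered with the underlying tiles. A minimal sketch of that wiring with the v7 event api (assumed here; the actual hookup isn’t shown in this post):

    //redraw the census overlay whenever the Bing map view changes
    Microsoft.Maps.Events.addHandler(app.map, 'viewchangeend', showCensus);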

2. Census WMS GetFeatureInfo for obtaining GeoID

Unfortunately the WMS GetMap request has no IDs available for polygons, even if the requested format is image/svg+xml. SVG is an xml format and could easily contain associated GeoID values for joining with other data resources, but this is contrary to the spirit of OGC WMS service specs. Naturally we must obey OGC Law, which is too bad. Adding a GeoID attribute would allow options such as choropleth fills directly on existing svg paths. For example adding id="80132" would allow fill colors by zipcode with a bit of javascript.

http://tigerweb.geo.census.gov/arcgis/services/TIGERweb/tigerWMS_Current/MapServer/WmsServer?REQUEST=GetMap&SERVICE=WMS&VERSION=1.3.0 &LAYERS=2010 Census ZIP Code Tabulation Areas&STYLES=&FORMAT=image/svg+xml&BGCOLOR=0xFFFFFF&TRANSPARENT=TRUE&CRS=EPSG:3857 &BBOX=-11679625.942909468,4709198.547476525,-11645573.246808422,4737900.651597611&WIDTH=891&HEIGHT=751

<g transform="matrix(1.3333333 0 0 -1.3333333 445.5 375.5)">
<path id="80132" d="M300.50809 -258.78592 L302.54407 -257.91829 L303.17977 -257.55608 L304.45554 -256.98889
                                .
				.
L280.30116 -268.55133 L280.7184 -268.34637 L298.33448 -259.95117 L300.50809 -258.78592 "
  stroke="#99454A" fill="none" stroke-opacity="1" stroke-width="1.5"
 stroke-linecap="round" stroke-linejoin="round" stroke-miterlimit="10" />
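
For illustration only, if such ids were present on an inline svg, a few lines of javascript could theme the paths client side (the zipColors lookup below is a hypothetical classification result, not output from any census service):

    //hypothetical choropleth fill over svg paths keyed by GeoID
    var zipColors = { "80132": "#fee5d9", "80921": "#a50f15" };
    Object.keys(zipColors).forEach(function (zip) {
        var path = document.getElementById(zip);
        if (path) path.setAttribute('fill', zipColors[zip]);
    });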

Instead, TigerWeb WMS exposes GetFeatureInfo requests. Using a click event we calculate a pixel location within the map viewport and send a GetFeatureInfo request to find a GeoID for the enclosing polygon.

Request:
http://tigerweb.geo.census.gov/arcgis/services/TIGERweb/tigerWMS_Current/MapServer/WmsServer?REQUEST=GetFeatureInfo&SERVICE=WMS&VERSION=1.3.0&CRS=EPSG:4326 &BBOX=13.6972399985862,-128.010213138288,52.2800368298775,-53.444917258507&WIDTH=1061&HEIGHT=549 &INFO_FORMAT=text/xml&QUERY_LAYERS=2010 Census ZIP Code Tabulation Areas&X=388&Y=190

Response:
<?xml version="1.0" encoding="UTF-8"?>
<FeatureInfoResponse xmlns="http://www.esri.com/wms" xmlns:esri_wms="http://www.esri.com/wms">
<FIELDS INTPTLON="-101.2099848" INTPTLAT="+38.9263824" CENTLON="-101.2064010" CENTLAT="+38.9226272" STGEOMETRY="Polygon" OBJECTID="16553" AREAWATER="163771" AREALAND="1327709219" FUNCSTAT="S" ZCTA5CC="B5" MTFCC="G6350" NAME="ZCTA5 67764" LSADC="Z5" BASENAME="67764" GEOID="67764" ZCTA5="67764" OID="221404258331221"/>
</FeatureInfoResponse>

Since cross-origin restrictions come into play, it’s necessary to add a bit of server side proxy code to actually return the xml.
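
The app.postit helper that appears below is just a thin JSON POST wrapper around jQuery ajax; a minimal sketch of what it might look like (an assumption - the actual helper isn’t shown in this post):

    //assumed implementation of the app.postit JSON POST helper
    app.postit = function (url, options) {
        $.ajax({
            type: 'POST',
            url: url,
            contentType: 'application/json',
            data: options.data,
            success: options.success,
            error: options.error
        });
    };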

Some MVC controller magic:

 function GetCensusFeature(e) {
    var x = e.pageX - $("#map").offset().left;
    var y = e.pageY - $("#map").offset().top;
    var FeatureUrl = {
        url: "http://tigerweb.geo.census.gov/arcgis/services/TIGERweb/tigerWMS_Current/MapServer/WmsServer?REQUEST=GetFeatureInfo&SERVICE=WMS&VERSION=1.3.0&LAYERS=" + app.censusPolygonType + "&STYLES=&FORMAT=image/png&BGCOLOR=0xFFFFFF&TRANSPARENT=TRUE&CRS=EPSG:3857&BBOX=" + bbox + "&WIDTH=" + app.canvas.width + "&HEIGHT=" + app.canvas.height + "&INFO_FORMAT=text/xml&QUERY_LAYERS=" + app.censusPolygonType + "&X=" + x + "&Y=" + y
    }
        app.postit('/api/Operation/GetFeatureInfo', {
            data: JSON.stringify(FeatureUrl),
            success: function (response) {
                if (response.length > 0) {
                    var parser = new DOMParser();
                    var xmlDoc = parser.parseFromString(response, "text/xml");
				          .
				          .

Eventually this arrives at a C# controller, which proxies the GetFeatureInfo request:

        /// <summary>
        /// GetFeatureInfo
        /// </summary>
        /// <param name="feature"></param>
        /// <returns>xml doc</returns>
        [HttpPost]
        public async Task<IHttpActionResult> GetFeatureInfo(Models.FeatureUrl feature)
        {
            var wc = new HttpClient();
            var getFeatureInfoRequest = new Uri(feature.url);
            var response = await wc.GetAsync(getFeatureInfoRequest);
            response.EnsureSuccessStatusCode();
            var content = await response.Content.ReadAsStreamAsync();
            var result = "";
            using (var reader = new StreamReader(content))
            {
                while (!reader.EndOfStream)
                {
                    result += reader.ReadLine();
                }
            }
            return Ok(result);
        }

Finally, on success, javascript can parse the xml for GeoID and add a canvas context box with some text information:

     success: function (response) {
              if (response.length > 0) {
                    var parser = new DOMParser();
                    var xmlDoc = parser.parseFromString(response, "text/xml");
                    if (tab == "Territory" && xmlDoc.getElementsByTagName("FeatureInfoResponse")[0].hasChildNodes()) {
                        app.context.beginPath();
                        app.context.rect(x, y, 125, 25);
                        app.context.fillStyle = 'white';
                        app.context.fill();
                        app.context.lineWidth = 1;
                        app.context.strokeStyle = 'black';
                        app.context.stroke();
                        app.context.fillStyle = 'black';
                        app.context.font = '11pt Calibri';
                        var fields = xmlDoc.getElementsByTagName("FIELDS")[0];
                        app.context.fillText(fields.getAttribute('NAME'), x + 5, y + 15);
                        var geoid = fields.getAttribute('GEOID');
					            .
					            .

Fig 3 – Building territories from Census Block Groups on Bing Maps

3. Census REST to obtain vertices

The GEOID retrieved from our proxied GetFeatureInfo request allows us to grab vertices with another TigerWeb service, Census REST.

This spec is a little more proprietary and requires some detective work to unravel.
FeatureUrl.url = "http://tigerweb.geo.census.gov/arcgis/rest/services/TIGERweb/PUMA_TAD_TAZ_UGA_ZCTA/MapServer/1/query?where=GEOID%3D" + geoid + "&geometryPrecision=6&outSR=4326&f=pjson";

There are different endpoint urls for the various polygon types. In this case Zip Codes are found in PUMA_TAD_TAZ_UGA_ZCTA. We don’t need more than 1 meter resolution so precision 6 is good enough and we would like the results in epsg:4326 to avoid a proj4 transform on the client.

This follows the same process as previously, sending the REST request through a proxy controller. You can see the JSON geometry result (Esri pjson, with rings arrays rather than GeoJSON coordinates) with this sample request:
http://tigerweb.geo.census.gov/arcgis/rest/services/TIGERweb/PUMA_TAD_TAZ_UGA_ZCTA/MapServer/1/query?where=GEOID%3D80004&geometryPrecision=6&outSR=4326&f=pjson

Census REST doesn’t appear to offer a simplify parameter so the coordinates returned are at the highest resolution. Highly detailed polygons can easily return several thousand vertices, which is a problem for performance, but the trade-off is eliminating hosting data ourselves.
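
If the vertex counts become a real problem, one client side mitigation might be thinning for display only - assuming the turf build in use includes the simplify wrapper:

    //optional client side thinning for display (tolerance in degrees, no high quality mode)
    //merge at full resolution first to avoid slivers, then simplify the result
    var displayPoly = turf.simplify(app.territory, 0.0005, false);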

4. turf.js merge function to Build Territory

Finally the interesting part, where we get to use turf.js to handle merging multiple polygons into a single territory polygon.

var FeatureUrl = {
    url: "http://tigerweb.geo.census.gov/arcgis/rest/services/TIGERweb/PUMA_TAD_TAZ_UGA_ZCTA/MapServer/1/query?where=GEOID%3D" + geoid + "&geometryPrecision=6&outSR=4326&f=pjson"
};
app.postit('/api/Operation/GetFeatureGeo', {
    data: JSON.stringify(FeatureUrl),
    success: function (response) {
        var polygon = JSON.parse(response);
        if (polygon.features.length == 0 || polygon.features[0].geometry.rings.length == 0) {
            $('#loading').addClass('hide');
            alert("No Geo Features returned.")
            return;
        }
        if (app.territoryFeatures == null) {
            app.territoryFeatures = new Array();
            app.territory = turf.polygon(polygon.features[0].geometry.rings);
            app.territoryFeatures.push(app.territory);
        }
        else {
            var tpolygon = turf.polygon(polygon.features[0].geometry.rings);
            app.territoryFeatures.push(tpolygon);
            var fc = turf.featurecollection(app.territoryFeatures);
            try {
                app.territory = turf.merge(fc);
            }
            catch (err) {
                $('#loading').addClass('hide');
                alert("turf.js error: " + err.message);
                return;
            }
        }
        if (app.territory.geometry.coordinates.length > 0) {
            displayTerritory();
            $('#totalDivID').removeClass('hide');
        }

        $('#loading').addClass('hide');
    },
    error: function (response) {
        alert("TigerWeb error:" + response);
    }
});

5. Display Territory Bing Maps Ajax polygon

The final rendering just uses Bing Maps Ajax v7 Microsoft.Maps.AdvancedShapes to add the new territory to the map.
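
One wiring detail worth noting: the AdvancedShapes module has to be loaded before multi-ring polygons will render, using standard v7 module loading (a sketch - the app presumably does this once at startup):

    //load AdvancedShapes once, then render territories in the callback
    Microsoft.Maps.loadModule('Microsoft.Maps.AdvancedShapes', {
        callback: function () { displayTerritory(); }
    });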

function displayTerritory() {
    app.territoryLayer.clear();
    var coordLen = app.territory.geometry.coordinates.length;
    if (app.territory.geometry.type == 'Polygon') {
        var rings = new Array();
        for (var i = 0; i < coordLen; i++) {
            var vertices = new Array();
            for (var j = 0; j < app.territory.geometry.coordinates[i].length; j++) {
                vertices.push(new Microsoft.Maps.Location(app.territory.geometry.coordinates[i][j][1], app.territory.geometry.coordinates[i][j][0]));
            }
            rings.push(vertices);
        }
        var polygon = new Microsoft.Maps.Polygon(rings, { fillColor: new Microsoft.Maps.Color(100, 100, 0, 100) });
        app.territoryLayer.push(polygon);
    }
    else if (app.territory.geometry.type == 'MultiPolygon') {
        var multi = new Array();
        for (var i = 0; i < coordLen; i++) {
            var ringslen = app.territory.geometry.coordinates[i].length;
            for (var j = 0; j < ringslen; j++) {
                var vertices = new Array();
                for (var k = 0; k < app.territory.geometry.coordinates[i][j].length; k++) {
                    vertices.push(new Microsoft.Maps.Location(app.territory.geometry.coordinates[i][j][k][1], app.territory.geometry.coordinates[i][j][k][0]));
                }
                multi.push(vertices)
            }
        }
        var polygon = new Microsoft.Maps.Polygon(multi, { fillColor: new Microsoft.Maps.Color(100, 100, 0, 100) });
        app.territoryLayer.push(polygon);
    }
    else {
        $('#loading').addClass('hide');
        alert("geometry type is " + app.territory.geometry.type + ". Territory requires a Polygon or MultiPolygon.")
        return;
    }
}

Fig 4 – Territory Total for P0040003 Latino Origin – Map: Quantile population classifier Census Block Groups on Bing Map

Summary

TigerWeb offers some useful data access. With TigerWeb WMS and REST api, developers can customize apps without hosting and maintaining a backend SQL store. However, there are some drawbacks.

Some potential improvements:
1. Adding an svg id=GeoID would really improve the usefulness of TigerWeb WMS image/svg+xml, possibly eliminating steps 2 and 3 of the workflow.

2. Technically it’s possible to use the TigerWeb REST api to query geometry by area, but practically speaking the results are too detailed for useful performance. A helpful option for TigerWeb REST would be a parameter to request simplified polygons and avoid lengthy vertex transfers.

turf.js is a great tool box; however, the merge function occasionally had trouble with complex polygons from TigerWeb.

Fig 5 – turf merge had some rare difficulties with complex polygons from TigerWeb

In the end, territories are useful for query aggregates of SF1 demographic data.

Fig 6 – Territory Total for P0120025 Male 85 Years and older – Map: Jenks population classifier Census Block Group on Bing Map

2020 The Last Census?

Fig 1 - SF1QP Quantile Population County P0010001 P1.TOTAL POPULATION Universe: Total population

Preparation for US 2020 Census is underway at this mid-decennial point and we’ll see activity ramping up over the next few years. Will 2020 be the last meaningful decennial demographic data dump? US Census has been a data resource since 1790. It took a couple centuries for Census data to migrate into the digital age, but by Census 2000, data started trickling into the internet community. At first this was simply a primitive ftp data dump, ftp2.census.gov, still very useful for developers, and finally after 2011 exposed as OGC WMS, TigerWeb UI, and ESRI REST.

However, static data in general, and decennial static data in particular, is fast becoming anachronistic in the modern era. Surely the NSA data tree looks something like phone number JOIN Facebook account JOIN Twitter account JOIN social security id JOIN bank records JOIN IRS records JOIN medical records JOIN DNA sequence….. Why should this data access be limited to a few black budget bureaus? Once the data tree is altered a bit to include mobile devices, static demographics are a thing of the past. Queries in 2030 may well ask “how many 34 year old male Hispanic heads of households with greater than 3 dependents with a genetic predisposition to diabetes are in downtown Denver Wed at 10:38AM, at 10:00PM?” For that matter let’s run the location animation at 10 minute intervals for Tuesday and then compare with Sat.

“Privacy? We don’t need no stinking privacy!”

I suppose Men in Black may find location aware DNA queries useful for weeding out hostile alien grays, but shouldn’t local cancer support groups also be able to ping potential members as they wander by Starbucks? Why not allow soda vending machines to check for your diabetic potential and credit before offering appropriate selections? BTW How’s that veggie smoothie?

Back to Old School

Fig 2 - SF1QD Quantile Density Census Block Group P0050008 P5.HISPANIC OR LATINO ORIGIN BY RACE Universe: Total population

By late 2011 census OGC services began to appear along with some front end data web UIs, and ESRI REST interfaces. [The ESRI connection is a tightly coupled symbiotic relationship as the Census Bureau, like many government bureaucracies, relies on ESRI products for both publishing and consuming data. From the outside ESRI could pass as an agency of the federal government. For better or worse “Arc this and that” are deeply rooted in the .gov GIS community.]

For mapping purposes there are two pillars of Census data, spatial and demographic. The spatial data largely resides as TIGER data while the demographic data is scattered across a large range of products and data formats. In the basement, a primary demographic resource is the SF1, Summary File 1, population data.

“Summary File 1 (SF 1) contains the data compiled from the questions asked of all people and about every housing unit. Population items include sex, age, race, Hispanic or Latino origin, household relationship, household type, household size, family type, family size, and group quarters. Housing items include occupancy status, vacancy status, and tenure (whether a housing unit is owner-occupied or renter-occupied).”

The intersection of SF1 and TIGER is the base level concern of census demographic mapping. There are a variety of rendering options, but the venerable color themed choropleth map is still the most widely recognized. This consists of assigning a value class to a color range and rendering polygons with their associated value color. This then is the root visualization of Census demographics, TIGER polygons colored by SF1 population classification ranges.

Unfortunately, access to this basic visualization is not part of the 2010 TigerWeb UI.

There are likely a few reasons for this, even aside from the glacially slow adoption of technology at the Bureau of the Census. A couple of obvious reasons are the sheer size of this data resource and the range of the statistics gathered. A PostGIS database with a 5 level primary spatial hierarchy, all 48 SF1 population value files, appropriate indices, plus a few helpful functions consumes a reasonable 302.445 GB of generic Amazon EC2 SSD elastic block storage. But, contained in those 48 SF1 tables are 8912 demographic values which you are welcome to peruse here. A problem for any UI is how to make 8912 values plus 5 spatial levels usable.

Fig 3 – 47 SF1 tables plus sf1geo geography join file

Filling a gap

Since the Census Bureau budget did not include public visualization of TIGER/Demographics what does it take to fill in the gap? Census 2010 contains a large number of geographic polygons. The core hierarchy for useful demographic visualization is state, county, tract, block group, and block.

Fig 4 – Census polygon hierarchy

Loading the data into PostGIS affords low cost access to data for SF1 Polygon value queries such as this:

-- block tabblock
SELECT poly.GEOM, geo.stusab, geo.sumlev, geo.geocomp, geo.state, geo.county, geo.tract, geo.blkgrp, geo.block, poly.geoid10, sf1.P0010001, geo.stlogrecno
FROM tabblock poly
JOIN sf1geo geo ON geo.geoid10 = poly.geoid10
JOIN sf1_00001 sf1 ON geo.stlogrecno = sf1.stlogrecno
WHERE geo.geocomp='00' AND geo.sumlev = '101' AND ST_Intersects(poly.GEOM, ST_GeometryFromText('POLYGON ((-104.878035974004 38.9515291859429,-104.721023973742 38.9515291859429,-104.721023973742 39.063158980149,-104.878035974004 39.063158980149,-104.878035974004 38.9515291859429))', 4269))
ORDER BY geo.state, geo.county, geo.tract, geo.blkgrp, geo.block

Returning 1571 polygons in 1466 ms. Not too bad, but surely there’s room for improvement. Where is Paul Ramsey when you need him?

Fig 5 - PostgreSQL PostGIS Explain Query

Really Old School – WMS

Some considerations on the data:

A. Queries become unwieldy for larger extents with large numbers of polygons

Polygon counts:
    county          3,233
    tract          74,133
    blockgroup    220,740
    tabblock   11,166,336

These polygon counts rule out visualizations of the entire USA, or even moderate regions, at tract+ levels of the hierarchy. Vector mapping is not optimal here.

B. The number of possible image tile pyramids for 8912 values over 5 polygon levels is 5 * 8912 = 44,560. This rules out tile pyramids of any substantial depth without some deep Google-like pockets for storage. Tile pyramids are not optimal either.

C. Even though vector grid pyramids would help with these 44,560 demographic variations, they suffer from the same restrictions as A. above.

One possible compromise of performance/visualization is to use an old fashioned OGC WMS GetMap request scheme that treats polygon types as layer parameters and demographic types as style parameters. With appropriate use of WMS <MinScaleDenominator> <MaxScaleDenominator> the layers are only rendered at sufficient zoom to reasonably limit the number of polygons. Using this scheme puts rendering computation right next to the DB on the same EC2 instance, while network latency is reduced to simple jpeg/png image download. Scaling access to public consumption is still problematic, but for in-house it does work.
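
A GetMap request against such a service might look something like this (host, layer, and style names are purely illustrative, not an actual endpoint):

http://example.com/wms?REQUEST=GetMap&SERVICE=WMS&VERSION=1.3.0&LAYERS=blockgroup&STYLES=SF1QP_P0010001&CRS=EPSG:3857&BBOX=-11679625.9,4709198.5,-11645573.2,4737900.7&WIDTH=891&HEIGHT=751&FORMAT=image/png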

Fig 6 – Scale dependent layer rendering for SF1JP - SF1 Jenks P0010001 (not density)

Fig 7 - a few of 8912 demographic style choices

There are still issues with a scale rendering approach. Since population is not very homogeneous over the US coverage extent, scale dependent rendering really wants to be variable as well. This is easily visible over population centers. Without some type of pre-calculated density grid, the query is already completed prior to knowledge of the ideal scale dependency. Consequently, static rendering scales have to be tuned to high population urban regions. Since “fly over” US is generally less interesting to analysts, we can likely live with this compromise.

Fig 8 - SF1QD SF1 Quantile Density Census Tract P0010001/geographic Area



Classification schemes

Dividing a value curve to display adequately over a viewport range can be accomplished in a few different ways: equal intervals, equal quantiles, Jenks natural breaks optimization, K-means clustering, or “other.” Leaning toward the simpler, I chose a default quantile (guarantees some color) with a ten class single hue progression, which of course is not recommended by ColorBrewer. However, 10 seems an appropriate number for decennial data. I also included a Jenks classifier option, which is considered a better representation. The classifier is based only on visible polygons rather than the entire polygon population. This means comparisons region to region are deceptive, but after all this is visualization of statistics.
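
The quantile cut itself is simple: sort the values of the polygons in the viewport and slice at equal counts. A minimal sketch, in javascript for illustration (the actual classing here happens server side during rendering):

    //ten class quantile breaks over the values of the visible polygons
    function quantileBreaks(values, classes) {
        var sorted = values.slice().sort(function (a, b) { return a - b; });
        var breaks = [];
        for (var i = 1; i <= classes; i++) {
            var idx = Math.min(sorted.length - 1, Math.ceil(i * sorted.length / classes) - 1);
            breaks.push(sorted[idx]);
        }
        return breaks; //a polygon's class = index of the first break >= its value
    }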

“There are three kinds of lies: lies, damned lies, and statistics.” Mark Twain

Fig 9 – SF1JP SF1 Jenks Census Tract P0010001 (not density)

In order to manage Census data on a personal budget these compromises are involved:

1. Only expose SF1 demographic data for 2010 i.e. 8912 population value types
2. Only provide primary level polygon hierarchy – state, county, tract, blockgroup, block
3. Code a custom OGC WMS service – rendering GetMap image on the server
4. Resolution scale rendering to limit polygon counts down the polygon hierarchy
5. Provide only quantile and Jenks classifier options
6. Classifier applied only to viewport polygon selection

This is a workable map service for a small number of users. Exposing as an OGC WMS service offers some advantages. First, there are already a ton of WMS clients available to actually see the results. Second, the query, geometry parsing, and image computation (including any required re-projection) are all server side on the same instance, reducing network traffic. Unfortunately the downside is that the computation cost is significant and discouraging for a public facing service.

Scaling could be accomplished in a few ways:

1. Vertical scaling to a high memory EC2 R3 instance(s) and a memory tuned PostGIS
2. Horizontal auto scaling to multiple instances with a load balancer
3. Storage scaling with pre-populated S3 tile pyramids for upper extents

Because this data happens to be read only for ten years, scaling is not too hard, as long as there is a budget. It would also be interesting to try some reconfiguration of data into NoSQL type key/value documents with perhaps each polygon document containing the 8912 values embedded along with the geometry. This would cost a bit in storage size but could decrease query times. NoSQL also offers some advantages for horizontal scaling.
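
For what it’s worth, such a key/value document might look like this (field names and values are illustrative only, not a schema from the actual database):

    //illustrative NoSQL document: one polygon carrying its SF1 values embedded
    var polygonDoc = {
        _id: "080310019001",          //GEOID as the document key
        sumlev: "150",                //block group summary level
        geometry: { type: "Polygon", coordinates: [ /* rings elided */ ] },
        sf1: { P0010001: 1482, P0040003: 312 /* ...one entry per SF1 value... */ }
    };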

Summary

The Census Bureau and its census are obviously not going away. The census is a bureaucracy with a curious inertial life stretching back to the founding of our country (United States Constitution Article 1, section 2). Although static aggregate data is not going to disappear, dynamic real time data has already arrived on stage in numerous and sundry ways from big data portals like Google, to marketing juggernauts like Coca Cola and the Democratic Party, to even more sinister black budget control regimes like the NSA.

Census data won’t disappear. It will simply be superseded.

The real issue for 2020 and beyond is, how to actually intelligently use the data. Already data overwhelms analytic capabilities. By 2030, will emerging AI manage floods of real time data replacing human analysts? If Wall Street still exists, will HFT algos lock in dynamic data pipelines at unheard of scale with no human intervention? Even with the help of tools like R Project perhaps the human end of data analysis will pass into anachronism along with the decennial Census.

Fig 10 - SF1JP SF1 Jenks Census Blocks P0010001

FOSS4G 2011

I was privileged to spend a few days at FOSS4G in Denver. It’s quite the opportunity to have an international convergence of GIS developers handily just a bus ride up the highway. I could walk among the current luminaries of OSGeo, Arnulf Christl, Paul Ramsey, Frank Warmerdam, Simone Giannecchini, and so on and on, and trade a handshake with the hard working front range folks, Peter Batty, Brian Timoney (way to go Brian!), and Matt Kruzemark too. The ideas are percolating in every corner, but it strikes me in an odd way that this world has changed irrevocably. The very technology that makes Geo FOSS possible is also making it, of all things, mundane.

Open Standards

It wasn’t that long ago that OGC standards were brand new. Each release anticipated a new implementation for discovery. Now implementations proliferate helter skelter. OGC services? Take your pick. Or, slice, dice, and merge with MapProxy. Map Tiling engines line up like yogurt brands at Whole Foods. GeoFOSS has progressed a long way from naive amazement over OGC WMS connections. WFS and WCS have already passed their respective moments of glorious novelty.

This is the year of WPS, Web Processing Service, and the hope of constructing webs of analysis chains from disparate nodes. Each node adds a piece toward a final solution. Anyone with some experience using FME Transformers has some idea of what this looks like on the desktop. WPS moves analytic workflows into open standards and onto the web. Ref: 52north.org and GeoServer Ext

Fig 2 – example of FME Workbench processing chain

On the floor at FOSS4G 2011

In a different vein, the next release of PostGIS will push raster analysis right into SQL. This is new territory for PostGIS and elevates a raster datatype to the same level as geometry. This PostGIS Raster Functions page gives some flavor for the raster functions soon to be available when 2.0 releases. Ever needed a raster reprojection?

raster ST_Transform(raster rast, integer srid, double precision scalex, double precision scaley, text algorithm=NearestNeighbor, double precision maxerr=0.125);

Algorithm options are: ‘NearestNeighbor’, ‘Bilinear’, ‘Cubic’, ‘CubicSpline’, and ‘Lanczos.’

Wow ST_MapAlgebra! And on the roadmap MapAlgebra raster on raster! MySQL and SQL Server have quite the feature curve to follow as PostGIS forges ahead.

(See even the slightly jaded can catch some excitement at FOSS4G.)

Numerous workshops dealt with javascript map frameworks OpenLayers, MapQuery, GeoExt, Leaflet. The html5 trend is underlined by news flashes from the concurrent Microsoft Build Conference.

“For the web to move forward and for consumers to get the most out of touch-first browsing, the Metro style browser in Windows 8 is as HTML5-only as possible, and plug-in free. The experience that plug-ins provide today is not a good match with Metro style browsing and the modern HTML5 web.” Steven Sinofsky

CouchDB/GeoCouch was present although “eventually consistent” NoSQL seems less pressing.

Although Oracle is trying hard to break Java, it is still a popular platform among FOSS projects. The Javascript platform is a growth industry with the help of backend tools like Node.js, html5 WebSockets, and Asynchronous WebWorkers. C/C++ takes away wins in the popular performance shootout.

3D WebGL made an appearance with WebGLEarth, while NASA World Wind still has an occasional adherent.

Open Software

With over 900 attendees, the small world of FOSS seems all of a sudden crowded, jostling through accelerating growth. The Ordnance Survey was represented at a plenary session, “We have to learn to embrace open source. We want to use OS. We want to encourage OS.” Who’d have ever thought? FCC.gov opens their kimono in a big way with data transparency and OS APIs. Workshop topics include such things as: “Open Source Geospatial Software Powering Department of Defense Installation and Environment Business Systems,” or catch this, “The National Geospatial-Intelligence Agency OSS Challenge.” Is this the new Face of FOSS?

Open Bureaucracy?

Such bureaucracies know little of agile. The memorable phrase “embrace, extend, extinguish” comes to mind. But who embraces whom? Is the idea of FOSS and its larger www parent, a trend that eventually extinguishes bureaucracy or does the ancient ground of bureaucratic organization trump connection? Byzantine bureaucracy has been around since … well the Byzantine Empire, and given human nature is likely enduring. But, what does a “flat” Byzantine Bureaucracy look like? How about crowd sourced Byzantia? Would an aging Modernity mourn the loss of “Kafkaesque” as an adjective?

Assuming growth continues, will success reduce the camaraderie of community as a motivation? Just this year we hear of OSM’s Steve Coast sidling into Microsoft, followed a little later by GDAL’s Frank Warmerdam beaming up into the Google mother ship. The corporate penalty, of course, is the loss of personal intellectual property. In other words, will Steve Coast’s imaginative ideas now fall under the rubric, “all your base are belong to us,” and Frank’s enduring legacy recede into the patent portfolio of “Do no evil?” FOSS4G halls are still filled with independent consultants and academics, but a significant corporate representation is evident by this apparent oxymoron, a presentation by ESRI, “Open Source GIS Solutions.”

Some Perspective:

The GIS nervous system grows connections around the world faster than a three year old’s brain. OK, maybe I exaggerate, after all, “During the first three years, your child’s brain establishes about 1,000 trillion nerve connections!” Really, how many connections are there in the World Wide Web? For all its impact on life, our beloved internet encompasses the equivalent of what, a few cubic centimeters of a single infant’s anatomy.

Open Geospatial and FOSS are just a small part of this minor universe, but it’s easy to forget that even ten years ago this all hardly existed. “The first Interoperability Program testbed (Web Mapping Testbed) appeared in 1999.” About the same time Frank Warmerdam started GDAL and the GFOSS engines started.

I still recall the wonder of downloading Sol Katz utilities from BLM’s ftp. The novelty of all this data free to use from the USGS was still fresh and amazing. Sol sadly died before seeing his legacy, but what a legacy. The Sol Katz award this year went to Java Topology Suite’s well deserving Martin Davis.

Map Clipping with Silverlight



Fig 1 – Clip Map Demo

Bing Maps Silverlight Control has a lot of power, power that is a lot of fun to play with even when not especially practical. This weekend I was presented with a challenge to find a way to show web maps, but restricted to a subset of states, a sub-region. I think the person making the request had more in mind the ability to cut out an arbitrary region and print it for reporting. However, I began to think why be restricted to just the one level of the pyramid. With all of this map data available we should be able to provide something as primitive as coupon clipping, but with a dynamic twist.

Silverlight affords a Clip masking mechanism and it should be simple to use.

1. Region boundary:

The first need is to compile the arbitrary regional boundary. This would be a challenge to code from scratch, but SQL Server spatial already has a function called “STUnion.” PostGIS has had an even more powerful Union function for some time, and Paul Ramsey has pointed out the power of fast cascaded unions. Since I’m interested in seeing how I can use SQL Server, though, I reverted to the first pass SQL Server approach. But, as I was looking at STUnion it was quickly apparent that this is a simple geo1.STUnion(geo2) function, and what is needed is an aggregate union. The goal is to union more than just two geography elements at a time, preferably the result of an arbitrary query.

Fortunately there is a codeplex project, SQL Server Spatial Tools, which includes the very thing needed, along with some other interesting functions. GeographyUnionAggregate is the function I need; Project/Unproject and AffineTransform:: will have to await another day. This spatial tool kit consists of a dll and a register.sql script that is used to import the functions to an existing DB. Once in place the functions can be used like this:

SELECT dbo.GeographyUnionAggregate(wkb_geometry) as Boundary
FROM [Census2009].[dbo].[states]
WHERE NAME = 'Colorado' OR NAME = 'Utah' OR NAME = 'Wyoming'

Ignoring my confusing choice of geography column name, “wkb_geometry,” this function takes a “geography” column result and provides the spatial union:

Or in my case:


Fig 3 – GeographyUnionAggregate result in SQL Server

Noting that CO, WY, and UT are fairly simple polygons, but the result is 1092 nodes, I tacked on a .Reduce() function:
dbo.GeographyUnionAggregate(wkb_geometry).Reduce(10) provides 538 points
dbo.GeographyUnionAggregate(wkb_geometry).Reduce(100) provides 94 points
dbo.GeographyUnionAggregate(wkb_geometry).Reduce(1000) provides 19 points

Since I don’t need much resolution I went with the 19 points resulting from applying the Douglas-Peucker thinning with a tolerance factor of 1000.

2. Adding the boundary

The next step is adding this union boundary outlining my three states to my Silverlight Control. In Silverlight there are many ways to accomplish this, but by far the easiest is to leverage the builtin MapPolygon control and add it to a MapLayer inside the Map hierarchy:

<m:MapLayer>
  <m:MapPolygon x:Name="region"
    Stroke="Blue" StrokeThickness="5"
    Locations="37.0003960382868,-114.05060006067 37.000669,-112.540368
    36.997997,-110.47019 36.998906,-108.954404
         .
        .
        .
    41.996568,-112.173352 41.99372,-114.041723
    37.0003960382868,-114.05060006067 "/>
</m:MapLayer>


Now I have a map with a regional boundary for the combined states, CO, WY, and UT.

3. The third step is to do some clipping with the boundary:

UIElement.Clip is available for every UIElement, however, it is restricted to Geometry clipping elements. Since MapPolygon is not a geometry, it must be converted to a geometry to be used as a clip element. Furthermore, PathGeometry is very different from something as straightforward as MapPolygon, whose shape is defined by a simple LocationCollection of points.

PathGeometry in XAML:


<Canvas.Clip>
  <PathGeometry>
    <PathFigureCollection>
      <PathFigure StartPoint="1,1">
        <PathSegmentCollection>
          <LineSegment Point="1,2"/>
          <LineSegment Point="2,2"/>
          <LineSegment Point="2,1"/>
          <LineSegment Point="1,1"/>
        </PathSegmentCollection>
        </PathSegmentCollection>
      </PathFigure>
    </PathFigureCollection>
  </PathGeometry>
</Canvas.Clip>


The easiest thing then is to take the region MapPolygon boundary and generate the necessary Clip PathGeometry in code behind:

  private void DrawClipFigure()
  {
    if (!(MainMap.Clip == null))
    {
      MainMap.ClearValue(Map.ClipProperty);
    }
    PathFigure clipPathFigure = new PathFigure();
    LocationCollection locs = region.Locations;
    PathSegmentCollection clipPathSegmentCollection = new PathSegmentCollection();
    bool start = true;
    foreach (Location loc in locs)
    {
      Point p = MainMap.LocationToViewportPoint(loc);
      if (start)
      {
       clipPathFigure.StartPoint = p;
       start = false;
     }
     else
     {
      LineSegment clipLineSegment = new LineSegment();
      clipLineSegment.Point = p;
      clipPathSegmentCollection.Add(clipLineSegment);
     }
    }
    clipPathFigure.Segments = clipPathSegmentCollection;
    PathFigureCollection clipPathFigureCollection = new PathFigureCollection();
    clipPathFigureCollection.Add(clipPathFigure);

    PathGeometry clipPathGeometry = new PathGeometry();
    clipPathGeometry.Figures = clipPathFigureCollection;
    MainMap.Clip = clipPathGeometry;
  }

This Clip PathGeometry can be applied to the m:Map named MainMap to mask the underlying Map. This is easily done with a Button Click event. But when navigating with pan and zoom, the clip PathGeometry is not automatically updated. It can be redrawn with each ViewChangeEnd:
private void MainMap_ViewChangeEnd(object sender, MapEventArgs e)
{
  if (MainMap != null)
  {
    if ((bool)ShowBoundary.IsChecked) DrawBoundary();
    if ((bool)ClipBoundary.IsChecked) DrawClipFigure();
  }
}


This will change the clip to match a new position, but only after the fact. The better way is to add the redraw clip to the ViewChangeOnFrame:

MainMap.ViewChangeOnFrame += new EventHandler<MapEventArgs>(MainMap_ViewChangeOnFrame);

private void MainMap_ViewChangeOnFrame(object sender, MapEventArgs e)
{
  if (MainMap != null)
  {
    if ((bool)ShowBoundary.IsChecked) DrawBoundary();
    if ((bool)ClipBoundary.IsChecked) DrawClipFigure();
  }
}


In spite of the constant clip redraw with each frame of the navigation animation, navigation is smooth and not appreciably degraded.

Summary:

Clipping a map is not terrifically useful, but it is supported with the Silverlight Control and provides another tool in the webapp mapping arsenal. What is very useful are the additional functions found in SQL Server Spatial Tools. Since SQL Server spatial is in the very first stage of its life, several useful functions are not found natively in this release. It is nice to have a superset of tools like GeographyUnionAggregate, Project/Unproject, and AffineTransform::.

The more generalized approach would be to allow a user to click on the states he wishes to include in a region, and then have a SQL Server query produce the boundary for the clip action from the resulting state set. This wouldn’t be a difficult extension. If anyone thinks it would be useful, pass me an email and I’ll try a click select option.



Fig 4 – Clip Map Demo

Hauling Out the Big RAM

Amazon released a handful of new stuff.

“Make that a Quadruple Extra Large with room for a Planet OSM”

Fig 1 – Big Foot Memory

1. New Price for EC2 instances

                US                              EU
            Linux     Windows   SQL         Linux     Windows   SQL
m1.small    $0.085    $0.12     –           $0.095    $0.13     –
m1.large    $0.34     $0.48     $1.08       $0.38     $0.52     $1.12
m1.xlarge   $0.68     $0.96     $1.56       $0.76     $1.04     $1.64
c1.medium   $0.17     $0.29     –           $0.19     $0.31     –
c1.xlarge   $0.68     $1.16     $2.36       $0.76     $1.24     $2.44

Notice the small instance, now $0.12/hr, matches Azure Pricing

Compute = $0.12 / hour

This is not really apples to apples since Amazon is a virtual instance, while Azure is per deployed application. A virtual instance can have multiple service/web apps deployed.

2. Amazon announces a Relational Database Service RDS
Based on MySQL 5.1, this doesn’t appear to add a whole lot since you always could start an instance with any database you wanted. MySQL isn’t exactly known for geospatial even though it has some spatial capabilities. You can see a small comparison of PostGIS vs MySQL by Paul Ramsey. I don’t know if this comparison is still valid, but I haven’t seen much use of MySQL for spatial backends.

This is similar to Azure SQL Server which is also a convenience deployment that lets you run SQL Server as an Azure service, without all the headaches of administration and maintenance tasks. Neither of these options are cloud scaled, meaning that they are still single instance versions, not cross partition capable. SQL Azure Server CTP has an upper limit of 10Gb, as in hard drive not RAM.

3. Amazon adds New high memory instances

  • High-Memory Double Extra Large Instance 34.2 GB of memory, 13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform $1.20-$1.44/hr
  • High-Memory Quadruple Extra Large Instance 68.4 GB of memory, 26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform $2.40-$2.88/hr

These are new virtual instance AMIs that scale up as opposed to scale out. Scaled out options use clusters of instances in the Grid Computing/Hadoop type of architectures. There is nothing to prohibit using clusters of scaled up instances in a hybridized architecture, other than cost. However, the premise of Hadoop arrays is “divide and conquer,” so it makes less sense to have massive nodes in the array. Since scaling out involves moving the problem to a whole new parallel programming paradigm with all of its consequent complexity, it also means owning the code. In contrast scaling up is generally very simple. You don’t have to own the code or even recompile just install on more capable hardware.

Returning us back to the Amazon RDS, Amazon has presumably taken an optimized compiled route and offers prepackaged MySQL 5.1 instances ready to use:

  • db.m1.small (1.7 GB of RAM, $0.11 per hour).
  • db.m1.large (7.5 GB of RAM, $0.44 per hour)
  • db.m1.xlarge (15 GB of RAM, $0.88 per hour).
  • db.m2.2xlarge (34 GB of RAM, $1.55 per hour).
  • db.m2.4xlarge (68 GB of RAM, $3.10 per hour).

Of course the higher spatial functionality of PostgreSQL/PostGIS can be installed on any of these high memory instances as well. It is just not done by Amazon. The important thing to note is memory approaches 100Gb per instance! What does one do with all that memory?

Here is one use:

“Google query results are now served in under an astonishingly fast 200ms, down from 1000ms in the olden days. The vast majority of this great performance improvement is due to holding indexes completely in memory. Thousands of machines process each query in order to make search results appear nearly instantaneously.”
Google Fellow Jeff Dean keynote speech at WSDM 2009.

Having very large memory footprints makes sense for increasing performance on a DB application. Even fairly large data tables can reside entirely in memory for optimum performance. Whether a database makes use of the best optimized compiler for Amazon’s 64bit instances would need to be explored. Open source options like PostgreSQL/PostGIS would let you play with compiling in your choice of compilers, but perhaps not successfully.

Todd Hoff has some insightful analysis in his post, “Are Cloud-Based Memory Architectures the Next Big Thing?”

Here is Todd Hoff’s point about having your DB run inside of RAM – remember that 68Gb Quadruple Extra Large memory:

“Why are Memory Based Architectures so attractive? Compared to disk, RAM is a high bandwidth and low latency storage medium. Depending on who you ask the bandwidth of RAM is 5 GB/s. The bandwidth of disk is about 100 MB/s. RAM bandwidth is many hundreds of times faster. RAM wins. Modern hard drives have latencies under 13 milliseconds. When many applications are queued for disk reads latencies can easily be in the many second range. Memory latency is in the 5 nanosecond range. Memory latency is 2,000 times faster. RAM wins again.”

Wow! Can that be right? “Memory latency is 2,000 times faster.”

(Hmm… 13 milliseconds = 13,000,000 nanoseconds,
so 13,000,000ns / 5ns = 2,600,000x? And 5GB/s / 100MB/s = 50x? Am I doing the math right?)

The real question, of course, is what will actual benchmarks reveal? Presumably optimized memory caching narrows the gap between disk storage and RAM. Which brings up the problem of configuring a Database to use large RAM pools. PostgreSQL has a variety of configuration settings but to date RDBMS software doesn’t really have a configuration switch that simply caches the whole enchilada.

Here is some discussion of MySQL front-ending the database with In-Memory-Data-Grid (IMDG).

Here is an article on a PostgreSQL configuration to use a RAM disk.

Here is a walk through on configuring PostgreSQL caching and some PostgreSQL doc pages.

Tuning for large memory is not exactly straightforward. There is no “one size fits all.” You can quickly get into Managing Kernel Resources. The two most important parameters are:

  • shared_buffers
  • sort_mem
“As a start for tuning, use 25% of RAM for cache size, and 2-4% for sort size. Increase if no swapping, and decrease to prevent swapping. Of course, if the frequently accessed tables already fit in the cache, continuing to increase the cache size no longer dramatically improves performance.”

OK, given this rough guideline on a Quadruple Extra Large Instance 68Gb:

  • shared_buffers = 17Gb (25%)
  • sort_mem = 2.72Gb (4%)

This still leaves plenty of room, 48.28Gb, to avoid dreaded swap pagein by the OS. Let’s assume a more normal 8Gb memory for the OS. We still have 40Gb to play with. Looking at sort types in detail may make adding some more sort_mem helpful, maybe bump to 5Gb. Now there is still an additional 38Gb to drop into shared_buffers for a grand total of 55Gb. Of course you have to have a pretty hefty set of spatial tables to use up this kind of space.
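
Rolled into postgresql.conf terms, the guesswork above might look like the following (illustrative values only; note that sort_mem is the older spelling used in the tuning articles above - current releases call it work_mem):

    shared_buffers = 55GB    # 17Gb from the 25% rule plus the 38Gb left over
    sort_mem = 5GB           # bumped up from the 4% rule after looking at sort types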

Here is a list of PostgreSQL limitations. As you can see it is technically possible to run out of even 68Gb.


Limit                        Value
Maximum Database Size        Unlimited
Maximum Table Size           32 TB
Maximum Row Size             1.6 TB
Maximum Field Size           1 GB
Maximum Rows per Table       Unlimited
Maximum Columns per Table    250 – 1600 depending on column types
Maximum Indexes per Table    Unlimited

Naturally the Obe duo has a useful posting on determining PostGIS sizes: Determining size of database, schema, tables, and geometry

To get some perspective on size, an Open Street Map dump of the whole world fits into a 90Gb EBS Amazon Public Data Set configured for PostGIS with pg_createcluster. Looks like this just happened a couple weeks ago. Although 90Gb is just a little out of reach for even a Quadruple Extra Large, I gather the current size of planet osm is still in the 60Gb range and you might just fit it into 55Gb RAM. It would be a tad tight. Well, maybe an Octuple Extra Large 136Gb instance is not too far off. Of course who knows how big Planet OSM will ultimately end up being.
See planet.openstreetmap.org

Another point to notice is the 8 virtual cores in a Quadruple Extra Large Instance. Unfortunately:

“PostgreSQL uses a multi-process model, meaning each database connection has its own Unix process. Because of this, all multi-cpu operating systems can spread multiple database connections among the available CPUs. However, if only a single database connection is active, it can only use one CPU. PostgreSQL does not use multi-threading to allow a single process to use multiple CPUs.”

Running a single connection query apparently won’t benefit from a multi cpu virtual system, even though running multi threaded will definitely help with multiple connection pools.

I look forward to someone actually running benchmarks since that would be the genuine reality check.

Summary

Scaling up is the least complex way to boost performance on a lagging application. The Cloud offers lots of choices suitable to a range of budgets and problems. If you want to optimize personnel and adopt a decoupled SOA architecture, you’ll want to look at Azure + SQL Azure. If you want the adventure of large scale research problems, you’ll want to look at instance arrays and Hadoop clusters available in Amazon AWS.

However, if you just want a quick fix, maybe not 2000x but at least some x, better take a look at Big RAM. If you do, please let us know the benchmarks!

Neo vs Paleo Geography

Jurassic Dinosaur skeleton
Fig 1 – Paleo, Neo – what about Jurassic Geography?

I gather that there is some twittering about neo versus paleo geography. See Peter Batty’s blog entry or James Fee’s blog. I myself don’t Twitter, but in general I’m happy for Peter’s paleo accommodation of the non twitterers, repeating the conversation in a blog entry. Peter has also updated comments with a new post questioning, “Are we now in a post neogeography era?” The dreaded paradigm shifts are coming fast and furiously.

I am not really able to comment on neo vs paleo as I myself fall even further back into “Jurassic Geography.” Looking at connotations we have this accounting:

····neo - 1990 – present, new, recent, different, Obama, Keynesian, Apple, Google Earth, Cloud, Java C# RubyRails Twitter

····paleo - as in paleolithic 2.8m – 10k old, prehistoric, ancient, early, primitive, Nixon, Supply Side, Microsoft, Windows Desktop, ESRI Arc???, C C++ Javascript telephone

Obviously the “paleo” label is not carried with quite the honor of “neo.” It’s reminiscent of the Galen / Myers-Briggs personality typology characterized as Lion, Otter, Beaver, and Golden Retriever. What do you want to be? Obviously not the beaver, but there has to be a significant part of the world in that category, like it or not. After all what would lions eat for dinner without a few of us beavers? Likewise there is a significant branch of paleo in the GIS kingdom.

However, in the pre-paleolithic era there are still a few of us left, falling into the “long tail” of the Jurassic. So carrying on down the connotation stream here is the Jurassic geography equivalent:

····jurassic – 206m-144m dinosaurs, fossils, pre paleolithic, Hoover, laissez faire, IBM Big Iron, Assembly Cobol, open source

Wait “Open Source” – Jurassic Geography? How did that get in there? The notoriously frugal days of Hoover never made it into the paleolithic era’s “Supply Side” economy. It’s Keynesian economics all over the neo world, so Jurassic geography is the frugal end of the spectrum and how can you get more frugal than free! Obviously Open Source is as Jurassic as they come in Geography circles.

As I’ve recently been in a gig hunting mode, I’ve been having quite a few in depth conversations about GIS stacks. As a small businessman outside the corporate halls of paleo geography, I’ve had few occasions to get an in depth education on the corporate pricing world. So I spent the past couple of days looking into it.

Let’s start at the bottom of the stack. Here is some retail pricing on a few popular GIS databases:

  • Oracle Standard Spatial $17,500 + $3850 annual
  • Oracle Enterprise Locator $47,500 + $10,450 annual
  • SQL Server 2008 Web edition ~ $3500
  • PostgreSQL/PostGIS $0.00

If you’re a Jurassic geographer, which do you choose? Probably not Oracle Enterprise Locator. If you’re Paleo you look at that and think, “Man, I am the negotiator! We don’t pay any of that retail stuff for the masses.” Neo? – well how would I know how a neo thinks?

Next take a look at the middle tier:

  • ESRI ArcGIS Server standard workgroup license
    ····Minimum $5000 2cores + $1250 2core annual
    ····Additional cores $2500/core + $625/core annual
  • ESRI ArcGIS hosted application server license
    ····Minimum $40,000 4 cores + $10,000 4 core annual
    ····Additional cores $10,000/core + $2500/core annual
  • OWS GeoServer or MapServer minimum $0 + annual $0
    But, this is GIS isn’t it? We want some real analytic tools, not just a few hundred spatial functions in the JTS Topology Suite. OK, better throw in a few QGIS or GRASS installs and add a few $0s to the desktop production. Oh, and cores, we need some, “make that a 16-core 64-bit please” – plus $0.

I think you catch the Jurassic drift here. How about the client side?

  • ESRI Silverlight – free, well sort of, if you’re a developer, NGO, educational, or non-profit; otherwise take a look at that ArcGIS license back a few lines.
  • Google API – it’s Neo isn’t it? $10k per annum for commercial use, maybe it’s Paleo after all.
  • Virtual / Bing Maps api – $8k per annum, transaction based, and in typical license obfuscation fashion impossible to predict what the final cost will be. Paleo, “Just send me the invoice.”
  • OpenLayers is a javascript api client layer too, just solidly Jurassic at $0
  • Silverlight – well, it can be Jurassic; try DeepEarth over at codeplex or MapControl from Microsoft with the Bing imageservice turned off, OSM on.

It’s been an interesting education. Here is the ideal Jurassic GIS stack:
Amazon EC2 Windows instance + PostGIS database + GeoServer OWS + IIS Silverlight MapControl client
The cost: starts at $100/mo(1 processor 1.7Gb 32bit) up to $800/mo(4 processor 15Gb 64bit)

So what does a Jurassic geographer get in this stack?

Hardware:
Amazon Cloud based virtual server, S3 Backup, AMI image replication, Elastic IP, AWS console, choice of OS, cores, memory, and drive space. Ability to scale in a big way with Elastic load balancing, auto scaling, and CloudWatch monitoring. Performance options like CloudFront edge service or something experimental like Elastic MapReduce Hadoop clusters.

Database:
PostgreSQL/PostGIS – standards compliant SQL server with GiST spatial indexing on OGC “Simple Features for SQL” specification compliant geometry with extended support for 3DZ, 3DM and 4D coordinates. A full set of roughly 175 geometry, management, and spatial functions. It supports almost all projections. All this and performance? Maybe a little vague, but not shabby:

“PostGIS users have compared performance with proprietary databases on massive spatial data sets and PostGIS comes out on top.”

Middle Tier:
Geoserver – standards compliant OWS service for WMS, WFS, WCS.
Data sources: Shapefile, Shapefile Directory, PostGIS, external WFS, ArcSDE, GML, MySQL, Oracle, Oracle NG, SQL Server, VPF
Export formats: WFS GML, KML, SVG, PDF, GeoRSS, Png, Jpeg, Geotiff, OGR Output – MapInfo Tab and MID/MIF, Shp, CSV, GeoJSON …
OGC standard SLD styling, built in gwc tile caching – seeded or as needed, managed connection pools, RESTful configuration api, and ACEGI integrated security.

WCS adds:

  1. ArcGrid – Arc Grid Coverage Format
  2. ImageMosaic – Image mosaicking plugin
  3. WorldImage – A raster file accompanied by a spatial data file
  4. Gtopo30 – Gtopo30 Coverage Format
  5. GeoTIFF – Tagged Image File Format with Geographic information
“GeoServer is the reference implementation of the Open Geospatial Consortium (OGC) Web Feature Service (WFS) and Web Coverage Service (WCS) standards, as well as a high performance certified compliant Web Map Service (WMS). “

Browser client viewer:
Take your pick; a few candidates were already mentioned above – OpenLayers, Silverlight MapControl, DeepEarth.

Summary:
Well, in these economic times Jurassic may in fact meet Neo. The GIS world isn’t flat, and Jurassic going right eventually meets Neo going left – sorry, Paleos. Will Obama economics turn out to be Hooverian in the end? Who knows, but here’s a proposition for the Paleos:

Let me do a GIS distribution audit. If I can make a Jurassic GIS Stack do what the Paleo stack is currently providing, you get to keep those annual Paleo fees from here to recovery. How about it?

rkgeorge @cadmaps.com

mxd to SLD – ArcMap2SLD


Fig 1 – ArcMap2SLD tool running with ArcMap 9.3

I had some more time to work out the ArcMap2SLD script tool.

Using an mxd in ArcMap 9.3, I first made sure all of my shape layers were connected. Then I ran the ArcMap2SLD tool which requires a session of ArcMap to be running.

This is a simple ArcMap project with a set of shp layers, and it takes about 5-10 minutes to run through all of them. The analysis seems to read through each symbol in a layer and then loop through every record. After a few minutes it asks for a file in which to store the SLD, and the test.sld is ready.

The resulting SLD is a single file with all of the Feature types and rules listed together:

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<sld:StyledLayerDescriptor version="1.0.0"
xmlns:sld="http://www.opengis.net/sld"
xmlns:ogc="http://www.opengis.net/ogc"
xmlns:xlink="http://www.w3.org/1999/xlink">
  <sld:NamedLayer>
    <sld:Name>ovroads</sld:Name>
    <sld:UserStyle>
      <sld:Name>Style1</sld:Name>
      <sld:FeatureTypeStyle>
        <sld:FeatureTypeName>ovroads</sld:FeatureTypeName>
        <sld:Rule>
          <sld:Name>Interstates</sld:Name>
          <sld:Title>Interstates</sld:Title>
          <ogc:Filter>
            <ogc:PropertyIsEqualTo>
              <ogc:PropertyName>RDCLS</ogc:PropertyName>
              <ogc:Literal>010</ogc:Literal>
            </ogc:PropertyIsEqualTo>
          </ogc:Filter>
          <sld:LineSymbolizer>
            <sld:Stroke>
              <sld:CssParameter name="stroke">#000000</sld:CssParameter>
              <sld:CssParameter name="stroke-width">0.8</sld:CssParameter>
              <sld:CssParameter name="stroke-opacity">1</sld:CssParameter>
            </sld:Stroke>
          </sld:LineSymbolizer>
          <sld:LineSymbolizer>
            <sld:Stroke>
              <sld:CssParameter name="stroke">#FFFB86</sld:CssParameter>
              <sld:CssParameter name="stroke-width">2.6</sld:CssParameter>
              <sld:CssParameter name="stroke-opacity">1</sld:CssParameter>
            </sld:Stroke>
          </sld:LineSymbolizer>
          <sld:LineSymbolizer>
            <sld:Stroke>
              <sld:CssParameter name="stroke">#000000</sld:CssParameter>
              <sld:CssParameter name="stroke-width">3.4</sld:CssParameter>
              <sld:CssParameter name="stroke-opacity">1</sld:CssParameter>
            </sld:Stroke>
          </sld:LineSymbolizer>
        </sld:Rule>
            .
            .

Copying a subset for the roads layer into a GeoServer style did not produce any features the first time around.

First, the Filter PropertyName is case sensitive. The loading batch I used produces lower-case field names in the PostGIS tables, so the RDCLS property name had to be changed to rdcls throughout the SLD.

If you notice in the above SLD output, the Geometry reference is missing:
    <Geometry><ogc:PropertyName>the_geom</ogc:PropertyName></Geometry>

Also missing are scale elements:
    <MinScaleDenominator>32000</MinScaleDenominator>
    <MaxScaleDenominator>20000000</MaxScaleDenominator>

After the necessary modification the SLD rules look like this:

        <Rule>
          <Name>U S Highways</Name>
          <Title>U S Highways</Title>
          <ogc:Filter>
            <ogc:PropertyIsEqualTo>
              <ogc:PropertyName>rdcls</ogc:PropertyName>
              <ogc:Literal>014</ogc:Literal>
            </ogc:PropertyIsEqualTo>
          </ogc:Filter>
          <MinScaleDenominator>32000</MinScaleDenominator>
          <MaxScaleDenominator>20000000</MaxScaleDenominator>
          <LineSymbolizer>
            <Geometry>
              <ogc:PropertyName>the_geom</ogc:PropertyName>
            </Geometry>
            <Stroke>
              <CssParameter name="stroke">#FA3411</CssParameter>
              <CssParameter name="stroke-width">3.4</CssParameter>
              <CssParameter name="stroke-opacity">1</CssParameter>
            </Stroke>
          </LineSymbolizer>
        </Rule>

This still requires some manual intervention with a text editor and a bit of cut & paste, but the tool does handle setting the Filters along with the stroke color, stroke-width, and stroke-opacity parameters. It handled the PolygonSymbolizer in the mxd as well.

This is a time saver; however, the appearance didn’t match in the resulting SLD tile sources. It appears as though some of the SLD stroke widths end up masking underlying strokes, so that, for example, the purple highway symbolization does not appear in the SLD tile source version.

Fig 2 – Silverlight tile source with new SLD

After some examination I realized that multiple LineSymbolizers were in reverse order. The thickest line width needs to come first, with thinner stroke widths later in the SLD file, because rendering happens in file order. Here is an example of the final resulting LineSymbolizer set with corrected ordering:

        <Rule>
          <Name>Boulder Turnpike</Name>
          <Title>Boulder Turnpike</Title>
          <ogc:Filter>
            <ogc:PropertyIsEqualTo>
              <ogc:PropertyName>rdcls</ogc:PropertyName>
              <ogc:Literal>016</ogc:Literal>
            </ogc:PropertyIsEqualTo>
          </ogc:Filter>
          <MinScaleDenominator>32000</MinScaleDenominator>
          <MaxScaleDenominator>20000000</MaxScaleDenominator>
          <LineSymbolizer>
            <Geometry><ogc:PropertyName>the_geom</ogc:PropertyName></Geometry>
            <Stroke>
              <CssParameter name="stroke">#000000</CssParameter>
              <CssParameter name="stroke-width">3.4</CssParameter>
              <CssParameter name="stroke-opacity">1</CssParameter>
            </Stroke>
          </LineSymbolizer>

          <LineSymbolizer>
            <Geometry><ogc:PropertyName>the_geom</ogc:PropertyName></Geometry>
            <Stroke>
              <CssParameter name="stroke">#FFFB86</CssParameter>
              <CssParameter name="stroke-width">2.6</CssParameter>
              <CssParameter name="stroke-opacity">1</CssParameter>
            </Stroke>
          </LineSymbolizer>

          <LineSymbolizer>
            <Geometry><ogc:PropertyName>the_geom</ogc:PropertyName></Geometry>
            <Stroke>
              <CssParameter name="stroke">#000000</CssParameter>
              <CssParameter name="stroke-width">0.8</CssParameter>
              <CssParameter name="stroke-opacity">1</CssParameter>
            </Stroke>
          </LineSymbolizer>
        </Rule>

Fig 3 – Silverlight tile source with new SLD with corrected order

Summary

This tool appears to be a helpful start for translating mxd styles to SLD. The result gives colors, basic widths, etc., but requires several modifications:

  1. match case to the PostGIS table
  2. add Geometry elements
  3. add min and max scale denominators
  4. reverse the order of multiple LineSymbolizers

It’s still a good start.

Input stack – it keeps growing!

Fig 1 – port of ArcView project to Open source stack

Well what to do? There are a whole lot of new items on the list just in the past week or so.

Here is an input stack of interesting things to try out and write about:

1. REST WCF Starter Kit download preview 2

2. Silverlight 3

3. Silverlight Toolkit July release and Sample page

4. ESRI’s Silverlight api – resources, gallery, announcements, toolkit


Fig 2 – ArcGIS Silverlight – WMS viewer

5. ESRI MapIt

6. Porting ESRI ArcMap project to Open Source stack –
    PostGIS + GeoServer +Silverlight Map Control

I put my name down for a possible unconference talk at Wherecamp5280 coming up in Aug. In case it interests folks at the unconference, I decided to put together some notes here on my latest little project.

Open source GIS stacks do a great job of exposing ordinary GIS functionality. For web applications they do as well as anything out there from Oracle, Microsoft, or even ESRI. I use PostGIS and GeoServer all the time. Now that Microsoft’s Silverlight MapControl is available the client side is also easy to create. Generally porting a project involves just a few steps:
a. load all the shape files to PostGIS tables – ogr2ogr, shp2pgsql
b. create the WMS service layers in GeoServer
c. create the client using (my preference at the moment) the Silverlight MapControl

Step a. is fairly straightforward. There are a lot of good tools including shp2pgsql and ogr2ogr. Here are a couple of example loading commands:
shp2pgsql -s 4326 -d -I CityLimits.shp citylimits | psql -h localhost -d City -U username

ogr2ogr -overwrite -nln tract -a_srs EPSG:4326 -f PostgreSQL PG:"user=username dbname=City host=localhost password=password port=5432" tr08_d00.shp

Loading a set of shape files is normally a matter of writing a batch script. If there are complicating factors, it is also easy enough to create a simple custom Java or C# loader specific to a particular job (a minimal sketch follows). In a more involved ArcSDE environment, you could theoretically skip the intermediary .shp and use table dumps from the backing DB. The complication there is doing the due diligence to get the DB idiosyncrasies mapped from a commercial license DB to PostgreSQL, which is generally more standards compliant.
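Here is a minimal sketch of such a loader, with the assumptions flagged: it just shells out to the same shp2pgsql | psql pipeline shown above for every shapefile in a folder, and the folder path, database name, and user are placeholders.

using System;
using System.Diagnostics;
using System.IO;

public class ShpBatchLoader
{
    public static void Main()
    {
        // run the shp2pgsql | psql pipeline for each shapefile in the folder
        foreach (string shp in Directory.GetFiles(@"C:\data\shp", "*.shp"))
        {
            // table name from the file name, lower cased to match the PostGIS field convention
            string table = Path.GetFileNameWithoutExtension(shp).ToLower();
            // -s 4326 source SRID, -d drop/recreate table, -I build a GIST index
            string args = String.Format(
                "/C shp2pgsql -s 4326 -d -I \"{0}\" {1} | psql -h localhost -d City -U username",
                shp, table);
            Process p = Process.Start("cmd.exe", args);
            p.WaitForExit();
        }
    }
}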

Step b. above is the hardest part at present. The GeoServer RESTful configuration service extension promises to make life a bit more developer friendly; at least you can build Post and Put tools to add layers and styles from your own program, as in the sketch below. But the really sticky part is translating all of the numerous styling, legend, range, etc. parameters from an ESRI mxd project to SLD (or, in the case of TatukGIS, ttkgp files). SLD is a powerful style description language which can do just about any kind of styling you can think of, but it seems to be lacking interactive tools. It will be nice to see layer SLD editors with popup colors, selectable view ranges, filter query generators, captions, auto legend generators, etc. Perhaps that already exists and I’ve just missed it?
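As a taste of what such a Post tool might look like, here is a hedged sketch that pushes an SLD file to the REST styles endpoint. It assumes the REST extension is installed; the host, credentials, style name, and file path are all placeholders.

using System;
using System.IO;
using System.Net;

public class SldPoster
{
    public static void Main()
    {
        byte[] sld = File.ReadAllBytes(@"C:\styles\roads.sld");
        // POST the SLD body to the REST styles resource
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(
            "http://localhost:8080/geoserver/rest/styles?name=roads");
        req.Method = "POST";
        req.ContentType = "application/vnd.ogc.sld+xml";
        req.Credentials = new NetworkCredential("admin", "geoserver");
        using (Stream s = req.GetRequestStream())
        {
            s.Write(sld, 0, sld.Length);
        }
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        {
            Console.WriteLine(resp.StatusCode); // expect Created on success
        }
    }
}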

At any rate, doing this manually with a text editor is not impossible, just very tedious. It requires creating sets of FeatureTypeStyle Rules with filters, symbolizers, min and max scale denominators, labels, etc. This can get quite involved. For example, layers containing roads with numerous classifications to arrange in the SLD require some fairly involved xml plus a lot of cut & paste. Putting together icon symbols can get tedious too. ESRI mxd, like most things ESRI, is not exactly an open format, but some diligent folks have done a bit of Arc scripting. On my input stack is trying out ArcMap2SLD to see how effectively it can be used to create the SLD side of things.
    mxd to SLD – ArcMap2SLD
Well, I tried out ArcMap2SLD for an ArcMap 9.3 project but received an error in German, so it’s manual mode for the present.

Once GeoServer FeatureTypes are configured for a WMS, step c. is just wrapping the WMS calls inside a Silverlight MapControl. There are two ways to approach this. The first, which I like, uses MapTileLayer.TileSources for the various layers; the tiles are sourced from the geowebcache, gwc, included with newer versions of GeoServer. A second approach is to create a GetMap WMS request with an explicit BBox. They both work fine; I just kind of like the dynamic zoom-in effect of MultiScale tile sources as they spiral down the quadkey.
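Here is a minimal sketch of the tile source side, under stated assumptions: the CTP-era Microsoft.VirtualEarth.MapControl namespace and GetUri signature (later builds rename things), a local GeoServer, and a hypothetical topp:roads layer. gwc’s gmaps service addresses tiles on the same top-origin z/x/y grid that the MapControl quadkeys resolve to, so the indices pass straight through.

using System;
using Microsoft.VirtualEarth.MapControl;

public class GwcTileSource : TileSource
{
    public override Uri GetUri(int x, int y, int zoomLevel)
    {
        // fetch the tile from GeoServer's built-in geowebcache
        return new Uri(String.Format(
            "http://localhost:8080/geoserver/gwc/service/gmaps?layers=topp:roads&zoom={0}&x={1}&y={2}&format=image/png",
            zoomLevel, x, y));
    }
}

Hooking it up is then just a matter of adding it to a MapTileLayer:

    MapTileLayer gwcLayer = new MapTileLayer();
    gwcLayer.TileSources.Add(new GwcTileSource());
    map.Children.Add(gwcLayer);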

Silverlight 3 is out now, so I installed the latest and ran the new Silverlight client through the update process in Visual Studio. It worked without a problem. I also downloaded the latest version of the Silverlight Toolkit, which has a lot of nice controls to put to use. Using a control is pretty simple: just reference the library dlls and start adding controls:

<controlsToolkit:Expander ExpandDirection="Down" Header="Query" Foreground="White" Margin="2">
    <controlsToolkit:Expander.Content>
                 .
                 .
    </controlsToolkit:Expander.Content>
</controlsToolkit:Expander>

Want to change the entire page style? Just add one of the dozen theme styles under root to wrap the whole page: <expressionDark:ExpressionDarkTheme>

It’s convenient to include a base map tile source using OSM, but in an offline situation you can use a road layer of your own. Those needing a high-level background map, including aerial imagery, can use the Bing Maps tile service, license required. The Silverlight MapControl cannot quite run offline yet: watching the request/response traffic with Fiddler shows a couple of requests to the Microsoft server. Since I turned off Bing Maps in this project, these do nothing more than confirm that their server is online and has a crossdomain.xml. If you are offline, it throws up a white box message indicating “service is unavailable. try again later.” The message unfortunately is sticky and stays with the viewport regardless of zooming or panning around your local tile sources.

The custom code for Silverlight is easy to create using C#, which I’m learning to really like. One great feature compared to javascript client code is performance: I was amazed the first time I brought in a list of a thousand ellipse shapes – no change in zoom and pan performance at all.

I then added a few specific ‘Find’ and ‘Query’ functions using some of the spatial functions in PostGIS; a sketch of the idea follows. Double-click FeatureInfo requests drill through the visible layer stack, creating an expandable tree view of table attribute fields.
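To give a flavor of the ‘Find’ side, here is an illustrative fragment; the citylimits table and name column are made up for illustration, and it assumes an open npgsql connection conn. A query of this vintage pairs the && bounding-box operator with a Distance() cutoff (both in degrees):

// proximity search: bounding-box prefilter plus an exact Distance() cutoff
NpgsqlCommand find = new NpgsqlCommand(
    "SELECT name, AsText(the_geom) AS geom FROM citylimits " +
    "WHERE the_geom && Expand(GeomFromText(:pt, 4326), :d) " +
    "AND Distance(the_geom, GeomFromText(:pt, 4326)) < :d;", conn);
find.Parameters.Add(new NpgsqlParameter("pt", "POINT(-105.0 39.9)"));
find.Parameters.Add(new NpgsqlParameter("d", 0.01));
NpgsqlDataReader hits = find.ExecuteReader();
while (hits.Read())
{
    // each hit becomes a node in the client's attribute tree view
    Console.WriteLine(hits["name"]);
}
hits.Close();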

The end result, for those for whom this matters, is a complete online map viewer for your ArcMap project minus any license cost or hassles.

One caveat I should mention: the Silverlight MapControl is still a CTP. The final release ETA is unknown at present, so work is subject to change and revision once the final release is out.

Fig 3 – port of ESRI to Open source stack

Back to the input stack.

7. Amazon ECommerce Service API / Google Books Data API / LibraryThing API cover images article
  a. Build a book cover image tile pyramid
  b. use ranking attributes for proximity algorithms
    1. sales rank
    2. category hierarchy rankings
    3. ‘also bought’ bot chains
  c. add a GeoServer stack and Silverlight MapControl TileSource to zoom around book cover image
  d. add a 3D WPF terrain mesh based on elevation attributes such as rank …..
  e. try a Silverlight 3 perspective Transform on a spine to cover animation

8. Add a line draw and profile view window to the LiDAR Silverlight client
  a. try a Silverlight 3 Perspective 3d Transform to follow a profile around an arbitrary LiDAR profile polyline. This is a non standard extension to the WMS request spec, which allows a client to grab a profile view out of the point cloud. Pretty interesting stuff since you can see ghostly building point cloud shapes rising out of the terrain.

Fig 4 – LiDAR Server profile

9. Look at heatmap applications for emergency response analysis.


Fig 5 – Heat Map

10. Find a gig – resume

Unfortunately, my last line item needs to be moved up to the top of the input stack. My primary client was forced to drop their software division due to the economic climate, and it looks like I will need to focus on a new gig … but there are so many other interesting things to try out. If anyone out there needs a GIS web developer, please shoot me an email. Thanks!

Open up that data, Cloud Data

James Fee looks at AWS data, and here is the TIGER .shp snapshot James mentions: Amazon TIGER snapshot
More details here: Tom MacWright

Too bad it is Linux/Unix only, since I’d prefer to attach to a Windows EC2. TIGER is there as raw data files, ready to attach to your choice of Linux EC2 – as is Census data galore.

But why not look further?  It’s interesting to think about other spatial data out in the Cloud.

Jeffrey Johnson adds a comment over at Spatially Adjusted about OSM, with the question: what form – a pg_dump or a pg database? This moves a little beyond raw Amazon public data sets.

Would it be possible to provide an EBS volume with data already preloaded to PostGIS? A user could then attach the EBS ready to use. Adding a middle tier WMS/WFS like GeoServer or MapServer can tie together multiple PG sources, assuming you want to add other pg databases.

Jeffrey mentions one caveat: the 5GB S3 limit. Does this mark the high end of a snapshot, requiring modularized splitting of OSM data? It doesn’t sound like S3 will be much help in the long run if OSM continues to expand.

What about OpenAerial? Got to have more room for OpenAerial, and someday OpenTerrain (LiDAR)!
EBS offers volumes from 1 GB to 1 TB. Do you need a snapshot (only 5GB) to start a new EBS? Can this accommodate OpenAerial tiles, or OpenLiDAR X3D GeoElevationGrid LOD? Of course we want mix-and-match deployment in the Cloud.

Would it be possible for Amazon to just host the whole shebang? What do you think, Werner?

Put it out there as an example: an Auto Scaling, Elastic Load Balancing OSM, OpenAerial tile pyramids as CloudFront cache, OpenTerrain X3D GeoElevationGrid LOD stacks. OSM servers are small potatoes in comparison. I don’t think Amazon wants to be the Open Source Google, but with Google and Microsoft pushing into the Cloud game, maybe Amazon could push back a little on the map end.

I can see GeoServer sitting in the middle of all this data delight, handing out OSM to a tile client where it is stacked on OpenAerial and draped onto OpenTerrain. Go Cloud, Go!

Silverlight MapControl and PostGIS


Fig 1 – PostGIS and Silverlight MapControl

It is worth revisiting the Silverlight MapControl database connection in order to try duplicating the SQL Server layers in PostgreSQL/PostGIS.

Why?
SQL Server 2008 is still lacking some of the spatial capabilities found in PostGIS, and a rough performance comparison between the two databases would also be interesting.

The same process is used to access a database from inside the Silverlight MapControl for either SQL Server or PostGIS. A data service with an [OperationContract] exposes the results of a DB query to Silverlight through a Service Reference on the Silverlight side. From there it is a matter of using MapControl events to send requests to the service and process the results into shapes on the map.

The main difference is the code to handle the PostGIS connection, command query, and result reader. Microsoft includes these for SQL Server in the System.Data.SqlClient namespace as part of a default project setup. Naturally PostGIS doesn’t happen to be in Microsoft’s System.Data.dll; however, there is a nice project that provides access to PostgreSQL inside .NET: Npgsql – .Net Data Provider for Postgresql

I downloaded the Npgsql version 2.04 binaries and referenced them in my previous GeoTest. Now I can add a new [OperationContract] to the existing SFBayDataService that uses an Npgsql connection to PostgreSQL. Since it would be interesting to compare SQL Server and PostgreSQL/PostGIS, the new OperationContract includes the ability to switch from one to the other.
   [OperationContract]
   public List<string> GetGeom(string dbtype, string table, string bbox, string layer)

dbtype can be ‘PostGIS’ or ‘SQL Server’. I also refactored to allow selection of table and layer as well as a single string bbox.

Morten Nielsen has a blog post on using npgsql here: SharpMap Shp2pgsql. He used npgsql to create a nice little import tool to load shp format data into PostGIS, with a GUI similar to the SQL Server shp loader. I used it on the SF Bay test data after adding the necessary System.Int64 type to the Type2PgType list and also to Type2NpgsqlType. It then worked fine for loading my three simple test tables.

I can use npgsql to open a connection and add a query command for a PostGIS database:
   NpgsqlConnection conn = new NpgsqlConnection("Server=127.0.0.1;Port=5432;User Id=your id;Password=your password;Database=TestShp;");
   conn.Open();
   NpgsqlCommand command = new NpgsqlCommand("Select *, AsText(the_geom) as geom from \"" + table + "\" WHERE the_geom && SetSRID('BOX3D(" + bbox + ")'::box3d,4326);", conn);

Using the npgsql reader lets me process through the query results one record at a time:
  NpgsqlDataReader rdr = command.ExecuteReader();
  while (rdr.Read()) {…}

which leads to a decision. SQL Server lets me grab a geom field directly into a geography data type:
   SqlGeography geo = (SqlGeography)rdr["geom"];
but npgsql has no equivalent convenience. Rather than work out the WKB stream, I took the easy way out and converted the geom in the PostGIS query like this: AsText(the_geom) as geom
Now I can take the text version of the geometry and put it back together as a result string for my Silverlight MapControl svc_GetGeomCompleted.

[OperationContract]
public List<string> GetGeom(string dbtype, string table, string bbox, string layer)
{
    List<string> records = new List<string>();

    if (dbtype.Equals("SQL Server"))
    {
        SqlServer ss = new SqlServer(layer, table, bbox);
        records = ss.Query();
    }
    else if (dbtype.Equals("PostGIS"))
    {
        PostGIS pg = new PostGIS(layer, table, bbox);
        records = pg.Query();
    }
    return records;
}

Listing 1 – Data Service Operation Contract

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using Npgsql;
using System.Text.RegularExpressions;
using System.Text;
using System.Configuration;

namespace GeoTest.Web
{
 public class PostGIS
 {
  private string _layer;
  private string _bbox;
  private string _table;

  public PostGIS(string layer, string table,string bbox )
  {
   this._layer = layer;
   this._table = table;
   this._bbox = bbox;
  }

  public List<string> Query()
  {
    StringBuilder sb;
    List<string> records = new List<string>();
    string[] fields;
    string geomType;
    string connStr = ConfigurationManager.AppSettings["PGconnecttionStr"];
    NpgsqlConnection conn = new NpgsqlConnection(connStr);
    conn.Open();
    NpgsqlCommand command = new NpgsqlCommand("Select *, asText(the_geom) as geom from \"" +
 _table + "\" WHERE the_geom && SetSRID('BOX3D(" + _bbox + ")'::box3d,4326);", conn);
    try
    {
     NpgsqlDataReader rdr = command.ExecuteReader();
     while (rdr.Read())
     {
      sb = new StringBuilder(_layer + ";");
      for (int i = 0; i < rdr.FieldCount - 2; i++)
      {
       sb.Append(rdr[i].ToString() + ";");
      }
      fields = Regex.Split(rdr["geom"].ToString(), @"\(+(.*?)\)+");
      geomType = fields[0];
      switch (geomType) {
       case "POINT":
        {
         sb.Append((fields[1].Split(' '))[1]);//latitude
         sb.Append(" " + (fields[1].Split(' '))[0]);//longitude
         break;
        }
       case "LINESTRING":
       case "MULTILINESTRING":
        {
         string[] pts = fields[1].Split(',');
         for (int i = 0; i < pts.Length; i++)
         {
          if (i > 0) sb.Append(",");
          sb.Append((pts[i].Split(' '))[1]);//latitude
          sb.Append(" " + (pts[i].Split(' '))[0]);//longitude
         }
         break;
        }
      }

      records.Add(sb.ToString());
     }
     rdr.Close();
    }
    catch (Exception e)
    {
     records.Add(e.Message);
    }
    conn.Close();
    return records;
   }
 }
}

Listing 2 – PostGIS class with Query Result as Text

This works after a fashion, as long as MultiLineStrings are really just simple LineStrings, along with other simplifications, but it would be much better to have a generalized OGC Simple Feature reader as well as a more efficient WKB reader. I am too lazy to work out all of that, so it is better to look around for a solution. In Java the wkb4j project is useful, but I didn’t find an equivalent for C#. The npgsql types list features like point, path, and polygon, but these are PostgreSQL types, not the more useful OGC Simple Features found in PostGIS geometries. However, this port of JTS, the Java Topology Suite, to C# is useful: Net Topology Suite

Code  Geometry Type       Coordinates
0     GEOMETRY            X,Y
1     POINT               X,Y
2     LINESTRING          X,Y
3     POLYGON             X,Y
4     MULTIPOINT          X,Y
5     MULTILINESTRING     X,Y
6     MULTIPOLYGON        X,Y
7     GEOMETRYCOLLECTION  X,Y

Table 1: OGC Simple Feature Geometry type codes (not all features)

Modifying the PostGIS class to take advantage of Net Topology Suite’s WKB and WKT readers results in this improved version:

using System;
using System.Collections.Generic;
using System.Configuration;
using System.Text;
using GeoAPI.Geometries;
using GisSharpBlog.NetTopologySuite.IO;
using Npgsql;

namespace GeoTest.Web
{
  public class PostGIS
  {
    private string _layer;
    private string _bbox;
    private string _table;

    public PostGIS(string layer, string table, string bbox)
    {
      this._layer = layer;
      this._table = table;
      this._bbox = bbox;
    }

    public List<string> Query(bool wkb)
    {
      StringBuilder sb;
      List<string> records = new List<string>();
      string connStr = ConfigurationManager.AppSettings["PGconnecttionStr"];
      NpgsqlConnection conn = new NpgsqlConnection(connStr);
      conn.Open();
      NpgsqlCommand command;

      if (wkb) command = new NpgsqlCommand("Select *, AsBinary(the_geom) as geom from \"" +
_table + "\" WHERE the_geom && SetSRID('BOX3D(" + _bbox + ")'::box3d,4326);", conn);
      else command = new NpgsqlCommand("Select *, AsText(the_geom) as geom from \"" +
_table + "\" WHERE the_geom && SetSRID('BOX3D(" + _bbox + ")'::box3d,4326);", conn);
      try
      {
        NpgsqlDataReader rdr = command.ExecuteReader();
        while (rdr.Read())
        {
          IGeometry g;
          sb = new StringBuilder(_layer + ";");
          //concatenate other fields
          for (int i = 0; i < rdr.FieldCount - 2; i++)
          {
            sb.Append(rdr[i].ToString() + ";");
          }

          if (wkb)
          {
            WKBReader binRdr = new WKBReader();
            Byte[] geoBuf = new Byte[Convert.ToInt32((rdr.GetBytes(rdr.FieldCount - 1,
 0, null, 0, Int32.MaxValue)))];
            long cnt = rdr.GetBytes(rdr.FieldCount - 1, 0, geoBuf, 0, geoBuf.Length);
            g = binRdr.Read(geoBuf);
          }
          else //WKT
          {
            WKTReader wktRdr = new WKTReader();
            g = wktRdr.Read(rdr["geom"].ToString());
          }
          switch (g.GeometryType.ToUpper())
          {
            case "POINT":
              {
                sb.Append(g.Coordinate.Y);//latitude
                sb.Append(" " + g.Coordinate.X);//longitude
                break;
              }
            case "LINESTRING":
            case "MULTILINESTRING":
              {
                for (int i = 0; i < g.Coordinates.Length; i++)
                {
                  if (i > 0) sb.Append(",");
                  sb.Append(g.Coordinates[i].Y);//latitude
                  sb.Append(" " + g.Coordinates[i].X);//longitude
                }
                break;
              }
            case "POLYGON":
            case "MULTIPOLYGON":
              {
                for (int i = 0; i < g.Coordinates.Length; i++)
                {
                  if (i > 0) sb.Append(",");
                  sb.Append(g.Coordinates[i].Y);//latitude
                  sb.Append(" " + g.Coordinates[i].X);//longitude
                }
                break;
              }
          }
          records.Add(sb.ToString());
        }
        rdr.Close();
      }
      catch (Exception e)
      {
        records.Add(e.Message);
      }
      conn.Close();
      return records;
    }
  }
}

Listing 3 – PostGIS class improved using Net Topology Suite

Summary

Judging by a rough sense of response, the performance in best-to-worst order looks like this:

  1. PostGIS WKB
  2. PostGIS WKT
  3. MS SQL Server 2008

No real surprise here. This isn’t a precise comparison, but it does give me a feel for response using these three approaches. Once an OR/M is available for the geography data type, it will be interesting to see how a Linq OR/M-generated version compares. All versions could be improved by using Base64 encoding to reduce the latency of pushing query results to the client view; a sketch of that idea follows. An OR/M-generated version should also handle the DataService export to clients more efficiently.
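To make the Base64 idea concrete, here is a small round-trip sketch using the Net Topology Suite readers and writers already referenced above (the point coordinates are arbitrary): the service would ship the WKB bytes as a Base64 string, and the client would decode and rebuild the geometry.

using System;
using GeoAPI.Geometries;
using GisSharpBlog.NetTopologySuite.IO;

public class Base64GeomDemo
{
    public static void Main()
    {
        // service side: write compact WKB and encode it for transport
        IGeometry g = new WKTReader().Read("POINT (-122.4 37.8)");
        string payload = Convert.ToBase64String(new WKBWriter().Write(g));

        // client side: decode and rebuild the geometry
        IGeometry back = new WKBReader().Read(Convert.FromBase64String(payload));
        Console.WriteLine(back); // POINT (-122.4 37.8)
    }
}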

Both npgsql and the .Net Topology Suite will be useful to anyone adapting .NET C# code for use with spatial databases, especially given the ubiquity of OGC Simple Feature types.