Open up that data, Cloud Data

 James Fee looks at AWS data and here is the Tiger .shp snapshot James mentions: Amazon TIGER snapshot
More details here: Tom MacWright

Too bad it is only Linux/Unix since I’d prefer to attach to a Windows EC2. TIGER is there as raw data files ready to attach to your choice of Linux EC2. As is Census data galore.

But why not look further?  It’s interesting to think about other spatial data out in the Cloud.

Jeffrey Johnson adds a comment to spatially adjusted about OSM with the question – what form a pg_dump or a pg database? This moves a little beyond raw Amazon public data sets.

Would it be possible to provide an EBS volume with data already preloaded to PostGIS? A user could then attach the EBS ready to use. Adding a middle tier WMS/WFS like GeoServer or MapServer can tie together multiple PG sources, assuming you want to add other pg databases.

Jeffrey mentions one caveat about the 5GB S3 limit. Does this mark the high end of a snapshot requiring modularized splitting of OSM data? Doesn’t sound like S3 will be much help in the long run if OSM continues expansion.

What about OpenAerial? Got to have more room for OpenAerial and someday OpenTerrain(LiDAR)!
EBS – volumes from 1 GB to 1 TB. Do you need the snapshot (only 5GB) to start a new EBS? Can this accommodate OpenAerial tiles, OpenLiDAR X3D GeoElevationGrid LOD. Of course we want mix and match deployment in the Cloud.

Would it be posible for Amazon to just host the whole shebang? What do you think Werner?

Put it out there as an example of an Auto Scaling, Elastic Load Balancing OSM, OpenAerial tile pyramids as CloudFront Cache, OpenTerrain X3D GeoElevationGrid LOD stacks. OSM servers are small potatoes in comparison. I don’t think Amazon wants to be the Open Source Google, but with Google and Microsoft pushing into the Cloud game maybe Amazon could push back a little in the map end.

I can see GeoServer sitting in the middle of all this data delight handing out OSM to a tile client where it is stacked on OpenAerial, and draped onto OpenTerrain. Go Cloud, Go!

Silverlight MapControl and PostGIS


PostGIS Silverlight MapControl
Fig 1 – PostGIS and Silverlight MapControl

It is worth revisiting connecting Silverlight MapControl to a database in order to try duplicating the SQL Server layers in PostgreSQL/PostGIS.

Why?
SQL Server 2008 is still lacking some of the spatial capabilities found in PostGIS and also a rough performance comparison between the two databases would be interesting.

The same process is used in accessing a database from inside the Silverlight MapController for either SQL Server or PostGIS. A Data service is used with an [OperationContract] to expose the results of a DB query to the Silverlight through a Service Reference on the Silverlight side. From there it is a matter of using Silverlight MapController events to send a request to the service and process the results into shapes on the map.

The main difference is the code to handle the PostGIS connection, command query, and result reader. Microsoft includes these for SQL Server in the System.Data.SqlClient namespace as part of a default project setup. Naturally PostGIS doesn’t happen to be in the Microsoft System.Data.dll, however, there is a nice project that provides access to PostgreSQL inside .NET: Npgsql – .Net Data Provider for Postgresql

I downloaded Npgsql version 2.04 binaries and referenced it into my previous GeoTest. Now I can add a new [OperationContract] to the existing SFBayDataService that uses an Npgsql connection to PostgreSQL. Since it would be interesting to compare SQL Server and PostgreSQL/PostGIS, my new OperationContract includes the ability to switch from one to the other.
   [OperationContract]
  public List<string> GetGeom(string dbtype, string table, string bbox, string layer)

dbtype can be ‘PostGIS’ or ‘SQL Server’. I also refactored to allow selection of table and layer as well as a single string bbox.

Morten Nielsen has a blog on using npgsql here: SharpMap Shp2pgsql He used npgsql to create a nice little import tool to load shp format data into PostGIS with a GUI similar to the SQL Server shp loader. I used it on the SF Bay test data after modifying by adding a necessary System.Int64 type to the Type2PgType list and also to the Type2NpgsqlType. It then worked fine for loading my three simple test tables.

I can use npgsql to open a connection and add a query command for a PostGIS database:
   NpgsqlConnection conn = new NpgsqlConnection(“Server=127.0.0.1;Port=5432;User Id=your id;Password=your password;Database=TestShp;”);
   conn.Open();
   NpgsqlCommand command = new NpgsqlCommand(“Select *, asText(the_geom) as geom from \”" + table + “\” WHERE the_geom && SetSRID(‘BOX3D(” + bbox + “)’::box3d,4326);”, conn);

Using the npgsql reader lets me process through the query results one record at a time:
  NpgsqlDataReader rdr = command.ExecuteReader();
  while (rdr.Read()) {…}

which leads to a decision. SQL Server lets me grab a geom field directly into a geography data type:
   SqlGeography geo = (SqlGeography)rdr["geom"];
but npgsql has no equivalent convenience. Rather than work out the WKB stream I took the easy way out and converted geom on the PostGIS query like this: asText(the_geom) as geom
Now I can take the text version of the geometry and put it back together as a result string for my Silverlight MapControl svc_GetGeomCompleted.

[OperationContract]
public List GetGeom(string dbtype, string table, string bbox, string layer)
{
    List records = new List();

    if (dbtype.Equals("SQL Server"))
    {
        SqlServer ss = new SqlServer(layer, table, bbox);
        records = ss.Query();
     }
    else if (dbtype.Equals("PostGIS"))
    {
        PostGIS pg = new PostGIS(layer, table, bbox);
        records = pg.Query();
    }
    return records;
}

Listing 1 – Data Service Operation Contract

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using Npgsql;
using System.Text.RegularExpressions;
using System.Text;
using System.Configuration;

namespace GeoTest.Web
{
 public class PostGIS
 {
  private string _layer;
  private string _bbox;
  private string _table;

  public PostGIS(string layer, string table,string bbox )
  {
   this._layer = layer;
   this._table = table;
   this._bbox = bbox;
  }

  public List Query()
  {
    StringBuilder sb;
    List records = new List();
    string[] fields;
    string geomType;
    string connStr = ConfigurationManager.AppSettings["PGconnecttionStr"];
    NpgsqlConnection conn = new NpgsqlConnection(connStr);
    conn.Open();
    NpgsqlCommand command = new NpgsqlCommand("Select *, asText(the_geom) as geom from \"" +
 _table + "\" WHERE the_geom && SetSRID('BOX3D(" + _bbox + ")'::box3d,4326);", conn);
    try
    {
     NpgsqlDataReader rdr = command.ExecuteReader();
     while (rdr.Read())
     {
      sb = new StringBuilder(_layer + ";");
      for (int i = 0; i < rdr.FieldCount - 2; i++)
      {
       sb.Append(rdr[i].ToString() + ";");
      }
      fields = Regex.Split(rdr["geom"].ToString(), @"\(+(.*?)\)+");
      geomType = fields[0];
      switch (geomType) {
       case "POINT":
        {
         sb.Append((fields[1].Split(' '))[1]);//latitude
         sb.Append(" " + (fields[1].Split(' '))[0]);//longitude
         break;
        }
       case "LINESTRING":
       case "MULTILINESTRING":
        {
         string[] pts = fields[1].Split(',');
         for (int i = 0; i < pts.Length; i++)
         {
          if (i > 0) sb.Append(",");
          sb.Append((pts[i].Split(' '))[1]);//latitude
          sb.Append(" " + (pts[i].Split(' '))[0]);//longitude
         }
         break;
        }
      }

      records.Add(sb.ToString());
     }
     rdr.Close();
    }
    catch (Exception e)
    {
     records.Add(e.Message);
    }
    conn.Close();
    return records;
   }
 }
}

Listing 2 – PostGIS class with Query Result as Text

This works after a manner, as long as MultiLineStrings are really just simple LineStrings along with other simplifications, but it would be much better to have a generalized OGC Simple Feature reader as well as a more efficient WKB reader. I am too lazy to work out all of that, so it is better to look around for a solution. In java the wkbj4 project is useful, but I didn’t find an equivalent wkb4c#. The npgsqltypes list features like point, path, polygon, but these are PostgreSQL types not the more useful OGC Simple Features found in PostGIS geometries. However, this port of JTS, Java Topology Suite, to C# is useful: Net Topology Suite

Code Geometry Type Coordinates
0 GEOMETRY X,Y
1 POINT X,Y
2 LINESTRING X,Y
3 POLYGON X,Y
4 MULTIPOINTe X,Y
5 MULTILINESTRING X,Y
6 MULTIPOLYGON X,Y
7 GEOMCOLLECTION X,Y

Table 1: OGC Simple Feature Geometry type codes (not all features)

Modifying the PostGIS class to take advantage of Net Topology Suite’s WKB and WKT readers results in this improved version:

using System;
using System.Collections.Generic;
using System.Configuration;
using System.Text;
using GeoAPI.Geometries;
using GisSharpBlog.NetTopologySuite.IO;
using Npgsql;

namespace GeoTest.Web
{
  public class PostGIS
  {
    private string _layer;
    private string _bbox;
    private string _table;

    public PostGIS(string layer, string table, string bbox)
    {
      this._layer = layer;
      this._table = table;
      this._bbox = bbox;
    }

    public List Query(bool wkb)
    {
      StringBuilder sb;
      List records = new List();
      string connStr = ConfigurationManager.AppSettings["PGconnecttionStr"];
      NpgsqlConnection conn = new NpgsqlConnection(connStr);
      conn.Open();
      NpgsqlCommand command;

      if (wkb) command = new NpgsqlCommand("Select *, AsBinary(the_geom) as geom from \"" +
_table + "\" WHERE the_geom && SetSRID('BOX3D(" + _bbox + ")'::box3d,4326);", conn);
      else command = new NpgsqlCommand("Select *, AsText(the_geom) as geom from \"" +
_table + "\" WHERE the_geom && SetSRID('BOX3D(" + _bbox + ")'::box3d,4326);", conn);
      try
      {
        NpgsqlDataReader rdr = command.ExecuteReader();
        while (rdr.Read())
        {
          IGeometry g;
          sb = new StringBuilder(_layer + ";");
          //concatenate other fields
          for (int i = 0; i < rdr.FieldCount - 2; i++)
          {
            sb.Append(rdr[i].ToString() + ";");
          }

          if (wkb)
          {
            WKBReader binRdr = new WKBReader();
            Byte[] geoBuf = new Byte[Convert.ToInt32((rdr.GetBytes(rdr.FieldCount - 1,
 0, null, 0, Int32.MaxValue)))];
            long cnt = rdr.GetBytes(rdr.FieldCount - 1, 0, geoBuf, 0, geoBuf.Length);
            g = binRdr.Read(geoBuf);
          }
          else //WKT
          {
            WKTReader wktRdr = new WKTReader();
            g = wktRdr.Read(rdr["geom"].ToString());
          }
          switch (g.GeometryType.ToUpper())
          {
            case "POINT":
              {
                sb.Append(g.Coordinate.Y);//latitude
                sb.Append(" " + g.Coordinate.X);//longitude
                break;
              }
            case "LINESTRING":
            case "MULTILINESTRING":
              {
                for (int i = 0; i < g.Coordinates.Length; i++)
                {
                  if (i > 0) sb.Append(",");
                  sb.Append(g.Coordinates[i].Y);//latitude
                  sb.Append(" " + g.Coordinates[i].X);//longitude
                }
                break;
              }
            case "POLYGON":
            case "MULTIPOLYGON":
              {
                for (int i = 0; i < g.Coordinates.Length; i++)
                {
                  if (i > 0) sb.Append(",");
                  sb.Append(g.Coordinates[i].Y);//latitude
                  sb.Append(" " + g.Coordinates[i].X);//longitude
                }
                break;
              }
          }
          records.Add(sb.ToString());
        }
        rdr.Close();
      }
      catch (Exception e)
      {
        records.Add(e.Message);
      }
      conn.Close();
      return records;
    }
  }
}

Listing 3 – PostGIS class improved using Net Topology Suite

Summary

Using just a rough response sense, the peroformance in best to worst order looks like this:

  1. PostGIS WKB
  2. PostGIS WKT
  3. MS SQL Server 2008

No real surprise here. This isn’t a precise comparison, but it does help me get a feel for response using these three approaches. Once OR/M is available for the geography data type, it will be of interest to see how a Linq OR/M generated version compares. All versions could be improved by using Base64 encoding to reduce the latency of pushing query results to the client view. An OR/M generated version should handle the DataService export to clients more efficiently.

Both npgsql and the .Net Topology Suite will be useful to anyone adapting .net c# code for use with spatial databases, especially considering the ubiquity of OGC Simple Feature types.