I recently had to work with spatial data, and I did so using MS SQL Server to do the spatial analysis. I learned a few things about spatial data in the process.
Spatial data broadly comes in two varieties geographic (e.g. latitudes and longitudes) and geometry (e.g. x and y). Geometric data is fairly straightforward but geographic data is more complicated, and that's what I'm talking about below.
Datums, Projections and SRIDs
Firstly, not all latitudes and longitudes are the same. There are different types of datum, which is the name for the co-ordinate system for mapping latitudes and longitudes onto a sphere (or elipsoid or whatever). The upshot of this is that a lat/long in one datum won't quite map onto the same point as in another.
The reason for this is that some systems might historically have been chosen because they work well for a particular area of the world. An example of this is the OSGB36 datum, which was chosen by Ordinance Survey in the UK because it works well for the particular part of the globe where the United Kingdom sits.
However, these days the most common example of datum is WGS84, which works uniformly well across the globe, and is used by GPS systems.
Secondly, some geographic data has been projected. This means the latitude and longitudes have been mapped from spherical onto a 'flat' surface, like a rectangle, as pseudo x, y geometric data. An example of this is the UK ordinance survey national grid. This is a 'tranverse mercator' projection of the OSGB36 datum. Instead of having units of degrees, it has units of meters.
The combination of the datum, projection and other factors (for instance the units) for a particular set of spatial data is captured in a single ID called the SRID (Spatial Reference System Identifier). For instance, for the unprojected WGS84 datum with degrees as the unit, the SRID is 4326. And the SRID for the UK National Grid (projected OSGB36) with the unit of meters is 1936.
The upshot of all this is that when you're pulling different bits of geographic data together, you need to make sure they're all in (or convert them to) the same SRID before you can work with them. Plus you need to know what the SRID of any output results is.
Fileformat
Add to this that there isn't a universally accepted file format for geographic data. Sometimes it comes as a csv of lats and longs, or projected x and y's. Often, however, geographic data comes in the form of a shapefile. This is a binary file format, which has arisen from a popular piece of geospatial software called ArcGIS by ESRI.
The problem with this is that if you're not working in ArcGIS (or something that can take shapefiles), then you going to need to convert it to something more straightforward before you can move on. Plus, if you've got different SRIDs for your different data sets, you're going to need to convert them all to the same type (probably the ubiquitous 4326).
Loading Shapefiles
To load shapefile data into SQL server as geographic data, I found the following very helpful tool called 'SQL Server Spatial tools' here.
This let's you select a shapefile, auto-identifies the SRID and spatial datatype, and connects and uploads the data straight to your SQL database for you. Easy.
Converting
A lot of spatial software will allow you to load in geographic data and convert between SRIDs. However I was using SQL server geographic data types to do my analysis. While SQL understands the concept of SRIDs and won't let you combine geographic data from two different SRIDs, it also won't let you convert between them.
Luckily, at this point I found the super flexible ogr2ogr tool, which is part of the open source GDAL suite. This command line tool allows you to convert shapefiles between SRIDs very efficiently. Also, having spoken to some guys that use ArcGIS to do these conversions, I have reason to believe this tool does the job signficantly quicker.
Once I had converted my shapefile to the correct SRID, I was able to re-upload it to SQL, and complete my analysis!
(I'll write another post soon to demonstrate a few SQL spatial joins!).