SHP, Shapefiles

The most ubiquitous of living fossil GIS formats, ESRI's SHP format, also known as "shape format" or "shapefiles," was originally used with ESRI's ArcView to save vector data in the early 1990s.  When ESRI published SHP format as a written standard, shapefiles became the most popular format for interchange of vector data in GIS.  Despite the antique limitations of the format, shapefiles continue to be used for data interchange.

 

See Example: Import Shapefile and Create a Map for a step by step process to import a shapefile and to create a map.

 

See Example: Import a Shapefile for an example of importing a shapefile that does not have a projection specified.

 

To import a drawing from SHP format:

 

  1. Choose File - Import from the main menu.

  2. In the Import dialog, browse to the location of the file and double-click on the .shp file desired.  

  3. If the shapefile ensemble did not include a .prj file, manually specify the projection (coordinate system) used by the shapefile by opening the drawing and by launching Assign Initial Coordinate System in the Info pane.

 

Shapefiles created by older systems that did not automatically add a .prj file to the shapefile ensemble have no way of telling Manifold what projection they are in, so immediately upon import we must specify the projection manually by launching Assign Initial Coordinate System in the Info pane.

 

Shapefiles unaccompanied by a .prj will often be in Latitude / Longitude projection.  To specify Latitude / Longitude projection as the initial projection we open the drawing and launch Assign Initial Coordinate System in the Info pane.  We choose Latitude / Longitude projection and we are done.

 

When importing a shapefile, text field values that are all spaces are imported as NULLs.

 

To export a drawing to SHP format:

 

  1. Open the drawing in a drawing window.

  2. Choose File - Export from the main menu.  We can also right click on a drawing in the Project pane and choose Export in the context menu.

  3. In the Export dialog choose SHP Files in the Save as type box and specify a File name to use.

  4. Press Save.

 


Linking Shapefiles

Linking a shapefile leaves the data in a shapefile.  The main reason we might want to do that instead of simply importing data from the shapefile is if we will have to share that shapefile with someone else who does not have Manifold, so we want to leave the data in shapefile format and we do not want to have to remember to export any changes we make to a fresh shapefile.   Linking a shapefile takes the same time as importing a shapefile, because Manifold behind the scenes will import the entire shapefile to build a local cache and a local, high-speed index.   That will give subsequent work with that shapefile performance as if the shapefile had been imported, but it will cost a lot of time during the initial File - Link process.

 

To link a file:

 

  1. Choose File - Link from the main menu.

  2. In the Link dialog, browse to the location of the file and double-click on the .shp file desired.  

  3. If the shapefile ensemble did not include a .prj file, manually specify the projection (coordinate system) used by the shapefile by opening the drawing and by launching Assign Initial Coordinate System in the Info pane.  

 

Using File - Link to link a shapefile will create a local cache for the shapefile.  To avoid creating a cache, use File - Create - New Data Source to link the shapefile.

About Shapefiles

A "shapefile" is not just one file but usually consists of three similarly named files with differing extensions: a .shp, a .shx and a .dbf file.  Even though there are three files involved, almost all GIS people refer to the set of three files using the singular term shapefile.  The .dbf file is a dBase format database file that stores data attributes for the drawing.  The .dbf part of shapefiles is even older than ArcView and dates back to 1979.

 

For all the limitations of shapefiles, the format remains ubiquitous in GIS.  SHP is not a bad choice for a least common denominator method of exchanging data if the data is simple enough to fit within the limitations of shapefiles.  On the plus side, SHP is widely supported and it is a reasonably fast format, faster for "in place" editing than other old vector formats such as DXF, MapInfo MID/MIF or GML/KML.

 

Manifold therefore reads and writes shapefiles, using a variety of strategies when exporting data into shapefiles to dumb down modern data to fit into the limitations imposed by SHP format.    

 

The main limitations of shapefiles are:

 

 

Many applications fail to honor the above limitations so the world is full of nonstandard "shapefiles" which cannot be read correctly by applications which adhere to the standard.   All the same, some extensions to shapefile format, such as the use of .prj files to specify projections or the use of .cpg files to indicate character set encoding within the .dbf file, have become so common they can more or less safely be used.

Multipatch Features

A multipatch feature is an object that stores a collection of patches that represent the boundary of a 3D object as a single row in a table.  Patches can be triangles, triangle fans, triangle strips, or (as ESRI calls inner/outer closed boundaries of areas) rings. Given the antiquity of shapefiles, multipatch geometry in shapefiles is fairly rare, and usually represents rings as used to define area objects.

 

The Manifold shapefile dataport reads multipatch geometry values, automatically converting them into line object geometry, a natural fit to the use of multipatch ring definitions, and a reasonable way to provide at least some representation of what could be a 3D object in 2D form for patch geometries other than rings.

 

Importing a shapefile containing multipatch geometry will import the multipatch boundaries of objects as boundary line objects, basically the footprint/bounding line of the multipatches.

 

Linking a shapefile that contains multipatch geometry values will show the geometry field containing multipatch values as a read-only field.  For the same reason ESRI support states "Multipatch features cannot be created interactively using the standard editing user interface," Manifold also does not allow direct edits of a multipatch geometry field.

 

Manifold does not export multipatch geometry, since there is no equivalent Manifold 3D surface type or TIN.

How Manifold Exports Shapefiles

Manifold honors the shapefile standard and deals with the above limitations as follows:

 

File size - Shapefile format as currently used by ESRI allows going up to 4 GB, but many software packages stop working sooner, at 2 GB, the traditional limit.   Exporting a drawing to a SHP file automatically starts a new SHP file if the amount of exported data exceeds 2 GB.

 

Data types - On export, Manifold automatically will convert modern types into simplified representations that can be stored in a shapefile.   For example,  variable-length text data is exported as fixed-length text, since various third party programs do not seem to be able to handle memo fields.  Floating point types will be converted into text.  The conversion can involve data loss, for example as will happen when truncating a long, variable-length text value into a fixed length character field.

 

Projections - When Manifold exports a shapefile it will always add a PRJ file to the ensemble that specifies the projection the shapefile uses.   Manifold also creates a .mapmeta file, for use by any Manifold Release 9 or later installation that imports the shapefile, that provides precise coordinate system information in JSON format.

 

Tech Tip:  For maximum interoperability, do not export projected drawings to shapefiles.  Use Latitude / Longitude with WGS 84 datum, and do NOT use scale factors other than 1 or any offsets.  Keep field names short and the name of the drawing short.  Manifold can read shapefiles with longer names, but other software might be limited to the original shapefile spec.

 

Character encoding - When Manifold exports a shapefile it will write character (text) data into the DBF part of the ensemble as UTF-8, that is, as Unicode, and will add a CPG file, for the use of third-party products that can utilize CPG files, describing the encoding within the DBF as UTF-8.

 

Text field length optimization - DBF files cannot store text fields longer than about 250 characters.  When Manifold exports a DBF file (including as part of a shapefile ensemble) the system computes the maximum length of the values in each text field, rounds that length up to a multiple of 8 to leave room for future edits, and uses the result as the DBF field length.

 

Text value truncation - Exporting data to a shapefile automatically truncates text values longer than 250 characters to 250 characters, the maximum allowed.  When text values are truncated, the log window reports the names of the fields containing truncated values as well as the total number of truncated values.
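
The two text rules above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not Manifold's implementation: the function names are hypothetical, lengths are counted in characters rather than in UTF-8 bytes, a length already on a multiple of 8 is kept as is, and the rounded length is capped at the 250 character limit quoted above.

```python
MAX_DBF_TEXT = 250  # limit quoted in the text above


def dbf_field_length(values, multiple=8, limit=MAX_DBF_TEXT):
    """Length of the longest value, rounded up to a multiple of 8, capped at limit."""
    longest = max((len(v) for v in values if v is not None), default=0)
    # Ceiling division rounds up; an exact multiple is left unchanged (assumption).
    rounded = -(-longest // multiple) * multiple
    # Give an all-empty field a nonzero width (assumption), and never exceed the limit.
    return min(max(rounded, multiple), limit)


def truncate_values(values, limit=MAX_DBF_TEXT):
    """Return (values cut to limit, number of values that had to be cut)."""
    out, cut = [], 0
    for v in values:
        if v is not None and len(v) > limit:
            out.append(v[:limit])
            cut += 1
        else:
            out.append(v)
    return out, cut
```

For a field whose longest value has 10 characters the sketch yields a field length of 16, and the count returned by truncate_values corresponds to the total that would be reported in the log.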

 

File and Field Names - Manifold will automatically truncate field names into the limited forms allowed by shapefiles and will eliminate spaces and other disallowed characters.  For example, a field name called Highest Z-value (meter) in a Manifold drawing's table will be converted into a field called HighestZva in the shapefile's DBF.  Manifold allows longer file names.
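
The field name conversion in the example above can be approximated as follows.  This is a minimal sketch, assuming that the allowed characters are ASCII letters, digits and underscores and that the limit is ten characters; the exact rules Manifold applies are not spelled out in this topic, and the function name is hypothetical.

```python
import re


def shp_field_name(name, max_len=10):
    """Drop spaces and disallowed characters, then cut to max_len characters."""
    # Assumption: only ASCII letters, digits and underscores are kept.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "", name)
    return cleaned[:max_len]


print(shp_field_name("Highest Z-value (meter)"))  # HighestZva
```

The sketch does not deal with two different field names that shorten to the same ten characters, which a real exporter must resolve in some way.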

 

Z Values - Exporting areas with Z values to SHP retains Z values.

 

Object types -  Manifold drawings can contain a mixture of areas, lines and points along with curvilinear objects.   When a Manifold drawing that contains a mix of areas, lines and points is exported to shapefile format Manifold creates three sets of shapefiles:  shapefiles for the areas, shapefiles for the lines and shapefiles for the points.  Curvilinear objects are interpolated into the area or line equivalents.  

 

When exporting Manifold drawings containing objects of only one type (only areas or only lines or only points) to shapefiles no postfixes will be appended to the filename. When Manifold drawings contain more than one type of object, Manifold will create a file with no postfix for the areas and will then create files with _lines and _points postfixes to indicate which shapefiles contain lines and points.
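
The naming rule above can be sketched as a small function.  The function name and the "areas" / "lines" / "points" keys are hypothetical, and the sketch assumes that the _lines and _points postfixes are used whenever more than one object type is present, even if the drawing contains no areas; the text above only describes the case where areas are present.

```python
def shapefile_names(drawing, kinds):
    """Map each object type present in the drawing to the .shp file it is written to."""
    present = [k for k in ("areas", "lines", "points") if k in kinds]
    if len(present) == 1:
        # A single object type: no postfix is appended.
        return {present[0]: f"{drawing}.shp"}
    postfix = {"areas": "", "lines": "_lines", "points": "_points"}
    return {k: f"{drawing}{postfix[k]}.shp" for k in present}


print(shapefile_names("Monaco", {"areas", "lines", "points"}))
```

For a drawing called Monaco with all three object types this gives Monaco.shp, Monaco_lines.shp and Monaco_points.shp.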

 

Multipoints - Exporting a drawing to SHP writes points with a single coordinate and points with multiple coordinates into separate files.  Some third party software that claims to read shapefiles cannot handle multipoints, so Manifold separates points and multipoints into different files.  This avoids the avoidable issues with such software:  a product that cannot handle multipoints will still not be able to work with them, but at least all other objects will be available for use.

 

Dealing with the limitations some software has on object types and multipoints is not easy.  Even as well-respected a package as the open source GDAL/OGR library does not deal with them automatically.  As the GDAL documentation notes: "ESRI shapefiles can only store one kind of geometry per layer (shapefile).  [...] Note that this can make it very difficult to translate a mixed geometry layer from another format into Shapefile format using ogr2ogr, since ogr2ogr has no support for separating out geometries from a source layer."

Incompatibilities

In addition to the fundamental limitations designed into shapefiles there are various incompatibilities that arise when shapefiles are used in modern settings.  The most common are:

 

 

Manifold manages the above incompatibilities as follows:

 

DBF drivers - Manifold does not rely on a third party DBF driver.  Instead, Manifold uses a special, Manifold-written DBF driver within Manifold's SHP dataport that is used only for reading and writing shapefiles.  The Manifold DBF driver can work around non-standard variations of DBF to extract as much information as possible.  When writing DBF,  Manifold tries to create a least common denominator DBF that can be read by as many shapefile reading packages as possible.

 

Editing incompatibilities - Manifold allows editing shapefiles "in place," with edits managed to avoid surprises when popular GIS packages import any shapefiles created or edited by Manifold.    For example, objects deleted during "in place" editing of a shapefile with Manifold will also be considered deleted when that shapefile is opened by ESRI products or by shapefile-using packages that employ the GDAL/OGR library to interact with shapefiles.  

 

Projection incompatibilities - Manifold reads the most common PRJ variations with a focus on correctly utilizing PRJ files created by ESRI products.    When exporting, Manifold writes an ESRI-style PRJ for shapefiles and also creates a .mapmeta file for each shapefile that writes the coordinate system information for each shapefile in JSON format.

   

Tech tip: Even though the JSON metadata will provide a highly precise and very "open" description of the coordinate system used, and even though Manifold PRJ files for shapefiles will do a really good job of conveying coordinate systems as best as any PRJ can do, it is still wise to follow the advice that experienced shapefile users have offered for over 25 years: do not use shapefiles to publish data in coordinate systems other than Latitude / Longitude.  The wise shapefile author always publishes shapefiles only in Latitude / Longitude "unprojected" form using degrees as a unit of measure with WGS84 as the base.  In addition, all attributes will have short names and the drawing name will be short, with all names eight characters or fewer in length, not beginning with a number and using no special characters.

 

There is no loss to doing so since any modern package that can read shapefiles can effortlessly reproject unprojected data into whatever coordinate system is desired.   There is no point introducing an interoperability risk from other coordinate systems when one can completely avoid such risk by publishing a shapefile using Latitude / Longitude projection.

Localization

Manifold text fields use Unicode, which is not supported by DBF files in the original shapefile standard, but which now can be handled by most shapefile-reading applications through the use of CPG files.   When Manifold exports a shapefile it will write character (text) data into the DBF part of the ensemble as UTF-8, that is, as Unicode, and will add a CPG file, for the use of third-party products that can utilize CPG files, describing the encoding within the DBF as UTF-8.

 

Importing a .dbf file (either by importing a table from a .dbf or by importing a drawing from a shapefile) will automatically translate text fields into Unicode.

Exporting Projected Shapefiles

Because SHP format does not capture projection information it is unwise to export projected drawings into SHP format. However, if for some reason we absolutely must export projected data we should keep in mind the raw nature of data in projected form and the options used to represent locations in projected coordinate systems.

 

For example, suppose we have a drawing in some metric projection that uses local offsets of 100, 100 and local scales of 10, 10. Suppose we have a point the coordinates of which are 1, 2 in this coordinate system. When exporting this drawing as a SHP, sometimes we may want the coordinate numbers locating the point in the SHP file to be 1, 2 and sometimes 110, 120.

 

The Manifold SHP exporter does not transform the coordinate numbers in any way, so Manifold will always export 1, 2 for the coordinates of the point. If desired, we can force Manifold to export 110, 120 by first reprojecting the drawing into the coordinate system using local offsets of 0 and local scales of 1.
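
The arithmetic behind the two sets of numbers can be written out as follows.  The formula, offset plus scale times the stored coordinate, is inferred from the numbers in the example above (100 + 10 x 1 = 110 and 100 + 10 x 2 = 120); the function name is hypothetical and the sketch is not part of Manifold.

```python
def to_unscaled(x, y, offset=(100.0, 100.0), scale=(10.0, 10.0)):
    """Apply local scale and local offset to stored coordinate numbers."""
    # Assumption: value = offset + scale * stored number, per axis.
    return (offset[0] + scale[0] * x, offset[1] + scale[1] * y)


print(to_unscaled(1, 2))  # (110.0, 120.0)
```

With local offsets of 0 and local scales of 1 the function returns the stored numbers unchanged, which is why reprojecting to such a coordinate system before export makes the exported numbers the fully expanded ones.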

 

Example: Export a Drawing to SHP Format  

Suppose we have a drawing called Monaco that contains a mix of points, lines and areas.   When we export the drawing to SHP we will create the following files.

 

For areas:

 

Monaco.dbf

Monaco.prj

Monaco.shp

Monaco.shp.mapmeta

Monaco.shx

 

For lines:

 

Monaco_lines.dbf

Monaco_lines.prj

Monaco_lines.shp

Monaco_lines.shp.mapmeta

Monaco_lines.shx

 

For points:

 

Monaco_points.dbf

Monaco_points.prj

Monaco_points.shp

Monaco_points.shp.mapmeta

Monaco_points.shx

 

The .prj file contains ESRI-style coordinate system information.  The .mapmeta files contain coordinate system information in JSON format.   For example, the Monaco.shp.mapmeta contains:

 

{ "CoordSystem": { "Base": "World Geodetic 1984 (WGS84)", "Eccentricity": 0.08181919084262149, "LocalScaleX": 0.0001, "LocalScaleY": 0.0001, "MajorAxis": 6378137, "Name": "Latitude \/ Longitude", "System": "Latitude \/ Longitude", "Unit": "Degree" } }
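
Because the .mapmeta file is plain JSON, other tools can read it without any Manifold software.  As a minimal sketch, the text shown above can be parsed with the json module from the Python standard library; the variable and function names are hypothetical, and in practice the text would be read from Monaco.shp.mapmeta rather than embedded in the script.

```python
import json

# Contents of Monaco.shp.mapmeta as shown above (raw string keeps the \/ escapes).
MAPMETA_TEXT = r'''{ "CoordSystem": { "Base": "World Geodetic 1984 (WGS84)", "Eccentricity": 0.08181919084262149, "LocalScaleX": 0.0001, "LocalScaleY": 0.0001, "MajorAxis": 6378137, "Name": "Latitude \/ Longitude", "System": "Latitude \/ Longitude", "Unit": "Degree" } }'''


def read_coord_system(text):
    """Return the CoordSystem object of a .mapmeta document as a dict."""
    return json.loads(text)["CoordSystem"]


cs = read_coord_system(MAPMETA_TEXT)
print(cs["Name"], cs["Unit"], cs["LocalScaleX"])
```

The \/ sequences are ordinary JSON escapes for a slash, so the parsed name is Latitude / Longitude.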

 

Notes

Longer file and field names - The original dBase package used no more than eight characters in field names and no more than eight characters in a file name plus the three letter extension.  Over the years so many applications, including dBase descendants, have used slightly longer names that the current consensus is field names should have no more than ten characters and that file names also can be longer.  Manifold therefore allows ten characters for field names and significantly longer names for file names.

 

Do not send only the .shp file - Despite the singular form of the word, a "shapefile" consists of at least three files, in modern times often four files if a PRJ file is written, and five files if a CPG file is written as well.  When Manifold exports to "shp" format it creates six files: a .dbf, a .shp and a .shx file to make up the classic three files required by the ESRI shape format definition, plus a fourth .prj file specifying the coordinate system in a customary, though not standard, way, plus a fifth .cpg file describing any text in the .dbf as UTF-8 Unicode, and finally a sixth .mapmeta file unique to Manifold products that provides a very open JSON format description of the coordinate system.  When providing the result of our export to someone else, we must not forget to provide at least the five customary files and not make the beginner's blunder of providing just the .shp file.
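
Before sending a shapefile to someone else, a quick check for missing members of the ensemble can save a round of email.  The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and the sketch assumes that the companion files share the .shp file's name and use lower case extensions.

```python
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")  # the classic three files
CUSTOMARY = (".prj", ".cpg")         # coordinate system and text encoding


def missing_parts(shp_path):
    """List the extensions of ensemble files that do not exist next to shp_path."""
    shp = Path(shp_path)
    # with_suffix swaps only the final extension, so Monaco.shp -> Monaco.dbf etc.
    return [ext for ext in REQUIRED + CUSTOMARY if not shp.with_suffix(ext).exists()]
```

Calling missing_parts("Monaco.shp") returns an empty list when all five customary files are present, and otherwise the extensions of the files that still need to be located.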

 

Invalid Z values - Reading a 3D geometry value in a SHP file forces invalid Z values such as NaN or Inf to 0.
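
The rule for invalid Z values amounts to the following check, shown here as an illustrative sketch with a hypothetical function name rather than as Manifold's own code.

```python
import math


def clean_z(z):
    """Return z unchanged if it is a finite number, otherwise 0."""
    # math.isfinite is False for NaN and for positive or negative infinity.
    return z if math.isfinite(z) else 0.0


print([clean_z(v) for v in (12.5, float("nan"), float("inf"))])  # [12.5, 0.0, 0.0]
```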

 

Large Imports take time - Importing data from big files can be slow while the data is converted into the special form Manifold uses internally to support fast performance.  Manifold cares about making work within Manifold as fast as possible, since that is what Manifold users spend most of their time doing, not about making faster round trips for import, edit and export.  Once imported data gets into Manifold it can be worked with astonishing speed, almost certainly much faster than in its original environment.  It can be saved as a Manifold .map project that thereafter can be opened instantly and saved instantly.  But to get that speed we pay a one-time cost of the time it takes to convert the data into fast Manifold .map form.   If we only intend to do a one-time edit it could be faster just to leave the data in place and edit it as a linked file, if possible.

 

Linking Large Files also takes time - When linking a large file, by default Manifold will build a MAPCACHE accessory file to enable faster viewing and editing of the linked file.  That can take a long time for large files but only the first time: once a .MAPCACHE file has been built thereafter linking that same file will be very fast.  Whether it is quicker to edit a single, large file by linking or by importing is often a toss-up, the choice being made on whether it is more important to edit the file "in place" because the file may be used by other programs that expect the original format.  To avoid building a cache, instead of using File - Link to link a file, use File - Create - New Data Source to link a file, since the New Data Source dialog provides options to not create a cache.

Videos

Manifold 9 - Re-Project a Shapefile  - New coordinate system dialogs make it easier than ever to reproject data, often in only one click. This video shows how to import a shapefile and then rapidly reproject it into different coordinate systems. We then show how maps reproject their contents on the fly for display and how to exploit that to rapidly show data in different projections.

 

See Also

File - Import

 

File - Export

 

Assign Initial Coordinate System

 

Reproject Component

 

File - Create - New Data Source

 

DBF, dBase / FoxPro

 

Example: Import Shapefile and Create a Map - Step by step process to import a shapefile and to create a map.

 

Example: Reproject a Drawing - An essential example on changing the projection of a drawing, either within the drawing itself, or by changing the projection of a map window that shows the drawing and on the fly reprojects the drawing for display.

 

Example: Import a Shapefile - ESRI shapefiles are a very popular format for publishing GIS and other spatial data.  Unfortunately, shapefiles often will not specify what projection should be used.  This example shows how to deal with that quickly and easily.

 

Latitude and Longitude are Not Enough

 

Shapefiles Strangely Out of Shape

 

Three Letter Extensions