Shapefiles are a file standard for storing GIS data introduced many years ago by ESRI for a desktop GIS package. Despite employing a Stone Age level of database technology, they have become one of the most common standards for interchanging GIS information.
Unfortunately, shapefiles include numerous limitations that make them a poor choice as an interchange format in modern times. These include:
Use of multiple files - A "shapefile" is not just one file but an ensemble of files whose contents must be exactly synchronized with one another. Some content is duplicated between files, but the standard is silent about what to do when such duplicated content does not match up. The use of multiple files leads to frequent errors in interchange because only one of the files involved has a .shp extension and the name of the format is singular, as in "shapefile" and not "shapefiles." Inexpert users therefore often make the error of sending only the .shp file and not the other files in the ensemble, thinking they have sent the "shapefile."
.dbf format for data - .dbf is a very old file format invented in Neolithic times for the paleo-DBMS dBase II. It was created by a hobbyist to manage personal data for an office football betting pool, based on software written in 1973. It was later used in the dBase II database product first sold almost 40 years ago for CP/M computers. The .dbf format is so incredibly old that it pre-dates both the original IBM PC and MS-DOS. Despite the great antiquity of .dbf and the extreme limitations of the format, the shapefile ensemble of files utilizes a .dbf file to store data. The .dbf format includes gross limitations on field names, profoundly limited data types and other severe limitations arising from a combination of its Stone Age antiquity and personal hobby origins. That we must utilize .dbf at all in modern times for dealing with GIS data is nothing short of astonishing, somewhat like having to chip at rocks with stone tools to exchange GIS data. That almost 45 years after the creation of .dbf computer hobbyists would still be so poorly educated in DBMS that anyone would choose .dbf as the native format for an "open" GIS, and the most popular one at that, is simply depressing to educated people.
One type per shapefile - Shapefiles can only contain one type of object, that is, a shapefile may contain only points, only lines or only areas, but not any mix of those types. Drawings which contain points, lines and areas must be exported as three separate shapefiles. Objects within a shapefile cannot be a mix of XY and XYZ geometry, nor does the shapefile standard comprehend objects in curved geometry such as splines.
No support for projections - The shapefile standard provides no way of storing projections. Although later "extensions" to shapefiles have been proposed and often are used to store projections, such extensions routinely fail to provide full projection information and often cause endless chaos from lack of standardization.
No provision for dynamic read/write access - The shapefile standard envisions creation of shapefiles in a single operation as a written ensemble of synchronized files. It does not define how shapefiles may be opened for dynamic read/write editing on the fly.
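The "sending only the .shp" mistake described in the list above can be caught before a shapefile is sent or opened. The following is a minimal sketch in Python, not part of any particular GIS package. It assumes the three files required by the ESRI shapefile specification (.shp for geometry, .shx for the index, .dbf for attributes); the function name missing_companions is a hypothetical name chosen for illustration.

```python
from pathlib import Path

# Files that every shapefile ensemble needs: geometry, index, attributes.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_companions(shp_path):
    """Return the required extensions that have no file next to shp_path.

    Hypothetical helper: it only checks that the files exist, not that
    their contents agree with each other.
    """
    base = Path(shp_path)
    missing = []
    for ext in REQUIRED_EXTENSIONS:
        # Accept lower-case or upper-case extensions on disk.
        candidates = (base.with_suffix(ext), base.with_suffix(ext.upper()))
        if not any(p.exists() for p in candidates):
            missing.append(ext)
    return missing
```

An empty list means the ensemble is at least complete. It says nothing about whether the files are synchronized, and it does not look for optional extension files such as those used to carry projections.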
Shapefiles are fine for data interchange of very simple, unprojected data, the original purpose for which they were created. Using them for data interchange or data storage beyond that causes no end of trouble.
While the painful limits of .dbf, the restrictions on single object types per shapefile and endless difficulties caused by using shapefiles to convey projected information are old news to GIS practitioners, a new generation of troubles can arise when using software like Manifold that can open shapefiles in dynamic read/write mode to edit shapefiles on the fly. Troubles arise because the shapefile spec says nothing about dynamic modifications, so different interpretations of how that might be done can cause different interpretations of the same shapefiles by different software packages.
Problems may arise from differing interpretations of how best to harmonize the ensemble of files that make up a "shapefile," since the contents of those files must correspond exactly with one another. For example, each .shp file has an accompanying .dbf file in dBase II DBMS format. The number of objects in the .shp file must coincide exactly with the number of records in the .dbf file.
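The requirement that the two counts coincide can be checked directly from the bytes on disk. The sketch below is an illustration only, written in Python with hypothetical function names. It relies on the published layouts: a .shp file has a 100-byte header followed by records that each begin with an 8-byte big-endian header giving the record number and the content length in 16-bit words, and a .dbf file stores its record count as a little-endian 32-bit integer at byte offset 4.

```python
import struct


def count_shp_records(shp_path):
    """Count geometry records by walking the record headers in a .shp."""
    count = 0
    with open(shp_path, "rb") as f:
        header = f.read(100)  # fixed-size main file header
        if len(header) < 100:
            raise ValueError("truncated .shp header")
        # File length: big-endian int32 at byte 24, in 16-bit words.
        file_len = struct.unpack(">i", header[24:28])[0] * 2
        pos = 100
        while pos + 8 <= file_len:
            rec_header = f.read(8)
            if len(rec_header) < 8:
                break  # file is shorter than its header claims
            _rec_no, content_words = struct.unpack(">ii", rec_header)
            if content_words < 0:
                raise ValueError("corrupt record header")
            f.seek(content_words * 2, 1)  # skip the record contents
            pos += 8 + content_words * 2
            count += 1
    return count


def count_dbf_records(dbf_path):
    """Read the record count stored in the .dbf header.

    This count includes records flagged as deleted, because such
    records are still physically present in the file.
    """
    with open(dbf_path, "rb") as f:
        header = f.read(8)
    if len(header) < 8:
        raise ValueError("truncated .dbf header")
    return struct.unpack("<I", header[4:8])[0]


def counts_agree(shp_path, dbf_path):
    """True when the .shp and .dbf hold the same number of entries."""
    return count_shp_records(shp_path) == count_dbf_records(dbf_path)
```

Note that matching counts are necessary but not sufficient: two files can hold the same number of entries and still disagree about which of them are live.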
The official spec is written with the expectation that a .shp file will be written out together with the corresponding .dbf, with both populated with objects and records as required. However, the spec provides no documented way to dynamically delete an object in an existing .shp file that has been opened in read/write mode. The .dbf format provides a way to delete a record by marking it as "deleted" (by changing the one-byte flag at the start of the record), but the shapefile spec does not specify any corresponding way to delete the object associated with that record in the .shp. The shapefile spec thus leaves open some ambiguity in how to represent dynamic deletions of objects.
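To make the one-byte deletion mechanism concrete, here is a minimal Python sketch of flagging a .dbf record as deleted in place. It is an illustration under stated assumptions, not how any particular program does it: the function names are hypothetical, and the layout assumed is the usual .dbf one, where the header gives the record count (offset 4), header length (offset 8) and record length (offset 10), and each record starts with a flag byte that is a space for a live record and an asterisk for a deleted one.

```python
import struct

DBF_ACTIVE = b" "   # flag byte of a live record
DBF_DELETED = b"*"  # flag byte of a record marked as deleted


def _record_offset(f, index):
    """Return the byte offset of the flag byte for record `index`."""
    f.seek(0)
    header = f.read(12)
    if len(header) < 12:
        raise ValueError("truncated .dbf header")
    n_records, header_len, record_len = struct.unpack("<IHH", header[4:12])
    if not 0 <= index < n_records:
        raise IndexError("record index out of range")
    return header_len + index * record_len


def mark_deleted(dbf_path, index):
    """Flag one record as deleted by overwriting a single byte."""
    with open(dbf_path, "r+b") as f:
        f.seek(_record_offset(f, index))
        f.write(DBF_DELETED)


def is_deleted(dbf_path, index):
    """Report whether a record carries the deletion flag."""
    with open(dbf_path, "rb") as f:
        f.seek(_record_offset(f, index))
        return f.read(1) == DBF_DELETED
```

Nothing in the .shp changes when this runs, which is exactly the gap described above: the file size and record count of the .dbf stay the same, and the geometry for the flagged record is still sitting in the .shp.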
A software package could use the "nuclear option" of simply writing out a new .shp file along with a new .dbf for all objects but the deleted one, thus guaranteeing lowest common denominator conformity with other programs. But doing that is not really dynamic editing of an existing .shp file: it is, instead, a fake dynamism that simply edits objects in memory and writes out new files to replace the original files in a classically non-dynamic manner. One key reason to want dynamic read/write capability is to avoid the slowness of having to write out an entire, potentially large, file when making simple changes such as the deletion of a few objects.
Many programs, including Manifold, offer dynamic editing and allow deletion of objects by leveraging the primary role of .dbf in the shapefile standard. When an object is deleted, say, in a drawing window, the corresponding record for that object in the .dbf is marked as deleted. When Manifold opens a shapefile, if a .shp file contains an object for which the corresponding record in the .dbf is marked as deleted, Manifold treats that object as deleted as well. The rationale is that each object in a .shp should have a corresponding .dbf record, so if the record has been marked deleted, the object should be considered deleted too. That's not a bad call in a universe of programs which provide dynamic read/write editing of shapefiles. But it is not the only possible call.
Programs which do not offer dynamic read/write editing can simply always write out new .dbf and .shp files which harmonize exactly. However, even those programs which always write out harmonized .dbf and .shp pairs can encounter shapefiles created by other programs where the .dbf and .shp files have implicit or explicit disagreements. For example, they may open a shapefile set where the .dbf has marked a record as deleted while the .shp file retains an object for that record.
In real life, where GIS users will encounter an entire zoo of shapefile pathologies, a working program must make pragmatic decisions about how best to handle shapefiles where the files involved disagree. Such disagreements can occur as a result of file damage, erroneous program operation or many other reasons. The usual strategy is to try to recover as much data as possible from "broken" shapefiles. Whether a .shp file that contains objects for which the associated records in the .dbf file are marked as deleted should be treated as "broken" and a candidate for recovery is a matter of opinion, as is the decision to consider the deletion flag on the record an error and not an intentional edit.
Programs which put a higher value on possibly recovering data from incorrectly written or damaged shapefiles may choose to assume the deletion flag is an error and the object should be retained. That can be a logical approach in a universe of programs which are committed to always writing harmonious .dbf and .shp pairs and where inconsistency between the two files is logically taken as an error. Manifold Release 8 and prior Manifold GIS editions take that approach.
Given recent trends toward dynamic read/write editing of shapefiles, there are now many more programs which allow dynamic editing by utilizing the deletion flag on records in the .dbf file. It is therefore now routine to encounter shapefiles where the .dbf and .shp files are discordant, and where one style of interpretation will show objects that the other style of interpretation will not show. For example, suppose we create a data source from a shapefile in Manifold, open it for editing and delete an object. If we then open that shapefile in a program such as Manifold 8, which gives priority to the .shp and ignores deletion flags in the .dbf, the object will still be there.
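The two styles of interpretation can be reduced to a single switch. The sketch below is a deliberately simplified Python illustration and is not how Manifold or any other package is implemented: it assumes the objects from the .shp and the deletion flags from the .dbf have already been read into two parallel lists, and the function name and parameter names are hypothetical.

```python
def visible_objects(objects, deleted_flags, honor_deletion_flag):
    """Return the objects a program would show for a discordant shapefile.

    objects             -- geometries read from the .shp, in record order
    deleted_flags       -- one bool per .dbf record, True if flagged deleted
    honor_deletion_flag -- True: a flagged record hides its object
                           False: the .shp wins and flags are ignored
    """
    if len(objects) != len(deleted_flags):
        raise ValueError("the .shp and .dbf must hold the same number of entries")
    if not honor_deletion_flag:
        # Recovery-minded reading: treat the flag as a possible error.
        return list(objects)
    # Dynamic-editing reading: treat the flag as an intentional delete.
    return [obj for obj, deleted in zip(objects, deleted_flags) if not deleted]
```

With three objects and the middle record flagged as deleted, the first policy returns two objects and the second returns all three, which is the discrepancy described in the example above.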
Given the lack of explicit guidance in what has now become a very old shapefile spec, both programs will be "right." Resolving such discordances therefore boils down to which approach we prefer if we want to have dynamic read/write editing of shapefiles.