Identifiers, Constants and Literals

This topic describes notation for identifiers, constants, and literals.   SQL in the following refers to SQL in Manifold products.

 

The lists in this topic are comprehensive but new items are often added with updates to Manifold and may appear in the product before they appear in this documentation.   See the query builder tab of the Command Window for an authoritative list of operators, commands, functions, and constants.

Identifiers

Identifiers are names of database things, like the names of tables, fields (columns) in a table, indexes, or constraints.  

 

The prefix mfd_ is reserved for system use.  Do not name any fields, tables, indexes, constraints, etc., using a name that begins with mfd_.

 

Although Manifold SQL allows a wider variety of identifier names, for portability of queries and data it makes sense when choosing identifier names to stick to the usual SQL conventions in the database community:

 

 

The above conventions help to avoid confusion.  Identifiers do not need to be quoted if they stick to the above conventions.

 

SQL is case-insensitive for identifiers and commands.   SELECT NaMe is the same as select name.  

 

Choose a particular style for using upper-case or lower-case letters and stick with that style to write more easily understood SQL.   For example, using all capitals for SQL key words like SELECT while using lower case or title case for identifiers helps distinguish identifiers from SQL key words, as in:

 

SELECT Population FROM Countries;

Delimiters

We can use identifier names that break the usual rules if we enclose those identifier names with quote characters, which are called delimiter characters or just delimiters.   Manifold allows using square brackets [ ],  back tick ` characters, or double quote " characters as delimiters.  All three delimiter character types are allowed, to provide greater interoperability with other SQL implementations.     Single quote ' characters are used to delimit text literals in Manifold, so most people prefer to use either square brackets [ ], or double quote " characters to delimit identifiers.

 

Sticking to the usual rules for identifiers allows using those identifiers in queries without enclosing them in delimiter characters.   For example, the query

 

SELECT * FROM Roads_2016;

 

requires no delimiters around the Roads_2016 table name.

 

In contrast a table name of 2016 Roads would require delimiters:

 

SELECT * FROM [2016 Roads];

 

or

 

SELECT * FROM `2016 Roads`;

 

or

 

SELECT * FROM "2016 Roads";

 

In GIS use, of course, we often must deal with field names and table names, such as 2016 Roads, that have been imported from CSV files or other non-database sources that do not follow the usual SQL rules for identifier names.   We can use such identifiers by enclosing them in delimiter characters.

 

Manifold SQL provides three different choices for delimiters to allow easier interchange of SQL text with other SQL packages.   For example, Microsoft SQL Server and Microsoft Access use square brackets, while MySQL uses back ticks.   Double quote " characters are used to delimit identifiers in Oracle, PostgreSQL, and SQLite, and can also be accepted by SQL Server and MySQL in ANSI_QUOTES mode.   Having all three options available in SQL makes it possible to write queries that work in any of those packages as well as in Manifold products.

 

Of the three available options for delimiters,  reverse quote ` characters are a poor choice for most users, because they are too easy to confuse with single quote ' characters that are used to delimit text strings.  Users with extensive experience in MySQL may prefer reverse quote ` characters as delimiters.

 

Because of Manifold's extensive leveraging of Microsoft standards, square brackets [ ] have been the default choice over double quote " characters for most users.  Microsoft database products like Access and SQL Server use square brackets [ ]  for identifiers, and they use double quote " characters to delimit text strings, not identifiers.   Using square brackets [ ] in preference to double quote " characters avoids confusion with single quote ' characters that enclose text strings, and also avoids confusion from possible typographical errors like two single quote '' characters in a row.

 

Dialogs, like the Edit Query button in the Transform pane, which automatically write SQL for us will enclose delimiters in square brackets [ ].   This documentation is written using square brackets [ ] for examples.  If we prefer double quote " characters as delimiters, when copying and pasting examples of queries from this documentation we can search and replace those square bracket characters with double quote " characters.

 

Whichever delimiters we prefer, to enhance legibility, we should pick the delimiters we like and use them consistently.  Queries are more legible if we always use the same delimiters, even with identifiers that do not require delimiters.  For example, if we have a field named Population 1990 and a table called Countries we can write

 

SELECT [Population 1990] FROM Countries;

 

or

 

SELECT `Population 1990` FROM Countries;

 

or

 

SELECT "Population 1990" FROM Countries;

 

But it is usually more legible to use delimiters throughout:

 

SELECT [Population 1990] FROM [Countries];

 

or

 

SELECT "Population 1990" FROM "Countries";

 

Although we can mix different delimiter styles in the same query (so long as the open and closing quote characters are the same style), that is obviously a bad idea for legibility:

 

SELECT "Population 1990" FROM [Countries];

 

The above is perfectly legal, but it may confuse people.   It is much better to pick a style we like and then use that same style consistently.

 

Note that we cannot write SELECT Population 1990 FROM Countries; (red color is used to indicate erroneous syntax), because Population 1990 has a space in it and thus requires quotation with delimiters.

 

We do not have to use delimiters at all if identifiers are unambiguous.  If we have a table called people that contains a field called name we can write  

 

SELECT name FROM people;  

 

In contrast, if the name of the table is people 2 we must write

 

SELECT name FROM [people 2];

 

People who create their own tables and data sometimes choose to always use identifiers that do not require delimiters.   That does, indeed, make it possible to write very legible queries.    THis documentation sometimes uses such simple names to provide simple examples.  

Identifiers and Key Words

If we have a choice in the matter, it is a really bad idea to use identifiers that are the same as SQL key words, like SELECT, WHERE, or COLLECT.   Although using SQL key words as identifiers obviously can be confusing, it is legal to do so if we enclose such identifiers with delimiters.  

 

Sometimes such identifiers cannot be avoided, for example, when working with tabular data originally created in GIS settings where compatibility with DBMS standards was not a consideration.  In such cases we can deal with such identifiers by putting delimiters around them.

 

For example, suppose we have a table called species with fields called name and collect.  We can use the collect field in a query by putting brackets around it, as in

 

SELECT name, [collect] FROM species;

 

Note that since neither name nor species is ambiguous, we do not need to put brackets around those identifiers.  

 

SQL is easier to understand if we apply the same style to all identifiers.   Many people therefore get in the habit of using delimiters around all identifiers, even if there is no ambiguity about a given identifier and thus identifiers are optional for that name:

 

SELECT [name], [collect] FROM [species];

 

Given that table names and field names which contain spaces are ubiquitous in most GIS work, and thus we will frequently be using delimiters, we may as well get in the habit of using them all the time as a matter of uniform style:   

 

SELECT [Name], [Population 2020] FROM [States Table];

 

Another good reason to get into the habit of using delimiters is that many common words, like VALUE, FIRST, and LAST, are key words in SQL.  If we are new to SQL we might not know that an identifier, like value, is a key word.  Using delimiters with all identifiers automatically prevents errors from any accidental use of a key word as an identifier.

 

It is easy always to use delimiters when writing SQL, because we rarely manually keyboard an identifier name:  we can simply double-click in the Query Builder to add an identifier automatically, complete with square brackets.

Fields in Different Tables

When a query uses fields from different tables, for example in a JOIN, we refer to fields in a table using dot . nomenclature:  [States].[Population] means the Population field within the States table.   [States].[Capital] means the Capital field within the States table.

 

SELECT * FROM [States] WHERE [Population] > 3000000;

 

is the same as writing

 

SELECT * FROM [States] WHERE [States].[Population] > 3000000;

 

Adding the [States]. specification is redundant, since there is only one table in the query and the Population field is unambiguously from that table.   

 

The situation is different when an expression uses fields from different tables, such as in a JOIN, and especially when fields with the same name are in the different tables:

 

SELECT [States].[State], [States].[Population], [Capitals].[Capital]

  FROM [States] JOIN [Capitals]

  ON [States].[State] = [Capitals].[State];

 

The example above is taken from the JOIN Statements topic.  Both the States table and the Capitals table have a State field, so when we use the two tables in the same query we must explicitly use dot syntax to make it clear to which State field in which table we mean, in constructions such as:

 

  ON [States].[State] = [Capitals].[State];

Paths to Tables in Different Data Sources

Tables in different data sources are named in queries by specifying the path to the table using the double colon syntax [datasource]::[table].   That syntax unambiguously identifies the table by specifying the path to the table through the data source hierarchy.   The double colon :: symbology is analogous to the forward slash / or backslash \ character used to specify paths through file systems.   

 

For example, suppose we have a data source called mysql which is a connection to a MySQL database that contains a table called Products.   We could write a query that selects all records from the Products table within the mysql data source as:

 

SELECT * FROM [mysql]::[Products];

 

Double colon path syntax can specify a path to tables more than one level deep in data source hierarchies.   Suppose, for example, we have a project called Europe that contains the mysql data source.   If we link that Europe project as a data source into our current project, we could refer to the products table as [Europe]::[mysql]::[Products], for example:

 

SELECT * FROM [Europe]::[mysql]::[Products];

 

See the Example: Refer to a Table within a Data Source topic for illustrated examples of referring to tables within nested data sources.

 

Double colon syntax is the syntax for a path to a table through a nested hierarchy of data sources.  It is not used to refer to fields.   Combining dot nomenclature with double colon nomenclature is an error, because the field which is referred to in an expression either is inferred from the context by the query engine or it is explicitly stated using dot nomenclature only, using aliases.

 

Suppose we have a data source called books which contains tables such as titles.  The titles table in the books data source has a pages field. We can write a query that uses the titles table and the pages field as follows:

 

SELECT * FROM [books]::[titles] WHERE [pages] > 200;

 

The reference to the titles table uses double colon syntax to specify the full path to that table, so the query engine can find the intended table.  Once the query engine knows the intended table, there is no ambiguity about what table the pages field is in.  There is only one table in the SELECT statement so the pages field means the pages field in that table.  The reference to the pages field can use just the name of the field, as in the statement above.

 

If we wanted to, we could use explicit dot nomenclature to redundantly specify the pages field:

 

SELECT * FROM [books]::[titles] WHERE [titles].[pages] > 200;

 

The above statement is redundant, but it is perfectly legal.   The titles table in the [titles].[pages] construction can only mean the titles table specified by the [books]::[titles] path.  We do not have to explicitly name that table, since the pages field can only mean a field in the one table that is used in the SELECT statement.  But if we want to write a redundant identifier to name the field, we can do that.   

 

However, we do not write:

 

SELECT * FROM [books]::[titles] WHERE [books]::[titles].[pages] > 200;

 

The statement above is an error because the WHERE clause expects a field name, but instead we feed it a path specification to a table.

 

There are situations, for example in JOIN statements, where we might use tables in different data sources where the tables have the same names and fields with the same names, where we must disambiguate references to the different tables, and to the fields they contain.   The way we do that is by using aliases, which can provide disambiguated table names to use in disambiguated dot nomenclature field names.  

 

To see how that works, we can fix the above erroneous statement using an alias.    Instead of the erroneous statement:

 

SELECT * FROM [books]::[titles] WHERE [books]::[titles].[pages] > 200;

 

We write:

 

SELECT * FROM [books]::[titles] AS [book titles] WHERE [book titles].[pages] > 200;

 

The path specification [books]::[titles] using double colon syntax identifies the exact table intended.   The [book titles] alias provides a fully disambiguated name for the intended table that can be used anywhere a table name can be used, such as in dot nomenclature specification of a desired field within that table.

 

The statement above uses a redundant constructions.   There is no need to use an alias, since we could simply have written

 

SELECT * FROM [books]::[titles] WHERE [pages] > 200;

 

since [pages] is not ambiguous in a SELECT statement that uses only one table.   But the example shows how aliases can be used to name tables deep within hierarchies of data sources.    Besides capturing potentially lengthy paths, alias can also make queries much shorter and more understandable.

 

Consider an example using a JOIN, using the JOIN query shown earlier in this topic:

 

SELECT [States].[State], [States].[Population], [Capitals].[Capital]

  FROM [States] JOIN [Capitals]

  ON [States].[State] = [Capitals].[State];

 

In the above query, the tables in the JOIN are all in the local project.   

 

Suppose, however, the tables are located in different data sources, with the Capitals table found within a data source called Cities and the States table found within a data source called USA.   In that case, using double colon notation we can refer to the Capitals table as [CIties]::[Capitals] and to the States table as [USA]::[States].  

 

We then could write the JOIN with aliases used to unambiguously specify the field names:

 

SELECT [USA States].[State], [USA States].[Population], [City Capitals].[Capital]

  FROM [USA]::[States] AS [USA States] JOIN [Cities]::[Capitals] AS [City Capitals]

  ON [USA States].[State] = [City Capitals].[State];

 

We can use whatever alias names we want, but it does make sense to choose names that are self documenting, to make it easier to understand what is going on in big queries when we return to them after a few months.

Double Colon Notation and Geodatabases

Double colon syntax often is used with file geodatabases.  Suppose we have two geodatabases, one called Colorado and the other called USA, and further suppose that both geodatabases contain a Roads layer.  The Colorado geodatabase's Roads layer contains roads within Colorado, while the USA geodatabase's Roads layer contains roads for the entire US.    We might want to work with both Roads layers at the same time, for example, to use more recently updated roads in the Colorado state geodatabase to make corrections to the roads layer for the entire US.

 

Suppose we are working with ArcGIS Pro and we add the Roads layer from the Colorado geodatabase to our map, and then suppose we also add the Roads layer from the USA geodatabase.  ArcGIS Pro will simply add two layers, with both of them being named Roads.   If Pro had SQL built in for file geodatabases (it does not),  it would have to have a way for users to say when they mean the Roads layer in the Colorado geodatabase and when they mean the Roads layer in the USA geodatabase.

 

SQL solves that problem when connecting to multiple geodatabases by using double colon notation [Colorado]::[Roads] to mean the Roads table in the Colorado geodatabase and [USA]::[Roads] to mean the Roads table in the USA geodatabase.   When writing queries that use both tables, we can use aliases to enhance legibility, perhaps using aliases of [Colorado Roads] and [USA Roads] for short.

Constants

SQL provides a variety of useful constants:

 

CR

A string constant for a carriage return character.

CRLF

A string constant consisting of CR plus LF, the end of line sequence in Windows.

E

A number, 2.718...

FALSE

A boolean.

LF

A string constant for a line feed character.

NULL

NULL

PI

A number, 3.1415...

TRUE

A boolean.

WHITESPACE

A string constant for use with StringTrimXxx functions:  contains a space character, a tab character, CR and LF.

 

Useful constants related to data types:

 

INT8MIN

INT8MAX

The minimum and maximum numbers that can be represented in int8 data type.

UINT8MAX

The maximum number that can be represented in uint8 data type.

INT16MIN

INT16MAX

The minimum and maximum numbers that can be represented in int16 data type.

UINT16MAX

The maximum number that can be represented in uint16 data type.

INT32MIN

INT32MAX

The minimum and maximum numbers that can be represented in int32 data type.

UINT32MAX

The maximum number that can be represented in uint32 data type.

FLOAT32MIN

FLOAT32MAX

The minimum and maximum numbers that can be represented in float32 data type.

FLOAT64MIN

FLOAT64MAX

The minimum and maximum numbers that can be represented in float64 data type.

 

There are no constants for int64 and uint64 because computations in queries are performed predominantly in float64 numbers, which cannot store numbers large enough to represent the minimum and maximum numbers possible to represent in int64 or the maximum number possible to represent in uint64.  

Literals and Delimiters

Enter literals using the following notation, setting off the literal value with the indicated characters as delimiters.  Characters such as single quote ' characters, square brackets [ ], or double quote " characters, or other characters that are used to enclose literals, identifiers, or other items are called delimiters.  They are said to delimit something, that is, to set it apart.

 

In the examples below, replace the ... in the pattern with the desired literal.  The terms string literals, text literals, and strings are used as synonyms.

 

'...'

Enclose text literals in single quote ' characters:  

 

SELECT * FROM [Orders] WHERE [City] = 'London' ;

 

\

In text literals the backslash \ character escapes the following character.  To enter the literal text don't we would use 'don\'t' in the query.  To enter the backslash character itself use two backslashes: \\ as in 'C:\\Windows\\System' to mean the literal text value C:\Windows\System.

 

Because strings (text between ' delimiters) in SQL are processed to parse backslash characters, that can complicate the use of strings as arguments to SQL functions, as patterns in regular expressions, or as patterns when using the LIKE operator, where the use of the backslash character has special meaning.  

 

For example, the two character sequence \s in regular expressions matches any single whitespace character.  However, a string consisting of '\s' in SQL simply means "the single, literal, lower-case character s."  If that string is used as a pattern argument in a regular expression SQL function the pattern seen by the regular expression will be s not \s.   To deal with that we must use two backslashes, as discussed in the Regular Expressions topic.

@'...'

Manifold SQL allows specifying string literals in expressions without using escape characters like \ in SQL.  Replace the three dots in the pattern at left with the string literal desired.  This notation applies in SQL expressions and  regular expressions, where it is especially useful.   The string literal cannot include a single ' quote.  To include a single quote within an expression that also uses this @ notation, we must build it with a concatenation expression in the form @'...' & '\'' & @'...'     See the Regular Expressions topic for examples.

#...#

Delimit date literals with hash # characters:  

 

SELECT * FROM [Orders] WHERE [Date] = #6/24/2016# ;

 

(date variations)

Surprisingly many common ways of specifying dates will work as literals, including

 

#5/15/1991#

#May 15 1991#

#May 15, 1991#

#15 May 1991#

#15 May, 1991#

#1991 May 15#

 

Date literals including time can use AM or PM:

 

#01/21/2017 12:05:15#

#01/21/2017 08:05:15 PM#

#01/21/2017 08:05:15 AM#

#01/21/2017 20:05:15#

 

To specify milliseconds, add a dot .  character and the millisecond value:

 

#01/21/2017 12:01:15.126#

 

TRUE or true

Boolean literals may be either true or false  with case not significant:  

 

SELECT * FROM [Orders] WHERE [Shipped] = TRUE;  

 

and

 

SELECT * FROM [Orders] WHERE [Shipped] = trUe;

 

are equivalent.   We do not need the = comparison since

 

SELECT * FROM [Orders] WHERE [Shipped] ;

 

is the same.

 

Character Literals and Unicode text

Text literals entered into query text are stored as Unicode internally using UTF16, that is, two bytes per character.   We can see that when evaluating expressions such as

 

? DataLength('SQL is Great!')

 

which gives a result of float64: 28, meaning 28 bytes are used for the string: two bytes for each of the 13 characters in the string "SQL is Great!" plus two more bytes for the closing zero.    If we wanted to compute the length of the string as ANSI text using one byte per character we could cast the default Unicode of string literals into the ANSI data type of VARCHAR by writing

 

? DataLength(CAST ('SQL is Great!' AS VARCHAR))

 

for a result of float64: 14, meaning one byte for each of the 13 characters plus one byte for the closing zero.

 

To specify a given character in SQL use the Chr function or specify the Unicode value of the character using four hexadecimal digits, as in the pattern '\uXXXX' with X replaced by hexadecimal digits.

 

? 'Z' & Chr(246) & 'e'

 

? 'Z\u00f6e'

 

...both of the above return

 

nvarchar: Zöe

 

Constants - We can combine the use of Chr and Unicode values with available SQL constants, such as CRLF for newline characters:

 

? 'Line 1' & Chr(13) & Chr(10) & 'Line 2'

 

? 'Line 1' & Cr & Lf & 'Line 2'

 

? 'Line 1' & CrLf & 'Line 2'

 

? 'Line 1\u000d\u000aLine 2'

 

... all of the above evaluate to

 

nvarchar: Line 1

Line 2

 

 

 

See Also

Tables

 

Add an Index to a Table

 

Functions

 

Queries

 

Regular Expressions

 

Command Window

 

Command Window - Query Builder

 

SQL Operators

 

SQL Statements

 

SQL Functions

 

Temporary Databases

 

EXECUTE