Join Example: Join Data from Many Records into One Record

A table to table join:  We use the Edit - Join dialog to take data from many records and add that into a single record, as a list, as an average and as a maximum value from the many records.

 

To follow along with table examples in this topic, download from the Examples page on the Manifold website the books.mxb project.  That project contains the publishers and titles tables used in examples.

 

One of the most powerful aspects of Joins is how they can look up data from multiple records and then combine that, using transfer methods such as averaging, into a single record.  For example, if a publisher has many book titles and each such book has a record of sales, the publisher may be interested in finding the average sales considering all of the books.   That is called a many to one join, easy to do in the Join dialog.

 

See the video version of this example in the Join Dialog Part 2 - Joins between Tables video.  

 

 

For this example our original table will be the publishers table.   Using the titles table as the joined table, we will take data from the titles table to populate new fields in the publishers table that list all the titles for each publisher, that provide the average sales across all titles, and that provide the most recent publication date of a title.

 

 

The fields we will use from the titles table are the title_name field, the sales field and the pubdate field.  Note that the titles table we are using does not have the new line added to the bottom of the table in the prior example, so it has no titles for the publisher with a pub_id of P04.

 

With the focus on the open publishers table window, we choose Edit - Join in the main menu.

 

 

In the Join dialog we choose the titles table as the joined table.  Manifold automatically guesses we want to use the pub_id field in that table as the key field. If it guesses wrong, we can choose some other field from the pull down list of fields that can be used as key fields.

 

Press the Add button and choose Fields to launch the Add Fields dialog.

 

 

In the Add Fields dialog we check the title_name, sales, and pubdate fields.  Press OK.

 

 

Back in the Join dialog, that adds the three fields we checked as new fields to the original table, that is, to the publishers table.  

 

We double-click into the title_name new field in the original table and change the name to all titles.  

 

 

Next, we double-click into the transfer method and choose join tokens.   This will harvest the title names from the various titles for that publisher and join them into a single string, using a comma , character to separate the titles within the string.   That single string will be loaded into the new all titles field in the original table.

 

 

We double-click into the sales new field in the original table and change it to average sales.  Next, we double-click into the transfer method and choose average.   This will harvest the sales numbers from the various titles for that publisher and average them into a single number, which will be loaded into the new average sales field in the original table.

 

 

We double-click into the pubdate new field in the original table and change it to last publication.  Next, we double-click into the transfer method and choose max.   This will harvest the pubdate from the various titles for that publisher and choose the maximum, that is, the latest date, which will be loaded into the new last publication field in the original table.

 

 

Press Join.

 

 

The system immediately alters the publishers table to add the new fields we have specified, populating them with values derived from fields in the titles table as we have specified.  The new all titles field contains a list of all titles for that publisher in the titles table, with a comma , character separating each title.   The average sales field has been loaded with the average of sales for all titles for that publisher in the titles table, and the last publication field has been loaded with the maximum, that is, the latest date, for all titles for that publisher in the titles table

 

In addition, a new query appears in the Project pane, called publishers Update.  That query contains the SQL that the dialog used to alter the titles table.

Using an Update Query

Just before we pressed Join Component we checked the Save update query box, to create a new query in the project called publishers Update.   That query contains the SQL magic the dialog used to accomplish the join that we ordered.   We can Run the update query any time we want to update the publishers table to incorporate any changes.

 

 

For example, suppose a new record, for a new title, appears in the titles table.  The new title is from a publisher with a pub_id of P04.  

 

We can repeat the Join operation to update the publishers table. With the focus on the publishers table window, we launch the Edit - Join dialog again.

 

 

We set the dialog up as before, using the procedure shown in the Join Example: Update an Existing field with Join topic or the Join Example: Add Publisher Name to a Table of Book Titles topic to specify source fields and transfer methods for the all titles, average sales, and last publication fields.  

 

We could just press the Join button to update the publishers table.  However, in this case we will click the Setup Join and Edit Query button to see the SQL query Manifold will use to do the update.   That will allow us to run the query as is, or to modify it if we would like it to operate differently.   It will also allow us to save the query for future use, without having to set up the Join dialog again.

 

 

Pressing the Setup Join and Edit Query button launches the Command Window with the query used, an UPDATE query.    The text of the query is:

 

 

-- $manifold$

--

-- Auto-generated

-- Join

--

 

UPDATE (

  SELECT

    t.[pub_id] AS tkey0,

    t.[all titles] AS t0, t.[average sales] AS t1, t.[last publication] AS t2,

    s.sjoinkey, s.s0, s.s1, s.s2

  FROM [publishers] AS t LEFT JOIN (

    SELECT

      [pub_id] AS sjoinkey,

      StringJoinTokens([title_name], ',') AS s0,

      Avg([sales]) AS s1,

      Max([pubdate]) AS s2

    FROM [titles]

    GROUP BY [pub_id]

  ) AS s ON t.[pub_id] = s.sjoinkey

) SET

  t0 = s0, t1 = s1, t2 = s2;

 

 

We can run the query as is by pressing the ! Run button in the main toolbar.

 

 

When we run the query, the publishers table is automatically updated to populate the all titles, average sales, and last publication fields.  Note how the last record now has a title in it and not just a NULL as before.

Modifying the Update Query

The update query works great, but suppose we do not like the lack of a space between the comma and the next title, for example, as in

 

200 Years of German Humor,I Blame My Mother,What Are The Civilian Applications?

 

Suppose also titles have commas in them?  We would prefer that the list of titles used a semicolon ; character as a separator, and also followed each separator with a space character.   That is easy to arrange.

 

In the Command window, we start with the following SQL:

 

-- $manifold$

--

-- Auto-generated

-- Join

--

 

UPDATE (

  SELECT

    t.[pub_id] AS tkey0,

    t.[all titles] AS t0, t.[average sales] AS t1, t.[last publication] AS t2,

    s.sjoinkey, s.s0, s.s1, s.s2

  FROM [publishers] AS t LEFT JOIN (

    SELECT

      [pub_id] AS sjoinkey,

      StringJoinTokens([title_name], ',') AS s0,

      Avg([sales]) AS s1,

      Max([pubdate]) AS s2

    FROM [titles]

    GROUP BY [pub_id]

  ) AS s ON t.[pub_id] = s.sjoinkey

) SET

  t0 = s0, t1 = s1, t2 = s2;

 

 

We will change the StringJoinTokens function so it uses a semicolon ; character followed by a space character as a separator:

 

      StringJoinTokens([title_name], '; ') AS s0,

 

The full text of the query is now:

 

-- $manifold$

--

-- Auto-generated

-- Join

--

 

UPDATE (

  SELECT

    t.[pub_id] AS tkey0,

    t.[all titles] AS t0, t.[average sales] AS t1, t.[last publication] AS t2,

    s.sjoinkey, s.s0, s.s1, s.s2

  FROM [publishers] AS t LEFT JOIN (

    SELECT

      [pub_id] AS sjoinkey,

      StringJoinTokens([title_name], '; ') AS s0,

      Avg([sales]) AS s1,

      Max([pubdate]) AS s2

    FROM [titles]

    GROUP BY [pub_id]

  ) AS s ON t.[pub_id] = s.sjoinkey

) SET

  t0 = s0, t1 = s1, t2 = s2;

 

We press the ! Run button in the main toolbar to run the update query.

 

 

Instantly, the publishers table is updated using the new formatting, for example:

 

200 Years of German Humor; I Blame My Mother; What Are The Civilian Applications?

 

The all titles values now use a semicolon and a space character between titles.   That is better, but suppose we would prefer to construct the all titles strings as JSON objects, so we can re-cycle them into other applications?    That also is easy to do.   

 

In the Command Window, we change the StringJoinTokens line to use the string concatenation SQL operator, &, to prepend a bit of text that starts the JSON expression before the StringJoinTokens function, we change the function to use a double quote, a comma, a space and another double quote as the separator between titles, and then we use & again to append a bit of text to close the JSON expression:

 

      '{"Title":["' & StringJoinTokens([title_name], '", "') & '"]}' AS s0,

 

Lucky for us, the StringJoinTokens function does not add on a comma at the end of the string it builds, so we have no issues with a dangling comma after the last item in our JSON array.   Because & operators come ahead of AS in order of precedence, we do not have to encase the expression within parentheses.

 

The full text of the query is now:

 

-- $manifold$

--

-- Auto-generated

-- Join

--

 

UPDATE (

  SELECT

    t.[pub_id] AS tkey0,

    t.[all titles] AS t0, t.[average sales] AS t1, t.[last publication] AS t2,

    s.sjoinkey, s.s0, s.s1, s.s2

  FROM [publishers] AS t LEFT JOIN (

    SELECT

      [pub_id] AS sjoinkey,

      '{"Title":["' & StringJoinTokens([title_name], '", "') & '"]}' AS s0,

      Avg([sales]) AS s1,

      Max([pubdate]) AS s2

    FROM [titles]

    GROUP BY [pub_id]

  ) AS s ON t.[pub_id] = s.sjoinkey

) SET

  t0 = s0, t1 = s1, t2 = s2;

 

We press the ! Run button in the main toolbar to run the update query.

 

 

The all_titles values now contain a JSON object of the title strings, for example:

 

{"Title":["200 Years of German Humor", "I Blame My Mother", "What Are The Civilian Applications?"]}

 

In the real world, book titles might contain special characters, like quotes or back slashes, so we might add one more refinement to our query in case book titles contain special characters.   We can take advantage of Manifold's ever-popular StringEscapeJson SQL function, which adds character escaping to JSON strings so they safely can be used within functions or queries without special characters in them causing errors:

 

      '{"Title":["' & StringJoinTokens(StringEscapeJson([title_name]), '", "') & '"]}' AS s0,

 

If all we want is a JSON array, we do not need { } curly brackets and we do not need to name the array.   We can simply use:

 

      '["' & StringJoinTokens(StringEscapeJson([title_name]), '", "') & '"]' AS s0,

 

... which will produce all_titles values such as:

 

["200 Years of German Humor", "I Blame My Mother", "What Are The Civilian Applications?"]

 

Modifying the update query written by the Join dialog is a great way to use a point and click dialog to quickly build a join, which for many people can be confusing, and then to customize what the join does in a more compartmentalized way that is easy to understand.

 

Notes

Terminology - The original table also may be called the target table, and the table providing data also may be called the source table.

 

Quick reference - With the Join dialog open, press F1 to open a quick reference guide in a browser window.

 

Guessing the key field - When we launch the dialog with an original field and when we choose a joined field, the dialog will try to guess what fields we might want to use as key fields and will load those first by default.  If it guesses wrong, we can choose the field we want.   For the original table, the dialog tries to use a field (other than mfd_id) that has a BTREE / BTREENULL index, with a data type preference for numbers, then text, then everything else, and a name preference for field names with id in them, such as ... id or ..._id, then ...id and then everything else. For the joined table, the dialog uses similar logic, but first tries to use a field with the same name as in the original table.

 

Added fields are static - Added fields are static, that is, if data in the joined table changes the fields in the original table that received data in the Join operation from the joined table will not automatically be updated.  We can easily update the table any time we want using a saved update query.  

 

Spatial joins are automatically parallel - Joining data between drawings uses parallel GeomOverlay function variants.  Joins from images to drawings are parallelized through a nested SELECT using THREADS.

 

Why setup, and then update? -  Why does the Setup Join and Edit Query button first add the new field desire and then generate an UPDATE query, instead of generating a query that both adds the new field and also does the UPDATE?  Two steps are used to enable repetitive use of the same query to adjust what data is joined into the new field.   If the generated query also created that new field, to run that query multiple times, each time we ran it we would first have to delete the newly created field from the schema, so that part of the query would not cause an error.   By generating only an UPDATE query we can modify it and re-run it however many times we want, to try out different variations of how we UPDATE the table.

 

Why only in the same data source? - The Join dialog works only with tables and queries within the same data source to keep the dialog simple and fast.   An important part of keeping the dialog simple is to show only those tables or queries in the pull down list for the joined table that can be used in a join.   Scanning all tables and queries in the same data source is reasonable, but if there are many data sources or many levels of nesting (data sources within data sources) there could be hundreds of thousands of tables and queries within the hierarchy to scan.  For people working in map projects without such elaborate data sources, most of the time what they do will be within the .map project data source so everything works by default.  To use tables and queries from other data sources as the joined table in the Join dialog we can quickly add a simple aliasing query, as illustrated in the Join Example: Joins between Data Sources topic.  This method also makes it possible for the same Join dialog to work within other data sources, when aliasing queries refer back either to the .map project data source or to other data sources.   Almost all queries can be used as the source, joined table.  The Join dialog also can also use a query as the target, original table, to be used to copy data into the table through the query, as shown in the Join Example: Join into a Query and Join Example:  Join into a Query into a Selection topics, if the result table is writable and the Join does not attempt to create new columns in the table.

 

Buy and read a Fehily book on SQL - Manifold recommends Chris Fehily's excellent books on SQL.  Chris makes learning SQL really easy.  To encourage users to buy a Fehily book and to read it, Manifold examples often use examples similar to those in the Fehily SQL books.

 

Example data - To follow along with table examples in this topic, download from the Examples page on the Manifold website the books.mxb project.  That project contains the publishers and titles tables used in examples.

 

Videos

Join Videos

 

Find Percentages of Open Space in ZIP Code Area - Find the percentage of open space in each ZIP code area given a layer of polygons representing ZIP codes and a layer of polygons showing open spaces like parks and green spaces. This video shows how to do that start to finish in a few simple steps, from initial importing of shape files to final results, in just five minutes, with an additional few minutes of explanation what each step does. Works in Manifold Release 9 or using the free Manifold Viewer.

See Also

Maps

 

Tables

 

Queries

 

Drawings

 

Images

 

Editing and Combining Data

 

Join

 

Join Videos

 

Join Examples

 

Command Window

 

JOIN Statements

 

Editable Results Tables