Chapter 5. Data Access and Change

Fred Toussi

The HSQL Development Group

$Revision: 6807 $

Copyright 2010-2024 Fred Toussi. Permission is granted to distribute this document without any alteration under the terms of the HSQLDB license. Additional permission is granted to the HSQL Development Group to distribute this document with or without alterations under the terms of the HSQLDB license.

2024-10-25

Table of Contents

Overview
Cursors And Result Sets
Columns and Rows
Navigation
Updatability
Sensitivity
Holdability
Autocommit
JDBC Overview
JDBC Parameters
JDBC and Data Change Statements
JDBC Callable Statement
JDBC Returned Values
Cursor Declaration
Syntax Elements
Literals
References, etc.
Value Expression
Predicates
Aggregate Functions
Other Syntax Elements
Data Access Statements
Select Statement
Table
Subquery
Query Specification
Table Expression
Joined Table
Selection
Projection
Computed Columns
Naming
Grouping Operations
Aggregation
Set Operations
With Clause and Recursive Queries
Query Expression
Ordering
Slicing
Indexes Used in SELECT and DML Statements
Data Change Statements
Delete Statement
Truncate Statement
Insert Statement
Update Statement
Merge Statement
Diagnostics and State

Overview

HyperSQL data access and data change statements are compatible with the latest SQL:2023 Standard. There are a few extensions and some relaxation of rules, but these do not affect statements that are written to the Standard syntax. There is full support for classic SQL, as specified by SQL-92, and many enhancements added in later versions of the standard.

Cursors And Result Sets

An SQL statement can be executed in two ways. One way is to use the java.sql.Statement interface. The Statement object can be reused to execute completely different SQL statements. Alternatively, a PreparedStatment can be used to execute an SQL statement repeatedly, and the statements can use parameters. Using either form, if the SQL statement is a query expression, a ResultSet is returned.

In SQL, when a query expression (SELECT or similar SQL statement) is executed, an ephemeral table is created. When this table is returned to the application program, it is returned as a result set, which is accessed row by row by a cursor. A JDBC ResultSet represents an SQL result set and its cursor.

The minimal definition of a cursor is a list of rows with a position that can move forward. Some cursors also allow the position to move backwards or jump to any position in the list.

An SQL cursor has several attributes. These attributes depend on the query expression. Some of these attributes can be overridden by specifying qualifiers in the SQL statement or by specifying values for the parameters of the JDBC Statement or PreparedStatement.

Columns and Rows

The columns of the result set are determined by the query expression. The number of columns and the type and name characteristics of each column are known when the query expression is compiled and before its execution. This metadata information remains constant regardless of changes to the contents of the tables used in the query expression. The metadata for the JDBC ResultSet is in the form of a ResultSetMetaData object. Various methods of the ResultSetMetaData interface return different properties of each column of the ResultSet.

A result set may contain 0 or more rows. The rows are determined by the execution of the query expression.

The setMaxRows(int) method of JDBC Statement allows limiting the number of rows returned by the statement. This limit is conceptually applied after the result has been built, and the excess rows are discarded.

Navigation

A cursor is either scrollable or not. Scrollable cursors allow accessing rows by absolute or relative positioning. No-scroll cursors only allow moving to the next row. The cursor can be optionally declared with the SQL qualifiers SCROLL, or NO SCROLL. The JDBC statement parameter can be specified as: TYPE_FORWARD_ONLY and TYPE_SCROLL_INSENSITIVE. The JDBC type TYPE_SCROLL_SENSITIVE is not supported by HSQLDB.

The default is NO SCROLL or TYPE_FORWARD_ONLY.

When a JDBC ResultSet is opened, it is positioned before the first row. Using the next() method, the position is moved to the first row. While the ResultSet is positioned on a row, various getter methods can be used to access the columns of the row.

Updatability

The result returned by some query expressions is updatable. HSQLDB supports core SQL updatability features, plus some enhancements from the SQL optional features.

A query expression is updatable if it is a SELECT from a single underlying base table (or updatable view) either directly or indirectly. A SELECT statement featuring DISTINCT or GROUP BY or FETCH, LIMIT, OFFSET is not updatable. In an updatable query expression, one or more columns are updatable. An updatable column is a column that can be traced directly to the underlying table. Therefore, columns that contain expressions are not updatable. Examples of updatable query expressions are given below. The view V is updatable when its query expression is updatable. The SELECT statement from this view is also updatable:

 SELECT A, B FROM T WHERE C > 5
 SELECT A, B FROM (SELECT * FROM T WHERE C > 10) AS TT WHERE TT.B <10
 CREATE VIEW V(X,Y) AS SELECT A, B FROM T WHERE C > 0 AND B < 10
 SELECT X FROM V WHERE Y = 5

If a cursor is declared with the SQL qualifier, FOR UPDATE OF <column name list>, then only the stated columns in the result set become updatable. If any of the stated columns is not actually updatable, then the cursor declaration will not succeed.

If the SQL qualifier, FOR UPDATE is used, then all the updatable columns of the result set become updatable.

If a cursor is declared with FOR READ ONLY, then it is not updatable.

In HyperSQL, if FOR READ ONLY or FOR UPDATE is not used then all the updatable columns of the result set become updatable. This relaxes the SQL standard rule that in this case limits updatability to only simply updatable SELECT statements (where all columns are updatable).

In JDBC, CONCUR_READ_ONLY or CONCUR_UPDATABLE can be specified for the Statement parameter. CONCUR_UPDATABLE is required if the returning ResultSet is to be updatable. If CONCUR_READ_ONLY, which is the default, is used, then even an updatable ResultSet becomes read-only.

When a ResultSet is updatable, various setter methods can be used to modify the column values. The names of the setter methods begin with "update". After all the updates on a row are done, the updateRow() method must be called only once to finalise the row update.

An updatable ResultSet may or may not be insertable-into. In an insertable ResultSet, all columns of the result are updatable and any column of the base table that is not in the result must be a generated column or have a default value.

In the ResultSet object, a special pseudo-row, called the insert row, is used to populate values for insertion into the ResultSet (and consequently, into the base table). The setter methods must be used on all the columns, followed by a call to insertRow().

Individual rows from all updatable result sets can be deleted one at a time. The deleteRow() is called when the ResultSet is positioned on a row.

While using an updatable ResultSet to modify data, it is recommended not to change the same data using another ResultSet and not to execute SQL data change statements that modify the same data.

Sensitivity

The sensitivity of the cursor relates to visibility of changes made to the data by the same transaction but without using the given cursor. While the result set is open, the same transaction may use statements such as INSERT or UPDATE, and change the data of the tables from which the result set data is derived. A cursor is SENSITIVE if it reflects those changes. It is INSENSITIVE if it ignores such changes. It is ASENSITIVE if behaviour is implementation dependent.

The SQL default is ASENSITIVE, i.e., implantation dependent.

In HyperSQL all cursors are INSENSITIVE. They do not reflect changes to the data made by other statements.

Holdability

A cursor is holdable if the result set is not automatically closed when the current transaction is committed. Holdability can be specified in the cursor declaration using the SQL qualifiers WITH HOLD or WITHOUT HOLD.

In JDBC, holdability is specified using either of the following values for the Statement parameter: HOLD_CURSORS_OVER_COMMIT, or CLOSE_CURSORS_AT_COMMIT.

The SQL default is WITHOUT HOLD.

The JDBC default for HyperSQL result sets is WITH HOLD for read-only result sets and WITHOUT HOLD for updatable result sets.

If the holdability of a ResultSet is specified in a conflicting manner in the SQL statement and the JDBC Statement object, the JDBC setting takes precedence.

Autocommit

The autocommit property of a connection is a feature of JDBC and ODBC and is not part of the SQL Standard. In autocommit mode, all transactional statements are followed by an implicit commit. In autocommit mode, all ResultSet objects are read-only and holdable.

JDBC Overview

The JDBC settings, ResultSet.CONCUR_READONLY and ResultSet.CONCUR_UPDATABLE are the available alternatives for read-only or updatability. The default is ResultSet.CONCUR_READONLY.

The JDBC settings, ResultSet.TYPE_FORWARD_ONLY, ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.TYPE_SCROLL_SENSITIVE are the available alternatives for both scrollability (navigation) and sensitivity. HyperSQL does not support ResultSet.TYPE_SCROLL_SENSITIVE. The two other alternatives can be used for both updatable and read-only result sets.

The JDBC settings ResultSet.CLOSE_CURSORS_AT_COMMIT and ResultSet.HOLD_CURSORS_OVER_COMMIT are the alternatives for the lifetime of the result set. The default is ResultSet.CLOSE_CURSORS_AT_COMMIT. The other setting can only be used for read-only result sets.

Examples of creating statements for updatable result sets are given below:

 Connection c = newConnection();
 Statement st;
 c.setAutoCommit(false);
 st = c.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
 st = c.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);

JDBC Parameters

When a JDBC PreparedStatement or CallableStatement is used with an SQL statement that contains dynamic parameters, the data types of the parameters are resolved and determined by the engine when the statement is prepared. The SQL Standard has detailed rules to determine the data types and imposes limits on the maximum length or precision of the parameter. HyperSQL applies the standard rules with two exceptions for parameters with String and BigDecimal Java types. HyperSQL ignores the limits when the parameter value is set, and only enforces the necessary limits for inserts and updates when the PreparedStatement is executed. In all other cases, parameter type limits are checked and enforced when the parameter is set.

In the example below the setString() calls do not raise an exception, but one of the execute() statements does.

 // table definition: CREATE TABLE T (NAME VARCHAR(12), ...)
 Connection c = newConnection();
 PreparedStatement st = c.prepareStatement("SELECT * FROM T WHERE NAME = ?");
 // type of the parameter is VARCHAR(12), which limits length to 12 characters
 st.setString(1, "Eyjafjallajokull"); // string is longer than type, but no exception is raised here
 st.execute(); // executes with no exception and does not find any rows

 // but if an UPDATE is attempted, an exception is raised
 st = c.prepareStatement("UPDATE T SET NAME = ? WHERE ID = 10");
 st.setString(1, "Eyjafjallajokull"); // string is longer than type, but no exception is raised here
 st.execute(); // exception is thrown when HyperSQL checks the value for update

JDBC parameters can be set with any compatible type, as supported by the JDBC specification. For CLOB and BLOB types, you can use streams, or create instances of BLOB or CLOB before assigning them to the parameters. You can even use CLOB or BLOB objects returned from connections to other RDBMS servers. The Connection.createBlob() and createClob() methods can be used to create the new LOBs. For very large LOBs the stream methods are preferable as they use less memory.

For array parameters, you can use a java.sql.Array object that contains the array elements before assigning to JDBC parameters. The Connection.createArrayOf(...) method can be used to create a new object, or you can use an Array returned from connections to other RDBMS servers. You can also use Java arrays of primitives to assign to the SQL array parameters.

The above also applies to the methods of java.sql.ResultSet that are used for setting values in new and updated rows in updatable ResultSet objects.

The setObject() methods of PreparedStatement and CallableStatement also accept objects of the new types introduced in Java 8 and listed below under JDBC Returned Values. The new Java 8 method, getObject(int columnIndex, Class<T> type), can be used to retrieve the value of an OUT parameter from a CallableStatement.

JDBC and Data Change Statements

Data change statements, also called data manipulation statements (DML) such as INSERT, UPDATE, MERGE can be called with different executeUpdate() methods of java.sql.Statement and java.sql.PreparedStatement. Some of these methods allow you to specify how values for generated columns of the table are returned. These methods are documented in the JavaDoc for org.hsqldb.jdbc.JDBCStatement and org.hsqldb.jdbc.JDBCPreparedStatement. HyperSQL can return not just the generated columns, but any set of columns of the table. You can use this to retrieve the columns values that may be modified by a BEFORE TRIGGER on the table.

JDBC Callable Statement

The JDBC CallableStatement interface is used to call Java or SQL procedures that have been defined in the database. The SQL statement is in the form of CALL procedureName ( ... ) with constant value arguments or with parameter markers. Note that you must use a parameter marker for OUT and INOUT arguments of the procedure you are calling. The OUT arguments should not be set before executing the callable statement.

After executing the statement, you can retrieve the OUT and INOUT parameters with the appropriate getXXX() method.

Procedures can also return one or more result sets. You should call the getResultSet() and getMoreResults() methods to retrieve the result sets one by one.

SQL functions can also return a table. You can call such functions the same way as procedures and retrieve the table as a ResultSet.

JDBC Returned Values

The methods of the JDBC ResultSet interface are used to return values and to convert value to different types as supported by the JDBC specification. Methods of JDBC CallableStatement that have the same signature are used to return values from procedure calls.

When a CLOB and BLOB object is returned from a ResultSet, no data is transferred until the data is read by various methods of java.sql.CLOB and java.sql.BLOB. Data is streamed in large blocks to avoid excessive memory use.

Array objects are returned as instances of java.sql.Array.

Methods added in Java 8 are supported and the behaviour of the getObject(int columnIndex) method for retrieving TIMESTAMP WITH TIME ZONE has changed. In versions before 2.4.0, this method returned a java.sql.Timestamp object. It now returns a java.time.OffsetDateTime object which contains the time zone offset value as well as the other fields of the TIMESTAMP.

A new method, getObject(int columnIndex, Class<T> type) is available. With this method, you specify the required return type. The common types such as String, Integer, byte[] are supported, as well as new types that can be used for DATE, TIME, TIMESTAMP, and INTERVAL values. The table below shows which Java classes are the most appropriate for the specified SQL TYPES. In addition, you can use these Java types for values of SQL type that are not fully matched. For example, java.time.LocalDateTime can be used to retrieve a DATE value.

java.util.UUID = UUID
java.time.LocalDate = DATE
java.sql.Date = DATE
java.time.LocalTime = TIME
java.sql.Time = TIME
java.time.LocalDateTime = TIMESTAMP
java.sql.Timestamp = TIMESTAMP
java.time.OffsetTime = TIME WITH TIME ZONE
java.time.OffsetDateTime = TIMESTAMP WITH TIME ZONE
java.time.Duration = INTERVAL MONTH, INTERVAL YEAR, INTERVAL YEAR TO MONTH
java.time.Period = INTERVAL SECOND, INTERVAL MINUTE, INTERVAL HOUR, INTERVAL DAY, and their range combinations
java.sql.Array = all ARRAY

Cursor Declaration

The DECLARE CURSOR statement is used within an SQL PROCEDURE body. In the current version of HyperSQL, the cursor is used only to return a result set from the procedure. Therefore, the cursor must be declared WITH RETURN and can only be READ ONLY.

DECLARE CURSOR

declare cursor statement

<declare cursor> ::= DECLARE <cursor name>

[ { SENSITIVE | INSENSITIVE | ASENSITIVE } ] [ { SCROLL | NO SCROLL } ]

CURSOR [ { WITH HOLD | WITHOUT HOLD } ] [ { WITH RETURN | WITHOUT RETURN } ]

FOR <query expression>

[ FOR { READ ONLY | UPDATE [ OF <column name list> ] } ]

The query expression is a SELECT statement or similar, and is discussed in the rest of this chapter. In the example below a cursor is declared for a SELECT statement. It is later opened to create the result set. The cursor is specified WITHOUT HOLD, so the result set is not kept after a commit. Use WITH HOLD to keep the result set. Note that you need to declare the cursor WITH RETURN as it is returned by the CallableStatement.

 DECLARE thiscursor SCROLL CURSOR WITHOUT HOLD WITH RETURN FOR SELECT * FROM INFORMATION_SCHEMA.TABLES;
 --
 OPEN thiscursor;

Syntax Elements

The syntax elements that can be used in data access and data change statements are described in this section. The SQL Standard has a very extensive set of definitions for these elements. The BNF definitions given here are sometimes simplified.

Literals

Literals are used to express constant values. The general type of a literal is known by its format. The specific type is based on conventions.

unicode escape elements

unicode escape elements

<Unicode escape specifier> ::= [ UESCAPE <quote><Unicode escape character><quote> ]

<Unicode escape value> ::= <Unicode 4 digit escape value> | <Unicode 6 digit escape value> | <Unicode character escape value>

<Unicode 4 digit escape value> ::= <Unicode escape character><hexit><hexit><hexit><hexit>

<Unicode 6 digit escape value> ::= <Unicode escape character><plus sign> <hexit><hexit><hexit><hexit><hexit><hexit>

<Unicode character escape value> ::= <Unicode escape character><Unicode escape character>

<Unicode escape character> ::= a single character other than a <hexit> (a-f, A-F, 0-9), <plus sign>, <quote>, <double quote>, or <white space>

character literal

character literal

<character string literal> ::= [ <introducer><character set specification> ] <quote> [ <character representation>... ] <quote> [ { <separator> <quote> [ <character representation>... ] <quote> }... ]

<introducer> ::= <underscore>

<character representation> ::= <nonquote character> | <quote symbol>

<nonquote character> ::= any character apart from the quote symbol.

<quote symbol> ::= <quote><quote>

<national character string literal> ::= N <quote> [ <character representation>... ] <quote> [ { <separator> <quote> [ <character representation>... ] <quote> }... ]

<Unicode character string literal> ::= [ <introducer><character set specification> ] U<ampersand><quote> [ <Unicode representation>... ] <quote> [ { <separator> <quote> [ <Unicode representation>... ] <quote> }... ] <Unicode escape specifier>

<Unicode representation> ::= <character representation> | <Unicode escape value>

The type of a character literal is CHARACTER. The length of the string literal is the character length of the type. If the quote character is used in a string, it is represented with two quote characters. Long literals can be divided into multiple quoted strings, separated with a space or end-of-line character.

Unicode literals start with U& and can contain ordinary characters and Unicode escapes. A Unicode escape begins with the backslash ( \ ) character and is followed by four hexadecimal characters which specify the character code. The Unicode escape character can be custom defined for a literal string by adding UESPACE as in one of the examples below.

Example of character literals are given below:

 'a literal'  ' string seperated'  ' into parts'
 'a string''s literal form with quote character'
 U&'Unicode string with Greek delta \0394 and phi \03a6 letters'
 U&'Unicode string with forward slash // as custom escape character'UESCAPE'/'

binary literal

binary literal

<binary string literal> ::= X <quote> [ <space>... ] [ { <hexit> [ <space>... ] <hexit> [ <space>... ] }... ] <quote> [ { <separator> <quote> [ <space>... ] [ { <hexit> [ <space>... ] <hexit> [ <space>... ] }... ] <quote> }... ]

<hexit> ::= <digit> | A | B | C | D | E | F | a | b | c | d | e | f

The type of a binary literal is BINARY. The octet length of the binary literal is the length of the type. Case-insensitive hexadecimal characters are used in the binary string. Each pair of characters in the literal represents a byte in the binary string. Long literals can be divided into multiple quoted strings, separated with a space or end-of-line character.

 X'1abACD34' 'Af'

bit literal

bit literal

<bit string literal> ::= B <quote> [ <bit> ... ] <quote> [ { <separator> <quote> [ <bit>... ] <quote> }... ]

<bit> ::= 0 | 1

The type of a binary literal is BIT. The bit length of the bit literal is the length of the type. Digits 0 and 1 are used to represent the bits. Long literals can be divided into multiple quoted strings, separated with a space or end-of-line character.

 B'10001001' '00010'

numeric literal

numeric literal

<signed numeric literal> ::= [ <sign> ] <unsigned numeric literal>

<unsigned numeric literal> ::= <exact numeric literal> | <approximate numeric literal>

<exact numeric literal> ::= <unsigned integer> [ <period> [ <unsigned integer> ] ] | <period> <unsigned integer>

<sign> ::= <plus sign> | <minus sign>

<approximate numeric literal> ::= <mantissa> E <exponent>

<mantissa> ::= <exact numeric literal>

<exponent> ::= <signed integer>

<signed integer> ::= [ <sign> ] <unsigned integer>

<unsigned integer> ::= <digit>...

The type of an exact numeric literal without a decimal point is INTEGER, BIGINT, or DECIMAL, depending on the value of the literal (the smallest type that can represent the value is the type).

The type of an exact numeric literal with a decimal point is DECIMAL. The precision of a decimal literal is the total number of digits of the literal. The scale of the literal is the total number of digits to the right of the decimal point.

The type of an approximate numeric literal is DOUBLE. An approximate numeric literal always includes the mantissa and exponent, separated by E.

 12
 34.35
 +12E-2

boolean literal

boolean literal

<boolean literal> ::= TRUE | FALSE | UNKNOWN

The boolean literal is one of the specified keywords.

datetime and interval literal

datetime and interval literal

<datetime literal> ::= <date literal> | <time literal> | <timestamp literal>

<date literal> ::= DATE <date string>

<time literal> ::= TIME <time string>

<timestamp literal> ::= TIMESTAMP <timestamp string>

<date string> ::= <quote> <unquoted date string> <quote>

<time string> ::= <quote> <unquoted time string> <quote>

<timestamp string> ::= <quote> <unquoted timestamp string> <quote>

<time zone interval> ::= <sign> <hours value> <colon> <minutes value>

<date value> ::= <years value> <minus sign> <months value> <minus sign> <days value>

<time value> ::= <hours value> <colon> <minutes value> <colon> <seconds value>

<interval literal> ::= INTERVAL [ <sign> ] <interval string> <interval qualifier>

<interval string> ::= <quote> <unquoted interval string> <quote>

<unquoted date string> ::= <date value>

<unquoted time string> ::= <time value> [ <time zone interval> ]

<unquoted timestamp string> ::= <unquoted date string> <space> <unquoted time string>

<unquoted interval string> ::= [ <sign> ] { <year-month literal> | <day-time literal> }

<year-month literal> ::= <years value> [ <minus sign> <months value> ] | <months value>

<day-time literal> ::= <day-time interval> | <time interval>

<day-time interval> ::= <days value> [ <space> <hours value> [ <colon> <minutes value> [ <colon> <seconds value> ] ] ]

<time interval> ::= <hours value> [ <colon> <minutes value> [ <colon> <seconds value> ] ] | <minutes value> [ <colon> <seconds value> ] | <seconds value>

<years value> ::= <datetime value>

<months value> ::= <datetime value>

<days value> ::= <datetime value>

<hours value> ::= <datetime value>

<minutes value> ::= <datetime value>

<seconds value> ::= <seconds integer value> [ <period> [ <seconds fraction> ] ]

<seconds integer value> ::= <unsigned integer>

<seconds fraction> ::= <unsigned integer>

<datetime value> ::= <unsigned integer>

The type of a datetime or interval type is specified in the literal. The fractional second precision is the number of digits in the fractional part of the literal. Details are described in the SQL Language chapter

 DATE '2008-08-08'
 TIME '20:08:08'
 TIMESTAMP '2008-08-08 20:08:08.235'

 INTERVAL '10' DAY
 INTERVAL -'08:08' MINUTE TO SECOND

References, etc.

References are identifier chains, which can be a single identifiers or identifiers chains composed of single identifiers chained together with the period symbol.

identifier chain

identifier chain

<identifier chain> ::= <identifier> [ { <period> <identifier> }... ]

<basic identifier chain> ::= <identifier chain>

A period-separated chain of identifiers. The identifiers in an identifier chain can refer to database objects in a hierarchy. The possible hierarchies are as follows. In each hierarchy, elements from the start or the end can be missing, but the order of elements cannot be changed.

catalog, schema, database object

catalog, schema, table, column

correlation name, column

Examples of identifier chain are given below:

 SELECT MYCATALOG.MYSCHEMA.MYTABLE.MYCOL FROM MYCATALOG.MYSCHEMA.MYTABLE
 SELECT MYTABLE.MYCOL FROM MYSCHEMA.MYTABLE
 DROP TABLE MYCATALOG.MYSCHEMA.MYTABLE CASCADE
 ALTER SEQUENCE MYCATALOG.MYSCHEMA.MYSEQUENCE RESTART WITH 100

column reference

column reference

<column reference> ::= <basic identifier chain> | MODULE <period> <qualified identifier> <period> <column name>

Reference a column or a routine variable.

SQL parameter reference

SQL parameter reference

<SQL parameter reference> ::= <basic identifier chain>

Reference an SQL routine parameter.

contextually typed value specification

contextually typed value specification

<contextually typed value specification> ::= <null specification> | <default specification>

<null specification> ::= NULL

<default specification> ::= DEFAULT

Specify a value whose data type or value is inferred from its context.

DEFAULT is used for assignments to table columns that have a default value, or to table columns that are generated either as an IDENTITY value or as an expression.

NULL can be used only in a context where the type of the value is known. For example, a NULL can be assigned to a column of the table in an INSERT or UPDATE statement, because the type of the column is known. But if NULL is used in a SELECT list, it must be used in a CAST statement.

Value Expression

Value expression is a general name for all expressions that return a value. Different types of expressions are allowed in different contexts.

value expression primary

value expression primary

<value expression primary> ::= <parenthesized value expression> | <nonparenthesized value expression primary>

<parenthesized value expression> ::= <left paren> <value expression> <right paren>

<nonparenthesized value expression primary> ::= <unsigned value specification> | <column reference> | <set function specification> | <scalar subquery> | <case expression> | <cast specification> | <next value expression> | <current value expression> | <routine invocation>

Specify a value that is syntactically self-delimited.

value specification

value specification

<value specification> ::= <literal> | <general value specification>

<unsigned value specification> ::= <unsigned literal> | <general value specification>

<target specification> ::= <host parameter specification> | <SQL parameter reference> | <column reference> | <dynamic parameter specification>

<simple target specification> ::= <host parameter specification> | <SQL parameter reference> | <column reference> | <embedded variable name>

<host parameter specification> ::= <host parameter name> [ <indicator parameter> ]

<dynamic parameter specification> ::= <question mark>

Specify one or more values, host parameters, SQL parameters, dynamic parameters, or host variables.

row value expression

row value expression

<row value expression> ::= <row value special case> | <explicit row value constructor>

<row value predicand> ::= <row value special case> | <row value constructor predicand>

<row value special case> ::= <nonparenthesized value expression primary>

<explicit row value constructor> ::= <left paren> <row value constructor element> <comma> <row value constructor element list> <right paren> |

ROW <left paren> <row value constructor element list> <right paren> | <row subquery>

Specify a row consisting of one or more elements. A comma separated list of expressions, enclosed in brackets, with the optional keyword ROW. In SQL, a row containing a single element can often be used where a single value is expected.

set function specification

set function specification

<set function specification> ::= <aggregate function> | <grouping operation>

<grouping operation> ::= GROUPING <left paren> <column reference> [ { <comma> <column reference> }... ] <right paren>

Specify an integer value formed by bits denoting the presence of the column in the current row of the result of GROUPING SETS. HyperSQL supports <grouping operation> from version 2.5.1.

COALESCE

coalesce expression

<coalesce expression> := COALESCE <left paren> <value expression> { <comma> <value expression> }... <right paren>

Replace null values with another value. The coalesce expression has two or more instances of <value expression>. If the first <value expression> evaluates to a non-null value, it is returned as the result of the coalesce expression. If it is null, the next <value expression> is evaluated and if it evaluates to a non-non value, it is returned, and so on.

The type of the return value of a COALESCE expression is the aggregate type of the types of all the <value expression> instances. Therefore, any value returned is implicitly cast to this type. HyperSQL also features built-in functions with similar functionality.

NULLIF

nullif expression

<nullif expression> := NULLIF <left paren> <value expression> <comma> <value expression> <right paren>

Return NULL if two values are equal. If the result of the first <value expression> is not equal to the result of the second, then it is returned, otherwise NULL is returned. The type of the return value is the type of the first <value expression>.

 SELECT i, NULLIF(n, 'not defined') FROM t

CASE

case specification

<case specification> ::= <simple case> | <searched case>

<simple case> ::= CASE <case operand> <simple when clause>... [ <else clause> ] END

<searched case> ::= CASE <searched when clause>... [ <else clause> ] END

<simple when clause> ::= WHEN <when operand list> THEN <result>

<searched when clause> ::= WHEN <search condition> THEN <result>

<else clause> ::= ELSE <result>

<case operand> ::= <row value predicand> | <overlaps predicate part 1>

<when operand list> ::= <when operand> [ { <comma> <when operand> }... ]

<when operand> ::= <row value predicand> | <comparison predicate part 2> | <between predicate part 2> | <in predicate part 2> | <character like predicate part 2> | <octet like predicate part 2> | <similar predicate part 2> | <regex like predicate part 2> | <null predicate part 2> | <quantified comparison predicate part 2> | <match predicate part 2> | <overlaps predicate part 2> | <distinct predicate part 2>

<result> ::= <result expression> | NULL

<result expression> ::= <value expression>

Specify a conditional value. The result of a case expression is always a value. All the values introduced with THEN must be of the same type or convertible to the same type. The WHEN clause of CASE can be used in two different forms. The first form starts with a variable and the WHEN clauses follow, either as possible values for the variable, or as conditions. The second form does not start with a variable and each WHEN is followed by a self-contained conditional expression which can use any variables.

Some simple examples of the CASE expression are given below. The first two examples return 'Britain', 'Germany', or 'Other country' depending on the value of dial code. The third example uses IN and smaller-than predicates.

 CASE dialcode WHEN 44 THEN 'Britain' WHEN 49 THEN 'Germany' ELSE 'Other country' END
 CASE WHEN dialcode=44 THEN 'Britain' WHEN dialcode=49 THEN 'Germany' WHEN dialcode < 0 THEN 'bad dial code' ELSE 'Other country' END
 CASE dialcode WHEN IN (44,49,30) THEN 'Europe' WHEN IN (86,91,92) THEN 'Asia' WHEN < 0 THEN 'bad dial code' ELSE 'Other continent' END

The case statement can be far more complex and involve several conditions.

CAST

cast specification

<cast specification> ::= CAST <left paren> <cast operand> AS <cast target> <right paren>

<cast operand> ::= <value expression> | <implicitly typed value specification>

<cast target> ::= <domain name> | <data type>

Specify a data conversion. Data conversion takes place automatically among variants of a general type. For example, numeric values are freely converted from one type to another in expressions.

Explicit type conversion is necessary in two cases. One case is to determine the type of a NULL value. The other case is to force conversion for special purposes. Values of data types can be cast to a character type. The exception is BINARY and OTHER types. The result of the cast is the literal expression of the value. Conversely, a value of a character type can be converted to another type if the character value is a literal representation of the value in the target type. Special conversions are possible between numeric and interval types, which are described in the section covering interval types.

The examples below show examples of cast with their result:

 CAST (NULL AS TIMESTAMP)
 CAST ('   199  ' AS INTEGER) = 199
 CAST ('tRue ' AS BOOLEAN) = TRUE
 CAST (INTERVAL '2' DAY AS INTEGER) = 2
 CAST ('1992-04-21' AS DATE) = DATE '1992-04-21'

NEXT VALUE FOR

next value expression

<next value expression> ::= NEXT VALUE FOR <sequence generator name>

Return the next value of a sequence generator. This expression can be used as a select list element in queries, or in assignments to table columns in data change statements. If the expression is used more than once in a single row that is being evaluated, the same value is returned for each invocation. After evaluation of the particular row is complete, the sequence generator will return a different value from the old value. The new value is generated by the sequence generator by adding the increment to the last value it generated. In SQL syntax compatibility modes, variants of this expression in different SQL dialects are supported. In the example below the expression is used in an insert statement:

INSERT INTO MYTABLE(COL1, COL2) VALUES 2, NEXT VALUE FOR MYSEQUENCE

CURRENT VALUE FOR

current value expression

<current value expression> ::= CURRENT VALUE FOR <sequence generator name>

Return the latest value that was returned by the NEXT VALUE FOR expression for a sequence generator in this session. In the example below, the value that was generated by the sequence for the first insert, is reused for the second insert:

 INSERT INTO MYTABLE(COL1, COL2) VALUES 2, NEXT VALUE FOR MYSEQUENCE;
 INSERT INTO CHILDTABLE(COL1, COL2) VALUES 10, CURRENT VALUE FOR MYSEQUENCE;

value expression

value expression

<value expression> ::= <numeric value expression> | <string value expression> | <datetime value expression> | <interval value expression> | <boolean value expression> | <row value expression>

An expression that returns a value. The value can be a single value, or a row consisting more than one value.

numeric value expression

numeric value expression

<numeric value expression> ::= <term> | <numeric value expression> <plus sign> <term> | <numeric value expression> <minus sign> <term>

<term> ::= <factor> | <term> <asterisk> <factor> | <term> <solidus> <factor>

<factor> ::= [ <sign> ] <numeric primary>

<numeric primary> ::= <value expression primary> | <numeric value function>

Specify a numeric value. The BNF indicates that <asterisk> and <solidus> (the operators for multiplication and division) have precedence over <minus sign> and <plus sign>.

numeric value function

numeric value function

<numeric value function> ::= <position expression> | <extract expression> | <length expression> ...

Specify a function yielding a value of type numeric. The supported numeric value functions are listed and described in the Built In Functions chapter.

string value expression

string value expression

<string value expression> ::= <string concatenation> | <string factor>

<string factor> ::= <value expression primary> | <string value function>

<string concatenation> ::= <string value expression> <concatenation operator> <string factor>

<concatenation operator> ::= ||

Specify a character string value, a binary string value, or a bit string value. The BNF indicates that a string value expression can be formed by concatenation of two or more <value expression primary>. The types of the <value expression primary> elements must be compatible, that is, all must be string, or binary or bit string values.

character value function

string value function

<string value function> ::= ...

Specify a function that returns a character string or binary string. The supported character value functions are listed and described in the Built In Functions chapter.

datetime value expression

datetime value expression

<datetime value expression> ::= <datetime term> | <interval value expression> <plus sign> <datetime term> | <datetime value expression> <plus sign> <interval term> | <datetime value expression> <minus sign> <interval term>

<datetime term> ::= <datetime factor>

<datetime factor> ::= <datetime primary> [ <time zone> ]

<datetime primary> ::= <value expression primary> | <datetime value function>

<time zone> ::= AT <time zone specifier>

<time zone specifier> ::= LOCAL | TIME ZONE { <interval primary> | <time zone name> }

Specify a datetime value. Details are described in the SQL Language chapter.

datetime value function

datetime value function

<datetime value function> ::= ...

Specify a function that returns a datetime value. The supported datetime value functions are listed and described in the Built In Functions chapter.

interval term

interval value expression

<interval value expression> ::= <interval term> | <interval value expression 1> <plus sign> <interval term 1> | <interval value expression 1> <minus sign> <interval term 1> | <left paren> <datetime value expression> <minus sign> <datetime term> <right paren> <interval qualifier>

<interval term> ::= <interval factor> | <interval term 2> <asterisk> <factor> | <interval term 2> <solidus> <factor> | <term> <asterisk> <interval factor>

<interval factor> ::= [ <sign> ] <interval primary>

<interval primary> ::= <value expression primary> [ <interval qualifier> ] | <interval value function>

<interval value expression 1> ::= <interval value expression>

<interval term 1> ::= <interval term>

<interval term 2> ::= <interval term>

Specify an interval value. Details are described in the SQL Language chapter.

interval absolute value function

interval value function

<interval value function> ::= <interval absolute value function>

<interval absolute value function> ::= ABS <left paren> <interval value expression> <right paren>

Specify a function that returns the absolute value of an interval. If the interval is negative, it is negated, otherwise the original value is returned.

boolean value expression

boolean value expression

<boolean value expression> ::= <boolean term> | <boolean value expression> OR <boolean term>

<boolean term> ::= <boolean factor> | <boolean term> AND <boolean factor>

<boolean factor> ::= [ NOT ] <boolean test>

<boolean test> ::= <boolean primary> [ IS [ NOT ] <truth value> ]

<truth value> ::= TRUE | FALSE | UNKNOWN

<boolean primary> ::= <predicate> | <boolean predicand>

<boolean predicand> ::= <parenthesized boolean value expression> | <nonparenthesized value expression primary>

<parenthesized boolean value expression> ::= <left paren> <boolean value expression> <right paren>

Specify a boolean value.

Predicates

Predicates are conditions and evaluate to a boolean value. Some predicates have two sides. The left side of the predicate, the <row value predicand>, is the common element of all two-sided predicates. This element is a generalisation of both <value expression>, which is a scalar, and of <explicit row value constructor>, which is a row. The two sides of a predicate can be split in CASE expressions where the <row value predicand> is part of multiple predicates.

In the following example, a column of the table is the left side of two predicates in a CASE expression.

SELECT CASE city WHEN 'Oslo' THEN 'Scandinavia' WHEN IN ('Dallas', 'Boston') THEN 'America' ELSE '?' END FROM customer

The number of fields in all <row value predicand> used in predicates must be the same and the types of the fields in the same position must be compatible for comparison. If either of these conditions does not hold, an exception is raised. The number of fields in a row is called the degree.

In many types of predicates (but not all of them), if the <row value predicand> evaluates to NULL, the result of the predicate is UNKNOWN. If the <row value predicand> has more than one element, and one or more of the fields evaluate to NULL, the result depends on the particular predicate.

comparison predicand

comparison predicate

<comparison predicate> ::= <row value predicand> <comp op> <row value predicand>

<comp op> ::= <equals operator> | <not equals operator> | <less than operator> | <greater than operator> | <less than or equals operator> | <greater than or equals operator>

Specify a comparison of two row values. If either <row value predicand> evaluates to NULL, the result of <comparison predicate> is UNKNOWN. Otherwise, the result is TRUE, FALSE or UNKNOWN.

If the degree of <row value predicand> is larger than one, comparison is performed between each field and the corresponding field in the other <row value predicand> from left to right, one by one.

When comparing two elements, if either field is NULL then the result is UNKNOWN.

For <equals operator>, if the result of comparison is TRUE for all field, the result of the predicate is TRUE. If the result of comparison is FALSE for one field, the result of predicate is FALSE. Otherwise the result is UNKNOWN.

The <not equals operator> is translated to NOT (<row value predicand> = <row value predicand>).

The <less than or equals operator> is translated to (<row value predicand> = <row value predicand>) OR (<row value predicand> < <row value predicand>). The <greater than or equals operator> is translated in a similar way.

For the <less than operator> and <greater than operator>, if two fields at a given position are equal, then comparison continues to the next field. Otherwise, the result of the last performed comparison is returned as the result of the predicate. This means that if the first field is NULL, the result is always UNKNOWN.

The logic that governs NULL values and UNKNOWN result is as follows: Suppose the NULL values were substituted by arbitrary real values. If substitution cannot change the result of the predicate, then the result is TRUE or FALSE, based on the existing non-NULL values, otherwise the result of the predicate is UNKNOWN.

The examples of comparison given below use literals, but the literals actually represent the result of evaluation of some expression.

 ((1, 2, 3, 4) = (1, 2, 3, 4)) IS TRUE
 ((1, 2, 3, 4) = (1, 2, 3, 5)) IS FALSE
 ((1, 2, 3, 4) < (1, 2, 3, 4)) IS FALSE
 ((1, 2, 3, 4) < (1, 2, 3, 5)) IS TRUE
 ((NULL, 1, NULL) = (NULL, 1, NULL)) IS UNKNOWN  
 ((NULL, 1, NULL) = (NULL, 2, NULL)) IS FALSE  
 ((NULL, 1, NULL) <> (NULL, 2, NULL)) IS TRUE  
 ((NULL, 1, 2) <all operators> (NULL, 1, 2)) IS UNKNOWN
 ((1, NULL, ...) < (1, 2, ...)) IS UNKNOWN  
 ((1, NULL, ...) < (2, NULL, ...)) IS TRUE
 ((2, NULL, ...) < (1, NULL, ...)) IS FALSE

BETWEEN

between predicate

<between predicate> ::= <row value predicand> <between predicate part 2>

<between predicate part 2> ::= [ NOT ] BETWEEN [ ASYMMETRIC | SYMMETRIC ] <row value predicand> AND <row value predicand>

Specify a range comparison. The default is ASYMMETRIC. The expression X BETWEEN Y AND Z is equivalent to (X >= Y AND X <= Z). Therefore, if Y > Z, the BETWEEN expression is never true. The expression X BETWEEN SYMMETRIC Y AND Z is equivalent to (X >= Y AND X <= Z) OR (X >= Z AND X <= Y). The expression Z NOT BETWEEN ... is equivalent to NOT (Z BETWEEN ...). If any of the three <row value predicand> evaluates to NULL, the result is UNKNOWN.

IN

in predicate

<in predicate> ::= <row value predicand> [ NOT ] IN <in predicate value>

<in predicate value> ::= <table subquery> | <left paren> <in value list> <right paren>

| <left paren> UNNEST <left paren> <array value expression> <right paren> <right paren>

<in value list> ::= <row value expression> [ { <comma> <row value expression> }... ]

Specify a quantified comparison. The expression X NOT IN Y is equivalent to NOT (X IN Y). The ( <in value list> ) is converted into a table with one or more rows. The expression X IN Y is equivalent to X = ANY Y, which is a <quantified comparison predicate>.

If the <table subquery> returns no rows, the result is FALSE. Otherwise the <row value predicand> is compared one by one with each row of the <table subquery>.

If the comparison is TRUE for at least one row, the result is TRUE. If the comparison is FALSE for all rows, the result is FALSE. Otherwise the result is UNKNOWN.

HyperSQL supports an extension to the SQL Standard to allow an array to be used in the <in predicate value>. This is intended to be used with prepared statements where a variable length array of values can be used as the parameter value for each call. The example below shows how this is used in SQL. The JDBC code must create a new java.sql.Array object that contains the values and set the parameter with this array.

 SELECT * FROM customer WHERE firstname IN ( UNNEST(?) )

 Connection conn;
 PreparedStatement ps;
 // conn and ps are instantiated here
 Array arr = conn.createArrayOf("INTEGER", new Integer[] {1, 2, 3});
 ps.setArray(1, arr);
 ResultSet rs = ps.executeQuery();

LIKE

like predicate

<like predicate> ::= <character like predicate> | <octet like predicate>

<character like predicate> ::= <row value predicand> [ NOT ] LIKE <character pattern> [ ESCAPE <escape character> ]

<character pattern> ::= <character value expression>

<escape character> ::= <character value expression>

<octet like predicate> ::= <row value predicand> [ NOT ] LIKE <octet pattern> [ ESCAPE <escape octet> ]

<octet pattern> ::= <binary value expression>

<escape octet> ::= <binary value expression>

Specify a pattern-match comparison for character or binary strings. The <row value predicand> is always a <string value expression> of character or binary type. The <character pattern> or <octet pattern> is a <string value expression> in which the underscore and percent characters have special meanings. The underscore means match any one character, while the percent means match a sequence of zero or more characters. The <escape character> or <escape octet> is also a <string value expression> that evaluates to a string of exactly one character length. If the underscore or the percent is required as normal characters in the pattern, the specified <escape character> or <escape octet> can be used in the pattern before the underscore or the percent. The <row value predicand> is compared with the <character pattern> and the result of comparison is returned. If any of the expressions in the predicate evaluates to NULL, the result of the predicate is UNKNOWN. The expression A NOT LIKE B is equivalent to NOT (A LIKE B). If the length of the escape is not 1 or it is used in the pattern not immediately before an underscore or a percent character, an exception is raised.

IS NULL

null predicate

<null predicate> ::= <row value predicand> IS [ NOT ] NULL

Specify a test for a null value. The expression X IS NOT NULL is NOT equivalent to NOT (X IS NULL)if the degree of the <row value predicand> is larger than 1. The rules are: If all fields are null, X IS NULL is TRUE and X IS NOT NULL is FALSE. If only some fields are null, both X IS NULL and X IS NOT NULL are FALSE. If all fields are not null, X IS NULL is FALSE and X IS NOT NULL is TRUE.

ALL and ANY

quantified comparison predicate

<quantified comparison predicate> ::= <row value predicand> <comp op> <quantifier> <table subquery>

<quantifier> ::= <all> | <some>

<all> ::= ALL

<some> ::= SOME | ANY

Specify a quantified comparison. For a quantified comparison, the <row value predicand> is compared one by one with each row of the <table sub query>.

If the <table subquery> returns no rows, then if ALL is specified the result is TRUE, but if SOME or ANY is specified the result is FALSE.

If ALL is specified, if the comparison is TRUE for all rows, the result of the predicate is TRUE. If the comparison is FALSE for at least one row, the result is FALSE. Otherwise the result is UNKNOWN.

If SOME or ANY is specified, if the comparison is TRUE for at least one row, the result is TRUE. If the comparison is FALSE for all rows, the result is FALSE. Otherwise the result is UNKNOWN. Note that the IN predicate is equivalent to the SOME or ANY predicate using the <equals operator>.

In the examples below, the date of an invoice is compared to holidays in a given year. In the first example the invoice date must equal one of the holidays, in the second example it must be later than all holidays (later than the last holiday), in the third example it must be on or after some holiday (on or after the first holiday), and in the fourth example, it must be before all holidays (before the first holiday).

 invoice_date = SOME (SELECT holiday_date FROM holidays)
 invoice_date > ALL (SELECT holiday_date FROM holidays)
 invoice_date >= ANY (SELECT holiday_date FROM holidays)
 invoice_date < ALL (SELECT holiday_date FROM holidays)

EXISTS

exists predicate

<exists predicate> ::= EXISTS <table subquery>

Specify a test for a non-empty set. If the evaluation of <table subquery> results in one or more rows, then the expression is TRUE, otherwise FALSE.

UNIQUE

unique predicate

<unique predicate> ::= UNIQUE <table subquery>

Specify a test for the absence of duplicate rows. The result of the test is either TRUE or FALSE (never UNKNOWN). The rows of the <table subquery> that contain one or more NULL values are not considered for this test. If the rest of the rows are distinct from each other, the result of the test is TRUE, otherwise it is FALSE. The distinctness of rows X and Y is tested with the predicate X IS DISTINCT FROM Y.

MATCH

match predicate

<match predicate> ::= <row value predicand> MATCH [ UNIQUE ] [ SIMPLE | PARTIAL | FULL ] <table subquery>

Specify a test for matching rows. The default is MATCH SIMPLE without UNIQUE. The result of the test is either TRUE or FALSE (never UNKNOWN).

The interpretation of NULL values is different from other predicates and quite counter-intuitive. If the <row value predicand> is NULL, or all of its fields are NULL, the result is TRUE.

Otherwise, the <row value predicand> is compared with each row of the <table subquery>.

If SIMPLE is specified, if some field of <row value predicate> is NULL, the result is TRUE. Otherwise if <row value predicate> is equal to one or more rows of <table subquery> the result is TRUE if UNIQUE is not specified, or if UNIQUE is specified and only one row matches. Otherwise the result is FALSE.

If PARTIAL is specified, if the non-null values <row value predicate> are equal to those in one or more rows of <table subquery> the result is TRUE if UNIQUE is not specified, or if UNIQUE is specified and only one row matches. Otherwise the result is FALSE.

If FULL is specified, if some field of <row value predicate> is NULL, the result is FALSE. Otherwise if <row value predicate> is equal to one or more rows of <table subquery> the result is TRUE if UNIQUE is not specified, or if UNIQUE is specified and only one row matches.

Note that MATCH can also be used in FOREIGN KEY constraint definitions. The exact meaning is described in the Schemas and Database Objects chapter.

CONTAINS

contains predicate

<contains predicate> ::= PERIOD <row value predicand> CONTAINS PERIOD <row value predicand>

Specify a test for two datetime periods. Each <row value predicand> must have two fields and the fields together represent a datetime period. So the predicates is always in the form PERIOD (X1, X2) CONTAINS PERIOD (Y1, Y2). Fields in each period are always a datetime value of the same type (DATE or TIMESTAMP).

All datetime values are converted to TIMESTAMP WITH TIME ZONE. The second datetime value must be after the first, otherwise a data error is returned.

If the second period is fully within the first period, the result is TRUE. Otherwise it is false.

If any of the values is NULL, the result is UNDEFINED.

EQUALS

equals predicate

<equals predicate> ::= PERIOD <row value predicand> EQUALS PERIOD <row value predicand>

Specify a test for two datetime periods. The conversions and checks are applied the same way as with the CONTAINS predicate. If the two periods have the same begin and end datetime values the result is TRUE. Otherwise it is false.

If any of the values is NULL, the result is UNDEFINED.

IS DISTINCT

is distinct predicate

<distinct predicate> ::= <row value predicand> IS [ NOT ] DISTINCT FROM <row value predicand>

Specify a test of whether two row values are distinct. The result of the test is either TRUE or FALSE (never UNKNOWN). The degree the two <row value predicand> must be the same. Each field of the first <row value predicand> is compared to the field of the second <row value predicand> at the same position. If one field is NULL and the other is not NULL, or if the elements are NOT equal, then the result of the expression is TRUE. If no comparison result is TRUE, then the result of the predicate is FALSE. The expression X IS NOT DISTINCT FROM Y is equivalent to NOT (X IS DISTINCT FORM Y). The following check returns true if startdate is not equal to enddate. It also returns true if either startdate or enddate is NULL. It returns false in other cases.

 startdate IS DISTINCT FROM enddate

OVERLAPS

overlaps predicate

<overlaps predicate> ::= <row value predicand> OVERLAPS <row value predicand>

<overlaps predicate> ::= PERIOD <row value predicand> OVERLAPS PERIOD <row value predicand>

The OVERLAPS predicate tests for an overlap between two datetime periods. This predicate has two forms. The one without the PERIOD keywords is more relaxed in terms of valid periods.

If there is there is any overlap between the two datetime periods, the result is TRUE. Otherwise it is false.

If any of the values is NULL, the result is UNDEFINED.

In the example below, the period is compared with a week long period ending yesterday.

 (startdate, enddate) OVERLAPS (CURRENT_DATE - 7 DAY, CURRENT_DATE - 1 DAY)

PRECEDES

precedes predicate

<precedes predicate> ::= PERIOD <row value predicand> [ IMMEDIATELY] PRECEDES PERIOD <row value predicand>

Specify a test for two datetime periods. The conversions and checks are applied the same way as with the CONTAINS predicate. If the second period begins after the end of the first period, the result is TRUE. Otherwise it is false.

If IMMEDIATELY is specified, the second period must follow immediately after the end of the first period. This means the end of the first period is the same point of time as the start of the second period.

If any of the values is NULL, the result is UNDEFINED.

SUCCEEDS

succeeds predicate

<succeeds predicate> ::= PERIOD <row value predicand> [ IMMEDIATELY ] SUCCEEDS PEDIOD <row value predicand>

Specify a test for two datetime periods with similar syntax to PRECEDES. If the first period begins after the end of the second period, the result is TRUE. Otherwise it is false.

If IMMEDIATELY is specified, the first period must follow immediately after the end of the second period.

If any of the values is NULL, the result is UNKNOWN.

The example below shows a predicate that returns TRUE.

PERIOD (CURRENT_DATE - 7 DAY, CURRENT_DATE) IMMEDIATELY PRECEDES (CURRENT_DATE, CURRENT_DATE + 7 DAY)

Aggregate Functions

aggregate function

aggregate function

<aggregate function> ::= COUNT <left paren> <asterisk> <right paren> [ <filter clause> ] | <general set function> [ <filter clause> ] | <array aggregate function> [ <filter clause> ]

<general set function> ::= <set function type> <left paren> [ <set quantifier> ] <value expression> <right paren>

<set function type> ::= <computational operation>

<computational operation> ::= AVG | MAX | MIN | ANY_VALUE | SUM | EVERY | ANY | SOME | COUNT | STDDEV_POP | STDDEV_SAMP | VAR_SAMP | VAR_POP | MEDIAN

<set quantifier> ::= DISTINCT | ALL

<filter clause> ::= FILTER <left paren> WHERE <search condition> <right paren>

<array aggregate function> ::= ARRAY_AGG <left paren> [ <set quantifier> ] <value expression> [ <order by clause> ] <right paren>

<listagg set function> ::= LISTAGG <left paren> [ <set quantifier> ] <value expression> <comma> <separator> [ <fetch first clause> ] [ <listagg overflow clause> ] <right paren> <within group specification>

<listagg overflow clause> ::= ON OVERFLOW <overflow behavior>

<overflow behavior> ::= ERROR | TRUNCATE [ <listagg truncation filler> ] [ <listagg count indication> ]

<listagg truncation filler> ::= <character string literal>

<listagg count indication> ::= WITH COUNT | WITHOUT COUNT

<group concat function> ::= GROUP_CONCAT <left paren> [ <set quantifier> ] <value expression> [ <order by clause> ] [ SEPARATOR <separator> ] <right paren>

<string agg function> ::= STRING_AGG <left paren> [ <set quantifier> ] <value expression> [ <order by clause> ] [ <comma> <separator> ] <right paren> [ <within group specification> ]

<separator> ::= <character string literal>

<within group specification> ::= WITHIN GROUP <left paren> ORDER BY <sort specification list> <right paren>

Specify a value computed from a collection of rows.

An aggregate function is used exclusively in a <query specification> and its use transforms a normal query into an aggregate query returning a single row instead of the multiple rows that the original query returns. For example, SELECT acolumn <table expression> is a query that returns the value of acolumn for all the rows the satisfy the given condition. But SELECT MAX(acolumn) <table expression> returns only one row, containing the largest value in that column. The query SELECT COUNT(*) <table expression> returns the count of rows, while SELECT COUNT(acolumn) <table expression> returns the count of rows where acolumn IS NOT NULL.

If the <table expression> is a grouped table (has a GROUP BY clause), the aggregate function returns the result of the COUNT or <computational operation> for each group. In this case the result has the same number of rows as the original grouped query. For example, SELECT SUM(acolumn) <table expression> when <table expression> has a GROUP BY clause, returns the sum of values for acolumn in each group.

If all values are NULL, the aggregate function (except COUNT) returns NULL.

The SUM operations can be performed on numeric and interval expressions only. AVG and MEDIAN can be performed on numeric, interval or datetime expressions. AVG returns the average value, while SUM returns the sum of all values. MEDIAN returns the middle value in the sorted list of values.

MAX and MIN can be performed on all types of expressions and return the minimum or the maximum value.

ANY_VALUE returns a value from a single row of the original query. The filter condition can be used to limit the eligible values.

COUNT(*) returns the count of all values, including nulls, while COUNT(<value expression>) returns the count of non-NULL values. COUNT with DISTINCT also accepts multiple arguments. In this usage the distinct combinations of the arguments are counted. Examples below:

 SELECT COUNT(DISTINCT firstname, lastname) FROM customer
 SELECT COUNT(DISTINCT (firstname, lastname)) FROM customer

The EVERY, ANY and SOME operations can be performed on boolean expressions only. EVERY returns TRUE if all the values are TRUE, otherwise FALSE. ANY and SOME are the same operation and return TRUE if one of the values is TRUE, otherwise it returns FALSE.

The other operations perform the statistical functions STDDEV_POP, STDDEV_SAMP, VAR_SAMP, VAR_POP on numeric values. NULL values are ignored in calculations.

User-defined aggregate functions can be defined and used instead of the built-in aggregate functions. Syntax and examples are given in the SQL-Invoked Routines chapter.

The <filter clause> allows you to add a search condition. When the search condition evaluates to TRUE for a row, the row is included in aggregation. Otherwise the row is not included. In the example below a single query returns two different filtered counts:

 SELECT COUNT(ITEM) FILTER (WHERE GENDER = 'F') AS "FEMALE COUNT", COUNT(ITEM) FILTER (WHERE GENDER = 'M') AS "MALE COUNT" FROM PEOPLE

ARRAY_AGG is different from all other aggregate functions as it does not ignore the NULL values. This set function returns an array that contains all the values, for different rows, of the <value expression>. For example, if the <value expression> is a column reference, the SUM function adds the values for all the row together, while the ARRAY_AGG function adds the value for each row as a separate element of the array. ARRAY_AGG can include an optional <order by clause> which can reference all the available columns of the query, not just the <value expression> that is used as the ARRAY_AGG argument. The <order by clause> can have multiple elements (columns) and each element can include NULLS LAST or DESC qualifiers. The elements of the returned array are sorted according to the <order by clause>,

LISTAGG, GROUP_CONCAT and STRING_AGG are equivalent specialised functions derived from ARRAY_AGG. These functions compute the array in the same way as ARRAY_AGG, remove all the NULL elements, then return a string that is a concatenation of the elements of the array. If <separator> has been specified, it is used to separate the elements of the array, otherwise the comma is used to separate the elements.

LISTAGG is an SQL:2023 function and more flexible than the other two functions. HyperSQL extends the Standard syntax by allowing an optional <fetch first clause> to limit the number of elements. The <within group specification> determines the sort order. The optional <listagg overflow clause> allows an error to be returned or truncates the returned string. The optional <listagg truncation filler> is appended to a truncated result. The optional <listagg count indication> can specify that the count of truncated elements is appended to a truncated result.

The example below shows a grouped query with LISTAGG, ARRAY_AGG and GROUP_CONCAT. The CUSTOMER table that is included in the DatabaseManagerSwing GUI app is the source of the data.

 SELECT LASTNAME, ARRAY_AGG(FIRSTNAME ORDER BY FIRSTNAME) FROM Customer GROUP BY LASTNAME

 LASTNAME  C2                                                         
 --------- ---------------------------------------------------------- 
 Steel     ARRAY['John','John','Laura','Robert']                      
 King      ARRAY['George','George','James','Julia','Robert','Robert'] 
 Sommer    ARRAY['Janet','Robert']                                    

 SELECT LASTNAME, LISTAGG(DISTINCT FIRSTNAME, '; ' FETCH 3 ROWS ONLY ON OVERFLOW TRUNCATE WITH COUNT)
    WITHIN GROUP (ORDER BY FIRSTNAME DESC) FROM Customer GROUP BY LASTNAME

 LASTNAME  C2                                                
 --------- ------------------------------------------------- 
 Steel     Robert; Laura; John                   
 King      Robert; Julia; James; ...(1)         
 Sommer    Robert; Janet
 
 SELECT LASTNAME, GROUP_CONCAT(DISTINCT FIRSTNAME ORDER BY FIRSTNAME DESC SEPARATOR ' * ') FROM Customer  GROUP BY LASTNAME

 LASTNAME  C2                                                
 --------- ------------------------------------------------- 
 Steel     Robert * Laura * John                   
 King      Robert * Julia * James * George         
 Sommer    Robert * Janet

Other Syntax Elements

search condition

search condition

<search condition> ::= <boolean value expression>

Specify a condition that is TRUE, FALSE, or UNKNOWN. A search condition is often a predicate.

PATH

path specification

<path specification> ::= PATH <schema name list>

<schema name list> ::= <schema name> [ { <comma> <schema name> }... ]

Specify an order for searching for a user-defined SQL-invoked routine. This is not currently supported by HyperSQL.

routine invocation

routine invocation

<routine invocation> ::= <routine name> <SQL argument list>

<routine name> ::= [ <schema name> <period> ] <qualified identifier>

<SQL argument list> ::= <left paren> [ <SQL argument> [ { <comma> <SQL argument> }... ] ] <right paren>

<SQL argument> ::= <value expression> | <target specification>

Invoke an SQL-invoked routine. Examples are given in the SQL-Invoked Routines chapter.

COLLATE

collate clause

<collate clause> ::= COLLATE <collation name>

Specify a collation for a column or for an ORDER BY expression. This collation is used for comparing the values of the column in different rows. Comparison can happen during the execution of SELECT, UPDATE or DELETE statements, when a UNIQUE constraint or index is defined on the column, or when the rows are sorted by an ORDER BY clause.

CONSTRAINT

constraint name definition

<constraint name definition> ::= CONSTRAINT <constraint name>

<constraint characteristics> ::= <constraint check time> [ [ NOT ] DEFERRABLE ] | [ NOT ] DEFERRABLE [ <constraint check time> ]

<constraint check time> ::= INITIALLY DEFERRED | INITIALLY IMMEDIATE

Specify the name of a constraint and its characteristics. The deferrable characteristic is an optional element of CONSTRAINT definition, not yet supported by HyperSQL.

Data Access Statements

HyperSQL fully supports all of SQL-92 data access statements, plus most of the additions from SQL:2011. Due to time constraints, the current version of this Guide does not cover the subject fully. You are advised to consult an SQL book such as the O'Reilly title, "SQL and Relational Theory" by C. J. Date.

Database queries are data access statements. The most commonly used data access statement is the SELECT statement, but there are other statements that perform a similar role. Data access statements access tables and return result tables. The returned result tables are falsely called result sets, as they are not necessarily sets of rows, but multisets of rows.

Result tables are formed by performing the following operations on base tables and views. These operations are loosely based on Relational Algebra.

JOIN operations

SET and MULTISET operations

SELECTION

PROJECTION

COMPUTING

COLUMN NAMING

GROUPING and AGGREGATION

SELECTION AFTER GROUPING OR AGGREGATION

SET and MULTISET (COLLECTION) OPERATIONS

ORDERING

SLICING

Conceptually, the operations are performed one by one in the above order if they apply to the given data access statement. In the example below a simple select statement is made more complex by adding various operations.

 CREATE TABLE atable (a INT, b INT, c INT, d INT, e INT, f INT);
 /* in the next SELECT, no join is performed and no further operation takes place */
 SELECT * FROM atable
 /* in the next SELECT, selection is performed by the WHERE clause, with no further action */
 SELECT * FROM atable WHERE a + b = c
 /* in the next SELECT, projection is performed after the other operations */
 SELECT d, e, f FROM atable WHERE a + b = c
 /* in the next SELECT, computation is performed after projection */
 SELECT (d + e) / f FROM atable WHERE a + b = c
 /* in the next two SELECT statements, column naming is performed in different ways*/
 SELECT (a + e) / f AS calc, f AS div FROM atable WHERE a + b = c
 SELECT dcol, ecol, fcol FROM atable(acol, bcol, ccol, dcol, ecol, fcol) WHERE acol + bcol = ccol
 /* in the next SELECT, both grouping and aggregation is performed */
 SELECT d, e, SUM(f) FROM atable GROUP BY d, e
 /* in the next SELECT, selection is performed after grouping and aggregation is performed */
 SELECT d, e, SUM(f) FROM atable GROUP BY d, e HAVING SUM(f) > 10
 /* in the next SELECT, a UNION is performed on two selects from the same table */
 SELECT d, e, f FROM atable WHERE d = 3 UNION SELECT a, b, c FROM atable WHERE a = 30
 /* in the next SELECT, ordering is performed */
 SELECT (a + e) / f AS calc, f AS div FROM atable WHERE a + b = c ORDER BY calc DESC, div NULLS LAST
 /* in the next SELECT, slicing is performed after ordering */
 SELECT * FROM atable WHERE a + b = c ORDER BY a FETCH 5 ROWS ONLY

The following sections discuss various types of tables and operations involved in data access statements.

Select Statement

The SELECT statement itself does not cover all types of data access statements, which may combine multiple SELECT statements. The <query specification> is the most common data access statement and begins with the SELECT keyword.

SELECT STATEMENT

select statement (general)

Users generally refer to the SELECT statement when they mean a <query specification> or <query expression>. If a statement begins with SELECT and has no UNION or other set operations, then it is a <query specification>. Otherwise it is a <query expression>.

Table

In data access statements, a table can be a database table (or view) or an ephemeral table formed for the duration of the query. Some types of table are <table primary> and can participate in joins without the use of extra parentheses. The BNF in the Table Primary section below lists different types of <table primary>:

Tables can also be formed by specifying the values that are contained in them:

<table value constructor> ::= VALUES <row value expression list>

<row value expression list> ::= <table row value expression> [ { <comma> <table row value expression> }... ]

In the example below a table with two rows and 3 columns is constructed out of some values:

 VALUES (12, 14, null), (10, 11, CURRENT_DATE)

When a table is used directly in a UNION or similar operation, the keyword TABLE is used with the name:

<explicit table> ::= TABLE <table or query name>

In the examples below, all rows of the two tables are included in the union. The keyword TABLE is used in the first example. The two examples below are equivalent.

 TABLE atable UNION TABLE anothertable
 SELECT * FROM atable UNION SELECT * FROM anothertable

Subquery

A subquery is simply a query expression in brackets. A query expression is usually a complete SELECT statement and is discussed in the rest of this chapter. A scalar subquery returns one row with one column. A row subquery returns one row with one or more columns. A table subquery returns zero or more rows with one or more columns. The distinction between different forms of subquery is syntactic. Different forms are allowed in different contexts. If a scalar subquery or a row subquery return more than one row, an exception is raised. If a scalar or row subquery returns no row, it is usually treated as returning a NULL. Depending on the context, this has different consequences.

<scalar subquery> ::= <subquery>

<row subquery> ::= <subquery>

<table subquery> ::= <subquery>

<subquery> ::= <left paren> <query expression> <right paren>

Query Specification

A query specification is also known as a SELECT statement. It is the most common form of <derived table> . A <table expression> is a base table, a view or any form of allowed derived table. The SELECT statement performs projection, naming, computing, or aggregation on the rows of the <table expression> .

<query specification> ::= SELECT [ DISTINCT | ALL ] <select list> <table expression>

<select list> ::= <asterisk> | <select sublist> [ { <comma> <select sublist> }... ]

<select sublist> ::= <derived column> | <qualified asterisk>

<qualified asterisk> ::= <asterisked identifier chain> <period> <asterisk>

<asterisked identifier chain> ::= <asterisked identifier> [ { <period> <asterisked identifier> }... ]

<asterisked identifier> ::= <identifier>

<derived column> ::= <value expression> [ <as clause> ]

<as clause> ::= [ AS ] <column name>

The qualifier DISTINCT or ALL apply to the results of the SELECT statement after all other operations have been performed. ALL simply returns the rows, while DISTINCT compares the rows and removes the duplicate ones.

Projection is performed by the <select list>.

A single <asterisk> means all columns of the <table expression> are included, in the same order as they appear in the <table expression>. An asterisk qualified by a table name means all the columns of the qualifier table name are included. If an unqualified asterisk is used, then no other items are allowed in the <select list>. When the <table expression> is the direct result of NATURAL or USING joins, the use of <asterisk> includes the columns used for the join before the other columns. A qualified asterisk does not cover the join columns.

A derived column is a <value expression>, optionally named with the <as clause>. A <value expression> can be many things. Common types include: the name of a column in the <table expression>; an expression based on different columns or constant values; a function call; an aggregate function; a CASE WHEN expression.

Table Expression

A table expression is part of the SELECT statement and consists of the FROM clause with optional other clauses that performs selection (of rows) and grouping from the table(s) in the FROM clause.

<table expression> ::= <from clause> [ <where clause> ] [ <group by clause> ] [ <having clause> ]

<from clause> ::= FROM <table reference> [ { <comma> <table reference> }... ]

<table reference> ::= <table primary> | <joined table>

<table primary> ::= <table or query name> [ <query system time period specification> ] [ [ AS ] <correlation name> [ <left paren> <derived column list> <right paren> ] ]

| <derived table> [ AS ] <correlation name> [ <left paren> <derived column list> <right paren> ]

| <lateral derived table> [ AS ] <correlation name> [ <left paren> <derived column list> <right paren> ]

| <collection derived table> [ AS ] <correlation name> [ <left paren> <derived column list> <right paren> ]

| <table function derived table> [ AS ] <correlation name> [ <left paren> <derived column list> <right paren> ]

| <parenthesized joined table> [ AS ] <correlation name> [ <left paren> <derived column list> <right paren> ]

<where clause> ::= WHERE <boolean value expression>

<group by clause> ::= GROUP BY [ <set quantifier> ] <grouping element> [ { <comma> <grouping element> }... ]

<having clause> ::= HAVING <boolean value expression>

<query system time period specification> ::= FOR SYSTEM_TIME AS OF <point in time 1>

| FOR SYSTEM_TIME BETWEEN [ SYMMETRIC ] <point in time 1> AND <point in time 2>

| FOR SYSTEM_TIME FROM <point in time 1> TO <point in time 2>

The <from clause> contains one or more <table reference> separated by commas. A table reference is often a table or view name or a joined table.

The <where clause> filters the rows of the table in the <from clause> and removes the rows for which the search condition is not TRUE.

Table primary refers to different forms of table reference in the FROM clause.

Table or Query Name

The simplest form of reference is simply a name. This is the name of a table, a view, a transition table in a trigger definition, or a query name specified in the WITH clause of a query expression.

<table or query name> ::= <table name> | <transition table name> | <query name>

System Time Period

The <query system time period specification> can be used after the name of a system-versioned table to query historic data in the table. Without this clause, only the current rows of the table are returned and historic rows are ignored. The first example below shows a list of customers as of a year ago. The second example also shows any changes made to the email column over the previous year.

SELECT firstname, lastname, email FROM customer FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP - 1 YEAR

SELECT DISTINCT firstname, lastname, email FROM customer FOR SYSTEM_TIME FROM CURRENT_TIMESTAMP - 2 YEAR TO CURRENT_TIMESTAMP - 1 YEAR

Derived Table

derived table

A query expression that is enclosed in parentheses and returns from zero to many rows is a <table subquery>. In a <derived table> the query expression is self contained and cannot reference the columns of other table references. This is the traditional and most common form of use of a <table subquery>.

<derived table> ::= <table subquery>

Lateral

LATERAL

When the word LATERAL is used before a <table subquery>, it means the query expression can reference the columns of other table references that go before it.

<lateral derived table> ::= LATERAL <table subquery>

The use of <lateral derived table> completely transforms the way a query is written. For example, the two queries below are equivalent, but with different forms. The query with LATERAL is evaluated separately for each row of the first table that satisfies the WHERE condition. The example below uses the tables that are created and populated in DatabaseManagerSwing with the "Insert test data" menu option. The first query uses a scalar subquery to compute the sum of invoice values for each customer. The second query is equivalent and uses a join with a LATERAL table.

SELECT firstname, lastname, (SELECT SUM(total) FROM invoice WHERE customerid = customer.id) s FROM customer

SELECT firstname, lastname, a.c FROM customer, LATERAL(SELECT SUM(total) FROM invoice WHERE customerid = customer.id) a (c)

UNNEST

UNNEST

UNNEST is similar to LATERAL, but instead of a query expression, one or more expressions that return an array are used. These expressions are converted into a table which has one column for each expression and contains the elements of the array. If WITH ORDINALITY is used, an extra column that contains the index of each element is added to this table. The number of rows in the table equals the length of the largest arrays. The smaller arrays are padded with NULL values. If an <array value expression> evaluates to NULL, an empty array is used in its place. The array expression can contain references to any column of the table references preceding the current table reference.

<collection derived table> ::= UNNEST <left paren> <array value expression>, ... <right paren> [ WITH ORDINALITY ]

The <array value expression> can be the result of a function call. If the arguments of the function call are values from the tables on the left of the UNNEST, then the function is called for each row of table.

In the first example below, UNNEST is used with the built in-function SEQUENCE_ARRAY to build a table containing dates for the last seven days and their ordinal position. In the second example, a select statement returns costs for the last seven days . In the third example, the WITH clause turns the two selects into named subqueries which are used in a SELECT statement that uses a LEFT join.

 SELECT * FROM UNNEST(SEQUENCE_ARRAY(CURRENT_DATE - 7 DAY, CURRENT_DATE - 1 DAY, 1 DAY)) WITH ORDINALITY AS T(D, I)  

 D          I 
 ---------- - 
 2010-07-25 1 
 2010-07-26 2 
 2010-07-27 3 
 2010-07-28 4 
 2010-07-29 5 
 2010-07-30 6 
 2010-07-31 7 

 CREATE TABLE expenses (item_date DATE, cost DECIMAL(8,2))
 --
 SELECT item_date, SUM(cost) AS total FROM expenses WHERE item_date >= CURRENT_DATE - 7 DAY GROUP BY item_date

 ITEM_DATE  TOTAL      
 ---------- ------ 
 2010-07-27 100.12 
 2010-07-29 50.45  

 WITH costs(item_date, total) AS (SELECT item_date, SUM(cost) FROM expenses WHERE item_date >= CURRENT_DATE - 7 DAY GROUP BY item_date),
 dates(d, i) AS (SELECT * FROM UNNEST(SEQUENCE_ARRAY(CURRENT_DATE - 7 DAY, CURRENT_DATE - 1 DAY, 1 DAY)) WITH ORDINALITY)   
 SELECT i, d, total FROM dates LEFT OUTER JOIN costs ON dates.d = costs.item_date

 I D          TOTAL      
 - ---------- ------ 
 1 2010-07-25 (null) 
 2 2010-07-26 (null) 
 3 2010-07-27 100.12 
 4 2010-07-28 (null) 
 5 2010-07-29 50.45  
 6 2010-07-30 (null) 
 7 2010-07-31 (null) 

Table Function Derived Table

Table Function Derived Table

When TABLE is used in this context, the <collection value expression> must be the result of a function call to a built-in function or user-defined function that returns an array or a table. When the function returns an array, this array is converted into a table, similar to the way UNNEST operates. When the function returns a table, the result is a MULTISET and is used as is.

<table function derived table> ::= TABLE <left paren> <collection value expression> <right paren>

Parenthesized Joined Table

A parenthesized joined table is simply a joined table contained in parentheses. Joined tables are discussed below.

<parenthesized joined table> ::= <left paren> <parenthesized joined table> <right paren> | <left paren> <joined table> <right paren>

Column Name List

column name list

The column list that is specified for the table reference must contain names that are unique within the list

<derived column list> ::= <column name list>

<column name list> ::= <column name> [ { <comma> <column name> }... ]

Joined Table

Joins are operators with two table as the operands, resulting in a third table, called joined table. All join operators are evaluated left to right, therefore, with multiple joins, the table resulting from the first join operator becomes an operand of the next join operator. Parentheses can be used to group sequences of joined tables and change the evaluation order. So if more than two tables are joined together with join operators, the end result is also a joined table. There are different types of join, each producing the result table in a different way.

<joined table> ::= <cross join> | <qualified join> | <natural join>

<cross join> ::= <table reference> CROSS JOIN <table factor>

<qualified join> ::= <table reference> | [ <join type> ] JOIN <table reference> <join specification>

<natural join> ::= <table reference> NATURAL [ <join type> ] JOIN <table factor>

<join specification> ::= <join condition> | <named columns join>

<join condition> ::= ON <search condition>

<named columns join> ::= USING <left paren> <join column list> <right paren>

<join type> ::= INNER | <outer join type> [ OUTER ]

<outer join type> ::= LEFT | RIGHT | FULL

<join column list> ::= <column name list>

CROSS JOIN

The simplest form of join is CROSS JOIN. The CROSS JOIN of two tables is a table that has all the columns of the first table, followed by all the columns of the second table, in the original order. Each row of the first table is combined with each row of the second table to fill the rows of the new table. If the rows of each table form a set, then the rows of the CROSS JOIN table form the Cartesian product of the rows of the two table operands.

Conditions are not allowed as part of a cross join, which is simply A CROSS JOIN B. Any conditions in a WHERE clause are later applied to the table resulting from the cross join.

Tables in the FROM CLAUSE separated with commas, are equivalent to cross joins between the tables. Two joined tables separated with a comma, such as A, B, is equivalent to (A) CROSS JOIN (B), which means the joined tables A and B are populated separately before they are joined.

CROSS JOIN is not is not generally very useful, as it returns large result tables unless WHERE conditions are used.

UNION JOIN

The UNION JOIN has limited use in queries. The result table has the same columns as that of CROSS JOIN. Each row of the first table is extended to the right with nulls and added to the new table. Each row of the second table is extended to the left with nulls and added to the new table. The UNION JOIN is expressed as A UNION JOIN B. This should not be confused with A UNION B, which is a set operation. Union join is for special applications and is not commonly used.

JOIN ... ON

The condition join is similar to CROSS JOIN, but a condition is tested for each row of the new table and the row is created only if the condition is true. This form of join is expressed as A JOIN B ON (<search condition>).

Equijoin is a condition join in which the search condition is an equality condition between on or more pairs of columns from the two table. Equijoin is the most commonly used type of join.

SELECT a.*, b.* FROM a INNER JOIN b ON a.col_one = b.col_two

JOIN ... USING

NATURAL JOIN

Joins with USING or NATURAL keywords are similar to an equijoin but they cannot be replaced simply with an equijoin. The new table is formed with the specified or implied shared columns of the two tables, followed by the rest of the columns from each table. In NATURAL JOIN, the shared columns are all the column pairs that have the same name in the first and second table. In JOIN USING, only columns names that are specified by the USING clause are shared. The joins are expressed as A NATURAL JOIN B, and A JOIN B USING (<comma separated column name list>).

The columns of the joined table are formed by the following procedures: In JOIN ... USING the shared columns are added to the joined table in the same order as they appear in the column name list. In NATURAL JOIN the shared columns are added to the joined table in the same order as they appear in the first table. In both forms of join, the non-shared columns of the first table are added in the order they appear in the first table, finally the non-shared columns of the second table are added in the order they appear in the second table.

The type of each shared column of the joined table is based on the type of the columns in the original tables. If the original types are not exactly the same, the type of the shared column is formed by type aggregation. Type aggregations selects a type that can represent values of both aggregated types. Simple type aggregation picks one of the types. For example, SMALLINT and INTEGER, results in INTEGER, or VARCHAR(10) and VARCHAR(20) results in VARCHAR(20). More complex type aggregation inherits properties from both types. For example DECIMAL(8) and DECIMAL (6,2) results in DECIMAL (8,2).

In the examples below, the rows are joined exactly the same way, but the first query contains a.col_two and b.col_two together with all the rest of the columns of both tables, while the second query returns only one copy of col_two.

 SELECT * FROM a INNER JOIN b ON a.col_two = b.col_two
 SELECT * FROM a INNER JOIN b USING (col_two)

OUTER JOIN

LEFT, RIGHT and FULL OUTER JOIN

The three qualifiers can be added to all types of JOIN except CROSS JOIN and UNION JOIN. First the new table is populated with the rows from the original join. If LEFT is specified, all the rows from the first table that did not make it into the new table are extended to the right with nulls and added to the table. If RIGHT is specified, all the rows from the second table that did not make it into the new table are extended to the left with nulls and added to the table. If FULL is specified, the addition of leftover rows is performed from both the first and the second table. These forms are expressed by prefixing the join specification with the given keyword. For example, A LEFT OUTER JOIN B ON (<search condition>) or A NATURAL FULL OUTER JOIN B or A FULL OUTER JOIN B USING (<comma separated column name list>).

 SELECT a.*, b.* FROM a LEFT OUTER JOIN b ON a.col_one = b.col_two

Selection

Despite the name, selection has nothing to do with the list of columns in a SELECT statement. In fact, it refers to the search condition used to limit the rows that from a result table (selection of rows, not columns). In SQL, simple selection is expressed with a WHERE condition appended to a single table or a joined table. In some cases, this method of selection is the only method available; for example in DELETE and UPDATE statements. But when it is possible to perform the selection with join conditions, this is the better method, as it results in a clearer expression of the query.

Projection

Projection is selection of the columns from a simple or joined table to form a result table. Explicit projection is performed in the SELECT statement by specifying the select column list. Some form of projection is also performed in JOIN ... USING and NATURAL JOIN.

The joined table has columns that are formed according to the rules mentioned above. But in many cases, not all the columns are necessary for the intended operation. If the statement is in the form, SELECT * FROM <joined table>, then all the columns of <joined table> are returned. But normally, the columns to be returned are specified after the SELECT keyword, separated from each other with commas.

Computed Columns

In the select list, it is possible to use expressions that reference any columns of <joined table>. Each of these expressions forms a computed column. It is computed for each row of the result table, using the values of the columns of the <joined table> for that row.

Naming

Naming is used to hide the original names of tables or table columns and to replace them with new names in the scope of the query. Naming is also used for defining names for computed columns.

Without explicit naming, the name of a column is a predefined name. If the column is a column of a table, or is a named parameter, the name is of the table column or parameter is used. Otherwise it is generated by the database engine. HyperSQL generates column names such as C1, C2, etc. As generated naming is implementation defined according to the SQL Standard, it is better to explicitly name the computed and derived columns in your applications.

Naming in Joined Table

Naming is performed by adding a new name after a table's real name and by adding a list of column names after the new table name. Both table naming and column naming are optional, but table naming is required for column naming. The expression A [AS] X (<comma separated column name list>) means table A is used in the query expression as table X and its columns are named as in the given list. The original name A, or its original column names, are not visible in the scope of the query. The BNF is given below. The <correlation name> can be the same or different from the name of the table. The <derived column list> is a comma separated list of column names. The degree of this list must be equal to the degree of the table. The column names in the list must be distinct. They can be the same or different from the names of the table's columns.

<table or query name> [ [ AS ] <correlation name> [ <left paren> <derived column list> <right paren> ] ]

In the examples below, the columns of the original tables are named (a, b, c, d, e, f). The two queries are equivalent. In the second query, the table and its columns are renamed and the new names are used in the WHERE clauses:

 CREATE TABLE atable (a INT, b INT, c INT, d INT, e INT, f INT);
 SELECT d, e, f FROM atable WHERE a + b = c
 SELECT x, y, z FROM atable AS t (u, v, w, x, y, z)  WHERE u + v = w

Naming in Select List

Naming in the SELECT list logically takes place after naming in the joined table. The new names for columns are not visible in the immediate query expression or query expression. They become visible in the ORDER BY clause and in the result table that is returned to the user. Or if the query expression is used as a derived table in an enclosing query expression.

In the example below, the query is on the same table but with column renaming in the Select list. The new names are used in the ORDER BY clause:

 SELECT x + y AS xysum, y + z AS yzsum FROM atable AS t (u, v, w, x, y, z)  WHERE u + v = w ORDER BY xysum, yzsum

If the names xysum or yzsum are not used, the computed columns cannot be referenced in the ORDER BY list.

Name Resolution

In a joined table, if a column name appears in tables on both sides then any reference to the name must use the table name in order to specify which table is being referred to.

Grouping Operations

Grouping Operations

Grouping results in the elimination of duplicate rows. A grouping operation is performed after the operations discussed above. A simple form of grouping is performed by the use of DISTINCT after SELECT. This eliminates all the duplicate rows (rows that have the same value in each of their columns when compared to another row). The other form of grouping is performed with the GROUP BY clause. This form is usually used together with aggregation.

GROUP BY

<group by clause> ::= GROUP BY [ <set quantifier> ] <grouping element> [ { <comma> <grouping element> }... ]

<grouping element> ::= <ordinary grouping set> | <rollup list> | <cube list> | <grouping sets specification> | <empty grouping set>

<ordinary grouping set> ::= <grouping column reference> | <left paren> <grouping column reference list> <right paren>

<grouping column reference list> ::= <grouping column reference> [ { <comma> <grouping column reference> }... ]

<grouping column reference> ::= <column reference> [ <collate clause> ]

<rollup list> ::= ROLLUP <left paren> <ordinary grouping set list> <right paren>

<ordinary grouping set list> ::= <ordinary grouping set> [ { <comma> <ordinary grouping set> }... ]

<cube list> ::= CUBE <left paren> <ordinary grouping set list> <right paren>

<grouping sets specification> ::= GROUPING SETS <left paren> <grouping set list> <right paren>

<grouping set list> ::= <grouping set> [ { <comma> <grouping set> }... ]

<grouping set> ::= <ordinary grouping set> | <rollup list> | <cube list> | <grouping sets specification> | <empty grouping set>

<empty grouping set> ::= <left paren> <right paren>

An ordinary <group by clause> is a comma separated list of columns of the table formed by the <from clause> or expressions based on the columns. This is the most common usage and can be described as GROUP BY <column reference> [ { <comma> <grouping column reference> }... ].

When a <group by clause> is used, only the columns used in the <group by clause> or expressions used there, can be used in the <select list>, together with any <aggregate function> on other columns. In other words, the column names or expressions listed in the GROUP BY clause dictate what can be used in the <select list>. After the rows of the table formed by the <from clause> and the <where clause> are finalised, the grouping operation groups together the rows that have the same values in the columns of the <group by clause>. Then any <aggregate function> in the <select list> is performed on each group, and for each group, a row is formed that contains the values of the columns of the <group by clause> and the values returned from each <aggregate function>.

When the type of <column reference> is character string, the <collate clause> can be used to specify the collation used for grouping the rows. For example, a collation that is not case sensitive can be used, or a collation for a different language than the original collation of the column.

The first example below shows a simple GROUP BY, while in the second example, has a HAVING condition.

 CREATE TABLE REVENUE(CHANNEL VARCHAR(20), YEAR INTEGER, COUNTRY VARCHAR(2), PROVINCE VARCHAR(20), SALES INTEGER);
 SELECT CHANNEL, YEAR, COUNTRY, SUM(SALES) FROM REVENUE GROUP BY CHANNEL, YEAR, COUNTRY;
 SELECT CHANNEL, YEAR, COUNTRY, SUM(SALES) FROM REVENUE GROUP BY CHANNEL, YEAR, COUNTRY HAVING SUM(SALES) > 50000;

An extended <group by clause> may comprise elements such as GROUPING SETS, ROLLUP, CUBE and the empty grouping set. These syntax elements are expanded and then simplified into a list of parenthesized column elements, which result in multiple grouping operations. HyperSQL supports all the syntax listed above. The example below uses ROLLUP for grouping.

 SELECT CHANNEL, YEAR, COUNTRY, SUM(SALES) AS S
 FROM REVENUE 
 GROUP BY ROLLUP(CHANNEL, YEAR, COUNTRY);

The ROLLUP is translated into 4 groupings: (channel, year, country), (channel, year), (channel), (). The result set will contain the rows as grouped by (channel, year, country), then rows as grouped by (channel, year) with the country column replaced by null, then rows as grouped by (channel) only, with year and country columns replaced by null, then a single row representing the () empty group with all three columns replaced by null.

CHANNEL      YEAR   COUNTRY S       
------------ ------ ------- ------- 
INTERNET     2009   GB      25000   
INTERNET     2009   US      275000  
INTERNET     2010   GB      45000   
INTERNET     2010   US      500000  
DIRECT SALES 2009   GB      162000  
DIRECT SALES 2009   US      1602500 
DIRECT SALES 2010   GB      181000  
DIRECT SALES 2010   US      1833000 
INTERNET     2009   (null)  300000  
INTERNET     2010   (null)  545000  
DIRECT SALES 2009   (null)  1764500 
DIRECT SALES 2010   (null)  2014000 
INTERNET     (null) (null)  845000  
DIRECT SALES (null) (null)  3778500 
(null)       (null) (null)  4623500 

If CUBE is used instead of ROLLUP, other permutations of the three columns are added to those produced by ROLLUP. These include (channel, country), (year, country), (year) and (country).

The optional <set quantifier> is either ALL or DISTINCT and defaults to ALL. When GROUPING SETS is used and multiple sets are specified and some duplicate groups are created as a result, use of DISTINCT eliminates the duplicate groups.

Note any ordering of the rows returned by GROUP BY in incidental. You need to use ORDER BY for the ordering you require.

HAVING

A <having clause> filters the rows of the table that is formed after applying the <group by clause> using its search condition. The search condition must be an expression based on the expressions in the GROUP BY list or the aggregate functions used.

DISTINCT

SELECT DISTINCT

When the keyword DISTINCT is used after SELECT, it works as a shortcut replacement for a simple GROUP BY clause. The expressions in the SELECT list are used directly as the <group by clause>. The following examples of SELECT DISTINCT and SELECT with GROUP BY are equivalent.

 SELECT DISTINCT d, e + f FROM atable WHERE a + b = c
 SELECT d, e + f FROM atable WHERE a + b = c GROUP BY d, e + f

Aggregation

Aggregation is an operation that computes a single value from the values of a column over several rows. The operation is performed with an aggregate function. The simplest form of aggregation is counting, performed by the COUNT function.

Other common aggregate functions return the maximum, minimum and average value among the values in different rows. Aggregate functions were discussed earlier in this chapter.

Set Operations

Set and Multiset Operations

While join operations generally result in laterally expanded tables, SET and COLLECTION operations are performed on two tables that have the same degree and result in a table of the same degree. The SET operations are UNION, INTERSECT and EXCEPT (difference). When each of these operations is performed on two tables, the collection of rows in each table and in the result is reduced to a set of rows, by eliminating duplicates. The set operations are then performed on the two tables, resulting in the new table which itself is a set of rows. Collection operations are similar but the tables are not reduced to sets before or after the operation and the result is not necessarily a set, but a collection of rows.

The set operations on two tables A and B are: A UNION [DISTINCT] B, A INTERSECT [DISTINCT] B and A EXCEPT [DISTINCT] B. The result table is formed in the following way: The UNION operation adds all the rows from A and B into the new table, but avoids copying duplicate rows. The INTERSECT operation copies only those rows from each table that also exist in the other table, but avoids copying duplicate rows. The EXCEPT operation copies those rows from the first table which do not exist in the second table, but avoids copying duplicate rows.

The collection operations are similar to the set operations, but can return duplicate rows. They are: A UNION ALL B, A INTERSECT ALL B and A EXCEPT ALL B. The UNION ALL operation adds all the rows from A and B into the new table. The INTERSECT operation copies only those rows from each table that also exist in the other table. If n copies of a rows exists in one table, and m copies in the other table, the number of copies in the result table is the smaller of n and m. The EXCEPT operation copies those rows from the first table which do not exist in the second table. If n copies of a row exist in the first table and m copies in the second table the number of copies in the result table is n-m, or if n < m, then zero.

With Clause and Recursive Queries

The optional WITH clause can be used in a query expression. The WITH clause lists one or more named ephemeral tables that can be referenced in the query expression body. The ephemeral tables are created and populated before the rest of the query expression is executed. Their contents do not change during the execution of the <query expression body> that follows the WITH clause.

<with clause> ::= WITH [ RECURSIVE ] <with list>

<with list> ::= <with list element> [ { <comma> <with list element> }... ]

<with list element> ::= <query name> [ <left paren> <with column list> <right paren> ] AS <left paren> <query expression> <right paren>

<with column list> ::= <column name list>

An example of the use of the WITH clause is given above under UNNEST. The <query expression> in the WITH clause is evaluated and its result table can be referenced in the body of the main <query expression body> using the specified <query name>.

When RECURSIVE is used, the <with column list> must be defined. The RECURSIVE keyword changes the way the <with list> is interpreted. The <query expression> contained in the <with list element> must be the UNION or UNION ALL of two <query expression body> elements (VALUES or SELECT statements). A working table is created and the left side SELECT of the UNION is evaluated only once and its result is copied to the working table. This result is also copied to the general result of the <query expression>. Iteration starts after this step. In each iteration, the right side SELECT is evaluated. The contents of the working table is used when the <query name> is referenced in the right side SELECT statement of the UNION. The result of this SELECT is then added to the previous general result of the <query expression> with UNION or UNION ALL. The working table is cleared and filled with the latest result. These operations are repeated again and again, until the latest result is empty and the general result of the <query expression> stops changing. The result of the <with list element> is now fully populated and is later used in the execution of the <query expression body> that follows the WITH clause.

HyperSQL limits recursion to 256 rounds. If this is exceeded, an error is raised.

From version 2.6, HyperSQL extends recursive query processing by allowing the use of RECURSIVE_TABLE to reference the current general result of the <query expression>. This table name can be used in subqueries with an IN predicate in order to reduce and limit the new result created in each iteration.

A trivial example of a recursive query is given below. Note the first column GEN. For example, if each row of the table represents a member of a family of dogs, together with its parent, the first column of the result indicates the calculated generation of each dog, ranging from first to fourth generation.

 CREATE TABLE pptree (pid INT, id INT, name VARCHAR(10));
 INSERT INTO pptree VALUES (NULL, 1, 'dizzi'),(1,2, 'fizzi'),
                           (1,3, 'gizzi'),(2,4, 'kizzi'),
                           (4,5, 'mizzi'),(3,6, 'pizzi'),
                           (3,7, 'tizzi');

 WITH RECURSIVE tree (gen, par, child, name) AS (
   VALUES(1, CAST(null as int), 1, 'dizzi')
   UNION
   SELECT gen + 1, pid, id, name FROM pptree, tree WHERE pid = child
   ) SELECT * FROM tree;

 GEN PAR    CHILD NAME  
 --- ------ ----- ----- 
 1   (null) 1     dizzi 
 2   1      2     fizzi 
 2   1      3     gizzi 
 3   2      4     kizzi 
 3   3      6     pizzi 
 3   3      7     tizzi 
 4   4      5     mizzi 

If recursive queries become complex, they also become very difficult to develop and debug. HyperSQL provides an alternative to this with user-defined SQL functions which return tables. Functions can perform any complex, repetitive task with better control, using loops, variables and, if necessary, recursion.

The query below computes the Fibonacci numbers up to 100 digits. The WHERE clause limits the iteration.

 WITH RECURSIVE Fibonacci(N1, F1, N2, F2) AS (
   VALUES(0, CAST(0 AS DECIMAL(100)), 1, CAST(1 AS DECIMAL(100))
   UNION
   SELECT N1 + 1, F1 + F2, N1 +2, F1 + 2 * F2
   FROM Fibonacci p 
   WHERE N1 < 100
 ) SELECT * FROM Fibonacci

Query Expression

A query expression consists of an optional WITH clause and a query expression body. The optional WITH clause lists one or more named ephemeral tables that can be referenced, just like the database tables in the query expression body.

<query expression> ::= [ <with clause> ] <query expression body>

A query expression body refers to a table formed by using UNION and other set operations. The query expression body is evaluated from left to right and the INTERSECT operator has precedence over the UNION and EXCEPT operators. A simplified BNF is given below:

<query expression body> ::= <query term> | <query expression body> UNION | EXCEPT [ ALL | DISTINCT ] [ <corresponding spec> ] <query term>

<query term> ::= <query primary> | <query term> INTERSECT [ ALL | DISTINCT ] [ <corresponding spec> ] <query term>

<query primary> ::= <simple table> | <left paren> <query expression body> [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ] <right paren>

<simple table> ::= <query specification> | <table value constructor> | <explicit table> <explicit table> ::= TABLE <table or query name>

<corresponding spec> ::= CORRESPONDING [ BY <left paren> <column name list> <right paren> ]

A <query term> and a <query primary> can be a SELECT statement, an <explicit table>, or a <table value constructor>.

The CORRESPONDING clause is optional. If it is not specified, then the <query term> and the <query primary> must have the same number of columns. If CORRESPONDING is specified, the two sides need not have the same number of columns. If no column list is used with CORRESPONDING, then all the column names that are common in the tables on two sides are used in the order in which they appear in the first table. If a columns list is used, it allows you to select only some columns of the tables on the left and right side to create the new table. In the example below the columns named u and v from the two SELECT statements are used to create the UNION table.

 SELECT * FROM atable UNION CORRESPONDING BY (u, v) SELECT * FROM anothertable

The type of each column of the query expression is determined by combining the types of the corresponding columns from the two participating tables.

Ordering

When the rows of the result table have been formed, it is possible to specify the order in which they are returned to the user. The ORDER BY clause is used to specify the columns used for ordering, and whether ascending or descending ordering is used. It can also specify whether NULL values are returned first or last.

 SELECT x + y AS xysum, y + z AS yzsum FROM atable AS t (u, v, w, x, y, z)  WHERE u + v = w ORDER BY xysum NULLS LAST, yzsum NULLS FIRST

The ORDER BY clause specifies one or more <value expressions>. The list of rows is sorted according to the first <value expression>. When some rows are sorted equal then they are sorted according to the next <value expression> and so on.

<order by clause> ::= ORDER BY <sort specification list>

<sort specification list> ::= <sort specification> [ { <comma> <sort specification> }... ]

<sort specification> ::= <value expression> [ <collate clause> ] [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ]

The defaults are ASC and NULLS FIRST. Two database properties SQL NULLS FIRST and SQL NULLS ORDER can be modified to change the default behaviour.

A collation is used for columns of an ORDER BY expression that are of the type CHAR or VARCHAR. If a <collate clause> is not specified then the collation of the column, or the default collation of the database is used.

The default collation for a database is ASCII, with lowercase letters sorted after all uppercase letters. The example below shows the effect of collation on an ordered list.

 -- default collation collation for the database is ASCII
 SELECT id, lastname FROM customer ORDER BY lastname
 ID LASTNAME 
 -- -------- 
 40 Clancy   
 36 King     
 35 White    
 6  king

 -- a language collation is used, it treats king and King as adjacent entries
 SELECT id, lastname FROM customer ORDER BY lastname COLLATE "English"
 ID LASTNAME 
 -- -------- 
 40 Clancy   
 6  king     
 36 King     
 35 White

In the above example, if the LASTNAME column is itself defined in the table definition with COLLATE "English", then the COLLATE clause is not necessary in the ORDER BY expression.

An ORDER BY operation can sometimes be optimised by the engine when it can use the same index for accessing the table data and ordering. Optimisation can happen both with DESC + NULLS LAST and ASC + NULLS FIRST.

sort specification list

sort specification list

<sort specification list> ::= <value expression> [ASC | DESC] [NULLS FIRST | NULLS LAST]

Specify a sort order. A sort operation is performed on the result of a <query expression> or <query specification> and sorts the result according to one or more <value expression>. The <value expression> is usually a single column of the result, but in some cases it can be a column of the <table expression> that is not used in the select list. The default is ASC and NULLS FIRST.

Slicing

A different form of limiting the rows can be performed on the result table after it has been formed according to all the other operations (selection, grouping, ordering etc.). This is specified by the FETCH ... ROWS and OFFSET clauses of a SELECT statement. In this form, the specified OFFSET rows are removed from start of the table, then up to the specified FETCH rows are kept and the rest of the rows are discarded.

<result offset clause> ::= OFFSET <offset row count> { ROW | ROWS }

<fetch first clause> ::= FETCH { FIRST | NEXT } [ <fetch first row count> ] { ROW | ROWS } ONLY [ USING INDEX ]

<limit clause> ::= LIMIT <fetch first row count> [ USING INDEX ]

A slicing operation takes the result set that has been already processed and ordered. It then discards the specified number of rows from the start of the result set and returns the specified number of rows after the discarded rows. The <offset row count> and <fetch first row count> can be constants, dynamic variables, routine parameters, or routine variables. The type of the constants must be INTEGER.

 SELECT a, b FROM atable WHERE d < 5 ORDER BY absum OFFSET 3 FETCH 2 ROWS ONLY 
 SELECT a, b FROM atable WHERE d < 5 ORDER BY absum OFFSET 3 LIMIT 2 /* alternative keyword */ 

When the FETCH keyword is used, the specified number of rows must be at least 1, otherwise an error is returned. This behaviour is consistent with the SQL Standard. When the LIMIT keyword is used, the specified number of rows can be zero, which means return all rows (no LIMIT). In MySQL and PostgreSQL syntax modes, zero limit means no rows (empty result).

If there is an index on all the columns specified in the ORDER BY clause, it is normally used for slicing. In some queries, an index on another column may take precedence because it is used to process the WHERE condition. It is possible to add USING INDEX to the end of the slicing clause to force the use of the index for ordering and slicing, instead of the index for the WHERE condition.

Indexes Used in SELECT and DML Statements

A query expression, for example a SELECT statement, uses indexes for efficient data retrieval. The EXPLAIN PLAN statement lists the indexes used, together with other useful information about the query. EXPLAIN PLAN can also be used for data manipulation statements such as UPDATE.

EXPLAIN PLAN

explain plan

<explain plan> ::= EXPLAIN PLAN FOR <query expression>

For example, EXPLAIN PLAN FOR SELECT * FROM REVENUE WHERE COUNTRY = 'UK' .

Data Change Statements

Delete Statement

DELETE FROM

delete statement: searched

<delete statement: searched> ::= DELETE FROM <target table> [ [ AS ] <correlation name> ] [ WHERE <search condition> ][ LIMIT <fetch first row count> ]

Delete rows of a table. The search condition is a <boolean value expression> that is evaluated for each row of the table. If the condition is true, the row is deleted. If the condition is not specified, all the rows of the table are deleted. In fact, an implicit SELECT is performed in the form of SELECT * FROM <target table> [ WHERE <search condition>] and the selected rows are deleted. When used in JDBC, the number of rows returned by the implicit SELECT is returned as the update count.

If there are FOREIGN KEY constraints on other tables that reference the subject table, and the FOREIGN KEY constraints have referential actions, then rows from those other tables that reference the deleted rows are either deleted, or updated, according to the specified referential actions.

The LIMIT clause, or alternatively the ROWNUM() function in the WHERE clause, can be used to limit the number of rows that are deleted. This is useful when a very large number of rows needs to be deleted. In this situation, you can perform the operation is chunks and commit after each chunk to reduce memory usage and the total time of the operation.

In the second example below the rows that have the maximum value for column A are deleted;

 DELETE FROM T WHERE C > 5
 DELETE FROM T AS TT WHERE TT.A = (SELECT MAX(A) FROM T)

Truncate Statement

TRUNCATE TABLE

truncate table statement

<truncate table statement> ::= TRUNCATE TABLE <target table> [ <identity column restart option> ] [ <truncate options> ]

<identity column restart option> ::= CONTINUE IDENTITY | RESTART IDENTITY

<truncate options> ::= AND COMMIT [ NO CHECK ]

<truncate table versioning statement> ::= TRUNCATE TABLE <target table> VERSIONING TO { TIMESTAMP'YYYY-MM-DD HH:MM:SS' | CURRENT_TIMESTAMP }

Delete all rows of a table without firing its triggers. This statement can only be used on base tables (not views). If the table is referenced in a FOREIGN KEY constraint defined on another table, the statement causes an exception. Triggers defined on the table are not executed with this statement. The default for <identity column restart option> is CONTINUE IDENTITY. This means no change to the IDENTITY sequence of the table. If RESTART IDENTITY is specified, then the sequence is reset to its start value.

TRUNCATE is faster than ordinary DELETE. The TRUNCATE statement is an SQL Standard data change statement; therefore it is performed under transaction control and can be rolled back if the connection is not in the auto-commit mode.

HyperSQL also supports the optional AND COMMIT and NO CHECK options. If AND COMMIT is used, then the transaction is committed with the execution of the TRUNCATE statement. The action cannot be rolled back. If the additional NO CHECK option is also specified, then the TRUNCATE statement is executed even if the table is referenced in a FOREIGN KEY constraint defined on another, non-empty table. This form of TRUNCATE is faster than the default form and does not use much memory.

The <truncate table versioning statement> is for removing old history rows from a system-versioned table. All history rows that expired before the given timestamp are removed. No current row is removed.

TRUNCATE SCHEMA

truncate schema statement

<truncate schema statement> ::= TRUNCATE SCHEMA <target schema> [ <identity column restart option> ] AND COMMIT [ NO CHECK ]

Performs the equivalent of a TRUNCATE TABLE ... AND COMMIT on all the table in the schema. If the additional NO CHECK option is also specified, then the TRUNCATE statement is executed even if any of the tables in the schema is referenced in a FOREIGN KEY constraint defined on a non-empty table in a different schema.

If RESTART IDENTITY is specified, all table IDENTITY sequences and all SEQUENCE objects in the schema are reset to their start values.

Use of this statement requires schema ownership or administrative privileges.

Insert Statement

INSERT INTO

insert statement

<insert statement> ::= INSERT INTO <target table> [ [ AS ] <correlation name> ] <insert columns and source>

<insert columns and source> ::= <from subquery> | <from constructor> | <from default>

<from subquery> ::= [ <left paren> <insert column list> <right paren> ] [ <override clause> ] <query expression>

<from constructor> ::= [ <left paren> <insert column list> <right paren> ] [ <override clause> ] <contextually typed table value constructor>

<override clause> ::= OVERRIDING USER VALUE | OVERRIDING SYSTEM VALUE

<from default> ::= DEFAULT VALUES

<insert column list> ::= <column name list>

Insert new rows in a table. An INSERT statement inserts one or more rows into the table.

The special form, INSERT INTO <target table> DEFAULT VALUES can be used with tables which have a default value for each column.

With the other forms of INSERT, the optional (<insert column list>) specifies to which columns of the table the new values are assigned.

In one form, the inserted values are from a <query expression> and all the rows that are returned by the <query expression> are inserted into the table. If the <query expression> returns no rows, nothing is inserted.

In the other form, a comma separated list of values called <contextually typed table value constructor> is used to insert one or more rows into the table. This list is contextually typed, because the keywords NULL and DEFAULT can be used for the values that are assigned to each column of the table. In this form, the keyword DEFAULT means the default value of the column and can be used only if the target column has a default value or is an IDENTITY or GENERATED column of the table.

The <override clause> must be used when a value is explicitly assigned to a column that has been defined as GENERATED ALWAYS AS IDENTITY. The OVERRIDING SYSTEM VALUE clause must be used to override the sequence value with the user-supplied values. For columns defined as GENERATED BY DEFAULT AS IDENTITY, there is no need to use OVERRIDING when the user provides values to be used for the insert. The OVERRIDING USER VALUE clause can be used with all types of GENERATED columns and means the values provided by the user are simply ignored and new values generated by the system are used instead. Two examples of table definition are given below.

 CREATE TABLE t1 (id INTEGER GENERATED ALWAYS AS IDENTITY(START WITH 100), name VARCHAR(20) PRIMARY KEY)
 CREATE TABLE t2 (id INTEGER GENERATED BY DEFAULT AS IDENTITY(START WITH 1) PRIMARY KEY, name VARCHAR(20))

In both examples below, the value inserted for the id column is 14. In the first example, it is necessary to use OVERRIDING SYSTEM VALUE when inserting into the id column of table t1 because the column has been defined as GENERATED ALWAYS. In the second example, no OVERRIDING clause is required for the insert into table t2 as its id column is defined as GENERATED BY DEFAULT. In both examples, if there is an existing row with that value as primary key, a constraint violation exception is thrown.

 INSERT INTO t1 (id, name) OVERRIDING SYSTEM VALUE VALUES ( 14, 'Test Value')
 INSERT INTO t2 (id, name) VALUES ( 14, 'Test Value')

In the examples below, OVERRIDING USER VALUE is used to let the system generate values for the id column. The generated values override the value 14 in the first example, and the existing values for the id column in the table in the second example.

 INSERT INTO t1 (id, name) OVERRIDING USER VALUE VALUES ( 14, 'Another Test Value')
 INSERT INTO t1 (id, name) OVERRIDING USER VALUE (SELECT * FROM t1)

An array can be inserted into a column of the array type by using literals, by specifying a parameter in a prepared statement, or by an existing array returned by a query expression. The last example below inserts an array.

The rows that are inserted into the table are checked against all the constraints that have been declared on the table. The whole INSERT operation fails if any row fails to inserted due to constraint violation. Examples:

 CREATE TABLE T (A INTEGER GENERATED BY DEFAULT AS IDENTITY, B INTEGER DEFAULT 2)
 INSERT INTO T DEFAULT VALUES /* all columns of T have DEFAULT clauses */
 INSERT INTO T (SELECT * FROM Z) /* table Z has the same columns as table T */
 INSERT INTO T (A,B) VALUES ((1,2),(3,NULL), (DEFAULT,6)) /* three rows are inserted into table T */
 ALTER TABLE T ADD COLUMN D VARCHAR(10) ARRAY /* an ARRAY column is added */
 INSERT INTO T VALUES DEFAULT, 3, ARRAY['hot','cold']

If the table contains an IDENTITY column, the value for this column for the last row inserted by a session can be retrieved using a call to the IDENTITY() function. This call returns the last value inserted by the calling session. When the insert statement is executed with a JDBC Statement or PreparedStatement method, the getGeneratedKeys() method of Statement can be used to retrieve not only the IDENTITY column, but also any GENERATED computed column, or any other column. The getGeneratedKeys() returns a ResultSet with one or more columns. This contains one row per inserted row, and can therefore return all the generated columns for a multi-row insert.

There are three methods of specifying which generated keys should be returned. The first method does not specify the columns of the table. With this method, the returned ResultSet will have a column for each column of the table that is defined as GENERATED ... AS IDENTITY or GENERATED ... AS (<expression>). The two other methods require the user to specify which columns should be returned, either by column indexes, or by column names. With these methods, there is no restriction on which columns of the inserted values to be returned. This is especially useful when some columns have a default clause which is a function, or when there are BEFORE triggers on the table that may provide the inserted value for some of the columns.

In MySQL syntax compatibility mode, HyperSQL supports INSERT IGNORE, REPLACE and ON DUPLICATE KEY UPDATE variations of the INSERT statement.

Update Statement

UPDATE

update statement: searched

<update statement: searched> ::= UPDATE <target table> [ [ AS ] <correlation name> ] SET <set clause list> [ WHERE <search condition> ][ LIMIT <fetch first row count> ]

Update rows of a table. An UPDATE statement selects rows from the <target table> using an implicit SELECT statement formed in the following manner:

SELECT * FROM <target table> [ [ AS ] <correlation name> ] [ WHERE <search condition> ]

Then it applies the SET <set clause list> expression to each selected row.

If the implicit SELECT returns no rows, no update takes place. When used in JDBC, the number of rows returned by the implicit SELECT is returned as the update count.

If there are FOREIGN KEY constraints on other tables that reference the subject table, and the FOREIGN KEY constraints have referential actions, then rows from those other tables that reference the updated rows are updated, according to the specified referential actions.

The rows that are updated are checked against all the constraints that have been declared on the table. The whole UPDATE operation fails if any row violates any constraint.

The LIMIT clause, or alternatively the ROWNUM() function in the WHERE clause, can be used to limit the number of rows that are updated. This is useful when a very large number of rows needs to be updated. In this situation, you can perform the operation is chunks and commit after each chunk to reduce memory usage and the total time of the operation.

set clause list

set clause list

<set clause list> ::= <set clause> [ { <comma> <set clause> }... ]

<set clause> ::= <multiple column assignment> | <set target> <equals operator> <update source>

<multiple column assignment> ::= <set target list> <equals operator> <assigned row>

<set target list> ::= <left paren> <set target> [ { <comma> <set target> }... ] <right paren>

<assigned row> ::= <contextually typed row value expression>

<set target> ::= <column name>

<update source> ::= <value expression> | <contextually typed value specification>

Specify a list of assignments. This is used in UPDATE, MERGE and SET statements to assign values to a scalar or row target.

Apart from setting a whole target to a value, a SET statement can set individual elements of an array to new values. The last example below shows this form of assignment to the array in the column named B.

In the examples given below, UPDATE statements with single and multiple assignments are shown. Note in the third example, a SELECT statement is used to provide the update values for columns A and C, while the update value for column B is given separately. The SELECT statement must return exactly one row . In this example the SELECT statement refers to the existing value for column C in its search condition.

 UPDATE T SET A = 5 WHERE ...
 UPDATE T SET (A, B) = (1, NULL) WHERE ...
 UPDATE T SET (A, C) = (SELECT X, Y FROM U WHERE Z = C), B = 10 WHERE ...
 UPDATE T SET A = 3, B[3] = 'warm'

Merge Statement

MERGE INTO

merge statement

<merge statement> ::= MERGE INTO <target table> [ [ AS ] <merge correlation name> ] USING <table reference> ON <search condition> <merge operation specification>

<merge correlation name> ::= <correlation name>

<merge operation specification> ::= <merge when clause>...

<merge when clause> ::= <merge when matched clause> | <merge when not matched clause>

<merge when matched clause> ::= WHEN MATCHED [ AND <search condition> ] THEN <merge update or delete specification>

<merge update or delete specification> ::= <merge update specification> | <merge delete specification>

<merge when not matched clause> ::= WHEN NOT MATCHED [ AND <search condition> ] THEN <merge insert specification>

<merge update specification> ::= UPDATE SET <set clause list>

<merge delete specification> ::= DELETE

<merge insert specification> ::= INSERT [ <left paren> <insert column list> <right paren> ] [ <override clause> ] VALUES <merge insert value list>

<merge insert value list> ::= <left paren> <merge insert value element> [ { <comma> <merge insert value element> }... ] <right paren>

<merge insert value element> ::= <value expression> | <contextually typed value specification>

Update rows, delete rows or insert new rows into the <target table>. The MERGE statement uses a second table, specified by <table reference>, to determine the rows to be updated or inserted. It is possible to use the statement only to update rows, to delete rows or to insert rows, but usually both update and insert are specified.

The <search condition> matches each row of the <table reference> with each row of the <target table>. If the two rows match then the UPDATE clause is used to update the matching row of the target table. Those rows of <table reference> that have no matching rows are then used to insert new rows into the <target table>. Therefore, a MERGE statement can update or delete between 0 and all the rows of the <target table> and can insert between 0 and the number of the rows in <table reference> into the <target table>. If any row in the <target table> matches more than one row in <table reference> a cardinality error is raised. On the other hand, several rows in the <target table> can match a single row in <table reference> without any error. The constraints and referential actions specified on the database tables are enforced the same way as for an update, a delete and an insert statement.

The optional <search condition> in each WHEN clause can be used to filter (reduce) the rows for the particular action.

HyperSQL allows only one UPDATE, INSERT or DELETE operation in a MERGE statement. If both UPDATE and DELETE are used, the operations are performed in the order they appear in the MERGE statement. If the search conditions of both operations apply to the same row, only the first operation is performed.

The MERGE statement can be used with only the WHEN NOT MATCHED clause as a conditional INSERT statement that inserts a row if no existing rows match a condition.

In the first example below, the table originally contains two rows for different furniture. The <table reference> is the (VALUES(1, 'conference table'), (14, 'sofa'), (5, 'coffee table')) expression, which evaluates to a table with 3 rows. When the x value for a row matches an existing row, then the existing row is updated. When the x value does not match, the row is inserted. Therefore one row of table t is updated from 'dining table' to 'conference table', and two rows are inserted into table t.

The second example uses a SELECT statement as the source of the values for the MERGE.

In the third example, a new row in inserted into the table only when the primary key for the new row does not exist. This example uses parameters and should be executed as a JDBC PreparedStatement. The parameter is cast as INTEGER because the MERGE statement does not determine the types of values in the USING clause.

In the fourth example, existing rows that match are deleted.

 CREATE TABLE t (id INT PRIMARY KEY, description VARCHAR(100))
 INSERT INTO t VALUES (1, 'dining table'), (2, 'deck chair')

 MERGE INTO t USING (VALUES(1, 'conference table'), (14, 'sofa'), (5, 'coffee table')) 
   AS vals(x,y) ON t.id = vals.x
   WHEN MATCHED THEN UPDATE SET t.description = vals.y
   WHEN NOT MATCHED THEN INSERT VALUES vals.x, vals.y

 MERGE INTO t USING (SELECT * FROM tt WHERE acol = 2) AS vals(x,y) ON t.id = vals.x
   WHEN MATCHED THEN UPDATE SET t.description = vals.y
   WHEN NOT MATCHED THEN INSERT VALUES vals.x, vals.y

 MERGE INTO t USING (VALUES(CAST(? AS INT))) AS vals(x) ON t.id = vals.x
   WHEN NOT MATCHED THEN INSERT VALUES vals.x, ?

 MERGE INTO t USING (SELECT * FROM tt WHERE acol = 2) AS vals(x,y) ON t.id = vals.x
   WHEN MATCHED THEN DELETE
   WHEN NOT MATCHED THEN INSERT VALUES vals.x, vals.y

Diagnostics and State

HyperSQL supports some SQL statements, expressions, functions, and Java methods that report on the most recently executed statement.

The IDENTITY() function returns the last inserted identity value for the current session.

The GET DIAGNOSTICS statement is supported to a limited extent. The built-in function DIAGNOSTICS() is an alternative. These are normally used in SQL/PSM routines to check the result of the last data update operation.

GET DIAGNOSTICS

get diagnostics statement

<get diagnostics statement> ::= GET DIAGNOSTICS <simple target value specification> = ROW_COUNT

The <simple target value specification> is a session variable, or a routine variable or OUT parameter.

The keyword ROW_COUNT specifies the row count returned by the last executed statement. For INSERT, UPDATE, DELETE and MERGE statements, this is the number of rows affected by the statement. This is the same value as returned by JDBC executeUpdate() methods. For all other statements, zero is returned.

The value of ROW_COUNT is stored in the specified target.

This statement is often used in CREATE PROCEDURE statements.

In future versions, more options will be supported for diagnostics values.


$Revision: 6787 $