Performance
HyperSQL has multiple deployment and persistence options which
influence its
performance.
- The table type, MEMORY, CACHED or TEXT, indicates how the
table row data is stored and accessed by the database engine.
- In-process or server mode indicates how all the database
engine data is accessed by the user's application.
- The transaction model indicates how and when different
sessions (connections) wait for one another.
- Use of BLOBS and CLOBS
Table Type
The performance characteristics of different table types are:
- MEMORY tables provide the highest performance. All data
is in memory and each field of each row is a memory object
that can be read by the database engine without any
conversion. When data is updated, only a log record is written
to disk, usually with a fixed overhead per row.
- CACHED tables have a lower performance compared to MEMORY
tables. Data for this type of table comes from a row cache
(hence the name) which at any time holds a subset of all the
rows in all CACHED tables. Reduction in performance
is due to three reasons.
- First, because the size of the row cache is usually
smaller than the total row count of all the CACHED tables,
rows are frequently purged from the cache and other rows
are read from the disk and converted into memory
objects.
- Second, even if the row cache is large enough to hold
all the rows that are accessed by the user's application
in a period of time, the extra layer of access adds a
small overhead.
- Third, because data updates are both logged and written
to the data file, there is a larger overhead for data
updates.
- TEXT tables have similarities to both CACHED tables and
MEMORY tables. The indexes are held in memory, while the data
is held on disk and cached like CACHED tables. Because the
data is stored in text form as comma separated values (CSV) or
similar formats, reading and writing the data takes longer
than the same operation in binary format. On the other hand,
because the indexes are in memory and no separate logging is
performed, write operations may be faster than CACHED tables.
In-process and Server
The performance characteristics of in-process versus server mode
are:
- In-process access takes place in the same memory space as
the user's application. There is no data conversion or network
transfer overhead involved.
- Server mode usually has a different memory space than the
user's application. The data is converted into a byte stream,
transferred over the network, and then converted back into
objects. This introduces latency plus the extra processing
needed for conversion.
- HyperSQL supports SQL and Java stored procedures, which
allow a whole transaction to be encapsulated in a single SQL
statement. This speeds up access in the server mode as a
transaction can be completed in a single network round trip,
instead of execution of several statements. A single call to a
stored procedure can even return multiple result sets and
return values to the user's application.
Transaction Model
HyperSQL supports MVCC and two phased locking transaction models.
The
performance characteristics of the transaction model are:
- HyperSQL is fully multi threaded. If the vast majority of
operations are read operations, then performance is very high
in all transaction models. Multiple processes, each running in
a different thread, can access the same tables or rows at the
same time and return the results independently to the user's
application(s).
- If there is a significant amount of update operations,
the 2PL lock model performance can be reduced to a single
thread. Because locks are kept both for reads and writes, read
operations on an updated table are delayed until the
writer commits, and similarly write operations are delayed
until the reader commits.
- The multiversion concurrency control (MVCC) model provides
vastly greater performance than the lock based model, as no
read locks are used, while write locks are kept only on the
individual updated rows. Multiple threads can read and update
the database using multiple processor cores. This mode is more
performant than row level locking modes supported by some
other database engine.
- With MVCC and multi processor cores, if there is spare
processing power and many concurrent sessions, the time
overhead of network communications will not affect the overall
performance of the server mode deployments. Each connection
runs in a separate thread and uses the available
processing power for data conversion.
In summary, the fastest performance is typically achieved with the
combination
of
MEMORY tables,
in process access and
MVCC transaction model. If
reduced memory use
is required, some tables can be defined as CACHED tables, while
keeping the
most frequently accessed tables as MEMORY tables. If
server access is required,
then
stored procedures
can be used to reduce the
network round trips.
Blobs and Clobs
HSQLDB is the only SQL open source database that supports a
dedicated LOB
store. Blobs and clobs can be very large and benefit from a
separate store that
avoids mixing their data with row data which is not too large.
Internal
database tables are used for the LOB catalog. Therefore each
access to a LOB
has the overhead of catalog lookup. This overhead is justified
when the stored
LOBs are large. HSQLDB supports long VARCHAR and VARBINARY columns
that can be
used instead of CLOB and BLOB especially when the average lob size
is below 32
KB. These types do not have the LOB catalog overhead.
See the
Performance Tests page
for some
benchmark test results.