Comment on page
HarperDB 4.1 introduces the ability to use worker threads for concurrently handling HTTP requests. Previously this was handled by processes. This shift provides important benefits in terms of better control of traffic delegation with support for optimized load tracking and session affinity, better debuggability, and reduced memory footprint.
This means debugging will be much easier for custom functions. If you install/run HarperDB locally, most modern IDEs like WebStorm and VSCode support worker thread debugging, so you can start HarperDB in your IDE, and set breakpoints in your custom functions and debug them.
The associated routing functionality now includes session affinity support. This can be used to consistently route users to the same thread which can improve caching locality, performance, and fairness. This can be enabled in with the
http.sessionAffinityoption in your configuration.
HarperDB 4.1's NoSQL query handling has been revamped to consistently use iterators, which provide an extremely memory efficient mechanism for directly streaming query results to the network as the query results are computed. This results in faster Time to First Byte (TTFB) (only the first record/value in a query needs to be computed before data can start to be sent), and less memory usage during querying (the entire query result does not need to be stored in memory). These iterators are also available in query results for custom functions and can provide means for custom function code to iteratively access data from the database without loading entire results. This should be a completely transparent upgrade, all HTTP APIs function the same, with the one exception that custom functions need to be aware that they can't access query results by
[index](they should use array methods or for-in loops to handle query results).
4.1 includes configuration options for specifying the location of database storage files. This allows you to specifically locate database directories and files on different volumes for better flexibility and utilization of disks and storage volumes. See the storage configuration and schemas configuration for information on how to configure these locations.
A new operation called
cluster_networkwas added, this operation will ping the cluster and return a list of enmeshed nodes.
Updates to S3 import and export mean that these operations now require the bucket
regionin the request. Also, if referencing a nested object it should be done in the
keyparameter. See examples here.
Due to the AWS SDK v2 reaching end of life support we have updated to v3. This has caused some breaking changes in our operations
- A new attribute
regionwill need to be supplied
bucketattribute can no longer have trailing slashes. Slashes will now need to be in the
Starting HarperDB without any command (just
harperdb) now runs HarperDB like a standard process, in the foreground. This means you can use standard unix tooling for interacting with the process and is conducive for running HarperDB with systemd or any other process management tool. If you wish to have HarperDB launch itself in separate background process (and immediately terminate the shell process), you can do so by running
Internal Tickets completed:
- CORE-609 - Ensure that attribute names are always added to global schema as Strings
- CORE-1549 - Remove fastify-static code from Custom Functions server which auto serves content from "static" folder
- CORE-1655 - Iterator based queries
- CORE-1764 - Fix issue where describe_all operation returns an empty object for non super-users if schema(s) do not yet have table(s)
- CORE-1854 - Switch to using worker threads instead of processes for handling concurrency
- CORE-1877 - Extend the csv_url_load operation to allow for additional headers to be passed to the remote server when the csv is being downloaded
- CORE-1893 - Add last updated timestamp to describe operations
- CORE-1896 - Fix issue where Select * from system.hdb_info returns wrong HDB version number after Instance Upgrade
- CORE-1904 - Fix issue when executing GEOJSON query in SQL
- CORE-1905 - Add HarperDB YAML configuration setting which defines the storage location of NATS streams
- CORE-1906 - Add HarperDB YAML configuration setting defining the storage location of tables.
- CORE-1655 - Streaming binary format serialization
- CORE-1943 - Add configuration option to set mount point for audit tables
- CORE-1921 - Update NATS transaction lifecycle to handle message deduplication in work queue streams.
- CORE-1963 - Update logging for better readability, reduced duplication, and request context information.
- CORE-1968 - In server\nats\natsIngestService.js remove the js_msg.working(); line to improve performance.
- CORE-1976 - Fix error when calling describe_table operation with no schema or table defined in payload.
- CORE-1983 - Fix issue where create_attribute operation does not validate request for required attributes
- CORE-2015 - Remove PM2 logs that get logged in console when starting HDB
- CORE-2048 - systemd script for 4.1
- CORE-2052 - Include thread information in system_information for visibility of threads
- CORE-2061 - Add a better error msg when clustering is enabled without a cluster user set
- CORE-2068 - Create new log rotate logic since pm2 log-rotate no longer used
- CORE-2072 - Update to Node 18.15.0
- CORE-2090 - Upgrade Testing from v4.0.x and v3.x to v4.1.
- CORE-2091 - Run the performance tests
- CORE-2092 - Allow for automatic patch version updates of certain packages
- CORE-2109 - Add verify option to clustering TLS configuration
- CORE-2111 - Update AWS SDK to v3