5.0.0
Open Source and Pro Editions
Harper v5.0 is available in two editions: Open Source and Pro. The Open Source edition is free and open source under the Apache 2.0 license, while the Pro edition includes replication, certificate management, and licensing functionality (the source code is available under the Elastic 2.0 License).
The open source edition can be installed with:
npm i -g harper
And the pro edition can be installed with:
npm i -g @harperfast/harper-pro
Naming Updates
Along with new names for Harper packages, Harper now uses the name "harper" more consistently:
- For a fresh installation, the data, configuration, logs, and applications will be installed in the ~/harper directory by default (instead of ~/hdb).
- The configuration file will be named harper-config.yaml by default (harperdb-config.yaml will still be supported for backwards compatibility).
- Applications should import from the harper module instead of harperdb to access the Harper APIs (harperdb will still be supported for backwards compatibility).
RocksDB
Harper 5.0 now uses RocksDB as its default underlying storage engine. RocksDB provides a significantly more robust and reliable storage engine with consistent performance characteristics. RocksDB is well-maintained, has powerful background compaction capabilities, and offers a wide array of tuning options and features.
Harper also introduces its own native transaction log as a write-ahead log (WAL) for RocksDB, which drives ACID compliance in RocksDB and powers real-time delivery of data. This is a highly optimized transaction log designed for high throughput of messaging and data, and it is powered by our new open source rocksdb-js library. The transaction log also uses separate log files for each node origin, for improved performance and reliability with replication.
RocksDB enables robust transactions in Harper, with the complete ability to read and query after writes (and get data from those writes) within a transaction.
Harper will continue to support the existing LMDB storage engine for v5.0, and will continue to load databases created with LMDB.
Switching a database from LMDB to RocksDB requires a database migration. This can be done using replication by creating new nodes and replicating the data from the old nodes to the new nodes.
Current Limitations of RocksDB
Several optimizations for querying, caching, contention monitoring, and write batching have not yet been implemented in v5.0, but are planned for a future release. LMDB exhibits better performance for data that is cached in memory.
Retrieval of past events (often used by non-clean MQTT sessions) is not guaranteed to return every event when concurrent events take place on different nodes. However, the latest message is always guaranteed to be delivered, and sequences of messages from the same node are guaranteed to be delivered in order, which preserves the correctness of most applications and the consistency of retained messages.
Retrieval of past events for subscriptions will not support a count option.
Published messages do not support streamed blobs.
Resource API Updates
Harper v5.0 has upgraded the resource API with several important changes:
- Harper v5.0 specifically encourages the use of static REST methods and provides functionality to make these methods easy to use.
- The target (RequestTarget type) will now be parsed prior to calling static REST methods, for access to any query information in the URL.
- The current request (Request type) will be available in any function through asynchronous context tracking, using the getContext() function available from the harper module.
- The get method will always return a frozen enumerable record. The return value does not include all the methods from the Resource API (like wasLoadedFromSource, getContext, etc.). It only includes the methods getUpdatedTime and getExpiresAt.
- A source resource can return a standard Response object (the resolved return value from a fetch call) in a cache resolving get method, and Harper will automatically handle the response, streaming the body and saving the headers into a cached record.
- A getResponse function is available as part of the standard harper module exports, allowing for easy access to the response object from within a resource method.
- When using the LMDB storage engine, Harper will no longer attempt to cache the resource instances that were used to make a record stored in a write visible in a subsequent read.
- All the default singular REST methods on tables will consistently return a Promise. This includes Table.get(id), Table.put(...), Table.delete(id), Table.patch(...), and Table.invalidate(id).
- The Table resource API now includes a save() method to explicitly save a record to the database within the current transaction, making it visible to subsequent reads/queries.
- RESTful methods can return a "Response-like" object, which is now identified as any object with a headers property.
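The asynchronous context tracking mentioned in the list above can be illustrated with Node's AsyncLocalStorage. This is only a minimal sketch of the general technique, not Harper's implementation: the getContext, handleRequest, and describeCaller functions below are hypothetical stand-ins defined locally, and the request objects are plain objects rather than Harper's Request type.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

// Hypothetical stand-in for the storage a server would keep internally.
const requestStorage = new AsyncLocalStorage();

// Hypothetical stand-in for a getContext() helper: returns the request that
// is active for the current asynchronous call chain.
function getContext() {
  return requestStorage.getStore();
}

// A nested helper that never receives the request as an argument.
async function describeCaller() {
  await new Promise((resolve) => setTimeout(resolve, 5)); // cross an async boundary
  const request = getContext();
  return `${request.method} ${request.url}`;
}

// Hypothetical dispatcher: runs a handler with `request` as the active context.
function handleRequest(request, handler) {
  return requestStorage.run(request, handler);
}

// Two overlapping requests each see only their own context.
async function main() {
  return Promise.all([
    handleRequest({ method: 'GET', url: '/Dog/1' }, describeCaller),
    handleRequest({ method: 'PUT', url: '/Dog/2' }, describeCaller),
  ]);
}

main().then((lines) => console.log(lines.join('\n')));
```

The point of the pattern is that helper functions can reach the current request without it being threaded through every call signature.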
Application Context Separation
Harper now runs each application in its own separate JavaScript "context", which has its own global object, top level variables, and module imports. This provides isolation of applications and access to application-specific configuration data and functionality. These contexts will limit access to certain functionality, including spawning new processes. This functionality can be controlled with configuration options. Specifically, any new processes that will be spawned need to be listed in applications.allowedShellCommands.
Harper will also "freeze" many of the intrinsic objects in the global object, to protect against prototype pollution type attacks and vulnerabilities.
This application context separation will also allow the logger to apply application-specific tagging to log messages, and leverage the application-specific configuration for logging.
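The ideas of a separate global object per application and frozen intrinsics can be sketched with Node's built-in vm module. Whether Harper uses node:vm internally is an assumption not stated here, and createAppContext is a hypothetical helper; the sketch only shows that top level variables stay inside their own context and that a frozen prototype rejects a pollution attempt. It does not model the restrictions on spawning processes.

```javascript
const vm = require('node:vm');

// Hypothetical helper: create a context with its own global object and
// intrinsics, then freeze some intrinsic prototypes inside that context.
function createAppContext(appName) {
  const context = vm.createContext({ appName });
  vm.runInContext(
    'Object.freeze(Object.prototype); Object.freeze(Array.prototype);',
    context
  );
  return context;
}

const appA = createAppContext('app-a');
const appB = createAppContext('app-b');

// A top level variable declared in one context is not visible in the other.
vm.runInContext('var counter = 1;', appA);
const counterTypeInA = vm.runInContext('typeof counter', appA);
const counterTypeInB = vm.runInContext('typeof counter', appB);

// In strict mode, writing to the frozen prototype throws instead of polluting it.
const pollutionOutcome = vm.runInContext(`
  'use strict';
  let outcome;
  try {
    Object.prototype.isAdmin = true;
    outcome = 'polluted';
  } catch (error) {
    outcome = error.name;
  }
  outcome;
`, appA);

console.log({ counterTypeInA, counterTypeInB, pollutionOutcome });
```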
Transitive Replication
Harper now uses an exclusion-based subscription model for replication. This means that replication will request data from nodes, excluding any other known nodes that will be sending data to the current node. With this approach, complex topologies can be created where additional nodes can be added and transitively replicate through other nodes, without explicit knowledge of all the nodes. Previously, replication required direct connections between all nodes, but transitive replication enables topologies with limited connections to proxy data throughout the cluster.
By default, with the replication of the system database, all nodes are reachable and will fully connect. To leverage transitive replication, you can disable the replication of the system database and individually configure the routing of each node (with replication.routes in the configuration).
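As a rough illustration of the exclusion-based model, the sketch below builds one subscription request per direct connection and excludes the other directly connected nodes, since those will deliver their own data. The buildSubscriptions function, the requestFrom and excludeOrigins field names, and the node names are all hypothetical; the actual replication protocol is not described here.

```javascript
// Hypothetical helper: for each direct peer, request all data except data
// originating from the other direct peers (they send it themselves).
function buildSubscriptions(directPeers) {
  return directPeers.map((peer) => ({
    requestFrom: peer,
    excludeOrigins: directPeers.filter((other) => other !== peer),
  }));
}

// A node directly connected to node-a and node-c.
const twoPeerSubscriptions = buildSubscriptions(['node-a', 'node-c']);

// A node connected only to node-c excludes nothing, so data from every other
// node in the cluster reaches it transitively through node-c.
const singlePeerSubscriptions = buildSubscriptions(['node-c']);

console.log(JSON.stringify(twoPeerSubscriptions));
console.log(JSON.stringify(singlePeerSubscriptions));
```

Because a request lists what to leave out rather than what to include, a node added elsewhere in the cluster is picked up without changing the subscriptions of nodes that do not connect to it directly.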
Granular Operations Access for Roles
Roles can now be configured with granular access to operations. A role can be designed to have access to a subset of operations, or reference a named group of operations.
Impersonation
Super users can now impersonate other users, with the ability to specify the user's identity and roles for the impersonated session.
REST API Updates
For errors that occur during the execution of a REST method, Harper now follows RFC 9457 for error responses. This means that the error response will include type, title, and code properties to describe the error.
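A minimal sketch of what such an error body might look like, assuming only the three properties named above. The toProblemDetails function, the example error, and the code and type values are hypothetical; the "about:blank" fallback is the default that RFC 9457 defines for a missing type.

```javascript
// Hypothetical helper: shape an error into a problem-details style object
// carrying the type, title, and code properties.
function toProblemDetails(error) {
  return {
    type: error.type ?? 'about:blank', // RFC 9457 default when no type is given
    title: error.message,
    code: error.code ?? 'UNKNOWN_ERROR', // hypothetical fallback value
  };
}

// Hypothetical error raised inside a REST method.
const notFound = Object.assign(new Error('Record not found'), {
  code: 'NOT_FOUND',
});

const responseBody = JSON.stringify(toProblemDetails(notFound));
console.log(responseBody);
```

A client can then branch on code or type instead of parsing the human-readable title.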
Harper will no longer add a Server header in the response.
Operation API Updates
The update operation, and the upsert operation when applied to an existing record, will now follow the semantics of a patch method. This means they will fully utilize CRDT semantics to resolve conflicting updates across a cluster (separate properties can be independently updated and merged).
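To illustrate why patch semantics let separate properties merge independently, here is a small sketch that uses a per-property last-writer-wins rule. This rule is an assumption chosen for illustration, not a description of Harper's actual conflict resolution, and applyPatch and toPlain are hypothetical helpers. Equal timestamps are not handled.

```javascript
// Each property stores its value with the timestamp of the write that set it.
function applyPatch(record, patch, timestamp) {
  const next = { ...record };
  for (const [property, value] of Object.entries(patch)) {
    const existing = next[property];
    // Keep the newer write for this property; other properties are untouched.
    if (!existing || timestamp > existing.timestamp) {
      next[property] = { value, timestamp };
    }
  }
  return next;
}

// Strip the timestamps to get the plain record a reader would see.
function toPlain(record) {
  return Object.fromEntries(
    Object.entries(record).map(([property, entry]) => [property, entry.value])
  );
}

const base = applyPatch({}, { id: 1, name: 'Rex', age: 3 }, 100);

// Two nodes concurrently update different properties of the same record.
const patchFromNodeA = [{ name: 'Max' }, 201];
const patchFromNodeB = [{ age: 4 }, 202];

// Applying the patches in either order converges on the same record.
const orderAB = applyPatch(applyPatch(base, ...patchFromNodeA), ...patchFromNodeB);
const orderBA = applyPatch(applyPatch(base, ...patchFromNodeB), ...patchFromNodeA);

console.log(toPlain(orderAB), toPlain(orderBA));
```

With full-record replacement, the later write would have overwritten the other node's change; with per-property merging, both changes survive.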
The operations API has fully switched from using hash_attribute to primary_key for all operations. The response from describe_all, describe_database, and describe_table operations will now include the primary_key attribute (instead of the legacy hash_attribute name).
The get_components operation will now return all files including files and directories that begin with a period (although node_modules will still be excluded).
Configuration
The harper-config.yaml file will now use relative paths to the root directory of the Harper data installation.
Please see the migration guide for suggestions on how to migrate from v4 to v5.