4.0.0

11/2/2022

Networking & Data Replication (Clustering)

The HarperDB clustering internals have been rewritten and the underlying technology for Clustering has been completely replaced with NATS, an enterprise grade connective technology responsible for addressing, discovery and exchanging of messages that drive the common patterns in distributed systems.

  • CORE-1464, CORE-1470, : Remove SocketCluster dependencies and all code related to them.

  • CORE-1465, CORE-1485, CORE-1537, CORE-1538, CORE-1558, CORE-1583, CORE_1665, CORE-1710, CORE-1801, CORE-1865 :Add nats-server code as dependency, on install of HarperDB download nats-server is possible else fallback to building from source code.

  • CORE-1593, CORE-1761: Add nats.js as project dependency.

  • CORE-1466: Build NATS configs on harperdb run based on HarperDB YAML configuration.

  • CORE-1467, CORE-1508: Launch and manage NATS servers with PM2.

  • CORE-1468, CORE-1507: Create a process which reads the work queue stream and processes transactions.

  • CORE-1481, CORE-1529, CORE-1698, CORE-1502, CORE-1696: On upgrade to 4.0, update pre-existing clustering configurations, create table transaction streams, create work queue stream, update hdb_nodes table, create clustering folder structure, and rebuild self-signed certs.

  • CORE-1494, CORE-1521, CORE-1755: Build out internals to interface with NATS.

  • CORE-1504: Update existing hooks to save transactions to work with NATS.

  • CORE-1514, CORE-1515, CORE-1516, CORE-1527, CORE-1532: Update add_node, update_node, and remove_node operations to no longer need host and port in payload. These operations now manage dynamically sourcing of table level transaction streams between nodes and work queues.

  • CORE-1522: Create NATSReplyService process which handles the receiving NATS based requests from remote instances and sending back appropriate responses.

  • CORE-1471, CORE-1568, CORE-1563, CORE-1534, CORE-1569: Update cluster_status operation.

  • CORE-1611: Update pre-existing transaction log operations to be audit log operations.

  • CORE-1541, CORE-1612, CORE-1613: Create translation log operations which interface with streams.

  • CORE-1668: Update NATS serialization / deserialization to use MessagePack.

  • CORE-1673: Add system_info param to hdb_nodes table and update on add_node and cluster_status.

  • CORE-1477, CORE-1493, CORE-1557, CORE-1596, CORE-1577: Both a full HarperDB restart & just clustering restart call the NATS server with a reload directive to maintain full uptime while servers refresh.

  • CORE-1474:HarperDB install adds clustering folder structure.

  • CORE-1530: Post drop_table HarperDB purges the related transaction stream.

  • CORE-1567: Set NATS config to always use TLS.

  • CORE-1543: Removed the transact_to_cluster attribute from the bulk load operations. Now bulk loads always replicate.

  • CORE-1533, CORE-1556, CORE-1561, CORE-1562, CORE-1564: New operation configure_cluster, this operation enables bulk publishing and subscription of multiple tables to multiple instances of HarperDB.

  • CORE-1535: Create work queue stream on install of HarperDB. This stream receives transactions from remote instances of HarperDB which are then ingested in order.

  • CORE-1551: Create transaction streams on the remote node if they do not exist when performing add_node or update_node.

  • CORE-1594, CORE-1605, CORE-1749, CORE-1767, CORE-1770: Optimize the work queue stream and its consumer to be more performant and validate exact once delivery.

  • CORE-1621, CORE-1692, CORE-1570, CORE-1693: NATS stream names are MD5 hashed to avoid characters that HarperDB allows, but NATS may not.

  • CORE-1762: Add a new optional attribute to add_node and update_node named opt_start_time. This attribute sets a starting time to start synchronizing transactions.

  • CORE-1785: Optimizations and bug fixes in regards to sourcing data from remote instances on HarperDB.

  • CORE-1588: Created new operation set_cluster_routes to enable setting routes for instances of HarperDB to mesh together.

  • CORE-1589: Created new operation get_cluster_routes to allow for retrieval of routes used to connect the instance of HarperDB to the mesh.

  • CORE-1590: Created new operation delete_cluster_routes to allow for removal of routes used to connect the instance of HarperDB to the mesh.

  • CORE-1667: Fix old environment variable CLUSTERING_PORT not mapping to new hub server port.

  • CORE-1609: Allow remove_node to be called when the other node cannot be reached.

  • CORE-1815: Add transaction lock to add_node and update_node to avoid concurrent nats source update bug.

  • CORE-1848: Update stream configs if the node name has been changed in the YAML configuration.

  • CORE-1873: Update add_node and update_node so that it auto-creates schema/table on both local and remote node respectively

Data Storage

We have made improvements to how we store, index, and retrieve data.

  • CORE-1619: Enabled new concurrent flushing technology for improved write performance.

  • CORE-1701: Optimize search performance for search_by_conditions when executing multiple AND conditions.

  • CORE-1652: Encode the values of secondary indices more efficiently for faster access.

  • CORE-1670: Store updated timestamp in lmdb.js' version property.

  • CORE-1651: Enabled multiple value indexing of array values which allows for the ability to search on specific elements in an array more efficiently.

  • CORE-1649, CORE-1659: Large text values (larger than 255 bytes) are no longer stored in separate blob index. Now they are segmented and delimited in the same index to increase search performance.

  • Complex objects and object arrays are no longer stored in a separate index to preserve storage and increase write throughput.

  • CORE-1650, CORE-1724, CORE-1738: Improved internals around interpreting attribute values.

  • CORE-1657: Deferred property decoding allows large objects to be stored, but individual attributes can be accessed (like with get_attributes) without incurring the cost of decoding the entire object.

  • CORE-1658: Enable in-memory caching of records for even faster access to frequently accessed data.

  • CORE-1693: Wrap updates in async transactions to ensure ACID-compliant updates.

  • CORE-1653: Upgrade to 4.0 rebuilds tables to reflect changes made to index improvements.

  • CORE-1753: Removed old node-lmdb dependency.

  • CORE-1787: Freeze objects returned from queries.

  • CORE-1821: Read the WRITE_ASYNC setting which enables LMDB nosync.

Logging

HarperDB has increased logging specificity by breaking out logs based on components logging. There are specific log files each for HarperDB Core, Custom Functions, Hub Server, Leaf Server, and more.

  • CORE-1497: Remove pino and winston dependencies.

  • CORE-1426: All logging is output via stdout and stderr, our default logging is then picked up by PM2 which handles writing out to file.

  • CORE-1431: Improved read_log operation validation.

  • CORE-1433, CORE-1463: Added log rotation.

  • CORE-1553, CORE-1555, CORE-1552, CORE-1554, CORE-1704: Performance gain by only serializing objects and arrays if the log is for the level defined in configuration.

  • CORE-1436: Upgrade to 4.0 updates internals for logging changes.

  • CORE-1428, CORE-1440, CORE-1442, CORE-1434, CORE-1435, CORE-1439, CORE-1482, CORE-1751, CORE-1752: Bug fixes, performance improvements and improved unit tests.

  • CORE-1691: Convert non-PM2 managed log file writes to use Node.js fs.appendFileSync function.

Configuration

HarperDB has updated its configuration from a properties file to YAML.

  • CORE-1448, CORE-1449, CORE-1519, CORE-1587: Upgrade automatically converts the pre-existing settings file to YAML.

  • CORE-1445, CORE-1534, CORE-1444, CORE-1858: Build out new logic to create, update, and interpret the YAML configuration file.

  • Installer has updated prompts to reflect YAML settings.

  • CORE-1447: Create an alias for the configure_cluster operation as set_configuration.

  • CORE-1461, CORE-1462, CORE-1483: Unit test improvements.

  • CORE-1492: Improvements to get_configuration and set_configuration operations.

  • CORE-1503: Modify HarperDB configuration for more granular certificate definition.

  • CORE-1591: Update routes IP param to host and to leaf config in harperdb.conf

  • CORE-1519: Fix issue when switching between old and new versions of HarperDB we are getting the config parameter is undefined error on npm install.

Broad NodeJS and Platform Support

  • CORE-1624: HarperDB can now run on multiple versions of NodeJS, from v14 to v19. We primarily test on v18, so that is the preferred version.

Windows 10 and 11

  • CORE-1088: HarperDB now runs natively on Windows 10 and 11 without the need to run in a container or installed in WSL. Windows is only intended for evaluation and development purposes, not for production work loads.

Extra Changes and Bug Fixes

  • CORE-1520: Refactor installer to remove all waterfall code and update to use Promises.

  • CORE-1573: Stop the PM2 daemon and any logging processes when stopping hdb.

  • CORE-1586: When HarperDB is running in foreground stop any additional logging processes from being spawned.

  • CORE-1626: Update docker file to accommodate new harperdb.conf file.

  • CORE-1592, CORE-1526, CORE-1660, CORE-1646, CORE-1640, CORE-1689, CORE-1711, CORE-1601, CORE-1726, CORE-1728, CORE-1736, CORE-1735, CORE-1745, CORE-1729, CORE-1748, CORE-1644, CORE-1750, CORE-1757, CORE-1727, CORE-1740, CORE-1730, CORE-1777, CORE-1778, CORE-1782, CORE-1775, CORE-1771, CORE-1774, CORE-1759, CORE-1772, CORE-1861, CORE-1862, CORE-1863, CORE-1870, CORE-1869:Changes for CI/CD pipeline and integration tests.

  • CORE-1661: Fixed issue where old boot properties file caused an error when attempting to install 4.0.0.

  • CORE-1697, CORE-1814, CORE-1855: Upgrade fastify dependency to new major version 4.

  • CORE-1629: Jobs are now running as processes managed by the PM2 daemon.

  • CORE-1733: Update LICENSE to reflect our EULA on our site.

  • CORE-1606: Enable Custom Functions by default.

  • CORE-1714: Include pre-built binaries for most common platforms (darwin-arm64, darwin-x64, linux-arm64, linux-x64, win32-x64).

  • CORE-1628: Fix issue where setting license through environment variable not working.

  • CORE-1602, CORE-1760, CORE-1838, CORE-1839, CORE-1847, CORE-1773: HarperDB Docker container improvements.

  • CORE-1706: Add support for encoding HTTP responses with MessagePack.

  • CORE-1709: Improve the way lmdb.js dependencies are installed.

  • CORE-1758: Remove/update unnecessary HTTP headers.

  • CORE-1756: On npm install and harperdb install change the node version check from an error to a warning if the installed Node.js version does not match our preferred version.

  • CORE-1791: Optimizations to authenticated user caching.

  • CORE-1794: Update README to discuss Windows support & Node.js versions

  • CORE-1837: Fix issue where Custom Function directory was not being created on install.

  • CORE-1742: Add more validation to audit log - check schema/table exists and log is enabled.

  • CORE-1768: Fix issue where when running in foreground HarperDB process is not stopping on harperdb stop.

  • CORE-1864: Fix to semver checks on upgrade.

  • CORE-1850: Fix issue where a cluster_user type role could not be altered.

Last updated