Getting started with HarperDB is easy and fast.
The quickest way to get up and running with HarperDB is with HarperDB Cloud, our database-as-a-service offering, which this guide will utilize.
Before you can start using HarperDB you need to set up an instance. Note, if you would prefer to install HarperDB locally, check out the installation guides including Linux, Mac, and many other options.
HarperDB Cloud instance provisioning typically takes 5-15 minutes. You will receive an email notification when your instance is ready.
Now that you have a HarperDB instance, you can do pretty much everything you’d like through the Studio. This section links to appropriate articles to get you started interacting with your data.
Load CSV data (Here’s a sample CSV of the HarperDB team’s dogs)
Complete HarperDB API documentation is available at api.harperdb.io. The HarperDB Studio features an example code builder that generates API calls in the programming language of your choice. For example purposes, a basic cURL command is shown below to create a schema called dev.
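A sketch of that command (the instance URL is a placeholder for your own instance, and the Authorization value is the Base64 encoding of your own username:password):

curl --location --request POST 'https://instance-subdomain.harperdbcloud.com' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic <base64 of username:password>' \
--data-raw '{
    "operation": "create_schema",
    "schema": "dev"
}'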
Breaking it down, there are only a few requirements for interacting with HarperDB:
Using the HTTP POST method.
Providing the URL of the HarperDB instance.
Providing the Authorization header (more on using Basic authentication).
Providing the Content-Type header.
Providing a JSON body with the desired operation and any additional operation properties (shown in the --data-raw parameter). This is the only parameter that needs to be changed to execute alternative operations on HarperDB.
HarperDB video tutorials are available within the HarperDB Studio. HarperDB and the HarperDB Studio are constantly changing; as such, there may be small discrepancies in UI/UX.
This documentation contains information for installing HarperDB locally. Note that if you’d like to get up and running quickly, you can try a managed instance with HarperDB Cloud. HarperDB is a cross-platform database; we recommend Linux for production use, but HarperDB can run on Windows and Mac as well, for development purposes. Installation is usually very simple and just takes a few steps, but there are a few different options documented here.
HarperDB runs on Node.js, so if you do not have it installed, you need to do that first (if you already have it installed, you can skip ahead to installing HarperDB itself). Node.js can be downloaded and installed from their site. For Linux and Mac, we recommend installing and managing Node versions with NVM, which has its own installation instructions, but generally NVM can be installed with:
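For example (the version number in the install URL is illustrative; check the nvm project for the current release):

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash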
Then log out and log back in, and install Node.js using nvm. We recommend the LTS release, but we support all currently maintained Node versions (currently version 14 and newer; make sure to always use the latest minor/patch release for your major version):
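For example, to install and use the current LTS release:

nvm install --lts
nvm use --lts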
Then you can install HarperDB with NPM and start it:
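A minimal sketch, assuming the install and start commands described in the Command Line Interface guide:

npm install -g harperdb
harperdb install    # interactive setup; HarperDB starts automatically when it completes
harperdb start      # use later to start a stopped instance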
HarperDB will automatically start after installation.
If you are setting up a production server on Linux, we have much more extensive documentation on how to configure volumes for database storage, set up a systemd script, and configure your operating system for use as a database server in our Linux installation guide.
If you would like to run HarperDB in Docker, install Docker Desktop on your Mac or Windows computer. Otherwise, install the Docker Engine on your Linux server.
Once Docker Desktop or Docker Engine is installed, visit our Docker Hub page for information and examples on how to run a HarperDB container.
If you need to install HarperDB on a device that doesn't have an Internet connection, you can choose your version and download the npm package and install it directly (you’ll still need Node.js and NPM):
Once you’ve downloaded the .tgz file, run the following command from the directory where you’ve placed it:
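For example (substitute the file name of the version you downloaded):

npm install -g harperdb-X.Y.Z.tgz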
For more information visit the HarperDB Command Line Interface guide.
HarperDB comes with binaries for standard AMD64/x64 or ARM64 CPU architectures on Linux, Windows (x64 only), and Mac (including Apple Silicon). However, if you are installing on a less common platform (Alpine, for example), you will need to ensure that you have build tools installed so the installation process can compile the binaries (compilation is triggered automatically), including:
Go: version 1.19.1
GCC
Make
Python v3.7, v3.8, v3.9, or v3.10
HarperDB's documentation covers installation, getting started, APIs, security, and much more. Browse the topics at left, or choose one of the commonly used documentation sections below.
Start at the HarperDB Studio sign up page.
Provide the following information:
First Name
Last Name
Email Address
Subdomain
Part of the URL that will be used to identify your HarperDB Cloud Instances. For example, with subdomain “demo” and instance name “c1” the instance URL would be: https://c1-demo.harperdbcloud.com.
Coupon Code (optional)
Review the Privacy Policy and Terms of Service.
Click the sign up for free button.
You will be taken to a new screen to add an account password. Enter your password. Passwords must be a minimum of 8 characters with at least 1 lower case character, 1 upper case character, 1 number, and 1 special character.
Click the add account password button.
You will receive a Studio welcome email confirming your registration.
Note: Your email address will be used as your username and cannot be changed.
HarperDB Studio is the web-based GUI for HarperDB. Studio enables you to administer, navigate, and monitor all of your HarperDB instances in a simple, user friendly interface without any knowledge of the underlying HarperDB API. It’s free to sign up, so get started today!
While HarperDB Studio is web based and hosted by us, all database interactions are performed on the HarperDB instance the studio is connected to. The HarperDB Studio loads in your browser, at which point you log in to your HarperDB instances. Credentials are stored in your browser cache and are not transmitted back to HarperDB. All database interactions are made via the HarperDB Operations API directly from your browser to your instance.
HarperDB Studio enables users to manage both HarperDB Cloud instances and privately hosted instances all from a single UI. All HarperDB instances feature identical behavior whether they are hosted by us or by you.
HarperDB Studio resources are available regardless of whether or not you are logged in.
The HarperDB Marketplace is a collection of SDKs and connectors that enable developers to expand upon HarperDB for quick and easy solution development. Extensions are built and supported by the HarperDB Community. Each extension is hosted on the appropriate package manager or host.
To download a Marketplace extension:
Navigate to the HarperDB Marketplace page.
Identify the extension you would like to use.
Click the link to the package.
Follow the extension’s instructions to proceed.
You can submit your rating for each extension by clicking on the stars.
HarperDB offers standard drivers to connect real-time HarperDB data with BI, analytics, reporting and data visualization technologies. Drivers are built and maintained by CData Software.
To download a driver:
Navigate to the HarperDB Drivers page.
Identify the driver you would like to use.
Click the download link.
For additional instructions, visit the support link on the driver card.
HarperDB offers video tutorials available in the Studio on the HarperDB Tutorials page as well as on our YouTube channel. The HarperDB Studio is changing all the time; as a result, the videos may not include all of the current Studio features.
The code examples page offers example code for many different programming languages. These samples will include a placeholder for your authorization token. Full code examples with the authorization token prepopulated are available within individual instance pages.
To log into your existing HarperDB Studio account:
Navigate to the HarperDB Studio.
Enter your email address.
Enter your password.
Click sign in.
To reset a forgotten password:
Navigate to the HarperDB Studio password reset page.
Enter your email address.
Click send password reset email.
If the account exists, you will receive an email with a temporary password.
Navigate back to the HarperDB Studio login page.
Enter your email address.
Enter your temporary password.
Click sign in.
You will be taken to a new screen to reset your account password. Enter your new password. Passwords must be a minimum of 8 characters with at least 1 lower case character, 1 upper case character, 1 number, and 1 special character.
Click the add account password button.
If you are already logged into the Studio, you can change your password through the user interface.
Navigate to the HarperDB Studio profile page.
In the password section, enter:
Current password.
New password.
New password again (for verification).
Click the Update Password button.
If you wish to install locally or already have a configured server, see the basic Installation Guide.
The following is a recommended way to configure Linux and install HarperDB. These instructions should work reasonably well for any public cloud or on-premises Linux instance.
These instructions assume that the following has already been completed:
Linux is installed
Basic networking is configured
A non-root user account dedicated to HarperDB with sudo privileges exists
An additional volume for storing HarperDB files is attached to the Linux instance
Traffic to ports 9925 (HarperDB Operations API), 9926 (HarperDB Custom Functions), and 9932 (HarperDB Clustering) is permitted
For this example, we will use an AWS Ubuntu Server 22.04 LTS m5.large EC2 Instance with an additional General Purpose SSD EBS volume and the default “ubuntu” user account.
Logical Volume Manager (LVM) can be used to stripe multiple disks together to form a single logical volume. If striping disks together is not a requirement, skip these steps.
Find disk that already has a partition
Create array of free disks
Get quantity of free disks
Construct pvcreate command
Initialize disks for use by LVM
Create volume group
Create logical volume
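A sketch of those steps, assuming the two free disks are /dev/nvme1n1 and /dev/nvme2n1 (adjust the device names and stripe count to match your instance):

lsblk                                             # identify the disk that already has a partition (the OS disk) and the free disks
sudo pvcreate /dev/nvme1n1 /dev/nvme2n1           # initialize the free disks for use by LVM
sudo vgcreate hdb_vg /dev/nvme1n1 /dev/nvme2n1    # create the volume group
sudo lvcreate -n hdb_lv -l 100%FREE -i 2 hdb_vg   # create a logical volume striped across both disks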
Run lsblk and note the device name of the additional volume
Create an ext4 filesystem on the volume (the commands below assume the device name is nvme1n1; if you used LVM to create a logical volume, replace /dev/nvme1n1 with /dev/hdb_vg/hdb_lv)
Mount the file system and set the correct permissions for the directory
Create a fstab entry to mount the filesystem on boot
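A sketch of those three steps, assuming the device is /dev/nvme1n1 and the HarperDB data directory will be /home/ubuntu/hdb (both are illustrative):

sudo mkfs.ext4 /dev/nvme1n1                       # use /dev/hdb_vg/hdb_lv here instead if you created an LVM logical volume
sudo mkdir -p /home/ubuntu/hdb
sudo mount /dev/nvme1n1 /home/ubuntu/hdb
sudo chown -R ubuntu:ubuntu /home/ubuntu/hdb      # the ubuntu user will own the HarperDB files
echo '/dev/nvme1n1 /home/ubuntu/hdb ext4 defaults,noatime 0 2' | sudo tee -a /etc/fstab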
If a swap file or partition does not already exist, create and enable a 2GB swap file
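For example:

sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab   # persist the swap file across reboots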
Increase the open file limits for the ubuntu user
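For example (the limit value shown is illustrative):

echo 'ubuntu soft nofile 500000' | sudo tee -a /etc/security/limits.conf
echo 'ubuntu hard nofile 500000' | sudo tee -a /etc/security/limits.conf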
Install Node Version Manager (nvm)
Load nvm (or logout and then login)
Install Node.js using nvm (read more about specific Node version requirements)
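A sketch of these three steps (the nvm version in the URL is illustrative):

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
export NVM_DIR="$HOME/.nvm" && . "$NVM_DIR/nvm.sh"   # load nvm into the current shell
nvm install --lts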
Here is an example of installing HarperDB with minimal configuration.
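A minimal sketch, assuming the command line argument names accepted by recent HarperDB releases (verify them against the Command Line Interface and Configuration File guides; the root path and credentials are placeholders):

npm install -g harperdb
harperdb install \
  --TC_AGREEMENT yes \
  --ROOTPATH /home/ubuntu/hdb \
  --HDB_ADMIN_USERNAME HDB_ADMIN \
  --HDB_ADMIN_PASSWORD password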
Here is an example of installing HarperDB with commonly used additional configuration.
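A sketch with commonly adjusted settings layered on top of the minimal example; treat these argument names as assumptions to verify against the Configuration File guide:

harperdb install \
  --TC_AGREEMENT yes \
  --ROOTPATH /home/ubuntu/hdb \
  --OPERATIONSAPI_NETWORK_PORT 9925 \
  --HDB_ADMIN_USERNAME HDB_ADMIN \
  --HDB_ADMIN_PASSWORD password \
  --CLUSTERING_ENABLED true \
  --CLUSTERING_USER cluster_user \
  --CLUSTERING_PASSWORD password \
  --CLUSTERING_NODENAME hdb1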
HarperDB will automatically start after installation. If you wish HarperDB to start when the OS boots, you have two options:
You can set up a crontab:
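For example, an @reboot entry in the ubuntu user's crontab (run crontab -e to edit it; the Node.js path depends on the version nvm installed):

@reboot PATH="/home/ubuntu/.nvm/versions/node/v18.17.0/bin:$PATH" harperdb start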
Or you can create a systemd script at /etc/systemd/system/harperdb.service
Pasting the following contents into the file:
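A minimal sketch of such a unit, written from the shell; this is not an official unit file, and the user name and the reliance on harperdb start/stop being on that user's login PATH are assumptions to adjust for your installation:

sudo tee /etc/systemd/system/harperdb.service > /dev/null <<'EOF'
[Unit]
Description=HarperDB
After=network.target

[Service]
Type=oneshot
RemainAfterExit=true
User=ubuntu
ExecStart=/bin/bash -lc 'harperdb start'
ExecStop=/bin/bash -lc 'harperdb stop'

[Install]
WantedBy=multi-user.target
EOF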
And then running the following:
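For example:

sudo systemctl daemon-reload
sudo systemctl enable harperdb
sudo systemctl start harperdb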
For more information visit the HarperDB Command Line Interface guide and the HarperDB Configuration File guide.
Manage instance schemas/tables and browse data in tabular format with the following instructions:
Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization that the instance belongs to.
Select your desired instance.
Click browse in the instance control bar.
Once on the instance browse page you can view data, manage schemas and tables, add new data, and more.
Click the plus icon at the top right of the schemas section.
Enter the schema name.
Click the green check mark.
Deleting a schema is permanent and irreversible. Deleting a schema removes all tables and data within it.
Click the minus icon at the top right of the schemas section.
Identify the appropriate schema to delete and click the red minus sign in the same row.
Click the red check mark to confirm deletion.
Select the desired schema from the schemas section.
Click the plus icon at the top right of the tables section.
Enter the table name.
Enter the primary key.
The primary key is also often referred to as the hash attribute in the studio, and it defines the unique identifier for each row in your table.
Click the green check mark.
Deleting a table is permanent and irreversible. Deleting a table removes all data within it.
Select the desired schema from the schemas section.
Click the minus icon at the top right of the tables section.
Identify the appropriate table to delete and click the red minus sign in the same row.
Click the red check mark to confirm deletion.
The following section assumes you have selected the appropriate table from the schema/table browser.
Click the magnifying glass icon at the top right of the table browser.
This expands the search filters.
Enter your search criteria; the results will be filtered appropriately.
Click the data icon at the top right of the table browser. You will be directed to the CSV upload page where you can choose to import a CSV by URL or upload a CSV file.
To import a CSV by URL:
Enter the URL in the CSV file URL textbox.
Click Import From URL.
The CSV will load, and you will be redirected back to browse table data.
To upload a CSV file:
Click Click or Drag to select a .csv file (or drag your CSV file from your file browser).
Navigate to your desired CSV file and select it.
Click Insert X Records, where X is the number of records in your CSV.
The CSV will load, and you will be redirected back to browse table data.
Click the plus icon at the top right of the table browser.
The Studio will pre-populate existing table attributes in JSON format.
The primary key is not included, but you can add it in and set it to your desired value. Auto-maintained fields are not included and cannot be manually set. You may enter a JSON array to insert multiple records in a single transaction.
Enter values to be added to the record.
You may add new attributes to the JSON; they will be reflexively added to the table.
Click the Add New button.
Click the record/row you would like to edit.
Modify the desired values.
You may add new attributes to the JSON; they will be reflexively added to the table.
Click the save icon.
Deleting a record is permanent and irreversible. If transaction logging is turned on, the delete transaction will be recorded as well as the data that was deleted.
Click the record/row you would like to delete.
Click the delete icon.
Confirm deletion by clicking the check icon.
The following section assumes you have selected the appropriate table from the schema/table browser.
The first page of table data is automatically loaded on table selection. Paging controls are at the bottom of the table. Here you can:
Page left and right using the arrows.
Type in the desired page.
Change the page size (the number of records displayed in the table).
Click the refresh icon at the top right of the table browser.
Toggle the auto switch at the top right of the table browser. The table data will now automatically refresh every 15 seconds. Filters and pages will remain set for refreshed data.
HarperDB Studio organizations provide the ability to group HarperDB Cloud Instances. Organization behavior is as follows:
Billing occurs at the organization level to a single credit card.
Organizations retain their own unique HarperDB Cloud subdomain.
Cloud instances reside within an organization.
Studio users can be invited to organizations to share instances.
An organization is automatically created for you when you sign up for HarperDB Studio. If you only have one organization, the Studio will automatically bring you to your organization’s page.
A summary view of all organizations your user belongs to can be viewed on the organizations page. You can navigate to this page at any time by clicking the all organizations link at the top of the HarperDB Studio.
A new organization can be created as follows:
Navigate to the HarperDB Studio Organizations page.
Click the Create a New Organization card.
Fill out the new organization details:
Enter Organization Name. This is used for descriptive purposes only.
Enter Organization Subdomain. Part of the URL that will be used to identify your HarperDB Cloud Instances. For example, with subdomain “demo” and instance name “c1” the instance URL would be: https://c1-demo.harperdbcloud.com.
Click Create Organization.
An organization cannot be deleted until all instances have been removed. An organization can be deleted as follows:
Navigate to the HarperDB Studio Organizations page.
Identify the proper organization card and click the trash can icon.
Enter the organization name into the text box.
This is done for confirmation purposes to ensure you do not accidentally delete an organization.
Click the Do It button.
HarperDB Studio organization owners can manage users including inviting new users, removing users, and toggling ownership.
A new user can be invited to an organization as follows:
Click the appropriate organization card.
Click users at the top of the screen.
In the add user box, enter the new user’s email address.
Click Add User.
Users may or may not already be HarperDB Studio users when adding them to an organization. If the HarperDB Studio account already exists, the user will receive an email notification alerting them to the organization invitation. If the user does not have a HarperDB Studio account, they will receive an email welcoming them to HarperDB Studio.
Organization owners have full access to the organization including the ability to manage organization users, create, modify, and delete instances, and delete the organization. Users must have accepted their invitation prior to being promoted to an owner. A user’s organization owner status can be toggled as follows:
Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization card.
Click users at the top of the screen.
Click the appropriate user from the existing users section.
Toggle the Is Owner switch to the desired status.
Users may be removed from an organization at any time. Removing a user from an organization will not delete their HarperDB Studio account, it will only remove their access to the specified organization. A user can be removed from an organization as follows:
Click the appropriate organization card.
Click users at the top of the screen.
Click the appropriate user from the existing users section.
Type DELETE in the text box in the Delete User row.
This is done for confirmation purposes to ensure you do not accidentally delete a user.
Click Delete User.
Billing is configured per organization and will be billed to the stored credit card at appropriate intervals (monthly or annually depending on the registered instance). Billing settings can be configured as follows:
Click the appropriate organization card.
Click billing at the top of the screen.
Here organization owners can view invoices, manage coupons, and manage the associated credit card.
HarperDB billing and payments are managed via Stripe.
Coupons are applicable towards any paid tier or user-installed instance and you can change your subscription at any time. Coupons can be added to your Organization as follows:
In the coupons panel of the billing page, enter your coupon code.
Click Add Coupon.
The coupon will then be available and displayed in the coupons panel.
Example code prepopulated with the instance URL and authorization token for the logged in database user can be found on the example code page of the HarperDB Studio. Code samples are generated based on the HarperDB API Documentation Postman collection. Code samples can be accessed with the following instructions:
Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization that the instance belongs to.
Select your desired instance.
Click example code in the instance control bar.
Select the appropriate category from the left navigation.
Select the appropriate operation from the left navigation.
Select your desired language/variant from the Choose Programming Language dropdown.
Copy code from the sample code panel using the copy icon.
Sample code uses two identifiers: language and variant.
language is the programming language that the sample code is generated in.
variant is the methodology or library used by the language to send HarperDB requests.
The list of available languages/variants is as follows:
SQL queries can be executed directly through the HarperDB Studio with the following instructions:
Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization that the instance belongs to.
Select your desired instance.
Click query in the instance control bar.
Enter your SQL query in the SQL query window.
Click Execute.
Please note, the Studio will execute the query exactly as entered. For example, if you attempt to SELECT * from a table with millions of rows, you will most likely crash your browser.
The first page of results set data is automatically loaded on query execution. Paging controls are at the bottom of the table. Here you can:
Page left and right using the arrows.
Type in the desired page.
Change the page size (the number of records displayed in the table).
Click the refresh icon at the top right of the results set table.
Toggle the auto switch at the top right of the results set table. The results set will now automatically refresh every 15 seconds. Filters and pages will remain set for refreshed data.
Query history is stored in your local browser cache. Executed queries are listed with the most recent at the top in the query history section.
Identify the query from the query history list.
Click the appropriate query. It will be loaded into the sql query input box.
Click Execute.
Click the trash can icon at the top right of the query history section.
The HarperDB Studio includes a charting feature where you can build charts based on your specified queries. Visit the Charts documentation for more information.
The HarperDB Studio includes a charting feature within each instance. Charts are generated in real time based on your existing data and automatically refreshed every 15 seconds. Instance charts can be accessed with the following instructions:
Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization that the instance belongs to.
Select your desired instance.
Click charts in the instance control bar.
Charts are generated based on SQL queries, therefore to build a new chart you first need to build a query. Instructions as follows (starting on the charts page described above):
Click query in the instance control bar.
Enter the SQL query you would like to generate a chart from.
For example, using the dog demo data from the API Docs, we can get the average dog age per owner with the following query: SELECT AVG(age) as avg_age, owner_name FROM dev.dog GROUP BY owner_name.
Click Execute.
Click create chart at the top right of the results table.
Configure your chart.
Choose chart type.
HarperDB Studio offers many standard charting options like line, bar, etc.
Choose a data column.
This column will be used to plot the data points. Typically, these are the values being calculated in the SELECT statement. Depending on the chart type, you can select multiple data columns to display on a single chart.
Depending on the chart type, you will need to select a grouping.
This could be labeled as x-axis, label, etc. This will be used to group the data, typically this is what you used in your GROUP BY clause.
Enter a chart name.
Used for identification purposes and will be displayed at the top of the chart.
Choose visible to all org users toggle.
Leaving this option off will limit chart visibility to just your HarperDB Studio user. Toggling it on will enable all users within this Organization to view this chart.
Click Add Chart.
The chart will now be visible on the charts page.
The example query above, configured as a bar chart, results in the following chart:
HarperDB Studio charts can be downloaded in SVG, PNG, and CSV format. Instructions as follows (starting on the charts page described above):
Identify the chart you would like to export.
Click the three bars icon.
Select the appropriate download option.
The Studio will generate the export and begin downloading immediately.
Delete a chart as follows (starting on the charts page described above):
Identify the chart you would like to delete.
Click the X icon.
Click the confirm delete chart button.
The chart will be deleted.
Deleting a chart that is visible to all Organization users will delete it for all users.
HarperDB instance clustering and replication can be configured directly through the HarperDB Studio. It is recommended to read through the clustering documentation first to gain a strong understanding of HarperDB clustering behavior.
All clustering configuration is handled through the cluster page of the HarperDB Studio, accessed with the following instructions:
Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization that the instance belongs to.
Select your desired instance.
Click cluster in the instance control bar.
Note, the cluster page will only be available to super users.
HarperDB instances do not have clustering configured by default. The HarperDB Studio will walk you through the initial configuration. Upon entering the cluster screen for the first time you will need to complete the following configuration. Configurations are set in the enable clustering panel on the left while actions are described in the middle of the screen.
Create a cluster user (read more about this here: Clustering Users and Roles).
Enter username.
Enter password.
Click Create Cluster User.
Enter a cluster node name and click Set Cluster Node Name.
Click Enable Instance Clustering.
At this point the Studio will restart your HarperDB Instance, required for the configuration changes to take effect.
Once initial clustering configuration is completed, you are presented with a clustering management screen with the following properties:
connected instances
Displays all instances within the Studio Organization that this instance manages a connection with.
unconnected instances
Displays all instances within the Studio Organization that this instance does not manage a connection with.
unregistered instances
Displays all instances outside of the Studio Organization that this instance manages a connection with.
manage clustering
Once instances are connected, this will display clustering management options for all connected instances and all schemas and tables.
HarperDB Instances can be clustered together with the following instructions.
Ensure clustering has been configured on both instances and a cluster user with identical credentials exists on both.
Identify the instance you would like to connect from the unconnected instances panel.
Click the plus icon next to the appropriate instance.
If configurations are correct, all schemas will sync across the cluster, then appear in the manage clustering panel. If there is a configuration issue, a red exclamation icon will appear, click it to learn more about what could be causing the issue.
HarperDB Instances can be disconnected with the following instructions.
Identify the instance you would like to disconnect from the connected instances panel.
Click the minus icon next to the appropriate instance.
Subscriptions must be configured in order to move data between connected instances. Read more about subscriptions here: Creating A Subscription. The manage clustering panel displays a table with each row representing a channel per instance. Cells are bolded to indicate a change in the column. Publish and subscribe replication can be configured per table with the following instructions:
Identify the instance, schema, and table for replication to be configured.
For publish, click the toggle switch in the publish column.
For subscribe, click the toggle switch in the subscribe column.
The HarperDB Studio allows you to administer all of your HarperDB instances in one place. HarperDB currently offers the following instance types:
HarperDB Cloud Instance: Managed installations of HarperDB, which we call HarperDB Cloud.
5G Wavelength Instance: Managed installations of HarperDB running on the Verizon network through AWS Wavelength, which we call HarperDB 5G Wavelength. Note, these instances are only accessible via the Verizon network.
User-Installed Instance: Any HarperDB installation that is managed by you. These include instances hosted within your cloud provider accounts (for example, from the AWS or Digital Ocean Marketplaces), privately hosted instances, or instances installed locally.
All interactions between the Studio and your instances take place directly from your browser. HarperDB stores metadata about your instances, which enables the Studio to display these instances when you log in. Beyond that, all traffic is routed from your browser to the HarperDB instances using the standard HarperDB Operations API.
A summary view of all instances within an organization can be viewed by clicking on the appropriate organization from the organizations page. Each instance gets its own card. HarperDB Cloud and user-installed instances are listed together.
Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization for the instance to be created under.
Click the Create New HarperDB Cloud Instance + Register User-Installed Instance card.
Select your desired Instance Type.
For a HarperDB Cloud Instance or a HarperDB 5G Wavelength Instance, click Create HarperDB Cloud Instance.
Fill out Instance Info.
Enter Instance Name
This will be used to build your instance URL. For example, with subdomain “demo” and instance name “c1” the instance URL would be: https://c1-demo.harperdbcloud.com. The Instance URL will be previewed below.
Enter Instance Username
This is the username of the initial HarperDB instance super user.
Enter Instance Password
This is the password of the initial HarperDB instance super user.
Click Instance Details to move to the next page.
Select Instance Specs
Select Instance RAM
HarperDB Cloud Instances are billed based on Instance RAM; this selects the size of your provisioned instance.
Select Storage Size
Each instance has a mounted storage volume where your HarperDB data will reside. Storage is provisioned based on space and IOPS.
Select Instance Region
The geographic area where your instance will be provisioned.
Click Confirm Instance Details to move to the next page.
Review your Instance Details; if there is an error, use the back button to correct it.
Review the Privacy Policy and Terms of Service, and if you agree, click the I agree radio button to confirm.
Click Add Instance.
Your HarperDB Cloud instance will be provisioned in the background. Provisioning typically takes 5-15 minutes. You will receive an email notification when your instance is ready.
Click the appropriate organization for the instance to be created under.
Click the Create New HarperDB Cloud Instance + Register User-Installed Instance card.
Select Register User-Installed Instance.
Fill out Instance Info.
Enter Instance Name
This is used for descriptive purposes only.
Enter Instance Username
The username of a HarperDB super user that is already configured in your HarperDB installation.
Enter Instance Password
The password of a HarperDB super user that is already configured in your HarperDB installation.
Enter Host
The host used to access the HarperDB instance. For example, harperdb.myhost.com or localhost.
Enter Port
The port used to access the HarperDB instance. HarperDB defaults to 9925.
Select SSL
If your instance is running over SSL, select the SSL checkbox. If not, you will need to enable mixed content in your browser to allow the HTTPS Studio to access the HTTP instance. If there are issues connecting to the instance, the Studio will display a red error message.
Click Instance Details to move to the next page.
Select Instance Specs
Select Instance RAM
HarperDB instances are billed based on Instance RAM. Selecting additional RAM enables faster and more complex queries.
Click Confirm Instance Details to move to the next page.
Review your Instance Details; if there is an error, use the back button to correct it.
Click Add Instance.
The HarperDB Studio will register your instance and restart it for the registration to take effect. Your instance will be immediately available after this is complete.
Instance deletion has two different behaviors depending on the instance type.
HarperDB Cloud Instance This instance will be permanently deleted, including all data. This process is irreversible and cannot be undone.
User-Installed Instance The instance will be removed from the HarperDB Studio only. This does not uninstall HarperDB from your system and your data will remain intact.
An instance can be deleted as follows:
Click the appropriate organization that the instance belongs to.
Identify the proper instance card and click the trash can icon.
Enter the instance name into the text box.
This is done for confirmation purposes to ensure you do not accidentally delete an instance.
Click the Do It button.
The Studio enables users to log in and out of different database users from the instance control panel. To log out of an instance:
Click the appropriate organization that the instance belongs to.
Identify the proper instance card and click the lock icon.
You will immediately be logged out of the instance.
To log in to an instance:
Click the appropriate organization that the instance belongs to.
Identify the proper instance card, it will have an unlocked icon and a status reading PLEASE LOG IN, and click the center of the card.
Enter the database username.
The username of a HarperDB user that is already configured in your HarperDB instance.
Enter the database password.
The password of a HarperDB user that is already configured in your HarperDB instance.
Click Log In.
C#: RestSharp
cURL: cURL
Go: Native
HTTP: HTTP
Java: OkHttp
Java: Unirest
JavaScript: Fetch
JavaScript: jQuery
JavaScript: XHR
NodeJs: Axios
NodeJs: Native
NodeJs: Request
NodeJs: Unirest
Objective-C: NSURLSession
OCaml: Cohttp
PHP: cURL
PHP: HTTP_Request2
PowerShell: RestMethod
Python: http.client
Python: Requests
Ruby: Net:HTTP
Shell: Httpie
Shell: wget
Swift: URLSession
HarperDB users can be managed directly through the HarperDB Studio. It is recommended to read through the users & roles documentation to gain a strong understanding of how they operate.
Instance role configuration is handled through the roles page of the HarperDB Studio, accessed with the following instructions:
Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization that the instance belongs to.
Select your desired instance.
Click roles in the instance control bar.
Note, the roles page will only be available to super users.
The roles management screen consists of the following panels:
super users
Displays all super user roles for this instance.
cluster users
Displays all cluster user roles for this instance.
standard roles
Displays all standard roles for this instance.
role permission editing
Once a role is selected for editing, permissions will be displayed here in JSON format.
Note, when new tables are added that are not configured, the Studio will generate configuration values with permissions defaulting to false.
Click the plus icon at the top right of the appropriate role section.
Enter the role name.
Click the green check mark.
Configure the role permissions in the role permission editing panel.
Note, to have the Studio generate attribute permissions JSON, toggle show all attributes at the top right of the role permission editing panel.
Click Update Role Permissions.
Click the appropriate role from the appropriate role section.
Modify the role permissions in the role permission editing panel.
Note, to have the Studio generate attribute permissions JSON, toggle show all attributes at the top right of the role permission editing panel.
Click Update Role Permissions.
Deleting a role is permanent and irreversible. A role cannot be removed if users are associated with it.
Click the minus icon at the top right of the appropriate role section.
Identify the appropriate role to delete and click the red minus sign in the same row.
Click the red check mark to confirm deletion.
Enabling mixed content is required in cases where you would like to connect the HarperDB Studio to HarperDB Instances via HTTP. This should not be used for production systems, but may be convenient for development and testing purposes. Doing so will allow your browser to reach HTTP traffic, which is considered insecure, through an HTTPS site like the Studio.
A comprehensive guide is provided by Adobe here.
The HarperDB Studio displays instance status and metrics on the instance status page, which can be accessed with the following instructions:
Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization that the instance belongs to.
Select your desired instance.
Click status in the instance control bar.
Once on the instance status page you can view host system information, HarperDB logs, and HarperDB Cloud alarms (if it is a cloud instance).
Note, the status page will only be available to super users.
HarperDB instance users can be managed directly through the HarperDB Studio. It is recommended to read through the users & roles documentation first to gain a strong understanding of how users and roles operate.
Instance user configuration is handled through the users page of the HarperDB Studio, accessed with the following instructions:
Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization that the instance belongs to.
Select your desired instance.
Click users in the instance control bar.
Note, the users page will only be available to super users.
HarperDB instance users can be added with the following instructions.
In the add user panel on the left enter:
New user username.
New user password.
Select a role.
Learn more about role management here: Manage Instance Roles.
Click Add User.
HarperDB instance users can be modified with the following instructions.
In the existing users panel, click the row of the user you would like to edit.
To change a user’s password:
In the Change user password section, enter the new password.
Click Update Password.
To change a user’s role:
In the Change user role section, select the new role.
Click Update Role.
To delete a user:
In the Delete User section, type the username into the textbox.
This is done for confirmation purposes.
Click Delete User.
HarperDB Custom Functions are enabled by default and can be configured further through the HarperDB Studio. It is recommended to read through the Custom Functions documentation first to gain a strong understanding of HarperDB Custom Functions behavior.
All Custom Functions configuration is handled through the functions page of the HarperDB Studio, accessed with the following instructions:
Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization that the instance belongs to.
Select your desired instance.
Click functions in the instance control bar.
Note, the functions page will only be available to super users.
On the functions page of the HarperDB Studio you are presented with a functions management screen with the following properties:
projects
Displays a list of Custom Functions projects residing on this instance.
/project_name/routes
Only displayed if there is an existing project. Displays the routes files contained within the selected project.
/project_name/helpers
Only displayed if there is an existing project. Displays the helper files contained within the selected project.
/project_name/static
Only displayed if there is an existing project. Displays the static file count and a link to the static files contained within the selected project. Note, static files cannot currently be deployed through the Studio and must be deployed via the HarperDB API or manually to the server (not applicable with HarperDB Cloud).
Root File Directory
Displays the root file directory where the Custom Functions projects reside on this instance.
Custom Functions Server URL
Displays the base URL in which all Custom Functions are accessed for this instance.
HarperDB Custom Functions Projects can be initialized with the following instructions.
If this is your first project, skip this step. Click the plus icon next to the projects heading.
Enter the project name in the text box located under the projects heading.
Click the check mark icon to confirm.
The Studio will take a few moments to provision a new project based on the Custom Functions template.
The Custom Functions project is now created and ready to modify.
Custom Functions routes and helper functions can be modified directly through the Studio. From the functions page:
Select the appropriate project.
Select the appropriate route or helper.
Modify the code with your desired changes.
Click the save icon at the bottom right of the screen.
Note, saving modifications will restart the Custom Functions server on your HarperDB instance and may result in up to 60 seconds of downtime for all Custom Functions.
To create an additional route to your Custom Functions project. From the functions page:
Select the appropriate Custom Functions project.
Click the plus icon to the right of the routes header.
Enter the name of the new route in the textbox that appears.
Click the check icon to create the new route.
Note, adding a route will restart the Custom Functions server on your HarperDB instance and may result in up to 60 seconds of downtime for all Custom Functions.
To create an additional helper to your Custom Functions project. From the functions page:
Select the appropriate Custom Functions project.
Click the plus icon to the right of the helpers header.
Enter the name of the new helper in the textbox that appears.
Click the check icon to create the new helper.
Note, adding a helper will restart the Custom Functions server on your HarperDB instance and may result in up to 60 seconds of downtime for all Custom Functions.
To delete a Custom Functions project from the functions page:
Click the minus icon to the right of the projects header.
Click the red minus icon to the right of the Custom Functions project you would like to delete.
Confirm deletion by clicking the red check icon.
Note, deleting a project will restart the Custom Functions server on your HarperDB instance and may result in up to 60 seconds of downtime for all Custom Functions.
To delete a Custom Functions project route from the functions page:
Select the appropriate Custom Functions project.
Click the minus icon to the right of the routes header.
Click the red minus icon to the right of the Custom Functions route you would like to delete.
Confirm deletion by clicking the red check icon.
Note, deleting a route will restart the Custom Functions server on your HarperDB instance and may result in up to 60 seconds of downtime for all Custom Functions.
To delete a Custom Functions project helper from the functions page:
Select the appropriate Custom Functions project.
Click the minus icon to the right of the helpers header.
Click the red minus icon to the right of the Custom Functions helper you would like to delete.
Confirm deletion by clicking the red check icon.
Note, deleting a helper will restart the Custom Functions server on your HarperDB instance and may result in up to 60 seconds of downtime for all Custom Functions.
The HarperDB Studio provides the ability to deploy Custom Functions projects to additional HarperDB instances within the same Studio Organization. To deploy Custom Functions projects to additional instances, starting from the functions page:
Select the project you would like to deploy.
Click the deploy button at the top right.
A list of instances (excluding the current instance) within the organization will be displayed in tabular format with the following information:
Instance Name: The name used to describe the instance.
Instance URL: The URL used to access the instance.
CF Capable: Describes if the instance version supports Custom Functions (yes/no).
CF Enabled: Describes if Custom Functions are configured and enabled on the instance (yes/no).
Has Project: Describes if the selected Custom Functions project has been previously deployed to the instance (yes/no).
Deploy: Button used to deploy the project to the instance.
Remove: Button used to remove the project from the instance. Note, this will only be visible if the project has been previously deployed to the instance.
In the appropriate instance row, click the deploy button.
Note, deploying a project will restart the Custom Functions server on the HarperDB instance receiving the deployment and may result in up to 60 seconds of downtime for all Custom Functions.
HarperDB instance configuration can be viewed and managed directly through the HarperDB Studio. HarperDB Cloud instances can be resized in two different ways via this page, either by modifying machine RAM or by increasing drive storage. User-installed instances can have their licenses modified by modifying licensed RAM.
All instance configuration is handled through the config page of the HarperDB Studio, accessed with the following instructions:
Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization that the instance belongs to.
Select your desired instance.
Click config in the instance control bar.
Note, the config page will only be available to super users and certain items are restricted to Studio organization owners.
The instance overview panel displays the following instance specifications:
Instance URL
Instance Node Name (for clustering)
Instance API Auth Header (this user)
The Basic authentication header used for the logged in HarperDB database user
Created Date (HarperDB Cloud only)
Region (HarperDB Cloud only)
The geographic region where the instance is hosted.
Total Price
RAM
Storage (HarperDB Cloud only)
Disk IOPS (HarperDB Cloud only)
HarperDB Cloud instance size and user-installed instance licenses can be modified with the following instructions. This option is only available to Studio organization owners.
Note: For HarperDB Cloud instances, upgrading RAM may add additional CPUs to your instance as well. Click here to see how many CPUs are provisioned for each instance size.
In the update ram panel at the bottom left:
Select the new instance size.
If you do not have a credit card associated with your account, an Add Credit Card To Account button will appear. Click that to be taken to the billing screen where you can enter your credit card information before returning to the config tab to proceed with the upgrade.
If you do have a credit card associated, you will be presented with the updated billing information.
Click Upgrade.
The instance will shut down and begin reprovisioning/relicensing itself. The instance will not be available during this time. You will be returned to the instance dashboard and the instance status will show UPDATING INSTANCE.
Once your instance upgrade is complete, it will appear on the instance dashboard as status OK with your newly selected instance size.
Note, if HarperDB Cloud instance reprovisioning takes longer than 20 minutes, please submit a support ticket here: https://harperdbhelp.zendesk.com/hc/en-us/requests/new.
The HarperDB Cloud instance storage size can be increased with the following instructions. This option is only available to Studio organization owners.
Note: Instance storage can only be upgraded once every 6 hours.
In the update storage panel at the bottom left:
Select the new instance storage size.
If you do not have a credit card associated with your account, an Add Credit Card To Account button will appear. Click that to be taken to the billing screen where you can enter your credit card information before returning to the config tab to proceed with the upgrade.
If you do have a credit card associated, you will be presented with the updated billing information.
Click Upgrade.
The instance will shut down and begin reprovisioning itself. The instance will not be available during this time. You will be returned to the instance dashboard and the instance status will show UPDATING INSTANCE.
Once your instance upgrade is complete, it will appear on the instance dashboard as status OK with your newly selected instance size.
Note, if this process takes longer than 20 minutes, please submit a support ticket here: https://harperdbhelp.zendesk.com/hc/en-us/requests/new.
The HarperDB instance can be deleted/removed from the Studio with the following instructions. Once this operation is started it cannot be undone. This option is only available to Studio organization owners.
In the remove instance panel at the bottom left:
Enter the instance name in the text box.
The Studio will present you with a warning.
Click Remove.
The instance will begin deleting immediately.
The HarperDB Cloud instance can be restarted with the following instructions.
In the restart instance panel at the bottom right:
Enter the instance name in the text box.
The Studio will present you with a warning.
Click Restart.
The instance will begin restarting immediately.
HarperDB Cloud instance alarms are triggered when certain conditions are met. Once alarms are triggered, organization owners will immediately receive an email alert and the alert will be available on the instance status page. The table below describes each alert and its evaluation metrics.
Alarm: Title of the alarm.
Threshold: Definition of the alarm threshold.
Intervals: The number of occurrences before an alarm is triggered and the period that the metric is evaluated over.
Proposed Remedy: Recommended solution to avoid the alert in the future.
Storage: > 90% disk used; 1 x 5 min interval
CPU: > 90% average CPU; 2 x 5 min intervals
Memory: > 90% RAM used; 2 x 5 min intervals
Clustering does not run by default; it needs to be enabled.
To enable clustering, the clustering.enabled configuration element in the harperdb-config.yaml file must be set to true.
There are multiple ways to update this element:
Directly editing the harperdb-config.yaml file and setting enabled to true.
Note: When making any changes to the harperdb-config.yaml file, HarperDB must be restarted for the changes to take effect.
Calling set_configuration through the operations API (see the example after this list).
Note: When making any changes to HarperDB configuration HarperDB must be restarted for the changes to take effect.
Using command line variables.
Using environment variables.
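A sketch of the set_configuration route; the flattened clustering_enabled parameter name is an assumption to verify against the operations API documentation, and the URL and credentials are placeholders:

curl --location --request POST 'https://your-instance:9925' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic <base64 of username:password>' \
--data-raw '{
    "operation": "set_configuration",
    "clustering_enabled": true
}'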
An efficient way to install HarperDB, create the cluster user, set the node name and enable clustering in one operation is to combine the steps using command line and/or environment variables. Here is an example using command line variables.
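A sketch of that combined step, assuming the command line argument names accepted by recent HarperDB releases (verify them against the Configuration File guide; the credentials and node name are placeholders):

npm install -g harperdb
harperdb install \
  --TC_AGREEMENT yes \
  --HDB_ADMIN_USERNAME HDB_ADMIN \
  --HDB_ADMIN_PASSWORD password \
  --CLUSTERING_ENABLED true \
  --CLUSTERING_USER cluster_user \
  --CLUSTERING_PASSWORD password \
  --CLUSTERING_NODENAME hdb1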
HarperDB Cloud is the easiest way to test drive HarperDB: it’s HarperDB-as-a-Service. HarperDB Cloud handles deployment and management of your instances in just a few clicks. HarperDB Cloud is currently powered by AWS, with additional cloud providers on our roadmap for the future.
These instances are only accessible from the Verizon network. When accessing your HarperDB instance please ensure you are connected to the Verizon network, examples include Verizon 5G Internet, Verizon Hotspots, or Verizon mobile devices.
HarperDB on Verizon 5G Wavelength brings HarperDB closer to the end user exclusively on the Verizon network resulting in as little as single-digit millisecond response time from HarperDB to the client.
Instances are built via AWS Wavelength. You can read more about AWS Wavelength here.
HarperDB 5G Wavelength Instance Specs: While HarperDB 5G Wavelength bills by RAM, each instance has other specifications associated with the RAM selection. The following table describes each instance size in detail*.
t3.medium: 4 GB RAM, 2 vCPUs, up to 5 Gbps network, up to 3.1 GHz Intel Xeon Platinum Processor
t3.xlarge: 16 GB RAM, 4 vCPUs, up to 5 Gbps network, up to 3.1 GHz Intel Xeon Platinum Processor
r5.2xlarge: 64 GB RAM, 8 vCPUs, up to 10 Gbps network, up to 3.1 GHz Intel Xeon Platinum Processor
*Specifications are subject to change. For the most up to date information, please refer to AWS documentation.
HarperDB 5G Wavelength utilizes AWS Elastic Block Storage (EBS) General Purpose SSD (gp2) volumes. This is the most common storage type used in AWS, as it provides reasonable performance for most workloads, at a reasonable price.
AWS EBS gp2 volumes have a baseline performance level, which determines the number of IOPS they can perform indefinitely. The larger the volume, the higher its baseline performance. Additionally, smaller gp2 volumes are able to burst to a higher number of IOPS for periods of time.
Smaller gp2 volumes are perfect for trying out the functionality of HarperDB, and might also work well for applications that don’t perform many database transactions. For applications that perform a moderate or high number of transactions, we recommend that you use a larger HarperDB volume. Learn more about the impact of IOPS on performance here.
You can read more about AWS EBS gp2 volume IOPS here.
HarperDB, like any database, can place a tremendous load on its storage resources. Storage, not CPU or memory, will more often be the bottleneck of server, virtual machine, or a container running HarperDB. Understanding how storage works, and how much storage performance your workload requires, is key to ensuring that HarperDB performs as expected.
The primary measure of storage performance is the number of input/output operations per second (IOPS) that a storage device can perform. Different storage devices can have dramatically different performance profiles. A hard drive (HDD) might only perform a hundred or so IOPS, while a solid state drive (SSD) might be able to perform tens or hundreds of thousands of IOPS.
Cloud providers like AWS, which powers HarperDB Cloud, don’t typically attach individual disks to a virtual machine or container. Instead, they combine large numbers of storage drives to create very high performance storage servers. Chunks (volumes) of that storage are then carved out and presented to many different virtual machines and containers. Due to the shared nature of this type of storage, the cloud provider places configurable limits on the number of IOPS that a volume can perform. Just as cloud providers charge more for larger capacity volumes, they also charge more for volumes with more IOPS.
HarperDB Cloud utilizes AWS Elastic Block Storage (EBS) General Purpose SSD (gp3) volumes. This is the most common storage type used in AWS, as it provides reasonable performance for most workloads, at a reasonable price.
AWS EBS gp3 volumes have a baseline performance level of 3,000 IOPS, as a result, all HarperDB Cloud storage options will offer 3,000 IOPS. We plan to offer scalable IOPS as an option in the future.
You can read more about AWS EBS volume IOPS here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html.
The number of IOPS required for a particular workload is influenced by many factors. Testing your particular application is the best way to determine the number of IOPS required. A reliable method is to estimate about two IOPS for every index, including the primary key itself. So if a table has two indices besides the primary key, estimate that an insert or update will require about six IOPS. Note that this can often be closer to one IOPS per index under load due to internal batching of writes, and sometimes even better when doing sequential inserts. Again, it is best to verify this by testing with application-specific data and write patterns.
For assistance in estimating IOPS requirements feel free to contact HarperDB Support or join our Community Slack Channel.
Sensor Data Collection
In the case of IoT sensors, where data collection is sustained, high IOPS are required. While there are not typically large queries in this case, there is a high volume of data being ingested, which means IOPS will be sustained at a high level. For example, if you are collecting 100 records per second you would expect to need roughly 3,000 IOPS just to handle the data inserts.
Data Analytics/BI Server
Providing a server for analytics purposes typically requires a larger machine. Typically these cases involve large scale SQL joins and aggregations, which puts a large strain on reads. HarperDB utilizes an in-memory cache, which provides a significant performance boost on machines with large amounts of memory. However, if disparate datasets are constantly being queried and/or new data is frequently being loaded, you will find that the system still needs to have high IOPS to meet performance demand.
Web Services
Typical web service implementations with discrete reads and writes often do not need high IOPS to perform as expected. This is often the case in more transactional systems without high-performance load requirements. A good rule to follow is that any HarperDB operation that requires a data scan will be IOPS intensive, but if these are infrequent then the EBS burst capacity will suffice. Queries utilizing equals operations in either SQL or NoSQL do not require a scan due to HarperDB’s native indexing.
High Performance Database
Ultimately, if performance is your top priority, HarperDB should be run on bare metal hardware. Cloud providers offer these options at a higher cost, but they come with obvious performance improvements.
While HarperDB Cloud bills by RAM, each instance has other specifications associated with the RAM selection. The following table describes each instance size in detail*.
Instance Type | RAM (GB) | vCPUs | Network (Gbps) | CPU
--- | --- | --- | --- | ---
t3.nano | 0.5 | 2 | Up to 5 | 2.5 GHz Intel Xeon Platinum 8000
t3.micro | 1 | 2 | Up to 5 | 2.5 GHz Intel Xeon Platinum 8000
t3.small | 2 | 2 | Up to 5 | 2.5 GHz Intel Xeon Platinum 8000
t3.medium | 4 | 2 | Up to 5 | 2.5 GHz Intel Xeon Platinum 8000
m5.large | 8 | 2 | Up to 10 | Up to 3.1 GHz Intel Xeon Platinum 8000
m5.xlarge | 16 | 4 | Up to 10 | Up to 3.1 GHz Intel Xeon Platinum 8000
m5.2xlarge | 32 | 8 | Up to 10 | Up to 3.1 GHz Intel Xeon Platinum 8000
m5.4xlarge | 64 | 16 | Up to 10 | Up to 3.1 GHz Intel Xeon Platinum 8000
m5.8xlarge | 128 | 32 | 10 | Up to 3.1 GHz Intel Xeon Platinum 8000
m5.12xlarge | 192 | 48 | 10 | Up to 3.1 GHz Intel Xeon Platinum 8000
m5.16xlarge | 256 | 64 | 20 | Up to 3.1 GHz Intel Xeon Platinum 8000
m5.24xlarge | 384 | 96 | 25 | Up to 3.1 GHz Intel Xeon Platinum 8000
*Specifications are subject to change. For the most up to date information, please refer to AWS documentation: https://aws.amazon.com/ec2/instance-types/.
HarperDB uses role-based, attribute-level security to ensure that users can only gain access to the data they’re supposed to be able to access. Our granular permissions allow for unparalleled flexibility and control, and can actually lower the total cost of ownership compared to other database solutions, since you no longer have to replicate subsets of your data to isolate use cases.
Node name is the name given to a node. It is how nodes are identified within the cluster and must be unique to the cluster.
The name cannot contain any of the following characters: . (dot), , (comma), * (asterisk), > (greater than), or whitespace.
The name is set in the harperdb-config.yaml file using the clustering.nodeName configuration element.
Note: If you want to change the node name, make sure there are no subscriptions in place before doing so. After the name has been changed, a full restart is required.
There are multiple ways to update this element:
Directly editing the harperdb-config.yaml file. Note: When making any changes to the harperdb-config.yaml file, HarperDB must be restarted for the changes to take effect.
Calling set_configuration through the operations API.
Using command line variables.
Using environment variables.
HarperDB uses token-based authentication with JSON Web Tokens (JWTs).
This consists of two primary operations, create_authentication_tokens and refresh_operation_token. These generate two types of tokens, as follows:
The operation_token, which is used to authenticate all HarperDB operations in the Bearer Token Authorization header. The default expiry is one day.
The refresh_token, which is used to generate a new operation_token upon expiry. This token is used in the Bearer Token Authorization header for the refresh_operation_token operation only. The default expiry is thirty days.
The create_authentication_tokens operation can be used at any time to refresh both tokens in the event that both have expired or been lost.
Users must initially create tokens using their HarperDB credentials. The following POST body is sent to HarperDB. No headers are required for this POST operation.
A full cURL example can be seen here:
An example expected return object is:
The operation_token value is used to authenticate all operations in place of our standard Basic auth. In order to pass the token you will need to create a Bearer Token Authorization header like the following request:
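In place of the original cURL examples, here is a minimal sketch using fetch (Node.js 18+) against a local instance; the URL and credentials are placeholders.

```javascript
// Create both tokens using HarperDB credentials (no Authorization header needed here).
const authResponse = await fetch('http://localhost:9925', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    operation: 'create_authentication_tokens',
    username: 'HDB_ADMIN',
    password: 'password'
  })
});
const { operation_token, refresh_token } = await authResponse.json();

// Use operation_token as a Bearer token for any subsequent operation.
const result = await fetch('http://localhost:9925', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${operation_token}`
  },
  body: JSON.stringify({ operation: 'describe_all' })
});
console.log(await result.json());
```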
operation_token expires at a set interval. Once it expires it will no longer be accepted by HarperDB. This duration defaults to one day, and is configurable in harperdb-config.yaml. To generate a new operation_token, the refresh_operation_token operation is used, passing the refresh_token in the Bearer Token Authorization header. A full cURL example can be seen here:
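A minimal sketch of the refresh call, again using fetch in place of cURL (the URL is a placeholder and refresh_token comes from the earlier create_authentication_tokens call):

```javascript
// Exchange a still-valid refresh_token for a new operation_token.
const refreshResponse = await fetch('http://localhost:9925', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${refresh_token}`
  },
  body: JSON.stringify({ operation: 'refresh_operation_token' })
});
const { operation_token: newOperationToken } = await refreshResponse.json();
```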
This will return a new operation_token. An example expected return object is:
The refresh_token also expires at a set interval, but a longer one. Once it expires it will no longer be accepted by HarperDB. This duration defaults to thirty days, and is configurable in harperdb-config.yaml. To generate a new operation_token and a new refresh_token, the create_authentication_tokens operation is called.
Token timeouts are configurable in harperdb-config.yaml with the following parameters:
operationsApi.authentication.operationTokenTimeout: Defines the length of time until the operation_token expires (default 1d).
operationsApi.authentication.refreshTokenTimeout: Defines the length of time until the refresh_token expires (default 30d).
A full list of valid values for both parameters can be found here.
HarperDB clustering is the process of connecting multiple HarperDB databases together to create a database mesh network that enables users to define data replication patterns.
HarperDB’s clustering engine replicates data between instances of HarperDB using a highly performant, bi-directional pub/sub model on a per-table basis. Data replicates asynchronously with eventual consistency across the cluster following the defined pub/sub configuration. Individual transactions are sent in the order in which they were transacted; once received by the destination instance, they are processed in an ACID-compliant manner. Conflict resolution follows a last-writer-wins model based on the recorded transaction time on the transaction and the timestamp on the record on the node.
A common use case is an edge application collecting and analyzing sensor data that creates an alert if a sensor value exceeds a given threshold:
The edge application should not be making outbound http requests for security purposes.
There may not be a reliable network connection.
Not all sensor data will be sent to the cloud--either because of the unreliable network connection, or maybe it’s just a pain to store it.
The edge node should be inaccessible from outside the firewall.
The edge node will send alerts to the cloud with a snippet of sensor data containing the offending sensor readings.
HarperDB simplifies the architecture of such an application with its bi-directional, table-level replication:
The edge instance subscribes to a “thresholds” table on the cloud instance, so the application only makes localhost calls to get the thresholds.
The application continually pushes sensor data into a “sensor_data” table via the localhost API, comparing it to the threshold values as it does so.
When a threshold violation occurs, the application adds a record to the “alerts” table.
The application appends to that record array “sensor_data” entries for the 60 seconds (or minutes, or days) leading up to the threshold violation.
The edge instance publishes the “alerts” table up to the cloud instance.
By letting HarperDB focus on the fault-tolerant logistics of transporting your data, you get to write less code. By moving data only when and where it’s needed, you lower storage and bandwidth costs. And by restricting your app to only making local calls to HarperDB, you reduce the overall exposure of your application to outside forces.
To create a cluster you must have two or more nodes* (aka instances) of HarperDB running.
*A node is a single instance/installation of HarperDB. A node of HarperDB can operate independently with clustering on or off.
On the following pages we'll walk you through the steps required, in order, to set up a HarperDB cluster.
HarperDB uses Basic Auth and JSON Web Tokens (JWTs) to secure our HTTP requests. In the context of an HTTP transaction, basic access authentication is a method for an HTTP user agent to provide a user name and password when making a request.
** You do not need to log in separately. Basic Auth is added to each HTTP request like create_schema, create_table, insert etc… via headers. **
A header is added to each HTTP request. The header key is “Authorization”; the header value is “Basic <<your username and password buffer token>>”.
In the below code sample, you can see where we add the authorization header to the request. This needs to be added for each and every HTTP request for HarperDB.
Note: This function uses btoa. Learn about btoa here.
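A minimal sketch of such a function (the instance URL and credentials are placeholders, and the operation shown is arbitrary; btoa is available in browsers and as a global in Node.js 16+):

```javascript
// Build the Basic auth Authorization header and attach it to every HarperDB request.
function callHarperDB(operation, username, password) {
  return fetch('http://localhost:9925', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: 'Basic ' + btoa(`${username}:${password}`)
    },
    body: JSON.stringify(operation)
  }).then((response) => response.json());
}

// Example usage with an arbitrary operation.
callHarperDB({ operation: 'describe_all' }, 'HDB_ADMIN', 'password').then(console.log);
```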
HarperDB was set up to require very minimal configuration to work out of the box. There are, however, some best practices we encourage for anyone building an app with HarperDB.
HarperDB allows for managing cross-origin HTTP requests. By default, HarperDB enables CORS for all domains. If you need to disable CORS completely or set up an access list of domains, you can do the following:
Open the harperdb-config.yaml file; this can be found in <ROOTPATH>, the location you specified during install.
In harperdb-config.yaml there should be 2 entries under operationsApi.network: cors and corsAccessList.
cors
To turn off, change to: cors: false
To turn on, change to: cors: true
corsAccessList
The corsAccessList will only be recognized by the system when cors is true.
To create an access list you set corsAccessList to a comma-separated list of domains, i.e. corsAccessList is http://harperdb.io,http://products.harperdb.io
To clear out the access list and allow all domains: corsAccessList is [null]
HarperDB provides the option to use an HTTP or HTTPS and HTTP/2 interface. The default port for the server is 9925.
These default ports can be changed by updating the operationsApi.network.port value in <ROOTPATH>/harperdb-config.yaml.
By default, HTTPS is turned off and HTTP is turned on. It is recommended that you never directly expose HarperDB's HTTP interface through a publicly available port. HTTP is intended for local or private network use.
You can toggle HTTPS and HTTP in the settings file by setting operationsApi.network.https to true/false. When https is set to false, the server will use HTTP (version 1.1). Enabling HTTPS will enable both HTTPS/1.1 and HTTPS/2.
HarperDB automatically generates a certificate (certificate.pem), a certificate authority (ca.pem) and a private key file (privateKey.pem), which live at <ROOTPATH>/keys/.
You can replace these with your own certificates and key.
Changes to these settings require a restart. Use the restart operation from the HarperDB Operations API.
Additional information that will help you define your clustering topology.
Transactions that are replicated across the cluster are:
Insert
Update
Upsert
Delete
Bulk loads
CSV data load
CSV file load
CSV URL load
Import from S3
When adding or updating a node any schemas and tables in the subscription that don’t exist on the remote node will be automatically created.
Destructive schema operations do not replicate across a cluster. Those operations include drop_schema, drop_table, and drop_attribute. If the desired outcome is to drop schema information from any nodes then the operation(s) will need to be run on each node independently.
Users and roles are not replicated across the cluster.
HarperDB has built-in resiliency for when network connectivity is lost within a subscription. When connections are reestablished, a catchup routine is executed to ensure data that was missed, specific to the subscription, is sent/received as defined.
HarperDB clustering creates a mesh network between nodes, giving end users the ability to create an infinite number of topologies. Subscription topologies can be as simple or as complex as needed.
Subscriptions can be added, updated, or removed through the API.
Note: The schema and tables in the subscription must exist on either the local or the remote node. Any schema and tables that do not exist on one particular node, for example, the local node, will be automatically created on the local node.
To add a single node and create one or more subscriptions use add_node.
This is an example of adding Node2 to your local node. Subscriptions are created for two tables, dog and chicken.
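A sketch of the request body (the node name, schema, table names, and publish/subscribe values are placeholders, and the exact fields may vary by HarperDB version):

```javascript
// Body for the add_node operation, sent as JSON to the local node's operations API.
const addNodeBody = {
  operation: 'add_node',
  node_name: 'Node2',
  subscriptions: [
    // publish: replicate local writes to Node2; subscribe: pull Node2's writes locally.
    { schema: 'dev', table: 'dog', publish: true, subscribe: true },
    { schema: 'dev', table: 'chicken', publish: false, subscribe: true }
  ]
};
```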
To update one or more subscriptions with a single node use update_node.
This call will update the subscription with the dog table. Any other subscriptions with Node2 will not change.
To add or update subscriptions with one or more nodes in one API call use configure_cluster.
Note: configure_cluster will override any and all existing subscriptions defined on the local node. This means that before going through the connections in the request and adding the subscriptions, it will first go through all existing subscriptions the local node has and remove them. To get all existing subscriptions use cluster_status.
There is an optional property called start_time that can be passed in the subscription. This property accepts an ISO formatted UTC date.
start_time can be used to set from what time you would like to source transactions from a table when creating or updating a subscription.
This example will get all transactions on Node2’s dog table starting from 2022-09-02T20:06:35.993Z and replicate them locally on the dog table.
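A sketch of such a request body, assuming the same subscription fields described above:

```javascript
// update_node body with a start_time, so transactions on Node2's dog table
// from the given UTC timestamp onward are replicated locally.
const updateNodeBody = {
  operation: 'update_node',
  node_name: 'Node2',
  subscriptions: [
    {
      schema: 'dev',
      table: 'dog',
      publish: false,
      subscribe: true,
      start_time: '2022-09-02T20:06:35.993Z'
    }
  ]
};
```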
If no start time is passed it defaults to the current time.
Note: start time utilizes clustering to back source transactions. For this reason it can only source transactions that occurred when clustering was enabled.
To remove a node and all its subscriptions use remove_node.
To get the status of all connected nodes and see their subscriptions use cluster_status.
HarperDB utilizes a Role-Based Access Control (RBAC) framework to manage access to HarperDB instances. A user is assigned a role that determines the user’s permissions to access database resources and run core operations.
Role permissions in HarperDB are broken into two categories – permissions around database manipulation and permissions around database definition.
Database Manipulation: A role defines CRUD (create, read, update, delete) permissions against database resources (i.e. data) in a HarperDB instance.
At the table level, access permissions must be explicitly defined when adding or altering a role – i.e. HarperDB will assume CRUD access to be FALSE if not explicitly provided in the permissions JSON passed to the add_role and/or alter_role API operations.
At the attribute-level, permissions for attributes in all tables included in the permissions set will be assigned based on either the specific attribute-level permissions defined in the table’s permission set or, if there are no attribute-level permissions defined, permissions will be based on the table’s CRUD set.
Database Definition: Permissions related to managing schemas, tables, roles, users, and other system settings and operations are restricted to the built-in super_user role.
Built-In Roles
There are three built-in roles within HarperDB. See the full breakdown of operations restricted to super_user roles in the table below.
super_user - This role provides full access to all operations and methods within a HarperDB instance; this can be considered the admin role.
This role provides full access to all Database Definition operations and the ability to run Database Manipulation operations across the entire database schema with no restrictions.
cluster_user - This role is an internal system role type that is managed internally to allow clustered instances to communicate with one another.
This role is an internally managed role to facilitate communication between clustered instances.
structure_user - This role provides specific access for creation and deletion of data.
When defining this role type you can either assign a value of true, which will allow the role to create and drop schemas & tables, or assign a string array. The values in this array are schema names, and they allow the role to create and drop tables only in the designated schemas.
User-Defined Roles
In addition to built-in roles, admins (i.e. users assigned to the super_user role) can create customized roles for other users to interact with and manipulate the data within explicitly defined tables and attributes.
Unless the user-defined role is given super_user permissions, permissions must be defined explicitly within the request body JSON.
Describe operations will return metadata for all schemas, tables, and attributes that a user-defined role has CRUD permissions for.
Role Permissions
When creating a new, user-defined role in a HarperDB instance, you must provide a role name and the permissions to assign to that role. Reminder, only super users can create and manage roles.
role - name used to easily identify the role assigned to individual users. Roles can be altered/dropped based on the role name used in and returned from a successful add_role, alter_role, or list_roles operation.
permissions - used to explicitly define CRUD access to existing table data.
Example JSON for an add_role request:
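A sketch of what that request body might look like, using the placeholder schema, table, and attribute names referenced in the notes below (field names follow the structure described in this section and may vary by version):

```javascript
// Example body for add_role (table_name1, table_name2, and attribute1 are placeholders).
const addRoleBody = {
  operation: 'add_role',
  role: 'developer',
  permission: {
    super_user: false,
    schema_name: {
      tables: {
        table_name1: {
          read: true,
          insert: true,
          update: true,
          delete: false,
          attribute_permissions: [
            { attribute_name: 'attribute1', read: true, insert: true, update: true }
          ]
        },
        table_name2: {
          read: true,
          insert: false,
          update: false,
          delete: false,
          attribute_permissions: []
        }
      }
    }
  }
};
```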
Setting Role Permissions
There are two parts to a permissions set:
super_user – boolean value indicating if the role should be provided super_user access.
If super_user is set to true, there should be no additional schema-specific permissions values included, since the role will have access to the entire database schema. If permissions are included in the body of the operation, they will be stored within HarperDB, but ignored, as super_users have full access to the database.
permissions: Schema tables that a role should have specific CRUD access to should be included in the final, schema-specific permissions JSON.
For user-defined roles (i.e. non-super_user roles), blank permissions will result in the user being restricted from accessing any of the database schema.
Table Permissions JSON
Each table that a role should be given some level of CRUD permissions to must be included in the tables array for its schema in the role’s permissions JSON passed to the API (see example above).
Important Notes About Table Permissions
If a schema and/or any of its tables are not included in the permissions JSON, the role will not have any CRUD access to the schema and/or tables.
If a table-level CRUD permission is set to false, any attribute-level with that same CRUD permission set to true will return an error.
Important Notes About Attribute Permissions
If there are attribute-specific CRUD permissions that need to be enforced on a table, those need to be explicitly described in the attribute_permissions array.
If a non-hash attribute is given some level of CRUD access, that same access will be assigned to the table’s hash_attribute, even if it is not explicitly defined in the permissions JSON.
See table_name1’s permission set for an example of this – even though the table’s hash attribute is not specifically defined in the attribute_permissions array, because the role has CRUD access to ‘attribute1’, the role will have the same access to the table’s hash attribute.
If attribute-level permissions are set – i.e. attribute_permissions.length > 0 – any table attribute not explicitly included will be assumed to have no CRUD access (with the exception of the hash_attribute described above).
See table_name1’s permission set for an example of this – in this scenario, the role will have the ability to create, insert and update ‘attribute1’ and the table’s hash attribute but no other attributes on that table.
If an attribute_permissions array is empty, the role’s access to a table’s attributes will be based on the table-level CRUD permissions.
See table_name2’s permission set for an example of this.
The __createdtime__ and __updatedtime__ attributes that HarperDB manages internally can have read perms set but, if set, all other attribute-level permissions will be ignored.
Please note that DELETE permissions are not included as a part of an individual attribute-level permission set. That is because it is not possible to delete individual attributes from a row, rows must be deleted in full.
If a role needs the ability to delete rows from a table, that permission should be set on the table-level.
The practical approach to deleting an individual attribute of a row would be to set that attribute to null via an update statement.
The table below includes all API operations available in HarperDB and indicates whether or not the operation is restricted to super_user roles.
Keep in mind that non-super_user roles will also be restricted within the operations they do have access to by the schema-level CRUD permissions set for the roles.
You may have gotten an error like, Error: Must execute as <<username>>.
This means that you installed HarperDB as <<user>>. Because HarperDB stores files natively on the operating system, we only allow the HarperDB executable to be run by a single user. This prevents permissions issues on files.
For example, if you installed as user_a but later wanted to run as user_b, user_b may not have access to the hdb files HarperDB needs. This also keeps HarperDB more secure, as it allows you to lock files down to a specific user and prevents other users from accessing your files.
Custom functions are a key part of building a complete HarperDB application. It is highly recommended that you use Custom Functions as the primary mechanism for your application to access your HarperDB database. Using Custom Functions gives you complete control over the accessible endpoints, how users are authenticated and authorized, what data is accessed from the database, and how it is aggregated and returned to users.
Add your own API endpoints to a standalone API server inside HarperDB
Use HarperDB Core methods to interact with your data at lightning speed
Custom Functions are powered by Fastify, so they’re extremely flexible
Manage in HarperDB Studio, or use your own IDE and Version Management System
Distribute your Custom Functions to all your HarperDB instances with a single click
A route is a connection between two nodes. It is how the clustering network is established.
Routes do not need to cross connect all nodes in the cluster. You can select a leader node or a few leaders and have all nodes connect to them, you can chain nodes together, and so on. As long as there is one route connecting a node to the cluster, all other nodes should be able to reach that node.
Using routes, the clustering servers will create a mesh network between nodes. This mesh network ensures that if a node drops out, all other nodes can still communicate with each other. That being said, we recommend designing your routing with failover in mind; this means not storing all your routes on one node but dispersing them throughout the network.
A simple route example is a two node topology, if Node1 adds a route to connect it to Node2, Node2 does not need to add a route to Node1. That one route configuration is all that’s needed to establish a bidirectional connection between the nodes.
A route consists of a port and a host.
port - the clustering port of the remote instance you are creating the connection with. This is going to be the clustering.hubServer.cluster.network.port in the HarperDB configuration on the node you are connecting with.
host - the host of the remote instance you are creating the connection with. This can be an IP address or a URL.
Routes are set in the harperdb-config.yaml file using the clustering.hubServer.cluster.network.routes element, which expects an object array, where each object has two properties, port and host.
This diagram shows one way of using routes to connect a network of nodes. Node2 and Node3 do not reference any routes in their config. Node1 contains routes for Node2 and Node3, which is enough to establish a network between all three nodes.
There are multiple ways to set routes:
Directly editing the harperdb-config.yaml file (refer to the code snippet above).
Calling cluster_set_routes through the API.
Note: When making any changes to HarperDB configuration HarperDB must be restarted for the changes to take effect.
From the command line.
Using environment variables.
The API also has cluster_get_routes for getting all routes in the config and cluster_delete_routes for deleting routes.
HarperDB’s Custom Functions is built on top of Fastify, so our route definitions follow their specifications. Below is a very simple example of a route declaration.
Route URLs are resolved in the following manner:
[Instance URL]:[Custom Functions Port]/[Project Name]/[Route URL]
The route below, within the dogs project, with a route of breeds would be available at http://localhost:9926/dogs/breeds.
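A sketch of such a route declaration, assuming the standard Custom Functions route signature described below (schema and table names are placeholders):

```javascript
// routes/index.js in the "dogs" project
module.exports = async (server, { hdbCore, logger }) => {
  server.route({
    url: '/breeds',
    method: 'POST',
    // Authenticate the caller exactly like the core Operations API would.
    preValidation: hdbCore.preValidation,
    handler: (request) => {
      // Run a standard HarperDB operation and return its result directly.
      request.body = {
        operation: 'sql',
        sql: 'SELECT * FROM dev.breeds'
      };
      return hdbCore.request(request);
    }
  });
};
```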
In effect, this route is just a pass-through to HarperDB. The same result could have been achieved by hitting the core HarperDB API, since it uses hdbCore.preValidation and hdbCore.request, which are defined in the “helper methods” section, below.
For endpoints where you want to execute multiple operations against HarperDB, or perform additional processing (like an ML classification, or an aggregation, or a call to a 3rd party API), you can define your own logic in the handler. The function below will execute a query against the dogs table, and filter the results to only return those dogs over 4 years in age.
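A sketch of what such a handler might look like (the route URL, schema, table, and attribute names are placeholders):

```javascript
// routes/olderThanFour.js in the "dogs" project
module.exports = async (server, { hdbCore, logger }) => {
  server.route({
    url: '/older-than-four',
    method: 'GET',
    handler: async (request) => {
      request.body = {
        operation: 'sql',
        sql: 'SELECT * FROM dev.dogs'
      };
      // SQL results come back as an array of records, so we can filter them in code.
      const dogs = await hdbCore.requestWithoutAuthentication(request);
      return dogs.filter((dog) => dog.age > 4);
    }
  });
};
```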
IMPORTANT: This route has NO preValidation and uses hdbCore.requestWithoutAuthentication, which, as the name implies, bypasses all user authentication. See the security concerns and mitigations in the “helper methods” section, below.
The simple example above was just a pass-through to HarperDB; the exact same result could have been achieved by hitting the core HarperDB API. But for many applications, you may want to authenticate the user using custom logic you write, or by conferring with a 3rd party service. Custom preValidation hooks let you do just that.
Below is an example of a route that uses a custom validation hook:
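This is a sketch assuming a customValidation helper like the one shown in Define Helpers (the route URL and query are placeholders):

```javascript
// routes/customAuth.js in the "dogs" project
const customValidation = require('../helpers/customValidation');

module.exports = async (server, { hdbCore, logger }) => {
  server.route({
    url: '/breeds/custom-auth',
    method: 'GET',
    // Validate the caller with our own logic instead of hdbCore.preValidation.
    preValidation: (request) => customValidation(request, logger),
    handler: (request) => {
      request.body = {
        operation: 'sql',
        sql: 'SELECT * FROM dev.breeds'
      };
      // Safe here because preValidation has already vetted the caller.
      return hdbCore.requestWithoutAuthentication(request);
    }
  });
};
```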
When declaring routes, you are given access to 2 helper methods: hdbCore and logger.
hdbCore
hdbCore contains three functions that allow you to authenticate an inbound request and execute operations against HarperDB directly, bypassing the standard Operations API.
preValidation
This takes the authorization header from the inbound request and executes the same authentication as the standard HarperDB Operations API. It will determine if the user exists, and if they are allowed to perform this operation. If you use the request method, you have to use preValidation to get the authenticated user.
request
This will execute a request with HarperDB using the operations API. The request.body should contain a standard HarperDB operation and must also include the hdb_user property that was in request.body provided in the callback.
requestWithoutAuthentication
Executes a request against HarperDB without any security checks around whether the inbound user is allowed to make this request. For security purposes, you should always take the following precautions when using this method:
Properly handle user-submitted values, including URL params. User-submitted values should only be used for search_value and for defining values in records. Special care should be taken to properly escape any values if user-submitted values are used for SQL.
logger
This helper allows you to write directly to the Custom Functions log file, custom_functions.log. It’s useful for debugging during development, although you may also use the console logger. There are 5 functions contained within logger, each of which pertains to a different logging.level configuration in your harperdb-config.yaml file.
logger.trace('Starting the handler for /dogs')
logger.debug('This should only fire once')
logger.warn('This should never ever fire')
logger.error('This did not go well')
logger.fatal('This did not go very well at all')
Inter-node authentication takes place via HarperDB users. There is a special role type called cluster_user that exists by default and limits the user to only clustering functionality.
A cluster_user must be created and added to the harperdb-config.yaml file for clustering to be enabled.
All nodes that are intended to be clustered together need to share the same cluster_user credentials (i.e. username and password).
There are multiple ways a cluster_user can be created:
Through the operations API by calling add_user.
When using the API to create a cluster user, the harperdb-config.yaml file must be updated with the username of the new cluster user. This can be done through the API by calling set_configuration or by editing the harperdb-config.yaml file.
In the harperdb-config.yaml file, under the top-level clustering element, there will be a user element. Set this to the name of the cluster user.
Note: When making any changes to the harperdb-config.yaml file, HarperDB must be restarted for the changes to take effect.
Upon installation, using command line variables. This will automatically set the user in the harperdb-config.yaml file.
Note: Using command line or environment variables for setting the cluster user only works on install.
Upon installation, using environment variables. This will automatically set the user in the harperdb-config.yaml file.
A subscription defines how data should move between two nodes. Subscriptions are exclusively table level and operate independently. They connect a table on one node to a table on another node; the subscription will apply to a matching schema name and table name on both nodes.
Note: ‘local’ and ‘remote’ will often be referred to. In the context of these docs, ‘local’ is the node that is receiving the API request to create/update a subscription, and ‘remote’ is the other node referred to in the request, the node on the other end of the subscription.
A subscription consists of:
schema - the name of the schema that the table you are creating the subscription for belongs to.
table - the name of the table the subscription will apply to.
publish - a boolean which determines if transactions on the local table should be replicated on the remote table.
subscribe - a boolean which determines if transactions on the remote table should be replicated on the local table.
This diagram is an example of a publish subscription from the perspective of Node1.
The record with id 2 has been inserted in the dog table on Node1; after that insert completes, it is sent to Node2 and inserted in the dog table there.
This diagram is an example of a subscribe subscription from the perspective of Node1.
The record with id 3 has been inserted in the dog table on Node2; after that insert completes, it is sent to Node1 and inserted there.
This diagram shows both subscribe and publish, but publish is set to false. You can see that, because subscribe is true, the insert on Node2 is replicated on Node1; but because publish is set to false, the insert on Node1 is not replicated on Node2.
This shows both subscribe and publish set to true. The insert on Node1 is replicated on Node2 and the update on Node2 is replicated on Node1.
Notice we imported customValidation from the helpers directory. To include a helper, and to see the actual code within customValidation, see the customValidation example in the helpers section below.
Operation | Restricted to super_user roles
--- | ---
describe_all | 
describe_schema | 
describe_table | 
create_schema | X
drop_schema | X
create_table | X
drop_table | X
create_attribute | 
drop_attribute | X
insert | 
update | 
upsert | 
delete | 
search_by_hash | 
search_by_value | 
search_by_conditions | 
select | 
insert | 
update | 
delete | 
csv_data_load | 
csv_file_load | 
csv_url_load | 
import_from_s3 | 
list_roles | X
add_role | X
alter_role | X
drop_role | X
list_users | X
user_info | 
add_user | X
alter_user | X
drop_user | X
cluster_set_routes | X
cluster_get_routes | X
cluster_delete_routes | X
add_node | X
update_node | X
cluster_status | X
remove_node | X
configure_cluster | X
custom_functions_status | X
get_custom_functions | X
get_custom_function | X
set_custom_function | X
drop_custom_function | X
add_custom_function_project | X
drop_custom_function_project | X
package_custom_function_project | X
deploy_custom_function_project | X
registration_info | 
get_fingerprint | X
set_license | X
get_job | 
search_jobs_by_start_date | X
read_log | X
read_transaction_log | X
delete_transaction_logs_before | X
read_audit_log | X
delete_audit_logs_before | X
delete_records_before | X
export_local | X
export_to_s3 | X
system_information | X
restart | X
restart_service | X
get_configuration | X
configure_cluster | X
create_authentication_tokens | 
refresh_operation_token | 
Helpers are functions for use within your routes. You may want to use the same helper in multiple route files, so this allows you to write it once and include it wherever you need it.
To use your helpers, they must be exported from your helper file. Please use any standard export mechanism available for your module system. Our example below exports using module.exports (CommonJS); with ESM you would use an export statement instead.
You must import the helper module into the file that needs access to the exported functions. With CommonJS, you'd use a require statement (with ESM, an import statement). See this example in Define Routes.
Below is code from the customValidation helper that is referenced in Define Routes. It takes the request and the logger method from the route declaration, and makes a call to an external API to validate the headers using fetch. The API in this example is just returning a list of ToDos, but it could easily be replaced with a call to a real authentication service.
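A sketch of what that helper might look like; the ToDos endpoint is a stand-in for a real authentication service, and the error shape is illustrative:

```javascript
// helpers/customValidation.js
module.exports = async (request, logger) => {
  // Forward the caller's authorization header to an external service for validation.
  const response = await fetch('https://jsonplaceholder.typicode.com/todos/1', {
    headers: { authorization: request.headers.authorization }
  });

  if (!response.ok) {
    logger.error(`validation failed with status ${response.status}`);
    const error = new Error('Unauthorized');
    error.statusCode = 401;
    throw error;
  }

  const body = await response.json();
  logger.debug(`validation response: ${JSON.stringify(body)}`);
  return request;
};
```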
One way to manage Custom Functions is through HarperDB Studio. It performs all the necessary operations automatically. To get started, navigate to your instance in HarperDB Studio and click the subnav link for “functions”. If you have not yet enabled Custom Functions, it will walk you through the process. Once configuration is complete, you can manage and deploy Custom Functions in minutes.
HarperDB Studio manages your Custom Functions using nine HarperDB operations. You may view these operations within our API Docs. A brief overview of each of the operations is below:
custom_functions_status
Returns the state of the Custom Functions server. This includes whether it is enabled, upon which port it is listening, and where its root project directory is located on the host machine.
get_custom_functions
Returns an array of projects within the Custom Functions root project directory. Each project has details including each of the files in the routes and helpers directories, and the total file count in the static folder.
get_custom_function
Returns the content of the specified file as text. HarperDB Studio uses this call to render the file content in its built-in code editor.
set_custom_function
Updates the content of the specified file. HarperDB Studio uses this call to save any changes made through its built-in code editor.
drop_custom_function
Deletes the specified file.
add_custom_function_project
Creates a new project folder in the Custom Functions root project directory. It also inserts into the new directory the contents of our Custom Functions Project template, which is available publicly, here: https://github.com/HarperDB/harperdb-custom-functions-template.
drop_custom_function_project
Deletes the specified project folder and all of its contents.
package_custom_function_project
Creates a .tar file of the specified project folder, then reads it into a base64-encoded string and returns that string to the user.
deploy_custom_function_project
Takes the output of package_custom_function_project, decodes the base64-encoded string, reconstitutes the .tar file of your project folder, and extracts it to the Custom Functions root project directory.
The @fastify/static module can be utilized to serve static files.
Install the module in your project by running npm i @fastify/static from inside your project directory.
Register @fastify/static with the server and set root to the absolute path of the directory that contains the static files to serve.
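For example, a registration sketch that assumes the standard Custom Functions route signature and the project's static folder (the file path is illustrative):

```javascript
// routes/static.js - serve files from the project's /static directory.
const path = require('path');

module.exports = async (server, { hdbCore, logger }) => {
  server.register(require('@fastify/static'), {
    // Absolute path to the directory containing the static files.
    root: path.join(__dirname, '..', 'static')
  });
};
```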
For further information on how to send specific files see the @fastify/static docs.
To create a project using our web-based GUI, HarperDB Studio, check out how to manage Custom Functions here.
Otherwise, to create a project, you have the following options:
Use the add_custom_function_project operation
This operation creates a new project folder, and populates it with templates for the routes, helpers, and static subfolders.
Clone our public GitHub project template
This requires a local installation. Remove the .git directory for a clean slate of git history.
Create a project folder in your Custom Functions root directory and initialize
This requires a local installation.
Custom function projects can be structured and managed like normal Node.js projects. You can include external dependencies, include them in your route and helper files, and manage your revisions without changing your development tooling or pipeline.
To initialize your project to use npm packages, use the terminal to execute npm init from the root of your project folder.
To implement version control using git, use the terminal to execute git init from the root of your project folder.
The purpose of this guide is to describe the available functionality of HarperDB as it relates to supported SQL functionality. The SQL parser is still actively being developed and this document will be updated as more features and functionality become available. A high-level view of supported features can be found in the feature matrix below.
HarperDB adheres to the concept of schemas & tables. This allows developers to isolate table structures from each other all within one database.
Check out our always-expanding library of templates in our open-source repositories on GitHub.
For any changes made to your routes, helpers, or projects, you’ll need to restart the Custom Functions server to see them take effect. HarperDB Studio does this automatically whenever you create or delete a project, or add, edit, or delete a route or helper. If you need to restart the Custom Functions server yourself, you can use the following operation to do so:
Library of example projects and tutorials using Custom Functions:
, by Yitaek Hwang
, by Danny Adams
, by Lucas Santos
, by Colby Fayock
, by Terra Roush
, by Andrew Baisden
, by Kevin Ashcraft
, by Patrick Löber
, by Soumya Ranjan Mohanty
, by Hrithwik Bharadwaj
, by Tapas Adhikary
, by Ankur Tyagi
, livestream by Jaxon Repp
, by Davis David
, Select* Podcast
HarperDB Custom Functions projects are managed by HarperDB’s process manager. As such, it may seem more difficult to debug Custom Functions than your standard project. The goal of this document is to provide best practices and recommendations for debugging your Custom Function.
For local debugging and development, it is recommended that you use standard console log statements for logging. For production use, you may want to use HarperDB's logging facilities, so you aren't logging to the console. HarperDB includes its logger module in the primary function parameters with the name logger. This logger can be used to output messages directly to the HarperDB log using the standardized logging level functions described below. The log level can be set in the HarperDB configuration file, harperdb-config.yaml.
HarperDB Logger Functions
trace(message): Write a 'trace' level log, if the configured level allows for it.
debug(message): Write a 'debug' level log, if the configured level allows for it.
info(message): Write an 'info' level log, if the configured level allows for it.
warn(message): Write a 'warn' level log, if the configured level allows for it.
error(message): Write an 'error' level log, if the configured level allows for it.
fatal(message): Write a 'fatal' level log, if the configured level allows for it.
notify(message): Write a 'notify' level log.
For debugging purposes, it is recommended to use notify, as these messages will appear in the log regardless of the configured log level.
The HarperDB log can be found in HarperDB Studio or in the local Custom Functions log file, <HDBROOT>/log/custom_functions.log. Additionally, you can use the read_log operation to query the HarperDB log.
This example performs a SQL query in HarperDB and logs the result. This example utilizes the logger.notify function to log the stringified version of the result. If an error occurs, it will output the error using logger.error and return the error.
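A sketch of such a route (the table name and route URL are placeholders):

```javascript
// routes/debugExample.js
module.exports = async (server, { hdbCore, logger }) => {
  server.route({
    url: '/debug-example',
    method: 'GET',
    handler: async (request) => {
      request.body = {
        operation: 'sql',
        sql: 'SELECT * FROM dev.dog LIMIT 10'
      };
      try {
        const result = await hdbCore.requestWithoutAuthentication(request);
        // notify always appears in the log regardless of the configured level.
        logger.notify(JSON.stringify(result));
        return result;
      } catch (error) {
        logger.error(JSON.stringify(error));
        return error;
      }
    }
  });
};
```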
This example performs two SQL queries in HarperDB with logging throughout to describe what is happening. This example utilizes the logger.notify function to log the stringified version of the operation and the result of each query. If an error occurs, it will output the error using logger.error and return the error.
Before you get started with Custom Functions, here’s a primer on the basic configuration and the structure of a Custom Functions Project.
Custom Functions are configured in the harperdb-config.yaml file located in the operations API root directory (by default this is a directory named hdb located in the home directory of the current user). Below is a view of the Custom Functions' section of the config YAML file, plus descriptions of important Custom Functions settings.
enabled
A boolean value that tells HarperDB to start the Custom Functions server. Set it to true to enable custom functions and false to disable. enabled is true by default.
network.port
This is the port HarperDB will use to start a standalone Fastify Server dedicated to serving your Custom Functions’ routes.
root
This is the root directory where your Custom Functions projects and their files will live. By default, it’s in your <ROOTPATH>, but you can locate it anywhere--in a developer folder next to your other development projects, for example.
Please visit our configuration docs for a more comprehensive look at these settings.
project folder
The name of the folder that holds your project files serves as the root prefix for all the routes you create. All routes created in the dogs project folder will have a URL like this: https://my-server-url.com:9926/dogs/my/route. As such, it’s important that any project folders you create avoid any characters that aren’t URL-friendly. You should avoid URL delimiters in your folder names.
/routes folder
Files in the routes folder define the requests that your Custom Functions server will handle. They are standard Fastify route declarations, so if you’re familiar with them, you should be up and running in no time. The default components for a route are the url, method, preValidation, and handler.
/helpers folder
These files are JavaScript modules that you can use in your handlers, or for custom preValidation hooks. Examples include calls to third party Authentication services, filters for results of calls to HarperDB, and custom error responses. As modules, you can use standard import and export functionality.
/static folder
If you’d like to serve your visitors a static website, you can place the html and supporting files into a directory called static. The directory must have an index.html file, and can have as many supporting resources as are necessary in whatever subfolder structure you prefer within that static directory.
HarperDB supports updating existing table row(s) via UPDATE statements. Multiple conditions can be applied to filter the row(s) to update. At this time selecting from one table to update another is not supported.
HarperDB supports inserting 1 to n records into a table. The primary key must be unique (not used by any other record). If no primary key is provided, it will be assigned an auto-generated UUID. HarperDB does not support selecting from one table to insert into another at this time.
All HarperDB Add-Ons and SDKs can be found in the located in the .
Google Data Studio is a free collaborative visualization tool which enables users to build configurable charts and tables quickly. The HarperDB Google Data Studio connector seamlessly integrates your HarperDB data with Google Data Studio so you can build custom, real-time data visualizations.
The HarperDB Google Data Studio Connector is subject to our and .
The HarperDB database must be accessible through the Internet in order for Google Data Studio servers to access it. The database may be hosted by you or via HarperDB Cloud.
Get started by selecting the HarperDB connector from the Google Data Studio connector gallery.
Log in to https://datastudio.google.com/.
Add a new Data Source using the HarperDB connector. The current release version can be added as a data source by following this link: .
Authorize the connector to access other servers on your behalf (this allows the connector to contact your database).
Enter the Web URL to access your database (preferably with HTTPS), as well as the Basic Auth key you use to access the database. Just include the key, not the word “Basic” at the start of it.
Check the box for “Secure Connections Only” if you want to always use HTTPS connections for this data source; entering a Web URL that starts with https:// will do the same thing, if you prefer.
Check the box for “Allow Bad Certs” if your HarperDB instance does not have a valid SSL certificate. HarperDB Cloud always has valid certificates, and so will never require this to be checked. Instances you set up yourself may require this, if you are using self-signed certs. If you are using HarperDB Cloud or another instance you know should always have valid SSL certificates, do not check this box.
Choose your Query Type. This determines what information the configuration will ask for after pressing the Next button.
Table will ask you for a Schema and a Table to return all fields of using SELECT *
.
SQL will ask you for the SQL query you’re using to retrieve fields from the database. You may JOIN
multiple tables together, and use HarperDB specific SQL functions, along with the usual power SQL grants.
When all information is entered correctly, press the Connect button in the top right of the new Data Source view to generate the Schema. You may also want to name the data source at this point. If the connector encounters any errors, a dialog box will tell you what went wrong so you can correct the issue.
If there are no errors, you now have a data source you can use in your reports! You may change the types of the generated fields in the Schema view if you need to (for instance, changing a Number field to a specific currency), as well as creating new fields from the report view that do calculations on other fields.
You may sign out of your current user by going to the instances tab in HarperDB Studio, then clicking on the lock icon at the top-right of a given instance’s box. Click the lock again to sign in as any user. The Basic Auth token will be visible in the Authorization header portion of any code created in the Sample Code tab.
It’s highly recommended that you create a read-only user role in HarperDB Studio, and create a user with that role for your data sources to use. This prevents that authorization token from being used to alter your database, should someone else ever get ahold of it.
The RecordCount field is intended for use as a metric, for counting how many instances of a given set of values appear in a report’s data set.
Do not attempt to create fields with spaces in their names for any data sources! Google Data Studio will crash when attempting to retrieve a field with such a name, producing a System Error instead of a useful chart on your reports. Using CamelCase or snake_case gets around this.
HarperDB provides access to most SQL functions, and we’re always expanding that list. Check below to see if we cover what you need. If not, feel free to reach out to us.
Both Postman and the app have ways to convert a user:password pair to a Basic Auth token. Use either to create the token for the connector’s user.
Multi-Conditions ✔
Wildcards ✔
IN ✔
LIKE ✔
Bit-wise Operators AND, OR ✔
Bit-wise Operators NOT ✔
NULL ✔
BETWEEN ✔
EXISTS,ANY,ALL ✔
Compare columns ✔
Compare constants ✔
Date Functions* ✔
Math Functions ✔
Sub-SELECT ✗
Multi-Column GROUP BY ✔
Aggregate function conditions ✔
Multi-Column ORDER BY ✔
Aliases ✔
Date Functions* ✔
Math Functions ✔
Values - multiple values supported ✔
Sub-SELECT ✗
SET ✔
Sub-SELECT ✗
Conditions ✔
Date Functions* ✔
Math Functions ✔
FROM ✔
Sub-SELECT ✗
Conditions ✔
Column SELECT ✔
Aliases ✔
Aggregator Functions ✔
Date Functions* ✔
Math Functions ✔
Constant Values ✔
Distinct ✔
Sub-SELECT ✗
Multi-table JOIN ✔
INNER JOIN ✔
LEFT OUTER JOIN ✔
LEFT INNER JOIN ✔
RIGHT OUTER JOIN ✔
RIGHT INNER JOIN ✔
FULL JOIN ✔
UNION ✗
Sub-SELECT ✗
TOP ✔
HarperDB supports deleting records from a table with condition support.
HarperDB utilizes UTC in all internal SQL operations. This means that date values passed into any of the functions below will be assumed to be in UTC or in a format that can be translated to UTC.
When parsing date values passed to SQL date functions in HDB, we first check for known formats, then for a date-time format, and then fall back to new Date(date_string) if a known format is not found.
Returns the current date in UTC in YYYY-MM-DD String format.
Returns the current time in UTC in HH:mm:ss.SSS String format.
Referencing this variable will evaluate as the current Unix Timestamp in milliseconds.
Formats and returns the date_string argument in UTC in YYYY-MM-DDTHH:mm:ss.SSSZZ String format.
If a date_string is not provided, the function will return the current UTC date/time value in the return format defined above.
Adds the defined amount of time to the date provided in UTC and returns the resulting Unix Timestamp in milliseconds. Accepted interval values: Either string value (key or shorthand) can be passed as the interval argument.
Returns the difference between the two date values passed based on the interval as a Number. If an interval is not provided, the function will return the difference value in milliseconds.
Accepted interval values:
years
months
weeks
days
hours
minutes
seconds
Subtracts the defined amount of time from the date provided in UTC and returns the resulting Unix Timestamp in milliseconds. Accepted date_sub interval values- Either string value (key or shorthand) can be passed as the interval argument.
Extracts and returns the date_part requested as a String value. Accepted date_part values below show value returned for date = “2020-03-26T15:13:02.041+000”
Returns the current Unix Timestamp in milliseconds.
Returns the current date/time value based on the server’s timezone in YYYY-MM-DDTHH:mm:ss.SSSZZ String format.
Returns the UTC date time value with the offset provided included in the return String value formatted as YYYY-MM-DDTHH:mm:ss.SSSZZ. The offset argument will be added as minutes unless the value is less than 16 and greater than -16, in which case it will be treated as hours.
Returns the current Unix Timestamp in milliseconds.
This SQL keywords reference contains the SQL functions available in HarperDB.
*For more information on ARRAY() and DISTINCT_ARRAY() see .
The geoArea() function returns the area of one or more features in square meters.
geoArea(geoJSON)
Calculate the area, in square meters, of a manually passed GeoJSON polygon.
Find all records that have an area less than 1 square mile (or 2589988 square meters).
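Sketches of both queries, expressed as bodies for the sql operation (the schema, table, attribute names, and polygon coordinates are placeholders):

```javascript
// 1) Area, in square meters, of a manually passed GeoJSON polygon.
const areaOfPolygon = {
  operation: 'sql',
  sql: `SELECT geoArea('{"type":"Polygon","coordinates":[[[-104.97,39.75],[-104.97,39.76],[-104.96,39.76],[-104.96,39.75],[-104.97,39.75]]]}') AS area`
};

// 2) All records whose stored GeoJSON covers less than 1 square mile (2589988 m^2).
const smallAreas = {
  operation: 'sql',
  sql: 'SELECT * FROM dev.locations WHERE geoArea(geo_data) < 2589988'
};
```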
HarperDB has robust SELECT support, from simple queries all the way to complex joins with multi-conditions, aggregates, grouping & ordering.
All results are returned as JSON object arrays.
Query for all records and attributes in the dev.dog table:
Query specific columns from all rows in the dev.dog table:
Query for all records and attributes in the dev.dog table ORDERED BY age in ASC order:
*The ORDER BY keyword sorts in ascending order by default. To sort in descending order, use the DESC keyword.
HarperDB allows developers to join any number of tables and currently supports the following join types:
INNER JOIN
LEFT INNER JOIN
LEFT OUTER JOIN
Here’s a basic example joining two tables from our Get Started example, joining a dogs table with a breeds table:
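A sketch of such a join, expressed as a body for the sql operation (the column names are placeholders):

```javascript
const joinQuery = {
  operation: 'sql',
  sql: `SELECT d.id, d.dog_name, b.name AS breed
        FROM dev.dogs AS d
        INNER JOIN dev.breeds AS b ON d.breed_id = b.id
        WHERE d.age > 4`
};
```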
This is a list of reserved words in the SQL Parser. Use of these words or symbols may result in unexpected behavior or inaccessible tables/attributes. If any of these words must be used, any SQL call referencing a schema, table, or attribute must have backticks (`...`) or brackets ([...]) around the variable.
For example, for a table called ASSERT in the dev schema, a SQL select on that table could escape the table name with backticks or, alternatively, with brackets:
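For instance, as bodies for the sql operation (the WHERE clause is illustrative):

```javascript
// Escaping the reserved word ASSERT with backticks...
const withBackticks = {
  operation: 'sql',
  sql: 'SELECT * FROM dev.`ASSERT` WHERE id = 1'
};

// ...or, alternatively, with brackets.
const withBrackets = {
  operation: 'sql',
  sql: 'SELECT * FROM dev.[ASSERT] WHERE id = 1'
};
```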
ABSOLUTE
ACTION
ADD
AGGR
ALL
ALTER
AND
ANTI
ANY
APPLY
ARRAY
AS
ASSERT
ASC
ATTACH
AUTOINCREMENT
AUTO_INCREMENT
AVG
BEGIN
BETWEEN
BREAK
BY
CALL
CASE
CAST
CHECK
CLASS
CLOSE
COLLATE
COLUMN
COLUMNS
COMMIT
CONSTRAINT
CONTENT
CONTINUE
CONVERT
CORRESPONDING
COUNT
CREATE
CROSS
CUBE
CURRENT_TIMESTAMP
CURSOR
DATABASE
DECLARE
DEFAULT
DELETE
DELETED
DESC
DETACH
DISTINCT
DOUBLEPRECISION
DROP
ECHO
EDGE
END
ENUM
ELSE
EXCEPT
EXISTS
EXPLAIN
FALSE
FETCH
FIRST
FOREIGN
FROM
GO
GRAPH
GROUP
GROUPING
HAVING
HDB_HASH
HELP
IF
IDENTITY
IS
IN
INDEX
INNER
INSERT
INSERTED
INTERSECT
INTO
JOIN
KEY
LAST
LET
LEFT
LIKE
LIMIT
LOOP
MATCHED
MATRIX
MAX
MERGE
MIN
MINUS
MODIFY
NATURAL
NEXT
NEW
NOCASE
NO
NOT
NULL
OFF
ON
ONLY
OFFSET
OPEN
OPTION
OR
ORDER
OUTER
OVER
PATH
PARTITION
PERCENT
PLAN
PRIMARY
PRIOR
QUERY
READ
RECORDSET
REDUCE
REFERENCES
RELATIVE
REPLACE
REMOVE
RENAME
REQUIRE
RESTORE
RETURN
RETURNS
RIGHT
ROLLBACK
ROLLUP
ROW
SCHEMA
SCHEMAS
SEARCH
SELECT
SEMI
SET
SETS
SHOW
SOME
SOURCE
STRATEGY
STORE
SYSTEM
SUM
TABLE
TABLES
TARGET
TEMP
TEMPORARY
TEXTSTRING
THEN
TIMEOUT
TO
TOP
TRAN
TRANSACTION
TRIGGER
TRUE
TRUNCATE
UNION
UNIQUE
UPDATE
USE
USING
VALUE
VERTEX
VIEW
WHEN
WHERE
WHILE
WITH
WORK
Formats and returns a date value in the String format provided. Find more details on accepted format values in the moment.js docs.
Interval | Shorthand
--- | ---
years | y
quarters | Q
months | M
weeks | w
days | d
hours | h
minutes | m
seconds | s
milliseconds | ms

Interval | Shorthand
--- | ---
years | y
quarters | Q
months | M
weeks | w
days | d
hours | h
minutes | m
seconds | s
milliseconds | ms

date_part | Value returned for date = “2020-03-26T15:13:02.041+000”
--- | ---
year | “2020”
month | “3”
day | “26”
hour | “15”
minute | “13”
second | “2”
millisecond | “41”
CURRENT_DATE
CURRENT_DATE()
Returns the current date in UTC in “YYYY-MM-DD” String format.
CURRENT_TIME
CURRENT_TIME()
Returns the current time in UTC in “HH:mm:ss.SSS” string format.
CURRENT_TIMESTAMP
CURRENT_TIMESTAMP
Referencing this variable will evaluate as the current Unix Timestamp in milliseconds. For more information, go here.
DATE
DATE([date_string])
Formats and returns the date_string argument in UTC in ‘YYYY-MM-DDTHH:mm:ss.SSSZZ’ string format. If a date_string is not provided, the function will return the current UTC date/time value in the return format defined above. For more information, go here.
DATE_ADD
DATE_ADD(date, value, interval)
Adds the defined amount of time to the date provided in UTC and returns the resulting Unix Timestamp in milliseconds. Accepted interval values: Either string value (key or shorthand) can be passed as the interval argument. For more information, go here.
DATE_DIFF
DATEDIFF(date_1, date_2[, interval])
Returns the difference between the two date values passed based on the interval as a Number. If an interval is not provided, the function will return the difference value in milliseconds. For more information, go here.
DATE_FORMAT
DATE_FORMAT(date, format)
Formats and returns a date value in the String format provided. Find more details on accepted format values in the moment.js docs. For more information, go here.
DATE_SUB
DATE_SUB(date, format)
Subtracts the defined amount of time from the date provided in UTC and returns the resulting Unix Timestamp in milliseconds. Accepted date_sub interval values- Either string value (key or shorthand) can be passed as the interval argument. For more information, go here.
DAY
DAY(date)
Return the day of the month for the given date.
DAYOFWEEK
DAYOFWEEK(date)
Returns the numeric value of the weekday of the date given(“YYYY-MM-DD”).NOTE: 0=Sunday, 1=Monday, 2=Tuesday, 3=Wednesday, 4=Thursday, 5=Friday, and 6=Saturday.
EXTRACT
EXTRACT(date, date_part)
Extracts and returns the date_part requested as a String value. Accepted date_part values below show value returned for date = “2020-03-26T15:13:02.041+000” For more information, go here.
GETDATE
GETDATE()
Returns the current Unix Timestamp in milliseconds.
GET_SERVER_TIME
GET_SERVER_TIME()
Returns the current date/time value based on the server’s timezone in YYYY-MM-DDTHH:mm:ss.SSSZZ
String format.
OFFSET_UTC
OFFSET_UTC(date, offset)
Returns the UTC date time value with the offset provided included in the return String value formatted as YYYY-MM-DDTHH:mm:ss.SSSZZ
. The offset argument will be added as minutes unless the value is less than 16 and greater than -16, in which case it will be treated as hours.
NOW
NOW()
Returns the current Unix Timestamp in milliseconds.
HOUR
HOUR(datetime)
Returns the hour part of a given date in range of 0 to 838.
MINUTE
MINUTE(datetime)
Returns the minute part of a time/datetime in range of 0 to 59.
MONTH
MONTH(date)
Returns month part for a specified date in range of 1 to 12.
SECOND
SECOND(datetime)
Returns the seconds part of a time/datetime in range of 0 to 59.
YEAR
YEAR(date)
Returns the year part for a specified date.
IF
IF(condition, value_if_true, value_if_false)
Returns a value if the condition is true, or another value if the condition is false.
IIF
IIF(condition, value_if_true, value_if_false)
Returns a value if the condition is true, or another value if the condition is false.
IFNULL
IFNULL(expression, alt_value)
Returns a specified value if the expression is null.
NULLIF
NULLIF(expression_1, expression_2)
Returns null if expression_1 is equal to expression_2, if not equal, returns expression_1.
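A small hedged sketch combining these conditional functions; the dev.dog table and its weight_lbs and age columns are purely illustrative:
-- hypothetical table and columns, for illustration only
SELECT dog_name,
  IFNULL(weight_lbs, 0) AS weight_or_zero,
  IF(age >= 10, 'senior', 'adult') AS life_stage
FROM dev.dog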
ABS
ABS(expression)
Returns the absolute value of a given numeric expression.
CEIL
CEIL(number)
Returns the integer ceiling, the smallest integer value that is greater than or equal to a given number.
EXP
EXP(number)
Returns e to the power of a specified number.
FLOOR
FLOOR(number)
Returns the largest integer value that is smaller than, or equal to, a given number.
RANDOM
RANDOM(seed)
Returns a pseudo random number.
ROUND
ROUND(number,decimal_places)
Rounds a given number to a specified number of decimal places.
SQRT
SQRT(expression)
Returns the square root of an expression.
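For example, a hedged sketch applying a few of these numeric functions to a hypothetical weight_lbs column:
-- hypothetical table and column, for illustration only
SELECT dog_name, ROUND(weight_lbs, 1) AS weight_rounded, CEIL(weight_lbs) AS weight_ceil, FLOOR(weight_lbs) AS weight_floor FROM dev.dog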
CONCAT
CONCAT(string_1, string_2, ...., string_n)
Concatenates, or joins, two or more strings together, resulting in a single string.
CONCAT_WS
CONCAT_WS(separator, string_1, string_2, ...., string_n)
Concatenates, or joins, two or more strings together with a separator, resulting in a single string.
INSTR
INSTR(string_1, string_2)
Returns the first position, as an integer, of string_2 within string_1.
LEN
LEN(string)
Returns the length of a string.
LOWER
LOWER(string)
Converts a string to lower-case.
REGEXP
SELECT column_name FROM schema.table WHERE column_name REGEXP pattern
Searches column for matching string against a given regular expression pattern, provided as a string, and returns all matches. If no matches are found, it returns null.
REGEXP_LIKE
SELECT column_name FROM schema.table WHERE REGEXP_LIKE(column_name, pattern)
Searches column for matching string against a given regular expression pattern, provided as a string, and returns all matches. If no matches are found, it returns null.
REPLACE
REPLACE(string, old_string, new_string)
Replaces all instances of old_string within string with new_string.
SUBSTRING
SUBSTRING(string, string_position, length_of_substring)
Extracts a specified number of characters from a string.
TRIM
TRIM([character(s) FROM] string)
Removes leading and trailing spaces, or specified character(s), from a string.
UPPER
UPPER(string)
Converts a string to upper-case.
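A quick hedged sketch using several of these string functions together; the table and column names are illustrative:
-- hypothetical table and columns, for illustration only
SELECT CONCAT_WS(' - ', dog_name, owner_name) AS label,
  UPPER(dog_name) AS dog_name_upper,
  SUBSTRING(owner_name, 1, 3) AS owner_prefix
FROM dev.dog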
BETWEEN
SELECT column_name(s) FROM schema.table WHERE column_name BETWEEN value_1 AND value_2
Returns values (numbers, text, or dates) within a given range, inclusive.
IN
SELECT column_name(s) FROM schema.table WHERE column_name IN(value(s))
Used to specify multiple values in a WHERE clause.
LIKE
SELECT column_name(s) FROM schema.table WHERE column_name LIKE pattern
Searches for a specified pattern within a WHERE clause.
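For example, a hedged sketch combining LIKE and BETWEEN against a hypothetical dev.dog table:
-- hypothetical table and columns, for illustration only
SELECT dog_name, age FROM dev.dog WHERE dog_name LIKE 'P%' AND age BETWEEN 2 AND 8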
DISTINCT
SELECT DISTINCT column_name(s) FROM schema.table
Returns only unique values, eliminating duplicate records.
FROM
FROM schema.table
Used to list the schema(s), table(s), and any joins required for a SQL statement.
GROUP BY
SELECT column_name(s) FROM schema.table WHERE condition GROUP BY column_name(s) ORDER BY column_name(s)
Groups rows that have the same values into summary rows.
HAVING
SELECT column_name(s) FROM schema.table WHERE condition GROUP BY column_name(s) HAVING condition ORDER BY column_name(s)
Filters data based on a group or aggregate function.
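A hedged sketch of GROUP BY and HAVING together, counting dogs per owner on an illustrative schema:
-- hypothetical table and columns, for illustration only
SELECT owner_name, COUNT(id) AS dog_count
FROM dev.dog
GROUP BY owner_name
HAVING COUNT(id) > 1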
SELECT
SELECT column_name(s) FROM schema.table
Selects data from table.
WHERE
SELECT column_name(s) FROM schema.table WHERE condition
Extracts records based on a defined condition.
CROSS JOIN
SELECT column_name(s) FROM schema.table_1 CROSS JOIN schema.table_2
Returns a paired combination of each row from table_1 with each row from table_2. Note: CROSS JOIN can return very large result sets and is generally considered bad practice.
FULL OUTER
SELECT column_name(s) FROM schema.table_1 FULL OUTER JOIN schema.table_2 ON table_1.column_name = table_2.column_name WHERE condition
Returns all records when there is a match in either table_1 (left table) or table_2 (right table).
[INNER] JOIN
SELECT column_name(s) FROM schema.table_1 INNER JOIN schema.table_2 ON table_1.column_name = table_2.column_name
Return only matching records from table_1 (left table) and table_2 (right table). The INNER keyword is optional and does not affect the result.
LEFT [OUTER] JOIN
SELECT column_name(s) FROM schema.table_1 LEFT OUTER JOIN schema.table_2 ON table_1.column_name = table_2.column_name
Return all records from table_1 (left table) and matching data from table_2 (right table). The OUTER keyword is optional and does not affect the result.
RIGHT [OUTER] JOIN
SELECT column_name(s) FROM schema.table_1 RIGHT OUTER JOIN schema.table_2 ON table_1.column_name = table_2.column_name
Return all records from table_2 (right table) and matching data from table_1 (left table). The OUTER keyword is optional and does not affect the result.
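For instance, a hedged sketch of a LEFT OUTER JOIN; the dev.owner table and its columns are hypothetical and included only to show the join shape:
-- hypothetical tables and columns, for illustration only
SELECT d.dog_name, o.city
FROM dev.dog d
  LEFT OUTER JOIN dev.owner o ON d.owner_name = o.name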
IS NOT NULL
SELECT column_name(s) FROM schema.table WHERE column_name IS NOT NULL
Tests for non-null values.
IS NULL
SELECT column_name(s) FROM schema.table WHERE column_name IS NULL
Tests for null values.
DELETE
DELETE FROM schema.table WHERE condition
Deletes existing data from a table.
INSERT
INSERT INTO schema.table(column_name(s)) VALUES(value(s))
Inserts new records into a table.
UPDATE
UPDATE schema.table SET column_1 = value_1, column_2 = value_2, ...., WHERE condition
Alters existing records in a table.
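A short hedged sketch of these write statements in sequence; the ids and values are illustrative, and each statement would typically be submitted as its own SQL operation:
-- hypothetical table, columns, and values, for illustration only
INSERT INTO dev.dog (id, dog_name, owner_name) VALUES (22, 'Penny', 'Kyle')
UPDATE dev.dog SET weight_lbs = 38 WHERE id = 22
DELETE FROM dev.dog WHERE id = 22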
AVG
AVG(expression)
Returns the average of a given numeric expression.
COUNT
SELECT COUNT(column_name) FROM schema.table WHERE condition
Returns the number of records that match the given criteria. Nulls are not counted.
GROUP_CONCAT
GROUP_CONCAT(expression)
Returns a string with comma-separated, non-null values concatenated from a group. Returns null when there are no non-null values.
MAX
SELECT MAX(column_name) FROM schema.table WHERE condition
Returns largest value in a specified column.
MIN
SELECT MIN(column_name) FROM schema.table WHERE condition
Returns smallest value in a specified column.
SUM
SUM(column_name)
Returns the sum of the numeric values provided.
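For example, a hedged sketch combining several aggregates with GROUP BY on a hypothetical dev.dog table:
-- hypothetical table and columns, for illustration only
SELECT owner_name,
  AVG(weight_lbs) AS avg_weight,
  MIN(weight_lbs) AS min_weight,
  MAX(weight_lbs) AS max_weight,
  SUM(weight_lbs) AS total_weight
FROM dev.dog
GROUP BY owner_name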
ARRAY*
ARRAY(expression)
Returns a list of data as a field.
DISTINCT_ARRAY*
DISTINCT_ARRAY(expression)
When placed around a standard ARRAY() function, returns a distinct (deduplicated) results set.
CAST
CAST(expression AS datatype(length))
Converts a value to a specified datatype.
CONVERT
CONVERT(data_type(length), expression, style)
Converts a value from one datatype to a different, specified datatype.
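A minimal sketch of CAST; the accepted datatype names may vary, so treat this as illustrative rather than definitive:
-- hypothetical table and column, for illustration only
SELECT id, CAST(weight_lbs AS INT) AS weight_int FROM dev.dog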
geoJSON
Required. One or more features.
Returns a new polygon with the difference of the second polygon clipped from the first polygon.
geoDifference(polygon1, polygon2)
polygon1
Required. Polygon or MultiPolygon GeoJSON feature.
polygon2
Required. Polygon or MultiPolygon GeoJSON feature to remove from polygon1.
Return a GeoJSON Polygon that removes City Park (polygon2) from Colorado (polygon1).
Takes a GeoJSON and measures its length in the specified units (default is kilometers).
geoLength(geoJSON[, units])
geoJSON
Required. GeoJSON to measure.
units
Optional. Specified as a string. Options are ‘degrees’, ‘radians’, ‘miles’, or ‘kilometers’. Default is ‘kilometers’.
Calculate the length, in kilometers, of a manually passed GeoJSON linestring.
Find all data plus the calculated length in miles of the GeoJSON, restrict the response to only lengths less than 5 miles, and return the data in order of lengths smallest to largest.
geoDistance
Calculates the distance between two points in units (default is kilometers).
geoDistance(point1, point2[, units])
point1
Required. GeoJSON Point specifying the origin.
point2
Required. GeoJSON Point specifying the destination.
units
Optional. Specified as a string. Options are ‘degrees’, ‘radians’, ‘miles’, or ‘kilometers’. Default is ‘kilometers’.
Calculate the distance, in miles, between HarperDB’s headquarters and the Washington Monument.
Find all locations that are within 40 kilometers of a given point, return that distance in miles, and sort by distance in an ascending order.
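A hedged sketch of what such a query might look like, assuming a hypothetical dev.locations table whose geo_data column holds GeoJSON; the coordinates and the alias-based ORDER BY are illustrative only:
-- hypothetical table; the Point coordinates are illustrative
SELECT location_name,
  geoDistance(geo_data, '{"type":"Point","coordinates":[-77.035248,38.889475]}', 'miles') AS distance_miles
FROM dev.locations
WHERE geoDistance(geo_data, '{"type":"Point","coordinates":[-77.035248,38.889475]}', 'kilometers') < 40
ORDER BY distance_miles ASC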
Determines if point1 and point2 are within a specified distance from each other, default units are kilometers. Returns a Boolean.
geoNear(point1, point2, distance[, units])
point1
Required. GeoJSON Point specifying the origin.
point2
Required. GeoJSON Point specifying the destination.
distance
Required. The maximum distance in units as an integer or decimal.
units
Optional. Specified as a string. Options are ‘degrees’, ‘radians’, ‘miles’, or ‘kilometers’. Default is ‘kilometers’.
Return all locations within 50 miles of a given point.
Return all locations within 2 degrees of the earth of a given point. (Each degree lat/long is about 69 miles [111 kilometers]). Return all data and the distance in miles, sorted by ascending distance.
HarperDB automatically indexes all top level attributes in a row / object written to a table. However, any attribute that holds JSON does not have its nested attributes indexed. In order to make searching and/or transforming these JSON documents easy, HarperDB offers a special SQL function called SEARCH_JSON. The SEARCH_JSON function works in SELECT & WHERE clauses, allowing queries to perform powerful filtering on any element of your JSON by implementing the JSONata library in our SQL engine.
SEARCH_JSON(expression, attribute)
Executes the supplied string expression against data of the defined top level attribute for each row. The expression both filters and defines output from the JSON document.
Here are two records in the database:
Here is a simple query that gets any record with "Harper" found in the name.
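As a hedged sketch of what such a query might look like, assuming a hypothetical dev.dog table whose owner attribute holds a JSON object with a name property:
-- hypothetical table and attribute; $contains is a standard JSONata function
SELECT * FROM dev.dog WHERE SEARCH_JSON('$contains(name, "Harper")', owner) = true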
The purpose of this query is to give us every movie where at least two of our favorite actors from Marvel films have acted together. The results will return the movie title, the overview, the release date, and an object array of each actor’s name and their character name in the movie.
Both function calls evaluate the credits.cast attribute; this attribute is an object array of every cast member in a movie.
A sample of this data from the movie The Avengers looks like
Let’s break down the SEARCH_JSON function call in the SELECT:
The first argument passed to SEARCH_JSON is the expression to execute against the second argument, which is the cast attribute on the credits table. This expression will execute for every row. Looking into the expression, it starts with “$[…]”, which tells the expression to iterate all elements of the cast array.
Then the expression tells the function to only return entries where the name attribute matches any of the actors defined in the array:
So far, we’ve iterated the array and filtered out rows, but we also want the results formatted in a specific way, so we’ve chained an expression on our filter with: {“actor”: name, “character”: character}. This tells the function to create a specific object for each matching entry.
Sample Result
Just having the SEARCH_JSON function in our SELECT is powerful, but given our criteria it would still return every other movie that doesn’t have our matching actors. In order to filter out the movies we do not want, we also use SEARCH_JSON in the WHERE clause.
This function call in the WHERE clause is similar, but we don’t need to perform the same transformation as occurred in the SELECT:
As seen above, we execute the same name filter against the cast array; the primary difference is that we wrap the filtered results in $count(…). This returns a count of the matching results, which we then compare against our SQL condition of >= 2.
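Putting the pieces together, the overall query might be shaped roughly like the following sketch; the schema, table, column, and actor names are illustrative stand-ins rather than the exact sample data set, and the cast attribute may need escaping in your environment since CAST is also a SQL keyword:
-- illustrative sketch only; names do not match the sample data set exactly
SELECT m.title, m.overview, m.release_date,
  SEARCH_JSON('$[name in ["Actor One", "Actor Two", "Actor Three"]].{"actor": name, "character": character}', c.cast) AS our_actors
FROM dev.movie m
  INNER JOIN dev.credits c ON m.id = c.movie_id
WHERE SEARCH_JSON('$count($[name in ["Actor One", "Actor Two", "Actor Three"]])', c.cast) >= 2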
To see further SEARCH_JSON examples in action view our Postman Collection that provides a sample schema & data with query examples: https://api.harperdb.io/
To learn more about how to build expressions check out the JSONata documentation: http://docs.jsonata.org/overview
Converts a series of coordinates into a GeoJSON of the specified type.
geoConvert(coordinates, geo_type[, properties])
coordinates
Required. One or more coordinates
geo_type
Required. GeoJSON geometry type. Options are ‘point’, ‘lineString’, ‘multiLineString’, ‘multiPoint’, ‘multiPolygon’, and ‘polygon’
properties
Optional. Escaped JSON array with properties to be added to the GeoJSON output.
Convert a given coordinate into a GeoJSON point with specified properties.
HarperDB geospatial features require data to be stored in a single column using the GeoJSON standard, a standard commonly used in geospatial technologies. Geospatial functions are available to be used in SQL statements.
If you are new to GeoJSON you should check out the full specification here: http://geojson.org/. There are a few important things to point out before getting started.
All GeoJSON coordinates are stored in [longitude, latitude]
format.
Coordinates or GeoJSON geometries must be passed as strings when written directly in a SQL statement.
Note: if you are using Postman for your testing, due to limitations in the Postman client you will need to escape quotes in your strings and your SQL will need to be passed on a single line.
In the examples contained in the left-hand navigation, schema and table names may change, but all GeoJSON data will be stored in a column named geo_data.
Determines if two GeoJSON features are the same type and have identical X,Y coordinate values. For more information see https://developers.arcgis.com/documentation/spatial-references/. Returns a Boolean.
geoEqual(geo1, geo2)
Find HarperDB Headquarters within all locations within the database.
The HarperDB command line interface (CLI) is used to administer HarperDB.
To install HarperDB with CLI prompts, run the following command:
Alternatively, HarperDB installations can be automated with environment variables or command line arguments. Note, when used in conjunction, command line arguments will override environment variables.
To start HarperDB after it is installed, run the following command:
To stop HarperDB once it is running, run the following command:
To restart HarperDB once it is running, run the following command:
The following commands are used to start, restart, or stop one or more HarperDB services without restarting the full application:
The following services are managed via the above commands:
HarperDB
Custom Functions
IPC
Clustering
To check the version of HarperDB that is installed run the following command:
To display all available HarperDB CLI commands along with a brief description run:
To display the status of the HarperDB process, the clustering hub and leaf processes, the clustering network and replication statuses, run:
HarperDB uses a transactional commit process that ensures that data on disk is always transactionally consistent with storage. This means that HarperDB maintains safety of database integrity in the event of a crash. It also means that you can use any standard volume snapshot tool to make a backup of a HarperDB database. Database files are stored in the hdb/schemas directory (organized by schema directories). As long as the snapshot is an atomic snapshot of these database files, the data can be copied/moved back into the schemas directory to restore a previous backup (with HarperDB shut down), and database integrity will be preserved. Note that simply copying an in-use database file (using cp, for example) is not a snapshot; this would progressively read data from the database at different points in time, which yields an unreliable copy that likely will not be usable. Standard copying is only reliable for a database file that is not in use.
HarperDB maintains a log of events that take place throughout operation. Log messages can be used for diagnostics purposes as well as monitoring.
All logs (except for the install log) are stored in the main log file in the hdb directory: <ROOTPATH>/log/hdb.log. The install log is located in the HarperDB application directory, most likely located in your npm directory: npm/harperdb/logs.
Each log message has several key components for consistent reporting of events. A log message has a format of:
For example, a typical log entry looks like:
The components of a log entry are:
timestamp - This is the date/time stamp when the event occurred
level - This is an associated log level that gives a rough guide to the importance and urgency of the message. The available log levels, in order from least urgent (and most verbose), are: trace, debug, info, warn, error, fatal, and notify.
thread/id - This reports the name of the thread and the thread id that the event was reported on. Note that NATS logs are recorded by their process name and there is no thread id for them since they are a separate process. Key threads are:
main - This is the thread that is responsible for managing all other threads and routes incoming requests to the other threads
http - These are the worker threads that handle the primary workload of incoming HTTP requests to the operations API and custom functions.
Clustering* - These are threads and processes that handle replication.
job - These are job threads that have been started to handle operations that are executed in a separate job thread.
tags - Logging from a custom function will include a "custom-function" tag in the log entry. Most logs will not have any additional tags.
message - This is the main message that was reported.
We try to keep logging to a minimum by default; to do this, the default log level is error. If you require more information from the logs, setting the log level to a more verbose value (such as info, debug, or trace) will provide that.
The log level can be changed by modifying logging.level in the config file harperdb-config.yaml.
HarperDB logs can optionally be streamed to standard streams. Logging to standard streams (stdout/stderr) is primarily used for container logging drivers. For more traditional installations, we recommend logging to a file. Logging to both standard streams and to a file can be enabled simultaneously. To log to standard streams effectively, make sure to run harperdb directly rather than starting it as a separate process (don't use harperdb start), and logging.stdStreams must be set to true. Note, logging to standard streams only will disable clustering catchup.
To access specific logs you may query the HarperDB API. Logs can be queried using the read_log
operation. read_log
returns outputs from the log based on the provided search criteria.
HarperDB is configured through a file called harperdb-config.yaml
located in the operations API root directory (by default this is a directory named hdb
located in the home directory of the current user).
All available configuration will be populated by default in the config file on install, regardless of whether it is used.
The configuration elements in harperdb-config.yaml
use camelcase: operationsApi
.
To change a configuration value edit the harperdb-config.yaml
file and save any changes. HarperDB must be restarted for changes to take effect.
Alternately, configuration can be changed via environment and/or command line variables or via the API. To access lower level elements, use underscores to append parent/child elements (when used this way elements are case insensitive):
clustering
The clustering
section configures the clustering engine, this is used to replicate data between instances of HarperDB.
Clustering offers a lot of different configurations, however in a majority of cases the only options you will need to pay attention to are:
clustering.enabled
Enable the clustering processes.
clustering.hubServer.cluster.network.port
The port other nodes will connect to. This port must be accessible from other cluster nodes.
clustering.hubServer.cluster.network.routes
The connections to other instances.
clustering.nodeName
The name of your node, must be unique within the cluster.
clustering.user
The name of the user credentials used for Inter-node authentication.
enabled
- Type: boolean; Default: false
Enable clustering.
Note: If you enable clustering but do not create and add a cluster user, you will get a validation error. See the user description below on how to add a cluster user.
clustering.hubServer.cluster
Clustering’s hubServer
facilitates the HarperDB mesh network and discovery service.
name
- Type: string, Default: harperdb
The name of your cluster. This name needs to be consistent for all other nodes intended to be meshed in the same network.
port
- Type: integer, Default: 9932
The port the hub server uses to accept cluster connections
routes
- Type: array, Default: null
An object array that represents the host and port this server will cluster to. Each object must have two properties port
and host
. Multiple entries can be added to create network resiliency in the event one server is unavailable. Routes can be added, updated and removed either by directly editing the harperdb-config.yaml
file or by using the cluster_set_routes
or cluster_delete_routes
API endpoints.
host
- Type: string
The host of the remote instance you are creating the connection with.
port
- Type: integer
The port of the remote instance you are creating the connection with. This is likely going to be the clustering.hubServer.cluster.network.port
on the remote instance.
clustering.hubServer.leafNodes
port
- Type: integer; Default: 9931
The port the hub server uses to accept leaf server connections.
clustering.hubServer.network
port
- Type: integer; Default: 9930
Use this port to connect a client to the hub server, for example using the NATs SDK to interact with the server.
clustering.leafServer
Manages streams, streams are ‘message stores’ that store table transactions.
port
- Type: integer; Default: 9940
Use this port to connect a client to the leaf server, for example using the NATs SDK to interact with the server.
routes
- Type: array; Default: null
An object array that represents the host and port the leaf node will directly connect with. Each object must have two properties port
and host
. Unlike the hub server, the leaf server will establish connections to all listed hosts. Routes can be added, updated and removed either by directly editing the harperdb-config.yaml
file or by using the cluster_set_routes
or cluster_delete_routes
API endpoints.
host
- Type: string
The host of the remote instance you are creating the connection with.
port
- Type: integer
The port of the remote instance you are creating the connection with. This is likely going to be the clustering.hubServer.cluster.network.port
on the remote instance.
clustering.leafServer.streams
maxAge
- Type: integer; Default: null
The maximum age of any messages in the stream, expressed in seconds.
maxBytes
- Type: integer; Default: null
The maximum size of the stream in bytes. Oldest messages are removed if the stream exceeds this size.
maxMsgs
- Type: integer; Default: null
How many messages may be in a stream. Oldest messages are removed if the stream exceeds this number.
path
- Type: string; Default: <ROOTPATH>/clustering/leaf
The directory where all the streams are kept.
logLevel
- Type: string; Default: error
Control the verbosity of clustering logs.
There exists a log level hierarchy in order as trace
, debug
, info
, warn
, and error
. When the level is set to trace
logs will be created for all possible levels. Whereas if the level is set to warn
, the only entries logged will be warn
and error
. The default value is error
.
nodeName
- Type: string; Default: null
The name of this node in your HarperDB cluster topology. This must be a value unique from the rest of the cluster node names.
Note: If you want to change the node name make sure there are no subscriptions in place before doing so. After the name has been changed a full restart is required.
tls
Transport Layer Security default values are automatically generated on install.
certificate
- Type: string; Default: <ROOTPATH>/keys/certificate.pem
Path to the certificate file.
certificateAuthority
- Type: string; Default: <ROOTPATH>/keys/ca.pem
Path to the certificate authority file.
privateKey
- Type: string; Default: <ROOTPATH>/keys/privateKey.pem
Path to the private key file.
insecure
- Type: boolean; Default: true
When true, will skip certificate verification. For use only with self-signed certs.
republishMessages
- Type: boolean; Default: true
When true, all transactions that are received from other nodes are republished to this node's stream. When subscriptions are not fully connected between all nodes, this ensures that messages are routed to all nodes through intermediate nodes. This also ensures that all writes, whether local or remote, are written to the NATS transaction log. However, there is additional overhead with republishing, and setting this to false can provide better data replication performance. When false, you need to ensure all subscriptions are fully connected between every node and every other node, and be aware that the NATS transaction log will only consist of local writes.
verify
- Type: boolean; Default: true
When true, hub server will verify client certificate using the CA certificate.
user
- Type: string; Default: null
The username given to the cluster_user
. All instances in a cluster must use the same clustering user credentials (matching username and password).
Inter-node authentication takes place via a special HarperDB user role type called cluster_user
.
The user can be created either through the API using an add_user
request with the role set to cluster_user
, or on install using environment variables CLUSTERING_USER=cluster_person
CLUSTERING_PASSWORD=pass123!
or CLI variables harperdb --CLUSTERING_USER cluster_person
--CLUSTERING_PASSWORD
pass123!
customFunctions
The customFunctions
section configures HarperDB Custom Functions.
enabled
- Type: boolean; Default: true
Enable the Custom Function server or not.
customFunctions.network
cors
- Type: boolean; Default: true
Enable Cross Origin Resource Sharing, which allows requests across a domain.
corsAccessList
- Type: array; Default: null
An array of allowable domains with CORS
headersTimeout
- Type: integer; Default: 60,000 milliseconds (1 minute)
Limits the amount of time the parser will wait to receive the complete HTTP headers.
https
- Type: boolean; Default: false
Enables HTTPS on the Custom Functions API. This requires a valid certificate and key. If false
, Custom Functions will run using standard HTTP.
keepAliveTimeout
- Type: integer; Default: 5,000 milliseconds (5 seconds)
Sets the number of milliseconds of inactivity the server needs to wait for additional incoming data after it has finished processing the last response.
port
- Type: integer; Default: 9926
The port used to access the Custom Functions server.
timeout
- Type: integer; Default: 120,000 milliseconds (2 minutes)
The length of time in milliseconds after which a request will timeout.
nodeEnv
- Type: string; Default: production
Allows you to specify the node environment in which the application will run.
production - native node logging is kept to a minimum; more caching to optimize performance. This is the default value.
development - more native node logging; less caching.
root
- Type: string; Default: <ROOTPATH>/custom_functions
The path to the folder containing Custom Function files.
tls
Transport Layer Security
certificate
- Type: string; Default: <ROOTPATH>/keys/certificate.pem
Path to the certificate file.
certificateAuthority
- Type: string; Default: <ROOTPATH>/keys/ca.pem
Path to the certificate authority file.
privateKey
- Type: string; Default: <ROOTPATH>/keys/privateKey.pem
Path to the private key file.
ipc
The ipc
section configures the HarperDB Inter-Process Communication interface.
port
- Type: integer; Default: 9383
The port the IPC server runs on.
localStudio
The localStudio
section configures the local HarperDB Studio, a simplified GUI for HarperDB hosted on the server. A more comprehensive GUI is hosted by HarperDB at https://studio.harperdb.io. Note, all database traffic from either localStudio
or HarperDB Studio is made directly from your browser to the instance.
enabled
- Type: boolean; Default: false
Enables the local studio or not.
logging
The logging
section configures HarperDB logging across all HarperDB functionality. HarperDB leverages pm2 for logging. Each process group gets their own log file which is located in logging.root
.
auditLog
- Type: boolean; Default: false
Enables table transaction logging.
To access the audit logs, use the API operation read_audit_log
. It will provide a history of the data, including original records and changes made, in a specified table.
file
- Type: boolean; Default: true
Defines whether or not to log to a file.
level
- Type: string; Default: error
Control the verbosity of logs.
There exists a log level hierarchy in order as trace
, debug
, info
, warn
, error
, fatal
, and notify
. When the level is set to trace
logs will be created for all possible levels. Whereas if the level is set to fatal
, the only entries logged will be fatal
and notify
. The default value is error
.
root
- Type: string; Default: <ROOTPATH>/log
The path where the log files will be written.
rotation
Rotation provides the ability for a user to systematically rotate and archive the hdb.log file. To enable rotation, interval and/or maxSize must be set.
Note: interval and maxSize are approximates only. It is possible that the log file will exceed these values slightly before it is rotated.
enabled
- Type: boolean; Default: false
Enables logging rotation.
compress
- Type: boolean; Default: false
Enables compression via gzip when logs are rotated.
interval
- Type: string; Default: null
The time that should elapse between rotations. Acceptable units are D(ays), H(ours) or M(inutes).
maxSize
- Type: string; Default: null
The maximum size the log file can reach before it is rotated. Must use units M(egabyte), G(igabyte), or K(ilobyte).
path
- Type: string; Default: <ROOTPATH>/log
Where to store the rotated log file. File naming convention is HDB-YYYY-MM-DDT-HH-MM-SSSZ.log
.
stdStreams
- Type: boolean; Default: false
Log HarperDB logs to the standard output and error streams. The operationsApi.foreground
flag must be enabled in order to receive the stream.
operationsApi
The operationsApi
section configures the HarperDB Operations API.
authentication
operationTokenTimeout
- Type: string; Default: 1d
Defines the length of time an operation token will be valid until it expires. Example values: https://github.com/vercel/ms.
refreshTokenTimeout
- Type: string; Default: 1d
Defines the length of time a refresh token will be valid until it expires. Example values: https://github.com/vercel/ms.
foreground
- Type: boolean; Default: false
Determines whether or not HarperDB runs in the foreground.
network
cors
- Type: boolean; Default: true
Enable Cross Origin Resource Sharing, which allows requests across a domain.
corsAccessList
- Type: array; Default: null
An array of allowable domains with CORS
headersTimeout
- Type: integer; Default: 60,000 milliseconds (1 minute)
Limits the amount of time the parser will wait to receive the complete HTTP headers.
https
- Type: boolean; Default: false
Enable HTTPS on the HarperDB operations endpoint. This requires a valid certificate and key. If false
, HarperDB will run using standard HTTP.
keepAliveTimeout
- Type: integer; Default: 5,000 milliseconds (5 seconds)
Sets the number of milliseconds of inactivity the server needs to wait for additional incoming data after it has finished processing the last response.
port
- Type: integer; Default: 9925
The port the HarperDB operations API interface will listen on.
timeout
- Type: integer; Default: 120,000 milliseconds (2 minutes)
The length of time in milliseconds after which a request will timeout.
nodeEnv
- Type: string; Default: production
Allows you to specify the node environment in which the application will run.
production - native node logging is kept to a minimum; more caching to optimize performance. This is the default value.
development - more native node logging; less caching.
tls
This configures the Transport Layer Security for HTTPS support.
certificate
- Type: string; Default: <ROOTPATH>/keys/certificate.pem
Path to the certificate file.
certificateAuthority
- Type: string; Default: <ROOTPATH>/keys/ca.pem
Path to the certificate authority file.
privateKey
- Type: string; Default: <ROOTPATH>/keys/privateKey.pem
Path to the private key file.
http
threads
- Type: number; Default: One less than the number of logical cores/ processors
The threads
option specifies the number of threads that will be used to service the HTTP requests for the operations API and custom functions. Generally, this should be close to the number of CPU logical cores/processors to ensure the CPU is fully utilized (a little less because HarperDB does have other threads at work), assuming HarperDB is the main service on a server.
sessionAffinity
- Type: string; Default: null
HarperDB is a multi-threaded server designed to scale to utilize many CPU cores with high concurrency. Session affinity can help improve the efficiency and fairness of thread utilization by routing multiple requests from the same client to the same thread. This provides a fairer method of request handling by keeping a single user contained to a single thread, can improve caching locality (multiple requests from a single user are more likely to access the same data), and can provide the ability to share information in-memory in user sessions. Enabling session affinity will cause subsequent requests from the same client to be routed to the same thread.
To enable sessionAffinity
, you need to specify how clients will be identified from the incoming requests. If you are using HarperDB to directly serve HTTP requests from users from different remote addresses, you can use a setting of ip
. However, if you are using HarperDB behind a proxy server or application server, all the remote ip addresses will be the same and HarperDB will effectively only run on a single thread. Alternately, you can specify a header to use for identification. If you are using basic authentication, you could use the "Authorization" header to route requests to threads by the user's credentials. If you have another header that uniquely identifies users/clients, you can use that as the value of sessionAffinity. But be careful to ensure that the value does provide sufficient uniqueness and that requests are effectively distributed to all the threads, fully utilizing all your CPU cores.
rootPath
rootPath
- Type: string; Default: home directory of the current user
The HarperDB database and applications/API/interface are decoupled from each other. The rootPath
directory specifies where the HarperDB application persists data, config, logs, and Custom Functions.
storage
writeAsync
- Type: boolean; Default: false
The writeAsync
option turns off disk flushing/syncing, allowing for faster write operation throughput. However, this does not provide storage integrity guarantees, and if a server crashes, it is possible that there may be data loss requiring restore from another backup/another node.
caching
- Type: boolean; Default: true
The caching
option enables in-memory caching of records, providing faster access to frequently accessed objects. This can incur some extra overhead for situations where reads are extremely random and don't benefit from caching.
compression
- Type: boolean; Default: false
The compression
option enables compression of records in the database. This can be helpful for very large databases in reducing storage requirements and potentially allowing more data to be cached. This uses the very fast LZ4 compression algorithm, but this still incurs extra costs for compressing and decompressing.
noReadAhead
- Type: boolean; Default: true
The noReadAhead
option advises the operating system to not read ahead when reading from the database. This provides better memory utilization, except in situations where large records are used or frequent range queries are used.
prefetchWrites
- Type: boolean; Default: true
The prefetchWrites
option loads data prior to write transactions. This should be enabled for databases that are larger than memory (although it can be faster to disable this for smaller databases).
path
- Type: string; Default: <rootPath>/schema
The path
configuration sets where all database files should reside.
Note: This configuration applies to all database files, which includes system tables that are used internally by HarperDB. For this reason, if you wish to use a non-default path value you must move any existing schemas into your path location. Existing schemas are likely to include the system schema, which can be found at <rootPath>/schema/system.
schemas
The schemas
section is an optional configuration that can be used to define where database files should reside down to the table level.
This configuration should be set before the schema and table have been created.
The configuration will not create the directories in the path; that must be done by the user.
To define where a schema and all its tables should reside use the name of your schema and the path
parameter.
To define where specific tables within a schema should reside use the name of your schema, the tables
parameter, the name of your table and the path
parameter.
This same pattern can be used to define where the audit log database files should reside. To do this use the auditPath
parameter.
Setting the schemas section through the command line, environment variables or API
When using command line variables, environment variables, or the API to configure the schemas section, a slightly different convention from the regular one should be used. To add one or more configurations, use a JSON object array.
Using command line variables:
Using environment variables:
Using the API:
HarperDB clustering utilizes two servers, named Hub and Leaf. The Hub server is responsible for establishing the mesh network that connects instances of HarperDB and the Leaf server is responsible for managing the message stores (streams) that replicate and store messages between instances. Due to the verbosity of these servers there is a separate log level configuration for them. To adjust their log verbosity set clustering.logLevel
in the config file harperdb-config.yaml
. Valid log levels from least verbose are error
, warn
, info
, debug
and trace
.
Log rotation allows for managing log files, such as compressing rotated log files, archiving old log files, determining when to rotate, and the like. This allows for organized storage and efficient use of disk space. For more information see “logging” in our configuration documentation.
geo1
Required. GeoJSON geometry or feature.
geo2
Required. GeoJSON geometry or feature.
geo1
Required. GeoJSON geometry or feature.
geo2
Required. GeoJSON geometry or feature.
geo1
Required. Polygon or MultiPolygon GeoJSON feature.
geo2
Required. Polygon or MultiPolygon GeoJSON feature tested to be contained by geo1.
HarperDB offers two options for logging transactions executed against a table. The options are similar but utilize different storage layers.
The first option is read_transaction_log
. The transaction log is built upon clustering streams. Clustering streams are per-table message stores that enable data to be propagated across a cluster. HarperDB leverages streams for use with the transaction log. When clustering is enabled all transactions that occur against a table are pushed to its stream, and thus make up the transaction log.
If you would like to use the transaction log, but have not set up clustering yet, please see "How to Cluster".
The read_transaction_log
operation returns a prescribed set of records, based on given parameters. The example below will give a maximum of 2 records within the timestamps provided.
See example response below.
See example request above.
The delete_transaction_logs_before
operation will delete transaction log data according to the given parameters. The example below will delete records older than the timestamp provided.
Note: Streams are used for catchup if a node goes down. If you delete messages from a stream there is a chance catchup won't work.
Read on for read_audit_log
, the second option, for logging transactions executed against a table.
This section contains technical details and reference materials for HarperDB.
This document describes best practices for upgrading self-hosted HarperDB instances. HarperDB can be upgraded using a combination of npm and built-in HarperDB upgrade scripts. Whenever upgrading your HarperDB installation it is recommended you make a backup of your data first. Note: This document applies to self-hosted HarperDB instances only. All HarperDB Cloud instances will be upgraded by the HarperDB Cloud team.
Upgrading HarperDB is a two-step process. First the latest version of HarperDB must be downloaded from npm, then the HarperDB upgrade scripts will be utilized to ensure the newest features are available on the system.
Install the latest version of HarperDB using npm install -g harperdb
.
Note -g
should only be used if you installed HarperDB globally (which is recommended).
Run harperdb
to initiate the upgrade process.
HarperDB will then prompt you for all appropriate inputs and then run the upgrade directives.
Node Version Manager (nvm) is an easy way to install, remove, and switch between different versions of Node.js as required by various applications. More information, including directions on installing nvm can be found here: https://nvm.sh/.
HarperDB supports Node.js versions 14.0.0 and higher; however, please check our NPM page for our recommended Node.js version. To install a different version of Node.js with nvm, run the command:
To switch to a version of Node run:
To see the current running version of Node run:
With a handful of different versions of Node.js installed, run nvm with the ls
argument to list out all installed versions:
When upgrading HarperDB, we recommend also upgrading your Node version. Here we assume you're running on an older version of Node; the execution may look like this:
Switch to the older version of Node that HarperDB is running on (if it is not the current version):
Make sure HarperDB is not running:
Uninstall HarperDB. Note, this step is not required, but will clean up old artifacts of HarperDB. We recommend removing all other HarperDB installations to ensure the most recent version is always running.
Switch to the newer version of Node:
Install HarperDB globally
Run the upgrade script
Start HarperDB
The audit log uses a standard HarperDB table to track transactions. For each table a user creates, a corresponding table will be created to track transactions against that table.
Audit log is disabled by default. To use the audit log, set logging.auditLog
to true in the config file, harperdb-config.yaml
. Then restart HarperDB for those changes to take place.
The read_audit_log
operation is flexible, enabling users to query with many parameters. All operations search on a single table. Filter options include timestamps, usernames, and table hash values. Additional examples can be found in the HarperDB API documentation.
Search by Timestamp
There are three outcomes using timestamp.
"search_values": []
- All records returned for specified table
"search_values": [1660585740558]
- All records after provided timestamp
"search_values": [1660585740558, 1760585759710]
- Records "from" and "to" provided timestamp
Search by Username
The above example will return all records whose username
is "admin."
Search by Primary Key
The above example will return all records whose primary key (hash_value
) is 318.
The example that follows provides records of operations performed on a table. One thing of note is that the read_audit_log operation gives you the original_records.
Just like with transaction logs, you can clean up your audit logs with the delete_audit_logs_before
operation. It will delete audit log data according to the given parameters. The example below will delete records older than the timestamp provided.
All HarperDB API responses include headers that are important for interoperability and debugging purposes. The following headers are returned with all HarperDB API responses:
server-timing
db;dur=7.165
This reports the duration of the operation, in milliseconds. This follows the standard for Server-Timing and can be consumed by network monitoring tools.
hdb-response-time
7.165
This is the legacy header for reporting response time. It is deprecated and will be removed in 4.2.
content-type
application/json
This reports the MIME type of the returned content, which is negotiated based on the requested content type in the Accept header.
HarperDB supports several different content types (or MIME types) for both HTTP request bodies (describing operations) as well as for serializing content into HTTP response bodies. HarperDB follows HTTP standards for specifying both request body content types and acceptable response body content types. Any of these content types can be used with any of the standard HarperDB operations.
For request body content, the content type should be specified with the Content-Type
header. For example with JSON, use Content-Type: application/json
and for CBOR, include Content-Type: application/cbor
. To request that the response body be encoded with a specific content type, use the Accept
header. If you want the response to be in JSON, use Accept: application/json
. If you want the response to be in CBOR, use Accept: application/cbor
.
The following content types are supported:
JSON is the most widely used content type, and is relatively readable and easy to work with. However, JSON does not support all the data types that are supported by HarperDB, and can't be used to natively encode data types like binary data or explicit Maps/Sets. Also, JSON is not as efficient as binary formats. When using JSON, compression is recommended (this also follows standard HTTP protocol with the Accept-Encoding
header) to improve network transfer performance (although there is server performance overhead). JSON is a good choice for web development when standard JSON types are sufficient, when combined with compression, and when debuggability/observability is important.
CBOR is a highly efficient binary format, and is a recommended format for most production use cases with HarperDB. CBOR supports the full range of HarperDB data types, including binary data, typed dates, and explicit Maps/Sets. CBOR is very performant and space efficient even without compression. Compression will still yield better network transfer size/performance, but compressed CBOR is generally not any smaller than compressed JSON. CBOR also natively supports streaming for optimal performance (using indefinite length arrays). The CBOR format has excellent standardization and HarperDB's CBOR provides an excellent balance of performance and size efficiency.
MessagePack is another efficient binary format like CBOR, with support for all HarperDB data types. MessagePack generally has wider adoption than CBOR and can be useful in systems that don't have CBOR support (or good support). However, MessagePack does not have native support for streaming of arrays of data (for query results), and so query results are returned as a (concatenated) sequence of MessagePack objects/maps. MessagePack decoders used with HarperDB's MessagePack must be prepared to decode a direct sequence of MessagePack values to properly read responses.
Comma-separated values is an easy to use and understand format that can be readily imported into spreadsheets or used for data processing. CSV lacks hierarchical structure and most data types, and shouldn't be used for frequent/production use, but when you need it, it is available.
HarperDB is built to make data ingestion simple. A primary driver of that is the Dynamic Schema. The purpose of this document is to provide a detailed explanation of the dynamic schema specifically related to schema definition and data ingestion.
The dynamic schema provides the structure of schema and table namespaces while simultaneously providing the flexibility of a data-defined schema. Individual attributes are reflexively created as data is ingested, meaning the table will adapt to the structure of data ingested. HarperDB tracks the metadata around schemas, tables, and attributes allowing for describe table, describe schema, and describe all operations.
HarperDB schemas are analogous to a namespace that groups tables together. A schema is required to create a table.
HarperDB tables group records together with a common data pattern. To create a table users must provide a table name and a primary key.
Table Name: Used to identify the table.
Primary Key: This is a required attribute that serves as the unique identifier for a record and is also known as the hash_attribute
in HarperDB.
Primary Key
The primary key (also referred to as the hash_attribute
) is used to uniquely identify records. Uniqueness is enforced on the primary key; inserts with the same primary key will be rejected. If a primary key is not provided on insert, a GUID will be automatically generated and returned to the user. The HarperDB Storage Algorithm utilizes this value for indexing.
Standard Attributes
Additional attributes are reflexively added via insert and update operations (in both SQL and NoSQL) when new attributes are included in the data structure provided to HarperDB. As a result, schemas are additive, meaning new attributes are created in the underlying storage algorithm as additional data structures are provided. HarperDB offers create_attribute
and drop_attribute
operations for users who prefer to manually define their data model independent of data ingestion. When new attributes are added to tables with existing data the value of that new attribute will be assumed null
for all existing records.
Audit Attributes
HarperDB automatically creates two audit attributes used on each record.
__createdtime__
: The time the record was created in Unix Epoch with milliseconds format.
__updatedtime__
: The time the record was updated in Unix Epoch with milliseconds format.
To better understand the behavior let’s take a look at an example. This example utilizes HarperDB API operations.
Create a Schema
Create a Table
Notice the schema name, table name, and hash attribute name are the only required parameters.
At this point the table does not have structure beyond what we provided, so the table looks like this:
dev.dog
Insert Record
To define attributes we do not need to do anything beyond sending them in with an insert operation.
With a single record inserted and new attributes defined, our table now looks like this:
dev.dog
Indexes have been automatically created for dog_name
and owner_name
attributes.
Insert Additional Record
If we continue inserting records with the same data schema no schema updates are required. One record will omit the hash attribute from the insert to demonstrate GUID generation.
In this case, there is no change to the schema. Our table now looks like this:
dev.dog
Update Existing Record
In this case, we will update a record with a new attribute not previously defined on the table.
Now we have a new attribute called weight_lbs
. Our table now looks like this:
dev.dog
Query Table with SQL
Now if we query for all records where weight_lbs
is null
we expect to get back two records.
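A sketch of that query, using the table and column from the example above:
SELECT * FROM dev.dog WHERE weight_lbs IS NULL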
This results in the expected two records being returned.
The HarperDB storage algorithm is fundamental to the HarperDB core functionality, enabling the Dynamic Schema and all other user-facing functionality. HarperDB is built on top of Lightning Memory-Mapped Database (LMDB), a key-value store offering industry leading performance and functionality, which allows for our storage algorithm to store data in tables as rows/objects. This document will provide additional details on how data is stored within HarperDB.
The HarperDB storage algorithm was designed to abstract the data storage from any individual query language. HarperDB currently supports both SQL and NoSQL on top of this storage algorithm, with the ability to add additional query languages in the future. This means data can be inserted via NoSQL and read via SQL while hitting the same underlying data storage.
Utilizing Multi-Version Concurrency Control (MVCC) through LMDB, HarperDB offers ACID compliance independently on each node. Readers and writers operate independently of each other, meaning readers don’t block writers and writers don’t block readers. Each HarperDB table has a single writer process, avoiding deadlocks and assuring that writes are executed in the order in which they were received. HarperDB tables can have multiple reader processes operating at the same time for consistent, high scale reads.
All top level attributes are automatically indexed immediately upon ingestion. The HarperDB Dynamic Schema reflexively creates both the attribute and its index as new schema metadata comes in. Indexes are agnostic of datatype, honoring the following order: booleans, numbers ordered naturally, strings ordered lexically. Within the LMDB implementation, table records are grouped together into a single LMDB environment file, where each attribute index is a sub-database (dbi) inside said environment file. An example of the indexing scheme can be seen below.
HarperDB inherits both functional and performance benefits by implementing LMDB as the underlying key-value store. Data is memory-mapped, which enables quick data access without data duplication. All writers are fully serialized, making writes deadlock-free. LMDB is built to maximize operating system features and functionality, fully exploiting buffer cache and built to run in CPU cache. To learn more about LMDB, visit their documentation.
HarperDB Jobs are asynchronous tasks performed by the Operations API.
Jobs use an asynchronous methodology to account for the potential of a long-running operation. For example, exporting millions of records to S3 could take some time, so that job is started and its id is provided to check on the status.
The job status can be COMPLETE or IN_PROGRESS.
Example job operations include:
Example Response from a Job Operation
Whenever one of these operations is initiated, an asynchronous job is created and the request contains the id of that job which can be used to check on its status.
Get Job Request
Get Job Response
Search Jobs Request
Search Jobs Response
This document outlines limitations of HarperDB.
Case Sensitivity
HarperDB schema metadata (schema names, table names, and attribute/column names) are case sensitive, meaning schemas, tables, and attributes can differ only by the case of their characters.
Restrictions on Schema Metadata Names
HarperDB schema metadata (schema names, table names, and attribute names) cannot contain the following UTF-8 characters:
Additionally, they cannot contain the first 31 non-printing characters. Spaces are allowed, but not recommended as best practice. The regular expression used to verify a name is valid is:
Attribute Maximum
HarperDB limits the number of attributes to 10,000 per table.
HarperDB supports a rich set of data types for use in records in databases. Various data types can be used from both direct JavaScript interfaces in Custom Functions and the HTTP operations APIs. Using JSON for communication naturally limits the data types to those available in JSON (HarperDB supports all of JSON's data types), but JavaScript code and alternate data formats facilitate the use of additional data types. As of v4.1, HarperDB supports MessagePack and CBOR, which allow for all of HarperDB's supported data types. This includes:
true or false.
Strings, or text, are a sequence of any unicode characters and are internally encoded with UTF-8.
Numbers can be stored as signed integers up to 64-bit or floating point with 64-bit floating point precision, and numbers are automatically stored using the most optimal type. JSON is parsed by JS, so the maximum safe (precise) integer is 9007199254740991 (larger numbers can be stored, but aren’t guaranteed integer precision). Custom Functions may use BigInt numbers to store/access larger 64-bit integers, but integers beyond 64-bit can’t be stored with integer precision (will be stored as standard double-precision numbers).
Objects, or maps, that hold a set of named properties can be stored in HarperDB. When provided as JSON objects or JavaScript objects, all property keys are stored as strings. The order of properties is also preserved in HarperDB’s storage. Duplicate property keys are not allowed (they are dropped in parsing any incoming data).
Arrays hold an ordered sequence of values and can be stored in HarperDB. There is no support for sparse arrays, although you can use objects to store data with numbers (converted to strings) as properties.
A null value can be stored in HarperDB property values as well.
Dates can be stored as a specific data type. This is not supported in JSON, but is supported by MessagePack and CBOR. Custom Functions can also store and use Dates using JavaScript Date instances.
Binary data can be stored in property values as well. JSON doesn’t have any support for encoding binary data, but MessagePack and CBOR support binary data in data structures, and this will be preserved in HarperDB. Custom Functions can also store binary data by using NodeJS’s Buffer or Uint8Array instances to hold the binary data.
Explicit instances of JavaScript Maps and Sets can be stored and preserved in HarperDB as well. This can’t be represented with JSON, but can be with CBOR.
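To make these extended types concrete, here is a sketch of a JavaScript record, as might be constructed inside a Custom Function, using types that have no JSON representation; the attribute names and values are invented for illustration.

```javascript
// Hypothetical record using non-JSON data types that HarperDB can preserve
// when written from JavaScript or transmitted via CBOR/MessagePack.
const record = {
  id: 'sensor-42',
  lastSeen: new Date(),                      // Date instance
  payload: Buffer.from([0x01, 0x02, 0x03]),  // binary data via Node.js Buffer
  tags: new Set(['outdoor', 'beta']),        // explicit Set (representable in CBOR)
  readings: new Map([['temp', 21.5]]),       // explicit Map (representable in CBOR)
  total: 9007199254740993n                   // BigInt for integers beyond Number's safe range
};
```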
HarperDB support is available with all paid instances. Support tickets are managed via our support portal. Once a ticket is submitted, the HarperDB team will triage your request and get back to you as soon as possible. Additionally, you can join our community, where HarperDB team members and others are frequently active to help answer questions.
1 Gigabyte Limit to Request Bodies
HarperDB supports request bodies up to 1 GB in size. This limit does not impact the CSV file import functions that read from the local file system or from an external URL. If you need to bulk import large record sets, we recommend using the CSV import functions, especially if you run up against the 1 GB body size limit. Documentation for these functions can be found here.
Do not install as sudo
HarperDB should be installed using a specific user dedicated to HarperDB. This allows you to restrict the permissions that user has and who has access to the HarperDB file system. Because HarperDB files are written directly to the file system, a dedicated HarperDB user gives you granular control over who has access to those files.
Error: Must execute as User
You may have gotten an error like Error: Must execute as <<username>>.
This means that you installed HarperDB as <<user>>. Because HarperDB stores files directly on the file system, we only allow the HarperDB executable to be run by a single user. This prevents permissions issues on files. For example, if you installed as user_a but later wanted to run as user_b, user_b may not have access to the database files HarperDB needs. This also keeps HarperDB more secure, as it allows you to lock files down to a specific user and prevents other users from accessing your files.
What operating system should I use to run HarperDB?
HarperDB runs on all major operating systems: Linux, Windows, and macOS. However, running HarperDB on Windows and macOS is intended only for development and evaluation purposes; Linux is strongly recommended for production use.
How are HarperDB’s SQL and NoSQL capabilities different from other solutions?
Many solutions offer NoSQL capability and separate processing for SQL such as in-memory transformation or multi-model support. HarperDB’s unique mechanism for storing each data attribute individually allows for performing NoSQL and SQL operations in real-time on the stored data set.
How does HarperDB ensure high availability and consistency?
HarperDB's clustering and replication capabilities allow high availability and fault-tolerance; if a server goes down, traffic can be quickly routed to other HarperDB servers that can service requests. HarperDB's replication uses a consistent resolution strategy (last-write-wins by logical timestamp), to ensure eventual consistency. HarperDB offers auditing capabilities that can be enabled to preserve a record of all changes so that mistakes or even malicious data changes are recorded and can be reverted.
Is HarperDB ACID-compliant?
HarperDB operations are atomic, consistent, and isolated per instance. This means that any query will provide an isolated, consistent snapshot view of the database (based on when the query started). Update and insert operations are also performed atomically; any reads and writes are performed within an atomic, isolated transaction with a serializable isolation level, and will roll back if they cannot be fully completed successfully. Data is immediately flushed to disk after a write to ensure eventual durability. ACID compliance is not guaranteed across instances in a cluster; rather, eventual consistency propagates changes with last-write-wins (by logical timestamp) resolution.
How Does HarperDB Secure My Data?
HarperDB has role- and user-based security, allowing you to simply and easily ensure that the right people have access to your data. We also implement a number of authentication mechanisms to ensure the transactions submitted are trusted and secure.
Is HarperDB row or column oriented?
HarperDB can be considered column oriented; however, the exploded data model creates an interface that is free from either of these orientations. A user can search and update with columnar benefits while retaining the ACID guarantees typically associated with row-oriented stores.
What do you mean when you say HarperDB is single model?
HarperDB takes every attribute of a database table object and creates a key:value for both the key and its corresponding value. For example, the attribute eye color will be represented by a key “eye-color” and the corresponding value “green” will be represented by a key with the value “green”. We use LMDB’s lightning-fast key:value store to underpin all these interrelated keys and values, meaning that every “column” is automatically indexed, and you get huge performance in a tiny package.
Are Primary Keys Case-Sensitive?
When using HarperDB, primary keys are case-sensitive. This can cause confusion for developers. For example, if you have a user table, it might make sense to use user.email as the primary key. This can cause problems, as Harper@harperdb.io and harper@harperdb.io would be seen as two different records. We recommend enforcing case on keys within your app to avoid this issue.
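One common approach is to normalize case in application code before writing, for example (a sketch; the table and attribute names are illustrative):

```javascript
// Normalize email case before using it as a primary key so that
// 'Harper@harperdb.io' and 'harper@harperdb.io' resolve to the same record.
const email = 'Harper@harperdb.io';
const user = { email: email.toLowerCase(), name: 'Harper' };
// ...insert `user` into the user table via your usual insert operation
```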
How Do I Move My HarperDB Data Directory?
HarperDB’s data directory can be moved from one location to another by simply updating the rootPath in the config file (where the data lives, which you specified during installation) to the new location.
Next, edit HarperDB’s hdb_boot_properties.file to point HarperDB to the new location by updating the settings_path variable. Substitute the NEW_HDB_ROOT variable in the snippets below with the path to your new data directory, making sure you escape any slashes.
On MacOS/OSX
On Linux
Finally, edit the config file in the root folder you just moved: update the rootPath parameter to reflect the new location of your data directory.
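For example, the relevant line in the YAML config might end up looking like this (the path is illustrative):

```yaml
rootPath: /new/path/to/hdb
```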
To check on a job's status, use the get_job operation.
To find jobs (if the id is not known), use the search_jobs_by_start_date operation.
HarperDB 4.1 introduces the ability to use worker threads for concurrently handling HTTP requests. Previously this was handled by processes. This shift provides important benefits in terms of better control of traffic delegation with support for optimized load tracking and session affinity, better debuggability, and reduced memory footprint.
This means debugging will be much easier for custom functions. If you install/run HarperDB locally, most modern IDEs like WebStorm and VSCode support worker thread debugging, so you can start HarperDB in your IDE, and set breakpoints in your custom functions and debug them.
The associated routing functionality now includes session affinity support. This can be used to consistently route users to the same thread, which can improve caching locality, performance, and fairness. This can be enabled with the http.sessionAffinity option in your configuration.
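A sketch of the configuration, assuming affinity is keyed on the client IP address (check the configuration documentation for the supported values):

```yaml
http:
  sessionAffinity: ip
```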
HarperDB 4.1's NoSQL query handling has been revamped to consistently use iterators, which provide an extremely memory-efficient mechanism for streaming query results directly to the network as they are computed. This results in faster Time to First Byte (TTFB), since only the first record/value in a query needs to be computed before data can start to be sent, and less memory usage during querying, since the entire query result does not need to be stored in memory. These iterators are also available in query results for Custom Functions and provide a means for Custom Function code to iteratively access data from the database without loading entire results. This should be a completely transparent upgrade; all HTTP APIs function the same, with the one exception that Custom Functions need to be aware that they can't access query results by [index] (they should use array methods or for...of loops to handle query results).
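For example, a Custom Function can consume results iteratively rather than by position (a sketch; `results` stands in for the iterable returned by a query):

```javascript
// Iterate without materializing the whole result set in memory.
function logIds(results) {
  for (const record of results) {
    console.log(record.id);
  }
}

// Alternatively, materialize the results when random access is needed
// (this loads the full result set into memory).
function toArray(results) {
  return [...results];
}
```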
4.1 includes configuration options for specifying the location of database storage files. This allows you to specifically locate database directories and files on different volumes for better flexibility and utilization of disks and storage volumes. See the storage configuration and schemas configuration for information on how to configure these locations.
Logging has been revamped and condensed into one hdb.log file. See logging for more information.
A new operation called cluster_network was added; this operation will ping the cluster and return a list of enmeshed nodes.
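A minimal request body for the new operation might look like this (a sketch; any optional parameters are omitted):

```json
{
  "operation": "cluster_network"
}
```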
Custom Functions will no longer automatically load static file routes; instead, the @fastify/static plugin will need to be registered with the Custom Function server, as shown in the sketch below. See Host A Static Web UI.
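A sketch of registering the plugin in a Custom Functions routes file; the folder layout, prefix, and file names here are assumptions, not a prescribed structure.

```javascript
// routes/index.js (illustrative): serve a static web UI now that
// Custom Functions no longer auto-load static file routes.
const path = require('path');

module.exports = async (server, { hdbCore, logger }) => {
  server.register(require('@fastify/static'), {
    root: path.join(__dirname, '..', 'static'), // folder holding your static assets
    prefix: '/static/',                         // URL prefix the files are served under
  });
};
```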
Updates to S3 import and export mean that these operations now require the bucket region in the request. Also, if referencing a nested object, it should be done in the key parameter. See examples here.
Due to the AWS SDK v2 reaching end-of-life support, we have updated to v3. This has caused some breaking changes in our import_from_s3 and export_to_s3 operations (an illustrative request follows the list below):
A new attribute, region, will need to be supplied.
The bucket attribute can no longer have trailing slashes. Slashes will now need to be in the key.
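Putting those changes together, an import_from_s3 request now looks roughly like the following (a sketch; the bucket, key, schema, table, and credentials are placeholders, and the exact payload should be confirmed against the S3 import/export examples):

```json
{
  "operation": "import_from_s3",
  "action": "insert",
  "schema": "dev",
  "table": "dog",
  "s3": {
    "aws_access_key_id": "YOUR_ACCESS_KEY",
    "aws_secret_access_key": "YOUR_SECRET_KEY",
    "bucket": "hdb-bucket",
    "key": "exports/dogs.csv",
    "region": "us-east-1"
  }
}
```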
Starting HarperDB without any command (just harperdb) now runs HarperDB like a standard process, in the foreground. This means you can use standard Unix tooling for interacting with the process, and it is conducive to running HarperDB with systemd or any other process management tool. If you wish to have HarperDB launch itself in a separate background process (and immediately terminate the shell process), you can do so by running harperdb start, as shown below.
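For quick reference, the two invocations are:

```bash
harperdb        # runs HarperDB in the foreground as a standard process
harperdb start  # launches HarperDB in a separate background process
```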
Internal Tickets completed:
CORE-609 - Ensure that attribute names are always added to global schema as Strings
CORE-1549 - Remove fastify-static code from Custom Functions server which auto serves content from "static" folder
CORE-1655 - Iterator based queries
CORE-1764 - Fix issue where describe_all operation returns an empty object for non super-users if schema(s) do not yet have table(s)
CORE-1854 - Switch to using worker threads instead of processes for handling concurrency
CORE-1877 - Extend the csv_url_load operation to allow for additional headers to be passed to the remote server when the csv is being downloaded
CORE-1893 - Add last updated timestamp to describe operations
CORE-1896 - Fix issue where Select * from system.hdb_info returns wrong HDB version number after Instance Upgrade
CORE-1904 - Fix issue when executing GEOJSON query in SQL
CORE-1905 - Add HarperDB YAML configuration setting which defines the storage location of NATS streams
CORE-1906 - Add HarperDB YAML configuration setting defining the storage location of tables.
CORE-1655 - Streaming binary format serialization
CORE-1943 - Add configuration option to set mount point for audit tables
CORE-1921 - Update NATS transaction lifecycle to handle message deduplication in work queue streams.
CORE-1963 - Update logging for better readability, reduced duplication, and request context information.
CORE-1968 - In server\nats\natsIngestService.js remove the js_msg.working(); line to improve performance.
CORE-1976 - Fix error when calling describe_table operation with no schema or table defined in payload.
CORE-1983 - Fix issue where create_attribute operation does not validate request for required attributes
CORE-2015 - Remove PM2 logs that get logged in console when starting HDB
CORE-2048 - systemd script for 4.1
CORE-2052 - Include thread information in system_information for visibility of threads
CORE-2061 - Add a better error msg when clustering is enabled without a cluster user set
CORE-2068 - Create new log rotate logic since pm2 log-rotate no longer used
CORE-2072 - Update to Node 18.15.0
CORE-2090 - Upgrade Testing from v4.0.x and v3.x to v4.1.
CORE-2091 - Run the performance tests
CORE-2092 - Allow for automatic patch version updates of certain packages
CORE-2109 - Add verify option to clustering TLS configuration
CORE-2111 - Update AWS SDK to v3
01/27/2023
Bug Fixes
CORE-2009 Fixed bug where add node was not being called when upgrading clustering.
03/09/2023
Bug Fixes
Fixed a data serialization error that occurs when a large number of different record structures are persisted in a single table.
02/15/2023
Bug Fixes
CORE-2029 Improved the upgrade process for handling existing user TLS certificates and correctly configuring TLS settings. Added a prompt to upgrade to determine if new certificates should be created or existing certificates should be kept/used.
Fix the way NATS connections are honored in a local environment.
Do not define the certificate authority path to NATS if it is not defined in the HarperDB config.
01/26/2023
Bug Fixes
CORE-2007 Add update nodes 4.0.0 launch script to build script to fix clustering upgrade.
Did you know our release names are dedicated to employee pups? For our fourth release, we have Tucker.
G’day, I’m Tucker. My dad is David Cockerill, a software engineer here at HarperDB. I am a 3-year-old Labrador Husky mix. I love to protect my dad from all the squirrels and rabbits we have in our yard. I have very ticklish feet and love belly rubs!
01/24/2023
Bug Fixes
CORE-2003 Fix bug where the thread config would default to zero if the machine had only one core.
Update to lmdb 2.7.3 and msgpackr 1.7.0
11/2/2022
Networking & Data Replication (Clustering)
The HarperDB clustering internals have been rewritten and the underlying technology for Clustering has been completely replaced with NATS, an enterprise-grade connective technology responsible for addressing, discovery, and exchange of messages that drive the common patterns in distributed systems.
CORE-1464, CORE-1470: Remove SocketCluster dependencies and all code related to them.
CORE-1465, CORE-1485, CORE-1537, CORE-1538, CORE-1558, CORE-1583, CORE-1665, CORE-1710, CORE-1801, CORE-1865: Add nats-server code as a dependency; on install of HarperDB, download nats-server if possible, else fall back to building from source code.
CORE-1593, CORE-1761: Add nats.js as a project dependency.
CORE-1466: Build NATS configs on harperdb run based on HarperDB YAML configuration.
CORE-1467, CORE-1508: Launch and manage NATS servers with PM2.
CORE-1468, CORE-1507: Create a process which reads the work queue stream and processes transactions.
CORE-1481, CORE-1529, CORE-1698, CORE-1502, CORE-1696: On upgrade to 4.0, update pre-existing clustering configurations, create table transaction streams, create the work queue stream, update the hdb_nodes table, create the clustering folder structure, and rebuild self-signed certs.
CORE-1494, CORE-1521, CORE-1755: Build out internals to interface with NATS.
CORE-1504: Update existing hooks to save transactions to work with NATS.
CORE-1514, CORE-1515, CORE-1516, CORE-1527, CORE-1532: Update add_node, update_node, and remove_node operations to no longer need host and port in the payload. These operations now manage dynamic sourcing of table-level transaction streams between nodes and work queues.
CORE-1522: Create NATSReplyService process which handles receiving NATS-based requests from remote instances and sending back appropriate responses.
CORE-1471, CORE-1568, CORE-1563, CORE-1534, CORE-1569: Update cluster_status operation.
CORE-1611: Update pre-existing transaction log operations to be audit log operations.
CORE-1541, CORE-1612, CORE-1613: Create transaction log operations which interface with streams.
CORE-1668: Update NATS serialization / deserialization to use MessagePack.
CORE-1673: Add system_info param to hdb_nodes table and update on add_node and cluster_status.
CORE-1477, CORE-1493, CORE-1557, CORE-1596, CORE-1577: Both a full HarperDB restart & just clustering restart call the NATS server with a reload directive to maintain full uptime while servers refresh.
CORE-1474: HarperDB install adds clustering folder structure.
CORE-1530: Post drop_table, HarperDB purges the related transaction stream.
CORE-1567: Set NATS config to always use TLS.
CORE-1543: Removed the transact_to_cluster attribute from the bulk load operations. Now bulk loads always replicate.
CORE-1533, CORE-1556, CORE-1561, CORE-1562, CORE-1564: New operation configure_cluster; this operation enables bulk publishing and subscription of multiple tables to multiple instances of HarperDB.
CORE-1535: Create work queue stream on install of HarperDB. This stream receives transactions from remote instances of HarperDB which are then ingested in order.
CORE-1551: Create transaction streams on the remote node if they do not exist when performing add_node or update_node.
CORE-1594, CORE-1605, CORE-1749, CORE-1767, CORE-1770: Optimize the work queue stream and its consumer to be more performant and validate exactly-once delivery.
CORE-1621, CORE-1692, CORE-1570, CORE-1693: NATS stream names are MD5 hashed to avoid characters that HarperDB allows, but NATS may not.
CORE-1762: Add a new optional attribute to add_node and update_node named opt_start_time. This attribute sets a starting time from which to synchronize transactions.
CORE-1785: Optimizations and bug fixes regarding sourcing data from remote instances of HarperDB.
CORE-1588: Created new operation set_cluster_routes to enable setting routes for instances of HarperDB to mesh together.
CORE-1589: Created new operation get_cluster_routes to allow for retrieval of routes used to connect the instance of HarperDB to the mesh.
CORE-1590: Created new operation delete_cluster_routes to allow for removal of routes used to connect the instance of HarperDB to the mesh.
CORE-1667: Fix old environment variable CLUSTERING_PORT not mapping to new hub server port.
CORE-1609: Allow remove_node to be called when the other node cannot be reached.
CORE-1815: Add transaction lock to add_node and update_node to avoid a concurrent NATS source update bug.
CORE-1848: Update stream configs if the node name has been changed in the YAML configuration.
CORE-1873: Update add_node and update_node so that they auto-create the schema/table on both the local and remote node respectively.
Data Storage
We have made improvements to how we store, index, and retrieve data.
CORE-1619: Enabled new concurrent flushing technology for improved write performance.
CORE-1701: Optimize search performance for search_by_conditions when executing multiple AND conditions.
CORE-1652: Encode the values of secondary indices more efficiently for faster access.
CORE-1670: Store updated timestamp in lmdb.js' version property.
CORE-1651: Enabled multiple value indexing of array values which allows for the ability to search on specific elements in an array more efficiently.
CORE-1649, CORE-1659: Large text values (larger than 255 bytes) are no longer stored in a separate blob index. Now they are segmented and delimited in the same index to increase search performance.
Complex objects and object arrays are no longer stored in a separate index to preserve storage and increase write throughput.
CORE-1650, CORE-1724, CORE-1738: Improved internals around interpreting attribute values.
CORE-1657: Deferred property decoding allows large objects to be stored, but individual attributes can be accessed (like with get_attributes) without incurring the cost of decoding the entire object.
CORE-1658: Enable in-memory caching of records for even faster access to frequently accessed data.
CORE-1693: Wrap updates in async transactions to ensure ACID-compliant updates.
CORE-1653: Upgrade to 4.0 rebuilds tables to reflect changes made to index improvements.
CORE-1753: Removed old node-lmdb dependency.
CORE-1787: Freeze objects returned from queries.
CORE-1821: Read the WRITE_ASYNC setting, which enables LMDB nosync.
Logging
HarperDB has increased logging specificity by breaking out logs based on the component doing the logging. There are specific log files each for HarperDB Core, Custom Functions, Hub Server, Leaf Server, and more.
CORE-1497: Remove pino and winston dependencies.
CORE-1426: All logging is output via stdout and stderr; our default logging is then picked up by PM2, which handles writing out to file.
CORE-1431: Improved read_log operation validation.
CORE-1433, CORE-1463: Added log rotation.
CORE-1553, CORE-1555, CORE-1552, CORE-1554, CORE-1704: Performance gain by only serializing objects and arrays if the log is for the level defined in configuration.
CORE-1436: Upgrade to 4.0 updates internals for logging changes.
CORE-1428, CORE-1440, CORE-1442, CORE-1434, CORE-1435, CORE-1439, CORE-1482, CORE-1751, CORE-1752: Bug fixes, performance improvements and improved unit tests.
CORE-1691: Convert non-PM2 managed log file writes to use the Node.js fs.appendFileSync function.
Configuration
HarperDB has updated its configuration from a properties file to YAML.
CORE-1448, CORE-1449, CORE-1519, CORE-1587: Upgrade automatically converts the pre-existing settings file to YAML.
CORE-1445, CORE-1534, CORE-1444, CORE-1858: Build out new logic to create, update, and interpret the YAML configuration file.
Installer has updated prompts to reflect YAML settings.
CORE-1447: Create an alias for the configure_cluster operation as set_configuration.
CORE-1461, CORE-1462, CORE-1483: Unit test improvements.
CORE-1492: Improvements to get_configuration and set_configuration operations.
CORE-1503: Modify HarperDB configuration for more granular certificate definition.
CORE-1591: Update routes IP param to host and to the leaf config in harperdb.conf.
CORE-1519: Fix issue where switching between old and new versions of HarperDB caused a "config parameter is undefined" error on npm install.
Broad NodeJS and Platform Support
CORE-1624: HarperDB can now run on multiple versions of NodeJS, from v14 to v19. We primarily test on v18, so that is the preferred version.
Windows 10 and 11
CORE-1088: HarperDB now runs natively on Windows 10 and 11 without the need to run in a container or be installed in WSL. Windows is only intended for evaluation and development purposes, not for production workloads.
Extra Changes and Bug Fixes
CORE-1520: Refactor installer to remove all waterfall code and update to use Promises.
CORE-1573: Stop the PM2 daemon and any logging processes when stopping hdb.
CORE-1586: When HarperDB is running in the foreground, stop any additional logging processes from being spawned.
CORE-1626: Update docker file to accommodate new harperdb.conf file.
CORE-1592, CORE-1526, CORE-1660, CORE-1646, CORE-1640, CORE-1689, CORE-1711, CORE-1601, CORE-1726, CORE-1728, CORE-1736, CORE-1735, CORE-1745, CORE-1729, CORE-1748, CORE-1644, CORE-1750, CORE-1757, CORE-1727, CORE-1740, CORE-1730, CORE-1777, CORE-1778, CORE-1782, CORE-1775, CORE-1771, CORE-1774, CORE-1759, CORE-1772, CORE-1861, CORE-1862, CORE-1863, CORE-1870, CORE-1869: Changes for CI/CD pipeline and integration tests.
CORE-1661: Fixed issue where old boot properties file caused an error when attempting to install 4.0.0.
CORE-1697, CORE-1814, CORE-1855: Upgrade fastify dependency to new major version 4.
CORE-1629: Jobs are now running as processes managed by the PM2 daemon.
CORE-1733: Update LICENSE to reflect our EULA on our site.
CORE-1606: Enable Custom Functions by default.
CORE-1714: Include pre-built binaries for most common platforms (darwin-arm64, darwin-x64, linux-arm64, linux-x64, win32-x64).
CORE-1628: Fix issue where setting license through environment variable not working.
CORE-1602, CORE-1760, CORE-1838, CORE-1839, CORE-1847, CORE-1773: HarperDB Docker container improvements.
CORE-1706: Add support for encoding HTTP responses with MessagePack.
CORE-1709: Improve the way lmdb.js dependencies are installed.
CORE-1758: Remove/update unnecessary HTTP headers.
CORE-1756: On npm install and harperdb install, change the node version check from an error to a warning if the installed Node.js version does not match our preferred version.
CORE-1791: Optimizations to authenticated user caching.
CORE-1794: Update README to discuss Windows support & Node.js versions
CORE-1837: Fix issue where Custom Function directory was not being created on install.
CORE-1742: Add more validation to audit log - check schema/table exists and log is enabled.
CORE-1768: Fix issue where, when running in the foreground, the HarperDB process was not stopping on harperdb stop.
CORE-1864: Fix to semver checks on upgrade.
CORE-1850: Fix issue where a cluster_user type role could not be altered.
01/20/2023
Bug Fixes
CORE-1992 Local studio was not loading because the path got mangled in the build.
CORE-2001 Fixed deploy_custom_function_project, which broke after the Node update.