1 of 100

4.1 Developer Documentation

HarperDB's documentation covers installation, getting started, APIs, security, and much more. Browse the topics at left, or choose one of the commonly used documentation sections below.

Install HarperDB Locally
Getting Started
HarperDB Operations API
HarperDB Studio
HarperDB Cloud
Developer Project Examples
Support

Install HarperDB

This documentation contains information for installing HarperDB locally. Note that if you’d like to get up and running quickly, you can try a . HarperDB is a cross-platform database; we recommend Linux for production use, but HarperDB can run on Windows and Mac as well, for development purposes. Installation is usually very simple and just takes a few steps, but there are a few different options documented here.

HarperDB runs on Node.js, so if you do not have it installed, you need to do that first (if you have installed, you can skip to installing HarperDB, itself). Node.js can be downloaded and installed from . For Linux and Mac, we recommend installing and managing Node versions with , but generally NVM can be installed with:

And then logout and login, and then install Node.js using nvm. We recommend using LTS, but support all currently maintained Node versions (which is currently version 14 and newer, and make sure to always uses latest minor/patch for the major version):

Install and Start HarperDB

Then you can install HarperDB with NPM and start it:

HarperDB will automatically start after installation.

If you are setting up a production server on Linux, .

With Docker

If you would like to run HarperDB in Docker, install on your Mac or Windows computer. Otherwise, install the on your Linux server.

Once Docker Desktop or Docker Engine is installed, visit our for information and examples on how to run a HarperDB container.

Offline Install

If you need to install HarperDB on a device that doesn't have an Internet connection, you can choose your version and download the npm package and install it directly (you’ll still need Node.js and NPM):

Once you’ve downloaded the .tgz file, run the following command from the directory where you’ve placed it:

For more information visit the guide.

Installation on Less Common Platforms

HarperDB comes with binaries for standard AMD64/x64 or ARM64 CPU architectures on Linux, Windows (x64 only), and Mac (including Apple Silicon). However, if you are installing on a less common platform (Alpine, for example), you will need to ensure that you have build tools installed for the installation process to compile the binaries (this is handled automatically), including:

: version 1.19.1
GCC
Make
Python v3.7, v3.8, v3.9, or v3.10

On Linux

If you wish to install locally or already have a configured server, see the basic

The following is a recommended way to configure Linux and install HarperDB. These instructions should work reasonably well for any public cloud or on-premises Linux instance.

These instructions assume that the following has already been completed:

Linux is installed
Basic networking is configured
A non-root user account dedicated to HarperDB with sudo privileges exists
An additional volume for storing HarperDB files is attached to the Linux instance
Traffic to ports 9925 (HarperDB Operations API,) 9926 (HarperDB Custom Functions,) and 9932 (HarperDB Clustering) is permitted

For this example, we will use an AWS Ubuntu Server 22.04 LTS m5.large EC2 Instance with an additional General Purpose SSD EBS volume and the default “ubuntu” user account.

(Optional) LVM Configuration

Logical Volume Manager (LVM) can be used to stripe multiple disks together to form a single logical volume. If striping disks together is not a requirement, skip these steps.

Find disk that already has a partition

Create array of free disks

Get quantity of free disks

Construct pvcreate command

Initialize disks for use by LVM

Create volume group

Create logical volume

Configure Data Volume

Run lsblk and note the device name of the additional volume

Create an ext4 filesystem on the volume (The below commands assume the device name is nvme1n1. If you used LVM to create logical volume, replace /dev/nvme1n1 with /dev/hdb_vg/hdb_lv)

Mount the file system and set the correct permissions for the directory

Create a fstab entry to mount the filesystem on boot

Configure Linux and Install Prerequisites

If a swap file or partition does not already exist, create and enable a 2GB swap file

Increase the open file limits for the ubuntu user

Install Node Version Manager (nvm)

Load nvm (or logout and then login)

Install Node.js using nvm ()

Install and Start HarperDB

Here is an example of installing HarperDB with minimal configuration.

Here is an example of installing HarperDB with commonly used additional configuration.

HarperDB will automatically start after installation. If you wish HarperDB to start when the OS boots, you have two options

You can set up a crontab:

Or you can create a systemd script at /etc/systemd/system/harperdb.service

Pasting the following contents into the file:

And then running the following:

For more information visit the and the .

Getting Started

Getting started with HarperDB is easy and fast.

The quickest way to get up and running with HarperDB is with , our database-as-a-service offering, which this guide will utilize.

Set Up a HarperDB Instance

Before you can start using HarperDB you need to set up an instance. Note, if you would prefer to install HarperDB locally, .

HarperDB Cloud instance provisioning typically takes 5-15 minutes. You will receive an email notification when your instance is ready.

Using the HarperDB Studio

Now that you have a HarperDB instance, you can do pretty much everything you’d like through the Studio. This section links to appropriate articles to get you started interacting with your data.

(Here’s a sample CSV of the HarperDB team’s dogs)

Using the HarperDB API

Complete HarperDB API documentation is available at api.harperdb.io. The HarperDB Studio features an example code builder that generates API calls in the programming language of your choice. For example purposes, a basic cURL command is shown below to create a schema called dev.

Breaking it down, there are only a few requirements for interacting with HarperDB:

Using the HTTP POST method.
Providing the URL of the HarperDB instance.
Providing the Authorization header (more on using Basic authentication).
Providing the Content-Type header.
Providing a JSON body with the desired operation and any additional operation properties (shown in the --data-raw parameter). This is the only parameter that needs to be changed to execute alternative operations on HarperDB.

Video Tutorials

. HarperDB and the HarperDB Studio are constantly changing, as such, there may be small discrepancies in UI/UX.

HarperDB Studio

HarperDB Studio is the web-based GUI for HarperDB. Studio enables you to administer, navigate, and monitor all of your HarperDB instances in a simple, user friendly interface without any knowledge of the underlying HarperDB API. It’s free to sign up, get started today!

How does Studio Work?

While HarperDB Studio is web based and hosted by us, all database interactions are performed on the HarperDB instance the studio is connected to. The HarperDB Studio loads in your browser, at which point you login to your HarperDB instances. Credentials are stored in your browser cache and are not transmitted back to HarperDB. All database interactions are made via the HarperDB Operations API directly from your browser to your instance.

What type of instances can I manage?

HarperDB Studio enables users to manage both HarperDB Cloud instances and privately hosted instances all from a single UI. All HarperDB instances feature identical behavior whether they are hosted by us or by you.

Create an Account

Start at the .

Provide the following information:
- First Name
- Last Name
- Email Address
- Subdomain
  Part of the URL that will be used to identify your HarperDB Cloud Instances. For example, with subdomain “demo” and instance name “c1” the instance URL would be: https://c1-demo.harperdbcloud.com.
- Coupon Code (optional)
Review the Privacy Policy and Terms of Service.
Click the sign up for free button.
You will be taken to a new screen to add an account password. Enter your password. Passwords must be a minimum of 8 characters with at least 1 lower case character, 1 upper case character, 1 number, and 1 special character.
Click the add account password button.

You will receive a Studio welcome email confirming your registration.

Note: Your email address will be used as your username and cannot be changed.

Log In & Password Reset

Log In to Your HarperDB Studio Account

To log into your existing HarperDB Studio account:

Navigate to the HarperDB Studio.
Enter your email address.
Enter your password.
Click sign in.

Reset a Forgotten Password

To reset a forgotten password:

Navigate to the HarperDB Studio password reset page.
Enter your email address.
Click send password reset email.
If the account exists, you will receive an email with a temporary password.
Navigate back to the HarperDB Studio login page.
Enter your email address.
Enter your temporary password.
Click sign in.
You will be taken to a new screen to reset your account password. Enter your new password. Passwords must be a minimum of 8 characters with at least 1 lower case character, 1 upper case character, 1 number, and 1 special character.
Click the add account password button.

Change Your Password

If you are already logged into the Studio, you can change your password though the user interface.

Navigate to the HarperDB Studio profile page.
In the password section, enter:
- Current password.
- New password.
- New password again (for verification).
Click the Update Password button.

Resources (Marketplace, Drivers, Tutorials, & Example Code)

HarperDB Studio resources are available regardless of whether or not you are logged in.

HarperDB Marketplace

The is a collection of SDKs and connectors that enable developers to expand upon HarperDB for quick and easy solution development. Extensions are built and supported by the HarperDB Community. Each extension is hosted on the appropriate package manager or host.

To download a Marketplace extension:

Navigate to the page.
Identity the extension you would like to use.
Either click the link to the package.
Follow the extension’s instructions to proceed.

You can submit your rating for each extension by clicking on the stars.

HarperDB Drivers

HarperDB offers standard drivers to connect real-time HarperDB data with BI, analytics, reporting and data visualization technologies. Drivers are built and maintained by .

To download a driver:

Navigate to the page.
Identity the driver you would like to use.
Click the download link.
For additional instructions, visit the support link on the driver card.

Video Tutorials

HarperDB offers video tutorials available in the Studio on the page as well as our . The HarperDB Studio is changing all the time, as a result these, the videos may not include all of the current Studio features.

Example Code

The page offers example code for many different programming languages. These samples will include a placeholder for your authorization token. Full code examples with the authorization token prepopulated are available within individual instance pages.

Organizations

HarperDB Studio organizations provide the ability to group HarperDB Cloud Instances. Organization behavior is as follows:

Billing occurs at the organization level to a single credit card.
Organizations retain their own unique HarperDB Cloud subdomain.
Cloud instances reside within an organization.
Studio users can be invited to organizations to share instances.

An organization is automatically created for you when you sign up for HarperDB Studio. If you only have one organization, the Studio will automatically bring you to your organization’s page.

List Organizations

A summary view of all organizations your user belongs to can be viewed on the page. You can navigate to this page at any time by clicking the all organizations link at the top of the HarperDB Studio.

Create a New Organization

A new organization can be created as follows:

Navigate to the page.
Click the Create a New Organization card.
Fill out new organization details
- Enter Organization Name This is used for descriptive purposes only.
- Enter Organization Subdomain Part of the URL that will be used to identify your HarperDB Cloud Instances. For example, with subdomain “demo” and instance name “c1” the instance URL would be: https://c1-demo.harperdbcloud.com.
Click Create Organization.

Delete an Organization

An organization cannot be deleted until all instances have been removed. An organization can be deleted as follows:

Navigate to the HarperDB Studio Organizations page.
Identify the proper organization card and click the trash can icon.
Enter the organization name into the text box.
This is done for confirmation purposes to ensure you do not accidentally delete an organization.
Click the Do It button.

Manage Users

HarperDB Studio organization owners can manage users including inviting new users, removing users, and toggling ownership.

Inviting a User

A new user can be invited to an organization as follows:

Navigate to the page.
Click the appropriate organization card.
Click users at the top of the screen.
In the add user box, enter the new user’s email address.
Click Add User.

Users may or may not already be HarperDB Studio users when adding them to an organization. If the HarperDB Studio account already exists, the user will receive an email notification alerting them to the organization invitation. If the user does not have a HarperDB Studio account, they will receive an email welcoming them to HarperDB Studio.

Toggle a User’s Organization Owner Status

Organization owners have full access to the organization including the ability to manage organization users, create, modify, and delete instances, and delete the organization. Users must have accepted their invitation prior to being promoted to an owner. A user’s organization owner status can be toggled owner as follows:

Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization card.
Click users at the top of the screen.
Click the appropriate user from the existing users section.
Toggle the Is Owner switch to the desired status.

Remove a User from an Organization

Users may be removed from an organization at any time. Removing a user from an organization will not delete their HarperDB Studio account, it will only remove their access to the specified organization. A user can be removed from an organization as follows:

Navigate to the page.
Click the appropriate organization card.
Click users at the top of the screen.
Click the appropriate user from the existing users section.
Type DELETE in the text box in the Delete User row.
This is done for confirmation purposes to ensure you do not accidentally delete a user.
Click Delete User.

Manage Billing

Billing is configured per organization and will be billed to the stored credit card at appropriate intervals (monthly or annually depending on the registered instance). Billing settings can be configured as follows:

Navigate to the page.
Click the appropriate organization card.
Click billing at the top of the screen.

Here organization owners can view invoices, manage coupons, and manage the associated credit card.

HarperDB billing and payments are managed via Stripe.

Add a Coupon

Coupons are applicable towards any paid tier or user-installed instance and you can change your subscription at any time. Coupons can be added to your Organization as follows:

In the coupons panel of the billing page, enter your coupon code.
Click Add Coupon.
The coupon will then be available and displayed in the coupons panel.

Query Instance Data

SQL queries can be executed directly through the HarperDB Studio with the following instructions:

Navigate to the page.
Click the appropriate organization that the instance belongs to.
Select your desired instance.
Click query in the instance control bar.
Enter your SQL query in the SQL query window.
Click Execute.

Please note, the Studio will execute the query exactly as entered. For example, if you attempt to SELECT * from a table with millions of rows, you will most likely crash your browser.

Browse Query Results Set

Browse Results Set Data

The first page of results set data is automatically loaded on query execution. Paging controls are at the bottom of the table. Here you can:

Page left and right using the arrows.
Type in the desired page.
Change the page size (the amount of records displayed in the table).

Refresh Results Set

Click the refresh icon at the top right of the results set table.

Automatically Refresh Results Set

Toggle the auto switch at the top right of the results set table. The results set will now automatically refresh every 15 seconds. Filters and pages will remain set for refreshed data.

Query History

Query history is stored in your local browser cache. Executed queries are listed with the most recent at the top in the query history section.

Rerun Previous Query

Identify the query from the query history list.
Click the appropriate query. It will be loaded into the sql query input box.
Click Execute.

Clear Query History

Click the trash can icon at the top right of the query history section.

Create Charts

The HarperDB Studio includes a charting feature where you can build charts based on your specified queries. Visit the Charts documentation for more information.

Manage Clustering

HarperDB instance clustering and replication can be configured directly through the HarperDB Studio. It is recommended to read through the clustering documentation first to gain a strong understanding of HarperDB clustering behavior.

All clustering configuration is handled through the cluster page of the HarperDB Studio, accessed with the following instructions:

Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization that the instance belongs to.
Select your desired instance.
Click cluster in the instance control bar.

Note, the cluster page will only be available to super users.

Initial Configuration

HarperDB instances do not have clustering configured by default. The HarperDB Studio will walk you through the initial configuration. Upon entering the cluster screen for the first time you will need to complete the following configuration. Configurations are set in the enable clustering panel on the left while actions are described in the middle of the screen.

Create a cluster user, read more about this here: Clustering Users and Roles.
- Enter username.
- Enter password.
- Click Create Cluster User.
Click Set Cluster Node Name.
Click Enable Instance Clustering.

At this point the Studio will restart your HarperDB Instance, required for the configuration changes to take effect.

Manage Clustering

Once initial clustering configuration is completed you a presented with a clustering management screen with the following properties:

connected instances
Displays all instances within the Studio Organization that this instance manages a connection with.
unconnected instances
Displays all instances within the Studio Organization that this instance does not manage a connection with.
unregistered instances
Displays all instances outside of the Studio Organization that this instance manages a connection with.
manage clustering
Once instances are connected, this will display clustering management options for all connected instances and all schemas and tables.

Connect an Instance

HarperDB Instances can be clustered together with the following instructions.

Ensure clustering has been configured on both instances and a cluster user with identical credentials exists on both.
Identify the instance you would like to connect from the unconnected instances panel.
Click the plus icon next the appropriate instance.
If configurations are correct, all schemas will sync across the cluster, then appear in the manage clustering panel. If there is a configuration issue, a red exclamation icon will appear, click it to learn more about what could be causing the issue.

Disconnect an Instance

HarperDB Instances can be disconnected with the following instructions.

Identify the instance you would like to disconnect from the connected instances panel.
Click the minus icon next the appropriate instance.

Manage Replication

Subscriptions must be configured in order to move data between connected instances. Read more about subscriptions here: Creating A Subscription. The manage clustering panel displays a table with each row representing an channel per instance. Cells are bolded to indicate a change in the column. Publish and subscribe replication can be configured per table with the following instructions:

Identify the instance, schema, and table for replication to be configured.
For publish, click the toggle switch in the publish column.
For subscribe, click the toggle switch in the subscribe column.

Manage Instance Roles

HarperDB users can be managed directly through the HarperDB Studio. It is recommended to read through the users & roles documentation to gain a strong understanding of how they operate.

Instance role configuration is handled through the roles page of the HarperDB Studio, accessed with the following instructions:

Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization that the instance belongs to.
Select your desired instance.
Click rules in the instance control bar.

Note, the roles page will only be available to super users.

The roles management screen consists of the following panels:

super users
Displays all super user roles for this instance.
cluster users
Displays all cluster user roles for this instance.
standard roles
Displays all standard roles for this instance.
role permission editing
Once a role is selected for editing, permissions will be displayed here in JSON format.

Note, when new tables are added that are not configured, the Studio will generate configuration values with permissions defaulting to false.

Role Management

Create a Role

Click the plus icon at the top right of the appropriate role section.
Enter the role name.
Click the green check mark.
Configure the role permissions in the role permission editing panel.
Note, to have the Studio generate attribute permissions JSON, toggle show all attributes at the top right of the role permission editing panel.
Click Update Role Permissions.

Modify a Role

Click the appropriate role from the appropriate role section.
Modify the role permissions in the role permission editing panel.
Note, to have the Studio generate attribute permissions JSON, toggle show all attributes at the top right of the role permission editing panel.
Click Update Role Permissions.

Delete a Role

Deleting a role is permanent and irreversible. A role cannot be remove if users are associated with it.

Click the minus icon at the top right of the schemas section.
Identify the appropriate role to delete and click the red minus sign in the same row.
Click the red check mark to confirm deletion.

Instance Metrics

The HarperDB Studio display instance status and metrics on the instance status page, which can be accessed with the following instructions:

Navigate to the HarperDB Studio Organizations page.
Click the appropriate organization that the instance belongs to.
Select your desired instance.
Click status in the instance control bar.

Once on the instance browse page you can view host system information, HarperDB logs, and HarperDB Cloud alarms (if it is a cloud instance).

Note, the status page will only be available to super users.

Instance Configuration

HarperDB instance configuration can be viewed and managed directly through the HarperDB Studio. HarperDB Cloud instances can be resized in two different ways via this page, either by modifying machine RAM or by increasing drive storage. User-installed instances can have their licenses modified by modifying licensed RAM.

All instance configuration is handled through the config page of the HarperDB Studio, accessed with the following instructions:

Navigate to the page.
Click the appropriate organization that the instance belongs to.
Select your desired instance.
Click config in the instance control bar.

Note, the config page will only be available to super users and certain items are restricted to Studio organization owners.

Instance Overview

The instance overview panel displays the following instance specifications:

Instance URL
Instance Node Name (for clustering)
Instance API Auth Header (this user)
The Basic authentication header used for the logged in HarperDB database user
Created Date (HarperDB Cloud only)
Region (HarperDB Cloud only)
The geographic region where the instance is hosted.
Total Price
RAM
Storage (HarperDB Cloud only)
Disk IOPS (HarperDB Cloud only)

Update Instance RAM

HarperDB Cloud instance size and user-installed instance licenses can be modified with the following instructions. This option is only available to Studio organization owners.

Note: For HarperDB Cloud instances, upgrading RAM may add additional CPUs to your instance as well. Click here to see how many CPUs are provisioned for each instance size.

In the update ram panel at the bottom left:
- Select the new instance size.
- If you do not have a credit card associated with your account, an Add Credit Card To Account button will appear. Click that to be taken to the billing screen where you can enter your credit card information before returning to the config tab to proceed with the upgrade.
- If you do have a credit card associated, you will be presented with the updated billing information.
- Click Upgrade.
The instance will shut down and begin reprovisioning/relicensing itself. The instance will not be available during this time. You will be returned to the instance dashboard and the instance status will show UPDATING INSTANCE.
Once your instance upgrade is complete, it will appear on the instance dashboard as status OK with your newly selected instance size.

Note, if HarperDB Cloud instance reprovisioning takes longer than 20 minutes, please submit a support ticket here: https://harperdbhelp.zendesk.com/hc/en-us/requests/new.

Update Instance Storage

The HarperDB Cloud instance storage size can be increased with the following instructions. This option is only available to Studio organization owners.

Note: Instance storage can only be upgraded once every 6 hours.

In the update storage panel at the bottom left:
- Select the new instance storage size.
- If you do not have a credit card associated with your account, an Add Credit Card To Account button will appear. Click that to be taken to the billing screen where you can enter your credit card information before returning to the config tab to proceed with the upgrade.
- If you do have a credit card associated, you will be presented with the updated billing information.
- Click Upgrade.
The instance will shut down and begin reprovisioning itself. The instance will not be available during this time. You will be returned to the instance dashboard and the instance status will show UPDATING INSTANCE.
Once your instance upgrade is complete, it will appear on the instance dashboard as status OK with your newly selected instance size.

Note, if this process takes longer than 20 minutes, please submit a support ticket here: https://harperdbhelp.zendesk.com/hc/en-us/requests/new.

Remove Instance

The HarperDB instance can be deleted/removed from the Studio with the following instructions. Once this operation is started it cannot be undone. This option is only available to Studio organization owners.

In the remove instance panel at the bottom left:
- Enter the instance name in the text box.
- The Studio will present you with a warning.
- Click Remove.
The instance will begin deleting immediately.

Restart Instance

The HarperDB Cloud instance can be restarted with the following instructions.

In the restart instance panel at the bottom right:
- Enter the instance name in the text box.
- The Studio will present you with a warning.
- Click Restart.
The instance will begin restarting immediately.

Enable Mixed Content

Enabling mixed content is required in cases where you would like to connect the HarperDB Studio to HarperDB Instances via HTTP. This should not be used for production systems, but may be convenient for development and testing purposes. Doing so will allow your browser to reach HTTP traffic, which is considered insecure, through an HTTPS site like the Studio.

A comprehensive guide is provided by Adobe here.

HarperDB Cloud

HarperDB Cloud is the easiest way to test drive HarperDB, it’s HarperDB-as-a-Service. Cloud handles deployment and management of your instances in just a few clicks. HarperDB Cloud is currently powered by AWS with additional cloud providers on our roadmap for the future.

IOPS Impact on Performance

HarperDB, like any database, can place a tremendous load on its storage resources. Storage, not CPU or memory, will more often be the bottleneck of server, virtual machine, or a container running HarperDB. Understanding how storage works, and how much storage performance your workload requires, is key to ensuring that HarperDB performs as expected.

IOPS Overview

The primary measure of storage performance is the number of input/output operations per second (IOPS) that a storage device can perform. Different storage devices can have dramatically different performance profiles. A hard drive (HDD) might only perform a hundred or so IOPS, while a solid state drive (SSD) might be able to perform tens or hundreds of thousands of IOPS.

Cloud providers like AWS, which powers HarperDB Cloud, don’t typically attach individual disks to a virtual machine or container. Instead, they combine large numbers of storage drives to create very high performance storage servers. Chunks (volumes) of that storage is then carved out and presented to many different virtual machines and containers. Due to the shared nature of this type of storage, the cloud provider places configurable limits on the number of IOPS that a volume can perform. The same way that cloud providers charge more for larger capacity volumes, they also charge more for volumes with more IOPS.

HarperDB Cloud Storage

HarperDB Cloud utilizes AWS Elastic Block Storage (EBS) General Purpose SSD (gp3) volumes. This is the most common storage type used in AWS, as it provides reasonable performance for most workloads, at a reasonable price.

AWS EBS gp3 volumes have a baseline performance level of 3,000 IOPS, as a result, all HarperDB Cloud storage options will offer 3,000 IOPS. We plan to offer scalable IOPS as an option in the future.

You can read more about AWS EBS volume IOPS here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html.

Estimating IOPS for HarperDB Instance

The number of IOPS required for a particular workload is influenced by many factors. Testing your particular application is the best way to determine the number of IOPS required. A reliable method is to estimate about two IOPS for every index, including the primary key itself. So if a table has two indices besides primary key, estimate that an insert or update will require about six IOPS. Note that that can often be closer to one IOPS per index under load due to internal batching of writes, and sometimes even better when doing sequential inserts. Again it is best to test to verify this with application specific data and write patterns.

For assistance in estimating IOPS requirements feel free to contact HarperDB Support or join our Community Slack Channel.

Example Use Case IOPS Requirements

Sensor Data Collection
In case of IoT sensors where data collection will be sustained high IOPS are required. While there are not typically large queries going on in this case, there is a high volume of data being ingested. This implies that IOPS will be sustained at a high level. For example, if you are collection 100 records per second you would expect to need roughly 3,000 IOPS just to handle the data inserts.
Data Analytics/BI Server
Providing a server for analytics purposes typically requires a larger machine. Typically these cases involve large scale SQL joins and aggregations, which puts a large strain on reads. HarperDB utilizes an in-memory cache, which provides a significant performance boost on machines with large amounts of memory. However, if disparate datasets are constantly being queried and/or new data is frequently being loaded, you will find that the system still needs to have high IOPS to meet performance demand.
Web Services
Typical web service implementations with discrete reads and writes often do not need high IOPS to perform as expected. This is often the case is more transactional systems without the requirement for high performance load. A good rule to follow is that any HarperDB operation that requires a data scan will be IOPS intensive, but if these are not frequent then the EBS boost will suffice. Queries utilizing equals operations in either SQL or NoSQL do not require a scan due to HarperDB’s native indexing.
High Performance Database
Ultimately, if performance is your top priority, HarperDB should be run on bare metal hardware. Cloud providers offer these options at a higher cost, but they come with obvious performance improvements.

Alarms

HarperDB Cloud instance alarms are triggered when certain conditions are met. Once alarms are triggered organization owners will immediately receive an email alert and the alert will be available on the page. The below table describes each alert and their evaluation metrics.

Heading Definitions

Alarm: Title of the alarm.
Threshold: Definition of the alarm threshold.
Intervals: The number of occurrences before an alarm is triggered and the period that the metric is evaluated over.
Proposed Remedy: Recommended solution to avoid the alert in the future.

Alarm

Threshold

Intervals

Proposed Remedy

Security

HarperDB uses role-based, attribute-level security to ensure that users can only gain access to the data they’re supposed to be able to access. Our granular permissions allow for unparalleled flexibility and control, and can actually lower the total cost of ownership compared to other database solutions, since you no longer have to replicate subsets of your data to isolate use cases.

Basic Authentication

HarperDB uses Basic Auth and JSON Web Tokens (JWTs) to secure our HTTP requests. In the context of an HTTP transaction, basic access authentication is a method for an HTTP user agent to provide a user name and password when making a request.

** You do not need to log in separately. Basic Auth is added to each HTTP request like create_schema, create_table, insert etc… via headers. **

A header is added to each HTTP request. The header key is “Authorization” the header value is “Basic <<your username and password buffer token>>”

Authentication in HarperDB Studio

In the below code sample, you can see where we add the authorization header to the request. This needs to be added for each and every HTTP request for HarperDB.

Note: This function uses btoa. Learn about btoa here.

function callHarperDB(call_object, operation, callback){

    const options = {
        "method": "POST",
        "hostname": call_object.endpoint_url,
        "port": call_object.endpoint_port,
        "path": "/",
        "headers": {
            "content-type": "application/json",
            "authorization": "Basic " + btoa(call_object.username + ':' + call_object.password),
            "cache-control": "no-cache"

        }
    };

    const http_req = http.request(options, function (hdb_res) {
        let chunks = [];

        hdb_res.on("data", function (chunk) {
            chunks.push(chunk);
        });

        hdb_res.on("end", function () {
            const body = Buffer.concat(chunks);
            if (isJson(body)) {
                return callback(null, JSON.parse(body));
            } else {
                return callback(body, null);

            }

        });
    });

    http_req.on("error", function (chunk) {
        return callback("Failed to connect", null);
    });

    http_req.write(JSON.stringify(operation));
    http_req.end();

}

Configuration

HarperDB was set up to require very minimal configuration to work out of the box. There are, however, some best practices we encourage for anyone building an app with HarperDB.

CORS

HarperDB allows for managing cross-origin HTTP requests. By default, HarperDB enables CORS for all domains if you need to disable CORS completely or set up an access list of domains you can do the following:

Open the harperdb-config.yaml file this can be found in <ROOTPATH>, the location you specified during install.
In harperdb-config.yaml there should be 2 entries under operationsApi.network: cors and corsAccessList.
- cors
  1. To turn off, change to: cors: false
  2. To turn on, change to: cors: true
- corsAccessList
  1. The corsAccessList will only be recognized by the system when cors is true
  2. To create an access list you set corsAccessList to a comma-separated list of domains.
    i.e. corsAccessList is http://harperdb.io,http://products.harperdb.io
  3. To clear out the access list and allow all domains: corsAccessList is [null]

SSL

HarperDB provides the option to use an HTTP or HTTPS and HTTP/2 interface. The default port for the server is 9925.

These default ports can be changed by updating the operationsApi.network.port value in <ROOTPATH>/harperdb-config.yaml

By default, HTTPS is turned off and HTTP is turned on. It is recommended that you never directly expose HarperDB's HTTP interface through a publicly available port. HTTP is intended for local or private network use.

You can toggle HTTPS and HTTP in the settings file. By setting operationsApi.network.https to true/false. When https is set to false, the server will use HTTP (version 1.1). Enabling HTTPS will enable both HTTPS/1.1 and HTTPS/2.

HarperDB automatically generates a certificate (certificate.pem), a certificate authority (ca.pem) and a private key file (privateKey.pem) which live at <ROOTPATH>/keys/.

You can replace these with your own certificates and key.

Changes to these settings require a restart. Use operation harperdb restart from HarperDB Operations API.

Clustering

HarperDB clustering is the process of connecting multiple HarperDB databases together to create a database mesh network that enables users to define data replication patterns.

HarperDB’s clustering engine replicates data between instances of HarperDB using a highly performant, bi-directional pub/sub model on a per-table basis. Data replicates asynchronously with eventual consistency across the cluster following the defined pub/sub configuration. Individual transactions are sent in the order in which they were transacted, once received by the destination instance, they are processed in an ACID-compliant manor. Conflict resolution follows a last writer wins model based on recorded transaction time on the transaction and the timestamp on the record on the node.

Common Use Case

A common use case is an edge application collecting and analyzing sensor data that creates an alert if a sensor value exceeds a given threshold:

The edge application should not be making outbound http requests for security purposes.
There may not be a reliable network connection.
Not all sensor data will be sent to the cloud--either because of the unreliable network connection, or maybe it’s just a pain to store it.
The edge node should be inaccessible from outside the firewall.
The edge node will send alerts to the cloud with a snippet of sensor data containing the offending sensor readings.

HarperDB simplifies the architecture of such an application with its bi-directional, table-level replication:

The edge instance subscribes to a “thresholds” table on the cloud instance, so the application only makes localhost calls to get the thresholds.
The application continually pushes sensor data into a “sensor_data” table via the localhost API, comparing it to the threshold values as it does so.
When a threshold violation occurs, the application adds a record to the “alerts” table.
The application appends to that record array “sensor_data” entries for the 60 seconds (or minutes, or days) leading up to the threshold violation.
The edge instance publishes the “alerts” table up to the cloud instance.

By letting HarperDB focus on the fault-tolerant logistics of transporting your data, you get to write less code. By moving data only when and where it’s needed, you lower storage and bandwidth costs. And by restricting your app to only making local calls to HarperDB, you reduce the overall exposure of your application to outside forces.

Requirements and Definitions

To create a cluster you must have two or more nodes* (aka instances) of HarperDB running.

*A node is a single instance/installation of HarperDB. A node of HarperDB can operate independently with clustering on or off.

On the following pages we'll walk you through the steps required, in order, to set up a HarperDB cluster.

Creating A Cluster User

Inter-node authentication takes place via HarperDB users. There is a special role type called cluster_user that exists by default and limits the user to only clustering functionality.

A cluster_user must be created and added to the harperdb-config.yaml file for clustering to be enabled.

All nodes that are intended to be clustered together need to share the same cluster_user credentials (i.e. username and password).

There are multiple ways a cluster_user can be created, they are:

Through the operations API by calling add_user

When using the API to create a cluster user the harperdb-config.yaml file must be updated with the username of the new cluster user.

This can be done through the API by calling set_configuration or by editing the harperdb-config.yaml file.

In the harperdb-config.yaml file under the top-level clustering element there will be a user element. Set this to the name of the cluster user.

Note: When making any changes to the harperdb-config.yaml file, HarperDB must be restarted for the changes to take effect.

Upon installation using command line variables. This will automatically set the user in the harperdb-config.yaml file.

Note: Using command line or environment variables for setting the cluster user only works on install.

Upon installation using environment variables. This will automatically set the user in the harperdb-config.yaml file.

Naming A Node

Node name is the name given to a node. It is how nodes are identified within the cluster and must be unique to the cluster.

The name cannot contain any of the following characters: .,*> . Dot, comma, asterisk, greater than, or whitespace.

The name is set in the harperdb-config.yaml file using the clustering.nodeName configuration element.

Note: If you want to change the node name make sure there are no subscriptions in place before doing so. After the name has been changed a full restart is required.

There are multiple ways to update this element, they are:

Directly editing the harperdb-config.yaml file.

Note: When making any changes to the harperdb-config.yaml file HarperDB must be restarted for the changes to take effect.

Calling set_configuration through the operations API

Using command line variables.

Using environment variables.

Enabling Clustering

Clustering does not run by default; it needs to be enabled.

To enable clustering the clustering.enabled configuration element in the harperdb-config.yaml file must be set to true.

There are multiple ways to update this element, they are:

Directly editing the harperdb-config.yaml file and setting enabled to true

Note: When making any changes to the harperdb-config.yaml file HarperDB must be restarted for the changes to take effect.

Calling set_configuration through the operations API

Note: When making any changes to HarperDB configuration HarperDB must be restarted for the changes to take effect.

Using command line variables.

Using environment variables.

An efficient way to install HarperDB, create the cluster user, set the node name and enable clustering in one operation is to combine the steps using command line and/or environment variables. Here is an example using command line variables.

Subscription Overview

A subscription defines how data should move between two nodes. They are exclusively table level and operate independently. They connect a table on one node to a table on another node, the subscription will apply to a matching schema name and table name on both nodes.

Note: ‘local’ and ‘remote’ will often be referred to. In the context of these docs ‘local’ is the node that is receiving the API request to create/update a subscription and remote is the other node that is referred to in the request, the node on the other end of the subscription.

A subscription consists of:

schema - the name of the schema that the table you are creating the subscription for belongs to.

table - the name of the table the subscription will apply to.

publish - a boolean which determines if transactions on the local table should be replicated on the remote table.

subscribe - a boolean which determines if transactions on the remote table should be replicated on the local table.

Publish subscription

This diagram is an example of a publish subscription from the perspective of Node1.

The record with id 2 has been inserted in the dog table on Node1, after it has completed that insert it is sent to Node 2 and inserted in the dog table there.

This diagram is an example of a subscribe subscription from the perspective of Node1.

The record with id 3 has been inserted in the dog table on Node2, after it has completed that insert it is sent to Node1 and inserted there.

This diagram shows both subscribe and publish but publish is set to false. You can see that because subscribe is true the insert on Node2 is being replicated on Node1 but because publish is set to false the insert on Node1 is not being replicated on Node2.

This shows both subscribe and publish set to true. The insert on Node1 is replicated on Node2 and the update on Node2 is replicated on Node1.

Custom Functions

Custom functions are a key part of building a complete HarperDB application. It is highly recommended that you use Custom Functions as the primary mechanism for your application to access your HarperDB database. Using Custom Functions gives you complete control over the accessible endpoints, how users are authenticated and authorized, what data is accessed from the database, and how it is aggregated and returned to users.

Add your own API endpoints to a standalone API server inside HarperDB
Use HarperDB Core methods to interact with your data at lightning speed
Custom Functions are powered by Fastify, so they’re extremely flexible
Manage in HarperDB Studio, or use your own IDE and Version Management System
Distribute your Custom Functions to all your HarperDB instances with a single click

Requirements and Definitions

Before you get started with Custom Functions, here’s a primer on the basic configuration and the structure of a Custom Functions Project.

Configuration

Custom Functions are configured in the harperdb-config.yaml file located in the operations API root directory (by default this is a directory named hdb located in the home directory of the current user). Below is a view of the Custom Functions' section of the config YAML file, plus descriptions of important Custom Functions settings.

customFunctions:
  enabled: true
  network:
    cors: true
    corsAccessList:
      - null
    headersTimeout: 60000
    https: false
    keepAliveTimeout: 5000
    port: 9926
    timeout: 120000
  nodeEnv: production
  root: ~/hdb/custom_functions
  tls:
    certificate: ~/hdb/keys/certificate.pem
    certificateAuthority: ~/hdb/keys/ca.pem
    privateKey: ~/hdb/keys/privateKey.pem

enabled A boolean value that tells HarperDB to start the Custom Functions server. Set it to true to enable custom functions and false to disable. enabled is true by default.
network.port This is the port HarperDB will use to start a standalone Fastify Server dedicated to serving your Custom Functions’ routes.
root This is the root directory where your Custom Functions projects and their files will live. By default, it’s in your <ROOTPATH>, but you can locate it anywhere--in a developer folder next to your other development projects, for example.

Please visit our configuration docs for a more comprehensive look at these settings.

Project Structure

project folder

The name of the folder that holds your project files serves as the root prefix for all the routes you create. All routes created in the dogs project folder will have a URL like this: https://my-server-url.com:9926/dogs/my/route. As such, it’s important that any project folders you create avoid any characters that aren’t URL-friendly. You should avoid URL delimiters in your folder names.

/routes folder

Files in the routes folder define the requests that your Custom Functions server will handle. They are standard Fastify route declarations, so if you’re familiar with them, you should be up and running in no time. The default components for a route are the url, method, preValidation, and handler.

module.exports = async (server, { hdbCore, logger }) => {
    server.route({
        url: '/',
        method: 'POST',
        preValidation: hdbCore.preValidation,
        handler: hdbCore.request,
    });
}

/helpers folder

These files are JavaScript modules that you can use in your handlers, or for custom preValidation hooks. Examples include calls to third party Authentication services, filters for results of calls to HarperDB, and custom error responses. As modules, you can use standard import and export functionality.

"use strict";

const dbFilter = (databaseResultsArray) => databaseResultsArray.filter((result) => result.showToApi === true);

module.exports = dbFilter;

/static folder

If you’d like to serve your visitors a static website, you can place the html and supporting files into a directory called static. The directory must have an index.html file, and can have as many supporting resources as are necessary in whatever subfolder structure you prefer within that static directory.

Create a Project

To create a project using our web-based GUI, HarperDB Studio, checkout out how to manage Custom Functions here.

Otherwise, to create a project, you have the following options:

Use the add_custom_function_project operation
This operation creates a new project folder, and populates it with templates for the routes, helpers, and static subfolders.

{
   "operation": "add_custom_function_project",
   "project": "dogs"
}

Clone our public gitHub project template
This requires a local installation. Remove the .git directory for a clean slate of git history.

> git clone https://github.com/HarperDB/harperdb-custom-functions-template.git ~/hdb/custom_functions/dogs

Create a project folder in your Custom Functions root directory and initialize
This requires a local installation.

> mkdir ~/hdb/custom_functions/dogs

> npm init

Define Helpers

Helpers are functions for use within your routes. You may want to use the same helper in multiple route files, so this allows you to write it once, and include it wherever you need it.

To use your helpers, they must be exported from your helper file. Please use any standard export mechanisms available for your module system. We like ESM, ECMAScript Modules. Our example below exports using module.exports.
You must import the helper module into the file that needs access to the exported functions. With ESM, you'd use a require statement. See in Define Routes.

Below is code from the customValidation helper that is referenced in . It takes the request and the logger method from the route declaration, and makes a call to an external API to validate the headers using fetch. The API in this example is just returning a list of ToDos, but it could easily be replaced with a call to a real authentication service.

Host A Static Web UI

The module can be utilized to serve static files.

Install the module in your project by running npm i @fastify/static from inside your project directory.

Register @fastify/static with the server and set root to the absolute path of the directory that contains the static files to serve.

For further information on how to send specific files see the docs.

Using NPM and GIT

Custom function projects can be structured and managed like normal Node.js projects. You can include external dependencies, include them in your route and helper files, and manage your revisions without changing your development tooling or pipeline.

To initialize your project to use npm packages, use the terminal to execute npm init from the root of your project folder.
To implement version control using git, use the terminal to execute git init from the root of your project folder.

Custom Functions Operations

One way to manage Custom Functions is through HarperDB Studio. It performs all the necessary operations automatically. To get started, navigate to your instance in HarperDB Studio and click the subnav link for “functions”. If you have not yet enabled Custom Functions, it will walk you through the process. Once configuration is complete, you can manage and deploy Custom Functions in minutes.

HarperDB Studio manages your Custom Functions using nine HarperDB operations. You may view these operations within our API Docs. A brief overview of each of the operations is below:

custom_functions_status
Returns the state of the Custom Functions server. This includes whether it is enabled, upon which port it is listening, and where its root project directory is located on the host machine.
get_custom_functions
Returns an array of projects within the Custom Functions root project directory. Each project has details including each of the files in the routes and helpers directories, and the total file count in the static folder.
get_custom_function
Returns the content of the specified file as text. HarperDB Studio uses this call to render the file content in its built-in code editor.
set_custom_function
Updates the content of the specified file. HarperDB Studio uses this call to save any changes made through its built-in code editor.
drop_custom_function
Deletes the specified file.
add_custom_function_project
Creates a new project folder in the Custom Functions root project directory. It also inserts into the new directory the contents of our Custom Functions Project template, which is available publicly, here: https://github.com/HarperDB/harperdb-custom-functions-template.
drop_custom_function_project
Deletes the specified project folder and all of its contents.
package_custom_function_project
Creates a .tar file of the specified project folder, then reads it into a base64-encoded string and returns that string the user.
deploy_custom_function_project
Takes the output of package_custom_function_project, decrypts the base64-encoded string, reconstitutes the .tar file of your project folder, and extracts it to the Custom Functions root project directory.

Restarting the Server

One way to manage Custom Functions is through . It performs all the necessary operations automatically. To get started, navigate to your instance in HarperDB Studio and click the subnav link for “functions”. If you have not yet enabled Custom Functions, it will walk you through the process. Once configuration is complete, you can manage and deploy Custom Functions in minutes.

For any changes made to your routes, helpers, or projects, you’ll need to restart the Custom Functions server to see them take effect. HarperDB Studio does this automatically whenever you create or delete a project, or add, edit, or edit a route or helper. If you need to start the Custom Functions server yourself, you can use the following operation to do so:

Debugging a Custom Function

HarperDB Custom Functions projects are managed by HarperDB’s process manager. As such, it may seem more difficult to debug Custom Functions than your standard project. The goal of this document is to provide best practices and recommendations for debugging your Custom Function.

For local debugging and development, it is recommended that you use standard console log statements for logging. For production use, you may want to use HarperDB's logging facilities, so you aren't logging to the console. The HarperDB Custom Functions template includes the HarperDB logger module in the primary function parameters with the name logger. This logger can be used to output messages directly to the HarperDB log using standardized logging level functions, described below. The log level can be set in the HarperDB Configuration File.

HarperDB Logger Functions

trace(message): Write a 'trace' level log, if the configured level allows for it.
debug(message): Write a 'debug' level log, if the configured level allows for it.
info(message): Write a 'info' level log, if the configured level allows for it.
warn(message): Write a 'warn' level log, if the configured level allows for it.
error(message): Write a 'error' level log, if the configured level allows for it.
fatal(message): Write a 'fatal' level log, if the configured level allows for it.
notify(message): Write a 'notify' level log.

For debugging purposes, it is recommended to use notify as these messages will appear in the log regardless of log level configured.

Viewing the Log

The HarperDB Log can be found on the Studio Status page or in the local Custom Functions log file, <HDBROOT>/log/custom_functions.log. Additionally, you can use the read_log operation to query the HarperDB log.

Example 1: Execute Query and Log Results

This example performs a SQL query in HarperDB and logs the result. This example utilizes the logger.notify function to log the stringified version of the result. If an error occurs, it will output the error using logger.error and return the error.

server.route({
    url: '/',
    method: 'GET',
    handler: async (request) => {
        request.body = {
            operation: 'sql',
            sql: 'SELECT * FROM dev.dog ORDER BY dog_name'
        };

        try {
            let result = await hdbCore.requestWithoutAuthentication(request);
            logger.notify(`Query Result: ${JSON.stringify(result)}`);
            return result;
        } catch (e) {
            logger.error(`Query Error: ${e}`);
            return e;
        }
    }
});

Example 2: Execute Multiple Queries and Log Activity

This example performs two SQL queries in HarperDB with logging throughout to describe what is happening. This example utilizes the logger.notify function to log the stringified version of the operation and the result of each query. If an error occurs, it will output the error using logger.error and return the error.

server.route({
    url: '/example',
    method: 'GET',
    handler: async (request) => {
        logger.notify('/example called!');
        const results = [];

        request.body = {
            operation: 'sql',
            sql: 'SELECT * FROM dev.dog WHERE id = 1'
        };
        logger.notify(`Query 1 Operation: ${JSON.stringify(request.body)}`);
        try {
            let result = await hdbCore.requestWithoutAuthentication(request);
            logger.notify(`Query 1: ${JSON.stringify(result)}`);
            results.push(result);
        } catch (e) {
            logger.error(`Query 1: ${e}`);
            return e;
        }

        request.body = {
            operation: 'sql',
            sql: 'SELECT * FROM dev.dog WHERE id = 2'
        };
        logger.notify(`Query 2 Operation: ${JSON.stringify(request.body)}`);
        try {
            let result = await hdbCore.requestWithoutAuthentication(request);
            logger.notify(`Query 2: ${JSON.stringify(result)}`);
            results.push(result);
        } catch (e) {
            logger.error(`Query 2: ${e}`);
            return e;
        }

        logger.notify('/example complete!');
        return results;
    }
});

Custom Functions Templates

Check out our always-expanding library of templates in our open-source .

Example Projects

Library of example projects and tutorials using Custom Functions:

Authorization in HarperDB using Okta Customer Identity Cloud, by Yitaek Hwang
How to Speed Up your Applications by Caching at the Edge with HarperDB, by Danny Adams
OAuth Authentication in HarperDB using Auth0 & Node.js, by Lucas Santos
How To Create a CRUD API with Next.js & HarperDB Custom Functions, by Colby Fayock
Build a Dynamic REST API with Custom Functions, by Terra Roush
How to use HarperDB Custom Functions to Build your Entire Backend, by Andrew Baisden
Using TensorFlowJS & HarperDB Custom Functions for Machine Learning, by Kevin Ashcraft
Build & Deploy a Fitness App with Python & HarperDB, by Patrick Löber
Create a Discord Slash Bot using HarperDB Custom Functions, by Soumya Ranjan Mohanty
How I used HarperDB Custom Functions to Build a Web App for my Newsletter, by Hrithwik Bharadwaj
How I used HarperDB Custom Functions and Recharts to create Dashboard, by Tapas Adhikary
How To Use HarperDB Custom Functions With Your React App, by Ankur Tyagi
Build a Web App Using HarperDB’s Custom Functions, livestream by Jaxon Repp
How to Web Scrape Using Python, Snscrape & Custom Functions, by Davis David
What’s the Big Deal w/ Custom Functions, Select* Podcast

Add-ons and SDKs

All HarperDB Add-Ons and SDKs can be found in the HarperDB Marketplace located in the HarperDB Studio.

SQL Guide

The purpose of this guide is to describe the available functionality of HarperDB as it relates to supported SQL functionality. The SQL parser is still actively being developed and this document will be updated as more features and functionality becomes available. A high-level view of supported features can be found .

HarperDB adheres to the concept of schemas & tables. This allows developers to isolate table structures from each other all within one database.

Insert

HarperDB supports inserting 1 to n records into a table. The primary key must be unique (not used by any other record). If no primary key is provided, it will be assigned an auto-generated UUID. HarperDB does not support selecting from one table to insert into another at this time.

INSERT INTO dev.dog (id, dog_name, age, breed_id)
  VALUES(1, 'Penny', 5, 347), (2, 'Kato', 4, 347)

Update

HarperDB supports updating existing table row(s) via UPDATE statements. Multiple conditions can be applied to filter the row(s) to update. At this time selecting from one table to update another is not supported.

UPDATE dev.dog
    SET owner_name = 'Kyle'
    WHERE id IN (1, 2)

Delete

HarperDB supports deleting records from a table with condition support.

DELETE FROM dev.dog
  WHERE age < 4

Select

HarperDB has robust SELECT support, from simple queries all the way to complex joins with multi-conditions, aggregates, grouping & ordering.

All results are returned as JSON object arrays.

Query for all records and attributes in the dev.dog table:

Query specific columns from all rows in the dev.dog table:

Query for all records and attributes in the dev.dog table ORDERED BY age in ASC order:

*The ORDER BY keyword sorts in ascending order by default. To sort in descending order, use the DESC keyword.

Joins

HarperDB allows developers to join any number of tables and currently supports the following join types:

INNER JOIN LEFT
INNER JOIN LEFT
OUTER JOIN

Here’s a basic example joining two tables from our Get Started example- joining a dogs table with a breeds table:

SQL JSON Search

HarperDB automatically indexes all top level attributes in a row / object written to a table. However, any attributes which holds JSON does not have its nested attributes indexed. In order to make searching and/or transforming these JSON documents easy, HarperDB offers a special SQL function called SEARCH_JSON. The SEARCH_JSON function works in SELECT & WHERE clauses allowing queries to perform powerful filtering on any element of your JSON by implementing the into our SQL engine.

Syntax

SEARCH_JSON(expression, attribute)

Executes the supplied string expression against data of the defined top level attribute for each row. The expression both filters and defines output from the JSON document.

Example 1

Search a string array

Here are two records in the database:

Here is a simple query that gets any record with "Harper" found in the name.

Example 2

The purpose of this query is to give us every movie where at least two of our favorite actors from Marvel films have acted together. The results will return the movie title, the overview, release date and an object array of the actor’s name and their character name in the movie.

Both function calls evaluate the credits.cast attribute, this attribute is an object array of every cast member in a movie.

A sample of this data from the movie The Avengers looks like

Let’s break down the SEARCH_JSON function call in the SELECT:

The first argument passed to SEARCH_JSON is the expression to execute against the second argument which is the cast attribute on the credits table. This expression will execute for every row. Looking into the expression it starts with “$[…]” this tells the expression to iterate all elements of the cast array.

Then the expression tells the function to only return entries where the name attribute matches any of the actors defined in the array:

So far, we’ve iterated the array and filtered out rows, but we also want the results formatted in a specific way, so we’ve chained an expression on our filter with: {“actor”: name, “character”: character}. This tells the function to create a specific object for each matching entry.

Sample Result

Just having the SEARCH_JSON function in our SELECT is powerful, but given our criteria it would still return every other movie that doesn’t have our matching actors, in order to filter out the movies we do not want we also use SEARCH_JSON in the WHERE clause.

This function call in the WHERE clause is similar, but we don’t need to perform the same transformation as occurred in the SELECT:

As seen above we execute the same name filter against the cast array, the primary difference is we are wrapping the filtered results in $count(…). As it looks this returns a count of the results back which we then use against our SQL comparator of >= 2.

To see further SEARCH_JSON examples in action view our Postman Collection that provides a sample schema & data with query examples: https://api.harperdb.io/

To learn more about how to build expressions check out the JSONata documentation: http://docs.jsonata.org/overview

SQL Geospatial Functions

HarperDB geospatial features require data to be stored in a single column using the , a standard commonly used in geospatial technologies. Geospatial functions are available to be used in SQL statements.

If you are new to GeoJSON you should check out the full specification here: http://geojson.org/. There are a few important things to point out before getting started.

All GeoJSON coordinates are stored in [longitude, latitude] format.
Coordinates or GeoJSON geometries must be passed as string when written directly in a SQL statement.
Note if you are using Postman for you testing. Due to limitations in the Postman client, you will need to escape quotes in your strings and your SQL will need to be passed on a single line.

In the examples contained in the left-hand navigation, schema and table names may change, but all GeoJSON data will be stored in a column named geo_data.

geoArea

The geoArea() function returns the area of one or more features in square meters.

Syntax

geoArea(geoJSON)

Parameters

Parameter

Description

Example 1

Calculate the area, in square meters, of a manually passed GeoJSON polygon.

Example 2

Find all records that have an area less than 1 square mile (or 2589988 square meters).

geoLength

Takes a GeoJSON and measures its length in the specified units (default is kilometers).

Syntax

geoLength(geoJSON[, units])

Parameters

Parameter

Description

geoJSON

Required. GeoJSON to measure.

units

Optional. Specified as a string. Options are ‘degrees’, ‘radians’, ‘miles’, or ‘kilometers’. Default is ‘kilometers’.

Example 1

Calculate the length, in kilometers, of a manually passed GeoJSON linestring.

SELECT geoLength('{
    "type": "Feature",
    "geometry": {
        "type": "LineString",
        "coordinates": [
            [-104.97963309288025,39.76163265441438],
            [-104.9823260307312,39.76365323407955],
            [-104.99193906784058,39.75616442110704]
        ]
    }
}')

Example 2

Find all data plus the calculated length in miles of the GeoJSON, restrict the response to only lengths less than 5 miles, and return the data in order of lengths smallest to largest.

SELECT *, geoLength(geo_data, 'miles') as length
FROM dev.locations
WHERE geoLength(geo_data, 'miles') < 5
ORDER BY length ASC

geoDifference

Returns a new polygon with the difference of the second polygon clipped from the first polygon.

Syntax

geoDifference(polygon1, polygon2)

Parameters

Parameter

Description

polygon1

Required. Polygon or MultiPolygon GeoJSON feature.

polygon2

Required. Polygon or MultiPolygon GeoJSON feature to remove from polygon1.

Example

Return a GeoJSON Polygon that removes City Park (polygon2) from Colorado (polygon1).

SELECT geoDifference('{
    "type": "Feature",
    "properties": {
      "name":"Colorado"
    },
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-109.072265625,37.00255267215955],
            [-102.01904296874999,37.00255267215955],
            [-102.01904296874999,41.0130657870063],
            [-109.072265625,41.0130657870063],
            [-109.072265625,37.00255267215955]
        ]]
      }
    }',
    '{
        "type": "Feature",
        "properties": {
          "name":"City Park"
        },
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [-104.95973110198975,39.7543828214657],
                [-104.95955944061278,39.744781185675386],
                [-104.95904445648193,39.74422022399989],
                [-104.95835781097412,39.74402223643582],
                [-104.94097709655762,39.74392324244047],
                [-104.9408483505249,39.75434982844515],
                [-104.95973110198975,39.7543828214657]
            ]]
        }
    }'
)

geoNear

Determines if point1 and point2 are within a specified distance from each other, default units are kilometers. Returns a Boolean.

Syntax

geoNear(point1, point2, distance[, units])

Parameters

Parameter

Description

point1

Required. GeoJSON Point specifying the origin.

point2

Required. GeoJSON Point specifying the destination.

distance

Required. The maximum distance in units as an integer or decimal.

units

Optional. Specified as a string. Options are ‘degrees’, ‘radians’, ‘miles’, or ‘kilometers’. Default is ‘kilometers’.

Example 1

Return all locations within 50 miles of a given point.

SELECT *
FROM dev.locations
WHERE geoNear('[-104.979127,39.761563]', geo_data, 50, 'miles')

Example 2

Return all locations within 2 degrees of the earth of a given point. (Each degree lat/long is about 69 miles [111 kilometers]). Return all data and the distance in miles, sorted by ascending distance.

SELECT *, geoDistance('[-104.979127,39.761563]', geo_data, 'miles') as distance
FROM dev.locations
WHERE geoNear('[-104.979127,39.761563]', geo_data, 2, 'degrees')
ORDER BY distance ASC

geoContains

Determines if geo2 is completely contained by geo1. Returns a Boolean.

Syntax

geoContains(geo1, geo2)

Parameters

Parameter

Description

geo1

Required. Polygon or MultiPolygon GeoJSON feature.

geo2

Required. Polygon or MultiPolygon GeoJSON feature tested to be contained by geo1.

Example 1

Return all locations within the state of Colorado (passed as a GeoJSON string).

SELECT *
FROM dev.locations
WHERE geoContains('{
    "type": "Feature",
    "properties": {
      "name":"Colorado"
    },
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-109.072265625,37.00255267],
            [-102.01904296874999,37.00255267],
            [-102.01904296874999,41.01306579],
            [-109.072265625,41.01306579],
            [-109.072265625,37.00255267]
        ]]
    }
}', geo_data)

Example 2

Return all locations which contain HarperDB Headquarters.

SELECT *
FROM dev.locations
WHERE geoContains(geo_data, '{
    "type": "Feature",
    "properties": {
      "name": "HarperDB Headquarters"
    },
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-104.98060941696167,39.760704817357905],
            [-104.98053967952728,39.76065120861263],
            [-104.98055577278137,39.760642961109674],
            [-104.98037070035934,39.76049450588716],
            [-104.9802714586258,39.76056254790385],
            [-104.9805235862732,39.76076461167841],
            [-104.98060941696167,39.760704817357905]
        ]]
    }
}')

geoEqual

Determines if two GeoJSON features are the same type and have identical X,Y coordinate values. For more information see https://developers.arcgis.com/documentation/spatial-references/. Returns a Boolean.

Syntax

geoEqual(geo1, geo2)

Parameters

Parameter

Description

Example

Find HarperDB Headquarters within all locations within the database.

geoCrosses

Determines if the geometries cross over each other. Returns boolean.

Syntax

geoCrosses(geo1, geo2)

Parameters

Parameter

Description

geo1

Required. GeoJSON geometry or feature.

geo2

Required. GeoJSON geometry or feature.

Example

Find all locations that cross over a highway.

SELECT *
FROM dev.locations
WHERE geoCrosses(
    geo_data,
    '{
        "type": "Feature",
        "properties": {
          "name": "Highway I-25"
        },
        "geometry": {
            "type": "LineString",
            "coordinates": [
                [-104.9139404296875,41.00477542222947],
                [-105.0238037109375,39.715638134796336],
                [-104.853515625,39.53370327008705],
                [-104.853515625,38.81403111409755],
                [-104.61181640625,38.39764411353178],
                [-104.8974609375,37.68382032669382],
                [-104.501953125,37.00255267215955]
            ]
        }
    }'
)

HarperDB CLI

The HarperDB command line interface (CLI) is used to administer .

Installing HarperDB

To install HarperDB with CLI prompts, run the following command:

Alternatively, HarperDB installations can be automated with environment variables or command line arguments; . Note, when used in conjunction, command line arguments will override environment variables.

Environment Variables

Command Line Arguments

Starting HarperDB

To start HarperDB after it is installed, run the following command:

Stopping HarperDB

To stop HarperDB once it is running, run the following command:

Restarting HarperDB

To restart HarperDB once it is running, run the following command:

Managing HarperDB Service(s)

The following commands are used to start, restart, or stop one or more HarperDB service without restarting the full application:

The following services are managed via the above commands:

HarperDB
Custom Functions
IPC
Clustering

Getting the HarperDB Version

To check the version of HarperDB that is installed run the following command:

Get all available CLI commands

To display all available HarperDB CLI commands along with a brief description run:

Get the status of HarperDB and clustering

To display the status of the HarperDB process, the clustering hub and leaf processes, the clustering network and replication statuses, run:

Backups

HarperDB uses a transactional commit process that ensures that data on disk is always transactionally consistent with storage. This means that HarperDB maintains safety of database integrity in the event of a crash. It also means that you can use any standard volume snapshot tool to make a backup of a HarperDB database. Database files are stored in the hdb/schemas directory (organized schema directories). As long as the snapshot is an atomic snapshot of these database files, the data can be copied/movied back into the schemas directory to restore a previous backup (with HarperDB shut down) , and database integrity will be preserved. Note that simply copying an in-use database file (using cp, for example) is not a snapshot, and this would progressively read data from the database at different points in time, which yields unreliable copy that likely will not be usable. Standard copying is only reliable for a database file that is not in use.

Logging

HarperDB maintains a log of events that take place throughout operation. Log messages can be used for diagnostics purposes as well as monitoring.

All logs (except for the install log) are stored in the main log file in the hdb directory <ROOTPATH>/log/hdb.log. The install log is located in the HarperDB application directory most likely located in your npm directory npm/harperdb/logs.

Each log message has several key components for consistent reporting of events. A log message has a format of:

<timestamp> [<level>] [<thread/id>] ...[<other tags>]: <message>

For example, a typical log entry looks like:

2023-03-09T14:25:05.269Z [notify] [main/0]: HarperDB successfully started.

The components of a log entry are:

timestamp - This is the date/time stamp when the event occurred
level - This is an associated log level that gives a rough guide to the importance and urgency of the message. The available log levels in order of least urgent (and more verbose) are: trace, debug, info, warn, error, fatal, and notify.
thread/id - This reports the name of the thread and the thread id, that the event was reported on. Note that NATS logs are recorded by their process name and there is no thread id for them since they are a separate process. Key threads are:
- main - This is the thread that is responsible for managing all other threads and routes incoming requests to the other threads
- http - These are the worker threads that handle the primary workload of incoming HTTP requests to the operations API and custom functions.
- Clustering* - These are threads and processes that handle replication.
- job - These are job threads that have been started to handle operations that are executed in a separate job thread.
tags - Logging from a custom function will include a "custom-function" tag in the log entry. Most logs will not have any additional tags.
message - This is the main message that was reported.

We try to keep logging to a minimum by default, to do this the default log level is error. If you require more information from the logs, increasing the log level down will provide that.

The log level can be changed by modifying logging.level in the config file harperdb-config.yaml.

Clustering Logging

HarperDB clustering utilizes two Nats servers, named Hub and Leaf. The Hub server is responsible for establishing the mesh network that connects instances of HarperDB and the Leaf server is responsible for managing the message stores (streams) that replicate and store messages between instances. Due to the verbosity of these servers there is a separate log level configuration for them. To adjust their log verbosity set clustering.logLevel in the config file harperdb-config.yaml. Valid log levels from least verbose are error, warn, info, debug and trace.

Log File vs Standard Streams

HarperDB logs can optionally be streamed to standard streams. Logging to standard streams (stdout/stderr) is primarily used for container logging drivers. For more traditional installations, we recommend logging to a file. Logging to both standard streams and to a file can be enabled simultaneously. To log to standard streams effectively, make sure to directly run harperdb and don't start it as a separate process (don't use harperdb start) and logging.stdStreams must be set to true. Note, logging to standard streams only will disable clustering catchup.

Logging Rotation

Log rotation allows for managing log files, such as compressing rotated log files, archiving old log files, determining when to rotate, and the like. This will allow for organized storage and efficient use of disk space. For more information see “logging” in our config docs.

Read Logs via the API

To access specific logs you may query the HarperDB API. Logs can be queried using the read_log operation. read_log returns outputs from the log based on the provided search criteria.

{
    "operation": "read_log",
    "start": 0,
    "limit": 1000,
    "level": "error",
    "from": "2021-01-25T22:05:27.464+0000",
    "until": "2021-01-25T23:05:27.464+0000",
    "order": "desc"
}

Transaction Logging

HarperDB offers two options for logging transactions executed against a table. The options are similar but utilize different storage layers.

Transaction log

The first option is read_transaction_log. The transaction log is built upon clustering streams. Clustering streams are per-table message stores that enable data to be propagated across a cluster. HarperDB leverages streams for use with the transaction log. When clustering is enabled all transactions that occur against a table are pushed to its stream, and thus make up the transaction log.

If you would like to use the transaction log, but have not set up clustering yet, please see "How to Cluster".

Transaction Log Operations

read_transaction_log

The read_transaction_log operation returns a prescribed set of records, based on given parameters. The example below will give a maximum of 2 records within the timestamps provided.

{
    "operation": "read_transaction_log",
    "schema": "dev",
    "table": "dog",
    "from": 1598290235769,
    "to": 1660249020865,
    "limit": 2
}

See example response below.

read_transaction_log Response

[
    {
        "operation": "insert",
        "user": "admin",
        "timestamp": 1660165619736,
        "records": [
            {
                "id": 1,
                "dog_name": "Penny",
                "owner_name": "Kyle",
                "breed_id": 154,
                "age": 7,
                "weight_lbs": 38,
                "__updatedtime__": 1660165619688,
                "__createdtime__": 1660165619688
            }
        ]
    },
    {
        "operation": "update",
        "user": "admin",
        "timestamp": 1660165620040,
        "records": [
            {
                "id": 1,
                "dog_name": "Penny B",
                "__updatedtime__": 1660165620036
            }
        ]
    }
]

See example request above.

delete_transaction_logs_before

The delete_transaction_logs_before operation will delete transaction log data according to the given parameters. The example below will delete records older than the timestamp provided.

{
    "operation": "delete_transaction_logs_before",
    "schema": "dev",
    "table": "dog",
    "timestamp": 1598290282817
}

Note: Streams are used for catchup if a node goes down. If you delete messages from a stream there is a chance catchup won't work.

Read on for read_audit_log, the second option, for logging transactions executed against a table.

Jobs

HarperDB Jobs are asynchronous tasks performed by the Operations API.

Job Summary

Jobs uses an asynchronous methodology to account for the potential of a long-running operation. For example, exporting millions of records to S3 could take some time, so that job is started and the id is provided to check on the status.

The job status can be COMPLETE or IN_PROGRESS.

Example Job Operations

Example job operations include:

delete_records_before

export_local

export_to_s3

Example Response from a Job Operation

{
  "message": "Starting job with id 062a1892-6a0a-4282-9791-0f4c93b12e16"
}

Whenever one of these operations is initiated, an asynchronous job is created and the request contains the id of that job which can be used to check on its status.

Managing Jobs

To check on a job's status, use the get_job operation.

Get Job Request

{
    "operation": "get_job",
    "id": "4a982782-929a-4507-8794-26dae1132def"
}

Get Job Response

[
  {
    "__createdtime__": 1611615798782,
    "__updatedtime__": 1611615801207,
    "created_datetime": 1611615798774,
    "end_datetime": 1611615801206,
    "id": "4a982782-929a-4507-8794-26dae1132def",
    "job_body": null,
    "message": "successfully loaded 350 of 350 records",
    "start_datetime": 1611615798805,
    "status": "COMPLETE",
    "type": "csv_url_load",
    "user": "HDB_ADMIN",
    "start_datetime_converted": "2021-01-25T23:03:18.805Z",
    "end_datetime_converted": "2021-01-25T23:03:21.206Z"
  }
]

Finding Jobs

To find jobs (if the id is not know) use the search_jobs_by_start_date operation.

Search Jobs Request

{
    "operation": "search_jobs_by_start_date",
    "from_date": "2021-01-25T22:05:27.464+0000",
    "to_date": "2021-01-25T23:05:27.464+0000"
}

Search Jobs Response

[
  {
    "id": "942dd5cb-2368-48a5-8a10-8770ff7eb1f1",
    "user": "HDB_ADMIN",
    "type": "csv_url_load",
    "status": "COMPLETE",
    "start_datetime": 1611613284781,
    "end_datetime": 1611613287204,
    "job_body": null,
    "message": "successfully loaded 350 of 350 records",
    "created_datetime": 1611613284764,
    "__createdtime__": 1611613284767,
    "__updatedtime__": 1611613287207,
    "start_datetime_converted": "2021-01-25T22:21:24.781Z",
    "end_datetime_converted": "2021-01-25T22:21:27.204Z"
  }
]

Upgrade a HarperDB Instance

This document describes best practices for upgrading self-hosted HarperDB instances. HarperDB can be upgraded using a combination of npm and built-in HarperDB upgrade scripts. Whenever upgrading your HarperDB installation it is recommended you make a backup of your data first. Note: This document applies to self-hosted HarperDB instances only. All will be upgraded by the HarperDB Cloud team.

Upgrading

Upgrading HarperDB is a two-step process. First the latest version of HarperDB must be downloaded from npm, then the HarperDB upgrade scripts will be utilized to ensure the newest features are available on the system.

Install the latest version of HarperDB using npm install -g harperdb.
Note -g should only be used if you installed HarperDB globally (which is recommended).
Run harperdb to initiate the upgrade process.
HarperDB will then prompt you for all appropriate inputs and then run the upgrade directives.

Node Version Manager (nvm)

is an easy way to install, remove, and switch between different versions of Node.js as required by various applications. More information, including directions on installing nvm can be found here: https://nvm.sh/.

HarperDB supports Node.js versions 14.0.0 and higher, however, please check our for our recommended Node.js version. To install a different version of Node.js with nvm, run the command:

To switch to a version of Node run:

To see the current running version of Node run:

With a handful of different versions of Node.js installed, run nvm with the ls argument to list out all installed versions:

When upgrading HarperDB, we recommend also upgrading your Node version. Here we assume you're running on an older version of Node; the execution may look like this:

Switch to the older version of Node that HarperDB is running on (if it is not the current version):

Make sure HarperDB is not running:

Uninstall HarperDB. Note, this step is not required, but will clean up old artifacts of HarperDB. We recommend removing all other HarperDB installations to ensure the most recent version is always running.

Switch to the newer version of Node:

Install HarperDB globally

Run the upgrade script

Start HarperDB

Reference

This section contains technical details and reference materials for HarperDB.

Storage Algorithm

The HarperDB storage algorithm is fundamental to the HarperDB core functionality, enabling the and all other user-facing functionality. HarperDB is built on top of Lightning Memory-Mapped Database (LMDB), a key-value store offering industry leading performance and functionality, which allows for our storage algorithm to store data in tables as rows/objects. This document will provide additional details on how data is stored within HarperDB.

Query Language Agnostic

The HarperDB storage algorithm was designed to abstract the data storage from any individual query language. HarperDB currently supports both SQL and NoSQL on top of this storage algorithm, with the ability to add additional query languages in the future. This means data can be inserted via NoSQL and read via SQL while hitting the same underlying data storage.

ACID Compliant

Utilizing Multi-Version Concurrency Control (MVCC) through LMDB, HarperDB offers ACID compliance independently on each node. Readers and writers operate independently of each other, meaning readers don’t block writers and writers don’t block readers. Each HarperDB table has a single writer process, avoiding deadlocks and assuring that writes are executed in the order in which they were received. HarperDB tables can have multiple reader processes operating at the same time for consistent, high scale reads.

Universally Indexed

All top level attributes are automatically indexed immediately upon ingestion. The reflexively creates both the attribute and index reflexively as new schema metadata comes in. Indexes are agnostic of datatype, honoring the following order: booleans, numbers ordered naturally, strings ordered lexically. Within the LMDB implementation, table records are grouped together into a single LMDB environment file, where each attribute index is a sub-database (dbi) inside said environment file. An example of the indexing scheme can be seen below.

Additional LMDB Benefits

HarperDB inherits both functional and performance benefits by implementing LMDB as the underlying key-value store. Data is memory-mapped, which enables quick data access without data duplication. All writers are fully serialized, making writes deadlock-free. LMDB is built to maximize operating system features and functionality, fully exploiting buffer cache and built to run in CPU cache. To learn more about LMDB, visit their documentation.

HarperDB Indexing Example (Single Table)

Dynamic Schema

HarperDB is built to make data ingestion simple. A primary driver of that is the Dynamic Schema. The purpose of this document is to provide a detailed explanation of the dynamic schema specifically related to schema definition and data ingestion.

The dynamic schema provides the structure of schema and table namespaces while simultaneously providing the flexibility of a data-defined schema. Individual attributes are reflexively created as data is ingested, meaning the table will adapt to the structure of data ingested. HarperDB tracks the metadata around schemas, tables, and attributes allowing for describe table, describe schema, and describe all operations.

Schemas

HarperDB schemas are analogous to a namespace that groups tables together. A schema is required to create a table.

Tables

HarperDB tables group records together with a common data pattern. To create a table users must provide a table name and a primary key.

Table Name: Used to identify the table.
Primary Key: This is a required attribute that serves as the unique identifier for a record and is also known as the hash_attribute in HarperDB.

Primary Key

The primary key (also referred to as the hash_attribute) is used to uniquely identify records. Uniqueness is enforced on the primary; inserts with the same primary key will be rejected. If a primary key is not provided on insert, a GUID will be automatically generated and returned to the user. The utilizes this value for indexing.

Standard Attributes

Additional attributes are reflexively added via insert and update operations (in both SQL and NoSQL) when new attributes are included in the data structure provided to HarperDB. As a result, schemas are additive, meaning new attributes are created in the underlying storage algorithm as additional data structures are provided. HarperDB offers create_attribute and drop_attribute operations for users who prefer to manually define their data model independent of data ingestion. When new attributes are added to tables with existing data the value of that new attribute will be assumed null for all existing records.

Audit Attributes

HarperDB automatically creates two audit attributes used on each record.

__createdtime__: The time the record was created in format.
__updatedtime__: The time the record was updated in format.

Dynamic Schema Example

To better understand the behavior let’s take a look at an example. This example utilizes .

Create a Schema

Create a Table

Notice the schema name, table name, and hash attribute name are the only required parameters.

At this point the table does not have structure beyond what we provided, so the table looks like this:

dev.dog

Insert Record

To define attributes we do not need to do anything beyond sending them in with an insert operation.

With a single record inserted and new attributes defined, our table now looks like this:

dev.dog

Indexes have been automatically created for dog_name and owner_name attributes.

Insert Additional Record

If we continue inserting records with the same data schema no schema updates are required. One record will omit the hash attribute from the insert to demonstrate GUID generation.

In this case, there is no change to the schema. Our table now looks like this:

dev.dog

Update Existing Record

In this case, we will update a record with a new attribute not previously defined on the table.

Now we have a new attribute called weight_lbs. Our table now looks like this:

dev.dog

Query Table with SQL

Now if we query for all records where weight_lbs is null we expect to get back two records.

This results in the expected two records being returned.

HarperDB Headers

All HarperDB API responses include headers that are important for interoperability and debugging purposes. The following headers are returned with all HarperDB API responses:

Key

Example Value

Description

server-timing

db;dur=7.165

This reports the duration of the operation, in milliseconds. This follows the standard for Server-Timing and can be consumed by network monitoring tools.

hdb-response-time

7.165

This is the legacy header for reporting response time. It is deprecated and will be removed in 4.2.

content-type

application/json

This reports the MIME type of the returned content, which is negotiated based on the requested content type in the Accept header.

HarperDB Limits

This document outlines limitations of HarperDB.

Schema Naming Restrictions

Case Sensitivity

HarperDB schema metadata (schema names, table names, and attribute/column names) are case sensitive. Meaning schemas, tables, and attributes can differ only by the case of their characters.

Restrictions on Schema Metadata Names

HarperDB schema metadata (schema names, table names, and attribute names) cannot contain the following UTF-8 characters:

Additionally, they cannot contain the first 31 non-printing characters. Spaces are allowed, but not recommended as best practice. The regular expression used to verify a name is valid is:

Table Limitations

Attribute Maximum

HarperDB limits number of attributes to 10,000 per table.

Support

HarperDB support is available with all paid instances. Support tickets are managed via our Zendesk portal. Once a ticket is submitted the HarperDB team will triage your request and get back to you as soon as possible. Additionally, you can join our Slack community where HarperDB team members and others in the community are frequently active to help answer questions.

Common Issues

1 Gigabyte Limit to Request Bodies

HarperDB supports the body of a request to be up to 1 GB in size. This limit does not impact the CSV file import function the reads from the local file system or from an external URL. We recommend if you do need to bulk import large record sets that you utilize the CSV import function, especially if you run up on the 1 GB body size limit. Documentation for these functions can be found here.

Do not install as sudo

HarperDB should be installed using a specific user for HarperDB. This allows you to restrict the permissions that user has and who has access to the HarperDB file system. The reason behind this is that HarperDB files are written directly to the file system, and by using a specific HarperDB user this gives you granular control over who has access to these files.

Error: Must execute as User

You may have gotten an error like, Error: Must execute as <<username>>. This means that you installed HarperDB as <<user>>. Because HarperDB stores files directly to the file system, we only allow the HarperDB executable to be run by a single user. This prevents permissions issues on files. For example if you installed as user_a, but later wanted to run as user_b. User_b may not have access to the database files HarperDB needs. This also keeps HarperDB more secure as it allows you to lock files down to a specific user and prevents other users from accessing your files.

Frequently Asked Questions (FAQs)

What operating system should I use to run HarperDB?

All major operating systems: Linux, Windows, and macOS. However, running HarperDB on Windows and macOS is intended only for development and evaluation purposes. Linux is strongly recommended for production use.

How are HarperDB’s SQL and NoSQL capabilities different from other solutions?

Many solutions offer NoSQL capability and separate processing for SQL such as in-memory transformation or multi-model support. HarperDB’s unique mechanism for storing each data attribute individually allows for performing NoSQL and SQL operations in real-time on the stored data set.

How does HarperDB ensure high availability and consistency?

HarperDB's clustering and replication capabilities allow high availability and fault-tolerance; if a server goes down, traffic can be quickly routed to other HarperDB servers that can service requests. HarperDB's replication uses a consistent resolution strategy (last-write-wins by logical timestamp), to ensure eventual consistency. HarperDB offers auditing capabilities that can be enabled to preserve a record of all changes so that mistakes or even malicious data changes are recorded and can be reverted.

Is HarperDB ACID-compliant?

HarperDB operations are atomic, consist, and isolated per instance. This means that any query will provide an isolated consistent snapshot view of the database (based on when the query started. Updating and insertion operations are also performed atomically; any reads and writes are performed within an atomic, isolated transaction with serialization isolation level, and will rollback if it can not be fully completed successfully. Data is immediately flushed to disk after a write to ensure eventual durability. ACID compliance is not guaranteed across instances in a cluster, rather the eventual consistency will propagate changes with last-write-wins (by last logical timestamp) resolution.

How Does HarperDB Secure My Data?

HarperDB has role and user based security allowing you to simply and easily control that the right people have access to your data. We also implement a number of authentication mechanisms to ensure the transactions submitted are trusted and secure.

Is HarperDB row or column oriented?

HarperDB can be considered column oriented, however, the exploded data model creates an interface that is free from either of these orientations. A user can search and update with columnar benefits and be as ACID as row oriented restrictions.

What do you mean when you say HarperDB is single model?

HarperDB takes every attribute of a database table object and creates a key:value for both the key and its corresponding value. For example, the attribute eye color will be represented by a key “eye-color” and the corresponding value “green” will be represented by a key with the value “green”. We use LMDB’s lightning-fast key:value store to underpin all these interrelated keys and values, meaning that every “column” is automatically indexed, and you get huge performance in a tiny package.

Are Primary Keys Case-Sensitive?

When using HarperDB, primary keys are case-sensitive. This can cause confusion for developers. For example, if you have a user table, it might make sense to use user.email as the primary key. This can cause problems as [email protected] and [email protected] would be seen as two different records. We recommend enforcing case on keys within your app to avoid this issue.

How Do I Move My HarperDB Data Directory?

HarperDB’s data directory can be moved from one location to another by simply updating the rootPath in the config file (where the data lives, which you specified during installation) to a new location.

Next, edit HarperDB’s hdb_boot_properties.file to point HarperDB to the new location by updating the settings_path variable. Substitute the NEW_HDB_ROOT variable in the snippets below with the new path to your new data directory, making sure you escape any slashes.

On MacOS/OSX

sed -i '' -E 's/^(settings_path[[:blank:]]*=[[:blank:]]*).*/\1NEW_HDB_ROOT\/harperdb-config.yaml/' ~/.harperdb/hdb_boot_properties.file

On Linux

sed -i -E 's/^(settings_path[[:blank:]]*=[[:blank:]]*).*/\1NEW_HDB_ROOT\/harperdb-config.yaml/' ~/hdb_boot_properties.file

Finally, edit the config file in the root folder you just moved:

Edit the rootPath parameter to reflect the new location of your data directory.

HarperDB Tucker (Version 4)

Did you know our release names are dedicated to employee pups? For our fourth release, we have Tucker.

G’day, I’m Tucker. My dad is David Cockerill, a software engineer here at HarperDB. I am a 3-year-old Labrador Husky mix. I love to protect my dad from all the squirrels and rabbits we have in our yard. I have very ticklish feet and love belly rubs!

4.1.0

HarperDB 4.1 introduces the ability to use worker threads for concurrently handling HTTP requests. Previously this was handled by processes. This shift provides important benefits in terms of better control of traffic delegation with support for optimized load tracking and session affinity, better debuggability, and reduced memory footprint.

This means debugging will be much easier for custom functions. If you install/run HarperDB locally, most modern IDEs like WebStorm and VSCode support worker thread debugging, so you can start HarperDB in your IDE, and set breakpoints in your custom functions and debug them.

The associated routing functionality now includes session affinity support. This can be used to consistently route users to the same thread which can improve caching locality, performance, and fairness. This can be enabled in with the http.sessionAffinity option in your configuration.

HarperDB 4.1's NoSQL query handling has been revamped to consistently use iterators, which provide an extremely memory efficient mechanism for directly streaming query results to the network as the query results are computed. This results in faster Time to First Byte (TTFB) (only the first record/value in a query needs to be computed before data can start to be sent), and less memory usage during querying (the entire query result does not need to be stored in memory). These iterators are also available in query results for custom functions and can provide means for custom function code to iteratively access data from the database without loading entire results. This should be a completely transparent upgrade, all HTTP APIs function the same, with the one exception that custom functions need to be aware that they can't access query results by [index] (they should use array methods or for-in loops to handle query results).

4.1 includes configuration options for specifying the location of database storage files. This allows you to specifically locate database directories and files on different volumes for better flexibility and utilization of disks and storage volumes. See the storage configuration and schemas configuration for information on how to configure these locations.

Logging has been revamped and condensed into one hdb.log file. See logging for more information.

A new operation called cluster_network was added, this operation will ping the cluster and return a list of enmeshed nodes.

Custom Functions will no longer automatically load static file routes, instead the @fastify/static plugin will need to be registered with the Custom Function server. See Host A Static Web UI-static.

Updates to S3 import and export mean that these operations now require the bucket region in the request. Also, if referencing a nested object it should be done in the key parameter. See examples here.

Due to the AWS SDK v2 reaching end of life support we have updated to v3. This has caused some breaking changes in our operations import_from_s3 and export_to_s3:

A new attribute region will need to be supplied
The bucket attribute can no longer have trailing slashes. Slashes will now need to be in the key.

Starting HarperDB without any command (just harperdb) now runs HarperDB like a standard process, in the foreground. This means you can use standard unix tooling for interacting with the process and is conducive for running HarperDB with systemd or any other process management tool. If you wish to have HarperDB launch itself in separate background process (and immediately terminate the shell process), you can do so by running harperdb start.

Internal Tickets completed:

CORE-609 - Ensure that attribute names are always added to global schema as Strings
CORE-1549 - Remove fastify-static code from Custom Functions server which auto serves content from "static" folder
CORE-1655 - Iterator based queries
CORE-1764 - Fix issue where describe_all operation returns an empty object for non super-users if schema(s) do not yet have table(s)
CORE-1854 - Switch to using worker threads instead of processes for handling concurrency
CORE-1877 - Extend the csv_url_load operation to allow for additional headers to be passed to the remote server when the csv is being downloaded
CORE-1893 - Add last updated timestamp to describe operations
CORE-1896 - Fix issue where Select * from system.hdb_info returns wrong HDB version number after Instance Upgrade
CORE-1904 - Fix issue when executing GEOJSON query in SQL
CORE-1905 - Add HarperDB YAML configuration setting which defines the storage location of NATS streams
CORE-1906 - Add HarperDB YAML configuration setting defining the storage location of tables.
CORE-1655 - Streaming binary format serialization
CORE-1943 - Add configuration option to set mount point for audit tables
CORE-1921 - Update NATS transaction lifecycle to handle message deduplication in work queue streams.
CORE-1963 - Update logging for better readability, reduced duplication, and request context information.
CORE-1968 - In server\nats\natsIngestService.js remove the js_msg.working(); line to improve performance.
CORE-1976 - Fix error when calling describe_table operation with no schema or table defined in payload.
CORE-1983 - Fix issue where create_attribute operation does not validate request for required attributes
CORE-2015 - Remove PM2 logs that get logged in console when starting HDB
CORE-2048 - systemd script for 4.1
CORE-2052 - Include thread information in system_information for visibility of threads
CORE-2061 - Add a better error msg when clustering is enabled without a cluster user set
CORE-2068 - Create new log rotate logic since pm2 log-rotate no longer used
CORE-2072 - Update to Node 18.15.0
CORE-2090 - Upgrade Testing from v4.0.x and v3.x to v4.1.
CORE-2091 - Run the performance tests
CORE-2092 - Allow for automatic patch version updates of certain packages
CORE-2109 - Add verify option to clustering TLS configuration
CORE-2111 - Update AWS SDK to v3

4.0.6

03/09/2023

Bug Fixes

Fixed a data serialization error that occurs when a large number of different record structures are persisted in a single table.

4.0.5

02/15/2023

Bug Fixes

CORE-2029 Improved the upgrade process for handling existing user TLS certificates and correctly configuring TLS settings. Added a prompt to upgrade to determine if new certificates should be created or existing certificates should be kept/used.
Fix the way NATS connections are honored in a local environment.
Do not define the certificate authority path to NATS if it is not defined in the HarperDB config.

4.0.4

01/27/2023

Bug Fixes

CORE-2009 Fixed bug where add node was not being called when upgrading clustering.

4.0.3

01/26/2023

Bug Fixes

CORE-2007 Add update nodes 4.0.0 launch script to build script to fix clustering upgrade.

4.0.2

01/24/2023

Bug Fixes

CORE-2003 Fix bug where if machine had one core thread config would default to zero.
Update to lmdb 2.7.3 and msgpackr 1.7.0

4.0.1

01/20/2023

Bug Fixes

CORE-1992 Local studio was not loading because the path got mangled in the build.
CORE-2001 Fixed deploy_custom_function_project after node update broke it.

Configuration File

HarperDB is configured through a YAML file called harperdb-config.yaml located in the operations API root directory (by default this is a directory named hdb located in the home directory of the current user).

All available configuration will be populated by default in the config file on install, regardless of whether it is used.

Using the Configuration File and Naming Conventions

The configuration elements in harperdb-config.yaml use camelcase: operationsApi.

To change a configuration value edit the harperdb-config.yaml file and save any changes. HarperDB must be restarted for changes to take effect.

Alternately, configuration can be changed via environment and/or command line variables or via the API. To access lower level elements, use underscores to append parent/child elements (when used this way elements are case insensitive):

- Environment variables: `OPERATIONSAPI_NETWORK_PORT=9925`
- Command line variables: `--OPERATIONSAPI_NETWORK_PORT 9925`
- Calling `set_configuration` through the API: `operationsApi_network_port: 9925`

Configuration Options

`clustering`

The clustering section configures the clustering engine, this is used to replicate data between instances of HarperDB.

Clustering offers a lot of different configurations, however in a majority of cases the only options you will need to pay attention to are:

clustering.enabled Enable the clustering processes.
clustering.hubServer.cluster.network.port The port other nodes will connect to. This port must be accessible from other cluster nodes.
clustering.hubServer.cluster.network.routesThe connections to other instances.
clustering.nodeName The name of your node, must be unique within the cluster.
clustering.user The name of the user credentials used for Inter-node authentication.

enabled - Type: boolean; Default: false

Enable clustering.

Note: If you enabled clustering but do not create and add a cluster user you will get a validation error. See user description below on how to add a cluster user.

clustering:
  enabled: true

clustering.hubServer.cluster

Clustering’s hubServer facilitates the HarperDB mesh network and discovery service.

clustering:
  hubServer:
    cluster:
      name: harperdb
      network:
        port: 9932
        routes:
          - host: 3.62.184.22
            port: 9932
          - host: 3.735.184.8
            port: 9932

name - Type: string, Default: harperdb

The name of your cluster. This name needs to be consistent for all other nodes intended to be meshed in the same network.

port - Type: integer, Default: 9932

The port the hub server uses to accept cluster connections

routes - Type: array, Default: null

An object array that represent the host and port this server will cluster to. Each object must have two properties port and host. Multiple entries can be added to create network resiliency in the event one server is unavailable. Routes can be added, updated and removed either by directly editing the harperdb-config.yaml file or by using the cluster_set_routes or cluster_delete_routes API endpoints.

host - Type: string

The host of the remote instance you are creating the connection with.

port - Type: integer

The port of the remote instance you are creating the connection with. This is likely going to be the clustering.hubServer.cluster.network.port on the remote instance.

clustering.hubServer.leafNodes

clustering:
  hubServer:
    leafNodes:
      network:
        port: 9931

port - Type: integer; Default: 9931

The port the hub server uses to accept leaf server connections.

clustering.hubServer.network

clustering:
  hubServer:
    network:
      port: 9930

port - Type: integer; Default: 9930

Use this port to connect a client to the hub server, for example using the NATs SDK to interact with the server.

clustering.leafServer

Manages streams, streams are ‘message stores’ that store table transactions.

clustering:
  leafServer:
    network:
      port: 9940
      routes:
        - host: 3.62.184.22
          port: 9931
        - host: node3.example.com
          port: 9931
    streams:
      maxAge: 3600
      maxBytes: 10000000
      maxMsgs: 500
      path: /user/hdb/clustering/leaf

port - Type: integer; Default: 9940

Use this port to connect a client to the leaf server, for example using the NATs SDK to interact with the server.

routes - Type: array; Default: null

An object array that represent the host and port the leaf node will directly connect with. Each object must have two properties port and host. Unlike the hub server, the leaf server will establish connections to all listed hosts. Routes can be added, updated and removed either by directly editing the harperdb-config.yaml file or by using the cluster_set_routes or cluster_delete_routes API endpoints.

host - Type: string

The host of the remote instance you are creating the connection with.

port - Type: integer

The port of the remote instance you are creating the connection with. This is likely going to be the clustering.hubServer.cluster.network.port on the remote instance.

clustering.leafServer.streams

maxAge - Type: integer; Default: null

The maximum age of any messages in the stream, expressed in seconds.

maxBytes - Type: integer; Default: null

The maximum size of the stream in bytes. Oldest messages are removed if the stream exceeds this size.

maxMsgs - Type: integer; Default: null

How many messages may be in a stream. Oldest messages are removed if the stream exceeds this number.

path - Type: string; Default: <ROOTPATH>/clustering/leaf

The directory where all the streams are kept.

logLevel - Type: string; Default: error

Control the verbosity of clustering logs.

clustering:
  logLevel: error

There exists a log level hierarchy in order as trace, debug, info, warn, and error. When the level is set to trace logs will be created for all possible levels. Whereas if the level is set to warn, the only entries logged will be warn and error. The default value is error.

nodeName - Type: string; Default: null

The name of this node in your HarperDB cluster topology. This must be a value unique from the rest of the cluster node names.

Note: If you want to change the node name make sure there are no subscriptions in place before doing so. After the name has been changed a full restart is required.

clustering:
  nodeName: great_node

tls

Transport Layer Security default values are automatically generated on install.

clustering:
  tls:
    certificate: ~/hdb/keys/certificate.pem
    certificateAuthority: ~/hdb/keys/ca.pem
    privateKey: ~/hdb/keys/privateKey.pem
    insecure: true
    verify: true

certificate - Type: string; Default: <ROOTPATH>/keys/certificate.pem

Path to the certificate file.

certificateAuthority - Type: string; Default: <ROOTPATH>/keys/ca.pem

Path to the certificate authority file.

privateKey - Type: string; Default: <ROOTPATH>/keys/privateKey.pem

Path to the private key file.

insecure - Type: boolean; Default: true

When true, will skip certificate verification. For use only with self-signed certs.

republishMessages - Type: boolean; Default: true

When true, all transactions that are received from other nodes are republished to this node's stream. When subscriptions are not fully connected between all nodes, this ensures that messages are routed to all nodes through intermediate nodes. This also ensures that all writes, whether local or remote, are written to the NATS transaction log. However, there is additional overhead with republishing, and setting this is to false can provide better data replication performance. When false, you need to ensure all subscriptions are fully connected between every node to every other node, and be aware that the NATS transaction log will only consist of local writes.

verify - Type: boolean; Default: true

When true, hub server will verify client certificate using the CA certificate.

user - Type: string; Default: null

The username given to the cluster_user. All instances in a cluster must use the same clustering user credentials (matching username and password).

Inter-node authentication takes place via a special HarperDB user role type called cluster_user.

The user can be created either through the API using an add_user request with the role set to cluster_user, or on install using environment variables CLUSTERING_USER=cluster_person CLUSTERING_PASSWORD=pass123! or CLI variables harperdb --CLUSTERING_USER cluster_person --CLUSTERING_PASSWORD pass123!

clustering:
  user: cluster_person

`customFunctions`

The customFunctions section configures HarperDB Custom Functions.

enabled - Type: boolean; Default: true

Enable the Custom Function server or not.

customFunctions:
  enabled: true

customFunctions.network

customFunctions:
  network:
    cors: true
    corsAccessList:
      - null
    headersTimeout: 60000
    https: false
    keepAliveTimeout: 5000
    port: 9926
    timeout: 120000

cors - Type: boolean; Default: true

Enable Cross Origin Resource Sharing, which allows requests across a domain.

corsAccessList - Type: array; Default: null

An array of allowable domains with CORS

headersTimeout - Type: integer; Default: 60,000 milliseconds (1 minute)

Limit the amount of time the parser will wait to receive the complete HTTP headers with.

https - Type: boolean; Default: false

Enables HTTPS on the Custom Functions API. This requires a valid certificate and key. If false, Custom Functions will run using standard HTTP.

keepAliveTimeout - Type: integer; Default: 5,000 milliseconds (5 seconds)

Sets the number of milliseconds of inactivity the server needs to wait for additional incoming data after it has finished processing the last response.

port - Type: integer; Default: 9926

The port used to access the Custom Functions server.

timeout - Type: integer; Default: Defaults to 120,000 milliseconds (2 minutes)

The length of time in milliseconds after which a request will timeout.

nodeEnv - Type: string; Default: production

Allows you to specify the node environment in which application will run.

customFunctions:
  nodeEnv: production

production native node logging is kept to a minimum; more caching to optimize performance. This is the default value.
development more native node logging; less caching.

root - Type: string; Default: <ROOTPATH>/custom_functions

The path to the folder containing Custom Function files.

customFunctions:
  root: ~/hdb/custom_functions

tls Transport Layer Security

customFunctions:
  tls:
    certificate: ~/hdb/keys/certificate.pem
    certificateAuthority: ~/hdb/keys/ca.pem
    privateKey: ~/hdb/keys/privateKey.pem

certificate - Type: string; Default: <ROOTPATH>/keys/certificate.pem

Path to the certificate file.

certificateAuthority - Type: string; Default: <ROOTPATH>/keys/ca.pem

Path to the certificate authority file.

privateKey - Type: string; Default: <ROOTPATH>/keys/privateKey.pem

Path to the private key file.

`ipc`

The ipc section configures the HarperDB Inter-Process Communication interface.

ipc:
  network:
    port: 9383

port - Type: integer; Default: 9383

The port the IPC server runs on. The default is 9383.

`localStudio`

The localStudio section configures the local HarperDB Studio, a simplified GUI for HarperDB hosted on the server. A more comprehensive GUI is hosted by HarperDB at https://studio.harperdb.io. Note, all database traffic from either localStudio or HarperDB Studio is made directly from your browser to the instance.

enabled - Type: boolean; Default: false

Enabled the local studio or not.

localStudio:
  enabled: false

`logging`

The logging section configures HarperDB logging across all HarperDB functionality. HarperDB leverages pm2 for logging. Each process group gets their own log file which is located in logging.root.

auditLog - Type: boolean; Default: false

Enabled table transaction logging.

logging:
  auditLog: false

To access the audit logs, use the API operation read_audit_log. It will provide a history of the data, including original records and changes made, in a specified table.

{
  "operation": "read_audit_log",
  "schema": "dev",
  "table": "dog"
}

file - Type: boolean; Default: true

Defines whether or not to log to a file.

logging:
  file: true

level - Type: string; Default: error

Control the verbosity of logs.

logging:
  level: error

There exists a log level hierarchy in order as trace, debug, info, warn, error, fatal, and notify. When the level is set to trace logs will be created for all possible levels. Whereas if the level is set to fatal, the only entries logged will be fatal and notify. The default value is error.

root - Type: string; Default: <ROOTPATH>/log

The path where the log files will be written.

logging:
  root: ~/hdb/log

rotation

Rotation provides the ability for a user to systematically rotate and archive the hdb.log file. To enable interval and/or maxSize must be set.

Note: interval and maxSize are approximates only. It is possible that the log file will exceed these values slightly before it is rotated.

logging:
  rotation:
    enabled: true
    compress: false
    interval: 1D
    maxSize: 100K
    path: /user/hdb/log

enabled - Type: boolean; Default: false

Enables logging rotation.

compress - Type: boolean; Default: false

Enables compression via gzip when logs are rotated.

interval - Type: string; Default: null

The time that should elapse between rotations. Acceptable units are D(ays), H(ours) or M(inutes).

maxSize - Type: string; Default: null

The maximum size the log file can reach before it is rotated. Must use units M(egabyte), G(igabyte), or K(ilobyte).

path - Type: string; Default: <ROOTPATH>/log

Where to store the rotated log file. File naming convention is HDB-YYYY-MM-DDT-HH-MM-SSSZ.log.

stdStreams - Type: boolean; Default: false

Log HarperDB logs to the standard output and error streams. The operationsApi.foreground flag must be enabled in order to receive the stream.

logging:
  stdStreams: false

`operationsApi`

The operationsApi section configures the HarperDB Operations API.

authentication

operationsApi:
  authentication:
    operationTokenTimeout: 1d
    refreshTokenTimeout: 30d

operationTokenTimeout - Type: string; Default: 1d

Defines the length of time an operation token will be valid until it expires. Example values: https://github.com/vercel/ms.

refreshTokenTimeout - Type: string; Default: 1d

Defines the length of time a refresh token will be valid until it expires. Example values: https://github.com/vercel/ms.

foreground - Type: boolean; Default: false

Determines whether or not HarperDB runs in the foreground.

operationsApi:
  foreground: false

network

operationsApi:
  network:
    cors: true
    corsAccessList:
      - null
    headersTimeout: 60000
    https: false
    keepAliveTimeout: 5000
    port: 9925
    timeout: 120000

cors - Type: boolean; Default: true

Enable Cross Origin Resource Sharing, which allows requests across a domain.

corsAccessList - Type: array; Default: null

An array of allowable domains with CORS

headersTimeout - Type: integer; Default: 60,000 milliseconds (1 minute)

Limit the amount of time the parser will wait to receive the complete HTTP headers with.

https - Type: boolean; Default: false

Enable HTTPS on the HarperDB operations endpoint. This requires a valid certificate and key. If false, HarperDB will run using standard HTTP.

keepAliveTimeout - Type: integer; Default: 5,000 milliseconds (5 seconds)

Sets the number of milliseconds of inactivity the server needs to wait for additional incoming data after it has finished processing the last response.

port - Type: integer; Default: 9925

The port the HarperDB operations API interface will listen on.

timeout - Type: integer; Default: Defaults to 120,000 milliseconds (2 minutes)

The length of time in milliseconds after which a request will timeout.

nodeEnv - Type: string; Default: production

Allows you to specify the node environment in which application will run.

operationsApi:
  nodeEnv: production

production native node logging is kept to a minimum; more caching to optimize performance. This is the default value.
development more native node logging; less caching.

tls

This configures the Transport Layer Security for HTTPS support.

operationsApi:
  tls:
    certificate: ~/hdb/keys/certificate.pem
    certificateAuthority: ~/hdb/keys/ca.pem
    privateKey: ~/hdb/keys/privateKey.pem

certificate - Type: string; Default: <ROOTPATH>/keys/certificate.pem

Path to the certificate file.

certificateAuthority - Type: string; Default: <ROOTPATH>/keys/ca.pem

Path to the certificate authority file.

privateKey - Type: string; Default: <ROOTPATH>/keys/privateKey.pem

Path to the private key file.

`http`

threads - Type: number; Default: One less than the number of logical cores/ processors

The threads option specifies the number of threads that will be used to service the HTTP requests for the operations API and custom functions. Generally, this should be close to the number of CPU logical cores/processors to ensure the CPU is fully utilized (a little less because HarperDB does have other threads at work), assuming HarperDB is the main service on a server.

http:
  threads: 11

sessionAffinity - Type: string; Default: null

HarperDB is a multi-threaded server designed to scale to utilize many CPU cores with high concurrency. Session affinity can help improve the efficiency and fairness of thread utilization by routing multiple requests from the same client to the same thread. This provides a fairer method of request handling by keeping a single user contained to a single thread, can improve caching locality (multiple requests from a single user are more likely to access the same data), and can provide the ability to share information in-memory in user sessions. Enabling session affinity will cause subsequent requests from the same client to be routed to the same thread.

To enable sessionAffinity, you need to specify how clients will be identified from the incoming requests. If you are using HarperDB to directly serve HTTP requests from users from different remote addresses, you can use a setting of ip. However, if you are using HarperDB behind a proxy server or application server, all the remote ip addresses will be the same and HarperDB will effectively only run on a single thread. Alternately, you can specify a header to use for identification. If you are using basic authentication, you could use the "Authorization" header to route requests to threads by the user's credentials. If you have another header that uniquely identifies users/clients, you can use that as the value of sessionAffinity. But be careful to ensure that the value does provide sufficient uniqueness and that requests are effectively distributed to all the threads and fully utilizing all your CPU cores.

http:
  sessionAffinity: ip

`rootPath`

rootPath - Type: string; Default: home directory of the current user

The HarperDB database and applications/API/interface are decoupled from each other. The rootPath directory specifies where the HarperDB application persists data, config, logs, and Custom Functions.

rootPath: /Users/jonsnow/hdb

`storage`

writeAsync - Type: boolean; Default: false

The writeAsync option turns off disk flushing/syncing, allowing for faster write operation throughput. However, this does not provide storage integrity guarantees, and if a server crashes, it is possible that there may be data loss requiring restore from another backup/another node.

storage:
  writeAsync: false

caching - Type: boolean; Default: true

The caching option enables in-memory caching of records, providing faster access to frequently accessed objects. This can incur some extra overhead for situations where reads are extremely random and don't benefit from caching.

storage:
  caching: true

compression - Type: boolean; Default: false

The compression option enables compression of records in the database. This can be helpful for very large databases in reducing storage requirements and potentially allowing more data to be cached. This uses the very fast LZ4 compression algorithm, but this still incurs extra costs for compressing and decompressing.

storage:
  compression: false

noReadAhead - Type: boolean; Default: true

The noReadAhead option advises the operating system to not read ahead when reading from the database. This provides better memory utilization, except in situations where large records are used or frequent range queries are used.

storage:
  noReadAhead: true

prefetchWrites - Type: boolean; Default: true

The prefetchWrites option loads data prior to write transactions. This should be enabled for databases that are larger than memory (although it can be faster to disable this for smaller databases).

storage:
  prefetchWrites: true

path - Type: string; Default: <rootPath>/schema

The path configuration sets where all database files should reside.

storage:
  path: /users/harperdb/storage

Note: This configuration applies to all database files, which includes system tables that are used internally by HarperDB. For this reason if you wish to use a non default path value you must move any existing schemas into your path location. Existing schemas is likely to include the system schema which can be found at <rootPath>/schema/system.

`schemas`

The schemas section is an optional configuration that can be used to define where database files should reside down to the table level. This configuration should be set before the schema and table have been created. The configuration will not create the directories in the path, that must be done by the user.

To define where a schema and all its tables should reside use the name of your schema and the path parameter.

schemas:
  nameOfSchema:
    path: /path/to/schema

To define where specific tables within a schema should reside use the name of your schema, the tables parameter, the name of your table and the path parameter.

schemas:
  nameOfSchema:
    tables:
      nameOfTable:
        path: /path/to/table

This same pattern can be used to define where the audit log database files should reside. To do this use the auditPath parameter.

schemas:
  nameOfSchema:
    auditPath: /path/to/schema

Setting the schemas section through the command line, environment variables or API

When using command line variables,environment variables or the API to configure the schemas section a slightly different convention from the regular one should be used. To add one or more configurations use a JSON object array.

Using command line variables:

--SCHEMAS [{\"nameOfSchema\":{\"tables\":{\"nameOfTable\":{\"path\":\"\/path\/to\/table\"}}}}]

Using environment variables:

SCHEMAS=[{"nameOfSchema":{"tables":{"nameOfTable":{"path":"/path/to/table"}}}}]

Using the API:

{
  "operation": "set_configuration",
  "schemas": [{
    "nameOfSchema": {
      "tables": {
        "nameOfTable": {
          "path": "/path/to/table"
        }
      }
    }
  }]
}