v5-migration

Harper version 5.0 includes many updates that provide a cleaner, more consistent, and more secure environment. It also includes some breaking changes, and this guide describes how to update your applications for them. Note that applications with timing-sensitive race conditions, or that rely on undocumented features or bugs, are prone to breakage at any point, including major version upgrades. This document describes the important changes to make for applications correctly built on documented APIs.

Naming Changes

HarperDB is now named Harper, and the package name reflects this change. Harper should now be installed from the new package names. The open source edition can be installed with:

npm i -g harper

The pro edition can be installed with:

npm i -g @harperfast/harper-pro

Application code should import from the harper package instead of harperdb:

import { tables } from 'harper';

Package Installation Install Scripts

By default, Harper now uses the --ignore-scripts flag when installing packages to prevent accidental execution of install scripts, which can be a significant security risk. If you are installing applications that require installation scripts to run (sometimes necessary for installing additional binaries), use the allowInstallScripts option when deploying.

`Table.get` return value

The return value of Table.get has been changed to a record object instead of an instance of the table class (previously this behavior only occurred in classes that had set static loadAsInstance=false). This means that the returned object does not have all the table instance methods available. Most functionality is still available through the Table class. One notable method that had been commonly used is wasLoadedFromSource. Information about whether the request was fulfilled from cache or origin is now available on the request target object. For example, if you have existing code like:

const record = await Table.get(id);
// old method:
if (record.wasLoadedFromSource()) {
	// record was loaded from origin (not cache)
}

You should update this code to:

const target = new RequestTarget(); // note that this is passed in if you are overriding the `get` method
target.id = id;
const record = await Table.get(target);
// new way of checking if it was loaded from source:
if (target.loadedFromSource) {
	// record was loaded from origin (not cache)
}

The record objects do have getUpdatedTime and getExpiresAt methods available.

Frozen Records

The record object is also frozen. This means that you cannot add, remove, or change properties on the record object, and if you want a modified version of the record, you must create a new object or copy the existing one. For example, if you had code like:

const record = await Table.get(id);
record.property = 'changed';

You would need to change this to:

let record = await Table.get(id);
record = { ...record, property: 'changed' };
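The copy-instead-of-mutate pattern can be tried out without a running Harper instance. The following is a minimal sketch in plain JavaScript that uses Object.freeze as a stand-in for a record returned by Table.get. The record contents and the helper names withChanges and tryDirectWrite are hypothetical and not part of the Harper API.

```javascript
// Stand-in for a record returned by Table.get: a frozen plain object.
// (Hypothetical data; a real record would come from a Harper table.)
const record = Object.freeze({ id: 'a1', status: 'pending' });

// Returns a modified copy and leaves the frozen original untouched.
function withChanges(original, changes) {
	return { ...original, ...changes };
}

// In strict mode (which ES modules always use), writing to a frozen object
// throws a TypeError instead of silently doing nothing.
function tryDirectWrite(target) {
	'use strict';
	try {
		target.status = 'ready';
		return 'written';
	} catch (error) {
		return error instanceof TypeError ? 'TypeError' : 'other error';
	}
}

const updated = withChanges(record, { status: 'ready' });
```

Note that the spread copy is shallow: nested objects and arrays in the copy are still shared with the original, so copy those as well before changing them.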

Transactions and Context

With RocksDB, transactions are now fully supported through the storage engine, providing a consistent ability to read and query data that has been written to the transaction. This does result in behavioral changes if code had previously not expected written data to be visible in queries until after a commit.

Harper v5 now uses asynchronous context tracking to automatically preserve context and the current transaction across calls and asynchronous operations. Context is used to track the current transaction. Previously, transactions were only applied to calls to other tables if they were explicitly included in the arguments. Now context is implicitly and automatically carried to other calls (this was also the behavior in v4.x with static loadAsInstance=false). Previous code may have omitted the context from a call to another table in order to exclude that call from a transaction. Such code should be updated to explicitly commit/finish the transaction, or start a new transaction, in order to see newly visible data. For example, if you had a function that polled to determine when a record was updated:

import { setTimeout as delay } from 'node:timers/promises';
class MyResource {
	static async get(target) {
		// this function runs within a transaction, with a consistent snapshot of data that won't change. Previous code could
		// call Table.get without a context, so it would not use the current transaction and would instead get the latest data
		while ((await Table.get(target)).status !== 'ready') {
			await delay(100); // await the delay so the loop actually pauses between polls
		}
		return Table.get(target);
	}
}

Now the internal Table.get will automatically use the current transaction, which will never change and won't receive updated data. So we should explicitly commit the transaction to see the updated data and/or start a new transaction for each get request to see the latest data:

import { setTimeout as delay } from 'node:timers/promises';
import { getContext, transaction } from 'harper';
class MyResource {
	static async get(target) {
		// this function is still within a transaction, with a consistent snapshot of data that won't change, but we should
		// explicitly commit the transaction to see the updated data
		await getContext().transaction.commit();
		// now we can call Table.get and it will read the latest data.
		// we could also explicitly start a new transaction here for each get:
		while ((await transaction(() => Table.get(target))).status !== 'ready') {
			await delay(100); // await the delay so the loop actually pauses between polls
		}
		return Table.get(target);
	}
}

Automatic context tracking can greatly simplify code and handle transactions automatically, but it introduces subtle shifts in logic. Explicitly committing/finishing transactions is important if you are executing code outside the context of a Harper request.
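To build intuition for how a context can follow a request without being passed as an argument, here is a minimal sketch of asynchronous context tracking using Node's AsyncLocalStorage. This guide does not say how Harper implements its tracking, so treat this as an illustration of the general technique only. The names contextStore, currentContext, readStatus, and handleRequest are hypothetical, and CommonJS require is used just to keep the sketch self-contained.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');
const { setTimeout: delay } = require('node:timers/promises');

// Hypothetical store standing in for a framework's internal context tracking.
const contextStore = new AsyncLocalStorage();

// Hypothetical helper, loosely analogous in spirit to getContext().
function currentContext() {
	return contextStore.getStore();
}

// A nested call that never receives the context as an argument.
async function readStatus(id) {
	await delay(5); // the context survives asynchronous boundaries
	const context = currentContext();
	return { id, transactionId: context ? context.transactionId : undefined };
}

// Everything called inside run() sees the same context object.
function handleRequest(transactionId, id) {
	return contextStore.run({ transactionId }, () => readStatus(id));
}

// Two overlapping requests each keep their own context.
const results = Promise.all([handleRequest('txn-1', 'a'), handleRequest('txn-2', 'b')]);
```

Because the nested call picks up the caller's context automatically, leaving out an argument no longer opts a call out of the surrounding transaction, which is why explicit commits or new transactions are needed in the polling example above.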

Spawning new processes (via `node:child_process`)

The ability to spawn new processes is a dangerous pathway for exploitation and security vulnerabilities. Additionally, spawning processes from multiple threads presents unique challenges and hazards. In Harper version 5, spawning new processes (through Node's child_process module) is more tightly controlled and managed.

First, any spawn, exec, or execFile call may only run executables or commands that have been registered in the applications.allowedSpawnCommands configuration. This provides a much more secure environment and helps prevent malicious intrusions.

Second, child processes are commonly spawned by code written with the expectation that it runs in a single thread for an indefinite period of time. However, Harper runs multiple threads that may frequently be restarted. When starting a supporting process, spawning every time a module loads leads to a multiplication of processes and to orphaned processes. Harper now manages spawning to ensure that only a single process is started. To make this possible, the spawn, exec, etc. functions require a name property in the options argument. This creates a named process that other threads can check for, so that they skip starting a new process if one is already running. If you really want to start a separate process from a previously started process, a new name must be provided.
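As a rough sketch of what a named spawn call might look like, the following assumes that node is listed under applications.allowedSpawnCommands. The process name report-worker, the script path, and the function name startReportWorker are hypothetical. What the call returns in a thread where the process is already running is not described here, so the sketch does not rely on it. CommonJS require is used only to keep the sketch self-contained.

```javascript
const { spawn } = require('node:child_process');

// The process name and script path below are hypothetical.
const workerOptions = {
	name: 'report-worker', // required in Harper v5; identifies the process across threads
	stdio: 'ignore',
};

// May be called from any thread. Per the description above, Harper starts
// only one process for a given name, so repeated calls with the same name
// should not multiply processes.
function startReportWorker() {
	// 'node' must be registered in applications.allowedSpawnCommands.
	return spawn('node', ['./report-worker.js'], workerOptions);
}
```

Choosing a stable, descriptive name per supporting process keeps restarts of Harper threads from creating duplicates. Use a different name only when a genuinely separate process is intended.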

Response Objects

Harper has expanded support for using standard Response-like objects in the API. In particular, if you return an object from a REST method with a headers property, this will be used as the response headers.
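A minimal sketch of returning a Response-like object from a REST method follows. Only the headers property is described above. The status and body fields are assumptions based on the shape of a standard Response, and the class name ReportResource is hypothetical (a real endpoint would be a Harper resource or table class).

```javascript
// Hypothetical resource class used only for illustration.
class ReportResource {
	static async get(target) {
		return {
			status: 200, // assumed Response-like field
			headers: {
				// per the text above, these are used as the response headers
				'Content-Type': 'application/json',
				'Cache-Control': 'max-age=60',
			},
			body: JSON.stringify({ id: target.id }), // assumed Response-like field
		};
	}
}
```

Check the Resources API documentation for the exact set of supported properties before relying on anything other than headers.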

`blob.save()` removed

The blob.save() method has been removed. Please use the saveBeforeCommit flag in the options to the Blob constructor instead.

VM Module Loader

Harper v5 loads application modules through Node.js's VM module API, giving each application its own module cache and execution context. This provides per-application context: the logger global/export is automatically tagged with the application name, and config reflects that application's own configuration. Each application's module graph is isolated from other applications and from Harper internals.

All module loading behavior is controlled by the applications section in harperdb-config.yaml:

applications:
  lockdown: freeze-after-load # default; see below
  moduleLoader: vm # vm (default) | native | compartment
  dependencyLoader: auto # auto (default) | app | native
  allowedDirectory: app # app (default) | any
  allowedSpawnCommands: # see "Spawning new processes" above
    - npm
    - node
  # allowedBuiltinModules: [] # if omitted, all Node.js built-ins are allowed

Intrinsic Lockdown

The default lockdown mode (freeze-after-load) freezes JavaScript intrinsics (Object, Array, Promise, Map, Set, and others) after all application code has loaded. This prevents prototype pollution attacks. If application code or a dependency attempts to modify intrinsic prototypes at runtime (after startup), the attempt throws a TypeError.

Available lockdown modes:

freeze-after-load — freeze intrinsics after all components have loaded (default)
freeze — freeze intrinsics before loading any application code
ses — full SES lockdown via the ses package (strictest; most likely to break packages that mutate built-ins)
none — no lockdown

If a dependency modifies intrinsic prototypes and you need a temporary workaround, set lockdown: none.
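To see what kind of code is affected by frozen intrinsics, here is a small sketch in plain JavaScript. So that it does not alter the real built-ins of whatever process runs it, it freezes the prototype of a local Array subclass as a stand-in for a frozen intrinsic. The names List, patchPrototype, and last are hypothetical.

```javascript
// Stand-in for a frozen intrinsic: a local subclass whose prototype is frozen.
class List extends Array {}
Object.freeze(List.prototype);

// What a prototype-patching dependency effectively does at runtime.
function patchPrototype(proto) {
	'use strict';
	try {
		proto.last = function () {
			return this[this.length - 1];
		};
		return 'patched';
	} catch (error) {
		return error instanceof TypeError ? 'TypeError' : 'other error';
	}
}

// Lockdown-friendly alternative: a standalone helper instead of a patch.
function last(items) {
	return items[items.length - 1];
}

const outcome = patchPrototype(List.prototype);
```

Code that follows the second pattern works the same in every lockdown mode, so replacing prototype patches with helpers is a more durable fix than disabling lockdown.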

Allowed Directory

In production, applications can only load modules from within their own directory tree (allowedDirectory: app). Attempting to load a module from outside that directory will throw. Dev mode installs default to allowedDirectory: any, so local development is typically unaffected.

If your application legitimately needs to load files from outside its own directory in production, set:

applications:
  allowedDirectory: any

Allowed Built-in Modules

By default all Node.js built-in modules are accessible. To restrict which built-ins applications may import, set an explicit allowlist:

applications:
  allowedBuiltinModules:
    - fs
    - path
    - http

Dependency Loading

By default (dependencyLoader: auto), npm packages that do not declare harper as a dependency are loaded with the native Node.js loader. Packages that do depend on harper are loaded through the VM loader so they receive application context. Set dependencyLoader: app to always use the VM loader for dependencies, or native to always use the native loader for packages.

Disabling the VM Loader

If the VM loader is causing compatibility issues with existing code, it can be disabled entirely:

applications:
  moduleLoader: native

This restores pre-v5 behavior where modules are loaded with a standard import(). Application-specific context (tagged logging, per-app config) will not be available in native mode.

If the goal is only to fix package compatibility while keeping application context for first-party code, dependencyLoader: native is a narrower option. It uses native loading only for npm packages while keeping the VM loader for application source files.

Recommended Changes

The migration information above highlights necessary changes to make to existing applications, if they have used any of these patterns or features. However, we also have new recommended best practices for applications. These are not necessary, but can help to ensure that your application is using the best patterns.

We recommend using the static methods on Resources/Tables to implement endpoints. This should be used in conjunction with accessing request information from the request target argument (or the request itself via getContext). See the Resources API for more information.
Context does not need to be explicitly passed to every call, and can be accessed through the getContext function available as an export from the harper package.
Harper functions/APIs should be accessed through the harper package rather than through the global variables.
