Dynamic Schema

Last updated 2 months ago

When tables are created without a schema, either through the operations API (without specifying attributes) or through Studio, they follow "dynamic schema" behavior. It is generally best practice to define schemas for your tables: doing so ensures predictable, consistent structures with data integrity and gives precise control over indexing, independent of the data itself. However, it is often simpler and quicker to create a table and let the data generate the schema dynamically, with every attribute automatically indexed for broad querying.

With dynamic schemas, individual attributes are reflexively created as data is ingested, meaning the table adapts to the structure of the ingested data. Harper tracks the metadata around schemas, tables, and attributes, which allows for the describe_table, describe_schema, and describe_all operations.
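The reflexive behavior described above can be sketched with a small in-memory model. This is only an illustration of the idea, not Harper's implementation; the DynamicTable class and its ingest and describe methods are hypothetical names introduced for this sketch.

```python
class DynamicTable:
    """Toy in-memory stand-in for a dynamic-schema table (not Harper's code)."""

    def __init__(self, name, primary_key):
        self.name = name
        self.primary_key = primary_key
        self.attributes = [primary_key]  # attribute names, in the order first seen
        self.records = {}                # primary key value -> record dict

    def ingest(self, record):
        """Store a record, registering any attribute name not seen before."""
        for attribute in record:
            if attribute not in self.attributes:
                self.attributes.append(attribute)
        key = record[self.primary_key]
        # Merge into an existing record so updates only touch the fields sent.
        self.records.setdefault(key, {}).update(record)

    def describe(self):
        """Return table metadata, loosely in the spirit of a describe operation."""
        return {
            "name": self.name,
            "primary_key": self.primary_key,
            "attributes": list(self.attributes),
        }


table = DynamicTable("dog", "id")
table.ingest({"id": 1, "dog_name": "Penny", "owner_name": "Kyle"})
table.ingest({"id": 2, "weight_lbs": 35})
print(table.describe()["attributes"])
# prints ['id', 'dog_name', 'owner_name', 'weight_lbs']
```

The attribute list only ever grows as records arrive, which is the sense in which the table adapts to the data.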

Databases

A Harper database holds a collection of tables together in a single file, and those tables are transactionally connected. This means that operations across tables within a database can be performed in a single atomic transaction. By default, tables are added to the default database, called "data", but other databases can be created and specified for tables.

Tables

Harper tables group records together with a common data pattern. To create a table, users must provide a table name and a primary key.

  • Table Name: Used to identify the table.

  • Primary Key: A required attribute that serves as the unique identifier for a record. It is also known as the hash_attribute in the Harper operations API.

Primary Key

The primary key (also referred to as the hash_attribute) is used to uniquely identify records. Uniqueness is enforced on the primary key; inserts with the same primary key will be rejected. If a primary key is not provided on insert, a GUID will be automatically generated and returned to the user. The Harper Storage Algorithm utilizes this value for indexing.
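As a rough illustration of these two rules (rejecting duplicate keys and generating a GUID when no key is supplied), here is a minimal sketch over a plain dictionary. The insert_record function and DuplicateKeyError exception are hypothetical; raising an exception is a choice made for this sketch, and how Harper itself reports a rejected insert is not modeled here.

```python
import uuid


class DuplicateKeyError(Exception):
    """Raised by this sketch when a primary key value already exists."""


def insert_record(store, record, primary_key="id"):
    """Insert a record into a dict keyed by primary key and return the key used."""
    key = record.get(primary_key)
    if key is None:
        # No key supplied: generate a GUID, as the text above describes.
        key = str(uuid.uuid4())
    if key in store:
        raise DuplicateKeyError(f"{primary_key}={key!r} already exists")
    store[key] = {**record, primary_key: key}
    return key


store = {}
insert_record(store, {"id": 1, "dog_name": "Penny"})
generated = insert_record(store, {"dog_name": "Harper"})
print("generated key:", generated)

try:
    insert_record(store, {"id": 1, "dog_name": "Impostor"})
except DuplicateKeyError as error:
    print("rejected:", error)
```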

Standard Attributes

In tables that use dynamic schemas, additional attributes are reflexively added via insert and update operations (in both SQL and NoSQL) when new attributes are included in the data structure provided to Harper. As a result, schemas are additive: new attributes are created in the underlying storage algorithm as additional data structures are provided. Harper offers create_attribute and drop_attribute operations for users who prefer to define their data model manually, independent of data ingestion. When new attributes are added to tables with existing data, the value of the new attribute is assumed to be null for all existing records.
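A short sketch of both ideas in this paragraph follows. The helper names attribute_operation and read_with_schema are hypothetical, and the field names in the operation body ("database", "table", "attribute") are assumed to follow the pattern of the other operations on this page; check the operations API reference before relying on them.

```python
import json


def attribute_operation(operation, database, table, attribute):
    """Build the body for a manual create_attribute or drop_attribute call."""
    if operation not in ("create_attribute", "drop_attribute"):
        raise ValueError(f"unsupported operation: {operation}")
    return {
        "operation": operation,
        "database": database,
        "table": table,
        "attribute": attribute,  # assumed field name, see the note above
    }


def read_with_schema(record, attributes):
    """Project a stored record onto the table's attributes.

    Attributes the record never received read as None, mirroring the
    "assumed null for existing records" rule.
    """
    return {name: record.get(name) for name in attributes}


body = attribute_operation("create_attribute", "dev", "dog", "weight_lbs")
print(json.dumps(body, indent=4))

existing = {"id": 1, "dog_name": "Penny", "owner_name": "Kyle"}
print(read_with_schema(existing, ["id", "dog_name", "owner_name", "weight_lbs"]))
```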

Audit Attributes

Harper automatically creates two audit attributes on each record if the table is created without a schema.

  • __createdtime__: The time the record was created, in Unix Epoch with milliseconds format.

  • __updatedtime__: The time the record was updated, in Unix Epoch with milliseconds format.
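Since both audit attributes hold Unix Epoch values in milliseconds, client code usually needs to convert them before display. A minimal sketch, assuming the value arrives as an integer; the function names and the sample timestamp are illustrative only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_datetime(ms):
    """Convert a Unix Epoch value in milliseconds to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def datetime_to_epoch_ms(moment):
    """Convert an aware datetime to Unix Epoch milliseconds (integer arithmetic)."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


# Illustrative value, not taken from a real record.
created = epoch_ms_to_datetime(1700000000000)
print(created.isoformat())
# prints 2023-11-14T22:13:20+00:00
```

Dividing by a timedelta rather than multiplying a float keeps the round trip exact to the millisecond.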

Dynamic Schema Example

Create a Database

{
    "operation": "create_database",
    "database": "dev"
}

Create a Table

Notice that the database name, table name, and primary key name are the only required parameters.

{
    "operation": "create_table",
    "database": "dev",
    "table": "dog",
    "primary_key": "id"
}
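For readers who want to send this body from a script, the following is a hedged sketch using only the Python standard library. It assumes the operations API is listening at http://localhost:9925 and accepts Basic authentication; the URL, the HDB_ADMIN username, and the password are placeholders, and build_operation_request and send_operation are hypothetical helper names.

```python
import base64
import json
import urllib.request


def build_operation_request(operation, url="http://localhost:9925",
                            username="HDB_ADMIN", password="password"):
    """Wrap an operation body in a POST request with a Basic auth header.

    The default URL and credentials are placeholders for this sketch.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(operation).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def send_operation(operation, **connection):
    """Send the request and decode the JSON response (needs a running instance)."""
    request = build_operation_request(operation, **connection)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


create_table = {
    "operation": "create_table",
    "database": "dev",
    "table": "dog",
    "primary_key": "id",
}
request = build_operation_request(create_table)
print(request.get_method(), request.full_url)
```

Calling send_operation(create_table) would perform the actual request; it is left uncalled here so the sketch runs without a server.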

At this point the table does not have structure beyond what we provided, so the table looks like this:

dev.dog

Insert Record

To define attributes we do not need to do anything beyond sending them in with an insert operation.

{
    "operation": "insert",
    "database": "dev",
    "table": "dog",
    "records": [
      {"id": 1, "dog_name": "Penny", "owner_name": "Kyle"}
    ]
}

With a single record inserted and new attributes defined, our table now looks like this:

dev.dog

Indexes have been automatically created for the dog_name and owner_name attributes.

Insert Additional Record

If we continue inserting records with the same structure, no schema updates are required. One of the records below omits the primary key (hash attribute) to demonstrate GUID generation.

{
    "operation": "insert",
    "database": "dev",
    "table": "dog",
    "records": [
        {"id": 2, "dog_name": "Monk", "owner_name": "Aron"},
        {"dog_name": "Harper","owner_name": "Stephen"}
    ]
}

In this case, there is no change to the schema. Our table now looks like this:

dev.dog

Update Existing Record

In this case, we will update a record with a new attribute not previously defined on the table.

{
    "operation": "update",
    "database": "dev",
    "table": "dog",
    "records": [
      {"id": 2, "weight_lbs": 35}
    ]
}

Now we have a new attribute called weight_lbs. Our table now looks like this:

dev.dog

Query Table with SQL

Now if we query for all records where weight_lbs is null we expect to get back two records.

{
    "operation": "sql",
    "sql": "SELECT * FROM dev.dog WHERE weight_lbs IS NULL"
}

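The query returns the expected two records: Penny (id 1) and Harper (the record with the generated GUID). Neither was ever given a weight_lbs value, so the attribute is null for both.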
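To see why two records match, the final state of the table can be replayed in plain Python. This is a local simulation of the filter, not a call to Harper; the where_is_null helper is hypothetical, and the GUID for the third record is generated locally as a stand-in for the one Harper would assign.

```python
import uuid

# State of dev.dog after the two inserts and the update in this example.
records = [
    {"id": 1, "dog_name": "Penny", "owner_name": "Kyle"},
    {"id": 2, "dog_name": "Monk", "owner_name": "Aron", "weight_lbs": 35},
    {"id": str(uuid.uuid4()), "dog_name": "Harper", "owner_name": "Stephen"},
]


def where_is_null(rows, attribute):
    """Keep rows whose attribute is missing or None, like an IS NULL filter."""
    return [row for row in rows if row.get(attribute) is None]


matches = where_is_null(records, "weight_lbs")
print([row["dog_name"] for row in matches])
# prints ['Penny', 'Harper']
```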

The example above walks through this behavior step by step using Harper API operations.