
Data Loader



The Data Loader is a built-in component that provides a reliable mechanism for loading data from JSON or YAML files into Harper tables as part of component deployment. This feature is particularly useful for ensuring specific records exist in your database when deploying components, such as seed data, configuration records, or initial application data.

Configuration

To use the Data Loader, specify your data files in the config.yaml file in your component directory:

dataLoader:
  files: 'data/*.json'

The Data Loader is an Extension and supports the standard files configuration option.

Data File Format

Data files can be either JSON or YAML files containing the records you want to load. Each data file must specify records for a single table. To load data into multiple tables, create a separate data file for each table.

Basic Example

Create a data file in your component's data directory (one table per file):

{
  "database": "myapp",
  "table": "users",
  "records": [
    {
      "id": 1,
      "username": "admin",
      "email": "admin@example.com",
      "role": "administrator"
    },
    {
      "id": 2,
      "username": "user1",
      "email": "user1@example.com",
      "role": "standard"
    }
  ]
}
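Before deploying, it can help to confirm that a data file has the shape shown above. The following is a minimal sketch, not part of Harper: `check_data_file` is a hypothetical helper, and treating `database`, `table`, and `records` as all required is an assumption drawn from the example rather than a documented rule.

```python
import json

# Assumption: these keys are taken from the example above, not from a formal spec.
REQUIRED_KEYS = ("database", "table", "records")


def check_data_file(text: str) -> list[str]:
    """Return a list of problems found in a JSON data file; empty means none found."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top level must be an object"]

    problems = []
    for key in REQUIRED_KEYS:
        if key not in doc:
            problems.append(f"missing key: {key}")

    # Records are expected to be a list of objects, as in the example.
    records = doc.get("records")
    if records is not None and not isinstance(records, list):
        problems.append("records must be a list")
    return problems
```

Running this over each file in a pre-deployment step would catch malformed files before Harper starts; it does not check records against your table schema.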

Multiple Tables

To load data into multiple tables, create separate data files for each table:

users.json:

{
  "database": "myapp",
  "table": "users",
  "records": [
    {
      "id": 1,
      "username": "admin",
      "email": "admin@example.com"
    }
  ]
}

settings.yaml:

database: myapp
table: settings
records:
  - id: 1
    setting_name: app_name
    setting_value: My Application
  - id: 2
    setting_name: version
    setting_value: "1.0.0"

File Organization

You can organize your data files in various ways:

Single File Pattern

dataLoader:
  files: 'data/seed-data.json'

Multiple Files Pattern

dataLoader:
  files: 
    - 'data/users.json'
    - 'data/settings.yaml'
    - 'data/initial-products.json'

Glob Pattern

dataLoader:
  files: 'data/**/*.{json,yaml,yml}'
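To preview which files a recursive pattern like this one would pick up, a short script can walk the data directory. This is an approximation written for illustration: `preview_data_files` is a hypothetical helper, and it filters by file extension instead of evaluating the glob itself, so edge cases (hidden files, for instance) may differ from Harper's own matching.

```python
from pathlib import Path

# Extensions covered by the {json,yaml,yml} part of the pattern.
DATA_SUFFIXES = {".json", ".yaml", ".yml"}


def preview_data_files(component_dir):
    """List JSON/YAML files under <component_dir>/data, recursively, as relative paths."""
    data_dir = Path(component_dir) / "data"
    return sorted(
        path.relative_to(component_dir).as_posix()
        for path in data_dir.rglob("*")
        if path.is_file() and path.suffix in DATA_SUFFIXES
    )
```

Calling `preview_data_files("my-component")` from the parent directory returns paths such as `data/users.json`, relative to the component root.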

Loading Behavior

When Harper starts up with a component that includes the Data Loader:

  1. The Data Loader reads all specified data files (JSON or YAML)

  2. For each file, it validates that a single table is specified

  3. Records are inserted or updated based on timestamp comparison:

    • New records are inserted if they don't exist

    • Existing records are updated only if the data file's modification time is newer than the record's updated time

    • This ensures data files can be safely reloaded without overwriting newer changes

  4. If a record with the same primary key already exists and the file is not newer, the record is left unchanged
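The timestamp rule described above can be sketched in a few lines. This is an illustrative model only, not Harper's implementation: the in-memory `table` dictionary, the `apply_data_file` function, and the choice to stamp updated records with the file's modification time are all assumptions made for the example.

```python
def apply_data_file(table, records, file_mtime, primary_key="id"):
    """Insert new records; update existing ones only if the file is newer.

    `table` maps primary key -> {"record": dict, "updated": float}. This
    in-memory shape is a stand-in for a Harper table, not its real storage.
    """
    result = {"inserted": 0, "updated": 0, "skipped": 0}
    for record in records:
        key = record[primary_key]
        existing = table.get(key)
        if existing is None:
            # No record with this primary key yet: insert it.
            table[key] = {"record": dict(record), "updated": file_mtime}
            result["inserted"] += 1
        elif file_mtime > existing["updated"]:
            # The file is newer than the stored record: overwrite it.
            # Assumption: the record is stamped with the file's mtime.
            table[key] = {"record": dict(record), "updated": file_mtime}
            result["updated"] += 1
        else:
            # The stored record is at least as new as the file: keep it.
            result["skipped"] += 1
    return result
```

Under this model, loading the same file twice changes nothing the second time, which is the property that makes reloading safe.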

Best Practices

  1. One Table Per File: Remember that each data file can only load records into a single table. Organize your files accordingly.

  2. Idempotency: Design your data files to be idempotent - they should be safe to load multiple times without creating duplicate or conflicting data.

  3. Version Control: Include your data files in version control to ensure consistency across deployments.

  4. Environment-Specific Data: Consider using different data files for different environments (development, staging, production).

  5. Data Validation: Ensure your data files are valid JSON or YAML and match your table schemas before deployment.

  6. Sensitive Data: Avoid including sensitive data like passwords or API keys directly in data files. Use environment variables or secure configuration management instead.
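Some of these practices can be checked mechanically before deployment. The sketch below flags records that lack a primary key, repeated primary keys within one file, and field names that look like secrets. It is a hypothetical helper, not a Harper feature: the `id` default, the name pattern, and the decision to treat a missing primary key as a warning are assumptions chosen for illustration.

```python
import re

# Assumption: a rough pattern for field names that may hold secrets.
SENSITIVE_NAME = re.compile(r"password|secret|token|api[_-]?key", re.IGNORECASE)


def lint_records(records, primary_key="id"):
    """Return warnings about primary keys and sensitive-looking field names."""
    warnings = []
    seen = set()
    for index, record in enumerate(records):
        key = record.get(primary_key)
        if key is None:
            # Without an explicit key, reloading the file may not be idempotent.
            warnings.append(f"record {index}: missing {primary_key}")
        elif key in seen:
            warnings.append(f"record {index}: duplicate {primary_key} {key!r}")
        else:
            seen.add(key)
        for field in record:
            if SENSITIVE_NAME.search(field):
                warnings.append(f"record {index}: field {field!r} looks sensitive")
    return warnings
```

An empty list means none of these checks fired; it does not guarantee that the file matches your table schema.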

Example Component Structure

my-component/
├── config.yaml
├── data/
│   ├── users.json
│   ├── roles.json
│   └── settings.json
├── schemas.graphql
└── roles.yaml

With this structure, your config.yaml might look like:

# Load environment variables first
loadEnv:
  files: '.env'

# Define schemas
graphqlSchema:
  files: 'schemas.graphql'

# Define roles
roles:
  files: 'roles.yaml'

# Load initial data
dataLoader:
  files: 'data/*.json'

# Enable REST endpoints
rest: true

Related Documentation

Note: While the Data Loader can create tables automatically by inferring the schema from the provided records, it's recommended to define your table schemas explicitly using the graphqlSchema component for better control and type safety.

Define Schemas First: While the Data Loader can infer schemas, it's strongly recommended to define your table schemas and relations explicitly using the graphqlSchema component before loading data. This ensures proper data types, constraints, and relationships between tables.

  • Built-In Components
  • Extensions
  • Bulk Operations - For loading data via the Operations API