Applications
HarperDB is more than a database; it's a distributed clustering platform that lets you package your schema, endpoints, and application logic and deploy them to an entire fleet of HarperDB instances optimized for scalable, on-the-edge data delivery.
In this guide, we are going to explore the extensible architecture that HarperDB provides by building a HarperDB component, a fundamental building block of the HarperDB ecosystem.
When working through this guide, we recommend you use the HarperDB Application Template repo as a reference.
HarperDB provides several types of components. Any package that is added to HarperDB is called a "component", and components are generally categorized as either "applications", which deliver a set of endpoints for users, or "extensions", which are building blocks for features like authentication, additional protocols, and connectors that can be used by other components. Components can be added to the `hdb/components` directory and will be loaded by HarperDB when it starts. Components that are remotely deployed to HarperDB (through the studio or the operations API) are installed into the `hdb/node_modules` directory. Using `harperdb run .` or `harperdb dev .` allows us to load a specific application in addition to any that have been manually added to `hdb/components` or installed (in `hdb/node_modules`).
We assume you are running HarperDB version 4.2 or greater, which supports the HarperDB application architecture (in previous versions, this functionality was called 'custom functions').
Let's create and initialize a new directory for our application. It is recommended that you start with the HarperDB application template. Assuming you have `git` installed, you can create your project directory by cloning:
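```bash
# clone the application template into a new my-app directory
git clone https://github.com/HarperDB/application-template my-app
cd my-app
```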
The core of a HarperDB application is the database, so let's create a database table!
A quick and expressive way to define a table is through a GraphQL schema. Using your editor of choice, edit the file named `schema.graphql` in the root of the application directory, `my-app`, that we created above. To create a table, we will need to add a `type` with the `@table` directive, named `Dog` (you can remove the example table that comes with the template):
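```graphql
# schema.graphql
type Dog @table
```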
And then we'll add a primary key named `id` of type `ID`:
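```graphql
# @primaryKey marks the table's primary key attribute
type Dog @table {
  id: ID @primaryKey
}
```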
(Note: A GraphQL schema is a fast method to define tables in HarperDB, but you are by no means required to use GraphQL to query your application, nor should you necessarily do so)
Now we tell HarperDB to run this as an application:
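```bash
# run from inside the my-app directory; dev mode watches for file changes
harperdb dev .
```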
HarperDB will now create the `Dog` table and the `id` attribute we just defined. Not only is this an easy way to create a table, but the schema is included in our application, which ensures that this table exists wherever we deploy the application (to any HarperDB instance).
Next, let's expand our `Dog` table by adding additional typed attributes for dog `name`, `breed`, and `age`:
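```graphql
# the attribute types shown here are a reasonable sketch; adjust as needed
type Dog @table {
  id: ID @primaryKey
  name: String
  breed: String
  age: Int
}
```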
This will ensure that new records must have these properties with these types.
Because we ran `harperdb dev .` earlier (dev mode), HarperDB is now monitoring the contents of our application directory for changes and reloading when they occur. This means that once we save our schema file with these new attributes, HarperDB will automatically reload our application, read `my-app/schema.graphql`, and update the `Dog` table and attributes we just defined. Dev mode also ensures that any logging or errors are immediately displayed in the console (rather than only in the log file).
As a NoSQL database, HarperDB supports heterogeneous records (also referred to as documents), so you can freely specify additional properties on any record. If you do want to restrict records to only the defined properties, you can do so by adding the `@sealed` directive:
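```graphql
# @sealed restricts records to the declared properties
type Dog @table @sealed {
  id: ID @primaryKey
  name: String
  breed: String
  age: Int
}
```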
If you are using HarperDB Studio, you can now add JSON-formatted records to this new table in the studio, or upload data as CSV from a local file or URL. A third, more advanced way to add data to your database is to use the operations API, which provides full administrative control over your new HarperDB instance and tables.
Now that we have a running application with a database (and data, if you imported any), let's make this data accessible from a RESTful URL by adding an endpoint. To do this, we simply add the `@export` directive to our `Dog` table:
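```graphql
# @export exposes the table as a REST endpoint
type Dog @table @export {
  id: ID @primaryKey
  name: String
  breed: String
  age: Int
}
```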
By default, the application HTTP server port is `9926` (this can be configured here), so the local URL would be `http://localhost:9926/Dog/`, with a full REST API. We can PUT or POST data into this table using this new path, and then GET or DELETE from it as well (you can even view data directly from the browser). If you have not added any records yet, we can use a PUT or POST to add one. PUT is appropriate if you know the id, and POST can be used to assign an id:
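```bash
# POST a new record; the field values here are illustrative
curl -X POST http://localhost:9926/Dog/ \
  -H "Content-Type: application/json" \
  -d '{"name": "Harper", "breed": "Labrador", "age": 3}'
```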
With this, a record will be created and the auto-assigned id will be available through the `Location` header. If you added a record, you can visit the path `/Dog/<id>` to view that record. Alternatively, the curl command `curl http://localhost:9926/Dog/<id>` will achieve the same thing.
These endpoints automatically support `Basic`, `Cookie`, and `JWT` authentication methods. See the documentation on security for more information on different levels of access.
By default, HarperDB also automatically authorizes all requests from loopback IP addresses (from the same computer) as the superuser, to make local development simple. If you want to test authentication/authorization, or enforce stricter security, you may want to disable the `authentication.authorizeLocal` setting.
These endpoints support various content types, including `JSON`, `CBOR`, `MessagePack`, and `CSV`. Simply include an `Accept` header in your requests with the preferred content type. We recommend `CBOR` as a compact, efficient encoding with rich data types, but `JSON` is familiar and great for web application development, and `CSV` can be useful for exporting data to spreadsheets or other processing.
HarperDB works with other important standard HTTP headers as well, and these endpoints even support caching interactions; for example, a client can revalidate a cached record with a conditional request (a sketch, with an illustrative ETag):
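```http
GET /Dog/1 HTTP/1.1
Accept: application/json
If-None-Match: "1-abc123"

HTTP/1.1 304 Not Modified
```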
Querying your application database is straightforward and easy, as tables exported with the `@export` directive are automatically exposed via REST endpoints. Simple queries can be crafted through URL query parameters.
In order to maintain reasonable query speed on a database as it grows in size, it is critical to establish the proper indexes. So, before we begin querying our `Dog` table, let's take a moment to target some table properties for indexing. We'll use `name` and `breed` as indexed properties on our `Dog` table. All we need to do is tag these properties with the `@indexed` directive:
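```graphql
type Dog @table {
  id: ID @primaryKey
  name: String @indexed
  breed: String @indexed
  age: Int
}
```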
And finally, we'll add the `@export` directive to expose the table as a RESTful endpoint:
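```graphql
type Dog @table @export {
  id: ID @primaryKey
  name: String @indexed
  breed: String @indexed
  age: Int
}
```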
Now we can start querying. Again, we simply access the endpoint with query parameters (basic GET requests), like:
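```bash
# find dogs by an indexed property
curl "http://localhost:9926/Dog/?breed=Labrador"
# combine conditions and select specific properties
# (query syntax per the REST documentation; this form is a sketch)
curl "http://localhost:9926/Dog/?breed=Labrador&select(name,age)"
```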
Congratulations, you have now created a secure database application backend with a table, a well-defined structure, access controls, and a functional REST endpoint with query capabilities! See the REST documentation for more information on HTTP access, and see the schema reference for more options for defining schemas.
Additionally, you may now use GraphQL (over HTTP) to create queries. See the documentation for that new feature here.
This guide assumes that you're building a HarperDB application locally. If you have a cloud instance available, you can deploy it by doing the following:
Commit and push your application component directory code (i.e., the `my-app` directory) to a GitHub repo. In this tutorial we started with a clone of the application-template; to commit and push to your own repository, change the origin to your repo: `git remote set-url origin git@github.com:your-account/your-repo.git`
Go to the applications section of your target cloud instance in the HarperDB Studio.
In the left-hand menu of the applications IDE, click 'deploy' and specify a package location reference that follows the npm package specification, i.e., a string that npm knows how to install, such as `HarperDB/Application-Template` or a URL like `https://github.com/HarperDB/application-template`.
You can also deploy your application from your repository directly, using the `deploy_component` operation:
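```bash
# a sketch: deploy via the operations API (default port 9925);
# see the operations API docs for the full parameter list
curl -X POST http://localhost:9925 \
  -u admin:password \
  -H "Content-Type: application/json" \
  -d '{
        "operation": "deploy_component",
        "project": "my-app",
        "package": "https://github.com/HarperDB/application-template"
      }'
```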
Once you have deployed your application to a HarperDB cloud instance, you can start scaling your application by adding additional instances in other regions.
With a global traffic manager or load balancer configured, you can distribute incoming requests to the appropriate server, and you can deploy and re-deploy your application to all the nodes in your mesh.
Now, with an application that you can deploy, update, and re-deploy, you have a backend that is horizontally and globally scalable!
So far we have built an application entirely through schema configuration. However, if your application requires more custom functionality, you will probably want to employ your own JavaScript modules to implement more specific features and interactions. This gives you tremendous flexibility and control over how data is accessed and modified in HarperDB. Let's take a look at how we can use JavaScript to extend and define "resources" for custom functionality: we'll add a property to dog records, computed when they are returned, that gives their age in human years. In HarperDB, data is accessed through the Resource API, a standard interface for accessing data sources and tables and making them available to endpoints. Database tables are `Resource` classes, so extending the function of a table is as simple as extending its class.
To define custom (JavaScript) resources as endpoints, we need to create a `resources.js` module (this goes in the root of your application folder). Endpoints can then be defined with Resource classes that are `export`ed. This can be done in addition to, or in lieu of, the `@export`ed types in `schema.graphql`. If you are exporting and extending a table you defined in the schema, make sure you remove the `@export` directive from the schema so that you don't export the original table to the same endpoint/path you are exporting with a class. Resource classes have methods that correspond to standard HTTP/REST methods, like `get`, `post`, `patch`, and `put`, to implement specific handling for any of these methods (for tables, they all have default implementations). To do this, we get the `Dog` class from the defined tables, extend it, and export it (the human-age formula below is illustrative):
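```javascript
// resources.js — a sketch; the human-age conversion is just an illustration
const { Dog } = tables;

export class DogWithHumanAge extends Dog {
  get(query) {
    // add a computed property before the record is returned and serialized
    this.humanAge = 15 + this.age * 5;
    return super.get(query);
  }
}
```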
Here we exported the `DogWithHumanAge` class (exported with the same name), which directly maps to the endpoint path. Therefore, we now have a `/DogWithHumanAge/<dog-id>` endpoint based on this class, just like the direct table interface that was exported as `/Dog/<dog-id>`, but the new endpoint returns objects with the computed `humanAge` property. Resource classes provide getters and setters for every defined attribute, so accessing instance properties like `age` gets the value from the underlying record. The instance holds information about the primary key of the record, so updates and actions are applied to the correct record, and changed or newly assigned properties can be saved or included in the resource as it is returned and serialized. The `return super.get(query)` call at the end allows any query parameters to be applied to the resource, such as selecting individual properties (with a `select` query parameter).
Often we may want to incorporate data from other tables or data sources into our data models. Next, let's say that we want a `Breed` table that holds detailed information about each breed, and we want to add that information to the returned dog object. We might define the `Breed` table (back in `schema.graphql`) like this, with illustrative fields:
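```graphql
# the Breed fields here are illustrative
type Breed @table {
  name: String @primaryKey
  description: String
  lifespan: Int
  averageWeight: Float
}
```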
And next we will use this table in our `get()` method, calling the new table's (static) `get()` method to retrieve a breed by id. To do this correctly, we access the table using our current context by passing in `this` as the second argument. This is important because it ensures that we are accessing the data atomically, in a consistent snapshot across tables. It provides automatic tracking of most-recently-updated timestamps across resources for caching purposes, allows sharing of contextual metadata (like the user who requested the data), and ensures transactional atomicity for any writes (not needed in this get operation, but important for other operations). The resource methods are automatically wrapped in a transaction (which commits/finishes when the method completes), and this allows us to fully utilize multiple resources in our current transaction. With our own snapshot of the database for the `Dog` and `Breed` tables, we can then access data like this (the `description` property follows the illustrative `Breed` schema above):
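```javascript
// a sketch; `breedDescription` and `description` are illustrative names
const { Dog, Breed } = tables;

export class DogWithBreed extends Dog {
  async get(query) {
    // pass `this` as context so the lookup shares our snapshot and transaction
    const breed = await Breed.get(this.breed, this);
    this.breedDescription = breed.description;
    return super.get(query);
  }
}
```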
The call to `Breed.get` will return an instance of the `Breed` resource class, which holds the record specified by the provided id/primary key. Like the `Dog` instance, we can access or change properties on the `Breed` instance.
Here we have focused on customizing how we retrieve data, but we may also want to define custom actions for writing data. While the HTTP PUT method has a specific semantic definition (replace the current record), custom actions are commonly implemented through the HTTP POST method, which has much more open-ended semantics and is a good choice for custom actions. POST requests are handled by our resource's `post()` method. Let's say that we want to define a POST handler that adds a new trick to the `tricks` array of a specific dog. We might do it like this, specifying an action so we can differentiate between actions (the body properties below are illustrative):
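```javascript
// a sketch; assumes the schema defines tricks: [String], and the
// `action` and `trick` body properties are illustrative
const { Dog } = tables;

export class CustomDog extends Dog {
  post(data) {
    if (data.action === 'add-trick') {
      // this change is tracked and saved when the transaction commits
      this.tricks.push(data.trick);
    }
  }
}
```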
And a POST request to `/CustomDog/` would call this `post` method. The Resource class automatically tracks changes you make to your resource instances and saves those changes when the transaction is committed (again, these methods are automatically wrapped in a transaction and committed once the request handler finishes). So when you push data onto the `tricks` array, the change will be recorded and persisted when this method finishes, before a response is sent to the client.
The `post` method automatically marks the current instance as being updated. However, you can also explicitly indicate that you are changing a resource by calling the `update()` method. If you want to modify a resource instance that you retrieved through a `get()` call (like the `Breed.get()` call above), you can call its `update()` method to ensure changes are saved (and will be committed in the current transaction).
We can also define custom authorization capabilities. For example, we might want to specify that only the owner of a dog can make updates to that dog. We could add logic to our `post` or `put` methods to do this, but we may want to separate the logic so these methods can be called separately, without authorization checks. The Resource API defines `allowRead`, `allowUpdate`, `allowCreate`, and `allowDelete` methods to easily configure individual capabilities. For example, we might do this (the `owner` attribute is illustrative):
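```javascript
// a sketch; the `owner` attribute is illustrative and assumed to be
// defined on the Dog table
const { Dog } = tables;

export class MyDog extends Dog {
  allowUpdate(user) {
    // only the dog's owner may modify this record
    return this.owner === user.username;
  }
}
```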
Any methods that are not defined will fall back to HarperDB's default authorization procedure, based on users' roles. If you are using or extending a table, this is based on HarperDB's role-based access control. If you are extending the base `Resource` class, the default access requires superuser permission.
You can also use the `default` export to define the root-path resource handler. For example:
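```javascript
// a sketch: serve the Dog table at the root URL path
const { Dog } = tables;
export default Dog;
```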
This allows requests to URLs like `/` to be resolved directly to this resource.
We can also directly implement the `Resource` class and use it to create new data sources from scratch that can be used as endpoints. Custom resources can also be used as caching sources. Let's say that our `Breed` table is a cache of information about breeds from another source. We could implement a caching table like this (the upstream URL below is hypothetical):
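```javascript
// a sketch of a caching source; the upstream URL is hypothetical
const { Breed } = tables;

class BreedSource extends Resource {
  async get() {
    // fetch the record for this id from the origin API
    const response = await fetch(`https://example.com/breeds/${this.getId()}`);
    return response.json();
  }
}

// use BreedSource as the origin for the Breed table, caching entries for an hour
Breed.sourcedFrom(BreedSource, { expiration: 3600 });
```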
The caching documentation provides much more information on how to use HarperDB's powerful caching capabilities and set up data sources.
HarperDB provides a powerful JavaScript API with significant capabilities that go well beyond a "getting started" guide. See our documentation for more information on using the globals and the Resource interface.
Every application or component can define its own configuration in a `config.yaml` file. If you are using the application template, you will have a default configuration in this config file (which is also the configuration used if no config file is provided). Within the config file, you can configure how different files and resources are loaded and handled. The default configuration file itself is documented with directions. Each entry can specify any `files` that the loader will handle, and can also optionally specify what, if any, URL `path`s it will handle. A path of `/` means that root URLs are handled by the loader, and a path of `.` indicates that URLs starting with this application's name are handled.
This config file also allows you to define a location for static files, which are delivered as-is in response to incoming HTTP requests.
Each configuration entry can have the following properties, in addition to properties that may be specific to the individual component:
- `files`: Specifies the set of files that should be handled by the component. This is a glob pattern, so a set of files can be specified like `directory/**`.
- `path`: The URL path that is handled by this component.
- `root`: Specifies the root directory for mapping file paths to URLs. For example, if you want all the files in `web/**` to be available at the root URL path via the static handler, you could specify a root of `web`, to indicate that the `web` directory maps to the root URL path.
- `package`: Specifies that this component is a third-party package, which can be loaded from the specified package reference (an npm package, GitHub reference, URL, etc.).
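For example, a sketch of a `config.yaml` using these properties (the loader names here follow the application template's conventions and may differ in your version):

```yaml
# a sketch of a config.yaml
graphqlSchema:
  files: '*.graphql'   # load all GraphQL schema files
jsResource:
  files: resources.js  # load JavaScript resources
fastifyRoutes:
  files: routes/*.js   # Fastify route modules
  path: .              # URLs start with the application's name
static:
  files: web/**        # static files to serve as-is
  root: web            # the web directory maps to the root URL path
  path: /
```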
Exporting resources will generate full RESTful endpoints, but you may prefer to define endpoints through a framework. HarperDB includes a resource plugin for defining routes with the Fastify web framework. Fastify is a full-featured framework with many plugins that provides sophisticated route-definition capabilities.
By default, applications are configured to load any modules in the `routes` directory (matching `routes/*.js`) with Fastify's autoloader, which allows these modules to export a function that defines Fastify routes, as in the sketch below. See the defining-routes documentation for more information on how to create Fastify routes.
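```javascript
// routes/example.js — a minimal sketch of a Fastify route module;
// the route and its response are illustrative
export default async function (fastify, options) {
  fastify.get('/hello', async (request, reply) => {
    return { hello: 'world' };
  });
}
```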
However, Fastify is not as fast as HarperDB's RESTful endpoints (roughly 10-20% slower, with more overhead), nor does it automate the generation of a full uniform interface with correct RESTful header interactions (for cache control), so HarperDB's REST interface is generally recommended for optimal performance and ease of use.
Generally, HarperDB will auto-detect when files change and automatically restart the appropriate threads. However, if there are changes that aren't detected, you may restart manually with the `restart_service` operation:
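```bash
# a sketch: restart via the operations API (default port 9925); the
# service name may vary with your configuration
curl -X POST http://localhost:9925 \
  -u admin:password \
  -H "Content-Type: application/json" \
  -d '{"operation": "restart_service", "service": "http_workers"}'
```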