Skip to main content
Version: 4.6

Analytics

Harper provides extensive telemetry and analytics data to help monitor the status of the server and work loads, and to help understand traffic and usage patterns to identify issues and scaling needs, and identify queries and actions that are consuming the most resources.

Harper collects statistics for all operations, URL endpoints, and messaging topics, aggregating information by thread, operation, resource, and methods, in real-time. These statistics are logged in the hdb_raw_analytics and hdb_analytics table in the system database.

There are two "levels" of analytics in the Harper analytics table: the first is the immediate level of raw direct logging of real-time statistics. These analytics entries are recorded once a second (when there is activity) by each thread, and include all recorded activity in the last second, along with system resource information. The records have a primary key that is the timestamp in milliseconds since epoch. This can be queried (with superuser permission) using the search_by_conditions operation (this will search for 10 seconds worth of analytics) on the hdb_raw_analytics table:

POST http://localhost:9925
Content-Type: application/json

{
"operation": "search_by_conditions",
"schema": "system",
"table": "hdb_raw_analytics",
"conditions": [{
"search_attribute": "id",
"search_type": "between",
"search_value": [168859400000, 1688594010000]
}]
}

And a typical response looks like:

{
"time": 1688594390708,
"period": 1000.8336279988289,
"metrics": [
{
"metric": "bytes-sent",
"path": "search_by_conditions",
"type": "operation",
"median": 202,
"mean": 202,
"p95": 202,
"p90": 202,
"count": 1
},
...
{
"metric": "memory",
"threadId": 2,
"rss": 1492664320,
"heapTotal": 124596224,
"heapUsed": 119563120,
"external": 3469790,
"arrayBuffers": 798721
},
{
"metric": "utilization",
"idle": 138227.52767700003,
"active": 70.5066209952347,
"utilization": 0.0005098165086230495
}
],
"threadId": 2,
"totalBytesProcessed": 12182820,
"id": 1688594390708.6853
}

The second level of analytics recording is aggregate data. The aggregate records are recorded once a minute, and aggregate the results from all the per-second entries from all the threads, creating a summary of statistics once a minute. The ids for these milliseconds since epoch can be queried from the hdb_analytics table. You can query these with an operation like:

POST http://localhost:9925
Content-Type: application/json

{
"operation": "search_by_conditions",
"schema": "system",
"table": "hdb_analytics",
"conditions": [{
"search_attribute": "id",
"search_type": "between",
"search_value": [1688194100000, 1688594990000]
}]
}

And a summary record looks like:

{
"period": 60000,
"metric": "bytes-sent",
"method": "connack",
"type": "mqtt",
"median": 4,
"mean": 4,
"p95": 4,
"p90": 4,
"count": 1,
"id": 1688589569646,
"time": 1688589569646
}

Standard Analytics Metrics

While applications can define their own metrics, Harper provides a set of standard metrics that are tracked for all services:

HTTP

The following metrics are tracked for all HTTP requests:

metricpathmethodtypeUnitDescription
durationresource pathrequest methodcache-hit or cache-miss if a caching tablemillisecondsDuration of request handler
durationroute pathrequest methodfastify-routemilliseconds
durationoperationoperationmilliseconds
successresource pathrequest method%
successroute pathrequest methodfastify-route%
successoperationoperation%
bytes-sentresource pathrequest methodbytes
bytes-sentroute pathrequest methodfastify-routebytes
bytes-sentoperationoperationbytes
transferresource pathrequest methodoperationmillisecondsduration of transfer
transferroute pathrequest methodfastify-routemillisecondsduration of transfer
transferoperationoperationmillisecondsduration of transfer
socket-routed%percentage of sockets that could be immediately routed
tls-handshakemilliseconds
tls-reused%percentage of TLS that reuses sessions
cache-hittable name%The percentage of cache hits
cache-resolutiontable namemillisecondsThe duration of resolving requests for uncached entries

The following are metrics for real-time MQTT connections:

metricpathmethodtypeUnitDescription
mqtt-connectionscountThe number of open direct MQTT connections
ws-connectionscountnumber of open WS connections
connectionmqttconnect%percentage of successful direct MQTT connections
connectionmqttdisconnect%percentage of explicit direct MQTT disconnects
connectionwsconnect%percentage of successful WS connections
connectionwsdisconnect%percentage of explicit WS disconnects
bytes-senttopicmqtt commandmqttbytesThe number of bytes sent for a given command and topic

The following are metrics for replication:

metricpathmethodtypeUnitDescription
bytes-sentnode.databasereplicationegressbytesThe number of bytes sent for replication
bytes-sentnode.databasereplicationblobbytesThe number of bytes sent for replication of blobs
bytes-receivednode.databasereplicationingressbytesThe number of bytes received for replication
bytes-receivednode.databasereplicationblobbytesThe number of bytes received for replication of blobs

The following are general resource usage statistics that are tracked:

metricprimary attribute(s)other attribute(s)UnitDescription
database-sizesize, used, free, auditdatabasebytesThe size of the database in bytes
main-thread-utilizationidle, active, taskQueueLatency, rss, heapTotal, heapUsed, external, arrayBufferstimevariousMain thread resource usage; including idle time, active time, task queue latency, RSS, heap, buffer and external memory usage
resource-usagevariousSee breakout below
storage-volumeavailable, free, sizedatabasebytesThe size of the storage volume in bytes
table-sizesizedatabase, tablebytesThe size of the table in bytes
utilization%How much of the time the worker was processing requests

resource-usage metrics are everything returned by node:process.resourceUsage()1 plus the following additional metrics:

metricUnitDescription
timemsCurrent time when metric was recorded (Unix time)
periodmsDuration of the metric period
cpuUtilization%CPU utilization percentage (user and system combined)

Footnotes

  1. The userCPUTime and systemCPUTime metrics are converted to milliseconds to match the other time-related metrics.