Version: v5

API

Added in: v5.1.0

The models object exposes three methods: embed(), generate(), and generateStream(). All of them accept an optional model option naming the configured logical model to use; when omitted, the logical name default is used. Passing a logical name that has no configured backend, or asking a backend for a capability it does not support (for example, embeddings from a generation-only backend), throws an error. Capability checks run up front, before any request is made.

embed()

models.embed(input: string | string[], options?: EmbedOpts): Promise<Float32Array[]>

Converts one or more strings into embedding vectors. The result is always an array of Float32Array, one per input string, in input order — including when a single string is passed.

import { models } from 'harper';

const [single] = await models.embed('What is Harper?', { inputType: 'query' });
const batch = await models.embed(['first document', 'second document']);

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | 'default' | Logical name of a configured embedding model |
| inputType | 'document' \| 'query' | | Hint for models that distinguish document embeddings from query embeddings (e.g. nomic-embed-text); ignored by models that do not |
| signal | AbortSignal | | Cancels the call; composed with the backend's configured requestTimeoutMs |
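
As a rough illustration of how the two inputType values might be used together, the sketch below embeds a query and a set of documents, then ranks the documents by cosine similarity. Only the embed() signature comes from the reference above; Embedder, cosineSimilarity, and rankDocuments are hypothetical names introduced for this example, and the real models object is assumed to satisfy the Embedder shape.

```typescript
// Structural type mirroring the embed() signature documented above.
// `Embedder` is a hypothetical name used only in this sketch.
interface Embedder {
  embed(
    input: string | string[],
    options?: { model?: string; inputType?: 'document' | 'query'; signal?: AbortSignal }
  ): Promise<Float32Array[]>;
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error('vector lengths differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Embeds the query and the documents, then returns the documents sorted
// by descending similarity to the query.
async function rankDocuments(embedder: Embedder, query: string, documents: string[]) {
  const [queryVector] = await embedder.embed(query, { inputType: 'query' });
  const documentVectors = await embedder.embed(documents, { inputType: 'document' });
  return documents
    .map((text, i) => ({ text, score: cosineSimilarity(queryVector, documentVectors[i]) }))
    .sort((x, y) => y.score - x.score);
}
```

A call might look like rankDocuments(models, 'What is Harper?', ['first document', 'second document']). Because embed() returns vectors in input order, the index i lines up each vector with its source text.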

generate()

models.generate(input: GenerateInput, options?: GenerateOpts): Promise<GenerateResult>

Generates a completion. The input may be:

  • a string — shorthand for a single user message,
  • an array of messages: { role: 'system' | 'user' | 'assistant' | 'tool', content: string },
  • an object { messages, tools?, system? } — the form required to declare tools or pass a system prompt alongside the messages.

const result = await models.generate(
  [
    { role: 'system', content: 'You are a terse assistant.' },
    { role: 'user', content: 'What is an HNSW index?' },
  ],
  { temperature: 0.2, maxTokens: 300 }
);
console.log(result.content);

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | 'default' | Logical name of a configured generative model |
| temperature | number | backend | Sampling temperature, passed through to the backend |
| maxTokens | number | backend | Completion token limit, passed through to the backend |
| responseFormat | 'text' \| 'json' \| { schema: object } | 'text' | Structured output. { schema } requests output conforming to a JSON Schema; support varies by backend |
| toolMode | 'return' \| 'auto' | 'return' | How tool calls are handled; see Tool Calling |
| signal | AbortSignal | | Cancels the call; composed with the backend's configured requestTimeoutMs |

Additional options apply only when toolMode: 'auto'; they are documented in Tool Calling.
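
With responseFormat set to 'json' or { schema }, the result still carries its output in the content string. The sketch below shows one way a caller might parse it defensively. It assumes content holds the raw JSON text, which this reference does not state explicitly; parseJsonContent and JsonResult are hypothetical helpers, not part of the API.

```typescript
// Minimal slice of GenerateResult needed by this sketch.
interface JsonResult {
  content: string;
  finishReason: 'stop' | 'length' | 'tool_calls' | 'content_filter';
}

// Parses result.content as JSON. Assumption: with responseFormat 'json',
// content carries the raw JSON text. Output cut off by the token limit
// ('length') is rejected because it is likely to be incomplete JSON.
function parseJsonContent<T = unknown>(result: JsonResult): T {
  if (result.finishReason === 'length') {
    throw new Error('completion was truncated; raise maxTokens and retry');
  }
  try {
    return JSON.parse(result.content) as T;
  } catch (cause) {
    throw new Error(`model did not return valid JSON: ${String(cause)}`);
  }
}
```

A call might look like parseJsonContent(await models.generate(prompt, { responseFormat: 'json' })), where prompt is whatever input the application supplies.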

GenerateResult

| Field | Type | Description |
| --- | --- | --- |
| content | string | The generated text |
| finishReason | 'stop' \| 'length' \| 'tool_calls' \| 'content_filter' | Why generation stopped, normalized across backends |
| toolCalls | ToolCall[] | Tool calls the model requested, when finishReason is 'tool_calls' (each { id, name, arguments }, with arguments parsed to an object) |
| usage | TokenUsage | Token usage reported by the backend (promptTokens, completionTokens, …), when available |
| trace | ToolTraceEntry[] | Per-tool-invocation trace; only populated by the toolMode: 'auto' loop (see Tool Calling) |
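
In the default toolMode: 'return', requested tool calls come back to the caller in toolCalls. As a minimal sketch of what a caller might do with them, the helper below dispatches each call to a local handler function. The ToolCall fields follow the table above; ToolHandler, ResultSlice, and runToolCalls are hypothetical names, and how outputs are sent back to the model is left to Tool Calling.

```typescript
// Shapes taken from the GenerateResult table; only the fields used here.
interface ToolCall {
  id: string;
  name: string;
  arguments: Record<string, unknown>;
}
interface ResultSlice {
  content: string;
  finishReason: 'stop' | 'length' | 'tool_calls' | 'content_filter';
  toolCalls?: ToolCall[];
}

// Hypothetical handler signature: receives the parsed arguments object.
type ToolHandler = (args: Record<string, unknown>) => unknown;

// Runs each requested tool call against a local handler map and returns
// the outputs keyed by tool-call id. Unknown tool names raise an error.
function runToolCalls(
  result: ResultSlice,
  handlers: Record<string, ToolHandler>
): Map<string, unknown> {
  const outputs = new Map<string, unknown>();
  if (result.finishReason !== 'tool_calls') return outputs;
  for (const call of result.toolCalls ?? []) {
    const handler = handlers[call.name];
    if (!handler) throw new Error(`no handler for tool "${call.name}"`);
    outputs.set(call.id, handler(call.arguments));
  }
  return outputs;
}
```

Keying the outputs by id keeps each result associated with the call that produced it, which matters when the model requests several tools in one turn.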

generateStream()

models.generateStream(input: GenerateInput, options?: GenerateOpts): AsyncIterable<GenerateChunk>

Identical to generate() but yields the completion incrementally:

let text = '';
for await (const chunk of models.generateStream('Write a haiku about databases.')) {
  if (chunk.deltaContent) text += chunk.deltaContent;
}

Each chunk may carry:

| Field | Type | Description |
| --- | --- | --- |
| deltaContent | string | Text appended since the previous chunk |
| deltaToolCalls | Partial<ToolCall>[] | Tool-call deltas; a backend may deliver the same tool call across several chunks with partial fields |
| finishReason | same values as GenerateResult | Set on the final chunk only |

Errors detected before the call starts (unknown model name, missing capability) throw synchronously; errors during generation propagate through the iterable.
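
Because mid-generation errors surface through the iterable, a try/catch around the for await loop is enough to observe them. The sketch below drains a stream into a single result and keeps any text received before a failure. ChunkSlice and collectStream are hypothetical names; only deltaContent and finishReason come from the chunk table above, and deltaToolCalls is ignored here.

```typescript
// Minimal slice of GenerateChunk from the table above.
interface ChunkSlice {
  deltaContent?: string;
  finishReason?: 'stop' | 'length' | 'tool_calls' | 'content_filter';
}

// Drains a chunk stream, concatenating text and keeping the finishReason
// reported on the final chunk. If the iterable throws mid-stream, the
// partial text is returned together with the error instead of being lost.
async function collectStream(stream: AsyncIterable<ChunkSlice>) {
  let content = '';
  let finishReason: ChunkSlice['finishReason'];
  let error: unknown;
  try {
    for await (const chunk of stream) {
      if (chunk.deltaContent) content += chunk.deltaContent;
      if (chunk.finishReason) finishReason = chunk.finishReason;
    }
  } catch (caught) {
    error = caught;
  }
  return { content, finishReason, error };
}
```

A call might look like collectStream(models.generateStream('Write a haiku about databases.')). Errors thrown before the stream starts (unknown model name, missing capability) are raised by generateStream() itself, so they would need a separate try/catch around that call.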

Errors and timeouts

  • An unconfigured logical model name throws a not-found error. The error names the missing logical name only — it does not enumerate configured names.
  • A capability mismatch (embedding call to a generation-only backend, tool declarations against a backend without tool support) throws before any request is made.
  • Each backend supports a requestTimeoutMs configuration field; when set, it is composed with any caller-provided signal so whichever fires first cancels the request.
  • Backend/network failures throw backend-specific errors with sanitized messages.
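
Since a caller-provided signal is combined with requestTimeoutMs, a caller can impose its own, possibly shorter, deadline on a single call. The sketch below builds such a signal from the standard AbortController and a timer. The helpers deadline and withDeadline are hypothetical, and the error a cancelled call rejects with is not specified in this reference.

```typescript
// Creates an AbortSignal that fires after `ms` milliseconds, plus a
// dispose() that clears the timer once the call has settled.
function deadline(ms: number): { signal: AbortSignal; dispose: () => void } {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  return { signal: controller.signal, dispose: () => clearTimeout(timer) };
}

// Runs any signal-accepting call under a caller-side deadline and always
// clears the timer afterwards, whether the call resolved or rejected.
async function withDeadline<T>(
  ms: number,
  call: (signal: AbortSignal) => Promise<T>
): Promise<T> {
  const { signal, dispose } = deadline(ms);
  try {
    return await call(signal);
  } finally {
    dispose();
  }
}
```

A call might look like withDeadline(5000, (signal) => models.generate('What is an HNSW index?', { signal })). Clearing the timer in finally avoids leaving a pending timeout behind after a fast call.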

Every call — successful or failed — is recorded in the model-call analytics.