API
The models object exposes three methods. All of them accept an optional model option naming the configured logical model to use; when omitted, the logical name default is used. Calling a logical name with no configured backend, or asking a backend for a capability it does not support (for example, embeddings from a generation-only backend), throws an error — capability checks run up front, before any request is made.
embed()
models.embed(input: string | string[], options?: EmbedOpts): Promise<Float32Array[]>
Converts one or more strings into embedding vectors. The result is always an array of Float32Array, one per input string, in input order — including when a single string is passed.
import { models } from 'harper';
const [single] = await models.embed('What is Harper?', { inputType: 'query' });
const batch = await models.embed(['first document', 'second document']);
| Option | Type | Default | Description |
|---|---|---|---|
model | string | 'default' | Logical name of a configured embedding model |
inputType | 'document' | 'query' | — | Hint for models that distinguish document embeddings from query embeddings (e.g. nomic-embed-text); ignored by models that do not |
signal | AbortSignal | — | Cancels the call; composed with the backend's configured requestTimeoutMs |
generate()
models.generate(input: GenerateInput, options?: GenerateOpts): Promise<GenerateResult>
Generates a completion. The input may be:
- a
string— shorthand for a single user message, - an array of messages:
{ role: 'system' | 'user' | 'assistant' | 'tool', content: string }, - an object
{ messages, tools?, system? }— the form required to declare tools or pass a system prompt alongside the messages.
const result = await models.generate(
[
{ role: 'system', content: 'You are a terse assistant.' },
{ role: 'user', content: 'What is an HNSW index?' },
],
{ temperature: 0.2, maxTokens: 300 }
);
console.log(result.content);
| Option | Type | Default | Description |
|---|---|---|---|
model | string | 'default' | Logical name of a configured generative model |
temperature | number | backend | Sampling temperature, passed through to the backend |
maxTokens | number | backend | Completion token limit, passed through to the backend |
responseFormat | 'text' | 'json' | { schema: object } | 'text' | Structured output. { schema } requests output conforming to a JSON Schema; support varies by backend |
toolMode | 'return' | 'auto' | 'return' | How tool calls are handled — see Tool Calling |
signal | AbortSignal | — | Cancels the call; composed with the backend's configured requestTimeoutMs |
Additional options apply only when toolMode: 'auto'; they are documented in Tool Calling.
GenerateResult
| Field | Type | Description |
|---|---|---|
content | string | The generated text |
finishReason | 'stop' | 'length' | 'tool_calls' | 'content_filter' | Why generation stopped, normalized across backends |
toolCalls | ToolCall[] | Tool calls the model requested, when finishReason is 'tool_calls' (each { id, name, arguments }, with arguments parsed to an object) |
usage | TokenUsage | Token usage reported by the backend (promptTokens, completionTokens, …), when available |
trace | ToolTraceEntry[] | Per-tool-invocation trace; only populated by the toolMode: 'auto' loop — see Tool Calling |
generateStream()
models.generateStream(input: GenerateInput, options?: GenerateOpts): AsyncIterable<GenerateChunk>
Identical to generate() but yields the completion incrementally:
let text = '';
for await (const chunk of models.generateStream('Write a haiku about databases.')) {
if (chunk.deltaContent) text += chunk.deltaContent;
}
Each chunk may carry:
| Field | Type | Description |
|---|---|---|
deltaContent | string | Text appended since the previous chunk |
deltaToolCalls | Partial<ToolCall>[] | Tool-call deltas; a backend may deliver the same tool call across several chunks with partial fields |
finishReason | same values as GenerateResult | Set on the final chunk only |
Errors detected before the call starts (unknown model name, missing capability) throw synchronously; errors during generation propagate through the iterable.
Errors and timeouts
- An unconfigured logical model name throws a not-found error. The error names the missing logical name only — it does not enumerate configured names.
- A capability mismatch (embedding call to a generation-only backend, tool declarations against a backend without tool support) throws before any request is made.
- Each backend supports a
requestTimeoutMsconfiguration field; when set, it is composed with any caller-providedsignalso whichever fires first cancels the request. - Backend/network failures throw backend-specific errors with sanitized messages.
Every call — successful or failed — is recorded in the model-call analytics.