Release Notes

All Bernard releases, newest first.

Milestone release

v0.9.0 2026-05-07 Bernard learns to look things up.

The largest update since launch. v0.9.0 reshapes how Bernard handles ambiguity: references resolve against memory before each turn, unknown entities trigger a read-only tool lookup, and prompts get rewritten for the active model family. A new coordinator (ReAct) mode adds iterative reasoning with verifiable plans, the UI moves to arrow-key menus, and a tool augmentation layer learns from every error your tools throw.

16 PRs merged
14 user-facing features
7 new env vars
5 bundled specialists

Smarter conversations new

Bernard now understands references the same way a human collaborator does. Before each turn, a reference resolver scans your message for named entities ("my brother", "the staging cluster") and looks them up in persistent memory. If the answer is missing, Bernard tries one read-only tool (e.g. a Google Contacts MCP) before asking you, then offers a Save / Edit / Skip menu so the answer sticks. A model-specific prompt rewriter then reshapes the message for the active model family, and a context-gathering protocol keeps the agent from inventing numbers it could have just looked up.

bernard
bernard> email my brother about the trip
// resolver: "my brother" not in memory — trying lookup
  ▶ google_contacts__search { query: "brother" }
  └─ found: Daniel Pereira <daniel@example.com>

? Save this for next time?
  ❯ Save   Edit  Skip

  ▶ gmail__send_email { to: "daniel@example.com", … }
   sent

  • Reference resolver with four outcomes: resolved, ambiguous, unknown, no-op (#128)
  • Reference tool-lookup pass with read-only allowlist + 5 s timeout (#138)
  • Model-specific prompt rewriter using ModelProfile.rewriterHint, fail-open (#125, #131)
  • Context-gathering protocol baked into the base system prompt (#126)
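
The four resolver outcomes listed above fit a discriminated union. The sketch below is illustrative only: the type shape, the in-memory Map, and the resolveReference helper are assumptions made for this example, not Bernard's actual API. Only the four outcome names come from the notes.

```typescript
// Hypothetical shapes; only the four outcome names come from the release notes.
type Resolution =
  | { kind: "resolved"; value: string }
  | { kind: "ambiguous"; candidates: string[] }
  | { kind: "unknown"; phrase: string }
  | { kind: "no-op" };

// Look a phrase up in a toy memory store (phrase -> known values).
function resolveReference(
  phrase: string | null,
  memory: Map<string, string[]>,
): Resolution {
  if (phrase === null) return { kind: "no-op" }; // nothing to resolve in this message
  const hits = memory.get(phrase.toLowerCase()) ?? [];
  const first = hits[0];
  if (hits.length === 1 && first !== undefined) return { kind: "resolved", value: first };
  if (hits.length > 1) return { kind: "ambiguous", candidates: hits };
  return { kind: "unknown", phrase }; // a caller could try a read-only tool lookup next
}
```

A caller would switch on kind: only the unknown branch needs the tool-lookup pass, and only the ambiguous branch needs a question to the user.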

Better reasoning new

Coordinator (ReAct) mode turns Bernard into an iterative thinker. With BERNARD_REACT_MODE=true, every turn runs a think → act → evaluate → decide loop with new plan, think, and evaluate tools. The step budget triples (capped at 150) so multi-step plans actually finish, and a post-run verifier re-checks the plan if any step is still unresolved. Turn it on for hard tasks; leave it off for chat — you can flip the toggle live in /agent-options.

bernard
bernard> /agent-options
  ReAct (coordinator) mode  [on]

bernard> ship a release: build, test, tag, push
  ▶ plan: 4 steps drafted
  ▶ think: tests must pass before tagging
  ▶ shell: npm test — 2074 passed
  ▶ evaluate: step 1/4 done, advancing
  … (think → act → evaluate, repeated)
   all 4 steps verified

  • Coordinator / ReAct loop with plan, think, evaluate tools (#117)
  • Per-turn step budget grows to min(BERNARD_MAX_STEPS × 3, 150) when ReAct is on
  • Post-run activity logs and plan verification with up to 2 enforcement re-prompts (#137)
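
The step-budget rule is simple arithmetic. A minimal sketch, assuming a helper named effectiveStepBudget (the function and constant names are made up here; only the formula min(BERNARD_MAX_STEPS × 3, 150) comes from the notes):

```typescript
// Cap taken from the release notes; the constant name is illustrative.
const REACT_STEP_CAP = 150;

// Triple the budget in ReAct mode, but never exceed the cap.
function effectiveStepBudget(maxSteps: number, reactMode: boolean): number {
  return reactMode ? Math.min(maxSteps * 3, REACT_STEP_CAP) : maxSteps;
}
```

With the documented default of 25 steps, ReAct mode yields 75. Any configured value of 50 or more hits the 150 cap.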

New UI surface new

Menus are now arrow-key driven by default. Bernard pins the menu region with setPinnedRegion, you move with arrows or digits 1–9, and Enter commits. Non-TTY environments and BERNARD_PLAIN_MENU=1 still get the classic numbered prompt as a fallback. /image <path> [prompt] attaches an image to the next turn, and a /options tool-details toggle collapses the per-call tool noise into compact one-liners.

bernard
bernard> /image ./screenshot.png describe this UI
// attached: screenshot.png (412 KB)
  ▶ reading image …
  └─ A dark dashboard with three tabs: Memory, Cron, MCP. The Memory tab…

bernard> /theme
  ? Pick a theme — ↑/↓ Enter
  ❯ bernard
    ocean
    forest
    synthwave

  • Arrow-key menu component with BERNARD_PLAIN_MENU fallback (#113, #132)
  • Image attachment via /image on vision-capable models (#112)
  • /options tool-details compact / verbose toggle — BERNARD_TOOL_DETAILS (#119)

Operators & infra new

A new tool augmentation layer wraps every tool's execute to observe errors and patch the system prompt with concrete bad-example fixes — no hallucination, only real observed failures. Shell commands are split into sub-categories (shell.git, shell.gh, shell.docker, shell.npm, shell.fs, shell.http) so guidance stays scoped. Cron jobs gained persistent notes: the daemon injects cron_notes_read / cron_notes_write scoped to the current job so long-running schedules don't redo work after restarts.

bernard
// In a cron job, Bernard reads notes from the previous run:
  ▶ cron_notes_read — 3 entries
    1. last sync cursor: 2026-05-07T03:14Z
    2. skip account #482 — flagged for review
    3. retry budget reset to 0
  ▶ sync_accounts { since: "2026-05-07T03:14Z" }
  ▶ cron_notes_write { entry: "synced 12 accounts, cursor → …" }
   run finished, notes saved

  • Tool augmentation layer with auto-learned bad-example fixes: tool-wrapper specialists (shell-wrapper, file-wrapper, web-wrapper), correction agent, and specialist creator (#111)
  • Cron persistent notes: per-job JSON store, 100-entry cap, atomic writes, and a "## Notes written during this run" section in cron_logs_get output (#130)

Reliability & polish fix

  • Auto-create startup race fixed — pending candidates are promoted on enable (#135)
  • Critic and ReAct toggles now have round-trip test coverage (#129)
  • MCP-tool schema extraction handles both jsonSchema and the _jsonSchema shape produced by @ai-sdk/ui-utils
  • Reference-lookup timeout enforced via Promise.race against a real timer — resilient to tools that ignore abortSignal
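
The timeout technique in the last bullet can be sketched as follows. This is a generic illustration of racing work against a real timer, not Bernard's code; the withTimeout name and the "timeout" sentinel are assumptions.

```typescript
// Race a lookup against a real timer so a tool that ignores abort signals
// cannot hang the turn.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T | "timeout"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer); // do not leave the timer pending
  }
}
```

The losing promise is not cancelled, only ignored. That is why this approach still works when the underlying tool never honours its abort signal.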

Also in this release

Smaller-but-load-bearing pieces that ship with v0.9.0 — many of them power the headline themes above.

Bundled specialists

Five tool-wrapper specialists ship out of the box and seed on first run: shell-wrapper, file-wrapper, web-wrapper, correction-agent, and specialist-creator. A .seeded-v1 marker prevents re-seeding so user edits stick.

tool_wrapper_run dispatch

The main agent talks to specialists through one strictly typed dispatch tool. Each call is isolated to its targetTools, so the shell-wrapper can't accidentally browse the web. Output is guaranteed JSON: {status, result, error?, reasoning?}.

web_search with provider chain

A new built-in web_search tool tries Brave first (BRAVE_API_KEY), falls back to Tavily (TAVILY_API_KEY), and finally to a DuckDuckGo scrape so search keeps working without keys. Returns [{title, url, snippet}].
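
A provider chain like this is essentially an ordered fallback loop. The sketch below uses stub providers and assumes that fallback is triggered by a thrown error. The notes do not say whether a missing key or a failed request triggers the fallback, and the Provider type and searchWithFallback name are hypothetical.

```typescript
// Result shape from the release notes.
type SearchResult = { title: string; url: string; snippet: string };
// Hypothetical provider interface for this sketch.
type Provider = { name: string; search: (query: string) => Promise<SearchResult[]> };

// Try each provider in order and return the first successful result set.
async function searchWithFallback(
  query: string,
  providers: Provider[],
): Promise<SearchResult[]> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await provider.search(query);
    } catch (err) {
      lastError = err; // remember the failure and move on to the next provider
    }
  }
  throw new Error(`all search providers failed: ${String(lastError)}`);
}
```

Ordering the array as Brave, Tavily, DuckDuckGo would reproduce the chain described above.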

Correction loop with validation

When a tool-wrapper returns status: 'error', the args land in a candidate queue. At REPL shutdown the correction-agent proposes a fix and actually executes it via tool_wrapper_run. Only validated fixes get appended to the specialist's good/bad examples (capped at 10 each).

Reasoning logs

Every tool-wrapper run appends one JSONL entry to logs/tool-wrappers.jsonl. Easy to jq, easy to grep, easy to ship to a log aggregator. Pairs naturally with the new post-run activity logs to give you a full audit trail of what Bernard actually did.

OS-aware tool prompts

osPromptBlock() detects the host OS and injects shell hints (Linux, macOS, Windows) into every tool-wrapper system prompt — so the shell-wrapper doesn't suggest brew on Ubuntu or GNU-only flags on macOS.

XDG paths everywhere

All file paths now follow the XDG Base Directory Specification, centralized in src/paths.ts. Set BERNARD_HOME for the legacy flat layout. Existing ~/.bernard/ installs auto-migrate on first run and leave a MIGRATED marker behind.
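
As a rough illustration of the lookup order, the sketch below resolves a data directory. The precedence (BERNARD_HOME first, then XDG_DATA_HOME, then the XDG default of ~/.local/share) is an assumption, as is the dataDir helper. The real logic lives in src/paths.ts and may differ.

```typescript
// Assumed order: BERNARD_HOME (legacy flat layout) wins, then XDG_DATA_HOME,
// then the XDG default under the user's home. POSIX separators only.
function dataDir(env: Record<string, string | undefined>, home: string): string {
  if (env.BERNARD_HOME) return env.BERNARD_HOME;
  const base = env.XDG_DATA_HOME || `${home}/.local/share`;
  return `${base}/bernard`;
}
```

Passing env and home as arguments rather than reading process.env directly keeps the function easy to test.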

Sub-agent context guard

BERNARD_SUBAGENT_RESULT_MAX_CHARS (default 4000) caps how much of a sub-agent or specialist's output gets fed back into the parent context. The user still sees the full output in the terminal — only the parent's memory budget gets protected.
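
A cap like this amounts to a guarded slice. A minimal sketch, assuming a capForParent helper and a trailing truncation marker. Both are invented for illustration; only the 4000-character default comes from the notes.

```typescript
const DEFAULT_MAX_CHARS = 4000; // default stated in the release notes

// Return the output unchanged if it fits; otherwise keep the head and note
// how much was dropped (the marker format is an assumption).
function capForParent(output: string, maxChars: number = DEFAULT_MAX_CHARS): string {
  if (output.length <= maxChars) return output;
  const omitted = output.length - maxChars;
  return `${output.slice(0, maxChars)}\n[truncated ${omitted} chars]`;
}
```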

Post-launch additions

Follow-ups shipped on top of v0.9.0, included here so the release notes track one continuous milestone.

Custom Providers feature

Register named providers that wrap one of the three installed SDKs (openai, anthropic, xai) and point it at a different endpoint — Ollama, LM Studio, vLLM, OpenRouter, internal proxies, anything compatible. Multiple custom providers coexist with the built-ins.

  • CLI: bernard add-provider <name> --sdk <openai|anthropic|xai> --base-url <url> --model <model>, bernard remove-provider <name>, bernard providers.
  • REPL: /provider now ends with + Add custom provider… (interactive wizard); for a custom provider, /model ends with + Type a new model name… and remembers what you typed.
  • Keys are stored in keys.json alongside built-in keys and are never injected into process.env.

Mid-Turn Questions with ask_user feature

A new ask_user tool lets the agent pause the turn and ask clarifying questions instead of writing them as prose (which previously left the turn idle and, in coordinator mode, tripped the plan-enforcement loop).

  • Batches up to 10 questions in one call — useful for “title + body + labels” style prompts — with a pinned tab strip showing [1 ✓ ▸2◂ 3] progress.
  • Each question can be free-form or constrained by a choices list with an optional “Other” escape hatch (custom label supported).
  • Esc mid-batch returns whatever was already answered, so partial input isn’t lost. Headless sessions get {unavailable: true} so the agent can pick a default.

Dangerous-Command Confirmation Fix fix

The dangerous-command confirmation now uses the arrow-key menu instead of the legacy rl.question prompt, fixing a hang where the spinner blanked the prompt and the agent waited indefinitely. Separately, rm calls skip the prompt entirely when they are scoped to Bernard’s own scratch-script directory and contain no shell metacharacters, since that is just Bernard cleaning up after itself. The scratch directory is <os-tmpdir>/bernard-* (e.g. /tmp/bernard-* on Linux); it is an internal constant in src/tools/shell.ts, not a configurable env var.

Wrapper Routing + Tool-Call Repair reliability

The main agent’s direct shell, web_read, file_read_lines, and file_edit_lines calls are now transparently routed through their bundled wrapper specialists — same tool name and schema, only the execute path changes. The model can’t tell the difference, but wrappers add OS-aware examples, schema validation, and result capping.

  • A one-shot repair hook is wired into every generateText site (main, specialist, subagent, tool-wrapper, cron). On InvalidToolArgumentsError or NoSuchToolError the model gets re-prompted with the actual error.
  • Argument-truncation errors (the classic 16 KB heredoc cut mid-string) trigger a specific hint to use file_write + shell instead of inlining the payload.
  • tool_wrapper_run now strips the wrapper’s reasoning array and caps the result field before returning to the parent, keeping the parent context clean. Reasoning still lands in logs/tool-wrappers.jsonl for debugging.
  • Wrapper errors are mapped back to the native tool’s error shape (e.g. { output, is_error: true } for shell), so detectToolError and tool-profile learning keep working under the shim.

v0.8.1 2026-04-06

Proportional Task Step Budget fix

Tasks now receive a proportional fraction of the available step budget instead of a fixed step count, and the final step forces text-only output to guarantee structured JSON results.

  • Introduced TASK_STEP_RATIO, getTaskMaxSteps, and makeLastStepTextOnly utilities in task.ts
  • Overhauled wrapTaskResult to robustly extract and validate JSON from arbitrary output, handling nested objects, escaped quotes, and multiple JSON blocks
  • Updated REPL and task tool to pass correct maxSteps and experimental_prepareStep options
  • Expanded test coverage for result extraction edge cases
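
Extracting JSON from arbitrary model output usually means scanning for balanced braces while respecting string literals. The sketch below shows one way to do it. It is not the wrapTaskResult implementation, the helper names are hypothetical, and choosing the last valid block is an assumption.

```typescript
// Find the index of the brace that closes the one at `start`, or -1.
// Braces inside string literals are ignored; escapes are skipped.
function findMatchingBrace(text: string, start: number): number {
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) return i;
  }
  return -1;
}

// Return the last balanced {...} span that parses as JSON, or undefined.
function extractLastJsonObject(text: string): unknown {
  let result: unknown = undefined;
  let i = 0;
  while (i < text.length) {
    const end = text[i] === "{" ? findMatchingBrace(text, i) : -1;
    if (end === -1) { i++; continue; }
    try {
      result = JSON.parse(text.slice(i, end + 1));
      i = end + 1; // skip past the block just parsed
    } catch {
      i++; // balanced but not valid JSON; keep scanning
    }
  }
  return result;
}
```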

Threshold Normalization for Auto-Create Agents fix

The auto-create threshold now accepts both fractional (0–1) and percentage (1–100) values, and pending specialist candidates are re-evaluated automatically when settings change.

  • Added normalizeThreshold utility in config.ts that clamps values to the [0, 1] range regardless of input format
  • Updated /agent-options threshold command to display both normalized and percentage values
  • Refactored auto-creation logic into a reusable promoteCandidate helper
  • Enabling auto-create now re-evaluates and promotes pending candidates above the threshold
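
One plausible way to accept both formats is to treat anything above 1 as a percentage. The sketch below borrows the normalizeThreshold name from the notes, but the body is a guess at the behaviour described, not the code in config.ts.

```typescript
// Assumed convention: values above 1 are percentages; the result is clamped to [0, 1].
function normalizeThreshold(value: number): number {
  const fraction = value > 1 ? value / 100 : value;
  return Math.min(1, Math.max(0, fraction));
}
```

Under this convention the input 1 is read as the fraction 1.0 (100%), not as 1%. Any such scheme has to pick one reading at that boundary.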

v0.8.0 2026-03-27

Per-Agent Plan-Act-Critic System new

A reusable Plan-Act-Critic (PAC) loop now wraps sub-agents, specialists, and cron job executions when critic mode is enabled. The critic verifies each agent’s work and retries on failure with targeted feedback.

  • Extracted pac.ts module with runPACLoop() for consistent critic integration
  • Sub-agents, specialists, and cron jobs all use the same PAC pipeline automatically
  • On a FAIL verdict, the task is retried with the critic’s feedback (up to 2 retries)
  • Compact critic verdicts — PASS and WARN are now single-line inline, reducing terminal noise while keeping full detail for FAIL

Auto-Continue on Truncation new

When a response hits the max-tokens limit and is cut off, Bernard automatically continues where it left off — up to 3 continuations — then recommends an optimal token limit based on actual usage.

  • Seamless auto-continuation with no user intervention required
  • Post-completion recommendation of the ideal max-tokens value (1.25× actual usage)
  • Warning with /options max-tokens instructions if still incomplete after 3 continuations
  • New /debug command — prints a diagnostic report (runtime, config, API key status, MCP servers, RAG stats) with no secrets leaked

Specialist Overlap Detection & Auto-Creation new

Smarter specialist suggestions powered by token-based similarity scoring, plus a new auto-creation mode that saves high-confidence candidates without manual review.

  • Jaccard similarity across name, description, system prompt, and guidelines (weighted)
  • Candidates with >60% overlap are suppressed; partial overlaps suggest enhancing the existing specialist
  • New /agent-options command to toggle auto-creation and set confidence threshold at runtime
  • BERNARD_AUTO_CREATE_SPECIALISTS and BERNARD_AUTO_CREATE_THRESHOLD environment variables
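
Token-based Jaccard similarity divides the number of shared tokens by the size of the union. A minimal sketch of a weighted version across the four fields follows. The weights are placeholders, since the notes do not give the real ones, and the Profile type and function names are hypothetical.

```typescript
type Profile = { name: string; description: string; systemPrompt: string; guidelines: string };

// Lowercase and split on non-alphanumerics to get a token set.
const tokens = (s: string): Set<string> =>
  new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));

// |A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((t) => { if (b.has(t)) shared++; });
  return shared / (a.size + b.size - shared);
}

// Placeholder weights that sum to 1; the real weighting is not documented here.
const WEIGHTS: Record<keyof Profile, number> = {
  name: 0.2, description: 0.3, systemPrompt: 0.3, guidelines: 0.2,
};

function overlap(a: Profile, b: Profile): number {
  return (Object.keys(WEIGHTS) as (keyof Profile)[]).reduce(
    (sum, field) => sum + WEIGHTS[field] * jaccard(tokens(a[field]), tokens(b[field])),
    0,
  );
}
```

A candidate whose overlap with an existing specialist exceeds 0.6 would then be suppressed, matching the >60% rule above.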

Configurable Loop Limit & Default Planning enhanced

Control how many tool-call iterations the agent can take per request, and benefit from step-by-step planning in every conversation — not just critic mode.

  • BERNARD_MAX_STEPS environment variable (default 25) with runtime override via /options max-steps
  • Sub-agents automatically receive 50% of the main agent’s step budget
  • When the step limit is hit, Bernard offers to double it for the current session
  • System prompt now includes a planning section by default, improving multi-step task execution

Timestamp Awareness enhanced

Bernard now has full temporal context — every user message carries an ISO 8601 timestamp, and the system prompt includes the current date and time.

  • ISO 8601 timestamp prefixes on every user message for precise time context
  • System prompt upgraded from “Today’s date” to “Current date and time”
  • time tool output now includes seconds and timezone
  • Bug fix: MCP URL transport correctly uses HTTP for non-SSE endpoints (was always defaulting to SSE)

v0.7.0 2026-03-13

File Manipulation Tools new

Two new tools — file_read_lines and file_edit_lines — give Bernard precision file editing without shelling out to sed or awk.

  • Line-based editing with replace, insert, delete, and append operations
  • Atomic writes — all edits in a single call succeed or all fail
  • Binary file detection and pagination for large files (50 MB cap)
  • file_read_lines supports offset/limit for targeted reads across big files
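
The all-or-nothing behaviour can be illustrated in memory: apply every edit to a copy and return it only if all edits succeed. The Edit shape and applyEdits helper below are assumptions for this sketch. The real file_edit_lines schema is not shown in these notes, and edits here are applied sequentially against the current state.

```typescript
// Hypothetical edit shape; lines are 1-indexed.
type Edit =
  | { op: "replace"; line: number; text: string }
  | { op: "insert"; line: number; text: string } // insert before `line`
  | { op: "delete"; line: number }
  | { op: "append"; text: string };

// Apply edits to a copy. Any out-of-range edit throws before the copy is
// returned, so the caller never sees a partially edited result.
function applyEdits(lines: readonly string[], edits: Edit[]): string[] {
  const out = [...lines];
  for (const edit of edits) {
    if (edit.op === "append") { out.push(edit.text); continue; }
    const idx = edit.line - 1;
    if (idx < 0 || idx >= out.length) throw new RangeError(`line ${edit.line} out of range`);
    if (edit.op === "replace") out[idx] = edit.text;
    else if (edit.op === "insert") out.splice(idx, 0, edit.text);
    else out.splice(idx, 1);
  }
  return out;
}
```

On disk, the same guarantee is commonly obtained by writing the result to a temporary file and renaming it over the original.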

Per-Agent Model Selection new

Override the provider and model for any specialist, sub-agent, or task invocation. Set a default model on specialists at creation time, or pass an override at call time.

  • Specialists store a default provider/model that is used every time they run
  • Sub-agents and tasks accept per-invocation provider/model overrides
  • Cross-provider safety — auto-resolves the default model when switching providers
  • Model tags shown in /specialists listings

Deterministic Single-Step Tasks enhanced

Tasks are now single-step: one LLM call with tool use, then structured output. Saved tasks can be invoked by ID, and the REPL shows tasks and routines separately.

  • Reduced to maxSteps: 2 (one tool-use round + structured result)
  • New taskId parameter — run saved task routines by ID or via /task-{id} in the REPL
  • Separate display for tasks vs routines in the REPL
  • Auto-context injection — working directory, available tools, memory, and RAG are included automatically

v0.6.2 2026-03-12

Critic Visibility — Adaptive Truncation fix

Tool-result summaries shown to the critic now use an adaptive 8000-char budget instead of a fixed 500-char slice, so the critic sees far more context before judging.

  • Adaptive 8000-char budget with a 500-char floor per tool result
  • Response truncation raised from 2000 to 4000 chars, args from 500 to 1000
  • Truncation markers (...) added to tool results
  • Critic prompt updated to not treat truncation as failure evidence
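
One way to read "adaptive budget with a floor" is an even split of the total across tool results that never drops below the minimum. The sketch below assumes that policy. The notes give only the 8000 and 500 figures and the "..." marker, so the splitting rule and the truncateForCritic name are guesses.

```typescript
const TOTAL_BUDGET = 8000; // chars shared by all tool results (from the notes)
const FLOOR = 500;         // minimum chars per tool result (from the notes)

// Assumed policy: split the budget evenly, never going below the floor.
function truncateForCritic(results: string[]): string[] {
  if (results.length === 0) return [];
  const perResult = Math.max(FLOOR, Math.floor(TOTAL_BUDGET / results.length));
  return results.map((r) => (r.length > perResult ? r.slice(0, perResult) + "..." : r));
}
```

With two results each gets 4000 characters. With twenty, the even share of 400 would fall below the floor, so each gets 500.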

Error Handling & Eventual Consistency fix

Agents no longer retry the exact same failed command, and external API mutations get a short delay before re-verification to account for eventual consistency.

  • No retrying the exact same failed command across all agent types and cron jobs
  • Wait 2–5s before re-verifying external API mutations

Specialist Candidate Filtering fix

Candidates that have already been saved as specialists are automatically filtered out of future suggestions.

  • Auto-marks candidates as accepted when a specialist is created by draft ID or name
  • Defensive filtering at startup and in /candidates output

v0.6.1 2026-03-11

Critic Retry Loop fix

Critic mode now retries up to 2 times when it detects issues, feeding its verdict back to the agent for correction before giving up.

  • Up to 2 automatic retries when critic returns WARN or FAIL
  • Critic feedback is injected into conversation history so the agent can self-correct
  • Improved verdict parsing using regex-based extraction
  • REPL prompt now shows a diamond symbol instead of [CRITIC] label
  • New parseCriticVerdict utility extracted to output module
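
Regex-based verdict extraction could look like the sketch below. It is a stand-in for illustration, not the parseCriticVerdict utility itself, and the parseVerdict name and return type are assumptions.

```typescript
type Verdict = "PASS" | "WARN" | "FAIL";

// Return the first standalone verdict keyword found in the critic's text.
function parseVerdict(text: string): Verdict | undefined {
  const match = /\b(PASS|WARN|FAIL)\b/.exec(text);
  return match ? (match[1] as Verdict) : undefined;
}
```

The word boundaries keep words such as "PASSING" from being mistaken for a verdict.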

/create-task Command fix

Added a guided /create-task command and refactored the duplicated guided-creation logic into a shared helper.

  • New /create-task REPL command for guided task routine creation
  • Refactored duplicate guided creation logic into a shared runGuidedCreation helper
  • Help menu now lists /routines, /create-routine, and /create-task commands

Specialist Auto-Dispatch fix

Specialists are now automatically matched to user input using keyword scoring, with high-confidence matches auto-dispatched.

  • Keyword-based scoring matches user input against saved specialists
  • Scores ≥ 0.8 trigger automatic dispatch; 0.4–0.8 prompt for confirmation
  • Match advisory injected into system prompt so the agent knows which specialists fit
  • Stop-word filtering prevents common words from inflating scores
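
The two thresholds above define three bands. A minimal sketch, assuming a score equal to the fraction of a specialist's keywords found in the input. The scoring formula, the stop-word list, and both function names are assumptions; only the 0.8 and 0.4 cut-offs come from the notes.

```typescript
type Decision = "auto-dispatch" | "confirm" | "ignore";

// Tiny illustrative stop-word list; the real list is not documented here.
const STOP_WORDS = new Set(["the", "a", "an", "to", "of", "and", "in", "for"]);

// Fraction of the specialist's non-stop-word keywords present in the input.
function keywordScore(input: string, keywords: string[]): number {
  const words = new Set(
    input.toLowerCase().split(/\W+/).filter((w) => w.length > 0 && !STOP_WORDS.has(w)),
  );
  const useful = keywords.map((k) => k.toLowerCase()).filter((k) => !STOP_WORDS.has(k));
  if (useful.length === 0) return 0;
  return useful.filter((k) => words.has(k)).length / useful.length;
}

// Map a score onto the three bands described in the notes.
function decide(score: number): Decision {
  if (score >= 0.8) return "auto-dispatch";
  if (score >= 0.4) return "confirm";
  return "ignore";
}
```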

v0.6.0 2026-03-10

Tasks new

Tasks are isolated, focused executions that return structured JSON output. Unlike sub-agents (which return free-form text), tasks always produce a {status, output, details?} response — making them ideal for machine-readable results, routine chaining, and conditional branching.

  • 5-step budget — tasks are meant to be quick and focused
  • Structured JSON output — always returns {status: "success"|"error", output, details?}
  • Completely isolated from the current session — no conversation history
  • Available as both a tool and a command via /task
  • Shared concurrency pool with sub-agents (4 slots max)

bernard
bernard> /task List all TypeScript files in src/
┌─ task — List all TypeScript files in src/
  ▶ shell: find src -name "*.ts" -type f
└─ task success: Found 23 .ts files

// The agent can also call tasks during routines:
bernard> run the deploy routine
  ▶ task: "Check test suite passes"
  ┌─ task — Check test suite passes
    ▶ shell: npm test
  └─ task success: All 994 tests passed
  ▶ shell: git push origin main

Specialists new

Specialists are reusable expert profiles — persistent personas with custom system prompts and behavioral guidelines that shape how a sub-agent approaches work. Unlike routines (which define what steps to follow), specialists define how to work.

  • Each specialist run gets its own generateText loop with a 10-step budget
  • One JSON file per specialist in ~/.local/share/bernard/specialists/
  • Manage via /specialists or invoke directly with /{id}
  • Up to 50 specialists with lowercase kebab-case IDs

bernard
bernard> create a specialist called "code-reviewer" that reviews code for correctness, style, and security
  ▶ specialist: create { id: "code-reviewer", name: "Code Reviewer", … }
   Specialist "Code Reviewer" (code-reviewer) created.

bernard> /code-reviewer review the changes in src/agent.ts
┌─ spec:1 [Code Reviewer] — review the changes in src/agent.ts
  ▶ shell: git diff src/agent.ts
└─ spec:1 done

// List all saved specialists:
bernard> /specialists

Specialist Suggestions new

Bernard automatically detects recurring delegation patterns in your conversations and suggests new specialists. Detection runs in the background when you exit a session or use /clear --save.

  • Review pending suggestions with /candidates
  • Each candidate includes a name, description, confidence score, and reasoning
  • Accept or reject candidates conversationally
  • Auto-dismissed after 30 days if not reviewed; up to 10 pending at a time

bernard
// On startup, Bernard notifies you of pending suggestions:
  2 specialist suggestion(s) pending. Use /candidates to review.

bernard> /candidates

  1. code-reviewer (confidence: 0.85)
    Reviews PRs for correctness, style, and security issues

  2. db-migrator (confidence: 0.72)
    Manages database schema changes and migration scripts

bernard> accept the code-reviewer candidate
   Specialist "Code Reviewer" (code-reviewer) created.

Critic Mode new

Critic mode adds planning, proactive scratch/memory usage, and post-response verification. Recommended for high-stakes work like deployments, git operations, and multi-file edits.

  • Planning: writes a plan to scratch before multi-step tasks
  • Proactive scratch: accumulates findings during complex work
  • Verification: a critic agent reviews the work and prints a verdict (PASS / WARN / FAIL)

bernard
bernard> /critic on
   Critic mode enabled.

bernard> refactor the auth module to use JWT
  ▶ scratch: write plan
  ▶ shell: cat src/auth.ts
  ▶ write: src/auth.ts
  ▶ shell: npm test

  ┌─ critic verdict
  │ PASS — All claimed edits match tool calls.
  │ Tests pass after refactor. No issues found.
  └─

bernard> /critic off
  Critic mode disabled.

Conversation Summaries new

Conversation summaries are a fourth specialized domain in Bernard's RAG memory system, joining Tool Usage Patterns, User Preferences, and General Knowledge. They capture what was discussed, approaches taken, tools/specialists/routines used, and outcomes.

  • Extracts high-level summaries of what was accomplished each session
  • Helps identify recurring patterns across conversations
  • All four domain extractors run in parallel — no extra latency
  • Search returns up to 5 results per domain (15 total max), preventing any single domain from crowding out others

bernard
// After a session, Bernard extracts memories by domain:
  ▶ Extracting facts…
    tool-usage: 3 facts
    user-preferences: 1 fact
    general: 2 facts
    conversations: 1 fact

// In a future session, recalled context is organized by domain:
bernard> /rag
  RAG Memory: 142 facts across 4 domains
    tool-usage       48 facts
    user-preferences 31 facts
    general          41 facts
    conversations    22 facts

Earlier releases — Routines, the multi-provider switch, and the first sub-agent loop — predate this changelog format and aren't reproduced here. See the git history for anything before v0.6.0.