Release Notes

All Bernard releases, newest first.

Milestone release

v0.9.0 2026-05-07 Bernard learns to look things up.

The largest update since launch. v0.9.0 reshapes how Bernard handles ambiguity: references resolve against memory before each turn, unknown entities trigger a read-only tool lookup, and prompts get rewritten for the active model family. A new coordinator (ReAct) mode adds iterative reasoning with verifiable plans, the UI moves to arrow-key menus, and a tool augmentation layer learns from every error your tools throw.

16 PRs merged
14 user-facing features
7 new env vars
5 bundled specialists

Smarter conversations new

Bernard now understands references the same way a human collaborator does. Before each turn, a reference resolver scans your message for named entities ("my brother", "the staging cluster") and looks them up in persistent memory. If the answer is missing, Bernard tries one read-only tool (e.g. a Google Contacts MCP) before asking you, then offers a Save / Edit / Skip menu so the answer sticks. A model-specific prompt rewriter then reshapes the message for the active model family, and a context-gathering protocol keeps the agent from inventing numbers it could have just looked up.

bernard
bernard> email my brother about the trip
// resolver: "my brother" not in memory — trying lookup
  ▶ google_contacts__search { query: "brother" }
  └─ found: Daniel Pereira <daniel@example.com>

? Save this for next time?
  ❯ Save   Edit  Skip

  ▶ gmail__send_email { to: "daniel@example.com", … }
   sent
  • Reference resolver with four outcomes: resolved, ambiguous, unknown, no-op (#128)
  • Reference tool-lookup pass with read-only allowlist + 5 s timeout (#138)
  • Model-specific prompt rewriter using ModelProfile.rewriterHint, fail-open (#125, #131)
  • Context-gathering protocol baked into the base system prompt (#126)

Better reasoning new

Coordinator (ReAct) mode turns Bernard into an iterative thinker. With BERNARD_REACT_MODE=true, every turn runs a think → act → evaluate → decide loop with new plan, think, and evaluate tools. The step budget triples (capped at 150) so multi-step plans actually finish, and a post-run verifier re-checks the plan if any step is still unresolved. Turn it on for hard tasks; leave it off for chat — you can flip the toggle live in /agent-options.

bernard
bernard> /agent-options
  ReAct (coordinator) mode  [on]

bernard> ship a release: build, test, tag, push
  ▶ plan: 4 steps drafted
  ▶ think: tests must pass before tagging
  ▶ shell: npm test — 2074 passed
  ▶ evaluate: step 1/4 done, advancing
  … (think → act → evaluate, repeated)
   all 4 steps verified
  • Coordinator / ReAct loop with plan, think, evaluate tools (#117)
  • Per-turn step budget grows to min(BERNARD_MAX_STEPS × 3, 150) when ReAct is on
  • Post-run activity logs and plan verification with up to 2 enforcement re-prompts (#137)

New UI surface new

Menus are now arrow-key driven by default. Bernard pins the menu region with setPinnedRegion, you move with arrows or digits 1–9, and Enter commits. Non-TTY environments and BERNARD_PLAIN_MENU=1 still get the classic numbered prompt as a fallback. /image <path> [prompt] attaches an image to the next turn, and a /options tool-details toggle collapses the per-call tool noise into compact one-liners.

bernard
bernard> /image ./screenshot.png describe this UI
// attached: screenshot.png (412 KB)
  ▶ reading image …
  └─ A dark dashboard with three tabs: Memory, Cron, MCP. The Memory tab…

bernard> /theme
  ? Pick a theme — ↑/↓ Enter
  ❯ bernard
    ocean
    forest
    synthwave
  • Arrow-key menu component with BERNARD_PLAIN_MENU fallback (#113, #132)
  • Image attachment via /image on vision-capable models (#112)
  • /options tool-details compact / verbose toggle — BERNARD_TOOL_DETAILS (#119)

Operators & infra new

A new tool augmentation layer wraps every tool's execute to observe errors and patch the system prompt with concrete bad-example fixes — no hallucination, only real observed failures. Shell commands are split into sub-categories (shell.git, shell.gh, shell.docker, shell.npm, shell.fs, shell.http) so guidance stays scoped. Cron jobs gained persistent notes: the daemon injects cron_notes_read / cron_notes_write scoped to the current job so long-running schedules don't redo work after restarts.

bernard
// In a cron job, Bernard reads notes from the previous run:
  ▶ cron_notes_read — 3 entries
    1. last sync cursor: 2026-05-07T03:14Z
    2. skip account #482 — flagged for review
    3. retry budget reset to 0
  ▶ sync_accounts { since: "2026-05-07T03:14Z" }
  ▶ cron_notes_write { entry: "synced 12 accounts, cursor → …" }
   run finished — notes saved
  • Tool augmentation layer with auto-learned bad-example fixes — tool-wrapper specialists (shell-wrapper, file-wrapper, web-wrapper), correction agent, and specialist creator (#111)
  • Cron persistent notes: per-job JSON store, 100-entry cap, atomic writes, and a ## Notes written during this run section in cron_logs_get (#130)

Reliability & polish fix

  • Auto-create startup race fixed — pending candidates are promoted on enable (#135)
  • Critic and ReAct toggles now have round-trip test coverage (#129)
  • MCP-tool schema extraction handles both jsonSchema and the _jsonSchema shape produced by @ai-sdk/ui-utils
  • Reference-lookup timeout enforced via Promise.race against a real timer — resilient to tools that ignore abortSignal

Also in this release

Smaller-but-load-bearing pieces that ship with v0.9.0 — many of them power the headline themes above.

Bundled specialists

Five tool-wrapper specialists ship out of the box and seed on first run: shell-wrapper, file-wrapper, web-wrapper, correction-agent, and specialist-creator. A .seeded-v1 marker prevents re-seeding so user edits stick.

tool_wrapper_run dispatch

The main agent talks to specialists through one strict-typed dispatch tool. Each call is isolated to its targetTools — the shell-wrapper can't accidentally browse the web. Output is guaranteed JSON: {status, result, error?, reasoning?}.

web_search with provider chain

A new built-in web_search tool tries Brave first (BRAVE_API_KEY), falls back to Tavily (TAVILY_API_KEY), and finally to a DuckDuckGo scrape so search keeps working without keys. Returns [{title, url, snippet}].

Correction loop with validation

When a tool-wrapper returns status: 'error', the args land in a candidate queue. At REPL shutdown the correction-agent proposes a fix and actually executes it via tool_wrapper_run. Only validated fixes get appended to the specialist's good/bad examples (capped at 10 each).

Reasoning logs

Every tool-wrapper run appends one JSONL entry to logs/tool-wrappers.jsonl. Easy to jq, easy to grep, easy to ship to a log aggregator. Pairs naturally with the new post-run activity logs to give you a full audit trail of what Bernard actually did.

OS-aware tool prompts

osPromptBlock() detects the host OS and injects shell hints (Linux, macOS, Windows) into every tool-wrapper system prompt — so the shell-wrapper doesn't suggest brew on Ubuntu or GNU-only flags on macOS.

XDG paths everywhere

All file paths now follow the XDG Base Directory Specification, centralized in src/paths.ts. Set BERNARD_HOME for the legacy flat layout. Existing ~/.bernard/ installs auto-migrate on first run and leave a MIGRATED marker behind.

Sub-agent context guard

BERNARD_SUBAGENT_RESULT_MAX_CHARS (default 4000) caps how much of a sub-agent or specialist's output gets fed back into the parent context. The user still sees the full output in the terminal — only the parent's memory budget gets protected.

v0.8.1 2026-04-06

Proportional Task Step Budget fix

Tasks now receive a proportional fraction of the available step budget instead of a fixed step count, and the final step forces text-only output to guarantee structured JSON results.

  • Introduced TASK_STEP_RATIO, getTaskMaxSteps, and makeLastStepTextOnly utilities in task.ts
  • Overhauled wrapTaskResult to robustly extract and validate JSON from arbitrary output, handling nested objects, escaped quotes, and multiple JSON blocks
  • Updated REPL and task tool to pass correct maxSteps and experimental_prepareStep options
  • Expanded test coverage for result extraction edge cases

Threshold Normalization for Auto-Create Agents fix

The auto-create threshold now accepts both fractional (0–1) and percentage (1–100) values, and pending specialist candidates are re-evaluated automatically when settings change.

  • Added normalizeThreshold utility in config.ts that clamps values to the [0, 1] range regardless of input format
  • Updated /agent-options threshold command to display both normalized and percentage values
  • Refactored auto-creation logic into a reusable promoteCandidate helper
  • Enabling auto-create now re-evaluates and promotes pending candidates above the threshold

v0.8.0 2026-03-27

Per-Agent Plan-Act-Critic System new

A reusable Plan-Act-Critic (PAC) loop now wraps sub-agents, specialists, and cron job executions when critic mode is enabled. The critic verifies each agent’s work and retries on failure with targeted feedback.

  • Extracted pac.ts module with runPACLoop() for consistent critic integration
  • Sub-agents, specialists, and cron jobs all use the same PAC pipeline automatically
  • On a FAIL verdict, the task is retried with the critic’s feedback (up to 2 retries)
  • Compact critic verdicts — PASS and WARN are now single-line inline, reducing terminal noise while keeping full detail for FAIL

Auto-Continue on Truncation new

When a response hits the max-tokens limit and is cut off, Bernard automatically continues where it left off — up to 3 continuations — then recommends an optimal token limit based on actual usage.

  • Seamless auto-continuation with no user intervention required
  • Post-completion recommendation of the ideal max-tokens value (1.25× actual usage)
  • Warning with /options max-tokens instructions if still incomplete after 3 continuations
  • New /debug command — prints a diagnostic report (runtime, config, API key status, MCP servers, RAG stats) with no secrets leaked

Specialist Overlap Detection & Auto-Creation new

Smarter specialist suggestions powered by token-based similarity scoring, plus a new auto-creation mode that saves high-confidence candidates without manual review.

  • Jaccard similarity across name, description, system prompt, and guidelines (weighted)
  • Candidates with >60% overlap are suppressed; partial overlaps suggest enhancing the existing specialist
  • New /agent-options command to toggle auto-creation and set confidence threshold at runtime
  • BERNARD_AUTO_CREATE_SPECIALISTS and BERNARD_AUTO_CREATE_THRESHOLD environment variables

Configurable Loop Limit & Default Planning enhanced

Control how many tool-call iterations the agent can take per request, and benefit from step-by-step planning in every conversation — not just critic mode.

  • BERNARD_MAX_STEPS environment variable (default 25) with runtime override via /options max-steps
  • Sub-agents automatically receive 50% of the main agent’s step budget
  • When the step limit is hit, Bernard offers to double it for the current session
  • System prompt now includes a planning section by default, improving multi-step task execution

Timestamp Awareness enhanced

Bernard now has full temporal context — every user message carries an ISO 8601 timestamp, and the system prompt includes the current date and time.

  • ISO 8601 timestamp prefixes on every user message for precise time context
  • System prompt upgraded from “Today’s date” to “Current date and time”
  • time tool output now includes seconds and timezone
  • Bug fix: MCP URL transport correctly uses HTTP for non-SSE endpoints (was always defaulting to SSE)

v0.7.0 2026-03-13

File Manipulation Tools new

Two new tools — file_read_lines and file_edit_lines — give Bernard precision file editing without shelling out to sed or awk.

  • Line-based editing with replace, insert, delete, and append operations
  • Atomic writes — all edits in a single call succeed or all fail
  • Binary file detection and pagination for large files (50 MB cap)
  • file_read_lines supports offset/limit for targeted reads across big files

Per-Agent Model Selection new

Override the provider and model for any specialist, sub-agent, or task invocation. Set a default model on specialists at creation time, or pass an override at call time.

  • Specialists store a default provider/model that is used every time they run
  • Sub-agents and tasks accept per-invocation provider/model overrides
  • Cross-provider safety — auto-resolves the default model when switching providers
  • Model tags shown in /specialists listings

Deterministic Single-Step Tasks enhanced

Tasks are now single-step: one LLM call with tool use, then structured output. Saved tasks can be invoked by ID, and the REPL shows tasks and routines separately.

  • Reduced to maxSteps: 2 (one tool-use round + structured result)
  • New taskId parameter — run saved task routines by ID or via /task-{id} in the REPL
  • Separate display for tasks vs routines in the REPL
  • Auto-context injection — working directory, available tools, memory, and RAG are included automatically

v0.6.2 2026-03-12

Critic Visibility — Adaptive Truncation fix

Tool-result summaries shown to the critic now use an adaptive 8000-char budget instead of a fixed 500-char slice, so the critic sees far more context before judging.

  • Adaptive 8000-char budget with a 500-char floor per tool result
  • Response truncation raised from 2000 to 4000 chars, args from 500 to 1000
  • Truncation markers (...) added to tool results
  • Critic prompt updated to not treat truncation as failure evidence

Error Handling & Eventual Consistency fix

Agents no longer retry the exact same failed command, and external API mutations get a short delay before re-verification to account for eventual consistency.

  • No retrying the exact same failed command across all agent types and cron jobs
  • Wait 2–5s before re-verifying external API mutations

Specialist Candidate Filtering fix

Candidates that have already been saved as specialists are automatically filtered out of future suggestions.

  • Auto-marks candidates as accepted when a specialist is created by draft ID or name
  • Defensive filtering at startup and in /candidates output

v0.6.1 2026-03-11

Critic Retry Loop fix

Critic mode now retries up to 2 times when it detects issues, feeding its verdict back to the agent for correction before giving up.

  • Up to 2 automatic retries when critic returns WARN or FAIL
  • Critic feedback is injected into conversation history so the agent can self-correct
  • Improved verdict parsing using regex-based extraction
  • REPL prompt now shows a diamond symbol instead of [CRITIC] label
  • New parseCriticVerdict utility extracted to output module

/create-task Command fix

Added /create-task guided creation command and refactored duplicate creation logic.

  • New /create-task REPL command for guided task routine creation
  • Refactored duplicate guided creation logic into a shared runGuidedCreation helper
  • Help menu now lists /routines, /create-routine, and /create-task commands

Specialist Auto-Dispatch fix

Specialists are now automatically matched to user input using keyword scoring, with high-confidence matches auto-dispatched.

  • Keyword-based scoring matches user input against saved specialists
  • Scores ≥ 0.8 trigger automatic dispatch; 0.4–0.8 prompt for confirmation
  • Match advisory injected into system prompt so the agent knows which specialists fit
  • Stop-word filtering prevents common words from inflating scores

v0.6.0 2026-03-10

Tasks new

Tasks are isolated, focused executions that return structured JSON output. Unlike sub-agents (which return free-form text), tasks always produce a {status, output, details?} response — making them ideal for machine-readable results, routine chaining, and conditional branching.

  • 5-step budget — tasks are meant to be quick and focused
  • Structured JSON output — always returns {status: "success"|"error", output, details?}
  • Completely isolated from the current session — no conversation history
  • Available as both a tool and a command via /task
  • Shared concurrency pool with sub-agents (4 slots max)
bernard
bernard> /task List all TypeScript files in src/
┌─ task — List all TypeScript files in src/
  ▶ shell: find src -name "*.ts" -type f
└─ task success: Found 23 .ts files

// The agent can also call tasks during routines:
bernard> run the deploy routine
  ▶ task: "Check test suite passes"
  ┌─ task — Check test suite passes
    ▶ shell: npm test
  └─ task success: All 994 tests passed
  ▶ shell: git push origin main

Specialists new

Specialists are reusable expert profiles — persistent personas with custom system prompts and behavioral guidelines that shape how a sub-agent approaches work. Unlike routines (which define what steps to follow), specialists define how to work.

  • Each specialist run gets its own generateText loop with a 10-step budget
  • One JSON file per specialist in ~/.local/share/bernard/specialists/
  • Manage via /specialists or invoke directly with /{id}
  • Up to 50 specialists with lowercase kebab-case IDs
bernard
bernard> create a specialist called "code-reviewer" that reviews code for correctness, style, and security
  ▶ specialist: create { id: "code-reviewer", name: "Code Reviewer", … }
   Specialist "Code Reviewer" (code-reviewer) created.

bernard> /code-reviewer review the changes in src/agent.ts
┌─ spec:1 [Code Reviewer] — review the changes in src/agent.ts
  ▶ shell: git diff src/agent.ts
└─ spec:1 done

// List all saved specialists:
bernard> /specialists

Specialist Suggestions new

Bernard automatically detects recurring delegation patterns in your conversations and suggests new specialists. Detection runs in the background when you exit a session or use /clear --save.

  • Review pending suggestions with /candidates
  • Each candidate includes a name, description, confidence score, and reasoning
  • Accept or reject candidates conversationally
  • Auto-dismissed after 30 days if not reviewed; up to 10 pending at a time
bernard
// On startup, Bernard notifies you of pending suggestions:
  2 specialist suggestion(s) pending. Use /candidates to review.

bernard> /candidates

  1. code-reviewer (confidence: 0.85)
    Reviews PRs for correctness, style, and security issues

  2. db-migrator (confidence: 0.72)
    Manages database schema changes and migration scripts

bernard> accept the code-reviewer candidate
   Specialist "Code Reviewer" (code-reviewer) created.

Critic Mode new

Critic mode adds planning, proactive scratch/memory usage, and post-response verification. Recommended for high-stakes work like deployments, git operations, and multi-file edits.

  • Planning — writes a plan to scratch before multi-step tasks
  • Proactive scratch — accumulates findings during complex work
  • Verification — a critic agent reviews the work and prints a verdict (PASS / WARN / FAIL)
bernard
bernard> /critic on
   Critic mode enabled.

bernard> refactor the auth module to use JWT
  ▶ scratch: write plan
  ▶ shell: cat src/auth.ts
  ▶ write: src/auth.ts
  ▶ shell: npm test

  ┌─ critic verdict
  │ PASS — All claimed edits match tool calls.
  │ Tests pass after refactor. No issues found.
  └─

bernard> /critic off
  Critic mode disabled.

Conversation Summaries new

A fourth specialized domain in Bernard's RAG memory system, joining Tool Usage Patterns, User Preferences, and General Knowledge. Conversation summaries capture what was discussed, approaches taken, tools/specialists/routines used, and outcomes.

  • Extracts high-level summaries of what was accomplished each session
  • Helps identify recurring patterns across conversations
  • All four domain extractors run in parallel — no extra latency
  • Search returns up to 5 results per domain (15 total max), preventing any single domain from crowding out others
bernard
// After a session, Bernard extracts memories by domain:
  ▶ Extracting facts…
    tool-usage: 3 facts
    user-preferences: 1 fact
    general: 2 facts
    conversations: 1 fact

// In a future session, recalled context is organized by domain:
bernard> /rag
  RAG Memory: 142 facts across 4 domains
    tool-usage       48 facts
    user-preferences 31 facts
    general          41 facts
    conversations    22 facts

Earlier releases — Routines, the multi-provider switch, and the first sub-agent loop — predate this changelog format and aren't reproduced here. See the git history for anything before v0.6.0.