Release Notes

Milestone release

v0.9.0 2026-05-07 Bernard learns to look things up.

The largest update since launch. v0.9.0 reshapes how Bernard handles ambiguity: references resolve against memory before each turn, unknown entities trigger a read-only tool lookup, and prompts get rewritten for the active model family. A new coordinator (ReAct) mode adds iterative reasoning with verifiable plans, the UI moves to arrow-key menus, and a tool augmentation layer learns from every error your tools throw.

16 PRs merged

14 user-facing features

7 new env vars

5 bundled specialists

Smarter conversations new

Bernard now understands references the same way a human collaborator does. Before each turn, a reference resolver scans your message for named entities ("my brother", "the staging cluster") and looks them up in persistent memory. If the answer is missing, Bernard tries one read-only tool (e.g. a Google Contacts MCP) before asking you, then offers a Save / Edit / Skip menu so the answer sticks. A model-specific prompt rewriter then reshapes the message for the active model family, and a context-gathering protocol keeps the agent from inventing numbers it could have just looked up.

bernard

bernard> email my brother about the trip
// resolver: "my brother" not in memory — trying lookup
  ▶ google_contacts__search { query: "brother" }
  └─ found: Daniel Pereira <daniel@example.com>

? Save this for next time?
  ❯ Save   Edit  Skip

  ▶ gmail__send_email { to: "daniel@example.com", … }
  ✓ sent

Reference resolver with four outcomes: resolved, ambiguous, unknown, no-op (#128)
Reference tool-lookup pass with read-only allowlist + 5 s timeout (#138)
Model-specific prompt rewriter using ModelProfile.rewriterHint, fail-open (#125, #131)
Context-gathering protocol baked into the base system prompt (#126)
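
The four resolver outcomes listed above map naturally onto a discriminated union. The following is a minimal TypeScript sketch, not Bernard's actual implementation: the names `ResolverOutcome`, `resolveReference`, and `knownReferences`, and the map-based memory shape, are all assumptions made for illustration.

```typescript
// Hypothetical model of the four outcomes: resolved, ambiguous, unknown, no-op.
type ResolverOutcome =
  | { kind: "resolved"; phrase: string; value: string }
  | { kind: "ambiguous"; phrase: string; candidates: string[] }
  | { kind: "unknown"; phrase: string }
  | { kind: "no-op" };

// Assumed memory shape: each remembered phrase maps to one or more values.
type ReferenceMemory = Map<string, string[]>;

function resolveReference(phrase: string | null, memory: ReferenceMemory): ResolverOutcome {
  // No named entity in the message: nothing to resolve.
  if (phrase === null) return { kind: "no-op" };

  const matches = memory.get(phrase.toLowerCase()) ?? [];
  if (matches.length === 1) return { kind: "resolved", phrase, value: matches[0] };
  if (matches.length > 1) return { kind: "ambiguous", phrase, candidates: matches };

  // Not in memory: the caller may try one read-only tool lookup, then ask the user.
  return { kind: "unknown", phrase };
}

// Illustrative data only.
const knownReferences: ReferenceMemory = new Map([
  ["the staging cluster", ["staging-eu-1"]],
  ["my sister", ["Ana", "Bea"]],
]);

console.log(resolveReference("my brother", knownReferences).kind); // "unknown"
```

Keeping `unknown` distinct from `ambiguous` lets the caller decide between a tool lookup and a clarifying question without re-inspecting memory.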

Better reasoning new

Coordinator (ReAct) mode turns Bernard into an iterative thinker. With BERNARD_REACT_MODE=true, every turn runs a think → act → evaluate → decide loop with new plan, think, and evaluate tools. The step budget triples (capped at 150) so multi-step plans actually finish, and a post-run verifier re-checks the plan if any step is still unresolved. Turn it on for hard tasks; leave it off for chat — you can flip the toggle live in /agent-options.

bernard

bernard> /agent-options
  ReAct (coordinator) mode  [on]

bernard> ship a release: build, test, tag, push
  ▶ plan: 4 steps drafted
  ▶ think: tests must pass before tagging
  ▶ shell: npm test — 2074 passed
  ▶ evaluate: step 1/4 done, advancing
  … (think → act → evaluate, repeated)
  ✓ all 4 steps verified

Coordinator / ReAct loop with plan, think, evaluate tools (#117)
Per-turn step budget grows to min(BERNARD_MAX_STEPS × 3, 150) when ReAct is on
Post-run activity logs and plan verification with up to 2 enforcement re-prompts (#137)
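
The budget formula above is simple enough to check by hand. A small sketch, assuming a hypothetical helper named `effectiveStepBudget` (the name and constants are illustrative; only the formula comes from these notes):

```typescript
// Mirrors the documented formula: min(BERNARD_MAX_STEPS × 3, 150) when ReAct is on.
const REACT_MULTIPLIER = 3;
const REACT_STEP_CAP = 150;

function effectiveStepBudget(maxSteps: number, reactMode: boolean): number {
  if (!reactMode) return maxSteps; // ReAct off: the base budget applies unchanged
  return Math.min(maxSteps * REACT_MULTIPLIER, REACT_STEP_CAP);
}

console.log(effectiveStepBudget(25, true));  // 75
console.log(effectiveStepBudget(60, true));  // 150 (capped)
console.log(effectiveStepBudget(25, false)); // 25
```

With the default BERNARD_MAX_STEPS of 25, ReAct mode therefore runs with 75 steps per turn.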

New UI surface new

Menus are now arrow-key driven by default. Bernard pins the menu region with setPinnedRegion, you move with arrows or digits 1–9, and Enter commits. Non-TTY environments and BERNARD_PLAIN_MENU=1 still get the classic numbered prompt as a fallback. /image <path> [prompt] attaches an image to the next turn, and a /options tool-details toggle collapses the per-call tool noise into compact one-liners.

bernard

bernard> /image ./screenshot.png describe this UI
// attached: screenshot.png (412 KB)
  ▶ reading image …
  └─ A dark dashboard with three tabs: Memory, Cron, MCP. The Memory tab…

bernard> /theme
  ? Pick a theme — ↑/↓ Enter
  ❯ bernard
    ocean
    forest
    synthwave

Arrow-key menu component with BERNARD_PLAIN_MENU fallback (#113, #132)
Image attachment via /image on vision-capable models (#112)
/options tool-details compact / verbose toggle — BERNARD_TOOL_DETAILS (#119)

Operators & infra new

A new tool augmentation layer wraps every tool's execute function to observe errors and patch the system prompt with concrete bad-example fixes. The fixes come only from failures it actually observed, never from invented ones. Shell commands are split into sub-categories (shell.git, shell.gh, shell.docker, shell.npm, shell.fs, shell.http) so guidance stays scoped. Cron jobs gained persistent notes: the daemon injects cron_notes_read / cron_notes_write scoped to the current job so long-running schedules don't redo work after restarts.

bernard

// In a cron job, Bernard reads notes from the previous run:
  ▶ cron_notes_read — 3 entries
    1. last sync cursor: 2026-05-07T03:14Z
    2. skip account #482 — flagged for review
    3. retry budget reset to 0
  ▶ sync_accounts { since: "2026-05-07T03:14Z" }
  ▶ cron_notes_write { entry: "synced 12 accounts, cursor → …" }
  ✓ run finished — notes saved

Tool augmentation layer with auto-learned bad-example fixes — tool-wrapper specialists (shell-wrapper, file-wrapper, web-wrapper), correction agent, and specialist creator (#111)
Cron persistent notes: per-job JSON store, 100-entry cap, atomic writes, and a "## Notes written during this run" section in cron_logs_get (#130)

Reliability & polish fix

Auto-create startup race fixed — pending candidates are promoted on enable (#135)
Critic and ReAct toggles now have round-trip test coverage (#129)
MCP-tool schema extraction handles both jsonSchema and the _jsonSchema shape produced by @ai-sdk/ui-utils
Reference-lookup timeout enforced via Promise.race against a real timer — resilient to tools that ignore abortSignal
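
The timeout pattern in the last item can be sketched as follows. This is an illustrative helper, not Bernard's code: `withTimeout` is a hypothetical name, and the error message is an assumption. The point is that the timer rejects on its own schedule, so the caller stops waiting even when the wrapped tool never honors an abort signal (the tool's own work may still continue in the background).

```typescript
// Hypothetical helper: race the work against a real timer.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout]);
  } finally {
    // Clear the timer so a fast result does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}

// A lookup that never settles is cut off after 50 ms.
const stuckLookup = new Promise<string>(() => {});
withTimeout(stuckLookup, 50).catch((err: unknown) => {
  console.log(err instanceof Error ? err.message : String(err));
});
```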

Also in this release

Smaller-but-load-bearing pieces that ship with v0.9.0 — many of them power the headline themes above.

Bundled specialists

Five specialists ship out of the box and seed on first run: three tool wrappers (shell-wrapper, file-wrapper, web-wrapper) plus correction-agent and specialist-creator. A .seeded-v1 marker prevents re-seeding so user edits stick.

`tool_wrapper_run` dispatch

The main agent talks to specialists through one strict-typed dispatch tool. Each call is isolated to its targetTools — the shell-wrapper can't accidentally browse the web. Output is guaranteed JSON: {status, result, error?, reasoning?}.

`web_search` with provider chain

A new built-in web_search tool tries Brave first (BRAVE_API_KEY), falls back to Tavily (TAVILY_API_KEY), and finally to a DuckDuckGo scrape so search keeps working without keys. Returns [{title, url, snippet}].

Correction loop with validation

When a tool-wrapper returns status: 'error', the args land in a candidate queue. At REPL shutdown the correction-agent proposes a fix and actually executes it via tool_wrapper_run. Only validated fixes get appended to the specialist's good/bad examples (capped at 10 each).

Reasoning logs

Every tool-wrapper run appends one JSONL entry to logs/tool-wrappers.jsonl. Easy to jq, easy to grep, easy to ship to a log aggregator. Pairs naturally with the new post-run activity logs to give you a full audit trail of what Bernard actually did.

OS-aware tool prompts

osPromptBlock() detects the host OS and injects shell hints (Linux, macOS, Windows) into every tool-wrapper system prompt — so the shell-wrapper doesn't suggest brew on Ubuntu or GNU-only flags on macOS.

XDG paths everywhere

All file paths now follow the XDG Base Directory Specification, centralized in src/paths.ts. Set BERNARD_HOME for the legacy flat layout. Existing ~/.bernard/ installs auto-migrate on first run and leave a MIGRATED marker behind.

Sub-agent context guard

BERNARD_SUBAGENT_RESULT_MAX_CHARS (default 4000) caps how much of a sub-agent or specialist's output gets fed back into the parent context. The user still sees the full output in the terminal — only the parent's memory budget gets protected.
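
As a rough illustration of the guard, here is a sketch of a cap applied before a result is handed back to the parent. The helper name `capForParent` and the truncation marker are assumptions; only the 4000-character default comes from these notes.

```typescript
const DEFAULT_RESULT_MAX_CHARS = 4000; // documented default

// Hypothetical helper: cap what is fed back into the parent context.
function capForParent(output: string, maxChars: number = DEFAULT_RESULT_MAX_CHARS): string {
  if (output.length <= maxChars) return output;
  const omitted = output.length - maxChars;
  // The marker text is an assumption; the release notes do not specify one.
  return `${output.slice(0, maxChars)}\n[truncated ${omitted} chars]`;
}

console.log(capForParent("short result"));          // "short result"
console.log(capForParent("x".repeat(5000)).length); // 4022 (4000 chars + marker)
```

In this sketch the marker sits outside the budget, so the returned string is slightly longer than `maxChars`.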

v0.8.1 2026-04-06

Proportional Task Step Budget fix

Tasks now receive a proportional fraction of the available step budget instead of a fixed step count, and the final step forces text-only output to guarantee structured JSON results.

Introduced TASK_STEP_RATIO, getTaskMaxSteps, and makeLastStepTextOnly utilities in task.ts
Overhauled wrapTaskResult to robustly extract and validate JSON from arbitrary output, handling nested objects, escaped quotes, and multiple JSON blocks
Updated REPL and task tool to pass correct maxSteps and experimental_prepareStep options
Expanded test coverage for result extraction edge cases
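
To make the extraction problem concrete, here is one way to pull JSON objects out of free-form output while respecting nested objects, escaped quotes, and multiple blocks. It is a sketch under assumptions, not the actual wrapTaskResult: the function names are hypothetical and the real validation rules are not described in these notes.

```typescript
// Find the index of the brace that closes the object opened at `start`, or -1.
function findMatchingBrace(text: string, start: number): number {
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1;
}

// Collect every balanced {...} span that also parses as JSON.
function extractJsonObjects(text: string): unknown[] {
  const found: unknown[] = [];
  let i = 0;
  while (i < text.length) {
    const end = text[i] === "{" ? findMatchingBrace(text, i) : -1;
    if (end === -1) { i++; continue; }
    try {
      found.push(JSON.parse(text.slice(i, end + 1)));
      i = end + 1; // skip past the parsed block
    } catch {
      i++; // balanced but not valid JSON: keep scanning inside it
    }
  }
  return found;
}

const sample =
  'Done. {"status":"success","output":"say \\"hi\\" }","details":{"n":1}} ' +
  'Retry log: {"status":"error","output":"x"}';
const blocks = extractJsonObjects(sample);
console.log(blocks.length); // 2
```

A caller would then pick the block it trusts (for example, the last one with a valid `status` field); that selection policy is left out here because the notes do not state it.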

Threshold Normalization for Auto-Create Agents fix

The auto-create threshold now accepts both fractional (0–1) and percentage (1–100) values, and pending specialist candidates are re-evaluated automatically when settings change.

Added normalizeThreshold utility in config.ts that clamps values to the [0, 1] range regardless of input format
Updated /agent-options threshold command to display both normalized and percentage values
Refactored auto-creation logic into a reusable promoteCandidate helper
Enabling auto-create now re-evaluates and promotes pending candidates above the threshold
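
A sketch of what the normalization might look like. The rule used here, treating any value above 1 as a percentage, is an assumption; the notes only say that both formats are accepted and that the result is clamped to [0, 1]. Under this rule the input `1` is read as the fraction 1.0, not as 1%.

```typescript
// Illustrative version of a normalizeThreshold-style helper.
function normalizeThreshold(value: number): number {
  // Assumption: values above 1 are percentages (e.g. 80 means 0.8).
  const fraction = value > 1 ? value / 100 : value;
  // Clamp to the documented [0, 1] range.
  return Math.min(1, Math.max(0, fraction));
}

console.log(normalizeThreshold(0.8)); // 0.8
console.log(normalizeThreshold(80));  // 0.8
console.log(normalizeThreshold(250)); // 1
console.log(normalizeThreshold(-3));  // 0
```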

v0.8.0 2026-03-27

Per-Agent Plan-Act-Critic System new

A reusable Plan-Act-Critic (PAC) loop now wraps sub-agents, specialists, and cron job executions when critic mode is enabled. The critic verifies each agent’s work and retries on failure with targeted feedback.

Extracted pac.ts module with runPACLoop() for consistent critic integration
Sub-agents, specialists, and cron jobs all use the same PAC pipeline automatically
On a FAIL verdict, the task is retried with the critic’s feedback (up to 2 retries)
Compact critic verdicts — PASS and WARN are now single-line inline, reducing terminal noise while keeping full detail for FAIL

Auto-Continue on Truncation new

When a response hits the max-tokens limit and is cut off, Bernard automatically continues where it left off — up to 3 continuations — then recommends an optimal token limit based on actual usage.

Seamless auto-continuation with no user intervention required
Post-completion recommendation of the ideal max-tokens value (1.25× actual usage)
Warning with /options max-tokens instructions if still incomplete after 3 continuations
New /debug command — prints a diagnostic report (runtime, config, API key status, MCP servers, RAG stats) with no secrets leaked
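
The auto-continue behavior described in this section (up to 3 continuations, then a recommendation of 1.25× actual usage) can be sketched as a small loop. Everything named here is hypothetical: `generate` stands in for a model call, and counting "actual usage" as the token total across all chunks, rounded up, is an assumption.

```typescript
const MAX_CONTINUATIONS = 3; // documented limit
const HEADROOM = 1.25;       // documented multiplier

interface Chunk {
  text: string;
  truncated: boolean; // true when the chunk hit the max-tokens limit
  tokens: number;
}

// `generate` is a placeholder for a model call that continues from `soFar`.
function runWithAutoContinue(generate: (soFar: string) => Chunk) {
  let chunk = generate("");
  let text = chunk.text;
  let tokens = chunk.tokens;
  let continuations = 0;

  while (chunk.truncated && continuations < MAX_CONTINUATIONS) {
    chunk = generate(text);
    text += chunk.text;
    tokens += chunk.tokens;
    continuations++;
  }

  return {
    text,
    continuations,
    complete: !chunk.truncated, // false means: warn and point at /options max-tokens
    recommendedMaxTokens: Math.ceil(tokens * HEADROOM),
  };
}
```

For example, a response that needs three 100-token chunks finishes after 2 continuations and yields a recommendation of 375 tokens.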

Specialist Overlap Detection & Auto-Creation new

Smarter specialist suggestions powered by token-based similarity scoring, plus a new auto-creation mode that saves high-confidence candidates without manual review.

Jaccard similarity across name, description, system prompt, and guidelines (weighted)
Candidates with >60% overlap are suppressed; partial overlaps suggest enhancing the existing specialist
New /agent-options command to toggle auto-creation and set confidence threshold at runtime
BERNARD_AUTO_CREATE_SPECIALISTS and BERNARD_AUTO_CREATE_THRESHOLD environment variables
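
For readers unfamiliar with Jaccard similarity, the sketch below shows one way a weighted, token-based overlap score could be computed. The field weights and the tokenizer are assumptions chosen for illustration; the notes only state that the score is weighted across the four fields and that candidates above 60% overlap are suppressed.

```typescript
interface SpecialistText {
  name: string;
  description: string;
  systemPrompt: string;
  guidelines: string;
}

// Illustrative weights (they sum to 1); the real weighting is not documented here.
const FIELD_WEIGHTS: Record<keyof SpecialistText, number> = {
  name: 0.2,
  description: 0.3,
  systemPrompt: 0.3,
  guidelines: 0.2,
};
const SUPPRESS_ABOVE = 0.6; // documented: >60% overlap is suppressed

function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((t) => { if (b.has(t)) shared++; });
  return shared / (a.size + b.size - shared);
}

function overlapScore(x: SpecialistText, y: SpecialistText): number {
  let total = 0;
  (Object.keys(FIELD_WEIGHTS) as Array<keyof SpecialistText>).forEach((field) => {
    total += FIELD_WEIGHTS[field] * jaccard(tokenize(x[field]), tokenize(y[field]));
  });
  return total;
}

function shouldSuppress(candidate: SpecialistText, existing: SpecialistText): boolean {
  return overlapScore(candidate, existing) > SUPPRESS_ABOVE;
}
```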

Configurable Loop Limit & Default Planning enhanced

Control how many tool-call iterations the agent can take per request, and benefit from step-by-step planning in every conversation — not just critic mode.

BERNARD_MAX_STEPS environment variable (default 25) with runtime override via /options max-steps
Sub-agents automatically receive 50% of the main agent’s step budget
When the step limit is hit, Bernard offers to double it for the current session
System prompt now includes a planning section by default, improving multi-step task execution

Timestamp Awareness enhanced

Bernard now has full temporal context — every user message carries an ISO 8601 timestamp, and the system prompt includes the current date and time.

ISO 8601 timestamp prefixes on every user message for precise time context
System prompt upgraded from “Today’s date” to “Current date and time”
time tool output now includes seconds and timezone
Bug fix: MCP URL transport correctly uses HTTP for non-SSE endpoints (was always defaulting to SSE)

v0.7.0 2026-03-13

File Manipulation Tools new

Two new tools — file_read_lines and file_edit_lines — give Bernard precision file editing without shelling out to sed or awk.

Line-based editing with replace, insert, delete, and append operations
Atomic writes — all edits in a single call succeed or all fail
Binary file detection and pagination for large files (50 MB cap)
file_read_lines supports offset/limit for targeted reads across big files

Per-Agent Model Selection new

Override the provider and model for any specialist, sub-agent, or task invocation. Set a default model on specialists at creation time, or pass an override at call time.

Specialists store a default provider/model that is used every time they run
Sub-agents and tasks accept per-invocation provider/model overrides
Cross-provider safety — auto-resolves the default model when switching providers
Model tags shown in /specialists listings

Deterministic Single-Step Tasks enhanced

Tasks are now single-step: one LLM call with tool use, then structured output. Saved tasks can be invoked by ID, and the REPL shows tasks and routines separately.

Reduced to maxSteps: 2 (one tool-use round + structured result)
New taskId parameter — run saved task routines by ID or via /task-{id} in the REPL
Separate display for tasks vs routines in the REPL
Auto-context injection — working directory, available tools, memory, and RAG are included automatically

v0.6.2 2026-03-12

Critic Visibility — Adaptive Truncation fix

Tool-result summaries shown to the critic now use an adaptive 8000-char budget instead of a fixed 500-char slice, so the critic sees far more context before judging.

Adaptive 8000-char budget with a 500-char floor per tool result
Response truncation raised from 2000 to 4000 chars, args from 500 to 1000
Truncation markers (...) added to tool results
Critic prompt updated to not treat truncation as failure evidence
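
One plausible reading of "adaptive budget with a floor" is sketched below: the 8000 characters are split evenly across tool results, but no result gets fewer than 500. The even split and the function name `summarizeForCritic` are assumptions; the notes give only the two numbers and the `...` marker.

```typescript
const TOTAL_BUDGET = 8000;    // documented budget
const FLOOR_PER_RESULT = 500; // documented floor

// Hypothetical allocation: even share of the budget, never below the floor.
function summarizeForCritic(results: string[]): string[] {
  if (results.length === 0) return [];
  const share = Math.max(FLOOR_PER_RESULT, Math.floor(TOTAL_BUDGET / results.length));
  // Append a truncation marker so the critic can tell the text was cut.
  return results.map((r) => (r.length > share ? r.slice(0, share) + "..." : r));
}

const two = summarizeForCritic(["a".repeat(10000), "short"]);
console.log(two[0].length); // 4003 (4000 chars + "...")
console.log(two[1]);        // "short"
```

Under this scheme the floor wins once there are more than 16 results, so the combined length can exceed 8000 characters in that case.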

Error Handling & Eventual Consistency fix

Agents no longer retry the exact same failed command, and external API mutations get a short delay before re-verification to account for eventual consistency.

No retrying the exact same failed command across all agent types and cron jobs
Wait 2–5s before re-verifying external API mutations

Specialist Candidate Filtering fix

Candidates that have already been saved as specialists are automatically filtered out of future suggestions.

Auto-marks candidates as accepted when a specialist is created by draft ID or name
Defensive filtering at startup and in /candidates output

v0.6.1 2026-03-11

Critic Retry Loop fix

Critic mode now retries up to 2 times when it detects issues, feeding its verdict back to the agent for correction before giving up.

Up to 2 automatic retries when critic returns WARN or FAIL
Critic feedback is injected into conversation history so the agent can self-correct
Improved verdict parsing using regex-based extraction
REPL prompt now shows a diamond symbol instead of the [CRITIC] label
New parseCriticVerdict utility extracted to output module

`/create-task` Command fix

Added /create-task guided creation command and refactored duplicate creation logic.

New /create-task REPL command for guided task routine creation
Refactored duplicate guided creation logic into a shared runGuidedCreation helper
Help menu now lists /routines, /create-routine, and /create-task commands

Specialist Auto-Dispatch fix

Specialists are now automatically matched to user input using keyword scoring, with high-confidence matches auto-dispatched.

Keyword-based scoring matches user input against saved specialists
Scores ≥ 0.8 trigger automatic dispatch; 0.4–0.8 prompt for confirmation
Match advisory injected into system prompt so the agent knows which specialists fit
Stop-word filtering prevents common words from inflating scores
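
The dispatch thresholds above can be illustrated with a toy scorer. The scoring formula (fraction of input keywords found in the specialist's text) and the stop-word list are assumptions made for this sketch; only the 0.8 and 0.4 cut-offs come from these notes.

```typescript
type DispatchDecision = "auto-dispatch" | "confirm" | "ignore";

// Assumed stop-word list; the real list is not given in the notes.
const STOP_WORDS = new Set(["the", "a", "an", "in", "to", "for", "of", "and"]);

function extractKeywords(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0 && !STOP_WORDS.has(w));
}

// Assumed scoring: share of the input's keywords that the specialist also mentions.
function matchScore(input: string, specialistText: string): number {
  const wanted = extractKeywords(input);
  if (wanted.length === 0) return 0; // only stop words: nothing to match on
  const known = new Set(extractKeywords(specialistText));
  const hits = wanted.filter((w) => known.has(w)).length;
  return hits / wanted.length;
}

function decideDispatch(score: number): DispatchDecision {
  if (score >= 0.8) return "auto-dispatch"; // documented threshold
  if (score >= 0.4) return "confirm";       // documented confirmation band
  return "ignore";
}

const s = matchScore("review the code", "Code Reviewer reviews code for correctness");
console.log(s, decideDispatch(s)); // 0.5 "confirm"
```

Note how "the" is dropped before scoring, so it neither helps nor hurts the match.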

v0.6.0 2026-03-10

Tasks new

Tasks are isolated, focused executions that return structured JSON output. Unlike sub-agents (which return free-form text), tasks always produce a {status, output, details?} response — making them ideal for machine-readable results, routine chaining, and conditional branching.

5-step budget — tasks are meant to be quick and focused
Structured JSON output — always returns {status: "success"|"error", output, details?}
Completely isolated from the current session — no conversation history
Available as both a tool and a command via /task
Shared concurrency pool with sub-agents (4 slots max)

bernard

bernard> /task List all TypeScript files in src/
┌─ task — List all TypeScript files in src/
  ▶ shell: find src -name "*.ts" -type f
└─ task success: Found 23 .ts files

// The agent can also call tasks during routines:
bernard> run the deploy routine
  ▶ task: "Check test suite passes"
  ┌─ task — Check test suite passes
    ▶ shell: npm test
  └─ task success: All 994 tests passed
  ▶ shell: git push origin main

Specialists new

Specialists are reusable expert profiles — persistent personas with custom system prompts and behavioral guidelines that shape how a sub-agent approaches work. Unlike routines (which define what steps to follow), specialists define how to work.

Each specialist run gets its own generateText loop with a 10-step budget
One JSON file per specialist in ~/.local/share/bernard/specialists/
Manage via /specialists or invoke directly with /{id}
Up to 50 specialists with lowercase kebab-case IDs

bernard

bernard> create a specialist called "code-reviewer" that reviews code for correctness, style, and security
  ▶ specialist: create { id: "code-reviewer", name: "Code Reviewer", … }
  ✓ Specialist "Code Reviewer" (code-reviewer) created.

bernard> /code-reviewer review the changes in src/agent.ts
┌─ spec:1 [Code Reviewer] — review the changes in src/agent.ts
  ▶ shell: git diff src/agent.ts
└─ spec:1 done

// List all saved specialists:
bernard> /specialists

Specialist Suggestions new

Bernard automatically detects recurring delegation patterns in your conversations and suggests new specialists. Detection runs in the background when you exit a session or use /clear --save.

Review pending suggestions with /candidates
Each candidate includes a name, description, confidence score, and reasoning
Accept or reject candidates conversationally
Auto-dismissed after 30 days if not reviewed; up to 10 pending at a time

bernard

// On startup, Bernard notifies you of pending suggestions:
  2 specialist suggestion(s) pending. Use /candidates to review.

bernard> /candidates

  1. code-reviewer (confidence: 0.85)
    Reviews PRs for correctness, style, and security issues

  2. db-migrator (confidence: 0.72)
    Manages database schema changes and migration scripts

bernard> accept the code-reviewer candidate
  ✓ Specialist "Code Reviewer" (code-reviewer) created.

Critic Mode new

Critic mode adds planning, proactive scratch/memory usage, and post-response verification. Recommended for high-stakes work like deployments, git operations, and multi-file edits.

Planning — writes a plan to scratch before multi-step tasks
Proactive scratch — accumulates findings during complex work
Verification — a critic agent reviews the work and prints a verdict (PASS / WARN / FAIL)

bernard

bernard> /critic on
  ✓ Critic mode enabled.

bernard> refactor the auth module to use JWT
  ▶ scratch: write plan
  ▶ shell: cat src/auth.ts
  ▶ write: src/auth.ts
  ▶ shell: npm test

  ┌─ critic verdict
  │ PASS — All claimed edits match tool calls.
  │ Tests pass after refactor. No issues found.
  └─

bernard> /critic off
  Critic mode disabled.

Conversation Summaries new

Conversation summaries are a fourth specialized domain in Bernard's RAG memory system, joining Tool Usage Patterns, User Preferences, and General Knowledge. They capture what was discussed, approaches taken, tools/specialists/routines used, and outcomes.

Extracts high-level summaries of what was accomplished each session
Helps identify recurring patterns across conversations
All four domain extractors run in parallel — no extra latency
Search returns up to 5 results per domain (20 total max across the four domains), preventing any single domain from crowding out others

bernard

// After a session, Bernard extracts memories by domain:
  ▶ Extracting facts…
    tool-usage: 3 facts
    user-preferences: 1 fact
    general: 2 facts
    conversations: 1 fact

// In a future session, recalled context is organized by domain:
bernard> /rag
  RAG Memory: 142 facts across 4 domains
    tool-usage       48 facts
    user-preferences 31 facts
    general          41 facts
    conversations    22 facts

Earlier releases — Routines, the multi-provider switch, and the first sub-agent loop — predate this changelog format and aren't reproduced here. See the git history for anything before v0.6.0.
