All Bernard releases, newest first.
The largest update since launch. v0.9.0 reshapes how Bernard handles ambiguity: references resolve against memory before each turn, unknown entities trigger a read-only tool lookup, and prompts get rewritten for the active model family. A new coordinator (ReAct) mode adds iterative reasoning with verifiable plans, the UI moves to arrow-key menus, and a tool augmentation layer learns from every error your tools throw.
Bernard now understands references the same way a human collaborator does. Before each turn, a reference resolver scans your message for named entities ("my brother", "the staging cluster") and looks them up in persistent memory. If the answer is missing, Bernard tries one read-only tool (e.g. a Google Contacts MCP) before asking you, then offers a Save / Edit / Skip menu so the answer sticks. A model-specific prompt rewriter then reshapes the message for the active model family, and a context-gathering protocol keeps the agent from inventing numbers it could have just looked up.
ModelProfile.rewriterHint,
fail-open (#125, #131)
Coordinator (ReAct) mode turns Bernard into an iterative thinker. With
BERNARD_REACT_MODE=true, every turn runs a
think → act → evaluate → decide loop with
new plan, think, and evaluate tools. The step
budget triples (capped at 150) so multi-step plans actually finish, and a post-run
verifier re-checks the plan if any step is still unresolved. Turn it on for hard
tasks; leave it off for chat — you can flip the toggle live in
/agent-options.
plan, think,
evaluate tools (#117)
min(BERNARD_MAX_STEPS × 3, 150) when ReAct is on
Menus are now arrow-key driven by default. Bernard pins the menu region with
setPinnedRegion, you move with arrows or digits 1–9, and Enter
commits. Non-TTY environments and BERNARD_PLAIN_MENU=1 still get the
classic numbered prompt as a fallback.
/image <path> [prompt] attaches an image to the next turn, and a
/options tool-details toggle collapses the per-call tool noise into
compact one-liners.
BERNARD_PLAIN_MENU fallback (#113, #132)
/image on vision-capable models (#112)/options tool-details compact / verbose toggle —
BERNARD_TOOL_DETAILS (#119)
A new tool augmentation layer wraps every tool's execute to observe
errors and patch the system prompt with concrete bad-example fixes — no
hallucination, only real observed failures. Shell commands are split into
sub-categories (shell.git, shell.gh,
shell.docker, shell.npm, shell.fs,
shell.http) so guidance stays scoped. Cron jobs gained persistent notes:
the daemon injects cron_notes_read / cron_notes_write scoped
to the current job so long-running schedules don't redo work after restarts.
shell-wrapper, file-wrapper,
web-wrapper), correction agent, and specialist creator (#111)
## Notes written during this run section in
cron_logs_get (#130)
jsonSchema and the
_jsonSchema shape produced by @ai-sdk/ui-utils
Promise.race against a real timer
— resilient to tools that ignore abortSignal
Smaller-but-load-bearing pieces that ship with v0.9.0 — many of them power the headline themes above.
Five tool-wrapper specialists ship out of the box and seed on first run:
shell-wrapper, file-wrapper, web-wrapper,
correction-agent, and specialist-creator. A
.seeded-v1 marker prevents re-seeding so user edits stick.
tool_wrapper_run dispatch
The main agent talks to specialists through one strict-typed dispatch tool. Each call
is isolated to its targetTools — the
shell-wrapper can't accidentally browse the web. Output is guaranteed
JSON: {status, result, error?, reasoning?}.
web_search with provider chain
A new built-in web_search tool tries Brave first
(BRAVE_API_KEY), falls back to Tavily (TAVILY_API_KEY), and
finally to a DuckDuckGo scrape so search keeps working without keys. Returns
[{title, url, snippet}].
When a tool-wrapper returns status: 'error', the args land in a candidate
queue. At REPL shutdown the correction-agent proposes a fix and
actually executes it via tool_wrapper_run. Only validated fixes
get appended to the specialist's good/bad examples (capped at 10 each).
Every tool-wrapper run appends one JSONL entry to
logs/tool-wrappers.jsonl. Easy to jq, easy to grep, easy to
ship to a log aggregator. Pairs naturally with the new post-run activity logs to give
you a full audit trail of what Bernard actually did.
osPromptBlock() detects the host OS and injects shell hints (Linux,
macOS, Windows) into every tool-wrapper system prompt — so the
shell-wrapper doesn't suggest brew on Ubuntu or GNU-only
flags on macOS.
All file paths now follow the XDG Base Directory Specification, centralized in
src/paths.ts. Set BERNARD_HOME for the legacy flat layout.
Existing ~/.bernard/ installs auto-migrate on first run and leave a
MIGRATED marker behind.
BERNARD_SUBAGENT_RESULT_MAX_CHARS (default 4000) caps how much of a
sub-agent or specialist's output gets fed back into the parent context. The user still
sees the full output in the terminal — only the parent's memory budget gets
protected.
Tasks now receive a proportional fraction of the available step budget instead of a fixed step count, and the final step forces text-only output to guarantee structured JSON results.
TASK_STEP_RATIO, getTaskMaxSteps, and
makeLastStepTextOnly utilities in task.ts
wrapTaskResult to robustly extract and validate JSON from
arbitrary output, handling nested objects, escaped quotes, and multiple JSON blocks
maxSteps and
experimental_prepareStep options
The auto-create threshold now accepts both fractional (0–1) and percentage (1–100) values, and pending specialist candidates are re-evaluated automatically when settings change.
normalizeThreshold utility in config.ts that clamps
values to the [0, 1] range regardless of input format
/agent-options threshold command to display both normalized and
percentage values
promoteCandidate helper
A reusable Plan-Act-Critic (PAC) loop now wraps sub-agents, specialists, and cron job executions when critic mode is enabled. The critic verifies each agent’s work and retries on failure with targeted feedback.
pac.ts module with runPACLoop() for consistent
critic integration
When a response hits the max-tokens limit and is cut off, Bernard
automatically continues where it left off — up to 3 continuations — then
recommends an optimal token limit based on actual usage.
max-tokens value (1.25×
actual usage)
/options max-tokens instructions if still incomplete after 3
continuations
/debug command — prints a diagnostic report (runtime, config,
API key status, MCP servers, RAG stats) with no secrets leaked
Smarter specialist suggestions powered by token-based similarity scoring, plus a new auto-creation mode that saves high-confidence candidates without manual review.
/agent-options command to toggle auto-creation and set confidence
threshold at runtime
BERNARD_AUTO_CREATE_SPECIALISTS and
BERNARD_AUTO_CREATE_THRESHOLD environment variables
Control how many tool-call iterations the agent can take per request, and benefit from step-by-step planning in every conversation — not just critic mode.
BERNARD_MAX_STEPS environment variable (default 25) with runtime
override via /options max-steps
Bernard now has full temporal context — every user message carries an ISO 8601 timestamp, and the system prompt includes the current date and time.
time tool output now includes seconds and timezone
Two new tools — file_read_lines and file_edit_lines
— give Bernard precision file editing without shelling out to
sed or awk.
file_read_lines supports offset/limit for targeted reads across big files
Override the provider and model for any specialist, sub-agent, or task invocation. Set a default model on specialists at creation time, or pass an override at call time.
provider/model that is used
every time they run
/specialists listingsTasks are now single-step: one LLM call with tool use, then structured output. Saved tasks can be invoked by ID, and the REPL shows tasks and routines separately.
maxSteps: 2 (one tool-use round + structured result)
taskId parameter — run saved task routines by ID or via
/task-{id} in the REPL
Tool-result summaries shown to the critic now use an adaptive 8000-char budget instead of a fixed 500-char slice, so the critic sees far more context before judging.
...) added to tool resultsAgents no longer retry the exact same failed command, and external API mutations get a short delay before re-verification to account for eventual consistency.
Candidates that have already been saved as specialists are automatically filtered out of future suggestions.
/candidates outputCritic mode now retries up to 2 times when it detects issues, feeding its verdict back to the agent for correction before giving up.
[CRITIC] labelparseCriticVerdict utility extracted to output module/create-task Command fix
Added /create-task guided creation command and refactored duplicate
creation logic.
/create-task REPL command for guided task routine creationrunGuidedCreation helper
/routines, /create-routine, and
/create-task commands
Specialists are now automatically matched to user input using keyword scoring, with high-confidence matches auto-dispatched.
Tasks are isolated, focused executions that return structured JSON output. Unlike
sub-agents (which return free-form text), tasks always produce a
{status, output, details?} response — making them ideal for
machine-readable results, routine chaining, and conditional branching.
{status: "success"|"error", output, details?}
/taskSpecialists are reusable expert profiles — persistent personas with custom system prompts and behavioral guidelines that shape how a sub-agent approaches work. Unlike routines (which define what steps to follow), specialists define how to work.
generateText loop with a 10-step budget
~/.local/share/bernard/specialists/
/specialists or invoke directly with /{id}
Bernard automatically detects recurring delegation patterns in your conversations and
suggests new specialists. Detection runs in the background when you exit a session or
use /clear --save.
/candidatesCritic mode adds planning, proactive scratch/memory usage, and post-response verification. Recommended for high-stakes work like deployments, git operations, and multi-file edits.
A fourth specialized domain in Bernard's RAG memory system, joining Tool Usage Patterns, User Preferences, and General Knowledge. Conversation summaries capture what was discussed, approaches taken, tools/specialists/routines used, and outcomes.
Earlier releases — Routines, the multi-provider switch, and the
first sub-agent loop — predate this changelog format and aren't reproduced here.
See the
git history
for anything before v0.6.0.