## Summary
Seven related Coach Claire / Budget-Bot review-reliability and data-safety fixes, found while working on the Central Support (CF) and JigTree Q3 docs. Originally scoped to the CF review bug (KLAIR-2826); KLAIR-2829/2830/2831/2832/2833/2835 were folded in as they surfaced during the same session.
- KLAIR-2826 — CF review where only one check completed (negative-EBITDA parse bug + check noise).
- KLAIR-2829 — GM Commentary regeneration silently failing (over-long Claire tool input dropped).
- KLAIR-2830 — Coach Claire chat turns truncating (cut-off replies + "promised N edits, only 1 landed").
- KLAIR-2831 — Claire reaching for refresh_data to update non-financial tables (it only rebuilds the Financials tables).
- KLAIR-2832 — chat max_tokens 16K → 32K (a four-full-rewrite turn still truncated at 16K).
- KLAIR-2833 — findings digest silently dropped CRITICAL findings at a stale char cap and did not tell the agent, so it critiqued a partial finding set as if exhaustive.
- KLAIR-2835 — reload-from-doc silently wiped content when the doc's heading parse degraded (data loss); add guards + undo.
## Why it's needed
### KLAIR-2826 — CF review
1. Negative EBITDA didn't parse. Cost-center / CF P&Ls run negative EBITDA formatted as $ (694,141). _parse_cell did the paren→negative conversion *before* stripping $/spaces, so a $-prefixed accounting negative never matched startswith("("), fell through to float("(694141)") → ValueError → None. Every EBITDA-dependent check (C2.1 margin target, C2.5 cost-vs-revenue, C2.7 BU-vs-Hybrid) then skipped.
2. Noise. The BU-Plans-tab checks (C2.3/C2.4/C2.8) and the per-product benchmark family (C3.1, C3.3–C3.9) can *never* run on a CF (those tabs don't exist for CFs), yet they surfaced as "skipped — tab not available", masking the real failure above.
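
The ordering bug in item 1 can be illustrated with a minimal sketch. This is not the real _parse_cell; the function name `parse_cell` and its exact cleanup rules are assumptions for illustration. The point it shows is that currency symbols, commas, and spaces are stripped before the parenthesis check, so `$ (694,141)` reduces to `(694141)` and is recognized as an accounting negative.

```python
from typing import Optional


def parse_cell(raw: str) -> Optional[float]:
    """Parse an accounting-formatted cell such as "$ (694,141)" into a float.

    Hypothetical stand-in for the real helper; returns None when unparseable.
    """
    # Strip currency symbols, thousands separators, and spaces FIRST, so a
    # "$"-prefixed value still starts with "(" when the paren check runs.
    text = raw.replace("$", "").replace(",", "").replace(" ", "").strip()
    if not text:
        return None

    # Accounting negatives are wrapped in parentheses: "(694141)" means -694141.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]

    try:
        value = float(text)
    except ValueError:
        return None
    return -value if negative else value
```

With this ordering, both `$ (694,141)` and `($1,234)` parse as negatives, while non-numeric cells still yield None rather than raising.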
### KLAIR-2829 — GM Commentary regen silently no-ops
regenerate_section.feedback is capped at max_length=4000, but the Anthropic-facing wire schema never declared that bound, so Claude couldn't self-correct. On synthesis sections (GM Commentary) Claire wrote a long instruction that exceeded 4000 characters, Pydantic rejected it with string_too_long, and parse_tool_calls skipped the whole call, so the regen silently vanished.
### KLAIR-2830 — chat turns truncate
The chat turn ran at max_tokens=4096 with adaptive extended thinking (effort="high"). Thinking tokens count against max_tokens, so 4096 starved the visible reply and tool_use blocks on heavy turns: cross-section analyses cut off mid-sentence, and "I'll do 3 rewrites" turns emitted only the first tool_use block before hitting the cap (so only one proposal landed). stop_reason was never inspected, so truncated replies shipped silently.
## Changes
### KLAIR-2826
- review_checks/_helpers.py — _parse_cell (and _to_float) strip currency/commas/whitespace before the paren→negative conversion. $ (694,141) → -694141; also fixes ($1,234) mis-read as positive.
- review_checks/_registry.py — CheckSpec.cf_applicable, derived from required_data (a check needing a BU-only review tab is CF-inapplicable). Self-maintaining: a future per-product check auto-hides on CFs.
- review_checks/__init__.py — run_all_checks filters CF-inapplicable checks entirely for CF specs (not run, not surfaced), with an aggregate debug log of how many were filtered.
- models.py — BU_ONLY_REVIEW_TAB_SOURCES centralized and shared across _registry and canonical_plan._CANONICAL_BU_ONLY_KEYS.
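
A rough sketch of how a derived cf_applicable flag could drive the filtering described above. The tab names, the `check_id` field, and the `select_checks` helper are hypothetical; only CheckSpec, required_data, cf_applicable, and BU_ONLY_REVIEW_TAB_SOURCES come from this PR, and the real definitions may differ.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

# Hypothetical tab names; the real set lives in models.BU_ONLY_REVIEW_TAB_SOURCES.
BU_ONLY_REVIEW_TAB_SOURCES: FrozenSet[str] = frozenset({"bu_plans", "product_benchmarks"})


@dataclass(frozen=True)
class CheckSpec:
    check_id: str
    required_data: FrozenSet[str]

    @property
    def cf_applicable(self) -> bool:
        # Derived, not hand-maintained: a check that needs any BU-only tab
        # can never run on a CF, so it is CF-inapplicable.
        return self.required_data.isdisjoint(BU_ONLY_REVIEW_TAB_SOURCES)


def select_checks(specs: Iterable[CheckSpec], is_cf: bool) -> List[CheckSpec]:
    """Return the checks to run; CF-inapplicable checks are dropped, not skipped."""
    specs = list(specs)
    if not is_cf:
        return specs
    return [spec for spec in specs if spec.cf_applicable]
```

Because the flag is computed from required_data, a new check that declares a BU-only tab is filtered on CFs without any extra registration step.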
### KLAIR-2829
- claire_tools.py — mirror the 4000-char cap onto the feedback wire schema so Claude self-corrects (same pattern as RefreshDataInput.reason).
- claire_tools.py — parse_tool_calls truncate-and-retry: when the only errors are string_too_long, truncate the offending top-level string field(s) to their cap and re-validate once, so the edit lands trimmed instead of vanishing. Any other error still skips.
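
The truncate-and-retry rule can be sketched without Pydantic. The block below uses a hand-rolled `validate` function as a stand-in for the real model validation, and the `section_id` field and `parse_tool_call` name are hypothetical; only the 4000-char feedback cap and the string_too_long error type come from this PR.

```python
from typing import Dict, List, Optional, Tuple

# Only the feedback cap is stated in this PR; other fields are assumptions.
MAX_LENGTHS: Dict[str, int] = {"feedback": 4000}
REQUIRED_FIELDS = ("section_id", "feedback")


def validate(args: dict) -> List[Tuple[str, str]]:
    """Return (field, error_type) pairs; an empty list means the input is valid."""
    errors: List[Tuple[str, str]] = []
    for name in REQUIRED_FIELDS:
        if not isinstance(args.get(name), str):
            errors.append((name, "missing_or_not_string"))
    for name, cap in MAX_LENGTHS.items():
        value = args.get(name)
        if isinstance(value, str) and len(value) > cap:
            errors.append((name, "string_too_long"))
    return errors


def parse_tool_call(args: dict) -> Optional[dict]:
    """Return validated args, trimming over-long strings once; None means skip."""
    errors = validate(args)
    if not errors:
        return args
    # Recover only when EVERY error is an over-length string.
    if any(kind != "string_too_long" for _, kind in errors):
        return None
    trimmed = dict(args)
    for name, _ in errors:
        trimmed[name] = trimmed[name][: MAX_LENGTHS[name]]
    # Re-validate exactly once; give up if anything is still wrong.
    return trimmed if not validate(trimmed) else None
```

Restricting recovery to the all-string_too_long case keeps the retry narrow: a call with a missing or mistyped field is still skipped rather than patched up.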
### KLAIR-2830
- wizard_orchestrator.py — chat-turn max_tokens 4096 → 16000 (the safe Opus/Sonnet 4.x ceiling the generation path already uses).
- wizard_orchestrator.py — handle_chat inspects stop_reason: on max_tokens it logs a warning and appends an honest "I hit my response-length limit — reply 'continue'" nudge instead of shipping a silent truncation.
- wizard_orchestrator.py — _MULTI_EDIT_DIRECTIVE system-prompt block: emit one tool call per affected section in the same turn and lead with the tool calls, not a long preamble (preserves the existing one-proposal-per-section batching rule).
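
A minimal sketch of the stop_reason check, assuming the reply text and stop_reason have already been pulled off the API response. The helper name `finalize_reply` and the exact nudge wording are illustrative, not the text used in handle_chat.

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative wording; the real nudge string may differ.
TRUNCATION_NUDGE = (
    "\n\nI hit my response-length limit. Reply 'continue' and I'll pick up "
    "where I left off."
)


def finalize_reply(text: str, stop_reason: str) -> str:
    """Append an explicit truncation notice when the turn stopped at max_tokens."""
    if stop_reason == "max_tokens":
        logger.warning("chat turn truncated at max_tokens (%d chars emitted)", len(text))
        return text.rstrip() + TRUNCATION_NUDGE
    # Any other stop reason (for example end_turn) passes through unchanged.
    return text
```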
### KLAIR-2832
- models.py — BOARD_DOC_CHAT_MAX_OUTPUT_TOKENS 16K → 32K. A turn proposing a rewrite per affected section (four full-section rewrites) + high-effort thinking still truncated at 16K. max_tokens is a ceiling (no cost/latency unless emitted), the model already runs at 128K for generation, and the stop_reason nudge backstops the tail. Test pins the shared constant.
### KLAIR-2833
- wizard_orchestrator.py — _full_doc_findings_block:
  - Raise the digest caps (4K→16K chars, 40→100 lines). The digest is *input* context, ~3K tokens for 63 findings.
  - Render all CRITICAL findings unconditionally (bypass both caps) and only trim warning/info.
  - The truncation note is now truncation-*aware*: it states that all N criticals are shown, that the agent is on a PARTIAL set, and that broad whole-doc answers must say so and not imply exhaustiveness.
  - Fixes a hallucination-risk surface where a finding-heavy doc (JigTree: 63 findings, 12+ critical) silently dropped criticals and the agent critiqued the whole doc as if it had seen everything.
  - Added a test that 12 criticals survive a 150-warning flood.
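
The capping rule can be sketched roughly as follows. Findings are reduced to (severity, message) tuples and the function name `findings_digest` is hypothetical; the default caps mirror the 16K-char / 100-line values above, but the real rendering and note wording are not shown in this PR.

```python
from typing import List, Tuple

Finding = Tuple[str, str]  # (severity, message); a simplified stand-in


def findings_digest(
    findings: List[Finding], max_chars: int = 16_000, max_lines: int = 100
) -> str:
    criticals = [f for f in findings if f[0] == "CRITICAL"]
    others = [f for f in findings if f[0] != "CRITICAL"]

    # Criticals are rendered unconditionally and bypass both caps.
    lines = [f"[{sev}] {msg}" for sev, msg in criticals]
    used_chars = sum(len(line) + 1 for line in lines)

    # Warning/info findings fill whatever room is left under the caps.
    shown_others = 0
    for sev, msg in others:
        line = f"[{sev}] {msg}"
        if len(lines) >= max_lines or used_chars + len(line) + 1 > max_chars:
            break
        lines.append(line)
        used_chars += len(line) + 1
        shown_others += 1

    omitted = len(others) - shown_others
    if omitted:
        # Truncation-aware note: say what is complete and what is not.
        lines.append(
            f"NOTE: PARTIAL finding set. All {len(criticals)} critical findings "
            f"are shown; {omitted} lower-severity findings were omitted. "
            "Do not imply exhaustiveness."
        )
    return "\n".join(lines)
```

The note is appended only when something was actually trimmed, so a digest that fits under the caps carries no partial-set warning.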
### KLAIR-2831
- wizard_orchestrator.py — REVIEW-phase guidance: replaced the false "sections are automatically regenerated" line with an explicit "Refresh Data vs. regenerate" routing block. refresh_data rebuilds only the Financials tables + surgically swaps key P&L numbers in prose; ARR/retention, PQR-variance, product tables, and new-quarter reframing go through regenerate_section. Notes a refresh can't manufacture not-yet-closed-quarter actuals.
- claire_tools.py — same scope note in the refresh_data tool description so Claire picks the right tool at decision time. (Prompt/description only — no logic change.)
### KLAIR-2835 (data-loss fix)
- routers/board_doc_router.py — reload-from-doc was a destructive title-matched full-replace guarded only by "refuse if 0 sections parsed." A user hand-editing the doc's section headings made the parser match titles but extract empty bodies, wiping real content. Now:
  - (a) Per-section empty guard: never overwrite a populated section with an empty parsed body; report sections_guarded.
  - (b) Degraded-parse abort: >=2 matched sections and >=50% empty → 409, nothing touched.
  - (c) Pre-reload snapshot + POST /restore-pre-reload single-shot undo.
  - WizardSession.pre_reload_sections_backup added.
  - Tests: abort-preserves-all, single-section-guarded, reload→restore round-trip, restore-without-backup 409.
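
The three guards can be sketched over plain title-to-body dicts. Everything here is a simplified assumption: the function name `apply_reload`, the `DegradedParseError` exception (standing in for the 409 response), and the choice to count a section as "empty" when its parsed body is blank. The thresholds are the ones stated above.

```python
from typing import Dict, List, Tuple


class DegradedParseError(Exception):
    """Too many matched sections parsed empty; the router would answer 409."""


def apply_reload(
    current: Dict[str, str], parsed: Dict[str, str]
) -> Tuple[Dict[str, str], List[str], Dict[str, str]]:
    """Return (updated_sections, sections_guarded, pre_reload_backup)."""
    matched = [title for title in parsed if title in current]
    empty = [title for title in matched if not parsed[title].strip()]

    # (b) Degraded-parse abort: >=2 matched and >=50% empty, nothing touched.
    if len(matched) >= 2 and len(empty) * 2 >= len(matched):
        raise DegradedParseError(f"{len(empty)}/{len(matched)} sections parsed empty")

    # (c) Snapshot the pre-reload state so a single-shot undo is possible.
    backup = dict(current)

    updated = dict(current)
    guarded: List[str] = []
    for title in matched:
        # (a) Never overwrite a populated section with an empty parsed body.
        if not parsed[title].strip() and current[title].strip():
            guarded.append(title)
            continue
        updated[title] = parsed[title]
    return updated, guarded, backup
```

Restoring is then just a matter of swapping the backup back in and clearing it, which is what makes the undo single-shot.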
### Backlog hygiene
- BACKLOG.md — folds in pending hygiene per the "no standalone backlog PRs" convention: B11.1–B11.4 + B11.7 → DONE, B5.1 → IN REVIEW + B5 Linear IDs, B11.8 + Observability (KLAIR-2824) scoped.
## Breaking changes
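None. CF reviews show fewer (only applicable) checks; BU reviews unchanged. Chat behavior is strictly more robust. One behavior change to note: reload-from-doc now returns 409 on a degraded parse instead of overwriting, and leaves populated sections untouched when their parsed body is empty.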
## Test plan
- [x] pytest tests/board_doc/test_review_checks.py — parser + CF-applicability
- [x] pytest tests/board_doc/test_claire_tools.py — truncate-and-retry + wire-schema maxLength
- [x] pytest tests/board_doc/test_chat_tool_calls.py — max_tokens nudge fires; end_turn does not
- [x] ruff format + ruff check + pyright clean on touched files
- [ ] Manual: re-run Central Support Q2 review — confirm C2.1/C2.5/C2.7 produce findings and BU-Plans / per-product checks no longer appear as skips
- [ ] Manual: GM Commentary regen with a long feedback instruction now lands; a heavy multi-edit turn proposes all edits at once