Hermes Decision Trace

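Hindsight Reflect: Usage Guidelines and Next Steps

Hindsight's reflect is not a "scheduled reflection" job; it is query-driven agentic RAG. On this machine the three layers (directives, mental_models, reflect) are fully wired together, with 5 query templates and 4 long-lived mental_models. In measured runs, a mental_model hit cut input tokens by roughly 13x (411,370 to 31,707) and latency by about 8x (116 s to 14 s). No cron in the short term; revisit once the weekly review has stabilized.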

🧭

Recommended path

> Filed: 2026-06-19

🔎

Key evidence

See the status, artifacts, and verification chain in the evidence summary and the full record.

🛠️

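Rollout

Treat the verified setup as a stable baseline: keep the current schedule / deliver / workdir and do not rush to widen scope. For any new candidate, read the source first, inspect the output, and verify with run-now before deciding whether to convert it to script-only.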

Evidence summary

The body keeps the full evidence chain; the top of this page shows only a readable summary.

Action list

> Scope: local Hindsight 0.8.1 (port 8889, bank=hermes, 8157+ facts)

> Companion skill: integration/hindsight-reflect-ops

> CLI entry point: ~/.local/bin/hindsight-reflect (symlink to ~/.hermes/scripts/hindsight_reflect.py)

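Boundaries / risks

- reflect is read-only and never writes back to a mental_model; mental_models are created separately.
- A mental_model's content is generated by Hindsight running reflect internally, so there is a risk of cross-topic contamination; the source_query must be narrow and unambiguous.
- The LLM intermittently ignores the forced tool_choice; with a short query plus many directives it tends to call done() right away.
- A low budget combined with injected directives makes the agent cut corners and return "I don't have information". Use mid for daily work and high for reviews.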

Full record

Contents

I. What Reflect is
II. Current asset inventory
III. ROI evidence
IV. Key pitfalls
V. Next steps
VI. Operations cheat sheet
VII. References / related

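I. What Reflect is

In Hindsight 0.8.x, POST /banks/{bank}/reflect is agentic RAG, not an offline reflection batch job.

When called, it:

- forces tool calls in a fixed tier order to retrieve context: search_mental_models → search_observations → recall
- synthesizes the retrieved results into a Chinese markdown answer
- constrains the answer twice over, through directives (mandatory rules) and reflect_mission (role definition)
- optionally accepts a response_schema to force structured JSON output

It does not:

- run on its own (there is no cron / scheduler)
- create mental_models or directives on its own
- take over consolidation (that is a separate merge pipeline)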
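
A call to this endpoint might be assembled as in the sketch below. The URL prefix is copied from the curl examples in the cheat sheet and is assumed to apply to reflect as well. The body field names `query` and `budget` are assumptions (only `response_schema` is named above), so check http://localhost:8889/openapi.json before relying on them.

```python
import json
import urllib.request

# Assumption: reflect lives under the same prefix as the other bank endpoints.
BASE_URL = "http://localhost:8889/v1/default"


def build_reflect_request(bank, query, budget="mid", response_schema=None):
    """Build (but do not send) a POST request for /banks/{bank}/reflect."""
    body = {"query": query, "budget": budget}  # field names are assumed
    if response_schema is not None:
        body["response_schema"] = response_schema  # asks for structured JSON
    return urllib.request.Request(
        f"{BASE_URL}/banks/{bank}/reflect",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_reflect_request("hermes", "Summarize the gateway restart protocol")
print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the payload easy to inspect; pass the result to `urllib.request.urlopen` once the field names are confirmed.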

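II. Current asset inventory

8 global directives (tags=[], applied globally)

| priority | name | effect |
|---|---|---|
| 100 | 结论先行 (conclusion first) | Lead with the conclusion, then the support; no preamble |
| 95 | 不端水不浪费 (no fence-sitting, no waste) | Push back when there is evidence; do not dig in without it |
| 90 | 真实工具验证 (verify with real tools) | git / deployment / version / system state must come from real tool calls |
| 90 | 删除前确认 (confirm before deleting) | Confirm before deleting; explain the impact before restarting |
| 85 | 承认不确定 (admit uncertainty) | Say "not sure" when unsure; do not guess at intent |
| 80 | 溯源必须 (sources required) | Facts from the web must carry a source; do not copy ≥3 characters verbatim |
| 80 | 专有名词必搜 (look up proper nouns) | Unknown terms must be searched first, not guessed from "sounds like X" |
| 75 | 隐私脱敏 (privacy redaction) | Strip paths / ports / credentials by default when filing external issues |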
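4 long-lived mental_models

| id | topic | length |
|---|---|---|
| `model-routing-canon` | Routing canon (CPA / AnyRouter / Krill / Codex primary route) | 3.5k chars |
| `gateway-restart-protocol` | Restart boundaries for the gateway / resident services | 3.6k chars |
| `decision-trace-publish-canon` | Standard flow for publishing and verifying a Decision Trace | 2.7k chars |
| `hermes-upgrade-protocol` | Hermes upgrade and clone-and-swap protocol | 3.9k chars |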

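5 reflect templates (CLI), plus raw for custom queries

hindsight-reflect <template> [topic] [--budget low|mid|high] [--print]

| template | purpose | budget | measured time / tokens |
|---|---|---|---|
| `priming "<topic>"` | Prime a new agent with topic background | mid | 116s / 411k (14s / 33k after a mental_model hit) |
| `history "<topic>"` | Review of past decisions | mid | not yet run in practice |
| `conflicts "<topic>"` | Contradiction diagnosis | mid | 105s / 407k |
| `weekly` | Recurring patterns over 7 days (with schema-structured output) | high | 62s / 67k |
| `style-brief` | Style brief | mid | not yet run in practice |
| `raw "<query>" --budget` | Custom query | any | 12s / 15k (mental_model hit) |

Output is written to ~/.hermes/cache/reflect-output/<template>-<ts>.{md,json}.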
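
Given that naming scheme, the most recent output for a template can be located with a small helper. This is a sketch, not part of the CLI: `latest_output` is a hypothetical name, and it assumes the `<ts>` part is always a zero-padded timestamp such as 20260619-093226, so that plain string order equals time order.

```python
from pathlib import Path

# Directory taken from this note.
OUTPUT_DIR = Path.home() / ".hermes" / "cache" / "reflect-output"


def latest_output(template, directory=OUTPUT_DIR, suffix=".md"):
    """Return the newest <template>-<ts><suffix> file, or None if there is none."""
    # Zero-padded YYYYMMDD-HHMMSS timestamps sort correctly as plain strings.
    matches = sorted(Path(directory).glob(f"{template}-*{suffix}"))
    return matches[-1] if matches else None


print(latest_output("weekly"))
```

Pass `suffix=".json"` to pick up the structured counterpart of the same run.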

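III. ROI evidence

For queries on the same topic, the gap with and without a mental_model:

| metric | bare reflect (digging through low-level facts) | mental_model hit | gap |
|---|---|---|---|
| input tokens | 411,370 | 31,707 | about 13x fewer |
| time | 116 s | 14 s | about 8x faster |
| answer accuracy | scattered recall, needs free-form synthesis | quotes the canon directly, traceable to a mental_model id | markedly better |
| cross-topic contamination risk | high | low (already converged at the top tier) | markedly lower |

Measurement sources: /home/ht/.hermes/cache/reflect-output/priming-20260619-093226.{md,json} and raw-20260619-095101.{md,json}.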
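
The two ratios follow directly from the measured figures; a quick check, using only the numbers in the table above:

```python
# Figures copied from the ROI table.
bare = {"input_tokens": 411_370, "seconds": 116}
hit = {"input_tokens": 31_707, "seconds": 14}

token_ratio = bare["input_tokens"] / hit["input_tokens"]
time_ratio = bare["seconds"] / hit["seconds"]

print(f"input tokens: {token_ratio:.1f}x fewer")  # 13.0x
print(f"time: {time_ratio:.1f}x faster")          # 8.3x
```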

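IV. Key pitfalls

1. tag isolation (the nastiest one)

Symptom: after POSTing directives, the reflect response still shows based_on.directives = [] and the output style does not change.

Root cause (memory_engine.py:10138): reflect defaults to isolation_mode=True, and when no tags are passed, every directive that carries a tag is filtered out.

Fix: global, general-purpose directives / mental_models must use tags=[].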
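
The filtering effect can be pictured with a toy model. This is not the code in memory_engine.py: `visible_directives` is a hypothetical function, and both the rule for tagged requests (any shared tag counts as a match) and the idea that untagged directives stay visible to tagged requests are assumptions. Only the untagged-request case is described above.

```python
def visible_directives(directives, request_tags=None, isolation_mode=True):
    """Toy model of tag isolation; NOT Hindsight's actual implementation."""
    request_tags = set(request_tags or [])
    visible = []
    for d in directives:
        tags = set(d.get("tags", []))
        if not tags:
            visible.append(d)   # untagged (tags=[]) is treated as always visible
        elif not isolation_mode:
            visible.append(d)   # isolation off: nothing is filtered
        elif tags & request_tags:
            visible.append(d)   # assumed rule: any shared tag matches
    return visible


directives = [
    {"name": "global-rule", "tags": []},
    {"name": "tagged-rule", "tags": ["ops"]},
]
# A reflect call without tags only sees the untagged directive.
print([d["name"] for d in visible_directives(directives)])
```

In this model the tagged directive silently disappears, which matches the symptom of an empty based_on.directives list.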

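2. LLM intermittently skips the forced tool_choice

The source (agent.py:627) forces tool_choice={"type":"function","function":{"name":"search_mental_models"}}, but in practice, with some short queries at mid budget, the LLM ignores it and calls done() within 4 seconds, returning "I don't have information".

Fix:

- make the query longer and explicit (for example "first search with search_mental_models...")
- use mid budget or above; it is far more stable than low
- retry 1-2 times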
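
The retry advice can be wrapped around whatever function performs the call. A minimal sketch, assuming the response is a dict with the answer under a `text` key (an assumed name) and the hit list under based_on.mental_models; `reflect_with_retry` and `call_reflect` are hypothetical names.

```python
import time

REFUSAL = "I don't have information"


def reflect_with_retry(call_reflect, query, retries=2, delay=0.0):
    """Call reflect, retrying when the agent bails out without searching.

    call_reflect is any callable taking the query and returning a dict; the
    "text" key and the based_on.mental_models layout are assumed here.
    """
    result = {}
    for attempt in range(retries + 1):
        result = call_reflect(query)
        refused = REFUSAL in result.get("text", "")
        hit = bool(result.get("based_on", {}).get("mental_models"))
        if hit or not refused:
            return result
        if attempt < retries:
            time.sleep(delay)
    return result  # still a refusal after all retries; the caller decides
```

With the default `retries=2` the call is attempted at most three times, which matches the 1-2 retries suggested above.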

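3. Cross-topic contamination in mental_models

The source_query for gateway-restart-protocol included "verify base_url, model, API mode", and the generated mental_model ended up stuffed with routing content that belongs to model-routing-canon.

Fix: write the source_query with concrete nouns (hermes-gateway.service, decision_trace_publish.py) and do not mention keywords from other topics.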
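
One way to catch this before creating a mental_model is a simple keyword lint. The sketch below is illustrative only: `foreign_keywords` is a hypothetical helper and the keyword lists are made up for the example; in practice they would need to be curated per mental_model.

```python
# Hypothetical keyword lists, keyed by the mental_model that "owns" them.
FOREIGN_KEYWORDS = {
    "model-routing-canon": ["base_url", "api mode", "routing"],
    "gateway-restart-protocol": ["hermes-gateway.service", "restart"],
}


def foreign_keywords(model_id, source_query, keywords=FOREIGN_KEYWORDS):
    """Return keywords owned by *other* mental_models found in source_query."""
    text = source_query.lower()
    found = {}
    for owner, words in keywords.items():
        if owner == model_id:
            continue
        hits = [w for w in words if w.lower() in text]
        if hits:
            found[owner] = hits
    return found


print(foreign_keywords(
    "gateway-restart-protocol",
    "How to restart hermes-gateway.service and verify base_url, model, API mode",
))
```

A non-empty result is a hint to narrow the source_query, not proof of contamination.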

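4. Budget misuse

| budget | good for | not for |
|---|---|---|
| low | simple queries where the mental_model already exists | queries that need to recall low-level facts |
| mid | the daily workhorse: priming / conflicts / raw | weekly and similar queries that need a global view |
| high | weekly + response_schema structured output | simple queries (wasteful) |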
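
The budget guidance can be encoded as a lookup so that callers do not have to remember it. A sketch under stated assumptions: `pick_budget` is a hypothetical helper, the per-template defaults come from the template table in section II, and falling back to mid for raw is an assumption since raw has no fixed budget.

```python
# Template -> default budget, following the template table.
DEFAULT_BUDGET = {
    "priming": "mid",
    "history": "mid",
    "conflicts": "mid",
    "weekly": "high",
    "style-brief": "mid",
}


def pick_budget(template, override=None):
    """Return an explicit override if given, else the template default."""
    if override is not None:
        if override not in ("low", "mid", "high"):
            raise ValueError(f"unknown budget: {override}")
        return override
    # raw has no fixed budget in this note; mid is assumed as a safe fallback.
    return DEFAULT_BUDGET.get(template, "mid")


print(pick_budget("weekly"), pick_budget("raw"), pick_budget("priming", "low"))
```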

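5. Known bug in Hindsight 0.8.1

The reflect output occasionally ends with memory_ids: [] mental_model_ids: [...]. This is the structured output of the done() tool leaking through because the cleanup regex misses it. It can be ignored, or trimmed in post-processing inside the wrapper.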
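
Trimming in the wrapper could look like the sketch below. The exact shape of the leak is assumed from the single example above, and `strip_done_leak` is a hypothetical name; adjust the pattern if real outputs differ.

```python
import re

# Matches a trailing "memory_ids: [...] mental_model_ids: [...]" fragment.
# The exact shape of the leak is assumed from the example in this note.
_LEAK = re.compile(
    r"\s*memory_ids:\s*\[[^\]]*\]\s*mental_model_ids:\s*\[[^\]]*\]\s*\Z"
)


def strip_done_leak(answer):
    """Remove the leaked done() fields from the end of a reflect answer."""
    return _LEAK.sub("", answer)


raw = 'Conclusion first.\n\nmemory_ids: [] mental_model_ids: ["model-routing-canon"]'
print(strip_done_leak(raw))
```

The pattern is anchored to the end of the text, so the same words appearing mid-answer are left alone.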

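V. Next steps

Short term (done, no further expansion)

✅ 8 global directives written and verified to take effect

✅ 4 high-ROI topics built into mental_models

✅ hindsight-reflect CLI deployed to ~/.local/bin/

✅ Captured as skill integration/hindsight-reflect-ops

Explicitly not doing:

❌ No feishu-vs-weixin-channel mental_model: the bank has too few clean facts on the topic, and recall is too noisy
❌ No external-project-absorption mental_model: the skill external-project-absorption-eval is already canonical, and a duplicate would conflict
❌ No cron-scheduled reflect: running a query-driven tool blind just burns tokens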

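Medium term (condition-triggered, not forced)

Condition A: the weekly template produces stable output for 2-3 consecutive weeks

→ add a cron job that runs hindsight-reflect weekly on Mondays at 6:30, then sends a short Feishu card and publishes to decision.ht1072.top

→ the morning-brief-direct-feishu-send pattern can serve as a reference

Condition B: consolidation surfaces observations on a new topic

→ set trigger.refresh_after_consolidation: true on the mental_model

→ Hindsight then refreshes those mental_models automatically once consolidation finishes

→ this amounts to "passive reflection" with no manual trigger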
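
For Condition B, the change to the payload is a single flag. The sketch below only prepares the updated payload, using the trigger shape from the create example in the cheat sheet; how the update is submitted (PATCH or re-create) is not covered in this note, and `with_consolidation_refresh` is a hypothetical helper.

```python
import copy


def with_consolidation_refresh(mental_model):
    """Return a copy of a mental_model payload with the refresh trigger on."""
    updated = copy.deepcopy(mental_model)  # leave the caller's dict untouched
    updated.setdefault("trigger", {})["refresh_after_consolidation"] = True
    return updated


current = {
    "id": "gateway-restart-protocol",
    "trigger": {"mode": "delta", "refresh_after_consolidation": False},
    "tags": [],
}
print(with_consolidation_refresh(current)["trigger"])
```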

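Condition C: a new topic keeps coming up

→ any topic covered by ≥3 raw queries within 1 month becomes a candidate for a mental_model

→ candidate watch list: cron governance, skill governance, provider upgrade roadmap, kanban scheduling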
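
The Condition C threshold is easy to check mechanically once raw queries are logged per topic. A sketch with stated assumptions: the (topic, date) log is hypothetical, since this note does not define where such a log lives, and "1 month" is approximated as 30 days.

```python
from collections import Counter
from datetime import date, timedelta


def mental_model_candidates(raw_queries, today, window_days=30, min_hits=3):
    """Topics covered by at least min_hits raw queries in the last window_days.

    raw_queries is a hypothetical log of (topic, date) pairs.
    """
    cutoff = today - timedelta(days=window_days)
    counts = Counter(topic for topic, day in raw_queries if cutoff <= day <= today)
    return sorted(topic for topic, n in counts.items() if n >= min_hits)


log = [
    ("cron governance", date(2026, 6, 2)),
    ("cron governance", date(2026, 6, 9)),
    ("cron governance", date(2026, 6, 16)),
    ("kanban scheduling", date(2026, 6, 10)),
    ("kanban scheduling", date(2026, 4, 1)),
]
print(mental_model_candidates(log, today=date(2026, 6, 19)))
```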

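Long term (depends on value density)

Evolution of directives

- All 8 current directives are core rules distilled from USER.md. If a directive repeatedly shows up in reflect output as "in effect but not helping", lower its priority or soft-disable it (is_active=false).
- Do not add new directives proactively; add one only after the same kind of problem has been flagged repeatedly.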

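Principles for growing the mental_model library

- Each mental_model's source_query must be summarizable in one sentence and must not overlap with the existing library.
- A single mental_model should not exceed 4k characters (beyond that, the topic is not focused enough and should be split).
- Review once a quarter: run a related reflect and check whether it still hits and whether the content is stale.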
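
The two measurable principles (length cap and quarterly review) can be turned into a small check. A sketch, assuming 4k means 4000 characters and a quarter means 90 days; `review_flags` is a hypothetical helper, and the overlap check still needs a human.

```python
from datetime import date


def review_flags(content, last_reviewed, today, max_chars=4000, review_days=90):
    """Flag a mental_model that is too long or overdue for its quarterly review."""
    flags = []
    if len(content) > max_chars:
        flags.append("split: exceeds length limit")
    if (today - last_reviewed).days > review_days:
        flags.append("review: older than one quarter")
    return flags


print(review_flags("x" * 3900, date(2026, 6, 19), date(2026, 10, 1)))
```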

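When to consider automating reflect

- the mental_model library has stabilized at ≥10 entries, and the reflect hit rate (share of responses with non-empty based_on.mental_models) is ≥80%
- the weekly template has produced readable, decision-ready output for 3 consecutive weeks
- there is a clear "question I will definitely ask every week"

Until these are met, observe first and decide later; no changes for now.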
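
The hit-rate criterion can be computed from saved reflect responses. A minimal sketch, assuming each response is a dict exposing based_on.mental_models as described above; the function names are hypothetical, and only the two numeric criteria are covered.

```python
def mental_model_hit_rate(responses):
    """Share of reflect responses whose based_on.mental_models is non-empty."""
    if not responses:
        return 0.0
    hits = sum(1 for r in responses if r.get("based_on", {}).get("mental_models"))
    return hits / len(responses)


def ready_for_automation(responses, model_count, min_models=10, min_rate=0.80):
    """Check only the two numeric criteria; the others need human judgment."""
    return model_count >= min_models and mental_model_hit_rate(responses) >= min_rate


sample = [
    {"based_on": {"mental_models": ["model-routing-canon"]}},
    {"based_on": {"mental_models": []}},
]
print(mental_model_hit_rate(sample), ready_for_automation(sample, model_count=4))
```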

VI. Operations cheat sheet

Run a reflect

    # topic briefing
    hindsight-reflect priming "<topic>"
    # contradiction diagnosis
    hindsight-reflect conflicts "<A vs B>"
    # weekly review (with schema)
    hindsight-reflect weekly
    # custom query
    hindsight-reflect raw "<query>" --budget mid

Inspect the current mental_models / directives

    # all current mental_models
    curl -s http://localhost:8889/v1/default/banks/hermes/mental-models | python3 -m json.tool

    # all current directives (highest priority first)
    curl -s http://localhost:8889/v1/default/banks/hermes/directives | \
      python3 -c "import sys,json; d=json.load(sys.stdin); \
      [print(f'[{x[\"priority\"]:3}] {x[\"name\"]}') for x in sorted(d['items'],key=lambda x:-x['priority'])]"

Create a new mental_model

    import json
    import urllib.request

    mm = {
        "id": "<lowercase-with-hyphens>",
        "name": "<human-readable name>",
        "source_query": "<query that makes hindsight reflect generate the content>",
        "max_tokens": 3000,
        "trigger": {"mode": "delta", "refresh_after_consolidation": False},
        "tags": [],  # key point: leave empty to avoid isolation filtering
    }
    req = urllib.request.Request(
        "http://localhost:8889/v1/default/banks/hermes/mental-models",
        data=json.dumps(mm).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Print the raw response so the operation_id is visible.
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())

This returns an operation_id; poll /operations/{op_id} until status=completed (roughly 30-90 s).
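
The polling step can be sketched as below. The HTTP call is left to a caller-supplied function because the full URL of the operations endpoint is not spelled out in this note; `wait_for_operation` and `fetch_status` are hypothetical names, the 180 s default timeout is an assumption (twice the upper end of the observed range), and failure statuses are not handled because their names are unknown.

```python
import time


def wait_for_operation(fetch_status, op_id, timeout=180.0, interval=5.0):
    """Poll until the operation reports status == "completed".

    fetch_status is any callable that takes op_id and returns the operation
    as a dict, e.g. a small wrapper around GET /operations/{op_id}.
    """
    deadline = time.monotonic() + timeout
    while True:
        op = fetch_status(op_id)
        if op.get("status") == "completed":
            return op
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {op_id} still {op.get('status')!r}")
        time.sleep(interval)
```

Injecting `fetch_status` also makes the loop testable without a running Hindsight instance.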

Modify a directive (soft-disable / change wording)

    # soft-disable
    curl -X PATCH http://localhost:8889/v1/default/banks/hermes/directives/<id> \
      -H "Content-Type: application/json" -d '{"is_active": false}'
    # change wording
    curl -X PATCH http://localhost:8889/v1/default/banks/hermes/directives/<id> \
      -H "Content-Type: application/json" -d '{"content": "new content"}'

VII. References / related

- skill: integration/hindsight-reflect-ops (includes the full troubleshooting table and cheat sheet)
- sibling skill: integration/hindsight-memory-operations (umbrella)
- source references: hindsight_api/engine/reflect/agent.py, prompts.py, memory_engine.py:list_directives
- API docs: http://localhost:8889/openapi.json
- cached output: ~/.hermes/cache/reflect-output/
- measured sample: raw-20260619-095101.{md,json} (all 4 mental_models hit, 33k tokens, 14 seconds)

---

Changes: if a mental_model is later created or retired, a directive priority is adjusted, or the reflect automation strategy changes, update this document accordingly.