Editor's picks
把 temperature 設成 0,AI 就會每次都一樣嗎?
網路上常說:要 AI 每次給一樣的答案,把 temperature 設成 0 就好。但有人拿同一個 prompt、temperature 0 連跑 1000 次,還是冒出 80 種不同輸出。原因不是浮點誤差這麼簡單,而是它在 GPU 上跟多少別的請求湊成一批一起算。聊聊為什麼『最確定』不等於『可重現』。
Why Does AI Give a Different Answer Every Time You Ask?
Ask an AI the same thing three times and you often get three different answers. It isn't being flaky — it never picks the single most-likely word, it draws one by probability. Here's the dial behind it, why even temperature 0 isn't fully repeatable, and why 'varies' isn't the same as 'making things up'.
為什麼同一個問題問 AI,每次答案都不一樣?
同一個問題問 AI 三次,常常拿到三個不一樣的答案。不是它在亂講——它本來就不是每次都挑機率最高的那個字,而是照機率抽一個。用一個加權抽籤的畫面,聊聊它為什麼會飄,還有飄跟唬爛其實是兩回事。
When an AI says "done," ask it to show you
An AI's 'done' sounds the same whether the work happened or not. The fix is one small habit: don't take its word for it, ask it to show you a result you can check yourself, sized to the task.
AI 說「完成了」,怎麼確認它真的做完?
AI 回報「完成了」的時候,真的做完、做一半繞過去、方向整個誤會,那段話讀起來幾乎一樣。與其判斷那句話可不可信,不如養成一個反射:給我看一個我自己查得到的東西,commit、測試輸出、diff。
Claude Fable 5: First Public Mythos-Class Model, One Day In
Anthropic released Claude Fable 5 on June 9 — the first publicly available Mythos-class model, one tier above Opus. What it is, what it costs, the June 22 deadline on the subscription window, and what changed when I pointed three real projects at it for a day.
Claude Fable 5 是什麼?第一個公開的 Mythos 級模型,加上我第一天的使用心得
Anthropic 6/9 釋出第一個公開的 Mythos 級模型 Claude Fable 5。這篇整理它跟 Opus 4.8 的關係、定價、6/22 截止的訂閱免費期,加上第一天把三個專案丟給它跑的心得:它對治理流程的遵守程度是真的,token 也是真的兇。
怎麼讓 AI agent 照流程走:閘門只記帳,不攔人
流程裡那些閘門其實不在執行時擋住 AI agent,它要的是一張改不掉的收據。真正有牙齒的不是閘門,是記錄抹不掉、賴不掉。
Claude Code 多了個 dynamic workflows,我打開那段 JS 看了一下
Claude Code 5/28 釋出 dynamic workflows,跟 Opus 4.8 同一天上。比起「能開 1000 個 subagent」那個數字,更關鍵的是 orchestration 那段 JS 是 Claude 寫的、不是 Claude 在跑——這件事其實滿值得想一下的。
How Claude Code's Dynamic Workflows Run 1,000 Subagents
Claude Code's new dynamic workflows hand the orchestration plan over to a JavaScript script that Claude writes. The runtime executes it with up to 1,000 subagents — 16 concurrent — and Claude's context only sees the final cross-checked answer.


