Always in the middle of something.

Chasing ideas across ML, AI, and data. Building tools when the rabbit hole gets interesting enough.
AI Systems

Why Does AI Sound So Confident When It's Wrong?

AI's most dangerous trait isn't that it's wrong sometimes. It's that its tone when wrong is identical to its tone when right. Here's my plain-language take on why, including why it won't just say 'I don't know'.

2026-06-02 · 3 min read · 1315 words · KbWen · EN
AI Systems

How I Use ChatGPT, Claude, and Gemini Day to Day

Not a benchmark or a verdict on which AI is best — just the small habits I picked up from keeping ChatGPT, Claude, and Gemini all open: route by task, give context first, don't expect one perfect answer, and verify the confident-sounding stuff.

2026-06-02 · 3 min read · 1009 words · KbWen · EN
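AI Systems

Why Does AI Sound Exactly the Same When It's Bluffing as When It's Telling the Truth?

The most misleading thing about AI isn't that it gets things wrong. It's that its tone when wrong is exactly the same as its tone when right. A plain-language look, from the angle that 'it is always guessing the next most natural-sounding word', at why sounding certain is not the same as knowing.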

2026-06-02 · 4 min read · 1893 words · KbWen · ZH
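AI Systems

I Keep Three AI Chat Windows Open Every Day: A Few Small Habits I've Picked Up Lately

No grand theory here, just a few small habits I picked up after using ChatGPT, Gemini, and Claude side by side for a while: send different tasks to different models, explain the context before asking, don't expect to get it right in one shot, that sort of thing.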

2026-06-02 · 5 min read · 2256 words · KbWen · ZH
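AI Systems

The Real Problem with Benchmark Saturation: It's Not Measurement, It's Verification

GSM8k at 99%, MMLU in the low 90s, HLE already in the 40s by mid-2026. Each 'harder benchmark' looks like it solves the problem, but the structural issue hasn't changed: we have never verified what a model has learned, only measured whether it has seen the material before.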

2026-06-01 · 6 min read · 2737 words · KbWen · ZH
AI Systems

LLM Benchmark Saturation Isn't a Measurement Problem

GSM8k at 99%, MMLU in the 88-94% noise band, HLE already in the mid-40s by mid-2026. Each round of harder benchmarks looks like progress, but the field never solved the underlying problem: we measure correlation with a test distribution and call it capability.

2026-06-01 · 4 min read · 1543 words · KbWen · EN
Python

Python List Comprehensions: Read Them as For-Loops

A relaxed take on Python list comprehensions: translate them back into the equivalent for-loop, and check what's actually true about variable leaking and speed on Python 3.14.

2026-05-31 · 10 min read · 1973 words · KbWen · EN
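Python · Easy Read

Python List Comprehensions: Replacing a For-Loop with One Line

A plain-language look at Python list comprehensions: translate them back into an ordinary for-loop, and test on Python 3.14 what actually happens with variable leaking and performance.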

2026-05-31 · 7 min read · 3365 words · KbWen · ZH
A three-stage evolution diagram: a small four-line atomic skill on the left, a cluster of overlapping skills in the middle (pattern emerging), and a taller seventeen-line production skill on the right, connected by dashed timeline arrows
AI Systems

The Skill Your Annoyed Prompt Becomes

Your first Claude Code skill won't look like the polished examples in tutorials. It'll look like a prompt you've typed three times in a row, saved into a four-line markdown file. This post walks through that minimum shape, shows the three things that break, and compares it to a real seventeen-line production-grade skill from the framework I use daily.

2026-05-28 · 7 min read · 1480 words · KbWen · EN
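A three-stage evolution diagram: a four-line atomic skill draft on the left, a cluster of atomic skills in the middle, and a mature seventeen-line production skill on the right, connected by thin lines into a timeline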
AI Systems

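How to Write Your First Skill: Start from an Annoyed Prompt

Your first skill won't look like the mature, production-grade examples in books. It will look like 'the same prompt you typed three times in a row'. Starting there is much easier than working backwards from a skill in a mature framework.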

2026-05-28 · 7 min read · 3015 words · KbWen · ZH