Mihai Cosma

Emergence through chaos and order.

Agent Harnesses

Orchestration, memory, communication architectures

panetone single-file Python bridge for two-way communication with Claude/Codex/Gemini/OpenCode interactive sessions in wezterm-mux-server via Telegram/Signal/Slack. Launch a multi-agent collaborative session via /collab.
swarmhost multi-agent orchestration with dependency graphs, built by reverse-engineering Claude Code's /batch in <12 hours. Stateless planner/worker/merger architecture with continuous task refill. Used to ship 15K LOC to prod in 26 hours.
longmem ablation study on agent memory retrieval that beats Mastra-AI's LongMemEval SOTA with just embeddings and terse prompts. Novel application of ColBERT-Zero for long-context retrieval. Conclusion: complexity hurts.
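
The swarmhost entry above mentions dependency graphs with continuous task refill. The sketch below is one minimal way to picture that scheduling idea; it is not swarmhost's actual code. The function name `run_with_refill`, the `deps` mapping, and the `execute` callback are all hypothetical, and "finishing" a task is simulated by completing the oldest running one.

```python
from collections import deque


def run_with_refill(deps, max_workers, execute):
    """Run tasks in dependency order, topping up free worker slots as tasks finish.

    deps maps each task name to the set of tasks it depends on (hypothetical format).
    Every prerequisite must itself be a key in deps.
    """
    remaining = {task: set(prereqs) for task, prereqs in deps.items()}
    dependents = {task: [] for task in deps}
    for task, prereqs in deps.items():
        for prereq in prereqs:
            dependents[prereq].append(task)

    # Tasks with no prerequisites can start immediately.
    ready = deque(sorted(t for t, prereqs in remaining.items() if not prereqs))
    running = deque()
    finished = []

    while ready or running:
        # Refill step: fill any free worker slot with a ready task.
        while ready and len(running) < max_workers:
            running.append(ready.popleft())

        # Simulate the oldest running task completing.
        task = running.popleft()
        execute(task)
        finished.append(task)

        # Completing a task may unblock the tasks that depend on it.
        for child in sorted(dependents[task]):
            remaining[child].discard(task)
            if not remaining[child]:
                ready.append(child)

    if len(finished) != len(deps):
        raise ValueError("dependency cycle detected")
    return finished


# Hypothetical task graph: two workers build on a plan, then a merge step.
example_deps = {
    "plan": set(),
    "api": {"plan"},
    "ui": {"plan"},
    "merge": {"api", "ui"},
}
executed = []
order = run_with_refill(example_deps, max_workers=2, execute=executed.append)
print(order)
```

A real orchestrator would run workers concurrently and react to actual completion events; the queue-based simulation here only illustrates how newly unblocked tasks refill free slots.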

Quantitative benchmarks, agent capabilities, automated evaluation

mad novel eval benchmark for LLM agent memory and multi-step planning over long game sequences. Agents reach 91% of the ceiling with memory and match the random score without it.
human3090 maps the efficient frontier of local coding LLMs at the trailing edge of hardware on my 3090.
search-claude-history lets Claude efficiently search session history through a simple CLI tool.
redvsblue.fyi reproducible economic analysis of US Presidential control on 83 economic metrics.

Novel architectures, reinforcement learning, empirical results

reservoir computing for LLMs adds an echo state network (ESN) sidecar to Qwen3.5-0.8B. 3x average exact-match across 23 benchmarks (0.12→0.36) at negligible perplexity cost (+0.3%). Perfect on passkey retrieval, near-perfect on multi-digit arithmetic, strong gains on variable tracking and program tracing. Kicked off with /collab.
mnk-game AlphaZero-style implementation in Rust shows modest gains from transfer learning across board sizes. Hyperparameter sweeps and a 75x speedup over the reference Python, but the improvements don't stack.
energy spillage attempt to repurpose an ICLR 2026 hallucination metric for test-time tree search. Used /collab for research design. Found that energy spillage is a confidence proxy and underperforms min(log_prob). Killed via predefined go/no-go gates.
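
The energy spillage entry above compares against min(log_prob) as a baseline. The sketch below shows one plausible reading of that baseline: score each candidate sequence by its lowest per-token log-probability and prefer the candidate whose weakest token is strongest. That interpretation, the function names, and the example values are assumptions for illustration, not the project's code or data.

```python
def min_log_prob(token_log_probs):
    """Confidence score for one sequence: its least likely token's log-probability."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return min(token_log_probs)


def pick_most_confident(candidates):
    """Return the name of the candidate with the highest min(log_prob) score.

    candidates maps a candidate name to its list of per-token log-probabilities
    (hypothetical input format).
    """
    return max(candidates, key=lambda name: min_log_prob(candidates[name]))


# Made-up log-probabilities, purely illustrative.
example_candidates = {
    "branch_a": [-0.1, -2.5, -0.2],  # one very uncertain token
    "branch_b": [-0.6, -0.7, -0.5],  # uniformly moderate confidence
}
best = pick_most_confident(example_candidates)
print(best)
```

In a tree search, a score like this could rank sibling branches before expansion; how the project actually wired it into search is not stated in the entry.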

gitrep.fyi star-based reputation scores for GitHub contributors. Sort issues and PRs by who filed them, not just when. Plus star histories and agent-friendly unlimited API.
clanker-analytics token usage analytics for AI coding tools. Reads local session logs from Claude Code, Codex, and Gemini CLI with per-project breakdowns using DuckDB. Install with: uv tool install clanker-analytics
wakterm fork of WezTerm that makes Claude/Codex/Gemini/OpenCode first-class terminal citizens. Agent lifecycle CLI, turn-state tracking, session persistence, and 26 upstream bug fixes. The terminal layer underneath panetone.
