Search: [agent]

Agentbase | Serverless Agent Platform for Developers

ai · agent

March 10, 2026 at 10:37:05 AM EDT * · permalink

·

[2602.10715] Locomo-Plus: Beyond-Factual Cognitive Memory Evaluation Framework for LLM Agents

Experiments across diverse backbone models, retrieval-based methods, and memory systems demonstrate that cognitive memory remains challenging and reveals failures not captured by existing benchmarks.

ai · llm · memory · agent · paper

March 8, 2026 at 4:16:04 PM EDT * · permalink

·

https://arxiv.org/abs/2602.10715

Vibe Coding Is the Future of Programming. How to Get Started.

Instead, he says, business leaders should prioritize creating a culture in which their employees feel empowered to experiment with vibe coding and share their best creations. “Seeing is believing,” says Schluntz, “and I think getting non-developers in every company to use these tools to bring their ideas to life is one of the most powerful things.”

According to Anthropic researcher Erik Schluntz, vibe coding makes it so that “people are limited only by their creativity, not by the skills that they have.” Think about Apple in the 1970s; Steve Jobs was the big ideas guy, and Steve Wozniak was the technical genius who translated Jobs’ ideas into a working product. Vibe coding essentially gives everyone their own personal Woz. “If you have an image of something in your mind, you can go create it,” adds Schluntz.

ai · vibecoding · agent

March 6, 2026 at 10:52:18 AM EST * · permalink

·

https://archive.is/VVt1S#selection-1915.0-1915.352

Jido 2.0 is now available · Agent Jido

TypeScript agent frameworks felt like toys. Single-threaded event loops trying to juggle concurrent agents with promises and prayer. Python agents did a little better, but after a long time they couldn’t stay up. The BEAM was built for exactly this kind of work.

ai · elixir · agent

March 6, 2026 at 9:24:30 AM EST * · permalink

·

https://jido.run/blog/jido-2-0-is-here

KARL: A Faster Agent for Enterprise Knowledge, powered by custom RL

While SFT distillation meaningfully improves overall performance over the base model, the gap between the two approaches is most apparent when combined with test-time compute. On in-distribution tasks, SFT benefits substantially from parallel sampling (69.1 → 75.3), yet on out-of-distribution tasks the gains are negligible (59.4 → 59.6). This suggests that distillation teaches the model to imitate task-specific expert behavior, which scales well within the training distribution but fails to generalize beyond it. In contrast, KARL benefits from test-time compute both in- and out-of-distribution, indicating that RL develops more general search capabilities rather than task-specific heuristics.

ai · rl · agent

March 5, 2026 at 10:10:20 PM EST * · permalink

·

https://www.databricks.com/sites/default/files/2026-03/karl.pdf

symphony/elixir/README.md at main · openai/symphony

Why Elixir?

Elixir is built on Erlang/BEAM/OTP, which is great for supervising long-running processes. It has an active ecosystem of tools and libraries. It also supports hot code reloading without stopping actively running subagents, which is very useful during development.

elixir · ai · agent

March 5, 2026 at 10:07:33 PM EST * · permalink

·

https://github.com/openai/symphony/blob/main/elixir/README.md

Observational Memory | Memory | Mastra Docs

Observations

When message history tokens exceed a threshold (default: 30,000), the Observer creates observations — concise notes about what happened.

When observations exceed their threshold (default: 40,000 tokens), the Reflector condenses them — combining related items and reflecting on patterns.

The result is a three-tier system:

Recent messages: Exact conversation history for the current task
Observations: A log of what the Observer has seen
Reflections: Condensed observations when memory becomes too long
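
A rough sketch of the three-tier flow quoted above, written in Python rather than Mastra's own TypeScript. Only the two default thresholds (30,000 and 40,000 tokens) come from the excerpt. Everything else is an assumption made for illustration: the class name `ObservationalMemorySketch`, the word-count `count_tokens` stand-in for a real tokenizer, the `observe` and `reflect` callables that stand in for the Observer and Reflector model calls, and the choice to clear a tier once it has been summarized. It is not Mastra's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Hypothetical tokenizer stand-in: counts whitespace-separated words."""
    return len(text.split())


def total_tokens(items: List[str]) -> int:
    """Sum the approximate token counts of a list of strings."""
    return sum(count_tokens(item) for item in items)


@dataclass
class ObservationalMemorySketch:
    # Hypothetical callables standing in for the Observer and Reflector.
    observe: Callable[[List[str]], str]
    reflect: Callable[[List[str]], str]
    # Defaults taken from the quoted documentation excerpt.
    message_threshold: int = 30_000
    observation_threshold: int = 40_000
    # The three tiers: recent messages, observations, reflections.
    messages: List[str] = field(default_factory=list)
    observations: List[str] = field(default_factory=list)
    reflections: List[str] = field(default_factory=list)

    def add_message(self, message: str) -> None:
        """Append a message, then compact any tier that exceeds its threshold."""
        self.messages.append(message)

        # Tier 1 -> tier 2: message history grew past its threshold.
        if total_tokens(self.messages) > self.message_threshold:
            self.observations.append(self.observe(self.messages))
            # Assumption: summarized messages are dropped from the raw history.
            self.messages.clear()

        # Tier 2 -> tier 3: the observation log grew past its threshold.
        if total_tokens(self.observations) > self.observation_threshold:
            self.reflections.append(self.reflect(self.observations))
            # Assumption: condensed observations are dropped from the log.
            self.observations.clear()
```

With tiny thresholds (say 5 tokens for both) and lambdas that return short strings, feeding in a handful of three-word messages is enough to watch content move from messages to observations to reflections.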

ai · agent · memory

February 12, 2026 at 12:55:28 PM EST * · permalink

·

https://mastra.ai/docs/memory/observational-memory

RAW.works - Recursive Language Models as Memory Systems

We were able to demonstrate a “Top-5” LongMemEval result with very minimal modifications to dspy.RLM, just some helper functions to process the “multi-chat” sessions.

ai · agent · memory

February 12, 2026 at 12:34:34 PM EST * · permalink

·

https://raw.works/recursive-language-models-as-memory-systems/

Observational Memory: 95% on LongMemEval - Mastra Research

Observational Memory achieves the highest score ever recorded on LongMemEval — 94.87% with gpt-5-mini — while maintaining a completely stable, cacheable context window. It beats the oracle, outperforms complex multi-step reranking systems with a single pass, and scales better with model quality than existing approaches.

ai · llm · agent

February 10, 2026 at 9:55:30 PM EST * · permalink

·

https://mastra.ai/research/observational-memory

Inside OpenAI’s in-house data agent | OpenAI

OpenAI recently introduced their bespoke in-house AI data agent, a GPT-5.2-powered tool designed to help employees navigate and analyze over 600 petabytes of internal data across 70,000 datasets. By translating natural language questions into complex data insights in minutes, the agent enables teams across the company to bypass manual SQL debugging and quickly make data-driven decisions.

openai · agent

January 30, 2026 at 6:35:00 PM EST * · permalink

·

https://openai.com/index/inside-our-in-house-data-agent/

SOTA on swebench-verified: (re)learning the bitter lesson

Searching code is an important part of every developer's workflow. We're trying to make it better.

aide · ml · agent

January 27, 2025 at 1:23:02 AM EST * · permalink

·

https://aide.dev/blog/sota-bitter-lesson

[2310.06770v2] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Abstract page for arXiv paper 2310.06770v2: SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

agent

January 15, 2025 at 12:23:03 PM EST * · permalink

·

https://arxiv.org/abs/2310.06770v2

Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet

A post for developers about the new Claude 3.5 Sonnet and the SWE-bench eval

agent

January 15, 2025 at 12:22:43 PM EST * · permalink

·

https://www.anthropic.com/research/swe-bench-sonnet

Agents

Chip Huyen's 8,000 word practical guide to building useful LLM-driven workflows that take advantage of tools. Chip starts by providing a definition of "agents" to be used in the piece …

agent

January 11, 2025 at 10:38:43 PM EST * · permalink

·

https://simonwillison.net/2025/Jan/11/agents/

Agents

Intelligent agents are considered by many to be the ultimate goal of AI. The classic book by Stuart Russell and Peter Norvig, Artificial Intelligence: A Mode...

agent

January 11, 2025 at 10:38:30 PM EST * · permalink

·

https://huyenchip.com//2025/01/07/agents.html