HACKER Q&A
📣 AS_YC

Are we missing a middleware layer between LLM agents and the web?


I’ve been experimenting with browser agents (OpenClaw Browser, agent-browser, Playwright setups with Claude/Cursor).

Even with:

- accessibility snapshots
- element references (E1, E2)
- semantic locators
- session isolation

they still feel fundamentally fragile.

LLMs are reasoning over DOM trees step by step. It works — but barely. Small UI changes break everything.
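For concreteness, this is roughly what that setup looks like today: an isolated Playwright context, role/label-based locators, and an accessibility snapshot handed to the model as its view of the page. A minimal sketch; the URL and field labels are placeholders, not from any real site.

```typescript
import { chromium } from 'playwright';

// Session isolation: each agent run gets its own browser context.
const browser = await chromium.launch();
const context = await browser.newContext();
const page = await context.newPage();

// Placeholder URL and labels, purely illustrative.
await page.goto('https://example.com/login');

// Semantic locators: resolve elements by role/label rather than brittle CSS paths.
await page.getByLabel('Email').fill('agent@example.com');
await page.getByRole('button', { name: 'Sign in' }).click();

// Accessibility snapshot: the pruned tree the agent reasons over
// instead of raw markup.
const snapshot = await page.accessibility.snapshot();
console.log(JSON.stringify(snapshot, null, 2));

await browser.close();
```

Even with all of that in place, the agent is still inferring intent from a rendered tree, which is exactly what breaks when the UI shifts.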

It feels like we’re missing an abstraction layer.

What if instead of agents operating on markup, websites exposed structured “interaction surfaces” — something closer to tools or world models rather than DOM nodes?

Instead of:

- parse DOM
- guess selector
- click element

It would be:

- request action
- receive structured state
- operate over stable semantic primitives
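To make that concrete, here is a hypothetical sketch of what such an interaction surface could look like as a typed contract. Every name in it (InteractionSurface, describe, perform, the field names) is invented for illustration; this is not an existing spec.

```typescript
// A hypothetical contract a site could expose to agents:
// named actions in, structured state out, no selectors anywhere.

type ActionDescriptor = {
  name: string;                       // e.g. "add_to_cart"
  description: string;                // what the action does, in plain language
  params: Record<string, string>;     // parameter names and rough types
};

type ActionResult = {
  ok: boolean;
  state: Record<string, unknown>;     // stable semantic state, not DOM nodes
  availableActions: ActionDescriptor[]; // what the agent can do next
};

interface InteractionSurface {
  // Discover what the current page/state lets an agent do.
  describe(): Promise<ActionDescriptor[]>;

  // Perform an action by name and get the resulting state back,
  // instead of clicking a node and re-parsing the DOM.
  perform(name: string, params: Record<string, unknown>): Promise<ActionResult>;
}
```

In spirit this is an MCP-style tool schema, but hosted by the site itself rather than bolted on by a client-side DOM wrapper.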

Is this already being explored somewhere beyond MCP experiments? Or is everyone still stuck in DOM-land?

Curious if others see the same limitation — and whether a middleware “site-agent” layer makes sense.

Would love to hear your thoughts


  👤 andsoitis Accepted Answer ✓
Another approach is for LLMs to operate software, such as a web browser, like a human would.

For instance, see “Computer use” in the recent Sonnet 4.6 announcement: https://www.anthropic.com/news/claude-sonnet-4-6
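For anyone who hasn't tried it, the loop looks roughly like this with Anthropic's computer-use beta: the model receives screenshots and emits click/type actions that your harness executes. The model string, beta flag, and tool type below are from the earlier Claude 3.5 Sonnet beta, so treat them as assumptions to check against the current docs rather than the announcement's exact API.

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.beta.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  betas: ['computer-use-2024-10-22'],
  tools: [
    {
      type: 'computer_20241022',
      name: 'computer',
      display_width_px: 1280,
      display_height_px: 800,
    },
  ],
  messages: [
    { role: 'user', content: 'Open the pricing page and summarize the tiers.' },
  ],
});

// Each tool_use block in response.content is a UI action (screenshot,
// mouse move, click, type) to run against a real browser or VM; the loop
// then feeds back a tool_result and calls the API again until it finishes.
console.log(response.content);
```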