HACKER Q&A
📣 galaxyeye

Is a JVM/CDP-based browser-agent stack fundamentally a bad idea?


Hi HN,

We built a very early prototype: a browser-agent / browser-automation runtime using Kotlin/JVM and raw CDP. Before investing further, we'd like advice from anyone who has worked on browser agents, AI browsers, large-scale automation, crawling, or browser farms, or who has deep knowledge of Chromium/CDP. We ourselves suspect many of our design assumptions may be flawed, so sharp criticism is very welcome.

---

TL;DR

We're building an open-source runtime:

• AI planning/reasoning/logic lives on the JVM
• Browser actions are driven via raw CDP
• High concurrency via Kotlin coroutines
• A small ML agent learns page structure

But we're not sure any of this is actually meaningful. Feedback, especially negative feedback, is appreciated.

---

1. JVM + CDP: possibly the wrong abstraction layer

AI planning/reasoning/logic runs on the JVM; browser actions are sent through CDP (a minimal sketch of what "raw CDP on the JVM" means to us appears after section 4). Some doubts we cannot resolve internally:

• Is the JVM too heavy for this domain? Will GC pauses and scheduling cause tail latency?
• Is CDP inherently unsuitable for high-throughput automation?
• Does anyone actually need a JVM-native browser agent?
• Would Go, Node, or Python be a more sensible choice?

If the answer is "no, this is the wrong direction," we'd really like to hear it.

---

2. High-concurrency runtime: likely to fall apart in real workloads

We're trying to push single-machine throughput on real, complex pages by relying on (the second sketch after section 4 shows the skeleton we mean):

• Kotlin coroutines
• Minimizing DevTools round-trips
• Raw CDP with multi-tab concurrency

Our doubts here are even larger:

• Can Chromium realistically survive this scale (render-process contention, GPU-thread limits, compositor stalls, etc.)?
• Are multi-tab workloads doomed to event interference, reordering, and deadlocks?
• Will CDP scheduling become the true bottleneck?
• Is raw CDP unavoidably more brittle than Playwright?

If you've seen similar attempts fail, we'd especially like to know how they failed.

---

3. Non-LLM page-structure learning: probably not generalizable

We built a small ML module to avoid calling an LLM every time we parse HTML (the third sketch after section 4 shows the flavor of featurization involved). It works well on e-commerce pages, but we strongly suspect it will break elsewhere. Concerns:

• Will it fail outright on news sites, forums, SaaS dashboards, and other domains?
• Has anyone built DOM-structure-learning systems and then abandoned them? Why?
• Is the long tail of the web fundamentally hostile to non-LLM approaches?

Failure stories are particularly valuable.

---

4. Some questions we have zero confidence about

• Does the world actually need yet another browser-automation stack?
• Do "browser agents" have long-term practical value at all?
• Do coroutine-style concurrency models provide real benefits under heavy CDP I/O?
• Should we drop the "agent" layer entirely and just build a runtime?
• What fatal issues exist around resource isolation, multi-tenancy, event storms, or long-tail page behavior?
• Do all high-concurrency browser runtimes eventually die for the same reasons?

If the answer is "yes, stop now," we'd prefer to know early.
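---

Rough sketches (for concreteness)

These are simplified illustrations, not our actual code. First, what "raw CDP on the JVM" means to us: a plain WebSocket carrying JSON envelopes, here via Java's built-in java.net.http client. The ws:// URL is a placeholder; in practice you fetch webSocketDebuggerUrl from http://localhost:9222/json after starting Chrome with --remote-debugging-port=9222.

    import java.net.URI
    import java.net.http.HttpClient
    import java.net.http.WebSocket
    import java.util.concurrent.CompletableFuture
    import java.util.concurrent.CompletionStage

    fun main() {
        // Placeholder target; real code discovers this via the /json endpoint.
        val wsUrl = "ws://localhost:9222/devtools/page/REPLACE_WITH_TARGET_ID"
        val done = CompletableFuture<Unit>()

        val listener = object : WebSocket.Listener {
            override fun onText(ws: WebSocket, data: CharSequence, last: Boolean): CompletionStage<*>? {
                // Command responses and unsolicited events share one socket;
                // frames may also arrive split (last == false), ignored here.
                println("CDP <- $data")
                if (data.contains("\"id\":1")) done.complete(Unit)
                ws.request(1) // ask for the next frame
                return null
            }
        }

        val ws = HttpClient.newHttpClient()
            .newWebSocketBuilder()
            .buildAsync(URI.create(wsUrl), listener)
            .join()

        // Every CDP command is a JSON envelope: id + method + params.
        ws.sendText("""{"id":1,"method":"Page.navigate","params":{"url":"https://example.com"}}""", true)
        done.join()
        ws.sendClose(WebSocket.NORMAL_CLOSURE, "done").join()
    }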
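Second, the coroutine skeleton behind "multi-tab concurrency over one socket" (kotlinx.coroutines; error frames, timeouts, and event routing are omitted, and sendRaw stands in for the actual socket writer):

    import kotlinx.coroutines.CompletableDeferred
    import java.util.concurrent.ConcurrentHashMap
    import java.util.concurrent.atomic.AtomicLong

    class CdpMux(private val sendRaw: (String) -> Unit) {
        private val nextId = AtomicLong(1)
        private val inflight = ConcurrentHashMap<Long, CompletableDeferred<String>>()

        // One call per tab coroutine: suspends without blocking a thread, so a
        // small thread pool can keep many tabs in flight at once.
        suspend fun call(sessionId: String, method: String, params: String): String {
            val id = nextId.getAndIncrement()
            val reply = CompletableDeferred<String>()
            inflight[id] = reply
            // sessionId comes from Target.attachToTarget (flatten mode), which
            // is how one socket multiplexes many tabs.
            sendRaw("""{"id":$id,"sessionId":"$sessionId","method":"$method","params":$params}""")
            return try { reply.await() } finally { inflight.remove(id) }
        }

        // Invoked by the single WebSocket reader loop after parsing "id" from a frame.
        fun onResponse(id: Long, frame: String) {
            inflight.remove(id)?.complete(frame)
        }
    }

One suspending caller per tab, a shared id-to-Deferred map, and a single reader loop completing the deferreds; whether this survives real event storms is exactly what we're unsure about.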
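Third, the flavor of the non-LLM structure learner: hand-rolled per-node features feeding a cheap classifier. jsoup is used here only to make the sketch self-contained, and the feature set is deliberately toy-sized; the real module uses many more signals.

    import org.jsoup.Jsoup
    import org.jsoup.nodes.Element

    // Each DOM node becomes a small numeric vector that a cheap classifier
    // scores as price/title/image/other, with no LLM call per page.
    fun features(el: Element): DoubleArray = doubleArrayOf(
        el.parents().size.toDouble(),                 // depth in the tree
        el.children().size.toDouble(),                // fan-out
        el.ownText().length.toDouble(),               // direct text mass
        if (el.ownText().contains(Regex("""[€¥£$]\s*\d"""))) 1.0 else 0.0, // currency hint
        if (el.tagName() in setOf("h1", "h2", "h3")) 1.0 else 0.0,         // heading tag
        el.className().split(' ').count { it.contains("price", ignoreCase = true) }.toDouble()
    )

    fun main() {
        val html = """<div class="product"><h1>Widget</h1><span class="price">$9.99</span></div>"""
        val doc = Jsoup.parse(html)
        doc.select("h1, span").forEach { println("${it.tagName()} -> ${features(it).toList()}") }
    }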
---

Prototype status

We'll open-source a very early version (missing docs, missing examples, and possibly flawed designs). Known issues include:

• Deadlocks on certain complex sites that are hard to reproduce
• CDP event reordering under high concurrency
• Worse-than-expected memory behavior
• A structure-learning module that is inaccurate on non-e-commerce pages

If you've built systems with heavy browser interaction, automation, or data extraction, or that treat the browser as a runtime, we'd love to hear about the bottlenecks you hit, so we don't optimize in the wrong direction.

---

Finally

Any single sentence of criticism may save us months.

— Browser4 Team


  👤 grizzles Accepted Answer ✓
Open-source it and you'll get all the feedback you desire.