Unlock the Power of Browser Agents with Surfer H's Open-Source Models

Discover the power of browser agents with Surfer H's open-source models. Learn how their state-of-the-art framework and efficient models can streamline web interactions, boost performance, and reduce costs. Explore the technical details and benchmark comparisons in this comprehensive overview.

June 14, 2025


Discover the power of Surfer H, a cutting-edge browser agent framework that offers unparalleled performance and cost-efficiency. Explore the open-source models and research behind this groundbreaking technology, and learn how you can leverage it to automate web tasks and streamline your workflow.

Introducing Surfer H: A Cost-Efficient Web Agent with Open-Source Models

Surfer H is a state-of-the-art browser use agent framework recently released by H Company. The company has not only open-sourced the core models powering this technology but also published a research paper detailing their approach.

Surfer H is a visual web retrieval agent designed to be easily trained through reinforcement learning techniques. It comprises three main modules: a policy, a localizer, and a validator. These modules are compatible with any Vision-Language Model (VLM) capable of proposing and evaluating actions.

The key innovation of Surfer H is its use of Holo1, a family of lightweight VLMs specialized in taking and evaluating actions and in localizing UI elements. Holo1 models determine the 2D coordinates of elements on a web page directly from a screenshot, without requiring access to the Document Object Model (DOM) or the accessibility tree.

Surfer H achieves state-of-the-art performance on the WebVoyager benchmark, reaching 92.2% accuracy while remaining cost-efficient. The open-source Holo1 models can be downloaded, extended, and fine-tuned by the community.

The research paper also introduces the WebClick evaluation dataset, which can be used to benchmark web agent performance. Together, Surfer H and Holo1 represent a significant advance in computer-use agents, enabling more efficient and accessible web automation.

Leveraging Holo1: The Vision-Language Models Behind Surfer H

Holo1 is a family of lightweight vision-language models (VLMs) that power the Surfer H web agent framework. These models specialize in taking and evaluating actions, as well as localizing UI elements on web pages. The key components of Surfer H that leverage Holo1 are:

  1. Policy: The policy module proposes a sequence of actions to be executed in the browser, such as navigating to a URL, scrolling, or clicking on a specific element.

  2. Localizer: The localizer provides the 2D coordinates of the UI elements that the policy needs to interact with, based on the screenshot of the web page.

  3. Validator: The validator generates feedback on the agent's proposed actions and determines whether the overall task has been successfully completed.

The Holo1 models are designed to balance accuracy and cost-efficiency. Benchmarks show that the Holo1 3B and 7B models outperform larger and more expensive VLMs on web navigation and information extraction tasks, achieving state-of-the-art performance on the WebVoyager benchmark.

Open-sourcing the Holo1 models lets developers integrate them into their own web automation projects, either by using the pre-trained models or by fine-tuning them for specific use cases. Their small size and efficiency make them particularly well-suited to cost-sensitive environments and resource-constrained devices.

How Surfer H Works: Integrating Policy, Localizer, and Validator

Surfer H, the cost-efficient web agent framework released by H Company, comprises three main modules: a policy, a localizer, and a validator. These modules work together to enable the agent to interact with websites and accomplish tasks effectively.

The policy proposes a sequence of actions to be executed, such as navigating to a website, scrolling, clicking on specific elements, and so on. These actions are designed to mimic human-like interactions with the web interface.

The localizer plays a crucial role in the process. When the policy generates an action that requires interacting with a specific element on a web page, the localizer provides the 2D coordinates of that element. This allows the agent to accurately target and interact with the desired UI components.
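One practical detail when acting on localizer output is coordinate space. The sketch below assumes (this is not stated in the source) that the screenshot sent to the model may have been resized, so a predicted point has to be mapped back to viewport pixels before clicking. The function name `to_viewport` and its parameters are hypothetical.

```python
from typing import Tuple

Point = Tuple[int, int]
Size = Tuple[int, int]  # (width, height)


def to_viewport(point: Point, screenshot_size: Size, viewport_size: Size) -> Point:
    """Map a point predicted on a (possibly resized) screenshot to viewport pixels.

    Hypothetical helper: assumes the screenshot covers the whole viewport.
    """
    x, y = point
    shot_w, shot_h = screenshot_size
    view_w, view_h = viewport_size
    if shot_w <= 0 or shot_h <= 0:
        raise ValueError("screenshot dimensions must be positive")
    # Scale each axis independently, then clamp so the click stays on the page.
    vx = min(max(round(x * view_w / shot_w), 0), view_w - 1)
    vy = min(max(round(y * view_h / shot_h), 0), view_h - 1)
    return vx, vy
```

If the screenshot and the viewport already share the same dimensions, the function returns the point unchanged (apart from clamping at the edges).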

The validator is responsible for evaluating the agent's progress and determining whether the task has been successfully completed. It generates feedback about the agent's actions and decides whether the final answer is suitable for the user. If the answer is valid, it is returned to the user. Otherwise, the feedback is incorporated into the agent's memory, and the execution continues until the task is completed or a time/cost budget is reached.

This integration of the policy, localizer, and validator modules enables Surfer H to navigate websites, interact with UI elements, and accomplish tasks robustly and efficiently. The open-source Holo1 models, which specialize in web navigation and information extraction, power these modules and deliver state-of-the-art performance on the WebVoyager benchmark.
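The control flow described in this section can be sketched roughly as follows. This is a minimal illustration, not Surfer H's actual code: the `Action` and `Verdict` types, the `run_agent` function, and the callback signatures are all assumptions, and the step limit stands in for the time/cost budget.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    kind: str         # e.g. "goto", "scroll", "click", or "answer" (hypothetical names)
    target: str = ""  # URL, element description, or the proposed answer text


@dataclass
class Verdict:
    ok: bool
    feedback: str = ""


def run_agent(
    task: str,
    policy: Callable[[str, List[str]], Action],
    localizer: Callable[[str], Tuple[int, int]],
    validator: Callable[[str, str, List[str]], Verdict],
    execute: Callable[[Action, Optional[Tuple[int, int]]], None],
    max_steps: int = 10,
) -> Optional[str]:
    """Run the policy/localizer/validator loop until an answer is accepted
    or the step budget runs out (returns None in that case)."""
    memory: List[str] = []
    for _ in range(max_steps):
        action = policy(task, memory)
        if action.kind == "answer":
            verdict = validator(task, action.target, memory)
            if verdict.ok:
                return action.target
            # A rejected answer becomes feedback for the next policy call.
            memory.append(f"validator feedback: {verdict.feedback}")
            continue
        # Only actions that target a UI element need coordinates.
        coords = localizer(action.target) if action.kind == "click" else None
        execute(action, coords)
        memory.append(f"executed {action.kind}: {action.target}")
    return None
```

In a real agent the localizer would also receive the current screenshot, and `execute` would drive a browser; both are left as plain callbacks here so the loop itself stays easy to read.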

Benchmarking Surfer H: Accuracy and Cost-Efficiency Comparisons

The research paper from H Company reports the performance of the Surfer H agent when powered by the Holo1 family of vision-language models. The benchmarks highlight both its accuracy and its cost-efficiency.

When powered by the Holo1 models, Surfer H achieves 92.2% accuracy on the WebVoyager benchmark, outperforming other leading models while striking a near-optimal balance between accuracy and cost-efficiency.

The paper further compares click accuracy across several models, including Qwen2.5-VL 3B and the Holo1 series. The Holo1 models, in both the 3B and 7B versions, consistently outperform the competition, showing stronger localization capabilities.
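For readers unfamiliar with the metric, click accuracy on localization benchmarks is commonly scored by checking whether the predicted point falls inside the target element's bounding box. The sketch below assumes that convention; the source does not spell out the paper's exact scoring rule, and the function name and data layout are illustrative.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def click_accuracy(predictions: Sequence[Point], boxes: Sequence[Box]) -> float:
    """Fraction of predicted clicks that land inside their ground-truth box."""
    if len(predictions) != len(boxes):
        raise ValueError("need exactly one prediction per box")
    if not boxes:
        raise ValueError("cannot score an empty set")
    hits = sum(
        1
        for (x, y), (x0, y0, x1, y1) in zip(predictions, boxes)
        if x0 <= x <= x1 and y0 <= y <= y1
    )
    return hits / len(boxes)
```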

In terms of cost-efficiency, the Holo1 models stand out. The paper plots the average cost per run, and the Holo1 models sit on the left side of the graph, indicating lower costs. On cost per million input and output tokens, the Holo1 series again shows a significant advantage over the other models.
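The relationship between per-million-token prices and the cost of a single run is simple arithmetic, sketched below. The function name is hypothetical, and any prices or token counts you pass in are your own inputs; none of the paper's figures are reproduced here.

```python
def run_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Cost in USD of one agent run, given token usage and per-million-token prices."""
    return (
        input_tokens / 1_000_000 * usd_per_million_input
        + output_tokens / 1_000_000 * usd_per_million_output
    )
```

Because a browser agent sends a fresh screenshot and its history at every step, input tokens usually dominate, so the input price tends to matter more than the output price when comparing models.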

The researchers highlight that the fully Holo1-based Surfer H agent offers the strongest trade-off between accuracy and cost-efficiency. Specifically, the Surfer H + Holo1 7B configuration achieves 92.2% accuracy at only 13 cents per task, outperforming the Surfer H + GPT-4 and Surfer H + GPT-4.1 mini alternatives.

By relying on the lightweight Holo1 models, Surfer H delivers state-of-the-art performance at a low cost per task, which makes it attractive for a wide range of web-based applications.

Automating Tasks with Surfer H: Hands-On Demonstration

H Company has just released a state-of-the-art browser use agent framework, Surfer H. Not only have they open-sourced the core models that power it, but they've also published a research paper detailing how they achieved such great performance.

Surfer H is a browser-based agent that allows you to automate various web-related tasks. You can simply provide it with a task, and it will browse the web, execute the necessary actions, and deliver the desired outcome. The agent is powered by a policy, a localizer, and a validator, which work together to navigate the web and complete the given task.

The open-source models released by H Company, known as Holo1, include a navigation model and a localization model. These models can power your own web automation projects, and you can fine-tune them to suit your specific needs.

The research paper reports that Surfer H achieves state-of-the-art 92.2% accuracy on the WebVoyager benchmark while maintaining a favorable balance between accuracy and cost-efficiency.

To demonstrate the capabilities of Surfer H, let's kick off an agent and watch it in action. I'll have the agent search for Pokémon cards on eBay and create a Google Sheet with the results. You'll see the agent navigate the web, execute the necessary actions, and deliver the final output.

Additionally, H Company has announced the release of Tester H, a tool for automating QA and testing of websites and applications. This tool allows you to define tests using a simple, human-readable syntax, and then have the agent execute those tests automatically.

Overall, the launch of Surfer H and the open-sourcing of the Holo1 models represent a significant advance in web automation. I encourage you to explore these tools and the accompanying research paper to see how you can use them in your own projects.

Surfer H's Advanced Features: Human Involvement, Integrations, and Upcoming Payments

Surfer H offers several advanced features that enhance its capabilities and flexibility:

  1. Human Involvement Levels: Surfer H allows you to select the desired level of human involvement in the task execution process. You can choose from "Highly Involved", "Moderately Involved", and "Fully Automated" modes, giving you control over the balance between human oversight and autonomous agent operation.

  2. Integrations: Surfer H provides built-in integrations with popular services and platforms, including Google Sheets, Google Docs, Google Drive, Notion, Slack, and Zapier. This allows you to seamlessly connect Surfer H with the tools you already use, simplifying the process of authenticating and accessing these services.

  3. Upcoming Payments: Surfer H is set to introduce a payments feature in the near future. This will enable agents to make payments on your behalf, further expanding the range of tasks they can handle autonomously. By adding your payment credentials, you can empower Surfer H to execute transactions as part of its web-based operations.

  4. Tester H: In addition to Surfer H, the company has also announced Tester H, a tool for automating QA and testing of websites and applications. Tester H allows you to define test cases using a simple, human-readable syntax, and then execute these tests automatically, ensuring the reliability and functionality of your digital products.

Overall, Surfer H's advanced features, integrations, and upcoming payment capabilities demonstrate the platform's commitment to providing a comprehensive and flexible solution for web-based task automation and agent-driven interactions.

Tester H: Automated QA and Website Testing

Tester H is a new tool from H Company that allows you to automate the testing of your websites, apps, and other digital products. It is currently in private beta, but you can apply for access.

Tester H works by allowing you to define test cases using a simple, human-readable syntax. For example, you can write a test case like this:

Given navigate to Airbnb on the specific page
When scroll laterally
And click on the first photo with an orange bed
Then the image of the orange bed is displayed

Once you've defined your test cases, you can simply hit "Go" and Tester H will automatically execute the tests, simulating user interactions with your product. This allows you to quickly and easily identify and fix any issues or bugs that may be present.
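To make the structure of such a test case concrete, here is a small sketch that splits Given/When/And/Then text into ordered steps. It is not Tester H's parser, which is not public; `parse_steps` and the decision to fold `And` into the preceding keyword are assumptions modeled on common Gherkin-style tools.

```python
from typing import List, Tuple

KEYWORDS = ("Given", "When", "Then", "And")


def parse_steps(text: str) -> List[Tuple[str, str]]:
    """Parse Given/When/And/Then lines into (keyword, instruction) pairs.

    "And" inherits the keyword of the step before it.
    """
    steps: List[Tuple[str, str]] = []
    last_keyword = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        if keyword not in KEYWORDS or not rest:
            raise ValueError(f"unrecognized step: {line!r}")
        if keyword == "And":
            if last_keyword is None:
                raise ValueError("'And' cannot be the first step")
            keyword = last_keyword
        else:
            last_keyword = keyword
        steps.append((keyword, rest))
    return steps
```

Applied to the Airbnb example above, this yields four steps: one `Given`, two `When` (the `And` line inherits `When`), and one `Then`, which an agent could then execute in order and check against the final assertion.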

One of the key benefits of Tester H is its ability to integrate with a wide range of services and platforms, including Google Sheets, Google Docs, Notion, Slack, and Zapier. This makes it easy to seamlessly incorporate Tester H into your existing workflows and processes.

Additionally, Tester H is designed to be cost-effective, with plans that offer a balance of accuracy and efficiency. This makes it a valuable tool for teams of all sizes, from small startups to large enterprises.

Overall, Tester H is a powerful and innovative tool that can help you streamline your QA and testing processes, saving you time and resources while ensuring the quality and reliability of your digital products.
