Docs: Added a Browser component reference. #15660

Draft
writinwaters wants to merge 1 commit into infiniflow:main from writinwaters:browser_component

Conversation

@writinwaters
Contributor

What problem does this PR solve?

Added a Browser component reference.

Type of change

  • Documentation Update

@dosubot dosubot Bot added the size:M label (This PR changes 30-99 lines, ignoring generated files.) Jun 4, 2026
@writinwaters writinwaters marked this pull request as draft June 4, 2026 13:30
@coderabbitai
Contributor

coderabbitai Bot commented Jun 4, 2026

📝 Walkthrough

A new Browser component guide is added to the agent component reference documentation. The page describes the component's role in navigating web pages and extracting content, with complete configuration details including URL input, selector strategies, JavaScript execution control, and load-wait behavior options, plus documented output variables.

Changes

Browser Component Guide

| Layer / File(s) | Summary |
| --- | --- |
| Browser component configuration guide (`docs/guides/agent/agent_component_reference/browser.md`) | New page documents the Browser component's purpose, configuration options (URL input, HTML/CSS selectors, JavaScript toggle, wait conditions), and output variables (full text, JSON data, error code). |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 A browser guide hops into view,
Selectors and URLs, all shiny and new!
Web pages unwrap, their secrets laid bare,
Documentation complete—with methodical care! 📄✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies the change as adding a Browser component reference documentation, which matches the actual documentation addition. |
| Description check | ✅ Passed | The description includes both required sections from the template: problem statement and type of change (Documentation Update), though the problem description is minimal. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/guides/agent/agent_component_reference/browser.md`:
- Around line 22-49: The docs section lists fields that do not match the Browser
component contract; update the browser.md content to reflect the actual inputs
and runtime params exposed by BrowserParam.get_input_form() in
agent/component/browser.py: document the input fields prompts and upload_sources
(including their shapes/usage), and document runtime parameters max_steps,
headless, enable_default_extensions, chromium_sandbox, and persist_session
(their types, defaults, and intended behavior) instead of the obsolete Action
type/URL/Selectors/JavaScript/Wait-for items so the guide matches the
implemented component contract.
- Around line 50-56: Docs incorrectly list output variables; update the Browser
component docs to match the actual emitted keys and semantics used by the
implementation: replace "Full text/JSON data/Error code" with the real outputs
`content` (string with the extracted page/selectors text), `downloaded_files`
(array/list of downloaded file metadata), and `_ERROR` (exception string written
on failure, not an HTTP status code); reference the Browser component and its
_invoke method to ensure descriptions match how `content`, `downloaded_files`,
and `_ERROR` are produced and when each is present.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5346ccc8-52eb-46df-9a74-9a0acbfa5c03

📥 Commits

Reviewing files that changed from the base of the PR and between 98f2a2e and d212774.

📒 Files selected for processing (1)
  • docs/guides/agent/agent_component_reference/browser.md

Comment on lines +22 to +49
### Action type

Select the primary operation the browser will perform. For example, selecting **Get page content** instructs the component to retrieve the text and structure of the target webpage.

### URL input

*Mandatory*

Specify the web address the browser should navigate to. This field accepts dynamic variables from upstream components. For instance, you can reference a variable like `{{search_results_url}}` generated by a previous search component.

### Selectors

Define the specific data points to extract from the loaded web page. You can add multiple selectors to extract different pieces of information simultaneously.

- **HTML**: Allows you to extract broad page elements, such as the full HTML of the document.
- **CSS selector**: Allows you to pinpoint exact data using standard CSS selectors (e.g., `.product-title` or `.product-price`). This is highly useful for targeting specific text blocks, tables, or item attributes within a structured webpage.

### JavaScript enabled

A toggle to determine if the browser should execute JavaScript on the target page. Setting this to **True** is highly recommended for scraping modern websites where content is dynamically rendered after the initial page load.

### Wait for

Configure delay conditions to ensure the page has fully loaded before the component attempts to extract data.

- **Wait for element**: Pauses the extraction until a specific CSS class or ID (e.g., `results-container`) becomes visible on the page.
- **Input (seconds)**: Imposes a strict time delay (e.g., `5 seconds`) to give the browser ample time to load all assets and execute background scripts.

Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Documented configuration fields do not match the actual Browser component contract.

This section describes Action type, mandatory URL input, Selectors, JavaScript enabled, and Wait for, but the component input contract in agent/component/browser.py exposes prompts and upload_sources via BrowserParam.get_input_form() (line range 46-80), with runtime params like max_steps, headless, enable_default_extensions, chromium_sandbox, and persist_session. Please align this page to the implemented parameters to avoid invalid workflow setup guidance.
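
To make the mismatch concrete, here is a rough sketch of the parameter surface the comment describes. `BrowserConfigSketch` and `build_config` are hypothetical stand-ins, not the real `BrowserParam`: only the field names come from this review, while every type and default below is a placeholder assumption that should be checked against `agent/component/browser.py`.

```python
from dataclasses import dataclass, field, fields


@dataclass
class BrowserConfigSketch:
    """Hypothetical stand-in for BrowserParam.

    Field names come from the review comment; the types and defaults are
    placeholders, not the component's actual values.
    """
    # Input fields named in the review comment (shapes assumed).
    prompts: str = ""
    upload_sources: list[str] = field(default_factory=list)
    # Runtime parameters named in the review comment (defaults assumed).
    max_steps: int = 10
    headless: bool = True
    enable_default_extensions: bool = False
    chromium_sandbox: bool = True
    persist_session: bool = False


def build_config(raw: dict) -> BrowserConfigSketch:
    """Build a config, rejecting keys outside the sketch such as 'url' or 'selectors'."""
    known = {f.name for f in fields(BrowserConfigSketch)}
    unknown = sorted(set(raw) - known)
    if unknown:
        raise ValueError(f"unknown Browser parameters: {unknown}")
    return BrowserConfigSketch(**raw)
```

Under these assumptions, a workflow written from the current page, for example `build_config({"url": "https://example.com"})`, fails with a `ValueError`, which is the kind of invalid setup the comment warns about.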

Comment on lines +50 to +56
### Output variables

The **Browser** component provides three distinct outputs that can be referenced by downstream components in the workflow:

- **Full text**: The raw text content extracted from the page or the targeted selectors.
- **JSON data**: A structured representation of the extracted elements.
- **Error code**: Captures and outputs any HTTP errors or navigation failures encountered during the process.

No newline at end of file
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Output variable names/types are inconsistent with implementation.

The docs list Full text, JSON data, and Error code, but the component actually emits content and downloaded_files (see agent/component/browser.py:46-80 and _invoke at 665-713), and writes errors to _ERROR as an exception string (not an HTTP error code). Please update the documented output variables to the real keys and semantics.
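
A short sketch of how a downstream step might consume these outputs may help when rewriting the section. It assumes the outputs arrive as a plain dict keyed by `content`, `downloaded_files`, and `_ERROR`, as this comment describes; `summarize_outputs` is a hypothetical helper, and the exact value shapes should be confirmed against `_invoke`.

```python
def summarize_outputs(outputs: dict) -> str:
    """Branch on the three output keys named in the review; value shapes are assumed."""
    error = outputs.get("_ERROR")
    if error:
        # _ERROR holds an exception string, so report it verbatim rather
        # than parsing it as an HTTP status code.
        return f"browser step failed: {error}"
    content = outputs.get("content") or ""
    files = outputs.get("downloaded_files") or []
    return f"{len(content)} chars of content, {len(files)} downloaded file(s)"
```

The point for the docs is the error branch: a consumer that expects a numeric error code, as the current page implies, would mishandle the exception string.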
