Docs: Added a Browser component reference. #15660
📝 Walkthrough
A new Browser component guide is added to the agent component reference documentation. The page describes the component's role in navigating web pages and extracting content, with complete configuration details including URL input, selector strategies, JavaScript execution control, and load-wait behavior options, plus documented output variables.
Changes: Browser Component Guide
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/guides/agent/agent_component_reference/browser.md`:
- Around line 22-49: The docs section lists fields that do not match the Browser
component contract; update the browser.md content to reflect the actual inputs
and runtime params exposed by BrowserParam.get_input_form() in
agent/component/browser.py: document the input fields prompts and upload_sources
(including their shapes/usage), and document runtime parameters max_steps,
headless, enable_default_extensions, chromium_sandbox, and persist_session
(their types, defaults, and intended behavior) instead of the obsolete Action
type/URL/Selectors/JavaScript/Wait-for items so the guide matches the
implemented component contract.
- Around line 50-56: Docs incorrectly list output variables; update the Browser
component docs to match the actual emitted keys and semantics used by the
implementation: replace "Full text/JSON data/Error code" with the real outputs
`content` (string with the extracted page/selectors text), `downloaded_files`
(array/list of downloaded file metadata), and `_ERROR` (exception string written
on failure, not an HTTP status code); reference the Browser component and its
_invoke method to ensure descriptions match how `content`, `downloaded_files`,
and `_ERROR` are produced and when each is present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5346ccc8-52eb-46df-9a74-9a0acbfa5c03
📒 Files selected for processing (1)
docs/guides/agent/agent_component_reference/browser.md
### Action type

Select the primary operation the browser will perform. For example, selecting **Get page content** instructs the component to retrieve the text and structure of the target webpage.

### URL input

*Mandatory*

Specify the web address the browser should navigate to. This field accepts dynamic variables from upstream components. For instance, you can reference a variable like `{{search_results_url}}` generated by a previous search component.

### Selectors

Define the specific data points to extract from the loaded web page. You can add multiple selectors to extract different pieces of information simultaneously.

- **HTML**: Allows you to extract broad page elements, such as the full HTML of the document.
- **CSS selector**: Allows you to pinpoint exact data using standard CSS selectors (e.g., `.product-title` or `.product-price`). This is highly useful for targeting specific text blocks, tables, or item attributes within a structured webpage.

### JavaScript enabled

A toggle to determine if the browser should execute JavaScript on the target page. Setting this to **True** is highly recommended for scraping modern websites where content is dynamically rendered after the initial page load.

### Wait for

Configure delay conditions to ensure the page has fully loaded before the component attempts to extract data.

- **Wait for element**: Pauses the extraction until a specific CSS class or ID (e.g., `results-container`) becomes visible on the page.
- **Input (seconds)**: Imposes a strict time delay (e.g., `5 seconds`) to give the browser ample time to load all assets and execute background scripts.
Documented configuration fields do not match the actual Browser component contract.
This section describes Action type, mandatory URL input, Selectors, JavaScript enabled, and Wait for, but the component input contract in `agent/component/browser.py` exposes `prompts` and `upload_sources` via `BrowserParam.get_input_form()` (line range 46-80), with runtime params like `max_steps`, `headless`, `enable_default_extensions`, `chromium_sandbox`, and `persist_session`. Please align this page to the implemented parameters to avoid invalid workflow setup guidance.
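One way to catch this kind of drift early is a small check that the reference page mentions every parameter name. The sketch below is illustrative only: the name lists are copied from this comment (whether they are complete is an assumption, since `browser.py` is not shown here), and `missing_names` is a hypothetical helper, not part of the repository.

```python
import re

# Names taken from the review comment; completeness is an assumption.
EXPECTED_INPUTS = ["prompts", "upload_sources"]
EXPECTED_PARAMS = [
    "max_steps",
    "headless",
    "enable_default_extensions",
    "chromium_sandbox",
    "persist_session",
]


def missing_names(doc_text: str, names: list[str]) -> list[str]:
    """Return the names that never appear as a whole word in doc_text."""
    missing = []
    for name in names:
        # \b keeps "headless" from matching inside a longer identifier.
        if not re.search(rf"\b{re.escape(name)}\b", doc_text):
            missing.append(name)
    return missing
```

Running `missing_names` over the text of `browser.md` with each list would report which documented fields are still absent; an empty list for both means the page at least names every field the review calls out.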
### Output variables

The **Browser** component provides three distinct outputs that can be referenced by downstream components in the workflow:

- **Full text**: The raw text content extracted from the page or the targeted selectors.
- **JSON data**: A structured representation of the extracted elements.
- **Error code**: Captures and outputs any HTTP errors or navigation failures encountered during the process.

No newline at end of file
Output variable names/types are inconsistent with implementation.
The docs list Full text, JSON data, and Error code, but the component actually emits `content` and `downloaded_files` (see `agent/component/browser.py:46-80` and `_invoke` at 665-713), and writes errors to `_ERROR` as an exception string (not an HTTP error code). Please update the documented output variables to the real keys and semantics.
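To make the intended semantics concrete, here is a minimal sketch of how a downstream consumer might read these three keys. It assumes the outputs arrive as a plain mapping, that `content` is a string, and that `downloaded_files` is a list; `summarize_browser_outputs` is a hypothetical helper for illustration, not an API of the Browser component.

```python
from typing import Any


def summarize_browser_outputs(outputs: dict[str, Any]) -> str:
    """Hypothetical helper: turn a Browser output mapping into a one-line summary."""
    error = outputs.get("_ERROR")
    if error:
        # Per the review, _ERROR holds an exception string, not an HTTP status code.
        return f"failed: {error}"
    # Treat missing or None values as empty (an assumption about the contract).
    content = outputs.get("content") or ""
    files = outputs.get("downloaded_files") or []
    return f"ok: {len(content)} chars, {len(files)} downloaded file(s)"
```

The point of checking `_ERROR` first is that a consumer should treat it as free-form failure text rather than parsing it as a numeric code, which is what the current "Error code" wording would suggest.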
What problem does this PR solve?
Added a Browser component reference.
Type of change