Docs: Added a Browser component reference. by writinwaters · Pull Request #15660 · infiniflow/ragflow

writinwaters · 2026-06-04T13:29:59Z

What problem does this PR solve?

Added a Browser component reference.

Type of change

Documentation Update

coderabbitai · 2026-06-04T13:34:49Z

📝 Walkthrough

A new Browser component guide is added to the agent component reference documentation. The page describes the component's role in navigating web pages and extracting content, with complete configuration details including URL input, selector strategies, JavaScript execution control, and load-wait behavior options, plus documented output variables.

Changes

Browser Component Guide

| Layer / File(s) | Summary |
|---|---|
| Browser component configuration guide `docs/guides/agent/agent_component_reference/browser.md` | New page documents the Browser component's purpose, configuration options (URL input, HTML/CSS selectors, JavaScript toggle, wait conditions), and output variables (full text, JSON data, error code). |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 A browser guide hops into view,
Selectors and URLs, all shiny and new!
Web pages unwrap, their secrets laid bare,
Documentation complete—with methodical care! 📄✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly identifies the change as adding a Browser component reference documentation, which matches the actual documentation addition. |
| Description check | ✅ Passed | The description includes both required sections from the template: problem statement and type of change (Documentation Update), though the problem description is minimal. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/guides/agent/agent_component_reference/browser.md`:
- Around line 22-49: The docs section lists fields that do not match the Browser
component contract; update the browser.md content to reflect the actual inputs
and runtime params exposed by BrowserParam.get_input_form() in
agent/component/browser.py: document the input fields prompts and upload_sources
(including their shapes/usage), and document runtime parameters max_steps,
headless, enable_default_extensions, chromium_sandbox, and persist_session
(their types, defaults, and intended behavior) instead of the obsolete Action
type/URL/Selectors/JavaScript/Wait-for items so the guide matches the
implemented component contract.
- Around line 50-56: Docs incorrectly list output variables; update the Browser
component docs to match the actual emitted keys and semantics used by the
implementation: replace "Full text/JSON data/Error code" with the real outputs
`content` (string with the extracted page/selectors text), `downloaded_files`
(array/list of downloaded file metadata), and `_ERROR` (exception string written
on failure, not an HTTP status code); reference the Browser component and its
_invoke method to ensure descriptions match how `content`, `downloaded_files`,
and `_ERROR` are produced and when each is present.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5346ccc8-52eb-46df-9a74-9a0acbfa5c03

📥 Commits

Reviewing files that changed from the base of the PR and between 98f2a2e and d212774.

📒 Files selected for processing (1)

docs/guides/agent/agent_component_reference/browser.md

coderabbitai · 2026-06-04T13:41:43Z

+### Action type
+
+Select the primary operation the browser will perform. For example, selecting **Get page content** instructs the component to retrieve the text and structure of the target webpage.
+
+### URL input
+
+*Mandatory*
+
+Specify the web address the browser should navigate to. This field accepts dynamic variables from upstream components. For instance, you can reference a variable like `{{search_results_url}}` generated by a previous search component.
+
+### Selectors
+
+Define the specific data points to extract from the loaded web page. You can add multiple selectors to extract different pieces of information simultaneously.
+
+- **HTML**: Allows you to extract broad page elements, such as the full HTML of the document.
+- **CSS selector**: Allows you to pinpoint exact data using standard CSS selectors (e.g., `.product-title` or `.product-price`). This is highly useful for targeting specific text blocks, tables, or item attributes within a structured webpage.
+
+### JavaScript enabled
+
+A toggle to determine if the browser should execute JavaScript on the target page. Setting this to **True** is highly recommended for scraping modern websites where content is dynamically rendered after the initial page load.
+
+### Wait for
+
+Configure delay conditions to ensure the page has fully loaded before the component attempts to extract data.
+
+- **Wait for element**: Pauses the extraction until a specific CSS class or ID (e.g., `results-container`) becomes visible on the page.
+- **Input (seconds)**: Imposes a strict time delay (e.g., `5 seconds`) to give the browser ample time to load all assets and execute background scripts.
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Documented configuration fields do not match the actual Browser component contract.

This section describes Action type, mandatory URL input, Selectors, JavaScript enabled, and Wait for, but the component input contract in agent/component/browser.py exposes prompts and upload_sources via BrowserParam.get_input_form() (line range 46-80), with runtime params like max_steps, headless, enable_default_extensions, chromium_sandbox, and persist_session. Please align this page to the implemented parameters to avoid invalid workflow setup guidance.
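To make the mismatch concrete, here is a minimal sketch of the parameter set this comment names. The field names come from the comment itself; every type and default below is an assumption for illustration, not a value read from `agent/component/browser.py`. `BrowserParamSketch` and `unknown_keys` are hypothetical helpers that do not exist in the repository.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class BrowserParamSketch:
    """Hypothetical stand-in for the parameters named in the review.

    Field names are taken from the review comment; all types and
    defaults are assumptions, not the component's real values.
    """

    # Input fields reported for BrowserParam.get_input_form()
    prompts: str = ""  # assumed: free-text instructions
    upload_sources: list[Any] = field(default_factory=list)  # assumed: a list

    # Runtime parameters named in the review (types and defaults assumed)
    max_steps: int = 10
    headless: bool = True
    enable_default_extensions: bool = False
    chromium_sandbox: bool = True
    persist_session: bool = False


def unknown_keys(config: dict[str, Any]) -> list[str]:
    """Return the config keys that are not fields of the sketch above."""
    known = {f.name for f in fields(BrowserParamSketch)}
    return sorted(set(config) - known)
```

Under these assumptions, a workflow configured from the current page (for example with a `url` key) would be flagged, which is the kind of invalid setup the comment warns about.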

coderabbitai · 2026-06-04T13:41:43Z

+### Output variables
+
+The **Browser** component provides three distinct outputs that can be referenced by downstream components in the workflow:
+
+- **Full text**: The raw text content extracted from the page or the targeted selectors.
+- **JSON data**: A structured representation of the extracted elements.
+- **Error code**: Captures and outputs any HTTP errors or navigation failures encountered during the process.


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Output variable names/types are inconsistent with implementation.

The docs list Full text, JSON data, and Error code, but the component actually emits content and downloaded_files (see agent/component/browser.py:46-80 and _invoke at 665-713), and writes errors to _ERROR as an exception string (not an HTTP error code). Please update the documented output variables to the real keys and semantics.
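As a rough illustration of the semantics described here, a downstream consumer might treat the three keys as follows. The key names `content`, `downloaded_files`, and `_ERROR` come from this comment; the plain-dict shape, the per-file metadata format, and the function `summarize_browser_outputs` are assumptions made for the sketch.

```python
from typing import Any


def summarize_browser_outputs(outputs: dict[str, Any]) -> str:
    """Summarize one Browser run from its output mapping.

    Assumes `outputs` is a plain dict; the real component may expose
    its outputs differently.
    """
    error = outputs.get("_ERROR")
    if error:
        # _ERROR is described as an exception string, not an HTTP status code.
        return f"failed: {error}"
    content = outputs.get("content") or ""
    files = outputs.get("downloaded_files") or []
    return f"ok: {len(content)} chars, {len(files)} file(s)"
```

The point of the sketch is that `_ERROR` has to be handled as free-form text, so a guide that calls it an "error code" would steer readers toward numeric comparisons that cannot work.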

Docs: Added a Browser component reference.

d212774

dosubot Bot added the `size:M` label (this PR changes 30-99 lines, ignoring generated files) on Jun 4, 2026

writinwaters marked this pull request as draft June 4, 2026 13:30

coderabbitai Bot reviewed Jun 4, 2026

writinwaters wants to merge 1 commit into infiniflow:main from writinwaters:browser_component
