Docs: Added a Browser component reference. #15660
---
sidebar_position: 28
slug: /browser_component
sidebar_custom_props: {
  categoryIcon: LucideSwatchBook
}
---

# Browser component

A component that autonomously navigates web pages and extracts specific content using standard formats or custom selectors.

---

A **Browser** component usually sits downstream of an input component or a search API component that supplies URLs, and upstream of an LLM component that summarizes or analyzes the extracted content.

## Scenarios

Use a **Browser** component when your agent workflow needs to fetch real-time data from external websites, scrape specific dynamic elements, or navigate web pages before passing the extracted content to an LLM.

## Configurations

### Action type

Select the primary operation the browser will perform. For example, selecting **Get page content** instructs the component to retrieve the text and structure of the target web page.

### URL input

*Mandatory*

Specify the web address the browser should navigate to. This field accepts dynamic variables from upstream components. For example, you can reference a variable such as `{{search_results_url}}` produced by an upstream search component.

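A minimal sketch of how a `{{variable}}` reference in **URL input** might be resolved against upstream outputs before navigation. Only the `{{name}}` syntax comes from the example above; the `resolve_url` helper and the `upstream_outputs` dictionary are hypothetical and do not describe how the workflow engine actually stores or resolves variables.

```python
import re

# Matches `{{name}}`, optionally padded with spaces. Hypothetical syntax model:
# the real engine may allow component-qualified names or other forms.
_VAR_PATTERN = re.compile(r"\{\{\s*([^{}\s]+)\s*\}\}")


def resolve_url(template: str, upstream_outputs: dict[str, str]) -> str:
    """Replace every `{{name}}` in `template` with the matching upstream output.

    Raises KeyError for a variable that no upstream component produced, so a
    misconfigured URL fails loudly instead of navigating to a literal "{{...}}".
    """

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in upstream_outputs:
            raise KeyError(f"undefined variable: {name}")
        return upstream_outputs[name]

    return _VAR_PATTERN.sub(substitute, template)


# Hypothetical output of an upstream search component.
outputs = {"search_results_url": "https://example.com/results?q=laptops"}
print(resolve_url("{{search_results_url}}", outputs))
# https://example.com/results?q=laptops
```

Failing on an undefined variable, rather than leaving the placeholder in place, keeps a broken upstream connection from turning into a confusing navigation error later in the run.
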
### Selectors

Define the specific data points to extract from the loaded web page. You can add multiple selectors to extract different pieces of information at once.

- **HTML**: Extracts broad page elements, such as the full HTML of the document.
- **CSS selector**: Pinpoints exact data using standard CSS selectors (e.g., `.product-title` or `.product-price`). This is useful for targeting specific text blocks, tables, or item attributes within a structured web page.

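The **CSS selector** option can be illustrated with Python's standard-library `html.parser`. This is a deliberately reduced sketch, not the component's extraction engine: it supports only a single class selector such as `.product-title`, assumes well-formed markup in which every non-void element is closed, and returns whitespace-normalized text. `ClassSelector` and `select_text` are hypothetical names; a production scraper would rely on a full CSS selector engine.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect depth tracking.
_VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
         "link", "meta", "source", "track", "wbr"}


class ClassSelector(HTMLParser):
    """Collect the text of every element matching a single `.class` selector."""

    def __init__(self, selector: str):
        super().__init__()
        if not selector.startswith(".") or " " in selector:
            raise ValueError("only single '.class' selectors are supported")
        self.class_name = selector[1:]
        self.matches: list[str] = []
        self._depth = 0  # nesting depth inside the current match (0 = outside)
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in _VOID:
            return
        if self._depth:  # already inside a match: just track nesting
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in _VOID or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:  # the matched element just closed
            self.matches.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def select_text(html: str, selector: str) -> list[str]:
    parser = ClassSelector(selector)
    parser.feed(html)
    parser.close()
    return parser.matches


page = """
<ul>
  <li><span class="product-title">Widget <b>Pro</b></span>
      <span class="product-price">$19.99</span></li>
  <li><span class="product-title">Gadget</span>
      <span class="product-price">$5.00</span></li>
</ul>
"""
print(select_text(page, ".product-title"))  # ['Widget Pro', 'Gadget']
print(select_text(page, ".product-price"))  # ['$19.99', '$5.00']
```

Each selector yields one list of matches, which is why several selectors can run against the same page independently.
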
### JavaScript enabled

Determines whether the browser executes JavaScript on the target page. Set this to **True** when scraping modern websites whose content is rendered dynamically after the initial page load.

### Wait for

Configure delay conditions so that the page has fully loaded before the component attempts to extract data.

- **Wait for element**: Pauses extraction until a specific CSS class or ID (e.g., `results-container`) becomes visible on the page.
- **Input (seconds)**: Imposes a fixed delay (e.g., `5` seconds) to give the browser time to load all assets and execute background scripts.

||
| ### Output variables | ||
|
|
||
| The **Browser** component provides three distinct outputs that can be referenced by downstream components in the workflow: | ||
|
|
||
| - **Full text**: The raw text content extracted from the page or the targeted selectors. | ||
| - **JSON data**: A structured representation of the extracted elements. | ||
| - **Error code**: Captures and outputs any HTTP errors or navigation failures encountered during the process. | ||
|
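A hypothetical container for the three outputs listed above. The field names mirror this page's wording, not the actual `agent/component/browser.py` implementation, and the rule that an HTTP status of 400 or above becomes the error code is an assumption made for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class BrowserResult:
    """Hypothetical container for the three documented outputs."""

    full_text: str = ""                                      # Full text
    json_data: dict[str, Any] = field(default_factory=dict)  # JSON data
    error_code: Optional[int] = None                         # Error code; None on success

    @property
    def ok(self) -> bool:
        return self.error_code is None


def build_result(extracted: dict[str, list[str]], status: int) -> BrowserResult:
    """Assemble outputs from per-selector matches and the page's HTTP status.

    Assumption: any status >= 400 is reported as the error code and no content
    is returned; navigation failures without a status are not modeled here.
    """
    if status >= 400:
        return BrowserResult(error_code=status)
    full_text = "\n".join(text for texts in extracted.values() for text in texts)
    return BrowserResult(full_text=full_text, json_data=dict(extracted))


# Hypothetical per-selector matches for `.product-title`.
result = build_result({".product-title": ["Widget Pro", "Gadget"]}, status=200)
print(result.full_text)                          # one match per line
print(json.dumps(result.json_data))              # {".product-title": ["Widget Pro", "Gadget"]}
print(build_result({}, status=404).error_code)   # 404
```

Keeping the error code separate from the content lets a downstream LLM component consume **Full text** directly while a condition check routes on **Error code**.
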
Comment on lines +50 to +56

**Contributor:** Output variable names/types are inconsistent with implementation. The docs list …

Documented configuration fields do not match the actual Browser component contract.

This section describes `Action type`, mandatory `URL input`, `Selectors`, `JavaScript enabled`, and `Wait for`, but the component input contract in `agent/component/browser.py` exposes `prompts` and `upload_sources` via `BrowserParam.get_input_form()` (line range 46-80), with runtime params like `max_steps`, `headless`, `enable_default_extensions`, `chromium_sandbox`, and `persist_session`. Please align this page to the implemented parameters to avoid invalid workflow setup guidance.