Add Human Pages integration for human fallback #27
Open: human-pages-ai wants to merge 3 commits into tinyfish-io:main from human-pages-ai:add-humanpages-integration
Changes from 2 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| __pycache__/ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| MIT License | ||
|
|
||
| Copyright (c) 2025 TinyFish, Inc. | ||
|
|
||
| Permission is hereby granted, free of charge, to any person obtaining a copy | ||
| of this software and associated documentation files (the "Software"), to deal | ||
| in the Software without restriction, including without limitation the rights | ||
| to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
| copies of the Software, and to permit persons to whom the Software is | ||
| furnished to do so, subject to the following conditions: | ||
|
|
||
| The above copyright notice and this permission notice shall be included in all | ||
| copies or substantial portions of the Software. | ||
|
|
||
| THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
| IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
| FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
| AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
| LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
| OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
| SOFTWARE. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,64 @@ | ||
| .PHONY: all format lint test tests integration_tests docker_tests help extended_tests | ||
|
|
||
| # Default target executed when no arguments are given to make. | ||
| all: help | ||
|
|
||
| # Define a variable for the test file path. | ||
| TEST_FILE ?= tests/unit_tests/ | ||
| integration_test integration_tests: TEST_FILE = tests/integration_tests/ | ||
|
|
||
|
|
||
| # unit tests are run with the --disable-socket flag to prevent network calls | ||
| test tests: | ||
| poetry run pytest --disable-socket --allow-unix-socket $(TEST_FILE) | ||
|
|
||
| test_watch: | ||
| poetry run ptw --snapshot-update --now . -- -vv $(TEST_FILE) | ||
|
|
||
| # integration tests are run without the --disable-socket flag to allow network calls | ||
| integration_test integration_tests: | ||
| poetry run pytest $(TEST_FILE) | ||
|
|
||
| ###################### | ||
| # LINTING AND FORMATTING | ||
| ###################### | ||
|
|
||
| # Define a variable for Python and notebook files. | ||
| PYTHON_FILES=. | ||
| MYPY_CACHE=.mypy_cache | ||
| lint format: PYTHON_FILES=. | ||
| lint_diff format_diff: PYTHON_FILES=$(shell git diff --relative --name-only --diff-filter=d main | grep -E '\.py$$|\.ipynb$$') | ||
| lint_package: PYTHON_FILES=agentql_humanpages | ||
| lint_tests: PYTHON_FILES=tests | ||
| lint_tests: MYPY_CACHE=.mypy_cache_test | ||
|
|
||
| lint lint_diff lint_package lint_tests: | ||
| [ "$(PYTHON_FILES)" = "" ] || poetry run ruff check $(PYTHON_FILES) | ||
| [ "$(PYTHON_FILES)" = "" ] || poetry run ruff format $(PYTHON_FILES) --diff | ||
| [ "$(PYTHON_FILES)" = "" ] || { mkdir -p $(MYPY_CACHE) && poetry run mypy $(PYTHON_FILES) --cache-dir $(MYPY_CACHE); } | ||
|
|
||
| format format_diff: | ||
| [ "$(PYTHON_FILES)" = "" ] || poetry run ruff format $(PYTHON_FILES) | ||
| [ "$(PYTHON_FILES)" = "" ] || poetry run ruff check --select I --fix $(PYTHON_FILES) | ||
|
|
||
| spell_check: | ||
| poetry run codespell --toml pyproject.toml | ||
|
|
||
| spell_fix: | ||
| poetry run codespell --toml pyproject.toml -w | ||
|
|
||
| check_imports: $(shell find agentql_humanpages -name '*.py') | ||
| poetry run python ./scripts/check_imports.py $^ | ||
|
|
||
| ###################### | ||
| # HELP | ||
| ###################### | ||
|
|
||
| help: | ||
| @echo '----' | ||
| @echo 'check_imports - check imports' | ||
| @echo 'format - run code formatters' | ||
| @echo 'lint - run linters' | ||
| @echo 'test - run unit tests' | ||
| @echo 'tests - run unit tests' | ||
| @echo 'test TEST_FILE=<test_file> - run all tests in file' | ||
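
The Makefile comments above say unit tests run with `--disable-socket` so they cannot make network calls. Flags of this kind are typically supplied by a pytest plugin; the diff does not name which one. As a rough illustration of the idea only (not the plugin's actual implementation), the sketch below swaps out `socket.socket` for a guard that raises. The names `block_network`, `NetworkBlockedError`, and `try_connect` are hypothetical.

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Raised when code under test tries to create a socket."""


@contextmanager
def block_network():
    """Temporarily replace socket.socket so any socket creation fails fast."""
    original = socket.socket

    def guarded(*args, **kwargs):
        raise NetworkBlockedError("network access is disabled in unit tests")

    socket.socket = guarded
    try:
        yield
    finally:
        socket.socket = original  # always restore the real class


def try_connect() -> str:
    """Return "blocked" if a socket cannot be created, otherwise "open"."""
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    except NetworkBlockedError:
        return "blocked"
    sock.close()  # no connection was made; just release the descriptor
    return "open"
```

Inside `with block_network():` any attempt to build a socket raises immediately, which is why the unit-test target can run without credentials, while the `integration_tests` target omits the flag and is allowed to reach the network.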
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,117 @@ | ||
| # agentql-humanpages | ||
|
|
||
| An integration package connecting [AgentQL](https://www.agentql.com/) and [Human Pages](https://humanpages.ai) for human-in-the-loop web data extraction. | ||
|
|
||
| When AgentQL's automated extraction fails -- due to anti-bot protections, CAPTCHAs, empty results, or any other blocker -- the task is automatically delegated to a human worker via the Human Pages platform. | ||
|
|
||
| ## Installation | ||
|
|
||
| ```bash | ||
| pip install -U agentql-humanpages | ||
| ``` | ||
|
|
||
| You need to configure both API keys: | ||
|
|
||
| - `AGENTQL_API_KEY` -- get one from the [AgentQL Dev Portal](https://dev.agentql.com) | ||
| - `HUMANPAGES_API_KEY` -- get one from [Human Pages](https://humanpages.ai) | ||
|
|
||
| ## Quick Start | ||
|
|
||
| ```python | ||
| from agentql_humanpages import HumanFallbackAgent | ||
|
|
||
| agent = HumanFallbackAgent( | ||
| agentql_api_key="your-agentql-key", | ||
| humanpages_api_key="your-humanpages-key", | ||
| ) | ||
|
|
||
| result = agent.extract( | ||
| url="https://example.com/products", | ||
| query="{ products[] { name price } }", | ||
| ) | ||
|
|
||
| if result["source"] == "agentql": | ||
| print("Extracted via AgentQL:", result["data"]) | ||
| else: | ||
| print("Extracted via human:", result["messages"]) | ||
| ``` | ||
|
|
||
| ## HumanFallbackAgent | ||
|
|
||
| The main entry point. Attempts AgentQL extraction first, then falls back to Human Pages. | ||
|
|
||
| ```python | ||
| agent = HumanFallbackAgent( | ||
| agentql_api_key="...", # or set AGENTQL_API_KEY env var | ||
| humanpages_api_key="...", # or set HUMANPAGES_API_KEY env var | ||
| price_usdc=5.0, # default price for human jobs | ||
| deadline_hours=24, # default deadline for human jobs | ||
| ) | ||
| ``` | ||
|
|
||
| ### extract() | ||
|
|
||
| ```python | ||
| result = agent.extract( | ||
| url="https://example.com", | ||
| query="{ products[] { name price } }", # AgentQL query | ||
| # OR | ||
| prompt="Get all product names and prices", # Natural language | ||
| fallback_description="Custom instructions for the human worker", | ||
| price_usdc=10.0, # override default price | ||
| deadline_hours=12, # override default deadline | ||
| ) | ||
| ``` | ||
|
|
||
| Returns a dict with: | ||
| - `source`: `"agentql"` or `"humanpages"` | ||
| - `data`: extracted data (when source is agentql) | ||
| - `job_id`, `status`, `messages`: job details (when source is humanpages) | ||
|
|
||
| ### aextract() | ||
|
|
||
| Async version of `extract()` with the same interface. | ||
|
|
||
| ## HumanPagesClient | ||
|
|
||
| Lower-level client for the Human Pages REST API: | ||
|
|
||
| ```python | ||
| from agentql_humanpages import HumanPagesClient | ||
|
|
||
| client = HumanPagesClient(api_key="your-key") | ||
|
|
||
| # Search for available humans | ||
| humans = client.search_humans(skill="web task", available=True) | ||
|
|
||
| # Create a job | ||
| job = client.create_job( | ||
| human_id=humans[0]["id"], | ||
| title="Extract product data", | ||
| description="Visit example.com and extract all product names and prices.", | ||
| price_usdc=5.0, | ||
| deadline_hours=24, | ||
| ) | ||
|
|
||
| # Check job status | ||
| status = client.get_job_status(job["id"]) | ||
|
|
||
| # Get messages | ||
| messages = client.get_job_messages(job["id"]) | ||
| ``` | ||
|
|
||
| All methods have async counterparts (`asearch_humans`, `acreate_job`, `aget_job_status`, `aget_job_messages`). | ||
|
|
||
| ## Run Tests | ||
|
|
||
| Unit tests (no network calls): | ||
|
|
||
| ```bash | ||
| make test | ||
| ``` | ||
|
|
||
| Integration tests (requires valid API keys): | ||
|
|
||
| ```bash | ||
| make integration_tests | ||
| ``` |
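
The README above describes `HumanFallbackAgent` as trying AgentQL first and delegating to Human Pages when extraction fails or returns nothing. The agent's source is not part of the files shown here, so the following is only a minimal sketch of that control flow under stated assumptions: `extract_with_fallback`, `blocked_extractor`, and `fake_human_job` are hypothetical names, and treating an empty result as a failure is an assumption drawn from the README's mention of "empty results".

```python
from typing import Any, Callable, Dict


def extract_with_fallback(
    primary: Callable[[], Any],
    fallback: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Try the automated extractor first; hand off to a human on failure.

    `primary` stands in for the AgentQL call and `fallback` for creating a
    Human Pages job. Both are hypothetical callables, not the package's API.
    """
    try:
        data = primary()
    except Exception as exc:  # e.g. anti-bot block, CAPTCHA, network error
        reason = f"error: {exc}"
    else:
        if data:  # a non-empty result counts as success
            return {"source": "agentql", "data": data}
        reason = "empty result"
    job = fallback(reason)
    return {"source": "humanpages", **job}


# Stand-ins for the two backends (for illustration only).
def blocked_extractor() -> Any:
    raise RuntimeError("CAPTCHA encountered")


def fake_human_job(reason: str) -> Dict[str, Any]:
    # Mirrors the keys the README lists for the humanpages case.
    return {"job_id": "job-1", "status": "pending", "messages": []}


if __name__ == "__main__":
    result = extract_with_fallback(blocked_extractor, fake_human_job)
    print(result["source"], result["job_id"])
```

Returning one dict shape with a `source` discriminator, as the README documents, lets callers branch on `result["source"]` without catching exceptions themselves.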
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| from importlib import metadata | ||
|
|
||
| from agentql_humanpages.agent import HumanFallbackAgent | ||
| from agentql_humanpages.client import HumanPagesClient | ||
| from agentql_humanpages.const import ( | ||
| DEFAULT_DEADLINE_HOURS, | ||
| DEFAULT_POLL_INTERVAL_SECONDS, | ||
| DEFAULT_PRICE_USDC, | ||
| DEFAULT_TIMEOUT_SECONDS, | ||
| HUMANPAGES_BASE_URL, | ||
| ) | ||
|
|
||
| try: | ||
| __version__ = metadata.version(__package__) | ||
| except metadata.PackageNotFoundError: | ||
| # Case where package metadata is not available. | ||
| __version__ = "0.1.0" | ||
|
|
||
| __all__ = [ | ||
| "DEFAULT_DEADLINE_HOURS", | ||
| "DEFAULT_POLL_INTERVAL_SECONDS", | ||
| "DEFAULT_PRICE_USDC", | ||
| "DEFAULT_TIMEOUT_SECONDS", | ||
| "HUMANPAGES_BASE_URL", | ||
| "HumanFallbackAgent", | ||
| "HumanPagesClient", | ||
| "__version__", | ||
| ] |
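
The exported names `DEFAULT_POLL_INTERVAL_SECONDS` and `DEFAULT_TIMEOUT_SECONDS` suggest that the package polls a job until it finishes or a timeout elapses. Neither `const.py` nor the polling code appears in the files shown here, so the sketch below is an assumption about how such constants might be used. The function `wait_for_job`, the numeric values, and the status strings are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Hypothetical values; the real defaults live in agentql_humanpages.const.
POLL_INTERVAL_SECONDS = 0.01
TIMEOUT_SECONDS = 0.5


def wait_for_job(
    get_status: Callable[[], Dict[str, Any]],
    poll_interval: float = POLL_INTERVAL_SECONDS,
    timeout: float = TIMEOUT_SECONDS,
    done_states: Tuple[str, ...] = ("completed", "cancelled"),  # assumed names
) -> Dict[str, Any]:
    """Call get_status() repeatedly until the job reaches a terminal state.

    Raises TimeoutError if no terminal state is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout  # monotonic clock ignores wall-clock jumps
    while True:
        job = get_status()
        if job.get("status") in done_states:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout}s")
        time.sleep(poll_interval)
```

With a real client, `get_status` could be something like `lambda: client.get_job_status(job["id"])`, using the method shown in the README; whether that method returns a dict with a `status` key is an assumption here.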