Feat/manual mode#610

Open
Zropk66 wants to merge 21 commits into
Samueli924:mainfrom
Zropk66:feat/manual-mode

Conversation

@Zropk66 Zropk66 commented Jun 2, 2026


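Change description

This PR fixes several deadlocks and errors that occur under multi-threaded concurrent execution, refactors the question-bank (Tiku) module into an abstract base class to support batch question lookup, and adds an interactive manual-answer mode.

Main changes:

  1. Multi-threading and concurrency-safety fixes
    - Fixed the TypeError raised when the priority queue compared dicts directly while multiple courses ran in parallel.
    - Fixed the loguru asynchronous-logging lock-release error caused by tqdm dynamically replacing its lock.
    - Added thread locks around cookies.py reads/writes and around AI and SiliconFlow requests, to prevent concurrent read/write conflicts and errors caused by excessive request frequency.
    - Upgraded httpx[socks] and introduced the tenacity retry mechanism to support SOCKS proxies.
  2. Question-bank interface refactor and safety checks
    - Refactored Tiku into an abstract base class and standardized the query_all batch lookup interface.
    - Added defensive length and type checks in TikuFallback, so that missing data returned by a sub-provider cannot misalign the answers of later questions.
    - Modified check_answer to let manual mode and the Fallback wrapper through, so that manually entered answers are not rejected by a second validation pass.
  3. New manual-answer mode (TikuManual)
    - Supports single-question input and batch paste (split on the configured separator). Input is format-checked immediately (for example, a single-choice question cannot take two options, and a multiple-choice answer cannot go beyond the available options).
    - While answering, unrelated log output is suspended and tqdm progress bars are cleared, keeping the foreground input prompt clean.
    - Option extraction now prefers aria-label, which fixes a bug where the option text of some special multiple-choice questions was captured incompletely.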
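
The priority-queue TypeError mentioned in item 1 is a common pitfall: when two entries share the same priority, Python falls back to comparing the payloads, and dicts do not support `<`. The sketch below reproduces the failure and shows one conventional remedy, a dataclass that excludes the payload from ordering. The names `PrioritizedTask`, `seq`, and `payload` are hypothetical and are not taken from this PR; the PR's own `ChapterTask` may solve it differently.

```python
import itertools
import queue
from dataclasses import dataclass, field
from typing import Any

_counter = itertools.count()  # hypothetical tie-breaker: insertion order


@dataclass(order=True)
class PrioritizedTask:
    priority: int
    # seq makes equal-priority tasks compare by insertion order.
    seq: int = field(default_factory=lambda: next(_counter))
    # compare=False keeps the dict payload out of every comparison.
    payload: Any = field(default=None, compare=False)


def tuple_entries_fail() -> bool:
    """Return True if plain (priority, dict) tuples raise TypeError on a tie."""
    raw = queue.PriorityQueue()
    raw.put((1, {"title": "x"}))
    try:
        raw.put((1, {"title": "y"}))  # tie on priority -> dict < dict
    except TypeError:
        return True
    return False


q = queue.PriorityQueue()
q.put(PrioritizedTask(1, payload={"title": "chapter B"}))
q.put(PrioritizedTask(1, payload={"title": "chapter A"}))
q.put(PrioritizedTask(0, payload={"title": "chapter C"}))

order = []
while not q.empty():
    order.append(q.get().payload["title"])
```

Lower `priority` values come out first, and ties resolve by insertion order rather than by inspecting the payload.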

Summary by CodeRabbit

  • New Features

    • Interactive manual-answer provider (single/batch) with configurable defaults and separator.
    • Batch querying with provider-chain fallback and configurable provider loading.
    • Config/CLI retry-interval support.
  • Bug Fixes

    • Atomic cache writes and recovery from corrupted cache entries.
    • Safer form parsing, improved judgement matching (case-insensitive, expanded tokens).
    • Thread-safe cookie/captcha/LLM access, buffered logs during manual prompts, OCR optional, stronger retry and forbidden-recovery flows, and more robust progress handling.
  • Chores

    • Updated requirements and configuration template.
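
As an illustration of the "atomic cache writes and recovery from corrupted cache entries" item, here is a minimal sketch of the usual write-to-temp-then-rename pattern. It is not the PR's code: `atomic_write_json`, `load_json_or_default`, and `_cache_lock` are hypothetical names, and the cache is assumed to be a single JSON file.

```python
import json
import os
import tempfile
import threading

_cache_lock = threading.RLock()  # hypothetical module-level lock


def atomic_write_json(path, data):
    """Write JSON so readers see either the old file or the new one, never a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    with _cache_lock:
        # The temp file lives in the target directory so os.replace stays on one filesystem.
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(data, fh, ensure_ascii=False)
                fh.flush()
                os.fsync(fh.fileno())
            os.replace(tmp_path, path)  # atomic rename over the destination
        except BaseException:
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise


def load_json_or_default(path, default):
    """Treat a missing or corrupted file as an empty cache instead of crashing."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return default
```

A reader that calls `load_json_or_default(path, {})` keeps working even if an earlier, non-atomic write left the file truncated.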

Zropk66 added 2 commits June 1, 2026 23:14
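- Fix the dict-comparison error (TypeError) in the priority queue when multiple courses run in parallel
- Remove the loguru asynchronous-logging lock-release error (RuntimeError) caused by tqdm dynamically replacing its lock
- Refactor Tiku into an abstract base class (ABC) and pass the custom config file path `-c` through all layers
- Implement thread-safe automatic Cookie re-login and automatic recognition (ddddocr) to get past the 403 captcha
- Rework the task execution flow into global cross-course concurrent scheduling, and add the `retry_interval` config option
- Add compatibility handling for Python 3.12 and below, and use with statements to release tqdm resources properly
- Flatten locally nested functions, and guard the ddddocr import against failure to improve platform compatibility
- Introduce the tenacity retry mechanism and upgrade the httpx[socks] dependency to fully support system SOCKS proxies
- Add the `TikuManual` interactive manual-answer mode, supporting single-question input and batch paste, with immediate input validation.
- Implement the `query_all` and `_query_all` batch lookup interfaces in `Tiku`, and override them in `TikuFallback` to support batch fallback lookup.
- Add defensive checks (return type and length) to `TikuFallback._query_all` to prevent answers from being paired with the wrong questions during multi-provider fallback.
- Let manual mode and the fallback wrapper pass `check_answer`, so that strict format validation does not reject correct answers.
- Add a `threading.Lock` to `AI` and `SiliconFlow` LLM queries to avoid state conflicts under high concurrency.
- Add a global `threading.RLock` to `cookies.py` so that file writes are safe when multiple threads read or update the Cookie concurrently.
- Prefer the `aria-label` attribute when parsing option nodes, fixing truncated or missing option text in multiple-choice questions.
- Improve OCR component initialization compatibility in `captcha.py` and the thread safety of log file rotation in `logger.py`.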
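
The defensive check described for `TikuFallback._query_all` can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's implementation: providers are modelled as plain callables that take a list of questions and return a list of answers, and `query_all_with_fallback` is a hypothetical name.

```python
from typing import Callable, List, Optional

# Hypothetical provider type: takes a batch of questions, returns a batch of answers.
Provider = Callable[[List[str]], object]


def query_all_with_fallback(questions: List[str],
                            providers: List[Provider]) -> List[Optional[str]]:
    results: List[Optional[str]] = [None] * len(questions)
    for provider in providers:
        # Only questions that still lack an answer go to the next provider.
        pending = [i for i, ans in enumerate(results) if ans is None]
        if not pending:
            break
        batch = [questions[i] for i in pending]
        answers = provider(batch)
        # Reject anything that is not a list of exactly the expected length;
        # pairing a short list by position would shift answers onto the wrong questions.
        if not isinstance(answers, list) or len(answers) != len(batch):
            continue
        for i, ans in zip(pending, answers):
            if ans:
                results[i] = ans
    return results
```

Skipping a malformed response costs one provider's answers for that round, but it keeps every remaining answer attached to the right question.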

coderabbitai Bot commented Jun 2, 2026



Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Converts Tiku into an instance-configurable ABC with batched query_all, adds an interactive TikuManual provider, integrates batched answering into study_work/study_video, improves resilience (locks, retries, captcha/session recovery), makes cookie/log access thread-safe, and refactors main task orchestration to a global sequential JobProcessor.

Changes

Batch Query Framework & Study Integration

| Layer | File(s) | Summary |
|---|---|---|
| Abstract Tiku Base Class with Config Path Support | api/answer.py | Converts Tiku to an abstract base class with instance-level _config_path; adds query_all and default _query_all; introduces static get_tiku_from_config. |
| Batch Query Caching and query_all implementation | api/answer.py | Implements query_all with cache hit/miss handling, pending-index tracking, batched _query_all invocation, per-item check_answer validation, and atomic cache writes. |
| Interactive TikuManual Provider with Input Validation | api/answer.py | Adds TikuManual for console-driven single/batch input with class-level manual lock, parsing/validation, configurable separator, and registers it in PROVIDER_REGISTRY. |
| TikuFallback Batch Fallback with Provider Chain | api/answer.py | Implements TikuFallback._query_all to sequentially batch-query providers, validate list responses and lengths, and progressively fill pending answers. |
| Provider constructors & exports | api/answer.py | Updates provider constructors to accept config_path; updates DummyTiku; adds TikuManual to __all__ and PROVIDER_REGISTRY. |
| LLM provider locking and checks | api/answer.py | Adds per-instance locks for AI/SiliconFlow, wraps queries under locks, and tightens connection checks to require non-empty response content. |
| Enhanced Judgement Normalization & Manual Bypass | api/answer.py, api/answer_check.py | Normalizes judgement answers (lowercase/strip), expands true/false synonyms, and makes check_answer return True for manual/fallback providers. |
| Answer Processing Utilities for Study Flows | api/base.py | Adds multi_cut, clean_res, normalize_text, get_option_text, best_option_by_similarity, is_subsequence, random_answer, and imports tenacity for retries. |
| Video progress, captcha auto-solve, and session recovery | api/base.py, api/captcha.py | Refactors video_progress_log to detect 403/captcha, lazily invoke OCR via CxCaptcha, update cookies on success, adds SessionManager.relogin_if_needed, and rewrites study_video tqdm lifecycle. |
| Captcha detection and OCR guards | api/captcha.py | Makes ocr_init() optional, distinguishes omitted vs explicit None in CxCaptcha, raises when OCR absent, and wraps try_pass() to return False on failure. |
| Tenacity-Based Work Retry and Batch Answering | api/base.py | Replaces previous retry logic with tenacity-backed fetch and switches study_work to use tiku.query_all, origin-HTML-aware multi_cut, similarity fallback, and random_answer when needed. |
| Thread-Safe Cookie File Access and Logger Buffering | api/cookies.py, api/logger.py | Adds cookie_lock (RLock) around cookie read/write with defensive parsing and error handling; logger buffers tqdm messages while manual lock is held and flushes after release. |
| Resilient Form Parsing and Font Parser Fallback | api/decode.py, api/font_decoder.py | Makes form-data extraction robust to missing/list attributes and falls back BeautifulSoup parser from lxml to html.parser on parse errors. |
| Task Queue and Sequential Job Processing | main.py | Refactors ChapterTask to include course and custom ordering; updates JobProcessor to use task.course, stores retry_interval, switches process_chapter to sequential jobs, and runs a single global JobProcessor. |
| Configuration Init, Retry Intervals, and Global Task Queueing | main.py | Adds --retry-interval CLI/config support (default 1.0); init_config() returns config_path; init_chaoxing() accepts it and loads tiku via factory; main builds a global task list and runs one JobProcessor. |
| Configuration Template and Dependency Updates | config_template.ini, requirements.txt | Adds retry_interval, manual_mode_default, and manual_mode_separator to config template; updates requirements.txt with ddddocr and tenacity and httpx[socks]. |
| Import Cleanup and Minor Refactoring | api/live.py, api/live_process.py, api/logger.py | Removes unused imports and applies minor message formatting and logger sink buffering changes. |
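
To make the "cache hit/miss handling, pending-index tracking, batched _query_all invocation" row concrete, here is a hedged sketch of that flow. It assumes a dict-like cache and two injected callables; `query_all_cached`, `batch_query`, and `is_valid` are hypothetical stand-ins for the PR's `query_all`, `_query_all`, and `check_answer`, whose real signatures are not shown on this page.

```python
from typing import Callable, Dict, List, Optional


def query_all_cached(questions: List[str],
                     cache: Dict[str, str],
                     batch_query: Callable[[List[str]], List[Optional[str]]],
                     is_valid: Callable[[str], bool]) -> List[Optional[str]]:
    results: List[Optional[str]] = [None] * len(questions)
    pending: List[int] = []

    # 1. Serve cache hits and remember the original index of every miss.
    for i, question in enumerate(questions):
        hit = cache.get(question)
        if hit is not None:
            results[i] = hit
        else:
            pending.append(i)

    # 2. Send all misses to the provider in a single batch.
    if pending:
        answers = batch_query([questions[i] for i in pending])
        if len(answers) != len(pending):
            raise ValueError(
                f"expected {len(pending)} answers, got {len(answers)}")
        # 3. Validate each answer before caching and returning it.
        for i, ans in zip(pending, answers):
            if ans is not None and is_valid(ans):
                cache[questions[i]] = ans
                results[i] = ans
    return results
```

Tracking indices rather than rebuilding the list is what lets cached and freshly fetched answers land back in their original positions.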

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • Samueli924

Poem

🐰 Hoppy updates, providers now batch,
TikuManual waits for a keyboard catch,
Captchas bow and sessions renew,
Cookies and logs keep steady and true,
Tasks march in line — a rabbit's small patch.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 22.37% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Feat/manual mode' clearly relates to the primary feature addition in this changeset: a new interactive manual mode (TikuManual) for answering questions. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
main.py (2)

350-366: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

NOT_OPEN tasks can retry forever — the max_tries guard is dead.

task.tries += 1 is commented out (Line 351), so task.tries stays at 0 on the retry path. The max_tries guard at Lines 357-363 therefore never fires, and the task is re-queued indefinitely. The warning text itself notes that a chapter may be permanently closed ("因为时效已关闭", i.e. closed because its time window has expired), so a chapter that never opens loops forever, task_queue.join() in run() never returns, and the program hangs.

🐛 Proposed fix: increment tries so the retry cap applies
                 case ChapterResult.NOT_OPEN:
-                    # task.tries += 1
+                    task.tries += 1
                     if self.config["notopen_action"] == "continue":
                         logger.warning("章节未开启: {} - {}, 正在跳过", task.course["title"], task.point["title"])
                         self.task_queue.task_done()
                         continue

If indefinite retrying is intentional for NOT_OPEN, then the max_tries branch (Lines 357-363) is unreachable and should be removed to avoid the misleading code path. Please confirm the intended behavior.
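
The effect of the missing increment can be shown in isolation. The sketch below is a simplified, single-threaded model of a retry loop, not the code in main.py; `Task`, `run_with_retry_cap`, and `is_open` are hypothetical names. With the increment in place, a task that never opens is dropped after `max_tries` attempts and `join()` returns.

```python
import queue
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    tries: int = 0


def run_with_retry_cap(tasks, max_tries, is_open):
    """Process tasks, re-queueing unopened ones at most max_tries times."""
    q = queue.Queue()
    for task in tasks:
        q.put(task)

    done, gave_up = [], []
    while not q.empty():
        task = q.get()
        if is_open(task):
            done.append(task.name)
        else:
            task.tries += 1  # without this line the cap below never applies
            if task.tries >= max_tries:
                gave_up.append(task.name)
            else:
                q.put(task)  # retry later
        q.task_done()  # one task_done per get keeps join() balanced

    q.join()  # returns because every queued item was marked done
    return done, gave_up
```

Removing the `task.tries += 1` line turns the `while` loop into an infinite one for any task whose `is_open` check never succeeds, which mirrors the hang described above.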

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@main.py` around lines 350 - 366, The NOT_OPEN branch currently never
increments task.tries so the max_tries check in the same case never triggers;
restore or add the increment of task.tries (e.g., increment task.tries before
putting the task into self.retry_queue) inside the case handling
ChapterResult.NOT_OPEN, so tasks get counted and will hit self.max_tries and
follow the error/continue path; ensure you update the same branch that checks
self.config["notopen_action"] and references task.course/point and
self.retry_queue so behavior remains consistent.

72-74: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale help text for -j/--jobs.

The help states task points within a chapter are processed without concurrency limits, but process_chapter now iterates jobs sequentially (Lines 421-423). Update the description to avoid misleading users.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@main.py` around lines 72 - 74, The help text for parser.add_argument("-j",
"--jobs") is stale: it claims task points within a chapter are not
concurrency-limited, but process_chapter uses the jobs value sequentially (see
process_chapter and the jobs variable). Update the help string passed to
parser.add_argument to accurately describe what jobs controls (e.g., "number of
chapters processed concurrently; task points within each chapter are processed
sequentially") so users are not misled; change the message near
parser.add_argument accordingly.
🧹 Nitpick comments (12)
api/answer.py (5)

213-217: ⚡ Quick win

Duplicated is_manual detection logic.

The same logic for detecting manual mode appears in both query (lines 213-217) and query_all (lines 252-256). Consider extracting this to a property or helper method to reduce duplication and ensure consistency.

♻️ Proposed refactor
+    @property
+    def _is_manual_mode(self) -> bool:
+        return (
+            getattr(self, 'is_manual', False) or
+            self.__class__.__name__ == 'TikuManual' or
+            (self.__class__.__name__ == 'TikuFallback' and 
+             any(getattr(p, 'is_manual', False) or p.__class__.__name__ == 'TikuManual' 
+                 for p in getattr(self, 'providers', [])))
+        )
+
     def query(self,q_info:dict) -> Optional[str]:
         if self.DISABLE:
             return None
 
-        is_manual = (
-            getattr(self, 'is_manual', False) or
-            self.__class__.__name__ == 'TikuManual' or
-            (self.__class__.__name__ == 'TikuFallback' and any(getattr(p, 'is_manual', False) or p.__class__.__name__ == 'TikuManual' for p in getattr(self, 'providers', [])))
-        )
-
         # 预处理, 去除【单选题】这样与标题无关的字段
-        if not is_manual:
+        if not self._is_manual_mode:
             logger.debug(f"原始标题:{q_info['title']}")

Also applies to: 252-256

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/answer.py` around lines 213 - 217, The is_manual detection logic is
duplicated in query and query_all; extract it into a single helper (e.g., a
property or method) on the class (for example def is_manual_mode(self) or
`@property` is_manual_mode) that implements the existing logic (check
getattr(self, 'is_manual', False), self.__class__.__name__ == 'TikuManual', and
the TikuFallback/providers scan for provider.is_manual or
provider.__class__.__name__ == 'TikuManual'); then replace the inline
expressions in query and query_all with a call to that helper (use the helper
name consistently in both methods) so the detection is centralized and
consistent across the class.

1246-1283: ⚖️ Poor tradeoff

Connection check holds lock during network I/O.

The lock is held for the entire check_llm_connection operation including network requests. If the API is slow or unresponsive, this could block other threads waiting to query. Consider adding a timeout to the OpenAI client creation or using a separate lock for connection checks vs. queries.
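
One way to narrow the critical section, sketched under assumptions: the lock protects only the shared rate-limit timestamp, and the network call runs after the lock is released. `RateLimitedClient`, `_reserve_slot`, and the injected `send` callable are hypothetical and do not correspond to names in api/answer.py.

```python
import threading
import time


class RateLimitedClient:
    """Spaces requests out by min_interval seconds without holding a lock during I/O."""

    def __init__(self, min_interval, send):
        self._lock = threading.Lock()
        self._min_interval = min_interval
        self._next_slot = 0.0
        self._send = send  # hypothetical callable that performs the network request

    def _reserve_slot(self):
        # The critical section only touches shared bookkeeping, so it is very short.
        with self._lock:
            now = time.monotonic()
            start = max(now, self._next_slot)
            self._next_slot = start + self._min_interval
            return start - now

    def request(self, payload):
        delay = self._reserve_slot()
        if delay > 0:
            time.sleep(delay)  # waiting happens outside the lock
        return self._send(payload)  # so does the slow network call
```

A slow or unresponsive endpoint then delays only the calling thread; other threads can still reserve their own slots.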

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/answer.py` around lines 1246 - 1283, The check_llm_connection method
currently holds self._lock for the entire operation including network I/O
(client creation and client.chat.completions.create), which can block other
threads; change it to acquire the lock only for updating/sharing local state
(e.g., protect access to self.last_request_time and any shared flags) and
perform the network call outside the lock, or introduce a separate lock (e.g.,
_conn_check_lock) for this health check; also set a network timeout on the HTTP
client (httpx.Client(timeout=...)) or the OpenAI client so slow/unresponsive
calls fail fast; keep references: check_llm_connection, self._lock,
self._wait_for_interval, self.last_request_time, client.chat.completions.create.

474-510: 💤 Low value

TikuFallback._query_all implementation is well-designed with proper defensive checks.

The length validation at lines 496-498 prevents answer misalignment. Minor nit: sub_idx at line 501 is unused.

♻️ Minor fix for unused variable
-            for sub_idx, (orig_idx, ans) in enumerate(zip(pending_indices, sub_results)):
+            for orig_idx, ans in zip(pending_indices, sub_results):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/answer.py` around lines 474 - 510, The loop in TikuFallback._query_all
declares an unused variable sub_idx via enumerate; remove the unused variable by
changing the loop header to iterate directly over zip(pending_indices,
sub_results) (e.g., for orig_idx, ans in zip(...)) or use an explicit throwaway
name (_) instead of sub_idx; update the loop in the _query_all method so no
unused local remains and ensure logging and results assignment still use
orig_idx and ans as before.

143-144: 💤 Low value

Mutable class attributes are safe here but could be clearer.

The true_list and false_list are mutable class-level defaults. While this is safe because init_tiku reassigns them from config (not mutating in-place), using None as default and initializing in __init__ would be more defensive.
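
For readers unfamiliar with the hazard, a small self-contained demonstration follows. The classes `SharedDefaults` and `PerInstance` are hypothetical and only mirror the attribute name from the comment above.

```python
class SharedDefaults:
    true_list = []  # one list object shared by every instance


class PerInstance:
    true_list = None  # class-level placeholder only

    def __init__(self, true_list=None):
        # Each instance gets its own list, copied from the argument if given.
        self.true_list = list(true_list) if true_list is not None else []


a, b = SharedDefaults(), SharedDefaults()
a.true_list.append("yes")  # in-place mutation is visible through b as well

c, d = PerInstance(), PerInstance()
c.true_list.append("yes")  # only c is affected
```

Reassignment (`self.true_list = [...]`), which is what the comment says init_tiku does, creates an instance attribute and is safe; only in-place mutation such as `append` leaks across instances.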

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/answer.py` around lines 143 - 144, true_list and false_list are defined
as mutable class-level defaults; change them to None and initialize them to
empty lists (or from config) inside the class constructor or in init_tiku to
avoid shared-mutable-state. Specifically, replace the class-level defaults for
true_list and false_list with None, then in __init__ (or at the start of
init_tiku) set self.true_list = config.get(...) or [] and self.false_list =
config.get(...) or [] so each instance gets its own lists; reference the
true_list, false_list attributes and the init_tiku method when making the
change.

1600-1600: ⚡ Quick win

Ambiguous variable name l.

The variable l can be visually confused with 1. Consider renaming to letter for clarity.

♻️ Proposed rename
-                invalid_letters = [l for l in letters if l not in valid_keys]
+                invalid_letters = [letter for letter in letters if letter not in valid_keys]
-            if letters and all(l in valid_keys for l in letters):
+            if letters and all(letter in valid_keys for letter in letters):
                 unique_ordered_letters = []
-                for l in letters:
-                    if l not in unique_ordered_letters:
-                        unique_ordered_letters.append(l)
+                for letter in letters:
+                    if letter not in unique_ordered_letters:
+                        unique_ordered_letters.append(letter)

Also applies to: 1648-1648, 1650-1650

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/answer.py` at line 1600, The list comprehension creating invalid_letters
uses an ambiguous single-letter loop variable "l"; rename it to "letter" (e.g.,
change invalid_letters = [l for l in letters if l not in valid_keys] to use
"letter") and update the same rename in the other occurrences around the block
(references at the spots analogous to lines 1648 and 1650) so all uses of the
loop variable and any references inside comprehensions or small loops
consistently use "letter" for clarity.
api/captcha.py (2)

87-87: 💤 Low value

Constructor treats explicit ocr=None as "initialize OCR".

The expression ocr if ocr else ocr_init() means passing ocr=None explicitly will trigger ocr_init(), which may not be the intended behavior if a caller wants to disable OCR. Consider distinguishing between "parameter not provided" vs "explicitly None".

♻️ Alternative using sentinel pattern
+_MISSING = object()
+
-    def __init__(self, user_agent: str, cookies: str, ocr: Optional[DdddOcr] = None):
+    def __init__(self, user_agent: str, cookies: str, ocr: Optional[DdddOcr] = _MISSING):
         """
         初始化 CxCaptcha 实例。
         
         Args:
             user_agent (str): 用户代理字符串。
             cookies (str): 会话 cookies。
-            ocr (DdddOcr, optional): 已初始化的 DdddOcr 对象。默认为 None。据DdddOcr官方说明,每次初始化和初始化后的首次识别速度都非常慢,所以推荐传入一个现成的DdddOcr对象实现复用。
+            ocr (DdddOcr, optional): 已初始化的 DdddOcr 对象。如果未提供则自动初始化;如果显式传入 None 则禁用 OCR。据DdddOcr官方说明,每次初始化和初始化后的首次识别速度都非常慢,所以推荐传入一个现成的DdddOcr对象实现复用。
         """
         
         self.user_agent = user_agent
         self.cookies = cookies
         self.s = session()
         self.s.headers.update({
             'User-Agent': self.user_agent,
             'Cookie': self.cookies,
             'Accept': 'image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8'
         })
         self.s.verify = False
         
-        self.ocr = ocr if ocr else ocr_init()
+        self.ocr = ocr_init() if ocr is _MISSING else ocr

However, given the current caller in api/base.py (line 608) always checks for None before passing, this may be acceptable as-is.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/captcha.py` at line 87, The constructor in api/captcha.py currently uses
"self.ocr = ocr if ocr else ocr_init()", which treats an explicit ocr=None as a
signal to initialize OCR; change this to distinguish an omitted parameter from
an explicit None by using an identity check: set self.ocr to the passed ocr if
ocr is not None, otherwise call ocr_init(); alternatively use a sentinel default
for the constructor parameter and initialize with ocr_init() only when the
sentinel is seen. Ensure references to ocr_init() and the constructor's self.ocr
assignment are updated accordingly.

36-38: 💤 Low value

Simplify redundant availability check.

The check DdddOcr is None is redundant when HAS_DDDDOCR is already False, since lines 26-27 guarantee DdddOcr = None in that case.

♻️ Proposed simplification
-    if not HAS_DDDDOCR or DdddOcr is None:
+    if not HAS_DDDDOCR:
         logger.warning("未检测到 ddddocr 依赖,自动验证码识别将不可用。如遇403限制请在浏览器端手动完成验证。")
         return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/captcha.py` around lines 36 - 38, The condition in the captcha
availability check is redundant: replace the compound check "if not HAS_DDDDOCR
or DdddOcr is None" with a single "if not HAS_DDDDOCR" (keeping the existing
logger.warning and return None) so you rely on the earlier initialization
guarantee that DdddOcr is set to None when HAS_DDDDOCR is False; update the
conditional in the block containing HAS_DDDDOCR, DdddOcr, and logger.warning to
remove the redundant DdddOcr reference (no other behavior changes to ensure).
api/logger.py (1)

11-30: 💤 Low value

Unbounded buffer growth if manual mode is held for extended periods.

With enqueue=True, log_buffer access is thread-safe (single sink thread). However, if manual mode remains active for a long time while background tasks generate heavy logging, log_buffer will accumulate messages without bound. Consider adding a cap or periodic flush.

♻️ Optional: Add a buffer size limit
+MAX_LOG_BUFFER_SIZE = 1000
+
 def tqdm_sink(msg):
     manual_locked = False
     try:
         # 动态获取 api.answer 模块中的 TikuManual 锁,避免循环导入
         if 'api.answer' in sys.modules:
             TikuManual = getattr(sys.modules['api.answer'], 'TikuManual', None)
             if TikuManual and getattr(TikuManual, '_manual_lock', None):
                 manual_locked = TikuManual._manual_lock.locked()
     except Exception:
         pass

-    if manual_locked:
+    if manual_locked and len(log_buffer) < MAX_LOG_BUFFER_SIZE:
         log_buffer.append(msg)
     else:
         if log_buffer:
             for buffered_msg in log_buffer:
                 tqdm.write(buffered_msg.rstrip(), file=tqdm_stream)
             log_buffer.clear()
         tqdm.write(msg.rstrip(), file=tqdm_stream)
     tqdm_stream.flush()
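
Note that with the `len(log_buffer) < MAX_LOG_BUFFER_SIZE` condition above, a full buffer falls through to the `else` branch and is flushed while the manual prompt is still active. If dropping the oldest messages is preferable to interrupting the prompt, a `collections.deque` with `maxlen` is one alternative. The sketch below is an assumption, not part of the review: `BufferedSink`, `write`, and `is_locked` are hypothetical stand-ins for the sink, `tqdm.write`, and the manual-lock check.

```python
from collections import deque

MAX_LOG_BUFFER_SIZE = 1000  # same illustrative cap as in the diff above


class BufferedSink:
    """Buffers messages while locked, keeping only the most recent maxlen entries."""

    def __init__(self, write, is_locked, maxlen=MAX_LOG_BUFFER_SIZE):
        self._write = write          # hypothetical output function
        self._is_locked = is_locked  # hypothetical "manual prompt active?" check
        self._buffer = deque(maxlen=maxlen)
        self.dropped = 0

    def __call__(self, msg):
        if self._is_locked():
            if len(self._buffer) == self._buffer.maxlen:
                self.dropped += 1  # the append below evicts the oldest entry
            self._buffer.append(msg)
            return
        # Prompt finished: flush buffered messages in order, then the new one.
        while self._buffer:
            self._write(self._buffer.popleft().rstrip())
        self._write(msg.rstrip())
```

The `dropped` counter makes the data loss visible, so a summary line such as "N messages suppressed" could be emitted on flush if desired.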
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/logger.py` around lines 11 - 30, tqdm_sink currently appends messages to
the global log_buffer while TikuManual._manual_lock is held, which can grow
unbounded; modify tqdm_sink to enforce a bounded buffer (e.g., add a
MAX_BUFFER_SIZE constant) and when log_buffer length exceeds that cap either
drop oldest entries (pop(0)) or rotate/trim, and optionally trigger a flush to
tqdm_stream; update references to log_buffer and manual_locked in tqdm_sink (and
ensure thread-safety if needed) so that when TikuManual._manual_lock is active
the buffer never grows past the configured limit and heavy logging won’t exhaust
memory.
api/font_decoder.py (1)

40-43: ⚡ Quick win

Narrow the exception catch to specific parser errors.

Catching bare Exception can hide unrelated programming errors. The fallback should target lxml-specific failures (parsing errors, missing library) rather than any exception.

♻️ Proposed fix to catch specific exceptions
-            try:
-                soup = BeautifulSoup(html_content, "lxml")
-            except Exception:
-                soup = BeautifulSoup(html_content, "html.parser")
+            try:
+                soup = BeautifulSoup(html_content, "lxml")
+            except (ImportError, LookupError, FeatureNotFound) as e:  # requires: from bs4 import FeatureNotFound
+                logger.trace(f"lxml parsing failed, falling back to html.parser: {e}")
+                soup = BeautifulSoup(html_content, "html.parser")

Alternatively, to be more specific (requires importing from lxml):

from lxml.etree import ParserError as LxmlParserError

# Then in the except clause:
except (ImportError, LookupError, LxmlParserError):

As per static analysis hints, ruff BLE001 warns against blind exception catching.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/font_decoder.py` around lines 40 - 43, The try/except around
BeautifulSoup(html_content, "lxml") should stop catching all Exceptions; change
the except clause to only handle parser- or backend-related failures (e.g.,
ImportError/ModuleNotFoundError if lxml is missing,
LookupError/bs4.FeatureNotFound if the "lxml" parser isn't available, and
lxml.etree.ParserError for parsing errors) so that unrelated bugs aren't masked;
import the specific ParserError from lxml.etree and bs4.FeatureNotFound and
update the except to something like except (ImportError, LookupError,
FeatureNotFound, LxmlParserError) and fall back to BeautifulSoup(html_content,
"html.parser") using the existing soup/html_content symbols.
api/answer_check.py (1)

28-31: 💤 Low value

Minor redundancy in list membership checks.

The check val in true_list after val in [x.lower() for x in true_list] is redundant when val is already lowercased. The only case it would catch is if true_list contains exact lowercase strings, which the lowercased comparison already handles. Same issue exists for false_list on line 31.

This doesn't cause incorrect behavior, but simplifying would improve clarity.

♻️ Suggested simplification
 def check_judgement(answer, true_list, false_list):
     val = str(answer).strip().lower()
-    if val in ['true', 't', '1', '对', '正确', '√', '是', 'yes', 'y'] or val in [x.lower() for x in true_list] or val in true_list:
+    if val in ['true', 't', '1', '对', '正确', '√', '是', 'yes', 'y'] or val in [x.lower() for x in true_list]:
         return 1
-    elif val in ['false', 'f', '0', '错', '错误', '×', '否', 'no', 'n', '不对', '不正确'] or val in [x.lower() for x in false_list] or val in false_list:
+    elif val in ['false', 'f', '0', '错', '错误', '×', '否', 'no', 'n', '不对', '不正确'] or val in [x.lower() for x in false_list]:
         return 0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/answer_check.py` around lines 28 - 31, Remove the redundant direct
membership checks against true_list and false_list in the boolean parsing block:
since val is already lowercased (val = str(answer).strip().lower()), only test
against the hardcoded lowercase tokens and the lowercased list comprehensions
([x.lower() for x in true_list] and [x.lower() for x in false_list]); remove the
trailing "or val in true_list" and "or val in false_list" parts to simplify the
condition while keeping the checks against true_list and false_list via their
lowercased versions; update the conditions around the variable val and the lists
true_list/false_list accordingly.
api/base.py (2)

778-784: 💤 Low value

Consider extracting progress bar cleanup to reduce code duplication.

The pbar cleanup logic (pbar.leave = False; pbar.close()) is repeated 4 times (lines 779-784, 812-816, 828-832, and in the finally block). Extracting to a helper would reduce duplication and make maintenance easier.

♻️ Suggested helper extraction
+    def close_pbar_safe(pbar_ref):
+        if pbar_ref is not None:
+            try:
+                pbar_ref.leave = False
+                pbar_ref.close()
+            except Exception:
+                pass
+        return None
+
     pbar = None
     try:
         while not passed:
             # ... existing logic ...
                         if pbar is not None:
-                            try:
-                                pbar.leave = False
-                                pbar.close()
-                            except Exception:
-                                pass
-                            pbar = None
+                            pbar = close_pbar_safe(pbar)

Also applies to: 811-817

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/base.py` around lines 778 - 784, Extract the repeated progress-bar
teardown into a small helper (e.g., _cleanup_pbar(pbar)) that checks for None,
sets pbar.leave = False, calls pbar.close() inside a try/except, and returns
None so callers can assign pbar = _cleanup_pbar(pbar); replace the four
duplicated blocks (the occurrences that reference the local variable pbar in the
function containing the finally block and the two earlier try/except sites) with
calls to this helper and assign the result back to pbar; keep the exception
swallow behavior and no-op semantics identical.

804-808: 💤 Low value

Silent exception swallowing may mask import or attribute errors.

The bare except Exception: pass around accessing TikuManual._manual_lock.locked() will silently ignore any error, including NameError if TikuManual isn't imported or AttributeError if the class structure changes. Consider logging at trace/debug level for diagnosability.

♻️ Suggested improvement
                 try:
                     manual_locked = TikuManual._manual_lock.locked()
-                except Exception:
-                    pass
+                except Exception as e:
+                    logger.trace(f"Could not check manual lock state: {e}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/base.py` around lines 804 - 808, Replace the silent except in the manual
lock check with targeted error handling and logging: when calling
TikuManual._manual_lock.locked() (affecting manual_locked), catch specific
exceptions like NameError and AttributeError (and optionally RuntimeError) and
log the exception at debug/trace level rather than swallowing it; ensure
manual_locked defaults to False on error. Use the module or existing logger to
record the exception and include the object/context
(TikuManual._manual_lock.locked) for diagnosability.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/answer.py`:
- Around line 281-295: The loop over zip(pending_indices, sub_results) can
silently drop items if _query_all returned a different-length sub_results; add a
defensive check after calling self._query_all to ensure len(sub_results) ==
len(pending_indices) (or use zip(pending_indices, sub_results, strict=True) on
Python 3.10+) and if lengths differ log an error including self.name and q_list
titles, then either raise or pad/align sub_results consistently before iterating
so that results, cache_dao.add_cache, check_answer, and assignment into
results[idx] cannot silently skip entries.
- Around line 1458-1465: The code directly manipulates the private
tqdm._instances list; replace that fragility by first checking that tqdm exposes
a public instances container and that it is iterable, e.g., guard with
hasattr(tqdm, "_instances") and isinstance(tqdm._instances, (list, tuple, set))
before iterating, and for each item use safe getattr checks (hasattr/getattr) to
call close()/clear()/set leave=False only if those attributes exist;
alternatively, prefer storing and closing your own tqdm objects where they are
created (call instance.close()) instead of touching tqdm._instances—apply this
guarded approach to the blocks that reference tqdm._instances (the try/import
blocks that currently set instance.leave, instance.clear, instance.close).

In `@api/base.py`:
- Line 939: The loop using zip(questions["questions"], answers) can silently
drop items when lengths differ; update the iteration to either use zip(...,
strict=True) (if running on Python 3.10+) or add an explicit length assertion
like checking len(answers) == len(questions["questions"]) and raising/logging a
clear error before the loop so mismatches are detected; adjust the code around
the zip call that iterates over questions["questions"] and answers accordingly.

---

Outside diff comments:
In `@main.py`:
- Around line 350-366: The NOT_OPEN branch currently never increments task.tries
so the max_tries check in the same case never triggers; restore or add the
increment of task.tries (e.g., increment task.tries before putting the task into
self.retry_queue) inside the case handling ChapterResult.NOT_OPEN, so tasks get
counted and will hit self.max_tries and follow the error/continue path; ensure
you update the same branch that checks self.config["notopen_action"] and
references task.course/point and self.retry_queue so behavior remains
consistent.
- Around line 72-74: The help text for parser.add_argument("-j", "--jobs") is
stale: it claims task points within a chapter are not concurrency-limited, but
process_chapter uses the jobs value sequentially (see process_chapter and the
jobs variable). Update the help string passed to parser.add_argument to
accurately describe what jobs controls (e.g., "number of chapters processed
concurrently; task points within each chapter are processed sequentially") so
users are not misled; change the message near parser.add_argument accordingly.

---

Nitpick comments:
In `@api/answer_check.py`:
- Around line 28-31: Remove the redundant direct membership checks against
true_list and false_list in the boolean parsing block: since val is already
lowercased (val = str(answer).strip().lower()), only test against the hardcoded
lowercase tokens and the lowercased list comprehensions ([x.lower() for x in
true_list] and [x.lower() for x in false_list]); remove the trailing "or val in
true_list" and "or val in false_list" parts to simplify the condition while
keeping the checks against true_list and false_list via their lowercased
versions; update the conditions around the variable val and the lists
true_list/false_list accordingly.
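
A small sketch of the simplified check, under the assumption that the value is lowercased once and compared only against lowercase tokens. The built-in token sets and the function name `parse_judgement` are illustrative, not the project's actual ones.

```python
from typing import Iterable, Optional

# Illustrative built-in tokens; the real project may use a different set.
_TRUE_TOKENS = {"true", "正确", "对", "√"}
_FALSE_TOKENS = {"false", "错误", "错", "×"}


def parse_judgement(
    answer: object,
    true_list: Iterable[str] = (),
    false_list: Iterable[str] = (),
) -> Optional[bool]:
    """Map a judgement answer to True/False, or None if unrecognized."""
    val = str(answer).strip().lower()
    # Lowercasing the configured lists makes a raw membership test redundant.
    if val in _TRUE_TOKENS or val in {x.lower() for x in true_list}:
        return True
    if val in _FALSE_TOKENS or val in {x.lower() for x in false_list}:
        return False
    return None
```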

In `@api/answer.py`:
- Around line 213-217: The is_manual detection logic is duplicated in query and
query_all; extract it into a single helper (e.g., a property or method) on the
class (for example def is_manual_mode(self) or `@property` is_manual_mode) that
implements the existing logic (check getattr(self, 'is_manual', False),
self.__class__.__name__ == 'TikuManual', and the TikuFallback/providers scan for
provider.is_manual or provider.__class__.__name__ == 'TikuManual'); then replace
the inline expressions in query and query_all with a call to that helper (use
the helper name consistently in both methods) so the detection is centralized
and consistent across the class.
- Around line 1246-1283: The check_llm_connection method currently holds
self._lock for the entire operation including network I/O (client creation and
client.chat.completions.create), which can block other threads; change it to
acquire the lock only for updating/sharing local state (e.g., protect access to
self.last_request_time and any shared flags) and perform the network call
outside the lock, or introduce a separate lock (e.g., _conn_check_lock) for this
health check; also set a network timeout on the HTTP client
(httpx.Client(timeout=...)) or the OpenAI client so slow/unresponsive calls fail
fast; keep references: check_llm_connection, self._lock,
self._wait_for_interval, self.last_request_time, client.chat.completions.create.
- Around line 474-510: The loop in TikuFallback._query_all declares an unused
variable sub_idx via enumerate; remove the unused variable by changing the loop
header to iterate directly over zip(pending_indices, sub_results) (e.g., for
orig_idx, ans in zip(...)) or use an explicit throwaway name (_) instead of
sub_idx; update the loop in the _query_all method so no unused local remains and
ensure logging and results assignment still use orig_idx and ans as before.
- Around line 143-144: true_list and false_list are defined as mutable
class-level defaults; change them to None and initialize them to empty lists (or
from config) inside the class constructor or in init_tiku to avoid
shared-mutable-state. Specifically, replace the class-level defaults for
true_list and false_list with None, then in __init__ (or at the start of
init_tiku) set self.true_list = config.get(...) or [] and self.false_list =
config.get(...) or [] so each instance gets its own lists; reference the
true_list, false_list attributes and the init_tiku method when making the
change.
- Line 1600: The list comprehension creating invalid_letters uses an ambiguous
single-letter loop variable "l"; rename it to "letter" (e.g., change
invalid_letters = [l for l in letters if l not in valid_keys] to use "letter")
and update the same rename in the other occurrences around the block (references
at the spots analogous to lines 1648 and 1650) so all uses of the loop variable
and any references inside comprehensions or small loops consistently use
"letter" for clarity.

In `@api/captcha.py`:
- Line 87: The constructor in api/captcha.py currently uses "self.ocr = ocr if
ocr else ocr_init()", which treats an explicit ocr=None (and any other falsy
value) as a signal to initialize OCR. To distinguish an omitted parameter from
an explicit None, use a sentinel default for the constructor parameter and call
ocr_init() only when the sentinel is seen. Note that an identity check alone
("ocr if ocr is not None else ocr_init()") stops falsy OCR objects from being
replaced but still initializes OCR on an explicit None. Ensure references to
ocr_init() and the constructor's self.ocr assignment are updated accordingly.
- Around line 36-38: The condition in the captcha availability check is
redundant: replace the compound check "if not HAS_DDDDOCR or DdddOcr is None"
with a single "if not HAS_DDDDOCR" (keeping the existing logger.warning and
return None) so you rely on the earlier initialization guarantee that DdddOcr is
set to None when HAS_DDDDOCR is False; update the conditional in the block
containing HAS_DDDDOCR, DdddOcr, and logger.warning to remove the redundant
DdddOcr reference (no other behavior changes to ensure).
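
For the constructor item above, a sentinel default is one way to tell "argument omitted" apart from "explicitly None". This is a sketch only: `CaptchaSketch` and the body of `ocr_init` are hypothetical stand-ins, not the project's classes.

```python
_UNSET = object()  # unique marker meaning "caller passed nothing"


def ocr_init() -> str:
    """Hypothetical stand-in for the project's OCR factory."""
    return "default-ocr"


class CaptchaSketch:
    def __init__(self, ocr: object = _UNSET) -> None:
        # Initialize OCR only when the argument was omitted entirely.
        # An explicit ocr=None is preserved and means "OCR disabled".
        self.ocr = ocr_init() if ocr is _UNSET else ocr
```

With this shape, `CaptchaSketch()` gets a default OCR instance, while `CaptchaSketch(ocr=None)` keeps OCR switched off.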

In `@api/font_decoder.py`:
- Around line 40-43: The try/except around BeautifulSoup(html_content, "lxml")
should stop catching all Exceptions; change the except clause to only handle
parser- or backend-related failures (e.g., ImportError/ModuleNotFoundError if
lxml is missing, LookupError/bs4.FeatureNotFound if the "lxml" parser isn't
available, and lxml.etree.ParserError for parsing errors) so that unrelated bugs
aren't masked; import the specific ParserError from lxml.etree and
bs4.FeatureNotFound and update the except to something like except (ImportError,
LookupError, FeatureNotFound, LxmlParserError) and fall back to
BeautifulSoup(html_content, "html.parser") using the existing soup/html_content
symbols.

In `@api/logger.py`:
- Around line 11-30: tqdm_sink currently appends messages to the global
log_buffer while TikuManual._manual_lock is held, which can grow unbounded;
modify tqdm_sink to enforce a bounded buffer (e.g., add a MAX_BUFFER_SIZE
constant) and when log_buffer length exceeds that cap either drop oldest entries
(pop(0)) or rotate/trim, and optionally trigger a flush to tqdm_stream; update
references to log_buffer and manual_locked in tqdm_sink (and ensure
thread-safety if needed) so that when TikuManual._manual_lock is active the
buffer never grows past the configured limit and heavy logging won’t exhaust
memory.
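
For the log-buffer item above, `collections.deque` with `maxlen` gives the drop-oldest behavior without manual trimming. The class name `BoundedLogBuffer` and the default cap are assumptions for illustration.

```python
from collections import deque
from typing import Callable


class BoundedLogBuffer:
    """Hold at most max_size messages, discarding the oldest when full."""

    def __init__(self, max_size: int = 1000) -> None:
        # deque(maxlen=...) silently drops from the left on overflow.
        self._items: deque = deque(maxlen=max_size)

    def append(self, message: str) -> None:
        self._items.append(message)

    def flush(self, write: Callable[[str], object]) -> int:
        """Write buffered messages in order and return how many were written."""
        count = 0
        while self._items:
            write(self._items.popleft())
            count += 1
        return count
```

Individual `append` and `popleft` calls on a deque are thread-safe in CPython, but a flush that must not interleave with new messages would still need an external lock.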

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0f69359d-6e67-49be-b0d3-555fe1f3814f

📥 Commits

Reviewing files that changed from the base of the PR and between 6ca38cb and 589e0bc.

📒 Files selected for processing (13)
  • api/answer.py
  • api/answer_check.py
  • api/base.py
  • api/captcha.py
  • api/cookies.py
  • api/decode.py
  • api/font_decoder.py
  • api/live.py
  • api/live_process.py
  • api/logger.py
  • config_template.ini
  • main.py
  • requirements.txt

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/answer.py (1)

1486-1492: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Acquire _manual_lock before clearing tqdm bars.

Line 1488 and Line 1497 clear existing bars before the manual lock is held. study_video() only suppresses/recreates bars while TikuManual._manual_lock is locked, so another thread can recreate a progress bar in that gap and pollute the prompt again. Make the cleanup and prompt handoff atomic under the same lock.

Proposed fix
     def _query(self, q_info: dict) -> Optional[str]:
-        # Force-close and clear all currently active tqdm progress bars
-        self._safe_close_tqdm_bars()
-
         with self._manual_lock:
+            # Acquire the lock before cleanup so other threads cannot recreate a progress bar in the gap
+            self._safe_close_tqdm_bars()
             ans = self._single_query(q_info)
         logger.debug("Manual answering finished; flushing buffered logs")
         return ans

     def _query_all(self, q_list: list[dict], query_delay: float = 0.0) -> list[Optional[str]]:
-        # Force-close and clear all currently active tqdm progress bars
-        self._safe_close_tqdm_bars()
-
         with self._manual_lock:
+            # Acquire the lock before cleanup so other threads cannot recreate a progress bar in the gap
+            self._safe_close_tqdm_bars()
             print(f"\n{'='*20} Manual-input question bank ({len(q_list)} questions total) {'='*20}")
             if self.default_mode == 'batch':
                 ans_list = self._batch_query_flow(q_list)

Also applies to: 1495-1500

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/answer.py` around lines 1486 - 1492, The cleanup of tqdm bars must happen
while holding the manual lock to prevent races; move the call to
self._safe_close_tqdm_bars() to be executed inside the critical section guarded
by self._manual_lock so the sequence (acquire self._manual_lock, clear bars,
call self._single_query) is atomic in _query; apply the same change for the
other occurrence around lines 1495-1500 (ensure any call that clears or
suppresses/creates bars e.g., study_video-related cleanup runs only after
acquiring TikuManual._manual_lock) and keep logger.debug after the critical
section as appropriate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f9afe232-53b0-480a-8638-06d8941510a1

📥 Commits

Reviewing files that changed from the base of the PR and between 589e0bc and 8fc2e28.

📒 Files selected for processing (2)
  • api/answer.py
  • api/base.py

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
api/answer.py (1)

1602-1624: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't parse arbitrary option text as option letters.

These branches treat every Latin character in ans as a choice key. In manual mode, inputs like DNA复制 ("DNA replication"), HTTP, or TCP are rejected or normalized into letters even when they exactly match the option content, so option-text entry is unreliable for technical questions. Take the “letter answer” path only when the whole input is a pure option-key pattern; otherwise fall back to text matching.

Suggested fix
+    @staticmethod
+    def _extract_option_letters(ans: str) -> list[str]:
+        cleaned = re.sub(r'[\s,,;;、]+', '', ans)
+        if not cleaned or not re.fullmatch(r'[A-Za-z]+', cleaned):
+            return []
+        return [c.upper() for c in cleaned]
+
     def _validate_user_input(self, ans: str, q: dict) -> tuple[bool, str]:
         ...
-                letters = [c.upper() for c in ans if re.match(r'[A-Za-z]', c)]
+                letters = self._extract_option_letters(ans)
                 if not letters:
                     ...
 
     def _normalize_user_input(self, ans: str, q: dict) -> Optional[str]:
         ...
-            letters = [c.upper() for c in ans if re.match(r'[A-Za-z]', c)]
+            letters = self._extract_option_letters(ans)
             if letters and all(letter in valid_keys for letter in letters):
                 ...

Also applies to: 1666-1672

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/answer.py` around lines 1602 - 1624, The code treats any Latin letter
inside ans as an option key; change it so you only take the “letter answer” path
when the entire input matches a pure option-key pattern (e.g., single letter or
a list of letters / letter separators) rather than any Latin chars embedded in
text. Concretely, in the block using valid_keys and computing letters, first
test ans against a regex like /^\s*[A-Za-z](\s*[,、;.;:]?\s*[A-Za-z])*\s*$/ (or
an equivalent stricter pattern) and only then split to letters; otherwise skip
the letters branch and perform the text matching logic (the cut(...) / parts
matching) as the default. Apply the same guard to the duplicate logic that
appears around the other branch (the similar code handling at the later location
mentioned) so technical tokens like "DNA复制" or "HTTP" are treated as text
matches, not option letters.
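
A standalone, runnable version of the guard described above, assuming separators are whitespace plus ASCII and full-width commas, semicolons, and the ideographic comma. The function name mirrors the suggested fix; the exact separator set is an assumption.

```python
import re

# Separators allowed between option letters (assumed set).
_SEPARATORS = re.compile(r"[\s,,;;、]+")


def extract_option_letters(ans: str) -> list[str]:
    """Return uppercase option letters only if the whole input is letters."""
    cleaned = _SEPARATORS.sub("", ans)
    # Any non-letter character means the input is option text, not keys.
    if not cleaned or not re.fullmatch(r"[A-Za-z]+", cleaned):
        return []
    return [c.upper() for c in cleaned]
```

A pure-letter token such as `HTTP` still yields letters here, so the caller must keep checking them against the question's valid keys and fall back to text matching when any letter is out of range.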
api/base.py (1)

781-788: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate refreshed video metadata back into the loop state.

Line 784 stores the refreshed duration in _duration, but later video_progress_log(...) calls and the recreated progress bar still use duration. play_time is also taken from refreshed_meta without the same int(...) normalization as the initial fetch. After a 403 recovery, the next retry can keep signing requests with stale timing data or hit numeric ops on string payloads.

🛠️ Proposed fix
-                            _dtoken = refreshed_meta["dtoken"]
-                            _duration = refreshed_meta["duration"]
-                            play_time = refreshed_meta.get("playTime", play_time)
+                            _dtoken = refreshed_meta["dtoken"]
+                            duration = int(refreshed_meta["duration"])
+                            refreshed_play_time = refreshed_meta.get("playTime")
+                            if refreshed_play_time is not None:
+                                play_time = int(refreshed_play_time)

-                            logger.debug("Refreshed token: {}, duration: {}, play time: {}", _dtoken, _duration, play_time)
+                            logger.debug("Refreshed token: {}, duration: {}, play time: {}", _dtoken, duration, play_time)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/base.py` around lines 781 - 788, After recovering refreshed_meta in the
forbidden-recovery branch, propagate the refreshed values into the loop state so
later calls use them: assign the refreshed dtoken and duration back to the
variables used by the loop (e.g., set the loop's dtoken/duration from
_dtoken/_duration) and normalize play_time with int(...) (use
int(refreshed_meta.get("playTime", play_time))). Ensure subsequent calls like
video_progress_log(...) and recreation of the progress bar via
_close_pbar_safe(...) use the updated duration/play_time variables rather than
the stale original duration.
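
The normalization step can be isolated into a small pure function, sketched below. `apply_refreshed_meta` is a hypothetical name; the keys `dtoken`, `duration`, and `playTime` are the ones shown in the proposed fix.

```python
from typing import Any, Mapping


def apply_refreshed_meta(
    refreshed_meta: Mapping[str, Any],
    play_time: int,
) -> tuple[str, int, int]:
    """Return (dtoken, duration, play_time) with numeric fields as ints."""
    dtoken = refreshed_meta["dtoken"]
    # The payload may carry numbers as strings, so normalize explicitly.
    duration = int(refreshed_meta["duration"])
    refreshed_play_time = refreshed_meta.get("playTime")
    if refreshed_play_time is not None:
        play_time = int(refreshed_play_time)
    return dtoken, duration, play_time
```

Assigning the returned tuple back to the loop's own variables is what keeps later progress logging and the recreated progress bar on the refreshed values.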
🧹 Nitpick comments (2)
api/answer_check.py (1)

47-47: ⚡ Quick win

Prefer an explicit capability flag over class-name matching.

tiku.__class__.__name__ in [...] is brittle here: a rename, subclass, or wrapper around TikuFallback will silently lose the bypass and reintroduce the false-negative filtering this change is trying to avoid. TikuManual is already covered by is_manual, so the remaining fallback case should use an instance/base-class flag instead of a string comparison.

Suggested change
-    if getattr(tiku, 'is_manual', False) or tiku.__class__.__name__ in ['TikuManual', 'TikuFallback']:
+    if getattr(tiku, 'is_manual', False) or getattr(tiku, 'skip_answer_validation', False):
         return True

Then set skip_answer_validation = True on TikuFallback in api/answer.py.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/answer_check.py` at line 47, Replace the brittle class-name check with an
explicit capability flag: change the condition "if getattr(tiku, 'is_manual',
False) or tiku.__class__.__name__ in ['TikuManual', 'TikuFallback']" to check
for the flag, e.g. "if getattr(tiku, 'is_manual', False) or getattr(tiku,
'skip_answer_validation', False)". Then set skip_answer_validation = True on the
TikuFallback implementation (in the class definition where TikuFallback is
declared) so the fallback instance opts out of answer validation.
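
A sketch of the capability-flag approach, using simplified stand-in classes rather than the project's real ones. `should_skip_validation` and `WrappedFallback` are hypothetical names; the attribute names come from the suggestion above.

```python
class Tiku:
    is_manual = False
    skip_answer_validation = False


class TikuManual(Tiku):
    is_manual = True


class TikuFallback(Tiku):
    skip_answer_validation = True


class WrappedFallback(TikuFallback):
    """A subclass inherits the flag, unlike a class-name comparison."""


def should_skip_validation(tiku: object) -> bool:
    # getattr with a default keeps this safe for objects lacking the flags.
    return bool(
        getattr(tiku, "is_manual", False)
        or getattr(tiku, "skip_answer_validation", False)
    )
```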
api/base.py (1)

604-615: ⚡ Quick win

Skip captcha retries when OCR is explicitly unavailable.

CxCaptcha(..., ocr=None) now means “OCR disabled”, so this three-attempt loop cannot succeed in that state. It just adds a deterministic delay before returning the original 403, which makes the recovery path feel much slower whenever ddddocr is missing or failed to initialize.

♻️ Proposed fix
                     captcha_solver = CxCaptcha(user_agent=ua, cookies=cookies_str, ocr=ocr_inst)
+                    if captcha_solver.ocr is None:
+                        logger.warning("OCR unavailable; skipping automatic captcha retries.")
+                        return res
                     solved = False
                     for attempt in range(3):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/base.py` around lines 604 - 615, The captcha retry loop wastes time when
OCR is disabled because CxCaptcha was created with ocr=None; update the logic
around CxCaptcha/captcha_solver to detect when ocr_inst is None (or
captcha_solver.ocr is falsy) and skip the 3-attempt retry: either attempt a
single try_pass (or skip trying and treat as unsolvable) and immediately proceed
to the 403/error path, rather than sleeping and retrying three times; modify the
block around CxCaptcha, captcha_solver.try_pass, and the solved flag so behavior
is fast and deterministic when OCR is explicitly unavailable.
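
The early exit can be sketched as below. All names here (`solve_captcha`, the `try_pass` callable, the `attempts` parameter) are hypothetical, and the sleeps between attempts in the real loop are omitted for brevity.

```python
from typing import Callable, Optional


def solve_captcha(
    ocr: Optional[object],
    try_pass: Callable[[], bool],
    attempts: int = 3,
) -> bool:
    """Return True if the captcha was solved; skip retries without OCR."""
    if ocr is None:
        # OCR is explicitly disabled, so no attempt can succeed.
        return False
    for _ in range(attempts):
        if try_pass():
            return True
    return False
```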
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/answer.py`:
- Around line 1266-1280: Health-check calls are bypassing the per-provider
serialization so they can run concurrently with _query(); modify
AI.check_llm_connection() and SiliconFlow.check_llm_connection() to acquire the
same instance lock (self._lock) and perform the same throttling sequence used in
_query() — call self._wait_for_interval() while holding the lock, update
self.last_request_time = time.time() inside that lock, then release before
making the network call — so health checks are serialized with normal LLM
requests and cannot bypass throttling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: daa6e25e-fb17-4e89-9a4c-f62d904a3976

📥 Commits

Reviewing files that changed from the base of the PR and between 8fc2e28 and ca6cb70.

📒 Files selected for processing (6)
  • api/answer.py
  • api/answer_check.py
  • api/base.py
  • api/captcha.py
  • api/font_decoder.py
  • api/logger.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • api/font_decoder.py
  • api/logger.py

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
api/answer.py (1)

1396-1424: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

SiliconFlow.check_llm_connection doesn't respect min_interval like AI does.

The AI.check_llm_connection calls _wait_for_interval() and updates last_request_time before making the API call (lines 1269-1270). However, SiliconFlow.check_llm_connection acquires the lock but doesn't enforce the interval timing, which could still trigger rate limiting if called shortly after a query.

🛡️ Proposed fix to add interval handling
     def check_llm_connection(self) -> bool:
         """
         Check whether the SiliconFlow LLM connection is available.
         Send a simple test request to verify the API configuration.
         """
         with self._lock:
             logger.info(f'Checking {self.name} connection...')
             try:
+                # Respect min_interval to avoid rate limiting
+                if self.last_request_time:
+                    interval = time.time() - self.last_request_time
+                    if interval < self.min_interval:
+                        time.sleep(self.min_interval - interval)
+
                 headers = {
                     'Authorization': f'Bearer {self.api_key}',
                     'Content-Type': 'application/json'
                 }
                 # ... rest of the code ...
                 response = requests.post(
                     self.api_endpoint,
                     headers=headers,
                     json=payload,
                     timeout=30
                 )
+                self.last_request_time = time.time()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/answer.py` around lines 1396 - 1424, SiliconFlow.check_llm_connection
currently acquires the lock and makes the request without respecting the
min_interval; update this method to call the same interval enforcement used by
AI.check_llm_connection: invoke _wait_for_interval() (or the shared interval
helper) while holding the lock, and update last_request_time before performing
the requests.post call so the min_interval is enforced and rate-limiting is
avoided; reference SiliconFlow.check_llm_connection, _wait_for_interval,
last_request_time, and min_interval when applying the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b6222957-9112-4a99-b273-9bb842ab53d2

📥 Commits

Reviewing files that changed from the base of the PR and between ca6cb70 and 0acedc7.

📒 Files selected for processing (3)
  • api/answer.py
  • api/answer_check.py
  • api/base.py

@Zropk66 Zropk66 force-pushed the feat/manual-mode branch from ea8d8ab to 57ac976 Compare June 2, 2026 06:31