5 changes: 2 additions & 3 deletions CONTRIBUTING.md
@@ -87,7 +87,7 @@ git config user.email "SOME_EMAIL@example.com"
Playwright/Puppeteer integration and main web e2e coverage. The package name is `@midscene/web`.
- `packages/shared`: shared utilities used across the monorepo. The package name is `@midscene/shared`.
- `packages/{android,ios,computer,harmony}`: platform runtimes. Matching
`*-mcp` and `*-playground` packages live alongside them. The package names are `@midscene/android`, `@midscene/ios`, `@midscene/computer`, `@midscene/harmony`.
`*-playground` packages live alongside them. The package names are `@midscene/android`, `@midscene/ios`, `@midscene/computer`, `@midscene/harmony`.
- `packages/visualizer` and `apps/report`: report rendering and viewer UI.
- `apps/site`: documentation site. The Nx project name is `doc`, not `site`.
- `apps/chrome-extension`, `apps/playground`, `apps/report`,
@@ -288,14 +288,13 @@ Every commit **must** include a scope. The scope must be one of the following:
* `llm`
* `playwright`
* `puppeteer`
* `mcp`
* `bridge`
* *(All top-level directories in the apps and packages directories)*
* *(Consider adding other relevant top-level packages or areas here if needed)*

**Examples:**

* `feat(mcp): add screenshot tool with element selection`
* `feat(bridge): add screenshot tool with element selection`
* `fix(android): correct adb connection issue on windows`
* `refactor(llm): simplify prompt generation logic`
* `chore(workflow): update commitlint configuration`
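
The scope rule above can be checked mechanically. The sketch below is a hypothetical TypeScript helper, not the repository's actual commitlint setup. `checkCommitScope`, `HEADER_PATTERN` and the example allow-list are assumptions for illustration; the real list also includes every top-level directory under `apps` and `packages`.

```typescript
// Hypothetical helper (not part of the Midscene repo): checks that a commit
// header follows `type(scope): subject` and that the scope is allowed.
// An optional `!` (breaking-change marker) before the colon is accepted.
const HEADER_PATTERN = /^(\w+)\(([^)]+)\)!?: (.+)$/;

interface ScopeCheckResult {
  ok: boolean;
  reason?: string;
}

function checkCommitScope(
  header: string,
  allowedScopes: ReadonlySet<string>,
): ScopeCheckResult {
  const match = HEADER_PATTERN.exec(header);
  if (!match) {
    // Either the `(scope)` segment is missing or the header is malformed.
    return { ok: false, reason: 'missing scope' };
  }
  const scope = match[2];
  if (!allowedScopes.has(scope)) {
    return { ok: false, reason: `unknown scope "${scope}"` };
  }
  return { ok: true };
}

// Illustrative allow-list: the named scopes above plus directory-derived
// scopes assumed for this example (`android`, `workflow`).
const exampleScopes = new Set<string>([
  'llm',
  'playwright',
  'puppeteer',
  'bridge',
  'android',
  'workflow',
]);

console.log(
  checkCommitScope('fix(android): correct adb connection issue on windows', exampleScopes),
); // { ok: true }
```

With `mcp` dropped from the list, a header such as `feat(mcp): ...` would be reported as an unknown scope by this sketch.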
5 changes: 2 additions & 3 deletions README.md
@@ -40,7 +40,6 @@ Use [Midscene Skills](https://github.com/web-infra-dev/midscene-skills) to contr
* [iOS Automation - Auto-like the first @midscene_ai tweet](https://midscenejs.com/showcases#ios)
* [Android Automation - DCar: Xiaomi SU7 specs](https://midscenejs.com/showcases#android)
* [Android Automation - Booking a hotel for Christmas](https://midscenejs.com/showcases#android)
* [MCP Integration - Midscene MCP UI prepatch release](https://midscenejs.com/showcases#mcp)
* [robotic arm + vision + voice for in-vehicle testing](https://midscenejs.com/showcases#community-showcases)

## 💡 Why Midscene
@@ -50,13 +49,13 @@ Most UI automation — including AI tools that read the DOM or the accessibility
- **Less maintenance** — no selectors to chase when the UI changes.
- **Reach every element and surface** — if a human can see it, Midscene can target it, even with no semantic annotations, on `<canvas>`, native apps, and cross-origin iframes.
- **Assert what users actually see** — verify colors, highlights, layout, and rendered state, not just whether a DOM node exists.
- **Two ways to test** — add Midscene to your [Playwright](https://midscenejs.com/integrate-with-playwright) / Vitest suite, or let an AI agent test autonomously via [Skills](https://midscenejs.com/skills) and [MCP](https://midscenejs.com/mcp).
- **Two ways to test** — add Midscene to your [Playwright](https://midscenejs.com/integrate-with-playwright) / Vitest suite, or let an AI agent test autonomously via [Skills](https://midscenejs.com/skills).

Midscene is built for UI testing first, but the same vision-driven engine handles any UI automation task.

## 💡 What you can automate

Midscene works anywhere you can take a screenshot — web browsers, Android, iOS, HarmonyOS, desktop apps, and [any custom interface](https://midscenejs.com/integrate-with-any-interface) — all through one API. Write automation with the JavaScript SDK or in YAML, hand it to AI agents via [Skills](https://midscenejs.com/skills) and [MCP](https://midscenejs.com/mcp), and look up every method (`aiAct`, `aiQuery`, `aiAssert`, and more) in the [API reference](https://midscenejs.com/api).
Midscene works anywhere you can take a screenshot — web browsers, Android, iOS, HarmonyOS, desktop apps, and [any custom interface](https://midscenejs.com/integrate-with-any-interface) — all through one API. Write automation with the JavaScript SDK or in YAML, hand it to AI agents via [Skills](https://midscenejs.com/skills), and look up every method (`aiAct`, `aiQuery`, `aiAssert`, and more) in the [API reference](https://midscenejs.com/api).

## 🚀 Get started

5 changes: 2 additions & 3 deletions README.zh.md
@@ -40,7 +40,6 @@
* [iOS 自动化 - 自动点赞 @midscene_ai 的第一条推文](https://midscenejs.com/zh/showcases#ios)
* [Android 自动化 - 懂车帝:查看小米 SU7 参数](https://midscenejs.com/zh/showcases#android)
* [Android 自动化 - 预订圣诞节酒店](https://midscenejs.com/zh/showcases#android)
* [MCP 集成 - Midscene MCP UI prepatch 版本发布](https://midscenejs.com/zh/showcases#mcp)
* [车机测试中的机械臂 + 视觉 + 语音方案](https://midscenejs.com/zh/showcases#community-showcases)

## 💡 为什么选择 Midscene
@@ -52,15 +51,15 @@ Midscene 仅凭截图工作。你只需用自然语言描述每一步:
- **更低的维护成本**:UI 变化时,无需再追着改选择器。
- **触达每个元素与界面**:只要人眼能看到,Midscene 就能定位。即使元素没有语义化标注,或位于 `<canvas>`、原生应用、跨域 iframe 上,也可以定位。
- **校验用户真正看到的效果**:验证颜色、高亮、布局与渲染状态,而不只是判断 DOM 节点是否存在。
- **两种测试方式**:接入你的 [Playwright](https://midscenejs.com/zh/integrate-with-playwright) / Vitest 测试,或让 AI Agent 通过 [Skills](https://midscenejs.com/zh/skills) 与 [MCP](https://midscenejs.com/zh/mcp) 自主测试。
- **两种测试方式**:接入你的 [Playwright](https://midscenejs.com/zh/integrate-with-playwright) / Vitest 测试,或让 AI Agent 通过 [Skills](https://midscenejs.com/zh/skills) 自主测试。

Midscene 首先为 UI 测试而生,但同一套视觉驱动引擎也能胜任任意 UI 自动化任务。

## 💡 能自动化什么

只要能截图,Midscene 就能工作。Web 浏览器、Android、iOS、HarmonyOS、桌面应用,以及[任意自定义界面](https://midscenejs.com/zh/integrate-with-any-interface),全部通过同一套 API。

你可以用 JavaScript SDK 或 YAML 编写自动化,也可以通过 [Skills](https://midscenejs.com/zh/skills) 与 [MCP](https://midscenejs.com/zh/mcp) 交给 AI Agent。所有方法都可以在 [API 参考](https://midscenejs.com/zh/api) 中查阅,包括 `aiAct`、`aiQuery` 和 `aiAssert`。
你可以用 JavaScript SDK 或 YAML 编写自动化,也可以通过 [Skills](https://midscenejs.com/zh/skills) 交给 AI Agent。所有方法都可以在 [API 参考](https://midscenejs.com/zh/api) 中查阅,包括 `aiAct`、`aiQuery` 和 `aiAssert`。

## 🚀 开始使用

2 changes: 1 addition & 1 deletion apps/chrome-extension/src/scripts/worker.ts
@@ -27,7 +27,7 @@ self.addEventListener('unhandledrejection', (event) => {
console.error('[ServiceWorker] Unhandled promise rejection:', event.reason);
});

// Background Bridge for MCP connection
// Background Bridge for external automation connections
const BRIDGE_PERMISSION_KEY = 'midscene_bridge_permission';
const BRIDGE_STOPPED_KEY = 'midscene_bridge_stopped';
let backgroundBridge: BridgeConnector | null = null;
37 changes: 18 additions & 19 deletions apps/report/rsbuild.config.ts
@@ -7,6 +7,13 @@ import { pluginNodePolyfill } from '@rsbuild/plugin-node-polyfill';
import { pluginReact } from '@rsbuild/plugin-react';
import { pluginSvgr } from '@rsbuild/plugin-svgr';
import { pluginWorkspaceDev } from 'rsbuild-plugin-workspace-dev';
import {
buildReportTemplateInjection,
isReportTemplateInjectableFile,
reportTemplateMagicString,
reportTemplateReplacedMark,
reportTemplateReplacementRegExp,
} from '../../scripts/report-template-utils.mjs';
import {
commonIgnoreWarnings,
createTypeCheckPlugin,
@@ -35,20 +42,14 @@ const copyReportTemplate = () => ({
onAfterBuild: (arg0: ({ compiler }: { compiler: any }) => void) => void;
}) {
api.onAfterBuild(({ compiler }) => {
const magicString = 'REPLACE_ME_WITH_REPORT_HTML';
const replacedMark = '/*REPORT_HTML_REPLACED*/';
const regExpForReplace = /\/\*REPORT_HTML_REPLACED\*\/.*/;

// read the template file
const srcPath = path.join(__dirname, 'dist', 'index.html');
const tplFileContent = fs
.readFileSync(srcPath, 'utf-8')
.replaceAll(magicString, '');
const { sanitizedTplFileContent, finalContent } =
buildReportTemplateInjection(fs.readFileSync(srcPath, 'utf-8'));
assert(
!tplFileContent.includes(magicString),
!sanitizedTplFileContent.includes(reportTemplateMagicString),
'magic string should not be in the template file',
);
const finalContent = `${replacedMark}${JSON.stringify(tplFileContent)}`;

// find the core package
const corePkgDir = path.join(__dirname, '..', '..', 'packages', 'core');
@@ -65,35 +66,33 @@ const copyReportTemplate = () => ({
const jsFiles = fs.readdirSync(corePkgDistDir, { recursive: true });
let replacedCount = 0;
for (const file of jsFiles) {
if (
typeof file === 'string' &&
(file.endsWith('.js') || file.endsWith('.mjs'))
) {
if (isReportTemplateInjectableFile(file)) {
const filePath = path.join(corePkgDistDir, file.toString());
const fileContent = fs.readFileSync(filePath, 'utf-8');
if (fileContent.includes(replacedMark)) {
if (fileContent.includes(reportTemplateReplacedMark)) {
assert(
regExpForReplace.test(fileContent),
reportTemplateReplacementRegExp.test(fileContent),
'a replaced mark is found but cannot match',
);

const replacedContent = fileContent.replace(
regExpForReplace,
reportTemplateReplacementRegExp,
() => finalContent,
);
fs.writeFileSync(filePath, replacedContent);
replacedCount++;
console.log(`Template updated in file ${filePath}`);
} else if (fileContent.includes(magicString)) {
} else if (fileContent.includes(reportTemplateMagicString)) {
const magicStringCount = (
fileContent.match(new RegExp(magicString, 'g')) || []
fileContent.match(new RegExp(reportTemplateMagicString, 'g')) ||
[]
).length;
assert(
magicStringCount === 1,
'magic string shows more than once in the file, cannot process',
);
const replacedContent = fileContent.replace(
`'${magicString}'`,
`'${reportTemplateMagicString}'`,
() => finalContent, // there are some $- code in the tpl, so we have to use a function as the second argument
);
fs.writeFileSync(filePath, replacedContent);
37 changes: 18 additions & 19 deletions apps/report/scripts/inject-report-template.mjs
@@ -3,30 +3,32 @@ import assert from 'node:assert';
import fs from 'node:fs';
import path from 'node:path';
import { fileURLToPath } from 'node:url';
import {
buildReportTemplateInjection,
isReportTemplateInjectableFile,
reportTemplateMagicString,
reportTemplateReplacedMark,
reportTemplateReplacementRegExp,
} from '../../../scripts/report-template-utils.mjs';

const __dirname = path.dirname(fileURLToPath(import.meta.url));
const reportRoot = path.resolve(__dirname, '..');
const repoRoot = path.resolve(reportRoot, '..', '..');

const magicString = 'REPLACE_ME_WITH_REPORT_HTML';
const replacedMark = '/*REPORT_HTML_REPLACED*/';
const regExpForReplace = /\/\*REPORT_HTML_REPLACED\*\/.*/;

const srcPath = path.join(reportRoot, 'dist', 'index.html');
if (!fs.existsSync(srcPath)) {
throw new Error(
`Report template not found at ${srcPath}. Run "nx build @midscene/report" first.`,
);
}

const tplFileContent = fs
.readFileSync(srcPath, 'utf-8')
.replaceAll(magicString, '');
const { sanitizedTplFileContent, finalContent } = buildReportTemplateInjection(
fs.readFileSync(srcPath, 'utf-8'),
);
assert(
!tplFileContent.includes(magicString),
!sanitizedTplFileContent.includes(reportTemplateMagicString),
'magic string should not be in the template file',
);
const finalContent = `${replacedMark}${JSON.stringify(tplFileContent)}`;

const corePkgDir = path.join(repoRoot, 'packages', 'core');
const corePkgJson = JSON.parse(
@@ -41,34 +43,31 @@ const corePkgDistDir = path.join(corePkgDir, 'dist');
const jsFiles = fs.readdirSync(corePkgDistDir, { recursive: true });
let replacedCount = 0;
for (const file of jsFiles) {
if (
typeof file === 'string' &&
(file.endsWith('.js') || file.endsWith('.mjs'))
) {
if (isReportTemplateInjectableFile(file)) {
const filePath = path.join(corePkgDistDir, file);
const fileContent = fs.readFileSync(filePath, 'utf-8');
if (fileContent.includes(replacedMark)) {
if (fileContent.includes(reportTemplateReplacedMark)) {
assert(
regExpForReplace.test(fileContent),
reportTemplateReplacementRegExp.test(fileContent),
'a replaced mark is found but cannot match',
);
const replacedContent = fileContent.replace(
regExpForReplace,
reportTemplateReplacementRegExp,
() => finalContent,
);
fs.writeFileSync(filePath, replacedContent);
replacedCount++;
console.log(`Template updated in file ${filePath}`);
} else if (fileContent.includes(magicString)) {
} else if (fileContent.includes(reportTemplateMagicString)) {
const magicStringCount = (
fileContent.match(new RegExp(magicString, 'g')) || []
fileContent.match(new RegExp(reportTemplateMagicString, 'g')) || []
).length;
assert(
magicStringCount === 1,
'magic string shows more than once in the file, cannot process',
);
const replacedContent = fileContent.replace(
`'${magicString}'`,
`'${reportTemplateMagicString}'`,
() => finalContent,
);
fs.writeFileSync(filePath, replacedContent);
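
Both scripts share the same per-file branch: if a file already carries the replaced mark, the previously injected template is swapped in place; otherwise the single quoted placeholder is replaced. A minimal TypeScript sketch of that branch as a pure function follows, which makes it testable without touching `dist/`. `injectIntoBundle` is a hypothetical name; the constant values are the ones the inline code used before this change.

```typescript
// Values the scripts previously defined inline (now imported from
// scripts/report-template-utils.mjs).
const MAGIC_STRING = 'REPLACE_ME_WITH_REPORT_HTML';
const REPLACED_MARK = '/*REPORT_HTML_REPLACED*/';
const REPLACEMENT_REGEXP = /\/\*REPORT_HTML_REPLACED\*\/.*/;

/**
 * Hypothetical pure version of the per-file branch in both scripts.
 * Returns the rewritten content, or null when the file has neither marker.
 */
function injectIntoBundle(fileContent: string, finalContent: string): string | null {
  if (fileContent.includes(REPLACED_MARK)) {
    // Re-run: replace the mark and the rest of its line (the old template).
    if (!REPLACEMENT_REGEXP.test(fileContent)) {
      throw new Error('a replaced mark is found but cannot match');
    }
    return fileContent.replace(REPLACEMENT_REGEXP, () => finalContent);
  }
  if (fileContent.includes(MAGIC_STRING)) {
    // Count non-overlapping occurrences of the placeholder.
    const count = fileContent.split(MAGIC_STRING).length - 1;
    if (count !== 1) {
      throw new Error('magic string shows more than once in the file, cannot process');
    }
    // First run: replace the single-quoted placeholder literal. A replacer
    // function keeps `$&`-style sequences in the template from being expanded.
    return fileContent.replace(`'${MAGIC_STRING}'`, () => finalContent);
  }
  return null;
}
```

Note that, like the original code, the re-run branch consumes everything after the mark up to the end of the line, so it relies on the injected template being a JSON string literal with its newlines escaped.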
1 change: 0 additions & 1 deletion apps/report/test-data/mcp-release-workflow.json

This file was deleted.

30 changes: 30 additions & 0 deletions apps/report/tests/report-template-utils.test.ts
@@ -0,0 +1,30 @@
import { describe, expect, it } from 'vitest';
import {
buildReportTemplateInjection,
reportTemplateMagicString,
reportTemplateReplacedMark,
reportTemplateReplacementRegExp,
} from '../../../scripts/report-template-utils.mjs';

describe('report template utils', () => {
it('should remove placeholders and sanitize nested injected report templates', () => {
const html = [
'<html>',
`${reportTemplateMagicString}<body>latest report</body>`,
`<script>window.__REPORT__=${reportTemplateReplacedMark}"<html>old report</html>"</script>`,
'</html>',
].join('');

const { sanitizedTplFileContent, finalContent } =
buildReportTemplateInjection(html);

expect(sanitizedTplFileContent).not.toContain(reportTemplateMagicString);
expect(sanitizedTplFileContent).toContain(
`${reportTemplateReplacedMark}""`,
);
expect(finalContent).toMatch(reportTemplateReplacementRegExp);
expect(
JSON.parse(finalContent.slice(reportTemplateReplacedMark.length)),
).toBe(sanitizedTplFileContent);
});
});
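
`scripts/report-template-utils.mjs` itself is not shown in this diff. The TypeScript sketch below reconstructs plausible implementations from the removed inline code and the expectations in the test above. It is an assumption, not the shipped module; in particular `nestedInjectionRegExp` is hypothetical and was written to match a replaced mark followed by a JSON string literal.

```typescript
// Hypothetical reconstruction of scripts/report-template-utils.mjs.
// The three exported constants match the values the scripts used inline.
const reportTemplateMagicString = 'REPLACE_ME_WITH_REPORT_HTML';
const reportTemplateReplacedMark = '/*REPORT_HTML_REPLACED*/';
const reportTemplateReplacementRegExp = /\/\*REPORT_HTML_REPLACED\*\/.*/;

// Assumption: an already-injected template appears as the mark followed by a
// JSON string literal (escaped quotes are allowed inside the literal).
const nestedInjectionRegExp = /\/\*REPORT_HTML_REPLACED\*\/"(?:[^"\\]|\\.)*"/g;

// Type guard mirroring the old inline check on readdirSync results.
function isReportTemplateInjectableFile(file: unknown): file is string {
  return (
    typeof file === 'string' && (file.endsWith('.js') || file.endsWith('.mjs'))
  );
}

function buildReportTemplateInjection(html: string): {
  sanitizedTplFileContent: string;
  finalContent: string;
} {
  const sanitizedTplFileContent = html
    // Drop every placeholder so it cannot be matched again inside the bundle.
    .split(reportTemplateMagicString)
    .join('')
    // Empty out any nested template, presumably so old reports are not
    // carried along inside the new one.
    .replace(nestedInjectionRegExp, () => `${reportTemplateReplacedMark}""`);
  // JSON.stringify produces a JS string literal with newlines escaped, which
  // the line-based replacement RegExp can later find and swap.
  const finalContent = `${reportTemplateReplacedMark}${JSON.stringify(
    sanitizedTplFileContent,
  )}`;
  return { sanitizedTplFileContent, finalContent };
}
```

Sharing these helpers between `rsbuild.config.ts` and `inject-report-template.mjs` keeps the placeholder, mark, and matching pattern from drifting apart between the two injection paths.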
2 changes: 1 addition & 1 deletion apps/site/docs/en/api.mdx
@@ -39,7 +39,7 @@ All agents share these base options:
- For mobile devices, setting `screenshotShrinkFactor` to 2 can reduce token consumption while maintaining clarity, but it is not recommended to set it higher than 3, as this may cause the image to be too blurry and affect the AI model's understanding.
- For web pages, if the content is complex or contains a lot of details, it is not recommended to set `screenshotShrinkFactor` to avoid overly blurry screenshots. Additionally, if you want higher clarity for web page screenshots, you can configure Puppeteer or Playwright's `deviceScaleFactor` to 2, which will allow Puppeteer or Playwright to render the page as if it were a high-definition screen.

The MCP tools and device CLIs expose these same Agent behavior options per call. In CLIs, convert the camelCase API option to a bare kebab-case flag, such as `waitAfterAction` -> `--wait-after-action`. In MCP calls, keep the camelCase option under the platform namespace, such as `android.waitAfterAction` or `web.waitAfterAction`. See [Configure Agent behavior per call](./mcp#configure-agent-behavior-per-call) for examples.
The device CLIs expose these same Agent behavior options per call. Convert the camelCase API option to a bare kebab-case flag, such as `waitAfterAction` -> `--wait-after-action`. See [Skills](./skills) for the platform CLI entry points.

### Custom model configuration

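
The camelCase-to-flag rule in the `api.mdx` change above is mechanical. As a minimal sketch (the `toCliFlag` name is hypothetical; the device CLIs do their own flag parsing):

```typescript
// Hypothetical helper that applies the naming rule: camelCase option name
// becomes a bare kebab-case flag.
function toCliFlag(optionName: string): string {
  // Prefix each uppercase letter with a hyphen and lowercase it.
  const kebab = optionName.replace(/[A-Z]/g, (ch) => `-${ch.toLowerCase()}`);
  return `--${kebab}`;
}

console.log(toCliFlag('waitAfterAction')); // prints "--wait-after-action"
```

This simple per-letter rule splits runs of capitals individually (a name ending in `URL` would become `-u-r-l`), which is fine for option names written in plain camelCase.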
4 changes: 2 additions & 2 deletions apps/site/docs/en/integrate-with-any-interface.mdx
@@ -9,7 +9,7 @@ After implementing the UI operation class, you get the full capabilities of Mids
- the TypeScript GUI Automation Agent SDK, supporting integration with any interface
- the playground for debugging
- controlling the interface with YAML scripts
- an MCP service that exposes UI actions
- Skills support through CLI commands


## Demo and community project
@@ -298,7 +298,7 @@ const agent = new Agent(device, {

- `device: AbstractInterface` (required): Any class that fulfills `screenshotBase64`, `size`, and `actionSpace`. This is where you translate Midscene actions into real I/O calls for your hardware or desktop app.
- `options?: PageAgentOpt`: Shares the same option bag as the browser and mobile agents described in the [API constructors](./api#common-parameters). Commonly used fields include `generateReport`, `reportFileName`, `actionContext`/`aiActionContext`, `cacheId`, `modelConfig`, `createOpenAIClient`, `customActions`, and `onTaskStartTip`.
- The resulting agent instantly unlocks the regular automation surfaces: `aiAct`/`aiTap` APIs, YAML runner (`interface` block), [playground](#playgroundforagent-function), MCP server, and reporting pipeline.
- The resulting agent instantly unlocks the regular automation surfaces: `aiAct`/`aiTap` APIs, YAML runner (`interface` block), [playground](#playgroundforagent-function), Skills CLI support, and reporting pipeline.

### `AbstractInterface` class

4 changes: 2 additions & 2 deletions apps/site/docs/en/introduction.mdx
@@ -15,7 +15,7 @@ Midscene takes a different route: it **works from the screenshot alone**, using
- **Tests stop breaking on every refactor.** There are no selectors to chase when markup or styles change, so the maintenance cost of your suite drops sharply.
- **Reach every element and every surface.** If a human can see it, Midscene can target it — even elements with no semantic annotations, `<canvas>`, native apps, and cross-origin iframes that structure-based tools cannot reach.
- **Assert on what users actually see.** Verify visual results — colors, highlights, layout, rendered state — not just whether a node exists in the DOM.
- **Two ways to test.** Add Midscene to your existing [Playwright](./integrate-with-playwright) or Vitest suite, or let an AI agent test your app autonomously through [Skills](./skills) and [MCP](./mcp).
- **Two ways to test.** Add Midscene to your existing [Playwright](./integrate-with-playwright) or Vitest suite, or let an AI agent test your app autonomously through [Skills](./skills).
- **Failures you can read.** Every run produces a visual report you can replay step by step.

> Midscene is built for UI testing first, but the same vision-driven engine handles any UI automation task — use it however fits your work.
@@ -64,7 +64,7 @@ Register the GitHub form autonomously in a web browser and pass all field validation

<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/1.0-showcases/github2.mp4" height="300" controls />

See more real-world examples across iOS, Android, desktop, and MCP in [Showcases](./showcases).
See more real-world examples across iOS, Android, desktop, and custom interfaces in [Showcases](./showcases).

## Resources & community
