
Test Automation Circles v2.0

A framework for E2E test automation in the AI era

What is Test Automation Circles?

Test Automation Circles is a thinking framework for correctly implementing and operating E2E test automation.

Most test automation failures are not caused by tools or technology, but by the absence of purpose and strategy. This holds true regardless of the tool — from Selenium to Playwright to AI.

The framework starts from the center (Why), expanding outward (How, then What). Five layers, each building on the previous one.


Version 2.0 adds the perspective of human-AI collaboration across all layers.

Five Layers

Core: Why?

Why do you test?

The most important question — and the one most often skipped.

Without a clear purpose, your test suite grows without direction. AI can write tests faster than ever, but faster tests without purpose are just faster waste.

What to decide, with examples:

  • Objective — Risk mitigation? Faster feedback? Regression prevention?
  • Stakeholder alignment — Does everyone agree on what "quality" means?

In the AI era, this layer is unchanged. Purpose does not change because tools get better.

Concept: How?

How do you test?

Once the purpose is clear, design your approach. This layer has changed significantly in v2.0.

Test Strategy — What to Decide in the AI Era

In addition to traditional test strategy (risk-based approach, test level allocation, prioritization), the AI era requires these additional decisions:

  • Which activities use AI? Test design? Code generation? Result analysis? Execution?
  • How much autonomy does AI get? Human leads and AI assists? Or AI leads and human reviews?
  • Where is human approval required? Decide based on impact and reversibility

AI is most effective in test design (generating test perspectives) and result analysis (classifying failures). Test execution via AI should be introduced gradually.
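The autonomy and approval decisions above can be captured as a simple policy table. A minimal sketch — the activity names, autonomy levels, and assignments below are illustrative assumptions, not part of the framework:

```python
# Sketch of an AI-autonomy policy for test activities (illustrative names).
# Each activity gets an autonomy level plus an approval rule, decided by
# impact and reversibility, as the strategy layer suggests.

POLICY = {
    # activity:          (autonomy,      human_approval_required)
    "test_design":       ("ai_leads",    False),  # AI generates perspectives, human may review
    "code_generation":   ("ai_assists",  True),   # human leads, approves output
    "result_analysis":   ("ai_leads",    False),  # AI classifies failures
    "test_execution":    ("human_leads", True),   # introduce AI gradually
}

def requires_approval(activity: str) -> bool:
    """Return True when a human must sign off before the activity's output is used."""
    autonomy, approval = POLICY[activity]
    return approval

print(requires_approval("test_execution"))  # True under this sketch
```

Making the policy explicit (rather than implicit in habits) is what lets a team revisit it as trust in AI-driven activities grows.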

Testability Design [New in v2.0]

Testability means "how easy it is to test." The key question in v2.0 is: testable for whom?

Traditionally, we only considered whether a system was easy for humans to test. In the AI era, once you decide "which activities use AI" in your strategy, whether the specs, environment, and code are testable for AI becomes a critical design consideration. This decision has a significant impact on outcomes.

For example, when AI operates a browser, it perceives the UI through accessibility trees (ARIA Snapshots). Without semantic HTML, ARIA roles, and meaningful labels, AI cannot understand your UI.

Good news: accessibility investment = AI testability. The same practices that help screen readers also help AI.
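A toy model of why semantics matter: an agent that perceives the page through a simplified accessibility tree "sees" only elements that expose a role and an accessible name. The element format below is invented for illustration; real agents read the browser's ARIA snapshot:

```python
# Toy model: what an AI agent "sees" when it reads an accessibility tree.
# Elements are simplified dicts; real tools derive roles and names from
# semantic HTML and ARIA attributes.

def accessible_nodes(elements):
    """Keep only elements that expose a role and an accessible name."""
    return [
        f'{el["role"]} "{el["name"]}"'
        for el in elements
        if el.get("role") and el.get("name")
    ]

page = [
    {"tag": "button", "role": "button", "name": "Add to cart"},  # semantic: visible to AI
    {"tag": "div", "role": None, "name": None},                  # styled div: invisible
    {"tag": "input", "role": "textbox", "name": "Email"},        # labeled input: visible
]

print(accessible_nodes(page))  # ['button "Add to cart"', 'textbox "Email"']
```

The styled `div` disappears from the agent's view entirely — the same way it disappears for a screen reader user.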

Test Scope — Three Domains [Updated in v2.0]

Testing in the AI era is no longer a binary choice between "manual" and "automated." There are three domains:

  • Human Testing — Exploratory, judgment-based. Use cases: UX, edge cases, creative scenarios
  • Automated Testing — Scripted, deterministic. Use cases: regression, CI/CD gates
  • AI-Driven Testing — Intent-based, exploratory automation. Use cases: quick verification, rapid feedback

Consider which tests belong in which domain, and when to convert AI-driven tests to automated tests.

Design — Spec, Test, Data

Design consists of three sub-items. In v1, only "Test Design" existed. In the AI era, Spec and Data also require independent design decisions.

These three must be designed together. If any one of them falls out of sync, AI amplifies the gap into regressions. For example, when AI modifies code but the spec is not updated, a later refactoring session where AI references the stale spec can introduce unintended changes — this actually happens.

① Spec Design [New in v2.0]

When AI writes code, the spec becomes the source of truth — not just documentation, but the instruction set for AI agents.

Clear spec → AI implements → Tests verify against spec.

The most critical factor is spec freshness. If the spec is stale, AI writes code based on outdated information, and tests validate against that stale spec as "correct." Spec, code, and tests must stay in sync at all times — otherwise AI becomes a force that breaks quality rather than protects it.

A spec that captures not just "what to build" but "why it is needed" enables AI to generate more essential tests.

② Test Design

The area where human thinking is most challenged. In an era where AI can write tests, deciding "what should be tested" is something only humans can do. Multi-perspective analysis is essential:

  • Vertical Thinking — Deep domain analysis
  • Lateral Thinking — Examining from different angles
  • Critical Thinking — Verifying the whole chain holds together

③ Data Design [New in v2.0]

The three test domains (Automated, AI-Driven, Human) require different levels of data granularity.

  • Automated Testing — Concrete. "User A adds Product B x1 to cart": specific data, rule-based, deterministic
  • AI-Driven Testing — Abstract. "A valid user purchases a product": leave room for AI to select or generate specific data contextually
  • Human Testing — Situational. The tester chooses data by judgment; not predetermined in exploratory testing

For automated tests, idempotency (same operation, same result) requires pre-seeding data, cleanup after tests, and independent datasets per test.
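The idempotency practices above — seed before, clean up after, independent data per test — can be sketched with a fixture-style context manager. The in-memory store and helper names are hypothetical stand-ins for a real database:

```python
# Sketch: per-test data seeding and cleanup so the same operation gives the
# same result every run. The in-memory "store" stands in for a real database.
from contextlib import contextmanager

STORE: dict[str, dict] = {}

@contextmanager
def seeded_user(user_id: str):
    """Seed an independent dataset for one test, then clean it up."""
    STORE[user_id] = {"cart": []}          # pre-seed known state
    try:
        yield STORE[user_id]
    finally:
        STORE.pop(user_id, None)           # cleanup: no state leaks to the next test

def test_add_to_cart():
    with seeded_user("user-a") as user:
        user["cart"].append("product-b")
        assert user["cart"] == ["product-b"]

test_add_to_cart()
test_add_to_cart()  # idempotent: the second run starts from the same seeded state
assert STORE == {}  # cleanup ran; nothing left behind for other tests
```

Because each test seeds and removes its own data, tests can run in any order or in parallel without cascading failures.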

For AI-driven tests, overly concrete data kills the AI's exploratory strength. Overly abstract data loses reproducibility. The right data granularity depends on the test format.

Architecture: What?

What do you use?

Concept defined strategy, testability, and design. Now choose the technology to realize them.

A layer that has changed significantly in v2.0. The principle of choosing tools only after Core and Concept are determined still holds. But the scope of what you need to consider has expanded considerably.

What Has Changed

Traditional E2E test automation meant choosing one tool and working within it. In the AI era, a single tool is no longer sufficient.

  • How UI is perceived has changed — From DOM-based selectors (CSS/XPath) to accessibility trees (ARIA Snapshots). AI understands UI through accessibility, not the DOM
  • How UI is operated has changed — AI can now operate browsers directly via MCP (Model Context Protocol)
  • The scope of architecture decisions has expanded — Beyond a single tool to AI integration, token costs, non-determinism handling, logging design, and more

As a result, required skills also become broader. This directly connects to the Base layer (Foundation) and skill design.

With these changes in mind, there are four things to decide:

What to Decide

  • Tool selection — Consider team skills, project's future, MCP compatibility
  • Framework — Align with development language. If engineers maintain tests, learning cost is a direct factor
  • Environment — Ensure test independence. When tests depend on each other, a single failure cascades and destroys reliability of the entire suite
  • CI/CD integration — When to run what. A trade-off between feedback speed and accuracy. For AI-driven tests, token costs also factor in — running on every commit vs. only on merge is a strategic choice

Knowledge and Wisdom in Tool Selection

AI may recommend excellent tools. But the tool that AI recommends and the tool that fits your project's present and future may differ.

AI can suggest "the best tool for current requirements." But "compatibility with features you will need in three months" or "fit with your team's existing skills" — these require human judgment grounded in project context. Tool selection requires not just AI's knowledge, but human wisdom to see ahead.

Monitoring & Control: Real.

Keep it running.

The test automation you build in the Architecture layer is only the starting point. Test automation does not end when tests are written. Most teams stop investing here — and this is where test suites die.

Result Analysis — Pass/Fail Is Not Enough

A test failed. So what? What matters is what comes next.

  • Trend analysis — Track success rates and execution times daily
  • Failure classification — Environment-caused? Code change? Test itself? Without classification, every failure investigation starts from zero
  • Dashboard visualization — The entire team should see test health at a glance

AI-Driven Test Logging [New in v2.0]

Logging for automated tests is about recording results. Same script, same behavior — results are enough.

AI-driven tests are different. The same prompt may produce different actions each time. What needs to be recorded is not the result but the decision-making process — what AI "saw," what it decided, and why it chose that action.

Without this, "AI tested it" is no evidence at all. Trust, auditing, debugging, improvement — all depend on proper logging.
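One way to record the decision-making process is a structured log entry per AI action: what the agent observed, what it decided, and why. The field names below are an assumption for illustration, not a standard:

```python
# Sketch of a per-action log record for an AI-driven test. Recording the
# observation, decision, and rationale makes a run auditable even though
# the same prompt may take different actions each time.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AIActionLog:
    step: int
    observed: str     # what the agent saw (e.g. an ARIA snapshot summary)
    decision: str     # the action it chose
    rationale: str    # why it chose that action
    timestamp: str

def log_action(step: int, observed: str, decision: str, rationale: str) -> str:
    entry = AIActionLog(step, observed, decision, rationale,
                        datetime.now(timezone.utc).isoformat())
    return json.dumps(asdict(entry))  # one JSON line per action

line = log_action(1, 'button "Add to cart" visible',
                  'click button "Add to cart"',
                  "matches the goal: purchase a product")
print(line)
```

One JSON line per action is deliberately boring: it can be grepped during debugging, replayed during audits, and mined later to improve prompts.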

Test Script Maintenance

When the system under test changes, tests must change too. In practice, test updates are often deferred. "Is this test still needed?" "Can any tests be consolidated?" "Is the suite the right size?" — without a regular maintenance cycle asking these questions, the test suite degrades.

Test Suite Optimization [New in v2.0]

When humans wrote 100 tests, you could understand them all. When AI generates 500, nobody has the full picture.

  • Deduplication — Consolidate tests verifying the same functionality with different expressions
  • Value assessment — Does each test contribute to business risk coverage? Remove low-value tests
  • Smoke test curation — Select critical tests from the full suite for rapid verification
  • Periodic inventory — Regularly review the entire suite; remove obsolete tests
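Deduplication can begin with cheap normalization of test descriptions to surface candidates; a human still decides whether to consolidate. The normalization rules below are illustrative assumptions:

```python
# Find likely-duplicate tests by normalizing their titles. This only
# surfaces candidates for review; it does not delete anything.
from collections import defaultdict

def normalize(title: str) -> str:
    # Illustrative rules: lowercase, drop filler words, sort remaining terms.
    filler = {"the", "a", "an", "that", "should", "verify", "check"}
    words = [w for w in title.lower().split() if w not in filler]
    return " ".join(sorted(words))

def duplicate_groups(titles):
    groups = defaultdict(list)
    for t in titles:
        groups[normalize(t)].append(t)
    return [g for g in groups.values() if len(g) > 1]

suite = [
    "Verify login works",
    "Check that login works",
    "Cart total updates",
]
print(duplicate_groups(suite))  # [['Verify login works', 'Check that login works']]
```

At 500 AI-generated tests, even this crude pass shrinks the review workload; semantic similarity (e.g. embedding-based) is the natural next step once titles alone stop being enough.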

Base: Foundation.

The foundation nobody wants to talk about.

Everything from Core to Monitoring & Control rests on this foundation. No matter how excellent your strategy or tools, without a solid foundation, nothing is sustainable. Four things to consider: Resources, Teams, Skill Sets, Culture.

What Happens When the Foundation Is Weak

  • Insufficient resources or skills → The automation system becomes unmaintainable. It depends on specific individuals
  • No culture or team buy-in → Tests are "created but unused." Unless the entire team understands why automation matters, tests become ceremonial
  • Scripting language misaligned with development language → Learning costs spike and maintenance sustainability drops

Leverage Existing Skills or Introduce New Ones?

This choice determines success or failure. Do you build automated tests that maximize your team's existing skill set, or introduce new skills as a team? In the AI era, this decision becomes even more complex.

Skills Are Not Disappearing — They Are Changing

"AI makes test automation easy, so we do not need skilled engineers anymore" — this is wrong. The type of skills required changes.

In addition to traditional skills (selector strategies, wait handling, programming), the AI era demands:

  • Accessibility knowledge — the foundation for AI to perceive UI
  • LLM behavior understanding — non-determinism, hallucination, context limits
  • Prompt design — the quality of instructions determines AI output
  • AI output evaluation — the ability to catch confident-looking mistakes
  • Spec writing — spec design quality affects everything

The Principles

Start from the Center. Work Outward.

The faster AI accelerates test generation and execution, the more important the center becomes. AI provides powerful "What." Defining "Why" and "How" remains human work.

This principle has not changed in 20 years. It will not change in the next 20.

Knowledge and Wisdom

This theme runs through every layer — not just Base.

AI has knowledge — vast and instant. Wisdom is different. Wisdom is knowing to "check the back when you see the front." It is cultivated through experience: thinking, trying, noticing mistakes, thinking again.

AI accelerates this cycle. But the more you use AI, the more your own judgment matters. Core (why do we test?), Concept (how do we design strategy?), Architecture (what do we choose?), M&C (how do we improve operations?), Base (how do we grow the team?) — in every layer, the wisdom to judge AI's knowledge within your own context is what is required.

Relationship with arQua:Maturity

Test Automation Circles and arQua:Maturity are complementary:

arQua:Maturity (Diagnosis)
  → Identifies where the weaknesses are

Test Automation Circles (Implementation Guide)
  → Provides how to improve

Use arQua:Maturity to diagnose your current state. Use Test Automation Circles to implement improvements. Re-diagnose to measure progress.

Get Started

Whether you are starting test automation from scratch or struggling with an existing implementation, the framework applies.

Start with one question: Why do you test?

If you'd like help applying Test Automation Circles to your team, contact us.

Test Automation Circles v2.0 © 2026 Arrangility Sdn. Bhd.
