CIFF × DHWANI · QA AUTOMATION · MAY 2026

A Dhwani × CIFF working paper · v0.1

From test cases
to test passes.

We wired Fluxx, Dynamics 365 and Power BI into one automation pipeline — and let the QA spreadsheet drive it. Here's what we built, what we learned, and where it goes next.

POC complete · Presented May 14, 2026 · Author: Abhijit Nair

§1 · The setup

Three systems. One QA team. Every release night looks the same.

Fluxx · Grants · disbursements

Proposals, fund requests, donor approvals. Configurable forms, event-driven status changes, async API.

Dynamics 365 · CRM · leads · service

Lead management, deduplication, bulk imports, customer service flows. SLA-sensitive surfaces.

Power BI · Donor analytics · KPIs

Dashboards on top of the other two. The bit your board actually looks at. Filtering, drill-through, export.

Each system has its own console, login wall and quirks. Integration bugs hide in the seams.

§1 · The math

1,284 cases. Run by hand. Every release.

Across the three systems, the manual QA catalogue looks like this — and grows quarter on quarter as new flows ship.

1,284 · Total manual cases · Fluxx · D365 · Power BI
412 · Dynamics 365 · 96% historical pass rate
328 · Fluxx · 91% (donor approvals soft)
544 · Power BI · 87% (cross-filter latency)
Today, this is human work

One QA. A spreadsheet. Several days a release. The cases that pass don't generate evidence; the cases that fail generate Slack messages.

Counts above are from the POC's demo fixtures — placeholder numbers we wired in for the runner. Worth replacing with CIFF's real catalogue counts before the readout.

§1 · The thesis

The QA spreadsheet is a programming language.
We just hadn't compiled it yet.

Manual test cases already describe inputs, steps, and expected results. That's a spec. The only thing missing is a runtime that understands it — and a way to feed the results back.

If the spreadsheet is the spec, the browser is the compiler, and the run is the assertion — we don't need to rewrite QA, we need to plug it in.

Phase 1 · Shipped

A five-stage pipeline, end to end.

Upload a sheet. Walk away. Come back to results, screenshots, and a diagnosis on every failure.

01 · INGEST · Validate: parse xlsx, map columns, detect module.
02 · ACCESS · Login: Chromium sign-in to Fluxx / D365 / PBI.
03 · TRANSLATE · Generate: manual steps → Playwright .spec.ts.
04 · RUN · Execute: live console, screenshots, traces.
05 · REPORT · Push: git commit, dashboard, Claude review.
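
Under the hood, the stages compose into a single pipeline contract. A minimal type-level sketch; every name and shape below is illustrative, not the POC's actual interfaces.

Pipeline contract · sketch

type Module = 'fluxx' | 'd365' | 'pbi';

interface ManualCase { id: string; module: Module; title: string; steps: string[]; expected: string; }
interface GeneratedSpec { caseId: string; path: string; source: string; }
interface RunResult { caseId: string; status: 'passed' | 'failed' | 'skipped'; artifacts: string[]; }

// Each stage is an async transform; the runner just composes them in order.
type Stage<I, O> = (input: I) => Promise<O>;

declare const ingest: Stage<Uint8Array, ManualCase[]>;        // 01 · parse xlsx, map columns
declare const generate: Stage<ManualCase[], GeneratedSpec[]>; // 03 · steps → .spec.ts
declare const execute: Stage<GeneratedSpec[], RunResult[]>;   // 04 · run with traces
declare const report: Stage<RunResult[], void>;               // 05 · commit + dashboard

async function runPipeline(xlsx: Uint8Array): Promise<void> {
  const cases = await ingest(xlsx);
  const specs = await generate(cases); // 02 (login) runs inside each generated spec
  const results = await execute(specs);
  await report(results);
}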
Custom test runner

Built for QA, not engineers. Re-run a single case, regroup cases into a suite, replay a run — without touching the CLI.

§2 · See it run

~38 seconds · captions on

The pipeline in motion — real POC, real failure, real fix.

§2 · What we built

One portal. Three integrations. Zero scripts to hand-write.

Dashboard · live POC

QA lands on a personal dashboard: total / passed / failed / skipped, modules with pass-rate bars, and the last five runs at a glance.

§2 · Upload → generate

Drop a spreadsheet. We figure out the rest.

Manual case · Excel row

Module    Fluxx
Test ID   TC-015
Title     Reject grant submission
Steps     1. Login as program officer
          2. Open submission #4821
          3. Click "Reject", give reason
          4. Confirm rejection
Expected  Status = Rejected
Priority  P1

Generated · Playwright

import { test, expect } from '@playwright/test';

test('TC-015 reject grant submission', async ({ page }) => {
  // Step 1 · login as program officer
  await page.goto('https://cif.fluxx.io/login');
  await page.getByLabel('Username').fill(process.env.FLUXX_USER!);
  await page.getByLabel('Password').fill(process.env.FLUXX_PASS!);
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Steps 2–4 · open submission #4821, reject with a reason, confirm
  await page.goto('https://cif.fluxx.io/grants/4821');
  await page.getByRole('button', { name: 'Reject' }).click();
  await page.getByLabel('Reason').fill('Out of programme scope');
  await page.getByRole('button', { name: 'Confirm rejection' }).click();

  // Expected · Status = Rejected
  await expect(page.getByTestId('grant-status')).toHaveText('Rejected');
});

The column mapper figures out which column is "title", which is "steps", and which is the test ID, even when the QA file has been re-saved nine times.
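
For flavour, a minimal sketch of how such fuzzy header matching can work; the alias lists and field names are illustrative, not the POC's actual mapper.

Column mapper · sketch

const FIELD_ALIASES: Record<string, string[]> = {
  id: ['test id', 'tc id', 'case id', 'id'],
  title: ['title', 'test name', 'scenario'],
  steps: ['steps', 'test steps', 'procedure'],
  expected: ['expected', 'expected result', 'acceptance'],
};

const normalise = (h: string) => h.toLowerCase().replace(/[^a-z0-9 ]/g, '').trim();

/** Map each sheet header to a canonical field, tolerating re-saved files. */
function mapColumns(headers: string[]): Map<number, string> {
  const mapping = new Map<number, string>();
  headers.forEach((header, col) => {
    const h = normalise(header);
    for (const [field, aliases] of Object.entries(FIELD_ALIASES)) {
      if (aliases.some(a => h === a || h.startsWith(a))) {
        mapping.set(col, field);
        break;
      }
    }
  });
  return mapping;
}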

§2 · The run

QA can watch tests run. So can their PM.

Run #215 · live
Real-time signal

Five pipeline stages, live console straight from the runner. No SSH, no CLI, no "did the build pass?" Slack message at 11pm.
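
One way to build such a feed: a sketch assuming a Node host that spawns the runner and streams its output over Server-Sent Events. The route and command here are hypothetical, not the POC's API.

Live console · sketch

import { spawn } from 'node:child_process';
import http from 'node:http';

// Hypothetical endpoint: GET /runs/live streams runner output as SSE.
http.createServer((req, res) => {
  if (req.url !== '/runs/live') { res.writeHead(404); res.end(); return; }

  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });

  const run = spawn('npx', ['playwright', 'test', '--reporter=line']);
  const forward = (chunk: Buffer) =>
    chunk.toString().split('\n').forEach(line => res.write(`data: ${line}\n\n`));
  run.stdout.on('data', forward);
  run.stderr.on('data', forward);
  run.on('close', code => { res.write(`event: done\ndata: exit ${code}\n\n`); res.end(); });
}).listen(4000);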

Run artifacts

Trace, screenshot, video and the generated .spec.ts file — all linked from the run detail page. Re-run a single case in one click.
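
These artifacts fall out of standard Playwright settings. A plausible playwright.config.ts for the runner; the values are our guess, not the POC's file.

playwright.config.ts · sketch

import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    trace: 'retain-on-failure',    // trace-viewer file for every failed case
    screenshot: 'only-on-failure', // PNG linked from the run detail page
    video: 'retain-on-failure',    // replayable recording of the failure
  },
  reporter: [['html'], ['json', { outputFile: 'run-results.json' }]],
});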

§2 · A real failure caught

TC-015: Reject grant submission — failed

Async race condition. The assertion fired before the API responded. The kind of bug a manual QA might miss because the screen "looks right" after a second.

The error

AssertionError
expect(locator).toHaveText('Rejected')

Expected: Rejected
Received: Pending review
At: grant_approval.spec.ts:142

Claude's analysis

The reject button triggered the request, but the status update is async. The assertion fired before the API responded.

Suggestion: Add waitForResponse('/api/grants/*/reject') before the status check.

Three failure modes the runner has already surfaced: async race, brittle locator, real backend bug. We file the third with the dev team — the first two, we fix in seconds.
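
A sketch of that three-way triage as a classifier over the error text; the heuristics are illustrative, not the runner's actual rules.

Failure triage · sketch

type FailureMode = 'async-race' | 'brittle-locator' | 'backend-bug';

// Illustrative heuristics: real triage would also look at traces and retries.
function classify(error: string): FailureMode {
  // Assertion saw a stale value: the UI likely hadn't caught up yet.
  if (/toHaveText|toBeVisible/.test(error) && /pending|stale/i.test(error))
    return 'async-race';
  // Selector no longer resolves: the page changed, not the behaviour.
  if (/locator.*not found|strict mode violation|waiting for locator/i.test(error))
    return 'brittle-locator';
  // Everything else gets filed with the dev team.
  return 'backend-bug';
}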

§2 · Claude's fix

From diagnosis to a two-line patch.

grant_approval.spec.ts · diff

+ const rejected = page.waitForResponse(r =>
+   r.url().includes('/api/grants/') && r.url().includes('/reject')
+ );
  await page.getByRole('button', { name: 'Confirm rejection' }).click();
+ await rejected;
  await expect(page.getByTestId('grant-status')).toHaveText('Rejected');
Why this matters

The runner doesn't just say "TC-015 failed." It explains why, proposes a fix, and lets QA re-run with one click. Bug found → patched → re-verified in under three minutes. That used to be a half-day investigation.

§2 · So far

What the POC shipped.

3 · Systems wired · Fluxx, D365, Power BI on the same runner
5 · Pipeline stages · Validate → Push, fully traced
1-click · Re-run · single case, suite, or full module
~3 min (illustrative) · Find → fix → verify · for the failure modes Claude understands
Phase 1 · Done

Runner & portal

  • Custom test runner with on-demand execution
  • Excel ingest, column mapping, module detection
  • Pre-built specs for grant approval / lead capture / dashboard filter
  • Claude analysis card on every failure
  • Suites, runs history, traces, screenshots
Phase 2 · Proposed

What if every manual case became a runnable spec?

Not just the three flows we hand-picked for the POC. The whole catalogue of 1,284 cases: converted, organised into suites, and rerun on every PR.

Today · per case (illustrative)

QA writes the manual case. QA executes it. Every release. Defects surface late — sometimes after the donor has seen the dashboard.

~25 min · per case, per release
×1,284 · cases in catalogue

With Phase 2 · per case (illustrative)

QA writes the manual case once. AI translates it. The runner enforces it forever. QA reviews failures — not green runs.

~8s
per case · per release
~1 day
catalogue, end to end

Numbers on this slide are illustrative projections — not measured. They assume average case complexity and 2× concurrency. Real numbers will land somewhere in this ballpark once we have a baseline week of runs.
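
The back-of-envelope behind those figures, using the slide's own assumptions; every constant here is the illustrative value, not a measurement.

Projection math · sketch

const CASES = 1_284;

// Manual: ~25 min per case, per release.
const manualHours = (CASES * 25) / 60;       // ≈ 535 person-hours

// Automated: ~8 s per case at 2× concurrency.
const machineHours = (CASES * 8) / 2 / 3600; // ≈ 1.4 wall-clock hours

console.log({ manualHours, machineHours });
// The "~1 day end to end" figure leaves room for queueing, retries and
// failure triage on top of the ~1.4 h of pure machine time.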

§3 · The conversion loop

Engineer + AI + runner — a feedback loop that gets faster every week.

01 · INTAKE · Submit cases: QA uploads the manual catalogue (same Excel, same columns).
02 · DRAFT · AI generates: Claude proposes a Playwright spec per case, citing locators.
03 · REVIEW · Engineer in the loop: quick human sign-off; fixes the 10% that need judgement.
04 · VERIFY · Runner executes: specs run against staging. Any failure → Claude analysis.
05 · BANK · Add to suite: approved cases join the regression suite. Forever.
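
The loop, condensed to code. Every helper below is a hypothetical stand-in (draftSpec, reviewDraft, runSpec), not the POC's API.

Conversion loop · sketch

interface Case { id: string; steps: string[]; expected: string; }
interface Draft { caseId: string; spec: string; }

declare function draftSpec(c: Case): Promise<Draft>;              // 02 · Claude proposes
declare function reviewDraft(d: Draft): Promise<Draft | null>;    // 03 · engineer signs off
declare function runSpec(d: Draft): Promise<'passed' | 'failed'>; // 04 · verify on staging

async function convertCatalogue(cases: Case[]): Promise<Draft[]> {
  const suite: Draft[] = [];
  for (const c of cases) {
    const approved = await reviewDraft(await draftSpec(c));
    if (approved && (await runSpec(approved)) === 'passed')
      suite.push(approved);                                       // 05 · banked, forever
  }
  return suite;
}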
The compounding effect

Every converted case stays converted. Every new manual case gets the same treatment. The library grows; the cost per case falls.

§3 · What changes

Three teams. Three new defaults.

CIFF QA · Manual → exploratory

Stops re-running the same 1,284 cases. Spends time on new flows, exploratory testing, and edge cases that actually need a human.

CIFF engineering · Find bugs sooner

Regression runs on every PR. Failures land in chat with a Claude diagnosis and a proposed patch. Less back-and-forth.

PM

Programme leads

Release with confidence

Dashboard goes from "QA says it's fine" to a real-time pass-rate per module, per release, with trace evidence.

§3 · Roadmap

Six months to a self-healing regression suite.

M0–M1 · NOW · Stabilise Phase 1: harden the runner, onboard the QA engineer, run weekly suites against staging.
M2–M3 · AI conversion v1: engineer + Claude convert the top 200 cases, targeting the high-touch Fluxx & D365 flows.
M3–M4 · Suite library: smoke, regression, and integration suites per module. CIFF can mix and match (suite config sketched below).
M5–M6 · CI gate: suite runs on every PR; releases are gated on green, and failures route to dev with a diagnosis.

No big-bang migration. Each milestone ships value on its own. If we stop at M3, CIFF still keeps the 200-case library.
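
What the M3–M4 suite library might look like in Playwright terms; project names, directories and tags are illustrative.

Suite library · playwright.config.ts sketch

import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    { name: 'fluxx-smoke', testDir: 'specs/fluxx', grep: /@smoke/ },
    { name: 'fluxx-regression', testDir: 'specs/fluxx' },
    { name: 'd365-regression', testDir: 'specs/d365' },
    { name: 'pbi-dashboards', testDir: 'specs/pbi' },
  ],
});

// M5–M6: CI runs e.g. `npx playwright test --project=fluxx-smoke` on every
// PR and gates the merge on a green exit code.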

§4 · The ask

Let's stop testing the
same things twice.

What we need to start Phase 2: a green light, access to staging for the three systems, and one QA hour a week to validate the converted cases. We bring the engineer, the AI, and the runner.

Next step · Scope workshop this month · Contact · abhijit.nair@dhwaniris.com

CIFF × DHWANI · MAY 2026 · v0.1