CIFF × DHWANI · QA AUTOMATION · MAY 2026

A Dhwani × CIFF working paper · v0.1

From test cases
to test passes.

We wired Fluxx, Dynamics 365 and Power BI into one automation pipeline — and let the QA spreadsheet drive it. Here's what we built, what we learned, and where it goes next.

POC complete · Presented May 14, 2026 · Author: Abhijit Nair

§1 · The setup

Three systems. One QA team. Every release night looks the same.

Fluxx · Grants · disbursements

Proposals, fund requests, donor approvals. Configurable forms, event-driven status changes, async API.

Dynamics 365 · CRM · leads · service

Lead management, deduplication, bulk imports, customer service flows. SLA-sensitive surfaces.

Power BI · Donor analytics · KPIs

Dashboards on top of the other two. The bit your board actually looks at. Filtering, drill-through, export.

Each system has its own console, login wall and quirks. Integration bugs hide in the seams.

§1 · The math

1,284 cases. Run by hand. Every release.

Across the three systems, the manual QA catalogue looks like this — and grows quarter on quarter as new flows ship.

1,284 · Total manual cases · Fluxx · D365 · Power BI
412 · Dynamics 365 · 96% historical pass rate
328 · Fluxx · 91% (donor approvals soft)
544 · Power BI · 87% (cross-filter latency)
Today, this is human work

One QA. A spreadsheet. Several days a release. The cases that pass don't generate evidence; the cases that fail generate Slack messages.

Counts above are from the POC's demo fixtures — placeholder numbers we wired in for the runner. Worth replacing with CIFF's real catalogue counts before the readout.

§1 · The thesis

The QA spreadsheet is a programming language.
We just hadn't compiled it yet.

Manual test cases already describe inputs, steps, and expected results. That's a spec. The only thing missing is a runtime that understands it — and a way to feed the results back.

If the spreadsheet is the spec, the browser is the compiler, and the run is the assertion — we don't need to rewrite QA, we need to plug it in.

Phase 1 · Shipped

A five-stage pipeline, end to end.

Upload a sheet. Walk away. Come back to results, screenshots, and a diagnosis on every failure.

01 · INGEST · Validate: parse xlsx, map columns, detect module.
02 · ACCESS · Login: Chromium sign-in to Fluxx / D365 / PBI.
03 · TRANSLATE · Generate: manual steps → Playwright .spec.ts.
04 · RUN · Execute: live console, screenshots, traces.
05 · REPORT · Push: git commit, dashboard, Claude review.
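
Under the hood, the stages compose into a single pipeline contract. A minimal type-level sketch; every name and shape below is illustrative, not the POC's actual interfaces.

Pipeline contract · sketch

type Module = 'fluxx' | 'd365' | 'pbi';

interface ManualCase { id: string; module: Module; title: string; steps: string[]; expected: string; }
interface GeneratedSpec { caseId: string; path: string; source: string; }
interface RunResult { caseId: string; status: 'passed' | 'failed' | 'skipped'; artifacts: string[]; }

// Each stage is an async transform; the runner just composes them in order.
type Stage<I, O> = (input: I) => Promise<O>;

declare const ingest: Stage<Uint8Array, ManualCase[]>;        // 01 · parse xlsx, map columns
declare const generate: Stage<ManualCase[], GeneratedSpec[]>; // 03 · steps → .spec.ts
declare const execute: Stage<GeneratedSpec[], RunResult[]>;   // 04 · run with traces
declare const report: Stage<RunResult[], void>;               // 05 · commit + dashboard

async function runPipeline(xlsx: Uint8Array): Promise<void> {
  const cases = await ingest(xlsx);
  const specs = await generate(cases); // 02 (login) runs inside each generated spec
  const results = await execute(specs);
  await report(results);
}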
Custom test runner

Built for QA, not engineers. Re-run a single case, regroup cases into a suite, replay a run — without touching the CLI.

§2 · See it run

~38 seconds · captions on

The pipeline in motion — real POC, real failure, real fix.

§2 · What we built

One portal. Three integrations. Zero scripts to hand-write.

Dashboard · live POC

QA lands on a personal dashboard: total / passed / failed / skipped, modules with pass-rate bars, and the last five runs at a glance.

§2 · Upload → generate

Drop a spreadsheet. We figure out the rest.

Manual case · Excel row

Module    Fluxx
Test ID   TC-015
Title     Reject grant submission
Steps     1. Login as program officer
          2. Open submission #4821
          3. Click "Reject", give reason
          4. Confirm rejection
Expected  Status = Rejected
Priority  P1

Generated · Playwright

import { test, expect } from '@playwright/test';

test('TC-015 reject grant submission', async ({ page }) => {
  // Step 1 · login as program officer
  await page.goto('https://cif.fluxx.io/login');
  await page.getByLabel('Username').fill(process.env.FLUXX_USER!);
  await page.getByLabel('Password').fill(process.env.FLUXX_PASS!);
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Steps 2–4 · open submission #4821, reject with a reason, confirm
  await page.goto('https://cif.fluxx.io/grants/4821');
  await page.getByRole('button', { name: 'Reject' }).click();
  await page.getByLabel('Reason').fill('Out of programme scope');
  await page.getByRole('button', { name: 'Confirm rejection' }).click();

  // Expected · Status = Rejected
  await expect(page.getByTestId('grant-status')).toHaveText('Rejected');
});

The column mapper figures out which column is "title", which is "steps", and which is the test ID, even when the QA file has been re-saved nine times.
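
For flavour, a minimal sketch of how such fuzzy header matching can work; the alias lists and field names are illustrative, not the POC's actual mapper.

Column mapper · sketch

const FIELD_ALIASES: Record<string, string[]> = {
  id: ['test id', 'tc id', 'case id', 'id'],
  title: ['title', 'test name', 'scenario'],
  steps: ['steps', 'test steps', 'procedure'],
  expected: ['expected', 'expected result', 'acceptance'],
};

const normalise = (h: string) => h.toLowerCase().replace(/[^a-z0-9 ]/g, '').trim();

/** Map each sheet header to a canonical field, tolerating re-saved files. */
function mapColumns(headers: string[]): Map<number, string> {
  const mapping = new Map<number, string>();
  headers.forEach((header, col) => {
    const h = normalise(header);
    for (const [field, aliases] of Object.entries(FIELD_ALIASES)) {
      if (aliases.some(a => h === a || h.startsWith(a))) {
        mapping.set(col, field);
        break;
      }
    }
  });
  return mapping;
}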

§2 · The run

QA can watch tests run. So can their PM.

Run #215 · live
Real-time signal

Five pipeline stages, live console straight from the runner. No SSH, no CLI, no "did the build pass?" Slack message at 11pm.
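
One way to build such a feed: a sketch assuming a Node host that spawns the runner and streams its output over Server-Sent Events. The route and command here are hypothetical, not the POC's API.

Live console · sketch

import { spawn } from 'node:child_process';
import http from 'node:http';

// Hypothetical endpoint: GET /runs/live streams runner output as SSE.
http.createServer((req, res) => {
  if (req.url !== '/runs/live') { res.writeHead(404); res.end(); return; }

  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });

  const run = spawn('npx', ['playwright', 'test', '--reporter=line']);
  const forward = (chunk: Buffer) =>
    chunk.toString().split('\n').forEach(line => res.write(`data: ${line}\n\n`));
  run.stdout.on('data', forward);
  run.stderr.on('data', forward);
  run.on('close', code => { res.write(`event: done\ndata: exit ${code}\n\n`); res.end(); });
}).listen(4000);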

Run artifacts

Trace, screenshot, video and the generated .spec.ts file — all linked from the run detail page. Re-run a single case in one click.
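
These artifacts fall out of standard Playwright settings. A plausible playwright.config.ts for the runner; the values are our guess, not the POC's file.

playwright.config.ts · sketch

import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    trace: 'retain-on-failure',    // trace-viewer file for every failed case
    screenshot: 'only-on-failure', // PNG linked from the run detail page
    video: 'retain-on-failure',    // replayable recording of the failure
  },
  reporter: [['html'], ['json', { outputFile: 'run-results.json' }]],
});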

§2 · A real failure caught

TC-015: Reject grant submission — failed

Async race condition. The assertion fired before the API responded. The kind of bug a manual QA might miss because the screen "looks right" after a second.

The error

AssertionError
expect(locator).toHaveText('Rejected')

Expected: Rejected
Received: Pending review
At: grant_approval.spec.ts:142

Claude's analysis

The reject button triggered the request, but the status update is async. The assertion fired before the API responded.

Suggestion: Add waitForResponse('/api/grants/*/reject') before the status check.

Three failure modes the runner has already surfaced: async race, brittle locator, real backend bug. We file the third with the dev team — the first two, we fix in seconds.
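
A sketch of that three-way triage as a classifier over the error text; the heuristics are illustrative, not the runner's actual rules.

Failure triage · sketch

type FailureMode = 'async-race' | 'brittle-locator' | 'backend-bug';

// Illustrative heuristics: real triage would also look at traces and retries.
function classify(error: string): FailureMode {
  // Assertion saw a stale value: the UI likely hadn't caught up yet.
  if (/toHaveText|toBeVisible/.test(error) && /pending|stale/i.test(error))
    return 'async-race';
  // Selector no longer resolves: the page changed, not the behaviour.
  if (/locator.*not found|strict mode violation|waiting for locator/i.test(error))
    return 'brittle-locator';
  // Everything else gets filed with the dev team.
  return 'backend-bug';
}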

§2 · Claude's fix

From diagnosis to a two-line patch.

grant_approval.spec.ts · diff

+ const rejected = page.waitForResponse(r =>
+   r.url().includes('/api/grants/') && r.url().includes('/reject')
+ );
  await page.getByRole('button', { name: 'Confirm rejection' }).click();
+ await rejected;
  await expect(page.getByTestId('grant-status')).toHaveText('Rejected');
Why this matters

The runner doesn't just say "TC-015 failed." It explains why, proposes a fix, and lets QA re-run with one click. Bug found → patched → re-verified in under three minutes. That used to be a half-day investigation.

§2 · So far

What the POC shipped.

3 · Systems wired · Fluxx, D365, Power BI on the same runner
5 · Pipeline stages · Validate → Push, fully traced
1-click · Re-run · single case, suite, or full module
~3 min (illustrative) · Find → fix → verify · for the failure modes Claude understands
Phase 1 · Done

Runner & portal

  • Custom test runner with on-demand execution
  • Excel ingest, column mapping, module detection
  • Pre-built specs for grant approval / lead capture / dashboard filter
  • Claude analysis card on every failure
  • Suites, runs history, traces, screenshots
Phase 2 · Proposed

What if every manual case became a runnable spec?

Not just the three flows we hand-picked for the POC. The whole catalogue of 1,284 cases: converted, organised into suites, and rerun on every PR.

Today · per case (illustrative)

QA writes the manual case. QA executes it. Every release. Defects surface late — sometimes after the donor has seen the dashboard.

~25 min · per case, per release
×1,284 · cases in catalogue

With Phase 2 · per case (illustrative)

QA writes the manual case once. AI translates it. The runner enforces it forever. QA reviews failures — not green runs.

~8s
per case · per release
~1 day
catalogue, end to end

Numbers on this slide are illustrative projections — not measured. They assume average case complexity and 2× concurrency. Real numbers will land somewhere in this ballpark once we have a baseline week of runs.
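
The back-of-envelope behind those figures, using the slide's own assumptions; every constant here is the illustrative value, not a measurement.

Projection math · sketch

const CASES = 1_284;

// Manual: ~25 min per case, per release.
const manualHours = (CASES * 25) / 60;       // ≈ 535 person-hours

// Automated: ~8 s per case at 2× concurrency.
const machineHours = (CASES * 8) / 2 / 3600; // ≈ 1.4 wall-clock hours

console.log({ manualHours, machineHours });
// The "~1 day end to end" figure leaves room for queueing, retries and
// failure triage on top of the ~1.4 h of pure machine time.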

§3 · The conversion loop

Engineer + AI + runner — a feedback loop that gets faster every week.

01 · INTAKE · Submit cases: QA uploads the manual catalogue (same Excel, same columns).
02 · DRAFT · AI generates: Claude proposes a Playwright spec per case, citing locators.
03 · REVIEW · Engineer in the loop: quick human sign-off; fixes the 10% that need judgement.
04 · VERIFY · Runner executes: specs run against staging. Any failure → Claude analysis.
05 · BANK · Add to suite: approved cases join the regression suite. Forever.
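
The loop, condensed to code. Every helper below is a hypothetical stand-in (draftSpec, reviewDraft, runSpec), not the POC's API.

Conversion loop · sketch

interface Case { id: string; steps: string[]; expected: string; }
interface Draft { caseId: string; spec: string; }

declare function draftSpec(c: Case): Promise<Draft>;              // 02 · Claude proposes
declare function reviewDraft(d: Draft): Promise<Draft | null>;    // 03 · engineer signs off
declare function runSpec(d: Draft): Promise<'passed' | 'failed'>; // 04 · verify on staging

async function convertCatalogue(cases: Case[]): Promise<Draft[]> {
  const suite: Draft[] = [];
  for (const c of cases) {
    const approved = await reviewDraft(await draftSpec(c));
    if (approved && (await runSpec(approved)) === 'passed')
      suite.push(approved);                                       // 05 · banked, forever
  }
  return suite;
}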
The compounding effect

Every converted case stays converted. Every new manual case gets the same treatment. The library grows; the cost per case falls.

§3 · What changes

Three teams. Three new defaults.

CIFF QA · Manual → exploratory

Stops re-running the same 1,284 cases. Spends time on new flows, exploratory testing, and edge cases that actually need a human.

CIFF engineering · Find bugs sooner

Regression runs on every PR. Failures land in chat with a Claude diagnosis and a proposed patch. Less back-and-forth.

PM

Programme leads

Release with confidence

Dashboard goes from "QA says it's fine" to a real-time pass-rate per module, per release, with trace evidence.

§3 · Roadmap

Six months to a self-healing regression suite.

M0–M1 · NOW · Stabilise Phase 1: harden the runner, onboard the QA engineer, run weekly suites against staging.
M2–M3 · AI conversion v1: engineer + Claude convert the top 200 cases, targeting the high-touch Fluxx & D365 flows.
M3–M4 · Suite library: smoke, regression, and integration suites per module. CIFF can mix and match (suite config sketched below).
M5–M6 · CI gate: suite runs on every PR; releases are gated on green, and failures route to dev with a diagnosis.

No big-bang migration. Each milestone ships value on its own. If we stop at M3, CIFF still keeps the 200-case library.
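
What the M3–M4 suite library might look like in Playwright terms; project names, directories and tags are illustrative.

Suite library · playwright.config.ts sketch

import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    { name: 'fluxx-smoke', testDir: 'specs/fluxx', grep: /@smoke/ },
    { name: 'fluxx-regression', testDir: 'specs/fluxx' },
    { name: 'd365-regression', testDir: 'specs/d365' },
    { name: 'pbi-dashboards', testDir: 'specs/pbi' },
  ],
});

// M5–M6: CI runs e.g. `npx playwright test --project=fluxx-smoke` on every
// PR and gates the merge on a green exit code.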

§4 · The ask

Let's stop testing the
same things twice.

What we need to start Phase 2: a green light, access to staging for the three systems, and one QA hour a week to validate the converted cases. We bring the engineer, the AI, and the runner.

Next step · Scope workshop this month · Contact · abhijit.nair@dhwaniris.com

CIFF × DHWANI · MAY 2026 · v0.1