Jonathan Capone — Build Environment for Testing & Analytics

The Architecture

Holistic Progress Tracking.

Before BETA, proof generation, project documentation, and metrics lived in separate silos. BETA unifies them in a single local dashboard where decisions are driven by measured metrics. It ingests report.json files from proof runs, scans codebases, manages SQLite histories, and serves as the tracking backbone for the OMEGA suite.

Data Quarantine

Authentic Evidence

BETA strictly separates real field/bench evidence from mock/demo data. If a report is flagged as synthetic, it is quarantined and prevented from artificially inflating progress metrics.

Local Dashboard

Stdlib HTTP Server

A dependency-free Python server (built on http.server and socketserver.ThreadingTCPServer) serves JSON data and generated HTML from local SQLite workspaces. No web framework, no external runtime dependencies.
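
A minimal sketch of that kind of server, using only the standard library. The /api/metrics route, the metrics table, and the names make_handler, LocalServer, and start_server are hypothetical and chosen for illustration; BETA's real routes and schema are not described here.

```python
import json
import socketserver
import sqlite3
from http.server import BaseHTTPRequestHandler


def make_handler(db_path):
    """Return a handler class bound to one SQLite workspace file."""

    class DashboardHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/api/metrics":  # hypothetical route
                self.send_error(404, "unknown path")
                return
            # One connection per request: sqlite3 connections should not
            # be shared across the server's worker threads.
            conn = sqlite3.connect(db_path)
            try:
                rows = conn.execute(
                    "SELECT name, value FROM metrics ORDER BY name"
                ).fetchall()
            finally:
                conn.close()
            body = json.dumps(dict(rows)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence per-request logging in this sketch

    return DashboardHandler


class LocalServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # worker threads exit with the main process


def start_server(db_path, port=0):
    """Bind to localhost only; port 0 lets the OS pick a free port."""
    return LocalServer(("127.0.0.1", port), make_handler(db_path))
```

Calling start_server(path).serve_forever() would serve the feed until interrupted. Binding to 127.0.0.1 keeps the dashboard local to the machine.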

Advisory AI

Ollama Integration

BETA can optionally use local, privacy-preserving Ollama models (e.g. Llama 3.2, Gemma, Qwen) as an advisory analyst and project manager. The deterministic tracking engine remains the source of truth; AI output is marked advisory and gated by real evidence.
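
One way the advisory gate could look, sketched against Ollama's local HTTP endpoint (POST /api/generate on port 11434). The function names, the shape of the returned record, and the rule "skip the model entirely when there are no real reports" are assumptions made for illustration, not BETA's actual behavior.

```python
import json
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def build_request(model, prompt):
    """Prepare a non-streaming generate call for a local Ollama model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def as_advisory(model, text, real_report_count):
    """Wrap model output so it is never mistaken for measured evidence."""
    return {
        "kind": "advisory",
        "model": model,
        "text": text,
        # Assumed gate: advice only counts once real evidence exists.
        "usable": real_report_count > 0,
    }


def ask_analyst(model, prompt, real_report_count, timeout=60):
    """Query the model only when there is real evidence to reason about."""
    if real_report_count == 0:
        return as_advisory(model, "", 0)
    with urlopen(build_request(model, prompt), timeout=timeout) as resp:
        answer = json.loads(resp.read())["response"]
    return as_advisory(model, answer, real_report_count)
```

Keeping the model name in every record makes it possible to tell later which model produced a given suggestion.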

Engineering Journal

Evolving the Dashboard.

The Bottleneck

The CLI Limits

Initially conceived as just the "OMEGA proof generator", the tool was a simple command-line script. Once the ecosystem grew to cover hardware and team coordination, terminal output alone could no longer present the volume of data it produced.

The Migration

Generated Web UI

Moving to a local web dashboard was required to visualize the sprawling data. BETA generates static HTML plus a JSON data feed served by a small stdlib server, which let the tool track itself (using BETA on BETA) alongside unrelated hardware projects.

Data Flow

Unified Ingestion Pipeline

  Data Ingestion                          Analytics Engine                 Dashboard UI
  ──────────────                          ────────────────                 ────────────
  report.json files ──┐                   Quarantine & Sanitize            Serve Local Dashboard
  Project Scans       │                   Calculate Deltas                 (stdlib http.server)
  GitHub issues + CI ─┼─► SQLite DB ──►   Daily Snapshot + Trend ─► JSON ► Action Tracker
  AI Sessions       ──┘                   Generate Reports                 Trend Lines
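
The first hop of the diagram (report.json files into SQLite) might be sketched as follows. The reports table, its columns, the "scenario" key, and the function name ingest_reports are hypothetical; the real BETA schema is not shown in this document.

```python
import json
import sqlite3
from pathlib import Path

# Hypothetical table: one row per report file, raw payload kept as JSON text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reports (
    path     TEXT PRIMARY KEY,
    scenario TEXT,
    payload  TEXT NOT NULL
)
"""


def ingest_reports(root, conn):
    """Load every report.json under root; return how many were stored."""
    conn.execute(SCHEMA)
    stored = 0
    for report_path in sorted(Path(root).rglob("report.json")):
        try:
            report = json.loads(report_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # unreadable files are skipped, not guessed at
        if not isinstance(report, dict):
            continue
        # INSERT OR REPLACE keeps re-ingestion of the same file idempotent.
        conn.execute(
            "INSERT OR REPLACE INTO reports (path, scenario, payload) "
            "VALUES (?, ?, ?)",
            (str(report_path), report.get("scenario"),
             json.dumps(report, sort_keys=True)),
        )
        stored += 1
    conn.commit()
    return stored
```

Keying rows on the file path means running the ingest twice over the same directory does not duplicate evidence.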

Automation Layer

One Command Interface.

A single PowerShell helper (dev.ps1) wraps the Python CLI behind a validated set of verbs, so the day-to-day loop does not require remembering raw module flags. It bootstraps and configures workspaces, ingests external report.json artifacts and project info, records measured evidence and work sessions, runs the test suite, serves the dashboard, and invokes the local AI analyst.

  .\dev.ps1 doctor           # environment + workspace checks
  .\dev.ps1 init-project     # bootstrap a tracked workspace
  .\dev.ps1 evidence-template / record-evidence   # capture measured runs
  .\dev.ps1 record-work      # log a work session to the ledger
  .\dev.ps1 test / run-tests # BETA suite, or allowlisted per-project test commands
  .\dev.ps1 github-import / import-tests   # pull issues + milestones, CI / test results
  .\dev.ps1 snapshot-project # persist a dated snapshot for trend lines
  .\dev.ps1 refresh / serve  # rebuild + serve the local dashboard
  .\dev.ps1 ai / ask / manage-ai  -Model gemma4:latest   # advisory Ollama

Raw python -m beta <command> calls still work; dev.ps1 is a convenience layer over the same engine. The Python package itself ships with zero third-party dependencies.

What It Measures

Metrics, Not Vibes.

The deterministic engine derives a fixed set of tracked metrics from ingested evidence, then consolidates them into a single prioritized backlog. Nothing here is a vanity number — each metric maps back to a signal in the data.

Proof Health

Evidence Quality

Correctness, effectiveness, efficiency, evidence coverage, backlog health, and regression load, plus per-claim coverage and data-quality checks (baseline availability, repeatability, scenario diversity, sample size).

Core OMEGA Metrics

Run Throughput

Ingest throughput, write p95, query p95, station coverage, bathymetry, mesh routes, portal payload, database bytes per observation, and wire savings — compared run-over-run against the same scenario.
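
A run-over-run comparison of this kind could be sketched as below. The metric keys, the set of lower-is-better metrics, and the report shape ("scenario" plus a "metrics" mapping) are assumptions for illustration only.

```python
# Assumed direction table: latencies and storage cost improve when they drop;
# everything else is treated as higher-is-better.
LOWER_IS_BETTER = {"write_p95_ms", "query_p95_ms", "db_bytes_per_observation"}


def compare_runs(previous, current):
    """Label each metric in the current run against the previous run."""
    if previous["scenario"] != current["scenario"]:
        raise ValueError("runs must share a scenario to be comparable")
    verdicts = {}
    for name, new in current["metrics"].items():
        old = previous["metrics"].get(name)
        if old is None or new is None:
            verdicts[name] = "not comparable"  # no baseline, so no verdict
            continue
        delta = new - old
        if delta == 0:
            verdicts[name] = "unchanged"
            continue
        better = delta < 0 if name in LOWER_IS_BETTER else delta > 0
        verdicts[name] = "improved" if better else "regressed"
    return verdicts
```

Refusing to compare across scenarios avoids reporting a "regression" that is really just a different workload.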

Workflow Health

Project Velocity

Evidence velocity, repeat depth, connected-source coverage, active-measurement coverage, and AI availability, alongside PM metrics for runs/scenarios tracked, improved vs regressed metrics, and open P0/P1 work.

Action Tracker

Metrics → Prioritized Backlog

The Plan, Manager, and AI screens consolidate the operating plan, deterministic work guidance, measurement plan, manager risks, missing data sources, metric backlog, and advisory Ollama suggestions into statused action records. Each action carries a source, priority, metric, next step, success signal, and evidence requirement. Actions can be active, blocked by missing real data, gated by manager risks, or flagged as advisory AI items — AI can feed the tracker, but status stays gated by real project evidence.
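
The action record described above could be modeled roughly like this. The field names follow the text; the two boolean flags, the precedence of the status checks, the "ollama" source label, and the P2 priority level are assumptions for the sketch.

```python
from dataclasses import dataclass

PRIORITY_ORDER = {"P0": 0, "P1": 1, "P2": 2}  # P2 is an assumed extra level


@dataclass
class Action:
    source: str                # e.g. "metric-backlog", "manager", "ollama"
    priority: str
    metric: str
    next_step: str
    success_signal: str
    evidence_requirement: str
    has_real_evidence: bool = False   # hypothetical flag
    open_manager_risk: bool = False   # hypothetical flag

    def status(self) -> str:
        # Assumed precedence: missing evidence blocks first, then manager
        # risks gate, then AI-sourced items stay advisory.
        if not self.has_real_evidence:
            return "blocked"
        if self.open_manager_risk:
            return "gated"
        if self.source == "ollama":
            return "advisory"
        return "active"


def prioritized(actions):
    """Order the backlog by priority; unknown labels sort last."""
    return sorted(
        actions,
        key=lambda a: PRIORITY_ORDER.get(a.priority, len(PRIORITY_ORDER)),
    )
```

Deriving status from evidence flags, rather than storing it, means an AI suggestion cannot mark itself active.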

Honest Tracking

The Blank-Chart Trade-off.

BETA's whole point is to not lie about progress. That choice has a deliberate, visible cost: until real field or bench evidence is provided, the charts can be entirely blank.

By Design

Blank Until Proven

If no accepted real report exists, project pages show "Needs data" and keep all proof charts empty. An empty chart is treated as the honest answer, not a bug to paper over with placeholder data. Likewise, a software metric with no measurement is shown as "not measured", never as a misleading 0.0.
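
The distinction between "no measurement" and "measured zero" can be kept explicit by using None for missing values. A minimal sketch, with hypothetical function names and an assumed "ready" state for pages that do have data:

```python
from typing import Optional


def format_metric(value: Optional[float], unit: str = "") -> str:
    """Render a metric; a missing measurement is never shown as 0.0."""
    if value is None:
        return "not measured"
    suffix = f" {unit}" if unit else ""
    return f"{value:.1f}{suffix}"


def page_state(accepted_real_reports: int) -> str:
    """Project pages stay in 'Needs data' until a real report is accepted."""
    return "Needs data" if accepted_real_reports == 0 else "ready"
```

For example, format_metric(None) yields "not measured", while format_metric(0.0) yields "0.0" because a real zero was recorded.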

Quarantine

Marker Set

Reports carrying demo, synthetic, simulated, fixture, mock, dummy, fake, example, placeholder, template, or draft markers, plus the OMEGA-specific local-proof and coastal-demo tags, are quarantined before metrics run (smoke is treated as a soft marker). Quarantined reports stay visible as audit inputs but never drive charts, scores, or AI analysis.
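
The marker check could be sketched as a set intersection. The marker names come from the list above; where a report carries its markers (a "tags" list here) and what a soft marker does (reported separately, not quarantined) are assumptions, since the text does not specify them.

```python
HARD_MARKERS = frozenset({
    "demo", "synthetic", "simulated", "fixture", "mock", "dummy", "fake",
    "example", "placeholder", "template", "draft",
    "local-proof", "coastal-demo",  # OMEGA-specific tags
})
SOFT_MARKERS = frozenset({"smoke"})


def classify_report(report):
    """Return 'quarantined', 'soft-flagged', or 'real' for one report."""
    # Assumption: markers live in a "tags" list; matching ignores case.
    tags = {str(tag).strip().lower() for tag in report.get("tags", [])}
    if tags & HARD_MARKERS:
        return "quarantined"
    if tags & SOFT_MARKERS:
        return "soft-flagged"
    return "real"


def partition_reports(reports):
    """Group reports by class; nothing is dropped, so audits see everything."""
    groups = {"real": [], "soft-flagged": [], "quarantined": []}
    for report in reports:
        groups[classify_report(report)].append(report)
    return groups
```

Only the "real" group would feed charts and scores; the other groups remain available as audit inputs.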

Provenance

Data & Model Lineage

Every dashboard JSON and exported report records the BETA version, analysis schema, data-version ID, project config versions, real vs quarantined report counts, and the Ollama model used for advisory review — so you can tell which model and which data shaped a result without treating model output as proof.

Version History & Deltas

Tracking Its Own Drift

When dashboards or reports are generated, BETA appends compact workspace/project records to version-history.json and compares the latest scoped record against the previous one — calling out material changes such as a different AI model, new real evidence, fewer quarantined reports, or a changed config version. BETA also persists a dated project snapshot each day, so every tracked metric carries a trend line over time rather than just a latest value. Each tracked project (for example omega and beta itself) gets its own scoped page and data feed; the workspace dashboard is only a project selector and does not combine metrics across projects.
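
The append-and-compare step could look roughly like this. The file layout (a JSON list of records), the "scope" key, the watched field names, and the function name append_and_diff are assumptions for illustration; the actual format of version-history.json is not documented here.

```python
import json
from pathlib import Path

# Assumed fields whose change counts as "material".
WATCHED = ("ai_model", "real_reports", "quarantined_reports", "config_version")


def append_and_diff(history_path, record):
    """Append a record, then diff it against the previous one in its scope."""
    path = Path(history_path)
    history = (
        json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    )
    # Look backwards for the latest record with the same scope, so one
    # project is never compared against another.
    previous = next(
        (r for r in reversed(history) if r.get("scope") == record.get("scope")),
        None,
    )
    history.append(record)
    path.write_text(json.dumps(history, indent=2), encoding="utf-8")
    if previous is None:
        return {}
    return {
        key: (previous.get(key), record.get(key))
        for key in WATCHED
        if previous.get(key) != record.get(key)
    }
```

The returned mapping holds (old, new) pairs for changed fields only, so an empty result means nothing material moved since the last scoped record.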

BETA Framework.