BETA - BETA

validate

Current run is stable; strengthen evidence depth.

Comparable metrics are mostly unchanged, so stronger samples and external evidence matter more.

Data quality 69% thin

Focus Metrics

No metric regressions. Keep collecting repeated runs and broader scenarios.

Next Moves

Software Test Pass Rate P1 | Software QA
Dashboard Render Smoke P1 | Frontend QA
Project Control Coverage P1 | Product/project manager
AI Recommendation Follow-Through P1 | AI/project manager

Missing Inputs

Sample Scale: Run proof scenarios at 100, 500, and 1000+ observations and compare the curves.
Source Connection Plan: Register planned sources, then connect real files or folders as they become available.
AI Analyst: Use auto-strong for serious reviews; keep deterministic metrics as the source of truth.
Field And Bench Logs: Create a simple CSV or report.json path for bench and field validation evidence.
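The field and bench logs item above calls for a simple CSV path. This is a minimal Python sketch of turning a CSV log into an accepted-observation count and analysis.evidence[] entries. The column names (timestamp, value, source, note), the sample rows and csv_to_evidence are all hypothetical. BETA's own CSV import format is not described on this page.

```python
import csv
import io

# Hypothetical bench/field log layout; adjust the column names to your own log.
SAMPLE_CSV = """timestamp,value,source,note
2026-06-22T18:00:00Z,12.5,bench-log-01.txt,warm start
2026-06-22T18:01:00Z,12.7,bench-log-01.txt,steady state
"""


def csv_to_evidence(text: str) -> dict:
    """Convert CSV rows into an accepted-observation count and evidence entries."""
    rows = list(csv.DictReader(io.StringIO(text)))
    evidence = [
        {
            # summary: short measured source note
            "summary": f"{row['timestamp']} {row['note']}: value {row['value']}",
            # source: where a reviewer can trace the measurement back to
            "source": row["source"],
        }
        for row in rows
    ]
    return {"observations_accepted": len(rows), "evidence": evidence}


if __name__ == "__main__":
    print(csv_to_evidence(SAMPLE_CSV))
```

Keeping one row per measurement means the observation count and the evidence trail come from the same file. That makes the count easy to audit.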

Real Evidence Capture Kit

BETA has real evidence; repeat matching scenarios to prove improvement.

collect-repeat-baseline

Starter Scenarios

bench | P1

Bench Validation Baseline

Creates the first accepted real baseline so charts and claims stop being empty.

.\dev.ps1 evidence-template -ProjectKey beta -ScenarioId bench-validation -CollectionType bench

.\dev.ps1 record-evidence -ProjectKey beta -ScenarioId bench-validation -CollectionType bench -RequiredPassed -AdvisoryPassed -ObservationsAccepted 100

One measured report imports as real, required gates pass, and the dashboard shows one real run.

field | P1

Field-Link Load Check

Connects portal/API responsiveness to real network conditions instead of local-only timing.

.\dev.ps1 evidence-template -ProjectKey beta -ScenarioId field-link-load -CollectionType field

.\dev.ps1 record-evidence -ProjectKey beta -ScenarioId field-link-load -CollectionType field -RequiredPassed -ObservationsAccepted 100 -PortalHtmlBytes 1 -QueryP95Ms 1

Portal payload and query p95 are recorded from a constrained or remote path.
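The field-link check records a query p95 value. This is a minimal sketch of one common way to compute p95 from raw latency samples: the nearest-rank method. It is an assumption that BETA or dev.ps1 computes p95 this way rather than by interpolation. p95_nearest_rank and the sample timings are hypothetical.

```python
def p95_nearest_rank(samples_ms: list[float]) -> float:
    """95th percentile by the nearest-rank method (no interpolation)."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Nearest rank = ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical query timings (ms) captured over a slow or remote link.
    timings = [110, 95, 240, 130, 120, 105, 98, 101, 99, 400]
    print(p95_nearest_rank(timings))
```

With only a handful of samples, p95 is effectively the slowest request. This is one more reason the 100+ observation runs matter for latency claims.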

ci | P1

CI Regression Evidence

Adds build/test pass history so project progress is not inferred only from proof runs.

.\dev.ps1 evidence-template -ProjectKey beta -ScenarioId ci-regression -CollectionType ci

.\dev.ps1 record-evidence -ProjectKey beta -ScenarioId ci-regression -CollectionType ci -RequiredPassed -ObservationsAccepted 1

CI result evidence is attached and repeat failures trend down over time.
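"Repeat failures trend down over time" can be checked by comparing failure rates across two windows of CI runs. This is a minimal sketch, assuming CI results are available as a list of pass/fail booleans. The two-window comparison and the function names are hypothetical and are not BETA's ci_failure_rate implementation.

```python
from typing import Optional


def ci_failure_rate(results: list[bool]) -> Optional[float]:
    """Fraction of CI runs that failed; each item is True for a passing run."""
    if not results:
        return None
    failures = sum(1 for passed in results if not passed)
    return failures / len(results)


def failures_trending_down(older: list[bool], newer: list[bool]) -> bool:
    """True when the newer window of runs fails less often than the older window."""
    old_rate, new_rate = ci_failure_rate(older), ci_failure_rate(newer)
    if old_rate is None or new_rate is None:
        return False  # not enough history to judge a trend
    return new_rate < old_rate


if __name__ == "__main__":
    earlier_runs = [True, False, False, True]
    recent_runs = [True, True, False, True]
    print(ci_failure_rate(recent_runs), failures_trending_down(earlier_runs, recent_runs))
```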

Capture Steps

1. Create a template: Generate a non-imported template so the team can see the expected report shape.
   .\dev.ps1 evidence-template -ProjectKey beta -ScenarioId bench-validation -CollectionType bench
2. Collect measured values: Run a real bench, field, CI, firmware, hardware, or operator check and keep the notes and logs. Do not mark the report as real until its values come from an actual measured run.
3. Record the report: Use record-evidence with the measured values so BETA writes report.json to the project evidence folder.
   .\dev.ps1 record-evidence -ProjectKey beta -ScenarioId bench-validation -CollectionType bench -RequiredPassed -AdvisoryPassed -ObservationsAccepted 100
4. Repeat the same scenario: A second matching real run is what turns current-state evidence into improvement evidence.
   .\dev.ps1 refresh

Required Report Fields

Field	Expected value	Why it matters
metadata.data_authenticity	real	Lets BETA accept the report as usable evidence instead of quarantining it.
metadata.collection_type	bench | field | ci | firmware | hardware | operator | measured	Tells BETA what kind of real-world source produced the measurement.
scenario.scenario_id	stable scenario id	Matching scenario ids allow before/after comparisons over time.
overall.required_passed	true/false	Correctness gates are the minimum proof before performance claims matter.
metrics.ingest.counts.observations.accepted	integer	Sample size affects confidence in throughput, storage, and latency claims.
analysis.evidence[].summary	short measured source note	Keeps the metric tied to the actual test, bench note, CI run, or field observation.
analysis.evidence[].source	log/photo/serial/CI/bench source reference	Lets a reviewer trace the metric back to the file, capture, or note that produced it.
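This is a minimal Python sketch of a report.json that carries the required fields, plus a check that flags missing or invalid ones. The nesting is inferred from the dotted field names. The sample values, the bench-notes.txt source and the helper names (get_path, report_problems) are hypothetical. BETA's actual report schema may require more than this.

```python
import json
from typing import Any

# Dotted paths from the Required Report Fields list.
REQUIRED_PATHS = [
    "metadata.data_authenticity",
    "metadata.collection_type",
    "scenario.scenario_id",
    "overall.required_passed",
    "metrics.ingest.counts.observations.accepted",
]
COLLECTION_TYPES = {"bench", "field", "ci", "firmware", "hardware", "operator", "measured"}


def get_path(report: dict, dotted: str) -> Any:
    """Follow a dotted path through nested dicts; return None if any key is absent."""
    node: Any = report
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def report_problems(report: dict) -> list[str]:
    """List the reasons a report would not count as usable real evidence."""
    problems = [path for path in REQUIRED_PATHS if get_path(report, path) is None]
    authenticity = get_path(report, "metadata.data_authenticity")
    if authenticity is not None and authenticity != "real":
        problems.append("metadata.data_authenticity must be 'real'")
    ctype = get_path(report, "metadata.collection_type")
    if ctype is not None and ctype not in COLLECTION_TYPES:
        problems.append(f"unknown metadata.collection_type: {ctype}")
    evidence = get_path(report, "analysis.evidence") or []
    if not evidence:
        problems.append("analysis.evidence[] is empty")
    for i, item in enumerate(evidence):
        for key in ("summary", "source"):
            if not item.get(key):
                problems.append(f"analysis.evidence[{i}].{key} is missing")
    return problems


example_report = {
    "metadata": {"data_authenticity": "real", "collection_type": "bench"},
    "scenario": {"scenario_id": "bench-validation"},
    "overall": {"required_passed": True},
    "metrics": {"ingest": {"counts": {"observations": {"accepted": 100}}}},
    "analysis": {"evidence": [{"summary": "Bench run, 100 readings", "source": "bench-notes.txt"}]},
}

if __name__ == "__main__":
    print(json.dumps(example_report, indent=2))
    print(report_problems(example_report))
```

In this sketch, overall.required_passed set to false still counts as present. A failed gate is valid evidence, just not passing evidence.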

Metric Field Map

Measurement	Report Field	Priority
Software Test Pass Rate (unit_test_pass_rate)	metrics.custom.unit_test_pass_rate	P1
Code Coverage (code_coverage_percent)	metrics.custom.code_coverage_percent	P2
Open Test Failures (test_failure_count)	metrics.custom.test_failure_count	P2
Dashboard Render Smoke (dashboard_render_success)	metrics.custom.dashboard_render_success	P1
Project Control Coverage (project_control_coverage)	metrics.custom.project_control_coverage	P1
Blocker And Todo Flow (open_blocker_count)	metrics.custom.open_blocker_count	P2
AI Recommendation Follow-Through (ai_recommendation_follow_through)	metrics.custom.ai_recommendation_follow_through	P1
Issue And CI History (ci_failure_rate)	metrics.custom.ci_failure_rate	P2
Test Pass Rate (test_pass_rate)	analysis.evidence[] or imported CI input	P2
Critical Workflow Success Rate (critical_workflow_success_rate)	metrics.custom.critical_workflow_success_rate	P2
P95 Latency (p95_latency)	metrics.custom.p95_latency	P2
Throughput (throughput)	metrics.custom.throughput	P2
Resource Cost (resource_cost)	metrics.custom.resource_cost	P2
Failure Rate (failure_rate)	analysis.evidence[] or imported CI input	P2
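The field map can be applied mechanically. This is a minimal Python sketch, assuming a report is loaded as nested dicts with metric values under metrics.custom. read_planned_metrics is a hypothetical helper. In its output, a None value means the planned metric has not been recorded yet.

```python
from typing import Any, Optional

# Metric id -> report field, copied from the Metric Field Map.
# None marks metrics read from analysis.evidence[] or an imported CI input instead.
FIELD_MAP: dict[str, Optional[str]] = {
    "unit_test_pass_rate": "metrics.custom.unit_test_pass_rate",
    "code_coverage_percent": "metrics.custom.code_coverage_percent",
    "test_failure_count": "metrics.custom.test_failure_count",
    "dashboard_render_success": "metrics.custom.dashboard_render_success",
    "project_control_coverage": "metrics.custom.project_control_coverage",
    "open_blocker_count": "metrics.custom.open_blocker_count",
    "ai_recommendation_follow_through": "metrics.custom.ai_recommendation_follow_through",
    "ci_failure_rate": "metrics.custom.ci_failure_rate",
    "test_pass_rate": None,
    "critical_workflow_success_rate": "metrics.custom.critical_workflow_success_rate",
    "p95_latency": "metrics.custom.p95_latency",
    "throughput": "metrics.custom.throughput",
    "resource_cost": "metrics.custom.resource_cost",
    "failure_rate": None,
}


def read_planned_metrics(report: dict) -> tuple[dict[str, Any], list[str]]:
    """Return values found under metrics.custom and the ids that need another input."""
    custom = report.get("metrics", {}).get("custom", {})
    values: dict[str, Any] = {}
    other_inputs: list[str] = []
    for metric_id, path in FIELD_MAP.items():
        if path is None:
            other_inputs.append(metric_id)
        else:
            # Every mapped path has the form metrics.custom.<key>.
            values[metric_id] = custom.get(path.rsplit(".", 1)[-1])
    return values, other_inputs
```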

Overall Health 68%

At risk. Weighted from proof, coverage, backlog, and regressions.

Stable

Proof Score 61%

Correctness, effectiveness, and efficiency score.

Evidence Coverage 40%

2 of 5 evidence signals present.

Data Quality 69%

Thin. Based on baselines, repeatability, sample size, and AI review.

Backlog Health 100%

Falls as high-priority findings and open work increase.

What This Is Tracking

BETA compares the latest accepted real proof report against the previous real run with the same scenario. Quarantined demo data is listed for audit but never drives health, progress, claims, or AI analysis.
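This is a minimal Python sketch of the rule above: only real reports are accepted, everything else is set aside for audit, and comparisons stay within one scenario. It assumes reports are flat dicts with hypothetical data_authenticity, scenario_id and generated keys. The real report.json nests these fields.

```python
def split_reports(reports: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate accepted real reports from quarantined ones (demo, synthetic, local)."""
    accepted = [r for r in reports if r.get("data_authenticity") == "real"]
    quarantined = [r for r in reports if r.get("data_authenticity") != "real"]
    return accepted, quarantined


def real_runs_for_scenario(reports: list[dict], scenario_id: str) -> list[dict]:
    """Accepted runs of one scenario, oldest first; the last two are latest and previous."""
    accepted, _ = split_reports(reports)
    # ISO-8601 UTC timestamps written in one format sort correctly as strings.
    return sorted(
        (r for r in accepted if r["scenario_id"] == scenario_id),
        key=lambda r: r["generated"],
    )
```

Quarantined reports never reach real_runs_for_scenario's result. A demo report therefore cannot become either the latest run or its baseline, even if it is newer.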

Software Quality: passing

35.0 / 40.0

Required checks, advisory checks, unit tests, and dashboard render success.

Project Control: thin

7.0 / 35.0

Setup, todos, work logs, evidence capture, reports, and AI workflow controls.

Workflow Visibility: thin

19.0 / 25.0

Project-specific metrics, blockers, todos, work logs, and real evidence lineage.
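The three category budgets add up to 100 points (40 + 35 + 25). The earned points add up to 61 (35 + 7 + 19), which matches the 61% Proof Score and the 61.00 run scores in the timeline. Below is a short Python check of that arithmetic. Treating the Proof Score as exactly this sum is an assumption, not something this page states.

```python
# Category scores as (earned, possible) points, copied from this page.
CATEGORY_SCORES = {
    "Software Quality": (35.0, 40.0),
    "Project Control": (7.0, 35.0),
    "Workflow Visibility": (19.0, 25.0),
}


def percent_of_possible(scores: dict[str, tuple[float, float]]) -> float:
    """Total earned points as a percentage of total possible points."""
    earned = sum(e for e, _ in scores.values())
    possible = sum(p for _, p in scores.values())
    return 100.0 * earned / possible


if __name__ == "__main__":
    for name, (earned, possible) in CATEGORY_SCORES.items():
        print(f"{name}: {100.0 * earned / possible:.1f}% of its budget")
    print(f"Total: {percent_of_possible(CATEGORY_SCORES):.1f}%")
```

Per category, Project Control earns only 20% of its 35-point budget. It is the largest single drag on the total.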

Project Management

Runs: 4 (proof reports imported)
Scenarios: 2 (distinct comparable scenarios)
Regressions: 0 (latest vs. matching baseline)
Improvements: 0 (latest vs. matching baseline)
Open Work: 0 (generated backlog items)
P1/P0 Load: 0 (urgent improvement items)
Sources: 9 (connected evidence inputs)
Planned Metrics: 14 (measurement plan items)

Tracked Projects

beta | planned

BETA

software | software

Goal: Use BETA to track and improve BETA itself
Files: 31 of 31
Inputs: 9 (ci: 2, docs: 5, repo: 2)
Todos: 2 open, 0 blocked, 50.00% done
AI: gemma4:latest

Next Actions

Clear: No generated actions for the latest run.

Progress Over Time

The latest comparable run is stable; expand evidence depth.

Metric Drilldowns

Metric	Latest	Previous	Status	Best	Why
Unit Test Pass Rate (percent of BETA unit tests passing during validation)	100.0 %	100.0 %	unchanged	100.0 % (beta-automated-test-run-2026-06-22t18-32-10z)	Unit Test Pass Rate is stable within the 2 percent noise band.
Code Coverage (how much of the code the passing tests actually exercise, so a high pass rate is not hiding untested paths)	0 %	0 %	no-baseline	0 % (beta-automated-test-run-2026-06-22t18-32-10z)	Code Coverage needs a matching baseline before progress can be judged.
Test Failures (failures point directly at what is broken right now and what to fix before the next claim of done)	0 count	0 count	unchanged	0 count (beta-automated-test-run-2026-06-22t18-32-10z)	Test Failures is stable within the 2 percent noise band.
Dashboard Render Success (whether dashboard regeneration succeeds after evidence import)	0 pass	0 pass	no-baseline	0 pass (beta-automated-test-run-2026-06-22t18-32-10z)	Dashboard Render Success needs a matching baseline before progress can be judged.
Project Control Coverage (coverage of setup, todo, work, evidence, and AI controls on project pages)	0 %	0 %	no-baseline	0 % (beta-automated-test-run-2026-06-22t18-32-10z)	Project Control Coverage needs a matching baseline before progress can be judged.
Open Blocker Count (open blocked todos or work blockers that need attention)	0 count	0 count	unchanged	0 count (beta-automated-test-run-2026-06-22t18-32-10z)	Open Blocker Count is stable within the 2 percent noise band.
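The Status column follows a simple decision rule. This is a minimal Python sketch with two assumptions:
- The 2 percent noise band is a relative change against the baseline.
- Each metric carries a direction. Per the Metrics Used gauges, higher is better for pass rate and coverage, and lower is better for failure and blocker counts.

classify is a hypothetical helper. A missing comparable run is passed as None.

```python
from typing import Optional

NOISE_BAND = 0.02  # the "2 percent noise band", assumed here to be relative change


def classify(latest: float, baseline: Optional[float], higher_is_better: bool,
             band: float = NOISE_BAND) -> str:
    """Label a metric as improved, regressed, unchanged, or no-baseline."""
    if baseline is None:
        return "no-baseline"  # no comparable run to compare against
    if baseline == 0:
        # Relative change is undefined at zero; fall back to the raw difference.
        if latest == 0:
            return "unchanged"
        went_up = latest > 0
    else:
        change = (latest - baseline) / abs(baseline)
        if abs(change) <= band:
            return "unchanged"
        went_up = change > 0
    return "improved" if went_up == higher_is_better else "regressed"
```

Stable metrics such as Unit Test Pass Rate at 100.0 % land in "unchanged". That is stability evidence, not improvement evidence.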

Run Timeline

Run	Generated	Posture	Score	Findings
omega-automated-test-run-2026-06-22t19-37-16z	2026-06-22T19:37:16Z	stable	61.00	0
beta-automated-test-run-2026-06-22t18-32-10z	2026-06-22T18:32:10Z	baseline	61.00	0
beta-real-validation-custom-metrics-20260606	2026-06-06T19:40:09Z	stable	61.00	1
beta-real-validation-20260606	2026-06-06T19:09:01Z	baseline	61.00	1

Why Ahead

No ahead signals: No metrics improved in the latest comparable run.

Why Behind

- Thin claim evidence: BETA's software checks are passing and the app can be trusted while it changes.
- Thin claim evidence: BETA provides useful project-management controls for setup, todos, work, evidence, reports, and AI review.
- Thin claim evidence: BETA can explain progress from real project work, todos, and evidence without mixing in unrelated domain metrics.

Working

No improving signals: Repeat scenarios after changes to identify where work is paying off.

Not Working

No current regressions: No regressing metrics in the latest comparable run.

Metrics Used

These are the current gauges. Each one has a direction, a latest value, and, when possible, a baseline value from a comparable run.

Metric	Latest	Baseline	Status	Gauge
Unit Test Pass Rate (percent of BETA unit tests passing during validation)	100.0 %	100.0 %	unchanged	The metric has a real baseline, a repeat run, and a clear decision rule.
Code Coverage (line coverage reported by the imported coverage file, Cobertura XML or coverage.py JSON)	0 %	0 %	no-baseline	Higher is better; set a project floor (often 70-80 percent) and avoid regressions against the last import.
Test Failures (number of failing tests plus errors in the imported test report)	0 count	0 count	unchanged	Lower is better; zero failing tests is the target for a green build.
Dashboard Render Success (whether dashboard regeneration succeeds after evidence import)	0 pass	0 pass	no-baseline	The metric has a real baseline, a repeat run, and a clear decision rule.
Project Control Coverage (coverage of setup, todo, work, evidence, and AI controls on project pages)	0 %	0 %	no-baseline	The metric has a real baseline, a repeat run, and a clear decision rule.
Open Blocker Count (open blocked todos or work blockers that need attention)	0 count	0 count	unchanged	The metric has a real baseline, a repeat run, and a clear decision rule.

Metric Strategy

BETA uses metrics for three jobs: prove the project works, prove it is improving, and decide what work deserves attention next.

compare-and-improve

metric family | 4

Proof Quality

Separates real evidence from templates, demos, and unsupported claims.

Planning use: Do this first when real_count is zero or reports are quarantined. Examples: real_report_available, required_passed, repeatability, scenario_diversity

metric family | 4

Effectiveness

Shows whether the build does the thing it claims to do.

Planning use: Use this when deciding whether the core workflow is ready for broader testing. Examples: observations_accepted, station_features, mesh_routes, firmware_identity_fields_present

metric family | 4

Efficiency

Shows whether the build does the work fast enough and cheaply enough.

Planning use: Use this after correctness is credible, or when field constraints are tight. Examples: observation_throughput_per_s, query_p95_ms, db_bytes_per_observation, portal_html_bytes

metric family | 4

Reliability

Shows whether good results repeat instead of appearing once.

Planning use: Use this before calling an improvement durable or release-ready. Examples: failure_rate, test_pass_rate, firmware_boot_success_rate, repeat_depth

metric family | 4

Project Management

Shows whether the work loop is producing useful evidence or wasting time.

Planning use: Use this to decide what to focus on, stop doing, connect, or delegate. Examples: work evidence percent, blocked/rework time, connected sources, planned measurements

Metric Purpose Map

Use this table to understand what each metric proves and what project decision it should drive.

Metric	Question	Why Important	Planning Decision	Done When
Software Test Pass Rate (unit_test_pass_rate)	What does Software Test Pass Rate prove for this project?	Proves BETA is still correct while project-management and AI features are added.	Use it to decide whether the next step is testing, fixing, scaling, or stopping.	Pass rate holds at the project target and regressions are linked to specific work.
Code Coverage (code_coverage_percent)	What does Code Coverage prove for this project?	Coverage shows how much of the code the passing tests actually exercise, so a green build is not hiding untested paths.	Use it to decide whether the next step is testing, fixing, scaling, or stopping.	Coverage holds above the project floor and does not regress against the last import.
Open Test Failures (test_failure_count)	What does Open Test Failures prove for this project?	Failing tests point directly at what is broken right now and block any honest claim of done.	Use it to decide whether the next step is testing, fixing, scaling, or stopping.	Test failures reach zero and any new failure is linked to a specific change.
Dashboard Render Smoke (dashboard_render_success)	What does Dashboard Render Smoke prove for this project?	The app has to rebuild and load project pages after evidence changes.	Use it to decide whether the next step is testing, fixing, scaling, or stopping.	Dashboard rebuild succeeds and the project page opens with the expected scoped data.
Project Control Coverage (project_control_coverage)	What does Project Control Coverage prove for this project?	BETA needs no-command controls for setup, todos, work, evidence, reports, and AI management.	Use it to decide whether the next step is testing, fixing, scaling, or stopping.	Coverage rises when useful controls are added and browser-verified.
Blocker And Todo Flow (open_blocker_count)	What does Blocker And Todo Flow prove for this project?	Open blockers explain what is behind schedule and what the project manager AI should focus on.	Use it to decide whether the next step is testing, fixing, scaling, or stopping.	Open blocker count trends down or each blocker has a named next action.
AI Recommendation Follow-Through (ai_recommendation_follow_through)	What does AI Recommendation Follow-Through prove for this project?	AI advice should become tracked actions whose impact can be measured later.	Use it to decide whether the next step is testing, fixing, scaling, or stopping.	High-value AI recommendations have a decision, owner, and result metric.
Issue And CI History (ci_failure_rate)	What does Issue And CI History prove for this project?	Build failures, flaky tests, and unresolved issues are real workflow health signals.	Use it to decide whether the next step is testing, fixing, scaling, or stopping.	Failure rate and stale issue load trend down over repeated project cycles.
Test Pass Rate (test_pass_rate)	Are basic software checks staying green?	Baseline correctness and regression signal.	Use this to protect known-good behavior before new experiments.	Metric appears in the dashboard with a baseline and trend.
Critical Workflow Success Rate (critical_workflow_success_rate)	What does Critical Workflow Success Rate prove for this project?	Measures whether the product does what it claims.	Use it to decide whether the next step is testing, fixing, scaling, or stopping.	Metric appears in the dashboard with a baseline and trend.
P95 Latency (p95_latency)	What does P95 Latency prove for this project?	Tracks user-facing or control-loop responsiveness.	Use it to decide whether the next step is testing, fixing, scaling, or stopping.	Metric appears in the dashboard with a baseline and trend.
Throughput (throughput)	What does Throughput prove for this project?	Tracks capacity and efficiency.	Use it to decide whether the next step is testing, fixing, scaling, or stopping.	Metric appears in the dashboard with a baseline and trend.
Resource Cost (resource_cost)	What does Resource Cost prove for this project?	Tracks efficiency across software and hardware.	Use it to decide whether the next step is testing, fixing, scaling, or stopping.	Metric appears in the dashboard with a baseline and trend.
Failure Rate (failure_rate)	How often does the project fail or repeat the same problem?	Tracks stability and reliability.	Use this to decide whether to stabilize before adding scope.	Metric appears in the dashboard with a baseline and trend.

How To Use Metrics For Planning

1. Pick one project goal or claim.
2. Choose the metric that would prove movement toward that goal.
3. Collect one real baseline report or source input.
4. Make one focused change or run one focused bench/field/test cycle.
5. Repeat the same scenario and compare the metric to the baseline.
6. Use the Manager screen to turn the result into the next priority, risk, or stop-doing item.

Project Goals Driving Metrics

- Use BETA to track and improve BETA itself
- Measure whether BETA is becoming more useful, reliable, actionable, and reusable across projects
- Identify missing project-management, QA, data-analysis, and AI-assistant capabilities
- Keep BETA usable as a no-command project manager for OMEGA and BETA itself
- Track real tests, evidence reports, todos, work sessions, and AI manager output as measurable project data
- Improve BETA from concrete gaps found while using it on real projects

Data Version & Model Provenance

This records exactly what generated this page, what schemas were used, and whether an Ollama model contributed advisory analysis.


Data version: beta-2026-06-22t21-24-40z-3c269fc11a (digest 3c269fc11a44b0602fd692622f78d189d5c90134b4853b69a99e3d9ef90d310d)

BETA app: 0.2.0 (Build Environment for Testing & Analytics)

Analysis schema: beta.analysis.v1 (BETA deterministic engine)

Generated: 2026-06-22T21:24:40Z | C:\Users\jdcap\Documents\Projects\BETA\.beta

AI model: none (AI not used; Ollama model: none; available=False; usable=False; source=none)

Data policy: deterministic source of truth. Only accepted real reports are used for metrics, graphs, claims, progress, and AI analysis. Synthetic, demo, and local proof reports are quarantined.

Real reports: 4. Accepted reports used for metrics, graphs, claims, and progress.

Quarantined reports: 0. Visible for audit but excluded from proof calculations.

Project Version Records

BETA | Config: v3 (explicit) | Schema: beta.project.v1 | Project AI plan: gemma4:latest | Profile: 2026-06-06T19:08:02Z | Plan: 2026-06-06T19:08:02Z

Data Used

This is the exact source trail behind the evidence screen. Scores are computed only from accepted real proof reports, then enriched with project plans when available.

Real reports used: 4. Only accepted real reports are used for metrics, graphs, claims, progress, and AI analysis. Synthetic, demo, and local proof reports are quarantined.

Quarantined reports: 0. Excluded before metrics, graphs, claims, and AI analysis.

Latest report: omega-automated-test-run-2026-06-22t19-37-16z (C:\Users\jdcap\Documents\Projects\BETA\.beta\projects\omega\evidence\omega-automated-test-run-2026-06-22t19-37-16z\report.json)

Latest generated: 2026-06-22T19:37:16Z

Baseline report: beta-automated-test-run-2026-06-22t18-32-10z (C:\Users\jdcap\Documents\Projects\BETA\.beta\projects\beta\evidence\beta-automated-test-run-2026-06-22t18-32-10z\report.json)

Comparison key: automated-test-run:0h:0m. The latest proof report is compared with the previous proof report that has the same scenario id, duration, and step size.
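The comparison key reads as scenario id, duration and step size joined with colons. This is a minimal sketch with two assumptions. First, the 0h and 0m parts are a duration in hours and a step size in minutes; the page does not define the units. Second, reports are plain dicts with hypothetical duration_hours and step_minutes fields. Grouping by key is also one plausible way to count distinct comparable scenarios.

```python
from collections import defaultdict
from typing import Optional


def comparison_key(scenario_id: str, duration_hours: int, step_minutes: int) -> str:
    """Build a key like 'automated-test-run:0h:0m' (units assumed: hours, minutes)."""
    return f"{scenario_id}:{duration_hours}h:{step_minutes}m"


def group_by_key(reports: list[dict]) -> dict[str, list[dict]]:
    """Group reports by comparison key, oldest first within each group."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for r in sorted(reports, key=lambda r: r["generated"]):
        key = comparison_key(r["scenario_id"], r["duration_hours"], r["step_minutes"])
        groups[key].append(r)
    return dict(groups)


def baseline_for(latest: dict, reports: list[dict]) -> Optional[dict]:
    """Most recent earlier report that shares the latest report's comparison key."""
    key = comparison_key(latest["scenario_id"], latest["duration_hours"], latest["step_minutes"])
    earlier = [r for r in group_by_key(reports).get(key, []) if r["generated"] < latest["generated"]]
    return earlier[-1] if earlier else None
```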

Quarantined Inputs

No quarantined reports: No synthetic/demo proof reports were excluded from this page.

Project Planning Inputs

BETA | Profile: C:\Users\jdcap\Documents\Projects\BETA\.beta\projects\beta\profile.json | Plan: C:\Users\jdcap\Documents\Projects\BETA\.beta\projects\beta\plan.json | AI plan: C:\Users\jdcap\Documents\Projects\BETA\.beta\projects\beta\ai-plan.json

Claims And Evidence

This is the relevance check: every tracked metric should support a claim that matters to the system being built.

critical | thin

55%

BETA's software checks are passing and the app can be trusted while it changes.

A project manager and analytics tool must not regress its own tests or dashboard rendering.

critical | gap

25%

BETA provides useful project-management controls for setup, todos, work, evidence, reports, and AI review.

The tool should help run a project, not only summarize proof reports after the fact.

high | thin

57%

BETA can explain progress from real project work, todos, and evidence without mixing in unrelated domain metrics.

This is what makes BETA reusable across software, hardware, firmware, and mixed projects.

Claim Evidence Reasoning

Each claim below shows the deterministic reasoning chain: source report, signal evidence, metric deltas, and caveats.
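The present/missing split in each chain below follows the rule stated in the rows: a signal is missing when it is "missing or zero in the latest report". This is a minimal Python sketch of that rule, assuming the latest report's custom metrics are available as a dict keyed by metric id. classify_signals is hypothetical.

```python
def classify_signals(custom_metrics: dict, signals: list[str]) -> tuple[list[str], list[str]]:
    """Split a claim's signals into present and missing.

    Following the page's wording, a signal counts as missing when it is
    absent from the latest report or its value is zero.
    """
    present: list[str] = []
    missing: list[str] = []
    for name in signals:
        value = custom_metrics.get(name)
        if value is None or value == 0:
            missing.append(name)
        else:
            present.append(name)
    return present, missing


if __name__ == "__main__":
    latest_custom = {"unit_test_pass_rate": 100.0, "dashboard_render_success": 0}
    claim_signals = ["unit_test_pass_rate", "dashboard_render_success"]
    print(classify_signals(latest_custom, claim_signals))
```

One consequence of this rule is that a measured zero is treated the same as an absent value. For example, the 0 pass recorded for Dashboard Render Success appears as a missing signal.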

BETA's software checks are passing and the app can be trusted while it changes. The claim has partial support and should not be treated as fully proven yet. Present signals: Required validation gates and Unit test pass rate. Missing signals: Advisory validation gates and Dashboard render check. Compared with baseline beta-automated-test-run-2026-06-22t18-32-10z, stable metrics: Unit Test Pass Rate.

thin 55%

Latest: omega-automated-test-run-2026-06-22t19-37-16z | Baseline: beta-automated-test-run-2026-06-22t18-32-10z | Extra report notes: 1

present | Required validation gates | Required validation gates are present at 1. Required pass/fail gates are the minimum correctness proof. | latest proof report overall gates: 1
missing | Advisory validation gates | Advisory validation gates are missing or zero in the latest report. Add this evidence before relying on the claim. | latest proof report overall gates: 0
present | Unit test pass rate | Unit test pass rate is present at 100.0 %. Unit test pass rate shows whether BETA changes are preserving the validated software behavior. | latest and baseline project report custom metrics: 100.0 %
missing | Dashboard render check | Dashboard render check is missing or zero in the latest report. Add this evidence before relying on the claim. | latest project report custom metrics: missing
unchanged | Unit Test Pass Rate | Unit Test Pass Rate stayed within the 2 percent noise band at 100.0 %. This supports stability, not proven improvement. | latest proof report compared with baseline beta-automated-test-run-2026-06-22t18-32-10z: 100.0 %
no-baseline | Dashboard Render Success | Dashboard Render Success is currently 0 pass, but no comparable baseline exists yet. This is current-state evidence only. | latest proof report compared with baseline beta-automated-test-run-2026-06-22t18-32-10z: 0 pass

Caveats and gaps

Missing signal evidence: Advisory validation gates and Dashboard render check.
No comparable baseline exists for this claim yet.
Evidence score is below the adequate threshold.

BETA provides useful project-management controls for setup, todos, work, evidence, reports, and AI review. The claim has a clear evidence gap. Present signals: none. Missing signals: Project control coverage, Dashboard render check, and Advisory validation gates. Compared with baseline beta-automated-test-run-2026-06-22t18-32-10z, stable metrics: Open Blocker Count.

gap 25%

Latest: omega-automated-test-run-2026-06-22t19-37-16z | Baseline: beta-automated-test-run-2026-06-22t18-32-10z | Extra report notes: 1

missing | Project control coverage | Project control coverage is missing or zero in the latest report. Add this evidence before relying on the claim. | latest and baseline project report custom metrics: missing
missing | Dashboard render check | Dashboard render check is missing or zero in the latest report. Add this evidence before relying on the claim. | latest project report custom metrics: missing
missing | Advisory validation gates | Advisory validation gates are missing or zero in the latest report. Add this evidence before relying on the claim. | latest proof report overall gates: 0
no-baseline | Project Control Coverage | Project Control Coverage is currently 0 %, but no comparable baseline exists yet. This is current-state evidence only. | latest proof report compared with baseline beta-automated-test-run-2026-06-22t18-32-10z: 0 %
unchanged | Open Blocker Count | Open Blocker Count stayed within the 2 percent noise band at 0 count. This supports stability, not proven improvement. | latest proof report compared with baseline beta-automated-test-run-2026-06-22t18-32-10z: 0 count

Caveats and gaps

Missing signal evidence: Project control coverage, Dashboard render check, and Advisory validation gates.
No comparable baseline exists for this claim yet.
Evidence score is below the adequate threshold.

BETA can explain progress from real project work, todos, and evidence without mixing in unrelated domain metrics. The claim has partial support and should not be treated as fully proven yet. Present signals: Unit test pass rate. Missing signals: Project control coverage. Compared with baseline beta-automated-test-run-2026-06-22t18-32-10z, stable metrics: Open Blocker Count and Unit Test Pass Rate.

thin 57%

Latest: omega-automated-test-run-2026-06-22t19-37-16z | Baseline: beta-automated-test-run-2026-06-22t18-32-10z | Extra report notes: 1

missing | Project control coverage | Project control coverage is missing or zero in the latest report. Add this evidence before relying on the claim. | latest and baseline project report custom metrics: missing
present | Unit test pass rate | Unit test pass rate is present at 100.0 %. Unit test pass rate shows whether BETA changes are preserving the validated software behavior. | latest and baseline project report custom metrics: 100.0 %
no-baseline | Project Control Coverage | Project Control Coverage is currently 0 %, but no comparable baseline exists yet. This is current-state evidence only. | latest proof report compared with baseline beta-automated-test-run-2026-06-22t18-32-10z: 0 %
unchanged | Open Blocker Count | Open Blocker Count stayed within the 2 percent noise band at 0 count. This supports stability, not proven improvement. | latest proof report compared with baseline beta-automated-test-run-2026-06-22t18-32-10z: 0 count
unchanged | Unit Test Pass Rate | Unit Test Pass Rate stayed within the 2 percent noise band at 100.0 %. This supports stability, not proven improvement. | latest proof report compared with baseline beta-automated-test-run-2026-06-22t18-32-10z: 100.0 %

Caveats and gaps

Missing signal evidence: Project control coverage.
No comparable baseline exists for this claim yet.
Evidence score is below the adequate threshold.

Data-Quality Checks

OK | Real proof report is available | Accepted real reports: 4; quarantined demo/synthetic reports: 0.
OK | Synthetic/demo reports are quarantined | 0 report(s) were excluded from metrics before scoring.
OK | Comparable baseline exists | Needed to separate real progress from a single isolated run.
OK | Multiple proof runs exist | Repeated runs help expose noise and regressions.
OK | More than one scenario is tracked | A single scenario can overfit the evidence.
GAP | Observation sample is large enough | Latest run has 2 observations; larger samples make performance/storage claims stronger.
GAP | Critical claims have adequate evidence | Critical claims should not rely on thin evidence.
OK | No material regressions in latest comparable run | Latest comparable run has 0 regressed metrics.
GAP | AI analyst reviewed current data | AI review is advisory and does not inflate the deterministic data-quality score.
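Most of these checks are simple predicates over run counts. This is a minimal Python sketch of the deterministic ones, using the numbers shown on this page. The 100-observation floor is a hypothetical threshold; BETA's actual cut-off is not shown. The claim-evidence and AI-review checks are left out because their inputs are not described here.

```python
MIN_OBSERVATIONS = 100  # hypothetical floor, borrowed from the smallest "Sample Scale" size


def data_quality_gaps(summary: dict) -> list[str]:
    """Return the names of deterministic checks that fail for a run summary."""
    checks = {
        "Real proof report is available": summary["real_reports"] > 0,
        "Comparable baseline exists": summary["has_baseline"],
        "Multiple proof runs exist": summary["runs"] > 1,
        "More than one scenario is tracked": summary["scenarios"] > 1,
        "Observation sample is large enough": summary["latest_observations"] >= MIN_OBSERVATIONS,
        "No material regressions in latest comparable run": summary["regressions"] == 0,
    }
    return [name for name, ok in checks.items() if not ok]


if __name__ == "__main__":
    page = {"real_reports": 4, "has_baseline": True, "runs": 4, "scenarios": 2,
            "latest_observations": 2, "regressions": 0}
    print(data_quality_gaps(page))  # ['Observation sample is large enough']
```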

Current Evidence Gaps

Gap: Latest run has 2 observations; larger samples make performance/storage claims stronger.
Gap: Critical claims should not rely on thin evidence.
Gap: AI review is advisory and does not inflate the deterministic data-quality score.
Gap: Claim needs stronger evidence: BETA's software checks are passing and the app can be trusted while it changes.
Gap: Claim needs stronger evidence: BETA provides useful project-management controls for setup, todos, work, evidence, reports, and AI review.
Gap: Claim needs stronger evidence: BETA can explain progress from real project work, todos, and evidence without mixing in unrelated domain metrics.

Meaningful Data To Add

Repeatable scale runs: Run the same scenario several times at larger observation counts to separate noise from real change.
Field-link timing: Track public/tunnel load time and API latency from a slower network path.
Reliability history: Track failed runs, flaky checks, and repeated-regression frequency.
Human acceptance: Track review/field-test signoff so proof connects to actual user readiness.

QA Matrix

Claims are turned into QA targets with priorities, current evidence strength, regressions, and the next test to run.

Priority	Claim	Evidence	Regressions	Next QA Test
P0	BETA's software checks are passing and the app can be trusted while it changes. (Metrics: unit_test_pass_rate, dashboard_render_success)	thin 55%	none	Repeat the matching scenario and add the missing signals listed in claim caveats.
P0	BETA provides useful project-management controls for setup, todos, work, evidence, reports, and AI review. (Metrics: project_control_coverage, open_blocker_count)	gap 25%	none	Repeat the matching scenario and add the missing signals listed in claim caveats.
P1	BETA can explain progress from real project work, todos, and evidence without mixing in unrelated domain metrics. (Metrics: project_control_coverage, open_blocker_count, unit_test_pass_rate)	thin 57%	none	Repeat the matching scenario and add the missing signals listed in claim caveats.

Time And Effort Focus

Inferred from proof reports, findings, backlog, and data-quality checks. Direct time-spend tracking requires issue, CI, or work-log imports.

Focus Categories

No categories: No active findings or backlog categories yet.

Where Time Looks Well Spent

- Use repeatable proof runs and the measurement plan; those create evidence that compounds over time.

Where Time May Be Wasted

- Evidence friction: Latest run has 2 observations; larger samples make performance/storage claims stronger.
- Evidence friction: Critical claims should not rely on thin evidence.
- Evidence friction: AI review is advisory and does not inflate the deterministic data-quality score.

Needed For Real Time Accounting

- Issue status and cycle time
- CI duration and flake rate
- Manual test or bench-session duration
- Milestone estimates and actuals

Operating Plan

The latest comparable run is stable; expand evidence depth.

P0
Strengthen claim: BETA's software checks are passing and the app can be trusted while it changes. Current claim evidence is thin and affects trust in the system. Owner: QA/project lead | Impact: high | Confidence: medium Success: claim evidence score Evidence: Repeat the matching scenario and add the missing signals listed in claim caveats. Read the claim caveats and missing signals. Add the missing signal to the next proof run or project evidence import. Rerun the scenario and confirm the claim moves out of thin/gap status.
P0
Strengthen claim: BETA provides useful project-management controls for setup, todos, work, evidence, reports, and AI review. Current claim evidence is gap and affects trust in the system. Owner: QA/project lead | Impact: high | Confidence: medium Success: claim evidence score Evidence: Repeat the matching scenario and add the missing signals listed in claim caveats. Read the claim caveats and missing signals. Add the missing signal to the next proof run or project evidence import. Rerun the scenario and confirm the claim moves out of thin/gap status.
P1
Software Test Pass Rate Proves BETA is still correct while project-management and AI features are added. Owner: Engineering | Impact: high | Confidence: thin Success: Pass rate holds at the project target and regressions are linked to specific work. Evidence: A fresh matching proof report plus before/after metric comparison. Run the same scenario used by the comparable baseline. Capture report.json and import it into BETA. Check whether the target metric improved, stabilized, or regressed again. If it regresses again, profile the owning code path before adding new features.
P1
Dashboard Render Smoke: the app must rebuild and load project pages after evidence changes. Owner: Engineering | Impact: high | Confidence: thin | Success: Dashboard rebuild succeeds and the project page opens with the expected scoped data. | Evidence: A fresh matching proof report plus a before/after metric comparison. | Steps: Run the same scenario used by the comparable baseline. Capture report.json and import it into BETA. Check whether the target metric improved, stabilized, or regressed. If it regressed, profile the owning code path before adding new features.

Action Tracker

Actions are built from deterministic BETA guidance, manager risks, measurement gaps, and advisory AI output. Statuses are gated by real evidence availability.

beta.action_tracker.v1

Actions: 44 (tracked recommendations)
Active: 39 (ready to work)
Blocked: 0 (need real data first)
Risks: 3 (manager risks)
Avoid: 2 (guardrails)
AI: 0 (advisory actions)

Showing top 12 of 44 tracked actions for this scope.

Status	Action	Source	Metric	Next Step	Evidence Needed
active (P0)	Import one accepted real proof report: Run or import one real bench, field, CI, hardware, or project proof report with no demo, synthetic, or local-proof markers.	Project todo ledger (deterministic)	project_todo_progress: Accepted real report count is greater than zero and appears in the project evidence page.	Mark it doing, done with evidence, blocked with a blocker, or dropped with a reason.	Accepted real report count is greater than zero and appears in the project evidence page.
active (P0)	Strengthen claim: BETA provides useful project-management controls for setup, todos, work, evidence, reports, and AI review. Current claim evidence is at gap status, which weakens trust in the system.	Deterministic operating plan (deterministic)	project_control_coverage, open_blocker_count: claim evidence score	Read the claim caveats and missing signals. Add the missing signal to the next proof run or project evidence import. Rerun the scenario and confirm the claim moves out of thin/gap status.	Repeat the matching scenario and add the missing signals listed in the claim caveats.
active (P0)	Strengthen claim: BETA's software checks are passing and the app can be trusted while it changes. Current claim evidence is thin, which weakens trust in the system.	Deterministic operating plan (deterministic)	unit_test_pass_rate, dashboard_render_success: claim evidence score	Read the claim caveats and missing signals. Add the missing signal to the next proof run or project evidence import. Rerun the scenario and confirm the claim moves out of thin/gap status.	Repeat the matching scenario and add the missing signals listed in the claim caveats.
active (P1)	AI Recommendation Follow-Through: AI advice should become tracked actions whose impact can be measured later.	Deterministic operating plan (deterministic)	ai_recommendation_follow_through: High-value AI recommendations have a decision, owner, and result metric.	Run the same scenario used by the comparable baseline. Capture report.json and import it into BETA. Check whether the target metric improved, stabilized, or regressed. If it regressed, profile the owning code path before adding new features.	A fresh matching proof report plus a before/after metric comparison.
active (P1)	AI Recommendation Follow-Through: AI advice should become tracked actions whose impact can be measured later.	Deterministic work guidance (deterministic)	ai_recommendation_follow_through: High-value AI recommendations have a decision, owner, and result metric.	Record AI recommendations, mark which ones were tried, and connect them to metric movement.	High-value AI recommendations have a decision, owner, and result metric.
active (P1)	AI Recommendation Follow-Through: AI advice should become tracked actions whose impact can be measured later.	Measurement plan (deterministic)	ai_recommendation_follow_through: High-value AI recommendations have a decision, owner, and result metric.	Record AI recommendations, mark which ones were tried, and connect them to metric movement.	High-value AI recommendations have a decision, owner, and result metric.
active (P1)	AI Recommendation Follow-Through: This planned metric is not active yet.	Manager metric backlog (deterministic)	ai_recommendation_follow_through: High-value AI recommendations have a decision, owner, and result metric.	Add this metric to a real report, CI import, bench log, field note, or manual evidence record.	High-value AI recommendations have a decision, owner, and result metric.
active (P1)	Connect AI Analyst: Ollama reviews weak evidence, missing measurements, experiment ideas, and next actions.	Missing data source (deterministic)	ai: source connected	Use auto-strong for serious reviews; keep deterministic metrics as the source of truth.	Use auto-strong for serious reviews; keep deterministic metrics as the source of truth.
active (P1)	Connect AI Work Sessions: AI session summaries show what an AI helped change, suggested, tested, or left uncertain.	Missing data source (deterministic)	ai: source connected	Import Codex/ChatGPT/session summaries as source_type=ai-session after meaningful project work.	Import Codex/ChatGPT/session summaries as source_type=ai-session after meaningful project work.
active (P1)	Connect Field And Bench Logs: Power, thermal, calibration, endurance, and human acceptance data prove real-world readiness.	Missing data source (deterministic)	field: source connected	Create a simple CSV or report.json path for bench and field validation evidence.	Create a simple CSV or report.json path for bench and field validation evidence.
active (P1)	Connect Quarantined Demo Reports: Synthetic, demo, smoke, fixture, and local proof reports stay visible for audit but are excluded from metrics.	Missing data source (deterministic)	evidence: source connected	Replace quarantined reports with real evidence, or mark real reports explicitly with metadata.data_authenticity=real.	Replace quarantined reports with real evidence, or mark real reports explicitly with metadata.data_authenticity=real.
active (P1)	Connect Sample Scale: Larger samples make performance, storage, and reliability claims harder to fake.	Missing data source (deterministic)	evidence: source connected	Run proof scenarios at 100, 500, and 1000+ observations and compare the curves.	Run proof scenarios at 100, 500, and 1000+ observations and compare the curves.

AI can suggest actions, but BETA only treats metrics, imported inputs, proof reports, and work logs as evidence.

Metric Intelligence

Each metric carries a risk level, confidence, trend, volatility, streaks, and a recommended action.

Metric	Risk	Latest	Trend	Volatility	Recommended Action
Code Coverage: Coverage shows how much of the code the passing tests actually exercise, so a high pass rate is not hiding untested paths.	stable (thin confidence)	0 % (best 0 %)	no-baseline (streak R0 / I0)	0 % (gap 0 %)	Keep this in the regression suite while focusing on weaker metrics.
Dashboard Render Success: Whether dashboard regeneration succeeds after evidence import.	stable (thin confidence)	0 pass (best 0 pass)	no-baseline (streak R0 / I0)	0 % (gap 0 %)	Keep this in the regression suite while focusing on weaker metrics.
Open Blocker Count: Open blocked todos or work blockers that need attention.	stable (thin confidence)	0 count (best 0 count)	unchanged (streak R0 / I0)	0 % (gap 0 %)	Keep this in the regression suite while focusing on weaker metrics.
Project Control Coverage: Coverage of setup, todo, work, evidence, and AI controls on project pages.	stable (thin confidence)	0 % (best 0 %)	no-baseline (streak R0 / I0)	0 % (gap 0 %)	Keep this in the regression suite while focusing on weaker metrics.
Test Failures: Failures point directly at what is broken right now and what to fix before the next claim of done.	stable (thin confidence)	0 count (best 0 count)	unchanged (streak R0 / I0)	0 % (gap 0 %)	Keep this in the regression suite while focusing on weaker metrics.
Unit Test Pass Rate: Percent of BETA unit tests passing during validation.	stable (thin confidence)	100.0 % (best 100.0 %)	unchanged (streak R0 / I0)	0 % (gap 0 %)	Keep this in the regression suite while focusing on weaker metrics.

Workflow Health

Workflow Score: 56.40 % (thin)
Runs / Week: 1.75 (evidence velocity)
Repeat Depth: 2 (matching scenario runs)
Sources: 60.00 % (connected evidence inputs)
Active Metrics: 21.40 % (planned metrics live)
AI Review: no (advisory analysis)
Work Sessions: 4 (logged effort)
Evidence Work: 100.0 % (tracked work tied to proof)
Blocked/Rework: 0 % (effort friction)
Open Todos: 2 (committed work)
Todo Progress: 50.00 % (done, excluding dropped)
Blocked Todos: 0 (plan blockers)

Workflow Levers

Increase repeat depth to at least three matching runs per important scenario.
Connect issue/CI, bench, field, or work-log sources so project-management claims stop being inferred.
Promote planned metrics into active report fields.
Close data-quality gaps before treating improvements as durable.
Run a local AI review after deterministic analysis has current data.
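The first lever can be checked mechanically by counting matching runs per scenario. A minimal Python sketch, assuming runs are identified by the scenario IDs used in the capture kit (bench-validation, field-link-load); the function name and the run history are hypothetical, not part of BETA:

```python
from collections import Counter

MIN_REPEATS = 3  # lever target: at least three matching runs per important scenario


def shallow_scenarios(run_scenario_ids, important, min_repeats=MIN_REPEATS):
    """Return {scenario_id: run_count} for important scenarios below the target."""
    counts = Counter(run_scenario_ids)
    return {sid: counts[sid] for sid in important if counts[sid] < min_repeats}


# Hypothetical run history: two bench runs and one field run.
runs = ["bench-validation", "bench-validation", "field-link-load"]
print(shallow_scenarios(runs, ["bench-validation", "field-link-load"]))
```

Any scenario that appears in the result still needs repeat runs before its trend is worth trusting.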

Project Understanding

Project scan is connected across 1 project, with the strongest signals in software and documentation.

BETA: 20 source/firmware files, 1 test, 0 evidence files, 5 docs, 4 work logs, 2 open todos. Test/source ratio: 5 %. Keep project plans current as code, tests, and evidence evolve.

Stop Doing

Do not use demo data as proof. Quarantined local/demo reports can be useful for UI testing, but they cannot prove project progress. Better: Use real project runs and keep the quarantined list as an audit trail.
Do not chase one-off numbers. A single run can be noise; the dashboard needs repeated comparable runs before treating progress as real. Better: Repeat the same scenario after each meaningful change and compare against the matching baseline.
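Comparing a repeat run against its matching baseline reduces to a signed change checked against a noise band. A minimal Python sketch; the 1 % tolerance and the function name are assumptions, while the improved/stabilized/regressed labels come from the operating plan:

```python
def compare_to_baseline(baseline, latest, higher_is_better=True, tolerance=0.01):
    """Classify a repeat run against its matching baseline.

    Uses relative change when the baseline is non-zero and absolute change
    otherwise. Changes within +/- tolerance count as noise ("stabilized").
    """
    if baseline != 0:
        change = (latest - baseline) / abs(baseline)
    else:
        change = latest - baseline
    if not higher_is_better:  # e.g. latency or failure counts, where lower is better
        change = -change
    if abs(change) <= tolerance:
        return "stabilized"
    return "improved" if change > 0 else "regressed"


print(compare_to_baseline(100.0, 100.0))  # stabilized
print(compare_to_baseline(120.0, 90.0, higher_is_better=False))  # improved
```

A single comparison is still one data point; the repeat-depth lever above is what makes the label trustworthy.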

Project Manager

The latest comparable run is stable; expand evidence depth.

Posture: build (manager mode)
Readiness: 59.40 % (thin)
Source Coverage: 60.00 % (connected sources)
Projects: 1 (tracked builds)
Priorities: 5 (current actions)
Risks: 3 (tracked manager risks)
Data Gaps: 6 (sources to connect)
Metrics Backlog: 8 (planned metrics)
Inputs: 17 (project evidence files)
Open Todos: 2 (committed work)
Blocked Todos: 0 (plan blockers)
Todo Progress: 50.00 % (done, excluding dropped)
Work Logs: 4 (effort records)
Workflow: thin (health label)

Project Todo Ledger

4 todo item(s) are tracked across 1 project(s): 2 open, 0 blocked, 2 done.

Todos: 4 (tracked commitments)
Open: 2 (todo, doing, blocked)
Doing: 1 (current focus)
Blocked: 0 (needs decision)
Overdue: 0 (past due)
Done: 2 (completed)
Progress: 50.00 % (done, excluding dropped)
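The ledger labels progress "done excluding dropped" and counts todo, doing, and blocked items as open. A minimal Python sketch of that arithmetic, assuming progress is done / (all todos minus dropped); the function names are hypothetical, not BETA's API:

```python
from collections import Counter

OPEN_STATUSES = {"todo", "doing", "blocked"}


def todo_progress_pct(statuses):
    """Percent done, with dropped items removed from the denominator."""
    counts = Counter(statuses)
    considered = len(statuses) - counts["dropped"]
    if considered == 0:
        return 0.0
    return round(100.0 * counts["done"] / considered, 2)


def open_todos(statuses):
    """Count items still open (todo, doing, or blocked)."""
    return sum(1 for s in statuses if s in OPEN_STATUSES)


ledger = ["doing", "todo", "done", "done"]  # the four todos tracked in this ledger
print(todo_progress_pct(ledger), open_todos(ledger))  # 50.0 2
```

Excluding dropped items keeps a deliberate scope cut from reading as lost progress.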

Active Todo Board

P0
Import one accepted real proof report | BETA | doing | due: none | owner: Project owner | ID: import-one-accepted-real-proof-report-2026-06-22t193143z | Area: evidence | Success: Accepted real report count is greater than zero and appears in the project evidence page. | Blocker: none | Evidence: none
P1
Connect issue and CI history | BETA | todo | due: none | owner: Project owner | ID: connect-issue-and-ci-history-2026-06-22t193208z | Area: source coverage | Success: Issue and CI sources are connected and visible in source coverage. | Blocker: none | Evidence: none

Recent Todo Changes

P1
Connect issue and CI history | BETA | todo | due: none | owner: Project owner | ID: connect-issue-and-ci-history-2026-06-22t193208z | Area: source coverage | Success: Issue and CI sources are connected and visible in source coverage. | Blocker: none | Evidence: none
P0
Import one accepted real proof report | BETA | doing | due: none | owner: Project owner | ID: import-one-accepted-real-proof-report-2026-06-22t193143z | Area: evidence | Success: Accepted real report count is greater than zero and appears in the project evidence page. | Blocker: none | Evidence: none
P1
Add project-specific metric labels and custom evidence rendering | BETA | done | due: none | owner: BETA | ID: add-project-specific-metric-labels-and-custom-evidence-r-2026-06-06t190938z | Area: metrics | Success: BETA page shows Unit Test Pass Rate, Dashboard Render Success, Project Control Coverage, and Open Blocker Count as first-class real metrics. | Blocker: none | Evidence: .beta/projects/beta/evidence/beta-real-validation-custom-metrics-20260606/report.json; BETA plan shows Unit Test Pass Rate at 100%; 23 tests pass
P1
Finish BETA project-management todo tracking | BETA | done | due: none | owner: BETA | ID: finish-beta-project-management-todo-tracking-2026-06-06t134802z | Area: project management | Success: Dashboard and project pages show todo metrics and tests pass. | Blocker: none | Evidence: python -m unittest discover -s tests; python -m compileall beta prooflab tests; browser QA

Operating Metrics

Project profiles: 1. Each tracked build needs a setup profile, goals, source paths, and project type.
Project goals: 6. Goals tell BETA what outcomes the metrics are supposed to support.
Connected data sources: 9 / 15. Connected sources make the manager brief factual rather than speculative.
Project inputs: 9. Issues, CI, AI sessions, docs, bench logs, and field logs explain why metrics moved.
Local repo snapshots: 1. Snapshots show Git state, source/test/doc balance, CI presence, and project structure.
Snapshot source files: 20. Source volume helps size the project and compare testing/documentation balance.
Snapshot test files: 2. Test volume is an early signal for regression protection and QA maturity.
Dirty repos: 0. Dirty worktrees can make evidence hard to reproduce unless the changes are explained.

Tracked Projects

beta (planned)

BETA

software | software

Goal: Use BETA to track and improve BETA itself | Files: 31 of 31 | Inputs: 9 (ci: 2, docs: 5, repo: 2) | Todos: 2 open, 0 blocked, 50.00 % done | AI: gemma4:latest

Current Manager Priorities

P0
Strengthen claim: BETA's software checks are passing and the app can be trusted while it changes. Current claim evidence is thin, which weakens trust in the system. Owner: QA/project lead | Impact: high | Confidence: medium | Success: claim evidence score | Evidence: Repeat the matching scenario and add the missing signals listed in the claim caveats. | Steps: Read the claim caveats and missing signals. Add the missing signal to the next proof run or project evidence import. Rerun the scenario and confirm the claim moves out of thin/gap status.
P0
Strengthen claim: BETA provides useful project-management controls for setup, todos, work, evidence, reports, and AI review. Current claim evidence is at gap status, which weakens trust in the system. Owner: QA/project lead | Impact: high | Confidence: medium | Success: claim evidence score | Evidence: Repeat the matching scenario and add the missing signals listed in the claim caveats. | Steps: Read the claim caveats and missing signals. Add the missing signal to the next proof run or project evidence import. Rerun the scenario and confirm the claim moves out of thin/gap status.
P1
Software Test Pass Rate: proves BETA is still correct while project-management and AI features are added. Owner: Engineering | Impact: high | Confidence: thin | Success: Pass rate holds at the project target and regressions are linked to specific work. | Evidence: A fresh matching proof report plus a before/after metric comparison. | Steps: Run the same scenario used by the comparable baseline. Capture report.json and import it into BETA. Check whether the target metric improved, stabilized, or regressed. If it regressed, profile the owning code path before adding new features.
P1
Dashboard Render Smoke: the app must rebuild and load project pages after evidence changes. Owner: Engineering | Impact: high | Confidence: thin | Success: Dashboard rebuild succeeds and the project page opens with the expected scoped data. | Evidence: A fresh matching proof report plus a before/after metric comparison. | Steps: Run the same scenario used by the comparable baseline. Capture report.json and import it into BETA. Check whether the target metric improved, stabilized, or regressed. If it regressed, profile the owning code path before adding new features.
P2
Owner: Project manager | Impact: | Confidence: Success: Evidence:

Manager Risks

Missing AI Analyst: Ollama reviews weak evidence, missing measurements, experiment ideas, and next actions. | Mitigation: Use auto-strong for serious reviews; keep deterministic metrics as the source of truth. | Owner: Project manager
Missing Field And Bench Logs: Power, thermal, calibration, endurance, and human acceptance data prove real-world readiness. | Mitigation: Create a simple CSV or report.json path for bench and field validation evidence. | Owner: Project manager
Missing AI Work Sessions: AI session summaries show what an AI helped change, suggested, tested, or left uncertain. | Mitigation: Import Codex/ChatGPT/session summaries as source_type=ai-session after meaningful project work. | Owner: Project manager
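The field-and-bench mitigation asks for a simple report.json path. A minimal Python sketch of writing one, assuming a flat JSON layout; every field except metadata.data_authenticity (the marker named in the quarantine guidance on this page) is a hypothetical placeholder, not BETA's documented report schema:

```python
import json
import tempfile
from pathlib import Path


def write_bench_report(path, scenario_id, observations_accepted, required_passed, metrics):
    """Write a minimal bench report.json and return the dict that was written."""
    report = {
        # Hypothetical field names; align them with BETA's real importer before use.
        "scenario_id": scenario_id,
        "collection_type": "bench",
        "observations_accepted": observations_accepted,
        "required_passed": required_passed,
        "metrics": metrics,
        # Explicit marker so the report is not mistaken for demo/synthetic data.
        "metadata": {"data_authenticity": "real"},
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "bench-validation" / "report.json"
        # 98.0 is an illustrative value, not a measured result.
        write_bench_report(out, "bench-validation", 100, True, {"bench_pass_rate": 98.0})
        print(out.read_text(encoding="utf-8"))
```

Only measured values belong in a real report; the demo above writes to a temporary directory so it cannot pollute the evidence folder.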

Project Controls

Project intake (active): 1 project profile(s). Keep project goals, paths, type, and claim list current.
Evidence intake (active): 9 ingested project input(s). Import the source that explains the latest work or blocker.
Measurement backlog (active): 14 planned measurement(s). Attach every major claim to a repeatable metric and test method.
Snapshot intelligence (active): 1 snapshot(s), 20 source file(s), 2 test file(s). Collect snapshots after meaningful repo changes and review dirty-repo, CI, test, and doc signals.
Todo ledger (active): 4 todo item(s), 2 open, 0 blocked. Keep active todos current and close done work with evidence.
Work ledger (active): 4 logged work session(s). Record minutes, category, outcome, evidence, blocker, and next step after meaningful work.
AI collaboration (planned): 0 AI session input(s). Capture AI decisions, discarded ideas, and tested recommendations.
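The work ledger feeds the Evidence Work and Blocked/Rework figures in Workflow Health. A minimal Python sketch, assuming both are minute-weighted shares and that "blocked" and "rework" are status values (only "tested" appears in the record-work example); the class and function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class WorkSession:
    title: str
    minutes: int
    status: str  # "tested" appears in record-work; "blocked"/"rework" are assumed values
    evidence: str = ""  # empty when the session produced no proof


def work_shares(sessions):
    """Minute-weighted share of work tied to evidence and of blocked/rework effort."""
    total = sum(s.minutes for s in sessions)
    if total == 0:
        return {"evidence_work_pct": 0.0, "blocked_rework_pct": 0.0}
    evidence = sum(s.minutes for s in sessions if s.evidence)
    friction = sum(s.minutes for s in sessions if s.status in {"blocked", "rework"})
    return {
        "evidence_work_pct": round(100.0 * evidence / total, 1),
        "blocked_rework_pct": round(100.0 * friction / total, 1),
    }


sessions = [
    WorkSession("Validation pass", 45, "tested", "report.json"),
    WorkSession("Hypothetical cleanup", 15, "rework"),
]
print(work_shares(sessions))
```

Weighting by minutes rather than session count keeps a long blocked session from looking as cheap as a short one.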

Project Control

Run the manager loop, rebuild data, or ask BETA AI about BETA without leaving this page.


Setup Gaps

BETA: Add a CI workflow, import CI logs, or record repeatable local test evidence after each major change.
Save useful AI work summaries with source_type=ai-session so BETA can connect suggestions to evidence.

Review Agenda

What metric changed since the last comparable run, and what caused it?
Which current priority has the clearest success metric and evidence source?
Which missing data source would most improve the next decision?
What AI recommendation is worth testing, and what evidence would prove it helped?
Which recent work sessions produced evidence, and which ones were blocked or unclear?
What work should be stopped because it is not tied to a metric, risk, or milestone?

Next Checkpoints

Run or schedule the next comparable scenario.
Update project inputs after meaningful AI, issue, CI, bench, or field work.
Review metric intelligence and close one P0/P1 action.
Export a report before changing direction.

Data To Connect

Quarantined Demo Reports: Synthetic, demo, smoke, fixture, and local proof reports stay visible for audit but are excluded from metrics. Next: Replace quarantined reports with real evidence, or mark real reports explicitly with metadata.data_authenticity=real.
Sample Scale: Larger samples make performance, storage, and reliability claims harder to fake. Next: Run proof scenarios at 100, 500, and 1000+ observations and compare the curves.
AI Analyst: Ollama reviews weak evidence, missing measurements, experiment ideas, and next actions. Next: Use auto-strong for serious reviews; keep deterministic metrics as the source of truth.
Source Connection Plan: Source connectors define which project inputs BETA should expect for issues, CI, docs, bench, field, metrics, and AI sessions. Next: Register planned sources, then connect real files or folders as they become available.
Field And Bench Logs: Power, thermal, calibration, endurance, and human acceptance data prove real-world readiness. Next: Create a simple CSV or report.json path for bench and field validation evidence.
AI Work Sessions: AI session summaries show what an AI helped change, suggested, tested, or left uncertain. Next: Import Codex/ChatGPT/session summaries as source_type=ai-session after meaningful project work.
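The quarantine rule above can be expressed as a small classifier. A minimal Python sketch, assuming a parsed report is a dict with optional "path", "tags", and metadata.data_authenticity fields; the "path" and "tags" keys, the function name, and the rule that an unmarked report with no quarantine markers is accepted are all assumptions, not BETA's confirmed behavior:

```python
QUARANTINE_MARKERS = ("demo", "synthetic", "smoke", "fixture", "local-proof")


def classify_report(report):
    """Return "accepted" or "quarantined" for a parsed report dict."""
    authenticity = str(report.get("metadata", {}).get("data_authenticity", "")).lower()
    if authenticity == "real":
        return "accepted"  # an explicit real marker overrides name-based quarantine
    # Hypothetical fields: the report path and any free-form tags.
    haystack = " ".join([report.get("path", ""), *report.get("tags", [])]).lower()
    if authenticity in QUARANTINE_MARKERS or any(m in haystack for m in QUARANTINE_MARKERS):
        return "quarantined"
    return "accepted"  # assumption: unmarked reports without markers are accepted


print(classify_report({"path": "evidence/demo-run/report.json"}))  # quarantined
```

Substring matching is deliberately blunt (a name such as "demonstration" would also trip it), which errs toward quarantine rather than letting demo data count as proof.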

Metric Backlog

P2
Code Coverage (code_coverage_percent): Coverage holds above the project floor and does not regress against the last import.
P1
Dashboard Render Smoke (dashboard_render_success): Dashboard rebuild succeeds and the project page opens with the expected scoped data.
P1
Project Control Coverage (project_control_coverage): Coverage rises when useful controls are added and browser-verified.
P1
AI Recommendation Follow-Through (ai_recommendation_follow_through): High-value AI recommendations have a decision, owner, and result metric.
P2
Issue And CI History (ci_failure_rate): Failure rate and stale issue load trend down over repeated project cycles.
P2
Test Pass Rate (test_pass_rate): Metric appears in the dashboard with a baseline and trend.
P2
Critical Workflow Success Rate (critical_workflow_success_rate): Metric appears in the dashboard with a baseline and trend.
P2
P95 Latency (p95_latency): Metric appears in the dashboard with a baseline and trend.

AI Collaboration

Save AI session summaries as source_type=ai-session inputs.
Ask BETA which AI suggestions have measurable evidence.
Promote repeated AI recommendations into metrics, tests, or risks.
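The ai_recommendation_follow_through target says high-value AI recommendations need a decision, owner, and result metric. A minimal Python sketch of that ratio, assuming each saved ai-session summary has been parsed into a dict with hypothetical "value", "decision", "owner", and "result_metric" keys:

```python
REQUIRED_FIELDS = ("decision", "owner", "result_metric")


def follow_through_pct(recommendations):
    """Percent of high-value AI recommendations with a decision, owner, and result metric."""
    high_value = [r for r in recommendations if r.get("value") == "high"]
    if not high_value:
        return 0.0
    complete = [r for r in high_value if all(r.get(f) for f in REQUIRED_FIELDS)]
    return round(100.0 * len(complete) / len(high_value), 1)


# Hypothetical summaries: one complete, one missing owner and result metric, one low-value.
recs = [
    {"value": "high", "decision": "tried", "owner": "Engineering", "result_metric": "unit_test_pass_rate"},
    {"value": "high", "decision": "tried", "owner": "", "result_metric": ""},
    {"value": "low"},
]
print(follow_through_pct(recs))
```

Low-value suggestions are left out of the denominator so a pile of minor ideas cannot dilute the metric.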

Project Setup Wizard

Make the project measurable by defining goals, claims, metrics, source inputs, and the evidence packet BETA should expect.

beta

Source Connection Plan

Connected: 3 (source connectors)
Planned: 3 (still needs data)
Coverage: 50.00 % (source plan)

ci (connected)

CI And Test History

Tracks build health, test pass rate, coverage, and failed checks.

Metrics: unit_test_pass_rate, code_coverage_percent, test_failure_count | Path: none connected yet | Next: Use Import Test Results (beta import-tests) on a JUnit XML file, pytest JSON report, coverage report, or CI log after each meaningful build.

docs (connected)

Requirements And Design Docs

Connects project goals, claims, acceptance criteria, and design decisions to evidence.

Metrics: claim_coverage, acceptance_criteria_coverage | Path: none connected yet | Next: Attach requirements, design notes, test matrices, decision logs, and acceptance criteria.

repo (connected)

Local Project Snapshot

Tracks repository state, file mix, test/doc/config signals, CI hints, and dirty worktree risk.

Metrics: source_file_count, test_file_count, doc_file_count, dirty_repo_count | Path: C:\Users\jdcap\Documents\Projects\BETA | Next: Use Collect Project Snapshot after important work so BETA can compare source, test, doc, and Git-state changes.

ai-session (planned)

AI Work Sessions

Tracks what AI suggested, what was tried, and whether later metrics improved.

Metrics: ai_recommendation_follow_through, accepted_ai_actions | Path: none connected yet | Next: Save useful AI summaries with recommendation, action, evidence, and result fields.

bench (planned)

Bench Evidence

Tracks measured setup, hardware, performance, calibration, and validation runs.

Metrics: bench_pass_rate, measured_failure_count, setup_time_minutes | Path: none connected yet | Next: Attach bench CSVs, checklists, calibration logs, photos, or measured report.json files.

issues (planned)

Issue And Milestone History

Tracks planned work, stale work, blockers, cycle time, and release scope.

Metrics: open_issue_count, blocked_issue_count, cycle_time_days | Path: none connected yet | Next: Export GitHub/Jira issues or keep a simple CSV of issue status and milestone dates.
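The 50.00 % plan coverage is the share of declared connectors whose status is connected. A minimal Python sketch reproducing it from the six connector cards on this page; the function name is hypothetical:

```python
def source_coverage_pct(connector_status):
    """Percent of declared source connectors whose status is "connected"."""
    if not connector_status:
        return 0.0
    connected = sum(1 for status in connector_status.values() if status == "connected")
    return round(100.0 * connected / len(connector_status), 2)


plan = {
    "ci": "connected",
    "docs": "connected",
    "repo": "connected",
    "ai-session": "planned",
    "bench": "planned",
    "issues": "planned",
}
print(source_coverage_pct(plan))  # 50.0
print([key for key, status in plan.items() if status == "planned"])  # still needs data
```

This counts declared status only. The ci and docs cards are marked connected while still reporting no attached path, so a stricter check might also require a path before counting a connector.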

Project Goals & Setup

Use this setup guide to turn BETA from a dashboard into a project manager: goals define intent, metrics define proof, and evidence changes the plan.

software

Project: BETA (key: beta)

Project file: C:\Users\jdcap\Documents\Projects\BETA\.beta\projects\beta\project.json (customize goals, claims, metrics, paths, and evidence sources here).

Setup And Customization Commands

1. Create a separate project: creates a project page, project.json, scan profile, verification plan, and project-scoped dashboard data.
   .\dev.ps1 init-project -ProjectPath C:\path\to\build -ProjectName "Bench Prototype" -ProjectType hardware -Goal "Prove stable bench operation"
2. Add goals, claims, or custom metrics: custom goals and metrics become planning inputs, manager context, and measurement-backlog entries.
   .\dev.ps1 configure-project -ProjectKey beta -Goal "Make field setup repeatable" -Metric "setup_success_rate|%|Tracks whether setup succeeds without manual rescue" -Claim "Operators can identify a node and trust its status"
3. Connect real project context: issues, CI, bench logs, field notes, docs, and AI sessions explain why a metric moved.
   .\dev.ps1 ingest-info -ProjectKey beta -InfoPath C:\path\to\issues-or-notes.csv -SourceType issues -Note "current backlog"
4. Record work and evidence: work logs show the manager where time goes, whether productive, blocked, rework, or evidence-producing.
   .\dev.ps1 record-work -ProjectKey beta -WorkTitle "Validation pass" -WorkMinutes 45 -WorkStatus tested -WorkEvidence "report.json"
5. Run the manager loop: refreshes deterministic analysis, then asks the local AI to turn it into actionable project guidance.
   .\dev.ps1 refresh; .\dev.ps1 manage-ai -Model gemma4:latest

What You Can Customize

goals (6): What the project is trying to improve or prove.
custom_claims (3): Statements BETA should try to connect to evidence.
custom_metrics (4): Project-specific measurements that should appear in the planning backlog.
evidence_sources (9): Where useful proof can come from: CI, bench, field, docs, issues, AI sessions.
test_tools (8): Tools or workflows that can produce proof.
paths (1): Repo, hardware folder, docs folder, or build path BETA should scan.

Planning Loop

Goal: decide what outcome matters.
Claim: write the thing you want to be able to say is true.
Metric: define the number, pass/fail, or evidence signal that would prove it.
Scenario: run the same test or validation path repeatedly.
Evidence: import the report, CI result, bench log, field note, or operator signoff.
Manager decision: focus, stop, protect, or connect a missing source.
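The loop links each claim to metrics and each metric to repeated scenario runs, which is what the thin/gap claim labels summarize. A minimal Python sketch, assuming "gap" means no linked metric has any accepted run and "thin" means at least one linked metric is below three matching runs; the thresholds, the "supported" label, and the run counts are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    metrics: list[str]
    runs: dict[str, int] = field(default_factory=dict)  # metric key -> accepted matching runs


def claim_status(claim, min_repeats=3):
    """'gap' with no evidence, 'thin' below min_repeats, else 'supported' (hypothetical)."""
    counts = [claim.runs.get(m, 0) for m in claim.metrics]
    if not counts or max(counts) == 0:
        return "gap"
    if min(counts) < min_repeats:
        return "thin"
    return "supported"


checks = Claim(
    "BETA's software checks are passing and the app can be trusted while it changes.",
    ["unit_test_pass_rate", "dashboard_render_success"],
    {"unit_test_pass_rate": 2},  # hypothetical run count
)
controls = Claim(
    "BETA provides useful project-management controls.",
    ["project_control_coverage", "open_blocker_count"],
)
print(claim_status(checks), claim_status(controls))  # thin gap
```

Under these assumptions a claim leaves thin status only when every linked metric has enough repeat runs, not just the easiest one.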

Configured Goals

Use BETA to track and improve BETA itself
Measure whether BETA is becoming more useful, reliable, actionable, and reusable across projects
Identify missing project-management, QA, data-analysis, and AI-assistant capabilities
Keep BETA usable as a no-command project manager for OMEGA and BETA itself
Track real tests, evidence reports, todos, work sessions, and AI manager output as measurable project data
Improve BETA from concrete gaps found while using it on real projects

Custom Claims

BETA can separate project pages and avoid mixing BETA and OMEGA data.
BETA can ingest real project evidence and turn it into progress, QA, manager, and AI views.
BETA can use todos and work logs to explain what is moving, blocked, or wasting time.

Custom Metrics

P1 | Unit Test Pass Rate | unit_test_pass_rate (%) | Percent of BETA unit tests passing during validation.
P1 | Dashboard Render Success | dashboard_render_success (%) | Percent of dashboard regenerations that succeed after an evidence import.
P1 | Project Control Coverage | project_control_coverage (%) | Coverage of setup, todo, work, evidence, and AI controls on project pages.
P1 | Open Blocker Count | open_blocker_count (count) | Open blocked todos or work blockers that need attention.
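Each custom metric above reduces to a simple ratio or count. A hedged Python sketch, assuming the pass/fail inputs are already collected; the function names are hypothetical, and the five control names come from the Project Control Coverage description:

```python
REQUIRED_CONTROLS = ("setup", "todo", "work", "evidence", "ai")  # controls named above


def percent(numerator: int, denominator: int) -> float:
    # Return 0.0 for an empty run instead of dividing by zero.
    return 0.0 if denominator == 0 else round(100.0 * numerator / denominator, 2)


def unit_test_pass_rate(passed: int, total: int) -> float:
    return percent(passed, total)


def dashboard_render_success(render_ok: list[bool]) -> float:
    # One boolean per dashboard regeneration attempted after an evidence import.
    return percent(sum(render_ok), len(render_ok))


def project_control_coverage(controls_on_page: set[str]) -> float:
    present = sum(1 for control in REQUIRED_CONTROLS if control in controls_on_page)
    return percent(present, len(REQUIRED_CONTROLS))


def open_blocker_count(todo_statuses: list[str], work_blockers: list[str]) -> int:
    # Blocked todos plus any recorded work-session blockers.
    return todo_statuses.count("blocked") + len(work_blockers)
```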

Manager Uses This For

Priority ranking: Goals tell BETA which metric matters most when several gaps are open.
Evidence planning: Claims and custom metrics become planned measurements and evidence capture prompts.
Scope control: Paths and project type keep the project page separated from other builds.
AI usefulness: AI reviews get better when goals, claims, and source inputs are explicit.

Setup Gaps

BETA: Add a CI workflow, import CI logs, or record repeatable local test evidence after each major change.
Save useful AI work summaries with source_type=ai-session so BETA can connect suggestions to evidence.

Data To Connect

Quarantined Demo Reports: Synthetic, demo, smoke, fixture, and local proof reports stay visible for audit but are excluded from metrics. Replace quarantined reports with real evidence, or explicitly mark real reports with metadata.data_authenticity=real.
Sample Scale: Larger samples make performance, storage, and reliability claims harder to fake. Run proof scenarios at 100, 500, and 1000+ observations and compare the curves.
AI Analyst: Ollama reviews weak evidence, missing measurements, experiment ideas, and next actions. Use auto-strong for serious reviews; keep deterministic metrics as the source of truth.
Source Connection Plan: Source connectors define which project inputs BETA should expect for issues, CI, docs, bench, field, metrics, and AI sessions. Register planned sources, then connect real files or folders as they become available.
Field And Bench Logs: Power, thermal, calibration, endurance, and human acceptance data prove real-world readiness. Create a simple CSV or report.json path for bench and field validation evidence.
AI Work Sessions: AI session summaries show what an AI helped change, suggested, tested, or left uncertain. Import Codex/ChatGPT/session summaries as source_type=ai-session after meaningful project work.
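The quarantine rule can be sketched as a filter that keeps every report for audit but passes only metric-eligible ones on. This assumes reports are parsed report.json dicts, that an explicit metadata.data_authenticity=real marking overrides the heuristic, and that the report kind lives in a hypothetical metadata.kind field; BETA's real schema may differ.

```python
QUARANTINE_KINDS = {"synthetic", "demo", "smoke", "fixture", "local"}  # kinds listed above


def is_metric_eligible(report: dict) -> bool:
    metadata = report.get("metadata") or {}
    if metadata.get("data_authenticity") == "real":
        return True  # explicit real marking overrides the kind heuristic
    kind = str(metadata.get("kind", "")).lower()  # hypothetical field
    return kind not in QUARANTINE_KINDS


def split_reports(reports: list[dict]) -> tuple[list[dict], list[dict]]:
    accepted, quarantined = [], []
    for report in reports:
        # Quarantined reports are kept (for audit) but never feed metrics.
        (accepted if is_metric_eligible(report) else quarantined).append(report)
    return accepted, quarantined
```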

Starter Todo Cycle

Use this to create the first measurable planning loop: real proof, connected project history, and field or bench validation.

connected

Todo And Work Control

Add committed work, update blockers, and log how time was spent so the manager view can explain progress and waste.

beta

Project Todo Ledger

4 todo items are tracked across 1 project: 2 open, 0 blocked, 2 done.

Todos: 4 (tracked commitments)
Open: 2 (todo, doing, blocked)
Doing: 1 (current focus)
Blocked: 0 (needs decision)
Overdue: 0 (past due)
Done: 2 (completed)
Progress: 50.00% (done, excluding dropped)
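The ledger counts can be reproduced from a list of todo statuses. A minimal sketch, assuming the status strings match the labels shown (todo, doing, blocked, done, dropped); todo_summary is a hypothetical helper, and overdue is left out because it needs due dates:

```python
from collections import Counter

OPEN_STATUSES = {"todo", "doing", "blocked"}  # statuses counted as "Open"


def todo_summary(statuses: list[str]) -> dict:
    counts = Counter(statuses)  # missing statuses count as 0
    counted = len(statuses) - counts["dropped"]  # progress excludes dropped items
    return {
        "todos": len(statuses),
        "open": sum(counts[s] for s in OPEN_STATUSES),
        "doing": counts["doing"],
        "blocked": counts["blocked"],
        "done": counts["done"],
        "progress_pct": round(100.0 * counts["done"] / counted, 2) if counted else 0.0,
    }


# The current BETA ledger: one doing, one todo, two done.
summary = todo_summary(["doing", "todo", "done", "done"])
```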

Active Todo Board

P0 | Import one accepted real proof report | BETA | doing | due: none | owner: Project owner
ID: import-one-accepted-real-proof-report-2026-06-22t193143z | Area: evidence | Blocker: none | Evidence: none
Success: The accepted real report count is greater than zero and the report appears on the project evidence page.

P1 | Connect issue and CI history | BETA | todo | due: none | owner: Project owner
ID: connect-issue-and-ci-history-2026-06-22t193208z | Area: source coverage | Blocker: none | Evidence: none
Success: Issue and CI sources are connected and visible in source coverage.

Recent Todo Changes

P1 | Connect issue and CI history | BETA | todo (full details under Active Todo Board)
P0 | Import one accepted real proof report | BETA | doing (full details under Active Todo Board)

P1 | Add project-specific metric labels and custom evidence rendering | BETA | done | due: none | owner: BETA
ID: add-project-specific-metric-labels-and-custom-evidence-r-2026-06-06t190938z | Area: metrics | Blocker: none
Success: The BETA page shows Unit Test Pass Rate, Dashboard Render Success, Project Control Coverage, and Open Blocker Count as first-class real metrics.
Evidence: .beta/projects/beta/evidence/beta-real-validation-custom-metrics-20260606/report.json; BETA plan shows Unit Test Pass Rate at 100%; 23 tests pass

P1 | Finish BETA project-management todo tracking | BETA | done | due: none | owner: BETA
ID: finish-beta-project-management-todo-tracking-2026-06-06t134802z | Area: project management | Blocker: none
Success: Dashboard and project pages show todo metrics, and tests pass.
Evidence: python -m unittest discover -s tests; python -m compileall beta prooflab tests; browser QA

Work Session Ledger

4 work sessions are logged across 1 project, totaling 0.93 tracked hours.

Sessions: 4 (logged work blocks)
Tracked Time: 0.93 h (total effort)
Productive: 100.0% (completed, shipped, tested, evidence, decided)
Evidence Work: 100.0% (tied to proof)
Blocked/Rework: 0% (friction signal)
AI-Assisted: 0.93 h (tracked AI use)

Where Time Is Going

Category	Tracked Hours
implementation	0.58 h
validation	0.33 h
testing	0.02 h
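The work-ledger figures and the category table follow from per-session minutes, category, status, and an evidence flag. A sketch under stated assumptions: WorkSession and work_summary are hypothetical, and percentages are weighted by minutes, which the dashboard does not confirm. The sample input mirrors the four sessions listed under Recent Work Sessions.

```python
from collections import defaultdict
from dataclasses import dataclass

PRODUCTIVE = {"completed", "shipped", "tested", "evidence", "decided"}  # "Productive" statuses
FRICTION = {"blocked", "rework"}  # assumed statuses behind "Blocked/Rework"


@dataclass
class WorkSession:
    category: str
    minutes: int
    status: str
    has_evidence: bool


def work_summary(sessions: list[WorkSession]) -> dict:
    total = sum(s.minutes for s in sessions)

    def share(pred) -> float:
        # Percent of tracked minutes spent in sessions matching pred.
        part = sum(s.minutes for s in sessions if pred(s))
        return round(100.0 * part / total, 1) if total else 0.0

    by_category = defaultdict(int)
    for s in sessions:
        by_category[s.category] += s.minutes
    return {
        "tracked_hours": round(total / 60, 2),
        "productive_pct": share(lambda s: s.status in PRODUCTIVE),
        "evidence_pct": share(lambda s: s.has_evidence),
        "friction_pct": share(lambda s: s.status in FRICTION),
        "hours_by_category": {c: round(m / 60, 2) for c, m in by_category.items()},
    }


sessions = [
    WorkSession("testing", 0, "tested", True),
    WorkSession("implementation", 35, "tested", True),
    WorkSession("validation", 20, "tested", True),
    WorkSession("testing", 1, "tested", True),
]
summary = work_summary(sessions)  # tracked_hours 0.93; testing rounds to 0.02 h
```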

Recent Work Sessions

tested | Ran tests: pytest | BETA | testing | 0 min | 2026-06-22T18:32:10Z
Outcome: 48/48 passed, 0 failed; exit 0
Evidence: C:\Users\jdcap\Documents\Projects\BETA\.beta\projects\beta\evidence\beta-automated-test-run-2026-06-22t18-32-10z\report.json
Next: none recorded

tested | Wire real custom metrics and project AI rollup | BETA | implementation | 35 min | 2026-06-06T19:54:37Z
Outcome: Added metrics.custom evidence support, activated BETA software metrics in measurement planning and metric intelligence, and rolled successful per-project AI output into the workspace when the combined AI falls back.
Evidence: python -m unittest discover -s tests: 23 OK; python -m compileall beta tests OK; beta-real-validation-custom-metrics-20260606 report; browser verified BETA and OMEGA pages
Next: Repeat the OMEGA underwater proof to create a comparable baseline, and add a no-command custom metric entry control.

tested | Create BETA self-validation evidence | BETA | validation | 20 min | 2026-06-06T19:09:57Z
Outcome: Ran BETA unit tests and a compile check, generated real CI JSON, recorded accepted BETA software evidence, and attached the CI source.
Evidence: 22 unit tests passed; compileall passed; .beta/projects/beta/evidence/beta-real-validation-20260606/report.json
Next: Render project-specific software metrics so BETA evidence reads naturally.

tested | Add work session ledger verification | BETA | testing | 1 min | 2026-06-02T04:53:57Z
Outcome: Ran unit tests and Python compile checks for the work-session ledger feature.
Evidence: python -m unittest discover -s tests: 17 tests OK; py_compile passed
Next: Review BETA and OMEGA project pages after refresh.

Deterministic Findings

No open findings: the latest run has no deterministic findings.

Improvement Backlog

Clear: no generated actions for the latest run.

Todo Ledger Signals

Add due dates to active P0/P1 todos so BETA can tell which items are ahead of or behind schedule.
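One way to classify todos once due dates exist, assuming only active P0/P1 items must carry a date; schedule_signal is a hypothetical helper, not a BETA command:

```python
from datetime import date
from typing import Optional


def schedule_signal(priority: str, status: str, due: Optional[date], today: date) -> str:
    if status in {"done", "dropped"}:
        return "closed"  # finished items are never ahead or behind
    if due is None:
        # Only high-priority active work is flagged for a missing date.
        return "needs due date" if priority in {"P0", "P1"} else "undated"
    return "overdue" if due < today else "on track"
```

Applied to the Active Todo Board today, both open items (P0 and P1, due none) would return "needs due date", which matches the signal above.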

Blocked Todos

No blocked todos are currently recorded.

Work Ledger Signals

No major blocked or rework pattern is visible in the current work ledger.

Work Ledger Recommendations

Keep using the work ledger and compare effort patterns against future metric movement.

Operating Commands

.\dev.ps1 refresh
.\dev.ps1 configure-project -ProjectKey beta -Goal "Prove field readiness" -Metric "field_setup_success_rate|%|Tracks setup success"
python -m beta record-todo beta --title "Next project task" --priority P1 --status todo
python -m beta update-todo beta TODO_ID --status done --evidence "test or report link"
.\dev.ps1 record-work -ProjectKey beta -WorkTitle "Project work session" -WorkMinutes 30 -WorkStatus completed -WorkOutcome "What changed"
.\dev.ps1 evidence-template -ProjectKey beta -ScenarioId bench-validation
.\dev.ps1 record-evidence -ProjectKey beta -ScenarioId bench-validation -CollectionType bench -RequiredPassed -ObservationsAccepted 100
.\dev.ps1 ai -Model gemma4:latest
.\dev.ps1 plan-ai -ProjectPath C:\path\to\build -ProjectName "Build Name"

Useful Project Signals To Add Next

Issue and milestone status: Import open issues, closed issues, milestone dates, and release notes.
CI history: Track build pass rate, test duration, flaky tests, and coverage movement.
Hardware bench logs: Track power draw, thermal rise, calibration, endurance, and failure rate.
Human validation: Track review signoff, field-test notes, defect severity, and acceptance status.
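For CI history, flaky tests and build pass rate can be derived from repeated job results. A sketch assuming rows of (commit, test name, passed) exported from CI; find_flaky_tests and build_pass_rate are hypothetical, and a test is flagged as flaky only when the same commit both passed and failed it:

```python
from collections import defaultdict


def find_flaky_tests(runs: list[tuple[str, str, bool]]) -> set[str]:
    """runs: (commit_sha, test_name, passed) rows from repeated CI jobs."""
    outcomes = defaultdict(set)
    for commit, test, passed in runs:
        outcomes[(commit, test)].add(passed)
    # Same code, different result: the test, not the change, is unreliable.
    return {test for (_, test), seen in outcomes.items() if seen == {True, False}}


def build_pass_rate(build_ok: list[bool]) -> float:
    # Percent of CI builds that succeeded; 0.0 when there is no history yet.
    return round(100.0 * sum(build_ok) / len(build_ok), 1) if build_ok else 0.0
```

Keying on the commit keeps a real regression (pass on one commit, fail on the next) from being mislabelled as flakiness.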