MidGARD is a Rust command-line tool suite that transforms TLA+ formal specifications into actionable regression test plans. The process is model-driven: the TLA+ specification guides the pipeline from model checking and graph generation through test sequence output. This bridges the gap between formal modeling and practical test engineering by automating the creation of test sequences from formally specified and model-checked system behavior.
The tool addresses a core challenge at SanDisk: even when a generated state graph is finite, cycles and branching behavior can produce an extremely large number of possible execution paths. These paths are too numerous for engineers to enumerate or evaluate manually. MidGARD replaces intuition-driven test planning with algorithmic analysis, producing a finite, high-confidence set of test sequences that cover the state space comprehensively.
The system is organized as a four-stage pipeline, where each stage is a distinct module that can also be invoked independently. The stages are designed to flow sequentially from specification to deployable test plan:
| # | Stage | Description |
|---|---|---|
| 1 | Model Checking | Accepts a TLA+ specification and invokes the TLC model checker as a subprocess. If the specification is successfully model checked, TLC generates a Graphviz DOT file representing the explored state-transition graph. For specifications with large state spaces, VIEW and ALIAS abstractions can be applied to reduce complexity before DOT generation. |
| 2 | Graph Preprocessing | Parses the raw DOT file into an in-memory directed graph structure (likely using the petgraph crate). Cleans the graph by removing self-referencing edges (artifacts of TLA+ stuttering invariance), optionally collapsing equivalent states, and stripping unreachable nodes. The result is a lean, analysis-ready digraph. |
| 3 | Analysis & Generation | The core algorithmic engine. Applies directed-graph analysis techniques—including strongly connected component decomposition (Tarjan’s), path enumeration, loop detection, and coverage algorithms—to generate test sequences. Supports full-coverage mode (all nodes and edges) and n-test mode (maximally distinct routes). All generation is seeded for reproducibility. |
| 4 | Ordering & Output | Reorders the generated test sequences by a user-chosen criterion (e.g., random, coverage-weighted, edge-diversity) so that any prefix of the list is a useful subset. Outputs a structured, machine-readable list of test sequences for deployment into SanDisk’s regression workflows. |
Every run of the test generation engine accepts a seed value that deterministically controls all algorithmic decisions (path selection, tie-breaking, randomization). A given seed always produces the same output for the same graph. Users can also supply a “seed set”—a collection of seeds chosen to produce maximally distinct test suites—enabling teams to partition their regression efforts across complementary test plans.
Defining what makes two test sequences “distinct” is a central open question. Our initial approach treats distinctness as a function of edge-set overlap: two routes are more distinct when they share fewer edges. We will also explore node-coverage diversity, loop-participation metrics, and other graph-theoretic distance measures as the project matures.
Each pipeline stage is a standalone Rust module with a well-defined interface. This allows SanDisk to swap in alternative algorithms (e.g., a different coverage strategy), add new preprocessing steps, or integrate additional output formats without modifying the rest of the pipeline.
MidGARD is built using a focused set of tools chosen for their reliability, performance, and support for formal modeling and graph‑based analysis. Each technology plays a specific role in enabling automated test generation from TLA+ specifications.
We use TLA+ to formally describe system behavior in a mathematically precise way, ensuring that all generated tests originate from verified system logic. The VSCode TLA+ extension supports authoring, validating, and maintaining these specifications throughout development.
Rust powers the MidGARD toolchain, giving us memory‑safe performance for graph processing and deterministic test generation. Cargo and the petgraph crate allow us to build a modular, efficient pipeline capable of handling large state spaces.
TLC explores all valid behaviors of a TLA+ specification and produces the raw state graph that MidGARD analyzes. We run TLC using OpenJDK to ensure compatibility, reproducibility, and freedom from Oracle licensing restrictions.
Graphviz generates the DOT graphs that represent system state transitions, which we then clean and analyze inside MidGARD. The Graphviz Interactive Preview extension helps us inspect and validate graph structure during development.
VSCode is our primary development environment, supported by Rust‑Analyzer, TLA+, Graphviz Preview, and Live Share. These tools streamline collaboration, debugging, and consistent development across the team.
We rely on Cargo extensions and Rust’s ecosystem to manage dependencies, run tests, and support graph‑analysis workflows. This tooling keeps the codebase maintainable and ready for future extensions.
As MidGARD moves toward final implementation and sponsor handoff, several areas remain important for refinement and possible future extension:
Refining the definition of “distinct” test sequences using edge overlap, node coverage, loop participation, or a composite graph metric.
Improving support for very large graphs through abstraction, filtering, or approximation strategies when exhaustive analysis is not practical.
Finalizing output formats such as CSV and JSON so generated test sequences can be integrated into SanDisk’s testing workflows.
Adding optional metadata to generated tests, such as coverage summaries, priority nodes, or path constraints.
Exploring a time-permitting AI-assisted feature that could translate generated test sequences into more readable action steps for test engineers.
These refinements will continue to be evaluated through sponsor feedback, technical testing, and final documentation as the team prepares MidGARD for delivery.