01 / Run It Locally
Start with the executable path, then read the claim.
The repository includes a prebuilt fluno_runtime wheel and Windows x86_64 DLL payloads for the s, m, and l continuous-inference artifacts. No signup, service token, or compiler checkout is required for the local run path.
Expected scope: scalar inputs, fail-closed validation, digest output, and an 11-element partial_summary_vector. This is not a Python compiler and not a full tensor ABI.
```shell
pip install ./fluno_runtime-0.1.0-py3-none-any.whl
python customer_package_sample_no_binary/l/examples/validate_and_bench.py customer_package_sample_no_binary/l/
```
Reproduction Note
The bundled live payloads are Windows x86_64 .dll artifacts. Linux users need the equivalent .so payload build with matching manifests. The public package intentionally excludes compiler internals and generated full MLIR dumps.
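Before running the sample, it can help to confirm that the local machine matches the bundled payload target. The following is a minimal sketch using only the Python standard library; the names `BUNDLED_TARGET`, `normalize_host`, and `payload_matches_host` are hypothetical helpers for illustration and are not part of fluno_runtime.

```python
from __future__ import annotations

import platform

# Assumption: the bundled payloads target Windows on x86_64, per the note above.
BUNDLED_TARGET = ("windows", "x86_64")

# Different operating systems report the same architecture under different names.
_ARCH_ALIASES = {"amd64": "x86_64", "x64": "x86_64", "x86_64": "x86_64"}


def normalize_host(system: str, machine: str) -> tuple[str, str]:
    """Lower-case the OS name and map architecture aliases to one spelling."""
    arch = machine.lower()
    return system.lower(), _ARCH_ALIASES.get(arch, arch)


def payload_matches_host(system: str | None = None, machine: str | None = None) -> bool:
    """Return True only when the host matches the bundled payload target."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return normalize_host(system, machine) == BUNDLED_TARGET


if __name__ == "__main__":
    if payload_matches_host():
        print("Host matches the bundled Windows x86_64 payloads.")
    else:
        print("Host does not match; a matching payload build and manifests are needed.")
```

This check is advisory only. The SDK's own platform gate remains the authority on whether an artifact may load.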
02 / Phase 1B Systems Baselines & Telemetry
Measured row first: 84.673 ms PyTorch vs 4.061 ms Fluno hot_vector.
The Phase 1B benchmark isolates a continuous inference state-estimation loop. The important comparison is not a synthetic kernel. It is the delta between a Python/PyTorch optimized path and a compiled artifact call that still returns a materialized partial summary vector.
| Implementation | Scope | Median latency | Relative position | Validation |
|---|---|---|---|---|
| PyTorch Optimized | optimized eager repeated path | 84.673 ms | 1.00x baseline | digest true / vector true / max error 0.0 |
| Fluno Artifact Runtime | hot_vector_repeated, materialized partial summary vector | 4.061 ms | 20.85x faster than PyTorch | digest true / vector true / max error 0.0 |
| Fluno Artifact Runtime | hot_run_repeated, artifact run path approximation | 7.245 ms | 11.69x faster than PyTorch | digest true / vector true / max error 0.0 |
| Rust Hand-Written Ref | systems reference implementation | 3.313 ms | Fluno is 1.23x slower | digest true / vector true / max error 0.0 |
| C++ Hand-Written Ref | systems reference implementation | 3.083 ms | Fluno is 1.32x slower | digest true / vector true / max error 0.0 |
Honest Disclosure
Fluno does not claim to beat expert handwritten Rust or C++ in this row. It loses narrowly to both. The interesting part is the operating boundary: from Python, with one controlled artifact load and fail-closed validation, the runtime reaches native-class latency without shipping compiler internals.
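The relative positions in the table follow directly from the median latencies. As a quick arithmetic check, the sketch below recomputes each ratio from the published medians; the dictionary keys and the `ratio` helper are illustrative names, not identifiers from the benchmark logs.

```python
# Median latencies in milliseconds, copied from the Phase 1B table.
MEDIAN_MS = {
    "pytorch_optimized": 84.673,
    "fluno_hot_vector": 4.061,
    "fluno_hot_run": 7.245,
    "rust_reference": 3.313,
    "cpp_reference": 3.083,
}


def ratio(slower_ms: float, faster_ms: float) -> float:
    """How many times longer the slower path takes, rounded to two decimals."""
    return round(slower_ms / faster_ms, 2)


# Fluno paths measured against the optimized PyTorch baseline.
hot_vector_vs_pytorch = ratio(MEDIAN_MS["pytorch_optimized"], MEDIAN_MS["fluno_hot_vector"])
hot_run_vs_pytorch = ratio(MEDIAN_MS["pytorch_optimized"], MEDIAN_MS["fluno_hot_run"])

# Fluno hot_vector measured against the handwritten references (Fluno is slower).
hot_vector_vs_rust = ratio(MEDIAN_MS["fluno_hot_vector"], MEDIAN_MS["rust_reference"])
hot_vector_vs_cpp = ratio(MEDIAN_MS["fluno_hot_vector"], MEDIAN_MS["cpp_reference"])

if __name__ == "__main__":
    print(f"hot_vector vs PyTorch: {hot_vector_vs_pytorch}x faster")
    print(f"hot_run vs PyTorch:    {hot_run_vs_pytorch}x faster")
    print(f"hot_vector vs Rust:    {hot_vector_vs_rust}x slower")
    print(f"hot_vector vs C++:     {hot_vector_vs_cpp}x slower")
```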
03 / The Core Problem
For stateful AI loops, infrastructure spend increasingly becomes a compiler/runtime problem.
Compute availability is a practical constraint for high-volume AI systems. If a product path repeatedly burns scheduling overhead inside Python-level control flow, the cost compounds across tokens, sessions, agents, and model-adjacent services.
Fluno treats orchestration overhead as a first-class performance target. The product boundary is not a Python-compatible language. It is a runtime contract for calling ahead-of-time compiled artifacts from existing Python applications.
torch.compile and LibTorch are powerful when a stable tensor graph dominates execution. They are less decisive when continuous inference loops combine scalar state, dynamic control flow, recurrent estimation, and repeated boundary crossings. Graph breaks turn a compiler promise into runtime negotiation.
That negotiation is the Orchestration Tax: the latency and infrastructure cost incurred by repeatedly coordinating execution rather than executing the hot region as a native, validated call target.
04 / Infra Cost Model
Amdahl's law is the cost model when the hot loop dominates the fleet.
The model is intentionally conservative. It does not claim that every server dollar disappears. It states the condition under which the claim becomes testable: when a measured hot region accounts for a fraction p of runtime, and Fluno accelerates that region by a factor s, the new steady-state cost ratio is:

$$\frac{C_{\text{new}}}{C_{\text{old}}} = (1 - p) + \frac{p}{s}$$

For the L-size continuous inference row, s = 20.85 against the optimized PyTorch path. If a production service spends 95% of its time inside the same class of loop (p = 0.95), the ratio is 0.05 + 0.95 / 20.85, or about 0.096: roughly a 90% reduction in the compute slice addressed by the artifact.
At p = 0.80: 76.2% theoretical reduction for the measured hot region class.
At p = 0.90: 85.7% theoretical reduction when orchestration dominates the loop.
At p = 0.95: 90.4% theoretical reduction; the fleet economics become visible.
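The three reduction figures can be reproduced with a few lines of arithmetic. The sketch below assumes the standard Amdahl's-law form, cost ratio = (1 - p) + p / s, with s = 20.85 from the L-size row. The hot-region fractions 0.80, 0.90, and 0.95 are inferred here because they reproduce the quoted percentages; the function names are illustrative.

```python
def cost_ratio(p: float, s: float) -> float:
    """Amdahl's-law cost ratio: untouched share plus the accelerated share."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be a positive speedup factor")
    return (1.0 - p) + p / s


def reduction_percent(p: float, s: float) -> float:
    """Theoretical cost reduction, in percent, for hot fraction p and speedup s."""
    return 100.0 * (1.0 - cost_ratio(p, s))


if __name__ == "__main__":
    speedup = 20.85  # measured hot_vector speedup against optimized PyTorch
    for hot_fraction in (0.80, 0.90, 0.95):
        print(f"p = {hot_fraction:.2f}: {reduction_percent(hot_fraction, speedup):.1f}% reduction")
```

The reduction is bounded by p itself: with p = 0.80, no speedup can remove more than 80% of the cost, which is why the measured share of hot-loop time matters more than the headline factor.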
Electricity, capex amortization, cloud commitments, GPU scarcity, and scheduler occupancy all inherit from the same denominator: useful work per wall-clock interval. Removing orchestration tax increases that denominator without asking the customer to rewrite the product in C++.
05 / SDK Architecture & Fail-Closed Governance
Closed compiler, inspectable artifact boundary.
The artifact runtime is deliberately small. The customer receives a package that can be loaded, validated, run, and benchmarked. The compiler, lowering passes, region extraction logic, and generated full MLIR dumps do not ship in the Python SDK.
The package is governed as a product surface rather than a research dump. Runtime validation happens before library loading; if a field, hash, license policy, platform target, or symbol does not match, the SDK fails closed.
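```python
import fluno_runtime as fluno

# Validate the package before any native library is loaded; a mismatch fails closed.
runner = fluno.load_artifact("./production_region.fluno_artifact", validate=True)
# Run the compiled region and return the materialized partial summary vector.
summary_vector = runner.run_vector(steps=100, seed=610001)
```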
- Manifest integrity: Artifact metadata is bound by SHA-256 sidecars and embedded manifest hashes.
- Library hash validation: Digest and vector libraries are hashed before ctypes can load them.
- Schema binding: Input schema, output schema, validation profile, ABI signature, and vector length are cross-checked.
- Expiry policy: expires_at is parsed and enforced; invalid formats fail closed.
- Platform gate: Backend, OS, architecture, and target metadata must match the runtime environment.
- Symbol resolution: Only manifest-declared symbols are resolved. User-provided arbitrary symbol calls are not exposed.
- No local path leakage: Package text is scanned for local absolute paths before execution.
- Benchmark scope labels: hot_digest, hot_vector, and hot_run remain separated to avoid overstating customer-path speed.
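Two of the checks above, library hash validation and the expiry policy, can be illustrated with the standard library alone. This is a sketch of the fail-closed pattern, not the SDK's implementation: `ValidationError`, `check_library_hash`, `check_expiry`, and `load_validated` are hypothetical names, and treating expires_at as an ISO 8601 timestamp with a timezone offset is an assumption.

```python
from __future__ import annotations

import ctypes
import hashlib
from datetime import datetime, timezone
from pathlib import Path


class ValidationError(Exception):
    """Raised when any check fails; the library must not be loaded."""


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large payloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_library_hash(library: Path, expected_hex: str) -> None:
    """Compare the on-disk hash with the value recorded for the package."""
    if sha256_of(library) != expected_hex.strip().lower():
        raise ValidationError(f"hash mismatch for {library.name}")


def check_expiry(expires_at: str, now: datetime | None = None) -> None:
    """Reject unparseable, timezone-less, or past expiry timestamps."""
    try:
        expiry = datetime.fromisoformat(expires_at)  # assumed ISO 8601 format
    except (TypeError, ValueError) as exc:
        raise ValidationError("expires_at is not a valid timestamp") from exc
    if expiry.tzinfo is None:
        raise ValidationError("expires_at must carry a timezone offset")
    current = now if now is not None else datetime.now(timezone.utc)
    if current >= expiry:
        raise ValidationError("artifact has expired")


def load_validated(library: Path, expected_hex: str, expires_at: str) -> ctypes.CDLL:
    """Run every check first; ctypes only sees the file if all of them pass."""
    check_expiry(expires_at)
    check_library_hash(library, expected_hex)
    return ctypes.CDLL(str(library))
```

The ordering is the point of the design: every check raises before `ctypes.CDLL` is reached, so a tampered or expired payload is never mapped into the process.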
What this is / what this is not
| This is | This is not |
|---|---|
| A Python runtime for calling precompiled Fluno artifacts. | A Python-compatible language. |
| A demo package shape that does not ship compiler internals. | An open-source compiler distribution. |
| Compiled execution for selected hot regions. | Automatic conversion of an entire Python application. |
| Validated partial summary vector output. | A completed full state vector ABI. |
| A measured 20.85x L-size hot_vector result against optimized PyTorch. | A claim to beat handwritten Rust or C++. |
06 / Discussion / Why Closed Source?
The core compiler is the asset.
Fluno's source-level region extraction, MLIR lowering strategy, ABI synthesis, validation hardening, and cache discipline are closed intellectual property. This boundary is deliberate: customers can audit the artifact contract without receiving the compiler internals.
The repository can expose customer_package_sample_no_binary, manifests, schemas, file manifests, validation profiles, benchmark logs, and SDK source without exposing compiler internals. Systems engineers can inspect the contract, reproduce package validation, and debate the scope of the performance claim.
The intended discussion is precise: where does Python orchestration tax dominate, what percentage of fleet runtime is hot-loop bound, and how much verified native execution can be inserted without destabilizing the existing product stack?
07 / References & Reproducibility
Claims are bounded by artifacts, not slogans.
The runtime claim is intentionally narrow: precompiled Fluno artifacts can be loaded, validated, run, and benchmarked from Python without shipping Fluno compiler internals. The cost claim is an Amdahl-law projection conditioned on the measured hot region dominating runtime.
- OpenAI: Building the compute infrastructure for the Intelligence Age.
- Runnable package: customer_package_sample_no_binary/l/.
- Runtime wheel: fluno_runtime-0.1.0-py3-none-any.whl.
- Manifest and hash files: manifest.json, artifact_manifest.sha256, and bundle_file_manifest.sha256.