But How?
A deep dive into how Stinger's ML idle prediction works — what the models are, how they're trained, and the design decisions behind them.
The big picture
Stinger's flywheels take time to spin up. If the blaster can predict that you're about to pull the trigger, it can start the motors early so they're at full speed the moment you fire. That's what the ML idle system does: it watches your hand movements via an IMU (accelerometer + gyroscope) and continuously asks "is this person about to shoot in the next 100–600ms?"
Two models are available on-device — a fast logistic regression (LR) and a more capable multi-layer perceptron (MLP). You choose which one to use in the Motor → Idling menu (ML:LR or ML:MLP). Only one model runs at a time. Having both lets you compare: the LR is simpler and faster to train, while the MLP can pick up on subtler motion patterns at the cost of slightly more computation.
Data collection
When you activate Start ML Recording, the firmware logs every IMU sample plus the trigger state to onboard flash at 100 Hz. Each sample is 17 bytes:
| Field | Type | Description |
|---|---|---|
| timestamp | uint32 | Milliseconds since boot (millis()) |
| ax, ay, az | int16 × 3 | Accelerometer (raw LSB) |
| gx, gy, gz | int16 × 3 | Gyroscope (raw LSB) |
| trigger | uint8 | 0 = not pulled, 1 = pulled |
At 17 bytes/sample × 100 Hz, you get roughly 14–15 minutes of recording in the available flash. The log is wiped on every reboot, so each recording session is fresh.
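Decoding the log on the host side is a straightforward walk over 17-byte records. A minimal sketch, assuming little-endian byte order and the field layout in the table above (the function name `parseLog` is illustrative, not the tool's actual API):

```javascript
// Decode the raw binary log into an array of sample objects.
// Assumes little-endian fields laid out exactly as in the table above.
function parseLog(buffer) {
  const view = new DataView(buffer);
  const samples = [];
  for (let off = 0; off + 17 <= buffer.byteLength; off += 17) {
    samples.push({
      timestamp: view.getUint32(off, true),  // millis() since boot
      ax: view.getInt16(off + 4, true),      // accelerometer, raw LSB
      ay: view.getInt16(off + 6, true),
      az: view.getInt16(off + 8, true),
      gx: view.getInt16(off + 10, true),     // gyroscope, raw LSB
      gy: view.getInt16(off + 12, true),
      gz: view.getInt16(off + 14, true),
      trigger: view.getUint8(off + 16),      // 0 = released, 1 = pulled
    });
  }
  return samples;
}
```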
Shot detection
Training needs labeled examples: "a shot happened here" vs "nothing happened here." We detect shots by finding rising edges in the trigger signal (0 → 1 transitions). Shots closer than 1 second apart are merged to avoid counting recoil bounce or rapid double-taps as separate events.
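The edge detection itself can be sketched in a few lines. This is an illustrative version, assuming decoded samples with `timestamp` and `trigger` fields (the function name is hypothetical):

```javascript
// Find 0 → 1 trigger edges, merging edges closer than `mergeMs`
// so recoil bounce and rapid double-taps count as one shot.
function detectShots(samples, mergeMs = 1000) {
  const shots = [];
  for (let i = 1; i < samples.length; i++) {
    const rising = samples[i - 1].trigger === 0 && samples[i].trigger === 1;
    if (!rising) continue;
    const t = samples[i].timestamp;
    // Keep this edge only if the previous shot was at least mergeMs ago.
    if (shots.length === 0 || t - shots[shots.length - 1] >= mergeMs) {
      shots.push(t);
    }
  }
  return shots; // timestamps (ms) of detected shots
}
```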
Each detected shot becomes a positive training window: the 500ms of IMU data ending ~100ms before the trigger pull. The idea is that your hand was already moving into position during that window — that's the motion pattern we want the model to recognize.
Negative windows are sampled from regions far away from any shot (at least 600ms before and 1 second after each trigger edge), representing normal aiming, walking, or idle movement that should not trigger a spin-up.
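Putting the two rules together, window extraction might look roughly like this. A sketch under the stated assumptions (100 Hz sampling, 50-sample windows, 100ms lead, 600ms/1s exclusion zones); the simplification of checking only each window's end time against the exclusion zones is mine:

```javascript
// Cut positive (pre-shot) and negative (far-from-shot) windows out of the log.
function extractWindows(samples, shotTimes, winLen = 50, leadMs = 100) {
  const positives = [];
  for (const t of shotTimes) {
    // 50 samples ending ~100 ms before the trigger pull.
    const end = samples.findIndex((s) => s.timestamp >= t - leadMs);
    if (end >= winLen) positives.push(samples.slice(end - winLen, end));
  }
  const negatives = [];
  for (let end = winLen; end < samples.length; end += winLen) {
    const t = samples[end].timestamp;
    // Keep only windows at least 600 ms before and 1 s after every shot.
    const clear = shotTimes.every((s) => t < s - 600 || t > s + 1000);
    if (clear) negatives.push(samples.slice(end - winLen, end));
  }
  return { positives, negatives };
}
```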
Feature engineering
Raw IMU windows are 50 samples × 6 channels = 300 values. Rather than feeding all of these directly, we compute summary statistics that capture the shape of the motion. Two feature sets are used:
Summary features (LR) — 18 values
| Feature | Count | What it captures |
|---|---|---|
| mean per axis | 6 | Average orientation / motion direction |
| std per axis | 6 | How shaky or dynamic the motion is |
| abs-max per axis | 6 | Peak intensity of movement |
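Computing the 18 summary features is just three statistics per channel. A sketch assuming each window sample is a 6-element array `[ax, ay, az, gx, gy, gz]`; the interleaved output ordering is my assumption, not necessarily the tool's:

```javascript
// Mean, standard deviation, and abs-max for each of the 6 IMU axes.
function summaryFeatures(win) {
  const n = win.length;
  const feats = [];
  for (let axis = 0; axis < 6; axis++) {
    const col = win.map((s) => s[axis]);
    const mean = col.reduce((a, b) => a + b, 0) / n;
    const variance = col.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
    const absMax = Math.max(...col.map(Math.abs));
    feats.push(mean, Math.sqrt(variance), absMax);
  }
  return feats; // 6 axes × 3 stats = 18 values
}
```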
Rich features (MLP) — 30 values
| Feature | Count | What it captures |
|---|---|---|
| mean per axis | 6 | Same as above |
| std per axis | 6 | Same as above |
| abs-max per axis | 6 | Same as above |
| abs-mean per axis | 6 | Average magnitude regardless of direction |
| accel magnitude (mean, std, max) | 3 | Combined acceleration intensity |
| gyro magnitude (mean, std, max) | 3 | Combined rotation intensity |
The magnitude features (sqrt of sum-of-squares across xyz) make the MLP less sensitive to how you hold the blaster — rotation in any direction contributes equally.
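The six magnitude features can be sketched as follows, again assuming `[ax, ay, az, gx, gy, gz]` samples (function name illustrative):

```javascript
// Mean, std, and max of the per-sample vector magnitude sqrt(x² + y² + z²),
// computed once over the accel channels and once over the gyro channels.
function magnitudeFeatures(win) {
  const feats = [];
  for (const base of [0, 3]) { // base 0 = accel xyz, base 3 = gyro xyz
    const mags = win.map((s) => Math.hypot(s[base], s[base + 1], s[base + 2]));
    const mean = mags.reduce((a, b) => a + b, 0) / mags.length;
    const variance = mags.reduce((a, b) => a + (b - mean) ** 2, 0) / mags.length;
    feats.push(mean, Math.sqrt(variance), Math.max(...mags));
  }
  return feats; // 3 accel-magnitude values + 3 gyro-magnitude values
}
```

Because the magnitude is rotation-invariant, these values are identical no matter which way the blaster is tilted.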
The two models
Logistic Regression (LR)
The simplest possible classifier: a weighted sum of the 18 summary features, passed through a sigmoid to produce a probability between 0 and 1. Mathematically: p = sigmoid(w · x + b), where sigmoid(z) = 1 / (1 + e^(-z)), x is the standardized feature vector, w the learned weights, and b the bias.
Training finds the 18 weights + 1 bias that best separate "about to shoot" from "not shooting." We use gradient descent with L2 regularization to prevent overfitting on small datasets.
| Property | Value |
|---|---|
| Parameters | 19 weights + 36 scaler values (220 bytes total) |
| Features | 18 (summary) |
| Inference cost | ~18 multiply-adds + 1 sigmoid |
| Training | < 100ms in browser |
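The whole LR inference step fits in a handful of lines. A sketch assuming a model object holding the trained weights, bias, and the stored scaler parameters (the shape of `model` is an assumption):

```javascript
// LR inference: standardize each feature, take the weighted sum, apply sigmoid.
function predictLR(model, features) {
  let z = model.bias;
  for (let i = 0; i < features.length; i++) {
    const x = (features[i] - model.mean[i]) / model.scale[i]; // standardize
    z += model.weights[i] * x;
  }
  return 1 / (1 + Math.exp(-z)); // probability of "about to shoot"
}
```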
Multi-Layer Perceptron (MLP)
A small neural network with two hidden layers. It can learn nonlinear patterns that logistic regression misses — like "high rotation combined with sudden acceleration" being predictive even when neither alone is.
ReLU activation (max(0, x)) in the hidden layers lets the network model nonlinear decision boundaries. The final sigmoid squashes the output to a probability. Training uses the Adam optimizer with binary cross-entropy loss.
| Property | Value |
|---|---|
| Parameters | 4,097 weights + 60 scaler values (16.2 KB total) |
| Architecture | 30 → 64 → 32 → 1 |
| Features | 30 (rich) |
| Inference cost | ~4K multiply-adds |
| Training | ~1–3 seconds in browser |
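The forward pass is two ReLU layers followed by a sigmoid output. A sketch assuming row-major flat weight arrays (`W1`, `b1`, etc. are hypothetical field names; feature standardization happens before this call):

```javascript
// One dense layer: out[j] = act(b[j] + Σ_i W[j][i] · x[i]).
function dense(W, b, x, relu) {
  const out = new Array(b.length);
  for (let j = 0; j < b.length; j++) {
    let z = b[j];
    for (let i = 0; i < x.length; i++) z += W[j * x.length + i] * x[i];
    out[j] = relu ? Math.max(0, z) : z; // ReLU in hidden layers only
  }
  return out;
}

// 30 → 64 → 32 → 1 forward pass with a sigmoid on the single output.
function predictMLP(model, x) {
  const h1 = dense(model.W1, model.b1, x, true);   // 30 → 64, ReLU
  const h2 = dense(model.W2, model.b2, h1, true);  // 64 → 32, ReLU
  const z = dense(model.W3, model.b3, h2, false);  // 32 → 1, linear
  return 1 / (1 + Math.exp(-z[0]));                // sigmoid → probability
}
```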
Standardization
Before feeding features to either model, each feature is standardized: the training mean is subtracted and the result is divided by the training standard deviation. This puts all features on the same scale (roughly zero-centered, unit variance), which is critical both for gradient descent convergence and for the models to weight all axes comparably.
The scaler parameters (mean and scale per feature) are stored alongside the model weights and applied at inference time on the device, so the firmware always normalizes features exactly the same way training did.
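Fitting and applying the scaler can be sketched as below. The guard against a zero standard deviation is my addition, not a documented behavior of the tool:

```javascript
// Fit per-feature mean and standard deviation on the training rows.
function fitScaler(rows) {
  const d = rows[0].length;
  const mean = new Array(d).fill(0);
  const scale = new Array(d).fill(0);
  for (const r of rows) for (let i = 0; i < d; i++) mean[i] += r[i];
  for (let i = 0; i < d; i++) mean[i] /= rows.length;
  for (const r of rows) for (let i = 0; i < d; i++) scale[i] += (r[i] - mean[i]) ** 2;
  for (let i = 0; i < d; i++) scale[i] = Math.sqrt(scale[i] / rows.length) || 1;
  return { mean, scale }; // shipped alongside the model weights
}

// Apply the stored scaler to one feature vector — same math as on-device.
const applyScaler = ({ mean, scale }, x) =>
  x.map((v, i) => (v - mean[i]) / scale[i]);
```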
On-device inference
The RP2040 runs inference on Core 0 (the slow/UI core) while the 3200 Hz PID motor control loop runs on Core 1. Core 1 pushes IMU samples into a sliding window ring buffer at 100 Hz. Meanwhile, Core 0 runs the prediction loop at ~100 Hz:
- Copies the current 50-sample window from the ring buffer
- Computes the feature vector (summary for LR, rich for MLP)
- Applies the stored scaler (subtract mean, divide by scale)
- Runs the forward pass (dot product for LR, matrix multiplies for MLP)
- If p > threshold → signal Core 1 to spin up flywheels
The entire inference pipeline takes well under 1ms on the RP2040 — fast enough to run on every sample with margin to spare.
Model file format (MLMD)
Models are stored as compact binary blobs with a 24-byte header followed by the payload (all float32, little-endian):
| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 4 | magic | Always "MLMD" |
| 4 | 2 | version | Currently 1 |
| 6 | 2 | windowSamples | Must be 50 |
| 8 | 1 | modelType | 0 = LR, 1 = MLP |
| 9 | 1 | reserved | Zero |
| 10 | 2 | features | 18 (LR) or 30 (MLP) |
| 12 | 2 | h1 | Hidden layer 1 size (64 for MLP, 0 for LR) |
| 14 | 2 | h2 | Hidden layer 2 size (32 for MLP, 0 for LR) |
| 16 | 4 | payloadBytes | Size of everything after the header |
| 20 | 4 | payloadCrc32 | CRC32 over payload bytes |
The CRC32 uses the standard IEEE polynomial (same as zlib). On load, the firmware verifies magic, version, window size, architecture dimensions, payload size, and CRC — if anything is off, the model is rejected and factory defaults are used instead.
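The CRC32 and header checks can be sketched as follows. The bitwise CRC implementation is the standard IEEE reflected algorithm (polynomial 0xEDB88320, zlib-compatible); the validation logic mirrors the table above, though the firmware's actual checks may differ in detail:

```javascript
// Standard IEEE CRC32 (same check value as zlib's crc32).
function crc32(bytes) {
  let crc = 0xffffffff;
  for (const b of bytes) {
    crc ^= b;
    for (let k = 0; k < 8; k++) {
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Minimal MLMD validation: magic, version, window size, payload size, CRC.
function validateMLMD(buffer) {
  const view = new DataView(buffer);
  const magic = String.fromCharCode(
    view.getUint8(0), view.getUint8(1), view.getUint8(2), view.getUint8(3));
  if (magic !== "MLMD") return false;
  if (view.getUint16(4, true) !== 1) return false;   // version
  if (view.getUint16(6, true) !== 50) return false;  // windowSamples
  const payloadBytes = view.getUint32(16, true);
  if (24 + payloadBytes !== buffer.byteLength) return false;
  const payload = new Uint8Array(buffer, 24, payloadBytes);
  return crc32(payload) === view.getUint32(20, true); // payloadCrc32
}
```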
In-browser training
The entire training pipeline runs in your browser — no server, no Python, no install. Here's what happens when you click "Connect to Stinger":
- Pull log data — The binary log is read from the device over Web Serial.
- Parse — The 17-byte samples are decoded into arrays of timestamps, IMU values, and trigger states.
- Detect shots — Rising edges in the trigger signal are found and filtered.
- Extract dataset — Positive windows (pre-shot) and negative windows (far from shots) are cut out.
- Featurize — Summary (18) and rich (30) feature vectors are computed for each window.
- Scale — StandardScaler is fit on the training data, then applied.
- Train LR — Gradient descent with L2 regularization, ~5000 iterations.
- Train MLP — Adam optimizer with binary cross-entropy, ~800 iterations.
- Build MLMD — Weights, scaler params, and CRC are packed into the binary format.
- Upload — The MLMD blobs are sent to the device over the same serial connection and loaded into RAM.
The heaviest step (MLP training) typically completes in 1–3 seconds in any modern browser. No GPU is needed — the dataset is small enough that plain JavaScript is fast.
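The "Train LR" step above can be sketched as plain full-batch gradient descent on the logistic loss. The learning rate and regularization strength here are illustrative values, not the tool's actual hyperparameters:

```javascript
// Full-batch gradient descent for logistic regression with L2 regularization.
// X: array of standardized feature vectors; y: array of 0/1 labels.
function trainLR(X, y, iters = 5000, lr = 0.1, lambda = 0.01) {
  const d = X[0].length;
  const w = new Array(d).fill(0);
  let b = 0;
  for (let it = 0; it < iters; it++) {
    const gw = new Array(d).fill(0);
    let gb = 0;
    for (let n = 0; n < X.length; n++) {
      let z = b;
      for (let i = 0; i < d; i++) z += w[i] * X[n][i];
      const err = 1 / (1 + Math.exp(-z)) - y[n]; // prediction − label
      for (let i = 0; i < d; i++) gw[i] += err * X[n][i];
      gb += err;
    }
    for (let i = 0; i < d; i++) {
      // L2 term shrinks weights toward zero, guarding against overfitting.
      w[i] -= (lr / X.length) * (gw[i] + lambda * w[i]);
    }
    b -= (lr / X.length) * gb;
  }
  return { w, b };
}
```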
Design decisions
- Why two models instead of just the MLP?
- Different people get better results with different models. The LR is simpler — it works well when your pre-shot motion is distinctive along a single axis (e.g. a consistent raise-and-aim). The MLP can learn nonlinear combinations across axes, which helps when your aiming style is more complex. We train both and let you compare via the Motor → Idling menu (ML:LR vs ML:MLP) so you can pick whichever feels better.
- Why 500ms windows?
- Empirically, the preparatory motion before a trigger pull starts about 300–600ms ahead. 500ms (50 samples at 100 Hz) captures most of the relevant motion without including too much unrelated background movement. Shorter windows miss the early part of the aiming gesture; longer windows dilute the signal with noise.
- Why 100ms lead time?
- The prediction window ends 100ms before the trigger pull, not at the pull itself. This ensures the model doesn't cheat by seeing the actual trigger motion (which would show up as a sharp spike in the IMU). The 100ms margin means flywheels start spinning at least 100ms before you fire.
- Why personal models instead of a universal one?
- Everyone holds and aims differently. A model trained on one person's data performs poorly for another. Personal training adapts to your specific grip, aiming style, and movement patterns. We ship factory defaults trained on generic data, but personal models are significantly more accurate.
- Why 64→32→1 for the MLP?
- Small enough to run comfortably on an RP2040 (~4K multiply-adds), large enough to model the nonlinear interactions between axes. We tried smaller networks (32→16→1) and they underfit; larger ones (128→64→1) didn't improve accuracy and wasted RAM. The 64→32→1 architecture hits the sweet spot for this problem size.
- Why summary features instead of raw samples?
- Raw windows are 300 values — too many features for the small training datasets we get (typically 20–100 shots). Feature engineering compresses the relevant information into 18–30 values, making the models far more robust to overfitting. The features are also rotation-invariant thanks to the magnitude statistics, which helps generalize across different holding angles.
- Why in-browser instead of a cloud service?
- Privacy (your motion data never leaves your machine), no account needed, no internet dependency for the training step, works offline after first load, and zero infrastructure cost. The training is lightweight enough that JavaScript handles it easily.
- Why CRC32 in the model file?
- Serial transmission can corrupt data. A single flipped bit in a weight could make the model produce garbage predictions (and therefore spin up flywheels randomly). The CRC32 check on load ensures the model is exactly as trained — if corruption occurred during upload, the firmware falls back to factory defaults.
- What happens if the model is bad?
- The firmware always has factory-trained default weights compiled in. If a personal model file is missing, corrupt, or fails validation, factory defaults are used automatically. You can explicitly revert to factory defaults using the "Reset to factory model" button in the web tool (requires a serial connection). You can also re-record and re-train at any time — the log is wiped on every reboot, so each session starts clean.
What to expect
With a good recording session (5+ minutes, 20+ shots mixed with plenty of non-shooting movement):
- Accuracy: Both models should correctly identify 80–95% of pre-shot windows while keeping false positive rate under 5–10%.
- Flywheel behavior: You'll notice the flywheels beginning to spool up slightly before you fire. The spin-up window is short — they won't stay on indefinitely if you don't fire.
- Latency reduction: Instead of waiting for full spin-up after trigger pull, darts launch nearly instantly because the motors are already at speed.
- False positives: Occasional unnecessary spin-ups will happen. Aggressive aiming motions that mimic pre-shot patterns can trigger them. More training data (especially negatives) helps. If one model gives too many false positives, try the other.
If results aren't great, record a longer session with more varied movements. The models improve dramatically with more negative examples (non-shooting motion).