
But How?

A deep dive into how Stinger's ML idle prediction works — what the models are, how they're trained, and the design decisions behind them.

The big picture

Stinger's flywheels take time to spin up. If the blaster can predict that you're about to pull the trigger, it can start the motors early so they're at full speed the moment you fire. That's what the ML idle system does: it watches your hand movements via an IMU (accelerometer + gyroscope) and continuously asks "is this person about to shoot in the next 100–600ms?"

IMU (100 Hz) ──▸ sliding window (50 samples = 500ms) ──▸ featurize ──▸ model(s) ──▸ p(shoot) ──▸ if p > threshold → spin up flywheels

Two models are available on-device — a fast logistic regression (LR) and a more capable multi-layer perceptron (MLP). You choose which one to use in the Motor → Idling menu (ML:LR or ML:MLP). Only one model runs at a time. Having both lets you compare: the LR is simpler and faster to train, while the MLP can pick up on subtler motion patterns at the cost of slightly more computation.

Data collection

When you activate Start ML Recording, the firmware logs every IMU sample plus the trigger state to onboard flash at 100 Hz. Each sample is 17 bytes:

| Field | Type | Description |
|---|---|---|
| timestamp | uint32 | Milliseconds since boot (millis()) |
| ax, ay, az | int16 × 3 | Accelerometer (raw LSB) |
| gx, gy, gz | int16 × 3 | Gyroscope (raw LSB) |
| trigger | uint8 | 0 = not pulled, 1 = pulled |

At 17 bytes/sample × 100 Hz, you get roughly 14–15 minutes of recording in the available flash. The log is wiped on every reboot, so each recording session is fresh.
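As a concrete illustration, a record in this layout can be decoded with Python's struct module. The field order and little-endian packing follow the table above; the function name and dict keys are my own, not the tool's actual parser.

```python
import struct

# 17-byte record: timestamp (uint32), ax..az, gx..gz (int16 each),
# trigger (uint8), little-endian. Layout assumed from the table above.
RECORD = struct.Struct("<IhhhhhhB")

def parse_log(blob: bytes):
    """Yield one dict per 17-byte sample from a raw log dump."""
    for off in range(0, len(blob) - RECORD.size + 1, RECORD.size):
        t, ax, ay, az, gx, gy, gz, trig = RECORD.unpack_from(blob, off)
        yield {"t": t, "accel": (ax, ay, az),
               "gyro": (gx, gy, gz), "trigger": trig}
```

Any trailing partial record (e.g. if power was cut mid-write) is silently dropped by the range step.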

Shot detection

Training needs labeled examples: "a shot happened here" vs "nothing happened here." We detect shots by finding rising edges in the trigger signal (0 → 1 transitions). Shots closer than 1 second apart are merged to avoid counting recoil bounce or rapid double-taps as separate events.

Each detected shot becomes a positive training window: the 500ms of IMU data ending ~100ms before the trigger pull. The idea is that your hand was already moving into position during that window — that's the motion pattern we want the model to recognize.

Negative windows are sampled from regions far away from any shot (at least 600ms before and 1 second after each trigger edge), representing normal aiming, walking, or idle movement that should not trigger a spin-up.
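The labeling logic above can be sketched as follows. The constants and index arithmetic are illustrative (100 Hz assumed throughout), not the tool's actual code.

```python
RATE_HZ = 100
WINDOW = 50      # 500 ms of samples
LEAD = 10        # window ends 100 ms before the pull
MERGE_GAP = 100  # merge rising edges closer than 1 s (100 samples)

def detect_shots(trigger):
    """Indices of 0 -> 1 rising edges, merging edges < 1 s apart."""
    edges = [i for i in range(1, len(trigger))
             if trigger[i] == 1 and trigger[i - 1] == 0]
    merged = []
    for e in edges:
        if not merged or e - merged[-1] >= MERGE_GAP:
            merged.append(e)
    return merged

def positive_window(samples, shot_idx):
    """500 ms of IMU data ending 100 ms before the pull, or None."""
    end = shot_idx - LEAD
    start = end - WINDOW
    return samples[start:end] if start >= 0 else None
```

A shot too close to the start of the recording yields no window, which matches the need for some lead-up motion before every labeled example.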

Feature engineering

Raw IMU windows are 50 samples × 6 channels = 300 values. Rather than feeding all of these directly, we compute summary statistics that capture the shape of the motion. Two feature sets are used:

Summary features (LR) — 18 values

| Feature | Count | What it captures |
|---|---|---|
| mean per axis | 6 | Average orientation / motion direction |
| std per axis | 6 | How shaky or dynamic the motion is |
| abs-max per axis | 6 | Peak intensity of movement |

Rich features (MLP) — 30 values

| Feature | Count | What it captures |
|---|---|---|
| mean per axis | 6 | Same as above |
| std per axis | 6 | Same as above |
| abs-max per axis | 6 | Same as above |
| abs-mean per axis | 6 | Average magnitude regardless of direction |
| accel magnitude (mean, std, max) | 3 | Combined acceleration intensity |
| gyro magnitude (mean, std, max) | 3 | Combined rotation intensity |

The magnitude features (sqrt of sum-of-squares across xyz) make the MLP less sensitive to how you hold the blaster — rotation in any direction contributes equally.
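A sketch of both feature sets, assuming the ordering shown in the tables (mean, std, abs-max, then the rich-only abs-mean and magnitude statistics); the real tool may order features differently.

```python
import math

def _stats(col):
    """Population mean and standard deviation of one channel."""
    n = len(col)
    mean = sum(col) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return mean, std

def featurize(window, rich=False):
    """window: list of 50 (ax, ay, az, gx, gy, gz) samples."""
    cols = list(zip(*window))                        # 6 per-axis channels
    stats = [_stats(c) for c in cols]
    feats = [m for m, _ in stats]                    # mean per axis (6)
    feats += [s for _, s in stats]                   # std per axis (6)
    feats += [max(abs(v) for v in c) for c in cols]  # abs-max per axis (6)
    if not rich:
        return feats                                 # 18 summary features
    feats += [sum(abs(v) for v in c) / len(c) for c in cols]  # abs-mean (6)
    accel = [s[:3] for s in window]
    gyro = [s[3:] for s in window]
    for group in (accel, gyro):
        mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in group]
        m, sd = _stats(mags)
        feats += [m, sd, max(mags)]                  # magnitude mean/std/max
    return feats                                     # 30 rich features
```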

The two models

Logistic Regression (LR)

The simplest possible classifier: a weighted sum of the 18 summary features, passed through a sigmoid to produce a probability between 0 and 1. Mathematically:

p(shoot) = sigmoid( w1·x1 + w2·x2 + ... + w18·x18 + b )

Training finds the 18 weights + 1 bias that best separate "about to shoot" from "not shooting." We use gradient descent with L2 regularization to prevent overfitting on small datasets.
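The forward pass is just a dot product and a sigmoid; a minimal sketch (weight values at inference time come from training, these names are placeholders):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lr_predict(features, weights, bias):
    """p(shoot) for one standardized 18-value feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)
```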

| Property | Value |
|---|---|
| Parameters | 19 weights + 36 scaler values (220 bytes total) |
| Features | 18 (summary) |
| Inference cost | ~18 multiply-adds + 1 sigmoid |
| Training | < 100ms in browser |

Multi-Layer Perceptron (MLP)

A small neural network with two hidden layers. It can learn nonlinear patterns that logistic regression misses — like "high rotation combined with sudden acceleration" being predictive even when neither alone is.

30 features ──▸ [64 neurons, ReLU] ──▸ [32 neurons, ReLU] ──▸ [1 output, sigmoid] ──▸ p(shoot)

ReLU activation (max(0, x)) in the hidden layers lets the network model nonlinear decision boundaries. The final sigmoid squashes the output to a probability. Training uses the Adam optimizer with binary cross-entropy loss.
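A plain-Python sketch of that forward pass, with each weight matrix stored as a list of rows (one row per output neuron); the actual storage layout in the firmware may differ.

```python
import math

def dense(x, W, b, relu=True):
    """One fully connected layer; W is a list of rows, one per output."""
    out = [sum(wij * xj for wij, xj in zip(row, x)) + bi
           for row, bi in zip(W, b)]
    return [max(0.0, v) for v in out] if relu else out

def mlp_predict(x, params):
    """params: ((W1, b1), (W2, b2), (W3, b3)) for 30 -> 64 -> 32 -> 1."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = dense(x, W1, b1)                  # hidden layer 1, ReLU
    h2 = dense(h1, W2, b2)                 # hidden layer 2, ReLU
    z = dense(h2, W3, b3, relu=False)[0]   # output layer, linear
    return 1.0 / (1.0 + math.exp(-z))      # sigmoid -> p(shoot)
```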

| Property | Value |
|---|---|
| Parameters | 4,097 weights + 60 scaler values (16.2 KB total) |
| Architecture | 30 → 64 → 32 → 1 |
| Features | 30 (rich) |
| Inference cost | ~4K multiply-adds |
| Training | ~1–3 seconds in browser |

Standardization

Before feeding features to either model, each feature is standardized: the training mean is subtracted and the result is divided by the training standard deviation. This puts all features on the same scale (roughly zero-centered, unit variance), which is critical both for gradient descent convergence and for the models to treat all axes equally.

x_scaled = (x - mean) / scale

The scaler parameters (mean and scale per feature) are stored alongside the model weights and applied at inference time on the device, so the firmware always normalizes features exactly the same way training did.
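At inference time the scaler is a single elementwise pass; a sketch:

```python
def standardize(features, mean, scale):
    """Apply the stored per-feature training mean and scale."""
    return [(x - m) / s for x, m, s in zip(features, mean, scale)]
```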

On-device inference

The RP2040 runs inference on Core 0 (the slow/UI core) while the 3200 Hz PID motor control loop runs on Core 1. Core 1 pushes IMU samples into a sliding window ring buffer at 100 Hz. Meanwhile, Core 0 runs the prediction loop at ~100 Hz:

  1. Copies the current 50-sample window from the ring buffer
  2. Computes the feature vector (summary for LR, rich for MLP)
  3. Applies the stored scaler (subtract mean, divide by scale)
  4. Runs the forward pass (dot product for LR, matrix multiplies for MLP)
  5. If p > threshold → signal Core 1 to spin up flywheels

The entire inference pipeline takes well under 1ms on the RP2040 — fast enough to run every sample with margin to spare.
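An illustrative Python rendering of that five-step loop (the real implementation is C++ firmware on the RP2040; the class and callback names here are placeholders):

```python
from collections import deque

class Predictor:
    """Sliding-window predictor mirroring the Core 0 loop above."""
    def __init__(self, featurize, scale, forward, threshold):
        self.buf = deque(maxlen=50)           # sliding-window ring buffer
        self.featurize, self.scale = featurize, scale
        self.forward, self.threshold = forward, threshold

    def on_sample(self, sample):
        """Called at 100 Hz; True means 'signal spin-up'."""
        self.buf.append(sample)
        if len(self.buf) < 50:
            return False                      # window not full yet
        window = list(self.buf)               # 1. copy the 50-sample window
        feats = self.featurize(window)        # 2. compute feature vector
        feats = self.scale(feats)             # 3. apply stored scaler
        p = self.forward(feats)               # 4. forward pass
        return p > self.threshold             # 5. compare to threshold
```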

Model file format (MLMD)

Models are stored as compact binary blobs with a 24-byte header followed by the payload (all float32, little-endian):

| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 4 | magic | Always "MLMD" |
| 4 | 2 | version | Currently 1 |
| 6 | 2 | windowSamples | Must be 50 |
| 8 | 1 | modelType | 0 = LR, 1 = MLP |
| 9 | 1 | reserved | Zero |
| 10 | 2 | features | 18 (LR) or 30 (MLP) |
| 12 | 2 | h1 | Hidden layer 1 size (64 for MLP, 0 for LR) |
| 14 | 2 | h2 | Hidden layer 2 size (32 for MLP, 0 for LR) |
| 16 | 4 | payloadBytes | Size of everything after the header |
| 20 | 4 | payloadCrc32 | CRC32 over payload bytes |

The CRC32 uses the standard IEEE polynomial (same as zlib). On load, the firmware verifies magic, version, window size, architecture dimensions, payload size, and CRC — if anything is off, the model is rejected and factory defaults are used instead.
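A sketch of building and validating that header with Python's struct and zlib (zlib.crc32 matches the stated IEEE polynomial); the field layout follows the offsets in the table, and the function names are my own.

```python
import struct
import zlib

# magic, version, windowSamples, modelType, reserved,
# features, h1, h2, payloadBytes, payloadCrc32 -> 24 bytes, little-endian
HEADER = struct.Struct("<4sHHBBHHHII")

def build_mlmd(model_type, features, h1, h2, payload: bytes) -> bytes:
    header = HEADER.pack(b"MLMD", 1, 50, model_type, 0,
                         features, h1, h2, len(payload), zlib.crc32(payload))
    return header + payload

def validate_mlmd(blob: bytes) -> bool:
    """Magic, version, window, size, and CRC checks (dimension checks omitted)."""
    if len(blob) < HEADER.size:
        return False
    magic, ver, win, _mt, _, _f, _h1, _h2, nbytes, crc = HEADER.unpack_from(blob)
    payload = blob[HEADER.size:]
    return (magic == b"MLMD" and ver == 1 and win == 50
            and len(payload) == nbytes and zlib.crc32(payload) == crc)
```

Flipping a single payload bit makes the CRC check fail, which is exactly the corruption case the format is designed to catch.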

In-browser training

The entire training pipeline runs in your browser — no server, no Python, no install. Here's what happens when you click "Connect to Stinger":

  1. Pull log data — The binary log is read from the device over Web Serial.
  2. Parse — The 17-byte samples are decoded into arrays of timestamps, IMU values, and trigger states.
  3. Detect shots — Rising edges in the trigger signal are found and filtered.
  4. Extract dataset — Positive windows (pre-shot) and negative windows (far from shots) are cut out.
  5. Featurize — Summary (18) and rich (30) feature vectors are computed for each window.
  6. Scale — StandardScaler is fit on the training data, then applied.
  7. Train LR — Gradient descent with L2 regularization, ~5000 iterations.
  8. Train MLP — Adam optimizer with binary cross-entropy, ~800 iterations.
  9. Build MLMD — Weights, scaler params, and CRC are packed into the binary format.
  10. Upload — The MLMD blobs are sent to the device over the same serial connection and loaded into RAM.

The heaviest step (MLP training) typically completes in 1–3 seconds on any modern browser. No GPU needed — the dataset is small enough that plain JavaScript is fast.
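As an illustration of step 7, a toy batch-gradient-descent LR trainer with L2 regularization; the learning rate and regularization strength are placeholders, not the tool's actual hyperparameters.

```python
import math

def train_lr(X, y, iters=5000, lr=0.1, l2=1e-3):
    """Fit weights and bias by full-batch gradient descent
    on binary cross-entropy with L2 on the weights (bias unregularized)."""
    n_feat = len(X[0])
    w = [0.0] * n_feat
    b = 0.0
    m = len(X)
    for _ in range(iters):
        gw = [0.0] * n_feat
        gb = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi                     # dLoss/dz for cross-entropy
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * (gj / m + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / m
    return w, b
```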

Design decisions

Why two models instead of just the MLP?
Different people get better results with different models. The LR is simpler — it works well when your pre-shot motion is distinctive along a single axis (e.g. a consistent raise-and-aim). The MLP can learn nonlinear combinations across axes, which helps when your aiming style is more complex. We train both and let you compare via the Motor → Idling menu (ML:LR vs ML:MLP) so you can pick whichever feels better.
Why 500ms windows?
Empirically, the preparatory motion before a trigger pull starts about 300–600ms ahead. 500ms (50 samples at 100 Hz) captures most of the relevant motion without including too much unrelated background movement. Shorter windows miss the early part of the aiming gesture; longer windows dilute the signal with noise.
Why 100ms lead time?
The prediction window ends 100ms before the trigger pull, not at the pull itself. This ensures the model doesn't cheat by seeing the actual trigger motion (which would show up as a sharp spike in the IMU). The 100ms margin means flywheels start spinning at least 100ms before you fire.
Why personal models instead of a universal one?
Everyone holds and aims differently. A model trained on one person's data performs poorly for another. Personal training adapts to your specific grip, aiming style, and movement patterns. We ship factory defaults trained on generic data, but personal models are significantly more accurate.
Why 64→32→1 for the MLP?
Small enough to run comfortably on an RP2040 (~4K multiply-adds), large enough to model the nonlinear interactions between axes. We tried smaller networks (32→16→1) and they underfit; larger ones (128→64→1) didn't improve accuracy and wasted RAM. The 64→32→1 architecture hits the sweet spot for this problem size.
Why summary features instead of raw samples?
Raw windows are 300 values — too many features for the small training datasets we get (typically 20–100 shots). Feature engineering compresses the relevant information into 18–30 values, making the models far more robust to overfitting. The features are also rotation-invariant thanks to the magnitude statistics, which helps generalize across different holding angles.
Why in-browser instead of a cloud service?
Privacy (your motion data never leaves your machine), no account needed, no internet dependency for the training step, works offline after first load, and zero infrastructure cost. The training is lightweight enough that JavaScript handles it easily.
Why CRC32 in the model file?
Serial transmission can corrupt data. A single flipped bit in a weight could make the model produce garbage predictions (and therefore spin up flywheels randomly). The CRC32 check on load ensures the model is exactly as trained — if corruption occurred during upload, the firmware falls back to factory defaults.
What happens if the model is bad?
The firmware always has factory-trained default weights compiled in. If a personal model file is missing, corrupt, or fails validation, factory defaults are used automatically. You can explicitly revert to factory defaults using the "Reset to factory model" button in the web tool (requires a serial connection). You can also re-record and re-train at any time — the log is wiped on every reboot, so each session starts clean.

What to expect

A good recording session means 5+ minutes of data with 20+ shots mixed with plenty of non-shooting movement; that gives both models enough signal to learn your pre-shot motion.

If results aren't great, record a longer session with more varied movements. The models improve dramatically with more negative examples (non-shooting motion).