Biological Basis
The dragonfly catches prey mid-air with a success rate among the highest recorded for any predator. It does this not by reacting to where prey is, but by predicting where it will be and flying directly to meet it. This foresight is computed within milliseconds by a compact, dedicated neural circuit.
The underlying mechanism is the Hassenstein–Reichardt correlation detector: a multiplication-based directional filter first derived from behavioural experiments in 1956. Its neural implementation — down to specific receptor subtypes at precise dendritic locations on T4 and T5 neurons — has since been mapped at synapse resolution in the fly optic lobe.
LIBELLULA implements this same logic in synthesizable RTL. Event-driven inputs feed direction-selective Reichardt detectors, which drive an α–β predictor to generate a lead-point coordinate — the silicon equivalent of a dragonfly computing an intercept before the prey knows it is being pursued.
1 · Validation & Implementation
All results below are simulation-based. The RTL is vendor-neutral Verilog-2001 and has not been synthesis-tested against any FPGA or cell library in this release. No timing closure report is included.
The core suite comprises 37 benches covering latency, accuracy, throughput, power, AXI integration, and hostile / failure-mode conditions; all pass. The AXI layer adds 111 assertions, all of which also pass. Both suites are invoked from sim/Makefile (make test and make axi).
| Claim | Specification | Simulation result | Testbench | Status |
|---|---|---|---|---|
| Latency | ≤ 6 cycles @ 200 MHz | 5 cycles · 25 ns | tb_latency | Pass |
| Accuracy | ±2 px @ 300 Hz | 0 violations | tb_px_bound_300hz | Pass |
| Throughput | 1 Meps, zero drops | 2000 / 2000 ACK | tb_aer_throughput_1meps | Pass |
| Power scaling | Activity-proportional | Low/high confirmed | tb_power_lo · tb_power_hi | Pass |
| 8-direction motion | Cardinal + diagonal | All 8 axes exercised | tb_reichardt_ds | Pass |
| Outlier rejection | Coast on bad measurement | Confirmed | tb_ab_predictor | Pass |
| Hysteresis gating | Stable gate behaviour | Confirmed | tb_burst_gate | Pass |
| Core suite | 37 benches | 37 / 37 | make test | Pass |
| AXI layer | 111 assertions | 111 / 111 | make axi | Pass |
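The latency and throughput rows follow from the clock period. A minimal arithmetic check, assuming only the figures in the table (the helper name `cycles_to_ns` is illustrative and does not appear in the repository):

```python
def cycles_to_ns(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * 1e9 / clock_hz


CLOCK_HZ = 200e6          # simulation clock from the table
EVENT_RATE_EPS = 1e6      # 1 Meps throughput claim

measured_ns = cycles_to_ns(5, CLOCK_HZ)   # simulated latency: 5 cycles
budget_ns = cycles_to_ns(6, CLOCK_HZ)     # specification: <= 6 cycles
cycles_per_event = CLOCK_HZ / EVENT_RATE_EPS  # clock cycles available per event

print(f"measured {measured_ns:.0f} ns, budget {budget_ns:.0f} ns, "
      f"{cycles_per_event:.0f} cycles per event at 1 Meps")
```

At 200 MHz each cycle is 5 ns, so the 5-cycle result sits one cycle inside the 6-cycle budget.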
2 · Processing Pipeline
The pipeline ingests asynchronous events from DVS sensors and processes them through a feed-forward prediction architecture: AER → LIF tile (time-multiplexed) → 8-direction delay lattice & directional correlation → burst/confidence gate → α–β predictor with outlier rejection.
4-phase handshake receiver following standard AER semantics (REQ↑, ACK↑, REQ↓, ACK↓). Designed to interoperate with DVS sensors (Prophesee EVK4 / IMX636, iniVation DAVIS346, Samsung DVS) at FPGA bring-up and subsequent ASIC integration. Back-pressure-safe: no events dropped under sustained load.
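The 4-phase sequence (REQ↑, ACK↑, REQ↓, ACK↓) can be sketched as a two-state behavioural model. This is an illustration of the protocol only, not the RTL: the class name `AerReceiver`, the helper `send_events`, and the one-step-per-call timing are all assumptions.

```python
from enum import Enum, auto


class Phase(Enum):
    IDLE = auto()    # REQ low, ACK low: waiting for the sender
    ACKED = auto()   # ACK high: address latched, waiting for REQ to fall


class AerReceiver:
    """Behavioural model of a 4-phase AER receiver (hypothetical names)."""

    def __init__(self):
        self.phase = Phase.IDLE
        self.ack = False
        self.events = []

    def step(self, req: bool, addr: int) -> bool:
        if self.phase is Phase.IDLE and req:
            # REQ rose: latch the address exactly once, then raise ACK.
            self.events.append(addr)
            self.ack = True
            self.phase = Phase.ACKED
        elif self.phase is Phase.ACKED and not req:
            # REQ fell: drop ACK and return to idle for the next event.
            self.ack = False
            self.phase = Phase.IDLE
        return self.ack


def send_events(rx: AerReceiver, addrs):
    """Model a well-behaved sender: hold REQ until ACK rises, then release."""
    for addr in addrs:
        while not rx.step(True, addr):
            pass
        while rx.step(False, addr):
            pass


rx = AerReceiver()
send_events(rx, [3, 7, 7, 42])
print(rx.events)
```

Because the address is latched only on the IDLE-to-ACKED transition, a sender that holds REQ high for many steps still produces a single event.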
Time-multiplexed neuron array with 14-bit precision. Filters noise and performs short-term salience selection — analogous to temporal filtering in medulla intrinsic neurons (Mi1, Mi4, Mi9). Only coherent motion engages downstream stages.
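To illustrate how leaky integration separates dense from sparse activity, here is a sketch of a single leaky integrate-and-fire neuron with a 14-bit saturating state. The threshold, leak, and input weight are hypothetical values chosen for the example; the source states only the 14-bit precision and a configurable leak. Time-multiplexing across the array is not modelled.

```python
class LifNeuron:
    """Single LIF neuron with a saturating unsigned state (illustrative values)."""

    def __init__(self, threshold: int = 1000, leak: int = 50, bits: int = 14):
        self.max_v = (1 << bits) - 1   # 14-bit state saturates at 16383
        self.threshold = threshold     # hypothetical firing threshold
        self.leak = leak               # hypothetical leak per tick
        self.v = 0

    def step(self, weight: int = 0) -> bool:
        # Leak first, then integrate the input, clamping to the state range.
        self.v = max(self.v - self.leak, 0)
        self.v = min(self.v + weight, self.max_v)
        if self.v >= self.threshold:
            self.v = 0                 # reset on spike
            return True
        return False


# Dense input: one event per tick accumulates faster than the leak drains it.
dense = LifNeuron()
dense_spikes = [dense.step(400) for _ in range(6)]

# Sparse input: one event every 10 ticks leaks away before the next arrives.
sparse = LifNeuron()
sparse_spikes = [sparse.step(400 if t % 10 == 0 else 0) for t in range(100)]

print(sum(dense_spikes), sum(sparse_spikes))
```

With these example values the dense stream crosses threshold on every third event, while the sparse stream never fires.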
Ring-buffer-based delay lattice operating across eight directions: cardinal (E, W, N, S) and diagonal (NE, NW, SE, SW). Diagonal contributions are scaled to better approximate geometric equivalence with cardinal axes. Provides the temporal gradient structure required by the downstream Reichardt motion detector.
Elementary motion detector with leaky integration, converting delay lattice outputs into directional motion vectors across all eight axes. Implements both preferred-direction enhancement and null-direction suppression — a direct computational analogue to T4/T5 dendritic computation in the fly optic lobe. All eight response axes are exercised in tb_reichardt_ds.
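The opponent structure of a Reichardt correlator can be shown in a few lines. This sketch implements the textbook two-arm form (delayed A times B, minus A times delayed B) for one pair of inputs and sums the result over time; it omits the leaky integration and eight-axis lattice described above, and the function name and signal layout are assumptions made for the example.

```python
def reichardt_response(a, b, delay: int) -> int:
    """Summed opponent correlation for two equal-length input sequences.

    Positive output indicates motion from input A towards input B,
    negative output indicates the reverse direction.
    """
    total = 0
    for t in range(len(a)):
        a_delayed = a[t - delay] if t >= delay else 0
        b_delayed = b[t - delay] if t >= delay else 0
        # Preferred-direction arm minus null-direction arm.
        total += a_delayed * b[t] - a[t] * b_delayed
    return total


# An edge reaches A at t=1 and B at t=3, matching a delay of 2 ticks.
a = [0, 1, 0, 0, 0, 0]
b = [0, 0, 0, 1, 0, 0]

forward = reichardt_response(a, b, delay=2)    # A then B
reverse = reichardt_response(b, a, delay=2)    # B then A
flicker = reichardt_response(a, a, delay=2)    # both inputs identical

print(forward, reverse, flicker)
```

Identical inputs cancel exactly because the two arms are mirror images, which is why the detector responds to direction rather than to brightness changes alone.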
Suppresses noise and sparse false triggers via event density thresholding with hysteresis: separate opening threshold (TH_OPEN = 3) and closing threshold (TH_CLOSE = 1) reduce chatter at threshold boundaries. Only sustained coherent activity engages the predictor.
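The hysteresis behaviour can be sketched with the documented defaults (WINDOW = 16, TH_OPEN = 3, TH_CLOSE = 1). The sliding-window-over-ticks interpretation and the class name `BurstGate` are assumptions; the RTL may define its counting window differently.

```python
from collections import deque


class BurstGate:
    """Event-density gate with separate open and close thresholds."""

    def __init__(self, window: int = 16, th_open: int = 3, th_close: int = 1):
        # Assumed: a sliding window of the most recent `window` ticks.
        self.history = deque([0] * window, maxlen=window)
        self.th_open = th_open
        self.th_close = th_close
        self.open = False

    def step(self, event: bool) -> bool:
        self.history.append(1 if event else 0)
        count = sum(self.history)
        if self.open:
            # Once open, stay open while at least th_close events remain.
            self.open = count >= self.th_close
        else:
            # While closed, require th_open events in the window to open.
            self.open = count >= self.th_open
        return self.open


gate = BurstGate()
opening = [gate.step(True) for _ in range(3)]    # three events in a row
quiet = [gate.step(False) for _ in range(16)]    # then silence
isolated = gate.step(True)                       # one stray event afterwards

print(opening, quiet.count(True), isolated)
```

Under these assumptions the gate opens on the third event, holds for 15 quiet ticks while one event remains in the window, and ignores a single stray event once closed.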
Kalman-like predictor in Q8.8 fixed-point. Extrapolates trajectories forward in continuous time: p̂ = p + v·Δt. Includes outlier rejection: measurements whose residuals exceed OUTLIER_TH = 128 are rejected; the predictor coasts on its current velocity estimate rather than allowing a single bad sample to corrupt state.
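A behavioural sketch of an α–β update with outlier rejection in Q8.8 integers follows. Several details are assumptions and are marked as such in the code: the gains (α = 1/2, β = 1/8, implemented as shifts), the unit time step, seeding the state from the first measurement, and comparing OUTLIER_TH against the residual in raw Q8.8 units (the source does not state the threshold's units).

```python
FRAC_BITS = 8  # Q8.8: 8 integer bits, 8 fractional bits


def to_q88(x: float) -> int:
    return int(round(x * (1 << FRAC_BITS)))


class AlphaBetaQ88:
    """Alpha-beta tracker on Q8.8 integers (gains and units are assumptions)."""

    def __init__(self, alpha_shift: int = 1, beta_shift: int = 3,
                 outlier_th: int = 128):
        self.alpha_shift = alpha_shift  # assumed alpha = 1/2
        self.beta_shift = beta_shift    # assumed beta = 1/8
        self.outlier_th = outlier_th    # assumed to be in raw Q8.8 units
        self.p = 0                      # position estimate
        self.v = 0                      # velocity estimate per tick
        self.seeded = False

    def update(self, z: int) -> bool:
        """Consume one measurement; return False if it was rejected."""
        if not self.seeded:
            self.p, self.seeded = z, True   # assumed: first sample seeds state
            return True
        predicted = self.p + self.v         # p + v*dt with dt = 1 tick
        residual = z - predicted
        if abs(residual) > self.outlier_th:
            self.p = predicted              # coast on current velocity
            return False
        self.p = predicted + (residual >> self.alpha_shift)
        self.v = self.v + (residual >> self.beta_shift)
        return True

    def lead_point(self, horizon_ticks: int) -> int:
        """Extrapolate the position estimate forward by horizon_ticks."""
        return self.p + self.v * horizon_ticks


tracker = AlphaBetaQ88()
z = to_q88(10.0)
for _ in range(40):              # target moving 0.25 px per tick (64 in Q8.8)
    tracker.update(z)
    z += 64
accepted = tracker.update(z + 5000)   # a single wild measurement
print(tracker.v, accepted)
```

The rejected sample leaves the velocity estimate untouched and advances the position by exactly one predicted step, which is the coasting behaviour described above.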
Derives a reliability score from event rate and direction magnitude across all eight motion axes. Gates the predictor output with a pred_valid strobe — downstream consumers receive (x̂, ŷ, conf) only when coherent motion is confirmed, preventing noise-driven coordinates from propagating.
3 · RTL Modules
All timing budgets verified in simulation at 200 MHz. FPGA synthesis and timing closure not run in this release.
Address-Event Representation receiver implementing the standard 4-phase handshake. Designed for interoperability with commercial DVS sensors. Validated at ≥10⁶ events/s with zero drops. The AER shell is a behavioural layer for rapid FPGA bring-up; ASIC integration follows the same semantics.
Leaky Integrate-and-Fire neuron tile sharing a single accumulator across the spatial array via time-multiplexing. 14-bit precision balances temporal resolution against gate count for efficient FPGA/ASIC utilization. Leak rate configurable.
Ring-buffer-based delay lattice across eight directions: cardinal (E, W, N, S) and diagonal (NE, NW, SE, SW). Diagonal contributions are scaled to better approximate geometric equivalence with cardinal axes. Provides the temporal gradient structure required by the downstream Reichardt motion detector.
Elementary motion detector with leaky integration. Implements both preferred-direction enhancement and null-direction suppression — a direct computational analogue to T4/T5 dendritic computation in the fly optic lobe. All eight axes (cardinal + diagonal) are exercised in tb_reichardt_ds.
Suppresses noise and sparse false triggers via event density thresholding with hysteresis: separate opening (TH_OPEN = 3) and closing (TH_CLOSE = 1) thresholds reduce chatter at threshold boundaries. Only sustained coherent activity engages the predictor.
Kalman-like α–β predictor performing trajectory extrapolation in continuous time (p + v·Δt) in Q8.8 fixed-point arithmetic. Includes outlier rejection: measurements whose residuals exceed OUTLIER_TH = 128 are rejected; the predictor coasts on its current velocity estimate. Validated to within ±2 px per prediction at 300 Hz.
Derives a reliability score from event rate and direction magnitude across all eight motion axes. Gates the predictor output with a pred_valid strobe — downstream consumers receive (x̂, ŷ, conf) only when coherent motion is confirmed.
| Parameter | Default | Description |
|---|---|---|
| XW / YW | 10 | X / Y coordinate width (bits) |
| AW | 8 | LIF address width |
| DW | 6 | Delay lattice depth (bits) |
| PW | 16 | Predictor output width (bits) |
| WINDOW | 16 | Burst gate event-counting window |
| TH_OPEN | 3 | Events required to open burst gate |
| TH_CLOSE | 1 | Events required to hold gate open |
| OUTLIER_TH | 128 | Residual threshold for outlier rejection; predictor coasts above this |
4 · Relation to AI Systems
LIBELLULA is not a replacement for AI-based vision systems. It is a preprocessing layer that operates at a different timescale. Neural networks running on event-camera streams batch or accumulate events before inference; even fast, purpose-built SNNs take on the order of 100 µs to several milliseconds to produce a position estimate.
LIBELLULA produces a predicted lead-point coordinate — (x̂, ŷ, conf) — in 25 ns. These outputs are available to the AI layer as a stable, direction-confirmed, forward-extrapolated signal rather than a raw or stale one. The two layers operate at complementary timescales and are designed to coexist.
Raw event streams are sparse, asynchronous, and noisy. LIBELLULA delivers a cleaned, direction-confirmed, confidence-gated motion signal — making the AI's classification problem simpler and more reliable than working from raw events or accumulated frames.
AI inference arrives late by construction. LIBELLULA's predicted coordinate gives the AI a forward-extrapolated position to reason against rather than a stale historical one — reducing the effective latency of the combined system.
AI systems are difficult to certify for hard real-time bounds. LIBELLULA's fixed, gate-level-verifiable behaviour provides a deterministic layer beneath the AI — one whose outputs can be traced and whose failure modes are predictable.
A neural network running continuously typically draws on the order of 1–10 W. LIBELLULA's 45–60 mW core can run continuously and trigger AI inference only when confidence-gated motion is confirmed, acting as a low-power wake signal for the more expensive compute layer.
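The wake-signal argument reduces to a duty-cycle calculation. In the sketch below the 60 mW core figure and the 5 W inference figure fall within the ranges quoted above, while the 10% duty cycle is a purely hypothetical workload chosen for illustration.

```python
def average_power_w(core_w: float, ai_w: float, ai_duty: float) -> float:
    """Mean power when the core runs always and the AI layer runs part-time."""
    return core_w + ai_duty * ai_w


CORE_W = 0.060    # upper end of the 45-60 mW simulated core figure
AI_W = 5.0        # a value within the quoted 1-10 W range
AI_DUTY = 0.10    # hypothetical: inference active 10% of the time

gated_w = average_power_w(CORE_W, AI_W, AI_DUTY)
always_on_w = AI_W

print(f"gated {gated_w:.2f} W vs always-on {always_on_w:.2f} W")
```

The saving scales with how rarely the gate opens, so the benefit depends entirely on scene activity in the target application.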
5 · Integration Notes
Standard 4-phase AER semantics. Designed to interoperate with Prophesee Metavision (IMX636, EVK4), iniVation DAVIS346, and Samsung DVS. No proprietary handshake required.
Downstream systems receive (x̂, ŷ, conf) with pred_valid strobes. No frame buffers, no DMA — a coordinate pair and confidence flag, updated at the event rate.
45–60 mW for core logic (simulation-verified, activity-proportional switching). ASIC target: <20 mW in a low-power process node.
No learned weights. No runtime adaptation. Every output is traceable through the 6-stage pipeline — a requirement for aviation and safety-critical certification pathways.
6 · Competitive Landscape
The following comparison covers current event-camera and motion-prediction approaches. System-level latency figures are used for all comparison rows; LIBELLULA figures are from simulation. The forward prediction column reflects the α–β predictor’s output horizon, which is selectable via Δt and remains a roadmap item beyond the current validated core.
| Solution | Latency | Forward prediction | Power |
|---|---|---|---|
| DJI / Skydio commercial stack (frame CNN on ARM + GPU) | 20–40 ms | Reactive only | 3–8 W |
| ETH-UZH event-camera avoidance (Science Robotics, 2020) | 3.5 ms | 0 ms (reactive) | ~10 W |
| FPGA event-vision accelerator (Bonazzi et al., arXiv 2024) | ~2 ms | 0 ms | 3–5 W |
| LIBELLULA core logic, simulation (synthesizable Verilog-2001, 200 MHz simulation clock) | 25 ns (5 cycles; spec ≤ 6 cycles, 30 ns) | 2–30 ms ahead (Δt selectable; roadmap) | 45–60 mW (<20 mW ASIC target) |
7 · Roadmap
FPGA hardware loop-in with a physical DVS front-end (Prophesee EVK4 or iniVation DAVIS346) — sensor-in-the-loop testing and timing characterization on real event streams.
Timing characterization on silicon — validation of 25 ns core latency (5 cycles @ 200 MHz) and 45–60 mW power figures against physical implementation.
Prediction horizon extension toward 2–30 ms under power caps, via tuning of the α–β predictor and burst gate parameters.
ASIC tape-out in a low-power process node — targeting <20 mW for field deployment.
A planned module — the Predictive Mesh Lattice — adds short-horizon anticipatory scan steering for improved robustness under vibration, occlusion, and sensor noise. Inspired by the dragonfly's capacity to reacquire a target after brief occlusion. PML is intentionally excluded from the current Core: tuning is application-specific and co-development with an integration partner is the intended path.
8 · Evaluator Package
Designed to answer four immediate questions for a qualified engineering team:
37-bench simulation suite including core, power, AXI integration, and hostile / failure-mode conditions. All passing. Representative outputs and logs included.
Simulation logs for all 37 core benches and 111 AXI assertions. All pass under Icarus Verilog 12.0.
Lint report from Verilator 5.038 — zero warnings. FPGA synthesis has not been run in this release.
Both bare core (libellula_top) and AXI-integrated evaluation shell (libellula_axi_eval_top) are available as evaluation surfaces.
Open Repository
RTL source, testbenches, simulation Makefile, and implementation reports are available on GitHub. Technical engagement, including access to the full evaluator package, can be arranged directly.
github.com/vertov/LIBELLULA