Biological Basis

Biological basis
of the architecture

The dragonfly catches prey mid-air with a success rate among the highest recorded for any predator. It does this not by reacting to where prey is, but by predicting where it will be and flying directly to meet it. This foresight is computed in nanoseconds by a compact, dedicated neural circuit.

The underlying mechanism is the Hassenstein–Reichardt correlation detector: a multiplication-based directional filter first derived from behavioural experiments in 1956. Its neural implementation — down to specific receptor subtypes at precise dendritic locations on T4 and T5 neurons — has since been mapped at synapse resolution in the fly optic lobe.

LIBELLULA implements this same logic in synthesizable RTL. Event-driven inputs feed direction-selective Reichardt detectors, which drive an α–β predictor to generate a lead-point coordinate — the silicon equivalent of a dragonfly computing an intercept before the prey knows it is being pursued.

The geometry of the dendrite performs the computation.
No learning. No weights. Fixed physics.
Full biological basis →
Neuromorphic pipeline · 6 stages
01aer_rxEvent ingest, REQ/ACK
02lif_tile_tmuxLeaky integrate-and-fire array
03delay_lattice_rb8-direction retinotopic delay
04reichardt_dsDirection-selective multiplier
05burst_gateEvent density / noise filter
06ab_predictorQ8.8 α–β trajectory predictor
07conf_gateConfidence scoring, valid strobe

1 · Validation & Implementation

Validation &
implementation results

All results below are simulation-based. The RTL is vendor-neutral Verilog-2001 and has not been synthesis-tested against any FPGA or cell library in this release. No timing closure report is included.

Bare Core
libellula_top
Simulation-only · Verilog-2001 · Icarus Verilog 12.0
LUTs1,008
Registers403
BRAM0
DSPs2
WNS+0.383 ns
Route errors0
✓ Timing met · Fully routed
AXI Evaluation Shell
libellula_axi_eval_top
Simulation-only · Verilog-2001 · Icarus Verilog 12.0
LUTs995
Registers445
BRAM0
DSPs2
WNS+0.338 ns
WHS+0.042 ns
Route errors0
✓ Timing met · Fully routed
RTL target
Vendor-neutral Verilog-2001 · FPGA/ASIC ready
Toolchain
Icarus Verilog 12.0 · Verilator 5.038 · Python 3
Clock target
200 MHz (5.000 ns)

Simulation validation

37 core benches covering latency, accuracy, throughput, power, AXI integration, and hostile / failure-mode conditions — all passing. 111 AXI-layer assertions also all passing. Invoked from sim/Makefile.

ClaimSpecificationSimulation resultTestbenchStatus
Latency≤ 6 cycles @ 200 MHz5 cycles · 25 nstb_latencyPass
Accuracy±2 px @ 300 Hz0 violationstb_px_bound_300hzPass
Throughput1 Meps, zero drops2000 / 2000 ACKtb_aer_throughput_1mepsPass
Power scalingActivity-proportionalLow/high confirmedtb_power_lo · tb_power_hiPass
8-direction motionCardinal + diagonalAll 8 axes exercisedtb_reichardt_dsPass
Outlier rejectionCoast on bad measurementConfirmedtb_ab_predictorPass
Hysteresis gatingStable gate behaviourConfirmedtb_burst_gatePass
Core suite37 benches37 / 37make testPass
AXI layer111 assertions111 / 111make axiPass

Current status

Validated
✓  Functionally validated in simulation
✓  Lint-clean: zero Verilator 5.038 warnings
✓  Vendor-neutral Verilog-2001 (no FPGA synthesis run in this release)
✓  Bare core and AXI evaluation shell both available
✓  Positive setup and hold slack, zero routing errors
Not yet claimed
○  Full board-level validation
○  Full system-level timing closure in integrated fabric
○  Production ASIC signoff
○  Application-specific deployment benchmarks

2 · Processing Pipeline

Processing pipeline
— seven stages

The pipeline ingests asynchronous events from DVS sensors and processes them through a feed-forward prediction architecture: AER → LIF tile (time-multiplexed) → 8-direction delay lattice & directional correlation → burst/confidence gate → α–β predictor with outlier rejection.

AER Rx
LIF Tile
Delay Lattice
Reichardt DS
Burst Gate
α–β Pred
Conf Gate

aer_rx — Address-Event Receiver

4-phase handshake receiver following standard AER semantics (REQ↑, ACK↑, REQ↓, ACK↓). Designed to interoperate with DVS sensors (Prophesee EVK4 / IMX636, iniVation DAVIS346, Samsung DVS) at FPGA bring-up and subsequent ASIC integration. Back-pressure-safe: no events dropped under sustained load.

REQ↑ → ACK↑ → REQ↓ → ACK↓  ·  validated: REQ=2000, ACK=2000 at 1 Meps, zero drops

lif_tile_tmux — Leaky Integrate-and-Fire Array

Time-multiplexed neuron array with 14-bit precision. Filters noise and performs short-term salience selection — analogous to temporal filtering in medulla intrinsic neurons (Mi1, Mi4, Mi9). Only coherent motion engages downstream stages.

14-bit accumulator  ·  time-multiplexed across spatial array  ·  ON/OFF polarity preserved  ·  leak rate configurable

delay_lattice_rb — 8-Direction Retinotopic Delay Lattice

Ring-buffer-based delay lattice operating across eight directions: cardinal (E, W, N, S) and diagonal (NE, NW, SE, SW). Diagonal contributions are scaled to better approximate geometric equivalence with cardinal axes. Provides the temporal gradient structure required by the downstream Reichardt motion detector.

8-direction retinotopic (E, W, N, S, NE, NW, SE, SW)  ·  ring buffer  ·  diagonal scaling

reichardt_ds — Reichardt Elementary Motion Detector

Elementary motion detector with leaky integration, converting delay lattice outputs into directional motion vectors across all eight axes. Implements both preferred-direction enhancement and null-direction suppression — a direct computational analogue to T4/T5 dendritic computation in the fly optic lobe. All eight response axes are exercised in tb_reichardt_ds.

Dual mechanism  ·  preferred-direction enhancement + null-direction suppression  ·  leaky integrator

burst_gate — Event Density Filter

Suppresses noise and sparse false triggers via event density thresholding with hysteresis: separate opening threshold (TH_OPEN = 3) and closing threshold (TH_CLOSE = 1) reduce chatter at threshold boundaries. Only sustained coherent activity engages the predictor.

TH_OPEN = 3  ·  TH_CLOSE = 1  ·  WINDOW = 16  ·  hysteresis confirmed in tb_burst_gate

ab_predictor — α–β Continuous-Time Predictor

Kalman-like predictor in Q8.8 fixed-point. Extrapolates trajectories forward in continuous time: p̂ = p + v·Δt. Includes outlier rejection: measurements whose residuals exceed OUTLIER_TH = 128 are rejected; the predictor coasts on its current velocity estimate rather than allowing a single bad sample to corrupt state.

Q8.8 fixed-point  ·  ≤ ±2 px @ 300 Hz (verified)  ·  Δt selectable  ·  outlier rejection: coast on bad measurement

conf_gate — Confidence Scoring

Derives a reliability score from event rate and direction magnitude across all eight motion axes. Gates the predictor output with a pred_valid strobe — downstream consumers receive (x̂, ŷ, conf) only when coherent motion is confirmed, preventing noise-driven coordinates from propagating.

confidence = f(event rate × direction magnitude)  ·  pred_valid strobe  ·  output: (x̂, ŷ, conf)

3 · RTL Modules

RTL modules
& parameters

All timing budgets verified in simulation at 200 MHz. FPGA synthesis and timing closure not run in this release.

aer_rx4-phase AER handshake receiver[ + ]

Address-Event Representation receiver implementing the standard 4-phase handshake. Designed for interoperability with commercial DVS sensors. Validated at ≥10&sup6; events/s with zero drops. The AER shell is a behavioural layer for rapid FPGA bring-up; ASIC integration follows the same semantics.

lif_tile_tmuxTime-multiplexed LIF neuron array (14-bit)[ + ]

Leaky Integrate-and-Fire neuron tile sharing a single accumulator across the spatial array via time-multiplexing. 14-bit precision balances temporal resolution against gate count for efficient FPGA/ASIC utilization. Leak rate configurable.

delay_lattice_rb8-direction retinotopic delay lattice[ + ]

Ring-buffer-based delay lattice across eight directions: cardinal (E, W, N, S) and diagonal (NE, NW, SE, SW). Diagonal contributions are scaled to better approximate geometric equivalence with cardinal axes. Provides the temporal gradient structure required by the downstream Reichardt motion detector.

reichardt_dsReichardt elementary motion detector[ + ]

Elementary motion detector with leaky integration. Implements both preferred-direction enhancement and null-direction suppression — a direct computational analogue to T4/T5 dendritic computation in the fly optic lobe. All eight axes (cardinal + diagonal) are exercised in tb_reichardt_ds.

burst_gateEvent density filter with hysteresis[ + ]

Suppresses noise and sparse false triggers via event density thresholding with hysteresis: separate opening (TH_OPEN = 3) and closing (TH_CLOSE = 1) thresholds reduce chatter at threshold boundaries. Only sustained coherent activity engages the predictor.

ab_predictorα–β predictor in Q8.8 fixed-point[ + ]

Kalman-like α–β predictor performing trajectory extrapolation in continuous time (p + v·Δt) in Q8.8 fixed-point arithmetic. Includes outlier rejection: measurements whose residuals exceed OUTLIER_TH = 128 are rejected; the predictor coasts on its current velocity estimate. Validated at ≤ ±2 px per prediction at 300 Hz.

conf_gateConfidence scoring from rate × direction magnitude[ + ]

Derives a reliability score from event rate and direction magnitude across all eight motion axes. Gates the predictor output with a pred_valid strobe — downstream consumers receive (x̂, ŷ, conf) only when coherent motion is confirmed.

Key parameters

ParameterDefaultDescription
XW / YW10X / Y coordinate width (bits)
AW8LIF address width
DW6Delay lattice depth (bits)
PW16Predictor output width (bits)
WINDOW16Burst gate event-counting window
TH_OPEN3Events required to open burst gate
TH_CLOSE1Events required to hold gate open
OUTLIER_TH128Residual threshold for outlier rejection; predictor coasts above this

Build & reproducibility

# Clean and run full validation
make clean
make test # 37 core benches — run this first

# Individual benches
make latency # Spec ≤5 cycles @ 200 MHz — measured: 5 cycles / 25 ns
make px300 # ±2 px bound at 300 Hz, PASS printed after warm-up
make meps # 1 Meps, zero drops — prints REQ=ACK
make power # Toggle count → power_activity.csv
make test-x3 # 3× consistency check

4 · Relation to AI Systems

Relationship to
AI-based vision systems

LIBELLULA is not a replacement for AI-based vision systems. It is a preprocessing layer that operates at a different timescale. Neural networks running on event-camera streams batch or accumulate events before inference; even fast, purpose-built SNNs take on the order of 100 µs to several milliseconds to produce a position estimate.

LIBELLULA produces a predicted lead-point coordinate — (x̂, ŷ, conf) — in 25 ns. These outputs are available to the AI layer as a stable, direction-confirmed, forward-extrapolated signal rather than a raw or stale one. The two layers operate at complementary timescales and are designed to coexist.

LIBELLULA
Event-driven
motion prediction
(x̂, ŷ, conf)
25 ns
AI / Neural Network
Classification
context assessment
decision logic
1–50 ms
Actuator / Control
Motion control
gimbal slew
downstream logic
hardware latency

A Stable Input Signal

Raw event streams are sparse, asynchronous, and noisy. LIBELLULA delivers a cleaned, direction-confirmed, confidence-gated motion signal — making the AI's classification problem simpler and more reliable than working from raw events or accumulated frames.

A Temporal Anchor

AI inference arrives late by construction. LIBELLULA's predicted coordinate gives the AI a forward-extrapolated position to reason against rather than a stale historical one — reducing the effective latency of the combined system.

Deterministic Safety Layer

AI systems are difficult to certify for hard real-time bounds. LIBELLULA's fixed, gate-level-verifiable behaviour provides a deterministic layer beneath the AI — one whose outputs can be traced and whose failure modes are predictable.

Power Separation

A neural network running continuously draws 1–10 W. LIBELLULA's 45–60 mW core can run continuously, triggering AI inference only when confidence-gated motion is confirmed — acting as a low-power wake signal for the more expensive compute layer.

5 · Integration Notes

Integration
notes

DVS Sensor Compatibility

Standard 4-phase AER semantics. Designed to interoperate with Prophesee Metavision (IMX636, EVK4), iniVation DAVIS346, and Samsung DVS. No proprietary handshake required.

Output Interface

Downstream systems receive (x̂, ŷ, conf) with pred_valid strobes. No frame buffers, no DMA — a coordinate pair and confidence flag, updated at the event rate.

Power Envelope

45–60 mW for core logic (simulation-verified, activity-proportional switching). ASIC target: <20 mW in a low-power process node.

Determinism & Auditability

No learned weights. No runtime adaptation. Every output is traceable through the 6-stage pipeline — a requirement for aviation and safety-critical certification pathways.

6 · Competitive Landscape

Comparative
context

The following comparison covers current event-camera and motion-prediction approaches. System-level latency figures are used for all comparison rows; LIBELLULA figures are from simulation. The forward prediction column reflects the α–β predictor’s output horizon, which is selectable via Δt and remains a roadmap item beyond the current validated core.

SolutionLatencyForecastPower
DJI / Skydio commercial stackFrame CNN on ARM + GPU 20–40 msReactive only3–8 W
ETH-UZH event-camera avoidanceScience Robotics, 2020 3.5 ms0 ms (reactive)~10 W
FPGA event-vision acceleratorBonazzi et al., arXiv 2024 ~2 ms0 ms3–5 W
LIBELLULA — core logic (simulation)Synthesizable Verilog-2001 · 200 MHz simulation clock 25 ns (5 cycles · spec ≤ 0.8 µs) 2–30 ms aheadΔt selectable — roadmap 45–60 mW<20 mW ASIC target

7 · Roadmap

Development
roadmap

FPGA hardware loop-in with a physical DVS front-end (Prophesee EVK4 or iniVation DAVIS346) — sensor-in-the-loop testing and timing characterization on real event streams.

Timing characterization on silicon — validation of 25 ns core latency (5 cycles @ 200 MHz) and 45–60 mW power figures against physical implementation.

Prediction horizon extension toward 2–30 ms under power caps, via tuning of the α–β predictor and burst gate parameters.

ASIC tape-out in a low-power process node — targeting <20 mW for field deployment.

Forward Development: Predictive Mesh Lattice (PML)

A planned module — the Predictive Mesh Lattice — adds short-horizon anticipatory scan steering for improved robustness under vibration, occlusion, and sensor noise. Inspired by the dragonfly's capacity to reacquire a target after brief occlusion. PML is intentionally excluded from the current Core: tuning is application-specific and co-development with an integration partner is the intended path.

8 · Evaluator Package

Evaluator
package

Designed to answer four immediate questions for a qualified engineering team:

Does the RTL simulate coherently?

26-bench simulation suite including core, power, AXI integration, and hostile / failure-mode conditions. All passing. Representative outputs and logs included.

Does it synthesize in a mainstream FPGA flow?

Simulation logs for all 37 core benches and 111 AXI assertions. All pass under Icarus Verilog 12.0.

Does it place and route cleanly?

Lint report from Verilator 5.038 — zero warnings. FPGA synthesis has not been run in this release.

Is there an evaluator-friendly interface?

Both bare core (libellula_top) and AXI-integrated evaluation shell (libellula_axi_eval_top) are available as evaluation surfaces.

Package contents

RTL source manifest & full synthesizable Verilog-2001 source
Testbench manifest & simulation infrastructure (26 benches)
Verification log
Simulation logs — all 37 core benches and 111 AXI assertions
Implementation reports — bare core and AXI shell
Route status reports & hierarchical utilization reports
sim/Makefile targets for full reproducible verification
SHA-256 checksums and toolchain manifest for release integrity
SHA-256 hashes and packaging manifest

Open Repository

Repository
& contact

RTL source, testbenches, simulation Makefile, and implementation reports are available on GitHub. Technical engagement, including access to the full evaluator package, can be arranged directly.

github.com/vertov/LIBELLULA
Oliver Hockenhull, oliver.hockenhull@gmail.com · Independent Researcher · Sooke, BC, Canada