/home/bill/web/Neural nets/Paper reviews/240211 paper review- math only.txt
http://www.BillHowell.ca/Neural nets/Paper reviews/240211 paper review- math only.txt
www.BillHowell.ca 11Feb2024 initial

view this section in a constant-width font (eg Liberation Mono 10 point), tab = 3 spaces

This file consists mostly of notes taken to force myself to go through the details of the paper.

#24************************24
# Table of Contents, generate with :
# $ grep "^#]" "$d_web"'Neural nets/Paper reviews/240211 paper review- math only.txt' | sed "s|^#\]| |"
#
24************************24
Overall topic

08********08
#] Nomenclature

DVS      dynamic vision sensors
GPW      Gradient Penalty Window
I        = pre-synaptic "current"
I(t)     = synapse current at time t
LIF      Leaky Integrate-and-Fire SNN model
LT-SNN   the authors' novel self-adaptive SNN training algorithm with a Learnable Threshold
SG       surrogate gradient functions
SGP      Separate Gradient Path
SNN      spiking neural network
τ        = time constant
θ        = the Heaviside step function
u(t)     = membrane potential at time t
u_reset  = reset potential after the spike operation
V(th)    = membrane potential threshold for spiking neurons

Existing models for adaptive thresholds :
ParamLIF [16] optimized the membrane time constant throughout training, but requires large-sized models.
DSR [18] proposed threshold-associated spikes with a learnable potential threshold.
   However, DSR's heuristic and deterministic high-precision ratio between the firing range and the potential threshold limits the adaptiveness of the SNN.
??? [19] implemented weight-threshold balancing to improve the adaptability of the SNN to the input data, for an enhanced firing rate in deep SNNs.
   However, putting the weights and the potential threshold on the same landscape makes the learning sub-optimal.

08********08
#] +-----+
#] model : Leaky Integrate-and-Fire (LIF) SNN :

(1)   τ*d[dt: u(t)] = -(u(t) − u_reset) + I(t)
   where u(t)    = membrane potential at time t
         u_reset = reset potential after the spike operation
         I(t)    = synapse current at time t
         τ       = time constant
>> OK, this is the standard LIF definition

(2)   u(t+1) = (1 − dt/τ)*u(t) + I*dt/τ
   where I = pre-synaptic "current"
>> (2) is the forward-Euler discretization of (1), taking u_reset = 0

(3)   S(t) = θ(u(t) − V(th)) = 1 if u(t) >= V(th)
                             = 0 otherwise
   where θ     = the Heaviside step function
         V(th) = membrane potential threshold for spiking neurons

(4)   ∂L/∂W = ∂L/∂S(t) * ∂S(t)/∂u(t) * ∂u(t)/∂I(t) * ∂I(t)/∂W      (chain rule)

"... In this work, we choose the triangle function for gradient approximation: ..."
(5)   ∂S(t)/∂u(t) = θ′(u(t) − V(th)) = max(0, 1 − |u(t) − V(th)|)

"... the gradient of the potential threshold V(th) can be naïvely approximated in a straight-through manner using vanilla surrogate gradient (SG): ..."
(6)   ∂L/∂V(th) = ∂L/∂S(t) * ∂S(t)/∂V(th) = ∂L/∂S(t) * θ′(u(t) − V(th))

Different from the vanilla SG approach, the recent DSR scheme [18] computes the gradient of the potential threshold with the following threshold-based firing procedure:
(7)   S(t) = V(th) × θ(u(t) − α*V(th))                              p4c1h0.86

(8)   ∂S(t)/∂u(t) = θ′(u(t) − V(th)) = max(0, 1 − |u(t) − V(th)|)
>> nice; as the authors said, "simple-yet-effective"

(9)   ∂S(t)/∂V(th) = −θ′(u(t) − V(th)) * σ(u(t) − V(th))
(10)               = −max(0, 1 − |u(t) − V(th)|) * σ(u(t) − V(th))

p4c1h0.86 "... In this work, we choose the Sigmoid function as the gradient penalty window for the potential threshold. ... The choice of Sigmoid function is empirical as it produces the best results among different surrogate functions that we have experimented. ..."

p4c2h0.13 TABLE IV: Comparison of different surrogate functions' performance as SGP for the DVS-CIFAR10 dataset

(11)  σ(u(t) − V(th)) = 1/(1 + exp(−(u(t) − V(th))))
>> the comfortable familiarity of the sigmoid, again reminiscent of Grossberg's concepts :
>> URL to captioned image ???

p4c2h0.31 "... For gradient computation of V(th), we accumulate the gradient computed in Eq. (9) to avoid the dimensionality mismatch: ..."

p4c2h0.36
(12)  ∂L/∂V(th) = ∂L/∂S(t) * ∂S(t)/∂V(th)
                = −∂L/∂S(t) * sum[N,C,H,W: 1{u(t) ≥ V(th)} × θ′(u(t) − V(th)) * σ(u(t) − V(th))]

p4c2h0.46 "... Since the unfired neurons have no contribution to the final loss, the indicator function 1{u(t) ≥ V(th)} filters the gradient with respect to the active neurons in the forward pass. ..."
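08********08
#] sketch : LIF step with triangle surrogate gradient, eqs (2), (3), (5)

To force the details of (2), (3), and (5) to cohere, a minimal PyTorch sketch of my own (NOT the authors' code; the names SpikeTriangle and lif_step, the values of dt and τ, and the hard reset to u_reset = 0 are all my assumptions) : one Euler step of the membrane potential, Heaviside firing in the forward pass, triangle surrogate gradient in the backward pass.

import torch

class SpikeTriangle(torch.autograd.Function):
   # forward : Heaviside firing, eq (3); backward : triangle SG, eq (5)
   @staticmethod
   def forward(ctx, u, v_th):
      ctx.save_for_backward(u - v_th)
      return (u >= v_th).float()                # S(t) = θ(u(t) − V(th))
   @staticmethod
   def backward(ctx, grad_out):
      (x,) = ctx.saved_tensors
      sg = torch.clamp(1.0 - x.abs(), min=0.0)  # max(0, 1 − |u(t) − V(th)|)
      return grad_out * sg, None                # V(th) gradient handled separately

def lif_step(u, I, v_th, tau=2.0, dt=1.0):
   # eq (2) : forward-Euler LIF update, assuming u_reset = 0
   u = (1.0 - dt / tau) * u + I * dt / tau
   s = SpikeTriangle.apply(u, v_th)
   u = u * (1.0 - s)                            # hard reset of fired neurons
   return u, s

u = torch.zeros(4)
I = torch.tensor([0.5, 1.2, 2.5, 0.1])
u, s = lif_step(u, I, v_th=torch.tensor(1.0))
print(s)                                        # tensor([0., 0., 1., 0.])

>> the hard reset u*(1 − s) is my assumption; eqs (1)-(2) only specify the sub-threshold dynamics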
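08********08
#] sketch : accumulated V(th) gradient, eqs (9)-(12)

Same caveat (a sketch of mine, not the authors' code; vth_grad is my name) to check the bookkeeping of (9) through (12) : the triangle SG times the sigmoid gradient penalty window, masked by the fired-neuron indicator and accumulated over [N,C,H,W], so the result is a scalar like V(th). The sign follows (9).

import torch

def vth_grad(grad_S, u, v_th):
   # grad_S = ∂L/∂S(t), same shape [N,C,H,W] as u; v_th is a scalar
   x = u - v_th
   tri = torch.clamp(1.0 - x.abs(), min=0.0)    # θ′ : triangle SG, eqs (5), (8)
   gpw = torch.sigmoid(x)                       # gradient penalty window, eq (11)
   fired = (u >= v_th).float()                  # indicator 1{u(t) ≥ V(th)}
   # eq (12) : accumulate eq (9) over N, C, H, W to match the scalar V(th)
   return -(grad_S * fired * tri * gpw).sum()

grad_S = torch.ones(2, 3, 4, 4)                 # stand-in for ∂L/∂S(t)
u = torch.randn(2, 3, 4, 4)
print(vth_grad(grad_S, u, v_th=0.5))            # a scalar tensor

>> tri is nonzero only for |u(t) − V(th)| < 1, so with the fired mask only neurons with V(th) <= u(t) < V(th) + 1 actually contribute to the V(th) gradient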
24************************24
#] +-----+
#] Questions :

# enddoc