/home/bill/web/Neural nets/Paper reviews/240211 paper review- math only.txt
http://www.BillHowell.ca/Neural nets/Paper reviews/240211 paper review- math only.txt
www.BillHowell.ca 11Feb2024 initial

view this section in a constant-width font (eg Liberation Mono 10 point), tab = 3 spaces

This file consists mostly of notes taken to force myself to go through the details of the paper.

#24************************24
# Table of Contents, generate with :
# $ grep "^#]" "$d_web"'Neural nets/Paper reviews/240211 paper review- math only.txt' | sed "s|^#\]| |"
#
24************************24
Overall topic

08********08
#] Nomenclature

DVS      dynamic vision sensors
GPW      Gradient Penalty Window
I        = pre-synaptic "current"
I(t)     = synapse current at time t
LIF      Leaky Integrate-and-Fire SNN model
LT-SNN   the authors' novel self-adaptive SNN training algorithm with a Learnable Threshold
SG       surrogate gradient functions
SGP      Separate Gradient Path
SNN      spiking neural network
τ        = time constant
θ        = the Heaviside step function
u(t)     = membrane potential at time t
u_reset  = reset potential after the spike operation
V(th)    = membrane potential threshold for spiking neurons

Existing models for adaptive thresholds :
ParamLIF [16] optimized the membrane time constant throughout training, but requires large-sized models.
DSR [18] proposed threshold-associated spikes with a learnable potential threshold.
   However, DSR's heuristic and deterministic high-precision ratio between the firing range and the potential threshold limits the adaptiveness of the SNN.
??? [19] implemented weight-threshold balancing to improve the adaptability of the SNN to the input data, for an enhanced firing rate in deep SNNs.
   However, putting the weights and the potential threshold on the same landscape makes the learning sub-optimal.

08********08
#] +-----+
#] model : Leaky Integrate-and-Fire (LIF) SNN :

(1)   τ*d[dt: u(t)] = -(u(t) − u_reset) + I(t)
   where u(t)    = membrane potential at time t
         u_reset = reset potential after the spike operation
         I(t)    = synapse current at time t
         τ       = time constant
>> OK, this is the standard LIF definition

(2)   u(t+1) = (1 − dt/τ)*u(t) + I*dt/τ
   where I = pre-synaptic "current"
>> (2) is the forward-Euler discretization of (1), taking u_reset = 0

(3)   S(t) = θ(u(t) − V(th)) = 1 if u(t) >= V(th)
                             = 0 otherwise
   where θ     = the Heaviside step function
         V(th) = membrane potential threshold for spiking neurons

(4)   ∂L/∂W = ∂L/∂S(t) * ∂S(t)/∂u(t) * ∂u(t)/∂I(t) * ∂I(t)/∂W      (chain rule)

"... In this work, we choose the triangle function for gradient approximation: ..."
(5)   ∂S(t)/∂u(t) = θ′(u(t) − V(th)) = max(0, 1 − |u(t) − V(th)|)

"... the gradient of the potential threshold V(th) can be naïvely approximated in a straight-through manner using vanilla surrogate gradient (SG): ..."
(6)   ∂L/∂V(th) = ∂L/∂S(t) * ∂S(t)/∂V(th) = ∂L/∂S(t) * θ′(u(t) − V(th))

Different from the vanilla SG approach, the recent DSR scheme [18] computes the gradient of the potential threshold with the following threshold-based firing procedure:
(7)   S(t) = V(th) × θ(u(t) − α*V(th))                              p4c1h0.86

(8)   ∂S(t)/∂u(t) = θ′(u(t) − V(th)) = max(0, 1 − |u(t) − V(th)|)
>> nice; as the authors said, "simple-yet-effective"

(9)   ∂S(t)/∂V(th) = −θ′(u(t) − V(th)) * σ(u(t) − V(th))
(10)               = −max(0, 1 − |u(t) − V(th)|) * σ(u(t) − V(th))

p4c1h0.86 "... In this work, we choose the Sigmoid function as the gradient penalty window for the potential threshold. ... The choice of Sigmoid function is empirical as it produces the best results among different surrogate functions that we have experimented. ..."

p4c2h0.13 TABLE IV: Comparison of different surrogate functions' performance as SGP for the DVS-CIFAR10 dataset

(11)  σ(u(t) − V(th)) = 1/(1 + exp(−(u(t) − V(th))))
>> the comfortable familiarity of the sigmoid, again reminiscent of Grossberg's concepts :
>> URL to captioned image ???

p4c2h0.31 "... For gradient computation of V(th), we accumulate the gradient computed in Eq. (9) to avoid the dimensionality mismatch: ..."

p4c2h0.36
(12)  ∂L/∂V(th) = ∂L/∂S(t) * ∂S(t)/∂V(th)
                = −∂L/∂S(t) * sum[N,C,H,W: 1{u(t) ≥ V(th)} × θ′(u(t) − V(th)) * σ(u(t) − V(th))]

p4c2h0.46 "... Since the unfired neurons have no contribution to the final loss, the indicator function 1{u(t) ≥ V(th)} filters the gradient with respect to the active neurons in the forward pass. ..."
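08********08
#] sketch : LIF step with triangle surrogate gradient, eqs (2), (3), (5)

To force the details of (2), (3), and (5) to cohere, a minimal PyTorch sketch of my own (NOT the authors' code; the names SpikeTriangle and lif_step, the values of dt and τ, and the hard reset to u_reset = 0 are all my assumptions) : one Euler step of the membrane potential, Heaviside firing in the forward pass, triangle surrogate gradient in the backward pass.

import torch

class SpikeTriangle(torch.autograd.Function):
   # forward : Heaviside firing, eq (3); backward : triangle SG, eq (5)
   @staticmethod
   def forward(ctx, u, v_th):
      ctx.save_for_backward(u - v_th)
      return (u >= v_th).float()                # S(t) = θ(u(t) − V(th))
   @staticmethod
   def backward(ctx, grad_out):
      (x,) = ctx.saved_tensors
      sg = torch.clamp(1.0 - x.abs(), min=0.0)  # max(0, 1 − |u(t) − V(th)|)
      return grad_out * sg, None                # V(th) gradient handled separately

def lif_step(u, I, v_th, tau=2.0, dt=1.0):
   # eq (2) : forward-Euler LIF update, assuming u_reset = 0
   u = (1.0 - dt / tau) * u + I * dt / tau
   s = SpikeTriangle.apply(u, v_th)
   u = u * (1.0 - s)                            # hard reset of fired neurons
   return u, s

u = torch.zeros(4)
I = torch.tensor([0.5, 1.2, 2.5, 0.1])
u, s = lif_step(u, I, v_th=torch.tensor(1.0))
print(s)                                        # tensor([0., 0., 1., 0.])

>> the hard reset u*(1 − s) is my assumption; eqs (1)-(2) only specify the sub-threshold dynamics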
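08********08
#] sketch : accumulated V(th) gradient, eqs (9)-(12)

Same caveat (a sketch of mine, not the authors' code; vth_grad is my name) to check the bookkeeping of (9) through (12) : the triangle SG times the sigmoid gradient penalty window, masked by the fired-neuron indicator and accumulated over [N,C,H,W], so the result is a scalar like V(th). The sign follows (9).

import torch

def vth_grad(grad_S, u, v_th):
   # grad_S = ∂L/∂S(t), same shape [N,C,H,W] as u; v_th is a scalar
   x = u - v_th
   tri = torch.clamp(1.0 - x.abs(), min=0.0)    # θ′ : triangle SG, eqs (5), (8)
   gpw = torch.sigmoid(x)                       # gradient penalty window, eq (11)
   fired = (u >= v_th).float()                  # indicator 1{u(t) ≥ V(th)}
   # eq (12) : accumulate eq (9) over N, C, H, W to match the scalar V(th)
   return -(grad_S * fired * tri * gpw).sum()

grad_S = torch.ones(2, 3, 4, 4)                 # stand-in for ∂L/∂S(t)
u = torch.randn(2, 3, 4, 4)
print(vth_grad(grad_S, u, v_th=0.5))            # a scalar tensor

>> tri is nonzero only for |u(t) − V(th)| < 1, so with the fired mask only neurons with V(th) <= u(t) < V(th) + 1 actually contribute to the V(th) gradient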
24************************24
#] +-----+
#] Questions :

# enddoc