http://www.billhowell.ca/Neural nets/Paper reviews/160719 Journal paper peer review - mathematics only.ndf
11Jul2016 start, 19Jul2016 - very incomplete, initial stage, stuck in symbols & nomenclature
# INSTRUCTIONS :
# View this file in a text editor, constant-width font (eg Courier 10), word-wrap turned OFF so that equation terms "line up"
# NOTE : Unfortunately, this attempt at a step-by-step re-derivation of the authors' work only got to the initial definition stage! I mostly worked on the symbols and nomenclature, and while I read through the entire paper very closely, I did not progress in the time available to confirming significant portions of the paper.
# CAUTION
# I have used time-step indexing (of samples) nomenclature O@j, rather than the authors' O^j format.
#***********
# Descriptors of [data, equations]
# quick list
sym_add "dat 'k' 'k = 1,2,... iteration index (j is the sample index) '
[a,t] is the interval of f(t)
E_output = error of the single output neuron
E_hidden@[n] = error of the weight vector connecting hidden neurons (n) to single output neuron (1)
E_input@[p] = error of the weight vector connecting inputs (p) to hidden neurons (n)
E_output
E_hidden@i
i = 1,2,...,n hidden layer neuron index '
J = number of training samples
j = 1,2,...,J training sample index '
K = constant transfer function assumed for Eqns (04) and (05)
k = 1,2,..., iteration index for full-batch or individual sample introduction; the weights are adjusted until a stopping criterion is met (eg [max iterations, min error])
m = 1,2,...,p input component index (p = number of inputs)
n = smallest integer that is > α Eqn (01)
n = number of hidden neurons, n ∈ N,
O = ideal output of lone output neuron
p = number of inputs
t = time, t ∈ R
u@[k,n*1] = weight vector connecting hidden neurons (n) to single output neuron (1)
v@[k,p*n] = weight vector connecting inputs (p) to hidden neurons (n)
x = input vector to NN, x@j ∈ R^p ?????????????
'y = final actual output for any fixed weights w '
'w@0 is the initial weight vector '
'α > 0, n-1 < α < n, but for (05) & (06) α ∈ [0,1] '
'η > 0 is the learning rate, constant and same for all weights '
'Γ(α) = ∫{dt, 0 to ∞ : t^(α - 1)*e^(-t)} is the gamma function '
ζ@j is the jth input value of the output layer ??what does this mean?? what are indexes j and k?
Note that O is the ideal output, ζ is NN estimate
# Reset lists ...
list_sym_dats := '' ; list_desc_dats := '' ;
list_sym_eqns := '' ; list_desc_eqns := '' ;
list_authr_eqns := '' ; list_QNial_eqns := '' ;
sym_add "eqn '(01)' 'Grunwald-Letnikov (GL) fractional order derivative of order α of function f(t) with respect to t'
(post
'd_GL[dt, α, a : f(t)] '
'= lim[h->0, n*h=(t-a) : h^(-α)*sum[r=0 to n : (-1)^r*choose(α r)*f(t - r*h) ] ] '
'= sum[k=0 to m : dT[dt, k order deriv : f(a)]*(t - a)^(-α + k) / Γ(-α + k + 1)] '
' + 1/Γ(-α + m + 1) * ∫{dτ, a to t : (t - τ)^(m - α)*dT[dτ, (m+1) order deriv : f(τ)] } '
'where : '
'the GL fractional order derivative can be derived from the RL derivative '
'should "m" be replaced by "n"?'
)
'QNial operator not yet defined '
;
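As a numeric sanity check on Eqn (01), the truncated GL sum can be coded directly. This is my own sketch (Python rather than QNial, NOT the authors' code); gl_deriv and the test function f(t) = t are illustrative choices, checked against the known result D^α[t] = t^(1-α)/Γ(2-α).

```python
import math

def gl_deriv(f, t, alpha, a=0.0, h=1e-3):
    """Truncated Grunwald-Letnikov sum of Eqn (01):
    h^(-alpha) * sum[r=0 to n : (-1)^r * choose(alpha, r) * f(t - r*h)],
    with n*h = t - a."""
    n = int(round((t - a) / h))
    acc = 0.0
    w = 1.0  # w_r = (-1)^r * choose(alpha, r), via recurrence w_r = w_(r-1)*(r-1-alpha)/r
    for r in range(n + 1):
        acc += w * f(t - r * h)
        w *= (r - alpha) / (r + 1)
    return acc / h**alpha

# check against the known result D^alpha[t] = t^(1-alpha)/Gamma(2-alpha), a = 0
alpha, t = 0.5, 1.0
approx = gl_deriv(lambda s: s, t, alpha)
exact = t**(1 - alpha) / math.gamma(2 - alpha)
```

The recurrence on w avoids computing generalized binomial coefficients from gamma ratios, which overflow for large r.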
sym_add "eqn '(1a)' 'Gamma function'
'Γ(α) = ∫{dt, 0 to ∞ : t^(α - 1)*e^(-t)}'
'QNial operator not yet defined '
;
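Eqn (1a) can likewise be checked by quadrature; a sketch (stdlib only, function names mine) comparing a simple Riemann sum of the integrand against math.gamma:

```python
import math

def gamma_integral(alpha, upper=50.0, steps=200_000):
    """Riemann sum for Gamma(alpha) = integral{dt, 0 to inf : t^(alpha-1)*e^(-t)},
    truncated at `upper` (the tail beyond is negligible)."""
    h = upper / steps
    total = 0.0
    for i in range(1, steps):  # skip t=0 (integrand -> 0 there for alpha > 1)
        t = i * h
        total += t**(alpha - 1) * math.exp(-t)
    return total * h

approx = gamma_integral(1.5)
exact = math.gamma(1.5)  # = sqrt(pi)/2
```

For α ≤ 1 the integrand is singular at t = 0 and would need a substitution (eg t = u^2); α = 1.5 avoids that complication.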
sym_add "eqn '(02)' 'Riemann-Liouville (RL) fractional order derivative of order α of function f(t) with respect to t'
(post
'd_RL[dt, α, a : f(t)] '
'= dT[dt, n order deriv : d_RL[dt, -(n - α), a : f(t)] ] '
'= dT[dt, n order deriv : 1/Γ(n - α) * ∫{dτ, a to t : (t - τ)^(n - α - 1)*f(τ) } ] '
)
'QNial operator not yet defined '
;
sym_add "eqn '(03)' 'Caputo (CP) fractional order derivative of order α of function f(t) with respect to t '
(post
'd_CP[dt, α, a : f(t)] '
'= 1/Γ(n - α) * ∫{dτ, a to t : (t - τ)^(n - α - 1)*dT[dτ, n order deriv : f(τ) ] } '
)
'QNial operator not yet defined '
;
sym_add "eqn '(04)' 'α ∈ [0,1] Caputo (CP) fractional order derivative of order α of function f(t) with respect to t '
(post
'd_CP_base[dt, α ∈ [0,1], a : f(t)] '
'= 1/Γ(1 - α) * ∫{dτ, a to t : (t - τ)^(-α)*dT[dτ : f(τ) ] } '
'where : '
'α ∈ [0,1], [a,t] is the interval of f(t) '
)
'QNial operator not yet defined '
;
sym_add "eqn '(05)' '[α ∈ [0,1], a = 0, f(t) = constant K] case for Riemann-Liouville (RL) fractional order derivative of order α of function f(t) with respect to t '
(post
'd_RL_baseLow[dt, α ∈ [0,1], a = 0 : f(t) = K] = K/Γ(1 - α) * t^(-α) '
'where : α ∈ [0,1], [a,t] is the interval of f(t) '
)
'QNial operator not yet defined '
;
# derivation of (05)
d_RL_baseLow[dt, α ∈ [0,1] so n=1, a = 0 : f(t) = K]
= dT[dt : 1/Γ(1 - α) * ∫{dτ, 0 to t : (t - τ)^(-α)*K } ] NB : keep the outer dT[dt :] from the RL definition (02)!
= dT[dt : K/Γ(1 - α) * [ -(t - τ)^(1 - α)/(1 - α) from τ = 0 to t ] ]
= dT[dt : K/Γ(1 - α) * t^(1 - α)/(1 - α) ]
= K/Γ(1 - α) * t^(-α)
This IS the same as authors' Eqn (05) : unlike the Caputo case of Eqn (06), the RL derivative of a constant is NOT zero.
sym_add "eqn '(06)' '[α ∈ [0,1], a = 0, f(t) = constant K] case for Caputo (CP) fractional order derivative of order α of function f(t) with respect to t '
(post
'd_CP_baseLow[dt, α ∈ [0,1], a = 0 : f(t)] '
'= 1/Γ(1 - α) * ∫{dτ, 0 to t : (t - τ)^(-α)*dT[dτ : K ] } = 0 '
'where : α ∈ [0,1], [a,t] is the interval of f(t) '
)
'QNial operator not yet defined '
;
# derivation of (06)
d_CP_baseLow[dt, α ∈ [0,1] so n=1, a = 0 : f(t)]
= 1/Γ(n - α) * ∫{dτ, a to t : (t - τ)^(n - α - 1)*dT[dτ, n order deriv : f(τ) ] }
= 1/Γ(1 - α) * ∫{dτ, 0 to t : (t - τ)^(-α)*dT[dτ : K ] }
= 1/Γ(1 - α) * ∫{dτ, 0 to t : (t - τ)^(-α)*0 }
= 0
OK - same as authors
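Both constant-function results can be confirmed numerically. The sketch below (mine, not from the paper) uses the GL sum of Eqn (01) as a stand-in for the RL derivative — they agree for these cases — to reproduce the Eqn (05) result K*t^(-α)/Γ(1-α); the Caputo result (06) is identically 0 because dT[dτ : K] = 0, so no numerics are needed there.

```python
import math

def gl_deriv(f, t, alpha, a=0.0, h=1e-3):
    """Truncated Grunwald-Letnikov sum (Eqn (01)); converges to the
    RL derivative for these test functions."""
    n = int(round((t - a) / h))
    acc, w = 0.0, 1.0  # w_r = (-1)^r * choose(alpha, r)
    for r in range(n + 1):
        acc += w * f(t - r * h)
        w *= (r - alpha) / (r + 1)
    return acc / h**alpha

K, alpha, t = 2.0, 0.5, 1.0
rl_const = gl_deriv(lambda s: K, t, alpha)          # RL derivative of f = K
rl_exact = K * t**(-alpha) / math.gamma(1 - alpha)  # Eqn (05): NONzero
# Caputo (Eqn (06)) integrates dT[dtau : K] = 0, so it is identically 0.
```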
sym_add "eqn '(07)' 'G(z) - overall activation function vector result '
(post
'G(z) = (g(z1), g(z2),g(z3),...,g(zn)) '
'where : for all z ∈ R^n '
'Here ALL neurons have the SAME activation function - was this the intent? (check later derivations) '
)
'QNial operator not yet defined '
;
sym_add "eqn '(08)' 'y = final actual output for any fixed weights w '
(post
'y = f(u G(V*x@j)) '
'where : x@j ∈ R^p '
)
'QNial operator not yet defined '
;
sym_add "eqn '(9a)' 'f_j(t) = output square error function '
'f_j(t) = [O@j - f (u G(V*x@j))]^2 '
'QNial operator not yet defined '
;
sym_add "eqn '(09)' 'E(w) = error of the NN output '
(post
'E_output '
'= 1/2*sum[j = 1 to J : (O@j - f (u G(V*x@j)) )^2 ] '
'= 1/2*sum[j = 1 to J : f_j ] '
'where : '
'f_j(t) = [O@j - f (u G(V*x@j))]^2 '
'Note that the order of input samples doesn't matter in this batch training approach '
)
'QNial operator not yet defined '
;
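Eqns (07)-(09) pin down the forward pass and batch error; a minimal sketch, assuming sigmoid activations for both f and g (the paper leaves them generic; all names and toy values here are mine):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(u, V, x, f=sigmoid, g=sigmoid):
    """y = f(u G(V*x)) - Eqn (08). V is n rows of p weights,
    u has n entries, x has p entries."""
    hidden = [g(sum(V[i][m] * x[m] for m in range(len(x)))) for i in range(len(u))]
    return f(sum(u[i] * hidden[i] for i in range(len(u))))

def batch_error(u, V, X, O):
    """E(w) = 1/2*sum[j : (O@j - y@j)^2] - Eqn (09); sample order is irrelevant."""
    return 0.5 * sum((O_j - forward(u, V, x_j))**2 for x_j, O_j in zip(X, O))

# toy batch: p = 2 inputs, n = 3 hidden neurons, J = 2 samples
u = [0.5, -0.3, 0.8]
V = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.6]]
X = [[1.0, 0.0], [0.0, 1.0]]
O = [1.0, 0.0]
E = batch_error(u, V, X, O)
```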
sym_add "eqn '(10)' 'hidden-layer to single output weight updates '
(post
'u@i = u@i - η*E_hidden@i for neuron i in hidden layer '
'u = u - η*E_hidden vector expression for all neurons in hidden layer '
'I have replaced E@[u@[k,i], w] with E_hidden@i to make it easier to understand '
'iteration index k is implicit in my notation above, as this is a simple update, and no memory is required of past iterations. '
)
'QNial operator not yet defined '
;
sym_add "eqn '(11)' 'input to hidden-layer weight updates '
(post
'v@[i,m] = v@[i,m] - η*E_input@[i,m] for the weight connecting input m to hidden neuron i '
'v = v - η*E_input matrix expression for all input to hidden-layer weights '
'I have replaced E@[v@[k,i,m], w] with E_input@[i,m] to make it easier to understand '
'iteration index k is implicit in my notation above, as this is a simple update, and no memory is required of past iterations. '
)
'QNial operator not yet defined '
;
sym_add "eqn '(12)' 'BP error estimate for hidden neuron outputs to (single) output neuron '
(post
'E_hidden@i = sum{j = 1 to J : dp[d(u G(V*x@j)) : f(u G(V*x@j))] * g(v@i x@j) } individual neuron-based '
'E_hidden = dp[d(u G(V*x )) : f(u G(V*x ))] * G(V*x ) ???vector-based, is my interpretation correct??? '
'del_hidden = u_T*E_output*dp[d(u G(V*x )) : f(u G(V*x ))] based on generic expressions from other references (eg Zurada 1992) '
'u_del =
'Authors'' term E(w,k,ui) is equivalent to the normal (old?) convention of del@[l,k]*y@[l,k], '
' where l is layer, k is neuron index in layer, y is neuron output, del is BP error for that neuron in that layer '
'iteration index k is implicit in my notations for (10) & (11), so it is not used here '
'This uses a total sample series (batch) update. '
)
'QNial operator not yet defined '
;
# Howell's description of a Multi-Layer Perceptron (MLP) - i.e. feed-forward (FF), "nice" layers
Note that MLPs are a constrained, restricted form of FF network.
General FF-NNs must use Paul Werbos's "ordered derivatives", as normal BP is inadequate.
Note that the entire network is described by lists of lists , NOT by matrices, although components (eg net) can be matrices!
Note that the number of symbols is drastically reduced with this [nomenclature, symbolism]!
Whole MLP and each layer therein are "composed of" :
L = number of layers in MLP,
l = layer index, l ∈ 1,2,3,...,L
N@l = number of neurons in layer l
n@l = index of a specific neuron in layer l (n is a list of lists for whole MLP), n ∈ 1,2,3,...,N@l
F = a list of lists of activation functions, for whole MLP
f@(n@l) = activation function for neuron n@l, allowing for a different function for each neuron in the MLP
However, usually [all f are same, or f may be the same within each layer]
W = a list of matrices for the whole MLP
w@l = weight matrix between neuron layers l-1 and l, note dimensions of w@l = [N@(l-1),N@l]
Y = list of neuron output vectors for all layers of MLP
Y@l = vector of outputs of all neurons in layer l
y@(n@l) = output of neuron n@l = f@(n@l)(net@(n@l))
NET = list of neuron activation vectors for all layers of MLP
net@l = activation of layer l neurons = (transpose w@l)*Y@[l-1]
DEL = list of BP error vectors for all layers of MLP
del@l = vector of BP error for each neuron in layer l
del@(n@l) = BP error for neuron n in layer l (n <=N@l)
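The list-of-matrices nomenclature above can be rendered almost verbatim in code; a sketch of Y@l = F@l(net@l) using the w@l dims [N@(l-1),N@l] given above (toy sizes and all-sigmoid F are my choices):

```python
import math

def mlp_forward(W, F, x):
    """Y@l = F@l(net@l), net@l = transpose(w@l)*Y@(l-1).
    W is a list of weight matrices w@l with dims [N@(l-1), N@l];
    F is a matching list of per-layer activation functions."""
    Y = [x]      # Y@0 = input vector
    NET = []
    for w, f in zip(W, F):
        net = [sum(w[m][nn] * Y[-1][m] for m in range(len(w)))
               for nn in range(len(w[0]))]
        NET.append(net)
        Y.append([f(z) for z in net])
    return Y, NET

sig = lambda z: 1.0 / (1.0 + math.exp(-z))
# 2 inputs -> 3 hidden -> 1 output, all-sigmoid
W = [[[0.1, 0.4, -0.5], [-0.2, 0.3, 0.6]],   # w@1 : dims [2,3]
     [[0.5], [-0.3], [0.8]]]                 # w@2 : dims [3,1]
F = [sig, sig]
Y, NET = mlp_forward(W, F, [1.0, 0.0])
```

Note the whole network is a list of lists, as stated above; only the per-layer pieces are matrices.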
# check on (12) using Zurada 1992 p187 - my quick interpretation :
del@[l] = (transpose w@[l+1])*del@[l+1] prod_Hadamard dp[d(net@l) : f(net@l)]
del@[(l+1)@n]*(transpose y@[l@n] =
= u_T*E_output*dp[d(u G(V*x )) : f(u G(V*x ))]
Using authors' symbol E :
E_hidden = del@[l = hidden_layer,nth neuron in layer]*y@[hidden_layer,n]
delta(W) = nu*del_output*y_t
del_input =
delta(V) = nu*del_hidden*z_T
E_hidden
E_input =
This differs a lot from the authors' expression!
The authors' E_hidden and E_input are equivalent to Zurada's del_[o,y]*[y_T,z_T]
# check on (12) using Michael Nielsen, http://neuralnetworksanddeeplearning.com/chap2.html :
dp[dz@[L,j] : sigma(z@[L,j]) ] = dp[dz@[L,j] : a@[L,j] ]
where z@[l,j] = sum{k = 1 to n_neurons in layer : w@[l,j,k]*a@[l-1,k] + b@[l,j] }
In notation of this paper :
for single output neuron layer, sample j, learning iteration k, and NO bias neurons (?) :
z_for_output = weighted sum of inputs + bias to the single output neuron
= sum{k = 1 to n_ : u@[j,k]*a_hidden@[l-1,k] + b@[l,j] }
del@l = (transpose(w@[l+1])*del@[l+1]) prod_Hadamard dp[d(z@l) : sigma(z@l) ]
for hidden layer
z_hidden@[1,...,n] = weighted sum of inputs + bias to the hidden layer neurons
z@[l,j] = sum{k = 1 to n_neurons in layer : w@[l,j,k]*a@[l-1,k] + b@[l,j] } generic
dp[dO@j : f(O@j) ] = dp[dO@j : O@j ]
dp[dz@[L,j] : sigma(z@[L,j]) ] = dp[dz@[L,j] : a@[L,j] ]
dp[dz@[L,j] : sigma(z@[L,j]) ] = dp[dz@[L,j] : a@[L,j] ] for input layer
sym_add "eqn '(13)' 'BP error estimate for input neuron outputs to hidden neurons '
(post
'E_input@[i,m] = sum{j = 1 to J : dT[dt : f(u G(V*x@j))] *u@i* dT[dt : g(v@[i] x@[k,j])] *x@[k,m] } '
'iteration index k is implicit in my notations for (10) & (11), so it is not used here '
)
'QNial operator not yet defined '
;
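One way to validate a reading of the Eqn (12)/(13) gradients is against finite differences of E(w). A sketch for the E_hidden@i (hidden-to-output) case, assuming sigmoid f and g (my code; the same pattern extends to E_input@[i,m]):

```python
import math

sig = lambda z: 1.0 / (1.0 + math.exp(-z))
dsig = lambda z: sig(z) * (1.0 - sig(z))

def y_out(u, V, x):
    """Return (y, hidden outputs) for y = f(u G(V*x)) - Eqn (08)."""
    h = [sig(sum(V[i][m] * x[m] for m in range(len(x)))) for i in range(len(u))]
    return sig(sum(u[i] * h[i] for i in range(len(u)))), h

def E(u, V, X, O):
    """Batch error of Eqn (09)."""
    return 0.5 * sum((O_j - y_out(u, V, x)[0])**2 for x, O_j in zip(X, O))

def grad_u(u, V, X, O):
    """E_hidden@i = sum[j : -(O@j - y@j) * f'(u G(V*x@j)) * g(v@i x@j)] - Eqn (12)."""
    g = [0.0] * len(u)
    for x, O_j in zip(X, O):
        y, h = y_out(u, V, x)
        z = sum(u[i] * h[i] for i in range(len(u)))
        for i in range(len(u)):
            g[i] += -(O_j - y) * dsig(z) * h[i]
    return g

u = [0.5, -0.3, 0.8]
V = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.6]]
X = [[1.0, 0.0], [0.0, 1.0]]
O = [1.0, 0.0]
analytic = grad_u(u, V, X, O)
eps = 1e-6
numeric = []
for i in range(len(u)):
    up = u[:]; up[i] += eps
    dn = u[:]; dn[i] -= eps
    numeric.append((E(up, V, X, O) - E(dn, V, X, O)) / (2 * eps))
```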
sym_add "eqn '(14a)' 'ζ@j = the input value of the output-layer neuron for sample j '
(post
'ζ@j = sum{i = 1 to n : u@i*g(v@[i] x@j) } = u G(V*x@j) '
'ζ = u G(V*x) '
'where : ??what does the index j mean?? - it''s the sample # '
)
'QNial operator not yet defined '
;
sym_add "eqn '(14)' 'E_output = error of the Caputo NN output, same as Eqn (9) '
(post
'E_output '
'= 1/2*sum[j = 1 to J : (O@j - f (u G(V*x@j)) )^2 ] '
'= 1/2*sum[j = 1 to J : f_j ] '
'where : '
'f_j(t) = [O@j - f (u G(V*x@j))]^2 '
'Eqn (14) is the same as Eqn (9) !! '
)
'QNial operator not yet defined '
;
sym_add "eqn '(15)' 'hidden-layer to single output weight updates, Caputo '
(post
'u@i = u@i - η*d_CP[d(u@i), α, a : E_output ] for connection from neuron i in hidden layer to output neuron '
'u = u - η*d_CP[du , α, a : E_output ] vector expression for all connections from hidden layer to output '
'Eqn (15) is the same as (10) for integer-order, except one takes the Caputo derivative of E_output '
'I have replaced E@[u@[k,i], w] with E_hidden@[k,i] to make it easier to understand '
)
'QNial operator not yet defined '
;
sym_add "eqn '(16)' 'input to hidden-layer weight updates, Caputo '
(post
'v@[i,m] = v@[i,m] - η*d_CP[d(v@[i,m]), α, a : E_output ] for the weight connecting input m to hidden neuron i '
'v = v - η*d_CP[dv , α, a : E_output ] vector expression for all connections from input to hidden layer '
'Eqn (16) is the same as (11) for integer-order, except one takes the Caputo derivative of E_output '
'I have replaced E@[v@[k,i,m], w] with E_input@[k,i] to make it easier to understand '
)
'QNial operator not yet defined '
;
sym_add "eqn '(17)' 'Caputo derivative as product of [partial integer-order & fractional order] differentials '
(post
'd_CP[dt, α, a : h(s(t)) ] = dp[ds : h(s)] * d_CP[dt, α, a : s(t) ] chain rule form '
'Why the dot product? '
)
'QNial operator not yet defined '
;
sym_add "eqn '(18)' 'd_CP[d(u@i), α, c : E_hidden ] - Caputo derivative of hidden layer to output weights '
(post
'd_CP[d(u@i), α, c : E_hidden ] = dp[dζ@j : E_hidden] * d_CP[d(u@i), α, c : ζ@j ] '
'Why the dot product? '
)
'QNial operator not yet defined '
;
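Eqns (15)-(18) need a usable expression for d_CP[dw, α, c : E(w)]. A common approximation in the fractional-gradient-descent literature (an assumption on my part — I could not confirm the authors use exactly this) follows from the Eqn (17) chain-rule form plus the Caputo derivative of a power: d_CP[dw, α, c : E] ≈ dp[dw : E] * (w - c)^(1-α)/Γ(2-α), exact when dp[dw : E] is constant on [c,w]. A 1-D sketch on E(w) = (w - 3)^2:

```python
import math

def caputo_grad_step(w, dE, alpha, c, eta):
    """One fractional update w <- w - eta*d_CP[dw, alpha, c : E], using the
    approximation d_CP ~ E'(w) * |w - c|^(1 - alpha) / Gamma(2 - alpha)
    (an assumed closed form, NOT taken from the paper)."""
    frac = abs(w - c)**(1.0 - alpha) / math.gamma(2.0 - alpha)
    return w - eta * dE(w) * frac

E  = lambda w: (w - 3.0)**2   # toy error surface, minimum at w = 3
dE = lambda w: 2.0 * (w - 3.0)

w, alpha, eta = 0.0, 0.9, 0.05
E0 = E(w)
for _ in range(200):
    w = caputo_grad_step(w, dE, alpha, c=w - 1.0, eta=eta)  # moving lower terminal, one common choice
E1 = E(w)
```

With α = 1 the fractional factor reduces to 1 and the integer-order updates (10)/(11) are recovered.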
# enddoc