Return-Path:
Received: from ti-edu.ch (posta.ti-edu.ch [195.176.176.171])
by mail-cc.srv.lexi.net (8.14.4/8.14.4) with ESMTP id sBUKoZIf059636
for ; Tue, 30 Dec 2014 13:50:39 -0700
Received: from [178.192.1.161] (account juergen@idsia.ch HELO [192.168.1.106])
by ti-edu.ch (CommuniGate Pro SMTP 6.0.9)
with ESMTPSA id 102346678 for Bill@BillHowell.ca; Tue, 30 Dec 2014 21:50:32 +0100
Content-Type: text/plain; charset=windows-1252
Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\))
Subject: Re: IJCNN2015 Deep Learning theme in our publicity
From: Juergen Schmidhuber
In-Reply-To: <54A22B10.4030008@BillHowell.ca>
Date: Tue, 30 Dec 2014 21:50:31 +0100
Message-Id: <32786F8D-1BF8-4E0C-9E62-A6838C1B7B8F@idsia.ch>
References: <54A22B10.4030008@BillHowell.ca>
To: "Bill Howell. Retired from NRCan. now in Alberta Canada"
X-Mailer: Apple Mail (2.1510)
X-Greylist: Sender IP whitelisted by DNSRBL, not delayed by milter-greylist-4.4.1 (mail-cc.srv.lexi.net [198.161.90.1]); Tue, 30 Dec 2014 13:50:40 -0700 (MST)
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by mail-cc.srv.lexi.net id sBUKoZIf059636
Hi Bill,
thanks a lot!
Below please find a few comments / corrections / updates.
Happy New Year!
Juergen
1. A minor thing: My name is spelled Juergen (not Jeurgen).
2. Unfortunately I am not on Facebook. I have posted a few related announcements on G+ though:
https://plus.google.com/u/0/100849856540000067209/posts
3. Yes, the Deep Learning workshop at NIPS had 600 participants. I wasn't there, but some of my postdocs and students were. I am happy, because one big topic was our Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs), now widely used in industry. Some recent (2014) benchmark records achieved with LSTM at big IT companies:
1. Large vocabulary speech recognition (Sak et al., Google, Interspeech 2014)
2. English to French translation (Sutskever et al., Google, NIPS 2014)
3. Text-to-speech synthesis (Fan et al., Microsoft, Interspeech 2014)
4. Prosody contour prediction (Fernandez et al., IBM, Interspeech 2014)
5. Language identification (Gonzalez-Dominguez et al., Google, Interspeech 2014)
6. Image caption generation (Vinyals et al., Google, 2014)
x. Baidu also used our RNNs to improve speech recognition (Hannun et al, 2014).
Original papers on LSTM and its various topologies and learning algorithms since 1995:
http://www.idsia.ch/~juergen/rnn.html
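For readers who have not seen the architecture, here is a minimal sketch of a single step of a vanilla LSTM cell in plain NumPy (variable names are my own; real implementations add batching, peephole connections, and learned initial states):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a vanilla LSTM cell.
    x: input vector; h_prev, c_prev: previous hidden and cell state.
    W: weights of shape (4*n_h, n_x + n_h); b: biases of shape (4*n_h,)."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0 * n:1 * n])   # input gate
    f = sigmoid(z[1 * n:2 * n])   # forget gate
    o = sigmoid(z[2 * n:3 * n])   # output gate
    g = np.tanh(z[3 * n:4 * n])   # candidate cell update
    c = f * c_prev + i * g        # gated additive cell-state update
    h = o * np.tanh(c)            # new hidden state
    return h, c

# usage: run a few steps with random weights, 3-dim input, 4-dim state
rng = np.random.default_rng(0)
n_x, n_h = 3, 4
W = rng.normal(size=(4 * n_h, n_x + n_h))
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
for t in range(5):
    h, c = lstm_step(rng.normal(size=n_x), h, c, W, b)
```

The additive cell-state update (rather than repeated multiplication) is what lets gradients survive over long time lags.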
4. The final version of the survey now has 888 references on 88 pages.
So perhaps one could edit your DRAFT comment on Deep Learning for Facebook etc. as follows:
---
The "Deep Learning" theme has driven a huge increase in interest and excitement around neural networks in recent years. But are you aware of the early origins of "Deep Learning" concepts? An excellent starting point for understanding the underlying concepts and the history of the field is a recent review article by Juergen Schmidhuber:
J. Schmidhuber. Deep Learning in Neural Networks: An Overview. Neural Networks, Volume 61, January 2015, Pages 85-117 (DOI: 10.1016/j.neunet.2014.09.003). Published online in 2014: http://authors.elsevier.com/a/1Q3Bc3BBjKFZVN
This is based on a tech report with 888 references on 88 pages: http://arxiv.org/abs/1404.7828
LATEX source: http://www.idsia.ch/~juergen/DeepLearning8Oct2014.tex
The complete BIBTEX file (888 kB) is also public: http://www.idsia.ch/~juergen/deep.bib
---
5. Here is a link to videos and slides of "deep learning" talks I gave in 2014:
http://www.kurzweilai.net/deep-learning-rnnaissance-with-dr-juergen-schmidhuber
On Dec 30, 2014, at 5:33 AM, "Bill Howell. Retired from NRCan. now in Alberta Canada" wrote:
> Jeurgen - Although "Deep Learning" and "Big Data" have been mentioned in IJCNN2015 Publicity up to now, I am preparing DRAFT material on those two themes for our mass emails and our Facebook site (see the example below). A major reason for this emphasis is the apparently huge attendance at the "Neural Information Processing Systems" (NIPS) conference in Montreal earlier this month. I still don't have solid reasons for the jump, but I suspect that the "Deep Learning" theme was an important attraction among others.
>
>
> [NOTE: I have yet to extract author names for NIPS 2007-2013.]
>
> Given your leadership in the "Deep Learning" area of research, I would certainly like to have any suggestions or content that you feel will help. If there are members of your lab who would like to "run" the postings on Facebook, that would be more than welcome.
>
> Please let me know what you think.
>
>
>
> Mr. Bill Howell
> 1-587-707-2027 Bill@BillHowell.ca www.BillHowell.ca
> P.O. Box 299, Hussar, Alberta, T0J1S0
> Retired: Science Research Manager (SE-REM-01) of Natural Resources Canada, CanmetMINING, Ottawa
> IJCNN2015 Killarney Ireland, Publicity co-Chair www.ijcnn.org
> INNS BigData2015 San Francisco, Publicity Co-Chair http://innsbigdata.org
> IJCNN2014 Beijing, Technical Program Committee, http://www.ieee-wcci2014.org
> INNS BOG member, Secretary www.inns.org
>
>
> ****************************
> DRAFT comment on Deep Learning for Facebook
> (this is based on your Connectionist posting of 02May2014)
>
> The "Deep Learning" theme has driven a huge increase of interest and excitement in recent years for neural networks. But are you aware of the early origins of "Deep Learning" concepts? An excellent starting point for understanding the underlying concepts and history of the field is provided by a recent review article by Juergen Schmidhuber :
> • A revised version (now with 750+ references) is here: http://arxiv.org/abs/1404.7828
> • PDF with better formatting (for A4 paper): http://www.idsia.ch/~juergen/DeepLearning30April2014.pdf
> • LATEX source: http://www.idsia.ch/~juergen/DeepLearning30April2014.tex
> • The complete BIBTEX file is also public: http://www.idsia.ch/~juergen/bib.bib
> A paper will also appear soon in the journal Neural Networks.
>
> But : Who created the first Deep Learning networks?
>
> The earliest work found by Jeurgen was done by Olexiy Hryhorovych (Alexey Grigoryevich) Ivakhnenko and colleagues in 1965. Here a brief summary:
> Networks trained by the Group Method of Data Handling (GMDH) (Ivakhnenko and Lapa, 1965; Ivakhnenko et al., 1967; Ivakhnenko, 1968, 1971) were perhaps the first Deep Learning systems of the Feedforward Multilayer Perceptron type. The units of GMDH nets may have polynomial activation functions implementing Kolmogorov-Gabor polynomials (more general than other widely used neural network activation functions). Given a training set, layers are incrementally grown and trained by regression analysis (e.g., Legendre, 1805; Gauss, 1809, 1821), then pruned with the help of a separate validation set (using today’s terminology), where Decision Regularisation is used to weed out superfluous units. The numbers of layers and units per layer can be learned in problem-dependent fashion. To my knowledge, this was the first example of hierarchical representation learning in NNs. A paper of 1971 already described a deep GMDH network with 8 layers (Ivakhnenko, 1971). There have been numerous applications of GMDH-style nets, e.g. (Ikeda et al., 1976; Farlow, 1984; Madala and Ivakhnenko, 1994; Ivakhnenko, 1995; Kondo, 1998; Kordik et al., 2003; Witczak et al., 2006; Kondo and Ueno, 2008) …
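[Editor's note: the GMDH procedure described above — pairwise polynomial units fit by regression on a training set, then pruned on a separate validation set — can be sketched as follows. This is a minimal illustration under simplifying assumptions (quadratic units on feature pairs, fixed depth), not Ivakhnenko's exact algorithm.]

```python
import numpy as np
from itertools import combinations

def quad_features(a, b):
    # Kolmogorov-Gabor quadratic in two variables
    return np.column_stack([np.ones_like(a), a, b, a * b, a * a, b * b])

def gmdh_fit(Xtr, ytr, Xva, yva, n_layers=2, keep=3):
    """Grow layers of pairwise quadratic units fit by least squares on the
    training set; keep only the `keep` units with lowest error on a
    separate validation set (the pruning / regularisation step)."""
    tr, va = Xtr, Xva
    best_err = np.inf
    for _ in range(n_layers):
        cands = []
        for i, j in combinations(range(tr.shape[1]), 2):
            coef, *_ = np.linalg.lstsq(
                quad_features(tr[:, i], tr[:, j]), ytr, rcond=None)
            err = np.mean((quad_features(va[:, i], va[:, j]) @ coef - yva) ** 2)
            cands.append((err, i, j, coef))
        cands = sorted(cands, key=lambda c: c[0])[:keep]  # prune by val error
        best_err = min(best_err, cands[0][0])
        tr = np.column_stack(
            [quad_features(tr[:, i], tr[:, j]) @ c for _, i, j, c in cands])
        va = np.column_stack(
            [quad_features(va[:, i], va[:, j]) @ c for _, i, j, c in cands])
    return best_err

# toy target y = x0*x1 + x2, well within reach of two quadratic layers
rng = np.random.default_rng(1)
Xtr, Xva = rng.normal(size=(200, 3)), rng.normal(size=(100, 3))
ytr = Xtr[:, 0] * Xtr[:, 1] + Xtr[:, 2]
yva = Xva[:, 0] * Xva[:, 1] + Xva[:, 2]
err = gmdh_fit(Xtr, ytr, Xva, yva)
```

Note how the number of layers and surviving units is decided by validation error rather than fixed in advance — the "hierarchical representation learning" aspect the passage above refers to.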
> Precise references and more history in:
> Deep Learning in Neural Networks: An Overview
> PDF & LATEX source & complete public BIBTEX file under
> http://www.idsia.ch/~juergen/deep-learning-overview.html
>
> Juergen Schmidhuber
>
> ****************************
> DRAFT comment on Deep Learning for the next mass email
> [NOTE: As I do not have the lists of accepted [Competitions, Tutorials, Workshops], I have to wait for feedback from the Chairs to finalise details below!]
>
> The "Deep Learning" theme has driven a huge increase of interest and excitement in recent years for neural networks. IJCNN2015 will have ?? a special session, tutorials, and workshops?? focussing on the Deep Learning theme, in addition to individual paper submissions on the topic. Links are provided below :
> • SS05 Nature-Inspired Deep Learning Chair : Andries Engelbrecht
> Of course, we will have to wait for paper acceptance in early-to-mid March for a full list of Deep Learning papers at IJCNN2015.
>
>
> ************************************
> Schmidhuber connectionist postings :
> one of many sources of some good ideas to stimulate [questions, discussions] on IJCNN2015-Facebook?
>
> -------- Forwarded Message --------
> Subject: Re: Connectionists: Deep Learning Overview Draft
> Date: Fri, 2 May 2014 16:13:50 +0200
> From: Juergen Schmidhuber
> To: connectionists@cs.cmu.edu
>
>
> Dear Connectionists,
>
> thanks a lot for numerous helpful comments!
> This was great fun; I learned a lot.
>
> A revised version (now with 750+ references) is here: http://arxiv.org/abs/1404.7828
> PDF with better formatting (for A4 paper): http://www.idsia.ch/~juergen/DeepLearning30April2014.pdf
> LATEX source: http://www.idsia.ch/~juergen/DeepLearning30April2014.tex
> The complete BIBTEX file is also public: http://www.idsia.ch/~juergen/bib.bib
>
> Don't hesitate to send additional suggestions to juergen@idsia.ch (NOT to the entire list).
>
> Kind regards,
>
> Juergen Schmidhuber
> http://www.idsia.ch/~juergen/deeplearning.html
>
>
> Revised Table of Contents
>
> 1 Introduction to Deep Learning (DL) in Neural Networks (NNs)
> 2 Event-Oriented Notation for Activation Spreading in Feedforward NNs (FNNs) and Recurrent NNs (RNNs)
> 3 Depth of Credit Assignment Paths (CAPs) and of Problems
>
> 4 Recurring Themes of Deep Learning
> 4.1 Dynamic Programming (DP) for DL
> 4.2 Unsupervised Learning (UL) Facilitating Supervised Learning (SL) and RL
> 4.3 Occam’s Razor: Compression and Minimum Description Length (MDL)
> 4.4 Learning Hierarchical Representations Through Deep SL, UL, RL
> 4.5 Fast Graphics Processing Units (GPUs) for DL in NNs
>
> 5 Supervised NNs, Some Helped by Unsupervised NNs (With DL Timeline)
> 5.1 1940s and Earlier
> 5.2 Around 1960: More Neurobiological Inspiration for DL
> 5.3 1965: Deep Networks Based on the Group Method of Data Handling (GMDH)
> 5.4 1979: Convolution + Weight Replication + Winner-Take-All (WTA)
> 5.5 1960-1981 and Beyond: Development of Backpropagation (BP) for NNs
> 5.5.1 BP for Weight-Sharing FNNs and RNNs
> 5.6 1989: BP for Convolutional NNs (CNNs)
> 5.7 Late 1980s-2000: Improvements of NNs
> 5.7.1 Ideas for Dealing with Long Time Lags and Deep CAPs
> 5.7.2 Better BP Through Advanced Gradient Descent
> 5.7.3 Discovering Low-Complexity, Problem-Solving NNs
> 5.7.4 Potential Benefits of UL for SL
> 5.8 1987: UL Through Autoencoder (AE) Hierarchies
> 5.9 1991: Fundamental Deep Learning Problem of Gradient Descent
> 5.10 1991: UL-Based History Compression Through a Deep Hierarchy of RNNs
> 5.11 1994: Contest-Winning Not So Deep NNs
> 5.12 1995: Supervised Very Deep Recurrent Learner (LSTM RNN)
> 5.13 1999: Max-Pooling (MP)
> 5.14 2003: More Contest-Winning/Record-Setting, Often Not So Deep NNs
> 5.15 2006: Deep Belief Networks (DBNs) / Improved CNNs / GPU-CNNs
> 5.16 2009: First Official Competitions Won by RNNs, and with MPCNNs
> 5.17 2010: Plain Backprop (+ Distortions) on GPU Yields Excellent Results
> 5.18 2011: MPCNNs on GPU Achieve Superhuman Vision Performance
> 5.19 2011: Hessian-Free Optimization for RNNs
> 5.20 2012: First Contests Won on ImageNet & Object Detection & Segmentation
> 5.21 2013: More Contests and Benchmark Records
> 5.21.1 Currently Successful Supervised Techniques: LSTM RNNs / GPU-MPCNNs
> 5.22 Recent Tricks for Improving SL Deep NNs (Compare Sec. 5.7.2, 5.7.3)
> 5.23 Consequences for Neuroscience
> 5.24 DL with Spiking Neurons?
>
> 6 DL in FNNs and RNNs for Reinforcement Learning (RL)
> 6.1 RL Through NN World Models Yields RNNs With Deep CAPs
> 6.2 Deep FNNs for Traditional RL and Markov Decision Processes (MDPs)
> 6.3 Deep RL RNNs for Partially Observable MDPs (POMDPs)
> 6.4 RL Facilitated by Deep UL in FNNs and RNNs
> 6.5 Deep Hierarchical RL (HRL) and Subgoal Learning with FNNs and RNNs
> 6.6 Deep RL by Direct NN Search / Policy Gradients / Evolution
> 6.7 Deep RL by Indirect Policy Search / Compressed NN Search
> 6.8 Universal RL
>
> 7 Conclusion
>
>
>
> -------- Forwarded Message --------
> Subject: Connectionists: First Deep Learning Networks in 1965
> Date: Tue, 8 Jul 2014 13:40:06 +0200
> From: Schmidhuber Juergen
> To: connectionists@cs.cmu.edu
>
>
> Who created the first Deep Learning networks?
>
> To my knowledge, this was done by Olexiy Hryhorovych (Alexey Grigoryevich) Ivakhnenko and colleagues in 1965. Here a brief summary:
>
> Networks trained by the Group Method of Data Handling (GMDH) (Ivakhnenko and Lapa, 1965; Ivakhnenko et al., 1967; Ivakhnenko, 1968, 1971) were perhaps the first Deep Learning systems of the Feedforward Multilayer Perceptron type. The units of GMDH nets may have polynomial activation functions implementing Kolmogorov-Gabor polynomials (more general than other widely used neural network activation functions). Given a training set, layers are incrementally grown and trained by regression analysis (e.g., Legendre, 1805; Gauss, 1809, 1821), then pruned with the help of a separate validation set (using today’s terminology), where Decision Regularisation is used to weed out superfluous units. The numbers of layers and units per layer can be learned in problem-dependent fashion. To my knowledge, this was the first example of hierarchical representation learning in NNs. A paper of 1971 already described a deep GMDH network with 8 layers (Ivakhnenko, 1971). There have been numerous applications of GMDH-style nets, e.g. (Ikeda et al., 1976; Farlow, 1984; Madala and Ivakhnenko, 1994; Ivakhnenko, 1995; Kondo, 1998; Kordik et al., 2003; Witczak et al., 2006; Kondo and Ueno, 2008) …
>
> Precise references and more history in:
>
> Deep Learning in Neural Networks: An Overview
> PDF & LATEX source & complete public BIBTEX file under
> http://www.idsia.ch/~juergen/deep-learning-overview.html
>
> Juergen Schmidhuber
>
>
>
>
> -------- Forwarded Message --------
> Subject: Connectionists: Who invented backpropagation?
> Date: Tue, 22 Jul 2014 14:22:28 +0200
> From: Schmidhuber Juergen
> To: connectionists@cs.cmu.edu
>
>
> Efficient backpropagation (BP) is central to the ongoing Neural Network (NN) ReNNaissance and "Deep Learning." Who invented it?
>
> It is easy to find misleading accounts of BP's history. I had a look at the original papers from the 1960s and 70s, and talked to BP pioneers. Here is a summary derived from my recent survey, which has additional references:
>
> The minimisation of errors through gradient descent (Hadamard, 1908) in the parameter space of complex, nonlinear, differentiable, multi-stage, NN-related systems has been discussed at least since the early 1960s (e.g., Kelley, 1960; Bryson, 1961; Bryson and Denham, 1961; Pontryagin et al., 1961; Dreyfus, 1962; Wilkinson, 1965; Amari, 1967; Bryson and Ho, 1969; Director and Rohrer, 1969), initially within the framework of Euler-Lagrange equations in the Calculus of Variations (e.g., Euler, 1744).
>
> Steepest descent in the weight space of such systems can be performed (Bryson, 1961; Kelley, 1960; Bryson and Ho, 1969) by iterating the chain rule (Leibniz, 1676; L’Hopital, 1696) à la Dynamic Programming (DP, Bellman, 1957). A simplified derivation of this backpropagation method uses the chain rule only (Dreyfus, 1962).
>
> The systems of the 1960s were already efficient in the DP sense. However, they backpropagated derivative information through standard Jacobian matrix calculations from one “layer” to the previous one, without explicitly addressing either direct links across several layers or potential additional efficiency gains due to network sparsity (but perhaps such enhancements seemed obvious to the authors).
>
> Explicit, efficient error backpropagation (BP) in arbitrary, discrete, possibly sparsely connected, NN-like networks apparently was first described in a 1970 master’s thesis (Linnainmaa, 1970, 1976), albeit without reference to NNs. BP is also known as the reverse mode of automatic differentiation (e.g., Griewank, 2012), where the costs of forward activation spreading essentially equal the costs of backward derivative calculation. See early BP FORTRAN code (Linnainmaa, 1970) and closely related work (Ostrovskii et al., 1971).
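[Editor's note: the defining property of the reverse mode — a backward derivative sweep whose cost is proportional to the forward evaluation — can be illustrated with a minimal scalar autodiff sketch. Class and function names here are my own; any textbook reverse-mode implementation has the same shape.]

```python
import math

class Var:
    """Scalar node recording local derivatives for reverse-mode AD."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs (parent_node, d_self/d_parent)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def tanh(x):
    t = math.tanh(x.value)
    return Var(t, [(x, 1.0 - t * t)])

def backward(out):
    """One backward sweep: the chain rule is applied once per recorded
    edge, so its cost matches the cost of the forward pass."""
    out.grad = 1.0
    order, seen = [], set()
    def topo(v):
        if id(v) not in seen:
            seen.add(id(v))
            for p, _ in v.parents:
                topo(p)
            order.append(v)
    topo(out)
    for v in reversed(order):
        for p, local in v.parents:
            p.grad += local * v.grad   # chain rule, accumulated

# d/dx of y = tanh(x*x + x) at x = 0.5 is (1 - tanh(0.75)^2) * (2*0.5 + 1)
x = Var(0.5)
y = tanh(x * x + x)
backward(y)
```

The single reversed pass over the recorded graph is exactly the "backward derivative calculation" whose cost essentially equals that of forward activation spreading.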
>
> BP was soon explicitly used to minimize cost functions by adapting control parameters (weights) (Dreyfus, 1973). This was followed by some preliminary, NN-specific discussion (Werbos, 1974, section 5.5.1), and a computer program for automatically deriving and implementing BP for any given differentiable system (Speelpenning, 1980).
>
> To my knowledge, the first NN-specific application of efficient BP as above was described in 1981 (Werbos, 1981). Related work was published several years later (Parker, 1985; LeCun, 1985). A paper of 1986 significantly contributed to the popularisation of BP (Rumelhart et al., 1986).
>
> Compare also the first adaptive, deep, multilayer perceptrons (Ivakhnenko et al., since 1965), whose layers are incrementally grown and trained by regression analysis, as well as a more recent method for multilayer threshold NNs (Bobrowski, 1978).
>
> Precise references and more history in:
>
> Deep Learning in Neural Networks: An Overview
> PDF & LATEX source & complete public BIBTEX file under
> http://www.idsia.ch/~juergen/deep-learning-overview.html
>
> Juergen Schmidhuber
> http://www.idsia.ch/~juergen/whatsnew.html
>
> P.S.: I'll give talks on Deep Learning and other things in the NYC area around 1-5 and 18-19 August, and in the Bay area around 7-15 August; videos of previous talks can be found under http://www.idsia.ch/~juergen/videos.html
>
>