Aberdeen, D. (2003). Policy-Gradient Algorithms for Partially Observable Markov Decision Processes. PhD thesis, Australian National University.
Abounadi, J., Bertsekas, D., and Borkar, V. S. (2002). Learning algorithms for Markov decision processes with average cost. SIAM Journal on Control and Optimization, 40(3):681–698.
Akaike, H. (1970). Statistical predictor identification. Ann. Inst. Statist. Math., 22:203–217.
Allender, A. (1992). Application of time-bounded Kolmogorov complexity in complexity theory. In Watanabe, O., editor, Kolmogorov complexity and computational complexity, pages 6–22. EATCS Monographs on Theoretical Computer Science, Springer.
Almeida, L. B. (1987). A learning rule for asynchronous perceptrons with feedback in a combinatorial environment. In IEEE 1st International Conference on Neural Networks, San Diego, volume 2, pages 609–618.
Almeida, L. B., Langlois, T., Amaral, J. D., and Redol, R. A. (1997). On-line step size adaptation. Technical report, INESC, 9 Rua Alves Redol, 1000.
Amari, S. (1967). A theory of adaptive pattern classifiers. IEEE Trans. EC, 16(3):299–307.
Amari, S., Cichocki, A., and Yang, H. (1996). A new learning algorithm for blind signal separation. In Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E., editors, Advances in Neural Information Processing Systems (NIPS), volume 8. The MIT Press.
Amari, S. and Murata, N. (1993). Statistical theory of learning curves under entropic loss criterion. Neural Computation, 5(1):140–153.
Amari, S.-I. (1998). Natural gradient works efficiently in learning. Neural Computation, 10(2):251–276.
Amit, D. J. and Brunel, N. (1997). Dynamics of a recurrent network of spiking neurons before and following learning. Network: Computation in Neural Systems, 8(4):373–404.
An, G. (1996). The effects of adding noise during backpropagation training on a generalization performance. Neural Computation, 8(3):643–674.
Andrade, M. A., Chacon, P., Merelo, J. J., and Moran, F. (1993). Evaluation of secondary structure of proteins from UV circular dichroism spectra using an unsupervised learning neural network. Protein Engineering, 6(4):383–390.
Andrews, R., Diederich, J., and Tickle, A. B. (1995). Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge-Based Systems, 8(6):373–389.
Anguita, D. and Gomes, B. A. (1996). Mixing floating- and fixed-point formats for neural network learning on neuroprocessors. Microprocessing and Microprogramming, 41(10):757–769.
Anguita, D., Parodi, G., and Zunino, R. (1994). An efficient implementation of BP on RISC-based workstations. Neurocomputing, 6(1):57–65.
Arel, I., Rose, D. C., and Karnowski, T. P. (2010). Deep machine learning – a new frontier in artificial intelligence research. Computational Intelligence Magazine, IEEE, 5(4):13–18.
Ash, T. (1989). Dynamic node creation in backpropagation neural networks. Connection Science, 1(4):365–375.
Atick, J. J., Li, Z., and Redlich, A. N. (1992). Understanding retinal color coding from first principles. Neural Computation, 4:559–572.
Atiya, A. F. and Parlos, A. G. (2000). New results on recurrent network training: unifying the algorithms and accelerating convergence. IEEE Transactions on Neural Networks, 11(3):697–709.
Ba, J. and Frey, B. (2013). Adaptive dropout for training deep neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 3084–3092.
Baird, H. (1990). Document Image Defect Models.
In Proceedings, IAPR Workshop on Syntactic and Structural Pattern Recognition, Murray Hill, NJ.
Baird, L. and Moore, A. W. (1999). Gradient descent for general reinforcement learning. In Advances in Neural Information Processing Systems 12 (NIPS), pages 968–974. MIT Press.
Baird, L. C. (1994). Reinforcement learning in continuous time: Advantage updating. In IEEE World Congress on Computational Intelligence, volume 4, pages 2448–2453. IEEE.
Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In International Conference on Machine Learning, pages 30–37.
Bakker, B. (2002). Reinforcement learning with Long Short-Term Memory. In Dietterich, T. G., Becker, S., and Ghahramani, Z., editors, Advances in Neural Information Processing Systems 14, pages 1475–1482. MIT Press, Cambridge, MA.
Bakker, B. and Schmidhuber, J. (2004). Hierarchical reinforcement learning based on subgoal discovery and subpolicy specialization. In F. G. et al., editor, Proc. 8th Conference on Intelligent Autonomous Systems IAS-8, pages 438–445, Amsterdam, NL. IOS Press.
Bakker, B., Zhumatiy, V., Gruener, G., and Schmidhuber, J. (2003). A robot that reinforcement-learns to identify and memorize important previous observations. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2003, pages 430–435.
Baldi, P. (1995). Gradient descent learning algorithms overview: A general dynamical systems perspective. IEEE Transactions on Neural Networks, 6(1):182–195.
Baldi, P. (2012). Autoencoders, Unsupervised Learning, and Deep Architectures. Journal of Machine Learning Research (Proc. 2011 ICML Workshop on Unsupervised and Transfer Learning), 27:37–50.
Baldi, P., Brunak, S., Frasconi, P., Pollastri, G., and Soda, G. (1999). Exploiting the past and the future in protein secondary structure prediction. Bioinformatics, 15:937–946.
Baldi, P. and Chauvin, Y. (1993). Neural networks for fingerprint recognition. Neural Computation, 5(3):402–418.
Baldi, P. and Chauvin, Y. (1996). Hybrid modeling, HMM/NN architectures, and protein applications. Neural Computation, 8(7):1541–1565.
Baldi, P. and Hornik, K. (1989). Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2:53–58.
Baldi, P. and Hornik, K. (1994). Learning in linear networks: a survey. IEEE Transactions on Neural Networks, 6(4):837–858. 1995.
Baldi, P. and Pollastri, G. (2003). The principled design of large-scale recursive neural network architectures – DAG-RNNs and the protein structure prediction problem. J. Mach. Learn. Res., 4:575–602.
Baldi, P. and Sadowski, P. (2014). The dropout learning algorithm. Artificial Intelligence, 210C:78–122.
Ballard, D. H. (1987). Modular learning in neural networks. In Proc. AAAI, pages 279–284.
Baluja, S. (1994). Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning. Technical Report CMU-CS-94-163, Carnegie Mellon University.
Balzer, R. (1985). A 15 year perspective on automatic programming. IEEE Transactions on Software Engineering, 11(11):1257–1268.
Barlow, H. B. (1989). Unsupervised learning. Neural Computation, 1(3):295–311.
Barlow, H. B., Kaushal, T. P., and Mitchison, G. J. (1989). Finding minimum entropy codes. Neural Computation, 1(3):412–423.
Barrow, H. G. (1987). Learning receptive fields. In Proceedings of the IEEE 1st Annual Conference on Neural Networks, volume IV, pages 115–121. IEEE.
Bartlett, P. L.
and Baxter, J. (2011). Infinite-horizon policy-gradient estimation. arXiv preprint arXiv:1106.0665.
Barto, A. G. and Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13(4):341–379.
Barto, A. G., Singh, S., and Chentanez, N. (2004). Intrinsically motivated learning of hierarchical collections of skills. In Proceedings of International Conference on Developmental Learning (ICDL), pages 112–119. MIT Press, Cambridge, MA.
Barto, A. G., Sutton, R. S., and Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13:834–846.
Battiti, R. (1989). Accelerated backpropagation learning: two optimization methods. Complex Systems, 3(4):331–342.
Battiti, R. (1992). First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation, 4(2):141–166.
Baxter, J. and Bartlett, P. (1999). Direct Gradient-Based Reinforcement Learning. Technical report, Research School of Information Sciences and Engineering, Australian National University.
Bayer, J., Osendorfer, C., Chen, N., Urban, S., and van der Smagt, P. (2013). On fast dropout and its applicability to recurrent networks. arXiv preprint arXiv:1311.0701.
Bayer, J., Wierstra, D., Togelius, J., and Schmidhuber, J. (2009). Evolving memory cell structures for sequence learning. In Proc. ICANN (2), pages 755–764.
Becker, S. (1991). Unsupervised learning procedures for neural networks. International Journal of Neural Systems, 2(1 & 2):17–33.
Becker, S. and Le Cun, Y. (1989). Improving the convergence of back-propagation learning with second order methods. In Touretzky, D., Hinton, G., and Sejnowski, T., editors, Proc. 1988 Connectionist Models Summer School, pages 29–37, Pittsburgh 1988. Morgan Kaufmann, San Mateo.
Behnke, S. (2003). Hierarchical Neural Networks for Image Interpretation, volume LNCS 2766 of Lecture Notes in Computer Science. Springer.
Bell, A. J. and Sejnowski, T. J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129–1159.
Bellman, R. (1957). Dynamic Programming. Princeton University Press, Princeton, NJ, USA, 1st edition.
Belouchrani, A., Abed-Meraim, K., Cardoso, J.-F., and Moulines, E. (1997). A blind source separation technique using second-order statistics. IEEE Transactions on Signal Processing, 45(2):434–444.
Bengio, Y. (1991). Artificial Neural Networks and their Application to Sequence Recognition. PhD thesis, McGill University, (Computer Science), Montreal, Qc., Canada.
Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, V2(1). Now Publishers.
Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: A review and new perspectives. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(8):1798–1828.
Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. (2007). Greedy layer-wise training of deep networks. In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems 19 (NIPS), pages 153–160. MIT Press.
Bengio, Y., Simard, P., and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166.
Beringer, N., Graves, A., Schiel, F., and Schmidhuber, J. (2005). Classifying unprompted speech by retraining LSTM nets.
In Duch, W., Kacprzyk, J., Oja, E., and Zadrozny, S., editors, Artificial Neural Networks: Biological Inspirations - ICANN 2005, LNCS 3696, pages 575–581. Springer-Verlag Berlin Heidelberg.
Bertsekas, D. P. (2001). Dynamic Programming and Optimal Control. Athena Scientific.
Bertsekas, D. P. and Tsitsiklis, J. N. (1996). Neuro-dynamic Programming. Athena Scientific, Belmont, MA.
Biegler-König, F. and Bärmann, F. (1993). A learning algorithm for multilayered neural networks based on linear least squares problems. Neural Networks, 6(1):127–131.
Bishop, C. M. (1993). Curvature-driven smoothing: A learning algorithm for feed-forward networks. IEEE Transactions on Neural Networks, 4(5):882–884.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Blair, A. D. and Pollack, J. B. (1997). Analysis of dynamical recognizers. Neural Computation, 9(5):1127–1142.
Bluche, T., Louradour, J., Knibbe, M., Moysset, B., Benzeghiba, F., and Kermorvant, C. (2014). The A2iA Arabic Handwritten Text Recognition System at the OpenHaRT2013 Evaluation. In International Workshop on Document Analysis Systems.
Bobrowski, L. (1978). Learning processes in multilayer threshold nets. Biological Cybernetics, 31:1–6.
Bodén, M. and Wiles, J. (2000). Context-free and context-sensitive dynamics in recurrent neural networks. Connection Science, 12(3-4):197–210.
Bodenhausen, U. and Waibel, A. (1991). The tempo 2 algorithm: Adjusting time-delays by supervised learning. In Lippman, D. S., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 3, pages 155–161. Morgan Kaufmann.
Bohte, S. M., Kok, J. N., and La Poutre, H. (2002). Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing, 48(1):17–37.
Bottou, L. (1991). Une approche théorique de l'apprentissage connexioniste; applications à la reconnaissance de la parole. PhD thesis, Université de Paris XI.
Bourlard, H. and Morgan, N. (1994). Connectionist Speech Recognition: A Hybrid Approach. Kluwer Academic Publishers.
Boutilier, C. and Poole, D. (1996). Computing optimal policies for partially observable Markov decision processes using compact representations. In Proceedings of the AAAI, Portland, OR.
Bradtke, S. J., Barto, A. G., and Kaelbling, L. P. (1996). Linear least-squares algorithms for temporal difference learning. In Machine Learning, pages 22–33.
Brafman, R. I. and Tennenholtz, M. (2002). R-MAX—a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213–231.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24:123–140.
Brette, R., Rudolph, M., Carnevale, T., Hines, M., Beeman, D., Bower, J. M., Diesmann, M., Morrison, A., Goodman, P. H., Harris Jr, F. C., et al. (2007). Simulation of networks of spiking neurons: a review of tools and strategies. Journal of Computational Neuroscience, 23(3):349–398.
Breuel, T. M., Ul-Hasan, A., Al-Azawi, M. A., and Shafait, F. (2013). High-performance OCR for printed English and Fraktur using LSTM networks. In 12th International Conference on Document Analysis and Recognition (ICDAR), pages 683–687. IEEE.
Broyden, C. G. et al. (1965). A class of methods for solving nonlinear simultaneous equations. Math. Comp., 19(92):577–593.
Brunel, N. (2000). Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons. Journal of Computational Neuroscience, 8(3):183–208.
Bryson, A. and Ho, Y. (1969). Applied optimal control: optimization, estimation, and control.
Blaisdell Pub. Co.
Bryson, A. E. (1961). A gradient method for optimizing multi-stage allocation processes. In Proc. Harvard Univ. Symposium on digital computers and their applications.
Bryson, Jr., A. E. and Denham, W. F. (1961). A steepest-ascent method for solving optimum programming problems. Technical Report BR-1303, Raytheon Company, Missile and Space Division.
Buhler, J. (2001). Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics, 17(5):419–428.
Buntine, W. L. and Weigend, A. S. (1991). Bayesian back-propagation. Complex Systems, 5:603–643.
Cardoso, J.-F. (1994). On the performance of orthogonal source separation algorithms. In Proc. EUSIPCO, pages 776–779.
Carreira-Perpinan, M. A. (2001). Continuous latent variable models for dimensionality reduction and sequential data reconstruction. PhD thesis, University of Sheffield, UK.
Carter, M. J., Rudolph, F. J., and Nucci, A. J. (1990). Operational fault tolerance of CMAC networks. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems (NIPS) 2, pages 340–347. San Mateo, CA: Morgan Kaufmann.
Casey, M. P. (1996). The dynamics of discrete-time computation, with application to recurrent neural networks and finite state machine extraction. Neural Computation, 8(6):1135–1178.
Cauwenberghs, G. (1993). A fast stochastic error-descent algorithm for supervised learning and optimization. In Lippman, D. S., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 5, pages 244–244. Morgan Kaufmann.
Chaitin, G. J. (1966). On the length of programs for computing finite binary sequences. Journal of the ACM, 13:547–569.
Chalup, S. K. and Blair, A. D. (2003). Incremental training of first order recurrent neural networks to predict a context-sensitive language. Neural Networks, 16(7):955–972.
Chellapilla, K., Puri, S., and Simard, P. (2006). High performance convolutional neural networks for document processing. In International Workshop on Frontiers in Handwriting Recognition.
Cho, K. (2014). Foundations and Advances in Deep Learning. PhD thesis, Aalto University School of Science.
Cho, K., Ilin, A., and Raiko, T. (2012). Tikhonov-type regularization for restricted Boltzmann machines. In Intl. Conf. on Artificial Neural Networks (ICANN) 2012, pages 81–88. Springer.
Cho, K., Raiko, T., and Ilin, A. (2013). Enhanced gradient for training restricted Boltzmann machines. Neural Computation, 25(3):805–831.
Church, A. (1936). An unsolvable problem of elementary number theory. American Journal of Mathematics, 58:345–363.
Ciresan, D. C., Giusti, A., Gambardella, L. M., and Schmidhuber, J. (2012a). Deep neural networks segment neuronal membranes in electron microscopy images. In Advances in Neural Information Processing Systems (NIPS), pages 2852–2860.
Ciresan, D. C., Giusti, A., Gambardella, L. M., and Schmidhuber, J. (2013). Mitosis detection in breast cancer histology images with deep neural networks. In Proc. MICCAI, volume 2, pages 411–418.
Ciresan, D. C., Meier, U., Gambardella, L. M., and Schmidhuber, J. (2010). Deep big simple neural nets for handwritten digit recognition. Neural Computation, 22(12):3207–3220.
Ciresan, D. C., Meier, U., Masci, J., Gambardella, L. M., and Schmidhuber, J. (2011a). Flexible, high performance convolutional neural networks for image classification. In Intl. Joint Conference on Artificial Intelligence IJCAI, pages 1237–1242.
Ciresan, D. C., Meier, U., Masci, J., and Schmidhuber, J. (2011b).
A committee of neural networks for traffic sign classification. In International Joint Conference on Neural Networks (IJCNN), pages 1918–1921.
Ciresan, D. C., Meier, U., Masci, J., and Schmidhuber, J. (2012b). Multi-column deep neural network for traffic sign classification. Neural Networks, 32:333–338.
Ciresan, D. C., Meier, U., and Schmidhuber, J. (2012c). Multi-column deep neural networks for image classification. In IEEE Conference on Computer Vision and Pattern Recognition CVPR 2012. Long preprint arXiv:1202.2745v1 [cs.CV].
Ciresan, D. C., Meier, U., and Schmidhuber, J. (2012d). Transfer learning for Latin and Chinese characters with deep neural networks. In International Joint Conference on Neural Networks (IJCNN), pages 1301–1306.
Ciresan, D. C. and Schmidhuber, J. (2013). Multi-column deep neural networks for offline handwritten Chinese character classification. Technical report, IDSIA. arXiv:1309.0261.
Cliff, D. T., Husbands, P., and Harvey, I. (1993). Evolving recurrent dynamical networks for robot control. In Artificial Neural Nets and Genetic Algorithms, pages 428–435. Springer.
Clune, J., Stanley, K. O., Pennock, R. T., and Ofria, C. (2011). On the performance of indirect encoding across the continuum of regularity. Trans. Evol. Comp, 15(3):346–367.
Coates, A., Huval, B., Wang, T., Wu, D. J., Ng, A. Y., and Catanzaro, B. (2013). Deep learning with COTS HPC systems. In Proc. International Conference on Machine Learning (ICML'13).
Cichocki, A. and Unbehauen, R. (1993). Neural networks for optimization and signal processing. John Wiley & Sons, Inc.
Collobert, R. and Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning (ICML), pages 160–167. ACM.
Comon, P. (1994). Independent component analysis – a new concept? Signal Processing, 36(3):287–314.
Connor, J., Martin, D. R., and Atlas, L. E. (1994). Recurrent neural networks and robust time series prediction. IEEE Transactions on Neural Networks, 5(2):240–254.
Cook, S. A. (1971). The complexity of theorem-proving procedures. In Proceedings of the 3rd Annual ACM Symposium on the Theory of Computing (STOC'71), pages 151–158. ACM, New York.
Cramer, N. L. (1985). A representation for the adaptive generation of simple sequential programs. In Grefenstette, J., editor, Proceedings of an International Conference on Genetic Algorithms and Their Applications, Carnegie-Mellon University, July 24-26, 1985, Hillsdale, NJ. Lawrence Erlbaum Associates.
Craven, P. and Wahba, G. (1979). Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math., 31:377–403.
Cuccu, G., Luciw, M., Schmidhuber, J., and Gomez, F. (2011). Intrinsically motivated evolutionary search for vision-based reinforcement learning. In Proceedings of the 2011 IEEE Conference on Development and Learning and Epigenetic Robotics IEEE-ICDL-EPIROB, volume 2, pages 1–7. IEEE.
Dahl, G., Yu, D., Deng, L., and Acero, A. (2012). Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. Audio, Speech, and Language Processing, IEEE Transactions on, 20(1):30–42.
Dahl, G. E., Sainath, T. N., and Hinton, G. E. (2013). Improving Deep Neural Networks for LVCSR using Rectified Linear Units and Dropout. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 8609–8613. IEEE.
D'Ambrosio, D. B.
and Stanley, K. O. (2007). A novel generative encoding for exploiting neural network sensor and output geometry. In Proceedings of the Conference on Genetic and Evolutionary Computation (GECCO), pages 974–981.
Datar, M., Immorlica, N., Indyk, P., and Mirrokni, V. S. (2004). Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the 20th Annual Symposium on Computational Geometry, pages 253–262. ACM.
Dayan, P. and Hinton, G. (1993). Feudal reinforcement learning. In Lippman, D. S., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems (NIPS) 5, pages 271–278. Morgan Kaufmann.
Dayan, P. and Hinton, G. E. (1996). Varieties of Helmholtz machine. Neural Networks, 9(8):1385–1403.
Dayan, P., Hinton, G. E., Neal, R. M., and Zemel, R. S. (1995). The Helmholtz machine. Neural Computation, 7:889–904.
Dayan, P. and Zemel, R. (1995). Competition and multiple cause models. Neural Computation, 7:565–579.
De Freitas, J. F. G. (2003). Bayesian methods for neural networks. PhD thesis, University of Cambridge.
de Souto, M. C., Souto, M. C. P. D., and Oliveira, W. R. D. (1999). The loading problem for pyramidal neural networks. In Electronic Journal on Mathematics of Computation.
de Vries, B. and Principe, J. C. (1991). A theory for neural networks with time delays. In Lippmann, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems (NIPS) 3, pages 162–168. Morgan Kaufmann.
Deco, G. and Parra, L. (1997). Non-linear feature extraction by redundancy reduction in an unsupervised stochastic neural network. Neural Networks, 10(4):683–691.
Deco, G. and Rolls, E. T. (2005). Neurodynamics of biased competition and cooperation for attention: a model with spiking neurons. Journal of Neurophysiology, 94(1):295–313.
DeJong, G. and Mooney, R. (1986). Explanation-based learning: An alternative view. Machine Learning, 1(2):145–176.
DeMers, D. and Cottrell, G. (1993). Non-linear dimensionality reduction. In Hanson, S. J., Cowan, J. D., and Giles, C. L., editors, Advances in Neural Information Processing Systems (NIPS) 5, pages 580–587. Morgan Kaufmann.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B, 39.
Deng, L. and Yu, D. (2014). Deep Learning: Methods and Applications. NOW Publishers.
Deville, Y. and Lau, K. K. (1994). Logic program synthesis. Journal of Logic Programming, 19(20):321–350.
Di Lena, P., Nagata, K., and Baldi, P. (2012). Deep architectures for protein contact map prediction. Bioinformatics, 28:2449–2457.
Dickmanns, D., Schmidhuber, J., and Winklhofer, A. (1987). Der genetische Algorithmus: Eine Implementierung in Prolog. Technical Report, Inst. of Informatics, Tech. Univ. Munich. http://www.idsia.ch/~juergen/geneticprogramming.html.
Dickmanns, E. D., Behringer, R., Dickmanns, D., Hildebrandt, T., Maurer, M., Thomanek, F., and Schiehlen, J. (1994). The seeing passenger car 'VaMoRs-P'. In Proc. Int. Symp. on Intelligent Vehicles '94, Paris, pages 68–73.
Dietterich, T. G. (2000a). Ensemble methods in machine learning. In Multiple classifier systems, pages 1–15. Springer.
Dietterich, T. G. (2000b). Hierarchical reinforcement learning with the MAXQ value function decomposition. J. Artif. Intell. Res. (JAIR), 13:227–303.
Director, S. W. and Rohrer, R. A. (1969). Automated network design - the frequency-domain case. IEEE Trans. Circuit Theory, CT-16:330–337.
Dorffner, G. (1996).
Neural networks for time series processing. In Neural Network World.
Doya, K., Samejima, K., Katagiri, K.-i., and Kawato, M. (2002). Multiple model-based reinforcement learning. Neural Computation, 14(6):1347–1369.
Dreyfus, S. E. (1962). The numerical solution of variational problems. Journal of Mathematical Analysis and Applications, 5(1):30–45.
Dreyfus, S. E. (1973). The computational solution of optimal control problems with time lag. IEEE Transactions on Automatic Control, 18(4):383–385.
Duchi, J., Hazan, E., and Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159.
Egorova, A., Gloye, A., Göktekin, C., Liers, A., Luft, M., Rojas, R., Simon, M., Tenchio, O., and Wiesel, F. (2004). FU-Fighters Small Size 2004, Team Description. RoboCup 2004 Symposium: Papers and Team Description Papers. CD edition.
Elman, J. L. (1988). Finding structure in time. Technical Report CRL 8801, Center for Research in Language, University of California, San Diego.
Erhan, D., Bengio, Y., Courville, A., Manzagol, P.-A., Vincent, P., and Bengio, S. (2010). Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res., 11:625–660.
Eubank, R. L. (1988). Spline smoothing and nonparametric regression. In Farlow, S., editor, Self-Organizing Methods in Modeling. Marcel Dekker, New York.
Euler, L. (1744). Methodus inveniendi.
Faggin, F. (1992). Neural network hardware. In International Joint Conference on Neural Networks (IJCNN), volume 1, page 153.
Fahlman, S. E. (1988). An empirical study of learning speed in back-propagation networks. Technical Report CMU-CS-88-162, Carnegie-Mellon Univ.
Fahlman, S. E. (1991). The recurrent cascade-correlation learning algorithm. In Lippmann, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems (NIPS) 3, pages 190–196. Morgan Kaufmann.
Farabet, C., Couprie, C., Najman, L., and LeCun, Y. (2013). Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1915–1929.
Farlow, S. J. (1984). Self-organizing methods in modeling: GMDH type algorithms, volume 54. CRC Press.
Feldkamp, L. A., Prokhorov, D. V., Eagen, C. F., and Yuan, F. (1998). Enhanced multi-stream Kalman filter training for recurrent networks. In Nonlinear Modeling, pages 29–53. Springer.
Feldkamp, L. A., Prokhorov, D. V., and Feldkamp, T. M. (2003). Simple and conditioned adaptive behavior from Kalman filter trained recurrent networks. Neural Networks, 16(5):683–689.
Feldkamp, L. A. and Puskorius, G. V. (1998). A signal processing framework based on dynamic neural networks with application to problems in adaptation, filtering, and classification. Proceedings of the IEEE, 86(11):2259–2277.
Fernández, S., Graves, A., and Schmidhuber, J. (2007). An application of recurrent neural networks to discriminative keyword spotting. In Proc. ICANN (2), pages 220–229.
Fernandez, S., Graves, A., and Schmidhuber, J. (2007). Sequence labelling in structured domains with hierarchical recurrent neural networks. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI).
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America, 4:2379–2394.
Field, D. J. (1994). What is the goal of sensory coding? Neural Computation, 6:559–601.
Fieres, J., Schemmel, J., and Meier, K. (2008).
Realizing biological spiking network models in a configurable wafer-scale hardware system. In IEEE International Joint Conference on Neural Networks, pages 969–976.
Fine, S., Singer, Y., and Tishby, N. (1998). The hierarchical hidden Markov model: Analysis and applications. Machine Learning, 32(1):41–62.
FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical Journal, 1(6):445–466.
Fletcher, R. and Powell, M. J. (1963). A rapidly convergent descent method for minimization. The Computer Journal, 6(2):163–168.
Fogel, D. B., Fogel, L. J., and Porto, V. (1990). Evolving neural networks. Biological Cybernetics, 63(6):487–493.
Fogel, L., Owens, A., and Walsh, M. (1966). Artificial Intelligence through Simulated Evolution. Wiley, New York.
Földiák, P. (1990). Forming sparse representations by local anti-Hebbian learning. Biological Cybernetics, 64:165–170.
Földiák, P. and Young, M. P. (1995). Sparse coding in the primate cortex. In Arbib, M. A., editor, The Handbook of Brain Theory and Neural Networks, pages 895–898. The MIT Press.
Förster, A., Graves, A., and Schmidhuber, J. (2007). RNN-based Learning of Compact Maps for Efficient Robot Localization. In 15th European Symposium on Artificial Neural Networks, ESANN, pages 537–542, Bruges, Belgium.
Franzius, M., Sprekeler, H., and Wiskott, L. (2007). Slowness and sparseness lead to place, head-direction, and spatial-view cells. PLoS Computational Biology, 3(8):166.
Frinken, V., Zamora-Martinez, F., Espana-Boquera, S., Castro-Bleda, M. J., Fischer, A., and Bunke, H. (2012). Long-short term memory neural networks language modeling for handwriting recognition. In Pattern Recognition (ICPR), 2012 21st International Conference on, pages 701–704. IEEE.
Fritzke, B. (1994). A growing neural gas network learns topologies. In Tesauro, G., Touretzky, D. S., and Leen, T. K., editors, NIPS, pages 625–632. MIT Press.
Fu, K. S. (1977). Syntactic Pattern Recognition and Applications. Berlin, Springer.
Fukada, T., Schuster, M., and Sagisaka, Y. (1999). Phoneme boundary estimation using bidirectional recurrent neural networks and its applications. Systems and Computers in Japan, 30(4):20–30.
Fukushima, K. (1979). Neural network model for a mechanism of pattern recognition unaffected by shift in position - Neocognitron. Trans. IECE, J62-A(10):658–665.
Fukushima, K. (1980). Neocognitron: A self-organizing neural network for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193–202.
Fukushima, K. (2011). Increasing robustness against background noise: visual pattern recognition by a Neocognitron. Neural Networks, 24(7):767–778.
Fukushima, K. (2013a). Artificial vision by multi-layered neural networks: Neocognitron and its advances. Neural Networks, 37:103–119.
Fukushima, K. (2013b). Training multi-layered neural network Neocognitron. Neural Networks, 40:18–31.
Gauss, C. F. (1809). Theoria motus corporum coelestium in sectionibus conicis solem ambientium.
Gauss, C. F. (1821). Theoria combinationis observationum erroribus minimis obnoxiae (Theory of the combination of observations least subject to error).
Ge, S., Hang, C. C., Lee, T. H., and Zhang, T. (2010). Stable adaptive neural network control. Springer.
Geman, S., Bienenstock, E., and Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4:1–58.
Gers, F. A. and Schmidhuber, J. (2000). Recurrent nets that time and count. In Neural Networks, 2000.
IJCNN 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on, volume 3, pages 189–194. IEEE.
Gers, F. A. and Schmidhuber, J. (2001). LSTM recurrent networks learn simple context free and context sensitive languages. IEEE Transactions on Neural Networks, 12(6):1333–1340.
Gers, F. A., Schmidhuber, J., and Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451–2471.
Gers, F. A., Schraudolph, N., and Schmidhuber, J. (2002). Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research, 3:115–143.
Gerstner, W. and Kistler, W. K. (2002). Spiking Neuron Models. Cambridge University Press.
Gerstner, W. and van Hemmen, J. L. (1992). Associative memory in a network of spiking neurons. Network: Computation in Neural Systems, 3(2):139–164.
Ghavamzadeh, M. and Mahadevan, S. (2003). Hierarchical policy gradient algorithms. In Proceedings of the Twentieth Conference on Machine Learning (ICML-2003), pages 226–233.
Gherrity, M. (1989). A learning algorithm for analog fully recurrent neural networks. In IEEE/INNS International Joint Conference on Neural Networks, San Diego, volume 1, pages 643–644.
Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2013). Rich feature hierarchies for accurate object detection and semantic segmentation. Technical Report arxiv.org/abs/1311.2524, UC Berkeley and ICSI.
Gisslen, L., Luciw, M., Graziano, V., and Schmidhuber, J. (2011). Sequential constant size compressor for reinforcement learning. In Proc. Fourth Conference on Artificial General Intelligence (AGI), Google, Mountain View, CA, pages 31–40. Springer.
Giusti, A., Ciresan, D. C., Masci, J., Gambardella, L. M., and Schmidhuber, J. (2013). Fast image scanning with deep max-pooling convolutional neural networks. In Proc. ICIP.
Glackin, B., McGinnity, T. M., Maguire, L. P., Wu, Q., and Belatreche, A. (2005). A novel approach for the implementation of large scale spiking neural networks on FPGA hardware. In Computational Intelligence and Bioinspired Systems, pages 552–563. Springer.
Glasmachers, T., Schaul, T., Sun, Y., Wierstra, D., and Schmidhuber, J. (2010). Exponential Natural Evolution Strategies. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pages 393–400. ACM.
Glorot, X., Bordes, A., and Bengio, Y. (2011). Deep sparse rectifier networks. In AISTATS, volume 15, pages 315–323.
Gloye, A., Wiesel, F., Tenchio, O., and Simon, M. (2005). Reinforcing the driving quality of soccer playing robots by anticipation. IT - Information Technology, 47(5).
Gödel, K. (1931). Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik, 38:173–198.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA.
Goldfarb, D. (1970). A family of variable-metric methods derived by variational means. Mathematics of Computation, 24(109):23–26.
Golub, G., Heath, H., and Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21:215–224.
Gomez, F. J. (2003). Robust Nonlinear Control through Neuroevolution. PhD thesis, Department of Computer Sciences, University of Texas at Austin.
Gomez, F. J. and Miikkulainen, R. (2003). Active guidance for a finless rocket using neuroevolution. In Proc. GECCO 2003, Chicago.
Gomez, F. J. and Schmidhuber, J. (2005). Co-evolving recurrent neurons learn deep memory POMDPs. In Proc.
of the 2005 conference on genetic and evolutionary computation (GECCO), Washington, D. C. ACM Press, New York, NY, USA.
Gomez, F. J., Schmidhuber, J., and Miikkulainen, R. (2008). Accelerated neural evolution through cooperatively coevolved synapses. Journal of Machine Learning Research, 9(May):937–965.
Gomi, H. and Kawato, M. (1993). Neural network control for a closed-loop system using feedback-error-learning. Neural Networks, 6(7):933–946.
Goodfellow, I., Mirza, M., Da, X., Courville, A., and Bengio, Y. (2014a). An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks. TR arXiv:1312.6211v2.
Goodfellow, I. J., Bulatov, Y., Ibarz, J., Arnoud, S., and Shet, V. (2014b). Multi-digit number recognition from street view imagery using deep convolutional neural networks. arXiv preprint arXiv:1312.6082 v4.
Goodfellow, I. J., Courville, A., and Bengio, Y. (2011). Spike-and-slab sparse coding for unsupervised feature discovery. In NIPS Workshop on Challenges in Learning Hierarchical Models.
Goodfellow, I. J., Courville, A. C., and Bengio, Y. (2012). Large-scale feature learning with spike-and-slab sparse coding. In Proceedings of the 29th International Conference on Machine Learning.
Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y. (2013). Maxout networks. In International Conference on Machine Learning (ICML).
Graves, A. (2011). Practical variational inference for neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 2348–2356.
Graves, A., Eck, D., Beringer, N., and Schmidhuber, J. (2003). Isolated digit recognition with LSTM recurrent networks. In First International Workshop on Biologically Inspired Approaches to Advanced Information Technology, Lausanne.
Graves, A., Fernandez, S., Gomez, F. J., and Schmidhuber, J. (2006). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural nets. In ICML'06: Proceedings of the 23rd International Conference on Machine Learning, pages 369–376.
Graves, A., Fernandez, S., Liwicki, M., Bunke, H., and Schmidhuber, J. (2008). Unconstrained on-line handwriting recognition with recurrent neural networks. In Platt, J., Koller, D., Singer, Y., and Roweis, S., editors, Advances in Neural Information Processing Systems (NIPS) 20, pages 577–584. MIT Press, Cambridge, MA.
Graves, A., Liwicki, M., Fernandez, S., Bertolami, R., Bunke, H., and Schmidhuber, J. (2009). A novel connectionist system for improved unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5).
Graves, A., Mohamed, A.-R., and Hinton, G. E. (2013). Speech recognition with deep recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 6645–6649. IEEE.
Graves, A. and Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5-6):602–610.
Graves, A. and Schmidhuber, J. (2009). Offline handwriting recognition with multidimensional recurrent neural networks. In Advances in Neural Information Processing Systems 21, pages 545–552. MIT Press, Cambridge, MA.
Graziano, M. (2009). The Intelligent Movement Machine: An Ethological Perspective on the Primate Motor System. Oxford University Press, USA.
Griewank, A. (2012). Documenta Mathematica - Extra Volume ISMP, pages 389–400.
Grondman, I., Busoniu, L., Lopes, G. A. D., and Babuska, R. (2012).
A survey of actor-critic reinforcement learning: Standard and natural policy gradients. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 42(6):1291–1307.
Grossberg, S. (1969). Some networks that can learn, remember, and reproduce any number of complicated space-time patterns, I. Journal of Mathematics and Mechanics, 19:53–91.
Grossberg, S. (1976a). Adaptive pattern classification and universal recoding, 1: Parallel development and coding of neural feature detectors. Biological Cybernetics, 23:187–202.
Grossberg, S. (1976b). Adaptive pattern classification and universal recoding, 2: Feedback, expectation, olfaction, and illusions. Biological Cybernetics, 23.
Gruau, F., Whitley, D., and Pyeatt, L. (1996). A comparison between cellular encoding and direct encoding for genetic neural networks. NeuroCOLT Technical Report NC-TR-96-048, ESPRIT Working Group in Neural and Computational Learning, NeuroCOLT 8556.
Grünwald, P. D., Myung, I. J., and Pitt, M. A. (2005). Advances in minimum description length: Theory and applications. MIT Press.
Grüttner, M., Sehnke, F., Schaul, T., and Schmidhuber, J. (2010). Multi-Dimensional Deep Memory Atari-Go Players for Parameter Exploring Policy Gradients. In Proceedings of the International Conference on Artificial Neural Networks ICANN, pages 114–123. Springer.
Guyon, I., Vapnik, V., Boser, B., Bottou, L., and Solla, S. A. (1992). Structural risk minimization for character recognition. In Lippman, D. S., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems (NIPS) 4, pages 471–479. Morgan Kaufmann.
Hadamard, J. (1908). Mémoire sur le problème d'analyse relatif à l'équilibre des plaques élastiques encastrées. Mémoires présentés par divers savants à l'Académie des sciences de l'Institut de France: Extrait. Imprimerie nationale.
Hansen, N., Müller, S. D., and Koumoutsakos, P. (2003). Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary Computation, 11(1):1–18.
Hansen, N. and Ostermeier, A. (2001). Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195.
Hanson, S. J. and Pratt, L. Y. (1989). Comparing biases for minimal network construction with backpropagation. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems (NIPS) 1, pages 177–185. San Mateo, CA: Morgan Kaufmann.
Happel, B. L. and Murre, J. M. (1994). Design and evolution of modular neural network architectures. Neural Networks, 7(6):985–1004.
Hashem, S. and Schmeiser, B. (1992). Improving model accuracy using optimal linear combinations of trained neural networks. IEEE Transactions on Neural Networks, 6:792–794.
Hassibi, B. and Stork, D. G. (1993). Second order derivatives for network pruning: Optimal brain surgeon. In Lippman, D. S., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 5, pages 164–171. Morgan Kaufmann.
Hastie, T. J. and Tibshirani, R. J. (1990). Generalized additive models. Monographs on Statistics and Applied Probability, 43.
Hawkins, J. and George, D. (2006). Hierarchical Temporal Memory - Concepts, Theory, and Terminology. Numenta Inc.
Haykin, S. S. (2001). Kalman filtering and neural networks. Wiley Online Library.
Hebb, D. O. (1949). The Organization of Behavior. Wiley, New York.
Hecht-Nielsen, R. (1989). Theory of the backpropagation neural network.
In International Joint Conference on Neural Networks (IJCNN), pages 593–605. IEEE.
Heemskerk, J. N. (1995). Overview of neural hardware. Neurocomputers for Brain-Style Processing. Design, Implementation and Application.
Hertz, J., Krogh, A., and Palmer, R. (1991). Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City.
Hestenes, M. R. and Stiefel, E. (1952). Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49:409–436.
Hihi, S. E. and Bengio, Y. (1996). Hierarchical recurrent neural networks for long-term dependencies. In Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E., editors, Advances in Neural Information Processing Systems 8, pages 493–499. MIT Press.
Hinton, G. and Salakhutdinov, R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
Hinton, G. E. (1989). Connectionist learning procedures. Artificial Intelligence, 40(1):185–234.
Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Comp., 14(8):1771–1800.
Hinton, G. E., Dayan, P., Frey, B. J., and Neal, R. M. (1995). The wake-sleep algorithm for unsupervised neural networks. Science, 268:1158–1160.
Hinton, G. E., Deng, L., Yu, D., Dahl, G. E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., and Kingsbury, B. (2012a). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag., 29(6):82–97.
Hinton, G. E. and Ghahramani, Z. (1997). Generative models for discovering sparse distributed representations. Philosophical Transactions of the Royal Society B, 352:1177–1190.
Hinton, G. E., Osindero, S., and Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554.
Hinton, G. E. and Sejnowski, T. E. (1986). Learning and relearning in Boltzmann machines. In Parallel Distributed Processing, volume 1, pages 282–317. MIT Press.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. R. (2012b). Improving neural networks by preventing co-adaptation of feature detectors. Technical Report arXiv:1207.0580.
Hinton, G. E. and van Camp, D. (1993). Keeping neural networks simple. In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 11–18. Springer.
Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Institut für Informatik, Lehrstuhl Prof. Brauer, Technische Universität München. Advisor: J. Schmidhuber.
Hochreiter, S., Bengio, Y., Frasconi, P., and Schmidhuber, J. (2001a). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In Kremer, S. C. and Kolen, J. F., editors, A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press.
Hochreiter, S. and Obermayer, K. (2005). Sequence classification for protein analysis. In Snowbird Workshop, Snowbird, Utah. Computational and Biological Learning Society.
Hochreiter, S. and Schmidhuber, J. (1996). Bridging long time lags by weight guessing and "Long Short-Term Memory". In Silva, F. L., Principe, J. C., and Almeida, L. B., editors, Spatiotemporal models in biological and artificial systems, pages 65–72. IOS Press, Amsterdam, Netherlands. Serie: Frontiers in Artificial Intelligence and Applications, Volume 37.
Hochreiter, S. and Schmidhuber, J. (1997a). Flat minima. Neural Computation, 9(1):1–42.
Hochreiter, S. and Schmidhuber, J. (1997b).
Long Short-Term Memory. Neural Computation, 9(8):1735–1780. Based on TR FKI-207-95, TUM (1995).
Hochreiter, S. and Schmidhuber, J. (1999). Feature extraction through LOCOCODE. Neural Computation, 11(3):679–714.
Hochreiter, S., Younger, A. S., and Conwell, P. R. (2001b). Learning to learn using gradient descent. In Lecture Notes on Comp. Sci. 2130, Proc. Intl. Conf. on Artificial Neural Networks (ICANN-2001), pages 87–94. Springer: Berlin, Heidelberg.
Hodgkin, A. L. and Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117(4):500.
Holden, S. B. (1994). On the Theory of Generalization and Self-Structuring in Linearly Weighted Connectionist Networks. PhD thesis, Cambridge University, Engineering Department.
Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proc. of the National Academy of Sciences, 79:2554–2558.
Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366.
Hubel, D. H. and Wiesel, T. (1962). Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex. Journal of Physiology (London), 160:106–154.
Huffman, D. A. (1952). A method for construction of minimum-redundancy codes. Proceedings IRE, 40:1098–1101.
Hutter, M. (2002). The fastest and shortest algorithm for all well-defined problems. International Journal of Foundations of Computer Science, 13(3):431–443. (On J. Schmidhuber's SNF grant 20-61847).
Hutter, M. (2005). Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Springer, Berlin. (On J. Schmidhuber's SNF grant 20-61847).
Hyvärinen, A., Hoyer, P., and Oja, E. (1999). Sparse code shrinkage: Denoising by maximum likelihood estimation. In Kearns, M., Solla, S. A., and Cohn, D., editors, Advances in Neural Information Processing Systems (NIPS) 12. MIT Press.
Hyvärinen, A., Karhunen, J., and Oja, E. (2004). Independent component analysis. John Wiley & Sons.
ICPR 2012 Contest on Mitosis Detection in Breast Cancer Histological Images (2012). IPAL Laboratory and TRIBVN Company and Pitie-Salpetriere Hospital and CIALAB of Ohio State Univ., http://ipal.cnrs.fr/ICPR2012/.
Igel, C. (2003). Neuroevolution for reinforcement learning using evolution strategies. In Reynolds, R., Abbass, H., Tan, K. C., Mckay, B., Essam, D., and Gedeon, T., editors, Congress on Evolutionary Computation (CEC 2003), volume 4, pages 2588–2595. IEEE.
Ikeda, S., Ochiai, M., and Sawaragi, Y. (1976). Sequential GMDH algorithm and its application to river flow prediction. IEEE Transactions on Systems, Man and Cybernetics, (7):473–479.
Indermuhle, E., Frinken, V., and Bunke, H. (2012). Mode detection in online handwritten documents using BLSTM neural networks. In Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on, pages 302–307. IEEE.
Indermuhle, E., Frinken, V., Fischer, A., and Bunke, H. (2011). Keyword spotting in online handwritten documents containing text and non-text using BLSTM neural networks. In Document Analysis and Recognition (ICDAR), 2011 International Conference on, pages 73–77. IEEE.
Ivakhnenko, A. G. (1968). The group method of data handling – a rival of the method of stochastic approximation. Soviet Automatic Control, 13(3):43–55.
Ivakhnenko, A. G. (1971). Polynomial theory of complex systems. IEEE Transactions on Systems, Man and Cybernetics, (4):364–378.
Ivakhnenko, A. G. (1995). The review of problems solvable by algorithms of the group method of data handling (GMDH). Pattern Recognition and Image Analysis / Raspoznavaniye Obrazov I Analiz Izobrazhenii, 5:527–535.
Ivakhnenko, A. G. and Lapa, V. G. (1965). Cybernetic Predicting Devices. CCM Information Corporation.
Ivakhnenko, A. G., Lapa, V. G., and McDonough, R. N. (1967). Cybernetics and forecasting techniques. American Elsevier, NY.
Izhikevich, E. M. et al. (2003). Simple model of spiking neurons. IEEE Transactions on Neural Networks, 14(6):1569–1572.
Jaakkola, T., Singh, S. P., and Jordan, M. I. (1995). Reinforcement learning algorithm for partially observable Markov decision problems. In Tesauro, G., Touretzky, D. S., and Leen, T. K., editors, Advances in Neural Information Processing Systems (NIPS) 7, pages 345–352. MIT Press.
Jackel, L., Boser, B., Graf, H.-P., Denker, J., LeCun, Y., Henderson, D., Matan, O., Howard, R., and Baird, H. (1990). VLSI implementation of electronic neural networks: an example in character recognition. In IEEE, editor, IEEE International Conference on Systems, Man, and Cybernetics, pages 320–322, Los Angeles, CA.
Jacob, C., Lindenmayer, A., and Rozenberg, G. (1994). Genetic L-System Programming. In Parallel Problem Solving from Nature III, Lecture Notes in Computer Science.
Jacobs, R. A. (1988). Increased rates of convergence through learning rate adaptation. Neural Networks, 1(4):295–307.
Jaeger, H. (2001). The "echo state" approach to analysing and training recurrent neural networks. Technical Report GMD Report 148, German National Research Center for Information Technology.
Jaeger, H. (2002). Short term memory in echo state networks. GMD-Report 152, GMD - German National Research Institute for Computer Science.
Jaeger, H. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304:78–80.
Jain, V. and Seung, S. (2009). Natural image denoising with convolutional networks. In Koller, D., Schuurmans, D., Bengio, Y., and Bottou, L., editors, Advances in Neural Information Processing Systems (NIPS) 21, pages 769–776. Curran Associates, Inc.
Jameson, J. (1991). Delayed reinforcement learning with multiple time scale hierarchical backpropagated adaptive critics. In Neural Networks for Control.
Ji, S., Xu, W., Yang, M., and Yu, K. (2013). 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):221–231.
Jim, K., Giles, C. L., and Horne, B. G. (1995). Effects of noise on convergence and generalization in recurrent networks. In Tesauro, G., Touretzky, D., and Leen, T., editors, Advances in Neural Information Processing Systems (NIPS) 7, page 649. San Mateo, CA: Morgan Kaufmann.
Jin, X., Lujan, M., Plana, L. A., Davies, S., Temple, S., and Furber, S. B. (2010). Modeling spiking neural networks on SpiNNaker. Computing in Science & Engineering, 12(5):91–97.
Jodogne, S. R. and Piater, J. H. (2007). Closed-loop learning of visual control policies. J. Artificial Intelligence Research, 28:349–391.
Jordan, M. I. (1986). Serial order: A parallel distributed processing approach. Technical Report ICS Report 8604, Institute for Cognitive Science, University of California, San Diego.
Jordan, M. I. (1988). Supervised learning and systems with excess degrees of freedom.
Technical Report COINS TR 88-27, Massachusetts Institute of Technology.
Jordan, M. I. and Rumelhart, D. E. (1990). Supervised learning with a distal teacher. Technical Report Occasional Paper #40, Center for Cog. Sci., Massachusetts Institute of Technology.
Jordan, M. I. and Sejnowski, T. J. (2001). Graphical models: Foundations of neural computation. MIT Press.
Juang, C.-F. (2004). A hybrid of genetic algorithm and particle swarm optimization for recurrent network design. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 34(2):997–1006.
Judd, J. S. (1990). Neural network design and the complexity of learning. Neural network modeling and connectionism. MIT Press.
Jutten, C. and Herault, J. (1991). Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24(1):1–10.
Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. (1995). Planning and acting in partially observable stochastic domains. Technical report, Brown University, Providence RI.
Kaelbling, L. P., Littman, M. L., and Moore, A. W. (1996). Reinforcement learning: a survey. Journal of AI Research, 4:237–285.
Kalinke, Y. and Lehmann, H. (1998). Computation in recurrent neural networks: From counters to iterated function systems. In Antoniou, G. and Slaney, J., editors, Advanced Topics in Artificial Intelligence, Proceedings of the 11th Australian Joint Conference on Artificial Intelligence, volume 1502 of LNAI, Berlin, Heidelberg. Springer.
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45.
Karhunen, J. and Joutsensalo, J. (1995). Generalizations of principal component analysis, optimization problems, and neural networks. Neural Networks, 8(4):549–562.
Kasabov, N. K. (2014). NeuCube: A spiking neural network architecture for mapping, learning and understanding of spatio-temporal brain data. Neural Networks.
Kelley, H. J. (1960). Gradient theory of optimal flight paths. ARS Journal, 30(10):947–954.
Kempter, R., Gerstner, W., and Van Hemmen, J. L. (1999). Hebbian learning and spiking neurons. Physical Review E, 59(4):4498.
Kerlirzin, P. and Vallet, F. (1993). Robustness in multilayer perceptrons. Neural Computation, 5(1):473–482.
Khan, M. M., Lester, D. R., Plana, L. A., Rast, A., Jin, X., Painkras, E., and Furber, S. B. (2008). SpiNNaker: mapping neural networks onto a massively-parallel chip multiprocessor. In International Joint Conference on Neural Networks (IJCNN), pages 2849–2856. IEEE.
Kimura, H., Miyazaki, K., and Kobayashi, S. (1997). Reinforcement learning in POMDPs with function approximation. In ICML, volume 97, pages 152–160.
Kistler, W. M., Gerstner, W., and van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9(5):1015–1045.
Kitano, H. (1990). Designing neural networks using genetic algorithms with graph generation system. Complex Systems, 4:461–476.
Klapper-Rybicka, M., Schraudolph, N. N., and Schmidhuber, J. (2001). Unsupervised learning in LSTM recurrent neural networks. In Lecture Notes on Comp. Sci. 2130, Proc. Intl. Conf. on Artificial Neural Networks (ICANN-2001), pages 684–691. Springer: Berlin, Heidelberg.
Kohl, N. and Stone, P. (2004). Policy gradient reinforcement learning for fast quadrupedal locomotion. In Robotics and Automation, 2004. Proceedings. ICRA'04. 2004 IEEE International Conference on, volume 3, pages 2619–2624. IEEE.
Kohonen, T. (1972). Correlation matrix memories.
Computers, IEEE Transactions on, 100(4):353–359.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43(1):59–69.
Kohonen, T. (1988). Self-Organization and Associative Memory. Springer, second edition.
Kolmogorov, A. N. (1965a). On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition. Doklady Akademii Nauk USSR, 114:679–681.
Kolmogorov, A. N. (1965b). Three approaches to the quantitative definition of information. Problems of Information Transmission, 1:1–11.
Kompella, V. R., Luciw, M. D., and Schmidhuber, J. (2012). Incremental slow feature analysis: Adaptive low-complexity slow feature updating from high-dimensional input streams. Neural Computation, 24(11):2994–3024.
Kondo, T. (1998). GMDH neural network algorithm using the heuristic self-organization method and its application to the pattern identification problem. In Proceedings of the 37th SICE Annual Conference SICE'98, pages 1143–1148. IEEE.
Kondo, T. and Ueno, J. (2008). Multi-layered GMDH-type neural network self-selecting optimum neural network architecture and its application to 3-dimensional medical image recognition of blood vessels. International Journal of Innovative Computing, Information and Control, 4(1):175–187.
Kordík, P., Náplava, P., Snorek, M., and Genyk-Berezovskyj, M. (2003). Modified GMDH method and models quality evaluation by visualization. Control Systems and Computers, 2:68–75.
Korkin, M., de Garis, H., Gers, F., and Hemmi, H. (1997). CBM (CAM-Brain Machine) - a hardware tool which evolves a neural net module in a fraction of a second and runs a million neuron artificial brain in real time.
Kosko, B. (1990). Unsupervised learning in noise. IEEE Transactions on Neural Networks, 1(1):44–57.
Koutník, J., Cuccu, G., Schmidhuber, J., and Gomez, F. (July 2013). Evolving large-scale neural networks for vision-based reinforcement learning. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pages 1061–1068, Amsterdam. ACM.
Koutník, J., Gomez, F., and Schmidhuber, J. (2010). Evolving neural networks in compressed weight space. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, pages 619–626.
Koutník, J., Greff, K., Gomez, F., and Schmidhuber, J. (2014). A Clockwork RNN. Technical Report arXiv:1402.3511 [cs.NE], The Swiss AI Lab IDSIA. To appear at ICML'2014.
Koza, J. R. (1992). Genetic Programming – On the Programming of Computers by Means of Natural Selection. MIT Press.
Kramer, M. (1991). Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal, 37:233–243.
Kremer, S. C. and Kolen, J. F. (2001). Field guide to dynamical recurrent networks. Wiley-IEEE Press.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS 2012), page 4.
Krogh, A. and Hertz, J. A. (1992). A simple weight decay can improve generalization. In Lippman, D. S., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 4, pages 950–957. Morgan Kaufmann.
Kurzweil, R. (2012). How to Create a Mind: The Secret of Human Thought Revealed.
Lagoudakis, M. G. and Parr, R. (2003). Least-squares policy iteration. JMLR, 4:1107–1149.
Lang, K., Waibel, A., and Hinton, G. E. (1990). A time-delay neural network architecture for isolated word recognition.
Lange, S. and Riedmiller, M. (2010). Deep auto-encoder neural networks in reinforcement learning. In Neural Networks (IJCNN), The 2010 International Joint Conference on, pages 1–8.
Lapedes, A. and Farber, R. (1986). A self-optimizing, nonsymmetrical neural net for content addressable memory and pattern recognition. Physica D, 22:247–259.
Larrañaga, P. and Lozano, J. A. (2001). Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer Academic Publishers, Norwell, MA, USA.
Le, Q. V., Ranzato, M., Monga, R., Devin, M., Corrado, G., Chen, K., Dean, J., and Ng, A. Y. (2012). Building high-level features using large scale unsupervised learning. In Proc. ICML'12.
LeCun, Y. (1985). Une procédure d'apprentissage pour réseau à seuil asymétrique. Proceedings of Cognitiva 85, Paris, pages 599–604.
LeCun, Y. (1988). A theoretical framework for back-propagation. In Touretzky, D., Hinton, G., and Sejnowski, T., editors, Proceedings of the 1988 Connectionist Models Summer School, pages 21–28, CMU, Pittsburgh, Pa. Morgan Kaufmann.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1989). Back-propagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1990a). Handwritten digit recognition with a back-propagation network. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 2, pages 396–404. Morgan Kaufmann.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.
LeCun, Y., Denker, J. S., and Solla, S. A. (1990b). Optimal brain damage. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 2, pages 598–605. Morgan Kaufmann.
LeCun, Y., Muller, U., Cosatto, E., and Flepp, B. (2006). Off-road obstacle avoidance through end-to-end learning. In Advances in Neural Information Processing Systems (NIPS 2005).
LeCun, Y., Simard, P., and Pearlmutter, B. (1993). Automatic learning rate maximization by on-line estimation of the Hessian's eigenvectors. In Hanson, S., Cowan, J., and Giles, L., editors, Advances in Neural Information Processing Systems (NIPS 1992), volume 5. Morgan Kaufmann Publishers, San Mateo, CA.
Lee, H., Battle, A., Raina, R., and Ng, A. Y. (2007a). Efficient sparse coding algorithms. In Advances in Neural Information Processing Systems (NIPS) 19, pages 801–808.
Lee, H., Ekanadham, C., and Ng, A. Y. (2007b). Sparse deep belief net model for visual area V2. In Advances in Neural Information Processing Systems (NIPS), volume 7, pages 873–880.
Lee, H., Grosse, R., Ranganath, R., and Ng, A. Y. (2009a). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th International Conference on Machine Learning (ICML), pages 609–616.
Lee, H., Pham, P. T., Largman, Y., and Ng, A. Y. (2009b). Unsupervised feature learning for audio classification using convolutional deep belief networks. In Proc. NIPS, volume 9, pages 1096–1104.
Lee, L. (1996). Learning of context-free languages: A survey of the literature. Technical Report TR-12-96, Center for Research in Computing Technology, Harvard University, Cambridge, Massachusetts.
Legendre, A. M. (1805). Nouvelles méthodes pour la détermination des orbites des comètes. F. Didot.
Legenstein, R., Wilbert, N., and Wiskott, L. (2010). Reinforcement learning on slow features of high-dimensional input streams. PLoS Computational Biology, 6(8).
Legenstein, R. A. and Maass, W. (2002). Neural circuits for pattern recognition with small total wire length. Theor. Comput. Sci., 287(1):239–249.
Leibniz, G. W. (1676). Memoir using the chain rule (cited in TMME 7:2&3 p 321-332, 2010).
Lenat, D. B. (1983). Theory formation by heuristic search. Machine Learning, 21.
Lenat, D. B. and Brown, J. S. (1984). Why AM and EURISKO appear to work. Artificial Intelligence, 23(3):269–294.
Levenberg, K. (1944). A method for the solution of certain problems in least squares. Quarterly of applied mathematics, 2:164–168.
Levin, A. U., Leen, T. K., and Moody, J. E. (1994). Fast pruning using principal components. In Advances in Neural Information Processing Systems 6, page 35. Morgan Kaufmann.
Levin, A. U. and Narendra, K. S. (1995). Control of nonlinear dynamical systems using neural networks. II: Observability, identification, and control. IEEE Transactions on Neural Networks, 7(1):30–42.
Levin, L. A. (1973a). On the notion of a random sequence. Soviet Math. Dokl., 14(5):1413–1416.
Levin, L. A. (1973b). Universal sequential search problems. Problems of Information Transmission, 9(3):265–266.
Lewicki, M. S. and Olshausen, B. A. (1998). Inferring sparse, overcomplete image codes using an efficient coding framework. In Jordan, M. I., Kearns, M. J., and Solla, S. A., editors, Advances in Neural Information Processing Systems (NIPS) 10, pages 815–821.
L'Hôpital, G. F. A. (1696). Analyse des infiniment petits, pour l'intelligence des lignes courbes. Paris: L'Imprimerie Royale.
Li, M. and Vitányi, P. M. B. (1997). An Introduction to Kolmogorov Complexity and its Applications (2nd edition). Springer.
Lin, L. (1993). Reinforcement Learning for Robots Using Neural Networks. PhD thesis, Carnegie Mellon University, Pittsburgh.
Lin, T., Horne, B., Tino, P., and Giles, C. (1996). Learning long-term dependencies in NARX recurrent neural networks. IEEE Transactions on Neural Networks, 7(6):1329–1338.
Lin, T., Horne, B. G., Tino, P., and Giles, C. L. (1995). Learning long-term dependencies is not as difficult with NARX recurrent neural networks. Technical Report UMIACS-TR-95-78 and CS-TR-3500, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742.
Lindenmayer, A. (1968). Mathematical models for cellular interaction in development. J. Theoret. Biology, 18:280–315.
Linnainmaa, S. (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's thesis, Univ. Helsinki.
Linnainmaa, S. (1976). Taylor expansion of the accumulated rounding error. BIT Numerical Mathematics, 16(2):146–160.
Linsker, R. (1988). Self-organization in a perceptual network. IEEE Computer, 21:105–117.
Littman, M. L. (1996). Algorithms for Sequential Decision Making. PhD thesis, Brown University.
Littman, M. L., Cassandra, A. R., and Kaelbling, L. P. (1995). Learning policies for partially observable environments: Scaling up. In Prieditis, A. and Russell, S., editors, Machine Learning: Proceedings of the Twelfth International Conference, pages 362–370. Morgan Kaufmann Publishers, San Francisco, CA.
Liu, S.-C., Kramer, J., Indiveri, G., Delbrück, T., Burg, T., Douglas, R., et al. (2001). Orientation-selective aVLSI spiking neurons. Neural Networks, 14(6-7):629–643.
Ljung, L. (1998). System identification. Springer.
Loiacono, D., Cardamone, L., and Lanzi, P. L. (2011). Simulated car racing championship competition software manual. Technical report, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy.
Loiacono, D., Lanzi, P. L., Togelius, J., Onieva, E., Pelta, D. A., Butz, M. V., Lönneker, T. D., Cardamone, L., Perez, D., Sáez, Y., Preuss, M., and Quadflieg, J. (2009). The 2009 simulated car racing championship.
Luciw, M., Kompella, V. R., Kazerounian, S., and Schmidhuber, J. (2013). An intrinsic value system for developing multiple invariant representations with incremental slowness learning. Frontiers in Neurorobotics, 7(9).
Lusci, A., Pollastri, G., and Baldi, P. (2013). Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. Journal of Chemical Information and Modeling, 53(7):1563–1575.
Maas, A. L., Hannun, A. Y., and Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In International Conference on Machine Learning (ICML).
Maass, W. (1996). Lower bounds for the computational power of networks of spiking neurons. Neural Computation, 8(1):1–40.
Maass, W. (1997). Networks of spiking neurons: the third generation of neural network models. Neural Networks, 10(9):1659–1671.
Maass, W. (2000). On the computational power of winner-take-all. Neural Computation, 12:2519–2535.
Maass, W., Natschläger, T., and Markram, H. (2002). Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11):2531–2560.
MacKay, D. J. C. (1992). A practical Bayesian framework for backprop networks. Neural Computation, 4:448–472.
MacKay, D. J. C. and Miller, K. D. (1990). Analysis of Linsker's simulation of Hebbian rules. Neural Computation, 2:173–187.
Maclin, R. and Shavlik, J. W. (1993). Using knowledge-based neural networks to improve algorithms: Refining the Chou-Fasman algorithm for protein folding. Machine Learning, 11(2-3):195–215.
Maclin, R. and Shavlik, J. W. (1995). Combining the predictions of multiple classifiers: Using competitive learning to initialize neural networks. In Proc. IJCAI, pages 524–531.
Madala, H. R. and Ivakhnenko, A. G. (1994). Inductive learning algorithms for complex systems modeling. CRC Press, Boca Raton.
Maei, H. R. and Sutton, R. S. (2010). GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In Proceedings of the Third Conference on Artificial General Intelligence, volume 1, pages 91–96.
Maex, R. and Orban, G. (1996). Model circuit of spiking neurons generating directional selectivity in simple cells. Journal of Neurophysiology, 75(4):1515–1545.
Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22:159.
Maniezzo, V. (1994). Genetic evolution of the topology and weight distribution of neural networks. IEEE Transactions on Neural Networks, 5(1):39–53.
Manolios, P. and Fanelli, R. (1994). First-order recurrent neural networks and deterministic finite state automata. Neural Computation, 6:1155–1173.
Markram, H. (2012). The human brain project. Scientific American, 306(6):50–55.
Marquardt, D. W. (1963). An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441.
Martens, J. (2010). Deep learning via Hessian-free optimization. In Fürnkranz, J. and Joachims, T., editors, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 735–742, Haifa, Israel. Omnipress.
Martens, J. and Sutskever, I. (2011). Learning recurrent neural networks with Hessian-free optimization. In Proceedings of the 28th International Conference on Machine Learning, pages 1033–1040.
Martinetz, T. M., Ritter, H. J., and Schulten, K. J. (1990). Three-dimensional neural net for learning visuomotor coordination of a robot arm. IEEE Transactions on Neural Networks, 1(1):131–136.
Masci, J., Giusti, A., Ciresan, D. C., Fricout, G., and Schmidhuber, J. (2013). A fast learning algorithm for image segmentation with max-pooling convolutional networks. In International Conference on Image Processing (ICIP13), pages 2713–2717.
Matsuoka, K. (1992). Noise injection into inputs in back-propagation learning. IEEE Transactions on Systems, Man, and Cybernetics, 22(3):436–440.
Mayer, H., Gomez, F., Wierstra, D., Nagy, I., Knoll, A., and Schmidhuber, J. (2008). A system for robotic heart surgery that learns to tie knots using recurrent neural networks. Advanced Robotics, 22(13-14):1521–1537.
McCallum, R. A. (1996). Learning to use selective attention and short-term memory in sequential tasks. In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., and Wilson, S. W., editors, From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, pages 315–324. MIT Press, Bradford Books.
McCulloch, W. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 7:115–133.
Melnik, O., Levy, S. D., and Pollack, J. B. (2000). RAAM for infinite context-free languages. In Proc. IJCNN (5), pages 585–590.
Memisevic, R. and Hinton, G. E. (2010). Learning to represent spatial transformations with factored higher-order Boltzmann machines. Neural Computation, 22(6):1473–1492.
Menache, I., Mannor, S., and Shimkin, N. (2002). Q-cut – dynamic discovery of sub-goals in reinforcement learning. In Proc. ECML'02, pages 295–306.
Mesnil, G., Dauphin, Y., Glorot, X., Rifai, S., Bengio, Y., Goodfellow, I., Lavoie, E., Muller, X., Desjardins, G., Warde-Farley, D., Vincent, P., Courville, A., and Bergstra, J. (2011). Unsupervised and transfer learning challenge: a deep learning approach. In JMLR W&CP: Proc. Unsupervised and Transfer Learning, volume 7.
Meuleau, N., Peshkin, L., Kim, K. E., and Kaelbling, L. P. (1999). Learning finite state controllers for partially observable environments. In 15th International Conference of Uncertainty in AI, pages 427–436.
Miglino, O., Lund, H., and Nolfi, S. (1995). Evolving mobile robots in simulated and real environments. Artificial Life, 2(4):417–434.
Miller, G., Todd, P., and Hedge, S. (1989). Designing neural networks using genetic algorithms. In Proceedings of the 3rd International Conference on Genetic Algorithms, pages 379–384. Morgan Kaufmann.
Miller, K. D. (1994). A model for the development of simple cell receptive fields and the ordered arrangement of orientation columns through activity-dependent competition between on- and off-center inputs. Journal of Neuroscience, 14(1):409–441.
Miller, W. T., Werbos, P. J., and Sutton, R. S. (1995). Neural networks for control. MIT Press.
Minai, A. A. and Williams, R. D. (1994). Perturbation response in feedforward networks. Neural Networks, 7(5):783–796.
Minsky, M. (1963). Steps toward artificial intelligence. In Feigenbaum, E. and Feldman, J., editors, Computers and Thought, pages 406–450. McGraw-Hill, New York.
Minsky, M. and Papert, S. (1969). Perceptrons. Cambridge, MA: MIT Press.
Minton, S., Carbonell, J. G., Knoblock, C. A., Kuokka, D. R., Etzioni, O., and Gil, Y. (1989). Explanation-based learning: A problem solving perspective. Artificial Intelligence, 40(1):63–118.
Mitchell, T. (1997). Machine Learning. McGraw Hill.
Mitchell, T. M., Keller, R. M., and Kedar-Cabelli, S. T. (1986). Explanation-based generalization: A unifying view. Machine Learning, 1(1):47–80.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (Dec 2013). Playing Atari with deep reinforcement learning. Technical Report arXiv:1312.5602 [cs.LG], DeepMind Technologies.
Mohamed, A., Dahl, G. E., and Hinton, G. E. (2009). Deep belief networks for phone recognition. In NIPS'22 workshop on deep learning for speech recognition.
Mohamed, A. and Hinton, G. E. (2010). Phone recognition using restricted Boltzmann machines. In Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pages 4354–4357.
Molgedey, L. and Schuster, H. G. (1994). Separation of independent signals using time-delayed correlations. Physical Review Letters, 72(23):3634–3637.
Møller, M. F. (1993). Exact calculation of the product of the Hessian matrix of feed-forward network error functions and a vector in O(N) time. Technical Report PB-432, Computer Science Department, Aarhus University, Denmark.
Montana, D. J. and Davis, L. (1989). Training feedforward neural networks using genetic algorithms. In Proceedings of the 11th International Joint Conference on Artificial Intelligence (IJCAI) - Volume 1, IJCAI'89, pages 762–767, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Montavon, G., Orr, G., and Müller, K. (2012). Neural Networks: Tricks of the Trade. Number LNCS 7700 in Lecture Notes in Computer Science Series. Springer Verlag.
Moody, J. E. (1989). Fast learning in multi-resolution hierarchies. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems (NIPS) 1, pages 29–39. Morgan Kaufmann.
Moody, J. E. (1992). The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems. In Lippman, D. S., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems (NIPS) 4, pages 847–854. Morgan Kaufmann.
Moody, J. E. and Utans, J. (1994). Architecture selection strategies for neural networks: Application to corporate bond rating prediction. In Refenes, A. N., editor, Neural Networks in the Capital Markets. John Wiley & Sons.
Moore, A. and Atkeson, C. (1995). The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. Machine Learning, 21(3):199–233.
Moore, A. and Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13:103–130.
Moriarty, D. E. (1997). Symbiotic Evolution of Neural Networks in Sequential Decision Tasks. PhD thesis, Department of Computer Sciences, The University of Texas at Austin.
Moriarty, D. E. and Miikkulainen, R. (1996). Efficient reinforcement learning through symbiotic evolution. Machine Learning, 22:11–32.
Morimoto, J. and Doya, K. (2000). Robust reinforcement learning. In Leen, T. K., Dietterich, T. G., and Tresp, V., editors, Advances in Neural Information Processing Systems (NIPS) 13, pages 1061–1067. MIT Press.
Mosteller, F. and Tukey, J. W. (1968). Data analysis, including statistics. In Lindzey, G. and Aronson, E., editors, Handbook of Social Psychology, Vol. 2. Addison-Wesley.
Mozer, M. C. (1989). A focused back-propagation algorithm for temporal sequence recognition. Complex Systems, 3:349–381.
Mozer, M. C. (1991). Discovering discrete distributed representations with iterative competitive learning. In Lippmann, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 3, pages 627–634. Morgan Kaufmann.
Mozer, M. C. (1992). Induction of multiscale temporal structure. In Lippman, D. S., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems (NIPS) 4, pages 275–282. Morgan Kaufmann.
Mozer, M. C. and Smolensky, P. (1989). Skeletonization: A technique for trimming the fat from a network via relevance assessment. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems (NIPS) 1, pages 107–115. Morgan Kaufmann.
Muller, U. A., Gunzinger, A., and Guggenbühl, W. (1995). Fast neural net simulation with a DSP processor array. IEEE Transactions on Neural Networks, 6(1):203–213.
Munro, P. W. (1987). A dual back-propagation scheme for scalar reinforcement learning. Proceedings of the Ninth Annual Conference of the Cognitive Science Society, Seattle, WA, pages 165–176.
Murray, A. F. and Edwards, P. J. (1993). Synaptic weight noise during MLP learning enhances fault-tolerance, generalisation and learning trajectory. In Hanson, S. J., Cowan, J. D., and Giles, C. L., editors, Advances in Neural Information Processing Systems (NIPS) 5, pages 491–498. San Mateo, CA: Morgan Kaufmann.
Nadal, J.-P. and Parga, N. (1994). Non-linear neurons in the low noise limit: a factorial code maximises information transfer. Network, 5:565–581.
Nagumo, J., Arimoto, S., and Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proceedings of the IRE, 50(10):2061–2070.
Nair, V. and Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In International Conference on Machine Learning (ICML).
Narendra, K. S. and Parthasarathy, K. (1990). Identification and control of dynamical systems using neural networks. Neural Networks, IEEE Transactions on, 1(1):4–27.
Narendra, K. S. and Thathachar, M. A. L. (1974). Learning automata – a survey. IEEE Transactions on Systems, Man, and Cybernetics, 4:323–334.
Neal, R. M. (2006). Classification with Bayesian neural networks. In Quinonero-Candela, J., Magnini, B., Dagan, I., and D'Alche-Buc, F., editors, Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment, volume 3944 of Lecture Notes in Computer Science, pages 28–32. Springer.
Neal, R. M. and Zhang, J. (2006). High dimensional classification with Bayesian neural networks and Dirichlet diffusion trees. In Guyon, I., Gunn, S., Nikravesh, M., and Zadeh, L. A., editors, Feature Extraction: Foundations and Applications, Studies in Fuzziness and Soft Computing, pages 265–295. Springer.
Neftci, E., Das, S., Pedroni, B., Kreutz-Delgado, K., and Cauwenberghs, G. (2014). Event-driven contrastive divergence for spiking neuromorphic systems. Frontiers in Neuroscience, 7(272).
Neil, D. and Liu, S.-C. (2014). Minitaur, an event-driven FPGA-based spiking network accelerator. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, PP(99):1–8.
Neti, C., Schneider, M. H., and Young, E. D. (1992). Maximally fault tolerant neural networks. IEEE Transactions on Neural Networks, 3:14–23.
Neuneier, R. and Zimmermann, H.-G. (1996). How to train neural networks. In Orr, G. B. and Müller, K.-R., editors, Neural Networks: Tricks of the Trade, volume 1524 of Lecture Notes in Computer Science, pages 373–423. Springer.
Nguyen, N. and Widrow, B. (1989). The truck backer-upper: An example of self learning in neural networks. In Proceedings of the International Joint Conference on Neural Networks, pages 357–363. IEEE Press.
Nilsson, N. J. (1980). Principles of artificial intelligence. Morgan Kaufmann, San Francisco, CA, USA.
Nolfi, S., Floreano, D., Miglino, O., and Mondada, F. (1994a). How to evolve autonomous robots: Different approaches in evolutionary robotics. In Brooks, R. A. and Maes, P., editors, Fourth International Workshop on the Synthesis and Simulation of Living Systems (Artificial Life IV), pages 190–197. MIT.
Nolfi, S., Parisi, D., and Elman, J. L. (1994b). Learning and evolution in neural networks. Adaptive Behavior, 3(1):5–28.
Nowak, E., Jurie, F., and Triggs, B. (2006). Sampling strategies for bag-of-features image classification. In Proc. ECCV 2006, pages 490–503. Springer.
Nowlan, S. J. and Hinton, G. E. (1992). Simplifying neural networks by soft weight sharing. Neural Computation, 4:173–193.
O'Connor, P., Neil, D., Liu, S.-C., Delbruck, T., and Pfeiffer, M. (2013). Real-time classification and sensor fusion with a spiking deep belief network. Frontiers in Neuroscience, 7(178).
Oh, K.-S. and Jung, K. (2004). GPU implementation of neural networks. Pattern Recognition, 37(6):1311–1314.
Oja, E. (1989). Neural networks, principal components, and subspaces. International Journal of Neural Systems, 1(1):61–68.
Oja, E. (1991). Data compression, feature extraction, and autoassociation in feedforward neural networks. In Kohonen, T., Mäkisara, K., Simula, O., and Kangas, J., editors, Artificial Neural Networks, volume 1, pages 737–745. Elsevier Science Publishers B.V., North-Holland.
Olshausen, B. A. and Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607–609.
Omlin, C. and Giles, C. L. (1996). Extraction of rules from discrete-time recurrent neural networks. Neural Networks, 9(1):41–52.
O'Reilly, R. (2003). Making working memory work: A computational model of learning in the prefrontal cortex and basal ganglia. Technical Report ICS-03-03, ICS.
O'Reilly, R. C. (1996). Biologically plausible error-driven learning using local activation differences: The generalized recirculation algorithm. Neural Computation, 8(5):895–938.
Orr, G. and Müller, K. (1998). Neural Networks: Tricks of the Trade. Number LNCS 1524 in Lecture Notes in Computer Science Series. Springer Verlag.
Ostrovskii, G. M., Volin, Y. M., and Borisov, W. W. (1971). Über die Berechnung von Ableitungen. Wiss. Z. Tech. Hochschule für Chemie, 13:382–384.
Otte, S., Krechel, D., Liwicki, M., and Dengel, A. (2012). Local feature based online mode detection with recurrent neural networks. In Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition, pages 533–537. IEEE Computer Society.
Oudeyer, P.-Y., Baranes, A., and Kaplan, F. (2013). Intrinsically motivated learning of real world sensorimotor skills with developmental constraints. In Baldassarre, G. and Mirolli, M., editors, Intrinsically Motivated Learning in Natural and Artificial Systems. Springer.
O'Reilly, R. C., Wyatte, D., Herd, S., Mingus, B., and Jilk, D. J. (2013). Recurrent processing during object recognition. Frontiers in Psychology, 4:124.
Pachitariu, M. and Sahani, M. (2013). Regularization and nonlinearities for neural language models: when are they needed? arXiv preprint arXiv:1301.5650.
Palm, G. (1980). On associative memory. Biological Cybernetics, 36.
Palm, G. (1992). On the information storage capacity of local learning rules. Neural Computation, 4(2):703–711.
Parker, D. B. (1985). Learning-logic. Technical Report TR-47, Center for Comp. Research in Economics and Management Sci., MIT.
Pascanu, R., Gulcehre, C., Cho, K., and Bengio, Y. (2013a). How to construct deep recurrent neural networks. arXiv preprint arXiv:1312.6026.
Pascanu, R., Mikolov, T., and Bengio, Y. (2013b). On the difficulty of training recurrent neural networks. In ICML'13: JMLR: W&CP volume 28.
Pasemann, F., Steinmetz, U., and Dieckman, U. (1999). Evolving structure and function of neurocontrollers. In Angeline, P. J., Michalewicz, Z., Schoenauer, M., Yao, X., and Zalzala, A., editors, Proceedings of the Congress on Evolutionary Computation, volume 3, pages 1973–1978, Mayflower Hotel, Washington D.C., USA. IEEE Press.
Pearlmutter, B. A. (1989). Learning state space trajectories in recurrent neural networks. Neural Computation, 1(2):263–269.
Pearlmutter, B. A. (1994). Fast exact multiplication by the Hessian. Neural Computation, 6(1):147–160.
Pearlmutter, B. A. (1995). Gradient calculations for dynamic recurrent neural networks: A survey. IEEE Transactions on Neural Networks, 6(5):1212–1228.
Pearlmutter, B. A. and Hinton, G. E. (1986). G-maximization: An unsupervised learning procedure for discovering regularities. In Denker, J. S., editor, Neural Networks for Computing: American Institute of Physics Conference Proceedings 151, volume 2, pages 333–338.
Peng, J. and Williams, R. J. (1996). Incremental multi-step Q-learning. Machine Learning, 22:283–290.
Pérez-Ortiz, J. A., Gers, F. A., Eck, D., and Schmidhuber, J. (2003). Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets. Neural Networks, 16:241–250.
Peters, J. (2010). Policy gradient methods. Scholarpedia, 5(11):3698.
Peters, J. and Schaal, S. (2008a). Natural actor-critic. Neurocomputing, 71:1180–1190.
Peters, J. and Schaal, S. (2008b). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4):682–697.
Pham, V., Kermorvant, C., and Louradour, J. (2013). Dropout Improves Recurrent Neural Networks for Handwriting Recognition. arXiv preprint arXiv:1312.4569.
Pineda, F. J. (1987). Generalization of back-propagation to recurrent neural networks. Physical Review Letters, 59(19):2229–2232.
Plate, T. A. (1993). Holographic recurrent networks. In Hanson, S. J., Cowan, J. D., and Giles, C. L., editors, Advances in Neural Information Processing Systems (NIPS) 5, pages 34–41. Morgan Kaufmann.
Plumbley, M. D. (1991). On information theory and unsupervised neural networks. Dissertation, published as technical report CUED/F-INFENG/TR.78, Engineering Department, Cambridge University.
Pollack, J. B. (1988). Implications of recursive distributed representations. In Proc. NIPS, pages 527–536.
Pollack, J. B. (1990). Recursive distributed representation. Artificial Intelligence, 46:77–105.
Pontryagin, L. S., Boltyanskii, V. G., Gamkrelidze, R. V., and Mishchenko, E. F. (1961). The Mathematical Theory of Optimal Processes.
Post, E. L. (1936). Finite combinatory processes-formulation 1. The Journal of Symbolic Logic, 1(3):103–105.
Precup, D., Sutton, R. S., and Singh, S. (1998). Multi-time models for temporally abstract planning. pages 1050–1056. Morgan Kaufmann.
Prokhorov, D. (2010). A convolutional learning system for object classification in 3-D LIDAR data. IEEE Transactions on Neural Networks, 21(5):858–863.
Prokhorov, D., Puskorius, G., and Feldkamp, L. (2001). Dynamical neural networks for control. In Kolen, J. and Kremer, S., editors, A field guide to dynamical recurrent networks, pages 23–78. IEEE Press.
Prokhorov, D. and Wunsch, D. (1997). Adaptive critic design. IEEE Transactions on Neural Networks, 8(5):997–1007.
Prokhorov, D. V., Feldkamp, L. A., and Tyukin, I. Y. (2002). Adaptive behavior with fixed weights in RNN: an overview. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN), pages 2018–2023.
Puskorius, G. V. and Feldkamp, L. A. (1994). Neurocontrol of nonlinear dynamical systems with Kalman filter trained recurrent networks. IEEE Transactions on Neural Networks, 5(2):279–297.
Raiko, T., Valpola, H., and LeCun, Y. (2012). Deep learning made easier by linear transformations in perceptrons. In International Conference on Artificial Intelligence and Statistics, pages 924–932.
Raina, R., Madhavan, A., and Ng, A. (2009). Large-scale deep unsupervised learning using graphics processors. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML), pages 873–880. ACM.
Ramacher, U., Raab, W., Anlauf, J., Hachmann, U., Beichter, J., Bruels, N., Wesseling, M., Sicheneder, E., Maenner, R., Glaess, J., and Wurz, A. (1993). Multiprocessor and memory architecture of the neurocomputer SYNAPSE-1. International Journal of Neural Systems, 4(4):333–336.
Ranzato, M., Poultney, C., Chopra, S., and LeCun, Y. (2006). Efficient learning of sparse representations with an energy-based model. In et al., J. P., editor, Advances in Neural Information Processing Systems (NIPS 2006). MIT Press.
Ranzato, M. A., Huang, F., Boureau, Y., and LeCun, Y. (2007). Unsupervised learning of invariant feature hierarchies with applications to object recognition. In Proc. Computer Vision and Pattern Recognition Conference (CVPR'07), pages 1–8. IEEE Press.
Rechenberg, I. (1971). Evolutionsstrategie - Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Dissertation. Published 1973 by Fromman-Holzboog.
Redlich, A. N. (1993). Redundancy reduction as a strategy for unsupervised learning. Neural Computation, 5:289–304.
Refenes, N. A., Zapranis, A., and Francis, G. (1994). Stock performance modeling using neural networks: a comparative study with regression models. Neural Networks, 7(2):375–388.
Riedmiller, M. (2005). Neural fitted Q iteration – first experiences with a data efficient neural reinforcement learning method. In Proc. ECML-2005, pages 317–328. Springer-Verlag Berlin Heidelberg.
Riedmiller, M. and Braun, H. (1993). A direct adaptive method for faster backpropagation learning: The Rprop algorithm. In Proc. IJCNN, pages 586–591. IEEE Press.
Riedmiller, M., Lange, S., and Voigtlaender, A. (2012). Autonomous reinforcement learning on raw visual input data in a real world application. In International Joint Conference on Neural Networks (IJCNN), pages 1–8, Brisbane, Australia.
Riesenhuber, M. and Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nat. Neurosci., 2(11):1019–1025.
Ring, M., Schaul, T., and Schmidhuber, J. (2011). The two-dimensional organization of behavior. In Proceedings of the First Joint Conference on Development and Learning and on Epigenetic Robotics (ICDL-EpiRob), Frankfurt.
Ring, M. B. (1991). Incremental development of complex behaviors through automatic construction of sensory-motor hierarchies. In Birnbaum, L. and Collins, G., editors, Machine Learning: Proceedings of the Eighth International Workshop, pages 343–347. Morgan Kaufmann.
Ring, M. B. (1993). Learning sequential tasks by incrementally adding higher orders. In Hanson, S. J., Cowan, J. D., and Giles, C. L., editors, Advances in Neural Information Processing Systems 5, pages 115–122. Morgan Kaufmann.
Ring, M. B. (1994). Continual Learning in Reinforcement Environments. PhD thesis, University of Texas at Austin, Austin, Texas 78712.
Rissanen, J. (1986). Stochastic complexity and modeling. The Annals of Statistics, 14(3):1080–1100.
Ritter, H. and Kohonen, T. (1989). Self-organizing semantic maps. Biological Cybernetics, 61(4):241–254.
Robinson, A. J. and Fallside, F. (1987). The utility driven dynamic error propagation network. Technical Report CUED/F-INFENG/TR.1, Cambridge University Engineering Department.
Robinson, T. and Fallside, F. (1989). Dynamic reinforcement driven error propagation networks with application to game playing. In Proceedings of the 11th Conference of the Cognitive Science Society, Ann Arbor, pages 836–843.
Rodriguez, P. and Wiles, J. (1998). Recurrent neural networks can learn to implement symbol-sensitive counting. In Advances in Neural Information Processing Systems (NIPS), volume 10, pages 87–93. The MIT Press.
Rodriguez, P., Wiles, J., and Elman, J. (1999). A recurrent neural network that learns to count. Connection Science, 11(1):5–40.
Roggen, D., Hofmann, S., Thoma, Y., and Floreano, D. (2003). Hardware spiking neural network with run-time reconfigurable connectivity in an autonomous robot. In Proc. NASA/DoD Conference on Evolvable Hardware, 2003, pages 189–198. IEEE.
Rohwer, R. (1989). The 'moving targets' training method. In Kindermann, J. and Linden, A., editors, Proceedings of 'Distributed Adaptive Neural Information Processing', St. Augustin, 24.-25.5. Oldenbourg.
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review, 65(6):386.
Rosenblatt, F. (1962). Principles of Neurodynamics. Spartan, New York.
Roux, L., Racoceanu, D., Lomenie, N., Kulikova, M., Irshad, H., Klossa, J., Capron, F., Genestie, C., Naour, G. L., and Gurcan, M. N. (2013). Mitosis detection in breast cancer histological images - an ICPR 2012 contest. J. Pathol. Inform., 4:8.
Rubner, J. and Schulten, K. (1990). Development of feature detectors by self-organization: A network model. Biological Cybernetics, 62:193–199.
Rubner, J. and Tavan, P. (1989). A self-organization network for principal-component analysis. Europhysics Letters, 10:693–698.
Rückstieß, T., Felder, M., and Schmidhuber, J. (2008). State-Dependent Exploration for policy gradient methods. In et al., W. D., editor, European Conference on Machine Learning (ECML) and Principles and Practice of Knowledge Discovery in Databases 2008, Part II, LNAI 5212, pages 234–249.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning internal representations by error propagation. In Rumelhart, D. E. and McClelland, J. L., editors, Parallel Distributed Processing, volume 1, pages 318–362. MIT Press.
Rumelhart, D. E. and Zipser, D. (1986). Feature discovery by competitive learning. In Parallel Distributed Processing, pages 151–193. MIT Press.
Rummery, G. and Niranjan, M. (1994). On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG-TR 166, Cambridge University, UK.
Russell, S. J., Norvig, P., Canny, J. F., Malik, J. M., and Edwards, D. D. (1995). Artificial Intelligence: a Modern Approach, volume 2. Englewood Cliffs: Prentice Hall.
Saito, K. and Nakano, R. (1997). Partial BFGS update and efficient step-length calculation for three-layer neural networks. Neural Computation, 9(1):123–141.
Salakhutdinov, R. and Hinton, G. (2009). Semantic hashing. Int. J. Approx. Reasoning, 50(7):969–978.
Sałustowicz, R. P. and Schmidhuber, J. (1997). Probabilistic incremental program evolution. Evolutionary Computation, 5(2):123–141.
Samejima, K., Doya, K., and Kawato, M. (2003). Inter-module credit assignment in modular reinforcement learning. Neural Networks, 16(7):985–994.
Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal on Research and Development, 3:210–229.
Sanger, T. D. (1989). An optimality principle for unsupervised learning. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems (NIPS) 1, pages 11–19. Morgan Kaufmann.
Santamaría, J. C., Sutton, R. S., and Ram, A. (1997). Experiments with reinforcement learning in problems with continuous state and action spaces. Adaptive Behavior, 6(2):163–217.
Saravanan, N. and Fogel, D. B. (1995). Evolving neural control systems. IEEE Expert, pages 23–27.
Saund, E. (1994). Unsupervised learning of mixtures of multiple causes in binary data. In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems (NIPS) 6, pages 27–34. Morgan Kaufmann.
Schäfer, A. M., Udluft, S., and Zimmermann, H.-G. (2006). Learning long term dependencies with recurrent neural networks. In Kollias, S. D., Stafylopatis, A., Duch, W., and Oja, E., editors, ICANN (1), volume 4131 of Lecture Notes in Computer Science, pages 71–80. Springer.
Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5:197–227.
Schaul, T. and Schmidhuber, J. (2010). Metalearning. Scholarpedia, 6(5):4650.
Schaul, T., Zhang, S., and LeCun, Y. (2013). No more pesky learning rates. In Proc. 30th International Conference on Machine Learning (ICML).
Scherer, D., Müller, A., and Behnke, S. (2010). Evaluation of pooling operations in convolutional architectures for object recognition. In Proc. International Conference on Artificial Neural Networks (ICANN), pages 92–101.
Schmidhuber, J. (1987). Evolutionary principles in self-referential learning. Diploma thesis, Institut für Informatik, Technische Universität München. http://www.idsia.ch/~juergen/diploma.html.
Schmidhuber, J. (1989a). Accelerated learning in back-propagation nets. In Pfeifer, R., Schreter, Z., Fogelman, Z., and Steels, L., editors, Connectionism in Perspective, pages 429–438. Amsterdam: Elsevier, North-Holland.
Schmidhuber, J. (1989b). A local learning algorithm for dynamic feedforward and recurrent networks. Connection Science, 1(4):403–412.
Schmidhuber, J. (1990a). Dynamische neuronale Netze und das fundamentale raumzeitliche Lernproblem. Dissertation, Institut für Informatik, Technische Universität München.
Schmidhuber, J. (1990b). Learning algorithms for networks with internal and external feedback. In Touretzky, D. S., Elman, J. L., Sejnowski, T. J., and Hinton, G. E., editors, Proc. of the 1990 Connectionist Models Summer School, pages 52–61. Morgan Kaufmann.
Schmidhuber, J. (1990c). The Neural Heat Exchanger. Talks at TU Munich (1990), University of Colorado at Boulder (1992), and Z. Li's NIPS*94 workshop on unsupervised learning. Also published at the Intl. Conference on Neural Information Processing (ICONIP'96), vol. 1, pages 194-197, 1996.
Schmidhuber, J. (1990d). An on-line algorithm for dynamic reinforcement learning and planning in reactive environments. In Proc. IEEE/INNS International Joint Conference on Neural Networks, San Diego, volume 2, pages 253–258.
Schmidhuber, J. (1991a). Curious model-building control systems. In Proceedings of the International Joint Conference on Neural Networks, Singapore, volume 2, pages 1458–1463. IEEE press.
Schmidhuber, J. (1991b). Learning to generate sub-goals for action sequences. In Kohonen, T., Mäkisara, K., Simula, O., and Kangas, J., editors, Artificial Neural Networks, pages 967–972. Elsevier Science Publishers B.V., North-Holland.
Schmidhuber, J. (1991c). Reinforcement learning in Markovian and non-Markovian environments. In Lippman, D. S., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems 3 (NIPS 3), pages 500–506. Morgan Kaufmann.
Schmidhuber, J. (1992a). A fixed size storage O(n³) time complexity learning algorithm for fully recurrent continually running networks. Neural Computation, 4(2):243–248.
Schmidhuber, J. (1992b). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234–242. (Based on TR FKI-148-91, TUM, 1991).
Schmidhuber, J. (1992c). Learning factorial codes by predictability minimization. Neural Computation, 4(6):863–879.
Schmidhuber, J. (1993a). An introspective network that can learn to run its own weight change algorithm. In Proc. of the Intl. Conf. on Artificial Neural Networks, Brighton, pages 191–195. IEE.
Schmidhuber, J. (1993b). Netzwerkarchitekturen, Zielfunktionen und Kettenregel. (Network Architectures, Objective Functions, and Chain Rule.) Habilitationsschrift (Habilitation Thesis), Institut für Informatik, Technische Universität München.
Schmidhuber, J. (1995). Discovering solutions with low Kolmogorov complexity and high generalization capability. In Prieditis, A. and Russell, S., editors, Machine Learning: Proceedings of the Twelfth International Conference, pages 488–496. Morgan Kaufmann Publishers, San Francisco, CA.
Schmidhuber, J. (1997). Discovering neural nets with low Kolmogorov complexity and high generalization capability. Neural Networks, 10(5):857–873.
Schmidhuber, J. (2002). The Speed Prior: a new simplicity measure yielding near-optimal computable predictions. In Kivinen, J. and Sloan, R. H., editors, Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), Lecture Notes in Artificial Intelligence, pages 216–228. Springer, Sydney, Australia.
Schmidhuber, J. (2004). Optimal ordered problem solver. Machine Learning, 54:211–254.
Schmidhuber, J. (2006a). Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts. Connection Science, 18(2):173–187.
Schmidhuber, J. (2006b). Gödel machines: Fully self-referential optimal universal self-improvers. In Goertzel, B. and Pennachin, C., editors, Artificial General Intelligence, pages 199–226. Springer Verlag. Variant available as arXiv:cs.LO/0309048.
Schmidhuber, J. (2007). Prototype resilient, self-modeling robots. Science, 316(5825):688.
Schmidhuber, J. (2012). Self-delimiting neural networks. Technical Report IDSIA-08-12, arXiv:1210.0118v1 [cs.NE], The Swiss AI Lab IDSIA.
Schmidhuber, J. (2013a). My first Deep Learning system of 1991 + Deep Learning timeline 1962-2013. Technical Report arXiv:1312.5548v1 [cs.NE], The Swiss AI Lab IDSIA.
Schmidhuber, J. (2013b). POWERPLAY: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem. Frontiers in Psychology.
Schmidhuber, J., Ciresan, D., Meier, U., Masci, J., and Graves, A. (2011). On fast deep nets for AGI vision. In Proc. Fourth Conference on Artificial General Intelligence (AGI), Google, Mountain View, CA, pages 243–246.
Schmidhuber, J., Eldracher, M., and Foltin, B. (1996). Semilinear predictability minimization produces well-known feature detectors. Neural Computation, 8(4):773–786.
Schmidhuber, J. and Huber, R. (1991). Learning to generate artificial fovea trajectories for target detection. International Journal of Neural Systems, 2(1 & 2):135–141.
Schmidhuber, J., Mozer, M. C., and Prelinger, D. (1993). Continuous history compression. In Hüning, H., Neuhauser, S., Raus, M., and Ritschel, W., editors, Proc. of Intl. Workshop on Neural Networks, RWTH Aachen, pages 87–95. Augustinus.
Schmidhuber, J. and Prelinger, D. (1993). Discovering predictable classifications. Neural Computation, 5(4):625–635.
Schmidhuber, J. and Wahnsiedler, R. (1992). Planning simple trajectories using neural subgoal generators. In Meyer, J. A., Roitblat, H. L., and Wilson, S. W., editors, Proc. of the 2nd International Conference on Simulation of Adaptive Behavior, pages 196–202. MIT Press.
Schmidhuber, J., Wierstra, D., Gagliolo, M., and Gomez, F. J. (2007). Training recurrent networks by Evolino. Neural Computation, 19(3):757–779.
Schmidhuber, J., Zhao, J., and Schraudolph, N. (1997a). Reinforcement learning with self-modifying policies. In Thrun, S. and Pratt, L., editors, Learning to learn, pages 293–309. Kluwer.
Schmidhuber, J., Zhao, J., and Wiering, M. (1997b). Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. Machine Learning, 28:105–130.
Schölkopf, B., Burges, C. J. C., and Smola, A. J., editors (1998). Advances in Kernel Methods - Support Vector Learning. MIT Press, Cambridge, MA.
Schraudolph, N. and Sejnowski, T. J. (1993). Unsupervised discrimination of clustered data via optimization of binary information gain. In Hanson, S. J., Cowan, J. D., and Giles, C. L., editors, Advances in Neural Information Processing Systems, volume 5, pages 499–506. Morgan Kaufmann, San Mateo.
Schraudolph, N. N. (2002). Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 14(7):1723–1738.
Schraudolph, N. N. and Sejnowski, T. J. (1996). Tempering backpropagation networks: Not all weights are created equal. In Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E., editors, Advances in Neural Information Processing Systems (NIPS), volume 8, pages 563–569. The MIT Press, Cambridge, MA.
Schrauwen, B., Verstraeten, D., and Van Campenhout, J. (2007). An overview of reservoir computing: theory, applications and implementations. In Proceedings of the 15th European Symposium on Artificial Neural Networks, pages 471–482.
Schuster, H. G. (1992). Learning by maximizing the information transfer through nonlinear noisy neurons and "noise breakdown". Phys. Rev. A, 46(4):2131–2138.
Schuster, M. (1999). On supervised learning from sequential data with applications for speech recognition. PhD thesis, Nara Institute of Science and Technology, Kyoto, Japan.
Schuster, M. and Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45:2673–2681.
Schwartz, A. (1993). A reinforcement learning method for maximizing undiscounted rewards. In Proc. ICML, pages 298–305.
Schwefel, H. P. (1974). Numerische Optimierung von Computer-Modellen. Dissertation. Published 1977 by Birkhäuser, Basel.
Segmentation of Neuronal Structures in EM Stacks Challenge (2012). IEEE International Symposium on Biomedical Imaging (ISBI), http://tinyurl.com/d2fgh7g.
Sehnke, F., Osendorfer, C., Rückstieß, T., Graves, A., Peters, J., and Schmidhuber, J. (2010). Parameter-exploring policy gradients. Neural Networks, 23(4):551–559.
Sermanet, P. and LeCun, Y. (2011). Traffic sign recognition with multi-scale convolutional networks. In Proceedings of International Joint Conference on Neural Networks (IJCNN'11), pages 2809–2813.
Serrano-Gotarredona, R., Oster, M., Lichtsteiner, P., Linares-Barranco, A., Paz-Vicente, R., Gómez-Rodríguez, F., Camuñas-Mesa, L., Berner, R., Rivas-Pérez, M., Delbruck, T., et al. (2009). Caviar: A 45k neuron, 5m synapse, 12g connects/s AER hardware sensory–processing–learning–actuating system for high-speed visual object recognition and tracking. IEEE Transactions on Neural Networks, 20(9):1417–1438.
Seung, H. S. (2003). Learning in spiking neural networks by reinforcement of stochastic synaptic transmission. Neuron, 40(6):1063–1073.
Shanno, D. F. (1970). Conditioning of quasi-Newton methods for function minimization. Mathematics of computation, 24(111):647–656.
Shannon, C. E. (1948). A mathematical theory of communication (parts I and II). Bell System Technical Journal, XXVII:379–423.
Shavlik, J. W. (1994). Combining symbolic and neural learning. Machine Learning, 14(3):321–331.
Shavlik, J. W. and Towell, G. G. (1989). Combining explanation-based and neural learning: An algorithm and empirical results. Connection Science, 1(3):233–255.
Siegelmann, H. (1992). Theoretical Foundations of Recurrent Neural Networks. PhD thesis, Rutgers, The State University of New Jersey, New Brunswick.
Siegelmann, H. T. and Sontag, E. D. (1991). Turing computability with neural nets. Applied Mathematics Letters, 4(6):77–80.
Silva, F. M. and Almeida, L. B. (1990). Speeding up back-propagation. In Eckmiller, R., editor, Advanced Neural Computers, pages 151–158, Amsterdam. Elsevier.
Šíma, J. (1994). Loading deep networks is hard. Neural Computation, 6(5):842–850.
Šíma, J. (2002). Training a single sigmoidal neuron is hard. Neural Computation, 14(11):2709–2728.
Simard, P., Steinkraus, D., and Platt, J. (2003). Best practices for convolutional neural networks applied to visual document analysis. In Seventh International Conference on Document Analysis and Recognition, pages 958–963.
Sims, K. (1994). Evolving virtual creatures. In Glassner, A., editor, Proceedings of SIGGRAPH '94 (Orlando, Florida, July 1994), Computer Graphics Proceedings, Annual Conference, pages 15–22. ACM SIGGRAPH, ACM Press. ISBN 0-89791-667-0.
Şimşek, Ö. and Barto, A. G. (2008). Skill characterization based on betweenness. In NIPS'08, pages 1497–1504.
Singh, S., Barto, A. G., and Chentanez, N. (2005). Intrinsically motivated reinforcement learning. In Advances in Neural Information Processing Systems 17 (NIPS). MIT Press, Cambridge, MA.
Singh, S. P. (1994). Reinforcement learning algorithms for average-payoff Markovian decision processes. In National Conference on Artificial Intelligence, pages 700–705.
Smith, S. F. (1980). A Learning System Based on Genetic Adaptive Algorithms. PhD thesis, Univ. Pittsburgh.
Smolensky, P. (1986). Parallel distributed processing: Explorations in the microstructure of cognition, vol. 1. chapter Information Processing in Dynamical Systems: Foundations of Harmony Theory, pages 194–281. MIT Press, Cambridge, MA, USA.
Solla, S. A. (1988). Accelerated learning in layered neural networks. Complex Systems, 2:625–640.
Solomonoff, R. J. (1964). A formal theory of inductive inference. Part I. Information and Control, 7:1–22.
Soloway, E. (1986). Learning to program = learning to construct mechanisms and explanations. Communications of the ACM, 29(9):850–858.
Song, S., Miller, K. D., and Abbott, L. F. (2000). Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nature Neuroscience, 3(9):919–926.
Speelpenning, B. (1980). Compiling Fast Partial Derivatives of Functions Given by Algorithms. PhD thesis, Department of Computer Science, University of Illinois, Urbana-Champaign.
Srivastava, R. K., Masci, J., Kazerounian, S., Gomez, F., and Schmidhuber, J. (2013). Compete to compute. In Advances in Neural Information Processing Systems (NIPS), pages 2310–2318.
Stallkamp, J., Schlipsing, M., Salmen, J., and Igel, C. (2011). INI German Traffic Sign Recognition Benchmark for the IJCNN'11 Competition.
Stanley, K. O., D'Ambrosio, D. B., and Gauci, J. (2009). A hypercube-based encoding for evolving large-scale neural networks. Artificial Life, 15(2):185–212.
Stanley, K. O. and Miikkulainen, R. (2002). Evolving neural networks through augmenting topologies. Evolutionary Computation, 10:99–127.
Steijvers, M. and Grunwald, P. (1996). A recurrent network that performs a context-sensitive prediction task. In Proceedings of the 18th Annual Conference of the Cognitive Science Society. Erlbaum.
Steil, J. J. (2007). Online reservoir adaptation by intrinsic plasticity for backpropagation–decorrelation and echo state learning. Neural Networks, 20(3):353–364.
Stemmler, M. (1996). A single spike suffices: the simplest form of stochastic resonance in model neurons. Network: Computation in Neural Systems, 7(4):687–716.
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Roy. Stat. Soc., 36:111–147.
Stoop, R., Schindler, K., and Bunimovich, L. (2000). When pyramidal neurons lock, when they respond chaotically, and when they like to synchronize. Neuroscience research, 36(1):81–91.
Sun, G., Chen, H., and Lee, Y. (1993a). Time warping invariant neural networks. In Hanson, S. J., Cowan, J. D., and Giles, C. L., editors, Advances in Neural Information Processing Systems (NIPS) 5, pages 180–187. Morgan Kaufmann.
Sun, G. Z., Giles, C. L., Chen, H. H., and Lee, Y. C. (1993b). The neural network pushdown automaton: Model, stack and learning simulations. Technical Report CS-TR-3118, University of Maryland, College Park.
Sun, Y., Gomez, F., Schaul, T., and Schmidhuber, J. (2013). A Linear Time Natural Evolution Strategy for Non-Separable Functions. In Proceedings of the Genetic and Evolutionary Computation Conference, page 61, Amsterdam, NL. ACM.
Sun, Y., Wierstra, D., Schaul, T., and Schmidhuber, J. (2009). Efficient natural evolution strategies. In Proc. 11th Genetic and Evolutionary Computation Conference (GECCO), pages 539–546.
Sutskever, I., Hinton, G. E., and Taylor, G. W. (2008). The recurrent temporal restricted Boltzmann machine. In NIPS, volume 21, page 2008.
Sutton, R. and Barto, A. (1998). Reinforcement learning: An introduction. Cambridge, MA, MIT Press.
Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour, Y. (1999a). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (NIPS) 12, pages 1057–1063.
Sutton, R. S., Precup, D., and Singh, S. P. (1999b). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artif. Intell., 112(1-2):181–211.
Sutton, R. S., Szepesvári, C., and Maei, H. R. (2008). A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. In Advances in Neural Information Processing Systems (NIPS'08), volume 21, pages 1609–1616.
Szabó, Z., Póczos, B., and Lőrincz, A. (2006). Cross-entropy optimization for independent process analysis. In Independent Component Analysis and Blind Signal Separation, pages 909–916. Springer.
Szegedy, C., Toshev, A., and Erhan, D. (2013). Deep neural networks for object detection. pages 2553–2561.
Tegge, A. N., Wang, Z., Eickholt, J., and Cheng, J. (2009). NNcon: improved protein contact map prediction using 2D-recursive neural networks. Nucleic Acids Research, 37(Suppl 2):W515–W518.
Teller, A. (1994). The evolution of mental models. In Kenneth E. Kinnear, J., editor, Advances in Genetic Programming, pages 199–219. MIT Press.
Tenenberg, J., Karlsson, J., and Whitehead, S. (1993). Learning via task decomposition. In Meyer, J. A., Roitblat, H., and Wilson, S., editors, From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior, pages 337–343. MIT Press.
Tesauro, G. (1994). TD-gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215–219.
Tieleman, T. and Hinton, G. (2012). Lecture 6.5 – RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning.
Tikhonov, A. N., Arsenin, V. I., and John, F. (1977). Solutions of ill-posed problems. Winston.
Ting, K. M. and Witten, I. H. (1997). Stacked generalization: when does it work? In Proc. International Joint Conference on Artificial Intelligence (IJCAI).
Tiňo, P. and Hammer, B. (2004). Architectural bias in recurrent neural networks: Fractal analysis. Neural Computation, 15(8):1931–1957.
Tonkes, B. and Wiles, J. (1997). Learning a context-free task with a recurrent neural network: An analysis of stability. In Proceedings of the Fourth Biennial Conference of the Australasian Cognitive Science Society.
Towell, G. G. and Shavlik, J. W. (1994). Knowledge-based artificial neural networks. Artificial Intelligence, 70(1):119–165.
Tsitsiklis, J. N. and van Roy, B. (1996). Feature-based methods for large scale dynamic programming. Machine Learning, 22(1-3):59–94.
Tsodyks, M. V., Skaggs, W. E., Sejnowski, T. J., and McNaughton, B. L. (1996). Population dynamics and theta rhythm phase precession of hippocampal place cell firing: a spiking neuron model. Hippocampus, 6(3):271–280.
Turaga, S. C., Murray, J. F., Jain, V., Roth, F., Helmstaedter, M., Briggman, K., Denk, W., and Seung, H. S. (2010). Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Computation, 22(2):511–538.
Turing, A. M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, Series 2, 41:230–267.
Ueda, N. (2000). Optimal linear combination of neural networks for improving classification performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(2):207–215.
Urlbe, A. P. (1999). Structure-adaptable digital neural networks. PhD thesis, Universidad del Valle.
Vahed, A. and Omlin, C. W. (2004). A machine learning method for extracting symbolic knowledge from recurrent neural networks. Neural Computation, 16(1):59–71.
Vaillant, R., Monrocq, C., and LeCun, Y. (1994). Original approach for the localisation of objects in images. IEE Proc on Vision, Image, and Signal Processing, 141(4):245–250.
Vapnik, V. (1992). Principles of risk minimization for learning theory. In Lippman, D. S., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems (NIPS) 4, pages 831–838. Morgan Kaufmann.
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer, New York.
Veta, M., Viergever, M., Pluim, J., Stathonikos, N., and van Diest, P. J. (2013). MICCAI 2013 Grand Challenge on Mitosis Detection.
Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A. (2008). Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th international conference on Machine learning, ICML '08, pages 1096–1103, New York, NY, USA. ACM.
Vogl, T., Mangis, J., Rigler, A., Zink, W., and Alkon, D. (1988). Accelerating the convergence of the back-propagation method. Biological Cybernetics, 59:257–263.
von der Malsburg, C. (1973). Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14(2):85–100.
Waldinger, R. J. and Lee, R. C. T. (1969). PROW: a step toward automatic program writing. In Walker, D. E. and Norton, L. M., editors, Proceedings of the 1st International Joint Conference on Artificial Intelligence (IJCAI), pages 241–252. Morgan Kaufmann.
Wallace, C. S. and Boulton, D. M. (1968). An information theoretic measure for classification. Computer Journal, 11(2):185–194.
Wan, E. A. (1994). Time series prediction by using a connectionist network with internal delay lines. In Weigend, A. S. and Gershenfeld, N. A., editors, Time series prediction: Forecasting the future and understanding the past, pages 265–295. Addison-Wesley.
Wang, C., Venkatesh, S. S., and Judd, J. S. (1994). Optimal stopping and effective machine complexity in learning. In Advances in Neural Information Processing Systems (NIPS'6), pages 303–310. Morgan Kaufmann.
Wang, S. and Manning, C. (2013). Fast dropout training. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 118–126.
Watanabe, O. (1992). Kolmogorov complexity and computational complexity. EATCS Monographs on Theoretical Computer Science, Springer.
Watanabe, S. (1985). Pattern Recognition: Human and Mechanical. Wiley, New York.
Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. PhD thesis, King's College, Cambridge.
Watkins, C. J. C. H. and Dayan, P. (1992). Q-learning. Machine Learning, 8:279–292.
Watrous, R. L. and Kuhn, G. M. (1992). Induction of finite-state automata using second-order recurrent networks. In Moody, J. E., Hanson, S. J., and Lippman, R. P., editors, Advances in Neural Information Processing Systems 4, pages 309–316. Morgan Kaufmann.
Waydo, S. and Koch, C. (2008). Unsupervised learning of individuals and categories from images. Neural Computation, 20(5):1165–1178.
Weigend, A. S. and Gershenfeld, N. A. (1993). Results of the time series prediction competition at the Santa Fe Institute. In Neural Networks, 1993., IEEE International Conference on, pages 1786–1793. IEEE.
Weigend, A. S., Rumelhart, D. E., and Huberman, B. A. (1991). Generalization by weight-elimination with application to forecasting. In Lippmann, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems (NIPS) 3, pages 875–882. San Mateo, CA: Morgan Kaufmann.
In Lippmann, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems (NIPS) 3, pages 875–882. San Mateo, CA: Morgan Kaufmann.
Weng, J., Ahuja, N., and Huang, T. S. (1992). Cresceptron: a self-organizing neural network which grows adaptively. In International Joint Conference on Neural Networks (IJCNN), volume 1, pages 576–581. IEEE.
Weng, J. J., Ahuja, N., and Huang, T. S. (1997). Learning recognition and segmentation using the cresceptron. International Journal of Computer Vision, 25(2):109–143.
Werbos, P. J. (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University.
Werbos, P. J. (1981). Applications of advances in nonlinear sensitivity analysis. In Proceedings of the 10th IFIP Conference, 31.8–4.9, NYC, pages 762–770.
Werbos, P. J. (1987). Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research. IEEE Transactions on Systems, Man, and Cybernetics, 17.
Werbos, P. J. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1.
Werbos, P. J. (1989a). Backpropagation and neurocontrol: A review and prospectus. In IEEE/INNS International Joint Conference on Neural Networks, Washington, D.C., volume 1, pages 209–216.
Werbos, P. J. (1989b). Neural networks for control and system identification. In Proceedings of IEEE/CDC Tampa, Florida.
Werbos, P. J. (1992). Neural networks, system identification, and control in the chemical industries. In White, D. A. and Sofge, D. A., editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, pages 283–356. Thomson Learning.
Werbos, P. J. (2006). Backwards differentiation in AD and neural nets: Past links and new opportunities. In Automatic Differentiation: Applications, Theory, and Implementations, pages 15–34. Springer.
West, A. H. L. and Saad, D. (1995). Adaptive back-propagation in on-line learning of multilayer networks. In Touretzky, D. S., Mozer, M., and Hasselmo, M. E., editors, Advances in Neural Information Processing Systems (NIPS), pages 323–329. MIT Press.
White, H. (1989). Learning in artificial neural networks: A statistical perspective. Neural Computation, 1(4):425–464.
Whitehead, S. (1992). Reinforcement Learning for the Adaptive Control of Perception and Action. PhD thesis, University of Rochester.
Whiteson, S., Kohl, N., Miikkulainen, R., and Stone, P. (2005). Evolving keepaway soccer players through task decomposition. Machine Learning, 59(1):5–30.
Widrow, B. and Hoff, M. (1962). Associative storage and retrieval of digital information in networks of adaptive neurons. Biological Prototypes and Synthetic Systems, 1:160.
Widrow, B., Rumelhart, D. E., and Lehr, M. A. (1994). Neural networks: Applications in industry, business and science. Communications of the ACM, 37(3):93–105.
Wieland, A. P. (1991). Evolving neural network controllers for unstable systems. In International Joint Conference on Neural Networks (IJCNN), volume 2, pages 667–673. IEEE.
Wiering, M. and Schmidhuber, J. (1996). Solving POMDPs with Levin search and EIRA. In Saitta, L., editor, Machine Learning: Proceedings of the Thirteenth International Conference, pages 534–542. Morgan Kaufmann Publishers, San Francisco, CA.
Wiering, M. and Schmidhuber, J. (1998a). HQ-learning. Adaptive Behavior, 6(2):219–246.
Wiering, M. A. and Schmidhuber, J. (1998b). Fast online Q(λ). Machine Learning, 33(1):105–116.
Wierstra, D., Foerster, A., Peters, J., and Schmidhuber, J. (2007).
Solving deep memory POMDPs with recurrent policy gradients. In ICANN (1), volume 4668 of Lecture Notes in Computer Science, pages 697–706. Springer.
Wierstra, D., Foerster, A., Peters, J., and Schmidhuber, J. (2010). Recurrent policy gradients. Logic Journal of the IGPL, 18(2):620–634.
Wierstra, D., Schaul, T., Peters, J., and Schmidhuber, J. (2008). Natural evolution strategies. In Congress of Evolutionary Computation (CEC 2008).
Wiesel, T. N. and Hubel, D. H. (1959). Receptive fields of single neurones in the cat's striate cortex. J. Physiol., 148:574–591.
Wiles, J. and Elman, J. (1995). Learning to count without a counter: A case study of dynamics and activation landscapes in recurrent networks. In Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society, pages 482–487, Cambridge, MA. MIT Press.
Wilkinson, J. H., editor (1965). The Algebraic Eigenvalue Problem. Oxford University Press, Inc., New York, NY, USA.
Williams, R. J. (1986). Reinforcement-learning in connectionist networks: A mathematical analysis. Technical Report 8605, Institute for Cognitive Science, University of California, San Diego.
Williams, R. J. (1988). Toward a theory of reinforcement-learning connectionist systems. Technical Report NU-CCS-88-3, College of Computer Science, Northeastern University, Boston, MA.
Williams, R. J. (1989). Complexity of exact gradient computation algorithms for recurrent neural networks. Technical Report NU-CCS-89-27, College of Computer Science, Northeastern University, Boston, MA.
Williams, R. J. (1992a). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256.
Williams, R. J. (1992b). Training recurrent networks using the extended Kalman filter. In International Joint Conference on Neural Networks (IJCNN), volume 4, pages 241–246. IEEE.
Williams, R. J. and Peng, J. (1990). An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Computation, 4:491–501.
Williams, R. J. and Zipser, D. (1988). A learning algorithm for continually running fully recurrent networks. Technical Report ICS Report 8805, Univ. of California, San Diego, La Jolla.
Williams, R. J. and Zipser, D. (1989a). Experimental analysis of the real-time recurrent learning algorithm. Connection Science, 1(1):87–111.
Williams, R. J. and Zipser, D. (1989b). A learning algorithm for continually running fully recurrent networks. Neural Computation, 1(2):270–280.
Willshaw, D. J. and von der Malsburg, C. (1976). How patterned neural connections can be set up by self-organization. Proc. R. Soc. London B, 194:431–445.
Windisch, D. (2005). Loading deep networks is hard: The pyramidal case. Neural Computation, 17(2):487–502.
Wiskott, L. and Sejnowski, T. (2002). Slow feature analysis: Unsupervised learning of invariances. Neural Computation, 14(4):715–770.
Witczak, M., Korbicz, J., Mrugalski, M., and Patton, R. J. (2006). A GMDH neural network-based approach to robust fault diagnosis: Application to the DAMADICS benchmark problem. Control Engineering Practice, 14(6):671–683.
Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2):241–259.
Wolpert, D. H. (1994). Bayesian backpropagation over i-o functions rather than weights. In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems (NIPS) 6, pages 200–207. Morgan Kaufmann.
Wu, L. and Baldi, P. (2008). Learning to play Go using recursive neural networks.
Neural Networks, 21(9):1392–1400.
Wyatte, D., Curran, T., and O'Reilly, R. (2012). The limits of feedforward vision: Recurrent processing promotes robust object recognition when objects are degraded. Journal of Cognitive Neuroscience, 24(11):2248–2261.
Yamauchi, B. M. and Beer, R. D. (1994). Sequential behavior and learning in evolved dynamical neural networks. Adaptive Behavior, 2(3):219–246.
Yamins, D., Hong, H., Cadieu, C., and DiCarlo, J. J. (2013). Hierarchical modular optimization of convolutional networks achieves representations similar to macaque IT and human ventral stream. In Advances in Neural Information Processing Systems (NIPS), pages 1–9.
Yang, M., Ji, S., Xu, W., Wang, J., Lv, F., Yu, K., Gong, Y., Dikmen, M., Lin, D. J., and Huang, T. S. (2009). Detecting human actions in surveillance videos. In TREC Video Retrieval Evaluation Workshop.
Yao, X. (1993). A review of evolutionary artificial neural networks. International Journal of Intelligent Systems, 4:203–222.
Yu, X.-H., Chen, G.-A., and Cheng, S.-X. (1995). Dynamic learning rate optimization of the backpropagation algorithm. IEEE Transactions on Neural Networks, 6(3):669–677.
Zeiler, M. D. (2012). ADADELTA: an adaptive learning rate method. CoRR, abs/1212.5701.
Zeiler, M. D. and Fergus, R. (2013). Visualizing and understanding convolutional networks. Technical Report arXiv:1311.2901 [cs.CV], NYU.
Zemel, R. S. (1993). A Minimum Description Length Framework for Unsupervised Learning. PhD thesis, University of Toronto.
Zemel, R. S. and Hinton, G. E. (1994). Developing population codes by minimizing description length. In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems (NIPS) 6, pages 11–18. Morgan Kaufmann.
Zeng, Z., Goodman, R., and Smyth, P. (1994). Discrete recurrent neural networks for grammatical inference. IEEE Transactions on Neural Networks, 5(2).
Zimmermann, H.-G., Tietz, C., and Grothmann, R. (2012). Forecasting with recurrent neural networks: 12 tricks. In Montavon, G., Orr, G. B., and Müller, K.-R., editors, Neural Networks: Tricks of the Trade (2nd ed.), volume 7700 of Lecture Notes in Computer Science, pages 687–707. Springer.
Zipser, D., Kehoe, B., Littlewort, G., and Fuster, J. (1993). A spiking network model of short-term active memory. The Journal of Neuroscience, 13(8):3406–3420.