  • Summary comments
  • Play with the [time, mind]-bending perspective yourself
  • Ratio of actual to semi-log detrended data : [advantages, disadvantages]
  • Future potential work
  • Comparison of [TradingView, Yahoo finance] data
  • [data, software] cart [description, links]
  • At present, the full video (540 Mbytes) is too slow (dragging, deep voices, slow video), and it is too cumbersome to jump from one time to another. So until I convert to a different video [codec, container] format (perhaps the H.264 codec & .MKV container?) or find a video viewer that is better suited to large files, the videos for each scene are posted instead (see the listing below), giving better throughput and ease of going from one scene to another by separate loading. Microsoft Windows (and hopefully Macintosh?) users can view these by downloading the VLC media viewer. "... VLC is a free and open source cross-platform multimedia player and framework that plays most multimedia files, and various streaming protocols. ..." At present, the full video also cannot be moved forward and back within the video, something I will fix when I get the time, as the ability to go back over material and skip sections is particularly important with this video. In the meantime, the separate "Scenes" listed below can be used by moving back and forward.
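    As a rough illustration of the kind of conversion I have in mind (assuming ffmpeg is installed; the file names are placeholders, and the quality settings are common defaults rather than something tested on these particular files) :
      # re-encode the large .ogv into an H.264 video stream plus AAC audio in a .mkv container
      ffmpeg -i full_video.ogv -c:v libx264 -crf 23 -preset medium -c:a aac -b:a 128k full_video.mkv
    In principle the H.264/.mkv version should seek forward and back much more responsively in VLC, which is the main point of the conversion.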
  • The QNial programming language was used to [direct, sequence, conduct, whatever] the video production, together with a LibreOffice Calc spreadsheet that acts as a great front-end for preparing code specific to the video sequencing. These can be found in the Programming code directory listing, and will be handy for anyone interested in the details of how I produced the video. I like to describe the QNial programming language of Queen's University, Kingston, Ontario, Canada as "... the beautiful child of a marriage between LISP and APL ...". It is not commonly used today, and even though it is an interpreted language and I sometimes get frustrated with it, as with the other languages I use, its conceptual power always brings me back home to it. Bug hunting can be problematic if you don't build in bug taps and [structured, object oriented] capabilities, but for much of what I do I keep those chains to a minimum so I can use the full power of the language.
  • Summary - my commentary as part of Perry Kincaid's webinar, 31Mar2022.
  • Key files - to [view, hear] my commentary
  • References - unfortunately, the list is very incomplete, but it does provide some links.
    Perry Kincaid, founder of KEI Networks, organised a PUBLIC webinar, "Alberta is high on hydrogen : Introducing hydrogen to Alberta's energy mix", with commentaries about how and why, Thursday, March 31st, 4:00pm MST.
  • Slide show - open source presentation file format .odp. Microsoft PowerPoint will probably complain, but should be able to load.
  • Voice narrative - in mp3 audio file format.
  • Adobe pdf - file format.
  • Voice script - text file with script for the voice commentary. Also included are notes for some of the slides that were not commented on (marked by "(X)").
    Click to view most files related to this presentation
    Ben Davidson of Suspicious Observers posted 3 brilliant videos on nearby stellar flaring, as further support for a potential "micro-flare" or other solar disruption to explain the 12,000 year [mythological observations, paleontology, geology, planetary] quasi-periodicity of disruptive events on Earth, which by appearances may be "imminent". I like Ben's <=50 to >=200 year uncertainty - and even though that is still a bit of a guess, he is meticulous in pointing out the uncertainties.
  • 24Dec2019 DISASTER CYCLE | Signs in the Sky Now
  • 26Dec2019 Galactic Sheet Impact | Timing the Arrival
  • 27Dec2019 Nearby Superflares | What Do They Mean
    If we take an "Electric Universe" perspective, in particular Wal Thornhill's Birkeland current concepts for large-scale astronomy, and Don Scott's very interesting "solar transistor" model together with his 2015 Birkeland current model (also his 2018 elaboration), then perhaps shifts in the galactic currents could be expected to "reincarnate-light up" or "dim-extinguish" stars to various degrees as the currents shift and move. Many stars (I can't remember all of them - perhaps brown dwarfs, giant planets close to being stars, etc) are not visible by light emission, but perhaps they are easily re-activated when currents change. Perhaps in extreme cases this might lead to "swapping the central star role" between a large planet and its star in the local z-pinch? In other words, the "lit-up regions'" motions may relate more to drifts of galactic currents than to the motions of the stars themselves? In that manner, the "galactic spirals" could move independently of the stars.


    Note that Donald Scott's own analysis of "stellar velocity profiles" provides yet another explanation of what is observed. So my speculations here are just one of many that have been proposed.

    ALL videos are provided in ogv file format, which is of higher quality and more natural for me to work with in a Linux environment. Microsoft Windows (and hopefully Macintosh?) users can view these by downloading the VLC media viewer. "... VLC is a free and open source cross-platform multimedia player and framework that plays most multimedia files, and various streaming protocols. ..."
  • As noted above, Ben Davidson of Suspicious Observers posted 3 brilliant videos on nearby stellar flaring as further support for a potential "micro-flare" or other solar disruption behind the 12,000 year [mythological observations, paleontology, geology, planetary] quasi-periodicity of disruptive events on Earth. But can stellar [apparent birth, brightening, dimming, apparent death] also provide further potential evidence? Naturally we view stars' life-paths as "unidirectional", but is this a full picture, or can all these processes recur, as is the case for their [sunspots, micro-to-super novas, etc]? What has long fascinated me is the statement that the spirals of the galaxies move more rapidly than the stars in the galaxy, and how that might relate to the [Newtonian, General Relativity] problem at large scales.
  • Toolsets can be browsed via: Past and Future Worlds directory. Perhaps these may be of [interest, help] to others putting together a film from Linux-based free software.
  • Toolsets can be browsed via: Big Data, Deep Learning, and Safety directory. Perhaps these may be of [interest, help] to others putting together a film from Linux-based free software.
  • Toolsets can be browsed via: Icebreaker unchained directory. Perhaps these may be of [interest, help] to others putting together a film from Linux-based free software.
  • code code development overall
  • fileops run commentary, overall.html
  • Howell - TradingView PineScript [description, problem, debug].html
  • Howell - TradingView PineScript of priceTimeFractals.html
  • 0_PineScript notes.txt - details of software [code, bug, blogSolutions]
  • 0_PineScript errors.txt - [error, solution]s that keep coming back
  • Howell - References related to Puetz [H]UWS.html
  • Kivanc Ozbilgics Turtle Trade PineScript - documention.txt
  • Kivanc Ozbilgics Turtle Trade PineScript, plus 8-year detrended SP500.txt
  • RicardoSantos, Function Polynomial Regression.txt
  • sickojacko maximum [,relative] drawdown calculating functions.txt
  • TradingView auto fib extension.txt
  • Perhaps more important are the lessons that can be learned from my own failures, and some of the techniques I've used to help debug my Pine Script code. General comments are provided on my webPage TradingView PineScript [description, problem, debug].
    This section also appears in my webPage for users, and also applies to programmers. Users only have to set up the basic chart and symbols in TradingView based on my chart PuetzUWS [time, price] multiFractal mirrors, SPX 1872-2020. To do so you must be a TradingView subscriber. After that, copy over my PineScript coding, which you can find on my TradingView page - click on "SCRIPTS", and select my script "PuetzUWS [time, price] multifractal of detrended SPX 1871-2020". Further setup details are given below.
    Download symbol data (like [TVC:[USOIL, GOLD], NASDAQ: NDX]) from [TradingView, yahoo finance, etc]. My own data for SPX is in my LibreCalc spreadsheet SP500 1872-2020 TradingView, 1928-2020 yahoo finance.ods. Actually, it's in several different spreadsheets, hence the possibility of glitches as [update, change]s are made...
    Users can simply follow standard TradingView guide instructions to install the Pine Script program that super-imposes fractal [time, price] grids on their charts. I don't recommend that you do this UNLESS you [are, want to be] familiar with PineScript programming. The reason I say that is that for every market symbol being tracked, you must provide a formula for the semi-log price [trend, relative standard deviation]. Preferably get as long a series as you can, eg download from TradingView. If you don't have 20+ years of data (eg the young crypto market), it may be a waste of your time. The statements that you need to adapt to your symbol's data are described in the webPage linked below.
    For details, see Howell - TradingView PineScript [description, problem, debug].html.
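    For readers who are not Pine Script programmers, the detrending idea itself is simple. The sketch below is NOT the Pine Script - it is a bash/awk illustration of the same semi-log detrending, assuming a hypothetical whitespace-separated text file of "year price" rows (such as an export of the downloaded data) :
      #!/bin/bash
      # Illustration only, not the Pine Script. Fit a straight line to log10(price) versus year
      # (ordinary least squares), then print the ratio of each actual price to that semi-log trend.
      # Input file name and format are assumptions : two numeric columns, "year price".
      awk '
          NR == FNR { n++; x = $1; y = log($2)/log(10)        # pass 1 : accumulate regression sums
                      sx += x; sy += y; sxx += x*x; sxy += x*y; next }
          FNR == 1  { b = (n*sxy - sx*sy) / (n*sxx - sx*sx)   # slope of log10(price) versus year
                      a = (sy - b*sx) / n }                   # intercept
                    { trend = 10^(a + b*$1)                   # semi-log trend, back in price units
                      printf "%s %.4f\n", $1, $2/trend }      # year, ratio of actual to trend
      ' SP500_yearly.txt SP500_yearly.txt
    The Pine Script does the equivalent for each symbol, presumably with that symbol's own [slope, intercept] entered by hand, which is why every new symbol needs its own statements.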
  • Special comments
  • Regular [1,6] month market views

  • https://tr.tradingview.com/script/pB5nv16J/?utm_source=notification_email&utm_medium=email&utm_campaign=notification_pubscript_update

    https://www.tradingview.com/script/12M8Jqu6-Function-Polynomial-Regression/






  • Key [results, comments]
  • How can the Great Pricing Waves be correlated with "dead" system?
  • Ratio of actual to semi-log detrended data : [advantages, disadvantages]
  • Future potential work
  • Comparison of [TradingView, Yahoo finance] data
  • [data, software] cart [description, links]
    NOTE : The model here is DEFINITELY NOT suitable for application to [trade, invest]ing! It's far too [early, incomplete, immature, erroneous]! See Steven Puetz's http://www.uct-news.com/ for the investment side of the UWS.
    I typically use LibreOffice Calc spreadsheets to [collect, rearrange, simple transform] data. For this project : 1_Fischer 1200-2020.ods
    This is susceptible to serious bias in selecting the [start, end] dates for each segment. See the spreadsheet 1_Fischer 1200-2020.ods.
    The year ~1926 was taken as the [start, end] point for my 1872-2020 detrend of StockMkt Indices 1871-2022 PuetzUWS2011, so I use it here as well. (23Feb2023 - original text said 1940, perhaps it is still like that?)
    This is easy with the spreadsheet - one column of regression results per segment. I use 10 year intervals per segment, but you only really need the [start, end] dates [-, +] 20 years. The extra 20 years extends the segments at both ends for visual clarity. For an example, see the spreadsheet 1_Fischer 1200-2020.ods, sheet "Fig 0.01 SLRsegments".
    Save the "SLRsegments" to a data file that can be used by GNUplot. Example : Fig 0.01 line segments for GNUplots.dat. Notice that column titles can use free-format text, except for the comma, which separates columns.
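    As a minimal sketch of how such a comma-separated .dat file can be plotted (the 1:2 column choice and labels are assumptions for illustration, not the actual .plt scripts) :
      #!/bin/sh
      # illustrative only : plot a comma-separated .dat file with gnuplot
      {
          echo 'set datafile separator ","'          # columns are comma-separated
          echo 'set key autotitle columnhead'        # first row holds free-format column titles
          echo 'set logscale y'
          echo 'set xlabel "year" ; set ylabel "price index"'
          echo 'plot "Fig 0.01 line segments for GNUplots.dat" using 1:2 with lines'
      } | gnuplot -persist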
  • Save data of 1_Fischer 1200-2020.ods to a data file, for example Fig 0.01 linear regression raw data.dat
  • For each curve, Fischer linear regressions.ndf (23Feb2023 no longer exists?) - a special operator (procedure) is created to select a segment's dataFile, handle [data, results] file [input, output], and call fit_linearRegress.ndf
  • text data file : Fig 0.01 Price of consumables in England 1201-1993.dat
  • gnuplot script : Fig 0.01 Price of consumables in England 1201-1993.plt
  • graph output : Fig 0.01 Price of consumables in England 1201-1993.png
  • Fig 0.01 Price of consumables in England 1201-1993 detrended.plt - This covers the medieval to modern era, and is used to collect curves for different data. The restricted time-frame provides a more accurate view of that period.
  • 1850 BC to 2020 AD prices detrended.plt - Obviously this covers a variety of regions and time-frames. What I really need is data going back 7,500 years (~3 cycles of the 2,400 year Hallstatt cycle), corresponding to a 2006 project on the rise and fall of civilisations _Civilisations and the sun; if I find [time to do it, data], this would be nice.
    Tools : https://www.digitizeit.xyz/ https://www.gimp.org http://www.gnuplot.info/

  • Jas Jain (jjain@cisco.com), possibly in reply to Dan Brindle: "Earnings growth"
    http://www.yardeni.com/public/sp52_c.pdf
    (http://stats.bls.gov:80/cgi-bin/surveymost?wp).] Data from before
  • Key [results, comments]
  • Play with the [time, mind]-bending perspective yourself
  • Ratio of actual to semi-log detrended data : [advantages, disadvantages]
  • Future potential work
  • Comparison of [TradingView, Yahoo finance] data
  • [data, software] cart [description, links]


    Wow! Even knowing that the [eyes, mind] often see patterns that aren't really there (as per random noise), one can basically re-create the essence of the 1872-1960 timeframe simply by copying ONLY 4 chunks of the 1950-2020 time-frame!! Of course, there is nothing new about this - TradingView members are often comparing current market squiggles to the past, over different timescales. I would actually be surprised if the above graph hasn't already been done hundreds of times before. [System_T, Amjad Farooq, TexasWestCapital, perhaps Harry Dent] and others are examples of recent pattern-matching for the SP500, with their comparisons of the 2020 crash to other time periods. But my overlays in the graph above did not involve [re-scaling, rotating] or other transformations of the time segments (transparencies), so that is [noteworthy, simple, pure]. Scale is important, even if only to confirm the applicability of multi-scalar processes.
    While you probably don't have gimp (the GNU image manipulation program) installed on your computer (yet), it is available for free (I think on MS Windows as well, not just Linux?). With gimp, you will be able to work with my .xcf format file SP500 time-section transparencies. If you are new to gimp, be prepared to lose a lot of hair and gain a lot of wrinkles - it's not the easiest learning curve, but it is powerful (and cheap!).
  • 7,500 years of history - This is the same challenge that I had with a [lunatic, scattered, naive] model of history by my father and me, where it was necessary to cut ?150? years out of a 7,500 year time series to "kind of make it fit". Steven Yaskall recognized us as the "two fools who rushed in" in his book "Grand Phases on the Sun". We were justifiably proud of that.
  • Smooth sinusoidal curves and regular periodicities - It seems that mathematicians and scientists still [think, apply] models assuming ideal waveforms, even when [their tools, reality] do not. Stephen Puetz's "Universal Waves Series" (UWS) is the most [powerful, fantastic] meta-level model for [natural, human] cycles that I have ever seen, by far. It even has an awesome probabilistic-ranked list of expected timings of events at different timescales. However, perhaps more remains to be done on subtle shifts in real time series? I don't know - I'm just guessing.

    While most results are provided in sections above, links to data [spreadsheets, text files] and software [???, source code] are listed below along with brief comments. A full listing of files (including other SP500 web-pages) can be seen via this Directory's listing. Hopefully this will help those who want to do something different, as the programs etc may help with [learning, debugging].
  • TradingView data text file and spreadsheet - I had to upgrade my TradingView subscription to Pro+ to download the data for years prior to 1928, as I couldn't find another source. Note that the S&P500 index started in 1926, so I assume that proxy [data, index memberships] were used for prior years. I used the spreadsheet to [gather, view, process] data, and copied the resulting tables to text files for use by gnuplot (see "Software" below).
  • Yahoo finance data (23Feb2023 the text file has been lost, but the data is in the linked spreadsheet with TradingView data). I was happy to have another "somewhat independent" data source, even if they are both from the same S&P or other source. This really helps as a check on my data treatment (see the section above "Comparison of [TradingView, Yahoo finance] data").
  • TradingView Pine language You are probably wondering why I didn't provide a Pine script, which would make this much more useful to the TradingView community. Laziness is the rule - especially as I am hoping that a Pine Scripter (maybe you?) might do this.
  • gnuplot I've used the unofficial extension .plt to designate gnuplot scripts for each of the graphs. You can see these files in the market data subdirectories (eg 200913 for 13Sep2020, 220802 for 02Aug2022).
  • gimp (GNU image manipulation program) is what I used for the SP500 time-section transparencies. For more details, see the section above "Play with the [time, mind]-bending perspective yourself".
  • gnuplot.sh is the tiny bash script used to select gnuplot scripts (a minimal sketch of such a script appears after this list). My other bash scripts can be found here.
  • QNial programming language - Queen's University Nested Interactive Array Language (Q'Nial) is my top preferred programming language for modestly complex to insane programming challenges, along with at least 3 other people in the world. Bash scripts make a great companion to QNial. semi-log formula.ndf is the tiny "program" used to set up the semi-log line fits. More generally : here are many of my QNial programs. Subdirectories provide programs for various projects etc.
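    A minimal sketch of what such a selector script can look like (illustrative only, not the actual gnuplot.sh; the optional directory argument is an assumption) :
      #!/bin/bash
      # illustrative sketch : present a menu of the .plt gnuplot scripts in a directory
      # and run the chosen one with gnuplot
      cd "${1:-.}" || exit 1            # optional directory argument, default is the current directory
      select plt in *.plt ; do          # bash builtin menu over the gnuplot scripts
          [ -n "$plt" ] && gnuplot -persist "$plt"
          break
      done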
  • multpl.com
  • Qiang Zhang 30Jan2021 Price Earning Ratio model - This is similar to, but better than, my own model below. His github has several other interesting investment-related postings, including Black-Scholes derivative pricing. see Howell - SP500 PE Shiller ratios versus 10 year Treasury bond yields, with earnings growth & discount factors.ods
  • time-varying [SP500_growFuture, etc] - there is little chance of growth rates lasting more than a year or two, especially at absolute values above ~20%. Frankly, they are constantly changing year-to-year in a big way. The time series approach mentioned below is a simple basis for anticipating this in a statistical manner, as a start. Other approaches get more into predictions based on some concept or another.
  • SP500 index, variable [dividends, internal investment & stock buybacks, earnings] - I won't be looking at these any time soon ....
  • Elliot Wave Theory, notably Robert Prechter (including Socionomics). Among many, many fun topics, the arguments presented about how the Fed FOLLOWS interest rates, only giving the impression of leading, are especially relevant to this web-page.
  • Harry S. Dent Jr - demographics, with astounding successes in the past (at least twice on a decade-or-longer-out basis, perhaps a bit muffled over the last decade).
  • Stephen Puetz - Universal Wave Series, stunning results across a huge swath of subject areas!! Reminds me of the system of 20+ Mayan calendars.
  • Brian Frank of Frank funds - "Slaughterhouse-Five (Hundred), Passive Investing and its Effects on the U.S. Stock Market" - Index fund [distortion, eventual destabilization] of the markets. This was a recent fascinating read for me. (MarketWatch 10Apr2020)
    I will change this every six months or year, just to profile my different projects past and ongoing. See also past home page highlights, Howell's blog, my assorted blogs.

    04Jul202 Edo Kaal periodic table of the elements


    Icebreaker Unchained : we should have lost WWII

    I have not yet made a webPage for this project (so many years after it was shelved in Aug2015!), but [documentation, information, unfinished scripts] are provided in the Stalin supported Hitler (video production) directory and Icebreaker directory (which should be combined into one). Two very simple animations took sooooo loooong to produce. They total only ~1 minute for both "A year of stunning victories" map scan-zooms (of Poland, the false war, the lowlands, France and Dunkirk). Worse, the unfinished part 1 of 6 videos (~1 hour length) wasn't saved to a complete file, and the software for it needs massive updating. The videos are in ogv format (.ogg) - use the VLC media player to view them (some other media programs also work).
    25May2021 Here are two example graphs of TSLA options that I have been working on. I am far from getting into options trading; I just want to learn more about the market. For more details (but no webPage yet), see QNial software coding for options data processing (also "winURL yahoo finance news download.ndf" in the same directory for yahoo finance news downloads), and several graphs of Tesla options.

    1872-2020 SP500 index, ratio of opening price to semi-log detrended price


    David Fischer - The Great pricing Waves 1200-1990 AD


    "Mega-Life, Mega-Death, and the invisible hand of the Sun: Towards a quasi-predictive model for the rise and fall of civilisations", Click to see a full-sized image of the chart in your browser. (~3.5 feet squared on my kitchen wall. My printed out version includes hand annotated comparisons to the Mayan calendar and other references.)


    12Sep2020: 1872-2020 SP500 index, ratio of opening price to semi-log detrended price


  • help identify program coding, as distinct from, or hybridized with, protein coding within [DNA, mRNA]. While this is mostly an issue for my MindCode project, callerID-SNNs fit nicely into, and may pragmatically help, that context.
  • extra-neuron [Turing, von Neumann]-like computations based on the local neural network [structure, connection]s. This was a focus of my previous MindCode and earlier work (eg. Genetic specification of recurrent neural networks, a draft version of a WCCI2006 conference paper), but isn't a currently active part of my work, as a priority for me is to search for a [Lamarckian, Mendelian] hereditary basis for neural networks, tied into cellular processes. This has long been a focus of Juyang Weng.
  • intra-neuron [Turing, von Neumann]-like computations based on the "focus" neuron's [DNA, RNA, methylation, sequence processing mechanisms]. This is a separate subject addressed by my MindCode 2023 concept. A mid-term objective is to tie caller-IDs to the work of Stephen Grossberg as described in my webPage Overview - Stephen Grossberg's 2021 "Conscious Mind, Resonant Brain". Gail Carpenter worked with his concepts from the Spiking Neural Network perspective. Theresa Ludemuir (???), Jose Principe (Reproducing Kernel Hilbert Spaces), and others have also done interesting work with SNNs, but not tied to Grossberg's framework.
    For now, I can't find my earlier musings (see the very incomplete Fractal notes). As I remember, the plan was to build fractal dendrites (as the main inter-neuron synaptic information transfer) for callerID-SNNs. Axons as well, but perhaps more specialised for [power transmission, or something].
    10Nov2023 Maybe I can use a prime number basis for [time, synapse] fractals, as a contrast to Stephen Puetz's "Universal Wave Series", an amazing "factor-of-three" series combined with his half series. For example, roughly a factor-of-three [1, 3, 7, 23, ...], or maybe factor-of-two, or just all primes (a toy sketch of one reading follows).
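    As a toy sketch of one possible reading of that idea (the largest prime at or below each power of three - purely a guess at what such a series might look like, not a worked-out model) :
      #!/bin/bash
      # toy sketch : a roughly factor-of-three prime series, for comparison with a pure 3^n series
      is_prime() {                      # trial division is fine for these small numbers
          local n=$1 i
          (( n < 2 )) && return 1
          for (( i=2; i*i<=n; i++ )); do
              (( n % i == 0 )) && return 1
          done
          return 0
      }
      for (( k=1; k<=8; k++ )); do
          p=$(( 3**k ))                 # power of three
          q=$p
          while ! is_prime "$q" ; do (( q-- )) ; done
          echo "3^$k = $p   largest prime <= $p : $q"
      done
    This gives 3, 7, 23, 79, 241, ..., which is close to the series above; a factor-of-two or all-primes variant would be a one-line change.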
  • Howell 2006 "Genetic specification of recurrent neural networks" (draft version of my WCCI2006 conference paper)
  • MindCode 2023 description
  • MindCode 2023 program coding (QNial programming language) this is a simple one-line listing of each operator for each file
  • callerID-SNNs Introduction (this webPage)
  • callerID-SNNs program coding (QNial programming language)
  • bash library: file operations used extensively, sometimes hybridized with the QNial programming language
  • Genetic
  • Junk
  • 2005 IJCNN website - official/
  • 2006 WCCI website - official/
  • 2007 IJCNN Orlando website/
  • Holidays - neural networks and genomics.html
  • Howell 2005 - Presentation, Junk DNA and Neural Networks conjecture on directions and implications.ppt
  • Howell 2006 - Genetic specification of neural networks, draft concepts and implications.pdf
  • Howell 2006 - Presentation, Genetic Specification of Recurrent Neural Networks Initial Thoughts.ppt
  • "... Consciousness, at its simplest, is sentience and awareness of internal and external existence.[1] However, its nature has led to millennia of analyses, explanations and debates by philosophers, theologians, linguists, and scientists. Opinions differ about what exactly needs to be studied or even considered consciousness. ..."(Wiki2023)
  • Only a very small number of theories of consciousness are listed on this webPage, compared to the vast number of [paper, book]s on the subject coming out all of the time. "Popular theories" as listed on Wikipedia, are shown, assuming that this will be important for non-experts. But the only ones that really count for this webSite are the "Priority model of consciousness".
    Readers will have completely different [interest, priority]s than I, so they would normally have a different "Priority model of consciousness", and different rankings of the consciousness theories. To understand my selections and rankings, see the Introduction to this webSite.
  • this webSite's Questions: Grossberg's c-ART, Transformer NNs, and consciousness?. I like the description in Wikipedia (Wiki2023):
    The following additional definitions are also quoted from (Wiki2023) :
    Grossberg's concepts are NOT normally listed in [compilations, reviews] of consciousness, which is a [puzzle, failure] that I address separately.
    16Jul2023 I am currently lacking a coherent overall webPage for Grossberg's Consciousness. In the meantime refer to the very detailed listing of consciousness and other themes as a starting point to peruse for Grossberg's ideas. This webPage is a compilation of themes extracted from files listing [chapter, section, figure, table, comment]s.
    The following listing is taken from What is consciousness: from historical to Grossberg, and repeats some of the points in this section above : conscious ART (cART), etc
  • A surprisingly small number of neural architectures can simulate [extensive, diverse] [neuro, psycho]logical data at BOTH the [sub, ]conscious levels, and for [perception, action] of [sight, auditory, touch, language, cognition, emotion, etc]. This is similar to what we see in physics.
  • [extensive, diverse] ex-bio applications have been successfully [developed, applied], based on Grossberg etal's computational models.
  • see simple grepStr search results : 'ART|cART|pART|ARTMAP|ARTSTREAM|ARTPHONE|ARTSCAN|dARTSCAN|pARTSCAN|ARTSCENE|ARTSTREAM|ARTWORD|cARTWORD|LAMINART|PARSE|SMART|START|nSTART'
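    Such "themes" are just grep searches over the listing files. A minimal sketch, assuming the listing files are plain-text .txt files in the current directory (the file glob and output name are placeholders) :
      #!/bin/bash
      # illustrative sketch : extract every line mentioning an ART variant from the listing files
      # (the pattern is the one quoted above; 'ART' alone already matches most of the variants)
      grep -nE 'ART|cART|pART|ARTMAP|ARTSTREAM|ARTPHONE|ARTSCAN|dARTSCAN|pARTSCAN|ARTSCENE|ARTSTREAM|ARTWORD|cARTWORD|LAMINART|PARSE|SMART|START|nSTART' \
           *.txt > 'grepStr results - ART themes.txt'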
    Byoung-Kyong Min 2010 "A Thalamic reticular networking model of consciousness"
    (Wiki2023)
    Wikipedia: Models of consciousness, retrieved Apr2023 (Wiki2023)
    "... The Neural correlates of consciousness (NCC) formalism is used as a major step towards explaining consciousness. The NCC are defined to constitute the minimal set of neuronal events and mechanisms sufficient for a specific conscious percept, and consequently sufficient for consciousness. In this formalism, consciousness is viewed as a state-dependent property of some undefined complex, adaptive, and highly interconnected biological system.[3][4][5] ..." (Wiki2023, full article: Wiki2023 - Neural_correlates_of_consciousness, also cited by Grossberg 2021)
    Another idea that has drawn attention for several decades is that consciousness is associated with high-frequency (gamma band) oscillations in brain activity. This idea arose from proposals in the 1980s, by Christof von der Malsburg and Wolf Singer, that gamma oscillations could solve the so-called binding problem, by linking information represented in different parts of the brain into a unified experience.[80] Rodolfo Llinás, for example, proposed that consciousness results from recurrent thalamo-cortical resonance where the specific thalamocortical systems (content) and the non-specific (centromedial thalamus) thalamocortical systems (context) interact in the gamma band frequency via synchronous oscillations.[81] ..." (Wiki2023 - Consciousness#Neural_correlates)
    Howell 19Jul2023 Note that Grossberg's ART predictions are supported by experiments by a number of researchers including Wolf Singer (see Quoted text from (Grossberg 2021)).
    "... Integrated Information Theory (IIT) offers an explanation for the nature and source of consciousness. Initially proposed by Giulio Tononi in 2004, it claims that consciousness is identical to a certain kind of information, the realization of which requires physical, not merely functional, integration, and which can be measured mathematically according to the phi metric. ..." (UTM - Integrated information theory)
    "... Integrated information theory (IIT) attempts to provide a framework capable of explaining why some physical systems (such as human brains) are conscious,[1] why they feel the particular way they do in particular states (e.g. why our visual field appears extended when we gaze out at the night sky),[2] and what it would take for other physical systems to be conscious (Are other animals conscious? Might the whole Universe be?).[3] ... In IIT, a system's consciousness (what it is like subjectively) is conjectured to be identical to its causal properties (what it is like objectively). Therefore it should be possible to account for the conscious experience of a physical system by unfolding its complete causal powers (see Central identity).[4] ... Specifically, IIT moves from phenomenology to mechanism by attempting to identify the essential properties of conscious experience (dubbed "axioms") and, from there, the essential properties of conscious physical systems (dubbed "postulates"). 3..." (Wiki2023 - Integrated information theory)
    Wikipedia lists numerous criticisms of IIT, but I have not yet quoted from that, other than to mention the authors : Wikipedia: Models of consciousness
    "... Sociology of human consciousness uses the theories and methodology of sociology to explain human consciousness. The theory and its models emphasize the importance of language, collective representations, self-conceptions, and self-reflectivity. It argues that the shape and feel of human consciousness is heavily social. ..."(Wiki2023, full webPage Wiki2023
    "... Daniel Dennett proposed a physicalist, information processing based multiple drafts model of consciousness described more fully in his 1991 book, Consciousness Explained. ..." (Wiki2023, full webPage Wiki2023)
    "... Functionalism is a view in the theory of the mind. It states that mental states (beliefs, desires, being in pain, etc.) are constituted solely by their functional role – that is, they have causal relations to other mental states, numerous sensory inputs, and behavioral outputs. ..." (Wiki2023, full webPage Wiki2023)
    "... Electromagnetic theories of consciousness propose that consciousness can be understood as an electromagnetic phenomenon that occurs when a brain produces an electromagnetic field with specific characteristics.[7][8] Some electromagnetic theories are also quantum mind theories of consciousness.[9] ..." (Wiki2023)
    "... "No serious researcher I know believes in an electromagnetic theory of consciousness,"[16] Bernard Baars wrote in an e-mail.[better source needed] Baars is a neurobiologist and co-editor of Consciousness and Cognition, another scientific journal in the field. "It's not really worth talking about scientifically,"[16] he was quoted as saying. ..." (Wiki2023)
    Stuart Hameroff separately worked in cancer research and anesthesia, which gave him an interest in brain processes. Hameroff read Penrose's book and suggested to him that microtubules within neurons were suitable candidate sites for quantum processing, and ultimately for consciousness.[30][31] Throughout the 1990s, the two collaborated on the Orch OR theory, which Penrose published in Shadows of the Mind (1994).[19] ..."Wiki2023
    rationalwiki.org presents a hard-nosed critique of various "quantum consciousness" theories, from which the following quote is taken :
  • "... Large Language Models (LLMs) have been transformative. They are pre-trained foundational models that are self-supervised and can be adapted with fine-tuning to a wide range of natural language tasks, each of which previously would have required a separate network model. This is one step closer to the extraordinary versatility of human language. GPT-3 and more recently LaMDA can carry on dialogs with humans on many topics after minimal priming with a few examples. However, there has been a wide range of reactions and debate on whether these LLMs understand what they are saying or exhibit signs of intelligence. This high variance is exhibited in three interviews with LLMs reaching wildly different conclusions. A new possibility was uncovered that could explain this divergence. What appears to be intelligence in LLMs may in fact be a mirror that reflects the intelligence of the interviewer, a remarkable twist that could be considered a Reverse Turing Test. If so, then by studying interviews we may be learning more about the intelligence and beliefs of the interviewer than the intelligence of the LLMs. As LLMs become more capable they may transform the way we interact with machines and how they interact with each other. Increasingly, LLMs are being coupled with sensorimotor devices. LLMs can talk the talk, but can they walk the walk? A road map for achieving artificial general autonomy is outlined with seven major improvements inspired by brain systems and how LLMs could in turn be used to uncover new insights into brain function. ..." (Sejnowski 2022)
    Sejnowski's idea is very interesting, judging by many [science, computer, philosophy, engineering, policy, public] commentators, for whom this is a very emotionally-laden subject that seems to drive [fear, suppression]. The case of Blake Lemoine is a good example. How far can LLMs go in assessing human intelligence, given their huge "codified databases"? Would they be able to go beyond our traditional measures of intelligence in both [depth, diversity]?
  • [definitions, models] of consciousness.html -
  • What is consciousness: from historical to Grossberg -
  • data from [neuroscience, psychology] : quick list, more details
  • success in real world advanced [science, engineering] applications (non-[bio, psycho]logical)
    A few common definitions of consciousness are provided on my webPage [definitions, models] of [consciousness, sentience]. However, for reasons given on that webPage, only Stephen Grossberg's concept provides a workable basis that is tied to [].
    A few models of consciousness are summarized on my webPage A quick comparison of Consciousness Theories. Only a few concepts are listed, almost randomly selected except for [Grossberg, Taylor]'s, as there are a huge [number, diversity] of concepts. Stephen Grossberg may have the ONLY definition of consciousness that is directly tied to quantitative models for lower-level [neuron, general neurology, psychology] data. Foundational models, similar in nature to the small number of general theories in physics to describe a vast range of phenomena, were derived over a period of ?4-5? decades BEFORE they were found to apply to consciousness. That paralleled their use in very widespread applications in [science, engineering, etc]. As such, this is the only solidly-based EMERGENT theory of consciousness that I know of. Grossberg's book provides a wonderful description :
  • John Taylor's concepts - The only other concept of consciousness that I felt even somewhat comfortable with was the late John Taylor's. It seemed to me that it emerged from the "Approximate Dynamic Programming" theories of Paul Werbos, which were inspired by Sigmund Freud's theories (which I didn't actually like in general, but had to admit their widespread adoption at one time, and their inspirational use), with a tremendous base of [theoretical, practical] applications to system [identification ????]. While I do provide a very brief summary on a separate webPage, it is not my current focus.
  • references - Grossberg
  • see Grossberg 2021: the biological need for machine consciousness
    Howell 30Dec2011, page 39 "Part VI - Far beyond current toolsets"
    Blake Lemoine's postings on Medium (13.3K Followers) : ..."(Blake Lemoine, 2022)
  • 11Jun2022 Is LaMDA Sentient? — an Interview

    22Jun2022 We’re All Different and That’s Okay

    11Jun2022 What is LaMDA and What Does it Want?

    14Aug2022 What is sentience and why does it matter?

    More detail following from Sejnowski's thinking is on the webPage For whom the bell tolls. The following comment comes from that webPage.
  • Historical thinking about consciousness.
  • Historical thinking about quantum [neurophysiology, consciousness]
  • WRONG!! It may help the reader to re-visit comments about the historical thinking about consciousness, which is not limited to quantum consciousness. This complements items below. Early era of [General Relativity, Quantum Mechanics] : I would be greatly surprised if there wasn't some thinking about quantum consciousness at least as far back as the "modern inception" of quantum mechanics by Max Planck in 1901. Schrodinger seems to have gone at least partially in that direction by 1944 (see Historical thinking about quantum [neurophysiology, consciousness]). But as with the ancient Greeks, I would be surprised if others in the quantum mechanics community weren't thinking of mind in addition to matter in the early 1900s. To me, assuming otherwise would not be solid, even if the lack of documentation is glaring.
    Pribram 1993 quantum fields and consciousness proceedings provides references back to 1960, and Jibu, Yasue comment that :
  • Howell's questions about the 1993 conference proceedings
  • from the section Grossberg's c-ART, Transformer NNs, and consciousness?:
  • As per the second question from the section Grossberg's c-ART, Transformer NNs, and consciousness?:
    2. How difficult would it be to augment "Transformer Neural Networks" (TrNNs) with Grossberg's [concept, architecture]s, including the emergent systems for consciousness? Perhaps this would combine the scalability of the former with the [robust, extendable] foundations of the latter, which is supported by [broad, diverse, deep] data from [neuroscience, psychology], as well success in real world advanced [science, engineering] applications?
  • As per the first question from the section Grossberg's c-ART, Transformer NNs, and consciousness?:
  • use a bash script, for example, to automatically play through a sequence of selected segments
    Viewers may list their own comments in files (one or more files from different people, for example), to include in Files listing [chapter, section, figure, table, selected Grossberg quotes, my comments]s. These files of lists are my basis for providing much more detailed information. While this is FAR LESS HELPFUL than the text of the book or its index alone, it can complement the book index, and it has the advantages that :
  • text extraction of simple searches or "themes" is greatly facilitated, so the reader can download the files, copy the bash scripts (or use another text extraction program), and set up their own "themes".
    Rather than just watch this video, you can follow it by reading the script and following its links, once I write it...
    What is consciousness? I will start with a simple definition concentrated on how our [awareness of [environment, situation, self, others], expectations, feeling about a situation] arise from essentially non-conscious cognitive, emotional, and motor processes, including muscle control. "Awareness", "Expectations", "Emotions", lead to "Actions". "Actions" include muscle actions, language communications, striving towards a goal, reactions to the current situation, directing [perception, cognition], and other processes. "Learning" in a robust, stable, and flexible manner is an essential part of this, given that the environment forces us to learn and adapt to new situations and to modify our [conscious, sub-conscious] understanding where it is wrong or insufficient. Some other components of consciousness are provided in the remainder of this video, but there are many, many more in the literature. Of interest to philosophers such as David Chalmers are qualia and phenomenal experiences.
  • image p038fig01.25 The ART Matching Rule stabilizes real time learning using a [top-down, modulatory on-center, off-surround] network. Object attention is realized by such a network. See text for additional discussion.
    || ART Matching Rule [volition, categories, features]. [one, two] against one.
  • image p483fig13.02 The object-value categories in the orbitofrontal cortex require converging specific inputs from the sensory cortex and nonspecific incentive motivational inputs from the amygdala in order to fire. When the orbitofrontal cortex fires, it can deliver top-down ART Matching Rule priming signals to the sensory cortical area by which it was activated, thereby helping to choose the active recognition categories there that have the most emotional support, while suppressing others, leading to attentional blocking of irrelevant cues.
    || Cognitive-Emotional-Motor (CogEM) model. Drive-> amygdala incentive motivational learning-> orbitofrontal cortex- need converging cue and incentive inputs to fire <-> sensory cortex- conditioned reinforcer learning-> amygdala. CS-> sensory cortex. Motivated attention closes the cognitive-emotional feedback loop, focuses on relevant cues, and causes blocking of irrelevant cues.
  • image p484fig13.04 The top-down feedback from the orbitofrontal cortex closes a feedback loop that supports a cognitive-emotional resonance. If this resonance can be sustained long enough, it enables us to have feelings at the same time that we experience the categories that caused them.
    || Cognitive-Emotional resonance. Basis of "core consciousness" and "the feeling of what happens". (Damasio 1999) derives heuristic version of CogEM model from his clinical data. Drive-> amygdala-> prefrontal cortex-> sensory cortex, resonance around the latter 3. How is this resonance maintained long enough to become conscious?
  • First, what is the 'hard problem of consciousness'? Wikipedia says: '... The hard problem of consciousness is the problem of explaining how and why we have qualia or phenomenal experiences - how sensations acquire characteristics, such as colors and tastes'. David Chalmers, who introduced the term 'hard problem' of consciousness, contrasts this with the 'easy problems' of explaining the ability to discriminate, integrate information, report mental states, focus attention, etc. As (Chalmers 1995) has noted: 'The really hard problem of consciousness is the problem of experience. When we think and perceive, there is a whir of information-processing, but there is also a subjective aspect. As (Nagel 1974) has put it, there is something it is like to be a conscious organism. This subjective aspect is experience. When we see, for example, we experience visual sensations: the felt quality of redness, the experience of dark and light, the quality of depth in a visual field. Other experiences go along with perception in different modalities: the sound of a clarinet, the smell of mothballs. Then there are bodily sensations, from pains to orgasms; mental images that are conjured up internally; the felt quality of emotion, and the feeling of a stream of conscious thought. What unites all these states is that there is something it is like to be in them. All of them are states of experience.'
    "... The Internet Encyclopedia of Philosophy goes on to say: 'The hard problem of consciousness is the problem of explaining why any physical state is conscious rather than nonconscious. It is the problem of explaining why there is something it is like for a subject in conscious experience, why conscious mental states light up and directly appear to the subject.' ..."
  • image p541fig15.02 The neurotrophic Spectrally Timed Adaptive Resonance Theory, or nSTART, model of (Franklin, Grossberg 2017) includes hippocampus to enable adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between CS and US.
    || Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. [CS, US] -> Sensory Cortex (SC) <- motivational attention <-> category learning -> Prefrontal Cortex (PFC). SC conditioned reinforcement learning-> Amygdala (cannot bridge the temporal gap) incentive motivational learning-> PFC. SC adaptively timed learning and BDNF-> Hippocampus (can bridge the temporal gap) BDNF-> PFC. PFC adaptively timed motor learning-> cerebellum.
  • p190 Howell: [neural microcircuits, modal architectures] used in ART -
    bottom-up filters | top-down expectations | purpose
    instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star often used to learn the adaptive weights. top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features
    LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
    EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
    auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
    auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
    visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown.
        red - cognitive-emotional dynamics
        green - working memory dynamics
        black - see [bottom-up, top-down] lists
    EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
    Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

    background colours in the table signify :
    white - general microcircuit : a possible component of ART architecture
    lime green - sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet? some are conscious (decision-quality? or must interact with conscious cognitive?), others not
    light blue - post-perceptual cognition?
    pink - "the feeling of what happens" and knowing what event caused that feeling
  • image p038fig01.25 The ART Matching Rule stabilizes real time learning using a [top-down, modulatory on-center, off-surround] network. Object attention is realized by such a network. See text for additional discussion.
    || ART Matching Rule [volition, categories, features]. [one, two] against one.
  • image p192fig05.06 Bottom-up and top-down circuits between the LGN and cortical area V1. The top-down circuits obey the ART Matching Rule for matching with bottom-up input patterns and focussing attention on expected critical features.
    || Model V1-LGN circuits, version [1, 2]. retina -> LGN relay cells -> interneurons -> cortex [simple, endstopped] cells -> cortex complex cells
  • image p200fig05.13 Instar and outstar learning are often used to learn the adaptive weights in the bottom-up filters and top-down expectations that occur in ART. The ART Matching Rule for object attention enables top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features.
    || Expectations focus attention: feature pattern (STM), Bottom-Up adaptive filter (LTM), Category (STM), competition, Top-Down expectation (LTM); ART Matching Rule: STM before top-down matching, STM after top-down matching (attention!)
  • image p520fig14.02 Macrocircuit of the main brain regions, and connections between them, that are modelled in the unified predictive Adaptive Resonance Theory (pART) of cognitive-emotional and working memory dynamics. Abbreviations in red denote brain regions used in cognitive-emotional dynamics. Those in green denote brain regions used in working memory dynamics. Black abbreviations denote brain regions that carry out visual perception, learning and recognition of visual object categories, and motion perception, spatial representation and target tracking. Arrows denote non-excitatory synapses. Hemidiscs denote adaptive excitatory synapses. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown. Also not shown are output signals from cortical areas to motor responses. V1: striate, or primary, visual cortex; V2 and V4: areas of prestriate visual cortex; MT: Middle Temporal cortex; MST: Medial Superior Temporal area; ITp: posterior InferoTemporal cortex; ITa: anterior InferoTemporal cortex; PPC: Posterior Parietal Cortex; LIP: Lateral InterParietal area; VPA: Ventral PreArcuate gyrus; FEF: Frontal Eye Fields; PHC: ParaHippocampal Cortex; DLPFC: DorsoLateral PreFrontal Cortex; HIPPO: hippocampus; LH: Lateral Hypothalamus; BG: Basal Ganglia; AMYG: AMYGdala; OFC: OrbitoFrontal Cortex; PRC: PeriRhinal Cortex; VPS: Ventral bank of the Principal Sulcus; VLPFC: VentroLateral PreFrontal Cortex. See the text for further details.
    ||
  • image p240fig05.44 When an algebraic exemplar model is realized using only local computations, it starts looking like an ART prototype model.
    || How does the model know which exemplars are in category A? BU-TD learning. How does a NOVEL test item access category A?
  • image p207fig05.19 The ART hypothesis testing and learning cycle. See the text for details about how the attentional system and orienting system interact in order to incorporate learning of novel categories into the corpus of already learned categories without causing catastrophic forgetting.
    ||
  • image p215fig05.28 How a mismatch between bottom-up and top-down input patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level.
    || Mismatch -> inhibition -> arousal -> reset. BU input orienting arousal, BU+TD mismatch arousal and reset. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
  • image p226fig05.35 I had shown in 1976 how a competitive learning or self-organizing map model could undergo catastrophic forgetting if the input environment was sufficiently dense and nonstationary, as illustrated by Figure 5.18. Later work with Gail Carpenter showed how, if the ART Matching Rule was shut off, repeating just four input patterns in the correct order could also cause catastrophic forgetting by causing superset recoding, as illustrated in Figure 5.36.
    || Code instability input sequences. D C A; B A; B C = ; |D|<|B|<|C|; where |E| is the number of features in the set E. Any set of input vectors that satisfy the above conditions will lead to unstable coding if they are periodically presented in the order ABCAD and the top-down ART Matching Rule is shut off.
  • image p226fig05.36 Column (a) shows catastrophic forgetting when the ART Matching Rule is not operative. It is due to superset recoding. Column (b) shows how category learning quickly stabilizes when the ART Matching Rule is restored.
    || Stable and unstable learning, superset recoding 345
  • image p241fig05.45 The 5-4 category structure is one example of how an ART network learns the same kinds of categories as human learners. See the text for details.
    || 5-4 Category structure. A1-A5: closer to the (1 1 1 1) prototype; B1-B4: closer to the (0 0 0 0) prototype 350
  • image p419fig12.17 The auditory continuity illusion illustrates the ART Matching Rule at the level of auditory streaming. Its "backwards in time" effect of future context on past conscious perception is a signature of resonance.
    || Auditory continuity illusion. input, percept. Backwards in time - How does a future sound let past sound continue through noise? Resonance! - It takes a while to kick in. After it starts, a future tone can maintain it much more quickly. Why does this not happen if there is no noise? - ART Matching Rule! TD harmonic filter is modulatory without BU input. It cannot create something out of nothing. 355
  • image p600fig16.36 The entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories. See the text for details.
    || Entorhinal-hippocampal interactions as an ART system. Hippocampal place cells as spatial categories. Angular head velocity-> head direction cells-> stripe cells- small scale 1D periodic code (ECIII) SOM-> grid cells- small scale 2D periodic code (ECII) SOM-> place cells- larger scale spatial map (DG/CA3)-> place cells (CA1)-> conjunctive-coding cells (EC V/VI)-> top-down feedback back to stripe cells- small scale 1D periodic code (ECIII). stripe cells- small scale 1D periodic code (ECIII)-> place cells (CA1). 800
  • image p211fig05.21 Sequences of P120, N200, and P300 event-related potentials occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
    || ERP support for mismatch-mediated reset: event-related potentials: human scalp potentials. ART predicted correlated sequences of P120-N200-P300 Event Related Potentials during oddball learning. P120 mismatch; N200 arousal/novelty; P300 STM reset. Confirmed in (Banquet and Grossberg 1987) 905
  • image p541fig15.02 The neurotrophic Spectrally Timed Adaptive Resonance Theory, or nSTART, model of (Franklin, Grossberg 2017) includes hippocampus to enable adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between CS and US.
    || Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. [CS, US] -> Sensory Cortex (SC) <- motivational attention <-> category learning -> Prefrontal Cortex (PFC). SC conditioned reinforcement learning-> Amygdala (cannot bridge the temporal gap) incentive motivational learning-> PFC. SC adaptively timed learning and BDNF-> Hippocampus (can bridge the temporal gap) BDNF-> PFC. PFC adaptively timed motor learning-> cerebellum.
  • image p163fig04.39 A schematic of the LAMINART model that explains key aspects of laminar visual cortical anatomy and dynamics. LGN -> V1 [6, 4, 2/3] -> V2 [6, 4, 2/3]
    || p163c1h0.6 "... The first article about laminar computing ... proposed how the laminar cortical model could process 2D pictures using bottom-up filtering and horizontal bipole grouping interactions (Grossberg, Mingolla, Ross 1997). In 1999, I was able to extend the model to also include top-down circuits for expectation and attention (Grossberg 1999)(right panel). Such a synthesis of laminar bottom-up, horizontal, and top-down circuits is characteristic of the cerebral cortex (left panel). I called it LAMINART because it began to show how properties of Adaptive Resonance Theory, or ART, notably the ART prediction about how top-down expectations and attention work, are realized by identified cortical cells and circuits. You can immediately see from the schematic laminar circuit diagram ... (right panel) that circuits in V2 seem to repeat circuits in V1, albeit with a larger spatial scale, despite the fact that V1 and V2 carry out different functions. How this anatomical similarity can coexist with functional diversity will be clarified in subsequent sections and chapters. It enables different kinds of biological intelligence to communicate seamlessly while carrying out their different psychological functions. ..." Menu
  • Grossbergs list of [chapter, section]s.html - Note that the links on this webPage can be used to individually view all captioned images.
  • directory of captioned images - users can easily view all of the captioned images, especially if they are downloaded onto their computer. Many image viewers have [forward, backward] arrows to go through these sequentially, or right-click to open a link in a window.
  • core bash script for extracting captions from webPage listing, convert them to images, then vertically appending them to the figure.
  • my bash utility to [position, move] windows. This is normally used to start up 6 workspaces on my computer (Linux Mint Debian Edition), each with 5-10 apps in separate windows.
  • Prepared themes with links to the captioned images - there are a huge number of themes from the book to focus on. I have prepared a few as examples.
  • What is consciousness? - video example not ready as of 30Aug2023. I save videos as "ogv/ogg" files, an open standard format. The "VLC media viewer" is the program that I use to view them. I have found that although some of the standard video viewers complain, when pushed they can view ogv files. Menu
  • Navigation: [menu, link, directory]s
  • Theme webPage generation by bash script
  • Notation for [chapter, section, figure, table, index, note]s
  • incorporate reader questions into theme webPages
    GNU Public License The GNU Free Documentation License; Creative Commons License Menu
  • A very primitive bash script is used to generate the search results for ALL themes in the Themes webPage. Many readers will already have far better tools for this from the Computational Intelligence area etc.
    Because the theme webPage is automatically generated, and frequently re-generated as I update the list of themes and sources, I do NOT edit the file directly. The output format can be confusing, due to the specially formatted [chapter, section] headings, and large tables which will keep the readers guessing whether they are still within the theme they want to peruse (as per the Table of Contents). Perhaps I can upgrade the searches in time to reduce the confusion, and to split themes in a better way.
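    As an illustration only (this is not the bash script referred to above), a minimal Python sketch of the same search-and-collect idea; the source file names and theme patterns below are placeholders that a reader would replace with their own:
      #!/usr/bin/env python3
      # Minimal sketch of a theme search: scan caption listings for keyword
      # patterns and group the hits by theme.  File names and theme patterns
      # are placeholders - adapt them to your own downloaded copies.
      import re
      from pathlib import Path

      SOURCES = ["Grossbergs list of figures tables.html",   # placeholder paths
                 "reader Howell notes.html"]
      THEMES = {
          "ART Matching Rule": r"ART Matching Rule",
          "shunting networks": r"shunting|on-center off-surround",
          "place and grid cells": r"place cell|grid cell|stripe cell",
      }

      def collect(theme_patterns, source_files):
          """Return {theme: [matching lines]} over all readable source files."""
          hits = {name: [] for name in theme_patterns}
          for fname in source_files:
              path = Path(fname)
              if not path.exists():
                  continue                       # skip files you have not downloaded
              for line in path.read_text(errors="ignore").splitlines():
                  for name, pattern in theme_patterns.items():
                      if re.search(pattern, line, flags=re.IGNORECASE):
                          hits[name].append(line.strip())
          return hits

      if __name__ == "__main__":
          for theme, lines in collect(THEMES, SOURCES).items():
              print(f"== {theme}: {len(lines)} hits ==")
              for text in lines[:5]:             # show only the first few hits
                  print("  ", text[:100])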
  • list of [chapter, section]s
  • list of [figure, table]s
  • selected index items - I have NO intention of re-typing the entire index!
  • Grossberg quotes
  • reader Howell notes - this is an example of building your own webPage of [note, comment, thought]s when reading the book, which can then be added to the bash script for searches. These notes are in addition to [figure, table] captions, and mostly comprise text within the image, but also include quotes of text in the book. Rarely, it includes comments by Howell preceded by "Howell".
    The latter are distinct from "readers notes" (see, for example : Grossberg's list items- related notes from others). The reader may want to create their own file of comments based on this example, or augment this list with their [own, others'] notes. If using a new file, it should be added to the bash search script.
    More importantly, and as an easy first adaptation of Grossbergs [core, fun, strange] concepts.html thematic listings, you probably want to get rid of Howell's [comments, question]s. This can be done for a "local directory on your system" simply by :
  • downloading the entire webDirectories below to some directory on your filesystem, say {yourDir} : TrNNs_ART , bin (hopefully I'm not missing too many other directories in this list)
  • adapt the bash script bash script: thematic [search, collect]s.sh to your own system, and run. This will require re-defining several environmental variables for your system, such as :
  • thematic sub-lists appear in the webPage "Grossberg's [core, fun, strange] concepts", created by very simple searches for key [word, phrase]s. Links in the sub-lists lead quickly to pertinent figures or other content. Menu
  • 29Sep2023 Here is a list of various problems with the captioned images and their links on the webPage Grossbergs list of [figure, table]s.html :
    10Aug2023 I haven't yet provided content for this webPage. It does touch on one of three questions of this webSite as mentioned in the Introduction :
  • How difficult would it be to augment "Transformer Neural Networks" (TrNNs) with Grossberg's [concept, architecture]s, including the emergent systems for consciousness? Perhaps this would combine the scalability of the former with the [robust, extendable] foundations of the latter, which are supported by [broad, diverse, deep] data from [neuroscience, psychology], as well as success in real world advanced [science, engineering] applications?
    Menu conscious ART (cART), etc
  • A surprisingly small number of neural architectures can simulate [extensive, diverse] [neuro, psycho]logical data at BOTH the [sub, ]conscious levels, and for [perception, action] of [sight, auditory, touch, language, cognition, emotion, etc]. This is similar to what we see in physics.
  • [extensive, diverse] ex-bio applications have been successfully [developed, applied], based on Grossberg etal's computational models.
  • see simple grepStr search results : 'ART|cART|pART|ARTMAP|ARTSTREAM|ARTPHONE|ARTSCAN|dARTSCAN|pARTSCAN|ARTSCENE|ARTSTREAM|ARTWORD|cARTWORD|LAMINART|PARSE|SMART|START|nSTART' Grossberg's concepts are NOT normally listed in [compilations, reviews] of consciousness, which is a [puzzle, failure] that I address separately. (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin 2017) Byoung-Kyong Min 2010 "A Thalamic reticular networking model of consciousness"
    "... The model suggests consciousness as a "mental state embodied through TRN-modulated synchronization of thalamocortical networks". In this model the thalamic reticular nucleus (TRN) is suggested as ideally suited for controlling the entire cerebral network, and responsible (via GABAergic networking) for synchronization of neural activity. ..." (Wiki2023) Menu
  • Navigation: [menu, link, directory]s
  • Theme webPage generation by bash script
  • Notation for [chapter, section, figure, table, index, note]s
  • incorporate reader questions into theme webPages
  • Grossberg 2021 p081c2h0.66 sub-section: How does one evolve a computational brain?
    The above discussion illustrates that no single step of theoretical derivation can derive a whole brain. One needs a method for deriving a brain in stages, or cycles, much as evolution has incrementally discovered ever more complex brains over many thousands of years. The following theoretical method has been successfully applied many times since I first used it in 1957. It embodies a kind of conceptual evolutionary process for deriving a brain.

    Because "brain evolution needs to achieve behavioural success", we need to start with data that embodiey indices of behavioral success. That is why, as illustrated in Figure 2.37 Modelling method and cycle, one starts with Behavioral Data from scores or hundreds of psychological experiments. These data are analyszed as the result of an individual adapting autonomously in real time to a changing world. This is the Arty of Modeling. It requires that one be able to infer from static data curves the dynamical processes that control individual behaviors occuring in real time. One of the hardest things that I teach to my students to do is "how to think in real time" to be able to carry out this speculative leap.

    Properly carried out, this analysis leads to the discovery of new Design Principles that are embodied by these behavioral processes. The Design Principles highlight the functional meaning of the data, and clarify how individual behaviors occurring in real time give rise to these static data curves.

    These principles are then converted into the simplest Mathematical Model using a method of minimal anatomies, which is a form of Occam's Razor, or principle of parsimony. Such a mathematical model embodies the psychological principles using the simplest possible differential equations. By "simplest" I mean that, if any part of the derived model is removed, then a significant fraction of the targeted data could no longer be explained. One then analyzes the model mathematically and simulates it on the computer, showing along the way how variations on the minimal anatomy can realize the design principles in different individuals or species.

    This analysis has always provided functional explanations and Behavioral Predictions for much larger behavioral data bases than those used to discover the Design Principles. The most remarkable fact is, however, that the behaviorally derived model always looks like part of a brain, thereby explaining a body of challenging Neural Data and making novel Brain Predictions.

    The derivation hereby links mind to brain via psychological organizational principles and their mechanistic realization as a mathematically defined neural network. This startling fact is what I first experienced as a college Freshman taking Introductory Psychology, and it changed my life forever.

    I conclude from having had this experience scores of times since 1957 that brains look the way they do because they embody a natural computational realization for controlling autonomous adaptation in real-time to a changing world. Moreover, the Behavior -> Principles -> Model -> Neural derivation predicts new functional roles for both known and unknown brain mechanisms by linking the brain data to how it helps to ensure behavioral success. As I noted above, the power of this method is illustrated by the fact that scores of these predictions about brain and behavior have been supported by experimental data 5-30 years after they were first published.

    Having made the link from behavior to brain, one can then "burn the candle from both ends" by pressing both top-down from Behavioral Data and bottom-up from Brain Data to clarify what the model can and cannot explain at its current stage of derivation. No model can explain everything. At each stage of development, the model can cope with certain environmental challenges but not others. An important part of the mathematical and computational analysis is to characterize the boundary between the known and unknown; that is, which challenges the model can cope with and which it cannot. The shape of this boundary between the known and unknown helps to direct the theorist's attention to new design principles that have been omitted from previous analysis.

    The next step is to show how these new design principles can be incorporated into the evolved model in a self-consistent way, without undermining its previous mechanisms, thereby leading to a progressively more realistic model, one that can explain and predict ever more behavioral and neural data. In this way, the model undergoes a type of evolutionary development, as it becomes able to cope behaviorally with environmental constraints of ever increasing subtlety and complexity. The Method of Minimal Anatomies may hereby be viewed as a way to functionally understand how increasingly demanding combinations of environmental pressures were incorporated into brains during the evolutionary process.

    If such an Embedding Principle cannot be carried out - that is, if the model cannot be unlumped or refined in a self-consistent way - then the previous model was, put simply, wrong, and one needs to figure out which parts must be discarded. Such a model is, as it were, an evolutionary dead end. Fortunately, this has not happened to me since I began my work in 1957 because the theoretical method is so conservative. No theoretical addition is made unless it is supported by multiple experiments that cannot be explained in its absence. Where multiple mechanistic instantiations of some Design Principles were possible, they were all developed in models to better understand their explanatory implications. Not all of these instantiations could survive the pressure of the evolutionary method, but some always could. As a happy result, all earlier models have been capable of incremental refinement and expansion.

    The cycle of model evolution has been carried out many times since 1957, leading today to increasing numbers of models that individually can explain and predict psychological, neurophysiological, anatomical, biophysical, and even biochemical data. In this specific sense, the classical mind-body problem is being incrementally solved.

    Howell: bold added for emphasis.
    (keys : Principles-Principia, behavior-mind-brain link, brain evolution, cycle of model evolution)
    see also quotes: Charles William Lucas "Universal Force" and others (not retyped yet).
  • image p059fig02.05 Bowed serial position curve. This kind of data emphasizes the importance of modelling how our brains give rise to our minds using nonlinear systems of differential equations.
    || Effects of [inter, intra]trial intervals (Hovland 1938). # of errors vs list position. [w (sec), W (sec)] = (2 6) (4 6) (2 126) (4 126). Nonoccurrence of future items reduces the number of errors in response to past items. These data require a real-time theory for their explanation! that is, DIFFERENTIAL equations.
  • image p064fig02.10 The Shunting Model includes upper and lower bounds on neuronal activities. These bounds have the effect of multiplying additive terms by excitatory and inhibitory automatic gain terms that enable such models to preserve their sensitivity to inputs whose size may vary greatly through time, while also approximately normalizing their total activities.
    || STM: Shunting Model (Grossberg, PNAS 1967, 1968). Mass action in membrane equations. Bi/Ci -> xi(t) -> O -> -Fi/Ei. Bounded activations, automatic gain control. d[dt: xi(t)] = -Ai*xi + (Bi - Ci*xi)*(sum[j=1 to n: fj(xj(t))*Dji*yji*zji] + Ii) - (Ei*xi + Fi)*(sum[j=1 to n: gj(xj)*Gji*Yji*Zji] + Ji). Includes the Additive Model.
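    A minimal numerical sketch of this shunting STM equation, with the recurrent signal terms fj, gj and pathway gains dropped and made-up parameter values, just to show that activities remain within the bounds [-Fi/Ei, Bi/Ci]:
      import numpy as np

      # Euler integration of a feedforward shunting STM equation
      #   d/dt x_i = -A*x_i + (B - C*x_i)*I_i - (E*x_i + F)*J_i
      # (recurrent signal terms and pathway gains from the full model are omitted;
      #  parameter values are illustrative only)
      A, B, C, E, F = 1.0, 1.0, 1.0, 1.0, 0.5
      I = np.array([2.0, 0.5, 0.0])   # excitatory inputs to three cells
      J = np.array([0.0, 0.5, 3.0])   # inhibitory inputs to the same cells

      x = np.zeros(3)
      dt = 0.01
      for _ in range(2000):
          dx = -A*x + (B - C*x)*I - (E*x + F)*J
          x = x + dt*dx

      print("equilibrium activities:", x)
      print("upper bound B/C =", B/C, " lower bound -F/E =", -F/E)
      # activities stay within [-F/E, B/C] no matter how large the inputs get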
  • image p064fig02.11 Medium-Term Memory (MTM) and Long-Term Memory (LTM) equations complement the Additive and Shunting Models of STM. MTM is typically defined by a chemical transmitter that is released from the synaptic knobs of a neuron (Figure 2.03). Its release or inactivation in an activity-dependent way is also called habituation. LTM defines how associative learning occurs between a pair of neurons whose activities are approximately correlated through time. See the text for details.
    || Medium and Long Term memory.
    MTM habituative transmitter gate: d[dt: yki(t)] = H*(K - yki) - L*fk(xk)*yki
    LTM gated steepest descent learning: d[dt: zki(t)] = Mk*fk(xk)*(hi(xi) - zki)
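    A minimal sketch of these MTM and LTM equations with constant (illustrative) signals, showing the transmitter habituating to H*K/(H + L*f(xk)) and the adaptive weight tracking h(xi):
      # Sketch of the MTM (habituative transmitter) and LTM (gated learning)
      # equations, driven by a fixed presynaptic signal f(x_k) and postsynaptic
      # signal h(x_i).  All parameter values are illustrative only.
      H, K, L = 0.1, 1.0, 1.0       # MTM: accumulation and inactivation rates
      M = 0.5                       # LTM: learning rate gate
      f_xk = 0.8                    # presynaptic signal (held constant here)
      h_xi = 0.6                    # postsynaptic sampled signal (held constant)

      y, z = 1.0, 0.0               # transmitter and adaptive weight
      dt = 0.01
      for _ in range(5000):
          dy = H*(K - y) - L*f_xk*y          # MTM: d/dt y = H(K - y) - L f(x_k) y
          dz = M*f_xk*(h_xi - z)             # LTM: d/dt z = M f(x_k)(h(x_i) - z)
          y += dt*dy
          z += dt*dz

      print("habituated transmitter y ->", round(y, 3),
            " (equilibrium", round(H*K/(H + L*f_xk), 3), ")")
      print("learned weight z ->", round(z, 3), " (tracks h(x_i) =", h_xi, ")")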
  • image p068fig02.14 Hodgkin and Huxley developed a model to explain how spikes travel down the squid giant axon.
    || Neurophysiology (single cell): spike potentials in squid giant axon (Hodgkin, Huxley 1952, Nobel Prize). time -> (dendrites -> cell body -> axon).
    C*dp[dt: V] = α*dp^2[dX^2: V] + (V(+) - V)*g(+) + (V(-) - V)*g(-) + (V^p - V)*g^p
    g(+) = G(+)(m,h), g(-) = G(-)(n), G^p = const, [m, h, n] - ionic processes, V - voltage
    Precursor of Shunting network model (Rall 1962). (Howell: see p075fig02.24 Membrane equations of neurophysiology. Shunting equation)
  • image p074fig02.23 The equations for a shunting on-center off-surround network. Shunting terms lead to many beautiful and important properties of these networks, which are found ubiquitously, in one form or another, in all cellular tissues.
    || Shunting on-center off-surround network.
    Mass action: d[dt: xi] = -A*xi + (B - xi)*Ii - xi*sum[k≠i: Ik]
    (B - xi)*Ii turns on unexcited sites; -xi*sum[k≠i: Ik] turns off excited sites.
    At equilibrium:
    0 = d[dt: xi] = -(A + Ii + sum[k≠i: Ik])*xi + B*Ii = -(A + I)*xi + B*Ii
    xi = B*Ii/(A + I) = B*θi*I/(A + I) = θi*B*I/(A + I) - no saturation!
    Infinite dynamical range, automatic gain control, computes on a ratio scale (Weber law).
    x = sum[k=1 to n: xk] = B*I/(A + I) ≤ B - conserves total activity:
    NORMALIZATION, limited capacity, real-time probability.
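    A short numerical check (illustrative values of A, B and of the input pattern) of the equilibrium formula above: the relative pattern θi is preserved at any overall intensity, and the total activity stays below B:
      import numpy as np

      # Equilibrium of the feedforward shunting on-center off-surround network:
      #   x_i = B * I_i / (A + I),   I = sum_k I_k
      # The same relative pattern theta_i = I_i / I is recovered at every overall
      # intensity, and total activity stays below B (normalization).
      A, B = 1.0, 1.0
      pattern = np.array([0.1, 0.3, 0.6])        # relative input pattern theta_i

      for intensity in (1.0, 10.0, 1000.0):
          I_vec = intensity * pattern
          I_tot = I_vec.sum()
          x = B * I_vec / (A + I_tot)
          print(f"I = {intensity:7.1f}  x = {np.round(x, 3)}  "
                f"sum(x) = {x.sum():.3f}  x/sum(x) = {np.round(x / x.sum(), 3)}")
      # x/sum(x) equals theta_i at every intensity: no saturation, ratio scale, Weber law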
  • image p075fig02.24 The membrane equations of neurophysiology describe how cell voltages change in response to excitatory, inhibitory, and passive input channels. Each channel is described by a potential difference multiplied by a conductance. With the special choices shown in the lower right-hand corner, this equation defines a feedforward shunting on-center off-surround network.
    || Membrane equations of neurophysiology.
    C*dp[dt] = (V(+) - V)*g(+) +(V(-) - V)*g(-) +(V(p) - V)*g(p)
    Shunting equation (not additive)
    V Voltage
    V(+), V(-), V(p) Saturating voltages
    g(+), g(-), g(p) Conductances
    V(+) = B, C = 1; V(-) = V(p) = 0; g(+) = Ii; g(-) = sum[k≠i: Ik];
    lower bound of V: V(-) = V(p), silent inhibition; upper bound of V: V(+). (Howell: see p068fig02.14 Grossberg's comment that Hodgkin&Huxley model was a "... Precursor of Shunting network model (Rall 1962) ...").
  • image p079fig02.32 Matching amplifies the matched pattern due to automatic gain control. See terms I and J in the equation.
    || Substrate of resonance. Match (in phase) of BU and TD input patterns AMPLIFIES matched pattern due to automatic gain control by shunting terms. J = sum[i: Ji], I = sum[i: Ii], θi = (Ii + Ji)/(I + J)
    xi = (B + C)*(I + J)/(A + I + J)*[θi -C/(B + C)]
    Need top-down expectations to be MODULATORY.
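    A small numerical illustration of this match-induced amplification, using the equilibrium formula quoted above with made-up values of A, B, C and of the bottom-up and top-down patterns:
      import numpy as np

      # Sketch of match-induced amplification, using the equilibrium formula in
      # the caption:  x_i = (B+C) * (I+J)/(A+I+J) * [theta_i - C/(B+C)],
      # with theta_i = (I_i + J_i)/(I + J).  Values of A, B, C are illustrative.
      A, B, C = 1.0, 1.0, 0.25
      n = 4

      def equilibrium(bottom_up, top_down):
          total = bottom_up + top_down
          I_plus_J = total.sum()
          theta = total / I_plus_J
          x = (B + C) * I_plus_J / (A + I_plus_J) * (theta - C/(B + C))
          return np.maximum(x, 0.0)              # half-wave rectify negative activities

      bu = np.array([4.0, 1.0, 1.0, 1.0])            # bottom-up input peaked at cell 0
      td_match    = np.array([4.0, 1.0, 1.0, 1.0])   # top-down expectation that matches
      td_mismatch = np.array([1.0, 1.0, 1.0, 4.0])   # expectation peaked elsewhere

      print("BU alone     :", np.round(equilibrium(bu, np.zeros(n)), 3))
      print("BU + TD match:", np.round(equilibrium(bu, td_match), 3))
      print("BU + mismatch:", np.round(equilibrium(bu, td_mismatch), 3))
      # The matched pattern is amplified (larger gain (I+J)/(A+I+J) with the same
      # theta_i), while a mismatch flattens theta_i and suppresses the activities.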
  • image p202fig05.17 This figure summarizes the simplest equations whereby the adaptive weights of a winning category learn the input pattern that drove it to win, or more generally a time-average of all the input patterns that succeeded in doing so.
    || Geometry of choice and learning, learning trains the closest LTM vector
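    A minimal competitive-learning sketch of "learning trains the closest LTM vector" (my own toy example, with the winner picked by the largest bottom-up dot product and only the winner's weights updated):
      import numpy as np

      # The category whose LTM weight vector best matches the input wins, and only
      # the winner's weights move toward (a time-average of) the inputs it codes.
      # Sizes, learning rate, and random data are illustrative only.
      rng = np.random.default_rng(0)
      inputs = rng.random((200, 5))              # stream of 5-dimensional input patterns
      weights = rng.random((3, 5))               # LTM vectors of 3 category cells
      rate = 0.1

      for I in inputs:
          winner = np.argmax(weights @ I)        # category with largest bottom-up signal
          weights[winner] += rate * (I - weights[winner])   # winner tracks its inputs

      print(np.round(weights, 2))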
  • image p359fig10.07 The two bottom-up pathways from LGN to layer 4 of V1 can together activate layer 4 and contrast-normalize layer 4 responses.
    || Bottom-up contrast normalization (Grossberg 1968, 1973; Sperling, Sondhi 1968; Heeger 1992; Douglas etal 1995; Shapley etal 2004). Together, direct LGN-to-4 path and 6-to-4 on-center off-surround provide contrast normalization if cells obey shunting or membrane equation dynamics.
  • image p501fig13.26 A simple differential equation describes the processes of transmitter accumulation and release that do their best, at a finite rate, to carry out unbiased transduction.
    || Transmitter accumulation and release. Transmitter y cannot be restored at an infinite rate: T = S*y, y ~= B, Differential equations: d[dt: y] = A*(B - y) - S*y = accumulate - release. Transmitter y tries to recover to ensure unbiased transduction. What if it falls behind? Evolution has exploited the good properties that happen then.
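    A short numerical sketch of this transmitter equation with illustrative constants, confirming the equilibrium y = A*B/(A + S) and the less-than-linear growth of the gated signal S*y:
      # Transmitter accumulation/release:  d/dt y = A*(B - y) - S*y
      # (accumulate toward B, release at rate S*y).  Constants are illustrative.
      A, B = 0.1, 1.0
      dt = 0.01

      def run(S, steps=4000, y=1.0):
          for _ in range(steps):
              y += dt * (A*(B - y) - S*y)
          return y

      for S in (0.0, 0.5, 2.0):
          y_eq = run(S)
          print(f"S = {S:3.1f}  y -> {y_eq:.3f}  (predicted A*B/(A+S) = {A*B/(A+S):.3f})"
                f"  gated signal S*y = {S*y_eq:.3f}")
      # Larger input signals S deplete (habituate) the transmitter, so the gated
      # signal S*y grows less than linearly with S - the basis of gated dipole rebounds.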
  • image p505fig13.33 An unexpected event can disconfirm ongoing processing by triggering a burst of nonspecific arousal that causes antagonistic rebounds in currently active gated dipoles, whether cognitive or affective.
    || Novelty reset: rebound to arousal onset. 1. Equilibrate to I and J: S1 = f(I+J); y1 = A*B/(A+S1); S2 = f(I+J); y2 = A*B/(A+S2);. 2. Keep phasic input J fixed; increase arousal I to I* = I + ∆I: (a) OFF reaction if T1 < T2; OFF = T2 - T1 = f(I*+J)*y2 - f(I*)*y1 = { A*B*(f(I*) - f(I*+J)) - B*(f(I*)*f(I+J) - f(I)*f(I*+J)) } / (A+f(I)) / (A + f(I+J)). 3. How to interpret this complicated equation?
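    One way to interpret it numerically: the sketch below simulates a pair of habituative gates (a feedforward gated dipole with a linear signal function and made-up parameters) and shows that a pure arousal increment, with J held fixed, produces a transient OFF rebound:
      # Phasic input J drives the ON channel, tonic arousal I drives both channels
      # through habituating transmitter gates; a sudden increase in I transiently
      # favours the less-habituated OFF channel.  f(w) = w and all parameters are
      # illustrative choices, not values from the book.
      A, B = 0.1, 1.0          # transmitter accumulation parameters
      I, J = 1.0, 1.0          # tonic arousal and phasic input
      dt = 0.01

      def f(w):                # signal function (linear for simplicity)
          return w

      y_on, y_off = 1.0, 1.0   # transmitter gates for ON and OFF channels
      log = []
      for step in range(12000):
          arousal = I if step < 8000 else 3.0*I       # arousal burst at t = 80
          s_on, s_off = f(arousal + J), f(arousal)    # signals entering each gate
          y_on  += dt * (A*(B - y_on)  - s_on*y_on)   # habituative gates
          y_off += dt * (A*(B - y_off) - s_off*y_off)
          net = s_on*y_on - s_off*y_off               # opponent (ON minus OFF) output
          log.append(net)

      print("before burst, ON-OFF output:", round(log[7999], 3))
      print("just after burst           :", round(log[8010], 3))
      print("long after burst           :", round(log[-1], 3))
      # A negative post-burst value means the OFF channel transiently wins:
      # an antagonistic rebound ("novelty reset") triggered purely by nonspecific arousal.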
  • image p580fig16.05 Macrocircuit of the GridPlaceMap model, which can learn both 2D grid cells and place cells in response to realistic trajectories of navigating rats using a hierarchy of SOMs with identical equations.
    || GridPlaceMap model: rate-based and spiking (Pilly, Grossberg 2012). Pre-wired 1D stripe cells, learns both 2D grid and place cells! Same laws for both; both select most frequent and energetic inputs. Place cells emerge gradually in response to developing grid cells. [place-> grid-> stripe] cells-> path integration-> vestibular signals
  • image p586fig16.16 In the place cell learning model of (Gorchetnikov, Grossberg 2007), three populations of five cells each of entorhinal grid cells (only two are shown) with different spatial periods input to the model's dentate gyrus. The grid cells are one-dimensional and defined algorithmically. A model dentate gyrus granule cell that receives strong projections from all three grid cell scales fires (green cell) and activates a recurrent inhibitory interneuron that inhibits other granule cells. It also generates back-propagating action potentials that trigger learning in the adaptive weights of the projections from the grid cells, thereby causing learning of place cell receptive fields.
    || Grid-to-place Self-Organizing map (Gorchetnikov, Grossberg 2007). Formation of place cell fields via grid-to-place cell learning. Least common multiple: [grid (cm), place (m)] scales: [40, 50, 60 (cm); 6m], [50, 60, 70 (cm); 21m], [41, 53, 59 (cm); 1.282 km]. Our simulations: [40, 50 (cm); 2m], [44, 52 (cm); 5.72m]. Our SOM: Spiking Hodgkin-Huxley membrane equations; Nonlinear choice by contrast-enhancing recurrent on-center off-surround net;. Choice triggers back-propagating action potentials that induce STDP-modulated learning on cell dendrites.
  • image p593fig16.25 Spectral Spacing Model STM, MTM, and LTM equations. The rate spectrum that determines the dorsoventral gradient of multiple grid cell properties is defined by μm.
    || Spectral Spacing Model equations. [STM, MTM, LTM]. μm = rate spectrum.
  • Grossberg 2021 p081c2h0.66 sub-section: How does one evolve a computational brain?
    The above discussion illustrates that no single step of theoretical derivation can derive a whole brain. One needs a method for deriving a brain in stages, or cycles, much as evolution has incrementally discovered ever more complex brains over many thousands of years. The following theoretical method has been successfully applied many times since I first used it in 1957. It embodies a kind of conceptual evolutionary process for deriving a brain.

    Because "brain evolution needs to achieve behavioural success", we need to start with data that embodiey indices of behavioral success. That is why, as illustrated in Figure 2.37 Modelling method and cycle, one starts with Behavioral Data from scores or hundreds of psychological experiments. These data are analyszed as the result of an individual adapting autonomously in real time to a changing world. This is the Arty of Modeling. It requires that one be able to infer from static data curves the dynamical processes that control individual behaviors occuring in real time. One of the hardest things that I teach to my students to do is "how to think in real time" to be able to carry out this speculative leap.

    Properly carried out, this analysis leads to the discovery of new Design Principles that are embodied by these behavioral processes. The Design Principles highlight the functional meaning of the data, and clarify how individual behaviors occurring in real time give rise to these static data curves.

    These principles are then converted into the simplest Mathematical Model using a method of minimal anatomies, which is a form of Occam's Razor, or principle of parsimony. Such a mathematical model embodies the psychological principles using the simplest possible differential equations. By "simplest" I mean that, if any part of the derived model is removed, then a significant fraction of the targeted data could no longer be explained. One then analyzes the model mathematically and simulates it on the computer, showing along the way how variations on the minimal anatomy can realize the design principles in different individuals or species.

    This analysis has always provided functional explanations and Behavioral Predictions for much larger behavioral data bases than those used to discover the Design Principles. The most remarkable fact is, however, that the behaviorally derived model always looks like part of a brain, thereby explaining a body of challenging Neural Data and making novel Brain Predictions.

    The derivation hereby links mind to brain via psychological organizational principles and their mechanistic realization as a mathematically defined neural network. This startling fact is what I first experienced as a college Freshman taking Introductory Psychology, and it changed my life forever.

    I conclude from having had this experience scores of times since 1957 that brains look the way they do because they embody a natural computational realization for controlling autonomous adaptation in real-time to a changing world. Moreover, the Behavior -> Principles -> Model -> Neural derivation predicts new functional roles for both known and unknown brain mechanisms by linking the brain data to how it helps to ensure behavioral success. As I noted above, the power of this method is illustrated by the fact that scores of these predictions about brain and behavior have been supported by experimental data 5-30 years after they were first published.

    Having made the link from behavior to brain, one can then "burn the candle from both ends" by pressing both top-down from Behavioral Data and bottom-up from Brain Data to clarify what the model can and cannot explain at its current stage of derivation. No model can explain everything. At each stage of development, the model can cope with certain environmental challenges but not others. An important part of the mathematical and computational analysis is to characterize the boundary between the known and unknown; that is, which challenges the model can cope with and which it cannot. The shape of this boundary between the known and unknown helps to direct the theorist's attention to new design principles that have been omitted from previous analysis.

    The next step is to show how these new design principles can be incorporated into the evolved model in a self-consistent way, without undermining its previous mechanisms, thereby leading to a progressively more realistic model, one that can explain and predict ever more behavioral and neural data. In this way, the model undergoes a type of evolutionary development, as it becomes able to cope behaviorally with environmental constraints of ever increasing subtlety and complexity. The Method of Minimal Anatomies may hereby be viewed as a way to functionally understand how increasingly demanding combinations of environmental pressures were incorporated into brains during the evolutionary process.

    If such an Embedding Principle cannot be carried out - that is, if the model cannot be unlumped or refined in a self-consistent way - then the previous model was, put simply, wrong, and one needs to figure out which parts must be discarded. Such a model is, as it were, an evolutionary dead end. Fortunately, this has not happened to me since I began my work in 1957 because the theoretical method is so conservative. No theoretical addition is made unless it is supported by multiple experiments that cannot be explained in its absence. Where multiple mechanistic instantiations of some Design Principles were possible, they were all developed in models to better understand their explanatory implications. Not all of these instantiations could survive the pressure of the evolutionary method, but some always could. As a happy result, all earlier models have been capable of incremental refinement and expansion.

    The cycle of model evolution has been carried out many times since 1957, leading today to increasing numbers of models that individually can explain and predict psychological, neurophysiological, anatomical, biophysical, and even biochemical data. In this specific sense, the classical mind-body problem is being incrementally solved.

    Howell: bold added for emphasis.
    (keys : Principles-Principia, behavior-mind-brain link, brain evolution, cycle of model evolution)
    see also quotes: Charles William Lucas "Universal Force" and others (not retyped yet).
  • image pxvifig00.01 Macrocircuit of the visual system
  • image p087fig03.01 A macrocircuit of key visual processes (in green) and the cortical areas in which they primarily occur (in red), from the retina to the Prefrontal Cortex (PFC), including both the What and Where cortical streams. The [bottom-up, horizontal, and top-down] interactions help each of these processes to overcome computationally complementary processing deficiencies that they would experience without them, and also to read out top-down expectations that help to stabilize learning while they focus attention on salient objects and positions.
    || Emerging unified theory of visual intelligence. [What, Where] streams. Bottom-up and top-down interactions overcome COMPLEMENTARY processing deficiencies.
  • image p168fig04.44 Macrocircuit of the main boundary and surface formation stages that take place from the lateral geniculate nucleus, or LGN, through cortical areas [V1, V2, V4]. See the text for details.
    ||
    left eye | binocular | right eye
    V4 binocular surface
    V2 monocular surface | V2 layer 2/3 binocular boundary | V2 monocular surface
    V2 layer 4 binocular boundary
    V1 monocular surface | V1 monocular boundary | V1 binocular boundary | V1 monocular boundary | V1 monocular surface
    LGN | LGN
  • image p228fig05.37 A macrocircuit of the neurotrophic Spectrally Timed ART, or nSTART, model. I developed nSTART with my PhD student Daniel Franklin. It proposes how adaptively timed learning in the hippocampus, bolstered by Brain Derived Neurotrophic Factor, or BDNF, helps to ensure normal memory consolidation.
    || habituative gates, CS, US, Thalamus (sensory cortex, category learning, conditioned reinforcer learning, adaptively timed learning and BDNF), Amygdala (incentive motivation learning), Hippocampus (BDNF), Prefrontal Cortex (attention), Pontine nuclei, Cerebellum (adaptively timed motor learning)
  • image p246fig05.48 Microcircuits of the LAMINART model that I developed with Rajeev Raizada. See the text for details of how they integrate bottom-up adaptive filtering, horizontal bipole grouping, and top-down attentional matching that satisfied the ART Matching Rule.
    ||
  • image p346fig09.16 A macrocircuit of some of the main brain regions that are used to move the eyes. Black boxes denote areas belonging to the saccadic eye movement system (SAC), white boxes the smooth pursuit eye movement system (SPEM), and gray boxes, both systems. The abbreviations for the different brain regions are: LIP - Lateral Intra-Parietal area; FPA - Frontal Pursuit Area; MST - Middle Superior Temporal area; MT - Middle Temporal area; FEF - Frontal Eye Fields; NRPT - Nucleus Reticularis Tegmenti Pontis; DLPN - Dorso-Lateral Pontine Nuclei; SC - Superior Colliculus; CBM - CereBelluM; MVN/rLVN - Medial and Rostro-Lateral Vestibular Nuclei; PPRF - a Peri-Pontine Reticular Formation; TN - Tonic Neurons
    ||
  • image p436fig12.30 The conscious ARTWORD, or cARTWORD, laminar cortical speech model simulates how future context can disambiguate noisy past speech sounds in such a way that the completed percept is consciously heard to proceed from past to future as a feature-item-list resonant wave propagates through time.
    || cARTWORD: Laminar cortical model macrocircuit (Grossberg, Kazerounian 2011) Simulates PHONEMIC RESTORATION: Cognitive Working Memory (processed item sequences) - [Excitatory-> inhibitory-> habituative-> adaptive filter-> adaptive filter-> adaptive filter with depletable synapse-> Acoustic [item, feature]
  • image p474fig12.70 The kind of model macrocircuit that was used in (Grossberg, Stone 1986) to explain lexical decision task data.
    || inputs-> A1 <-> A2 iconic sensory features <-> A3 item and order in sensory STM <-> A4 list parsing in STM (masking field) <-> A5 semantic network (self-feedback). [A4, A5] <-> V* visual object recognition system. M1-> [outputs, A1]. M1 <-> M2 iconic motor features <-> M3 item and order in motor STM. A2-> M2. A3-> M3.
  • image p481fig13.01 Macrocircuit of the functional stages and anatomical interpretations of the Cognitive-Emotional-Motor, or CogEM, model.
    || Drive-> hypothalamus value categories <-> amygdala incentive motivational learning-> Orbitofrontal cortex- object-value categories <-> sensory cortex- invariant object categories- conditioned reinforcer learning-> amygdala-> hypothalamus.
  • image p520fig14.02 Macrocircuit of the main brain regions, and connections between them, that are modelled in the unified predictive Adaptive Resonance Theory (pART) of cognitive-emotional and working memory dynamics. Abbreviations in red denote brain regions used in cognitive-emotional dynamics. Those in green denote brain regions used in working memory dynamics. Black abbreviations denote brain regions that carry out visual perception, learning and recognition of visual object categories, and motion perception, spatial representation and target tracking. Arrows denote non-adaptive excitatory synapses. Hemidiscs denote adaptive excitatory synapses. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown. Also not shown are output signals from cortical areas to motor responses. V1: striate, or primary, visual cortex; V2 and V4: areas of prestriate visual cortex; MT: Middle Temporal cortex; MST: Medial Superior Temporal area; ITp: posterior InferoTemporal cortex; ITa: anterior InferoTemporal cortex; PPC: Posterior Parietal Cortex; LIP: Lateral InterParietal area; VPA: Ventral PreArcuate gyrus; FEF: Frontal Eye Fields; PHC: ParaHippocampal Cortex; DLPFC: DorsoLateral PreFrontal Cortex; HIPPO: hippocampus; LH: Lateral Hypothalamus; BG: Basal Ganglia; AMYG: AMYGdala; OFC: OrbitoFrontal Cortex; PRC: PeriRhinal Cortex; VPS: Ventral bank of the Principal Sulcus; VLPFC: VentroLateral PreFrontal Cortex. See the text for further details.
    ||
  • image p532fig14.08 Macrocircuit of the ARTSCENE Search neural model for learning to search for desired objects by using the sequences of already experienced objects and their locations to predict what and where the desired object is. V1 = First visual area or primary visual cortex; V2 = Second visual area; V4 = Fourth visual area; PPC = Posterior Parietal Cortex; ITp = posterior InferoTemporal cortex; ITa = anterior InferoTemporal cortex; MTL = Medial Temporal Lobe; PHC = ParaHippoCampal cortex; PRC = PeriRhinal Cortex; PFC = PreFrontal Cortex; DLPFC = DorsoLateral PreFrontal Cortex; VPFC = Ventral PFC; SC = Superior Colliculus.
    ||
  • image p580fig16.05 Macrocircuit of the GridPlaceMap model, which can learn both 2D grid cells and place cells in response to realistic trajectories of navigating rats using a hierarchy of SOMs with identical equations.
    || GridPlaceMap model: rate-based and spiking (Pilly, Grossberg 2012). Pre-wired 1D stripe cells, learns both 2D grid and place cells! Same laws for both; both select most frequent and energetic inputs. Place cells emerge gradually in response to developing grid cells. [place-> grid-> stripe] cells-> path integration-> vestibular signals
  • image p599fig16.35 Data (a) and simulations (b,c) about anatomically overlapping grid cell modules. (a) shows the anatomical distribution of grid cells belonging to different modules in one animal. DV location (mm) vs postrhinal border. (b) shows the simulated distribution of learned grid cell spacings from two stripe cell scales. frequency (%) vs grid spacing (cm). mu = [1, 0.6]. (c) shows what happens when half the cells respond with one rate and half another rate. (d) shows the same with three rates. (e-g) show spatial maps and autocorrelograms of grid cells that arise from the different rates in (d). [rate map, autocorrelogram] vs [score [1.07, 0.5, 0.67], spacing (cm) [23.58, 41, 63.64]].
    ||
  • image p612fig16.42 Macrocircuit of the main SOVEREIGN subsystems.
    || [reward input, drive input, drive representation (DR), visual working memory and planning system (VWMPS), visual form and motion system (VFMS), motor approach and orienting system (MAOS), visual input (VisIn), motor working memory and planning system (MWMPS), motor approach and orienting system (MAOS), motor plant (MotP), Proprioceptive Input (PropIn), Vestibular Input (VesIn), Environmental feedback (EnvFB). DR [incentive motivational learning-> [VWMPS, MWMPS], -> VFMS, -> MAOS], VWMPS [conditioned reinforcer learning-> DR, MAOS], VFMS [visual object categories-> VWMPS, reactive movement commands-> MAOS], MWMPS [conditioned reinforcer learning-> DR, planned movement commands-> MAOS], MAOS [motor map positions-> MWMPS, motor outflow-> MotP], VisIn-> VFMS, VesIn-> MAOS, EnvFB-> [VisIn, MotP, VesIn].
  • p190 Howell: [neural microcircuits, modal architectures] used in ART -
    bottom-up filters | top-down expectations | purpose
    instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star often used to learn the adaptive weights; top-down expectations select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features
    LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
    EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
    auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
    auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
    visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown.
    red - cognitive-emotional dynamics
    green - working memory dynamics
    black - see [bottom-up, top-down] lists
    EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
    Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

    background colours in the table signify :
    white | general microcircuit : a possible component of ART architecture
    lime green | sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
    light blue | post-perceptual cognition?
    pink | "the feeling of what happens" and knowing what event caused that feeling
  • image p025fig01.16 (left panel) The main processing stages of the Cognitive-Emotional-Motor (CogEM) model have anatomical interpretations in terms of sensory cortex, amygdala, and prefrontal cortex. Chapter 13 will describe in greater detail how CS cues activate invariant object categories in the sensory cortex, value categories in the amygdala, and object-value categories in the prefrontal cortex, notably the orbitofrontal cortex. The amygdala is also modulated by internal drive inputs like hunger and satiety. (right panel) Anatomical data support this circuit, as do many neurophysiological data.
    || drive -> amygdala -> prefrontal cortex <-> sensory cortex -> amygdala. [visual, somatosensory, auditory, gustatory, olfactory] cortex -> [amygdala, Orbital Prefrontal Cortex]. amygdala -> Lateral Prefrontal Cortex
  • image p481fig13.01 Macrocircuit of the functional stages and anatomical interpretations of the Cognitive-Emotional-Motor, or CogEM, model.
    || Drive-> hypothalamus value categories <-> amygdala incentive motivational learning-> Orbitofrontal cortex- object-value categories <-> sensory cortex- invariant object categories- conditioned reinforcer learning-> amygdala-> hypothalamus.
  • image p483fig13.02 The object-value categories in the orbitofrontal cortex require converging specific inputs from the sensory cortex and nonspecific incentive motivational inputs from the amygdala in order to fire. When the orbitofrontal cortex fires, it can deliver top-down ART Matching Rule priming signals to the sensory cortical area by which it was activated, thereby helping to choose the active recognition categories there that have the most emotional support, while suppressing others, leading to attentional blocking of irrelevant cues.
    || Cognitive-Emotional-Motor (CogEM) model. Drive-> amygdala incentive motivational learning-> orbitofrontal cortex- need converging cue and incentive inputs to fire <-> sensory cortex- conditioned reinforcer learning-> amygdala. CS-> sensory cortex. Motivated attention closes the cognitive-emotional feedback loop, focuses on relevant cues, and causes blocking of irrelevant cues.
  • image p483fig13.03 The predicted processing stages of CogEM have been supported by anatomical studies of connections between sensory cortices, amygdala, and orbitofrontal cortex.
    || Adapted from (Barbas 1995). sensory cortices = [visual, somatosensory, auditory, gustatory, olfactory]. sensory cortices-> amygdala-> orbital prefrontal cortex. sensory cortices-> orbital prefrontal cortex. [visual cortex, amygdala]-> lateral prefrontal cortex.
  • image p484fig13.04 The top-down feedback from the orbitofrontal cortex closes a feedback loop that supports a cognitive-emotional resonance. If this resonance can be sustained long enough, it enables us to have feelings at the same time that we experience the categories that caused them.
    || Cognitive-Emotional resonance. Basis of "core consciousness" and "the feeling of what happens". (Damasio 1999) derives heuristic version of CogEM model from his clinical data. Drive-> amygdala-> prefrontal cortex-> sensory cortex, resonance around the latter 3. How is this resonance maintained long enough to become conscious?
  • image p487fig13.11 The three main properties of CogEM that help to explain how attentional blocking occurs.
    || CogEM explanation of attentional blocking. Internal drive input <-> Conditioned reinforcer learning (self-recurrent) <-> Competition for STM <- Motor learning. 1. Sensory representations compete for limited capacity STM. 2. Previously reinforced cues amplify their STM via positive feedback. 3. Other cues lose STM via competition.
  • image p489fig13.13 (top row) If a positive ISI separates onset of a CS and US, then the CS can sample the consequences of the US during the time interval before it is inhibited by it. (bottom row) A CogEM simulation of the inverted-U in conditioning as a function of the ISI between CS and US.
    || Positive ISI and conditioning.
  • image p490fig13.15 The CogEM circuit is an ancient design that is found even in mollusks like Aplysia. See the text for details.
    || Aplysia (Buononamo, Baxter, Byrne, Neural Networks 1990; Grossberg, Behavioral and Brain Sciences 1983). Facilitator neuron ~ drive representation.
  • image p494fig13.19 (left column, top row) Secondary conditioning of both arousal and a specific response are now possible. (bottom row) The CogEM circuit may be naturally extended to include multiple drive representations and inputs. (right column, top row) The incentive motivational pathway is also conditionable in order to enable motivational sets to be learned.
    || Secondary conditioning. Homology: conditionable incentive motivation. Multiple drive representations and inputs.
  • image p514fig13.44 Analog of the CogEM model in Figure 6.1 of (Damasio 1999).
    || (a) map of object X-> map of proto-self at inaugural instant-> [, map of proto-self modified]-> assembly of second-order map. (b) map of object X enhanced-> second-order map imaged.
  • image p523fig14.03 (a) The MOTIVATOR neural model generalizes CogEM by also including the basal ganglia. It can hereby explain and simulate complementary functions of the amygdala and basal ganglia (SNc) during conditioning and learned performance. The basal ganglia generate Now Print signals in response to unexpected rewards. These signals modulate learning of new associations in many brain regions. The amygdala supports motivated attention to trigger actions that are expected to occur in response to conditioned or unconditioned stimuli. Object Categories represent visual or gustatory inputs in anterior inferotemporal (ITA) and rhinal (RHIN) cortices, respectively. Value Categories represent the value of anticipated outcomes on the basis of hunger and satiety inputs, in amygdala (AMYG) and lateral hypothalamus (LH). Object-Value Categories resolve the value of competing perceptual stimuli in medial (MORB) and lateral (ORB) orbitofrontal cortex. The Reward Expectation Filter detects the omission or delivery of rewards using a circuit that spans ventral striatum (VS), ventral pallidum (VP), striosomal delay (SD) cells in the ventral striatum, the pedunculopontine nucleus (PPTN) and midbrain dopaminergic neurons of the substantia nigra pars compacta/ventral tegmental area (SNc/VTA). The circuit that processes CS-related visual information (ITA, AMYG, ORB) operates in parallel with a circuit that processes US-related visual and gustatory information (RHIN, AMYG, MORB). (b) Reciprocal adaptive connections between hypothalamus and amygdala enable amygdala cells to become learned value categories. The bottom region represents hypothalamic cells, which receive converging taste and metabolite inputs whereby they become taste-drive cells. Bottom-up signals from activity patterns across these cells activate competing value category, or US Value Representations, in the amygdala. A winning value category learns to respond selectively to specific combinations of taste-drive activity patterns and sends adaptive top-down priming signals back to the taste-drive cells that activated it. CS-activated conditioned reinforcer signals are also associatively linked to value categories. Adaptive connections end in (approximately) hemidiscs. See the text for details.
    ||
  • image p548fig15.16 Homologous recognition learning and reinforcement learning macrocircuits enable adaptively timed conditioning in the reinforcement learning circuit to increase inhibition of the orienting system at times when a mismatch in the recognition system would have reduced inhibition of it.
    || Homolog between ART and CogEM model, complementary systems. [Recognition, Reinforcement] learning vs [Attentional, Orienting] system. Reinforcement: timing, drive representation.
  • image p080fig02.33 An opposite-attracts rule during the development of intracellular connections can lead to a mature network that realizes informational noise suppression.
    || How do noise suppression parameters arise? Symmetry-breaking during morphogenesis? Opposites attract rule.
    Intracellular parameters: C/B = 1/(n - 1) <-> Intercellular parameters
    Predicts that:
    • Intracellular excitatory and inhibitory saturation points can control the growth during development of :
    • Intercellular excitatory and inhibitory connections.
  • image p012fig01.08 A sigmoidal signal function is a hybrid signal that combines the best properties of [faster, same, slower]-than linear signals. It can suppress noise and store a partially contrast-enhanced activity pattern. slower-than-linear saturates pattern; approximately linear- preserves pattern and normalizes; faster-than-linear- noise suppression and contrast-enhancement.
    || Sigmoidal signal: a hybrid. (upper) saturates pattern- slower-than-linear; (middle) preserves pattern and normalizes- approximately linear. (lower) noise suppression and contrast enhancement- faster-than-linear.
  • image p078fig02.30 Choosing the adaptation level to achieve informational noise suppression.
    || Noise suppression. Attenuate Zero Spatial frequency patterns: no information. Ii vs i (flat line), xi vs i (flat line at zero)
    B >> C: Try B = (n - 1)*C or C/(B + C) = 1/n
    Choose a uniform input pattern (no distinctive features): All θi = 1/n
    xi = (B + C)*I/(A + I)*[θi -C/(B + C)] = 0 no matter how intense I is.
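    A short numerical check of this noise-suppression choice (A and the test patterns are illustrative): with B = (n - 1)*C, a uniform pattern produces zero activities at any intensity, while a contrastive pattern keeps its deviations from the mean:
      import numpy as np

      # Informational noise suppression: with the adaptation level chosen so that
      # B = (n-1)*C, the equilibrium x_i = (B+C)*I/(A+I)*[theta_i - C/(B+C)]
      # vanishes for a uniform input pattern (theta_i = 1/n) at ANY intensity I.
      A = 1.0
      n = 5
      C = 1.0
      B = (n - 1) * C                      # the noise-suppression choice: C/(B+C) = 1/n

      def equilibrium(I_vec):
          I = I_vec.sum()
          theta = I_vec / I
          return (B + C) * I / (A + I) * (theta - C/(B + C))

      uniform = np.full(n, 10.0)           # featureless pattern, high intensity
      contrast = np.array([1.0, 2.0, 9.0, 2.0, 1.0])

      print("uniform  ->", np.round(equilibrium(uniform), 6))   # all zeros
      print("contrast ->", np.round(equilibrium(contrast), 3))  # only deviations from the mean survive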
  • image p078fig02.31 How noise suppression enables matching of bottom-up and top-down input patterns.
    || Noise suppression -> pattern matching. mismatch (out of phase) suppressed, match (in phase) amplifies pattern.
  • image p080fig02.33 An opposite-attracts rule during the development of intracellular connections can lead to a mature network that realizes informational noise suppression.
    || How do noise suppression parameters arise? Symmetry-breaking during morphogenesis? Opposites attract rule.
    Intracellular parameters: C/B = 1/(n - 1) <-> Intercellular parameters
    Predicts that:
    • Intracellular excitatory and inhibitory saturation points can control the growth during development of :
    • Intercellular excitatory and inhibitory connections.
  • image p080fig02.34 How to achieve informational noise suppression in a network with multiple parallel processing channels.
    || Symmetry-breaking: dynamics and anatomy.
    Dynamics:
    • excitatory range is amplified
    • inhibitory range is compressed
    Anatomy:
    • narrow on-center
    • broad off-surround
    Noise suppression: attenuates uniform patterns
    Contour direction: enhances pattern gradients
  • image p081fig02.36 Informational noise suppression in a network with a Gaussian on-center and off-surround enables its cells to function as contour detectors that are sensitive to ratio-contrast.
    || Noise suppression and contour detection.
    If B*sum[k=1 to n: Cki] <= D*sum[k=1 to n: Eki] then:
    • uniform patterns are suppressed
    • contrasts are selectively enhanced
    • contours are detected
    Ii vs i, xi vs i
    Responses are selective to [REFLECTANCE, SPATIAL SCALE], eg color [feature, surface] contours.
  • image p510fig13.39 Shunting competition and informational noise suppression in affective gated dipoles, plus back-propagating action potentials for teaching signals, enable the net normalized adaptive weights to be learned. They never saturate!
    || Learn net dipole output pattern. Opponent "decision" controls learning. Cf. competitive learning. Learning signal, opponent extinction.
  • image p009fig01.06 Primacy gradient of activity stored in working memory within a recurrent shunting on-center off-surround network. Rehearsal is controlled by a nonspecific rehearsal wave and self-inhibitory feedback of the item that is currently being rehearsed. Green = excitatory, red = inhibitory
    || inputs? -> item and order WM storage -> competitive selection-> rehearsal wave -> outputs
  • image p024fig01.15 A REcurrent Associative Dipole, or READ, circuit is a recurrent shunting on-center off-surround network with habituative transmitter gates. Sensory cues sample it with LTM traces and thereby become conditioned reinforcers.
    ||
  • image p038fig01.25 The ART Matching Rule stabilizes real time learning using a [top-down, modulatory on-center, off-surround] network. Object attention is realized by such a network. See text for additional discussion.
    || ART Matching Rule [volition, categories, features]. [one, two] against one.
  • image p073fig02.22 An on-center off-surround network is capable of computing input ratios.
    || Computing with patterns.
    How to compute the pattern-sensitive variable: θi = Ii / sum[k=1 to n: Ik]?
    Needs interactions! What type? θi = Ii / (Ii + sum[k≠i: Ik])
    Ii↑ ⇒ θi↑ excitation; Ik↑ ⇒ θi↓, k ≠ i inhibition
    On-center off-surround network.
  • image p074fig02.23 The equations for a shunting on-center off-surround network. Shunting terms lead to many beautiful and important properties of these networks, which are found ubiquitously, in one form or another, in all cellular tissues.
    || Shunting on-center off-surround network.
    Mass action: d[dt: xi] = -A*xi +(B - xi)*Ii -xi*sum[k≠i: Ik]
    Turn on unexcited sites. Turn off excited sites.
    At equilibrium:
    0 = d[dt: xi] = -(A + Ii + sum[k≠i: Ik])*xi + B*Ii = -(A + I)*xi + B*Ii
    xi = B*Ii/(A + I) = B*θi*I/(A + I) = θi*B*I/(A + I). No saturation!
    Infinite dynamical range
    Automatic gain control
    Compute ratio scale
    Weber law
    x = sum[k=1 to n: xk] = B*I/(A + I) ≤ B Conserves total activity
    NORMALIZATION
    Limited capacity
    Real-time probability
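  • Howell: a small Python check (mine, not from the book) of the equilibrium xi = B*Ii/(A + I): relative activities report the ratio scale θi, and total activity is normalized below B with no saturation, however intense the inputs:
    # --- Python sketch (Howell), not from Grossberg 2021 ---
    import numpy as np

    def shunting_equilibrium(I, A=1.0, B=1.0):
        """Equilibrium x_i = B*I_i/(A + I) of
        d[dt: x_i] = -A*x_i + (B - x_i)*I_i - x_i*sum[k!=i: I_k]."""
        I = np.asarray(I, dtype=float)
        return B * I / (A + I.sum())

    I1 = np.array([1.0, 2.0, 3.0])
    I2 = 100.0 * I1                        # same ratios, 100x more intense

    x1, x2 = shunting_equilibrium(I1), shunting_equilibrium(I2)
    print(x1, x1.sum())                    # total activity stays below B ...
    print(x2, x2.sum())                    # ... even for very intense inputs: no saturation
    print(x1 / x1.sum(), x2 / x2.sum())    # relative activities equal theta_i: a ratio scale
    # --- end sketch ---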
  • image p075fig02.24 The membrane equations of neurophysiology describe how cell voltages change in response to excitatory, inhibitory, and passive input channels. Each channel is described by a potential difference multiplied by a conductance. With the special choices shown in the lower right-hand corner, this equation defines a feedforward shunting on-center off-surround network.
    || Membrane equations of neurophysiology.
    C*d[dt: V] = (V(+) - V)*g(+) +(V(-) - V)*g(-) +(V(p) - V)*g(p)
    Shunting equation (not additive)
    V Voltage
    V(+), V(-), V(p) Saturating voltages
    g(+), g(-), g(p) Conductances
    V(+) = B, C = 1; V(-) = V(p) = 0; g(+) = Ii; g(-) = sum[k≠i: Ik];
    Silent inhibition: V(-) = V(p) = 0 (the lower saturation voltage); V(+) = B is the upper saturation voltage. (Howell: see p068fig02.14 Grossberg's comment that the Hodgkin&Huxley model was a "... Precursor of Shunting network model (Rall 1962) ...").
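  • Howell: a quick numerical confirmation (Python, mine) that the membrane equation reduces to the shunting equation of Figure 2.23 under the choices above (C = 1, V(+) = B, V(-) = V(p) = 0, g(+) = Ii, g(-) = sum[k≠i: Ik]), if the passive conductance g(p) is taken as the leak term A (my assumption; the slide leaves g(p) unspecified):
    # --- Python sketch (Howell), not from Grossberg 2021 ---
    import numpy as np

    rng = np.random.default_rng(0)
    A, B = 1.3, 2.0
    for _ in range(3):
        V = rng.uniform(0.0, B)          # cell voltage (activity)
        Ii = rng.uniform(0.0, 5.0)       # on-center input
        Isurr = rng.uniform(0.0, 5.0)    # total off-surround input

        # Membrane equation with C = 1, V+ = B, V- = Vp = 0 (silent inhibition),
        # g+ = Ii, g- = Isurr, and gp = A taken as the passive leak conductance (assumption).
        membrane_rhs = (B - V) * Ii + (0.0 - V) * Isurr + (0.0 - V) * A

        # Feedforward shunting on-center off-surround equation for the same cell.
        shunting_rhs = -A * V + (B - V) * Ii - V * Isurr

        print(np.isclose(membrane_rhs, shunting_rhs))   # True: the two forms agree
    # --- end sketch ---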
  • image p076fig02.25 An on-center off-surround network can respond to increasing on-center excitatory inputs without a loss of sensitivity. Instead, as the off-surround input increases, the region of a cell's maximal sensitivity to an increasing on-center input shifts to a range of larger inputs. This is because the off-surround divides the effect of the on-center input, an effect that is often called a Weber law.
    || Weber law, adaptation, and shift property (Grossberg 1963).
    Convert to logarithmic coordinates:
    K = ln(Ii), Ii = e^K, J = sum[k≠i: Ik]
    xi(K,J) = B*Ii/(A + Ii + J) = B*e^K/(A + e^K + J)
    x(K + S, J1) = x(K, J2), S = ln((A + J1)/(A + J2)) size of SHIFT.
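  • Howell: the shift property follows directly from the formula and is easy to verify (Python sketch, mine): in log coordinates, the response curve for background J1 is the curve for background J2 shifted by S = ln((A + J1)/(A + J2)), with no compression:
    # --- Python sketch (Howell), not from Grossberg 2021 ---
    import numpy as np

    A, B = 1.0, 1.0

    def x(K, J):
        """Equilibrium on-center activity in log coordinates: B*e^K/(A + e^K + J)."""
        return B * np.exp(K) / (A + np.exp(K) + J)

    J1, J2 = 10.0, 0.5
    S = np.log((A + J1) / (A + J2))      # predicted size of the shift

    K = np.linspace(-3.0, 6.0, 19)
    print(np.allclose(x(K + S, J1), x(K, J2)))   # True: the curve only shifts, no compression
    # --- end sketch ---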
  • image p076fig02.26 The mudpuppy retina exhibits the shift property that occurs in the feedforward shunting on-center off-surround network in Figure 2.25. As a result, its sensitivity also shifts in response to different background off-surrounds, and therefore exhibits no compression (dashed purple lines).
    || Mudpuppy retina neurophysiology.
    I center, J background
    a) Relative figure-to-ground
    b) Weber-Fechner I*(A + J)^(-1)
    c) No hyperpolarization, SHUNT: Silent inhibition
    d) Shift property (Werblin 1970) xi(K,J) vs K = ln(I)
    Adaptation- sensitivity shifts for different backgrounds. NO COMPRESSION.
  • image p077fig02.27 A schematic of the on-center off-surround network that occurs in the mudpuppy retina, including three main cell types: receptors, horizontal cells, and bipolar cells.
    || Mechanism: cooperative-competitive dynamics.
    On-center off-surround (Kuffler 1953) cat retina
    Subtractive lateral inhibition (Hartline, Ratliff 1956/7+) limulus retina.
    R receptor -> H horizontal -> B bipolar (Werblin, Dowling, etal 1969+) mudpuppy retina.
  • image p080fig02.34 How to achieve informational noise suppression in a network with multiple parallel processing channels.
    || Symmetry-breaking: dynamics and anatomy.
    Dynamics:
    • excitatory range is amplified
    • inhibitory range is compressed
    Anatomy:
    • narrow on-center
    • broad off-surround
    Noise suppression: attenuates uniform patterns
    Contour direction: enhances pattern gradients
  • image p081fig02.35 The equilibrium activities of a shunting network with Gaussian on-center off-surround kernels are sensitive to the ratio-contrasts of the input patterns that they process. The terms in the denominator of the equilibrium activities accomplish this using the shunting on-center and off-surround terms.
    || Ratio-contrast detector. flat versus [Gaussian Cki, flattened Gaussian? Eki]
    d[dt: xi] = -A*xi +(B - xi)*sum[k=1 to n: Ik*Cki] -(xi + D)*sum[k=1 to n: Ik*Eki]
    Cki = C*e^(-μ*(k - i)^2), Eki = E*e^(-ν*(k - i)^2)
    At equilibrium: xi = I*sum[k=1 to n: θk*Fki] / (A + I*sum[k=1 to n: θk*Gki])
    Fki = B*Cki -D*Eki (weighted difference of Gaussians, D.O.G.)
    Gki = Cki +Eki (sum of Gaussians, S.O.G.)
    • Reflectance processing
    • Contrast normalization
    • Discount illuminant
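  • Howell: a Python sketch (mine, not from the book) of the equilibrium above, with a narrow Gaussian on-center and a broad Gaussian off-surround. My own choices: the surround gain is scaled so that B*sum(Cki) = D*sum(Eki) (the noise-suppression condition of the next figure), and circular distances are used to avoid array-edge artifacts. The outputs illustrate reflectance processing, suppression of uniform patterns, and contour detection:
    # --- Python sketch (Howell), not from Grossberg 2021 ---
    import numpy as np

    n = 60
    A, B, D = 1.0, 1.0, 1.0
    mu, nu = 0.5, 0.05                          # narrow on-center, broad off-surround

    k = np.arange(n)
    d = np.abs(k[:, None] - k[None, :])
    d = np.minimum(d, n - d)                    # circular distances (my choice)
    Cki = np.exp(-mu * d**2)                    # on-center Gaussian kernel (amplitude C = 1)
    Eki = np.exp(-nu * d**2)
    Eki *= B * Cki.sum() / (D * Eki.sum())      # scale E so that B*sum(C) = D*sum(E)
    Fki = B * Cki - D * Eki                     # weighted difference of Gaussians (D.O.G.)
    Gki = Cki + Eki                             # sum of Gaussians (S.O.G.)

    def equilibrium(I):
        """x_i = sum[k: I_k*F_ki] / (A + sum[k: I_k*G_ki])."""
        return (I @ Fki) / (A + I @ Gki)

    bar = 1.0 + 2.0 * ((k >= 20) & (k < 40))    # a bright bar on a dim background
    x10, x1000 = equilibrium(10.0 * bar), equilibrium(1000.0 * bar)
    print(np.allclose(x10, x1000, atol=0.02))              # ~True: ratio (reflectance) processing
    print(np.abs(equilibrium(np.full(n, 500.0))).max())    # ~0: uniform patterns are suppressed
    print(np.round(x1000[[10, 19, 21, 30, 38, 40, 50]], 3))
    # near zero inside uniform regions; positive just inside, negative just outside, the bar's contours
    # --- end sketch ---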
  • image p081fig02.36 Informational noise suppression in network with Gaussian on-center and off-surround function as contour detectors that are sensitive to ratio-contrast.
    || Noise suppression and contour detection.
    If B*sum[k=1 to n: Cki] <= D*sum[k=1 to n: Eki] then:
    • uniform patterns are suppressed
    • contrasts are selectively enhanced
    • contours are detected
    Ii vs i, xi vs i
    Responses are selective to [REFLECTANCE, SPATIAL SCALE], eg color [feature, surface] contours.
  • image p106fig03.24 In response to the Synthetic Aperture Radar image (upper left corner), a shunting on-center off-surround network "discounts the illuminant" and thereby normalizes cell activities to compute feature contours, without causing saturation (upper right corner). Multiple-scale boundaries form in response to spatially coherent activities in the feature contours (lower left corner) and create the webs, or containers, into which the feature contours fill-in the final surface representations (lower right corner).
    || Do these ideas work on hard problems? SAR!
    input image -> feature contours -> boundary contours -> filled-in surface
    Synthetic Aperture Radar: sees through weather. 5 orders of magnitude of power in radar return. Discounting the illuminant:
    • normalizes the image: preserves RELATIVE activities without SATURATION
    • shows individual PIXELS
    Boundaries complete between regions where normalized feature contrasts change. Filling-in averages brightnesses within boundary compartments.
  • image p176fig04.53 The on-center off-surround network within position and across depth helps to explain why brighter Kanizsa squares look closer.
    || inhibition vs. depth. p176c1h0.25 "... to qualitatively understand how this example of proximity-luminance covariance works. It follows directly from the boundary pruning by surface contour feedback signals (Figure 4.51) that achieves complementary consistency and initiates figure-ground perception. ...". p176c1h0.45 "... these inhibitory signals are part of an off-surround network whose strength decreases as the depth difference increases between the surface that generates the signal and its recipient boundaries. ...". p176c1h0.8 "... Within FACADE theory, the perceived depth of a surface is controlled by the boundaries that act as its filling-in generators and barriers (Figure 3.22), since these boundaries select the depth-selective FIDOs within which filling-in can occur, and thereby achieve surface capture. These boundaries, in turn, are themselves strengthened after surface-to-boundary contour feedback eliminates redundant boundaries that cannot support successful filling-in (Figure 4.51). These surface contour feedback signals have precisely the properties that are needed to explain why brighter Kanizsa squares look closer! ..."
  • image p192fig05.05 ON and OFF cells in the LGN respond differently to the sides and ends of lines.
    || [ON, OFF]-center, [OFF, ON]-surround (respectively). OFF-center cells maximum response at line end (interior), ON-center cells maximum response along sides (exterior)
  • image p253fig06.02 After bottom-up surface inputs activate spatial attentional cells, they send top-down topographic excitatory signals back to the surface representations. This recurrent shunting on-center off-surround network contrast enhances larger attentional activities while approximately normalizing the total spatial attentional activity. A surface-shroud resonance hereby forms that selects an attentional shroud, enhances the perceived contrast of the attended surface (light blue region), and maintains spatial attention on it.
    || Surface-shroud resonance. perceptual surfaces -> competition -> spatial attention. (Carrasco, Penpeci-Talgar, and Eckstein 2000, Reynolds and Desimone 2003)
  • image p300fig08.12 A single flash activates a Gaussian receptive field across space whose maximum is chosen by a winner-take-all recurrent on-center off-surround network.
    || Gaussian receptive fields are sufficient! (Grossberg, Rudd 1992). Single flash. Suppose that a single flash causes a narrow peak of activity at the position where it occurs. It generates output signals through a Gaussian filter that produces a Gaussian activity profile at the next processing stage. A recurrent on-center off-surround network chooses the maximum activity and suppresses smaller activities. Winner-take-all
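  • Howell: a minimal simulation sketch (Python, mine, not from the book) of the choice step. A Gaussian-filtered "flash" yields a graded activity bump; a recurrent shunting on-center off-surround network with a faster-than-linear signal f(w) = w^2 then contrast-enhances the bump into a winner-take-all choice (cf. Grossberg 1973). Parameters A = 1, B = 4, and the Euler integration are my own choices:
    # --- Python sketch (Howell), not from Grossberg 2021 ---
    import numpy as np

    def recurrent_wta(x0, A=1.0, B=4.0, f=lambda w: w**2, dt=0.01, steps=4000):
        """Euler-integrate the recurrent shunting on-center off-surround network
        d[dt: x_i] = -A*x_i + (B - x_i)*f(x_i) - x_i*sum[k!=i: f(x_k)]
        with a faster-than-linear signal function f, which contrast-enhances the
        stored pattern all the way to a winner-take-all choice."""
        x = np.array(x0, dtype=float)
        for _ in range(steps):
            s = f(x)
            x += dt * (-A * x + (B - x) * s - x * (s.sum() - s))
        return x

    # A single flash filtered through a Gaussian gives a graded initial activity bump;
    # the recurrent network then chooses the peak and suppresses the smaller activities.
    bump = np.exp(-0.5 * (np.arange(7) - 2.3) ** 2)
    print(np.round(recurrent_wta(bump), 3))   # only the maximally activated cell stays on
    # --- end sketch ---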
  • image p315fig08.35 The output signals from the directional grouping network obey the ART Matching Rule. They thereby select consistent motion directional signals while suppressing inconsistent ones, and do not distort the speed estimates that the spared cells code. The aperture problem is hereby solved by the same mechanism that dynamically stabilizes the learning of directional grouping cells.
    || How to select correct direction and preserve speed estimates? Prediction: Feedback from MSTv to MT- obeys ART Matching Rule; Top-down, modulatory on-center, off-surround network (Grossberg 1976, 1980; Carpenter, Grossberg 1987, 1991); Explains how directional grouping network can stably develop and how top-down directional attention can work. (Cavanagh 1992; Goner etal 1986; Sekuler, Ball 1977; Stelmach etal 1994). Directional grouping network (MSTv) <-> Directional long-range filter (MT). Modulatory on-center selects chosen direction and preserves speed. Off-surround inhibits incompatible directions.
  • image p340fig09.07 Log polar remapping from the retina to cortical area V1 and beyond converts expansion, translation, and spiral flows on the retina into parallel flows, with different orientations, on the cortical map.
    || Log polar remapping of optic flow. retina -> cortex. Any combination of expansion and circular motion centered on the fovea maps to cortex as a single direction. Retinal Cartesian coordinates (x,y) map to cortical polar coordinates (r,theta). This makes it easy to compute directional receptive fields in the cortex!
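  • Howell: the claim is easy to verify with coordinate geometry (Python sketch, mine). Under the log polar map (x,y) -> (u,v) = (ln r, θ), an expansion flow about the fovea induces the constant cortical velocity (expansion rate, 0) everywhere, a circular flow induces (0, rotation rate), and any spiral combination induces a single oblique direction:
    # --- Python sketch (Howell), not from Grossberg 2021 ---
    import numpy as np

    def cortical_flow(points, expansion=0.0, rotation=0.0):
        """Map a retinal flow field (expansion and/or rotation about the fovea) through
        the log polar transform (x, y) -> (u, v) = (ln r, theta) and return the induced
        cortical velocity (du/dt, dv/dt) at each retinal point."""
        x, y = points[:, 0], points[:, 1]
        r2 = x**2 + y**2
        vx = expansion * x - rotation * y     # retinal velocity components
        vy = expansion * y + rotation * x
        du = (x * vx + y * vy) / r2           # d(ln r)/dt = (dr/dt)/r
        dv = (x * vy - y * vx) / r2           # d(theta)/dt
        return np.stack([du, dv], axis=1)

    rng = np.random.default_rng(1)
    pts = rng.uniform(-1.0, 1.0, size=(5, 2)) + 2.0    # sample points away from the fovea

    print(cortical_flow(pts, expansion=0.3))                 # every row ~ (0.3, 0.0): parallel flow
    print(cortical_flow(pts, rotation=0.2))                  # every row ~ (0.0, 0.2)
    print(cortical_flow(pts, expansion=0.3, rotation=0.2))   # every row ~ (0.3, 0.2): one oblique direction
    # --- end sketch ---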
  • image p345fig09.15 Double opponent directional receptive fields in MT are capable of detecting the motion of objects relative to each other and their backgrounds.
    || Motion opponency in MT (Born, Tootell 1992). Motion opponent (Grossberg etal), Differential motion (Royden etal), Subtractive motion cells (Neumann etal). ON center directionally selective: [excit, inhibit]ed by motion in [one, opponent] direction. OFF surround directionally selective: [excit, inhibit]ed by motion in [opponent, center] direction.
  • image p354fig10.01 The laminar cortical circuit that realizes how we pay attention to an object sends signals from layer 6 of a higher cortical level to layer 6 of a lower cortical level and then back up to layer 4. This "folded feedback" circuit realizes a top-down, modulatory on-center, off-surround circuit that realizes the ART Matching Rule.
    || Top-down attention and folded feedback. Attentional signals also feed back into 6-to-4 on-center off-surround. 1-to-5-to-6 feedback path: Macaque (Lund, Booth 1975) cat (Gilbert, Wiesel 1979). V2-to-V1 feedback is on-center off-surround and affects layer 6 of V1 the most (Bullier etal 1996; Sandell, Schiller 1982). Attended stimuli enhanced, ignored stimuli suppressed. This circuit supports the predicted ART Matching Rule! [LGN, V[1,2][6->1]]
  • image p359fig10.06 Another, albeit indirect, pathway from LGN exists that can also excite layer 4 of V1. Why are these two pathways not redundant? The answer, ultimately, has to do with how the cortex learns, as well as with how it pays attention. See the text for details.
    || Another bottom-up input to layer 4: Why?? Layer 6-to-4 on-center off-surround (Grieve, Sillito 1991, 1995; Ahmed etal 1994, 1997). LGN projects to layers 6 and 4. Layer 6 excites spiny stellates in column above it. Medium range connections onto inhibitory neurons. 6-to-4 path acts as an on-center off-surround.
  • image p359fig10.07 The two bottom-up pathways from LGN to layer 4 of V1 can together activate layer 4 and contrast-normalize layer 4 responses.
    || Bottom-up contrast normalization (Grossberg 1968, 1973; Sperling, Sondhi 1968; Heeger 1992; Douglas etal 1995; Shapley etal 2004). Together, direct LGN-to-4 path and 6-to-4 on-center off-surround provide contrast normalization if cells obey shunting or membrane equation dynamics.
  • image p360fig10.08 The bottom-up on-center off-surround from LGN-to-6-to-4 has a modulatory on-center because of its role in realizing the ART Matching Rule and, with it, the ability of the cortex to dynamically stabilize its learned memories.
    || Modulation of priming by 6-to-4 on-center (Stratford etal 1996; Callaway 1998). On-center 6-to-4 excitation is inhibited down to being modulatory (priming, subthreshold). On-center 6-to-4 excitation cannot activate layer 4 on its own. Clarifies need for direct path. Prediction: plays key role in stable grouping, development and learning. ART Matching Rule!
  • image p362fig10.11 Feedback between layer 2/3 to the layer 6-to-4-to-2/3 feedback loop chooses the strongest grouping in cases where there is more than one. If only one grouping exists, then the circuit can function very quickly in a feedforward manner. When multiple groupings exist, the cortex "runs as fast as it can" to select the one with the most evidence to support it using the self-normalizing inhibition in the layer 6-to-4 off-surround.
    || How is the final grouping selected? Folded feedback LGN-> 6-> 4-> 2/3. 1. Layer 2/3 groupings feed back into 6-to-4 on-center off-surround: a) direct layer 2/3 -to-6 path; b) can also go via layer 5 (Blasdel etal 1985; Kisvarday etal 1989). 2. Strongest grouping enhanced by its on-center. 3. Inputs to weaker groupings suppressed by off-surround. 4. Interlaminar feedback creates functional columns. Activities of conflicting groupings are reduced by self-normalizing inhibition, slowing processing; intracortical feedback selects and contrast-enhances the winning grouping, speeding processing.
  • image p364fig10.13 The bottom-up adaptive filter, intracortical grouping circuit, and intercortical top-down attentional circuit all use the same competitive decision circuit between layers 6 and 4, called the attention-preattention interface, with which to select the featural patterns that will be processed.
    || Bottom-up filters and intracortical grouping feedback use the same 6-to-4 decision circuit, LGN-> Vx[6,4,2/3]. competitive decision circuit, modulatory on-center off-surround network. Top-down intercortical attention also uses the same 6-to-4 decision circuit!
  • image p364fig10.14 This figure emphasizes how preattentive intracortical groupings and top-down intercortical attention share the same modulatory on-center, off-surround layer 6-to-4 decision circuit.
    || Explanation: grouping and attention share the same modulatory decision circuit. Layer 6-to-4-to-2/3 pathway shown; also a layer 6-to-1-to-2/3 path. intercortical attention, both act via a modulatory on-center off-surround decision circuit, intracortical feedback from groupings
  • image p367fig10.15 Data (left column) and simulation (right column) of how attention prevents a masking stimulus from inhibiting the response to the on-center of the cell from which the recording was made.
    || Attention protects target from masking stimulus (Reynolds etal 1999; Grossberg, Raizada 2000).
  • image p375fig11.07 The 3D LAMINART model uses both monocular and binocular simple cells to binocularly fuse like image contrasts. The remainder of the model generates 3D boundary and surface representations of multiple kinds of experiments as well as of natural scenes.
    || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A], V2 thin stripe [4->2/3A], V4]. V1 blob [V1-4 monocular, V1 interior binocular] simple cells. [complex, simple, inhibitory] cells, on-center off-surround
  • image p376fig11.10 The 3D LAMINART model shows how the disparity filter can be integrated into the circuit that completes 3D boundary representations using bipole grouping cells. It also explains how surface contours can strengthen boundaries that succeed in generating closed filling-in domains.
    || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A] surface contour, V2 thin stripe (monocular surface) [4->2/3A], V2 interior [disynaptic inhibitory interneurons, bipole grouping cells, disparity filter, V4 binocular surface]. [complex, simple, inhibitory] cells, on-center off-surround
  • image p423fig12.20 The Spatial Pitch Network, or SPINET, model shows how a log polar spatial representation of the sound frequency spectrum can be derived from auditory signals occurring in time. The spatial representation allows the ARTSTREAM model to compute spatially distinct auditory streams.
    || SPINET model (Spatial Pitch Network) (Cohen, Grossberg, Wyse 1995). 1. input sound 2. Gamma-tone filter bank 3. Short-term average energy spectrum 4. MAP transfer function 5. On-center off-surround and rectification 6. Harmonic weighting 7. Harmonic summation and competition -> PITCH
  • image p448fig12.46 A Masking Field working memory is a multiple-scale self-similar recurrent shunting on-center off-surround network. It can learn list chunks that respond selectively to lists of item chunks of variable length that are stored in an item working memory at the previous processing stage. Chunks that code for longer lists (eg MY vs MYSELF) are larger, and give rise to stronger recurrent inhibitory neurons (red arrows).
    || How to code variable length lists? MASKING FIELDS code list chunks of variable length (Cohen, Grossberg 1986, 1987; Grossberg, Kazerounian 2011, 2016; Grossberg, Meyers 2000; Grossberg, Pearson 2008). Multiple-scale self-similar WM: Masking field, adaptive filter. Variable length coding- Masking fields select list chunks that are sensitive to WM sequences of variable length; Selectivity- Larger cells selectively code longer lists; Asymmetric competition- Larger cells can inhibit smaller cells more than conversely. Magic Number 7! Temporal order- different list chunks respond to the same items in different orders eg LEFT vs FELT;.
  • image p564fig15.35 (a) A pair of recurrent shunting on-center off-surround networks for control of the fore limbs and hind limbs. (b) Varying the GO signal to these networks can trigger changes in movement gaits. See the text for details.
    ||
  • image p567fig15.38 (a) The Gated Pacemaker model for the control of circadian rhythms is a recurrent shunting on-center off-surround network whose excitatory feedback signals are gated by habituative transmitters. Tonic arousal signals energize the pacemaker. Diurnal (left) and nocturnal (right) pacemakers are determined by whether phasic light signals turn the pacemaker on or off. An activity-dependent fatigue signal prevents the pacemaker from becoming overly active for too long. (b) Two simulations of circadian activity cycles during different schedules of light (L) and dark (D). See the text for details.
    || sourceOn-> on-cells (recurrent) <-(-) (-)> off-cells (recurrent) <-sourceOff. on-cells-> activity-> off-cells. off-cells-> fatigue. Diurnal: sourceOn=[light, arousal]; sourceOff=arousal;. Nocturnal: sourceOn=arousal; sourceOff=[arousal, light];.
  • image p586fig16.16 In the place cell learning model of (Gorchetnikov, Grossberg 2007), three populations of five cells each of entorhinal grid cells (only two are shown) with different spatial periods input to the model's dentate gyrus. The grid cells are one-dimensional and defined algorithmically. A model dentate gyrus granule cell that receives strong projections from all three grid cell scales fires (green cell) and activates a recurrent inhibitory interneuron that inhibits other granule cells. It also generates back-propagating action potentials that trigger learning in the adaptive weights of the projections from the grid cells, thereby causing learning of place cell receptive fields.
    || Grid-to-place Self-Organizing map (Gorchetnikov, Grossberg 2007). Formation of place cell fields via grid-to-place cell learning. Least common multiple: [grid (cm), place (m)] scales: [40, 50, 60 (cm); 6m], [50, 60, 70 (cm); 21m], [41, 53, 59 (cm); 1.282 km]. Our simulations: [40, 50 (cm); 2m], [44, 52 (cm); 5.72m]. Our SOM: Spiking Hodgkin-Huxley membrane equations; Nonlinear choice by contrast-enhancing recurrent on-center off-surround net;. Choice triggers back-propagating action potentials that induce STDP-modulated learning on cell dendrites.
  • image p627tbl17.01 Homologs between reaction-diffusion and recurrent shunting cellular network models of development.
    || byRows: (reaction-diffusion, recurrent shunting net) (activator, excitatory activity) (inhibitor, inhibitory activity) (morphogenic source density, inputs) (firing of morphogen gradient, contrast enhancement) (maintenance of morphogen gradient, short-term memory) (power or sigmoidal signal functions, power or sigmoidal signal functions) (on-center off-surround interactions via diffusion, on-center off-surround interactions via signals) (self-stabilizing distributions of morphogens if inhibitors equilibrate rapidly, short-term memory pattern if inhibitors equilibrate rapidly) (periodic pulses if inhibitors equilibrate slowly, periodic pulses if inhibitors equilibrate slowly) (regulation, adaptation).
  • image p016fig01.11 A sufficiently big mismatch between a bottom-up input pattern and a top-down expectation can activate the orienting system, which triggers a burst of nonspecific arousal that can reset the recognition category that read out the expectation. In this way, unexpected events can reset short-term memory and initiate a search for a category that better represents the current situation.
    || [category- top-down (TD) expectation; Bottom-up (BU) input pattern] -> Feature pattern -> BU-TD mismatch -> orienting system -> non-specific arousal -> category.
  • image p038fig01.25 The ART Matching Rule stabilizes real time learning using a [top-down, modulatory on-center, off-surround] network. Object attention is realized by such a network. See text for additional discussion.
    || ART Matching Rule [volition, categories, features]. [one, two] against one.
  • image p052fig02.02 Feature-category resonances enable us to rapidly learn how to recognize objects without experiencing catastrophic forgetting. Attentive matching between bottom-up feature pattern inputs and top-down expectations prevents catastrophic forgetting by focussing object attention upon expected patterns of features, while suppressing outlier features that might otherwise have caused catastrophic forgetting if they were learned also.
    || Adaptive Resonance. Attended feature clusters reactivate bottom-up pathways. Activated categories reactivate their top-down pathways. Categories STM, Feature patterns STM. Feature-Category resonance [synchronize, amplify, prolong]s system response. Resonance triggers learning in bottom-up and top-down adaptive weights: adaptive resonance!
  • image p078fig02.31 How noise suppression enables matching of bottom-up and top-down input patterns.
    || Noise suppression -> pattern matching. mismatch (out of phase) suppressed, match (in phase) amplifies pattern.
  • image p079fig02.32 Matching amplifies the matched pattern due to automatic gain control. See terms I and J in the equation.
    || Substrate of resonance. Match (in phase) of BU and TD input patterns AMPLIFIES matched pattern due to automatic gain control by shunting terms. J = sum[i: Ji], I = sum[i: Ii], θi = (Ii + Ji)/(I + J)
    xi = (B + C)*(I + J)/(A + I + J)*[θi -C/(B + C)]
    Need top-down expectations to be MODULATORY.
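  • Howell: a small Python sketch (mine, not from the book) of the matching equation above, with the adaptation level set for noise suppression (B = (n-1)*C). An in-phase top-down expectation leaves θi unchanged but raises the total input I+J, so the automatic gain control term amplifies the surviving activities; an out-of-phase (complementary) expectation flattens the combined pattern toward uniformity, which the network suppresses:
    # --- Python sketch (Howell), not from Grossberg 2021 ---
    import numpy as np

    def match_equilibrium(BU, TD, A=1.0, C=1.0):
        """x_i = (B + C)*(I + J)/(A + I + J)*(theta_i - C/(B + C)) for summed
        bottom-up (BU) and top-down (TD) input patterns, with B = (n - 1)*C so
        that uniform (informationally empty) patterns are suppressed."""
        BU, TD = np.asarray(BU, float), np.asarray(TD, float)
        B = (BU.size - 1) * C
        total = BU.sum() + TD.sum()
        theta = (BU + TD) / total
        return (B + C) * total / (A + total) * (theta - C / (B + C))

    BU = np.array([0.1, 0.3, 0.9, 0.3, 0.1])    # bottom-up feature pattern
    TD_match = 2.0 * BU                         # in-phase top-down expectation
    TD_miss = BU.max() - BU                     # out-of-phase (complementary) expectation

    print(np.round(match_equilibrium(BU, np.zeros_like(BU)), 3))  # bottom-up alone
    print(np.round(match_equilibrium(BU, TD_match), 3))   # match: attended peak is amplified
    print(np.round(match_equilibrium(BU, TD_miss), 3))    # mismatch: pattern is suppressed (~0)
    # --- end sketch ---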
  • image p087fig03.01 A macrocircuit of key visual processes (in green) and the cortical areas in which they primarily occur (in red), from the retina to the Prefrontal Cortex (PFC), including both the What and Where cortical streams. The [bottom-up, horizontal, and top-down] interactions help each of these processes to overcome computationally complementary processing deficiencies that they would experience without them, and also to read-out top-down expectations that help to stabilize learning while they focus attention on salient objects and positions.
    || Emerging unified theory of visual intelligence. [What, Where] streams. Bottom-up and top-down interactions overcome COMPLEMENTARY processing deficiencies.
  • image p091fig03.04 A cross-section of the eye, and top-down view of the retina, show how the blind spot and retinal veins can occlude the registration of light signals at their positions on the retina.
    || Eye: [optic nerve, ciliary body, iris, lens, pupil, cornea, sclera, choroid, retina]. Human retina: [fovea, blind spot, optic nerve]. See also cross-section of retinal layer.
  • image p163fig04.39 A schematic of the LAMINART model that explains key aspects of laminar visual cortical anatomy and dynamics. LGN -> V1 [6, 4, 2/3] -> V2 [6, 4, 2/3]
    || p163c1h0.6 "... The first article about laminar computing ... proposed how the laminar cortical model could process 2D pictures using bottom-up filtering and horizontal bipole grouping interactions (Grossberg, Mingolla, Ross 1997). In 1999, I was able to extend the model to also include top-down circuits for expectation and attention (Grossberg 1999)(right panel). Such a synthesis of laminar bottom-up, horizontal, and top-down circuits is characteristic of the cerebral cortex (left panel). I called it LAMINART because it began to show how properties of Adaptive Resonance Theory, or ART, notably the ART prediction about how top-down expectations and attention work, are realized by identical cortical cells and circuits. You can immediately see from the schematic laminar circuit diagram ... (right panel) that circuits in V2 seem to repeat circuits in V1, albeit with a larger spatial scale, despite the fact that V1 and V2 carry out different functions. How this anatomical similarity can coexist with functional diversity will be clarified in subsequent sections and chapters. It enables different kinds of biological intelligence to communicate seamlessly while carrying out their different psychological functions. ..."
  • image p192fig05.06 Bottom-up and top-down circuits between the LGN and cortical area V1. The top-down circuits obey the ART Matching Rule for matching with bottom-up input patterns and focussing attention on expected critical features.
    || Model V1-LGN circuits, version [1, 2]. retina -> LGN relay cells -> interneurons -> cortex [simple, endstopped] cells -> cortex complex cells
  • image p193fig05.08 The patterns of LGN activation and inhibition on the sides and ends of a line without the top-down feedback (A) and with it (C). The top-down distribution of excitation (+) and inhibition (-) are shown in (B).
    ||
  • image p199fig05.11 Instar learning enables a bottom-up adaptive filter to become selectively tuned to particular feature patterns. Such pattern learning needs adaptive weights that can either increase or decrease to match the featural activations that they filter.
    || Instar learning STM->LTM: need both increases and decreases in strength for the LTM pattern to learn the STM pattern
  • image p200fig05.13 Instar and outstar learning are often used to learn the adaptive weights in the bottom-up filters and top-down expectations that occur in ART. The ART Matching Rule for object attention enables top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features.
    || Expectations focus attention: feature pattern (STM), Bottom-Up adaptive filter (LTM), Category (STM), competition, Top-Down expectation (LTM); ART Matching Rule: STM before top-down matching, STM after top-down matching (attention!)
  • image p211fig05.20 The PN and N200 event-related potentials are computationally complementary events that are computed within the attentional and orienting systems.
    || PN and N200 are complementary waves. PN [top-down, conditionable, specific] match; N200 [bottom-up, unconditionable, nonspecific] mismatch
  • image p214fig05.24 Learning of a top-down expectation must occur during bottom-up learning in the adaptive filter in order to be able to match the previously associated feature pattern with the one that is currently active.
    || Learning top-down expectations. When the code (green right triangle GRT) for X1 was learned at F2, GRT learned to read-out X1 at F1. [Bottom-Up, Top-Down] learning
  • image p214fig05.25 The sequence of events whereby a novel input pattern can activate a category which, in turn, reads out its learned top-down expectation to be matched against the input pattern. Error correction thus requires the use of a Match Detector that has properties of the Processing Negativity ERP.
    || How is an error corrected? During bottom-up learning, top-down learning must also occur so that the pattern that is read out top-down can be compared with the pattern that is activated by bottom-up inputs. Match detector: Processing Negativity ERP. 1. top-down, 2. conditionable, 3. specific, 4. match
  • image p214fig05.26 When a big enough mismatch occurs, the orienting system is activated and sends a burst of nonspecific arousal to the category level. This Mismatch Detector has properties of the N200 ERP.
    || Mismatch triggers nonspecific arousal. Mismatch at F1 elicits a nonspecific event at F2. Call this event nonspecific arousal. N200 ERP Naatanen etal: 1. bottom-up, 2. unconditionable, 3. nonspecific, 4. mismatch
  • image p215fig05.28 How a mismatch between bottom-up and top-down input patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level.
    || Mismatch -> inhibition -> arousal -> reset. BU input orienting arousal, BU+TD mismatch arousal and reset. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
  • image p220fig05.29 Vigilance is a gain parameter on inputs to the orienting system that regulates whether net excitation from bottom-up inputs or inhibition from activated categories will dominate the orienting system. If excitation wins, then a memory search for a better-matching category will occur. If inhibition wins, then the orienting system will remain quiet, thereby enabling resonance and learning to occur.
    || Vigilance control [resonate and learn, reset and search]. ρ is a sensitivity or gain parameter
  • image p221fig05.30 When a predictive disconfirmation occurs, vigilance increases enough to drive a search for a more predictive category. If vigilance increases just enough to exceed the analog match between features that survive top-down matching and the entire bottom-up input pattern, then minimax learning occurs. In this case, the minimum amount of category generalization is given up to correct the predictive error.
    || Match tracking realizes minimax learning principle. Given a predictive error, vigilance increases just enough to trigger search and thus sacrifices the minimum generalization to correct the error ... and enables expert knowledge to be incrementally learned. predictive error -> vigilance increases just enough -> minimax learning
  • image p221fig05.31 A system like Fuzzy ARTMAP can learn to associate learned categories in one ART network with learned categories in a second ART network. Because both bottom-up and top-down interactions occur in both networks, a bottom-up input pattern to the first ART network can learn to generate a top-down output pattern from the second ART network.
    || Fuzzy ARTMAP. Match tracking realizes minimax learning principle: vigilance increases to just above the match ratio of prototype / exemplar, thereby triggering search
  • image p226fig05.35 I had shown in 1976 how a competitive learning or self-organizing map model could undergo catastrophic forgetting if the input environment was sufficiently dense and nonstationary, as illustrated by Figure 5.18. Later work with Gail Carpenter showed how, if the ART Matching Rule was shut off, repeating just four input patterns in the correct order could also cause catastrophic forgetting by causing superset recoding, as illustrated in Figure 5.36.
    || Code instability input sequences. D ⊂ C ⊂ A; B ⊂ A; B ∩ C = ∅; |D| < |B| < |C|; where |E| is the number of features in the set E. Any set of input vectors that satisfy the above conditions will lead to unstable coding if they are periodically presented in the order ABCAD and the top-down ART Matching Rule is shut off.
  • image p246fig05.48 Microcircuits of the LAMINART model that I developed with Rajeev Raizada. See the text for details of how they integrate bottom-up adaptive filtering, horizontal bipole grouping, and top-down attentional matching that satisfied the ART Matching Rule.
    ||
  • image p252fig06.01 A surface-shroud resonance begins to form when the surface representations of objects bid for spatial attention. In addition to these topographic excitatory inputs, there is long-range inhibition of the spatial attention cells that determines which inputs will attract spatial attention.
    || Bottom-up spatial attention competition. [more, less] luminous perceptual surfaces -> competition -> spatial attention
  • image p253fig06.02 After bottom-up surface inputs activate spatial attentional cells, they send top-down topographic excitatory signals back to the surface representations. This recurrent shunting on-center off-surround network contrast enhances larger attentional activities while approximately normalizing the total spatial attentional activity. A surface-shroud resonance hereby forms that selects an attentional shroud, enhances the perceived contrast of the attended surface (light blue region), and maintains spatial attention on it.
    || Surface-shroud resonance. perceptual surfaces -> competition -> spatial attention. (Carrasco, Penpeci-Talgar, and Eckstein 2000, Reynolds and Desimone 2003)
  • image p258fig06.07 A top-down spotlight of attention can also be converted into a shroud. This process begins when the spotlight triggers surface filling-in within a region. Figure 6.8 shows how it is completed.
    || Reconciling spotlights and shrouds: top-down attentional spotlight becomes a shroud. spotlight of attention, surface filling-in
  • image p286fig07.04 Illusory contours persist longer than real contours because real contours have more inducers whose rebound at contour offset can cause faster boundary reset. Illusory contours also take longer to form than real contours, which explains the increasing portion of the curve.
    || Persistence data and simulations (Meyer, Ming 1988; Reynolds 1981). Increasing portion of curve is due to formation time of the illusory contour. Longer persistence is due to fewer bottom-up inducers of an illusory contour that has the same length as a real contour: only illuminance-derived edges generate reset signals. When bottom-up inducers are inhibited by OFF cell rebounds, their offset gradually propagates to the center of the illusory contour.
  • image p286fig07.05 This figure shows the propagation through time of illusory contour offset from the rebounded cells that got direct inputs to the center of the contour.
    || Persistence data and simulations. Illusory contours persist longer than real contours (Meyer, Ming 1988; Reynolds 1981). When bottom-up inducers are inhibited by OFF cell rebounds, their offset gradually propagates to the center of the illusory contour.
  • image p315fig08.35 The output signals from the directional grouping network obey the ART Matching Rule. They thereby select consistent motion directional signals while suppressing inconsistent ones, and do not distort the speed estimates that the spared cells code. The aperture problem is hereby solved by the same mechanism that dynamically stabilizes the learning of directional grouping cells.
    || How to select correct direction and preserve speed estimates? Prediction: Feedback from MSTv to MT- obeys ART Matching Rule; Top-down, modulatory on-center, off-surround network (Grossberg 1976, 1980; Carpenter, Grossberg 1987, 1991); Explains how directional grouping network can stably develop and how top-down directional attention can work. (Cavanagh 1992; Goner etal 1986; Sekuler, Ball 1977; Stelmach etal 1994). Directional grouping network (MSTv) <-> Directional long-range filter (MT). Modulatory on-center selects chosen direction and preserves speed. Off-surround inhibits incompatible directions.
  • image p330fig08.52 Direction fields of the object frame (left column) and of the two dot "parts" (right column) show the correct motion directions after the peak shift top-down expectation acts.
    || Simulation of motion vector decomposition. [Larger scale (nearer depth), Small scale (farther depth)] vs [Down, Up]
  • image p331fig08.54 The simulated part directions of the rotating dot through time after the translational motion of the frame does its work via the top-down peak shift mechanism.
    || Cycloid. Motion directions of a single dot moving slowly along a cycloid curve through time.
  • image p354fig10.01 The laminar cortical circuit that realizes how we pay attention to an object sends signals from layer 6 of a higher cortical level to layer 6 of a lower cortical level and then back up to layer 4. This "folded feedback" circuit realizes a top-down, modulatory on-center, off-surround circuit that realizes the ART Matching Rule.
    || Top-down attention and folded feedback. Attentional signals also feed back into 6-to-4 on-center off-surround. 1-to-5-to-6 feedback path: Macaque (Lund, Booth 1975) cat (Gilbert, Wiesel 1979). V2-to-V1 feedback is on-center off-surround and affects layer 6 of V1 the most (Bullier etal 1996; Sandell, Schiller 1982). Attended stimuli enhanced, ignored stimuli suppressed. This circuit supports the predicted ART Matching Rule! [LGN, V[1,2][6->1]]
  • image p356fig10.03 Laminar computing achieves at least three basic properties of visual processing that have analogs in all biologically intelligent behaviors. These properties may be found in all cortical circuits in specialized form.
    || What does Laminar Computing achieve? 1. Self-stabilizing development and learning; 2. Seamless fusion of a) pre-attentive automatic bottom-up processing, b) attentive task-selective top-down processing; 3. Analog coherence: Solution of Binding Problem for perceptual grouping without loss of analog sensitivity. Even the earliest visual cortical stages carry out active adaptive information processing: [learn, group, attention]ing
  • image p359fig10.05 Activation of V1 is initiated, in part, by direct excitatory signals from the LGN to layer 4 of V1.
    || How are layer 2/3 bipole cells activated? Direct bottom-up activation of layer 4. LGN -> V1 layer 4. Strong bottom-up LGN input to layer 4 (Stratford etal 1996; Chung, Ferster 1998). Many details omitted.
  • image p359fig10.06 Another, albeit indirect, pathway from LGN exists that can also excite layer 4 of V1. Why are these two pathways not redundant? The answer, ultimately, has to do with how the cortex learns, as well as with how it pays attention. See the text for details.
    || Another bottom-up input to layer 4: Why?? Layer 6-to-4 on-center off-surround (Grieve, Sillito 1991, 1995; Ahmed etal 1994, 1997). LGN projects to layers 6 and 4. Layer 6 excites spiny stellates in column above it. Medium range connections onto inhibitory neurons. 6-to-4 path acts as an on-center off-surround.
  • image p359fig10.07 The two bottom-up pathways from LGN to layer 4 of V1 can together activate layer 4 and contrast-normalize layer 4 responses.
    || Bottom-up contrast normalization (Grossberg 1968, 1973; Sperling, Sondhi 1968; Heeger 1992; Douglas etal 1995; Shapley etal 2004). Together, direct LGN-to-4 path and 6-to-4 on-center off-surround provide contrast normalization if cells obey shunting or membrane equation dynamics.
  • image p360fig10.08 The bottom-up on-center off-surround from LGN-to-6-to-4 has a modulatory on-center because of its role in realizing the ART Matching Rule and, with it, the ability of the cortex to dynamically stabilize its learned memories.
    || Modulation of priming by 6-to-4 on-center (Stratford etal 1996; Callaway 1998). On-center 6-to-4 excitation is inhibited down to being modulatory (priming, subthreshold). On-center 6-to-4 excitation cannot activate layer 4 on its own. Clarifies need for direct path. Prediction: plays key role in stable grouping, development and learning. ART Matching Rule!
  • image p364fig10.13 The bottom-up adaptive filter, intracortical grouping circuit, and intercortical top-down attentional circuit all use the same competitive decision circuit between layers 6 and 4, called the attention-preattention interface, with which to select the featural patterns that will be processed.
    || Bottom-up filters and intracortical grouping feedback use the same 6-to-4 decision circuit, LGN-> Vx[6,4,2/3]. competitive decision circuit, modulatory on-center off-surround network. Top-down intercortical attention also uses the same 6-to-4 decision circuit!
  • image p364fig10.14 This figure emphasizes how preattentive intracortical groupings and top-down intercortical attention share the same modulatory on-center, off-surround layer 6-to-4 decision circuit.
    || Explanation: grouping and attention share the same modulatory decision circuit. Layer 6-to-4-to-2/3 pathway shown; also a layer 6-to-1-to-2/3 path. intercortical attention, both act via a modulatory on-center off-surround decision circuit, intracortical feedback from groupings
  • image p420fig12.18 The ARTSTREAM model explains and simulates the auditory continuity illusion as an example of a spectral-pitch resonance. Interactions of ART Matching Rule and asymmetric competition mechanisms in cortical strip maps explain how the tone selects the consistent frequency from the noise in its own stream while separating the rest of the noise into another stream.
    || ARTSTREAM model (Grossberg 1999; Grossberg, Govindarajan, Wyse, Cohen 2004). SPINET. Frequency and pitch strips. Bottom Up (BU) harmonic sieve. Top Down (TD) harmonic ART matching. Exclusive allocation. Learn pitch categories based on early harmonic processing. A stream is a Spectral-Pitch Resonance!
  • image p441fig12.38 The LTM Invariance Principle is realized if the relative sizes of the inputs to the list chunk level stay the same as more items are stored in working memory. This property, in turn, follows from shunting previously stored working memory activities when a new item occurs.
    || LTM Invariance principle. Choose STM activities so that newly stored STM activities may alter the size of old STM activities without recoding their LTM patterns. In particular: New events do not change the relative activities of past event sequences, but may reduce their absolute activities. Why? Bottom-up adaptive filtering uses dot products: T(j) = sum[i=1 to n: x(i)*z(i,j)] = total input to v(j). The relative sizes of inputs to coding nodes v(j) are preserved. x(i) -> w*x(i), 0 < w <= 1, leaves all past ratios T(j)/T(k) unchanged.
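  • Howell: the invariance is a one-line consequence of the dot product (Python sketch, mine): scaling all stored STM activities by a common factor w scales every T(j) by w, so the ratios T(j)/T(k) that decide which list chunks win are untouched:
    # --- Python sketch (Howell), not from Grossberg 2021 ---
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.uniform(size=6)           # STM activity pattern over item nodes v_i
    z = rng.uniform(size=(6, 3))      # LTM weights z(i,j) to three list-chunk nodes v_j

    T = x @ z                         # total inputs T(j) = sum[i: x(i)*z(i,j)]
    T_scaled = (0.4 * x) @ z          # new items shrink old activities by w = 0.4

    # Absolute inputs shrink, but the ratios T(j)/T(k) are unchanged.
    print(np.allclose(T_scaled / T_scaled.sum(), T / T.sum()))   # True
    # --- end sketch ---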
  • image p449fig12.47 This figure illustrates the self-similarity in a Masking Field of both its recurrent inhibitory connections (red arrows) and its top-down excitatory priming signals (green arrows) to the item chunk working memory.
    || Both recurrent inhibition and top-down excitatory priming are self-similar in a masking field. MYSELF <-> [MY, MYSELF]
  • image p453fig12.50 The ARTWORD perception cycle shows how sequences of items activate possible list chunks, which compete among each other and begin to send their top-down expectations back to the item working memory. An item-list resonance develops through time as a result.
    || ARTWORD perception cycle. (a) bottom-up activation (b) list chunk competition (c) item-list resonance (d) chunk reset due to habituative collapse.
  • image p483fig13.02 The object-value categories in the orbitofrontal cortex require converging specific inputs from the sensory cortex and nonspecific incentive motivational inputs from the amygdala in order to fire. When the orbitofrontal cortex fires, it can deliver top-down ART Matching Rule priming signals to the sensory cortical area by which it was activated, thereby helping to choose the active recognition categories there that have the most emotional support, while suppressing others, leading to attentional blocking of irrelevant cues.
    || Cognitive-Emotional-Motor (CogEM) model. Drive-> amygdala incentive motivational learning-> orbitofrontal cortex- need converging cue and incentive inputs to fire <-> sensory cortex- conditioned reinforcer learning-> amygdala. CS-> sensory cortex. Motivated attention closes the cognitive-emotional feedback loop, focuses on relevant cues, and causes blocking of irrelevant cues.
  • image p484fig13.04 The top-down feedback from the orbitofrontal cortex closes a feedback loop that supports a cognitive-emotional resonance. If this resonance can be sustained long enough, it enables us to have feelings at the same time that we experience the categories that caused them.
    || Cognitive-Emotional resonance. Basis of "core consciousness" and "the feeling of what happens". (Damasio 1999) derives heuristic version of CogEM model from his clinical data. Drive-> amygdala-> prefrontal cortex-> sensory cortex, resonance around the latter 3. How is this resonance maintained long enough to become conscious?
  • image p523fig14.03 (a) The MOTIVATOR neural model generalizes CogEM by also including the basal ganglia. It can hereby explain and simulate complementary functions of the amygdala and basal ganglia (SNc) during conditioning and learned performance. The basal ganglia generate Now Print signals in response to unexpected rewards. These signals modulate learning of new associations in many brain regions. The amygdala supports motivated attention to trigger actions that are expected to occur in response to conditioned or unconditioned stimuli. Object Categories represent visual or gustatory inputs in anterior inferotemporal (ITA) and rhinal (RHIN) cortices, respectively. Value Categories represent the value of anticipated outcomes on the basis of hunger and satiety inputs, in amygdala (AMYG) and lateral hypothalamus (LH). Object-Value Categories resolve the value of competing perceptual stimuli in medial (MORB) and lateral (ORB) orbitofrontal cortex. The Reward Expectation Filter detects the omission or delivery of rewards using a circuit that spans ventral striatum (VS), ventral pallidum (VP), striosomal delay (SD) cells in the ventral striatum, the pedunculopontine nucleus (PPTN) and midbrain dopaminergic neurons of the substantia nigra pars compacta/ventral tegmental area (SNc/VTA). The circuit that processes CS-related visual information (ITA, AMYG, ORB) operates in parallel with a circuit that processes US-related visual and gustatory information (RHIN, AMYG, MORB). (b) Reciprocal adaptive connections between hypothalamus and amygdala enable amygdala cells to become learned value categories. The bottom region represents hypothalamic cells, which receive converging taste and metabolite inputs whereby they become taste-drive cells. Bottom-up signals from activity patterns across these cells activate competing value categories, or US Value Representations, in the amygdala. A winning value category learns to respond selectively to specific combinations of taste-drive activity patterns and sends adaptive top-down priming signals back to the taste-drive cells that activated it. CS-activated conditioned reinforcer signals are also associatively linked to value categories. Adaptive connections end in (approximately) hemidiscs. See the text for details.
    ||
  • image p600fig16.36 The entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories. See the text for details.
    || Entorhinal-hippocampal interactions as an ART system. Hippocampal place cells as spatial categories. Angular head velocity-> head direction cells-> stripe cells- small scale 1D periodic code (ECIII) SOM-> grid cells- small scale 2D periodic code (ECII) SOM-> place cells- larger scale spatial map (DG/CA3)-> place cells (CA1)-> conjunctive-coding cells (EC V/VI)-> top-down feedback back to stripe cells- small scale 1D periodic code (ECIII). stripe cells- small scale 1D periodic code (ECIII)-> place cells (CA1).
  • image p613fig16.44 The main target position vector (TPV), difference vector (DV), and volitional GO computations in SOVEREIGN that bring together reactive and planned signals to control decision-making and action. See the text for details.
    || Reactive visual TPV (RVT), NETs (NETs), S-MV mismatch (SMVM), NETmv (NETmv), reactive visual TPV storage (RVTS), reactive DV1 (RD1), NET (NET), motivated what and where decisions (MWWD), Planned DV1 (PD1), tonic (Tonic), top-down readout mismatch (TDRM), Parvo gate (tonic) (PG), Orienting GOp offset (OGpO). RVT-> [NETs, RVTS], NETs-> [SMVM, NET], SMVM-> NET, NETmv-> SMVM, RVTS-> [NETs, RD1], NET-> [RD1, PD1, TDRM], MWWD-> PD1, PD1-> Tonic-> TDRM, PG-> NETs, OGpO-> [NETmv, PD1].
  • Grossberg 2021 p229c2h0.60 SMART computer simulations demonstrate that a good enough match of a top-down expectation with a bottom-up feature pattern generates an attentive resonance during which the spikes of active cells synchronize in the gamma frequency range of 20-70 Hz (Figure 5.40). Many labs have reported a link between attention and gamma oscillations in the brain, including two articles published in 2001, one from the laboratory of Robert Desimone when he was at the National Institute of Mental Health in Bethesda (Fries, Reynolds, Rorie, Desimone 2001), and the other from the laboratory of Wolf Singer in Frankfurt (Engel, Fries, Singer 2001). You'll note that Pascal Fries participated in both studies, and is an acknowledged leader in neurobiological studies of gamma oscillations; eg (Fries 2009). .."
  • Grossberg 2021 p081c2h0.66 sub-section: How does one evolve a computational brain?
    The above discussion illustrates that no single step of theoretical derivation can derive a whole brain. One needs a method for deriving a brain in stages, or cycles, much as evolution has incrementally discovered ever more complex brains over many thousands of years. The following theoretical method has been successfully applied many times since I first used it in 1957. It embodies a kind of conceptual evolutionary process for deriving a brain.

    Because "brain evolution needs to achieve behavioural success", we need to start with data that embodiey indices of behavioral success. That is why, as illustrated in Figure 2.37 Modelling method and cycle, one starts with Behavioral Data from scores or hundreds of psychological experiments. These data are analyszed as the result of an individual adapting autonomously in real time to a changing world. This is the Arty of Modeling. It requires that one be able to infer from static data curves the dynamical processes that control individual behaviors occuring in real time. One of the hardest things that I teach to my students to do is "how to think in real time" to be able to carry out this speculative leap.

    Properly carried out, this analysis leads to the discovery of new Design Principles that are embodied by these behavioral processes. The Design Principles highlight the functional meaning of the data, and clarify how individual behaviors occurring in real time give rise to these static data curves.

    These principles are then converted into the simplest Mathematical Model using a method of minimal anatomies, which is a form of Occam's Razor, or principle of parsimony. Such a mathematical model embodies the psychological principles using the simplest possible differential equations. By "simplest" I mean that, if any part of the derived model is removed, then a significant fraction of the targeted data could no longer be explained. One then analyzes the model mathematically and simulates it on the computer, showing along the way how variations on the minimal anatomy can realize the design principles in different individuals or species.

    This analysis has always provided functional explanations and Behavioral Predictions for much larger behavioral data bases than those used to discover the Design Principles. The most remarkable fact is, however, that the behaviorally derived model always looks like part of a brain, thereby explaining a body of challenging Neural Data and making novel Brain Predictions.

    The derivation hereby links mind to brain via psychological organizational principles and their mechanistic realization as a mathematically defined neural network. This startling fact is what I first experienced as a college Freshman taking Introductory Psychology, and it changed my life forever.

    I conclude from having had this experience scores of times since 1957 that brains look the way they do because they embody a natural computational realization for controlling autonomous adaptation in real-time to a changing world. Moreover, the Behavior -> Principles -> Model -> Neural derivation predicts new functional roles for both known and unknown brain mechanisms by linking the brain data to how it helps to ensure behavioral success. As I noted above, the power of this method is illustrated by the fact that scores of these predictions about brain and behavior have been supported by experimental data 5-30 years after they were first published.

    Having made the link from behavior to brain, one can then "burn the candle from both ends" by pressing both top-down from Behavioral Data and bottom-up from Brain Data to clarify what the model can and cannot explain at its current stage of derivation. No model can explain everything. At each stage of development, the model can cope with certain environmental challenges but not others. An important part of the mathematical and computational analysis is to characterize the boundary between the known and unknown; that is which challenges the model can cope with and which it cannot. The shape of this boundary between the known and unknown helps to direct the theorist's attention to new design principles that have been omitted from previous analysis.

    The next step is to show how these new design principles can be incorporated into the evolved model in a self-consistent way, without undermining its previous mechanisms, thereby leading to a progressively more realistic model, one that can explain and predict ever more behavioral and neural data. In this way, the model undergoes a type of evolutionary development, as it becomes able to cope behaviorally with environmental constraints of ever increasing subtlety and complexity. The Method of Minimal Anatomies may hereby be viewed as a way to functionally understand how increasingly demanding combinations of environmental pressures were incorporated into brains during the evolutionary process.

    If such an Embedding Principle cannot be carried out - that is, if the model cannot be unlumped or refined in a self-consistent way - then the previous model was, put simply, wrong, and one needs to figure out which parts must be discarded. Such a model is, as it were, an evolutionary dead end. Fortunately, this has not happened to me since I began my work in 1957 because the theoretical method is so conservative. No theoretical addition is made unless it is supported by multiple experiments that cannot be explained in its absence. Where multiple mechanistic instantiations of some Design Principles were possible, they were all developed in models to better understand their explanatory implications. Not all of these instantiations could survive the pressure of the evolutionary method, but some always could. As a happy result, all earlier models have been capable of incremental refinement and expansion.

    The cycle of model evolution has been carried out many times since 1957, leading today to increasing numbers of models that individually can explain and predict psychological, neurophysiological, anatomical, biophysical, and even biochemical data. In this specific sense, the classical mind-body problem is being incrementally solved.

    Howell: bold added for emphasis.
    (keys : Principles-Principia, behavior-mind-brain link, brain evolution, cycle of model evolution)
    see also quotes: Charles William Lucas "Universal Force" and others (not retyped yet).
  • p190 Howell: [neural microcircuits, modal architectures] used in ART -
    bottom-up filters | top-down expectations | purpose
    instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star often used to learn the adaptive weights. top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features
    LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
    EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
    auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
    auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
    visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown.
    red - cognitive-emotional dynamics
    green - working memory dynamics
    black - see [bottom-up, top-down] lists
    EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
    Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

    background colours in the table signify :
    white | general microcircuit : a possible component of ART architecture
    lime green | sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
    light blue | post-perceptual cognition?
    pink | "the feeling of what happens" and knowing what event caused that feeling
  • image p009fig01.06 Primacy gradient of activity stored in working memory within a recurrent shunting on-center off-surround network. Rehearsal is controlled by a nonspecific rehearsal wave and self-inhibitory feedback of the item that is currently being rehearsed. Green = excitatory, red = inhibitory
    || inputs? -> item and order WM storage -> competitive selection-> rehearsal wave -> outputs
  • image p077fig02.27 A schematic of the on-center off-surround network that occurs in the mudpuppy retina, including three main cell types: receptors, horizontal cells, and bipolar cells.
    || Mechanism: cooperative-competitive dynamics.
    On-center off-surround (Kuffler 1953) cat retina
    Subtractive lateral inhibition (Hartline, Ratcliff 1956/7+) limulus retina.
    R receptor -> H horizontal -> B bipolar (Werblin, Dowling, etal 1969+) mudpuppy retina.
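    A minimal sketch (my own illustration, not code from the book) of the shunting on-center off-surround dynamics that this retinal circuit exemplifies, assuming the standard form dx_i/dt = -A*x_i + (B - x_i)*I_i - x_i*sum_{k!=i} I_k; the constants A, B, the step size, and the input pattern are invented for illustration:

      import numpy as np

      # One-layer shunting on-center off-surround network: each cell is excited by
      # its own input (on-center) and inhibited by the sum of all other inputs
      # (off-surround). Activities stay bounded in [0, B] and total activity is
      # normalized, so the steady state reflects input ratios rather than sizes.
      def shunting_network(I, A=1.0, B=1.0, dt=0.01, steps=2000):
          x = np.zeros_like(I, dtype=float)
          for _ in range(steps):
              off_surround = I.sum() - I          # inhibition from all other inputs
              dx = -A * x + (B - x) * I - x * off_surround
              x += dt * dx                        # Euler integration toward steady state
          return x

      I = np.array([0.2, 1.0, 0.4, 0.1])          # hypothetical input pattern
      print(shunting_network(I))                  # largest activity at the strongest input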
  • image p100fig03.15 A fuzzy band of possible initial grouping orientations allows grouping to get started. Cooperative-competitive feedback via a hierarchical resolution of uncertainty chooses a sharp final grouping that has the most evidence to support it.
    || before choice: transient; after choice: equilibrium
  • image p108fig03.28 The watercolor illusion of Baingio Pinna 1987 can be explained using spatial competition between like-oriented boundary signals. This occurs at what I have called the First Competitive Stage. This is one stage in the brain's computation of hypercomplex cells, which are also called endstopped complex cells. Why the blue regions seem to bulge in depth may be explained using multiple-scale, depth-selective boundary webs. See the text for details.
    || Baingio Pinna. Watercolor illusion 1987. Filled-in regions bulge in depth. Multiple-scale, depth-selective boundary web!
  • image p146fig04.25 Networks of simple, complex, and hypercomplex cells can create end cuts as an example of hierarchical resolution of uncertainty. See the text for details.
    || How are end cuts created? (Grossberg 1984) Two stages of short-range competition. 1st stage: Simple cells -> complex cells -> hypercomplex - endstopped complex. First competitive stage- across position, same orientation; Second competitive stage- same position, across orientation. -> cooperation.
  • image p148fig04.26 End cuts are formed during neon color spreading in the same way that they are formed at line ends.
    || End cut during neon color spreading.
    FIRST competitive stage | SECOND competitive stage
    within orientation | across orientation
    across position | within position
    to generate end cuts.
  • image p149fig04.27 Bipole cells can form boundaries that interpolate end cuts, and use their cooperative-competitive interactions to choose the boundary groupings that have the most support from them.
    || Bipole cells: boundary completion. long-range cooperation & short-range inhibition: complete winning boundary groupings and suppress weaker boundaries.
  • image p161fig04.37 Kanizsa squares that form either collinearly to their inducers (left panel) or perpendicular to them (right panel) confirm predictions of the BCS boundary completion model.
    || Analog-sensitive boundary completion. contour strength vs Kanizsa square image. Increases with "support ratio" (Shipley, Kellman 1992). Inverted-U (Lesher, Mingolla 1993; cf Soriano, Spillmann, Bach 1994)(shifted gratings). p370h0.6 BCS = Boundary Contour System, FCS = Feature Contour System. p161c1h0.85 "... As predicted by the BCS, they found an Inverted-U in contour strength as a function of line density. ... This effect may be explained by the action of the short-range competition that occurs before the stage of long-range cooperative grouping by bipole cells (Figure 4.32). It is thus another example of the balance between cooperative and competitive mechanisms. ..."
  • image p198fig05.10 A competitive learning circuit learns to transform distributed feature patterns into selective responses of recognition categories.
    || Competitive learning and Self-Organized Maps (SOMs). input patterns -> feature level (F1) -> adaptive filter (T=ZS) ->
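    A minimal sketch (my own, with invented sizes, seed, and learning rate) of the competitive learning / self-organizing map loop sketched above: the bottom-up adaptive filter T = ZS feeds a winner-take-all category layer, and only the winning category's weights track the current input (instar learning):

      import numpy as np

      rng = np.random.default_rng(0)
      n_features, n_categories = 4, 3
      Z = rng.uniform(size=(n_features, n_categories))   # adaptive filter weights

      def learn(S, Z, rate=0.2):
          T = S @ Z                          # bottom-up filter signals T = ZS
          j = int(np.argmax(T))              # winner-take-all competition at the category level
          Z[:, j] += rate * (S - Z[:, j])    # instar learning: winner's weights move toward S
          return j

      for S in rng.uniform(size=(20, n_features)):       # a stream of hypothetical feature patterns
          learn(S, Z)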
  • image p205fig05.18 How catastrophic forgetting can occur in a competitive learning or self-organizing map model due to basic properties of competition and associative learning.
    || Learning from pattern sequences, practicing a sequence of spatial patterns can recode all of them! When is learning stable? Input patterns cannot be too dense relative to the number of categories; Either: not too many distributed inputs relative to the number of categories, or not too many input clusters
  • image p226fig05.35 I had shown in 1976 how a competitive learning or self-organizing map model could undergo catastrophic forgetting if the input environment was sufficiently dense and nonstationary, as illustrated by Figure 5.18. Later work with Gail Carpenter showed how, if the ART Matching Rule was shut off, repeating just four input patterns in the correct order could also cause catastrophic forgetting by causing superset recoding, as illustrated in Figure 5.36.
    || Code instability input sequences. D C A; B A; B C = ; |D|<|B|<|C|; where |E| is the number of features in the set E. Any set of input vectors that satisfy the above conditions will lead to unstable coding if they are periodically presented in the order ABCAD and the top-down ART Matching Rule is shut off.
  • image p287fig07.07 Persistence increases with distance between a target and a masking stimulus due to weakening of the spatial competition in the first competitive stage of hypercomplex cells.
    || Persistence data and simulations. Persistence increases with distance between a target and a masking stimulus (Farrell, Pavel, Sperling 1990). There is less spatial competition from the masker to the target when they are more distant, hence the target is more persistent.
  • image p364fig10.13 The bottom-up adaptive filter, intracortical grouping circuit, and intercortical top-down attentional circuit all use the same competitive decision circuit between layers 6 and 4, called the attention-preattention interface, with which to select the featural patterns that will be processed.
    || Bottom-up filters and intracortical grouping feedback use the same 6-to-4 decision circuit, LGN-> Vx[6,4,2/3]. competitive decision circuit, modulatory on-center off-surround network. Top-down intercortical attention also uses the same 6-to-4 decision circuit!
  • image p437fig12.33 Item and Order working memory models explain free recall data, as well as many other psychological and neurobiological data, by simulating how temporal series of events are stored as evolving spatial patterns of activity at content-addressable item categories. The categories with the largest activities are rehearsed first, and self-inhibit their activity as they do so in order to prevent them from being rehearsed perseveratively. The laws whereby the items are stored in working memory obey basic design principles concerning how list categories, or chunks, of sequences of stored items can be stably remembered.
    || Working memory models: item and order, or competitive queuing (Grossberg 1978; Houghton 1990; Page, Norris 1998). Event sequence in time stored as an evolving spatial pattern of activity. Primacy gradient of working memory activation stores correct temporal order at content-addressable cells. The maximally activated cell population is performed next when a rehearsal wave is turned on. Output signal from chosen cell population inhibits its own activity to prevent perseveration: inhibition of return. Iterate until entire sequence is performed.
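    A minimal sketch (my own; the primacy gradient values are invented) of the competitive-queuing readout just described: the most active stored item is performed when the rehearsal wave turns on, then self-inhibits (inhibition of return) so the next most active item can win:

      import numpy as np

      wm = np.array([0.9, 0.7, 0.5, 0.3])   # primacy gradient over stored items A, B, C, D
      items = ["A", "B", "C", "D"]
      recalled = []
      while wm.max() > 0:                   # rehearsal wave is on
          j = int(np.argmax(wm))            # most active item wins the competition
          recalled.append(items[j])
          wm[j] = 0.0                       # self-inhibition prevents perseveration
      print(recalled)                       # ['A', 'B', 'C', 'D'] - correct temporal order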
  • image p488fig13.12 (left column) How incentive motivational feedback amplifies activity of a sensory cortical cell population. (right column) A sensory cortical cell population whose activity is amplified by incentive motivational feedback can suppress the activities of less activated populations via self-normalizing recurrent competitive interactions.
    || Motivational feedback and blocking. (left) sensory input CS, STM activity without motivational feedback, STM activity with motivational feedback. (right) STM suppressed by competition, STM amplified by (+) feedback.
  • image p510fig13.39 Shunting competition and informational noise suppression in affective gated dipoles, plus back-propagating action potentials for teaching signals, enable the net normalized adaptive weights to be learned. They never saturate!
    || Learn net dipole output pattern. Opponent "decision" controls learning. Cf. competitive learning. Learning signal, opponent extinction.
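    A minimal sketch (my own; the constants A, B, C, the arousal level I, and the phasic input J are invented) of an opponent gated dipole with habituative transmitter gates, illustrating the antagonistic rebound in the OFF channel when the phasic ON input shuts off:

      # ON and OFF channels both receive tonic arousal I; the ON channel also gets a
      # phasic input J. Habituative transmitters z gate each channel's signal, so the
      # ON gate depletes more while J is on; at offset the less-depleted OFF channel
      # transiently wins, producing a rectified antagonistic rebound.
      A, B, C, I = 0.5, 1.0, 2.0, 0.5
      dt, T = 0.01, 4000
      z_on = z_off = B
      rebound = []
      for t in range(T):
          J = 1.0 if t < T // 2 else 0.0            # phasic ON input, then offset
          s_on, s_off = I + J, I
          z_on  += dt * (A * (B - z_on)  - C * s_on  * z_on)      # transmitter habituation
          z_off += dt * (A * (B - z_off) - C * s_off * z_off)
          rebound.append(max(s_off * z_off - s_on * z_on, 0.0))   # rectified OFF output
      print(max(rebound[T // 2:]) > 0.0)            # True: OFF rebound follows ON offset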
  • p001 Chapter 1 Overview - From Complementary Computing and Adaptive Resonance to conscious awareness
  • p250 Chapter 6 Conscious seeing and invariant recognition - Complementary cortical streams coordinate attention for seeing and recognition
  • p289 Chapter 8 How we see and recognize object motion - Visual form and motion perception obey complementary laws
  • p337 Chapter 9 Target tracking, navigation, and decision-making - Visual tracking and navigation obey complementary laws
  • image p029tbl01.01 Some pairs of complementary processing streams.
    ||
    visual boundary: interblob stream V1-V2-V4 | visual surface: blob stream V1-V2-V4
    visual boundary: interblob stream V1-V2-V4 | visual motion: magno stream V1-MT-MST
    WHAT stream | WHERE stream
    perception & recognition: inferotemporal & prefrontal areas | space & action: parietal & prefrontal areas
    object tracking: MT interbands & MSTv | optic flow navigation: MT+ bands & MSTd
    motor target position: motor & parietal cortex | volitional speed: basal ganglia
  • image p030tbl01.02 The What and Where cortical processing streams obey complementary laws. These laws enable the What stream to rapidly and stably learn invariant object categories without experiencing catastrophic forgetting, while the Where stream learns labile spatial and action representations to control actions that are aimed towards these objects.
    ||
    WHAT | WHERE
    spatially-invariant object learning and recognition | spatially-variant reaching and movement
    fast learning without catastrophic forgetting | continually update sensory-motor maps and gains
    IT InferoTemporal Cortex | PPC Posterior Parietal Cortex
    | What | Where
    matching | excitatory | inhibitory
    learning | match | mismatch
  • image p087fig03.01 A macrocircuit of key visual processes (in green) and the cortical areas in which they primarily occur (in red), from the retina to the Prefrontal Cortex (PFC), including both the What and Where cortical streams. The [bottom-up, horizontal, and top-down] interactions help each of these processes to overcome computationally complementary processing deficiencies that they would experience without them, and also to read-out top-down expectations that help to stabilize learning while they focus attention on salient objects and positions.
    || Emerging unified theory of visual intelligence. [What, Where] streams. Bottom-up and top-down interactions overcome COMPLEMENTARY processing deficiencies.
  • image p094fig03.07 The processes of boundary completion and surface filling-in are computationally complementary.
    ||
    Boundary completion | Surface filling-in
    outward | inward
    oriented | unoriented
    insensitive to direction of contrast | sensitive to direction-of-contrast
  • image p174fig04.51 The same feedback circuit that ensures complementary consistency between boundaries and surfaces also, automatically, initiates figure-ground separation! See the text for details.
    || before feedback: [V1 -> V2 pale stripe -> V2 thin stripe, "attention pointers" (Cavanagh etal 2010)]; after feedback: [V1 + V2 thin stripe] -> V2 pale stripe via contrast sensitive [excitation, inhibition] for depths [1, 2] -> object recognition
  • image p176fig04.53 The on-center off-surround network within position and across depth helps to explain why brighter Kanizsa squares look closer.
    || inhibition vs. depth. p176c1h0.25 "... to qualitatively understand how this example of proximity-luminance covariance works. It follows directly from the boundary pruning by surface contour feedback signals (Figure 4.51) that achieves complementary consistency and initiates figure-ground perception. ...". p176c1h0.45 "... these inhibitory signals are part of an off-surround network whose strength decreases as the depth difference increases between the surface that generates the signal and its recipient boundaries. ...". p176c1h0.8 "... Within FACADE theory, the perceived depth of a surface is controlled by the boundaries that act as its filling-in generators and barriers (Figure 3.22), since these boundaries select the depth-selective FIDOs within which filling-in can occur, and thereby achieve surface capture. These boundaries, in turn, are themselves strengthened after surface-to-boundary contour feedback eliminates redundant boundaries that cannot support successful filling-in (Figure 4.51). These surface contour feedback signals have precisely the properties that are needed to explain why brighter Kanizsa squares look closer! ..."
  • image p211fig05.20 The PN and N200 event-related potentials are computationally complementary events that are computed within the attentional and orienting systems.
    || PN and N200 are complementary waves. PN [top-down, conditionable, specific] match; N200 [bottom-up, unconditionable, nonspecific] mismatch
  • image p267fig06.14 Feedback from object surfaces to object boundaries uses surface contours. This feedback assures complementary consistency and enables figure-ground separation. A corollary discharge of the surface contours can be used to compute salient object feature positions.
    || Perceptual consistency and figure-ground separation.
  • image p314fig08.34 The VISTARS model for visually-based spatial navigation. It uses the Motion BCS as a front end and feeds it output signals into two computationally complementary cortical processing streams for computing optic flow and target tracking information.
    || VISTARS navigation model (Browning, Grossberg, Mingolla 2009). Use FORMOTION model as front end for higher level navigational circuits: input natural image sequences -> estimate heading (MT+)-MSTd -> additive processing -> estimate object position (MT-)-MSTv direction and speed subtractive processing -> Complementary Computing. [optic flow navigation, object tracking]
  • image p523fig14.03 (a) The MOTIVATOR neural model generalizes CogEM by also including the basal ganglia. It can hereby explain and simulate complementary functions of the amygdala and basal ganglia (SNc) during conditioning and learned performance. The basal ganglia generate Now Print signals in response to unexpected rewards. These signals modulate learning of new associations in many brain regions. The amygdala supports motivated attention to trigger actions that are expected to occur in response to conditioned or unconditioned stimuli. Object Categories represent visual or gustatory inputs in anterior inferotemporal (ITA) and rhinal (RHIN) cortices, respectively. Value Categories represent the value of anticipated outcomes on the basis of hunger and satiety inputs, in amygdala (AMYG) and lateral hypothalamus (LH). Object-Value Categories resolve the value of competing perceptual stimuli in medial (MORB) and lateral (ORB) orbitofrontal cortex. The Reward Expectation Filter detects the omission or delivery of rewards using a circuit that spans ventral striatum (VS), ventral pallidum (VP), striosomal delay (SD) cells in the ventral striatum, the pedunculopontine nucleus (PPTN) and midbrain dopaminergic neurons of the substantia nigra pars compacta/ventral tegmental area (SNc/VTA). The circuit that processes CS-related visual information (ITA, AMYG, ORB) operates in parallel with a circuit that processes US-related visual and gustatory information (RHIN, AMYG, MORB). (b) Reciprocal adaptive connections between hypothalamus and amygdala enable amygdala cells to become learned value categories. The bottom region represents hypothalamic cells, which receive converging taste and metabolite inputs whereby they become taste-drive cells. Bottom-up signals from activity patterns across these cells activate competing value categories, or US Value Representations, in the amygdala. A winning value category learns to respond selectively to specific combinations of taste-drive activity patterns and sends adaptive top-down priming signals back to the taste-drive cells that activated it. CS-activated conditioned reinforcer signals are also associatively linked to value categories. Adaptive connections end in (approximately) hemidiscs. See the text for details.
    ||
  • image p548fig15.16 Homologous recognition learning and reinforcement learning macrocircuits enable adaptively timed conditioning in the reinforcement learning circuit to increase inhibition of the orienting system at times when a mismatch in the recognition system would have reduced inhibition of it.
    || Homolog between ART and CogEM model, complementary systems. [Recognition, Reinforcement] learning vs [Attentional, Orienting] system. Reinforcement: timing, drive representation.
  • p353 Chapter 10 Laminar computing by cerebral cortex - Towards a unified theory of biological and artificial intelligence
  • image p030fig01.20 A schematic cross-section of a slice of laminar neocortex whose cells are organized in a characteristic way in six layers, which themselves may be organized into distinct sublaminae. The computational paradigm of Laminar Computing attempts to show how different parts of neocortex can represent and control very different kinds of behavior - including vision, speech, and cognition - using specializations of the same canonical laminar cortical design.
    || Projection fibres: Cortico[spinal, bulbar, pontine, striate, reticular, etc]; Thalamocortical fibres; Diffuse cortical afferent fibres: [nonspecific thalamocortical, Cholinergic, Monoaminergic]; Corticocortical efferents; Projection [cell, fibre]; Corticocortical efferent terminals.
  • image p141fig04.19 A laminar cortical circuit for computing binocular disparities in layer 3B of V1 at binocular simple cells. These cells add positionally disparate inputs from like-polarized monocular simple cells (layer 4 of V1). Binocular simple cells at each position that are sensitive to opposite polarities then add their outputs at complex cells in layer 2/3. Chapter 10 will explain how these laminar circuits work in greater detail.
    || Laminar cortical circuit for complex cells. [left, right] eye.
    V1 layer | description
    2/3A | complex cells
    3B | binocular simple cells
    4 | monocular simple cells
  • image p163fig04.39 A schematic of the LAMINART model that explains key aspects of laminar visual cortical anatomy and dynamics. LGN -> V1 [6, 4, 2/3] -> V2 [6, 4, 2/3]
    || p163c1h0.6 "... The first article about laminar computing ... proposed how the laminar cortical model could process 2D pictures using bottom-up filtering and horizontal bipole grouping interactions (Grossberg, Mingolla, Ross 1997). In 1999, I was able to extend the model to also include top-down circuits for expectation and attention (Grossberg 1999)(right panel). Such a synthesis of laminar bottom-up, horizontal, and top-down circuits is characteristic of the cerebral cortex (left panel). I called it LAMINART because it began to show how properties of Adaptive Resonance Theory, or ART, notably the ART prediction about how top-down expectations and attention work, are realized by identical cortical cells and circuits. You can immediately see from the schematic laminar circuit diagram ... (right panel) that circuits in V2 seem to repeat circuits in V1, albeit with a larger spatial scale, despite the fact that V1 and V2 carry out different functions. How this anatomical similarity can coexist with functional diversity will be clarified in subsequent sections and chapters. It enables different kinds of biological intelligence to communicate seamlessly while carrying out their different psychological functions. ..."
  • image p174fig04.52 An example of how the 3D LAMINART model can transform the two monocular images of the random dot stereogram in the top row into the three depth-separated surface representations in the bottom row.
    || Stereogram surface percepts: surface lightnesses are segregated in depth (Fang and Grossberg 2009). [left, right] inputs, [far, fixation, near] planes. Contrast algorithms that just compute disparity matches and let computer code build the surface eg (Marr, Poggio, etal 1974).
  • image p182fig04.58 LAMINART model processing stages that are sufficient to explain many percepts of transparency, including those summarized in Figure 4.57.
    || [left, right] eye, [LGN, V1 [6, 4, 3B, 2/3 A], V2 [4, 2/3]], [mo, bi]nocular cart [simple, complex] cells, [excita, inhibi]tory cart [connection, cell]s.
  • image p230fig05.38 The Synchronous Matching ART, or SMART, model includes spiking neurons in a laminar cortical hierarchy. I developed SMART with my PhD student Massimiliano Versace. By unlumping LAMINART to include spiking neurons, finer details of neurodynamics, such as the existence of faster gamma oscillations during good enough matches, and slower beta oscillations during bad enough mismatches, could be shown as emergent properties of network interactions.
    || Second order thalamus -> specific thalamic nucleus -> Thalamic reticulate nucleus -> neocortical laminar circuit [6ll, 6l, 5, 2/3, 1] -> Higher order cortex. Similar for First order thalamus -> First order cortex, with interconnection to Second order, nonspecific thalamic nucleus
  • image p231fig05.39 The SMART hypothesis testing and learning cycle predicts that vigilance increases when a mismatch in subcortical regions like the nonspecific thalamus activates the nucleus basalis of Meynert which, in turn, broadcasts a burst of the neurotransmitter acetylcholine, or ACh, to deeper cortical layers. Due to the way in which LAMINART proposes that cortical matching and mismatching occurs, this ACh burst can increase vigilance and thereby trigger a memory search. See the text for details.
    || [BU input, [, non]specific thalamic nucleus, thalamic reticulate nucleus, neocortical laminar circuit] cart [Arousal, Reset, Search, Vigilance]
  • image p232fig05.41 (a)-(c). The sequence of interlaminar events that SMART predicts during a mismatch reset. (d) Some of the compatible neurophysiological data.
    || Mismatch causes layer 5 dendritic spikes that trigger reset. (a) Arousal causes increase in nonspecific thalamic nuclei firing rate and layer 5 dendritic and later somatic spikes (Larkum and Zhu 2002, Williams and Stuart 1999) (b) Layer 5 spikes reach layer 4 via layer 6i and inhibitory neurons (Lund and Boothe 1975, Gilbert and Wiesel 1979) (c) habituative neurotransmitters in layer 6i shift the balance of active cells in layer 4 (Grossberg 1972, 1976) (d) Dendritic stimulation fires layer 5 (Larkum and Zhu 2002) stimulation apical dendrites of nonspecific thalamus
  • image p246fig05.48 Microcircuits of the LAMINART model that I developed with Rajeev Raizada. See the text for details of how they integrate bottom-up adaptive filtering, horizontal bipole grouping, and top-down attentional matching that satisfied the ART Matching Rule.
    ||
  • image p248fig05.49 This circuit of the LAMINART model helps to explain properties of Up and Down states during slow wave sleep, and how disturbances in ACh dynamics can disrupt them.
    ||
  • image p356fig10.03 Laminar computing achieves at least three basic properties of visual processing that have analogs in all biologically intelligent behaviors. These properties may be found in all cortical circuits in specialized form.
    || What does Laminar Computing achieve? 1. Self-stabilizing development and learning; 2. Seamless fusion of a) pre-attentive automatic bottom-up processing, b) attentive task-selective top-down processing; 3. Analog coherence: Solution of Binding Problem for perceptual grouping without loss of analog sensitivity. Even the earliest visual cortical stages carry out active adaptive information processing: [learn, group, attention]ing
  • image p357fig10.04 Laminar Computing achieves its properties by computing in a new way that synthesizes the best properties of feedforward and feedback interactions, analog and digital computations, and preattentive and attentive learning. The property of analog coherence enables coherent groupings and decisions to form without losing sensitivity to the amount of evidence that supports them.
    || Laminar Computing: a new way to compute. 1. Feedforward and feedback: a) Fast feedforward processing when data are unambiguous (eg Thorpe etal), b) slower feedback chooses among ambiguous alternatives [self-normalizing property, real-time probability theory], c) A self-organizing system that trades certainty against speed: Goes beyond Bayesian models! 2. Analog and Digital: Analog Coherence combines the stability of digital with the sensitivity of analog. 3. Preattentive and Attentive Learning: Reconciles the differences of (eg) Helmholtz and Kanizsa, "A preattentive grouping is its own 'attentional' prime"
  • image p362fig10.11 Feedback between layer 2/3 to the layer 6-to-4-to-2/3 feedback loop chooses the strongest grouping in cases where there is more than one. If only one grouping exists, then the circuit can function very quickly in a feedforward manner. When multiple groupings exist, the cortex "runs as fast as it can" to select the one with the most evidence to support it using the self-normalizing inhibition in the layer 6-to-4 off-surround.
    || How is the final grouping selected? Folded feedback LGN-> 6-> 4-> 2/3. 1. Layer 2/3 groupings feed back into 6-to-4 on-center off-surround: a) direct layer 2/3 -to-6 path; b) can also go via layer 5 (Blasdel etal 1985; Kisvarday etal 1989). 2. Strongest grouping enhanced by its on-center. 3. Inputs to weaker groupings suppressed by off-surround. 4. Interlaminar feedback creates functional columns. Activities of conflicting groupings are reduced by self-normalizing inhibition, slowing processing; intracortical feedback selects and contrast-enhances the winning grouping, speeding processing.
  • image p363fig10.12 The same laminar circuit design repeats in V1 and V2, albeit with specializations that include longer horizontal grouping axons and figure-ground separation interactions.
    || V2 repeats V1 circuitry at larger spatial scale, LGN-> V1[6,4,2/3]-> V2[6,4,2/3]. V2 layer 2/3 horizontal axons longer-range than in V1 (Amir etal 1993). Therefore, longer-range groupings can form in V2 (Von der Heydt etal 1984)
  • image p375fig11.07 The 3D LAMINART model uses both monocular and binocular simple cells to binocularly fuse like image contrasts. The remainder of the model generates 3D boundary and surface representations of multiple kinds of experiments as well as of natural scenes.
    || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A], V2 thin stripe [4->2/3A], V4]. V1 blob [V1-4 monocular, V1 interior binocular] simple cells. [complex, simple, inhibitory] cells, on-center off-surround
  • image p376fig11.10 The 3D LAMINART model shows how the disparity filter can be integrated into the circuit that completes 3D boundary representations using bipole grouping cells. It also explains how surface contours can strengthen boundaries that succeed in generating closed filling-in domains.
    || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A] surface contour, V2 thin stripe (monocular surface) [4->2/3A], V2 interior [disynaptic inhibitory interneurons, bipole grouping cells, disparity filter, V4 binocular surface]. [complex, simple, inhibitory] cells, on-center off-surround
  • image p378fig11.12 How monocular and binocular information are combined in V1 and V2 in the 3D LAMINART model.
    || Model utilizes monocular information. [left, right] eye cart V1-[4 monocular simple, 3B binocular simple, complex2/3A [mo,bi]nocular] cells, V2-4 binocular complex cells. black = monocular cells, blue = binocular cells. In V2, monocular inputs add to binocular inputs along the line of sight and contribute to depth perception.
  • image p379fig11.13 How the 3D LAMINART model explains DaVinci stereopsis. All the stages of boundary and surface formation are color coded to clarify their explanation. Although each mechanism is very simple, when all of them act together, the correct depthful surface representation is generated. See the text for details.
    || DaVinci stereopsis (Nakayama, Shimojo 1990). An emergent property of the previous simple mechanisms working together. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] pair [Binocular match: boundaries of thick bar -> Add monocular boundaries along lines-of-sight -> Line-of-sight inhibition kills weaker vertical boundaries -> 3D surface percept not just a disparity match!] pair [Binocular match: right edge of thin and thick bars -> Strongest boundaries: binocular and monocular boundaries add -> Vertical boundaries from monocular left edge of thin bar survive -> Filling-in contained by connected boundaries]. cart [very near, near, fixation plane, far, very far]
  • image p380fig11.14 The model explanation of DaVinci stereopsis when the input stimuli have opposite contrast polarities.
    || Polarity-reversed Da Vinci stereopsis (Nakayama, Shimojo 1990). Same explanation! (... as Figure 11.13 ...) [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
  • image p395fig11.34 A comparison of the properties of other rivalry models with those of the 3D LAMINART model (surrounded by red border). Significantly, only 3D LAMINART explains both stable vision and rivalry (green border).
    || Comparison of rivalry models
  • image p401fig11.41 The 3D LAMINART model proposes how angle cells and disparity-gradient cells interact through learning to generate 3D representations of slanted objects.
    || 3D LAMINART model. [LGN, V1, V2, V4] Four key additions: 1. Angle cells - tuned to various angles; 2. Disparity-gradient cells - tuned to disparity gradients in the image; 3. weights from [angle to disparity-gradient] cells - learned while viewing 3D image; Colinear grouping between [angle to disparity-gradient] cells - disambiguates ambiguous groupings.
  • image p402fig11.44 3D scenic reconstruction of the image in Figure 11.43 by the 3D LAMINART model.
    || Disparity [5, 6, 8, 10, 11, 14]: images of objects in common depth planes
  • image p436fig12.30 The conscious ARTWORD, or cARTWORD, laminar cortical speech model simulates how future context can disambiguate noisy past speech sounds in such a way that the completed percept is consciously heard to proceed from past to future as a feature-item-list resonant wave propagates through time.
    || cARTWORD: Laminar cortical model macrocircuit (Grossberg, Kazerounian 2011) Simulates PHONEMIC RESTORATION: Cognitive Working Memory (processed item sequences) - [Excitatory-> inhibitory-> habituative-> adaptive filter-> adaptive filter-> adaptive filter with depletable synapse-> Acoustic [item, feature]
  • image p444fig12.42 The LIST PARSE laminar cortical model of working memory and list chunking that I published with Lance Pearson in 2008 simulated the Averbeck etal data in Figure 12.41, as in the left column of the figure. It also simulated cognitive data about working memory storage by human subjects. See the text for details.
    || LIST PARSE: Laminar cortical model of working memory and list chunking (Grossberg, Pearson 2008). Simulates data about: [immediate, delayed, continuous] distractor free recall; immediate serial recall; and variable-speed sequential performance of motor acts. [velocity, acceleration] vs time (ms) from recall cue.
  • image p445fig12.43 The LIST PARSE laminar cortical Cognitive Working Memory circuit, that is proposed to occur in ventrolateral prefrontal cortex, is homologous to the LAMINART circuit that models aspects of how visual cortex sees. The Motor Working Memory, VITE Trajectory Generator, and Variable-Rate Volitional Control circuits model how other brain regions, including dorsolateral prefrontal cortex, motor cortex, cerebellum, and basal ganglia, interact with the Cognitive Working Memory to control working memory storage and variable-rate performance of item sequences.
    || List parse circuit diagram. Connectivity convention. sequence chunks [<- BU filter, TD expectation ->] working memory. Working memory and sequence chunking circuit is homologous to visual LAMINART circuit!
  • image p498fig13.22 (left column, top row) Adaptive filtering and conditioned arousal are both needed to regulate what cues can learn to activate particular space-time patterns. These developments lead inexorably to basic cognitive abilities, as embodied in the 3D LAMINART models for 3D vision and figure-ground perception (Chapter 11) and the 3D ARTSCAN SEARCH model for invariant object learning, recognition, and 3D search (Chapter 6). (right column, top row) Conditioned arousal enables only emotionally important cues to activate a motivationally relevant space-time pattern. (bottom row) Conditioned arousal and drive representations arise naturally from the unlumping of avalanche circuits to make them selective to motivationally important cues. The MOTIVATOR model is a natural outcome of this unlumping process (this chapter).
    || (top) Adaptive filtering and Conditioned arousal. Towards Cognition: need to filter inputs to the command cell. Towards Emotion: important signals turn arousal ON and OFF. (bottom) Conditioned arousal and Drive representations. Competition between conditioned arousal sources at drive representations, eg amygdala.
  • image p038fig01.25 The ART Matching Rule stabilizes real time learning using a [top-down, modulatory on-center, off-surround] network. Object attention is realized by such a network. See text for additional discussion.
    || ART Matching Rule [volition, categories, features]. [one, two] against one.
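    A minimal sketch (my own; the gain on the off-surround is invented) of the ART Matching Rule as a top-down, modulatory on-center, off-surround interaction: a feature supported by both bottom-up input and the top-down expectation wins "two-against-one", an unexpected bottom-up feature is suppressed by the nonspecific top-down off-surround, and the expectation alone can only prime, not fire, its features:

      import numpy as np

      def art_match(bottom_up, top_down, surround_gain=1.0):
          # net drive = bottom-up input + modulatory top-down on-center
          #             - nonspecific top-down off-surround
          drive = bottom_up + bottom_up * top_down - surround_gain * top_down.max()
          return np.maximum(drive, 0.0)             # rectified feature activities

      bu = np.array([1.0, 1.0, 0.0])                # bottom-up feature pattern
      td = np.array([1.0, 0.0, 1.0])                # top-down expectation
      print(art_match(bu, td))                      # only the matched feature (index 0) stays active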
  • image p163fig04.39 A schematic of the LAMINART model that explains key aspects of laminar visual cortical anatomy and dynamics. LGN -> V1 [6, 4, 2/3] -> V2 [6, 4, 2/3]
    || p163c1h0.6 "... The first article about laminar computing ... proposed how the laminar cortical model could process 2D pictures using bottom-up filtering and horizontal bipole grouping interactions (Grossberg, Mingolla, Ross 1997). In 1999, I was able to extend the model to also include top-down circuits for expectation and attention (Grossberg 1999)(right panel). Such a synthesis of laminar bottom-up, horizontal, and top-down circuits is characteristic of the cerebral cortex (left panel). I called it LAMINART because it began to show how properties of Adaptive Resonance Theory, or ART, notably the ART prediction about how top-down expectations and attention work, are realized by identical cortical cells and circuits. You can immediately see from the schematic laminar circuit diagram ... (right panel) that circuits in V2 seem to repeat circuits in V1, albeit with a larger spatial scale, despite the fact that V1 and V2 carry out different functions. How this anatomical similarity can coexist with functional diversity will be clarified in subsequent sections and chapters. It enables different kinds of biological intelligence to communicate seamlessly while carrying out their different psychological functions. ..."
  • image p192fig05.06 Bottom-up and top-down circuits between the LGN and cortical area V1. The top-down circuits obey the ART Matching Rule for matching with bottom-up input patterns and focussing attention on expected critical features.
    || Model V1-LGN circuits, version [1, 2]. retina -> LGN relay cells -> interneurons -> cortex [simple, endstopped] cells -> cortex complex cells
  • image p200fig05.13 Instar and outstar learning are often used to learn the adaptive weights in the bottom-up filters and top-down expectations that occur in ART. The ART Matching Rule for object attention enables top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features.
    || Expectations focus attention: feature pattern (STM), Bottom-Up adaptive filter (LTM), Category (STM), competition, Top-Down expectation (LTM); ART Matching Rule: STM before top-down matching, STM after top-down matching (attention!)
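    A minimal sketch (my own; shapes, rates, and the example patterns are invented) of gated instar and outstar learning as used for the bottom-up filters and top-down expectations above; both use the same gated steepest-descent form and differ only in the direction of signal flow:

      import numpy as np

      def gated_learn(W, gate, pattern, rate=0.1):
          # Weights attached to an active node track the sampled activity pattern;
          # weights of inactive nodes do not change.
          return W + rate * gate[:, None] * (pattern[None, :] - W)

      x = np.array([1.0, 0.5, 0.0])   # feature pattern across F1 (STM)
      y = np.array([0.0, 1.0])        # category activities across F2; category 1 is active

      W_bu = np.zeros((2, 3))         # instar weights: bottom-up filter into each category
      W_td = np.zeros((2, 3))         # outstar weights: top-down expectation read out by each category

      # Instar: learning gated by the postsynaptic category; its incoming weights track x.
      W_bu = gated_learn(W_bu, y, x)
      # Outstar: learning gated by the presynaptic (sampling) category; its outgoing weights track x.
      W_td = gated_learn(W_td, y, x)
      print(W_bu[1], W_td[1])         # only the active category's weights have moved toward x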
  • image p207fig05.19 The ART hypothesis testing and learning cycle. See the text for details about how the attentional system and orienting system interact in order to incorporate learning of novel categories into the corpus of already learned categories without causing catastrophic forgetting.
    ||
  • image p211fig05.21 Sequences of P120, N200, and P300 event-related potentials occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
    || ERP support for mismatch-mediated reset: event-related potentials: human scalp potentials. ART predicted correlated sequences of P120-N200-P300 Event Related Potentials during oddball learning. P120 mismatch; N200 arousal/novelty; P300 STM reset. Confirmed in (Banquet and Grossberg 1987)
  • image p215fig05.28 How a mismatch between bottom-up and top-down input patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level.
    || Mismatch -> inhibition -> arousal -> reset. BU input orienting arousal, BU+TD mismatch arousal and reset. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
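    A minimal sketch (my own; binary feature coding as in ART 1 is assumed, and the vigilance value is invented) of the mismatch test just described: the active category is reset and a memory search begins when the matched fraction of the bottom-up input falls below vigilance:

      import numpy as np

      def mismatch_reset(I, prototype, vigilance=0.8):
          matched = np.minimum(I, prototype)              # features surviving the ART Matching Rule
          match_ratio = matched.sum() / max(I.sum(), 1)   # degree of match relative to the input
          return match_ratio < vigilance                  # True -> arousal burst resets the category

      I = np.array([1, 1, 1, 0])                          # bottom-up input pattern
      prototype = np.array([1, 0, 0, 1])                  # top-down expectation of the active category
      print(mismatch_reset(I, prototype))                 # True: poor match triggers search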
  • image p221fig05.31 A system like Fuzzy ARTMAP can learn to associate learned categories in one ART network with learned categories in a second ART network. Because both bottom-up and top-down interactions occur in both networks, a bottom-up input pattern to the first ART network can learn to generate a top-down output pattern from the second ART network.
    || Fuzzy ARTMAP. Match tracking realizes minimax learning principle: vigilance increases to just above the match ratio of prototype / exemplar, thereby triggering search
  • image p226fig05.35 I had shown in 1976 how a competitive learning or self-organizing map model could undergo catastrophic forgetting if the input environment was sufficiently dense and nonstationary, as illustrated by Figure 5.18. Later work with Gail Carpenter showed how, if the ART Matching Rule was shut off, repeating just four input patterns in the correct order could also cause catastrophic forgetting by causing superset recoding, as illustrated in Figure 5.36.
    || Code instability input sequences. D C A; B A; B C = ; |D|<|B|<|C|; where |E| is the number of features in the set E. Any set of input vectors that satisfy the above conditions will lead to unstable coding if they are periodically presented in the order ABCAD and the top-down ART Matching Rule is shut off.
  • image p226fig05.36 Column (a) shows catastrophic forgetting when the ART Matching Rule is not operative. It is due to superset recoding. Column (b) shows how category learning quickly stabilizes when the ART Matching Rule is restored.
    || Stable and unstable learning, superset recoding
  • image p228fig05.37 A macrocircuit of the neurotrophic Spectrally Timed ART, or nSTART, model. I developed nSTART with my PhD student Daniel Franklin. It proposes how adaptively timed learning in the hippocampus, bolstered by Brain Derived Neurotrophic Factor, or BDNF, helps to ensure normal memory consolidation.
    || habituative gates, CS, US, Thalamus (sensory cortex, category learning, conditioned reinforcer learning, adaptively timed learning and BDNF), Amygdala (incentive motivation learning), Hippocampus (BDNF), Prefrontal Cortex (attention), Pontine nuclei, Cerebellum (adaptively timed motor learning)
  • image p230fig05.38 The Synchronous Matching ART, or SMART, model includes spiking neurons in a laminar cortical hierarchy. I developed SMART with my PhD student Massimiliano Versace. By unlumping LAMINART to include spiking neurons, finer details of neurodynamics, such as the existence of faster gamma oscillations during good enough matches, and slower beta oscillations during bad enough mismatches, could be shown as emergent properties of network interactions.
    || Second order thalamus -> specific thalamic nucleus -> Thalamic reticulate nucleus -> neocortical laminar circuit [6ll, 6l, 5, 2/3, 1] -> Higher order cortex. Similar for First order thalamus -> First order cortex, with interconnection to Second order, nonspecific thalamic nucleus
  • image p240fig05.44 When an algebraic exemplar model is realized using only local computations, it starts looking like an ART prototype model.
    || How does the model know which exemplars are in category A? BU-TD learning. How does a NOVEL test item access category A?
  • image p241fig05.45 The 5-4 category structure is one example of how an ART network learns the same kinds of categories as human learners. See the text for details.
    || 5-4 Category structure. A1-A5: closer to the (1 1 1 1) prototype; B1-B4: closer to the (0 0 0 0) prototype
  • image p246fig05.48 Microcircuits of the LAMINART model that I developed with Rajeev Raizada. See the text for details of how they integrate bottom-up adaptive filtering, horizontal bipole grouping, and top-down attentional matching that satisfied the ART Matching Rule.
    ||
  • image p254fig06.03 These interactions of the ARTSCAN Search model enable it to learn to recognize and name invariant object categories. Interactions between spatial attention in the Where cortical stream, via surface-shroud resonances, and object attention in the What cortical stream, that obeys the ART Matching Rule, coordinate these learning, recognition, and naming processes.
    || Retinal image -> On & OFF cell contrast normalization (retina/LGN) -> polarity [, in]sensitive contrast enhancement (V1) -> object [boundary (V2), surface (V2/V4)] -> surface contour (V2); What stream categories: volition control (BG). object boundary (V2) <-> view (ITp) <-> view integrator (ITp) <-> object (ITa) <-> [object-value (ORB), value (Amyg)] <-> name (PFC)
  • image p315fig08.35 The output signals from the directional grouping network obey the ART Matching Rule. They thereby select consistent motion directional signals while suppressing inconsistent ones, and do not distort what the spared cells code. The aperture problem is hereby solved by the same mechanism that dynamically stabilizes the learning of directional grouping cells.
    || How to select correct direction and preserve speed estimates? Prediction: Feedback from MSTv to MT- obeys ART Matching Rule; Top-down, modulatory on-center, off-surround network (Grossberg 1976, 1980; Carpenter, Grossberg 1987, 1991); Explains how directional grouping network can stably develop and how top-down directional attention can work. (Cavanagh 1992; Goner etal 1986; Sekuler, Ball 1977; Stelmach etal 1994). Directional grouping network (MSTv) <-> Directional long-range filter (MT). Modulatory on-center selects chosen direction and preserves speed. Off-surround inhibits incompatible directions.
  • image p316fig08.36 How the directional grouping network, notably properties of the ART Matching Rule, enables a small set of amplified feature tracking signals at the ends of a line to select consistent directions in the line interior, while suppressing inconsistent directions.
    || Motion capture by directional grouping feedback. Directional grouping network (MSTv) <-> Directional long-range filter (MT). It takes longer to capture ambiguous motion signals in the line interior as the length of the line increases cf (Castet etal 1993)
  • image p354fig10.01 The laminar cortical circuit that realizes how we pay attention to an object sends signals from layer 6 of a higher cortical level to layer 6 of a lower cortical level and then back up to layer 4. This "folded feedback" circuit realizes a top-down, modulatory on-center, off-surround circuit that realizes the ART Matching Rule.
    || Top-down attention and folded feedback. Attentional signals also feed back into 6-to-4 on-center off-surround. 1-to-5-to-6 feedback path: Macaque (Lund, Booth 1975) cat (Gilbert, Wiesel 1979). V2-to-V1 feedback is on-center off-surround and affects layer 6 of V1 the most (Bullier etal 1996; Sandell, Schiller 1982). Attended stimuli enhanced, ignored stimuli suppressed. This circuit supports the predicted ART Matching Rule! [LGN, V[1,2][6->1]]
  • image p360fig10.08 The bottom-up on-center off-surround from LGN-to-6-to-4 has a modulatory on-center because of its role in realizing the ART Matching Rule and, with it, the ability of the cortex to dynamically stabilize its learned memories.
    || Modulation of priming by 6-to-4 on-center (Stratford etal 1996; Callaway 1998). On-center 6-to-4 excitation is inhibited down to being modulatory (priming, subthreshold). On-center 6-to-4 excitation cannot activate layer 4 on its own. Clarifies need for direct path. Prediction: plays key role in stable grouping, development and learning. ART Matching Rule!
  • image p419fig12.17 The auditory continuity illusion illustrates the ART Matching Rule at the level of auditory streaming. Its "backwards in time" effect of future context on past conscious perception is a signature of resonance.
    || Auditory continuity illusion. input, percept. Backwards in time - How does a future sound let past sound continue through noise? Resonance! - It takes a while to kick in. After it starts, a future tone can maintain it much more quickly. Why does this not happen if there is no noise? - ART Matching Rule! TD harmonic filter is modulatory without BU input. It cannot create something out of nothing.
  • image p420fig12.18 The ARTSTREAM model explains and simulates the auditory continuity illusion as an example of a spectral-pitch resonance. Interactions of ART Matching Rule and asymmetric competition mechanisms in cortical strip maps explain how the tone selects the consistent frequency from the noise in its own stream while separating the rest of the noise into another stream.
    || ARTSTREAM model (Grossberg 1999; Grossberg, Govindarajan, Wyse, Cohen 2004). SPINET. Frequency and pitch strips. Bottom Up (BU) harmonic sieve. Top Down (TD) harmonic ART matching. Exclusive allocation. Learn pitch categories based on early harmonic processing. A stream is a Spectral-Pitch Resonance!
  • image p473fig12.69 Error rate and mean reaction time (RT) data from the lexical decision experiments of (Schvaneveldt, McDonald 1981). ART Matching Rule properties explain these data in (Grossberg, Stone 1986).
    || (left) Error rate vs type of prime [R, N, U], [non,] word. (right) Mean RT (msec) vs type of prime [R, N, U], [non,] word.
  • image p483fig13.02 The object-value categories in the orbitofrontal cortex require converging specific inputs from the sensory cortex and nonspecific incentive motivational inputs from the amygdala in order to fire. When the orbitofrontal cortex fires, it can deliver top-down ART Matching Rule priming signals to the sensory cortical area by which it was activated, thereby helping to choose the active recognition categories there that have the most emotional support, while suppressing others, leading to attentional blocking of irrelevant cues.
    || Cognitive-Emotional-Motor (CogEM) model. Drive-> amygdala incentive motivational learning-> orbitofrontal cortex- need converging cue and incentive inputs to fire <-> sensory cortex- conditioned reinforcer learning-> amygdala. CS-> sensory cortex. Motivated attention closes the cognitive-emotional feedback loop, focuses on relevant cues, and causes blocking of irrelevant cues.
  • image p548fig15.16 Homologous recognition learning and reinforcement learning macrocircuits enable adaptively timed conditioning in the reinforcement learning circuit to increase inhibition of the orienting system at times when a mismatch in the recognition system would have reduced inhibition of it.
    || Homolog between ART and CogEM model, complementary systems. [Recognition, Reinforcement] learning vs [Attentional, Orienting] system. Reinforcement: timing, drive representation.
  • image p600fig16.36 The entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories. See the text for details.
    || Entorhinal-hippocampal interactions as an ART system. Hippocampal place cells as spatial categories. Angular head velocity-> head direction cells-> stripe cells- small scale 1D periodic code (ECIII) SOM-> grid cells- small scale 2D periodic code (ECII) SOM-> place cells- larger scale spatial map (DG/CA3)-> place cells (CA1)-> conjunctive-coding cells (EC V/VI)-> top-down feedback back to stripe cells- small scale 1D periodic code (ECIII). stripe cells- small scale 1D periodic code (ECIII)-> place cells (CA1).
  • image p613fig16.43 The main visual form and motion processing stream mechanisms of SOVEREIGN, many of them described at length in previous chapters.
    || Render 3-D scene (R3DS), figure-ground separation (FGS), log-polar transform (LPT), Gaussian coarse-coding (GCC), Invariant visual target map (IVTM), What Fuzzy ART (WhatFuzz), body spatial coordinates (BSC), where reactive visual TPV storage (WRVTS), Directional transient cell network (DTCN), Motion direction hemifield map (MDHM), Hemifield left/right scoring (HLRS), reactive visual control signal (RVCS), Parvo/Magno/Erg competition (PMEC), Approach and Orient GOp (AOGp), GOm (GOm). R3DS [parvo-> FGS, magno-> DTCN], FGS-> [LPT, WRVTS], LPT-> GCC-> IVTM-> WhatFuzz, BSC-> [RVTS, PMEC], PMEC-> [gateRVTS-> RVTS, gateRVCS-> RVCS], DTCN-> MDHM-> HLRS, HLRS-> [PMEC, RVCS], AOGp-> gateRVTS, GOm-> gateRVCS.
  • p190 Howell: [neural microcircuits, modal architectures] used in ART -
    bottom-up filters | top-down expectations | purpose
    instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star often used to learn the adaptive weights. top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features
    LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
    EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
    auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
    auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
    visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown.
    red - cognitive-emotional dynamics
    green - working memory dynamics
    black - see [bottom-up, top-down] lists
    EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
    Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

    background colours in the table signify :
    white | general microcircuit : a possible component of ART architecture
    lime green | sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
    light blue | post-perceptual cognition?
    pink | "the feeling of what happens" and knowing what event caused that feeling
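  • A minimal Python sketch of the "instar learning | outstar learning" row above (p200fig05.13), assuming the gated steepest-descent form of these laws: weights track a pattern only while the gating category cell is active. Instar weights form the bottom-up adaptive filter, outstar weights form the top-down expectation; the array shapes, names, and learning rate are illustrative.
    import numpy as np

    def instar_update(w_bu, x, y, lr=0.1):
        """Bottom-up filter: weights converging on each category track the
        feature pattern x, gated by that category's activity y[j]."""
        return w_bu + lr * y[:, None] * (x[None, :] - w_bu)   # shape (n_cat, n_feat)

    def outstar_update(w_td, x, y, lr=0.1):
        """Top-down expectation: weights radiating from each category learn
        the pattern it expects, gated by the source category's activity y[j]."""
        return w_td + lr * y[:, None] * (x[None, :] - w_td)   # shape (n_cat, n_feat)

    x = np.array([1.0, 0.0, 0.5])                  # feature pattern
    y = np.array([0.0, 1.0])                       # category 1 is the active winner
    w_bu = np.zeros((2, 3)); w_td = np.zeros((2, 3))
    for _ in range(50):
        w_bu, w_td = instar_update(w_bu, x, y), outstar_update(w_td, x, y)
    print(np.round(w_bu[1], 3), np.round(w_td[1], 3))  # both approach x; category 0 unchanged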
  • image p163fig04.39 A schematic of the LAMINART model that explains key aspects of laminar visual cortical anatomy and dynamics. LGN -> V1 [6, 4, 2/3] -> V2 [6, 4, 2/3]
    || p163c1h0.6 "... The first article about laminar computing ... proposed how the laminar cortical model could process 2D pictures using bottom-up filtering and horizontal bipole grouping interactions (Grossberg, Mingolla, Ross 1997). In 1999, I was able to extend the model to also include top-down circuits for expectation and attention (Grossberg 1999)(right panel). Such a synthesis of laminar bottom-up, horizontal, and top-down circuits is characteristic of the cerebral cortex (left panel). I called it LAMINART because it began to show how properties of Adaptive Resonance Theory, or ART, notably the ART prediction about how top-down expectations and attention work, are realized by identical cortical cells and circuits. You can immediately see from the schematic laminar circuit diagram ... (right panel) that circuits in V2 seem to repeat circuits in V1, albeit with a larger spatial scale, despite the fact that V1 and V2 carry out different functions. How this anatomical similarity can coexist with functional diversity will be clarified in subsequent sections and chapters. It enables different kinds of biological intelligence to communicate seamlessly while carrying out their different psychological functions. ..."
  • image p174fig04.52 An example of how the 3D LAMINART model can transform the two monocular images of the random dot stereogram in the top row into the three depth-separated surface representations in the bottom row.
    || Stereogram surface percepts: surface lightnesses are segregated in depth (Fang and Grossberg 2009). [left, right] inputs, [far, fixation, near] planes. Contrast with algorithms that just compute disparity matches and let computer code build the surface, eg (Marr, Poggio, etal 1974).
  • image p182fig04.58 LAMINART model processing stages that are sufficient to explain many percepts of transparency, including those summarized in Figure 4.57.
    || [left, right] eye, [LGN, V1 [6, 4, 3B, 2/3 A], V2 [4, 2/3]], [mo, bi]nocular cart [simple, complex] cells, [excita, inhibi]tory cart [connection, cell]s.
  • image p230fig05.38 The Synchronous Matching ART, or SMART, model includes spiking neurons in a laminar cortical hierarchy. I developed SMART with my PhD student Massimiliano Versace. By unlumping LAMINART to include spiking neurons, finer details of neurodynamics, such as the existence of faster gamma oscillations during good enough matches, and slower beta oscillations during bad enough mismatches, could be shown as emergent properties of network interactions.
    || Second order thalamus -> specific thalamic nucleus -> Thalamic reticulate nucleus -> neocortical laminar circuit [6ll, 6l, 5, 2/3, 1] -> Higher order cortex. Similar for First order thalamus -> First order cortex, with interconnection to Second order, nonspecific thalamic nucleus
  • image p231fig05.39 The SMART hypothesis testing and learning cycle predicts that vigilance increases when a mismatch in subcortical regions like the nonspecific thalamus activates the nucleus basalis of Meynert which, in turn, broadcasts a burst of the neurotransmitter acetylcholine, or ACh, to deeper cortical layers. Due to the way in which LAMINART proposes that cortical matching and mismatching occurs, this ACh burst can increase vigilance and thereby trigger a memory search. See the text for details.
    || [BU input, [, non]specific thalamic nucleus, thalamic reticulate nucleus, neocortical laminar circuit] cart [Arousal, Reset, Search, Vigilance]
  • image p246fig05.48 Microcircuits of the LAMINART model that I developed with Rajeev Raizada. See the text for details of how they integrate bottom-up adaptive filtering, horizontal bipole grouping, and top-down attentional matching that satisfies the ART Matching Rule.
    ||
  • image p248fig05.49 This circuit of the LAMINART model helps to explain properties of Up and Down states during slow wave sleep, and how disturbances in ACh dynamics can disrupt them.
    ||
  • image p375fig11.07 The 3D LAMINART model uses both monocular and binocular simple cells to binocularly fuse like image contrasts. The remainder of the model generates 3D boundary and surface representations of multiple kinds of experiments as well as of natural scenes.
    || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A], V2 thin stripe [4->2/3A], V4]. V1 blob [V1-4 monocular, V1 interior binocular] simple cells. [complex, simple, inhibitory] cells, on-center off-surround
  • image p376fig11.10 The 3D LAMINART model shows how the disparity filter can be integrated into the circuit that completes 3D boundary representations using bipole grouping cells. It also explains how surface contours can strengthen boundaries that succeed in generating closed filling-in domains.
    || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A] surface contour, V2 thin stripe (monocular surface) [4->2/3A], V2 interior [disynaptic inhibitory interneurons, bipole grouping cells, disparity filter, V4 binocular surface]. [complex, simple, inhibitory] cells, on-center off-surround
  • image p378fig11.12 How monocular and binocular information are combined in V1 and V2 in the 3D LAMINART model.
    || Model utilizes monocular information. [left, right] eye cart V1-[4 monocular simple, 3B binocular simple, complex2/3A [mo,bi]nocular] cells, V2-4 binocular complex cells. black = monocular cells, blue = binocular cells. In V2, monocular inputs add to binocular inputs along the line of sight and contribute to depth perception.
  • image p379fig11.13 How the 3D LAMINART model explains DaVinci stereopsis. All the stages of boundary and surface formation are color coded to clarify their explanation. Although each mechanism is very simple, when all of them act together, the correct depthful surface representation is generated. See the text for details.
    || DaVinci stereopsis (Nakayama, Shimojo 1990). An emergent property of the previous simple mechanisms working together. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] pair [Binocular match: boundaries of thick bar -> Add monocular boundaries along lines-of-sight -> Line-of-sight inhibition kills weaker vertical boundaries -> 3D surface percept not just a disparity match!] pair [Binocular match: right edge of thin and thick bars -> Strongest boundaries: binocular and monocular boundaries add -> Vertical boundaries from monocular left edge of thin bar survive -> Filling-in contained by connected boundaries]. cart [very near, near, fixation plane, far, very far]
  • image p380fig11.14 The model explanation of DaVinci stereopsis when the input stimuli have opposite contrast polarities.
    || Polarity-reversed Da Vinci stereopsis (Nakayama, Shimojo 1990). Same explanation! (... as Figure 11.13 ...) [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
  • image p395fig11.34 A comparison of the properties of other rivalry models with those of the 3D LAMINART model (surrounded by red border). Significantly, only 3D LAMINART explains both stable vision and rivalry (green border).
    || Comparison of rivalry models
  • image p401fig11.41 The 3D LAMINART model proposes how angle cells and disparity-gradient interact through learning to generate 3D representations of slanted objects.
    || 3D LAMINART model. [LGN, V1, V2, V4] Four key additions: 1. Angle cells - tuned to various angles; 2. Disparity-gradient cells - tuned to disparity gradients in the image; 3. weights from [angle to disparity-gradient] cells - learned while viewing 3D image; 4. Collinear grouping between [angle to disparity-gradient] cells - disambiguates ambiguous groupings.
  • image p402fig11.44 3D scenic reconstruction of the image in Figure 11.43 by the 3D LAMINART model.
    || Disparity [5, 6, 8, 10, 11, 14]: images of objects in common depth planes
  • image p445fig12.43 The LIST PARSE laminar cortical Cognitive Working Memory circuit, which is proposed to occur in ventrolateral prefrontal cortex, is homologous to the LAMINART circuit that models aspects of how visual cortex sees. The Motor Working Memory, VITE Trajectory Generator, and Variable-Rate Volitional Control circuits model how other brain regions, including dorsolateral prefrontal cortex, motor cortex, cerebellum, and basal ganglia, interact with the Cognitive Working Memory to control working memory storage and variable-rate performance of item sequences.
    || List parse circuit diagram. Connectivity convention. sequence chunks [<- BU filter, TD expectation ->] working memory. Working memory and sequence chunking circuit is homologous to visual LAMINART circuit!
  • image p498fig13.22 (left column, top row) Adaptive filtering and conditioned arousal are both needed to regulate what cues can learn to activate particular space-time patterns. These developments lead inexorably to basic cognitive abilities, as embodied in the 3D LAMINART models for 3D vision and figure-ground perception (Chapter 11) and the 3D ARTSCAN SEARCH model for invariant object learning, recognition, and 3D search (Chapter 6). (right column, top row) Conditioned arousal enables only emotionally important cues to activate a motivationally relevant space-time pattern. (bottom row) Conditioned arousal and drive representations arise naturally from the unlumping of avalanche circuits to make them selective to motivationally important cues. The MOTIVATOR model is a natural outcome of this unlumping process (this chapter).
    || (top) Adaptive filtering and Conditioned arousal. Towards Cognition: need to filter inputs to the command cell. Towards Emotion: important signals turn arousal ON and OFF. (bottom) Conditioned arousal and Drive representations. Competition between conditioned arousal sources at drive representations, eg amygdala.
  • image p228fig05.37 A macrocircuit of the neurotrophic Spectrally Timed ART, or nSTART, model. I developed nSTART with my PhD student Daniel Franklin. It proposes how adaptively timed learning in the hippocampus, bolstered by Brain Derived Neurotrophic Factor, or BDNF, helps to ensure normal memory consolidation.
    || habituative gates, CS, US, Thalamus (sensory cortex, category learning, conditioned reinforcer learning, adaptively timed learning and BDNF), Amygdala (incentive motivation learning), Hippocampus (BDNF), Prefrontal Cortex (attention), Pontine nuclei, Cerebellum (adaptively timed motor learning)
  • image p541fig15.02 The neurotrophic Spectrally Timed Adaptive Resonance Theory, or nSTART, model of (Franklin, Grossberg 2017) includes hippocampus to enable adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between CS and US.
    || Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. [CS, US] -> Sensory Cortex (SC) <- motivational attention <-> category learning -> Prefrontal Cortex (PFC). SC conditioned reinforcement learning-> Amygdala (cannot bridge the temporal gap) incentive motivational learning-> PFC. SC adaptively timed learning and BDNF-> Hippocampus (can bridge the temporal gap) BDNF-> PFC. PFC adaptively timed motor learning-> cerebellum.
  • p190 Howell: [neural microcircuits, modal architectures] used in ART -
    bottom-up filters | top-down expectations | purpose
    instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star often used to learn the adaptive weights. top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features
    LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
    EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
    auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
    auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
    visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown.
    red - cognitive-emotional dynamics
    green - working memory dynamics
    black - see [bottom-up, top-down] lists
    EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
    Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

    background colours in the table signify :
    white | general microcircuit : a possible component of ART architecture
    lime green | sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
    light blue | post-perceptual cognition?
    pink | "the feeling of what happens" and knowing what event caused that feeling
  • image p221fig05.31 A system like Fuzzy ARTMAP can learn to associate learned categories in one ART network with learned categories in a second ART network. Because both bottom-up and top-down interactions occur in both networks, a bottom-up input pattern to the first ART network can learn to generate a top-down output pattern from the second ART network.
    || Fuzzy ARTMAP. Match tracking realizes minimax learning principle: vigilance increases to just above the match ratio of prototype / exemplar, thereby triggering search
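  • A minimal Python sketch of the match-tracking step in the p221fig05.31 note above, assuming standard Fuzzy ART choice and match functions; it is not the full Fuzzy ARTMAP algorithm (no complement coding, learning, or map field shown), and the function names and parameters are illustrative.
    import numpy as np

    def choice(I, w, alpha=0.001):
        """Fuzzy ART choice function T_j = |I ^ w_j| / (alpha + |w_j|)."""
        return np.minimum(I, w).sum() / (alpha + w.sum())

    def match(I, w):
        """Match ratio |I ^ w_j| / |I|, compared against the vigilance rho."""
        return np.minimum(I, w).sum() / I.sum()

    def search_with_match_tracking(I, weights, labels, target, rho=0.5, eps=0.001):
        """Reset categories whose match is below vigilance; if a resonating
        category predicts the wrong label, raise vigilance just above its
        match ratio (match tracking) and keep searching."""
        remaining = set(range(len(weights)))
        while remaining:
            j = max(remaining, key=lambda k: choice(I, weights[k]))
            m = match(I, weights[j])
            if m >= rho and labels[j] == target:
                return j, rho                  # resonance with a correct prediction
            if m >= rho:
                rho = m + eps                  # wrong prediction: match tracking
            remaining.discard(j)               # mismatch reset: try the next category
        return None, rho                       # no category fits: recruit a new one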
  • image p225fig05.33 Some early ARTMAP benchmark studies. These successes led to the use of ARTMAP, and many variants that we and other groups have developed, in many large-scale applications in engineering and technology, a use that has not abated even today.
    || see Early ARTMAP benchmark studies
  • image p225fig05.34 ARTMAP was successfully used to learn maps of natural terrains with many advantages over those of mapping projects that used AI expert systems. The advantages are so great that many mapping projects started to use this technology.
    || AI expert system - 1 year: field identification of natural regions; derivation of ad hoc rules for each region by expert geographers; correct 80,000 of 250,000 site labels; 230m (site-level) scale. ARTMAP system - 1 day: rapid, automatic, no natural regions or rules; confidence map; 30m (pixel-level) scale can see roads; equal accuracy at test sites
  • image p242fig05.46 Computer simulations of how two variants of Distributed ARTMAP incrementally learn the 5-4 category structure. See the text for details.
    || Distributed ARTMAP with [self-supervised learning, post-training LTM noise]
  • image p436fig12.30 The conscious ARTWORD, or cARTWORD, laminar cortical speech model simulates how future context can disambiguate noisy past speech sounds in such a way that the completed percept is consciously heard to proceed from past to future as a feature-item-list resonant wave propagates through time.
    || cARTWORD: Laminar cortical model macrocircuit (Grossberg, Kazerounian 2011) Simulates PHONEMIC RESTORATION: Cognitive Working Memory (processed item sequences) - [Excitatory-> inhibitory-> habituative-> adaptive filter-> adaptive filter-> adaptive filter with depletable synapse-> Acoustic [item, feature]
  • image p453fig12.49 The ARTWORD model that I published in 2000 with my PhD student Christopher Myers simulates data such as the (Repp etal 1978) data in Figure 12.68. See the text for details.
    || ARTWORD model (Grossberg, Myers 2000). Input phonetic features-> Phonemic item working memory-> Masking Field unitized lists-> Automatic gain control-> Phonemic item working memory. [habituative gate, adaptive filter]s.
  • image p453fig12.50 The ARTWORD perception cycle shows how sequences of items activate possible list chunks, which compete among each other and begin to send their top-down expectations back to the item working memory. An item-list resonance develops through time as a result.
    || ARTWORD perception cycle. (a) bottom-up activation (b) list chunk competition (c) item-list resonance (d) chunk reset due to habituative collapse.
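  • The "chunk reset due to habituative collapse" step above depends on habituative transmitter gates. A minimal Python sketch of the standard habituative gate law dz/dt = A*(1-z) - B*S*z, with gated output S*z; the parameter values and input signal are illustrative, not those of the ARTWORD simulations.
    def habituative_gate(signal, dt=0.01, steps=2000, A=0.5, B=4.0):
        """Transmitter z recovers toward 1 at rate A and is inactivated at
        rate B*S*z; the transmitted (gated) signal is S*z, so a sustained
        input first peaks and then collapses (habituates)."""
        z, gated = 1.0, []
        for t in range(steps):
            S = signal(t * dt)
            z += dt * (A * (1.0 - z) - B * S * z)
            gated.append(S * z)
        return gated

    out = habituative_gate(lambda t: 1.0 if t > 0.5 else 0.0)   # sustained chunk activity
    print(round(max(out), 3), round(out[-1], 3))                # peak, then collapsed level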
  • image p455fig12.52 Simulation of cARTWORD dynamics in response to the complete list /1/-/2/-/3/. The relevant responses are surrounded by a red box.
    || Presentation of a normal sequence: input /1/-/2/-/3/. |c(i,1)-5| vs time (msec). List chunks select most predictive code. Order stored in WM layers. Resonant activity of /1/-/2/-/3/ in item and feature layers corresponds to conscious speech percept.
  • image p456fig12.53 Simulation of cARTWORD dynamics in response to the partial list /1/-silence-/3/ with /2/ replaced by silence. Only the representations of these items can be seen in the red box.
    || Presentation with silence duration: input /1/-silence-/3/. |c(i,1)-5| vs time (msec). List chunks select most predictive code. Order stored in WM layers. Gap in resonant activity of /1/-silence-/3/ in item and feature layers corresponds to perceived silence.
  • image p456fig12.54 Item /2/ is restored in the correct list position in response to the list /1/-noise-/3/.
    || Presentation with noise: input /1/-noise-/3/. |c(i,1)-5| vs time (msec). List chunks select the most predictive code. Order restored in WM layers. Resonant activity of /1/-/2/-/3/ in item and feature layers corresponds to restoration of item /2/ replaced by noise in input.
  • image p457fig12.55 Item /4/ is restored in the correct list position in response to the list /1/-noise-/5/. This and the previous figure show how future context can disambiguate past noisy sequences that are otherwise identical.
    || Presentation with noise: input /1/-noise-/5/. |c(i,1)-5| vs time (msec). List chunks select the most predictive code. Order restored in WM layers. Resonant activity of /1/-/4/-/5/ in item and feature layers corresponds to restoration of item /4/ replaced by noise in input.
  • image p444fig12.42 The LIST PARSE laminar cortical model of working memory and list chunking that I published with Lance Pearson in 2008 simulated the Averbeck etal data in Figure 12.41, as in the left column of the figure. It also simulated cognitive data about working memory storage by human subjects. See the text for details.
    || LIST PARSE: Laminar cortical model of working memory and list chunking (Grossberg, Pearson 2008). Simulates data about: [immediate, delayed, continuous] distractor free recall; immediate serial recall; and variable-speed sequential performance of motor acts. [velocity, acceleration] vs time (ms) from recall cue.
  • image p445fig12.43 The LIST PARSE laminar cortical Cognitive Working Memory circuit, which is proposed to occur in ventrolateral prefrontal cortex, is homologous to the LAMINART circuit that models aspects of how visual cortex sees. The Motor Working Memory, VITE Trajectory Generator, and Variable-Rate Volitional Control circuits model how other brain regions, including dorsolateral prefrontal cortex, motor cortex, cerebellum, and basal ganglia, interact with the Cognitive Working Memory to control working memory storage and variable-rate performance of item sequences.
    || List parse circuit diagram. Connectivity convention. sequence chunks [<- BU filter, TD expectation ->] working memory. Working memory and sequence chunking circuit is homologous to visual LAMINART circuit!
  • image p446fig12.44 (left column, top row) LIST PARSE can model linguistic data from human subjects. In this figure, model parameters are fixed to enable a close fit to data about error-type distributions in immediate free recall experiments, notably transposition errors. (right column, top row) Simulation and data showing bowing of the serial position curve, including an extended primacy gradient. (left column, bottom row) The simulation curve overlays data about list length effects, notably the increasing recall difficulty of longer lists during immediate serial recall (ISR). (right column, bottom row) Simulation (bottom image) and data (top image) of the limited temporal extent for recall.
    || (1. TL) Error-type distributions in immediate serial recall (Hanson etal 1996). % occurrence vs serial position. Graph convention: Data- dashed lines; Simulations- solid lines. Six letter visual ISR. Order errors- transpositions of neighboring items are the most common. Model explanation: Noisy activation levels change relative order in primacy gradient. Similar activation of neighboring items most susceptible to noise. Model parameters fitted on these data. (2. TR) Bowing of serial position curve (Cowan etal 1999). % correct vs serial position. Auditory ISR with various list lengths (graphs shifted rightward): For [, sub-]span lists- extended primacy, with one (or two) item recency; Auditory presentation- enhanced performance for last items. LIST PARSE: End effects- first and last items half as many members; Echoic memory- last presented item retained in separate store. (3. BL) List length effects, circles (Crannell, Parrish 1968), squares (Baddeley, Hitch 1975), solid line- simulation. % list correct vs list length. Variable list length ISR: longer lists are more difficult to recall. LIST PARSE: More items- closer activation levels and lower absolute activity level with enough inputs; Noise is more likely to produce order errors, Activity levels more likely to drop below threshold;. (4. BR) Limited temporal extent for recall (Murdoch 1961). % recalled vs retention interval (s). ISR task with distractor-filled retention intervals (to prevent rehearsal): Increasing retention interval - decreases probability of recalling list correctly; Load dependence- longer list more affected by delays; Performance plateau- subjects reach apparent asymptote. LIST PARSE: Increase convergence of activities with time; loss of order information;.
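  • A toy Python sketch of the Item-and-Order storage assumed by LIST PARSE in the notes above: items are stored as a primacy gradient (earlier items more active) and recalled by repeatedly selecting and suppressing the most active item, so activation noise on nearly equal neighbors produces transposition errors like those in the data. This only illustrates the storage-and-recall principle, not the LIST PARSE equations; the decay and noise parameters are illustrative.
    import numpy as np

    def store_primacy_gradient(n_items, decay=0.85):
        """Earlier list items receive larger stored activities."""
        return np.array([decay ** i for i in range(n_items)])

    def recall(stored, noise_sd=0.0, seed=0):
        """Recall by iterated choice of the most active item, then suppression."""
        a = stored + np.random.default_rng(seed).normal(0.0, noise_sd, size=stored.shape)
        order = []
        for _ in range(len(a)):
            j = int(np.argmax(a))
            order.append(j)
            a[j] = -np.inf                       # suppress the recalled item
        return order

    grad = store_primacy_gradient(6)
    print(recall(grad))                          # noiseless: correct serial order
    print(recall(grad, noise_sd=0.05))           # noisy: occasional neighbor transpositions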
  • image p447fig12.45 (left column) LIST PARSE simulations of the proportion of order errors as a function of serial position for 6 item lists with (a) an extended pause of 7 time units between the third and fourth items, and (b) pauses of 5 time units (solid curve) and 10 time units (dashed curve) between all items. (right column) Simulations (solid curves) and data (dashed curves) illustrating close model fits in various immediate free recall tasks.
    || (Left) Temporal grouping and presentation variability. Temporal grouping: Inserting an extended pause leads to inter-group bowing; Significantly different times of integration and activity levels across pause, fewer interchanges. (Right) Immediate free recall, and [delayed, continuous] distractor-free recall. Overt rehearsal IFR task with super-span (ie 20 item) lists: Extended recency- even more extended with shorter ISIs; Increased probability of recall with diminished time from last rehearsal; Early items in list rehearsed most;. LIST PARSE (unique) for long lists: Incoming items form a recency gradient; Rehearsal (re-presentation) based upon level of activity.
  • image p254fig06.03 These interactions of the ARTSCAN Search model enable it to learn to recognize and name invariant object categories. Interactions between spatial attention in the Where cortical stream, via surface-shroud resonances, and object attention in the What cortical stream, that obeys the ART Matching Rule, coordinate these learning, recognition, and naming processes.
    || Retinal image -> On & OFF cell contrast normalization (retina/LGN) -> polarity [, in]sensitive contrast enhancement (V1) -> object [boundary (V2), surface (V2/V4)] -> surface contour (V2); What stream categories: volition control (BG). object boundary (V2) <-> view (ITp) <-> view integrator (ITp) <-> object (ITa) <-> [object-value (ORB), value (Amyg)] <-> name (PFC)
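  • A minimal Python sketch of the "ON & OFF cell contrast normalization" stage above, assuming the textbook steady state of a shunting on-center off-surround network, x_i = B*I_i / (A + sum_k I_k): input ratios are preserved while total activity stays bounded. Parameters are illustrative, and only the steady-state formula is shown, not the retina/LGN dynamics.
    import numpy as np

    def shunting_normalization(I, A=1.0, B=1.0):
        """Steady state of dx_i/dt = -A*x_i + (B - x_i)*I_i - x_i*sum(I_k, k != i)."""
        I = np.asarray(I, dtype=float)
        return B * I / (A + I.sum())

    print(shunting_normalization([1.0, 2.0, 1.0]))     # small inputs
    print(shunting_normalization([10.0, 20.0, 10.0]))  # 10x larger inputs: same ratios, bounded total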
  • image p255fig06.04 The ARTSCAN Search model can also search for a desired target object in a scene, thereby clarifying how our brains solve the Where's Waldo problem.
    || similar illustration to Figure 06.03, with some changes to arrows
  • image p259fig06.08 The distributed ARTSCAN, or dARTSCAN, model includes spatial attention in both PPC and PFC, and both fast-acting attention, triggered by transient cells in Where cortical areas such as MT, and slower-acting surface-shroud resonances in What cortical areas such as V4 and PPC. See the text for details.
    || dARTSCAN spatial attention hierarchy, Fast (Where stream) Slow (What stream) (Foley, Grossberg, and Mingolla 2012). [transient cells (MT) ->, object surfaces (V4) <->] [object shrouds (PPC) <-> spatial shrouds (PPC/PFC)]
  • image p271fig06.17 Persistent activity in IT cells is just what is needed to enable view-invariant object category learning by ARTSCAN to be generalized to [view, position, size]-invariant category learning by positional ARTSCAN, or pARTSCAN. See the text for details.
    || Persistent activity in IT. Physiological data show that persistent activity exists in IT (Fuster and Jervey 1981, Miyashita and Chang 1988, Tomita etal 1999). Adapted from (Tomita etal 1999 Nature)
  • image p272fig06.18 The pARTSCAN model can learn [view, position, size]-invariant categories by adding view category integrator cells that have the properties of persistent neurons in IT. These integrator cells get reset with the invariant object category, not the view category.
    || pARTSCAN: positionally-invariant object learning. (Cao, Grossberg, Markowitz 2011). IT cells with persistent activities are modeled by view category integrators in ITp. View-specific category cells are RESET as the eyes move within the object. View category integrator cells are NOT RESET when the view-specific category is reset. They are RESET along with invariant object category cells when a spatial attention shift occurs.
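  • A toy Python sketch of just the reset bookkeeping stated in the p272fig06.18 note above: view-specific categories reset on every within-object saccade, while view category integrators and the invariant object category reset only when a spatial attention shift (parietal reset burst) occurs. The class and its fields are illustrative, not the pARTSCAN dynamics.
    class ViewIntegratorResetSketch:
        """Tracks which representations survive eye movements vs attention shifts."""
        def __init__(self):
            self.view_category = None            # reset on every new view
            self.view_integrator = set()         # persists across within-object saccades
            self.invariant_object_category = None

        def new_view(self, view_id, object_id):
            self.view_category = view_id
            self.view_integrator.add(view_id)
            if self.invariant_object_category is None:
                self.invariant_object_category = object_id

        def attention_shift(self):
            """Parietal reset burst: clears the integrator and invariant category."""
            self.view_category = None
            self.view_integrator.clear()
            self.invariant_object_category = None

    s = ViewIntegratorResetSketch()
    for v in ("front", "side", "back"):          # saccades within one object
        s.new_view(v, object_id="cup")
    print(s.view_integrator, s.invariant_object_category)   # all views bound to 'cup'
    s.attention_shift()
    print(s.view_integrator, s.invariant_object_category)   # cleared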
  • image p273fig06.20 pARTSCAN can simulate the IT cell recoding that Li and DiCarlo reported in their swapping experiments because the swapping procedure happens without causing a parietal reset burst to occur. Thus the originally activated invariant category remains activated and can get associated with the swapped object features.
    || Simulation of Li and DiCarlo swapping data. data (Li and DiCarlo 2008), model (Cao, Grossberg, Markowitz 2011). normalized response vs. exposure (swaps and/or hours)
  • image p274fig06.21 pARTSCAN can also simulate the trade-off in IT cell responses between position invariance and selectivity that was reported by Zoccolan etal 2007. This trade-off limits the amount of position invariance that can be learned by a cortical area like V1 that is constrained by the cortical magnification factor.
    || Trade-off in IT cell response properties. Inferotemporal cortex cells with greater position invariance respond less selectively to natural objects. invariance-tolerance, selectivity-sparseness. data (Zoccolan etal 2007) model (Grossberg, Markowitz, Cao 2011). position tolerance (PT, degrees) vs sparseness (S)
  • image p274fig06.22 pARTSCAN can simulate how IT cortex processes image morphs, when it learns with high vigilance. See the text for details.
    || Akrami etal simulation: a case of high vigilance. tested on morphs between image pairs
  • image p498fig13.22 (left column, top row) Adaptive filtering and conditioned arousal are both needed to regulate what cues can learn to activate particular space-time patterns. These developments lead inexorably to basic cognitive abilities, as embodied in the 3D LAMINART models for 3D vision and figure-ground perception (Chapter 11) and the 3D ARTSCAN SEARCH model for invariant object learning, recognition, and 3D search (Chapter 6). (right column, top row) Conditioned arousal enables only emotionally important cues to activate a motivationally relevant space-time pattern. (bottom row) Conditioned arousal and drive representations arise naturally from the unlumping of avalanche circuits to make them selective to motivationally important cues. The MOTIVATOR model is a natural outcome of this unlumping process (this chapter).
    || (top) Adaptive filtering and Conditioned arousal. Towards Cognition: need to filter inputs to the command cell. Towards Emotion: important signals turn arousal ON and OFF. (bottom) Conditioned arousal and Drive representations. Competition between conditioned arousal sources at drive representations, eg amygdala.
  • image p531fig14.06 Classification of scenic properties as texture categories by the ARTSCENE model. See the text for details.
    || Image-> Feature extraction (texture principal component rankings)-> Learning feature-to-scene mapping (texture category principal component rankings)<- scene class. Large-to-small attentional shrouds as principal component higher.
  • image p531fig14.07 Voting in the ARTSCENE model achieves even better prediction of scene type. See the text for details.
    || Image-> Feature extraction (texture principal component rankings)-> Learning feature-to-scene mapping (texture category principal component rankings)-> evidence accumulation (sum)-> scene class winner-take-all inference. Large-to-small attentional shrouds as principal component higher.
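  • A toy Python sketch of the voting stage in the p531fig14.07 note above: class evidence contributed by each large-to-small attentional shroud is summed, and a winner-take-all choice gives the scene label. The evidence vectors are made up for illustration; feature extraction and the learned texture categories are not modeled.
    import numpy as np

    def vote_scene_class(per_shroud_evidence):
        """Evidence accumulation (sum) followed by winner-take-all inference."""
        total = np.sum(per_shroud_evidence, axis=0)
        return int(np.argmax(total)), total

    evidence = [np.array([0.6, 0.3, 0.1]),      # largest shroud
                np.array([0.2, 0.5, 0.3]),
                np.array([0.5, 0.2, 0.3])]      # smallest shroud
    print(vote_scene_class(evidence))           # class 0 wins after voting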
  • image p532fig14.08 Macrocircuit of the ARTSCENE Search neural model for learning to search for desired objects by using the sequences of already experienced objects and their locations to predict what and where the desired object is. V1 = First visual area or primary visual cortex; V2 = Second visual area; V4 = Fourth visual area; PPC = Posterior Parietal Cortex; ITp = posterior InferoTemporal cortex; ITa = anterior InferoTemporal cortex; MTL = Medial Temporal Lobe; PHC = ParaHippoCampal cortex; PRC = PeriRhinal Cortex; PFC = PreFrontal Cortex; DLPFC = DorsoLateral PreFrontal Cortex; VPFC = Ventral PFC; SC = Superior Colliculus.
    ||
  • image p533fig14.09 Search data and ARTSCENE Search simulations of them in each pair of images from (A) to (F). See the text for details.
    || 6*[data vs simulation], [Response time (ms) versus epoch].
  • image p214fig05.26 When a big enough mismatch occurs, the orienting system is activated and sends a burst of nonspecific arousal to the category level. This Mismatch Detector has properties of the N200 ERP.
    || Mismatch triggers nonspecific arousal. Mismatch at F1 elicits a nonspecific event at F2. Call this event nonspecific arousal. N200 ERP Naatanen etal: 1. bottom-up, 2. unconditionable, 3. nonspecific, 4. mismatch
  • image p230fig05.38 The Synchronous Matching ART, or SMART, model includes spiking neurons in a laminar cortical hierarchy. I developed SMART with my PhD student Massimiliano Versace. By unlumping LAMINART to include spiking neurons, finer details of neurodynamics, such as the existence of faster gamma oscillations during good enough matches, and slower beta oscillations during bad enough mismatches, could be shown as emergent properties of network interactions.
    || Second order thalamus -> specific thalamic nucleus -> Thalamic reticulate nucleus -> neocortical laminar circuit [6ll, 6l, 5, 2/3, 1] -> Higher order cortex. Similar for First order thalamus -> First order cortex, with interconnection to Second order, nonspecific thalamic nucleus
  • image p231fig05.39 The SMART hypothesis testing and learning cycle predicts that vigilance increases when a mismatch in subcortical regions like the nonspecific thalamus activates the nucleus basalis of Meynert which, in turn, broadcasts a burst of the neurotransmitter acetylcholine, or ACh, to deeper cortical layers. Due to the way in which LAMINART proposes that cortical matching and mismatching occurs, this ACh burst can increase vigilance and thereby trigger a memory search. See the text for details.
    || [BU input, [, non]specific thalamic nucleus, thalamic reticulate nucleus, neocortical laminar circuit] cart [Arousal, Reset, Search, Vigilance]
  • image p232fig05.40 Computer simulation of how the SMART model generates (a) gamma oscillations if a good enough match occurs, or (c) beta oscillations if a bad enough match occurs. See the text for details.
    || Brain oscillations during match/mismatch, data, simulation. (a) TD corticothalamic feedback increases synchrony (Sillito etal 1994) (b) Match increases γ oscillations (c) Mismatch increases θ,β oscillations
  • image p232fig05.41 (a)-(c). The sequence of interlaminar events that SMART predicts during a mismatch reset. (d) Some of the compatible neurophysiological data.
    || Mismatch causes layer 5 dendritic spikes that trigger reset. (a) Arousal causes increase in nonspecific thalamic nuclei firing rate and layer 5 dendritic and later somatic spikes (Larkum and Zhu 2002, Williams and Stuart 1999) (b) Layer 5 spikes reach layer 4 via layer 6i and inhibitory neurons (Lund and Boothe 1975, Gilbert and Wiesel 1979) (c) habituative neurotransmitters in layer 6i shift the balance of active cells in layer 4 (Grossberg 1972, 1976) (d) Dendritic stimulation fires layer 5 (Larkum and Zhu 2002) stimulation apical dendrites of nonspecific thalamus
  • Grossberg 2021 p229c2h0.60 SMART computer simulations demonstrate that a good enough match of a top-down expectation with a bottom-up feature pattern generates an attentive resonance during which the spikes of active cells synchronize in the gamma frequency range of 20-70 Hz (Figure 5.40). Many labs have reported a link between attention and gamma oscillations in the brain, including two articles published in 2001, one from the laboratory of Robert Desimone when he was at the National Institute of Mental Health in Bethesda (Fries, Reynolds, Rorie, Desimone 2001), and the other from the laboratory of Wolf Singer in Frankfurt (Engel, Fries, Singer 2001). You'll note that Pascal Fries participated in both studies, and is an acknowledged leader in neurobiological studies of gamma oscillations; eg (Fries 2009). ..."
  • image p468fig12.65 Linguistic properties of the PHONET model and some of the data that it simulates. The upper left image summarizes the asymmetric transient-to-sustained gain control that helps to create invariant intraword ratios during variable-rate speech. The lower left image summarizes the rate-dependent gain control of the ARTPHONE model that creates rate-invariant working memory representations in response to sequences of variable-rate speech. The right image summarizes the kind of paradoxical VC-CV category boundary data of (Repp 1980) that ARTPHONE simulates. See the text for details.
    || (left upper) [transient, sustained] [working memory, filter, category]. (left lower) phone inputs-> [input rate estimate, features], Features w <- habituative transmitter gates -> categories-> rate invariant phonetic output, input rate estimate-> gain control-> [features, categories] rate-dependent integration of categories and features. (right) % 2-stop vs VC-CV silent interval (msec): [ib-ga, ib-ba, iga, iba].
  • image p420fig12.18 The ARTSTREAM model explains and simulates the auditory continuity illusion as an example of a spectral-pitch resonance. Interactions of ART Matching Rule and asymmetric competition mechanisms in cortical strip maps explain how the tone selects the consistent frequency from the noise in its own stream while separating the rest of the noise into another stream.
    || ARTSTREAM model (Grossberg 1999; Grossberg, Govindarajan, Wyse, Cohen 2004). SPINET. Frequency and pitch strips. Bottom Up (BU) harmonic sieve. Top Down (TD) harmonic ART matching. Exclusive allocation. Learn pitch categories based on early harmonic processing. A stream is a Spectral-Pitch Resonance!
  • image p422fig12.19 The ARTSTREAM model includes mechanisms for deriving streams both from pitch and from source direction. See the text for details.
    || [left, right] cart Peripheral processing = [input signal-> outer & middle ear preemphasis-> basilar membrane gammatone filterbank-> energy measure]. Spectral stream layer-> spectral summation layer-> delays-> [f-, tau] plane-> pitch stream layer-> pitch summation layer.
  • image p423fig12.20 The Spatial Pitch Network, or SPINET, model shows how a log polar spatial representation of the sound frequency spectrum can be derived from auditory signals occurring in time. The spatial representation allows the ARTSTREAM model to compute spatially distinct auditory streams.
    || SPINET model (Spatial Pitch Network) (Cohen, Grossberg, Wyse 1995). 1. input sound 2. Gamma-tone filter bank 3. Short-term average energy spectrum 4. MAP transfer function 5. On-center off-surround and rectification 6. Harmonic weighting 7. Harmonic summation and competition -> PITCH
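  • A simplified Python sketch of the last two SPINET stages listed above (harmonic weighting/summation and competition), assuming a precomputed short-term energy spectrum: sum the energy at the harmonics of each candidate pitch and pick the winner. The gammatone filterbank, MAP transfer function, and on-center off-surround stages are not modeled; the toy spectrum and parameters are illustrative.
    import numpy as np

    def harmonic_summation_pitch(freqs, energy, f0_candidates, n_harmonics=8):
        """Score each candidate f0 by summing energy at the bins nearest its
        first n_harmonics, then choose the best-scoring candidate."""
        scores = []
        for f0 in f0_candidates:
            idx = [int(np.argmin(np.abs(freqs - h * f0))) for h in range(1, n_harmonics + 1)]
            scores.append(float(energy[idx].sum()))
        return f0_candidates[int(np.argmax(scores))], scores

    # Toy spectrum containing harmonics 2-5 of 200 Hz (a missing-fundamental case).
    freqs = np.linspace(50, 3000, 3000)
    energy = sum(np.exp(-0.5 * ((freqs - h * 200.0) / 5.0) ** 2) for h in (2, 3, 4, 5))
    f0_hat, _ = harmonic_summation_pitch(freqs, energy, np.arange(80.0, 400.0, 1.0))
    print(float(f0_hat))   # about 200.0: the pitch is recovered from its harmonics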
  • image p424fig12.21 One of the many types of data about pitch processing that are simulated by the SPINET model. See the text for details.
    || Pitch shifts with component shifts (Patterson, Wightman 1976; Schouten 1962). Pitch vs lowest harmonic number.
  • image p425fig12.23 ARTSTREAM simulations of the auditory continuity illusion and other streaming properties (left column, top row). When two tones are separated by silence (Input), a percept of silence also separates them in a spectral-pitch resonance. (left column, bottom row). When two tones are separated by broadband noise, the percept of tone continues through the noise in one stream (stream 1) while the remainder of the noise occurs in a different stream (stream 2). (right column) Some of the other streaming properties that have been simulated by the ARTSTREAM model.
    || Auditory continuity does not occur without noise. Auditory continuity in noise. Other simulated streaming data.
  • image p431fig12.27 The strip maps that occur in ARTSTREAM and NormNet are variants of a cortical design that also creates ocular dominance columns in the visual cortex.
    || Adult organization of V1 (Grinvald etal http://www.weizmann.ac.il/brain/images/cubes.html). (1) Ocular dominance columns (ODCs): Alternating strips of cortex respond preferentially to visual inputs of each eye (R/L corresponds to Right and Left eye inputs in the figure); (2) Orientation columns: A smooth pattern of changing orientation preference within each ODC. Organized in a pinwheel like fashion.
  • p370 Chapter 11 means (Grossberg 2021) page 370, Chapter 11
    p002sec Illusion and reality means (Grossberg 2021) page 2, section Illusion and reality
    p013fig01.09 means (Grossberg 2021) page 13, Figure 1.09 (1.9 as in book)
    p030tbl01.02 means (Grossberg 2021) page 30, Table 1.02 (1.2 as in book)
    p111c2h0.5 means (Grossberg 2021) page 111, column 2, at height 0.5 from the top as a fraction of page height
    || text... are notes in addition to [figure, table] captions, mostly comprised of text within the image, but also including quotes of text in the book. Rarely, they include comments by Howell preceded by "Howell". The latter are distinct from "readers notes" (see, for example : reader Howell notes).
    p044 Howell: grepStr 'conscious' means a comment by reader Howell, extracted using the grep string shown, referring to page 44 in (Grossberg 2021)
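  • A small Python sketch for splitting the reference tags listed above (pNNN, pNNNsec ..., pNNNfigCC.FF, pNNNtblCC.FF, pNNNcChH.H) into their parts, which can help when grepping these notes. It only handles the bare tag forms shown, not trailing commentary such as "p044 Howell: ...", and the regex group names are my own.
    import re

    TAG = re.compile(
        r"^p(?P<page>\d+)"
        r"(?:fig(?P<fig>\d+\.\d+)"
        r"|tbl(?P<tbl>\d+\.\d+)"
        r"|c(?P<col>\d)h(?P<height>[\d.]+)"
        r"|sec\s*(?P<sec>.+)"
        r")?$"
    )

    for tag in ["p370", "p002sec Illusion and reality", "p013fig01.09",
                "p030tbl01.02", "p111c2h0.5"]:
        parts = {k: v for k, v in TAG.match(tag).groupdict().items() if v}
        print(tag, "->", parts)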
  • p001 Chapter 1 Overview - From Complementary Computing and Adaptive Resonance to conscious awareness
  • p250 Chapter 6 Conscious seeing and invariant recognition - Complementary cortical streams coordinate attention for seeing and recognition
  • p539 Chapter 15 Adaptively timed learning - How timed motivation regulates conscious learning and memory consolidation
    p370 Chapter 11 means (Grossberg 2021) page 370, Chapter 11
    p002sec Illusion and reality means (Grossberg 2021) page 2, section Illusion and reality
    p013fig01.09 means (Grossberg 2021) page 13, Figure 1.09 (1.9 as in book)
    p030tbl01.02 means (Grossberg 2021) page 30, Table 1.02 (1.2 as in book)
    p111c2h0.5 means (Grossberg 2021) page 111, column 2, at height 0.5 from the top as a fraction of page height
    || text... are notes in addition to [figure, table] captions, mostly comprised of text within the image, but also including quotes of text in the book. Rarely, they include comments by Howell preceded by "Howell". The latter are distinct from "readers notes" (see, for example : reader Howell notes).
    p044 Howell: grepStr 'conscious' means a comment by reader Howell, extracted using the grep string shown, referring to page 44 in (Grossberg 2021)
  • image p039tbl01.03 The link between consciousness and movement
    ||
    VISUAL | seeing, knowing, and reaching
    AUDITORY | hearing, knowing, and speaking
    EMOTIONAL | feeling, knowing, and acting
  • image p042tbl01.04 The six main kinds of resonances which support different kinds of conscious awareness that will be explained and discussed in this book.
    ||
    type of resonance | type of consciousness
    surface-shroud | see visual object or scene
    feature-category | recognize visual object or scene
    stream-shroud | hear auditory object or stream
    spectral-pitch-and-timbre | recognize auditory object or stream
    item-list | recognize speech and language
    cognitive-emotional | feel emotion and know its source
  • image p270fig06.16 The same target position signal that can command the next saccade also updates a gain field that predictively maintains the attentional shroud in head-centered coordinates, even before the eye movement is complete. This process keeps the shroud invariant under eye movements, so that it can continue to inhibit reset of an emerging invariant category as it is associated with multiple object views, even while the conscious surface representation shifts with each eye movement in retinotopic coordinates. This updating process is often called predictive remapping.
    || Predictive remapping of eye movements! From V3A to LIP. [spatial attention, object attention, figure-ground separation, eye movement remapping, visual search]. (Beauvillaib etal 2005, Carlson-Radvansky 1999, Cavanaugh etal 2001, Fecteau & Munoz 2003, Henderson & Hollingworth 2003, Irwin 1991)
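  • A toy Python sketch of the coordinate bookkeeping behind the gain-field idea in the p270fig06.16 note above: a head-centered position equals retinotopic position plus eye position, so a shroud maintained in head-centered coordinates stays invariant across a saccade even though the retinotopic representation shifts. This is only the arithmetic of that invariance, not the neural gain-field circuit, and the numbers are illustrative.
    import numpy as np

    def to_head_centered(retinotopic_xy, eye_position_xy):
        """Gain-field-style remapping: head-centered = retinotopic + eye position."""
        return np.asarray(retinotopic_xy) + np.asarray(eye_position_xy)

    object_head = np.array([12.0, 5.0])                        # object location in head-centered degrees
    for eye in (np.array([0.0, 0.0]), np.array([8.0, -3.0])):  # before and after a saccade
        retinotopic = object_head - eye                        # the retinal image shifts...
        print(retinotopic, to_head_centered(retinotopic, eye)) # ...the head-centered location does not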
  • image p278fig06.27 A surface-shroud resonance through the Where stream enables us to consciously see an object while a feature-category resonance into the What stream enables us to recognize it. Both kinds of resonances can synchronize via visual cortex so that we can know what an object is when we see it.
    || What kinds of resonances support knowing vs seeing? What stream [knowing, feature-prototype resonance], Where stream [seeing, surface-shroud resonance]
  • image p278fig06.28 If the feature-category resonances cannot form, say due to a lesion in IT, then a surface-shroud resonance can still support conscious seeing of an attended object, and looking at or reaching for it, even if the individual doing so knows nothing about the object, as occurs during visual agnosia. The surface-shroud resonance supports both spatial attention and releases commands that embody the intention to move towards the attended object.
    || What kinds of resonances support knowing vs seeing? visual agnosia: reaching without knowing Patient DF (Goodale etal 1991). Attention and intention both parietal cortical functions (Andersen, Essick, Siegel 1985; Gnadt, Andersen 1988; Snyder, Batista, Andersen 1997, 1998)
  • image p355fig10.02 Distinguishing processes of seeing vs knowing has been difficult because they interact so strongly.
    || Seeing vs. Knowing. Seeing and knowing [operate at different levels of the brain, use specialized circuits], but they [interact via feedback, use similar cortical designs, feedback is needed for conscious perception]. Cerebral Cortex: Seeing [V1-V4, MT-MST], Knowing [IT, PFC].
  • image p369fig10.19 Data from (Watanabe etal 2001) showing perceptual learning of the coherent motion direction, despite the lack of extra-foveal attention and awareness of the moving stimuli.
    || Unconscious perceptual learning of motion direction, % correct for two tests, compared to chance level results.
  • image p396fig11.35 Three properties of bipole boundary grouping in V2 can explain how boundaries oscillate in response to rivalry-inducing stimuli. Because all boundaries are invisible, however, these properties are not sufficient to generate a conscious percept of rivalrous surfaces.
    || 3 V2 boundary properties cause binocular rivalry. 1. Bipole grouping, 2. Orientational competition, 3. Activity-dependent habituation
  • image p419fig12.17 The auditory continuity illusion illustrates the ART Matching Rule at the level of auditory streaming. Its "backwards in time" effect of future context on past conscious perception is a signature of resonance.
    || Auditory continuity illusion. input, percept. Backwards in time - How does a future sound let past sound continue through noise? Resonance! - It takes a while to kick in. After it starts, a future tone can maintain it much more quickly. Why does this not happen if there is no noise? - ART Matching Rule! TD harmonic filter is modulatory without BU input. It cannot create something out of nothing.
  • image p436fig12.30 The conscious ARTWORD, or cARTWORD, laminar cortical speech model simulates how future context can disambiguate noisy past speech sounds in such a way that the completed percept is consciously heard to proceed from past to future as a feature-item-list resonant wave propagates through time.
    || cARTWORD: Laminar cortical model macrocircuit (Grossberg, Kazerounian 2011) Simulates PHONEMIC RESTORATION: Cognitive Working Memory (processed item sequences) - [Excitatory-> inhibitory-> habituative-> adaptive filter-> adaptive filter-> adaptive filter with depletable synapse-> Acoustic [item, feature]
  • image p455fig12.52 Simulation of cARTWORD dynamics in response to the complete list /1/-/2/-/3/. The relevant responses are surrounded by a red box.
    || Presentation of a normal sequence: input /1/-/2/-/3/. |c(i,1)-5| vs time (msec). List chunks select most predictive code. Order stored in WM layers. Resonant activity of /1/-/2/-/3/ in item and feature layers corresponds to conscious speech percept.
  • image p484fig13.04 The top-down feedback from the orbitofrontal cortex closes a feedback loop that supports a cognitive-emotional resonance. If this resonance can be sustained long enough, it enables us to have feelings at the same time that we experience the categories that caused them.
    || Cognitive-Emotional resonance. Basis of "core consciousness" and "the feeling of what happens". (Damasio 1999) derives heuristic version of CogEM model from his clinical data. Drive-> amygdala-> prefrontal cortex-> sensory cortex, resonance around the latter 3. How is this resonance maintained long enough to become conscious?
  • Grossberg 2021 p081c2h0.66 sub-section: How does one evolve a computational brain?
    The above discussion illustrates that no single step of theoretical derivation can derive a whole brain. One needs a method for deriving a brain in stages, or cycles, much as evolution has incrementally discovered ever more complex brains over many thousands of years. The following theoretical method has been successfully applied many times since I first used it in 1957. It embodies a kind of conceptual evolutionary process for deriving a brain.

    Because "brain evolution needs to achieve behavioural success", we need to start with data that embodiey indices of behavioral success. That is why, as illustrated in
    Figure 2.37 Modelling method and cycle, one starts with Behavioral Data from scores or hundreds of psychological experiments. These data are analyszed as the result of an individual adapting autonomously in real time to a changing world. This is the Arty of Modeling. It requires that one be able to infer from static data curves the dynamical processes that control individual behaviors occuring in real time. One of the hardest things that I teach to my students to do is "how to think in real time" to be able to carry out this speculative leap.

    Properly carried out, this analysis leads to the discovery of new Design Principles that are embodied by these behavioral processes. The Design Principles highlight the functional meaning of the data, and clarify how individual behaviors occurring in real time give rise to these static data curves.

    These principles are then converted into the simplest Mathematical Model using a method of minimal anatomies, which is a form of Occam's Razor, or principle of parsimony. Such a mathematical model embodies the psychological principles using the simplest possible differential equations. By "simplest" I mean that, if any part of the derived model is removed, then a significant fraction of the targeted data could no longer be explained. One then analyzes the model mathematically and simulates it on the computer, showing along the way how variations on the minimal anatomy can realize the design principles in different individuals or species.

    This analysis has always provided functional explanations and Behavioral Predictions for much larger behavioral data bases than those used to discover the Design Principles. The most remarkable fact is, however, that the behaviorally derived model always looks like part of a brain, thereby explaining a body of challenging Neural Data and making novel Brain Predictions.

    The derivation hereby links mind to brain via psychological organizational principles and their mechanistic realization as a mathematically defined neural network. This startling fact is what I first experienced as a college Freshman taking Introductory Psychology, and it changed my life forever.

    I conclude from having had this experience scores of times since 1957 that brains look the way they do because they embody a natural computational realization for controlling autonomous adaptation in real-time to a changing world. Moreover, the Behavior -> Principles -> Model -> Neural derivation predicts new functional roles for both known and unknown brain mechanisms by linking the brain data to how it helps to ensure behavioral success. As I noted above, the power of this method is illustrated by the fact that scores of these predictions about brain and behavior have been supported by experimental data 5-30 years after they were first published.

    Having made the link from behavior to brain, one can then "burn the candle from both ends" by pressing both top-down from Behavioral Data and bottom-up from Brain Data to clarify what the model can and cannot explain at its current stage of derivation. No model can explain everything. At each stage of development, the model can cope with certain environmental challenges but not others. An important part of the mathematical and computational analysis is to characterize the boundary between the known and unknown; that is, which challenges the model can cope with and which it cannot. The shape of this boundary between the known and unknown helps to direct the theorist's attention to new design principles that have been omitted from previous analysis.

    The next step is to show how these new design principles can be incorporated into the evolved model in a self-consistent way, without undermining its previous mechanisms, thereby leading to a progressively more realistic model, one that can explain and predict ever more behavioral and neural data. In this way, the model undergoes a type of evolutionary development, as it becomes able to cope behaviorally with environmental constraints of ever increasing subtlety and complexity. The Method of Minimal Anatomies may hereby be viewed as a way to functionally understand how increasingly demanding combinations of environmental pressures were incorporated into brains during the evolutionary process.

    If such an Embedding Principle cannot be carried out - that is, if the model cannot be unlumped or refined in a self-consistent way - then the previous model was, put simply, wrong, and one needs to figure out which parts must be discarded. Such a model is, as it were, an evolutionary dead end. Fortunately, this has not happened to me since I began my work in 1957 because the theoretical method is so conservative. No theoretical addition is made unless it is supported by multiple experiments that cannot be explained in its absence. Where multiple mechanistic instantiations of some Design Principles were possible, they were all developed in models to better understand their explanatory implications. Not all of these instantiations could survive the pressure of the evolutionary method, but some always could. As a happy result, all earlier models have been capable of incremental refinement and expansion.

    The cycle of model evolution has been carried out many times since 1957, leading today to increasing numbers of models that individually can explain and predict psychological, neurophysiological, anatomical, biophysical, and even biochemical data. In this specific sense, the classical mind-body problem is being incrementally solved.

    Howell: bold added for emphasis.
    (keys : Principles-Principia, behavior-mind-brain link, brain evolution, cycle of model evolution)
    see also quotes: Charles William Lucas "Universal Force" and others (not retyped yet).
  • image p484fig13.04 The top-down feedback from the orbitofrontal cortex closes a feedback loop that supports a cognitive-emotional resonance. If this resonance can be sustained long enough, it enables us to have feelings at the same time that we experience the categories that caused them.
    || Cognitive-Emotional resonance. Basis of "core consciousness" and "the feeling of what happens". (Damasio 1999) derives heuristic version of CogEM model from his clinical data. Drive-> amygdala-> prefrontal cortex-> sensory cortex, resonance around the latter 3. How is this resonance maintained long enough to become conscious?
  • image p514fig13.44 Analog of the CogEM model in Figure 6.1 of (Damasio 1999).
    || (a) map of object X-> map of proto-self at inaugural instant-> [, map of proto-self modified]-> assembly of second-order map. (b) map of object X enhanced-> second-order map imaged.
  • image p105fig03.23 The pointillist painting A Sunday on La Grande Jatte by Georges Seurat illustrates how we group together large-scale coherence among the pixels of the painting, while also forming small groupings around the individual dabs of color.
    ||
  • image p107fig03.25 The Roofs of Collioure by Matisse. See the text for details
    || p107c1h0.6 "... [Matisse] showed how patches of pure color, when laid down properly on a canvas, could be grouped by the brain into emergent boundaries, without the intervention of visible outlines. ... The trick was that these emergent boundaries, being invisible, or amodal, did not darken the colors in the surface representations. In this sense, Matisse intuitively realized that "all boundaries are invisible" through the masterful way in which he arranged his colors on canvas to generate boundaries that could support compelling surface representations. ..."
  • image p108fig03.27 Matisse's painting Open Window, Collioure 1905 combines continuously colored surfaces with color patches that created surface representations using amodal boundaries, as in Figure 3.26. Both kinds of surfaces cooperate to form the final painterly percept.
    ||
  • image p110fig03.32 Claude Monet's painting of Poppies Near Argenteuil. See the text for details.
    || Claude Monet Poppies Near Argenteuil 1873. p110c2h0.35 "... the red poppies and the green field around them are painted to have almost the same luminance; that is, they are almost equiluminant. As a result, the boundaries between the red and green regions are weak and positionally unstable, thereby facilitating an occasional impression of the poppies moving in a gentle breeze, especially as one's attention wanders over the scene. ...".
  • image p120fig03.43 Four paintings by Monet of the Rouen cathedral under different lighting conditions (top row) and their monochromatic versions (bottom row). See the text for details.
    || p119c2h0.25 "... Monet uses nearby colors that are nearly equiluminant, and sharp, high-contrast luminance defined edges are sparse. He hereby creates weaker boundary signals within and between the parts of many forms, and stronger boundary signals between the forms. This combination facilitates color spreading within the forms and better separation of brightness and color differences between forms. ... The grayscale versions of these paintings demonstrate the near equiluminance of the brushstrokes within forms, and places in which brightness and color differences significantly influence the groupings that differentiate between forms, including the differentiation between the cathedral and the sky. ..."
  • image p120fig03.44 The Rouen cathedral at sunset generates very different boundary webs than it does in full sunlight, as illustrated by Figure 3.45.
    || Rouen Cathedral at sunset (Monet 1892-1894).
    • Lighting almost equiluminant
    • Most boundaries are thus caused by color differences, not luminance differences
    • Fine architectural details are obscured, leading to...
    • Coarser and more uniform boundary webs, so...
    • Less depth in the painting.
  • image p121fig03.45 The Rouen cathedral in full sunlight.
    || Rouen Cathedral full sunlight (Monet 1892-1894).
    • Lighting is strongly non-uniform across most of the painting
    • Strong boundaries due to both luminance and color differences
    • Fine architectural details are much clearer, leading to...
    • Finer and more non-uniform boundary webs, so...
    • Much more detail and depth
  • image p121fig03.46 The Rouen cathedral in full sunlight contains T-Junctions that are not salient in the painting of it at sunset. These are among the painting's features that give it a much more depthful appearance.
    || Rouen Cathedral full sunlight (Monet 1892-1894).
    • There are also more T-junctions where vertical boundaries occlude horizontal boundaries, or conversely...
    • Leading to more depth.
    p119c2h1.0 "... Such T-junction boundary occlusions ... can generate percepts of depth in the absence of any other visual clues. ...".
  • image p171fig04.49 An example of DaVinci stereopsis in which the left eye sees more of the wall between A and C than the right eye does. The region between B and C is seen only by the left eye because the nearer wall between C and D occludes it from the right eye view.
  • image p377fig11.11 DaVinci stereopsis phenomena occur when only one eye can receive visual inputs from part of a 3D scene due to occlusion by a nearer surface.
    || How does monocular information contribute to depth perception? DaVinci stereopsis (Gillam etal 1999). Only by utilizing monocular information can the visual system create the correct depth percept. [left, right] eye view
  • image p379fig11.13 How the 3D LAMINART model explains DaVinci stereopsis. All the stages of boundary and surface formation are color coded to clarify their explanation. Although each mechanism is very simple, when all of them act together, the correct depthful surface representation is generated. See the text for details.
    || DaVinci stereopsis (Nakayama, Shimojo 1990). An emergent property of the previous simple mechanisms working together. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] pair [Binocular match: boundaries of thick bar -> Add monocular boundaries along lines-of-sight -> Line-of-sight inhibition kills weaker vertical boundaries -> 3D surface percept not just a disparity match!] pair [Binocular match: right edge of thin and thick bars -> Strongest boundaries: binocular and monocular boundaries add -> Vertical boundaries from monocular left edge of thin bar survive -> Filling-in contained by connected boundaries]. cart [very near, near, fixation plane, far, very far]
  • image p380fig11.14 The model explanation of DaVinci stereopsis when the input stimuli have opposite contrast polarities.
    || Polarity-reversed Da Vinci stereopsis (Nakayama, Shimojo 1990). Same explanation! (... as Figure 11.13 ...) [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
  • image p381fig11.15 The same model mechanisms explain the surface percept that is generated by the variant of DaVinci stereopsis that Gillam, Blackburn, and Nakayama studied in 1999.
    || DaVinci stereopsis (Gillam, Blackburn, Nakayama 1999). same model mechanisms. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
  • image p382fig11.16 The version of DaVinci stereopsis wherein three narrow rectangles are binocularly matched with one thick rectangle can also be explained in a similar way.
    || DaVinci stereopsis of [3 narrow, one thick] rectangles (Gillam, Blackburn, Nakayama 1999). [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
  • p618 Chapter 17 A universal development code - Mental measurements embody universal laws of cell biology and physics
  • image p073fig02.19 Computing with cells: infinity does not exist in biology!
    || Computing in a bounded activity domain, Gedanken experiment (Grossberg 1970). Vm sub-areas [xm, B - xm], I(all m), m=[1, i, B].
    B = excitable sites
    xi(t) = excited sites (activity, potential)
    B - xi(t) = unexcited sites
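  • Howell: a minimal numerical sketch of the bounded-activity point above, assuming the generic shunting (mass-action) form dx/dt = -A*x + (B - x)*I with illustrative parameters of my own choosing: however large the input I becomes, the activity x stays below the B excitable sites.
    # Minimal sketch (illustrative parameters, not from the book):
    # the shunting equation dx/dt = -A*x + (B - x)*I keeps x below B for any input I.
    import numpy as np

    A, B, dt = 1.0, 10.0, 0.001            # decay rate, excitable sites, time step
    for I in (1.0, 10.0, 100.0):           # ever larger inputs
        x = 0.0
        for _ in range(20000):             # Euler integration to (near) equilibrium
            x += dt * (-A * x + (B - x) * I)
        print(f"I={I:6.1f}  x_eq={x:6.3f}  analytic B*I/(A+I)={B*I/(A+I):6.3f}")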
  • image p082fig02.37 My models begin with behavioral data, since brains are designed to achieve behavioral success. The text explains how models evolve in stages, through a process of successive refinements, or unlumpings. These unlumpings together carry out a kind of conceptual evolution, leading to models that can explain and predict ever larger psychological and neurobiological databases.
    || Modelling method and cycle.
    Behavioral data -(art of modeling)-> Design principles <- Neural data <-(brain predictions)- Mathematical model and analysis -(behavioral predictions)-> Behavioural data
    Operationalizes "proper level of abstraction"
    Operationalizes that you cannot "derive a brain" in one step.
  • image p501fig13.26 A simple differential equation describes the processes of transmitter accumulation and release that do their best, at a finite rate, to carry out unbiased transduction.
    || Transmitter accumulation and release. Transmitter y cannot be restored at an infinite rate: T = S*y, y ~= B. Differential equation: d[dt: y] = A*(B - y) - S*y = accumulate - release. Transmitter y tries to recover to ensure unbiased transduction. What if it falls behind? Evolution has exploited the good properties that happen then.
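  • Howell: the caption's transmitter equation can be integrated directly. A minimal sketch with made-up parameters: after a step increase in the signal S, the released (gated) signal T = S*y first overshoots and then habituates to a lower plateau as y is depleted.
    # Habituative transmitter gate:  dy/dt = A*(B - y) - S*y  (accumulate - release),  T = S*y
    # Parameters below are illustrative only.
    import numpy as np

    A, B, dt = 0.1, 1.0, 0.01
    S = np.where(np.arange(0, 60, dt) < 20, 0.2, 2.0)   # step increase in signal S at t = 20
    y, trace = A * B / (A + 0.2), []                    # start at equilibrium for S = 0.2
    for s in S:
        y += dt * (A * (B - y) - s * y)
        trace.append(s * y)                             # released, gated signal T = S*y
    T = np.array(trace)
    print("T just after the step:", round(T[int(20/dt) + 1], 3),
          "  T at the new plateau:", round(T[-1], 3))   # overshoot, then habituation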
  • Grossberg 2021 p081c2h0.66 sub-section: How does one evolve a computational brain?
    The above discussion illustrates that no single step of theoretical derivation can derive a whole brain. One needs a method for deriving a brain in stages, or cycles, much as evolution has incrementally discovered ever more complex brains over many thousands of years. The following theoretical method has been successfully applied many times since I first used it in 1957. It embodies a kind of conceptual evolutionary process for deriving a brain.

    Because "brain evolution needs to achieve behavioural success", we need to start with data that embodiey indices of behavioral success. That is why, as illustrated in Figure 2.37 Modelling method and cycle, one starts with Behavioral Data from scores or hundreds of psychological experiments. These data are analyszed as the result of an individual adapting autonomously in real time to a changing world. This is the Arty of Modeling. It requires that one be able to infer from static data curves the dynamical processes that control individual behaviors occuring in real time. One of the hardest things that I teach to my students to do is "how to think in real time" to be able to carry out this speculative leap.

    Properly carried out, this analysis leads to the discovery of new Design Principles that are embodied by these behavioral processes. The Design Principles highlight the functional meaning of the data, and clarify how individual behaviors occurring in real time give rise to these static data curves.

    These principles are then converted into the simplest Mathematical Model using a method of minimal anatomies, which is a form of Occam's Razor, or principle of parsimony. Such a mathematical model embodies the psychological principles using the simplest possible differential equations. By "simplest" I mean that, if any part of the derived model is removed, then a significant fraction of the targeted data could no longer be explained. One then analyzes the model mathematically and simulates it on the computer, showing along the way how variations on the minimal anatomy can realize the design principles in different individuals or species.

    This analysis has always provided functional explanations and Behavioral Predictions for much larger behavioral data bases than those used to discover the Design Principles. The most remarkable fact is, however, that the behaviorally derived model always looks like part of a brain, thereby explaining a body of challenging Neural Data and making novel Brain Predictions.

    The derivation hereby links mind to brain via psychological organizational principles and their mechanistic realization as a mathematically defined neural network. This startling fact is what I first experienced as a college Freshman taking Introductory Psychology, and it changed my life forever.

    I conclude from having had this experience scores of times since 1957 that brains look the way they do because they embody a natural computational realization for controlling autonomous adaptation in real-time to a changing world. Moreover, the Behavior -> Principles -> Model -> Neural derivation predicts new functional roles for both known and unknown brain mechanisms by linking the brain data to how it helps to ensure behavioral success. As I noted above, the power of this method is illustrated by the fact that scores of these predictions about brain and behavior have been supported by experimental data 5-30 years after they were first published.

    Having made the link from behavior to brain, one can then "burn the candle from both ends" by pressing both top-down from Behavioral Data and bottom-up from Brain Data to clarify what the model can and cannot explain at its current stage of derivation. No model can explain everything. At each stage of development, the model can cope with certain environmental challenges but not others. An important part of the mathematical and computational analysis is to characterize the boundary between the known and unknown; that is, which challenges the model can cope with and which it cannot. The shape of this boundary between the known and unknown helps to direct the theorist's attention to new design principles that have been omitted from previous analysis.

    The next step is to show how these new design principles can be incorporated into the evolved model in a self-consistent way, without undermining its previous mechanisms, thereby leading to a progressively more realistic model, one that can explain and predict ever more behavioral and neural data. In this way, the model undergoes a type of evolutionary development, as it becomes able to cope behaviorally with environmental constraints of ever increasing subtlety and complexity. The Method of Minimal Anatomies may hereby be viewed as a way to functionally understand how increasingly demanding combinations of environmental pressures were incorporated into brains during the evolutionary process.

    If such an Embedding Principle cannot be carried out - that is, if the model cannot be unlumped or refined in a self-consistent way - then the previous model was, put simply, wrong, and one needs to figure out which parts must be discarded. Such a model is, as it were, an evolutionary dead end. Fortunately, this has not happened to me since I began my work in 1957 because the theoretical method is so conservative. No theoretical addition is made unless it is supported by multiple experiments that cannot be explained in its absence. Where multiple mechanistic instantiations of some Design Principles were possible, they were all developed in models to better understand their explanatory implications. Not all of these instantiations could survive the pressure of the evolutionary method, but some always could. As a happy result, all earlier models have been capable of incremental refinement and expansion.

    The cycle of model evolution has been carried out many times since 1957, leading today to increasing numbers of models that individually can explain and predict psychological, neurophysiological, anatomical, biophysical, and even biochemical data. In this specific sense, the classical mind-body problem is being incrementally solved.

    Howell: bold added for emphasis.
    (keys : Principles-Principia, behavior-mind-brain link, brain evolution, cycle of model evolution)
    see also quotes: Charles William Lucas "Universal Force" and others (not retyped yet).
  • image p248fig05.49 This circuit of the LAMINART model helps to explain properties of Up and Down states during slow wave sleep, and how disturbances in ACh dynamics can disrupt them.
    ||
  • image p278fig06.28 If the feature-category resonances cannot form, say due to a lesion in IT, then a surface-shroud resonance can still support conscious seeing of an attended object, and looking at or reaching for it, even if the individual doing so knows nothing about the object, as occurs during visual agnosia. The surface-shroud resonance supports both spatial attention and releases commands that embody the intention to move towards the attended object.
    || What kinds of resonances support knowing vs seeing? visual agnosia: reaching without knowing Patient DF (Goodale etal 1991). Attention and intention both parietal cortical functions (Andersen, Essick, Siegel 1985; Gnadt, Andersen 1988; Snyder, Batista, Andersen 1997, 1998)
  • image p557fig15.26 Brain regions and processes that contribute to autistic behavioral symptoms when they become imbalanced in prescribed ways.
    || Basal Ganglia prolonged gate opening <-> { Amygdala emotionally depressed-> [hippocampus- hyperspecific learning; Cerebellum- adaptive timing fails; hypofrontal blocking fails, no Theory of Mind]-> Neocortex; Neocortex- rewards not received-> Amygdala}.
  • image p189fig05.04 The hippocampus is one of several brain regions that are important in learning and remembering about objects and events that we experience throughout life. The book will describe several hippocampal processes that contribute to this achievement in different ways.
    || hypothalamic nuclei, amygdala, hippocampus, cingulate gyrus, corpus callosum, thalamus
  • image p228fig05.37 A macrocircuit of the neurotrophic Spectrally Timed ART, or nSTART, model. I developed nSTART with my PhD student Daniel Franklin. It proposes how adaptively timed learning in the hippocampus, bolstered by Brain Derived Neurotrophic Factor, or BDNF, helps to ensure normal memory consolidation.
    || habituative gates, CS, US, Thalamus (sensory cortex, category learning, conditioned reinforcer learning, adaptively timed learning and BDNF), Amygdala (incentive motivation learning), Hippocampus (BDNF), Prefrontal Cortex (attention), Pontine nuclei, Cerebellum (adaptively timed motor learning)
  • image p233fig05.42 Mismatch-induced beta oscillations have been reported in at least three parts of the brain: V1, V4, and hippocampus. Although there may be other reasons for beta oscillations in the brain, those that are caused by a mismatch should be studied in concert with the gamma oscillations that occur during a good enough match. See the text for details.
    || Is there evidence for the [gamma, beta] prediction? Yes, in at least three parts of the brain, (Buffalo EA, Fries P, Landman R, Buschman TJ, Desimone R 2011, PNAS 108, 11262-11267) Does this difference in average oscillation frequencies in the superficial and deep layers reflect layer 4 reset? Superficial recording γ (gamma), Deep recording β (beta) (Berke etal 2008, hippocampus; Buschman and Miller 2009, FEF)
  • image p520fig14.02 Macrocircuit of the main brain regions, and connections between them, that are modelled in the unified predictive Adaptive Resonance Theory (pART) of cognitive-emotional and working memory dynamics. Abbreviations in red denote brain regions used in cognitive-emotional dynamics. Those in green denote brain regions used in working memory dynamics. Black abbreviations denote brain regions that carry out visual perception, learning and recognition of visual object categories, and motion perception, spatial representation and target tracking. Arrows denote non-excitatory synapses. Hemidiscs denote adaptive excitatory synapses. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown. Also not shown are output signals from cortical areas to motor responses. V1: striate, or primary, visual cortex; V2 and V4: areas of prestriate visual cortex; MT: Middle Temporal cortex; MST: Medial Superior Temporal area; ITp: posterior InferoTemporal cortex; ITa: anterior InferoTemporal cortex; PPC: Posterior Parietal Cortex; LIP: Lateral InterParietal area; VPA: Ventral PreArcuate gyrus; FEF: Frontal Eye Fields; PHC: ParaHippocampal Cortex; DLPFC: DorsoLateral PreFrontal Cortex; HIPPO: hippocampus; LH: Lateral Hypothalamus; BG: Basal Ganglia; AMYG: AMYGdala; OFC: OrbitoFrontal Cortex; PRC: PeriRhinal Cortex; VPS: Ventral bank of the Principal Sulcus; VLPFC: VentroLateral PreFrontal Cortex. See the text for further details.
    ||
  • image p532fig14.08 Macrocircuit of the ARTSCENE Search neural model for learning to search for desired objects by using the sequences of already experienced objects and their locations to predict what and where the desired object is. V1 = First visual area or primary visual cortex; V2 = Second visual area; V4 = Fourth visual area; PPC = Posterior Parietal Cortex; ITp = posterior InferoTemporal cortex; ITa = anterior InferoTemporal cortex; MTL = Medial Temporal Lobe; PHC = ParaHippoCampal cortex; PRC = PeriRhinal Cortex; PFC = PreFrontal Cortex; DLPFC = DorsoLateral PreFrontal Cortex; VPFC = Ventral PFC; SC = Superior Colliculus.
    ||
  • image p541fig15.02 The neurotrophic Spectrally Timed Adaptive Resonance Theory, or nSTART, model of (Franklin, Grossberg 2017) includes hippocampus to enable adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between CS and US.
    || Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. [CS, US] -> Sensory Cortex (SC) <- motivational attention <-> category learning -> Prefrontal Cortex (PFC). SC conditioned reinforcement learning-> Amygdala (cannot bridge the temporal gap) incentive motivational learning-> PFC. SC adaptively timed learning and BDNF-> Hippocampus (can bridge the temporal gap) BDNF-> PFC. PFC adaptively timed motor learning-> cerebellum.
  • image p543fig15.06 The circuit between dentate granule cells and CA1 hippocampal pyramid cells seems to compute spectrally timed responses. See the text for details.
    || Hippocampal interpretation. 1. Dentate granule cells (Berger, Berry, Thompson 1986): "increasing firing...in the CS period...the latency...was constant". 2. Pyramidal cells: "Temporal model" Dentate granule cells-> CA3 pyramids. 3. Convergence (Squire etal 1989): 1e6 granule cells, 1.6e5 CA3 pyramids. 80-to-1 (ri).
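  • Howell: a minimal numerical sketch of the spectral-timing idea behind this figure (a population of cells with different reaction rates whose habituatively gated signals peak at different delays); the equations and parameters below are illustrative stand-ins, not the published Grossberg-Schmajuk spectral timing model.
    # Cells with different rates activate at different speeds; a habituative gate y
    # makes the gated signal g = f(x)*y peak at a rate-dependent delay.
    import numpy as np

    rates = np.array([0.2, 0.1, 0.05, 0.025])            # a "spectrum" of reaction rates
    dt, steps = 0.05, 4000
    x, y = np.zeros_like(rates), np.ones_like(rates)
    peak_g, peak_t = np.zeros_like(rates), np.zeros_like(rates)
    f = lambda v: v**8 / (0.25**8 + v**8)                 # sharp sigmoid signal function
    for n in range(steps):
        x += dt * rates * (-x + (1.0 - x) * 1.0)          # activation driven by a sustained CS
        y += dt * rates * (0.02 * (1.0 - y) - f(x) * y)   # habituative gate depletes
        g = f(x) * y                                      # gated output of each cell
        better = g > peak_g
        peak_g[better], peak_t[better] = g[better], n * dt
    print("peak times of the gated signals:", peak_t)     # slower rates peak later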
  • image p549fig15.19 How the adaptively timed hippocampal spectrum T inhibits (red arrow) the orienting system A as motivated attention in orbitofrontal cortex Si(2) peaks at the ISI.
    || Conditioning, Attention, and Timing circuit. Hippocampus spectrum-> Amygdala orienting system-> neocortex motivational attention. Adaptive timing inhibits orienting system and maintains adaptively timed Motivated Attention on the CS.
  • image p557fig15.26 Brain regions and processes that contribute to autistic behavioral symptoms when they become imbalanced in prescribed ways.
    || Basal Ganglia prolonged gate opening <-> { Amygdala emotionally depressed-> [hippocampus- hyperspecific learning; Cerebellum- adaptive timing fails; hypofrontal blocking fails, no Theory of Mind]-> Neocortex; Neocortex- rewards not received-> Amygdala}.
  • image p573fig16.01 The experimental chamber (A) and neurophysiological recordings from a rat hippocampus (B) that led to the discovery of place cells. See the text for details.
    ||
  • image p575fig16.03 As a rat navigates in its experimental chamber (black curves), neurophysiological recordings disclose the firing patterns (in red) of (a) a hippocampal place cell and (b) an entorhinal grid cell.
    ||
  • image p578fig16.04 Cross-sections of the hippocampal regions and the inputs to them. See the text for details.
    || EC-> CA1-> CA3-> DG. Layers [V/VI, III, II].
  • image p583fig16.10 The GRIDSmap model is embedded into a more complete representation of the processing stages from receipt of angular head velocity and linear velocity signals to this learning of place cells.
    || GRIDSmap. Pre-wired 2D stripe cells, learns 2D grid cells. vestibular cells [angular head velocity-> head direction cells, linear velocity]-> stripe cells- small scale 1D periodic spatial code (ECIII)-> SOM grid cells entorhinal cortex- small scale 2D periodic spatial scale-> SOM place cells hippocampal cortex- large scale 2D spatial code (dentate/CA3). Unified hierarchy of SOMs.
  • image p600fig16.36 The entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories. See the text for details.
    || Entorhinal-hippocampal interactions as an ART system. Hippocampal place cells as spatial categories. Angular head velocity-> head direction cells-> stripe cells- small scale 1D periodic code (ECIII) SOM-> grid cells- small scale 2D periodic code (ECII) SOM-> place cells- larger scale spatial map (DG/CA3)-> place cells (CA1)-> conjunctive-coding cells (EC V/VI)-> top-down feedback back to stripe cells- small scale 1D periodic code (ECIII). stripe cells- small scale 1D periodic code (ECIII)-> place cells (CA1).
  • image p602fig16.37 Data showing the effect of hippocampal inactivation by muscimol on grid cell firing before, during, and six hours after the muscimol, reading from left to right.
    || Hippocampal inactivation disrupts grid cells (Bonnevie etal 2013). muscimol inactivation. spikes on trajectory: [before, after min [6-20, 20-40, 40-60, 6h]]. rate map (Hz) [18.6, 11.4, 9.5, 6.7, 10.8]. spatial autocorrelogram g=[1.12, 0.05, -0.34, 0.09, 1.27].
  • image p603fig16.38 Role of hippocampal feedback in maintaining grid fields. (a) Data showing the effect of hippocampal inactivation before and during muscimol inhibition of hippocampal cells, as in Figure 16.37. (b) Model simulation with normal grid fields. (c) Model simulation that emulates the effect of hippocampal inhibition on grid fields.
    || (a) Data: hippocampal inactivation [before, after] cart [spikes on trajectory (p: [18.6, 6.7] Hz), spatial autocorrelogram (g= [1.12, 0.09])]. (b) Model: noise-free path integration, [spikes on trajectory (p: 14.56 Hz), rate map, spatial autocorrelogram (g= 1.41), dynamic autocorrelogram (g=0.6)]. (c) Model: noisy path integration + non-specific tonic inhibition, [spikes on trajectory (p: 11.33 Hz), rate map, spatial autocorrelogram (g= 0.05), dynamic autocorrelogram (g=0.047)].
  • image p617fig16.50 The perirhinal and parahippocampal cortices enable adaptively timed reinforcement learning and spatial navigational processes that are modeled by Spectral Spacing models in the What and Where cortical streams, respectively, to be fused in the hippocampus.
    || What and Where inputs to the hippocampus (Diana, Yonelinas, Ranganath 2007). Adaptively timed conditioning and spatial navigation. Hippocampus <-> Entorhinal Cortex <-> [Perirhinal Cortex <-> what, Parahippocampal Cortex <-> where].
  • p190 Howell: [neural microcircuits, modal architectures] used in ART -
    bottom-up filters | top-down expectations | purpose
    instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star often used to learn the adaptive weights. top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features
    LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
    EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
    auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
    auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
    visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown.
      red - cognitive-emotional dynamics
      green - working memory dynamics
      black - see [bottom-up, top-down] lists
    EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
    Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

    background colours in the table signify :
    white | general microcircuit : a possible component of ART architecture
    lime green | sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
    light blue | post-perceptual cognition?
    pink | "the feeling of what happens" and knowing what event caused that feeling
  • image p419fig12.17 The auditory continuity illusion illustrates the ART Matching Rule at the level of auditory streaming. Its "backwards in time" effect of future context on past conscious perception is a signature of resonance.
    || Auditory continuity illusion. input, percept. Backwards in time - How does a future sound let past sound continue through noise? Resonance! - It takes a while to kick in. After it starts, a future tone can maintain it much more quickly. Why does this not happen if there is no noise? - ART Matching Rule! TD harmonic filter is modulatory without BU input. It cannot create something out of nothing.
  • image p420fig12.18 The ARTSTREAM model explains and simulates the auditory continuity illusion as an example of a spectral-pitch resonance. Interactions of ART Matching Rule and asymmetric competition mechanisms in cortical strip maps explain how the tone selects the consistent frequency from the noise in its own stream while separating the rest of the noise into another stream.
    || ARTSTREAM model (Grossberg 1999; Grossberg, Govindarajan, Wyse, Cohen 2004). SPINET. Frequency and pitch strips. Bottom Up (BU) harmonic sieve. Top Down (TD) harmonic ART matching. Exclusive allocation. Learn pitch categories based on early harmonic processing. A stream is a Spectral-Pitch Resonance!
  • image p425fig12.23 ARTSTREAM simulations of the auditory continuity illusion and other streaming properties (left column, top row). When two tones are separated by silence (Input), a percept of silence also separates them in a spectral-pitch resonance. (left column, bottom row). When two tones are separated by broadband noise, the percept of tone continues through the noise in one stream (stream 1) while the remainder of the noise occurs in a different stream (stream 2). (right column) Some of the other streaming properties that have been simulated by the ARTSTREAM model.
    || Auditory continuity does not occur without noise. Auditory continuity in noise. Other simulated streaming data.
  • p190 Howell: [neural microcircuits, modal architectures] used in ART -
    bottom-up filters | top-down expectations | purpose
    instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star often used to learn the adaptive weights. top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features
    LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
    EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
    auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
    auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
    visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown.
      red - cognitive-emotional dynamics
      green - working memory dynamics
      black - see [bottom-up, top-down] lists
    EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
    Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

    background colours in the table signify :
    white | general microcircuit : a possible component of ART architecture
    lime green | sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
    light blue | post-perceptual cognition?
    pink | "the feeling of what happens" and knowing what event caused that feeling
  • p404 Chapter 12 From seeing and reaching to hearing and speaking - Circular reaction, streaming, working memory, chunking, and number
  • image p030tbl01.02 The What and Where cortical processing streams obey complementary laws. These laws enable the What stream to rapidly and stably learn invariant object categories without experiencing catastrophic forgetting, while the Where stream learns labile spatial and action representations to control actions that are aimed towards these objects.
    ||
    WHAT | WHERE
    spatially-invariant object learning and recognition | spatially-variant reaching and movement
    fast learning without catastrophic forgetting | continually update sensory-motor maps and gains
    IT InferoTemporal Cortex | PPC Posterior Parietal Cortex
     | What | Where
    matching | excitatory | inhibitory
    learning | match | mismatch
  • image p032fig01.21 At least three parallel visual cortical streams respond to visual inputs that reach the retina. Two parvocellular streams process visual surfaces (blob stream) and visual boundaries (interblob stream). The magnocellular stream processes visual motion.
    || [Retina, LGNs, V[1,2,3,4], MT] to [What- inferotemporal areas, Where- parietal areas]: visual parallel streams [2x blob, 1x bound]
  • image p039tbl01.03 The link between consciousness and movement
    ||
    VISUAL | seeing, knowing, and reaching
    AUDITORY | hearing, knowing, and speaking
    EMOTIONAL | feeling, knowing, and acting
  • image p092fig03.05 A cross-section of the retinal layer. Note that light stimuli need to go through all retinal layers before they reach the photoreceptor layer at which the light signals are registered.
    || light stimuli ->
    retinal layers | cellular composition
    inner limiting membrane |
    retinal nerve fibre | ganglion nerve fibres
    ganglion cell | ganglion
    inner plexiform | amacrine
    inner nuclear | horizontal
    outer plexiform |
    outer limiting membrane |
    photoreceptor | rod
    photoreceptor | cone
    retinal pigment epithelium |
    <- signal transduction. http://brain.oxfordjournals.org/content/early/2011/01/20/brain.awq346
  • image p232fig05.41 (a)-(c). The sequence of interlaminar events that SMART predicts during a mismatch reset. (d) Some of the compatible neurophysiological data.
    || Mismatch causes layer 5 dendritic spikes that trigger reset. (a) Arousal causes increase in nonspecific thalamic nuclei firing rate and layer 5 dendritic and later somatic spikes (Larkum and Zhu 2002, Williams and Stuart 1999) (b) Layer 5 spikes reach layer 4 via layer 6i and inhibitory neurons (Lund and Boothe 1975, Gilbert and Wiesel 1979) (c) habituative neurotransmitters in layer 6i shift the balance of active cells in layer 4 (Grossberg 1972, 1976) (d) Dendritic stimulation fires layer 5 (Larkum and Zhu 2002) stimulation apical dendrites of nonspecific thalamus
  • image p278fig06.28 If the feature-category resonances cannot form, say due to a lesion in IT, then a surface-shroud resonance can still support conscious seeing of an attended object, and looking at or reaching for it, even if the individual doing so knows nothing about the object, as occurs during visual agnosia. The surface-shroud resonance supports both spatial attention and releases commands that embody the intention to move towards the attended object.
    || What kinds of resonances support knowing vs seeing? visual agnosia: reaching without knowing Patient DF (Goodale etal 1991). Attention and intention both parietal cortical functions (Andersen, Essick, Siegel 1985; Gnadt, Andersen 1988; Snyder, Batista, Andersen 1997, 1998)
  • image p303fig08.20 The G-wave speeds up with the distance between flashes at a fixed delay, and has a consistent motion across multiple spatial scales.
    || G-wave properties (Grossberg 1977). Theorem 2 (Equal half-time property): the half-time is the time at which the motion signal reaches position w=L/2. Apparent motion speed-up with distance: this half-time is independent of the distance L between the two flashes. Consistent motion across scales: half-time is independent of the scale size K. Method of proof: elementary algebra and calculus (Grossberg, Rudd 1989 appendix)
  • image p304fig08.21 A computer simulation of the equal half-time property whereby the apparent motions within different scales that respond to the same flashes all reach the half-way point in the motion trajectory at the same time.
    || Equal half-time property: how multiple scales cooperate to generate motion percept. Travelling waves from Gaussian filters of different sizes bridge the same distance in comparable time. The time needed to bridge half the distance between flashes is the same.
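  • Howell: the equal half-time property can be illustrated with a toy version of the construction: a waning Gaussian centred on the first flash plus a waxing Gaussian centred on the second. The peak of the sum crosses the midpoint L/2 at the same time regardless of the flash separation L and the Gaussian scale K. The exponential waxing/waning and the parameter values are my own illustrative choices (with L kept small enough relative to K that the sum has a single travelling peak), not the model's actual equations.
    # Toy illustration: the time at which the travelling peak reaches w = L/2
    # is independent of both L and K (here it is ln 2 for exponential gains).
    import numpy as np

    def half_time(L, K, tau=1.0):
        w = np.linspace(-3*K, L + 3*K, 4001)                # spatial grid
        for t in np.arange(0.0, 5.0, 0.001):
            wane, wax = np.exp(-t/tau), 1.0 - np.exp(-t/tau)
            x = wane*np.exp(-w**2/(2*K**2)) + wax*np.exp(-(w - L)**2/(2*K**2))
            if w[np.argmax(x)] >= L/2:                      # peak has reached the midpoint
                return t

    for L, K in [(1.0, 1.0), (2.0, 1.5), (3.0, 2.0)]:       # keep L <= 2K for a single peak
        print(f"L={L}, K={K}: half-time = {half_time(L, K):.3f}  (ln 2 = {np.log(2):.3f})")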
  • image p335fig08.61 Behavioral data (left image) and simulation (right image) about speed in correct and error trials of the RT task. See text for details.
    || Behavioral data: speed, correct and error trials (RT task) (Roitman, Shadlen 2002). More coherence in the motion causes faster reaction time.
  • image p350fig09.22 How the negative Gaussian of an obstacle causes a peak shift to avoid the obstacle without losing sight of how to reach the goal.
    || Steering dynamics: obstacle avoidance. body-centered coordinates [obstacle, goal, heading] -> steering
  • image p351fig09.25 By the time MT+ is reached, directional transient cells and directional filters have begun to extract more global directional information from the image.
    || MT+ computes global motion estimate. Estimate global motion from noisy local motion estimates.
  • image p414fig12.11 Neurophysiological data from cortical areas 4 and 5 (every other column) and simulations thereof (other columns) during a reach.
    || activation vs time. (a) area 4 phasic RT (IFV) (b) area 4 tonic (OPV) (c) area 4 phasic-tonic (OFPV) (d) area 4 phasic MT (DVV) (e) area 5 phasic (DV) (f) area 5 tonic (PPV)
  • image p416fig12.13 The DIRECT model learns, using a circular reaction that is energized by an Endogenous Random Generator, or ERG, to make motor-equivalent volitionally-activated reaches. This circular reaction learns a spatial representation of a target in space. It can hereby make accurate reaches with clamped joints and on its first try using a tool under visual guidance; see Figure 12.16.
    || DIRECT model (Bullock, Grossberg, Guenther 1993). learns by circular reaction. learns a spatial representation to mediate between vision and action. motor-equivalent reaching. can reach target with clamped joints. can reach target with a TOOL on the first try under visual guidance. How did tool use arise?!
  • image p416fig12.14 Computer simulations of DIRECT reaches with (b) a tool, (c) a clamped elbow, and (d) with a blindfold, among other constraints.
    || Computer simulations of DIRECT reaches [unconstrained, with TOOL, elbow clamped at 140°, blindfolded]
  • image p417fig12.15 The DIRECT and DIVA models have homologous circuits to learn and control motor-equivalent reaching and speaking, with tool use and coarticulation as resulting properties. See the text for why.
    || From Seeing and Reaching to Hearing and Speaking, Circular reactions (Piaget 1945, 1951, 1952). Homologous circuits for development and learning of motor-equivalent REACHING and SPEAKING. DIRECT TOOL use (Bullock, Grossberg, Guenther 1993), DIVA Coarticulation (Guenther 1995)
  • image p428fig12.25 (left architecture) Auditory-articulatory feedback loop whereby babbled sounds activate learning in an imitative map that is later used to learn to reproduce the sounds of other speakers. An articulatory-to-auditory expectation renders learning possible by making the auditory and motor data dimensionally consistent, as in the motor theory of speech. (right architecture) Parallel streams in the ARTSPEECH model for learning speaker-independent speech and language meaning, including a mechanism for speaker normalization (right cortical stream) and for learning speaker-dependent vocalic qualities (left cortical stream).
    || left: Speaker-dependent vocalic qualities; right: Speaker-independent speech and language meaning
  • image p430fig12.26 The NormNet model shows how speaker normalization can be achieved using specializations of the same mechanisms that create auditory streams. See the text for how.
    || [Anchor vs Stream] log frequency map. -> diagonals-> Speaker-independent acoustic item information-> [BU adaptive filter, TD learned expectation]-> learned item recognition categories
  • image p446fig12.44 (left column, top row) LIST PARSE can model linguistic data from human subjects. In this figure, model parameters are fixed to enable a close fit to data about error-type distributions in immediate free recall experiments, notably transposition errors. (right column, top row) Simulation and data showing bowing of the serial position curve, including an extended primacy gradient. (left column, bottom row) The simulation curve overlays data about list length effects, notably the increasing recall difficulty of longer lists during immediate serial recall (ISR). (right column, bottom row) Simulation (bottom image) and data (top image) of the limited temporal extent for recall.
    || (1. TL) Error-type distributions in immediate serial recall (Henson etal 1996). % occurrence vs serial position. Graph convention: Data- dashed lines; Simulations- solid lines. Six letter visual ISR. Order errors- transpositions of neighboring items are the most common. Model explanation: Noisy activation levels change relative order in primacy gradient. Similar activation of neighboring items most susceptible to noise. Model parameters fitted on these data. (2. TR) Bowing of serial position curve (Cowan etal 1999). % correct vs serial position. Auditory ISR with various list lengths (graphs shifted rightward): For [, sub-]span lists- extended primacy, with one (or two) item recency; Auditory presentation- enhanced performance for last items. LIST PARSE: End effects- first and last items half as many members; Echoic memory- last presented item retained in separate store. (3. BL) List length effects, circles (Crannell, Parrish 1968), squares (Baddeley, Hitch 1975), solid line- simulation. % list correct vs list length. Variable list length ISR: longer lists are more difficult to recall. LIST PARSE: More items- closer activation levels and lower absolute activity level with enough inputs; Noise is more likely to produce order errors, Activity levels more likely to drop below threshold;. (4. BR) Limited temporal extent for recall (Murdock 1961). % recalled vs retention interval (s). ISR task with distractor-filled retention intervals (to prevent rehearsal): Increasing retention interval - decreases probability of recalling list correctly; Load dependence- longer list more affected by delays; Performance plateau- subjects reach apparent asymptote. LIST PARSE: Increase convergence of activities with time; loss of order information;.
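  • Howell: the explanation in the top-left panel ("noisy activation levels change relative order in the primacy gradient; similar activations of neighboring items are most susceptible to noise") can be illustrated with a toy recall simulation; this is my own cartoon, not the LIST PARSE equations.
    # Recall from a noisy primacy gradient: most order errors are transpositions of neighbors.
    import numpy as np

    rng = np.random.default_rng(0)
    n_items, trials, noise = 6, 20000, 0.06
    gradient = np.linspace(1.0, 0.7, n_items)            # primacy gradient of activations
    displaced = np.zeros(n_items)
    for _ in range(trials):
        recalled = np.argsort(-(gradient + noise * rng.standard_normal(n_items)))
        for out_pos, item in enumerate(recalled):
            displaced[abs(out_pos - item)] += 1          # how far each item moved
    print("mean items per trial at displacement 0,1,2,...:", (displaced / trials).round(3))
    # nearly all order errors are +-1 transpositions between neighboring items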
  • image p231fig05.39 The SMART hypothesis testing and learning cycle predicts that vigilance increases when a mismatch in subcortical regions like the nonspecific thalamus activates the nucleus basalis of Meynert which, in turn, broadcasts a burst of the neurotransmitter acetylcholine, or ACh, to deeper cortical layers. Due to the way in which LAMINART proposes that cortical matching and mismatching occurs, this ACh burst can increase vigilance and thereby trigger a memory search. See the text for details.
    || [BU input, [, non]specific thalamic nucleus, thalamic reticulate nucleus, neocortical laminar circuit] cart [Arousal, Reset, Search, Vigilance]
  • image p232fig05.41 (a)-(c). The sequence of interlaminar events that SMART predicts during a mismatch reset. (d) Some of the compatible neurophysiological data.
    || Mismatch causes layer 5 dendritic spikes that trigger reset. (a) Arousal causes increase in nonspecific thalamic nuclei firing rate and layer 5 dendritic and later somatic spikes (Larkum and Zhu 2002, Williams and Stuart 1999) (b) Layer 5 spikes reach layer 4 via layer 6i and inhibitory neurons (Lund and Boothe 1975, Gilbert and Wiesel 1979) (c) habituative neurotransmitters in layer 6i shift the balance of active cells in layer 4 (Grossberg 1972, 1976) (d) Dendritic stimulation fires layer 5 (Larkum and Zhu 2002) stimulation apical dendrites of nonspecific thalamus
  • image p461fig12.58 The lisTELOS model built upon key processes that were earlier modeled by the TELOS model. See the text for details.
    || TELOS model (Brown, Bullock, Grossberg 1999, 2004). shows [BG nigro-[thalamic, collicular], FEF, ITa, PFC, PNR-THAL, PPC, SEF, SC, V1, V4/ITp, Visual Cortex input] and [GABA].
  • image p523fig14.03 (a) The MOTIVATOR neural model generalizes CogEM by also including the basal ganglia. It can hereby explain and simulate complementary functions of the amygdala and basal ganglia (SNc) during conditioning and learned performance. The basal ganglia generate Now Print signals in response to unexpected rewards. These signals modulate learning of new associations in many brain regions. The amygdala supports motivated attention to trigger actions that are expected to occur in response to conditioned or unconditioned stimuli. Object Categories represent visual or gustatory inputs in anterior inferotemporal (ITA) and rhinal (RHIN) cortices, respectively. Value Categories represent the value of anticipated outcomes on the basis of hunger and satiety inputs, in amygdala (AMYG) and lateral hypothalamus (LH). Object-Value Categories resolve the value of competing perceptual stimuli in medial (MORB) and lateral (ORB) orbitofrontal cortex. The Reward Expectation Filter detects the omission or delivery of rewards using a circuit that spans ventral striatum (VS), ventral pallidum (VP), striosomal delay (SD) cells in the ventral striatum, the pedunculopontine nucleus (PPTN) and midbrain dopaminergic neurons of the substantia nigra pars compacta/ventral tegmental area (SNc/VTA). The circuit that processes CS-related visual information (ITA, AMYG, ORB) operates in parallel with a circuit that processes US-related visual and gustatory information (RHIN, AMYG, MORB). (b) Reciprocal adaptive connections between hypothalamus and amygdala enable amygdala cells to become learned value categories. The bottom region represents hypothalamic cells, which receive converging taste and metabolite inputs whereby they become taste-drive cells. Bottom-up signals from activity patterns across these cells activate competing value category, or US Value Representations, in the amygdala. A winning value category learns to respond selectively to specific combinations of taste-drive activity patterns and sends adaptive top-down priming signals back to the taste-drive cells that activated it. CS-activated conditioned reinforcer signals are also associatively linked to value categories. Adaptive connections end in (approximately) hemidiscs. See the text for details.
    ||
  • image p524fig14.04 (a) Model basal ganglia circuit for the control of dopaminergic Now Print signals from the substantia nigra pars compacta, or SNc, in response to unexpected rewards. Cortical inputs (Ii), activated by conditioned stimuli, learn to excite the SNc via a multi-stage pathway from the ventral striatum (S) to the ventral pallidum and then on to the PPTN (P) and the SNc (D). The inputs Ii excite the ventral striatum via adaptive weights W_IS, and the PPTN excites the SNc with strength W_PD. The striosomes, which contain an adaptive spectral timing mechanism [xij, Gij, Yij, Zij], learn to generate adaptively timed signals that inhibit reward-related activation of the SNc. Primary reward signals (I_R) from the lateral hypothalamus both excite the PPTN directly (with strength W_RP) and act as training signals to the ventral striatum S (with strength W_RS) that trains the weights W_IS. Arrowheads denote excitatory pathways, circles denote inhibitory pathways, and hemidiscs denote synapses at which learning occurs. Thick pathways denote dopaminergic signals.
    ||
  • image p559fig15.27 Brain regions and processes that contribute to the release of dopaminergic Now Print signals by the substantia nigra pars compacta, or SNc, in response to unexpected reinforcing events. See the text for details.
    || Model of spectrally timed SNc learning (Brown, Bullock, Grossberg 1999). Delayed inhibitory expectations of reward. Dopamine cells signal an error in reward prediction timing or magnitude. Immediate excitatory predictions of reward. Lateral hypothalamus (Primary Reward Input)-> [(+)ventral striatum <-> ventral pallidum (+)-> PPTN(+)-> SNc]. SNc-> [dopamine signal -> ventral striatum, Striosomal cells]. Conditioned Stimuli (CS)(+)-> [ventral striatum, striosomal cells]. Striosomal cells(-)-> SNc.
  • image p560fig15.29 Excitatory pathways that support activation of the SNc by a US and the conditioning of a CS to the US.
    || Excitatory pathway. Primary reward (apple juice) briefly excites lateral hypothalamus. Hypothalamic-PPTN excitation causes SNc dopamine burst. Hypothalamic activity excites ventral striatum for training. Active CS working memory signals learn to excite ventral striatum. Lateral hypothalamus (Primary Reward Input)-> [(+)ventral striatum <-> ventral pallidum(+)-> PPTN(+)-> SNc]. SNc-> [dopamine signal -> ventral striatum. Conditioned Stimuli working memory trace (CS)(+)-> ventral striatum.
  • image p560fig15.30 The inhibitory pathway from striosomal cells to the SNc is able to inhibit the SNc when a reward occurs with expected timing and magnitude.
    || Inhibitory pathway. Learning: CS-striosomal LTP occurs due to a three-way coincidence [An active CS working memory input, a Ca2+ spike, a dopamine burst]; Signaling: The delayed Ca2+ spike facilitates striosomal-SNc inhibition;. Striosomal cells learn to predict both timing and magnitude of reward signal to cancel it: reward expectation;. Conditioned stimuli (CS) LTP-> Striosomal cells <- dopamine | (-)-> SNc->.
  • image p561fig15.32 The SNc can generate both dopamine bursts and dips in response to rewards whose amplitude is unexpectedly large or small.
    || Inhibitory pathway: expectation magnitude. 1. If reward is greater than expected, a dopamine burst causes striosomal expectation to increase. 2. If reward is less than expected, a dopamine dip causes striosomal expectation to decrease. 3. This is a negative feedback control system for learning. Conditioned stimuli (CS)-> Striosomal cells <- dopamine | (-)-> SNc->.
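  • Howell: the "negative feedback control system for learning" in this figure can be caricatured by a scalar loop in which dopamine bursts raise, and dips lower, the striosomal expectation until it cancels the reward. This cartoon covers only the magnitude part, not the spectrally timed circuit, and the learning rate is arbitrary.
    # Dopamine signal = reward - expectation; bursts (>0) raise the expectation, dips (<0) lower it.
    reward, expectation, lr = 1.0, 0.0, 0.2
    for trial in range(12):
        dopamine = reward - expectation          # burst if positive, dip if negative
        expectation += lr * dopamine             # striosomal expectation adapts
        print(f"trial {trial:2d}: dopamine = {dopamine:+.3f}, expectation = {expectation:.3f}")
    # if the reward later shrinks, dopamine dips until the expectation re-converges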
  • image p569fig15.40 The direct and indirect basal ganglia circuits that control GO and STOP movement signals. See the text for details.
    || [Direct path GO(+), Indirect path STOP(+), dopamine from SNc(+-)]-> striatum. GO-> GPi/SNr-> Thalamus (VA/Vlo) <-> frontal cortex. STOP-> GPe <-> STN-> GPi/SNr. NAc-> GPi/SNr.
  • image p375fig11.06 The contrast constraint on binocular fusion is realized by obligate cells in layer 3B of cortical area V1.
    || Model implements contrast constraint on binocular fusion (cf. "obligate" cells Poggio 1991). An ecological constraint on cortical development. [left, right] eye cart V1-[4 monocular simple, 3B binocular simple, complex 2/3A] cells. Inhibitory cells (red) ensure that fusion occurs when contrasts in left and right eye are approximately equal.
  • image p376fig11.09 The disparity filter in V2 helps to solve the correspondence problem by eliminating spurious contrasts using line-of-sight inhibition.
    || Model V2 disparity filter solves the correspondence problem. An ecological constraint on cortical development. [left, right] eye view: False matches (black) suppressed by line-of-sight inhibition (green lines). "Cells that fire together wire together".
  • image p581fig16.06 The learning of hexagonal grid cell receptive fields as an animal navigates an open field is a natural consequence of simple trigonometric properties of the positions at which the firing of stripe cells that are tuned to different directions will co-occur.
    || The Trigonometry of spatial navigation. Coactivation of stripe cells.
  • image p583fig16.09 The GRIDSmap model used algorithmically defined stripe cells to process realistic rat trajectories. The stripe cell outputs then formed inputs to the adaptive filter of a self-organizing map which learned hexagonal grid cell receptive fields.
    || GRIDSmap. Self-organizing map receives inputs from stripe cells and learns to respond to most frequent co-activation patterns. Stripe cells combine speed and head direction to create a periodic 1D position code. Virtual rat navigated using live rat trajectories from Moser Lab. Speed and head direction drives stripe cells.
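  • Howell: the self-organizing-map stage can be sketched with a standard competitive (instar) learning rule in which the winning map cell moves its weight vector toward the current stripe-cell activity pattern. The random binary "stripe" input below is only a stand-in for GRIDSmap's actual stripe-cell code.
    # Minimal SOM / instar sketch: winner-take-all choice plus instar weight update.
    import numpy as np

    rng = np.random.default_rng(1)
    n_stripe, n_map, lr = 12, 4, 0.05
    W = rng.random((n_map, n_stripe))                    # adaptive filter weights
    for _ in range(5000):
        s = (rng.random(n_stripe) < 0.3).astype(float)   # stand-in stripe-cell pattern
        j = np.argmax(W @ s)                             # winner-take-all category choice
        W[j] += lr * (s - W[j])                          # instar learning toward the input
    print(W.round(2))   # each row converges toward a frequently co-active input combination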
  • image p584fig16.11 GRIDSmap simulation of the learning of hexagonal grid fields. See the text for details.
    || Simulation results. Multiple phases per scale. response vs length scale (0.5m+).
  • image p585fig16.13 Hexagonal grid cell receptive fields develop if their stripe cell directional preferences are separated by 7, 10, 15, 20, or random numbers of degrees. The number and directional selectivities of stripe cells can thus be chosen within broad limits without undermining grid cell development.
    ||
  • image p585fig16.14 Superimposing firing of stripe cells whose directional preferences differ by 60 degrees supports learning hexagonal grid cell receptive fields in GRIDSmap.
    || GRIDSmap: from stripe cells to grid cells. Grid-cell Regularity from Integrated Distance through Self-organizing map. Superimposing firing of stripe cells oriented at intervals of 60 degrees. Hexagonal grid!
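  • Howell: the 60-degree superposition can be checked directly by summing three cosine "stripe" fields whose preferred directions differ by 60 degrees: the summed coactivation field is invariant under a 60-degree rotation (a hexagonal lattice of peaks) but not under a 90-degree rotation. Wavelength and sampling below are arbitrary choices.
    # Sum of three periodic stripe fields at 0, 60, 120 degrees has hexagonal symmetry.
    import numpy as np

    wavelength = 1.0
    def stripe_sum(p):                                   # p: (N, 2) array of positions
        v = np.zeros(len(p))
        for deg in (0, 60, 120):
            d = np.array([np.cos(np.radians(deg)), np.sin(np.radians(deg))])
            v += np.cos(2*np.pi * (p @ d) / wavelength)  # periodic stripe-cell firing
        return v

    rng = np.random.default_rng(0)
    p = rng.uniform(-2, 2, (2000, 2))
    for rot in (60, 90):                                 # rotate sample points, compare fields
        a = np.radians(rot)
        R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
        err = np.max(np.abs(stripe_sum(p) - stripe_sum(p @ R.T)))
        print(f"rotation by {rot} deg: max change in the summed field = {err:.2e}")
    # 60 deg: unchanged (hexagonal); 90 deg: changed. Stripes 45 deg apart give a square lattice instead.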
  • image p586fig16.15 Superimposing stripe cells oriented by 45 degrees does not lead to learning of rectangular grids in GRIDSmap, but it does in an oscillatory inference model.
    || Why is a hexagonal grid favored? Superimposing firing of stripe cells oriented at intervals of 45 degrees. Rectangular grid. This and many other possibilities do not happen in vivo. They do happen in the oscillatory inference model. How are they prevented in GRIDSmap?
  • image p587fig16.17 A finer analysis of the 2D trigonometry of spatial navigation showed that both the frequency and amplitude of coactivations by stripe cells determine the learning of hexagonal grid fields.
    || A refined analysis: SOM amplifies most frequent and energetic coactivations (Pilly, Grossberg 2012). [linear track, 2D environment]. (left) Stripe fields separated by 90°. 25 coactivations by 2 inputs. (right) Stripe fields separated by 60°. 23 coactivations by 3 inputs.
  • image p588fig16.18 Simulations of coordinated learning of grid cell receptive fields (second row) and unimodal place cell receptive fields (third row) by the hierarchy of SOMs in the GridPlaceMap model. Note the exquisite regularity of the hexagonal grid cell firing fields.
    || [stripe, grid, place] cells vs [spikes on trajectory, unsmoothed rate map, smoothed rate map].
  • image p605fig16.39 Data showing effects of medial septum (MS) inactivation on grid cells and network theta oscillations in medial entorhinal cortex (MEC). (A) Examples of disruption in the spatial expression of the hexagonal grid structure for two grid cells (Brandon etal 2011). (B) Temporal reduction in the power and frequency of network theta oscillations (Koenig etal 2011). (C) Temporary reduction in the gridness score, mean firing rate, and spatial stability of grid cells (Koenig etal 2011).
    || Disruptive effects of Medial Septum inactivation in Medial Entorhinal Cortex (Brandon etal 2011; Koenig etal 2011). (A) Rate map [rate map, spatial autocorrelations, trajectory] vs [baseline, sub-sampled, medial septum inactivation, 3-6 hour recovery, 24 hour recovery], [rate map (Hz- m, p), spatial autocorrelations (gridness)][ 1.2, 7.2, 1.1; 0.25, 1.7, 0.6; 0.25, 2.5, -0.53; 0.7, 5.1, 0.55; 1.0, 5.3, 1.3; 2.1, 15, 0.19; 1.7, 12, 0.71; 1.7, 3.2, -0.22; 1.8, 9.1, 0.68; 2.5, 13, 0.46]. (B) [normalized power at 7-9 Hz, frequency (Hz)] vs 5-minute periods. (C) [mean gridness score (+-SEM), mean firing rate (% of baseline), mean correlation coeff (+-SEM)] vs 10-minute periods.
  • p353 Chapter 10 Laminar computing by cerebral cortex - Towards a unified theory of biological and artificial intelligence
  • image p011fig01.07 The choice of signal function f determines how an initial activity pattern will be transformed and stored in short-term memory (STM). Among [same, slower, faster]-than-linear signal functions, only the last one can suppress noise. It does so as it chooses the population that receives the largest input for storage, while suppressing the activities of all other populations, thereby giving rise to a winner-take-all choice.
    || initial pattern (xi(0) vs i):
    f: signal function; Xi(∞) = xi(∞)/sum[j: xj(∞)]: stored pattern; x(∞): total activity
    linear: perfect storage of any pattern; amplifies noise (or no storage)
    slower-than-linear: saturates; amplifies noise
    faster-than-linear: chooses max [winner-take-all, Bayesian], categorical perception; suppresses noise, [normalizes, quantizes] total activity, finite state machine
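A minimal numerical sketch of the table above, assuming the standard recurrent shunting on-center off-surround form of the network; the parameter values and the three example signal functions are illustrative choices of mine, not taken from the book:

```python
import numpy as np

def run_network(f, x0, A=0.1, B=2.0, dt=0.01, steps=5000):
    """Recurrent shunting on-center off-surround network with signal function f."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        s = f(x)
        dx = -A * x + (B - x) * s - x * (s.sum() - s)  # self-excitation, lateral inhibition
        x += dt * dx
    return x

x0 = [0.20, 0.35, 0.50, 0.30]                       # initial activity pattern (arbitrary)
signal_functions = {
    "linear":             lambda w: w,               # stores the pattern's ratios
    "slower-than-linear": lambda w: w / (0.5 + w),   # flattens the pattern
    "faster-than-linear": lambda w: w ** 2,          # winner-take-all, noise suppressed
}
for name, f in signal_functions.items():
    x = run_network(f, x0)
    print(name, np.round(x / max(x.sum(), 1e-9), 3))
```

Only the faster-than-linear signal function converges to a winner-take-all choice; the linear function stores the initial ratios, and the slower-than-linear function flattens them.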
  • image p357fig10.04 Laminar Computing achieves its properties by computing in a new way that synthesizes the best properties of feedforward and feedback interactions, analog and digital computations, and preattentive and attentive learning. The property of analog coherence enables coherent groupings and decisions to form without losing sensitivity to the amount of evidence that supports them.
    || Laminar Computing: a new way to compute. 1. Feedforward and feedback: a) Fast feedforward processing when data are unambiguous (eg Thorpe etal), b) slower feedback chooses among ambiguous alternatives [self-normalizing property, real-time probability theory], c) A self-organizing system that trades certainty against speed: Goes beyond Bayesian models! 2. Analog and Digital: Analog Coherence combines the stability of digital with the sensitivity of analog. 3. Preattentive and Attentive Learning: Reconciles the differences of (eg) Helmholtz and Kanizsa, "A preattentive grouping is its own 'attentional' prime"
  • image p230fig05.38 The Synchronous Matching ART, or SMART, model includes spiking neurons in a laminar cortical hierarchy. I developed SMART with my PhD student Massimiliano Versace. By unlumping LAMINART to include spiking neurons, finer details of neurodynamics, such as the existence of faster gamma oscillations during good enough matches, and slower beta oscillations during bad enough mismatches, could be shown as emergent properties of network interactions.
    || Second order thalamus -> specific thalamic nucleus -> Thalamic reticular nucleus -> neocortical laminar circuit [6ll, 6l, 5, 2/3, 1] -> Higher order cortex. Similar for First order thalamus -> First order cortex, with interconnection to Second order, nonspecific thalamic nucleus
  • image p232fig05.40 Computer simulation of how the SMART model generates (a) gamma oscillations if a good enough match occurs, or (c) beta oscillations if a bad enough match occurs. See the text for details.
    || Brain oscillations during match/mismatch, data, simulation. (a) TD corticothalamic feedback increases synchrony (Sillito etal 1994) (b) Match increases γ oscillations (c) Mismatch increases θ,β oscillations
  • image p233fig05.42 Mismatch-induced beta oscillations have been reported in at least three parts of the brain: V1, V4, and hippocampus. Although there may be other reasons for beta oscillations in the brain, those that are caused by a mismatch should be studied in concert with the gamma oscillations that occur during a good enough match. See the text for details.
    || Is there evidence for the [gamma, beta] prediction? Yes, in at least three parts of the brain, (Buffalo EA, Fries P, Landman R, Buschman TJ, Desimone R 2011, PNAS 108, 11262-11267) Does this difference in average oscillation frequencies in the superficial and deep layers reflect layer 4 reset? Superficial recording γ (gamma), Deep recording β (beta) (Berke etal 2008, hippocampus; Buschman and Miller 2009, FEF)
  • image p296fig08.07 When two flashes turn on and off out of phase with the correct range of interstimulus intervals, and not too far from one another, then either beta motion or phi motion is perceived.
    || Beta and Phi motion percepts. Beta motion: percepts of continuous motion of a well-defined object across empty intervening space. Phi motion: sense of "pure" motion without a concurrent percept of moving object. (Exner 1875) http://www.yorku.ca/eye/balls.htm
  • image p297fig08.08 When a second flash is more intense than the first flash, then apparent motion may occur from the second to the first flash.
    || Delta motion: motions from the second to the first flash. Data: (Kolers 1972; Korte 1915). Simulation: (Grossberg, Rudd 1992). This occurs when the luminance or contrast of the second flash is large compared to that of the first flash. Sustained and transient cells obey shunting dynamics whose averaging rates speed up with output intensity. The first flash to wane is the one that will be the source of the G-wave.
  • image p340fig09.07 Log polar remapping from the retina to cortical area V1 and beyond converts expansion, translation, and spiral flows on the retina into parallel flows, with different orientations, on the cortical map.
    || Log polar remapping of optic flow. retina -> cortex. Any combination of expansion and circular motion centered on the fovea maps to cortex as a single direction. Retinal Cartesian coordinates (x,y) map to cortical polar coordinates (r,theta). This makes it easy to compute directional receptive fields in the cortex!
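The cortical-map property described here is just a property of the log polar coordinate change, which can be verified directly (a minimal sketch; the sample point and the expansion and rotation amounts are arbitrary):

```python
import numpy as np

def log_polar(x, y):
    """Map a retinal point (x, y) to cortical coordinates (u, v) = (log r, theta)."""
    r = np.hypot(x, y)
    return np.log(r), np.arctan2(y, x)

x, y = 0.6, 0.8                       # a retinal point (arbitrary example)
u0, v0 = log_polar(x, y)

# expansion about the fovea by factor k: shifts u by log(k), leaves v unchanged
k = 1.5
u1, v1 = log_polar(k * x, k * y)
print(np.isclose(u1 - u0, np.log(k)), np.isclose(v1, v0))   # True True

# rotation about the fovea by angle a: shifts v by a, leaves u unchanged
a = 0.3
xr, yr = x * np.cos(a) - y * np.sin(a), x * np.sin(a) + y * np.cos(a)
u2, v2 = log_polar(xr, yr)
print(np.isclose(u2, u0), np.isclose(v2 - v0, a))           # True True
```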
  • image p423fig12.20 The Spatial Pitch Network, or SPINET, model shows how a log polar spatial representation of the sound frequency spectrum can be derived from auditory signals occurring in time. The spatial representation allows the ARTSTREAM model to compute spatially distinct auditory streams.
    || SPINET model (Spatial Pitch Network) (Cohen, Grossberg, Wyse 1995). 1. input sound 2. Gamma-tone filter bank 3. Short-term average energy spectrum 4. MAP transfer function 5. On-center off-surround and rectification 6. Harmonic weighting 7. Harmonic summation and competition -> PITCH
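A rough stand-in for stages 6-7 of the SPINET pipeline (harmonic weighting, summation, and competition), using a generic harmonic-summation pitch estimate of my own; the gammatone filterbank and spatial-map stages 2-5 are skipped, and the channel spacing and weights are arbitrary:

```python
import numpy as np

def harmonic_sum_pitch(freqs, energies, candidates, n_harmonics=6):
    """Score each candidate F0 by summing harmonically weighted spectral energy."""
    freqs = np.asarray(freqs, dtype=float)
    scores = []
    for f0 in candidates:
        s = 0.0
        for h in range(1, n_harmonics + 1):
            idx = int(np.argmin(np.abs(freqs - h * f0)))  # nearest spectral channel
            s += energies[idx] / h                        # harmonic weighting
        scores.append(s)
    return candidates[int(np.argmax(scores))]             # winner of the pitch competition

freqs = np.arange(50, 2000, 10)                  # toy spectral channels (Hz)
energies = np.zeros(len(freqs))
for h in (1, 2, 3, 4, 5):                        # a 200 Hz harmonic complex
    energies[np.argmin(np.abs(freqs - 200 * h))] = 1.0

print(harmonic_sum_pitch(freqs, energies, candidates=list(range(80, 400, 10))))  # 200
```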
  • image p598fig16.34 The spiking GridPlaceMap model generates theta-modulated place and grid cell firing, unlike the rate-based model.
    || Theta-modulated cells in spiking model. [place, grid] cell vs [membrane potential (mV vs time), frequency vs inter-spike intervals (s), power spectra (normalized power vs frequency (Hz))].
  • image p605fig16.39 Data showing effects of medial septum (MS) inactivation on grid cells and network theta oscillations in medial entorhinal cortex (MEC). (A) Examples of disruption in the spatial expression of the hexagonal grid structure for two grid cells (Brandon etal 2011). (B) Temporal reduction in the power and frequency of network theta oscillations (Koenig etal 2011). (C) Temporary reduction in the gridness score, mean firing rate, and spatial stability of grid cells (Koenig etal 2011).
    || Disruptive effects of Medial Septum inactivation in Medial Entorhinal Cortex (Brandon etal 2011; Koenig etal 2011). (A) Rate map [rate map, spatial autocorrelations, trajectory] vs [baseline, sub-sampled, medial septum inactivation, 3-6 hour recovery, 24 hour recovery], [rate map (Hz- m, p), spatial autocorrelations (gridness)][ 1.2, 7.2, 1.1; 0.25, 1.7, 0.6; 0.25, 2.5, -0.53; 0.7, 5.1, 0.55; 1.0, 5.3, 1.3; 2.1, 15, 0.19; 1.7, 12, 0.71; 1.7, 3.2, -0.22; 1.8, 9.1, 0.68; 2.5, 13, 0.46]. (B) [normalized power at 7-9 Hz, frequency (Hz)] vs 5-minute periods. (C) [mean gridness score (+-SEM), mean firing rate (% of baseline), mean correlation coeff (+-SEM)] vs 10-minute periods.
  • Grossberg 2021 p229c2h0.60 "... SMART computer simulations demonstrate that a good enough match of a top-down expectation with a bottom-up feature pattern generates an attentive resonance during which the spikes of active cells synchronize in the gamma frequency range of 20-70 Hz (Figure 5.40). Many labs have reported a link between attention and gamma oscillations in the brain, including two articles published in 2001, one from the laboratory of Robert Desimone when he was at the National Institute of Mental Health in Bethesda (Fries, Reynolds, Rorie, Desimone 2001), and the other from the laboratory of Wolf Singer in Frankfurt (Engel, Fries, Singer 2001). You'll note that Pascal Fries participated in both studies, and is an acknowledged leader in neurobiological studies of gamma oscillations; eg (Fries 2009). ..."
  • p618 Chapter 17 A universal development code - Mental measurements embody universal laws of cell biology and physics
  • image p025fig01.16 (left panel) The main processing stages of the Cognitive-Emotional-Motor (CogEM) model have anatomical interpretations in terms of sensory cortex, amygdala, and prefrontal cortex. Chapter 13 will describe in greater detail how CS cues activate invariant object categories in the sensory cortex, value categories in the amygdala, and object-value categories in the prefrontal cortex, notably the orbitofrontal cortex. The amygdala is also modulated by internal drive inputs like hunger and satiety. (right panel) Anatomical data support this circuit, as do many neurophysiological data.
    || drive -> amygdala -> prefrontal cortex <-> sensory cortex -> amygdala. [visual, somatosensory, auditory, gustatory, olfactory] cortex -> [amygdala, Orbital Prefrontal Cortex]. amygdala -> Lateral Prefrontal Cortex
  • image p058fig02.04 Serial learning paradigm: Learning the temporal order of events by practicing them in the order that they occur in time.
    || Learning a global arrow in time. How do we learn to encode the temporal order of events in LTM? serial learning. [w=intra, W=inter]trial interval. "... data about serial verbal learning (Figure 2.4) seemed to suggest that events can go "backwards in time". ..."
  • image p059fig02.05 Bowed serial position curve. This kind of data emphasizes the importance of modelling how our brains give rise to our minds using nonlinear systems of differential equations.
    || Effects of [inter, intra]trial intervals (Hovland 1938). # of errors vs list position. [w (sec), W (sec)] = (2 6) (4 6) (2 126) (4 126). Nonoccurrence of future items reduces the number of errors in response to past items. These data require a real-time theory for their explanation! that is, DIFFERENTIAL equations.
  • image p059fig02.06 The bowed serial position curve illustrates the sense in which "events can go backwards in time" during serial learning.
    || Bow due to backward effect in time. If the past influenced the future, but not conversely: # of errors vs list position; Data (Hovland, Hull, Underwood, etc).
  • image p071fig02.16 To solve the noise-saturation dilemma, individual neurons in a network that is receiving a distributed spatial pattern of inputs need to remain sensitive to the ratio of the input to them divided by all the inputs in that spatial pattern. Although the inputs are delivered to a finite number of neurons, the input and activity patterns are drawn continuously across the cells for simplicity.
    || Noise-Saturation Dilemma. [Ii, xi] vs t. [Input, Activity] pattern [small -> noise, large -> saturation]. Problem: remain sensitive to input RATIOS θi = Ii / sum[j: Ij] as total input I = sum[j: Ij] -> ∞. Many kinds of data exhibit sensitivity to ratios of inputs.
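The ratio sensitivity stated here is the standard equilibrium property of a feedforward shunting on-center off-surround network, which a few lines of code can confirm (parameter values are illustrative):

```python
# A feedforward shunting on-center off-surround cell obeys
#   dx_i/dt = -A*x_i + (B - x_i)*I_i - x_i*sum_{j != i} I_j,
# whose equilibrium is x_i = B*I_i / (A + I) with I = sum_j I_j.
# So x_i tracks the ratio theta_i = I_i / I and never saturates as I grows.
A, B = 1.0, 1.0
pattern = [0.1, 0.3, 0.6]                        # fixed input ratios theta_i

for scale in (1.0, 10.0, 1000.0):                # total input grows without bound
    I = [scale * t for t in pattern]
    total = sum(I)
    x = [B * Ii / (A + total) for Ii in I]
    ratios = [xi / sum(x) for xi in x]
    print(scale, [round(r, 3) for r in ratios])  # ratios stay (0.1, 0.3, 0.6)
```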
  • image p073fig02.19 Computing with cells: infinity does not exist in biology!
    || Computing in a bounded activity domain, Gedanken experiment (Grossberg 1970). Vm sub-areas [xm, B - xm], I(all m)], m=[1, i, B].
    B: excitable sites
    xi(t): excited sites (activity, potential)
    B - xi(t): unexcited sites
  • image p082fig02.37 My models begin with behavioral data, since brains are designed to achieve behavioral success. The text explains how models evolve in stages, through a process of successive refinements, or unlumpings. These unlumpings together carry out a kind of conceptual evolution, leading to models that can explain and predict ever larger psychological and neurobiological databases.
    || Modelling method and cycle.
    Behavioral data -(art of modeling)-> Design principles <- Neural data <-(brain predictions)- Mathematical model and analysis -(behavioral predictions)-> Behavioural data
    Operationalizes "proper level of abstraction"
    Operationalizes that you cannot "derive a brain" in one step.
  • image p085fig02.38 Our models have been used in many large-scale applications to engineering and technology. Linking brain to behavior explains how brain mechanisms give rise to psychological functions, and do so autonomously. The combination of mechanism, function, and autonomy helps to explain their value in helping to solve outstanding problems in technology.
    || Modeling method and cycle.
    Behavioral data -(art of modeling)-> Design principles <- Neural data <-(brain predictions)- Mathematical model and analysis -(behavioral predictions)-> Behavioural data
    Technology: Mathematical model and analysis <-> Technological applications
    At every stage, spin off new model designs and mechanisms to technologists who need autonomous intelligent applications.
  • image p134fig04.14 The kinds of displays that Michael Paradiso and Ken Nakayama used to catch filling-in "in the act" and which Karl Arrington then simulated using the Grossberg and Todorovic 1988 model.
    || Experiments on filling-in. Catching "filling-in" in the act (Paradiso, Nakayama 1991). (Arrington 1994 Vision Research 34, 3371-3387) simulated these data using the model of Grossberg and Todorovic 1988.
  • image p145fig04.23 If end gaps were not closed by end cuts, then color would flow out of every line end!
    || A perceptual disaster in the feature contour system. feature contour, line boundary. input -> [boundary, surface]. boundary -> surface. Color would flow out of every line end! as it does during neon color spreading.
  • image p151fig04.29 Experimental evidence of bipole cells in cortical area V2 was reported by von der Heydt, Peterhans, and Baumgartner (1984).
    || Bipoles: first neurophysiological evidence (V2) (von der Heydt, Peterhans, Baumgartner 1984, Peterhans, von der Heydt 1988). (Grossberg 1984) prediction.
    Ordering: Stimulus (S), probe location (*); do cells in V2 respond?
    ...(S)*...                 YES
    ...*...(S)                 NO
    (S)...*...                 NO
    (S)...*...(S)              YES
    (S)...*... (more contrast) NO
    (S)...*.....(S)            YES
    Evidence for receptive field.
  • image p151fig04.30 Anatomical evidence for long-range horizontal connections has also been reported, as illustrated by the example above from (Bosking etal 1997).
    || Anatomy: horizontal connections (V1) (Bosking etal 1997). tree shrew. [10, 20]*[20, 10, 0, -10, -20] (degrees).
  • image p152fig04.31 The predicted bipole cell receptive field (upper left corner) has been supported by both neurophysiological data and psychophysical data, and used in various forms by many modelers. See the text for details.
    || Bipoles through the ages. (Grossberg 1984; Grossberg, Mingolla 1985). (Field, Hayes, Hess 1993) "association field". (Heitger, von der Heydt 1993). (Williams, Jacobs 1997). cf. "relatability" geometric constraints on which contours get to group (Kellman & Shipley 1991). Also "tensor voting" (Ullman, Zucker, Mumford, Guy, Medioni, ...).
  • image p159fig04.36 Graffiti art by Banksy exploits properties of amodal boundary completion and spatial impenetrability.
    || p159c1h0.75 perceptual psychologist Nava Rubin "... When the wall is smooth, Banksy leaves the regions previously covered by stencil unpainted, relying on observers' perception to segregate figural regions from the (identically colored) background. But when the wall is patterned with large-scale luminance edges - eg due to bricks - Banksy takes the extra time to fill in unpainted figural regions with another color (Rubin 2015). ..."
  • image p162fig04.38 How long-range cooperation among bipole cells and short-range competition by hypercomplex cells work together to generate the inverted-U in boundary strength that is found in the data of Figure 4.37 (right panel).
    || Cooperation and competition during grouping.
    few lines: wide spacing, inputs outside spatial range of competition, more inputs cause higher bipole activity
    more lines: narrower spacing, slightly weakens net input to bipoles from each inducer
    increasing line density: causes inhibition to reduce net total input to bipoles
  • image p163fig04.39 A schematic of the LAMINART model that explains key aspects of laminar visual cortical anatomy and dynamics. LGN -> V1 [6, 4, 2/3] -> V2 [6, 4, 2/3]
    || p163c1h0.6 "... The first article about laminar computing ... proposed how the laminar cortical model could process 2D pictures using bottom-up filtering and horizontal bipole grouping interactions (Grossberg, Mingolla, Ross 1997). In 1999, I was able to extend the model to also include top-down circuits for expectation and attention (Grossberg 1999) (right panel). Such a synthesis of laminar bottom-up, horizontal, and top-down circuits is characteristic of the cerebral cortex (left panel). I called it LAMINART because it began to show how properties of Adaptive Resonance Theory, or ART, notably the ART prediction about how top-down expectations and attention work, are realized by identical cortical cells and circuits. You can immediately see from the schematic laminar circuit diagram ... (right panel) that circuits in V2 seem to repeat circuits in V1, albeit with a larger spatial scale, despite the fact that V1 and V2 carry out different functions. How this anatomical similarity can coexist with functional diversity will be clarified in subsequent sections and chapters. It enables different kinds of biological intelligence to communicate seamlessly while carrying out their different psychological functions. ..."
  • image p165fig04.41 The Kanizsa-Minguzzi ring. See the text for details.
    || p165c1h0.6 "... (left panel), the annulus is divided by two line segments into annular sectors of unequal area. Careful viewing shows that the smaller sector looks a little brighter than the larger one. (Kanizsa, Minguzzi 1986) noted that "this unexpected effect is not easily explained. In fact, it cannot be accounted for by any simple psychological mechanism such as lateral inhibition or frequency filtering. Furthermore, it does not seem obvious to invoke organizational factors, like figural belongingness or figure-ground articulation." ...". p165c2h0.35 "... (Grossberg, Todorovic 1988). Our main claim is that the two radial lines play two roles, one in the formation of boundaries with which to contain the filling-in process, and the other as a source of feature contour signals that are filled-in within the annular regions to create a surface brightness percept. ..."
  • image p232fig05.40 Computer simulation of how the SMART model generates (a) gamma oscillations if a good enough match occurs, or (c) beta oscillations if a bad enough match occurs. See the text for details.
    || Brain oscillations during match/mismatch, data, simulation. (a) TD corticothalamic feedback increases synchrony (Sillito etal 1994) (b) Match increases γ oscillations (c) Mismatch increases θ,β oscillations
  • image p232fig05.41 (a)-(c). The sequence of interlaminar events that SMART predicts during a mismatch reset. (d) Some of the compatible neurophysiological data.
    || Mismatch causes layer 5 dendritic spikes that trigger reset. (a) Arousal causes increase in nonspecific thalamic nuclei firing rate and layer 5 dendritic and later somatic spikes (Larkum and Zhu 2002, Williams and Stuart 1999) (b) Layer 5 spikes reach layer 4 via layer 6i and inhibitory neurons (Lund and Boothe 1975, Gilbert and Wiesel 1979) (c) habituative neurotransmitters in layer 6i shift the balance of active cells in layer 4 (Grossberg 1972, 1976) (d) Dendritic stimulation fires layer 5 (Larkum and Zhu 2002): stimulation of apical dendrites by nonspecific thalamus
  • image p252fig06.01 A surface-shroud resonance begins to form when the surface representations of objects bid for spatial attention. In addition to these topographic excitatory inputs, there is long-range inhibition of the spatial attention cells that determines which inputs will attract spatial attention.
    || Bottom-up spatial attention competition. [more, less] luminous perceptual surfaces -> competition -> spatial attention
  • image p253fig06.02 After bottom-up surface inputs activate spatial attentional cells, they send top-down topographic excitatory signals back to the surface representations. This recurrent shunting on-center off-surround network contrast enhances larger attentional activities while approximately normalizing the total spatial attentional activity. A surface-shroud resonance hereby forms that selects an attentional shroud, enhances the perceived contrast of the attended surface (light blue region), and maintains spatial attention on it.
    || Surface-shroud resonance. perceptual surfaces -> competition -> spatial attention. (Carrasco, Penpeci-Talgar, and Eckstein 2000, Reynolds and Desimone 2003)
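A toy sketch of this loop (dynamics and parameters are my own simplifications, not the ARTSCAN equations): two surface activities drive competing spatial-attention cells through a recurrent shunting on-center off-surround network, and the winning shroud feeds excitation back to its surface, enhancing its activity above the level set by its input alone.

```python
import numpy as np

def surface_shroud(surface_inputs, steps=4000, dt=0.01, B=1.0, feedback=0.5):
    inputs = np.array(surface_inputs, dtype=float)
    s = inputs.copy()                  # surface activities (perceived contrast)
    a = np.zeros_like(s)               # spatial attention (shroud) activities
    for _ in range(steps):
        f = a ** 2                                       # contrast-enhancing recurrent signal
        da = -a + (B - a) * (s + f) - a * (f.sum() - f)  # shunting on-center off-surround
        ds = -s + inputs + feedback * a                  # top-down enhancement of the surface
        a += dt * da
        s += dt * ds
    return s, a

s, a = surface_shroud([0.6, 0.5])
print(np.round(a, 2))   # the larger shroud forms over the more luminous surface
print(np.round(s, 2))   # that surface's activity is enhanced above its 0.6 input
```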
  • image p257fig06.05 A curve tracing task with monkeys was used by Roelfsema, Lamme, and Spekreijse in 1998 to demonstrate how spatial attention can flow along object boundaries. See the text for details.
    || Attention flows along curves: Roelfsema etal 1998: Macaque V1. fixation (300ms) -> stimulus (600ms RF - target curve, distractor) -> saccade. Crossed-curve condition: attention flows across junction between smoothly connected curve segments, Gestalt good continuation
  • image p258fig06.06 Neurophysiological data and simulation of how attention can flow along a curve. See the text for details.
    || Simulation of Roelfsema etal 1998, data & simulation. Attention directed only to far end of curve. Propagates along active layer 2/3 grouping to distal neurons.
  • image p265fig06.13 The basal ganglia gate perceptual, cognitive, emotional, and more processes through parallel loops.
    || [motor, oculomotor, dorsolateral, ventral-orbital, anterior cingulate] vs. [Thalamus, pallidum-subs, nigra, Striatum, Cortex]
  • image p267fig06.14 Feedback from object surfaces to object boundaries uses surface contours. This feedback assures complementary consistency and enables figure-ground separation. A corollary discharge of the surface contours can be used to compute salient object feature positions.
    || Perceptual consistency and figure-ground separation.
  • image p271fig06.17 Persistent activity in IT cells is just what is needed to enable view-invariant object category learning by ARTSCAN to be generalized to [view, position, size]-invariant category learning by positional ARTSCAN, or pARTSCAN. See the text for details.
    || Persistent activity in IT. Physiological data show that persistent activity exists in IT (Fuster and Jervey 1981, Miyashita and Chang 1988, Tomita etal 1999). Adapted from (Tomita etal 1999 Nature)
  • image p273fig06.20 pARTSCAN can simulate the IT cell recoding that Li and DiCarlo reported in their swapping experiments because the swapping procedure happens without causing a parietal reset burst to occur. Thus the originally activated invariant category remains activated and can get associated with the swapped object features.
    || Simulation of Li and DiCarlo swapping data. data (Li and DiCarlo 2008), model (Cao, Grossberg, Markowitz 2011). normalized response vs. exposure (swaps and/or hours)
  • image p274fig06.21 pARTSCAN can also simulate the trade-off in IT cell responses between position invariance and selectivity that was reported by Zoccolan etal 2007. This trade-off limits the amount of position invariance that can be learned by a cortical area like V1 that is constrained by the cortical magnification factor.
    || Trade-off in IT cell response properties. Inferotemporal cortex cells with greater position invariance respond less selectively to natural objects. invariance-tolerance, selectivity-sparseness. data (Zoccolan etal 2007) model (Grossberg, Markowitz, Cao 2011). position tolerance (PT, degrees) vs sparseness (S)
  • image p275fig06.23 Data from (Akrami etal 2009) and our simulation of it. See the text for details.
    || IT responses to image morphs. data vs model
  • image p284fig07.02 Psychophysical data (top row) and simulation (bottom row) of how persistence decreases with flash illuminance and duration.
    || Persistence data and simulations. (Francis, Grossberg, Mingolla 1994 Vision Research, 34, 1089-1104). Persistence decreases with flash illuminance and duration (Bowen, Pola, Matin 1974; Breitmeyer 1984; Coltheart 1980). Higher luminance or longer duration habituates the gated dipole ON channel more. Causes larger and faster rebound in the OFF channel to shut persisting ON activity off.
  • image p285fig07.03 Persistence decreases with flash illuminance and duration due to the way in which habituative transmitters regulate the strength of the rebound in response to offset of a stimulating input, and how this rebound inhibits previously activated bipole cells.
    || Persistence data and simulations (Francis, Grossberg, Mingolla 1994 Vision Research, 34, 1089-1104). Persistence decreases with flash illuminance and duration. Horizontal input excites a horizontal bipole cell, which supports persistence. Offset of the horizontal input causes a rebound of activity in the vertical pathway, which inhibits the horizontal bipole cell, thereby terminating persistence.
  • image p286fig07.04 Illusory contours persist longer than real contours because real contours have more inducers whose rebound at contour offset can cause faster boundary reset. Illusory contours also take longer to form than real contours, which explains the increasing portion of the curve.
    || Persistence data and simulations (Meyer, Ming 1988; Reynolds 1981). Increasing portion of curve is due to formation time of the illusory contour. Longer persistence is due to fewer bottom-up inducers of an illusory contour that has the same length as a real contour: only illuminance-derived edges generate reset signals. When bottom-up inducers are inhibited by OFF cell rebounds, their offset gradually propagates to the center of the illusory contour.
  • image p286fig07.05 This figure shows the propagation through time of illusory contour offset from the rebounded cells that got direct inputs to the center of the contour.
    || Persistence data and simulations. Illusory contours persist longer than real contours (Meyer, Ming 1988; Reynolds 1981). When bottom-up inducers are inhibited by OFF cell rebounds, their offset gradually propagates to the center of the illusory contour.
  • image p287fig07.06 The relative durations of persistence that occur due to an adaptation stimulus of the same or orthogonal orientation follow from the properties of the habituative gated dipoles that are embedded in the boundary completion system.
    || Persistence data and simulations. Change in persistence depends on whether adaptation stimulus has same or orthogonal orientation as test grating (Meyer, Lawson, Cohen 1975). If adaptation stimulus and test stimulus have the same orientation, they cause cumulative habituation, which causes a stronger reset signal, hence less persistence. When they are orthogonal, the competition on the ON channel is less, hence more persistence.
  • image p287fig07.07 Persistence increases with distance between a target and a masking stimulus due to weakening of the spatial competition in the first competitive stage of hypercomplex cells.
    || Persistence data and simulations. Persistence increases with distance between a target and a masking stimulus (Farrell, Pavel, Sperling 1990). There is less spatial competition from the masker to the target when they are more distant, hence the target is more persistent.
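The habituative gated dipole mechanism invoked in the persistence items above can be sketched in a few lines (parameters and the Euler integration are illustrative choices of mine, not fitted values from the 1994 simulations): the ON channel's transmitter gate depletes during stimulation, so a stronger or longer flash leaves a bigger imbalance at offset and hence a bigger OFF rebound, which is what cuts persistence short.

```python
def gated_dipole(J_amplitude, J_duration, arousal=1.0, dt=0.001, T=2.0):
    """Return the peak OFF-channel rebound after offset of a phasic ON input J."""
    steps = int(T / dt)
    z_on = z_off = 1.0                      # transmitter gates start fully accumulated
    rebound = 0.0
    for k in range(steps):
        t = k * dt
        J = J_amplitude if t < J_duration else 0.0
        s_on, s_off = arousal + J, arousal  # ON channel gets arousal plus the phasic input
        z_on  += dt * (0.5 * (1 - z_on)  - 2.0 * s_on  * z_on)   # habituation of the gates
        z_off += dt * (0.5 * (1 - z_off) - 2.0 * s_off * z_off)
        on, off = s_on * z_on, s_off * z_off
        if t >= J_duration:
            rebound = max(rebound, off - on)  # OFF-channel rebound after input offset
    return rebound

for amp in (0.5, 1.0, 2.0):
    print(amp, round(gated_dipole(amp, 0.5), 3))   # stronger flash -> larger rebound
```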
  • image p297fig08.08 When a second flash is more intense than the first flash, then apparent motion may occur from the second to the first flash.
    || Delta motion: motions from the second to the first flash. Data: (Kolers 1972; Korte 1915). Simulation: (Grossberg, Rudd 1992). This occurs when the luminance or contrast of the second flash is large compared to that of the first flash. Sustained and transient cells obey shunting dynamics whose averaging rates speed up with output intensity. The first flash to wane is the one that will be the source of the G-wave.
  • image p297fig08.09 Simulation of motion in opposite directions that is perceived when two later flashes occur on either side of the first flash.
    || Split motion. Data: (H.R. Silva 1926), Simulation: (Grossberg, Rudd 1992)
  • image p298fig08.10 Simulation of the motion speed-up that is perceived when flash duration decreases.
    || "The less you see it, the faster it moves". Data: (Giaschi, Anstis 1989), Simulation: (Grossberg, Rudd 1992). ISI = 0, flash duration decreases; SOA = constant, flash duration decreases
  • image p304fig08.22 Data (top image) and simulation (bottom image) of Korte's laws. The laws raise the question of how ISIs in the hundreds of milliseconds can cause apparent motion.
    || Korte's Laws, Data: (Korte 1915) Simulation: (Francis, Grossberg 1996)
  • image p311fig08.30 The data of (Castet etal 1993) in the left image was simulated in the right image by the 3D FORMOTION model that I developed with my PhD student Jonathan Chey. These data provide insight into how feature tracking signals propagate from the ends of a line to its interior, where they capture consistent motion directional signals and inhibit inconsistent ones.
    || Solving the aperture problem. A key design problem: How do amplified feature tracking signals propagate within depth to select the correct motion directions at ambiguous positions? This propagation from feature tracking signals to the line interior determines perceived speed in the Castet etal data, which is why speed depends on line tilt and length. Data: (Castet etal 1993), Simulation: (Chey etal 1997)
  • image p319fig08.38 The neurophysiological data from MT (left image) confirms the prediction embodied in the simulation of MT (right image) concerning the fact that it takes a long time for MT to compute an object's real direction of motion.
    || Solving the aperture problem takes time. MT Data (Pack, Born 2001), MT simulation (Chey, Grossberg, Mingolla 1997)
  • image p333fig08.58 Neurophysiological data (left image) and simulation (right image) of LIP data during correct trials on the RT task. See the text for details.
    || LIP responses during RT task correct trials (Roitman, Shadlen 2002). More coherence in favored direction causes faster cell activation. More coherence in opposite direction causes faster cell inhibition. Coherence stops playing a role in the final stages of LIP firing.
  • image p334fig08.59 Neurophysiological data (left column) and simulations (right column) of LIP responses for the FD task during both [correct, error] trials. See the text for details.
    || LIP responses for the FD task during both [correct, error] trials (Shadlen, Newsome 2001). LIP encodes the perceptual decision regardless of the true direction of the dots. Predictiveness of LIP responses on error trials decreases with increasing coherence.
  • image p334fig08.60 Behavioral data (left image) and simulation (right image) about accuracy in both the RT and FD tasks. See text for details
    || Behavioral data: % correct vs % coherence (Mazurek etal 2003; Roitman, Shadlen 2002). More coherence in the motion causes more accurate decisions. RT task accuracy at weaker coherence levels is slightly better than FD task accuracy.
  • image p335fig08.61 Behavioral data (left image) and simulation (right image) about speed in correct and error trials of the RT task. See text for details.
    || Behavioral data: speed, correct and error trials (RT task) (Roitman, Shadlen 2002). More coherence in the motion causes faster reaction time.
  • image p335fig08.62 More remarkable simulation fits (right column) to LIP neurophysiology data (left column) about where and when to move the eyes.
    || LIP encodes not only where, but also when, to move the eyes. ...No Bayes (Roitman, Shadlen 2002). Firing rate (sp/s) vs time (ms). Slope of firing rate (sp/s^2) vs % correct.
  • image p342fig09.11 Psychophysical data (left panel) and computer simulation (right column) of the importance of efference copy in real movements. See the text for details.
    || Heading: move to wall and fixate stationary object (adapted from Warren, Hannon 1990). Inaccurate for simulated eye rotation, accurate for real eye rotation, need confirmation by efference copy!
  • image p343fig09.13 When one scans the three different types of pears in the left image, as illustrated by the jagged blue curve with red movement end positions, and transforms the resulting retinal images via the cortical magnification factor, or log polar mapping, the result is the series of images in the right column. How do our brains figure out from such confusing data which views belong to which pear?
    || View-invariant object learning and recognition. Three pears: Anjou, Bartlett, Comice. Which is the Bartlett pear? During unsupervised scanning and learning about the world, no one tells the brain what views belong to which objects while it learns view-invariant object categories. Cortical magnification in V1.
  • image p349fig09.20 Using virtual reality displays (left image), (Fajen, Warren 2003) collected data (right two images) about how observers avoid obstacles (open circular disks) as a function of their distance and angular position as they navigate towards a fixed goal (x). These data illustrate how goals act as attractors while obstacles act as repellers.
    || Steering from optic flow (Fajen, Warren 2003). goals are attractors, obstacles are repellers. Damped spring model explains human steering data.
  • image p352fig09.26 The final stage of the model computes a beautiful expansion optic flow that permits an easy estimate of the heading direction, with an accuracy that matches that of human navigators.
    || The model generates accurate heading (Warren, Hannon 1990; Royden, Crowell, Banks 1994). Maximally active MSTd cell = heading estimate. Accuracy matches human data. Random dots [mean +-1.5°, worst +-3.8°], Random dots with rotation [accurate with rotations <1°/s, rotation increases, error decreases], OpenGL & Yosemite benchmark +-1.5°, Driving video +-3°.
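A generic template-matching sketch of the final stage described here (my own toy version, not the model's MSTd equations; the sample points, candidate grid, and dot-product activity measure are arbitrary choices): each candidate heading has a preferred expansion flow, and the maximally active template is taken as the heading estimate.

```python
import numpy as np

def expansion_flow(center, points):
    """Unit flow vectors for an expansion centered at the candidate heading point."""
    v = points - center
    return v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-9)

points = np.random.default_rng(0).uniform(-1, 1, size=(200, 2))  # sampled flow-field positions
true_heading = np.array([0.2, -0.1])
observed = expansion_flow(true_heading, points)                  # observed optic flow

candidates = [np.array([x, y]) for x in np.linspace(-0.5, 0.5, 11)
                               for y in np.linspace(-0.5, 0.5, 11)]
activity = [np.sum(expansion_flow(c, points) * observed) for c in candidates]
print(candidates[int(np.argmax(activity))])   # close to the true heading (0.2, -0.1)
```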
  • image p356fig10.03 Laminar computing achieves at least three basic properties of visual processing that have analogs in all biologically intelligent behaviors. These properties may be found in all cortical circuits in specialized form.
    || What does Laminar Computing achieve? 1. Self-stabilizing development and learning; 2. Seamless fusion of a) pre-attentive automatic bottom-up processing, b) attentive task-selective top-down processing; 3. Analog coherence: Solution of Binding Problem for perceptual grouping without loss of analog sensitivity. Even the earliest visual cortical stages carry out active adaptive information processing: [learn, group, attention]ing
  • image p357fig10.04 Laminar Computing achieves its properties by computing in a new way that synthesizes the best properties of feedforward and feedback interactions, analog and digital computations, and preattentive and attentive learning. The property of analog coherence enables coherent groupings and decisions to form without losing sensitivity to the amount of evidence that supports them.
    || Laminar Computing: a new way to compute. 1. Feedforward and feedback: a) Fast feedforward processing when data are unambiguous (eg Thorpe etal), b) slower feedback chooses among ambiguous alternatives [self-normalizing property, real-time probability theory], c) A self-organizing system that trades certainty against speed: Goes beyond Bayesian models! 2. Analog and Digital: Analog Coherence combines the stability of digital with the sensitivity of analog. 3. Preattentive and Attentive Learning: Reconciles the differences of (eg) Helmholtz and Kanizsa, "A preattentive grouping is its own 'attentional' prime"
  • image p360fig10.09 Perceptual grouping is carried out in layer 2/3 by long-range horizontal excitatory recurrent connections, supplemented by short-range disynaptic inhibitory connections that together realize the bipole grouping properties that are diagrammed in Figure 10.10.
    || Grouping starts in layer 2/3. LGN-> 6-> 4-> 2/3: 1. Long-range horizontal excitation links collinear, coaxial receptive fields (Gilbert, Wiesel 1989; Bosking etal 1997; Schmidt etal 1997) 2. Short-range disynaptic inhibition of target pyramidal cells via a pool of interneurons (Hirsch, Gilbert 1991) 3. Unambiguous groupings can form and generate feedforward outputs quickly (Thorpe etal 1996).
  • image p361fig10.10 Bipole grouping is achieved by long-range horizontal recurrent connections that also give rise to short-range inhibitory interneurons which inhibit nearby bipole cells as well as each other.
    || Bipole property controls perceptual grouping. Collinear input on both sides. Excitatory inputs summate. Inhibitory inputs normalize, Shunting inhibition! Two-against-one. Cell is excited.
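The "two-against-one" bipole property can be captured with a toy rule (the numbers and the saturating inhibition term are illustrative, not the LAMINART equations): excitatory inputs from the two flanks summate, the shared inhibitory pool normalizes them with shunting inhibition that grows more slowly than the summed excitation, so both flanks together exceed threshold while one flank alone does not.

```python
def bipole_response(left, right, threshold=0.6):
    """Toy bipole cell: fires only with (near-)collinear support on both flanks."""
    excitation = left + right                     # excitatory flank inputs summate
    inhibition = excitation / (1.0 + excitation)  # shunting inhibition normalizes
    net = excitation - inhibition
    return max(net - threshold, 0.0)

print(bipole_response(1.0, 0.0))   # one flank only: 0.0 (no grouping)
print(bipole_response(1.0, 1.0))   # both flanks: > 0  (boundary completes inward)
```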
  • image p367fig10.15 Data (left column) and simulation (right column) of how attention prevents a masking stimulus from inhibiting the response to the on-center of the cell from which the recording was made.
    || Attention protects target from masking stimulus (Reynolds etal 1999; Grossberg, Raizada 2000).
  • image p367fig10.16 Neurophysiological data (left image) and simulation (right image) of how a low-contrast target can be facilitated if it is surrounded by a pair of collinear flankers, and suppressed by them if it has high contrast.
    || Flankers can enhance or suppress targets (Polat etal 1998; Grossberg, Raizada 2000). target alone, target + flankers, flankers alone.
  • image p368fig10.17 Neurophysiological data (left image) and simulation (right image) showing that attention has a greater effect on low contrast than high contrast targets.
    || Attention has greater effect on low contrast targets (DeWeerd etal 1999; Raizada, Grossberg 2001). Threshold increase (deg) vs Grating contrast (%), [no, with] attention
  • image p368fig10.18 Neurophysiological data (left image) and simulation (right image) of relative on-cell activities when the input to that cell may also be surrounded by iso-orientation or perpendicular textures.
    || Texture reduces response to a bar: iso-orientation suppression (Knierim, van Essen 1992), perpendicular suppression (Raizada, Grossberg 2001)
  • image p369fig10.19 Data from (Watanabe etal 2001) showing perceptual learning of the coherent motion direction, despite the lack of extra-foveal attention and awareness of the moving stimuli.
    || Unconscious perceptual learning of motion direction, % correct for two tests, compared to chance level results.
  • image p393fig11.31 (Todd, Akerstrom 1987) created a series of 2D images from discrete black patches on a white disk and showed how the perceived depth varies with the factors summarized in the figure. The LIGHTSHAFT model quantitatively simulated their data.
    || Factors determining depth-from-texture percept. Perceived depth varies with texture element width, but only when elements are elongated and sufficiently aligned with one another to form long-range groupings. Data of (Todd, Akerstrom 1987) simulated by the LIGHTSHAFT model of (Grossberg, Kuhlmann 2007). [HP, LP, CCE, CCS, RO]
  • image p399fig11.39 Simulation of the eye rivalry data of (Lee, Blake 1999).
    || [Binocular, [left, right] eye] activity
  • image p402fig11.43 A pair of disparate images of a scene from the University of Tsukuba Multiview image database.
    || input [left, right]
  • image p407fig12.03 Neurophysiological data showing how motor cortical cells code different vectors that are sensitive to both the direction of the commanded movement and its length.
    || (a) Single primary motor cortex neuron, onset of movement -> on..., radial architecture... (b) Motor cortex neuronal population, radial architecture...
  • image p409fig12.04 (top half) Neurophysiological data of vector cell responses in motor cortex. (bottom half) VITE model simulations of a simple movement in which the model's difference vector simulates the data as an emergent property of network interactions.
    || Neurophysiological data. VITE model [Present Position vector, Difference vector, Outflow velocity vector, go signal].
  • image p410fig12.06 Monkeys seamlessly transformed a movement initiated towards the 2 o'clock target into one towards the 10 o'clock target when the latter target was substituted 50 or 100 msec after activation of the first target light.
    ||
  • image p414fig12.11 Neurophysiological data from cortical areas 4 and 5 (every other column) and simulations thereof (other columns) during a reach.
    || activation vs time. (a) area 4 phasic RT (IFV) (b) area 4 tonic (OPV) (c) area 4 phasic-tonic (OFPV) (d) area 4 phasic MT (DVV) (e) area 5 phasic (DV) (f) area 5 tonic (PPV)
  • image p424fig12.21 One of the many types of data about pitch processing that are simulated by the SPINET model. See the text for details.
    || Pitch shifts with component shifts (Patterson, Wightman 1976; Schouten 1962). Pitch vs lowest harmonic number.
  • image p425fig12.23 ARTSTREAM simulations of the auditory continuity illusion and other streaming properties (left column, top row). When two tones are separated by silence (Input), a percept of silence also separates them in a spectral-pitch resonance. (left column, bottom row). When two tones are separated by broadband noise, the percept of tone continues through the noise in one stream (stream 1) while the remainder of the noise occurs in a different stream (stream 2). (right column) Some of the other streaming properties that have been simulated by the ARTSTREAM model.
    || Auditory continuity does not occur without noise. Auditory continuity in noise. Other simulated streaming data.
  • image p428fig12.25 (left architecture) Auditory-articulatory feedback loop whereby babbled sounds activate learning in an imitative map that is later used to learn to reproduce the sounds of other speakers. An articulatory-to-auditory expectation renders learning possible by making the auditory and motor data dimensionally consistent, as in the motor theory of speech. (right architecture) Parallel streams in the ARTSPEECH model for learning speaker-independent speech and language meaning, including a mechanism for speaker normalization (right cortical stream) and for learning speaker-dependent vocalic qualities (left cortical stream).
    || left: Speaker-dependent vocalic qualities; right: Speaker-independent speech and language meaning
  • image p432fig12.28 (left image) The SpaN model simulates how spatial representations of numerical quantities are generated in the parietal cortex. (right image) Behavior numerosity data and SpaN model simulations of it.
    || (Left) preprocessor-> spatial number map-> Comparison wave. (Right) data axis: number of lever presses; model axis: node position in the spatial number axis
  • image p437fig12.32 Data from a free recall experiment illustrate the bowed serial position curve.
    || Serial position function for free recall Data: (Murdock 1962 JEP 64, 482-488). % correct vs position of word on a 40-word list. Primacy gradient can be a mixture of STM and LTM read-out.
  • image p437fig12.33 Item and Order working memory models explain free recall data, as well as many other psychological and neurobiological data, by simulating how temporal series of events are stored as evolving spatial patterns of activity at content-addressable item categories. The categories with the largest activities are rehearsed first, and self-inhibit their activity as they do so in order to prevent them from being rehearsed perseveratively. The laws whereby the items are stored in working memory obey basic design principles concerning how list categories, or chunks, of sequences of stored items can be stably remembered.
    || Working memory models: item and order, or competitive queuing (Grossberg 1978; Houghton 1990; Page, Norris 1998). Event sequence in time stored as an evolving spatial pattern of activity. Primacy gradient of working memory activation stores correct temporal order at content-addressable cells. Maximally activated cell population is performed next when a rehearsal wave is turned on. Output signal from chosen cell population inhibits its own activity to prevent perseveration: inhibition of return. Iterate until entire sequence is performed.
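The readout algorithm described in this item is simple enough to state as code (a minimal sketch with toy numbers of my own): store a primacy gradient, repeatedly pick the most active item, and self-inhibit it after performance.

```python
def store_primacy_gradient(items, decay=0.8):
    """Earlier items get larger activities: a primacy gradient over content-addressable cells."""
    return {item: decay ** i for i, item in enumerate(items)}

def recall(working_memory):
    wm = dict(working_memory)
    output = []
    while wm:
        winner = max(wm, key=wm.get)   # rehearsal wave performs the most active item
        output.append(winner)
        del wm[winner]                 # self-inhibition (inhibition of return) prevents perseveration
    return output

wm = store_primacy_gradient(["A", "B", "C", "D"])
print(wm)          # primacy gradient: A > B > C > D
print(recall(wm))  # ['A', 'B', 'C', 'D'] : correct temporal order
```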
  • image p443fig12.41 Neurophysiological data from the Averbeck etal sequential copying experiments show the predicted primacy gradient in working memory and the self-inhibition of activity as an item is stored. When only the last item remains stored, it has the highest activity because it has been freed from inhibition by earlier items.
    || Neurophysiology of sequential copying
  • image p444fig12.42 The LIST PARSE laminar cortical model of working memory and list chunking that I published with Lance Pearson in 2008 simulated the Averbeck etal data in Figure 12.41, as in the left column of the figure. It also simulated cognitive data about working memory storage by human subjects. See the text for details.
    || LIST PARSE: Laminar cortical model of working memory and list chunking (Grossberg, Pearson 2008). Simulates data about: [immediate, delayed, continuous] distractor free recall; immediate serial recall; and variable-speed sequential performance of motor acts. [velocity, acceleration] vs time (ms) from recall cue.
  • image p446fig12.44 (left column, top row) LIST PARSE can model linguistic data from human subjects. In this figure, model parameters are fixed to enable a close fit to data about error-type distributions in immediate free recall experiments, notably transposition errors. (right column, top row) Simulation and data showing bowing of the serial position curve, including an extended primacy gradient. (left column, bottom row) The simulation curve overlays data about list length effects, notably the increasing recall difficulty of longer lists during immediate serial recall (ISR). (right column, bottom row) Simulation (bottom image) and data (top image) of the limited temporal extent for recall.
    || (1. TL) Error-type distributions in immediate serial recall (Henson etal 1996). % occurrence vs serial position. Graph convention: Data- dashed lines; Simulations- solid lines. Six letter visual ISR. Order errors- transpositions of neighboring items are the most common. Model explanation: Noisy activation levels change relative order in primacy gradient. Similar activation of neighboring items most susceptible to noise. Model parameters fitted on these data. (2. TR) Bowing of serial position curve (Cowan etal 1999). % correct vs serial position. Auditory ISR with various list lengths (graphs shifted rightward): For [, sub-]span lists- extended primacy, with one (or two) item recency; Auditory presentation- enhanced performance for last items. LIST PARSE: End effects- first and last items half as many members; Echoic memory- last presented item retained in separate store. (3. BL) List length effects, circles (Crannell, Parrish 1968), squares (Baddeley, Hitch 1975), solid line- simulation. % list correct vs list length. Variable list length ISR: longer lists are more difficult to recall. LIST PARSE: More items- closer activation levels and lower absolute activity level with enough inputs; Noise is more likely to produce order errors, Activity levels more likely to drop below threshold;. (4. BR) Limited temporal extent for recall (Murdock 1961). % recalled vs retention interval (s). ISR task with distractor-filled retention intervals (to prevent rehearsal): Increasing retention interval - decreases probability of recalling list correctly; Load dependence- longer list more affected by delays; Performance plateau- subjects reach apparent asymptote. LIST PARSE: Increase convergence of activities with time; loss of order information;.
  • image p447fig12.45 (left column) LIST PARSE simulations of the proportion of order errors as a function of serial position for 6 item lists with (a) an extended pause of 7 time units between the third and fourth items, and (b) pauses of 5 time units (solid curve) and 10 time units (dashed curve) between all items. (right column) Simulations (solid curves) and data (dashed curves) illustrating close model fits in various immediate free recall tasks.
    || (Left) Temporal grouping and presentation variability. Temporal grouping: Inserting an extended pause leads to inter-group bowing; Significantly different times of integration and activity levels across pause, fewer interchanges. (Right) Immediate free recall, and [delayed, continuous] distractor-free recall. Overt rehearsal IFR task with super-span (ie 20 item) lists: Extended recency- even more extended with shorter ISIs; Increased probability of recall with diminished time from last rehearsal; Early items in list rehearsed most;. LIST PARSE (unique) for long lists: Incoming items form a recency gradient; Rehearsal (re-presentation) based upon level of activity.
  • image p452fig12.48 (left column) In experiments of (Repp etal 1978), the silence duration between the words GRAY and SHIP was varied, as was the duration of the fricative noise in S, with surprising results. (right column) The red arrow directs our attention to surprising perceptual changes as silence and noise durations increase. See the text for details.
    || Perceptual integration of acoustic cues, data (Repp etal 1978). GRAY-> silence duration-> SHIP (noise duration from start of word). Noise duration vs silence duration: GRAY SHIP <-> [GREAT SHIP <-> GRAY CHIP] <-> GREAT CHIP.
  • image p453fig12.49 The ARTWORD model that I published in 2000 with my PhD student Christopher Myers simulates data such as the (Repp etal 1978) data in Figure 12.48. See the text for details.
    || ARTWORD model (Grossberg, Myers 2000). Input phonetic features-> Phonemic item working memory-> Masking Field unitized lists-> Automatic gain control-> Phonemic item working memory. [habituative gate, adaptive filter]s.
  • image p465fig12.63 Neurophysiological data (left image) and lisTELOS simulation (right image) showing how microstimulation biases saccadic performance order but not the positions to which the saccades will be directed. See the text for details.
    || Saccade trajectories converge to a single location in space. Microstimulation biased selection so saccade trajectories converged toward a single location in space. [Data, model] contra <-> Ipsi (msec)
  • image p468fig12.65 Linguistic properties of the PHONET model and some of the data that it simulates. The upper left image summarizes the asymmetric transient-to-sustained gain control that helps to create invariant intraword ratios during variable-rate speech. The lower left image summarizes the rate-dependent gain control of the ARTPHONE model that creates rate-invariant working memory representations in response to sequences of variable-rate speech. The right image summarizes the kind of paradoxical VC-CV category boundary data of (Repp 1980) that ARTPHONE simulates. See the text for details.
    || (left upper) [transient, sustained] [working memory, filter, category]. (left lower) phone inputs-> [input rate estimate, features], Features w <- habituative transmitter gates -> categories-> rate invariant phonetic output, input rate estimate-> gain control-> [features, categories] rate-dependent integration of categories and features. (right) % 2-stop vs VC-CV silent interval (msec): [ib-ga, ib-ba, iga, iba].
  • image p469fig12.66 (left column) A schematic of how preserving relative duration, as in the first and third images, of consonant and vowel pairs can preserve a percept, in this case of /ba/, but not doing so, as in the first and second images, can cause a change in percept, as from /ba/ to /wa/, as in the data of (Miller, Liberman 1979) that PHONET simulates. (right column) Changing frequency extent can also cause a /ba/ - /wa/ transition, as shown in data of (Schwab, Sawusch, Nusbaum 1981) that PHONET also simulates.
    || (left image) Maintaining relative duration as speech speeds up preserves percept (Miller, Liberman 1979). frequency vs time- [/ba/, /wa/, /ba/] (right image) Changing frequency extent causes /b/-/wa/ transition (Schwab, Sawusch, Nusbaum 1981). frequency vs time- [/ba/, /wa/] Dt extent.
  • image p473fig12.69 Error rate and mean reaction time (RT) data from the lexical decision experiments of (Schvaneveldt, McDonald 1981). ART Matching Rule properties explain these data in (Grossberg, Stone 1986).
    || (left) Error rate vs type of prime [R, N, U], [non,] word. (right) Mean RT (msec) vs type of prime [R, N, U], [non,] word.
  • image p474fig12.70 The kind of model macrocircuit that was used in (Grossberg, Stone 1986) to explain lexical decision task data.
    || inputs-> A1 <-> A2 iconic sensory features <-> A3 item and order in sensory STM <-> A4 list parsing in STM (masking field) <-> A5 semantic network (self-feedback). [A4, A5] <-> V* visual object recognition system. M1-> [outputs, A1]. M1 <-> M2 iconic motor features <-> M3 item and order in motor STM. A2-> M2. A3-> M3.
  • image p476fig12.71 Word frequency data of (Underwood, Freund 1970) that were explained in (Grossberg, Stone 1986).
    || percent errors vs frequency of old words [L-H to H-H, L-L to H-L].
  • image p484fig13.04 The top-down feedback from the orbitofrontal cortex closes a feedback loop that supports a cognitive-emotional resonance. If this resonance can be sustained long enough, it enables us to have feelings at the same time that we experience the categories that caused them.
    || Cognitive-Emotional resonance. Basis of "core consciousness" and "the feeling of what happens". (Damasio 1999) derives heuristic version of CogEM model from his clinical data. Drive-> amygdala-> prefrontal cortex-> sensory cortex, resonance around the latter 3. How is this resonance maintained long enough to become conscious?
  • image p485fig13.06 (left column) An inverted-U occurs in conditioned reinforcer strength as a function of the ISI between the CS and the US. Why is learning attenuated at 0 ISI? (right column) Some classical conditioning data that illustrate the inverted-U in conditioning as a function of the ISI.
    || InterStimulus Interval (ISI) effect. Data from (Smith etal 1969; Schneiderman, Gormezano 1964).
  • image p490fig13.15 The CogEM circuit is an ancient design that is found even in mollusks like Aplysia. See the text for details.
    || Aplysia (Buonomano, Baxter, Byrne, Neural Networks 1990; Grossberg, Behavioral and Brain Sciences 1983). Facilitator neuron ~ drive representation.
  • image p504fig13.31 Behavioral contrast can occur during reinforcement learning due to decreases in either positive or negative reinforcers. See Figure 13.32 for illustrative operant conditioning data.
    || Behavioral contrast: rebounds! Shock level vs trials. 1. A sudden decrease in frequency or amount of food can act as a negative reinforcer: Frustration. 2. A sudden decrease in frequency or amount of shock can act as a positive reinforcer: Relief.
  • image p523fig14.03 (a) The MOTIVATOR neural model generalizes CogEM by also including the basal ganglia. It can hereby explain and simulate complementary functions of the amygdala and basal ganglia (SNc) during conditioning and learned performance. The basal ganglia generate Now Print signals in response to unexpected rewards. These signals modulate learning of new associations in many brain regions. The amygdala supports motivated attention to trigger actions that are expected to occur in response to conditioned or unconditioned stimuli. Object Categories represent visual or gustatory inputs in anterior inferotemporal (ITA) and rhinal (RHIN) cortices, respectively. Value Categories represent the value of anticipated outcomes on the basis of hunger and satiety inputs, in amygdala (AMYG) and lateral hypothalamus (LH). Object-Value Categories resolve the value of competing perceptual stimuli in medial (MORB) and lateral (ORB) orbitofrontal cortex. The Reward Expectation Filter detects the omission or delivery of rewards using a circuit that spans ventral striatum (VS), ventral pallidum (VP), striosomal delay (SD) cells in the ventral striatum, the pedunculopontine nucleus (PPTN) and midbrain dopaminergic neurons of the substantia nigra pars compacta/ventral tegmental area (SNc/VTA). The circuit that processes CS-related visual information (ITA, AMYG, ORB) operates in parallel with a circuit that processes US-related visual and gustatory information (RHIN, AMYG, MORB). (b) Reciprocal adaptive connections between hypothalamus and amygdala enable amygdala cells to become learned value categories. The bottom region represents hypothalamic cells, which receive converging taste and metabolite inputs whereby they become taste-drive cells. Bottom-up signals from activity patterns across these cells activate competing value categories, or US Value Representations, in the amygdala. A winning value category learns to respond selectively to specific combinations of taste-drive activity patterns and sends adaptive top-down priming signals back to the taste-drive cells that activated it. CS-activated conditioned reinforcer signals are also associatively linked to value categories. Adaptive connections end in (approximately) hemidiscs. See the text for details.
    ||
  • image p533fig14.09 Search data and ARTSCENE Search simulations of them in each pair of images from (A) to (F). See the text for details.
    || 6*[data vs simulation], [Response time (ms) versus epoch].
  • image p542fig15.04 Conditioning data from (Smith 1968; Millenson etal 1977). The former shows the kind of Weber Law and inverted U that were simulated in Figure 15.3. The latter shows that, if there are two ISIs during an experiment, then the animals learn to adaptively time their responses with two properly scaled Weber laws.
    || (left) One ISI (Smith 1968) [mean membrane extension (mm) versus time after CS onset (msec)]. (right) Two ISIs (Millenson etal 1977) [200, 100] msec CS test trials, [mean momentary CS amplitude (mm) vs time after CS onset (msec)]. (bottom) Conditioned eye blinks, made with nictitating membrane and/or eyelid, are adaptively timed: peak closure occurs at expected time(s) of arrival of the US following the CS and obeys a Weber Law.
  • image p543fig15.05 Simulation of conditioning with two ISIs that generate their own Weber Laws, as in the data shown in Figure 15.4.
    || Learning with two ISIs: simulation: R = sum[all: f(xi)*yi*xi] vs msec. Each peak obeys Weber Law! strong evidence for spectral learning.
  • image p556fig15.24 (a) Data showing normally timed responding (solid curve) and short latency responses after lesioning cerebellar cortex (dashed curve). (b) computer simulation of short latency response after ablation of model cerebellar cortex.
    ||
  • image p559fig15.28 Neurophysiological data (left column) and model simulations (right column) of SNc responses. See the text for details.
    || membrane potential vs time
  • image p573fig16.01 The experimental chamber (A) and neurophysiological recordings from a rat hippocampus (B) that led to the discovery of place cells. See the text for details.
    ||
  • image p574fig16.02 Neurophysiological recordings of 18 different place cell receptive fields. See the text for details.
    ||
  • image p575fig16.03 As a rat navigates in its experimental chamber (black curves), neurophysiological recordings disclose the firing patterns (in red) of (a) a hippocampal place cell and (b) an entorhinal grid cell.
    ||
  • image p582fig16.08 Some experimental evidence for stripe-like cell receptive fields has been reported. The band cells posited by Neil Burgess also exhibit the one-dimensional firing symmetry of stripe cells, but are modeled by oscillatory interference. See the text for details.
    || Evidence for stripe-like cells. Entorhinal cortex data (Sargolini, Fyhn, Hafting, McNaughton, Witter, Moser, Moser 2006; Krupic, Burgess, O'Keefe 2012). Similar hypothetical construct used by Interference model but position is decoded by grid cell oscillatory interference- Band Cells (Burgess 2008).
  • image p589fig16.19 Neurophysiological data showing the smaller dorsal grid cell scales and the larger ventral grid cell scales.
    || Spatial scale of grid cells increases along the MEC dorsoventral axis (Hafting etal 2005; Sargolini etal 2006; Brun etal 2008). [dorsal (left), ventral (right)] cart [rate map, autocorrelogram]. How does the spatial scale increase along the MEC dorsoventral axis?
  • image p593fig16.26 Data (left column) and simulations (right column) of the gradient of increasing grid cell spacing along the dorsoventral axis of MEC.
    || Gradient of grid spacing along dorsoventral axis of MEC (Brun etal 2008). data-[Distance (m?), Median grid spacing (m?)] simulations-[Grid spacing (cm), Grid spacing (cm)] vs response rate.
  • image p594fig16.27 Data (left column) and simulations (right column) of the gradient of increasing grid cell field width along the dorsoventral axis of MEC.
    || Gradient of field width along dorsoventral axis of MEC (Brun etal 2008). data-[Distance (m?), Width autocorr peak (m?)] simulations-[Grid field width (cm), Width autocorr peak (cm)] vs response rate.
  • image p595fig16.28 Data (left column) and simulations (right column) about peak and mean grid cell response rates along the dorsoventral axis of MEC.
    || Peak and mean rates at different locations along DV axis of MEC (Brun etal 2008). Peak rate (Hz) vs [data- DV quarter, simulations- Response rate].
  • image p596fig16.29 Data (top row) and simulations (bottom row) showing decreasing frequency of subthreshold membrane potential oscillations along the DV axis of MEC.
    || Subthreshold membrane potential oscillations at different locations along DV axis of MEC (Giocomo etal 2020; Yoshida etal 2011). Data [oscillations (Hz) vs distance from dorsal surface (mm) @[-50, -45] mV, Frequency (Hz) vs [-58, -54, -50] mV]. Simulations: MPO frequency (Hz) vs [response, habituation] rate.
  • image p596fig16.30 Data (top row) and simulations (bottom row) of spatial phases of learned grid and place cells.
    || Spatial phases of learned grid and place cells (Hafting etal 2005). Data: Cross-correlogram of rate maps of two grid cells; Distribution of phase difference: distance from origin to nearest peak in cross-correlogram. Simulations: Grid cell histogram of spatial correlation coefficients; Place cell histogram of spatial correlation coefficients.
  • image p597fig16.31 Data (a) and simulations (b-d) about multimodal place cell receptive fields in large spaces. The simulations are the result of learned place fields.
    || Multimodal place cell firing in large spaces (Fenton etal 2008; Henriksen etal 2010; Park etal 2011). Number of cells (%) vs Number of place fields. [2, 3] place fields, 100*100 cm space.
  • image p597fig16.32 Data (top row) and simulations (bottom row) about grid cell development in juvenile rats. Grid score increases (a-b and d), whereas grid spacing remains fairly flat (c and e).
    || Model fits data about grid cell development (Wills etal 2010; Langston etal 2010). Data: [Gridness, grid score, inter-field distance (cm)]. Simulations: [Gridness score, Grid spacing (cm)] vs trial.
  • image p598fig16.33 Data (top row) and simulations (bottom row) of changes in place cell properties in juvenile rats, notably about spatial information (a,c) and inter-trial stability (b,d).
    || Model fits data about grid cell development (Wills etal 2010). [Data, Simulation] vs [spatial information, inter-trial stability]. x-axis [age (postnatal day), trial].
  • image p599fig16.35 Data (a) and simulations (b,c) about anatomically overlapping grid cell modules. (a) shows the anatomical distribution of grid cells belonging to different modules in one animal. DV location (mm) vs postrhinal border. (b) shows the simulated distribution of learned grid cell spacings from two stripe cell scales. frequency (%) vs grid spacing (cm). mu = [1, 0.6]. (c) shows what happens when half the cells respond with one rate and half another rate. (d) shows the same with three rates. (e-g) show spatial maps and autocorrelograms of grid cells that arise from the different rates in (d). [rate map, autocorrelogram] vs [score [1.07, 0.5, 0.67], spacing (cm) [23.58, 41, 63.64]].
    ||
  • image p602fig16.37 Data showing the effect of hippocampal inactivation by muscimol on grid cell firing before, during, and six hours after the muscimol, reading from left to right.
    || Hippocampal inactivation disrupts grid cells (Bonnevie etal 2013). muscimol inactivation. spikes on trajectory: [before, after min [6-20, 20-40, 40-60, 6h]]. rate map (Hz) [18.6, 11.4, 9.5, 6.7, 10.8]. spatial autocorrelogram g=[1.12, 0.05, -0.34, 0.09, 1.27].
  • image p603fig16.38 Role of hippocampal feedback in maintaining grid fields. (a) Data showing the effect of hippocampal inactivation before and during muscimol inhibition of hippocampal cells, as in Figure 16.37. (b) Model simulation with normal grid fields. (c) Model simulation that emulates the effect of hippocampal inhibition on grid fields.
    || (a) Data: hippocampal inactivation [before, after] cart [spikes on trajectory (p: [18.6, 6.7] Hz), spatial autocorrelogram (g= [1.12, 0.09])]. (b) Model: noise-free path integration, [spikes on trajectory (p: 14.56 Hz), rate map, spatial autocorrelogram (g= 1.41), dynamic autocorrelogram (g=0.6)]. (c) Model: noisy path integration + non-specific tonic inhibition, [spikes on trajectory (p: 11.33 Hz), rate map, spatial autocorrelogram (g= 0.05), dynamic autocorrelogram (g=0.047)].
  • image p605fig16.39 Data showing effects of medial septum (MS) inactivation on grid cells and network theta oscillations in medial entorhinal cortex (MEC). (A) Examples of disruption in the spatial expression of the hexagonal grid structure for two grid cells (Brandon etal 2011). (B) Temporal reduction in the power and frequency of network theta oscillations (Koenig etal 2011). (C) Temporary reduction in the gridness score, mean firing rate, and patial stability of grid cells (Koenig etal 2011).
    || Disruptive effects of Medial Septum inactivation in Medial Entorhinal Cortex (Brandon etal 2011; Koenig etal 2011). (A) Rate map [rate map, spatial autocorrelations, trajectory] vs [baseline, sub-sampled, medial septum inactivation, 3-6 hour recovery, 24 hour recovery], [rate map (Hz- m, p), spatial autocorrelations (gridness)][ 1.2, 7.2, 1.1; 0.25, 1.7, 0.6; 0.25, 2.5, -0.53; 0.7, 5.1, 0.55; 1.0, 5.3, 1.3; 2.1, 15, 0.19; 1.7, 12, 0.71; 1.7, 3.2, -0.22; 1.8, 9.1, 0.68; 2.5, 13, 0.46]. (B) [normalized power at 7-9 Hz, frequency (Hz)] vs 5-minute periods. (C) [mean gridness score (+-SEM), mean firing rate (% of baseline), mean correlation coeff (+-SEM)] vs 10-minute periods.
  • image p607fig16.40 Effects of medial septum (MS) inactivation on grid cells. (a) Each row shows data and different data-derived measures of grid cell responsiveness, starting from the left with the baseline response to the middle column with maximal inhibition. (b) Data showing the temporary reduction in the gridness scores during MS inactivation, followed by recovery. (c) Simulation of the collapse in gridness, achieved by reduction in cell response rates to mimic reduced cholinergic transmission. (d,e) Simulations of the reduction in gridness scores in (d) by reduction of cell response rates, in (e) by changing the leak conductance. See the text for details.
    ||
  • Grossberg 2021 p229c2h0.60 SMART computer simulations demonstrate that a good enough match of a top-down expectation with a bottom-up feature pattern generates an attentive resonance during which the spikes of active cells synchronize in the gamma frequency range of 20-70 Hz (Figure 5.40). Many labs have reported a link between attention and gamma oscillations in the brain, including two articles published in 2001, one from the laboratory of Robert Desimone when he was at the National Institute of Mental Health in Bethesda (Fries, Reynolds, Rorie, Desimone 2001), and the other from the laboratory of Wolf Singer in Frankfurt (Engel, Fries, Singer 2001). You'll note that Pascal Fries participated in both studies, and is an acknowledged leader in neurobiological studies of gamma oscillations; eg (Fries 2009). ..."
  • Grossberg 2021 p081c2h0.66 sub-section: How does one evolve a computational brain?
    The above discussion illustrates that no single step of theoretical derivation can derive a whole brain. One needs a method for deriving a brain in stages, or cycles, much as evolution has incrementally discovered ever more complex brains over many thousands of years. The following theoretical method has been successfully applied many times since I first used it in 1957. It embodies a kind of conceptual evolutionary process for deriving a brain.

    Because "brain evolution needs to achieve behavioural success", we need to start with data that embody indices of behavioral success. That is why, as illustrated in Figure 2.37 Modelling method and cycle, one starts with Behavioral Data from scores or hundreds of psychological experiments. These data are analyzed as the result of an individual adapting autonomously in real time to a changing world. This is the Art of Modeling. It requires that one be able to infer from static data curves the dynamical processes that control individual behaviors occurring in real time. One of the hardest things that I teach my students to do is "how to think in real time" to be able to carry out this speculative leap.

    Properly carried out, this analysis leads to the discovery of new Design Principles that are embodied by these behavioral processes. The Design Principles highlight the functional meaning of the data, and clarify how individual behaviors occurring in real time give rise to these static data curves.

    These principles are then converted into the simplest Mathematical Model using a method of minimal anatomies, which is a form of Occam's Razor, or principle of parsimony. Such a mathematical model embodies the psychological principles using the simplest possible differential equations. By "simplest" I mean that, if any part of the derived model is removed, then a significant fraction of the targeted data could no longer be explained. One then analyzes the model mathematically and simulates it on the computer, showing along the way how variations on the minimal anatomy can realize the design principles in different individuals or species.

    This analysis has always provided functional explanations and Behavioral Predictions for much larger behavioral data bases than those used to discover the Design Principles. The most remarkable fact is, however, that the behaviorally derived model always looks like part of a brain, thereby explaining a body of challenging Neural Data and making novel Brain Predictions.

    The derivation hereby links mind to brain via psychological organizational principles and their mechanistic realization as a mathematically defined neural network. This startling fact is what I first experienced as a college Freshman taking Introductory Psychology, and it changed my life forever.

    I conclude from having had this experience scores of times since 1957 that brains look the way they do because they embody a natural computational realization for controlling autonomous adaptation in real-time to a changing world. Moreover, the Behavior -> Principles -> Model -> Neural derivation predicts new functional roles for both known and unknown brain mechanisms by linking the brain data to how it helps to ensure behavioral success. As I noted above, the power of this method is illustrated by the fact that scores of these predictions about brain and behavior have been supported by experimental data 5-30 years after they were first published.

    Having made the link from behavior to brain, one can then "burn the candle from both ends" by pressing both top-down from Behavioral Data and bottom-up from Brain Data to clarify what the model can and cannot explain at its current stage of derivation. No model can explain everything. At each stage of development, the model can cope with certain environmental challenges but not others. An important part of the mathematical and computational analysis is to characterize the boundary between the known and unknown; that is, which challenges the model can cope with and which it cannot. The shape of this boundary between the known and unknown helps to direct the theorist's attention to new design principles that have been omitted from previous analysis.

    The next step is to show how these new design principles can be incorporated into the evolved model in a self-consistent way, without undermining its previous mechanisms, thereby leading to a progressively more realistic model, one that can explain and predict ever more behavioral and neural data. In this way, the model undergoes a type of evolutionary development, as it becomes able to cope behaviorally with environmental constraints of ever increasing subtlety and complexity. The Method of Minimal Anatomies may hereby be viewed as a way to functionally understand how increasingly demanding combinations of environmental pressures were incorporated into brains during the evolutionary process.

    If such an Embedding Principle cannot be carried out - that is, if the model cannot be unlumped or refined in a self-consistent way - then the previous model was, put simply, wrong, and one needs to figure out which parts must be discarded. Such a model is, as it were, an evolutionary dead end. Fortunately, this has not happened to me since I began my work in 1957 because the theoretical method is so conservative. No theoretical addition is made unless it is supported by multiple experiments that cannot be explained in its absence. Where multiple mechanistic instantiations of some Design Principles were possible, they were all developed in models to better understand their explanatory implications. Not all of these instantiations could survive the pressure of the evolutionary method, but some always could. As a happy result, all earlier models have been capable of incremental refinement and expansion.

    The cycle of model evolution has been carried out many times since 1957, leading today to increasing numbers of models that individually can explain and predict psychological, neurophysiological, anatomical, biophysical, and even biochemical data. In this specific sense, the classical mind-body problem is being incrementally solved.

    Howell: bold added for emphasis.
    (keys : Principles-Principia, behavior-mind-brain link, brain evolution, cycle of model evolution)
    see also quotes: Charles William Lucas "Universal Force" and others (not retyped yet).
  • p190 Howell: [neural microcircuits, modal architectures] used in ART -
    bottom-up filters | top-down expectations | purpose
    instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star often used to learn the adaptive weights; top-down expectations select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features
    LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
    EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
    auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern; F2 is reset if degree of match < vigilance
    auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
    visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown. Colour code: red - cognitive-emotional dynamics; green - working memory dynamics; black - see [bottom-up, top-down] lists
    EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
    Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap, between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

    background colours in the table signify :
    white | general microcircuit : a possible component of ART architecture
    lime green | sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
    light blue | post-perceptual cognition?
    pink | "the feeling of what happens" and knowing what event caused that feeling
  • image p085fig02.38 Our models have been used in many large-scale applications to engineering and technology. Linking brain to behavior explains how brain mechanisms give rise to psychological functions, and do so autonomously. The combination of mechanism, function, and autonomy helps to explain their value in solving outstanding problems in technology.
    || Modeling method and cycle.
    Behavioral data -(art of modeling)-> Design principles <- Neural data <-(brain predictions)- Mathematical model and analysis -(behavioral predictions)-> Behavioural data
    Technology: Mathematical model and analysis <-> Technological applications
    At every stage, spin off new model designs and mechanisms to technologists who need autonomous intelligent applications.
  • image p225fig05.33 Some early ARTMAP benchmark studies. These successes led to the use of ARTMAP, and many variants that we and other groups have developed, in many large-scale applications in engineering and technology, a use that has not abated even today.
    || see Early ARTMAP benchmark studies
  • image p490fig13.15 The CogEM circuit is an ancient design that is found even in mollusks like Aplysia. See the text for details.
    || Aplysia (Buonomano, Baxter, Byrne, Neural Networks 1990; Grossberg, Behavioral and Brain Sciences 1983). Facilitator neuron ~ drive representation.
  • image p563fig15.33 The basal ganglia gate neural processing in many parts of the brain. The feedback loop through the lateral orbitofrontal cortex (blue arrow, lateral orbitofrontal) is the one that MOTIVATOR models.
    || MOTIVATOR models one of several thalamocortical loops through basal ganglia (Adapted from Fundamental Neuroscience. 2002 Copyright Elsevier). [cortex-> striatum-> pallidum S. nigra-> thalamus] vs [motor, oculomotor, dorsolateral prefrontal, lateral orbitofrontal, anterior cingulate]. thalamus-> [striatum, cortex].
  • image p563fig15.34 The colored regions are distinct parts of the basal ganglia in the loops depicted in Figure 15.33.
    || Distinct basal ganglia zones for each loop (Adapted from Fundamental Neuroscience. 2002 Copyright Elsevier).
  • p370 Chapter 11 means (Grossberg 2021) page 370, Chapter 11
    p002sec Illusion and reality means (Grossberg 2021) page 2, section Illusion and reality
    p013fig01.09 means (Grossberg 2021) page 13, Figure 1.09 (1.9 as in book)
    p030tbl01.02 means (Grossberg 2021) page 30, Table 1.02 (1.2 as in book)
    p111c2h0.5 means (Grossberg 2021) page 111, column 2, height from top as fraction of page height
    || text... are notes in addition to [figure, table] captions, mostly comprised of text within the image, but also including quotes of text in the book. Rarely, it includes comments by Howell preceded by "Howell". The latter are distinct from "readers notes" (see, for example : reader Howell notes).
    p044 Howell: grepStr 'conscious' means a comment by reader Howell, extracted using the grep string shown, referring to page 44 in (Grossberg 2021)
  • p00I Preface - Biological intelligence in sickness, health, and technology
  • p001 Chapter 1 Overview - From Complementary Computing and Adaptive Resonance to conscious awareness
  • p050 Chapter 2 How a brain makes a mind - Physics and psychology split as brain theories were born
  • p086 Chapter 3 How a brain sees: Constructing reality - Visual reality as illusions that explain how we see art
  • p122 Chapter 4 How a brain sees: Neural mechanisms - From boundary completion and surface filling-in to figure-ground perception
  • p184 Chapter 5 Learning to attend, recognize, and predict the world -
  • p250 Chapter 6 Conscious seeing and invariant recognition - Complementary cortical streams coordinate attention for seeing and recognition
  • p280 Chapter 7 How do we see a changing world? - How vision regulates object and scene persistence
  • p289 Chapter 8 How we see and recognize object motion - Visual form and motion perception obey complementary laws
  • p337 Chapter 9 Target tracking, navigation, and decision-making - Visual tracking and navigation obey complementary laws
  • p353 Chapter 10 Laminar computing by cerebral cortex - Towards a unified theory of biological and artificial intelligence
  • p370 Chapter 11 How we see the world in depth - From 3D vision to how 2D pictures induce 3D percepts
  • p404 Chapter 12 From seeing and reaching to hearing and speaking - Circular reaction, streaming, working memory, chunking, and number
  • p480 Chapter 13 From knowing to feeling - How emotion regulates motivation, attention, decision, and action
  • p517 Chapter 14 How prefrontal cortex works - Cognitive working memory, planning, and emotion conjointly achieve valued goals
  • p539 Chapter 15 Adaptively timed learning - How timed motivation regulates conscious learning and memory consolidation
  • p572 Chapter 16 Learning maps to navigate space - From grid, place, and time cells to autonomous mobile agents
  • p618 Chapter 17 A universal development code - Mental measurements embody universal laws of cell biology and physics
  • image pxvifig00.01 Macrocircuit of the visual system
  • image p002fig01.01 The difference between seeing and recognizing.
    || (W. Epstein, R. Gregory, H. von Helmholtz, G. Kanizsa, P. Kellman, A. Michote...) Seeing an object vs Knowing what it is. Seeing Ehrenstein illusion (See, recognize) vs Recognizing offset grating (Do not see, recognize). Offset grating: some boundaries are invisible or amodal.
  • image p002fig01.02 Dalmation in snow
    || p002c2h0.55 "...This image reminds us that invisible boundaries can sometimes be very useful in helping us to recognize visual objects in the world. ... When we first look at this picture, it may just look like an array of black splotches of different sizes, densities, and orientations across the picture. Gradually, however, we can recognize the Dalmatian in it as new boundaries form in our brain between the black splotches. ..."
  • image p003fig01.03 Amodal completion
    || p003c1h0.75 "... Figure 1.3 illustrates what I mean by the claim that percepts derived from pictures are often illusions. Figure 1.3 (left column) shows three rectangular shapes that abut one another. Our percept of this image irresistibly creates a different interpretation, however. We perceive a horizontal bar lying in front of a partially occluded vertical bar that is amodally completed behind it. ..."
  • image p004fig01.04 (top row) Kanizsa stratification; (bottom row) transparency images
    || [top row images] "... are called stratification percepts... This simple percept can ... be perceived either as a white cross in front of a white outline square, or as a white outline square in front of a white cross. The former percept usually occurs, but the percept can intermittently switch between these two interpretations. ...it is said to be a bistable percept. ..."
  • image p008fig01.05 Noise-saturation dilemma.
    || cell activity vs cell number; [minimum, equilibrium, current, maximal] activity
  • image p009fig01.06 Primacy gradient of activity stored in working memory within a recurrent shunting on-center off-surround network. Rehearsal is controlled by a nonspecific rehearsal wave and self-inhibitory feedback of the item that is currently being rehearsed. Green = excitatory, red = inhibitory
    || inputs? -> item and order WM storage -> competitive selection-> rehearsal wave -> outputs
  • image p011fig01.07 The choice of signal function f determines how an initial activity pattern will be transformed and stored in short-term memory (STM). Among [same, slower, faster]-than-linear signal functions, only the last one can suppress noise. It does so as it chooses the population that receives the largest input for storage, while suppressing the activities of all other populations, thereby giving rise to a winner-take-all choice.
    || initial pattern (xi(0) vs i):
    f | Xi(∞) = xi(∞)/sum[j: xj(∞)] | x(∞)
    linear | perfect storage of any pattern | amplifies noise (or no storage)
    slower-than-linear | saturates | amplifies noise
    faster-than-linear | chooses max [winner-take-all, Bayesian], categorical perception | suppresses noise, [normalizes, quantizes] total activity, finite state machine
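    Howell: a minimal Python sketch (my own illustration, not from the book; parameters and the quadratic signal function are arbitrary choices) of how the signal function f changes what a recurrent shunting competitive network stores after its inputs shut off. A linear f roughly preserves the initial activity ratios, while a faster-than-linear f tends toward a winner-take-all choice:
        import numpy as np

        def run_rcf(f, x0, A=1.0, B=3.0, dt=0.001, T=30.0):
            # Recurrent competitive field: dx_i/dt = -A*x_i + (B - x_i)*f(x_i) - x_i*sum_{k!=i} f(x_k)
            x = np.array(x0, dtype=float)
            for _ in range(int(T / dt)):
                fx = f(x)
                x += dt * (-A * x + (B - x) * fx - x * (fx.sum() - fx))
                x = np.clip(x, 0.0, B)               # keep activities within [0, B]
            return x

        x0 = [0.2, 0.5, 0.3]                         # pattern established by prior inputs
        for name, f in [("linear", lambda w: w), ("faster-than-linear", lambda w: w ** 2)]:
            x = run_rcf(f, x0)
            print(name, np.round(x / x.sum(), 3))    # normalized pattern stored in STM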
  • image p012fig01.08 A sigmoidal signal function is a hybrid signal that combines the best properties of [faster, same, slower]-than linear signals. It can suppress noise and store a partially contrast-enhanced activity pattern. slower-than-linear saturates pattern; approximately linear- preserves pattern and normalizes; faster-than-linear- noise suppression and contrast-enhancement.
    || Sigmoidal signal: a hybrid. (upper) saturates pattern- slower-than-linear; (middle) preserves pattern and normalizes- approximately linear. (lower) noise suppression and contrast enhancement- faster-than-linear.
  • image p013fig01.09 A sigmoid signal function generates a quenching threshold below which cell activities are treated like noise and suppressed. Activities that are larger than the quenching threshold are contrast enhanced and stored in short-term memory.
    || Quenching threshold. xi(0) vs i.
    f | Xi(∞) = xi(∞)/sum[j: xj(∞)] | x(∞)
    sigmoid | tunable filter; stores infinitely many contrast-enhanced patterns | suppresses noise
  • image p016fig01.10 The blocking paradigm shows how sensory cues that are conditioned to predict specific consequences can attentionally block other cues that do not change those predictions. On the other hand, if the total cue context is changed by adding a cue that does not change the predicted consequences, then the new cues can be conditioned to the direction of that change. They can hereby learn, for example, to predict fear if the shock level unexpectedly increases, or relief if the shock level unexpectedly decreases.
    || Minimal adaptive prediction. blocking- CS2 is irrelevant, unblocking- CS2 predicts US change. Learn if CS2 predicts a different (novel) outcome than CS1. CS2 is not redundant.
  • image p016fig01.11 A sufficiently big mismatch between a bottom-up input pattern and a top-down expectation can activate the orienting system, which triggers a burst of nonspecific arousal that can reset the recognition category that read out the expectation. In this way, unexpected events can reset short-term memory and initiate a search for a category that better represents the current situation.
    || [category- top-down (TD) expectation; Bottom-up (BU) input pattern] -> Feature pattern -> BU-TD mismatch -> orienting system -> non-specific arousal -> category.
  • image p018fig01.12 Peak shift and behavioural contrast. When a negative generalization gradient (in red) is subtracted from a positive generalization gradient (in green), the net gradient (in purple) is shifted away from the negative gradient and has a width that is narrower than any of its triggering gradients. Because the total activity of the network tends to be normalized, the renormalized peak of the net gradient is higher than that of the rewarded gradient, thereby illustrating that we can prefer experiences that we have never previously experienced over those for which we have previously been rewarded.
    ||
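    Howell: a tiny Python sketch (my own parameter choices, purely illustrative) of the peak-shift computation described above: subtracting a negative Gaussian generalization gradient from a positive one and renormalizing total activity moves the net peak away from the punished stimulus, and the renormalized peak can exceed the rewarded one:
        import numpy as np

        x = np.linspace(-10, 10, 2001)                       # stimulus dimension (arbitrary units)
        pos = np.exp(-0.5 * ((x - 0.0) / 2.0) ** 2)          # rewarded (positive) gradient, peak at 0
        neg = 0.6 * np.exp(-0.5 * ((x - 3.0) / 2.0) ** 2)    # punished (negative) gradient, peak at +3
        net = np.clip(pos - neg, 0.0, None)                  # net gradient, rectified at zero
        net_norm = net / net.sum() * pos.sum()               # crude renormalization of total activity
        print("peak of rewarded gradient:", x[pos.argmax()])
        print("peak of net gradient:     ", x[net.argmax()]) # shifted away from the punished peak
        print("peak heights, rewarded vs renormalized net:",
              round(pos.max(), 3), round(net_norm.max(), 3))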
  • image p019fig01.13 Affective circuits are organized into opponent channels, such as fear vs. relief, and hunger vs. frustration. On a larger scale of affective behaviours, exploration and consummation are also opponent types of behaviour. Exploration helps to discover novel sources of reward. Consummation enables expected rewards to be acted upon. Exploration must be inhibited to enable an animal to maintain attention long enough upon a stationary reward in order to consume it.
    || exploration vs consummation
  • image p023fig01.14 A gated dipole opponent process can generate a transient antagonistic rebound from its OFF channel in response to offset of an input J to its ON channel. sustained on-response; transient off-response; opponent process; gates arousal: energy for rebound.
    ||
  • image p024fig01.15 A REcurrent Associative Dipole, or READ, circuit is a recurrent shunting on-center off-surround network with habituative transmitter gates. Sensory cues sample it with LTM traces and thereby become conditioned reinforcers.
    ||
  • image p025fig01.16 (left panel) The main processing stages of the Cognitive-Emotional-Motor (CogEM) model have anatomical interpretations in terms of sensory cortex, amygdala, and prefrontal cortex. Chapter 13 will describe in greater detail how CS cues activate invariant object categories in the sensory cortex, value categories in the amygdala, and object-value categories in the prefrontal cortex, notably the orbitofrontal cortex. The amygdala is also modulated by internal drive inputs like hunger and satiety. (right panel) Anatomical data support this circuit, as do many neurophysiological data.
    || drive -> amygdala -> prefrontal cortex <-> sensory cortex -> amygdala. [visual, somatosensory, auditory, gustatory, olfactory] cortex -> [amygdala, Orbital Prefrontal Cortex]. amygdala -> Lateral Prefrontal Cortex
  • image p025fig01.17 Sensory-drive heterarchy vs. drive hierarchy. How cues and drives interact to choose the drive and motivation that will control behavioral choices.
    || [drive inputs, sensory cue [before, after] cross-over] -> incentive motivation [eat, sex].
  • image p026fig01.18 Inverted U as a function of arousal. A Golden Mean at intermediate levels of arousal generates a combination of behavioral threshold, sensitivity, and activation that can support typical behaviors. Both underarousal and overarousal lead to symptoms that are found in mental disorders.
    || Behavior vs arousal.
    depression | under-aroused | over-aroused
    threshold | elevated | low
    excitable above threshold | Hyper | Hypo
    "UPPER" brings excitability "DOWN".
  • image p027fig01.19 The ventral What stream is devoted to perception and categorization. The dorsal Where stream is devoted to spatial representation and action. The Where stream is also often called the Where/How stream because of its role in the control of action.
    ||
    Spatial representation of action | Perception categorization
    WHERE dorsal | WHAT ventral
    Parietal pathway "where" | Temporal pathway "what"
    Posterior Parietal Cortex (PPC) | Inferior temporal Cortex (IT)
    Lateral Prefrontal Cortex (LPFC) | Lateral Prefrontal Cortex (LPFC)
  • image p029tbl01.01 Some pairs of complementary processing streams.
    ||
    visual boundary: interblob stream V1-V2-V4 | visual surface: blob stream V1-V2-V4
    visual boundary: interblob stream V1-V2-V4 | visual motion: magno stream V1-MT-MST
    WHAT stream | WHERE stream
    perception & recognition: inferotemporal & prefrontal areas | space & action: parietal & prefrontal areas
    object tracking: MT interbands & MSTv | optic flow navigation: MT+ bands & MSTd
    motor target position: motor & parietal cortex | volitional speed: basal ganglia
  • image p030tbl01.02 The What and Where cortical processing streams obey complementary laws. These laws enable the What stream to rapidly and stably learn invariant object categories without experiencing catastrophic forgetting, while the Where stream learns labile spatial and action representations to control actions that are aimed towards these objects.
    ||
    WHAT | WHERE
    spatially-invariant object learning and recognition | spatially-variant reaching and movement
    fast learning without catastrophic forgetting | continually update sensory-motor maps and gains
    IT InferoTemporal Cortex | PPC Posterior Parietal Cortex
    What | Where
    matching: excitatory | inhibitory
    learning: match | mismatch
  • image p030fig01.20 A schematic cross-section of a slice of laminar neocortex whose cells are organized in a characteristic way in six layers, which themselves may be organized into distinct sublaminae. The computational paradigm of Laminar Computing attempts to show how different parts of neocortex can represent and control very different kinds of behavior - including vision, speech, and cognition - using specializations of the same canonical laminar cortical design.
    || Projection fibres: Cortico[spinal, bulbar, pontine, striate, reticular, etc]; Thalamocortical fibres; Diffuse cortical afferent fibres: [nonspecific thalamocortical, Cholinergic, Monoaminergic]; Corticocortical efferents; Projection [cell, fibre]; Corticocortical efferent terminals.
  • image p032fig01.21 At least three parallel visual cortical streams respond to visual inputs that reach the retina. Two parvocellular streams process visual surfaces (blob stream) and visual boundaries (interblob stream). The magnocellular stream processes visual motion.
    || [Retina, LGNs, V[1,2,3,4], MT] to [What- inferotemporal areas, Where- parietal areas]: visual parallel streams [2x blob, 1x bound]
  • image p035fig01.22 A classical example of phonemic restoration. The spectrogram of the word "legislatures" is either excised, leaving a silent interval, or filled with broad-band noise. A percept of the restored phoneme is heard when it is replaced by noise, but not by silence.
    || [normal, silence, noise replaced] presentations. frequency (Hz) vs time (sec).
  • image p036fig01.23 As more items are stored in working memory through time, they can select larger chunks with which to represent the longer list of stored items.
    || [x, y, z] -> [xy, xyz]
  • image p037fig01.24 Only three processing stages are needed to learn how to store and categorize sentences with repeated words in working memory. See the text for more discussion.
    || IOR working memory (item chunk-> sequences) <-> IOR masking field: [item->list]<->[list->list] chunks. (<-> signifies <- expectation/attention, adaptive filter ->)
  • image p038fig01.25 The ART Matching Rule stabilizes real time learning using a [top-down, modulatory on-center, off-surround] network. Object attention is realized by such a network. See text for additional discussion.
    || ART Matching Rule [volition, categories, features]. [one, two] against one.
  • image p039tbl01.03 The link between consciousness and movement
    ||
    VISUAL | seeing, knowing, and reaching
    AUDITORY | hearing, knowing, and speaking
    EMOTIONAL | feeling, knowing, and acting
  • image p042tbl01.04 The six main kinds of resonances which support different kinds of conscious awareness that will be explained and discussed in this book.
    ||
    type of resonance | type of consciousness
    surface-shroud | see visual object or scene
    feature-category | recognize visual object or scene
    stream-shroud | hear auditory object or stream
    spectral-pitch-and-timbre | recognize auditory object or stream
    item-list | recognize speech and language
    cognitive-emotional | feel emotion and know its source
  • image p051fig02.01 Along the boundaries between adjacent shades of gray, lateral inhibition makes the darker areas appear even darker, and the lighter areas appear even lighter (Ernst Mach bands).
    ||
  • image p052fig02.02 Feature-category resonances enable us to rapidly learn how to recognize objects without experiencing catastrophic forgetting. Attentive matching between bottom-up feature pattern inputs and top-down expectations prevents catastrophic forgetting by focussing object attention upon expected patterns of features, while suppressing outlier features that might otherwise have caused catastrophic forgetting if they were learned also.
    || Adaptive Resonance. Attended feature clusters reactivate bottom-up pathways. Activated categories reactivate their top-down pathways. Categories STM, Feature patterns STM. Feature-Category resonance [synchronize, amplify, prolong]s system response. Resonance triggers learning in bottom-up and top-down adaptive weights: adaptive resonance!
  • image p057fig02.03 Some basic anatomical and physiological properties of individual neurons. See the text for additional discussion.
    ||
    physiology | cell body potential | axonal signal | chemical transmitter
    anatomy | nerve cell body | axon | synaptic knob, synapse
  • image p058fig02.04 Serial learning paradigm: Learning the temporal order of events by practicing them in the order that they occur in time.
    || Learning a global arrow in time. How do we learn to encode the temporal order of events in LTM? serial learning. [w=intra, W=inter]trial interval. "... data about serial verbal learning (Figure 2.4) seemed to suggest that events can go "backwards in time". ..."
  • image p059fig02.05 Bowed serial position curve. This kind of data emphasizes the importance of modelling how our brains give rise to our minds using nonlinear systems of differential equations.
    || Effects of [inter, intra]trial intervals (Hovland 1938). # of errors vs list position. [w (sec), W (sec)] = (2 6) (4 6) (2 126) (4 126). Nonoccurrence of future items reduces the number of errors in response to past items. These data require a real-time theory for their explanation! that is, DIFFERENTIAL equations.
  • image p059fig02.06 The bowed serial position curve illustrates the sense in which "events can go backwards in time" during serial learning.
    || Bow due to backward effect in time. If the past influenced the future, but not conversely: # of errors vs list position; Data (Hovland, Hull, Underwood, etc).
  • image p060fig02.07 Position-specific forward and backward error gradients illustrate how associations can form in both the forward and backward directions in time before the list is completely learned.
    || Error gradients: depend on list position. # of responses vs list position:
    list beginning | anticipatory errors | forward in time
    list middle | anticipatory and perseverative errors | forward and backward in time
    list end | perseverative errors | backward in time
  • image p061fig02.08 The existence of forward and backward associations, such as from A to B and from B to A is naturally explained by a network of neurons with their own activities or STM traces, and bidirectional connections between them with their own adaptive weights or LTM traces.
    || How these results led to neural networks (Grossberg 1957). Networks can learn forward and backward associations! Practice A->B, also learn B<-A. Because learning AB is not the same as learning BA, you need STM traces, or activations, xi at the nodes, or cells, and LTM traces, or adaptive weights, zij, for learning at the synapses.
  • image p063fig02.09 The Additive Model describes how multiple effects add up to influence the activities, or STM traces, of neurons.
    || STM: Additive model (Grossberg, PNAS 1967, 1968).
    Short-term memory (STM) trace (activation) xi(t) -> signal fi(xi(t))*Bij -> adaptive weight (Long-term memory, LTM) trace zij(t) -> xj(t)
    terms: learning rate? | passive decay | positive feedback | negative feedback | input
    d[dt: xi(t)] = -Ai*xi + sum[j=1 to n: fj(xj(t))*Bji*zji] - sum[j=1 to n: gj(xj(t))*Cji*Zji] + Ii
    Special case : d[dt: xi(t)] = -Ai*xi + sum[j=1 to n: fj(xj(t))*zji] + Ii
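    Howell: a short Python sketch (mine, with arbitrary weights and input) that integrates the special case of the Additive Model above by forward Euler for a three-cell chain; input injected into cell 1 spreads through the adaptive weights to cells 2 and 3:
        import numpy as np

        # dx_i/dt = -A_i*x_i + sum_j f_j(x_j)*z_ji + I_i   (special case above)
        n = 3
        A = np.ones(n)                          # passive decay rates
        z = np.array([[0.0, 0.5, 0.0],          # z[j, i]: weight of the path from cell j to cell i
                      [0.0, 0.0, 0.5],
                      [0.0, 0.0, 0.0]])
        f = lambda x: np.maximum(x, 0.0)        # signal function (threshold-linear here)
        I = np.array([1.0, 0.0, 0.0])           # external input to cell 1 only

        x, dt = np.zeros(n), 0.01
        for _ in range(2000):                   # 20 time units of forward Euler
            x += dt * (-A * x + f(x) @ z + I)
        print(np.round(x, 3))                   # approaches the equilibrium [1.0, 0.5, 0.25]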
  • image p064fig02.10 The Shunting Model includes upper and lower bounds on neuronal activities. These bounds have the effect of multiplying additive terms by excitatory and inhibitory automatic gain terms that enable such models to preserve their sensitivity to inputs whose size may vary greatly through time, while also approximately normalizing their total activities.
    || STM: Shunting Model (Grossberg, PNAS 1967, 1968). Mass action in membrane equations. Bi/Ci -> xi(t) -> O -> -Fi/Ei. Bounded activations, automatic gain control. d[dt: xi(t)] = -Ai*xi + (Bi - Ci*xi)*(sum[j=1 to n: fj(xj(t))*Dji*yji*zji] + Ii) - (Ei*xi + Fi)*(sum[j=1 to n: gj(xj)*Gji*Yji*Zji] + Ji). Includes the Additive Model.
  • image p064fig02.11 Medium-Term Memory (MTM) and Long-Term Memory (LTM) equations complement the Additive and Shunting Models of STM. MTM is typically defined by a chemical transmitter that is released from the synaptic knobs of a neuron (Figure 2.03). Its release or inactivation in an activity-dependent way is also called habituation. LTM defines how associative learning occurs between a pair of neurons whose activities are approximately correlated through time. See the text for details.
    || Medium and Long Term memory.
    MTM | habituative transmitter gate | d[dt: yki(t)] = H*(K - yki) - L*fk(xk)*yki
    LTM | gated steepest descent learning | d[dt: zki(t)] = Mk*fk(xk)*(hi(xi) - zki)
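    Howell: a small Python sketch (my own parameter choices) of the MTM and LTM laws above: while the presynaptic signal is on, the habituative transmitter y depletes toward a lower equilibrium and the adaptive weight z tracks the postsynaptic activity by gated steepest descent; after offset, y recovers toward K and z stays where learning left it:
        # MTM habituative gate: dy/dt = H*(K - y) - L*f(x_pre)*y
        # LTM gated learning:   dz/dt = M*f(x_pre)*(h(x_post) - z)
        H, K, L, M = 0.1, 1.0, 2.0, 0.5
        f = h = lambda w: max(w, 0.0)                # signal functions (threshold-linear here)

        y, z, dt = 1.0, 0.0, 0.01
        for step in range(4000):                     # 40 time units
            x_pre  = 1.0 if step < 2000 else 0.0     # presynaptic cell active for the first 20 units
            x_post = 0.8 if step < 2000 else 0.0     # postsynaptic activity being sampled
            y += dt * (H * (K - y) - L * f(x_pre) * y)   # transmitter depletes while the signal is on
            z += dt * (M * f(x_pre) * (h(x_post) - z))   # weight tracks x_post only while gated open
        print(round(y, 3), round(z, 3))              # y has partly recovered; z has retained ~0.8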
  • image p065fig02.12 Three sources of neural network research: [binary, linear, continuous nonlinear]. My own research has contributed primarily to the third.
    || Three sources of neural network research.
    Binary | Linear | Continuous and non-Linear
    neural network signal processing | Systems theory | Neurophysiology and Psychology
    McCulloch-Pitts 1943: Xi(t+1) = sgn{sum[j: Aij*Xj(t)] - Bi}
    Von Neumann 1945
    Caianiello 1961
    Rosenblatt 1962
    Widrow 1962
    Anderson 1968
    Kohonen 1971
    Hodgkin, Huxley 1952
    Hartline, Ratliff 1957
    Grossberg 1967
    Von der Malsburg 1973
    digital computer | Y = A*X, cross-correlate, steepest descent
  • image p068fig02.13 Hartline's lab developed a model to describe signal processing by the retina of the horseshoe crab.
    || Neurophysiology (network): lateral inhibition in limulus retina of horseshoe crab (Hartline, Ratliff, Miller 1963, Nobel Prize)
    hi = ei - sum[j=1 to n: {∫[dv, 0 to t: e^(-A*(t-v))*hj(v)] - Γj}(+) * Bji]
    ei = spiking frequency without inhibition
    hi = spiking frequency with inhibition
    [w - r]+ vs i, Precursor of ADDITIVE network model.
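    Howell: the steady-state form of this equation (after the temporal integral has equilibrated) is often written hi = ei - sum[j: Bji*{hj - Γj}(+)] and can be solved by relaxation. A Python sketch with made-up parameters (a step of light, and nearest-neighbour inhibition only as a simplification) showing the Mach-band-like overshoot and undershoot at the border:
        import numpy as np

        n = 10
        e = np.where(np.arange(n) < n // 2, 5.0, 1.0)   # step of light: bright left half, dim right half
        Gamma = np.full(n, 0.5)                          # inhibition thresholds
        B = np.zeros((n, n))
        for i in range(n):                               # inhibition from the two nearest neighbours only
            for j in (i - 1, i + 1):
                if 0 <= j < n:
                    B[j, i] = 0.2                        # B[j, i]: inhibition of cell i by cell j

        h = e.copy()
        for _ in range(200):                             # relax h_i = e_i - sum_j B_ji*[h_j - Gamma_j]^+
            h = e - B.T @ np.maximum(h - Gamma, 0.0)
        print(np.round(h, 2))    # overshoot just left of the light/dark border, undershoot just right of it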
  • image p068fig02.14 Hodgkin and Huxley developed a model to explain how spikes travel down the squid giant axon.
    || Neurophysiology (single cell): spike potentials in squid giant axon (Hodgkin, Huxley 1952, Nobel Prize). time -> (dendrites -> cell body -> axon).
    C*dp[dt: V] = α*dp^2[dX^2: V] + (V(+) - V)*g(+) + (V(-) - V)*g(-) + (V^p - V)*g^p
    g(+) = G(+)(m,h), g(-) = G(-)(n), G^p = const, [m, h, n] - ionic processes, V - voltage
    Precursor of Shunting network model (Rall 1962). (Howell: see p075fig02.24 Membrane equations of neurophysiology, Shunting equation.)
  • image p071fig02.15 The noise saturation dilemma: How do neurons retain their sensitivity to the relative sizes of input patterns whose total sizes can change greatly through time?
    || Noise-Saturation Dilemma (Grossberg 1968-1973). Bounded activities from multiple input sources.
    If activities xi are sensitive to SMALL inputs, then why don't they saturate in response to large inputs?
    If xi are sensitive to LARGE inputs, then why don't small inputs get lost in system noise?
    The functional unit is a spatial activity pattern.
  • image p071fig02.16 To solve the noise-saturation dilemma, individual neurons in a network that is receiving a distributed spatial pattern of inputs need to remain sensitive to the ratio of the input to them divided by all the inputs in that spatial pattern. Although the inputs are delivered to a finite number of neurons, the input and activity patterns are drawn continuously across the cells for simplicity.
    || Noise-Saturation Dilemma. [Ii, xi] vs t. [Input, Activity] pattern [small -> noise, large -> saturation]. Problem: remain sensitive to input RATIOS θi = Ii / sum[j: Ij] as total input I = sum[j: Ij] -> ∞. Many kinds of data exhibit sensitivity to ratios of inputs.
  • image p072fig02.17 Brightness constancy.
    || Vision: brightness constancy, contrast normalization. Compute RATIOS of reflected light. Reflectance processing. p72c1h0.45 "... In other words, the perceived brightness of the gray disk is constant despite changes in the overall illumination. On the other hand, if only the gray disk were illuminated at increasing intensities, with the annulus illuminated at a constant intensity, then the gray disk would look progressively brighter. ..."
  • image p072fig02.18 Brightness contrast.
    || Vision: brightness contrast. Conserve a total quantity: total activity normalization.
    LUCE | Ratio scales in choice behavior
    ZEILER | Adaptation level theory
  • image p073fig02.19 Computing with cells: infinity does not exist in biology!
    || Computing in a bounded activity domain, Gedanken experiment (Grossberg 1970). Vm sub-areas [xm, B - xm], I(all m)], m=[1, i, B].
    B | excitable sites
    xi(t) | excited sites (activity, potential)
    B - xi(t) | unexcited sites
  • image p073fig02.20 Shunting saturation occurs when inputs get larger to non-interacting cells.
    || Shunting saturation. [xi(t), B - xi(t)].
    (a)(b)
    d[dt: xi] = -A*xi + (B - xi)*Ii
    (a) Spontaneous decay of activity xi to equilibrium
    (b) Turn on unexcited sites B - xi by inputs Ii (mass action)
    Inadequate response to a SPATIAL PATTERN of inputs: Ii(t) = θi*I(t)
    θi | relative intensity (cf. reflectance)
    I(t) | total intensity (cf. luminance)
  • image p073fig02.21 How shunting saturation turns on all of a cell's excitable sites as input intensity increases.
    || Shunting saturation. At equilibrium:
    0 = d[dt: xi] = -A*xi + (B - xi)*Ii
    xi = B*Ii / (A + Ii) = B*θi*I / (A + θi*I) -> B as I -> ∞
    Ii = θi*I, I = sum[j: Ij]
    I small: lost in noise; I large: saturates
    Sensitivity loss to relative intensity as total intensity increases.
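    Howell: a short Python check (arbitrary A, B, θi) of the saturation result above: as total intensity I grows, every xi approaches B and the ratio information is lost, while small I is compressed toward zero:
        A, B = 1.0, 1.0
        theta = [0.2, 0.3, 0.5]                       # fixed relative intensities (reflectances)
        for I in (0.1, 1.0, 10.0, 1000.0):            # total input intensity grows
            x = [B * th * I / (A + th * I) for th in theta]
            print(I, [round(v, 3) for v in x])        # for large I all xi -> B, so ratio information is lost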
  • image p073fig02.22 An on-center off-surround network is capable of computing input ratios.
    || Computing with patterns.
    How to compute the pattern-sensitive variable: θi = Ii / sum[k=1 to n: Ik]?
    Needs interactions! What type? θi = Ii / sum[k ≠ i: Ik]
    Ii↑ ⇒ θi↑ excitation, Ik↑ ⇒ θk↓, k ≠ i inhibition
    On-center off-surround network.
  • image p074fig02.23 The equations for a shunting on-center off-surround network. Shunting terms lead to many beautiful and important properties of these networks, which are found ubiquitously, in one form or another, in all cellular tissues.
    || Shunting on-center off-surround network.
    Mass action: d[dt: xi] = -A*xi +(B - xi)*Ii -xi*sum[k≠i: Ik]
    Turn on unexcited sites | Turn off excited sites
    At equilibrium:
    0 = d[dt: xi] = -(A + Ii + sum[k≠i: Ik])*xi + B*Ii = -(A + I)*xi + B*Ii
    xi = B*Ii/(A + I) = B*θi*I/(A + I) = θi * B*I/(A + I). No saturation!
    Infinite dynamical range
    Automatic gain control
    Compute ratio scale
    Weber law
    x = sum[k=1 to n: xk] = B*I/(A + I) ≤ B. Conserve total activity:
    NORMALIZATION
    Limited capacity
    Real-time probability
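    Howell: the same kind of check (arbitrary parameters) for the shunting on-center off-surround equilibrium above: activities now stay proportional to the reflectances θi at every intensity, and total activity stays below B (normalization):
        A, B = 1.0, 1.0
        theta = [0.2, 0.3, 0.5]
        for I_total in (0.1, 1.0, 10.0, 1000.0):
            Ii = [th * I_total for th in theta]
            x = [B * i / (A + I_total) for i in Ii]   # xi = B*Ii/(A + I)
            print(I_total, [round(v, 4) for v in x], "total =", round(sum(x), 4))
            # ratios θi are preserved at every intensity; sum(x) <= B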
  • image p075fig02.24 The membrane equations of neurophysiology describe how cell voltages change in response to excitatory, inhibitory, and passive input channels. Each channel is described by a potential difference multiplied by a conductance. With the special choices shown in the lower right-hand corner, this equation defines a feedforward shunting on-center off-surround network.
    || Membrane equations of neurophysiology.
    C*d[dt: V] = (V(+) - V)*g(+) + (V(-) - V)*g(-) + (V(p) - V)*g(p)
    Shunting equation (not additive)
    V Voltage
    V(+), V(-), V(p) Saturating voltages
    g(+), g(-), g(p) Conductances
    V(+) = B, C = 1; V(-) = V(p) = 0; g(+) = Ii; g(-) = sum[k≠i: Ik];
    lower bound of V: V(-) = V(p), silent inhibition; upper bound of V: V(+). (Howell: see p068fig02.14 Grossberg's comment that Hodgkin&Huxley model was a "... Precursor of Shunting network model (Rall 1962) ...").
  • image p076fig02.25 An on-center off-surround network can respond to increasing on-center excitatory inputs without a loss of sensitivity. Instead, as the off-surround input increases, the region of a cell's maximal sensitivity to an increasing on-center input shifts to a range of larger inputs. This is because the off-surround divides the effect of the on-center input, an effect that is often called a Weber law.
    || Weber law, adaptation, and shift property (Grossberg 1963).
    Convert to logarithmic coordinates:
    K = ln(Ii), Ii = e^K, J = sum[k≠i: Ik]
    xi(K,J) = B*Ii/(A + Ii + J) = B*e^K/(A + e^K + J)
    x(K + S, J1) = x(K, J2), S = ln((A + J1)/(A + J2)) size of SHIFT.
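    Howell: a numerical check (arbitrary A, B, J1, J2) that the shift property above holds exactly: x(K + S, J1) = x(K, J2) when S = ln((A + J1)/(A + J2)):
        import numpy as np

        A, B = 1.0, 10.0
        x = lambda K, J: B * np.exp(K) / (A + np.exp(K) + J)   # equilibrium in log coordinates

        J1, J2 = 50.0, 5.0
        S = np.log((A + J1) / (A + J2))                        # predicted size of the shift
        for K in (-1.0, 0.0, 2.0, 4.0):
            print(round(x(K + S, J1), 4), round(x(K, J2), 4))  # the two columns match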
  • image p076fig02.26 The mudpuppy retina exhibits the shift property that occurs in the feedforward shunting on-center off-surround network in Figure 2.25. As a result, its sensitivity also shifts in response to different background off-surrounds, and therefore exhibits no compression (dashed purple lines).
    || Mudpuppy retina neurophysiology.
    I center, J background
    a) Relative figure-to-ground
    b) Weber-Fechner: I*(A + J)^(-1)
    c) No hyperpolarization, SHUNT: Silent inhibition
    d) Shift property(Werblin 1970) xi(K,J) vs K = ln(I)
    Adaptation- sensitivity shifts for different backgrounds. NO COMPRESSION.
  • image p077fig02.27 A schematic of the on-center off-surround network that occurs in the mudpuppy retina, including three main cell types: receptors, horizontal cells, and bipolar cells.
    || Mechanism: cooperative-competitive dynamics.
    On-center off-surround (Kuffler 1953) cat retina
    Subtractive lateral inhibition (Hartline, Ratliff 1956/7+) limulus retina.
    R receptor -> H horizontal -> B bipolar (Werblin, Dowling, et al 1969+) mudpuppy retina.
  • image p077fig02.28 Silent inhibition is replaced by hyperpolarization when the inhibitory saturating potential is smaller than the passive saturating potential. Then an adaptation level is created that determines how big input ratios need to be to activate their cells.
    || Weber Law and adaptation level.
    Hyperpolarization vs Silent inhibition
    d[dt: xi] = -A*xi +(B - xi)*Ii -(xi + C)*sum[k≠i: Ik]
    At equilibrium:
    0 = d[dt: xi] = -(A + Ii + sum[k≠i: Ik])*xi +B*Ii -C*sum[k≠i: Ik]
    = -(A + I)*xi +(B + C)*Ii -C*I
    = -(A + I)*xi +(B + C)*I*[θi -C/(B + C)]
    xi = (B + C)*I/(A + I)* [θi -C/(B + C)]
    Weber Law: (B + C)*I/(A + I); Reflectance: θi; Adaptation level: C/(B + C)
  • image p078fig02.29 How the adaptation level is chosen to enable sufficiently distinct inputs to activate their cells.
    || Weber Law and adaptation level.
    xi = (B + C)*I/(A + I)* [θi -C/(B + C)]
    Weber Law: (B + C)*I/(A + I); Reflectance: θi; Adaptation level: C/(B + C)
    V(+) >> V(-) ⇒ B >> C ⇒ C/(B + C) << 1
    Adaptation level theory (Zeiler 1963).
  • image p078fig02.30 Choosing the adaptation level to achieve informational noise suppression.
    || Noise suppression. Attenuate Zero Spatial frequency patterns: no information. Ii vs i (flat line), xi vs i (flat line at zero)
    B >> C: Try B = (n - 1)*C or C/(B + C) = 1/n
    Choose a uniform input pattern (no distinctive features): All θi = 1/n
    xi = (B + C)*I/(A + I)*[θi -C/(B + C)] = 0 no matter how intense I is.
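    A minimal Python sketch (mine; n, A, C and the test patterns are illustrative) of noise suppression: with B = (n - 1)*C, so that C/(B + C) = 1/n, the equilibrium activities are zero for any uniform input pattern, however intense, while a distinctive feature is enhanced and its flanks are driven below zero.
      import numpy as np

      n, A, C = 5, 1.0, 1.0
      B = (n - 1) * C                            # noise-suppression choice: C/(B + C) = 1/n

      def equilibrium(I_pattern):
          I = I_pattern.sum()
          theta = I_pattern / I
          return (B + C) * I / (A + I) * (theta - C / (B + C))

      print(equilibrium(np.array([7.0] * n)).round(6))                   # uniform pattern: all zeros
      print(equilibrium(np.array([1.0, 1.0, 5.0, 1.0, 1.0])).round(3))   # feature enhanced, flanks suppressed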
  • image p078fig02.31 How noise suppression enables matching of bottom-up and top-down input patterns.
    || Noise suppression -> pattern matching. mismatch (out of phase) suppressed, match (in phase) amplifies pattern.
  • image p079fig02.32 Matching amplifies the matched pattern due to automatic gain control. See terms I and J in the equation.
    || Substrate of resonance. Match (in phase) of BU and TD input patterns AMPLIFIES matched pattern due to automatic gain control by shunting terms. J = sum[i: Ji], I = sum[i: Ii], θi = (Ii + Ji)/(I + J)
    xi = (B + C)*(I + J)/(A + I + J)*[θi -C/(B + C)]
    Need top-down expectations to be MODULATORY.
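    Continuing the sketch above (mine; values illustrative): adding an in-phase top-down pattern J amplifies the matched pattern through the shunting gain (I + J)/(A + I + J), while an out-of-phase J sums to a uniform total pattern here, which the same noise-suppression parameters quench.
      import numpy as np

      n, A, C = 4, 1.0, 1.0
      B = (n - 1) * C

      def equilibrium(pattern):
          total = pattern.sum()
          theta = pattern / total
          return (B + C) * total / (A + total) * (theta - C / (B + C))

      bu = np.array([4.0, 1.0, 4.0, 1.0])            # bottom-up input pattern
      td_match = np.array([4.0, 1.0, 4.0, 1.0])      # top-down expectation, in phase
      td_mismatch = np.array([1.0, 4.0, 1.0, 4.0])   # top-down expectation, out of phase

      print(equilibrium(bu).round(3))                # bottom-up alone
      print(equilibrium(bu + td_match).round(3))     # match: peaks are amplified
      print(equilibrium(bu + td_mismatch).round(3))  # mismatch: uniform sum, suppressed to zero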
  • image p080fig02.33 An opposite-attracts rule during the development of intracellular connections can lead to a mature network that realizes informational noise suppression.
    || How do noise suppression parameters arise? Symmetry-breaking during morphogenesis? Opposites attract rule.
    Intracellular parameters C/B = 1/(n - 1) ⇒ Intercellular parameters
    Predicts that:
    • Intracellular excitatory and inhibitory saturation points can control the growth during development of :
    • Intercellular excitatory and inhibitory connections.
  • image p080fig02.34 How to achieve informational noise suppression in a network with multiple parallel processing channels.
    || Symmetry-breaking: dynamics and anatomy.
    Dynamics:
    • excitatory range is amplified
    • inhibitory range is compressed
    Anatomy:
    • narrow on-center
    • broad off-surround
    Noise suppression: attenuates uniform patterns
    Contour direction: enhances pattern gradients
  • image p081fig02.35 The equilibrium activities of a shunting network with Gaussian on-center off-surround kernels are sensitive to the ratio-contrasts of the input patterns that they process. The terms in the denominator of the equilibrium activities accomplish this using the shunting on-center and off-surround terms.
    || Ratio-contrast detector. flat versus [Gaussian Cki, flattened Gaussian? Eki]
    d[dt: xi] = -A*xi +(B - xi)*sum[k=1 to n: Ik*Cki] -(xi + D)*sum[k=1 to n: Ik*Eki]
    Cki = C*e^(-μ*(k - i)^2), Eki = E*e^(-ν*(k - i)^2)
    At equilibrium: xi = I*sum[k=1 to n: θk*Fki] / (A + I*sum[k=1 to n: θk*Gki])
    Fki = B*Cki -D*Eki (weighted difference of Gaussians, D.O.G.)
    Gki = Cki +Eki (sum of Gaussians, S.O.G.)
    • Reflectance processing
    • Contrast normalization
    • Discount illuminant
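    A minimal Python sketch (mine; n, A, B, C, D, E, μ, ν are illustrative, with a narrow on-center and a broader off-surround) of the ratio-contrast property: the equilibrium responses to a luminance step are nearly unchanged when the whole input is made 10 times brighter, because they depend on the ratios θk rather than on absolute intensities.
      import numpy as np

      n = 30
      A, B, C, D, E = 1.0, 2.0, 1.0, 1.0, 0.5
      mu, nu = 0.5, 0.05                         # on-center kernel narrower than off-surround kernel

      k = np.arange(n)
      dist2 = (k[:, None] - k[None, :]) ** 2
      Cki = C * np.exp(-mu * dist2)              # Gaussian on-center kernel
      Eki = E * np.exp(-nu * dist2)              # broader Gaussian off-surround kernel
      Fki = B * Cki - D * Eki                    # weighted difference of Gaussians
      Gki = Cki + Eki                            # sum of Gaussians

      def equilibrium(I_pattern):
          I = I_pattern.sum()
          theta = I_pattern / I
          return (I * (Fki @ theta)) / (A + I * (Gki @ theta))

      step = np.where(k < n // 2, 1.0, 3.0)      # a reflectance step (ratio 1:3)
      print(equilibrium(step).round(3))
      print(equilibrium(10.0 * step).round(3))   # 10x brighter input: nearly identical responses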
  • image p081fig02.36 Informational noise suppression in networks with Gaussian on-center and off-surround kernels makes them function as contour detectors that are sensitive to ratio-contrast.
    || Noise suppression and contour detection.
    If B*sum[k=1 to n: Cki] <= D*sum[k=1 to n: Eki] then:
    • uniform patterns are suppressed
    • contrasts are selectively enhanced
    • contours are detected
    Ii vs i, xi vs i
    Responses are selective to [REFLECTANCE, SPATIAL SCALE], eg color [feature, surface] contours.
  • image p082fig02.37 My models begin with behavioral data, since brains are designed to achieve behavioral success. The text explains how models evolve in stages, through a process of successive refinements, or unlumpings. These unlumpings together carry out a kind of conceptual evolution, leading to models that can explain and predict ever larger psychological and neurobiological databases.
    || Modelling method and cycle.
    Behavioral data -(art of modeling)-> Design principles <- Neural data <-(brain predictions)- Mathematical model and analysis -(behavioral predictions)-> Behavioural data
    Operationalizes "proper level of abstraction"
    Operationalizes that you cannot "derive a brain" in one step.
  • image p085fig02.38 Our models have been used in many large-scale applications to engineering and technology. Linking brain to behavior explains how brain mechanisms give rise to psychological functions, and do so autonomously. The combination of mechanism, function, and autonomy helps to explain their value in helping to solve outstanding problems in technology.
    || Modeling method and cycle.
    Behavioral data -(art of modeling)-> Design principles <- Neural data <-(brain predictions)- Mathematical model and analysis -(behavioral predictions)-> Behavioural data
    Technology: Mathematical model and analysis <-> Technological applications
    At every stage, spin off new model designs and mechanisms to technologists who need autonomous intelligent applications.
  • image p087fig03.01 A macrocircuit of key visual processes (in green) and the cortical areas in which they primarily occur (in red), from the retina to the Prefrontal Cortex (PFC), including both the What and Where cortical streams. The [bottom-up, horizontal, and top-down] interactions help each of these processes to overcome computationally complementary processing deficiencies that they would experience without them, and also to read-out top-down expectations that help to stabilize learning while they focus attention on salient objects and positions.
    || Emerging unified theory of visual intelligence. [What, Where] streams. Bottom-up and top-down interactions overcome COMPLEMENTARY processing deficiencies.
  • image p089fig03.02 What do you think lies under the two grey disks? (on a checkers board)
    || p089c1h0.55 "... As your eye traverses the entire circular boundary (Howell: of a grey disk on a checkerboard), the contrast keeps flipping between light-to-dark and dark-to-light. Despite these contrast reversals, we perceive a single continuous boundary surrounding the gray disk. ...".
  • image p090fig03.03 Kanizsa square and reverse-contrast Kanizsa square percepts. The spatial arrangement of pac-men, lines, and relative contrasts determines the perceived brightness of the squares, and even if they exhibit no brightness difference from their backgrounds, as in (b). These factors also determine whether pac-men will appear to be amodally completed behind the squares, and how far behind them.
    || p089c2h0.65 "...
    a) The percept of the square that abuts the pac-men is a visual illusion that is called the Kanizsa square. The enhanced brightness of the square is also an illusion.
    c) shows that these boundaries can be induced by either collinear edges or perpendicular line ends, and that both kinds of inducers cooperate to generate an even stronger boundary.
    d) if the perpendicular lines cross the positions of the illusory contours, then they can inhibit the strength of these contours. ..."
  • image p091fig03.04 A cross-section of the eye, and a top-down view of the retina, showing how the blind spot and retinal veins can occlude the registration of light signals at their positions on the retina.
    || Eye: [optic nerve, ciliary body, iris, lens, pupil, cornea, sclera, choroid, retina]. Human retina: [fovea, blind spot, optic nerve]. see also cross-section of retinal layer.
  • image p092fig03.05 A cross-section of the retinal layer. Note that light stimuli need to go through all retinal layers before they reach the photoreceptor layer at which the light signals are registered.
    || light stimuli ->
    retinal layers | cellular composition
    inner limiting membrane |
    retinal nerve fibre | ganglion nerve fibres
    ganglion cell | ganglion
    inner plexiform | amacrine
    inner nuclear | horizontal
    outer plexiform |
    outer limiting membrane |
    photoreceptor | rod
    photoreceptor | cone
    retinal pigment epithelium |
    <- signal transduction. http://brain.oxfordjournals.org/content/early/2011/01/20/brain.awq346
  • image p093fig03.06 Every line is an illusion because regions of the line that are occluded by the blind spot or retinal veins are completed at higher levels of brain processing by boundary completion and surface filling-in.
    || Every line is an illusion!
    Boundary completion | Which boundaries to connect?
    Surface filling-in | What color and brightness do we see?
  • image p094fig03.07 The processes of boundary completion and surface filling-in are computationally complementary.
    ||
    Boundary completion | Surface filling-in
    outward | inward
    oriented | unoriented
    insensitive to direction of contrast | sensitive to direction-of-contrast
  • image p095fig03.08 Computer simulation of a Kanizsa square percept. See the text for details.
    || p094c2h0.2 "...
    b) shows the feature contours that are induced just inside the pac-man boundaries.
    c) feature contours fill-in within the square boundary
    d) create a percept of enhanced brightness throughout the square surface ..."
  • image p095fig03.09 Simulation of a reverse-contrast Kanizsa square percept. See the text for details.
    || p094c2h0.5 "...
    b) whereas bright feature contours are induced just inside the boundaries of the two black pac-men at the bottom of the figure, dark feature contours are induced inside the boundaries of the two white pac-man at the top of the figure
    c) the square boundary is recognized
    d) Because these dark and bright feature contours are approximately balanced, the filled-in surface color is indistinguishable from the filled-in surface color outside of the square, ... but [the square boundary is] not seen ..."
  • image p096fig03.10 The visual illusion of neon color spreading. Neither the square nor the blue color that are perceived within it are in the image that defines a neon color display. The display consists only of black and blue arcs.
    ||
  • image p096fig03.11 Another example of neon color spreading. The image is composed of black and blue crosses. See the text for details.
    || Howell: note the appearance of illusory red squares
  • image p098fig03.12 In this picture of Einstein's face, [edges, texture, shading] are overlaid.
    ||
  • image p100fig03.13 The Ehrenstein percept in the left panel is significantly weakened as the orientations of the lines that induce it deviate from being perpendicular to the illusory circle.
    ||
  • image p100fig03.14 Boundaries are completed with the orientations that receive the largest total amount of evidence, or support. Some can form in the locally preferred orientations that are perpendicular to the inducing lines, while others can form through orientations that are not locally preferred, thus showing that there is initially a fuzzy band of almost perpendicular initial grouping orientations at the end of each line.
    || Perpendicular induction at line ends wrt [circular, square] boundaries
    line ends | local | global
    perpendicular, crisp | preferred | preferred
    NOT perpendicular, fuzzy | unpreferred | preferred
  • image p100fig03.15 A fuzzy band of possible initial grouping orientations allows grouping to get started. Cooperative-competitive feedback via a hierarchical resolution of uncertainty chooses a sharp final grouping that has the most evidence to support it.
    || before choice: transient; after choice: equilibrium
  • image p102fig03.16 T's and L's group together based on shared orientations, not identities.
    ||
  • image p102fig03.17 The relative positions of the squares give rise to a percept of three regions. In the middle region, emergent diagonal groupings form, despite the fact that all the orientations in the image are verticals and horizontals.
    ||
  • image p103fig03.18 Computer simulations in [b, c, e, f] of groupings in response to different spatial arrangements in [a,c, e, g] of inducers that are composed of short vertical boundaries. Note the emergent horizontal groupings in [d, f, h] and the diagonal groupings in h, despite the fact that all its inducers have vertical orientations.
    ||
  • image p103fig03.19 As in Figure 3.18, emergent groupings can form whose orientations differ from those of the inducing stimuli.
    || That's how multiple orientations can induce boundary completion of an object. [diagonal, perpendicular, parallel]
  • image p104fig03.20 Sean Williams: how boundaries can form
    ||
  • image p104fig03.21 Four examples of how emergent boundaries can form in response to different kinds of images. These examples show how boundary webs can shape themselves to textures, as in (c), and shading, as in (d), in addition to lines, as in (a). In all these cases, the boundaries are invisible, but reveal themselves by supporting filling-in of surface brightness and color within their form-sensitive webs.
    ||
  • image p105fig03.22 Depth-selective boundary representations capture brightness and colors in surface filling-in domains. See the text for details.
    || 3D vision and figure-ground separation. multiple-scale, depth-selective boundary webs. refer to Figure 3.21(d)
    depth increasing ↓ | boundaries | surfaces
    BC input | surface capture!
    FC input
  • image p105fig03.23 The pointillist painting A Sunday on la Grande Jatte by Georges Seurat illustrates how we group together both large-scale coherence among the pixels of the painting, as well as forming small groupings around the individual dabs of color.
    ||
  • image p106fig03.24 In response to the Synthetic Aperture Radar image (upper left corner), a shunting on-center off-surround network "discounts the illuminant" and thereby normalizes cell activities to compute feature contours, without causing saturation (upper right corner). Multiple-scale boundaries form in response to spatially coherent activities in the feature contours (lower left corner) and create the webs, or containers, into which the feature contours fill-in the final surface representations (lower right corner).
    || Do these ideas work on hard problems? SAR!
    input image | feature contours | boundary contours | filled-in surface
    Synthetic Aperture Radar: sees through weather. 5 orders of magnitude of power in radar return | discounting the illuminant
    • normalizes the image: preserves RELATIVE activities without SATURATION
    • shows individual PIXELS
    boundaries complete between regions where normalized feature contrasts change | filling-in averages brightnesses within boundary compartments
  • image p107fig03.25 The Roofs of Collioure by Matisse. See the text for details
    || p107c1h0.6 "... [Matisse] showed how patches of pure color, when laid down properly on a canvas, could be grouped by the brain into emergent boundaries, without the intervention of visible outlines. ... The trick was that these emergent boundaries, being invisible, or amodal, did not darken the colors in the surface representations. In this sense, Matisse intuitively realized that "all boundaries are invisible" through the masterful way in which he arranged his colors on canvas to generate boundaries that could support compelling surface representations. ..."
  • image p107fig03.26 How "drawing directly in color" leads to colored surface representations. Amodal boundary webs control the filling-in of color within these surface representations. See the text for details.
    || color patches on canvas -> [surface color and form, Amodal boundary web]. Amodal boundary web -> surface color and form.
  • image p108fig03.27 Matisse's painting Open Window, Collioure 1905 combines continuously colored surfaces with color patches that created surface representations using amodal boundaries, as in Figure 3.26. Both kinds of surfaces cooperate to form the final painterly percept.
    ||
  • image p108fig03.28 The watercolor illusion of Baingio Pinna 1987 can be explained using spatial competition between like-oriented boundary signals. This occurs at what I have called the First Competitive Stage. This is one stage in the brain's computation of hypercomplex cells, which are also called endstopped complex cells. Why the blue regions seem to bulge in depth may be explained using multiple-scale, depth-selective boundary webs. See the text for details.
    || Baingio Pinna. Watercolor illusion 1987. Filled-in regions bulge in depth. Multiple-scale, depth-selective boundary web!
  • image p109fig03.29 The 3D percepts that are generated by chiaroscuro and trompe l'oeil both exploit the same kind of multiple-scale, depth-selective boundary webs that create the impression of a 3D bulge of the blue regions in the watercolor percept in Figure 3.28.
    || Chiaroscuro - Rembrandt self-portrait, Trompe l'oeil - Graham Rust.
  • image p109fig03.30 The triptych of Jo Baer, called Primary Light Group: Red, Green, and Blue 1964-1965, generates watercolor illusion percepts which, when displayed side by side in a museum, create a striking impression.
  • image p110fig03.31 Henry Hensche's painting of The Bather is suffused with light.
    || p109c2h0.8 (Hawthorne 1938/60) wrote "... (pp 25-26) the outline and color of each spot of color against every other spot of color it touches, is the only kind of drawing you need to bother about ...Let color make form- do not make form and color it. ...". p110c1h0.6 (Robichaux 1997, p27) "... The untrained eye is fooled to think he sees forms by the model edges, not with color ... Fool the eye into seeing form without edges. (p33) Every form change must be a color change. ...".
  • image p110fig03.32 Claude Monet's painting of Poppies Near Argenteuil. See the text for details.
    || Claude Monet Poppies Near Argenteuil 1873. p110c2h0.35 "... the red poppies and the green field around them are painted to have almost the same luminescence; that is, they are almost equiluminant. As a result, the boundaries between the red and green regions are weak and positionally unstable, thereby facilitating an occasional impression of the poppies moving in a gentle breeze, especially as one's attention wanders over the scene. ...".
  • image p112fig03.33 Various ways that spatial gradients in boundary webs can cause self-luminous percepts. See the text for details.
    || Boundary web gradient can cause self luminosity. Similar to watercolor illusion. Gloss by attached highlight (Beck, Prazdny 1981), glare. (Bressan 2001) Double brilliant illusion, (Grossberg, Hong 2004) simulation. p111c2h0.5 "... This effect may be explained as the result of the boundary webs that are generated in response to the luminance gradients and how they control the filling-in of lightness within themselves and abutting regions. ... Due to the mutually inhibitory interactions across the boundaries that comprise these boundary webs, more lightness can spread into the central square as the steepness of the boundary gradients increases. ...".
  • image p112fig03.34 Examples of Ross Bleckner's self-luminous paintings.
    || Self-luminous paintings (Ross Bleckner). Galaxy painting (1993), Galaxy with Birds (1993). p112c2h0.15 "... Bleckner does this, not by painting large surface areas with high reflectances or bright colors, but rather creating compositions of small, star-like, circular regions that are perceived as self luminous ...".
  • image p113fig03.35 The Highest Luminance As White (HLAW) rule of (Hans Wallach 1948) works in some cases (top row) but not others (bottom row).
  • image p113fig03.36 The Blurred Highest Luminance As White (BHLAW) rule that I developed with my PhD student, Simon Hong, works in cases where the rule of Hans Wallach fails, as can be seen by comparing the simulation in Figure 3.35 with the one in this figure.
    || Blurred Highest Luminance As White (BHLAW) rule (Grossberg, Hong 2004, 2006). Spatial integration (blurring) adds spatial context to lightness perception.
  • image p114fig03.37 How the Blurred Highest Luminance as White rule sometimes normalizes the highest luminance to white (left panel) but at other times normalizes it to be self-luminous (right panel). See the text for details.
    || perceived reflectance vs cross-section of visual field. [white level, anchored lightness, self-luminous*, BHLAW]. *self-luminous only when conditions are right.
  • image p114fig03.38 Four color-field spray paintings of Jules Olitski. The text explains why they generate surface percepts with such ambiguous depth.
    || Jules and his friends (1967), Lysander-1 (1970), Instant Loveland (1968), Comprehensive Dream (1965). p114c2h0.4 "... it is impossible to visually perceive discrete colored units within the boundary webs in Olitski's spray paintings. ... create a sense of ambiguous depth in the viewer, similar to staring into a space filled with colored fog, or into a sunset free of discrete clouds. Olitski intentionally created this effect. ...".
  • image p115fig03.39 Two of Gene Davis's paintings in full color (top row) and in the monochromatic versions (bottom row). The text explains how they achieve their different percepts of grouping and relative depth.
    || Gene Davis [Black popcorn, Pink flamingo] in [full color, monochromatic]. p115c1h0.8 "... His paintings ... are built up from vertical stripes. They do not contain size differences, shading, or recognizable objects. ...". p115c2h0.15 "... For starters, color similarities and/or almost equal luminances between stripes can influence whether the viewer's eyes are drawn to individual stripes or groups of stripes. The achromatic versions of the two paintings more clearly show regions where the color assimilation is facilitated. ... Such form-sensitive spatial attention is called an attentional shroud. An attentional shroud, in turn, is created by a dynamical state in the brain that I call a surface-shroud resonance. ...".
  • image p116fig03.40 A combination of T-junctions and perspective cues can create a strong percept of depth in response to 2D images, with a famous example being Leonardo da Vinci's painting of the Mona Lisa.
    || p116c2h0.05 "... Many Renaissance artists learned how to use perspective cues ... Renaissance artists also understood how to use T-junctions like the ones that occur where the vertical and horizontal edges intersect in Figure 3.40 (left column, bottom row), or in the Kanizsa square percepts in Figure 3.3, or in the zebra image in Figure 3.21b. ..."
  • image p117fig03.41 End gaps, or small breaks or weakenings of boundaries, can form where a stronger boundary abuts a weaker, like-oriented, boundary, as occurs where black boundaries touch red boundaries in the neon color spreading image of Figure 3.11.
    || Boundary contours - lower contrast boundary signals are weakened. feature contours- no inhibition, feature signals survive and spread. MP -> [BCS, FCS]. BCS -> FCS.
  • image p117fig03.42 Two paintings by Frank Stella. See the text for details.
    || Firuzabad (top row) ... and Khurasan Gate (variation) (bottom row). p117c1h0.75 "... The luminance and color structure within a painting affects how it groups and stratifies the figures within it. These processes, in turn, affect the formation of attentional shrouds that organize how spatial attention is allocated as we view them. ..." "... Stella wrote Firuzabad is a good example of looking for stability and trying to create as much instability as possible. 'Cause those things are like bicycle wheels spinning around'."
  • image p120fig03.43 Four paintings by Monet of the Rouen cathedral under different lighting conditions (top row) and their monochromatic versions (bottom row). See the text for details.
    || p119c2h0.25 "... Monet uses nearby colors that are nearly equiluminant, and sharp, high-contrast luminance defined edges are sparse. He hereby creates weaker boundary signals within and between the parts of many forms, and stronger boundary signals between the forms. This combination facilitates color spreading within the forms and better separation of brightness and color differences between forms. ... The grayscale versions of these paintings demonstrate the near equiluminance of the brushstrokes within forms, and places in which brightness and color differences significantly influence the groupings that differentiate between forms, including the differentiation between the cathedral and the sky. ..."
  • image p120fig03.44 The Rouen cathedral at sunset generates very different boundary webs than it does in full sunlight, as illustrated by Figure 3.45.
    || Rouen Cathedral at sunset (Monet 1892-1894).
    • Lighting almost equiluminant
    • Most boundaries are thus caused by color differences, not luminance differences
    • Fine architectural details are obscured, leading to...
    • Coarser and more uniform boundary webs, so...
    • Less depth in the painting.
  • image p121fig03.45 The Rouen cathedral in full sunlight.
    || Rouen Cathedral full sunlight (Monet 1892-1894).
    • Lighting is strongly non-uniform across most of the painting
    • Strong boundaries due to both luminance and color differences
    • Fine architectural details are much clearer, leading to...
    • Finer and more non-uniform boundary webs, so...
    • Much more detail and depth
  • image p121fig03.46 The Rouen cathedral in full sunlight contains T-Junctions that are not salient in the painting of it at sunset. These are among the painting's features that give it a much more depthful appearance.
    || Rouen Cathedral full sunlight (Monet 1892-1894).
    • There are also more T-junctions where vertical boundaries occlude horizontal boundaries, or conversely...
    • Leading to more depth.
    p119c2h1.0 "... Such T-junction boundary occlusions ... can generate percepts of depth in the absence of any other visual clues. ...".
  • image p123fig04.01 A classical example of how boundaries are barriers to filling-in.
    || Combining stabilized images with filling-in (Krauskopf 1963, Yarbus 1967). Image: Stabilize these boundaries with suction cup attached to retina or electronic feedback circuit. Percept: A visible effect of an invisible cause!
  • image p124fig04.02 The vertical cusp of lesser and greater illuminance is the same in both images, but the one on the left prevents brightness from flowing around it by creating closed boundaries that tightly surround the cusp.
  • image p126fig04.03 A McCann Mondrian is an excellent display with which to illustrate how our brains discount the illuminant to compute the "real" colors of objects. See the text for details.
    || Color constancy: compute ratios. McCann Mondrian. Biological advantage: never see in bright light, eg tropical fish
    Discount the illuminantCompute lightness
    Different colors seen from the same spectrum
    ... similar to those seen in white light
    Physical basis: reflectance RATIOS!
  • image p128fig04.04 When a gradient of light illuminates a McCann Mondrian, there is a jump in the total light that is reflected at nearby positions where the reflectances of the patches change.
    || Compute reflectance changes at contours. Fill-in illuminant-discounted surface colors.
    leftright
    I + εI - ε
    A*(I + ε)B*(I - ε)
    A*(I + ε)/(B*(I - ε)) - 1 ≈ A/B - 1 (for small ε): the contrast at the edge depends on the reflectance ratio A/B, not on the illumination level I
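    A worked numerical example in Python (mine; the reflectances 0.8 and 0.2 and the illuminant values are illustrative) of why ratios at contours discount the illuminant: the luminance ratio across the edge stays close to the reflectance ratio A/B for any illumination level I and small gradient ε.
      A_refl, B_refl = 0.8, 0.2                  # reflectances on either side of the edge
      for I, eps in [(100.0, 1.0), (1000.0, 5.0), (10.0, 0.2)]:
          left = A_refl * (I + eps)              # luminance just left of the edge
          right = B_refl * (I - eps)             # luminance just right of the edge
          print(round(left / right, 3), "vs reflectance ratio", A_refl / B_refl)
      # Each line prints approximately 4.0: the edge ratio reflects A/B, not the illuminant.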
  • image p129fig04.05 Multiple-scale balanced competition chooses color contours where the reflectance of the patches change. These color contours discount the illuminant.
    || Compute reflectance changes at contours. Fill-in illuminant-discounted surface colors. Discount illuminant: compute color contours.
  • image p129fig04.06 Filling-in of color contours restores a surface percept with colors that substantially discount the illuminant.
    || Compute reflectance changes at contours. Fill-in illuminant-discounted surface colors. Fill-in surface color: hierarchical resolution of uncertainty.
  • image p130fig04.07 Simulation of brightness constancy under uniform illumination.
    || Simulation of brightness constancy (Grossberg & Todorovic 1988). Uniform illumination. [stimulus (S), feature (F), boundary (B), output]. B -> F -> S -> B: Veridical! Boundary peaks are spatially narrower than feature peaks.
  • image p131fig04.08 Simulation of brightness constancy under an illumination gradient. Note that the feature contour pattern (F) is the same in both cases, so too is the boundary contour (B) pattern that is derived from it, and the final filled-in surface.
    || Simulation of brightness constancy. Discount the illuminant. [stimulus (S), feature (F), boundary (B), output]. B -> F -> S -> B: not veridical, but useful! Ratio-sensitive feature contours (F).
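    A much-simplified 1D Python sketch in the spirit of these simulations (mine, not the Grossberg-Todorovic 1988 model or its parameters): ratio-sensitive feature contours, boundaries from thresholded local contrast, and filling-in by averaging within each boundary compartment give similar filled-in surfaces for a step viewed under uniform illumination and under an illumination gradient.
      import numpy as np

      def feature_contours(stimulus, A=1.0, B=10.0, radius=3):
          # Ratio-sensitive (shunting-style) center/surround stage.
          F = np.zeros(len(stimulus))
          for i in range(len(stimulus)):
              surround = stimulus[max(0, i - radius): i + radius + 1]
              F[i] = B * stimulus[i] / (A + surround.sum())
          return F

      def boundary_positions(stimulus, thresh=0.2):
          # Boundaries where the relative (ratio) contrast between neighbors is large.
          contrast = np.abs(np.diff(stimulus)) / (stimulus[:-1] + stimulus[1:])
          return np.where(contrast > thresh)[0]

      def filled_in(stimulus):
          F = feature_contours(stimulus)
          edges = [0] + list(boundary_positions(stimulus) + 1) + [len(stimulus)]
          out = np.zeros_like(F)
          for a, b in zip(edges[:-1], edges[1:]):
              out[a:b] = F[a:b].mean()           # fill in each boundary compartment
          return out

      step = np.concatenate([np.full(20, 2.0), np.full(20, 6.0)])
      gradient = step * np.linspace(1.0, 3.0, 40)    # same step under an illumination gradient
      print(filled_in(step).round(2))
      print(filled_in(gradient).round(2))        # similar two-level surface: illuminant largely discounted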
  • image p131fig04.09 Simulation of brightness contrast
    || Simulation of brightness contrast. [stimulus (S), feature (F), boundary (B), output].
  • image p132fig04.10 Simulation of brightness assimilation. Note how the equal steps on the left and right sides of the luminance profile are transformed into different brightness levels.
    || Simulation of brightness assimilation. [stimulus (S), feature (F), boundary (B), output].
  • image p132fig04.11 Simulations of a double step (left panel) and the Craik-O'Brien-Cornsweet (COCE) illusion. Note that discounting the illuminant creates similar feature contour patterns, from which the fact that the COCE looks like the double step follows immediately.
    || Simulations of double step and COCE. [stimulus (S), feature (F), boundary (B), output].
  • image p133fig04.12 Simulation of the 2D COCE.
    || (Todorovic, Grossberg 1988). p132c2h0.6 "... 2D Craik-O'Brien-Cornsweet Effect percepts that are generated by the stimulus in the left panel of Figure 4.2. ..."
  • image p134fig04.13 Contrast constancy shows how the relative luminances when a picture is viewed in an illumination gradient can even be reversed to restore the correct reflectances due to discounting the illuminant.
  • image p134fig04.14 The kinds of displays that Michael Paradiso and Ken Nakayama used to catch filling-in "in the act" and which Karl Arrington then simulated using the Grossberg and Todorovic 1988 model.
    || Experiments on filling-in. Catching "filling-in" in the act (Paradiso, Nakayama 1991). (Arrington 1994 Vision Research 34, 3371-3387) simulated these data using the model of Grossberg and Todorovic 1988.
  • image p138fig04.15 Simple cells are oriented contrast detectors, not edge detectors.
    || From oriented filtering to grouping and boundary completion (Hubel, Wiesel 1968). Oriented receptive fields: SIMPLE CELLS. Sensitive to: orientation, [amount, direction] of contrast, spatial scale. Oriented local contrast detectors, not edge detectors!
  • image p139fig04.16 The simplest way to realize an odd simple cell receptive field and firing threshold.
    || "Simplest" simple cell model. need more complexity for processing natural scenes. Difference-of-Gaussian or Gabor filter (J. Daugman, D. Pollen...). Output signal vs cell activity. Threshold linear signal, half-wave rectification.
  • image p140fig04.17 Complex cells pool inputs from simple cells that are sensitive to opposite contrast polarities. Complex cells hereby become contrast invariant, and can respond to contrasts of either polarity.
    || Complex cells: pool signals from like-oriented simple cells of opposite contrast polarity at the same position. They are "insensitive to contrast polarity". Half-wave rectification of inputs from simple cells.
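    A minimal Python sketch (mine, not the book's equations; kernel size, sigma, and threshold are illustrative) of an odd simple cell as an oriented local contrast detector with a half-wave-rectified, threshold-linear output, and of a complex cell that pools like-oriented simple cells of opposite contrast polarity at the same position.
      import numpy as np

      def odd_vertical_kernel(size=7, sigma=1.5):
          # Odd-symmetric kernel (horizontal derivative of a Gaussian): responds to vertical contrast edges.
          ax = np.arange(size) - size // 2
          X, Y = np.meshgrid(ax, ax)
          k = -X * np.exp(-(X**2 + Y**2) / (2 * sigma**2))
          return k / np.abs(k).sum()

      def simple_cell(patch, kernel, threshold=0.05):
          # Threshold-linear, half-wave-rectified output: one contrast polarity only.
          return max(0.0, float((patch * kernel).sum()) - threshold)

      def complex_cell(patch, kernel, threshold=0.05):
          # Pools light-to-dark and dark-to-light simple cells: contrast-polarity invariant.
          return simple_cell(patch, kernel, threshold) + simple_cell(patch, -kernel, threshold)

      size = 7
      dark_to_light = np.where(np.arange(size)[None, :] < size // 2, 0.0, 1.0) * np.ones((size, size))
      light_to_dark = 1.0 - dark_to_light
      kern = odd_vertical_kernel()
      for patch in (dark_to_light, light_to_dark):
          print(simple_cell(patch, kern), complex_cell(patch, kern))
      # The simple cell responds to only one of the two edge polarities; the complex cell responds to both.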
  • image p141fig04.18 The images formed on the two retinas in response to a single object in the world are displaced by different amounts with respect to their foveas. This binocular disparity is a powerful cue for determining the depth of the object from an observer.
    || Binocular Disparity. Binocular disparities are used in the brain to reconstruct depth from 2D retinal inputs, for relatively near objects.
  • image p141fig04.19 A laminar cortical circuit for computing binocular disparities in layer 3B of V1 at binocular simple cells. These cells add positionally disparate inputs from like-polarity monocular simple cells (layer 4 of V1). Binocular simple cells at each position that are sensitive to opposite polarities then add their outputs at complex cells in layer 2/3. Chapter 10 will explain how these laminar circuits work in greater detail.
    || Laminar cortical circuit for complex cells. [left, right] eye.
    V1 layer | description
    2/3A | complex cells
    3B | binocular simple cells
    4 | monocular simple cells
  • image p142fig04.20 A Glass pattern and a reverse-contrast Glass pattern give rise to different boundary groupings because simple cells can only pool signals from like-polarity visual features. See the text for details.
  • image p143fig04.21 Oriented simple cells can respond at the ends of thick enough bar ends, but not at the ends of thin enough lines. See the text for an explanation of why this is true, and its implications for visual system design.
    || Hierarchical resolution of uncertainty. For a given field size. Different responses occur at bar ends and line ends. For a thin line no detector perpendicular to line end can respond enough to close the boundary there. Network activity.
  • image p144fig04.22 Computer simulation of how simple and complex cells respond to the end of a line (gray region) that is thin enough relative to the receptive field size (thick dashed region in the left panel). These cells cannot detect the line end, as indicated by the lack of responses there in the left panel (oriented short lines denote the cells' preferred positions and orientations, and their lengths denote relative cell activations). Such an end gap is corrected in the responses of hypercomplex cells that create a boundary at the line end which is called an end cut (right panel). See the text for details.
    || End gap and end cut simulation (Grossberg, Mingolla 1985). End gap, filter size, end cut.
  • image p145fig04.23 If end gaps were not closed by end cuts, then color would flow out of every line end!
    || A perceptual disaster in the feature contour system. feature contour, line boundary. input -> [boundary, surface]. boundary -> surface. Color would flow out of every line end! as it does during neon color spreading.
  • image p145fig04.24 A brain's task in creating an end cut to replace an ambiguous end gap requires that it be sensitive to the pattern of signals across the network, not just the activities of individual neurons.
    || Hierarchical resolution of uncertainty. End Cuts. The boundary system must CREATE a line end at next processing stage: Every line end is illusory! input -> ambiguous -> end cut. vertical -> vertical, ambiguous -> horizontal. A pattern-to-pattern map, not a pixel-to-pixel map.
  • image p146fig04.25 Networks of simple, complex, and hypercomplex cells can create end cuts as an example of hierarchical resolution of uncertainty. See the text for details.
    || How are end cuts created? (Grossberg 1984) Two stages of short-range competition. 1st stage: Simple cells -> complex cells -> hypercomplex - endstopped complex. First competitive stage- across position, same orientation; Second competitive stage- same position, across orientation. -> cooperation.
  • image p148fig04.26 End cuts are formed during neon color spreading in the same way that they are formed at line ends.
    || End cut during neon color spreading.
    FIRST competitive stage | SECOND competitive stage
    within orientation | across orientation
    across position | within position
    to generate end cuts.
  • image p149fig04.27 Bipole cells can form boundaries that interpolate end cuts, and use their cooperative-competitive interactions to choose the boundary groupings that have the most support from them.
    || Bipole cells: boundary completion. long-range cooperation & short-range inhibition: complete winning boundary groupings and suppress weaker boundaries.
  • image p150fig04.28 Bipole cells have two branches (A and B), or poles, in their receptive fields. They help to carry out long-range boundary completion.
    || Bipole property. Boundary completion via long-range cooperation. Completing boundaries inwardly between pairs or great numbers of inducers in an oriented way. fuzzy "AND" gate.
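    A minimal Python sketch (mine; the threshold and inputs are illustrative) of the bipole property as a fuzzy AND gate: the cell fires only when both of its receptive-field branches receive enough roughly collinear support, so boundaries complete inwardly between pairs of inducers but do not grow outwardly from a single inducer.
      def bipole(left_pole_input, right_pole_input, branch_threshold=0.3):
          # Fuzzy AND: both branches must be sufficiently active; output is graded (analog).
          if left_pole_input > branch_threshold and right_pole_input > branch_threshold:
              return left_pole_input + right_pole_input
          return 0.0

      print(bipole(0.8, 0.7))   # inducers on both sides: boundary completion
      print(bipole(0.8, 0.0))   # support on only one side: no outward completion
      print(bipole(0.0, 0.0))   # no inducers: silent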
  • image p151fig04.29 Experimental evidence of bipole cells in cortical area V2 was reported by Von der Heydt, Peterhans, and Baumgartner (1984).
    || Bipoles: first neurophysiological evidence (V2) (von der Heydt, Peterhans, Baumgartner 1984, Peterhans, von der Heydt 1988). (Grossberg 1984) prediction.
    Ordering:
    Stimulus (S)
    probe location *
    cells in V2
    response?
    ...(S)*... | YES
    ...*...(S) | NO
    (S)...*... | NO
    (S)...*...(S) | YES
    (S)...*... (more contrast) | NO
    (S)...*.....(S) | YES
    Evidence for receptive field.
  • image p151fig04.30 Anatomical evidence for long-range horizontal connections has also been reported, as illustrated by the example above from (Bosking etal 1997).
    || Anatomy: horizontal connections (V1) (Bosking etal 1997). tree shrew. [10, 20]*[20, 10, 0, -10, -20] (degrees).
  • image p152fig04.31 The predicted bipole cell receptive field (upper left corner) has been supported by both neurophysiological data and psychophysical data, and used in various forms by many modelers. See the text for details.
    || Bipoles through the ages. (Grossberg 1984; Grossberg, Mingolla 1985). (Field, Hayes, Hess 1993) "association field". (Heitger, von der Heydt 1993). (Williams, Jacobs 1997). cf. "relatability" geometric constraints on which contours get to group (Kellman & Shipley 1991). Also "tensor voting" (Ullman, Zucker, Mumford, Guy, Medioni, ...).
  • image p153fig04.32 The double filter network embodies simple, complex, and hypercomplex (or endstopped complex) cells. It feeds into a network of bipole cells that can complete boundaries when it properly interacts with the double filter.
    || Double filter and grouping network. Cells : simple -> complex -> hypercomplex (endstopping) -> bipole
    Grouping network | bipole cells
    Double filter | hypercomplex cells
    endstopping
    complex cells
    simple cells
  • image p156fig04.33 A tripartite texture (top row) and two bipartite textures (bottom row) that illustrate how emergent boundary groupings can segregate textured regions from one another.
  • image p157fig04.34 Some textures that were simulated with mixed success by the complex channels model. In particular, the model gets the wrong answer for the textures in (g) and (i). The Boundary Contour System model of Figure 4.32, which includes both a double filter and a bipole grouping network, simulates the observed results.
  • image p159fig04.35 Spatial impenetrability prevents grouping between the pac-men figures in the left figure, but not in the figure on the right.
    || p158c2h0.75 "... In the image shown in the left panel, the horizontal boundaries of the background squares interfere with vertical boundary completion by vertically-oriented bipole cells, again by spatial impenetrability. In contrast, the vertical boundaries of the background squares are collinear with the vertical pac-man inducers, thereby supporting formation of the square boundaries. Finer aspects of these percepts, such as why the square ... (right panel) appears to lie in front of four partially occluded circular discs, as regularly occurs when the Kanizsa square can form (eg Figure 3.3), can be understood using FACADE theory mechanisms that will be shown below to explain many figure-ground percepts using natural extensions to the three dimensional world of the boundary and surface mechanisms that we have already discussed. ..."
  • image p159fig04.36 Graffiti art by Banksy exploits properties of amodal boundary completion and spatial impenetrability.
    || p159c1h0.75 perceptual psychologist Nava Rubin "... When the wall is smooth, Banksy leaves the regions previously covered by stencil unpainted, relying on observers' perception to segregate figural regions from the (identically colored) background. But when the wall is patterned with large-scale luminance edges - eg due to bricks - Banksy takes the extra time to fill in unpainted figural regions with another color (Rubin 2015). ..."
  • image p161fig04.37 Kanizsa squares that form either collinearly to their inducers (left panel) or perpendicular to them (right panel) confirm predictions of the BCS boundary completion model.
    || Analog-sensitive boundary completion. contour strength vs Kanizsa square image. Increases with "support ratio" (Shipley, Kellman 1992). Inverted-U (Lesher, Mingolla 1993; cf Soriano, Spillmann, Bach 1994)(shifted gratings). p370h0.6 BCS = Boundary Contour System, FCS = Feature Contour System. p161c1h0.85 "... As predicted by the BCS, they found an Inverted-U in contour strength as a function of line density. ... This effect may be explained by the action of the short-range competition that occurs before the stage of long-range cooperative grouping by bipole cells (Figure 4.32). It is thus another example of the balance between cooperative and competitive mechanisms. ..."
  • image p162fig04.38 How long-range cooperation among bipole cells and short-range competition by hypercomplex cells work together to generate the inverted-U in boundary strength that is found in the data of Figure 4.37 (right panel).
    || Cooperation and competition during grouping.
    few lines | wide spacing: inputs outside spatial range of competition, more inputs cause higher bipole activity
    more lines | narrower spacing: slightly weakens net input to bipoles from each inducer
    increasing line density | causes inhibition to reduce net total input to bipoles
  • image p163fig04.39 A schematic of the LAMINART model that explains key aspects of laminar visual cortical anatomy and dynamics. LGN -> V1 [6, 4, 2/3] -> V2 [6, 4, 2/3]
    || p163c1h0.6 "... The first article about laminar computing ... proposed how the laminar cortical model could process 2D pictures using bottom-up filtering and horizontal bipole grouping interactions (Grossberg, Mingolla, Ross 1997). In 1999, I was able to extend the model to also include top-down circuits for expectation and attention (Grossberg 1999)(right panel). Such a synthesis of laminar bottom-up, horizontal, and top-down circuits is characteristic of the cerebral cortex (left panel). I called it LAMINART because it began to show how properties of Adaptive Resonance Theory, or ART, notably the ART prediction about how top-down expectations and attention work, are realized by identical cortical cells and circuits. You can immediately see from the schematic laminar circuit diagram ... (right panel) that circuits in V2 seem to repeat circuits in V1, albeit with a larger spatial scale, despite the fact that V1 and V2 carry out different functions. How this anatomical similarity can coexist with functional diversity will be clarified in subsequent sections and chapters. It enables different kinds of biological intelligence to communicate seamlessly while carrying out their different psychological functions. ..."
  • image p164fig04.40 The Koffka-Benussi ring. See the text for details.
    || p164c2h0.25 "... [left image] The luminance of the ring is intermediate between the luminances of the two background regions. Its perceived brightness is also between the brightnesses of the two background regions, and appears to be uniform throughout. The right image differs from the left only in that a vertical line divides the two halves of the ring where it intersects the two halves in the background. Although the luminance of the ring is still uniform throughout, the two halves of the ring now have noticeably different brightnesses, with the left half of the ring looking darker than the right half. How can drawing a line have such a profound effect on the brightnesses of surface positions that are so far away from the line? ..."
  • image p165fig04.41 The Kanizsa-Minguzzi ring. See the text for details.
    || p165c1h0.6 "... (left panel), the annulus is divided by two line segments into annular sectors of unequal area. Careful viewing shows that the smaller sector looks a little brighter than the larger one. (Kanizsa, Minguzzi 1986) noted that "this unexpected effect is not easily explained. In fact, it cannot be accounted for by any simple psychological mechanism such as lateral inhibition or frequency filtering. Furthermore, it does not seem obvious to invoke organizational factors, like figural belongingness or figure-ground articulation."". p165c2h0.35 "... (Grossberg, Todorovic 1988). Our main claim is that the two radial lines play two roles, one in the formation of boundaries with which to contain the filling-in process, and the other as a source of feature contour signals that are filled-in within the annular regions to create a surface brightness percept. ..."
  • image p166fig04.42 Computer simulation of Kanizsa-Minguzzi ring percept. See the text for details.
  • image p167fig04.43 (a) How bipole cells cause end cuts. (b) The Necker cube generates a bistable percept of two 3D parallelopipeds. (c) Focusing spatial attention on one of the disks makes it look both nearer and darker, as (Tse 1995) noted and (Grossberg, Yazdanbakhsh 1995) explained.
    || T-junction sensitivity. image -> bipole cells -> boundary. (+) long-range cooperation, (-) short-range competition.
  • image p168fig04.44 Macrocircuit of the main boundary and surface formation stages that take place from the lateral geniculate nucleus, or LGN, through cortical areas [V1, V2, V4]. See the text for details.
    ||
    left eye | binocular | right eye
    V4 binocular surface
    V2 monocular surface | V2 layer 2/3 binocular boundary | V2 monocular surface
    V2 layer 4 binocular boundary
    V1 monocular surface | V1 monocular boundary | V1 binocular boundary | V1 monocular boundary | V1 monocular surface
    LGN | LGN
  • image p168fig04.45 How ON and OFF feature contour (FC) activities give rise to filled-in surface regions when they are adjacent to a like oriented boundary, but not otherwise.
  • image p170fig04.46 Surface regions can fill-in using feature contour inputs (+ and - signs) if they are adjacent to, and collinear with, boundary contour inputs (solid) line, as in (a), but not otherwise, as in (b).
  • image p170fig04.47 A double-opponent network processes output signals from opponent ON and OFF Filling-In DOmains, or FIDOs.
    || OFF FIDO -> shunting networks -> ON FIDO -> shunting networks-> opponent interation -> FIDO outputs
  • image p171fig04.48 How closed boundaries contain filling-in of feature contour signals, whereas open boundaries allow color to spread to both sides of the boundary.
    || Before filling-in: boundary contour, illuminant-discounted feature contour; After filling-in: no gap, gap
  • image p171fig04.49 An example of DaVinci stereopsis in which the left eye sees more of the wall between A and C than the right eye does. The region between B and C is seen only by the left eye because the nearer wall between C and D occludes it from the right eye view.
  • image p173fig04.50 This figure illustrates how a closed boundary can be formed in a prescribed depth due to addition of binocular and monocular boundaries, but not at other depths.
    || How are closed 3D boundaries formed? V1 Binocular, V2 boundary, V2 surface; Prediction: monocular and horizontal boundaries are added to ALL binocular boundaries along the line of sight. Regions that are surrounded by a CLOSED boundary can depth-selectively contain filling-in of lightness and colored signals.
  • image p174fig04.51 The same feedback circuit that ensures complementary consistency between boundaries and surfaces also, automatically, initiates figure-ground separation! See the text for details.
    || before feedback: [V1 -> V2 pale stripe -> V2 thin stripe, "attention pointers" (Cavanagh et al 2010)]; after feedback: [V1 + V2 thin stripe] -> V2 pale stripe via contrast sensitive [excitation, inhibition] for depths [1, 2] -> object recognition
  • image p174fig04.52 An example of how the 3D LAMINART model can transform the two monocular images of the random dot stereogram in the top row into the three depth-separated surface representations in the bottom row.
    || Stereogram surface percepts: surface lightnesses are segregated in depth (Fang and Grossberg 2009). [left, right] inputs, [far, fixation, near] planes. Contrast with algorithms that just compute disparity matches and let computer code build the surface, eg (Marr, Poggio, et al 1974).
  • image p176fig04.53 The on-center off-surround network within position and across depth helps to explain why brighter Kanizsa squares look closer.
    || inhibition vs. depth. p176c1h0.25 "... to qualitatively understand how this example of proximity-luminance covariance works. It follows directly from the boundary pruning by surface contour feedback signals (Figure 4.51) that achieves complementary consistency and initiates figure-ground perception. ...". p176c1h0.45 "... these inhibitory signals are part of an off-surround network whose strength decreases as the depth difference increases between the surface that generates the signal and its recipient boundaries. ...". p176c1h0.8 "... Within FACADE theory, the perceived depth of a surface is controlled by the boundaries that act as its filling-in generators and barriers (Figure 3.22), since these boundaries select the depth-selective FIDOs within which filling-in can occur, and thereby achieve surface capture. These boundaries, in turn, are themselves strengthened after surface-to-boundary contour feedback eliminates redundant boundaries that cannot support successful filling-in (Figure 4.51). These surface contour feedback signals have precisely the properties that are needed to explain why brighter Kanizsa squares look closer! ..."
  • image p178fig04.54 Initial steps in figure-ground separation. See the text for details.
    ||
    top left | repeats the image in Figure 1.3
    top right | shows again the long-range cooperation and short-range competition that are controlled by the bipole grouping process (Figure 4.43a middle panel)
    bottom left | shows the end gaps that are caused by these bipole grouping mechanisms
    bottom right | shows how surface filling-in is contained within the closed horizontal rectangular boundary, but spills out of the end gaps formed in the other two rectangles
  • image p178fig04.55 Amodal completion of boundaries and surfaces in V2.
    || Separated V2 boundaries: near, far (amodal boundary completion); Separated V2 surfaces: ?horizonal, vertical? (amodal surface filling-in).
  • image p179fig04.56 Final steps in generating a visible, figure-ground separated, 3D surface representation in V4 of the unoccluded parts of opaque surfaces.
    || Visible surface perception.
    Boundary enrichment: | near | far | asymmetry between near & far
    V4 | horizontal rectangle | horizontal & vertical rectangles | cannot use these (overlapping?) boundaries for occluded object recognition
    V2 | horizontal rectangle | vertical rectangle | use these boundaries for occluded object recognition
    Visible surface filling-in: | filling-in of entire vertical rectangle | partial filling-in of horizontal rectangle | visible percept of unoccluded [vertical] surface
  • image p181fig04.57 Percepts of unimodal and bistable transparency (top row) as well as of a flat 2D surface (bottom row, left column) can be induced just by changing the relative contrasts in an image with a fixed geometry.
    || X junction
  • image p182fig04.58 LAMINART model processing stage that are sufficient to explain many percepts of transparency, including those summarized in Figure 4.57.
    || [left, right] eye, [LGN, V1 [6, 4, 3B, 2/3 A], V2 [4, 2/3]], [mo, bi]nocular cart [simple, complex] cells, [excita, inhibi]tory cart [connection, cell]s.
  • image p186fig05.01 Humans and other autonomous adaptive intelligent agents need to be able to learn both many-to-one and one-to-many maps.
    || Learn many-to-one (compression, naming) and one-to-many (expert knowledge) maps
  • image p186fig05.02 Learning a many-to-one map from multiple visual fonts of a letter to the letter's name requires a stage of category learning followed by one of associatively learned mapping.
    || Many-to-one map- two stages of compression: visual categories, auditory categories
  • image p186fig05.03 Many-to-one maps can learn a huge variety of kinds of predictive information.
    || Many-to-one map, two stage compression: IF-THEN rules: [symptom, test, treatment]s; length of stay in hospital
  • image p189fig05.04 The hippocampus is one of several brain regions that are important in learning and remembering about objects and events that we experience throughout life. The book will describe several hippocampal processes that contribute to this achievement in different ways.
    || hypothalamic nuclei, amygdala, hippocampus, cingulate gyrus, corpus callosum, thalamus
  • image p192fig05.05 ON and OFF cells in the LGN respond differently to the sides and ends of lines.
    || [ON, OFF]-center, [OFF, ON]-surround (respectively). OFF-center cells maximum response at line end (interior), ON-center cells maximum response along sides (exterior)
  • image p192fig05.06 Bottom-up and top-down circuits between the LGN and cortical area V1. The top-down circuits obey the ART Matching Rule for matching with bottom-up input patterns and focussing attention on expected critical features.
    || Model V1-LGN circuits, version [1, 2]. retina -> LGN relay cells -> interneurons -> cortex [simple, endstopped] cells -> cortex complex cells
  • image p193fig05.07 A more detailed description of the connections between retinal ganglion cells, the LGN, and V1.
    ||
  • image p193fig05.08 The patterns of LGN activation and inhibition on the sides and ends of a line without the top-down feedback (A) and with it (C). The top-down distribution of excitation (+) and inhibition (-) are shown in (B).
    ||
  • image p194fig05.09 A computer simulation of the percept (D) that is generated by feature contours (B) and boundary contours (C) in response to an Ehrenstein disk stimulus (A).
    ||
  • image p198fig05.10 A competitive learning circuit learns to transform distributed feature patterns into selective responses of recognition categories.
    || Competitive learning and Self-Organizing Maps (SOMs). input patterns -> feature level (F1) -> adaptive filter (T=ZS) -> category level (F2)
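    A minimal Python sketch of competitive learning (mine; the weights, input, and learning rate are illustrative): the category whose classifying weight vector receives the largest total input T = Z*S is chosen, and only the winner's weights learn, moving toward the input pattern so that the chosen classifying vector becomes more parallel to it (compare Figures 5.16-5.17).
      import numpy as np

      def cosine(a, b):
          return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

      Z = np.array([[0.9, 0.2, 0.1, 0.1],        # three initial classifying (LTM) vectors
                    [0.1, 0.8, 0.7, 0.1],
                    [0.2, 0.1, 0.2, 0.9]])
      x = np.array([0.0, 1.0, 1.0, 0.0])         # an input feature pattern at F1
      rate = 0.3

      T = Z @ x                                  # total input to each category node at F2
      winner = int(np.argmax(T))                 # winner-take-all category choice
      print("winner:", winner, "cosine before:", round(cosine(Z[winner], x), 3))
      Z[winner] += rate * (x - Z[winner])        # instar update: winner's weights track the input
      print("cosine after:", round(cosine(Z[winner], x), 3))   # more parallel to x than before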
  • image p199fig05.11 Instar learning enables a bottom-up adaptive filter to become selectively tuned to particular feature patterns. Such pattern learning needs adaptive weights that can either increase or decrease to match the featural activations that they filter.
    || Instar learning STM->LTM: need both increases and decreases in strength for the LTM pattern to learn the STM pattern
  • image p200fig05.12 The duality of the outstar and instar networks is evident when they are drawn as above.
    ||
  • image p200fig05.13 Instar and outstar learning are often used to learn the adaptive weights in the bottom-up filters and top-down expectations that occur in ART. The ART Matching Rule for object attention enables top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features.
    || Expectations focus attention: feature pattern (STM), Bottom-Up adaptive filter (LTM), Category (STM), competition, Top-Down expectation (LTM); ART Matching Rule: STM before top-down matching, STM after top-down matching (attention!)
  • image p200fig05.14 Outstar learning enables individual sampling cells to learn distributed spatial patterns of activation at the network of cells that they sample. Again, both increases and decreases in LTM traces must be possible to enable them to match the activity pattern at the sampled cells.
    || Outstar learning: need both increases and decreases in strength for the LTM pattern to learn the STM pattern at the sampled cells
  • image p201fig05.15 An outstar can learn an arbitrary spatial pattern of activation at its sampled nodes, or cells. The net pattern that is learned is a time average of all the patterns that are active at the sampled nodes when the sampling node is active.
    || Spatial learning pattern, outstar learning.
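  • Python sketch (an illustrative aside, not the book's code): outstar learning as in Figures 5.14-5.15. While the sampling node is active, its outgoing weights track the spatial pattern of activity at the sampled nodes, so the learned pattern is a time average of the sampled patterns. The function name, learning rate, and toy patterns are assumptions.
```python
# Outstar learning sketch: weights move toward the sampled STM pattern
# only while the sampling node is active (learning is gated by it).
def outstar_update(sampling_activity, sampled_pattern, weights, rate=0.05):
    return [w + rate * sampling_activity * (p - w)
            for w, p in zip(weights, sampled_pattern)]

weights = [0.0, 0.0, 0.0]
for pattern in ([1.0, 0.5, 0.0], [0.8, 0.7, 0.1]):    # sampled spatial patterns
    weights = outstar_update(1.0, pattern, weights)   # sampling node is on
print(weights)   # weights drift toward a time average of the sampled patterns
```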
  • image p202fig05.16 In the simplest example of category learning, the category that receives the largest total input from the feature level is chosen, and drives learning in the adaptive weights that abut it. Learning in this "classifying vector", denoted by zi, makes this vector more parallel to the input vector from the feature level that is driving the learning (dashed red arrow).
    || Geometry of choice and learning
  • image p202fig05.17 This figure summarizes the simplest equations whereby the adaptive weights of a winning category learn the input pattern that drove it to win, or more generally a time-average of all the input patterns that succeeded in doing so.
    || Geometry of choice and learning, learning trains the closest LTM vector
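  • Python sketch (an illustrative aside, not the book's code): the category choice and instar learning geometry of Figures 5.10, 5.11, 5.16 and 5.17. The winning category is the one whose adaptive weight vector has the largest dot product with the input, and instar learning then moves that weight vector toward the input, i.e. makes it more parallel to the input. All names, the learning rate, and the toy vectors are assumptions.
```python
# Winner-take-all category choice followed by an instar weight update.
def choose_category(x, weights):
    # Total input to category j is the dot product of x with its weights z_j.
    totals = [sum(xi * zi for xi, zi in zip(x, z)) for z in weights]
    return max(range(len(totals)), key=lambda j: totals[j])

def instar_update(x, z, rate=0.1):
    # Weights can both increase and decrease as they track the input pattern.
    return [zi + rate * (xi - zi) for xi, zi in zip(x, z)]

weights = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3]]   # two categories' weight vectors
x = [1.0, 0.0, 0.2]                            # input feature pattern
j = choose_category(x, weights)                # category 0 wins here
weights[j] = instar_update(x, weights[j])      # its weights rotate toward x
```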
  • image p205fig05.18 How catastrophic forgetting can occur in a competitive learning or self-organizing map model due to basic properties of competition and associative learning.
    || Learning from pattern sequences, practicing a sequence of spatial patterns can recode all of them! When is learning stable? Input patterns cannot be too dense relative to the number of categories; Either: not too many distributed inputs relative to the number of categories, or not too many input clusters
  • image p207fig05.19 The ART hypothesis testing and learning cycle. See the text for details about how the attentional system and orienting system interact in order to incorporate learning of novel categories into the corpus of already learned categories without causing catastrophic forgetting.
    ||
  • image p211fig05.20 The PN and N200 event-related potentials are computationally complementary events that are computed within the attentional and orienting systems.
    || PN and N200 are complementary waves. PN [top-down, conditionable, specific] match; N200 [bottom-up, unconditionable, nonspecific] mismatch
  • image p211fig05.21 Sequences of P120, N200, and P300 event-related potentials occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
    || ERP support for mismatch-mediated reset: event-related potentials: human scalp potentials. ART predicted correlated sequences of P120-N200-P300 Event Related Potentials during oddball learning. P120 mismatch; N200 arousal/novelty; P300 STM reset. Confirmed in (Banquet and Grossberg 1987)
  • image p213fig05.22 Suppose that a very different exemplar activates a category than the one that originally learned how to do this.
    || By prior learning, X1 at F1 is coded at F2, Suppose that X2 incorrectly activates the same F2 code. How to correct the error? The problem occurs no matter how you define an "error"
  • image p213fig05.23 A category, symbol, or other highly compressed representation cannot determine whether an error has occurred.
    || Compression vs error correction. past vs present. Where is the knowledge that an error was made? Not at F2! The compressed code cannot tell the difference! The error is defined by X2 being at F1 while (green right triangle GRT) is at F2. There is a mismatch between X1 and X2 at F1. How does the system know this?
  • image p214fig05.24 Learning of a top-down expectation must occur during bottom-up learning in the adaptive filter in order to be able to match the previously associated feature pattern with the one that is currently active.
    || Learning top-down expectations. When the code (green right triangle GRT) for X1 was learned at F2, GRT learned to read-out X1 at F1. [Bottom-Up, Top-Down] learning
  • image p214fig05.25 The sequence of events whereby a novel input pattern can activate a category which, in turn, reads out its learned top-down expectation to be matched against the input pattern. Error correction thus requires the use of a Match Detector that has properties of the Processing Negativity ERP.
    || How is an error corrected? During bottom-up learning, top-down learning must also occur so that the pattern that is read out top-down can be compared with the pattern that is activated by bottom-up inputs. Match detector: Processing Negativity ERP. 1. top-down, 2. conditionable, 3. specific, 4. match
  • image p214fig05.26 When a big enough mismatch occurs, the orienting system is activated and sends a burst of nonspecific arousal to the category level. This Mismatch Detector has properties of the N200 ERP.
    || Mismatch triggers nonspecific arousal. Mismatch at F1 elicits a nonspecific event at F2. Call this event nonspecific arousal. N200 ERP Naatanen etal: 1. bottom-up, 2. unconditionable, 3. nonspecific, 4. mismatch
  • image p215fig05.27 Every event activates both the attentional system and the orienting system. This text explains why.
    || Attentional and Orienting systems. Every event has a cue (specific) and an arousal (nonspecific) function
  • image p215fig05.28 How a mismatch between bottom-up and top-down input patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level.
    || Mismatch -> inhibition -> arousal -> reset. BU input orienting arousal, BU+TD mismatch arousal and reset. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
  • image p220fig05.29 Vigilance is a gain parameter on inputs to the orienting system that regulates whether net excitation from bottom-up inputs or inhibition from activated categories will dominate the orienting system. If excitation wins, then a memory search for a better-matching category will occur. If inhibition wins, then the orienting system will remain quiet, thereby enabling resonance and learning to occur.
    || Vigilance control [resonate and learn, reset and search]. ρ is a sensitivity or gain parameter
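  • Python sketch (a minimal illustration, assuming a Fuzzy-ART-style analog match): the vigilance test of Figure 5.29. The match between the bottom-up input and the active category's prototype is compared with the gain parameter ρ; resonance and learning occur if the match ratio meets ρ, otherwise the category is reset and search continues. The componentwise-min overlap and the numbers are assumptions.
```python
# Vigilance test: resonate if |input AND prototype| / |input| >= rho, else reset.
def match_ratio(inp, prototype):
    overlap = sum(min(i, p) for i, p in zip(inp, prototype))   # fuzzy AND
    return overlap / sum(inp)

def resonates(inp, prototype, rho):
    return match_ratio(inp, prototype) >= rho

inp = [1.0, 0.7, 0.0, 0.2]
prototype = [0.9, 0.6, 0.1, 0.0]
print(resonates(inp, prototype, rho=0.7))   # True: match good enough, resonate
print(resonates(inp, prototype, rho=0.9))   # False: reset and search
```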
  • image p221fig05.30 When a predictive disconfirmation occurs, vigilance increases enough to drive a search for a more predictive category. If vigilance increases just enough to exceed the analog match between features that survive top-down matching and the entire bottom-up input pattern, then minimax learning occurs. In this case, the minimum amount of category generalization is given up to correct the predictive error.
    || Match tracking realizes the minimax learning principle. Given a predictive error, vigilance increases just enough to trigger search and thus sacrifices the minimum generalization to correct the error ... and enables expert knowledge to be incrementally learned. predictive error -> vigilance increases just enough -> minimax learning
  • image p221fig05.31 A system like Fuzzy ARTMAP can learn to associate learned categories in one ART network with learned categories in a second ART network. Because both bottom-up and top-down interactions occur in both networks, a bottom-up input pattern to the first ART network can learn to generate a top-down output pattern from the second ART network.
    || Fuzzy ARTMAP. Match tracking realizes the minimax learning principle: vigilance increases to just above the match ratio of prototype / exemplar, thereby triggering search
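  • Python sketch (continuing the illustrative assumptions of the vigilance sketch above): match tracking. After a predictive error, vigilance is raised just above the current match ratio, so the active category necessarily fails the vigilance test and a search begins, while giving up only the minimum generalization. EPSILON and the helper are assumptions.
```python
def match_ratio(inp, prototype):
    return sum(min(i, p) for i, p in zip(inp, prototype)) / sum(inp)

EPSILON = 0.01   # small illustrative increment

def match_track(rho, inp, prototype):
    # Raise vigilance just above the current match so the active category
    # fails the vigilance test and a search for a better category is triggered.
    return max(rho, match_ratio(inp, prototype) + EPSILON)
```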
  • image p224fig05.32 Learning the alphabet with two different levels of vigilance. The vigilance in column (b) is higher than in column (a), leading to more concrete categories with less abstract prototypes. See the text for details.
    ||
  • image p225fig05.33 Some early ARTMAP benchmark studies. These successes led to the use of ARTMAP, and many variants that we and other groups have developed, in many large-scale applications in engineering and technology, a stream of applications that has not abated even today.
    || see Early ARTMAP benchmark studies
  • image p225fig05.34 ARTMAP was successfully used to learn maps of natural terrains with many advantages over those of mapping projects that used AI expert systems. The advantages are so great that many mapping projects started to use this technology.
    || AI expert system - 1 year: field identification of natural regions; derivation of ad hoc rules for each region by expert geographers; correct 80,000 of 250,000 site labels; 230m (site-level) scale. ARTMAP system - 1 day: rapid, automatic, no natural regions or rules; confidence map; 30m (pixel-level) scale can see roads; equal accuracy at test sites
  • image p226fig05.35 I had shown in 1976 how a competitive learning or self-organizing map model could undergo catastrophic forgetting if the input environment was sufficiently dense and nonstationary, as illustrated by Figure 5.18. Later work with Gail Carpenter showed how, if the ART Matching Rule was shut off, repeating just four input patterns in the correct order could also cause catastrophic forgetting by causing superset recoding, as illustrated in Figure 5.36.
    || Code instability input sequences. D C A; B A; B C = ; |D|<|B|<|C|; where |E| is the number of features in the set E. Any set of input vectors that satisfy the above conditions will lead to unstable coding if they are periodically presented in the order ABCAD and the top-down ART Matching Rule is shut off.
  • image p226fig05.36 Column (a) shows catastrophic forgetting when the ART Matching Rule is not operative. It is due to superset recoding. Column (b) shows how category learning quickly stabilizes when the ART Matching Rule is restored.
    || Stable and unstable learning, superset recoding
  • image p228fig05.37 A macrocircuit of the neurotrophic Spectrally Timed ART, or nSTART, model. I developed nSTART with my PhD student Daniel Franklin. It proposes how adaptively timed learning in the hippocampus, bolstered by Brain Derived Neurotrophic Factor, or BDNF, helps to ensure normal memory consolidation.
    || habituative gates, CS, US, Thalamus (sensory cortex, category learning, conditioned reinforcer learning, adaptively timed learning and BDNF), Amygdala (incentive motivation learning), Hippocampus (BDNF), Prefrontal Cortex (attention), Pontine nuclei, Cerebellum (adaptively timed motor learning)
  • image p230fig05.38 The Synchronous Matching ART, or SMART, model includes spiking neurons in a laminar cortical hierarchy. I developed SMART with my PhD student Massimiliano Versace. By unlumping LAMINART to include spiking neurons, finer details of neurodynamics, such as the existence of faster gamma oscillations during good enough matches, and slower beta oscillations during bad enough mismatches, could be shown as emergent properties of network interactions.
    || Second order thalamus -> specific thalamic nucleus -> Thalamic reticular nucleus -> neocortical laminar circuit [6ll, 6l, 5, 2/3, 1] -> Higher order cortex. Similar for First order thalamus -> First order cortex, with interconnection to Second order, nonspecific thalamic nucleus
  • image p231fig05.39 The SMART hypothesis testing and learning cycle predicts that vigilance increases when a mismatch in subcortical regions like the nonspecific thalamus activates the nucleus basalis of Meynert which, in turn, broadcasts a burst of the neurotransmitter acetylcholine, or ACh, to deeper cortical layers. Due to the way in which LAMINART proposes that cortical matching and mismatching occurs, this ACh burst can increase vigilance and thereby trigger a memory search. See the text for details.
    || [BU input, [, non]specific thalamic nucleus, thalamic reticular nucleus, neocortical laminar circuit] cart [Arousal, Reset, Search, Vigilance]
  • image p232fig05.40 Computer simulation of how the SMART model generates (a) gamma oscillations if a good enough match occurs, or (c) beta oscillations if a bad enough match occurs. See the text for details.
    || Brain oscillations during match/mismatch, data, simulation. (a) TD corticothalamic feedback increases synchrony (Sillito etal 1994) (b) Match increases γ oscillations (c) Mismatch increases θ,β oscillations
  • image p232fig05.41 (a)-(c). The sequence of interlaminar events that SMART predicts during a mismatch reset. (d) Some of the compatible neurophysiological data.
    || Mismatch causes layer 5 dendritic spikes that trigger reset. (a) Arousal causes increase in nonspecific thalamic nuclei firing rate and layer 5 dendritic and later somatic spikes (Larkum and Zhu 2002, Williams and Stuart 1999) (b) Layer 5 spikes reach layer 4 via layer 6i and inhibitory neurons (Lund and Boothe 1975, Gilbert and Wiesel 1979) (c) habituative neurotransmitters in layer 6i shift the balance of active cells in layer 4 (Grossberg 1972, 1976) (d) Dendritic stimulation fires layer 5 (Larkum and Zhu 2002) stimulation apical dendrites of nonspecific thalamus
  • image p233fig05.42 Mismatch-induced beta oscillations have been reported in at least three parts of the brain: V1, V4, and hippocampus. Although there may be other reasons for beta oscillations in the brain, those that are caused by a mismatch should be studied in concert with the gamma oscillations that occur during a good enough match. See the text for details.
    || Is there evidence for the [gamma, beta] prediction? Yes, in at least three parts of the brain, (Buffalo EA, Fries P, Landman R, Buschman TJ, Desimone R 2011, PNAS 108, 11262-11267) Does this difference in average oscillation frequencies in the superficial and deep layers reflect layer 4 reset? Superficial recording γ (gamma), Deep recording β (beta) (Berke etal 2008, hippocampus; Buschman and Miller 2009, FEF)
  • image p236fig05.43 The activation of the nucleus basalis of Meynert, and its subsequent release of ACh into deeper layers of neocortex, notably layer 5, is assumed to increase vigilance by reducing afterhyperpolarization (AHP) currents.
    || Vigilance control: mismatch-mediated acetylcholine release (Grossberg and Versace 2008). Acetylcholine (ACh) regulation by nonspecific thalamic nuclei via nucleus basalis of Meynert reduces AHP in layer 5 and causes a mismatch/reset thereby increasing vigilance. HIGH vigilance ~ sharp code, LOW vigilance ~ coarse code
  • image p240fig05.44 When an algebraic exemplar model is realized using only local computations, it starts looking like an ART prototype model.
    || How does the model know which exemplars are in category A? BU-TD learning. How does a NOVEL test item access category A?
  • image p241fig05.45 The 5-4 category structure is one example of how an ART network learns the same kinds of categories as human learners. See the text for details.
    || 5-4 Category structure. A1-A5: closer to the (1 1 1 1) prototype; B1-B4: closer to the (0 0 0 0) prototype
  • image p242fig05.46 Computer simulations of how two variants of Distributed ARTMAP incrementally learn the 5-4 category structure. See the text for details.
    || Distributed ARTMAP with [self-supervised learning, post-training LTM noise]
  • image p245fig05.47 How long-range excitatory connections and short-range disynaptic inhibitory connections realize the bipole grouping law.
    || stimulus -> boundary representation -> layer 2/3
  • image p246fig05.48 Microcircuits of the LAMINART model that I developed with Rajeev Raizada. See the text for details of how they integrate bottom-up adaptive filtering, horizontal bipole grouping, and top-down attentional matching that satisfies the ART Matching Rule.
    ||
  • image p248fig05.49 This circuit of the LAMINART model helps to explain properties of Up and Down states during slow wave sleep, and how disturbances in ACh dynamics can disrupt them.
    ||
  • image p252fig06.01 A surface-shroud resonance begins to form when the surface representations of objects bid for spatial attention. In addition to these topographic excitatory inputs, there is long-range inhibition of the spatial attention cells that determines which inputs will attract spatial attention.
    || Bottom-up spatial attention competition. [more, less] luminous perceptual surfaces -> competition -> spatial attention
  • image p253fig06.02 After bottom-up surface inputs activate spatial attentional cells, they send top-down topographic excitatory signals back to the surface representations. This recurrent shunting on-center off-surround network contrast enhances larger attentional activities while approximately normalizing the total spatial attentional activity. A surface-shroud resonance hereby forms that selects an attentional shroud, enhances the perceived contrast of the attended surface (light blue region), and maintains spatial attention on it.
    || Surface-shroud resonance. perceptual surfaces -> competition -> spatial attention. (Carrasco, Penpeci-Talgar, and Eckstein 2000, Reynolds and Desimone 2003)
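  • Python sketch (my own illustration, not the model's code): a recurrent shunting on-center off-surround network of the kind invoked here. Each cell excites itself and inhibits the others through shunting terms, so the largest input is contrast-enhanced while every activity stays bounded, which is the gist of shroud selection with approximate normalization. The faster-than-linear signal function, parameters, and Euler step are assumptions.
```python
# Recurrent shunting on-center off-surround network (illustrative parameters).
def step(x, inputs, dt=0.01, A=1.0, B=1.0):
    f = lambda v: v * v          # faster-than-linear signal: contrast enhancement
    total = sum(f(v) for v in x)
    new_x = []
    for xi, Ii in zip(x, inputs):
        exc = Ii + f(xi)                 # on-center: external input plus self-excitation
        inh = total - f(xi)              # off-surround: recurrent signals from the other cells
        dx = -A * xi + (B - xi) * exc - xi * inh
        new_x.append(min(B, max(0.0, xi + dt * dx)))
    return new_x

x = [0.0, 0.0, 0.0]
inputs = [0.6, 0.5, 0.2]                 # three surfaces bidding for spatial attention
for _ in range(2000):
    x = step(x, inputs)
# x[0] ends up the largest activity, and every x[i] stays bounded in [0, B]:
# the strongest bid wins attention while all activities remain bounded.
```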
  • image p254fig06.03 These interactions of the ARTSCAN Search model enable it to learn to recognize and name invariant object categories. Interactions between spatial attention in the Where cortical stream, via surface-shroud resonances, and object attention in the What cortical stream, that obeys the ART Matching Rule, coordinate these learning, recognition, and naming processes.
    || Retinal image -> On & OFF cell contrast normalization (retina/LGN) -> polarity [, in]sensitive contrast enhancement (V1) -> object [boundary (V2), surface (V2/V4)] -> surface contour (V2); What stream categories: volition control (BG). object boundary (V2) <-> view (ITp) <-> view integrator (ITp) <-> object (ITa) <-> [object-value (ORB), value (Amyg)] <-> name (PFC)
  • image p255fig06.04 The ARTSCAN Search model can also search for a desired target object in a scene, thereby clarifying how our brains solve the Where's Waldo problem.
    || similar illustration to Figure 06.03, with some changes to arrows
  • image p257fig06.05 A curve tracing task with monkeys was used by Roelfsema, Lamme, and Spekreijse in 1998 to demonstrate how spatial attention can flow along object boundaries. See the text for details.
    || Attention flows along curves: Roelfsema etal 1998: Macaque V1. fixation (300ms) -> stimulus (600ms RF - target curve, distractor) -> saccade. Crossed-curve condition: attention flows across junction between smoothly connected curve segments, Gestalt good continuation
  • image p258fig06.06 Neurophysiological data and simulation of how attention can flow along a curve. See the text for details.
    || Simulation of Roelfsema etal 1998, data & simulation. Attention directed only to far end of curve. Propagates along active layer 2/3 grouping to distal neurons.
  • image p258fig06.07 A top-down spotlight of attention can also be converted into a shroud. This process begins when the spotlight triggers surface filling-in within a region. Figure 6.8 shows how it is completed.
    || Reconciling spotlights and shrouds: top-down attentional spotlight becomes a shroud. spotlight of attention, surface filling-in
  • image p259fig06.08 The distributed ARTSCAN, or dARTSCAN, model includes spatial attention in both PPC and PFC, and both fast-acting attention, triggered by transient cells in Where cortical areas such as MT, and slower-acting surface-shroud resonances in What cortical areas such as V4 and PPC. See the text for details.
    || dARTSCAN spatial attention hierarchy, Fast (Where stream) Slow (What stream) (Foley, Grossberg, and Mingolla 2012). [transient cells (MT) ->, object surfaces (V4) <->] [object shrouds (PPC) <-> spatial shrouds (PPC/PFC)]
  • image p260fig06.09 Crowding in the periphery of the eye can be avoided by expanding the size and spacing of the letters to match the cortical magnification factor.
    || Crowding: visible objects and confused recognition. Accurate target recognition requires increased flanker spacing at higher eccentricity
  • image p260fig06.10 The cortical magnification factor transforms (A) Cartesian coordinates in the retina into (B) log polar coordinates in visual cortical area V1.
    ||
  • image p261fig06.11 If the sizes and distances between the letters stay the same as they are received by more peripheral parts of the retina, then all three letters may be covered by a single shroud, thereby preventing their individual perception and recognition.
    || Crowding: visible objects and confused recognition. log compression and center-surround processing cause... input same eccentricity, surface, object shroud, crowding threshold. object shrouds merge!
  • image p261fig06.12 Pop-out of the L among T's can easily occur when inspecting the picture to the left. In the picture to the right, a more serial search is needed to detect the vertical red bar due to overlapping conjunctions of features.
    ||
  • image p265fig06.13 The basal ganglia gate perceptual, cognitive, emotional, and more processes through parallel loops.
    || [motor, oculomotor, dorsolateral, ventral-orbital, anterior cingulate] vs. [Thalamus, pallidum-subs, nigra, Striatum, Cortex]
  • image p267fig06.14 Feedback from object surfaces to object boundaries uses surface contours. This feedback assures complementary consistency and enables figure-ground separation. A corollary discharge of the surface contours can be used to compute salient object feature positions.
    || Perceptual consistency and figure-ground separation.
  • image p268fig06.15 The largest salient feature signal is chosen to determine the next target position of a saccadic eye movement. This target position signal self-inhibits to enable the next most salient position to be foveated. In this way, multiple feature combinations of the object can be foveated and categorized. This process clarifies how the eyes can explore even novel objects before moving to other objects. These eye movements enable invariant categories to be learned. Each newly chosen target position is, moreover, an "attention pointer" whereby attention shifts to the newly foveated object position.
    || How are saccades within an object determined? Figure-ground outputs control eye movements via V3A! Support for prediction (Theeuwes, Mathot, and Kingstone 2010), More support: "attention pointers" (Cavanaugh etal 2010), Even more support (Backus etal 2001, Caplovitz and Tse 2006, Galletti and Battaglia 1989, Nakamura and Colby 2000)
  • image p270fig06.16 The same target position signal that can command the next saccade also updates a gain field that predictively maintains the attentional shroud in head-centered coordinates, even before the eye movement is complete. This process keeps the shroud invariant under eye movements, so that it can continue to inhibit reset of an emerging invariant category as it is associated with multiple object views, even while the conscious surface representation shifts with each eye movement in retinotopic coordinates. This updating process is often called predictive remapping.
    || Predictive remapping of eye movements! From V3A to LIP. [spatial attention, object attention, figure-ground separation, eye movement remapping, visual search]. (Beauvillain etal 2005, Carlson-Radvansky 1999, Cavanaugh etal 2001, Fecteau & Munoz 2003, Henderson & Hollingworth 2003, Irwin 1991)
  • image p271fig06.17 Persistent activity in IT cells is just what is needed to enable view-invariant object category learning by ARTSCAN to be generalized to [view, position, size]-invariant category learning by positional ARTSCAN, or pARTSCAN. See the text for details.
    || Persistent activity in IT. Physiological data show that persistent activity exists in IT (Fuster and Jervey 1981, Miyashita and Chang 1988, Tomita etal 1999). Adapted from (Tomita etal 1999 Nature)
  • image p272fig06.18 The pARTSCAN model can learn [view, position, size]-invariant categories by adding view category integrator cells that have the properties of persistent neurons in IT. These integrator cells get reset with the invariant object category, not the view category.
    || pARTSCAN: positionally-invariant object learning. (Cao, Grossberg, Markowitz 2011). IT cells with persistent activities are modeled by view category integrators in ITp. View-specific category cells are RESET as the eyes move within the object. View category integrator cells are NOT RESET when the view-specific category is reset. They are RESET along with invariant object category cells when a spatial attention shift occurs.
  • image p272fig06.19 The various parts of this figure explain why persistent activity is needed in order to learn positionally-invariant object categories, and how this fails when persistent activity is not available. See the text for details.
    ||
  • image p273fig06.20 pARTSCAN can simulate the IT cell recoding that Li and DiCarlo reported in their swapping experiments because the swapping procedure happens without causing a parietal reset burst to occur. Thus the originally activated invariant category remains activated and can get associated with the swapped object features.
    || Simulation of Li and DiCarlo swapping data. data (Li and DiCarlo 2008), model (Cao, Grossberg, Markowitz 2011). normalized response vs. exposure (swaps and/or hours)
  • image p274fig06.21 pARTSCAN can also simulate the trade-off in IT cell responses between position invariance and selectivity that was reported by Zoccolan etal 2007. This trade-off limits the amount of position invariance that can be learned by a cortical area like V1 that is constrained by the cortical magnification factor.
    || Trade-off in IT cell response properties. Inferotemporal cortex cells with greater position invariance respond less selectively to natural objects. invariance-tolerance, selectivity-sparseness. data (Zoccolan etal 2007) model (Grossberg, Markowitz, Cao 2011). position tolerance (PT, degrees) vs sparseness (S)
  • image p274fig06.22 pARTSCAN can simulate how IT cortex processes image morphs, when it learns with high vigilance. See the text for details.
    || Akrami etal simulation: a case of high vigilance. tested on morphs between image pairs
  • image p275fig06.23 Data from (Akrami etal 2009) and our simulation of it. See the text for details.
    || IT responses to image morphs. data vs model
  • image p275fig06.24 Left and right eye stereogram inputs are constructed to generate percepts of objects in depth. These percepts include the features of the objects, not only their relative depths, a property that is not realized in some other models of stereopsis. See the text for details.
    || Stereogram surface percepts: surface lightnesses are segregated in depth (Fang, Grossberg 2009). Contrast with algorithms that just compute disparity matches and let computer code build the surface, eg (Marr, Poggio 1974)
  • image p276fig06.25 In addition to the gain field that predictively maintains a shroud in head-centered coordinates during saccades, there are gain fields that predictively maintain binocular boundaries in head-centered coordinates so that they can maintain binocular fusion during saccades and control the filling-in of surfaces in retinotopic coordinates.
    || Surface-shroud resonance.
  • image p277fig06.26 Gain fields also enable predictive remapping that maintains binocular boundary fusion as the eyes move between objects. See the text for details.
    || Predictive remapping maintains binocular boundary fusion even as eyes move between objects. retinotopic boundary -> invariant boundary (binocular)
  • image p278fig06.27 A surface-shroud resonance through the Where stream enables us to consciously see an object while a feature-category resonance into the What stream enables us to recognize it. Both kinds of resonances can synchronize via visual cortex so that we can know what an object is when we see it.
    || What kinds of resonances support knowing vs seeing? What stream [knowing, feature-prototype resonance], Where stream [seeing, surface-shroud resonance]
  • image p278fig06.28 If the feature-category resonances cannot form, say due to a lesion in IT, then a surface-shroud resonance can still support conscious seeing of an attended object, and looking at or reaching for it, even if the individual doing so knows nothing about the object, as occurs during visual agnosia. The surface-shroud resonance supports both spatial attention and releases commands that embody the intention to move towards the attended object.
    || What kinds of resonances support knowing vs seeing? visual agnosia: reaching without knowing Patient DF (Goodale etal 1991). Attention and intention both parietal cortical functions (Anderson, Essick, Siegel 1985; Gnadt, Andersen 1988; Snyder, Batista, Andersen 1997, 1998)
  • image p283fig07.01 The usual boundary processing stages of [simple, complex, hypercomplex, bipole] cells enable our brains to correct uncontrolled persistence of previously excited cells just by adding habituative transmitter gates, or MTM traces, at appropriate places in the network.
    || Boundary processing with habituative gates. spatial competition with habituative gates, orientational competition: gated dipole, bipole grouping
  • image p284fig07.02 Psychophysical data (top row) and simulation (bottom row) of how persistence decreases with flash illuminance and duration.
    || Persistence data and simulations. (Francis, Grossberg, Mingolla 1994 Vision Research, 34, 1089-1104). Persistence decreases with flash illuminance and duration (Bowen, Pola, Matin 1974; Breitmeyer 1984; Coltheart 1980). Higher luminance or longer duration habituates the gated dipole ON channel more. Causes larger and faster rebound in the OFF channel to shut persisting ON activity off.
  • image p285fig07.03 Persistence decreases with flash illuminance and duration due to the way in which habituative transmitters regulate the strength of the rebound in response to offset of a stimulating input, and how this rebound inhibits previously activated bipole cells.
    || Persistence data and simulations (Francis, Grossberg, Mingolla 1994 Vision Research, 34, 1089-1104). Persistence decreases with flash illuminance and duration. Horizontal input excites a horizontal bipole cell, which supports persistence. Offset of the horizontal input causes a rebound of activity in the vertical pathway, which inhibits the horizontal bipole cell, thereby terminating persistence.
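  • Python sketch (an illustration under simple assumed dynamics): a habituative gated dipole like the ones used to reset persistence in Figures 7.01-7.03. The ON channel receives input J plus arousal I, the OFF channel arousal alone; each channel's signal is gated by a transmitter that habituates while the channel is active, so when J shuts off the depleted ON gate lets the OFF channel transiently win, producing the antagonistic rebound that terminates persistence. All parameter values are illustrative.
```python
def simulate(J_duration=200, total=400, I=0.2, dt=0.05, A=0.1, B=1.0, C=1.0):
    y_on = y_off = B                    # transmitter gates start fully accumulated
    rebound = []
    for t in range(total):
        J = 1.0 if t < J_duration else 0.0
        s_on, s_off = I + J, I          # ON gets input plus arousal, OFF arousal only
        # transmitters recover toward B and are depleted by the gated signals
        y_on += dt * (A * (B - y_on) - C * s_on * y_on)
        y_off += dt * (A * (B - y_off) - C * s_off * y_off)
        rebound.append(s_off * y_off - s_on * y_on)   # OFF output minus ON output
    return rebound

out = simulate()
# out is negative while J is on, then transiently positive right after J
# turns off: the antagonistic rebound that terminates persistence.
```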
  • image p286fig07.04 Illusory contours persist longer than real contours because real contours have more inducers whose rebound at contour offset can cause faster boundary reset. Illusory contours also take longer to form than real contours, which explains the increasing portion of the curve.
    || Persistence data and simulations (Meyer, Ming 1988; Reynolds 1981). Increasing portion of curve is due to formation time of the illusory contour. Longer persistence is due to fewer bottom-up inducers of an illusory contour that has the same length as a real contour: only illuminance-derived edges generate reset signals. When bottom-up inducers are inhibited by OFF cell rebounds, their offset gradually propagates to the center of the illusory contour.
  • image p286fig07.05 This figure shows the propagation through time of illusory contour offset from the rebounded cells that got direct inputs to the center of the contour.
    || Persistence data and simulations. Illusory contours persist longer than real contours (Meyer, Ming 1988; Reynolds 1981). When bottom-up inducers are inhibited by OFF cell rebounds, their offset gradually propagates to the center of the illusory contour.
  • image p287fig07.06 The relative durations of persistence that occur due to an adaptation stimulus of the same or orthogonal orientation follow from the properties of the habituative gated dipoles that are embedded in the boundary completion system.
    || Persistence data and simulations. Change in persistence depends on whether adaptation stimulus has same or orthogonal orientation as test grating (Meyer, Lawson, Cohen 1975). If adaptation stimulus and test stimulus have the same orientation, they cause cumulative habituation, which causes a stronger reset signal, hence less persistence. When they are orthogonal, the competition on the ON channel is less, hence more persistence.
  • image p287fig07.07 Persistence increases with distance between a target and a masking stimulus due to weakening of the spatial competition in the first competitive stage of hypercomplex cells.
    || Persistence data and simulations. Persistence increases with distance between a target and a masking stimulus (Farrell, Pavel, Sperling 1990). There is less spatial competition from the masker to the target when they are more distant, hence the target is more persistent.
  • image p290fig08.01 Motion in a given direction pools all possible contrast-sensitive sources of information that are moving in that direction.
    ||
  • image p291fig08.02 Complex cells can respond to motion in opposite directions and from features with opposite contrast polarities.
    ||
  • image p292fig08.03 The MacKay and waterfall illusion aftereffects dramatically illustrate the different symmetries that occur in the orientational form stream and the directional motion stream.
    || Form and motion aftereffects. different inhibitory symmetries govern orientation and direction. illusions: [Form- MacKay 90°, Motion- waterfall 180°]. stimulus, aftereffect percept
  • image p293fig08.04 Most local motion signals on a moving object (red arrows) may not point in the direction of the object's real motion (green arrows). This problem besets every neuron due to the fact that it receives signals only in a space-limited aperture.
    || Most motion signals may not point in an object's direction of motion. Aperture problem. EVERY neuron's receptive field experiences an aperture problem. How does the brain use the small number of [correct, unambiguous] motion signals to compute an object's motion direction?
  • image p295fig08.05 The perceived direction of an object is derived either from a small subset of feature tracking signals, or by voting among ambiguous signals when feature tracking signals are not available.
    || Aperture problem. Barberpole illusion (Wallach). How do sparse feature tracking signals capture so many ambiguous motion signals to determine the perceived motion direction?
  • image p296fig08.06 In the simplest example of apparent motion, two dots turning on and off out of phase in time generate a compelling percept of continuous motion between them.
    || Simplest long-range motion paradigm. ISI- interstimulus interval, SOA- stimulus onset asynchrony
  • image p296fig08.07 When two flashes turn on and off out of phase with the correct range of interstimulus intervals, and not too far from one another, then either beta motion or phi motion is perceived.
    || Beta and Phi motion percepts. Beta motion: percepts of continuous motion of a well-defined object across empty intervening space. Phi motion: sense of "pure" motion without a concurrent percept of a moving object. (Exner 1875) http://www.yorku.ca/eye/balls.htm
  • image p297fig08.08 When a second flash is more intense than the first flash, then apparent motion may occur from the second to the first flash.
    || Delta motion: motions from the second to the first flash. Data: (Kolers 1972; Korte 1915). Simulation: (Grossberg, Rudd 1992). This occurs when the luminance or contrast of the second flash is large compared to that of the first flash. Sustained and transient cells obey shunting dynamics whose averaging rates speed up with output intensity. The first flash to wane is the one that will be the source of the G-wave.
  • image p297fig08.09 Simulation of motion in opposite directions that is perceived when two later flashes occur on either side of the first flash.
    || Split motion. Data: (H.R. Silva 1926), Simulation: (Grossberg, Rudd 1992)
  • image p298fig08.10 Simulation of the motion speed-up that is perceived when flash duration decreases.
    || "The less you see it, the faster it moves". Data: (Giaschi, Anstis 1989), Simulation: (Grossberg, Rudd 1992). ISI = 0, flash duration decreases; SOA = constant, flash duration decreases
  • image p298fig08.11 This formotion percept is a double illusion due to boundary completion in the form stream followed by long-range apparent motion using the completed boundaries in the motion stream.
    || Form-motion interactions. Apparent motion of illusory contours (Ramachandran 1985). Double illusion! Illusory contour is created in form stream V1-V2. Apparent motion of illusory contours occurs in motion stream due to a V2-MT interaction.
  • image p300fig08.12 A single flash activates a Gaussian receptive field across space whose maximum is chosen by a winner-take-all recurrent on-center off-surround network.
    || Gaussian receptive fields are sufficient! (Grossberg, Rudd 1992). Single flash. Suppose that a single flash causes a narrow peak of activity at the position where it occurs. It generates output signals through a Gaussian filter that produces a Gaussian activity profile at the next processing stage. A recurrent on-center off-surround network chooses the maximum activity and suppresses smaller activities: winner-take-all.
  • image p300fig08.13 As a flash waxes and wanes through time, so too do the activities of the cells in its Gaussian receptive field. Because the maximum of each Gaussian occurs at the same position, nothing is perceived to move.
    || Temporal profile of a single flash. Suppose that a single flash quickly turns on to maximum activity, stays there for a short time, and then shuts off. It causes an increase in activity, followed by an exponential decay of activity. The corresponding Gaussian profile waxes and wanes through time. Since the peak position of the Gaussian does not change through time, nothing moves.
  • image p300fig08.14 Visual inertia depicts how the effects of a flash decay after the flash shuts off.
    || Inertia (%) vs ISI (msec)
  • image p301fig08.15 If two flashes occur in succession, then the cell activation that is caused by the first one can be waning while the activation due to the second one is waxing.
    || Temporal profile of two flashes. If two flashes occur in succession, the waning of the activity due to the first flash may overlap with the waxing of the activity due to the second flash.
  • image p301fig08.16 The sum of the waning Gaussian activity profile due to the first flash and the waxing Gaussian activity profile due to the second flash has a maximum that moves like a travelling wave from the first to the second flash.
    || Travelling wave (G-wave): long-range motion. If the Gaussian activity profiles of two flashes overlap sufficiently in space and time, then the sum of Gaussians produced by the waning of the first flash added to the Gaussian produced by the waxing of the second flash, can produce a single-peaked travelling wave from the position of the first flash to that of the second flash. The wave is then processed through a WTA choice network (Winner Take All). The resulting continuous motion percept is both long-range and sharp.
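  • Python sketch (illustrative time courses and parameters): the G-wave of Figures 8.15 and 8.16. The waning Gaussian of flash 1 plus the waxing Gaussian of flash 2 has a single maximum that moves continuously from the first flash to the second when their separation L satisfies L <= 2K. The exponential activity time courses and the numbers are assumptions.
```python
import math

K = 2.0                 # Gaussian width
L = 3.0                 # distance between flashes (L <= 2*K, so motion occurs)

def gaussian(w, center):
    return math.exp(-((w - center) ** 2) / (2 * K ** 2))

def peak_position(a1, a2, n=601):
    """Position w in [0, L] that maximizes a1*G(w, 0) + a2*G(w, L)."""
    samples = [i * L / (n - 1) for i in range(n)]
    return max(samples, key=lambda w: a1 * gaussian(w, 0) + a2 * gaussian(w, L))

for t in range(0, 11):
    a1 = math.exp(-0.4 * t)          # flash 1 activity wanes
    a2 = 1.0 - math.exp(-0.4 * t)    # flash 2 activity waxes
    print(t, round(peak_position(a1, a2), 2))
# The printed peak position moves smoothly from w = 0 toward w = L.
```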
  • image p302fig08.17 An important constraint on whether long-range apparent motion occurs is whether the Gaussian kernel is broad enough to span the distance between successive flashes.
    || Motion speed-up with increasing distance: For a fixed ISI, how does perceived velocity increase with distance between the flashes? Gaussian filter : Gp = exp{ -(j-i)^2 / (2*K^2) }. The largest separation, L_crit, for which sufficient spatial overlap between two Gaussians centered at locations i and j will exist to support a travelling wave of summed peak activity is : L_crit = 2*K
  • image p302fig08.18 This theorem shows how far away (L), given a fixed Gaussian width, two flashes can be to generate a wave of apparent motion between them.
    || G-wave properties (Grossberg 1977). Let flashes occur at positions i=0 and i=L. Suppose that d[dt: x0] = -A*x0 + J0; d[dt: xL] = -A*xL + JL; Define G(w,t) ...; Theorem 1 max_w G(w,t) moves continuously through time from w=0 to w=L if and only if L <= 2*K.
  • image p303fig08.19 The dashed red line divides combinations of flash distance L and Gaussian width K into two regions of no apparent motion (above the line) and apparent motion (below the line).
    || No motion vs motion at multiple scales.
  • image p303fig08.20 The G-wave speeds up with the distance between flashes at a fixed delay, and has a consistent motion across multiple spatial scales.
    || G-wave properties (Grossberg 1977). Theorem 2 (Equal half-time property) The time at which the motion signal reaches position w=L/2. Apparent motion speed-up with distance: this half-time is independent of the distance L between the two flashes. Consistent motion across scales: half-time is independent of the scale size K. Method of proof: elementary algebra and calculus (Grossberg, Rudd 1989 appendix)
  • image p304fig08.21 A computer simulation of the equal half-time property whereby the apparent motions within different scales that respond to the same flashes all reach the half-way point in the motion trajectory at the same time.
    || Equal half-time property: how multiple scales cooperate to generate motion percept. Travelling waves from Gaussian filters of different sizes bridge the same distance in comparable time. The time needed to bridge half the distance between flashes is the same.
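  • Python sketch (continuing the assumptions of the G-wave sketch above): a quick numerical check of the equal half-time property. For different flash separations L and Gaussian widths K (with L <= 2K), the time at which the travelling peak first reaches L/2 comes out approximately the same.
```python
import math

def half_time(L, K, rate=0.4, dt=0.01, t_max=20.0):
    def peak(a1, a2, n=801):
        g = lambda w, c: math.exp(-((w - c) ** 2) / (2 * K ** 2))
        ws = [i * L / (n - 1) for i in range(n)]
        return max(ws, key=lambda w: a1 * g(w, 0) + a2 * g(w, L))
    t = 0.0
    while t < t_max:
        a1, a2 = math.exp(-rate * t), 1.0 - math.exp(-rate * t)
        if peak(a1, a2) >= L / 2:      # travelling peak has reached the midpoint
            return round(t, 2)
        t += dt
    return None

print(half_time(L=2.0, K=2.0), half_time(L=3.5, K=2.0), half_time(L=4.0, K=3.0))
# All three print approximately the same half-time, independent of L and K.
```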
  • image p304fig08.22 Data (top image) and simulation (bottom image) of Korte's laws. The laws raise the question of how ISIs in the hundreds of milliseconds can cause apparent motion.
    || Korte's Laws, Data: (Korte 1915) Simulation: (Francis, Grossberg 1996)
  • image p305fig08.23 Despite its simplicity, the Ternus display can induce one of four possible percepts, depending on the ISI.
    || Ternus motion. ISI [small- stationary, intermediate- element, larger- group] motion http://en.wikipedia.org/wiki/Ternus_illusion
  • image p305fig08.24 When each stimulus has an opposite contrast relative to the background, element motion is eliminated and replaced by group motion at intermediate values of the ISI.
    || Reverse-contrast Ternus motion. ISI [small- stationary, intermediate- group (not element!), larger- group] motion.
  • image p306fig08.25 The Motion BCS model can explain and simulate all the long-range apparent motion percepts that this chapter describes.
    || Motion BCS model (Grossberg, Rudd 1989, 1992) Level 1: discount illuminant; Level 2: short-range filter, pool sustained simple cell inputs with like-oriented receptive fields aligned in a given direction. Sensitive to direction-of-contrast; Level 3: Transient cells with unoriented receptive fields. Sensitive to direction-of-change
  • image p306fig08.26 The 3D FORMOTION model combines mechanisms for determining the relative depth of a visual form with mechanisms for both short-range and long-range motion filtering and grouping. A formotion interaction from V2 to MT is predicted to enable the motion stream to track objects moving in depth.
    || 3D Formotion model (Chey etal 1997; Grossberg etal 2001; Berzhanskaya etal 2007). Form [LGN contours -> simple cells orientation selectivity -> complex cells (contrast pooling, orientation selectivity, V1) -> hypercomplex cells (end-stopping, spatial sharpening) <-> bipole cells (grouping, cross-orientation competition) -> depth-separated boundaries (V2)], Motion: [LGN contours -> transient cells (directional stability, V1) -> short-range motion filter -> spatial competition -> long-range motion filter and boundary selection in depth (MT) <-> directional grouping, attentional priming (MST)]
  • image p307fig08.27 The distribution of transients through time at onsets and offsets of Ternus display flashes helps to determine whether element motion or group motion will be perceived.
    || Ternus motion. Element motion: zero or weak transients at positions 2 and 3; Group motion: strong transients at positions 2 and 3. Conditions that favor visual persistence and thus perceived stationarity of element (2,3) favor element motion (Braddick, Adlard 1978; Breitmeyer, Ritter 1986; Pantle, Petersik 1980)
  • image p308fig08.28 The Gaussian distributions of activity that arise from the three simultaneous flashes in a Ternus display add to generate a maximum value at their midpoint. The motion of this group gives rise to group motion.
    || Ternus group motion simulation. If L < 2*K, Gaussian filter of three flashes forms one global maximum.
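  • Python sketch (a tiny check with assumed numbers): when the spacing d between the three Ternus flashes is small relative to the Gaussian width K, their summed Gaussian profiles form one global maximum at the midpoint, which is why the elements move together as a group.
```python
import math

K, d = 2.0, 1.5   # Gaussian width and flash spacing (illustrative values)
g = lambda w, c: math.exp(-((w - c) ** 2) / (2 * K ** 2))
ws = [i * 0.01 for i in range(-200, 601)]               # positions -2.0 .. 6.0
total = [g(w, 0) + g(w, d) + g(w, 2 * d) for w in ws]   # sum over the three flashes
print(round(ws[total.index(max(total))], 2))            # single peak near 1.5, the midpoint
```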
  • image p310fig08.29 When the individual component motions in (A) and (B) combine into a plaid motion (C), both their perceived direction and speed changes.
    ||
  • image p311fig08.30 The data of (Castet etal 1993) in the left image was simulated in the right image by the 3D FORMOTION model that I developed with my PhD student Jonathan Chey. These data provide insight into how feature tracking signals propagate from the ends of a line to its interior, where they capture consistent motion directional signals and inhibit inconsistent ones.
    || Solving the aperture problem. A key design problem: How do amplified feature tracking signals propagate within depth to select the correct motion directions at ambiguous positions? This propagation from feature tracking signals to the line interior determines perceived speed in Castet etal data, which is why speed depends on line tilt and length. Data: (Castet etal 1993), Simulation: (Chey etal 1997)
  • image p311fig08.31 Processing stages of the Motion BCS convert locally ambiguous motion signals from transient cells into a globally coherent percept of object motion, thereby solving the aperture problem.
    || Why are so many motion processing stages needed? change sensitive receptors -> directional transient cells -> directional short-range filter -> spatial and directional competition -> directional long-range filter (MT) <-> Directional grouping network
  • image p312fig08.32 Schematic of motion filtering circuits.
    || Level 1: Change sensitive units -> Level 2: transient cells -> Level 3: short-range spatial filters -> Level 4: intra-scale competition -> Level 5: inter-scale competition
  • image p312fig08.33 Processing motion signals by a population of speed-tuned neurons.
    ||
  • image p314fig08.34 The VISTARS model for visually-based spatial navigation. It uses the Motion BCS as a front end and feeds it output signals into two computationally complementary cortical processing streams for computing optic flow and target tracking information.
    || VISTARS navigation model (Browning, Grossberg, Mingolla 2009). Use FORMOTION model as front end for higher level navigational circuits: input natural image sequences -> estimate heading (MT+)-MSTd -> additive processing -> estimate object position (MT-)-MSTv direction and speed subtractive processing -> Complementary Computing. [optic flow navigation, object tracking]
  • image p315fig08.35 The output signals from the directional grouping network obey the ART Matching Rule. They thereby select consistent motion directional signals while suppressing inconsistent ones, and do not distort what the spared cells code. The aperture problem is hereby solved by the same mechanism that dynamically stabilizes the learning of directional grouping cells.
    || How to select correct direction and preserve speed estimates? Prediction: Feedback from MSTv to MT- obeys ART Matching Rule; Top-down, modulatory on-center, off-surround network (Grossberg 1976, 1980; Carpenter, Grossberg 1987, 1991); Explains how directional grouping network can stably develop and how top-down directional attention can work. (Cavanagh 1992; Goner etal 1986; Sekuler, Ball 1977; Stelmach etal 1994). Directional grouping network (MSTv) <-> Directional long-range filter (MT). Modulatory on-center selects chosen direction and preserves speed. Off-surround inhibits incompatible directions.
  • image p316fig08.36 How the directional grouping network, notably properties of the ART Matching Rule, enables a small set of amplified feature tracking signals at the ends of a line to select consistent directions in the line interior, while suppressing inconsistent directions.
    || Motion capture by directional grouping feedback. Directional grouping network (MSTv) <-> Directional long-range filter (MT). It takes longer to capture ambiguous motion signals in the line interior as the length of the line increases cf (Castet etal 1993)
  • image p317fig08.37 Processing stages that transform the transient cell inputs in response to a tilted moving line into a global percept of the object's direction of motion. The orientations of the lines denote the directional preferences of the corresponding cells, whereas line lengths are proportional to cell activities.
    || Motion capture by directional grouping feedback (Chey, Grossberg, Mingolla 1997). thresholded short-range filter outputs, directional long-range filter cell activities at 3 times, directional short-range filter cells, directionally-sensitive transient cells
  • image p319fig08.38 The neurophysiological data from MT (left image) confirms the prediction embodied in the simulation of MT (right image) concerning the fact that it takes a long time for MT to compute an object's real direction of motion.
    || Solving the aperture problem takes time. MT Data (Pack, Born 2001), MT simulation (Chey, Grossberg, Mingolla 1997)
  • image p320fig08.39 Simulation of the barberpole illusion direction field at two times. Note that the initial multiple directions due to the feature tracking signals at the contiguous vertical and horizontal sides of the barberpole (upper image) get supplanted by the horizontal direction of the two horizontal sides (lower image).
    || Barberpole illusion (one line) simulation
  • image p321fig08.40 Visible occluders capture the boundaries that they share with moving edges. Invisible occluders do not. Consequently, the two types of motions are influenced by different combinations of feature tracking signals.
    || Motion grouping across occluders (J. Lorenceau, D. Alais 2001). Rotating contours observed through apertures. Determine direction of a circular motion. [, in]visible occluders http://persci.mit.edu/demos/square/square.html
  • image p322fig08.41 A percept of motion transparency can be achieved by using motion grouping feedback that embodies the "asymmetry between near and far" along with the usual opponent competition between opposite motion directions.
    || Motion transparency. near: big scale; far: small scale MSTv, "Asymmetry between near and far" Inhibition from near (large scales) to far (small scales) at each position
  • image p323fig08.42 The chopsticks illusion not only depends upon how feature tracking signals are altered by visible and invisible occluders, but also upon how the form system disambiguates the ambiguous region where the two chopsticks intersect and uses figure-ground mechanisms to separate them in depth.
    || Chopsticks: motion separation in depth (Anstis 1990). [, in]visible occluders [display, percept]
  • image p324fig08.43 Attention can flow along the boundaries of one chopstick and enable it to win the orientation competition where the two chopsticks cross, thereby enabling bipole grouping and figure-ground mechanisms to separate them in depth within the form cortical stream.
    || The ambiguous X-junction. motion system. Attention propagates along chopstick and enhances cell activations in one branch of a chopstick. MT-MST directional motion grouping helps to bridge the ambiguous position.
  • image p325fig08.44 Attentional feedback from MST-to-MT-to-V2 can strengthen one branch of a chopstick (left image). Then bipole cell activations that are strengthened by this feedback can complete that chopstick's boundaries across the ambiguous X region (right image).
    || The role of MT-V1 feedback. Motion-form feedback: MT-to-V2 feedback strengthens boundaries of one bar. Bipole boundary completion: Bipole grouping helps to complete bar boundary even if motion grouping does not cross the gap.
  • image p325fig08.45 The feedback loop between MT/MST-to-V1-to-V2-to-MT/MST enables a percept of two chopsticks sliding one in front of the other while moving in opposite directions.
    || Closing formotion feedback loop. [formotion interaction, motion grouping] V1 -> V2 -> (MT <-> MST) -> V1
  • image p326fig08.46 How do we determine the relative motion direction of a part of a scene when it moves with a larger part that determines an object reference frame?
    || How do we perceive relative motion of object parts?
  • image p327fig08.47 Two classical examples of part motion in a moving reference frame illustrate the general situation where complex objects move while their multiple parts may move in different directions relative to the direction of the reference frame.
    || Two kinds of percepts and variations (Johansson 1950). Symmetrically moving inducers: each dot moves along a straight path, each part contributes equally to common motion; Duncker wheel (Duncker 1929): one dot moves on a cycloid, the other dot (the "center") moves straight, unequal contribution from parts; If the dot is presented alone: seen as cycloid; if with center: seen as if it were on the rim of a wheel.
  • image p328fig08.48 How vector subtraction from the reference frame motion direction computes the part directions.
    || How vector decomposition can explain them. Common motion subtracted from retinal motion gives part motion: [retinal, common, part] motion
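  • Python sketch (an illustrative aside with made-up velocities): vector decomposition. Subtracting the common (frame) motion from each dot's retinal motion leaves the part motion, as in the Johansson display.
```python
def part_motion(retinal, common):
    # Part motion = retinal motion minus the common (frame) motion.
    return tuple(r - c for r, c in zip(retinal, common))

# Johansson display: both dots share a rightward common motion (2, 0).
common = (2.0, 0.0)
dot_up   = (2.0,  1.0)      # retinal motion of the upper dot
dot_down = (2.0, -1.0)      # retinal motion of the lower dot
print(part_motion(dot_up, common), part_motion(dot_down, common))
# -> (0.0, 1.0) and (0.0, -1.0): the dots appear to move toward and away
#    from each other inside the rightward-moving frame.
```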
  • image p328fig08.49 A directional peak shift in a directional hypercolumn determines the part directions relative to a moving reference frame.
    || What is the mechanism of vector decomposition? (Grossberg, Leveille, Versace 2011). Prediction: directional peak shift! ...specifically, a peak shift due to Gaussian lateral inhibition. [retinal, part, common, relative] motion. shunting dynamics, self-normalization, contrast gain control
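  • Python sketch (my own illustration of the peak-shift idea, with assumed tuning widths and inhibition strength): excitation is a Gaussian bump over direction centered on the dot's retinal motion direction, and Gaussian lateral inhibition centered on the common (frame) direction shifts the net peak away from the common direction, toward the part direction relative to the frame.
```python
import math

def gauss(x, mu, sigma):
    # wrap-around distance on the direction circle (degrees)
    d = min(abs(x - mu), 360 - abs(x - mu))
    return math.exp(-d * d / (2 * sigma ** 2))

retinal_dir, common_dir = 30.0, 0.0          # degrees; assumed values
dirs = range(0, 360, 5)
net = {d: gauss(d, retinal_dir, 40) - 0.6 * gauss(d, common_dir, 60)
       for d in dirs}
print(max(net, key=net.get))
# prints 40: the net peak is pushed away from the common direction (0 deg),
# beyond the retinal direction (30 deg), toward the part direction.
```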
  • image p329fig08.50 The common motion direction of the two dots builds upon illusory contours that connect the dots as they move through time. The common motion direction signal can flow along these boundaries.
    || How is common motion direction computed? retinal motion. Bipole grouping in the form stream creates illusory contours between the dots. V2-MT formotion interaction injects the completed boundaries into the motion stream where they capture consistent motion signals. Motion of illusory contours is computed in the motion stream: cf. Ramachandran
  • image p329fig08.51 Large and small scale boundaries differentially form illusory contours between the dots and boundaries that surround each of them respectively. These boundaries capture the motion signals that they will support via V2-to-MT formotion interaction. The MST-to-MT directional peak shift has not yet occurred.
    || Large scale: near. Can bridge gap between dots to form illusory contours. Spatial competition inhibits inner dot boundaries.; Small scale: far. Forms boundaries around dots.
  • image p330fig08.52 Direction fields of the object frame (left column) and of the two dot "parts" (right column) show the correct motion directions after the peak shift top-down expectation acts.
    || Simulation of motion vector decomposition. [Larger scale (nearer depth), Small scale (farther depth)] vs [Down, Up]
  • image p330fig08.53 Simulation of the various directional signals of the left dot through time. Note the amplification of the downward directional signal due to the combined action of the short-range and long-range directional signals.
    ||
  • image p331fig08.54 The simulated part directions of the rotating dot through time after the translational motion of the frame does its work via the top-down peak shift mechanism.
    || Cycloid. Motion directions of a single dot moving slowly along a cycloid curve through time.
  • image p331fig08.55 The rightward motion of the dot that determines the frame propagates along the illusory contour between the dots and thereby dominates the motion directions along the rim as well, thereby setting the stage for the peak shift mechanism.
    || Duncker Wheel: large scale. [cycloid, center] velocity -> rightward common velocity. Stable rightward motion at the center captures motion at the rim.
  • image p332fig08.56 Simulation of the Duncker Wheel motion through time. See the text for details.
    || Duncker Wheel: small scale. Temporal procession of activity in eight directions. Wheel motion as seen when directions are collapsed.
  • image p332fig08.57 The MODE model uses the Motion BCS as its front end, followed by a saccadic target selection circuit in the model LIP region that converts motion directions into movement directions. These movement choices are also under basal ganglia (BG) control. More will be explained about the BG in Chapters 13 and 15.
    || MODE (MOtion DEcision) model (Grossberg, Pilly 2008, Vision Research). Change sensitive receptors -> directional transient cells -> directional short-range filter -> spatial and directional competition -> directional long-range filter (MT) <-> directional grouping network (MSTv) -> saccadic target selection <-> gating mechanism (BG). Representation of problem that solves the aperture problem (change sensitive receptors (CSR) -> directional grouping network (DGN, MSTv)). Gated movement choice (saccadic target selection & gating mechanism)
  • image p333fig08.58 Neurophysiological data (left image) and simulation (right image) of LIP data during correct trials on the RT task. See the text for details.
    || LIP responses during RT task correct trials (Roitman, Shadlen 2002). More coherence in favored direction causes faster cell activation. More coherence in opposite direction causes faster cell inhibition. Coherence stops playing a role in the final stages of LIP firing.
  • image p334fig08.59 Neurophysiological data (left column) and simulations (right column) of LIP responses for the FD task during both [correct, error] trials. See the text for details.
    || LIP responses for the FD task during both [correct, error] trials (Shadlen, Newsome 2001). LIP encodes the perceptual decision regardless of the true direction of the dots. Predictiveness of LIP responses on error trials decreases with increasing coherence.
  • image p334fig08.60 Behavioral data (left image) and simulation (right image) about accuracy in both the RT and FD tasks. See text for details
    || Behavioral data: % correct vs % coherence (Mazurek etal 2003; Roitman, Shadlen 2002). More coherence in the motion causes more accurate decisions. RT task accuracy at weaker coherence levels is slightly better than FD task accuracy.
  • image p335fig08.61 Behavioral data (left image) and simulation (right image) about speed in correct and error trials of the RT task. See text for details.
    || Behavioral data: speed, correct and error trials (RT task) (Roitman, Shadlen 2002). More coherence in the motion causes faster reaction time.
  • image p335fig08.62 More remarkable simulation fits (right column) to LIP neurophysiology data (left column) about where and when to move the eyes.
    || LIP encodes not only where, but also when, to move the eyes. ...No Bayes (Roitman, Shadlen 2002). Firing rate (sp/s) vs time (ms). Slope of firing rate (sp/s^2) vs % correct.
  • image p338fig09.01 The brain regions that help to use visual information for navigating in the world and tracking objects are highlighted in yellow.
    || How does a moving observer use optic flow to navigate while tracking a moving object? [What ventral, Where dorsal] retina -> many locations -> PFC
  • image p338fig09.02 Heading, or the direction of self-motion (green dot), can be derived from the optic flow (red arrows) as an object, in this case an airplane landing, moves forward.
    || Heading and optic flow (Gibson 1950). Optic flow: scene motion generates a velocity field. Heading: direction of travel- self-motion direction. Heading from optic flow, focus of expansion (Gibson 1950). Humans determine heading accurately to within 1-2 degrees.
  • image p339fig09.03 When an observer moves forward, an expanding optic flow is caused. Eye rotations cause a translating flow. When these flows are combined, a spiral flow is caused. How do our brains compensate for eye rotations to compute the heading of the expanding optic flow?
    || Optic flow during navigation (adapted from Warren, Hannon 1990) [observer, retinal flow]: [linear movement, expansion], [eye rotation, translation], [combined motion, spiral]
  • image p339fig09.04 This figure emphasizes that the sum of the expansion and translation optic flows is a spiral optic flow. It thereby raises the question: How can the translation flow be subtracted from the spiral flow to recover the expansion flow?
    || Eye rotations add a uniform translation to a flow field. Resulting retinal patterns are spirals. Expansion + translation = spiral
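    A minimal planar sketch of this decomposition (it ignores the true retinal geometry, and the heading and translation values are hypothetical): adding a uniform translation to an expansion field hides the focus of expansion, and subtracting an efference-copy estimate of that translation, as the next figure describes, recovers it and, with it, the heading.

        # Minimal sketch (a planar simplification, hypothetical values): an expansion
        # flow field plus a uniform translation (standing in for the rotation-induced
        # flow) hides the focus of expansion; subtracting an efference-copy estimate
        # of the translation recovers it, and with it the heading.
        import numpy as np

        xs, ys = np.meshgrid(np.linspace(-1, 1, 21), np.linspace(-1, 1, 21))
        heading = np.array([0.2, -0.1])                  # true focus of expansion (hypothetical)

        expansion = np.stack([xs - heading[0], ys - heading[1]], axis=-1)   # radial outflow
        translation = np.array([0.5, 0.3])               # flow added by the eye rotation
        combined = expansion + translation               # what the retina actually sees

        recovered = combined - translation               # efference-copy subtraction
        speed = np.linalg.norm(recovered, axis=-1)
        i, j = np.unravel_index(np.argmin(speed), speed.shape)
        print("estimated heading:", xs[i, j], ys[i, j])  # close to (0.2, -0.1)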
  • image p340fig09.05 An outflow movement command, also called efference copy or corollary discharge, is the source of the signals whereby the commanded eye movement position is subtracted from spiral flow to recover expansion flow and, with it, heading.
    || Subtracting efference copy. Many experiments suggest that the brain internally subtracts the translational component due to eye movements. Efference copy subtracts the translational component using pathways that branch from outflow movement commands to the eye muscles.
  • image p340fig09.06 Corollary discharges are computed using a branch of the outflow movement commands that move their target muscles.
    ||
  • image p340fig09.07 Log polar remapping from the retina to cortical area V1 and beyond converts expansion, translation, and spiral flows on the retina into parallel flows, with different orientations, on the cortical map.
    || Log polar remapping of optic flow. retina -> cortex. Any combination of expansion and circular motion centered on the fovea maps to cortex as a single direction. Retinal Cartesian coordinates (x,y) map to cortical polar coordinates (r,theta). This makes it easy to compute directional receptive fields in the cortex!
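    The log polar property named in Figure 9.07 can be checked with a few lines of code: a retinal point (x, y) maps to cortical coordinates (log r, theta), so an expansion about the fovea becomes a uniform shift along the log-r axis and a rotation about the fovea becomes a uniform shift along the theta axis, i.e. parallel cortical flows. The sample point, scale factor, and rotation angle below are arbitrary.

        # Minimal sketch of the log polar remapping: retinal (x, y) -> cortical (log r, theta).
        # Expansion about the fovea shifts the log-r coordinate uniformly; rotation about the
        # fovea shifts the theta coordinate uniformly.
        import numpy as np

        def log_polar(x, y):
            r = np.hypot(x, y)
            return np.log(r), np.arctan2(y, x)

        x, y = 0.3, 0.4                       # arbitrary retinal point
        u0, v0 = log_polar(x, y)

        s = 1.5                               # expansion by factor s
        u1, v1 = log_polar(s * x, s * y)
        print("expansion -> shift along log-r axis:", u1 - u0, "theta unchanged:", v1 - v0)

        a = 0.2                               # rotation by angle a (radians)
        xr, yr = x * np.cos(a) - y * np.sin(a), x * np.sin(a) + y * np.cos(a)
        u2, v2 = log_polar(xr, yr)
        print("rotation  -> shift along theta axis:", v2 - v0, "log-r unchanged:", u2 - u0)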
  • image p341fig09.08 How the various optic flows on the retina are mapped through V1, MT, and MSTd to then compute heading in parietal cortex was modeled by (Grossberg, Mingolla, Pack 1999), using the crucial transformation via V1 log polar mapping into parallel cortical flow fields.
    || MSTd model (Grossberg, Mingolla, Pack 1999). Retinal motion -> V1 log polar mapping -> Each MT Gaussian RF sums motion in preferred direction -> Each MSTd cell sums MT cell inputs with same log polar direction -> Efference copy subtracts rotational flow from MSTd cells.
  • image p341fig09.09 Responses of MSTd cells that are used to compute heading. See the text for details.
    || Cortical area MSTd (adapted from Graziano, Anderson, Snowden 1994). MSTd cells are sensitive to spiral motion as combinations of rotation and expansion.
  • image p342fig09.10 Model simulations of how the peak of MSTd cell activation varies with changes of heading.
    || Heading in log polar space: Retina -> log polar -> MSTd cell. Log polar motion direction correlates with heading eccentricity.
  • image p342fig09.11 Psychophysical data (left panel) and computer simulation (right column) of the importance of efference copy in real movements. See the text for details.
    || Heading: move to wall and fixate stationary object (adapted from Warren, Hannon 1990). Inaccurate for simulated eye rotation, accurate for real eye rotation, need confirmation by efference copy!
  • image p343fig09.12 Transforming two retinal views of the Simpsons into log polar coordinates dramatizes the problem that our brains need to solve in order to separate, and recognize, overlapping figures.
    || View 1 cortical magnification. View 2 How do we know if we are still fixating on the same object?!
  • image p343fig09.13 When one scans the three different types of pears in the left image, as illustrated by the jagged blue curve with red movement end positions, and transforms the resulting retinal images via the cortical magnification factor, or log polar mapping, the result is the series of images in the right column. How do our brains figure out from such confusing data which views belong to which pear?
    || View-invariant object learning and recognition. Three pears: Anjou, Bartlett, Comice. Which is the Bartlett pear? During unsupervised scanning and learning about the world, no one tells the brain what views belong to which objects while it learns view-invariant object categories. Cortical magnification in V1.
  • image p344fig09.14 (top row, left column) By fitting MT tuning curves with Gaussian receptive fields, a tuning width of 38° is estimated, and leads to the observed standard spiral tuning of 61° in MSTd. (bottom row, left column) The spiral tuning estimate in Figure 9.16 maximizes the position invariance of MSTd receptive fields. (top row, right column) Heading sensitivity is not impaired by these parameter choices.
    || [Spiral tuning (deg), position invariance (deg^(-1)), heading sensitivity] versus log polar direction tuning σ (deg)
  • image p345fig09.15 Double opponent directional receptive fields in MT are capable of detecting the motion of objects relative to each other and their backgrounds.
    || Motion opponency in MT (Born, Tootell 1992). Motion opponent (Grossberg etal), Differential motion (Royden etal), Subtractive motion cells (Neumann etal). ON center directionally selective: [excit, inhibit]ed by motion in [one, opponent] direction. OFF surround directionally selective: [excit, inhibit]ed by motion in [opponent, center] direction.
  • image p346fig09.16 A macrocircuit of some of the main brain regions that are used to move the eyes. Black boxes denote areas belonging to the saccadic eye movement system (SAC), white boxes the smooth pursuit eye movement system (SPEM), and gray boxes, both systems. The abbreviations for the different brain regions are: LIP - Lateral Intra-Parietal area; FPA - Frontal Pursuit Area; MST - Middle Superior Temporal area; MT - Middle Temporal area; FEF - Frontal Eye Fields; NRPT - Nucleus Reticularis Tegmenti Pontis; DLPN - Dorso-Lateral Pontine Nuclei; SC - Superior Colliculus; CBM - CereBelluM; MVN/rLVN - Medial and Rostro-Lateral Vestibular Nuclei; PPRF - a Peri-Pontine Reticular Formation; TN - Tonic Neurons
    ||
  • image p347fig09.17 The leftward eye movement control channel in the model that I developed with Christopher Pack. See the text for details.
    || retinal image -> MT -> MST[v,d] -> pursuit
  • image p347fig09.18 These circuits between MSTv and MSTd enable predictive target tracking to be achieved by the pursuit system, notably when the eyes are successfully foveating a moving target. Solid arrows depict excitatory connections, dashed arrows depict inhibitory connections.
    ||
  • image p348fig09.19 How a constant pursuit speed that is commanded by MSTv cells starts by using target speed on the retina and ends by using background speed on the retina in the reverse direction during successful predictive pursuit.
    || target speed on retina, background speed on retina, pursuit speed command by MSTv cells
  • image p349fig09.20 Using virtual reality displays (left image), (Fajen, Warren 2003) collected data (right two images) about how observers avoid obstacles (open circular disks) as a function of their distance and angular position as they navigate towards a fixed goal (x). These data illustrate how goals act as attractors while obstacles act as repellers.
    || Steering from optic flow (Fajen, Warren 2003). goals are attractors, obstacles are repellers. Damped spring model explains human steering data.
  • image p349fig09.21 How attractor-repeller dynamics with Gaussians change the net steering gradient as the goal is approached.
    || Steering dynamics: goal approach. body-centered coordinates [obstacle, goal, heading] -> steering
  • image p350fig09.22 How the negative Gaussian of an obstacle causes a peak shift to avoid the obstacle without losing sight of how to reach the goal.
    || Steering dynamics: obstacle avoidance. body-centered coordinates [obstacle, goal, heading] -> steering
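    A minimal sketch of the attractor-repeller steering dynamics in Figures 9.20-9.22, in the spirit of the Fajen-Warren damped-spring account but with invented gains and without their distance-dependent terms: the goal direction attracts the heading like a spring, while an obstacle direction repels it with a Gaussian that decays with angular offset.

        # Minimal sketch (hypothetical gains, not the fitted Fajen-Warren parameters):
        # damped-spring steering in which the goal direction attracts the heading and an
        # obstacle direction repels it with a Gaussian that decays with angular offset.
        import numpy as np

        def simulate(goal_dir, obstacle_dir, steps=400, dt=0.02,
                     b=3.0, k_goal=8.0, k_obs=6.0, sigma_obs=0.5):
            phi, dphi = 0.0, 0.0                      # heading angle and its rate (radians)
            for _ in range(steps):
                attract = -k_goal * (phi - goal_dir)
                repel = k_obs * (phi - obstacle_dir) * np.exp(-((phi - obstacle_dir) / sigma_obs) ** 2)
                ddphi = -b * dphi + attract + repel   # damped spring with attractor + repeller
                dphi += dt * ddphi
                phi += dt * dphi
            return phi

        print("goal only      :", simulate(goal_dir=0.3, obstacle_dir=10.0))  # far obstacle: ~0.3
        print("obstacle nearby:", simulate(goal_dir=0.3, obstacle_dir=0.25))  # heading deviates around it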
  • image p350fig09.23 Unidirectional transient cells respond to changes in all image contours as an auto navigates an urban scene while taking a video of it.
    || Unidirectional transient cells (Baloch, Grossberg 1997; Berzhanskaya, Grossberg, Mingolla 2007). Transient cells respond to leading and trailing boundaries. Transient cell responses, driving video
  • image p351fig09.24 Directional transient cells respond most to motion in their preferred directions.
    || Directional transient cells. 8 directions, 3 speeds
  • image p351fig09.25 By the time MT+ is reached, directional transient cells and directional filters have begun to extract more global directional information from the image.
    || MT+ computes global motion estimate. Estimate global motion from noisy local motion estimates.
  • image p352fig09.26 The final stage of the model computes a beautiful expansion optic flow that permits an easy estimate of the heading direction, with an accuracy that matches that of human navigators.
    || The model generates accurate heading (Warren, Hannon 1990; Royden, Crowell, Banks 1994). Maximally active MSTd cell = heading estimate. Accuracy matches human data. Random dots [mean +-1.5°, worst +-3.8°], Random dots with rotation [accurate with rotations <1°/s, rotation increases, error decreases], OpenGL & Yosemite benchmark +-1.5°, Driving video +-3°.
  • image p354fig10.01 The laminar cortical circuit that realizes how we pay attention to an object sends signals from layer 6 of a higher cortical level to layer 6 of a lower cortical level and then back up to layer 4. This "folded feedback" circuit realizes a top-down, modulatory on-center, off-surround circuit that realizes the ART Matching Rule.
    || Top-down attention and folded feedback. Attentional signals also feed back into 6-to-4 on-center off-surround. 1-to-5-to-6 feedback path: Macaque (Lund, Booth 1975) cat (Gilbert, Wiesel 1979). V2-to-V1 feedback is on-center off-surround and affects layer 6 of V1 the most (Bullier etal 1996; Sandell, Schiller 1982). Attended stimuli enhanced, ignored stimuli suppressed. This circuit supports the predicted ART Matching Rule! [LGN, V[1,2][6->1]]
  • image p355fig10.02 Distinguishing processes of seeing vs knowing has been difficult because they interact so strongly.
    || Seeing vs. Knowing. Seeing and knowing [operate at different levels of the brain, use specialized circuits], but they [interact via feedback, use similar cortical designs, feedback is needed for conscious perception]. Cerebral Cortex: Seeing [V1-V4, MT-MST], Knowing [IT, PFC].
  • image p356fig10.03 Laminar computing achieves at least three basic properties of visual processing that have analogs in all biologically intelligent behaviors. These properties may be found in all cortical circuits in specialized form.
    || What does Laminar Computing achieve? 1. Self-stabilizing development and learning; 2. Seamless fusion of a) pre-attentive automatic bottom-up processing, b) attentive task-selective top-down processing; 3. Analog coherence: Solution of Binding Problem for perceptual grouping without loss of analog sensitivity. Even the earliest visual cortical stages carry out active adaptive information processing: [learn, group, attention]ing
  • image p357fig10.04 Laminar Computing achieves its properties by computing in a new way that synthesizes the best properties of feedforward and feedback interactions, analog and digital computations, and preattentive and attentive learning. The property of analog coherence enables coherent groupings and decisions to form without losing sensitivity to the amount of evidence that supports them.
    || Laminar Computing: a new way to compute. 1. Feedforward and feedback: a) Fast feedforward processing when data are unambiguous (eg Thorpe etal), b) slower feedback chooses among ambiguous alternatives [self-normalizing property, real-time probability theory], c) A self-organizing system that trades certainty against speed: Goes beyond Bayesian models! 2. Analog and Digital: Analog Coherence combines the stability of digital with the sensitivity of analog. 3. Preattentive and Attentive Learning: Reconciles the differences of (eg) Helmholtz and Kanizsa, "A preattentive grouping is its own 'attentional' prime"
  • image p359fig10.05 Activation of V1 is initiated, in part, by direct excitatory signals from the LGN to layer 4 of V1.
    || How are layer 2/3 bipole cells activated? Direct bottom-up activation of layer 4. LGN -> V1 layer 4. Strong bottom-up LGN input to layer 4 (Stratford etal 1996; Chung, Ferster 1998). Many details omitted.
  • image p359fig10.06 Another, albeit indirect, pathway from LGN exists that can also excite layer 4 of V1. Why are not these two pathways redundant? The answer, ultimately, has to do with how cortex learns, as well as with how it pays attention. See the text for details.
    || Another bottom-up input to layer 4: Why?? Layer 6-to-4 on-center off-surround (Grieve, Sillito 1991, 1995; Ahmed etal 1994, 1997). LGN projects to layers 6 and 4. Layer 6 excites spiny stellates in column above it. Medium range connections onto inhibitory neurons. 6-to-4 path acts as an on-center off-surround.
  • image p359fig10.07 The two bottom-up pathways from LGN to layer 4 of V1 can together activate layer 4 and contrast-normalize layer 4 responses.
    || Bottom-up contrast normalization (Grossberg 1968, 1973; Sperling, Sondhi 1968; Heeger 1992; Douglas etal 1995; Shapley etal 2004). Together, direct LGN-to-4 path and 6-to-4 on-center off-surround provide contrast normalization if cells obey shunting or membrane equation dynamics.
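    The contrast normalization named in Figure 10.07 follows from shunting (membrane-equation) dynamics. A minimal sketch with generic parameters A and B, not the full laminar circuit: at equilibrium each cell's activity is B*I_i / (A + sum_k I_k), so relative contrasts are preserved while total activity stays below the bound B.

        # Minimal sketch (generic parameters): shunting on-center off-surround dynamics
        #   dx_i/dt = -A*x_i + (B - x_i)*I_i - x_i*sum_{k != i} I_k
        # have the equilibrium x_i = B*I_i / (A + sum_k I_k), which preserves input ratios
        # and normalizes total activity, as in the LGN-to-4 plus 6-to-4 circuit above.
        import numpy as np

        def shunting_equilibrium(I, A=1.0, B=1.0):
            return B * I / (A + I.sum())

        I = np.array([1.0, 2.0, 4.0])
        for gain in (1.0, 10.0, 100.0):               # scale the whole input pattern up
            x = shunting_equilibrium(gain * I)
            print(gain, x.round(3), "total =", round(float(x.sum()), 3))   # ratios 1:2:4, total < B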
  • image p360fig10.08 The bottom-up on-center off-surround from LGN-to-6-to-4 has a modulatory on-center because of its role in realizing the ART Matching Rule and, with it, the ability of the cortex to dynamically stabilize its learned memories.
    || Modulation of priming by 6-to-4 on-center (Stratford etal 1996; Callaway 1998). On-center 6-to-4 excitation is inhibited down to being modulatory (priming, subthreshold). On-center 6-to-4 excitation cannot activate layer 4 on its own. Clarifies need for direct path. Prediction: plays key role in stable grouping, development and learning. ART Matching Rule!
  • image p360fig10.09 Perceptual grouping is carried out in layer 2/3 by long-range horizontal excitatory recurrent connections, supplemented by short-range disynaptic inhibitory connections that together realize the bipole grouping properties that are diagrammed in Figure 10.10.
    || Grouping starts in layer 2/3. LGN-> 6-> 4-> 2/3: 1. Long-range horizontal excitation links collinear, coaxial receptive fields (Gilbert, Wiesel 1989; Bosking etal 1997; Schmidt etal 1997) 2. Short-range disynaptic inhibition of target pyramidal via pool of interneurons (Hirsch, Gilbert 1991) 3. Unambiguous groupings can form and generate feedforward outputs quickly (Thorpe etal 1996).
  • image p361fig10.10 Bipole grouping is achieved by long-range horizontal recurrent connections that also give rise to short-range inhibitory interneurons which inhibit nearby bipole cells as well as each other.
    || Bipole property controls perceptual grouping. Collinear input on both sides. Excitatory inputs summate. Inhibitory inputs normalize, Shunting inhibition! Two-against-one. Cell is excited.
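    A toy illustration of the two-against-one bipole property summarized in Figure 10.10 (the gains and threshold are invented, and the shunting interneuron pool is reduced to a single saturating term): one flank alone cannot drive the cell above threshold, two collinear flanks can, and direct bottom-up input can as well.

        # Minimal sketch (invented gains) of the bipole "two-against-one" property:
        # each flank excites the bipole cell but also drives a shunting inhibitory
        # interneuron pool; one flank alone is cancelled, two collinear flanks summate
        # above threshold, so groupings complete inward between inducers but do not
        # grow outward from a single inducer.
        def bipole_response(left, right, bottom_up=0.0, threshold=0.5):
            excitation = left + right + bottom_up
            inhibition = (left + right) / (1.0 + left + right)   # shunting-style saturation
            return max(excitation - 1.2 * inhibition - threshold, 0.0)

        print(bipole_response(left=1.0, right=0.0))      # one flank: 0.0 (no outward growth)
        print(bipole_response(left=1.0, right=1.0))      # two flanks: positive (inward completion)
        print(bipole_response(0.0, 0.0, bottom_up=1.0))  # direct bottom-up input alone also fires it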
  • image p362fig10.11 Feedback between layer 2/3 to the layer 6-to-4-to-2/3 feedback loop chooses the strongest grouping in cases where there is more than one. If only one grouping exists, then the circuit can function very quickly in a feedforward manner. When multiple groupings exist, the cortex "runs as fast as it can" to select the one with the most evidence to support it using the self-normalizing inhibition in the layer 6-to-4 off-surround.
    || How is the final grouping selected? Folded feedback LGN-> 6-> 4-> 2/3. 1. Layer 2/3 groupings feed back into 6-to-4 on-center off-surround: a) direct layer 2/3 -to-6 path; b) can also go via layer 5 (Blasdel etal 1985; Kisvarday etal 1989). 2. Strongest grouping enhanced by its on-center. 3. Inputs to weaker groupings suppressed by off-surround. 4. Interlaminar feedback creates functional columns. Activities of conflicting groupings are reduced by self-normalizing inhibition, slowing processing; intracortical feedback selects and contrast-enhances the winning grouping, speeding processing.
  • image p363fig10.12 The same laminar circuit design repeats in V1 and V2, albeit with specializations that include longer horizontal grouping axons and figure-ground separation interactions.
    || V2 repeats V1 circuitry at larger spatial scale, LGN-> V1[6,4,2/3]-> V2[6,4,2/3]. V2 layer 2/3 horizontal axons longer-range than in V1 (Amir etal 1993). Therefore, longer-range groupings can form in V2 (Von der Heydt etal 1984)
  • image p364fig10.13 The bottom-up adaptive filter, intracortical grouping circuit, and intercortical top-down attentional circuit all use the same competitive decision circuit between layers 6 and 4, called the attention-preattention interface, with which to select the featural patterns that will be processed.
    || Bottom-up filters and intracortical grouping feedback use the same 6-to-4 decision circuit, LGN-> Vx[6,4,2/3]. competitive decision circuit, modulatory on-center off-surround network. Top-down intercortical attention also uses the same 6-to-4 decision circuit!
  • image p364fig10.14 This figure emphasizes how preattentive intracortical groupings and top-down intercortical attention share the same modulatory on-center, off-surround layer 6-to-4 decision circuit.
    || Explanation: grouping and attention share the same modulatory decision circuit. Layer 6-to-6-to-4-to-2/3 pathway shown; also a layer 6-to-1-to-2/3 path. intercortical attention, both act via a modulatory on-center off-surround decision circuit, intracortical feedback from groupings
  • image p367fig10.15 Data (left column) and simulation (right column) of how attention prevents a masking stimulus from inhibiting the response to the on-center of the cell from which the recording was made.
    || Attention protects target from masking stimulus (Reynolds etal 1999; Grossberg, Raizada 2000).
  • image p367fig10.16 Neurophysiological data (left image) and simulation (right image) of how a low-contrast target can be facilitated if it is surrounded by a pair of collinear flankers, and suppressed by them if it has high contrast.
    || Flankers can enhance or suppress targets (Polat etal 1998; Grossberg, Raizada 2000). target alone, target + flankers, flankers alone.
  • image p368fig10.17 Neurophysiological data (left image) and simulation (right image) showing that attention has a greater effect on low contrast than high contrast targets.
    || Attention has greater effect on low contrast targets (DeWeerd etal 1999; Raizada, Grossberg 2001). Threshold increase (deg) vs Grating contrast (%), [no, with] attention
  • image p368fig10.18 Neurophysiological data (left image) and simulation (right image) of relative on-cell activities when the input to that cell may also be surrounded by iso-orientation or perpendicular textures.
    || Texture reduces response to a bar: iso-orientation suppression (Knierim, van Essen 1992), perpendicular suppression (Raizada, Grossberg 2001)
  • image p369fig10.19 Data from (Watanabe etal 2001) showing perceptual learning of the coherent motion direction, despite the lack of extra-foveal attention and awareness of the moving stimuli.
    || Unconscious perceptual learning of motion direction, % correct for two tests, compared to chance level results.
  • image p371fig11.01 FACADE theory explains how the 3D boundaries and surfaces are formed with which we see the world in depth.
    || 3D Vision and figure-ground perception (Grossberg 1987, 1994, 1997). How are 3D boundaries and 3D surfaces formed? How the world looks without assuming naive realism. Form And Color And DEpth theory (FACADE). Prediction: Visible figure-ground-separated Form-And-Color-And-DEpth are represented in cortical area V4.
  • image p372fig11.02 FACADE theory explains how multiple depth-selective boundary representations can capture the surface lightnesses and colors at the correct depths. The fact that both surface qualia and depth are determined by a single process implies that, for example, a change in brightness can cause a change in depth.
    || 3D surface filling-in. From filling-in of surface lightness and color to filling-in of surface depth. Prediction: Depth-selective boundary-gated filling-in defines the 3D surfaces that we see. Prediction: A single process fills-in lightness, color, and depth. Can a change in brightness cause a change in depth? YES! eg proximity-luminance covariance (Egusa 1983, Schwartz, Sperling 1983). Why is depth not more unstable when lighting changes? Prediction: Discounting the illuminant limits variability.
  • image p373fig11.03 Both contrast-specific binocular fusion and contrast-invariant boundary perception are needed to properly see the world in depth.
    || How to unify contrast-specific binocular fusion with contrast-invariant boundary perception? Contrast-specific binocular fusion: [Left, right] eye view [, no] binocular fusion. Contrast-invariant boundary perception: contrast polarity along the gray square edge reverses; opposite polarities are pooled to form object boundary.
  • image p374fig11.04 The three processing stages of monocular simple cells, binocular simple cells, and complex cells accomplish both contrast-specific binocular fusion and contrast-invariant boundary perception.
    || Model unifies contrast-specific binocular fusion and contrast-invariant boundary perception (Ohzawa etal 1990; Grossberg, McLoughlin 1997). [Left, right] eye V1-4 simple cells-> V1-3B simple cells-> V1-2/3A complex cells. Contrast-specific stereoscopic fusion by disparity-selective simple cells. Contrast-invariant boundaries by pooling opposite polarity binocular simple cells at complex cells in layer 2/3A.
  • image p374fig11.05 The brain uses a contrast constraint on binocular fusion to help ensure that only contrasts which are derived from the same objects in space are binocularly matched.
    || Contrast constraint on binocular fusion. Left and right input from same object has similar contrast, Percept changes when one contrast is different. Fusion only occurs between bars of similar contrast (McKee etal 1994)
  • image p375fig11.06 The contrast constraint on binocular fusion is realized by obligate cells in layer 3B of cortical area V1.
    || Model implements contrast constraint on binocular fusion (cf. "obligate" cells Poggio 1991). An ecological constraint on cortical development. [left, right] eye cart V1-[4 monocular simple, 3B binocular simple, complex2/3A] cells. Inhibitory cells (red) ensure that fusion occurs when contrasts in left and right eye are approximately equal.
  • image p375fig11.07 The 3D LAMINART model uses both monocular and binocular simple cells to binocularly fuse like image contrasts. The remainder of the model generates 3D boundary and surface representations of multiple kinds of experiments as well as of natural scenes.
    || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A], V2 thin stripe [4->2/3A], V4]. V1 blob [V1-4 monocular, V1 interior binocular] simple cells. [complex, simple, inhibitory] cells, on-center off-surround
  • image p376fig11.08 The contrast constraint on binocular fusion is not sufficient to prevent many of the false binocular matches that satisfy this constraint.
    || How to solve the correspondence problem? How does the brain inhibit false matches? Contrast constraint is not enough. [stimulus, multiple possible binocular matches] - Which squares in the two retinal images must be fused to form the correct percept?
  • image p376fig11.09 The disparity filter in V2 helps to solve the correspondence problem by eliminating spurious contrasts using line-of-sight inhibition.
    || Model V2 disparity filter solves the correspondence problem. An ecological constraint on cortical development. [left, right] eye view: False matches (black) suppressed by line-of-sight inhibition (green lines). "Cells that fire together wire together".
  • image p376fig11.10 The 3D LAMINART model shows how the disparity filter can be integrated into the circuit that completes 3D boundary representations using bipole grouping cells. It also explains how surface contours can strengthen boundaries that succeed in generating closed filling-in domains.
    || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A] surface contour, V2 thin stripe (monocular surface) [4->2/3A], V2 interior [disynaptic inhibitory interneurons, bipole grouping cells, disparity filter, V4 binocular surface]. [complex, simple, inhibitory] cells, on-center off-surround
  • image p377fig11.11 DaVinci stereopsis phenomena occur when only one eye can receive visual inputs from part of a 3D scene due to occlusion by a nearer surface.
    || How does monocular information contribute to depth perception? DaVinci stereopsis (Gillam etal 1999). Only by utilizing monocular information can the visual system create the correct depth percept. [left, right] eye view
  • image p378fig11.12 How monocular and binocular information are combined in V1 and V2 in the 3D LAMINART model.
    || Model utilizes monocular information. [left, right] eye cart V1-[4 monocular simple, 3B binocular simple, complex2/3A [mo,bi]nocular] cells, V2-4 binocular complex cells. black = monocular cells, blue = binocular cells. In V2, monocular inputs add to binocular inputs along the line of sight and contribute to depth perception.
  • image p379fig11.13 How the 3D LAMINART model explains DaVinci stereopsis. All the stages of boundary and surface formation are color coded to clarify their explanation. Although each mechanism is very simple, when all of them act together, the correct depthful surface representation is generated. See the text for details.
    || DaVinci stereopsis (Nakayama, Shimojo 1990). An emergent property of the previous simple mechanisms working together. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] pair [Binocular match: boundaries of thick bar -> Add monocular boundaries along lines-of-sight -> Line-of-sight inhibition kills weaker vertical boundaries -> 3D surface percept not just a disparity match!] pair [Binocular match: right edge of thin and thick bars -> Strongest boundaries: binocular and monocular boundaries add -> Vertical boundaries from monocular left edge of thin bar survive -> Filling-in contained by connected boundaries]. cart [very near, near, fixation plane, far, very far]
  • image p380fig11.14 The model explanation of DaVinci stereopsis when the input stimuli have opposite contrast polarities.
    || Polarity-reversed Da Vinci stereopsis (Nakayama, Shimojo 1990). Same explanation! (... as Figure 11.13 ...) [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
  • image p381fig11.15 The same model mechanisms explain the surface percept that is generated by the variant of DaVinci stereopsis that Gillam, Blackburn, and Nakayama studied in 1999.
    || DaVinci stereopsis (Gillam, Blackburn, Nakayama 1999). same model mechanisms. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
  • image p382fig11.16 The version of DaVinci stereopsis wherein three narrow rectangles are binocularly matched with one thick rectangle can also be explained in a similar way.
    || DaVinci stereopsis of [3 narrow, one thick] rectangles (Gillam, Blackburn, Nakayama 1999). [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
  • image p383fig11.17 The bars in the left and right images that are in the same positions are marked in red to simplify tracking how they are processed at subsequent stages.
    || The Venetian blind effect (Howard, Rogers 1995). Every second bar on L in same position as every third bar on R. These bars are marked in red; see them match in Fixation Plane. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
  • image p384fig11.18 Surface and surface-to-boundary surface contour signals that are generated by the Venetian blind image.
    || Venetian blind effect (Howard, Rogers 1995). Every second bar on L in same position as every third bar on R. PERCEPT: 3-bar ramps sloping up from L to R with step returns. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
  • image p385fig11.19 Dichoptic masking occurs when the bars in the left and right images have sufficiently different contrasts.
    || Dichoptic masking (McKee, Bravo, Smallman, Legge 1994). [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
  • image p385fig11.20 Dichoptic masking occurs in Panum's limiting case for reasons explained in the text.
    || Dichoptic masking in Panum's limiting case (McKee, Bravo, Smallman, Legge 1995). Panum's limiting case is a simplified version of the Venetian blind effect! [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
  • image p386fig11.21 A simulation of the Craik-O'Brien-Cornsweet Effect when viewed on a planar surface in depth.
    || Craik-O'Brien-Cornsweet Effect. Can the model simulate other surface percepts? eg surface brightness. The 2D surface with the image on it is viewed at a very near depth. Adapts (Grossberg, Todorovic 1988) to 3D. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
  • image p387fig11.22 Simulation of the boundaries that are generated by the Julesz stereogram in Figure 4.59 (top row) without (second row) and with (third row) surface contour feedback.
    || Boundary cart [V2-2, V2, V1] cart [near, fixation, far]
  • image p388fig11.23 Simulation of the surface percept that is seen in response to a sparse stereogram. The challenge is to assign large regions of ambiguous white to the correct surface in depth.
    || [left, right] retinal input. Surface [near, fixation, far] V4
  • image p388fig11.24 Boundary groupings capture the ambiguous depth-ambiguous feature contour signals and lift them to the correct surface in depth.
    || [surface, boundary] cart [near, fixation, far] V2.
  • image p389fig11.25 Boundaries are not just edge detectors. If they were, a shaded ellipse would look flat, and uniformly gray.
    || 3D vision and figure-ground separation. Multiple-scale, depth-selective boundary webs. [dark-light, light-dark] boundaries -> complex cells! If boundaries were just edge detectors, there would be just a bounding edge of the ellipse. After filling-in, it would look like this:.
  • image p390fig11.26 Although larger scales sometimes look closer (left image), that is not always true, as the right image of (Brown, Weisstein 1988) illustrates. The latter percept is, moreover, bistable. These images show the importance of interactions between groupings and multiple scales to determine perceived surface depths.
    || Multiple-scale depth-selective groupings determine perceived depth (Brown, Weisstein 1988). As an object approaches, it gets bigger on the retina. Does a big scale (RF) always signal NEAR? NO! The same scale can signal either near or far. Some scales fuse more than one disparity.
  • image p391fig11.27 (left image) Each scale can binocularly fuse a subset of disparities, with larger scales fusing more disparities, and closer ones, than small scales. (right image) Cortical hypercolumns enable binocular fusion to occur in a larger scale even as rivalry occurs in a smaller scale.
    || Multiple-scale grouping and size-disparity correlation. Depth-selective cooperation and competition among multiple scales determines perceived depth: a) Larger scales fuse more depths; b) Simultaneous fusion and rivalry. Boundary pruning using surface contours: Surface-to-boundary feedback from the nearest surface that is surrounded by a connected boundary eliminates redundant boundaries at the same position and further depths.
  • image p391fig11.28 (left image) Ocular dominance columns respond selectively to inputs from one eye or the other. (right image) Inputs from the two eyes are mapped into layer 4C of V1, among other layers.
    || Cortex V1[1, 2/3, 4A, 4B, 4C, 5, 6], LGN
  • image p392fig11.29 Boundary webs of the smallest scales are closer to the boundary edge of the ellipse, and progressively larger scale webs penetrate ever deeper into the ellipse image, due to the amount of evidence that they need to fire. Taken together, they generate a multiple-scale boundary web with depth-selective properties that can capture depth-selective surface filling-in.
    || 3D vision and figure-ground separation. Multiple-scale, depth-selective boundary webs. Instead, different size detectors generate dense boundary webs at different positions and depths along the shading gradient. Small-far, Larger-nearer, Largest-nearest. Each boundary web captures the gray shading in small compartments at its position and depths. A shaded percept in depth results.
  • image p392fig11.30 Multiple scales interact with bipole cells that represent multiple depths, and conversely. See the text for details.
    || How multiple scales vote for multiple depths. Scale-to-depth and depth-to-scale maps. Smallest scale projects to, and receives feedback from, boundary groupings that represent the furthest depths. Largest scale connects to boundary groupings that represent all depths. multiple-[depth, scale] dot [grouping, filter] cells. [small <-> large] vs [far <-> near]
  • image p393fig11.31 (Todd, Akerstrom 1987) created a series of 2D images from discrete black patches on a white disk and showed how the perceived depth varies with the factors summarized in the figure. The LIGHTSHAFT model quantitatively simulated their data.
    || Factors determining depth-from-texture percept. Perceived depth varies with texture element width, but only when elements are elongated and sufficiently aligned with one another to form long-range groupings. Data of (Todd, Akerstrom 1987) simulated by the LIGHTSHAFT model of (Grossberg, Kuhlmann 2007). [HP, LP, CCE, CCS, RO]
  • image p393fig11.32 Kulikowski stereograms involve binocular matching of out-of-phase (a) Gaussians or (b) rectangles. The latter can generate a percept of simultaneous fusion and rivalry. See the text for why.
    ||
  • image p394fig11.33 The Kaufman stereogram also creates a percept of simultaneous fusion and rivalry. The square in depth remains fused and the perpendicular lines in the two images are perceived as rivalrous.
    || 3D groupings determine perceived depth, stereogram (Kaufman 1974). Vertical illusory contours are at different disparities than those of bounding squares. Illusory square is seen in depth. Vertical illusory contours are binocularly fused and determine the perceived depth of the square. Thin, oblique lines, being perpendicular, are rivalrous: simultaneous fusion and rivalry.
  • image p395fig11.34 A comparison of the properties of other rivalry models with those of the 3D LAMINART model (surrounded by red border). Significantly, only 3D LAMINART explains both stable vision and rivalry (green border).
    || Comparison of rivalry models
  • image p396fig11.35 Three properties of bipole boundary grouping in V2 can explain how boundaries oscillate in response to rivalry-inducing stimuli. Because all boundaries are invisible, however, these properties are not sufficient to generate a conscious percept of rivalrous surfaces.
    || 3 V2 boundary properties cause binocular rivalry. 1. Bipole grouping, 2. Orientational competition, 3. Activity-dependent habituation
  • image p397fig11.36 Simulation of the temporal dynamics of rivalrous, but coherent, boundary switching.
    || Simulation of 2D rivalry dynamics. [Inputs, Temporal dynamics of V2 layer 2/3 boundary cells] cart [left, right]
  • image p398fig11.37 Simulation of the no swap baseline condition of (Logothetis, Leopold, Sheinberg 1996).
    || [Binocular, [left, right] eye] activity
  • image p399fig11.38 Simulation of the swap condition of (Logothetis, Leopold, Sheinberg 1996).
    || [Binocular, [left, right] eye] activity
  • image p399fig11.39 Simulation of the eye rivalry data of (Lee, Blake 1999).
    || [Binocular, [left, right] eye] activity
  • image p400fig11.40 When planar 2D parallelograms are juxtaposed, the resultant forms generate 3D percepts that are sensitive to the configuration of angles and edges in the figure. See the text for why.
    || 3D representation of 2D images, Monocular cues (eg angles) can interact together to yield 3D interpretation. Monocular cues by themselves are often ambiguous. Same angles and shapes, different surface slants. How do these ambiguous 2D shapes contextually define a 3D object form?
  • image p401fig11.41 The 3D LAMINART model proposes how angle cells and disparity-gradient interact through learning to generate 3D representations of slanted objects.
    || 3D LAMINART model. [LGN, V1, V2, V4] Four key additions: 1. Angle cells - tuned to various angles; 2. Disparity-gradient cells - tuned to disparity gradients in the image; 3. Weights from [angle to disparity-gradient] cells - learned while viewing 3D images; 4. Collinear grouping between [angle to disparity-gradient] cells - disambiguates ambiguous groupings.
  • image p401fig11.42 A hypothetical cortical hypercolumn structure proposes how angle cells and disparity-gradient cells, including bipole cells that stay within a given depth, may self-organize during development.
    || Hypercolumn representation of angles [left, right] cart [far-to-near, zero, near-to-far]
  • image p402fig11.43 A pair of disparate images of a scene from the University of Tsukuba Multiview image database.
    || input [left, right]
  • image p402fig11.44 3D scenic reconstruction of the image in Figure 11.43 by the 3D LAMINART model.
    || Disparity [5, 6, 8, 10, 11, 14]: images of objects in common depth planes
  • image p403fig11.45 The multiple boundary and surface scales that were used to simulate a reconstruction of the SAR image in Figure 3.24.
    || SAR processing by multiple scales. [boundaries before completion, boundaries after completion, surface filling-in] versus scale [small, medium, large]. large scale bipole
  • image p405fig12.01 A What ventral cortical stream and Where/How dorsal cortical stream have been described for audition, no less than for vision.
    || Parietal lobe: where; Temporal lobe: what. V1-> [[what: IT], [where: PPC-> DLPFC]]. A1-> [[what: [ST-> VLPFC], VLPFC], [where: [PPC-> DLPFC], DLPFC]].
  • image p406fig12.02 The Vector Integration to Endpoint, or VITE, model of arm trajectory formation enables the three S's of a movement to be realized: Synergy formation, Synchrony of muscles within a synergy, and variable Speed that is under volitional control (G). This is accomplished by subtracting a present position vector (P) from a target position vector (T) to form a difference vector (V) which moves P towards T at a speed that is determined by G.
    || The three S's of movement control. T-> D [-> [[D]+G]-> P->], P-> D (inhib), G-> [[D]+G]. 1. Synergy - Defining T determines the muscle groups that will contract during the movement. 2. Synchrony - When G turns on, all muscle groups for which D != 0 contract by variable amounts in equal time. Because G multiplies D, it does not change the direction in which P moves to acquire T: straight line movement. 3. Speed - P integrates D at rate G until P = T. Increasing (decreasing) G makes the movement faster (slower).
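    A minimal sketch (hypothetical rates, arbitrary units) of the VITE difference-vector dynamics summarized above: the difference vector D tracks T - P, and the present position P integrates the GO-gated, rectified difference G*[D]+, so P moves toward T at a speed scaled by G without changing direction.

        # Minimal sketch (hypothetical rates): VITE difference-vector dynamics.
        # D tracks T - P; P integrates the GO-gated, rectified difference G*[D]+.
        import numpy as np

        def vite(T, P0, G, steps=5000, dt=0.001, alpha=30.0):
            T = np.asarray(T, dtype=float)
            P = np.asarray(P0, dtype=float)
            D = np.zeros_like(P)
            peak_speed = 0.0
            for _ in range(steps):
                D += dt * alpha * (-D + T - P)        # difference vector tracks T - P
                dP = G * np.maximum(D, 0.0)           # GO signal gates and scales the command
                P += dt * dP
                peak_speed = max(peak_speed, float(np.linalg.norm(dP)))
            return P, peak_speed

        T = [10.0, 5.0]                               # target position vector (arbitrary units)
        for G in (1.0, 2.0):
            P_final, peak = vite(T, P0=[0.0, 0.0], G=G)
            print("G =", G, "final P =", P_final.round(2), "peak speed =", round(peak, 2))
        # Larger G: movement to the same target along the same direction, with higher peak speed.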
  • image p407fig12.03 Neurophysiological data showing how motor cortical cells code different vectors that are sensitive to both the direction of the commanded movement and its length.
    || (a) Single primary motor cortex neuron, onset of movement -> on..., radial architecture... (b) Motor cortex neuronal population, radial architecture...
  • image p409fig12.04 (top half) Neurophysiological data of vector cell responses in motor cortex. (bottom half) VITE model simulations of a simple movement in which the model's difference vector simulates the data as an emergent property of network interactions.
    || Neurophysiological data. VITE model [Present Position vector, Difference vector, Outflow velocity vector, go signal].
  • image p410fig12.05 VITE simulation of velocity profile invariance if the same GO signal gates shorter (a) or longer (b) movements. Note the higher velocities in (b).
    || [[short, long] cart [G, dP/dt]] vs time. G = GO signal, dP/dt = velocity profile.
  • image p410fig12.06 Monkeys seamlessly transformed a movement initiated towards the 2 o'clock target into one towards the 10 o'clock target when the latter target was substituted 50 or 100 msec after activation of the first target light.
    ||
  • image p411fig12.07 The left column simulation by VITE shows the velocity profile when the GO signal (G) starts with the movement. The right column shows that the peak velocity is much greater if a second movement begins when the GO signal is already positive.
    || Higher peak velocity due to target switching. VITE simulation of higher peak speed if second target rides on first GO signal. [[first, second] target cart [G, dP/dt]] vs time. Second target GO is much higher. G = GO signal, dP/dt = velocity profile.
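    The target-switching effect in Figures 12.06-12.07 can be illustrated with the same kind of sketch, here with a GO signal that ramps up over time (all values hypothetical): a movement re-aimed at a second target while GO is already large reaches a higher peak speed than the same movement launched from rest.

        # Follow-up sketch (hypothetical values): a ramping GO signal and a mid-flight
        # target switch. The re-aimed movement starts with GO already large, so its
        # peak speed exceeds that of the same movement launched from rest.
        def run(switch, dt=0.001, alpha=30.0):
            T1, T2 = 5.0, 10.0
            P, D, peak = 0.0, 0.0, 0.0
            for k in range(4000):
                t = k * dt
                G = min(2.0 * t, 2.0)                     # GO signal ramps up, then saturates
                T = T2 if (switch and t > 0.5) else (T1 if switch else T2)
                D += dt * alpha * (-D + T - P)            # difference vector tracks T - P
                v = G * max(D, 0.0)                       # GO-gated outflow speed
                P += dt * v
                peak = max(peak, v)
            return peak

        print("to T2 from rest           :", round(run(switch=False), 2))
        print("switched to T2 mid-flight :", round(run(switch=True), 2))   # higher peak speed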
  • image p411fig12.08 Agonist-antagonist opponent organization of difference vector (DV) and present position vector (PPV) processing stages and how GO signals gate them.
    ||
  • image p412fig12.09 How a Vector Associative Map, or VAM, model uses mismatch learning during its development to calibrate inputs from a target position vector (T) and a present position vector (P) via mismatch learning of adaptive weights at the difference vector (D). See the text for details.
    || Vector Associative Map model (VAM). During the critical period, the Endogenous Random Generator (ERG+) turns on, activates P, and causes random movements that sample the workspace. When ERG+ shuts off, posture occurs. ERG- then turns on (rebound) and opens the Now Print (NP) gate, which dumps P into T. Mismatch learning enables adaptive weights between T and D to change until D (the mismatch) approaches 0. Then T and P are both correctly calibrated to represent the same positions.
  • image p413fig12.10 Processing stages in cortical areas 4 and 5 whereby the VITE model combines outflow VITE trajectory formation signals with inflow signals from the spinal cord and cerebellum that enable it to carry out movements with variable loads and in the presence of obstacles. See the text for details.
    || area 4 (rostral) <-> area 5 (caudal).
  • image p414fig12.11 Neurophysiological data from cortical areas 4 and 5 (every other column) and simulations thereof (other columns) during a reach.
    || activation vs time. (a) area 4 phasic RT (IFV) (b) area 4 tonic (OPV) (c) area 4 phasic-tonic (OFPV) (d) area 4 phasic MT (DVV) (e) area 5 phasic (DV) (f) area 5 tonic (PPV)
  • image p415fig12.12 The combined VITE, FLETE, cerebellar, and multi-joint opponent muscle model for trajectory formation in the presence of variable forces and obstacles.
    ||
  • image p416fig12.13 The DIRECT model learns, using a circular reaction that is energized by an Endogenous Random Generator, or ERG, to make motor-equivalent volitionally-activated reaches. This circular reaction learns a spatial representation of a target in space. It can hereby make accurate reaches with clamped joints and on its first try using a tool under visual guidance; see Figure 12.16.
    || DIRECT model (Bullock, Grossberg, Guenther 1993). learns by circular reaction. learns a spatial representation to mediate between vision and action. motor-equivalent reaching. can reach target with clamped joints. can reach target with a TOOL on the first try under visual guidance. How did tool use arise?!
  • image p416fig12.14 Computer simulations of DIRECT reaches with (b) a tool, (c) a clamped elbow, and (d) with a blindfold, among other constraints.
    || Computer simulations of DIRECT reaches [unconstrained, with TOOL, elbow clamped at 140°, blindfolded]
  • image p417fig12.15 The DIRECT and DIVA models have homologous circuits to learn and control motor-equivalent reaching and speaking, with tool use and coarticulation as resulting properties. See the text for why.
    || From Seeing and Reaching to Hearing and Speaking, Circular reactions (Piaget 1945, 1951, 1952). Homologous circuits for development and learning of motor-equivalent REACHING and SPEAKING. DIRECT TOOL use (Bullock, Grossberg, Guenther 1993), DIVA Coarticulation (Guenther 1995)
  • image p418fig12.16 Anatomical interpretations of the DIVA model processing stages.
    || [Feedforward control system (FF), Feedback control subsystem (FB)]. Speech sound map (Left Ventral Premotor Cortex (LVPC)), Cerebellum, Articulatory velocity and position maps (Motor Cortex (MC)), Somatosensory Error Map (Inferior Parietal Cortex (IPC)), Auditory Error Map (Superior Temporal Cortex (STC)), Auditory State Map (Superior Temporal Cortex), Somatosensory State Map (Inferior Parietal Cortex), articulatory musculature via subcortical nuclei, auditory feedback via subcortical nuclei
  • image p419fig12.17 The auditory continuity illusion illustrates the ART Matching Rule at the level of auditory streaming. Its "backwards in time" effect of future context on past conscious perception is a signature of resonance.
    || Auditory continuity illusion. input, percept. Backwards in time - How does a future sound let a past sound continue through noise? Resonance! - It takes a while to kick in. After it starts, a future tone can maintain it much more quickly. Why does this not happen if there is no noise? - ART Matching Rule! TD harmonic filter is modulatory without BU input. It cannot create something out of nothing.
  • image p420fig12.18 The ARTSTREAM model explains and simulates the auditory continuity illusion as an example of a spectral-pitch resonance. Interactions of ART Matching Rule and asymmetric competition mechanisms in cortical strip maps explain how the tone selects the consistent frequency from the noise in its own stream while separating the rest of the noise into another stream.
    || ARTSTREAM model (Grossberg 1999; Grossberg, Govindarajan, Wyse, Cohen 2004). SPINET. Frequency and pitch strips. Bottom Up (BU) harmonic sieve. Top Down (TD) harmonic ART matching. Exclusive allocation. Learn pitch categories based on early harmonic processing. A stream is a Spectral-Pitch Resonance!
  • image p422fig12.19 The ARTSTREAM model includes mechanisms for deriving streams both from pitch and from source direction. See the text for details.
    || [left, right] cart Peripheral processing = [input signal-> outer & middle ear preemphasis-> basilar membrane gammatone filterbank-> energy measure]. Spectral stream layer-> spectral summation layer-> delays-> [f-, tau] plane-> pitch stream layer-> pitch summation layer.
  • image p423fig12.20 The Spatial Pitch Network, or SPINET, model shows how a log polar spatial representation of the sound frequency spectrum can be derived from auditory signals occurring in time. The spatial representation allows the ARTSTREAM model to compute spatially distinct auditory streams.
    || SPINET model (Spatial Pitch Network) (Cohen, Grossberg, Wyse 1995). 1. input sound 2. Gamma-tone filter bank 3. Short-term average energy spectrum 4. MAP transfer function 5. On-center off-surround and rectification 6. Harmonic weighting 7. Harmonic summation and competition -> PITCH
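    The harmonic weighting and summation steps (6-7) above amount to a harmonic-sieve pitch estimate. The following is a much-reduced sketch, not the SPINET model (no gammatone filterbank, no log-frequency spatial map, and all parameters are invented): an FFT energy spectrum is sampled at the harmonics of each candidate pitch, and the candidate with the largest weighted harmonic sum wins, even for a missing-fundamental tone.

        # Much-reduced sketch of harmonic summation (not the SPINET model; parameters invented).
        import numpy as np

        fs = 16000
        t = np.arange(0, 0.1, 1.0 / fs)
        # "missing fundamental" test tone: harmonics 3, 4, 5 of 200 Hz, no energy at 200 Hz
        sound = sum(np.sin(2 * np.pi * 200 * h * t) for h in (3, 4, 5))

        spectrum = np.abs(np.fft.rfft(sound))
        freqs = np.fft.rfftfreq(len(sound), 1.0 / fs)

        def harmonic_sum(f0, n_harmonics=8):
            idx = [np.argmin(np.abs(freqs - f0 * h)) for h in range(1, n_harmonics + 1)]
            weights = 1.0 / np.arange(1, n_harmonics + 1)     # lower harmonics weighted more
            return float(np.dot(weights, spectrum[idx]))

        candidates = np.arange(80.0, 500.0, 1.0)
        pitch = candidates[np.argmax([harmonic_sum(f) for f in candidates])]
        print("estimated pitch:", pitch, "Hz")                # ~200 Hz despite the missing fundamental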
  • image p424fig12.21 One of the many types of data about pitch processing that are simulated by the SPINET model. See the text for details.
    || Pitch shifts with component shifts (Patterson, Wightman 1976; Schouten 1962). Pitch vs lowest harmonic number.
  • image p424fig12.22 Decomposition of a sound (bottom row) in terms of three of its harmonics (top three rows).
    ||
  • image p425fig12.23 ARTSTREAM simulations of the auditory continuity illusion and other streaming properties (left column, top row). When two tones are separated by silence (Input), a percept of silence also separates them in a spectral-pitch resonance. (left column, bottom row). When two tones are separated by broadband noise, the percept of tone continues through the noise in one stream (stream 1) while the remainder of the noise occurs in a different stream (stream 2). (right column) Some of the other streaming properties that have been simulated by the ARTSTREAM model.
    || Auditory continuity does not occur without noise. Auditory continuity in noise. Other simulated streaming data.
  • image p426fig12.24 Spectrograms of /ba/ and /pa/ show the transient and sustained parts of their spectrograms.
    ||
  • image p428fig12.25 (left architecture) Auditory-articulatory feedback loop whereby babbled sounds activate learning in an imitative map that is later used to learn to reproduce the sounds of other speakers. An articulatory-to-auditory expectation renders learning possible by making the auditory and motor data dimensionally consistent, as in the motor theory of speech. (right architecture) Parallel streams in the ARTSPEECH model for learning speaker-independent speech and language meaning, including a mechanism for speaker normalization (right cortical stream) and for learning speaker-dependent vocalic qualities (left cortical stream).
    || left: Speaker-dependent vocalic qualities; right: Speaker-independent speech and language meaning
  • image p430fig12.26 The NormNet model shows how speaker normalization can be achieved using specializations of the same mechanisms that create auditory streams. See the text for how.
    || [Anchor vs Stream] log frequency map. -> diagonals-> Speaker-independent acoustic item information-> [BU adaptive filter, TD learned expectation]-> learned item recognition categories
  • image p431fig12.27 The strip maps that occur in ARTSTREAM and NormNet are variants of a cortical design that also creates ocular dominance columns in the visual cortex.
    || Adult organization of V1 (Grinvald etal http://www.weizmann.ac.il/brain/images/cubes.html). (1) Ocular dominance columns (ODCs): Alternating strips of cortex respond preferentially to visual inputs of each eye (R/L corresponds to Right and Left eye inputs in the figure); Orientation columns: A smooth pattern of changing orientation preference within each ODC. Organized in a pinwheel-like fashion.
  • image p432fig12.28 (left image) The SpaN model simulates how spatial representations of numerical quantities are generated in the parietal cortex. (right image) Behavior numerosity data and SpaN model simulations of it.
    || (Left) preprocessor-> spatial number map-> Comparison wave. (Right) data axis: number of lever presses; model axis: node position in the spatial number axis
  • image p433fig12.29 Learning of place-value number maps associates language categories in the What cortical stream with numerical strip maps in the Where cortical stream. See the text for details.
    || (1) spoken word "seven"-> (2) What processing stream- learned number category <-> (3) What-Where learned associations <- (4) Where processing stream- spatial number map <- (5) visual cues of seven objects
  • image p436fig12.30 The conscious ARTWORD, or cARTWORD, laminar cortical speech model simulates how future context can disambiguate noisy past speech sounds in such a way that the completed percept is consciously heard to proceed from past to future as a feature-item-list resonant wave propagates through time.
    || cARTWORD: Laminar cortical model macrocircuit (Grossberg, Kazerounian 2011) Simulates PHONEMIC RESTORATION: Cognitive Working Memory (processed item sequences) - [Excitatory-> inhibitory-> habituative-> adaptive filter-> adaptive filter-> adaptive filter with depletable synapse-> Acoustic [item, feature]
  • image p436fig12.31 Working memories do not store longer sequences of events in the correct temporal order. Instead, items at the beginning and end of the list are often recalled first, and with the highest probability.
    || Working memory. How to design a working memory to code "Temporal Order Information" in STM before it is stored in LTM. Speech, language, sensory-motor control, cognitive planning. eg repeat a telephone number unless you are distracted first. Temporal order STM is often imperfect, eg Free Recall. [probability, order] of recall vs list position. WHY?
  • image p437fig12.32 Data from a free recall experiment illustrate the bowed serial position curve.
    || Serial position function for free recall Data: (Murdock 1962 JEP 64, 482-488). % correct vs position of word on a 40-word list. Primacy gradient can be a mixture of STM and LTM read-out.
  • image p437fig12.33 Item and Order working memory models explain free recall data, as well as many other psychological and neurobiological data, by simulating how temporal series of events are stored as evolving spatial patterns of activity at content-addressable item categories. The categories with the largest activities are rehearsed first, and self-inhibit their activity as they do so in order to prevent them from being rehearsed perseveratively. The laws whereby the items are stored in working memory obey basic design principles that ensure that list categories, or chunks, of sequences of stored items can be stably learned and remembered.
    || Working memory models: item and order, or competitive queuing (Grossberg 1978; Houghton 1990; Page, Norris 1998). Event sequence in time stored as an evolving spatial pattern of activity. Primacy gradient of working memory activation stores correct temporal order at content-addressable cells. The maximally activated cell population is performed next when a rehearsal wave is turned on. Output signal from chosen cell population inhibits its own activity to prevent perseveration: inhibition of return. Iterate until entire sequence is performed.
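    A minimal Python sketch of the Item-and-Order (competitive queuing) read-out just described; the item names and activity values below are made up for illustration, not taken from the model's equations.
      # Toy competitive-queuing read-out: items are stored as a primacy gradient of
      # activities; each rehearsal wave performs the most active item, which then
      # self-inhibits (inhibition of return) so it is not performed perseveratively.
      def recall_sequence(gradient):
          """gradient: dict mapping item -> stored activity (larger = earlier)."""
          activities = dict(gradient)
          performed = []
          while any(a > 0 for a in activities.values()):
              item = max(activities, key=activities.get)  # rehearsal wave picks the winner
              performed.append(item)
              activities[item] = 0.0                       # self-inhibition of the chosen item
          return performed

      # A primacy gradient stores "C", "A", "T" in the correct temporal order.
      print(recall_sequence({"C": 0.9, "A": 0.6, "T": 0.3}))  # ['C', 'A', 'T']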
  • image p438fig12.34 The LTM Invariance Principle insists that words being stored in working memory for the first time (eg MYSELF) do not cause catastrophic forgetting of the categories that have already been learned for their subwords (eg MY, SELF, and ELF) or other subset linguistic groups.
    || LTM invariance principle. unfamiliar STM -> LTM familiar. How does STM storage of SELF influence STM storage of MY? It should not recode LTM of either MY or SELF!
  • image p439fig12.35 The Normalization Rule insists that the total activity of stored items in working memory has an upper bound that is approximately independent of the number of items that are stored.
    || Normalization Rule (Grossberg 1978). Total STM activity has a finite bound independent of the number of items (limited capacity of STM). Activity vs Items for [slow, quick] asymptotic energy growth.
  • image p439fig12.36 (1) Inputs to Item and Order working memories are stored by content-addressable item categories. (2) The relative activities of the item categories code the temporal order of performance. (3) In addition to excitatory recurrent signals from each working memory cell (population) to itself, there are also inhibitory recurrent signals to other working memory cells, in order to solve the noise-saturation dilemma. (4) A nonspecific rehearsal wave allows the most active cell to be rehearsed first. (5) As an item is being rehearsed, it inhibits its own activity using a feedback inhibitory interneuron. Perseverative performance is hereby prevented.
    || Item and order working memories. (1) Content-addressable item codes (2) Temporal order stored as relative sizes of item activities (3) Competition between working memory cells: Competition balances the positive feedback that enables the cells to remain active. Without it, cell activities may all saturate at their maximal values-> Noise saturation dilemma again! (4) Read-out by nonspecific rehearsal wave- Largest activity is the first out (5) STM reset self-inhibition prevents perseveration: [input/self-excitatory, rehearsal wave]-> [output, self-inhibition]
  • image p440fig12.37 Simulation of a primacy gradient for a short list (left image) being transformed into a bowed gradient for a longer list (right image). Activities of cells that store the longer list are smaller due to the Normalization Rule, which follows from the shunting inhibition in the working memory network.
    || Primacy bow as more items stored. [activities, final y] (Left) Primacy gradient 6 items (Right) Bowed gradient 20 items
  • image p441fig12.38 The LTM Invariance Principle is realized if the relative sizes of the inputs to the list chunk level stay the same as more items are stored in working memory. This property, in turn, follows from shunting previously stored working memory activities when a new item occurs.
    || LTM Invariance principle. Choose STM activities so that newly stored STM activities may alter the size of old STM activities without recoding their LTM patterns. In particular: New events do not change the relative activities of past event sequences, but may reduce their absolute activities. Why? Bottom-up adaptive filtering uses dot products: T(j) = sum[i=1 to n: x(i)*z(i,j)] = total input to v(j). The relative sizes of inputs to coding nodes v(j) are preserved. x(i) -> w*x(i), 0 < w <= 1, leaves all past ratios T(j)/T(k) unchanged.
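    A small numeric check of the dot-product property stated above (the activities and weights are arbitrary example values): shunting every stored activity by the same factor w leaves the ratios T(j)/T(k) unchanged.
      # LTM Invariance check: uniform shunting of STM activities preserves the
      # ratios of the bottom-up dot-product inputs T(j) = sum_i x(i)*z(i,j).
      x = [0.8, 0.5, 0.3]                        # stored STM activities
      Z = [[0.2, 0.9], [0.7, 0.1], [0.4, 0.6]]   # adaptive weights z(i,j) to chunks j = 0, 1

      def T(x, Z, j):
          return sum(xi * zi[j] for xi, zi in zip(x, Z))

      w = 0.6                                    # shunt caused by storing a new item
      x_shunted = [w * xi for xi in x]
      print(T(x, Z, 0) / T(x, Z, 1))             # ratio before shunting
      print(T(x_shunted, Z, 0) / T(x_shunted, Z, 1))  # identical ratio after shunting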
  • image p442fig12.39 (left column, top row) How a shunt plus normalization can lead to a bow in the stored working memory spatial pattern. Time increases in each row as every item is stored with activity 1 before it is shunted by w due to each successive item's storage, and the total working memory activity in each row is normalized to a total activity of 1. (right column, top row) When the working memory stored pattern is shunted sufficiently strongly (w > 1/2), then the pattern bows at position 2 in the list as more items are stored through time. (left column, bottom row) LTM invariance can be generalized to consider arbitrary amounts of attention, u, being paid when the i_th item is stored, with an arbitrary amount of shunting w(j) to the j_th item. (right column, bottom row) The Normalization Rule can also be generalized to approach the maximum possible normalized total activity that is stored across all the working memory cells at different rates.
    || Shunt normalization -> STM bow. (topLeft) Algebraic working memory (Grossberg 1978) (topRight) Strong inhibition of new inputs by stored STM items. Bow at position 2. Can we classify all working memory codes of this type? Yes! (bottomLeft) 1. LTM invariance principle (bottomRight) 2. Normalization Rule (Kahneman, Beatty 1966)
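    An illustrative toy rule, not the book's exact algebra, that combines the two ingredients named in this figure: shunting of previously stored items by each new item, and inhibition of the new input by the total stored activity. It yields a primacy gradient for short lists that bows as the list grows.
      # Toy storage rule (illustrative only): each new item is stored with activity
      # 1/(1 + total stored activity), and every previously stored item is shunted
      # by w.  Short lists come out as a primacy gradient; longer lists bow.
      def store_list(n_items, w=0.9):
          stored = []
          for _ in range(n_items):
              new = 1.0 / (1.0 + sum(stored))   # new input inhibited by stored items
              stored = [w * a for a in stored]  # old items shunted by the new item
              stored.append(new)
          return [round(a, 3) for a in stored]

      print(store_list(3))   # primacy gradient: activities decrease with list position
      print(store_list(6))   # bowed gradient: the last item is stored more strongly again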
  • image p442fig12.40 Given the hypothesis in Figure 12.39 (right column, bottom row) and a generalized concept of steady, albeit possibly decreasing, attention to each item as it is stored in working memory, only a primacy, or bowed gradient of activity across the working memory items can be stored.
    || LTM Invariance + Normalization. (... given conditions ...) Then the x(i) can ONLY form: [primacy gradient, recency gradient, unimodal bow]
  • image p443fig12.41 Neurophysiological data from the Averbeck etal sequential copying experiments show the predicted primacy gradient in working memory and the self-inhibition of activity as an item is stored. When only the last item remains stored, it has the highest activity because it has been freed from inhibition by earlier items.
    || Neurophysiology of sequential copying
  • image p444fig12.42 The LIST PARSE laminar cortical model of working memory and list chunking that I published with Lance Pearson in 2008 simulated the Averbeck etal data in Figure 12.41, as in the left column of the figure. It also simulated cognitive data about working memory storage by human subjects. See the text for details.
    || LIST PARSE: Laminar cortical model of working memory and list chunking (Grossberg, Pearson 2008). Simulates data about: [immediate, delayed, continuous] distractor free recall; immediate serial recall; and variable-speed sequential performance of motor acts. [velocity, acceleration] vs time (ms) from recall cue.
  • image p445fig12.43 The LIST PARSE laminar cortical Cognitive Working Memory circuit, that is proposed to occur in ventrolateral prefrontal cortex, is homologous to the LAMINART circuit that models aspects of how visual cortex sees. The Motor Working Memory, VITE Trajectory Generator, and Variable-Rate Volitional Control circuits model how other brain regions, including dorsolateral prefrontal cortex, motor cortex, cerebellum, and basal ganglia, interact with the Cognitive Working Memory to control working memory storage and variable-rate performance of item sequences.
    || List parse circuit diagram. Connectivity convention. sequence chunks [<- BU filter, TD expectation ->] working memory. Working memory and sequence chunking circuit is homologous to visual LAMINART circuit!
  • image p446fig12.44 (left column, top row) LIST PARSE can model linguistic data from human subjects. In this figure, model parameters are fixed to enable a close fit to data about error-type distributions in immediate free recall experiments, notably transposition errors. (right column, top row) Simulation and data showing bowing of the serial position curve, including an extended primacy gradient. (left column, bottom row) The simulation curve overlays data about list length effects, notably the increasing recall difficulty of longer lists during immediate serial recall (ISR). (right column, bottom row) Simulation (bottom image) and data (top image) of the limited temporal extent for recall.
    || (1. TL) Error-type distributions in immediate serial recall (Hanson etal 1996). % occurrence vs serial position. Graph convention: Data- dashed lines; Simulations- solid lines. Six letter visual ISR. Order errors- transpositions of neighboring items are the most common. Model explanation: Noisy activation levels change relative order in primacy gradient. Similar activation of neighboring items most susceptible to noise. Model parameters fitted on these data. (2. TR) Bowing of serial position curve (Cowan etal 1999). % correct vs serial position. Auditory ISR with various list lengths (graphs shifted rightward): For [, sub-]span lists- extended primacy, with one (or two) item recency; Auditory presentation- enhanced performance for last items. LIST PARSE: End effects- first and last items half as many members; Echoic memory- last presented item retained in separate store. (3. BL) List length effects, circles (Crannell, Parrish 1968), squares (Baddeley, Hitch 1975), solid line- simulation. % list correct vs list length. Variable list length ISR: longer lists are more difficult to recall. LIST PARSE: More items- closer activation levels and lower absolute activity level with enough inputs; Noise is more likely to produce order errors; Activity levels more likely to drop below threshold;. (4. BR) Limited temporal extent for recall (Murdock 1961). % recalled vs retention interval (s). ISR task with distractor-filled retention intervals (to prevent rehearsal): Increasing retention interval- decreases probability of recalling list correctly; Load dependence- longer list more affected by delays; Performance plateau- subjects reach apparent asymptote. LIST PARSE: Increase convergence of activities with time; loss of order information;.
  • image p447fig12.45 (left column) LIST PARSE simulations of the proportion of order errors as a function of serial position for 6 item lists with (a) an extended pause of 7 time units between the third and fourth items, and (b) pauses of 5 time units (solid curve) and 10 time units (dashed curve) between all items. (right column) Simulations (solid curves) and data (dashed curves) illustrating close model fits in various immediate free recall tasks.
    || (Left) Temporal grouping and presentation variability. Temporal grouping: Inserting an extended pause leads to inter-group bowing; Significantly different times of integration and activity levels across pause, fewer interchanges. (Right) Immediate free recall, and [delayed, continuous] distractor-free recall. Overt rehearsal IFR task with super-span (ie 20 item) lists: Extended recency- even more extended with shorter ISIs; Increased probability of recall with diminished time from last rehearsal; Early items in list rehearsed most;. LIST PARSE (unique) for long lists: Incoming items form recency gradient; Rehearsal (re-presentation) based upon level of activity.
  • image p448fig12.46 A Masking Field working memory is a multiple-scale self-similar recurrent shunting on-center off-surround network. It can learn list chunks that respond selectively to lists of item chunks of variable length that are stored in an item working memory at the previous processing stage. Chunks that code for longer lists (eg MY vs MYSELF) are larger, and give rise to stronger recurrent inhibitory neurons (red arrows).
    || How to code variable length lists? MASKING FIELDS code list chunks of variable length (Cohen, Grossberg 1986, 1987; Grossberg, Kazerounian 2011, 2016; Grossberg, Myers 2000; Grossberg, Pearson 2008). Multiple-scale self-similar WM: Masking field, adaptive filter. Variable length coding- Masking fields select list chunks that are sensitive to WM sequences of variable length; Selectivity- Larger cells selectively code longer lists; Asymmetric competition- Larger cells can inhibit smaller cells more than conversely: Magic Number 7! Temporal order- different list chunks respond to the same items in different orders eg LEFT vs FELT;.
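    A deliberately simple caricature, not the published Masking Field equations, of the selectivity property named above: each candidate chunk is scored by how many of its items are stored, penalized for items it predicts that are absent, so the longest fully supported chunk masks its subwords (temporal order, eg LEFT vs FELT, is not modeled here).
      # Caricature of masking-field list-chunk selection: score = matched items
      # minus a penalty for predicted-but-absent items; the largest score wins.
      def choose_chunk(stored_items, chunks):
          def net(chunk):
              matched = sum(item in stored_items for item in chunk)
              missing = len(chunk) - matched
              return matched - 2.0 * missing     # big chunks lose if their list is incomplete
          return max(chunks, key=net)

      chunks = ["MY", "SELF", "ELF", "MYSELF"]
      print(choose_chunk(set("MY"), chunks))      # 'MY'     : only MY has been stored
      print(choose_chunk(set("MYSELF"), chunks))  # 'MYSELF' : the longer chunk masks its subwords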
  • image p449fig12.47 This figure illustrates the self-similarity in a Masking Field of both its recurrent inhibitory connections (red arrows) and its top-down excitatory priming signals (green arrows) to the item chunk working memory.
    || Both recurrent inhibition and top-down excitatory priming are self-similar in a masking field. MYSELF <-> [MY, MYSELF]
  • image p452fig12.48 (left column) In experiments of (Repp etal 1978), the silence duration between the words GRAY and SHIP was varied, as was the duration of the fricative noise in S, with surprising results. (right column) The red arrow directs our attention to surprising perceptual changes as silence and noise durations increase. See the text for details.
    || Perceptual integration of acoustic cues, data (Repp etal 1978). GRAY-> silence duration-> SHIP (noise duration from start of word). Noise duration vs silence duration: GRAY SHIP <-> [GREAT SHIP <-> GRAY CHIP] <-> GREAT CHIP.
  • image p453fig12.49 The ARTWORD model that I published in 2000 with my PhD student Christopher Myers simulates data such as the (Repp etal 1978) data in Figure 12.48. See the text for details.
    || ARTWORD model (Grossberg, Myers 2000). Input phonetic features-> Phonemic item working memory-> Masking Field unitized lists-> Automatic gain control-> Phonemic item working memory. [habituative gate, adaptive filter]s.
  • image p453fig12.50 The ARTWORD perception cycle shows how sequences of items activate possible list chunks, which compete among each other and begin to send their top-down expectations back to the item working memory. An item-list resonance develops through time as a result.
    || ARTWORD perception cycle. (a) bottom-up activation (b) list chunk competition (c) item-list resonance (d) chunk reset due to habituative collapse.
  • image p454fig12.51 (left column) Even as a resonance with the list chunk GRAY begins to develop, if the delay between "gray" and "chip" is increased, greater habituation of this resonance may allow the GREAT chunk to begin to win, thereby smoothly transferring the item-list resonance from GRAY to GREAT through time. (right column) Simulation of a resonant transfer from GRAY to GREAT, and back again as the silence interval between the words "gray" and "chip" increases. The red region between GRAY and GREAT curves calls attention to when GREAT wins. See the text for details.
    || Resonant transfer, as silence interval increases. (left) Delay GRAY resonance weakens. A delayed additional item can facilitate perception of a longer list. (right) GRAY-> GREAT-> GRAY.
  • image p455fig12.52 Simulation of cARTWORD dynamics in response to the complete list /1/-/2/-/3/. The relevant responses are surrounded by a red box.
    || Presentation of a normal sequence: input /1/-/2/-/3/. |c(i,1)-5| vs time (msec). List chunks select most predictive code. Order stored in WM layers. Resonant activity of /1/-/2/-/3/ in item and feature layers corresponds to conscious speech percept.
  • image p456fig12.53 Simulation of cARTWORD dynamics in response to the partial list /1/-silence-/3/ with /2/ replaced by silence. Only the representations of these items can be seen in the red box.
    || Presentation with silence duration: input /1/-silence-/3/. |c(i,1)-5| vs time (msec). List chunks select most predictive code. Order stored in WM layers. Gap in resonant activity of /1/-silence-/3/ in item and feature layers corresponds to perceived silence.
  • image p456fig12.54 Item /2/ is restored in the correct list position in response to the list /1/-noise-/3/.
    || Presentation with noise: input /1/-noise-/3/. |c(i,1)-5| vs time (msec). List chunks select the most predictive code. Order restored in WM layers. Resonant activity of /1/-/2/-/3/ in item and feature layers corresponds to restoration of item /2/ replaced by noise in input.
  • image p457fig12.55 Item /4/ is restored in the correct list position in response to the list /1/-noise-/5/. This and the previous figure show how future context can disambiguate past noisy sequences that are otherwise identical.
    || Presentation with noise: input /1/-noise-/5/. |c(i,1)-5| vs time (msec). List chunks select the most predictive code. Order restored in WM layers. Resonant activity of /1/-/4/-/5/ in item and feature layers corresponds to restoration of item /4/ replaced by noise in input.
  • image p459fig12.56 (Grossberg, Pearson 2008) proposed that the ability of working memories to store repeated items in a sequence represents rank information about the position of an item in a list using numerical hypercolumns in the prefrontal cortex (circles with numbered sectors: 1,2,3,4). These numerical hypercolumns are conjointly activated by inputs from item categories and from the analog spatial representation of numerosity in the parietal cortex. These parietal representations (overlapping Gaussian activity profiles that obey a Weber Law) had earlier been modeled by (Grossberg, Repin 2003). See the text for details.
    || Item-order-rank working memory, rank information from parietal numerosity circuit (Grossberg, Pearson 2008; Grossberg, Repin 2003). [Sensory working memory-> adaptive filter-> list chunk-> attentive prime-> Motor working memory]-> [large, small] numbers-> transfer functions with variable thresholds and slopes-> uniform input-> integrator amplitude-> number of transient sensory signals.
  • image p460fig12.57 The lisTELOS architecture explains and simulates how sequences of saccadic eye movement commands can be stored in a spatial working memory and recalled. Multiple brain regions are needed to coordinate these processes, notably three different basal ganglia loops to regulate saccade storage, choice, and performance, and the supplementary eye fields (SEF) to choose the next saccadic command from a stored sequence. Because all working memories use a similar network design, this model can be used as a prototype for storing and recalling many other kinds of cognitive, spatial, and motor information. See the text for details.
    || lisTELOS model- Spatial working memory (Silver, Grossberg, Bullock, Histed, Miller 2011). Simulates how [PPC, PFC, SEF, FEF, SC] interact with 3 BG loops to learn and perform sequences of saccadic eye movements.
  • image p461fig12.58 The lisTELOS model built upon key processes that were earlier modeled by the TELOS model. See the text for details.
    || TELOS model (Brown, Bullock, Grossberg 1999, 2004). shows [BG nigro-[thalamic, collicular], FEF, ITa, PFC, PNR-THAL, PPC, SEF, SC, V1, V4/ITp, Visual Cortex input] and [GABA].
  • image p462fig12.59 The TELOS model clarifies how reactive vs. planned eye movements may be properly balanced against one another, notably how a fast reactive movement is prevented from occurring in response to onset of a cue that requires a different, and more contextually appropriate, response, even if the latter response takes longer to be chosen and performed. The circuit explains how "the brain knows it before it knows" what this latter response should be by changing the balance of excitation to inhibition in the basal ganglia (BG) so that the reactive gate stays shut until the correct target position can be chosen by a frontal-parietal resonance.
    || Balancing reactive vs. planned movements (Brown, Bullock, Grossberg 2004). (a) shows [FEF, PPC]-> [BG, SC], and BG-> SC. (b) FTE vs time (msec) for [fixation, saccade, overlap, gap, delayed saccade] tasks.
  • image p463fig12.60 Rank-related activity in prefrontal cortex and supplementary eye fields from two different experiments. See the text for details.
    || Rank-related activity in PFC and SEF. Prefrontal cortex (Averbeck etal 2003) [square, inverted triangle]. Supplementary eye field (Isoda, Tanji 2002).
  • image p464fig12.61 (left column) A microstimulating electrode causes a spatial gradient of habituation. (right column) The spatial gradient of habituation that is caused by microstimulation alters the order of saccadic performance of a stored sequence, but not which saccades are performed, using interactions between the prefrontal cortex (PFC) working memory and the supplemental eye field (SEF) saccadic choice.
    || (left) Microstimulation causes habituation (Grossberg 1968). Stimulation caused habituation. Cells close to the stimulation site habituate most strongly. (right) Stimulation biases selection PFC-> SEF-> SEF. PFC Activity gradient in working memory, SEF Microstimulation causes habituation, During selection habituated nodes are less likely to win this competition.
  • image p464fig12.62 The most habituated positions have their neuronal activities most reduced, other things being equal, as illustrated by the gradient from deep habituation (red) to less habituation (pink). The saccadic performance orders (black arrows) consequently tend to end in the most habituated positions that have been stored.
    || The most habituated position is foveated last. For each pair of cues, the cue closest to the stimulation site is most habituated -- and least likely to be selected. Because stimulation spreads in all directions, saccade trajectories tend to converge.
  • image p465fig12.63 Neurophysiological data (left image) and lisTELOS simulation (right image) showing how microstimulation biases saccadic performance order but not the positions to which the saccades will be directed. See the text for details.
    || Saccade trajectories converge to a single location in space. Microstimulation biased selection so saccade trajectories converged toward a single location in space. [Data, model] contra <-> Ipsi (msec)
  • image p467fig12.64 Some of the auditory cortical regions that respond to sustained or transient sounds. See text for details.
    || Some auditory cortical regions. Core <-> belt <-> parabelt. [Belt, Core, ls, PAi, Parabelt, PGa, TAs, TE, TP, TPO, st s].
  • image p468fig12.65 Linguistic properties of the PHONET model and some of the data that it simulates. The upper left image summarizes the asymmetric transient-to-sustained gain control that helps to create invariant intraword ratios during variable-rate speech. The lower left image summarizes the rate-dependent gain control of the ARTPHONE model that creates rate-invariant working memory representations in response to sequences of variable-rate speech. The right image summarizes the kind of paradoxical VC-CV category boundary data of (Repp 1980) that ARTPHONE simulates. See the text for details.
    || (left upper) [transient, sustained] [working memory, filter, category]. (left lower) phone inputs-> [input rate estimate, features], Features w <- habituative transmitter gates -> categories-> rate invariant phonetic output, input rate estimate-> gain control-> [features, categories] rate-dependent integration of categories and features. (right) % 2-stop vs VC-CV silent interval (msec): [ib-ga, ib-ba, iga, iba].
  • image p469fig12.66 (left column) A schematic of how preserving relative duration, as in the first and third images, of consonant and vowel pairs can preserve a percept, in this case of /ba/, but not doing so, as in the first and second images, can cause a change in percept, as from /ba/ to /wa/, as in the data of (Miller, Liberman 1979) that PHONET simulates. (right column) Changing frequency extent can also cause a /ba/ - /wa/ transition, as shown in data of (Schwab, Sawusch, Nusbaum 1981) that PHONET also simulates.
    || (left image) Maintaining relative duration as speech speeds up preserves percept (Miller, Liberman 1979). frequency vs time- [/ba/, /wa/, /ba/] (right image) Changing frequency extent causes /ba/-/wa/ transition (Schwab, Sawusch, Nusbaum 1981). frequency vs time- [/ba/, /wa/] Dt extent.
  • image p469fig12.67 PHONET contains transient and sustained cells that respond to different kinds of sounds, notably the transients of certain consonants and the sustained sounds of certain vowels. It then uses the transient working memory to gain-control the integration rate of the sustained working memory to which these different detectors input.
    || Phonetic model summary. (left) Acoustic tokens [consonant, vowel]. (middle) Acoustic detectors [transient (sensitive to rate), Sustained (sensitive to duration)]. (right) Working memory, Spatially stored transient pattern (extent) + gain control-> spatially stored sustained pattern.
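    A toy illustration, with all signals and constants invented, of the gain-control idea in this figure: the transient channel responds more strongly at faster speech rates and multiplies the integration rate of the sustained channel, so the stored vowel trace comes out roughly rate-invariant.
      # Toy PHONET-style gain control: faster speech shortens the vowel but also
      # raises the transient channel's output, which speeds up integration in the
      # sustained channel, leaving the stored trace roughly unchanged.
      def stored_vowel_trace(rate):
          dt = 0.001
          vowel_duration = 0.20 / rate        # faster speech -> shorter vowel
          transient_gain = rate               # caricature: transient output grows with rate
          x, t = 0.0, 0.0
          while t < vowel_duration:
              x += dt * transient_gain * 1.0  # sustained channel integrates the vowel input
              t += dt                         # at a rate set by the transient channel
          return round(x, 3)

      print(stored_vowel_trace(1.0), stored_vowel_trace(2.0))  # ~0.2 in both cases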
  • image p471fig12.68 A mismatch reset of /b/ in response to the /g/ in [ib]-[ga] can rapidly shut off the [ib] percept, leading to the percept of [ga] after an interval of silence. In contrast, resonant fusion of the two occurrences of /b/ in [ib]-[ba] can cause a continuous percept of sound [iba] to occur during times at which silence is heard in response to [ib]-[ga].
    || Mismatch vs resonant fusion
  • image p473fig12.69 Error rate and mean reaction time (RT) data from the lexical decision experiments of (Schvaneveldt, McDonald 1981). ART Matching Rule properties explain these data in (Grossberg, Stone 1986).
    || (left) Error rate vs type of prime [R, N, U], [non,] word. (right) Mean RT (msec) vs type of prime [R, N, U], [non,] word.
  • image p474fig12.70 The kind of model macrocircuit that was used in (Grossberg, Stone 1986) to explain lexical decision task data.
    || inputs-> A1 <-> A2 iconic sensory features <-> A3 item and order in sensory STM <-> A4 list parsing in STM (masking field) <-> A5 semantic network (self-feedback). [A4, A5] <-> V* visual object recognition system. M1-> [outputs, A1]. M1 <-> M2 iconic motor features <-> M3 item and order in motor STM. A2-> M2. A3-> M3.
  • image p476fig12.71 Word frequency data of (Underwood, Freund 1970) that were explained in (Grossberg, Stone 1986).
    || percent errors vs frequency of old words [L-H to H-H, L-L to H-L].
  • image p481fig13.01 Macrocircuit of the functional stages and anatomical interpretations of the Cognitive-Emotional-Motor, or CogEM, model.
    || Drive-> hypothalamus value categories <-> amygdala incentive motivational learning-> Orbitofrontal cortex- object-value categories <-> sensory cortex- invariant object categories- conditioned reinforcer learning-> amygdala-> hypothalamus.
  • image p483fig13.02 The object-value categories in the orbitofrontal cortex require converging specific inputs from the sensory cortex and nonspecific incentive motivational inputs from the amygdala in order to fire. When the orbitofrontal cortex fires, it can deliver top-down ART Matching Rule priming signals to the sensory cortical area by which it was activated, thereby helping to choose the active recognition categories there that have the most emotional support, while suppressing others, leading to attentional blocking of irrelevant cues.
    || Cognitive-Emotional-Motor (CogEM) model. Drive-> amygdala incentive motivational learning-> orbitofrontal cortex- needs converging cue and incentive inputs to fire <-> sensory cortex- conditioned reinforcer learning-> amygdala. CS-> sensory cortex. Motivated attention closes the cognitive-emotional feedback loop, focuses on relevant cues, and causes blocking of irrelevant cues.
  • image p483fig13.03 The predicted processing stages of CogEM have been supported by anatomical studies of connections between sensory cortices, amygdala, and orbitofrontal cortex.
    || Adapted from (Barbas 1995). sensory cortices = [visual, somatosensory, auditory, gustatory, olfactory]. sensory cortices-> amygdala-> orbital prefrontal cortex. sensory cortices-> orbital prefrontal cortex. [visual cortex, amygdala]-> lateral prefrontal cortex.
  • image p484fig13.04 The top-down feedback from the orbitofrontal cortex closes a feedback loop that supports a cognitive-emotional resonance. If this resonance can be sustained long enough, it enables us to have feelings at the same time that we experience the categories that caused them.
    || Cognitive-Emotional resonance. Basis of "core consciousness" and "the feeling of what happens". (Damasio 1999) derives heuristic version of CogEM model from his clinical data. Drive-> amygdala-> prefrontal cortex-> sensory cortex, resonance around the latter 3. How is this resonance maintained long enough to become conscious?
  • image p484fig13.05 Classical conditioning is perhaps the simplest kind of associative learning.
    || Classical conditioning (nonstationary prediction). Bell (CS)-> (CR), Shock (US)-> Fear (UR), associative learning.
  • image p485fig13.06 (left column) An inverted-U occurs in conditioned reinforcer strength as a function of the ISI between the CS and the US. Why is learning attenuated at 0 ISI? (right column) Some classical conditioning data that illustrate the inverted-U in conditioning as a function of the ISI.
    || InterStimulus Interval (ISI) effect. Data from (Smith etal 1969; Schneiderman, Gormezano 1964).
  • image p485fig13.07 The paradigm of secondary conditioning. See the text for details.
    || Secondary conditioning (Advertising!). [CS1, CS2] become conditioned reinforcers.
  • image p486fig13.08 The blocking paradigm illustrates how cues that do not predict different consequences may fail to be attended.
    || Blocking- minimal adaptive prediction. Phase [I, II] - CS2 is irrelevant.
  • image p486fig13.09 Equally salient cues can be conditioned in parallel to an emotional consequence.
    || Parallel processing of equally salient cues vs overshadowing (Pavlov).
  • image p486fig13.10 Blocking follows if both secondary conditioning and attenuation of conditioning at a zero ISI occur.
    || Blocking = ISI + secondary conditioning.
  • image p487fig13.11 The three main properties of CogEM that help to explain how attentional blocking occurs.
    || CogEM explanation of attentional blocking. Internal drive input <-> Conditioned reinforcer learning (self-recurrent) <-> Competition for STM <- Motor learning. 1. Sensory representations compete for limited capacity STM. 2. Previously reinforced cues amplify their STM via positive feedback. 3. Other cues lose STM via competition.
  • image p488fig13.12 (left column) How incentive motivational feedback amplifies activity of a sensory cortical cell population. (right column) A sensory cortical cell population whose activity is amplified by incentive motivational feedback can suppress the activities of less activated populations via self-normalizing recurrent competitive interactions.
    || Motivational feedback and blocking. (left) sensory input CS, STM activity without motivational feedback, STM activity with motivational feedback. (right) STM suppressed by competition, STM amplified by (+) feedback.
  • image p489fig13.13 (top row) If a positive ISI separates onset of a CS and US, then the CS can sample the consequences of the US during the time interval before it is inhibited by it. (bottom row) A CogEM simulation of the inverted-U in conditioning as a function of the ISI between CS and US.
    || Positive ISI and conditioning.
  • image p490fig13.14 In order for conditioning to work properly, the sensory representation needs to have at least two successive processing stages. See the text for why.
    || Model of Cognitive-Emotional circuit. Drive-> Drive representation-> ??? <-> Sensory STM <-CS
  • image p490fig13.15 The CogEM circuit is an ancient design that is found even in mollusks like Aplysia. See the text for details.
    || Aplysia (Buonomano, Baxter, Byrne, Neural Networks 1990; Grossberg, Behavioral and Brain Sciences 1983). Facilitator neuron ~ drive representation.
  • image p492fig13.16 (left column) In order to satisfy all four postulates, there needs to be UCS-activated arousal of a polyvalent CS-activated sampling neuron. (right column) The arousal needs to be nonspecific in order to activate any of the CSs that could be paired with the UCS.
    || Polyvalent CS sampling and US-activated nonspecific arousal.
  • image p493fig13.17 (top row) Overcoming the ostensible contradiction that seems to occur when attempting to simultaneously realize hypotheses (3) and (4). (bottom row) The problem is overcome by assuming the existence of a US-activated drive representation to which CSs can be associated, and that activates nonspecific incentive motivational feedback to sensory representations.
    || Learning nonspecific arousal and CR read-out. (top) Learning to control nonspecific arousal, Learning to read-out the CR (bottom) Drive representation, Incentive motivation.
  • image p494fig13.18 Realizing the above constraints favors one particular circuit. Circuits (a) and (b) are impossible. Circuit (d) allows previously occurring sensory cues to be stored in STM. Circuit (e) in addition enables a CS to be stored in STM without initiating conditioning in the absence of a US.
    || Learning to control nonspecific arousal and read-out of the CR: two stages of CS. (d) & (e) polyvalent cells.
  • image p494fig13.19 (left column, top row) Secondary conditioning of both arousal and a specific response is now possible. (bottom row) The CogEM circuit may be naturally extended to include multiple drive representations and inputs. (right column, top row) The incentive motivational pathway is also conditionable in order to enable motivational sets to be learned.
    || Secondary conditioning. Homology: conditionable incentive motivation. Multiple drive representations and inputs.
  • image p496fig13.20 (top image) A single avalanche sampling cell can learn an arbitrary space-time pattern by sampling it as a temporally ordered series of spatial patterns using a series of outstars. Once an avalanche's sampling cell starts to fire, there is no way to stop it from performing the entire space-time pattern, no matter how dire the consequences. (bottom image) If nonspecific arousal and a specific cue input are both needed to fire the next cell in an avalanche, then environmental feedback can shut off avalanche performance at any time, and volition can speed up or slow down performance.
    || Space-time pattern learning: avalanche. (top image) CS sampling signal-> serially activated outstars-> US spacetime input pattern. Sample a space-time pattern as a sequence of spatial patterns. (bottom image) Nonspecific arousal as a command cell. Polyvalent cell: nonspecific arousal as a STOP and a GO signal.
  • image p497fig13.21 (left column) An early embodiment of nonspecific arousal was a command cell in such primitive animals as crayfish. (right column) The songbird pattern generator is also an avalanche. This kind of circuit raises the question of how the connections self-organize through developmental learning.
    || Nonspecific arousal as a command cell. Crayfish swimmerets (Stein 1971). Songbird pattern generator (Fee etal 2002)+. Motor-> RA-> HVC(RA).
  • image p498fig13.22 (left column, top row) Adaptive filtering and conditioned arousal are both needed to regulate what cues can learn to activate particular space-time patterns. These developments lead inexorably to basic cognitive abilities, as embodied in the 3D LAMINART models for 3D vision and figure-ground perception (Chapter 11) and the 3D ARTSCAN SEARCH model for invariant object learning, recognition, and 3D search (Chapter 6). (right column, top row) Conditioned arousal enables only emotionally important cues to activate a motivationally relevant space-time pattern. (bottom row) Conditioned arousal and drive representations arise naturally from the unlumping of avalanche circuits to make them selective to motivationally important cues. The MOTIVATOR model is a natural outcome of this unlumping process (this chapter).
    || (top) Adaptive filtering and Conditioned arousal. Towards Cognition: need to filter inputs to the command cell. Towards Emotion: important signals turn arousal ON and OFF. (bottom) Conditioned arousal and Drive representations. Competition between conditioned arousal sources at drive representations, eg amygdala.
  • image p499fig13.23 (left column) Self-organization in avalanches includes adaptive filtering by outstars [?instars?], serial learning of temporal order, and learned read-out of spatial patterns by outstars. (right column) Serial learning of temporal order occurs in recurrent associative networks.
    || (left) Self-organizing avalanches [instars, serial learning, outstars]. (right) Serial list learning.
  • image p500fig13.24 Both primary excitatory and inhibitory conditioning can occur using opponent processes and their antagonistic rebounds.
    || Opponent processing. Cognitive drive associations. Primary associations: excitatory [CS, US, Fear], inhibitory [CS, US, Fear, Relief rebound].
  • image p501fig13.25 When an unbiased transducer is embodied by a finite rate physical process, mass action by a chemical transmitter is the result.
    || Unbiased transducer (Grossberg 1968). S = input, T = output, T = S*B, where B is the gain. Suppose T is due to release of chemical transmitter y at a synapse: release rate T = S*y (mass action); Accumulation y ~= B.
  • image p501fig13.26 A simple differential equation describes the processes of transmitter accumulation and release that do their best, at a finite rate, to carry out unbiased transduction.
    || Transmitter accumulation and release. Transmitter y cannot be restored at an infinite rate: T = S*y, y ~= B. Differential equation: d[dt: y] = A*(B - y) - S*y = accumulate - release. Transmitter y tries to recover to ensure unbiased transduction. What if it falls behind? Evolution has exploited the good properties that happen then.
  • image p502fig13.27 Despite the fact that less transmitter y is available after persistent activation by a larger input signal S, the gated output signal S*y is larger due to the mass action gating of S by y.
    || Minor mathematical miracle. At equilibrium: 0 = d[dt: y] = A*(B - y) - S*y. Transmitter y decreases when input S increases: y = A*B/(A + S). However, output S*y increases with S!: S*y = S*A*B/(A + S) (gate, mass action).
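    A quick numeric check of the two equilibrium formulas above, with A and B set arbitrarily to 1: the transmitter y falls as S grows, yet the gated output S*y still increases toward A*B.
      # Equilibrium transmitter y = A*B/(A+S) decreases with S, but the gated
      # output S*y = A*B*S/(A+S) increases with S (saturating at A*B).
      A, B = 1.0, 1.0
      for S in [0.5, 1.0, 2.0, 4.0]:
          y = A * B / (A + S)
          print(S, round(y, 3), round(S * y, 3))   # y falls, S*y rises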
  • image p502fig13.28 Fast increments and decrements in an input S lead to slow habituation of the habituative gate, or medium-term memory, transmitter y. The output T is a product of these fast and slow variables, and consequently exhibits overshoots, habituation, and undershoots in its response.
    || Habituative transmitter gate: Input; Habituative gate d[dt: y] = A*(B - y) - S*y; Output [overshoot, habituation, undershoot]s Weber Law.
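    A simulation sketch of the single habituative gate described above, with parameters chosen only for illustration: a step increment of the input S produces an overshoot in the gated output T = S*y that habituates, and the later decrement produces an undershoot, because y changes slowly.
      # Integrate d[dt: y] = A*(B - y) - S*y for a step increment and decrement of S,
      # and watch the gated output T = S*y overshoot, habituate, and undershoot.
      A, B, dt = 1.0, 1.0, 0.01
      S = 0.5
      y = A * B / (A + S)                          # start equilibrated to the baseline input
      for step in range(3000):
          t = step * dt
          S = 2.0 if 10.0 <= t < 20.0 else 0.5     # fast increment, then decrement, of S
          y += dt * (A * (B - y) - S * y)          # slow habituation / recovery of y
          T = S * y                                # gated output
          if step % 500 == 0:
              print(round(t, 1), round(y, 3), round(T, 3))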
  • image p503fig13.29 The ON response to a phasic ON input has Weber Law properties due to the divisive terms in its equilibrium response, which are due to the habituative transmitter.
    || ON-response to phasic ON-input. S1 = f(I+J): y1 = A*B/(A+S1), T1 = S1*y1 = A*B*S1/(A+S1); S2 = f(I): y2 = A*B/(A+S2), T2 = S2*y2 = A*B*S2/(A+S2);. ON = T1 - T2 = A^2*B*(f(I+J) - f(I)) / ((A+f(I))*(A+f(I+J))). Note Weber Law. When f has a threshold, small I requires larger J to fire due to numerator, but makes suprathreshold ON bigger due to denominator. When I is large, quadratic in denominator and upper bound of f make ON small.
  • image p504fig13.30 OFF rebound occurs when the ON-input shuts off due to the imbalance that is caused by the ON input in the habituation of the transmitters in the ON and OFF channels. The relative sizes of ON responses and OFF rebounds is determined by the arousal level I.
    || OFF-rebound due to phasic input offset. Shut off J (Not I!). Then: S1 = f(I), S2 = f(I); y1 ~= A*B/(A+f(I+J)) < y2 ~= A*B/(A+f(I)); y1 and y2 are SLOW; T1 = S1*y1, T2 = S2*y2, T1 < T2;. OFF = T2 - T1 = A*B*f(I)*(f(I+J) - f(I)) / ((A+f(I))*(A+f(I+J))). Note Weber Law due to remembered previous input. Arousal sets sensitivity of rebound: OFF/ON = f(I)/A. Why is the rebound transient? Note equal f(I) inputs.
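    A numeric check of the ON and OFF formulas above, using a simple linear signal f(w) = w and arbitrary constants: the ON response shows the Weber-Law dependence on I, and the rebound-to-ON ratio equals f(I)/A.
      # Gated-dipole check: ON response at equilibrium, and the OFF rebound computed
      # with the transmitters still reflecting the old inputs I+J (ON channel) and I.
      A, B = 1.0, 1.0
      def f(w): return w                      # simple linear signal for the check

      def on_response(I, J):
          S1, S2 = f(I + J), f(I)
          y1, y2 = A * B / (A + S1), A * B / (A + S2)
          return S1 * y1 - S2 * y2

      def off_rebound(I, J):
          y1, y2 = A * B / (A + f(I + J)), A * B / (A + f(I))   # slow transmitters
          return f(I) * y2 - f(I) * y1        # both channels now receive only f(I)

      for I in (0.5, 2.0):
          on, off = on_response(I, 1.0), off_rebound(I, 1.0)
          print(I, round(on, 3), round(off, 3), round(off / on, 3))   # off/on = f(I)/A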
  • image p504fig13.31 Behavioral contrast can occur during reinforcement learning due to decreases in either positive or negative reinforcers. See Figure 13.32 for illustrative operant conditioning data.
    || Behavioral contrast: rebounds! Shock level vs trials. 1. A sudden decrease in frequency or amount of food can act as a negative reinforcer: Frustration. 2. A sudden decrease in frequency or amount of shock can act as a positive reinforcer: Relief.
  • image p505fig13.32 Response suppression and the subsequent antagonist rebounds are both calibrated by the inducing shock levels.
    || Behavioral contrast (Reynolds 1968). Responses per minute (VI schedule) vs Trial shock level.
  • image p505fig13.33 An unexpected event can disconfirm ongoing processing by triggering a burst of nonspecific arousal that causes antagonistic rebounds in currently active gated dipoles, whether cognitive or affective.
    || Novelty reset: rebound to arousal onset. 1. Equilibrate to I and J: S1 = f(I+J); y1 = A*B/(A+S1); S2 = f(I); y2 = A*B/(A+S2);. 2. Keep phasic input J fixed; increase arousal I to I* = I + ∆I: (a) OFF reaction if T1 < T2; OFF = T2 - T1 = f(I*)*y2 - f(I*+J)*y1 = A*B*{ A*(f(I*) - f(I*+J)) + (f(I*)*f(I+J) - f(I)*f(I*+J)) } / ((A+f(I))*(A+f(I+J))). 3. How to interpret this complicated equation?
  • image p506fig13.34 With a linear signal function, one can prove that the rebound increases with both the previous phasic input intensity J and the unexpectedness of the disconfirming event that caused the burst of nonspecific arousal.
    || Novelty reset: rebound to arousal onset.
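    Substituting a linear signal f(w) = w into the formulas above gives OFF = A*B*J*(∆I - A)/((A+I)*(A+I+J)). The small check below (arbitrary constants) shows that the rebound grows with both the prior phasic input J and the arousal increment ∆I, and is positive only when ∆I exceeds A.
      # Novelty-reset rebound with a linear signal: grows with J and with the
      # arousal increment dI, and is positive only when dI exceeds A.
      A, B = 1.0, 1.0

      def rebound(I, J, dI):
          y1 = A * B / (A + I + J)           # ON-channel transmitter, equilibrated to I + J
          y2 = A * B / (A + I)               # OFF-channel transmitter, equilibrated to I
          T1 = (I + dI + J) * y1             # ON output after the arousal burst
          T2 = (I + dI) * y2                 # OFF output after the arousal burst
          return T2 - T1                     # equals A*B*J*(dI - A)/((A+I)*(A+I+J))

      for J in (0.5, 1.0, 2.0):
          for dI in (0.5, 2.0, 4.0):
              print(J, dI, round(rebound(1.0, J, dI), 3))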
  • image p506fig13.35 A shock, or other reinforcing event, can have multiple cognitive and emotional effects on different brain processes.
    || Multiple functional roles of shock. 1. Reinforcement sign reversal: An isolated shock is a negative reinforcer; In certain contexts, a shock can be a positive reinforcer. 2. STM-LTM interaction: Prior shock levels need to be remembered (LTM) and used to calibrate the effect of the present shock (STM). 3. Discriminative and situational cues: The present shock level is unexpected (novel) with respect to the shock levels that have previously been contingent upon experimental cues: shock as a [1.reinforcer, 2. sensory cue, 3. expectancy].
  • image p509fig13.36 How can life-long learning occur without passive forgetting or associative saturation?
    || Associative learning. 1. Forgetting (eg remember childhood experiences): forgetting [is NOT passive, is Selective]; 2. Selective: larger memory capacity; 3. Problem: why doesn't memory saturate?
  • image p510fig13.37 A disconfirmed expectation can cause an antagonistic rebound that inhibits prior incentive motivational feedback, but by itself is insufficient to prevent associative saturation.
    || Learn on-response. 1. CS-> ON, disconfirmed expectation-> antagonistic rebound, OFF-channel is conditioned 2. CS-> [ON, OFF]-> net, zero net output. What about associative saturation?
  • image p510fig13.38 Dissociation of the read-out of previously learned adaptive weights, or LTM traces, and of the read-in of new weight values enables back-propagating dendritic action potentials to teach the new adaptive weight values.
    || Dissociation of LTM read-out and read-in. Backpropagating dendritic action potentials as teaching signals. 1. LTM Dendritic spines (Rall 1960's)-> Teaching signal - retrograde action potential-> opponent competition. 2. Early predictions: Ca++ currents in learning (Grossberg 1968); role of dendritic spines in learning (Grossberg 1975). Cf experiments of (Hausser, Markram, Poo, Sakmann, Spruston, etc).
  • image p510fig13.39 Shunting competition and informational noise suppression in affective gated dipoles, plus back-propagating action potentials for teaching signals, enable the net normalized adaptive weights to be learned. They never saturate!
    || Learn net dipole output pattern. Opponent "decision" controls learning. Cf. competitive learning. Learning signal, opponent extinction.
  • image p512fig13.40 A conditioning paradigm that illustrates what it means for conditioned excitators to extinguish.
    || Conditioned excitor extinguishes. 1. Learning phase: CS1 bell-> US, CS1-> Fear(-). 2. Forgetting phase: CS1 bell-> Forgetting. 3. The expectation of shock is disconfirmed.
  • image p513fig13.41 A conditioning paradigm that illustrates what it means for conditioned inhibitors not to extinguish.
    || Conditioned inhibitor does not extinguish. 1. Learning phase: CS1 light-> shock, CS1-> Fear(-); Forgetting phase: n/a;. 2. Learning phase: CS1 + CS2 bell-> no shock; CS2-> relief;. Forgetting phase: CS2 bell-> no forgetting. SAME CS could be used! SAME "teacher" in forgetting phase! Something else must be going on, or else causality would be violated!
  • image p513fig13.42 A conditioned excitor extinguishes because the expectation that was learned of a shock during the learning phase is disconfirmed during the forgetting phase.
    || Conditioned excitor extinguishes. Learning phase: CS1 bell-> US; CS1-> Fear(-); CS1-> shock; CS1 is conditioned to an expectation of shock. Forgetting phase: CS1 bell-> forgetting;. The expectation of shock is disconfirmed.
  • image p513fig13.43 A conditioned inhibitor does not extinguish because the expectation that was learned of no shock during the learning phase is not disconfirmed during the forgetting phase.
    || Conditioned inhibitor does not extinguish. 1. Learning phase: CS1 light-> Shock; CS1-> Fear(-);. Forgetting phase: n/a;. 2. Learning phase: CS1 + CS2 bell-> NO shock; CS2-> relief(+); CS2-> no shock;. Forgetting phase: CS2 bell-> no forgetting;. The expectation that "no shock" follows CS2 is NOT disconfirmed!
  • image p514fig13.44 Analog of the CogEM model in Figure 6.1 of (Damasio 1999).
    || (a) map of object X-> map of proto-self at inaugural instant-> [, map of proto-self modified]-> assembly of second-order map. (b) map of object X enhanced-> second-order map imaged.
  • image p519fig14.01 Coronal sections of prefrontal cortex. Note particularly the areas 11, 13, 14, and 12o.
    ||
  • image p520fig14.02 Macrocircuit of the main brain regions, and connections between them, that are modelled in the unified predictive Adaptive Resonance Theory (pART) of cognitive-emotional and working memory dynamics. Abbreviations in red denote brain regions used in cognitive-emotional dynamics. Those in green denote brain regions used in working memory dynamics. Black abbreviations denote brain regions that carry out visual perception, learning and recognition of visual object categories, and motion perception, spatial representation and target tracking. Arrows denote non-adaptive excitatory synapses. Hemidiscs denote adaptive excitatory synapses. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown. Also not shown are output signals from cortical areas to motor responses. V1: striate, or primary, visual cortex; V2 and V4: areas of prestriate visual cortex; MT: Middle Temporal cortex; MST: Medial Superior Temporal area; ITp: posterior InferoTemporal cortex; ITa: anterior InferoTemporal cortex; PPC: Posterior Parietal Cortex; LIP: Lateral IntraParietal area; VPA: Ventral PreArcuate gyrus; FEF: Frontal Eye Fields; PHC: ParaHippocampal Cortex; DLPFC: DorsoLateral PreFrontal Cortex; HIPPO: hippocampus; LH: Lateral Hypothalamus; BG: Basal Ganglia; AMYG: AMYGdala; OFC: OrbitoFrontal Cortex; PRC: PeriRhinal Cortex; VPS: Ventral bank of the Principal Sulcus; VLPFC: VentroLateral PreFrontal Cortex. See the text for further details.
    ||
  • image p523fig14.03 (a) The MOTIVATOR neural model generalizes CogEM by also including the basal ganglia. It can hereby explain and simulate complementary functions of the amygdala and basal ganglia (SNc) during conditioning and learned performance. The basal ganglia generate Now Print signals in response to unexpected rewards. These signals modulate learning of new associations in many brain regions. The amygdala supports motivated attention to trigger actions that are expected to occur in response to conditioned or unconditioned stimuli. Object Categories represent visual or gustatory inputs in anterior inferotemporal (ITA) and rhinal (RHIN) cortices, respectively. Value Categories represent the value of anticipated outcomes on the basis of hunger and satiety inputs, in amygdala (AMYG) and lateral hypothalamus (LH). Object-Value Categories resolve the value of competing perceptual stimuli in medial (MORB) and lateral (ORB) orbitofrontal cortex. The Reward Expectation Filter detects the omission or delivery of rewards using a circuit that spans ventral striatum (VS), ventral pallidum (VP), striosomal delay (SD) cells in the ventral striatum, the pedunculopontine nucleus (PPTN) and midbrain dopaminergic neurons of the substantia nigra pars compacta/ventral tegmental area (SNc/VTA). The circuit that processes CS-related visual information (ITA, AMYG, ORB) operates in parallel with a circuit that processes US-related visual and gustatory information (RHIN, AMYG, MORB). (b) Reciprocal adaptive connections between hypothalamus and amygdala enable amygdala cells to become learned value categories. The bottom region represents hypothalamic cells, which receive converging taste and metabolite inputs whereby they become taste-drive cells. Bottom-up signals from activity patterns across these cells activate competing value categories, or US Value Representations, in the amygdala. A winning value category learns to respond selectively to specific combinations of taste-drive activity patterns and sends adaptive top-down priming signals back to the taste-drive cells that activated it. CS-activated conditioned reinforcer signals are also associatively linked to value categories. Adaptive connections end in (approximately) hemidiscs. See the text for details.
    ||
  • image p524fig14.04 (a) Model basal ganglia circuit for the control of dopaminergic Now Print signals from the substantia nigra pars compacta, or SNc, in response to unexpected rewards. Cortical inputs (Ii), activated by conditioned stimuli, learn to excite the SNc via a multi-stage pathway from the ventral striatum (S) to the ventral pallidum and then on to the PPTN (P) and the SNc (D). The inputs Ii excite the ventral striatum via adaptive weights W_IS, and the ventral striatum excites the SNc with strength W_PD. The striosomes, which contain an adaptive spectral timing mechanism [xij, Gij, Yij, Zij], learn to generate adaptively timed signals that inhibit reward-related activation of the SNc. Primary reward signals (I_R) from the lateral hypothalamus both excite the PPTN directly (with strength W_RP) and act as training signals to the ventral striatum S (with strength W_RS) that trains the weights W_IS. Arrowheads denote excitatory pathways, circles denote inhibitory pathways, and hemidiscs denote synapses at which learning occurs. Thick pathways denote dopaminergic signals.
    ||
  • image p530fig14.05 Displays used by (Buschman, Miller 2007) in their visual search experiments. See the text for details.
    || Fixation 500 ms-> Sample 1000 ms-> Delay 500 ms-> Visual [pop-out, search]- reaction time.
  • image p531fig14.06 Classification of scenic properties as texture categories by the ARTSCENE model. See the text for details.
    || Image-> Feature extraction (texture principal component rankings)-> Learning feature-to-scene mapping (texture category principal component rankings) <- scene class. Large-to-small attentional shrouds as the principal component rank gets higher.
  • image p531fig14.07 Voting in the ARTSCENE model achieves even better prediction of scene type. See the text for details.
    || Image-> Feature extraction (texture principal component rankings)-> Learning feature-to-scene mapping (texture category principal component rankings)-> evidence accumulation (sum)-> scene class winner-take-all inference. Large-to-small attentional shrouds as the principal component rank gets higher.
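    A toy version, with made-up numbers and class names, of the voting stage just described: texture-category evidence from successive attentional shrouds is summed per candidate scene class, and the scene label is chosen by winner-take-all.
      # ARTSCENE-style voting caricature: accumulate per-shroud evidence, then
      # read out the scene class with the largest accumulated evidence.
      from collections import Counter

      shroud_votes = [                        # evidence from three successive shrouds
          {"beach": 0.6, "forest": 0.3, "street": 0.1},
          {"beach": 0.4, "forest": 0.4, "street": 0.2},
          {"beach": 0.5, "forest": 0.2, "street": 0.3},
      ]

      evidence = Counter()
      for votes in shroud_votes:
          evidence.update(votes)              # evidence accumulation (sum)

      print(evidence.most_common(1)[0][0])    # winner-take-all -> 'beach'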
  • image p532fig14.08 Macrocircuit of the ARTSCENE Search neural model for learning to search for desired objects by using the sequences of already experienced objects and their locations to predict what and where the desired object is. V1 = First visual area or primary visual cortex; V2 = Second visual area; V4 = Fourth visual area; PPC = Posterior Parietal Cortex; ITp = posterior InferoTemporal cortex; ITa = anterior InferoTemporal cortex; MTL = Medial Temporal Lobe; PHC = ParaHippoCampal cortex; PRC = PeriRhinal Cortex; PFC = PreFrontal Cortex; DLPFC = DorsoLateral PreFrontal Cortex; VPFC = Ventral PFC; SC = Superior Colliculus.
    ||
  • image p533fig14.09 Search data and ARTSCENE Search simulations of them in each pair of images from (A) to (F). See the text for details.
    || 6*[data vs simulation], [Response time (ms) versus epoch].
  • image p540fig15.01 The timing of CS and US inputs in the delay and trace conditioning paradigms.
    || Delay and trace conditioning paradigms. [CS, US] vs [Delay, Trace]. To perform an adaptively timed CR, trace conditioning requires a CS memory trace over the Inter-Stimulus Interval (ISI).
  • image p541fig15.02 The neurotrophic Spectrally Timed Adaptive Resonance Theory, or nSTART, model of (Franklin, Grossberg 2017) includes hippocampus to enable adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between CS and US.
    || Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. [CS, US] -> Sensory Cortex (SC) <- motivational attention <-> category learning -> Prefrontal Cortex (PFC). SC conditioned reinforcement learning-> Amygdala (cannot bridge the temporal gap) incentive motivational learning-> PFC. SC adaptively timed learning and BDNF-> Hippocampus (can bridge the temporal gap) BDNF-> PFC. PFC adaptively timed motor learning-> cerebellum.
  • image p541fig15.03 Stages in the processing of adaptively timed conditioning, leading to timed responses in (d) that exhibit both individual Weber laws and an inverted U in conditioning as a function of ISI. See the text for details.
    || Curves of [Response vs ISI].
  • image p542fig15.04 Conditioning data from (Smith 1968; Millenson etal 1977). The former shows the kind of Weber Law and inverted U that were simulated in Figure 15.3. The latter shows that, if there are two ISIs during an experiment, then the animals learn to adaptively time their responses with two properly scaled Weber laws.
    || (left) One ISI (Smith 1968) [mean membrane extension (mm) versus time after CS onset (msec)]. (right) Two ISIs (Millenson etal 1977) [200, 100] msec CS test trials, [mean momentary CS amplitude (mm) vs time after CS onset (msec)]. (bottom) Conditioned eye blinks, made with nictitating membrane and/or eyelid, are adaptively timed: peak closure occurs at expected time(s) of arrival of the US following the CS and obeys a Weber Law.
  • image p543fig15.05 Simulation of conditioning with two ISIs that generate their own Weber Laws, as in the data shown in Figure 15.4.
    || Learning with two ISIs: simulation: R = sum[all: f(xi)*yi*xi] vs msec. Each peak obeys Weber Law! strong evidence for spectral learning.
  • image p543fig15.06 The circuit between dentate granule cells and CA1 hippocampal pyramid cells seems to compute spectrally timed responses. See the text for details.
    || Hippocampal interpretation. 1. Dentate granule cells (Berger, Berry, Thompson 1986): "increasing firing...in the CS period...the latency...was constant". 2. Pyramidal cells: "Temporal model" Dentate granule cells-> CA3 pyramids. 3. Convergence (Squire etal 1989): 1e6 granule cells, 1.6e5 CA3 pyramids. 80-to-1 (ri).
  • image p544fig15.07 In response to a step CS and sustained storage by I_CS of that input, a spectrum of responses xi at different rates ri develops through time.
    || Spectral timing: activation. CS-> I_CS-> All xi. STM sensory representation. Spectral activation d[dt: xi] = ri*[-A*xi + (1 - B*xi)*I_CS].
  • image p544fig15.08 The spectral activities xi generate sigmoid signals f(xi) before the signals are, in turn, gated by habituative transmitters yi.
    || Habituative transmitter gate. transmitter.
  • image p544fig15.09 As always, the habituative transmitter gate yi increases in response to accumulation and decreases due to gated inactivation, leading to the kinds of transmitter and output responses in the right hand column.
    || Habituative transmitter gate (Grossberg 1968). 1. d[dt: yi] = C*(1-yi) - D*f(xi)*yi, C-term - accumulation, D-term - gated inactivation. 2. Sigmoid signal f(xi) = xi^n / (B^n + xi^n). 3. Gated output signal f(xi)*yi.
  • image p545fig15.10 When the activity spectrum xi generates a spectrum of sigmoidal signals f(xi), the corresponding transmitters habituate at different rates. The output signals f(xi)*yi therefore generate a series of unimodal activity profiles that peak at different times, as in Figure 15.3a.
    || A timed spectrum of sampling intervals. [f(xi) activation, yi habituation, f(xi)*yi gated sampling] spectra. gated = sampling intervals.
  • image p545fig15.11 The adaptive weight, or LTM trace , zi learns from the US input I_US at times when the sampling signal f(xi)*yi is on. It then gates the habituative sampling signal f(xi)*yi to generate a doubly gated response f(xi)*yi*zi.
    || Associative learning, gated steepest descent learning (Grossberg 1969). d[dt: zi] = E*f(xi)*yi*[-zi + I_US], E-term read-out of CS gated signal, []-term read-out of US. Output from each population: f(xi)*yi*zi doubly gated signal.
  • image p546fig15.12 The adaptive weights zi in the spectrum whose sampling signals are large when the US occurs learn fastest, as illustrated by the green region in this simulation of (Grossberg, Schmajuk 1989).
    || Computer simulation of spectral learning. (left) fast (right) slow. Constant ISI: 6 cells fast to slow, 4 learning trials, 1 test trial.
  • image p546fig15.13 The total learned response is a sum R of all the doubly gated signals in the spectrum.
    || Adaptive timing is a population property. Total output signal: R = sum[i: f(xi)*yi*zi]. Adaptive timing is a collective property of the circuit. "Random" spectrum of rates achieves good collective timing.
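  • Howell code sketch (not from the book): the spectral timing equations transcribed in Figures 15.7-15.11 and 15.13 can be integrated directly. Below is a minimal Python sketch; the parameter values, the rate spectrum ri, and the 50 ms US pulse are arbitrary illustrative choices, and the sigmoid constant is renamed beta to avoid clashing with the shunting constant B. After a few CS-US pairings at a fixed ISI, the total doubly gated output R = sum[i: f(xi)*yi*zi] rises and peaks in the neighbourhood of the trained ISI, which is the adaptive-timing property these figures describe.
      import numpy as np

      # spectrum of rates r_i: each cell samples a different delay (Figure 15.7)
      r = np.linspace(0.001, 0.02, 40)                # 1/ms, arbitrary spread
      A, B, C, D, E = 1.0, 1.0, 0.0005, 0.005, 0.01   # arbitrary constants, notation as in the captions
      beta, n = 0.3, 8                                # sigmoid f(x) = x^n / (beta^n + x^n)

      def f(x):
          return x**n / (beta**n + x**n)

      def trial(z, ISI=400, T=1000, dt=1.0, train=True):
          """One CS presentation; if train, a brief US arrives at t = ISI."""
          x = np.zeros_like(r)
          y = np.ones_like(r)
          R = np.zeros(int(T / dt))
          for k in range(len(R)):
              t = k * dt
              I_CS = 1.0                                        # stored CS trace
              I_US = 1.0 if (train and ISI <= t < ISI + 50) else 0.0
              x += dt * r * (-A * x + (1 - B * x) * I_CS)       # spectral activation
              y += dt * (C * (1 - y) - D * f(x) * y)            # habituative transmitter gate
              z += dt * E * f(x) * y * (-z + I_US)              # gated steepest descent learning
              R[k] = np.sum(f(x) * y * z)                       # doubly gated population output
          return z, R

      z = np.zeros_like(r)
      for _ in range(10):                                       # training trials
          z, _ = trial(z, train=True)
      z, R = trial(z, train=False)                              # test trial (no US)
      print("test response peaks at t =", int(np.argmax(R)), "ms; trained ISI = 400 ms")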
  • image p547fig15.14 An individual's survival depends upon being able to process UNexpected non-occurrences, or disconfirmations, of goals differently from EXPECTED non-occurrences, or disconfirmations. See the text for details.
    || Unexpected non-occurrences of goal: a predictive failure, eg reward that does not occur at the expected time. Leads to Orienting Reactions: Cognitive- STM reset, attention shift, forgetting; Emotional- Frustration; Motor- Exploratory behaviour;. What about an Expected non-occurrence? predictive signal, all other events, expected goal.
  • image p547fig15.15 Expected non-occurrences do not prevent the processing of sensory events and their expectations. Rather, they prevent mismatches of those expectations from triggering orienting reactions.
    || Expected non-occurrence of goal. Some rewards are reliable but delayed in time. Does not lead to orienting reactions: How? Both expected and unexpected non-occurrences are due to mismatch of a sensory event with learned expectations. Expected non-occurrences do not inhibit sensory matching: eg a pigeon can see an earlier-than-usual food pellet. Hypothesis: Expected non-occurrences inhibit the process whereby sensory mismatch activates orienting reactions. Mismatch not-> orient.
  • image p548fig15.16 Homologous recognition learning and reinforcement learning macrocircuits enable adaptively timed conditioning in the reinforcement learning circuit to increase inhibition of the orienting system at times when a mismatch in the recognition system would have reduced inhibition of it.
    || Homolog between ART and CogEM model, complementary systems. [Recognition, Reinforcement] learning vs [Attentional, Orienting] system. Reinforcement: timing, drive representation.
  • image p548fig15.17 The timing paradox asks how inhibition of an orienting response (-) can be spread throughout the ISI, yet accurately timed responding can be excited (+) at the end of the ISI.
    || Timing paradox. [CS light, US shock] vs t. ISI = InterStimulus Interval = expected delay of reinforcer. Want timing to be accurate. Want to inhibit exploratory behaviour throughout the ISI.
  • image p549fig15.18 The Weber Law solves the timing paradox by creating an adaptively timed response throughout the ISI that peaks at the ISI. Within the reinforcement learning circuit, this response can maintain inhibition of the orienting system A at the same time as it generates adaptively timed incentive motivation to the orbitofrontal cortex.
    || Weber Law: reconciling accurate and distributed timing. Resolution: Output can inhibit orienting, peak response probability. What about different ISIs? Standard deviation = peak time. Weber law rule.
  • image p549fig15.19 How the adaptively timed hippocampal spectrum T inhibits (red arrow) the orienting system A as motivated attention in orbitofrontal cortex Si(2) peaks at the ISI.
    || Conditioning, Attention, and Timing circuit. Hippocampus spectrum-> Amygdala orienting system-> neocortex motivational attention. Adaptive timing inhibits orienting system and maintains adaptively timed Motivated Attention on the CS.
  • image p550fig15.20 Adaptively timed conditioning of Long Term Depression, or LTD, occurs in the cerebellum at synapses between parallel fibres and Purkinje cells, thereby reducing inhibition of subcortical nucleus cells and enabling them to express their learned movement gains within the learned time interval. Also see Figure 15.21.
    || [CS-Activated input pathways parallel fibres, US-Activated climbing fibres]-> [Subcortical nucleus (gain control), Cerebellar cortex- Purkinje cells (timing)].
  • image p551fig15.21 The most important cell types and circuitry of the cerebellum: Purkinje cells (PC) receive excitatory inputs from the climbing fibres (CF) that originate in the inferior olive (IO) and from parallel fibres (PF), which are the axons of granule cells (GC). GCs, in turn, receive inputs from the mossy fibres (MF) coming from the precerebellar nuclei (PCN). The PF also inhibit PC via basket cells (BC), thereby helping to select the most highly activated PC. The PC generate inhibitory outputs from the cerebellar cortex to the deep cerebellar nuclei (DCN), as in Figure 15.20. Excitatory signals are denoted by (+) and inhibitory signals by (-). Other notations: GL- granular layer; GoC- Golgi cells; ML- molecular layer; PCL- Purkinje cell layer; SC- stellate cell; WM- white matter.
    ||
  • image p551fig15.22 Responses of a retinal cone in the turtle retina to brief flashes of light of increasing intensity.
    || response vs msec.
  • image p552fig15.23 Cerebellar biochemistry that supports the hypothesis of how mGluR supports adaptively timed conditioning at cerebellar Purkinje cells. AMPA, α-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid-sensitive glutamate receptor; cGMP, cyclic guanosine monophosphate; DAG, diacylglycerol; glu, glutamate; GC, guanylyl cyclase; gK, Ca2+-dependent K+ channel protein; GTP, guanosine triphosphate; IP3, inositol-1,4,5-trisphosphate; NO, nitric oxide; NOS, nitric oxide synthase; P, phosphate; PLC, phospholipase C; PKC, protein kinase C; PKG, cGMP-dependent protein kinase; PP-1, protein phosphatase-1;.
    || climbing fibre induced depolarization, parallel fibre induced mGLuR1 activation. PDE, GTP, 5'GMP, G-substrate, calcineurin, AMPA...
  • image p556fig15.24 (a) Data showing normally timed responding (solid curve) and short latency responses after lesioning cerebellar cortex (dashed curve). (b) computer simulation of short latency response after ablation of model cerebellar cortex.
    ||
  • image p557fig15.25 Computer simulations of (a) adaptively timed long term depression at Purkinje cells, and (b) adaptively timed activation of cerebellar nuclear cells.
    || response vs time (msec)
  • image p557fig15.26 Brain regions and processes that contribute to autistic behavioral symptoms when they become imbalanced in prescribed ways.
    || Basal Ganglia prolonged gate opening <-> { Amygdala emotionally depressed-> [hippocampus- hyperspecific learning; Cerebellum- adaptive timing fails; hypofrontal blocking fails, no Theory of Mind]-> Neocortex; Neocortex- rewards not received-> Amygdala}.
  • image p559fig15.27 Brain regions and processes that contribute to the release of dopaminergic Now Print signals by the substantia nigra pars compacta, or SNc, in response to unexpected reinforcing events. See the text for details.
    || Model of spectrally timed SNc learning (Brown, Bullock, Grossberg 1999). Delayed inhibitory expectations of reward. Dopamine cells signal an error in reward prediction timing or magnitude. Immediate excitatory predictions of reward. Lateral hypothalamus (Primary Reward Input)-> [(+)ventral striatum <-> ventral pallidum (+)-> PPTN(+)-> SNc]. SNc-> [dopamine signal -> ventral striatum, Striosomal cells]. Conditioned Stimuli (CS)(+)-> [ventral striatum, striosomal cells]. Striosomal cells(-)-> SNc.
  • image p559fig15.28 Neurophysiological data (left column) and model simulations (right column) of SNc responses. See the text for details.
    || membrane potential vs time
  • image p560fig15.29 Excitatory pathways that support activation of the SNc by a US and the conditioning of a CS to the US.
    || Excitatory pathway. Primary reward (apple juice) briefly excites lateral hypothalamus. Hypothalamic-PPTN excitation causes SNc dopamine burst. Hypothalamic activity excites ventral striatum for training. Active CS working memory signals learn to excite ventral striatum. Lateral hypothalamus (Primary Reward Input)-> [(+)ventral striatum <-> ventral pallidum(+)-> PPTN(+)-> SNc]. SNc-> dopamine signal-> ventral striatum. Conditioned Stimuli working memory trace (CS)(+)-> ventral striatum.
  • image p560fig15.30 The inhibitory pathway from striosomal cells to the SNc is able to inhibit the SNc when a reward occurs with expected timing and magnitude.
    || Inhibitory pathway. Learning: CS-striosomal LTP occurs due to a three-way coincidence [An active CS working memory input, a Ca2+ spike, a dopamine burst]; Signaling: The delayed Ca2+ spike facilitates striosomal-SNc inhibition;. Striosomal cells learn to predict both timing and magnitude of reward signal to cancel it: reward expectation;. Conditioned stimuli (CS) LTP-> Striosomal cells <- dopamine | (-)-> SNc->.
  • image p561fig15.31 The CS activates a population of striosomal cells that respond with different delays in order to enable adaptively timed inhibition of the SNc.
    || Expectation timing (Fiala, Grossberg, Bullock 1996; Grossberg, Merrill 1992, 1996; Grossberg, Schmajuk 1989). How do cells bridge hundreds of milliseconds? Timing spectrum (msec). 1. CS activates a population of cells with delayed transient signals: mGluR. 2. Each has a different delay, so that the range of delays covers the entire interval. 3. Delayed transients gate both learning and read-out of expectations.
  • image p561fig15.32 The SNc can generate both dopamine bursts and dips in response to rewards whose amplitude is unexpectedly large or small.
    || Inhibitory pathway: expectation magnitude. 1. If reward is greater than expected, a dopamine burst causes striosomal expectation to increase. 2. If reward is less than expected, a dopamine dip causes striosomal expectation to decrease. 3. This is a negative feedback control system for learning. Conditioned stimuli (CS)-> Striosomal cells <- dopamine | (-)-> SNc->.
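  • Howell code sketch (not from the book): a toy Python illustration of the negative feedback idea in Figures 15.27-15.32. It is NOT the Brown, Bullock, Grossberg 1999 model; the adaptively timed striosomal spectrum is collapsed into one ideal delay cell per time step, and the learning rate is arbitrary. Excitation from the primary reward pathway minus the learned, delayed striosomal expectation yields a dopamine burst when reward is unexpectedly large, and a dip when it is smaller than expected or omitted.
      import numpy as np

      ISI, T = 40, 80         # time steps per trial; CS at t = 0, US expected at t = ISI
      w = np.zeros(T)         # striosomal expectation weights, one ideal delay cell per step
      lr = 0.3                # arbitrary learning rate

      def run_trial(us_time=ISI, us_mag=1.0):
          """Return the SNc (dopamine) trace for one trial and update the expectation."""
          dopamine = np.zeros(T)
          for t in range(T):
              excite = us_mag if t == us_time else 0.0   # primary reward input (excitatory path)
              inhibit = w[t]                             # delayed striosomal inhibition
              dopamine[t] = excite - inhibit             # burst (+) or dip (-)
              w[t] = max(w[t] + lr * dopamine[t], 0.0)   # bursts raise, dips lower the expectation
          return dopamine

      for _ in range(20):                                # training: reward reliably at t = ISI
          run_trial()
      print("omitted reward -> dip of", round(run_trial(us_time=None).min(), 2))
      print("doubled reward -> burst of", round(run_trial(us_mag=2.0).max(), 2))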
  • image p563fig15.33 The basal ganglia gate neural processing in many parts of the brain. The feedback loop through the lateral orbitofrontal cortex (blue arrow, lateral orbitofrontal) is the one that MOTIVATOR models.
    || MOTIVATOR models one of several thalamocortical loops through basal ganglia (Adapted from Fundamental Neuroscience. 2002 Copyright Elsevier). [cortex-> striatum-> pallidum S. nigra-> thalamus] vs [motor, oculomotor, dorsolateral prefrontal, lateral orbitofrontal, anterior cingulate]. thalamus-> [striatum, cortex].
  • image p563fig15.34 The colored regions are distinct parts of the basal ganglia in the loops depicted in Figure 15.33.
    || Distinct basal ganglia zones for each loop (Adapted from Fundamental Neuroscience. 2002 Copyright Elsevier).
  • image p564fig15.35 (a) A pair of recurrent shunting on-center off-surround networks for control of the fore limbs and hind limbs. (b) Varying the GO signal to these networks can trigger changes in movement gaits. See the text for details.
    ||
  • image p565fig15.36 (a) The FOVEATE model circuit for the control of saccadic eye movements within the peri-pontine reticular formation. (b) A simulated saccade staircase. See the text for details.
    || [left, right] eye FOVEATE model. [vertical vs horizontal] position (deg).
  • image p566fig15.37 Steps in the FOVEATE model's generation of a saccade. See the text for details.
    || input(+)-> LLBN-> [(-)OPN, (+)EBN], EBN(-)-> LLBN. (A) rest OPN active. (B) charge [input, LLBN, OPN] active. (C) burst [input, LLBN, EBN] active. (D) shutdown [OPN, EBN] active.
  • image p567fig15.38 (a) The Gated Pacemaker model for the control of circadian rhythms is a recurrent shunting on-center off-surround network whose excitatory feedback signals are gated by habituative transmitters. Tonic arousal signals energize the pacemaker. Diurnal (left) and nocturnal (right) pacemakers are determined by whether phasic light signals turn the pacemaker on or off. An activity-dependent fatigue signal prevents the pacemaker from becoming overly active for too long. (b) Two simulations of circadian activity cycles during different schedules of light (L) and dark (D). See the text for details.
    || sourceOn-> on-cells (recurrent) <-(-) (-)> off-cells (recurrent) <-sourceOff. on-cells-> activity-> off-cells. off-cells-> fatigue. Diurnal: sourceOn=[light, arousal]; sourceOff=arousal;. Nocturnal: sourceOn=arousal; sourceOff=[arousal, light];.
  • image p568fig15.39 Circuits of the MOTIVATOR model that show hypothalamic gated dipoles.
    || [inputs, -> [object, value] categories-> object-value categories-> [reward expectation filter, [FEF, EAT] outputs]. reward expectation filter [DA dip, arousal burst]-> alpha1 non-specific arousal-> value categories. Msi drive inputs-> value categories.
  • image p569fig15.40 The direct and indirect basal ganglia circuits that control GO and STOP movement signals. See the text for details.
    || [Direct path GO(+), Indirect path STOP(+), dopamine from SNc(+-)]-> striatum. GO-> GPi/SNr-> Thalamus (VA/Vlo) <-> frontal cortex. STOP-> GPe <-> STN-> GPi/SNr. NAc-> GPi/SNr.
  • image p573fig16.01 The experimental chamber (A) and neurophysiological recordings from a rat hippocampus (B) that led to the discovery of place cells. See the text for details.
    ||
  • image p574fig16.02 Neurophysiological recordings of 18 different place cell receptive fields. See the text for details.
    ||
  • image p575fig16.03 As a rat navigates in its experimental chamber (black curves), neurophysiological recordings disclose the firing patterns (in red) of (a) a hippocampal place cell and (b) an entorhinal grid cell.
    ||
  • image p578fig16.04 Cross-sections of the hippocampal regions and the inputs to them. See the text for details.
    || EC-> CA1-> CA3-> DG. Layers [V/VI, II, II].
  • image p580fig16.05 Macrocircuit of the GridPlaceMap model, which can learn both 2D grid cells and place cells in response to realistic trajectories of navigating rats using a hierarchy of SOMs with identical equations.
    || GridPlaceMap model: rate-based and spiking (Pilly, Grossberg 2012). Pre-wired 1D stripe cells, learns both 2D grid and place cells! Same laws for both; both select most frequent and energetic inputs. Place cells emerge gradually in response to developing grid cells. [place-> grid-> stripe] cells-> path integration-> vestibular signals
  • image p581fig16.06 The learning of hexagonal grid cell receptive fields as an animal navigates an open field is a natural consequence of simple trigonometric properties of the positions at which the firing of stripe cells that are tuned to different directions will co-occur.
    || The Trigonometry of spatial navigation. Coactivation of stripe cells.
  • image p582fig16.07 Stripe cells were predicted in (Mhatre, Gorchetchnikov, Grossberg 2012) to convert linear velocity signals into the distances travelled in particular directions. They are modeled by directionally-sensitive ring attractors, which help to explain their periodic activation as an animal continues to move in a given direction. See the text for details.
    || Stripe cells. Stripe cells are predicted to exist in (or no later than) EC layer (III, V/VI). Linear path integrators: represent distance traveled using linear velocity modulated with head direction signal. Ring attractor circuit: the activity bump represents distance traveled, stripe cells with same spatial period and directional preference fire with different spatial phases at different ring positions. Distance is computed directly, it does not require decoding by oscillatory interference. Periodic stripe cell activation due to ring anatomy: periodic boundary conditions. Stripe firing fields with multiple orientations, phases and scales.
  • image p582fig16.08 Some experimental evidence for stripe-like cell receptive fields has been reported. The band cells posited by Neil Burgess also exhibit the one-dimensional firing symmetry of stripe cells, but are modeled by oscillatory interference. See the text for details.
    || Evidence for stripe-like cells. Entorhinal cortex data (Sargolini, Fyhn, Hafting, McNaughton, Witter, Moser, Moser 2006; Krupic, Burgess, O'Keefe 2012). Similar hypothetical construct used by Interference model but position is decoded by grid cell oscillatory interference- Band Cells (Burgess 2008).
  • image p583fig16.09 The GRIDSmap model used algorithmically defined stripe cells to process realistic rat trajectories. The stripe cell outputs then formed inputs to the adaptive filter of a self-organizing map which learned hexagonal grid cell receptive fields.
    || GRIDSmap. Self-organizing map receives inputs from stripe cells and learns to respond to most frequent co-activation patterns. Stripe cells combine speed and head direction to create a periodic 1D position code. Virtual rat navigated using live rat trajectories from Moser Lab. Speed and head direction drives stripe cells.
  • image p583fig16.10 The GRIDSmap model is embedded into a more complete representation of the processing stages from receipt of angular head velocity and linear velocity signals to this learning of place cells.
    || GRIDSmap. Pre-wired 2D stripe cells, learns 2D grid cells. vestibular cells [angular head velocity-> head direction cells, linear velocity]-> stripe cells- small scale 1D periodic spatial code (ECIII)-> SOM grid cells entorhinal cortex- small scale 2D periodic spatial scale-> SOM place cells hippocampal cortex- large scale 2D spatial code (dentate/CA3). Unified hierarchy of SOMs.
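  • Howell code sketch (not from the book): the SOM stage in GRIDSmap/GridPlaceMap can be illustrated in a few lines of winner-take-all instar learning (made-up sizes and learning rate, not the published model code): category cells compete through their bottom-up adaptive filter, and the winner's weights track its current input, so over many steps the categories come to represent the most frequent and energetic stripe-cell coactivation patterns.
      import numpy as np
      rng = np.random.default_rng(0)

      n_stripe, n_cat = 18, 6                          # made-up sizes: stripe inputs, category cells
      W = rng.uniform(0.4, 0.6, (n_cat, n_stripe))     # adaptive filter weights

      def som_step(stripe_activity, lr=0.05):
          """Winner-take-all choice followed by instar learning at the winner."""
          s = W @ stripe_activity                      # bottom-up adaptive filter
          winner = int(np.argmax(s))                   # contrast-enhanced (WTA) category choice
          W[winner] += lr * (stripe_activity - W[winner])   # instar: winner's weights track the input
          return winner

      # a few coactivation patterns presented repeatedly end up owning their own categories
      patterns = (rng.random((4, n_stripe)) > 0.6).astype(float)
      for _ in range(2000):
          som_step(patterns[rng.integers(4)])
      print(np.round(W, 2))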
  • image p584fig16.11 GRIDSmap simulation of the learning of hexagonal grid fields. See the text for details.
    || Simulation results. Multiple phases per scale. response vs length scale (0.5m+).
  • image p584fig16.12 Temporal development of grid cell receptive fields on successive learning trials (1,3,5,7,25,50,75,100).
    || Temporal development of grid fields. Cells begin to exhibit grid structure by 3rd trial. Orientations of the emergent grid rotate to align with each other over trials.
  • image p585fig16.13 Hexagonal grid cell receptive fields develop if their stripe cell directional preferences are separated by 7, 10, 15, 20, or random numbers of degrees. The number and directional selectivities of stripe cells can thus be chosen within broad limits without undermining grid cell development.
    ||
  • image p585fig16.14 Superimposing firing of stripe cells whose directional preferences differ by 60 degrees supports learning hexagonal grid cell receptive fields in GRIDSmap.
    || GRIDSmap: from stripe cells to grid cells. Grid-cell Regularity from Integrated Distance through Self-organizing map. Superimposing firing of stripe cells oriented at intervals of 60 degrees. Hexagonal grid!
  • image p586fig16.15 Superimposing stripe cells oriented by 45 degrees does not lead to learning of rectangular grids in GRIDSmap, but it does in an oscillatory interference model.
    || Why is a hexagonal grid favored? Superimposing firing of stripe cells oriented at intervals of 45 degrees. Rectangular grid. This and many other possibilities do not happen in vivo. They do happen in the oscillatory interference model. How are they prevented in GRIDSmap?
  • image p586fig16.16 In the place cell learning model of (Gorchetchnikov, Grossberg 2007), three populations of five cells each of entorhinal grid cells (only two are shown) with different spatial periods input to the model's dentate gyrus. The grid cells are one-dimensional and defined algorithmically. A model dentate gyrus granule cell that receives strong projections from all three grid cell scales fires (green cell) and activates a recurrent inhibitory interneuron that inhibits other granule cells. It also generates back-propagating action potentials that trigger learning in the adaptive weights of the projections from the grid cells, thereby causing learning of place cell receptive fields.
    || Grid-to-place Self-Organizing map (Gorchetchnikov, Grossberg 2007). Formation of place cell fields via grid-to-place cell learning. Least common multiple: [grid (cm), place (m)] scales: [40, 50, 60 (cm); 6m], [50, 60, 70 (cm); 21m], [41, 53, 59 (cm); 1.282 km]. Our simulations: [40, 50 (cm); 2m], [44, 52 (cm); 5.72m]. Our SOM: Spiking Hodgkin-Huxley membrane equations; Nonlinear choice by contrast-enhancing recurrent on-center off-surround net;. Choice triggers back-propagating action potentials that induce STDP-modulated learning on cell dendrites.
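  • Howell code sketch (not from the book): the "least common multiple" values quoted in this caption are easy to check with plain arithmetic. A downstream cell that fires only when several grid-cell scales peak together inherits a spatial period equal to the LCM of those scales, which is how small grid periods can yield much larger place fields.
      import math

      # grid-cell scales in cm -> emergent place-field scale (math.lcm requires Python >= 3.9)
      for scales_cm in ([40, 50, 60], [50, 60, 70], [41, 53, 59], [40, 50], [44, 52]):
          print(scales_cm, "cm ->", math.lcm(*scales_cm) / 100.0, "m")
      # prints 6.0 m, 21.0 m, 1282.07 m (~1.282 km), 2.0 m, and 5.72 m,
      # matching the values listed in the caption above.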
  • image p587fig16.17 A finer analysis of the 2D trigonometry of spatial navigation showed that both the frequency and amplitude of coactivations by stripe cells determine the learning of hexagonal grid fields.
    || A refined analysis: SOM amplifies most frequent and energetic coactivations (Pilly, Grossberg 2012). [linear track, 2D environment]. (left) Stripe fields separated by 90°. 25 coactivations by 2 inputs. (right) Stripe fields separated by 60°. 23 coactivations by 3 inputs.
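  • Howell code sketch (not from the book): the trigonometric claim in Figures 16.06 and 16.17 can be checked numerically (the stripe period, sampling resolution, and coactivation threshold below are arbitrary): positions where stripe cells whose preferred directions differ by 60° are simultaneously near peak activity form a hexagonal lattice, while directions 90° apart give a square lattice.
      import numpy as np

      period = 30.0                                    # cm, arbitrary stripe spacing

      def stripe_activity(pos, theta_deg):
          """Cosine-tuned stripe cell: peaks whenever displacement along its
          preferred direction is an integer multiple of the stripe period."""
          theta = np.deg2rad(theta_deg)
          d = pos[..., 0] * np.cos(theta) + pos[..., 1] * np.sin(theta)
          return 0.5 * (1.0 + np.cos(2.0 * np.pi * d / period))

      xs = np.linspace(0.0, 120.0, 241)
      pos = np.stack(np.meshgrid(xs, xs, indexing="ij"), axis=-1)

      for angles in ([0, 90], [0, 60, 120]):
          coact = np.prod([stripe_activity(pos, a) for a in angles], axis=0)
          peaks = np.argwhere(coact > 0.98)            # near-coincident firing positions
          print(angles, "->", len(peaks), "near-coactivation points",
                "(plot them: square lattice for 90 degrees, hexagonal for 60 degrees)")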
  • image p588fig16.18 Simulations of coordinated learning of grid cell receptive fields (second row) and unimodal place cell receptive fields (third row) by the hierarchy of SOMs in the GridPlaceMap model. Note the exquisite regularity of the hexagonal grid cell firing fields.
    || [stripe, grid, place] cells vs [spikes on trajectory, unsmoothed rate map, smoothed rate map].
  • image p589fig16.19 Neurophysiological data showing the smaller dorsal grid cell scales and the larger ventral grid cell scales.
    || Spatial scale of grid cells increases along the MEC dorsoventral axis (Hafting etal 2005; Sargolini etal 2006; Brun etal 2008). [dorsal (left), ventral (right)] cart [rate map, autocorrelogram]. How does the spatial scale increase along the MEC dorsoventral axis?
  • image p590fig16.20 Integration rate of grid cells decreases along the dorsoventral gradient of the Medial Entorhinal Cortex, or MEC.
    || Dorsoventral gradient in the rate of synaptic integration of MEC layer II stellate cells (Garden etal 2008). Cross-section of [Hp, CC, LEC, MEC]. (A left column) [dorsal, ventral] mV? vs msec. (B center column) [half width (ms), rise time (ms), amplitude (mV)] vs location (μm). (C right upper) responses (D right lower) width (ms) vs location (μm).
  • image p590fig16.21 Frequency of membrane potential oscillations in grid cells decreases along the dorsoventral gradient of the MEC.
    || Dorsoventral gradient in the frequency of membrane potential oscillations of MEC layer II stellate cells (Giocomo etal 2007). (C left column) Oscillation (Hz) vs distance from dorsal surface (mm). (D right upper) [dorsal, ventral] oscillations 5mV-500ms. (E right lower) [dorsal, ventral] oscillations 100ms. Both membrane potential oscillation frequency and resonance frequency decrease from the dorsal to ventral end of MEC.
  • image p591fig16.22 Time constants and duration of afterhyperpolarization currents of grid cells increase along the dorsoventral gradient of the MEC.
    || Dorsoventral gradient in afterhyperpolarization (AHP) kinetics of MEC layer II stellate cells (Navratilova etal 2012). [mAHP time constant (ms), Half-width (mm)] vs distance from the dorsal surface (mm), at [-55, -50, -45] mV. Time constants and duration of AHP increase from the dorsal to the ventral end of MEC layer II. Effectively, the relative refractory period is longer for ventral stellate cells in MEC layer II.
  • image p591fig16.23 The Spectral Spacing Model uses a rate gradient to learn a spatial gradient of grid cell receptive field sizes along the dorsoventral gradient of the MEC.
    || Spectral spacing model. Map cells responding to stripe cell inputs of multiple scales. Grid cells: MEC layer II (small scale 2D spatial code). Stripe cells: PaS / MEC deep layer (small scale 1D spatial code). Path Integration. Vestibular signals- linear velocity and angular head velocity. SOM. How do entorhinal cells solve the scale selection problem?
  • image p592fig16.24 Parameter settings in the Spectral Spacing Model that were used in simulations.
    || Simulation settings. Activity vs distance (cm). Learning trials: 40.
  • image p593fig16.25 Spectral Spacing Model STM, MTM, and LTM equations. The rate spectrum that determines the dorsoventral gradient of multiple grid cell properties is defined by μm.
    || Spectral Spacing Model equations. [STM, MTM, LTM]. μm = rate spectrum.
  • image p593fig16.26 Data (left column) and simulations (right column) of the gradient of increasing grid cell spacing along the dorsoventral axis of MEC.
    || Gradient of grid spacing along dorsoventral axis of MEC (Brun etal 2008). data-[Distance (m?), Median grid spacing (m?)] simulations-[Grid spacing (cm), Grid spacing (cm)] vs response rate.
  • image p594fig16.27 Data (left column) and simulations (right column) of the gradient of increasing grid cell field width along the dorsoventral axis of MEC.
    || Gradient of field width along dorsoventral axis of MEC (Brun etal 2008). data-[Distance (m?), Width autocorr peak (m?)] simulations-[Grid field width (cm), Width autocorr peak (cm)] vs response rate.
  • image p595fig16.28 Data (left column) and simulations (right column) about peak and mean grid cell response rates along the dorsoventral axis of MEC.
    || Peak and mean rates at different locations along DV axis of MEC (Brun etal 2008). Peak rate (Hz) vs [data- DV quarter, simulations- Response rate].
  • image p596fig16.29 Data (top row) and simulations (bottom row) showing decreasing frequency of subthreshold membrane potential oscillations along the DV axis of MEC.
    || Subthreshold membrane potential oscillations at different locations along DV axis of MEC (Giocomo etal 2020; Yoshida etal 2011). Data [oscillations (Hz) vs distance from dorsal surface (mm) @[-50, -45] mV, Frequency (Hz) vs [-58, -54, -50] mV]. Simulations MPO frequency (Hz) vs [response, habituation] rate.
  • image p596fig16.30 Data (top row) and simulations (bottom row) of spatial phases of learned grid and place cells.
    || Spatial phases of learned grid and place cells (Hafting etal 2005). Data: Cross-correlogram of rate maps of two grid cells; Distribution of phase difference: distance from origin to nearest peak in cross-correlogram. Simulations: Grid cell histogram of spatial correlation coefficients; Place cell histogram of spatial correlation coefficients.
  • image p597fig16.31 Data (a) and simulations (b-d) about multimodal place cell receptive fields in large spaces. The simulations are the result of learned place fields.
    || Multimodal place cell firing in large spaces (Fenton etal 2008; Henriksen etal 2010; Park etal 2011). Number of cells (%) vs Number of place fields. [2, 3] place fields, 100*100 cm space.
  • image p597fig16.32 Data (top row) and simulations (bottom row) about grid cell development in juvenile rats. Grid score increases (a-b and d), whereas grid spacing remains fairly flat (c and e).
    || Model fits data about grid cell development (Wills etal 2010; Langston etal 2010). Data: [Gridness, grid score, inter-field distance (cm)]. Simulations: [Gridness score, Grid spacing (cm)] vs trial.
  • image p598fig16.33 Data (top row) and simulations (bottom row) of changes in place cell properties in juvenile rats, notably about spatial information (a,c) and inter-trial stability (b,d).
    || Model fits data about grid cell development (Wills etal 2010). [Data, Simulation] vs [spatial information, inter-trial stability]. x-axis [age (postnatal day), trial].
  • image p598fig16.34 The spiking GridPlaceMap model generates theta-modulated place and grid cell firing, unlike the rate-based model.
    || Theta-modulated cells in spiking model. [place, grid] cell vs [membrane potential (mV vs time), frequency vs inter-spike intervals (s), power spectra (normalized power vs frequency (Hz))].
  • image p599fig16.35 Data (a) and simulations (b,c) about anatomically overlapping grid cell modules. (a) shows the anatomical distribution of grid cells belonging to different modules in one animal. DV location (mm) vs postrhinal border. (b) shows the simulated distribution of learned grid cell spacings from two stripe cell scales. frequency (%) vs grid spacing (cm). mu = [1, 0.6]. (c) shows what happens when half the cells respond with one rate and half another rate. (d) shows the same with three rates. (e-g) show spatial maps and autocorrelograms of grid cells that arise from the different rates in (d). [rate map, autocorrelogram] vs [score [1.07, 0.5, 0.67], spacing (cm) [23.58, 41, 63.64]].
    ||
  • image p600fig16.36 The entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories. See the text for details.
    || Entorhinal-hippocampal interactions as an ART system. Hippocampal place cells as spatial categories. Angular head velocity-> head direction cells-> stripe cells- small scale 1D periodic code (ECIII) SOM-> grid cells- small scale 2D periodic code (ECII) SOM-> place cells- larger scale spatial map (DG/CA3)-> place cells (CA1)-> conjunctive-coding cells (EC V/VI)-> top-down feedback back to stripe cells- small scale 1D periodic code (ECIII). stripe cells- small scale 1D periodic code (ECIII)-> place cells (CA1).
  • image p602fig16.37 Data showing the effect of hippocampal inactivation by muscimol on grid cell firing before, during, and six hours after the muscimol, reading from left to right.
    || Hippocampal inactivation disrupts grid cells (Bonnevie etal 2013). muscimol inactivation. spikes on trajectory: [before, after min [6-20, 20-40, 40-60, 6h]]. rate map (Hz) [18.6, 11.4, 9.5, 6.7, 10.8]. spatial autocorrelogram g=[1.12, 0.05, -0.34, 0.09, 1.27].
  • image p603fig16.38 Role of hippocampal feedback in maintaining grid fields. (a) Data showing the effect of hippocampal inactivation before and during muscimol inhibition of hippocampal cells, as in Figure 16.37. (b) Model simulation with normal grid fields. (c) Model simulation that emulates the effect of hippocampal inhibition on grid fields.
    || (a) Data: hippocampal inactivation [before, after] cart [spikes on trajectory (p: [18.6, 6.7] Hz), spatial autocorrelogram (g= [1.12, 0.09])]. (b) Model: noise-free path integration, [spikes on trajectory (p: 14.56 Hz), rate map, spatial autocorrelogram (g= 1.41), dynamic autocorrelogram (g=0.6)]. (c) Model: noisy path integration + non-specific tonic inhibition, [spikes on trajectory (p: 11.33 Hz), rate map, spatial autocorrelogram (g= 0.05), dynamic autocorrelogram (g=0.047)].
  • image p605fig16.39 Data showing effects of medial septum (MS) inactivation on grid cells and network theta oscillations in medial entorhinal cortex (MEC). (A) Examples of disruption in the spatial expression of the hexagonal grid structure for two grid cells (Brandon etal 2011). (B) Temporal reduction in the power and frequency of network theta oscillations (Koenig etal 2011). (C) Temporary reduction in the gridness score, mean firing rate, and spatial stability of grid cells (Koenig etal 2011).
    || Disruptive effects of Medial Septum inactivation in Medial Entorhinal Cortex (Brandon etal 2011; Koenig etal 2011). (A) Rate map [rate map, spatial autocorrelations, trajectory] vs [baseline, sub-sampled, medial septum inactivation, 3-6 hour recovery, 24 hour recovery], [rate map (Hz- m, p), spatial autocorrelations (gridness)][ 1.2, 7.2, 1.1; 0.25, 1.7, 0.6; 0.25, 2.5, -0.53; 0.7, 5.1, 0.55; 1.0, 5.3, 1.3; 2.1, 15, 0.19; 1.7, 12, 0.71; 1.7, 3.2, -0.22; 1.8, 9.1, 0.68; 2.5, 13, 0.46]. (B) [normalized power at 7-9 Hz, frequency (Hz)] vs 5-minute periods. (C) [mean gridness score (+-SEM), mean firing rate (% of baseline), mean correlation coeff (+-SEM)] vs 10-minute periods.
  • image p607fig16.40 Effects of medial septum (MS) inactivation on grid cells. (a) Each row shows data and different data-derived measures of grid cell responsiveness, starting from the left with the baseline response to the middle column with maximal inhibition. (b) Data showing the temporary reduction in the gridness scores during MS inactivation, followed by recovery. (c) Simulation of the collapse in gridness, achieved by reduction in cell response rates to mimic reduced cholinergic transmission. (d,e) Simulations of the reduction in gridness scores in (d) by reduction of cell response rates, in (e) by changing the leak conductance. See the text for details.
    ||
  • image p611fig16.41 How back-propagating action potentials, supplemented by recurrent inhibitory interneurons, control learning within the synapses on the apical dendrites of winning pyramidal cells, and regulate a rhythm by which associative read-out is dissociated from read-in. See the text for details.
    ||
  • image p612fig16.42 Macrocircuit of the main SOVEREIGN subsystems.
    || [reward input, drive input, drive representation (DR), visual working memory and planning system (VWMPS), visual form and motion system (VFMS), motor approach and orienting system (MAOS), visual input (VisIn), motor working memory and planning system (MWMPS), motor approach and orienting system (MAOS), motor plant (MotP), Proprioceptive Input (PropIn), Vestibular Input (VesIn), Environmental feedback (EnvFB). DR [incentive motivational learning-> [VWMPS, MWMPS], -> VFMS, -> MAOS], VWMPS [conditioned reinforcer learning-> DR, MAOS], VFMS [visual object categories-> VWMPS, reactive movement commands-> MAOS], MWMPS [conditioned reinforcer learning-> DR, planned movement commands-> MAOS], MAOS [motor map positions-> MWMPS, motor outflow-> MotP], VisIn-> VFMS, VesIn-> MAOS, EnvFB-> [VisIn, MotP, VesIn].
  • image p613fig16.43 The main visual form and motion processing stream mechanisms of SOVEREIGN, many of them described at length in previous chapters.
    || Render 3-D scene (R3DS), figure-ground separation (FGS), log-polar transform (LPT), Gaussian coarse-coding (GCC), Invariant visual target map (IVTM), What Fuzzy ART (WhatFuzz), body spatial coordinates (BSC), where reactive visual TPV storage (WRVTS), Directional transient cell network (DTCN), Motion direction hemifield map (MDHM), Hemifield left/right scoring (HLRS), reactive visual control signal (RVCS), Parvo/Magno/Erg competition (PMEC), Approach and Orient GOp (AOGp), GOm (GOm). R3DS [parvo-> FGS, magno-> DTCN], FGS-> [LPT, WRVTS], LPT-> GCC-> IVTM-> WhatFuzz, BSC-> [RVTS, PMEC], PMEC-> [gateRVTS-> RVTS, gateRVCS-> RVCS], DTCN-> MDHM-> HLRS, HLRS-> [PMEC, RVCS], AOGp-> gateRVTS, GOm-> gateRVCS.
  • image p613fig16.44 The main target position vector (TPV), difference vector (DV), and volitional GO computations in SOVEREIGN that bring together reactive and planned signals to control decision-making and action. See the text for details.
    || Reactive visual TPV (RVT), NETs (NETs), S-MV mismatch (SMVM), NETmv (NETmv), reactive visual TPV storage (RVTS), reactive DV1 (RD1), NET (NET), motivated what and where decisions (MWWD), Planned DV1 (PD1), tonic (Tonic), top-down readout mismatch (TDRM), Parvo gate (tonic) (PG), Orienting GOp offset (OGpO). RVT-> [NETs, RVTS], NETs-> [SMVM, NET], SMVM-> NET, NETmv-> SMVM, RVTS-> [NETs, RD1], NET-> [RD1, PD1, TDRM], MWWD-> PD1, PD1-> Tonic-> TDRMPG-> NETs, OGpO-> [NETmv, PD1].
  • image p614fig16.45 The main distance (d) and angle (a) computations that bring together and learn dimensionally-consistent visual and motor information whereby to make the currently best decisions and actions. See the text for details.
    || Reactive Visual TPV [m storage], NETm S-MV mismatch, MV mismatch, NETmv, PPVv, PPVm, Vestibular feedback, motor copy.
  • image p615fig16.46 SOVEREIGN uses homologous processing stages to model the (a) What cortical stream and the (b) Where cortical stream, including their cognitive working memories and chunking networks, and their modulation by motivational mechanisms. See the text for details.
    ||
  • image p615fig16.47 SOVEREIGN models how multiple READ circuits, operating in parallel in response to multiple internal drive sources, can be coordinated to realize a sensory-drive heterarchy that can maximally amplify the motivationally most currently favored option.
    ||
  • image p616fig16.48 SOVEREIGN was tested using a virtual reality 3D rendering of a cross maze (a) with different visual cues at the end of each corridor.
    ||
  • image p616fig16.49 The animat learned to convert (a) inefficient exploration of the maze into (b) an efficient direct learned path to the goal.
    ||
  • image p617fig16.50 The perirhinal and parahippocampal cortices enable adaptively timed reinforcement learning and spatial navigational processes that are modeled by Spectral Spacing models in the What and Where cortical streams, respectively, to be fused in the hippocampus.
    || What and Where inputs to the hippocampus (Diana, Yonelinas, Ranganath 2007). Adaptively timed conditioning and spatial navigation. Hippocampus <-> Entorhinal Cortex <-> [Perirhinal Cortex <-> what, Parahippocampal Cortex <-> where].
  • image p627tbl17.01 Homologs between reaction-diffusion and recurrent shunting cellular network models of development.
    || byRows: (reaction-diffusion, recurrent shunting net) (activator, excitatory activity) (inhibitor, inhibitory activity) (morphogenic source density, inputs) (firing of morphogen gradient, contrast enhancement) (maintenance of morphogen gradient, short-term memory) (power or sigmoidal signal functions, power or sigmoidal signal functions) (on-center off-surround interactions via diffusion, on-center off-surround interactions via signals) (self-stabilizing distributions of morphogens if inhibitors equilibrate rapidly, short-term memory pattern if inhibitors equilibrate rapidly) (periodic pulses if inhibitors equilibrate slowly, periodic pulses if inhibitors equilibrate slowly) (regulation, adaptation).
  • image p628fig17.01 A hydra
    ||
  • image p628fig17.02 Schematics of how different cuts and grafts of the normal Hydra in (a) may (*) or may not lead to the growth of a new head. See the text for details.
    ||
  • image p629fig17.03 How an initial morphogenetic gradient may be contrast enhanced to exceed the threshold for head formation in its most active region.
    || head formation threshold, final gradient, initial gradient.
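  • Howell code sketch (not from the book): the contrast-enhancement step sketched in Figure 17.03 is what a recurrent shunting on-center off-surround network does when its feedback signal function is faster than linear (Grossberg's classic result, and the cellular counterpart of the reaction-diffusion story in Table 17.01). A minimal Python sketch with arbitrary parameters and a quadratic signal function, applied to an initial morphogen-like gradient:
      import numpy as np

      def f(x):                  # faster-than-linear feedback signal => contrast enhancement / choice
          return x ** 2

      A, B, dt = 0.1, 1.0, 0.01  # arbitrary decay rate, upper bound, and Euler step
      x = np.array([0.10, 0.12, 0.15, 0.20, 0.26])     # initial graded pattern
      for _ in range(20000):
          total = f(x).sum()
          # recurrent shunting on-center off-surround dynamics (inputs already switched off):
          x += dt * (-A * x + (B - x) * f(x) - x * (total - f(x)))
      print(np.round(x, 3))      # with these parameters only the largest initial activity survives,
                                 # i.e. only the peak of the gradient exceeds the head formation threshold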
  • image p630fig17.04 Morphogenesis: more ratios (Wolpert 1969). Shape preserved as size increases. French flag problem. Use cellular models! (Grossberg 1976, 1978) vs chemical or fluid reaction-diffusion models (Turing 1952; Gierer, Meinhardt 1972).
    ||
  • image p631fig17.05 How a blastula develops into a gastrula. See the text for details.
    || 1. The vegetal pole of the blastula flattens, [Animal, vegetal] hemisphere, blastocoel. 2. Some cells change shape and move inward to form the archenteron, Blastopore. 3. Other cells break free, becoming mesenchyme. 4. Then extensions of mesenchyme cells attach to the overlying ectoderm, Archenteron. 5. The archenteron elongates, assisted by the contraction of mesenchyme cells. 6. The mouth will form, where the archenteron meets ectoderm. 7. The blastopore will form the anus of the mature animal. [Mesenchyme, Ectoderm, Endoderm, Blastocoel, Archenteron, Mesenchyme]. Concept 38.3, www.macmillanhighered.com
  • image p634fig17.06 Summing over a population of cells with binary output signals whose firing thresholds are Gaussianly distributed (left image) generates a total output signal that grows in a sigmoidal fashion with increasing input size (dashed vertical line).
    || How binary cells with a Gaussian distribution of output thresholds generate a sigmoidal population signal. [# of binary cells with threshold T, Total output signal] vs Cell firing thresholds T. Cell population with firing thresholds Gaussianly distributed around a mean value. As input increases (dashed line), more cells in population fire with binary signals. Total population output obeys a sigmoid signal function f.
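  • Howell code sketch (not from the book): a quick numerical check of this figure (the mean, spread, and population size below are arbitrary): summing the binary outputs of cells whose firing thresholds are Gaussianly distributed gives a population output that follows the Gaussian cumulative distribution of the input, i.e. a sigmoid signal function.
      import numpy as np

      rng = np.random.default_rng(1)
      thresholds = rng.normal(loc=1.0, scale=0.25, size=10_000)   # Gaussian firing thresholds

      def population_output(I):
          """Fraction of binary cells whose threshold the input I exceeds."""
          return np.mean(thresholds < I)       # rises as a sigmoid (Gaussian CDF) in I

      for I in (0.5, 0.75, 1.0, 1.25, 1.5):
          print(f"input {I:4.2f} -> total population output {population_output(I):.3f}")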
  • As described on the Introduction webPage, questions driving this "webSite" (collection of webPages, defined by the menu above) are :
  • How difficult would it be to augment "Transformer Neural Networks" (TrNNs) with Grossberg's [concept, architecture]s, including the emergent systems for consciousness? Perhaps this would combine the scalability of the former with the [robust, extendable] foundations of the latter, which is supported by [broad, diverse, deep] data from [neuroscience, psychology], as well as success in real world advanced [science, engineering] applications? This section is repeated in the Introduction webPage.
  • Grossberg 2021 p229c2h0.60 SMART computer simulations demonstrate that a good enough match of a top-down expectation with a bottom-up feature pattern generates an attentive resonance during which the spikes of active cells synchronize in the gamma frequency range of 20-70 Hz (Figure 5.40). Many labs have reported a link between attention and gamma oscillations in the brain, including two articles published in 2001, one from the laboratory of Robert Desimone when he was at the National Institute of Mental Health in Bethesda (Fries, Reynolds, Rorie, Desimone 2001), and the other from the laboratory of Wolf Singer in Frankfurt (Engel, Fries, Singer 2001). You'll note that Pascal Fries participated in both studies, and is an acknowledged leader in neurobiological studies of gamma oscillations; eg (Fries 2009). ..."
  • Grossberg 2021 p081c2h0.66 sub-section: How does one evolve a computational brain?
    The above discussion illustrates that no single step of theoretical derivation can derive a whole brain. One needs a method for deriving a brain in stages, or cycles, much as evolution has incrementally discovered ever more complex brains over many thousands of years. The following theoretical method has been successfully applied many times since I first used it in 1957. It embodies a kind of conceptual evolutionary process for deriving a brain.

    Because "brain evolution needs to achieve behavioural success", we need to start with data that embodiey indices of behavioral success. That is why, as illustrated in Figure 2.37 Modelling method and cycle, one starts with Behavioral Data from scores or hundreds of psychological experiments. These data are analyszed as the result of an individual adapting autonomously in real time to a changing world. This is the Arty of Modeling. It requires that one be able to infer from static data curves the dynamical processes that control individual behaviors occuring in real time. One of the hardest things that I teach to my students to do is "how to think in real time" to be able to carry out this speculative leap.

    Properly carried out, this analysis leads to the discovery of new Design Principles that are embodied by these behavioral processes. The Design Principles highlight the functional meaning of the data, and clarify how individual behaviors occurring in real time give rise to these static data curves.

    These principles are then converted into the simplest Mathematical Model using a method of minimal anatomies, which is a form of Occam's Razor, or principle of parsimony. Such a mathematical model embodies the psychological principles using the simplest possible differential equations. By "simplest" I mean that, if any part of the derived model is removed, then a significant fraction of the targeted data could no longer be explained. One then analyzes the model mathematically and simulates it on the computer, showing along the way how variations on the minimal anatomy can realize the design principles in different individuals or species.

    This analysis has always provided functional explanations and Behavioral Predictions for much larger behavioral data bases than those used to discover the Design Principles. The most remarkable fact is, however, that the behaviorally derived model always looks like part of a brain, thereby explaining a body of challenging Neural Data and making novel Brain Predictions.

    The derivation hereby links mind to brain via psychological organizational principles and their mechanistic realization as a mathematically defined neural network. This startling fact is what I first experienced as a college Freshman taking Introductory Psychology, and it changed my life forever.

    I conclude from having had this experience scores of times since 1957 that brains look the way they do because they embody a natural computational realization for controlling autonomous adaptation in real-time to a changing world. Moreover, the Behavior -> Principles -> Model -> Neural derivation predicts new functional roles for both known and unknown brain mechanisms by linking the brain data to how it helps to ensure behavioral success. As I noted above, the power of this method is illustrated by the fact that scores of these predictions about brain and behavior have been supported by experimental data 5-30 years after they were first published.

    Having made the link from behavior to brain, one can then "burn the candle from both ends" by pressing both top-down from Behavioral Data and bottom-up from Brain Data to clarify what the model can and cannot explain at its current stage of derivation. No model can explain everything. At each stage of development, the model can cope with certain environmental challenges but not others. An important part of the mathematical and computational analysis is to characterize the boundary between the known and unknown; that is, which challenges the model can cope with and which it cannot. The shape of this boundary between the known and unknown helps to direct the theorist's attention to new design principles that have been omitted from previous analysis.

    The next step is to show how these new design principles can be incorporated into the evolved model in a self-consistent way, without undermining its previous mechanisms, thereby leading to a progressively more realistic model, one that can explain and predict ever more behavioral and neural data. In this way, the model undergoes a type of evolutionary development, as it becomes able to cope behaviorally with environmental constraints of ever increasing subtlety and complexity. The Method of Minimal Anatomies may hereby be viewed as way to functionally understand how increasingly demanding combinations of environmental pressures were incorporated into brains during the evolutionary process.

    If such an Embedding Principle cannot be carried out - that is, if the model cannot be unlumped or refined in a self-consistent way - then the previous model was, put simply, wrong, and one needs to figure out which parts must be discarded. Such a model is, as it were, an evolutionary dead end. Fortunately, this has not happened to me since I began my work in 1957 because the theoretical method is so conservative. No theoretical addition is made unless it is supported by multiple experiments that cannot be explained in its absence. Where multiple mechanistic instantiations of some Design Principles were possible, they were all developed in models to better understand their explanatory implications. Not all of these instantiations could survive the pressure of the evolutionary method, but some always could. As a happy result, all earlier models have been capable of incremental refinement and expansion.

    The cycle of model evolution has been carried out many times since 1957, leading today to increasing numbers of models that individually can explain and predict psychological, neurophysiological, anatomical, biophysical, and even biochemical data. In this specific sense, the classical mind-body problem is being incrementally solved.

    Howell: bold added for emphasis.
    (keys : Principles-Principia, behavior-mind-brain link, brain evolution, cycle of model evolution)
    see also quotes: Charles William Lucas "Universal Force" and others (not retyped yet).
  • see incorporate reader questions into theme webPage
    see Navigation: [menu, link, directory]s
  • p153 Howell: grepStr 'uncertainty' "multiple conflicting hypotheses"- a self-imposed practice to avoid becoming a [believer, tool] of a concept. But this was intended for [long-term, well-established, mainstream] theories, as well as new ideas that excite me. Does Grossberg's "uncertainty" concept also allow for "multiple conflicting hypotheses" to sit there and brew?
  • p190 Howell: [neural microcircuits, modal architectures] used in ART -
    bottom-up filters | top-down expectations | purpose
    instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star often used to learn the adaptive weights. top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features
    LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
    EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
    auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
    auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
    visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown.
    red - cognitive-emotional dynamics
    green - working memory dynamics
    black - see [bottom-up, top-down] lists
    EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
    Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

    background colours in the table signify :
    white | general microcircuit : a possible component of ART architecture
    lime green | sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
    light blue | post-perceptual cognition?
    pink | "the feeling of what happens" and knowing what event caused that feeling
    Note that a separate webPage lists a very small portion of Stephen Grossberg's publications.
  • J.E. Kaal, A. Otte, J.A. Sorensen, J.G. Emming 2021 "The nature of the atom" www.Curtis-Press.com, 268pp ISBN 978-1-8381280-2-9 https://StructuredAtom.org/
  • rationalwiki.org "Quantum consciousness" (last update 07Nov2022, viewed 16Jul2023)
    also critiques of the article above
  • Terrence J. Sejnowski 21Aug2023 "Large Language Models and the Reverse Turing Test", Neural Computation (2023) 35 (3): 309–342 (33 pages) https://direct.mit.edu/neco/issue (also copy in case original link fails)
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin 12Jun2017 "Attention Is All You Need" [v5] Wed, 6 Dec 2017 03:30:32 UTC https://arxiv.org/abs/1706.03762
  • Wikipedia Consciousness
  • Grossbergs list of [chapter, section]s.html - Note that the links on this webPage can be used to individually view all captioned images.
  • directory of captioned images - users can easily view all of the captioned images, especially if they are downloaded onto their computer. Many image viewers have [forward, backward] arrows to go through these sequentially, or right-click to open a link in a window.
  • core bash script for extracting captions from the webPage listing, converting them to images, then vertically appending them to the figure.
  • my bash utility to [position, move] windows. This is normally used to start up 6 workspaces on my computer (Linux Mint Debian Edition), each with 5-10 apps in separate windows.
  • Prepared themes with links to the captioned images - there are a huge number of themes from the book to focus on. I have prepared a few as examples.
  • What is consciousness? - video example not ready as of 30Aug2023. I save videos as "ogv/ogg" files, an open standard format. The "VLC media viewer" is the program that I use to view them. I have found that although some of the standard video viewers complain, when pushed ogv files can be viewed with them.
  • Navigation: [menu, link, directory]s
  • Theme webPage generation by bash script
  • Notation for [chapter, section, figure, table, index, note]s
  • incorporate reader questions into theme webPages
    GNU Public License; The GNU Free Documentation License; Creative Commons License
  • A very primitive bash script is used to generate the search results for ALL themes in the Themes webPage. Many readers will already have far better tools for this from the Computational Intelligence area etc.
    Because the theme webPage is automatically generated, and frequently re-generated as I update the list of themes and sources, I do NOT edit the file directly. The output format can be confusing, due to the specially formatted [chapter, section] headings and large tables, which can leave readers guessing whether they are still within the theme they want to peruse (as per the Table of Contents). Perhaps I can upgrade the searches in time to reduce the confusion, and to split themes in a better way.
  • list of [chapter, section]s
  • list of [figure, table]s
  • selected index items - I have NO intention of re-typing the entire index!
  • Grossberg quotes
  • reader Howell notes - this is an example of building your own webPage of [note, comment, thought]s when reading the book, which can then be added to the bash script for searches. These are notes in addition to [figure, table] captions, mostly comprising text within the image, but also including quotes of text from the book. Rarely, they include comments by Howell preceded by "Howell".
    The latter are distinct from "readers notes" (see, for example : Grossberg's list items- related notes from others). The reader may want to create their own file of comments based on this example, or augment this list with their [own, others'] notes. If using a new file, it should be added to the bash search script.
    More importantly, and as an easy first adaptation of Grossbergs [core, fun, strange] concepts.html thematic listings, you probably want to get rid of Howell's [comments, question]s. This can be done for a "local directory on your system" simply by :
  • downloading the entire webDirectories below to some directory on your filesystem, say {yourDir} : TrNNs_ART , bin (hopefully I'm not missing too many other directories in this list)
  • adapt the bash script: thematic [search, collect]s.sh to your own system, and run it. This will require re-defining several environment variables for your setup, such as those in the hypothetical sketch below : Menu
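    The variable names below are hypothetical stand-ins (the real names are the ones defined near the top of the script itself); the point is simply that the paths must be redefined to point at wherever {yourDir} sits on your filesystem before the script will run :

        # hypothetical sketch only - redefine the script's directory variables for your system
        export d_webSite="$HOME/yourDir/TrNNs_ART/"     # the downloaded TrNNs_ART webDirectory
        export d_bin="$HOME/yourDir/bin/"               # the downloaded bash utilities
        export d_temp="/tmp/TrNNs_ART/"                 # scratch space for intermediate files
        mkdir -p "$d_temp"
        bash "$d_bin"'thematic [search, collect]s.sh'   # then run the adapted script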
  • thematic sub-lists appear in the webPage "Grossberg's [core, fun, strange] concepts", created by very simple searches for key [word, phrase]s. Links in the sub-lists lead quickly to pertinent figures or other content. Menu
  • 29Sep2023 Here is a list of various problems with the captioned images and their links on the webPage Grossbergs list of [figure, table]s.html :
    10Aug2023 I haven't yet provided content for this webPage. It does touch on one of three questions of this webSite as mentioned in the Introduction :
  • How difficult would it be to augment "Transformer Neural Networks" (TrNNs) with Grossberg's [concept, architecture]s, including the emergent systems for consciousness? Perhaps this would combine the scalability of the former with the [robust, extendable] foundations of the latter, which are supported by [broad, diverse, deep] data from [neuroscience, psychology], as well as by success in real-world advanced [science, engineering] applications? 10Aug2023 The other related webPages have not yet been worked on either; each touches on this same one of the three questions of this webSite mentioned in the Introduction. Menu directory status & updates copyrights Celebrating 20 years of neural networks!
    1. IJCNN07 Orlando Florida USA - Publicity Chair, Guest Editor for the Neural Networks Special Issue
    2. IJCNN06 Vancouver BC Canada - International Liaison
    3. Menu directory status & updates copyrights
    4. Probably the most important section of this webPage is "Computations with multiple RNA strands". Most other sections provide context.
    5. extracellular - callerID-SNNs (Spiking Neural Networks), as introduced in another webPage
    6. 4-value logic (Colin James' short commentaries) - when looking at RNA strands (of double-strand DNA sequences), [A,T,G,C(U)] does look like a 4-value encoding. This brings up the subject of 4-value logic, which is not [complete, optimal] in the normal boolean sense. But since ?date?, logicians have worked away from the limelight on this subject, and on other forms of logic. Fuzzy logic is well-known, and has its own "Fuzzy Systems" area. Fuzzy Systems are one of the three main original pillars of Computational Intelligence (CI), along with [evolutionary computation, neural networks]. (?? other logic approaches I have looked at very briefly, then have forgotten)...
    7. bitShifts (like hexadecimal microprocessor machine code) for time series following. This is considered in the callerID-SNNs project. 13Dec2023 (https://en.wikipedia.org/wiki/Transfer_RNA)

    8. 2006 MindCode WCCI2006 Vancouver "Howell 060215 Genetic specification of neural networks" :
    9. 2015 - 2020 MindCode
    10. voice musings of [scattered, random] thoughts - This really is just me, "arm waving and yapping", trying to identify [missing, lost] items. Does biology have "[relative, absolute] addressing" (relative - local proximity on same DNA or RNA strand, absolute - address may even be on different chromosome)? I don't remember any references mentioning that possibility. In a previous sub-section of this webPage, I have provided a few (incomplete) points on addressing. I have a lot of [read, program]ing to do here
      While I have long been a fan of the work of Stephen Grossberg and his colleagues, I was very surprised by his 2021 book
      "Conscious Mind, Resonant Brain". (This link shows a menu that lists details of many themes from his book, plus commentary on well-known concepts of consciousness.) His book went far beyond my awareness of his work (obviously I was horribly out of date). [Right, wrong, true, false], it also goes far beyond any other [concept, work] that I am aware of in explaining how [neurons, the brain] work. The results are not simple, nor are they amenable to the normal "yap, wave your arms" that we all like so much. Maybe that's why it's not so [popular, well known]. To me, concepts of consciousness that do not [emerge from, work with] non-conscious processes, and which do not elucidate mechanisms, are not satisfying, even though they can still be pragmatically useful.
      In any case, I will work on Grossberg's concepts in the future, and apart from providing a simple "figure-captions-based" thematic overview of his work, my only other comment is on the subject of consciousness (below).
      There are only two concepts of consciousness with which I am comfortable: biologically based concepts from Grossberg and colleagues, and the late John Taylor's "advanced control theory" concepts for consciousness (linked webPage not built yet 07Nov2023). But the latter does not mesh at the present time with the directions of MindCode, with its special emphasis on genetics. I did do a very [quick, incomplete] commentary on consciousness concepts, and a simple overview of [definitions, models] of consciousness.
    11. Glenn Borchardt's concept of infinity (one example application), with a few voice comments on how to avoid one trap of self-limiting thinking.
      Of possible interest to geologists: Puetz, Borchardt 150925 Quasi-periodic fractal patterns in geomagnetic reversals, geological activity, and astronomical events.pdf
    12. Howell 2006 "Genetic specification of recurrent neural networks" (draft version of my WCCI2006 conference paper)
    13. MindCode 2023 description
    14. MindCode 2023 program coding (QNial programming language) - this is a simple one-line listing of each operator for each file
    15. callerID-SNNs Introduction (this webPage)
    16. callerID-SNNs program coding (QNial programming language)
    17. bash library: file operations - used extensively, sometimes hybridized with the QNial programming language. All of these are very incomplete, but the lists are a handy back-reference so that I don't forget ideas. The following items are LibreOffice documents in the .odt file format. These are in original form in the directory Mind2020, and while I intend to convert them to html and may update them, I have not done so as of 20Nov2023.
    18. Introduction - Conceptual pseudo-basis for MindCode 2020 old description of MindCode
    19. MindCode components
    20. Historical [DNA, Protein, Evolutionary Computing, ANN] hybrid basis for epiDNA-NNs
    21. MindCode - arbitrary selections from Multiple Conflicting Hypothesis
    22. Assumed rules of the game
    23. Questions, not answers
    24. Static epiDNA-NN
    25. Dynamic epiDNA-NN coding
    26. [Neurological, biological] basis for epiDNA coding
    27. Ontogeny
    28. Specialized epiDNA-NNs for MindCode
    29. Hybrids of [algorithms, conventional computing, ANNs, MindCode] Menu directory status & updates copyrights Scientific Integrity, the 2021 Turing Lecture, and the 2018 Turing Award for Deep Learning AI Blog
      @SchmidhuberAI This is a point-for-point critique of ACM's justification of the ACM A. M. Turing Award for deep learning, as well as a critique of the Turing Lecture given by the awardees (published by ACM in July 2021). 2015 survey of deep learning[DL1] June 2020 article[T20a][R12] (see Executive Summary I, V, II, XII, XIX, XXI, XIII, XIV, XX, XVII). (A) speech recognition, (B) natural language processing, (C) robotics, (D) computer vision, (VII) medicine, astronomy, materials science. A, B, C, D, VII, XVII, VI, XVI). II, V, XX, XVIII) with Dr. Bengio & Dr. Hinton (see Sec. XVII, I). I respond to LBH's recent ACM article (July 2021). expands material in my Critique of the 2019 Honda Prize[HIN] (~3,000 words). Abstract & Outline (~300 words), Introduction (~300 words), Critique of LBH's ACM article (Turing Lecture) of July 2021[DL3a] Executive summary of what's wrong with ACM's laudation (~1,000 words), 21 comments on 21 claims by ACM (~8,000 words), Conclusion and Acknowledgments (~2,000 words). All backed up by over 250 references (~9,000 words). The text contains numerous hyperlinks to relevant overview sites from the AI Blog. science is self-correcting."[SV20] they are mine or other people's.[DL1-2][HIN][NASC1-9] The present page is offered as a resource for all good computer scientists who share this inclination. and to fight plagiarism, collusion rings,[LIT21] and systemic academic corruption in all of their more and less subtle forms.[FAKE] Sec. 2 LBH's 2021 ACM article[DL3a] which necessitated an extension of the first version of this post.[T20a][R12] ACM's official justification[T19] of the 2018 A.M. Turing Award[R1] After the Executive Summary in Sec. 3, Sec. 4 will split ACM's full text[T19] into 21 parts I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII, XIV, XV, XVI, XVII, XVIII, XIX, XX, XXI. Most of the critiques are based on references to original papers and material from the AI Blog.[AIB][MIR][DEC][HIN] publishing yet another misleading overview of the field, this time based on LBH's Turing Lecture.[DL3a] LBH's well-known earlier omissions.[DLC][HIN][T20a] LBH claim to "briefly describe the origins of deep learning"[DL3a] without even mentioning the world's first working deep learning nets by Ivakhnenko and Lapa in 1965[DEEP1-2][R8] (see Sec. II). this class of methods was pioneered in 1991[UN-UN2] (see Sec. II, III). Highway Net, the first really deep feedforward NN.[HW1-3] (see Sec. D, VI). were all driven by my lab:[MOST] In 1991, I had the first very deep NNs based on unsupervised pre-training;[UN-UN2] LSTMs brought essentially unlimited depth to gradient-based supervised recurrent NNs;[LSTM0-17] later our Highway Nets[HW1-3] brought it to feedforward NNs. from 2007[LSTM4,14] based on LSTM[LSTM0-6] (1990s-2005) and CTC (2006).[CTC] our CTC-LSTM-based speech recognition (not that of Hinton) had been on most smartphones for years[GSR][GSR15-19][DL4] (see Sec. A, VI, XI, XV). Similarly for machine translation (see Sec. B). LBH cite Hinton (2012) for "dropout" without mentioning that dropout is just a variant of Hanson's 1990 stochastic delta rule[Drop1-2] (see Sec. XIV). von der Malsburg who introduced ReLUs in 1973[CMB] (see Sec. XIV). called AlexNet,[GPUCNN4] without mentioning that our earlier groundbreaking deep GPU-based DanNet[GPUCNN1-3,5-8][DAN] did not need ReLUs at all to win 4 earlier object recognition competitions and to achieve superhuman results already in 2011[GPUCNN1-8][R5-6] (see Sec. XIV). XVIII). already in 1965[DEEP1-2][R8] (see Sec. II). 
earlier fast weights of von der Malsburg (1981) and Feldman (1982).[FAST,FASTa-b][FWP] described in the 1991-93 papers on Fast Weight Programmers and linear Transformers[FWP0-1,6] (see Sec. XVI, XVII-2). dedicate an extra section to attention-based Transformers,[TR1-6] citing Bengio's team (2014) for "soft attention"[ATT14] without citing the much earlier original work of 1991-1993 on soft attention and linear Transformers[FWP,FWP0-2,6][ATT] (see Sec. XVII-1, XVI). LBH claim that Bengio's team[NPM] of text compression[SNT] (see Sec. XVI, XVII-1). LBH cite Bengio's 2014 paper on Generative Adversarial Networks (GANs)[GAN0-1] without mentioning that GANs are instances of the Adversarial Curiosity Principle of 1990[AC90-20][MIR](Sec. 5) (see Sec. XVII). In summation, LBH have repeatedly chosen to ignore the previous well-known critiques[DLC][HIN][T20a] and deep learning surveys,[DL1-2] and deep learning (e.g., Sec. I), ACM lauds Numerous references can be found under the relevant section links I-XXI which adhere to the sequential order of ACM's text[T19] Sec. II: it became really deep in 1991 in my lab, unsupervised pre-training of NNs, supervised LSTM. Sec. I contains 4 subsections A, B, C, D A: Speech Recognition (see also Sec. VI & XI & XV): The first superior end-to-end neural speech recognition combines two methods from my lab: LSTM (1990s-2005) and CTC (2006), which were Hinton (2012) and Bengio (XV) our revolutionary CTC-LSTM which was soon on most smartphones. Sec. B: Natural Language Processing (see also Sec. VI & XI & XVI): (soon used for several billions of was also based on our LSTM. Sec. C: Robotics. most visible breakthroughs Sec. D: Computer Vision XVIII & XIV & XI & VI) and applied to speech. All before LeCun's CNN work (XVIII). deep NNs pre-training (in contrast to Hinton's claims). Our DanNet was the first CNN fast & deep enough for superior computer vision in 2011, winning 4 image recognition contests in a row is an open-gated version of our earlier Highway Nets. Sec. XIV: deep & fast CNN (where LeCun participated), Sec. XI: ACM mentions GPU-accelerated NNs deep GPU-NN of 2010 debunked unsupervised pre-training (introduced by myself in 1991 and later championed by Hinton), and our GPU-CNN of 2011 (DanNet) was the first XVIII: Fukushima and Waibel (see Sec. D). VII: ACM explicitly mentions medicine and first to win medical imaging competitions Sec. XII & XIX & XXI: Modern backpropagation XIII & II & V III & IX & X & XX): Sec. XX: ACM credits LeCun for work on Sec. XXI: ACM credits LeCun for work on XV: ACM credits Bengio for hybrids of NNs and probabilistic models of sequences. CTC-LSTM A & B). XVI: ACM We started this in 1990-93 long before LBH Sec. XVII: Artificial Curiosity vanishing gradients (1991), metalearning (1987), unsupervised pre-training (1991), compressing or distilling one NN into another (1991), learning sequential attention with NNs (1990), fast weight programmers using and other topics.[R2-R6] Sec. IV is on Turing (1936) and his predecessors Critique of LBH's ACM article (Turing Lecture) of July 2021. Sec. Conclusion: In the recent decade of deep learning, (speech recognition, language translation, etc.) on billions of devices (also healthcare applications) Sec. II & III & V & XII & XIII & XVII & XIV & XIX & XX & XXI. In what follows, ACM's full text [T19] is split into 21 parts I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII, XIV, XV, XVI, XVII, XVIII, XIX, XX, XXI.

      Critique of 2018 Turing Award LBH and their co-workers have contributed certain useful improvements of existing deep learning methods.[CNN2,4][CDI][LAN][RMSP][XAV][ATT14][CAPS] (1965),[DEEP1-2][R8] modern backpropagation (1970),[BP1-2][R7] architectures of recurrent NNs (1943-56)[MC43][K56] and convolutional NNs (1979),[CNN1] principles of generative adversarial NNs and artificial curiosity (1990),[AC90,90b][AC20] unsupervised pre-training for deep NNs (1991),[UN1-2] vanishing gradients (1991)[VAN1] & Long Short-Term Memory or LSTM (Sec. A), GPU-accelerated NNs (2004),[GPUNN][DAN][DAN1][GPUCNN5] NNs with over 100 layers (2015),[HW1-3][R5] transformer-like[TR1-6][FWP] attention[FWP][ATT] through fast weight programmers (1991).[FWP0-2,6] [DL1-2][R2-R8] Often LBH failed to cite essential prior work, even in their later surveys.[DL3,DL3a][DLC][HIN][MIR](Sec. 21)[R2-R5, R7-R8] This may explain some of ACM's misattributions.[T19] II & III & V & XIII & X & XVII & XII & XVIII & XX. The deep NNs By the 2010s,[DEC] they were academia and industry,[DL4] mentioned by ACM (labeled as A, B, C, D) below: Long Short-Term Memory or LSTM (1990s-2005)[LSTM0-6] vanishing gradient problem student Sepp Hochreiter in 1991.[VAN1] This happened long before the similar work of Bengio (see Sec. XVII).[MIR] (Sec. 3,Sec. 4) LSTM was refined with my student Felix Gers[LSTM2] through "forget gates" based on end-to-end-differentiable fast weights.[MIR](Sec. 8)[FWP,FWP0-1] (A2) Connectionist Temporal Classification by my student Alex Graves et al. (2006).[CTC] Our team successfully applied CTC-trained LSTM to speech in 2007[LSTM4] (also with hierarchical LSTM stacks[LSTM14]). Markov models (HMMs)[BW][BRI][BOU] (Sec. XV). Hinton et al. (2012) still used the old hybrid approach[HYB12] and did not compare it to CTC-LSTM. became the first recurrent NN (RNN) to win international competitions. He later reused our end-to-end neural speech recognizer[LSTM4][LSTM14] as a postdoc in Hinton's lab.[LSTM8] CTC-LSTM dramatically improved Google's speech recognition.[GSR][GSR15][DL4] on-device speech recognition[GSR19] (not any longer on the server) LSTM[MIR](Sec. 4) (see Sec. VI & XI & XV). of text[SNT] (see Sec. XVI). In 2001, we showed that LSTM can learn languages unlearnable by traditional models such as HMMs,[LSTM13] See also Sec. VI & XI & XV. tailored by Bengio's team.[ATT14][FWP] However, such attention mechanisms also have their roots in my lab (1991);[FWP][FWP0-2,6] see Sec. XVI. C. Robotics & RL etc. Since 2003, our team has used LSTM for Reinforcement Learning (RL) and robotics.[LSTM-RL][RPG][LSTMPG] In the 2010s, For example, in 2018, a PG-trained LSTM was the core of OpenAI's famous Dactyl which learned to control a dextrous robot hand without a teacher.[OAI1][OAI1a] beat a pro player in the game of Starcraft, which is theoretically harder than Chess or Go[DM2] in many ways, using Alphastar whose brain has a deep LSTM core trained by PG.[DM3] OpenAI Five which learned to defeat human experts in the Dota 2 video game (2018).[OAI2] Bill Gates called this a "huge milestone in advancing artificial intelligence".[OAI2a][MIR](Sec. 4)[LSTMPG] Apart from A, B, C above, in healthcare, chemistry, molecular design, lip reading, speech synthesis,[AM16] predicting what's going on in nuclear fusion reactors, and so on.[DEC][DL4] was being used for LSTM (only 5% for the CNNs of Sec. D).[JOU17] Apparently the first LSTM journal paper[LSTM1][R5] is now the most frequently cited D. 
Computer Vision was revolutionized in the 2010s by a particular feedforward NN called the convolutional NN (CNN).[CNN1-4] The basic CNN architecture with convolutional and downsampling layers is due to Fukushima (1979).[CNN1] The popular downsampling variant called max-pooling was introduced by Weng et al. (1993).[CNN3] In 1987, NNs with convolutions were combined by Waibel with weight sharing and backpropagation.[CNN1a] Waibel did not call this CNNs but TDNNs. LeCun's team later contributed improvements of CNNs, especially for images[CNN2,4] (see Sec. XVIII). Finally, my own team showed in 2010[MLP1] unsupervised pre-training is not necessary to train deep NNs, contrary to claims by Hinton[VID1] who said that "nobody in their right mind would ever suggest" this. Then we Our fast GPU-based CNN of 2011[GPUCNN1] known as DanNet[DAN,DAN1][R6] CNNs of 2006.[GPUCNN] winning four of them in a row (15 May 2011, 6 Aug 2011, 1 Mar 2012, 10 Sep 2012).[GPUCNN5] at IJCNN 2011 in Silicon Valley, DanNet blew away the competition and achieved the first superhuman visual pattern recognition[DAN1] in an international contest (where LeCun's team took a distant second place, with DanNet was also the first deep CNN to win: a Chinese handwriting contest (ICDAR 2011), an image segmentation contest (ISBI, May 2012), CVPR paper on DanNet[GPUCNN3] of Hinton's student Krizhevsky won the ImageNet[IM09] 2012 contest[GPUCNN4-5][R6] (now also without unsupervised pre-training, citing DanNet). Our CNN image scanners were 1000 times faster than previous methods.[SCAN] The VGG network (ImageNet 2014 winner)[GPUCNN9] and other highly cited CNNs[RCNN1-3] further extended the work of 2011.[MIR](Sec. 19) ResNet, the ImageNet 2015 winner[HW2] (Dec 2015) which currently gets more citations per year[MOST] Highway Net (May 2015).[HW1-3][R5] The Highway Net is actually the feedforward net version of vanilla LSTM.[LSTM2] It was the first working, really deep feedforward NN with hundreds of layers (previous NNs had at most a few tens of layers). See also Sec. XVIII & XIV & XI & VI.

      Critique of 2018 Turing Award appeared long before the 1980s. were proposed already in the 1940s/50s[MC43][K56] (but don't forget prior work in physics since the 1920s[L20][I25][K41][W45]). deep convolutional NN architecture was proposed in the 1970s.[CNN1] NNs without hidden layers learned in 1958[R58] regression and the method of least squares[DL1-2]). about deeper adaptive NNs[R61,R62] layers (already containing the now popular multiplicative gates).[DEEP1-2][DL1-2] A paper of 1971[DEEP2] highly cited method which was still popular in the new millennium,[DL2] especially in Eastern Europe, where much of Machine Learning was born. Ivakhnenko did not call it an NN, but that's what it was.[MIR](Sec. 1)[R8] LBH failed to cite this. XIII & III & V & VIII & IX & X. LBH & co-authors, e.g., Sejnowski[S20] (see Sec. XIII). It goes more or less like this: "In 1969, Minsky & Papert[M69] researchers took a fresh look at the problem in the 1980s."[S20] However, as mentioned above, the 1969 book[M69] addressed a "problem" of Gauss & Legendre's shallow learning (~1800)[DL1-2] that had already been solved 4 years prior by Ivakhnenko & Lapa's popular deep learning method.[DEEP1-2][DL2] Minsky was apparently unaware of this and failed to correct it later.[HIN](Sec. I) (but see a 1989 paper[MOZ]). However, it became really deep in 1991 in my lab,[UN-UN3] which has See Sec. 1 of the overview:[MIR] First Very Deep NNs, Based on Unsupervised Pre-Training (1991). "Very Deep Learning" tasks of depth > 1000.[UN2][DL1][UN] (By 2003, LSTM variants successfully dealt with language problems of depth up to 30,000[LSTM17] more.) drove the shift from unsupervised pre-training to purely supervised learning (1991-95; 2006-10).[HIN](Sec. II)[MIR] (Sec. 19) III. Note that LSTMs brought essentially unlimited depth to supervised recurrent NNs; Highway Nets[HW1-3] brought it to feedforward NNs.[MOST]

      Critique of 2018 Turing Award by others (Sec. III).[DLC][DEEP1-2][BP1][DL1-2][R7-R8][R2-R4] deep learning multilayer perceptrons (1965),[DEEP1-2][R8] modern backpropagation (1970),[BP1,2][R7] architectures of recurrent NNs (1943-56)[MC43][K56] and convolutional NNs (1979),[CNN1] principles of generative adversarial NNs and artificial curiosity (1990),[AC90,90b][AC20] unsupervised pre-training for deep NNs,[UN1-2] the vanishing gradient problem (1991)[VAN1] & solutions to it (Sec. A), GPU-accelerated NNs (2004),[GPUNN][GPUCNN5] and other foundations.[DL1-2][R2-R8] Often LBH failed to cite essential prior work.[DLC][HIN][MIR](Sec. 21) II & V & XIII & IX & X & XVII & XII & XVIII & XX & I. deeplearning.net which until 2019 advertised deep learning as "moving beyond shallow machine learning since 2006",[DL7] referring to Hinton's[UN4] and Bengio's[UN5] we had this type of deep learning already in 1991;[UN][UN1-2] see Sec. II & XVII (5). Not to mention Ivakhnenko's even earlier supervised layer-wise training of deep NNs[DEEP1-2] which Hinton,[UN4] Bengio,[UN5] and LBH[DL3,DL3a] did not cite either. See Sec. X.

      Critique of 2018 Turing Award my comments systematically track the sequential order of ACM's claims.[T19]

      ACM's statement on Turing is greatly misleading, like some of its other statements.[T19] any type of computation-based AI.[GOD][BIB3][MIR](Sec. 18)[GOD21,21a] Much of early AI in the 1940s-70s was actually about theorem proving[ZU48][NS56]

      In 1936, Turing Turing Machine.[TUR] He rederived the above-mentioned result,[CHU][TUR][HIN][GOD21,21a][TUR21][LEI21,21a] In the same year of 1936, Emil Post published yet another independent universal model of computing,[POS] my reply to Hinton who criticized my website on Turing without suggesting any fact-based corrections.[HIN]) open problem "P=NP?" in his famous letter to John von Neumann (1956).[GOD56][URQ10] Likewise, Konrad Zuse (1910-1995) created the world's first working programmable general-purpose computer 1935-41. His patent application of 1936[ZU36-38][Z36][RO98][ZUS21] predating Claude Shannon's 1937 thesis on digital circuit design.[SHA37] Zuse also created the first high-level programming language in the early 1940s.[BAU][KNU] conditional jump instruction.[RO98]

      Critique of 2018 Turing Award that learn internal representations (1965),[DEEP1-2][R8] modern backpropagation (1970),[BP1,2][R7] architectures of recurrent NNs (1943-56)[MC43][K56] and convolutional NNs (1979),[CNN1] principles of generative adversarial NNs and artificial curiosity (1990),[AC][AC90,90b][AC10][AC20] unsupervised pre-training for deep NNs (1991),[UN1-2][UN] vanishing gradients (1991)[VAN1] & solutions to it (Sec. A),[LSTM0-17][CTC] (2004),[GPUNN][GPUCNN5] record-breaking deep supervised NNs (2010)[MLP1-2] and contest-winning deep CNNs (2011),[DAN][DAN1][GPUCNN5] NNs with over 100 layers (2015),[HW1-3][R5] transformer-like[TR1-6][FWP] attention[FWP][ATT] through fast weight programmers (1991),[FWP0-2,6] and more.[DL1-2][R2-R8] Often LBH failed to cite essential prior work.[DL3,DL3a][DLC][HIN][MIR](Sec. 21)[R2-R5,R7,R8,R11] II & I & III & XIII & X & XVII & XII & XVIII & XX.

      Critique of 2018 Turing Award "advances in natural language processing" and in speech supervised NNs and CNNs achieved by our group 2010-2011[MLP1-2][DAN][DAN1][GPUCNN5][R6] and through Highway Net-like NNs (2015),[HW1-3][R5] although the principles of CNNs were invented and developed by others since the 1970s.[CNN1-4] See Sec. D & XVIII & XIV as well as Sec. 4 & Sec. 19 of the overview.[MIR]

      Critique of 2018 Turing Award DanNet[DAN][DAN1][GPUCNN5] the first NN to win a medical imaging contest through deep learning (Sept 2012, on cancer detection).[GPUCNN5,8] and were able to greatly improve steel defect detection.[ST] All of this happened before the similar GPU-accelerated AlexNet of Hinton's student Krizhevsky won ImageNet 2012.[GPUCNN5][R6] mitosis detection.[MGC][GPUCNN5,8] approach of D & XI).

      Critique of 2018 Turing Award without citing them.[DL1][DLC][HIN][R2-R4][R7-R8] V & XII & XIX & II & III & XIII & XVII & X & I.

      Critique of 2018 Turing Award who failed to cite them, even in later work.[HIN][DLC][DL1-2][DEEP1-2][CMB][R7-R8] See Sec. II & III & XIII & V & X & XIV & I.

      Critique of 2018 Turing Award first introduced to Machine Learning by Dechter (1986), and to NNs by Aizenberg et al (2000).[DL2] To my knowledge, LBH have never cited them. (Margin note: our 2005 paper on deep RL[DL6,6a] was the first machine learning LBH started talking about "deep learning ... moving beyond shallow machine learning since 2006",[DL7] referring to their unsupervised pre-training methods of 2006. See Sec. III. others built careers on this notion long before LBH recognized this.[DEEP1-2][CNN1][HIN][R8][DL1][DLC] Even deep learning through unsupervised pre-training was introduced by others.[UN1-3][R4][HIN](Sec. II) II & III & XIII & V & I.

      Critique of 2018 Turing Award ignored by LBH's papers[HIN][R7-R8][R2-R5] (see Sec. V & II & III & I & XIII & XII & XIX & X & XVII).

      ACM correctly mentions advancements through GPUs. The first to use GPUs for NNs were Jung & Oh (2004),[GPUNN][GPUCNN5] made GPU-based NNs fast and deep enough an important benchmark record,[MLP1-2] unsupervised pre-training (pioneered by myself in 1991) is not necessary to train deep NNs, contrary to Hinton's claims.[VID1] our CNNs were deep and fast enough[DAN][DAN1][GPUCNN5] vision (explicitly mentioned by ACM) for the first time[R6] (see Sec. D).

      Furthermore, by the mid 2010s, speech recognition and machine translation (explicitly mentioned by ACM) were actually dominated by LSTM and CTC of our team.[LSTM1-4][CTC] In particular, as mentioned in Sec. A, such as HMMs.[BW][BOU][BRI][HYB12] As mentioned in Sec. B and XVI, the first superior end-to-end neural machine translation was also based on LSTM.

      Critique of 2018 Turing Award ACM's statement is "less wrong" than Honda's[HIN](Sec. I) but still (and apparently even other award committees[HIN](Sec. I) backpropagation by Rumelhart et al. (1985-86)[RUM] (1982).[BP2] And the article[RUM] even failed to mention Linnainmaa, the inventor of this famous algorithm for credit assignment in networks (1970),[BP1] Kelley already had a precursor thereof in the field of control theory;[BPA] see also later work of the early 1960s.[BPB][BPC][R7] internal representations in hidden layers of NNs.[RUM] But this was essentially just an experimental analysis of a known method.[BP1-2] And history of backpropagation can be found at Scholarpedia[DL2] and in my award-winning survey.[DL1] Also see Sec. XIX, II.

      Some claim that "backpropagation is just the chain rule of Leibniz (1676) & L'Hopital (1696)." No, it is the efficient way of applying the chain rule to big networks with differentiable nodes (there are also many inefficient ways of doing this). It was not published until 1970.[BP1] recent debate:[HIN] It is true that in 2018, Hinton[AOI] credited Rumelhart[RUM] with the "invention" of backpropagation. for "creating" the method and for other things he didn't do.[HIN] Neither in a popular book[AOI] nor in other recent work[DL3,DL3a] did he cite Linnainmaa (1970),[BP1] the true creator.[BP4-5] that his 2015 survey[DL3] does cite Werbos (1974) who however described the method correctly only later in 1982[BP2] and also failed to cite Linnainmaa[BP1] (compare Amari's work of 1977[BP6]). Linnainmaa's method was well-known.[BP5][DL1-2][DLC] It wasn't created by "lots of different people" as Hinton suggested,[AOI][HIN][R11] but by one person who published first[BP1] and therefore should get the credit.

      Critique of 2018 Turing Award Boltzmann Machine (BM)[BM] a learning.[HIN] Recently, however, I learnt through a reader that even the BM paper[BM] did not cite prior relevant work by Sherrington & Kirkpatrick[SK75] and Glauber.[G63] (Compare related work.[H86][H88][S93]) multilayer perceptrons with arbitrarily many layers.[DEEP1-2][HIN] Sec. II V & X.[MIR](Sec. 1)[R8]

      As mentioned in Sec. II, Sejnowski's rather self-serving "history of deep learning" [S20] claims: In 1969, Minsky & Papert[M69] at the problem in the 1980s."[S20] However, the 1969 book[M69] addressed a "deep learning problem" (a limitation of Gauss & Legendre's shallow learning around 1800[DL1-2]) that had already been solved four years prior (see Sec. II), also in the 1970s, especially outside of the Anglosphere.[DEEP2][BP6][CNN1][DL1-2]

      Critique of 2018 Turing Award Dropout is actually a variant of Hanson's much earlier stochastic delta rule (1990).[Drop1-2] Hinton's 2012 paper and his later patent did not cite this either. as we showed already in 2011 in a contest where LeCun's team participated as well,[DAN1] Sec. D above. Back then, the only really of deep CNNs through GPUs.[GPUCNN1,3,5][R6] Already before ImageNet 2012,[R6] fast deep CNN called DanNet a monopoly on winning computer vision competitions.[GPUCNN5] It more than "halved the error rate for object recognition" (ACM's wording) in a contest already in 2011[GPUCNN2][DAN,DAN1][R6] long before the similar system of Hinton's student. See Sec. D as well as Sec. 19 of the overview.[MIR]

      Critique of 2018 Turing Award since the late 1980s.[BW][BRI][BOU] LSTM (1990s-2005)[LSTM0-6] and CTC[CTC] (2006), which were applied to speech in 2007.[LSTM4][LSTM14] CTC-LSTM is end-to-end-neural and thus very different from (and superior to) the hybrid methods since the late 1980s.[BW][BRI][BOU][HYB12] See also Sec. A.

      Critique of 2018 Turing Award 5 years earlier, in 1995, we already had a similar, excellent neural probabilistic text model.[SNT] Bengio[NPM] characterizes it only briefly as "related" (see also Pollack's earlier work on embeddings of words and other structures[PO87][PO90]). In the 2010s, was actually the LSTM of our team,[LSTM0-6] which Bloomberg called the "arguably the most commercial AI achievement."[AV1][MIR](Sec. 4) See Sec. B. Bengio's team[ATT14] has indeed become important. For example, it helped to further improve Facebook's LSTM-based translation (see Sec. B). adaptive neural sequential attention: end-to-end-differentiable "soft" attention in the latent space of Fast Weight Programmers (FWPs),[FWP2][FWP] and "hard" attention (in observation space) in the context of RL[ATT][ATT0-1] (1990). attention-based Transformers[TR1-6] are FWPs of 1991[FWP0-1] which have become a popular alternative to RNNs. My FWP of 1991[FWP0-1] (now often called keys and values for self-attention).[TR1-6][FWP] the 2010s,[DEC] Transformers[TR1-2] a traditional LSTM domain (see Sec. B). rapidly learn to solve quickly[LSTM13,17] linear Transformers or Performers[TR5-6] which are formally equivalent to my 1991 FWPs (apart from normalization).[FWP6][FWP] In 1993, I introduced the attention terminology[FWP2] now used in this context,[ATT] and RNNs that program themselves.

      See[MIR](Sec. 9)[R4] for my related priority dispute on attention with Hinton. He was the reviewer of my 1990 paper[ATT2] his own work:[ATT3]

      Critique of 2018 Turing Award GANs[GAN0-1] (2010-2014) are actually a simple application[AC] of the adversarial curiosity (AC) principle from 1990[AC90,90b][AC20] (see also surveys[AC09-10]). This principle is now widely used for exploration in RL (e.g., Sec. C) and for image synthesis[GAN1] (also mentioned by ACM in Sec. XVIII). predictor NN minimizes its error, while the generator NN tries to make outputs that maximize this error: one net's loss is the other net's gain. 4 years before the GAN paper,[GAN1] a well-known 2010 survey[AC10] summarised the generative adversarial NNs of 1990 as follows: a whether the controller's (or generator's) output is in a given set.[AC20][AC] early adversarial machine learning settings[S59][H90] neither involved unsupervised NNs nor were about modeling data nor used gradient descent.[AC20]) Bengio et al. neither cited the original work[AC90,90b][AC20] nor corrected their erroneous claims[GAN1] about the other (1991).[PM1-2][AC20][R2][MIR](Sec. 5) Bloomberg,[AV1] their NIPS 2014 paper[GAN1] and some of the erroneous claims it made about my prior work.[AC20] Goodfellow eventually admitted that PM is adversarial (his paper[GAN1] still claims the opposite), but emphasized that it's not generative. However, the even earlier AC[AC90,90b][AC10][AC20] is both adversarial and generative (its generator contains probabilistic units[AC90] like in StyleGANs[GAN2]). When the authors[GAN1] I published one myself in the hopes of correcting the annals of history.[AC20] that they are instances of my earlier work.[R2][AC20] vanishing gradient problem,[MIR](Sec. 3)[VAN1] Bengio published his own,[VAN2] without citing Sepp. was settled in favor of Sepp.[VAN1] However, even after a common publication,[VAN3] Bengio published papers[VAN4][XAV] are poor indicators of truly pioneering work.[NAT1] (Margin note: Bengio states[YB20] that in 2018 he one must at least clarify it later,[DLC] Bengio also claims[YB20] that in 1995 my publications on exactly this topic date back to 1991-93.[UN0-2][UN] which I started in 1987[META1][META] long before Bengio that he did it before me.[R3] Bengio also writes[YB20] that in Regarding attention-based Transformers,[TR1-6] Bengio[DL3a] cites his own team (2014) for "soft attention" without citing my much earlier original work of 1991-1993 on soft attention and linear Transformers.[FWP,FWP0-2,6] Bengio has also heavily used our LSTM (see Sec. A-C), "gated recurrent units (GRU)"[LSTMGRU] for a variant of our vanilla LSTM architecture[LSTM2] (2000) which he did not cite although our work[LSTM2] was the one that introduced gated recurrent units. In addition, our team automatically evolved lots of additional LSTM variants and topologies already in 2009[LSTM7] without changing the name of the basic method. learn to count[LSTMGRU2] nor learn simple non-regular languages;[LSTMGRU2] they according to Google Brain.[LSTMGRU3]) unsupervised pre-training for deep NNs.[UN0-4][HIN](Sec. II)[MIR](Sec. 1) Hinton's paper[UN4] (2006) appeared long after my earlier work on this[UN0-2] the first NNs shown to solve very deep problems (see Sec. II above).[UN] It was published in 1991-92[UN1] when compute was about 1000 times more expensive than in 2006. survey (2015),[DL3][DLC] See also Sec. II & III. compressing or distilling one NN into another.[UN0-2][DIST1-2][MIR](Sec. 
2) Hinton[DIST2] (2006) did not cite my much earlier original work on this (1991),[UN1][UN] not even in his later patent application fast weight programmers[FWP][FWP0-4a] through tensor-like outer products (1991-2016) and their motivation[FWP2][FWP4a][MIR](Sec. 8) (see also Sec. XVI above). learning sequential attention with NNs.[MIR](Sec. 9) Hinton[ATT3] (2010) our much earlier work on this[ATT1][ATT] although he was both reviewer and editor of my summary[ATT2] (1990; see Sec. XVI above).

      The ten priority disputes mentioned in the present Sec. XVII are not the only ones.[R4] Remarkably, three of them are related to the 1991 paper[UN1][UN] which in many ways started what people now call deep learning. Most of them go back to work of 1990-91.[MIR] See Sec. I for additional related issues of credit assignment.

      Critique of 2018 Turing Award LeCun's team has made important contributions to CNNs since 1989.[CNN2,4] However, the basic CNN architecture with convolutional and downsampling layers is actually due to Fukushima (1979).[CNN1] NNs with convolutions were later (1987) combined by Waibel with weight sharing and backpropagation.[CNN1a] Waibel called this TDNN and All of this happened before LeCun's work on CNNs. See Sec. D above and Sec. 21 of the overview of our Annus Mirabilis 1990-1991.[MIR] at IJCNN 2011 in Silicon Valley, our DanNet[DAN][GPUCNN1-3] won the superhuman performance three times worse performance).[DAN1] Again see Sec. D. at ICPR 2012, our DanNet[GPUCNN1-3] won the medical imaging contest (Sept 2012, on detection of mitosis/cancer)[GPUCNN5,7,8] (before the similar AlexNet won ImageNet 2012[GPUCNN5][R6] and the similar VGG network[GPUCNN9] won ImageNet 2014). mitosis detection.[MGC][GPUCNN5,7,8] Many major companies are using it now. See Sec. D & VII. ACM also explicitly mentions speech recognition, speech synthesis,[AM16][DL1] All of these fields were heavily shaped in the 2010s by our non-CNN methods.[DL1][DL4][AM16][GSR][GSR15][GT16][WU][FB17] See Sec. A, B, VI, XI.

      Critique of 2018 Turing Award As mentioned in Sec. XII, backpropagation was actually proposed earlier as a learning method for NNs by Werbos (1982)[BP2-4] (see also Amari's work of 1977[BP6]). recent work.[DL3,DL3a][DLC] In 1960, Kelley already had a precursor of the algorithm.[BPA] Furthermore, many besides LeCun have worked "to speed up backpropagation algorithms"[DL1] (ACM's wording). More on the history of backpropagation can be found at Scholarpedia.[DL2][BP4]

      Critique of 2018 Turing Award However, "hierarchical feature representation" in deep learning networks is what Ivakhnenko & Lapa (1965)[DEEP1-2] (and also Fukushima[CNN1][DL2]) had long before LeCun. See Sec. D & II & XIII & V.

      Critique of 2018 Turing Award LeCun et al. neither cited the origins[BP1] (1970) of this widely used type of automatic differentiation for differentiable networks of modules[DL2][BP4-5][DLC] for such systems.[S80] See also Sec. XIX & XII. before LeCun who did not cite them. See also Pollack's even earlier relevant work.[PO87-90]

      (Furthermore, "complex networks of modules where backpropagation is performed" were the central theme of my much earlier habilitation thesis (1993).[UN2] For example, our adaptive subgoal generators (1991)[HRL0-2] were trained through end-to-end-differentiable chains of such modules.[MIR](Sec. 10) planning and reinforcement learning with recurrent neural world models (1990).[PLAN][MIR](Sec. 11) Same for my linear transformer-like fast weight programmers[FWP0-2][FWP][ATT][MIR](Sec. 8) since 1991 (see Sec. XVI) see "100 Authors against Einstein."[AH1] ad hominem attacks[AH2-3][HIN] "If you cannot dispute a fact-based message, attack the messenger himself."[HIN] award can ever change that.[HIN] and their co-workers have contributed useful improvements of deep learning methods.[CNN2,4][CDI][LAN][RMSP][XAV][ATT14][CAPS] whom they did not cite II, V, XII, XIX, XXI, XIII, XIV, XI, and XX, and 2). Sec. I, A, B, C, D, XVII, VI, and XVI). As emphasized earlier:[DLC][HIN] to self-correction,"[SV20] as is already the standard in other scientific fields. in popular science venues without peer review? For example, the narrator of a popular 2018 Bloomberg video[VID2] Germany and Switzerland (LSTM & CTC; see Sec. A) long before Hinton's methods. Similarly, in 2016, the NY Times published an article[NYT3] Google's original 2016 paper on Google Translate[WU] mentions LSTM over 50 times (see Sec. B). In ad hominem style,[AH2-3] claiming credit he doesn't deserve for many, many things",[NYT1] without LeCun also called the GANs of Bengio's team[GAN1] GANs are variations of my work in 1990.[AC90,90b][AC20][R2] According to Bloomberg,[AV2] Bengio has simply "denied my claims" without backing up his denial by any facts; see Sec. XVII. and forcefully contradict public figures who promote it."[FAKE] LBH, who called themselves the deep learning conspiracy,[DLC] Our LSTM paper[LSTM1] has got more citations than any paper by Bengio or LeCun,[R5] Hinton's most cited paper (2012) is the one on GPU-based CNNs.[GPUCNN4][R5] It follows our earlier work on supervised deep NNs (2010)[MLP1] unsupervised pre-training for deep NNs by myself [UN][UN0-3] and later championed by Hinton;[UN4][VID1] see Sec. D). Hinton (2012)[GPUCNN4] characterizes our deep and fast DanNet (2011)[GPUCNN1-3] as AlexNet won one;[R6] see Sec. D, XIV. The highly cited VGG network (2014)[GPUCNN9] Hinton's 2nd most cited paper[RUM][R5] of Hinton's paper,[RUM] adding citations for a book by Rumelhart & McClelland[R5]). Backpropagation is a previously invented method[BP1] whose origins of Ivakhnenko whom he has never cited;[DEEP1-2][R7-R8] see Sec. II, XIII. Bengio's 2nd most cited research paper is the one on GANs (2014),[GAN1] instances of my artificial curiosity (1990)[AC90,90b][AC20][R2] which he did not cite; see Sec. XVII. Hinton's highly cited papers on unsupervised pre-training for deep NNs (2006-)[UN4] by ours[UN0-2][UN] were preceded by Hanson's[Drop1-2] As recently as of 2021, ACM published yet another misleading deep learning "survey" by LBH,[DL3a] again heavily citing LBH without Consult the Executive Summary and Sec. I-XXI of this critique for more. So virtually all the algorithms that have attracted have their conceptual and technical roots in my labs in Munich and Lugano,[MOST] of deep learning MLPs since 1965[DEEP1-2] (see Sec. II, XX) and backpropagation (1960-70)[BPA][BP1] (see Sec. XIX, XII) and convolutional NNs since 1979[CNN1-4] (see Sec. XVIII, D). Our LSTM (1990s, see Sec. A, B; also for RL, 2003-, see Sec. 
C) → our Highway Net (May 2015) → ResNet (Dec 2015, see Sec. D). Our adversarial Artificial Curiosity (1990) → GANs (2010s, see Sec. XVII). our own unsupervised pre-training of deep NNs (1991, see Sec. II & III) for recurrent NNs in the 1990s → our LSTM (see Sec. A-C) and for feedforward NNs in 2010 → our DanNet (2011) → AlexNet (2012); VGG Net (2014) (see Sec. D). our LSTM brought essentially unlimited depth to supervised recurrent NNs in the 1990s; our Highway Nets[HW1-3] brought it to feedforward NNs in May 2015.[MOST] superior computer vision (2011, see Sec. D, XVIII), medical diagnosis (2012, see Sec. VII, XVIII), and many other applications.[DEC] speech recognition (with our CTC, 2007-15, see Sec. A), machine translation (2016, see Sec. B), robotics & video game players (2018-19, see Sec. C), and many other applications.[DEC] Fast Weight Programmers (1991, see Sec. XVI) are formally equivalent to linear Transformers (now popular in NLP). I, A, B, C, D, VII, XVIII.

      As mentioned earlier,[MIR](Sec. 21) it is not always clear[DLC] depth that really learned.[DEEP1-2][R8] Five years later, modern backpropagation

      Yes, this critique is also an implicit critique of certain other awards to LBH.[HIN] reddit.com/r/MachineLearning[R1-R12] (the largest machine learning forum with back then over 800k subscribers), many of them influenced by my overview.[MIR]

      Dr. LeCun himself is well aware of the challenges to scientific integrity in our field:[LECP] "... else cites."[LECP]

      Note that I am insisting on proper credit assignment not only in my own research field but also in quite disconnected areas,[HIN] as demonstrated by my numerous letters in this regard published in Science and Nature, e.g., on the history of aviation,[NASC1-2] the telephone,[NASC3] the computer,[NASC4-7] resilient robots,[NASC8] and scientists of the 19th century.[NASC9] AI scientists and AI historians equipped with artificial curiosity[SA17][AC90-AC20][PP-PP2]

      Creative Commons LicenseThanks to many expert reviewers for useful comments. Since science is about self-correction, let me know under juergen@idsia.ch if you can spot any remaining error. Many additional relevant publications can be found in my publication page and my arXiv page. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. J.  Schmidhuber (AI Blog, 2021). 3 decades of artificial curiosity & creativity. Our PDF. The first paper on planning with reinforcement learning recurrent neural networks (NNs) (more) and on generative adversarial networks (more). PDF. More. PDF. PDF. PDF. PDF. (More on artificial scientists and artists.) IEEE link. PDF. With a brief summary of the generative adversarial neural networks of 1990[AC90,90b][AC20] (more). Preprint arXiv/1906.04493. Link. Link. [AIB] J. Schmidhuber. AI Blog. Includes variants of chapters of the AI Book. Blog of Werner Vogels, CTO of Amazon (Nov 2016): [ATT] J. Schmidhuber (AI Blog, 2020). 30-year anniversary of end-to-end differentiable sequential neural attention. Plus goal-conditional reinforcement learning. We had both hard attention (1990) and soft attention (1991-93).[FWP] Today, both types are very popular. PDF. PDF. More. PS. (PDF.) arXiv/1409.0473, 2014-16. Bloomberg, May 15, 2018. Bloomberg, May 17, 2018. PDF. HTML. PDF. Precursor of modern backpropagation.[BP1-4] PDF. Link. PDF. First application of backpropagation[BP1] to NNs (concretizing thoughts in his 1974 thesis). [BP4] J. Schmidhuber (AI Blog, 2014; updated 2020). Who invented backpropagation? More.[DL2] English version: [CNN1+]. More in Scholarpedia. Link. [CNN1a] A. Waibel. Phoneme Recognition Using Time-Delay Neural Networks. Meeting of IEICE, Tokyo, Japan, 1987. First application of backpropagation[BP1][BP2] and weight-sharing PDF. Spatial Averaging.[CNN1] PDF. PDF. PDF. PDF. Beijing, 2014. Preprint arXiv:1402.3511 [cs.NE]. J. Schmidhuber (AI Blog, 2021). 10-year anniversary. In 2011, DanNet triggered the deep convolutional neural network (CNN) revolution. Named 1st superhuman result in 2011.[DAN1] J. Schmidhuber (AI Blog, 2011; updated 2021 for 10th birthday of DanNet): First superhuman visual pattern recognition. our artificial neural network called DanNet [DEC] J. Schmidhuber (AI Blog, 02/20/2020, revised 2021). The 2010s: Our Decade of Deep Learning / Outlook on the 2020s. The [DIST1] J. Schmidhuber, 1991.[UN-UN2] More. Deep Learning. HTML. [DL3a] Y. Bengio, Y. LeCun, G. Hinton (2021). Turing Lecture: Deep Learning for AI. Communications of the ACM, July 2021. HTML. Local copy (HTML only). [DL4] J. Schmidhuber (AI Blog, 2017). Our impact on the world's most valuable public companies: Apple, Google, Microsoft, Facebook, Amazon... By greatly improved (CTC-based) on-device speech recognition (on the phone, not the server) LSTM. PDF. J. Schmidhuber (AI Blog, Nov 2020). 15-year anniversary: 1st paper with "learn deep" in the title (2005). Our deep reinforcement learning & neuroevolution solved problems of depth 1000 and more.[DL6] Soon after its publication, everybody started talking about "deep learning." Causality or correlation? Web site deeplearning.net of Y. Bengio's MILA (2015, retrieved May 2020; compare the version in the Internet Archive), referring to Hinton's[UN4] and Bengio's[UN5] unsupervised pre-training for deep NNs[UN4] (2006) although this type of deep learning dates back to 1991.[UN1-2][UN] II & XVII & III. [DLC] J. Schmidhuber (AI Blog, June 2015). 
Critique of Paper by "Deep Learning Conspiracy" (Nature 521 p 436). arxiv:1312.5602. Link. Alphastar has a "deep LSTM core." arXiv:1808.03578, 2018. used LSTM over 4 billion automatic translations per day (The Verge, August 4, 2017); Facebook blog by J.M. Pino, A. Sidorov, N.F. Ayan (August 3, 2017) PDF. J.  Schmidhuber (AI Blog, 26 March 2021). alternative[FWP0-1] to recurrent NNs. the fast weights[FAST,FASTa] of Such Fast Weight Programmers[FWP0-6,FWPMETA1-7] can learn to memorize past data, e.g., by computing fast weight changes through additive outer products of self-invented activation patterns[FWP0-1] (now often called keys and values for self-attention[TR1-6]). The similar Transformers[TR1-2] combine this with projections linear Transformers or Performers[TR5-6] In 1993, I introduced the attention terminology[FWP2] now used in this context,[ATT] and RNNs that program themselves. PDF. PDF. HTML. Pictures (German). PDF. Preprint: arXiv:1811.12143. PDF. PDF. Like [FWP0-2]. Preprint: arXiv:2003.08165. PDF. HTML overview. Linear Transformers Are Secretly Fast Weight Programmers. ICML 2021. Preprint: arXiv:2102.11174. Preprint: arXiv:2106.06295 (June 2021). PDF. An introspective network that can learn to run its own weight change algorithm. In Proc. of the Intl. Conf. on Artificial Neural Networks, J. Schmidhuber. Habilitation thesis, TUM, 1993. PDF. can be found here. Preprint arXiv:2012.14905 [cs.LG], 2020. Report arXiv:2011.07831 [cs.AI], 2020. Google Research Blog, Sep 2015, see also Aug 2015 Google's speech recognition based on CTC and LSTM. Alphr Technology, Jul 2015, or 9to5google, Jul 2015 WIRED, Sep 2016, siliconANGLE, Sep 2016 Blog post, Internet Archive, 2010. A blog post describing the basic ideas[AC][AC90, AC90b][AC20] of GANs. Description of GANs that does not cite the original work of 1990[AC][AC90,AC90b][AC20][R2] (also containing wrong claims about Predictability Minimization[PM0-2][AC20]). Link. This was number 1 on Hacker News. Frankfurter Allgemeine Zeitung, 16/6/2021. Preprint arXiv/2005.14165. for Image Classification. International Joint Conference on Artificial Intelligence (IJCAI-2011, Barcelona), 2011. PDF. ArXiv preprint. win four important computer vision competitions 2011-2012 before others won any PDF. HTML overview. competitor.[DAN1] This led to massive interest from industry. [GPUCNN3] D. C. Ciresan, U. Meier, J. Schmidhuber. Multi-column Deep Neural Networks for Image Classification. Proc. IEEE Conf. on Computer Vision and Pattern Recognition CVPR 2012, p 3642-3649, July 2012. PDF. Longer TR of Feb 2012: arXiv:1202.2745v1 [cs.CV]. More. PDF. J. Schmidhuber (AI Blog, 2017; updated 2021 for 10th birthday of DanNet): History of computer vision contests won by deep CNNs since 2011. DanNet won 4 of them in a row before the similar AlexNet/VGG Net and the Resnet (a Highway Net with open gates) joined the party. Today, deep CNNs are standard in computer vision. PDF. PDF. first deep learner to win a medical imaging contest (2012). HTML. [HIN] J. Schmidhuber (AI Blog, 2020). Critique of Honda Prize for Dr. Hinton. Science must not allow corporate PR to distort the academic record. PDF. North-Holland, 1991. PDF. Extending TR FKI-129-90, TUM, 1990. PDF. PDF. Preprints arXiv:1505.00387 (May 2015) and arXiv:1507.06228 (July 2015). Also at NIPS 2015. The LSTM with forget gates[LSTM2] for RNNs.) Resnets[HW2] are a version of this where the gates are always open: g(x)=t(x)=const=1. 
Highway Nets perform roughly as well as ResNets[HW2] on ImageNet.[HW3] Highway layers are also often used for natural language processing, where the simpler residual layers do not work as well.[HW3]
arXiv:1512.03385 (Dec 2015). Residual nets are a version of Highway Nets.[HW1]
arxiv:1612.07771 (2016). Also at ICLR 2017.
Preprint arXiv:1704.04760.
arXiv:1607.06450, 2016.
A New Publishing Model in Computer Science.
[LEI21] J. Schmidhuber (AI Blog, 2021). 375th birthday of Leibniz, founder of computer science. Frankfurter Allgemeine Zeitung (FAZ), 17/5/2021. FAZ online: 19/5/2021.
[LSTM1] S. Hochreiter, J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. Based on [LSTM0].
Preprint: arxiv:1506.07452.
J. Schmidhuber (AI Blog, Dec 2020). 10-year anniversary of our journal paper on deep reinforcement learning with policy gradients for LSTM (2007-2010).
Preprint arXiv:1805.04908. Preprint arXiv:1703.03906.
J. Schmidhuber (AI Blog, 2020). 1/3 century anniversary of; searchable PDF scan (created by OCRmypdf, which uses LSTM); better GP methods through Meta-Evolution.
[MIR] J. Schmidhuber (AI Blog, 2019). Deep Learning: Our Miraculous Year 1990-1991. Preprint arXiv:2005.05744, 2020.
Neural Computation 22(12): 3207-3220, 2010. ArXiv Preprint.
J. Schmidhuber (AI Blog, Sep 2020). 10-year anniversary of supervised deep learning breakthrough (2010). No unsupervised pre-training. By 2010, when compute was 100 times more expensive than today, both our feedforward NNs[MLP1]
J. Schmidhuber (AI Blog, 2021). The most cited neural networks all build on work done in my labs. Foundations of the most popular NNs originated in my labs at TU Munich and IDSIA. Here I mention: (1) Long Short-Term Memory (LSTM), (2) ResNet (which is our earlier Highway Net with open gates), (3) AlexNet and VGG Net (both citing our similar earlier DanNet: the first deep convolutional NN to win image recognition competitions), (4) Generative Adversarial Networks (instances of my earlier Adversarial Artificial Curiosity), and (5) variants of Transformers (linear Transformers are formally equivalent to my earlier Fast Weight Programmers). Most of this emerged from the Annus Mirabilis of 1990-1991.[MIR]
Preprint arXiv:1611.01578, 2017.
[NASC1] J. Schmidhuber. First Pow(d)ered flight / plane truth. Correspondence, Nature, 421, p 689, Feb 2003.
[NASC3] J. Schmidhuber. The last inventor of the telephone. Letter, Science, 319, no. 5871, p 1759, March 2008.
Correspondence, Nature, vol 483, p 541, March 2012, doi:10.1038/483541b.
Letter, Science, vol 336, p 1639, June 2012. See also comment on response by A. Hodges (DOI:10.1126/science.336.6089.1639-a).
[NASC6] J. Schmidhuber. Colossus was the first electronic digital computer. Correspondence, Nature, 441, p 25, May 2006.
[NASC7] J. Schmidhuber. Turing's impact. Correspondence, Nature, 429, p 501, June 2004.
[NASC8] J. Schmidhuber. Prototype resilient, self-modeling robots. Correspondence, Science, 316, no. 5825, p 688, May 2007.
[NASC9] J. Schmidhuber. Comparing the legacies of Gauss, Pasteur, Darwin. Correspondence, Nature, vol 452, p 530, April 2008.
NY Times article. Learning Dexterous In-Hand Manipulation. arxiv:1312.5602. arxiv:1912.06680. An LSTM with 84% of the model's total parameter count was the core of OpenAI Five, 2018.
J. Schmidhuber (AI Blog, 2020). 30-year anniversary of planning & reinforcement learning with recurrent world models and artificial curiosity (1990). This work also introduced high-dimensional reward signals, deterministic policy gradients for RNNs, and the GAN principle. Based on TR FKI-126-90 (1990).[AC90]
Partially based on TR FKI-126-90 (1990).[AC90] Report arXiv:1210.0118 [cs.AI], 2015.
One Big Net For Everything. Preprint arXiv:1802.08864 [cs.AI], Feb 2018.
Preprint: arXiv:1809.01999. Github: World Models.
Predictability minimization. TR CU-CS-565-91, Univ. Colorado at Boulder, 1991.
arXiv:1112.5309 [cs.AI]. First Experiments with PowerPlay. arXiv:1210.8385 [cs.AI].
[R1] Reddit/ML, 2019. Hinton, LeCun, Bengio receive ACM Turing Award.
[R2] Reddit/ML, 2019. J. Schmidhuber really had GANs in 1990.
[R3] Reddit/ML, 2019. NeurIPS 2019 Bengio Schmidhuber Meta-Learning Fiasco.
[R4] Reddit/ML, 2019. Five major deep learning papers by G. Hinton did not cite similar earlier work by J. Schmidhuber.
[R5] Reddit/ML, 2019. The 1997 LSTM paper by Hochreiter & Schmidhuber has become the most cited deep learning research paper of the 20th century.
[R6] Reddit/ML, 2019. DanNet, the CUDA CNN of Dan Ciresan in J. Schmidhuber's team, won 4 image recognition challenges prior to AlexNet.
[R7] Reddit/ML, 2019. J. Schmidhuber on Seppo Linnainmaa, inventor of backpropagation in 1970.
[R8] Reddit/ML, 2019. J. Schmidhuber on Alexey Ivakhnenko, godfather of deep learning 1965.
[R9] Reddit/ML, 2019.
[R11] Reddit/ML, 2020. Schmidhuber: Critique of Honda Prize for Dr. Hinton.
[R12] Reddit/ML, 2020. J. Schmidhuber: Critique of Turing Award for Drs. Bengio & Hinton & LeCun.
[R15] Reddit/ML, 2021. J. Schmidhuber's work on fast weights from 1991 is similar to linearized variants of Transformers.
Preprint arXiv/1311.2524, Nov 2013. Preprint arXiv/1703.06870, 2017.
This experimental analysis of backpropagation did not cite the origin of the method,[BP1-4] also known as the reverse mode of automatic differentiation.
The Past, Present and Future of Artificial Intelligence.
ACM's justification of the 2018 A.M. Turing Award (announced in 2019).
[T20a] J. Schmidhuber (AI Blog, 25 June 2020). Critique of 2018 Turing Award for Drs. Bengio & Hinton & LeCun. The first version of the present critique.
[TUR21] J. Schmidhuber (AI Blog, Sep 2021). Turing Oversold. It's not Turing's fault, though.
J. Schmidhuber (AI Blog, 2021). 30-year anniversary. 1991: First very deep learning with unsupervised pre-training. 1992. Based on TR FKI-148-91, TUM, 1991.[UN0] These approaches are now widely used.
[UN2] J. Schmidhuber. Habilitation thesis, TUM, 1993. Can be found here (depth > 1000). 2006.
[VAN1] S. Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, TUM, 1991 (advisor J. Schmidhuber). More on the Fundamental Deep Learning Problem.
[VAN4] Y. Bengio. Neural net language models. Scholarpedia, 3(1):3881, 2008.
Youtube video [see 28:16]. But in 2010, our team showed[MLP1-2] that unsupervised pre-training is not necessary. Youtube video, 2018.
Preprint arXiv:1609.08144, 2016. Based on LSTM, which it mentions at least 50 times.
WWW link (retrieved 15 May 2020). A general, practical, program-controlled computer.
J. Schmidhuber (AI Blog, 2021). 80th anniversary celebrations: 1941: Konrad Zuse completes the first working general computer, based on his 1936 patent application.

      Deep Learning: Our Miraculous Year 1990-1991
      1991: Neural nets learn to program neural nets with fast weights - by Juergen Schmidhuber (AI Blog)
      Twitter: @SchmidhuberAI. Traditionally, sequence processing is done with recurrent NNs (RNNs). In 1991, however, I published NNs that learn by gradient descent to program the fast weights of another NN (see Sec. 1).[FWP0-1] One of them[FWP0-1] computes its fast weight changes through additive outer products of self-invented activation patterns (now often called keys and values for self-attention; Sec. 2). The very similar Transformers[TR1-2] combine this with projections and softmax; Transformers with linearized self-attention[TR5-6] are formally equivalent to the 1991 Fast Weight Programmers[MOST] (see this tweet). In 1993, I also introduced the attention terminology[FWP2] now used in this context[ATT] (Sec. 4), and RNNs that program themselves (Sec. 3). These systems avoid the famous vanishing gradient problem, aka the fundamental deep learning problem (analyzed a few months later in 1991[VAN1]), through additive fast weight changes (Sec. 5), much like the additive neural activations of LSTMs / Highway Nets / ResNets[HW1-3] (Sec. 5) of the Annus Mirabilis of deep learning.[MIR] Also covered: a brand new, improved version[FWP6] of the 1991 fast weight update rule (Sec. 6), reinforcement learning through neuroevolution of fast weights[FWP5] (2005-, Sec. 7), goal-conditioned policy generators (2022),[GGP] and metalearning machines that learn to learn[FWPMETA1-9] (1992-2022, Sec. 8).

      Goedel Machine. As I have frequently emphasized since 1990,[AC90][PLAN][META] the weights of an artificial neural network (NN) should be viewed as its program. Inspired by universal self-referential formal systems,[GOD][GOD34] I built NNs whose outputs are changes of programs or weight matrices of other NNs[FWP0-2] (Sec. 1, 2, 3), and even NNs that can modify their own weight change algorithms or learning algorithms[FWPMETA1-5] (Sec. 8). A gradient descent procedure[BP1-4][BPA][R7] can compute a direction in program space where one may find a better program,[AC90] or a better program-modifying program.[FWP0-2][FWPMETA1-5] Deep learning itself started in 1965 with networks of arbitrarily many layers.[DEEP1-2] Their activation functions were Kolmogorov-Gabor polynomials, which include the now popular multiplicative gates[DL1-2] related to fast weights. von der Malsburg was the first to explicitly emphasize the importance of NNs with rapidly changing weights.[FAST] The second paper on this was published by Feldman in 1982.[FASTa] The weights of a 1987 NN were sums of weights with a large learning rate and weights with a small rate[FASTb][T22] (but these have nothing to do with the NN-programming NNs discussed below). Fast Weight Programmers (FWPs) were published in 1991-93[FWP0-2] (Sec. 1, 2, 3, 4); they anticipated today's attention[ATT] (Sec. 4) and Transformers[TR1-6] (Sec. 2, 3, 4, 5).

      End-To-End Differentiable Fast Weights: NNs Learn to Program NNs (1991). On 26 March 1991, I described a slow NN that learns by backpropagation[BP1-4] to rapidly modify the fast weights of another NN;[FWP0] the approach was essentially also published in Neural Computation.[FWP1] It is closely related to what is now called attention[ATT] (Sec. 4). That is, I separated storage and control like in traditional computers, but in a fully neural way (rather than in a hybrid fashion[PDA1][PDA2][DNC]); compare also Synthetic Gradients.[NAN1-5] One of the FWPs of 1991[FWP0-1] is illustrated in the figure. A disadvantage addressed in Sec. 2 is that the slow net needs many output units if the fast net is large.

      Slow neural net programs fast neural net through additive outer products

      The Fast Weight Programmer[FWP0-1] depicted in Sec. 1 has a slow net output unit for each fast weight, which does not scale to large fast nets. However, Section 2 of the same 1991 paper[FWP0] introduced a more compact scheme, essentially what is now called a linear[TR5-6] Transformer[TR1-2] or attention[ATT] (compare Sec. 4): the slow net generates key and value patterns whose outer product is added to the fast weight matrix (which then may be normalized by a squashing function[FWP0]). The fast weights are thus sums of second order tensor products[FWP0-3a] (compare linear Transformers[FWP6][TR5-6]). The highly successful Transformers of 2017[TR1-2] can be viewed as a combination of my additive outer product fast weight principle[FWP0-2] with softmax and projections applied to NN-programmed fast weights (Sec. 5 & 1). Linear Transformers (2020-21)[TR5-6] abandoned the softmax, essentially resurrecting the original 1991 system.[FWP0-1] Compare Sec. 6. Of course, outer products in NNs go back at least to Hebb's informal rule (1949)[HE49] and Steinbuch's Learning Matrix around 1960,[ST61-63][AMH1-2][KOH72][LIT74][PAL80][KOS88] but end-to-end differentiable NN-programmed fast weights exist since 1991.[FWP0-3a][TR5-6]
      I offered the FWPs of 1991[FWP0-1] as an alternative to sequence-processing recurrent NNs (RNNs) (Sec. 1), the computationally most powerful NNs of them all.[UN][MIR](Sec. 0) Modern Transformers are also viewed as RNN alternatives, despite their limitations.[TR3-4] The slow net and the fast net of the 1991 system[FWP0-1] in Sec. 2 were feedforward NNs (FNNs), like most current Transformers.[TR1-6] In 1993, I collapsed all of this into a single RNN that could rapidly reprogram all of its own fast weights through additive outer product-based weight changes.[FWP2] One motivation reflected by the title of the paper[FWP2] was to control many more time-varying variables than a standard RNN of the same size: O(H^2) instead of O(H), where H is the number of hidden units. This motivation and a variant of the method was republished over two decades later.[FWP4a][R4][MIR](Sec. 8)[T22](Sec. XVII, item H3) See also our more recent work on FWPs since 2017,[FWP3-3a][FWPMETA7][FWP6] and compare a recent study.[RA21]
      4. Attention terminology of 1993. End-to-End Differentiable Sequential Neural Attention 1990-93. Juergen Schmidhuber. Today, everybody is talking about attention when it comes to describing the principles of Transformers.[TR1-2] The additive outer products[FWP0-1] of the Fast Weight Programmers described in Sec. 2 and Sec. 3 correspond to the attention weights or self-attention weights (see also[FWP4b-d]), which are NN-programmed fast weights (Sec. 5);[FWP0-1] see Sec. 9 & Sec. 8 of [MIR] and Sec. XVII of [T22]. My 1993 paper[FWP2] explicitly spoke of internal spotlights of attention controlled by Fast Weight Programmers.[FWP2][ATT]
      Apart from possible normalization/squashing,[FWP0] the fast weight changes of FWPs are additive (Sec. 1 & 2). Hence they do not suffer during sequence learning from the famous vanishing gradient problem analyzed by my brilliant student Sepp Hochreiter a few months later in his 1991 diploma thesis.[VAN1]
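      To make the Sec. 2 mechanism concrete, here is a minimal numpy sketch (not the original 1991 code; the dimensions, the helper names write/read/unit, and the toy data are illustrative assumptions) of the additive outer-product fast weight update and its use for retrieval, which is the unnormalized form of linearized self-attention:

          import numpy as np

          d_key, d_val = 8, 8
          rng = np.random.default_rng(0)
          W_fast = np.zeros((d_val, d_key))       # fast weight matrix, initially empty

          def write(key, value):
              """Slow-net-generated key/value pair programs the fast net additively."""
              global W_fast
              W_fast += np.outer(value, key)      # the 1991 elementary programming instruction

          def read(query):
              """Fast net output for a query; compare unnormalized linearized self-attention."""
              return W_fast @ query

          def unit(x):
              return x / np.linalg.norm(x)

          # toy usage: store two associations with unit-length keys, then retrieve the first
          k1, v1 = unit(rng.standard_normal(d_key)), rng.standard_normal(d_val)
          k2, v2 = unit(rng.standard_normal(d_key)), rng.standard_normal(d_val)
          write(k1, v1); write(k2, v2)
          print(read(k1))   # roughly v1, plus crosstalk proportional to (k1 . k2) * v2

          # Delta-rule-like variant in the spirit of the 2021 improvement[FWP6] (sketch only):
          # W_fast += beta * np.outer(value - W_fast @ key, key)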

      There are two dual ways of dealing with this problem, LSTM and FWPs, both of them dating back to 1991, our miraculous year of deep learning.[MIR] Basic Long Short-Term Memory[LSTM1] solves the problem by adding new information to its cell state at every time step. That is, the core of LSTM is operating in a linear additive activation space (ignoring LSTM's multiplicative gates).[LSTM1][VAN1][MIR](Sec. 4 & Sec. 8) Additive FWPs[FWP0-2] (Sec. 1 & 2), however, solve the problem through a dual approach, operating in a linear additive weight space. By favoring additive operations yielding non-vanishing first derivatives and error flow,[VAN1] Transformers[TR1-6] also follow the additive approach[FWP0-2] (compare Sec. 2 and Sec. 4 on attention terminology since 1993).
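      A toy illustration of this additive principle, under the simplifying assumption of scalar states (this is not the full LSTM of 1997, just the core argument about error flow):

          # scalar toy comparison of error flow through T time steps
          T = 50
          w = 0.5                            # recurrent weight with |w| < 1

          # multiplicative recurrence h_t = w * h_{t-1}:
          # d h_T / d h_0 = w**T vanishes exponentially with depth T
          grad_multiplicative = w ** T

          # additive (LSTM-cell-like) recurrence c_t = c_{t-1} + u_t:
          # d c_T / d c_0 = 1 for every T, i.e., constant error flow
          grad_additive = 1.0

          print(grad_multiplicative)         # about 8.9e-16 for T = 50
          print(grad_additive)               # 1.0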

      Highway Networks:
LSTM's traditional additive activation-based approach[LSTM1-13] is mirrored in the LSTM-inspired Highway Network (May 2015),[HW1][HW1a][HW3] the first working really deep feedforward NN with hundreds of layers. It is essentially a feedforward version of LSTM[LSTM1] with forget gates,[LSTM2] and a special case of it is the Residual Net or ResNet[HW2] (Dec 2015). Remarkably, both of these dual approaches of 1991 have become highly successful. By the mid 2010s,[DEC] major IT companies overwhelmingly used LSTM, e.g., for speech recognition on smartphones.[DL4] LSTM can also rapidly learn to solve certain tasks[LSTM13] while plain Transformers can't yet.[TR4] (Unsupervised pre-training of deep NNs[UN0-UN2][MIR](Sec. 1) also dates back to 1991.[UN])
Recent work of February 2021[FWP6] connects modern linearized attention mechanisms[TR5-6] and Fast Weight Programmers[FWP0-2] (Sec. 2)[FWP4a][R4][MIR](Sec. 8)[T22](Sec. XVII, item H3) and improves such variants.[TR5-6] Building on previous work[FWPMETA7] on FWPs (Sec. 1, 2, 3, 8), we replace the 1991 elementary programming instruction based on additive outer products[FWP0-2] by a delta rule-like[WID] update, which helps on language modeling tasks.[FWP6] Our code is public. Follow-up work of June 2021[FWP7] (also with Robert Csordas) points out that the original FWP formulation of 1991[FWP0-1] is more general than the one of linear Transformers: a slow NN continually reprograms the weights of a fast NN of arbitrary architecture. Our code is public.

      Reinforcement learning (e.g., a robotino double pole balancer) can also be done with neuroevolution for fast weights, as shown in 2005 with my former postdoc Faustino Gomez[FWP5] (now CEO of NNAISENSE). Our 2005 paper on deep RL[DL6,6a] was actually the first machine learning publication with the word combination "learn deep" in the title. Related work learned to encode the numerous weights of large NNs through very compact codes.[KO0-2][CO1-4] Here we exploited that the Kolmogorov complexity or algorithmic information content of successful huge NNs may actually be rather small. Compressed Network Search[CO2] required no unsupervised pre-training.

      Recent work of 2022[GGP] with my team extends this to goal-conditioned policy generators: NNs that learn to produce the weights of other NNs on command.

      Self-referential weight matrix. My first work on metalearning machines that learn to learn was published in 1987.[META][R3] FWPs can implement metalearning in a very general way: in references[FWPMETA1-5] since 1992, the slow NN and the fast NN (Sec. 1) are recurrent and identical, so that a single RNN manipulates its own weight matrix. The RNN can see its own errors or reward signals, called eval(t+1) in the image.[FWPMETA5]

      The 1993 FWP of Sec. 3[FWP2] also was an RNN, but unlike the self-referential RNN above,[FWPMETA1-5] it used outer products between key patterns and value patterns (Sec. 2) to manipulate its fast weights. Later work used gradient descent in LSTM networks[LSTM1] instead of traditional functions of two variables[HO1] (more on LSTM and fast weights in Sec. 5). In 2020, Imanol et al. augmented an LSTM with an associative fast weight memory,[FWPMETA7] useful in partially observable environments.[FWPMETA7] Our recent MetaGenRL (2020)[METARL10] meta-learns learning algorithms; see the blog post of my PhD student Louis Kirsch. His VS-ML uses outer-product-like fast weights encoded in the activations of LSTMs,[FWPMETA6] rather than time-varying variables[FWP2] (Sec. 3). VS-ML can also learn to implement the backpropagation learning algorithm[BP1-4] purely in the end-to-end differentiable forward dynamics of RNNs.[FWPMETA6]

      In 2022, we also published at ICML a modern self-referential weight matrix (SWRM)[FWPMETA8] based on the 1992 SWRM.[FWPMETA1-5] self-improvement (compare this tweet). A modern self-referential weight matrix (2022) based on the one of 1992 There is another version of this article This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Creative Commons License J.  Schmidhuber (AI Blog, 2021). 3 decades of artificial curiosity & creativity. Our PDF. The first paper on long-term planning with reinforcement learning recurrent neural networks (NNs) (more) and on generative adversarial networks (more). PDF. First publication of what was later sometimes called the Hopfield network[AMH2] or Amari-Hopfield Network. The Hopfield network or Amari-Hopfield Network was published in 1972 by Amari.[AMH1] [ATT] J. Schmidhuber (AI Blog, 2020). 30-year anniversary of end-to-end differentiable sequential neural attention. Plus goal-conditional reinforcement learning. Schmidhuber Transformers with linearized self-attention (1991-93).[FWP] Today, both types are very popular. PDF. PDF. More. PS. (PDF.) Precursor of modern backpropagation.[BP1-4] PDF. Link. PDF. First application of backpropagation[BP1] to NNs (concretizing thoughts in his 1974 thesis). [BP4] J. Schmidhuber (AI Blog, 2014; updated 2020). Who invented backpropagation? More.[DL2] PDF. PDF. PDF. [DEC] J. Schmidhuber (AI Blog, 02/20/2020, updated 2021, 2022). The 2010s: Our Decade of Deep Learning / Outlook on the 2020s. The More. Deep Learning. [DL4] J. Schmidhuber (AI Blog, 2017). Our impact on the world's most valuable public companies: Apple, Google, Microsoft, Facebook, Amazon... By greatly improved (CTC-based) on-device speech recognition (on the phone, not the server) LSTM. PDF. J. Schmidhuber (AI Blog, Nov 2020). 15-year anniversary: 1st paper with "learn deep" in the title (2005). Our deep reinforcement learning & neuroevolution solved problems of depth 1000 and more.[DL6] Soon after its publication, everybody started talking about "deep learning." Causality or correlation? neural networks learning to control dynamic external memories.[PDA1-2][FWP0-1] J.  Schmidhuber (AI Blog, 26 March 2021, updated 2022). alternative[FWP0-1] to recurrent NNs. the fast weights[FAST,FASTa] of Such Fast Weight Programmers[FWP0-6,FWPMETA1-8] can learn to memorize past data, e.g., by computing fast weight changes through additive outer products of self-invented activation patterns[FWP0-1] (now often called keys and values for self-attention[TR1-6]). The similar Transformers[TR1-2] combine this with projections Transformers with linearized self-attention[TR5-6] In 1993, he introduced the attention terminology[FWP2] now used in this context,[ATT] and RNNs that program themselves. See tweet of 2022. PDF. "Transformer with linearized self-attention."[FWP] PDF. HTML. Pictures (German). See tweet of 2022 for 30-year anniversary. PDF. Preprint: arXiv:1811.12143. PDF. PDF. Preprint: arXiv:2003.08165. PDF. HTML overview. Linear Transformers Are Secretly Fast Weight Programmers. ICML 2021. Preprint: arXiv:2102.11174. Preprint: arXiv:2106.06295 (June 2021). PDF. PDF. An introspective network that can learn to run its own weight change algorithm. In Proc. of the Intl. Conf. on Artificial Neural Networks, J. Schmidhuber. Habilitation thesis, TUM, 1993. PDF. can be found here. Preprint arXiv:2012.14905 [cs.LG], 2020. Report arXiv:2011.07831 [cs.AI], 2020. Preprint: arXiv:2202.05780. 
Preprint arXiv/2207.01570, 4 July 2022 (submitted in May 2022). Preprints arXiv:1505.00387 (May 2015) and arXiv:1507.06228 (July 2015). Also at NIPS 2015. The LSTM with forget gates[LSTM2] for RNNs.) Resnets[HW2] are a version of this where the gates are always open: g(x)=t(x)=const=1. Highway Nets perform roughly as well as ResNets[HW2] on ImageNet.[HW3] Variants of highway gates are used for certain algorithmic tasks, where the simpler residual layers do not work as well.[NDR] More. Link. arXiv:1512.03385 (Dec 2015). Residual nets are a version of Highway Nets[HW1] More. arxiv:1612.07771 (2016). Also at ICLR 2017. PDF. PDF. PDF. [LSTM1] S. Hochreiter, J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. PDF. More. PDF. PDF. J. Schmidhuber (AI Blog, 2020). 1/3 century anniversary of Searchable PDF scan (created by OCRmypdf which uses LSTM). HTML. better GP methods through Meta-Evolution. More. [MIR] J. Schmidhuber (AI Blog, 2019). Deep Learning: Our Miraculous Year 1990-1991. Preprint arXiv:2005.05744, 2020. J.  Schmidhuber (AI Blog, 2021). The most cited neural networks all build on work done in my labs. Foundations of the most popular NNs originated in my labs at TU Munich and IDSIA. Here I mention: (1) Long Short-Term Memory (LSTM), (2) ResNet (which is our earlier Highway Net with open gates), (3) AlexNet and VGG Net (both building on our similar earlier DanNet: the first deep convolutional NN to win image recognition competitions), Adversarial Artificial Curiosity), and (5) variants of Transformers (Transformers with linearized self-attention are formally equivalent to my earlier Fast Weight Programmers). Annus Mirabilis of 1990-1991.[MIR] PDF. PDF. Preprint arXiv:1608.05343, 2016. The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization. Proc. ICLR 2022. Preprint arXiv/2110.07732. the 1991 publication on what's now called "Transformers with linearized self-attention."[FWP0-6][TR5-6] attention terminology in 1993.[ATT][FWP2][R4] See tweet of 2022 for 30-year anniversary. J. Schmidhuber (AI Blog, 2020). 30-year anniversary of planning & reinforcement learning with recurrent world models and artificial curiosity (1990). This work also introduced high-dimensional reward signals, deterministic policy gradients for RNNs, the GAN principle [R3] Reddit/ML, 2019. NeurIPS 2019 Bengio Schmidhuber Meta-Learning Fiasco. [R4] Reddit/ML, 2019. Five major deep learning papers by G. Hinton did not cite similar earlier work by J. Schmidhuber. [R7] Reddit/ML, 2019. J. Schmidhuber on Seppo Linnainmaa, inventor of backpropagation in 1970. [T22] J. Schmidhuber (AI Blog, 2022). Scientific Integrity and the History of Deep Learning: The 2021 Turing Lecture, and the 2018 Turing Award. Technical Report IDSIA-77-21, IDSIA, Lugano, Switzerland, 2022. PDF. J. Schmidhuber (AI Blog, 2021). 30-year anniversary. 1991: First very deep learning with unsupervised or self-supervised pre-training. Unsupervised PDF. 1992. Based on TR FKI-148-91, TUM, 1991.[UN0] PDF. approaches are now widely used. More. [UN2] J. Schmidhuber. Habilitation thesis, TUM, 1993. PDF. can be found here (depth > 1000). [VAN1] S. Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, TUM, 15 June 1991 (advisor J. Schmidhuber). PDF. Metalearning or Learning to Learn Since 1987. Juergen Schmidhuber. 
Transformers with linearized self-attention were published in Neural Computation 1992; they are equivalent to fast weight programmers (apart from normalization), separating storage and control. Key/value was called FROM/TO. The attention terminology was introduced at ICANN 1993. Juergen Schmidhuber.
      https://people.idsia.ch/~juergen/deep-learning-history.html AI Blog
      @SchmidhuberAI
      Preprint arXiv:2212.11279. Modern AI is dominated by artificial neural networks (NNs) and deep learning.[DL1-4] This annotated history provides hyperlinks to relevant overview sites from my AI Blog. It also debunks certain popular but misleading historic accounts of deep learning, and supplements my previous deep learning survey,[DL1] frequently mentioning my own team's work, because (as of 2022) the most cited NNs are based on it.[MOST]
      Sec. 1: Introduction
      Sec. 2: 1676: The Chain Rule For Backward Credit Assignment
      Sec. 3: Circa 1800: First Neural Net (NN) / Linear Regression / Shallow Learning
      Sec. 4: 1920-1925: First Recurrent NN (RNN) Architecture. ~1972: First Learning RNNs
      Sec. 5: 1958: Multilayer Feedforward NN (without Deep Learning)
      Sec. 6: 1965: First Deep Learning
      Sec. 7: 1967-68: Deep Learning by Stochastic Gradient Descent
      Sec. 8: 1970: Backpropagation. 1982: For NNs. 1960: Precursor.
      Sec. 9: 1979: First Deep Convolutional NN (1969: Rectified Linear Units)
      Sec. 10: 1980s-90s: Graph NNs / Stochastic Delta Rule (Dropout) / More RNNs / Etc
      Sec. 11: Feb 1990: Generative Adversarial Networks / Artificial Curiosity / NN Online Planners
      Sec. 12: April 1990: NNs Learn to Generate Subgoals / Work on Command
      Sec. 13: March 1991: NNs Learn to Program NNs. Transformers with Linearized Self-Attention
      Sec. 14: April 1991: Deep Learning by Self-Supervised Pre-Training. Distilling NNs
      Sec. 15: June 1991: Fundamental Deep Learning Problem: Vanishing/Exploding Gradients
      Sec. 16: June 1991: Roots of Long Short-Term Memory / Highway Nets / ResNets
      Sec. 17: 1980s-: NNs for Learning to Act Without a Teacher
      Sec. 18: It's the Hardware, Stupid!
      Sec. 19: But Don't Neglect the Theory of AI (Since 1931) and Computer Science
      Sec. 20: The Broader Historic Context from Big Bang to Far Future
      Sec. 21: Acknowledgments
      Sec. 22: 555+ Partially Annotated References (many more in the award-winning survey[DL1])
      It also addresses quite erroneous ideas about the origins of the universe (see the final section).

      A history of AI written in the 1980s would have emphasized topics such as theorem proving,[GOD][GOD34][ZU48][NS56] logic programming, expert systems, and heuristic search[FEI63,83][LEN83] - an old area of research now seeing renewed interest. Practical AI dates back at least to 1914, when Leonardo Torres y Quevedo (see below) built the first working chess end game player,[BRU1-4] while AI theory dates back at least to 1931-34, when Goedel identified fundamental limits of any type of computation-based AI.[GOD][BIB3][GOD21,a,b] A history of AI written in the early 2000s would have put more emphasis on topics such as support vector machines and kernel methods,[SVM1-4] Bayesian (actually Laplacian or possibly Saundersonian[STI83-85]) reasoning[BAY1-8][FI22] and other concepts of probability theory and statistics,[MM1-5][NIL98][RUS95] decision trees (e.g.,[MIT97]), ensemble methods,[ENS1-4] swarm intelligence,[SW1] and evolutionary computation.[EVO1-7]([TUR1],unpublished) Why? Because back then such techniques drove many successful AI applications.

      A history of AI written in the 2020s must emphasize concepts such as the even older chain rule[LEI07] and deep nonlinear artificial neural networks (NNs) trained by gradient descent,[GD'] in particular, feedback-based recurrent networks, which are general computers whose programs are weight matrices.[AC90] Why? Because many of the most famous and most commercial recent AI applications depend on them.[DL4] MACY conferences (1946-1953)[MACY51] and the 1951 Paris conference on calculating machines and human thought, now often viewed as the first conference on AI.[AI51][BRO21][BRU4] modern AI based on "deep learning" with NNs.[DL1-2][DEC] minimize pain, maximize pleasure, drive cars, etc.[MIR](Sec. 0)[DL1-4]

      The present piece also debunks a frequently repeated, misleading "history of deep learning"[S20][DL3,3a] which ignores most of the pioneering work mentioned below.[T22] See Footnote 6. The title image of the present article is a reaction to an erroneous piece of common knowledge which says[T19] that the use of NNs "as a tool to help computers recognize patterns and simulate human intelligence had been introduced in the 1980s," although such NNs appeared long before the 1980s.[T22] on the history of aviation,[NASC1-2] the telephone,[NASC3] the computer,[NASC4-7] resilient robots,[NASC8] and scientists of the 19th century.[NASC9] Finally,

      Leibniz, father of computer science circa 1670, publishes the chain rule in 1676

      In 1676, Gottfried Wilhelm Leibniz published the chain rule, which later also appeared in L'Hopital's 1696 textbook on Leibniz' differential calculus.[LEI07-10][L84]
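      In modern notation (a standard restatement, not Leibniz's original wording), the rule and its consequence for a composition of L functions, as used in deep NNs, read:

          \frac{d}{dx} f\bigl(g(x)\bigr) = f'\bigl(g(x)\bigr)\, g'(x), \qquad
          \frac{dy}{dx} = \prod_{l=1}^{L} f_l'(z_{l-1})
          \quad \text{for } y = f_L(f_{L-1}(\cdots f_1(x)\cdots)),\; z_0 = x,\; z_l = f_l(z_{l-1}).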

      Cauchy. This answer is exploited by the technique of gradient descent (GD), apparently first proposed by Augustin-Louis Cauchy in 1847[GD'] (and much later by Jacques Hadamard[GD'']); the stochastic version called SGD is due to Herbert Robbins and Sutton Monro (1951).[STO51-52]
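      A minimal sketch of both variants on a toy least-squares objective (the data, step sizes, and iteration counts are illustrative assumptions, not prescriptions):

          import numpy as np

          rng = np.random.default_rng(0)
          X = rng.standard_normal((100, 3))
          true_w = np.array([1.0, -2.0, 0.5])
          y = X @ true_w + 0.01 * rng.standard_normal(100)

          # full-batch gradient descent (Cauchy-style) on the least-squares objective
          w = np.zeros(3)
          for step in range(200):
              grad = X.T @ (X @ w - y) / len(y)
              w -= 0.1 * grad

          # stochastic gradient descent (Robbins & Monro style): one example per update
          w_sgd = np.zeros(3)
          for epoch in range(20):
              for i in rng.permutation(len(y)):
                  grad_i = (X[i] @ w_sgd - y[i]) * X[i]
                  w_sgd -= 0.01 * grad_i

          print(w, w_sgd)    # both approach true_w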

      Footnote 1. In 1684, Leibniz was also the first to publish "modern" calculus;[L84][SON18][MAD05][LEI21,a,b] later Isaac Newton was also credited for his unpublished work.[SON18] Their priority dispute,[SON18] however, did not encompass the chain rule.[LEI07-10] Of course, both were building on earlier work: in the 2nd century B.C., Archimedes (perhaps the greatest scientist ever[ARC06]) paved the way for infinitesimals Sangamagrama and colleagues of the Indian Kerala school.[MAD86-05] "the world's first computer scientist"[LA14]) also laid foundations of modern computer science. He and the first with an internal memory.[BL16] He described the principles of binary computers (1679)[L79][L03][LA14][HO66][LEI21,a,b] His formal Algebra of Thought (1686)[L86][WI48] was deductively equivalent[LE18] to the much later Boolean Algebra (1847).[BOO] all possible questions through computation;[WI48]

      Footnote 3. Some claim that the backpropagation algorithm (discussed further down; now widely used to train deep NNs) is just the chain rule of Leibniz (1676) & L'Hopital (1696).[CONN21] doing this).[T22] It was not published until 1970, as discussed below.[BP1,4,5]


      In 1805, Adrien-Marie Legendre published what's now often called a linear neural network (NN). Later Johann Carl Friedrich Gauss was also credited for earlier unpublished work on this done circa 1795.[STI81]

      In 1795, Gauss used what's now called a linear neural net, but Legendre published this first in 1805. Gauss is often called the greatest mathematician since antiquity. Rosenblatt's perceptron (1958)[R58] combined a linear NN as above with an output threshold function to obtain a pattern classifier (compare his more advanced work on multi-layer networks discussed below); see also Joseph's related work.[R61] Widrow & Hoff's similar Adaline learned in 1962.[WID62]
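      A hedged illustration of these two steps: a "linear neural net" in the 1805/1795 sense is just the method of least squares, and thresholding its output yields a simple pattern classifier in the spirit of the perceptron (the toy data and names are assumptions):

          import numpy as np

          rng = np.random.default_rng(1)
          X = rng.standard_normal((200, 2))
          labels = (X[:, 0] + 2.0 * X[:, 1] > 0).astype(float)    # linearly separable toy classes

          # "linear neural net" / linear regression: weights via the method of least squares
          X1 = np.hstack([X, np.ones((200, 1))])                  # append a bias input
          w, *_ = np.linalg.lstsq(X1, 2.0 * labels - 1.0, rcond=None)

          # perceptron-style classifier: threshold the linear unit's output
          predictions = (X1 @ w > 0).astype(float)
          print((predictions == labels).mean())                   # close to 1.0 on this toy data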

      In 1924, Ernst Ising published the first recurrent network architecture: the Ising model or Lenz-Ising model, analyzed by physicists Ernst Ising and Wilhelm Lenz in the 1920s.[L20][I24,I25][K41][W45][T22] It settles into an equilibrium state in response to input conditions, and is the foundation of the first learning RNNs (see below). Such architectures were also discussed in 1943 by neuroscientists Warren McCulloch and Walter Pitts[MC43] and formally analyzed in 1956 by Stephen Cole Kleene.[K56]

      In 1972, Shun-Ichi Amari made the Ising recurrent net adaptive. This was the first published learning artificial recurrent neural network

      In 1972, Shun-Ichi Amari made the Lenz-Ising recurrent architecture adaptive such that it could learn to associate input patterns with output patterns by changing its connection weights.[AMH1] See also Stephen Grossberg's work on biological networks,[GRO69] David Marr's[MAR71] and Teuvo Kohonen's[KOH72] work, and Kaoru Nakano's learning RNN.[NAK72]
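      The following toy sketch shows the kind of associative recall such networks perform: Hebbian outer-product storage plus iterated threshold updates that settle into an equilibrium state (a textbook Amari/Hopfield-style illustration, not Amari's 1972 formulation):

          import numpy as np

          rng = np.random.default_rng(2)
          patterns = rng.choice([-1.0, 1.0], size=(3, 64))   # three stored +/-1 patterns

          W = sum(np.outer(p, p) for p in patterns)          # Hebbian weight matrix
          np.fill_diagonal(W, 0.0)                           # no self-connections

          def recall(x, steps=10):
              """Settle into an equilibrium state from a noisy cue."""
              for _ in range(steps):
                  x = np.sign(W @ x)
              return x

          noisy = patterns[0].copy()
          flip = rng.choice(64, size=8, replace=False)
          noisy[flip] *= -1.0                                # corrupt 8 of 64 bits
          print(np.array_equal(recall(noisy), patterns[0]))  # usually True on this toy example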

      Alan Turing

      10 years later, the Amari network was republished (and its storage capacity analyzed).[AMH2] Some called it the Hopfield Network (!) or Amari-Hopfield Network.[AMH3] sequence-processing generalization thereof.[AMH1] learning RNNs. This, however, was first published many decades later,[TUR1] which explains the obscurity of his thoughts here.[TUR21] (Margin note: it has been pointed out that the famous "Turing Test" should actually be called the "Descartes Test."[TUR3,a,b][TUR21])

      Today, the most popular RNN is the Long Short-Term Memory (LSTM) mentioned below, which has become the most cited NN of the 20th century.[MOST]

      In 1958, Frank Rosenblatt had  multilayer perceptrons whose last layer learned

      In 1958, Frank Rosenblatt not only combined linear NNs and threshold functions (see the section on shallow learning since 1800), he also had more interesting, deeper multilayer perceptrons (MLPs).[R58] Since only the last layer learned,[DL1] Rosenblatt basically had what much later was rebranded as Extreme Learning Machines (ELMs) without proper attribution.[ELM1-2][CONN21][T22]
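      A hedged illustration of "only the last layer learned": a fixed random hidden layer followed by a least-squares-trained output layer, i.e., the scheme later rebranded as an ELM (the sizes, data, and tanh nonlinearity are toy assumptions):

          import numpy as np

          rng = np.random.default_rng(3)
          X = rng.standard_normal((300, 5))
          y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2            # some nonlinear target

          W_hidden = rng.standard_normal((5, 100))            # random weights, never trained
          H = np.tanh(X @ W_hidden)                           # fixed nonlinear features

          # only the output layer is fit, by least squares on the hidden activations
          w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
          print(np.mean((H @ w_out - y) ** 2))                # relatively small training error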

      MLPs were also discussed in 1961 by Karl Steinbuch[ST61-95] and Roger David Joseph[R61] (1961). See also Oliver Selfridge's multilayer Pandemonium[SE59] (1959). Rosenblatt even wrote about "back-propagating errors" in an MLP with a hidden layer,[R62] although he did not yet have a general deep learning algorithm for deep MLPs. What's now called backpropagation is quite different and was first published in 1970, as discussed below.[BP1-BP5][BPA-C]

      Today, the most popular FNN is a version of the LSTM-based Highway Net (mentioned below) called ResNet,[HW1-3] which has become the most cited NN of the 21st century.[MOST]

      In 1965, Alexey Ivakhnenko & Valentin Lapa introduced the first working deep learning algorithm for deep MLPs with arbitrarily many hidden layers (whose activation functions already contained the now popular multiplicative gates).[DEEP1-2][DL1-2][FDL] A paper of 1971[DEEP2] described a deep net with eight layers, trained by their highly cited method, which was still popular in the new millennium,[DL2] especially in Eastern Europe, where much of Machine Learning was born.[MIR](Sec. 1)[R8] The term "deep learning" was first introduced to Machine Learning much later by Dechter (1986), and to NNs by Aizenberg et al (2000).[DL2] (Margin note: our 2005 paper on deep learning[DL6,6a] was the first machine learning publication with the word combination "learn deep" in the title.[T22])

      In 1967-68, Shun-Ichi Amari trained deep MLPs by stochastic gradient descent

      Unlike Ivakhnenko and Lapa (1965, see above), Amari trained his MLPs in an end-to-end fashion from scratch by stochastic gradient descent (SGD),[GD1] a method proposed in 1951 by Robbins & Monro.[STO51-52]

      Amari's implementation[GD2,GD2a] (with his student Saito) learned internal representations in a five layer MLP with two modifiable layers, which was trained to classify non-linearly separable pattern classes.

      See also Iakov Zalmanovich Tsypkin's even earlier work on gradient descent-based on-line learning for non-linear systems.[GDa-b]

      Remarkably, as mentioned above, Amari also published learning RNNs in 1972.[AMH1]

      who invented backpropagation?

      In 1970, Seppo Linnainmaa was the first to publish what's now known as backpropagation, the famous algorithm for credit assignment in networks of differentiable nodes,[BP1,4,5]

      In 1960, Henry J. Kelley had a precursor of backpropagation in the field of control theory

      In 1982, Paul Werbos proposed to use the method to train NNs,[BP2] extending ideas in his 1974 thesis.

      In 1960, Henry J. Kelley already had a precursor of backpropagation in the field of control theory;[BPA] see also later work of the early 1960s by Stuart Dreyfus and Arthur E. Bryson.[BPB][BPC][R7] Unlike Linnainmaa's general method,[BP1] the systems of the 1960s[BPA-C]

      Backpropagation is essentially an efficient way of implementing Leibniz's chain rule[LEI07-10] (1676) (see above) for deep networks. Cauchy's gradient descent[GD'] uses it to incrementally change the weights such that the NN behaves more and more like some teacher, which could be a human, or another NN,[UN-UN2] or something else. By the 1980s, the required compute had just become accessible in wealthier academic labs. A 1985 experimental analysis of the known method[BP1-2] demonstrated that backpropagation can yield useful internal representations in hidden layers of NNs.[RUM] At least for supervised learning, backpropagation is generally more efficient than Amari's above-mentioned deep learning through the more general SGD method (1967), which learned useful internal representations in NNs about 2 decades earlier.[GD1-2a]
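      A minimal sketch of this reverse mode for a two-layer net, with the chain rule written out by hand (the shapes, tanh nonlinearity, and squared-error loss are illustrative assumptions):

          import numpy as np

          rng = np.random.default_rng(4)
          x = rng.standard_normal(3)                  # input
          t = rng.standard_normal(2)                  # target
          W1 = rng.standard_normal((4, 3))
          W2 = rng.standard_normal((2, 4))

          # forward pass
          a1 = W1 @ x
          h1 = np.tanh(a1)
          y = W2 @ h1
          loss = 0.5 * np.sum((y - t) ** 2)

          # backward pass: propagate the error derivative from output to input
          dL_dy = y - t                               # dL/dy
          dL_dW2 = np.outer(dL_dy, h1)                # dL/dW2
          dL_dh1 = W2.T @ dL_dy                       # chain rule through W2
          dL_da1 = dL_dh1 * (1.0 - h1 ** 2)           # chain rule through tanh
          dL_dW1 = np.outer(dL_da1, x)                # dL/dW1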

      It took 4 decades until the backpropagation method of 1970[BP1-2] got widely accepted as a training method for deep NNs. Before 2010, many thought that the training of NNs with many layers requires unsupervised pre-training, a methodology introduced by myself in 1991[UN][UN0-3] (see below), and later championed by others (2006).[UN4] In fact, it was claimed[VID1] that unsupervised pre-training is necessary for deep NNs. However, in 2010, our team with my postdoc Dan Ciresan[MLP1-2] showed that deep NNs can be trained by plain backpropagation and do not at all require unsupervised pre-training for important applications.[MLP2]

      10-year anniversary of supervised deep learning breakthrough (2010)

      Our system set a new performance record[MLP1] on Jung & Oh in 2004[GPUNN]). A reviewer called this a "wake-up call to the machine learning community." researchers took a fresh look at the problem in the 1980s."[S20] However, the 1969 book[M69] addressed a "problem" of Gauss & Legendre's shallow learning (circa 1800)[DL1-2] that had already been solved 4 years prior by Ivakhnenko & Lapa's popular deep learning method,[DEEP1-2][DL2] and then also by Amari's SGD for MLPs.[GD1-2] Minsky neither cited this work nor corrected his book later.[HIN](Sec. I)[T22] (such as the Boltzmann machine[BM][HIN][SK75][G63][T22]) without relating them to the original work,[DLC][S20][T22] although the true history is well-known. in the 1960s-70s, especially outside of the Anglosphere.[DEEP1-2][GD1-3][CNN1][DL1-2][T22] Blatant misattribution and unintentional[PLAG1][CONN21] or intentional[FAKE2] plagiarism are still tainting the entire field of deep learning.[T22] Scientific journals "need to make clearer and firmer commitments to self-correction,"[SV20] as is already the standard in other scientific fields.

      In 1979, Kunihiko Fukushima introduced the convolutional neural network (CNN) architecture. Computer Vision was revolutionized in the 2010s by a particular feedforward NN called the convolutional NN (CNN).[CNN1-4] The basic CNN architecture with alternating convolutional and downsampling layers is due to Fukushima (1979), who called it the Neocognitron.[CNN1] Fukushima also introduced rectified linear units (ReLUs) for NNs (1969).[RELU1] They are now widely used in CNNs and other NNs.

      In 1987, NNs with convolutions were combined by Alex Waibel with weight sharing and backpropagation (see above),[BP1-2] and applied to speech.[CNN1a] Waibel did not call this CNNs but TDNNs. The popular downsampling variant called max-pooling was introduced by Yamaguchi et al. for TDNNs in 1990[CNN3a] and by Juyang Weng et al. for higher-dimensional CNNs in 1993.[CNN3] Yann LeCun's team has contributed improvements of CNNs, especially for images.[CNN2,4][T22] Baldi and Chauvin (1993) had the first application of CNNs with backpropagation to biomedical/biometric images.[BA93]
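      A hedged numpy sketch of the three ingredients just mentioned: a convolution with a shared (weight-sharing) kernel, a rectified linear unit, and 2x2 max-pooling (the image size and random kernel are toy assumptions):

          import numpy as np

          rng = np.random.default_rng(5)
          image = rng.standard_normal((8, 8))
          kernel = rng.standard_normal((3, 3))        # the same weights slide over the whole image

          def conv2d_valid(img, k):
              kh, kw = k.shape
              out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
              for i in range(out.shape[0]):
                  for j in range(out.shape[1]):
                      out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)   # shared weights
              return out

          feature_map = np.maximum(conv2d_valid(image, kernel), 0.0)    # ReLU

          def max_pool2x2(x):
              h, w = x.shape[0] // 2, x.shape[1] // 2
              return x[:2 * h, :2 * w].reshape(h, 2, w, 2).max(axis=(1, 3))

          pooled = max_pool2x2(feature_map)
          print(feature_map.shape, pooled.shape)      # (6, 6) -> (3, 3)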

      History of computer vision contests won by deep CNNs since 2011 (Dan Ciresan et al., 2011).[GPUCNN1,3,5] Our fast GPU-based[GPUNN][GPUCNN5] CNN of 2011,[GPUCNN1] known as DanNet,[DAN,DAN1][R6] built on earlier GPU-accelerated CNNs of 2006.[GPUCNN] In 2011, DanNet became the first pure deep CNN to win computer vision contests.[GPUCNN2-3,5]
      Competition[GPUCNN5] and winner:
      ICDAR 2011 Chinese handwriting - DanNet[GPUCNN1-3]
      IJCNN 2011 traffic signs - DanNet[DAN,DAN1][R6]
      ISBI 2012 image segmentation - DanNet[GPUCNN3a]
      ICPR 2012 medical imaging - DanNet[GPUCNN8]
      ImageNet 2012 - AlexNet[GPUCNN4]
      MICCAI 2013 Grand Challenge - DanNet[GPUCNN8]
      ImageNet 2014 - VGG Net[GPUCNN9]
      ImageNet 2015 - ResNet,[HW2] a Highway Net[HW1] with open gates
      DanNet won four of these contests in a row (15 May 2011, 6 Aug 2011, 1 Mar 2012, 10 Sep 2012).[GPUCNN5] At IJCNN 2011 in Silicon Valley, DanNet blew away the competition and achieved the first superhuman visual pattern recognition[DAN1] in an international contest. DanNet was also the first deep CNN to win a Chinese handwriting contest (ICDAR 2011) and an image segmentation contest (ISBI, May 2012), and it was able to greatly improve steel defect detection.[ST] Five months after our CVPR paper on DanNet,[GPUCNN3] the similar GPU-accelerated AlexNet won the ImageNet[IM09] 2012 contest.[GPUCNN4-5][R6] Our CNN image scanners were 1000 times faster than previous methods.[SCAN] The VGG network (ImageNet 2014 winner)[GPUCNN9] and other highly cited CNNs[RCNN1-3] further extended the DanNet of 2011.[MIR](Sec. 19)[MOST]

      ResNet, the ImageNet 2015 winner[HW2] (Dec 2015) and currently the most cited NN,[MOST] is a version (with open gates) of our earlier Highway Net (May 2015).[HW1-3][R5] The Highway Net (see below) is actually the feedforward net version of our vanilla LSTM (see below).[LSTM2] It was the first working, really deep feedforward NN with hundreds of layers (previous NNs had at most a few tens of layers). NNs with rapidly changing "fast weights" were introduced by v.d. Malsburg (1981) and others.[FAST,a,b] Deep learning architectures that can manipulate structured data such as graphs[T22] were our graph NN-like, Transformer-like Fast Weight Programmers of 1991[FWP0-1][FWP6][FWP] which learn to continually rewrite mappings from inputs to outputs (addressed below), and the work of Baldi and colleagues.[BA96-03] Today, graph NNs are used in numerous applications.

      Werbos,[BP2][BPTT1] Williams,[BPTT2][CUB0-2] and others[ROB87][BPTT3][DL1] analyzed ways of implementing gradient descent[GD'][STO51-52][GDa-b][GD1-2a] in RNNs. Kohonen's self-organising maps became popular.[KOH82-89] in space and time.[BB2][NAN1-4][NHE][HEL] See overviews[MIR](Sec. 15, Sec. 17) and recent renewed interest in such methods.[NAN5][FWPMETA6][HIN22] version of this became popular under the moniker "dropout."[Drop1-4][GPUCNN4] Generative Adversarial Networks (GANs) have become very popular.[MOST] They were first published in 1990 in Munich under the moniker Artificial Curiosity.[AC90-20][GAN1] Two dueling NNs (a probabilistic generator and a predictor) are trying to maximize each other's loss in a minimax game.[AC](Sec. 1) (using stochastic units[AC90] like in the much later StyleGANs[GAN2]). the predictor NN minimizes its error, while the generator NN tries to make outputs that maximize this error: one net's loss is the other net's gain.[AC90] (The world model can also be used for continual online action planning.[AC90][PLAN2-3][PLAN])

      Artificial Curiosity & Creativity Since 1990-91

      4 years before a 2014 paper on GANs,[GAN1] my well-known 2010 survey[AC10] summarised the generative adversarial NNs of 1990 as follows: a given set.[AC20][AC][T22](Sec. XVII) early adversarial machine learning settings[S59][H90] neither involved unsupervised NNs nor were about modeling data nor used gradient descent.[AC20] Predictability Minimization: unsupervised minimax game where one neural network minimizes the objective function maximized by another has been widely used for exploration in Reinforcement Learning[SIN5][OUD13][PAT17][BUR18] for synthesis of realistic images,[GAN1,2] although the latter domain was recently taken over by Rombach et al.'s Latent Diffusion, another method published in Munich,[DIF1] building on Jarzynski's earlier work in physics from the previous millennium[DIF2] and more recent papers.[DIF3-5] Predictability Minimization for creating disentangled representations of partially redundant data, applied to images in 1996.[PM0-2][AC20][R2][MIR](Sec. 7) which is now considered a remaining grand challenge.[LEC] The early 1990s, however, saw first exceptions: NNs that learn to decompose complex spatio-temporal observation sequences into compact but meaningful chunks[UN0-3] (see further below), and NN-based planners of hierarchical action sequences for compositional learning,[HRL0] as discussed next. This work injected concepts of traditional "symbolic" hierarchical AI[NS59][FU77] into end-to-end differentiable "sub-symbolic" NNs. end-to-end differentiable NN-based subgoal generators for Hierarchical Reinforcement Learning (HRL).[HRL0] Soon afterwards, this was also done with recurrent NNs that learn to generate sequences of subgoals.[HRL1-2][PHD][MIR](Sec. 10) problem."[LEC]

      Compare other NNs that have "worked on command" since April 1990, in particular, for learning selective attention,[ATT0-3] artificial curiosity and self-invented problems,[PP][PPa,1,2][AC] upside-down reinforcement learning[UDRL1-2] and its generalizations.[GGP] Recently, Transformers[TR1] have been all the rage, e.g., generating human-sounding texts.[GPT3] Transformers with "linearized self-attention"[TR5-6] were first published in March 1991[FWP0-1][FWP6][FWP] These so-called "Fast Weight Programmers" or "Fast Weight Controllers"[FWP0-1] separated storage and control like in traditional computers, but in an end-to-end-differentiable, adaptive, fully neural way (rather than in a hybrid fashion[PDA1-2][DNC]). The "self-attention" in standard Transformers[TR1-4] combines this with a projection and softmax (using attention terminology like the one I introduced in 1993[ATT][FWP2][R4]).

      26 March 1991: Neural nets learn to program neural nets with fast weights—like today's Transformer variants. 2021: New stuff!

      Today's Transformers heavily use unsupervised pre-training[UN0-3] (see next section), another deep learning methodology pioneered during our Annus Mirabilis of 1990-1991.[MIR][MOST]

      The 1991 fast weight programmers 1992[FWPMETA1-9][HO1] extended my 1987 diploma thesis,[META1] which introduced algorithms not just for learning but also for meta-learning or learning to learn,[META] to learn better learning algorithms through experience. This became very popular in the 2010s[DEC] when computers were a million times faster. layers of neurons or many subsequent computational stages.[MIR] ones[DL1-2] (but see a 1989 paper[MOZ]). of arbitrary depth.[DL1] Before the 1990s, however, RNNs failed to learn deep problems in practice.[MIR](Sec. 0) scales:[LEC] the Neural Sequence Chunker[UN0] or Neural History Compressor.[UN1] First Very Deep Learner of 1991 "very deep learning" tasks of depth > 1000[UN2] (requiring Neural History Compressor.[UN3] (See also recent work on unsupervised NN-based abstraction.[OBJ1-5]) More than a decade after this work,[UN1] called Deep Belief Networks (DBNs).[UN4] (or negative log probability) of the data representation in the level below.[HIN][T22][MIR] using my NN distillation procedure of 1991.[UN0-1][MIR] NN distillation was also republished many years later,[DIST2][MIR][HIN][T22] and is widely used today. used by Transformers[TR1-6] for Transformers with linearized self-attention were also first published[FWP0-6] in Annus Mirabilis of 1990-1991,[MIR][MOST] together with unsupervised/self-supervised pre-training for deep learning.[UN0-3] See the previous section. Sepp Hochreiter's Analysis of the Fundamental Deep Learning Problem (1991) Deep learning is hard because of the Fundamental Deep Learning Problem his diploma thesis which I had the pleasure to supervise.[VAN1] First he implemented the Neural History Compressor above but then did much more: In both cases, learning fails (compare[VAN2]). This analysis led to basic principles of what's now called LSTM (see below). Long Short-Term Memory (LSTM) recurrent neural network[LSTM1-6] overcomes the Fundamental Deep Learning Problem identified by Sepp in his above-mentioned 1991 diploma thesis,[VAN1] which I consider one of the most important documents in the history of machine learning. It also provided essential insights for overcoming the problem, through basic principles (such as constant error flow) of what we called LSTM in a tech report of 1995.[LSTM0] After the main peer-reviewed publication in 1997[LSTM1][25y97] (now the most cited NN article of the 20th century[MOST]), application of LSTM to speech (2004).[LSTM10] 2005 saw the first publication of LSTM with full backpropagation through time and of bi-directional LSTM[LSTM3] (now widely used). Recurrent Neural Networks, especially LSTM Another milestone of 2006 was the training method "Connectionist Temporal Classification" or CTC[CTC] for simultaneous alignment and recognition of sequences. Our team successfully applied CTC-trained LSTM to speech in 2007[LSTM4] (also with hierarchical LSTM stacks[LSTM14]). NNs and traditional approaches such as Hidden Markov Models (HMMs).[BW][BRI][BOU][HYB12][T22] three ICDAR 2009 Connected Handwriting Competitions (French, Farsi, Arabic). LSTM was soon used for everything that involves sequential data such as speech[LSTM10-11][LSTM4][DL1] and videos. Google's speech recognition on the Android smartphones.[GSR15] Many other companies adopted this.[DL4] on-device speech recognition of 2019 (now on your phone, not on the server) LSTM. 
In 1995, we already had an excellent neural probabilistic text model[SNT] (compare Nakamura and Shikano's 1989 word category prediction model[NPMa]). In 2001, we showed that LSTM can learn languages unlearnable by traditional models such as HMMs.[LSTM13] LSTM also came to power Facebook's automatic translations (over 4 billion per day),[FB17][DL4] Apple's Quicktype on roughly 1 billion iPhones,[DL4] the voice of Amazon's Alexa,[DL4] image caption generation,[DL4] automatic email answering,[DL4] etc. Business Week called LSTM "arguably the most commercial AI achievement."[AV1] Numerous works now have "LSTM" in their title.[DEC]

      Highway Networks:
Our Highway Network of May 2015[HW1] was the first working really deep feedforward NN with hundreds of layers (previous NNs had at most a few tens of layers). Microsoft's ResNet[HW2] (which won the ImageNet 2015 contest) is a version thereof. The earlier Highway Nets perform roughly as well as their ResNet versions on ImageNet.[HW3] Variants of highway gates are also used for certain algorithmic tasks where the pure residual layers do not work as well.[NDR]
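For concreteness, here is a minimal numpy sketch of a highway layer next to its residual special case, in which the gates are effectively fixed open, g(x)=t(x)=const=1 (weight names, the tanh candidate transformation, and the sigmoid gates are the usual illustrative choices, not the exact published formulation):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def highway_layer(x, W_h, b_h, W_t, b_t, W_c, b_c):
        h = np.tanh(W_h @ x + b_h)      # candidate transformation h(x)
        t = sigmoid(W_t @ x + b_t)      # transform gate t(x)
        c = sigmoid(W_c @ x + b_c)      # carry gate c(x); often coupled as c = 1 - t
        return t * h + c * x            # gated mix of transformed and carried input

    def residual_layer(x, W_h, b_h):
        # ResNet special case: gates always open, g(x) = t(x) = const = 1
        return x + np.tanh(W_h @ x + b_h)

    rng = np.random.default_rng(0)
    x = rng.standard_normal(16)
    W_h, b_h = 0.1 * rng.standard_normal((16, 16)), np.zeros(16)
    W_t, b_t = 0.1 * rng.standard_normal((16, 16)), np.zeros(16)
    W_c, b_c = 0.1 * rng.standard_normal((16, 16)), np.zeros(16)
    print(highway_layer(x, W_h, b_h, W_t, b_t, W_c, b_c).shape)   # (16,)
    print(residual_layer(x, W_h, b_h).shape)                      # (16,)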

      Deep learning is all about NN depth.[DL1] LSTMs brought essentially unlimited depth to supervised recurrent NNs; in the 2010s, the LSTM-inspired Highway Nets brought it to feedforward NNs, and their version called ResNet became the most cited NN of the 21st century.[MOST] (Citations, however, are a highly questionable measure of true impact.[NAT1]) Reinforcement Learning (RL)[KAE96][BER96][TD3][UNI][GM3][LSTMPG] is about maximizing expected cumulative reward signals,[DL1] and can be formulated in the general RL framework.[UNI] Relevant techniques include Monte Carlo (tree) search (MC, 1949),[MOC1-5] dynamic programming (DP, 1953),[BEL53] artificial evolution (1954),[EVO1-7]([TUR1],unpublished) alpha-beta-pruning (1959),[S59] control theory and system identification (1950s),[KAL59][GLA85] stochastic gradient descent (SGD, 1951),[STO51-52] and universal search techniques (1973).[AIT7] NNs have been combined with system identification,[WER87-89][MUN87][NGU89] DP and its online variant called Temporal Differences (TD),[TD1-3] artificial evolution,[EVONN1-3] and policy gradients.[GD1][PG1-3] Many additional references on this can be found in Sec. 6 of the 2015 survey.[DL1]

      When there is a Markovian interface,[PLAN3] RL with DP/TD/MC-based FNNs can be very successful, as shown in 1994[TD2] (master-level backgammon player) and the 2010s[DM1-2a] (superhuman players for Go, chess, and other games). For more complex cases where the agent must memorize the history of previous inputs, our combinations of RL algorithms and LSTM[LSTM-RL][RPG] have become standard, in particular, our LSTM trained by policy gradients (2007).[RPG07][RPG][LSTMPG]

      Deep Reinforcement Learning with Policy Gradients for Long Short-Term Memory (LSTM) ) For example, in 2018, a PG-trained LSTM was the core of OpenAI's famous Dactyl which learned to control a dextrous robot hand without a teacher.[OAI1][OAI1a] beat a pro player in the game of Starcraft, which is theoretically harder than Chess or Go[DM2] in many ways, using Alphastar whose brain has a deep LSTM core trained by PG.[DM3] OpenAI Five which learned to defeat human experts in the Dota 2 video game (2018).[OAI2] Bill Gates called this a "huge milestone in advancing artificial intelligence".[OAI2a][MIR](Sec. 4)[LSTMPG] commonsense reasoning[MAR15] and learning to think.[PLAN4-5] time scales?[LEC] We published answers to these questions in 1990-91: self-supervised neural history compressors[UN][UN0-3] learn to represent percepts at multiple levels of abstraction and multiple time scales (see above), while end-to-end differentiable NN-based subgoal generators[HRL3][MIR](Sec. 10) learn hierarchical action plans through gradient descent (see above). More sophisticated ways of learning to think in abstract ways were published in 1997[AC97][AC99][AC02] and 2015-18.[PLAN4-5] century[SHA7a][RAU1] by Heron of Alexandria Highlights of over 2000 years of computing history. Juergen Schmidhuber was perhaps the first machine with a stored program.[BAN][KOE1] It used pins on

      2021: 375th birthday of Leibniz, father of computer science. Juergen Schmidhuber. Wilhelm Schickard, In 1673, the already mentioned Gottfried Wilhelm Leibniz (called "the smartest man who ever lived"[SMO13]) designed the first machine (the step reckoner) that could perform all four arithmetic operations, and the first with a memory.[BL16] cards (1679),[L79][L03][LA14][HO66] and published the chain rule[LEI07-10] (see above), essential ingredient of deep learning and modern AI.

      Leonardo Torres y Quevedo, the  20th century's first pioneer of practical AI Leonardo Torres y Quevedo (mentioned in the introduction) became it at the 1951 Paris AI conference.[AI51][BRO21][BRU4] Konrad Zuse The corresponding patent of 1936[ZU36-38][RO98][ZUS21] predating Claude Shannon's 1937 thesis on digital circuit design.[SHA37] Unlike Babbage, Zuse used Leibniz' principles of binary computation (1679)[L79][LA14][HO66][L03] This greatly simplified the hardware.[LEI21,a,b] Church[CHU] (1935), Turing[TUR] (1936), and Post[POS] (1936). conditional jump instruction.[RO98]

      1941: Konrad Zuse builds first working general computer; patent application 1936. Juergen Schmidhuber. John Atanasoff (the "father of tube-based computing"[NASC6a]). Julius Edgar Lilienfeld in 1925.[LIL1-2] used to break the Nazi code.[NASC6] someone other than Zuse (1941)[RO98] was Howard Aiken's decimal MARK I (US, 1944). and the 1948 upgrade of ENIAC, which was reprogrammed by entering numerical instruction codes into read-only memory.[HAI14b] with several transistors on a common substrate (granted in 1952).[IC49-14] In 1959, Robert Noyce presented a monolithic IC.[IC14] ICs/GPUs of today (2022) contain many billions of transistors (almost all of them of Lilienfeld's 1925 FET type[LIL1-2]). Moore's Law which states that the number of transistors[LIL1-2] raw computational power of all human brains combined.[RAW] According to Bremermann (1982),[BRE] as previously noted back in 2004.[OOPS2][ZUS21] are actually light beams).[DL2] are expected to become even much more important than they are today.[DL2] any type of computation-based AI.[GOD][BIB3][MIR](Sec. 18)[GOD21,21a]

      1931: Theoretical Computer Science & AI Theory Founded by Goedel. Juergen Schmidhuber. He combined Georg Cantor's diagonalization trick[CAN] with the foundational work by Gottlob Frege[FRE] (who introduced the first formal language in 1879), Thoralf Skolem[SKO23] (who introduced primitive recursive functions in 1923) and Jacques Herbrand[GOD86] (who identified Gottfried Wilhelm Leibniz[L86][WI48] (see above), deductively equivalent[LE18] to the later Boolean Algebra of 1847.[BOO] In 1936, Alan M. Turing Turing Machine.[TUR] He rederived the above-mentioned result.[CHU][TUR][HIN][GOD21,21a][TUR21][LEI21,21a] In the same year of 1936, Emil Post published yet another independent universal model of computing.[POS] the world's first working programmable general-purpose computer,[ZU36-38][RO98][ZUS21] the first high-level programming language.[BAU][KNU] 1945[KNU] in 1948.[ZU48] Compare Newell & Simon's later work on theorem proving (1956).[NS56] In 1964, Ray Solomonoff combined Bayesian (actually Laplacian[STI83-85]) probabilistic reasoning and theoretical computer science[GOD][CHU][TUR][POS] of learning to predict future data from past observations.[AIT1][AIT10] With Andrej Kolmogorov, he founded the theory of Kolmogorov complexity or algorithmic information theory (AIT),[AIT1-22] going beyond traditional information theory[SHA48][KUL] this concept,[AIT7][AIT5][AIT12-13][AIT16-17] as well as applications to NNs.[KO2][CO1-3]

      In the early 2000s, Marcus Hutter (while working under my Swiss National Science Foundation grant[UNI]) augmented Solomonoff's universal predictor[AIT1][AIT10] environments.[AIT20,22] He also derived the asymptotically fastest algorithm for all well-defined computational problems,[AIT21] a beautiful pattern of exponential acceleration in it,[OMG] which I have presented in many talks since then, and which also made it into Sibylle Berg's award-winning book "GRM: Brainfuck."[OMG2] intervals: just a few decades or centuries or at most millennia.[OMG1] The most important events since the beginning of the universe seem to be neatly aligned on a timeline of exponential acceleration converging in an Omega point in the year 2040 or so (J Schmidhuber, 2014) Heron of Alexandria[RAU1] in the 1st century). The telephone (e.g., Meucci 1857, Reis 1860, Bell 1876)[NASC3] Haber-Bosch process for creating artificial fertilizer, without which the world could feed at most 4 billion people.[HAB1-2] first truly self-driving cars robot cars were driving in highway traffic, up to 180 km/h).[AUT] Back then, I worked on my 1987 diploma thesis,[META1] which introduced algorithms not just for learning but also for meta-learning or learning to learn,[META] to learn better learning algorithms through experience (now a very popular topic[DEC]). And then came our Miraculous Year 1990-91[MIR] at TU Munich, the root of today's most cited NNs[MOST] and of modern deep learning through artificial curiosity and generative adversarial NNs for agents that invent their own problems (see above),[AC90-AC20][PP-PP2][SA17] Transformers with linearized self-attention (see above),[FWP0-6][TR5-6] distilling teacher NNs into student NNs (see above),[UN][UN0-3] at multiple levels of abstraction and multiple time scales (see above),[HRL0-2][LEC] and other exciting stuff. Much of this has become very popular, and improved the lives of billions of people.[DL4][DEC][MOST] (take all of this with a grain of salt, though[OMG1]). lab for decades[AC][AC90,AC90b]) will quickly improve themselves, restricted only by the fundamental limits of computability and physics. it,[ACM16][FA15][SP16][SA17] make more and bigger AIs. Those who don't won't have an impact.[ACM16][FA15][SP16] the simplest and fastest way of computing all possible metaverses or computable universes. Juergen Schmidhuber, 1997

Creative Commons License. Some of the material above was taken from previous AI Blog posts[MIR][DEC][GOD21][ZUS21][LEI21][AUT][HAB2][ARC06][AC][ATT][DAN][DAN1][DL4][GPUCNN5,8][DLC][FDL][FWP][LEC][META][MLP2][MOST][PLAN][UN][LSTMPG][BP4][DL6a][HIN][T22] and from the publication page and the arXiv page. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

555+ References (and many more in the survey[DL1])

In 2022, we are celebrating the following works from a quarter-century ago:
1. Journal paper on Long Short-Term Memory, the most cited NN of the 20th century (and basis of the most cited NN of the 21st).
2. On all possible metaverses.
3. Implementing artificial curiosity and creativity through generative adversarial agents that learn to design abstract, interesting computational experiments.
4. On meta-reinforcement learning.
5. Journal paper on hierarchical Q-learning.
8. Journal paper on Low-Complexity Art, the Minimal Art of the Information Age.

J. Schmidhuber (AI Blog, 2021). 3 decades of artificial curiosity & creativity.
[AC90] The first paper on online planning with reinforcement learning recurrent neural networks (NNs) and on generative adversarial networks (1990).
See also later publications on general systems with intrinsic motivation.[AC90-AC95][AC99][AC02]
With a brief summary of the generative adversarial neural networks of 1990.[AC90,90b][AC20] Preprint arXiv/1906.04493.
[AIB] J. Schmidhuber. AI Blog. Includes variants of chapters of the AI Book.
Blog of Werner Vogels, CTO of Amazon (Nov 2016).
[AMH1] First publication (Amari, 1972) of what was later sometimes called the Hopfield network[AMH2] or Amari-Hopfield Network,[AMH3] based on the (uncited) Lenz-Ising recurrent architecture.[L20][I25][T22] [AMH2] did not cite [AMH1].
Mentions the recurrent Ising model[L20][I25] on which the (uncited) Amari network[AMH1,2] is based.
[ATT] J. Schmidhuber (AI Blog, 2020). 30-year anniversary of end-to-end differentiable sequential neural attention. Plus goal-conditional reinforcement learning. We had both hard attention (1990) and soft attention in the form of Transformers with linearized self-attention (1991-93).[FWP] Today, both types are very popular.
H. Larochelle, G. E. Hinton. Learning to combine foveal glimpses with a third-order Boltzmann machine. NIPS 2010. This work is very similar to [ATT0-2], which the authors did not cite. In fact, Hinton was the reviewer of a 1990 paper[ATT2] with an "attentional component (the fixation controller)," yet did not cite it in his own work.[ATT3] See [MIR](Sec. 9)[R4].
arXiv/1409.0473, 2014-16. This work on soft "attention" did not cite Schmidhuber's much earlier original work of 1991-1993 on soft attention and Transformers with linearized self-attention.[FWP,FWP0-2,6][ATT]
J. Schmidhuber (AI Blog, 2005). Highlights of robot car history.
Bloomberg, May 15, 2018.
The Boltzmann Machine paper[BM] cited neither the prior relevant work by Sherrington & Kirkpatrick[SK75] & Glauber,[G63] nor the first working algorithms for deep learning of internal representations (Ivakhnenko & Lapa, 1965),[DEEP1-2][HIN] nor Amari's work (1967-68)[GD1-2] on learning internal representations in deep nets through stochastic gradient descent. Even later surveys by the authors[S20][DLC] failed to cite the prior art.[T22]
Leibniz's formal Algebra of Thought (1686)[L86][WI48] was deductively equivalent[LE18] to the much later Boolean algebra (1847).
Kelley (1960): precursor of modern backpropagation.[BP1-5]
Werbos (1982): first application of backpropagation[BP1] to NNs (concretizing thoughts in Werbos' 1974 thesis).
[BP4] J. Schmidhuber (AI Blog, 2014; updated 2020). Who invented backpropagation? More.[DL2]
IEEE Spectrum, 2021.
[CNN1] Fukushima (1979): the basic CNN architecture with convolutional and downsampling layers. English version: [CNN1+]. More in Scholarpedia.
[CNN1a] A. Waibel. Phoneme Recognition Using Time-Delay Neural Networks. Meeting of IEICE, Tokyo, Japan, 1987. First application of backpropagation[BP1-5] and weight-sharing to NNs with convolutions.
Since November 2021: Comments on version 1 of the report[T22] in the Connectionists Mailing List, perhaps the oldest mailing list on artificial neural networks.
Beijing, 2014. Preprint arXiv:1402.3511 [cs.NE].
[DAN] J. Schmidhuber (AI Blog, 2021). 10-year anniversary. In 2011, DanNet triggered the deep convolutional neural network (CNN) revolution. Named 1st superhuman result in 2011.[DAN1] Now everybody is using this approach.
[DAN1] J. Schmidhuber (AI Blog, 2011; updated 2021 for 10th birthday of DanNet): First superhuman visual pattern recognition, achieved by the artificial neural network called DanNet.
[DEC] J. Schmidhuber (AI Blog, 02/20/2020, updated 2021, 2022). The 2010s: Our Decade of Deep Learning / Outlook on the 2020s.
The 1991 NN distillation procedure.[UN0-2][MIR](Sec. 2)
A "survey" of deep learning that does not mention the pioneering works of deep learning.[T22]
[DL3a] Y. Bengio, Y. LeCun, G. Hinton (2021). Turing Lecture: Deep Learning for AI. Communications of the ACM, July 2021. Another "survey" of deep learning that does not mention the pioneering works of deep learning.[T22]
[DL4] J. Schmidhuber (AI Blog, 2017). Our impact on the world's most valuable public companies: Apple, Google, Microsoft, Facebook, Amazon. Greatly improved (CTC-based) on-device speech recognition (on the phone, not the server) is based on LSTM.
J. Schmidhuber (AI Blog, Nov 2020). 15-year anniversary: 1st paper with "learn deep" in the title (2005). The deep reinforcement learning & neuroevolution developed in Schmidhuber's lab solved problems of depth 1000 and more.[DL6] Soon after its publication, everybody started talking about "deep learning." Causality or correlation?
Web site deeplearning.net of Y. Bengio's MILA (2015, retrieved May 2020; compare the version in the Internet Archive), referring to Hinton's[UN4] and Bengio's[UN5] unsupervised pre-training for deep NNs (2006), although this type of deep learning dates back to Schmidhuber's work of 1991.[UN1-2][UN]
[DLC] J. Schmidhuber (AI Blog, June 2015). Critique of paper by the self-proclaimed[DLC2] "Deep Learning Conspiracy" (Nature 521 p 436). More on this under [T22].
J. Schmidhuber (AI Blog, 2022). Annotated History of Modern AI and Deep Learning. Technical Report IDSIA-22-22, IDSIA, Lugano, Switzerland, 2022. Preprint arXiv:2212.11279. Tweet of 2022.
arxiv:1312.5602. What the first sentence of the abstract of the earlier tech report version[DM1] describes was created earlier by Jan Koutnik et al. in Schmidhuber's lab.[CO2]
Alphastar has a "deep LSTM core."
Hochreiter et al.'s first successful application[HO07] of deep learning to protein folding (2007).
Preprint arXiv:2112.10752, LMU Munich, 2021.
Neural networks learning to control dynamic external memories.[PDA1-2][FWP0-1]
arXiv:1808.03578, 2018.
Conf. on Neural Networks, Vol. 2, 2004, pp. 985-990. This paper does not mention that the "ELM" concept goes back to Rosenblatt's work in the 1950s.[R62][T22]
This overview does not mention that the "ELM" concept goes back to Rosenblatt's work in the 1950s.[R62][T22]
Facebook used LSTM for over 4 billion automatic translations per day (The Verge, August 4, 2017); Facebook blog by J.M. Pino, A. Sidorov, N.F. Ayan (August 3, 2017).
[FDL] J. Schmidhuber (AI Blog, 2013). My First Deep Learning System of 1991 + Deep Learning Timeline 1960-2013.
[FWP] J. Schmidhuber (AI Blog, 26 March 2021, updated 2022). Fast Weight Programmers: an alternative[FWP0-1] to recurrent NNs, using the earlier notion of fast weights.[FAST,FASTa,b] Such Fast Weight Programmers[FWP0-6,FWPMETA1-8] can learn to memorize past data, e.g., by computing fast weight changes through additive outer products of self-invented activation patterns[FWP0-1] (now often called keys and values for self-attention[TR1-6]). The similar Transformers[TR1-2] combine this with projections; Transformers with linearized self-attention[TR5-6] are formally equivalent to the 1991 Fast Weight Programmers (apart from normalization). In 1993, he introduced the attention terminology[FWP2] now used in this context,[ATT] and RNNs that program themselves. See tweet of 2022 for 30-year anniversary.
Preprint: arXiv:1811.12143.
Very similar to [FWP0-2], in both motivation[FWP2] and execution. This work on "attention" did not cite Schmidhuber's much earlier original work of 1991-1993 on soft attention and Transformers with linearized self-attention.[FWP,FWP0-2,6][ATT]
Preprint: arXiv:2003.08165.
Linear Transformers Are Secretly Fast Weight Programmers. ICML 2021. Preprint: arXiv:2102.11174.
Preprint: arXiv:2106.06295 (June 2021).
An introspective network that can learn to run its own weight change algorithm. In Proc. of the Intl. Conf. on Artificial Neural Networks.
J. Schmidhuber. Habilitation thesis, TUM, 1993.
Preprint arXiv:2012.14905 [cs.LG], 2020. Report arXiv:2011.07831 [cs.AI], 2020. Preprint: arXiv:2202.05780.
[GD1] Amari (1967): probably the first paper on using stochastic gradient descent[STO51-52] for multilayer perceptrons (though still without the reverse mode of automatic differentiation or backpropagation[BP1]). OCR-based PDF scan of pages 94-135 (see pages 119-120).
[GD2] Implementation of Amari's 1967 stochastic gradient descent method for multilayer perceptrons.[GD1] (S. Amari, personal communication, 2021.)
Preprint arXiv/2207.01570, 4 July 2022 (submitted in May 2022).
arXiv:cs/0309048 (2003).
Cognitive Computation 1(2):177-193, 2009.
Google Research Blog, Sep 2015 (see also Aug 2015): Google's speech recognition based on CTC and LSTM. Alphr Technology, Jul 2015; 9to5google, Jul 2015; WIRED, Sep 2016; siliconANGLE, Sep 2016.
Blog post, Internet Archive, 2010. A blog post describing basic ideas[AC][AC90,AC90b][AC20] of GANs.
A description of GANs that does not cite Schmidhuber's original GAN principle of 1990[AC][AC90,AC90b][AC20][R2][T22] (also containing wrong claims about Schmidhuber's adversarial NNs for Predictability Minimization[PM0-2][AC20][T22]).
Frankfurter Allgemeine Zeitung, 16/6/2021.
Preprint arXiv/2005.14165.
Flexible, High Performance Convolutional Neural Networks for Image Classification. International Joint Conference on Artificial Intelligence (IJCAI-2011, Barcelona), 2011. ArXiv preprint.
DanNet won four important computer vision competitions 2011-2012 before others won any, well ahead of the closest competitor.[DAN1] This led to massive interest from industry.
[GPUCNN3] D. C. Ciresan, U. Meier, J. Schmidhuber. Multi-column Deep Neural Networks for Image Classification. Proc. IEEE Conf. on Computer Vision and Pattern Recognition CVPR 2012, p 3642-3649, July 2012. Longer TR of Feb 2012: arXiv:1202.2745v1 [cs.CV].
DanNet[DAN,DAN1][R6] was the first CNN to win computer vision contests in 2011[GPUCNN2-3,5] (AlexNet and VGG Net[GPUCNN9] followed in 2012-2014). [GPUCNN4] emphasizes benefits of Fukushima's ReLUs (1969)[RELU1] and dropout (a variant of Hanson's 1990 stochastic delta rule)[Drop1-4] but cites neither the original work[RELU1][Drop1] nor the basic CNN architecture (Fukushima, 1979).[CNN1]
[GPUCNN5] J. Schmidhuber (AI Blog, 2017; updated 2021 for 10th birthday of DanNet): History of computer vision contests won by deep CNNs since 2011. DanNet was the first CNN to win one, and won 4 of them in a row before the similar AlexNet/VGG Net and the ResNet (a Highway Net with open gates) joined the party. Today, deep CNNs are standard in computer vision.
[GPUCNN8] J. Schmidhuber (AI Blog, 2017; updated 2021 for 10th birthday of DanNet). First deep learner to win a medical imaging contest (2012).
J. Schmidhuber (Blog, 2000). Most influential persons of the 20th century (according to Nature, 1999). The Haber-Bosch process has often been called the most important invention of the 20th century.[HAB1]
Bengio claimed[YB20] credit for this line of work, but Schmidhuber's publications on exactly this topic date back to 1991-93.[UN0-2][UN]
An unsupervised learning algorithm related to Schmidhuber's supervised Neural Heat Exchanger.[NHE]
[HIN] J. Schmidhuber (AI Blog, 2020). Critique of Honda Prize for Dr. Hinton. Science must not allow corporate PR to distort the academic record. See also [T22].
This work did not cite previous related work.[BB2][NAN1-4][NHE][MIR](Sec. 15, Sec. 17)[FWPMETA6]
Compare what Y. LeCun called an "open problem" in 2022.[LEC]
North-Holland, 1991. Extending TR FKI-129-90, TUM, 1990.
This work did not cite Schmidhuber's gradient-based subgoal generators for hierarchical reinforcement learning (1990).[HRL0-2]
[HW1] Preprints arXiv:1505.00387 (May 2015) and arXiv:1507.06228 (July 2015). Also at NIPS 2015. (The Highway Net is a feedforward version of the LSTM with forget gates[LSTM2] for RNNs.) ResNets[HW2] are a version of this where the gates are always open: g(x)=t(x)=const=1. Highway Nets perform roughly as well as ResNets[HW2] on ImageNet.[HW3] Variants of highway gates are also used for certain algorithmic tasks, where the simpler residual layers do not work as well.[NDR]
[HW2] arXiv:1512.03385 (Dec 2015). Residual nets are a version of Highway Nets.[HW1]
[HW3] arxiv:1612.07771 (2016). Also at ICLR 2017.
[HYB12] This work did not cite the earlier LSTM[LSTM0-6] trained by Connectionist Temporal Classification (CTC, 2006).[CTC] CTC-LSTM was successfully applied to speech in 2007[LSTM4] (also with hierarchical LSTM stacks[LSTM14]) and became the first superior end-to-end neural speech recogniser that outperformed the state of the art, dramatically improving Google's speech recognition.[GSR][GSR15][DL4] [HYB12] still used the old hybrid approach with hidden Markov models (HMMs)[BW][BRI][BOU] and did not compare it to CTC-LSTM. Later, however, Hinton switched to LSTM, too.[LSTM8]
The Lenz-Ising recurrent architecture was introduced by Ernst Ising and Wilhelm Lenz in the 1920s.[L20][I25][K41][W45][T22] It settles into an equilibrium state in response to input conditions, and is the foundation of the first well-known learning RNNs.[AMH1-2]
Who Invented the IC?
Preprint arXiv:1704.04760.
Leibniz, Mathematischen Schriften, ed. C. Gerhardt, Berlin 1879, vol. 7, p. 223.
arXiv:1607.06450, 2016.
[LEC] J. Schmidhuber (AI Blog, 2022). LeCun's 2022 paper on autonomous machine intelligence rehashes but does not cite essential work of 1990-2015 (see tweet1). LeCun also listed the "5 best ideas 2012-2022" without mentioning their origins (see tweet2).
[LEI21] J. Schmidhuber (AI Blog, 2021). 375th birthday of Leibniz, founder of computer science. Frankfurter Allgemeine Zeitung (FAZ), 17/5/2021; FAZ online, 19/5/2021.
[LEI21b] J. Schmidhuber (AI Blog, 2021). 375. Geburtstag des Herrn Leibniz, dem Vater der Informatik. (German version of [LEI21].)
[LSTM1] S. Hochreiter, J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. Based on [LSTM0].
Preprint: arxiv:1506.07452.
[LSTMPG] J. Schmidhuber (AI Blog, Dec 2020). 10-year anniversary of our journal paper on deep reinforcement learning with policy gradients for LSTM (2007-2010).
GRUs[LSTMGRU] are actually a variant of the vanilla LSTM architecture[LSTM2] (2000) which the authors did not cite, although this work[LSTM2] was the one that introduced gated recurrent units. Furthermore, Schmidhuber's team automatically evolved lots of additional LSTM variants and topologies already in 2009[LSTM7] without changing the name of the basic method. GRUs can neither learn to count[LSTMGRU2] nor learn simple non-regular languages,[LSTMGRU2] and they also perform worse than LSTM according to Google Brain.[LSTMGRU3]
Preprint arXiv:1805.04908.
Massive Exploration of Neural Machine Translation Architectures. Preprint arXiv:1703.03906.
A misleading "history of deep learning" goes more or less like this: "In 1969, Minsky & Papert[M69] [...] researchers took a fresh look at the problem in the 1980s."[S20] However, the 1969 book[M69] addressed a "problem" of Gauss & Legendre's shallow learning (circa 1800)[DL1-2] that had already been solved 4 years prior by Ivakhnenko & Lapa's popular deep learning method,[DEEP1-2][DL2] and then also by Amari's SGD for MLPs.[GD1-2] Minsky was apparently unaware of this and failed to correct it later.[HIN](Sec. I)[T22](Sec. XIII)
J. Schmidhuber (AI Blog, 2020). 1/3 century anniversary of the first publication on metalearning machines that learn to learn (1987). Searchable PDF scan (created by OCRmypdf, which uses LSTM).
Better GP methods through Meta-Evolution.
[MIR] J. Schmidhuber (AI Blog, Oct 2019, updated 2021, 2022). Deep Learning: Our Miraculous Year 1990-1991. Preprint arXiv:2005.05744, 2020.
Neural Computation 22(12): 3207-3220, 2010. ArXiv preprint.
[MLP2] J. Schmidhuber (AI Blog, Sep 2020). 10-year anniversary of supervised deep learning breakthrough (2010). No unsupervised pre-training. By 2010, when compute was 100 times more expensive than today, the feedforward NNs[MLP1] already set an important benchmark record without it.
[MOST] J. Schmidhuber (AI Blog, 2021). The most cited neural networks all build on work done in my labs. Foundations of the most popular NNs originated in Schmidhuber's labs at TU Munich and IDSIA: (1) Long Short-Term Memory (LSTM), (2) ResNet (which is the earlier Highway Net with open gates), (3) AlexNet and VGG Net (both building on the similar earlier DanNet: the first deep convolutional NN to win image recognition competitions), (4) GANs (instances of the earlier Adversarial Artificial Curiosity), and (5) variants of Transformers (Transformers with linearized self-attention are formally equivalent to the much earlier Fast Weight Programmers). Most of this goes back to the Annus Mirabilis of 1990-1991.[MIR]
Preprint arXiv:1608.05343, 2016.
Preprint arXiv:1611.01578, 2017. Compare the earlier Neural Architecture Search of Bayer et al. (2009) for LSTM-like topologies.[LSTM7]
[NASC1] J. Schmidhuber. First Pow(d)ered flight / plane truth. Correspondence, Nature, 421 p 689, Feb 2003.
[NASC3] J. Schmidhuber. The last inventor of the telephone. Letter, Science, 319, no. 5871, p. 1759, March 2008.
Correspondence, Nature, vol 483, p 541, March 2012, doi:10.1038/483541b.
Letter, Science, vol 336, p 1639, June 2012. See also comment on response by A. Hodges (DOI:10.1126/science.336.6089.1639-a).
[NASC6] J. Schmidhuber. Colossus was the first electronic digital computer. Correspondence, Nature, 441 p 25, May 2006.
[NASC6a] J. Schmidhuber. Comment on "Biography: The ABC of computing" by J. Gilbey, Nature 468 p 760-761 (2010).
[NASC7] J. Schmidhuber. Turing's impact. Correspondence, Nature, 429 p 501, June 2004.
[NASC8] J. Schmidhuber. Prototype resilient, self-modeling robots. Correspondence, Science, 316, no. 5825 p 688, May 2007.
[NASC9] J. Schmidhuber. Comparing the legacies of Gauss, Pasteur, Darwin. Correspondence, Nature, vol 452, p 530, April 2008.
[NDR] The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization. Proc. ICLR 2022. Preprint arXiv/2110.07732.
Compare Schmidhuber's excellent 1995 neural probabilistic text model.[SNT] See also Nakamura and Shikano's 1989 word category prediction model.[NPMa]
Compare Konrad Zuse's much earlier 1948 work on theorem proving[ZU48] and the first high-level programming language.[BAU][KNU]
Learning Dexterous In-Hand Manipulation. 2018.
arxiv:1912.06680. An LSTM with 84% of the model's total parameter count was the core of OpenAI Five (2018).
J. Schmidhuber (Blog, 2006). Is History Converging? Again? On history's exponential acceleration since the Big Bang.[OMG]
Preprints arXiv/1606.06724, arXiv/1708.03498, arXiv/1802.10353, arXiv/2010.03635, arXiv/2011.12930.
OOPS source code in crystalline format.
[PLAN] J. Schmidhuber (AI Blog, 2020). 30-year anniversary of planning & reinforcement learning with recurrent world models and artificial curiosity (1990). This work also introduced high-dimensional reward signals, deterministic policy gradients for RNNs, and the GAN principle. Based on TR FKI-126-90 (1990).[AC90]
Partially based on TR FKI-126-90 (1990).[AC90] Report arXiv:1210.0118 [cs.AI], 2015.
One Big Net For Everything. Preprint arXiv:1802.08864 [cs.AI], Feb 2018.
Preprint: arXiv:1809.01999. Github: World Models.
Predictability minimization. TR CU-CS-565-91, Univ. Colorado at Boulder, 1991.
arXiv:1112.5309 [cs.AI]. First Experiments with PowerPlay. arXiv:1210.8385 [cs.AI].
[R1] Reddit/ML, 2019. Hinton, LeCun, Bengio receive ACM Turing Award. This announcement contains more comments about Schmidhuber than about any of the awardees.
[R2] Reddit/ML, 2019. J. Schmidhuber really had GANs in 1990.
[R3] Reddit/ML, 2019. NeurIPS 2019 Bengio Schmidhuber Meta-Learning Fiasco. (Schmidhuber started metalearning in 1987,[META1][META] long before Bengio.)
[R4] Reddit/ML, 2019. Five major deep learning papers by G. Hinton did not cite similar earlier work by J. Schmidhuber.
[R5] Reddit/ML, 2019. The 1997 LSTM paper by Hochreiter & Schmidhuber has become the most cited deep learning research paper of the 20th century.
[R6] Reddit/ML, 2019. DanNet, the CUDA CNN of Dan Ciresan in J. Schmidhuber's team, won 4 image recognition challenges prior to AlexNet.
[R7] Reddit/ML, 2019. J. Schmidhuber on Seppo Linnainmaa, inventor of backpropagation in 1970.
[R8] Reddit/ML, 2019. J. Schmidhuber on Alexey Ivakhnenko, godfather of deep learning 1965.
[R9] Reddit/ML, 2019.
[R11] Reddit/ML, 2020. Schmidhuber: Critique of Honda Prize for Dr. Hinton.
[R12] Reddit/ML, 2020. J. Schmidhuber: Critique of Turing Award for Drs. Bengio & Hinton & LeCun.
[R15] Reddit/ML, 2021. J. Schmidhuber's work on fast weights from 1991 is similar to linearized variants of Transformers.
Although these MLPs did not yet have deep learning, because only the last layer learned,[DL1] Rosenblatt basically had what much later was rebranded as Extreme Learning Machines (ELMs) without proper attribution.[ELM1-2][CONN21][T22]
J. Schmidhuber (AI Blog, 2001). Raw Computing Power.
Preprint arXiv/1311.2524, Nov 2013. Preprint arXiv/1703.06870, 2017.
The first paper on policy gradients for LSTM. This approach has become very important in reinforcement learning.[LSTMPG]
This experimental analysis of backpropagation did not cite the origin of the method,[BP1-5] also known as the reverse mode of automatic differentiation, nor the first working algorithms for deep learning of internal representations (Ivakhnenko & Lapa, 1965),[DEEP1-2][HIN] nor Amari's work (1967-68)[GD1-2] on learning internal representations in deep nets through stochastic gradient descent. Even later surveys by the authors[DL3,3a] failed to cite the prior art.[T22]
Another misleading "history of deep learning".[S20] Deep learning research was also alive in the 1960s-70s, especially outside of the Anglosphere.[DEEP1-2][GD1-3][CNN1][DL1-2][T22]
The Past, Present and Future of Artificial Intelligence.
Much later this was called a probabilistic language model.[T22]
[T19] ACM's justification of the 2018 A.M. Turing Award (announced in 2019). [T22] debunks this justification.
[T20a] J. Schmidhuber (AI Blog, 25 June 2020). Critique of 2018 Turing Award for Drs. Bengio & Hinton & LeCun. A precursor of [T22].
[T22] J. Schmidhuber (AI Blog, 2022). Scientific Integrity and the History of Deep Learning: The 2021 Turing Lecture, and the 2018 Turing Award. Technical Report IDSIA-77-21, IDSIA, Lugano, Switzerland, 2022. Debunking [T19] and [DL3a].
Compare the 1991 publication on what's now called "Transformers with linearized self-attention"[FWP0-6][TR5-6] and the attention terminology introduced in 1993.[ATT][FWP2][R4] See tweet of 2022 for 30-year anniversary.
[TUR21] J. Schmidhuber (AI Blog, Sep 2021). Turing Oversold. It's not Turing's fault, though.
The Turing Test. YouTube video, 2022.
Preprint arXiv/1912.02875, 5 Dec 2019. Preprint arXiv/1912.02877, 5 Dec 2019.
[UN] J. Schmidhuber (AI Blog, 2021). 30-year anniversary. 1991: First very deep learning with unsupervised or self-supervised pre-training. By 1993, the approach solved problems of depth 1000,[UN2] and included a neural knowledge distillation procedure. The systems of 1991 allowed for much deeper learning than previous methods.
1992. Based on TR FKI-148-91, TUM, 1991.[UN0] Such approaches are now widely used.
[UN2] J. Schmidhuber. Habilitation thesis, TUM, 1993. (Experiments with depth > 1000.)
2006. This paper did not cite the much earlier 1991 unsupervised pre-training of stacks of more general recurrent NNs (RNNs),[UN0-3] the first NNs shown to solve very deep problems. Each level tries to reduce the description length (or negative log probability) of the data representation in the level below,[HIN][T22][MIR] which can greatly facilitate very deep downstream learning.[UN0-3]
The comment under reference [UN4] applies here as well.
Theory of Universal Learning Machines & Universal AI.
[VAN1] S. Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, TUM, 1991 (advisor J. Schmidhuber). More on the Fundamental Deep Learning Problem.
Results are essentially identical to those of Schmidhuber's diploma student Sepp Hochreiter (1991).[VAN1] Even after a common publication,[VAN3] the first author of [VAN2] published papers[VAN4] that cited only their own [VAN2] but not the original work.
[VAN4] Y. Bengio. Neural net language models. Scholarpedia, 3(1):3881, 2008.
YouTube video [see 28:16]. However, in 2010, Schmidhuber's team in Switzerland showed[MLP1-2] that unsupervised pre-training is not necessary to train deep NNs.
Preprint arXiv:1609.08144, 2016. Based on LSTM, which it mentions at least 50 times.
WWW link (retrieved 15 May 2020). Local copy (plain HTML only). Compare: Schmidhuber's publications on exactly this topic date back to 1991-93,[UN0-2][UN] and a superior neural text model existed already in 1995.[SNT]
A general, practical, program-controlled computer; compare the architecture of [NEU45].
[ZUS21] J. Schmidhuber (AI Blog, 2021). 80th anniversary celebrations: 1941: Konrad Zuse completes the first working general computer, based on his 1936 patent application.
J. Schmidhuber (AI Blog, 2021). 80. Jahrestag: 1941: Konrad Zuse baut ersten funktionalen Allzweckrechner, basierend auf der Patentanmeldung von 1936. (German version of [ZUS21].) Weltwoche, Nr. 33.21, 19 August 2021.

Scientific Integrity and the History of Deep Learning: The 2021 Turing Lecture, and the 2018 Turing Award
AI Blog
      Twitter: @SchmidhuberAI
      (v1: 24 Sep 2021, v2: 31 Dec 2021) Versions since 2021 archived in the Internet Archive This is a point-for-point critique of ACM's justification of the ACM A. M. Turing Award for deep learning, as well as a critique of the Turing Lecture given by the awardees (published by ACM in July 2021). deep learning survey,[DL1] and can also be seen as a short history of the deep learning revolution, at least as far as ACM's erroneous laudation and the Turing Lecture are concerned. 2015 survey of deep learning[DL1] June 2020 article[T20a][R12] version 1 of the present report. (see Executive Summary I, V, II, XII, XIX, XXI, XIII, XIV, XX, XVII). (A) speech recognition, (B) natural language processing, (C) robotics, (D) computer vision, (VII) medicine, astronomy, materials science. A, B, C, D, VII, XVII, VI, XVI). II, V, XX, XVIII) with Dr. Bengio & Dr. Hinton (see Sec. XVII, I). I respond to LBH's recent ACM article (July 2021). expands material in my Critique of the 2019 Honda Prize[HIN] (~3,000 words). Abstract & Outline (~300 words), Introduction (~300 words), Critique of LBH's ACM article (Turing Lecture) of July 2021[DL3a] Executive summary of what's wrong with ACM's laudation (~1,000 words), 21 comments on 21 claims by ACM (~8,000 words), Conclusion (~2,000 words). All backed up by over 300 references (over 10,000 words). The text contains numerous hyperlinks to relevant overview sites from the AI Blog. science is self-correcting."[SV20] they are mine or other people's.[DL1-2][HIN][NASC1-9] The present page is offered as a resource for all good computer scientists who share this inclination. and to fight plagiarism,[FAKE2] collusion rings,[LIT21] and systemic academic corruption in all of their more and less subtle forms.[FAKE] Sec. 2 LBH's 2021 ACM article[DL3a] which necessitated an extension of the first version of this post.[T20a][R12] ACM's official justification[T19] of the 2018 A.M. Turing Award[R1] After the Executive Summary in Sec. 3, Sec. 4 will split ACM's full text[T19] into 21 parts I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII, XIV, XV, XVI, XVII, XVIII, XIX, XX, XXI. Most of the critiques are based on references to original papers and material from the AI Blog.[AIB][MIR][DEC][HIN] publishing yet another misleading overview of the field, this time based on LBH's Turing Lecture.[DL3a] LBH's well-known earlier omissions.[DLC][HIN][T20a] LBH claim to "briefly describe the origins of deep learning"[DL3a] without even mentioning the world's first working deep learning nets by Ivakhnenko and Lapa in 1965[DEEP1-2][R8] (see Sec. II). this class of methods was pioneered in 1991[UN-UN2] (see Sec. II, III). Highway Net, the first really deep feedforward NN.[HW1-3] (see Sec. D, VI). were all driven by my lab:[MOST] In 1991, I had the first very deep NNs based on unsupervised pre-training;[UN-UN2] LSTMs brought essentially unlimited depth to gradient-based supervised recurrent NNs;[LSTM0-17] later our Highway Nets[HW1-3] brought it to feedforward NNs. from 2007[LSTM4,14] based on LSTM[LSTM0-6] (1990s-2005) and CTC (2006).[CTC] our CTC-LSTM-based speech recognition (not that of Hinton) had been on most smartphones for years[GSR][GSR15-19][DL4] (see Sec. A, VI, XI, XV). Similarly for machine translation (see Sec. B). LBH cite Hinton (2012) for "dropout" without mentioning that dropout is just a variant of Hanson's 1990 stochastic delta rule[Drop1-3] (see Sec. XIV). perceptrons through stochastic gradient descent[GD1-3] (without reverse mode backpropagation[BP1]). 
Fukushima who introduced ReLUs in 1969[RELU1-2] (see Sec. XIV). called AlexNet,[GPUCNN4] without mentioning that our earlier groundbreaking deep GPU-based DanNet[GPUCNN1-3,5-8][DAN] did not need ReLUs at all to win 4 earlier object recognition competitions and to achieve superhuman results already in 2011[GPUCNN1-8][R5-6] (see Sec. XIV). XVIII). already in 1965[DEEP1-2][R8] (see Sec. II). earlier fast weights of von der Malsburg (1981) and Feldman (1982).[FAST,FASTa-b][FWP] described in the 1991-93 papers on Fast Weight Programmers and linear Transformers[FWP0-1,6] (see Sec. XVI, XVII-2). dedicate an extra section to attention-based Transformers,[TR1-6] citing Bengio's team (2014) for "soft attention"[ATT14] without citing the much earlier original work of 1991-1993 on soft attention and linear Transformers[FWP,FWP0-2,6][ATT] (see Sec. XVII-1, XVI). LBH claim that Bengio's team[NPM] of text compression[SNT] (see Sec. XVI, XVII-1). LBH cite Bengio's 2014 paper on Generative Adversarial Networks (GANs)[GAN0-1] without mentioning that GANs are instances of the Adversarial Curiosity Principle of 1990[AC90-20][MIR](Sec. 5) (see Sec. XVII). In summation, LBH have repeatedly chosen to ignore the previous well-known critiques[DLC][HIN][T20a] and deep learning surveys,[DL1-2] and ACM's peer review process failed to catch this. ACM's Code of Ethics and Professional Conduct[ACM18] states: "Computing and deep learning (e.g., Sec. I), ACM lauds Numerous references can be found under the relevant section links I-XXI which adhere to the sequential order of ACM's text[T19] Sec. II: it became really deep in 1991 in my lab, unsupervised pre-training of NNs, supervised LSTM. Sec. I contains 4 subsections A, B, C, D A: Speech Recognition (see also Sec. VI & XI & XV): The first superior end-to-end neural speech recognition combines two methods from my lab: LSTM (1990s-2005) and CTC (2006), which were Hinton (2012) and Bengio (XV) our revolutionary CTC-LSTM which was soon on most smartphones. Sec. B: Natural Language Processing (see also Sec. VI & XI & XVI): (soon used for several billions of was also based on our LSTM. Sec. C: Robotics. most visible breakthroughs Sec. D: Computer Vision XVIII & XIV & XI & VI) and applied to speech. All before LeCun's CNN work (XVIII). deep NNs pre-training (in contrast to Hinton's claims). Our DanNet was the first CNN fast & deep enough for superior computer vision in 2011, winning 4 image recognition contests in a row is an open-gated version of our earlier Highway Nets. Sec. XIV: deep & fast CNN (where LeCun participated), Sec. XI: ACM mentions GPU-accelerated NNs deep GPU-NN of 2010 debunked unsupervised pre-training (introduced by myself in 1991 and later championed by Hinton), and our GPU-CNN of 2011 (DanNet) was the first XVIII: Fukushima and Waibel (see Sec. D). The first application of CNNs with backpropagation to biomedical/biometric images is due to Baldi and Chauvin.[BA93] VII: ACM explicitly mentions medicine and first to win medical imaging competitions Sec. XII & XIX & XXI: Modern backpropagation XIII & II & V III & IX & X & XX): Sec. XX: ACM credits LeCun for work on Sec. XXI: ACM credits LeCun for work on XV: ACM credits Bengio for hybrids of NNs and probabilistic models of sequences. CTC-LSTM A & B). XVI: ACM We started this in 1990-93 long before LBH Sec. 
XVII: Artificial Curiosity vanishing gradients (1991), metalearning (1987), unsupervised pre-training (1991), compressing or distilling one NN into another (1991), learning sequential attention with NNs (1990), fast weight programmers using and other topics.[R2-R6] Sec. IV is on Turing (1936) and his predecessors Critique of LBH's ACM article (Turing Lecture) of July 2021. Sec. Conclusion: In the recent decade of deep learning, (speech recognition, language translation, etc.) on billions of devices (also healthcare applications) Sec. II & III & V & XII & XIII & XVII & XIV & XIX & XX & XXI. In what follows, ACM's full text [T19] is split into 21 parts I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII, XIV, XV, XVI, XVII, XVIII, XIX, XX, XXI.

      Critique of 2018 Turing Award LBH and their co-workers have contributed certain useful improvements of existing deep learning methods.[CNN2,4][CDI][LAN][RMSP][XAV][ATT14][CAPS] (1965),[DEEP1-2][R8] stochastic gradient descent for multilayer perceptrons (1967),[GD1-3] modern backpropagation (1970),[BP1-2][R7] architectures of recurrent NNs (1925-56)[I25][MC43][K56] and convolutional NNs (1979),[CNN1] principles of generative adversarial NNs and artificial curiosity (1990),[AC90,90b][AC20] unsupervised pre-training for deep NNs (1991),[UN1-2] vanishing gradients (1991)[VAN1] & Long Short-Term Memory or LSTM (Sec. A), GPU-accelerated NNs (2004),[GPUNN][DAN][DAN1][GPUCNN5] NNs with over 100 layers (2015),[HW1-3][R5] transformer-like[TR1-6][FWP] attention[FWP][ATT] through fast weight programmers (1991).[FWP0-2,6] [DL1-2][R2-R8] Often LBH failed to cite essential prior work, even in their later surveys.[DL3,DL3a][DLC][HIN][MIR](Sec. 21)[R2-R5, R7-R8] This may explain some of ACM's misattributions.[T19] II & III & V & XIII & X & XVII & XII & XVIII & XX. The deep NNs By the 2010s,[DEC] they were academia and industry,[DL4] mentioned by ACM (labeled as A, B, C, D) below: Long Short-Term Memory or LSTM (1990s-2005)[LSTM0-6] vanishing gradient problem student Sepp Hochreiter in 1991.[VAN1] This happened long before the similar work of Bengio (see Sec. XVII).[MIR] (Sec. 3,Sec. 4) LSTM was refined with my student Felix Gers[LSTM2] through "forget gates" based on end-to-end-differentiable fast weights.[MIR](Sec. 8)[FWP,FWP0-1] (A2) Connectionist Temporal Classification by my student Alex Graves et al. (2006).[CTC] Our team successfully applied CTC-trained LSTM to speech in 2007[LSTM4] (also with hierarchical LSTM stacks[LSTM14]). Markov models (HMMs)[BW][BRI][BOU] (Sec. XV). Hinton et al. (2012) still used the old hybrid approach[HYB12] and did not compare it to CTC-LSTM. became the first recurrent NN (RNN) to win international competitions. He later reused our end-to-end neural speech recognizer[LSTM4][LSTM14] as a postdoc in Hinton's lab.[LSTM8] CTC-LSTM dramatically improved Google's speech recognition.[GSR][GSR15][DL4] on-device speech recognition[GSR19] (not any longer on the server) LSTM[MIR](Sec. 4) (see Sec. VI & XI & XV). of text[SNT] (see Sec. XVI). In 2001, we showed that LSTM can learn languages unlearnable by traditional models such as HMMs,[LSTM13] See also Sec. VI & XI & XV. tailored by Bengio's team.[ATT14][FWP] However, such attention mechanisms also have their roots in my lab (1991);[FWP][FWP0-2,6] see Sec. XVI. C. Robotics & RL etc. Since 2003, our team has used LSTM for Reinforcement Learning (RL) and robotics.[LSTM-RL][RPG][LSTMPG] In the 2010s, For example, in 2018, a PG-trained LSTM was the core of OpenAI's famous Dactyl which learned to control a dextrous robot hand without a teacher.[OAI1][OAI1a] beat a pro player in the game of Starcraft, which is theoretically harder than Chess or Go[DM2] in many ways, using Alphastar whose brain has a deep LSTM core trained by PG.[DM3] OpenAI Five which learned to defeat human experts in the Dota 2 video game (2018).[OAI2] Bill Gates called this a "huge milestone in advancing artificial intelligence".[OAI2a][MIR](Sec. 4)[LSTMPG] Apart from A, B, C above, in healthcare, chemistry, molecular design, lip reading, speech synthesis,[AM16] predicting what's going on in nuclear fusion reactors, and so on.[DEC][DL4] was being used for LSTM (only 5% for the CNNs of Sec. 
D).[JOU17] Apparently the first LSTM journal paper[LSTM1][R5] is now the most cited deep learning research paper of the 20th century. D. Computer Vision was revolutionized in the 2010s by a particular feedforward neural net (NN) called the convolutional NN (CNN).[CNN1-4] The basic CNN architecture with convolutional and downsampling layers is due to Fukushima (1979),[CNN1] who also introduced the now widely used rectified linear units (ReLUs) in 1969.[RELU1] In 1987, NNs with convolutions were combined by Waibel with weight sharing and backpropagation.[CNN1a] Waibel did not call this CNNs but TDNNs. The downsampling variant called max-pooling was introduced by Yamaguchi et al. for TDNNs in 1990[CNN3a] and by Weng et al. for higher-dimensional CNNs in 1993.[CNN3] Since 1989, LeCun's team has contributed improvements of CNNs, especially for images[CNN2,4] (see Sec. XVIII). Finally, my own team showed in 2010[MLP1] that unsupervised pre-training is not necessary to train deep NNs, contrary to claims by Hinton[VID1] who said that "nobody in their right mind would ever suggest" this. Then our fast GPU-based CNN of 2011,[GPUCNN1] known as DanNet,[DAN,DAN1][R6] went far beyond the earlier GPU-accelerated CNNs of 2006.[GPUCNN] DanNet won four computer vision competitions in a row (15 May 2011, 6 Aug 2011, 1 Mar 2012, 10 Sep 2012).[GPUCNN5] At IJCNN 2011 in Silicon Valley, DanNet blew away the competition and achieved the first superhuman visual pattern recognition[DAN1] in an international contest (where LeCun's team took a distant second place, with three times worse performance[DAN1]). DanNet was also the first deep CNN to win a Chinese handwriting contest (ICDAR 2011) and an image segmentation contest (ISBI, May 2012). Our CVPR paper on DanNet[GPUCNN3] appeared before the similar GPU-accelerated AlexNet of Hinton's student Krizhevsky won the ImageNet[IM09] 2012 contest[GPUCNN4-5][R6] (now also without unsupervised pre-training, citing DanNet). Our CNN image scanners were 1000 times faster than previous methods.[SCAN] The VGG network (ImageNet 2014 winner)[GPUCNN9] and other highly cited CNNs[RCNN1-3] further extended the work of 2011.[MIR](Sec. 19) ResNet, the ImageNet 2015 winner[HW2] (Dec 2015) and currently the most cited neural network,[MOST] is a version (with open gates) of our earlier Highway Net (May 2015).[HW1-3][R5] The Highway Net is actually the feedforward net version of vanilla LSTM.[LSTM2] It was the first working, really deep feedforward NN with hundreds of layers (previous NNs had at most a few tens of layers). See also Sec. XVIII & XIV & XI & VI.
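To make the gating relationship concrete, here is a minimal sketch (assuming PyTorch; the layer sizes, the tanh/sigmoid choices, and the class name are illustrative, not the published architectures). A Highway-style layer mixes a transformed signal with the unchanged input via learned gates; fixing both gates to 1, as described above for ResNets (g(x)=t(x)=const=1), reduces it to the residual form y = h(x) + x.

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """Toy Highway layer: y = t(x) * h(x) + g(x) * x.
    With both gates forced open (g = t = 1), this becomes the
    residual form y = h(x) + x used by ResNet-style layers."""
    def __init__(self, dim, open_gates=False):
        super().__init__()
        self.h = nn.Linear(dim, dim)   # transform
        self.t = nn.Linear(dim, dim)   # transform gate
        self.g = nn.Linear(dim, dim)   # carry gate
        self.open_gates = open_gates   # True -> behave like a residual layer

    def forward(self, x):
        h = torch.tanh(self.h(x))
        if self.open_gates:            # gates fixed to 1
            return h + x
        t = torch.sigmoid(self.t(x))
        g = torch.sigmoid(self.g(x))
        return t * h + g * x

x = torch.randn(4, 32)
print(HighwayLayer(32)(x).shape)                    # gated Highway layer
print(HighwayLayer(32, open_gates=True)(x).shape)   # residual special case
```

This is only meant to show why an open-gated Highway layer and a residual layer coincide; the real networks use convolutional transforms and further details.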

      Critique of 2018 Turing Award appeared long before the 1980s. The first non-learning recurrent NN (RNN) architecture (the Lenz-Ising model) was analyzed by physicists in the 1920s.[L20][I25][K41][W45] were also discussed in 1943 by McCulloch and Pitts[MC43] and formally analyzed in 1956 by Kleene.[K56] In 1972, Amari reused the Lenz-Ising model to build a learning RNN, later sometimes called the Hopfield network or Amari-Hopfield Network.[AMH1-3] artificial evolution[TUR1] and single adaptive layer learned in 1958[R58] (Joseph[R61] Widrow & Hoff's similar Adaline learned in 1962.[WID62] regression and the method of least squares[DL1-2] multilayer perceptrons (MLPs) were discussed by Steinbuch[ST61-95] (1961), Joseph[R61] (1961), and Rosenblatt[R62] (1962), who wrote about "back-propagating errors" in an MLP with a hidden layer,[R62] but did not yet have a general deep learning algorithm for deep MLPs (what's now called backpropagation is quite different and was first published by Linnainmaa in 1970[BP1-BP5][BPA-C]). Compare also Selfridge's multilayer Pandemonium[SE59] (1959). containing the now popular multiplicative gates).[DEEP1-2][DL1-2] A paper of 1971[DEEP2] already described a deep learning net with 8 layers, trained by their highly cited method which was still popular in the new millennium,[DL2] especially in Eastern Europe, where much of Machine Learning was born.[MIR](Sec. 1)[R8] LBH failed to cite this, just like they failed to cite Amari,[GD1] who in 1967 proposed stochastic gradient descent[STO51-52] (SGD) for MLPs and whose implementation[GD2,GD2a] (with Saito) learned internal representations at a time when compute was billions of times more expensive than today (see also Tsypkin's work[GDa-b]). deep convolutional NN architecture was first introduced in the 1970s;[CNN1] his very popular ReLU already in 1969.[RELU1-2] XIII, III, V, VIII, IX, and X. LBH & co-authors, e.g., Sejnowski[S20] (see Sec. XIII). It goes more or less like this: "In 1969, Minsky & Papert[M69] researchers took a fresh look at the problem in the 1980s."[S20] However, as mentioned above, the 1969 book[M69] addressed a "problem" of Gauss & Legendre's shallow learning (~1800)[DL1-2] that had already been solved 4 years prior by Ivakhnenko & Lapa's popular deep learning method[DEEP1-2][DL2] (and then also by Amari's SGD for MLPs[GD1-2]). Minsky was apparently unaware of this and failed to correct it later.[HIN](Sec. I) (but see a 1989 paper[MOZ]). However, it became really deep in 1991 in my lab,[UN-UN3] which has See Sec. 1 of the overview:[MIR] First Very Deep NNs, Based on Unsupervised Pre-Training (1991). "Very Deep Learning" tasks of depth > 1000.[UN2][DL1][UN] (By 2003, LSTM variants successfully dealt with language problems of depth up to 30,000[LSTM17] more.) drove the shift from unsupervised pre-training to purely supervised learning (1991-95; 2006-10).[HIN](Sec. II)[MIR] (Sec. 19) III. Note that LSTMs brought essentially unlimited depth to gradient-based supervised recurrent NNs; Highway Nets[HW1-3] brought it to feedforward NNs.[MOST]
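The 1991 pre-training worked on a stack of recurrent NNs. Purely as an illustration of the general idea of unsupervised layer-wise pre-training followed by supervised fine-tuning, here is a hedged sketch with simple feedforward autoencoders (assuming PyTorch; the dimensions, the random stand-in data, and the helper name pretrain_layer are invented for this example and are not the 1991 system).

```python
import torch
import torch.nn as nn

def pretrain_layer(layer, data, epochs=5, lr=1e-3):
    """Train `layer` as the encoder of a one-layer autoencoder on `data`,
    then return the encoded data for the next level of the stack."""
    decoder = nn.Linear(layer.out_features, layer.in_features)
    opt = torch.optim.Adam(list(layer.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):
        code = torch.tanh(layer(data))
        loss = ((decoder(code) - data) ** 2).mean()   # reconstruction error
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.tanh(layer(data)).detach()

# Greedy layer-wise pre-training of a stack, one level at a time.
dims = [64, 32, 16, 8]
layers = [nn.Linear(dims[i], dims[i + 1]) for i in range(len(dims) - 1)]
x = torch.randn(256, dims[0])            # stand-in for real data
h = x
for layer in layers:
    h = pretrain_layer(layer, h)

# The pre-trained stack can then be fine-tuned on a supervised task.
model = nn.Sequential(*sum([[l, nn.Tanh()] for l in layers], []), nn.Linear(dims[-1], 2))
```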

      Critique of 2018 Turing Award by others (Sec. III).[DLC][DEEP1-2][BP1][DL1-2][R7-R8][R2-R4] deep learning multilayer perceptrons (1965),[DEEP1-2][R8] stochastic gradient descent for multilayer perceptrons (1967),[GD1-3] modern backpropagation (1970),[BP1,2][R7] architectures of recurrent NNs (1925-56)[I25][MC43][K56] and convolutional NNs (1979),[CNN1] principles of generative adversarial NNs and artificial curiosity (1990),[AC90,90b][AC20] unsupervised pre-training for deep NNs,[UN1-2] the vanishing gradient problem (1991)[VAN1] & solutions to it (Sec. A), GPU-accelerated NNs (2004),[GPUNN][GPUCNN5] and other foundations.[DL1-2][R2-R8] Often LBH failed to cite essential prior work.[DLC][HIN][MIR](Sec. 21) II & V & XIII & IX & X & XVII & XII & XVIII & XX & I. deeplearning.net which until 2019 advertised deep learning as "moving beyond shallow machine learning since 2006",[DL7] referring to Hinton's[UN4] and Bengio's[UN5] we had this type of deep learning already in 1991;[UN][UN1-2] see Sec. II & XVII (5). Not to mention Ivakhnenko's even earlier supervised layer-wise training of deep NNs[DEEP1-2] which Hinton,[UN4] Bengio,[UN5] and LBH[DL3,DL3a] did not cite either. See Sec. X.

Critique of 2018 Turing Award. In the following sections, my comments systematically track the sequential order of ACM's claims.[T19]

      ACM's statement on Turing is greatly misleading, like some of its other statements.[T19] any type of computation-based AI.[GOD][BIB3][MIR](Sec. 18)[GOD21,21a] Much of early AI in the 1940s-70s was actually about theorem proving[ZU48][NS56]

In 1936, Turing introduced the Turing Machine.[TUR] He rederived the above-mentioned result.[CHU][TUR][HIN][GOD21,21a][TUR21][LEI21,21a] In the same year of 1936, Emil Post published yet another independent universal model of computing.[POS] (See also my reply to Hinton, who criticized my website on Turing without suggesting any fact-based corrections.[HIN]) Gödel also identified the now famous open problem "P=NP?" in his letter to John von Neumann (1956).[GOD56][URQ10] Likewise, Konrad Zuse (1910-1995) created the world's first working programmable general-purpose computer in 1935-41. His patent application of 1936[ZU36-38][Z36][RO98][ZUS21] predates Claude Shannon's 1937 thesis on digital circuit design.[SHA37] Zuse also created the first high-level programming language in the early 1940s.[BAU][KNU] (On the conditional jump instruction, see [RO98].)

      Critique of 2018 Turing Award that learn internal representations (1965),[DEEP1-2][R8] stochastic gradient descent for multilayer perceptrons (1967),[GD1-3] modern backpropagation (1970),[BP1,2][R7] architectures of recurrent NNs (1925-56)[I25][MC43][K56] and convolutional NNs (1979),[CNN1] principles of generative adversarial NNs and artificial curiosity (1990),[AC][AC90,90b][AC10][AC20] unsupervised pre-training for deep NNs (1991),[UN1-2][UN] vanishing gradients (1991)[VAN1] & solutions to it (Sec. A),[LSTM0-17][CTC] (2004),[GPUNN][GPUCNN5] record-breaking deep supervised NNs (2010)[MLP1-2] and contest-winning deep CNNs (2011),[DAN][DAN1][GPUCNN5] NNs with over 100 layers (2015),[HW1-3][R5] transformer-like[TR1-6][FWP] attention[FWP][ATT] through fast weight programmers (1991),[FWP0-2,6] and more.[DL1-2][R2-R8] Often LBH failed to cite essential prior work.[DL3,DL3a][DLC][HIN][MIR](Sec. 21)[R2-R5,R7,R8,R11] II & I & III & XIII & X & XVII & XII & XVIII & XX.

      Critique of 2018 Turing Award "advances in natural language processing" and in speech supervised NNs and CNNs achieved by our group 2010-2011[MLP1-2][DAN][DAN1][GPUCNN5][R6] and through Highway Net-like NNs (2015),[HW1-3][R5] although the principles of CNNs were invented and developed by others since the 1970s.[CNN1-4] See Sec. D & XVIII & XIV as well as Sec. 4 & Sec. 19 of the overview.[MIR]

      Critique of 2018 Turing Award Baldi and Chauvin (1993) had the first application of CNNs with backpropagation to biomedical/biometric images.[BA93] DanNet[DAN][DAN1][GPUCNN5] the first NN to win a medical imaging contest through deep learning (Sept 2012, on cancer detection).[GPUCNN5,8] and were able to greatly improve steel defect detection.[ST] All of this happened before the similar GPU-accelerated AlexNet of Hinton's student Krizhevsky[GPUCNN4-5][R6] and the VGG network[GPUCNN9] mitosis detection.[MGC][GPUCNN5,8] approach of D & XI).

      Critique of 2018 Turing Award without citing them.[DL1][DLC][HIN][R2-R4][R7-R8] V & XII & XIX & II & III & XIII & XVII & X & I.

      Critique of 2018 Turing Award who failed to cite them, even in later work.[HIN][DLC][DL1-2][DEEP1-2][RELU1-2][R7-R8] See Sec. II & III & XIII & V & X & XIV & I.

      Critique of 2018 Turing Award first introduced to Machine Learning by Dechter (1986), and to NNs by Aizenberg et al (2000).[DL2] To my knowledge, LBH have never cited them. (Margin note: our 2005 paper on deep RL[DL6,6a] was the first machine learning LBH started talking about "deep learning ... moving beyond shallow machine learning since 2006",[DL7] referring to their unsupervised pre-training methods of 2006. See Sec. III. others built careers on this notion long before LBH recognized this.[DEEP1-2][CNN1][HIN][R8][DL1][DLC] Even deep learning through unsupervised pre-training was introduced by others.[UN1-3][R4][HIN](Sec. II) II & III & XIII & V & I.

      Critique of 2018 Turing Award ignored by LBH's papers[HIN][R7-R8][R2-R5] (see Sec. V & II & III & I & XIII & XII & XIX & X & XVII).

ACM correctly mentions advancements through GPUs. The first to use GPUs for NNs were Jung & Oh (2004).[GPUNN][GPUCNN5] In 2010, my team made GPU-based NNs fast and deep enough to set an important benchmark record,[MLP1-2] showing that unsupervised pre-training (pioneered by myself in 1991) is not necessary to train deep NNs, contrary to Hinton's claims.[VID1] In 2011, our CNNs were deep and fast enough[DAN][DAN1][GPUCNN5] to achieve superhuman computer vision (explicitly mentioned by ACM) for the first time[R6] (see Sec. D).

Furthermore, by the mid 2010s, speech recognition and machine translation (explicitly mentioned by ACM) were actually dominated by the LSTM and CTC of our team.[LSTM1-4][CTC] In particular, as mentioned in Sec. A, CTC-LSTM outperformed the older hybrid approaches based on models such as HMMs.[BW][BOU][BRI][HYB12] As mentioned in Sec. B and XVI, the first superior end-to-end neural machine translation was also based on LSTM.

Critique of 2018 Turing Award. ACM's statement is "less wrong" than Honda's,[HIN](Sec. I) but ACM (and apparently even other award committees[HIN](Sec. I)) still attributes backpropagation to Rumelhart et al. (1985-86),[RUM] although Werbos had already applied it to NNs in 1982.[BP2] And the article[RUM] even failed to mention Linnainmaa, the inventor of this famous algorithm for credit assignment in networks (1970).[BP1] Already in 1960, Kelley had a precursor thereof in the field of control theory;[BPA] see also later work of the early 1960s.[BPB][BPC][R7] Rumelhart et al. showed experimentally that backpropagation can yield useful internal representations in hidden layers of NNs.[RUM] But this was essentially just an experimental analysis of a known method.[BP1-2] More on the history of backpropagation can be found at Scholarpedia[DL2] and in my award-winning survey.[DL1] Also see Sec. XIX, II.

Some claim that "backpropagation is just the chain rule of Leibniz (1676) & L'Hopital (1696)." No, it is the efficient way of applying the chain rule to big networks with differentiable nodes (there are also many inefficient ways of doing this), and it was not published until 1970.[BP1] Regarding the recent debate:[HIN] it is true that in 2018, Hinton[AOI] credited Rumelhart[RUM] with the "invention" of backpropagation, while he himself has been credited for "creating" the method and for other things he didn't do.[HIN] Neither in a popular book[AOI] nor in other recent work[DL3,DL3a] did he cite Linnainmaa (1970),[BP1] the true creator.[BP4-5] It is true that his 2015 survey[DL3] does cite Werbos (1974), who however described the method correctly only later in 1982[BP2] and also failed to cite Linnainmaa.[BP1] Compare the 1967-68 work of Amari:[GD1-3] to my knowledge the first to propose and implement stochastic gradient descent[STO51-52] for multilayer perceptrons (though still without the reverse mode gradient descent method now known as backpropagation[BP1]); see also Tsypkin's work of 1966.[GDa-b] Linnainmaa's backpropagation method was well-known.[BP5][DL1-2][DLC] It wasn't created by "lots of different people" as Hinton suggested,[AOI][HIN][R11] but by one person who published first[BP1] and therefore should get the credit.
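To illustrate what "the efficient way of applying the chain rule" means in practice, here is a minimal NumPy sketch (a generic textbook-style example, not any particular historical implementation): a single forward pass stores the intermediate activations, and a single backward pass reuses them to obtain the gradient of the loss with respect to all weights at once.

```python
import numpy as np

# Reverse mode on a tiny 2-layer net: one forward pass, one backward pass,
# gradients for every weight matrix in a single sweep.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))          # batch of inputs
y = rng.normal(size=(8, 1))          # targets
W1, W2 = rng.normal(size=(4, 16)), rng.normal(size=(16, 1))

# forward pass (store intermediates)
a1 = x @ W1
h1 = np.tanh(a1)
pred = h1 @ W2
loss = 0.5 * np.mean((pred - y) ** 2)

# backward pass: propagate dLoss/d(node) from the output back to the inputs
d_pred = (pred - y) / len(x)          # dL/dpred
d_W2 = h1.T @ d_pred                  # dL/dW2
d_h1 = d_pred @ W2.T                  # dL/dh1
d_a1 = d_h1 * (1.0 - h1 ** 2)         # back through tanh
d_W1 = x.T @ d_a1                     # dL/dW1

print(loss, d_W1.shape, d_W2.shape)
```

Applying the chain rule in the forward direction instead would require one pass per weight, which is what makes the reverse ordering the "efficient way" for networks with many weights and a scalar loss.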

      Critique of 2018 Turing Award Boltzmann Machine (BM)[BM] a learning.[HIN] Recently, however, I learnt through a reader that even the BM paper[BM] did not cite prior relevant work by Sherrington & Kirkpatrick[SK75] and Glauber.[G63] (Compare related work.[H86][H88][S93]) multilayer perceptrons with arbitrarily many layers.[DEEP1-2][HIN] Sec. II V & X.[MIR](Sec. 1)[R8]

As mentioned in Sec. II, Sejnowski's rather self-serving "history of deep learning"[S20] claims, more or less: "In 1969, Minsky & Papert[M69] [...] researchers took a fresh look at the problem in the 1980s."[S20] However, the 1969 book[M69] addressed a "deep learning problem" (a limitation of Gauss & Legendre's shallow learning around 1800[DL1-2]) that had already been solved four years prior (see Sec. II); deep learning research also continued in the 1970s, especially outside of the Anglosphere.[DEEP2][GD1-3][CNN1][DL1-2]

Critique of 2018 Turing Award. Dropout is actually a variant of Hanson's much earlier stochastic delta rule (1990).[Drop1-3] Hinton's 2012 paper and his later patent did not cite this either. Moreover, dropout was not needed for superhuman results, as we showed already in 2011 in a contest where LeCun's team participated as well[DAN1] (see Sec. D above). Back then, the only thing that really mattered was the depth and speed of deep CNNs through GPUs.[GPUCNN1,3,5][R6] Already before ImageNet 2012,[R6] our fast deep CNN called DanNet had a monopoly on winning computer vision competitions.[GPUCNN5] It more than "halved the error rate for object recognition" (ACM's wording) in a contest already in 2011,[GPUCNN2][DAN,DAN1][R6] long before the similar system of Hinton's student. See Sec. D as well as Sec. 19 of the overview.[MIR]

Critique of 2018 Turing Award. ACM credits Bengio for hybrids of NNs and probabilistic models of sequences, in use for speech since the late 1980s.[BW][BRI][BOU] However, the first superior end-to-end neural speech recognition was based on two methods from my lab: LSTM (1990s-2005)[LSTM0-6] and CTC[CTC] (2006), which were applied to speech in 2007.[LSTM4][LSTM14] CTC-LSTM is end-to-end-neural and thus very different from (and superior to) the hybrid methods in use since the late 1980s.[BW][BRI][BOU][HYB12] See also Sec. A.
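As an illustration of what "end-to-end-neural" means here, the sketch below (assuming PyTorch and its built-in CTC loss; all sizes and the random stand-in data are hypothetical) trains an LSTM directly against label sequences without any HMM stage: gradients flow from the CTC loss through the LSTM down to the acoustic features.

```python
import torch
import torch.nn as nn

# Toy setup: T frames, batch N, F acoustic features, C labels (label 0 = CTC blank).
T, N, F, C = 50, 4, 13, 28
lstm = nn.LSTM(input_size=F, hidden_size=64, bidirectional=True, batch_first=False)
proj = nn.Linear(2 * 64, C)            # map LSTM states to per-frame label scores
ctc = nn.CTCLoss(blank=0)

feats = torch.randn(T, N, F)           # stand-in for acoustic feature frames
targets = torch.randint(1, C, (N, 10)) # stand-in label sequences (no blanks)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

out, _ = lstm(feats)                   # (T, N, 2*hidden)
log_probs = proj(out).log_softmax(dim=-1)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                        # gradients flow end-to-end, no HMM stage
print(loss.item())
```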

Critique of 2018 Turing Award. Five years earlier, in 1995, we already had a similar, excellent neural probabilistic text model,[SNT] which Bengio[NPM] characterizes only briefly as "related" (see also Pollack's earlier work on embeddings of words and other structures[PO87][PO90]). In the 2010s, the dominant model for such language tasks was actually the LSTM of our team,[LSTM0-6] which Bloomberg called "arguably the most commercial AI achievement."[AV1][MIR](Sec. 4) See Sec. B. The soft attention mechanism of Bengio's team[ATT14] has indeed become important. For example, it helped to further improve Facebook's LSTM-based translation (see Sec. B). However, such adaptive neural sequential attention also has its roots in my lab: end-to-end-differentiable "soft" attention (1991-93) in the latent space of Fast Weight Programmers (FWPs),[FWP2][FWP] and "hard" attention (in observation space) in the context of RL (1990).[ATT][ATT0-1] Today's attention-based Transformers[TR1-6] have become a popular alternative to RNNs. My FWP of 1991[FWP0-1] computes its fast weight changes through additive outer products of activation patterns (now often called keys and values for self-attention).[TR1-6][FWP] In the 2010s,[DEC] Transformers[TR1-2] became popular in natural language processing, a traditional LSTM domain (see Sec. B), although there are tasks that LSTM learns to solve quickly.[LSTM13,17] The linear Transformers or Performers[TR5-6] are formally equivalent to my 1991 FWPs (apart from normalization).[FWP6][FWP] In 1993, I introduced the attention terminology[FWP2] now used in this context,[ATT] and RNNs that program themselves.
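The formal correspondence mentioned above can be checked in a few lines. The sketch below (NumPy; dimensions and data are arbitrary, and normalization is ignored, as noted in the text) programs a fast weight matrix by summing outer products of value and key patterns and then queries it, which gives exactly the unnormalized linear-attention readout.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 8                         # sequence length, key/value dimension
K = rng.normal(size=(T, d))         # "keys"  (self-invented patterns)
V = rng.normal(size=(T, d))         # "values"
q = rng.normal(size=d)              # query at the final step

# Fast Weight Programmer view: program a fast weight matrix W by adding
# outer products of values and keys, then read it out with the query.
W = np.zeros((d, d))
for k, v in zip(K, V):
    W += np.outer(v, k)             # additive outer-product weight change
fwp_out = W @ q

# Unnormalized linear-attention view: sum_t v_t * (k_t . q).
attn_out = sum(v * (k @ q) for k, v in zip(K, V))

print(np.allclose(fwp_out, attn_out))   # True: the two views coincide
```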

See [MIR](Sec. 9)[R4] for my related priority dispute on attention with Hinton. He was the reviewer of my 1990 paper,[ATT2] yet later published closely related work of his own without citing it.[ATT3]

      Critique of 2018 Turing Award GANs[GAN0-1] (2010-2014) are actually a simple application[AC] of the adversarial curiosity (AC) principle from 1990[AC90,90b][AC20] (see also surveys[AC09-10]). This principle is now widely used for exploration in RL (e.g., Sec. C) and for image synthesis[GAN1] (also mentioned by ACM in Sec. XVIII). predictor NN minimizes its error, while the generator NN tries to make outputs that maximize this error: one net's loss is the other net's gain. 4 years before the GAN paper,[GAN1] a well-known 2010 survey[AC10] summarised the generative adversarial NNs of 1990 as follows: a whether the controller's (or generator's) output is in a given set.[AC20][AC] early adversarial machine learning settings[S59][H90] neither involved unsupervised NNs nor were about modeling data nor used gradient descent.[AC20]) Bengio et al. neither cited the original work[AC90,90b][AC20] nor corrected their erroneous claims[GAN1] about the other (1991).[PM1-2][AC20][R2][MIR](Sec. 5) Bloomberg,[AV1] their NIPS 2014 paper[GAN1] and some of the erroneous claims it made about my prior work.[AC20] Goodfellow eventually admitted that PM is adversarial (his paper[GAN1] still claims the opposite), but emphasized that it's not generative. However, the even earlier AC[AC90,90b][AC10][AC20] is both adversarial and generative (its generator contains probabilistic units[AC90] like in StyleGANs[GAN2]). When the authors[GAN1] I published one myself in the hopes of correcting the annals of history.[AC20] that they are instances of my earlier work.[R2][AC20] vanishing gradient problem,[MIR](Sec. 3)[VAN1] Bengio published his own,[VAN2] without citing Sepp. was settled in favor of Sepp.[VAN1] However, even after a common publication,[VAN3] Bengio published papers[VAN4][XAV] are poor indicators of truly pioneering work.[NAT1] (Margin note: Bengio states[YB20] that in 2018 he one must at least clarify it later,[DLC] Bengio also claims[YB20] that in 1995 my publications on exactly this topic date back to 1991-93.[UN0-2][UN] which I started in 1987[META1][META] long before Bengio that he did it before me.[R3] Bengio also writes[YB20] that in Regarding attention-based Transformers,[TR1-6] Bengio[DL3a] cites his own team (2014) for "soft attention" without citing my much earlier original work of 1991-1993 on soft attention and linear Transformers.[FWP,FWP0-2,6] Bengio has also heavily used our LSTM (see Sec. A-C), "gated recurrent units (GRU)"[LSTMGRU] for a variant of our vanilla LSTM architecture[LSTM2] (2000) which he did not cite although our work[LSTM2] was the one that introduced gated recurrent units. In addition, our team automatically evolved lots of additional LSTM variants and topologies already in 2009[LSTM7] without changing the name of the basic method. learn to count[LSTMGRU2] nor learn simple non-regular languages;[LSTMGRU2] they according to Google Brain.[LSTMGRU3]) unsupervised pre-training for deep NNs.[UN0-4][HIN](Sec. II)[MIR](Sec. 1) Hinton's paper[UN4] (2006) appeared long after my earlier work on this[UN0-2] the first NNs shown to solve very deep problems (see Sec. II above).[UN] It was published in 1991-92[UN1] when compute was about 1000 times more expensive than in 2006. survey (2015),[DL3][DLC] See also Sec. II & III. compressing or distilling one NN into another.[UN0-2][DIST1-2][MIR](Sec. 
2) Hinton[DIST2] (2006) did not cite my much earlier original work on this (1991),[UN1][UN] not even in his later patent application fast weight programmers[FWP][FWP0-4a] through tensor-like outer products (1991-2016) and their motivation[FWP2][FWP4a][MIR](Sec. 8) (see also Sec. XVI above). learning sequential attention with NNs.[MIR](Sec. 9) Hinton[ATT3] (2010) our much earlier work on this[ATT1][ATT] although he was both reviewer and editor of my summary[ATT2] (1990; see Sec. XVI above).
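As a toy illustration of the zero-sum principle described in this section (one net's loss is the other net's gain), the following hedged sketch (PyTorch; the tiny stand-in "environment", the network sizes, and the training schedule are invented for the example) trains a predictor to minimize its prediction error on the consequences of a generator's outputs, while the generator is rewarded for maximizing that same error. It illustrates only the principle, not the 1990 architecture.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
env = nn.Linear(4, 4)                       # stand-in for an unknown environment
for p in env.parameters():
    p.requires_grad_(False)                 # the environment is fixed

generator = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 4))
predictor = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 4))
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_p = torch.optim.Adam(predictor.parameters(), lr=1e-3)

for step in range(200):
    z = torch.randn(64, 8)
    action = generator(z)                   # generator proposes outputs/experiments
    outcome = env(action).detach()          # what actually happens

    # predictor minimizes its prediction error ...
    p_loss = ((predictor(action.detach()) - outcome) ** 2).mean()
    opt_p.zero_grad(); p_loss.backward(); opt_p.step()

    # ... while the generator maximizes that same error (its intrinsic reward)
    g_loss = -((predictor(action) - outcome) ** 2).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```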

      The ten priority disputes mentioned in the present Sec. XVII are not on the only ones.[R4] Remarkably, three of them are related to the 1991 paper[UN1][UN] which in many ways started what people now call deep learning, going beyond Most of them go back to work of 1990-91.[MIR] See Sec. I for additional related issues of credit assignment.

Critique of 2018 Turing Award. LeCun's team has made important contributions to CNNs since 1989.[CNN2,4] However, the basic CNN architecture with convolutional and downsampling layers is actually due to Fukushima (1979).[CNN1] NNs with convolutions were later (1987) combined by Waibel with weight sharing and backpropagation.[CNN1a] Waibel called this architecture the TDNN. All of this happened before LeCun's work on CNNs. See Sec. D above and Sec. 21 of the overview of our Annus Mirabilis 1990-1991.[MIR] At IJCNN 2011 in Silicon Valley, our DanNet[DAN][GPUCNN1-3] won the visual pattern recognition contest with superhuman performance (LeCun's team took a distant second place, with three times worse performance).[DAN1] Again see Sec. D. Baldi and Chauvin (1993) had the first application of CNNs with backpropagation to biomedical/biometric images.[BA93] At ICPR 2012, our DanNet[GPUCNN1-3] won the medical imaging contest (Sept 2012, on detection of mitosis/cancer)[GPUCNN5,7,8] (before the similar AlexNet won ImageNet 2012[GPUCNN5][R6] and the similar VGG network[GPUCNN9] won ImageNet 2014). Many major companies are now using this approach to mitosis detection.[MGC][GPUCNN5,7,8] See Sec. D & VII. ACM also explicitly mentions speech recognition and speech synthesis.[AM16][DL1] All of these fields were heavily shaped in the 2010s by our non-CNN methods.[DL1][DL4][AM16][GSR][GSR15][GT16][WU][FB17] See Sec. A, B, VI, XI.
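For readers unfamiliar with the terminology, the following minimal sketch (PyTorch; the sizes are arbitrary) wires together the ingredients discussed in this section: weight-sharing convolutions, downsampling by max-pooling, and rectified linear units, all trainable end-to-end by backpropagation. It illustrates the building blocks, not any particular historical network.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5, padding=2),   # weight-sharing convolution
    nn.ReLU(),                                   # rectified linear units
    nn.MaxPool2d(2),                             # downsampling by max-pooling
    nn.Conv2d(8, 16, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                   # classifier head
)

x = torch.randn(1, 1, 28, 28)                    # one toy 28x28 image
print(net(x).shape)                              # torch.Size([1, 10])
```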

Critique of 2018 Turing Award. As mentioned in Sec. XII, backpropagation was actually proposed earlier as a learning method for NNs by Werbos (1982)[BP2-4] (see also Amari's work on SGD for MLPs of 1967-68[GD1-2a]), which is not acknowledged even in recent work.[DL3,DL3a][DLC] In 1960, Kelley already had a precursor of the algorithm.[BPA] Furthermore, many besides LeCun have worked "to speed up backpropagation algorithms"[DL1] (ACM's wording). More on the history of backpropagation can be found at Scholarpedia.[DL2][BP4]

      Critique of 2018 Turing Award However, "hierarchical feature representation" in deep learning networks is what Ivakhnenko & Lapa (1965)[DEEP1-2] and Amari[GD1-2] (and also Fukushima[CNN1][DL2]) had long before LeCun. See Sec. D & II & XIII & V.


Scientific Integrity, the 2021 Turing Lecture, and the 2018 Turing Award for Deep Learning (AI Blog)
      @SchmidhuberAI This is a point-for-point critique of ACM's justification of the ACM A. M. Turing Award for deep learning, as well as a critique of the Turing Lecture given by the awardees (published by ACM in July 2021). 2015 survey of deep learning[DL1] June 2020 article[T20a][R12] (see Executive Summary I, V, II, XII, XIX, XXI, XIII, XIV, XX, XVII). (A) speech recognition, (B) natural language processing, (C) robotics, (D) computer vision, (VII) medicine, astronomy, materials science. A, B, C, D, VII, XVII, VI, XVI). II, V, XX, XVIII) with Dr. Bengio & Dr. Hinton (see Sec. XVII, I). I respond to LBH's recent ACM article (July 2021). expands material in my Critique of the 2019 Honda Prize[HIN] (~3,000 words). Abstract & Outline (~300 words), Introduction (~300 words), Critique of LBH's ACM article (Turing Lecture) of July 2021[DL3a] Executive summary of what's wrong with ACM's laudation (~1,000 words), 21 comments on 21 claims by ACM (~8,000 words), Conclusion and Acknowledgments (~2,000 words). All backed up by over 250 references (~9,000 words). The text contains numerous hyperlinks to relevant overview sites from the AI Blog. science is self-correcting."[SV20] they are mine or other people's.[DL1-2][HIN][NASC1-9] The present page is offered as a resource for all good computer scientists who share this inclination. and to fight plagiarism, collusion rings,[LIT21] and systemic academic corruption in all of their more and less subtle forms.[FAKE] Sec. 2 LBH's 2021 ACM article[DL3a] which necessitated an extension of the first version of this post.[T20a][R12] ACM's official justification[T19] of the 2018 A.M. Turing Award[R1] After the Executive Summary in Sec. 3, Sec. 4 will split ACM's full text[T19] into 21 parts I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII, XIV, XV, XVI, XVII, XVIII, XIX, XX, XXI. Most of the critiques are based on references to original papers and material from the AI Blog.[AIB][MIR][DEC][HIN] publishing yet another misleading overview of the field, this time based on LBH's Turing Lecture.[DL3a] LBH's well-known earlier omissions.[DLC][HIN][T20a] LBH claim to "briefly describe the origins of deep learning"[DL3a] without even mentioning the world's first working deep learning nets by Ivakhnenko and Lapa in 1965[DEEP1-2][R8] (see Sec. II). this class of methods was pioneered in 1991[UN-UN2] (see Sec. II, III). Highway Net, the first really deep feedforward NN.[HW1-3] (see Sec. D, VI). were all driven by my lab:[MOST] In 1991, I had the first very deep NNs based on unsupervised pre-training;[UN-UN2] LSTMs brought essentially unlimited depth to gradient-based supervised recurrent NNs;[LSTM0-17] later our Highway Nets[HW1-3] brought it to feedforward NNs. from 2007[LSTM4,14] based on LSTM[LSTM0-6] (1990s-2005) and CTC (2006).[CTC] our CTC-LSTM-based speech recognition (not that of Hinton) had been on most smartphones for years[GSR][GSR15-19][DL4] (see Sec. A, VI, XI, XV). Similarly for machine translation (see Sec. B). LBH cite Hinton (2012) for "dropout" without mentioning that dropout is just a variant of Hanson's 1990 stochastic delta rule[Drop1-2] (see Sec. XIV). von der Malsburg who introduced ReLUs in 1973[CMB] (see Sec. XIV). called AlexNet,[GPUCNN4] without mentioning that our earlier groundbreaking deep GPU-based DanNet[GPUCNN1-3,5-8][DAN] did not need ReLUs at all to win 4 earlier object recognition competitions and to achieve superhuman results already in 2011[GPUCNN1-8][R5-6] (see Sec. XIV). XVIII). already in 1965[DEEP1-2][R8] (see Sec. II). 
earlier fast weights of von der Malsburg (1981) and Feldman (1982).[FAST,FASTa-b][FWP] described in the 1991-93 papers on Fast Weight Programmers and linear Transformers[FWP0-1,6] (see Sec. XVI, XVII-2). dedicate an extra section to attention-based Transformers,[TR1-6] citing Bengio's team (2014) for "soft attention"[ATT14] without citing the much earlier original work of 1991-1993 on soft attention and linear Transformers[FWP,FWP0-2,6][ATT] (see Sec. XVII-1, XVI). LBH claim that Bengio's team[NPM] of text compression[SNT] (see Sec. XVI, XVII-1). LBH cite Bengio's 2014 paper on Generative Adversarial Networks (GANs)[GAN0-1] without mentioning that GANs are instances of the Adversarial Curiosity Principle of 1990[AC90-20][MIR](Sec. 5) (see Sec. XVII). In summation, LBH have repeatedly chosen to ignore the previous well-known critiques[DLC][HIN][T20a] and deep learning surveys,[DL1-2] and deep learning (e.g., Sec. I), ACM lauds Numerous references can be found under the relevant section links I-XXI which adhere to the sequential order of ACM's text[T19] Sec. II: it became really deep in 1991 in my lab, unsupervised pre-training of NNs, supervised LSTM. Sec. I contains 4 subsections A, B, C, D A: Speech Recognition (see also Sec. VI & XI & XV): The first superior end-to-end neural speech recognition combines two methods from my lab: LSTM (1990s-2005) and CTC (2006), which were Hinton (2012) and Bengio (XV) our revolutionary CTC-LSTM which was soon on most smartphones. Sec. B: Natural Language Processing (see also Sec. VI & XI & XVI): (soon used for several billions of was also based on our LSTM. Sec. C: Robotics. most visible breakthroughs Sec. D: Computer Vision XVIII & XIV & XI & VI) and applied to speech. All before LeCun's CNN work (XVIII). deep NNs pre-training (in contrast to Hinton's claims). Our DanNet was the first CNN fast & deep enough for superior computer vision in 2011, winning 4 image recognition contests in a row is an open-gated version of our earlier Highway Nets. Sec. XIV: deep & fast CNN (where LeCun participated), Sec. XI: ACM mentions GPU-accelerated NNs deep GPU-NN of 2010 debunked unsupervised pre-training (introduced by myself in 1991 and later championed by Hinton), and our GPU-CNN of 2011 (DanNet) was the first XVIII: Fukushima and Waibel (see Sec. D). VII: ACM explicitly mentions medicine and first to win medical imaging competitions Sec. XII & XIX & XXI: Modern backpropagation XIII & II & V III & IX & X & XX): Sec. XX: ACM credits LeCun for work on Sec. XXI: ACM credits LeCun for work on XV: ACM credits Bengio for hybrids of NNs and probabilistic models of sequences. CTC-LSTM A & B). XVI: ACM We started this in 1990-93 long before LBH Sec. XVII: Artificial Curiosity vanishing gradients (1991), metalearning (1987), unsupervised pre-training (1991), compressing or distilling one NN into another (1991), learning sequential attention with NNs (1990), fast weight programmers using and other topics.[R2-R6] Sec. IV is on Turing (1936) and his predecessors Critique of LBH's ACM article (Turing Lecture) of July 2021. Sec. Conclusion: In the recent decade of deep learning, (speech recognition, language translation, etc.) on billions of devices (also healthcare applications) Sec. II & III & V & XII & XIII & XVII & XIV & XIX & XX & XXI. In what follows, ACM's full text [T19] is split into 21 parts I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII, XIV, XV, XVI, XVII, XVIII, XIX, XX, XXI.

      Critique of 2018 Turing Award LBH and their co-workers have contributed certain useful improvements of existing deep learning methods.[CNN2,4][CDI][LAN][RMSP][XAV][ATT14][CAPS] (1965),[DEEP1-2][R8] modern backpropagation (1970),[BP1-2][R7] architectures of recurrent NNs (1943-56)[MC43][K56] and convolutional NNs (1979),[CNN1] principles of generative adversarial NNs and artificial curiosity (1990),[AC90,90b][AC20] unsupervised pre-training for deep NNs (1991),[UN1-2] vanishing gradients (1991)[VAN1] & Long Short-Term Memory or LSTM (Sec. A), GPU-accelerated NNs (2004),[GPUNN][DAN][DAN1][GPUCNN5] NNs with over 100 layers (2015),[HW1-3][R5] transformer-like[TR1-6][FWP] attention[FWP][ATT] through fast weight programmers (1991).[FWP0-2,6] [DL1-2][R2-R8] Often LBH failed to cite essential prior work, even in their later surveys.[DL3,DL3a][DLC][HIN][MIR](Sec. 21)[R2-R5, R7-R8] This may explain some of ACM's misattributions.[T19] II & III & V & XIII & X & XVII & XII & XVIII & XX. The deep NNs By the 2010s,[DEC] they were academia and industry,[DL4] mentioned by ACM (labeled as A, B, C, D) below: Long Short-Term Memory or LSTM (1990s-2005)[LSTM0-6] vanishing gradient problem student Sepp Hochreiter in 1991.[VAN1] This happened long before the similar work of Bengio (see Sec. XVII).[MIR] (Sec. 3,Sec. 4) LSTM was refined with my student Felix Gers[LSTM2] through "forget gates" based on end-to-end-differentiable fast weights.[MIR](Sec. 8)[FWP,FWP0-1] (A2) Connectionist Temporal Classification by my student Alex Graves et al. (2006).[CTC] Our team successfully applied CTC-trained LSTM to speech in 2007[LSTM4] (also with hierarchical LSTM stacks[LSTM14]). Markov models (HMMs)[BW][BRI][BOU] (Sec. XV). Hinton et al. (2012) still used the old hybrid approach[HYB12] and did not compare it to CTC-LSTM. became the first recurrent NN (RNN) to win international competitions. He later reused our end-to-end neural speech recognizer[LSTM4][LSTM14] as a postdoc in Hinton's lab.[LSTM8] CTC-LSTM dramatically improved Google's speech recognition.[GSR][GSR15][DL4] on-device speech recognition[GSR19] (not any longer on the server) LSTM[MIR](Sec. 4) (see Sec. VI & XI & XV). of text[SNT] (see Sec. XVI). In 2001, we showed that LSTM can learn languages unlearnable by traditional models such as HMMs,[LSTM13] See also Sec. VI & XI & XV. tailored by Bengio's team.[ATT14][FWP] However, such attention mechanisms also have their roots in my lab (1991);[FWP][FWP0-2,6] see Sec. XVI. C. Robotics & RL etc. Since 2003, our team has used LSTM for Reinforcement Learning (RL) and robotics.[LSTM-RL][RPG][LSTMPG] In the 2010s, For example, in 2018, a PG-trained LSTM was the core of OpenAI's famous Dactyl which learned to control a dextrous robot hand without a teacher.[OAI1][OAI1a] beat a pro player in the game of Starcraft, which is theoretically harder than Chess or Go[DM2] in many ways, using Alphastar whose brain has a deep LSTM core trained by PG.[DM3] OpenAI Five which learned to defeat human experts in the Dota 2 video game (2018).[OAI2] Bill Gates called this a "huge milestone in advancing artificial intelligence".[OAI2a][MIR](Sec. 4)[LSTMPG] Apart from A, B, C above, in healthcare, chemistry, molecular design, lip reading, speech synthesis,[AM16] predicting what's going on in nuclear fusion reactors, and so on.[DEC][DL4] was being used for LSTM (only 5% for the CNNs of Sec. D).[JOU17] Apparently the first LSTM journal paper[LSTM1][R5] is now the most frequently cited D. 
Computer Vision was revolutionized in the 2010s by a particular feedforward NN called the convolutional NN (CNN).[CNN1-4] The basic CNN architecture with convolutional and downsampling layers is due to Fukushima (1979).[CNN1] The popular downsampling variant called max-pooling was introduced by Weng et al. (1993).[CNN3] In 1987, NNs with convolutions were combined by Waibel with weight sharing and backpropagation.[CNN1a] Waibel did not call this CNNs but TDNNs. LeCun's team later contributed improvements of CNNs, especially for images[CNN2,4] (see Sec. XVIII). Finally, my own team showed in 2010[MLP1] unsupervised pre-training is not necessary to train deep NNs, contrary to claims by Hinton[VID1] who said that "nobody in their right mind would ever suggest" this. Then we Our fast GPU-based CNN of 2011[GPUCNN1] known as DanNet[DAN,DAN1][R6] CNNs of 2006.[GPUCNN] winning four of them in a row (15 May 2011, 6 Aug 2011, 1 Mar 2012, 10 Sep 2012).[GPUCNN5] at IJCNN 2011 in Silicon Valley, DanNet blew away the competition and achieved the first superhuman visual pattern recognition[DAN1] in an international contest (where LeCun's team took a distant second place, with DanNet was also the first deep CNN to win: a Chinese handwriting contest (ICDAR 2011), an image segmentation contest (ISBI, May 2012), CVPR paper on DanNet[GPUCNN3] of Hinton's student Krizhevsky won the ImageNet[IM09] 2012 contest[GPUCNN4-5][R6] (now also without unsupervised pre-training, citing DanNet). Our CNN image scanners were 1000 times faster than previous methods.[SCAN] The VGG network (ImageNet 2014 winner)[GPUCNN9] and other highly cited CNNs[RCNN1-3] further extended the work of 2011.[MIR](Sec. 19) ResNet, the ImageNet 2015 winner[HW2] (Dec 2015) which currently gets more citations per year[MOST] Highway Net (May 2015).[HW1-3][R5] The Highway Net is actually the feedforward net version of vanilla LSTM.[LSTM2] It was the first working, really deep feedforward NN with hundreds of layers (previous NNs had at most a few tens of layers). See also Sec. XVIII & XIV & XI & VI.
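To make the architectural claim above concrete, here is a minimal NumPy sketch of the two building blocks in question: a convolutional layer followed by max-pooling downsampling (with an illustrative ReLU). The 8x8 input, the single 3x3 filter, and the pooling size are arbitrary toy choices, not the configuration of any specific network mentioned above (Neocognitron, DanNet, AlexNet, etc.).

```python
import numpy as np

def conv2d_valid(x, w):
    """Naive 'valid' 2D convolution (cross-correlation) of a
    single-channel image x with a single kernel w."""
    H, W = x.shape
    kH, kW = w.shape
    out = np.empty((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kH, j:j + kW] * w)
    return out

def max_pool(x, size=2):
    """Non-overlapping max-pooling: the downsampling variant mentioned above."""
    H, W = x.shape
    H2, W2 = H // size, W // size
    x = x[:H2 * size, :W2 * size]
    return x.reshape(H2, size, W2, size).max(axis=(1, 3))

# Toy example: random 8x8 "image", one 3x3 filter, ReLU, then 2x2 pooling.
rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))
feature_map = np.maximum(conv2d_valid(img, kernel), 0.0)  # convolution + (illustrative) ReLU
pooled = max_pool(feature_map, 2)
print(feature_map.shape, pooled.shape)   # (6, 6) (3, 3)
```

Stacking several such convolution/downsampling stages, followed by a classifier, gives the generic deep CNN pattern discussed throughout Sec. D.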

      Critique of 2018 Turing Award appeared long before the 1980s. were proposed already in the 1940s/50s[MC43][K56] (but don't forget prior work in physics since the 1920s[L20][I25][K41][W45]). deep convolutional NN architecture was proposed in the 1970s.[CNN1] NNs without hidden layers learned in 1958[R58] regression and the method of least squares[DL1-2]). about deeper adaptive NNs[R61,R62] layers (already containing the now popular multiplicative gates).[DEEP1-2][DL1-2] A paper of 1971[DEEP2] highly cited method which was still popular in the new millennium,[DL2] especially in Eastern Europe, where much of Machine Learning was born. Ivakhnenko did not call it an NN, but that's what it was.[MIR](Sec. 1)[R8] LBH failed to cite this. XIII & III & V & VIII & IX & X. LBH & co-authors, e.g., Sejnowski[S20] (see Sec. XIII). It goes more or less like this: "In 1969, Minsky & Papert[M69] researchers took a fresh look at the problem in the 1980s."[S20] However, as mentioned above, the 1969 book[M69] addressed a "problem" of Gauss & Legendre's shallow learning (~1800)[DL1-2] that had already been solved 4 years prior by Ivakhnenko & Lapa's popular deep learning method.[DEEP1-2][DL2] Minsky was apparently unaware of this and failed to correct it later.[HIN](Sec. I) (but see a 1989 paper[MOZ]). However, it became really deep in 1991 in my lab,[UN-UN3] which has See Sec. 1 of the overview:[MIR] First Very Deep NNs, Based on Unsupervised Pre-Training (1991). "Very Deep Learning" tasks of depth > 1000.[UN2][DL1][UN] (By 2003, LSTM variants successfully dealt with language problems of depth up to 30,000[LSTM17] more.) drove the shift from unsupervised pre-training to purely supervised learning (1991-95; 2006-10).[HIN](Sec. II)[MIR] (Sec. 19) III. Note that LSTMs brought essentially unlimited depth to supervised recurrent NNs; Highway Nets[HW1-3] brought it to feedforward NNs.[MOST]
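The claim that residual nets are Highway Nets "with open gates" (g(x)=t(x)=1) can be illustrated directly. Below is a small NumPy sketch with illustrative dimensions and random weights; the sigmoid gate parameterization is the usual form, assumed here rather than copied from any specific paper. Driving both the transform gate t(x) and the carry gate g(x) to 1 turns the highway update t(x)*h(x) + g(x)*x into the residual update h(x) + x.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, Wh, bh, Wt, bt, Wg, bg):
    """Highway layer: y = t(x) * h(x) + g(x) * x, with transform gate t
    and carry gate g (often coupled as g = 1 - t)."""
    h = np.tanh(Wh @ x + bh)       # candidate transformation h(x)
    t = sigmoid(Wt @ x + bt)       # transform gate in (0, 1)
    g = sigmoid(Wg @ x + bg)       # carry gate in (0, 1)
    return t * h + g * x

def residual_layer(x, Wh, bh):
    """Residual layer: the open-gate special case t(x) = g(x) = 1, i.e., y = h(x) + x."""
    return np.tanh(Wh @ x + bh) + x

rng = np.random.default_rng(1)
d = 4
x = rng.standard_normal(d)
Wh, bh = rng.standard_normal((d, d)), rng.standard_normal(d)

# Force both gates open (large positive biases => sigmoid ~= 1):
Wt = Wg = np.zeros((d, d))
bt = bg = np.full(d, 50.0)
print(np.allclose(highway_layer(x, Wh, bh, Wt, bt, Wg, bg),
                  residual_layer(x, Wh, bh)))    # True
```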

      Critique of 2018 Turing Award by others (Sec. III).[DLC][DEEP1-2][BP1][DL1-2][R7-R8][R2-R4] deep learning multilayer perceptrons (1965),[DEEP1-2][R8] modern backpropagation (1970),[BP1,2][R7] architectures of recurrent NNs (1943-56)[MC43][K56] and convolutional NNs (1979),[CNN1] principles of generative adversarial NNs and artificial curiosity (1990),[AC90,90b][AC20] unsupervised pre-training for deep NNs,[UN1-2] the vanishing gradient problem (1991)[VAN1] & solutions to it (Sec. A), GPU-accelerated NNs (2004),[GPUNN][GPUCNN5] and other foundations.[DL1-2][R2-R8] Often LBH failed to cite essential prior work.[DLC][HIN][MIR](Sec. 21) II & V & XIII & IX & X & XVII & XII & XVIII & XX & I. deeplearning.net which until 2019 advertised deep learning as "moving beyond shallow machine learning since 2006",[DL7] referring to Hinton's[UN4] and Bengio's[UN5] we had this type of deep learning already in 1991;[UN][UN1-2] see Sec. II & XVII (5). Not to mention Ivakhnenko's even earlier supervised layer-wise training of deep NNs[DEEP1-2] which Hinton,[UN4] Bengio,[UN5] and LBH[DL3,DL3a] did not cite either. See Sec. X.

      Critique of 2018 Turing Award my comments systematically track the sequential order of ACM's claims.[T19]

      ACM's statement on Turing is greatly misleading, like some of its other statements.[T19] any type of computation-based AI.[GOD][BIB3][MIR](Sec. 18)[GOD21,21a] Much of early AI in the 1940s-70s was actually about theorem proving[ZU48][NS56]

      In 1936, Turing Turing Machine.[TUR] He rederived the above-mentioned result,[CHU][TUR][HIN][GOD21,21a][TUR21][LEI21,21a] In the same year of 1936, Emil Post published yet another independent universal model of computing,[POS] my reply to Hinton who criticized my website on Turing without suggesting any fact-based corrections.[HIN]) open problem "P=NP?" in his famous letter to John von Neumann (1956).[GOD56][URQ10] Likewise, Konrad Zuse (1910-1995) created the world's first working programmable general-purpose computer 1935-41. His patent application of 1936[ZU36-38][Z36][RO98][ZUS21] predating Claude Shannon's 1937 thesis on digital circuit design.[SHA37] Zuse also created the first high-level programming language in the early 1940s.[BAU][KNU] conditional jump instruction.[RO98]

      Critique of 2018 Turing Award that learn internal representations (1965),[DEEP1-2][R8] modern backpropagation (1970),[BP1,2][R7] architectures of recurrent NNs (1943-56)[MC43][K56] and convolutional NNs (1979),[CNN1] principles of generative adversarial NNs and artificial curiosity (1990),[AC][AC90,90b][AC10][AC20] unsupervised pre-training for deep NNs (1991),[UN1-2][UN] vanishing gradients (1991)[VAN1] & solutions to it (Sec. A),[LSTM0-17][CTC] (2004),[GPUNN][GPUCNN5] record-breaking deep supervised NNs (2010)[MLP1-2] and contest-winning deep CNNs (2011),[DAN][DAN1][GPUCNN5] NNs with over 100 layers (2015),[HW1-3][R5] transformer-like[TR1-6][FWP] attention[FWP][ATT] through fast weight programmers (1991),[FWP0-2,6] and more.[DL1-2][R2-R8] Often LBH failed to cite essential prior work.[DL3,DL3a][DLC][HIN][MIR](Sec. 21)[R2-R5,R7,R8,R11] II & I & III & XIII & X & XVII & XII & XVIII & XX.

      Critique of 2018 Turing Award "advances in natural language processing" and in speech supervised NNs and CNNs achieved by our group 2010-2011[MLP1-2][DAN][DAN1][GPUCNN5][R6] and through Highway Net-like NNs (2015),[HW1-3][R5] although the principles of CNNs were invented and developed by others since the 1970s.[CNN1-4] See Sec. D & XVIII & XIV as well as Sec. 4 & Sec. 19 of the overview.[MIR]

      Critique of 2018 Turing Award DanNet[DAN][DAN1][GPUCNN5] the first NN to win a medical imaging contest through deep learning (Sept 2012, on cancer detection).[GPUCNN5,8] and were able to greatly improve steel defect detection.[ST] All of this happened before the similar GPU-accelerated AlexNet of Hinton's student Krizhevsky won ImageNet 2012.[GPUCNN5][R6] mitosis detection.[MGC][GPUCNN5,8] approach of D & XI).

      Critique of 2018 Turing Award without citing them.[DL1][DLC][HIN][R2-R4][R7-R8] V & XII & XIX & II & III & XIII & XVII & X & I.

      Critique of 2018 Turing Award who failed to cite them, even in later work.[HIN][DLC][DL1-2][DEEP1-2][CMB][R7-R8] See Sec. II & III & XIII & V & X & XIV & I.

Critique of 2018 Turing Award The expression "deep learning" was first introduced to Machine Learning by Dechter (1986), and to NNs by Aizenberg et al (2000).[DL2] To my knowledge, LBH have never cited them. (Margin note: our 2005 paper on deep RL[DL6,6a] was the first machine learning publication with "learn deep" in the title.) LBH started talking about "deep learning ... moving beyond shallow machine learning since 2006",[DL7] referring to their unsupervised pre-training methods of 2006. See Sec. III. Others built careers on this notion long before LBH recognized this.[DEEP1-2][CNN1][HIN][R8][DL1][DLC] Even deep learning through unsupervised pre-training was introduced by others.[UN1-3][R4][HIN](Sec. II) See Sec. II & III & XIII & V & I.

      Critique of 2018 Turing Award ignored by LBH's papers[HIN][R7-R8][R2-R5] (see Sec. V & II & III & I & XIII & XII & XIX & X & XVII).

ACM correctly mentions advancements through GPUs. The first to use GPUs for NNs were Jung & Oh (2004).[GPUNN][GPUCNN5] In 2010, my team made GPU-based NNs fast and deep enough to set an important benchmark record,[MLP1-2] showing that unsupervised pre-training (pioneered by myself in 1991) is not necessary to train deep NNs, contrary to Hinton's claims.[VID1] By 2011, our CNNs were deep and fast enough[DAN][DAN1][GPUCNN5] to achieve superior computer vision (explicitly mentioned by ACM) for the first time[R6] (see Sec. D).

Furthermore, by the mid 2010s, speech recognition and machine translation (explicitly mentioned by ACM) were actually dominated by the LSTM and CTC of our team.[LSTM1-4][CTC] In particular, as mentioned in Sec. A, CTC-LSTM is end-to-end-neural and thus very different from (and superior to) the older hybrid approaches based on models such as HMMs.[BW][BOU][BRI][HYB12] As mentioned in Sec. B and XVI, the first superior end-to-end neural machine translation was also based on LSTM.
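Since LSTM and its forget gates recur throughout Sec. A and B, a minimal single-step sketch of a vanilla LSTM cell may help. The gate equations are the standard textbook form; the dimensions and random weights are purely illustrative (this is not a reimplementation of CTC or of any production recognizer mentioned above).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a vanilla LSTM with forget gate.
    W maps the concatenated [x, h_prev] to the stacked pre-activations of the
    input gate i, forget gate f, output gate o, and cell candidate z."""
    d = h_prev.shape[0]
    pre = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(pre[0*d:1*d])          # input gate
    f = sigmoid(pre[1*d:2*d])          # forget gate: how much old cell state to keep
    o = sigmoid(pre[2*d:3*d])          # output gate
    z = np.tanh(pre[3*d:4*d])          # candidate cell update
    c = f * c_prev + i * z             # gated cell update
    h = o * np.tanh(c)                 # exposed hidden state
    return h, c

rng = np.random.default_rng(2)
n_in, n_hid = 3, 5
W = rng.standard_normal((4 * n_hid, n_in + n_hid)) * 0.1
b = np.zeros(4 * n_hid)

h = np.zeros(n_hid)
c = np.zeros(n_hid)
for x in rng.standard_normal((10, n_in)):   # run over a toy sequence of length 10
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)   # (5,) (5,)
```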

      Critique of 2018 Turing Award ACM's statement is "less wrong" than Honda's[HIN](Sec. I) but still (and apparently even other award committees[HIN](Sec. I) backpropagation by Rumelhart et al. (1985-86)[RUM] (1982).[BP2] And the article[RUM] even failed to mention Linnainmaa, the inventor of this famous algorithm for credit assignment in networks (1970),[BP1] Kelley already had a precursor thereof in the field of control theory;[BPA] see also later work of the early 1960s.[BPB][BPC][R7] internal representations in hidden layers of NNs.[RUM] But this was essentially just an experimental analysis of a known method.[BP1-2] And history of backpropagation can be found at Scholarpedia[DL2] and in my award-winning survey.[DL1] Also see Sec. XIX, II.

Some claim that "backpropagation is just the chain rule of Leibniz (1676) & L'Hopital (1696)." No, it is the efficient way of applying the chain rule to big networks with differentiable nodes (there are also many inefficient ways of doing this). It was not published until 1970.[BP1] See also the recent debate:[HIN] It is true that in 2018, Hinton[AOI] credited Rumelhart[RUM] with the "invention" of backpropagation, i.e., for "creating" the method and for other things he didn't do.[HIN] Neither in a popular book[AOI] nor in other recent work[DL3,DL3a] did he cite Linnainmaa (1970),[BP1] the true creator.[BP4-5] His 2015 survey[DL3] does cite Werbos (1974) who however described the method correctly only later in 1982[BP2] and also failed to cite Linnainmaa[BP1] (compare Amari's work of 1977[BP6]). Linnainmaa's method was well-known.[BP5][DL1-2][DLC] It wasn't created by "lots of different people" as Hinton suggested,[AOI][HIN][R11] but by one person who published first[BP1] and who therefore should get the credit.
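To illustrate what "the efficient way of applying the chain rule to big networks with differentiable nodes" means in practice, here is a small sketch of reverse-mode differentiation (backpropagation) for a two-layer network with squared error: one forward pass caches intermediates, one backward pass reuses them, so computing all gradients costs roughly as much as the forward computation. The layer sizes, tanh activation, and loss are illustrative assumptions.

```python
import numpy as np

def forward_backward(x, y, W1, W2):
    """Reverse mode (backpropagation) on a tiny 2-layer net with
    squared error loss L = 0.5 * ||W2 tanh(W1 x) - y||^2."""
    # Forward pass: cache intermediates needed by the backward pass.
    a = W1 @ x
    h = np.tanh(a)
    yhat = W2 @ h
    loss = 0.5 * np.sum((yhat - y) ** 2)

    # Backward pass: apply the chain rule once, from output back to inputs,
    # propagating one error vector per layer.
    d_yhat = yhat - y                    # dL/d yhat
    dW2 = np.outer(d_yhat, h)            # dL/d W2
    d_h = W2.T @ d_yhat                  # dL/d h
    d_a = d_h * (1.0 - h ** 2)           # dL/d a, via tanh'(a) = 1 - tanh(a)^2
    dW1 = np.outer(d_a, x)               # dL/d W1
    return loss, dW1, dW2

# Check one gradient entry against a finite difference.
rng = np.random.default_rng(3)
x, y = rng.standard_normal(4), rng.standard_normal(2)
W1, W2 = rng.standard_normal((3, 4)), rng.standard_normal((2, 3))
loss, dW1, dW2 = forward_backward(x, y, W1, W2)

eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
numeric = (forward_backward(x, y, W1p, W2)[0] - loss) / eps
print(abs(numeric - dW1[0, 0]) < 1e-4)   # True: analytic gradient matches
```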

      Critique of 2018 Turing Award Boltzmann Machine (BM)[BM] a learning.[HIN] Recently, however, I learnt through a reader that even the BM paper[BM] did not cite prior relevant work by Sherrington & Kirkpatrick[SK75] and Glauber.[G63] (Compare related work.[H86][H88][S93]) multilayer perceptrons with arbitrarily many layers.[DEEP1-2][HIN] Sec. II V & X.[MIR](Sec. 1)[R8]

      As mentioned in Sec. II, Sejnowski's rather self-serving "history of deep learning" [S20] claims: In 1969, Minsky & Papert[M69] at the problem in the 1980s."[S20] However, the 1969 book[M69] addressed a "deep learning problem" (a limitation of Gauss & Legendre's shallow learning around 1800[DL1-2]) that had already been solved four years prior (see Sec. II), also in the 1970s, especially outside of the Anglosphere.[DEEP2][BP6][CNN1][DL1-2]

Critique of 2018 Turing Award Dropout is actually a variant of Hanson's much earlier stochastic delta rule (1990).[Drop1-2] Hinton's 2012 paper and his later patent did not cite this either. Moreover, dropout was not needed to win computer vision contests, as we showed already in 2011 in a contest where LeCun's team participated as well;[DAN1] see Sec. D above. Back then, the really important ingredient was the GPU-based speedup of deep CNNs.[GPUCNN1,3,5][R6] Already before ImageNet 2012,[R6] our fast deep CNN called DanNet had a monopoly on winning computer vision competitions.[GPUCNN5] It more than "halved the error rate for object recognition" (ACM's wording) in a contest already in 2011,[GPUCNN2][DAN,DAN1][R6] long before the similar system of Hinton's student. See Sec. D as well as Sec. 19 of the overview.[MIR]
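To illustrate the family resemblance asserted above, here is a toy sketch contrasting standard (inverted) Bernoulli dropout with a noisy-weight forward pass in the spirit of a stochastic delta rule. It is only meant to show that both inject random, per-presentation perturbations into the forward computation; the exact 1990 formulation (per-weight means and variances with their own update rules) is not reproduced here, and all shapes and noise levels are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

def forward_dropout(x, W, p_drop=0.5):
    """Standard dropout: each hidden unit is zeroed with probability p_drop
    (inverted-dropout scaling keeps the expected activation unchanged)."""
    h = np.maximum(W @ x, 0.0)
    mask = (rng.random(h.shape) >= p_drop) / (1.0 - p_drop)
    return h * mask

def forward_stochastic_weights(x, W, sigma=0.1):
    """Noisy-weight forward pass: each weight is perturbed by independent
    noise on every presentation, loosely echoing a stochastic delta rule."""
    W_noisy = W + sigma * rng.standard_normal(W.shape)
    return np.maximum(W_noisy @ x, 0.0)

x = rng.standard_normal(6)
W = rng.standard_normal((4, 6)) * 0.5
print(forward_dropout(x, W))
print(forward_stochastic_weights(x, W))
```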

      Critique of 2018 Turing Award since the late 1980s.[BW][BRI][BOU] LSTM (1990s-2005)[LSTM0-6] and CTC[CTC] (2006), which were applied to speech in 2007.[LSTM4][LSTM14] CTC-LSTM is end-to-end-neural and thus very different from (and superior to) the hybrid methods since the late 1980s.[BW][BRI][BOU][HYB12] See also Sec. A.

Critique of 2018 Turing Award 5 years earlier, in 1995, we already had a similar, excellent neural probabilistic text model.[SNT] Bengio[NPM] characterizes it only briefly as "related" (see also Pollack's earlier work on embeddings of words and other structures[PO87][PO90]). In the 2010s, the dominant model for such language tasks was actually the LSTM of our team,[LSTM0-6] which Bloomberg called "arguably the most commercial AI achievement."[AV1][MIR](Sec. 4) See Sec. B. The attention mechanism of Bengio's team[ATT14] has indeed become important. For example, it helped to further improve Facebook's LSTM-based translation (see Sec. B). However, I had already published both types of adaptive neural sequential attention much earlier: end-to-end-differentiable "soft" attention in the latent space of Fast Weight Programmers (FWPs),[FWP2][FWP] and "hard" attention (in observation space) in the context of RL[ATT][ATT0-1] (1990). Attention-based Transformers[TR1-6] have become a popular alternative to RNNs; my FWP of 1991[FWP0-1] already computed fast weight changes through additive outer products of self-invented activation patterns (now often called keys and values for self-attention).[TR1-6][FWP] In the 2010s,[DEC] Transformers[TR1-2] excelled at natural language processing, a traditional LSTM domain (see Sec. B); certain tasks, however, LSTM can still rapidly learn to solve quickly.[LSTM13,17] The linear Transformers or Performers[TR5-6] are formally equivalent to my 1991 FWPs (apart from normalization).[FWP6][FWP] In 1993, I introduced the attention terminology[FWP2] now used in this context,[ATT] and RNNs that program themselves.
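A minimal sketch of the mechanism just described may help: a slow net maps each input to key, value, and query patterns; a fast weight matrix is programmed by additive outer products of values and keys, then queried by matrix-vector multiplication. Up to the missing normalization, this is the update behind linear Transformers/Performers. The ReLU feature map, dimensions, and random weights below are illustrative assumptions.

```python
import numpy as np

def fast_weight_sequence(xs, Wk, Wv, Wq, phi=lambda z: np.maximum(z, 0.0)):
    """Linear-attention-style fast weight programmer: a fast weight matrix F
    is programmed by additive outer products of value and (feature-mapped)
    key patterns, and queried at every step."""
    d_v, d_k = Wv.shape[0], Wk.shape[0]
    F = np.zeros((d_v, d_k))             # fast weights, start at zero
    outputs = []
    for x in xs:
        k = phi(Wk @ x)                  # key pattern
        v = Wv @ x                       # value pattern
        q = phi(Wq @ x)                  # query pattern
        F = F + np.outer(v, k)           # program fast weights (additive outer product)
        outputs.append(F @ q)            # retrieve with the query (no softmax normalization)
    return np.array(outputs)

rng = np.random.default_rng(5)
d_in, d_k, d_v, T = 4, 3, 3, 6
Wk = rng.standard_normal((d_k, d_in))
Wv = rng.standard_normal((d_v, d_in))
Wq = rng.standard_normal((d_k, d_in))
xs = rng.standard_normal((T, d_in))
print(fast_weight_sequence(xs, Wk, Wv, Wq).shape)   # (6, 3)
```

Since the fast weights accumulate as F_t = sum over s<=t of outer(v_s, k_s), the retrieved output F_t q_t equals the sum over s of v_s times (k_s . q_t), i.e., un-normalized linear self-attention over the whole past.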

See[MIR](Sec. 9)[R4] for my related priority dispute on attention with Hinton. He was the reviewer of my 1990 paper[ATT2] and later published related work of his own.[ATT3]

      Critique of 2018 Turing Award GANs[GAN0-1] (2010-2014) are actually a simple application[AC] of the adversarial curiosity (AC) principle from 1990[AC90,90b][AC20] (see also surveys[AC09-10]). This principle is now widely used for exploration in RL (e.g., Sec. C) and for image synthesis[GAN1] (also mentioned by ACM in Sec. XVIII). predictor NN minimizes its error, while the generator NN tries to make outputs that maximize this error: one net's loss is the other net's gain. 4 years before the GAN paper,[GAN1] a well-known 2010 survey[AC10] summarised the generative adversarial NNs of 1990 as follows: a whether the controller's (or generator's) output is in a given set.[AC20][AC] early adversarial machine learning settings[S59][H90] neither involved unsupervised NNs nor were about modeling data nor used gradient descent.[AC20]) Bengio et al. neither cited the original work[AC90,90b][AC20] nor corrected their erroneous claims[GAN1] about the other (1991).[PM1-2][AC20][R2][MIR](Sec. 5) Bloomberg,[AV1] their NIPS 2014 paper[GAN1] and some of the erroneous claims it made about my prior work.[AC20] Goodfellow eventually admitted that PM is adversarial (his paper[GAN1] still claims the opposite), but emphasized that it's not generative. However, the even earlier AC[AC90,90b][AC10][AC20] is both adversarial and generative (its generator contains probabilistic units[AC90] like in StyleGANs[GAN2]). When the authors[GAN1] I published one myself in the hopes of correcting the annals of history.[AC20] that they are instances of my earlier work.[R2][AC20] vanishing gradient problem,[MIR](Sec. 3)[VAN1] Bengio published his own,[VAN2] without citing Sepp. was settled in favor of Sepp.[VAN1] However, even after a common publication,[VAN3] Bengio published papers[VAN4][XAV] are poor indicators of truly pioneering work.[NAT1] (Margin note: Bengio states[YB20] that in 2018 he one must at least clarify it later,[DLC] Bengio also claims[YB20] that in 1995 my publications on exactly this topic date back to 1991-93.[UN0-2][UN] which I started in 1987[META1][META] long before Bengio that he did it before me.[R3] Bengio also writes[YB20] that in Regarding attention-based Transformers,[TR1-6] Bengio[DL3a] cites his own team (2014) for "soft attention" without citing my much earlier original work of 1991-1993 on soft attention and linear Transformers.[FWP,FWP0-2,6] Bengio has also heavily used our LSTM (see Sec. A-C), "gated recurrent units (GRU)"[LSTMGRU] for a variant of our vanilla LSTM architecture[LSTM2] (2000) which he did not cite although our work[LSTM2] was the one that introduced gated recurrent units. In addition, our team automatically evolved lots of additional LSTM variants and topologies already in 2009[LSTM7] without changing the name of the basic method. learn to count[LSTMGRU2] nor learn simple non-regular languages;[LSTMGRU2] they according to Google Brain.[LSTMGRU3]) unsupervised pre-training for deep NNs.[UN0-4][HIN](Sec. II)[MIR](Sec. 1) Hinton's paper[UN4] (2006) appeared long after my earlier work on this[UN0-2] the first NNs shown to solve very deep problems (see Sec. II above).[UN] It was published in 1991-92[UN1] when compute was about 1000 times more expensive than in 2006. survey (2015),[DL3][DLC] See also Sec. II & III. compressing or distilling one NN into another.[UN0-2][DIST1-2][MIR](Sec. 
2) Hinton[DIST2] (2006) did not cite my much earlier original work on this (1991),[UN1][UN] not even in his later patent application fast weight programmers[FWP][FWP0-4a] through tensor-like outer products (1991-2016) and their motivation[FWP2][FWP4a][MIR](Sec. 8) (see also Sec. XVI above). learning sequential attention with NNs.[MIR](Sec. 9) Hinton[ATT3] (2010) our much earlier work on this[ATT1][ATT] although he was both reviewer and editor of my summary[ATT2] (1990; see Sec. XVI above).
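Returning to the adversarial principle summarized at the start of this section ("one net's loss is the other net's gain"), here is a toy zero-sum sketch: a predictor is trained to minimize its prediction error while a generator is trained to maximize that same error. The linear toy "environment", the gradient-ascent generator update through that known environment, and all dimensions are illustrative simplifications, not the 1990 curiosity setup or the 2014 GAN training procedure.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 5
W_gen = rng.standard_normal((d, d)) * 0.1     # generator weights
W_pred = rng.standard_normal((d, d)) * 0.1    # predictor weights
env = rng.standard_normal((d, d))             # fixed toy "environment" mapping

lr = 0.01
for step in range(1000):
    z = rng.standard_normal(d)        # generator input (noise / current state)
    action = np.tanh(W_gen @ z)       # generator output
    outcome = env @ action            # what actually happens in the toy environment
    pred = W_pred @ action            # predictor's guess of the outcome
    err = pred - outcome
    loss = 0.5 * np.sum(err ** 2)     # predictor minimizes this; generator maximizes it

    # Gradient descent for the predictor (minimize loss).
    W_pred -= lr * np.outer(err, action)

    # Gradient ascent for the generator (maximize the same loss):
    # dL/d action = (W_pred - env)^T err, chained through tanh and W_gen.
    d_action = (W_pred - env).T @ err
    d_pre = d_action * (1.0 - action ** 2)
    W_gen += lr * np.outer(d_pre, z)

print(round(float(loss), 4))   # the two nets keep pushing the error in opposite directions
```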

The ten priority disputes mentioned in the present Sec. XVII are not the only ones.[R4] Remarkably, three of them are related to the 1991 paper[UN1][UN] which in many ways started what people now call deep learning, going beyond previous work. Most of them go back to work of 1990-91.[MIR] See Sec. I for additional related issues of credit assignment.

Critique of 2018 Turing Award LeCun's team has made important contributions to CNNs since 1989.[CNN2,4] However, the basic CNN architecture with convolutional and downsampling layers is actually due to Fukushima (1979).[CNN1] NNs with convolutions were later (1987) combined by Waibel with weight sharing and backpropagation.[CNN1a] Waibel called this TDNN, not CNN. All of this happened before LeCun's work on CNNs. See Sec. D above and Sec. 21 of the overview of our Annus Mirabilis 1990-1991.[MIR] At IJCNN 2011 in Silicon Valley, our DanNet[DAN][GPUCNN1-3] achieved the first superhuman performance in a vision contest (where LeCun's team took a distant second place, with three times worse performance).[DAN1] Again see Sec. D. At ICPR 2012, our DanNet[GPUCNN1-3] won the medical imaging contest (Sept 2012, on detection of mitosis/cancer)[GPUCNN5,7,8] (before the similar AlexNet won ImageNet 2012[GPUCNN5][R6] and the similar VGG network[GPUCNN9] won ImageNet 2014). Many major companies are now using this approach to mitosis detection.[MGC][GPUCNN5,7,8] See Sec. D & VII. ACM also explicitly mentions speech recognition and speech synthesis.[AM16][DL1] All of these fields were heavily shaped in the 2010s by our non-CNN methods.[DL1][DL4][AM16][GSR][GSR15][GT16][WU][FB17] See Sec. A, B, VI, XI.

Critique of 2018 Turing Award As mentioned in Sec. XII, backpropagation was actually proposed earlier as a learning method for NNs by Werbos (1982)[BP2-4] (see also Amari's work of 1977[BP6]). LeCun did not cite these origins, not even in recent work.[DL3,DL3a][DLC] In 1960, Kelley already had a precursor of the algorithm.[BPA] Furthermore, many besides LeCun have worked "to speed up backpropagation algorithms"[DL1] (ACM's wording). More on the history of backpropagation can be found at Scholarpedia.[DL2][BP4]

      Critique of 2018 Turing Award However, "hierarchical feature representation" in deep learning networks is what Ivakhnenko & Lapa (1965)[DEEP1-2] (and also Fukushima[CNN1][DL2]) had long before LeCun. See Sec. D & II & XIII & V.

Critique of 2018 Turing Award LeCun et al. neither cited the origins[BP1] (1970) of this widely used type of automatic differentiation for differentiable networks of modules[DL2][BP4-5][DLC] nor the early work on backpropagation for such systems.[S80] See also Sec. XIX & XII. Others published on such systems before LeCun, who did not cite them. See also Pollack's even earlier relevant work.[PO87-90]

      (Furthermore, "complex networks of modules where backpropagation is performed" were the central theme of my much earlier habilitation thesis (1993).[UN2] For example, our adaptive subgoal generators (1991)[HRL0-2] were trained through end-to-end-differentiable chains of such modules.[MIR](Sec. 10) planning and reinforcement learning with recurrent neural world models (1990).[PLAN][MIR](Sec. 11) Same for my linear transformer-like fast weight programmers[FWP0-2][FWP][ATT][MIR](Sec. 8) since 1991 (see Sec. XVI) see "100 Authors against Einstein."[AH1] ad hominem attacks[AH2-3][HIN] "If you cannot dispute a fact-based message, attack the messenger himself."[HIN] award can ever change that.[HIN] and their co-workers have contributed useful improvements of deep learning methods.[CNN2,4][CDI][LAN][RMSP][XAV][ATT14][CAPS] whom they did not cite II, V, XII, XIX, XXI, XIII, XIV, XI, and XX, and 2). Sec. I, A, B, C, D, XVII, VI, and XVI). As emphasized earlier:[DLC][HIN] to self-correction,"[SV20] as is already the standard in other scientific fields. in popular science venues without peer review? For example, the narrator of a popular 2018 Bloomberg video[VID2] Germany and Switzerland (LSTM & CTC; see Sec. A) long before Hinton's methods. Similarly, in 2016, the NY Times published an article[NYT3] Google's original 2016 paper on Google Translate[WU] mentions LSTM over 50 times (see Sec. B). In ad hominem style,[AH2-3] claiming credit he doesn't deserve for many, many things",[NYT1] without LeCun also called the GANs of Bengio's team[GAN1] GANs are variations of my work in 1990.[AC90,90b][AC20][R2] According to Bloomberg,[AV2] Bengio has simply "denied my claims" without backing up his denial by any facts; see Sec. XVII. and forcefully contradict public figures who promote it."[FAKE] LBH, who called themselves the deep learning conspiracy,[DLC] Our LSTM paper[LSTM1] has got more citations than any paper by Bengio or LeCun,[R5] Hinton's most cited paper (2012) is the one on GPU-based CNNs.[GPUCNN4][R5] It follows our earlier work on supervised deep NNs (2010)[MLP1] unsupervised pre-training for deep NNs by myself [UN][UN0-3] and later championed by Hinton;[UN4][VID1] see Sec. D). Hinton (2012)[GPUCNN4] characterizes our deep and fast DanNet (2011)[GPUCNN1-3] as AlexNet won one;[R6] see Sec. D, XIV. The highly cited VGG network (2014)[GPUCNN9] Hinton's 2nd most cited paper[RUM][R5] of Hinton's paper,[RUM] adding citations for a book by Rumelhart & McClelland[R5]). Backpropagation is a previously invented method[BP1] whose origins of Ivakhnenko whom he has never cited;[DEEP1-2][R7-R8] see Sec. II, XIII. Bengio's 2nd most cited research paper is the one on GANs (2014),[GAN1] instances of my artificial curiosity (1990)[AC90,90b][AC20][R2] which he did not cite; see Sec. XVII. Hinton's highly cited papers on unsupervised pre-training for deep NNs (2006-)[UN4] by ours[UN0-2][UN] were preceded by Hanson's[Drop1-2] As recently as of 2021, ACM published yet another misleading deep learning "survey" by LBH,[DL3a] again heavily citing LBH without Consult the Executive Summary and Sec. I-XXI of this critique for more. So virtually all the algorithms that have attracted have their conceptual and technical roots in my labs in Munich and Lugano,[MOST] of deep learning MLPs since 1965[DEEP1-2] (see Sec. II, XX) and backpropagation (1960-70)[BPA][BP1] (see Sec. XIX, XII) and convolutional NNs since 1979[CNN1-4] (see Sec. XVIII, D). Our LSTM (1990s, see Sec. A, B; also for RL, 2003-, see Sec. 
C) → our Highway Net (May 2015) → ResNet (Dec 2015, see Sec. D). Our adversarial Artificial Curiosity (1990) → GANs (2010s, see Sec. XVII). our own unsupervised pre-training of deep NNs (1991, see Sec. II & III) for recurrent NNs in the 1990s → our LSTM (see Sec. A-C) and for feedforward NNs in 2010 → our DanNet (2011) → AlexNet (2012); VGG Net (2014) (see Sec. D). our LSTM brought essentially unlimited depth to supervised recurrent NNs in the 1990s; our Highway Nets[HW1-3] brought it to feedforward NNs in May 2015.[MOST] superior computer vision (2011, see Sec. D, XVIII), medical diagnosis (2012, see Sec. VII, XVIII), and many other applications.[DEC] speech recognition (with our CTC, 2007-15, see Sec. A), machine translation (2016, see Sec. B), robotics & video game players (2018-19, see Sec. C), and many other applications.[DEC] Fast Weight Programmers (1991, see Sec. XVI) are formally equivalent to linear Transformers (now popular in NLP). I, A, B, C, D, VII, XVIII.

As mentioned earlier,[MIR](Sec. 21) it is not always clear[DLC] whether a failure to cite relevant prior work is unintentional[PLAG1][CONN21] or intentional.[FAKE2] Ivakhnenko & Lapa (1965) had the first multilayer perceptrons of arbitrary depth that really learned.[DEEP1-2][R8] Soon afterwards, multilayer perceptrons learned internal representations through stochastic gradient descent in Japan.[GD1-2a] A few years later, modern backpropagation was published (1970).[BP1]

Yes, this critique is also an implicit critique of certain other awards to LBH.[HIN] Many related discussions appeared on reddit.com/r/MachineLearning[R1-R12] (the largest machine learning forum, with back then over 800k subscribers), many of them influenced by my overview.[MIR]

      Dr. LeCun himself is well aware of the challenges to scientific integrity in our field:[LECP] "... else cites."[LECP]

      Note that I am insisting on proper credit assignment not only in my own research field but also in quite disconnected areas,[HIN] as demonstrated by my numerous letters in this regard published in Science and Nature, e.g., on the history of aviation,[NASC1-2] the telephone,[NASC3] the computer,[NASC4-7] resilient robots,[NASC8] and scientists of the 19th century.[NASC9] AI scientists and AI historians equipped with artificial curiosity[SA17][AC90-AC20][PP-PP2]

      Thanks to many expert reviewers for useful comments. Since science is about self-correction, let me know under juergen@idsia.ch if you can spot any remaining error. Many additional relevant publications can be found in my publication page and my arXiv page. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

      References (partial list):
      [AC] J. Schmidhuber (AI Blog, 2021). 3 decades of artificial curiosity & creativity. The first paper on planning with reinforcement learning recurrent neural networks (NNs) and on generative adversarial networks.
      [AC20] J. Schmidhuber. Preprint arXiv/1906.04493. With a brief summary of the generative adversarial neural networks of 1990.[AC90,90b]
      [AIB] J. Schmidhuber. AI Blog. Includes variants of chapters of the AI Book.
      [ATT] J. Schmidhuber (AI Blog, 2020). 30-year anniversary of end-to-end differentiable sequential neural attention. Plus goal-conditional reinforcement learning. We had both hard attention (1990) and soft attention (1991-93).[FWP] Today, both types are very popular.
      [BP4] J. Schmidhuber (AI Blog, 2014; updated 2020). Who invented backpropagation?
      [CNN1a] A. Waibel. Phoneme Recognition Using Time-Delay Neural Networks. Meeting of IEICE, Tokyo, Japan, 1987. First application of backpropagation[BP1][BP2] and weight-sharing to speech.
      [DAN] J. Schmidhuber (AI Blog, 2021). 10-year anniversary. In 2011, DanNet triggered the deep convolutional neural network (CNN) revolution.
      [DAN1] J. Schmidhuber (AI Blog, 2011; updated 2021 for 10th birthday of DanNet): First superhuman visual pattern recognition, by our artificial neural network called DanNet.
      [DEC] J. Schmidhuber (AI Blog, 02/20/2020, revised 2021). The 2010s: Our Decade of Deep Learning / Outlook on the 2020s.
      [DIST1] J. Schmidhuber, 1991. See [UN-UN2].
      [DL3a] Y. Bengio, Y. LeCun, G. Hinton (2021). Turing Lecture: Deep Learning for AI. Communications of the ACM, July 2021.
      [DL4] J. Schmidhuber (AI Blog, 2017). Our impact on the world's most valuable public companies: Apple, Google, Microsoft, Facebook, Amazon. E.g., greatly improved (CTC-based) on-device speech recognition (on the phone, not the server) based on LSTM.
      [DL6] J. Schmidhuber (AI Blog, Nov 2020). 15-year anniversary: 1st paper with "learn deep" in the title (2005). Our deep reinforcement learning & neuroevolution solved problems of depth 1000 and more. Soon after its publication, everybody started talking about "deep learning." Causality or correlation?
      [DLC] J. Schmidhuber (AI Blog, June 2015). Critique of Paper by "Deep Learning Conspiracy" (Nature 521 p 436).
      [FWP] J. Schmidhuber (AI Blog, 26 March 2021). 26 March 1991: Neural nets learn to program neural nets with fast weights—like today's Transformer variants.
      [GPUCNN3] D. C. Ciresan, U. Meier, J. Schmidhuber. Multi-column Deep Neural Networks for Image Classification. Proc. IEEE Conf. on Computer Vision and Pattern Recognition CVPR 2012, p 3642-3649, July 2012. Longer TR of Feb 2012: arXiv:1202.2745v1 [cs.CV].
      [GPUCNN5] J. Schmidhuber (AI Blog, 2017; updated 2021 for 10th birthday of DanNet): History of computer vision contests won by deep CNNs since 2011. DanNet won 4 of them in a row before the similar AlexNet/VGG Net and the ResNet (a Highway Net with open gates) joined the party. Today, deep CNNs are standard in computer vision.
      [HIN] J. Schmidhuber (AI Blog, 2020). Critique of Honda Prize for Dr. Hinton. Science must not allow corporate PR to distort the academic record.
      [HW1] Highway Networks. Preprints arXiv:1505.00387 (May 2015) and arXiv:1507.06228 (July 2015). Also at NIPS 2015. (Compare the LSTM with forget gates[LSTM2] for RNNs.)
      [HW2] Residual nets (ResNets). arXiv:1512.03385 (Dec 2015). Residual nets are a version of Highway Nets[HW1] where the gates are always open: g(x)=t(x)=const=1.
      [HW3] arXiv:1612.07771 (2016). Also at ICLR 2017. Highway Nets perform roughly as well as ResNets[HW2] on ImageNet. Highway layers are also often used for natural language processing, where the simpler residual layers do not work as well.
      [LEI21] J. Schmidhuber (AI Blog, 2021). 375th birthday of Leibniz, founder of computer science. Frankfurter Allgemeine Zeitung (FAZ), 17/5/2021. FAZ online: 19/5/2021.
      [LSTM1] S. Hochreiter, J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. Based on [LSTM0].
      [MIR] J. Schmidhuber (AI Blog, 2019). Deep Learning: Our Miraculous Year 1990-1991. Preprint arXiv:2005.05744, 2020.
      [MLP1] D. C. Ciresan et al. Deep, Big, Simple Neural Nets for Handwritten Digit Recognition. Neural Computation 22(12): 3207-3220, 2010. ArXiv preprint.
      [MLP2] J. Schmidhuber (AI Blog, Sep 2020). 10-year anniversary of supervised deep learning breakthrough (2010). No unsupervised pre-training. By 2010, when compute was 100 times more expensive than today, our feedforward NNs set new records.[MLP1]
      [MOST] J. Schmidhuber (AI Blog, 2021). The most cited neural networks all build on work done in my labs. Foundations of the most popular NNs originated in my labs at TU Munich and IDSIA: (1) Long Short-Term Memory (LSTM), (2) ResNet (our earlier Highway Net with open gates), (3) AlexNet and VGG Net (both citing our similar earlier DanNet: the first deep convolutional NN to win image recognition competitions), (4) Generative Adversarial Networks (instances of the earlier Adversarial Artificial Curiosity), and (5) variants of Transformers (linear Transformers are formally equivalent to my earlier Fast Weight Programmers).
      [NASC1] J. Schmidhuber. First Pow(d)ered flight / plane truth. Correspondence, Nature, 421 p 689, Feb 2003.
      [NASC3] J. Schmidhuber. The last inventor of the telephone. Letter, Science, 319, no. 5871, p. 1759, March 2008.
      [NASC4] J. Schmidhuber. Correspondence, Nature, vol 483, p 541, March 2012, doi:10.1038/483541b.
      [NASC5] J. Schmidhuber. Letter, Science, vol 336, p 1639, June 2012. See also comment on response by A. Hodges (DOI:10.1126/science.336.6089.1639-a).
      [NASC6] J. Schmidhuber. Colossus was the first electronic digital computer. Correspondence, Nature, 441 p 25, May 2006.
      [NASC7] J. Schmidhuber. Turing's impact. Correspondence, Nature, 429 p 501, June 2004.
      [NASC8] J. Schmidhuber. Prototype resilient, self-modeling robots. Correspondence, Science, 316, no. 5825 p 688, May 2007.
      [NASC9] J. Schmidhuber. Comparing the legacies of Gauss, Pasteur, Darwin. Correspondence, Nature, vol 452, p 530, April 2008.
      [OAI1] OpenAI et al. Learning Dexterous In-Hand Manipulation. 2018.
      [OAI2] OpenAI: arXiv:1912.06680. An LSTM with 84% of the model's total parameter count was the core of OpenAI Five (Dota 2, 2018).
      [PLAN] J. Schmidhuber (AI Blog, 2020). 30-year anniversary of planning & reinforcement learning with recurrent world models and artificial curiosity (1990). This work also introduced high-dimensional reward signals, deterministic policy gradients for RNNs, and the GAN principle.
      [R1] Reddit/ML, 2019. Hinton, LeCun, Bengio receive ACM Turing Award.
      [R2] Reddit/ML, 2019. J. Schmidhuber really had GANs in 1990.
      [R3] Reddit/ML, 2019. NeurIPS 2019 Bengio Schmidhuber Meta-Learning Fiasco.
      [R4] Reddit/ML, 2019. Five major deep learning papers by G. Hinton did not cite similar earlier work by J. Schmidhuber.
      [R5] Reddit/ML, 2019. The 1997 LSTM paper by Hochreiter & Schmidhuber has become the most cited deep learning research paper of the 20th century.
      [R6] Reddit/ML, 2019. DanNet, the CUDA CNN of Dan Ciresan in J. Schmidhuber's team, won 4 image recognition challenges prior to AlexNet.
      [R7] Reddit/ML, 2019. J. Schmidhuber on Seppo Linnainmaa, inventor of backpropagation in 1970.
      [R8] Reddit/ML, 2019. J. Schmidhuber on Alexey Ivakhnenko, godfather of deep learning 1965.
      [R11] Reddit/ML, 2020. Schmidhuber: Critique of Honda Prize for Dr. Hinton.
      [R12] Reddit/ML, 2020. J. Schmidhuber: Critique of Turing Award for Drs. Bengio & Hinton & LeCun.
      [R15] Reddit/ML, 2021. J. Schmidhuber's work on fast weights from 1991 is similar to linearized variants of Transformers.
      [T19] ACM's justification of the 2018 A.M. Turing Award (announced in 2019).
      [T20a] J. Schmidhuber (AI Blog, 25 June 2020). Critique of 2018 Turing Award for Drs. Bengio & Hinton & LeCun. The first version of the present critique.
      [TUR21] J. Schmidhuber (AI Blog, Sep 2021). Turing Oversold. It's not Turing's fault, though.
      [UN] J. Schmidhuber (AI Blog, 2021). 30-year anniversary. 1991: First very deep learning with unsupervised pre-training.
      [UN2] J. Schmidhuber. Habilitation thesis, TUM, 1993.
      [VAN1] S. Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, TUM, 1991 (advisor J. Schmidhuber).
      [VAN4] Y. Bengio. Neural net language models. Scholarpedia, 3(1):3881, 2008.
      [ZUS21] J. Schmidhuber (AI Blog, 2021). 80th anniversary celebrations: 1941: Konrad Zuse completes the first working general computer, based on his 1936 patent application.

      1991: Neural nets learn to program neural nets with fast weights - by Juergen Schmidhuber (AI Blog)
      Twitter: @SchmidhuberAI. Traditionally, sequence processing is done with recurrent NNs (RNNs). In March 1991, however, I published an alternative:[FWP0-1] a slow NN that learns to program the fast weights of another NN (see Sec. 1). One of these 1991 Fast Weight Programmers[FWP0-1] computed its fast weight changes through additive outer products of self-invented activation patterns (now often called keys and values for self-attention; Sec. 2). The very similar Transformers[TR1-2] combine this with projections and softmax; removing the softmax yields Transformers with linearized self-attention,[TR5-6] which are formally equivalent to the 1991 Fast Weight Programmers[MOST] (see this tweet). In 1993, I also introduced the attention terminology[FWP2] now used in this context[ATT] (Sec. 4), and RNNs that program themselves (Sec. 3). The famous vanishing gradient problem, aka deep learning problem (analyzed a few months later in 1991[VAN1]), is avoided through additive fast weight changes (Sec. 5), much as it is avoided through the additive neural activations of LSTMs / Highway Nets / ResNets[HW1-3] (Sec. 5). All of this was part of the Annus Mirabilis of deep learning.[MIR] Sec. 6 describes a brand new, improved version[FWP6] of the 1991 fast weight update rule. Further sections cover reinforcement learning through neuroevolution for fast weights[FWP5] (2005-, Sec. 7), goal-conditioned policy generators (2022),[GGP] and metalearning machines that learn to learn[FWPMETA1-9] (1992-2022, Sec. 8).

      Goedel Machine. As I have frequently emphasized since 1990,[AC90][PLAN][META] the weights of an artificial neural network (NN) should be viewed as its program. Inspired by universal self-referential formal systems,[GOD][GOD34] I built NNs whose outputs are changes of programs or weight matrices of other NNs[FWP0-2] (Sec. 1, 2, 3), and even NNs that can change their own weight change algorithms or learning algorithms[FWPMETA1-5] (Sec. 8). A gradient descent procedure[BP1-4][BPA][R7] can compute a direction in program space where one may find a better program,[AC90] or a better program-modifying program.[FWP0-2][FWPMETA1-5] Deep learning in NNs with many layers started in 1965.[DEEP1-2] Their activation functions were Kolmogorov-Gabor polynomials which include the now popular multiplicative gates,[DL1-2] a precursor of fast weights. von der Malsburg was the first to explicitly emphasize the importance of NNs with rapidly changing weights.[FAST] The second paper on this was published by Feldman in 1982.[FASTa] The weights of a 1987 NN were sums of weights with a large learning rate and weights with a small rate[FASTb][T22] (but they had nothing to do with the NN-programming NNs discussed below). Fast Weight Programmers (FWPs) were published in 1991-93[FWP0-2] (Sec. 1, 2, 3, 4); they are closely related to attention[ATT] (Sec. 4) and Transformers[TR1-6] (Sec. 2, 3, 4, 5).

      End-To-End Differentiable Fast Weights: NNs Learn to Program NNs (1991). On 26 March 1991, I described a slow NN that learns by backpropagation[BP1-4] to rapidly modify the fast weights of another NN,[FWP0] essentially the system later published in Neural Computation.[FWP1] This is closely related to attention[ATT] (Sec. 4). That is, I separated storage and control like in traditional computers, but in a fully neural way (rather than in a hybrid fashion[PDA1][PDA2][DNC]); compare also Synthetic Gradients.[NAN1-5] One of the FWPs of 1991[FWP0-1] is illustrated in the figure. A disadvantage addressed in Sec. 2 is that the slow net needs many output units if the fast net is large.

      Slow neural net programs fast neural net through additive outer products

      The Fast Weight Programmer[FWP0-1] depicted in Sec. 1 has a slow net output unit for each fast weight. However, Section 2 of the same 1991 paper[FWP0] describes a more compact alternative, now often viewed as a linear[TR5-6] Transformer[TR1-2] or unnormalized attention[ATT] (compare Sec. 4): the slow net invents pairs of activation patterns whose outer products are added to the fast weights (which then may be normalized by a squashing function[FWP0]). Extensions use second order tensor products.[FWP0-3a] The highly successful Transformers of 2017[TR1-2] can be viewed as a combination of my additive outer product fast weight principle[FWP0-2] with softmax-based attention over NN-programmed fast weights (Sec. 5 & 1); linear Transformers (2020-21)[TR5-6] abandoned the softmax, essentially resurrecting the original 1991 system.[FWP0-1] Compare Sec. 6. Additive outer product rules are reminiscent of Hebb's informal rule (1949)[HE49] and Steinbuch's Learning Matrix around 1960,[ST61-63][AMH1-2][KOH72][LIT74][PAL80][KOS88] but here the weight changes are generated and controlled by a separate learning NN, as done since 1991.[FWP0-3a][TR5-6]

      I offered the FWPs of 1991[FWP0-1] as an alternative to sequence-processing recurrent NNs (RNNs) (Sec. 1), the computationally most powerful NNs of them all.[UN][MIR](Sec. 0) Modern Transformers are also viewed as RNN alternatives, despite their limitations.[TR3-4] The slow net and the fast net of the 1991 system[FWP0-1] in Sec. 2 were feedforward NNs (FNNs), like most current Transformers.[TR1-6] In 1993, I collapsed all of this into a single RNN that could rapidly reprogram all of its own fast weights through additive outer product-based weight changes.[FWP2] One motivation reflected by the title of the paper[FWP2] was to obtain many more programmable weight changes than there are units in a network of the same size: O(H^2) instead of O(H), where H is the number of hidden units. This motivation and a variant of the method was republished over two decades later.[FWP4a][R4][MIR](Sec. 8)[T22](Sec. XVII, item H3) See also our more recent work on FWPs since 2017,[FWP3-3a][FWPMETA7][FWP6] and compare a recent study.[RA21]

      4. Attention terminology of 1993. End-to-End Differentiable Sequential Neural Attention 1990-93 (Juergen Schmidhuber). Today, everybody is talking about attention when it comes to describing the principles of Transformers.[TR1-2] The additive outer products[FWP0-1] of the Fast Weight Programmers described in Sec. 2 and Sec. 3 correspond to the attention weights or self-attention weights (see also[FWP4b-d]) applied to NN-programmed fast weights (Sec. 5); compare Sec. 9 & Sec. 8 of [MIR] and Sec. XVII of [T22]. My 1993 paper[FWP2] explicitly spoke of internal spotlights of attention controlled by Fast Weight Programmers.[FWP2][ATT] Apart from possible normalization/squashing,[FWP0] all these weight changes are additive (Sec. 1 & 2), and therefore do not suffer during sequence learning from the famous vanishing gradient problem analyzed by my brilliant student Sepp Hochreiter a few months later in his 1991 diploma thesis.[VAN1]
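      The outer product mechanism above can be illustrated in a few lines of NumPy (my own toy sketch, not code from the 1991 paper; all dimensions and the fixed random stand-ins for the slow net are made up for the example):

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_key, d_val = 4, 8, 8        # toy dimensions

    # Stand-ins for the slow net: fixed random projections instead of learned weights.
    W_key   = rng.normal(size=(d_key, d_in))
    W_val   = rng.normal(size=(d_val, d_in))
    W_query = rng.normal(size=(d_key, d_in))

    F = np.zeros((d_val, d_key))        # fast weight matrix, programmed on the fly

    def step(x):
        """Process one input: program the fast net, then use it."""
        global F
        k = np.tanh(W_key @ x)          # key pattern invented by the slow net
        v = np.tanh(W_val @ x)          # value pattern invented by the slow net
        F = F + np.outer(v, k)          # additive outer product fast weight change
        q = np.tanh(W_query @ x)        # query pattern
        return F @ q                    # fast net output (unnormalized linear attention)

    for t in range(5):
        y = step(rng.normal(size=d_in))
    print("output after 5 steps:", y)

      Summing such outer products over time and then applying the query is exactly what Transformers with linearized self-attention compute, up to normalization.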

      LSTM and the additive FWPs both date back to 1991, our miraculous year of deep learning,[MIR] and both avoid the vanishing gradient problem in dual ways. Basic Long Short-Term Memory[LSTM1] solves the problem by adding to a self-connected linear cell at every time step. That is, the core of LSTM is operating in a linear additive activation space (ignoring LSTM's multiplicative gates).[LSTM1][VAN1][MIR](Sec. 4 & Sec. 8) Additive FWPs[FWP0-2] (Sec. 1 & 2), however, solve the problem through a dual approach, based on additive fast weight changes. By favoring additive operations yielding non-vanishing first derivatives and error flow,[VAN1] Transformers[TR1-6] also follow the additive approach[FWP0-2] (compare Sec. 2 and Sec. 4 on attention terminology since 1993).
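      The additive activation space of LSTM can be seen in one line of the cell update. Below is a minimal NumPy sketch of a single LSTM step in the standard textbook form with a forget gate[LSTM2] (my illustration, not the original 1997 code; toy sizes):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h, c, W, U, b):
        """One LSTM step; W, U, b stack the input, forget, output and candidate blocks."""
        n = h.shape[0]
        z = W @ x + U @ h + b
        i = sigmoid(z[0*n:1*n])          # input gate
        f = sigmoid(z[1*n:2*n])          # forget gate
        o = sigmoid(z[2*n:3*n])          # output gate
        g = np.tanh(z[3*n:4*n])          # cell candidate
        c_new = f * c + i * g            # additive, nearly linear cell update
        h_new = o * np.tanh(c_new)
        return h_new, c_new

    rng = np.random.default_rng(0)
    n, m = 3, 2                          # hidden size, input size
    W = rng.normal(scale=0.1, size=(4*n, m))
    U = rng.normal(scale=0.1, size=(4*n, n))
    b = np.zeros(4*n)
    h, c = np.zeros(n), np.zeros(n)
    for t in range(4):
        h, c = lstm_step(rng.normal(size=m), h, c, W, U, b)
    print(h, c)

      The crucial line is c_new = f * c + i * g: with the forget gate near 1, error signals can flow back through this additive path without vanishing.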

      Highway Networks:
LSTM's traditional additive activation-based approach[LSTM1-13] is mirrored in the LSTM-inspired Highway Network (May 2015),[HW1][HW1a][HW3] the first working really deep feedforward NN. It is essentially a feedforward version of LSTM[LSTM1] with forget gates,[LSTM2] and a precursor of the Residual Net or ResNet[HW2] (Dec 2015). Remarkably, both of these dual approaches of 1991 have become successful: by the mid 2010s,[DEC] major IT companies overwhelmingly used LSTM, e.g., for speech recognition on smartphones.[DL4] LSTM can also rapidly learn to solve certain tasks quickly[LSTM13] while plain Transformers can't yet.[TR4] Unsupervised pre-training of deep NNs[UN0-UN2][MIR](Sec. 1) also dates back to 1991.[UN]

Recent work of February 2021[FWP6] connects modern linear attention mechanisms[TR5-6] and the Fast Weight Programmers of 1991[FWP0-2] (Sec. 2).[FWP4a][R4][MIR](Sec. 8)[T22](Sec. XVII, item H3) Building on previous work[FWPMETA7] on FWPs (Sec. 1, 2, 3, 8), we replace the 1991 elementary programming instruction based on additive outer products[FWP0-2] by a delta rule-like[WID] update, improving results on language modeling tasks.[FWP6] Our code is public. Follow-up work of June 2021[FWP7] (also with Robert Csordas) points out that the original FWP formulation of 1991[FWP0-1] is more general than the one of linear Transformers: a slow NN continually reprograms the weights of a fast NN that may itself be recurrent. Our code is public.
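      The relation between the Highway Network mentioned at the start of this section and the ResNet can also be stated in code. A minimal NumPy sketch of one highway layer (my illustration; toy sizes, random weights):

    import numpy as np

    def highway_layer(x, W_h, b_h, W_t, b_t, residual=False):
        """One highway layer: y = t * h(x) + c * x.
        Forcing both gates open (t = c = 1) turns it into a residual layer."""
        h = np.tanh(W_h @ x + b_h)                       # candidate transformation
        if residual:
            t = c = np.ones_like(x)                      # open gates: y = h(x) + x
        else:
            t = 1.0 / (1.0 + np.exp(-(W_t @ x + b_t)))   # transform gate
            c = 1.0 - t                                  # carry gate
        return t * h + c * x

    rng = np.random.default_rng(0)
    d = 5
    x = rng.normal(size=d)
    W_h, b_h = rng.normal(scale=0.1, size=(d, d)), np.zeros(d)
    W_t, b_t = rng.normal(scale=0.1, size=(d, d)), np.full(d, -1.0)  # bias toward carrying
    print("highway:", highway_layer(x, W_h, b_h, W_t, b_t))
    print("residual:", highway_layer(x, W_h, b_h, W_t, b_t, residual=True))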

      Reinforcement learning robotino / double pole balancer. Fast weight programmers were also applied to reinforcement learning through neuroevolution, as shown in 2005 with my former postdoc Faustino Gomez[FWP5] (now CEO of NNAISENSE). Our 2005 paper on deep RL[DL6,6a] was actually the first machine learning publication with the word combination "learn deep" in the title. Related work encodes the numerous weights of large NNs through very compact codes.[KO0-2][CO1-4] Here we exploited that the Kolmogorov complexity or algorithmic information content of successful huge NNs may actually be rather small. Compressed Network Search[CO2] achieved this without unsupervised pre-training.

      Recent work of 2022[GGP] extended this line of research with goal-conditioned policy generators (see the overview above).

      Self-referential weight matrix. My first work on metalearning machines that learn to learn was published in 1987.[META][R3] FWPs can implement metalearning in a very general way. In references[FWPMETA1-5] since 1992, the slow NN and the fast NN (Sec. 1) are recurrent and identical: an RNN that can change its own weights. The RNN can see its own errors or reward signals, called eval(t+1) in the image.[FWPMETA5]

      The 1993 FWP of Sec. 3[FWP2] also was an RNN; unlike the self-referential RNN above,[FWPMETA1-5] it used outer products between key patterns and value patterns (Sec. 2) to manipulate its fast weights. Later metalearning work used gradient descent in LSTM networks[LSTM1] instead of traditional functions of two variables[HO1] (more on LSTM and fast weights in Sec. 5). In 2020, Imanol et al. augmented an LSTM with an associative fast weight memory,[FWPMETA7] useful for partially observable environments.[FWPMETA7] Our recent MetaGenRL (2020)[METARL10] meta-learns novel reinforcement learning algorithms; see the blog post of my PhD student Louis Kirsch. VS-ML uses outer-product-like fast weights encoded in the activations of LSTMs,[FWPMETA6] related to the functions of two variables of Sec. 3.[FWP2] VS-ML can also learn to implement the backpropagation learning algorithm[BP1-4] purely in the end-to-end differentiable forward dynamics of RNNs.[FWPMETA6]

      In 2022, we also published at ICML a modern self-referential weight matrix (SRWM)[FWPMETA8] based on the 1992 SRWM,[FWPMETA1-5] capable of self-improvement (compare this tweet). A modern self-referential weight matrix (2022) based on the one of 1992. There is another version of this article. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

      References (partial list):
      [AC] J. Schmidhuber (AI Blog, 2021). 3 decades of artificial curiosity & creativity. The first paper on long-term planning with reinforcement learning recurrent neural networks (NNs) and on generative adversarial networks.
      [AMH1] S.-I. Amari (1972). First publication of what was later sometimes called the Hopfield network or Amari-Hopfield Network.
      [ATT] J. Schmidhuber (AI Blog, 2020). 30-year anniversary of end-to-end differentiable sequential neural attention. Plus goal-conditional reinforcement learning.
      [BP4] J. Schmidhuber (AI Blog, 2014; updated 2020). Who invented backpropagation?
      [DEC] J. Schmidhuber (AI Blog, 02/20/2020, updated 2021, 2022). The 2010s: Our Decade of Deep Learning / Outlook on the 2020s.
      [DL4] J. Schmidhuber (AI Blog, 2017). Our impact on the world's most valuable public companies: Apple, Google, Microsoft, Facebook, Amazon.
      [DL6] J. Schmidhuber (AI Blog, Nov 2020). 15-year anniversary: 1st paper with "learn deep" in the title (2005). Our deep reinforcement learning & neuroevolution solved problems of depth 1000 and more.
      [FWP] J. Schmidhuber (AI Blog, 26 March 2021, updated 2022). 26 March 1991: Neural nets learn to program neural nets with fast weights—like today's Transformer variants. See tweet of 2022.
      [FWPMETA8] Preprint: arXiv:2202.05780. A modern self-referential weight matrix (ICML 2022).
      [GGP] Preprint arXiv/2207.01570, 4 July 2022 (submitted in May 2022). Goal-conditioned generators of deep policies.
      [HW1] Highway Networks. Preprints arXiv:1505.00387 (May 2015) and arXiv:1507.06228 (July 2015). Also at NIPS 2015. (Compare the LSTM with forget gates[LSTM2] for RNNs.)
      [HW2] arXiv:1512.03385 (Dec 2015). Residual nets (ResNets) are a version of Highway Nets[HW1] where the gates are always open: g(x)=t(x)=const=1.
      [HW3] arXiv:1612.07771 (2016). Also at ICLR 2017. Highway Nets perform roughly as well as ResNets[HW2] on ImageNet. Variants of highway gates are used for certain algorithmic tasks, where the simpler residual layers do not work as well.[NDR]
      [LSTM1] S. Hochreiter, J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997.
      [META] Metalearning or Learning to Learn Since 1987. Juergen Schmidhuber.
      [MIR] J. Schmidhuber (AI Blog, 2019). Deep Learning: Our Miraculous Year 1990-1991. Preprint arXiv:2005.05744, 2020.
      [MOST] J. Schmidhuber (AI Blog, 2021). The most cited neural networks all build on work done in my labs at TU Munich and IDSIA.
      [NDR] The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization. Proc. ICLR 2022. Preprint arXiv/2110.07732.
      [PLAN] J. Schmidhuber (AI Blog, 2020). 30-year anniversary of planning & reinforcement learning with recurrent world models and artificial curiosity (1990). Also introduced high-dimensional reward signals, deterministic policy gradients for RNNs, and the GAN principle.
      [R3] Reddit/ML, 2019. NeurIPS 2019 Bengio Schmidhuber Meta-Learning Fiasco.
      [R4] Reddit/ML, 2019. Five major deep learning papers by G. Hinton did not cite similar earlier work by J. Schmidhuber.
      [R7] Reddit/ML, 2019. J. Schmidhuber on Seppo Linnainmaa, inventor of backpropagation in 1970.
      [T22] J. Schmidhuber (AI Blog, 2022). Scientific Integrity and the History of Deep Learning: The 2021 Turing Lecture, and the 2018 Turing Award. Technical Report IDSIA-77-21, IDSIA, Lugano, Switzerland, 2022.
      [UN] J. Schmidhuber (AI Blog, 2021). 30-year anniversary. 1991: First very deep learning with unsupervised or self-supervised pre-training.
      [UN2] J. Schmidhuber. Habilitation thesis, TUM, 1993.
      [VAN1] S. Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, TUM, 15 June 1991 (advisor J. Schmidhuber).

      Transformers with linearized self-attention in Neural Computation 1992, equivalent to fast weight programmers (apart from normalization), separating storage and control. Key/value was called FROM/TO. The attention terminology was introduced at ICANN 1993. Juergen Schmidhuber.
      https://people.idsia.ch/~juergen/deep-learning-history.html AI Blog
      @SchmidhuberAI
      arXiv:2212.11279. Modern AI is dominated by artificial neural networks (NNs) and deep learning.[DL1-4] This annotated history contains hyperlinks to relevant overview sites from my AI Blog. It also debunks certain popular but misleading historic accounts of deep learning, and supplements my previous deep learning survey.[DL1] It repeatedly mentions my own team's work, because (as of 2022) the most cited NNs are based on it.[MOST]
      Sec. 1: Introduction
      Sec. 2: 1676: The Chain Rule For Backward Credit Assignment
      Sec. 3: Circa 1800: First Neural Net (NN) / Linear Regression / Shallow Learning
      Sec. 4: 1920-1925: First Recurrent NN (RNN) Architecture. ~1972: First Learning RNNs
      Sec. 5: 1958: Multilayer Feedforward NN (without Deep Learning)
      Sec. 6: 1965: First Deep Learning
      Sec. 7: 1967-68: Deep Learning by Stochastic Gradient Descent
      Sec. 8: 1970: Backpropagation. 1982: For NNs. 1960: Precursor.
      Sec. 9: 1979: First Deep Convolutional NN (1969: Rectified Linear Units)
      Sec. 10: 1980s-90s: Graph NNs / Stochastic Delta Rule (Dropout) / More RNNs / Etc
      Sec. 11: Feb 1990: Generative Adversarial Networks / Artificial Curiosity / NN Online Planners
      Sec. 12: April 1990: NNs Learn to Generate Subgoals / Work on Command
      Sec. 13: March 1991: NNs Learn to Program NNs. Transformers with Linearized Self-Attention
      Sec. 14: April 1991: Deep Learning by Self-Supervised Pre-Training. Distilling NNs
      Sec. 15: June 1991: Fundamental Deep Learning Problem: Vanishing/Exploding Gradients
      Sec. 16: June 1991: Roots of Long Short-Term Memory / Highway Nets / ResNets
      Sec. 17: 1980s-: NNs for Learning to Act Without a Teacher
      Sec. 18: It's the Hardware, Stupid!
      Sec. 19: But Don't Neglect the Theory of AI (Since 1931) and Computer Science
      Sec. 20: The Broader Historic Context from Big Bang to Far Future
      Sec. 21: Acknowledgments
      Sec. 22: 555+ Partially Annotated References (many more in the award-winning survey[DL1])
      quite erroneous ideas about the origins of the universe (see the final section).

      A history of AI written in the 1980s would have emphasized topics such as theorem proving,[GOD][GOD34][ZU48][NS56] logic programming, expert systems, and heuristic search.[FEI63,83][LEN83] This was at a time when AI was an old area of research seeing renewed interest. Practical AI dates back at least to 1914, when Leonardo Torres y Quevedo (see below) built the first working chess end game player,[BRU1-4] and AI theory dates back at least to 1931-34, when Kurt Goedel identified fundamental limits of any type of computation-based AI.[GOD][BIB3][GOD21,a,b] A history of AI written in the early 2000s would have put more emphasis on topics such as support vector machines and kernel methods,[SVM1-4] Bayesian (actually Laplacian or possibly Saundersonian[STI83-85]) reasoning[BAY1-8][FI22] and other concepts of probability theory and statistics,[MM1-5][NIL98][RUS95] decision trees,[MIT97] ensemble methods,[ENS1-4] swarm intelligence,[SW1] and evolutionary computation.[EVO1-7]([TUR1],unpublished) Why? Because back then such techniques drove many successful AI applications.

      A history of AI written in the 2020s must emphasize concepts such as the even older chain rule[LEI07] and deep nonlinear artificial neural networks (NNs) trained by gradient descent,[GD'] in particular, feedback-based recurrent networks, which are general computers whose programs are weight matrices.[AC90] Why? Because many of the most famous and most commercial recent AI applications depend on them.[DL4] The MACY conferences (1946-1953)[MACY51] preceded the 1951 Paris conference on calculating machines and human thought, which is now often viewed as the first conference on AI.[AI51][BRO21][BRU4] Today, however, most of the excitement is about modern AI based on "deep learning" with NNs[DL1-2][DEC] that learn to minimize pain, maximize pleasure, drive cars, etc.[MIR](Sec. 0)[DL1-4]

      The present piece also debunks a frequently repeated, misleading "history of deep learning"[S20][DL3,3a] which ignores most of the pioneering work mentioned below.[T22] See Footnote 6. The title image of the present article is a reaction to an erroneous piece of common knowledge which says[T19] that the use of NNs "as a tool to help computers recognize patterns and simulate human intelligence had been introduced in the 1980s," although such NNs appeared long before the 1980s.[T22] As in my letters on the history of aviation,[NASC1-2] the telephone,[NASC3] the computer,[NASC4-7] resilient robots,[NASC8] and scientists of the 19th century,[NASC9] the goal is proper credit assignment.

      Leibniz, father of computer science circa 1670, publishes the chain rule in 1676

      In 1676, Gottfried Wilhelm Leibniz published the chain rule of differential calculus; it later also appeared in L'Hopital's 1696 textbook on Leibniz' differential calculus.[LEI07-10][L84] Today, the chain rule is the foundation of backward credit assignment in deep NNs (see the section on backpropagation below).

      Cauchy. The chain rule tells us how small changes of a network's parameters affect its output. This answer is used by the technique of gradient descent (GD), apparently first proposed by Augustin-Louis Cauchy in 1847[GD'] (and much later by Jacques Hadamard[GD'']); the stochastic version called SGD is due to Herbert Robbins and Sutton Monro (1951).[STO51-52]
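      To make the distinction concrete, here is a minimal NumPy sketch (my own toy illustration, not code from any of the cited works): plain gradient descent uses the full data set for each weight update, while stochastic gradient descent uses one randomly drawn example per update with a decreasing step size.

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy data: y = 3*x + noise. We fit a single weight w by minimizing squared error.
    x = rng.normal(size=200)
    y = 3.0 * x + 0.1 * rng.normal(size=200)

    def grad(w, xb, yb):
        # gradient of 0.5*mean((w*x - y)^2) with respect to w
        return np.mean((w * xb - yb) * xb)

    # Gradient descent (Cauchy, 1847): use the full batch at each step.
    w = 0.0
    for _ in range(100):
        w -= 0.1 * grad(w, x, y)
    print("GD estimate:", w)

    # Stochastic gradient descent (Robbins & Monro, 1951): one random sample per step.
    w = 0.0
    for t in range(1, 2001):
        i = rng.integers(len(x))
        w -= (0.5 / t) * grad(w, x[i:i+1], y[i:i+1])   # decreasing step size
    print("SGD estimate:", w)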

      Footnote 1. In 1684, Leibniz was also the first to publish "modern" calculus;[L84][SON18][MAD05][LEI21,a,b] later Isaac Newton was also credited for his unpublished work.[SON18] Their priority dispute,[SON18] however, did not encompass the chain rule.[LEI07-10] Of course, both were building on earlier work: in the 2nd century B.C., Archimedes (perhaps the greatest scientist ever[ARC06]) paved the way for infinitesimals, as did, many centuries later, Madhava of Sangamagrama and colleagues of the Indian Kerala school.[MAD86-05] Leibniz (called "the world's first computer scientist"[LA14]) also laid foundations of modern computer science. He designed the first machine that could perform all four arithmetic operations and the first with an internal memory.[BL16] He described the principles of binary computers (1679),[L79][L03][LA14][HO66][LEI21,a,b] and his formal Algebra of Thought (1686)[L86][WI48] was deductively equivalent[LE18] to the much later Boolean Algebra (1847).[BOO] Leibniz dreamed of answering all possible questions through computation.[WI48]

      Footnote 3. Some claim that the backpropagation algorithm (discussed further down; now widely used to train deep NNs) is just the chain rule of Leibniz (1676) & L'Hopital (1696).[CONN21] No, it is the efficient way of applying the chain rule to large networks of differentiable nodes (there are also inefficient ways of doing this).[T22] It was not published until 1970, as discussed below.[BP1,4,5]


      In 1805, Adrien-Marie Legendre published what's now often called a linear neural network (NN). Later Johann Carl Friedrich Gauss was also credited for earlier unpublished work on this done circa 1795.[STI81]

      In 1795, Gauss used what's now called a linear neural net, but Legendre published this first in 1805. Gauss is often called the greatest mathematician since antiquity. Rosenblatt's perceptron (1958)[R58] combined a linear NN as above with an output threshold function to obtain a pattern classifier (compare his more advanced work on multi-layer networks discussed below). See also Joseph[R61] (1961) and Widrow & Hoff's similar Adaline, which learned in 1962.[WID62]
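      In modern terms, what Legendre and Gauss did amounts to fitting a linear NN by the method of least squares. A minimal NumPy sketch with made-up data (my illustration, purely to show the correspondence):

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy 'training set': 100 input vectors with 3 features, linear targets plus noise.
    X = rng.normal(size=(100, 3))
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + 0.05 * rng.normal(size=100)

    # A 'linear NN' has output y_hat = X @ w. Least squares picks w minimizing
    # the summed squared error; lstsq solves the normal equations for us.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("recovered weights:", w)       # close to [2, -1, 0.5]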

      In 1924, Ernst Ising published the first recurrent network architecture: the Ising model or Lenz-Ising model, analyzed by physicists Ernst Ising and Wilhelm Lenz in the 1920s.[L20][I24,I25][K41][W45][T22] It settles into an equilibrium state in response to input conditions, and is the foundation of the first learning RNNs (see below). Binary neuron-like elements were also discussed in 1943 by neuroscientists Warren McCulloch and Walter Pitts[MC43] and formally analyzed in 1956 by Stephen Cole Kleene.[K56]

      In 1972, Shun-Ichi Amari made the Ising recurrent net adaptive. This was the first published learning artificial recurrent neural network

      In 1972, Shun-Ichi Amari made the Lenz-Ising recurrent architecture adaptive such that it could learn to associate input patterns with output patterns by changing its connection weights.[AMH1] See also Stephen Grossberg's work on biological networks,[GRO69] David Marr's[MAR71] and Teuvo Kohonen's[KOH72] work, and Kaoru Nakano's learning RNN.[NAK72]
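      A tiny associative-memory sketch in NumPy gives the flavor of such adaptive recurrent nets (a generic Amari-Hopfield-style toy example of my own, not Amari's original 1972 formulation): weights are formed from stored patterns, and recall iterates the net until it settles into an equilibrium state.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 16
    patterns = np.sign(rng.normal(size=(2, n)))          # two stored +1/-1 patterns

    # Learn symmetric connection weights from the patterns (Hebbian outer products).
    W = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(W, 0.0)

    def recall(state, steps=10):
        """Iterate the recurrent net until it settles into an equilibrium state."""
        for _ in range(steps):
            state = np.sign(W @ state)
            state[state == 0] = 1
        return state

    noisy = patterns[0].copy()
    noisy[:3] *= -1                                      # corrupt 3 of 16 bits
    print("recovered original:", np.array_equal(recall(noisy), patterns[0]))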

      Alan Turing

      10 years later, the Amari network was republished (and its storage capacity analyzed).[AMH2] Some called it the Hopfield Network (!) or Amari-Hopfield Network.[AMH3] Amari's 1972 paper also contains a sequence-processing generalization thereof.[AMH1] Alan Turing, too, had early ideas relevant to NN-like learning machines. This, however, was first published many decades later,[TUR1] which explains the obscurity of his thoughts here.[TUR21] (Margin note: it has been pointed out that the famous "Turing Test" should actually be called the "Descartes Test."[TUR3,a,b][TUR21])

      Today, the most popular RNN is the Long Short-Term Memory (LSTM) mentioned below, which has become the most cited NN of the 20th century.[MOST]

      In 1958, Frank Rosenblatt had  multilayer perceptrons whose last layer learned

      In 1958, Frank Rosenblatt not only combined linear NNs and threshold functions (see the section on shallow learning since 1800), he also had more interesting, deeper multilayer perceptrons (MLPs).[R58] However, because only their last layer learned,[DL1] Rosenblatt basically had what much later was rebranded as Extreme Learning Machines (ELMs) without proper attribution.[ELM1-2][CONN21][T22]
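      The "only the last layer learned" setup can be sketched as follows (my own toy illustration, not Rosenblatt's or the ELM authors' code): a fixed random hidden layer followed by a trained linear output layer.

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy binary task: classify points by whether they lie inside the unit circle.
    X = rng.uniform(-2, 2, size=(500, 2))
    y = (np.sum(X**2, axis=1) < 1.0).astype(float)

    # Hidden layer with fixed random weights (never trained).
    n_hidden = 100
    W_hid = rng.normal(size=(2, n_hidden))
    b_hid = rng.normal(size=n_hidden)
    H = np.tanh(X @ W_hid + b_hid)

    # Only the last (linear output) layer is learned, here by least squares.
    w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
    pred = (H @ w_out > 0.5).astype(float)
    print("training accuracy:", np.mean(pred == y))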

      MLPs were also discussed in 1961 by Karl Steinbuch[ST61-95] and Roger David Joseph[R61] (1961). See also Oliver Selfridge's multilayer Pandemonium[SE59] (1959). Rosenblatt (1962) even wrote about "back-propagating errors" in an MLP with a hidden layer,[R62] although he did not yet have a general deep learning algorithm for deep MLPs. What's now called backpropagation is quite different and was first published in 1970, as discussed below.[BP1-BP5][BPA-C]

      Today, the most popular FNN is a version of the LSTM-based Highway Net (mentioned below) called ResNet,[HW1-3] which has become the most cited NN of the 21st century.[MOST]

      In 1965, Alexey Ivakhnenko & Valentin Lapa introduced the first working deep learning algorithm for deep MLPs with arbitrarily many hidden layers (their activation functions were Kolmogorov-Gabor polynomials, which include the now popular multiplicative gates).[DEEP1-2][DL1-2][FDL] A paper of 1971[DEEP2] already described a deep net with 8 layers, trained by their highly cited method, which was still popular in the new millennium,[DL2] especially in Eastern Europe, where much of Machine Learning was born.[MIR](Sec. 1)[R8] The term "deep learning" was first introduced to Machine Learning much later by Dechter (1986), and to NNs by Aizenberg et al (2000).[DL2] (Margin note: our 2005 paper on deep learning[DL6,6a] was the first machine learning publication with the word combination "learn deep" in the title.[T22])
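      To give a flavor of such polynomial units (a much simplified sketch of my own, not Ivakhnenko & Lapa's layer-wise GMDH procedure): a second-degree Kolmogorov-Gabor polynomial of two inputs contains the multiplicative term x1*x2, and its coefficients can be fitted by least squares.

    import numpy as np

    rng = np.random.default_rng(0)

    def poly_features(x1, x2):
        """Second-degree Kolmogorov-Gabor polynomial basis of two inputs,
        including the multiplicative 'gate' term x1*x2."""
        return np.stack([np.ones_like(x1), x1, x2, x1*x2, x1**2, x2**2], axis=1)

    # Toy target that genuinely needs the multiplicative term.
    x1, x2 = rng.normal(size=300), rng.normal(size=300)
    y = 1.0 + 2.0*x1 - x2 + 0.5*x1*x2 + 0.1*rng.normal(size=300)

    A = poly_features(x1, x2)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    print("fitted coefficients:", np.round(coef, 2))   # close to [1, 2, -1, 0.5, 0, 0]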

      In 1967-68, Shun-Ichi Amari trained deep MLPs by stochastic gradient descent

      Ivakhnenko and Lapa (1965, see above) trained their nets layer by layer. In 1967-68, however, Shun-Ichi Amari suggested to train deep MLPs in end-to-end fashion from scratch by stochastic gradient descent (SGD),[GD1] a method proposed in 1951 by Robbins & Monro.[STO51-52]

      Amari's implementation[GD2,GD2a] (with his student Saito) learned internal representations in a five layer MLP with two modifiable layers, which was trained to classify non-linearly separable pattern classes.

      See also Iakov Zalmanovich Tsypkin's even earlier work on gradient descent-based on-line learning for non-linear systems.[GDa-b]

      Remarkably, as mentioned above, Amari also published learning RNNs in 1972.[AMH1]

      who invented backpropagation?

      In 1970, Seppo Linnainmaa was the first to publish what's now known as backpropagation, the famous algorithm for credit assignment in networks of differentiable nodes, also known as the reverse mode of automatic differentiation.[BP1,4,5]

      In 1960, Henry J. Kelley had a precursor of backpropagation in the field of control theory

      In 1982, Paul Werbos proposed to use the method to train NNs,[BP2] extending ideas in his 1974 thesis.

      In 1960, Henry J. Kelley already had a precursor of backpropagation in the field of control theory;[BPA] see also later work of the early 1960s by Stuart Dreyfus and Arthur E. Bryson.[BPB][BPC][R7] Unlike Linnainmaa's general method,[BP1] the systems of the 1960s[BPA-C] backpropagated derivative information through standard Jacobian matrix calculations from one stage to the previous one, and were not yet formulated as a general, efficient method for arbitrary networks of differentiable nodes.

      Backpropagation is essentially an efficient way of implementing Leibniz's chain rule[LEI07-10] (1676) (see above) for deep networks. Cauchy's gradient descent[GD'] uses this to incrementally change the NN's weights such that the NN behaves more and more like some teacher, which could be a human, or another NN,[UN-UN2] or something else. By the mid 1980s, desktop computers had just become accessible in wealthier academic labs, and an experimental analysis of the known method[BP1-2] demonstrated that backpropagation can yield useful internal representations in hidden layers of NNs.[RUM] At least for supervised learning, backpropagation is generally more efficient than Amari's above-mentioned deep learning through the more general SGD method (1967), which learned useful internal representations in NNs about 2 decades earlier.[GD1-2a]
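      A minimal sketch of such reverse-mode credit assignment for a two-layer net, trained by gradient descent (my own textbook-style illustration, not any historical code):

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy regression data.
    X = rng.normal(size=(64, 3))
    y = np.sin(X[:, :1])                      # target depends nonlinearly on the input

    W1 = rng.normal(scale=0.5, size=(3, 16))
    W2 = rng.normal(scale=0.5, size=(16, 1))

    for step in range(2000):
        # Forward pass.
        h = np.tanh(X @ W1)
        out = h @ W2
        err = out - y                         # dLoss/dout for 0.5 * mean squared error
        # Backward pass: apply the chain rule layer by layer (reverse mode).
        dW2 = h.T @ err / len(X)
        dh = err @ W2.T
        dW1 = X.T @ (dh * (1 - h**2)) / len(X)
        # Gradient descent update.
        W1 -= 0.1 * dW1
        W2 -= 0.1 * dW2

    print("final mean squared error:", float(np.mean((np.tanh(X @ W1) @ W2 - y)**2)))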

      It took 4 decades until the backpropagation method of 1970[BP1-2] got widely accepted as a training method for deep NNs. Before 2010, many thought that the training of NNs with many layers requires unsupervised pre-training, a methodology introduced by myself in 1991[UN][UN0-3] (see below), and later championed by others (2006).[UN4] In fact, it was claimed[VID1] that deep NNs cannot learn well without such pre-training. However, in 2010, our team with my postdoc Dan Ciresan[MLP1-2] showed that deep FNNs can be trained by plain backpropagation and do not at all require unsupervised pre-training for important applications.[MLP2]

      10-year anniversary of supervised deep learning breakthrough (2010)

      Our system set a new performance record[MLP1] on the famous MNIST handwritten digit benchmark, using graphics processing units (GPUs) to greatly accelerate training (building on earlier GPU work of Jung & Oh in 2004[GPUNN]). A reviewer called this a "wake-up call to the machine learning community." A popular account claims that after Minsky & Papert's book of 1969,[M69] "researchers took a fresh look at the problem in the 1980s."[S20] However, the 1969 book[M69] addressed a "problem" of Gauss & Legendre's shallow learning (circa 1800)[DL1-2] that had already been solved 4 years prior by Ivakhnenko & Lapa's popular deep learning method,[DEEP1-2][DL2] and then also by Amari's SGD for MLPs.[GD1-2] Minsky neither cited this work nor corrected his book later.[HIN](Sec. I)[T22] Others later published certain NN methods (such as the Boltzmann machine[BM][HIN][SK75][G63][T22]) without relating them to the original work,[DLC][S20][T22] although the true history is well-known. Deep learning research was alive and kicking in the 1960s-70s, especially outside of the Anglosphere.[DEEP1-2][GD1-3][CNN1][DL1-2][T22] Blatant misattribution and unintentional[PLAG1][CONN21] or intentional[FAKE2] plagiarism are still tainting the entire field of deep learning.[T22] Scientific journals "need to make clearer and firmer commitments to self-correction,"[SV20] as is already the standard in other scientific fields.

      In 1979, Kunihiko Fukushima introduced the convolutional neural network (CNN) architecture. Computer Vision was revolutionized in the 2010s by this particular type of feedforward NN.[CNN1-4] Fukushima called his architecture the Neocognitron.[CNN1] He also introduced rectified linear units (ReLUs) for NNs (1969).[RELU1] They are now widely used in CNNs and other NNs.

      In 1987, NNs with convolutions were combined by Alex Waibel with weight sharing and backpropagation (see above),[BP1-2] and applied to speech.[CNN1a] Waibel did not call this CNNs but TDNNs. The downsampling operation now called max-pooling was introduced by Yamaguchi et al. for TDNNs in 1990[CNN3a] and by Juyang Weng et al. for higher-dimensional CNNs in 1993.[CNN3] Yann LeCun's team has contributed improvements of CNNs, especially for images.[CNN2,4][T22] Baldi and Chauvin (1993) had the first application of CNNs with backpropagation to biomedical/biometric images.[BA93]
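      The three ingredients just mentioned (weight-sharing convolution, ReLU, max-pooling) can be sketched for a single 2D channel in NumPy (my illustration; a real CNN stacks many such channels and learns the kernels):

    import numpy as np

    def conv2d(img, kernel):
        """Valid 2D convolution (really cross-correlation) with a shared kernel."""
        kh, kw = kernel.shape
        H, W = img.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)  # same weights everywhere
        return out

    def relu(x):
        return np.maximum(x, 0.0)            # rectified linear unit

    def max_pool(x, size=2):
        H, W = x.shape
        H2, W2 = H // size, W // size
        return x[:H2*size, :W2*size].reshape(H2, size, W2, size).max(axis=(1, 3))

    rng = np.random.default_rng(0)
    image = rng.normal(size=(8, 8))
    kernel = rng.normal(size=(3, 3))
    feature_map = max_pool(relu(conv2d(image, kernel)))
    print(feature_map.shape)                 # (3, 3)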

      History of computer vision contests won by deep CNNs since 2011. The decisive step was taken by GPU-accelerated CNNs (Dan Ciresan et al., 2011).[GPUCNN1,3,5] Our fast GPU-based[GPUNN][GPUCNN5] CNN of 2011,[GPUCNN1] known as DanNet,[DAN,DAN1][R6] built on earlier GPU-accelerated CNNs of 2006.[GPUCNN] In 2011, DanNet became the first pure deep CNN to win computer vision contests:[GPUCNN2-3,5]

      Competition / winner:
      ICDAR 2011 Chinese handwriting - DanNet[GPUCNN1-3]
      IJCNN 2011 traffic signs - DanNet[DAN,DAN1][R6]
      ISBI 2012 image segmentation - DanNet[GPUCNN3a]
      ICPR 2012 medical imaging - DanNet[GPUCNN8]
      ImageNet 2012 - AlexNet[GPUCNN4]
      MICCAI 2013 Grand Challenge - DanNet[GPUCNN8]
      ImageNet 2014 - VGG Net[GPUCNN9]
      ImageNet 2015 - ResNet,[HW2] a Highway Net[HW1] with open gates

      DanNet won four of these contests in a row (15 May 2011, 6 Aug 2011, 1 Mar 2012, 10 Sep 2012).[GPUCNN5] At IJCNN 2011 in Silicon Valley, DanNet blew away the competition and achieved the first superhuman visual pattern recognition[DAN1] in an international contest. DanNet was also the first deep CNN to win a Chinese handwriting contest (ICDAR 2011) and an image segmentation contest (ISBI, May 2012), and our CNNs were able to greatly improve steel defect detection.[ST] Our CVPR paper on DanNet[GPUCNN3] appeared in 2012; 5 months later, the similar GPU-accelerated AlexNet won the ImageNet[IM09] 2012 contest.[GPUCNN4-5][R6] Our CNN image scanners were 1000 times faster than previous methods.[SCAN] The VGG network (ImageNet 2014 winner)[GPUCNN9] and other highly cited CNNs[RCNN1-3] further extended the DanNet of 2011.[MIR](Sec. 19)[MOST]

      ResNet, the ImageNet 2015 winner[HW2] (Dec 2015) and currently the most cited NN,[MOST] is a version (with open gates) of our earlier Highway Net (May 2015).[HW1-3][R5] The Highway Net (see below) is actually the feedforward net version of our vanilla LSTM (see below).[LSTM2] It was the first working, really deep feedforward NN with hundreds of layers (previous NNs had at most a few tens of layers). NNs with rapidly changing "fast weights" were introduced by v.d. Malsburg (1981) and others.[FAST,a,b] Early deep learning architectures that can manipulate structured data such as graphs[T22] include our graph NN-like, Transformer-like Fast Weight Programmers of 1991,[FWP0-1][FWP6][FWP] which learn to continually rewrite mappings from inputs to outputs (addressed below), and the work of Baldi and colleagues.[BA96-03] Today, graph NNs are used in numerous applications.

      Werbos,[BP2][BPTT1] Williams,[BPTT2][CUB0-2] and others[ROB87][BPTT3][DL1] analyzed ways of implementing gradient descent[GD'][STO51-52][GDa-b][GD1-2a] in RNNs. Kohonen's self-organising maps became popular.[KOH82-89] Others studied alternative credit assignment methods that are local in space and time.[BB2][NAN1-4][NHE][HEL] See overviews[MIR](Sec. 15, Sec. 17) and recent renewed interest in such methods.[NAN5][FWPMETA6][HIN22] A version of the stochastic delta rule became popular much later under the moniker "dropout."[Drop1-4][GPUCNN4] Generative Adversarial Networks (GANs) have become very popular.[MOST] They were first published in 1990 in Munich under the moniker Artificial Curiosity.[AC90-20][GAN1] Two dueling NNs (a probabilistic generator and a predictor) are trying to maximize each other's loss in a minimax game.[AC](Sec. 1) The generator produces outputs (using stochastic units[AC90] like in the much later StyleGANs[GAN2]); the predictor NN minimizes its error, while the generator NN tries to make outputs that maximize this error: one net's loss is the other net's gain.[AC90] (The world model can also be used for continual online action planning.[AC90][PLAN2-3][PLAN])
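      The minimax principle can be illustrated with a deliberately tiny sketch (my own simplification, not the 1990 architecture: the generator here uses a derivative-free update instead of gradients, and the "world" is just a fixed function): the predictor minimizes its prediction error while the generator seeks inputs on which that error is large.

    import numpy as np

    rng = np.random.default_rng(0)

    def world(x):
        """Unknown environment the predictor must model."""
        return np.sin(3.0 * x)

    def features(x):
        return np.array([1.0, x, x * x])     # predictor's (too simple) model class

    w_pred = np.zeros(3)                      # predictor parameters
    mu = 0.0                                  # generator's preferred input region

    for step in range(3000):
        # Generator proposes an input using a stochastic unit around mu.
        x = mu + 0.3 * rng.normal()
        err = w_pred @ features(x) - world(x)
        # Predictor minimizes squared prediction error (its loss) ...
        w_pred -= 0.01 * err * features(x)
        # ... while the generator tries to maximize that same error:
        # move mu toward whichever nearby probe is currently predicted worst.
        probes = mu + np.array([-0.2, 0.0, 0.2])
        probe_errs = [(w_pred @ features(p) - world(p)) ** 2 for p in probes]
        mu = float(probes[int(np.argmax(probe_errs))])

    print("generator settled near x =", round(mu, 2),
          "where the simple predictor still fails")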

      Artificial Curiosity & Creativity Since 1990-91

      4 years before a 2014 paper on GANs,[GAN1] my well-known 2010 survey[AC10] already summarised the generative adversarial NNs of 1990, in which a predictor NN minimizes its error while a generator NN tries to make outputs that maximize this error.[AC20][AC][T22](Sec. XVII) Note that early adversarial machine learning settings[S59][H90] neither involved unsupervised NNs nor were about modeling data nor used gradient descent.[AC20] The closely related Predictability Minimization is an unsupervised minimax game where one neural network minimizes the objective function maximized by another; it was used for creating disentangled representations of partially redundant data, applied to images in 1996.[PM0-2][AC20][R2][MIR](Sec. 7) The adversarial curiosity principle has been widely used for exploration in Reinforcement Learning[SIN5][OUD13][PAT17][BUR18] and for synthesis of realistic images,[GAN1,2] although the latter domain was recently taken over by Rombach et al.'s Latent Diffusion, another method published in Munich,[DIF1] building on Jarzynski's earlier work in physics from the previous millennium[DIF2] and more recent papers.[DIF3-5]

      Learning to plan and act on multiple levels of abstraction and multiple time scales is now considered a remaining grand challenge.[LEC] The early 1990s, however, saw first exceptions: NNs that learn to decompose complex spatio-temporal observation sequences into compact but meaningful chunks[UN0-3] (see further below), and NN-based planners of hierarchical action sequences for compositional learning,[HRL0] as discussed next. This work injected concepts of traditional "symbolic" hierarchical AI[NS59][FU77] into end-to-end differentiable "sub-symbolic" NNs. In April 1990, I published end-to-end differentiable NN-based subgoal generators for Hierarchical Reinforcement Learning (HRL).[HRL0] Soon afterwards, this was also done with recurrent NNs that learn to generate sequences of subgoals.[HRL1-2][PHD][MIR](Sec. 10)

      Compare other NNs that have "worked on command" since April 1990, in particular, for learning selective attention,[ATT0-3] artificial curiosity and self-invented problems,[PP][PPa,1,2][AC] upside-down reinforcement learning[UDRL1-2] and its generalizations.[GGP] Recently, Transformers[TR1] have been all the rage, e.g., generating human-sounding texts.[GPT3] Transformers with "linearized self-attention"[TR5-6] were first published in March 1991.[FWP0-1][FWP6][FWP] These so-called "Fast Weight Programmers" or "Fast Weight Controllers"[FWP0-1] separated storage and control like in traditional computers, but in an end-to-end-differentiable, adaptive, fully neural way (rather than in a hybrid fashion[PDA1-2][DNC]). The "self-attention" in standard Transformers[TR1-4] combines this with a projection and softmax (using attention terminology like the one I introduced in 1993[ATT][FWP2][R4]).
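      The claimed relation can be checked in code: standard softmax self-attention versus the linearized variant, which is just an additive outer product fast weight update followed by a query read-out, as in the 1991 FWPs (a toy sketch with made-up dimensions; identity feature map, no normalization):

    import numpy as np

    rng = np.random.default_rng(0)
    T, d = 6, 4                                # sequence length, head dimension
    Q = rng.normal(size=(T, d))
    K = rng.normal(size=(T, d))
    V = rng.normal(size=(T, d))

    # Standard (causal) softmax self-attention, one head.
    def softmax_attention(Q, K, V):
        out = np.zeros_like(V)
        for t in range(T):
            scores = Q[t] @ K[:t+1].T / np.sqrt(d)
            w = np.exp(scores - scores.max())
            w /= w.sum()
            out[t] = w @ V[:t+1]
        return out

    # Linearized self-attention: maintain a fast weight matrix of summed
    # value*key outer products, then read it out with the query.
    def linear_attention(Q, K, V):
        F = np.zeros((d, d))                   # fast weights
        out = np.zeros_like(V)
        for t in range(T):
            F += np.outer(V[t], K[t])          # additive programming step
            out[t] = F @ Q[t]                  # unnormalized read-out
        return out

    print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)

      Dropping the softmax (and normalization) replaces the attention over all past steps by a constant-size fast weight matrix that is updated additively at each step.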

      26 March 1991: Neural nets learn to program neural nets with fast weights—like today's Transformer variants. 2021: New stuff!

      Today's Transformers heavily use unsupervised pre-training[UN0-3] (see next section), another deep learning methodology that dates back to our Annus Mirabilis of 1990-1991.[MIR][MOST]

      The 1991 fast weight programmers led, in 1992, to self-referential metalearning networks[FWPMETA1-9][HO1] that extended my 1987 diploma thesis,[META1] which introduced algorithms not just for learning but also for meta-learning or learning to learn,[META] to learn better learning algorithms through experience. This became very popular in the 2010s[DEC] when computers were a million times faster.

      Deep learning is about NNs with many layers of neurons or many subsequent computational stages.[MIR] Recurrent NNs are, in a sense, of arbitrary depth.[DL1] Before the 1990s, however, RNNs failed to learn deep problems in practice.[MIR](Sec. 0) To overcome this, I proposed a self-supervised hierarchy of RNNs that learns to represent percepts at multiple levels of abstraction and multiple time scales:[LEC] the Neural Sequence Chunker[UN0] or Neural History Compressor.[UN1] This First Very Deep Learner of 1991 solved "very deep learning" tasks of depth > 1000[UN2] (requiring more than 1,000 subsequent computational stages); see also the continuous version of the Neural History Compressor[UN3] and recent work on unsupervised NN-based abstraction.[OBJ1-5] More than a decade after this work,[UN1] a similar method for feedforward NNs was published, called Deep Belief Networks (DBNs):[UN4] each higher level tries to reduce the description length (or negative log probability) of the data representation in the level below.[HIN][T22][MIR] The RNN hierarchy above can be collapsed into a single RNN, using my NN distillation procedure of 1991.[UN0-1][MIR] NN distillation was also republished many years later,[DIST2][MIR][HIN][T22] and is widely used today. Today, unsupervised/self-supervised pre-training is heavily used by Transformers.[TR1-6] Transformers with linearized self-attention were also first published[FWP0-6] in the Annus Mirabilis of 1990-1991,[MIR][MOST] together with unsupervised/self-supervised pre-training for deep learning.[UN0-3] See the previous section.

      Sepp Hochreiter's Analysis of the Fundamental Deep Learning Problem (1991). Deep learning is hard because of the Fundamental Deep Learning Problem, identified and analyzed in 1991 in his diploma thesis, which I had the pleasure to supervise.[VAN1] First he implemented the Neural History Compressor above, but then did much more: he showed that deep NNs suffer from vanishing or exploding gradients. In both cases, learning fails (compare[VAN2]). This analysis led to basic principles of what's now called LSTM (see below). The Long Short-Term Memory (LSTM) recurrent neural network[LSTM1-6] overcomes the Fundamental Deep Learning Problem identified by Sepp in his above-mentioned 1991 diploma thesis,[VAN1] which I consider one of the most important documents in the history of machine learning. It also provided essential insights for overcoming the problem, through basic principles (such as constant error flow) of what we called LSTM in a tech report of 1995.[LSTM0] After the main peer-reviewed publication in 1997[LSTM1][25y97] (now the most cited NN article of the 20th century[MOST]), there followed the first application of LSTM to speech (2004).[LSTM10] 2005 saw the first publication of LSTM with full backpropagation through time and of bi-directional LSTM[LSTM3] (now widely used). Another milestone of 2006 was the training method "Connectionist Temporal Classification" or CTC[CTC] for simultaneous alignment and recognition of sequences. Our team successfully applied CTC-trained LSTM to speech in 2007[LSTM4] (also with hierarchical LSTM stacks[LSTM14]), without the hybrid combinations of NNs and traditional approaches such as Hidden Markov Models (HMMs).[BW][BRI][BOU][HYB12][T22] In 2009, CTC-trained LSTM won three ICDAR 2009 Connected Handwriting Competitions (French, Farsi, Arabic). LSTM was soon used for everything that involves sequential data, such as speech[LSTM10-11][LSTM4][DL1] and videos. In 2015, CTC-trained LSTM greatly improved Google's speech recognition on Android smartphones.[GSR15] Many other companies adopted this.[DL4] Google's on-device speech recognition of 2019 (now on your phone, not on the server) is still based on LSTM.

      In 1995, we already had an excellent neural probabilistic text model[SNT] (compare Nakamura and Shikano's 1989 word category prediction model[NPMa]). In 2001, we showed that LSTM can learn languages unlearnable by traditional models such as HMMs.[LSTM13] By 2017, LSTM also powered Facebook's automatic translations (over 4 billion per day),[FB17][DL4] Apple's Quicktype on roughly 1 billion iPhones,[DL4] the voice of Amazon's Alexa,[DL4] image caption generation[DL4] & automatic email answering,[DL4] etc. Business Week called LSTM "arguably the most commercial AI achievement."[AV1] Countless papers and patents have "LSTM" in their title.[DEC]

      Highway Networks:
In May 2015, our Highway Network[HW1] was the first working really deep feedforward NN with hundreds of layers (previous NNs had at most a few tens of layers). Microsoft's ResNet[HW2] (which won the ImageNet 2015 contest) is a version thereof. The earlier Highway Nets perform roughly as well as their ResNet versions on ImageNet.[HW3] Variants of highway gates are also used for certain algorithmic tasks where the pure residual layers do not work as well.[NDR]

      Deep learning is all about NN depth.[DL1] LSTMs brought essentially unlimited depth to supervised recurrent NNs; in the 2010s, the LSTM-inspired Highway Nets brought it to feedforward NNs. LSTM has become the most cited NN of the 20th century; the Highway Net version called ResNet, the most cited NN of the 21st.[MOST] (Citations, however, are a highly questionable measure of true impact.[NAT1]) Reinforcement Learning (RL)[KAE96][BER96][TD3][UNI][GM3][LSTMPG] is about maximizing expected cumulative reward signals.[DL1] Many problems of AI can be formulated in the general RL framework.[UNI] Relevant techniques include Monte Carlo (tree) search (MC, 1949),[MOC1-5] dynamic programming (DP, 1953),[BEL53] artificial evolution (1954),[EVO1-7]([TUR1],unpublished) alpha-beta-pruning (1959),[S59] control theory and system identification (1950s),[KAL59][GLA85] stochastic gradient descent (SGD, 1951),[STO51-52] and universal search techniques (1973).[AIT7] NNs have been combined with many of these, e.g., for system identification,[WER87-89][MUN87][NGU89] DP and its online variant called Temporal Differences (TD),[TD1-3] artificial evolution,[EVONN1-3] and policy gradients.[GD1][PG1-3] Many additional references on this can be found in Sec. 6 of the 2015 survey.[DL1]

      When there is a Markovian interface[PLAN3] to the environment, RL with DP/TD/MC-based FNNs can be very successful, as shown in 1994[TD2] (master-level backgammon player) and the 2010s[DM1-2a] (superhuman players for Go, chess, and other games). For more general settings where the agent must memorize the history of previous inputs, our combinations of RL algorithms and LSTM[LSTM-RL][RPG] have become standard, in particular, our LSTM trained by policy gradients (2007).[RPG07][RPG][LSTMPG]

      Deep Reinforcement Learning with Policy Gradients for Long Short-Term Memory (LSTM). For example, in 2018, a PG-trained LSTM was the core of OpenAI's famous Dactyl which learned to control a dextrous robot hand without a teacher.[OAI1][OAI1a] In 2019, DeepMind beat a pro player in the game of Starcraft, which is theoretically harder than Chess or Go[DM2] in many ways, using Alphastar whose brain has a deep LSTM core trained by PG.[DM3] An RL-trained LSTM (with 84% of the model's total parameter count) was also the core of OpenAI Five which learned to defeat human experts in the Dota 2 video game (2018).[OAI2] Bill Gates called this a "huge milestone in advancing artificial intelligence."[OAI2a][MIR](Sec. 4)[LSTMPG]

      But what about commonsense reasoning[MAR15] and learning to think,[PLAN4-5] at multiple levels of abstraction and multiple time scales?[LEC] We published answers to these questions in 1990-91: self-supervised neural history compressors[UN][UN0-3] learn to represent percepts at multiple levels of abstraction and multiple time scales (see above), while end-to-end differentiable NN-based subgoal generators[HRL3][MIR](Sec. 10) learn hierarchical action plans through gradient descent (see above). More sophisticated ways of learning to think in abstract ways were published in 1997[AC97][AC99][AC02] and 2015-18.[PLAN4-5]

      It's the hardware, stupid! Highlights of over 2000 years of computing history (Juergen Schmidhuber). Practical computing hardware goes back at least to the 1st century,[SHA7a][RAU1] when Heron of Alexandria built what was perhaps the first machine with a stored program.[BAN][KOE1] It used pins on a rotating cylinder to store its program.

      2021: 375th birthday of Leibniz, father of computer science (Juergen Schmidhuber). An early automatic calculating machine was built by Wilhelm Schickard (1623). In 1673, the already mentioned Gottfried Wilhelm Leibniz (called "the smartest man who ever lived"[SMO13]) designed the first machine (the step reckoner) that could perform all four arithmetic operations, and the first with a memory.[BL16] He also described the principles of binary computers governed by punch cards (1679),[L79][L03][LA14][HO66] and published the chain rule[LEI07-10] (see above), an essential ingredient of deep learning and modern AI.

      Leonardo Torres y Quevedo, the 20th century's first pioneer of practical AI. Torres y Quevedo (mentioned in the introduction) built the first working chess end game player in 1914; it was still considered impressive at the 1951 Paris AI conference.[AI51][BRO21][BRU4] In 1941, Konrad Zuse completed the first working programmable general-purpose computer. The corresponding patent of 1936[ZU36-38][RO98][ZUS21] predates Claude Shannon's 1937 thesis on digital circuit design.[SHA37] Unlike Babbage, Zuse used Leibniz' principles of binary computation (1679)[L79][LA14][HO66][L03] instead of traditional decimal computation. This greatly simplified the hardware.[LEI21,a,b] The theoretical foundations of general-purpose computation had just been laid by Church[CHU] (1935), Turing[TUR] (1936), and Post[POS] (1936). Zuse's machine lacked an explicit conditional jump instruction, but it has been shown that it can nevertheless run arbitrary programs in principle.[RO98]

      1941: Konrad Zuse builds first working general computer; patent application 1936 (J. Schmidhuber, AI Blog). Related milestones include John Atanasoff (the "father of tube-based computing"[NASC6a]), the transistor principle patented by Julius Edgar Lilienfeld in 1925,[LIL1-2] and the British Colossus, used to break the Nazi code.[NASC6] The first working general machine by someone other than Zuse (1941)[RO98] was Howard Aiken's decimal MARK I (US, 1944), followed by the 1948 upgrade of ENIAC, which was reprogrammed by entering numerical instruction codes into read-only memory.[HAI14b] A patent for an IC with several transistors on a common substrate was granted in 1952;[IC49-14] in 1959, Robert Noyce presented a monolithic IC.[IC14] ICs/GPUs of today (2022) contain many billions of transistors (almost all of them of Lilienfeld's 1925 FET type[LIL1-2]), in line with Moore's Law, which states that the number of transistors[LIL1-2] per chip doubles roughly every couple of years. If this trend continues, cheap computers will eventually match the raw computational power of all human brains combined.[RAW] According to Bremermann (1982),[BRE] physics imposes ultimate limits on such growth, as previously noted back in 2004.[OOPS2][ZUS21] Future hardware may differ greatly from today's (some connections may actually be light beams),[DL2] and hardware advances are expected to become even much more important than they are today.[DL2] Goedel's results, however, exposed the fundamental limits of any type of computation-based AI.[GOD][BIB3][MIR](Sec. 18)[GOD21,21a]

      1931: Theoretical Computer Science & AI Theory founded by Goedel (J. Schmidhuber, AI Blog). Goedel combined Georg Cantor's diagonalization trick[CAN] with the foundational work by Gottlob Frege[FRE] (who introduced the first formal language in 1879), Thoralf Skolem[SKO23] (who introduced primitive recursive functions in 1923) and Jacques Herbrand.[GOD86] The formal Algebra of Thought (1686) of Gottfried Wilhelm Leibniz[L86][WI48] (see above) was deductively equivalent[LE18] to the much later Boolean Algebra of 1847.[BOO] In 1936, Alan M. Turing introduced the Turing Machine.[TUR] He rederived the above-mentioned result.[CHU][TUR][HIN][GOD21,21a][TUR21][LEI21,21a] In the same year of 1936, Emil Post published yet another independent universal model of computing.[POS] Konrad Zuse then built the world's first working programmable general-purpose computer[ZU36-38][RO98][ZUS21] and created the first high-level programming language,[BAU][KNU] designed around 1945[KNU] and applied to theorem proving in 1948.[ZU48] Compare Newell & Simon's later work on theorem proving (1956).[NS56] In 1964, Ray Solomonoff combined Bayesian (actually Laplacian[STI83-85]) probabilistic reasoning and theoretical computer science[GOD][CHU][TUR][POS] to obtain a theoretically optimal way of learning to predict future data from past observations.[AIT1][AIT10] With Andrej Kolmogorov, he founded the theory of Kolmogorov complexity or algorithmic information theory (AIT),[AIT1-22] going beyond traditional information theory.[SHA48][KUL] Later work extended this concept,[AIT7][AIT5][AIT12-13][AIT16-17] as well as applications to NNs.[KO2][CO1-3]

      In the early 2000s, Marcus Hutter (while working under my Swiss National Science Foundation grant[UNI]) augmented Solomonoff's universal predictor[AIT1][AIT10] to obtain an optimal (though incomputable) reinforcement learner for arbitrary unknown environments.[AIT20,22] He also derived the asymptotically fastest algorithm for all well-defined computational problems.[AIT21] Looking back at the history of important events, one finds a beautiful pattern of exponential acceleration in it,[OMG] which I have presented in many talks since then, and which also made it into Sibylle Berg's award-winning book "GRM: Brainfuck."[OMG2] The gaps between the most important events have been shrinking at ever shorter intervals: just a few decades or centuries or at most millennia.[OMG1] The most important events since the beginning of the universe seem to be neatly aligned on a timeline of exponential acceleration converging in an Omega point in the year 2040 or so (J. Schmidhuber, 2014). Milestones along this timeline include the first machine with a stored program (Heron of Alexandria[RAU1] in the 1st century), the telephone (e.g., Meucci 1857, Reis 1860, Bell 1876),[NASC3] and the Haber-Bosch process for creating artificial fertilizer, without which the world could feed at most 4 billion people.[HAB1-2] In the 1980s came the first truly self-driving cars (by the mid-1990s, robot cars were driving in highway traffic, up to 180 km/h).[AUT] Back then, I worked on my 1987 diploma thesis,[META1] which introduced algorithms not just for learning but also for meta-learning or learning to learn,[META] to learn better learning algorithms through experience (now a very popular topic[DEC]). And then came our Miraculous Year 1990-91[MIR] at TU Munich, the root of today's most cited NNs[MOST] and of modern deep learning, through artificial curiosity and generative adversarial NNs for agents that invent their own problems (see above),[AC90-AC20][PP-PP2][SA17] Transformers with linearized self-attention (see above),[FWP0-6][TR5-6] distilling teacher NNs into student NNs (see above),[UN][UN0-3] learning at multiple levels of abstraction and multiple time scales (see above),[HRL0-2][LEC] and other exciting stuff. Much of this has become very popular, and improved the lives of billions of people.[DL4][DEC][MOST] If the pattern of acceleration holds (take all of this with a grain of salt, though[OMG1]), AIs of the kind studied in my lab for decades[AC][AC90,AC90b] will quickly improve themselves, restricted only by the fundamental limits of computability and physics. Many of them[ACM16][FA15][SP16][SA17] will make more and bigger AIs; those who don't won't have an impact.[ACM16][FA15][SP16] All of this may be part of the simplest and fastest way of computing all possible metaverses or computable universes (Juergen Schmidhuber, 1997).

      Creative Commons License Some of the material above was taken from previous AI Blog posts.[MIR] [DEC] [GOD21] [ZUS21] [LEI21] [AUT] [HAB2] [ARC06] [AC] [ATT] [DAN] [DAN1] [DL4] [GPUCNN5,8] [DLC] [FDL] [FWP] [LEC] [META] [MLP2] [MOST] [PLAN] [UN] [LSTMPG] [BP4] [DL6a] [HIN] [T22] publication page and my arXiv page. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 555+ References (and many more in the survey[DL1]) In 2022, we are celebrating the following works from a quarter-century ago. 1. Journal paper on Long Short-Term Memory, the (and basis of the most cited NN of the 21st). all possible metaverses 3. Implementing artificial curiosity and creativity through generative adversarial agents that learn to design abstract, interesting computational experiments. meta-reinforcement learning. 5. Journal paper on hierarchical Q-learning. 8. Journal paper on Low-Complexity Art, the Minimal Art of the Information Age. J.  Schmidhuber (AI Blog, 2021). 3 decades of artificial curiosity & creativity. PDF. The first paper on online planning with reinforcement learning recurrent neural networks (NNs) (more) and on generative adversarial networks (more). PDF. More. PDF. PDF. general system systems with intrinsic motivation,[AC90-AC95] the system also See later publications.[AC99][AC02] PDF. PDF. PDF. (More on artificial scientists and artists.) IEEE link. PDF. With a brief summary of the generative adversarial neural networks of 1990[AC90,90b][AC20] (more). Preprint arXiv/1906.04493. Link. [AIB] J. Schmidhuber. AI Blog. Includes variants of chapters of the AI Book. H. Bruderer[BRU4] calls that the first conference on AI. Blog of Werner Vogels, CTO of Amazon (Nov 2016): PDF. First publication of what was later sometimes called the Hopfield network[AMH2] or Amari-Hopfield Network,[AMH3] based on the (uncited) Lenz-Ising recurrent architecture.[L20][I25][T22] Mentions the recurrent Ising model[L20][I25]on which the (uncited) Amari network[AMH1,2] is based. The Hopfield network or Amari-Hopfield Network was first published in 1972 by Amari.[AMH1] [AMH2] did not cite [AMH1]. [ATT] J. Schmidhuber (AI Blog, 2020). 30-year anniversary of end-to-end differentiable sequential neural attention. Plus goal-conditional reinforcement learning. Schmidhuber Transformers with linearized self-attention (1991-93).[FWP] Today, both types are very popular. PDF. PDF. More. PS. (PDF.) H. Larochelle, G. E. Hinton. Learning to combine foveal glimpses with a third-order Boltzmann machine. NIPS 2010. This work is very similar to [ATT0-2] which the authors did not cite. In fact, Hinton was the reviewer of a 1990 paper[ATT2] his own work:[ATT3] attentional component (the fixation controller)." See [MIR](Sec. 9)[R4]. arXiv/1409.0473, 2014-16. This work on soft "attention" did not cite Schmidhuber's much earlier original work of 1991-1993 on soft attention and Transformers with linearized self-attention.[FWP,FWP0-2,6][ATT] J.  Schmidhuber (AI Blog, 2005). Highlights of robot car history. Around Bloomberg, May 15, 2018. PDF. HTML. PDF. by Sherrington & Kirkpatrick[SK75] & Glauber[G63] nor the first working algorithms for deep learning of internal representations (Ivakhnenko & Lapa, 1965)[DEEP1-2][HIN] nor Amari's work (1967-68)[GD1-2] on learning internal representations in deep nets through stochastic gradient descent. 
Even later surveys by the authors[S20][DLC] failed to cite the prior art.[T22] formal Algebra of Thought (1686)[L86][WI48] was deductively equivalent[LE18] to the much later Precursor of modern backpropagation.[BP1-5] PDF. Link. PDF. First application of backpropagation[BP1] to NNs (concretizing thoughts in Werbos' 1974 thesis). [BP4] J. Schmidhuber (AI Blog, 2014; updated 2020). Who invented backpropagation? More.[DL2] Link. IEEE Spectrum, 2021. Link. English version: [CNN1+]. More in Scholarpedia. Link. [CNN1a] A. Waibel. Phoneme Recognition Using Time-Delay Neural Networks. Meeting of IEICE, Tokyo, Japan, 1987. First application of backpropagation[BP1-5] and weight-sharing PDF. Spatial Averaging.[CNN1] Spatial Averaging.[CNN1] PDF. PDF. PDF. Inverse, 2016. Link. Since November 2021: Comments on version 1 of the report[T22] in the Connectionists Mailing List, perhaps the oldest mailing list on artificial neural networks. Link to the archive. PDF. PDF. Beijing, 2014. Preprint arXiv:1402.3511 [cs.NE]. J. Schmidhuber (AI Blog, 2021). 10-year anniversary. In 2011, DanNet triggered the deep convolutional neural network (CNN) revolution. Named 1st superhuman result in 2011.[DAN1] Now everybody is using this approach. J. Schmidhuber (AI Blog, 2011; updated 2021 for 10th birthday of DanNet): First superhuman visual pattern recognition. the artificial neural network called DanNet [DEC] J. Schmidhuber (AI Blog, 02/20/2020, updated 2021, 2022). The 2010s: Our Decade of Deep Learning / Outlook on the 2020s. The 1991 NN distillation procedure,[UN0-2][MIR](Sec. 2) More. Deep Learning. HTML. A "survey" of deep learning that does not mention the pioneering works of deep learning [T22]. [DL3a] Y. Bengio, Y. LeCun, G. Hinton (2021). Turing Lecture: Deep Learning for AI. Communications of the ACM, July 2021. HTML. Local copy (HTML only). Another "survey" of deep learning that does not mention the pioneering works of deep learning [T22]. [DL4] J. Schmidhuber (AI Blog, 2017). Our impact on the world's most valuable public companies: Apple, Google, Microsoft, Facebook, Amazon... By greatly improved (CTC-based) on-device speech recognition (on the phone, not the server) LSTM. PDF. J. Schmidhuber (AI Blog, Nov 2020). 15-year anniversary: 1st paper with "learn deep" in the title (2005). The deep reinforcement learning & neuroevolution developed in Schmidhuber's lab solved problems of depth 1000 and more.[DL6] Soon after its publication, everybody started talking about "deep learning." Causality or correlation? Web site deeplearning.net of Y. Bengio's MILA (2015, retrieved May 2020; compare the version in the Internet Archive), referring to Hinton's[UN4] and Bengio's[UN5] unsupervised pre-training for deep NNs[UN4] (2006) although this type of deep learning dates back to Schmidhuber's work of 1991.[UN1-2][UN] [DLC] J. Schmidhuber (AI Blog, June 2015). Critique of Paper by self-proclaimed[DLC2] "Deep Learning Conspiracy" (Nature 521 p 436). it). More on this under [T22]. J. Schmidhuber (AI Blog, 2022). Annotated History of Modern AI and Deep Learning. Technical Report IDSIA-22-22, IDSIA, Lugano, Switzerland, 2022. Preprint arXiv:2212.11279. Tweet of 2022. arxiv:1312.5602. Link. the first sentence of the abstract of the earlier tech report version[DM1] was created earlier by Jan Koutnik et al. in Schmidhuber's lab.[CO2] and PhDs in computer science. More. Alphastar has a "deep LSTM core." Hochreiter et al.'s first successful application [HO07] of deep learning to protein folding (2007). 
Preprint arXiv:2112.10752, LMU Munich, 2021. neural networks learning to control dynamic external memories.[PDA1-2][FWP0-1] arXiv:1808.03578, 2018. arXiv:1808.03578, 2018. Conf. on Neural Networks, Vol. 2, 2004, pp. 985-990. This paper does not mention that the "ELM" concept goes back to Rosenblatt's work in the 1950s.[R62][T22] This overview does not mention that the "ELM" concept goes back to Rosenblatt's work in the 1950s.[R62][T22] Link. used LSTM over 4 billion automatic translations per day (The Verge, August 4, 2017); Facebook blog by J.M. Pino, A. Sidorov, N.F. Ayan (August 3, 2017) [FDL] J. Schmidhuber (AI Blog, 2013). My First Deep Learning System of 1991 + Deep Learning Timeline 1960-2013. PDF. J.  Schmidhuber (AI Blog, 26 March 2021, updated 2022). alternative[FWP0-1] to recurrent NNs. the fast weights[FAST,FASTa,b] of Such Fast Weight Programmers[FWP0-6,FWPMETA1-8] can learn to memorize past data, e.g., by computing fast weight changes through additive outer products of self-invented activation patterns[FWP0-1] (now often called keys and values for self-attention[TR1-6]). The similar Transformers[TR1-2] combine this with projections Transformers with linearized self-attention[TR5-6] In 1993, he introduced the attention terminology[FWP2] now used in this context,[ATT] and RNNs that program themselves. See tweet of 2022. PDF. normalization).[FWP] PDF. HTML. Pictures (German). See tweet of 2022 for 30-year anniversary. PDF. Preprint: arXiv:1811.12143. PDF. PDF. Very similar to [FWP0-2], in both motivation [FWP2] and execution. This work on "attention" did not cite Schmidhuber's much earlier original work of 1991-1993 on soft attention and Transformers with linearized self-attention.[FWP,FWP0-2,6][ATT] Preprint: arXiv:2003.08165. PDF. HTML overview. Linear Transformers Are Secretly Fast Weight Programmers. ICML 2021. Preprint: arXiv:2102.11174. Preprint: arXiv:2106.06295 (June 2021). PDF. An introspective network that can learn to run its own weight change algorithm. In Proc. of the Intl. Conf. on Artificial Neural Networks, J. Schmidhuber. Habilitation thesis, TUM, 1993. PDF. can be found here. Preprint arXiv:2012.14905 [cs.LG], 2020. Report arXiv:2011.07831 [cs.AI], 2020. Preprint: arXiv:2202.05780. PDF. Probably the first paper on using stochastic gradient descent[STO51-52] reverse mode of automatic differentiation or backpropagation[BP1]). OCR-based PDF scan of pages 94-135 (see pages 119-120). Implementation of Amari's 1967 stochastic gradient descent method for multilayer perceptrons.[GD1] (S. Amari, personal communication, 2021.) Preprint arXiv/2207.01570, 4 July 2022 (submitted in May 2022). arXiv:cs/0309048 (2003). More. PDF. Cognitive Computation 1(2):177-193, 2009. PDF. More. Google Research Blog, Sep 2015, see also Aug 2015 Google's speech recognition based on CTC and LSTM. Alphr Technology, Jul 2015, or 9to5google, Jul 2015 WIRED, Sep 2016, siliconANGLE, Sep 2016 Blog post, Internet Archive, 2010. A blog post describing basic ideas[AC][AC90,AC90b][AC20] of GANs. A description of GANs that does not cite Schmidhuber's original GAN principle of 1990[AC][AC90,AC90b][AC20][R2][T22] (also containing wrong claims about Schmidhuber's adversarial NNs for Predictability Minimization[PM0-2][AC20][T22]). Link. This was number 1 on Hacker News. Frankfurter Allgemeine Zeitung, 16/6/2021. Preprint arXiv/2005.14165. for Image Classification. International Joint Conference on Artificial Intelligence (IJCAI-2011, Barcelona), 2011. PDF. ArXiv preprint. 
win four important computer vision competitions 2011-2012 before others won any PDF. HTML overview. competitor.[DAN1] This led to massive interest from industry. [GPUCNN3] D. C. Ciresan, U. Meier, J. Schmidhuber. Multi-column Deep Neural Networks for Image Classification. Proc. IEEE Conf. on Computer Vision and Pattern Recognition CVPR 2012, p 3642-3649, July 2012. PDF. Longer TR of Feb 2012: arXiv:1202.2745v1 [cs.CV]. More. PDF. DanNet,[DAN,DAN1][R6] to win computer vision contests in 2011[GPUCNN2-3,5] (AlexNet and VGG Net[GPUCNN9] followed in 2012-2014). [GPUCNN4] emphasizes benefits of Fukushima's ReLUs (1969)[RELU1] and dropout (a variant of Hanson 1990 stochastic delta rule)[Drop1-4] but neither cites the original work[RELU1][Drop1] nor the basic CNN architecture (Fukushima, 1979).[CNN1] J. Schmidhuber (AI Blog, 2017; updated 2021 for 10th birthday of DanNet): History of computer vision contests won by deep CNNs since 2011. DanNet was the first CNN to win one, and won 4 of them in a row before the similar AlexNet/VGG Net and the Resnet (a Highway Net with open gates) joined the party. Today, deep CNNs are standard in computer vision. PDF. PDF. [GPUCNN8] J. Schmidhuber (AI Blog, 2017; updated 2021 for 10th birthday of DanNet). first deep learner to win a medical imaging contest (2012). Link. J.  Schmidhuber (Blog, 2000). Most influential persons of the 20th century (according to Nature, 1999). The Haber-Bosch process has often been called the most important invention of the 20th century[HAB1] PDF. PDF. Bengio claimed[YB20] Schmidhuber's publications on exactly this topic date back to 1991-93.[UN0-2][UN] An unsupervised learning algorithm related to Schmidhuber's supervised Neural Heat Exchanger.[NHE] [HIN] J. Schmidhuber (AI Blog, 2020). Critique of Honda Prize for Dr. Hinton. Science must not allow corporate PR to distort the academic record. See also [T22]. previous related work.[BB2][NAN1-4][NHE][MIR](Sec. 15, Sec. 17)[FWPMETA6] PDF. what Y. LeCun called an "open problem" in 2022.[LEC] North-Holland, 1991. PDF. Extending TR FKI-129-90, TUM, 1990. PDF. This work did not cite Schmidhuber's gradient-based subgoal generators for hierarchical reinforcement learning (1990).[HRL0-2] PDF. Preprints arXiv:1505.00387 (May 2015) and arXiv:1507.06228 (July 2015). Also at NIPS 2015. The LSTM with forget gates[LSTM2] for RNNs.) Resnets[HW2] are a version of this where the gates are always open: g(x)=t(x)=const=1. Highway Nets perform roughly as well as ResNets[HW2] on ImageNet.[HW3] Variants of highway gates are also used for certain algorithmic tasks, where the simpler residual layers do not work as well.[NDR] More. Link. arXiv:1512.03385 (Dec 2015). Residual nets are a version of Highway Nets[HW1] More. arxiv:1612.07771 (2016). Also at ICLR 2017. This work did not cite the earlier LSTM[LSTM0-6] trained by Connectionist Temporal Classification (CTC, 2006).[CTC] CTC-LSTM was successfully applied to speech in 2007[LSTM4] (also with hierarchical LSTM stacks[LSTM14]) and became the first superior end-to-end neural speech recogniser that outperformed the state of the art, dramatically improving Google's speech recognition.[GSR][GSR15][DL4] Markov models (HMMs).[BW][BRI][BOU] [HYB12] still used the old hybrid approach and did not compare it to CTC-LSTM. 
Later, however, Hinton switched to LSTM, too.[LSTM8] Ernst Ising and Wilhelm Lenz in the 1920s.[L20][I25][K41][W45][T22] It settles into an equilibrium state in response to input conditions, and is the foundation of the first well-known learning RNNs.[AMH1-2] Who Invented the IC? Preprint arXiv:1704.04760 PDF. PDF. Mathematischen Schriften, ed. C. Gerhardt, Berlin 1879, vol.7, p.223. English link. Link. arXiv:1607.06450, 2016. [LEC] J. Schmidhuber (AI Blog, 2022). LeCun's 2022 paper on autonomous machine intelligence rehashes but does not cite essential work of 1990-2015. Years See tweet1. LeCun also listed the "5 best ideas 2012-2022" without mentioning that See tweet2. [LEI21] J. Schmidhuber (AI Blog, 2021). 375th birthday of Leibniz, founder of computer science. Frankfurter Allgemeine Zeitung (FAZ), 17/5/2021. FAZ online: 19/5/2021. [LEI21b] J. Schmidhuber (AI Blog, 2021). 375. Geburtstag des Herrn Leibniz, dem Vater der Informatik. PDF. [LSTM1] S. Hochreiter, J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. PDF. Based on [LSTM0]. More. PDF. PDF. PDF. PDF. PDF. PDF. PDF. PDF. PDF. PDF. PDF. Preprint: arxiv:1506.07452. PDF. J. Schmidhuber (AI Blog, Dec 2020). 10-year anniversary of our journal paper on deep reinforcement learning with policy gradients for LSTM (2007-2010). Recent PDF. are actually a variant of the vanilla LSTM architecture[LSTM2] (2000) which the authors did not cite although this work[LSTM2] was the one that introduced gated recurrent units. Furthermore, Schmidhuber's team automatically evolved lots of additional LSTM variants and topologies already in 2009[LSTM7] without changing the name of the basic method. learn to count[LSTMGRU2] nor learn simple non-regular languages;[LSTMGRU2] they according to Google Brain.[LSTMGRU3]) Preprint arXiv:1805.04908. Architectures. Preprint arXiv:1703.03906 A misleading "history of deep learning" goes more or less like this: "In 1969, Minsky & Papert[M69] researchers took a fresh look at the problem in the 1980s."[S20] However, the 1969 book[M69] addressed a "problem" of Gauss & Legendre's shallow learning (circa 1800)[DL1-2] that had already been solved 4 years prior by Ivakhnenko & Lapa's popular deep learning method,[DEEP1-2][DL2] and then also by Amari's SGD for MLPs.[GD1-2] Minsky was apparently unaware of this and failed to correct it later.[HIN](Sec. I)[T22](Sec. XIII) J. Schmidhuber (AI Blog, 2020). 1/3 century anniversary of Searchable PDF scan (created by OCRmypdf which uses LSTM). HTML. better GP methods through Meta-Evolution. More. [MIR] J. Schmidhuber (AI Blog, Oct 2019, updated 2021, 2022). Deep Learning: Our Miraculous Year 1990-1991. Preprint arXiv:2005.05744, 2020. The Computation 22(12): 3207-3220, 2010. ArXiv Preprint. (AI Blog, Sep 2020). 10-year anniversary of supervised deep learning breakthrough (2010). No unsupervised pre-training. By 2010, when compute was 100 times more expensive than today, both the feedforward NNs[MLP1] J.  Schmidhuber (AI Blog, 2021). The most cited neural networks all build on work done in my labs. Foundations of the most popular NNs originated in Schmidhuber's labs at TU Munich and IDSIA. 
(1) Long Short-Term Memory (LSTM), (2) ResNet (which is the earlier Highway Net with open gates), (3) AlexNet and VGG Net (both building on the similar earlier DanNet: the first deep convolutional NN to win image recognition competitions), Adversarial Artificial Curiosity), and (5) variants of Transformers (Transformers with linearized self-attention are formally equivalent to the much earlier Fast Weight Programmers). Annus Mirabilis of 1990-1991.[MIR] PDF. PDF. Preprint arXiv:1608.05343, 2016. Preprint arXiv:1611.01578 (PDF), 2017. Compare the earlier Neural Architecture Search of Bayer et al. (2009) for LSTM-like topologies.[LSTM7] [NASC1] J. Schmidhuber. First Pow(d)ered flight / plane truth. Correspondence, Nature, 421 p 689, Feb 2003. [NASC3] J. Schmidhuber. The last inventor of the telephone. Letter, Science, 319, no. 5871, p. 1759, March 2008. Correspondence, Nature, vol 483, p 541, March 2012, doi:10.1038/483541b. Letter, Science, vol 336, p 1639, June 2012. See also comment on response by A. Hodges (DOI:10.1126/science.336.6089.1639-a) [NASC6] J. Schmidhuber. Colossus was the first electronic digital computer. Correspondence, Nature, 441 p 25, May 2006. [NASC6a] J. Schmidhuber. Comment on "Biography: The ABC of computing" by J. Gilbey, Nature 468 p 760-761 (2010). Link. [NASC7] J. Schmidhuber. Turing's impact. Correspondence, Nature, 429 p 501, June 2004 [NASC8] J. Schmidhuber. Prototype resilient, self-modeling robots. Correspondence, Science, 316, no. 5825 p 688, May 2007. [NASC9] J. Schmidhuber. Comparing the legacies of Gauss, Pasteur, Darwin. Correspondence, Nature, vol 452, p 530, April 2008. HTML. The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization. Proc. ICLR 2022. Preprint arXiv/2110.07732. Link. excellent 1995 neural probabilistic text model.[SNT] See also Nakamura and Shikano's 1989 word category prediction model.[NPMa] Compare Konrad Zuse's much earlier 1948 work on theorem proving[ZU48] the first high-level programming language.[BAU][KNU] NY Times article Learning Dexterous In-Hand Manipulation. arxiv:1312.5602 (PDF). arxiv:1912.06680. An LSTM composes 84% of the model's total parameter count. 2018. An LSTM with 84% of the model's total parameter count was the core of OpenAI Five. Link. J. Schmidhuber (Blog, 2006). Is History Converging? Again? history's exponential acceleration since the Big Bang.[OMG] Preprint arXiv/1606.06724. Preprint arXiv/1708.03498. Preprint arXiv/1802.10353. Preprint arXiv/2010.03635. Preprint arXiv/2011.12930. PDF. HTML. HTML overview. OOPS source code in crystalline format. PDF. HTML. Link. J. Schmidhuber (AI Blog, 2020). 30-year anniversary of planning & reinforcement learning with recurrent world models and artificial curiosity (1990). This work also introduced high-dimensional reward signals, deterministic policy gradients for RNNs, and the GAN principle Based on TR FKI-126-90 (1990).[AC90] More. PDF. Partially based on TR FKI-126-90 (1990).[AC90] Report arXiv:1210.0118 [cs.AI], 2015. One Big Net For Everything. Preprint arXiv:1802.08864 [cs.AI], Feb 2018. Preprint: arXiv:1809.01999. Github: World Models. minimization. TR CU-CS-565-91, Univ. Colorado at Boulder, 1991. PDF. More. 1991. PDF. More. PDF. More. Link. arXiv:1112.5309 [cs.AI] PDF. First Experiments with PowerPlay. arXiv:1210.8385 [cs.AI]. [R1] Reddit/ML, 2019. Hinton, LeCun, Bengio receive ACM Turing Award. This announcement contains more comments about Schmidhuber than about any of the awardees. [R2] Reddit/ML, 2019. J. 
Schmidhuber really had GANs in 1990. [R3] Reddit/ML, 2019. NeurIPS 2019 Bengio Schmidhuber Meta-Learning Fiasco. in 1987[META1][META] long before Bengio [R4] Reddit/ML, 2019. Five major deep learning papers by G. Hinton did not cite similar earlier work by J. Schmidhuber. [R5] Reddit/ML, 2019. The 1997 LSTM paper by Hochreiter & Schmidhuber has become the most cited deep learning research paper of the 20th century. [R6] Reddit/ML, 2019. DanNet, the CUDA CNN of Dan Ciresan in J. Schmidhuber's team, won 4 image recognition challenges prior to AlexNet. [R7] Reddit/ML, 2019. J. Schmidhuber on Seppo Linnainmaa, inventor of backpropagation in 1970. [R8] Reddit/ML, 2019. J. Schmidhuber on Alexey Ivakhnenko, godfather of deep learning 1965. [R9] Reddit/ML, 2019. We [R11] Reddit/ML, 2020. Schmidhuber: Critique of Honda Prize for Dr. Hinton [R12] Reddit/ML, 2020. J. Schmidhuber: Critique of Turing Award for Drs. Bengio & Hinton & LeCun [R15] Reddit/ML, 2021. J. Schmidhuber's work on fast weights from 1991 is similar to linearized variants of Transformers Although these MLPs did not yet have deep learning, because only the last layer learned,[DL1] Rosenblatt basically had what much later was rebranded as Extreme Learning Machines (ELMs) without proper attribution.[ELM1-2][CONN21][T22] J. Schmidhuber (AI Blog, 2001). Raw Computing Power. Preprint arXiv/1311.2524, Nov 2013. Preprint arXiv/1703.06870, 2017. PDF. The first paper on policy gradients for LSTM. This approach has become very important in reinforcement learning.[LSTMPG] This experimental analysis of backpropagation did not cite the origin of the method,[BP1-5] also known as the reverse mode of automatic differentiation. the first working algorithms for deep learning of internal representations (Ivakhnenko & Lapa, 1965)[DEEP1-2][HIN] as well as Amari's work (1967-68)[GD1-2] on learning internal representations in deep nets through stochastic gradient descent. Even later surveys by the authors[DL3,3a] failed to cite the prior art.[T22] Link. A misleading "history of deep learning" which goes more or less like this: "In 1969, Minsky & Papert[M69] researchers took a fresh look at the problem in the 1980s."[S20] However, the 1969 book[M69] addressed a "problem" of Gauss & Legendre's shallow learning (circa 1800)[DL1-2] that had already been solved 4 years prior by Ivakhnenko & Lapa's popular deep learning method,[DEEP1-2][DL2] and then also by Amari's SGD for MLPs.[GD1-2] Minsky was apparently unaware of this and failed to correct it later.[HIN](Sec. I)[T22](Sec. XIII) in the 1960s-70s, especially outside of the Anglosphere.[DEEP1-2][GD1-3][CNN1][DL1-2][T22] The Past, Present and Future of Artificial Intelligence. Link. PDF. Much later this was called a probabilistic language model.[T22] PDF. Link. ACM's justification of the 2018 A.M. Turing Award (announced in 2019). WWW link. Local copy 1 (HTML only). Local copy 2 (HTML only). [T22] debunks this justification. [T20a] J. Schmidhuber (AI Blog, 25 June 2020). Critique of 2018 Turing Award for Drs. Bengio & Hinton & LeCun. A precursor of [T22]. [T22] J. Schmidhuber (AI Blog, 2022). Scientific Integrity and the History of Deep Learning: The 2021 Turing Lecture, and the 2018 Turing Award. Technical Report IDSIA-77-21, IDSIA, Lugano, Switzerland, 2022. Debunking [T19] and [DL3a] . the 1991 publication on what's now called "Transformers with linearized self-attention."[FWP0-6][TR5-6] attention terminology in 1993.[ATT][FWP2][R4] See tweet of 2022 for 30-year anniversary. Link. [TUR21] J. 
Schmidhuber (AI Blog, Sep 2021). Turing Oversold. It's not Turing's fault, though. The Turing Test. YouTube video, 2022. Preprint arXiv/1912.02875, 5 Dec 2019. Preprint arXiv/1912.02877, 5 Dec 2019. J. Schmidhuber (AI Blog, 2021). 30-year anniversary. 1991: First very deep learning with unsupervised or self-supervised pre-training. Unsupervised PDF. By 1993, the approach solved problems of depth 1000 [UN2] neural knowledge distillation procedure The systems of 1991 allowed for much deeper learning than previous methods. More. 1992. Based on TR FKI-148-91, TUM, 1991.[UN0] PDF. approaches are now widely used. More. [UN2] J. Schmidhuber. Habilitation thesis, TUM, 1993. PDF. can be found here (depth > 1000). 2006. PDF. It did not cite the much earlier 1991 unsupervised pre-training of stacks of more general recurrent NNs (RNNs)[UN0-3] the first NNs shown to solve very deep problems. (or negative log probability) of the data representation in the level below.[HIN][T22][MIR] This can greatly facilitate very deep downstream learning.[UN0-3] The comment under reference[UN4] applies here as well. Theory of Universal Learning Machines & Universal AI. Link. [VAN1] S. Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, TUM, 1991 (advisor J. Schmidhuber). PDF. More on the Fundamental Deep Learning Problem. Results are essentially identical to those of Schmidhuber's diploma student Sepp Hochreiter (1991).[VAN1] Even after a common publication,[VAN3] the first author of [VAN2] published papers[VAN4] that cited only their own [VAN2] but not the original work. PDF. [VAN4] Y. Bengio. Neural net language models. Scholarpedia, 3(1):3881, 2008. Link. Link. Youtube video [see 28:16]. However, in 2010, Schmidhuber's team in Switzerland showed[MLP1-2] unsupervised pre-training is not necessary Preprint arXiv:1609.08144 (PDF), 2016. Based on LSTM which it mentions at least 50 times. WWW link (retrieved 15 May 2020). Local copy (plain HTML only). Schmidhuber's publications on exactly this topic date back to 1991-93.[UN0-2][UN] already in 1995.[SNT] a general, practical, program-controlled computer. architecture [NEU45]. PDF. J. Schmidhuber (AI Blog, 2021). 80th anniversary celebrations: 1941: Konrad Zuse completes the first working general computer, based on his 1936 patent application. J. Schmidhuber (AI Blog, 2021). 80. Jahrestag: 1941: Konrad Zuse baut ersten funktionalen Allzweckrechner, basierend auf der Patentanmeldung von 1936. Weltwoche, Nr. 33.21, 19 August 2021. PDF. Menu directory status & updates copyrights Scientific Integrity and the History of Deep Learning: The 2021 Turing Lecture, and the 2018 Turing Award AI Blog
      Twitter: @SchmidhuberAI
      (v1: 24 Sep 2021, v2: 31 Dec 2021) Versions since 2021 archived in the Internet Archive This is a point-for-point critique of ACM's justification of the ACM A. M. Turing Award for deep learning, as well as a critique of the Turing Lecture given by the awardees (published by ACM in July 2021). deep learning survey,[DL1] and can also be seen as a short history of the deep learning revolution, at least as far as ACM's erroneous laudation and the Turing Lecture are concerned. 2015 survey of deep learning[DL1] June 2020 article[T20a][R12] version 1 of the present report. (see Executive Summary I, V, II, XII, XIX, XXI, XIII, XIV, XX, XVII). (A) speech recognition, (B) natural language processing, (C) robotics, (D) computer vision, (VII) medicine, astronomy, materials science. A, B, C, D, VII, XVII, VI, XVI). II, V, XX, XVIII) with Dr. Bengio & Dr. Hinton (see Sec. XVII, I). I respond to LBH's recent ACM article (July 2021). expands material in my Critique of the 2019 Honda Prize[HIN] (~3,000 words). Abstract & Outline (~300 words), Introduction (~300 words), Critique of LBH's ACM article (Turing Lecture) of July 2021[DL3a] Executive summary of what's wrong with ACM's laudation (~1,000 words), 21 comments on 21 claims by ACM (~8,000 words), Conclusion (~2,000 words). All backed up by over 300 references (over 10,000 words). The text contains numerous hyperlinks to relevant overview sites from the AI Blog. science is self-correcting."[SV20] they are mine or other people's.[DL1-2][HIN][NASC1-9] The present page is offered as a resource for all good computer scientists who share this inclination. and to fight plagiarism,[FAKE2] collusion rings,[LIT21] and systemic academic corruption in all of their more and less subtle forms.[FAKE] Sec. 2 LBH's 2021 ACM article[DL3a] which necessitated an extension of the first version of this post.[T20a][R12] ACM's official justification[T19] of the 2018 A.M. Turing Award[R1] After the Executive Summary in Sec. 3, Sec. 4 will split ACM's full text[T19] into 21 parts I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII, XIV, XV, XVI, XVII, XVIII, XIX, XX, XXI. Most of the critiques are based on references to original papers and material from the AI Blog.[AIB][MIR][DEC][HIN] publishing yet another misleading overview of the field, this time based on LBH's Turing Lecture.[DL3a] LBH's well-known earlier omissions.[DLC][HIN][T20a] LBH claim to "briefly describe the origins of deep learning"[DL3a] without even mentioning the world's first working deep learning nets by Ivakhnenko and Lapa in 1965[DEEP1-2][R8] (see Sec. II). this class of methods was pioneered in 1991[UN-UN2] (see Sec. II, III). Highway Net, the first really deep feedforward NN.[HW1-3] (see Sec. D, VI). were all driven by my lab:[MOST] In 1991, I had the first very deep NNs based on unsupervised pre-training;[UN-UN2] LSTMs brought essentially unlimited depth to gradient-based supervised recurrent NNs;[LSTM0-17] later our Highway Nets[HW1-3] brought it to feedforward NNs. from 2007[LSTM4,14] based on LSTM[LSTM0-6] (1990s-2005) and CTC (2006).[CTC] our CTC-LSTM-based speech recognition (not that of Hinton) had been on most smartphones for years[GSR][GSR15-19][DL4] (see Sec. A, VI, XI, XV). Similarly for machine translation (see Sec. B). LBH cite Hinton (2012) for "dropout" without mentioning that dropout is just a variant of Hanson's 1990 stochastic delta rule[Drop1-3] (see Sec. XIV). perceptrons through stochastic gradient descent[GD1-3] (without reverse mode backpropagation[BP1]). 
Fukushima who introduced ReLUs in 1969[RELU1-2] (see Sec. XIV). called AlexNet,[GPUCNN4] without mentioning that our earlier groundbreaking deep GPU-based DanNet[GPUCNN1-3,5-8][DAN] did not need ReLUs at all to win 4 earlier object recognition competitions and to achieve superhuman results already in 2011[GPUCNN1-8][R5-6] (see Sec. XIV). XVIII). already in 1965[DEEP1-2][R8] (see Sec. II). earlier fast weights of von der Malsburg (1981) and Feldman (1982).[FAST,FASTa-b][FWP] described in the 1991-93 papers on Fast Weight Programmers and linear Transformers[FWP0-1,6] (see Sec. XVI, XVII-2). dedicate an extra section to attention-based Transformers,[TR1-6] citing Bengio's team (2014) for "soft attention"[ATT14] without citing the much earlier original work of 1991-1993 on soft attention and linear Transformers[FWP,FWP0-2,6][ATT] (see Sec. XVII-1, XVI). LBH claim that Bengio's team[NPM] of text compression[SNT] (see Sec. XVI, XVII-1). LBH cite Bengio's 2014 paper on Generative Adversarial Networks (GANs)[GAN0-1] without mentioning that GANs are instances of the Adversarial Curiosity Principle of 1990[AC90-20][MIR](Sec. 5) (see Sec. XVII). In summation, LBH have repeatedly chosen to ignore the previous well-known critiques[DLC][HIN][T20a] and deep learning surveys,[DL1-2] and ACM's peer review process failed to catch this. ACM's Code of Ethics and Professional Conduct[ACM18] states: "Computing and deep learning (e.g., Sec. I), ACM lauds Numerous references can be found under the relevant section links I-XXI which adhere to the sequential order of ACM's text[T19] Sec. II: it became really deep in 1991 in my lab, unsupervised pre-training of NNs, supervised LSTM. Sec. I contains 4 subsections A, B, C, D A: Speech Recognition (see also Sec. VI & XI & XV): The first superior end-to-end neural speech recognition combines two methods from my lab: LSTM (1990s-2005) and CTC (2006), which were Hinton (2012) and Bengio (XV) our revolutionary CTC-LSTM which was soon on most smartphones. Sec. B: Natural Language Processing (see also Sec. VI & XI & XVI): (soon used for several billions of was also based on our LSTM. Sec. C: Robotics. most visible breakthroughs Sec. D: Computer Vision XVIII & XIV & XI & VI) and applied to speech. All before LeCun's CNN work (XVIII). deep NNs pre-training (in contrast to Hinton's claims). Our DanNet was the first CNN fast & deep enough for superior computer vision in 2011, winning 4 image recognition contests in a row is an open-gated version of our earlier Highway Nets. Sec. XIV: deep & fast CNN (where LeCun participated), Sec. XI: ACM mentions GPU-accelerated NNs deep GPU-NN of 2010 debunked unsupervised pre-training (introduced by myself in 1991 and later championed by Hinton), and our GPU-CNN of 2011 (DanNet) was the first XVIII: Fukushima and Waibel (see Sec. D). The first application of CNNs with backpropagation to biomedical/biometric images is due to Baldi and Chauvin.[BA93] VII: ACM explicitly mentions medicine and first to win medical imaging competitions Sec. XII & XIX & XXI: Modern backpropagation XIII & II & V III & IX & X & XX): Sec. XX: ACM credits LeCun for work on Sec. XXI: ACM credits LeCun for work on XV: ACM credits Bengio for hybrids of NNs and probabilistic models of sequences. CTC-LSTM A & B). XVI: ACM We started this in 1990-93 long before LBH Sec. 
XVII: Artificial Curiosity vanishing gradients (1991), metalearning (1987), unsupervised pre-training (1991), compressing or distilling one NN into another (1991), learning sequential attention with NNs (1990), fast weight programmers using and other topics.[R2-R6] Sec. IV is on Turing (1936) and his predecessors Critique of LBH's ACM article (Turing Lecture) of July 2021. Sec. Conclusion: In the recent decade of deep learning, (speech recognition, language translation, etc.) on billions of devices (also healthcare applications) Sec. II & III & V & XII & XIII & XVII & XIV & XIX & XX & XXI. In what follows, ACM's full text [T19] is split into 21 parts I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII, XIV, XV, XVI, XVII, XVIII, XIX, XX, XXI.

      Critique of 2018 Turing Award: LBH and their co-workers have contributed certain useful improvements of existing deep learning methods.[CNN2,4][CDI][LAN][RMSP][XAV][ATT14][CAPS] However, the central foundations were laid by others: deep learning multilayer perceptrons (1965),[DEEP1-2][R8] stochastic gradient descent for multilayer perceptrons (1967),[GD1-3] modern backpropagation (1970),[BP1-2][R7] architectures of recurrent NNs (1925-56)[I25][MC43][K56] and convolutional NNs (1979),[CNN1] principles of generative adversarial NNs and artificial curiosity (1990),[AC90,90b][AC20] unsupervised pre-training for deep NNs (1991),[UN1-2] vanishing gradients (1991)[VAN1] & Long Short-Term Memory or LSTM (Sec. A), GPU-accelerated NNs (2004),[GPUNN][DAN][DAN1][GPUCNN5] NNs with over 100 layers (2015),[HW1-3][R5] and transformer-like[TR1-6][FWP] attention[FWP][ATT] through fast weight programmers (1991).[FWP0-2,6] [DL1-2][R2-R8] Often LBH failed to cite essential prior work, even in their later surveys.[DL3,DL3a][DLC][HIN][MIR](Sec. 21)[R2-R5, R7-R8] This may explain some of ACM's misattributions.[T19] See Sec. II & III & V & XIII & X & XVII & XII & XVIII & XX. By the 2010s,[DEC] the deep NNs of our team were heavily used in academia and industry,[DL4] in the application areas mentioned by ACM (labeled as A, B, C, D) below: A. Speech Recognition. (A1) Long Short-Term Memory or LSTM (1990s-2005)[LSTM0-6] overcame the vanishing gradient problem, identified and analysed in 1991 by my student Sepp Hochreiter.[VAN1] This happened long before the similar work of Bengio (see Sec. XVII).[MIR] (Sec. 3,Sec. 4) LSTM was refined with my student Felix Gers[LSTM2] through "forget gates" based on end-to-end-differentiable fast weights.[MIR](Sec. 8)[FWP,FWP0-1] (A2) Connectionist Temporal Classification by my student Alex Graves et al. (2006).[CTC] Our team successfully applied CTC-trained LSTM to speech in 2007[LSTM4] (also with hierarchical LSTM stacks[LSTM14]). This was very different from the older hybrid approach based on hidden Markov models (HMMs)[BW][BRI][BOU] (Sec. XV). Hinton et al. (2012) still used the old hybrid approach[HYB12] and did not compare it to CTC-LSTM. (A3) Our CTC-trained LSTM became the first recurrent NN (RNN) to win international competitions. Alex Graves later reused our end-to-end neural speech recognizer[LSTM4][LSTM14] as a postdoc in Hinton's lab.[LSTM8] CTC-LSTM dramatically improved Google's speech recognition.[GSR][GSR15][DL4] Google's on-device speech recognition[GSR19] (on the phone, not any longer on the server) is also based on LSTM[MIR](Sec. 4) (see Sec. VI & XI & XV). B. Natural Language Processing. In 1995, we already had an excellent neural probabilistic model of text[SNT] (see Sec. XVI). In 2001, we showed that LSTM can learn languages unlearnable by traditional models such as HMMs.[LSTM13] See also Sec. VI & XI & XV. The dominant machine translation systems of the 2010s were based on LSTM, sometimes tailored by attention mechanisms of Bengio's team.[ATT14][FWP] However, such attention mechanisms also have their roots in my lab (1991);[FWP][FWP0-2,6] see Sec. XVI. C. Robotics & RL etc. Since 2003, our team has used LSTM for Reinforcement Learning (RL) and robotics.[LSTM-RL][RPG][LSTMPG] In the 2010s, this combination became widely used. For example, in 2018, a PG-trained LSTM was the core of OpenAI's famous Dactyl, which learned to control a dextrous robot hand without a teacher.[OAI1][OAI1a] DeepMind beat a pro player in the game of Starcraft, which is theoretically harder than Chess or Go[DM2] in many ways, using Alphastar, whose brain has a deep LSTM core trained by PG.[DM3] A PG-trained LSTM was also the core of OpenAI Five, which learned to defeat human experts in the Dota 2 video game (2018).[OAI2] Bill Gates called this a "huge milestone in advancing artificial intelligence".[OAI2a][MIR](Sec. 4)[LSTMPG] Apart from A, B, C above, LSTM is used in healthcare, chemistry, molecular design, lip reading, speech synthesis,[AM16] predicting what's going on in nuclear fusion reactors, and so on.[DEC][DL4] A substantial fraction of the neural network inference workload in Google's datacenters was being used for LSTM (only 5% for the CNNs of Sec.
      D).[JOU17] Apparently the first LSTM journal paper[LSTM1][R5] is now the most cited deep learning research paper of the 20th century. D. Computer Vision was revolutionized in the 2010s by a particular feedforward neural net (NN) called the convolutional NN (CNN).[CNN1-4] The basic CNN architecture with convolutional and downsampling layers is due to Fukushima (1979),[CNN1] who also introduced the now widely used rectified linear units (ReLUs) in 1969.[RELU1] In 1987, NNs with convolutions were combined by Waibel with weight sharing and backpropagation.[CNN1a] Waibel did not call this CNNs but TDNNs. The important CNN ingredient called max-pooling was introduced by Yamaguchi et al. for TDNNs in 1990[CNN3a] and by Weng et al. for higher-dimensional CNNs in 1993.[CNN3] Since 1989, LeCun's team has contributed improvements of CNNs, especially for images[CNN2,4] (see Sec. XVIII). Finally, my own team showed in 2010[MLP1] that unsupervised pre-training is not necessary to train deep NNs, contrary to claims by Hinton[VID1] who said that "nobody in their right mind would ever suggest" this. Then we made deep CNNs fast on GPUs. Our fast GPU-based CNN of 2011,[GPUCNN1] known as DanNet,[DAN,DAN1][R6] was much deeper and faster than earlier GPU-accelerated CNNs of 2006.[GPUCNN] DanNet went on to win a series of computer vision contests, winning four of them in a row (15 May 2011, 6 Aug 2011, 1 Mar 2012, 10 Sep 2012).[GPUCNN5] At IJCNN 2011 in Silicon Valley, DanNet blew away the competition and achieved the first superhuman visual pattern recognition[DAN1] in an international contest (where LeCun's team took a distant second place). DanNet was also the first deep CNN to win a Chinese handwriting contest (ICDAR 2011) and an image segmentation contest (ISBI, May 2012). Our CVPR paper on DanNet appeared in mid-2012;[GPUCNN3] a few months later, the similar GPU-accelerated AlexNet of Hinton's student Krizhevsky won the ImageNet[IM09] 2012 contest[GPUCNN4-5][R6] (now also without unsupervised pre-training, citing DanNet). Our CNN image scanners were 1000 times faster than previous methods.[SCAN] The VGG network (ImageNet 2014 winner)[GPUCNN9] and other highly cited CNNs[RCNN1-3] further extended the work of 2011.[MIR](Sec. 19) ResNet, the ImageNet 2015 winner[HW2] (Dec 2015) and currently the most cited neural network,[MOST] is a version (with open gates) of our earlier Highway Net (May 2015).[HW1-3][R5] The Highway Net is actually the feedforward net version of vanilla LSTM.[LSTM2] It was the first working, really deep feedforward NN with hundreds of layers (previous NNs had at most a few tens of layers). See also Sec. XVIII & XIV & XI & VI.
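
      The relationship between the Highway Net and ResNet mentioned above can be made explicit in a few lines. The sketch below (PyTorch) shows a highway layer with LSTM-like multiplicative gates t(x) and g(x), and the special case with both gates fixed open, which is exactly a residual (ResNet-style) layer. Layer sizes, the tanh nonlinearity and the uncoupled gates are illustrative choices, not the exact setup of the cited papers.

      import torch
      import torch.nn as nn

      class HighwayLayer(nn.Module):
          # y = t(x) * h(x) + g(x) * x with learned sigmoid gates t and g
          # (published Highway Nets often couple the gates as g = 1 - t).
          def __init__(self, dim):
              super().__init__()
              self.h = nn.Linear(dim, dim)          # candidate transformation h(x)
              self.t = nn.Linear(dim, dim)          # transform gate t(x)
              self.g = nn.Linear(dim, dim)          # carry gate g(x)
          def forward(self, x):
              h = torch.tanh(self.h(x))
              t = torch.sigmoid(self.t(x))
              g = torch.sigmoid(self.g(x))
              return t * h + g * x

      class ResidualLayer(nn.Module):
          # Highway layer with both gates fixed open (t = g = 1): y = h(x) + x.
          def __init__(self, dim):
              super().__init__()
              self.h = nn.Linear(dim, dim)
          def forward(self, x):
              return torch.tanh(self.h(x)) + x

      x = torch.randn(8, 16)
      print(HighwayLayer(16)(x).shape, ResidualLayer(16)(x).shape)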

      Critique of 2018 Turing Award: important foundations of NNs appeared long before the 1980s. The first non-learning recurrent NN (RNN) architecture (the Lenz-Ising model) was analyzed by physicists in the 1920s.[L20][I25][K41][W45] Formal NN-like architectures were also discussed in 1943 by McCulloch and Pitts[MC43] and formally analyzed in 1956 by Kleene.[K56] In 1972, Amari reused the Lenz-Ising model to build a learning RNN, later sometimes called the Hopfield network or Amari-Hopfield Network.[AMH1-3] Turing's unpublished report of 1948 discussed artificial evolution[TUR1] and NN-like machines. Rosenblatt's perceptron with a single adaptive layer learned in 1958[R58] (see also Joseph[R61] on related earlier work); Widrow & Hoff's similar Adaline learned in 1962.[WID62] Such shallow learning goes back to Gauss & Legendre's linear regression and the method of least squares circa 1800.[DL1-2] Multilayer perceptrons (MLPs) were discussed by Steinbuch[ST61-95] (1961), Joseph[R61] (1961), and Rosenblatt[R62] (1962), who wrote about "back-propagating errors" in an MLP with a hidden layer,[R62] but did not yet have a general deep learning algorithm for deep MLPs (what's now called backpropagation is quite different and was first published by Linnainmaa in 1970[BP1-BP5][BPA-C]). Compare also Selfridge's multilayer Pandemonium[SE59] (1959). In 1965, Ivakhnenko & Lapa published the first working deep learning MLPs (containing the now popular multiplicative gates).[DEEP1-2][DL1-2] A paper of 1971[DEEP2] already described a deep learning net with 8 layers, trained by their highly cited method which was still popular in the new millennium,[DL2] especially in Eastern Europe, where much of Machine Learning was born.[MIR](Sec. 1)[R8] LBH failed to cite this, just like they failed to cite Amari,[GD1] who in 1967 proposed stochastic gradient descent[STO51-52] (SGD) for MLPs and whose implementation[GD2,GD2a] (with Saito) learned internal representations at a time when compute was billions of times more expensive than today (see also Tsypkin's work[GDa-b]). Fukushima's deep convolutional NN architecture was first introduced in the 1970s;[CNN1] his very popular ReLU appeared already in 1969.[RELU1-2] See also Sec. XIII, III, V, VIII, IX, and X. A misleading "history of deep learning" has been propagated by LBH & co-authors, e.g., Sejnowski[S20] (see Sec. XIII). It goes more or less like this: in 1969, Minsky & Papert[M69] showed that shallow NNs are greatly limited and "researchers took a fresh look at the problem in the 1980s."[S20] However, as mentioned above, the 1969 book[M69] addressed a "problem" of Gauss & Legendre's shallow learning (~1800)[DL1-2] that had already been solved 4 years prior by Ivakhnenko & Lapa's popular deep learning method[DEEP1-2][DL2] (and then also by Amari's SGD for MLPs[GD1-2]). Minsky was apparently unaware of this and failed to correct it later[HIN](Sec. I) (but see a 1989 paper[MOZ]). However, deep learning only became really deep in 1991 in my lab.[UN-UN3] See Sec. 1 of the overview:[MIR] First Very Deep NNs, Based on Unsupervised Pre-Training (1991). This approach solved "Very Deep Learning" tasks of depth > 1000.[UN2][DL1][UN] (By 2003, LSTM variants successfully dealt with language problems of depth up to 30,000[LSTM17] and more.) My lab then drove the shift from unsupervised pre-training to purely supervised learning twice (1991-95; 2006-10).[HIN](Sec. II)[MIR] (Sec. 19) See Sec. III. Note that LSTMs brought essentially unlimited depth to gradient-based supervised recurrent NNs; Highway Nets[HW1-3] brought it to feedforward NNs.[MOST]

      Critique of 2018 Turing Award: many of the methods in question were created by others (Sec. III):[DLC][DEEP1-2][BP1][DL1-2][R7-R8][R2-R4] deep learning multilayer perceptrons (1965),[DEEP1-2][R8] stochastic gradient descent for multilayer perceptrons (1967),[GD1-3] modern backpropagation (1970),[BP1,2][R7] architectures of recurrent NNs (1925-56)[I25][MC43][K56] and convolutional NNs (1979),[CNN1] principles of generative adversarial NNs and artificial curiosity (1990),[AC90,90b][AC20] unsupervised pre-training for deep NNs,[UN1-2] the vanishing gradient problem (1991)[VAN1] & solutions to it (Sec. A), GPU-accelerated NNs (2004),[GPUNN][GPUCNN5] and other foundations.[DL1-2][R2-R8] Often LBH failed to cite essential prior work.[DLC][HIN][MIR](Sec. 21) See Sec. II & V & XIII & IX & X & XVII & XII & XVIII & XX & I. Consider also deeplearning.net, which until 2019 advertised deep learning as "moving beyond shallow machine learning since 2006",[DL7] referring to Hinton's[UN4] and Bengio's[UN5] unsupervised pre-training, although we had this type of deep learning already in 1991;[UN][UN1-2] see Sec. II & XVII (5). Not to mention Ivakhnenko's even earlier supervised layer-wise training of deep NNs,[DEEP1-2] which Hinton,[UN4] Bengio,[UN5] and LBH[DL3,DL3a] did not cite either. See Sec. X.

      Critique of 2018 Turing Award: my comments systematically track the sequential order of ACM's claims.[T19]

      ACM's statement on Turing is greatly misleading, like some of its other statements.[T19] It was Goedel who identified the fundamental limits of any type of computation-based AI.[GOD][BIB3][MIR](Sec. 18)[GOD21,21a] Much of early AI in the 1940s-70s was actually about theorem proving.[ZU48][NS56]

      In 1936, Turing introduced the Turing Machine.[TUR] He rederived the above-mentioned result.[CHU][TUR][HIN][GOD21,21a][TUR21][LEI21,21a] In the same year of 1936, Emil Post published yet another independent universal model of computing.[POS] (See also my reply to Hinton, who criticized my website on Turing without suggesting any fact-based corrections.[HIN]) Goedel, in turn, raised what is essentially the open problem "P=NP?" in his famous letter to John von Neumann (1956).[GOD56][URQ10] Likewise, Konrad Zuse (1910-1995) created the world's first working programmable general-purpose computer 1935-41. His patent application of 1936[ZU36-38][Z36][RO98][ZUS21] described digital circuits, predating Claude Shannon's 1937 thesis on digital circuit design.[SHA37] Zuse also created the first high-level programming language in the early 1940s.[BAU][KNU] Although his machine lacked an explicit conditional jump instruction, it was later shown to be universal in principle.[RO98]

      Critique of 2018 Turing Award: the conceptual and technical breakthroughs in question came from elsewhere: multilayer perceptrons that learn internal representations (1965),[DEEP1-2][R8] stochastic gradient descent for multilayer perceptrons (1967),[GD1-3] modern backpropagation (1970),[BP1,2][R7] architectures of recurrent NNs (1925-56)[I25][MC43][K56] and convolutional NNs (1979),[CNN1] principles of generative adversarial NNs and artificial curiosity (1990),[AC][AC90,90b][AC10][AC20] unsupervised pre-training for deep NNs (1991),[UN1-2][UN] vanishing gradients (1991)[VAN1] & solutions to it (Sec. A),[LSTM0-17][CTC] GPU-accelerated NNs (2004),[GPUNN][GPUCNN5] record-breaking deep supervised NNs (2010)[MLP1-2] and contest-winning deep CNNs (2011),[DAN][DAN1][GPUCNN5] NNs with over 100 layers (2015),[HW1-3][R5] transformer-like[TR1-6][FWP] attention[FWP][ATT] through fast weight programmers (1991),[FWP0-2,6] and more.[DL1-2][R2-R8] Often LBH failed to cite essential prior work.[DL3,DL3a][DLC][HIN][MIR](Sec. 21)[R2-R5,R7,R8,R11] See Sec. II & I & III & XIII & X & XVII & XII & XVIII & XX.

      Critique of 2018 Turing Award "advances in natural language processing" and in speech supervised NNs and CNNs achieved by our group 2010-2011[MLP1-2][DAN][DAN1][GPUCNN5][R6] and through Highway Net-like NNs (2015),[HW1-3][R5] although the principles of CNNs were invented and developed by others since the 1970s.[CNN1-4] See Sec. D & XVIII & XIV as well as Sec. 4 & Sec. 19 of the overview.[MIR]

      Critique of 2018 Turing Award: Baldi and Chauvin (1993) had the first application of CNNs with backpropagation to biomedical/biometric images.[BA93] Our DanNet[DAN][DAN1][GPUCNN5] was the first NN to win a medical imaging contest through deep learning (Sept 2012, on cancer detection),[GPUCNN5,8] and we were able to greatly improve steel defect detection.[ST] All of this happened before the similar GPU-accelerated AlexNet of Hinton's student Krizhevsky[GPUCNN4-5][R6] and the VGG network.[GPUCNN9] DanNet also won the contest on mitosis detection.[MGC][GPUCNN5,8] (See also the approach of Sec. D & XI.)

      Critique of 2018 Turing Award: the central methods were published by other researchers, whom LBH often used without citing them.[DL1][DLC][HIN][R2-R4][R7-R8] See Sec. V & XII & XIX & II & III & XIII & XVII & X & I.

      Critique of 2018 Turing Award: the foundations were laid by researchers whom LBH failed to cite, even in later work.[HIN][DLC][DL1-2][DEEP1-2][RELU1-2][R7-R8] See Sec. II & III & XIII & V & X & XIV & I.

      Critique of 2018 Turing Award: the term "deep learning" was first introduced to Machine Learning by Dechter (1986), and to NNs by Aizenberg et al (2000).[DL2] To my knowledge, LBH have never cited them. (Margin note: our 2005 paper on deep RL[DL6,6a] was apparently the first machine learning publication with the phrase "learn deep" in the title.) LBH started talking about "deep learning ... moving beyond shallow machine learning since 2006",[DL7] referring to their unsupervised pre-training methods of 2006. See Sec. III. However, others built careers on this notion long before LBH recognized it.[DEEP1-2][CNN1][HIN][R8][DL1][DLC] Even deep learning through unsupervised pre-training was introduced by others.[UN1-3][R4][HIN](Sec. II) See Sec. II & III & XIII & V & I.

      Critique of 2018 Turing Award: much of this prior work was simply ignored by LBH's papers[HIN][R7-R8][R2-R5] (see Sec. V & II & III & I & XIII & XII & XIX & X & XVII).

      ACM correctly mentions advancements through GPUs. The first to use GPUs for NNs were Jung & Oh (2004).[GPUNN][GPUCNN5] In 2010, my team made GPU-based NNs fast and deep enough to set an important benchmark record,[MLP1-2] showing that unsupervised pre-training (pioneered by myself in 1991) is not necessary to train deep NNs, contrary to Hinton's claims.[VID1] By 2011, our CNNs were deep and fast enough[DAN][DAN1][GPUCNN5] to achieve superhuman computer vision (explicitly mentioned by ACM) for the first time[R6] (see Sec. D).

      Furthermore, by the mid 2010s, speech recognition and machine translation (explicitly mentioned by ACM) were actually dominated by LSTM and CTC of our team.[LSTM1-4][CTC] In particular, as mentioned in Sec. A, this end-to-end approach was superior to hybrid methods based on models such as HMMs.[BW][BOU][BRI][HYB12] As mentioned in Sec. B and XVI, the first superior end-to-end neural machine translation was also based on LSTM.

      Critique of 2018 Turing Award: ACM's statement on backpropagation is "less wrong" than Honda's[HIN](Sec. I) but still misleading. ACM (and apparently even other award committees[HIN](Sec. I)) seem to credit backpropagation to Rumelhart et al. (1985-86),[RUM] although Werbos had already applied it to NNs in 1982.[BP2] And the article[RUM] even failed to mention Linnainmaa, the inventor of this famous algorithm for credit assignment in networks (1970).[BP1] Kelley already had a precursor thereof in the field of control theory;[BPA] see also later work of the early 1960s.[BPB][BPC][R7] Rumelhart et al. showed experimentally that backpropagation can yield useful internal representations in hidden layers of NNs.[RUM] But this was essentially just an experimental analysis of a known method.[BP1-2] A history of backpropagation can be found at Scholarpedia[DL2] and in my award-winning survey.[DL1] Also see Sec. XIX, II.

      Some claim that "backpropagation is just the chain rule of Leibniz (1676) & L'Hopital (1696)." No, it is the efficient way of applying the chain rule to big networks with differentiable nodes (there are also many inefficient ways of doing this). It was not published until 1970.[BP1] Compare a recent debate:[HIN] it is true that in 2018, Hinton[AOI] credited Rumelhart[RUM] with the "invention" of backpropagation, yet Hinton himself has accepted credit for "creating" the method and for other things he didn't do.[HIN] Neither in a popular book[AOI] nor in other recent work[DL3,DL3a] did he cite Linnainmaa (1970),[BP1] the true creator.[BP4-5] It is true that his 2015 survey[DL3] does cite Werbos (1974), who however described the method correctly only later in 1982[BP2] and also failed to cite Linnainmaa.[BP1] Compare the 1967-68 work of Amari:[GD1-3] to my knowledge the first to propose and implement stochastic gradient descent[STO51-52] for multilayer perceptrons (albeit without the efficient reverse mode gradient descent method now known as backpropagation[BP1]); see also Tsypkin's work of 1966.[GDa-b] Linnainmaa's backpropagation method was already well-known.[BP5][DL1-2][DLC] It wasn't created by "lots of different people" as Hinton suggested,[AOI][HIN][R11] but by one person who published first[BP1] and therefore should get the credit.
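
      To illustrate the distinction between the chain rule itself and its efficient reverse-mode application, here is a tiny NumPy sketch for a two-layer net: a single backward sweep from the loss toward the inputs reuses the intermediate gradient dL/dh, so all weight gradients are obtained for roughly the cost of one extra forward pass. The shapes and the squared-error loss are arbitrary illustrative choices.

      import numpy as np

      rng = np.random.default_rng(0)
      x = rng.normal(size=(3,))                  # input
      W1 = rng.normal(size=(4, 3))               # first-layer weights
      W2 = rng.normal(size=(1, 4))               # second-layer weights
      y_target = np.array([1.0])

      # forward pass, keeping intermediates
      a1 = W1 @ x
      h = np.tanh(a1)
      y = W2 @ h
      loss = 0.5 * np.sum((y - y_target) ** 2)

      # backward pass: one sweep applying the chain rule from the output backwards
      dL_dy = y - y_target
      dL_dW2 = np.outer(dL_dy, h)                # gradient for W2
      dL_dh = W2.T @ dL_dy                       # reused below for everything upstream
      dL_da1 = dL_dh * (1.0 - np.tanh(a1) ** 2)  # through the tanh nonlinearity
      dL_dW1 = np.outer(dL_da1, x)               # gradient for W1
      print(loss, dL_dW1.shape, dL_dW2.shape)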

      Critique of 2018 Turing Award: ACM credits Hinton for the Boltzmann Machine (BM),[BM] a learning method.[HIN] Recently, however, I learnt through a reader that even the BM paper[BM] did not cite prior relevant work by Sherrington & Kirkpatrick[SK75] and Glauber.[G63] (Compare related work.[H86][H88][S93]) Moreover, long before that, Ivakhnenko & Lapa already trained multilayer perceptrons with arbitrarily many layers.[DEEP1-2][HIN] See Sec. II & V & X.[MIR](Sec. 1)[R8]

      As mentioned in Sec. II, Sejnowski's rather self-serving "history of deep learning"[S20] claims more or less: in 1969, Minsky & Papert[M69] showed the limitations of shallow NNs, and researchers "took a fresh look at the problem in the 1980s."[S20] However, the 1969 book[M69] addressed a "deep learning problem" (a limitation of Gauss & Legendre's shallow learning around 1800[DL1-2]) that had already been solved four years prior (see Sec. II); deep learning research was also alive and kicking in the 1970s, especially outside of the Anglosphere.[DEEP2][GD1-3][CNN1][DL1-2]

      Critique of 2018 Turing Award Dropout is actually a variant of Hanson's much earlier stochastic delta rule (1990).[Drop1-3] Hinton's 2012 paper and his later patent did not cite this either. Moreover, dropout was not needed to win image recognition contests, as we showed already in 2011 in a contest where LeCun's team participated as well;[DAN1] see Sec. D above. Back then, the really decisive factor was the speed and depth of deep CNNs through GPUs.[GPUCNN1,3,5][R6] Already before ImageNet 2012,[R6] our fast deep CNN called DanNet had a monopoly on winning computer vision competitions.[GPUCNN5] It more than "halved the error rate for object recognition" (ACM's wording) in a contest already in 2011,[GPUCNN2][DAN,DAN1][R6] long before the similar system of Hinton's student. See Sec. D as well as Sec. 19 of the overview.[MIR]
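
      For readers unfamiliar with the two techniques, the following deliberately simplified NumPy sketch (illustrative only; parameter shapes and noise levels are arbitrary assumptions, no code from the cited papers is reproduced) contrasts them: the stochastic delta rule perturbs each weight with its own Gaussian noise during the forward pass, while dropout multiplies activations by Bernoulli masks.

import numpy as np

rng = np.random.default_rng(0)

def sdr_forward(x, W_mean, W_std):
    # Stochastic delta rule (illustrative): each weight is sampled from a
    # per-weight Gaussian; mean and std would both be trained.
    W_sample = W_mean + W_std * rng.standard_normal(W_mean.shape)
    return x @ W_sample

def dropout_forward(x, W, p=0.5):
    # Dropout (illustrative): drop each input unit with probability p,
    # i.e. apply a multiplicative Bernoulli mask to the activations.
    mask = (rng.random(x.shape) > p).astype(x.dtype)
    return (x * mask) @ W

x = rng.standard_normal(4)
W_mean = rng.standard_normal((4, 3))
W_std = 0.1 * np.ones((4, 3))
print(sdr_forward(x, W_mean, W_std))
print(dropout_forward(x, W_mean))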

      Critique of 2018 Turing Award Hybrid NN/HMM approaches to speech recognition have existed since the late 1980s.[BW][BRI][BOU] The first superior end-to-end neural speech recognition, however, was based on two methods from my lab: LSTM (1990s-2005)[LSTM0-6] and CTC[CTC] (2006), which were applied to speech in 2007.[LSTM4][LSTM14] CTC-LSTM is end-to-end-neural and thus very different from (and superior to) the hybrid methods used since the late 1980s.[BW][BRI][BOU][HYB12] See also Sec. A.

      Critique of 2018 Turing Award 5 years earlier, in 1995, we already had a similar, excellent neural probabilistic text model.[SNT] Bengio[NPM] characterizes it only briefly as "related" (see also Pollack's earlier work on embeddings of words and other structures[PO87][PO90]). In the 2010s, was actually the LSTM of our team,[LSTM0-6] which Bloomberg called the "arguably the most commercial AI achievement."[AV1][MIR](Sec. 4) See Sec. B. Bengio's team[ATT14] has indeed become important. For example, it helped to further improve Facebook's LSTM-based translation (see Sec. B). adaptive neural sequential attention: end-to-end-differentiable "soft" attention in the latent space of Fast Weight Programmers (FWPs),[FWP2][FWP] and "hard" attention (in observation space) in the context of RL[ATT][ATT0-1] (1990). attention-based Transformers[TR1-6] are FWPs of 1991[FWP0-1] which have become a popular alternative to RNNs. My FWP of 1991[FWP0-1] (now often called keys and values for self-attention).[TR1-6][FWP] the 2010s,[DEC] Transformers[TR1-2] a traditional LSTM domain (see Sec. B). rapidly learn to solve quickly[LSTM13,17] linear Transformers or Performers[TR5-6] which are formally equivalent to my 1991 FWPs (apart from normalization).[FWP6][FWP] In 1993, I introduced the attention terminology[FWP2] now used in this context,[ATT] and RNNs that program themselves.
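
      As a concrete illustration of the outer-product fast weight idea sketched above, here is a deliberately simplified NumPy example (dimensions, initialization and the absence of any normalization are illustrative assumptions, not details from the cited papers): projection matrices produce key, value and query vectors, the fast weight matrix is updated by an additive outer product of value and key, and the output is the fast weight matrix applied to the query.

import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # size of keys/values/queries

def fwp_step(W_fast, x, Wk, Wv, Wq):
    # One step of an outer-product fast weight update, i.e. unnormalized
    # linear attention: W_fast is "programmed" by the incoming data.
    k, v, q = Wk @ x, Wv @ x, Wq @ x
    W_fast = W_fast + np.outer(v, k)    # additive outer-product update
    y = W_fast @ q                      # retrieval with the query
    return W_fast, y

W_fast = np.zeros((d, d))
Wk, Wv, Wq = (rng.standard_normal((d, d)) for _ in range(3))
for x in rng.standard_normal((5, d)):   # a short input sequence
    W_fast, y = fwp_step(W_fast, x, Wk, Wv, Wq)
print(y.shape)                          # (8,)

      As noted in the text, apart from normalization this is the form taken by linear Transformers or Performers.[TR5-6]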

      See[MIR](Sec. 9)[R4] for my related priority dispute on attention with Hinton. He was the reviewer of my 1990 paper,[ATT2] yet he did not cite it in his own later work on this topic.[ATT3]

      Critique of 2018 Turing Award GANs[GAN0-1] (2010-2014) are actually a simple application[AC] of the adversarial curiosity (AC) principle from 1990[AC90,90b][AC20] (see also surveys[AC09-10]). This principle is now widely used for exploration in RL (e.g., Sec. C) and for image synthesis[GAN1] (also mentioned by ACM in Sec. XVIII). predictor NN minimizes its error, while the generator NN tries to make outputs that maximize this error: one net's loss is the other net's gain. 4 years before the GAN paper,[GAN1] a well-known 2010 survey[AC10] summarised the generative adversarial NNs of 1990 as follows: a whether the controller's (or generator's) output is in a given set.[AC20][AC] early adversarial machine learning settings[S59][H90] neither involved unsupervised NNs nor were about modeling data nor used gradient descent.[AC20]) Bengio et al. neither cited the original work[AC90,90b][AC20] nor corrected their erroneous claims[GAN1] about the other (1991).[PM1-2][AC20][R2][MIR](Sec. 5) Bloomberg,[AV1] their NIPS 2014 paper[GAN1] and some of the erroneous claims it made about my prior work.[AC20] Goodfellow eventually admitted that PM is adversarial (his paper[GAN1] still claims the opposite), but emphasized that it's not generative. However, the even earlier AC[AC90,90b][AC10][AC20] is both adversarial and generative (its generator contains probabilistic units[AC90] like in StyleGANs[GAN2]). When the authors[GAN1] I published one myself in the hopes of correcting the annals of history.[AC20] that they are instances of my earlier work.[R2][AC20] vanishing gradient problem,[MIR](Sec. 3)[VAN1] Bengio published his own,[VAN2] without citing Sepp. was settled in favor of Sepp.[VAN1] However, even after a common publication,[VAN3] Bengio published papers[VAN4][XAV] are poor indicators of truly pioneering work.[NAT1] (Margin note: Bengio states[YB20] that in 2018 he one must at least clarify it later,[DLC] Bengio also claims[YB20] that in 1995 my publications on exactly this topic date back to 1991-93.[UN0-2][UN] which I started in 1987[META1][META] long before Bengio that he did it before me.[R3] Bengio also writes[YB20] that in Regarding attention-based Transformers,[TR1-6] Bengio[DL3a] cites his own team (2014) for "soft attention" without citing my much earlier original work of 1991-1993 on soft attention and linear Transformers.[FWP,FWP0-2,6] Bengio has also heavily used our LSTM (see Sec. A-C), "gated recurrent units (GRU)"[LSTMGRU] for a variant of our vanilla LSTM architecture[LSTM2] (2000) which he did not cite although our work[LSTM2] was the one that introduced gated recurrent units. In addition, our team automatically evolved lots of additional LSTM variants and topologies already in 2009[LSTM7] without changing the name of the basic method. learn to count[LSTMGRU2] nor learn simple non-regular languages;[LSTMGRU2] they according to Google Brain.[LSTMGRU3]) unsupervised pre-training for deep NNs.[UN0-4][HIN](Sec. II)[MIR](Sec. 1) Hinton's paper[UN4] (2006) appeared long after my earlier work on this[UN0-2] the first NNs shown to solve very deep problems (see Sec. II above).[UN] It was published in 1991-92[UN1] when compute was about 1000 times more expensive than in 2006. survey (2015),[DL3][DLC] See also Sec. II & III. compressing or distilling one NN into another.[UN0-2][DIST1-2][MIR](Sec. 
2) Hinton[DIST2] (2006) did not cite my much earlier original work on this (1991),[UN1][UN] not even in his later patent application fast weight programmers[FWP][FWP0-4a] through tensor-like outer products (1991-2016) and their motivation[FWP2][FWP4a][MIR](Sec. 8) (see also Sec. XVI above). learning sequential attention with NNs.[MIR](Sec. 9) Hinton[ATT3] (2010) our much earlier work on this[ATT1][ATT] although he was both reviewer and editor of my summary[ATT2] (1990; see Sec. XVI above).
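
      The minimax principle described above ("one net's loss is the other net's gain") can be illustrated with a deliberately tiny toy example. The following NumPy sketch is illustrative only (the environment function, linear predictor, tanh generator and learning rate are arbitrary choices, not taken from the 1990 or 2014 papers): the predictor performs gradient descent on its squared prediction error, while the generator performs gradient ascent on the very same error.

import numpy as np

rng = np.random.default_rng(0)
f = np.sin                    # toy "environment": maps generator outputs to observations
w_p, b_p = 0.0, 0.0           # predictor parameters
w_g, b_g = 1.0, 0.0           # generator parameters
lr = 0.01

for step in range(500):
    z = rng.standard_normal()             # noise fed to the generator
    a = np.tanh(w_g * z + b_g)            # generator output (kept bounded for stability)
    y = f(a)                              # environment's response
    p = w_p * a + b_p                     # predictor's guess
    err = p - y
    # gradients of the shared squared error with respect to both players
    dL_dwp, dL_dbp = 2 * err * a, 2 * err
    dL_da = 2 * err * (w_p - np.cos(a))
    da_dpre = 1.0 - a**2
    # predictor minimizes the error, generator maximizes it
    w_p -= lr * dL_dwp
    b_p -= lr * dL_dbp
    w_g += lr * dL_da * da_dpre * z
    b_g += lr * dL_da * da_dpre

print("final squared prediction error:", err**2)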

      The ten priority disputes mentioned in the present Sec. XVII are not the only ones.[R4] Remarkably, three of them are related to the 1991 paper[UN1][UN] which in many ways started what people now call deep learning. Most of them go back to work of 1990-91.[MIR] See Sec. I for additional related issues of credit assignment.

      Critique of 2018 Turing Award LeCun's team has made important contributions to CNNs since 1989.[CNN2,4] However, the basic CNN architecture with convolutional and downsampling layers is actually due to Fukushima (1979).[CNN1] NNs with convolutions were later (1987) combined by Waibel with weight sharing and backpropagation.[CNN1a] Waibel did not call this a CNN but a TDNN. All of this happened before LeCun's work on CNNs. See Sec. D above and Sec. 21 of the overview of our Annus Mirabilis 1990-1991.[MIR] Furthermore, at IJCNN 2011 in Silicon Valley, our DanNet[DAN][GPUCNN1-3] won the contest and achieved the first superhuman visual pattern recognition (LeCun's team took a distant second place, with three times worse performance).[DAN1] Again see Sec. D. Baldi and Chauvin (1993) had the first application of CNNs with backpropagation to biomedical/biometric images.[BA93] And at ICPR 2012, our DanNet[GPUCNN1-3] won the medical imaging contest (Sept 2012, on detection of mitosis/cancer)[GPUCNN5,7,8] (before the similar AlexNet won ImageNet 2012[GPUCNN5][R6] and the similar VGG network[GPUCNN9] won ImageNet 2014). Our approach to mitosis detection[MGC][GPUCNN5,7,8] is now used by many major companies. See Sec. D & VII. ACM also explicitly mentions speech recognition and speech synthesis.[AM16][DL1] All of these fields were heavily shaped in the 2010s by our non-CNN methods.[DL1][DL4][AM16][GSR][GSR15][GT16][WU][FB17] See Sec. A, B, VI, XI.
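
      To illustrate the basic architecture referred to above (convolution with shared weights followed by a downsampling layer), here is a deliberately simplified NumPy sketch; the image size, kernel size and the choice of max-pooling as the downsampling operation are illustrative assumptions, not details taken from the cited papers.

import numpy as np

def conv2d_valid(image, kernel):
    # Single-channel "valid" convolution: the same (shared) kernel weights
    # are applied at every spatial position.
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.empty((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def max_pool2d(x, size=2):
    # Non-overlapping max-pooling as the downsampling step.
    H, W = x.shape
    x = x[:H - H % size, :W - W % size]
    return x.reshape(H // size, size, W // size, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))
features = np.maximum(conv2d_valid(image, kernel), 0.0)   # convolution + simple nonlinearity
print(max_pool2d(features).shape)                          # (3, 3)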

      Critique of 2018 Turing Award As mentioned in Sec. XII, backpropagation was actually proposed earlier as a learning method for NNs by Werbos (1982)[BP2-4] (see also Amari's work on SGD for MLPs of 1967-68[GD1-2a]). recent work.[DL3,DL3a][DLC] In 1960, Kelley already had a precursor of the algorithm.[BPA] Furthermore, many besides LeCun have worked "to speed up backpropagation algorithms"[DL1] (ACM's wording). More on the history of backpropagation can be found at Scholarpedia.[DL2][BP4]

      Critique of 2018 Turing Award However, "hierarchical feature representation" in deep learning networks is what Ivakhnenko & Lapa (1965)[DEEP1-2] and Amari[GD1-2] (and also Fukushima[CNN1][DL2]) had long before LeCun. See Sec. D & II & XIII & V.

      Critique of 2018 Turing Award LeCun et al. neither cited the origins[BP1] (1970) of this widely used type of automatic differentiation for differentiable networks of modules[DL2][BP4-5][DLC] for such systems.[S80] See also Sec. XIX & XII. before LeCun who did not cite them. See also Pollack's even earlier relevant work;[PO87-90] compare the important work of Baldi and colleagues.[BA96-03]

      (Furthermore, "complex networks of modules where backpropagation is performed" were the central theme of my much earlier habilitation thesis (1993).[UN2] For example, our adaptive subgoal generators (1991)[HRL0-2] were trained through end-to-end-differentiable chains of such modules.[MIR](Sec. 10) planning and reinforcement learning with recurrent neural world models (1990).[PLAN][MIR](Sec. 11) Same for my linear transformer-like fast weight programmers[FWP0-2][FWP][ATT][MIR](Sec. 8) since 1991 (see Sec. XVI) see "100 Authors against Einstein."[AH1] ad hominem attacks[AH2-3][HIN] "If you cannot dispute a fact-based message, attack the messenger himself."[HIN] Science has a well-established way of dealing with plagiarism (which may be unintentional[PLAG1][CONN21] or not[FAKE2]) award can ever change that.[HIN] and their co-workers have contributed useful improvements of deep learning methods.[CNN2,4][CDI][LAN][RMSP][XAV][ATT14][CAPS] whom they did not cite, in contrast to ACM's Code of Ethics and Professional Conduct[ACM18] II, V, XII, XIX, XXI, XIII, XIV, XI, and XX, and 2). Sec. I, A, B, C, D, XVII, VI, and XVI). As emphasized earlier:[DLC][HIN] to self-correction,"[SV20] as is already the standard in other scientific fields. in popular science venues without peer review? For example, the narrator of a popular 2018 Bloomberg video[VID2] Germany and Switzerland (LSTM & CTC; see Sec. A) long before Hinton's methods. Similarly, in 2016, the NY Times published an article[NYT3] Google's original 2016 paper on Google Translate[WU] mentions LSTM over 50 times (see Sec. B). In ad hominem style,[AH2-3] claiming credit he doesn't deserve for many, many things",[NYT1] without LeCun also called the GANs of Bengio's team[GAN1] GANs are variations of my work in 1990.[AC90,90b][AC20][R2] According to Bloomberg,[AV2] Bengio has simply "denied my claims" without backing up his denial by any facts; see Sec. XVII. and forcefully contradict public figures who promote it."[FAKE] LBH, who called themselves the deep learning conspiracy,[DLC][DLC1-2] Our LSTM paper[LSTM1] has got more citations than any paper by Bengio or LeCun,[R5] Hinton's most cited paper (2012) is the one on GPU-based CNNs.[GPUCNN4][R5] It follows our earlier work on supervised deep NNs (2010)[MLP1] unsupervised pre-training for deep NNs by myself [UN][UN0-3] and later championed by Hinton;[UN4][VID1] see Sec. D). Hinton (2012)[GPUCNN4] characterizes our deep and fast DanNet (2011)[GPUCNN1-3] as AlexNet won one;[R6] see Sec. D, XIV. The highly cited VGG network (2014)[GPUCNN9] Hinton's 2nd most cited paper[RUM][R5] of Hinton's paper,[RUM] adding citations for a book by Rumelhart & McClelland[R5]). Backpropagation is a previously invented method[BP1] whose origins of Ivakhnenko whom he has never cited;[DEEP1-2][R7-R8] see Sec. II, XIII. Bengio's 2nd most cited research paper is the one on GANs (2014),[GAN1] which are instances of my artificial curiosity (1990)[AC90,90b][AC20][R2] which he did not cite; see Sec. XVII. Hinton's highly cited papers on unsupervised pre-training for deep NNs (2006-)[UN4] by ours[UN0-2][UN] were preceded by Hanson's[Drop1-3] As recently as of 2021, ACM published yet another misleading deep learning "survey" by LBH,[DL3a] again heavily citing LBH without Consult the Executive Summary and Sec. I-XXI of this critique for more. 
So virtually all the algorithms that have attracted have their conceptual and technical roots in my labs in Munich and Lugano,[MOST] of deep learning MLPs since 1965[DEEP1-2][GD1-2a] (see Sec. II, XX) and backpropagation (1960-70)[BPA][BP1] (see Sec. XIX, XII) and convolutional NNs since 1979[CNN1-4] (see Sec. XVIII, D). Our LSTM (1990s, see Sec. A, B; also for RL, 2003-, see Sec. C) → our Highway Net (May 2015) → ResNet (Dec 2015, see Sec. D). Our adversarial Artificial Curiosity (1990) → GANs (2010s, see Sec. XVII). our own unsupervised pre-training of deep NNs (1991, see Sec. II & III) for recurrent NNs in the 1990s → our LSTM (see Sec. A-C) and for feedforward NNs in 2010 → our DanNet (2011) → AlexNet (2012); VGG Net (2014) (see Sec. D). our LSTM brought essentially unlimited depth to gradient-based supervised recurrent NNs in the 1990s; our Highway Nets[HW1-3] brought it to feedforward NNs in May 2015.[MOST] superior computer vision (2011, see Sec. D, XVIII), medical diagnosis (2012, see Sec. VII, XVIII), and many other applications.[DEC] speech recognition (with our CTC, 2007-15, see Sec. A), machine translation (2016, see Sec. B), robotics & video game players (2018-19, see Sec. C), and many other applications.[DEC] Fast Weight Programmers (1991, see Sec. XVI) are formally equivalent to linear Transformers (now popular in NLP). I, A, B, C, D, VII, XVIII.
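
      The Highway Net → ResNet step in the chain above can be stated compactly. In simplified notation (biases and the exact gate parameterization omitted), a Highway layer computes
      $$ y \;=\; t(x) \odot H(x) \;+\; g(x) \odot x, $$
      where $H$ is the layer's transformation and $t, g$ are learned gates as in LSTM. Keeping the gates permanently open, $g(x) = t(x) \equiv 1$, gives the residual layer
      $$ y \;=\; H(x) + x, $$
      i.e. ResNet is the special case of the Highway Net whose gates are always open, as stated above.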

      As mentioned earlier,[MIR](Sec. 21) it is not always clear[DLC] who did what first: Ivakhnenko & Lapa (1965) already had working deep nets of arbitrary depth that really learned.[DEEP1-2][R8] Soon afterwards, multilayer perceptrons learned internal representations through stochastic gradient descent in Japan.[GD1-2a] A few years later, modern backpropagation was published (1970).[BP1] Failure to cite such prior work may be unintentional[PLAG1][CONN21] or intentional.[FAKE2]

      Yes, this critique is also an implicit critique of certain other awards to LBH.[HIN] Many of the issues raised here were discussed in threads at reddit.com/r/MachineLearning[R1-R12] (the largest machine learning forum, which back then had over 800k subscribers), many of them influenced by my overview.[MIR]

      Dr. LeCun himself is well aware of the challenges to scientific integrity in our field:[LECP] "... else cites."[LECP] Recall that around 1960, Rosenblatt already had multilayer perceptrons whose first layer had fixed random weights and an adaptive output layer.[R62] So Rosenblatt basically had what much later was rebranded as Extreme Learning Machines (ELMs),[ELM1] a fact ignored by the revisionist narrative of ELMs[ELM2][CONN21] and reminiscent of the revisionism of the self-proclaimed "deep learning conspiracy."[DLC1-2]

      Note that I am insisting on proper credit assignment not only in my own research field but also in quite disconnected areas,[HIN] as demonstrated by my numerous letters in this regard published in Science and Nature, e.g., on the history of aviation,[NASC1-2] the telephone,[NASC3] the computer,[NASC4-7] resilient robots,[NASC8] and scientists of the 19th century.[NASC9] AI scientists and AI historians equipped with artificial curiosity[SA17][AC90-AC20][PP-PP2][R1]

      Creative Commons LicenseThanks publication page and my arXiv page. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. J.  Schmidhuber (AI Blog, 2021). 3 decades of artificial curiosity & creativity. Our PDF. The first paper on planning with reinforcement learning recurrent neural networks (NNs) (more) and on generative adversarial networks (more). PDF. More. PDF. PDF. PDF. PDF. (More on artificial scientists and artists.) IEEE link. PDF. With a brief summary of the generative adversarial neural networks of 1990[AC90,90b][AC20] (more). Preprint arXiv/1906.04493. ACM Code of Ethics and Professional Conduct. Association for Computing Machinery (ACM), 2018. Quote: Link. Link. [AIB] J. Schmidhuber. AI Blog. Includes variants of chapters of the AI Book. Blog of Werner Vogels, CTO of Amazon (Nov 2016): PDF. First publication of what was later sometimes called the Hopfield network[AMH2] or Amari-Hopfield Network.[AMH3] The Hopfield network or Amari-Hopfield Network was published in 1972 by Amari.[AMH1] [ATT] J. Schmidhuber (AI Blog, 2020). 30-year anniversary of end-to-end differentiable sequential neural attention. Plus goal-conditional reinforcement learning. We had both hard attention (1990) and soft attention (1991-93).[FWP] Today, both types are very popular. PDF. PDF. More. PS. (PDF.) arXiv/1409.0473, 2014-16. Bloomberg, May 15, 2018. Bloomberg, May 17, 2018. PDF. HTML. PDF. Precursor of modern backpropagation.[BP1-4] PDF. Link. PDF. First application of backpropagation[BP1] to NNs (concretizing thoughts in his 1974 thesis). [BP4] J. Schmidhuber (AI Blog, 2014; updated 2020). Who invented backpropagation? More.[DL2] English version: [CNN1+]. More in Scholarpedia. Link. [CNN1a] A. Waibel. Phoneme Recognition Using Time-Delay Neural Networks. Meeting of IEICE, Tokyo, Japan, 1987. First application of backpropagation[BP1][BP2] and weight-sharing PDF. Spatial Averaging.[CNN1] Spatial Averaging.[CNN1] PDF. PDF. PDF. Since November 2021: Comments on version 1 of the present report[T21v1] in the Connectionists Mailing List, perhaps the oldest mailing list on artificial neural networks. Link to the archive. PDF. Beijing, 2014. Preprint arXiv:1402.3511 [cs.NE]. J. Schmidhuber (AI Blog, 2021). 10-year anniversary. In 2011, DanNet triggered the deep convolutional neural network (CNN) revolution. Named 1st superhuman result in 2011.[DAN1] J. Schmidhuber (AI Blog, 2011; updated 2021 for 10th birthday of DanNet): First superhuman visual pattern recognition. our artificial neural network called DanNet [DEC] J. Schmidhuber (AI Blog, 02/20/2020, revised 2021). The 2010s: Our Decade of Deep Learning / Outlook on the 2020s. The [DIST1] J. Schmidhuber, 1991.[UN-UN2] More. Deep Learning. HTML. [DL3a] Y. Bengio, Y. LeCun, G. Hinton (2021). Turing Lecture: Deep Learning for AI. Communications of the ACM, July 2021. HTML. Local copy (HTML only). [DL4] J. Schmidhuber (AI Blog, 2017). Our impact on the world's most valuable public companies: Apple, Google, Microsoft, Facebook, Amazon... By greatly improved (CTC-based) on-device speech recognition (on the phone, not the server) LSTM. PDF. J. Schmidhuber (AI Blog, Nov 2020). 15-year anniversary: 1st paper with "learn deep" in the title (2005). Our deep reinforcement learning & neuroevolution solved problems of depth 1000 and more.[DL6] Soon after its publication, everybody started talking about "deep learning." Causality or correlation? Web site deeplearning.net of Y. 
Bengio's MILA (2015, retrieved May 2020; compare the version in the Internet Archive), referring to Hinton's[UN4] and Bengio's[UN5] unsupervised pre-training for deep NNs[UN4] (2006) although this type of deep learning dates back to 1991.[UN1-2][UN] II & XVII & III. [DLC] J. Schmidhuber (AI Blog, June 2015). Critique of Paper by self-proclaimed[DLC1-2] "Deep Learning Conspiracy" (Nature 521 p 436). arxiv:1312.5602. Link. Alphastar has a "deep LSTM core." arXiv:1808.03578, 2018. In fact, the ELM concept goes back to Rosenblatt's work around 1960.[R62] used LSTM over 4 billion automatic translations per day (The Verge, August 4, 2017); Facebook blog by J.M. Pino, A. Sidorov, N.F. Ayan (August 3, 2017) PDF. J.  Schmidhuber (AI Blog, 26 March 2021). alternative[FWP0-1] to recurrent NNs. the fast weights[FAST,FASTa] of Such Fast Weight Programmers[FWP0-6,FWPMETA1-7] can learn to memorize past data, e.g., by computing fast weight changes through additive outer products of self-invented activation patterns[FWP0-1] (now often called keys and values for self-attention[TR1-6]). The similar Transformers[TR1-2] combine this with projections linear Transformers or Performers[TR5-6] In 1993, I introduced the attention terminology[FWP2] now used in this context,[ATT] and RNNs that program themselves. PDF. PDF. HTML. Pictures (German). PDF. Preprint: arXiv:1811.12143. PDF. PDF. Like [FWP0-2]. Preprint: arXiv:2003.08165. PDF. HTML overview. Linear Transformers Are Secretly Fast Weight Programmers. ICML 2021. Preprint: arXiv:2102.11174. Preprint: arXiv:2106.06295 (June 2021). PDF. An introspective network that can learn to run its own weight change algorithm. In Proc. of the Intl. Conf. on Artificial Neural Networks, J. Schmidhuber. Habilitation thesis, TUM, 1993. PDF. can be found here. Preprint arXiv:2012.14905 [cs.LG], 2020. Report arXiv:2011.07831 [cs.AI], 2020. PDF. Probably the first paper on using stochastic gradient descent[STO51-52] reverse mode of automatic differentiation or backpropagation[BP1]). OCR-based PDF scan of pages 94-135 (see pages 119-120). Implementation of Amari's 1967 stochastic gradient descent method for multilayer perceptrons.[GD1] (S. Amari, personal communication, 2021.) Google Research Blog, Sep 2015, see also Aug 2015 Google's speech recognition based on CTC and LSTM. Alphr Technology, Jul 2015, or 9to5google, Jul 2015 WIRED, Sep 2016, siliconANGLE, Sep 2016 Blog post, Internet Archive, 2010. A blog post describing the basic ideas[AC][AC90, AC90b][AC20] of GANs. Description of GANs that does not cite the original work of 1990[AC][AC90,AC90b][AC20][R2] (also containing wrong claims about Predictability Minimization[PM0-2][AC20]). Link. This was number 1 on Hacker News. Frankfurter Allgemeine Zeitung, 16/6/2021. Preprint arXiv/2005.14165. for Image Classification. International Joint Conference on Artificial Intelligence (IJCAI-2011, Barcelona), 2011. PDF. ArXiv preprint. win four important computer vision competitions 2011-2012 before others won any PDF. HTML overview. competitor.[DAN1] This led to massive interest from industry. [GPUCNN3] D. C. Ciresan, U. Meier, J. Schmidhuber. Multi-column Deep Neural Networks for Image Classification. Proc. IEEE Conf. on Computer Vision and Pattern Recognition CVPR 2012, p 3642-3649, July 2012. PDF. Longer TR of Feb 2012: arXiv:1202.2745v1 [cs.CV]. More. PDF. J. Schmidhuber (AI Blog, 2017; updated 2021 for 10th birthday of DanNet): History of computer vision contests won by deep CNNs since 2011. 
DanNet won 4 of them in a row before the similar AlexNet/VGG Net and the Resnet (a Highway Net with open gates) joined the party. Today, deep CNNs are standard in computer vision. PDF. PDF. first deep learner to win a medical imaging contest (2012). HTML. [HIN] J. Schmidhuber (AI Blog, 2020). Critique of Honda Prize for Dr. Hinton. Science must not allow corporate PR to distort the academic record. PDF. North-Holland, 1991. PDF. Extending TR FKI-129-90, TUM, 1990. PDF. PDF. Preprints arXiv:1505.00387 (May 2015) and arXiv:1507.06228 (July 2015). Also at NIPS 2015. The LSTM with forget gates[LSTM2] for RNNs.) Resnets[HW2] are a version of this where the gates are always open: g(x)=t(x)=const=1. Highway Nets perform roughly as well as ResNets[HW2] on ImageNet.[HW3] Highway layers are also often used for natural language processing, where the simpler residual layers do not work as well.[HW3] More. Link. arXiv:1512.03385 (Dec 2015). Residual nets are a version of Highway Nets[HW1] More. arxiv:1612.07771 (2016). Also at ICLR 2017. Preprint arXiv:1704.04760 PDF. PDF. arXiv:1607.06450, 2016. A New Publishing Model in Computer Science. Local copy (HTML only). [LEI21] J. Schmidhuber (AI Blog, 2021). 375th birthday of Leibniz, founder of computer science. Frankfurter Allgemeine Zeitung (FAZ), 17/5/2021. FAZ online: 19/5/2021. PDF. [LSTM1] S. Hochreiter, J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. PDF. Based on [LSTM0]. More. PDF. PDF. PDF. PDF. PDF. PDF. PDF. PDF. PDF. PDF. PDF. Preprint: arxiv:1506.07452. PDF. J. Schmidhuber (AI Blog, Dec 2020). 10-year anniversary of our journal paper on deep reinforcement learning with policy gradients for LSTM (2007-2010). Recent PDF. Preprint arXiv:1805.04908. Architectures. Preprint arXiv:1703.03906 J. Schmidhuber (AI Blog, 2020). 1/3 century anniversary of Searchable PDF scan (created by OCRmypdf which uses LSTM). HTML. better GP methods through Meta-Evolution. More. [MIR] J. Schmidhuber (AI Blog, 2019). Deep Learning: Our Miraculous Year 1990-1991. Preprint arXiv:2005.05744, 2020. Computation 22(12): 3207-3220, 2010. ArXiv Preprint. (AI Blog, Sep 2020). 10-year anniversary of supervised deep learning breakthrough (2010). No unsupervised pre-training. By 2010, when compute was 100 times more expensive than today, both our feedforward NNs[MLP1] J.  Schmidhuber (AI Blog, 2021). The most cited neural networks all build on work done in my labs. Foundations of the most popular NNs originated in my labs at TU Munich and IDSIA. Here I mention: (1) Long Short-Term Memory (LSTM), (2) ResNet (which is our earlier Highway Net with open gates), (3) AlexNet and VGG Net (both citing our similar earlier DanNet: the first deep convolutional NN to win image recognition competitions), Adversarial Artificial Curiosity), and (5) variants of Transformers (linear Transformers are formally equivalent to my earlier Fast Weight Programmers). Annus Mirabilis of 1990-1991.[MIR] Preprint arXiv:1611.01578 (PDF), 2017. [NASC1] J. Schmidhuber. First Pow(d)ered flight / plane truth. Correspondence, Nature, 421 p 689, Feb 2003. [NASC3] J. Schmidhuber. The last inventor of the telephone. Letter, Science, 319, no. 5871, p. 1759, March 2008. Correspondence, Nature, vol 483, p 541, March 2012, doi:10.1038/483541b. Letter, Science, vol 336, p 1639, June 2012. See also comment on response by A. Hodges (DOI:10.1126/science.336.6089.1639-a) [NASC6] J. Schmidhuber. Colossus was the first electronic digital computer. Correspondence, Nature, 441 p 25, May 2006. 
[NASC7] J. Schmidhuber. Turing's impact. Correspondence, Nature, 429 p 501, June 2004 [NASC8] J. Schmidhuber. Prototype resilient, self-modeling robots. Correspondence, Science, 316, no. 5825 p 688, May 2007. [NASC9] J. Schmidhuber. Comparing the legacies of Gauss, Pasteur, Darwin. Correspondence, Nature, vol 452, p 530, April 2008. HTML. Link. NY Times article NY Times article Learning Dexterous In-Hand Manipulation. arxiv:1312.5602 (PDF). arxiv:1912.06680. An LSTM composes 84% of the model's total parameter count. 2018. An LSTM with 84% of the model's total parameter count was the core of OpenAI Five. PDF. HTML. Link. J. Schmidhuber (AI Blog, 2020). 30-year anniversary of planning & reinforcement learning with recurrent world models and artificial curiosity (1990). This work also introduced high-dimensional reward signals, deterministic policy gradients for RNNs, the GAN principle Based on TR FKI-126-90 (1990).[AC90] More. PDF. Partially based on TR FKI-126-90 (1990).[AC90] Report arXiv:1210.0118 [cs.AI], 2015. One Big Net For Everything. Preprint arXiv:1802.08864 [cs.AI], Feb 2018. Preprint: arXiv:1809.01999. Github: World Models. minimization. TR CU-CS-565-91, Univ. Colorado at Boulder, 1991. PDF. More. 1991. PDF. More. PDF. More. arXiv:1112.5309 [cs.AI] First Experiments with PowerPlay. arXiv:1210.8385 [cs.AI]. [R1] Reddit/ML, 2019. Hinton, LeCun, Bengio receive ACM Turing Award. [R2] Reddit/ML, 2019. J. Schmidhuber really had GANs in 1990. [R3] Reddit/ML, 2019. NeurIPS 2019 Bengio Schmidhuber Meta-Learning Fiasco. [R4] Reddit/ML, 2019. Five major deep learning papers by G. Hinton did not cite similar earlier work by J. Schmidhuber. [R5] Reddit/ML, 2019. The 1997 LSTM paper by Hochreiter & Schmidhuber has become the most cited deep learning research paper of the 20th century. [R6] Reddit/ML, 2019. DanNet, the CUDA CNN of Dan Ciresan in J. Schmidhuber's team, won 4 image recognition challenges prior to AlexNet. [R7] Reddit/ML, 2019. J. Schmidhuber on Seppo Linnainmaa, inventor of backpropagation in 1970. [R8] Reddit/ML, 2019. J. Schmidhuber on Alexey Ivakhnenko, godfather of deep learning 1965. [R9] Reddit/ML, 2019. We [R11] Reddit/ML, 2020. Schmidhuber: Critique of Honda Prize for Dr. Hinton [R12] Reddit/ML, 2020. J. Schmidhuber: Critique of Turing Award for Drs. Bengio & Hinton & LeCun [R15] Reddit/ML, 2021. J. Schmidhuber's work on fast weights from 1991 is similar to linearized variants of Transformers Preprint arXiv/1311.2524, Nov 2013. Preprint arXiv/1703.06870, 2017. PDF. This experimental analysis of backpropagation did not cite the origin of the method,[BP1-4] also known as the reverse mode of automatic differentiation. Link. The Past, Present and Future of Artificial Intelligence. PDF. PDF. ACM's justification of the 2018 A.M. Turing Award (announced in 2019). WWW link. Local copy 1 (HTML only). Local copy 2 (HTML only). [T20a] J. Schmidhuber (AI Blog, 25 June 2020). Critique of 2018 Turing Award for Drs. Bengio & Hinton & LeCun. The first version of the present critique. Technical Report IDSIA-77-21 (v1), IDSIA, 24 Sep 2021. Link. Link. [TUR21] J. Schmidhuber (AI Blog, Sep 2021). Turing Oversold. It's not Turing's fault, though. J. Schmidhuber (AI Blog, 2021). 30-year anniversary. 1991: First very deep learning with unsupervised pre-training. Unsupervised PDF. 1992. Based on TR FKI-148-91, TUM, 1991.[UN0] PDF. approaches are now widely used. More. [UN2] J. Schmidhuber. Habilitation thesis, TUM, 1993. PDF. can be found here (depth > 1000). 2006. PDF. Link. [VAN1] S. 
Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, TUM, 1991 (advisor J. Schmidhuber). PDF. More on the Fundamental Deep Learning Problem. PDF. [VAN4] Y. Bengio. Neural net language models. Scholarpedia, 3(1):3881, 2008. Link. Link. Youtube video [see 28:16]. But in 2010, our team showed[MLP1-2] unsupervised pre-training is not necessary Youtube video, 2018. Preprint arXiv:1609.08144 (PDF), 2016. Based on LSTM which it mentions at least 50 times. WWW link (retrieved 15 May 2020). Local copy (plain HTML only). a general, practical, program-controlled computer. PDF. J. Schmidhuber (AI Blog, 2021). 80th anniversary celebrations: 1941: Konrad Zuse completes the first working general computer, based on his 1936 patent application.

      Scientific Integrity, the 2021 Turing Lecture, and the 2018 Turing Award for Deep Learning (AI Blog)
      @SchmidhuberAI This is a point-for-point critique of ACM's justification of the ACM A. M. Turing Award for deep learning, as well as a critique of the Turing Lecture given by the awardees (published by ACM in July 2021). 2015 survey of deep learning[DL1] June 2020 article[T20a][R12] (see Executive Summary I, V, II, XII, XIX, XXI, XIII, XIV, XX, XVII). (A) speech recognition, (B) natural language processing, (C) robotics, (D) computer vision, (VII) medicine, astronomy, materials science. A, B, C, D, VII, XVII, VI, XVI). II, V, XX, XVIII) with Dr. Bengio & Dr. Hinton (see Sec. XVII, I). I respond to LBH's recent ACM article (July 2021). expands material in my Critique of the 2019 Honda Prize[HIN] (~3,000 words). Abstract & Outline (~300 words), Introduction (~300 words), Critique of LBH's ACM article (Turing Lecture) of July 2021[DL3a] Executive summary of what's wrong with ACM's laudation (~1,000 words), 21 comments on 21 claims by ACM (~8,000 words), Conclusion and Acknowledgments (~2,000 words). All backed up by over 250 references (~9,000 words). The text contains numerous hyperlinks to relevant overview sites from the AI Blog. science is self-correcting."[SV20] they are mine or other people's.[DL1-2][HIN][NASC1-9] The present page is offered as a resource for all good computer scientists who share this inclination. and to fight plagiarism, collusion rings,[LIT21] and systemic academic corruption in all of their more and less subtle forms.[FAKE] Sec. 2 LBH's 2021 ACM article[DL3a] which necessitated an extension of the first version of this post.[T20a][R12] ACM's official justification[T19] of the 2018 A.M. Turing Award[R1] After the Executive Summary in Sec. 3, Sec. 4 will split ACM's full text[T19] into 21 parts I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII, XIV, XV, XVI, XVII, XVIII, XIX, XX, XXI. Most of the critiques are based on references to original papers and material from the AI Blog.[AIB][MIR][DEC][HIN] publishing yet another misleading overview of the field, this time based on LBH's Turing Lecture.[DL3a] LBH's well-known earlier omissions.[DLC][HIN][T20a] LBH claim to "briefly describe the origins of deep learning"[DL3a] without even mentioning the world's first working deep learning nets by Ivakhnenko and Lapa in 1965[DEEP1-2][R8] (see Sec. II). this class of methods was pioneered in 1991[UN-UN2] (see Sec. II, III). Highway Net, the first really deep feedforward NN.[HW1-3] (see Sec. D, VI). were all driven by my lab:[MOST] In 1991, I had the first very deep NNs based on unsupervised pre-training;[UN-UN2] LSTMs brought essentially unlimited depth to gradient-based supervised recurrent NNs;[LSTM0-17] later our Highway Nets[HW1-3] brought it to feedforward NNs. from 2007[LSTM4,14] based on LSTM[LSTM0-6] (1990s-2005) and CTC (2006).[CTC] our CTC-LSTM-based speech recognition (not that of Hinton) had been on most smartphones for years[GSR][GSR15-19][DL4] (see Sec. A, VI, XI, XV). Similarly for machine translation (see Sec. B). LBH cite Hinton (2012) for "dropout" without mentioning that dropout is just a variant of Hanson's 1990 stochastic delta rule[Drop1-2] (see Sec. XIV). von der Malsburg who introduced ReLUs in 1973[CMB] (see Sec. XIV). called AlexNet,[GPUCNN4] without mentioning that our earlier groundbreaking deep GPU-based DanNet[GPUCNN1-3,5-8][DAN] did not need ReLUs at all to win 4 earlier object recognition competitions and to achieve superhuman results already in 2011[GPUCNN1-8][R5-6] (see Sec. XIV). XVIII). already in 1965[DEEP1-2][R8] (see Sec. II). 
earlier fast weights of von der Malsburg (1981) and Feldman (1982).[FAST,FASTa-b][FWP] described in the 1991-93 papers on Fast Weight Programmers and linear Transformers[FWP0-1,6] (see Sec. XVI, XVII-2). dedicate an extra section to attention-based Transformers,[TR1-6] citing Bengio's team (2014) for "soft attention"[ATT14] without citing the much earlier original work of 1991-1993 on soft attention and linear Transformers[FWP,FWP0-2,6][ATT] (see Sec. XVII-1, XVI). LBH claim that Bengio's team[NPM] of text compression[SNT] (see Sec. XVI, XVII-1). LBH cite Bengio's 2014 paper on Generative Adversarial Networks (GANs)[GAN0-1] without mentioning that GANs are instances of the Adversarial Curiosity Principle of 1990[AC90-20][MIR](Sec. 5) (see Sec. XVII). In summation, LBH have repeatedly chosen to ignore the previous well-known critiques[DLC][HIN][T20a] and deep learning surveys,[DL1-2] and deep learning (e.g., Sec. I), ACM lauds Numerous references can be found under the relevant section links I-XXI which adhere to the sequential order of ACM's text[T19] Sec. II: it became really deep in 1991 in my lab, unsupervised pre-training of NNs, supervised LSTM. Sec. I contains 4 subsections A, B, C, D A: Speech Recognition (see also Sec. VI & XI & XV): The first superior end-to-end neural speech recognition combines two methods from my lab: LSTM (1990s-2005) and CTC (2006), which were Hinton (2012) and Bengio (XV) our revolutionary CTC-LSTM which was soon on most smartphones. Sec. B: Natural Language Processing (see also Sec. VI & XI & XVI): (soon used for several billions of was also based on our LSTM. Sec. C: Robotics. most visible breakthroughs Sec. D: Computer Vision XVIII & XIV & XI & VI) and applied to speech. All before LeCun's CNN work (XVIII). deep NNs pre-training (in contrast to Hinton's claims). Our DanNet was the first CNN fast & deep enough for superior computer vision in 2011, winning 4 image recognition contests in a row is an open-gated version of our earlier Highway Nets. Sec. XIV: deep & fast CNN (where LeCun participated), Sec. XI: ACM mentions GPU-accelerated NNs deep GPU-NN of 2010 debunked unsupervised pre-training (introduced by myself in 1991 and later championed by Hinton), and our GPU-CNN of 2011 (DanNet) was the first XVIII: Fukushima and Waibel (see Sec. D). VII: ACM explicitly mentions medicine and first to win medical imaging competitions Sec. XII & XIX & XXI: Modern backpropagation XIII & II & V III & IX & X & XX): Sec. XX: ACM credits LeCun for work on Sec. XXI: ACM credits LeCun for work on XV: ACM credits Bengio for hybrids of NNs and probabilistic models of sequences. CTC-LSTM A & B). XVI: ACM We started this in 1990-93 long before LBH Sec. XVII: Artificial Curiosity vanishing gradients (1991), metalearning (1987), unsupervised pre-training (1991), compressing or distilling one NN into another (1991), learning sequential attention with NNs (1990), fast weight programmers using and other topics.[R2-R6] Sec. IV is on Turing (1936) and his predecessors Critique of LBH's ACM article (Turing Lecture) of July 2021. Sec. Conclusion: In the recent decade of deep learning, (speech recognition, language translation, etc.) on billions of devices (also healthcare applications) Sec. II & III & V & XII & XIII & XVII & XIV & XIX & XX & XXI. In what follows, ACM's full text [T19] is split into 21 parts I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII, XIV, XV, XVI, XVII, XVIII, XIX, XX, XXI.

      Critique of 2018 Turing Award LBH and their co-workers have contributed certain useful improvements of existing deep learning methods.[CNN2,4][CDI][LAN][RMSP][XAV][ATT14][CAPS] (1965),[DEEP1-2][R8] modern backpropagation (1970),[BP1-2][R7] architectures of recurrent NNs (1943-56)[MC43][K56] and convolutional NNs (1979),[CNN1] principles of generative adversarial NNs and artificial curiosity (1990),[AC90,90b][AC20] unsupervised pre-training for deep NNs (1991),[UN1-2] vanishing gradients (1991)[VAN1] & Long Short-Term Memory or LSTM (Sec. A), GPU-accelerated NNs (2004),[GPUNN][DAN][DAN1][GPUCNN5] NNs with over 100 layers (2015),[HW1-3][R5] transformer-like[TR1-6][FWP] attention[FWP][ATT] through fast weight programmers (1991).[FWP0-2,6] [DL1-2][R2-R8] Often LBH failed to cite essential prior work, even in their later surveys.[DL3,DL3a][DLC][HIN][MIR](Sec. 21)[R2-R5, R7-R8] This may explain some of ACM's misattributions.[T19] II & III & V & XIII & X & XVII & XII & XVIII & XX. The deep NNs By the 2010s,[DEC] they were academia and industry,[DL4] mentioned by ACM (labeled as A, B, C, D) below: Long Short-Term Memory or LSTM (1990s-2005)[LSTM0-6] vanishing gradient problem student Sepp Hochreiter in 1991.[VAN1] This happened long before the similar work of Bengio (see Sec. XVII).[MIR] (Sec. 3,Sec. 4) LSTM was refined with my student Felix Gers[LSTM2] through "forget gates" based on end-to-end-differentiable fast weights.[MIR](Sec. 8)[FWP,FWP0-1] (A2) Connectionist Temporal Classification by my student Alex Graves et al. (2006).[CTC] Our team successfully applied CTC-trained LSTM to speech in 2007[LSTM4] (also with hierarchical LSTM stacks[LSTM14]). Markov models (HMMs)[BW][BRI][BOU] (Sec. XV). Hinton et al. (2012) still used the old hybrid approach[HYB12] and did not compare it to CTC-LSTM. became the first recurrent NN (RNN) to win international competitions. He later reused our end-to-end neural speech recognizer[LSTM4][LSTM14] as a postdoc in Hinton's lab.[LSTM8] CTC-LSTM dramatically improved Google's speech recognition.[GSR][GSR15][DL4] on-device speech recognition[GSR19] (not any longer on the server) LSTM[MIR](Sec. 4) (see Sec. VI & XI & XV). of text[SNT] (see Sec. XVI). In 2001, we showed that LSTM can learn languages unlearnable by traditional models such as HMMs,[LSTM13] See also Sec. VI & XI & XV. tailored by Bengio's team.[ATT14][FWP] However, such attention mechanisms also have their roots in my lab (1991);[FWP][FWP0-2,6] see Sec. XVI. C. Robotics & RL etc. Since 2003, our team has used LSTM for Reinforcement Learning (RL) and robotics.[LSTM-RL][RPG][LSTMPG] In the 2010s, For example, in 2018, a PG-trained LSTM was the core of OpenAI's famous Dactyl which learned to control a dextrous robot hand without a teacher.[OAI1][OAI1a] beat a pro player in the game of Starcraft, which is theoretically harder than Chess or Go[DM2] in many ways, using Alphastar whose brain has a deep LSTM core trained by PG.[DM3] OpenAI Five which learned to defeat human experts in the Dota 2 video game (2018).[OAI2] Bill Gates called this a "huge milestone in advancing artificial intelligence".[OAI2a][MIR](Sec. 4)[LSTMPG] Apart from A, B, C above, in healthcare, chemistry, molecular design, lip reading, speech synthesis,[AM16] predicting what's going on in nuclear fusion reactors, and so on.[DEC][DL4] was being used for LSTM (only 5% for the CNNs of Sec. D).[JOU17] Apparently the first LSTM journal paper[LSTM1][R5] is now the most frequently cited D. 
Computer Vision was revolutionized in the 2010s by a particular feedforward NN called the convolutional NN (CNN).[CNN1-4] The basic CNN architecture with convolutional and downsampling layers is due to Fukushima (1979).[CNN1] The popular downsampling variant called max-pooling was introduced by Weng et al. (1993).[CNN3] In 1987, NNs with convolutions were combined by Waibel with weight sharing and backpropagation.[CNN1a] Waibel did not call this CNNs but TDNNs. LeCun's team later contributed improvements of CNNs, especially for images[CNN2,4] (see Sec. XVIII). Finally, my own team showed in 2010[MLP1] unsupervised pre-training is not necessary to train deep NNs, contrary to claims by Hinton[VID1] who said that "nobody in their right mind would ever suggest" this. Then we Our fast GPU-based CNN of 2011[GPUCNN1] known as DanNet[DAN,DAN1][R6] CNNs of 2006.[GPUCNN] winning four of them in a row (15 May 2011, 6 Aug 2011, 1 Mar 2012, 10 Sep 2012).[GPUCNN5] at IJCNN 2011 in Silicon Valley, DanNet blew away the competition and achieved the first superhuman visual pattern recognition[DAN1] in an international contest (where LeCun's team took a distant second place, with DanNet was also the first deep CNN to win: a Chinese handwriting contest (ICDAR 2011), an image segmentation contest (ISBI, May 2012), CVPR paper on DanNet[GPUCNN3] of Hinton's student Krizhevsky won the ImageNet[IM09] 2012 contest[GPUCNN4-5][R6] (now also without unsupervised pre-training, citing DanNet). Our CNN image scanners were 1000 times faster than previous methods.[SCAN] The VGG network (ImageNet 2014 winner)[GPUCNN9] and other highly cited CNNs[RCNN1-3] further extended the work of 2011.[MIR](Sec. 19) ResNet, the ImageNet 2015 winner[HW2] (Dec 2015) which currently gets more citations per year[MOST] Highway Net (May 2015).[HW1-3][R5] The Highway Net is actually the feedforward net version of vanilla LSTM.[LSTM2] It was the first working, really deep feedforward NN with hundreds of layers (previous NNs had at most a few tens of layers). See also Sec. XVIII & XIV & XI & VI.
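
      Since the paragraph above notes that the Highway Net is the feedforward version of vanilla LSTM with forget gates,[LSTM2] the following minimal NumPy sketch of a single LSTM step may help; it is illustrative only, with arbitrary dimensions and initialization, showing the standard gated update that both architectures share.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    # One vanilla LSTM step with input (i), forget (f) and output (o) gates;
    # W, U, b stack the parameters of the four internal units.
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_new = f * c + i * g            # gated memory update (cf. Highway carry/transform gates)
    h_new = o * np.tanh(c_new)
    return h_new, c_new

d_in, d_hid = 3, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * d_hid, d_in))
U = rng.standard_normal((4 * d_hid, d_hid))
b = np.zeros(4 * d_hid)
h, c = np.zeros(d_hid), np.zeros(d_hid)
for x in rng.standard_normal((5, d_in)):    # a length-5 input sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h)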

      Critique of 2018 Turing Award appeared long before the 1980s. were proposed already in the 1940s/50s[MC43][K56] (but don't forget prior work in physics since the 1920s[L20][I25][K41][W45]). deep convolutional NN architecture was proposed in the 1970s.[CNN1] NNs without hidden layers learned in 1958[R58] regression and the method of least squares[DL1-2]). about deeper adaptive NNs[R61,R62] layers (already containing the now popular multiplicative gates).[DEEP1-2][DL1-2] A paper of 1971[DEEP2] highly cited method which was still popular in the new millennium,[DL2] especially in Eastern Europe, where much of Machine Learning was born. Ivakhnenko did not call it an NN, but that's what it was.[MIR](Sec. 1)[R8] LBH failed to cite this. XIII & III & V & VIII & IX & X. LBH & co-authors, e.g., Sejnowski[S20] (see Sec. XIII). It goes more or less like this: "In 1969, Minsky & Papert[M69] researchers took a fresh look at the problem in the 1980s."[S20] However, as mentioned above, the 1969 book[M69] addressed a "problem" of Gauss & Legendre's shallow learning (~1800)[DL1-2] that had already been solved 4 years prior by Ivakhnenko & Lapa's popular deep learning method.[DEEP1-2][DL2] Minsky was apparently unaware of this and failed to correct it later.[HIN](Sec. I) (but see a 1989 paper[MOZ]). However, it became really deep in 1991 in my lab,[UN-UN3] which has See Sec. 1 of the overview:[MIR] First Very Deep NNs, Based on Unsupervised Pre-Training (1991). "Very Deep Learning" tasks of depth > 1000.[UN2][DL1][UN] (By 2003, LSTM variants successfully dealt with language problems of depth up to 30,000[LSTM17] more.) drove the shift from unsupervised pre-training to purely supervised learning (1991-95; 2006-10).[HIN](Sec. II)[MIR] (Sec. 19) III. Note that LSTMs brought essentially unlimited depth to supervised recurrent NNs; Highway Nets[HW1-3] brought it to feedforward NNs.[MOST]

      Critique of 2018 Turing Award by others (Sec. III).[DLC][DEEP1-2][BP1][DL1-2][R7-R8][R2-R4] deep learning multilayer perceptrons (1965),[DEEP1-2][R8] modern backpropagation (1970),[BP1,2][R7] architectures of recurrent NNs (1943-56)[MC43][K56] and convolutional NNs (1979),[CNN1] principles of generative adversarial NNs and artificial curiosity (1990),[AC90,90b][AC20] unsupervised pre-training for deep NNs,[UN1-2] the vanishing gradient problem (1991)[VAN1] & solutions to it (Sec. A), GPU-accelerated NNs (2004),[GPUNN][GPUCNN5] and other foundations.[DL1-2][R2-R8] Often LBH failed to cite essential prior work.[DLC][HIN][MIR](Sec. 21) II & V & XIII & IX & X & XVII & XII & XVIII & XX & I. deeplearning.net which until 2019 advertised deep learning as "moving beyond shallow machine learning since 2006",[DL7] referring to Hinton's[UN4] and Bengio's[UN5] we had this type of deep learning already in 1991;[UN][UN1-2] see Sec. II & XVII (5). Not to mention Ivakhnenko's even earlier supervised layer-wise training of deep NNs[DEEP1-2] which Hinton,[UN4] Bengio,[UN5] and LBH[DL3,DL3a] did not cite either. See Sec. X.

      Critique of 2018 Turing Award my comments systematically track the sequential order of ACM's claims.[T19]

      ACM's statement on Turing is greatly misleading, like some of its other statements.[T19] any type of computation-based AI.[GOD][BIB3][MIR](Sec. 18)[GOD21,21a] Much of early AI in the 1940s-70s was actually about theorem proving[ZU48][NS56]

      In 1936, Turing Turing Machine.[TUR] He rederived the above-mentioned result,[CHU][TUR][HIN][GOD21,21a][TUR21][LEI21,21a] In the same year of 1936, Emil Post published yet another independent universal model of computing,[POS] my reply to Hinton who criticized my website on Turing without suggesting any fact-based corrections.[HIN]) open problem "P=NP?" in his famous letter to John von Neumann (1956).[GOD56][URQ10] Likewise, Konrad Zuse (1910-1995) created the world's first working programmable general-purpose computer 1935-41. His patent application of 1936[ZU36-38][Z36][RO98][ZUS21] predating Claude Shannon's 1937 thesis on digital circuit design.[SHA37] Zuse also created the first high-level programming language in the early 1940s.[BAU][KNU] conditional jump instruction.[RO98]

      Critique of 2018 Turing Award that learn internal representations (1965),[DEEP1-2][R8] modern backpropagation (1970),[BP1,2][R7] architectures of recurrent NNs (1943-56)[MC43][K56] and convolutional NNs (1979),[CNN1] principles of generative adversarial NNs and artificial curiosity (1990),[AC][AC90,90b][AC10][AC20] unsupervised pre-training for deep NNs (1991),[UN1-2][UN] vanishing gradients (1991)[VAN1] & solutions to it (Sec. A),[LSTM0-17][CTC] (2004),[GPUNN][GPUCNN5] record-breaking deep supervised NNs (2010)[MLP1-2] and contest-winning deep CNNs (2011),[DAN][DAN1][GPUCNN5] NNs with over 100 layers (2015),[HW1-3][R5] transformer-like[TR1-6][FWP] attention[FWP][ATT] through fast weight programmers (1991),[FWP0-2,6] and more.[DL1-2][R2-R8] Often LBH failed to cite essential prior work.[DL3,DL3a][DLC][HIN][MIR](Sec. 21)[R2-R5,R7,R8,R11] II & I & III & XIII & X & XVII & XII & XVIII & XX.

      Critique of 2018 Turing Award "advances in natural language processing" and in speech supervised NNs and CNNs achieved by our group 2010-2011[MLP1-2][DAN][DAN1][GPUCNN5][R6] and through Highway Net-like NNs (2015),[HW1-3][R5] although the principles of CNNs were invented and developed by others since the 1970s.[CNN1-4] See Sec. D & XVIII & XIV as well as Sec. 4 & Sec. 19 of the overview.[MIR]

      Critique of 2018 Turing Award DanNet[DAN][DAN1][GPUCNN5] the first NN to win a medical imaging contest through deep learning (Sept 2012, on cancer detection).[GPUCNN5,8] and were able to greatly improve steel defect detection.[ST] All of this happened before the similar GPU-accelerated AlexNet of Hinton's student Krizhevsky won ImageNet 2012.[GPUCNN5][R6] mitosis detection.[MGC][GPUCNN5,8] approach of D & XI).

      Critique of 2018 Turing Award without citing them.[DL1][DLC][HIN][R2-R4][R7-R8] V & XII & XIX & II & III & XIII & XVII & X & I.

      Critique of 2018 Turing Award who failed to cite them, even in later work.[HIN][DLC][DL1-2][DEEP1-2][CMB][R7-R8] See Sec. II & III & XIII & V & X & XIV & I.

      Critique of 2018 Turing Award first introduced to Machine Learning by Dechter (1986), and to NNs by Aizenberg et al (2000).[DL2] To my knowledge, LBH have never cited them. (Margin note: our 2005 paper on deep RL[DL6,6a] was the first machine learning LBH started talking about "deep learning ... moving beyond shallow machine learning since 2006",[DL7] referring to their unsupervised pre-training methods of 2006. See Sec. III. others built careers on this notion long before LBH recognized this.[DEEP1-2][CNN1][HIN][R8][DL1][DLC] Even deep learning through unsupervised pre-training was introduced by others.[UN1-3][R4][HIN](Sec. II) II & III & XIII & V & I.

      Critique of 2018 Turing Award ignored by LBH's papers[HIN][R7-R8][R2-R5] (see Sec. V & II & III & I & XIII & XII & XIX & X & XVII).

      ACM correctly mentions advancements through GPUs. The first to use GPUs for NNs were Jung & Oh (2004),[GPUNN][GPUCNN5] but it was our group that made GPU-based NNs fast and deep enough to set an important benchmark record in 2010,[MLP1-2] showing that unsupervised pre-training (pioneered by myself in 1991) is not necessary to train deep NNs, contrary to Hinton's claims.[VID1] In 2011, our CNNs were deep and fast enough[DAN][DAN1][GPUCNN5] to achieve superior computer vision (explicitly mentioned by ACM) for the first time[R6] (see Sec. D).

      Furthermore, by the mid 2010s, speech recognition and machine translation (explicitly mentioned by ACM) were actually dominated by the LSTM and CTC of our team.[LSTM1-4][CTC] In particular, as mentioned in Sec. A, our CTC-LSTM is end-to-end neural and thus very different from older hybrid approaches based on models such as HMMs.[BW][BOU][BRI][HYB12] As mentioned in Sec. B and XVI, the first superior end-to-end neural machine translation was also based on LSTM.

      Critique of 2018 Turing Award ACM's statement is "less wrong" than Honda's[HIN](Sec. I) but still misleading: ACM (and apparently even other award committees[HIN](Sec. I)) seems to credit backpropagation to Rumelhart et al. (1985-86),[RUM] although Werbos had already applied it to NNs earlier (1982).[BP2] And the article[RUM] even failed to mention Linnainmaa, the inventor of this famous algorithm for credit assignment in networks (1970),[BP1] while Kelley already had a precursor thereof in the field of control theory (1960);[BPA] see also later work of the early 1960s.[BPB][BPC][R7] Rumelhart et al. showed experimentally that backpropagation can yield useful internal representations in hidden layers of NNs.[RUM] But this was essentially just an experimental analysis of a known method.[BP1-2] The history of backpropagation can be found at Scholarpedia[DL2] and in my award-winning survey.[DL1] Also see Sec. XIX, II.

      Some claim that "backpropagation is just the chain rule of Leibniz (1676) & L'Hopital (1696)." No, it is the efficient way of applying the chain rule to big networks with differentiable nodes (there are also many inefficient ways of doing this). It was not published until 1970.[BP1] Regarding the recent debate:[HIN] It is true that in 2018, Hinton[AOI] credited Rumelhart[RUM] with the "invention" of backpropagation, yet he has accepted credit for "creating" the method and for other things he didn't do.[HIN] Neither in a popular book[AOI] nor in other recent work[DL3,DL3a] did he cite Linnainmaa (1970),[BP1] the true creator.[BP4-5] Note that his 2015 survey[DL3] does cite Werbos (1974), who however described the method correctly only later in 1982[BP2] and also failed to cite Linnainmaa[BP1] (compare Amari's work of 1977[BP6]). Linnainmaa's method was well-known.[BP5][DL1-2][DLC] It wasn't created by "lots of different people" as Hinton suggested,[AOI][HIN][R11] but by one person who published first[BP1] and therefore should get the credit.

      Critique of 2018 Turing Award Boltzmann Machine (BM)[BM] a learning.[HIN] Recently, however, I learnt through a reader that even the BM paper[BM] did not cite prior relevant work by Sherrington & Kirkpatrick[SK75] and Glauber.[G63] (Compare related work.[H86][H88][S93]) multilayer perceptrons with arbitrarily many layers.[DEEP1-2][HIN] Sec. II V & X.[MIR](Sec. 1)[R8]

      As mentioned in Sec. II, Sejnowski's rather self-serving "history of deep learning"[S20] claims that in 1969, Minsky & Papert[M69] exposed the limitations of NNs, and that "researchers took a fresh look at the problem in the 1980s."[S20] However, the 1969 book[M69] addressed a "deep learning problem" (a limitation of Gauss & Legendre's shallow learning around 1800[DL1-2]) that had already been solved four years prior (see Sec. II); deep learning research also continued in the 1970s, especially outside of the Anglosphere.[DEEP2][BP6][CNN1][DL1-2]

      Critique of 2018 Turing Award Dropout is actually a variant of Hanson's much earlier stochastic delta rule (1990).[Drop1-2] Hinton's 2012 paper and his later patent did not cite this either. Moreover, dropout was not needed to win image recognition contests, as we showed already in 2011 in a contest where LeCun's team participated as well;[DAN1] see Sec. D above. Back then, the really decisive factor was the speed and depth of deep CNNs through GPUs.[GPUCNN1,3,5][R6] Already before ImageNet 2012,[R6] our fast deep CNN called DanNet had a monopoly on winning computer vision competitions.[GPUCNN5] It more than "halved the error rate for object recognition" (ACM's wording) in a contest already in 2011,[GPUCNN2][DAN,DAN1][R6] long before the similar system of Hinton's student. See Sec. D as well as Sec. 19 of the overview.[MIR]

      Critique of 2018 Turing Award Hybrid NN/HMM approaches to speech recognition have existed since the late 1980s.[BW][BRI][BOU] The first superior end-to-end neural speech recognition, however, was based on two methods from my lab: LSTM (1990s-2005)[LSTM0-6] and CTC[CTC] (2006), which were applied to speech in 2007.[LSTM4][LSTM14] CTC-LSTM is end-to-end-neural and thus very different from (and superior to) the hybrid methods used since the late 1980s.[BW][BRI][BOU][HYB12] See also Sec. A.

      Critique of 2018 Turing Award 5 years earlier, in 1995, we already had a similar, excellent neural probabilistic text model.[SNT] Bengio[NPM] characterizes it only briefly as "related" (see also Pollack's earlier work on embeddings of words and other structures[PO87][PO90]). In the 2010s, was actually the LSTM of our team,[LSTM0-6] which Bloomberg called the "arguably the most commercial AI achievement."[AV1][MIR](Sec. 4) See Sec. B. Bengio's team[ATT14] has indeed become important. For example, it helped to further improve Facebook's LSTM-based translation (see Sec. B). adaptive neural sequential attention: end-to-end-differentiable "soft" attention in the latent space of Fast Weight Programmers (FWPs),[FWP2][FWP] and "hard" attention (in observation space) in the context of RL[ATT][ATT0-1] (1990). attention-based Transformers[TR1-6] are FWPs of 1991[FWP0-1] which have become a popular alternative to RNNs. My FWP of 1991[FWP0-1] (now often called keys and values for self-attention).[TR1-6][FWP] the 2010s,[DEC] Transformers[TR1-2] a traditional LSTM domain (see Sec. B). rapidly learn to solve quickly[LSTM13,17] linear Transformers or Performers[TR5-6] which are formally equivalent to my 1991 FWPs (apart from normalization).[FWP6][FWP] In 1993, I introduced the attention terminology[FWP2] now used in this context,[ATT] and RNNs that program themselves.

      See[MIR](Sec. 9)[R4] for my related priority dispute on attention with Hinton. He was the reviewer of my 1990 paper,[ATT2] yet he did not cite it in his own later work on this topic.[ATT3]

      Critique of 2018 Turing Award: GANs[GAN0-1] (2010-2014) are actually a simple application[AC] of the adversarial curiosity (AC) principle from 1990[AC90,90b][AC20] (see also the surveys[AC09-10]). This principle is now widely used for exploration in RL (e.g., Sec. C) and for image synthesis[GAN1] (also mentioned by ACM in Sec. XVIII). In AC, a predictor NN minimizes its error, while the generator NN tries to make outputs that maximize this error: one net's loss is the other net's gain. Four years before the GAN paper,[GAN1] a well-known 2010 survey[AC10] summarised the generative adversarial NNs of 1990, including the variant in which the predictor tries to tell whether the controller's (or generator's) output is in a given set.[AC20][AC] (The early adversarial machine learning settings[S59][H90] neither involved unsupervised NNs nor were about modeling data nor used gradient descent.[AC20]) Bengio et al. neither cited the original work[AC90,90b][AC20] nor corrected their erroneous claims[GAN1] about my other closely related method, Predictability Minimization (1991).[PM1-2][AC20][R2][MIR](Sec. 5) Bloomberg[AV1] covered their NIPS 2014 paper[GAN1] and some of the erroneous claims it made about my prior work.[AC20] Goodfellow eventually admitted that PM is adversarial (his paper[GAN1] still claims the opposite), but emphasized that it is not generative. However, the even earlier AC[AC90,90b][AC10][AC20] is both adversarial and generative (its generator contains probabilistic units[AC90] like those in StyleGANs[GAN2]). When the authors[GAN1] did not correct their paper, I published a correction myself in the hopes of setting the annals of history straight,[AC20] pointing out that GANs are instances of my earlier work.[R2][AC20]

      Regarding the vanishing gradient problem,[MIR](Sec. 3)[VAN1] Bengio published his own analysis later,[VAN2] without citing Sepp Hochreiter. The priority question was settled in favor of Sepp.[VAN1] However, even after a common publication,[VAN3] Bengio published papers[VAN4][XAV] that did not cite the original work. Citation counts are poor indicators of truly pioneering work.[NAT1] (Bengio also makes several priority claims in a 2020 statement,[YB20] although my publications on exactly these topics date back to 1991-93,[UN0-2][UN] and to the meta-learning work that I started in 1987,[META1][META] long before Bengio, although he claims that he did it before me.[R3][DLC]) Regarding attention-based Transformers,[TR1-6] Bengio[DL3a] cites his own team (2014) for "soft attention" without citing my much earlier original work of 1991-1993 on soft attention and linear Transformers.[FWP,FWP0-2,6] Bengio has also heavily used our LSTM (see Sec. A-C), renaming a variant of our vanilla LSTM architecture[LSTM2] (2000) "gated recurrent units (GRU)"[LSTMGRU] without citing our work,[LSTM2] which was the one that introduced gated recurrent units; in addition, our team automatically evolved lots of additional LSTM variants and topologies already in 2009[LSTM7] without changing the name of the basic method. GRUs can neither learn to count[LSTMGRU2] nor learn simple non-regular languages,[LSTMGRU2] and they underperform LSTM according to Google Brain.[LSTMGRU3]

      The same pattern holds for unsupervised pre-training for deep NNs.[UN0-4][HIN](Sec. II)[MIR](Sec. 1) Hinton's paper[UN4] (2006) appeared long after my earlier work on this,[UN0-2] the first NNs shown to solve very deep problems (see Sec. II above).[UN] It was published in 1991-92,[UN1] when compute was about 1000 times more expensive than in 2006, yet the LBH survey (2015)[DL3][DLC] does not cite it. See also Sec. II & III. Likewise for compressing or distilling one NN into another:[UN0-2][DIST1-2][MIR](Sec. 2) Hinton[DIST2] (2006) did not cite my much earlier original work on this (1991),[UN1][UN] not even in his later patent application. Likewise for fast weight programmers[FWP][FWP0-4a] through tensor-like outer products (1991-2016) and their motivation[FWP2][FWP4a][MIR](Sec. 8) (see also Sec. XVI above), and for learning sequential attention with NNs:[MIR](Sec. 9) Hinton[ATT3] (2010) did not cite our much earlier work on this,[ATT1][ATT] although he was both reviewer and editor of my summary[ATT2] (1990; see Sec. XVI above).

      The ten priority disputes mentioned in the present Sec. XVII are not the only ones.[R4] Remarkably, three of them are related to the 1991 paper[UN1][UN] which in many ways started what people now call deep learning. Most of them go back to our work of 1990-91.[MIR] See Sec. I for additional related issues of credit assignment.

      Critique of 2018 Turing Award: LeCun's team has made important contributions to CNNs since 1989.[CNN2,4] However, the basic CNN architecture with convolutional and downsampling layers is actually due to Fukushima (1979).[CNN1] NNs with convolutions were later (1987) combined by Waibel with weight sharing and backpropagation;[CNN1a] Waibel called this the TDNN. All of this happened before LeCun's work on CNNs. See Sec. D above and Sec. 21 of the overview of our Annus Mirabilis 1990-1991.[MIR] At IJCNN 2011 in Silicon Valley, our DanNet[DAN][GPUCNN1-3] won the traffic sign recognition contest with superhuman performance (the closest artificial competitor had three times worse performance).[DAN1] Again see Sec. D. At ICPR 2012, our DanNet[GPUCNN1-3] won the medical imaging contest (Sept 2012, on detection of mitosis/cancer)[GPUCNN5,7,8] (before the similar AlexNet won ImageNet 2012[GPUCNN5][R6] and the similar VGG network[GPUCNN9] won ImageNet 2014). Our approach to mitosis detection[MGC][GPUCNN5,7,8] is now used by many major companies. See Sec. D & VII. ACM also explicitly mentions speech recognition and speech synthesis,[AM16][DL1] but these fields were heavily shaped in the 2010s by our non-CNN methods.[DL1][DL4][AM16][GSR][GSR15][GT16][WU][FB17] See Sec. A, B, VI, XI.

      Critique of 2018 Turing Award: As mentioned in Sec. XII, backpropagation was actually proposed earlier as a learning method for NNs by Werbos (1982)[BP2-4] (see also Amari's work of 1977[BP6]), long before the much more recent work cited by LBH.[DL3,DL3a][DLC] In 1960, Kelley already had a precursor of the algorithm.[BPA] Furthermore, many besides LeCun have worked "to speed up backpropagation algorithms"[DL1] (ACM's wording). More on the history of backpropagation can be found at Scholarpedia.[DL2][BP4]

      Critique of 2018 Turing Award However, "hierarchical feature representation" in deep learning networks is what Ivakhnenko & Lapa (1965)[DEEP1-2] (and also Fukushima[CNN1][DL2]) had long before LeCun. See Sec. D & II & XIII & V.

      Critique of 2018 Turing Award: LeCun et al. neither cited the origins[BP1] (1970) of this widely used type of automatic differentiation for differentiable networks of modules[DL2][BP4-5][DLC] nor the earlier work on such systems.[S80] See also Sec. XIX & XII. Others published on such networks of modules before LeCun, who did not cite them. See also Pollack's even earlier relevant work.[PO87-90]

      (Furthermore, "complex networks of modules where backpropagation is performed" were the central theme of my much earlier habilitation thesis (1993).[UN2] For example, our adaptive subgoal generators (1991)[HRL0-2] were trained through end-to-end-differentiable chains of such modules.[MIR](Sec. 10) planning and reinforcement learning with recurrent neural world models (1990).[PLAN][MIR](Sec. 11) Same for my linear transformer-like fast weight programmers[FWP0-2][FWP][ATT][MIR](Sec. 8) since 1991 (see Sec. XVI) see "100 Authors against Einstein."[AH1] ad hominem attacks[AH2-3][HIN] "If you cannot dispute a fact-based message, attack the messenger himself."[HIN] award can ever change that.[HIN] and their co-workers have contributed useful improvements of deep learning methods.[CNN2,4][CDI][LAN][RMSP][XAV][ATT14][CAPS] whom they did not cite II, V, XII, XIX, XXI, XIII, XIV, XI, and XX, and 2). Sec. I, A, B, C, D, XVII, VI, and XVI). As emphasized earlier:[DLC][HIN] to self-correction,"[SV20] as is already the standard in other scientific fields. in popular science venues without peer review? For example, the narrator of a popular 2018 Bloomberg video[VID2] Germany and Switzerland (LSTM & CTC; see Sec. A) long before Hinton's methods. Similarly, in 2016, the NY Times published an article[NYT3] Google's original 2016 paper on Google Translate[WU] mentions LSTM over 50 times (see Sec. B). In ad hominem style,[AH2-3] claiming credit he doesn't deserve for many, many things",[NYT1] without LeCun also called the GANs of Bengio's team[GAN1] GANs are variations of my work in 1990.[AC90,90b][AC20][R2] According to Bloomberg,[AV2] Bengio has simply "denied my claims" without backing up his denial by any facts; see Sec. XVII. and forcefully contradict public figures who promote it."[FAKE] LBH, who called themselves the deep learning conspiracy,[DLC] Our LSTM paper[LSTM1] has got more citations than any paper by Bengio or LeCun,[R5] Hinton's most cited paper (2012) is the one on GPU-based CNNs.[GPUCNN4][R5] It follows our earlier work on supervised deep NNs (2010)[MLP1] unsupervised pre-training for deep NNs by myself [UN][UN0-3] and later championed by Hinton;[UN4][VID1] see Sec. D). Hinton (2012)[GPUCNN4] characterizes our deep and fast DanNet (2011)[GPUCNN1-3] as AlexNet won one;[R6] see Sec. D, XIV. The highly cited VGG network (2014)[GPUCNN9] Hinton's 2nd most cited paper[RUM][R5] of Hinton's paper,[RUM] adding citations for a book by Rumelhart & McClelland[R5]). Backpropagation is a previously invented method[BP1] whose origins of Ivakhnenko whom he has never cited;[DEEP1-2][R7-R8] see Sec. II, XIII. Bengio's 2nd most cited research paper is the one on GANs (2014),[GAN1] instances of my artificial curiosity (1990)[AC90,90b][AC20][R2] which he did not cite; see Sec. XVII. Hinton's highly cited papers on unsupervised pre-training for deep NNs (2006-)[UN4] by ours[UN0-2][UN] were preceded by Hanson's[Drop1-2] As recently as of 2021, ACM published yet another misleading deep learning "survey" by LBH,[DL3a] again heavily citing LBH without Consult the Executive Summary and Sec. I-XXI of this critique for more. So virtually all the algorithms that have attracted have their conceptual and technical roots in my labs in Munich and Lugano,[MOST] of deep learning MLPs since 1965[DEEP1-2] (see Sec. II, XX) and backpropagation (1960-70)[BPA][BP1] (see Sec. XIX, XII) and convolutional NNs since 1979[CNN1-4] (see Sec. XVIII, D). Our LSTM (1990s, see Sec. A, B; also for RL, 2003-, see Sec. 
C) → our Highway Net (May 2015) → ResNet (Dec 2015, see Sec. D). Our adversarial Artificial Curiosity (1990) → GANs (2010s, see Sec. XVII). our own unsupervised pre-training of deep NNs (1991, see Sec. II & III) for recurrent NNs in the 1990s → our LSTM (see Sec. A-C) and for feedforward NNs in 2010 → our DanNet (2011) → AlexNet (2012); VGG Net (2014) (see Sec. D). our LSTM brought essentially unlimited depth to supervised recurrent NNs in the 1990s; our Highway Nets[HW1-3] brought it to feedforward NNs in May 2015.[MOST] superior computer vision (2011, see Sec. D, XVIII), medical diagnosis (2012, see Sec. VII, XVIII), and many other applications.[DEC] speech recognition (with our CTC, 2007-15, see Sec. A), machine translation (2016, see Sec. B), robotics & video game players (2018-19, see Sec. C), and many other applications.[DEC] Fast Weight Programmers (1991, see Sec. XVI) are formally equivalent to linear Transformers (now popular in NLP). I, A, B, C, D, VII, XVIII.

      As mentioned earlier,[MIR](Sec. 21) it is not always clear[DLC] who first published a given idea; but Ivakhnenko's nets of 1965 were the first deep learning networks with many layers of depth that really learned.[DEEP1-2][R8] Five years later, modern backpropagation was published.[BP1]

      Yes, this critique is also an implicit critique of certain other awards to LBH.[HIN] Many of these priority issues were also debated on reddit.com/r/MachineLearning[R1-R12] (the largest machine learning forum, back then with over 800k subscribers), in threads many of which were influenced by my overview.[MIR]

      Dr. LeCun himself is well aware of the challenges to scientific integrity in our field:[LECP] "... else cites."[LECP]

      Note that I am insisting on proper credit assignment not only in my own research field but also in quite disconnected areas,[HIN] as demonstrated by my numerous letters in this regard published in Science and Nature, e.g., on the history of aviation,[NASC1-2] the telephone,[NASC3] the computer,[NASC4-7] resilient robots,[NASC8] and scientists of the 19th century.[NASC9] Perhaps future AI scientists and AI historians equipped with artificial curiosity[SA17][AC90-AC20][PP-PP2] will do this kind of credit assignment even better.

      Creative Commons License. Thanks to many expert reviewers for useful comments. Since science is about self-correction, let me know under juergen@idsia.ch if you can spot any remaining error. Many additional relevant publications can be found in my publication page and my arXiv page. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

      J. Schmidhuber (AI Blog, 2021). 3 decades of artificial curiosity & creativity.
      The first paper on planning with reinforcement learning recurrent neural networks (NNs) and on generative adversarial networks. (More on artificial scientists and artists.)
      With a brief summary of the generative adversarial neural networks of 1990.[AC90,90b][AC20] Preprint arXiv/1906.04493.
      [AIB] J. Schmidhuber. AI Blog. Includes variants of chapters of the AI Book.
      Blog of Werner Vogels, CTO of Amazon (Nov 2016).
      [ATT] J. Schmidhuber (AI Blog, 2020). 30-year anniversary of end-to-end differentiable sequential neural attention. Plus goal-conditional reinforcement learning. We had both hard attention (1990) and soft attention (1991-93).[FWP] Today, both types are very popular.
      arXiv/1409.0473, 2014-16.
      Bloomberg, May 15, 2018. Bloomberg, May 17, 2018.
      Precursor of modern backpropagation.[BP1-4]
      First application of backpropagation[BP1] to NNs (concretizing thoughts in his 1974 thesis).
      [BP4] J. Schmidhuber (AI Blog, 2014; updated 2020). Who invented backpropagation?[DL2]
      English version: [CNN1+]. More in Scholarpedia.
      [CNN1a] A. Waibel. Phoneme Recognition Using Time-Delay Neural Networks. Meeting of IEICE, Tokyo, Japan, 1987. First application of backpropagation[BP1][BP2] and weight-sharing.
      Beijing, 2014. Preprint arXiv:1402.3511 [cs.NE].
      J. Schmidhuber (AI Blog, 2021). 10-year anniversary. In 2011, DanNet triggered the deep convolutional neural network (CNN) revolution. Named 1st superhuman result in 2011.[DAN1]
      J. Schmidhuber (AI Blog, 2011; updated 2021 for 10th birthday of DanNet): First superhuman visual pattern recognition by our artificial neural network called DanNet.
      [DEC] J. Schmidhuber (AI Blog, 02/20/2020, revised 2021). The 2010s: Our Decade of Deep Learning / Outlook on the 2020s.
      [DIST1] J. Schmidhuber, 1991.[UN-UN2]
      [DL3a] Y. Bengio, Y. LeCun, G. Hinton (2021). Turing Lecture: Deep Learning for AI. Communications of the ACM, July 2021.
      [DL4] J. Schmidhuber (AI Blog, 2017). Our impact on the world's most valuable public companies: Apple, Google, Microsoft, Facebook, Amazon... By greatly improved (CTC-based) on-device speech recognition (on the phone, not the server), based on LSTM.
      J. Schmidhuber (AI Blog, Nov 2020). 15-year anniversary: 1st paper with "learn deep" in the title (2005). Our deep reinforcement learning & neuroevolution solved problems of depth 1000 and more.[DL6] Soon after its publication, everybody started talking about "deep learning." Causality or correlation?
      Web site deeplearning.net of Y. Bengio's MILA (2015, retrieved May 2020; compare the version in the Internet Archive), referring to Hinton's[UN4] and Bengio's[UN5] unsupervised pre-training for deep NNs (2006), although this type of deep learning dates back to 1991.[UN1-2][UN] See Sec. II & XVII & III.
      [DLC] J. Schmidhuber (AI Blog, June 2015). Critique of Paper by "Deep Learning Conspiracy" (Nature 521 p 436).
      arxiv:1312.5602.
      Alphastar has a "deep LSTM core."
      arXiv:1808.03578, 2018.
      Facebook used LSTM for over 4 billion automatic translations per day (The Verge, August 4, 2017); Facebook blog by J.M. Pino, A. Sidorov, N.F. Ayan (August 3, 2017).
      J. Schmidhuber (AI Blog, 26 March 2021). Fast Weight Programmers: an alternative[FWP0-1] to recurrent NNs, based on the fast weights[FAST,FASTa] of another NN. Such Fast Weight Programmers[FWP0-6,FWPMETA1-7] can learn to memorize past data, e.g., by computing fast weight changes through additive outer products of self-invented activation patterns[FWP0-1] (now often called keys and values for self-attention[TR1-6]). The similar Transformers[TR1-2] combine this with projections; linear Transformers or Performers[TR5-6] are formally equivalent to the 1991 Fast Weight Programmers. In 1993, I introduced the attention terminology[FWP2] now used in this context,[ATT] and RNNs that program themselves.
      Preprint: arXiv:1811.12143.
      Like [FWP0-2]. Preprint: arXiv:2003.08165.
      Linear Transformers Are Secretly Fast Weight Programmers. ICML 2021. Preprint: arXiv:2102.11174.
      Preprint: arXiv:2106.06295 (June 2021).
      An introspective network that can learn to run its own weight change algorithm. In Proc. of the Intl. Conf. on Artificial Neural Networks.
      J. Schmidhuber. Habilitation thesis, TUM, 1993.
      Preprint arXiv:2012.14905 [cs.LG], 2020.
      Report arXiv:2011.07831 [cs.AI], 2020.
      Google Research Blog, Sep 2015; see also Aug 2015. Google's speech recognition based on CTC and LSTM.
      Alphr Technology, Jul 2015, or 9to5google, Jul 2015.
      WIRED, Sep 2016; siliconANGLE, Sep 2016.
      Blog post, Internet Archive, 2010. A blog post describing the basic ideas[AC][AC90,AC90b][AC20] of GANs.
      Description of GANs that does not cite the original work of 1990[AC][AC90,AC90b][AC20][R2] (also containing wrong claims about Predictability Minimization[PM0-2][AC20]). This was number 1 on Hacker News.
      Frankfurter Allgemeine Zeitung, 16/6/2021.
      Preprint arXiv/2005.14165.
      ... for Image Classification. International Joint Conference on Artificial Intelligence (IJCAI-2011, Barcelona), 2011.
      Won four important computer vision competitions 2011-2012 before others won any. This led to massive interest from industry.
      [GPUCNN3] D. C. Ciresan, U. Meier, J. Schmidhuber. Multi-column Deep Neural Networks for Image Classification. Proc. IEEE Conf. on Computer Vision and Pattern Recognition CVPR 2012, p 3642-3649, July 2012. Longer TR of Feb 2012: arXiv:1202.2745v1 [cs.CV].
      J. Schmidhuber (AI Blog, 2017; updated 2021 for 10th birthday of DanNet): History of computer vision contests won by deep CNNs since 2011. DanNet won 4 of them in a row before the similar AlexNet/VGG Net and the ResNet (a Highway Net with open gates) joined the party. Today, deep CNNs are standard in computer vision.
      First deep learner to win a medical imaging contest (2012).
      [HIN] J. Schmidhuber (AI Blog, 2020). Critique of Honda Prize for Dr. Hinton. Science must not allow corporate PR to distort the academic record.
      North-Holland, 1991. Extending TR FKI-129-90, TUM, 1990.
      Preprints arXiv:1505.00387 (May 2015) and arXiv:1507.06228 (July 2015). Also at NIPS 2015. (Feedforward counterpart of the LSTM with forget gates[LSTM2] for RNNs.) ResNets[HW2] are a version of this where the gates are always open: g(x)=t(x)=const=1. Highway Nets perform roughly as well as ResNets[HW2] on ImageNet.[HW3] Highway layers are also often used for natural language processing, where the simpler residual layers do not work as well.[HW3]
      arXiv:1512.03385 (Dec 2015). Residual nets are a version of Highway Nets.[HW1]
      arxiv:1612.07771 (2016). Also at ICLR 2017.
      Preprint arXiv:1704.04760.
      arXiv:1607.06450, 2016.
      A New Publishing Model in Computer Science.
      [LEI21] J. Schmidhuber (AI Blog, 2021). 375th birthday of Leibniz, founder of computer science. Frankfurter Allgemeine Zeitung (FAZ), 17/5/2021. FAZ online: 19/5/2021.
      [LSTM1] S. Hochreiter, J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. Based on [LSTM0].
      Preprint: arxiv:1506.07452.
      J. Schmidhuber (AI Blog, Dec 2020). 10-year anniversary of our journal paper on deep reinforcement learning with policy gradients for LSTM (2007-2010).
      Preprint arXiv:1805.04908.
      Architectures. Preprint arXiv:1703.03906.
      J. Schmidhuber (AI Blog, 2020). 1/3 century anniversary of... Searchable PDF scan (created by OCRmypdf, which uses LSTM).
      Better GP methods through Meta-Evolution.
      [MIR] J. Schmidhuber (AI Blog, 2019). Deep Learning: Our Miraculous Year 1990-1991. Preprint arXiv:2005.05744, 2020.
      Neural Computation 22(12): 3207-3220, 2010.
      J. Schmidhuber (AI Blog, Sep 2020). 10-year anniversary of supervised deep learning breakthrough (2010). No unsupervised pre-training. By 2010, when compute was 100 times more expensive than today, both our feedforward NNs[MLP1] ...
      J. Schmidhuber (AI Blog, 2021). The most cited neural networks all build on work done in my labs. Foundations of the most popular NNs originated in my labs at TU Munich and IDSIA: (1) Long Short-Term Memory (LSTM); (2) ResNet (which is our earlier Highway Net with open gates); (3) AlexNet and VGG Net (both citing our similar earlier DanNet, the first deep convolutional NN to win image recognition competitions); (4) GANs (based on our earlier Adversarial Artificial Curiosity); and (5) variants of Transformers (linear Transformers are formally equivalent to my earlier Fast Weight Programmers). Annus Mirabilis of 1990-1991.[MIR]
      Preprint arXiv:1611.01578, 2017.
      [NASC1] J. Schmidhuber. First Pow(d)ered flight / plane truth. Correspondence, Nature, 421 p 689, Feb 2003.
      [NASC3] J. Schmidhuber. The last inventor of the telephone. Letter, Science, 319, no. 5871, p. 1759, March 2008.
      Correspondence, Nature, vol 483, p 541, March 2012, doi:10.1038/483541b.
      Letter, Science, vol 336, p 1639, June 2012. See also comment on response by A. Hodges (DOI:10.1126/science.336.6089.1639-a).
      [NASC6] J. Schmidhuber. Colossus was the first electronic digital computer. Correspondence, Nature, 441 p 25, May 2006.
      [NASC7] J. Schmidhuber. Turing's impact. Correspondence, Nature, 429 p 501, June 2004.
      [NASC8] J. Schmidhuber. Prototype resilient, self-modeling robots. Correspondence, Science, 316, no. 5825 p 688, May 2007.
      [NASC9] J. Schmidhuber. Comparing the legacies of Gauss, Pasteur, Darwin. Correspondence, Nature, vol 452, p 530, April 2008.
      NY Times article. NY Times article.
      Learning Dexterous In-Hand Manipulation, 2018. An LSTM composes 84% of the model's total parameter count.
      arxiv:1912.06680. An LSTM with 84% of the model's total parameter count was the core of OpenAI Five.
      J. Schmidhuber (AI Blog, 2020). 30-year anniversary of planning & reinforcement learning with recurrent world models and artificial curiosity (1990). This work also introduced high-dimensional reward signals, deterministic policy gradients for RNNs, and the GAN principle.
      Based on TR FKI-126-90 (1990).[AC90]
      Partially based on TR FKI-126-90 (1990).[AC90] Report arXiv:1210.0118 [cs.AI], 2015.
      One Big Net For Everything. Preprint arXiv:1802.08864 [cs.AI], Feb 2018.
      Preprint: arXiv:1809.01999. Github: World Models.
      ... minimization. TR CU-CS-565-91, Univ. Colorado at Boulder, 1991.
      arXiv:1112.5309 [cs.AI].
      First Experiments with PowerPlay. arXiv:1210.8385 [cs.AI].
      [R1] Reddit/ML, 2019. Hinton, LeCun, Bengio receive ACM Turing Award.
      [R2] Reddit/ML, 2019. J. Schmidhuber really had GANs in 1990.
      [R3] Reddit/ML, 2019. NeurIPS 2019 Bengio Schmidhuber Meta-Learning Fiasco.
      [R4] Reddit/ML, 2019. Five major deep learning papers by G. Hinton did not cite similar earlier work by J. Schmidhuber.
      [R5] Reddit/ML, 2019. The 1997 LSTM paper by Hochreiter & Schmidhuber has become the most cited deep learning research paper of the 20th century.
      [R6] Reddit/ML, 2019. DanNet, the CUDA CNN of Dan Ciresan in J. Schmidhuber's team, won 4 image recognition challenges prior to AlexNet.
      [R7] Reddit/ML, 2019. J. Schmidhuber on Seppo Linnainmaa, inventor of backpropagation in 1970.
      [R8] Reddit/ML, 2019. J. Schmidhuber on Alexey Ivakhnenko, godfather of deep learning 1965.
      [R9] Reddit/ML, 2019.
      [R11] Reddit/ML, 2020. Schmidhuber: Critique of Honda Prize for Dr. Hinton.
      [R12] Reddit/ML, 2020. J. Schmidhuber: Critique of Turing Award for Drs. Bengio & Hinton & LeCun.
      [R15] Reddit/ML, 2021. J. Schmidhuber's work on fast weights from 1991 is similar to linearized variants of Transformers.
      Preprint arXiv/1311.2524, Nov 2013.
      Preprint arXiv/1703.06870, 2017.
      This experimental analysis of backpropagation did not cite the origin of the method,[BP1-4] also known as the reverse mode of automatic differentiation.
      The Past, Present and Future of Artificial Intelligence.
      ACM's justification of the 2018 A.M. Turing Award (announced in 2019).
      [T20a] J. Schmidhuber (AI Blog, 25 June 2020). Critique of 2018 Turing Award for Drs. Bengio & Hinton & LeCun. The first version of the present critique.
      [TUR21] J. Schmidhuber (AI Blog, Sep 2021). Turing Oversold. It's not Turing's fault, though.
      J. Schmidhuber (AI Blog, 2021). 30-year anniversary. 1991: First very deep learning with unsupervised pre-training.
      1992. Based on TR FKI-148-91, TUM, 1991.[UN0] Such approaches are now widely used.
      [UN2] J. Schmidhuber. Habilitation thesis, TUM, 1993. (Depth > 1000.)
      [VAN1] S. Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, TUM, 1991 (advisor J. Schmidhuber). More on the Fundamental Deep Learning Problem.
      [VAN4] Y. Bengio. Neural net language models. Scholarpedia, 3(1):3881, 2008.
      Youtube video [see 28:16]. But in 2010, our team showed[MLP1-2] that unsupervised pre-training is not necessary.
      Youtube video, 2018.
      Preprint arXiv:1609.08144, 2016. Based on LSTM, which it mentions at least 50 times.
      A general, practical, program-controlled computer.
      J. Schmidhuber (AI Blog, 2021). 80th anniversary celebrations: 1941: Konrad Zuse completes the first working general computer, based on his 1936 patent application.

      Deep Learning: Our Miraculous Year 1990-1991 Menu directory status & updates copyrights

    30. ChatGPT and AI Usage Survey (Students)
      ChatGPT and AI Usage Survey (Teachers)
    31. Menu
    32. Menu "... Consciousness, at its simplest, is sentience and awareness of internal and external existence.[1] However, its nature has led to millennia of analyses, explanations and debates by philosophers, theologians, linguists, and scientists. Opinions differ about what exactly needs to be studied or even considered consciousness. ..."(Wiki2023)
    33. Only a very small number of theories of consciousness are listed on this webPage, compared to the vast number of [paper, book]s on the subject coming out all of the time. "Popular theories" as listed on Wikipedia, are shown, assuming that this will be important for non-experts. But the only ones that really count for this webSite are the "Priority model of consciousness".
      Readers will have completely different [interest, priority]s than I do, so they would normally have a different "Priority model of consciousness" and different rankings of the consciousness theories. To understand my selections and rankings, see Introduction to this webSite.
    34. This webSite's Questions: Grossberg's c-ART, Transformer NNs, and consciousness? I like the description in Wikipedia (Wiki2023):
      The following additional definitions are also quoted from (Wiki2023) :
      ..." (Wiki2023)
      ..." (Wiki2023)
      ..." (Wiki2023)
      Grossberg's concepts are NOT normally listed in [compilations, reviews] of consciousness, which is a [puzzle, failure] that I address separately.
      16Jul2023 I am currently lacking a coherent overall webPage for Grossberg's Consciousness. In the meantime refer to the very detailed listing of consciousness and other themes as a starting point to peruse for Grossberg's ideas. This webPage is a compilation of themes extracted from files listing [chapter, section, figure, table, comment]s.
      The following listing is taken from What is consciousness: from historical to Grossberg, and repeats some of the points in this section above : conscious ART (cART), etc
    35. A surprisingly small number of neural architectures can simulate [extensive, diverse] [neuro, psycho]logical data at BOTH the [sub, ]conscious levels, and for [perception, action] of [sight, auditory, touch, language, cognition, emotion, etc]. This is similar to what we see in physics.
    36. [extensive, diverse] ex-bio applications have been successfully [developed, applied], based on Grossberg etal's computational models.
    37. see simple grepStr search results : 'ART|cART|pART|ARTMAP|ARTSTREAM|ARTPHONE|ARTSCAN|dARTSCAN|pARTSCAN|ARTSCENE|ARTSTREAM|ARTWORD|cARTWORD|LAMINART|PARSE|SMART|START|nSTART' ..."(Wiki2023)
      Byoung-Kyong Min 2010 "A Thalamic reticular networking model of consciousness"
      (Wiki2023)
      Wikipedia: Models of consciousness, retrieved Apr2023 (Wiki2023)
      ..." (Wiki2023)
      ..." (Wiki2023)
      "... The Neural correlates of consciousness (NCC) formalism is used as a major step towards explaining consciousness. The NCC are defined to constitute the minimal set of neuronal events and mechanisms sufficient for a specific conscious percept, and consequently sufficient for consciousness. In this formalism, consciousness is viewed as a state-dependent property of some undefined complex, adaptive, and highly interconnected biological system.[3][4][5] ..." (Wiki2023, full article: Wiki2023 - Neural_correlates_of_consciousness, also cited by Grossberg 2021)
      Another idea that has drawn attention for several decades is that consciousness is associated with high-frequency (gamma band) oscillations in brain activity. This idea arose from proposals in the 1980s, by Christof von der Malsburg and Wolf Singer, that gamma oscillations could solve the so-called binding problem, by linking information represented in different parts of the brain into a unified experience.[80] Rodolfo Llinás, for example, proposed that consciousness results from recurrent thalamo-cortical resonance where the specific thalamocortical systems (content) and the non-specific (centromedial thalamus) thalamocortical systems (context) interact in the gamma band frequency via synchronous oscillations.[81] ..." (Wiki2023 - Consciousness#Neural_correlates)
      Howell 19Jul2023 Note that Grossberg's ART predictions are supported by experiments by a number of researchers including Wolf Singer (see Quoted text from (Grossberg 2021)).
      "... Integrated Information Theory (IIT) offers an explanation for the nature and source of consciousness. Initially proposed by Giulio Tononi in 2004, it claims that consciousness is identical to a certain kind of information, the realization of which requires physical, not merely functional, integration, and which can be measured mathematically according to the phi metric. ..." (UTM - Integrated information theory)
      "... Integrated information theory (IIT) attempts to provide a framework capable of explaining why some physical systems (such as human brains) are conscious,[1] why they feel the particular way they do in particular states (e.g. why our visual field appears extended when we gaze out at the night sky),[2] and what it would take for other physical systems to be conscious (Are other animals conscious? Might the whole Universe be?).[3] ... In IIT, a system's consciousness (what it is like subjectively) is conjectured to be identical to its causal properties (what it is like objectively). Therefore it should be possible to account for the conscious experience of a physical system by unfolding its complete causal powers (see Central identity).[4] ... Specifically, IIT moves from phenomenology to mechanism by attempting to identify the essential properties of conscious experience (dubbed "axioms") and, from there, the essential properties of conscious physical systems (dubbed "postulates"). 3..." (Wiki2023 - Integrated information theory)
      Wikipedia lists numerous criticisms of IIT, but I have not yet quoted from that, other than to mention the authors : Wikipedia: Models of consciousness
      "... Sociology of human consciousness uses the theories and methodology of sociology to explain human consciousness. The theory and its models emphasize the importance of language, collective representations, self-conceptions, and self-reflectivity. It argues that the shape and feel of human consciousness is heavily social. ..."(Wiki2023, full webPage Wiki2023
      "... Daniel Dennett proposed a physicalist, information processing based multiple drafts model of consciousness described more fully in his 1991 book, Consciousness Explained. ..." (Wiki2023, full webPage Wiki2023)
      ..." (Wiki2023)
      "... Functionalism is a view in the theory of the mind. It states that mental states (beliefs, desires, being in pain, etc.) are constituted solely by their functional role – that is, they have causal relations to other mental states, numerous sensory inputs, and behavioral outputs. ..." (Wiki2023, full webPage Wiki2023)
      "... Electromagnetic theories of consciousness propose that consciousness can be understood as an electromagnetic phenomenon that occurs when a brain produces an electromagnetic field with specific characteristics.[7][8] Some electromagnetic theories are also quantum mind theories of consciousness.[9] ..." (Wiki2023)
      "... "No serious researcher I know believes in an electromagnetic theory of consciousness,"[16] Bernard Baars wrote in an e-mail.[better source needed] Baars is a neurobiologist and co-editor of Consciousness and Cognition, another scientific journal in the field. "It's not really worth talking about scientifically,"[16] he was quoted as saying. ..." (Wiki2023)
      Stuart Hameroff separately worked in cancer research and anesthesia, which gave him an interest in brain processes. Hameroff read Penrose's book and suggested to him that microtubules within neurons were suitable candidate sites for quantum processing, and ultimately for consciousness.[30][31] Throughout the 1990s, the two collaborated on the Orch OR theory, which Penrose published in Shadows of the Mind (1994).[19] ..."Wiki2023
      rationalwiki.org presents a hard-nosed critique of various "quantum consciousness" theories, from which the following quote is taken :
      Menu
    38. "... Large Language Models (LLMs) have been transformative. They are pre-trained foundational models that are self-supervised and can be adapted with fine-tuning to a wide range of natural language tasks, each of which previously would have required a separate network model. This is one step closer to the extraordinary versatility of human language. GPT-3 and more recently LaMDA can carry on dialogs with humans on many topics after minimal priming with a few examples. However, there has been a wide range of reactions and debate on whether these LLMs understand what they are saying or exhibit signs of intelligence. This high variance is exhibited in three interviews with LLMs reaching wildly different conclusions. A new possibility was uncovered that could explain this divergence. What appears to be intelligence in LLMs may in fact be a mirror that reflects the intelligence of the interviewer, a remarkable twist that could be considered a Reverse Turing Test. If so, then by studying interviews we may be learning more about the intelligence and beliefs of the interviewer than the intelligence of the LLMs. As LLMs become more capable they may transform the way we interact with machines and how they interact with each other. Increasingly, LLMs are being coupled with sensorimotor devices. LLMs can talk the talk, but can they walk the walk? A road map for achieving artificial general autonomy is outlined with seven major improvements inspired by brain systems and how LLMs could in turn be used to uncover new insights into brain function. ..." (Sejnowski 2022)
      Sejnowski's idea is very interesting, judging by many [science, computer, philosophy, engineering, policy, public] commentators, for whom this is a very emotionally-laden subject that seems to drive [fear, suppression]. The case of Blake Lemoine is a good example. How far can LLMs go in assessing human intelligence, given their huge "codified databases"? Would they be able to go beyond our traditional measures of intelligence in both [depth, diversity]?
      Menu
    39. Menu
    40. Navigation: [menu, link, directory]s
    41. Theme webPage generation by bash script
    42. Notation for [chapter, section, figure, table, index, note]s
    43. incorporate reader questions into theme webPages
    44. Grossberg 2021 p081c2h0.66 sub-section: How does one evolve a computational brain?
      The above discussion illustrates that no single step of theoretical derivation can derive a whole brain. One needs a method for deriving a brain in stages, or cycles, much as evolution has incrementally discovered ever more complex brains over many thousands of years. The following theoretical method has been successfully applied many times since I first used it in 1957. It embodies a kind of conceptual evolutionary process for deriving a brain.

      Because "brain evolution needs to achieve behavioural success", we need to start with data that embodiey indices of behavioral success. That is why, as illustrated in Figure 2.37 Modelling method and cycle, one starts with Behavioral Data from scores or hundreds of psychological experiments. These data are analyszed as the result of an individual adapting autonomously in real time to a changing world. This is the Arty of Modeling. It requires that one be able to infer from static data curves the dynamical processes that control individual behaviors occuring in real time. One of the hardest things that I teach to my students to do is "how to think in real time" to be able to carry out this speculative leap.

      Properly carried out, this analysis leads to the discovery of new Design Principles that are embodied by these behavioral processes. The Design Principles highlight the functional meaning of the data, and clarify how individual behaviors occurring in real time give rise to these static data curves.

      These principles are then converted into the simplest Mathematical Model using a method of minimal anatomies, which is a form of Occam's Razor, or principle of parsimony. Such a mathematical model embodies the psychological principles using the simplest possible differential equations. By "simplest" I mean that, if any part of the derived model is removed, then a significant fraction of the targeted data could no longer be explained. One then analyzes the model mathematically and simulates it on the computer, showing along the way how variations on the minimal anatomy can realize the design principles in different individuals or species.

      This analysis has always provided functional explanations and Behavioral Predictions for much larger behavioral data bases than those used to discover the Design Principles. The most remarkable fact is, however, that the behaviorally derived model always looks like part of a brain, thereby explaining a body of challenging Neural Data and making novel Brain Predictions.

      The derivation hereby links mind to brain via psychological organizational principles and their mechanistic realization as a mathematically defined neural network. This startling fact is what I first experienced as a college Freshman taking Introductory Psychology, and it changed my life forever.

      I conclude from having had this experience scores of times since 1957 that brains look the way they do because they embody a natural computational realization for controlling autonomous adaptation in real-time to a changing world. Moreover, the Behavior -> Principles -> Model -> Neural derivation predicts new functional roles for both known and unknown brain mechanisms by linking the brain data to how it helps to ensure behavioral success. As I noted above, the power of this method is illustrated by the fact that scores of these predictions about brain and behavior have been supported by experimental data 5-30 years after they were first published.

      Having made the link from behavior to brain, one can then "burn the candle from both ends" by pressing both top-down from Behavioral Data and bottom-up from Brain Data to clarify what the model can and cannot explain at its current stage of derivation. No model can explain everything. At each stage of development, the model can cope with certain environmental challenges but not others. An important part of the mathematical and computational analysis is to characterize the boundary between the known and unknown; that is which challenges the model can cope with and which it cannot. The shape of this boundary between the known and unknown helps to direct the theorist's attention to new design principles that have been omitted from previous analysis.

      The next step is to show how these new design principles can be incorporated into the evolved model in a self-consistent way, without undermining its previous mechanisms, thereby leading to a progressively more realistic model, one that can explain and predict ever more behavioral and neural data. In this way, the model undergoes a type of evolutionary development, as it becomes able to cope behaviorally with environmental constraints of ever increasing subtlety and complexity. The Method of Minimal Anatomies may hereby be viewed as a way to functionally understand how increasingly demanding combinations of environmental pressures were incorporated into brains during the evolutionary process.

      If such an Embedding Principle cannot be carried out - that is, if the model cannot be unlumped or refined in a self-consistent way - then the previous model was, put simply, wrong, and one needs to figure out which parts must be discarded. Such a model is, as it were, an evolutionary dead end. Fortunately, this has not happened to me since I began my work in 1957 because the theoretical method is so conservative. No theoretical addition is made unless it is supported by multiple experiments that cannot be explained in its absence. Where multiple mechanistic instantiations of some Design Principles were possible, they were all developed in models to better understand their explanatory implications. Not all of these instantiations could survive the pressure of the evolutionary method, but some always could. As a happy result, all earlier models have been capable of incremental refinement and expansion.

      The cycle of model evolution has been carried out many times since 1957, leading today to increasing numbers of models that individually can explain and predict psychological, neurophysiological, anatomical, biophysical, and even biochemical data. In this specific sense, the classical mind-body problem is being incrementally solved.

      Howell: bold added for emphasis.
      (keys : Principles-Principia, behavior-mind-brain link, brain evolution, cycle of model evolution)
      see also quotes: Charles William Lucas "Universal Force" and others (not retyped yet).
    45. image p059fig02.05 Bowed serial position curve. This kind of data emphasizes the importance of modelling how our brains give rise to our minds using nonlinear systems of differential equations.
      || Effects of [inter, intra]trial intervals (Hovland 1938). # of errors vs list position. [w (sec), W (sec)] = (2 6) (4 6) (2 126) (4 126). Nonoccurrence of future items reduces the number of errors in response to past items. These data require a real-time theory for their explanation; that is, DIFFERENTIAL equations.
    46. image p064fig02.10 The Shunting Model includes upper and lower bounds on neuronal activities. These bounds have the effect of multiplying additive terms by excitatory and inhibitory automatic gain terms that enable such models to preserve their sensitivity to inputs whose size may vary greatly through time, while also approximately normalizing their total activities.
      || STM: Shunting Model (Grossberg, PNAS 1967, 1968). Mass action in membrane equations. Bi/Ci -> xi(t) -> O -> -Fi/Ei. Bounded activations, automatic gain control. d[dt: xi(t)] = -Ai*xi + (Bi - Ci*xi)*(sum[j=1 to n: fj(xj(t))*Dji*yji*zji] + Ii) - (Ei*xi + Fi)*(sum[j=1 to n: gj(xj)*Gji*Yji*Zji] + Ji). Includes the Additive Model.
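      A minimal numerical sketch (Python; parameter values are illustrative, not from the book) of this shunting STM law, with the recurrent cross-terms collapsed into fixed excitatory and inhibitory drives, shows how the activity stays within its bounds while tracking the input:

        # Euler integration of a single shunting STM cell toward equilibrium.
        A, B, C, E, F = 1.0, 1.0, 1.0, 1.0, 0.0   # decay and bound parameters (illustrative)

        def shunting_step(x, exc, inh, dt=0.01):
            dx = -A * x + (B - C * x) * exc - (E * x + F) * inh
            return x + dt * dx

        x = 0.0
        for _ in range(2000):                      # integrate long enough to equilibrate
            x = shunting_step(x, exc=5.0, inh=2.0)
        print(round(x, 3))   # ~0.625: stays between the lower bound -F/E and upper bound B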
    47. image p064fig02.11 Medium-Term Memory (MTM) and Long-Term Memory (LTM) equations complement the Additive and Shunting Models of STM. MTM is typically defined by a chemical transmitter that is released from the synaptic knobs of a neuron (Figure 2.03). Its release or inactivation in an activity-dependent way is also called habituation. LTM defines how associative learning occurs between a pair of neurons whose activities are approximately correlated through time. See the text for details.
      || Medium and Long Term memory.
      MTM (habituative transmitter gate): d[dt: yki(t)] = H*(K - yki) - L*fk(xk)*yki
      LTM (gated steepest descent learning): d[dt: zki(t)] = Mk*fk(xk)*(hi(xi) - zki)
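      The MTM and LTM laws above can be integrated directly; a minimal sketch (Python; constants and signal values are illustrative placeholders) shows the transmitter habituating under a sustained signal while the gated weight converges to the postsynaptic signal it samples:

        # Euler integration of the MTM (habituative transmitter) and LTM (gated learning) laws.
        H, K, L, M = 0.1, 1.0, 1.0, 0.5            # illustrative constants

        def mtm_step(y, f_pre, dt=0.01):
            """Transmitter accumulates toward K and is inactivated by the presynaptic signal."""
            return y + dt * (H * (K - y) - L * f_pre * y)

        def ltm_step(z, f_pre, h_post, dt=0.01):
            """Weight z tracks the postsynaptic signal only while the presynaptic signal gates it."""
            return z + dt * (M * f_pre * (h_post - z))

        y, z = 1.0, 0.0
        for _ in range(5000):
            y = mtm_step(y, f_pre=1.0)              # sustained signal habituates the transmitter
            z = ltm_step(z, f_pre=1.0, h_post=0.8)  # ... while the weight converges to 0.8
        print(round(y, 3), round(z, 3))             # y -> H*K/(H+L) ~ 0.091, z -> 0.8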
    48. image p068fig02.14 Hodgkin and Huxley developed a model to explain how spikes travel down the squid giant axon.
      || Neurophysiology (single cell): spike potentials in squid giant axon (Hodgkin, Huxley 1952, Nobel Prize). time -> (dendrites -> cell body -> axon).
      C*dp[dt: V] = α*dp^2[dX^2: V] + (V(+) - V)*g(+) + (V(-) - V)*g(-) + (V^p - V)*g^p
      g(+) = G(+)(m,h), g(-) = G(-)(n), G^p = const, [m, h, n] - ionic processes, V - voltage
      Precursor of Shunting network model (Rall 1962). (Howell: see p075fig02.24 Membrane equations of neurophysiology. Shunting equation
    49. image p074fig02.23 The equations for a shunting on-center off-surround network. Shunting terms lead to many beautiful and important properties of these networks, which are found ubiquitously, in one form or another, in all cellular tissues.
      || Shunting on-center off-surround network.
      Mass action: d[dt: xi] = -A*xi +(B - xi)*Ii -xi*sum[k≠i: Ik]
      Turn on unexcited sites | Turn off excited sites
      At equilibrium:
      0 = d[dt: xi] = -(A + Ii + sum[k≠i: Ik])*xi + B*Ii = -(A + I)*xi + B*Ii
      xi = B*Ii/(A + I) = B*θi*I/(A + I) = θi*B*I/(A + I). No saturation!
      Infinite dynamical range
      Automatic gain control
      Compute ratio scale
      Weber law
      x = sum[k=1 to n: xk] = B*I/(A + I) ≤ B. Conserves total activity.
      NORMALIZATION
      Limited capacity
      Real-time probability
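      A quick numerical check (Python; parameters illustrative) of the equilibrium solution xi = B*Ii/(A + I) confirms the properties listed above: activity ratios follow the input ratios at any overall intensity, and total activity stays below B:

        # Equilibrium of a feedforward shunting on-center off-surround network.
        import numpy as np

        A, B = 1.0, 1.0
        def equilibrium(I):
            return B * I / (A + I.sum())           # x_i = B*I_i/(A + I)

        weak   = np.array([1.0, 2.0, 1.0])
        strong = 100.0 * weak                      # same pattern, 100x the intensity
        x_weak, x_strong = equilibrium(weak), equilibrium(strong)

        print(x_weak / x_weak.sum())               # ratios theta_i = I_i/I ...
        print(x_strong / x_strong.sum())           # ... preserved at both intensities (no saturation)
        print(x_weak.sum(), x_strong.sum())        # total activity <= B (normalization)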
    50. image p075fig02.24 The membrane equations of neurophysiology describe how cell voltages change in response to excitatory, inhibitory, and passive input channels. Each channel is described by a potential difference multiplied by a conductance. With the special choices shown in the lower right-hand corner, this equation defines a feedforward shunting on-center off-surround network.
      || Membrane equations of neurophysiology.
      C*dp[dt] = (V(+) - V)*g(+) +(V(-) - V)*g(-) +(V(p) - V)*g(p)
      Shunting equation (not additive)
      V Voltage
      V(+), V(-), V(p) Saturating voltages
      g(+), g(-), g(p) Conductances
      V(+) = B, C = 1; V(-) = V(p) = 0; g(+) = Ii; g(-) = sum[k≠i: Ik];
      lower bound of V: V(-) = V(p), silent inhibition; upper bound of V: V(+). (Howell: see p068fig02.14 Grossberg's comment that Hodgkin&Huxley model was a "... Precursor of Shunting network model (Rall 1962) ...").
    51. image p079fig02.32 Matching amplifies the matched pattern due to automatic gain control. See terms I and J in the equation.
      || Substrate of resonance. Match (in phase) of BU and TD input patterns AMPLIFIES matched pattern due to automatic gain control by shunting terms. J = sum[i: Ji], I = sum[i: Ii], θi = (Ii + Ji)/(I + J)
      xi = (B + C)*(I + J)/(A + I + J)*[θi -C/(B + C)]
      Need top-down expectations to be MODULATORY.
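      A small numeric check (Python; parameters illustrative) of the matching equation shows the amplification: a matched top-down pattern raises the total input I + J and hence the shunting gain, while cells whose combined share θi falls below C/(B + C) are suppressed:

        # x_i = (B+C)*(I+J)/(A+I+J) * [theta_i - C/(B+C)], theta_i = (I_i+J_i)/(I+J)
        import numpy as np

        A, B, C = 1.0, 1.0, 0.25
        def match(bottom_up, top_down):
            total = bottom_up + top_down
            I_plus_J = total.sum()
            theta = total / I_plus_J
            return (B + C) * I_plus_J / (A + I_plus_J) * (theta - C / (B + C))

        bu = np.array([0.1, 0.8, 0.1])
        print(match(bu, np.zeros(3)))                 # bottom-up pattern alone
        print(match(bu, bu))                          # matched top-down input amplifies the peak
        print(match(bu, np.array([0.8, 0.1, 0.1])))   # mismatched top-down input reduces it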
    52. image p202fig05.17 This figure summarizes the simplest equations whereby the adaptive weights of a winning category learn the input pattern that drove it to win, or more generally a time-average of all the input patterns that succeeded in doing so.
      || Geometry of choice and learning, learning trains the closest LTM vector
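      A minimal sketch (Python; names and constants are illustrative) of this learning geometry: the closest LTM vector wins, and only the winner's weights move toward the input, so over trials they track a time-average of the inputs that category wins:

        # Winner-take-all category learning: the winning LTM vector tracks its inputs.
        import numpy as np

        rng = np.random.default_rng(1)
        n_categories, n_features, lam = 3, 4, 0.1
        Z = rng.uniform(size=(n_categories, n_features))     # LTM vectors, one per category

        def learn(Z, x):
            winner = np.argmin(np.linalg.norm(Z - x, axis=1))  # closest LTM vector wins
            Z[winner] += lam * (x - Z[winner])                  # only the winner learns
            return winner

        for _ in range(200):
            x = np.array([1.0, 0.0, 0.0, 1.0]) + 0.05 * rng.normal(size=n_features)
            learn(Z, x)
        print(np.round(Z, 2))   # one row has converged near the (noisy) input pattern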
    53. image p359fig10.07 The two bottom-up pathways from LGN to layer 4 of V1 can together activate layer 4 and contrast-normalize layer 4 responses.
      || Bottom-up contrast normalization (Grossberg 1968, 1973; Sperling, Sondhi 1968; Heeger 1992; Douglas etal 1995; Shapley etal 2004). Together, direct LGN-to-4 path and 6-to-4 on-center off-surround provide contrast normalization if cells obey shunting or membrane equation dynamics.
    54. image p501fig13.26 A simple differential equation describes the processes of transmitter accumulation and release that do their best, at a finite rate, to carry out unbiased transduction.
      || Transmitter accumulation and release. Transmitter y cannot be restored at an infinite rate: T = S*y, y ~= B. Differential equation: d[dt: y] = A*(B - y) - S*y = accumulate - release. Transmitter y tries to recover to ensure unbiased transduction. What if it falls behind? Evolution has exploited the good properties that happen then.
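      A tiny calculation (Python; constants illustrative) of the equilibrium of d[dt: y] = A*(B - y) - S*y shows how the gated signal T = S*y compresses large signals once release outpaces accumulation:

        # Equilibrium transmitter level and gated signal for increasing input signals S.
        A, B = 0.1, 1.0

        def equilibrium_y(S):
            return A * B / (A + S)        # setting dy/dt = 0

        for S in (0.1, 1.0, 10.0):
            y = equilibrium_y(S)
            print(S, round(y, 3), round(S * y, 3))   # S, depleted transmitter y, gated signal T = S*y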
    55. image p505fig13.33 An unexpected event can disconfirm ongoing processing by triggering a burst of nonspecific arousal that causes antagonistic rebounds in currently active gated dipoles, whether cognitive or affective.
      || Novelty reset: rebound to arousal onset. 1. Equilibrate to I and J: S1 = f(I+J); y1 = A*B/(A+S1); S2 = f(I+J); y2 = A*B/(A+S2);. 2. Keep phasic input J fixed; increase arousal I to I* = I + ∆I: (a) OFF reaction if T1 < T2; OFF = T2 - T1 = f(I*+J)*y2 - f(I*)*y1 = { A*B*(f(I*) - f(I*+J)) - B*(f(I*)*f(I+J) - f(I)*f(I*+J)) } / (A+f(I)) / (A + f(I+J)). 3. How to interpret this complicated equation?
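      A hedged numerical sketch (Python) of such an antagonistic rebound, using the standard gated-dipole reading with a linear signal function f(w) = w, on-channel input I + J, and off-channel input I; the symbols and values here are illustrative and may not match the figure's exactly:

        # Antagonistic OFF rebound in a gated dipole after an arousal burst.
        A, B = 1.0, 1.0
        I, J, dI = 1.0, 1.0, 3.0          # tonic arousal, phasic input, arousal increment

        # 1. Transmitters equilibrate while J is on (on-channel more depleted).
        y_on  = A * B / (A + (I + J))
        y_off = A * B / (A + I)

        # 2. Arousal jumps to I* = I + dI before the transmitters can re-equilibrate.
        I_star = I + dI
        T_on  = (I_star + J) * y_on
        T_off = I_star * y_off
        print(round(T_off - T_on, 3))     # positive OFF rebound when dI exceeds A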
    56. image p580fig16.05 Macrocircuit of the GridPlaceMap model, which can learn both 2D grid cells and place cells in response to realistic trajectories of navigating rats using a hierarchy of SOMs with identical equations.
      || GridPlaceMap model: rate-based and spiking (Pilly, Grossberg 2012). Pre-wired 1D stripe cells, learns both 2D grid and place cells! Same laws for both; both select most frequent and energetic inputs. Place cells emerge gradually in response to developing grid cells. [place-> grid-> stripe] cells-> path integration-> vestibular signals
    57. image p586fig16.16 In the place cell learning model of (Gorchetnikov, Grossberg 2007), three populations of five cells each of entorhinal grid cells (only two are shown) with different spatial periods input to the model's dentate gyrus. The grid cells are one-dimensional and defined algorithmically. A model dentate gyrus granule cell that receives strong projections from all three grid cell scales fires (green cell) and activates a recurrent inhibitory interneuron that inhibits other granule cells. It also generates back-propagating action potentials that trigger learning in the adaptive weights of the projections from the grid cells, thereby causing learning of place cell receptive fields.
      || Grid-to-place Self-Organizing map (Gorchetnikov, Grossberg 2007). Formation of place cell fields via grid-to-place cell learning. Least common multiple: [grid (cm), place (m)] scales: [40, 50, 60 (cm); 6m], [50, 60, 70 (cm); 21m], [41, 53, 59 (cm); 1.282 km]. Our simulations: [40, 50 (cm); 2m], [44, 52 (cm); 5.72m]. Our SOM: Spiking Hodgkin-Huxley membrane equations; Nonlinear choice by contrast-enhancing recurrent on-center off-surround net;. Choice triggers back-propagating action potentials that induce STDP-modulated learning on cell dendrites.
    58. image p593fig16.25 Spectral Spacing Model STM, MTM, and LTM equations. The rate spectrum that determines the dorsoventral gradient of multiple grid cell properties is defined by μm.
      || Spectral Spacing Model equations. [STM, MTM, LTM]. μm = rate spectrum.
    59. Grossberg 2021 p081c2h0.66 sub-section: How does one evolve a computational brain? (This item repeats the full text of item 44 above; see item 44.)
    60. image pxvifig00.01 Macrocircuit of the visual system
    61. image p087fig03.01 A macrocircuit of key visual processes (in green) and the cortical areas in which they primarily occur (in red), from the retina to the Prefrontal Cortex (PFC), including both the What and Where cortical streams. The [bottom-up, horizontal, and top-down] interactions help each of these processes to overcome computationally complementary processing deficiencies that they would experience without them, and also to read out top-down expectations that help to stabilize learning while they focus attention on salient objects and positions.
      || Emerging unified theory of visual intelligence. [What, Where] streams. Bottom-up and top-down interactions overcome COMPLEMENTARY processing deficiencies.
    62. image p168fig04.44 Macrocircuit of the main boundary and surface formation stages that take place from the lateral geniculate nucleus, or LGN, through cortical areas [V1, V2, V4]. See the text for details.
      ||
      left eye | binocular | right eye
      V4 binocular surface
      V2 monocular surface | V2 layer 2/3 binocular boundary | V2 monocular surface
      V2 layer 4 binocular boundary
      V1 monocular surface | V1 monocular boundary | V1 binocular boundary | V1 monocular boundary | V1 monocular surface
      LGN | LGN
    63. image p228fig05.37 A macrocircuit of the neurotrophic Spectrally Timed ART, or nSTART, model. I developed nSTART with my PhD student Daniel Franklin. It proposes how adaptively timed learning in the hippocampus, bolstered by Brain Derived Neurotrophic Factor, or BDNF, helps to ensure normal memory consolidation.
      || habituative gates, CS, US, Thalamus (sensory cortex, category learning, conditioned reinforcer learning, adaptively timed learning and BDNF), Amygdala (incentive motivation learning), Hippocampus (BDNF), Prefrontal Cortex (attention), Pontine nuclei, Cerebellum (adaptively timed motor learning)
    64. image p246fig05.48 Microcircuits of the LAMINART model that I developed with Rajeev Raizada. See the text for details of how they integrate bottom-up adaptive filtering, horizontal bipole grouping, and top-down attentional matching that satisfies the ART Matching Rule.
      ||
    65. image p346fig09.16 A macrocircuit of some of the main brain regions that are used to move the eyes. Black boxes denote areas belonging to the saccadic eye movement systes (SAC), white boxes the smooth pursuit eye system (SPEM), and gray boxes, both systems. The abbreviations for the different brain regions are: LIP - Lateral Intra-Parietal area; FPA - Frontal Pursuit Area; MST - Middle Superior Temporal area; MT - Middle Temporal area; FEF - Frontal Eye Fields; NRPT - Nucleus Reticularis Tegmenti Pontis; DLPN - Dorso-Lateral Pontine Nuclei; SC - Superior Colliculus; CBM - CereBelluM; MVN/rLVN - Medial and Rostro-Lateral Vestibular Nucleii; PPRF - a Peri-Pontine Reticular Formation; TN - Tonic Neurons
      ||
    66. image p436fig12.30 The conscious ARTWORD, or cARTWORD, laminar cortical speech model simulates how future context can disambiguate noisy past speech sounds in such a way that the completed percept is consciously heard to proceed from past to future as a feature-item-list resonant wave propagates through time.
      || cARTWORD: Laminar cortical model macrocircuit (Grossberg, Kazerounian 2011) Simulates PHONEMIC RESTORATION: Cognitive Working Memory (processed item sequences) - [Excitatory-> inhibitory-> habituative-> adaptive filter-> adaptive filter-> adaptive filter with depletable synapse-> Acoustic [item, feature]
    67. image p474fig12.70 The kind of model macrocircuit that was used in (Grossberg, Stone 1986) to explain lexical decision task data.
      || inputs-> A1 <-> A2 iconic sensory features <-> A3 item and order in sensory STM <-> A4 list parsing in STM (masking field) <-> A5 semantic network (self-feedback). [A4, A5] <-> V* visual object recognition system. M1-> [outputs, A1]. M1 <-> M2 iconic motor features <-> M3 item and order in motor STM. A2-> M2. A3-> M3.
    68. image p481fig13.01 Macrocircuit of the functional stages and anatomical interpretations of the Cognitive-Emotional-Motor, or CogEM, model.
      || Drive-> hypothalamus value categories <-> amygdala incentive motivational learning-> Orbitofrontal cortex- object-value categories <-> sensory cortex- invariant object categories- conditioned reinforcer learning-> amygdala-> hypothalamus.
    69. image p520fig14.02 Macrocircuit of the main brain regions, and connections between them, that are modelled in the unified predictive Adaptive Resonance Theory (pART) of cognitive-emotional and working memory dynamics. Abbreviations in red denote brain regions used in cognitive-emotional dynamics. Those in green denote brain regions used in working memory dynamics. Black abbreviations denote brain regions that carry out visual perception, learning and recognition of visual object categories, and motion perception, spatial representation and target tracking. Arrows denote non-adaptive excitatory synapses. Hemidiscs denote adaptive excitatory synapses. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown. Also not shown are output signals from cortical areas to motor responses. V1: striate, or primary, visual cortex; V2 and V4: areas of prestriate visual cortex; MT: Middle Temporal cortex; MST: Medial Superior Temporal area; ITp: posterior InferoTemporal cortex; ITa: anterior InferoTemporal cortex; PPC: Posterior Parietal Cortex; LIP: Lateral IntraParietal area; VPA: Ventral PreArcuate gyrus; FEF: Frontal Eye Fields; PHC: ParaHippocampal Cortex; DLPFC: DorsoLateral PreFrontal Cortex; HIPPO: hippocampus; LH: Lateral Hypothalamus; BG: Basal Ganglia; AMYG: AMYGdala; OFC: OrbitoFrontal Cortex; PRC: PeriRhinal Cortex; VPS: Ventral bank of the Principal Sulcus; VLPFC: VentroLateral PreFrontal Cortex. See the text for further details.
      ||
    70. image p532fig14.08 Macrocircuit of the ARTSCENE Search neural model for learning to search for desired objects by using the sequences of already experienced objects and their locations to predict what and where the desired object is. V1 = First visual area or primary visual cortex; V2 = Second visual area; V4 = Fourth visual area; PPC = Posterior Parietal Cortex; ITp = posterior InferoTemporal cortex; ITa = anterior InferoTemporal cortex; MTL = Medial Temporal Lobe; PHC = ParaHippoCampal cortex; PRC = PeriRhinal Cortex; PFC = PreFrontal Cortex; DLPFC = DorsoLateral PreFrontal Cortex; VPFC = Ventral PFC; SC = Superior Colliculus.
      ||
    71. image p580fig16.05 Macrocircuit of the GridPlaceMap model, which can learn both 2D grid cells and place cells in response to realistic trajectories of navigating rats using a hierarchy of SOMs with identical equations.
      || GridPlaceMap model: rate-based and spiking (Pilly, Grossberg 2012). Pre-wired 1D stripe cells, learns both 2D grid and place cells! Same laws for both; both select most frequent and energetic inputs. Place cells emerge gradually in response to developing grid cells. [place-> grid-> stripe] cells-> path integration-> vestibular signals
    72. image p599fig16.35 Data (a) and simulations (b,c) about anatomically overlapping grid cell modules. (a) shows the anatomical distribution of grid cells belonging to different modules in one animal. DV location (mm) vs postrhinal border. (b) shows the simulated distribution of learned grid cell spacings from two stripe cell scales. frequency (%) vs grid spacing (cm). mu = [1, 0.6]. (c) shows what happens when half the cells respond with one rate and half another rate. (d) shows the same with three rates. (e-g) show spatial maps and autocorrelograms of grid cells that arise from the different rates in (d). [rate map, autocorrelogram] vs [score [1.07, 0.5, 0.67], spacing (cm) [23.58, 41, 63.64]].
      ||
    73. image p612fig16.42 Macrocircuit of the main SOVEREIGN subsystems.
      || [reward input, drive input, drive representation (DR), visual working memory and planning system (VWMPS), visual form and motion system (VFMS), motor approach and orienting system (MAOS), visual input (VisIn), motor working memory and planning system (MWMPS), motor approach and orienting system (MAOS), motor plant (MotP), Proprioceptive Input (PropIn), Vestibular Input (VesIn), Environmental feedback (EnvFB). DR [incentive motivational learning-> [VWMPS, MWMPS], -> VFMS, -> MAOS], VWMPS [conditioned reinforcer learning-> DR, MAOS], VFMS [visual object categories-> VWMPS, reactive movement commands-> MAOS], MWMPS [conditioned reinforcer learning-> DR, planned movement commands-> MAOS], MAOS [motor map positions-> MWMPS, motor outflow-> MotP], VisIn-> VFMS, VesIn-> MAOS, EnvFB-> [VisIn, MotP, VesIn].
    74. p190 Howell: [neural microcircuits, modal architectures] used in ART -
      bottom-up filters | top-down expectations | purpose
      instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star learning is often used to learn the adaptive weights; top-down expectations select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features
      LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
      EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
      auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of the F1 STM pattern; F2 is reset if degree of match < vigilance
      auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: backwards in time - how does a future sound let a past sound continue through noise? Resonance!
      visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown.
      red - cognitive-emotional dynamics
      green - working memory dynamics
      black - see [bottom-up, top-down] lists
      EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
      Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap, between the Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

      background colours in the table signify :
      white - general microcircuit : a possible component of ART architecture
      lime green - sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
      light blue - post-perceptual cognition?
      pink - "the feeling of what happens" and knowing what event caused that feeling
    75. image p025fig01.16 (left panel) The main processing stages of the Cognitive-Emotional-Motor (CogEM) model have anatomical interpretations in terms of sensory cortex, amygdala, and prefrontal cortex. Chapter 13 will describe in greater detail how CS cues activate invariant object categories in the sensory cortex, value categories in the amygdala, and object-value categories in the prefrontal cortex, notably the orbitofrontal cortex. The amygdala is also modulated by internal drive inputs like hunger and satiety. (right panel) Anatomical data support this circuit, as do many neurophysiological data.
      || drive -> amygdala -> prefrontal cortex <-> sensory cortex -> amygdala. [visual, somatosensory, auditory, gustatory, olfactory] cortex -> [amygdala, Orbital Prefrontal Cortex]. amygdala -> Lateral Prefrontal Cortex
    76. image p481fig13.01 Macrocircuit of the functional stages and anatomical interpretations of the Cognitive-Emotional-Motor, or CogEM, model.
      || Drive-> hypothalamus value categories <-> amygdala incentive motivational learning-> Orbitofrontal cortex- object-value categories <-> sensory cortex- invariant object categories- conditioned reinforcer learning-> amygdala-> hypothalamus.
    77. image p483fig13.02 The object-value categories in the orbitofrontal cortex require converging specific inputs from the sensory cortex and nonspecific incentive motivational inputs from the amygdala in order to fire. When the orbitofrontal cortex fires, it can deliver top-down ART Matching Rule priming signals to the sensory cortical area by which it was activated, thereby helping to choose the active recognition categories there that have the most emotional support, while suppressing others, leading to attentional blocking of irrelevant cues.
      || Cognitive-Emotional-Motor (CogEM) model. Drive-> amygdala incentive motivational learning-> orbitofrontal cortex- need converging cue and incentive inputs to fire <-> sensory cortex- conditioned reinforcer learning-> amygdala. CS-> sensory cortex. Motivated attention closes the cognitive-emotional feedback loop, focuses on relevant cues, and causes blocking of irrelevant cues.
    78. image p483fig13.03 The predicted processing stages of CogEM have been supported by anatomical studies of connections between sensory cortices, amygdala, and orbitofrontal cortex.
      || Adapted from (Barbas 1995). sensory cortices = [visual, somatosensory, auditory, gustatory, olfactory]. sensory cortices-> amygdala-> orbital prefrontal cortex. sensory cortices-> orbital prefrontal cortex. [visual cortex, amygdala]-> lateral prefrontal cortex.
    79. image p484fig13.04 The top-down feedback from the orbitofrontal cortex closes a feedback loop that supports a cognitive-emotional resonance. If this resonance can be sustained long enough, it enables us to have feelings at the same time that we experience the categories that caused them.
      || Cognitive-Emotional resonance. Basis of "core consciousness" and "the feeling of what happens". (Damasio 1999) derives heuristic version of CogEM model from his clinical data. Drive-> amygdala-> prefrontal cortex-> sensory cortex, resonance around the latter 3. How is this resonance maintained long enough to become conscious?
    80. image p487fig13.11 The three main properties of CogEM that help to explain how attentional blocking occurs.
      || CogEM explanation of attentional blocking. Internal drive input <-> Conditioned reinforcer learning (self-recurrent) <-> Competition for STM <- Motor learning. 1. Sensory representations compete for limited capacity STM. 2. Previously reinforced cues amplify their STM via positive feedback. 3. Other cues lose STM via competition.
    81. image p489fig13.13 (top row) If a positive ISI separates onset of a CS and US, then the CS can sample the consequences of the US during the time interval before it is inhibited by it. (bottom row) A CogEM simulation of the inverted-U in conditioning as a function of the ISI between CS and US.
      || Positive ISI and conditioning.
    82. image p490fig13.15 The CogEM circuit is an ancient design that is found even in mollusks like Aplysia. See the text for details.
      || Aplysia (Buonomano, Baxter, Byrne, Neural Networks 1990; Grossberg, Behavioral and Brain Sciences 1983). Facilitator neuron ~ drive representation.
    83. image p494fig13.19 (left column, top row) Secondary conditioning of both arousal and a specific response are now possible. (bottom row) The CogEM circuit may be naturally extended to include multiple drive representations and inputs. (right column, top row) The incentive motivational pathway is also conditionable in order to enable motivational sets to be learned.
      || Secondary conditioning. Homology: conditionable incentive motivation. Multiple drive representations and inputs.
    84. image p514fig13.44 Analog of the CogEM model in Figure 6.1 of (Damasio 1999).
      || (a) map of object X-> map of proto-self at inaugural instant-> [, map of proto-self modified]-> assembly of second-order map. (b) map of object X enhanced-> second-order map imaged.
    85. image p523fig14.03 (a) The MOTIVATOR neural model generalizes CogEM by also including the basal ganglia. It can hereby explain and simulate complementary functions of the amygdala and basal ganglia (SNc) during conditioning and learned performance. The basal ganglia generate Now Print signals in response to unexpected rewards. These signals modulate learning of new associations in many brain regions. The amygdala supports motivated attention to trigger actions that are expected to occur in response to conditioned or unconditioned stimuli. Object Categories represent visual or gustatory inputs in anterior inferotemporal (ITA) and rhinal (RHIN) cortices, respectively. Value Categories represent the value of anticipated outcomes on the basis of hunger and satiety inputs, in amygdala (AMYG) and lateral hypothalamus (LH). Object-Value Categories resolve the value of competing perceptual stimuli in medial (MORB) and lateral (ORB) orbitofrontal cortex. The Reward Expectation Filter detects the omission or delivery of rewards using a circuit that spans ventral striatum (VS), ventral pallidum (VP), striosomal delay (SD) cells in the ventral striatum, the pedunculopontine nucleus (PPTN) and midbrain dopaminergic neurons of the substantia nigra pars compacta/ventral tegmental area (SNc/VTA). The circuit that processes CS-related visual information (ITA, AMYG, ORB) operates in parallel with a circuit that processes US-related visual and gustatory information (RHIN, AMYG, MORB). (b) Reciprocal adaptive connections between hypothalamus and amygdala enable amygdala cells to become learned value categories. The bottom region represents hypothalamic cells, which receive converging taste and metabolite inputs whereby they become taste-drive cells. Bottom-up signals from activity patterns across these cells activate competing value categories, or US Value Representations, in the amygdala. A winning value category learns to respond selectively to specific combinations of taste-drive activity patterns and sends adaptive top-down priming signals back to the taste-drive cells that activated it. CS-activated conditioned reinforcer signals are also associatively linked to value categories. Adaptive connections end in (approximately) hemidiscs. See the text for details.
      ||
    86. image p548fig15.16 Homologous recognition learning and reinforcement learning macrocircuits enable adaptively timed conditioning in the reinforcement learning circuit to increase inhibition of the orienting system at times when a mismatch in the recognition system would have reduced inhibition of it.
      || Homolog between ART and CogEM model, complementary systems. [Recognition, Reinforcement] learning vs [Attentional, Orienting] system. Reinforcement: timing, drive representation.
    87. image p080fig02.33 An opposite-attracts rule during the development of intracellular connections can lead to a mature network that realizes informational noise suppression.
      || How do noise suppression parameters arise? Symmetry-breaking during morphogenesis? Opposites attract rule.
      Intracellular parameters C/B = 1/(n - 1) Intercellular parameters
      Predicts that:
      • Intracellular excitatory and inhibitory saturation points can control the growth during development of :
      • Intercellular excitatory and inhibitory connections.
    88. image p012fig01.08 A sigmoidal signal function is a hybrid signal that combines the best properties of [faster, same, slower]-than linear signals. It can suppress noise and store a partially contrast-enhanced activity pattern. slower-than-linear saturates pattern; approximately linear- preserves pattern and normalizes; faster-than-linear- noise suppression and contrast-enhancement.
      || Sigmoidal signal: a hybrid. (upper) saturates pattern- slower-than-linear; (middle) preserves pattern and normalizes- approximately linear. (lower) noise suppression and contrast enhancement- faster-than-linear.
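      A minimal numeric sketch (Python; illustrative only, not code from the book) of the hybrid behaviour described above, using the common quadratic sigmoid f(w) = w^2/(K^2 + w^2). The ratio f(w)/w rises for small w (faster-than-linear: noise suppression), is locally flat near w = K (the approximately linear range that preserves the pattern), and falls for large w (slower-than-linear: saturation). K and the sample values are arbitrary choices.

      # sigmoid signal f(w) = w^2 / (K^2 + w^2): a hybrid signal function
      def sigmoid_signal(w, K=1.0):
          return w * w / (K * K + w * w)

      # f(w)/w rises, levels off near w = K, then falls: the three regimes
      for w in (0.05, 0.2, 0.5, 1.0, 2.0, 5.0, 20.0):
          f = sigmoid_signal(w)
          print(f"w={w:6.2f}  f(w)={f:.3f}  f(w)/w={f / w:.3f}")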
    89. image p078fig02.30 Choosing the adaptation level to achieve informational noise suppression.
      || Noise suppression. Attenuate Zero Spatial frequency patterns: no information. Ii vs i (flat line), xi vs i (flat line at zero)
      B >> C: Try B = (n - 1)*C or C/(B + C) = 1/n
      Choose a uniform input pattern (no distinctive features): All θi = 1/n
      xi = (B + C)*I/(A + I)*[θi -C/(B + C)] = 0 no matter how intense I is.
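      A quick check of this equilibrium (Python sketch; A and C are arbitrary, and B is set to (n - 1)*C as above): with the adaptation level C/(B + C) = 1/n, a uniform input pattern is driven to zero at any intensity, while a patterned input keeps its distinctive features.

      # noise suppression at equilibrium: xi = (B + C)*I/(A + I)*(theta_i - C/(B + C))
      def equilibrium(inputs, A=1.0, C=1.0):
          n = len(inputs)
          B = (n - 1) * C                      # the noise-suppression choice above
          I = sum(inputs)
          return [(B + C) * I / (A + I) * (Ii / I - C / (B + C)) for Ii in inputs]

      print(equilibrium([10.0] * 5))                 # uniform: all zero
      print(equilibrium([1000.0] * 5))               # still zero, no matter how intense I is
      print(equilibrium([1.0, 2.0, 5.0, 2.0, 1.0]))  # distinctive features survive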
    90. image p078fig02.31 How noise suppression enables matching of bottom-up and top-down input patterns.
      || Noise suppression -> pattern matching. mismatch (out of phase) suppressed, match (in phase) amplifies pattern.
    91. image p080fig02.33 An opposite-attracts rule during the development of intracellular connections can lead to a mature network that realizes informational noise suppression.
      || How do noise suppression parameters arise? Symmetry-breaking during morphogenesis? Opposites attract rule.
      Intracellular parameters C/B = 1/(n - 1) Intercellular parameters
      Predicts that:
      • Intracellular excitatory and inhibitory saturation points can control the growth during development of :
      • Intercellular excitatory and inhibitory connections.
    92. image p080fig02.34 How to achieve informational noise suppression in a network with multiple parallel processing channels.
      || Symmetry-breaking: dynamics and anatomy.
      Dynamics:
      • excitatory range is amplified
      • inhibitory range is compressed
      Anatomy:
      • narrow on-center
      • broad off-surround
      Noise suppression: attenuates uniform patterns
      Contour direction: enhances pattern gradients
    93. image p081fig02.36 Informational noise suppression in networks with Gaussian on-center and off-surround kernels enables them to function as contour detectors that are sensitive to ratio-contrast.
      || Noise suppression and contour detection.
      If B*sum[k=1 to n: Cki] <= D*sum[k=1 to n: Eki] then:
      • uniform patterns are suppressed
      • contrasts are selectively enhanced
      • contours are detected
      Ii vs i, xi vs i
      Responses are selective to [REFLECTANCE, SPATIAL SCALE], eg color [feature, surface] contours.
    94. image p510fig13.39 Shunting competition and informational noise suppression in affective gated dipoles, plus back-propagating action potentials for teaching signals, enable the net normalized adaptive weights to be learned. They never saturate!
      || Learn net dipole output pattern. Opponent "decision" controls learning. Cf. competitive learning. Learning signal, opponent extinction.
    95. image p009fig01.06 Primacy gradient of activity stored in working memory within a recurrent shunting on-center off-surround network. Rehearsal is controlled by a nonspecific rehearsal wave and self-inhibitory feedback of the item that is currently being rehearsed. Green = excitatory, red = inhibitory
      || inputs? -> item and order WM storage -> competitive selection-> rehearsal wave -> outputs
    96. image p024fig01.15 A REcurrent Associative Dipole, or READ, circuit is a recurrent shunting on-center off-surround network with habituative transmitter gates. Sensory cues sample it with LTM traces and thereby become conditioned reinforcers.
      ||
    97. image p038fig01.25 The ART Matching Rule stabilizes real time learning using a [top-down, modulatory on-center, off-surround] network. Object attention is realized by such a network. See text for additional discussion.
      || ART Matching Rule [volition, categories, features]. [one, two] against one.
    98. image p073fig02.22 An on-center off-surround network is capable of computing input ratios.
      || Computing with patterns.
      How to compute the pattern-sensitive variable: θi = Ii / sum[k=1 to n: Ik]?
      Needs interactions! What type? θi = Ii / sum[k ≠ i: Ik]
      Ii↑ ⇒ θi↑ excitation, Ik↑ ⇒ θk↓, k ≠ i inhibition
      On-center off-surround network.
    99. image p074fig02.23 The equations for a shunting on-center off-surround network. Shunting terms lead to many beautiful and important properties of these networks, which are found ubiquitously, in one form or another, in all cellular tissues.
      || Shunting on-center off-surround network.
      Mass action: d[dt: xi] = -A*xi +(B - xi)*Ii -xi*sum[k≠i: Ik]
      Turn on unexcited sites; turn off excited sites
      At equilibrium:
      0 = d[dt: xi] = -(A + Ii + sum[k≠i: Ik])*xi + B*Ii = -(A + I)*xi + B*Ii
      xi = B*Ii/(A + I) = B*θi*I/(A + I) = θi*B*I/(A + I). No saturation!
      Infinite dynamical range
      Automatic gain control
      Compute ratio scale
      Weber law
      x = sum[k=1 to n: xk] = B*I/(A + I) ≤ B Conserve total activity
      NORMALIZATION
      Limited capacity
      Real-time probability
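      A small simulation sketch (Python; parameter values are arbitrary) of the mass-action equation above: Euler integration converges to the closed-form equilibrium xi = B*θi*I/(A + I), which preserves the input ratios and keeps total activity below B (normalization, limited capacity).

      # feedforward shunting on-center off-surround:
      # dx_i/dt = -A*x_i + (B - x_i)*I_i - x_i*sum(k != i) I_k
      A, B = 1.0, 1.0
      inputs = [2.0, 6.0, 12.0]
      I = sum(inputs)

      x = [0.0] * len(inputs)
      dt = 0.01
      for _ in range(5000):                    # integrate to equilibrium
          x = [xi + dt * (-A * xi + (B - xi) * Ii - xi * (I - Ii))
               for xi, Ii in zip(x, inputs)]

      analytic = [B * Ii / (A + I) for Ii in inputs]   # = B*theta_i*I/(A + I)
      print([round(v, 4) for v in x])          # simulated equilibrium
      print([round(v, 4) for v in analytic])   # matches the closed form
      print(round(sum(x), 4), "<= B =", B)     # total activity is normalized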
    100. image p075fig02.24 The membrane equations of neurophysiology describe how cell voltages change in response to excitatory, inhibitory, and passive input channels. Each channel is described by a potential difference multiplied by a conductance. With the special choices shown in the lower right-hand corner, this equation defines a feedforward shunting on-center off-surround network.
      || Membrane equations of neurophysiology.
      C*dp[dt] = (V(+) - V)*g(+) +(V(-) - V)*g(-) +(V(p) - V)*g(p)
      Shunting equation (not additive)
      V Voltage
      V(+), V(-), V(p) Saturating voltages
      g(+), g(-), g(p) Conductances
      V(+) = B, C = 1; V(-) = V(p) = 0; g(+) = Ii; g(-) = sum[k≠i: Ik];
      lower bound of V: V(-) = V(p) (silent inhibition); upper bound of V: V(+). (Howell: see p068fig02.14 Grossberg's comment that the Hodgkin&Huxley model was a "... Precursor of Shunting network model (Rall 1962) ...").
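      A consistency check (Python sketch; the numerical values are arbitrary, and reading the passive conductance g(p) as the decay rate A is my gloss on the figure): with V(+) = B, C = 1, V(-) = V(p) = 0, g(+) = Ii, and g(-) = the off-surround total, the membrane equation reduces term by term to the feedforward shunting equation of Figure 2.23.

      import random

      # membrane equation: C*dV/dt = (V+ - V)*g+ + (V- - V)*g- + (Vp - V)*gp
      def membrane_rhs(V, g_exc, g_inh, g_passive, V_plus=1.0, V_minus=0.0, V_p=0.0):
          return (V_plus - V) * g_exc + (V_minus - V) * g_inh + (V_p - V) * g_passive

      # shunting equation: dx/dt = -A*x + (B - x)*I_i - x*sum(k != i) I_k
      def shunting_rhs(x, Ii, J_offsurround, A=0.5, B=1.0):
          return -A * x + (B - x) * Ii - x * J_offsurround

      for _ in range(3):                       # the two right-hand sides agree
          V, Ii, J = random.uniform(0, 1), random.uniform(0, 5), random.uniform(0, 5)
          print(round(membrane_rhs(V, Ii, J, 0.5), 6), round(shunting_rhs(V, Ii, J), 6))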
    101. image p076fig02.25 An on-center off-surround network can respond to increasing on-center excitatory inputs without a loss of sensitivity. Instead, as the off-surround input increases, the region of a cell's maximal sensitivity to an increasing on-center input shifts to a range of larger inputs. This is because the off-surround divides the effect of the on-center input, an effect that is often called a Weber law.
      || Weber law, adaptation, and shift property (Grossberg 1963).
      Convert to logarithmic coordinates:
      K = ln(Ii), Ii = e^K, J = sum[k≠i: Ik]
      xi(K,J) = B*Ii/(A + Ii + J) = B*e^K/(A + e^K + J)
      x(K + S, J1) = x(K, J2), S = ln((A + J1)/(A + J2)) size of SHIFT.
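      A numerical check of the shift property (Python sketch; A, B, J1, J2 are arbitrary): in log-input coordinates K = ln(Ii), the response curve on background J1 is the curve on background J2 shifted by S = ln((A + J1)/(A + J2)), with no compression of sensitivity.

      import math

      def x_response(K, J, A=1.0, B=1.0):      # xi in log-input coordinates
          return B * math.exp(K) / (A + math.exp(K) + J)

      A, J1, J2 = 1.0, 5.0, 0.5
      S = math.log((A + J1) / (A + J2))        # predicted size of the shift
      for K in (-2.0, -1.0, 0.0, 1.0, 2.0):
          print(round(x_response(K + S, J1, A), 6), round(x_response(K, J2, A), 6))  # equal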
    102. image p076fig02.26 The mudpuppy retina exhibits the shift property that occurs in the feedforward shunting on-center off-surround network in Figure 2.25. As a result, its sensitivity also shifts in response to different background off-surrounds, and therefore exhibits no compression (dashed purple lines).
      || Mudpuppy retina neurophysiology.
      I center, J background
      a) Relative figure-to-ground
      b) Weber-Fechner I*(A + J)^(-1)
      c) No hyperpolarization, SHUNT: Silent inhibition
      d) Shift property (Werblin 1970) xi(K,J) vs K = ln(I)
      Adaptation- sensitivity shifts for different backgrounds. NO COMPRESSION.
    103. image p077fig02.27 A schematic of the on-center off-surround network that occurs in the mudpuppy retina, including three main cell types: receptors, horizontal cells, and bipolar cells.
      || Mechanism: cooperative-competitive dynamics.
      On-center off-surround (Kuffler 1953) cat retina
      Subtractive lateral inhibition (Hartline, Ratliff 1956/7+) limulus retina.
      R receptor -> H horizontal -> B bipolar (Werblin, Dowling, etal 1969+) mudpuppy retina.
    104. image p080fig02.34 How to achieve informational noise suppression in a network with multiple parallel processing channels.
      || Symmetry-breaking: dynamics and anatomy.
      Dynamics:
      • excitatory range is amplified
      • inhibitory range is compressed
      Anatomy:
      • narrow on-center
      • broad off-surround
      Noise suppression: attenuates uniform patterns
      Contour direction: enhances pattern gradients
    105. image p081fig02.35 The equilibrium activities of a shunting network with Gaussian on-center off-surround kernels are sensitive to the ratio-contrasts of the input patterns that they process. The terms in the denominator of the equilibrium activities accomplish this using the shunting on-center and off-surround terms.
      || Ratio-contrast detector. flat versus [Gaussian Cki, flattened Gaussian? Eki]
      d[dt: xi] = -A*xi +(B - xi)*sum[k=1 to n: Ik*Cki] -(xi + D)*sum[k=1 to n: Ik*Eki]
      Cki = C*e^(-μ*(k - i)^2), Eki = E*e^(-ν*(k - i)^2)
      At equilibrium: xi = I*sum[k=1 to n: θk*Fki] / (A + I*sum[k=1 to n: θk*Gki])
      Fki = B*Cki -D*Eki (weighted D.O.G., difference of Gaussians)
      Gki = Cki +Eki (S.O.G., sum of Gaussians)
      • Reflectance processing
      • Contrast normalization
      • Discount illuminant
    106. image p081fig02.36 Informational noise suppression in networks with Gaussian on-center and off-surround kernels enables them to function as contour detectors that are sensitive to ratio-contrast.
      || Noise suppression and contour detection.
      If B*sum[k=1 to n: Cki] <= D*sum[k=1 to n: Eki] then:
      • uniform patterns are suppressed
      • contrasts are selectively enhanced
      • contours are detected
      Ii vs i, xi vs i
      Responses are selective to [REFLECTANCE, SPATIAL SCALE], eg color [feature, surface] contours.
    107. image p106fig03.24 In response to the Synthetic Aperture Radar image (upper left corner), a shunting on-center off-surround network "discounts the illuminant" and thereby normalizes cell activities to compute feature contours, without causing saturation (upper right corner). Multiple-scale boundaries form in response to spatially coherent activities in the feature contours (lower left corner) and create the webs, or containers, into which the feature contours fill-in the final surface representations (lower right corner).
      || Do these ideas work on hard problems? SAR!
      input image -> feature contours -> boundary contours -> filled-in surface
      Synthetic Aperture Radar: sees through weather; 5 orders of magnitude of power in the radar return. Discounting the illuminant:
      • normalizes the image: preserves RELATIVE activities without SATURATION
      • shows individual PIXELS
      boundaries complete between regions where normalized feature contrasts change; filling-in averages brightnesses within boundary compartments
    108. image p176fig04.53 The on-center off-surround network within position and across depth helps to explain why brighter Kanizsa squares look closer.
      || inhibition vs. depth. p176c1h0.25 "... to qualitatively understand how this example of proximity-luminance covariance works. It follows directly from the boundary pruning by surface contour feedback signals (Figure 4.51) that achieves complementary consistency and initiates figure-ground perception. ...". p176c1h0.45 "... these inhibitory signals are part of an off-surround network whose strength decreases as the depth difference increases between the surface that generates the signal and its recipient boundaries. ...". p176c1h0.8 "... Within FACADE theory, the perceived depth of a surface is controlled by the boundaries that act as its filling-in generators and barriers (Figure 3.22), since these boundaries select the depth-selective FIDOs within which filling-in can occur, and thereby achieve surface capture. These boundaries, in turn, are themselves strengthened after surface-to-boundary contour feedback eliminates redundant boundaries that cannot support successful filling-in (Figure 4.51). These surface contour feedback signals have precisely the properties that are needed to explain why brighter Kanizsa squares look closer! ..."
    109. image p192fig05.05 ON and OFF cells in the LGN respond differently to the sides and ends of lines.
      || [ON, OFF]-center, [OFF, ON]-surround (respectively). OFF-center cells maximum response at line end (interior), ON-center cells maximum response along sides (exterior)
    110. image p253fig06.02 After bottom-up surface inputs activate spatial attentional cells, they send top-down topographic excitatory signals back to the surface representations. This recurrent shunting on-center off-surround network contrast enhances larger attentional activities while approximately normalizing the total spatial attentional activity. A surface-shroud resonance hereby forms that selects an attentional shroud, enhances the perceived contrast of the attended surface (light blue region), and maintains spatial attention on it.
      || Surface-shroud resonance. perceptual surfaces -> competition -> spatial attention. (Carrasco, Penpeci-Talgar, and Eckstein 2000, Reynolds and Desimone 2003)
    111. image p300fig08.12 A single flash activates a Gaussian receptive field across space whose maximum is chosen by a winner-take-all recurrent on-center off-surround network.
      || Gaussian receptive fields are sufficient! (Grossberg, Rudd 1992). Single flash. Suppose that a single flash causes a narrow peak of activity at the position where it occurs. It generates output signals through a Gaussian filter that produces a Gaussian activity profile at the next processing stage. A recurrent on-center off-surround network chooses the maximum activity and suppresses smaller activities. Winner-take-all
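      A toy sketch of the two steps in this caption (Python; the Gaussian width and array size are arbitrary, and the winner-take-all choice is reduced to an argmax rather than a recurrent network): a single flash is blurred by a Gaussian receptive field, and choosing the maximum recovers the flash position.

      import math

      def gaussian_filter(inputs, sigma=2.0):
          # blur a discrete activity pattern with a Gaussian receptive field
          n = len(inputs)
          return [sum(inputs[k] * math.exp(-((k - i) ** 2) / (2 * sigma ** 2))
                      for k in range(n)) for i in range(n)]

      flash = [0.0] * 21
      flash[7] = 1.0                           # a single flash at position 7
      profile = gaussian_filter(flash)         # Gaussian activity profile
      winner = max(range(len(profile)), key=lambda i: profile[i])
      print(winner)                            # winner-take-all choice -> 7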
    112. image p315fig08.35 The output signals from the directional grouping network obey the ART Matching Rule. They thereby select consistent motion directional signals while suppressing inconsistent ones, and do not distort what the spared cells code. The aperture problem is hereby solved by the same mechanism that dynamically stabilizes the learning of directional grouping cells.
      || How to select correct direction and preserve speed estimates? Prediction: Feedback from MSTv to MT- obeys ART Matching Rule; Top-down, modulatory on-center, off-surround network (Grossberg 1976, 1980; Carpenter, Grossberg 1987, 1991); Explains how directional grouping network can stably develop and how top-down directional attention can work. (Cavanagh 1992; Goner etal 1986; Sekuler, Ball 1977; Stelmach etal 1994). Directional grouping network (MSTv) <-> Directional long-range filter (MT). Modulatory on-center selects chosen direction and preserves speed. Off-surround inhibits incompatible directions.
    113. image p340fig09.07 Log polar remapping from the retina to cortical area V1 and beyond converts expansion, translation, and spiral flows on the retina into parallel flows, with different orientations, on the cortical map.
      || Log polar remapping of optic flow. retina -> cortex. Any combination of expansion and circular motion centered on the fovea maps to cortex as a single direction. Retinal Cartesian coordinates (x,y) map to cortical log-polar coordinates (log r, theta). This makes it easy to compute directional receptive fields in the cortex!
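      A coordinate-only sketch of the log polar remapping (Python; it ignores cortical magnification details): retinal points (x, y) map to cortical coordinates (log r, theta), so an expansion flow centered on the fovea becomes a uniform translation along the log r axis, and a rotation becomes a uniform translation along the theta axis.

      import math

      def retina_to_cortex(x, y):
          r = math.hypot(x, y)
          return math.log(r), math.atan2(y, x)          # (log r, theta)

      # expansion about the fovea, (x, y) -> (2x, 2y), shifts log r by log(2) at every point
      for (x, y) in [(1.0, 0.5), (0.2, 3.0), (4.0, 4.0)]:
          u1, v1 = retina_to_cortex(x, y)
          u2, v2 = retina_to_cortex(2.0 * x, 2.0 * y)
          print(round(u2 - u1, 6), round(v2 - v1, 6))   # log(2) = 0.693147, theta unchanged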
    114. image p345fig09.15 Double opponent directional receptive fields in MT are capable of detecting the motion of objects relative to each other and their backgrounds.
      || Motion opponency in MT (Born, Tootell 1992). Motion opponent (Grossberg etal), Differential motion (Royden etal), Subtractive motion cells (Neumann etal). ON center directionally selective: [excit, inhibit]ed by motion in [one, opponent] direction. OFF surround directionally selective: [excit, inhibit]ed by motion in [opponent, center] direction.
    115. image p354fig10.01 The laminar cortical circuit that realizes how we pay attention to an object sends signals from layer 6 of a higher cortical level to layer 6 of a lower cortical level and then back up to layer 4. This "folded feedback" circuit realizes a top-down, modulatory on-center, off-surround circuit that realizes the ART Matching Rule.
      || Top-down attention and folded feedback. Attentional signals also feed back into 6-to-4 on-center off-surround. 1-to-5-to-6 feedback path: Macaque (Lund, Booth 1975) cat (Gilbert, Wiesel 1979). V2-to-V1 feedback is on-center off-surround and affects layer 6 of V1 the most (Bullier etal 1996; Sandell, Schiller 1982). Attended stimuli enhanced, ignored stimuli suppressed. This circuit supports the predicted ART Matching Rule! [LGN, V[1,2][6->1]]
    116. image p359fig10.06 Another, albeit indirect, pathway from LGN exists that can also excite layer 4 of V1. Why are these two pathways not redundant? The answer, ultimately, has to do with how the cortex learns, as well as with how it pays attention. See the text for details.
      || Another bottom-up input to layer 4: Why?? Layer 6-to-4 on-center off-surround (Grieve, Sillito 1991, 1995; Ahmed etal 1994, 1997). LGN projects to layers 6 and 4. Layer 6 excites spiny stellates in column above it. Medium range connections onto inhibitory neurons. 6-to-4 path acts as an on-center off-surround.
    117. image p359fig10.07 The two bottom-up pathways from LGN to layer 4 of V1 can together activate layer 4 and contrast-normalize layer 4 responses.
      || Bottom-up contrast normalization (Grossberg 1968, 1973; Sperling, Sondhi 1968; Heeger 1992; Douglas etal 1995; Shapley etal 2004). Together, direct LGN-to-4 path and 6-to-4 on-center off-surround provide contrast normalization if cells obey shunting or membrane equation dynamics.
    118. image p360fig10.08 The bottom-up on-center off-surround from LGN-to-6-to-4 has a modulatory on-center because of its role in realizing the ART Matching Rule and, with it, the ability of the cortex to dynamically stabilize its learned memories.
      || Modulation of priming by 6-to-4 on-center (Stratford etal 1996; Callaway 1998). On-center 6-to-4 excitation is inhibited down to being modulatory (priming, subthreshold). On-center 6-to-4 excitation cannot activate layer 4 on its own. Clarifies need for direct path. Prediction: plays key role in stable grouping, development and learning. ART Matching Rule!
    119. image p362fig10.11 Feedback between layer 2/3 to the layer 6-to-4-to-2/3 feedback loop chooses the strongest grouping in cases where there is more than one. If only one grouping exists, then the circuit can function very quickly in a feedforward manner. When multiple groupings exist, the cortex "runs as fast as it can" to select the one with the most evidence to support it using the self-normalizing inhibition in the layer 6-to-4 off-surround.
      || How is the final grouping selected? Folded feedback LGN-> 6-> 4-> 2/3. 1. Layer 2/3 groupings feed back into 6-to-4 on-center off-surround: a) direct layer 2/3 -to-6 path; b) can also go via layer 5 (Blasdel etal 1985; Kisvarday etal 1989). 2. Strongest grouping enhanced by its on-center. 3. Inputs to weaker groupings suppressed by off-surround. 4. Interlaminar feedback creates functional columns. Activities of conflicting groupings are reduced by self-normalizing inhibition, slowing processing; intracortical feedback selects and contrast-enhances the winning grouping, speeding processing.
    120. image p364fig10.13 The bottom-up adaptive filter, intracortical grouping circuit, and intercortical top-down attentional circuit all use the same competitive decision circuit between layers 6 and 4, called the attention-preattention interface, with which to select the featural patterns that will be processed.
      || Bottom-up filters and intracortical grouping feedback use the same 6-to-4 decision circuit, LGN-> Vx[6,4,2/3]. competitive decision circuit, modulatory on-center off-surround network. Top-down intercortical attention also uses the same 6-to-4 decision circuit!
    121. image p364fig10.14 This figure emphasizes how preattentive intracortical groupings and top-down intercortical attention share the same modulatory on-center, off-surround layer 4-to-6 decision circuit.
      || Explanation: grouping and attention share the same modulatory decision circuit. Layer 6-6-4-2/3 pathway shown; also a layer 6-1-2/3 path. intercortical attention, both act via a modulatory on-center off-surround decision circuit, intracortical feedback from groupings
    122. image p367fig10.15 Data (left column) and simulation (right column) of how attention prevents a masking stimulus from inhibiting the response to the on-center of the cell from which the recording was made.
      || Attention protects target from masking stimulus (Reynolds etal 1999; Grossberg, Raizada 2000).
    123. image p375fig11.07 The 3D LAMINART model uses both monocular and binocular simple cells to binocularly fuse like image contrasts. The remainder of the model generates 3D boundary and surface representations of multiple kinds of experiments as well as of natural scenes.
      || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A], V2 thin stripe [4->2/3A], V4]. V1 blob [V1-4 monocular, V1 interior binocular] simple cells. [complex, simple, inhibitory] cells, on-center off-surround
    124. image p376fig11.10 The 3D LAMINART model shows how the disparity filter can be integrated into the circuit that completes 3D boundary representations using bipole grouping cells. It also explains how surface contours can strengthen boundaries that succeed in generating closed filling-in domains.
      || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A] surface contour, V2 thin stripe (monocular surface) [4->2/3A], V2 interior [disynaptic inhibitory interneurons, bipole grouping cells, disparity filter, V4 binocular surface]. [complex, simple, inhibitory] cells, on-center off-surround
    125. image p423fig12.20 The Spatial Pitch Network, or SPINET, model shows how a log polar spatial representation of the sound frequency spectrum can be derived from auditory signals occurring in time. The spatial representation allows the ARTSTREAM model to compute spatially distinct auditory streams.
      || SPINET model (Spatial Pitch Network) (Cohen, Grossberg, Wyse 1995). 1. input sound 2. Gamma-tone filter bank 3. Short-term average energy spectrum 4. MAP transfer function 5. On-center off-surround and rectification 6. Harmonic weighting 7. Harmonic summation and competition -> PITCH
    126. image p448fig12.46 A Masking Field working memory is a multiple-scale self-similar recurrent shunting on-center off-surround network. It can learn list chunks that respond selectively to lists of item chunks of variable length that are stored in an item working memory at the previous processing stage. Chunks that code for longer lists (eg MY vs MYSELF) are larger, and give rise to stronger recurrent inhibitory neurons (red arrows).
      || How to code variable length lists? MASKING FIELDS code list chunks of variable length (Cohen, Grossberg 1986, 1987; Grossberg, Kazerounian 2011, 2016; Grossberg, Meyers 2000; Grossberg, Pearson 2008). Multiple-scale self-similar WM: Masking field, adaptive filter. Variable length coding- Masking fields select list chunks that are sensitive to WM sequences of variable length; Selectivity- Larger cells selectively code longer lists; Asymmetric competition- Larger cells can inhibit smaller cells more than conversely. Magic Number 7! Temporal order- different list chunks respond to the same items in different orders eg LEFT vs FELT;.
    127. image p564fig15.35 (a) A pair of recurrent shunting on-center off-surround networks for control of the fore limbs and hind limbs. (b) Varying the GO signal to these networks can trigger changes in movement gaits. See the text for details.
      ||
    128. image p567fig15.38 (a) The Gated Pacemaker model for the control of circadian rhythms is a recurrent shunting on-center off-surround network whose excitatory feedback signals are gated by habituative transmitters. Tonic arousal signals energize the pacemaker. Diurnal (left) and nocturnal (right) pacemakers are determined by whether phasic light signals turn the pacemaker on or off. An activity-dependent fatigue signal prevents the pacemaker from becoming overly active for too long. (b) Two simulations of circadian activity cycles during different schedules of light (L) and dark (D). See the text for details.
      || sourceOn-> on-cells (recurrent) <-(-) (-)> off-cells (recurrent) <-sourceOff. on-cells-> activity-> off-cells. off-cells-> fatigue. Diurnal: sourceOn=[light, arousal]; sourceOff=arousal;. Nocturnal: sourceOn=arousal; sourceOff=[arousal, light];.
    129. image p586fig16.16 In the place cell learning model of (Gorchetnikov, Grossberg 2007), three populations of five cells each of entorhinal grid cells (only two are shown) with different spatial periods input to the model's dentate gyrus. The grid cells are one-dimensional and defined algorithmically. A model dentate gyrus granule cell that receives strong projections from all three grid cell scales fires (green cell) and activates a recurrent inhibitory interneuron that inhibits other granule cells. It also generates back-propagating action potentials that trigger learning in the adaptive weights of the projections from the grid cells, thereby causing learning of place cell receptive fields.
      || Grid-to-place Self-Organizing map (Gorchetnikov, Grossberg 2007). Formation of place cell fields via grid-to-place cell learning. Least common multiple: [grid (cm), place (m)] scales: [40, 50, 60 (cm); 6m], [50, 60, 70 (cm); 21m], [41, 53, 59 (cm); 1.282 km]. Our simulations: [40, 50 (cm); 2m], [44, 52 (cm); 5.72m]. Our SOM: Spiking Hodgkin-Huxley membrane equations; Nonlinear choice by contrast-enhancing recurrent on-center off-surround net;. Choice triggers back-propagating action potentials that induce STDP-modulated learning on cell dendrites.
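      A toy illustration of the least-common-multiple point in this note (Python; 1D, with made-up rectangular 10 cm firing fields rather than the model's spiking SOM dynamics): a cell that needs coincident input from grid scales of 40, 50, and 60 cm fires in only one narrow region per 600 cm = 6 m, the least common multiple of the three spacings.

      # 1D toy: periodic "grid" fields of 40, 50, 60 cm spacing;
      # a place-like cell fires only where all three fields coincide
      def grid_active(position_cm, spacing_cm, field_width_cm=10.0):
          return (position_cm % spacing_cm) < field_width_cm

      coincident = [x for x in range(0, 1200)
                    if all(grid_active(x, s) for s in (40, 50, 60))]
      print(coincident)   # only positions near 0 cm and near 600 cm = LCM(40, 50, 60)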
    130. image p627tbl17.01 Homologs between reaction-diffusion and recurrent shunting cellular network models of development.
      || byRows: (reaction-diffusion, recurrent shunting net) (activator, excitatory activity) (inhibitor, inhibitory activity) (morphogenic source density, inputs) (firing of morphogen gradient, contrast enhancement) (maintenance of morphogen gradient, short-term memory) (power or sigmoidal signal functions, power or sigmoidal signal functions) (on-center off-surround interactions via diffusion, on-center off-surround interactions via signals) (self-stabilizing distributions of morphogens if inhibitors equilibrate rapidly, short-term memory pattern if inhibitors equilibrate rapidly) (periodic pulses if inhibitors equilibrate slowly, periodic pulses if inhibitors equilibrate slowly) (regulation, adaptation).
    131. image p016fig01.11 A sufficiently big mismatch between a bottom-up input pattern and a top-down expectation can activate the orienting system, which triggers a burst of nonspecific arousal that can reset the recognition category that read out the expectation. In this way, unexpected events can reset short-term memory and initiate a search for a category that better represents the current situation.
      || [category- top-down (TD) expectation; Bottom-up (BU) input pattern] -> Feature pattern -> BU-TD mismatch -> orienting system -> non-specific arousal -> category.
    132. image p038fig01.25 The ART Matching Rule stabilizes real time learning using a [top-down, modulatory on-center, off-surround] network. Object attention is realized by such a network. See text for additional discussion.
      || ART Matching Rule [volition, categories, features]. [one, two] against one.
    133. image p052fig02.02 Feature-category resonances enable us to rapidly learn how to recognize objects without experiencing catastrophic forgetting. Attentive matching between bottom-up feature pattern inputs and top-down expectations prevents catastrophic forgetting by focussing object attention upon expected patterns of features, while suppressing outlier features that might otherwise have caused catastrophic forgetting if they were learned also.
      || Adaptive Resonance. Attended feature clusters reactivate bottom-up pathways. Activated categories reactivate their top-down pathways. Categories STM, Feature patterns STM. Feature-Category resonance [synchronize, amplify, prolong]s system response. Resonance triggers learning in bottom-up and top-down adaptive weights: adaptive resonance!
    134. image p078fig02.31 How noise suppression enables matching of bottom-up and top-down input patterns.
      || Noise suppression -> pattern matching. mismatch (out of phase) suppressed, match (in phase) amplifies pattern.
    135. image p079fig02.32 Matching amplifies the matched pattern due to automatic gain control. See terms I and J in the equation.
      || Substrate of resonance. Match (in phase) of BU and TD input patterns AMPLIFIES matched pattern due to automatic gain control by shunting terms. J = sum[i: Ji], I = sum[i: Ii], θi = (Ii + Ji)/(I + J)
      xi = (B + C)*(I + J)/(A + I + J)*[θi -C/(B + C)]
      Need top-down expectations to be MODULATORY.
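      A quick check of the gain-control claim (Python sketch; the patterns and parameters are made up, with B = (n - 1)*C as in the noise-suppression figures): when the top-down pattern matches the bottom-up pattern (same θ), the amplitude factor grows from I/(A + I) to (I + J)/(A + I + J), so the matched pattern is amplified without changing its shape.

      def equilibrium(bottom_up, top_down, A=1.0, C=1.0):
          n = len(bottom_up)
          B = (n - 1) * C                      # noise-suppression choice
          total = [b + t for b, t in zip(bottom_up, top_down)]
          S = sum(total)                       # = I + J
          return [(B + C) * S / (A + S) * (v / S - C / (B + C)) for v in total]

      bu = [1.0, 2.0, 5.0, 2.0, 1.0]
      matched = [2.0 * v for v in bu]          # top-down pattern with the same theta
      print([round(v, 3) for v in equilibrium(bu, [0.0] * 5)])   # bottom-up alone
      print([round(v, 3) for v in equilibrium(bu, matched)])     # match amplifies the pattern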
    136. image p087fig03.01 A macrocircuit of key visual processes (in green) and the cortical areas in which they primarily occur (in red), from the retina to the Prefrontal Cortex (PFC), including both the What and Where cortical streams. The [bottom-up, horizontal, and top-down] interactions help each of these processes to overcome computationally complementary processing deficiencies that they would experience without them, and also to read out top-down expectations that help to stabilize learning while they focus attention on salient objects and positions.
      || Emerging unified theory of visual intelligence. [What, Where] streams. Bottom-up and top-down interactions overcome COMPLEMENTARY processing deficiencies.
    137. image p091fig03.04 A cross-section of the eye, and a top-down view of the retina, show how the blind spot and retinal veins can occlude the registration of light signals at their positions on the retina.
      || Eye: [optic nerve, ciliary body, iris, lens, pupil, cornea, sclera, choroid, retina]. Human retina: [fovea, blind spot, optic nerve]. see also cross-section of retinal layer.
    138. image p163fig04.39 A schematic of the LAMINART model that explains key aspects of laminar visual cortical anatomy and dynamics. LGN -> V1 [6, 4, 2/3] -> V2 [6, 4, 2/3]
      || p163c1h0.6 "... The first article about laminar computing ... proposed how the laminar cortical model could process 2D pictures using bottom-up filtering and horizontal bipole grouping interactions (Grossberg, Mingolla, Ross 1997). In 1999, I was able to extend the model to also include top-down circuits for expectation and attention (Grossberg 1999) (right panel). Such a synthesis of laminar bottom-up, horizontal, and top-down circuits is characteristic of the cerebral cortex (left panel). I called it LAMINART because it began to show how properties of Adaptive Resonance Theory, or ART, notably the ART prediction about how top-down expectations and attention work, are realized by identified cortical cells and circuits. You can immediately see from the schematic laminar circuit diagram ... (right panel) that circuits in V2 seem to repeat circuits in V1, albeit with a larger spatial scale, despite the fact that V1 and V2 carry out different functions. How this anatomical similarity can coexist with functional diversity will be clarified in subsequent sections and chapters. It enables different kinds of biological intelligence to communicate seamlessly while carrying out their different psychological functions. ..."
    139. image p192fig05.06 Bottom-up and top-down circuits between the LGN and cortical area V1. The top-down circuits obey the ART Matching Rule for matching with bottom-up input patterns and focussing attention on expected critical features.
      || Model V1-LGN circuits, version [1, 2]. retina -> LGN relay cells -> interneurons -> cortex [simple, endstopped] cells -> cortex complex cells
    140. image p193fig05.08 The patterns of LGN activation and inhibition on the sides and ends of a line without the top-down feedback (A) and with it (C). The top-down distribution of excitation (+) and inhibition (-) are shown in (B).
      ||
    141. image p199fig05.11 Instar learning enables a bottom-up adaptive filter to become selectively tuned to particular feature patterns. Such pattern learning needs adaptive weights that can either increase or decrease to match the featural activations that they filter.
      || Instar learning STM->LTM: need both increases and decreases in strength for the LTM pattern to learn the STM pattern
    142. image p200fig05.13 Instar and outstar learning are often used to learn the adaptive weights in the bottom-up filters and top-down expectations that occur in ART. The ART Matching Rule for object attention enables top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features.
      || Expectations focus attention: feature pattern (STM), Bottom-Up adaptive filter (LTM), Category (STM), competition, Top-Down expectation (LTM); ART Matching Rule: STM before top-down matching, STM after top-down matching (attention!)
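      A minimal instar/outstar sketch (Python; the learning rate, patterns, and the assumption that the category gate stays at 1 during resonance are all illustrative): instar learning makes the bottom-up weights of the active category track the feature pattern, and outstar learning makes its top-down weights learn to read out the same pattern as an expectation. Both are gated: no weight change occurs when the category is inactive.

      # gated weight update: weights track the pattern only while the gate (cell activity) is on
      def gated_update(weights, pattern, gate, lr=0.2):
          return [w + lr * gate * (p - w) for w, p in zip(weights, pattern)]

      features = [0.9, 0.1, 0.6, 0.0]
      bottom_up = [0.5] * 4          # instar weights into the winning category
      top_down  = [0.5] * 4          # outstar weights back to the feature level
      for _ in range(50):            # category active (gate = 1) during resonance
          bottom_up = gated_update(bottom_up, features, gate=1.0)
          top_down  = gated_update(top_down, features, gate=1.0)
      print([round(w, 3) for w in bottom_up])   # both converge toward the feature pattern
      print([round(w, 3) for w in top_down])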
    143. image p211fig05.20 The PN and N200 event-related potentials are computationally complementary events that are computed within the attentional and orienting systems.
      || PN and N200 are complementary waves. PN [top-down, conditionable, specific] match; N200 [bottom-up, unconditionable, nonspecific] mismatch
    144. image p214fig05.24 Learning of a top-down expectation must occur during bottom-up learning in the adaptive filter in order to be able to match the previously associated feature pattern with the one that is currently active.
      || Learning top-down expectations. When the code (green right triangle GRT) for X1 was learned at F2, GRT learned to read-out X1 at F1. [Bottom-Up, Top-Down] learning
    145. image p214fig05.25 The sequence of events whereby a novel input pattern can activate a category which, in turn, reads out its learned top-down expectation to be matched against the input pattern. Error correction thus requires the use of a Match Detector that has properties of the Processing Negativity ERP.
      || How is an error corrected? During bottom-up learning, top-down learning must also occur so that the pattern that is read out top-down can be compared with the pattern that is activated by bottom-up inputs. Match detector: Processing Negativity ERP. 1. top-down, 2. conditionable, 3. specific, 4. match
    146. image p214fig05.26 When a big enough mismatch occurs, the orienting system is activated and sends a burst of nonspecific arousal to the category level. This Mismatch Detector has properties of the N200 ERP.
      || Mismatch triggers nonspecific arousal. Mismatch at F1 elicits a nonspecific event at F2. Call this event nonspecific arousal. N200 ERP Naatanen etal: 1. bottom-up, 2. unconditionable, 3. nonspecific, 4. mismatch
    147. image p215fig05.28 How a mismatch between bottom-up and top-down input patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level.
      || Mismatch -> inhibition -> arousal -> reset. BU input orienting arousal, BU+TD mismatch arousal and reset. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
    148. image p220fig05.29 Vigilance is a gain parameter on inputs to the orienting system that regulates whether net excitation from bottom-up inputs or inhibition from activated categories will dominate the orienting system. If excitation wins, then a memory search for a better-matching category will occur. If inhibition wins, then the orienting system will remain quiet, thereby enabling resonance and learning to occur.
      || Vigilance control [resonate and learn, reset and search]. ρ is a sensitivity or gain parameter
    149. image p221fig05.30 When a predictive disconfirmation occurs, vigilance increases enough to drive a search for a more predictive category. If vigilance increases just enough to exceed the analog match between features that survive top-down matching and the entire bottom-up input pattern, then minimax learning occurs. In this case, the minimum amount of category generalization is given up to correct the predictive error.
      || Match tracking realizes minimax learning principle. Given a predictive error, vigilance increases just enough to trigger search and thus sacrifices the minimum generalization to correct the error ... and enables expert knowledge to be incrementally learned. predictive error -> vigilance increase just enough -> minimax learning
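      A schematic sketch of vigilance control and match tracking (Python, in the style of fuzzy ART with componentwise-min matching; the patterns and the 0.01 increment are made up): the match ratio |I ∧ w| / |I| is compared with the vigilance ρ, and after a predictive error ρ is raised just above the current match, forcing reset and a search for a finer category.

      def match_ratio(inputs, prototype):
          # fuzzy AND (componentwise min), normalized by the input size
          overlap = sum(min(i, w) for i, w in zip(inputs, prototype))
          return overlap / sum(inputs)

      def resonates(inputs, prototype, vigilance):
          return match_ratio(inputs, prototype) >= vigilance

      I = [0.9, 0.1, 0.8, 0.0]
      w = [0.8, 0.0, 0.9, 0.4]
      rho = 0.7
      print(round(match_ratio(I, w), 3), resonates(I, w, rho))   # resonate and learn

      # match tracking after a predictive error: raise vigilance just above the match,
      # which resets the category and triggers a search for a more predictive one
      rho = match_ratio(I, w) + 0.01
      print(resonates(I, w, rho))                                # now reset and search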
    150. image p221fig05.31 A system like Fuzzy ARTMAP can learn to associate learned categories in one ART network with learned categories in a second ART network. Because both bottom-up and top-down interactions occur in both networks, a bottom-up input pattern to the first ART network can learn to generate a top-down output pattern from the second ART network.
      || Fuzzy ARTMAP. Match tracking realizes minimax learning principle: vigilance increases to just above the match ratio of prototype / exemplar, thereby triggering search
    151. image p226fig05.35 I had shown in 1976 how a competitive learning or self-organizing map model could undergo catastrophic forgetting if the input environment was sufficiently dense and nonstationary, as illustrated by Figure 5.18. Later work with Gail Carpenter showed how, if the ART Matching Rule was shut off, repeating just four input patterns in the correct order could also cause catastrophic forgetting by causing superset recoding, as illustrated in Figure 5.36.
      || Code instability input sequences. D C A; B A; B C = ; |D|<|B|<|C|; where |E| is the number of features in the set E. Any set of input vectors that satisfy the above conditions will lead to unstable coding if they are periodically presented in the order ABCAD and the top-down ART Matching Rule is shut off.
    152. image p246fig05.48 Microcircuits of the LAMINART model that I developed with Rajeev Raizada. See the text for details of how they integrate bottom-up adaptive filtering, horizontal bipole grouping, and top-down attentional matching that satisfies the ART Matching Rule.
      ||
    153. image p252fig06.01 A surface-shroud resonance begins to form when the surface representations of objects bid for spatial attention. In addition to these topographic excitatory inputs, there is long-range inhibition of the spatial attention cells that determines which inputs will attract spatial attention.
      || Bottom-up spatial attention competition. [more, less] luminous perceptual surfaces -> competition -> spatial attention
    154. image p253fig06.02 After bottom-up surface inputs activate spatial attentional cells, they send top-down topographic excitatory signals back to the surface representations. This recurrent shunting on-center off-surround network contrast enhances larger attentional activities while approximately normalizing the total spatial attentional activity. A surface-shroud resonance hereby forms that selects an attentional shroud, enhances the perceived contrast of the attended surface (light blue region), and maintains spatial attention on it.
      || Surface-shroud resonance. perceptual surfaces -> competition -> spatial attention. (Carrasco, Penpeci-Talgar, and Eckstein 2000, Reynolds and Desimone 2003)
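      Howell: a classic recurrent shunting on-center off-surround demo in the spirit of this caption (a sketch, not the book's simulation; parameters and the quadratic signal function are illustrative). Inputs briefly set the initial activities, then the faster-than-linear recurrent feedback contrast-enhances and stores the strongest surface bid, while the shunting terms keep every activity bounded.

        import numpy as np

        def recurrent_choice(x0, A=0.1, B=1.0, dt=0.01, steps=4000):
            # Recurrent shunting on-center off-surround field (inputs already stored):
            #   dx_i/dt = -A*x_i + (B - x_i)*f(x_i) - x_i * sum_{k != i} f(x_k)
            # With a faster-than-linear signal f(x) = x**2 the field contrast-enhances
            # and stores the largest initial activity (winner-take-all choice), while
            # the shunting terms keep every activity bounded by B.
            x = np.array(x0, dtype=float)
            for _ in range(steps):
                f = x ** 2
                x = x + dt * (-A * x + (B - x) * f - x * (f.sum() - f))
            return x

        initial_bids = [0.45, 0.5, 0.3]        # surface inputs briefly stored as activities
        print(recurrent_choice(initial_bids))  # only the largest bid survives, and it is enhanced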
    155. image p258fig06.07 A top-down spotlight of attention can also be converted into a shroud. This process begins when the spotlight triggers surface filling-in within a region. Figure 6.8 shows how it is completed.
      || Reconciling spotlights and shrouds: top-down attentional spotlight becomes a shroud. spotlight of attention, surface filling-in
    156. image p286fig07.04 Illusory contours persist longer than real contours because real contours have more inducers whose rebound at contour offset can cause faster boundary reset. Illusory contours also take longer to form than real contours, which explains the increasing portion of the curve.
      || Persistence data and simulations (Meyer, Ming 1988; Reynolds 1981). Increasing portion of curve is due to formation time of the illusory contour. Longer persistence is due to fewer bottom-up inducers of an illusory contour that has the same length as a real contour: only illuminance-derived edges generate reset signals. When bottom-up inducers are inhibited by OFF cell rebounds, their offset gradually propagates to the center of the illusory contour.
    157. image p286fig07.05 This figure shows the propagation through time of illusory contour offset from the rebounded cells, which got direct inputs, to the center of the contour.
      || Persistence data and simulations. Illusory contours persist longer than real contours (Meyer, Ming 1988; Reynolds 1981). When bottom-up inducers are inhibited by OFF cell rebounds, their offset gradually propagates to the center of the illusory contour.
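      Howell: the OFF-cell rebound invoked in these persistence explanations can be illustrated with a minimal gated dipole sketch (mine; the transmitter law and all parameters are illustrative, not the book's simulations). The ON channel receives tonic plus phasic input, the OFF channel only tonic input; when the phasic input shuts off, the less-habituated OFF gate transiently wins, generating the rebound that resets the boundary.

        import numpy as np

        def gated_dipole(phasic, tonic=0.5, A=0.5, B=1.0, C=1.0, dt=0.05):
            # Each channel's signal S is gated by a habituative transmitter z obeying
            #   dz/dt = A*(B - z) - C*S*z.
            # The OFF-channel output is the half-wave rectified difference of the gated signals.
            z_on = z_off = B
            rebound = []
            for J in phasic:
                S_on, S_off = tonic + J, tonic
                z_on += dt * (A * (B - z_on) - C * S_on * z_on)
                z_off += dt * (A * (B - z_off) - C * S_off * z_off)
                rebound.append(max(0.0, S_off * z_off - S_on * z_on))  # OFF-channel rebound
            return np.array(rebound)

        J = np.array([0.5] * 200 + [0.0] * 200)   # phasic ON input, then offset
        r = gated_dipole(J)
        print(r[:200].max())   # 0.0 while the input is on
        print(r[200:].max())   # positive transient rebound just after input offset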
    158. image p315fig08.35 The output signals from the directional grouping network obey the ART Matching Rule. They thereby select consistent motion directional signals while suppressing inconsistent ones, and do not distort the speed estimates that the spared cells code. The aperture problem is hereby solved by the same mechanism that dynamically stabilizes the learning of directional grouping cells.
      || How to select correct direction and preserve speed estimates? Prediction: Feedback from MSTv to MT- obeys ART Matching Rule; Top-down, modulatory on-center, off-surround network (Grossberg 1976, 1980; Carpenter, Grossberg 1987, 1991); Explains how directional grouping network can stably develop and how top-down directional attention can work. (Cavanagh 1992; Goner etal 1986; Sekuler, Ball 1977; Stelmach etal 1994). Directional grouping network (MSTv) <-> Directional long-range filter (MT). Modulatory on-center selects chosen direction and preserves speed. Off-surround inhibits incompatible directions.
    159. image p330fig08.52 Direction fields of the object frame (left column) and of the two dot "parts" (right column) show the correct motion directions after the peak shift top-down expectation acts.
      || Simulation of motion vector decomposition. [Larger scale (nearer depth), Small scale (farther depth)] vs [Down, Up]
    160. image p331fig08.54 The simulated part directions of the rotating dot through time after the translational motion of the frame does its work via the top-down peak shift mechanism.
      || Cycloid. Motion directions of a single dot moving slowly along a cycloid curve through time.
    161. image p354fig10.01 The laminar cortical circuit that realizes how we pay attention to an object sends signals from layer 6 of a higher cortical level to layer 6 of a lower cortical level and then back up to layer 4. This "folded feedback" circuit realizes a top-down, modulatory on-center, off-surround circuit that realizes the ART Matching Rule.
      || Top-down attention and folded feedback. Attentional signals also feed back into 6-to-4 on-center off-surround. 1-to-5-to-6 feedback path: Macaque (Lund, Booth 1975) cat (Gilbert, Wiesel 1979). V2-to-V1 feedback is on-center off-surround and affects layer 6 of V1 the most (Bullier etal 1996; Sandell, Schiller 1982). Attended stimuli enhanced, ignored stimuli suppressed. This circuit supports the predicted ART Matching Rule! [LGN, V[1,2][6->1]]
    162. image p356fig10.03 Laminar computing achieves at least three basic properties of visual processing that have analogs in all biologically intelligent behaviors. These properties may be found in all cortical circuits in specialized form.
      || What does Laminar Computing achieve? 1. Self-stabilizing development and learning; 2. Seamless fusion of a) pre-attentive automatic bottom-up processing, b) attentive task-selective top-down processing; 3. Analog coherence: Solution of Binding Problem for perceptual grouping without loss of analog sensitivity. Even the earliest visual cortical stages carry out active adaptive information processing: [learn, group, attention]ing
    163. image p359fig10.05 Activation of V1 is initiated, in part, by direct excitatory signals from the LGN to layer 4 of V1.
      || How are layer 2/3 bipole cells activated? Direct bottom-up activation of layer 4. LGN -> V1 layer 4. Strong bottom-up LGN input to layer 4 (Stratford etal 1996; Chung, Ferster 1998). Many details omitted.
    164. image p359fig10.06 Another, albeit indirect, pathway from LGN exists that can also excite layer 4 of V1. Why are these two pathways not redundant? The answer, ultimately, has to do with how cortex learns, as well as with how it pays attention. See the text for details.
      || Another bottom-up input to layer 4: Why?? Layer 6-to-4 on-center off-surround (Grieve, Sillito 1991, 1995; Ahmed etal 1994, 1997). LGN projects to layers 6 and 4. Layer 6 excites spiny stellates in column above it. Medium range connections onto inhibitory neurons. 6-to-4 path acts as an on-center off-surround.
    165. image p359fig10.07 The two bottom-up pathways from LGN to layer 4 of V1 can together activate layer 4 and contrast-normalize layer 4 responses.
      || Bottom-up contrast normalization (Grossberg 1968, 1973; Sperling, Sondhi 1968; Heeger 1992; Douglas etal 1995; Shapley etal 2004). Together, direct LGN-to-4 path and 6-to-4 on-center off-surround provide contrast normalization if cells obey shunting or membrane equation dynamics.
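      Howell: the contrast-normalization claim can be written down directly. For a feedforward shunting on-center off-surround network obeying membrane-equation dynamics, the steady state is x(i) = B*I(i) / (A + sum_k I(k)), so activities report input ratios and the total stays bounded. A minimal sketch (mine; A and B are illustrative):

        import numpy as np

        def shunting_steady_state(I, A=1.0, B=1.0):
            # Steady state of dx_i/dt = -A*x_i + (B - x_i)*I_i - x_i*sum_{k != i} I_k :
            #   x_i = B * I_i / (A + sum_k I_k)
            # Activities preserve the input pattern's ratios and the total is bounded by B.
            return B * I / (A + I.sum())

        low  = np.array([1.0, 2.0, 4.0])
        high = 10 * low                       # same pattern at 10x the intensity
        print(shunting_steady_state(low))
        print(shunting_steady_state(high))    # nearly the same relative pattern: contrast-normalized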
    166. image p360fig10.08 The bottom-up on-center off-surround from LGN-to-6-to-4 has a modulatory on-center because of its role in realizing the ART Matching Rule and, with it, the ability of the cortex to dynamically stabilize its learned memories.
      || Modulation of priming by 6-to-4 on-center (Stratford etal 1996; Callaway 1998). On-center 6-to-4 excitation is inhibited down to being modulatory (priming, subthreshold). On-center 6-to-4 excitation cannot activate layer 4 on its own. Clarifies need for direct path. Prediction: plays key role in stable grouping, development and learning. ART Matching Rule!
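      Howell: a minimal sketch (mine; the gain and inhibition values are illustrative) of the modulatory on-center, off-surround matching described here: top-down priming alone is subthreshold, bottom-up input alone passes through, and their combination enhances matched features while suppressing unmatched ones - the ART Matching Rule.

        import numpy as np

        def art_match(bottom_up, top_down, gain=1.0, inhibition=0.2):
            # The top-down expectation multiplicatively primes (but cannot by itself fire)
            # matched features, while its nonspecific off-surround suppresses the rest.
            excitation = bottom_up * (1.0 + gain * top_down)   # modulatory on-center
            suppression = inhibition * top_down.sum()          # off-surround
            return np.maximum(0.0, excitation - suppression)

        BU = np.array([1.0, 1.0, 1.0, 0.0])   # bottom-up feature pattern
        TD = np.array([1.0, 1.0, 0.0, 1.0])   # top-down expectation
        print(art_match(np.zeros(4), TD))      # top-down alone: all zeros (subthreshold priming)
        print(art_match(BU, np.zeros(4)))      # bottom-up alone: passes through
        print(art_match(BU, TD))               # matched features enhanced, unmatched suppressed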
    167. image p364fig10.13 The bottom-up adaptive filter, intracortical grouping circuit, and intercortical top-down attentional circuit all use the same competitive decision circuit between layers 6 and 4, called the attention-preattention interface, with which to select the featural patterns that will be processed.
      || Bottom-up filters and intracortical grouping feedback use the same 6-to-4 decision circuit, LGN-> Vx[6,4,2/3]. competitive decision circuit, modulatory on-center off-surround network. Top-down intercortical attention also uses the same 6-to-4 decision circuit!
    168. image p364fig10.14 This figure emphasizes how preattentive intracortical groupings and top-down intercortical attention share the same modulatory on-center, off-surround layer 6-to-4 decision circuit.
      || Explanation: grouping and attention share the same modulatory decision circuit. Layer 6-6-4-2/3 pathway shown; also a layer 6-1-2/3 path. intercortical attention, both act via a modulatory on-center off-surround decision circuit, intracortical feedback from groupings
    169. image p420fig12.18 The ARTSTREAM model explains and simulates the auditory continuity illusion as an example of a spectral-pitch resonance. Interactions of ART Matching Rule and asymmetric competition mechanisms in cortical strip maps explain how the tone selects the consistent frequency from the noise in its own stream while separating the rest of the noise into another stream.
      || ARTSTREAM model (Grossberg 1999; Grossberg, Govindarajan, Wyse, Cohen 2004). SPINET. Frequency and pitch strips. Bottom Up (BU) harmonic sieve. Top Down (TD) harmonic ART matching. Exclusive allocation. Learn pitch categories based on early harmonic processing. A stream is a Spectral-Pitch Resonance!
    170. image p441fig12.38 The LTM Invariance Principle is realized if the relative sizes of the inputs to the list chunk level stay the same as more items are stored in working memory. This property, in turn, follows from shunting previously stored working memory activities when a new item occurs.
      || LTM Invariance principle. Choose STM activities so that newly stored STM activities may alter the size of old STM activities without recoding their LTM patterns. In particular: New events do not change the relative activities of past event sequences, but may reduce their absolute activities. Why? Bottom-up adaptive filtering uses dot products: T(j) = sum[i=1 to n] x(i)*z(i,j) = total input to v(j). The relative sizes of inputs to coding nodes v(j) are preserved. x(i) -> w*x(i), 0 < w <= 1, leaves all past ratios T(j)/T(k) unchanged.
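      Howell: the dot-product property stated above is easy to check numerically (a minimal sketch of mine; the numbers are arbitrary):

        import numpy as np

        def category_inputs(x, z):
            # Total bottom-up input to each category node v(j):  T(j) = sum_i x(i)*z(i,j)
            return x @ z

        rng = np.random.default_rng(0)
        x = rng.random(5)          # stored working-memory activities
        z = rng.random((5, 3))     # adaptive weights to three list-chunk nodes

        T_before = category_inputs(x, z)
        T_after  = category_inputs(0.6 * x, z)   # a new item shunts all old activities: x -> w*x
        print(T_before / T_before[0])
        print(T_after / T_after[0])              # identical: the ratios T(j)/T(k) are preserved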
    171. image p449fig12.47 This figure illustrates the self-similarity in a Masking Field of both its recurrent inhibitory connections (red arrows) and its top-down excitatory priming signals (green arrows) to the item chunk working memory.
      || Both recurrent inhibition and top-down excitatory priming are self-similar in a masking field. MYSELF <-> [MY, MYSELF]
    172. image p453fig12.50 The ARTWORD perception cycle shows how sequences of items activate possible list chunks, which compete among each other and begin to send their top-down expectations back to the item working memory. An item-list resonance develops through time as a result.
      || ARTWORD perception cycle. (a) bottom-up activation (b) list chunk competition (c) item-list resonance (d) chunk reset due to habituative collapse.
    173. image p483fig13.02 The object-value categories in the orbitofrontal cortex require converging specific inputs from the sensory cortex and nonspecific incentive motivational inputs from the amygdala in order to fire. When the orbitofrontal cortex fires, it can deliver top-down ART Matching Rule priming signals to the sensory cortical area by which it was activated, thereby helping to choose the active recognition categories there that have the most emotional support, while suppressing others, leading to attentional blocking of irrelevant cues.
      || Cognitive-Emotional-Motor (CogEM) model. Drive-> amygdala incentive motivational learning-> orbitofrontal cortex- need converging cue and incentive inputs to fire <-> sensory cortex- conditioned reinforcer learning-> amygdala. CS-> sensory cortex. Motivated attention closes the cognitive-emotional feedback loop, focuses on relevant cues, and causes blocking of irrelevant cues.
    174. image p484fig13.04 The top-down feedback from the orbitofrontal cortex closes a feedback loop that supports a cognitive-emotional resonance. If this resonance can be sustained long enough, it enables us to have feelings at the same time that we experience the categories that caused them.
      || Cognitive-Emotional resonance. Basis of "core consciousness" and "the feeling of what happens". (Damasio 1999) derives heuristic version of CogEM model from his clinical data. Drive-> amygdala-> prefrontal cortex-> sensory cortex, resonance around the latter 3. How is this resonance maintained long enough to become conscious?
    175. image p523fig14.03 (a) The MOTIVATOR neural model generalizes CogEM by also including the basal ganglia. It can hereby explain and simulate complementary functions of the amygdala and basal ganglia (SNc) during conditioning and learned performance. The basal ganglia generate Now Print signals in response to unexpected rewards. These signals modulate learning of new associations in many brain regions. The amygdala supports motivated attention to trigger actions that are expected to occur in response to conditioned or unconditioned stimuli. Object Categories represent visual or gustatory inputs in anterior inferotemporal (ITA) and rhinal (RHIN) cortices, respectively. Value Categories represent the value of anticipated outcomes on the basis of hunger and satiety inputs, in amygdala (AMYG) and lateral hypothalamus (LH). Object-Value Categories resolve the value of competing perceptual stimuli in medial (MORB) and lateral (ORB) orbitofrontal cortex. The Reward Expectation Filter detects the omission or delivery of rewards using a circuit that spans ventral striatum (VS), ventral pallidum (VP), striosomal delay (SD) cells in the ventral striatum, the pedunculopontine nucleus (PPTN) and midbrain dopaminergic neurons of the substantia nigra pars compacta/ventral tegmental area (SNc/VTA). The circuit that processes CS-related visual information (ITA, AMYG, ORB) operates in parallel with a circuit that processes US-related visual and gustatory information (RHIN, AMYG, MORB). (b) Reciprocal adaptive connections between hypothalamus and amygdala enable amygdala cells to become learned value categories. The bottom region represents hypothalamic cells, which receive converging taste and metabolite inputs whereby they become taste-drive cells. Bottom-up signals from activity patterns across these cells activate competing value categories, or US Value Representations, in the amygdala. A winning value category learns to respond selectively to specific combinations of taste-drive activity patterns and sends adaptive top-down priming signals back to the taste-drive cells that activated it. CS-activated conditioned reinforcer signals are also associatively linked to value categories. Adaptive connections end in (approximately) hemidiscs. See the text for details.
      ||
    176. image p600fig16.36 The entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories. See the text for details.
      || Entorhinal-hippocampal interactions as an ART system. Hippocampal place cells as spatial categories. Angular head velocity-> head direction cells-> stripe cells- small scale 1D periodic code (ECIII) SOM-> grid cells- small scale 2D periodic code (ECII) SOM-> place cells- larger scale spatial map (DG/CA3)-> place cells (CA1)-> conjunctive-coding cells (EC V/VI)-> top-down feedback back to stripe cells- small scale 1D periodic code (ECIII). stripe cells- small scale 1D periodic code (ECIII)-> place cells (CA1).
    177. image p613fig16.44 The main target position vector (TPV), difference vector (DV), and volitional GO computations in SOVEREIGN that bring together reactive and planned signals to control decision-making and action. See the text for details.
      || Reactive visual TPV (RVT), NETs (NETs), S-MV mismatch (SMVM), NETmv (NETmv), reactive visual TPV storage (RVTS), reactive DV1 (RD1), NET (NET), motivated what and where decisions (MWWD), Planned DV1 (PD1), tonic (Tonic), top-down readout mismatch (TDRM), Parvo gate (tonic) (PG), Orienting GOp offset (OGpO). RVT-> [NETs, RVTS], NETs-> [SMVM, NET], SMVM-> NET, NETmv-> SMVM, RVTS-> [NETs, RD1], NET-> [RD1, PD1, TDRM], MWWD-> PD1, PD1-> Tonic-> TDRMPG-> NETs, OGpO-> [NETmv, PD1].
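      Howell: the generic [TPV, DV, GO] computation can be sketched in a few lines (mine, in the spirit of the VITE-style difference-vector circuits that these vectors build on; not SOVEREIGN's full circuit). The difference vector DV = TPV - PPV is integrated into the present position vector at a rate scaled by the volitional GO signal, so the same target command is performed faster or slower depending on GO.

        import numpy as np

        def dv_go_reach(tpv, ppv, go, dt=0.01, steps=1000):
            # DV = TPV - PPV; the present position vector integrates the GO-gated DV.
            t = np.array(tpv, dtype=float)
            p = np.array(ppv, dtype=float)
            for _ in range(steps):
                p += dt * go * (t - p)
            return p

        tpv = [10.0, 5.0]
        print(dv_go_reach(tpv, [0.0, 0.0], go=0.05))   # ~40% of the way to the target in this time
        print(dv_go_reach(tpv, [0.0, 0.0], go=1.0))    # essentially at the target: larger GO, faster move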
    178. Grossberg 2021 p229c2h0.60 SMART computer simulations demonstrate that a good enough match of a top-down expectation with a bottom-up feature pattern generates an attentive resonance during which the spikes of active cells synchronize in the gamma frequency range of 20-70 Hz (Figure 5.40). Many labs have reported a link between attention and gamma oscillations in the brain, including two articles published in 2001, one from the laboratory of Robert Desimone when he was at the National Institute of Mental Health in Bethesda (Fries, Reynolds, Rorie, Desimone 2001), and the other from the laboratory of Wolf Singer in Frankfurt (Engel, Fries, Singer 2001). You'll note that Pascal Fries participated in both studies, and is an acknowledged leader in neurobiological studies of gamma oscillations; eg (Fries 2009). ..."
    179. Grossberg 2021 p081c2h0.66 sub-section: How does one evolve a computational brain?
      The above discussion illustrates that no single step of theoretical derivation can derive a whole brain. One needs a method for deriving a brain in stages, or cycles, much as evolution has incrementally discovered ever more complex brains over many thousands of years. The following theoretical method has been successfully applied many times since I first used it in 1957. It embodies a kind of conceptual evolutionary process for deriving a brain.

      Because "brain evolution needs to achieve behavioural success", we need to start with data that embodiey indices of behavioral success. That is why, as illustrated in Figure 2.37 Modelling method and cycle, one starts with Behavioral Data from scores or hundreds of psychological experiments. These data are analyszed as the result of an individual adapting autonomously in real time to a changing world. This is the Arty of Modeling. It requires that one be able to infer from static data curves the dynamical processes that control individual behaviors occuring in real time. One of the hardest things that I teach to my students to do is "how to think in real time" to be able to carry out this speculative leap.

      Properly carried out, this analysis leads to the discovery of new Design Principles that are embodied by these behavioral processes. The Design Principles highlight the functional meaning of the data, and clarify how individual behaviors occurring in real time give rise to these static data curves.

      These principles are then converted into the simplest Mathematical Model using a method of minimal anatomies, which is a form of Occam's Razor, or principle of parsimony. Such a mathematical model embodies the psychological principles using the simplest possible differential equations. By "simplest" I mean that, if any part of the derived model is removed, then a significant fraction of the targeted data could no longer be explained. One then analyzes the model mathematically and simulates it on the computer, showing along the way how variations on the minimal anatomy can realize the design principles in different individuals or species.

      This analysis has always provided functional explanations and Behavioral Predictions for much larger behavioral data bases than those used to discover the Design Principles. The most remarkable fact is, however, that the behaviorally derived model always looks like part of a brain, thereby explaining a body of challenging Neural Data and making novel Brain Predictions.

      The derivation hereby links mind to brain via psychological organizational principles and their mechanistic realization as a mathematically defined neural network. This startling fact is what I first experienced as a college Freshman taking Introductory Psychology, and it changed my life forever.

      I conclude from having had this experience scores of times since 1957 that brains look the way they do because they embody a natural computational realization for controlling autonomous adaptation in real-time to a changing world. Moreover, the Behavior -> Principles -> Model -> Neural derivation predicts new functional roles for both known and unknown brain mechanisms by linking the brain data to how it helps to ensure behavioral success. As I noted above, the power of this method is illustrated by the fact that scores of these predictions about brain and behavior have been supported by experimental data 5-30 years after they were first published.

      Having made the link from behavior to brain, one can then "burn the candle from both ends" by pressing both top-down from Behavioral Data and bottom-up from Brain Data to clarify what the model can and cannot explain at its current stage of derivation. No model can explain everything. At each stage of development, the model can cope with certain environmental challenges but not others. An important part of the mathematical and computational analysis is to characterize the boundary between the known and unknown; that is, which challenges the model can cope with and which it cannot. The shape of this boundary between the known and unknown helps to direct the theorist's attention to new design principles that have been omitted from previous analysis.

      The next step is to show how these new design principles can be incorporated into the evolved model in a self-consistent way, without undermining its previous mechanisms, thereby leading to a progressively more realistic model, one that can explain and predict ever more behavioral and neural data. In this way, the model undergoes a type of evolutionary development, as it becomes able to cope behaviorally with environmental constraints of ever increasing subtlety and complexity. The Method of Minimal Anatomies may hereby be viewed as a way to functionally understand how increasingly demanding combinations of environmental pressures were incorporated into brains during the evolutionary process.

      If such an Embedding Principle cannot be carried out - that is, if the model cannot be unlumped or refined in a self-consistent way - then the previous model was, put simply, wrong, and one needs to figure out which parts must be discarded. Such a model is, as it were, an evolutionary dead end. Fortunately, this has not happened to me since I began my work in 1957 because the theoretical method is so conservative. No theoretical addition is made unless it is supported by multiple experiments that cannot be explained in its absence. Where multiple mechanistic instantiations of some Design Principles were possible, they were all developed in models to better understand their explanatory implications. Not all of these instantiations could survive the pressure of the evolutionary method, but some always could. As a happy result, all earlier models have been capable of incremental refinement and expansion.

      The cycle of model evolution has been carried out many times since 1957, leading today to increasing numbers of models that individually can explain and predict psychological, neurophysiological, anatomical, biophysical, and even biochemical data. In this specific sense, the classical mind-body problem is being incrementally solved.

      Howell: bold added for emphasis.
      (keys : Principles-Principia, behavior-mind-brain link, brain evolution, cycle of model evolution)
      see also quotes: Charles William Lucas "Universal Force" and others (not retyped yet).
    180. p190 Howell: [neural microcircuits, modal architectures] used in ART -
      bottom-up filters | top-down expectations | purpose
      instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star often used to learn the adaptive weights. top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features (see the instar/outstar sketch after this table)
      LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
      EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
      auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
      auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
      visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown. (red - cognitive-emotional dynamics; green - working memory dynamics; black - see [bottom-up, top-down] lists)
      EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
      Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

      background colours in the table signify :
      white | general microcircuit : a possible component of ART architecture
      lime green | sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
      light blue | post-perceptual cognition?
      pink | "the feeling of what happens" and knowing what event caused that feeling
    181. image p009fig01.06 Primacy gradient of activity stored in working memory within a recurrent shunting on-center off-surround network. Rehearsal is controlled by a nonspecific rehearsal wave and self-inhibitory feedback of the item that is currently being rehearsed. Green = excitatory, red = inhibitory
      || inputs? -> item and order WM storage -> competitive selection-> rehearsal wave -> outputs
    182. image p077fig02.27 A schematic of the on-center off-surround network that occurs in the mudpuppy retina, including three main cell types: receptors, horizontal cells, and bipolar cells.
      || Mechanism: cooperative-competitive dynamics.
      On-center off-surround (Kuffler 1953) cat retina
      Subtractive lateral inhibition (Hartline, Ratliff 1956/7+) limulus retina.
      R receptor -> H horizontal -> B bipolar (Werblin, Dowling, etal 1969+) mudpuppy retina.
    183. image p100fig03.15 A fuzzy band of possible initial grouping orientations allows grouping to get started. Cooperative-competitive feedback via a hierarchical resolution of uncertainty chooses a sharp final grouping that has the most evidence to support it.
      || before choice: transient; after choice: equilibrium
    184. image p108fig03.28 The watercolor illusion of Baingio Pinna 1987 can be explained using spatial competition between like-oriented boundary signals. This occurs at what I have called the First Competitive Stage. This is one stage in the brain's computation of hypercomplex cells, which are also called endstopped complex cells. Why the blue regions seem to bulge in depth may be explained using multiple-scale, depth-selective boundary webs. See the text for details.
      || Baingio Pinna. Watercolor illusion 1987. Filled-in regions bulge in depth. Multiple-scale, depth-selective boundary web!
    185. image p146fig04.25 Networks of simple, complex, and hypercomplex cells can create end cuts as an example of hierarchical resolution of uncertainty. See the text for details.
      || How are end cuts created? (Grossberg 1984) Two stages of short-range competition. 1st stage: Simple cells -> complex cells -> hypercomplex - endstopped complex. First competitive stage- across position, same orientation; Second competitive stage- same position, across orientation. -> cooperation.
    186. image p148fig04.26 End cuts are formed during neon color spreading in the same way that they are formed at line ends.
      || End cut during neon color spreading.
      FIRST competitive stage | SECOND competitive stage
      within orientation | across orientation
      across position | within position
      to generate end cuts.
    187. image p149fig04.27 Bipole cells can form boundaries that interpolate end cuts, and use their cooperative-competitive interactions to choose the boundary groupings that have the most support from them.
      || Bipole cells: boundary completion. long-range cooperation & short-range inhibition: complete winning boundary groupings and suppress weaker boundaries.
    188. image p161fig04.37 Kanizsa squares that form either collinearly to their inducers (left panel) or perpendicular to them (right panel) confirm predictions of the BCS boundary completion model.
      || Analog-sensitive boundary completion. contour strength vs Kanizsa square image. Increases with "support ratio" (Shipley, Kellman 1992). Inverted-U (Lesher, Mingolla 1993; cf Soriano, Spillmann, Bach 1994)(shifted gratings). p370h0.6 BCS = Boundary Contour System, FCS = Feature Contour System. p161c1h0.85 "... As predicted by the BCS, they found an Inverted-U in contour strength as a function of line density. ... This effect may be explained by the action of the short-range competition that occurs before the stage of long-range cooperative grouping by bipole cells (Figure 4.32). It is thus another example of the balance between cooperative and competitive mechanisms. ..."
    189. image p198fig05.10 A competitive learning circuit learns to transform distributed feature patterns into selective responses of recognition categories.
      || Competitive learning and Self-Organized Maps (SOMs). input patterns -> feature level (F1) -> adaptive filter (T=ZS) ->
    190. image p205fig05.18 How catastrophic forgetting can occur in a competitive learning or self-organizing map model due to basic properties of competition and associative learning.
      || Learning from pattern sequences, practicing a sequence of spatial patterns can recode all of them! When is learning stable? Input patterns cannot be too dense relative to the number of categories; Either: not too many distributed inputs relative to the number of categories, or not too many input clusters
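      Howell: the recoding problem can be shown in a few lines (a sketch of mine, under the assumption that two very different patterns must share one category node, i.e. the inputs are too dense for the available categories, and no ART Matching Rule protects the old memory):

        import numpy as np

        def instar_train(w, pattern, lr=0.2, steps=50):
            # Associative (instar) learning at a single category node: w tracks the input.
            for _ in range(steps):
                w = w + lr * (pattern - w)
            return w

        A = np.array([1.0, 1.0, 0.0, 0.0])
        B = np.array([0.0, 0.0, 1.0, 1.0])   # a very different pattern coded by the SAME category

        w = instar_train(np.zeros(4), A)
        print("match to A after learning A:", w @ A)   # high
        w = instar_train(w, B)                          # nonstationary environment re-tunes the weights
        print("match to A after learning B:", w @ A)   # near zero: the old memory was catastrophically recoded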
    191. image p226fig05.35 I had shown in 1976 how a competitive learning or self-organizing map model could undergo catastrophic forgetting if the input environment was sufficiently dense and nonstationary, as illustrated by Figure 5.18. Later work with Gail Carpenter showed how, if the ART Matching Rule was shut off, repeating just four input patterns in the correct order could also cause catastrophic forgetting by causing superset recoding, as illustrated in Figure 5.36.
      || Code instability input sequences. D C A; B A; B C = ; |D|<|B|<|C|; where |E| is the number of features in the set E. Any set of input vectors that satisfy the above conditions will lead to unstable coding if they are periodically presented in the order ABCAD and the top-down ART Matching Rule is shut off.
    192. image p287fig07.07 Persistence increases with distance between a target and a masking stimulus due to weakening of the spatial competition in the first competitive stage of hypercomplex cells.
      || Persistence data and simulations. Persistence increases with distance between a target and a masking stimulus (Farrell, Pavel, Sperling 1990). There is less spatial competition from the masker to the target when they are more distant, hence the target is more persistent.
    193. image p364fig10.13 The bottom-up adaptive filter, intracortical grouping circuit, and intercortical top-down attentional circuit all use the same competitive decision circuit between layers 6 and 4, called the attention-preattention interface, with which to select the featural patterns that will be processed.
      || Bottom-up filters and intracortical grouping feedback use the same 6-to-4 decision circuit, LGN-> Vx[6,4,2/3]. competitive decision circuit, modulatory on-center off-surround network. Top-down intercortical attention also uses the same 6-to-4 decision circuit!
    194. image p437fig12.33 Item and Order working memory models explain free recall data, as well as many other psychological and neurobiological data, by simulating how temporal series of events are stored as evolving spatial patterns of activity at content-addressable item categories. The categories with the largest activities are rehearsed first, and self-inhibit their activity as they do so in order to prevent them from being rehearsed perseveratively. The laws whereby the items are stored in working memory obey basic design principles that ensure that list categories, or chunks, of sequences of stored items can be stably remembered.
      || Working memory models: item and order, or competitive queuing (Grossberg 1978; Houghton 1990; Page, Norris 1998). Event sequence in time stored as an evolving spatial pattern of activity. Primacy gradient of working memory activation stores correct temporal order at content-addressable cells. The maximally activated cell population is performed next when a rehearsal wave is turned on. Output signal from chosen cell population inhibits its own activity to prevent perseveration: inhibition of return. Iterate until entire sequence is performed.
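      Howell: a minimal sketch (mine; the decay factor is arbitrary) of the Item-and-Order / competitive queuing readout just described - a primacy gradient is stored, the rehearsal wave performs the most active item, and self-inhibition (inhibition of return) prevents perseveration:

        import numpy as np

        def primacy_gradient(n_items, decay=0.8):
            # Earlier items are stored with larger activities: decay**position.
            return decay ** np.arange(n_items)

        def rehearse(activities):
            # Competitive queuing: perform the maximally active item, then self-inhibit it.
            x = activities.copy()
            order = []
            while x.max() > 0:
                j = int(np.argmax(x))
                order.append(j)
                x[j] = 0.0           # inhibition of return
            return order

        wm = primacy_gradient(5)
        print(wm)              # 1.0, 0.8, 0.64, 0.512, 0.4096 - the primacy gradient
        print(rehearse(wm))    # [0, 1, 2, 3, 4]: the stored order is recalled correctly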
    195. image p488fig13.12 (left column) How incentive motivational feedback amplifies activity of a sensory cortical cell population. (right column) A sensory cortical cell population whose activity is amplified by incentive motivational feedback can suppress the activities of less activated populations via self-normalizing recurrent competitive interactions.
      || Motivational feedback and blocking. (left) sensory input CS, STM activity without motivational feedback, STM activity with motivational feedback. (right) STM suppressed by competition, STM amplified by (+) feedback.
    196. image p510fig13.39 Shunting competition and informational noise suppression in affective gated dipoles, plus back-propagating action potentials for teaching signals, enable the net normalized adaptive weights to be learned. They never saturate!
      || Learn net dipole output pattern. Opponent "decision" controls learning. Cf. competitive learning. Learning signal, opponent extinction.
    197. p001 Chapter 1 Overview - From Complementary Computing and Adaptive Resonance to conscious awareness
    198. p250 Chapter 6 Conscious seeing and invariant recognition - Complementary cortical streams coordinate attention for seeing and recognition
    199. p289 Chapter 8 How we see and recognize object motion - Visual form and motion perception obey complementary laws
    200. p337 Chapter 9 Target tracking, navigation, and decision-making - Visual tracking and navigation obey complementary laws
    201. image p029tbl01.01 Some pairs of complementary processing streams.
      ||
      visual boundary: interblob stream V1-V2-V4 | visual surface: blob stream V1-V2-V4
      visual boundary: interblob stream V1-V2-V4 | visual motion: magno stream V1-MT-MST
      WHAT stream | WHERE stream
      perception & recognition: inferotemporal & prefrontal areas | space & action: parietal & prefrontal areas
      object tracking: MT interbands & MSTv | optic flow navigation: MT+ bands & MSTd
      motor target position: motor & parietal cortex | volitional speed: basal ganglia
    202. image p030tbl01.02 The What and Where cortical processing streams obey complementary laws. These laws enable the What stream to rapidly and stably learn invariant object categories without experiencing catastrophic forgetting, while the Where stream learns labile spatial and action representations to control actions that are aimed towards these objects.
      ||
      WHAT | WHERE
      spatially-invariant object learning and recognition | spatially-variant reaching and movement
      fast learning without catastrophic forgetting | continually update sensory-motor maps and gains
      IT InferoTemporal Cortex | PPC Posterior Parietal Cortex
       | What | Where
      matching | excitatory | inhibitory
      learning | match | mismatch
    203. image p087fig03.01 A macrocircuit of key visual processes (in green) and the cortical areas in which they primarily occur (in red), from the retina to the Prefrontal Cortex (PFC), including both the What and Where cortical streams. The [bottom-up, horizontal, and top-down] interactions help each of these processes to overcome computationally complementary processing deficiencies that they would experience without them, and also to read out top-down expectations that help to stabilize learning while they focus attention on salient objects and positions.
      || Emerging unified theory of visual intelligence. [What, Where] streams. Bottom-up and top-down interactions overcome COMPLEMENTARY processing deficiencies.
    204. image p094fig03.07 The processes of boundary completion and surface filling-in are computationally complementary.
      ||
      Boundary completion | Surface filling-in
      outward | inward
      oriented | unoriented
      insensitive to direction-of-contrast | sensitive to direction-of-contrast
    205. image p174fig04.51 The same feedback circuit that ensures complementary consistency between boundaries and surfaces also, automatically, initiates figure-ground separation! See the text for details.
      || before feedback: [V1 -> V2 pale stripe -> V2 thin stripe, "attention pointers" (Cavanagh etal 2010)]; after feedback: [V1 + V2 thin stripe] -> V2 pale stripe via contrast-sensitive [excitation, inhibition] for depths [1, 2] -> object recognition
    206. image p176fig04.53 The on-center off-surround network within position and across depth helps to explain why brighter Kanizsa squares look closer.
      || inhibition vs. depth. p176c1h0.25 "... to qualitatively understand how this example of proximity-luminance covariance works. It follows directly from the boundary pruning by surface contour feedback signals (Figure 4.51) that achieves complementary consistency and initiates figure-ground perception. ...". p176c1h0.45 "... these inhibitory signals are part of an off-surround network whose strength decreases as the depth difference increases between the surface that generates the signal and its recipient boundaries. ...". p176c1h0.8 "... Within FACADE theory, the perceived depth of a surface is controlled by the boundaries that act as its filling-in generators and barriers (Figure 3.22), since these boundaries select the depth-selective FIDOs within which filling-in can occur, and thereby achieve surface capture. These boundaries, in turn, are themselves strengthened after surface-to-boundary contour feedback eliminates redundant boundaries that cannot support successful filling-in (Figure 4.51). These surface contour feedback signals have precisely the properties that are needed to explain why brighter Kanizsa squares look closer! ..."
    207. image p211fig05.20 The PN and N200 event-related potentials are computationally complementary events that are computed within the attentional and orienting systems.
      || PN and N200 are complementary waves. PN [top-down, conditionable, specific] match; N200 [bottom-up, unconditionable, nonspecific] mismatch
    208. image p267fig06.14 Feedback from object surfaces to object boundaries uses surface contours. This feedback assures complementary consistency and enables figure-ground separation. A corollary discharge of the surface contours can be used to compute salient object feature positions.
      || Perceptual consistency and figure-ground separation.
    209. image p314fig08.34 The VISTARS model for visually-based spatial navigation. It uses the Motion BCS as a front end and feeds it output signals into two computationally complementary cortical processing streams for computing optic flow and target tracking information.
      || VISTARS navigation model (Browning, Grossberg, Mingolla 2009). Use FORMOTION model as front end for higher level navigational circuits: input natural image sequences -> estimate heading (MT+)-MSTd -> additive processing -> estimate object position (MT-)-MSTv direction and speed subtractive processing -> Complementary Computing. [optic flow navigation, object tracking]
    210. image p523fig14.03 (a) The MOTIVATOR neural model generalizes CogEM by also including the basal ganglia. It can hereby explain and simulate complementary functions of the amygdala and basal ganglia (SNc) during conditioning and learned performance. The basal ganglia generate Now Print signals in response to unexpected rewards. These signals modulate learning of new associations in many brain regions. The amygdala supports motivated attention to trigger actions that are expected to occur in response to conditioned or unconditioned stimuli. Object Categories represent visual or gustatory inputs in anterior inferotemporal (ITA) and rhinal (RHIN) cortices, respectively. Value Categories represent the value of anticipated outcomes on the basis of hunger and satiety inputs, in amygdala (AMYG) and lateral hypothalamus (LH). Object-Value Categories resolve the value of competing perceptual stimuli in medial (MORB) and lateral (ORB) orbitofrontal cortex. The Reward Expectation Filter detects the omission or delivery of rewards using a circuit that spans ventral striatum (VS), ventral pallidum (VP), striosomal delay (SD) cells in the ventral striatum, the pedunculopontine nucleus (PPTN) and midbrain dopaminergic neurons of the substantia nigra pars compacta/ventral tegmental area (SNc/VTA). The circuit that processes CS-related visual information (ITA, AMYG, ORB) operates in parallel with a circuit that processes US-related visual and gustatory information (RHIN, AMYG, MORB). (b) Reciprocal adaptive connections between hypothalamus and amygdala enable amygdala cells to become learned value categories. The bottom region represents hypothalamic cells, which receive converging taste and metabolite inputs whereby they become taste-drive cells. Bottom-up signals from activity patterns across these cells activate competing value categories, or US Value Representations, in the amygdala. A winning value category learns to respond selectively to specific combinations of taste-drive activity patterns and sends adaptive top-down priming signals back to the taste-drive cells that activated it. CS-activated conditioned reinforcer signals are also associatively linked to value categories. Adaptive connections end in (approximately) hemidiscs. See the text for details.
      ||
    211. image p548fig15.16 Homologous recognition learning and reinforcement learning macrocircuits enable adaptively timed conditioning in the reinforcement learning circuit to increase inhibition of the orienting system at times when a mismatch in the recognition system would have reduced inhibition of it.
      || Homolog between ART and CogEM model, complementary systems. [Recognition, Reinforcement] learning vs [Attentional, Orienting] system. Reinforcement: timing, drive representation.
    212. p353 Chapter 10 Laminar computing by cerebral cortex - Towards a unified theory of biological and artificial intelligence
    213. image p030fig01.20 A schematic cross-section of a slice of laminar neocortex whose cells are organized in a characteristic way in six layers, which themselves may be organized into distinct sublaminae. The computational paradigm of Laminar Computing attempts to show how different parts of neocortex can represent and control very different kinds of behavior - including vision, speech, and cognition - using specializations of the same canonical laminar cortical design.
      || Projection fibres: Cortico[spinal, bulbar, pontine, striate, reticular, etc]; Thalamocortical fibres; Diffuse cortical afferent fibres: [nonspecific thalamocortical, Cholinergic, Monoaminergic]; Corticocortical efferents; Projection [cell, fibre]; Corticocortical efferent terminals.
    214. image p141fig04.19 A laminar cortical circuit for computing binocular disparities in layer 3B of V1 at binocular simple cells. These cells add positionally disparate inputs from like-polarized monocular simple cells (layer 4 of V1). Binocular simple cells at each position that are sensitive to opposite polarities then add their outputs at complex cells in layer 2/3. Chapter 10 will explain how these laminar circuits work in greater detail.
      || Laminar cortical circuit for complex cells. [left, right] eye.
      V1 layer | description
      2/3A | complex cells
      3B | binocular simple cells
      4 | monocular simple cells
    215. image p163fig04.39 A schematic of the LAMINART model that explains key aspects of laminar visual cortical anatomy and dynamics. LGN -> V1 [6, 4, 2/3] -> V2 [6, 4, 2/3]
      || p163c1h0.6 "... The first article about laminar computing ... proposed how the laminar cortical model could process 2D pictures using bottom-up filtering and horizontal bipole grouping interactions (Grossberg, Mingolla, Ross 1997). In 1999, I was able to extend the model to also include top-down circuits for expectation and attention (Grossberg 1999)(right panel). Such a synthesis of laminar bottom-up, horizontal, and top-down circuits is characteristic of the cerebral cortex (left panel). I called it LAMINART because it began to show how properties of Adaptive Resonance Theory, or ART, notably the ART prediction about how top-down expectations and attention work, are realized by identical cortical cells and circuits. You can immediately see from the schematic laminar circuit diagram ... (right panel) that circuits in V2 seem to repeat circuits in V1, albeit with a larger spatial scale, despite the fact that V1 and V2 carry out different functions. How this anatomical similarity can coexist with functional diversity will be clarified in subsequent sections and chapters. It enables different kinds of biological intelligence to communicate seamlessly while carrying out their different psychological functions. ..."
    216. image p174fig04.52 An example of how the 3D LAMINART model can transform the two monocular images of the random dot stereogram in the top row into the three depth-separated surface representations in the bottom row.
      || Stereogram surface percepts: surface lightnesses are segregated in depth (Fang and Grossberg 2009). [left, right] inputs, [far, fixation, near] planes. Contrast with algorithms that just compute disparity matches and let computer code build the surface, eg (Marr, Poggio, etal 1974).
    217. image p182fig04.58 LAMINART model processing stages that are sufficient to explain many percepts of transparency, including those summarized in Figure 4.57.
      || [left, right] eye, [LGN, V1 [6, 4, 3B, 2/3 A], V2 [4, 2/3]], [mo, bi]nocular cart [simple, complex] cells, [excita, inhibi]tory cart [connection, cell]s.
    218. image p230fig05.38 The Synchronous Matching ART, or SMART, model includes spiking neurons in a laminar cortical hierarchy. I developed SMART with my PhD student Massimiliano Versace. By unlumping LAMINART to include spiking neurons, finer details of neurodynamics, such as the existence of faster gamma oscillations during good enough matches, and slower beta oscillations during bad enough mismatches, could be shown as emergent properties of network interactions.
      || Second order thalamus -> specific thalamic nucleus -> Thalamic reticulate nucleus -> neocortical laminar circuit [6ll, 6l, 5, 2/3, 1] -> Higher order cortex. Similar for First order thalamus -> First order cortex, with interconnection to Second order, nonspecific thalamic nucleus
    219. image p231fig05.39 The SMART hypothesis testing and learning cycle predicts that vigilance increases when a mismatch in subcortical regions like the nonspecific thalamus activates the nucleus basalis of Meynert which, in turn, broadcasts a burst of the neurotransmitter acetylcholine, or ACh, to deeper cortical layers. Due to the way in which LAMINART proposes that cortical matching and mismatching occurs, this ACh burst can increase vigilance and thereby trigger a memory search. See the text for details.
      || [BU input, [, non]specific thalamic nucleus, thalamic reticulate nucleus, neocortical laminar circuit] cart [Arousal, Reset, Search, Vigilance]
    220. image p232fig05.41 (a)-(c). The sequence of interlaminar events that SMART predicts during a mismatch reset. (d) Some of the compatible neurophysiological data.
      || Mismatch causes layer 5 dendritic spikes that trigger reset. (a) Arousal causes increase in nonspecific thalamic nuclei firing rate and layer 5 dendritic and later somatic spikes (Larkum and Zhu 2002, Williams and Stuart 1999) (b) Layer 5 spikes reach layer 4 via layer 6i and inhibitory neurons (Lund and Boothe 1975, Gilbert and Wiesel 1979) (c) habituative neurotransmitters in layer 6i shift the balance of active cells in layer 4 (Grossberg 1972, 1976) (d) Dendritic stimulation fires layer 5 (Larkum and Zhu 2002): stimulation of apical dendrites by nonspecific thalamus
    221. image p246fig05.48 Microcircuits of the LAMINART model that I developed with Rajeev Raizada. See the text for details of how they integrate bottom-up adaptive filtering, horizontal bipole grouping, and top-down attentional matching that satisfies the ART Matching Rule.
      ||
    222. image p248fig05.49 This circuit of the LAMINART model helps to explain properties of Up and Down states during slow wave sleep, and how disturbances in ACh dynamics can disrupt them.
      ||
    223. image p356fig10.03 Laminar computing achieves at least three basic properties of visual processing that have analogs in all biologically intelligent behaviors. These properties may be found in all cortical circuits in specialized form.
      || What does Laminar Computing achieve? 1. Self-stabilizing development and learning; 2. Seamless fusion of a) pre-attentive automatic bottom-up processing, b) attentive task-selective top-down processing; 3. Analog coherence: Solution of Binding Problem for perceptual grouping without loss of analog sensitivity. Even the earliest visual cortical stages carry out active adaptive information processing: [learn, group, attention]ing
    224. image p357fig10.04 Laminar Computing achieves its properties by computing in a new way that synthesizes the best properties of feedforward and feedback interactions, analog and digital computations, and preattentive and attentive learning. The property of analog coherence enables coherent groupings and decisions to form without losing sensitivity to the amount of evidence that supports them.
      || Laminar Computing: a new way to compute. 1. Feedforward and feedback: a) Fast feedforward processing when data are unambiguous (eg Thorpe etal), b) slower feedback chooses among ambiguous alternatives [self-normalizing property, real-time probability theory], c) A self-organizing system that trades certainty against speed: Goes beyond Bayesian models! 2. Analog and Digital: Analog Coherence combines the stability of digital with the sensitivity of analog. 3. Preattentive and Attentive Learning: Reconciles the differences of (eg) Helmholtz and Kanizsa, "A preattentive grouping is its own 'attentional' prime"
    225. image p362fig10.11 Feedback between layer 2/3 to the layer 6-to-4-to-2/3 feedback loop chooses the strongest grouping in cases where there is more than one. If only one grouping exists, then the circuit can function very quickly in a feedforward manner. When multiple groupings exist, the cortex "runs as fast as it can" to select the one with the most evidence to support it using the self-normalizing inhibition in the layer 6-to-4 off-surround.
      || How is the final grouping selected? Folded feedback LGN-> 6-> 4-> 2/3. 1. Layer 2/3 groupings feed back into 6-to-4 on-center off-surround: a) direct layer 2/3 -to-6 path; b) can also go via layer 5 (Blasdel etal 1985; Kisvarday etal 1989). 2. Strongest grouping enhanced by its on-center. 3. Inputs to weaker groupings suppressed by off-surround. 4. Interlaminar feedback creates functional columns. Activities of conflicting groupings are reduced by self-normalizing inhibition, slowing processing; intracortical feedback selects and contrast-enhances the winning grouping, speeding processing.
    226. image p363fig10.12 The same laminar circuit design repeats in V1 and V2, albeit with specializations that include longer horizontal grouping axons and figure-ground separation interactions.
      || V2 repeats V1 circuitry at larger spatial scale, LGN-> V1[6,4,2/3]-> V2[6,4,2/3]. V2 layer 2/3 horizontal axons longer-range than in V1 (Amir etal 1993). Therefore, longer-range groupings can form in V2 (Von der Heydt etal 1984)
    227. image p375fig11.07 The 3D LAMINART model uses both monocular and binocular simple cells to binocularly fuse like image contrasts. The remainder of the model generates 3D boundary and surface representations of multiple kinds of experiments as well as of natural scenes.
      || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A], V2 thin stripe [4->2/3A], V4]. V1 blob [V1-4 monocular, V1 interior binocular] simple cells. [complex, simple, inhibitory] cells, on-center off-surround
    228. image p376fig11.10 The 3D LAMINART model shows how the disparity filter can be integrated into the circuit that completes 3D boundary representations using bipole grouping cells. It also explains how surface contours can strengthen boundaries that succeed in generating closed filling-in domains.
      || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A] surface contour, V2 thin stripe (monocular surface) [4->2/3A], V2 interior [disynaptic inhibitory interneurons, bipole grouping cells, disparity filter, V4 binocular surface]. [complex, simple, inhibitory] cells, on-center off-surround
    229. image p378fig11.12 How monocular and binocular information are combined in V1 and V2 in the 3D LAMINART model.
      || Model utilizes monocular information. [left, right] eye cart V1-[4 monocular simple, 3B binocular simple, 2/3A [mo,bi]nocular complex] cells, V2-4 binocular complex cells. black = monocular cells, blue = binocular cells. In V2, monocular inputs add to binocular inputs along the line of sight and contribute to depth perception.
    230. image p379fig11.13 How the 3D LAMINART model explains DaVinci stereopsis. All the stages of boundary and surface formation are color coded to clarify their explanation. Although each mechanism is very simple, when all of them act together, the correct depthful surface representation is generated. See the text for details.
      || DaVinci stereopsis (Nakayama, Shimojo 1990). An emergent property of the previous simple mechanisms working together. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] pair [Binocular match: boundaries of thick bar -> Add monocular boundaries along lines-of-sight -> Line-of-sight inhibition kills weaker vertical boundaries -> 3D surface percept not just a disparity match!] pair [Binocular match: right edge of thin and thick bars -> Strongest boundaries: binocular and monocular boundaries add -> Vertical boundaries from monocular left edge of thin bar survive -> Filling-in contained by connected boundaries]. cart [very near, near, fixation plane, far, very far]
    231. image p380fig11.14 The model explanation of DaVinci stereopsis when the input stimuli have opposite contrast polarities.
      || Polarity-reversed Da Vinci stereopsis (Nakayama, Shimojo 1990). Same explanation! (... as Figure 11.13 ...) [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
    232. image p395fig11.34 A comparison of the properties of other rivalry models with those of the 3D LAMINART model (surrounded by red border). Significantly, only 3D LAMINART explains both stable vision and rivalry (green border).
      || Comparison of rivalry models
    233. image p401fig11.41 The 3D LAMINART model proposes how angle cells and disparity-gradient cells interact through learning to generate 3D representations of slanted objects.
      || 3D LAMINART model. [LGN, V1, V2, V4] Four key additions: 1. Angle cells - tuned to various angles; 2. Disparity-gradient cells - tuned to disparity gradients in the image; 3. weights from [angle to disparity-gradient] cells - learned while viewing 3D image; Colinear grouping between [angle to disparity-gradient] cells - disambiguates ambiguous groupings.
    234. image p402fig11.44 3D scenic reconstruction of the image in Figure 11.43 by the 3D LAMINART model.
      || Disparity [5, 6, 8, 10, 11, 14]: images of objects in common depth planes
    235. image p436fig12.30 The conscious ARTWORD, or cARTWORD, laminar cortical speech model simulates how future context can disambiguate noisy past speech sounds in such a way that the completed percept is consciously heard to proceed from past to future as a feature-item-list resonant wave propagates through time.
      || cARTWORD: Laminar cortical model macrocircuit (Grossberg, Kazerounian 2011) Simulates PHONEMIC RESTORATION: Cognitive Working Memory (processed item sequences) - [Excitatory-> inhibitory-> habituative-> adaptive filter-> adaptive filter-> adaptive filter with depletable synapse-> Acoustic [item, feature]
    236. image p444fig12.42 The LIST PARSE laminar cortical model of working memory and list chunking that I published with Lance Pearson in 2008 simulated the Averbeck etal data in Figure 12.41, as in the left column of the figure. It also simulated cognitive data about working memory storage by human subjects. See the text for details.
      || LIST PARSE: Laminar cortical model of working memory and list chunking (Grossberg, Pearson 2008). Simulates data about: [immediate, delayed, continuous] distractor free recall; immediate serial recall; and variable-speed sequential performance of motor acts. [velocity, acceleration] vs time (ms) from recall cue.
    237. image p445fig12.43 The LIST PARSE laminar cortical Cognitive Working Memory circuit, that is proposed to occur in ventrolateral prefrontal cortex, is homologous to the LAMINART circuit that models aspects of how visual cortex sees. The Motor Working Memory, VITE Trajectory Generator, and Variable-Rate Volitional Control circuits model how other brain regions, including dorsolateral prefrontal cortex, motor cortex, cerebellum, and basal ganglia, interact with the Cognitive Working Memory to control working memory storage and variable-rate performance of item sequences.
      || List parse circuit diagram. Connectivity convention. sequence chunks [<- BU filter, TD expectation ->] working memory. Working memory and sequence chunking circuit is homologous to visual LAMINART circuit!
    238. image p498fig13.22 (left column, top row) Adaptive filtering and conditioned arousal are both needed to regulate what cues can learn to activate particular space-time patterns. These developments lead inexorably to basic cognitive abilities, as embodied in the 3D LAMINART models for 3D vision and figure-ground perception (Chapter 11) and the 3D ARTSCAN SEARCH model for invariant object learning, recognition, and 3D search (Chapter 6). (right column, top row) Conditioned arousal enables only emotionally important cues to activate a motivationally relevant space-time pattern. (bottom row) Conditioned arousal and drive representations arise naturally from the unlumping of avalanche circuits to make them selective to motivationally important cues. The MOTIVATOR model is a natural outcome of this unlumping process (this chapter).
      || (top) Adaptive filtering and Conditioned arousal. Towards Cognition: need to filter inputs to the command cell. Towards Emotion: important signals turn arousal ON and OFF. (bottom) Conditioned arousal and Drive representations. Competition between conditioned arousal sources at drive representations, eg amygdala.
    239. image p038fig01.25 The ART Matching Rule stabilizes real time learning using a [top-down, modulatory on-center, off-surround] network. Object attention is realized by such a network. See text for additional discussion.
      || ART Matching Rule [volition, categories, features]. [one, two] against one.
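      A minimal Python sketch (editor's illustration) of the ART Matching Rule as a [top-down, modulatory on-center, off-surround] match: bottom-up input alone can fire features, a top-down expectation alone can only prime ("one against one" is balanced), and features receiving both bottom-up and top-down support win "two against one" while unexpected features are suppressed. The thresholds and the algebraic form are illustrative assumptions, not the model's equations.
          import numpy as np

          def art_match(bottom_up, top_down, threshold=0.5):
              """Top-down expectation is modulatory: it cannot fire cells by itself,
              but it amplifies matched features and suppresses unmatched ones."""
              bu = np.asarray(bottom_up, dtype=float)
              td = np.asarray(top_down, dtype=float)
              if td.sum() == 0:                        # no active expectation: bottom-up passes
                  return (bu > threshold).astype(int)
              net = bu * (1.0 + td) - (1.0 - td)       # "two against one" beats "one against one"
              return (net > threshold).astype(int)

          print(art_match([1, 1, 0, 0], [0, 0, 0, 0]))  # bottom-up alone fires: [1 1 0 0]
          print(art_match([0, 0, 0, 0], [1, 0, 1, 0]))  # top-down alone only primes: [0 0 0 0]
          print(art_match([1, 1, 0, 0], [1, 0, 1, 0]))  # attended critical feature survives: [1 0 0 0]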
    240. image p163fig04.39 A schematic of the LAMINART model that explains key aspects of laminar visual cortical anatomy and dynamics. LGN -> V1 [6, 4, 2/3] -> V2 [6, 4, 2/3]
      || p163c1h0.6 "... The first article about laminar computing ... proposed how the laminar cortical model could process 2D pictures using bottom-up filtering and horizontal bipole grouping interactions (Grossberg, Mingolla, Ross 1997). In 1999, I was able to extend the model to also include top-down circuits for expectation and attention (Grossberg 1999)(right panel). Such a synthesis of laminar bottom-up, horizontal, and top-down circuits is characteristic of the cerebral cortex (left panel). I called it LAMINART because it began to show how properties of Adaptive Resonance Theory, or ART, notably the ART prediction about how top-down expectations and attention work, are realized by identical cortical cells and circuits. You can immediately see from the schematic laminar circuit diagram ... (right panel) that circuits in V2 seem to repeat circuits in V1, albeit with a larger spatial scale, despite the fact that V1 and V2 carry out different functions. How this anatomical similarity can coexist with functional diversity will be clarified in subsequent sections and chapters. It enables different kinds of biological intelligence to communicate seamlessly while carrying out their different psychological functions. ..."
    241. image p192fig05.06 Bottom-up and top-down circuits between the LGN and cortical area V1. The top-down circuits obey the ART Matching Rule for matching with bottom-up input patterns and focussing attention on expected critical features.
      || Model V1-LGN circuits, version [1, 2]. retina -> LGN relay cells -> interneurons -> cortex [simple, endstopped] cells -> cortex complex cells
    242. image p200fig05.13 Instar and outstar learning are often used to learn the adaptive weights in the bottom-up filters and top-down expectations that occur in ART. The ART Matching Rule for object attention enables top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features.
      || Expectations focus attention: feature pattern (STM), Bottom-Up adaptive filter (LTM), Category (STM), competition, Top-Down expectation (LTM); ART Matching Rule: STM before top-down matching, STM after top-down matching (attention!)
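      A minimal Python sketch (editor's illustration, with an arbitrary learning rate) of the instar and outstar learning rules named above: instar weights (the bottom-up adaptive filter) track the input feature pattern only at categories that are currently active, while outstar weights (the top-down expectation) learn to read the same pattern back out from an active category.
          import numpy as np

          def instar_update(w_bu, x, y, lr=0.1):
              """w_bu[i, j]: weight from feature i to category j; gated by category activity y[j]."""
              return w_bu + lr * y[np.newaxis, :] * (x[:, np.newaxis] - w_bu)

          def outstar_update(w_td, x, y, lr=0.1):
              """w_td[j, i]: weight from category j back to feature i; gated by y[j]."""
              return w_td + lr * y[:, np.newaxis] * (x[np.newaxis, :] - w_td)

          x = np.array([1.0, 0.0, 1.0])                  # feature pattern (STM)
          y = np.array([0.0, 1.0])                       # category 1 is active
          W_bu = instar_update(np.zeros((3, 2)), x, y)   # column 1 moves toward x
          W_td = outstar_update(np.zeros((2, 3)), x, y)  # row 1 moves toward x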
    243. image p207fig05.19 The ART hypothesis testing and learning cycle. See the text for details about how the attentional system and orienting system interact in order to incorporate learning of novel categories into the corpus of already learned categories without causing catastrophic forgetting.
      ||
    244. image p211fig05.21 Sequences of P120, N200, and P300 event-related potentials occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
      || ERP support for mismatch-mediated reset: event-related potentials: human scalp potentials. ART predicted correlated sequences of P120-N200-P300 Event Related Potentials during oddball learning. P120 mismatch; N200 arousal/novelty; P300 STM reset. Confirmed in (Banquet and Grossberg 1987)
    245. image p215fig05.28 How a mismatch between bottom-up and top-down input patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level.
      || Mismatch -> inhibition -> arousal -> reset. BU input orienting arousal, BU+TD mismatch arousal and reset. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
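      A minimal Python sketch (editor's illustration with binary patterns; the vigilance value is arbitrary) of the reset rule stated above: the matched pattern is the part of the bottom-up input that the top-down prototype supports, and if the fraction of the input that survives matching falls below vigilance, the category level F2 is reset and nonspecific arousal drives a memory search.
          import numpy as np

          def match_or_reset(bottom_up, prototype, vigilance=0.8):
              """Return (matched pattern, match ratio), or (None, ratio) if F2 is reset."""
              I = np.asarray(bottom_up, dtype=bool)
              P = np.asarray(prototype, dtype=bool)
              matched = I & P
              ratio = matched.sum() / max(int(I.sum()), 1)
              if ratio < vigilance:
                  return None, ratio                    # mismatch -> arousal -> reset -> search
              return matched.astype(int), ratio         # resonance: attended critical features

          print(match_or_reset([1, 1, 1, 0], [1, 1, 0, 0]))  # ratio 0.67 < 0.8: reset
          print(match_or_reset([1, 1, 1, 0], [1, 1, 1, 1]))  # ratio 1.0: resonance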
    246. image p221fig05.31 A system like Fuzzy ARTMAP can learn to associate learned categories in one ART network with learned categories in a second ART network. Because both bottom-up and top-down interactions occur in both networks, a bottom-up input pattern to the first ART network can learn to generate a top-down output pattern from the second ART network.
      || Fuzzy ARTMAP. Match tracking realizes minimax learning principle: vigilance increases to just above the match ratio of prototype / exemplar, thereby triggering search
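      A minimal Python sketch (editor's illustration; epsilon and the baseline value are arbitrary) of the match tracking idea described above: after a predictive error at the map field, vigilance is raised just above the current match ratio, so the active category now fails the vigilance test and a search for a better category is triggered.
          def match_tracking(match_ratio, baseline_vigilance=0.5, epsilon=0.001):
              """Raise vigilance just above the current prototype/exemplar match ratio."""
              return max(baseline_vigilance, match_ratio + epsilon)

          # The active category matched 0.72 of the input but predicted the wrong label,
          # so vigilance jumps from 0.5 to 0.721 and the search begins.
          print(match_tracking(0.72))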
    247. image p226fig05.35 I had shown in 1976 how a competitive learning or self-organizing map model could undergo catastrophic forgetting if the input environment was sufficiently dense and nonstationary, as illustrated by Figure 5.18. Later work with Gail Carpenter showed how, if the ART Matching Rule was shut off, repeating just four input patterns in the correct order could also cause catastrophic forgetting by causing superset recoding, as illustrated in Figure 5.36.
      || Code instability input sequences. D C A; B A; B C = ; |D|<|B|<|C|; where |E| is the number of features in the set E. Any set of input vectors that satisfy the above conditions will lead to unstable coding if they are periodically presented in the order ABCAD and the top-down ART Matching Rule is shut off.
    248. image p226fig05.36 Column (a) shows catastrophic forgetting when the ART Matching Rule is not operative. It is due to superset recoding. Column (b) shows how category learning quickly stabilizes when the ART Matching Rule is restored.
      || Stable and unstable learning, superset recoding
    249. image p228fig05.37 A macrocircuit of the neurotrophic Spectrally Timed ART, or nSTART, model. I developed nSTART with my PhD student Daniel Franklin. It proposes how adaptively timed learning in the hippocampus, bolstered by Brain Derived Neurotrophic Factor, or BDNF, helps to ensure normal memory consolidation.
      || habituative gates, CS, US, Thalamus (sensory cortex, category learning, conditioned reinforcer learning, adaptively timed learning and BDNF), Amygdala (incentive motivation learning), Hippocampus (BDNF), Prefrontal Cortex (attention), Pontine nuclei, Cerebellum (adaptively timed motor learning)
    250. image p230fig05.38 The Synchronous Matching ART, or SMART, model includes spiking neurons in a laminar cortical hierarchy. I developed SMART with my PhD student Massimiliano Versace. By unlumping LAMINART to include spiking neurons, finer details of neurodynamics, such as the existence of faster gamma oscillations during good enough matches, and slower beta oscillations during bad enough mismatches, could be shown as emergent properties of network interactions.
      || Second order thalamus -> specific thalamic nucleus -> Thalamic reticular nucleus -> neocortical laminar circuit [6ll, 6l, 5, 2/3, 1] -> Higher order cortex. Similar for First order thalamus -> First order cortex, with interconnection to Second order, nonspecific thalamic nucleus
    251. image p240fig05.44 When an algebraic exemplar model is realized using only local computations, it starts looking like an ART prototype model.
      || How does the model know which exemplars are in category A? BU-TD learning. How does a NOVEL test item access category A?
    252. image p241fig05.45 The 5-4 category structure is one example of how an ART network learns the same kinds of categories as human learners. See the text for details.
      || 5-4 Category structure. A1-A5: closer to the (1 1 1 1) prototype; B1-B4: closer to the (0 0 0 0) prototype
    253. image p246fig05.48 Microcircuits of the LAMINART model that I developed with Rajeev Raizada. See the text for details of how they integrate bottom-up adaptive filtering, horizontal bipole grouping, and top-down attentional matching that satisfies the ART Matching Rule.
      ||
    254. image p254fig06.03 These interactions of the ARTSCAN Search model enable it to learn to recognize and name invariant object categories. Interactions between spatial attention in the Where cortical stream, via surface-shroud resonances, and object attention in the What cortical stream, that obeys the ART Matching Rule, coordinate these learning, recognition, and naming processes.
      || Retinal image -> On & OFF cell contrast normalization (retina/LGN) -> polarity [, in]sensitive contrast enhancement (V1) -> object [boundary (V2), surface (V2/V4)] -> surface contour (V2); What stream categories: volition control (BG). object boundary (V2) <-> view (ITp) <-> view integrator (ITp) <-> object (ITa) <-> [object-value (ORB), value (Amyg)] <-> name (PFC)
    255. image p315fig08.35 The output signals from the directional grouping network obey the ART Matching Rule. They thereby select consistent motion directional signals while suppressing inconsistent ones, and do not distort the speed estimates that the spared cells code. The aperture problem is hereby solved by the same mechanism that dynamically stabilizes the learning of directional grouping cells.
      || How to select correct direction and preserve speed estimates? Prediction: Feedback from MSTv to MT- obeys ART Matching Rule; Top-down, modulatory on-center, off-surround network (Grossberg 1976, 1980; Carpenter, Grossberg 1987, 1991); Explains how directional grouping network can stably develop and how top-down directional attention can work. (Cavanagh 1992; Goner etal 1986; Sekuler, Ball 1977; Stelmach etal 1994). Directional grouping network (MSTv) <-> Directional long-range filter (MT). Modulatory on-center selects chosen direction and preserves speed. Off-surround inhibits incompatible directions.
    256. image p316fig08.36 How the directional grouping network, notably properties of the ART Matching Rule, enables a small set of amplified feature tracking signals at the ends of a line to select consistent directions in the line interior, while suppressing inconsistent directions.
      || Motion capture by directional grouping feedback. Directional grouping network (MSTv) <-> Directional long-range filter (MT). It takes longer to capture ambiguous motion signals in the line interior as the length of the line increases cf (Castet etal 1993)
    257. image p354fig10.01 The laminar cortical circuit that realizes how we pay attention to an object sends signals from layer 6 of a higher cortical level to layer 6 of a lower cortical level and then back up to layer 4. This "folded feedback" circuit realizes a top-down, modulatory on-center, off-surround circuit that realizes the ART Matching Rule.
      || Top-down attention and folded feedback. Attentional signals also feed back into 6-to-4 on-center off-surround. 1-to-5-to-6 feedback path: Macaque (Lund, Booth 1975) cat (Gilbert, Wiesel 1979). V2-to-V1 feedback is on-center off-surround and affects layer 6 of V1 the most (Bullier etal 1996; Sandell, Schiller 1982). Attended stimuli enhanced, ignored stimuli suppressed. This circuit supports the predicted ART Matching Rule! [LGN, V[1,2][6->1]]
    258. image p360fig10.08 The bottom-up on-center off-surround from LGN-to-6-to-4 has a modulatory on-center because of its role in realizing the ART Matching Rule and, with it, the ability of the cortex to dynamically stabilize its learned memories.
      || Modulation of priming by 6-to-4 on-center (Stratford etal 1996; Callaway 1998). On-center 6-to-4 excitation is inhibited down to being modulatory (priming, subthreshold). On-center 6-to-4 excitation cannot activate layer 4 on its own. Clarifies need for direct path. Prediction: plays key role in stable grouping, development and learning. ART Matching Rule!
    259. image p419fig12.17 The auditory continuity illusion illustrates the ART Matching Rule at the level of auditory streaming. Its "backwards in time" effect of future context on past conscious perception is a signature of resonance.
      || Auditory continuity illusion. input, percept. Backwards in time - How does a future sound let past sound continue through noise? Resonance! - It takes a while to kick in. After it starts, a future tone can maintain it much more quickly. Why does this not happen if there is no noise? - ART Matching Rule! TD harmonic filter is modulatory without BU input. It cannot create something out of nothing.
    260. image p420fig12.18 The ARTSTREAM model explains and simulates the auditory continuity illusion as an example of a spectral-pitch resonance. Interactions of ART Matching Rule and asymmetric competition mechanisms in cortical strip maps explain how the tone selects the consistent frequency from the noise in its own stream while separating the rest of the noise into another stream.
      || ARTSTREAM model (Grossberg 1999; Grossberg, Govindarajan, Wyse, Cohen 2004). SPINET. Frequency and pitch strips. Bottom Up (BU) harmonic sieve. Top Down (TD) harmonic ART matching. Exclusive allocation. Learn pitch categories based on early harmonic processing. A stream is a Spectral-Pitch Resonance!
    261. image p473fig12.69 Error rate and mean reaction time (RT) data from the lexical decision experiments of (Schvaneveldt, McDonald 1981). ART Matching Rule properties explain these data in (Grossberg, Stone 1986).
      || (left) Error rate vs type of prime [R, N, U], [non,] word. (right) Mean RT (msec) vs type of prime [R, N, U], [non,] word.
    262. image p483fig13.02 The object-value categories in the orbitofrontal cortex require converging specific inputs from the sensory cortex and nonspecific incentive motivational inputs from the amygdala in order to fire. When the orbitofrontal cortex fires, it can deliver top-down ART Matching Rule priming signals to the sensory cortical area by which it was activated, thereby helping to choose the active recognition categories there that have the most emotional support, while suppressing others, leading to attentional blocking of irrelevant cues.
      || Cognitive-Emotional-Motor (CogEM) model. Drive-> amygdala incentive motivational learning-> orbitofrontal cortex- need converging cue and incentive inputs to fire <-> sensory cortex- conditioned reinforcer learning-> amygdala. CS-> sensory cortex. Motivated attention closes the cognitive-emotional feedback loop, focuses on relevant cues, and causes blocking of irrelevant cues.
    263. image p548fig15.16 Homologous recognition learning and reinforcement learning macrocircuits enable adaptively timed conditioning in the reinforcement learning circuit to increase inhibition of the orienting system at times when a mismatch in the recognition system would have reduced inhibition of it.
      || Homolog between ART and CogEM model, complementary systems. [Recognition, Reinforcement] learning vs [Attentional, Orienting] system. Reinforcement: timing, drive representation.
    264. image p600fig16.36 The entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories. See the text for details.
      || Entorhinal-hippocampal interactions as an ART system. Hippocampal place cells as spatial categories. Angular head velocity-> head direction cells-> stripe cells- small scale 1D periodic code (ECIII) SOM-> grid cells- small scale 2D periodic code (ECII) SOM-> place cells- larger scale spatial map (DG/CA3)-> place cells (CA1)-> conjunctive-coding cells (EC V/VI)-> top-down feedback back to stripe cells- small scale 1D periodic code (ECIII). stripe cells- small scale 1D periodic code (ECIII)-> place cells (CA1).
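      A minimal Python sketch (editor's illustration; the dot-product choice rule and learning rate are assumptions) of one self-organizing map (SOM) step of the kind used repeatedly in the hierarchy above (stripe cells to grid cells to place cells): an adaptive filter plus winner-take-all competition chooses a category, and only the winner's weights move toward the current input (instar learning).
          import numpy as np

          def som_step(weights, x, lr=0.05):
              """One winner-take-all SOM step; returns (winner index, updated weights)."""
              x = np.asarray(x, dtype=float)
              winner = int(np.argmax(weights @ x))       # adaptive filter + competition
              weights = weights.copy()
              weights[winner] += lr * (x - weights[winner])
              return winner, weights

          rng = np.random.default_rng(0)
          W = rng.random((4, 3))                         # 4 categories, 3 input features
          winner, W = som_step(W, [1.0, 0.0, 0.0])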
    265. image p613fig16.43 The main visual form and motion processing stream mechanisms of SOVEREIGN, many of them described at length in previous chapters.
      || Render 3-D scene (R3DS), figure-ground separation (FGS), log-polar transform (LPT), Gaussian coarse-coding (GCC), Invariant visual target map (IVTM), What Fuzzy ART (WhatFuzz), body spatial coordinates (BSC), where reactive visual TPV storage (WRVTS), Directional transient cell network (DTCN), Motion direction hemifield map (MDHM), Hemifield left/right scoring (HLRS), reactive visual control signal (RVCS), Parvo/Magno/Erg competition (PMEC), Approach and Orient GOp (AOGp), GOm (GOm). R3DS [parvo-> FGS, magno-> DTCN], FGS-> [LPT, WRVTS], LPT-> GCC-> IVTM-> WhatFuzz, BSC-> [RVTS, PMEC], PMEC-> [gateRVTS-> RVTS, gateRVCS-> RVCS], DTCN-> MDHM-> HLRS, HLRS-> [PMEC, RVCS], AOGp-> gateRVTS, GOm-> gateRVCS.
    266. p190 Howell: [neural microcircuits, modal architectures] used in ART -
      bottom-up filters | top-down expectations | purpose
      instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star learning is often used to learn the adaptive weights. Top-down expectations select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features
      LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
      EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
      auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
      auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
      visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown. (red - cognitive-emotional dynamics; green - working memory dynamics; black - see [bottom-up, top-down] lists)
      EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
      Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap, between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

      background colours in the table signify :
      white | general microcircuit : a possible component of ART architecture
      lime green | sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
      light blue | post-perceptual cognition?
      pink | "the feeling of what happens" and knowing what event caused that feeling
    267. image p163fig04.39 A schematic of the LAMINART model that explains key aspects of laminar visual cortical anatomy and dynamics. LGN -> V1 [6, 4, 2/3] -> V2 [6, 4, 2/3]
      || p163c1h0.6 "... The first article about laminar computing ... proposed how the laminar cortical model could process 2D pictures using bottom-up filtering and horizontal bipole grouping interactions (Grossberg, Mingolla, Ross 1997). In 1999, I was able to extend the model to also include top-down circuits for expectation and attention (Grossberg 1999)(right panel). Such a synthesis of laminar bottom-up, horizontal, and top-down circuits is characteristic of the cerebral cortex (left panel). I called it LAMINART because it began to show how properties of Adaptive Resonance Theory, or ART, notably the ART prediction about how top-down expectations and attention work, are realized by identical cortical cells and circuits. You can immediately see from the schematic laminar circuit diagram ... (right panel) that circuits in V2 seem to repeat circuits in V1, albeit with a larger spatial scale, despite the fact that V1 and V2 carry out different functions. How this anatomical similarity can coexist with functional diversity will be clarified in subsequent sections and chapters. It enables different kinds of biological intelligence to communicate seamlessly while carrying out their different psychological functions. ..."
    268. image p174fig04.52 An example of how the 3D LAMINART model can transform the two monocular images of the random dot stereogram in the top row into the three depth-separated surface representations in the bottom row.
      || Stereogram surface percepts: surface lightnesses are segregated in depth (Fang and Grossberg 2009). [left, right] inputs, [far, fixation, near] planes. Contrast with algorithms that just compute disparity matches and let computer code build the surface, eg (Marr, Poggio, etal 1974).
    269. image p182fig04.58 LAMINART model processing stages that are sufficient to explain many percepts of transparency, including those summarized in Figure 4.57.
      || [left, right] eye, [LGN, V1 [6, 4, 3B, 2/3 A], V2 [4, 2/3]], [mo, bi]nocular cart [simple, complex] cells, [excita, inhibi]tory cart [connection, cell]s.
    270. image p230fig05.38 The Synchronous Matching ART, or SMART, model includes spiking neurons in a laminar cortical hierarchy. I developed SMART with my PhD student Massimiliano Versace. By unlumping LAMINART to include spiking neurons, finer details of neurodynamics, such as the existence of faster gamma oscillations during good enough matches, and slower beta oscillations during bad enough mismatches, could be shown as emergent properties of network interactions.
      || Second order thalamus -> specific thalamic nucleus -> Thalamic reticular nucleus -> neocortical laminar circuit [6ll, 6l, 5, 2/3, 1] -> Higher order cortex. Similar for First order thalamus -> First order cortex, with interconnection to Second order, nonspecific thalamic nucleus
    271. image p231fig05.39 The SMART hypothesis testing and learning cycle predicts that vigilance increases when a mismatch in subcortical regions like the nonspecific thalamus activates the nucleus basalis of Meynert which, in turn, broadcasts a burst of the neurotransmitter acetylcholine, or ACh, to deeper cortical layers. Due to the way in which LAMINART proposes that cortical matching and mismatching occurs, this ACh burst can increase vigilance and thereby trigger a memory search. See the text for details.
      || [BU input, [, non]specific thalamic nucleus, thalamic reticular nucleus, neocortical laminar circuit] cart [Arousal, Reset, Search, Vigilance]
    272. image p246fig05.48 Microcircuits of the LAMINART model that I developed with Rajeev Raizada. See the text for details of how they integrate bottom-up adaptive filtering, horizontal bipole grouping, and top-down attentional matching that satisfies the ART Matching Rule.
      ||
    273. image p248fig05.49 This circuit of the LAMINART model helps to explain properties of Up and Down states during slow wave sleep, and how disturbances in ACh dynamics can disrupt them.
      ||
    274. image p375fig11.07 The 3D LAMINART model uses both monocular and binocular simple cells to binocularly fuse like image contrasts. The remainder of the model generates 3D boundary and surface representations of multiple kinds of experiments as well as of natural scenes.
      || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A], V2 thin stripe [4->2/3A], V4]. V1 blob [V1-4 monocular, V1 interior binocular] simple cells. [complex, simple, inhibitory] cells, on-center off-surround
    275. image p376fig11.10 The 3D LAMINART model shows how the disparity filter can be integrated into the circuit that completes 3D boundary representations using bipole grouping cells. It also explains how surface contours can strengthen boundaries that succeed in generating closed filling-in domains.
      || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A] surface contour, V2 thin stripe (monocular surface) [4->2/3A], V2 interior [disynaptic inhibitory interneurons, bipole grouping cells, disparity filter, V4 binocular surface]. [complex, simple, inhibitory] cells, on-center off-surround
    276. image p378fig11.12 How monocular and binocular information are combined in V1 and V2 in the 3D LAMINART model.
      || Model utilizes monocular information. [left, right] eye cart V1-[4 monocular simple, 3B binocular simple, complex2/3A [mo,bi]nocular] cells, V2-4 binocular complex cells. black = monocular cells, blue = binocular cells. In V2, monocular inputs add to binocular inputs along the line of sight and contribute to depth perception.
    277. image p379fig11.13 How the 3D LAMINART model explains DaVinci stereopsis. All the stages of boundary and surface formation are color coded to clarify their explanation. Although each mechanism is very simple, when all of them act together, the correct depthful surface representation is generated. See the text for details.
      || DaVinci stereopsis (Nakayama, Shimojo 1990). An emergent property of the previous simple mechanisms working together. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] pair [Binocular match: boundaries of thick bar -> Add monocular boundaries along lines-of-sight -> Line-of-sight inhibition kills weaker vertical boundaries -> 3D surface percept not just a disparity match!] pair [Binocular match: right edge of thin and thick bars -> Strongest boundaries: binocular and monocular boundaries add -> Vertical boundaries from monocular left edge of thin bar survive -> Filling-in contained by connected boundaries]. cart [very near, near, fixation plane, far, very far]
    278. image p380fig11.14 The model explanation of DaVinci stereopsis when the input stimuli have opposite contrast polarities.
      || Polarity-reversed Da Vinci stereopsis (Nakayama, Shimojo 1990). Same explanation! (... as Figure 11.13 ...) [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
    279. image p395fig11.34 A comparison of the properties of other rivalry models with those of the 3D LAMINART model (surrounded by red border). Significantly, only 3D LAMINART explains both stable vision and rivalry (green border).
      || Comparison of rivalry models
    280. image p401fig11.41 The 3D LAMINART model proposes how angle cells and disparity-gradient cells interact through learning to generate 3D representations of slanted objects.
      || 3D LAMINART model. [LGN, V1, V2, V4] Four key additions: 1. Angle cells - tuned to various angles; 2. Disparity-gradient cells - tuned to disparity gradients in the image; 3. weights from [angle to disparity-gradient] cells - learned while viewing 3D image; Colinear grouping between [angle to disparity-gradient] cells - disambiguates ambiguous groupings.
    281. image p402fig11.44 3D scenic reconstruction of the image in Figure 11.43 by the 3D LAMINART model.
      || Disparity [5, 6, 8, 10, 11, 14]: images of objects in common depth planes
    282. image p445fig12.43 The LIST PARSE laminar cortical Cognitive Working Memory circuit, that is proposed to occur in ventrolateral prefrontal cortex, is homologous to the LAMINART circuit that models aspects of how visual cortex sees. The Motor Working Memory, VITE Trajectory Generator, and Variable-Rate Volitional Control circuits model how other brain regions, including dorsolateral prefrontal cortex, motor cortex, cerebellum, and basal ganglia, interact with the Cognitive Working Memory to control working memory storage and variable-rate performance of item sequences.
      || List parse circuit diagram. Connectivity convention. sequence chunks [<- BU filter, TD expectation ->] working memory. Working memory and sequence chunking circuit is homologous to visual LAMINART circuit!
    283. image p498fig13.22 (left column, top row) Adaptive filtering and conditioned arousal are both needed to regulate what cues can learn to activate particular space-time patterns. These developments lead inexorably to basic cognitive abilities, as embodied in the 3D LAMINART models for 3D vision and figure-ground perception (Chapter 11) and the 3D ARTSCAN SEARCH model for invariant object learning, recognition, and 3D search (Chapter 6). (right column, top row) Conditioned arousal enables only emotionally important cues to activate a motivationally relevant space-time pattern. (bottom row) Conditioned arousal and drive representations arise naturally from the unlumping of avalanche circuits to make them selective to motivationally important cues. The MOTIVATOR model is a natural outcome of this unlumping process (this chapter).
      || (top) Adaptive filtering and Conditioned arousal. Towards Cognition: need to filter inputs to the command cell. Towards Emotion: important signals turn arousal ON and OFF. (bottom) Conditioned arousal and Drive representations. Competition between conditioned arousal sources at drive representations, eg amygdala.
    284. image p228fig05.37 A macrocircuit of the neurotrophic Spectrally Timed ART, or nSTART, model. I developed nSTART with my PhD student Daniel Franklin. It proposes how adaptively timed learning in the hippocampus, bolstered by Brain Derived Neurotrophic Factor, or BDNF, helps to ensure normal memory consolidation.
      || habituative gates, CS, US, Thalamus (sensory cortex, category learning, conditioned reinforcer learning, adaptively timed learning and BDNF), Amygdala (incentive motivation learning), Hippocampus (BDNF), Prefrontal Cortex (attention), Pontine nuclei, Cerebellum (adaptively timed motor learning)
    285. image p541fig15.02 The neurotrophic Spectrally Timed Adaptive Resonance Theory, or nSTART, model of (Franklin, Grossberg 2017) includes hippocampus to enable adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between CS and US.
      || Hippocampus can sustain a Cognitive-Emotional resonance that can support "the feeling of what happens" and knowing what event caused that feeling. [CS, US] -> Sensory Cortex (SC) <- motivational attention <-> category learning -> Prefrontal Cortex (PFC). SC conditioned reinforcement learning-> Amygdala (cannot bridge the temporal gap) incentive motivational learning-> PFC. SC adaptively timed learning and BDNF-> Hippocampus (can bridge the temporal gap) BDNF-> PFC. PFC adaptively timed motor learning-> cerebellum.
    286. p190 Howell: [neural microcircuits, modal architectures] used in ART -
      bottom-up filters | top-down expectations | purpose
      instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star learning is often used to learn the adaptive weights. Top-down expectations select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features
      LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
      EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
      auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
      auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
      visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown. (red - cognitive-emotional dynamics; green - working memory dynamics; black - see [bottom-up, top-down] lists)
      EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
      Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap, between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

      background colours in the table signify :
      white | general microcircuit : a possible component of ART architecture
      lime green | sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
      light blue | post-perceptual cognition?
      pink | "the feeling of what happens" and knowing what event caused that feeling
    287. image p221fig05.31 A system like Fuzzy ARTMAP can learn to associate learned categories in one ART network with learned categories in a second ART network. Because both bottom-up and top-down interactions occur in both networks, a bottom-up input pattern to the first ART network can learn to generate a top-down output pattern from the second ART network.
      || Fuzzy ARTMAP. Match tracking realizes minimax learning principle: vigilance increases to just above the match ratio of prototype / exemplar, thereby triggering search
    288. image p225fig05.33 Some early ARTMAP benchmark studies. These successes led to the use of ARTMAP, and many variants that we and other groups have developed, in many large-scale applications in engineering and technology, a use that has not abated even today.
      || see Early ARTMAP benchmark studies
    289. image p225fig05.34 ARTMAP was successfully used to learn maps of natural terrains with many advantages over those of mapping projects that used AI expert systems. The advantages are so great that many mapping projects started to use this technology.
      || AI expert system - 1 year: field identification of natural regions; derivation of ad hoc rules for each region by expert geographers; correct 80,000 of 250,000 site labels; 230m (site-level) scale. ARTMAP system - 1 day: rapid, automatic, no natural regions or rules; confidence map; 30m (pixel-level) scale can see roads; equal accuracy at test sites
    290. image p242fig05.46 Computer simulations of how two variants of Distributed ARTMAP incrementally learn the 5-4 category structure. See the text for details.
      || Distributed ARTMAP with [self-supervised learning, post-training LTM noise]
    291. image p436fig12.30 The conscious ARTWORD, or cARTWORD, laminar cortical speech model simulates how future context can disambiguate noisy past speech sounds in such a way that the completed percept is consciously heard to proceed from past to future as a feature-item-list resonant wave propagates through time.
      || cARTWORD: Laminar cortical model macrocircuit (Grossberg, Kazerounian 2011) Simulates PHONEMIC RESTORATION: Cognitive Working Memory (processed item sequences) - [Excitatory-> inhibitory-> habituative-> adaptive filter-> adaptive filter-> adaptive filter with depletable synapse-> Acoustic [item, feature]
    292. image p453fig12.49 The ARTWORD model that I published in 2000 with my PhD student Christopher Myers simulates data such as the (Repp etal 1978) data in Figure 12.68. See the text for details.
      || ARTWORD model (Grossberg, Myers 2000). Input phonetic features-> Phonemic item working memory-> Masking Field unitized lists-> Automatic gain control-> Phonemic item working memory. [habituative gate, adaptive filter]s.
    293. image p453fig12.50 The ARTWORD perception cycle shows how sequences of items activate possible list chunks, which compete among each other and begin to send their top-down expectations back to the item working memory. An item-list resonance develops through time as a result.
      || ARTWORD perception cycle. (a) bottom-up activation (b) list chunk competition (c) item-list resonance (d) chunk reset due to habituative collapse.
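      A minimal Python sketch (editor's illustration; the rate constants are arbitrary) of the kind of habituative transmitter gate invoked in panel (d): the transmitter recovers slowly toward 1 and is inactivated in proportion to its use, so the gated signal collapses under sustained resonance, which is one way the active chunk can be reset.
          def habituative_gate(signal, z=1.0, recovery=0.01, depletion=0.5, dt=0.1, steps=200):
              """dz/dt = recovery*(1 - z) - depletion*signal*z; returns the gated output signal*z over time."""
              trace = []
              for _ in range(steps):
                  z += dt * (recovery * (1.0 - z) - depletion * signal * z)
                  trace.append(signal * z)
              return trace

          out = habituative_gate(signal=1.0)
          print(round(out[0], 3), round(out[-1], 3))   # gated output starts near 1, then habituates toward ~0.02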
    294. image p455fig12.52 Simulation of cARTWORD dynamics in response to the complete list /1/-/2/-/3/. The relevant responses are surrounded by a red box.
      || Presentation of a normal sequence: input /1/-/2/-/3/. |c(i,1)-5| vs time (msec). List chunks select most predictive code. Order stored in WM layers. Resonant activity of /1/-/2/-/3/ in item and feature layers corresponds to conscious speech percept.
    295. image p456fig12.53 Simulation of cARTWORD dynamics in response to the partial list /1/-silence-/3/ with /2/ replaced by silence. Only the representations of these items can be seen in the red box.
      || Presentation with silence duration: input /1/-silence-/3/. |c(i,1)-5| vs time (msec). List chunks select most predictive code. Order stored in WM layers. Gap in resonant activity of /1/-silence-/3/ in item and feature layers corresponds to perceived silence.
    296. image p456fig12.54 Item /2/ is restored in the correct list position in response to the list /1/-noise-/3/.
      || Presentation with noise: input /1/-noise-/3/. |c(i,1)-5| vs time (msec). List chunks select the most predictive code. Order restored in WM layers. Resonant activity of /1/-/2/-/3/ in item and feature layers corresponds to restoration of item /2/ replaced by noise in input.
    297. image p457fig12.55 Item /4/ is restored in the correct list position in response to the list /1/-noise-/5/. This and the previous figure show how future context can disambiguate past noisy sequences that are otherwise identical.
      || Presentation with noise: input /1/-noise-/5/. |c(i,1)-5| vs time (msec). List chunks select the most predictive code. Order restored in WM layers. Resonant activity of /1/-/4/-/5/ in item and feature layers corresponds to restoration of item /4/ replaced by noise in input.
    298. image p444fig12.42 The LIST PARSE laminar cortical model of working memory and list chunking that I published with Lance Pearson in 2008 simulated the Averbeck etal data in Figure 12.41, as in the left column of the figure. It also simulated cognitive data about working memory storage by human subjects. See the text for details.
      || LIST PARSE: Laminar cortical model of working memory and list chunking (Grossberg, Pearson 2008). Simulates data about: [immediate, delayed, continuous] distractor free recall; immediate serial recall; and variable-speed sequential performance of motor acts. [velocity, acceleration] vs time (ms) from recall cue.
    299. image p445fig12.43 The LIST PARSE laminar cortical Cognitive Working Memory circuit, that is proposed to occur in ventrolateral prefrontal cortex, is homologous to the LAMINART circuit that models aspects of how visual cortex sees. The Motor Working Memory, VITE Trajectory Generator, and Variable-Rate Volitional Control circuits model how other brain regions, including dorsolateral prefrontal cortex, motor cortex, cerebellum, and basal ganglia, interact with the Cognitive Working Memory to control working memory storage and variable-rate performance of item sequences.
      || List parse circuit diagram. Connectivity convention. sequence chunks [<- BU filter, TD expectation ->] working memory. Working memory and sequence chunking circuit is homologous to visual LAMINART circuit!
    300. image p446fig12.44 (left column, top row) LIST PARSE can model linguistic data from human subjects. In this figure, model parameters are fixed to enable a close fit to data about error-type distributions in immediate free recall experiments, notably transposition errors. (right column, top row) Simulation and data showing bowing of the serial position curve, including an extended primacy gradient. (left column, bottom row) The simulation curve overlays data about list length effects, notably the increasing recall difficulty of longer lists during immediate serial recall (ISR). (right column, bottom row) Simulation (bottom image) and data (top image) of the limited temporal extent for recall.
      || (1. TL) Error-type distributions in immediate serial recall (Henson etal 1996). % occurrence vs serial position. Graph convention: Data- dashed lines; Simulations- solid lines. Six letter visual ISR. Order errors- transpositions of neighboring items are the most common. Model explanation: Noisy activation levels change relative order in primacy gradient. Similar activation of neighboring items most susceptible to noise. Model parameters fitted on these data. (2. TR) Bowing of serial position curve (Cowan etal 1999). % correct vs serial position. Auditory ISR with various list lengths (graphs shifted rightward): For [, sub-]span lists- extended primacy, with one (or two) item recency; Auditory presentation- enhanced performance for last items. LIST PARSE: End effects- first and last items half as many members; Echoic memory- last presented item retained in separate store. (3. BL) List length effects, circles (Crannell, Parrish 1968), squares (Baddeley, Hitch 1975), solid line- simulation. % list correct vs list length. Variable list length ISR: longer lists are more difficult to recall. LIST PARSE: More items- closer activation levels and lower absolute activity level with enough inputs; Noise is more likely to produce order errors, Activity levels more likely to drop below threshold. (4. BR) Limited temporal extent for recall (Murdock 1961). % recalled vs retention interval (s). ISR task with distractor-filled retention intervals (to prevent rehearsal): Increasing retention interval - decreases probability of recalling list correctly; Load dependence- longer lists more affected by delays; Performance plateau- subjects reach apparent asymptote. LIST PARSE: Increase convergence of activities with time; loss of order information.
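      A minimal Python sketch (editor's illustration; the gradient step and noise level are arbitrary) of the storage-and-recall principle used in these fits: items are stored as a primacy gradient of activities (earlier items more active), recall repeatedly reads out and suppresses the most active item, and activation noise makes neighboring items, whose activities are closest, the most likely to transpose.
          import numpy as np

          def recall_from_primacy_gradient(n_items, step=0.1, noise_sd=0.03, rng=None):
              """Store a noisy primacy gradient, then recall by repeated choice-and-suppression."""
              rng = rng or np.random.default_rng()
              activity = 1.0 - step * np.arange(n_items) + rng.normal(0.0, noise_sd, n_items)
              order = []
              for _ in range(n_items):
                  j = int(np.argmax(activity))           # most active item is recalled next
                  order.append(j)
                  activity[j] = -np.inf                  # readout suppresses the recalled item
              return order

          print(recall_from_primacy_gradient(6))  # usually [0, 1, 2, 3, 4, 5]; noise occasionally transposes neighbors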
    301. image p447fig12.45 (left column) LIST PARSE simulations of the proportion of order errors as a function of serial position for 6 item lists with (a) an extended pause of 7 time units between the third and fourth items, and (b) pauses of 5 time units (solid curve) and 10 time units (dashed curve) between all items. (right column) Simulations (solid curves) and data (dashed curves) illustrating close model fits in various immediate free recall tasks.
      || (Left) Temporal grouping and presentation variability. Temporal grouping: Inserting an extended pause leads to inter-group bowing; Significantly different times of integration and activity levels across pause, fewer interchanges. (Right) Immediate free recall, and [delayed, continuous] distractor-free recall. Overt rehearsal IFR task with super-span (ie 20 item) lists: Extended recency- even more extended with shorter ISIs; Increased probability of recall with diminished time from last rehearsal; Early items in list rehearsed most. LIST PARSE (unique) for long lists: Incoming items form recency gradient; Rehearsal (re-presentation) based upon level of activity.
    302. image p254fig06.03 These interactions of the ARTSCAN Search model enable it to learn to recognize and name invariant object categories. Interactions between spatial attention in the Where cortical stream, via surface-shroud resonances, and object attention in the What cortical stream, that obeys the ART Matching Rule, coordinate these learning, recognition, and naming processes.
      || Retinal image -> On & OFF cell contrast normalization (retina/LGN) -> polarity [, in]sensitive contrast enhancement (V1) -> object [boundary (V2), surface (V2/V4)] -> surface contour (V2); What stream categories: volition control (BG). object boundary (V2) <-> view (ITp) <-> view integrator (ITp) <-> object (ITa) <-> [object-value (ORB), value (Amyg)] <-> name (PFC)
    303. image p255fig06.04 The ARTSCAN Search model can also search for a desired target object in a scene, thereby clarifying how our brains solve the Where's Waldo problem.
      || similar illustration to Figure 06.03, with some changes to arrows
    304. image p259fig06.08 The distributed ARTSCAN, or dARTSCAN, model includes spatial attention in both PPC and PFC, and both fast-acting attention, triggered by transient cells in Where cortical areas such as MT, and slower-acting surface-shroud resonances in What cortical areas such as V4 and PPC. See the text for details.
      || dARTSCAN spatial attention hierarchy, Fast (Where stream) Slow (What stream) (Foley, Grossberg, and Mingolla 2012). [transient cells (MT) ->, object surfaces (V4) <->] [object shrouds (PPC) <-> spatial shrouds (PPC/PFC)]
    305. image p271fig06.17 Persistent activity in IT cells is just what is needed to enable view-invariant object category learning by ARTSCAN to be generalized to [view, position, size]-invariant category learning by positional ARTSCAN, or pARTSCAN. See the text for details.
      || Persistent activity in IT. Physiological data show that persistent activity exists in IT (Fuster and Jervey 1981, Miyashita and Chang 1988, Tomita etal 1999). Adapted from (Tomita etal 1999 Nature)
    306. image p272fig06.18 The pARTSCAN model can learn [view, position, size]-invariant categories by adding view category integrator cells that have the properties of persistent neurons in IT. These integrator cells get reset with the invariant object category, not the view category.
      || pARTSCAN: positionally-invariant object learning. (Cao, Grossberg, Markowitz 2011). IT cells with persistent activities are modeled by view category integrators in ITp. View-specific category cells are RESET as the eyes move within the object. View category integrator cells are NOT RESET when the view-specific category is reset. They are RESET along with invariant object category cells when a spatial attention shift occurs.
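      A minimal Python sketch (editor's illustration; the event names and variables are not the model's actual ones) of the reset rules stated above: view-specific categories are reset by eye movements within the object, while the view category integrator and the invariant object category are reset only when spatial attention shifts to another object.
          def partscan_reset(event, view_category, view_integrator, object_category):
              """Return the activities that survive the given reset event."""
              if event == "eye_movement_within_object":
                  view_category = 0.0                    # view-specific category resets
              elif event == "spatial_attention_shift":
                  view_category = 0.0
                  view_integrator = 0.0                  # integrator resets together with the
                  object_category = 0.0                  # invariant object category
              return view_category, view_integrator, object_category

          print(partscan_reset("eye_movement_within_object", 0.8, 0.6, 0.9))  # (0.0, 0.6, 0.9)
          print(partscan_reset("spatial_attention_shift", 0.8, 0.6, 0.9))     # (0.0, 0.0, 0.0)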
    307. image p273fig06.20 pARTSCAN can simulate the IT cell recoding that Li and DiCarlo reported in their swapping experiments because the swapping procedure happens without causing a parietal reset burst to occur. Thus the originally activated invariant category remains activated and can get associated with the swapped object features.
      || Simulation of Li and DiCarlo swapping data. data (Li and DiCarlo 2008), model (Cao, Grossberg, Markowitz 2011). normalized response vs. exposure (swaps and/or hours)
    308. image p274fig06.21 pARTSCAN can also simulate the trade-off in IT cell responses between position invariance and selectivity that was reported by Zoccolan etal 2007. This trade-off limits the amount of position invariance that can be learned by a cortical area like V1 that is constrained by the cortical magnification factor.
      || Trade-off in IT cell response properties. Inferotemporal cortex cells with greater position invariance respond less selectively to natural objects. invariance-tolerance, selectivity-sparseness. data (Zoccolan etal 2007) model (Grossberg, Markowitz, Cao 2011). position tolerance (PT, degrees) vs sparseness (S)
    309. image p274fig06.22 pARTSCAN can simulate how IT cortex processes image morphs, when it learns with high vigilance. See the text for details.
      || Akrami etal simulation: a case of high vigilance. tested on morphs between image pairs
    310. image p498fig13.22 (left column, top row) Adaptive filtering and conditioned arousal are both needed to regulate what cues can learn to activate particular space-time patterns. These developments lead inexorably to basic cognitive abilities, as embodied in the 3D LAMINART models for 3D vision and figure-ground perception (Chapter 11) and the 3D ARTSCAN SEARCH model for invariant object learning, recognition, and 3D search (Chapter 6). (right column, top row) Conditioned arousal enables only emotionally important cues to activate a motivationally relevant space-time pattern. (bottom row) Conditioned arousal and drive representations arise naturally from the unlumping of avalanche circuits to make them selective to motivationally important cues. The MOTIVATOR model is a natural outcome of this unlumping process (this chapter).
      || (top) Adaptive filtering and Conditioned arousal. Towards Cognition: need to filter inputs to the command cell. Towards Emotion: important signals turn arousal ON and OFF. (bottom) Conditioned arousal and Drive representations. Competition between conditioned arousal sources at drive representations, eg amygdala.
    311. image p531fig14.06 Classification of scenic properties as texture categories by the ARTSCENE model. See the text for details.
      || Image-> Feature extraction (texture principal component rankings)-> Learning feature-to-scene mapping (texture category principal component rankings)<- scene class. Large-to-small attentional shrouds as principal component rank gets higher.
    312. image p531fig14.07 Voting in the ARTSCENE model achieves even better prediction of scene type. See the text for details.
      || Image-> Feature extraction (texture principal component rankings)-> Learning feature-to-scene mapping (texture category principal component rankings)-> evidence accumulation (sum)-> scene class winner-take-all inference. Large-to-small attentional shrouds as principal component rank gets higher.
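      A minimal Python sketch (editor's illustration; the class labels and scores are made up) of the voting stage described above: the scene-class evidence contributed by each attended view is summed, and the scene class is then chosen by winner-take-all inference over the accumulated evidence.
          import numpy as np

          def artscene_vote(per_view_scores):
              """Sum per-view class scores, then pick the winner-take-all scene class."""
              evidence = np.sum(np.asarray(per_view_scores, dtype=float), axis=0)
              return int(np.argmax(evidence)), evidence

          views = [[0.6, 0.3, 0.1],     # scores over e.g. [beach, forest, street] from three views
                   [0.2, 0.5, 0.3],
                   [0.5, 0.1, 0.4]]
          print(artscene_vote(views))   # class 0 wins with summed evidence [1.3, 0.9, 0.8]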
    313. image p532fig14.08 Macrocircuit of the ARTSCENE Search neural model for learning to search for desired objects by using the sequences of already experienced objects and their locations to predict what and where the desired object is. V1 = First visual area or primary visual cortex; V2 = Second visual area; V4 = Fourth visual area; PPC = Posterior Parietal Cortex; ITp = posterior InferoTemporal cortex; ITa = anterior InferoTemporal cortex; MTL = Medial Temporal Lobe; PHC = ParaHippoCampal cortex; PRC = PeriRhinal Cortex; PFC = PreFrontal Cortex; DLPFC = DorsoLateral PreFrontal Cortex; VPFC = Ventral PFC; SC = Superior Colliculus.
      ||
    314. image p533fig14.09 Search data and ARTSCENE Search simulations of them in each pair of images from (A) to (F). See the text for details.
      || 6*[data vs simulation], [Response time (ms) versus epoch].
    315. image p214fig05.26 When a big enough mismatch occurs, the orienting system is activated and sends a burst of nonspecific arousal to the category level. This Mismatch Detector has properties of the N200 ERP.
      || Mismatch triggers nonspecific arousal. Mismatch at F1 elicits a nonspecific event at F2. Call this event nonspecific arousal. N200 ERP Naatanen etal: 1. bottom-up, 2. unconditionable, 3. nonspecific, 4. mismatch
    316. image p230fig05.38 The Synchronous Matching ART, or SMART, model includes spiking neurons in a laminar cortical hierarchy. I developed SMART with my PhD student Massimiliano Versace. By unlumping LAMINART to include spiking neurons, finer details of neurodynamics, such as the existence of faster gamma oscillations during good enough matches, and slower beta oscillations during bad enough mismatches, could be shown as emergent properties of network interactions.
      || Second order thalamus -> specific thalamic nucleus -> thalamic reticular nucleus -> neocortical laminar circuit [6ll, 6l, 5, 2/3, 1] -> Higher order cortex. Similar for First order thalamus -> First order cortex, with interconnection to Second order, nonspecific thalamic nucleus
    317. image p231fig05.39 The SMART hypothesis testing and learning cycle predicts that vigilance increases when a mismatch in subcortical regions like the nonspecific thalamus activates the nucleus basalis of Meynert which, in turn, broadcasts a burst of the neurotransmitter acetylcholine, or ACh, to deeper cortical layers. Due to the way in which LAMINART proposes that cortical matching and mismatching occurs, this ACh burst can increase vigilance and thereby trigger a memory search. See the text for details.
      || [BU input, [, non]specific thalamic nucleus, thalamic reticular nucleus, neocortical laminar circuit] cart [Arousal, Reset, Search, Vigilance]
    318. image p232fig05.40 Computer simulation of how the SMART model generates (a) gamma oscillations if a good enough match occurs, or (c) beta oscillations if a bad enough match occurs. See the text for details.
      || Brain oscillations during match/mismatch, data, simulation. (a) TD corticothalamic feedback increases synchrony (Sillito etal 1994) (b) Match increases γ oscillations (c) Mismatch increases θ,β oscillations
    319. image p232fig05.41 (a)-(c). The sequence of interlaminar events that SMART predicts during a mismatch reset. (d) Some of the compatible neurophysiological data.
      || Mismatch causes layer 5 dendritic spikes that trigger reset. (a) Arousal causes increase in nonspecific thalamic nuclei firing rate and layer 5 dendritic and later somatic spikes (Larkum and Zhu 2002, Williams and Stuart 1999) (b) Layer 5 spikes reach layer 4 via layer 6i and inhibitory neurons (Lund and Boothe 1975, Gilbert and Wiesel 1979) (c) habituative neurotransmitters in layer 6i shift the balance of active cells in layer 4 (Grossberg 1972, 1976) (d) Dendritic stimulation fires layer 5 (Larkum and Zhu 2002), stimulation of apical dendrites of nonspecific thalamus
    320. Grossberg 2021 p229c2h0.60 SMART computer simulations demonstrate that a good enough match of a top-down expectation with a bottom-up feature pattern generates an attentive resonance during which the spikes of active cells synchronize in the gamma frequency range of 20-70 Hz (Figure 5.40). Many labs have reported a link between attention and gamma oscillations in the brain, including two articles published in 2001, one from the laboratory of Robert Desimone when he was at the National Institute of Mental Health in Bethesda (Fries, Reynolds, Rorie, Desimone 2001), and the other from the laboratory of Wolf Singer in Frankfurt (Engel, Fries, Singer 2001). You'll note that Pascal Fries participated in both studies, and is an acknowledged leader in neurobiological studies of gamma oscillations; eg (Fries 2009). ..."
    321. image p468fig12.65 Linguistic properties of the PHONET model and some of the data that it simulates. The upper left image summarizes the asymmetric transient-to-sustained gain control that helps to create invariant intraword ratios during variable-rate speech. The lower left image summarizes the rate-dependent gain control of the ARTPHONE model that creates rate-invariant working memory representations in response to sequences of variable-rate speech. The right image summarizes the kind of paradoxical VC-CV category boundary data of (Repp 1980) that ARTPHONE simulates. See the text for details.
      || (left upper) [transient, sustained] [working memory, filter, category]. (left lower) phone inputs-> [input rate estimate, features], Features w <- habituative transmitter gates -> categories-> rate invariant phonetic output, input rate estimate-> gain control-> [features, categories] rate-dependent integration of categories and features. (right) % 2-stop vs VC-CV silent interval (msec): [ib-ga, ib-ba, iga, iba].
    322. image p420fig12.18 The ARTSTREAM model explains and simulates the auditory continuity illusion as an example of a spectral-pitch resonance. Interactions of ART Matching Rule and asymmetric competition mechanisms in cortical strip maps explain how the tone selects the consistent frequency from the noise in its own stream while separating the rest of the noise into another stream.
      || ARTSTREAM model (Grossberg 1999; Grossberg, Govindarajan, Wyse, Cohen 2004). SPINET. Frequency and pitch strips. Bottom Up (BU) harmonic sieve. Top Down (TD) harmonic ART matching. Exclusive allocation. Learn pitch categories based on early harmonic processing. A stream is a Spectral-Pitch Resonance!
    323. image p422fig12.19 The ARTSTREAM model includes mechanisms for deriving streams both from pitch and from source direction. See the text for details.
      || [left, right] cart Peripheral processing = [input signal-> outer & middle ear preemphasis-> basilar membrane gammatone filterbank-> energy measure]. Spectral stream layer-> spectral summation layer-> delays-> [f-, tau] plane-> pitch stream layer-> pitch summation layer.
    324. image p423fig12.20 The Spatial Pitch Network, or SPINET, model shows how a log polar spatial representation of the sound frequency spectrum can be derived from auditory signals occurring in time. The spatial representation allows the ARTSTREAM model to compute spatially distinct auditory streams.
      || SPINET model (Spatial Pitch Network) (Cohen, Grossberg, Wyse 1995). 1. input sound 2. Gamma-tone filter bank 3. Short-term average energy spectrum 4. MAP transfer function 5. On-center off-surround and rectification 6. Harmonic weighting 7. Harmonic summation and competition -> PITCH
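      A rough Python sketch of the harmonic-summation idea behind steps 3, 6, and 7 above (windowed energy spectrum, harmonic weighting, harmonic summation and competition). This is not the SPINET implementation: the gammatone filterbank and MAP transfer function are skipped, and the 1 Hz candidate grid, 8 harmonics, and 1/k weighting are illustrative assumptions.

      # Rough harmonic-summation pitch sketch (illustrative; not the SPINET model).
      import numpy as np

      def pitch_by_harmonic_summation(frame, fs, f0_min=80.0, f0_max=500.0, n_harm=8):
          """Return the candidate f0 whose weighted harmonics collect the most energy."""
          windowed = frame * np.hanning(len(frame))            # short-term energy spectrum
          spectrum = np.abs(np.fft.rfft(windowed)) ** 2
          freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
          candidates = np.arange(f0_min, f0_max, 1.0)          # candidate pitches, 1 Hz apart
          scores = []
          for f0 in candidates:                                # harmonic weighting + summation
              score = 0.0
              for k in range(1, n_harm + 1):
                  bin_k = int(np.argmin(np.abs(freqs - k * f0)))
                  score += spectrum[bin_k] / k                 # 1/k weighting favours the fundamental
              scores.append(score)
          return candidates[int(np.argmax(scores))]

      # toy usage: a 200 Hz complex tone with three harmonics
      fs = 16000
      t = np.arange(0, 0.1, 1.0 / fs)
      tone = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in (1, 2, 3))
      print(pitch_by_harmonic_summation(tone, fs))             # prints a value near 200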
    325. image p424fig12.21 One of the many types of data about pitch processing that are simulated by the SPINET model. See the text for details.
      || Pitch shifts with component shifts (Patterson, Wightman 1976; Schouten 1962). Pitch vs lowest harmonic number.
    326. image p425fig12.23 ARTSTREAM simulations of the auditory continuity illusion and other streaming properties (left column, top row). When two tones are separated by silence (Input), a percept of silence also separates them in a spectral-pitch resonance. (left column, bottom row). When two tones are separated by broadband noise, the percept of tone continues through the noise in one stream (stream 1) while the remainder of the noise occurs in a different stream (stream 2). (right column) Some of the other streaming properties that have been simulated by the ARTSTREAM model.
      || Auditory continuity does not occur without noise. Auditory continuity in noise. Other simulated streaming data.
    327. image p431fig12.27 The strip maps that occur in ARTSTREAM and NormNet are variants of a cortical design that also creates ocular dominance columns in the visual cortex.
      || Adult organization of V1 (Grinvald etal http://www.weizmann.ac.il/brain/images/cubes.html). (1) Ocular dominance columns (ODCs): Alternating strips of cortex respond preferentially to visual inputs of each eye (R/L corresponds to Right and Left eye inputs in the figure); Orientation columns: A smooth pattern of changing orientation preference within each ODC. Organized in a pinwheel like fashion.
    328. p370 Chapter 11 means (Grossberg 2021) page 370, Chapter 11
      p002sec Illusion and reality means (Grossberg 2021) page 2, section Illusion and reality
      p013fig01.09 means (Grossberg 2021) page 13, Figure 1.09 (1.9 as in book)
      p030tbl01.02 means (Grossberg 2021) page 30, Table 1.02 (1.2 as in book)
      p111c2h0.5 means (Grossberg 2021) page 111, column 2, height from top as fraction of page height
      || text... are notes in addition to [figure, table] captions, mostly composed of text within the image, but also including quotes of text in the book. Rarely, they include comments by Howell preceded by "Howell". The latter are distinct from "readers notes" (see, for example : reader Howell notes).
      p044 Howell: grepStr 'conscious' means a comment by reader Howell, extracted using the grep string shown, referring to page 44 in (Grossberg 2021)
    329. p001 Chapter 1 Overview - From Complementary Computing and Adaptive Resonance to conscious awareness
    330. p250 Chapter 6 Conscious seeing and invariant recognition - Complementary cortical streams coordinate attention for seeing and recognition
    331. p539 Chapter 15 Adaptively timed learning - How timed motivation regulates conscious learning and memory consolidation
      p370 Chapter 11 means (Grossberg 2021) page 370, Chapter 11
      p002sec Illusion and reality means (Grossberg 2021) page 2, section Illusion and reality
      p013fig01.09 means (Grossberg 2021) page 13, Figure 1.09 (1.9 as in book)
      p030tbl01.02 means (Grossberg 2021) page 30, Table 1.02 (1.2 as in book)
      p111c2h0.5 means (Grossberg 2021) page 111, column 2, height from top as fraction of page height
      || text... are notes in addition to [figure, table] captions, mostly composed of text within the image, but also including quotes of text in the book. Rarely, they include comments by Howell preceded by "Howell". The latter are distinct from "readers notes" (see, for example : reader Howell notes).
      p044 Howell: grepStr 'conscious' means a comment by reader Howell, extracted using the grep string shown, referring to page 44 in (Grossberg 2021)
    332. image p039tbl01.03 The link between consciousness and movement
      ||
      VISUAL : seeing, knowing, and reaching
      AUDITORY : hearing, knowing, and speaking
      EMOTIONAL : feeling, knowing, and acting
    333. image p042tbl01.04 The six main kinds of resonances which support different kinds of conscious awareness that will be explained and discussed in this book.
      ||
      type of resonance : type of consciousness
      surface-shroud : see visual object or scene
      feature-category : recognize visual object or scene
      stream-shroud : hear auditory object or stream
      spectral-pitch-and-timbre : recognize auditory object or stream
      item-list : recognize speech and language
      cognitive-emotional : feel emotion and know its source
    334. image p270fig06.16 The same target position signal that can command the next saccade also updates a gain field that predictively maintains the attentional shroud in head-centered coordinates, even before the eye movement is complete. This process keeps the shroud invariant under eye movements, so that it can continue to inhibit reset of an emerging invariant category as it is associated with multiple object views, even while the conscious surface representation shifts with each eye movement in retinotopic coordinates. This updating process is often called predictive remapping.
      || Predictive remapping of eye movements! From V3A to LIP. [spatial attention, object attention, figure-ground separation, eye movement remapping, visual search]. (Beauvillaib etal 2005, Carlson-Radvansky 1999, Cavanaugh etal 2001, Fecteau & Munoz 2003, Henderson & Hollingworth 2003, Irwin 1991)
    335. image p278fig06.27 A surface-shroud resonance through the Where stream enables us to consciously see an object while a feature-category resonance into the What stream enables us to recognize it. Both kinds of resonances can synchronize via visual cortex so that we can know what an object is when we see it.
      || What kinds of resonances support knowing vs seeing? What stream [knowing, feature-prototype resonance], Where stream [seeing, surface-shroud resonance]
    336. image p278fig06.28 If the feature-category resonances cannot form, say due to a lesion in IT, then a surface-shroud resonance can still support conscious seeing of an attended object, and looking at or reaching for it, even if the individual doing so knows nothing about the object, as occurs during visual agnosia. The surface-shroud resonance supports both spatial attention and releases commands that embody the intention to move towards the attended object.
      || What kinds of resonances support knowing vs seeing? visual agnosia: reaching without knowing Patient DF (Goodale etal 1991). Attention and intention both parietal cortical functions (Andersen, Essick, Siegel 1985; Gnadt, Andersen 1988; Snyder, Batista, Andersen 1997, 1998)
    337. image p355fig10.02 Distinguishing processes of seeing vs knowing has been difficult because they interact so strongly.
      || Seeing vs. Knowing. Seeing and knowing [operate at different levels of the brain, use specialized circuits], but they [interact via feedback, use similar cortical designs, feedback is needed for conscious perception]. Cerebral Cortex: Seeing [V1-V4, MT-MST], Knowing [IT, PFC].
    338. image p369fig10.19 Data from (Watanabe etal 2001) showing perceptual learning of the coherent motion direction, despite the lack of extra-foveal attention and awareness of the moving stimuli.
      || Unconscious perceptual learning of motion direction, % correct for two tests, compared to chance level results.
    339. image p396fig11.35 Three properties of bipole boundary grouping in V2 can explain how boundaries oscillate in response to rivalry-inducing stimuli. Because all boundaries are invisible, however, these properties are not sufficient to generate a conscious percept of rivalrous surfaces.
      || 3 V2 boundary properties cause binocular rivalry. 1. Bipole grouping, 2. Orientational competition, 3. Activity-dependent habituation
    340. image p419fig12.17 The auditory continuity illusion illustrates the ART Matching Rule at the level of auditory streaming. Its "backwards in time" effect of future context on past conscious perception is a signature of resonance.
      || Auditory continuity illusion. input, percept. Backwards in time - How does a future sound let past sound continue through noise? Resonance! - It takes a while to kick in. After it starts, a future tone can maintain it much more quickly. Why does this not happen if there is no noise? - ART Matching Rule! TD harmonic filter is modulatory without BU input. It cannot create something out of nothing.
    341. image p436fig12.30 The conscious ARTWORD, or cARTWORD, laminar cortical speech model simulates how future context can disambiguate noisy past speech sounds in such a way that the completed percept is consciously heard to proceed from past to future as a feature-item-list resonant wave propagates through time.
      || cARTWORD: Laminar cortical model macrocircuit (Grossberg, Kazerounian 2011) Simulates PHONEMIC RESTORATION: Cognitive Working Memory (processed item sequences) - [Excitatory-> inhibitory-> habituative-> adaptive filter-> adaptive filter-> adaptive filter with depletable synapse-> Acoustic [item, feature]
    342. image p455fig12.52 Simulation of cARTWORD dynamics in response to the complete list /1/-/2/-/3/. The relevant responses are surrounded by a red box.
      || Presentation of a normal sequence: input /1/-/2/-/3/. |c(i,1)-5| vs time (msec). List chunks select most predictive code. Order stored in WM layers. Resonant activity of /1/-/2/-/3/ in item and feature layers corresponds to conscious speech percept.
    343. image p484fig13.04 The top-down feedback from the orbitofrontal cortex closes a feedback loop that supports a cognitive-emotional resonance. If this resonance can be sustained long enough, it enables us to have feelings at the same time that we experience the categories that caused them.
      || Cognitive-Emotional resonance. Basis of "core consciousness" and "the feeling of what happens". (Damasio 1999) derives heuristic version of CogEM model from his clinical data. Drive-> amygdala-> prefrontal cortex-> sensory cortex, resonance around the latter 3. How is this resonance maintained long enough to become conscious?
    344. Grossberg 2021 p081c2h0.66 sub-section: How does one evolve a computational brain?
      The above discussion illustrates that no single step of theoretical derivation can derive a whole brain. One needs a method for deriving a brain in stages, or cycles, much as evolution has incrementally discovered ever more complex brains over many thousands of years. The following theoretical method has been successfully applied many times since I first used it in 1957. It embodies a kind of conceptual evolutionary process for deriving a brain.

      Because "brain evolution needs to achieve behavioural success", we need to start with data that embody indices of behavioral success. That is why, as illustrated in
      Figure 2.37 Modelling method and cycle, one starts with Behavioral Data from scores or hundreds of psychological experiments. These data are analyzed as the result of an individual adapting autonomously in real time to a changing world. This is the Art of Modeling. It requires that one be able to infer from static data curves the dynamical processes that control individual behaviors occurring in real time. One of the hardest things that I teach my students to do is "how to think in real time" to be able to carry out this speculative leap.

      Properly carried out, this analysis leads to the discovery of new Design Principles that are embodied by these behavioral processes. The Design Principles highlight the functional meaning of the data, and clarify how individual behaviors occurring in real time give rise to these static data curves.

      These principles are then converted into the simplest Mathematical Model using a method of minimal anatomies, which is a form of Occam's Razor, or principle of parsimony. Such a mathematical model embodies the psychological principles using the simplest possible differential equations. By "simplest" I mean that, if any part of the derived model is removed, then a significant fraction of the targeted data could no longer be explained. One then analyzes the model mathematically and simulates it on the computer, showing along the way how variations on the minimal anatomy can realize the design principles in different individuals or species.

      This analysis has always provided functional explanations and Behavioral Predictions for much larger behavioral data bases than those used to discover the Design Principles. The most remarkable fact is, however, that the behaviorally derived model always looks like part of a brain, thereby explaining a body of challenging Neural Data and making novel Brain Predictions.

      The derivation hereby links mind to brain via psychological organizational principles and their mechanistic realization as a mathematically defined neural network. This startling fact is what I first experienced as a college Freshman taking Introductory Psychology, and it changed my life forever.

      I conclude from having had this experience scores of times since 1957 that brains look the way they do because they embody a natural computational realization for controlling autonomous adaptation in real-time to a changing world. Moreover, the Behavior -> Principles -> Model -> Neural derivation predicts new functional roles for both known and unknown brain mechanisms by linking the brain data to how it helps to ensure behavioral success. As I noted above, the power of this method is illustrated by the fact that scores of these predictions about brain and behavior have been supported by experimental data 5-30 years after they were first published.

      Having made the link from behavior to brain, one can then "burn the candle from both ends" by pressing both top-down from Behavioral Data and bottom-up from Brain Data to clarify what the model can and cannot explain at its current stage of derivation. No model can explain everything. At each stage of development, the model can cope with certain environmental challenges but not others. An important part of the mathematical and computational analysis is to characterize the boundary between the known and unknown; that is, which challenges the model can cope with and which it cannot. The shape of this boundary between the known and unknown helps to direct the theorist's attention to new design principles that have been omitted from previous analysis.

      The next step is to show how these new design principles can be incorporated into the evolved model in a self-consistent way, without undermining its previous mechanisms, thereby leading to a progressively more realistic model, one that can explain and predict ever more behavioral and neural data. In this way, the model undergoes a type of evolutionary development, as it becomes able to cope behaviorally with environmental constraints of ever increasing subtlety and complexity. The Method of Minimal Anatomies may hereby be viewed as a way to functionally understand how increasingly demanding combinations of environmental pressures were incorporated into brains during the evolutionary process.

      If such an Embedding Principle cannot be carried out - that is, if the model cannot be unlumped or refined in a self-consistent way - then the previous model was, put simply, wrong, and one needs to figure out which parts must be discarded. Such a model is, as it were, an evolutionary dead end. Fortunately, this has not happened to me since I began my work in 1957 because the theoretical method is so conservative. No theoretical addition is made unless it is supported by multiple experiments that cannot be explained in its absence. Where multiple mechanistic instantiations of some Design Principles were possible, they were all developed in models to better understand their explanatory implications. Not all of these instantiations could survive the pressure of the evolutionary method, but some always could. As a happy result, all earlier models have been capable of incremental refinement and expansion.

      The cycle of model evolution has been carried out many times since 1957, leading today to increasing numbers of models that individually can explain and predict psychological, neurophysiological, anatomical, biophysical, and even biochemical data. In this specific sense, the classical mind-body problem is being incrementally solved.

      Howell: bold added for emphasis.
      (keys : Principles-Principia, behavior-mind-brain link, brain evolution, cycle of model evolution)
      see also quotes: Charles William Lucas "Universal Force" and others (not retyped yet).
    345. image p484fig13.04 The top-down feedback from the orbitofrontal cortex closes a feedback loop that supports a cognitive-emotional resonance. If this resonance can be sustained long enough, it enables us to have feelings at the same time that we experience the categories that caused them.
      || Cognitive-Emotional resonance. Basis of "core consciousness" and "the feeling of what happens". (Damasio 1999) derives heuristic version of CogEM model from his clinical data. Drive-> amygdala-> prefrontal cortex-> sensory cortex, resonance around the latter 3. How is this resonance maintained long enough to become conscious?
    346. image p514fig13.44 Analog of the CogEM model in Figure 6.1 of (Damasio 1999).
      || (a) map of object X-> map of proto-self at inaugural instant-> [, map of proto-self modified]-> assembly of second-order map. (b) map of object X enhanced-> second-order map imaged.
    347. image p105fig03.23 The pointillist painting A Sunday on la Grande Jatte by Georges Seurat illustrates how we group together both large-scale coherence among the pixels of the painting, as well as forming small groupings around the individual dabs of color.
      ||
    348. image p107fig03.25 The Roofs of Collioure by Matisse. See the text for details
      || p107c1h0.6 "... [Matisse] showed how patches of pure color, when laid down properly on a canvas, could be grouped by the brain into emergent boundaries, without the intervention of visible outlines. ... The trick was that these emergent boundaries, being invisible, or amodal, did not darken the colors in the surface representations. In this sense, Matisse intuitively realized that "all boundaries are invisible" through the masterful way in which he arranged his colors on canvas to generate boundaries that could support compelling surface representations. ..."
    349. image p108fig03.27 Matisse's painting Open Window, Collioure 1905 combines continuously colored surfaces with color patches that created surface representations using amodal boundaries, as in Figure 3.26. Both kinds of surfaces cooperate to form the final painterly percept.
      ||
    350. image p110fig03.32 Claude Monet's painting of Poppies Near Argenteuil. See the text for details.
      || Claude Monet Poppies Near Argenteuil 1873. p110c2h0.35 "... the red poppies and the green field around them are painted to have almost the same luminance; that is, they are almost equiluminant. As a result, the boundaries between the red and green regions are weak and positionally unstable, thereby facilitating an occasional impression of the poppies moving in a gentle breeze, especially as one's attention wanders over the scene. ...".
    351. image p120fig03.43 Four paintings by Monet of the Rouen cathedral under different lighting conditions (top row) and their monochromatic versions (bottom row). See the text for details.
      || p119c2h0.25 "... Monet uses nearby colors that are nearly equiluminant, and sharp, high-contrast luminance defined edges are sparse. He hereby creates weaker boundary signals within and between the parts of many forms, and stronger boundary signals between the forms. This combination facilitates color spreading within the forms and better separation of brightness and color differences between forms. ... The grayscale versions of these paintings demonstrate the near equiluminance of the brushstrokes within forms, and places in which brightness and color differences significantly influence the groupings that differentiate between forms, including the differentiation between the cathedral and the sky. ..."
    352. image p120fig03.44 The Rouen cathedral at sunset generates very different boundary webs than it does in full sunlight, as illustrated by Figure 3.45.
      || Rouen Cathedral at sunset (Monet 1892-1894).
      • Lighting almost equiluminant
      • Most boundaries are thus caused by color differences, not luminance differences
      • Fine architectural details are obscured, leading to...
      • Coarser and more uniform boundary webs, so...
      • Less depth in the painting.
    353. image p121fig03.45 The Rouen cathedral in full sunlight.
      || Rouen Cathedral full sunlight (Monet 1892-1894).
      • Lighting is strongly non-uniform across most of the painting
      • Strong boundaries due to both luminance and color differences
      • Fine architectural details are much clearer, leading to...
      • Finer and more non-uniform boundary webs, so...
      • Much more detail and depth
    354. image p121fig03.46 The Rouen cathedral in full sunlight contains T-Junctions that are not salient in the painting of it at sunset. These are among the painting's features that give it a much more depthful appearance.
      || Rouen Cathedral full sunlight (Monet 1892-1894).
      • There are also more T-junctions where vertical boundaries occlude horizontal boundaries, or conversely...
      • Leading to more depth.
      p119c2h1.0 "... Such T-junction boundary occlusions ... can generate percepts of depth in the absence of any other visual clues. ...".
    355. image p171fig04.49 An example of DaVinci stereopsis in which the left eye sees more of the wall between A and C than the right eye does. The region between B and C is seen only by the left eye because the nearer wall between C and D occludes it from the right eye view.
    356. image p377fig11.11 DaVinci stereopsis phenomena occur when only one eye can receive visual inputs from part of a 3D scene due to occlusion by a nearer surface.
      || How does monocular information contribute to depth perception? DaVinci stereopsis (Gillam etal 1999). Only by utilizing monocular information can visual system create correct depth percept. [left, right] eye view
    357. image p379fig11.13 How the 3D LAMINART model explains DaVinci stereopsis. All the stages of boundary and surface formation are color coded to clarify their explanation. Although each mechanism is very simple, when all of them act together, the correct depthful surface representation is generated. See the text for details.
      || DaVinci stereopsis (Nakayama, Shimojo 1990). An emergent property of the previous simple mechanisms working together. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] pair [Binocular match: boundaries of thick bar -> Add monocular boundaries along lines-of-sight -> Line-of-sight inhibition kills weaker vertical boundaries -> 3D surface percept not just a disparity match!] pair [Binocular match: right edge of thin and thick bars -> Strongest boundaries: binocular and monocular boundaries add -> Vertical boundaries from monocular left edge of thin bar survive -> Filling-in contained by connected boundaries]. cart [very near, near, fixation plane, far, very far]
    358. image p380fig11.14 The model explanation of DaVinci stereopsis when the input stimuli have opposite contrast polarities.
      || Polarity-reversed Da Vinci stereopsis (Nakayama, Shimojo 1990). Same explanation! (... as Figure 11.13 ...) [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
    359. image p381fig11.15 The same model mechanisms explain the surface percept that is generated by the variant of DaVinci stereopsis that Gillam, Blackburn, and Nakayama studied in 1999.
      || DaVinci stereopsis (Gillam, Blackburn, Nakayama 1999). same model mechanisms. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
    360. image p382fig11.16 The version of DaVinci stereopsis wherein three narrow rectangles are binocularly matched with one thick rectangle can also be explained in a similar way.
      || DaVinci stereopsis of [3 narrow, one thick] rectangles (Gillam, Blackburn, Nakayama 1999). [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
    361. p618 Chapter 17 A universal development code - Mental measurements embody universal laws of cell biology and physics
    362. image p073fig02.19 Computing with cells: infinity does not exist in biology!
      || Computing in a bounded activity domain, Gedanken experiment (Grossberg 1970). Vm sub-areas [xm, B - xm], I(all m), m=[1, i, B].
      B : excitable sites
      xi(t) : excited sites (activity, potential)
      B - xi(t) : unexcited sites
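      The bounded-activity idea can be made concrete with a small numerical sketch, assuming the standard shunting membrane form dx/dt = -A*x + (B - x)*I; the parameter values below are illustrative, not taken from the book.

      # Minimal sketch of bounded (shunting) activity: dx/dt = -A*x + (B - x)*I.
      # Parameter values are illustrative assumptions.
      A, B = 1.0, 10.0                  # passive decay rate; B excitable sites (upper bound)
      dt, T = 0.0005, 5.0
      for I in (1.0, 10.0, 1000.0):     # ever larger inputs...
          x = 0.0                       # excited sites x(t)
          for _ in range(int(T / dt)):  # forward-Euler integration
              x += dt * (-A * x + (B - x) * I)
          print("I = %7.1f  ->  steady state x = %.3f  (bound B = %.1f)" % (I, x, B))
      # Steady state is B*I/(A + I) < B: activity saturates below B no matter how large I gets,
      # i.e. "infinity does not exist in biology".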
    363. image p082fig02.37 My models begin with behavioral data, since brains are designed to achieve behavioral success. The text explains how models evolve in stages, through a process of successive refinements, or unlumpings. These unlumpings together carry out a kind of conceptual evolution, leading to models that can explain and predict ever larger psychological and neurobiological databases.
      || Modelling method and cycle.
      Behavioral data -(art of modeling)-> Design principles <- Neural data <-(brain predictions)- Mathematical model and analysis -(behavioral predictions)-> Behavioural data
      Operationalizes "proper level of abstraction"
      Operationalizes that you cannot "derive a brain" in one step.
    364. image p501fig13.26 A simple differential equation describes the processes of transmitter accumulation and release that do their best, at a finite rate, to carry out unbiased transduction.
      || Transmitter accumulation and release. Transmitter y cannot be restored at an infinite rate: T = S*y, y ~= B, Differential equation: d[dt: y] = A*(B - y) - S*y = accumulate - release. Transmitter y tries to recover to ensure unbiased transduction. What if it falls behind? Evolution has exploited the good properties that happen then.
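      A small numerical sketch of this habituative gate, using the equation in the caption, d[dt: y] = A*(B - y) - S*y, together with the gated signal T = S*y. The step input and parameter values are illustrative assumptions; they simply show the overshoot-then-habituate behaviour that the caption alludes to.

      # Habituative transmitter gate: dy/dt = A*(B - y) - S*y, gated signal T = S*y.
      # Step input and parameters are illustrative.
      import numpy as np

      A, B, dt = 0.1, 1.0, 0.01
      t = np.arange(0.0, 60.0, dt)
      S = np.where((t > 10.0) & (t < 40.0), 2.0, 0.0)   # input signal steps on, then off
      y = np.empty_like(t)
      y[0] = B                                           # fully accumulated at rest
      for k in range(1, len(t)):
          y[k] = y[k-1] + dt * (A * (B - y[k-1]) - S[k-1] * y[k-1])
      T = S * y                                          # overshoots at onset, then habituates
      print("gated signal at onset: %.2f, after habituation: %.2f" % (T.max(), T[int(39.0/dt)]))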
    365. Grossberg 2021 p081c2h0.66 sub-section: How does one evolve a computational brain?
      The above discussion illustrates that no single step of theoretical derivation can derive a whole brain. One needs a method for deriving a brain in stages, or cycles, much as evolution has incrementally discovered ever more complex brains over many thousands of years. The following theoretical method has been successfully applied many times since I first used it in 1957. It embodies a kind of conceptual evolutionary process for deriving a brain.

      Because "brain evolution needs to achieve behavioural success", we need to start with data that embody indices of behavioral success. That is why, as illustrated in Figure 2.37 Modelling method and cycle, one starts with Behavioral Data from scores or hundreds of psychological experiments. These data are analyzed as the result of an individual adapting autonomously in real time to a changing world. This is the Art of Modeling. It requires that one be able to infer from static data curves the dynamical processes that control individual behaviors occurring in real time. One of the hardest things that I teach my students to do is "how to think in real time" to be able to carry out this speculative leap.

      Properly carried out, this analysis leads to the discovery of new Design Principles that are embodied by these behavioral processes. The Design Principles highlight the functional meaning of the data, and clarify how individual behaviors occurring in real time give rise to these static data curves.

      These principles are then converted into the simplest Mathematical Model using a method of minimal anatomies, which is a form of Occam's Razor, or principle of parsimony. Such a mathematical model embodies the psychological principles using the simplest possible differential equations. By "simplest" I mean that, if any part of the derived model is removed, then a significant fraction of the targeted data could no longer be explained. One then analyzes the model mathematically and simulates it on the computer, showing along the way how variations on the minimal anatomy can realize the design principles in different individuals or species.

      This analysis has always provided functional explanations and Behavioral Predictions for much larger behavioral data bases than those used to discover the Design Principles. The most remarkable fact is, however, that the behaviorally derived model always looks like part of a brain, thereby explaining a body of challenging Neural Data and making novel Brain Predictions.

      The derivation hereby links mind to brain via psychological organizational principles and their mechanistic realization as a mathematically defined neural network. This startling fact is what I first experienced as a college Freshman taking Introductory Psychology, and it changed my life forever.

      I conclude from having had this experience scores of times since 1957 that brains look the way they do because they embody a natural computational realization for controlling autonomous adaptation in real-time to a changing world. Moreover, the Behavior -> Principles -> Model -> Neural derivation predicts new functional roles for both known and unknown brain mechanisms by linking the brain data to how it helps to ensure behavioral success. As I noted above, the power of this method is illustrated by the fact that scores of these predictions about brain and behavior have been supported by experimental data 5-30 years after they were first published.

      Having made the link from behavior to brain, one can then "burn the candle from both ends" by pressing both top-down from Behavioral Data and bottom-up from Brain Data to clarify what the model can and cannot explain at its current stage of derivation. No model can explain everything. At each stage of development, the model can cope with certain environmental challenges but not others. An important part of the mathematical and computational analysis is to characterize the boundary between the known and unknown; that is, which challenges the model can cope with and which it cannot. The shape of this boundary between the known and unknown helps to direct the theorist's attention to new design principles that have been omitted from previous analysis.

      The next step is to show how these new design principles can be incorporated into the evolved model in a self-consistent way, without undermining its previous mechanisms, thereby leading to a progressively more realistic model, one that can explain and predict ever more behavioral and neural data. In this way, the model undergoes a type of evolutionary development, as it becomes able to cope behaviorally with environmental constraints of ever increasing subtlety and complexity. The Method of Minimal Anatomies may hereby be viewed as a way to functionally understand how increasingly demanding combinations of environmental pressures were incorporated into brains during the evolutionary process.

      If such an Embedding Principle cannot be carried out - that is, if the model cannot be unlumped or refined in a self-consistent way - then the previous model was, put simply, wrong, and one needs to figure out which parts must be discarded. Such a model is, as it were, an evolutionary dead end. Fortunately, this has not happened to me since I began my work in 1957 because the theoretical method is so conservative. No theoretical addition is made unless it is supported by multiple experiments that cannot be explained in its absence. Where multiple mechanistic instantiations of some Design Principles were possible, they were all developed in models to better understand their explanatory implications. Not all of these instantiations could survive the pressure of the evolutionary method, but some always could. As a happy result, all earlier models have been capable of incremental refinement and expansion.

      The cycle of model evolution has been carried out many times since 1957, leading today to increasing numbers of models that individually can explain and predict psychological, neurophysiological, anatomical, biophysical, and even biochemical data. In this specific sense, the classical mind-body problem is being incrementally solved.

      Howell: bold added for emphasis.
      (keys : Principles-Principia, behavior-mind-brain link, brain evolution, cycle of model evolution)
      see also quotes: Charles William Lucas "Universal Force" and others (not retyped yet).
    366. image p248fig05.49 This circuit of the LAMINART model helps to explain properties of Up and Down states during slow wave sleep, and how disturbances in ACh dynamics can disrupt them.
      ||
    367. image p278fig06.28 If the feature-category resonances cannot form, say due to a lesion in IT, then a surface-shroud resonance can still support conscious seeing of an attended object, and looking at or reaching for it, even if the individual doing so knows nothing about the object, as occurs during visual agnosia. The surface-shroud resonance supports both spatial attention and releases commands that embody the intention to move towards the attended object.
      || What kinds of resonances support knowing vs seeing? visual agnosia: reaching without knowing Patient DF (Goodale etal 1991). Attention and intention both parietal cortical functions (Andersen, Essick, Siegel 1985; Gnadt, Andersen 1988; Snyder, Batista, Andersen 1997, 1998)
    368. image p557fig15.26 Brain regions and processes that contribute to autistic behavioral symptoms when they become imbalanced in prescribed ways.
      || Basal Ganglia prolonged gate opening <-> { Amygdala emotionally depressed-> [hippocampus- hyperspecific learning; Cerebellum- adaptive timing fails; hypofrontal blocking fails, no Theory of Mind]-> Neocortex; Neocortex- rewards not received-> Amygdala}.
    369. image p189fig05.04 The hippocampus is one of several brain regions that are important in learning and remembering about objects and events that we experience throughout life. The book will describe several hippocampal processes that contribute to this achievement in different ways.
      || hypothalamic nuclei, amygdala, hippocampus, cingulate gyrus, corpus callosum, thalamus
    370. image p228fig05.37 A macrocircuit of the neurotrophic Spectrally Timed ART, or nSTART, model. I developed nSTART with my PhD student Daniel Franklin. It proposes how adaptively timed learning in the hippocampus, bolstered by Brain Derived Neurotrophic Factor, or BDNF, helps to ensure normal memory consolidation.
      || habituative gates, CS, US, Thalamus (sensory cortex, category learning, conditioned reinforcer learning, adaptively timed learning and BDNF), Amygdala (incentive motivation learning), Hippocampus (BDNF), Prefrontal Cortex (attention), Pontine nuclei, Cerebellum (adaptively timed motor learning)
    371. image p233fig05.42 Mismatch-induced beta oscillations have been reported in at least three parts of the brain: V1, V4, and hippocampus. Although there may be other reasons for beta oscillations in the brain, those that are caused by a mismatch should be studied in concert with the gamma oscillations that occur during a good enough match. See the text for details.
      || Is there evidence for the [gamma, beta] prediction? Yes, in at least three parts of the brain, (Buffalo EA, Fries P, Landman R, Buschman TJ, Desimone R 2011, PNAS 108, 11262-11267) Does this difference in average oscillation frequencies in the superficial and deep layers reflect layer 4 reset? Superficial recording γ (gamma), Deep recording β (beta) (Berke etal 2008, hippocampus; Buschman and Miller 2009, FEF)
    372. image p520fig14.02 Macrocircuit of the main brain regions, and connections between them, that are modelled in the unified predictive Adaptive Resonance Theory (pART) of cognitive-emotional and working memory dynamics. Abbreviations in red denote brain regions used in cognitive-emotional dynamics. Those in green denote brain regions used in working memory dynamics. Black abbreviations denote brain regions that carry out visual perception, learning and recognition of visual object categories, and motion perception, spatial representation and target tracking. Arrows denote non-excitatory synapses. Hemidiscs denote adaptive excitatory synapses. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown. Also not shown are output signals from cortical areas to motor responses. V1: striate, or primary, visual cortex; V2 and V4: areas of prestriate visual cortex; MT: Middle Temporal cortex; MST: Medial Superior Temporal area; ITp: posterior InferoTemporal cortex; ITa: anterior InferoTemporal cortex; PPC: Posterior Parietal Cortex; LIP: Lateral InterParietal area; VPA: Ventral PreArcuate gyrus; FEF: Frontal Eye Fields; PHC: ParaHippocampal Cortex; DLPFC: DorsoLateral PreFrontal Cortex; HIPPO: hippocampus; LH: Lateral Hypothalamus; BG: Basal Ganglia; AMYG: AMYGdala; OFC: OrbitoFrontal Cortex; PRC: PeriRhinal Cortex; VPS: Ventral bank of the Principal Sulcus; VLPFC: VentroLateral PreFrontal Cortex. See the text for further details.
      ||
    373. image p532fig14.08 Macrocircuit of the ARTSCENE Search neural model for learning to search for desired objects by using the sequences of already experienced objects and their locations to predict what and where the desired object is. V1 = First visual area or primary visual cortex; V2 = Second visual area; V4 = Fourth visual area; PPC = Posterior Parietal Cortex; ITp = posterior InferoTemporal cortex; ITa = anterior InferoTemporal cortex; MTL = Medial Temporal Lobe; PHC = ParaHippoCampal cortex; PRC = PeriRhinal Cortex; PFC = PreFrontal Cortex; DLPFC = DorsoLateral PreFrontal Cortex; VPFC = Ventral PFC; SC = Superior Colliculus.
      ||
    374. image p541fig15.02 The neurotrophic Spectrally Timed Adaptive Resonance Theory, or nSTART, model of (Franklin, Grossberg 2017) includes hippocampus to enable adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between CS and US.
      || Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. [CS, US] -> Sensory Cortex (SC) <- motivational attention <-> category learning -> Prefrontal Cortex (PFC). SC conditioned reinforcement learning-> Amygdala (cannot bridge the temporal gap) incentive motivational learning-> PFC. SC adaptively timed learning and BDNF-> Hippocampus (can bridge the temporal gap) BDNF-> PFC. PFC adaptively timed motor learning-> cerebellum.
    375. image p543fig15.06 The circuit between dentate granule cells and CA1 hippocampal pyramid cells seems to compute spectrally timed responses. See the text for details.
      || Hippocampal interpretation. 1. Dentate granule cells (Berger, Berry, Thompson 1986): "increasing firing...in the CS period...the latency...was constant". 2. Pyramidal cells: "Temporal model" Dentate granule cells-> CA3 pyramids. 3. Convergence (Squire etal 1989): 1e6 granule cells, 1.6e5 CA3 pyramids. 80-to-1 (ri).
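      A loose Python sketch of the spectral-timing idea behind this hippocampal interpretation: a population of cells with a spectrum of reaction rates produces gated responses that peak at a range of delays after CS onset, so a weighted sum of them can learn to peak near the trained inter-stimulus interval. The equations and parameters below are illustrative stand-ins, not the published model.

      # Loose sketch of spectral timing: gated responses peak at a spread of delays.
      # Equations and parameters are illustrative only.
      import numpy as np

      rates = np.linspace(0.05, 1.0, 20)     # spectrum of reaction rates r_i
      dt = 0.01
      t = np.arange(0.0, 40.0, dt)
      I = (t > 1.0).astype(float)            # CS turns on at t = 1 s

      peak_times = []
      for r in rates:
          x, y = 0.0, 1.0                    # activation and habituative gate
          g = np.zeros_like(t)
          for k in range(len(t)):
              x += dt * r * (-x + I[k])                  # x tracks the CS at rate r
              y += dt * r * (0.05 * (1.0 - y) - x * y)   # gate depletes while x is active
              g[k] = x * y                               # gated signal rises, peaks, then decays
          peak_times.append(t[int(np.argmax(g))])

      print("earliest peak %.1f s, latest peak %.1f s" % (min(peak_times), max(peak_times)))
      # A learned weighted sum of these gated signals can peak near a trained ISI.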
    376. image p549fig15.19 How the adaptively timed hippocampal spectrum T inhibits (red arrow) the orienting system A as motivated attention in orbitofrontal cortex Si(2) peaks at the ISI.
      || Conditioning, Attention, and Timing circuit. Hippocampus spectrum-> Amygdala orienting system-> neocortex motivational attention. Adaptive timing inhibits orienting system and maintains adaptively timed Motivated Attention on the CS.
    377. image p557fig15.26 Brain regions and processes that contribute to autistic behavioral symptoms when they become imbalanced in prescribed ways.
      || Basal Ganglia prolonged gate opening <-> { Amygdala emotionally depressed-> [hippocampus- hyperspecific learning; Cerebellum- adaptive timing fails; hypofrontal blocking fails, no Theory of Mind]-> Neocortex; Neocortex- rewards not received-> Amygdala}.
    378. image p573fig16.01 The experimental chamber (A) and neurophysiological recordings from a rat hippocampus (B) that led to the discovery of place cells. See the text for details.
      ||
    379. image p575fig16.03 As a rat navigates in its experimental chamber (black curves), neurophysiological recordings disclose the firing patterns (in red) of (a) a hippocampal place cell and (b) an entorhinal grid cell.
      ||
    380. image p578fig16.04 Cross-sections of the hippocampal regions and the inputs to them. See the text for details.
      || EC-> CA1-> CA3-> DG. Layers [V/VI, III, II].
    381. image p583fig16.10 The GRIDSmap model is embedded into a more complete representation of the processing stages from receipt of angular head velocity and linear velocity signals to this learning of place cells.
      || GRIDSmap. Pre-wired 2D stripe cells, learns 2D grid cells. vestibular cells [angular head velocity-> head direction cells, linear velocity]-> stripe cells- small scale 1D periodic spatial code (ECIII)-> SOM grid cells entorhinal cortex- small scale 2D periodic spatial scale-> SOM place cells hippocampal cortex- large scale 2D spatial code (dentate/CA3). Unified hierarchy of SOMs.
    382. image p600fig16.36 The entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories. See the text for details.
      || Entorhinal-hippocampal interactions as an ART system. Hippocampal place cells as spatial categories. Angular head velocity-> head direction cells-> stripe cells- small scale 1D periodic code (ECIII) SOM-> grid cells- small scale 2D periodic code (ECII) SOM-> place cells- larger scale spatial map (DG/CA3)-> place cells (CA1)-> conjunctive-coding cells (EC V/VI)-> top-down feedback back to stripe cells- small scale 1D periodic code (ECIII). stripe cells- small scale 1D periodic code (ECIII)-> place cells (CA1).
    383. image p602fig16.37 Data showing the effect of hippocampal inactivation by muscimol on grid cell firing before, during, and six hours after the muscimol, reading from left to right.
      || Hippocampal inactivation disrupts grid cells (Bonnevie etal 2013). muscimol inactivation. spikes on trajectory: [before, after min [6-20, 20-40, 40-60, 6h]]. rate map (Hz) [18.6, 11.4, 9.5, 6.7, 10.8]. spatial autocorrelogram g=[1.12, 0.05, -0.34, 0.09, 1.27].
    384. image p603fig16.38 Role of hippocampal feedback in maintaining grid fields. (a) Data showing the effect of hippocampal inactivation before and during muscimol inhibition of hippocampal cells, as in Figure 16.37. (b) Model simulation with normal grid fields. (c) Model simulation that emulates the effect of hippocampal inhibition on grid fields.
      || (a) Data: hippocampal inactivation [before, after] cart [spikes on trajectory (p: [18.6, 6.7] Hz), spatial autocorrelogram (g= [1.12, 0.09])]. (b) Model: noise-free path integration, [spikes on trajectory (p: 14.56 Hz), rate map, spatial autocorrelogram (g= 1.41), dynamic autocorrelogram (g=0.6)]. (c) Model: noisy path integration + non-specific tonic inhibition, [spikes on trajectory (p: 11.33 Hz), rate map, spatial autocorrelogram (g= 0.05), dynamic autocorrelogram (g=0.047)].
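      The gridness values g quoted above can be understood with a short sketch of the rotational-correlation score that is standard in the grid-cell literature: correlate the spatial autocorrelogram with copies of itself rotated by 30-150 degrees; hexagonal firing gives high correlations at 60 and 120 degrees and low ones at 30, 90, and 150. The annulus masking and smoothing used in published analyses are omitted here, so the numbers will not match the figure, and the toy hexagonal map is an illustrative assumption.

      # Sketch of a 'gridness' score from a spatial autocorrelogram (simplified).
      import numpy as np
      from scipy.ndimage import rotate

      def gridness(autocorr):
          def corr_at(angle):
              rot = rotate(autocorr, angle, reshape=False, mode="nearest")
              return np.corrcoef(autocorr.ravel(), rot.ravel())[0, 1]
          return min(corr_at(60), corr_at(120)) - max(corr_at(30), corr_at(90), corr_at(150))

      # toy example: an idealized hexagonal pattern (standing in for an autocorrelogram)
      x, y = np.meshgrid(np.linspace(-3, 3, 101), np.linspace(-3, 3, 101))
      k = 2 * np.pi
      hexmap = sum(np.cos(k * (x * np.cos(a) + y * np.sin(a)))
                   for a in (0.0, np.pi / 3, 2 * np.pi / 3))
      print("gridness of hexagonal map: %.2f" % gridness(hexmap))   # clearly positive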
    385. image p617fig16.50 The perirhinal and parahippocampal cortices enable adaptively timed reinforcement learning and spatial navigational processes that are modeled by Spectral Spacing models in the What and Where cortical streams, respectively, to be fused in the hippocampus.
      || What and Where inputs to the hippocampus (Diana, Yonelinas, Ranganath 2007). Adaptively timed conditioning and spatial navigation. Hippocampus <-> Entorhinal Cortex <-> [Perirhinal Cortex <-> what, Parahippocampal Cortex <-> where].
    386. p190 Howell: [neural microcircuits, modal architectures] used in ART -
      bottom-up filters | top-down expectations | purpose
      instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star often used to learn the adaptive weights. top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features (a minimal sketch of these two learning laws follows this table)
      LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
      EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
      auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
      auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
      visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown.
      red - cognitive-emotional dynamics
      green - working memory dynamics
      black - see [bottom-up, top-down] lists
      EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
      Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

      background colours in the table signify :
      white | general microcircuit : a possible component of ART architecture
      lime green | sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
      light blue | post-perceptual cognition?
      pink | "the feeling of what happens" and knowing what event caused that feeling
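      A minimal sketch of the instar/outstar pair in the first row of the table above, written in the commonly cited gated steepest-descent form: instar (bottom-up) weights learn the input pattern when their target category cell is active, while outstar (top-down) weights learn to read that pattern back out when the same cell is active. The learning rate, patterns, and convergence loop are illustrative assumptions.

      # Instar / outstar learning sketch (gated steepest-descent form; values illustrative).
      #   instar  (bottom-up):  dW[i,j]/dt = eps * y[j] * (x[i] - W[i,j])  -- gated by category activity
      #   outstar (top-down):   dZ[j,i]/dt = eps * y[j] * (x[i] - Z[j,i])  -- same gate, read-out weights
      import numpy as np

      eps, steps = 0.1, 200
      x = np.array([0.7, 0.2, 0.1])      # feature pattern at F1
      y = np.array([0.0, 1.0])           # one active category cell at F2

      W = np.random.rand(3, 2)           # bottom-up (instar) weights, F1 -> F2
      Z = np.random.rand(2, 3)           # top-down (outstar) weights, F2 -> F1
      W0 = W.copy()
      for _ in range(steps):
          W += eps * y[np.newaxis, :] * (x[:, np.newaxis] - W)   # only the active category learns
          Z += eps * y[:, np.newaxis] * (x[np.newaxis, :] - Z)   # its top-down prototype tracks x
      print("instar weights into active category :", W[:, 1].round(3))   # -> ~[0.7 0.2 0.1]
      print("outstar prototype read out top-down :", Z[1].round(3))      # -> ~[0.7 0.2 0.1]
      print("inactive category weights unchanged :", np.allclose(W[:, 0], W0[:, 0]))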
    387. image p419fig12.17 The auditory continuity illusion illustrates the ART Matching Rule at the level of auditory streaming. Its "backwards in time" effect of future context on past conscious perception is a signature of resonance.
      || Auditory continuity illusion. input, percept. Backwards in time - How does a future sound let past sound continue through noise? Resonance! - It takes a while to kick in. After it starts, a future tone can maintain it much more quickly. Why does this not happen if there is no noise? - ART Matching Rule! TD harmonic filter is modulatory without BU input. It cannot create something out of nothing.
    388. image p420fig12.18 The ARTSTREAM model explains and simulates the auditory continuity illusion as an example of a spectral-pitch resonance. Interactions of ART Matching Rule and asymmetric competition mechanisms in cortical strip maps explain how the tone selects the consistent frequency from the noise in its own stream while separating the rest of the noise into another stream.
      || ARTSTREAM model (Grossberg 1999; Grossberg, Govindarajan, Wyse, Cohen 2004). SPINET. Frequency and pitch strips. Bottom Up (BU) harmonic sieve. Top Down (TD) harmonic ART matching. Exclusive allocation. Learn pitch categories based on early harmonic processing. A stream is a Spectral-Pitch Resonance!
    389. image p425fig12.23 ARTSTREAM simulations of the auditory continuity illusion and other streaming properties (left column, top row). When two tones are separated by silence (Input), a percept of silence also separates them in a spectral-pitch resonance. (left column, bottom row). When two tones are separated by broadband noise, the percept of tone continues through the noise in one stream (stream 1) while the remainder of the noise occurs in a different stream (stream 2). (right column) Some of the other streaming properties that have been simulated by the ARTSTREAM model.
      || Auditory continuity does not occur without noise. Auditory continuity in noise. Other simulated streaming data.
    390. p190 Howell: [neural microcircuits, modal architectures] used in ART -
      bottom-up filters | top-down expectations | purpose
      instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star often used to learn the adaptive weights. top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features
      LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
      EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
      auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance (see the match/reset sketch after this table)
      auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
      visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown.
      red - cognitive-emotional dynamics
      green - working memory dynamics
      black - see [bottom-up, top-down] lists
      EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
      Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

      background colours in the table signify :
      white - general microcircuit : a possible component of ART architecture
      lime green - sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
      light blue - post-perceptual cognition?
      pink - "the feeling of what happens" and knowing what event caused that feeling
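      The ART Matching Rule row above (match degree vs vigilance, reset, search) can be made concrete with a tiny numerical sketch. This is a minimal binary ART-style match/reset computation, with a vigilance value and patterns of my own choosing rather than code from Grossberg's models: the top-down expectation acts as a mask on the bottom-up pattern, and if the matched fraction of the input falls below vigilance, the category is reset and the orienting system drives a memory search.

        # Minimal sketch (binary ART-style matching; the vigilance value and patterns are illustrative)
        import numpy as np

        def art_match(bottom_up, expectation, vigilance=0.8):
            matched = np.logical_and(bottom_up, expectation)       # top-down mask on the bottom-up pattern
            match_degree = matched.sum() / max(bottom_up.sum(), 1)
            return ("resonate" if match_degree >= vigilance else "reset", round(match_degree, 2))

        I = np.array([1, 1, 1, 1, 0, 0])
        print(art_match(I, np.array([1, 1, 1, 0, 0, 0])))   # ('reset', 0.75) -> memory search
        print(art_match(I, np.array([1, 1, 1, 1, 0, 1])))   # ('resonate', 1.0)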
    391. p404 Chapter 12 From seeing and reaching to hearing and speaking - Circular reaction, streaming, working memory, chunking, and number
    392. image p030tbl01.02 The What and Where cortical processing streams obey complementary laws. These laws enable the What stream to rapidly and stably learn invariant object categories without experiencing catastrophic forgetting, while the Where stream learns labile spatial and action representations to control actions that are aimed towards these objects.
      ||
      WHAT | WHERE
      spatially-invariant object learning and recognition | spatially-variant reaching and movement
      fast learning without catastrophic forgetting | continually update sensory-motor maps and gains
      IT InferoTemporal Cortex | PPC Posterior Parietal Cortex
      What | Where
      matching | excitatory | inhibitory
      learning | match | mismatch
    393. image p032fig01.21 At least three parallel visual cortical streams respond to visual inputs that reach the retina. Two parvocellular streams process visual surfaces (blob stream) and visual boundaries (interblob stream). The magnocellular stream processes visual motion.
      || [Retina, LGNs, V[1,2,3,4], MT] to [What- inferotemporal areas, Where- parietal areas]: visual parallel streams [2x blob, 1x bound]
    394. image p039tbl01.03 The link between consciousness and movement
      ||
      VISUAL | seeing, knowing, and reaching
      AUDITORY | hearing, knowing, and speaking
      EMOTIONAL | feeling, knowing, and acting
    395. image p092fig03.05 A cross-section of the retinal layer. Note that light stimuli need to go through all retinal layers before they reach the photoreceptor layer at which the light signals are registered.
      || light stimuli ->
      retinal layers | cellular composition
      inner limiting membrane
      retinal nerve fibre | ganglion nerve fibres
      ganglion cell | ganglion
      inner plexiform | amacrine
      inner nuclear | horizontal
      outer plexiform
      outer limiting membrane
      photoreceptor | rod
      photoreceptor | cone
      retinal pigment epithelium
      <- signal transduction. http://brain.oxfordjournals.org/content/early/2011/01/20/brain.awq346
    396. image p232fig05.41 (a)-(c). The sequence of interlaminar events that SMART predicts during a mismatch reset. (d) Some of the compatible neurophysiological data.
      || Mismatch causes layer 5 dendritic spikes that trigger reset. (a) Arousal causes increase in nonspecific thalamic nuclei firing rate and layer 5 dendritic and later somatic spikes (Larkum and Zhu 2002, Williams and Stuart 1999) (b) Layer 5 spikes reach layer 4 via layer 6i and inhibitory neurons (Lund and Boothe 1975, Gilbert and Wiesel 1979) (c) habituative neurotransmitters in layer 6i shift the balance of active cells in layer 4 (Grossberg 1972, 1976) (d) Dendritic stimulation fires layer 5 (Larkum and Zhu 2002) stimulation apical dendrites of nonspecific thalamus
    397. image p278fig06.28 If the feature-category resonances cannot form, say due to a lesion in IT, then a surface-shroud resonance can still support conscious seeing of an attended object, and looking at or reaching for it, even if the individual doing so knows nothing about the object, as occurs during visual agnosia. The surface-shroud resonance supports both spatial attention and releases commands that embody the intention to move towards the attended object.
      || What kinds of resonances support knowing vs seeing? visual agnosia: reaching without knowing Patient DF (Goodale etal 1991). Attention and intention both parietal cortical functions (Anderson, Essick, Siegel 1985; Gnadt, Andersen 1988; Snyder, Batista, Andersen 1997, 1998)
    398. image p303fig08.20 The G-wave speeds up with the distance between flashes at a fixed delay, and has a consistent motion across multiple spatial scales.
      || G-wave properties (Grossberg 1977). Theorem 2 (Equal half-time property) The time at which the motion signal reaches position w=L/2. Apparent motion speed-up with distance: this half-time is independent of the distance L between the two flashes. Consistent motion across scales: half-time is independent of the scale size K. Method of proof: elementary algebra and calculus (Grossberg, Rudd 1989 appendix)
    399. image p304fig08.21 A computer simulation of the equal half-time property whereby the apparent motions within different scales that respond to the same flashes all reach the half-way point in the motion trajectory at the same time.
      || Equal half-time property: how multiple scales cooperate to generate motion percept. Travelling waves from Gaussian filters of different sizes bridge the same distance in comparable time. The time needed to bridge half the distance between flashes is the same.
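      The equal half-time property in items 398-399 can be checked numerically. The sketch below is my own toy construction, not the Grossberg-Rudd equations: the first flash's response a(t) wanes while the second flash's response b(t) waxes, each feeds a Gaussian filter of scale K centered on its flash position, and the peak of the summed activity is tracked over time. In this toy, the time at which the peak crosses the midpoint w = L/2 comes out the same for every separation L and scale K, because it is essentially the time at which a(t) = b(t).

        # Minimal sketch (the waxing/waning functions a, b and all parameters are illustrative)
        import numpy as np

        def half_time(L, K):
            a = lambda t: np.exp(-t)            # waning response to the first flash
            b = lambda t: 1.0 - np.exp(-t)      # waxing response to the second flash
            w = np.linspace(-L, 2 * L, 1201)    # positions around and between the two flashes
            for t in np.linspace(0.0, 2.0, 2001):
                G = a(t) * np.exp(-w**2 / (2 * K**2)) + b(t) * np.exp(-(w - L)**2 / (2 * K**2))
                if w[np.argmax(G)] >= L / 2:    # the travelling peak has reached the midpoint
                    return t
            return None

        for L in (1.0, 2.0, 4.0):
            for K in (0.5, 1.0, 2.0):
                print(f"L={L} K={K} half-time={half_time(L, K):.3f}")   # approximately ln 2 = 0.693 in every case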
    400. image p335fig08.61 Behavioral data (left image) and simulation (right image) about speed in correct and error trials of the RT task. See text for details.
      || Behavioral data: speed, correct and error trials (RT task) (Roitman, Shadlen 2002). More coherence in the motion causes faster reaction time.
    401. image p350fig09.22 How the negative Gaussian of an obstacle causes a peak shift to avoid the obstacle without losing sight of how to reach the goal.
      || Steering dynamics: obstacle avoidance. body-centered coordinates [obstacle, goal, heading] -> steering
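      A toy version of the peak shift in item 401 can be written directly. This is not the Grossberg or Fajen-Warren steering law itself, just a minimal sketch with Gaussian widths and gains of my own choosing: the goal contributes a positive Gaussian over candidate headings, the obstacle contributes a negative Gaussian, and the maximum of the summed field is the chosen heading, shifted away from the obstacle while staying oriented toward the goal.

        # Minimal sketch (gains and widths are illustrative, not fitted to any data)
        import numpy as np

        headings = np.linspace(-90.0, 90.0, 721)      # candidate headings, in degrees
        goal_dir, obstacle_dir = 0.0, 10.0            # goal straight ahead, obstacle slightly to the right

        def gaussian(center, sigma):
            return np.exp(-(headings - center) ** 2 / (2.0 * sigma ** 2))

        field = 1.0 * gaussian(goal_dir, 30.0) - 0.8 * gaussian(obstacle_dir, 15.0)
        print("chosen heading:", headings[np.argmax(field)], "degrees")   # peak shifted left, away from the obstacle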
    402. image p351fig09.25 By the time MT+ is reached, directional transient cells and directional filters have begun to extract more global directional information from the image.
      || MT+ computes global motion estimate. Estimate global motion from noisy local motion estimates.
    403. image p414fig12.11 Neurophysiological data from cortical areas 4 and 5 (every other column) and simulations thereof (other columns) during a reach.
      || activation vs time. (a) area 4 phasic RT (IFV) (b) area 4 tonic (OPV) (c) area 4 phasic-tonic (OFPV) (d) area 4 phasic MT (DVV) (e) area 5 phasic (DV) (f) area 5 tonic (PPV)
    404. image p416fig12.13 The DIRECT model learns, using a circular reaction that is energized by an Endogenous Random Generator, or ERG, to make motor-equivalent volitionally-activated reaches. This circular reaction learns a spatial representation of a target in space. It can hereby make accurate reaches with clamped joints and on its first try using a tool under visual guidance; see Figure 12.16.
      || DIRECT model (Bullock, Grossberg, Guenther 1993). learns by circular reaction. learns spatial representation to mediate between vision and action. motor-equivalent reaching. can reach target with clamped joints. can reach target with a TOOL on the first try under visual guidance. How did tool use arise?!
    405. image p416fig12.14 Computer simulations of DIRECT reaches with (b) a tool, (c) a clamped elbow, and (d) with a blindfold, among other constraints.
      || Computer simulations of DIRECT reaches [unconstrained, with TOOL, elbow clamped at 140°, blindfolded]
    406. image p417fig12.15 The DIRECT and DIVA models have homologous circuits to learn and control motor-equivalent reaching and speaking, with tool use and coarticulation as resulting properties. See the text for why.
      || From Seeing and Reaching to Hearing and Speaking, Circular reactions (Piaget 1945, 1951, 1952). Homologous circuits for development and learning of motor-equivalent REACHING and SPEAKING. DIRECT TOOL use (Bullock, Grossberg, Guenther 1993), DIVA Coarticulation (Guenther 1995)
    407. image p428fig12.25 (left architecture) Auditory-articulatory feedback loop whereby babbled sounds activate learning in an imitative map that is later used to learn to reproduce the sounds of other speakers. An articulatory-to-auditory expectation renders learning possible by making the auditory and motor data dimensionally consistent, as in the motor theory of speech. (right architecture) Parallel streams in the ARTSPEECH model for learning speaker-independent speech and language meaning, including a mechanism for speaker normalization (right cortical stream) and for learning speaker-dependent vocalic qualities (left cortical stream).
      || left: Speaker-dependent vocalic qualities; right: Speaker-independent speech and language meaning
    408. image p430fig12.26 The NormNet model shows how speaker normalization can be achieved using specializations of the same mechanisms that create auditory streams. See the text for how.
      || [Anchor vs Stream] log frequency map. -> diagonals-> Speaker-independent acoustic item information-> [BU adaptive filter, TD learned expectation]-> learned item recognition categories
    409. image p446fig12.44 (left column, top row) LIST PARSE can model linguistic data from human subjects. In this figure, model parameters are fixed to enable a close fit to data about error-type distributions in immediate free recall experiments, notably transposition errors. (right column, top row) Simulation and data showing bowing of the serial position curve, including an extended primacy gradient. (left column, bottom row) The simulation curve overlays data about list length effects, notably the increasing recall difficulty of longer lists during immediate serial recall (ISR). (right column, bottom row) Simulation (bottom image) and data (top image) of the limited temporal extent for recall.
      || (1. TL) Error-type distributions in immediate serial recall (Henson etal 1996). % occurrence vs serial position. Graph convention: Data- dashed lines; Simulations- solid lines. Six letter visual ISR. Order errors- transpositions of neighboring items are the most common. Model explanation: Noisy activation levels change relative order in primacy gradient. Similar activation of neighboring items most susceptible to noise. Model parameters fitted on these data. (2. TR) Bowing of serial position curve (Cowan etal 1999). % correct vs serial position. Auditory ISR with various list lengths (graphs shifted rightward): For [, sub-]span lists- extended primacy, with one (or two) item recency; Auditory presentation- enhanced performance for last items. LIST PARSE: End effects- first and last items half as many members; Echoic memory- last presented item retained in separate store. (3. BL) List length effects, circles (Crannell, Parrish 1968), squares (Baddeley, Hitch 1975), solid line- simulation. % list correct vs list length. Variable list length ISR: longer lists are more difficult to recall. LIST PARSE: More items- closer activation levels and lower absolute activity level with enough inputs; Noise is more likely to produce order errors, Activity levels more likely to drop below threshold;. (4. BR) Limited temporal extent for recall (Murdock 1961). % recalled vs retention interval (s). ISR task with distractor-filled retention intervals (to prevent rehearsal): Increasing retention interval - decreases probability of recalling list correctly; Load dependence- longer list more affected by delays; Performance plateau- subjects reach apparent asymptote. LIST PARSE: Increase convergence of activities with time; loss of order information;.
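      The explanation in item 409 of why neighbouring-item transpositions dominate can be illustrated with a very small simulation. This is not LIST PARSE itself, just a bare primacy-gradient recall loop with a gradient and noise level of my own choosing: items are stored with monotonically decreasing activation, recall reads items out in order of their noisy activations, and because adjacent items have the most similar stored activations, displacements of one position are by far the most common error.

        # Minimal sketch (gradient and noise values are illustrative)
        import numpy as np

        rng = np.random.default_rng(0)
        gradient = np.linspace(1.0, 0.5, 6)           # primacy gradient over a 6-item list
        displacements = []
        for _ in range(20000):
            noisy = gradient + rng.normal(0.0, 0.08, size=6)
            order = np.argsort(-noisy)                # recall by repeatedly picking the largest activation
            displacements += [abs(pos - item) for pos, item in enumerate(order) if pos != item]

        values, counts = np.unique(displacements, return_counts=True)
        for v, c in zip(values, counts):
            print(f"items recalled {v} positions away from their true position: {c}")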
    410. image p231fig05.39 The SMART hypothesis testing and learning cycle predicts that vigilance increases when a mismatch in subcortical regions like the nonspecific thalamus activates the nucleus basalis of Meynert which, in turn, broadcasts a burst of the neurotransmitter acetylcholine, or ACh, to deeper cortical layers. Due to the way in which LAMINART proposes that cortical matching and mismatching occurs, this ACh burst can increase vigilance and thereby trigger a memory search. See the text for details.
      || [BU input, [, non]specific thalamic nucleus, thalamic reticular nucleus, neocortical laminar circuit] cart [Arousal, Reset, Search, Vigilance]
    411. image p232fig05.41 (a)-(c). The sequence of interlaminar events that SMART predicts during a mismatch reset. (d) Some of the compatible neurophysiological data.
      || Mismatch causes layer 5 dendritic spikes that trigger reset. (a) Arousal causes increase in nonspecific thalamic nuclei firing rate and layer 5 dendritic and later somatic spikes (Larkum and Zhu 2002, Williams and Stuart 1999) (b) Layer 5 spikes reach layer 4 via layer 6i and inhibitory neurons (Lund and Boothe 1975, Gilbert and Wiesel 1979) (c) habituative neurotransmitters in layer 6i shift the balance of active cells in layer 4 (Grossberg 1972, 1976) (d) Dendritic stimulation fires layer 5 (Larkum and Zhu 2002) stimulation apical dendrites of nonspecific thalamus
    412. image p461fig12.58 The lisTELOS model built upon key processes that were earlier modeled by the TELOS model. See the text for details.
      || TELOS model (Brown, Bullock, Grossberg 1999, 2004). shows [BG nigro-[thalamic, collicular], FEF, ITa, PFC, PNR-THAL, PPC, SEF, SC, V1, V4/ITp, Visual Cortex input] and [GABA].
    413. image p523fig14.03 (a) The MOTIVATOR neural model generalizes CogEM by also including the basal ganglia. It can hereby explain and simulate complementary functions of the amygdala and basal ganglia (SNc) during conditioning and learned performance. The basal ganglia generate Now Print signals in response to unexpected rewards. These signals modulate learning of new associations in many brain regions. The amygdala supports motivated attention to trigger actions that are expected to occur in response to conditioned or unconditioned stimuli. Object Categories represent visual or gustatory inputs in anterior inferotemporal (ITA) and rhinal (RHIN) cortices, respectively. Value Categories represent the value of anticipated outcomes on the basis of hunger and satiety inputs, in amygdala (AMYG) and lateral hypothalamus (LH). Object-Value Categories resolve the value of competing perceptual stimuli in medial (MORB) and lateral (ORB) orbitofrontal cortex. The Reward Expectation Filter detects the omission or delivery of rewards using a circuit that spans ventral striatum (VS), ventral pallidum (VP), striosomal delay (SD) cells in the ventral striatum, the pedunculopontine nucleus (PPTN) and midbrain dopaminergic neurons of the substantia nigra pars compacta/ventral tegmental area (SNc/VTA). The circuit that processes CS-related visual information (ITA, AMYG, ORB) operates in parallel with a circuit that processes US-related visual and gustatory information (RHIN, AMYG, MORB). (b) Reciprocal adaptive connections between hypothalamus and amygdala enable amygdala cells to become learned value categories. The bottom region represents hypothalamic cells, which receive converging taste and metabolite inputs whereby they become taste-drive cells. Bottom-up signals from activity patterns across these cells activate competing value category, or US Value Representations, in the amygdala. A winning value category learns to respond selectively to specific combinations of taste-drive activity patterns and sends adaptive top-down priming signals back to the taste-drive cells that activated it. CS-activated conditioned reinforcer signals are also associatively linked to value categories. Adaptive connections end in (approximately) hemidiscs. See the text for details.
      ||
    414. image p524fig14.04 (a) Model basal ganglia circuit for the control of dopaminergic Now Print signals from the substantia nigra pars compacta, or SNc, in response to unexpected rewards. Cortical inputs (Ii), activated by conditioned stimuli, learn to excite the SNc via a multi-stage pathway from the ventral striatum (S) to the ventral pallidum and then on to the PPTN (P) and the SNc (D). The inputs Ii excite the ventral striatum via adaptive weights W_IS, and the ventral striatum excites the SNc with strength W_PD. The striosomes, which contain an adaptive spectral timing mechanism [xij, Gij, Yij, Zij], learn to generate adaptively timed signals that inhibit reward-related activation of the SNc. Primary reward signals (I_R) from the lateral hypothalamus both excite the PPTN directly (with strength W_RP) and act as training signals to the ventral striatum S (with strength W_RS) that trains the weights W_IS. Arrowheads denote excitatory pathways, circles denote inhibitory pathways, and hemidiscs denote synapses at which learning occurs. Thick pathways denote dopaminergic signals.
      ||
    415. image p559fig15.27 Brain regions and processes that contribute to the release of dopaminergic Now Print signals by the substantia nigra pars compacta, or SNc, in response to unexpected reinforcing events. See the text for details.
      || Model of spectrally timed SNc learning (Brown, Bullock, Grossberg 1999). Delayed inhibitory expectations of reward. Dopamine cells signal an error in reward prediction timing or magnitude. Immediate excitatory predictions of reward. Lateral hypothalamus (Primary Reward Input)-> [(+)ventral striatum <-> ventral pallidum(+)-> PPTN(+)-> SNc]. SNc-> [dopamine signal -> ventral striatum, Striosomal cells]. Conditioned Stimuli (CS)(+)-> [ventral striatum, striosomal cells]. Striosomal cells(-)-> SNc.
    416. image p560fig15.29 Excitatory pathways that support activation of the SNc by a US and the conditioning of a CS to the US.
      || Excitatory pathway. Primary reward (apple juice) briefly excites lateral hypothalamus. Hypothalamic-PPTN excitation causes SNc dopamine burst. Hypothalamic activity excites ventral striatum for training. Active CS working memory signals learn to excite ventral striatum. Lateral hypothalamus (Primary Reward Input)-> [(+)ventral striatum <-> ventral pallidum(+)-> PPTN(+)-> SNc]. SNc-> [dopamine signal -> ventral striatum. Conditioned Stimuli working memory trace (CS)(+)-> ventral striatum.
    417. image p560fig15.30 The inhibitory pathway from striosomal cells to the SNc is able to inhibit the SNc when a reward occurs with expected timing and magnitude.
      || Inhibitory pathway. Learning: CS-striosomal LTP occurs due to a three-way coincidence [An active CS working memory input, a Ca2+ spike, a dopamine burst]; Signaling: The delayed Ca2+ spike facilitates striosomal-SNc inhibition;. Striosomal cells learn to predict both timing and magnitude of reward signal to cancel it: reward expectation;. Conditioned stimuli (CS) LTP-> Striosomal cells <- dopamine | (-)-> SNc->.
    418. image p561fig15.32 The SNc can generate both dopamine bursts and dips in response to rewards whose amplitude is unexpectedly large or small.
      || Inhibitory pathway: expectation magnitude. 1. If reward is greater than expected, a dopamine burst causes striosomal expectation to increase. 2. If reward is less than expected, a dopamine dip causes striosomal expectation to decrease. 3. This is a negative feedback control system for learning. Conditioned stimuli (CS)-> Striosomal cells <- dopamine | (-)-> SNc->.
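      Items 415-418 describe a negative feedback loop in which the striosomal expectation is trained by the very dopamine signal it helps to create. The few lines below are a deliberately bare sketch of that loop, with a scalar expectation and a fixed learning rate of my own choosing and none of the spectral timing machinery: bursts grow the expectation, dips shrink it, and learning stops once the expectation cancels the reward.

        # Minimal sketch (learning rate and reward values are illustrative)
        reward = 1.0
        expectation = 0.0                              # striosomal inhibition of SNc at reward time
        for trial in range(10):
            dopamine = reward - expectation            # burst if positive, dip if negative
            expectation += 0.5 * dopamine              # dopamine-gated weight change
            print(f"trial {trial}: dopamine={dopamine:+.3f} expectation={expectation:.3f}")
        reward = 0.4                                   # a reward smaller than expected ...
        print(f"smaller reward: dopamine={reward - expectation:+.3f}")   # ... produces a dopamine dip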
    419. image p569fig15.40 The direct and indirect basal ganglia circuits that control GO and STOP movement signals. See the text for details.
      || [Direct path GO(+), Indirect path STOP(+), dopamine from SNc(+-)]-> striatum. GO-> GPi/SNr-> Thalamus (VA/Vlo) <-> frontal cortex. STOP-> GPe <-> STN-> GPi/SNr. NAc-> GPi/SNr.
    420. image p375fig11.06 The contrast constraint on binocular fusion is realized by obligate cells in layer 3B of cortical area V1.
      || Model implements contrast constraint on binocular fusion (cf. "obligate" cells Poggio 1991). An ecological constraint on cortical development. [left, right] eye cart V1-[4 monocular simple, 3B binocular simple, complex2/3A] cells. Inhibitory cells (red) ensure that fusion occurs when contrasts in left and right eye are approximately equal.
    421. image p376fig11.09 The disparity filter in V2 helps to solve the correspondence problem by eliminating spurious contrasts using line-of-sight inhibition.
      || Model V2 disparity filter solves the correspondence problem. An ecological constraint on cortical development. [left, right] eye view: False matches (black) suppressed by line-of-sight inhibition (green lines). "Cells that fire together wire together".
    422. image p581fig16.06 The learning of hexagonal grid cell receptive fields as an animal navigates an open field is a natural consequence of simple trigonometric properties of the positions at which the firing of stripe cells that are tuned to different directions will co-occur.
      || The Trigonometry of spatial navigation. Coactivation of stripe cells.
    423. image p583fig16.09 The GRIDSmap model used algorithmically defined stripe cells to process realistic rat trajectories. The stripe cell outputs then formed inputs to the adaptive filter of a self-organizing map which learned hexagonal grid cell receptive fields.
      || GRIDSmap. Self-organizing map receives inputs from stripe cells and learns to respond to most frequent co-activation patterns. Stripe cells combine speed and head direction to create a periodic 1D position code. Virtual rat navigated using live rat trajectories from Moser Lab. Speed and head direction drives stripe cells.
    424. image p584fig16.11 GRIDSmap simulation of the learning of hexagonal grid fields. See the text for details.
      || Simulation results. Multiple phases per scale. response vs length scale (0.5m+).
    425. image p585fig16.13 Hexagonal grid cell receptive fields develop if their stripe cell directional preferences are separated by 7, 10, 15, 20, or random numbers of degrees. The number and directional selectivities of stripe cells can thus be chosen within broad limits without undermining grid cell development.
      ||
    426. image p585fig16.14 Superimposing firing of stripe cells whose directional preferences differ by 60 degrees supports learning hexagonal grid cell receptive fields in GRIDSmap.
      || GRIDSmap: from stripe cells to grid cells. Grid-cell Regularity from Integrated Distance through Self-organizing map. Superimposing firing of stripe cells oriented at intervals of 60 degrees. Hexagonal grid!
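      The claim in items 422-426 that stripe directions separated by 60 degrees superimpose into a hexagonally symmetric coactivation pattern can be checked in a few lines. The firing-rate model below is my own simplification, not GRIDSmap: each stripe cell fires as a cosine of position projected onto its preferred direction, and the summed firing of three cells at 0, 60, and 120 degrees is invariant under a 60-degree rotation, while a 0, 45, 90 degree set is not.

        # Minimal sketch (cosine stripe fields with a common spacing; parameters are illustrative)
        import numpy as np

        spacing = 1.0
        xs = np.linspace(-3.0, 3.0, 301)
        X, Y = np.meshgrid(xs, xs)

        def stripes(angles_deg, Xc, Yc):
            # summed firing of stripe cells, each periodic along its preferred direction
            return sum(np.cos(2 * np.pi * (Xc * np.cos(np.deg2rad(a)) +
                                           Yc * np.sin(np.deg2rad(a))) / spacing)
                       for a in angles_deg)

        c, s = np.cos(np.pi / 3), np.sin(np.pi / 3)   # sample the field on a grid rotated by 60 degrees
        Xr, Yr = c * X - s * Y, s * X + c * Y

        print("0/60/120 degree stripes give a six-fold symmetric pattern:",
              np.allclose(stripes((0, 60, 120), X, Y), stripes((0, 60, 120), Xr, Yr)))
        print("0/45/90 degree stripes do not:",
              np.allclose(stripes((0, 45, 90), X, Y), stripes((0, 45, 90), Xr, Yr)))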
    427. image p586fig16.15 Superimposing stripe cells oriented by 45 degrees does not lead to learning of rectangular grids in GRIDSmap, but it does in an oscillatory inference model.
      || Why is a hexagonal grid favored? Superimposing firing of stripe cells oriented at intervals of 45 degrees. Rectangular grid. This and many other possibilities do not happen in vivo. They do happen in the oscillatory inference model. How are they prevented in GRIDSmap?
    428. image p587fig16.17 A finer analysis of the 2D trigonometry of spatial navigation showed that both the frequency and amplitude of coactivations by stripe cells determine the learning of hexagonal grid fields.
      || A refined analysis: SOM amplifies most frequent and energetic coactivations (Pilly, Grossberg 2012). [linear track, 2D environment]. (left) Stripe fields separated by 90°. 25 coactivations by 2 inputs. (right) Stripe fields separated by 60°. 23 coactivations by 3 inputs.
    429. image p588fig16.18 Simulations of coordinated learning of grid cell receptive fields (second row) and unimodal place cell receptive fields (third row) by the hierarchy of SOMs in the GridPlaceMap model. Note the exquisite regularity of the hexagonal grid cell firing fields.
      || [stripe, grid, place] cells vs [spikes on trajectory, unsmoothed rate map, smoothed rate map].
    430. image p605fig16.39 Data showing effects of medial septum (MS) inactivation on grid cells and network theta oscillations in medial entorhinal cortex (MEC). (A) Examples of disruption in the spatial expression of the hexagonal grid structure for two grid cells (Brandon etal 2011). (B) Temporal reduction in the power and frequency of network theta oscillations (Koenig etal 2011). (C) Temporary reduction in the gridness score, mean firing rate, and spatial stability of grid cells (Koenig etal 2011).
      || Disruptive effects of Medial Septum inactivation in Medial Entorhinal Cortex (Brandon etal 2011; Koenig etal 2011). (A) Rate map [rate map, spatial autocorrelations, trajectory] vs [baseline, sub-sampled, medial septum inactivation, 3-6 hour recovery, 24 hour recovery], [rate map (Hz- m, p), spatial autocorrelations (gridness)][ 1.2, 7.2, 1.1; 0.25, 1.7, 0.6; 0.25, 2.5, -0.53; 0.7, 5.1, 0.55; 1.0, 5.3, 1.3; 2.1, 15, 0.19; 1.7, 12, 0.71; 1.7, 3.2, -0.22; 1.8, 9.1, 0.68; 2.5, 13, 0.46]. (B) [normalized power at 7-9 Hz, frequency (Hz)] vs 5-minute periods. (C) [mean gridness score (+-SEM), mean firing rate (% of baseline), mean correlation coeff (+-SEM)] vs 10-minute periods.
    431. p353 Chapter 10 Laminar computing by cerebral cortex - Towards a unified theory of biological and artificial intelligence
    432. image p011fig01.07 The choice of signal function f determines how an initial activity pattern will be transformed and stored in short-term memory (STM). Among [same, slower, faster]-than-linear signal functions, only the last one can suppress noise. It does so as it chooses the population that receives the largest input for storage, while suppressing the activities of all other populations, thereby giving rise to a winner-take-all choice.
      || initial pattern (xi(0) vs i):
      f | Xi(∞) = xi(∞)/sum[j: xj(∞)] | x(∞)
      linear | perfect storage of any pattern | amplifies noise (or no storage)
      slower-than-linear | saturates | amplifies noise
      faster-than-linear | chooses max [winner-take-all, Bayesian], categorical perception | suppresses noise, [normalizes, quantizes] total activity, finite state machine
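      The table above can be reproduced qualitatively by integrating a recurrent shunting on-center off-surround network and swapping the signal function. The sketch below uses equations and parameters of my own choosing (it is not a simulation from the book): with a faster-than-linear f(x) = x^2 the initially largest activity is chosen and the rest are suppressed, while a linear f stores the relative pattern.

        # Minimal sketch (the parameters A, B, dt and the initial pattern are illustrative)
        import numpy as np

        def store_in_stm(f, x0, A=0.05, B=1.0, dt=0.01, steps=20000):
            x = np.array(x0, dtype=float)
            for _ in range(steps):
                fx = f(x)
                dx = -A * x + (B - x) * fx - x * (fx.sum() - fx)   # on-center plus off-surround feedback
                x = np.clip(x + dt * dx, 0.0, B)
            return np.round(x, 3)

        x0 = [0.30, 0.25, 0.20, 0.15]                       # initial activity pattern after inputs shut off
        print(store_in_stm(lambda x: x ** 2, x0))           # faster-than-linear: winner-take-all choice
        print(store_in_stm(lambda x: x, x0))                # linear: the relative pattern is preserved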
    433. image p357fig10.04 Laminar Computing achieves its properties by computing in a new way that synthesizes the best properties of feedforward and feedback interactions, analog and digital computations, and preattentive and attentive learning. The property of analog coherence enables coherent groupings and decisions to form without losing sensitivity to the amount of evidence that supports them.
      || Laminar Computing: a new way to compute. 1. Feedforward and feedback: a) Fast feedforward processing when data are unambiguous (eg Thorpe etal), b) slower feedback chooses among ambiguous alternatives [self-normalizing property, real-time probability theory], c) A self-organizing system that trades certainty against speed: Goes beyond Bayesian models! 2. Analog and Digital: Analog Coherence combines the stability of digital with the sensitivity of analog. 3. Preattentive and Attentive Learning: Reconciles the differences of (eg) Helmholtz and Kanizsa, "A preattentive grouping is its own 'attentional' prime"
    434. image p230fig05.38 The Synchronous Matching ART, or SMART, model includes spiking neurons in a laminar cortical hierarchy. I developed SMART with my PhD student Massimiliano Versace. By unlumping LAMINART to include spiking neurons, finer details of neurodynamics, such as the existence of faster gamma oscillations during good enough matches, and slower beta oscillations during bad enough mismatches, could be shown as emergent properties of network interactions.
      || Second order thalamus -> specific thalamic nucleus -> Thalamic reticular nucleus -> neocortical laminar circuit [6ll, 6l, 5, 2/3, 1] -> Higher order cortex. Similar for First order thalamus -> First order cortex, with interconnection to Second order, nonspecific thalamic nucleus
    435. image p232fig05.40 Computer simulation of how the SMART model generates (a) gamma oscillations if a good enough match occurs, or (c) beta oscillations if a bad enough match occurs. See the text for details.
      || Brain oscillations during match/mismatch, data, simulation. (a) TD corticothalamic feedback increases synchrony (Sillito etal 1994) (b) Match increases γ oscillations (c) Mismatch increases θ,β oscillations
    436. image p233fig05.42 Mismatch-induced beta oscillations have been reported in at least three parts of the brain: V1, V4, and hippocampus. Although there may be other reasons for beta oscillations in the brain, those that are caused by a mismatch should be studied in concert with the gamma oscillations that occur during a good enough match. See the text for details.
      || Is there evidence for the [gamma, beta] prediction? Yes, in at least three parts of the brain, (Buffalo EA, Fries P, Landman R, Buschman TJ, Desimone R 2011, PNAS 108, 11262-11267) Does this difference in average oscillation frequencies in the superficial and deep layers reflect layer 4 reset? Superficial recording γ (gamma), Deep recording β (beta) (Berke etal 2008, hippocampus; Buschman and Miller 2009, FEF)
    437. image p296fig08.07 When two flashes turn on and off out of phase with the correct range of interstimulus intervals, and not too far from one another, then either beta motion or phi motion is perceived.
      || Beta and Phi motion percepts. Beta motion: percepts of continuous motion of a well-defined object across empty intervening space. Phi motion: sense of "pure" motion without a concurrent percept of moving object. (Exner 1875) http://www.yorku.ca/eye/balls.htm
    438. image p297fig08.08 When a second flash is more intense than the first flash, then apparent motion may occur from the second to the first flash.
      || Delta motion: motions from the second to the first flash. Data: (Kolers 1972; Korte 1915). Simulation: (Grossberg, Rudd 1992). This occurs when the luminance or contrast of the second flash is large compared to that of the first flash. Sustained and transient cells obey shunting dynamics whose averaging rates speed up with output intensity. The first flash to wane is the one that will be the source of the G-wave.
    439. image p340fig09.07 Log polar remapping from the retina to cortical area V1 and beyond converts expansion, translation, and spiral flows on the retina into parallel flows, with different orientations, on the cortical map.
      || Log polar remapping of optic flow. retina -> cortex. Any combination of expansion and circular motion centered on the fovea maps to cortex as a single direction. Retinal Cartesian coordinates (x,y) map to cortical polar coordinates (r,theta). This makes it easy to compute directional receptive fields in the cortex!
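      Item 439's claim that log polar remapping turns expansion and rotation about the fovea into parallel cortical flows can be verified in a few lines. The mapping below is just the bare (log r, theta) transform with sample points of my own choosing, not a model of V1: a 10% expansion shifts every point by the same amount along log r, and a small rotation shifts every point by the same amount along theta.

        # Minimal sketch (the sample ring of retinal points and the flow sizes are illustrative)
        import numpy as np

        def log_polar(x, y):
            return np.log(np.hypot(x, y)), np.arctan2(y, x)

        angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
        x, y = 2.0 * np.cos(angles), 2.0 * np.sin(angles)    # a ring of retinal points around the fovea

        u0, v0 = log_polar(x, y)
        u1, _ = log_polar(1.1 * x, 1.1 * y)                  # 10% expansion about the fovea
        print(np.round(u1 - u0, 4))                          # the same shift log(1.1) at every point

        rot = 0.1                                            # small rotation about the fovea
        xr, yr = np.cos(rot) * x - np.sin(rot) * y, np.sin(rot) * x + np.cos(rot) * y
        _, v2 = log_polar(xr, yr)
        print(np.round((v2 - v0 + np.pi) % (2.0 * np.pi) - np.pi, 4))   # the same shift 0.1 at every point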
    440. image p423fig12.20 The Spatial Pitch Network, or SPINET, model shows how a log polar spatial representation of the sound frequency spectrum can be derived from auditory signals occurring in time. The spatial representation allows the ARTSTREAM model to compute spatially distinct auditory streams.
      || SPINET model (Spatial Pitch Network) (Cohen, Grossberg, Wyse 1995). 1. input sound 2. Gamma-tone filter bank 3. Short-term average energy spectrum 4. MAP transfer function 5. On-center off-surround and rectification 6. Harmonic weighting 7. Harmonic summation and competition -> PITCH
    441. image p598fig16.34 The spiking GridPlaceMap model generates theta-modulated place and grid cell firing, unlike the rate-based model.
      || Theta-modulated cells in spiking model. [place, grid] cell vs [membrane potential (mV vs time), frequency vs inter-spike intervals (s), power spectra (normalized power vs frequency (Hz))].
    442. image p605fig16.39 Data showing effects of medial septum (MS) inactivation on grid cells and network theta oscillations in medial entorhinal cortex (MEC). (A) Examples of disruption in the spatial expression of the hexagonal grid structure for two grid cells (Brandon etal 2011). (B) Temporal reduction in the power and frequency of network theta oscillations (Koenig etal 2011). (C) Temporary reduction in the gridness score, mean firing rate, and spatial stability of grid cells (Koenig etal 2011).
      || Disruptive effects of Medial Septum inactivation in Medial Entorhinal Cortex (Brandon etal 2011; Koenig etal 2011). (A) Rate map [rate map, spatial autocorrelations, trajectory] vs [baseline, sub-sampled, medial septum inactivation, 3-6 hour recovery, 24 hour recovery], [rate map (Hz- m, p), spatial autocorrelations (gridness)][ 1.2, 7.2, 1.1; 0.25, 1.7, 0.6; 0.25, 2.5, -0.53; 0.7, 5.1, 0.55; 1.0, 5.3, 1.3; 2.1, 15, 0.19; 1.7, 12, 0.71; 1.7, 3.2, -0.22; 1.8, 9.1, 0.68; 2.5, 13, 0.46]. (B) [normalized power at 7-9 Hz, frequency (Hz)] vs 5-minute periods. (C) [mean gridness score (+-SEM), mean firing rate (% of baseline), mean correlation coeff (+-SEM)] vs 10-minute periods.
    443. Grossberg 2021 p229c2h0.60 SMART computer simulations demonstrate that a good enough match of a top-down expectation with a bottom-up feature pattern generates an attentive resonance during which the spikes of active cells synchronize in the gamma frequency range of 20-70 Hz (Figure 5.40). Many labs have reported a link between attention and gamma oscillations in the brain, including two articles published in 2001, one from the laboratory of Robert Desimone when he was at the National Institute of Mental Health in Bethesda (Fries, Reynolds, Rorie, Desimone 2001), and the other from the laboratory of Wolf Singer in Frankfurt (Engel, Fries, Singer 2001). You'll note that Pascal Fries participated in both studies, and is an acknowledged leader in neurobiological studies of gamma oscillations; eg (Fries 2009). ..."
    444. p618 Chapter 17 A universal development code - Mental measurements embody universal laws of cell biology and physics
    445. image p025fig01.16 (left panel) The main processing stages of the Cognitive-Emotional-Motor (CogEM) model have anatomical interpretations in terms of sensory cortex, amygdala, and prefrontal cortex. Chapter 13 will describe in greater detail how CS cues activate invariant object categories in the sensory cortex, value categories in the amygdala, and object-value categories in the prefrontal cortex, notably the orbitofrontal cortex. The amygdala is also modulated by internal drive inputs like hunger and satiety. (right panel) Anatomical data support this circuit, as do many neurophysiological data.
      || drive -> amygdala -> prefrontal cortex <-> sensory cortex -> amygdala. [visual, somatosensory, auditory, gustatory, olfactory] cortex -> [amygdala, Orbital Prefrontal Cortex]. amygdala -> Lateral Prefrontal Cortex
    446. image p058fig02.04 Serial learning paradigm: Learning the temporal order of events by practicing them in the order that they occur in time.
      || Learning a global arrow in time. How do we learn to encode the temporal order of events in LTM? serial learning. [w=intra, W=inter]trial interval. "... data about serial verbal learning (Figure 2.4) seemed to suggest that events can go "backwards in time". ..."
    447. image p059fig02.05 Bowed serial position curve. This kind of data emphasizes the importance of modelling how our brains give rise to our minds using nonlinear systems of differential equations.
      || Effects of [inter, intra]trial intervals (Hovland 1938). # of errors vs list position. [w (sec), W (sec)] = (2 6) (4 6) (2 126) (4 126). Nonoccurrence of future items reduces the number of errors in response to past items. These data require a real-time theory for their explanation! that is, DIFFERENTIAL equations.
    448. image p059fig02.06 The bowed serial position curve illustrates the sense in which "events can go backwards in time" during serial learning.
      || Bow due to backward effect in time. If the past influenced the future, but not conversely: # of errors vs list position; Data (Hovland, Hull, Underwood, etc).
    449. image p071fig02.16 To solve the noise-saturation dilemma, individual neurons in a network that is receiving distributed spatial patterns of inputs need to remain sensitive to the ratio of the input to them divided by all the inputs in that spatial pattern. Although the inputs are delivered to a finite number of neurons, the input and activity patterns are drawn continuously across the cells for simplicity.
      || Noise-Saturation Dilemma. [Ii, xi] vs t. [Input, Activity] pattern [small -> noise, large -> saturation]. Problem: remain sensitive to input RATIOS θi = Ii / sum[j: Ij] as total input I = sum[j: Ij] -> ∞. Many kinds of data exhibit sensitivity to ratios of inputs.
    450. image p073fig02.19 Computing with cells: infinity does not exist in biology!
      || Computing in a bounded activity domain, Gedanken experiment (Grossberg 1970). Vm sub-areas [xm, B - xm], I(all m), m=[1, i, B].
      B | excitable sites
      xi(t) | excited sites (activity, potential)
      B - xi(t) | unexcited sites
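      A worked version of this Gedanken experiment, in my own notation (consistent with the B, xi(t), B - xi(t) entries above, though the exact form Grossberg uses varies across chapters): if unexcited sites B - xi are turned on by input Ii while excited sites xi decay at rate A and are turned off by the off-surround inputs, mass action gives the shunting on-center off-surround equation

        \frac{dx_i}{dt} = -A x_i + (B - x_i) I_i - x_i \sum_{j \ne i} I_j .

      Setting dx_i/dt = 0 gives the steady state

        x_i = \frac{B I_i}{A + I} = \frac{B \theta_i I}{A + I}, \qquad I = \sum_j I_j, \quad \theta_i = \frac{I_i}{I},

      so each activity stays in the bounded range [0, B] no matter how large the total input I grows, while remaining sensitive to the input ratios θi of the previous figure. This is how the noise-saturation dilemma is resolved without any activity going to infinity.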
    451. image p082fig02.37 My models begin with behavioral data, since brains are designed to achieve behavioral success. The text explains how models evolve in stages, through a process of successive refinements, or unlumpings. These unlumpings together carry out a kind of conceptual evolution, leading to models that can explain and predict ever larger psychological and neurobiological databases.
      || Modelling method and cycle.
      Behavioral data -(art of modeling)-> Design principles <- Neural data <-(brain predictions)- Mathematical model and analysis -(behavioral predictions)-> Behavioural data
      Operationalizes "proper level of abstraction"
      Operationalizes that you cannot "derive a brain" in one step.
    452. image p085fig02.38 Our models have been used in many large-scale applications to engineering and technology. Linking brain to behavior explains how brain mechanisms give rise to psychological functions, and do so autonomously. The combination of mechanism, function, and autonomy helps to explain their value in helping to solve outstanding problems in technology.
      || Modeling method and cycle.
      Behavioral data -(art of modeling)-> Design principles <- Neural data <-(brain predictions)- Mathematical model and analysis -(behavioral predictions)-> Behavioural data
      Technology: Mathematical model and analysis <-> Technological applications
      At every stage, spin off new model designs and mechanisms to technologists who need autonomous intelligent applications.
    453. image p134fig04.14 The kinds of displays that Michael Paradiso and Ken Nakayama used to catch filling-in "in the act" and which Karl Arrington then simulated using the Grossberg and Todorovic 1988 model.
      || Experiments on filling-in. Catching "filling-in" in the act (Paradiso, Nakayama 1991). (Arrington 1994 Vision Research 34, 3371-3387) simulated these data using the model of Grossberg and Todorovic 1988.
    454. image p145fig04.23 If end gaps were not closed by end cuts, then color would flow out of every line end!
      || A perceptual disaster in the feature contour system. feature contour, line boundary. input -> [boundary, surface]. boundary -> surface. Color would flow out of every line end! as it does during neon color spreading.
    455. image p151fig04.29 Experimental evidence of bipole cells in cortical area V2 was reported by Von der Heydt, Peterhans, and Baumgartner (1984).
      || Bipoles: first neurophysiological evidence (V2) (von der Heydt, Peterhans, Baumgartner 1984, Peterhans, von der Heydt 1988). (Grossberg 1984) prediction.
      Ordering: Stimulus (S), probe location *, response of cells in V2?
      ...(S)*... | YES
      ...*...(S) | NO
      (S)...*... | NO
      (S)...*...(S) | YES
      (S)...*... (more contrast) | NO
      (S)...*.....(S) | YES
      Evidence for receptive field.
    456. image p151fig04.30 Anatomical evidence for long-range horizontal connections has also been reported, as illustrated by the example above from (Bosking etal 1997).
      || Anatomy: horizontal connections (V1) (Bosking etal 1997). tree shrew. [10, 20]*[20, 10, 0, -10, -20] (degrees).
    457. image p152fig04.31 The predicted bipole cell receptive field (upper left corner) has been supported by both neurophysiological data and psychophysical data, and used in various forms by many modelers. See the text for details.
      || Bipoles through the ages. (Grossberg 1984; Grossberg, Mingolla 1985). (Field, Hayes, Hess 1993) "association field". (Heitger, von der Heydt 1993). (Williams, Jacobs 1997). cf. "relatability" geometric constraints on which contours get to group (Kellman & Shipley 1991). Also "tensor voting" (Ullman, Zucker, Mumford, Guy, Medioni, ...).
    458. image p159fig04.36 Graffiti art by Banksy exploits properties of amodal boundary completion and spatial impenetrability.
      || p159c1h0.75 perceptual psychologist Nava Rubin "... When the wall is smooth, Banksy leaves the regions previously covered by stencil unpainted, relying on observers' perception to segregate figural regions from the (identically colored) background. But when the wall is patterned with large-scale luminance edges - eg due to bricks - Banksy takes the extra time to fill in unpainted figural regions with another color (Rubin 2015). ..."
    459. image p162fig04.38 How long-range cooperation among bipole cells and short-range competition by hypercomplex cells work together to generate the inverted-U in boundary strength that is found in the data of Figure 4.37 (right panel).
      || Cooperation and competition during grouping.
      few lines | wide spacing, inputs outside spatial range of competition, more inputs cause higher bipole activity
      more lines | narrower spacing, slightly weakens net input to bipoles from each inducer
      increasing line density | causes inhibition to reduce net total input to bipoles
    460. image p163fig04.39 A schematic of the LAMINART model that explains key aspects of laminar visual cortical anatomy and dynamics. LGN -> V1 [6, 4, 2/3] -> V2 [6, 4, 2/3]
      || p163c1h0.6 "... The first article about laminar computing ... proposed how the laminar cortical model could process 2D pictures using bottom-up filtering and horizontal bipole grouping interactions (Grossberg, Mingolla, Ross 1997). In 1999, I was able to extend the model to also include top-down circuits for expectation and attention (Grossberg 1999)(right panel). Such a synthesis of laminar bottom-up, horizontal, and top-down circuits is characteristic of the cerebral cortex (left panel). I called it LAMINART because it began to show how properties of Adaptive Resonance Theory, or ART, notably the ART prediction about how top-down expectations and attention work, are realized by identified cortical cells and circuits. You can immediately see from the schematic laminar circuit diagram ... (right panel) that circuits in V2 seem to repeat circuits in V1, albeit with a larger spatial scale, despite the fact that V1 and V2 carry out different functions. How this anatomical similarity can coexist with functional diversity will be clarified in subsequent sections and chapters. It enables different kinds of biological intelligence to communicate seamlessly while carrying out their different psychological functions. ..."
    461. image p165fig04.41 The Kanizsa-Minguzzi ring. See the text for details.
      || p165c1h0.6 "... (left panel), the annulus is divided by two line segments into annular sectors of unequal area. Careful viewing shows that the smaller sector looks a little brighter than the larger one. (Kanizsa, Minguzzi 1986) noted that "this unexpected effect is not easily explained. In fact, it cannot be accounted for by any simple psychological mechanism such as lateral inhibition or frequency filtering. Furthermore, it does not seem obvious to invoke organizational factors, like figural belongingness or figure-ground articulation."". p165c2h0.35 "... (Grossberg, Todorovic 1988). Our main claim is that the two radial lines play two roles, one in the formation of boundaries with which to contain the filling-in process, and the other as a source of feature contour signals that are filled-in within the annular regions to create a surface brightness percept. ..."
    462. image p232fig05.40 Computer simulation of how the SMART model generates (a) gamma oscillations if a good enough match occurs, or (c) beta oscillations if a bad enough match occurs. See the text for details.
      || Brain oscillations during match/mismatch, data, simulation. (a) TD corticothalamic feedback increases synchrony (Sillito etal 1994) (b) Match increases γ oscillations (c) Mismatch increases θ,β oscillations
    463. image p232fig05.41 (a)-(c). The sequence of interlaminar events that SMART predicts during a mismatch reset. (d) Some of the compatible neurophysiological data.
      || Mismatch causes layer 5 dendritic spikes that trigger reset. (a) Arousal causes increase in nonspecific thalamic nuclei firing rate and layer 5 dendritic and later somatic spikes (Larkum and Zhu 2002, Williams and Stuart 1999) (b) Layer 5 spikes reach layer 4 via layer 6i and inhibitory neurons (Lund and Boothe 1975, Gilbert and Wiesel 1979) (c) habituative neurotransmitters in layer 6i shift the balance of active cells in layer 4 (Grossberg 1972, 1976) (d) Dendritic stimulation fires layer 5 (Larkum and Zhu 2002) stimulation apical dendrites of nonspecific thalamus
    464. image p252fig06.01 A surface-shroud resonance begins to form when the surface representations of objects bid for spatial attention. In addition to these topographic excitatory inputs, there is long-range inhibition of the spatial attention cells that determines which inputs will attract spatial attention.
      || Bottom-up spatial attention competition. [more, less] luminous perceptual surfaces -> competition -> spatial attention
    465. image p253fig06.02 After bottom-up surface inputs activate spatial attentional cells, they send top-down topographic excitatory signals back to the surface representations. This recurrent shunting on-center off-surround network contrast enhances larger attentional activities while approximately normalizing the total spatial attentional activity. A surface-shroud resonance hereby forms that selects an attentional shroud, enhances the perceived contrast of the attended surface (light blue region), and maintains spatial attention on it.
      || Surface-shroud resonance. perceptual surfaces -> competition -> spatial attention. (Carrasco, Penpeci-Talgar, and Eckstein 2000, Reynolds and Desimone 2003)
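      The contrast-enhancing, normalizing behavior described in items 464-465 can be sketched with a small recurrent shunting on-center off-surround field. The signal function and parameters below are my own choices, not the ARTSCAN circuit: with persistent surface inputs, the field gives the most active surface the largest share of a bounded total activity, a toy analogue of how a shroud concentrates spatial attention.

        # Minimal sketch (signal function and parameters are illustrative)
        import numpy as np

        def attend(I, A=1.0, B=1.0, dt=0.01, steps=3000):
            f = lambda x: x ** 2 / (0.25 + x ** 2)     # sigmoid signal function
            x = np.zeros_like(I, dtype=float)
            for _ in range(steps):
                fx = f(x)
                dx = -A * x + (B - x) * (fx + I) - x * ((fx.sum() - fx) + (I.sum() - I))
                x = np.clip(x + dt * dx, 0.0, B)
            return np.round(x, 3)

        surfaces = np.array([0.9, 0.6, 0.3])           # more and less luminous surface inputs
        print(attend(surfaces))                        # the most active surface wins the largest share of a bounded total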
    466. image p257fig06.05 A curve tracing task with monkeys was used by Roelfsema, Lamme, and Spekreijse in 1998 to demonstrate how spatial attention can flow along object boundaries. See the text for details.
      || Attention flows along curves: Roelfsema etal 1998: Macaque V1. fixation (300ms) -> stimulus (600ms RF - target curve, distractor) -> saccade. Crossed-curve condition: attention flows across junction between smoothly connected curve segments, Gestalt good continuation
    467. image p258fig06.06 Neurophysiological data and simulation of how attention can flow along a curve. See the text for details.
      || Simulation of Roelfsema etal 1998, data & simulation. Attention directed only to far end of curve. Propagates along active layer 2/3 grouping to distal neurons.
    468. image p265fig06.13 The basal ganglia gate perceptual, cognitive, emotional, and more processes through parallel loops.
      || [motor, oculomotor, dorsolateral, ventral-orbital, anterior cingulate] vs. [Thalamus, pallidum-subs, nigra, Striatum, Cortex]
    469. image p267fig06.14 Feedback from object surfaces to object boundaries uses surface contours. This feedback assures complementary consistency and enables figure-ground separation. A corollary discharge of the surface contours can be used to compute salient object feature positions.
      || Perceptual consistency and figure-ground separation.
    470. image p271fig06.17 Persistent activity in IT cells is just what is needed to enable view-invariant object category learning by ARTSCAN to be generalized to [view, position, size]-invariant category learning by positional ARTSCAN, or pARTSCAN. See the text for details.
      || Persistent activity in IT. Physiological data show that persistent activity exists in IT (Fuster and Jervey 1981, Miyashita and Chang 1988, Tomita etal 1999). Adapted from (Tomita etal 1999 Nature)
    471. image p273fig06.20 pARTSCAN can simulate the IT cell recoding that Li and DiCarlo reported in their swapping experiments because the swapping procedure happens without causing a parietal reset burst to occur. Thus the originally activated invariant category remains activated and can get associated with the swapped object features.
      || Simulation of Li and DiCarlo swapping data. data (Li and DiCarlo 2008), model (Cao, Grossberg, Markowitz 2011). normalized response vs. exposure (swaps and/or hours)
    472. image p274fig06.21 pARTSCAN can also simulate the trade-off in IT cell responses between position invariance and selectivity that was reported by Zoccolan etal 2007. This trade-off limits the amount of position invariance that can be learned by a cortical area like V1 that is constrained by the cortical magnification factor.
      || Trade-off in IT cell response properties. Inferotemporal cortex cells with greater position invariance respond less selectively to natural objects. invariance-tolerance, selectivity-sparseness. data (Zoccolan etal 2007) model (Grossberg, Markowitz, Cao 2011). position tolerance (PT, degrees) vs sparseness (S)
    473. image p275fig06.23 Data from (Akrami etal 2009) and our simulation of it. See the text for details.
      || IT responses to image morphs. data vs model
    474. image p284fig07.02 Psychophysical data (top row) and simulation (bottom row) of how persistence decreases with flash illuminance and duration.
      || Persistence data and simulations. (Francis, Grossberg, Mingolla 1994 Vision Research, 34, 1089-1104). Persistence decreases with flash illuminance and duration (Bowen, Pola, Matin 1974; Breitmeyer 1984; Coltheart 1980). Higher luminance or longer duration habituates the gated dipole ON channel more. Causes larger and faster rebound in the OFF channel to shut persisting ON activity off.
    475. image p285fig07.03 Persistence decreases with flash illuminance and duration due to the way in which habituative transmitters regulate the strength of the rebound in response to offset of a stimulating input, and how this rebound inhibits previously activated bipole cells.
      || Persistence data and simulations (Francis, Grossberg, Mingolla 1994 Vision Research, 34, 1089-1104). Persistence decreases with flash illuminance and duration. Horizontal input excites a horizontal bipole cell, which supports persistence. Offset of the horizontal input causes a rebound of activity in the vertical pathway, which inhibits the horizontal bipole cell, thereby terminating persistence.
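      The habituative-rebound explanation in items 474-475 can be sketched with a generic gated dipole ON channel. The transmitter equation and all parameters below are my own choices, not the published simulation: stronger or longer ON inputs deplete the ON transmitter more, so at input offset the less-depleted OFF channel rebounds by a larger margin, and in the model a larger rebound resets persisting boundary activity sooner, hence less persistence.

        # Minimal sketch (a generic gated dipole ON channel; all parameters are illustrative)
        def habituated_gate(signal, duration, A=0.1, B=1.0, dt=0.001):
            z = 1.0                                             # transmitter starts fully accumulated
            for _ in range(int(round(duration / dt))):
                z += dt * (A * (1.0 - z) - B * signal * z)      # accumulation minus activity-gated depletion
            return z

        def off_rebound(intensity, duration, tonic=0.2):
            z_on = habituated_gate(tonic + intensity, duration)   # ON channel sees tonic arousal plus the flash
            z_off = habituated_gate(tonic, duration)              # OFF channel sees only the tonic arousal
            return tonic * (z_off - z_on)                       # at flash offset the less-depleted OFF gate wins by about this margin

        for intensity, duration in [(0.5, 0.2), (1.0, 0.2), (1.0, 0.5)]:
            print(f"intensity={intensity}, duration={duration}s -> rebound ~ {off_rebound(intensity, duration):.3f}")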
    476. image p286fig07.04 Illusory contours persist longer than real contours because real contours have more inducers whose rebound at contour offset can cause faster boundary reset. Illusory contours also take longer to form than real contours, which explains the increasing portion of the curve.
      || Persistence data and simulations (Meyer, Ming 1988; Reynolds 1981). Increasing portion of curve is due to formation time of the illusory contour. Longer persistence is due to fewer bottom-up inducers of an illusory contour that has the same length as a real contour: only illuminance-derived edges generate reset signals. When bottom-up inducers are inhibited by OFF cell rebounds, their offset gradually propagates to the center of the illusory contour.
    477. image p286fig07.05 This figure shows the propagation through time of illusory contour offset from the rebounded cells that got direct inputs to the center of the contour.
      || Persistence data and simulations. Illusory contours persist longer than real contours (Meyer, Ming 1988; Reynolds 1981). When bottom-up inducers are inhibited by OFF cell rebounds, their offset gradually propagates to the center of the illusory contour.
    478. image p287fig07.06 The relative durations of persistence that occur due to an adaptation stimulus of the same or orthogonal orientation follow from the properties of the habituative gated dipoles that are embedded in the boundary completion system.
      || Persistence data and simulations. Change in persistence depends on whether adaptation stimulus has same or orthogonal orientation as test grating (Meyer, Lawson, Cohen 1975). If adaptation stimulus and test stimulus have the same orientation, they cause cumulative habituation, which causes a stronger reset signal, hence less persistence. When they are orthogonal, the competition on the ON channel is less, hence more persistence.
    479. image p287fig07.07 Persistence increases with distance between a target and a masking stimulus due to weakening of the spatial competition in the first competitive stage of hypercomplex cells.
      || Persistence data and simulations. Persistence increases with distance between a target and a masking stimulus (Farrell, Pavel, Sperling 1990). There is less spatial competition from the masker to the target when they are more distant, hence the target is more persistent.
    480. image p297fig08.08 When a second flash is more intense than the first flash, then apparent motion may occur from the second to the first flash.
      || Delta motion: motions from the second to the first flash. Data: (Kolers 1972; Korte 1915). Simulation: (Grossberg, Rudd 1992). This occurs when the luminance or contrast of the second flash is large compared to that of the first flash. Sustained and transient cells obey shunting dynamics whose averaging rates speed up with output intensity. The first flash to wane is the one that will be the source of the G-wave.
    481. image p297fig08.09 Simulation of motion in opposite directions that is perceived when two later flashes occur on either side of the first flash.
      || Split motion. Data: (H.R. Silva 1926), Simulation: (Grossberg, Rudd 1992)
    482. image p298fig08.10 Simulation of the motion speed-up that is perceived when flash duration decreases.
      || "The less you see it, the faster it moves". Data: (Giaschi, Anstis 1989), Simulation: (Grossberg, Rudd 1992). ISI = 0, flash duration decreases; SOA = constant, flash duration decreases
    483. image p304fig08.22 Data (top image) and simulation (bottom image) of Korte's laws. The laws raise the question of how ISIs in the hundreds of milliseconds can cause apparent motion.
      || Korte's Laws, Data: (Korte 1915) Simulation: (Francis, Grossberg 1996)
    484. image p311fig08.30 The data of (Castet etal 1993) in the left image was simulated in the right image by the 3D FORMOTION model that I developed with my PhD student Jonathan Chey. These data provide insight into how feature tracking signals propagate from the ends of a line to its interior, where they capture consistent motion directional signals and inhibit inconsistent ones.
      || Solving the aperture problem. A key design problem: How do amplified feature tracking signals propagate within depth to select the correct motion directions at ambiguous positions? This propagation from feature tracking signals to the line interior determines perceived speed in Castet etal data, which is why speed depends on line tilt and length. Data: (Castet etal 1993), Simulation: (Chey etal 1997)
    485. image p319fig08.38 The neurophysiological data from MT (left image) confirms the prediction embodied in the simulation of MT (right image) concerning the fact that it takes a long time for MT to compute an object's real direction of motion.
      || Solving the aperture problem takes time. MT Data (Pack, Born 2001), MT simulation (Chey, Grossberg, Mingolla 1997)
    486. image p333fig08.58 Neurophysiological data (left image) and simulation (right image) of LIP data during correct trials on the RT task. See the text for details.
      || LIP responses during RT task correct trials (Roitman, Shadlen 2002). More coherence in favored direction causes faster cell activation. More coherence in opposite direction causes faster cell inhibition. Coherence stops playing a role in the final stages of LIP firing.
    487. image p334fig08.59 Neurophysiological data (left column) and simulations (right column) of LIP responses for the FD task during both [correct, error] trials. See the text for details.
      || LIP responses for the FD task during both [correct, error] trials (Shadlen, Newsome 2001). LIP encodes the perceptual decision regardless of the true direction of the dots. Predictiveness of LIP responses on error trials decreases with increasing coherence.
    488. image p334fig08.60 Behavioral data (left image) and simulation (right image) about accuracy in both the RT and FD tasks. See text for details
      || Behavioral data: % correct vs % coherence (Mazurek etal 2003; Roitman, Shadlen 2002). More coherence in the motion causes more accurate decisions. RT task accuracy at weaker coherence levels is slightly better than FD task accuracy.
    489. image p335fig08.61 Behavioral data (left image) and simulation (right image) about speed in correct and error trials of the RT task. See text for details.
      || Behavioral data: speed, correct and error trials (RT task) (Roitman, Shadlen 2002). More coherence in the motion causes faster reaction time.
    490. image p335fig08.62 More remarkable simulation fits (right column) to LIP neurophysiology data (left column) about where and when to move the eyes.
      || LIP encodes not only where, but also when, to move the eyes. ...No Bayes (Roitman, Shadlen 2002). Firing rate (sp/s) vs time (ms). Slope of firing rate (sp/s^2) vs % correct.
    491. image p342fig09.11 Psychophysical data (left panel) and computer simulation (right column) of the importance of efference copy in real movements. See the text for details.
      || Heading: move to wall and fixate stationary object (adapted from Warren, Hannon 1990). Inaccurate for simulated eye rotation, accurate for real eye rotation, need confirmation by efference copy!
    492. image p343fig09.13 When one scans the three different types of pears in the left image, as illustrated by the jagged blue curve with red movement end positions, and transforms the resulting retinal images via the cortical magnification factor, or log polar mapping, the result is the series of images in the right column. How do our brains figure out from such confusing data which views belong to which pear?
      || View-invariant object learning and recognition. Three pears: Anjou, Bartlett, Comice. Which is the Bartlett pear? During unsupervised scanning and learning about the world, no one tells the brain what views belong to which objects while it learns view-invariant object categories. Cortical magnification in V1.
    493. image p349fig09.20 Using virtual reality displays (left image), (Fajen, Warren 2003) collected data (right two images) about how observers avoid obstacles (open circular disks) as a function of their distance and angular position as they navigate towards a fixed goal (x). These data illustrate how goals act as attractors while obstacles act as repellers.
      || Steering from optic flow (Fajen, Warren 2003). goals are attractors, obstacles are repellers. Damped spring model explains human steering data.
    494. image p352fig09.26 The final stage of the model computes a beautiful expansion optic flow that permits an easy estimate of the heading direction, with an accuracy that matches that of human navigators.
      || The model generates accurate heading (Warren, Hannon 1990; Royden, Crowell, Banks 1994). Maximally active MSTd cell = heading estimate. Accuracy matches human data. Random dots [mean +-1.5°, worst +-3.8°], Random dots with rotation [accurate with rotations <1°/s, rotation increases, error decreases], OpenGL & Yosemite benchmark +-1.5°, Driving video +-3°.
    495. image p356fig10.03 Laminar computing achieves at least three basic properties of visual processing that have analogs in all biologically intelligent behaviors. These properties may be found in all cortical circuits in specialized form.
      || What does Laminar Computing achieve? 1. Self-stabilizing development and learning; 2. Seamless fusion of a) pre-attentive automatic bottom-up processing, b) attentive task-selective top-down processing; 3. Analog coherence: Solution of Binding Problem for perceptual grouping without loss of analog sensitivity. Even the earliest visual cortical stages carry out active adaptive information processing: [learn, group, attention]ing
    496. image p357fig10.04 Laminar Computing achieves its properties by computing in a new way that synthesizes the best properties of feedforward and feedback interactions, analog and digital computations, and preattentive and attentive learning. The property of analog coherence enables coherent groupings and decisions to form without losing sensitivity to the amount of evidence that supports them.
      || Laminar Computing: a new way to compute. 1. Feedforward and feedback: a) Fast feedforward processing when data are unambiguous (eg Thorpe etal), b) slower feedback chooses among ambiguous alternatives [self-normalizing property, real-time probability theory], c) A self-organizing system that trades certainty against speed: Goes beyond Bayesian models! 2. Analog and Digital: Analog Coherence combines the stability of digital with the sensitivity of analog. 3. Preattentive and Attentive Learning: Reconciles the differences of (eg) Helmholtz and Kanizsa, "A preattentive grouping is its own 'attentional' prime"
    497. image p360fig10.09 Perceptual grouping is carried out in layer 2/3 by long-range horizontal excitatory recurrent connections, supplemented by short-range disynaptic inhibitory connections that together realize the bipole grouping properties that are diagrammed in Figure 10.10.
      || Grouping starts in layer 2/3. LGN-> 6-> 4-> 2/3: 1. Long-range horizontal excitation links collinear, coaxial receptive fields (Gilbert, Wiesel 1989; Bosking etal 1997; Schmidt etal 1997) 2. Short-range disynaptic inhibition of target pyramidal via a pool of interneurons (Hirsch, Gilbert 1991) 3. Unambiguous groupings can form and generate feedforward outputs quickly (Thorpe etal 1996).
    498. image p361fig10.10 Bipole grouping is achieved by long-range horizontal recurrent connections that also give rise to short-range inhibitory interneurons which inhibit nearby bipole cells as well as each other.
      || Bipole property controls perceptual grouping. Collinear input on both sides. Excitatory inputs summate. Inhibitory inputs normalize, Shunting inhibition! Two-against-one. Cell is excited.
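      The "two-against-one" bipole rule can be illustrated with a minimal Python sketch (illustrative constants, not the book's equations): a single flank is normalized away by the shunting inhibitory interneurons it excites, while collinear inputs on both flanks summate and beat that inhibition, and direct bottom-up input always drives the cell:

      def bipole_response(bottom_up, left_flank, right_flank, threshold=0.4):
          """Illustrative bipole cell: fires with direct bottom-up input, or when
          BOTH flanks carry collinear grouping signals. Each flank also drives
          shunting inhibitory interneurons, so a single flank is normalized away
          while two flanks summate and beat the inhibition (two-against-one).
          Constants are tuned only to illustrate the rule."""
          excitation = bottom_up + left_flank + right_flank
          # interneurons driven by the long-range inputs also inhibit each other,
          # so their net inhibition saturates (normalizes)
          inhibition = (left_flank + right_flank) / (0.5 + left_flank + right_flank)
          return max(excitation - inhibition - threshold, 0.0)

      for case, (b, l, r) in {
          "bottom-up only":        (1.0, 0.0, 0.0),   # real contour: fires
          "one flank only":        (0.0, 1.0, 0.0),   # no outward completion: silent
          "both flanks (inward)":  (0.0, 1.0, 1.0),   # illusory contour: fires
      }.items():
          print(case, "->", round(bipole_response(b, l, r), 2))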
    499. image p367fig10.15 Data (left column) and simulation (right column) of how attention prevents a masking stimulus from inhibiting the response to the on-center of the cell from which the recording was made.
      || Attention protects target from masking stimulus (Reynolds etal 1999; Grossberg, Raizada 2000).
    500. image p367fig10.16 Neurophysiological data (left image) and simulation (right image) of how a low-contrast target can be facilitated if it is surrounded by a pair of collinear flankers, and suppressed by them if it has high contrast.
      || Flankers can enhance or suppress targets (Polat etal 1998; Grossberg, Raizada 2000). target alone, target + flankers, flankers alone.
    501. image p368fig10.17 Neurophysiological data (left image) and simulation (right image) showing that attention has a greater effect on low contrast than high contrast targets.
      || Attention has greater effect on low contrast targets (DeWeerd etal 1999; Raizada, Grossberg 2001). Threshold increase (deg) vs Grating contrast (%), [no, with] attention
    502. image p368fig10.18 Neurophysiological data (left image) and simulation (right image) of relative on-cell activities when the input to that cell may also be surrounded by iso-orientation or perpendicular textures.
      || Texture reduces response to a bar: iso-orientation suppression (Knierim, van Essen 1992), perpendicular suppression (Raizada, Grossberg 2001)
    503. image p369fig10.19 Data from (Watanabe etal 2001) showing perceptual learning of the coherent motion direction, despite the lack of extra-foveal attention and awareness of the moving stimuli.
      || Unconscious perceptual learning of motion direction, % correct for two tests, compared to chance level results.
    504. image p393fig11.31 (Todd, Akerstrom 1987) created a series of 2D images from discrete black patches on a white disk and showed how the perceived depth varies with the factors summarized in the figure. The LIGHTSHAFT model quantitatively simulated their data.
      || Factors determining depth-from-texture percept. Perceived depth varies with texture element width, but only when elements are elongated and sufficiently aligned with one another to form long-range groupings. Data of (Todd, Akerstrom 1987) simulated by the LIGHTSHAFT model of (Grossberg, Kuhlmann 2007). [HP, LP, CCE, CCS, RO]
    505. image p399fig11.39 Simulation of the eye rivalry data of (Lee, Blake 1999).
      || [Binocular, [left, right] eye] activity
    506. image p402fig11.43 A pair of disparate images of a scene from the University of Tsukuba Multiview image database.
      || input [left, right]
    507. image p407fig12.03 Neurophysiological data showing how motor cortical cells code different vectors that are sensitive to both the direction of the commanded movement and its length.
      || (a) Single primary motor cortex neuron, onset of movement -> on..., radial architecture... (b) Motor cortex neuronal population, radial architecture...
    508. image p409fig12.04 (top half) Neurophysiological data of vector cell responses in motor cortex. (bottom half) VITE model simulations of a simple movement in which the model's difference vector simulates the data as an emergent property of network interactions.
      || Neurophysiological data. VITE model [Present Position vector, Difference vector, Outflow velocity vector, go signal].
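      The VITE quantities named above (Present Position Vector, Difference Vector, outflow velocity, GO signal) can be sketched minimally in Python; the GO-signal shape and constants are illustrative, not fitted to the data in the figure:

      import numpy as np

      def vite_reach(target, present, duration=1.0, dt=0.001, gamma=30.0):
          """Minimal VITE sketch. Assumed forms: dV/dt = gamma*(-V + T - P) for the
          Difference Vector, dP/dt = G(t)*max(V, 0) componentwise for the Present
          Position Vector, with a simple ramping GO signal G(t) (illustrative)."""
          T = np.asarray(target, float)       # Target Position Vector
          P = np.asarray(present, float)      # Present Position Vector
          V = np.zeros_like(P)                # Difference Vector
          for k in range(int(duration / dt)):
              G = 20.0 * (k * dt)             # volitional GO signal (ramp)
              V += dt * gamma * (-V + (T - P))
              velocity = G * np.maximum(V, 0.0)   # outflow velocity command
              P += dt * velocity
          return P, V

      # Reach toward a 2-D target: P converges on T as V collapses toward zero.
      P, V = vite_reach(target=[1.0, 0.5], present=[0.0, 0.0])
      print("final position:", np.round(P, 3), " residual difference vector:", np.round(V, 3))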
    509. image p410fig12.06 Monkeys seamlessly transformed a movement initiated towards the 2 o'clock target into one towards the 10 o'clock target when the latter target was substituted 50 or 100 msec after activation of the first target light.
      ||
    510. image p414fig12.11 Neurophysiological data from cortical areas 4 and 5 (every other column) and simulations thereof (other columns) during a reach.
      || activation vs time. (a) area 4 phasic RT (IFV) (b) area 4 tonic (OPV) (c) area 4 phasic-tonic (OFPV) (d) area 4 phasic MT (DVV) (e) area 5 phasic (DV) (f) area 5 tonic (PPV)
    511. image p424fig12.21 One of the many types of data about pitch processing that are simulated by the SPINET model. See the text for details.
      || Pitch shifts with component shifts (Patterson, Wightman 1976; Schouten 1962). Pitch vs lowest harmonic number.
    512. image p425fig12.23 ARTSTREAM simulations of the auditory continuity illusion and other streaming properties (left column, top row). When two tones are separated by silence (Input), a percept of silence also separates them in a spectral-pitch resonance. (left column, bottom row). When two tones are separated by broadband noise, the percept of tone continues through the noise in one stream (stream 1) while the remainder of the noise occurs in a different stream (stream 2). (right column) Some of the other streaming properties that have been simulated by the ARTSTREAM model.
      || Auditory continuity does not occur without noise. Auditory continuity in noise. Other simulated streaming data.
    513. image p428fig12.25 (left architecture) Auditory-articulatory feedback loop whereby babbled sounds activate learning in an imitative map that is later used to learn to reproduce the sounds of other speakers. An articulatory-to-auditory expectation renders learning possible by making the auditory and motor data dimensionally consistent, as in the motor theory of speech. (right architecture) Parallel streams in the ARTSPEECH model for learning speaker-independent speech and language meaning, including a mechanism for speaker normalization (right cortical stream) and for learning speaker-dependent vocalic qualities (left cortical stream).
      || left: Speaker-dependent vocalic qualities; right: Speaker-independent speech and language meaning
    514. image p432fig12.28 (left image) The SpaN model simulates how spatial representations of numerical quantities are generated in the parietal cortex. (right image) Behavioral numerosity data and SpaN model simulations of it.
      || (Left) preprocessor-> spatial number map-> Comparison wave. (Right) data axis: number of lever presses; model axis: node position in the spatial number axis
    515. image p437fig12.32 Data from a free recall experiment illustrate the bowed serial position curve.
      || Serial position function for free recall Data: (Murdock 1962 JEP 64, 482-488). % correct vs position of word on a 40-word list. Primacy gradient can be a mixture of STM and LTM read-out.
    516. image p437fig12.33 Item and Order working memory models explain free recall data, as well as many other psychological and neurobiological data, by simulating how temporal series of events are stored as evolving spatial patterns of activity at content-addressable item categories. The categories with the largest activities are rehearsed first, and self-inhibit their activity as they do so in order to prevent them from being rehearsed perseveratively. The laws whereby the items are stored in working memory obey basic design principles concerning how list categories, or chunks, of sequences of stored items can be stably remembered.
      || Working memory models: item and order, or competitive queuing (Grossberg 1978; Houghton 1990; Page, Norris 1998). Event sequence in time stored as an evolving spatial pattern of activity. Primacy gradient of working memory activation stores correct temporal order at content-addressable cells. The maximally activated cell population is performed next when a rehearsal wave is turned on. Output signal from chosen cell population inhibits its own activity to prevent perseveration: inhibition of return. Iterate until entire sequence is performed.
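      The readout cycle just described (store a primacy gradient, perform the most active item, self-inhibit, iterate) can be illustrated with a minimal Python sketch; the stored gradient is imposed directly here rather than emerging from shunting network dynamics, and the decay factor is illustrative only:

      def store_primacy_gradient(items, top=1.0, decay=0.85):
          """Item-and-Order storage sketch: successive items get smaller stored
          activities (a primacy gradient). The decay factor is illustrative; the
          book derives such gradients from shunting network dynamics."""
          return {item: top * (decay ** k) for k, item in enumerate(items)}

      def rehearse(working_memory):
          """Competitive-queuing readout: when the rehearsal wave turns on, the most
          active item is performed next, then self-inhibits (inhibition of return)
          so it is not perseveratively repeated; iterate until the store is empty."""
          wm = dict(working_memory)
          performed = []
          while wm:
              winner = max(wm, key=wm.get)   # most active item wins the competition
              performed.append(winner)
              del wm[winner]                 # self-inhibition of the chosen item
          return performed

      wm = store_primacy_gradient(["A", "B", "C", "D"])
      print(wm)             # activities decrease with list position
      print(rehearse(wm))   # -> ['A', 'B', 'C', 'D'] : correct temporal order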
    517. image p443fig12.41 Neurophysiological data from the Averbeck etal sequential copying experiments show the predicted primacy gradient in working memory and the self-inhibition of activity as an item is stored. When only the last item remains stored, it has the highest activity because it has been freed from inhibition by earlier items.
      || Neurophysiology of sequential copying
    518. image p444fig12.42 The LIST PARSE laminar cortical model of working memory and list chunking that I published with Lance Pearson in 2008 simulated the Averbeck etal data in Figure 12.41, as in the left column of the figure. It also simulated cognitive data about working memory storage by human subjects. See the text for details.
      || LIST PARSE: Laminar cortical model of working memory and list chunking (Grossberg, Pearson 2008). Simulates data about: [immediate, delayed, continuous] distractor free recall; immediate serial recall; and variable-speed sequential performance of motor acts. [velocity, acceleration] vs time (ms) from recall cue.
    519. image p446fig12.44 (left column, top row) LIST PARSE can model linguistic data from human subjects. In this figure, model parameters are fixed to enable a close fit to data about error-type distributions in immediate free recall experiments, notably transposition errors. (right column, top row) Simulation and data showing bowing of the serial position curve, including an extended primacy gradient. (left column, bottom row) The simulation curve overlays data about list length effects, notably the increasing recall difficulty of longer lists during immediate serial recall (ISR). (right column, bottom row) Simulation (bottom image) and data (top image) of the limited temporal extent for recall.
      || (1. TL) Error-type distributions in immediate serial recall (Henson etal 1996). % occurrence vs serial position. Graph convention: Data- dashed lines; Simulations- solid lines. Six letter visual ISR. Order errors- transpositions of neighboring items are the most common. Model explanation: Noisy activation levels change relative order in primacy gradient. Similar activation of neighboring items most susceptible to noise. Model parameters fitted on these data. (2. TR) Bowing of serial position curve (Cowan etal 1999). % correct vs serial position. Auditory ISR with various list lengths (graphs shifted rightward): For [, sub-]span lists- extended primacy, with one (or two) item recency; Auditory presentation- enhanced performance for last items. LIST PARSE: End effects- first and last items half as many members; Echoic memory- last presented item retained in separate store. (3. BL) List length effects, circles (Crannell, Parrish 1968), squares (Baddeley, Hitch 1975), solid line- simulation. % list correct vs list length. Variable list length ISR: longer lists are more difficult to recall. LIST PARSE: More items- closer activation levels and lower absolute activity level with enough inputs; Noise is more likely to produce order errors; Activity levels more likely to drop below threshold. (4. BR) Limited temporal extent for recall (Murdock 1961). % recalled vs retention interval (s). ISR task with distractor-filled retention intervals (to prevent rehearsal): Increasing retention interval- decreases probability of recalling list correctly; Load dependence- longer list more affected by delays; Performance plateau- subjects reach apparent asymptote. LIST PARSE: Increase convergence of activities with time; loss of order information.
    520. image p447fig12.45 (left column) LIST PARSE simulations of the proportion of order errors as a function of serial position for 6 item lists with (a) an extended pause of 7 time units between the third and fourth items, and (b) pauses of 5 time units (solid curve) and 10 time units (dashed curve) between all items. (right column) Simulations (solid curves) and data (dashed curves) illustrating close model fits in various immediate free recall tasks.
      || (Left) Temporal grouping and presentation variability. Temporal grouping: Inserting an extended pause leads to inter-group bowing; Significantly different times of integration and activity levels across pause, fewer interchanges. (Right) Immediate free recall, and [delayed, continuous] distractor free recall. Overt rehearsal IFR task with super-span (ie 20 item) lists: Extended recency- even more extended with shorter ISIs; Increased probability of recall with diminished time from last rehearsal; Early items in list rehearsed most. LIST PARSE (unique) for long lists: Incoming items form a recency gradient; Rehearsal (re-presentation) based upon level of activity.
    521. image p452fig12.48 (left column) In experiments of (Repp etal 1978), the silence duration between the words GRAY and SHIP was varied, as was the duration of the fricative noise in S, with surprising results. (right column) The red arrow directs our attention to surprising perceptual changes as silence and noise durations increase. See the text for details.
      || Perceptual integration of acoustic cues, data (Repp etal 1978). GRAY-> silence duration-> SHIP (noise duration from start of word). Noise duration vs silence duration: GRAY SHIP <-> [GREAT SHIP <-> GRAY CHIP] <-> GREAT CHIP.
    522. image p453fig12.49 The ARTWORD model that I published in 2000 with my PhD student Christopher Myers simulates data such as the (Repp etal 1978) data in Figure 12.48. See the text for details.
      || ARTWORD model (Grossberg, Myers 2000). Input phonetic features-> Phonemic item working memory-> Masking Field unitized lists-> Automatic gain control-> Phonemic item working memory. [habituative gate, adaptive filter]s.
    523. image p465fig12.63 Neurophysiological data (left image) and lisTELOS simulation (right image) showing how microstimulation biases saccadic performance order but not the positions to which the saccades will be directed. See the text for details.
      || Saccade trajectories converge to a single location in space. Microstimulation biased selection so saccade trajectories converged toward a single location in space. [Data, model] contra <-> Ipsi (msec)
    524. image p468fig12.65 Linguistic properties of the PHONET model and some of the data that it simulates. The upper left image summarizes the asymmetric transient-to-sustained gain control that helps to create invariant intraword ratios during variable-rate speech. The lower left image summarizes the rate-dependent gain control of the ARTPHONE model that creates rate-invariant working memory representations in response to sequences of variable-rate speech. The right image summarizes the kind of paradoxical VC-CV category boundary data of (Repp 1980) that ARTPHONE simulates. See the text for details.
      || (left upper) [transient, sustained] [working memory, filter, category]. (left lower) phone inputs-> [input rate estimate, features], Features w <- habituative transmitter gates -> categories-> rate invariant phonetic output, input rate estimate-> gain control-> [features, categories] rate-dependent integration of categories and features. (right) % 2-stop vs VC-CV silent interval (msec): [ib-ga, ib-ba, iga, iba].
    525. image p469fig12.66 (left column) A schematic of how preserving relative duration, as in the first and third images, of consonant and vowel pairs can preserve a percept, in this case of /ba/, but not doing so, as in the first and second images, can cause a change in percept, as from /ba/ to /wa/, as in the data of (Miller, Liberman 1979) that PHONET simulates. (right column) Changing frequency extent can also cause a /ba/ - /wa/ transition, as shown in data of (Schwab, Sawusch, Nusbaum 1981) that PHONET also simulates.
      || (left image) Maintaining relative duration as speech speeds up preserves percept (Miller, Liberman 1979). frequency vs time- [/ba/, /wa/, /ba/] (right image) Changing frequency extent causes /ba/-/wa/ transition (Schwab, Sawusch, Nusbaum 1981). frequency vs time- [/ba/, /wa/] Dt extent.
    526. image p473fig12.69 Error rate and mean reaction time (RT) data from the lexical decision experiments of (Schvaneveldt, McDonald 1981). ART Matching Rule properties explain these data in (Grossberg, Stone 1986).
      || (left) Error rate vs type of prime [R, N, U], [non,] word. (right) Mean RT (msec) vs type of prime [R, N, U], [non,] word.
    527. image p474fig12.70 The kind of model macrocircuit that was used in (Grossberg, Stone 1986) to explain lexical decision task data.
      || inputs-> A1 <-> A2 iconic sensory features <-> A3 item and order in sensory STM <-> A4 list parsing in STM (masking field) <-> A5 semantic network (self-feedback). [A4, A5] <-> V* visual object recognition system. M1-> [outputs, A1]. M1 <-> M2 iconic motor features <-> M3 item and order in motor STM. A2-> M2. A3-> M3.
    528. image p476fig12.71 Word frequency data of (Underwood, Freund 1970) that were explained in (Grossberg, Stone 1986).
      || percent errors vs frequency of old words [L-H to H-H, L-L to H-L].
    529. image p484fig13.04 The top-down feedback from the orbitofrontal cortex closes a feedback loop that supports a cognitive-emotional resonance. If this resonance can be sustained long enough, it enables us to have feelings at the same time that we experience the categories that caused them.
      || Cognitive-Emotional resonance. Basis of "core consciousness" and "the feeling of what happens". (Damasio 1999) derives heuristic version of CogEM model from his clinical data. Drive-> amygdala-> prefrontal cortex-> sensory cortex, resonance around the latter 3. How is this resonance maintained long enough to become conscious?
    530. image p485fig13.06 (left column) An inverted-U occurs in conditioned reinforcer strength as a function of the ISI between the CS and the US. Why is learning attenuated at 0 ISI? (right column) Some classical conditioning data that illustrate the inverted-U in conditioning as a function of the ISI.
      || InterStimulus Interval (ISI) effect. Data from (Smith etal 1969; Schneiderman, Gormezano 1964).
    531. image p490fig13.15 The CogEM circuit is an ancient design that is found even in mollusks like Aplysia. See the text for details.
      || Aplysia (Buonomano, Baxter, Byrne, Neural Networks 1990; Grossberg, Behavioral and Brain Sciences 1983). Facilitator neuron ~ drive representation.
    532. image p504fig13.31 Behavioral contrast can occur during reinforcement learning due to decreases in either positive or negative reinforcers. See Figure 13.32 for illustrative operant conditioning data.
      || Behavioral contrast: rebounds! Shock level vs trials. 1. A sudden decrease in frequency or amount of food can act as a negative reinforcer: Frustration. 2. A sudden decrease in frequency or amount of shock can act as a positive reinforcer: Relief.
    533. image p523fig14.03 (a) The MOTIVATOR neural model generalizes CogEM by also including the basal ganglia. It can hereby explain and simulate complementary functions of the amygdala and basal ganglia (SNc) during conditioning and learned performance. The basal ganglia generate Now Print signals in response to unexpected rewards. These signals modulate learning of new associations in many brain regions. The amygdala supports motivated attention to trigger actions that are expected to occur in response to conditioned or unconditioned stimuli. Object Categories represent visual or gustatory inputs in anterior inferotemporal (ITA) and rhinal (RHIN) cortices, respectively. Value Categories represent the value of anticipated outcomes on the basis of hunger and satiety inputs, in amygdala (AMYG) and lateral hypothalamus (LH). Object-Value Categories resolve the value of competing perceptual stimuli in medial (MORB) and lateral (ORB) orbitofrontal cortex. The Reward Expectation Filter detects the omission or delivery of rewards using a circuit that spans ventral striatum (VS), ventral pallidum (VP), striosomal delay (SD) cells in the ventral striatum, the pedunculopontine nucleus (PPTN) and midbrain dopaminergic neurons of the substantia nigra pars compacta/ventral tegmental area (SNc/VTA). The circuit that processes CS-related visual information (ITA, AMYG, ORB) operates in parallel with a circuit that processes US-related visual and gustatory information (RHIN, AMYG, MORB). (b) Reciprocal adaptive connections between hypothalamus and amygdala enable amygdala cells to become learned value categories. The bottom region represents hypothalamic cells, which receive converging taste and metabolite inputs whereby they become taste-drive cells. Bottom-up signals from activity patterns across these cells activate competing value categories, or US Value Representations, in the amygdala. A winning value category learns to respond selectively to specific combinations of taste-drive activity patterns and sends adaptive top-down priming signals back to the taste-drive cells that activated it. CS-activated conditioned reinforcer signals are also associatively linked to value categories. Adaptive connections end in (approximately) hemidiscs. See the text for details.
      ||
    534. image p533fig14.09 Search data and ARTSCENE Search simulations of them in each pair of images from (A) to (F). See the text for details.
      || 6*[data vs simulation], [Response time (ms) versus epoch].
    535. image p542fig15.04 Conditioning data from (Smith 1968; Millenson etal 1977). The former shows the kind of Weber Law and inverted U that were simulated in Figure 15.3. The latter shows that, if there are two ISIs during an experiment, then the animals learn to adaptively time their responses with two properly scaled Weber laws.
      || (left) One ISI (Smith 1968) [mean membrane extension (mm) versus time after CS onset (msec)]. (right) Two ISIs (Millenson etal 1977) [200, 100] msec CS test trials, [mean momentary CS amplitude (mm) vs time after CS onset (msec)]. (bottom) Conditioned eye blinks, made with nictitating membrane and/or eyelid, are adaptively timed: peak closure occurs at expected time(s) of arrival of the US following the CS and obeys a Weber Law.
    536. image p543fig15.05 Simulation of conditioning with two ISIs that generate their own Weber Laws, as in the data shown in Figure 15.4.
      || Learning with two ISIs: simulation: R = sum[all: f(xi)*yi*xi] vs msec. Each peak obeys Weber Law! strong evidence for spectral learning.
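      A minimal Python sketch of the spectral-timing idea behind these simulations (illustrative constants; the sketch assumes the gated spectral readout is weighted by adaptive weights zi learned at the US times): cells with a spectrum of reaction rates produce habituatively gated signals that peak at different delays, and sampling them at two ISIs yields a readout with two appropriately timed peaks:

      import numpy as np

      def spectral_timing(isi_list, rates=np.linspace(1.5, 12.0, 60),
                          dt=0.005, duration=1.5, n_trials=20):
          """Minimal spectral-timing sketch (illustrative constants). Each spectral
          cell i has its own rate r_i: dx_i/dt = r_i*(-x_i + (1 - x_i)), a sigmoid
          signal f(x_i), and a habituative gate dy_i/dt = 0.5*(1 - y_i) - 8*f*y_i.
          Adaptive weights z_i are assumed to grow in proportion to the gated
          signal f(x_i)*y_i sampled at each US time; the adaptively timed readout
          is R(t) = sum_i f(x_i)*y_i*z_i."""
          n_steps = int(duration / dt)
          z = np.zeros_like(rates)

          def run_trial():
              x = np.zeros_like(rates)
              y = np.ones_like(rates)
              gated = np.zeros((n_steps, rates.size))
              for k in range(n_steps):
                  f = x**8 / (0.45**8 + x**8)                 # sigmoid signal
                  gated[k] = f * y                            # gated spectral signal
                  x += dt * rates * (-x + (1.0 - x))          # CS on for whole trial
                  y += dt * (0.5 * (1.0 - y) - 8.0 * f * y)   # habituative gate
              return gated

          for _ in range(n_trials):      # learning: sample the spectrum at US times
              gated = run_trial()
              for isi in isi_list:
                  z += 0.05 * gated[int(round(isi / dt))]

          R = run_trial() @ z            # test-trial readout
          return np.arange(n_steps) * dt, R

      t, R = spectral_timing(isi_list=[0.2, 0.7])
      for isi in (0.2, 0.7):
          w = (t > isi - 0.15) & (t < isi + 0.15)
          print(f"local max of R within 0.15 s of ISI={isi}s at t={t[w][R[w].argmax()]:.2f}s")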
    537. image p556fig15.24 (a) Data showing normally timed responding (solid curve) and short latency responses after lesioning cerebellar cortex (dashed curve). (b) computer simulation of short latency response after ablation of model cerebellar cortex.
      ||
    538. image p559fig15.28 Neurophysiological data (left column) and model simulations (right column) of SNc responses. See the text for details.
      || membrane potential vs time
    539. image p573fig16.01 The experimental chamber (A) and neurophysiological recordings from a rat hippocampus (B) that led to the discovery of place cells. See the text for details.
      ||
    540. image p574fig16.02 Neurophysiological recordings of 18 different place cell receptive fields. See the text for details.
      ||
    541. image p575fig16.03 As a rat navigates in its experimental chamber (black curves), neurophysiological recordings disclose the firing patterns (in red) of (a) a hippocampal place cell and (b) an entorhinal grid cell.
      ||
    542. image p582fig16.08 Some experimental evidence for stripe-like cell receptive fields has been reported. The band cells posited by Neil Burgess also exhibit the one-dimensional firing symmetry of stripe cells, but are modeled by oscillatory interference. See the text for details.
      || Evidence for stripe-like cells. Entorhinal cortex data (Sargolini, Fyhn, Hafting, McNaughton, Witter, Moser, Moser 2006; Krupic, Burgess, O'Keefe 2012). Similar hypothetical construct used by Interference model but position is decoded by grid cell oscillatory interference- Band Cells (Burgess 2008).
    543. image p589fig16.19 Neurophysiological data showing the smaller dorsal grid cell scales and the larger ventral grid cell scales.
      || Spatial scale of grid cells increases along the MEC dorsoventral axis (Hafting etal 2005; Sargolini etal 2006; Brun etal 2008). [dorsal (left), ventral (right)] cart [rate map, autocorrelogram]. How does the spatial scale increase along the MEC dorsoventral axis?
    544. image p593fig16.26 Data (left column) and simulations (right column) of the gradient of increasing grid cell spacing along the dorsoventral axis of MEC.
      || Gradient of grid spacing along dorsoventral axis of MEC (Brun etal 2008). data-[Distance (m?), Median grid spacing (m?)] simulations-[Grid spacing (cm), Grid spacing (cm)] vs response rate.
    545. image p594fig16.27 Data (left column) and simulations (right column) of the gradient of increasing grid cell field width along the dorsoventral axis of MEC.
      || Gradient of field width along dorsoventral axis of MEC (Brun etal 2008). data-[Distance (m?), Width autocorr peak (m?)] simulations-[Grid field width (cm), Width autocorr peak (cm)] vs response rate.
    546. image p595fig16.28 Data (left column) and simulations (right column) about peak and mean grid cell response rates along the dorsoventral axis of MEC.
      || Peak and mean rates at different locations along DV axis of MEC (Brun etal 2008). Peak rate (Hz) vs [data- DV quarter, simulations- Response rate].
    547. image p596fig16.29 Data (top row) and simulations (bottom row) showing decreasing frequency of subthreshold membrane potential oscillations along the DV axis of MEC.
      || Subthreshold membrane potential oscillations at different locations along DV axis of MEC (Giocomo etal 2020; Yoshida etal 2011). Data [oscillations (Hz) vs distance from dorsal surface (mm) @[-50, -45] mV, Frequency (Hz) vs [-58, -54, -50] mV]. Simulations MPO frequency (Hz) vs [response, habituation] rate.
    548. image p596fig16.30 Data (top row) and simulations (bottom row) of spatial phases of learned grid and place cells.
      || Spatial phases of learned grid and place cells (Hafting etal 2005). Data: Cross-correlogram of rate maps of two grid cells; Distribution of phase difference: distance from origin to nearest peak in cross-correlogram. Simulations: Grid cell histogram of spatial correlation coefficients; Place cell histogram of spatial correlation coefficients.
    549. image p597fig16.31 Data (a) and simulations (b-d) about multimodal place cell receptive fields in large spaces. The simulations are the result of learned place fields.
      || Multimodal place cell firing in large spaces (Fenton etal 2008; Henriksen etal 2010; Park etal 2011). Number of cells (%) vs Number of place fields. [2, 3] place fields, 100*100 cm space.
    550. image p597fig16.32 Data (top row) and simulations (bottom row) about grid cell development in juvenile rats. Grid score increases (a-b and d), whereas grid spacing remains fairly flat (c and e).
      || Model fits data about grid cell development (Wills etal 2010; Langston etal 2010). Data: [Gridness, grid score, inter-field distance (cm)]. Simulations: [Gridness score, Grid spacing (cm)] vs trial.
    551. image p598fig16.33 Data (top row) and simulations (bottom row) of changes in place cell properties in juvenile rats, notably about spatial information (a,c) and inter-trial stability (b,d).
      || Model fits data about grid cell development (Wills etal 2010). [Data, Simulation] vs [spatial information, inter-trial stability]. x-axis [age (postnatal day), trial].
    552. image p599fig16.35 Data (a) and simulations (b,c) about anatomically overlapping grid cell modules. (a) shows the anatomical distribution of grid cells belonging to different modules in one animal. DV location (mm) vs postrhinal border. (b) shows the simulated distribution of learned grid cell spacings from two stripe cell scales. frequency (%) vs grid spacing (cm). mu = [1, 0.6]. (c) shows what happens when half the cells respond with one rate and half another rate. (d) shows the same with three rates. (e-g) show spatial maps and autocorrelograms of grid cells that arise from the different rates in (d). [rate map, autocorrelogram] vs [score [1.07, 0.5, 0.67], spacing (cm) [23.58, 41, 63.64]].
      ||
    553. image p602fig16.37 Data showing the effect of hippocampal inactivation by muscimol on grid cell firing before, during, and six hours after the muscimol, reading from left to right.
      || Hippocampal inactivation disrupts grid cells (Bonnevie etal 2013). muscimol inactivation. spikes on trajectory: [before, after min [6-20, 20-40, 40-60, 6h]]. rate map (Hz) [18.6, 11.4, 9.5, 6.7, 10.8]. spatial autocorrelogram g=[1.12, 0.05, -0.34, 0.09, 1.27].
    554. image p603fig16.38 Role of hippocampal feedback in maintaining grid fields. (a) Data showing the effect of hippocampal inactivation before and during muscimol inhibition of hippocampal cells, as in Figure 16.37. (b) Model simulation with normal grid fields. (c) Model simulation that emulates the effect of hippocampal inhibition on grid fields.
      || (a) Data: hippocampal inactivation [before, after] cart [spikes on trajectory (p: [18.6, 6.7] Hz), spatial autocorrelogram (g= [1.12, 0.09])]. (b) Model: noise-free path integration, [spikes on trajectory (p: 14.56 Hz), rate map, spatial autocorrelogram (g= 1.41), dynamic autocorrelogram (g=0.6)]. (c) Model: noisy path integration + non-specific tonic inhibition, [spikes on trajectory (p: 11.33 Hz), rate map, spatial autocorrelogram (g= 0.05), dynamic autocorrelogram (g=0.047)].
    555. image p605fig16.39 Data showing effects of medial septum (MS) inactivation on grid cells and network theta oscillations in medial entorhinal cortex (MEC). (A) Examples of disruption in the spatial expression of the hexagonal grid structure for two grid cells (Brandon etal 2011). (B) Temporary reduction in the power and frequency of network theta oscillations (Koenig etal 2011). (C) Temporary reduction in the gridness score, mean firing rate, and spatial stability of grid cells (Koenig etal 2011).
      || Disruptive effects of Medial Septum inactivation in Medial Entorhinal Cortex (Brandon etal 2011; Koenig etal 2011). (A) Rate map [rate map, spatial autocorrelations, trajectory] vs [baseline, sub-sampled, medial septum inactivation, 3-6 hour recovery, 24 hour recovery], [rate map (Hz- m, p), spatial autocorrelations (gridness)][ 1.2, 7.2, 1.1; 0.25, 1.7, 0.6; 0.25, 2.5, -0.53; 0.7, 5.1, 0.55; 1.0, 5.3, 1.3; 2.1, 15, 0.19; 1.7, 12, 0.71; 1.7, 3.2, -0.22; 1.8, 9.1, 0.68; 2.5, 13, 0.46]. (B) [normalized power at 7-9 Hz, frequency (Hz)] vs 5-minute periods. (C) [mean gridness score (+-SEM), mean firing rate (% of baseline), mean correlation coeff (+-SEM)] vs 10-minute periods.
    556. image p607fig16.40 Effects of medial septum (MS) inactivation on grid cells. (a) Each row shows data and different data-derived measures of grid cell responsiveness, starting from the left with the baseline response to the middle column with maximal inhibition. (b) Data showing the temporary reduction in the gridness scores during MS inactivation, followed by recovery. (c) Simulation of the collapse in gridness, achieved by reduction in cell response rates to mimic reduced cholinergic transmission. (d,e) Simulations of the reduction in gridness scores in (d) by reduction of cell response rates, in (e) by changing the leak conductance. See the text for details.
      ||
    557. Grossberg 2021 p229c2h0.60 SMART computer simulations demonstrate that a good enough match of a top-down expectation with a bottom-up feature pattern generates an attentive resonance during which the spikes of active cells synchronize in the gamma frequency range of 20-70 Hz (Figure 5.40). Many labs have reported a link between attention and gamma oscillations in the brain, including two articles published in 2001, one from the laboratory of Robert Desimone when he was at the National Institute of Mental Health in Bethesda (Fries, Reynolds, Rorie, Desimone 2001), and the other from the laboratory of Wolf Singer in Frankfurt (Engel, Fries, Singer 2001). You'll note that Pascal Fries participated in both studies, and is an acknowledged leader in neurobiological studies of gamma oscillations; eg (Fries 2009). ..."
    558. Grossberg 2021 p081c2h0.66 sub-section: How does one evolve a computational brain?
      The above discussion illustrates that no single step of theoretical derivation can derive a whole brain. One needs a method for deriving a brain in stages, or cycles, much as evolution has incrementally discovered ever more complex brains over many thousands of years. The following theoretical method has been successfully applied many times since I first used it in 1957. It embodies a kind of conceptual evolutionary process for deriving a brain.

      Because "brain evolution needs to achieve behavioural success", we need to start with data that embodiey indices of behavioral success. That is why, as illustrated in Figure 2.37 Modelling method and cycle, one starts with Behavioral Data from scores or hundreds of psychological experiments. These data are analyszed as the result of an individual adapting autonomously in real time to a changing world. This is the Arty of Modeling. It requires that one be able to infer from static data curves the dynamical processes that control individual behaviors occuring in real time. One of the hardest things that I teach to my students to do is "how to think in real time" to be able to carry out this speculative leap.

      Properly carried out, this analysis leads to the discovery of new Design Principles that are embodied by these behavioral processes. The Design Principles highlight the functional meaning of the data, and clarify how individual behaviors occurring in real time give rise to these static data curves.

      These principles are then converted into the simplest Mathematical Model using a method of minimal anatomies, which is a form of Occam's Razor, or principle of parsimony. Such a mathematical model embodies the psychological principles using the simplest possible differential equations. By "simplest" I mean that, if any part of the derived model is removed, then a significant fraction of the targeted data could no longer be explained. One then analyzes the model mathematically and simulates it on the computer, showing along the way how variations on the minimal anatomy can realize the design principles in different individuals or species.

      This analysis has always provided functional explanations and Behavioral Predictions for much larger behavioral data bases than those used to discover the Design Principles. The most remarkable fact is, however, that the behaviorally derived model always looks like part of a brain, thereby explaining a body of challenging Neural Data and making novel Brain Predictions.

      The derivation hereby links mind to brain via psychological organizational principles and their mechanistic realization as a mathematically defined neural network. This startling fact is what I first experienced as a college Freshman taking Introductory Psychology, and it changed my life forever.

      I conclude from having had this experience scores of times since 1957 that brains look the way they do because they embody a natural computational realization for controlling autonomous adaptation in real-time to a changing world. Moreover, the Behavior -> Principles -> Model -> Neural derivation predicts new functional roles for both known and unknown brain mechanisms by linking the brain data to how it helps to ensure behavioral success. As I noted above, the power of this method is illustrated by the fact that scores of these predictions about brain and behavior have been supported by experimental data 5-30 years after they were first published.

      Having made the link from behavior to brain, one can then "burn the candle from both ends" by pressing both top-down from Behavioral Data and bottom-up from Brain Data to clarify what the model can and cannot explain at its current stage of derivation. No model can explain everything. At each stage of development, the model can cope with certain environmental challenges but not others. An important part of the mathematical and computational analysis is to characterize the boundary between the known and unknown; that is, which challenges the model can cope with and which it cannot. The shape of this boundary between the known and unknown helps to direct the theorist's attention to new design principles that have been omitted from previous analysis.

      The next step is to show how these new design principles can be incorporated into the evolved model in a self-consistent way, without undermining its previous mechanisms, thereby leading to a progressively more realistic model, one that can explain and predict ever more behavioral and neural data. In this way, the model undergoes a type of evolutionary development, as it becomes able to cope behaviorally with environmental constraints of ever increasing subtlety and complexity. The Method of Minimal Anatomies may hereby be viewed as a way to functionally understand how increasingly demanding combinations of environmental pressures were incorporated into brains during the evolutionary process.

      If such an Embedding Principle cannot be carried out - that is, if the model cannot be unlumped or refined in a self-consistent way - then the previous model was, put simply, wrong, and one needs to figure out which parts must be discarded. Such a model is, as it were, an evolutionary dead end. Fortunately, this has not happened to me since I began my work in 1957 because the theoretical method is so conservative. No theoretical addition is made unless it is supported by multiple experiments that cannot be explained in its absence. Where multiple mechanistic instantiations of some Design Principles were possible, they were all developed in models to better understand their explanatory implications. Not all of these instantiations could survive the pressure of the evolutionary method, but some always could. As a happy result, all earlier models have been capable of incremental refinement and expansion.

      The cycle of model evolution has been carried out many times since 1957, leading today to increasing numbers of models that individually can explain and predict psychological, neurophysiological, anatomical, biophysical, and even biochemical data. In this specific sense, the classical mind-body problem is being incrementally solved.

      Howell: bold added for emphasis.
      (keys : Principles-Principia, behavior-mind-brain link, brain evolution, cycle of model evolution)
      see also quotes: Charles William Lucas "Universal Force" and others (not retyped yet).
    559. p190 Howell: [neural microcircuits, modal architectures] used in ART -
      bottom-up filters | top-down expectations | purpose
      instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star often used to learn the adaptive weights; top-down expectations select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features (see the learning-rule sketch after this table)
      LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
      EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
      auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
      auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
      visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown.
        red - cognitive-emotional dynamics
        green - working memory dynamics
        black - see [bottom-up, top-down] lists
      EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
      Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

      background colours in the table signify:
      white | general microcircuit: a possible component of ART architecture
      lime green | sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
      light blue | post-perceptual cognition?
      pink | "the feeling of what happens" and knowing what event caused that feeling
    560. image p085fig02.38 Our models have been used in many large-scale applications to engineering and technology. Linking brain to behavior explains how brain mechanisms give rise to psychological functions, and do so autonomously. The combination of mechanism, function, and autonomy helps to explain their value in helping to solve outstanding problems in technology.
      || Modeling method and cycle.
      Behavioral data -(art of modeling)-> Design principles <- Neural data <-(brain predictions)- Mathematical model and analysis -(behavioral predictions)-> Behavioral data
      Technology: Mathematical model and analysis <-> Technological applications
      At every stage, spin off new model designs and mechanisms to technologists who need autonomous intelligent applications.
    561. image p225fig05.33 Some early ARTMAP benchmark studies. These successes led to the use of ARTMAP, and many variants that we and other groups have developed, in many large-scale applications in engineering and technology that has not abated even today.
      || see Early ARTMAP benchmark studies
    562. image p490fig13.15 The CogEM circuit is an ancient design that is found even in mollusks like Aplysia. See the text for details.
      || Aplysia (Buonomano, Baxter, Byrne, Neural Networks 1990; Grossberg, Behavioral and Brain Sciences 1983). Facilitator neuron ~ drive representation.
    563. image p563fig15.33 The basal ganglia gate neural processing in many parts of the brain. The feedback loop through the lateral orbitofrontal cortex (blue arrow, lateral orbitofrontal) is the one that MOTIVATOR models.
      || MOTIVATOR models one of several thalamocortical loops through basal ganglia (Adapted from Fundamental Neuroscience. 2002 Copyright Elsevier). [cortex-> striatum-> pallidum S. nigra-> thalamus] vs [motor, oculomotor, dorsolateral prefrontal, lateral orbitofrontal, anterior cingulate]. thalamus-> [striatum, cortex].
    564. image p563fig15.34 The colored regions are distinct parts of the basal ganglia in the loops depicted in Figure 15.33.
      || Distinct basal ganglia zones for each loop (Adapted from Fundamental Neuroscience. 2002 Copyright Elsevier).
    565. p370 Chapter 11 means (Grossberg 2021) page 370, Chapter 11
      p002sec Illusion and reality means (Grossberg 2021) page 2, section Illusion and reality
      p013fig01.09 means (Grossberg 2021) page 13, Figure 1.09 (1.9 as in book)
      p030tbl01.02 means (Grossberg 2021) page 30, Table 1.02 (1.2 as in book)
      p111c2h0.5 means (Grossberg 2021) page 111, column 2, height from top as fraction of page height
      || text...Are notes in addition to [figure, table] captions, mostly comprised of text within the image, but also including quotes of text in the book. Rarely, it includes comments by Howell preceded by "Howell". The latter are distinct from "readers notes" (see, for example : reader Howell notes).
      p044 Howell: grepStr 'conscious' means a comment by reader Howell, extracted using the grep string shown, referring to page 44 in (Grossberg 2021)
    566. p00I Preface - Biological intelligence in sickness, health, and technology
    567. p001 Chapter 1 Overview - From Complementary Computing and Adaptive Resonance to conscious awareness
    568. p050 Chapter 2 How a brain makes a mind - Physics and psychology split as brain theories were born
    569. p086 Chapter 3 How a brain sees: Constructing reality - Visual reality as illusions that explain how we see art
    570. p122 Chapter 4 How a brain sees: Neural mechanisms - From boundary completion and surface filling-in to figure-ground perception
    571. p184 Chapter 5 Learning to attend, recognize, and predict the world -
    572. p250 Chapter 6 Conscious seeing and invariant recognition - Complementary cortical streams coordinate attention for seeing and recognition
    573. p280 Chapter 7 How do we see a changing world? - How vision regulates object and scene persistence
    574. p289 Chapter 8 How we see and recognize object motion - Visual form and motion perception obey complementary laws
    575. p337 Chapter 9 Target tracking, navigation, and decision-making - Visual tracking and navigation obey complementary laws
    576. p353 Chapter 10 Laminar computing by cerebral cortex - Towards a unified theory of biological and artificial intelligence
    577. p370 Chapter 11 How we see the world in depth - From 3D vision to how 2D pictures induce 3D percepts
    578. p404 Chapter 12 From seeing and reaching to hearing and speaking - Circular reaction, streaming, working memory, chunking, and number
    579. p480 Chapter 13 From knowing to feeling - How emotion regulates motivation, attention, decision, and action
    580. p517 Chapter 14 How prefrontal cortex works - Cognitive working memory, planning, and emotion conjointly achieve valued goals
    581. p539 Chapter 15 Adaptively timed learning - How timed motivation regulates conscious learning and memory consolidation
    582. p572 Chapter 16 Learning maps to navigate space - From grid, place, and time cells to autonomous mobile agents
    583. p618 Chapter 17 A universal development code - Mental measurements embody universal laws of cell biology and physics
    584. image pxvifig00.01 Macrocircuit of the visual system
    585. image p002fig01.01 The difference between seeing and recognizing.
      || (W. Epstein, R. Gregory, H. von Helmholtz, G. Kanizsa, P. Kellman, A. Michotte...) Seeing an object vs Knowing what it is. Seeing Ehrenstein illusion (See, recognize) vs Recognizing offset grating (Do not see, recognize). offset grating: some boundaries are invisible or amodal.
    586. image p002fig01.02 Dalmatian in snow
      || p002c2h0.55 "...This image reminds us that invisible boundaries can sometimes be very useful in helping us to recognize visual objects in the world. ... When we first look at this picture, it may just look like an array of black splotches of different sizes, desities, and orientations across the picture. Gradually, however, we can recognize the Dalmatian in it as new boundaries form in our brain between the black splotches. ..."
    587. image p003fig01.03 Amodal completion
      || p00c1h0.75 "... Figure 1.3 illustrates what I mean by the claim that percepts derived from pictures are often illusions. Figure 1.3 (left column) shows three rectangular shapes that abut one another. Our percept of this image irresitably creates a different interpretation, however. We perceive a horizontal bar lying in front of a partially occluded vertical bar that is amodally completed behind it. ..."
    588. image p004fig01.04 (top row) Kanizsa stratification; (bottom row) transparency images
      || [top row images] "... are called stratification percepts... This simple percept can ... be perceived either as a white cross in front of a white outline square, or as a white outline square in front of a white cross. The former percept usually occurs, but the percept can intermittently switch between these two interpretations. ...it is said to be a bistable percept. ..."
    589. image p008fig01.05 Noise-saturation dilemma.
      || cell activity vs cell number; [minimum, equilibrium, current, maximal] activity
    590. image p009fig01.06 Primacy gradient of activity stored in working memory within a recurrent shunting on-center off-surround network. Rehearsal is controlled by a nonspecific rehearsal wave and self-inhibitory feedback of the item that is currently being rehearsed. Green = excitatory, red = inhibitory.
      || inputs? -> item and order WM storage -> competitive selection-> rehearsal wave -> outputs
    591. image p011fig01.07 The choice of signal function f determines how an initial activity pattern will be transformed and stored in short-term memory (STM). Among [same, slower, faster]-than-linear signal functions, only the last one can suppress noise. It does so as it chooses the population that receives the largest input for storage, while suppressing the activities of all other populations, thereby giving rise to a winner-take-all choice.
      || initial pattern (xi(0) vs i):
      Xi(∞) = xi(∞)/x(∞), where x(∞) = sum[j: xj(∞)]
      linear: perfect storage of any pattern; amplifies noise (or no storage)
      slower-than-linear: saturates; amplifies noise
      faster-than-linear: chooses max [winner-take-all, Bayesian], categorical perception; suppresses noise, [normalizes, quantizes] total activity, finite state machine
    592. image p012fig01.08 A sigmoidal signal function is a hybrid signal that combines the best properties of [faster, same, slower]-than-linear signals. It can suppress noise and store a partially contrast-enhanced activity pattern. slower-than-linear: saturates pattern; approximately linear: preserves pattern and normalizes; faster-than-linear: noise suppression and contrast enhancement.
      || Sigmoidal signal: a hybrid. (upper) saturates pattern- slower-than-linear; (middle) preserves pattern and normalizes- approximately linear. (lower) noise suppression and contrast enhancement- faster-than-linear.
    593. image p013fig01.09 A sigmoid signal function generates a quenching threshold below which cell activities are treated like noise and suppressed. Activities that are larger than the quenching threshold are contrast enhanced and stored in short-term memory.
      || Quenching threshold. xi(0) vs i. Xi(∞) = xi(∞)/x(∞), where x(∞) = sum[j: xj(∞)]
      sigmoid: tunable filter; stores infinitely many contrast-enhanced patterns; suppresses noise
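      A minimal numerical sketch of these signal-function effects (my own illustrative parameters and Euler integration, not code from the book): a recurrent shunting on-center off-surround network is run with a linear, a faster-than-linear, and a sigmoid signal function f, starting from the same stored pattern.
        import numpy as np

        def run(f, x0, A=0.1, B=1.0, dt=0.01, steps=20000):
            # dx_i/dt = -A*x_i + (B - x_i)*f(x_i) - x_i * sum_{k != i} f(x_k)
            x = np.array(x0, dtype=float)
            for _ in range(steps):
                fx = f(x)
                x += dt * (-A * x + (B - x) * fx - x * (fx.sum() - fx))
            return np.round(x, 3)

        x0 = [0.20, 0.35, 0.50, 0.15]                        # initial pattern; external inputs already off
        print(run(lambda w: w, x0))                          # linear: ratios preserved, total normalized to B - A
        print(run(lambda w: w ** 2, x0))                     # faster-than-linear: winner-take-all choice
        print(run(lambda w: w ** 2 / (0.09 + w ** 2), x0))   # sigmoid: quenching threshold; small activities
                                                             # suppressed, the rest partially contrast-enhanced
      The exact stored values depend on A, B, and the chosen f; the sketch only illustrates the qualitative contrast between the three cases in the notes above.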
    594. image p016fig01.10 The blocking paradigm shows how sensory cues that are conditioned to predict specific consequences can attentionally block other cues that do not change those predictions. On the other hand, if the total cue context is changed by adding a cue that does not change the predicted consequences, then the new cues can be conditioned to the direction of that change. They can hereby learn, for example, to predict fear if the shock level unexpectedly increases, or relief if the shock level unexpectedly decreases.
      || Minimal adaptive prediction. blocking- CS2 is irrelevant, unblocking- CS2 predicts US change. Learn if CS2 predicts a different (novel) outcome than CS1. CS2 is not redundant.
    595. image p016fig01.11 A sufficiently big mismatch between a bottom-up input pattern and a top-down expectation can activate the orienting system, which triggers a burst of nonspecific arousal that can reset the recognition category that read out the expectation. In this way, unexpected events can reset short-term memory and initiate a search for a category that better represents the current situation.
      || [category- top-down (TD) expectation; Bottom-up (BU) input pattern] -> Feature pattern -> BU-TD mismatch -> orienting system -> non-specific arousal -> category.
    596. image p018fig01.12 Peak shift and behavioural contrast. When a negative generalization gradient (in red) is subtracted from a positive generalization gradient (in green), the net gradient (in purple) is shifted away from the negative gradient and has a width that is narrower than any of its triggering gradients. Because the total activity of the network tends to be normalized, the renormalized peak of the net gradient is higher than that of the rewarded gradient, thereby illustrating that we can prefer experiences that we have never previously experienced over those for which we have previously been rewarded.
      ||
    597. image p019fig01.13 Affective circuits are organized into opponent channels, such as fear vs. relief, and hunger vs. frustration. On a larger scale of affective behaviours, exploration and consummation are also opponent types of behaviour. Exploration helps to discover novel sources of reward. Consummation enables expected rewards to be acted upon. Exploration must be inhibited to enable an animal to maintain attention long enough upon a stationary reward in order to consume it.
      || exploration vs consummation
    598. image p023fig01.14 A gated dipole opponent process can generate a transient antagonistic rebound from its OFF channel in response to offset of an input J to its ON channel. Sustained on-response; transient off-response; opponent process; gates arousal: energy for rebound.
      ||
    599. image p024fig01.15 A REcurrent Associative Dipole, or READ, circuit is a recurrent shunting on-center off-surround network with habituative transmitter gates. Sensory cues sample it with LTM traces and thereby become conditioned reinforcers.
      ||
    600. image p025fig01.16 (left panel) The main processing stages of the Cognitive-Emotional-Motor (CogEM) model have anatomical interpretations in terms of sensory cortex, amygdala, and prefrontal cortex. Chapter 13 will describe in greater detail how CS cues activate invariant object categories in the sensory cortex, value categories in the amygdala, and object-value categories in the prefrontal cortex, notably the orbitofrontal cortex. The amygdala is also modulated by internal drive inputs like hunger and satiety. (right panel) Anatomical data support this circuit, as do many neurophysiological data.
      || drive -> amygdala -> prefrontal cortex <-> sensory cortex -> amygdala. [visual, somatosensory, auditory, gustatory, olfactory] cortex -> [amygdala, Orbital Prefrontal Cortex]. amygdala -> Lateral Prefrontal Cortex
    601. image p025fig01.17 Sensory-drive heterarchy vs. drive hierarchy. How cues and drives interact to choose the drive and motivation that will control behavioral choices.
      || [drive inputs, sensory cue [before, after] cross-over] -> incentive motivation [eat, sex].
    602. image p026fig01.18 Inverted U as a function of arousal. A Golden Mean at intermediate levels of arousal generates a combination of behavioral threshold, sensitivity, and activation that can support typical behaviors. Both underarousal and overarousal lead to symptoms that are found in mental disorders.
      || Behavior vs arousal.
      under-aroused depression: elevated threshold; hyperexcitable above threshold
      over-aroused depression: low threshold; hypoexcitable above threshold
      "UPPER" brings excitability "DOWN".
    603. image p027fig01.19 The ventral What stream is devoted to perception and categorization. The dorsal Where stream is devoted to spatial representation and action. The Where stream is also often called the Where/How stream because of its role in the control of action.
      ||
      WHERE (dorsal) stream: spatial representation of action; parietal pathway "where"; Posterior Parietal Cortex (PPC); Lateral Prefrontal Cortex (LPFC)
      WHAT (ventral) stream: perception, categorization; temporal pathway "what"; Inferior Temporal Cortex (IT); Lateral Prefrontal Cortex (LPFC)
    604. image p029tbl01.01 Some pairs of complementary processing streams.
      ||
      visual boundary: interblob stream V1-V2-V4 vs visual surface: blob stream V1-V2-V4
      visual boundary: interblob stream V1-V2-V4 vs visual motion: magno stream V1-MT-MST
      WHAT stream (perception & recognition: inferotemporal & prefrontal areas) vs WHERE stream (space & action: parietal & prefrontal areas)
      object tracking: MT interbands & MSTv vs optic flow navigation: MT+ bands & MSTd
      motor target position: motor & parietal cortex vs volitional speed: basal ganglia
    605. image p030tbl01.02 The What and Where cortical processing streams obey complementary laws. These laws enable the What stream to rapidly and stably learn invariant object categories without experiencing catastrophic forgetting, while the Where stream learns labile spatial and action representations to control actions that are aimed towards these objects.
      ||
      WHAT stream: spatially-invariant object learning and recognition; fast learning without catastrophic forgetting; IT (InferoTemporal cortex); matching: excitatory; learning: match
      WHERE stream: spatially-variant reaching and movement; continually update sensory-motor maps and gains; PPC (Posterior Parietal Cortex); matching: inhibitory; learning: mismatch
    606. image p030fig01.20 A schematic cross-section of a slice of laminar neocortex whose cells are organized in a characteristic way in six layers, which themselves may be organized into distinct sublaminae. The computational paradigm of Laminar Computing attempts to show how different parts of neocortex can represent and control very different kinds of behavior - including vision, speech, and cognition - using specializations of the same canonical laminar cortical design.
      || Projection fibres: Cortico[spinal, bulbar, pontine, striate, reticular, etc]; Thalamocortical fibres; Diffuse cortical afferent fibres: [nonspecific thalamocortical, Cholinergic, Monoaminergic]; Corticocortical efferents; Projection [cell, fibre]; Corticocortical efferent terminals.
    607. image p032fig01.21 At least three parallel visual cortical streams respond to visual inputs that reach the retina. Two parvocellular streams process visual surfaces (blob stream) and visual boundaries (interblob stream). The magnocellular stream processes visual motion.
      || [Retina, LGNs, V[1,2,3,4], MT] to [What- inferotemporal areas, Where- parietal areas]: visual parallel streams [2x blob, 1x bound]
    608. image p035fig01.22 A classical example of phonemic restoration. The spectrogram of the word "legislatures" is either excised, leaving a silent interval, or filled with broad-band noise. A percept of the restored phoneme is heard when it is replaced by noise, but not by silence.
      || [normal, silence, noise replaced] presentations. frequency (Hz) vs time (sec).
    609. image p036fig01.23 As more items are stored in working memory through time, they can select larger chunks with which to represent the longer list of stored items.
      || [x, y, z] -> [xy, xyz]
    610. image p037fig01.24 Only three processing stages are needed to learn how to store and categorize sentences with repeated words in working memory. See the text for more discussion.
      || IOR working memory (item chunk-> sequences) <-> IOR masking field: [item->list]<->[list->list] chunks. (<-> signifies <- expectation/attention, adaptive filter ->)
    611. image p038fig01.25 The ART Matching Rule stabilizes real time learning using a [top-down, modulatory on-center, off-surround] network. Object attention is realized by such a network. See text for additional discussion.
      || ART Matching Rule [volition, categories, features]. [one, two] against one.
    612. image p039tbl01.03 The link between consciousness and movement
      ||
      VISUAL: seeing, knowing, and reaching
      AUDITORY: hearing, knowing, and speaking
      EMOTIONAL: feeling, knowing, and acting
    613. image p042tbl01.04 The six main kinds of resonances which support different kinds of conscious awareness that will be explained and discussed in this book.
      ||
      type of resonance: type of consciousness
      surface-shroud: see visual object or scene
      feature-category: recognize visual object or scene
      stream-shroud: hear auditory object or stream
      spectral-pitch-and-timbre: recognize auditory object or stream
      item-list: recognize speech and language
      cognitive-emotional: feel emotion and know its source
    614. image p051fig02.01 Along the boundaries between adjacent shades of gray, lateral inhibition makes the darker area appear even darker, and the lighter areas appear even lighter. (Ernst Mach bands)
      ||
    615. image p052fig02.02 Feature-category resonances enable us to rapidly learn how to recognize objects without experiencing catastrophic forgetting. Attentive matching between bottom-up feature pattern inputs and top-down expectations prevents catastrophic forgetting by focussing object attention upon expected patterns of features, while suppressing outlier features that might otherwise have caused catastrophic forgetting if they were also learned.
      || Adaptive Resonance. Attended feature clusters reactivate bottom-up pathways. Activated categories reactivate their top-down pathways. Categories STM, Feature patterns STM. Feature-Category resonance [synchronize, amplify, prolong]s system response. Resonance triggers learning in bottom-up and top-down adaptive weights: adaptive resonance!
    616. image p057fig02.03 Some basic anatomical and physiological properties of individual neurons. See the text for additional discussion.
      ||
      physiology: cell body potential; axonal signal; chemical transmitter
      anatomy: nerve cell body; axon; synaptic knob, synapse
    617. image p058fig02.04 Serial learning paradigm: Learning the temporal order of events by practicing them in the order that they occur in time.
      || Learning a global arrow in time. How do we learn to encode the temporal order of events in LTM? serial learning. [w=intra, W=inter]trial interval. "... data about serial verbal learning (Figure 2.4) seemed to suggest that events can go "backwards in time". ..."
    618. image p059fig02.05 Bowed serial position curve. This kind of data emphasizes the importance of modelling how our brains give rise to our minds using nonlinear systems of differential equations.
      || Effects of [inter, intra]trial intervals (Hovland 1938). # of errors vs list position. [w (sec), W (sec)] = (2 6) (4 6) (2 126) (4 126). Nonoccurrence of future items reduces the number of errors in response to past items. These data require a real-time theory for their explanation, that is, DIFFERENTIAL equations.
    619. image p059fig02.06 The bowed serial position curve illustrates the sense in which "events can go backwards in time" during serial learning.
      || Bow due to backward effect in time. If the past influenced the future, but not conversely: # of errors vs list position; Data (Hovland, Hull, Underwood, etc).
    620. image p060fig02.07 Position-specific-forward and backward error gradients illustrate how associations can form in both the forward and backward directions in time before the list is completely learned.
      || Error gradients: depend on list position. # of responses vs list position:
      list beginning: anticipatory errors; forward in time
      list middle: anticipatory and perseverative errors; forward and backward in time
      list end: perseverative errors; backward in time
    621. image p061fig02.08 The existence of forward and backward associations, such as from A to B and from B to A is naturally explained by a network of neurons with their own activities or STM traces, and bidirectional connections between them with their own adaptive weights or LTM traces.
      || How these results led to neural networks (Grossberg 1957). Networks can learn forward and backward associations! Practice A->B, also learn B<-A. Because learning AB is not the same as learning BA, you need STM traces, or activations, xi, at the nodes, or cells, and LTM traces, or adaptive weights, zij, for learning at the synapses.
    622. image p063fig02.09 The Additive Model describes how multiple effects add up to influence the activities, or STM traces, of neurons.
      || STM: Additive model (Grossberg, PNAS 1967, 1968).
      Short-term memory (STM) trace, or activation: xi(t)
      signal: fi(xi(t)); connection strength: Bij; adaptive weight, or long-term memory (LTM) trace: zij(t); target cell activity: xj(t)
      terms: learning rate?, passive decay, positive feedback, negative feedback, input
      d[dt: xi(t)] = -Ai*xi + sum[j=1 to n: fj(xj(t))*Bji*zji] - sum[j=1 to n: gj(xj(t))*Cji*Zji] + Ii
      Special case: d[dt: xi(t)] = -Ai*xi + sum[j=1 to n: fj(xj(t))*zji] + Ii
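      As a concrete illustration of the Additive Model equation just quoted, here is a minimal sketch (my own toy parameters, Euler integration; not code from the book) that integrates a small network to its STM equilibrium:
        import numpy as np

        def additive_step(x, I, A, B, z, C, Z, f, g, dt=0.01):
            # dx_i/dt = -A_i*x_i + sum_j f_j(x_j)*B_ji*z_ji - sum_j g_j(x_j)*C_ji*Z_ji + I_i
            exc = f(x) @ (B * z)          # excitatory feedback, gated by adaptive weights z_ji
            inh = g(x) @ (C * Z)          # inhibitory feedback, gated by weights Z_ji
            return x + dt * (-A * x + exc - inh + I)

        n = 3
        x = np.zeros(n)
        I = np.array([1.0, 0.5, 0.2])                               # external inputs
        A = np.ones(n)                                              # passive decay rates
        B = np.full((n, n), 0.2); z = np.eye(n)                     # weak excitatory self-feedback
        C = np.full((n, n), 0.2); Z = np.ones((n, n)) - np.eye(n)   # inhibitory off-surround
        f = g = lambda w: np.maximum(w, 0.0)                        # half-wave rectified signal functions
        for _ in range(3000):
            x = additive_step(x, I, A, B, z, C, Z, f, g)
        print(np.round(x, 3))                                       # steady-state STM activities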
    624. image p064fig02.10 The Shunting Model includes upper and lower bounds on neuronal activities. These bounds have the effect of multiplying additive terms by excitatory and inhibitory automatic gain terms that enable such models to preserve their sensitivity to inputs whose size may vary greatly through time, while also approximately normalizing their total activities.
      || STM: Shunting Model (Grossberg, PNAS 1967, 1968). Mass action in membrane equations. Bi/Ci -> xi(t) -> O -> -Fi/Ei. Bounded activations, automatic gain control. d[dt: xi(t)] = - Ai*xi + (Bi - Ci*xi)sum[j=1 to n: fj(xj(t))*Dji*yji*zji + Ii] - (Ei*Xi + Fi)*sum[j=1 to n: gj(xj)*Gji*Yji*Zji + Ji]. Includes the Additive Model.
    624. image p064fig02.11 Medium-Term Memory (MTM) and Long-Term Memory (LTM) equations complement the Additive and Shunting Models of STM. MTM is typically defined by a chemical transmitter that is released from the synaptic knobs of a neuron (Figure 2.03). Its release or inactivation in an activity-dependent way is also called habituation. LTM defines how associative learning occurs between a pair of neurons whose activities are approximately correlated through time. See the text for details.
      || Medium and Long Term memory.
      MTM: habituative transmitter gate: d[dt: yki(t)] = H*(K - yki) - L*fk(xk)*yki
      LTM: gated steepest descent learning: d[dt: zki(t)] = Mk*fk(xk)*(hi(xi) - zki)
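      The two laws above can be illustrated with a minimal sketch (illustrative constants, Euler integration; not from the book): a habituative transmitter gate y and a gated steepest-descent LTM trace z for a presynaptic cell k and a postsynaptic cell i.
        H, K, L, M = 0.1, 1.0, 2.0, 0.5
        f = h = lambda w: max(w, 0.0)          # rectified signal functions
        y, z = 1.0, 0.0                        # transmitter starts full, weight starts naive
        dt = 0.01
        for t in range(4000):
            xk = 1.0 if t < 2000 else 0.0      # presynaptic activity: on, then off
            xi = 0.8 if t < 2000 else 0.0      # correlated postsynaptic activity
            y += dt * (H * (K - y) - L * f(xk) * y)   # MTM: habituates while xk is active,
                                                      # then recovers toward K
            z += dt * (M * f(xk) * (h(xi) - z))       # LTM: samples h(xi) only while xk fires
        print(round(y, 3), round(z, 3))        # y has partly recovered; z stays near 0.8 (it persists)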
    625. image p065fig02.12 Three sources of neural network research: [binary, linear, continuous nonlinear]. My own research has contributed primarily to the third.
      || Three sources of neural network research.
      Binary: neural network signal processing; McCulloch-Pitts 1943, Xi(t+1) = sgn{sum[j: Aij*Xj(t)] - Bi}; von Neumann 1945; Caianiello 1961; digital computer
      Linear: systems theory; Rosenblatt 1962; Widrow 1962; Anderson 1968; Kohonen 1971; Y = A*X, cross-correlate, steepest descent
      Continuous and nonlinear: neurophysiology and psychology; Hodgkin, Huxley 1952; Hartline, Ratliff 1957; Grossberg 1967; von der Malsburg 1973
    626. image p068fig02.13 Hartline's lab developed a model to describe signal processing by the retina of the horseshoe crab.
      || Neurophysiology (network): lateral inhibition in limulus retina of horseshoe crab (Hartline, Ratliff, Miller 1963, Nobel Prize)
      hi = ei - sum[j=1 to n: {∫[dv, 0 to t: e^(-A*(t-v))*hj(v)] - Γj}(+) * Bji]
      ei = spiking frequency without inhibition
      hi = spiking frequency with inhibition
      [w - r]+ vs i, Precursor of ADDITIVE network model.
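      For intuition, here is a small sketch of a steady-state simplification of this lateral-inhibition law, hi = ei - sum[j: [hj - Γj]+ * Bji], with synthetic inputs and a made-up Gaussian inhibition kernel; it is an assumption-laden toy version of the equation above, not the Hartline-Ratliff fit itself.
        import numpy as np

        n = 20
        e = np.where(np.arange(n) < n // 2, 1.0, 2.0)    # a luminance step as input
        Gamma = 0.5                                      # inhibition threshold
        B = 0.15 * np.exp(-0.5 * (np.subtract.outer(np.arange(n), np.arange(n)) / 2.0) ** 2)
        np.fill_diagonal(B, 0.0)                         # inhibition from neighbours only

        h = e.copy()
        for _ in range(200):                             # fixed-point iteration of the steady state
            h = e - np.maximum(h - Gamma, 0.0) @ B
        print(np.round(h, 2))    # note the Mach-band-like overshoot/undershoot at the step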
    627. image p068fig02.14 Hodgkin and Huxley developed a model to explain how spikes travel down the squid giant axon.
      || Neurophysiology (single cell): spike potentials in squid giant axon (Hodgkin, Huxley 1952, Nobel Prize). time -> (dendrites -> cell body -> axon).
      C*dp[dt: V] = α*dp^2[dX^2: V] + (V(+) - V)*g(+) + (V(-) - V)*g(-) + (V^p - V)*g^p
      g(+) = G(+)(m,h), g(-) = G(-)(n), G^p = const, [m, h, n] - ionic processes, V - voltage
      Precursor of Shunting network model (Rall 1962). (Howell: see p075fig02.24 Membrane equations of neurophysiology, Shunting equation.)
    628. image p071fig02.15 The noise saturation dilemma: How do neurons retain their sensitivity to the relative sizes of input patterns whose total sizes can change greatly through time?
      || Noise-Saturation Dilemma (Grossberg 1968-1973). Bounded activities from multiple input sources.
      If activities xi are sensitive to SMALL inputs, then why don't they saturate in response to large inputs?
      If xi are sensitive to LARGE inputs, then why don't small inputs get lost in system noise?
      The functional unit is a spatial activity pattern.
    629. image p071fig02.16 To solve the noise-saturation dilemma, individual neurons in a network that is receiving a distributed spatial pattern of inputs need to remain sensitive to the ratio of the input to them divided by all the inputs in that spatial pattern. Although the inputs are delivered to a finite number of neurons, the input and activity patterns are drawn continuously across the cells for simplicity.
      || Noise-Saturation Dilemma. [Ii, xi] vs t. [Input, Activity] pattern [small -> noise, large -> saturation]. Problem: remain sensitive to input RATIOS θi = Ii / sum[j: Ij] as total input I = sum[j: Ij] -> ∞. Many kinds of data exhibit sensitivity to ratios of inputs.
    630. image p072fig02.17 Brightness constancy.
      || Vision: brightness constancy, contrast normalization. Compute RATIOS of reflected light. Reflectance processing. p72c1h0.45 "... In other words, the perceived brightness of the gray disk is constant despite changes in the overall illumination. On the other hand, if only the gray disk were illuminated at increasing intensities, with the annulus illuminated at a constant intensity, then the gray disk would look progressively brighter. ..."
    631. image p072fig02.18 Vision: brightness contrast. Conserve a total quantity, Total activity normalization.
      || LUCE: ratio scales in choice behavior. ZEILER: adaptation level theory.
    632. image p073fig02.19 Computing with cells: infinity does not exist in biology!
      || Computing in a bounded activity domain, Gedanken experiment (Grossberg 1970). Vm sub-areas [xm, B - xm], I(all m)], m=[1, i, B].
      B: excitable sites
      xi(t): excited sites (activity, potential)
      B - xi(t): unexcited sites
    633. image p073fig02.20 Shunting saturation occurs when inputs get larger to non-interacting cells.
      || Shunting saturation. [xi(t), B - xi(t)].
      (a), (b)
      d[dt: xi] = -A*xi + (B - xi)*Ii
      (a) Spontaneous decay of activity xi to equilibrium
      (b) Turn on unexcited sites B - xi by inputs Ii (mass action)
      Inadequate response to a SPATIAL PATTERN of inputs: Ii(t) = θi*I(t)
      θi: relative intensity (cf. reflectance)
      I(t): total intensity (cf. luminance)
    634. image p073fig02.21 How shunting saturation turns on all of a cell's excitable sites as input intensity increases.
      || Shunting saturation. At equilibrium:
      0 = d[dt: xi] = -A*xi + (B - xi)*Ii
      xi = B*Ii / (A + Ii) = B*θi*I / (A + θi*I) -> B as I -> ∞
      Ii = θi*I, I = sum[j: Ij]
      I small: lost in noise; I large: saturates
      Sensitivity loss to relative intensity as total intensity increases.
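      A quick numerical check of the saturation result above (illustrative A, B, and θ values):
        A, B = 1.0, 1.0
        theta = [0.1, 0.3, 0.6]
        for I in (1.0, 10.0, 1000.0):
            x = [B * th * I / (A + th * I) for th in theta]   # equilibrium without lateral inhibition
            print(I, [round(v, 3) for v in x])
        # I = 1    -> clearly different activities (pattern still visible)
        # I = 1000 -> every activity is close to B = 1 (saturation: the relative pattern is lost)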
    635. image p073fig02.22 An on-center off-surround network is capable of computing input ratios.
      || Computing with patterns.
      How to compute the pattern-sensitive variable: θi = Ii / sum[k=1 to n: Ik]?
      Needs interactions! What type? θi = Ii / (Ii + sum[k≠i: Ik])
      Ii↑ ⇒ θi↑ excitation; Ik↑ (k≠i) ⇒ θi↓ inhibition
      On-center off-surround network.
    636. image p074fig02.23 The equations for a shunting on-center off-surround network. Shunting terms lead to many beautiful and important properties of these networks, which are found ubiquitously, in one form or another, in all cellular tissues.
      || Shunting on-center off-surround network.
      Mass action: d[dt: xi] = -A*xi +(B - xi)*Ii -xi*sum[k≠i: Ik]
      Turn on unexcited sites; turn off excited sites
      At equilibrium:
      0 = d[dt: xi] = -(A + Ii + sum[k≠i: Ik])*xi + B*Ii = -(A + I)*xi + B*Ii
      xi = B*Ii/(A + I) = B*θi*I/(A + I) = θi*B*I/(A + I). No saturation!
      Infinite dynamical range
      Automatic gain control
      Compute ratio scale
      Weber law
      x = sum[k=1 to n: xk] = B*I/(A + I) ≤ B. Conserve total activity
      NORMALIZATION
      Limited capacity
      Real-time probability
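      And the corresponding check for the on-center off-surround equilibrium above (same illustrative values as before), showing ratio preservation and normalization instead of saturation:
        A, B = 1.0, 1.0
        theta = [0.1, 0.3, 0.6]
        for I in (1.0, 10.0, 1000.0):
            x = [B * th * I / (A + I) for th in theta]        # equilibrium of the shunting network
            print(I, [round(v, 3) for v in x], "total =", round(sum(x), 3))
        # the ratios x_i / sum(x) stay 0.1 : 0.3 : 0.6 at every intensity, and the total stays below B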
    637. image p075fig02.24 The membrane equations of neurophysiology describe how cell voltages change in response to excitatory, inhibitory, and passive input channels. Each channel is described by a potential difference multiplied by a conductance. With the special choices shown in the lower right-hand corner, this equation defines a feedforward shunting on-center off-surround network.
      || Membrane equations of neurophysiology.
      C*dp[dt: V] = (V(+) - V)*g(+) +(V(-) - V)*g(-) +(V(p) - V)*g(p)
      Shunting equation (not additive)
      V Voltage
      V(+), V(-), V(p) Saturating voltages
      g(+), g(-), g(p) Conductances
      V(+) = B, C = 1; V(-) = V(p) = 0; g(+) = Ii; g(-) = sum[k≠i: Ik];
      Lower bound: V(-) = V(p) (silent inhibition); upper bound: V(+). (Howell: see p068fig02.14, Grossberg's comment that the Hodgkin-Huxley model was a "... Precursor of Shunting network model (Rall 1962) ...".)
    638. image p076fig02.25 An on-center off-surround network can respond to increasing on-center excitatory inputs without a loss of sensitivity. Instead, as the off-surround input increases, the region of a cell's maximal sensitivity to an increasing on-center input shifts to a range of larger inputs. This is because the off-surround divides the effect of the on-center input, an effect that is often called a Weber law.
      || Weber law, adaptation, and shift property (Grossberg 1963).
      Convert to logarithmic coordinates:
      K = ln(Ii), Ii = e^K, J = sum[k≠i: Ik]
      xi(K,J) = B*Ii/(A + Ii + J) = B*e^K/(A + e^K + J)
      x(K + S, J1) = x(K, J2), S = ln((A + J1)/(A + J2)) size of SHIFT.
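      The shift property can be verified numerically; the sketch below (illustrative A, B, J1, J2) shows that x(K + S, J1) and x(K, J2) coincide:
        import numpy as np

        A, B = 1.0, 1.0
        x = lambda K, J: B * np.exp(K) / (A + np.exp(K) + J)   # response in log-input coordinates
        J1, J2 = 10.0, 1.0
        S = np.log((A + J1) / (A + J2))                        # predicted size of the shift
        K = np.linspace(-2, 6, 5)
        print(np.round(x(K + S, J1), 4))
        print(np.round(x(K, J2), 4))        # identical rows: the response curve has only shifted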
    639. image p076fig02.26 The mudpuppy retina exhibits the shift property that occurs in the feedforward shunting on-center off-surround network in Figure 2.25. As a result, its sensitivity also shifts in response to different background off-surrounds, and therefore exhibits no compression (dashed purple lines).
      || Mudpuppy retina neurophysiology.
      I center, J background
      a) Relative figure-to-ground
      b) Weber-Fechner I*(A + J)^(-1)
      c) No hyperpolarization, SHUNT: Silent inhibition
      d) Shift property(Werblin 1970) xi(K,J) vs K = ln(I)
      Adaptation- sensitivity shifts for different backgrounds. NO COMPRESSION.
    640. image p077fig02.27 A schematic of the on-center off-surround network that occurs in the mudpuppy retina, including three main cell types: receptors, horizontal cells, and bipolar cells.
      || Mechanism: cooperative-competitive dynamics.
      On-center off-surround (Kuffler 1953) cat retina
      Subtractive lateral inhibition (Hartline, Ratliff 1956/7+) limulus retina.
      R receptor -> H horizontal -> B bipolar (Werblin, Dowling, etal 1969+) mudpuppy retina.
    641. image p077fig02.28 Silent inhibition is replaced by hyperpolarization when the inhibitory saturating potential is smaller than the passive saturating potential. Then an adaptation level is created that determines how big input ratios need to be to activate their cells.
      || Weber Law and adaptation level.
      Hyperpolarization vs Silent inhibition
      d[dt: xi] = -A*xi +(B - xi)*Ii -(xi + C)*sum[k≠i: Ik]
      At equilibrium:
      0 = d[dt: xi] = -(A + Ii + sum[k≠i: Ik])*xi +B*Ii -C*sum[k≠i: Ik]
      = -(A + I)*xi +(B + C)*Ii -C*I
      = -(A + I)*xi +(B + C)*I*[θi -C/(B + C)]
      xi = (B + C)*I/(A + I)* [θi -C/(B + C)]
      Weber Law Reflectance Adaptation level
    642. image p078fig02.29 How the adaptation level is chosen to enable sufficiently distinct inputs to activate their cells.
      || Weber Law and adaptation level.
      xi = (B + C)*I/(A + I)* [θi -C/(B + C)]
      Weber Law Reflectance Adaptation level
      V(+) >> V(-) ⇒ B >> C ⇒ C/(B + C) << 1
      Adaptation level theory (Zeiler 1963).
    643. image p078fig02.30 Choosing the adaptation level to achieve informational noise suppression.
      || Noise suppression. Attenuate Zero Spatial frequency patterns: no information. Ii vs i (flat line), xi vs i (flat line at zero)
      B >> C: Try B = (n - 1)*C or C/(B + C) = 1/n
      Choose a uniform input pattern (no distinctive features): All θi = 1/n
      xi = (B + C)*I/(A + I)*[θi -C/(B + C)] = 0 no matter how intense I is.
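      A small numerical check of this noise-suppression choice (illustrative parameters):
        n, A, C = 4, 1.0, 1.0
        B = (n - 1) * C                      # so C/(B + C) = 1/n
        def x(theta, I):
            # adaptation-level equilibrium from the notes above
            return [(B + C) * I / (A + I) * (th - C / (B + C)) for th in theta]

        print(x([0.25, 0.25, 0.25, 0.25], 100.0))   # uniform pattern -> all zeros, however intense
        print(x([0.10, 0.20, 0.30, 0.40], 100.0))   # only above-average features stay positive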
    644. image p078fig02.31 How noise suppression enables matching of bottom-up and top-down input patterns.
      || Noise suppression -> pattern matching. mismatch (out of phase) suppressed, match (in phase) amplifies pattern.
    645. image p079fig02.32 Matching amplifies the matched pattern due to automatic gain control. See terms I and J in the equation.
      || Substrate of resonance. Match (in phase) of BU and TD input patterns AMPLIFIES matched pattern due to automatic gain control by shunting terms. J = sum[i: Ji], I = sum[i: Ii], θi = (Ii + Ji)/(I + J)
      xi = (B + C)*(I + J)/(A + I + J)*[θi -C/(B + C)]
      Need top-down expectations to be MODULATORY.
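      The match/mismatch asymmetry can be seen numerically with the same equilibrium formula (illustrative parameters; the bottom-up and top-down patterns below are made up):
        import numpy as np

        n, A, C = 4, 1.0, 1.0
        B = (n - 1) * C
        def equilibrium(Ivec, Jvec):
            Ivec, Jvec = np.array(Ivec), np.array(Jvec)
            tot = Ivec.sum() + Jvec.sum()
            theta = (Ivec + Jvec) / tot
            return (B + C) * tot / (A + tot) * (theta - C / (B + C))

        bu = [0.0, 4.0, 0.0, 0.0]
        print(np.round(equilibrium(bu, [0.0, 0.0, 0.0, 0.0]), 3))   # bottom-up pattern alone
        print(np.round(equilibrium(bu, [0.0, 4.0, 0.0, 0.0]), 3))   # matched top-down: amplified
        print(np.round(equilibrium(bu, [4.0, 0.0, 0.0, 0.0]), 3))   # mismatched: attenuated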
    646. image p080fig02.33 An opposite-attracts rule during the development of intracellular connections can lead to a mature network that realizes informational noise suppression.
      || How do noise suppression parameters arise? Symmetry-breaking during morphogenesis? Opposites attract rule.
      Intracellular parameters C/B = 1/(1 - n) Intercellular parameters
      Predicts that:
      • Intracellular excitatory and inhibitory saturation points can control the growth during development of :
      • Intercellular excitatory and inhibitory connections.
    647. image p080fig02.34 How to achieve informational noise suppression in a network with multiple parallel processing channels.
      || Symmetry-breaking: dynamics and anatomy.
      Dynamics:
      • excitatory range is amplified
      • inhibitory range is compressed
      Anatomy:
      • narrow on-center
      • broad off-surround
      Noise suppression: attenuates uniform patterns
      Contour direction: enhances pattern gradients
    648. image p081fig02.35 The equilibrium activities of a shunting network with Gaussian on-center off-surround kernels are sensitive to the ratio-contrasts of the input patterns that they process. The terms in the denominator of the equilibrium activities accomplish this using the shunting on-center and off-surround terms.
      || Ratio-contrast detector. flat versus [Gaussian Cki, flattened Gaussian? Eki]
      d[dt: xi] = -A*xi +(B - xi)*sum[k=1 to n: Ik*Cki] -(xi + D)*sum[k=1 to n: Ik*Eki]
      Cki = C*e^(-μ*(k - i)^2), Eki = E*e^(-μ*(k - i)^2)
      At equilibrium: xi = I*sum[k=1 to n: θk*Fki] / (A + I*sum[k=1 to n: θk*Gki])
      Fki = B*Cki -D*Eki (weighted D.O.G)
      Gki = Cki +Eki (S.O.G.)
      • Reflectance processing
      • Contrast normalization
      • Discount illuminant
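      A sketch of this ratio-contrast equilibrium on a 1D luminance step, with made-up Gaussian kernels chosen so that B*sum(C) ≈ D*sum(E), so uniform regions are approximately suppressed (apart from array-edge effects) and responses concentrate at the contour:
        import numpy as np

        n, A = 40, 1.0
        B, D = 3.0, 1.0
        pos = np.arange(n)
        d2 = np.subtract.outer(pos, pos) ** 2
        Ck = 1.00 * np.exp(-d2 / (2 * 1.0 ** 2))     # narrow on-center kernel C_ki
        Ek = 0.75 * np.exp(-d2 / (2 * 4.0 ** 2))     # broad off-surround kernel E_ki
        F, G = B * Ck - D * Ek, Ck + Ek              # weighted D.O.G. and S.O.G.

        Ivec = np.where(pos < n // 2, 1.0, 2.0)      # luminance step
        I = Ivec.sum(); theta = Ivec / I
        x = (I * (theta @ F)) / (A + I * (theta @ G))   # equilibrium from the note above
        print(np.round(x, 3))                        # peak and trough near position n // 2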
    649. image p081fig02.36 Informational noise suppression in networks with Gaussian on-center and off-surround kernels enables them to function as contour detectors that are sensitive to ratio-contrast.
      || Noise suppression and contour detection.
      If B*sum[k=1 to n: Cki] <= D*sum[k=1 to n: Eki] then:
      • uniform patterns are suppressed
      • contrasts are selectively enhanced
      • contours are detected
      Ii vs i, xi vs i
      Responses are selective to [REFLECTANCE, SPATIAL SCALE], eg color [feature, surface] contours.
    650. image p082fig02.37 My models begin with behavioral data, since brains are designed to achieve behavioral success. The text explains how models evolve in stages, through a process of successive refinements, or unlumpings. These unlumpings together carry out a kind of conceptual evolution, leading to models that can explain and predict ever larger psychological and neurobiological databases.
      || Modelling method and cycle.
      Behavioral data -(art of modeling)-> Design principles <- Neural data <-(brain predictions)- Mathematical model and analysis -(behavioral predictions)-> Behavioural data
      Operationalizes "proper level of abstraction"
      Operationalizes that you cannot "derive a brain" in one step.
    651. image p085fig02.38 Our models have been used in many large-scale applications to engineering and technology. Linking brain to behavior explains how brain mechanisms give rise to psychological functions, and do so autonomously. The combination of mechanism, function, and autonomy helps to explain their value in helping to solve outstanding problems in technology.
      || Modeling method and cycle.
      Behavioral data -(art of modeling)-> Design principles <- Neural data <-(brain predictions)- Mathematical model and analysis -(behavioral predictions)-> Behavioural data
      Technology: Mathematical model and analysis <-> Technological applications
      At every stage, spin off new model designs and mechanisms to technologists who need autonomous intelligent applications.
    652. image p087fig03.01 A macrocircuit of key visual processes (in green) and the cortical areas in which they primarily occur (in red), from the retina to the Prefrontal Cortex (PFC), including both the What and Where cortical streams. The [bottom-up, horizontal, and top-down] interactions help each of these processes to overcome computationally complementary processing deficiencies that they would experience without them, and also to read out top-down expectations that help to stabilize learning while they focus attention on salient objects and positions.
      || Emerging unified theory of visual intelligence. [What, Where] streams. Bottom-up and top-down interactions overcome COMPLEMENTARY processing deficiencies.
    653. image p089fig03.02 What do you think lies under the two grey disks? (on a checkers board)
      || p089c1h0.55 "... As your eye traverses the entire circular boundary (Howell: of a grey disk on a checkerboard), the contrast keeps flipping between light-to-dark and dark-to-light. Despite these contrast reversals, we perceive a single continuous boundary surrounding the gray disk. ...".
    654. image p090fig03.03 Kanizsa square and reverse-contrast Kanizsa square percepts. The spatial arrangement of pac-men, lines, and relative contrasts determines the perceived brightness of the squares, even if they exhibit no brightness difference from their backgrounds, as in (b). These factors also determine whether pac-men will appear to be amodally completed behind the squares, and how far behind them.
      || p089c2h0.65 "...
      a) The percept of the square that abuts the pac-men is a visual illusion that is called the Kanizsa square. The enhanced brightness of the square is also an illusion.
      c) shows that these boundaries can be induced by either collinear edges or perpendicular line ends, and that both kinds of inducers cooperate to generate an even stronger boundary.
      d) if the perpendicular lines cross the positions of the illusory contours, then they can inhibit the strength of these contours. ..."
    655. image p091fig03.04 A cross-section of the eye, and a top-down view of the retina, showing how the blind spot and retinal veins can occlude the registration of light signals at their positions on the retina.
      || Eye: [optic nerve, ciliary body, iris, lens, pupil, cornea, sclera, choroid, retina]. Human retina: [fovea, blind spot, optic nerve]. See also the cross-section of the retinal layers.
    656. image p092fig03.05 A cross-section of the retinal layer. Note that light stimuli need to go through all retinal layers before they reach the photoreceptor layer at which the light signals are registered.
      || light stimuli ->
      retinal layers: cellular composition
      inner limiting membrane
      retinal nerve fibre: ganglion nerve fibres
      ganglion cell: ganglion
      inner plexiform: amacrine
      inner nuclear: horizontal
      outer plexiform
      outer limiting membrane
      photoreceptor: rod
      photoreceptor: cone
      retinal pigment epithelium
      <- signal transduction. http://brain.oxfordjournals.org/content/early/2011/01/20/brain.awq346
    657. image p093fig03.06 Every line is an illusion because regions of the line that are occluded by the blind spot or retinal veins are completed at higher levels of brain processing by boundary completion and surface filling-in.
      || Every line is an illusion!
      Boundary completion: which boundaries to connect?
      Surface filling-in: what color and brightness do we see?
    658. image p094fig03.07 The processes of boundary completion and surface filling-in are computationally complementary.
      ||
      Boundary completion: outward; oriented; insensitive to direction-of-contrast
      Surface filling-in: inward; unoriented; sensitive to direction-of-contrast
    659. image p095fig03.08 Computer simulation of a Kanizsa square percept. See the text for details.
      || p094c2h0.2 "...
      b) shows the feature contours that are induced just inside the pac-man boundaries.
      c) feature contours fill-in within the square boundary
      d) create a percept of enhanced brightness throughout the square surface ..."
    660. image p095fig03.09 Simulation of a reverse-contrast Kanizsa square percept. See the text for details.
      || p094c2h0.5 "...
      b) whereas bright feature contours are induced just inside the boundaries of the two black pac-men at the bottom of the figure, dark feature contours are induced inside the boundaries of the two white pac-man at the top of the figure
      c) the square boundary is recognized
      d) Because these dark and bright feature contours are approximately balanced, the filled-in surface color is indistinguishable from the filled-in surface color outside of the square, ... but [the square boundary is] not seen ..."
    661. image p096fig03.10 The visual illusion of neon color spreading. Neither the square nor the blue color that is perceived within it is in the image that defines a neon color display. The display consists only of black and blue arcs.
      ||
    662. image p096fig03.11 Another example of neon color spreading. The image is composed of black and blue crosses. See the text for details.
      || Howell: note the appearance of illusory red squares
    663. image p098fig03.12 In this picture of Einstein's face, [edges, texture, shading] are overlaid.
      ||
    664. image p100fig03.13 The Ehrenstein percept in the left panel is significantly weakened as the orientations of the lines that induce it deviate from being perpendicular to the illusory circle.
      ||
    665. image p100fig03.14 Boundaries are completed with the orientations that receive the largest total amount of evidence, or support. Some can form in the locally preferred orientations that are perpendicular to the inducing lines, while others can form through orientations that are not locally preferred, thus showing that there is initially a fuzzy band of almost perpendicular initial grouping orientations at the end of each line.
      || Perpendicular induction at line ends wrt [circular, square] boundaries
      line ends: local; global
      perpendicular, crisp: preferred; preferred
      NOT perpendicular, fuzzy: unpreferred; preferred
    666. image p100fig03.15 A fuzzy band of possible initial grouping orientations allows grouping to get started. Cooperative-competitive feedback via a hierarchical resolution of uncertainty chooses a sharp final grouping that has the most evidence to support it.
      || before choice: transient; after choice: equilibrium
    667. image p102fig03.16 T's and L's group together based on shared orientations, not identities.
      ||
    668. image p102fig03.17 The relative positions of the squares give rise to a percept of three regions. In the middle region, emergent diagonal groupings form, despite the fact that all the orientations in the image are verticals and horizontals.
      ||
    669. image p103fig03.18 Computer simulations in [b, c, e, f] of groupings in response to different spatial arrangements in [a,c, e, g] of inducers that are composed of short vertical boundaries. Note the emergent horizontal groupings in [d, f, h] and the diagonal groupings in h, despite the fact that all its inducers have vertical orientations.
      ||
    670. image p103fig03.19 As in Figure 3.18, emergent groupings can form whose orientations differ from those of the inducing stimuli.
      || That's how multiple orientations can induce boundary completion of an object. [diagonal, perpendicular, parallel]
    671. image p104fig03.20 Sean Williams: how boundaries can form
      ||
    672. image p104fig03.21 Four examples of how emergent boundaries can form in response to different kinds of images. These examples show how boundary webs can shape themselves to textures, as in (c), and shading, as in (d), in addition to lines, as in (a). In all these cases, the boundaries are invisible, but reveal themselves by supporting filling-in of surface brightness and color within their form-sensitive webs.
      ||
    673. image p105fig03.22 Depth-selective boundary representations capture brightness and colors in surface filling-in domains. See the text for details.
      || 3D vision and figure-ground separation. multiple-scale, depth-selective boundary webs. refer to Figure 3.21(d)
      depth increasing ↓: boundaries; surfaces
      BC input -> surface capture!
      FC input
    674. image p105fig03.23 The pointillist painting A Sunday on la Grande Jatte by Georges Seurat illustrates how we group together both large-scale coherence among the pixels of the painting, as well as forming small groupings around the individual dabs of color.
      ||
    675. image p106fig03.24 In response to the Synthetic Aperture Radar image (upper left corner), a shunting on-center off-surround network "discounts the illuminant" and thereby normalizes cell activities to compute feature contours, without causing saturation (upper right corner). Multiple-scale boundaries form in response to spatially coherent activities in the feature contours (lower left corner) and create the webs, or containers, into which the feature contours fill-in the final surface representations (lower right corner).
      || Do these ideas work on hard problems? SAR!
      input image; feature contours; boundary contours; filled-in surface
      Synthetic Aperture Radar: sees through weather; 5 orders of magnitude of power in radar return; discounting the illuminant
      • normalizes the image: preserves RELATIVE activities without SATURATION
      • shows individual PIXELS
      boundaries complete between regions where normalized feature contrasts change; filling-in averages brightnesses within boundary compartments
    676. image p107fig03.25 The Roofs of Collioure by Matisse. See the text for details
      || p107c1h0.6 "... [Matisse] showed how patches of pure color, when laid down properly on a canvas, could be grouped by the brain into emergent boundaries, without the intervention of visible outlines. ... The trick was that these emergent boundaries, being invisible, or amodal, did not darken the colors in the surface representations. In this sense, Matisse intuitively realized that "all boundaries are invisible" through the masterful way in which he arranged his colors on canvas to generate boundaries that could support compelling surface representations. ..."
    677. image p107fig03.26 How "drawing directly in color" leads to colored surface representations. Amodal boundary webs control the filling-in of color within these surface representations. See the text for details.
      || color patches on canvas -> [surface color and form, Amodal boundary web]. Amodal boundary web -> surface color and form.
    678. image p108fig03.27 Matisse's painting Open Window, Collioure 1905 combines continuously colored surfaces with color patches that created surface representations using amodal boundaries, as in Figure 3.26. Both kinds of surfaces cooperate to form the final painterly percept.
      ||
    679. image p108fig03.28 The watercolor illusion of Baingio Pinna 1987 can be explained using spatial competition between like-oriented boundary signals. This occurs at what I have called the First Competitive Stage. This is one stage in the brain's computation of hypercomplex cells, which are also called endstopped complex cells. Why the blue regions seem to bulge in depth may be explained using multiple-scale, depth-selective boundary webs. See the text for details.
      || Baingio Pinna. Watercolor illusion 1987. Filled-in regions bulge in depth. Multiple-scale, depth-selective boundary web!
    680. image p109fig03.29 The 3D percepts that are generated by chiaroscuro and trompe l'oeil both exploit the same kind of multiple-scale, depth-selective boundary webs that create the impression of a 3D bulge of the blue regions in the watercolor percept in Figure 3.28.
      || Chiaroscuro - Rembrandt self-portrait; Trompe l'oeil - Graham Rust.
    681. image p109fig03.30 The triptych of Jo Baer, called Primary Light Group: Red, Green, and Blue 1964-1965, generates watercolor illusion percepts which, when displayed side by side in a museum, create a striking impression.
    682. image p110fig03.31 Henry Hensche's painting of The Bather is suffused with light.
      || p109c2h0.8 (Hawthorne 1938/60) wrote "... (pp 25-26) the outline and color of each spot of color against every other spot of color it touches, is the only kind of drawing you need to bother about ...Let color make form- do not make form and color it. ...". p110c1h0.6 (Robichaux 1997, p27) "... The untrained eye is fooled to think he sees forms by the model edges, not with color ... Fool the eye into seeing form without edges. (p33) Every form change must be a color change. ...".
    683. image p110fig03.32 Claude Monet's painting of Poppies Near Argenteuil. See the text for details.
      || Claude Monet Poppies Near Argenteuil 1873. p110c2h0.35 "... the red poppies and the green field around them are painted to have almost the same luminescence; that is, they are almost equiluminant. As a result, the boundaries between the red and green regions are weak and positionally unstable, thereby facilitating an occasional impression of the poppies moving in a gentle breeze, especially as one's attention wanders over the scene. ...".
    684. image p112fig03.33 Various ways that spatial gradients in boundary webs can cause self-luminous percepts. See the text for details.
      || Boundary web gradient can cause self luminosity. Similar to watercolor illusion. Gloss by attached highlight (Beck, Prazdny 1981), glare. (Bresan 2001) Double brilliant illusion, (Grossberg, Hong 2004) simulation. p111c2h0.5 "... This effect may be explained as the result of the boundary webs that are generated in response to the luminance gradients and how they control the filling-in of lightness within themselves and abutting regions. ... Due to the mutually inhibitory interactions across the boundaries that comprise these boundary webs, more lightness can spread into the central square as the steepness of the boundary gradients increases. ...".
    685. image p112fig03.34 Examples of Ross Bleckner's self-luminous paintings.
      || Self-luminous paintings (Ross Bleckner). Galaxy painting (1993), Galaxy with Birds (1993). p112c2h0.15 "... Bleckner does this, not by painting large surface areas with high reflectances or bright colors, but rather creating compositions of small, star-like, circular regions that are perceived as self luminous ...".
    686. image p113fig03.35 The Highest Luminance As White (HLAW) rule of (Hans Wallach 1948) works in some cases (top row) but not others (bottom row).
    687. image p113fig03.36 The Blurred Highest Luminance As White (BHLAW) rule that I developed with my PhD student, Simon Hong, works in cases where the rule of Hans Wallach fails, as can be seen by comparing the simulation in Figure 3.35 with the one in this figure.
      || Blurred Highest Luminance As White (BHLAW) rule (Grossberg, Hong 2004, 2006). Spatial integration (blurring) adds spatial context to lightness perception.
    688. image p114fig03.37 How the Blurred Highest Luminance as White rule sometimes normalizes the highest luminance to white (left panel) but at other times normalizes it to be self-luminous (right panel). See the text for details.
      || perceived reflectance vs cross-section of visual field. [white level, anchored lightness, self-luminous*, BHLAW]. *self-luminous only when conditions are right.
    688. image p114fig03.38 Four color-field spray paintings of Jules Olitski. The text explains why they generate surface percepts with such ambiguous depth.
      || Jules and his friends (1967), Lysander-1 (1970), Instant Loveland (1968), Comprehensive Dream (1965). p114c2h0.4 "... it is impossible to visually perceive discrete colored units within the boundary webs in Olitski's spray paintings. ... create a sense of ambiguous depth in the viewer, similar to staring into a space filled with colored fog, or into a sunset free of discrete clouds. Olitski intentionally created this effect. ...".
    690. image p115fig03.39 Two of Gene Davis's paintings in full color (top row) and in monochromatic versions (bottom row). The text explains how they achieve their different percepts of grouping and relative depth.
      || Gene Davis [Black popcorn, Pink flamingo] in [full color, monochromatic]. p115c1h0.8 "... His paintings ... are built up from vertical stripes. They do not contain size differences, shading, or recognizable objects. ...". p115c2h0.15 "... For starters, color similarities and/or almost equal luminances between stripes can influence whether the viewer's eyes are drawn to individual stripes or groups of stripes. The achromatic versions of the two paintings more clearly show regions where the color assimilation is facilitated. ... Such form-sensitive spatial attention is called an attentional shroud. An attentional shroud, in turn, is created by a dynamical state in the brain that I call a surface-shroud resonance. ...".
    691. image p116fig03.40 A combination of T-junctions and perspective cues can create a strong percept of depth in response to 2D images, with a famous example being Leonardo da Vinci's painting of the Mona Lisa.
      || p116c2h0.05 "... Many Renaissance artists learned how to use perspective cues ... Renaissance artists also understood how to use T-junctions like the ones that occur where the vertical and horizontal edges intersect in Figure 3.40 (left column, bottom row), or in the Kanizsa square percepts in Figure 3.3, or in the zebra image in Figure 3.21b. ..."
    692. image p117fig03.41 End gaps, or small breaks or weakenings of boundaries, can form where a stronger boundary abuts a weaker, like-oriented, boundary, as occurs where black boundaries touch red boundaries in the neon color spreading image of Figure 3.11.
      || Boundary contours - lower contrast boundary signals are weakened. feature contours- no inhibition, feature signals survive and spread. MP -> [BCS, FCS]. BCS -> FCS.
    693. image p117fig03.42 Two paintings by Frank Stella. See the text for details.
      || Firuzabad (top row) ... and Khurasan Gate (variation) (bottom row). p117c1h0.75 "... The luminance and color structure within a painting affects how it groups and stratifies the figures within it. These processes, in turn, affect the formation of attentional shrouds that organize how spatial attention is allocated as we view them. ..." "... Stella wrote: Firuzabad is a good example of looking for stability and trying to create as much instability as possible. 'Cause those things are like bicycle wheels spinning around'."
    694. image p120fig03.43 Four paintings by Monet of the Rouen cathedral under different lighting conditions (top row) and their monochromatic versions (bottom row). See the text for details.
      || p119c2h0.25 "... Monet uses nearby colors that are nearly equiluminant, and sharp, high-contrast luminance-defined edges are sparse. He hereby creates weaker boundary signals within and between the parts of many forms, and stronger boundary signals between the forms. This combination facilitates color spreading within the forms and better separation of brightness and color differences between forms. ... The grayscale versions of these paintings demonstrate the near equiluminance of the brushstrokes within forms, and places in which brightness and color differences significantly influence the groupings that differentiate between forms, including the differentiation between the cathedral and the sky. ..."
    695. image p120fig03.44 The Rouen cathedral at sunset generates very different boundary webs than it does in full sunlight, as illustrated by Figure 3.45.
      || Rouen Cathedral at sunset (Monet 1892-1894).
      • Lighting almost equiluminant
      • Most boundaries are thus caused by color differences, not luminance differences
      • Fine architectural details are obscured, leading to...
      • Coarser and more uniform boundary webs, so...
      • Less depth in the painting.
    696. image p121fig03.45 The Rouen cathedral in full sunlight.
      || Rouen Cathedral full sunlight (Monet 1892-1894).
      • Lighting is strongly non-uniform across most of the painting
      • Strong boundaries due to both luminance and color differences
      • Fine architectural details are much clearer, leading to...
      • Finer and more non-uniform boundary webs, so...
      • Much more detail and depth
    697. image p121fig03.46 The Rouen cathedral in full sunlight contains T-Junctions that are not salient in the painting of it at sunset. These are among the painting's features that give it a much more depthful appearance.
      || Rouen Cathedral full sunlight (Monet 1892-1894).
      • There are also more T-junctions where vertical boundaries occlude horizontal boundaries, or conversely...
      • Leading to more depth.
      p119c2h1.0 "... Such T-junction boundary occlusions ... can generate percepts of depth in the absence of any other visual clues. ...".
    698. image p123fig04.01 A classical example of how boundaries are barriers to filling-in.
      || Combining stabilized images with filling-in (Krauskopf 1963, Yarbus 1967). Image: Stabilize these boundaries with suction cup attached to retina or electronic feedback circuit. Percept: A visible effect of an invisible cause!
    699. image p124fig04.02 The vertical cusp of lesser and greater illuminance is the same in both images, but the one on the left prevents brightness from flowing around it by creating closed boundaries that tightly surround the cusp.
    700. image p126fig04.03 A McCann Mondrian is an excellent display with which to illustrate how our brains discount the illuminant to compute the "real" colors of objects. See the text for details.
      || Color constancy: compute ratios. McCann Mondrian. Biological advantage: never see in bright light, eg tropical fish
      Discount the illuminant: compute lightness
      Different colors seen from the same spectrum
      ... similar to those seen in white light
      Physical basis: reflectance RATIOS!
    701. image p128fig04.04 When a gradient of light illuminates a McCann Mondrian, there is a jump in the total light that is reflected at nearby positions where the reflectances of the patches change.
      || Compute reflectance changes at contours. Fill-in illuminant-discounted surface colors.
      left: illumination I + ε; reflectance A; luminance A*(I + ε)
      right: illumination I - ε; reflectance B; luminance B*(I - ε)
      ratio at the contour: A*(I + ε)/(B*(I - ε)) - 1 ≈ A/B - 1, independent of I
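      A tiny numerical check of the reflectance-ratio idea in this note (made-up reflectances and illumination levels):
        A_refl, B_refl = 0.8, 0.4
        for I in (10.0, 100.0, 1000.0):
            eps = 0.01 * I                     # small change of illumination across the edge
            ratio = (A_refl * (I + eps)) / (B_refl * (I - eps))
            print(I, round(ratio, 3))          # stays ~= A_refl / B_refl = 2.0 at every illumination level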
    702. image p129fig04.05 Multiple-scale balanced competition chooses color contours where the reflectance of the patches change. These color contours discount the illuminant.
      || Compute reflectance changes at contours. Fill-in illuminant-discounted surface colors. Discount illuminant: compute color contours.
    703. image p129fig04.06 Filling-in of color contours restores a surface percept with colors that substantially discount the illuminant.
      || Compute reflectance changes at contours. Fill-in illuminant-discounted surface colors. Fill-in surface color: hierarchical resolution of uncertainty.
    704. image p130fig04.07 Simulation of brightness constancy under uniform illumination.
      || Simulation of brightness constancy (Grossberg & Todorovic 1988). Uniform illumination. [stimulus (S), feature (F), boundary (B), output]. B -> F -> S -> B: Veridical! Boundary peaks are spatially narrower than feature peaks.
    705. image p131fig04.08 Simulation of brightness constancy under an illumination gradient. Note that the feature contour pattern (F) is the same in both cases, and so too is the boundary contour (B) pattern that is derived from it, and the final filled-in surface.
      || Simulation of brightness constancy. Discount the illuminant. [stimulus (S), feature (F), boundary (B), output]. B -> F -> S -> B: not veridical, but useful! Ratio-sensitive feature contours (F).
    706. image p131fig04.09 Simulation of brightness contrast
      || Simulation of brightness contrast. [stimulus (S), feature (F), boundary (B), output].
    707. image p132fig04.10 Simulation of brightness assimilation. Note how the equal steps on the left and right sides of the luminance profile are transformed into different brightness levels.
      || Simulation of brightness assimilation. [stimulus (S), feature (F), boundary (B), output].
    708. image p132fig04.11 Simulations of a double step (left panel) and the Craik-O'Brien-Cornsweet (COCE) illusion. Note that discounting the illuminant creates similar feature contour patterns, from which the fact that the COCE looks like the double step follows immediately.
      || Simulations of double step and COCE. [stimulus (S), feature (F), boundary (B), output].
    709. image p133fig04.12 Simulation of the 2D COCE.
      || (Todorovic, Grossberg 1988). p132c2h0.6 "... 2D Craik-O'Brien-Cornsweet Effect percepts that are generated by the stimulus in the left panel of Figure 4.2. ..."
     710. image p134fig04.13 Contrast constancy shows how the relative luminances in a picture that is viewed in an illumination gradient can even be reversed in order to restore the correct reflectances when the illuminant is discounted.
     711. image p134fig04.14 The kinds of displays that Michael Paradiso and Ken Nakayama used to catch filling-in "in the act" and which Karl Arrington then simulated using the Grossberg and Todorovic 1988 model.
      || Experiments on filling-in. Catching "filling-in" in the act (Paradiso, Nakayama 1991). (Arrington 1994 Vision Research 34, 3371-3387) simulated these data using the model of Grossberg and Todorovic 1988.
    712. image p138fig04.15 Simple cells are oriented contrast detectors, not edge detectors.
      || From oriented filtering to grouping and boundary completion (Hubel, Wiesel 1968). Oriented receptive fields: SIMPLE CELLS. Sensitive to: orientation, [amount, direction] of contrast, spatial scale. Oriented local contrast detectors, not edge detectors!
    713. image p139fig04.16 The simplest way to realize an odd simple cell receptive field and firing threshold.
      || "Simplest" simple cell model. need more complexity for processing natural scenes. Difference-of-Gaussian or Gabor filter (J. Daugman, D. Pollen...). Output signal vs cell activity. Threshold linear signal, half-wave rectification.
     714. image p140fig04.17 Complex cells pool inputs from simple cells that are sensitive to opposite contrast polarities. Complex cells hereby become contrast invariant, and can respond to contrasts of either polarity.
      || Complex cells: pool signals from like-oriented simple cells of opposite contrast polarity at the same position. They are "insensitive to contrast polarity". Half-wave rectification of inputs from simple cells.
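      A hedged sketch, assuming NumPy/SciPy, of the simple-to-complex cell pipeline in the last two entries: an odd-symmetric (Gabor-like) oriented filter followed by half-wave rectification gives two simple cells of opposite contrast polarity, and a complex cell pools both at each position. Filter parameters are illustrative, not the book's.
        # Oriented filtering -> polarity-sensitive simple cells -> polarity-pooling complex cell.
        import numpy as np
        from scipy.signal import convolve2d

        def odd_gabor(size=9, sigma=2.0, freq=0.25, theta=0.0):
            """Odd-symmetric Gabor-like kernel; theta = preferred orientation (radians)."""
            ax = np.arange(size) - size // 2
            x, y = np.meshgrid(ax, ax)
            xr = x * np.cos(theta) + y * np.sin(theta)
            yr = -x * np.sin(theta) + y * np.cos(theta)
            return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.sin(2 * np.pi * freq * xr)

        image = np.zeros((32, 32))
        image[:, 16:] = 1.0                                   # a vertical luminance step

        drive = convolve2d(image, odd_gabor(), mode="same")   # sensitive to contrast variation along x
        simple_pos = np.maximum(drive, 0.0)                   # simple cell, one contrast polarity
        simple_neg = np.maximum(-drive, 0.0)                  # simple cell, opposite polarity
        complex_cell = simple_pos + simple_neg                # complex cell: polarity-invariant response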
    715. image p141fig04.18 The images formed on the two retinas in response to a single object in the world are displaced by different amounts with respect to their foveas. This binocular disparity is a powerful cue for determining the depth of the object from an observer.
      || Binocular Disparity. Binocular disparities are used in the brain to reconstruct depth from 2D retinal inputs, for relatively near objects.
     716. image p141fig04.19 A laminar cortical circuit for computing binocular disparities in layer 3B of V1 at binocular simple cells. These cells add positionally disparate inputs from like-polarized monocular simple cells (layer 4 of V1). Binocular simple cells at each position that are sensitive to opposite polarities then add their outputs at complex cells in layer 2/3. Chapter 10 will explain how these laminar circuits work in greater detail.
      || Laminar cortical circuit for complex cells. [left, right] eye.
      V1 layer : description
      2/3A : complex cells
      3B : binocular simple cells
      4 : monocular simple cells
     717. image p142fig04.20 A Glass pattern and a reverse-contrast Glass pattern give rise to different boundary groupings because simple cells can only pool signals from like-polarity visual features. See the text for details.
     718. image p143fig04.21 Oriented simple cells can respond at the ends of sufficiently thick bars, but not at the ends of sufficiently thin lines. See the text for an explanation of why this is true, and its implications for visual system design.
      || Hierarchical resolution of uncertainty. For a given field size. Different responses occur at bar ends and line ends. For a thin line no detector perpendicular to line end can respond enough to close the boundary there. Network activity.
    719. image p144fig04.22 Computer simulation of how simple and complex cells respond to the end of a line (gray region) that is thin enough relative to the receptive field size (thick dashed region in the left panel). These cells cannot detect the line end, as indicated by the lack of responses there in the left panel (oriented short lines denote the cells' preferred positions and orientations, and their lengths denote relative cell activations). Such an end gap is corrected in the responses of hypercomplex cells that create a boundary at the line end which is called an end cut (right panel). See the text for details.
      || End gap and end cut simulation (Grossberg, Mingolla 1985). End gap, filter size, end cut.
    720. image p145fig04.23 If end gaps were not closed by end cuts, then color would flow out of every line end!
      || A perceptual disaster in the feature contour system. feature contour, line boundary. input -> [boundary, surface]. boundary -> surface. Color would flow out of every line end! as it does during neon color spreading.
    721. image p145fig04.24 A brain's task in creating an end cut to replace an ambiguous end gap requires that it be sensitive to the pattern of signals across the network, not just the activities of individual neurons.
      || Hierarchical resolution of uncertainty. End Cuts. The boundary system must CREATE a line end at next processing stage: Every line end is illusory! input -> ambiguous -> end cut. vertical -> vertical, ambiguous -> horizontal. A pattern-to-pattern map, not a pixel-to-pixel map.
    722. image p146fig04.25 Networks of simple, complex, and hypercomplex cells can create end cuts as an example of hierarchical resolution of uncertainty. See the text for details.
      || How are end cuts created? (Grossberg 1984) Two stages of short-range competition. 1st stage: Simple cells -> complex cells -> hypercomplex - endstopped complex. First competitive stage- across position, same orientation; Second competitive stage- same position, across orientation. -> cooperation.
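      The two competitive stages can be caricatured in one dimension along the axis of a thin line. The Python toy below (parameters invented; the Grossberg-Mingolla 1985 model is a full 2D network) uses a tonic baseline plus within-orientation, across-position competition to push the line-parallel channel below baseline just past the line end, where the across-orientation stage then lets the perpendicular channel win, i.e. an end cut:
        # Toy end-cut demo: stage 1 = on-center off-surround across position within orientation,
        # stage 2 = opponent competition across orientation at each position.
        import numpy as np

        def relu(x):
            return np.maximum(x, 0.0)

        n, line_end, tonic, alpha = 20, 10, 0.5, 0.3
        V = np.zeros(n); V[:line_end] = 1.0     # line-parallel (vertical) complex-cell input
        H = np.zeros(n)                         # perpendicular channel: silent along a thin line

        def stage1(x):
            xp = np.pad(x, 1)                   # zero-pad so there is no wrap-around
            surround = xp[:-2] + xp[2:]         # like-oriented neighbors on either side
            return relu(tonic + x - alpha * surround)

        s1_V, s1_H = stage1(V), stage1(H)
        end_cut = relu(s1_H - s1_V)             # perpendicular wins where parallel falls below baseline
        boundary = relu(s1_V - s1_H)            # line-parallel boundary survives along the line
        print(np.argmax(end_cut))               # 10: the end cut appears just beyond the line end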
    723. image p148fig04.26 End cuts are formed during neon color spreading in the same way that they are formed at line ends.
      || End cut during neon color spreading.
      FIRST competitive stage: within orientation, across position
      SECOND competitive stage: across orientation, within position
      to generate end cuts.
    724. image p149fig04.27 Bipole cells can form boundaries that interpolate end cuts, and use their cooperative-competitive interactions to choose the boundary groupings that have the most support from them.
      || Bipole cells: boundary completion. long-range cooperation & short-range inhibition: complete winning boundary groupings and suppress weaker boundaries.
    725. image p150fig04.28 Bipole cells have two branches (A and B), or poles, in their receptive fields. They help to carry out long-range boundary completion.
      || Bipole property. Boundary completion via long-range cooperation. Completing boundaries inwardly between pairs or greater numbers of inducers in an oriented way. fuzzy "AND" gate.
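      A minimal sketch of the bipole "fuzzy AND" property under assumed parameters (reach, threshold): a cell fires only when both receptive-field branches receive enough roughly collinear support, so boundaries complete inwardly between inducers but never extend outward past a lone inducer.
        # 1D bipole layer: each cell takes the minimum (fuzzy AND) of the support
        # arriving at its two poles and fires if that support exceeds a threshold.
        import numpy as np

        def bipole_layer(inducers, reach=8, threshold=0.5):
            n, out = len(inducers), np.zeros(len(inducers))
            for i in range(n):
                left = inducers[max(0, i - reach):i].sum()      # support from one pole
                right = inducers[i + 1:i + 1 + reach].sum()     # support from the other pole
                out[i] = min(left, right)                       # fuzzy AND of the two poles
            return (out >= threshold).astype(float)

        inducers = np.zeros(20)
        inducers[[4, 12]] = 1.0           # two collinear inducers, e.g. Kanizsa edge fragments
        print(bipole_layer(inducers))     # 1s only strictly between the inducers, never beyond them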
     726. image p151fig04.29 Experimental evidence of bipole cells in cortical area V2 was reported by Von der Heydt, Peterhans, and Baumgartner (1984).
      || Bipoles: first neurophysiological evidence (V2) (von der Heydt, Peterhans, Baumgartner 1984, Peterhans, von der Heydt 1988). (Grossberg 1984) prediction.
      Ordering: stimulus (S), probe location (*), response of cells in V2:
      ...(S)*...                     YES
      ...*...(S)                     NO
      (S)...*...                     NO
      (S)...*...(S)                  YES
      (S)...*... (more contrast)     NO
      (S)...*.....(S)                YES
      Evidence for receptive field.
    727. image p151fig04.30 Anatomical evidence for long-range horizontal connections has also been reported, as illustrated by the example above from (Bosking etal 1997).
      || Anatomy: horizontal connections (V1) (Bosking etal 1997). tree shrew. [10, 20]*[20, 10, 0, -10, -20] (degrees).
    728. image p152fig04.31 The predicted bipole cell receptive field (upper left corner) has been supported by both neurophysiological data and psychophysical data, and used in various forms by many modelers. See the text for details.
      || Bipoles through the ages. (Grossberg 1984; Grossberg, Mingolla 1985). (Field, Hayes, Hess 1993) "association field". (Heitger, von der Heydt 1993). (Williams, Jacobs 1997). cf. "relatability" geometric constraints on which contours get to group (Kellman & Shipley 1991). Also "tensor voting" (Ullman, Zucker, Mumford, Guy, Medioni, ...).
    729. image p153fig04.32 The double filter network embodies simple, complex, and hypercomplex (or endstopped complex) cells. It feeds into a network of bipole cells that can complete boundaries when it properly interacts with the double filter.
      || Double filter and grouping network. Cells : simple -> complex -> hypercomplex (endstopping) -> bipole
      Grouping network: bipole cells
      Double filter: hypercomplex cells (endstopping), complex cells, simple cells
    730. image p156fig04.33 A tripartite texture (top row) and two bipartite textures (bottom row) that illustrate how emergent boundary groupings can segregate textured regions from one another.
    731. image p157fig04.34 Some textures that were simulated with mixed success by the complex channels model. In particular, the model gets the wrong answer for the textures in (g) and (i). The Boundary Contour System model of Figure 4.32, which includes both a double filter and a bipole grouping network, simulates the observed results.
    732. image p159fig04.35 Spatial impenetrability prevents grouping between the pac-men figures in the left figure, but not in the figure on the right.
      || p158c2h0.75 "... In the image shown in the left panel, the horizontal boundaries of the background squares interfere with vertical boundary completion by vertically-oriented bipole cells, again by spatial impenetrability. In contrast, the vertical boundaries of the background squares are collinear with the vertical pac-man inducers, thereby supporting formation of the square boundaries. Finer aspects of these percepts, such as why the square ... (right panel) appears to lie in front of four partially occluded circular discs, as regularly occurs when the Kanizsa square can form (eg Figure 3.3), can be understood using FACADE theory mechanisms that will be shown below to explain many figure-ground percepts using natural extensions to the three dimensional world of boundary and surface mechanisms that we have already discussed. ..."
    733. image p159fig04.36 Graffiti art by Banksy exploits properties of amodal boundary completion and spatial impenetrability.
      || p159c1h0.75 perceptual psychologist Nava Rubin "... When the wall is smooth, Banksy leaves the regions previously covered by stencil unpainted, relying on observers' perception to segregate figural regions from the (identically colored) background. But when the wall is patterned with large-scale luminance edges - eg due to bricks - Banksy takes the extra time to fill in unpainted figural regions with another color (Rubin 2015). ..."
    734. image p161fig04.37 Kanizsa squares that form either collinearly to their inducers (left panel) or perpendicular to them (right panel) confirm predictions of the BCS boundary completion model.
      || Analog-sensitive boundary completion. contour strength vs Kanizsa square image. Increases with "support ratio" (Shipley, Kellman 1992). Inverted-U (Lesher, Mingolla 1993; cf Soriano, Spillmann, Bach 1994)(shifted gratings). p370h0.6 BCS = Boundary Contour System, FCS = Feature Contour System. p161c1h0.85 "... As predicted by the BCS, they found an Inverted-U in contour strength as a function of line density. ... This effect may be explained by the action of the short-range competition that occurs before the stage of long-range cooperative grouping by bipole cells (Figure 4.32). It is thus another example of the balance between cooperative and competitive mechanisms. ..."
    735. image p162fig04.38 How long-range cooperation among bipole cells and short-range competition by hypercomplex cells work together to generate the inverted-U in boundary strength that is found in the data of Figure 4.37 (right panel).
      || Cooperation and competition during grouping.
      few lines: wide spacing, inputs outside spatial range of competition, more inputs cause higher bipole activity
      more lines: narrower spacing slightly weakens net input to bipoles from each inducer
      increasing line density: causes inhibition to reduce net total input to bipoles
    736. image p163fig04.39 A schematic of the LAMINART model that explains key aspects of laminar visual cortical anatomy and dynamics. LGN -> V1 [6, 4, 2/3] -> V2 [6, 4, 2/3]
      || p163c1h0.6 "... The first article about laminar computing ... proposed how the laminar cortical model could process 2D pictures using bottom-up filtering and horizontal bipole grouping interactions (Grossberg, Mingolla, Ross 1997). In 1999, I was able to extend the model to also include top-down circuits for expectation and attention (Grossberg 1999)(right panel). Such a synthesis of laminar bottom-up, horizontal, and top-down circuits is characteristic of the cerebral cortex (left panel). I called it LAMINART because it began to show how properties of Adaptive Resonance Theory, or ART, notably the ART prediction about how top-down expectations and attention work, are realized by identical cortical cells and circuits. You can immediately see from the schematic laminar circuit diagram ... (right panel) that circuits in V2 seem to repeat circuits in V1, albeit with a larger spatial scale, despite the fact that V1 and V2 carry out different functions. How this anatomical similarity can coexist with functional diversity will be clarified in subsequent sections and chapters. It enables different kinds of biological intelligence to communicate seamlessly while carrying out their different psychological functions. ..."
    737. image p164fig04.40 The Koffka-Benussi ring. See the text for details.
      || p164c2h0.25 "... [left image] The luminance of the ring is intermediate between the luminances of the two background regions. Its perceived brightness is also between the brightnesses of the two background regions, and appears to be uniform throughout. The right image differs from the left only in that a vertical line divides the two halves of the ring where it intersects the two halves in the background. Although the luminance of the ring is still uniform throughout, the two halves of the ring now have noticeably different brightnesses, with the left half of the ring looking darker than the right half. How can drawing a line have such a profound effect on the brightnesses of surface positions that are so far away from the line? ..."
    738. image p165fig04.41 The Kanizsa-Minguzzi ring. See the text for details.
      || p165c1h0.6 "... (left panel), the annulus is divided by two line segments into annular sectors of unequal area. Careful viewing shows that the smaller sector looks a little brighter than the larger one. (Kanizsa, Minguzzi 1986) noted that "this unexpected effect is not easily explained. In fact, it cannot be accounted for by any simple psychological mechanism such as lateral inhibition or frequency filtering. Furthermore, it does not seem obvious to invoke organizational factors, like figural belongingness or figure-ground articulation."". p165c2h0.35 "... (Grossberg, Todorovic 1988). Our main claim is that the two radial lines play two roles, one in the formation of boundaries with which to contain the filling-in process, and the other as a source of feature contour signals that are filled-in within the annular regions to create a surface brightness percept. ..."
    739. image p166fig04.42 Computer simulation of Kanizsa-Minguzzi ring percept. See the text for details.
     740. image p167fig04.43 (a) How bipole cells cause end cuts. (b) The Necker cube generates a bistable percept of two 3D parallelepipeds. (c) Focusing spatial attention on one of the disks makes it look both nearer and darker, as (Tse 1995) noted and (Grossberg, Yazdanbakhsh 1995) explained.
      || T-junction sensitivity. image -> bipole cells -> boundary. (+) long-range cooperation, (-) short-range competition.
    741. image p168fig04.44 Macrocircuit of the main boundary and surface formation stages that take place from the lateral geniculate nucleus, or LGN, through cortical areas [V1, V2, V4]. See the text for details.
      ||
      (columns: left eye | binocular | right eye)
      V4: | binocular surface |
      V2: monocular surface | layer 2/3 binocular boundary | monocular surface
      V2: | layer 4 binocular boundary |
      V1: monocular surface, monocular boundary | binocular boundary | monocular boundary, monocular surface
      LGN: left LGN | | right LGN
    742. image p168fig04.45 How ON and OFF feature contour (FC) activities give rise to filled-in surface regions when they are adjacent to a like oriented boundary, but not otherwise.
     743. image p170fig04.46 Surface regions can fill-in using feature contour inputs (+ and - signs) if they are adjacent to, and collinear with, boundary contour inputs (solid line), as in (a), but not otherwise, as in (b).
    744. image p170fig04.47 A double-opponent network processes output signals from opponent ON and OFF Filling-In DOmains, or FIDOs.
      || OFF FIDO -> shunting networks -> ON FIDO -> shunting networks -> opponent interaction -> FIDO outputs
    745. image p171fig04.48 How closed boundaries contain filling-in of feature contour signals, whereas open boundaries allow color to spread to both sides of the boundary.
      || Before filling-in: boundary contour, illuminant-discounted feature contour; After filling-in: no gap, gap
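      The containment of filling-in by closed boundaries can be caricatured as boundary-gated diffusion. A toy 1D Python sketch (not the book's FIDO equations; parameters are invented): feature-contour activity diffuses between neighboring cells except where a boundary blocks the flow, so a closed pair of boundaries traps it while a gap lets it spread.
        # Boundary-gated diffusion: permeability drops to zero at boundary positions.
        import numpy as np

        def fill_in(features, boundaries, steps=3000, rate=0.2):
            s = features.astype(float).copy()
            perm = 1.0 - np.maximum(boundaries[:-1], boundaries[1:])   # low where a boundary sits
            for _ in range(steps):
                flow = perm * (s[1:] - s[:-1])     # diffusion gated by boundary strength
                s[:-1] += rate * flow
                s[1:] -= rate * flow
            return s

        n = 30
        features = np.zeros(n); features[10] = 1.0       # a feature contour signal
        closed = np.zeros(n); closed[[5, 15]] = 1.0      # boundaries on both sides
        leaky = np.zeros(n); leaky[5] = 1.0              # right-hand boundary missing
        print(fill_in(features, closed).round(2))        # activity trapped between positions 5 and 15
        print(fill_in(features, leaky).round(2))         # activity spreads out through the gap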
    746. image p171fig04.49 An example of DaVinci stereopsis in which the left eye sees more of the wall between A and C than the right eye does. The region between B and C is seen only by the left eye because the nearer wall between C and D occludes it from the right eye view.
    747. image p173fig04.50 This figure illustrates how a closed boundary can be formed in a prescribed depth due to addition of binocular and monocular boundaries, but not at other depths.
      || How are closed 3D boundaries formed? V1 Binocular, V2 boundary, V2 surface; Prediction: monocular and horizontal boundaries are added to ALL binocular boundaries along the line of sight. Regions that are surrounded by a CLOSED boundary can depth-selectively contain filling-in of lightness and colored signals.
    748. image p174fig04.51 The same feedback circuit that ensures complementary consistency between boundaries and surfaces also, automatically, initiates figure-ground separation! See the text for details.
      || before feedback: [V1 -> V2 pale stripe -> V2 thin stripe, "attention pointers" (Cavanagh etal 2010)]; after feedback: [V1 + V2 thin stripe] -> V2 pale stripe via contrast sensitive [excitation, inhibition] for depths [1, 2] -> object recognition
     749. image p174fig04.52 An example of how the 3D LAMINART model can transform the two monocular images of the random dot stereogram in the top row into the three depth-separated surface representations in the bottom row.
      || Stereogram surface percepts: surface lightnesses are segregated in depth (Fang and Grossberg 2009). [left, right] inputs, [far, fixation, near] planes. Contrast with algorithms that just compute disparity matches and let computer code build the surface, eg (Marr, Poggio, etal 1974).
    750. image p176fig04.53 The on-center off-surround network within position and across depth helps to explain why brighter Kanizsa squares look closer.
      || inhibition vs. depth. p176c1h0.25 "... to qualitatively understand how this example of proximity-luminance covariance works. It follows directly from the boundary pruning by surface contour feedback signals (Figure 4.51) that achieves complementary consistency and initiates figure-ground perception. ...". p176c1h0.45 "... these inhibitory signals are part of an off-surround network whose strength decreases as the depth difference increases between the surface that generates the signal and its recipient boundaries. ...". p176c1h0.8 "... Within FACADE theory, the perceived depth of a surface is controlled by the boundaries that act as its filling-in generators and barriers (Figure 3.22), since these boundaries select the depth-selective FIDOs within which filling-in can occur, and thereby achieve surface capture. These boundaries, in turn, are themselves strengthened after surface-to-boundary contour feedback eliminates redundant boundaries that cannot support successful filling-in (Figure 4.51). These surface contour feedback signals have precisely the properties that are needed to explain why brighter Kanizsa squares look closer! ..."
    751. image p178fig04.54 Initial steps in figure-ground separation. See the text for details.
      ||
      top left: repeats the image in Figure 1.3
      top right: shows again the long-range cooperation and short-range competition that are controlled by the bipole grouping process (Figure 4.43a middle panel)
      bottom left: shows the end gaps that are caused by these bipole grouping mechanisms
      bottom right: shows how surface filling-in is contained within the closed horizontal rectangular boundary, but spills out of the end gaps formed in the other two rectangles
    752. image p178fig04.55 Amodal completion of boundaries and surfaces in V2.
      || Separated V2 boundaries: near, far (amodal boundary completion); Separated V2 surfaces: ?horizontal, vertical? (amodal surface filling-in).
    753. image p179fig04.56 Final steps in generating a visible, figure-ground separated, 3D surface representation in V4 of the unoccluded parts of opaque surfaces.
      || Visible surface perception.
      Boundary enrichment: | near | far | asymmetry between near & far
      V4: | horizontal rectangle | horizontal & vertical rectangles | cannot use these (overlapping?) boundaries for occluded object recognition
      V2: | horizontal rectangle | vertical rectangle | use these boundaries for occluded object recognition
      Visible surface filling-in: | filling-in of entire vertical rectangle | partial filling-in of horizontal rectangle | visible percept of unoccluded [vertical] surface
    754. image p181fig04.57 Percepts of unimodal and bistable transparency (top row) as well as of a flat 2D surface (bottom row, left column) can be induced just by changing the relative contrasts in an image with a fixed geometry.
      || X junction
     755. image p182fig04.58 LAMINART model processing stages that are sufficient to explain many percepts of transparency, including those summarized in Figure 4.57.
      || [left, right] eye, [LGN, V1 [6, 4, 3B, 2/3 A], V2 [4, 2/3]], [mo, bi]nocular cart [simple, complex] cells, [excita, inhibi]tory cart [connection, cell]s.
    756. image p186fig05.01 Humans and other autonomous adaptive intelligent agents need to be able to learn both many-to-one and one-to-many maps.
      || Learn many-to-one (compression, naming) and one-to-many (expert knowledge) maps
     757. image p186fig05.02 Learning a many-to-one map from multiple visual fonts of a letter to the letter's name requires a stage of category learning followed by one of associatively learned mapping.
      || Many-to-one map- two stages of compression: visual categories, auditory categories
    758. image p186fig05.03 Many-to-one maps can learn a huge variety of kinds of predictive information.
      || Many-to-one map, two stage compression: IF-THEN rules: [symptom, test, treatment]s; length of stay in hospital
    759. image p189fig05.04 The hippocampus is one of several brain regions that are important in learning and remembering about objects and events that we experience throughout life. The book will describe several hippocampal processes that contribute to this achievement in different ways.
      || hypothalamic nuclei, amygdala, hippocampus, cingulate gyrus, corpus callosum, thalamus
    760. image p192fig05.05 ON and OFF cells in the LGN respond differently to the sides and ends of lines.
      || [ON, OFF]-center, [OFF, ON]-surround (respectively). OFF-center cells maximum response at line end (interior), ON-center cells maximum response along sides (exterior)
    761. image p192fig05.06 Bottom-up and top-down circuits between the LGN and cortical area V1. The top-down circuits obey the ART Matching Rule for matching with bottom-up input patterns and focussing attention on expected critical features.
      || Model V1-LGN circuits, version [1, 2]. retina -> LGN relay cells -> interneurons -> cortex [simple, endstopped] cells -> cortex complex cells
    762. image p193fig05.07 A more detailed description of the connections between retinal ganglion cells, the LGN, and V1.
      ||
    763. image p193fig05.08 The patterns of LGN activation and inhibition on the sides and ends of a line without the top-down feedback (A) and with it (C). The top-down distribution of excitation (+) and inhibition (-) are shown in (B).
      ||
    764. image p194fig05.09 A computer simulation of the percept (D) that is generated by feature contours (B) and boundary contours (C) in response to an Ehrenstein disk stimulus (A).
      ||
    765. image p198fig05.10 A competitive learning circuit learns to transform distributed feature patterns into selective responses of recognition categories.
      || Competitive learning and Self-Organized Maps (SOMs). input patterns -> feature level (F1) -> adaptive filter (T=ZS) ->
    766. image p199fig05.11 Instar learning enables a bottom-up adaptive filter to become selectively tuned to particular feature patterns. Such pattern learning needs adaptive weights that can either increase or decrease to match the featural activations that they filter.
      || Instar learning STM->LTM: need both increases and decreases in strength for the LTM pattern to learn the STM pattern
    767. image p200fig05.12 The duality of the outstar and instar networks is evident when they are drawn as above.
      ||
    768. image p200fig05.13 Instar and outstar learning are often used to learn the adaptive weights in the bottom-up filters and top-down expectations that occur in ART. The ART Matching Rule for object attention enables top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features.
      || Expectations focus attention: feature pattern (STM), Bottom-Up adaptive filter (LTM), Category (STM), competition, Top-Down expectation (LTM); ART Matching Rule: STM before top-down matching, STM after top-down matching (attention!)
    769. image p200fig05.14 Outstar learning enables individual sampling cells to learn distributed spatial patterns of activation at the network of cells that they sample. Again, both increases and decreases in LTM traces must be possible to enable them to match the activity pattern at the sampled cells.
      || Outstar learning: need both increases and decreases in LTM traces for the LTM pattern to learn the sampled STM pattern
    770. image p201fig05.15 An outstar can learn an arbitrary spatial pattern of activation at its sampled nodes, or cells. The net pattern that is learned is a time average of all the patterns that are active at the sampled nodes when the sampling node is active.
      || Spatial learning pattern, outstar learning.
    771. image p202fig05.16 In the simplest example of category learning, the category that receives the largest total input from the feature level is chosen, and drives learning in the adaptive weights that abut it. Learning in this "classifying vector", denoted by zi, makes this vector more parallel to the input vector from the feature level that is driving the learning (dashed red arrow).
      || Geometry of choice and learning
    772. image p202fig05.17 This figure summarizes the simplest equations whereby the adaptive weights of a winning category learn the input pattern that drove it to win, or more generally a time-average of all the input patterns that succeeded in doing so.
      || Geometry of choice and learning, learning trains the closest LTM vector
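      The choice-and-learning geometry of the last few entries reduces to a few lines of code. A hedged Python sketch (learning rate and sizes are illustrative assumptions): the category receiving the largest total input wins, and only the winner's classifying vector moves toward the current input, so learning trains the closest LTM vector.
        # Winner-take-all category choice plus instar learning of the winner's weights.
        import numpy as np

        rng = np.random.default_rng(0)
        n_features, n_categories, lr = 4, 3, 0.2
        W = rng.random((n_categories, n_features))      # bottom-up adaptive weights (LTM)

        def learn(x):
            j = np.argmax(W @ x)                        # category with the largest total input wins
            W[j] += lr * (x - W[j])                     # instar: winner's weights track the input
            return j

        for x in [np.array([1, 1, 0, 0.])] * 20 + [np.array([0, 0, 1, 1.])] * 20:
            learn(x)
        print(np.round(W, 2))                           # two rows have converged toward the two inputs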
    773. image p205fig05.18 How catastrophic forgetting can occur in a competitive learning or self-organizing map model due to basic properties of competition and associative learning.
      || Learning from pattern sequences, practicing a sequence of spatial patterns can recode all of them! When is learning stable? Input patterns cannot be too dense relative to the number of categories; Either: not too many distributed inputs relative to the number of categories, or not too many input clusters
     774. image p207fig05.19 The ART hypothesis testing and learning cycle. See the text for details about how the attentional system and orienting system interact in order to incorporate learning of novel categories into the corpus of already learned categories without causing catastrophic forgetting.
      ||
    775. image p211fig05.20 The PN and N200 event-related potentials are computationally complementary events that are computed within the attentional and orienting systems.
      || PN and N200 are complementary waves. PN [top-down, conditionable, specific] match; N200 [bottom-up, unconditionable, nonspecific] mismatch
    776. image p211fig05.21 Sequences of P120, N200, and P300 event-related potentials occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
      || ERP support for mismatch-mediated reset: event-related potentials: human scalp potentials. ART predicted correlated sequences of P120-N200-P300 Event Related Potentials during oddball learning. P120 mismatch; N200 arousal/novelty; P300 STM reset. Confirmed in (Banquet and Grossberg 1987)
     777. image p213fig05.22 Suppose that a category is activated by a very different exemplar than the one that originally learned to activate it.
      || By prior learning, X1 at F1 is coded at F2. Suppose that X2 incorrectly activates the same F2 code. How to correct the error? The problem occurs no matter how you define an "error"
    778. image p213fig05.23 A category, symbol, or other highly compressed representation cannot determine whether an error has occurred.
      || Compression vs error correction. past vs present. Where is the knowledge that an error was made? Not at F2! The compressed code cannot tell the difference! X2 is at F1 when (green right triangle GRT) is at F2 defines the error. There is a mismatch between X1 and X2 at F1. How does the system know this?
    779. image p214fig05.24 Learning of a top-down expectation must occur during bottom-up learning in the adaptive filter in order to be able to match the previously associated feature pattern with the one that is currently active.
      || Learning top-down expectations. When the code (green right triangle GRT) for X1 was learned at F2, GRT learned to read-out X1 at F1. [Bottom-Up, Top-Down] learning
    780. image p214fig05.25 The sequence of events whereby a novel input pattern can activate a category which, in turn, reads out its learned top-down expectation to be matched against the input pattern. Error correction thus requires the use of a Match Detector that has properties of the Processing Negativity ERP.
      || How is an error corrected. During bottom-up learning, top-down learning must also occur so that the pattern that is read out top-down can be compared with the pattern that is activated by bottom-up inputs. Match detector: Processing Negativity ERP. 1. top-down, 2. conditionable, 3. specific, 4. match
    781. image p214fig05.26 When a big enough mismatch occurs, the orienting system is activated and sends a burst of nonspecific arousal to the category level. This Mismatch Detector has properties of the N200 ERP.
      || Mismatch triggers nonspecific arousal. Mismatch at F1 elicits a nonspecific event at F2. Call this event nonspecific arousal. N200 ERP Naatanen etal: 1. bottom-up, 2. unconditionable, 3. nonspecific, 4. mismatch
    782. image p215fig05.27 Every event activates both the attentional system and the orienting system. This text explains why.
      || Attentional and Orienting systems. Every event has a cue (specific) and an arousal (nonspecific) function
    783. image p215fig05.28 How a mismatch between bottom-up and top-down input patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level.
      || Mismatch -> inhibition -> arousal -> reset. BU input orienting arousal, BU+TD mismatch arousal and reset. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
    784. image p220fig05.29 Vigilance is a gain parameter on inputs to the orienting system that regulates whether net excitation from bottom-up inputs or inhibition from activated categories will dominate the orienting system. If excitation wins, then a memory search for a better matching will occur. If inhibition wins, then the orienting system will remain quiet, thereby enabling resonance and learning to occur.
      || Vigilance control [resonate and learn, reset and search]. ρ is a sensitivity or gain parameter
    785. image p221fig05.30 When a predictive disconfirmation occurs, vigilance increases enough to drive a search for a more predictive category. If vigilance increases just enough to exceed the analog match between features that survive top-down matching and the entire bottom-up input pattern, then minimax learning occurs. In this case, the minimum amount of category generalization is given up to correct the predictive error.
      || Match tracking realizes minimax learning principle. Given a predictive error, vigilance increases just enough to trigger search and thus sacrifices the minimum generalization to correct the error ... and enables expert knowledge to be incrementally learned. predictive error -> vigilance increases just enough -> minimax learning
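      A hedged ART-1 style sketch (binary features; the category choice rule is simplified) of the vigilance test described above: accept a category only if the matched fraction of the input meets vigilance ρ, otherwise reset and search, recruiting a new category if nothing passes. Match tracking would then raise ρ just above the current match ratio after a predictive error.
        # Vigilance-gated search: resonate if |w AND x| / |x| >= rho, else reset and try the next category.
        import numpy as np

        def art_choice(x, W, rho):
            order = np.argsort([-(w & x).sum() for w in W])   # most strongly activated categories first
            for j in order:
                match = (W[j] & x).sum() / x.sum()            # fraction of the input matched top-down
                if match >= rho:                              # resonance: learn by intersection
                    W[j] &= x
                    return j
            W.append(x.copy())                                # all categories reset: recruit a new one
            return len(W) - 1

        W = [np.array([1, 1, 0, 0]), np.array([0, 0, 1, 1])]
        x = np.array([1, 1, 1, 0])
        print(art_choice(x, [w.copy() for w in W], rho=0.9))  # 2: mismatch resets, new category recruited
        print(art_choice(x, [w.copy() for w in W], rho=0.6))  # 0: coarser match accepted, resonance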
    786. image p221fig05.31 A system like Fuzzy ARTMAP can learn to associate learned categories in one ART network with learned categories in a second ART network. Because both bottom-up and top-down interactions occur in both networks, a bottom-up input pattern to the first ART network can learn to generate a top-down output pattern from the second ART network.
      || Fuzzy ARTMAP. Match tracking realizes minimax learning principle: vigilance increases to just above the match ratio of prototype / exemplar, thereby triggering search
    787. image p224fig05.32 Learning the alphabet with two different levels of vigilance. The learning in column (b) is higher than in column (a), leading to more concrete categories with less abstract prototypes. See the text for details.
      ||
    788. image p225fig05.33 Some early ARTMAP benchmark studies. These successes led to the use of ARTMAP, and many variants that we and other groups have developed, in many large-scale applications in engineering and technology that has not abated even today.
      || see Early ARTMAP benchmark studies
    789. image p225fig05.34 ARTMAP was successfully used to learn maps of natural terrains with many advantages over those of mapping projects that used AI expert systems. The advantages are so great that many mapping projects started to use this technology.
      || AI expert system - 1 year: field identification of natural regions; derivation of ad hoc rules for each region by expert geographers; correct 80,000 of 250,000 site labels; 230m (site-level) scale. ARTMAP system - 1 day: rapid, automatic, no natural regions or rules; confidence map; 30m (pixel-level) scale can see roads; equal accuracy at test sites
     790. image p226fig05.35 I had shown in 1976 how a competitive learning or self-organizing map model could undergo catastrophic forgetting if the input environment was sufficiently dense and nonstationary, as illustrated by Figure 5.18. Later work with Gail Carpenter showed how, if the ART Matching Rule was shut off, repeating just four input patterns in the correct order could also cause catastrophic forgetting by causing superset recoding, as illustrated in Figure 5.36.
      || Code instability input sequences. D C A; B A; B C = ; |D|<|B|<|C|; where |E| is the number of features in the set E. Any set of input vectors that satisfy the above conditions will lead to unstable coding if they are periodically presented in the order ABCAD and the top-down ART Matching Rule is shut off.
     791. image p226fig05.36 Column (a) shows catastrophic forgetting when the ART Matching Rule is not operative. It is due to superset recoding. Column (b) shows how category learning quickly stabilizes when the ART Matching Rule is restored.
      || Stable and unstable learning, superset recoding
    792. image p228fig05.37 A macrocircuit of the neurotrophic Spectrally Timed ART, or nSTART, model. I developed nSTART with my PhD student Daniel Franklin. It proposes how adaptively timed learning in the hippocampus, bolstered by Brain Derived Neurotrophic Factor, or BDNF, helps to ensure normal memory consolidation.
      || habituative gates, CS, US, Thalamus (sensory cortex, category learning, conditioned reinforcer learning, adaptively timed learning and BDNF), Amygdala (incentive motivation learning), Hippocampus (BDNF), Prefrontal Cortex (attention), Pontine nuclei, Cerebellum (adaptively timed motor learning)
    793. image p230fig05.38 The Synchronous Matching ART, or SMART, model includes spiking neurons in a laminar cortical hierarchy. I developed SMART with my PhD student Massimiliano Versace. By unlumping LAMINART to include spiking neurons, finer details of neurodynamics, such as the existence of faster gamma oscillations during good enough matches, and slower beta oscillations during bad enough mismatches, could be shown as emergent properties of network interactions.
      || Second order thalamus -> specific thalamic nucleus -> Thalamic reticulate nucleus -> neocortical laminar circuit [6ll, 6l, 5, 2/3, 1] -> Higher order cortex. Similar for First order thalamus -> First order cortex, with interconnection to Second order, nonspecific thalamic nucleus
     794. image p231fig05.39 The SMART hypothesis testing and learning cycle predicts that vigilance increases when a mismatch in subcortical regions like the nonspecific thalamus activates the nucleus basalis of Meynert which, in turn, broadcasts a burst of the neurotransmitter acetylcholine, or ACh, to deeper cortical layers. Due to the way in which LAMINART proposes that cortical matching and mismatching occurs, this ACh burst can increase vigilance and thereby trigger a memory search. See the text for details.
      || [BU input, [, non]specific thalamic nucleus, thalamic reticulate nucleus, neocortical laminar circuit] cart [Arousal, Reset, Search, Vigilance]
     795. image p232fig05.40 Computer simulation of how the SMART model generates (a) gamma oscillations if a good enough match occurs, or (c) beta oscillations if a bad enough match occurs. See the text for details.
      || Brain oscillations during match/mismatch, data, simulation. (a) TD corticothalamic feedback increases synchrony (Sillito etal 1994) (b) Match increases γ oscillations (c) Mismatch increases θ,β oscillations
    796. image p232fig05.41 (a)-(c). The sequence of interlaminar events that SMART predicts during a mismatch reset. (d) Some of the compatible neurophysiological data.
      || Mismatch causes layer 5 dendritic spikes that trigger reset. (a) Arousal causes increase in nonspecific thalamic nuclei firing rate and layer 5 dendritic and later somatic spikes (Larkum and Zhu 2002, Williams and Stuart 1999) (b) Layer 5 spikes reach layer 4 via layer 6i and inhibitory neurons (Lund and Boothe 1975, Gilbert and Wiesel 1979) (c) habituative neurotransmitters in layer 6i shift the balance of active cells in layer 4 (Grossberg 1972, 1976) (d) Dendritic stimulation fires layer 5 (Larkum and Zhu 2002) stimulation apical dendrites of nonspecific thalamus
     797. image p233fig05.42 Mismatch-induced beta oscillations have been reported in at least three parts of the brain: V1, V4, and hippocampus. Although there may be other reasons for beta oscillations in the brain, those that are caused by a mismatch should be studied in concert with the gamma oscillations that occur during a good enough match. See the text for details.
      || Is there evidence for the [gamma, beta] prediction? Yes, in at least three parts of the brain, (Buffalo EA, Fries P, Landman R, Buschman TJ, Desimone R 2011, PNAS 108, 11262-11267) Does this difference in average oscillation frequencies in the superficial and deep layers reflect layer 4 reset? Superficial recording γ (gamma), Deep recording β (beta) (Berke etal 2008, hippocampus; Buschman and Miller 2009, FEF)
    798. image p236fig05.43 The activation of the nucleus basalis of Meynert, and its subsequent release of ACh into deeper layers of neocortex, notably layer 5, is assumed to increase vigilance by reducing afterhyperpolarization (AHP) currents.
      || Vigilance control: mismatch-mediated acetylcholine release (Grossberg and Versace 2008). Acetylcholine (ACh) regulation by nonspecific thalamic nuclei via nucleus basalis of Meynert reduces AHP in layer 5 and causes a mismatch/reset thereby increasing vigilance. HIGH vigilance ~ sharp code, LOW vigilance ~ coarse code
    799. image p240fig05.44 When an algebraic exemplar model is realized using only local computations, it starts looking like an ART prototype model.
      || How does the model know which exemplars are in category A? BU-TD learning. How does a NOVEL test item access category A?
    800. image p241fig05.45 The 5-4 category structure is one example of how an ART network learns the same kinds of categories as human learners. See the text for details.
      || 5-4 Category structure. A1-A5: closer to the (1 1 1 1) prototype; B1-B4: closer to the (0 0 0 0) prototype
    801. image p242fig05.46 Computer simulations of how two variants of Distributed ARTMAP incrementally learn the 5-4 category structure. See the text for details.
      || Distributed ARTMAP with [self-supervised learning, post-training LTM noise]
    802. image p245fig05.47 How long-range excitatory connections and short-range disynaptic inhibitory connections realize the bipole grouping law.
      || stimulus -> boundary representation -> layer 2/3
     803. image p246fig05.48 Microcircuits of the LAMINART model that I developed with Rajeev Raizada. See the text for details of how they integrate bottom-up adaptive filtering, horizontal bipole grouping, and top-down attentional matching that satisfies the ART Matching Rule.
      ||
    804. image p248fig05.49 This circuit of the LAMINART model helps to explain properties of Up and Down states during slow wave sleep, and how disturbances in ACh dynamics can disrupt them.
      ||
    805. image p252fig06.01 A surface-shroud resonance begins to form when the surface representations of objects bid for spatial attention. In addition to these topographic excitatory inputs, there is long-range inhibition of the spatial attention cells that determines which inputs will attract spatial attention.
      || Bottom-up spatial attention competition. [more, less] luminous perceptual surfaces -> competition -> spatial attention
    806. image p253fig06.02 After bottom-up surface inputs activate spatial attentional cells, they send top-down topographic excitatory signals back to the surface representations. This recurrent shunting on-center off-surround network contrast enhances larger attentional activities while approximately normalizing the total spatial attentional activity. A surface-shroud resonance hereby forms that selects an attentional shroud, enhances the perceived contrast of the attended surface (light blue region), and maintains spatial attention on it.
      || Surface-shroud resonance. perceptual surfaces -> competition -> spatial attention. (Carrasco, Penpeci-Talgar, and Eckstein 2000, Reynolds and Desimone 2003)
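      The recurrent shunting on-center off-surround competition invoked here can be sketched in a few lines of Python (parameters and the faster-than-linear signal function are assumptions, not the book's equations): larger surface inputs are contrast-enhanced while every activity stays bounded, so the total is approximately normalized.
        # Recurrent shunting on-center off-surround network (membrane-equation form).
        import numpy as np

        def shunting_competition(I, steps=400, dt=0.05, A=1.0, B=1.0, f=lambda x: x**2):
            x = np.zeros_like(I, dtype=float)
            for _ in range(steps):
                on = I + f(x)                              # on-center: own input plus self-excitation
                off = (I.sum() - I) + (f(x).sum() - f(x))  # off-surround from all other cells
                dx = -A * x + (B - x) * on - x * off       # shunting (membrane) equation
                x = np.clip(x + dt * dx, 0.0, B)
            return x

        surface_inputs = np.array([0.9, 0.6, 0.3])         # more vs. less "luminous" surfaces
        print(shunting_competition(surface_inputs).round(3))
        # The largest input gets the largest share of a bounded total: contrast
        # enhancement plus approximate normalization, as in the shroud competition.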
    807. image p254fig06.03 These interactions of the ARTSCAN Search model enable it to learn to recognize and name invariant object categories. Interactions between spatial attention in the Where cortical stream, via surface-shroud resonances, and object attention in the What cortical stream, that obeys the ART Matching Rule, coordinate these learning, recognition, and naming processes.
      || Retinal image -> On & OFF cell contrast normalization (retina/LGN) -> polarity [, in]sensitive contrast enhancement (V1) -> object [boundary (V2), surface (V2/V4)] -> surface contour (V2); What stream categories: volition control (BG). object boundary (V2) <-> view (ITp) <-> view integrator (ITp) <-> object (ITa) <-> [object-value (ORB), value (Amyg)] <-> name (PFC)
    808. image p255fig06.04 The ARTSCAN Search model can also search for a desired target object in a scene, thereby clarifying how our brains solve the Where's Waldo problem.
      || similar illustration to Figure 06.03, with some changes to arrows
    809. image p257fig06.05 A curve tracing task with monkeys was used by Roelfsema, Lamme, and Spekreijse in 1998 to demonstrate how spatial attention can flow along object boundaries. See the text for details.
      || Attention flows along curves: Roelfsema etal 1998: Macaque V1. fixation (300ms) -> stimulus (600ms RF - target curve, distractor) -> saccade. Crossed-curve condition: attention flows across junction between smoothly connected curve segments, Gestalt good continuation
    810. image p258fig06.06 Neurophysiological data and simulation of how attention can flow along a curve. See the text for details.
      || Simulation of Roelfsema etal 1998, data & simulation. Attention directed only to far end of curve. Propagates along active layer 2/3 grouping to distal neurons.
    811. image p258fig06.07 A top-down spotlight of attention can also be converted into a shroud. This process begins when the spotlight triggers surface filling-in within a region. Figure 6.8 shows how it is completed.
      || Reconciling spotlights and shrouds: top-down attentional spotlight becomes a shroud. spotlight of attention, surface filling-in
    812. image p259fig06.08 The distributed ARTSCAN, or dARTSCAN, model includes spatial attention in both PPC and PFC, and both fast-acting attention, triggered by transient cells in Where cortical areas such as MT, and slower-acting surface-shroud resonances in What cortical areas such as V4 and PPC. See the text for details.
      || dARTSCAN spatial attention hierarchy, Fast (Where stream) Slow (What stream) (Foley, Grossberg, and Mingolla 2012). [transient cells (MT) ->, object surfaces (V4) <->] [object shrouds (PPC) <-> spatial shrouds (PPC/PFC)]
    813. image p260fig06.09 Crowding in the periphery of the eye can be avoided by expanding the size and spacing of the letters to match the cortical magnification factor.
      || Crowding: visible objects and confused recognition. Accurate target recognition requires increased flanker spacing at higher eccentricity
     814. image p260fig06.10 The cortical magnification factor transforms (A) Cartesian coordinates in the retina into (B) log polar coordinates in visual cortical area V1.
      ||
     815. image p261fig06.11 If the sizes and distances between the letters stay the same as they are received by more peripheral parts of the retina, then all three letters may be covered by a single shroud, thereby preventing their individual perception and recognition.
      || Crowding: visible objects and confused recognition. log compression and center-surround processing cause... input same eccentricity, surface, object shroud, crowding threshold. object shrouds merge!
    816. image p261fig06.12 Pop-out of the L among T's can easily occur when inspecting the picture to the left. In the picture to the right, a more serial search is needed to detect the vertical red bar due to overlapping conjunctions of features.
      ||
     817. image p265fig06.13 The basal ganglia gate perceptual, cognitive, emotional, and motor processes through parallel loops.
      || [motor, oculomotor, dorsolateral, ventral-orbital, anterior cingulate] vs. [Thalamus, pallidum-subs, nigra, Striatum, Cortex]
     818. image p267fig06.14 Feedback from object surfaces to object boundaries uses surface contours. This feedback assures complementary consistency and enables figure-ground separation. A corollary discharge of the surface contours can be used to compute salient object feature positions.
      || Perceptual consistency and figure-ground separation.
     819. image p268fig06.15 The largest salient feature signal is chosen to determine the next target position of a saccadic eye movement. This target position signal self-inhibits to enable the next most salient position to be foveated. In this way, multiple feature combinations of the object can be foveated and categorized. This process clarifies how the eyes can explore even novel objects before moving to other objects. These eye movements enable invariant categories to be learned. Each newly chosen target position is, moreover, an "attention pointer" whereby attention shifts to the newly foveated object position.
      || How are saccades within an object determined? Figure-ground outputs control eye movements via V3A! Support for prediction (Theeuwes, Mathot, and Kingstone 2010), More support: "attention pointers" (Cavanagh etal 2010), Even more support (Backus etal 2001, Caplovitz and Tse 2006, Galletti and Battaglia 1989, Nakamura and Colby 2000)
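      The scan routine described above, i.e. choose the most salient surface-contour position, foveate it, let the chosen target self-inhibit, then repeat, is essentially winner-take-all choice with inhibition of return. A toy Python sketch with invented salience values:
        # Repeated winner-take-all choice with self-inhibition (inhibition of return).
        import numpy as np

        salience = np.array([0.2, 0.9, 0.5, 0.7])   # salient-feature signals on an object's surface contour
        s = salience.copy()
        fixation_order = []
        for _ in range(len(s)):
            target = int(np.argmax(s))              # largest salient-feature signal wins
            fixation_order.append(target)
            s[target] = -np.inf                     # the chosen target position self-inhibits
        print(fixation_order)                       # [1, 3, 2, 0]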
     820. image p270fig06.16 The same target position signal that can command the next saccade also updates a gain field that predictively maintains the attentional shroud in head-centered coordinates, even before the eye movement is complete. This process keeps the shroud invariant under eye movements, so that it can continue to inhibit reset of an emerging invariant category as it is associated with multiple object views, even while the conscious surface representation shifts with each eye movement in retinotopic coordinates. This updating process is often called predictive remapping.
      || Predictive remapping of eye movements! From V3A to LIP. [spatial attention, object attention, figure-ground separation, eye movement remapping, visual search]. (Beauvillaib etal 2005, Carlson-Radvansky 1999, Cavanaugh etal 2001, Fecteau & Munoz 2003, Henderson & Hollingworth 2003, Irwin 1991)
    821. image p271fig06.17 Persistent activity in IT cells is just what is needed to enable view-invariant object category learning by ARTSCAN to be generalized to [view, position, size]-invariant category learning by positional ARTSCAN, or pARTSCAN. See the text for details.
      || Persistent activity in IT. Physiological data show that persistent activity exists in IT (Fuster and Jervey 1981, Miyashita and Chang 1988, Tomita etal 1999). Adapted from (Tomita etal 1999 Nature)
    822. image p272fig06.18 The pARTSCAN model can learn [view, position, size]-invariant categories by adding view category integrator cells that have the properties of persistent neurons in IT. These integrator cells get reset with the invariant object category, not the view category.
      || pARTSCAN: positionally-invariant object learning. (Cao, Grossberg, Markowitz 2011). IT cells with persistent activities are modeled by view category integrators in ITp. View-specific category cells are RESET as the eyes move within the object. View category integrator cells are NOT RESET when the view-specific category is reset. They are RESET along with invariant object category cells when a spatial attention shift occurs.
    823. image p272fig06.19 The various parts of this figure explain why persistent activity is needed in order to learn positionally-invariant object categories, and how this fails when persistent activity is not available. See the text for details.
      ||
    824. image p273fig06.20 pARTSCAN can simulate the IT cell recoding that Li and DiCarlo reported in their swapping experiments because the swapping procedure happens without causing a parietal reset burst to occur. Thus the originally activated invariant category remains activated and can get associated with the swapped object features.
      || Simulation of Li and DiCarlo swapping data. data (Li and DiCarlo 2008), model (Cao, Grossberg, Markowitz 2011). normalized response vs. exposure (swaps and/or hours)
    825. image p274fig06.21 pARTSCAN can also simulate the trade-off in IT cell responses between position invariance and selectivity that was reported by Zoccolan etal 2007. This trade-off limits the amount of position invariance that can be learned by a cortical area like V1 that is constrained by the cortical magnification factor.
      || Trade-off in IT cell response properties. Inferotemporal cortex cells with greater position invariance respond less selectively to natural objects. invariance-tolerance, selectivity-sparseness. data (Zoccolan etal 2007) model (Grossberg, Markowitz, Cao 2011). position tolerance (PT, degrees) vs sparseness (S)
    826. image p274fig06.22 pARTSCAN can simulate how IT cortex processes image morphs, when it learns with high vigilance. See the text for details.
      || Akrami etal simulation: a case of high vigilance. tested on morphs between image pairs
    827. image p275fig06.23 Data from (Akrami etal 2009) and our simulation of it. See the text for details.
      || IT responses to image morphs. data vs model
     828. image p275fig06.24 Left and right eye stereogram inputs are constructed to generate percepts of objects in depth. These percepts include the features of the objects, not only their relative depths, a property that is not realized in some other models of stereopsis. See the text for details.
      || Stereogram surface percepts: surface lightnesses are segregated in depth (Fang, Grossberg 2009). Contrast with algorithms that just compute disparity matches and let computer code build the surface, eg (Marr, Poggio 1974)
    829. image p276fig06.25 In addition to the gain field that predictively maintains a shroud in head-centered coordinates during saccades, there are gain fields that predictively maintain binocular boundaries in head-centered coordinates so that they can maintain binocular fusion during saccades and control the filling-in of surfaces in retinotopic coordinates.
      || Surface-shroud resonance.
     830. image p277fig06.26 Gain fields also enable predictive remapping that maintains binocular boundary fusion as the eyes move between objects. See the text for details.
      || Predictive remapping maintains binocular boundary fusion even as eyes move between objects. retinotopic boundary -> invariant boundary (binocular)
    831. image p278fig06.27 A surface-shroud resonance through the Where stream enables us to consciously see an object while a feature-category resonance into the What stream enables us to recognize it. Both kinds of resonances can synchronize via visual cortex so that we can know what an object is when we see it.
      || What kinds of resonances support knowing vs seeing? What stream [knowing, feature-prototype resonance], Where stream [seeing, surface-shroud resonance]
    832. image p278fig06.28 If the feature-category resonances cannot form, say due to a lesion in IT, then a surface-shroud resonance can still support conscious seeing of an attended object, and looking at or reaching for it, even if the individual doing so knows nothing about the object, as occurs during visual agnosia. The surface-shroud resonance supports both spatial attention and releases commands that embody the intention to move towards the attended object.
      || What kinds of resonances support knowing vs seeing? visual agnosia: reaching without knowing Patient DF (Goodale etal 1991). Attention and intention both parietal cortical functions (Andersen, Essick, Siegel 1985; Gnadt, Andersen 1988; Snyder, Batista, Andersen 1997, 1998)
    833. image p283fig07.01 The usual boundary processing stages of [simple, complex, hypercomplex, bipole] cells enable our brains to correct uncontrolled persistence of previously excited cells just by adding habituative transmitter gates, or MTM traces, at appropriate places in the network.
      || Boundary processing with habituative gates. spatial competition with habituative gates, orientational competition: gated dipole, bipole grouping
    834. image p284fig07.02 Psychophysical data (top row) and simulation (bottom row) of how persistence decreases with flash illuminance and duration.
      || Persistence data and simulations (Francis, Grossberg, Mingolla 1994 Vision Research, 34, 1089-1104). Persistence decreases with flash illuminance and duration (Bowen, Pola, Matin 1974; Breitmeyer 1984; Coltheart 1980). Higher luminance or longer duration habituates the gated dipole ON channel more. Causes larger and faster rebound in the OFF channel to shut persisting ON activity off.
    835. image p285fig07.03 Persistence decreases with flash illuminance and duration due to the way in which habituative transmitters regulate the strength of the rebound in response to offset of a stimulating input, and how this rebound inhibits previously activated bipole cells.
      || Persistence data and simulations (Francis, Grossberg, Mingolla 1994 Vision Research, 34, 1089-1104). Persistence decreases with flash illuminance and duration. Horizontal input excites a horizontal bipole cell, which supports persistence. Offset of the horizontal input causes a rebound of activity in the vertical pathway, which inhibits the horizontal bipole cell, thereby terminating persistence.
    836. image p286fig07.04 Illusory contours persist longer than real contours because real contours have more inducers whose rebound at contour offset can cause faster boundary reset. Illusory contours also take longer to form than real contours, which explains the increasing portion of the curve.
      || Persistence data and simulations (Meyer, Ming 1988; Reynolds 1981). Increasing portion of curve is due to formation time of the illusory contour. Longer persistence is due to fewer bottom-up inducers of an illusory contour that has the same length as a real contour: only illuminance-derived edges generate reset signals. When bottom-up inducers are inhibited by OFF cell rebounds, their offset gradually propagates to the center of the illusory contour.
    837. image p286fig07.05 This figure shows the propagation through time of illusory contour offset from the rebounded cells that got direct inputs to the center of the contour.
      || Persistence data and simulations. Illusory contours persist longer than real contours (Meyer, Ming 1988; Reynolds 1981). When bottom-up inducers are inhibited by OFF cell rebounds, their offset gradually propagates to the center of the illusory contour.
    838. image p287fig07.06 The relative durations of persistence that occur due to an adaptation stimulus of the same or orthogonal orientation follow from the properties of the habituative gated dipoles that are embedded in the boundary completion system.
      || Persistence data and simulations. Change in persistence depends on whether adaptation stimulus has same or orthogonal orientation as test grating (Meyer, Lawson, Cohen 1975). If adaptation stimulus and test stimulus have the same orientation, they cause cumulative habituation, which causes a stronger reset signal, hence less persistence. When they are orthogonal, the competition on the ON channel is less, hence more persistence.
    839. image p287fig07.07 Persistence increases with distance between a target and a masking stimulus due to weakening of the spatial competition in the first competitive stage of hypercomplex cells.
      || Persistence data and simulations. Persistence increases with distance between a target and a masking stimulus (Farrell, Pavel, Sperling 1990). There is less spatial competition from the masker to the target when they are more distant, hence the target is more persistent.
    840. image p290fig08.01 Motion in a given direction pools all possible contrast-sensitive sources of information that are moving in that direction.
      ||
    841. image p291fig08.02 Complex cells can respond to motion in opposite directions and from features with opposite contrast polarities.
      ||
    842. image p292fig08.03 The MacKay and waterfall illusion aftereffects dramatically illustrate the different symmetries that occur in the orientational form stream and the directional motion stream.
      || Form and motion aftereffects. different inhibitory symmetries govern orientation and direction. illusions: [Form- MacKay 90°, Motion- waterfall 180°]. stimulus, aftereffect percept
    843. image p293fig08.04 Most local motion signals on a moving object (red arrows) may not point in the direction of the object's real motion (green arrows). This problem besets every neuron due to the fact that it receives signals only in a space-limited aperture.
      || Most motion signals may not point in an object's direction of motion. Aperture problem. EVERY neuron's receptive field experiences an aperture problem. How does the brain use the small number of [correct, unambiguous] motion signals to compute an object's motion direction?
    844. image p295fig08.05 The perceived direction of an object is derived either from a small subset of feature tracking signals, or by voting among ambiguous signals when feature tracking signals are not available.
      || Aperture problem. Barberpole illusion (Wallach). How do sparse feature tracking signals capture so many ambiguous motion signals to determine the perceived motion direction?
    845. image p296fig08.06 In the simplest example of apparent motion, two dots turning on and off out of phase in time generate a compelling percept of continuous motion between them.
      || Simplest long-range motion paradigm. ISI- interstimulus interval, SOA- stimulus onset asynchrony
    846. image p296fig08.07 When two flashes turn on and off out of phase with the correct range of interstimulus intervals, and not too far from one another, then either beta motion or phi motion is perceived.
      || Beta and Phi motion percepts. Beta motion: percepts of continuous motion of a well-defined object across empty intervening space. Phi motion: sense of "pure" motion without a concurrent percept of a moving object. (Exner 1875) http://www.yorku.ca/eye/balls.htm
    847. image p297fig08.08 When a second flash is more intense than the first flash, then apparent motion may occur from the second to the first flash.
      || Delta motion: motions from the second to the first flash. Data: (Kolers 1972; Korte 1915). Simulation: (Grossberg, Rudd 1992). This occurs when the luminance or contrast of the second flash is large compared to that of the first flash. Sustained and transient cells obey shunting dynamics whose averaging rates speed up with output intensity. The first flash to wane is the one that will be the source of the G-wave.
    848. image p297fig08.09 Simulation of motion in opposite directions that is perceived when two later flashes occur on either side of the first flash.
      || Split motion. Data: (H.R. Silva 1926), Simulation: (Grossberg, Rudd 1992)
    849. image p298fig08.10 Simulation of the motion speed-up that is perceived when flash duration decreases.
      || "The less you see it, the faster it moves". Data: (Giaschi, Anstis 1989), Simulation: (Grossberg, Rudd 1992). ISI = 0, flash duration decreases; SOA = constant, flash duration decreases
    850. image p298fig08.11 This formotion percept is a double illusion due to boundary completion in the form stream followed by long-range apparent motion using the completed boundaries in the motion stream.
      || Form-motion interactions. Apparent motion of illusory contours (Ramachandran 1985). Double illusion! Illusory contour is created in form stream V1-V2. Apparent motion of illusory contours occurs in motion stream due to a V2-MT interaction.
    851. image p300fig08.12 A single flash activates a Gaussian receptive field across space whose maximum is chosen by a winner-take-all recurrent on-center off-surround network.
      || Gaussian receptive fields are sufficient! (Grossberg, Rudd 1992). Single flash. Suppose that a single flash causes a narrow peak of activity at the position where it occurs. It generates output signals through a Gaussian filter that produces a Gaussian activity profile at the next processing stage. A recurrent on-center off-surround network chooses the maximum activity and suppresses smaller activities. Winner-take-all
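To make the single-flash case concrete, here is a minimal numpy sketch (my own construction, not the model's published code): a flash at one position is filtered by a Gaussian receptive field of assumed width K, and the winner-take-all choice of the recurrent on-center off-surround network is approximated simply by taking the argmax. The position array, K, and the flash position are illustrative values.
```python
import numpy as np

# Minimal sketch: a single flash filtered by a Gaussian receptive field,
# with the winner-take-all (WTA) choice approximated by argmax.
positions = np.arange(100)          # 1D spatial array (arbitrary units)
K = 5.0                             # Gaussian receptive-field width (assumed)
flash_pos = 40                      # position of the single flash

# Gaussian activity profile at the next processing stage
activity = np.exp(-(positions - flash_pos)**2 / (2 * K**2))

# WTA: the recurrent on-center off-surround choice is approximated by argmax
winner = positions[np.argmax(activity)]
print(winner)                       # -> 40: the peak stays at the flash position
```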
    852. image p300fig08.13 As a flash waxes and wanes through time, so too do the activities of the cells in its Gaussian receptive field. Because the maximum of each Gaussian occurs at the same position, nothing is perceived to move.
      || Temporal profile of a single flash. Suppose that a single flash quickly turns on to maximum activity, stays there for a short time, and then shuts off. It causes an increase in activity, followed by an exponential decay of activity. The corresponding Gaussian profile waxes and wanes through time. Since the peak position of the Gaussian does not change through time, nothing moves.
    853. image p300fig08.14 Visual inertia depicts how the effects of a flash decay after the flash shuts off.
      || Inertia (%) vs ISI (msec)
    854. image p301fig08.15 If two flashes occur in succession, then the cell activation that is caused by the first one can be waning while the activation due to the second one is waxing.
      || Temporal profile of two flashes. If two flashes occur in succession, the waning of the activity due to the first flash may overlap with the waxing of the activity due to the second flash.
    855. image p301fig08.16 The sum of the waning Gaussian activity profile due to the first flash and the waxing Gaussian activity profile due to the second flash has a maximum that moves like a travelling wave from the first to the second flash.
      || Travelling wave (G-wave): long-range motion. If the Gaussian activity profiles of two flashes overlap sufficiently in space and time, then the sum of the Gaussian produced by the waning of the first flash and the Gaussian produced by the waxing of the second flash can produce a single-peaked travelling wave from the position of the first flash to that of the second flash. The wave is then processed through a WTA (Winner Take All) choice network. The resulting continuous motion percept is both long-range and sharp.
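A minimal sketch of the G-wave claim, under assumed exponential rise and decay of the two flash activities (the decay rate and the separation L are illustrative, with L <= 2K): the argmax of the summed Gaussian profiles moves continuously from the first flash position toward the second.
```python
import numpy as np

# Sketch of the G-wave: sum of a waning Gaussian (flash 1) and a waxing
# Gaussian (flash 2); track the position of the maximum through time.
K, L = 5.0, 8.0                       # Gaussian width and flash separation (L <= 2K)
w = np.linspace(-10, L + 10, 2001)    # spatial axis
t = np.linspace(0.0, 1.0, 11)         # normalized time

x0 = np.exp(-3 * t)                   # flash-1 activity wanes (illustrative decay)
xL = 1 - np.exp(-3 * t)               # flash-2 activity waxes (illustrative rise)

for ti in range(len(t)):
    profile = x0[ti] * np.exp(-w**2 / (2*K**2)) + xL[ti] * np.exp(-(w - L)**2 / (2*K**2))
    print(round(t[ti], 1), round(w[np.argmax(profile)], 2))
# The printed peak position moves smoothly from 0 toward L as time advances.
```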
    856. image p302fig08.17 An important constraint on whether long-range apparent motion occurs is whether the Gaussian kernel is broad enough to span the distance between successive flashes.
      || Motion speed-up with increasing distance: For a fixed ISI, how does perceived velocity increase with distance between the flashes? Gaussian filter : Gp = exp{ -(j-i)^2 / (2*K^2) }. The largest separation, L_crit, for which sufficient spatial overlap between two Gaussians centered at locations i and j will exist to support a travelling wave of summed peak activity is : L_crit = 2*K
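A short reconstruction of where the bound L_crit = 2*K comes from, using only the Gaussian kernel quoted above; this is my derivation of the stated result, not text from the source.
```latex
% Reconstruction (not quoted from the source) of why L_crit = 2K.
% At the moment when the two flash activities are equal, the summed profile is
f(w) \;=\; e^{-w^2/(2K^2)} \;+\; e^{-(w-L)^2/(2K^2)} .
% By symmetry f'(L/2) = 0, and the curvature at the midpoint is
f''\!\left(\tfrac{L}{2}\right) \;=\; \frac{2}{K^2}\left(\frac{L^2}{4K^2}-1\right) e^{-L^2/(8K^2)} ,
% which is negative (a single interior maximum, hence a continuously moving peak)
% exactly when L < 2K. The sign changes at L_crit = 2K, matching the bound quoted above.
```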
    857. image p302fig08.18 This theorem shows how far away (L), given a fixed Gaussian width, two flashes can be to generate a wave of apparent motion between them.
      || G-wave properties (Grossberg 1977). Let flashes occur at positions i=0 and i=L. Suppose that d[dt: x0] = -A*x0 + J0; d[dt: xL] = -A*xL + JL; Define G(w,t) ...; Theorem 1 max_w G(w,t) moves continuously through time from w=0 to w=L if and only if L <= 2*K.
    858. image p303fig08.19 The dashed red line divides combinations of flash distance L and Gaussian width K into two regions of no apparent motion (above the line) and apparent motion (below the line).
      || No motion vs motion at multiple scales.
    859. image p303fig08.20 The G-wave speeds up with the distance between flashes at a fixed delay, and has a consistent motion across multiple spatial scales.
      || G-wave properties (Grossberg 1977). Theorem 2 (Equal half-time property) concerns the time at which the motion signal reaches position w=L/2. Apparent motion speed-up with distance: this half-time is independent of the distance L between the two flashes. Consistent motion across scales: half-time is independent of the scale size K. Method of proof: elementary algebra and calculus (Grossberg, Rudd 1989 appendix)
    860. image p304fig08.21 A computer simulation of the equal half-time property whereby the apparent motions within different scales that respond to the same flashes all reach the half-way point in the motion trajectory at the same time.
      || Equal half-time property: how multiple scales cooperate to generate motion percept. Travelling waves from Gaussian filters of different sizes bridge the same distance in comparable time. The time needed to bridge half the distance between flashes is the same.
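A quick numerical check of the equal half-time property, again under assumed exponential flash dynamics (the decay rate A and the (L, K) pairs are illustrative): the time at which the travelling peak crosses w = L/2 comes out the same for different separations and scales, because it depends only on when the two flash activities are equal.
```python
import numpy as np

# Sketch (assumed dynamics, not the book's code): check the equal half-time
# property. Flash activities follow simple exponential decay/growth, the
# summed-Gaussian peak is tracked, and the time at which it crosses w = L/2
# is compared across different separations L and Gaussian widths K.
def half_time(L, K, A=3.0, steps=2000):
    t = np.linspace(0.0, 2.0, steps)
    x0 = np.exp(-A * t)                  # waning flash-1 activity
    xL = 1.0 - np.exp(-A * t)            # waxing flash-2 activity
    w = np.linspace(-2 * K, L + 2 * K, 4001)
    for ti, (a, b) in enumerate(zip(x0, xL)):
        peak = w[np.argmax(a * np.exp(-w**2/(2*K**2)) + b * np.exp(-(w-L)**2/(2*K**2)))]
        if peak >= L / 2:
            return t[ti]
    return None

for L, K in [(4.0, 3.0), (8.0, 5.0), (10.0, 6.0)]:   # all satisfy L <= 2K
    print(L, K, round(half_time(L, K), 3))
# The half-time is essentially the same in every case: it depends only on when
# x0(t) = xL(t), not on L or K.
```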
    861. image p304fig08.22 Data (top image) and simulation (bottom image) of Korte's laws. The laws raise the question of how ISIs in the hundreds of milliseconds can cause apparent motion.
      || Korte's Laws, Data: (Korte 1915) Simulation: (Francis, Grossberg 1996)
    862. image p305fig08.23 Despite its simplicity, the Ternus display can induce one of four possible percepts, depending on the ISI.
      || Ternus motion. ISI [small- stationary, intermediate- element, larger- group] motion http://en.wikipedia.org/wiki/Ternus_illusion
    863. image p305fig08.24 When each stimulus has an opposite contrast relative to the background, element motion is eliminated and replaced by group motion at intermediate values of the ISI.
      || Reverse-contrast Ternus motion. ISI [small- stationary, intermediate- group (not element!), larger- group] motion.
    864. image p306fig08.25 The Motion BCS model can explain and simulate all the long-range apparent motion percepts that this chapter describes.
      || Motion BCS model (Grossberg, Rudd 1989, 1992) Level 1: discount illuminant; Level 2: short-range filter, pool sustained simple cell inputs with like-oriented receptive fields aligned in a given direction. Sensitive to direction-of-contrast; Level 3: Transient cells with unoriented receptive field. Sensitive to direction-of-change
    865. image p306fig08.26 The 3D FORMOTION model combines mechanisms for determining the relative depth of a visual form with mechanisms for both short-range and long-range motion filtering and grouping. A formotion interaction from V2 to MT is predicted to enable the motion stream to track objects moving in depth.
      || 3D Formotion model (Chey etal 1997; Grossberg etal 2001; Berzhanskaya etal 2007). Form [LGN contours -> simple cells orientation selectivity -> complex cells (contrast pooling, orientation selectivity, V1) -> hypercomplex cells (end-stopping, spatial sharpening) <-> bipole cells (grouping, cross-orientation competition) -> depth-separated boundaries (V2)], Motion: [LGN contours -> transient cells (directional stability, V1) -> short-range motion filter -> spatial competition -> long-range motion filter and boundary selection in depth (MT) <-> directional grouping, attentional priming (MST)]
    866. image p307fig08.27 The distribution of transients through time at onsets and offsets of Ternus display flashes helps to determine whether element motion or group motion will be perceived.
      || Ternus motion. Element motion: zero or weak transients at positions 2 and 3; Group motion: strong transients at positions 2 and 3. Conditions that favor visual persistence and thus perceived stationarity of element (2,3) favor element motion (Braddick, Adlard 1978; Breitmeyer, Ritter 1986; Pantle, Petersik 1980)
    867. image p308fig08.28 The Gaussian distributions of activity that arise from the three simultaneous flashes in a Ternus display add to generate a maximum value at their midpoint. The motion of this group gives rise to group motion.
      || Ternus group motion simulation. If L < 2*K, Gaussian filter of three flashes forms one global maximum.
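A tiny numerical check of the "one global maximum" claim, assuming that L here denotes the spacing between adjacent Ternus flashes and using illustrative values of K and the spacing:
```python
import numpy as np

# Sketch: three simultaneous Ternus flashes, spacing d < 2K, filtered by a
# Gaussian of width K. The summed profile has one global maximum at the
# middle flash, so the group (not the elements) defines the motion signal.
K, d = 5.0, 6.0                              # d < 2K (assumed values)
w = np.linspace(-15, 2*d + 15, 4001)
flashes = [0.0, d, 2*d]
profile = sum(np.exp(-(w - f)**2 / (2*K**2)) for f in flashes)
peaks = np.where((profile[1:-1] > profile[:-2]) & (profile[1:-1] > profile[2:]))[0]
print(len(peaks), w[peaks + 1])              # -> 1 local maximum, near w = d (the midpoint)
```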
    868. image p310fig08.29 When the individual component motions in (A) and (B) combine into a plaid motion (C), both their perceived direction and speed changes.
      ||
    869. image p311fig08.30 The data of (Castet etal 1993) in the left image was simulated in the right image by the 3D FORMOTION model that I developed with my PhD student Jonathan Chey. These data provide insight into how feature tracking signals propagate from the ends of a line to its interior, where they capture consistent motion directional signals and inhibit inconsistent ones.
      || Solving the aperture problem. A key design problem: How do amplified feature tracking signals propagate within depth to select the correct motion directions at ambiguous positions? This propagation from feature tracking signals to the line interior determines perceived speed in Castet etal data, which is why speed depends on line tilt and length. Data: (Castet etal 1993), Simulation: (Chey etal 1997)
    870. image p311fig08.31 Processing stages of the Motion BCS convert locally ambiguous motion signals from transient cells into a globally coherent percept of object motion, thereby solving the aperture problem.
      || Why are so many motion processing stages needed? change sensitive receptors -> directional transient cells -> directional short-range filter -> spatial and directional competition -> directional long-range filter (MT) <-> Directional grouping network
    871. image p312fig08.32 Schematic of motion filtering circuits.
      || Level 1: Change sensitive units -> Level 2: transient cells -> Level 3: short-range spatial filters -> Level 4: intra-scale competition -> Level 5: inter-scale competition
    872. image p312fig08.33 Processing motion signals by a population of speed-tuned neurons.
      ||
    873. image p314fig08.34 The VISTARS model for visually-based spatial navigation. It uses the Motion BCS as a front end and feeds its output signals into two computationally complementary cortical processing streams for computing optic flow and target tracking information.
      || VISTARS navigation model (Browning, Grossberg, Mingolla 2009). Use FORMOTION model as front end for higher level navigational circuits: input natural image sequences -> estimate heading (MT+)-MSTd -> additive processing -> estimate object position (MT-)-MSTv direction and speed subtractive processing -> Complementary Computing. [optic flow navigation, object tracking]
    874. image p315fig08.35 The output signals from the directional grouping network obey the ART Matching Rule. They thereby select consistent motion directional signals while suppressing inconsistent ones, and do not distort the speed estimates that the spared cells code. The aperture problem is hereby solved by the same mechanism that dynamically stabilizes the learning of directional grouping cells.
      || How to select correct direction and preserve speed estimates? Prediction: Feedback from MSTv to MT- obeys ART Matching Rule; Top-down, modulatory on-center, off-surround network (Grossberg 1976, 1980; Carpenter, Grossberg 1987, 1991); Explains how directional grouping network can stably develop and how top-down directional attention can work. (Cavanagh 1992; Goner etal 1986; Sekuler, Ball 1977; Stelmach etal 1994). Directional grouping network (MSTv) <-> Directional long-range filter (MT). Modulatory on-center selects chosen direction and preserves speed. Off-surround inhibits incompatible directions.
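The ART Matching Rule can be caricatured in a few lines of code. This is an illustrative gating rule, not the model's differential equations: the function art_matching_rule and its gain parameter are my own, chosen only to show that the top-down on-center is modulatory (it cannot create activity on its own) while the off-surround suppresses unmatched directions without distorting the matched one.
```python
import numpy as np

# Illustrative gating sketch of the ART Matching Rule (not the model's
# equations). MT directional activities are enhanced where the MSTv top-down
# expectation is active (modulatory on-center) and suppressed elsewhere
# (off-surround); a top-down expectation alone cannot activate silent cells.
def art_matching_rule(bottom_up, top_down, gain=0.5):
    bottom_up = np.asarray(bottom_up, float)   # MT directional activities
    top_down = np.asarray(top_down, float)     # MSTv expectation over directions
    on_center = bottom_up * (1.0 + gain * top_down)        # multiplicative: needs bottom-up input
    off_surround = gain * bottom_up * (top_down.max() - top_down)
    return np.clip(on_center - off_surround, 0.0, None)

mt = np.array([0.2, 0.9, 0.4, 0.1])            # ambiguous directional evidence
expectation = np.array([0.0, 1.0, 0.0, 0.0])   # grouping network favors the second direction
print(art_matching_rule(mt, expectation))      # favored direction enhanced, others suppressed
print(art_matching_rule(np.zeros(4), expectation))  # top-down alone cannot fire cells -> all zeros
```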
    875. image p316fig08.36 How the directional grouping network, notably properties of the ART Matching Rule, enables a small set of amplified feature tracking signals at the ends of a line to select consistent directions in the line interior, while suppressing inconsistent directions.
      || Motion capture by directional grouping feedback. Directional grouping network (MSTv) <-> Directional long-range filter (MT). It takes longer to capture ambiguous motion signals in the line interior as the length of the line increases cf (Castet etal 1993)
    876. image p317fig08.37 Processing stages that transform the transient cell inputs in response to a tilted moving line into a global percept of the object's direction of motion. The orientations of the lines denote the directional preferences of the corresponding cells, whereas line lengths are proportional to cell activities.
      || Motion capture by directional grouping feedback (Chey, Grossberg, Mingolla 1997). thresholded short-range filter outputs, directional long-range filter cell activities at 3 times, directional short-range filter cells, directionally-sensitive transient cells
    877. image p319fig08.38 The neurophysiological data from MT (left image) confirms the prediction embodied in the simulation of MT (right image) concerning the fact that it takes a long time for MT to compute an object's real direction of motion.
      || Solving the aperture problem takes time. MT Data (Pack, Born 2001), MT simulation (Chey, Grossberg, Mingolla 1997)
    878. image p320fig08.39 Simulation of the barberpole illusion direction field at two times. Note that the initial multiple directions due to the feature tracking signals at the contiguous vertical and horizontal sides of the barberpole (upper image) get supplanted by the horizontal direction of the two horizontal sides (lower image).
      || Barberpole illusion (one line) simulation
    879. image p321fig08.40 Visible occluders capture the boundaries that they share with moving edges. Invisible occluders do not. Consequently, the two types of motions are influenced by different combinations of feature tracking signals.
      || Motion grouping across occluders (J. Lorenceau, D. Alais 2001). Rotating contours observed through apertures. Determine direction of a circular motion. [, in]visible occluders http://persci.mit.edu/demos/square/square.html
    880. image p322fig08.41 A percept of motion transparency can be achieved by using motion grouping feedback that embodies the "asymmetry between near and far" along with the usual opponent competition between opposite motion directions.
      || Motion transparency. near: big scale; far: small scale MSTv, "Asymmetry between near and far" Inhibition from near (large scales) to far (small scales) at each position
    881. image p323fig08.42 The chopsticks illusion not only depends upon how feature tracking signals are altered by visible and invisible occluders, but also upon how the form system disambiguates the ambiguous region where the two chopsticks intersect and uses figure-ground mechanisms to separate them in depth.
      || Chopsticks: motion separation in depth (Anstis 1990). [, in]visible occluders [display, percept]
    882. image p324fig08.43 Attention can flow along the boundaries of one chopstick and enable it to win the orientation competition where the two chopsticks cross, thereby enabling bipole grouping and figure-ground mechanisms to separate them in depth within the form cortical stream.
      || The ambiguous X-junction. motion system. Attention propagates along chopstick and enhances cell activations in one branch of a chopstick. MT-MST directional motion grouping helps to bridge the ambiguous position.
    883. image p325fig08.44 Attentional feedback from MST-to-MT-to-V2 can strengthen one branch of a chopstick (left image). Then bipole cell activations that are strengthened by this feedback can complete that chopstick's boundaries across the ambiguous X region (right image).
      || The role of MT-V1 feedback. Motion-form feedback: MT-to-V2 feedback strengthens boundaries of one bar. Bipole boundary completion: Bipole grouping helps to complete bar boundary even if motion grouping does not cross the gap.
    884. image p325fig08.45 The feedback loop between MT/MST-to-V1-to-V2-to-MT/MST enables a percept of two chopsticks sliding one in front of the other while moving in opposite directions.
      || Closing formotion feedback loop. [formotion interaction, motion grouping] V1 -> V2 -> (MT <-> MST) -> V1
    885. image p326fig08.46 How do we determine the relative motion direction of a part of a scene when it moves with a larger part that determines an object reference frame?
      || How do we perceive relative motion of object parts?
    886. image p327fig08.47 Two classical examples of part motion in a moving reference frame illustrate the general situation where complex objects move while their multiple parts may move in different directions relative to the direction of the reference frame.
      || Two kinds of percepts and variations (Johansson 1950). Symmetrically moving inducers: each dot moves along a straight path, each part contributes equally to common motion; Duncker wheel (Duncker 1929): one dot moves on a cycloid, the other dot (the "center") moves straight, unequal contribution from parts; If the dot is presented alone: seen as cycloid; if with center: seen as if it were on the rim of a wheel.
    887. image p328fig08.48 How vector subtraction from the reference frame motion direction computes the part directions.
      || How vector decomposition can explain them. Common motion subtracted from retinal motion gives part motion: [retinal, common, part] motion
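A worked kinematic example of this vector subtraction, using the Duncker wheel (my construction; the radius and spin rate are arbitrary): subtracting the hub's common rightward velocity from the cycloidal rim velocity leaves pure rotation, i.e. the perceived part motion.
```python
import numpy as np

# Worked example: vector subtraction for the Duncker wheel. A dot on the rim
# of a rolling wheel traces a cycloid; subtracting the common (rightward)
# motion of the hub recovers pure rotation about the hub.
R, omega = 1.0, 1.0                       # wheel radius and spin rate (assumed)
t = np.linspace(0, 2*np.pi, 9)

rim_velocity = np.stack([R*omega*(1 - np.cos(omega*t)),    # d/dt of cycloid x(t)
                         R*omega*np.sin(omega*t)], axis=1) # d/dt of cycloid y(t)
common_velocity = np.array([R*omega, 0.0])                 # hub translates rightward

part_velocity = rim_velocity - common_velocity             # retinal - common = part motion
print(np.round(part_velocity, 3))
# Each row has magnitude R*omega and rotates through 360 degrees: circular part motion.
```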
    888. image p328fig08.49 A directional peak shift in a directional hypercolumn determines the part directions relative to a moving reference frame.
      || What is the mechanism of vector decomposition? (Grossberg, Leveille, Versace 2011). Prediction: directional peak shift! ...specifically, a peak shift due to Gaussian lateral inhibition. [retinal, part, common, relative] motion. shunting dynamics, self-normalization, contrast gain control
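A minimal sketch of the predicted peak-shift mechanism, with illustrative tuning widths and inhibition strength (not the model's fitted parameters): a Gaussian activity bump over a directional hypercolumn, minus Gaussian lateral inhibition centered on the common (frame) direction, has its peak shifted away from the common direction.
```python
import numpy as np

# Sketch: a directional hypercolumn codes the dot's retinal direction as a
# Gaussian bump. Subtracting a Gaussian inhibitory kernel centered on the
# common (frame) direction shifts the peak away from the common direction,
# toward the part direction.
directions = np.arange(0, 360, 5)                  # preferred directions (deg)

def bump(center, sigma=30.0):
    d = np.minimum(np.abs(directions - center), 360 - np.abs(directions - center))
    return np.exp(-d**2 / (2 * sigma**2))

retinal = bump(45.0)                 # dot's retinal motion direction: 45 deg
common = bump(0.0)                   # frame (common) motion direction: 0 deg
relative = np.clip(retinal - 0.8 * common, 0.0, None)   # Gaussian lateral inhibition

print(directions[np.argmax(retinal)], directions[np.argmax(relative)])
# The peak shifts away from the common direction (45 -> ~55 deg with these parameters).
```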
    889. image p329fig08.50 The common motion direction of the two dots builds upon illusory contours that connect the dots as they move through time. The common motion direction signal can flow along these boundaries.
      || How is common motion direction computed? retinal motion. Bipole grouping in the form stream creates illusory contours between the dots. V2-MT formotion interaction injects the completed boundaries into the motion stream where they capture consistent motion signals. Motion of illusory contours is computed in the motion stream: cf. Ramachandran
    890. image p329fig08.51 Large and small scale boundaries differentially form illusory contours between the dots and boundaries that surround each of them respectively. These boundaries capture the motion signals that they will support via V2-to-MT formotion interaction. The MST-to-MT directional peak shift has not yet occurred.
      || Large scale: near. Can bridge gap between dots to form illusory contours. Spatial competition inhibits inner dot boundaries.; Small scale: far. Forms boundaries around dots.
    891. image p330fig08.52 Direction fields of the object frame (left column) and of the two dot "parts" (right column) show the correct motion directions after the peak shift top-down expectation acts.
      || Simulation of motion vector decomposition. [Larger scale (nearer depth), Small scale (farther depth)] vs [Down, Up]
    892. image p330fig08.53 Simulation of the various directional signals of the left dot through time. Note the amplification of the downward directional signal due to the combined action of the short-range and long-range directional signals.
      ||
    893. image p331fig08.54 The simulated part directions of the rotating dot through time after the translational motion of the frame does its work via the top-down peak shift mechanism.
      || Cycloid. Motion directions of a single dot moving slowly along a cycloid curve through time.
    894. image p331fig08.55 The rightward motion of the dot that determines the frame propagates along the illusory contour between the dots and thereby dominates the motion directions along the rim as well, thereby setting the stage for the peak shift mechanism.
      || Duncker Wheel: large scale. [cycloid, center] velocity -> rightward common velocity. Stable rightward motion at the center captures motion at the rim.
    895. image p332fig08.56 Simulation of the Duncker Wheel motion through time. See the text for details.
      || Duncker Wheel: small scale. Temporal procession of activity in eight directions. Wheel motion as seen when directions are collapsed.
    896. image p332fig08.57 The MODE model uses the Motion BCS as its front end, followed by a saccadic target selection circuit in the model LIP region that converts motion directions into movement directions. These movement choices are also under basal ganglia (BG) control. More will be explained about the BG in Chapters 13 and 15.
      || MODE (MOtion DEcision) model (Grossberg, Pilly 2008, Vision Research). Change sensitive receptors -> directional transient cells -> directional short-range filter -> spatial and directional competition -> directional long-range filter (MT) <-> directional grouping network (MSTv) -> saccadic target selection <-> gating mechanism (BG). Representation of problem that solves the aperture problem (change sensitive receptors (CSR) -> directional grouping network (DGN, MSTv)). Gated movement choice (saccadic target selection & gating mechanism)
    897. image p333fig08.58 Neurophysiological data (left image) and simulation (right image) of LIP data during correct trials on the RT task. See the text for details.
      || LIP responses during RT task correct trials (Roitman, Shadlen 2002). More coherence in favored direction causes faster cell activation. More coherence in opposite direction causes faster cell inhibition. Coherence stops playing a role in the final stages of LIP firing.
    898. image p334fig08.59 Neurophysiological data (left column) and simulations (right column) of LIP responses for the FD task during both [correct, error] trials. See the text for details.
      || LIP responses for the FD task during both [correct, error] trials (Shadlen, Newsome 2001). LIP encodes the perceptual decision regardless of the true direction of the dots. Predictiveness of LIP responses on error trials decreases with increasing coherence.
    899. image p334fig08.60 Behavioral data (left image) and simulation (right image) about accuracy in both the RT and FD tasks. See text for details
      || Behavioral data: % correct vs % coherence (Mazurek etal 2003; Roitman, Shadlen 2002). More coherence in the motion causes more accurate decisions. RT task accuracy at weaker coherence levels is slightly better than FD task accuracy.
    900. image p335fig08.61 Behavioral data (left image) and simulation (right image) about speed in correct and error trials of the RT task. See text for details.
      || Behavioral data: speed, correct and error trials (RT task) (Roitman, Shadlen 2002). More coherence in the motion causes faster reaction time.
    901. image p335fig08.62 More remarkable simulation fits (right column) to LIP neurophysiology data (left column) about where and when to move the eyes.
      || LIP encodes not only where, but also when, to move the eyes. ... No Bayes (Roitman, Shadlen 2002). Firing rate (sp/s) vs time (ms). Slope of firing rate (sp/s^2) vs % correct.
    902. image p338fig09.01 The brain regions that help to use visual information for navigating in the world and tracking objects are highlighted in yellow.
      || How does a moving observer use optic flow to navigate while tracking a moving object? [What ventral, Where dorsal] retina -> many locations -> PFC
    903. image p338fig09.02 Heading, or the direction of self-motion (green dot), can be derived from the optic flow (red arrows) as an object, in this case an airplane landing, moves forward.
      || Heading and optic flow (Gibson 1950). Optic flow: scene motion generates a velocity field. Heading: direction of travel- self-motion direction. Heading from optic flow, focus of expansion (Gibson 1950). Humans determine heading accurately to within 1-2 degrees.
    904. image p339fig09.03 When an observer moves forward, an expanding optic flow is caused. Eye rotations cause a translating flow. When these flows are combined, a spiral flow is caused. How do our brains compensate for eye rotations to compute the heading of the expanding optic flow?
      || Optic flow during navigation (adapted from Warren, Hannon 1990) [observer, retinal flow]: [linear movement, expansion], [eye rotation, translation], [combined motion, spiral]
    905. image p339fig09.04 This figure emphasizes that the sum of the expansion and translation optic flows is a spiral optic flow. It thereby raises the question: How can the translation flow be subtracted from the spiral flow to recover the expansion flow?
      || Eye rotations add a uniform translation to a flow field. Resulting retinal patterns are spirals. Expansion + translation = spiral
    906. image p340fig09.05 An outflow movement command, also called efference copy or corollary discharge, is the source of the signals whereby the commanded eye movement position is subtracted from spiral flow to recover expansion flow and, with it, heading.
      || Subtracting efference copy. Many experiments suggest that the brain internally subtracts the translational component due to eye movements. Efference copy subtracts the translational component using pathways that branch from outflow movement commands to the eye muscles.
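A small sketch of the efference-copy subtraction, with made-up flow values: the retinal flow is the expansion field plus a uniform translation caused by the eye movement; subtracting the translation predicted from the outflow command recovers the expansion field, whose zero-flow point estimates heading.
```python
import numpy as np

# Sketch (my construction): recovering the expansion flow by subtracting the
# translational flow predicted from an outflow (efference copy) eye-movement
# command. Flow vectors are sampled on a small retinal grid.
xs, ys = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
heading = np.array([0.3, -0.2])                       # true focus of expansion

expansion = np.stack([xs - heading[0], ys - heading[1]], axis=-1)   # radial outflow
eye_translation = np.array([0.4, 0.1])                # uniform flow added by the eye rotation
retinal_flow = expansion + eye_translation            # what the retina sees during the eye movement

efference_copy = eye_translation                      # branch of the outflow movement command
recovered = retinal_flow - efference_copy             # subtract to recover the expansion flow

# The focus of expansion (zero-flow point) of the recovered field gives heading.
flat = recovered.reshape(-1, 2)
grid = np.stack([xs, ys], axis=-1).reshape(-1, 2)
print(grid[np.argmin(np.linalg.norm(flat, axis=1))])  # grid point nearest the true heading
```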
    907. image p340fig09.06 Corollary discharges are computed using a branch of the outflow movement commands that move their target muscles.
      ||
    908. image p340fig09.07 Log polar remapping from the retina to cortical area V1 and beyond converts expansion, translation, and spiral flows on the retina into parallel flows, with different orientations, on the cortical map.
      || Log polar remapping of optic flow. retina -> cortex. Any combination of expansion and circular motion centered on the fovea maps to cortex as a single direction. Retinal Cartesian coordinates (x,y) map to cortical polar coordinates (r,theta). This makes it easy to compute directional receptive fields in the cortex!
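A few lines make the log polar claim concrete (my construction; the sample point and scales are arbitrary): a retinal point moving radially outward keeps a constant theta while log r grows, so every expansion trajectory becomes a straight, parallel track in cortical coordinates.
```python
import numpy as np

# Sketch of the log polar remapping (x, y) -> (log r, theta). Points moving
# radially outward on the retina (expansion) move along a single cortical
# direction (+log r), illustrating why expansion maps to a parallel flow.
def log_polar(x, y):
    r = np.hypot(x, y)
    return np.log(r), np.arctan2(y, x)

point = np.array([0.2, 0.1])          # a retinal point (foveal coordinates)
for scale in [1.0, 1.5, 2.0, 3.0]:    # expansion: the point moves radially outward
    u, v = log_polar(*(scale * point))
    print(round(u, 3), round(v, 3))
# theta stays constant while log r increases uniformly: a straight, parallel
# trajectory in cortical coordinates, the same for every retinal direction.
```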
    909. image p341fig09.08 How the various optic flows on the retina are mapped through V1, MT, and MSTd to then compute heading in parietal cortex was modeled by (Grossberg, Mingolla, Pack 1999), using the crucial transformation via V1 log polar mapping into parallel cortical flow fields.
      || MSTd model (Grossberg, Mingolla, Pack 1999). Retinal motion -> V1 log polar mapping -> Each MT Gaussian RF sums motion in preferred direction -> Each MSTd cell sums MT cell inputs with same log polar direction -> Efference copy subtracts rotational flow from MSTd cells.
    910. image p341fig09.09 Responses of MSTd cells that are used to compute heading. See the text for details.
      || Cortical area MSTd (adapted from Graziano, Andersen, Snowden 1994). MSTd cells are sensitive to spiral motion as combinations of rotation and expansion.
    911. image p342fig09.10 Model simulations of how the peak of MSTd cell activation varies with changes of heading.
      || Heading in log polar space: Retina -> log polar -> MSTd cell. Log polar motion direction correlates with heading eccentricity.
    912. image p342fig09.11 Psychophysical data (left panel) and computer simulation (right panel) of the importance of efference copy in real movements. See the text for details.
      || Heading: move to wall and fixate stationary object (adapted from Warren, Hannon 1990). Inaccurate for simulated eye rotation, accurate for real eye rotation, need confirmation by efference copy!
    913. image p343fig09.12 Transforming two retinal views of the Simpsons into log polar coordinates dramatizes the problem that our brains need to solve in order to separate, and recognize, overlapping figures.
      || View 1 cortical magnification. View 2 How do we know if we are still fixating on the same object?!
    914. image p343fig09.13 When one scans the three different types of pears in the left image, as illustrated by the jagged blue curve with red movement end positions, and transforms the resulting retinal images via the cortical magnification factor, or log polar mapping, the result is the series of images in the right column. How do our brains figure out from such confusing data which views belong to which pear?
      || View-invariant object learning and recognition Three pears: Anjou, Bartlett, Comice. Which is the Bartlett pear? During unsupervised scanning and learning about the world, no one tells the brain what views belong to which objects while it learns view-invariant object categories. Cortical magnification in V1.
    915. image p344fig09.14 (top row, left column) By fitting MT tuning curves with Gaussian receptive fields, a tuning width of 38° is estimated, and leads to the observed standard spiral tuning of 61° in MSTd. (bottom row, left column) The spiral tuning estimate in Figure 9.16 maximizes the position invariance of MSTd receptive fields. (top row, right column) Heading sensitivity is not impaired by these parameter choices.
      || [Spiral tuning (deg), position invariance (deg^(-1)), heading sensitivity] versus log polar direction tuning σ (deg)
    916. image p345fig09.15 Double opponent directional receptive fields in MT are capable of detecting the motion of objects relative to each other and their backgrounds.
      || Motion opponency in MT (Born, Tootell 1992). Motion opponent (Grossberg etal), Differential motion (Royden etal), Subtractive motion cells (Neumann etal). ON center directionally selective: [excit, inhibit]ed by motion in [one, opponent] direction. OFF surround directionally selective: [excit, inhibit]ed by motion in [opponent, center] direction.
    917. image p346fig09.16 A macrocircuit of some of the main brain regions that are used to move the eyes. Black boxes denote areas belonging to the saccadic eye movement system (SAC), white boxes the smooth pursuit eye movement system (SPEM), and gray boxes, both systems. The abbreviations for the different brain regions are: LIP - Lateral Intra-Parietal area; FPA - Frontal Pursuit Area; MST - Middle Superior Temporal area; MT - Middle Temporal area; FEF - Frontal Eye Fields; NRPT - Nucleus Reticularis Tegmenti Pontis; DLPN - Dorso-Lateral Pontine Nuclei; SC - Superior Colliculus; CBM - CereBelluM; MVN/rLVN - Medial and Rostro-Lateral Vestibular Nuclei; PPRF - a Peri-Pontine Reticular Formation; TN - Tonic Neurons
      ||
    918. image p347fig09.17 The leftward eye movement control channel in the model that I developed with Christopher Pack. See the text for details.
      || retinal image -> MT -> MST[v,d] -> pursuit
    919. image p347fig09.18 These circuits between MSTv and MSTd enable predictive target tracking to be achieved by the pursuit system, notably when the eyes are successfully foveating a moving target. Solid arrows depict excitatory connections, dashed arrows depict inhibitory connections.
      ||
    920. image p348fig09.19 How a constant pursuit speed that is commanded by MSTv cells starts by using target speed on the retina and ends by using background speed on the retina in the reverse direction during successful predictive pursuit.
      || target speed on retina, background speed on retina, pursuit speed command by MSTv cells
    921. image p349fig09.20 Using virtual reality displays (left image), (Fajen, Warren 2003) collected data (right two images) about how observers avoid obstacles (open circular disks) as a function of their distance and angular position as they navigate towards a fixed goal (x). These data illustrate how goals act as attractors while obstacles act as repellers.
      || Steering from optic flow (Fajen, Warren 2003). goals are attractors, obstacles are repellers. Damped spring model explains human steering data.
    922. image p349fig09.21 How attractor-repeller dynamics with Gaussians change the net steering gradient as the goal is approached.
      || Steering dynamics: goal approach. body-centered coordinates [obstacle, goal, heading] -> steering
    923. image p350fig09.22 How the negative Gaussian of an obstacle causes a peak shift to avoid the obstacle without losing sight of how to reach the goal.
      || Steering dynamics: obstacle avoidance. body-centered coordinates [obstacle, goal, heading] -> steering
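A hedged sketch of the attractor-repeller steering idea in these two figures; the spring, damping, and Gaussian parameters are illustrative, not the published Fajen-Warren fits. The goal direction attracts the heading like a damped spring, while the obstacle direction repels it through a Gaussian term, shifting the equilibrium heading around the obstacle.
```python
import numpy as np

# Sketch (illustrative parameters): steering as damped attractor-repeller
# dynamics on the heading angle. The goal direction attracts heading like a
# damped spring; the obstacle direction repels it with a Gaussian term,
# shifting the steering equilibrium around the obstacle.
def simulate(goal_dir=0.0, obstacle_dir=0.15, heading=0.3,
             k_goal=4.0, k_obs=3.0, sigma=0.3, damping=3.5, dt=0.01, steps=600):
    rate = 0.0
    for _ in range(steps):
        d_goal = heading - goal_dir
        d_obs = heading - obstacle_dir
        accel = (-damping * rate                 # damping
                 - k_goal * d_goal               # goal attractor (spring)
                 + k_obs * d_obs * np.exp(-d_obs**2 / (2 * sigma**2)))  # Gaussian repeller
        rate += accel * dt
        heading += rate * dt
    return heading

print(round(simulate(), 3))   # settles near the goal direction, displaced away from the obstacle's side
```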
    924. image p350fig09.23 Unidirectional transient cells respond to changes in all image contours as an auto navigates an urban scene while taking a video of it.
      || Unidirectional transient cells (Baloch, Grossberg 1997; Berzhanskaya, Grossberg, Mingolla 2007). Transient cells respond to leading and trailing boundaries. Transient cell responses, driving video
    925. image p351fig09.24 Directional transient cells respond most to motion in their preferred directions.
      || Directional transient cells. 8 directions, 3 speeds
    926. image p351fig09.25 By the time MT+ is reached, directional transient cells and directional filters have begun to extract more global directional information from the image.
      || MT+ computes a global motion estimate. Estimate global motion from noisy local motion estimates.
    927. image p352fig09.26 The final stage of the model computes a beautiful expansion optic flow that permits an easy estimate of the heading direction, with an accuracy that matches that of human navigators.
      || The model generates accurate heading (Warren, Hannon 1990; Royden, Crowell, Banks 1994). Maximally active MSTd cell = heading estimate. Accuracy matches human data. Random dots [mean +-1.5°, worst +-3.8°], Random dots with rotation [accurate with rotations <1°/s, rotation increases, error decreases], OpenGL & Yosemite benchmark +-1.5°, Driving video +-3°.
    928. image p354fig10.01 The laminar cortical circuit that realizes how we pay attention to an object sends signals from layer 6 of a higher cortical level to layer 6 of a lower cortical level and then back up to layer 4. This "folded feedback" circuit forms a top-down, modulatory on-center, off-surround network that realizes the ART Matching Rule.
      || Top-down attention and folded feedback. Attentional signals also feed back into 6-to-4 on-center off-surround. 1-to-5-to-6 feedback path: Macaque (Lund, Booth 1975) cat (Gilbert, Wiesel 1979). V2-to-V1 feedback is on-center off-surround and affects layer 6 of V1 the most (Bullier etal 1996; Sandell, Schiller 1982). Attended stimuli enhanced, ignored stimuli suppressed. This circuit supports the predicted ART Matching Rule! [LGN, V[1,2][6->1]]
    929. image p355fig10.02 Distinguishing processes of seeing vs knowing has been difficult because they interact so strongly.
      || Seeing vs. Knowing. Seeing and knowing [operate at different levels of the brain, use specialized circuits], but they [interact via feedback, use similar cortical designs, feedback is needed for conscious perception]. Cerebral Cortex: Seeing [V1-V4, MT-MST], Knowing [IT, PFC].
    930. image p356fig10.03 Laminar computing achieves at least three basic properties of visual processing that have analogs in all biologically intelligent behaviors. These properties may be found in all cortical circuits in specialized form.
      || What does Laminar Computing achieve? 1. Self-stabilizing development and learning; 2. Seamless fusion of a) pre-attentive automatic bottom-up processing, b) attentive task-selective top-down processing; 3. Analog coherence: Solution of Binding Problem for perceptual grouping without loss of analog sensitivity. Even the earliest visual cortical stages carry out active adaptive information processing: [learn, group, attention]ing
    931. image p357fig10.04 Laminar Computing achieves its properties by computing in a new way that synthesizes the best properties of feedforward and feedback interactions, analog and digital computations, and preattentive and attentive learning. The property of analog coherence enables coherent groupings and decisions to form without losing sensitivity to the amount of evidence that supports them.
      || Laminar Computing: a new way to compute. 1. Feedforward and feedback: a) Fast feedforward processing when data are unambiguous (eg Thorpe etal), b) slower feedback chooses among ambiguous alternatives [self-normalizing property, real-time probability theory], c) A self-organizing system that trades certainty against speed: Goes beyond Bayesian models! 2. Analog and Digital: Analog Coherence combines the stability of digital with the sensitivity of analog. 3. Preattentive and Attentive Learning: Reconciles the differences of (eg) Helmholtz and Kanizsa, "A preattentive grouping is its own 'attentional' prime"
    932. image p359fig10.05 Activation of V1 is initiated, in part, by direct excitatory signals from the LGN to layer 4 of V1.
      || How are layer 2/3 bipole cells activated? Direct bottom-up activation of layer 4. LGN -> V1 layer 4. Strong bottom-up LGN input to layer 4 (Stratford etal 1996; Chung, Ferster 1998). Many details omitted.
    933. image p359fig10.06 Another, albeit indirect, pathway from LGN exists that can also excite layer 4 of V1. Why are these two pathways not redundant? The answer, ultimately, has to do with how cortex learns, as well as with how it pays attention. See the text for details.
      || Another bottom-up input to layer 4: Why?? Layer 6-to-4 on-center off-surround (Grieve, Sillito 1991, 1995; Ahmed etal 1994, 1997). LGN projects to layers 6 and 4. Layer 6 excites spiny stellates in column above it. Medium range connections onto inhibitory neurons. 6-to-4 path acts as on-center off-surround.
    934. image p359fig10.07 The two bottom-up pathways from LGN to layer 4 of V1 can together activate layer 4 and contrast-normalize layer 4 responses.
      || Bottom-up contrast normalization (Grossberg 1968, 1973; Sperling, Sondhi 1968; Heeger 1992; Douglas etal 1995; Shapley etal 2004). Together, direct LGN-to-4 path and 6-to-4 on-center off-surround provide contrast normalization if cells obey shunting or membrane equation dynamics.
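For reference, the generic shunting (membrane equation) form that underlies this normalization claim; the equation below is the standard form of such networks, reconstructed by me rather than copied from the figure.
```latex
% Generic shunting (membrane equation) network -- a reconstruction of the
% normalization claim, not the figure's exact circuit:
\frac{dx_i}{dt} \;=\; -A\,x_i \;+\; (B - x_i)\,E_i \;-\; x_i\,I_i
% Setting dx_i/dt = 0 gives the equilibrium activity
x_i \;=\; \frac{B\,E_i}{A + E_i + I_i}
% Each x_i is bounded by B and divided by the total input A + E_i + I_i,
% which is the contrast-normalization property cited above.
```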
    935. image p360fig10.08 The bottom-up on-center off-surround from LGN-to-6-to-4 has a modulatory on-center because of its role in realizing the ART Matching Rule and, with it, the ability of the cortex to dynamically stabilize its learned memories.
      || Modulation of priming by 6-to-4 on-center (Stratford etal 1996; Callaway 1998). On-center 6-to-4 excitation is inhibited down to being modulatory (priming, subthreshold). On-center 6-to-4 excitation cannot activate layer 4 on its own. Clarifies need for direct path. Prediction: plays key role in stable grouping, development and learning. ART Matching Rule!
    936. image p360fig10.09 Perceptual grouping is carried out in layer 2/3 by long-range horizontal excitatory recurrent connections, supplemented by short-range disynaptic inhibitory connections that together realize the bipole grouping properties that are diagrammed in Figure 10.10.
      || Grouping starts in layer 2/3. LGN-> 6-> 4-> 2/3: 1. Long-range horizontal excitation links collinear, coaxial receptive fields (Gilbert, Wiesel 1989; Bosking etal 1997; Schmidt etal 1997) 2. Short-range disynaptic inhibition of target pyramidal cells via a pool of interneurons (Hirsch, Gilbert 1991) 3. Unambiguous groupings can form and generate feedforward outputs quickly (Thorpe etal 1996).
    937. image p361fig10.10 Bipole grouping is achieved by long-range horizontal recurrent connections that also give rise to short-range inhibitory interneurons which inhibit nearby bipole cells as well as each other.
      || Bipole property controls perceptual grouping. Collinear input on both sides. Excitatory inputs summate. Inhibitory inputs normalize, Shunting inhibition! Two-against-one. Cell is excited.
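The two-against-one property can be sketched with a toy bipole unit; the inhibitory weight and the shunting constant are assumptions chosen only to make the arithmetic transparent: excitation from the two flanking branches adds linearly, while the shared shunting inhibition saturates, so two collinear flanks exceed it but one does not.
```python
# Sketch (assumed weights) of the bipole "two-against-one" property: collinear
# excitation from the two flanking branches summates, while shared shunting
# inhibition grows sublinearly, so two flanks can fire the cell but one cannot.
def bipole_response(left, right, w_inh=1.5, threshold=0.0):
    excitation = left + right                                    # linear summation
    inhibition = w_inh * (left + right) / (0.5 + left + right)   # shunting: saturates
    return max(excitation - inhibition - threshold, 0.0)

print(bipole_response(1.0, 0.0))   # one-sided input: 1 - 1.0 = 0.0 (no grouping)
print(bipole_response(1.0, 1.0))   # two-sided input: 2 - 1.2 = 0.8 (bipole fires)
```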
    938. image p362fig10.11 Feedback between layer 2/3 to the layer 6-to-4-to-2/3 feedback loop chooses the strongest grouping in cases where there is more than one. If only one grouping exists, then the circuit can function very quickly in a feedforward manner. When multiple groupings exist, the cortex "runs as fast as it can" to select the one with the most evidence to support it using the self-normalizing inhibition in the layer 6-to-4 off-surround.
      || How is the final grouping selected? Folded feedback LGN-> 6-> 4-> 2/3. 1. Layer 2/3 groupings feed back into 6-to-4 on-center off-surround: a) direct layer 2/3 -to-6 path; b) can also go via layer 5 (Blasdel etal 1985; Kisvarday etal 1989). 2. Strongest grouping enhanced by its on-center. 3. Inputs to weaker groupings suppressed by off-surround. 4. Interlaminar feedback creates functional columns. Activities of conflicting groupings are reduced by self-normalizing inhibition, slowing processing; intracortical feedback selects and contrast-enhances the winning grouping, speeding processing.
    939. image p363fig10.12 The same laminar circuit design repeats in V1 and V2, albeit with specializations that include longer horizontal grouping axons and figure-ground separation interactions.
      || V2 repeats V1 circuitry at larger spatial scale, LGN-> V1[6,4,2/3]-> V2[6,4,2/3]. V2 layer 2/3 horizontal axons longer-range than in V1 (Amir etal 1993). Therefore, longer-range groupings can form in V2 (Von der Heydt etal 1984)
    940. image p364fig10.13 The bottom-up adaptive filter, intracortical grouping circuit, and intercortical top-down attentional circuit all use the same competitive decision circuit between layers 6 and 4, called the attention-preattention interface, with which to select the featural patterns that will be processed.
      || Bottom-up filters and intracortical grouping feedback use the same 6-to-4 decision circuit, LGN-> Vx[6,4,2/3]. competitive decision circuit, modulatory on-center off-surround network. Top-down intercortical attention also uses the same 6-to-4 decision circuit!
    941. image p364fig10.14 This figure emphasizes how preattentive intracortical groupings and top-down intercortical attention share the same modulatory on-center, off-surround layer 6-to-4 decision circuit.
      || Explanation: grouping and attention share the same modulatory decision circuit. Layer 6-6-4-2/3 pathway shown; also a layer 6-1-2/3 path. intercortical attention, both act via a modulatory on-center off-surround decision circuit, intracortical feedback from groupings
    942. image p367fig10.15 Data (left column) and simulation (right column) of how attention prevents a masking stimulus from inhibiting the response to the on-center of the cell from which the recording was made.
      || Attention protects target from masking stimulus (Reynolds etal 1999; Grossberg, Raizada 2000).
    943. image p367fig10.16 Neurophysiological data (left image) and simulation (right image) of how a low-contrast target can be facilitated if it is surrounded by a pair of collinear flankers, and suppressed by them if it has high contrast.
      || Flankers can enhance or suppress targets (Polat etal 1998; Grossberg, Raizada 2000). target alone, target + flankers, flankers alone.
    944. image p368fig10.17 Neurophysiological data (left image) and simulation (right image) showing that attention has a greater effect on low contrast than high contrast targets.
      || Attention has greater effect on low contrast targets (DeWeerd etal 1999; Raizada, Grossberg 2001). Threshold increase (deg) vs Grating contrast (%), [no, with] attention
    945. image p368fig10.18 Neurophysiological data (left image) and simulation (right image) of relative on-cell activities when the input to that cell may also be surrounded by iso-orientation or perpendicular textures.
      || Texture reduces response to a bar: iso-orientation suppression (Knierim, van Essen 1992), perpendicular suppression (Raizada, Grossberg 2001)
    946. image p369fig10.19 Data from (Watanabe etal 2001) showing perceptual learning of the coherent motion direction, despite the lack of extra-foveal attention and awareness of the moving stimuli.
      || Unconscious perceptual learning of motion direction, % correct for two tests, compared to chance level results.
    947. image p371fig11.01 FACADE theory explains how the 3D boundaries and surfaces are formed with which we see the world in depth.
      || 3D Vision and figure-ground perception (Grossberg 1987, 1994, 1997). How are 3D boundaries and 3D surfaces formed? How the world looks without assuming naive realism. Form And Color And DEpth theory (FACADE). Prediction: Visible figure-ground-separated Form-And-Color-And-DEpth are represented in cortical area V4.
    948. image p372fig11.02 FACADE theory explains how multiple depth-selective boundary representations can capture the surface lightnesses and colors at the correct depths. The fact that both surface qualia and depth are determined by a single process implies that, for example, a change in brightness can cause a change in depth.
      || 3D surface filling-in. From filling-in of surface lightness and color to filling-in of surface depth. Prediction: Depth-selective boundary-gated filling-in defines the 3D surfaces that we see. Prediction: A single process fills-in lightness, color, and depth. Can a change in brightness cause a change in depth? YES! eg proximity-luminance covariance (Egusa 1983, Schwartz, Sperling 1983). Why is depth not more unstable when lighting changes? Prediction: Discounting the illuminant limits variability.
    949. image p373fig11.03 Both contrast-specific binocular fusion and contrast-invariant boundary perception are needed to properly see the world in depth.
      || How to unify contrast-specific binocular fusion with contrast-invariant boundary perception? Contrast-specific binocular fusion: [Left, right] eye view [, no] binocular fusion. Contrast-invariant boundary perception: contrast polarity along the gray square edge reverses; opposite polarities are pooled to form object boundary.
    950. image p374fig11.04 The three processing stages of monocular simple cells, binocular simple cells, and complex cells accomplish both contrast-specific binocular fusion and contrast-invariant boundary perception.
      || Model unifies contrast-specific binocular fusion and contrast-invariant boundary perception (Ohzawa etal 1990; Grossberg, McLoughlin 1997). [Left, right] eye V1-4 simple cells-> V1-3B simple cells-> V1-2/3A complex cells. Contrast-specific stereoscopic fusion by disparity-selective simple cells. Contrast-invariant boundaries by pooling opposite polarity binocular simple cells at complex cells layer 2/3A.
    951. image p374fig11.05 The brain uses a contrast constraint on binocular fusion to help ensure that only contrasts which are derived from the same objects in space are binocularly matched.
      || Contrast constraint on binocular fusion. Left and right input from same object has similar contrast, Percept changes when one contrast is different. Fusion only occurs between bars of similar contrast (McKee etal 1994)
    952. image p375fig11.06 The contrast constraint on binocular fusion is realized by obligate cells in layer 3B of cortical area V1.
      || Model implements contrast constraint on binocular fusion (cf. "obligate" cells Poggio 1991). An ecological constraint on cortical development. [left, right] eye cart V1-[4 monocular simple, 3B binocular simple, complex2/3A] cells. Inhibitory cells (red) ensure that fusion occurs when contrasts in left and right eye are approximately equal.
    953. image p375fig11.07 The 3D LAMINART model uses both monocular and binocular simple cells to binocularly fuse like image contrasts. The remainder of the model generates 3D boundary and surface representations of multiple kinds of experiments as well as of natural scenes.
      || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A], V2 thin stripe [4->2/3A], V4]. V1 blob [V1-4 monocular, V1 interior binocular] simple cells. [complex, simple, inhibitory] cells, on-center off-surround
    954. image p376fig11.08 The contrast constraint on binocular fusion is not sufficient to prevent many of the false binocular matches that satisfy this constraint.
      || How to solve the correspondence problem? How does the brain inhibit false matches? Contrast constraint is not enough. [stimulus, multiple possible binocular matches] - Which squares in the two retinal images must be fused to form the correct percept?
    955. image p376fig11.09 The disparity filter in V2 helps to solve the correspondence problem by eliminating spurious contrasts using line-of-sight inhibition.
      || Model V2 disparity filter solves the correspondence problem. An ecological constraint on cortical development. [left, right] eye view: False matches (black) suppressed by line-of-sight inhibition (green lines). "Cells that fire together wire together".
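A simplified sketch of the disparity filter idea (my construction, not the model's circuit): candidate left-right matches are scored by contrast similarity, and matches sharing a line of sight compete so that only the strongest survives, which removes the false matches.
```python
# Sketch (my simplification): the disparity filter as line-of-sight competition.
# Candidate binocular matches are scored by contrast similarity; matches that
# share a left-eye or right-eye feature (the same line of sight) compete, and
# only the strongest along each line of sight survives, removing false matches.
left = {"A": 1.0, "B": 0.95}            # left-eye features: name -> contrast
right = {"a": 0.98, "b": 0.92}          # right-eye features: name -> contrast

matches = {(l, r): 1.0 - abs(cl - cr)   # similarity score for every candidate pair
           for l, cl in left.items() for r, cr in right.items()}

survivors = {}
for (l, r), score in sorted(matches.items(), key=lambda kv: -kv[1]):
    if all(l != sl and r != sr for (sl, sr) in survivors):   # no shared line of sight
        survivors[(l, r)] = score
print(survivors)    # e.g. {('A', 'a'): ..., ('B', 'b'): ...}: one match per line of sight
```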
    956. image p376fig11.10 The 3D LAMINART model shows how the disparity filter can be integrated into the circuit that completes 3D boundary representations using bipole grouping cells. It also explains how surface contours can strengthen boundaries that succeed in generating closed filling-in domains.
      || 3D LAMINART model (Cao, Grossberg 2005). [left, right] eye cart [LGN, V1 blob [4->3B->2/3A] surface contour, V2 thin stripe (monocular surface) [4->2/3A], V2 interior [disynaptic inhibitory interneurons, bipole grouping cells, disparity filter, V4 binocular surface]. [complex, simple, inhibitory] cells, on-center off-surround
    957. image p377fig11.11 DaVinci stereopsis phenomena occur when only one eye can receive visual inputs from part of a 3D scene due to occlusion by a nearer surface.
       || How does monocular information contribute to depth perception? DaVinci stereopsis (Gillam etal 1999). Only by utilizing monocular information can the visual system create the correct depth percept. [left, right] eye view
    958. image p378fig11.12 How monocular and binocular information are combined in V1 and V2 in the 3D LAMINART model.
      || Model utilizes monocular information. [left, right] eye cart V1-[4 monocular simple, 3B binocular simple, complex2/3A [mo,bi]nocular] cells, V2-4 binocular complex cells. black = monocular cells, blue = binocular cells. In V2, monocular inputs add to binocular inputs along the line of sight and contribute to depth perception.
    959. image p379fig11.13 How the 3D LAMINART model explains DaVinci stereopsis. All the stages of boundary and surface formation are color coded to clarify their explanation. Although each mechanism is very simple, when all of them act together, the correct depthful surface representation is generated. See the text for details.
      || DaVinci stereopsis (Nakayama, Shimojo 1990). An emergent property of the previous simple mechanisms working together. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] pair [Binocular match: boundaries of thick bar -> Add monocular boundaries along lines-of-sight -> Line-of-sight inhibition kills weaker vertical boundaries -> 3D surface percept not just a disparity match!] pair [Binocular match: right edge of thin and thick bars -> Strongest boundaries: binocular and monocular boundaries add -> Vertical boundaries from monocular left edge of thin bar survive -> Filling-in contained by connected boundaries]. cart [very near, near, fixation plane, far, very far]
    960. image p380fig11.14 The model explanation of DaVinci stereopsis when the input stimuli have opposite contrast polarities.
      || Polarity-reversed Da Vinci stereopsis (Nakayama, Shimojo 1990). Same explanation! (... as Figure 11.13 ...) [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
    961. image p381fig11.15 The same model mechanisms explain the surface percept that is generated by the variant of DaVinci stereopsis that Gillam, Blackburn, and Nakayama studied in 1999.
      || DaVinci stereopsis (Gillam, Blackburn, Nakayama 1999). same model mechanisms. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
     962. image p382fig11.16 The version of DaVinci stereopsis wherein three narrow rectangles are binocularly matched with one thick rectangle can also be explained in a similar way.
      || DaVinci stereopsis of [3 narrow, one thick] rectangles (Gillam, Blackburn, Nakayama 1999). [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
    963. image p383fig11.17 The bars in the left and right images that are in the same positions are marked in red to simplify tracking how they are processed at subsequent stages.
      || The Venetian blind effect (Howard, Rogers 1995). Every second bar on L in same position as every third bar on R. These bars are marked in red; see them match in Fixation Plane. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
    964. image p384fig11.18 Surface and surface-to-boundary surface contour signals that are generated by the Venetian blind image.
      || Venetian blind effect (Howard, Rogers 1995). Every second bar on L in same position as every third bar on R. PERCEPT: 3-bar ramps sloping up from L to R with step returns. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
    965. image p385fig11.19 Dichoptic masking occurs when the bars in the left and right images have sufficiently different contrasts.
      || Dichoptic masking (McKee, Bravo, Smallman, Legge 1994). [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
    966. image p385fig11.20 Dichoptic masking occurs in Panum's limiting case for reasons explained in the text.
      || Dichoptic masking in Panum's limiting case (McKee, Bravo, Smallman, Legge 1995). Panum's limiting case is a simplified version of the Venetian blind effect! [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
    967. image p386fig11.21 A simulation of the Craik-O'Brien-Cornsweet Effect when viewed on a planar surface in depth.
       || Craik-O'Brien-Cornsweet Effect. Can the model simulate other surface percepts? eg surface brightness. The 2D surface with the image on it is viewed at a very near depth. Adapts (Grossberg, Todorovic 1988) to 3D. [V1 binocular boundaries -> V2 initial boundaries -> V2 final boundaries -> V4 surface] cart [very near, near, fixation plane, far, very far]
    968. image p387fig11.22 Simulation of the boundaries that are generated by the Julesz stereogram in Figure 4.59 (top row) without (second row) and with (third row) surface contour feedback.
      || Boundary cart [V2-2, V2, V1] cart [near, fixation, far]
    969. image p388fig11.23 Simulation of the surface percept that is seen in response to a sparse stereogram. The challenge is to assign large regions of ambiguous white to the correct surface in depth.
      || [left, right] retinal input. Surface [near, fixation, far] V4
     970. image p388fig11.24 Boundary groupings capture the depth-ambiguous feature contour signals and lift them to the correct surface in depth.
      || [surface, boundary] cart [near, fixation, far] V2.
    971. image p389fig11.25 Boundaries are not just edge detectors. If they were, a shaded ellipse would look flat, and uniformly gray.
      || 3D vision and figure-ground separation. Multiple-scale, depth-selective boundary webs. [dark-light, light-dark] boundaries -> complex cells! If boundaries were just edge detectors, there would be just a bounding edge of the ellipse. After filling-in, it would look like this:.
    972. image p390fig11.26 Although larger scales sometimes look closer (left image), that is not always true, as the right image of (Brown, Weisstein 1988) illustrates. The latter percept is, moreover, bistable. These images show the importance of interactions between groupings and multiple scales to determine perceived surface depths.
      || Multiple-scale depth-selective groupings determine perceived depth (Brown, Weisstein 1988). As an object approaches, it gets bigger on the retina. Does a big scale (RF) always signal NEAR? NO! The same scale can signal either near or far. Some scales fuse more than one disparity.
    973. image p391fig11.27 (left image) Each scale can binocularly fuse a subset of spatial scales, with larger scales fusing more scales and closer ones than small scales. (right image) Cortical hypercolumns enable binocular fusion to occur in a larger scale even as rivalry occurs in a smaller scale.
       || Multiple-scale grouping and size-disparity correlation. Depth-selective cooperation and competition among multiple scales determines perceived depth: a) Larger scales fuse more depths; b) Simultaneous fusion and rivalry. Boundary pruning using surface contours: Surface-to-boundary feedback from the nearest surface that is surrounded by a connected boundary eliminates redundant boundaries at the same position and further depths.
    974. image p391fig11.28 (left image) Ocular dominance columns respond selectively to inputs from one eye or the other. (right image) Inputs from the two eyes are mapped into layer 4C of V1, among other layers.
      || Cortex V1[1, 2/3, 4A, 4B, 4C, 5, 6], LGN
    975. image p392fig11.29 Boundary webs of the smallest scales are closer to the boundary edge of the ellipse, and progressively larger scale webs penetrate ever deeper into the ellipse image, due to the amount of evidence that they need to fire. Taken together, they generate a multiple-scale boundary web with depth-selective properties that can capture depth-selective surface filling-in.
      || 3D vision and figure-ground separation. Multiple-scale, depth-selective boundary webs. Instead, different size detectors generate dense boundary webs at different positions and depths along the shading gradient. Small-far, Larger-nearer, Largest-nearest. Each boundary web captures the gray shading in small compartments at its position and depths. A shaded percept in depth results.
    976. image p392fig11.30 Multiple scales interact with bipole cells that represent multiple depths, and conversely. See the text for details.
      || How multiple scales vote for multiple depths. Scale-to-depth and depth-to-scale maps. Smallest scale projects to, and receives feedback from, boundary groupings that represent the furthest depths. Largest scale connects to boundary groupings that represent all depths. multiple-[depth, scale] dot [grouping, filter] cells. [small <-> large] vs [far <-> near]
    977. image p393fig11.31 (Todd, Akerstrom 1987) created a series of 2D images from discrete black patches on a white disk and showed how the perceived depth varies with the factors summarized in the figure. The LIGHTSHAFT model quantitatively simulated their data.
      || Factors determining depth-from-texture percept. Perceived depth varies with texture element width, but only when elements are elongated and sufficiently aligned with one another to form long-range groupings. Data of (Todd, Akerstrom 1987) simulated by the LIGHTSHAFT model of (Grossberg, Kuhlmann 2007). [HP, LP, CCE, CCS, RO]
    978. image p393fig11.32 Kulikowski stereograms involve binocular matching of out-of-phase (a) Gaussians or (b) rectangles. The latter can generate a percept of simultaneous fusion and rivalry. See the text for why.
      ||
     979. image p394fig11.33 The Kaufman stereogram also creates a percept of simultaneous fusion and rivalry. The square in depth remains fused and the perpendicular lines in the two images are perceived as rivalrous.
      || 3D groupings determine perceived depth, stereogram (Kaufman 1974). Vertical illusory contours are at different disparities than those of bounding squares. Illusory square is seen in depth. Vertical illusory contours are binocularly fused and determine the perceived depth of the square. Thin, oblique lines, being perpendicular, are rivalrous: simultaneous fusion and rivalry.
    980. image p395fig11.34 A comparison of the properties of other rivalry models with those of the 3D LAMINART model (surrounded by red border). Significantly, only 3D LAMINART explains both stable vision and rivalry (green border).
      || Comparison of rivalry models
    981. image p396fig11.35 Three properties of bipole boundary grouping in V2 can explain how boundaries oscillate in response to rivalry-inducing stimuli. Because all boundaries are invisible, however, these properties are not sufficient to generate a conscious percept of rivalrous surfaces.
       || 3 V2 boundary properties cause binocular rivalry. 1. Bipole grouping, 2. Orientational competition, 3. Activity-dependent habituation
    982. image p397fig11.36 Simulation of the temporal dynamics of rivalrous, but coherent, boundary switching.
      || Simulation of 2D rivalry dynamics. [Inputs, Temporal dynamics of V2 layer 2/3 boundary cells] cart [left, right]
    983. image p398fig11.37 Simulation of the no swap baseline condition of (Logothetis, Leopold, Sheinberg 1996).
      || [Binocular, [left, right] eye] activity
    984. image p399fig11.38 Simulation of the swap condition of (Logothetis, Leopold, Sheinberg 1996).
      || [Binocular, [left, right] eye] activity
    985. image p399fig11.39 Simulation of the eye rivalry data of (Lee, Blake 1999).
      || [Binocular, [left, right] eye] activity
     986. image p400fig11.40 When planar 2D parallelograms are juxtaposed, the resultant forms generate 3D percepts that are sensitive to the configuration of angles and edges in the figure. See the text for why.
      || 3D representation of 2D images, Monocular cues (eg angles) can interact together to yield 3D interpretation. Monocular cues by themselves are often ambiguous. Same angles and shapes, different surface slants. How do these ambiguous 2D shapes contextually define a 3D object form?
    987. image p401fig11.41 The 3D LAMINART model proposes how angle cells and disparity-gradient interact through learning to generate 3D representations of slanted objects.
       || 3D LAMINART model. [LGN, V1, V2, V4] Four key additions: 1. Angle cells - tuned to various angles; 2. Disparity-gradient cells - tuned to disparity gradients in the image; 3. Weights from [angle to disparity-gradient] cells - learned while viewing 3D images; 4. Collinear grouping between [angle to disparity-gradient] cells - disambiguates ambiguous groupings.
    988. image p401fig11.42 A hypothetical cortical hypercolumn structure proposes how angle cells and disparity-gradient cells, including bipole cells that stay within a given depth, may self-organize during development.
       || Hypercolumn representation of angles [left, right] cart [far-to-near, zero, near-to-far]
     989. image p402fig11.43 A pair of disparate images of a scene from the University of Tsukuba Multiview image database.
      || input [left, right]
    990. image p402fig11.44 3D scenic reconstruction of the image in Figure 11.43 by the 3D LAMINART model.
      || Disparity [5, 6, 8, 10, 11, 14]: images of objects in common depth planes
    991. image p403fig11.45 The multiple boundary and surface scales that were used to simulate a reconstruction of the SAR image in Figure 3.24.
      || SAR processing by multiple scales. [boundaries before completion, boundaries after completion, surface filling-in] versus scale [small, medium, large]. large scale bipole
    992. image p405fig12.01 A What ventral cortical stream and Where/How dorsal cortical stream have been described for audition, no less than for vision.
       || Parietal lobe: where; Temporal lobe: what. V1-> [[what: IT], [where: PPC-> DLPFC]]. A1-> [[what: [ST-> VLPFC], VLPFC], [where: [PPC-> DLPFC], DLPFC]].
    993. image p406fig12.02 The Vector Integration to Endpoint, or VITE, model of arm trajectory formation enables the three S's of a movement to be realized: Synergy formation, Synchrony of muscles within a synergy, and variable Speed that is under volitional control (G). This is accomplished by subtracting a present position vector (P) from a target position vector (T) to form a difference vector (V) which moves P towards T at a speed that is determined by G.
      || The three S's of movement control. T-> D [-> [[D]+G]-> P->], P-> D (inhib), G-> [[D]+G]. 1. Synergy - Defining T determines the muscle groups that will contract during the movement. 2. Synchrony - When G turns on, all muscle groups for which D != 0 contract by variable amounts in equal time. Because G multiplies D, it does not change the direction in which P moves to acquire T: straight line movement. 3. Speed - P integrates D at rate G until P = T. Increasing (decreasing) G makes the movement faster (slower).
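      A minimal computational sketch of the caption's difference-vector arithmetic may help: D = T - P is recomputed continuously, and P integrates the GO-gated D. The Euler step, the gain values, and the omission of rectification and opponent muscle channels are illustrative assumptions, not the published VITE parameterization.
        # Minimal VITE sketch: P integrates the GO-gated difference vector D = T - P.
        # Assumptions: simple Euler integration and illustrative gains -- a sketch of
        # the idea described in the caption, not the published model equations.
        import numpy as np

        def vite_reach(T, P0, G, dt=0.001, steps=2000):
            """Return the trajectory of the present position vector P toward target T."""
            P = np.array(P0, dtype=float)
            trajectory = [P.copy()]
            for _ in range(steps):
                D = T - P            # difference vector
                P += dt * G * D      # GO signal G scales speed, not direction
                trajectory.append(P.copy())
            return np.array(trajectory)

        # Same target, two GO amplitudes: the straight-line path is identical,
        # only the speed of approach changes (volitionally controlled Speed).
        slow = vite_reach(T=np.array([10.0, 5.0]), P0=[0.0, 0.0], G=2.0)
        fast = vite_reach(T=np.array([10.0, 5.0]), P0=[0.0, 0.0], G=8.0)
      Because G multiplies D at every moment, changing G rescales the velocity profile in time without altering the direction in which P moves toward T, which is the Synchrony and Speed property listed above.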
    994. image p407fig12.03 Neurophysiological data showing how motor cortical cells code different vectors that are sensitive to both the direction of the commanded movement and its length.
      || (a) Single primary motor cortex neuron, onset of movement -> on..., radial architecture... (b) Motor cortex neuronal population, radial architecture...
    995. image p409fig12.04 (top half) Neurophysiological data of vector cell responses in motor cortex. (bottom half) VITE model simulations of a simple movement in which the model's difference vector simulates the data as an emergent property of network interactions.
      || Neurophysiological data. VITE model [Present Position vector, Difference vector, Outflow velocity vector, go signal].
    996. image p410fig12.05 VITE simulation of velocity profile invariance if the same GO signal gates shorter (a) or longer (b) movements. Note the higher velocities in (b).
      || [[short, long] cart [G, dP/dt]] vs time. G = GO signal, dP/dt = velocity profile.
     997. image p410fig12.06 Monkeys seamlessly transformed a movement initiated towards the 2 o'clock target into one towards the 10 o'clock target when the latter target was substituted 50 or 100 msec after activation of the first target light.
      ||
     998. image p411fig12.07 The left column simulation by VITE shows the velocity profile when the GO signal (G) starts with the movement. The right column shows that the peak velocity is much greater if a second movement begins when the GO signal is already positive.
      || Higher peak velocity due to target switching. VITE simulation of higher peak speed if second target rides on first GO signal. [[first, second] target cart [G, dP/dt]] vs time. Second target GO is much higher. G = GO signal, dP/dt = velocity profile.
    999. image p411fig12.08 Agonist-antagonist opponent organization of difference vector (DV) and present position vector (PPV) processing stages and how GO signals gate them.
      ||
     1000. image p412fig12.09 How a Vector Associative Map, or VAM, model uses mismatch learning during its development to calibrate inputs from a target position vector (T) and a present position vector (P) via learning of adaptive weights at the difference vector (D). See the text for details.
       || Vector Associative Map model (VAM). During the critical period, the Endogenous Random Generator (ERG+) turns on, activates P, and causes random movements that sample the workspace. When ERG+ shuts off, posture occurs. ERG- then turns on (rebound) and opens the Now Print (NP) gate, which dumps P into T. Mismatch learning enables adaptive weights between T and D to change until D (the mismatch) approaches 0. Then T and P are both correctly calibrated to represent the same positions.
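      A toy sketch of the calibration idea just described, under the assumption that T and P are linearly related coordinate frames and that the adaptive weights form a matrix W feeding the difference vector, so that D = W T - P; during ERG-driven babbling the Now Print gate makes T and P encode the same posture, and W is adjusted until D approaches zero. These equations are illustrative, not the published VAM dynamics.
        # Toy mismatch-learning sketch for the VAM calibration idea described above.
        import numpy as np

        rng = np.random.default_rng(0)
        hidden_map = np.array([[0.7, 0.2], [-0.1, 1.3]])  # unknown T-frame -> P-frame relation
        W = np.zeros((2, 2))                               # adaptive weights at the D stage
        lr = 0.1

        for _ in range(2000):
            P = rng.uniform(-1, 1, size=2)       # ERG-driven random posture (babbling)
            T = np.linalg.solve(hidden_map, P)   # Now Print gate: T encodes the same posture
            D = W @ T - P                        # mismatch registered at the difference vector
            W -= lr * np.outer(D, T)             # mismatch learning: shrink |D|

        print(np.round(W, 3))   # converges toward hidden_map, so D ~ 0 for matched T and P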
    1001. image p413fig12.10 Processing stages in cortical areas 4 and 5 whereby the VITE model combines outflow VITE trajectory formation signals with inflow signals from the spinal cord and cerebellum that enable it to carry out movements with variable loads and in the presence of obstacles. See the text for details.
      || area 4 (rostral) <-> area 5 (caudal).
    1002. image p414fig12.11 Neurophysiological data from cortical areas 4 and 5 (every other column) and simulations thereof (other columns) during a reach.
      || activation vs time. (a) area 4 phasic RT (IFV) (b) area 4 tonic (OPV) (c) area 4 phasic-tonic (OFPV) (d) area 4 phasic MT (DVV) (e) area 5 phasic (DV) (f) area 5 tonic (PPV)
    1003. image p415fig12.12 The combined VITE, FLETE, cerebellar, and multi-joint opponent muscle model for trajectory formation in the presence of variable forces and obstacles.
      ||
    1004. image p416fig12.13 The DIRECT model learns, using a circular reaction that is energized by an Endogenous Random Generator, or ERG, to make motor-equivalent volitionally-activated reaches. This circular reaction learns a spatial representation of a target in space. It can hereby make accurate reaches with clamped joints and on its first try using a tool under visual guidance; see Figure 12.16.
       || DIRECT model (Bullock, Grossberg, Guenther 1993). learns by circular reaction. learns spatial representation to mediate between vision and action. motor-equivalent reaching. can reach target with clamped joints. can reach target with a TOOL on the first try under visual guidance. How did tool use arise?!
    1005. image p416fig12.14 Computer simulations of DIRECT reaches with (b) a tool, (c) a clamped elbow, and (d) with a blindfold, among other constraints.
       || Computer simulations of DIRECT reaches [unconstrained, with TOOL, elbow clamped at 140°, blindfolded]
     1006. image p417fig12.15 The DIRECT and DIVA models have homologous circuits to learn and control motor-equivalent reaching and speaking, with tool use and coarticulation as resulting properties. See the text for why.
      || From Seeing and Reaching to Hearing and Speaking, Circular reactions (Piaget 1945, 1951, 1952). Homologous circuits for development and learning of motor-equivalent REACHING and SPEAKING. DIRECT TOOL use (Bullock, Grossberg, Guenther 1993), DIVA Coarticulation (Guenther 1995)
    1007. image p418fig12.16 Anatomical interpretations of the DIVA model processing stages.
       || [Feedforward control system (FF), Feedback control subsystem (FB)]. Speech sound map (Left Ventral Premotor Cortex (LVPC)), Cerebellum, Articulatory velocity and position maps (Motor Cortex (MC)), Somatosensory Error Map (Inferior Parietal Cortex (IPC)), Auditory Error Map (Superior Temporal Cortex (STC)), Auditory State Map (Superior Temporal Cortex), Somatosensory State Map (Inferior Parietal Cortex), articulatory musculature via subcortical nuclei, auditory feedback via subcortical nuclei
    1008. image p419fig12.17 The auditory continuity illusion illustrates the ART Matching Rule at the level of auditory streaming. Its "backwards in time" effect of future context on past conscious perception is a signature of resonance.
       || Auditory continuity illusion. input, percept. Backwards in time - How does a future sound let past sound continue through noise? Resonance! - It takes a while to kick in. After it starts, a future tone can maintain it much more quickly. Why does this not happen if there is no noise? - ART Matching Rule! TD harmonic filter is modulatory without BU input. It cannot create something out of nothing.
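      A small numeric sketch of the ART Matching Rule property invoked here: the top-down expectation is modulatory, so it can select and enhance matched bottom-up features and suppress unmatched ones, but it cannot create activity where there is no bottom-up input. The algebraic form and gain values below are illustrative assumptions only.
        # Sketch of the ART Matching Rule as a modulatory on-center, off-surround gate.
        import numpy as np

        def art_match(bottom_up, top_down, k=2.0, inh=0.3):
            """Feature activities after top-down priming; rectified at zero."""
            y = bottom_up * (1.0 + k * top_down) - inh * np.sum(top_down)
            return np.maximum(y, 0.0)

        noise = np.array([0.4, 0.4, 0.4, 0.4])   # broadband noise input
        prime = np.array([0.0, 1.0, 0.0, 0.0])   # top-down expectation of the tone's frequency

        print(art_match(noise, prime))           # primed channel enhanced, the rest suppressed
        print(art_match(np.zeros(4), prime))     # no bottom-up input -> no output: nothing from nothing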
    1009. image p420fig12.18 The ARTSTREAM model explains and simulates the auditory continuity illusion as an example of a spectral-pitch resonance. Interactions of ART Matching Rule and asymmetric competition mechanisms in cortical strip maps explain how the tone selects the consistent frequency from the noise in its own stream while separating the rest of the noise into another stream.
      || ARTSTREAM model (Grossberg 1999; Grossberg, Govindarajan, Wyse, Cohen 2004). SPINET. Frequency and pitch strips. Bottom Up (BU) harmonic sieve. Top Down (TD) harmonic ART matching. Exclusive allocation. Learn pitch categories based on early harmonic processing. A stream is a Spectral-Pitch Resonance!
    1010. image p422fig12.19 The ARTSTREAM model includes mechanisms for deriving streams both from pitch and from source direction. See the text for details.
      || [left, right] cart Peripheral processing = [input signal-> outer & middle ear preemphasis-> basilar membrane gammatone filterbank-> energy measure]. Spectral stream layer-> spectral summation layer-> delays-> [f-, tau] plane-> pitch stream layer-> pitch summation layer.
     1011. image p423fig12.20 The Spatial Pitch Network, or SPINET, model shows how a log polar spatial representation of the sound frequency spectrum can be derived from auditory signals occurring in time. The spatial representation allows the ARTSTREAM model to compute spatially distinct auditory streams.
       || SPINET model (Spatial Pitch Network) (Cohen, Grossberg, Wyse 1995). 1. input sound 2. Gamma-tone filter bank 3. Short-term average energy spectrum 4. MAP transfer function 5. On-center off-surround and rectification 6. Harmonic weighting 7. Harmonic summation and competition -> PITCH
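      A simplified numeric sketch of the harmonic weighting and summation stages (steps 6-7) listed above: each candidate pitch sums the spectral energy lying near its harmonics, and the candidate with the largest harmonic sum wins. The gammatone filterbank, MAP transfer function, and on-center off-surround stages are omitted, and the channel spacing, bandwidths, and 1/k harmonic taper are illustrative assumptions.
        # Simplified sketch of SPINET-style harmonic summation (stages 6-7 above).
        import numpy as np

        freqs = np.arange(50.0, 4000.0, 10.0)        # channel center frequencies (Hz)
        spectrum = np.zeros_like(freqs)              # stand-in for the short-term energy spectrum
        for h in (600.0, 800.0, 1000.0):             # harmonics 3-5 of a 200 Hz fundamental
            spectrum += np.exp(-0.5 * ((freqs - h) / 20.0) ** 2)

        def pitch_salience(f0, n_harmonics=10, bandwidth=25.0):
            """Sum spectral energy falling near the first n harmonics of candidate f0."""
            total = 0.0
            for k in range(1, n_harmonics + 1):
                weight = np.exp(-0.5 * ((freqs - k * f0) / bandwidth) ** 2)
                total += np.sum(weight * spectrum) / k   # taper higher harmonics
            return total

        candidates = np.arange(80.0, 500.0, 1.0)
        best = candidates[np.argmax([pitch_salience(f0) for f0 in candidates])]
        print(best)   # close to 200 Hz: the "missing fundamental" pitch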
    1012. image p424fig12.21 One of the many types of data about pitch processing that are simulated by the SPINET model. See the text for details.
      || Pitch shifts with component shifts (Patterson, Wightman 1976; Schouten 1962). Pitch vs lowest harmonic number.
    1013. image p424fig12.22 Decomposition of a sound (bottom row) in terms of three of its harmonics (top three rows).
      ||
    1014. image p425fig12.23 ARTSTREAM simulations of the auditory continuity illusion and other streaming properties (left column, top row). When two tones are separated by silence (Input), a percept of silence also separates them in a spectral-pitch resonance. (left column, bottom row). When two tones are separated by broadband noise, the percept of tone continues through the noise in one stream (stream 1) while the remainder of the noise occurs in a different stream (stream 2). (right column) Some of the other streaming properties that have been simulated by the ARTSTREAM model.
      || Auditory continuity does not occur without noise. Auditory continuity in noise. Other simulated streaming data.
    1015. image p426fig12.24 Spectrograms of /ba/ and /pa/ show the transient and sustained parts of their spectrograms.
      ||
     1016. image p428fig12.25 (left architecture) Auditory-articulatory feedback loop whereby babbled sounds activate learning in an imitative map that is later used to learn to reproduce the sounds of other speakers. An articulatory-to-auditory expectation renders learning possible by making the auditory and motor data dimensionally consistent, as in the motor theory of speech. (right architecture) Parallel streams in the ARTSPEECH model for learning speaker-independent speech and language meaning, including a mechanism for speaker normalization (right cortical stream) and for learning speaker-dependent vocalic qualities (left cortical stream).
      || left: Speaker-dependent vocalic qualities; right: Speaker-independent speech and language meaning
    1017. image p430fig12.26 The NormNet model shows how speaker normalization can be achieved using specializations of the same mechanisms that create auditory streams. See the text for how.
       || [Anchor vs Stream] log frequency map. -> diagonals-> Speaker-independent acoustic item information-> [BU adaptive filter, TD learned expectation]-> learned item recognition categories
     1018. image p431fig12.27 The strip maps that occur in ARTSTREAM and NormNet are variants of a cortical design that also creates ocular dominance columns in the visual cortex.
       || Adult organization of V1 (Grinvald etal http://www.weizmann.ac.il/brain/images/cubes.html). (1) Ocular dominance columns (ODCs): Alternating strips of cortex respond preferentially to visual inputs of each eye (R/L corresponds to Right and Left eye inputs in the figure); Orientation columns: A smooth pattern of changing orientation preference within each ODC. Organized in a pinwheel-like fashion.
    1019. image p432fig12.28 (left image) The SpaN model simulates how spatial representations of numerical quantities are generated in the parietal cortex. (right image) Behavior numerosity data and SpaN model simulations of it.
      || (Left) preprocessor-> spatial number map-> Comparison wave. (Right) data axis: number of lever presses; model axis: node position in the spatial number axis
    1020. image p433fig12.29 Learning of place-value number maps language categories in the What cortical stream into numerical strip maps in the Where cortical stream. See the text for details.
       || (1) spoken word "seven"-> (2) What processing stream- learned number category <-> (3) What-Where learned associations <- (4) Where processing stream- spatial number map <- (5) visual cues of seven objects
    1021. image p436fig12.30 The conscious ARTWORD, or cARTWORD, laminar cortical speech model simulates how future context can disambiguate noisy past speech sounds in such a way that the completed percept is consciously heard to proceed from past to future as a feature-item-list resonant wave propagates through time.
      || cARTWORD: Laminar cortical model macrocircuit (Grossberg, Kazerounian 2011) Simulates PHONEMIC RESTORATION: Cognitive Working Memory (processed item sequences) - [Excitatory-> inhibitory-> habituative-> adaptive filter-> adaptive filter-> adaptive filter with depletable synapse-> Acoustic [item, feature]
     1022. image p436fig12.31 Working memories do not store longer sequences of events in the correct temporal order. Instead, items at the beginning and end of the list are often recalled first, and with the highest probability.
      || Working memory. How to design a working memory to code "Temporal Order Information" in STM before it is stored in LTM. Speech, language, sensory-motor control, cognitive planning. eg repeat a telephone number unless you are distracted first. Temporal order STM is often imperfect, eg Free Recall. [probability, order] of recall vs list position. WHY?
    1023. image p437fig12.32 Data from a free recall experiment illustrate the bowed serial position curve.
      || Serial position function for free recall Data: (Murdock 1962 JEP 64, 482-488). % correct vs position of word on a 40-word list. Primacy gradient can be a mixture of STM and LTM read-out.
     1024. image p437fig12.33 Item and Order working memory models explain free recall data, as well as many other psychological and neurobiological data, by simulating how temporal series of events are stored as evolving spatial patterns of activity at content-addressable item categories. The categories with the largest activities are rehearsed first, and self-inhibit their activity as they do so in order to prevent them from being rehearsed perseveratively. The laws whereby the items are stored in working memory obey basic design principles concerning how list categories, or chunks, of sequences of stored items can be stably remembered.
       || Working memory models: item and order, or competitive queuing (Grossberg 1978; Houghton 1990; Page, Norris 1998). Event sequence in time stored as an evolving spatial pattern of activity. Primacy gradient of working memory activation stores correct temporal order at content-addressable cells. The maximally activated cell population is performed next when a rehearsal wave is turned on. Output signal from chosen cell population inhibits its own activity to prevent perseveration: inhibition of return. Iterate until entire sequence is performed.
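      A minimal sketch of the rehearsal cycle just summarized: temporal order is stored as a primacy gradient of activities over content-addressable item cells, a rehearsal wave reads out the most active item, and the performed item self-inhibits so that the rest of the sequence follows without perseveration. The gradient values and data structures are illustrative only.
        # Minimal competitive-queuing sketch of an Item-and-Order working memory.
        def store_primacy_gradient(items, decay=0.8):
            """Earlier items get larger activities: a primacy gradient."""
            return {item: decay ** i for i, item in enumerate(items)}

        def rehearse(working_memory):
            """Read out items in order of activity, self-inhibiting each after output."""
            recalled, wm = [], dict(working_memory)
            while wm:
                item = max(wm, key=wm.get)   # rehearsal wave: most active item is performed
                recalled.append(item)
                del wm[item]                 # output self-inhibits its own stored activity
            return recalled

        wm = store_primacy_gradient(["M", "Y", "S", "E", "L", "F"])
        print(rehearse(wm))                  # items come out in the stored order: M Y S E L F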
    1025. image p438fig12.34 The LTM Invariance Principle insists that words being stored in working memory for the first time (eg MYSELF) do not cause catastrophic forgetting of the categories that have already been learned for their subwords (eg MY, SELF, and ELF) or other subset linguistic groups.
      || LTM invariance principle. unfamiliar STM -> LTM familiar. How does STM storage of SELF influence STM storage of MY? It should not recode LTM of either MY or SELF!
    1026. image p439fig12.35 The Normalization Rule insists that the total activity of stored items in working memory has an upper bound that is approximately independent of the number of items that are stored.
      || Normalization Rule (Grossberg 1978). Total STM activity has a finite bound independent of the number of items (limited capacity of STM). Activity vs Items for [slow, quick] asymptotic energy growth.
     1027. image p439fig12.36 (1) Inputs to Item and Order working memories are stored by content-addressable item categories. (2) The relative activities of the item categories code the temporal order of performance. (3) In addition to excitatory recurrent signals from each working memory cell (population) to itself, there are also inhibitory recurrent signals to other working memory cells, in order to solve the noise-saturation dilemma. (4) A nonspecific rehearsal wave allows the most active cell to be rehearsed first. (5) As an item is being rehearsed, it inhibits its own activity using a feedback inhibitory interneuron. Perseverative performance is hereby prevented.
       || Item and order working memories. (1) Content-addressable item codes (2) Temporal order stored as relative sizes of item activities (3) Competition between working memory cells: Competition balances the positive feedback that enables the cells to remain active. Without it, cell activities may all saturate at their maximal values-> Noise saturation dilemma again! (4) Read-out by nonspecific rehearsal wave- Largest activity is the first out (5) STM reset self-inhibition prevents perseveration: [input/self-excitatory, rehearsal wave]-> [output, self-inhibition]
     1028. image p440fig12.37 Simulation of a primacy gradient for a short list (left image) being transformed into a bowed gradient for a longer list (right image). Activities of cells that store the longer list are smaller due to the Normalization Rule, which follows from the shunting inhibition in the working memory network.
      || Primacy bow as more items stored. [activities, final y] (Left) Primacy gradient 6 items (Right) Bowed gradient 20 items
     1029. image p441fig12.38 The LTM Invariance Principle is realized if the relative sizes of the inputs to the list chunk level stay the same as more items are stored in working memory. This property, in turn, follows from shunting previously stored working memory activities when a new item occurs.
       || LTM Invariance principle. Choose STM activities so that newly stored STM activities may alter the size of old STM activities without recoding their LTM patterns. In particular: New events do not change the relative activities of past event sequences, but may reduce their absolute activities. Why? Bottom-up adaptive filtering uses dot products: T(j) = sum[i=1 to n] x(i)*z(i,j) = total input to v(j). The relative sizes of inputs to coding nodes v(j) are preserved. x(i) -> w*x(i), 0 < w <= 1, leaves all past ratios T(j)/T(k) unchanged.
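      Spelling out the dot-product step in LaTeX (this only restates the caption's argument):
        T_j = \sum_{i=1}^{n} x_i \, z_{ij}, \qquad x_i \to w\, x_i \ (0 < w \le 1)
        \;\Rightarrow\; T_j \to w\, T_j \;\Rightarrow\; \frac{T_j}{T_k} \to \frac{w\, T_j}{w\, T_k} = \frac{T_j}{T_k} .
      Rescaling every stored activity by the same shunt w therefore rescales every list-chunk input by w, leaving all of their relative sizes, and hence the learned LTM patterns, unchanged.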
     1030. image p442fig12.39 (left column, top row) How a shunt plus normalization can lead to a bow in the stored working memory spatial pattern. Time increases in each row as every item is stored with activity 1 before it is shunted by w due to each successive item's storage, and the total working memory activity in each row is normalized to a total activity of 1. (right column, top row) When the working memory stored pattern is shunted sufficiently strongly (w > 1/2), then the pattern bows at position 2 in the list as more items are stored through time. (left column, bottom row) LTM invariance can be generalized to consider arbitrary amounts of attention u being paid when the i_th item is stored with an arbitrary amount of shunting w(j) to the j_th item. (right column, bottom row) The Normalization Rule can also be generalized to approach the maximum possible normalized total activity that is stored across all the working memory cells at different rates.
      || Shunt normalization -> STM bow. (topLeft) Algebraic working memory (Grossberg 1978) (topRight) Strong inhibition of new inputs by stored STM items. Bow at position 2. Can we classify all working memory codes of this type? Yes! (bottomLeft) 1. LTM invariance principle (bottomRight) 2. Normalization Rule (Kahneman, Beatty 1966)
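      One way to realize the top-row algebra of Figure 12.39 is sketched below, under the reading that each arriving item can only capture the activity 1 - w that the shunted older items give up, so the stored total stays normalized at 1; this reproduces the caption's claim that sufficiently strong inhibition of new inputs (w > 1/2) produces a bow whose trough sits at position 2. The update rule is an assumption chosen to match that description, not a quotation of the 1978 equations.
        # Sketch of shunt + normalization storage (under the reading described above).
        def store(n_items, w):
            """Stored STM pattern after presenting n_items; oldest item is first in the list."""
            x = []
            for _ in range(n_items):
                x = [w * xi for xi in x]   # shunt every previously stored activity by w
                x.append(1.0 - sum(x))     # the new item takes whatever restores the total to 1
            return x

        print([round(v, 3) for v in store(4, w=0.6)])  # [0.216, 0.144, 0.24, 0.4]: bow, trough at position 2
        print([round(v, 3) for v in store(4, w=0.4)])  # [0.064, 0.096, 0.24, 0.6]: pure recency gradient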
    1031. image p442fig12.40 Given the hypothesis in Figure 12.39 (right column, bottom row) and a generalized concept of steady, albeit possibly decreasing, attention to each item as it is stored in working memory, only a primacy, or bowed gradient of activity across the working memory items can be stored.
      || LTM Invariance + Normalization. (... given conditions ...) Then the x(i) can ONLY form: [primacy gradient, recency gradient, unimodal bow]
     1032. image p443fig12.41 Neurophysiological data from the Averbeck etal sequential copying experiments show the predicted primacy gradient in working memory and the self-inhibition of activity as an item is performed. When only the last item remains stored, it has the highest activity because it has been freed from inhibition by earlier items.
      || Neurophysiology of sequential copying
    1033. image p444fig12.42 The LIST PARSE laminar cortical model of working memory and list chunking that I published with Lance Pearson in 2008 simulated the Averbeck etal data in Figure 12.41, as in the left column of the figure. It also simulated cognitive data about working memory storage by human subjects. See the text for details.
       || LIST PARSE: Laminar cortical model of working memory and list chunking (Grossberg, Pearson 2008). Simulates data about: [immediate, delayed, continuous] distractor free recall; immediate serial recall; and variable-speed sequential performance of motor acts. [velocity, acceleration] vs time (ms) from recall cue.
     1034. image p445fig12.43 The LIST PARSE laminar cortical Cognitive Working Memory circuit, which is proposed to occur in ventrolateral prefrontal cortex, is homologous to the LAMINART circuit that models aspects of how visual cortex sees. The Motor Working Memory, VITE Trajectory Generator, and Variable-Rate Volitional Control circuits model how other brain regions, including dorsolateral prefrontal cortex, motor cortex, cerebellum, and basal ganglia, interact with the Cognitive Working Memory to control working memory storage and variable-rate performance of item sequences.
      || List parse circuit diagram. Connectivity convention. sequence chunks [<- BU filter, TD expectation ->] working memory. Working memory and sequence chunking circuit is homologous to visual LAMINART circuit!
    1035. image p446fig12.44 (left column, top row) LIST PARSE can model linguistic data from human subjects. In this figure, model parameters are fixed to enable a close fit to data about error-type distributions in immediate free recall experiments, notably transposition errors. (right column, top row) Simulation and data showing bowing of the serial position curve, including an extended primacy gradient. (left column, bottom row) The simulation curve overlays data about list length effects, notably the increasing recall difficulty of longer lists during immediate serial recall (ISR). (right column, bottom row) Simulation (bottom image) and data (top image) of the limited temporal extent for recall.
       || (1. TL) Error-type distributions in immediate serial recall (Hanson etal 1996). % occurrence vs serial position. Graph convention: Data- dashed lines; Simulations- solid lines. Six letter visual ISR. Order errors- transpositions of neighboring items are the most common. Model explanation: Noisy activation levels change relative order in primacy gradient. Similar activation of neighboring items most susceptible to noise. Model parameters fitted on these data. (2. TR) Bowing of serial position curve (Cowan etal 1999). % correct vs serial position. Auditory ISR with various list lengths (graphs shifted rightward): For [, sub-]span lists- extended primacy, with one (or two) item recency; Auditory presentation- enhanced performance for last items. LIST PARSE: End effects- first and last items have half as many members; Echoic memory- last presented item retained in separate store. (3. BL) List length effects, circles (Crannell, Parrish 1968), squares (Baddeley, Hitch 1975), solid line- simulation. % list correct vs list length. Variable list length ISR: longer lists are more difficult to recall. LIST PARSE: More items- closer activation levels and lower absolute activity level with enough inputs; Noise is more likely to produce order errors; Activity levels more likely to drop below threshold. (4. BR) Limited temporal extent for recall (Murdock 1961). % recalled vs retention interval (s). ISR task with distractor-filled retention intervals (to prevent rehearsal): Increasing retention interval- decreases probability of recalling list correctly; Load dependence- longer lists more affected by delays; Performance plateau- subjects reach apparent asymptote. LIST PARSE: Increased convergence of activities with time; loss of order information.
    1036. image p447fig12.45 (left column) LIST PARSE simulations of the proportion of order errors as a function of serial position for 6 item lists with (a) an extended pause of 7 time units between the third and fourth items, and (b) pauses of 5 time units (solid curve) and 10 time units (dashed curve) between all items. (right column) Simulations (solid curves) and data (dashed curves) illustrating close model fits in various immediate free recall tasks.
       || (Left) Temporal grouping and presentation variability. Temporal grouping: Inserting an extended pause leads to inter-group bowing; Significantly different times of integration and activity levels across pause, fewer interchanges. (Right) Immediate free recall, and [delayed, continuous] distractor-free recall. Overt rehearsal IFR task with super-span (ie 20 item) lists: Extended recency- even more extended with shorter ISIs; Increased probability of recall with diminished time from last rehearsal; Early items in list rehearsed most. LIST PARSE (unique) for long lists: Incoming items form a recency gradient; Rehearsal (re-presentation) based upon level of activity.
    1037. image p448fig12.46 A Masking Field working memory is a multiple-scale self-similar recurrent shunting on-center off-surround network. It can learn list chunks that respond selectively to lists of item chunks of variable length that are stored in an item working memory at the previous processing stage. Chunks that code for longer lists (eg MY vs MYSELF) are larger, and give rise to stronger recurrent inhibitory neurons (red arrows).
       || How to code variable length lists? MASKING FIELDS code list chunks of variable length (Cohen, Grossberg 1986, 1987; Grossberg, Kazerounian 2011, 2016; Grossberg, Myers 2000; Grossberg, Pearson 2008). Multiple-scale self-similar WM: Masking field, adaptive filter. Variable length coding- Masking fields select list chunks that are sensitive to WM sequences of variable length; Selectivity- Larger cells selectively code longer lists; Asymmetric competition- Larger cells can inhibit smaller cells more than conversely: Magic Number 7! Temporal order- different list chunks respond to the same items in different orders eg LEFT vs FELT.
    1038. image p449fig12.47 This figure illustrates the self-similarity in a Masking Field of both its recurrent inhibitory connections (red arrows) and its top-down excitatory priming signals (green arrows) to the item chunk working memory.
      || Both recurrent inhibition and top-down excitatory priming are self-similar in a masking field. MYSELF <-> [MY, MYSELF]
    1039. image p452fig12.48 (left column) In experiments of (Repp etal 1978), the silence duration between the words GRAY and SHIP was varied, as was the duration of the fricative noise in S, with surprising results. (right column) The red arrow directs our attention to surprising perceptual changes as silence and noise durations increase. See the text for details.
      || Perceptual integration of acoustic cues, data (Repp etal 1978). GRAY-> silence duration-> SHIP (noise duration from start of word). Noise duration vs silence duration: GRAY SHIP <-> [GREAT SHIP <-> GRAY CHIP] <-> GREAT CHIP.
     1040. image p453fig12.49 The ARTWORD model that I published in 2000 with my PhD student Christopher Myers simulates data such as the (Repp etal 1978) data in Figure 12.48. See the text for details.
      || ARTWORD model (Grossberg, Myers 2000). Input phonetic features-> Phonemic item working memory-> Masking Field unitized lists-> Automatic gain control-> Phonemic item working memory. [habituative gate, adaptive filter]s.
    1041. image p453fig12.50 The ARTWORD perception cycle shows how sequences of items activate possible list chunks, which compete among each other and begin to send their top-down expectations back to the item working memory. An item-list resonance develops through time as a result.
      || ARTWORD perception cycle. (a) bottom-up activation (b) list chunk competition (c) item-list resonance (d) chunk reset due to habituative collapse.
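      Step (d) of the cycle, chunk reset due to habituative collapse, can be sketched with a habituative transmitter gate of the familiar accumulate/deplete form dz/dt = eps(1 - z) - lambda*S*z, whose gated output S*z is strong at signal onset and collapses as the transmitter depletes; the particular form and rate constants below are illustrative assumptions, not fitted ARTWORD values.
        # Sketch of a habituative transmitter gate driving "chunk reset" (step d above).
        def habituative_gate(signal, eps=0.01, lam=0.5, dt=0.01, steps=3000):
            """Return the gated output signal*z over time for a constant input signal."""
            z, gated = 1.0, []
            for _ in range(steps):
                z += dt * (eps * (1.0 - z) - lam * signal * z)   # accumulate vs. deplete
                gated.append(signal * z)
            return gated

        out = habituative_gate(signal=1.0)
        print(round(out[0], 3), round(out[-1], 3))   # strong at onset, collapsed later
      Because a sustained item-list resonance depletes its own gates in this way, a delayed competing chunk can take over the resonance, which is the resonant transfer discussed next.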
     1042. image p454fig12.51 (left column) Even as a resonance with the list chunk GRAY begins to develop, if the delay between "gray" and "chip" is increased, greater habituation of this resonance may allow the GREAT chunk to begin to win, thereby smoothly transferring the item-list resonance from GRAY to GREAT through time. (right column) Simulation of a resonant transfer from GRAY to GREAT, and back again as the silence interval between the words "gray" and "chip" increases. The red region between the GRAY and GREAT curves calls attention to when GREAT wins. See the text for details.
      || Resonant transfer, as silence interval increases. (left) Delay GRAY resonance weakens. A delayed additional item can facilitate perception of a longer list. (right) GRAY-> GREAT-> GRAY.
    1043. image p455fig12.52 Simulation of cARTWORD dynamics in response to the complete list /1/-/2/-/3/. The relevant responses are surrounded by a red box.
      || Presentation of a normal sequence: input /1/-/2/-/3/. |c(i,1)-5| vs time (msec). List chunks select most predictive code. Order stored in WM layers. Resonant activity of /1/-/2/-/3/ in item and feature layers corresponds to conscious speech percept.
    1044. image p456fig12.53 Simulation of cARTWORD dynamics in response to the partial list /1/-silence-/3/ with /2/ replaced by silence. Only the representations of these items can be seen in the red box.
      || Presentation with silence duration: input /1/-silence-/3/. |c(i,1)-5| vs time (msec). List chunks select most predictive code. Order stored in WM layers. Gap in resonant activity of /1/-silence-/3/ in item and feature layers corresponds to perceived silence.
    1045. image p456fig12.54 Item /2/ is restored in the correct list position in response to the list /1/-noise-/3/.
      || Presentation with noise: input /1/-noise-/3/. |c(i,1)-5| vs time (msec). List chunks select the most predictive code. Order restored in WM layers. Resonant activity of /1/-/2/-/3/ in item and feature layers corresponds to restoration of item /2/ replaced by noise in input.
    1046. image p457fig12.55 Item /4/ is restored in the correct list position in response to the list /1/-noise-/5/. This and the previous figure show how future context can disambiguate past noisy sequences that are otherwise identical.
       || Presentation with noise: input /1/-noise-/5/. |c(i,1)-5| vs time (msec). List chunks select the most predictive code. Order restored in WM layers. Resonant activity of /1/-/4/-/5/ in item and feature layers corresponds to restoration of item /4/ replaced by noise in input.
     1047. image p459fig12.56 (Grossberg, Pearson 2008) proposed that the ability of working memories to store repeated items in a sequence represents rank information about the position of an item in a list using numerical hypercolumns in the prefrontal cortex (circles with numbered sectors: 1,2,3,4). These numerical hypercolumns are conjointly activated by inputs from item categories and from the analog spatial representation of numerosity in the parietal cortex. These parietal representations (overlapping Gaussian activity profiles that obey a Weber Law) had earlier been modeled by (Grossberg, Repin 2003). See the text for details.
       || Item-order-rank working memory, rank information from parietal numerosity circuit (Grossberg, Pearson 2008; Grossberg, Repin 2003). [Sensory working memory-> adaptive filter-> list chunk-> attentive prime-> Motor working memory]-> [large, small] numbers-> transfer functions with variable thresholds and slopes-> uniform input-> integrator amplitude-> number of transient sensory signals.
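      A toy sketch of the item-order-rank idea: if each stored entry is an (item, rank) conjunction, with the rank index standing in for the parietal numerosity signal, then repeated items keep distinct codes and can be recalled in their proper list positions. The data structure and primacy gradient below are illustrative, reusing the competitive-queuing sketch given earlier.
        # Toy item-order-rank working memory: (item, rank) conjunctions allow repeats.
        def store_with_rank(items, decay=0.8):
            """Primacy gradient over (item, rank) cells; rank plays the numerosity role."""
            return {(item, rank): decay ** rank for rank, item in enumerate(items)}

        def recall(wm):
            out, wm = [], dict(wm)
            while wm:
                item, rank = max(wm, key=wm.get)   # most active (item, rank) cell is performed
                out.append(item)
                del wm[(item, rank)]               # self-inhibition after performance
            return out

        print(recall(store_with_rank(["A", "B", "A", "C"])))   # ['A', 'B', 'A', 'C'], repeat preserved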
     1048. image p460fig12.57 The lisTELOS architecture explains and simulates how sequences of saccadic eye movement commands can be stored in a spatial working memory and recalled. Multiple brain regions are needed to coordinate these processes, notably three different basal ganglia loops to regulate saccade storage, choice, and performance, and the supplementary eye fields (SEF) to choose the next saccadic command from a stored sequence. Because all working memories use a similar network design, this model can be used as a prototype for storing and recalling many other kinds of cognitive, spatial, and motor information. See the text for details.
       || lisTELOS model- Spatial working memory (Silver, Grossberg, Bullock, Histed, Miller 2011). Simulates how [PPC, PFC, SEF, FEF, SC] interact with 3 BG loops to learn and perform sequences of saccadic eye movements.
    1049. image p461fig12.58 The lisTELOS model built upon key processes that were earlier modeled by the TELOS model. See the text for details.
       || TELOS model (Brown, Bullock, Grossberg 1999, 2004). shows [BG nigro-[thalamic, collicular], FEF, ITa, PFC, PNR-THAL, PPC, SEF, SC, V1, V4/ITp, Visual Cortex input] and [GABA].
     1050. image p462fig12.59 The TELOS model clarifies how reactive vs. planned eye movements may be properly balanced against one another, notably how a fast reactive movement is prevented from occurring in response to onset of a cue that requires a different, and more contextually appropriate, response, even if the latter response takes longer to be chosen and performed. The circuit explains how "the brain knows it before it knows" what this latter response should be by changing the balance of excitation to inhibition in the basal ganglia (BG) so that the reactive gate stays shut until the correct target position can be chosen by a frontal-parietal resonance.
       || Balancing reactive vs. planned movements (Brown, Bullock, Grossberg 2004). (a) shows [FEF, PPC]-> [BG, SC], and BG-> SC. (b) FTE vs time (msec) for [fixation, saccade, overlap, gap, delayed saccade] tasks.
    1051. image p463fig12.60 Rank-related activity in prefrontal cortex and supplementary eye fields from two different experiments. See the text for details.
       || Rank-related activity in PFC and SEF. Prefrontal cortex (Averbeck etal 2003) [square, inverted triangle]. Supplementary eye field (Isoda, Tanji 2002).
     1052. image p464fig12.61 (left column) A microstimulating electrode causes a spatial gradient of habituation. (right column) The spatial gradient of habituation that is caused by microstimulation alters the order of saccadic performance of a stored sequence, but not which saccades are performed, using interactions between the prefrontal cortex (PFC) working memory and the supplementary eye field (SEF) saccadic choice.
      || (left) Microstimulation causes habituation (Grossberg 1968). Stimulation caused habituation. Cells close to the stimulation site habituate most strongly. (right) Stimulation biases selection PFC-> SEF-> SEF. PFC Activity gradient in working memory, SEF Microstimulation causes habituation, During selection habituated nodes are less likely to win this competition.
     1053. image p464fig12.62 The most habituated positions have their neuronal activities most reduced, other things being equal, as illustrated by the gradient from deep habituation (red) to less habituation (pink). The saccadic performance orders (black arrows) consequently tend to end in the most habituated positions that have been stored.
      || The most habituated position is foveated last. For each pair of cues, the cue closest to the stimulation site is most habituated -- and least likely to be selected. Because stimulation spreads in all directions, saccade trajectories tend to converge.
     1054. image p465fig12.63 Neurophysiological data (left image) and lisTELOS simulation (right image) showing how microstimulation biases saccadic performance order but not the positions to which the saccades will be directed. See the text for details.
      || Saccade trajectories converge to a single location in space. Microstimulation biased selection so saccade trajectories converged toward a single location in space. [Data, model] contra <-> Ipsi (msec)
    1055. image p467fig12.64 Some of the auditory cortical regions that respond to sustained or transient sounds. See text for details.
      || Some auditory cortical regions. Core <-> belt <-> parabelt. [Belt, Core, ls, PAi, Parabelt, PGa, TAs, TE, TP, TPO, st s].
    1056. image p468fig12.65 Linguistic properties of the PHONET model and some of the data that it simulates. The upper left image summarizes the asymmetric transient-to-sustained gain control that helps to create invariant intraword ratios during variable-rate speech. The lower left image summarizes the rate-dependent gain control of the ARTPHONE model that creates rate-invariant working memory representations in response to sequences of variable-rate speech. The right image summarizes the kind of paradoxical VC-CV category boundary data of (Repp 1980) that ARTPHONE simulates. See the text for details.
      || (left upper) [transient, sustained] [working memory, filter, category]. (left lower) phone inputs-> [input rate estimate, features], Features w <- habituative transmitter gates -> categories-> rate invariant phonetic output, input rate estimate-> gain control-> [features, categories] rate-dependent integration of categories and features. (right) % 2-stop vs VC-CV silent interval (msec): [ib-ga, ib-ba, iga, iba].
    1057. image p469fig12.66 (left column) A schematic of how preserving relative duration, as in the first and third images, of consonant and vowel pairs can preserve a percept, in this case of /ba/, but not doing so, as in the first and second images, can cause a change in percept, as from /ba/ to /wa/, as in the data of (Miller, Liberman 1979) that PHONET simulates. (right column) Changing frequency extent can also cause a /ba/ - /wa/ transition, as shown in data of (Schwab, Sawusch, Nusbaum 1981) that PHONET also simulates.
       || (left image) Maintaining relative duration as speech speeds up preserves percept (Miller, Liberman 1979). frequency vs time- [/ba/, /wa/, /ba/] (right image) Changing frequency extent causes /ba/-/wa/ transition (Schwab, Sawusch, Nusbaum 1981). frequency vs time- [/ba/, /wa/] Dt extent.
     1058. image p469fig12.67 PHONET contains transient and sustained cells that respond to different kinds of sounds, notably the transients of certain consonants and the sustained sounds of certain vowels. It then uses the transient working memory to gain-control the integration rate of the sustained working memory to which these different detectors input.
      || Phonetic model summary. (left) Acoustic tokens [consonant, vowel]. (middle) Acoustic detectors [transient (sensitive to rate), Sustained (sensitive to duration)]. (right) Working memory, Spatially stored transient pattern (extent) + gain control-> spatially stored sustained pattern.
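      A toy sketch of the rate-dependent gain control just summarized: a transient (rate) estimate multiplies the integration rate of the sustained (vowel) channel, so the value stored from a vowel that is half as long but spoken twice as fast comes out nearly the same, keeping consonant/vowel ratios roughly rate-invariant. The constants and the constant-input "vowel" are illustrative assumptions, not PHONET's equations.
        # Toy sketch of PHONET-style rate-dependent gain control of sustained integration.
        def integrate_sustained(vowel_duration_ms, rate_gain, dt=1.0):
            """Shunting integration of a constant vowel input, gated by the rate estimate."""
            x = 0.0
            for _ in range(int(vowel_duration_ms / dt)):
                x += dt * rate_gain * 0.01 * (1.0 - x)   # faster speech -> faster integration
            return x

        slow_speech = integrate_sustained(vowel_duration_ms=200, rate_gain=1.0)
        fast_speech = integrate_sustained(vowel_duration_ms=100, rate_gain=2.0)
        print(round(slow_speech, 3), round(fast_speech, 3))   # nearly equal stored values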
     1059. image p471fig12.68 A mismatch reset of /b/ in response to the /g/ in [ib]-[ga] can rapidly shut off the [ib] percept, leading to the percept of [ga] after an interval of silence. In contrast, resonant fusion of the two occurrences of /b/ in [ib]-[ba] can cause a continuous percept of sound [iba] to occur during times at which silence is heard in response to [ib]-[ga].
      || Mismatch vs resonant fusion
     1060. image p473fig12.69 Error rate and mean reaction time (RT) data from the lexical decision experiments of (Schvaneveldt, McDonald 1981). ART Matching Rule properties explain these data in (Grossberg, Stone 1986).
      || (left) Error rate vs type of prime [R, N, U], [non,] word. (right) Mean RT (msec) vs type of prime [R, N, U], [non,] word.
    1061. image p474fig12.70 The kind of model macrocircuit that was used in (Grossberg, Stone 1986) to explain lexical decision task data.
       || inputs-> A1 <-> A2 iconic sensory features <-> A3 item and order in sensory STM <-> A4 list parsing in STM (masking field) <-> A5 semantic network (self-feedback). [A4, A5] <-> V* visual object recognition system. M1-> [outputs, A1]. M1 <-> M2 iconic motor features <-> M3 item and order in motor STM. A2-> M2. A3-> M3.
    1062. image p476fig12.71 Word frequency data of (Underwood, Freund 1970) that were explained in (Grossberg, Stone 1986).
      || percent errors vs frequency of old words [L-H to H-H, L-L to H-L].
    1063. image p481fig13.01 Macrocircuit of the functional stages and anatomical interpretations of the Cognitive-Emotional-Motor, or CogEM, model.
      || Drive-> hypothalamus value categories <-> amygdala incentive motivational learning-> Orbitofrontal cortex- object-value categories <-> sensory cortex- invariant object categories- conditioned reinforcer learning-> amygdala-> hypothalamus.
    1064. image p483fig13.02 The object-value categories in the orbitofrontal cortex require converging specific inputs from the sensory cortex and nonspecific incentive motivational inputs from the amygdala in order to fire. When the orbitofrontal cortex fires, it can deliver top-down ART Matching Rule priming signals to the sensory cortical area by which it was activated, thereby helping to choose the active recognition categories there that have the most emotional support, while suppressing others, leading to attentional blocking of irrelevant cues.
      || Cognitive-Emotional-Motor (CogEM) model. Drive-> amygdala incentive motivational learning-> orbitofrontal cortex- need converging cue and incentive inputs to fire <-> sensory cortex- conditioned reinforcer learning-> amygdala. CS-> sensory cortex. Motivated attention closes the cognitive-emotional feedback loop, focuses on relevant cues, and causes blocking of irrelevant cues.
    1065. image p483fig13.03 The predicted processing stages of CogEM have been supported by anatomical studies of connections between sensory cortices, amygdala, and orbitofrontal cortex.
      || Adapted from (Barbas 1995). sensory cortices = [visual, somatosensory, auditory, gustatory, olfactory]. sensory cortices-> amygdala-> orbital prefrontal cortex. sensory cortices-> orbital prefrontal cortex. [visual cortex, amygdala]-> lateral prefrontal cortex.
    1066. image p484fig13.04 The top-down feedback from the orbitofrontal cortex closes a feedback loop that supports a cognitive-emotional resonance. If this resonance can be sustained long enough, it enables us to have feelings at the same time that we experience the categories that caused them.
      || Cognitive-Emotional resonance. Basis of "core consciousness" and "the feeling of what happens". (Damasio 1999) derives heuristic version of CogEM model from his clinical data. Drive-> amygdala-> prefrontal cortex-> sensory cortex, resonance around the latter 3. How is this resonance maintained long enough to become conscious?
    1067. image p484fig13.05 Classical conditioning is perhaps the simplest kind of associative learning.
      || Classical conditioning (nonstationary prediction). Bell (CS)-> (CR), Shock (US)-> Fear (UR), associative learning.
    1068. image p485fig13.06 (left column) An inverted-U occurs in conditioned reinforcer strength as a function of the ISI between the CS and the US. Why is learning attenuated at 0 ISI? (right column) Some classical conditioning data that illustrate the inverted-U in conditioning as a function of the ISI.
      || InterStimulus Interval (ISI) effect. Data from (Smith etal 1969; Schneiderman, Gormezano 1964).
    1069. image p485fig13.07 The paradigm of secondary conditioning. See the text for details.
      || Secondary conditioning (Advertising!). [CS1, CS2] become conditioned reinforcers.
    1070. image p486fig13.08 The blocking paradigm illustrates how cues that do not predict different consequences may fail to be attended.
      || Blocking- minimal adaptive prediction. Phase [I, II] - CS2 is irrelevant.
    1071. image p486fig13.09 Equally salient cues can be conditioned in parallel to an emotional consequence.
      || Parallel processing of equally salient cues vs overshadowing (Pavlov).
    1072. image p486fig13.10 Blocking follows if both secondary conditioning and attenuation of conditioning at a zero ISI occur.
      || Blocking = ISI + secondary conditioning.
    1073. image p487fig13.11 The three main properties of CogEM that help to explain how attentional blocking occurs.
      || CogEM explanation of attentional blocking. Internal drive input <-> Conditioned reinforcer learning (self-recurrent) <-> Competition for STM <- Motor learning. 1. Sensory representations compete for limited capacity STM. 2. Previously reinforced cues amplify their STM via positive feedback. 3. Other cues lose STM via competition.
    1074. image p488fig13.12 (left column) How incentive motivational feedback amplifies activity of a sensory cortical cell population. (right column) A sensory cortical cell population whose activity is amplified by incentive motivational feedback can suppress the activities of less activated populations via self-normalizing recurrent competitive interactions.
      || Motivational feedback and blocking. (left) sensory input CS, STM activity without motivational feedback, STM activity with motivational feedback. (right) STM suppressed by competition, STM amplified by (+) feedback.
    1075. image p489fig13.13 (top row) If a positive ISI separates onset of a CS and US, then the CS can sample the consequences of the US during the time interval before it is inhibited by it. (bottom row) A CogEM simulation of the inverted-U in conditioning as a function of the ISI between CS and US.
      || Positive ISI and conditioning.
    1076. image p490fig13.14 In order for conditioning to work properly, the sensory representation needs to have at least two successive processing stages. See the text for why.
      || Model of Cognitive-Emotional circuit. Drive-> Drive representation-> ??? <-> Sensory STM <-CS
    1077. image p490fig13.15 The CogEM circuit is an ancient design that is found even in mollusks like Aplysia. See the text for details.
      || Aplysia (Buonomano, Baxter, Byrne, Neural Networks 1990; Grossberg, Behavioral and Brain Sciences 1983). Facilitator neuron ~ drive representation.
    1078. image p492fig13.16 (left column) In order to satisfy all four postulates, there needs to be UCS-activated arousal of polyvalent CS-activated sampling neuron. (right column) The arousal needs to be nonspecific in order to activate any of the CSs that could be paired with the UCS.
      || Polyvalent CS sampling and US-activated nonspecific arousal.
    1079. image p493fig13.17 (top row) Overcoming the ostensible contradiction that seems to occur when attempting to simultaneously realize hypotheses (3) and (4). (bottom row) The problem is overcome by assuming the existence of US-activated drive representation to which CSs can be associated, and that activate nonspecific incentive motivational feedback to sensory representations.
      || Learning nonspecific arousal and CR read-out. (top) Learning to control nonspecific arousal, Learning to read-out the CR (bottom) Drive representation, Incentive motivation.
    1080. image p494fig13.18 Realizing the above constraints favors one particular circuit. Circuits (a) and (b) are impossible. Circuit (d) allows previously occurring sensory cues to be stored in STM. Circuit (e) in addition enables a CS to be stored in STM without initiating conditioning in the absence of a US.
      || Learning to control nonspecific arousal and read-out of the CR: two stages of CS. (d) & (e) polyvalent cells.
    1081. image p494fig13.19 (left column, top row) Secondary conditioning of both arousal and a specific response are now possible. (bottom row) The CogEM circuit may be naturally extended to include multiple drive representations and inputs. (right column, top row) The incentive motivational pathway is also conditionable in order to enable motivational sets to be learned.
      || Secondary conditioning. Homology: conditionable incentive motivation. Multiple drive representations and inputs.
    1082. image p496fig13.20 (top image) A single avalanche sampling cell can learn an arbitrary space-time pattern by sampling it as a temporally ordered series of spatial patterns using a series of outstars. Once an avalanche's sampling cell starts to fire, there is no way to stop it from performing the entire space-time pattern, no matter how dire the consequences. (bottom image) If nonspecific arousal and a specific cue input are both needed to fire the next cell in an avalanche, then environmental feedback can shut off avalanche performance at any time, and volition can speed up or slow down performance.
      || Space-time pattern learning: avalanche. (top image) CS sampling signal-> serially activated outstars-> US spacetime input pattern. Sample a space-time pattern as a sequence of spatial patterns. (bottom image) Nonspecific arousal as a command cell. Polyvalent cell: nonspecific arousal as a STOP and a GO signal.
    1083. image p497fig13.21 (left column) An early embodiment of nonspecific arousal was a command cell in such primitive animals as crayfish. (right column) The songbird pattern generator is also an avalanche. This kind of circuit raises the question of how the connections self-organize through developmental learning.
      || Nonspecific arousal as a command cell. Crayfish swimmerets (Stein 1971). Songbird pattern generator (Fee etal 2002)+. Motor-> RA-> HVC(RA).
    1084. image p498fig13.22 (left column, top row) Adaptive filtering and conditioned arousal are both needed to regulate what cues can learn to activate particular space-time patterns. These developments lead inexorably to basic cognitive abilities, as embodied in the 3D LAMINART models for 3D vision and figure-ground perception (Chapter 11) and the 3D ARTSCAN SEARCH model for invariant object learning, recognition, and 3D search (Chapter 6). (right column, top row) Conditioned arousal enables only emotionally important cues to activate a motivationally relevant space-time pattern. (bottom row) Conditioned arousal and drive representations arise naturally from the unlumping of avalanche circuits to make them selective to motivationally important cues. The MOTIVATOR model is a natural outcome of this unlumping process (this chapter).
      || (top) Adaptive filtering and Conditioned arousal. Towards Cognition: need to filter inputs to the command cell. Towards Emotion: important signals turn arousal ON and OFF. (bottom) Conditioned arousal and Drive representations. Competition between conditioned arousal sources at drive representations, eg amygdala.
    1085. image p499fig13.23 (left column) Self-organization in avalanches includes adaptive filtering by instars, serial learning of temporal order, and learned read-out of spatial patterns by outstars. (right column) Serial learning of temporal order occurs in recurrent associative networks.
      || (left) Self-organizing avalanches [instars, serial learning, outstars]. (right) Serial list learning.
    1086. image p500fig13.24 Both primary excitatory and inhibitory conditioning can occur using opponent processes and their antagonistic rebounds.
      || Opponent processing. Cognitive drive associations. Primary associations: excitatory [CS, US, Fear], inhibitory [CS, US, Fear, Relief rebound].
    1087. image p501fig13.25 When an unbiased transducer is embodied by a finite rate physical process, mass action by a chemical transmitter is the result.
      || Unbiased transducer (Grossberg 1968). S = input, T = output, T = S*B, where B is the gain. Suppose T is due to release of chemical transmitter y at a synapse: release rate T = S*y (mass action); Accumulation y ~= B.
    1088. image p501fig13.26 A simple differential equation describes the processes of transmitter accumulation and release that do their best, at a finite rate, to carry out unbiased transduction.
      || Transmitter accumulation and release. Transmitter y cannot be restored at an infinite rate: T = S*y, y ~= B. Differential equation: d[dt: y] = A*(B - y) - S*y = accumulate - release. Transmitter y tries to recover to ensure unbiased transduction. What if it falls behind? Evolution has exploited the good properties that happen then.
    1089. image p502fig13.27 Despite the fact that less transmitter y is available after persistent activation by a larger input signal S, the gated output signal S*y is larger due to the mass action gating of S by y.
      || Minor mathematical miracle. At equilibrium: 0 = d[dt: y] = A*(B - y) - S*y. Transmitter y decreases when input S increases: y = A*B/(A + S). However, output S*y increases with S!: S*y = S*A*B/(A + S) (gate, mass action).
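      A quick numerical check of the equilibrium relations quoted above (the parameter values are illustrative, not taken from the book): the transmitter y = A*B/(A+S) falls as the input S grows, yet the gated output S*y = S*A*B/(A+S) keeps rising toward the asymptote A*B.
```python
# Numerical check (illustrative A, B, S values): equilibrium transmitter y = A*B/(A+S)
# decreases with the input S, yet the mass-action gated output S*y increases toward A*B.
A, B = 1.0, 2.0
for S in [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]:
    y = A * B / (A + S)          # equilibrium transmitter level
    T = S * y                    # gated output S*y = S*A*B/(A+S)
    print(f"S={S:4.1f}  y={y:5.3f}  S*y={T:5.3f}")
```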
    1090. image p502fig13.28 Fast increments and decrements in an input S lead to slow habituation of the habituative gate, or medium-term memory, transmitter y. The output T is a product of these fast and slow variables, and consequently exhibits overshoots, habituation, and undershoots in its response.
      || Habituative transmitter gate: Input; Habituative gate d[dt: y] = A*(B - y) - S*y; Output [overshoot, habituation, undershoot]s Weber Law.
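      A minimal Euler-integration sketch of the habituative gate above, under assumed illustrative parameters: a step input S produces an overshoot in the gated output T = S*y, slow habituation to a lower plateau, and an undershoot when the input steps back down.
```python
import numpy as np

# Euler integration of d[dt: y] = A*(B - y) - S*y with a step input S (illustrative parameters).
# The gated output T = S*y shows the overshoot, habituation, and undershoot of this figure.
A, B, dt, steps = 0.5, 1.0, 0.01, 3000
t = np.arange(steps) * dt
S = np.where((t > 5) & (t < 20), 4.0, 1.0)   # input steps up at t=5 and back down at t=20
y = np.empty(steps)
y[0] = A * B / (A + S[0])                    # start at equilibrium for the baseline input
for k in range(1, steps):
    y[k] = y[k-1] + dt * (A * (B - y[k-1]) - S[k-1] * y[k-1])
T = S * y
print("baseline   %.3f" % T[499])            # equilibrium output for the baseline input
print("overshoot  %.3f" % T[501])            # T jumps with S while y is still high
print("plateau    %.3f" % T[1999])           # lower level after y habituates
print("undershoot %.3f" % T[2001])           # dips below baseline when S steps back down
```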
    1091. image p503fig13.29 The ON response to a phasic ON input has Weber Law properties due to the divisive terms in its equilibrium response, which are due to the habituative transmitter.
      || ON-response to phasic ON-input. S1 = f(I+J): y1 = A*B/(A+S1), T1 = S1*y1 = A*B*S1/(A+S1); S2 = f(I): y2 = A*B/(A+S2), T2 = S2*y2 = A*B*S2/(A+S2);. ON = T1 - T2 = A^2*B*(f(I+J) - f(I)) / ((A+f(I))*(A+f(I+J))). Note Weber Law. When f has a threshold, small I requires larger J to fire due to numerator, but makes suprathreshold ON bigger due to denominator. When I is large, quadratic in denominator and upper bound of f make ON small.
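      A small numerical illustration of the ON-response formula above, with an assumed threshold-linear signal function and illustrative constants: for a fixed phasic input J, the ON response shrinks as the arousal background I grows, which is the Weber Law property.
```python
# Illustrative check of ON = T1 - T2 with a habituated gate in each channel.
A, B = 1.0, 1.0
f = lambda w: max(w - 0.2, 0.0)          # assumed signal function with a small threshold
def on_response(I, J):
    S1, S2 = f(I + J), f(I)              # ON channel gets arousal I plus phasic J; OFF channel gets I
    y1, y2 = A * B / (A + S1), A * B / (A + S2)
    return S1 * y1 - S2 * y2             # ON = T1 - T2
for I in [0.5, 1.0, 2.0, 4.0]:
    print(f"I={I:3.1f}  ON={on_response(I, J=1.0):5.3f}")
# For fixed J, ON shrinks as I grows: the Weber Law produced by the divisive terms A+f(I), A+f(I+J).
```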
    1092. image p504fig13.30 OFF rebound occurs when the ON-input shuts off due to the imbalance that is caused by the ON input in the habituation of the transmitters in the ON and OFF channels. The relative sizes of ON responses and OFF rebounds is determined by the arousal level I.
      || OFF-rebound due to phasic input offset. Shut off J (not I!). Then: S1 = f(I), S2 = f(I); y1 ~= A*B/(A+f(I+J)) < y2 ~= A*B/(A+f(I)); y1 and y2 are SLOW; T1 = S1*y1, T2 = S2*y2, T1 < T2;. OFF = T2 - T1 = A*B*f(I)*(f(I+J) - f(I)) / ((A+f(I))*(A+f(I+J))). Note Weber Law due to remembered previous input. Arousal sets sensitivity of rebound: OFF/ON = f(I)/A. Why is the rebound transient? Note equal f(I) inputs.
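      The same kind of sketch for the OFF rebound above, assuming a linear signal function and illustrative constants; it also checks numerically that the ratio OFF/ON equals f(I)/A, so arousal sets the sensitivity of the rebound.
```python
# Sketch of the transient OFF rebound after the phasic input J shuts off (illustrative values).
A, B, J = 1.0, 1.0, 1.0
f = lambda w: w                              # linear signal function for simplicity
for I in [0.5, 1.0, 2.0]:
    y1 = A * B / (A + f(I + J))              # ON-channel transmitter, habituated while J was on
    y2 = A * B / (A + f(I))                  # OFF-channel transmitter, habituated to arousal alone
    off = f(I) * y2 - f(I) * y1              # OFF = T2 - T1 once both channels receive only I
    on = A * A * B * (f(I + J) - f(I)) / ((A + f(I)) * (A + f(I + J)))   # steady ON response to I and J
    print(f"I={I:3.1f}  OFF={off:5.3f}  OFF/ON={off/on:5.3f}  f(I)/A={f(I)/A:5.3f}")
```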
    1093. image p504fig13.31 Behavioral contrast can occur during reinforcement learning due to decreases in either positive or negative reinforcers. See Figure 13.32 for illustrative operant conditioning data.
      || Behavioral contrast: rebounds! Shock level vs trials. 1. A sudden decrease in frequency or amount of food can act as a negative reinforcer: Frustration. 2. A sudden decrease in frequency or amount of shock can act as a positive reinforcer: Relief.
    1094. image p505fig13.32 Response suppression and the subsequent antagonist rebounds are both calibrated by the inducing shock levels.
      || Behavioral contrast (Reynolds 1968). Responses per minute (VI schedule) vs Trial shock level.
    1095. image p505fig13.33 An unexpected event can disconfirm ongoing processing by triggering a burst of nonspecific arousal that causes antagonistic rebounds in currently active gated dipoles, whether cognitive or affective.
      || Novelty reset: rebound to arousal onset. 1. Equilibrate to I and J: S1 = f(I+J); y1 = A*B/(A+S1); S2 = f(I); y2 = A*B/(A+S2);. 2. Keep phasic input J fixed; increase arousal I to I* = I + ∆I: (a) OFF reaction if T1 < T2; OFF = T2 - T1 = f(I*)*y2 - f(I*+J)*y1 = A*B*{ A*(f(I*) - f(I*+J)) + (f(I*)*f(I+J) - f(I)*f(I*+J)) } / ((A+f(I))*(A+f(I+J))). 3. How to interpret this complicated equation?
    1096. image p506fig13.34 With a linear signal function, one can prove that the rebound increases with both the previous phasic input intensity J and the unexpectedness of the disconfirming event that caused the burst of nonspecific arousal.
      || Novelty reset: rebound to arousal onset.
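      For a linear signal function f(w) = w, the rebound formula of Figure 13.33 reduces to OFF = A*B*J*(∆I - A) / ((A+I)*(A+I+J)), which is what the caption above asserts: the rebound grows with both the prior phasic input J and the size ∆I of the arousal burst, and only occurs when ∆I exceeds A. A short check under illustrative values:
```python
# Rebound to an arousal increment dI with the phasic input J held on, assuming f(w) = w.
A, B, I = 1.0, 1.0, 1.0                      # illustrative constants
def novelty_off(J, dI):
    y1 = A * B / (A + I + J)                 # ON-channel transmitter, equilibrated to I + J
    y2 = A * B / (A + I)                     # OFF-channel transmitter, equilibrated to I
    Istar = I + dI                           # arousal jumps from I to I* = I + dI
    return Istar * y2 - (Istar + J) * y1     # OFF = T2 - T1 = A*B*J*(dI - A)/((A+I)*(A+I+J))
for J in [0.5, 1.0, 2.0]:
    for dI in [0.5, 1.5, 3.0]:
        print(f"J={J:3.1f}  dI={dI:3.1f}  OFF={novelty_off(J, dI):+6.3f}")
# No rebound (OFF <= 0) while dI <= A; otherwise the rebound grows with both J and dI.
```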
    1097. image p506fig13.35 A shock, or other reinforcing event, can have multiple cognitive and emotional effects on different brain processes.
      || Multiple functional roles of shock. 1. Reinforcement sign reversal: An isolated shock is a negative reinforcer; In certain contexts, a shock can be a positive reinforcer. 2. STM-LTM interaction: Prior shock levels need to be remembered (LTM) and used to calibrate the effect of the present shock (STM). 3. Discriminative and situational cues: The present shock level is unexpected (novel) with respect to the shock levels that have previously been contingent upon experimental cues: shock as a [1.reinforcer, 2. sensory cue, 3. expectancy].
    1098. image p509fig13.36 How can life-long learning occur without passive forgetting or associative saturation?
      || Associative learning. 1. Forgetting (eg remember childhood experiences): forgetting [is NOT passive, is Selective]; 2. Selective: larger memory capacity; 3. Problem: why doesn't memory saturate?
    1099. image p510fig13.37 A disconfirmed expectation can cause an antagonistic rebound that inhibits prior incentive motivational feedback, but by itself is insufficient to prevent associative saturation.
      || Learn on-response. 1. CS-> ON, disconfirmed expectation-> antagonistic rebound, OFF-channel is conditioned 2. CS-> [ON, OFF]-> net, zero net output. What about associative saturation?
    1100. image p510fig13.38 Dissociation of the read-out of previously learned adaptive weights, or LTM traces, and of the read-in of new weight values enables back-propagating dendritic action potentials to teach the new adaptive weight values.
      || Dissociation of LTM read-out and read-in. Backpropagating dendritic action potentials as teaching signals. 1. LTM Dendritic spines (Rall 1960's)-> Teaching signal - retrograde action potential-> opponent competition. 2. Early predictions: Ca++ currents in learning (Grossberg 1968); role of dendritic spines in learning (Grossberg 1975). Cf experiments of (Hausser, Markram, Poo, Sakmann, Spruston, etc).
    1101. image p510fig13.39 Shunting competition and informational noise suppression in affective gated dipoles, plus back-propagating action potentials for teaching signals, enable the net normalized adaptive weights to be learned. They never saturate!
      || Learn net dipole output pattern. Opponent "decision" controls learning. Cf. competitive learning. Learning signal, opponent extinction.
    1102. image p512fig13.40 A conditioning paradigm that illustrates what it means for conditioned excitators to extinguish.
      || Conditioned excitor extinguishes. 1. Learning phase: CS1 bell-> US, CS1-> Fear(-). 2. Forgetting phase: CS1 bell-> Forgetting. 3. The expectation of shock is disconfirmed.
    1103. image p513fig13.41 A conditioning paradigm that illustrates what it means for conditioned inhibitors not to extinguish.
      || Conditioned inhibitor does not extinguish. 1. Learning phase: CS1 light-> shock, CS1-> Fear(-); Forgetting phase: n/a;. 2. Learning phase: CS1 light + CS2 bell-> no shock; CS2-> relief;. Forgetting phase: CS2 bell-> no forgetting. SAME CS could be used! SAME "teacher" in forgetting phase! Something else must be going on, or else causality would be violated!
    1104. image p513fig13.42 A conditioned excitor extinguishes because the expectation that was learned of a shock during the learning phase is disconfirmed during the forgetting phase.
      || Conditioned excitor extinguishes. Learning phase: CS1 bell-> US; CS1-> Fear(-); CS1-> shock; CS1 is conditioned to an expectation of shock. Forgetting phase: CS1 bell-> forgetting;. The expectation of shock is disconfirmed.
    1105. image p513fig13.43 A conditioned inhibitor does not extinguish because the expectation that was learned of no shock during the learning phase is not disconfirmed during the forgetting phase.
      || Conditioned inhibitor does not extinguish. 1. Learning phase: CS1 light-> Shock; CS1-> Fear(-);. Forgetting phase: n/a;. 2. Learning phase: CS1 light + CS2 bell-> NO shock; CS2-> relief(+); CS2-> no shock;. Forgetting phase: CS2 bell-> no forgetting;. The expectation that "no shock" follows CS2 is NOT disconfirmed!
    1106. image p514fig13.44 Analog of the CogEM model in Figure 6.1 of (Damasio 1999).
      || (a) map of object X-> map of proto-self at inaugural instant-> [, map of proto-self modified]-> assembly of second-order map. (b) map of object X enhanced-> second-order map imaged.
    1107. image p519fig14.01 Coronal sections of prefrontal cortex. Note particularly the areas 11, 13, 14, and 12o.
      ||
    1108. image p520fig14.02 Macrocircuit of the main brain regions, and connections between them, that are modelled in the unified predictive Adaptive Resonance Theory (pART) of cognitive-emotional and working memory dynamics. Abbreviations in red denote brain regions used in cognitive-emotional dynamics. Those in green denote brain regions used in working memory dynamics. Black abbreviations denote brain regions that carry out visual perception, learning and recognition of visual object categories, and motion perception, spatial representation and target tracking. Arrows denote non-excitatory synapses. Hemidiscs denote adaptive excitatory synapses. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown. Also not shown are output signals from cortical areas to motor responses. V1: striate, or primary, visual cortex; V2 and V4: areas of prestriate visual cortex; MT: Middle Temporal cortex; MST: Medial Superior Temporal area; ITp: posterior InferoTemporal cortex; ITa: anterior InferoTemporal cortex; PPC: Posterior Parietal Cortex; LIP: Lateral IntraParietal area; VPA: Ventral PreArcuate gyrus; FEF: Frontal Eye Fields; PHC: ParaHippocampal Cortex; DLPFC: DorsoLateral PreFrontal Cortex; HIPPO: hippocampus; LH: Lateral Hypothalamus; BG: Basal Ganglia; AMYG: AMYGdala; OFC: OrbitoFrontal Cortex; PRC: PeriRhinal Cortex; VPS: Ventral bank of the Principal Sulcus; VLPFC: VentroLateral PreFrontal Cortex. See the text for further details.
      ||
    1109. image p523fig14.03 (a) The MOTIVATOR neural model generalizes CogEM by also including the basal ganglia. It can hereby explain and simulate complementary functions of the amygdala and basal ganglia (SNc) during conditioning and learned performance. The basal ganglia generate Now Print signals in response to unexpected rewards. These signals modulate learning of new associations in many brain regions. The amygdala supports motivated attention to trigger actions that are expected to occur in response to conditioned or unconditioned stimuli. Object Categories represent visual or gustatory inputs in anterior inferotemporal (ITA) and rhinal (RHIN) cortices, respectively. Value Categories represent the value of anticipated outcomes on the basis of hunger and satiety inputs, in amygdala (AMYG) and lateral hypothalamus (LH). Object-Value Categories resolve the value of competing perceptual stimuli in medial (MORB) and lateral (ORB) orbitofrontal cortex. The Reward Expectation Filter detects the omission or delivery of rewards using a circuit that spans ventral striatum (VS), ventral pallidum (VP), striosomal delay (SD) cells in the ventral striatum, the pedunculopontine nucleus (PPTN) and midbrain dopaminergic neurons of the substantia nigra pars compacta/ventral tegmental area (SNc/VTA). The circuit that processes CS-related visual information (ITA, AMYG, ORB) operates in parallel with a circuit that processes US-related visual and gustatory information (RHIN, AMYG, MORB). (b) Reciprocal adaptive connections between hypothalamus and amygdala enable amygdala cells to become learned value categories. The bottom region represents hypothalamic cells, which receive converging taste and metabolite inputs whereby they become taste-drive cells. Bottom-up signals from activity patterns across these cells activate competing value categories, or US Value Representations, in the amygdala. A winning value category learns to respond selectively to specific combinations of taste-drive activity patterns and sends adaptive top-down priming signals back to the taste-drive cells that activated it. CS-activated conditioned reinforcer signals are also associatively linked to value categories. Adaptive connections end in (approximately) hemidiscs. See the text for details.
      ||
    1110. image p524fig14.04 (a) Model basal ganglia circuit for the control of dopaminergic Now Print signals from the substantia nigra pars compacta, or SNc, in response to unexpected rewards. Cortical inputs (Ii), activated by conditioned stimuli, learn to excite the SNc via a multi-stage pathway from the ventral striatum (S) to the ventral pallidum and then on to the PPTN (P) and the SNc (D). The inputs Ii excite the ventral striatum via adaptive weights W_IS, and the ventral striatum excites the SNc with strength W_PD. The striosomes, which contain an adaptive spectral timing mechanism [xij, Gij, Yij, Zij], learn to generate adaptively timed signals that inhibit reward-related activation of the SNc. Primary reward signals (I_R) from the lateral hypothalamus both excite the PPTN directly (with strength W_RP) and act as training signals to the ventral striatum S (with strength W_RS) that trains the weights W_IS. Arrowheads denote excitatory pathways, circles denote inhibitory pathways, and hemidiscs denote synapses at which learning occurs. Thick pathways denote dopaminergic signals.
      ||
    1111. image p530fig14.05 Displays used by (Buschman, Miller 2007) in their visual search experiments. See the text for details.
      || Fixation 500 ms-> Sample 1000 ms-> Delay 500 ms-> Visual [pop-out, search]- reaction time.
    1112. image p531fig14.06 Classification of scenic properties as texture categories by the ARTSCENE model. See the text for details.
      || Image-> Feature extraction (texture principal component rankings)-> Learning feature-to-scene mapping (texture category principal component rankings)<- scene class. Large-to-small attentional shrouds as principal component rank gets higher.
    1113. image p531fig14.07 Voting in the ARTSCENE model achieves even better prediction of scene type. See the text for details.
      || Image-> Feature extraction (texture principal component rankings)-> Learning feature-to-scene mapping (texture category principal component rankings)-> evidence accumulation (sum)-> scene class winner-take-all inference. Large-to-small attentional shrouds as principal component rank gets higher.
    1114. image p532fig14.08 Macrocircuit of the ARTSCENE Search neural model for learning to search for desired objects by using the sequences of already experienced objects and their locations to predict what and where the desired object is. V1 = First visual area or primary visual cortex; V2 = Second visual area; V4 = Fourth visual area; PPC = Posterior Parietal Cortex; ITp = posterior InferoTemporal cortex; ITa = anterior InferoTemporal cortex; MTL = Medial Temporal Lobe; PHC = ParaHippoCampal cortex; PRC = PeriRhinal Cortex; PFC = PreFrontal Cortex; DLPFC = DorsoLateral PreFrontal Cortex; VPFC = Ventral PFC; SC = Superior Colliculus.
      ||
    1115. image p533fig14.09 Search data and ARTSCENE Search simulations of them in each pair of images from (A) to (F). See the text for details.
      || 6*[data vs simulation], [Response time (ms) versus epoch].
    1116. image p540fig15.01 The timing of CS and US inputs in the delay and trace conditioning paradigms.
      || Delay and trace conditioning paradigms. [CS, US] vs [Delay, Trace]. To perform an adaptively timed CR, trace conditioning requires a CS memory trace over the Inter-Stimulus Interval (ISI).
    1117. image p541fig15.02 The neurotrophic Spectrally Timed Adaptive Resonance Theory, or nSTART, model of (Franklin, Grossberg 2017) includes hippocampus to enable adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between CS and US.
      || Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. [CS, US] -> Sensory Cortex (SC) <- motivational attention <-> category learning -> Prefrontal Cortex (PFC). SC conditioned reinforcement learning-> Amygdala (cannot bridge the temporal gap) incentive motivational learning-> PFC. SC adaptively timed learning and BDNF-> Hippocampus (can bridge the temporal gap) BDNF-> PFC. PFC adaptively timed motor learning-> cerebellum.
    1118. image p541fig15.03 Stages in the processing of adaptively timed conditioning, leading to timed responses in (d) that exhibit both individual Weber laws and an inverted U in conditioning as a function of ISI. See the text for details.
      || Curves of [Response vs ISI].
    1119. image p542fig15.04 Conditioning data from (Smith 1968; Millenson etal 1977). The former shows the kind of Weber Law and inverted U that were simulated in Figure 15.3. The latter shows that, if there are two ISIs during an experiment, then the animals learn to adaptively time their responses with two properly scaled Weber laws.
      || (left) One ISI (Smith 1968) [mean membrane extension (mm) versus time after CS onset (msec)]. (right) Two ISIs (Millenson etal 1977) [200, 100] msec CS test trials, [mean momentary CS amplitude (mm) vs time after CS onset (msec)]. (bottom) Conditioned eye blinks, made with nictitating membrane and/or eyelid, are adaptively timed: peak closure occurs at expected time(s) of arrival of the US following the CS and obeys a Weber Law.
    1120. image p543fig15.05 Simulation of conditioning with two ISIs that generate their own Weber Laws, as in the data shown in Figure 15.4.
      || Learning with two ISIs: simulation: R = sum[all: f(xi)*yi*zi] vs msec. Each peak obeys Weber Law! Strong evidence for spectral learning.
    1121. image p543fig15.06 The circuit between dentate granule cells and CA1 hippocampal pyramid cells seems to compute spectrally timed responses. See the text for details.
      || Hippocampal interpretation. 1. Dentate granule cells (Berger, Berry, Thompson 1986): "increasing firing...in the CS period...the latency...was constant". 2. Pyramidal cells: "Temporal model" Dentate granule cells-> CA3 pyramids. 3. Convergence (Squire etal 1989): 1e6 granule cells, 1.6e5 CA3 pyramids. 80-to-1 (ri).
    1122. image p544fig15.07 In response to a step CS and sustained storage by I_CS of that input, a spectrum of responses xi at different rates ri develops through time.
      || Spectral timing: activation. CS-> I_CS-> All xi. STM sensory representation. Spectral activation d[dt: xi] = ri*[-A*xi + (1 - B*xi)*I_CS].
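      A minimal sketch of the activation spectrum equation above (illustrative parameters): a sustained CS input drives every xi toward the same equilibrium I_CS/(A + B*I_CS), but at a rate set by ri, so slow cells are still ramping while fast cells have already saturated.
```python
import numpy as np

# Euler integration of d[dt: xi] = ri*(-A*xi + (1 - B*xi)*I_CS) across a spectrum of rates ri.
A, B, I_CS, dt = 1.0, 1.0, 1.0, 0.001        # illustrative parameters
rates = np.linspace(0.2, 2.0, 8)             # the rate spectrum ri
x = np.zeros_like(rates)
for _ in range(int(2.0 / dt)):               # two seconds of sustained CS-driven input
    x += dt * rates * (-A * x + (1.0 - B * x) * I_CS)
print(np.round(x, 3))                        # slow cells still ramping, fast cells near I_CS/(A+B*I_CS) = 0.5
```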
    1123. image p544fig15.08 The spectral activities xi generate sigmoid signals f(xi) before the signals are, in turn, gated by habituative transmitters yi.
      || Habituative transmitter gate. transmitter.
    1124. image p544fig15.09 As always, the habituative transmitter gate yi increases in response to accumulation and decreases due to gated inactivation, leading to the kinds of transmitter and output responses in the right hand column.
      || Habituative transmitter gate (Grossberg 1968). 1. d[dt: yi] = C*(1-yi) - D*f(xi)*yi, C-term - accumulation, D-term gated inactivation. 2. Sigmoid signal f(xi) = xi^n / (B^n + xi^n). 3. Gated output signal f(xi)*yi.
    1125. image p545fig15.10 When the activity spectrum xi generates a spectrum of sigmoidal signals (f(xi), the corresponding transmitters habituate at different rates. The output signals f(xi)*yi therefore generate a series of unimodal activity profiles that peak at different times, as in Figure 15.3a.
      || A timed spectrum of sampling intervals. [f(xi) activation, yi habituation, f(xi)*yi gated sampling] spectra. gated = sampling intervals.
    1126. image p545fig15.11 The adaptive weight, or LTM trace , zi learns from the US input I_US at times when the sampling signal f(xi)*yi is on. It then gates the habituative sampling signal f(xi)*yi to generate a doubly gated response f(xi)*yi*zi.
      || Associative learning, gated steepest descent learning (Grossberg 1969). d[dt: zi] = E*f(xi)*yi*[-zi + I_US], E-term read-out of CS gated signal, []-term read-out of US. Output from each population: f(xi)*yi*zi doubly gated signal.
    1127. image p546fig15.12 The adaptive weights zi in the spectrum whose sampling signals are large when the US occurs learn fastest, as illustrated by the green region in this simulation of (Grossberg, Schmajuk 1989).
      || Computer simulation of spectral learning. (left) fast (right) slow. Constant ISI: 6 cells fast to slow, 4 learning trials, 1 test trial.
    1128. image p546fig15.13 The total learned response is a sum R of all the doubly gated signals in the spectrum.
      || Adaptive timing is a population property. Total output signal: R = sum[i: f(xi)*yi*zi]. Adaptive timing is a collective property of the circuit. "Random" spectrum of rates achieves good collective timing.
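      The chain of equations in Figures 15.7-15.13 can be strung together in a compact simulation sketch. Everything below (parameter values, sigmoid threshold, number of rates, US window, trial counts) is an illustrative assumption rather than the published parameterization, but it shows the collective property claimed above: after a few training trials at a fixed ISI, the summed doubly gated output R peaks near that ISI.
```python
import numpy as np

# Spectral timing sketch: activations xi, habituative gates yi, adaptive weights zi trained by a
# US at a fixed ISI, and the summed response R = sum_i f(xi)*yi*zi read out on a test trial.
A, B, C, D, E = 1.0, 1.0, 0.02, 1.0, 0.05
n, thr = 4, 0.25
dt, ISI, trial_len, n_trials = 0.01, 4.0, 12.0, 10
rates = np.linspace(0.1, 1.2, 40)                        # spectrum of rates ri
f = lambda x: x**n / (thr**n + x**n)                     # sigmoid sampling signal
z = np.zeros_like(rates)                                 # adaptive weights zi

def run_trial(z, learn):
    x = np.zeros_like(rates)                             # spectral activations xi
    y = np.ones_like(rates)                              # habituative gates yi
    R = []
    for step in range(int(trial_len / dt)):
        t = step * dt
        I_CS = 1.0                                       # CS stored for the whole trial
        I_US = 1.0 if (learn and abs(t - ISI) < 0.1) else 0.0
        g = f(x) * y                                     # gated sampling signals f(xi)*yi
        if learn:
            z += dt * E * g * (-z + I_US)                # gated steepest descent learning
        R.append(np.sum(g * z))                          # R = sum_i f(xi)*yi*zi
        x += dt * rates * (-A * x + (1 - B * x) * I_CS)
        y += dt * (C * (1 - y) - D * f(x) * y)
    return np.array(R)

for _ in range(n_trials):
    run_trial(z, learn=True)
R = run_trial(z, learn=False)
print("test-trial R peaks at t =", round(float(np.argmax(R)) * dt, 2), "s; ISI =", ISI, "s")
```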
    1129. image p547fig15.14 An individual's survival depends upon being able to process UNexpected non-occurrences, or disconfirmations, of goals differently from EXPECTED non-occurrences, or disconfirmations. See the text for details.
      || Unexpected non-occurrences of goal: a predictive failure, eg reward that does not occur at the expected time. Leads to Orienting Reactions: Cognitive- STM reset, attention shift, forgetting; Emotional- Frustration; Motor- Exploratory behaviour;. What about an Expected non-occurrence? predictive signal, all other events, expected goal.
    1130. image p547fig15.15 Expected non-occurences do not prevent the processing of sensory events and their expectations. Rather, they prevent mismatches of those expectations from triggering orienting reactions.
      || Expected non-occurrence of goal. Some rewards are reliable but delayed in time. Does not lead to orienting reactions: How? Both expected and unexpected non-occurrences are due to mismatch of a sensory event with learned expectations. Expected non-occurrences do not inhibit sensory matching: eg a pigeon can see an earlier-than-usual food pellet. Hypothesis: Expected non-occurrences inhibit the process whereby sensory mismatch activates orienting reactions. Mismatch not-> orient.
    1131. image p548fig15.16 Homologous recognition learning and reinforcement learning macrocircuits enable adaptively timed conditioning in the reinforcement learning circuit to increase inhibition of the orienting system at times when a mismatch in the recognition system would have reduced inhibition of it.
      || Homolog between ART and CogEM model, complementary systems. [Recognition, Reinforcement] learning vs [Attentional, Orienting] system. Reinforcement: timing, drive representation.
    1132. image p548fig15.17 The timing paradox asks how inhibition of an orienting response (-) can be spread throughout the ISI, yet accurately timed responding can be excited (+) at the end of the ISI.
      || Timing paradox. [CS light, US shock] vs t. ISI = InterStimulus Interval = expected delay of reinforcer. Want timing to be accurate. Want to inhibit exploratory behaviour throughout the ISI.
    1133. image p549fig15.18 The Weber Law solves the timing paradox by creating an adaptively timed response throughout the ISI that peaks at the ISI. Within the reinforcement learning circuit, this response can maintain inhibition of the orienting system A at the same time as it generates adaptively timed incentive motivation to the orbitofrontal cortex.
      || Weber Law: reconciling accurate and distributed timing. Resolution: Output can inhibit orienting, peak response probability. What about different ISIs? Standard deviation = peak time. Weber law rule.
    1134. image p549fig15.19 How the adaptively timed hippocampal spectrum T inhibits (red arrow) the orienting system A as motivated attention in orbitofrontal cortex Si(2) peaks at the ISI.
      || Conditioning, Attention, and Timing circuit. Hippocampus spectrum-> Amygdala orienting system-> neocortex motivational attention. Adaptive timing inhibits orienting system and maintains adaptively timed Motivated Attention on the CS.
    1135. image p550fig15.20 Adaptively timed conditioning of Long Term Depression, or LTD, occurs in the cerebellum at synapses between parallel fibres and Purkinje cells, thereby reducing inhibition of subcortical nucleus cells and enabling them to express their learned movement gains within the learned time interval. Also see Figure 15.21.
      || [CS-Activated input pathways parallel fibres, US-Activated climbing fibres]-> [Subcortical nucleus (gain control), Cerebellar cortex- Purkinje cells (timing)].
    1136. image p551fig15.21 The most important cells types and circuitry of the cerebellum: Purkinje cells (PC) receive excitatory inputs from the climbing fibres (CF) that originate in the inferior olive (IO) and from parallel fibres (PF), which are the axons for granule cells (GC). GCs, in turn, receive inputs from the mossy fibres (MF) coming from the precerebellar nuclei (PCN). The PF also inhibit PC via basket cells (BC), thereby helping to select the most highly activated PC. The PC generate inhibitory outputs from the cerebellum cortex to the deep cerebellar nuclei (DCN), as in Figure 15.20. Excitatory signals are denoted by (+) and inhibitory signals by (-). Other notations: GL- granular layer; GoC- golgi cells; ML- molecular layer; PCL- Purkinje cell layer; SC- stellate cell; WM- white matter.
      ||
    1137. image p551fig15.22 Responses of a retinal cone in the turtle retina to brief flashes of light of increasing intensity.
      || response vs msec.
    1138. image p552fig15.23 Cerebellar biochemistry that supports the hypothesis of how mGluR supports adaptively timed conditioning at cerebellar Purkinje cells. AMPA, amino-3-hydroxy-5-methyl-4-isoxazole propionic acid-sensitive glutamate receptor; cGMP, cyclic guanosine monophosphate; DAG, diacylglycerol; glu, glutamate; GC, guanylyl cyclase; gK, Ca2+-dependent K+ channel protein; GTP, guanosine triphosphate; IP3, inositol-1,4,5-trisphosphate; NO, nitric oxide; NOS, nitric oxide synthase; P, phosphate; PLC, phospholipase C; PKC, protein kinase C; PKG, cGMP-dependent protein kinase; PP-1, protein phosphatase-1;.
      || climbing fibre induced depolarization, parallel fibre induced mGLuR1 activation. PDE, GTP, 5'GMP, G-substrate, calcineurin, AMPA...
    1139. image p556fig15.24 (a) Data showing normally timed responding (solid curve) and short latency responses after lesioning cerebellar cortex (dashed curve). (b) computer simulation of short latency response after ablation of model cerebellar cortex.
      ||
    1140. image p557fig15.25 Computer simulations of (a) adaptively timed long term depression at Purkinje cells, and (b) adaptively timed activation of cerebellar nuclear cells.
      || response vs time (msec)
    1141. image p557fig15.26 Brain regions and processes that contribute to autistic behavioral symptoms when they become imbalanced in prescribed ways.
      || Basal Ganglia prolonged gate opening <-> { Amygdala emotionally depressed-> [hippocampus- hyperspecific learning; Cerebellum- adaptive timing fails; hypofrontal blocking fails, no Theory of Mind]-> Neocortex; Neocortex- rewards not received-> Amygdala}.
    1142. image p559fig15.27 Brain regions and processes that contribute to the release of dopaminergic Now Print signals by the substantia nigra pars compacta, or SNc, in response to unexpected reinforcing events. See the text for details.
      || Model of spectrally timed SNc learning (Brown, Bullock, Grossberg 1999). Delayed inhibitory expectations of reward. Dopamine cells signal an error in reward prediction timing or magnitude. Immediate excitatory predictions of reward. Lateral hypothalamus (Primary Reward Input)-> [(+)ventral striatum <-> ventral pallidum (+)-> PPTN(+)-> SNc]. SNc-> [dopamine signal -> ventral striatum, Striosomal cells]. Conditioned Stimuli (CS)(+)-> [ventral striatum, striosomal cells]. Striosomal cells(-)-> SNc.
    1143. image p559fig15.28 Neurophysiological data (left column) and model simulations (right column) of SNc responses. See the text for details.
      || membrane potential vs time
    1144. image p560fig15.29 Excitatory pathways that support activation of the SNc by a US and the conditioning of a CS to the US.
      || Excitatory pathway. Primary reward (apple juice) briefly excites lateral hypothalamus. Hypothalamic-PPTN excitation causes SNc dopamine burst. Hypothalamic activity excites ventral striatum for training. Active CS working memory signals learn to excite ventral striatum. Lateral hypothalamus (Primary Reward Input)-> [(+)ventral striatum <-> ventral pallidum(+)-> PPTN(+)-> SNc]. SNc-> dopamine signal-> ventral striatum. Conditioned Stimuli working memory trace (CS)(+)-> ventral striatum.
    1145. image p560fig15.30 The inhibitory pathway from striosomal cells to the SNc is able to inhibit the SNc when a reward occurs with expected timing and magnitude.
      || Inhibitory pathway. Learning: CS-striosomal LTP occurs due to a three-way coincidence [An active CS working memory input, a Ca2+ spike, a dopamine burst]; Signaling: The delayed Ca2+ spike facilitates striosomal-SNc inhibition;. Striosomal cells learn to predict both timing and magnitude of reward signal to cancel it: reward expectation;. Conditioned stimuli (CS) LTP-> Striosomal cells <- dopamine | (-)-> SNc->.
    1146. image p561fig15.31 The CS activates a population of striosomal cells that respond with different delays in order to enable adaptively timed inhibition of the SNc.
      || Expectation timing (Fiala, Grossberg, Bullock 1996; Grossberg, Merrill 1992, 1996; Grossberg, Schmajuk 1989). How do cells bridge hundreds of milliseconds? Timing spectrum (msec). 1. CS activates a population of cells with delayed transient signals: mGluR. 2. Each has a different delay, so that the range of delays covers the entire interval. 3. Delayed transients gate both learning and read-out of expectations.
    1147. image p561fig15.32 The SNc can generate both dopamine bursts and dips in response to rewards whose amplitude is unexpectedly large or small.
      || Inhibitory pathway: expectation magnitude. 1. If reward is greater than expected, a dopamine burst causes striosomal expectation to increase. 2. If reward is less than expected, a dopamine dip causes striosomal expectation to decrease. 3. This is a negative feedback control system for learning. Conditioned stimuli (CS)-> Striosomal cells <- dopamine | (-)-> SNc->.
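      A deliberately stripped-down sketch of the negative-feedback idea in this figure, collapsing the adaptively timed striosomal spectrum into a single scalar expectation w (an assumption for illustration only): the dopamine signal is the difference between delivered and expected reward, bursts train w up, dips train it down, and both shrink as the expectation comes to cancel the reward.
```python
# Scalar caricature of the striosomal reward expectation; w and the learning rate are illustrative.
w, lr = 0.0, 0.3
for trial, reward in enumerate([1.0] * 6 + [0.2] * 6):   # reward magnitude drops halfway through
    dopamine = reward - w                                 # burst if > 0, dip if < 0
    w += lr * dopamine                                    # expectation learns to cancel the reward
    print(f"trial {trial:2d}  reward={reward:3.1f}  dopamine={dopamine:+.2f}  expectation={w:.2f}")
```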
    1148. image p563fig15.33 The basal ganglia gate neural processing in many parts of the brain. The feedback loop through the lateral orbitofrontal cortex (blue arrow, lateral orbitofrontal) is the one that MOTIVATOR models.
      || MOTIVATOR models one of several thalamocortical loops through basal ganglia (Adapted from Fundamental Neuroscience. 2002 Copyright Elsevier). [cortex-> striatum-> pallidum S. nigra-> thalamus] vs [motor, oculomotor, dorsolateral prefrontal, lateral orbitofrontal, anterior cingulate]. thalamus-> [striatum, cortex].
    1149. image p563fig15.34 The colored regions are distinct parts of the basal ganglia in the loops depicted in Figure 15.33.
      || Distinct basal ganglia zones for each loop (Adapted from Fundamental Neuroscience. 2002 Copyright Elsevier).
    1150. image p564fig15.35 (a) A pair of recurrent shunting on-center off-surround networks for control of the fore limbs and hind limbs. (b) Varying the GO signal to these networks can trigger changes in movement gaits. See the text for details.
      ||
    1151. image p565fig15.36 (a) The FOVEATE model circuit for the control of saccadic eye movements within the peri-pontine reticular formation. (b) A simulated saccade staircase. See the text for details.
      || [left, right] eye FOVEATE model. [vertical vs horizontal] position (deg).
    1152. image p566fig15.37 Steps in the FOVEATE model's generation of a saccade. See the text for details.
      || input(+)-> LLBN-> [(-)OPN, (+)EBN], EBN(-)-> LLBN. (A) rest OPN active. (B) charge [input, LLBN, OPN] active. (C) burst [input, LLBN, EBN] active. (D) shutdown [OPN, EBN] active.
    1153. image p567fig15.38 (a) The Gated Pacemaker model for the control of circadian rhythms is a recurrent shunting on-center off-surround network whose excitatory feedback signals are gated by habituative transmitters. Tonic arousal signals energize the pacemaker. Diurnal (left) and nocturnal (right) pacemakers are determined by whether phasic light signals turn the pacemaker on or off. An activity-dependent fatigue signal prevents the pacemaker from becoming overly active for too long. (b) Two simulations of circadian activity cycles during different schedules of light (L) and dark (D). See the text for details.
      || sourceOn-> on-cells (recurrent) <-(-) (-)> off-cells (recurrent) <-sourceOff. on-cells-> activity-> off-cells. off-cells-> fatigue. Diurnal: sourceOn=[light, arousal]; sourceOff=arousal;. Nocturnal: sourceOn=arousal; sourceOff=[arousal, light];.
    1154. image p568fig15.39 Circuits of the MOTIVATOR model that show hypothalamic gated dipoles.
      || inputs-> [object, value] categories-> object-value categories-> [reward expectation filter, [FEF, EAT] outputs]. reward expectation filter [DA dip, arousal burst]-> alpha1 non-specific arousal-> value categories. Msi drive inputs-> value categories.
    1155. image p569fig15.40 The direct and indirect basal ganglia circuits that control GO and STOP movement signals. See the text for details.
      || [Direct path GO(+), Indirect path STOP(+), dopamine from SNc(+-)]-> striatum. GO-> GPi/SNr-> Thalamus (VA/Vlo) <-> frontal cortex. STOP-> GPe <-> STN-> GPi/SNr. NAc-> GPi/SNr.
    1156. image p573fig16.01 The experimental chamber (A) and neurophysiological recordings from a rat hippocampus (B) that led to the discovery of place cells. See the text for details.
      ||
    1157. image p574fig16.02 Neurophysiological recordings of 18 different place cell receptive fields. See the text for details.
      ||
    1158. image p575fig16.03 As a rat navigates in its experimental chamber (black curves), neurophysiological recordings disclose the firing patterns (in red) of (a) a hippocampal place cell and (b) an entrorhinal grid cell.
      ||
    1159. image p578fig16.04 Cross-sections of the hippocampal regions and the inputs to them. See the text for details.
      || EC-> CA1-> CA3-> DG. Layers [V/VI, II, III].
    1160. image p580fig16.05 Macrocircuit of the GridPlaceMap model, which can learn both 2D grid cells and place cells in response to realistic trajectories of navigating rats using a hierarchy of SOMs with identical equations.
      || GridPlaceMap model: rate-based and spiking (Pilly, Grossberg 2012). Pre-wired 1D stripe cells, learns both 2D grid and place cells! Same laws for both; both select most frequent and energetic inputs. Place cells emerge gradually in response to developing grid cells. vestibular signals-> path integration-> [stripe-> grid-> place] cells.
    1161. image p581fig16.06 The learning of hexagonal grid cell receptive fields as an animal navigates an open field is a natural consequence of simple trigonometric properties of the positions at which the firing of stripe cells that are tuned to different directions will co-occur.
      || The Trigonometry of spatial navigation. Coactivation of stripe cells.
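      The trigonometric point above can be checked directly: summing the periodic firing of stripe cells whose preferred directions differ by 60 degrees yields coactivation peaks on a hexagonal lattice, whereas 45 or 90 degree separations do not. The spacing and direction values below are illustrative.
```python
import numpy as np

# Sum three stripe-cell firing patterns whose preferred directions are 60 degrees apart:
# the triple-coactivation points fall on a hexagonal lattice.
spacing = 0.4                                            # stripe field spacing (metres), illustrative
angles = np.deg2rad([0, 60, 120])
xs = np.linspace(0.0, 2.0, 201)
X, Y = np.meshgrid(xs, xs)
total = sum(np.cos(2 * np.pi * (X * np.cos(a) + Y * np.sin(a)) / spacing) for a in angles)
print("fraction of positions where all three stripes are near maximal:",
      round(float((total > 2.7).mean()), 4))
# Plotting `total` (e.g. matplotlib imshow) shows hexagonally arranged bumps; repeating the sum
# with angles [0, 45, 90] instead gives a square-like pattern, as in Figure 16.15 below.
```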
    1162. image p582fig16.07 Stripe cells were predicted in (Mhatre, Gorchetchnikov, Grossberg 2012) to convert linear velocity signals into the distances travelled in particular directions. They are modeled by directionally-sensitive ring attractors, which help to explain their periodic activation as an animal continues to move in a given direction. See the text for details.
      || Stripe cells. Stripe cells are predicted to exist in (or no later than) EC layer (III, V/VI). Linear path integrators: represent distance traveled using linear velocity modulated with head direction signal. Ring attractor circuit: the activity bump represents distance traveled, stripe cells with same spatial period and directional preference fire with different spatial phases at different ring positions. Distance is computed directly, it does not require decoding by oscillatory interference. Periodic stripe cell activation due to ring anatomy: periodic boundary conditions. Stripe firing fields with multiple orientations, phases and scales.
    1163. image p582fig16.08 Some experimental evidence for stripe-like cell receptive fields has been reported. The band cells posited by Neil Burgess also exhibit the one-dimensional firing symmetry of stripe cells, but are modeled by oscillatory intererence. See the text for details.
      || Evidence for stripe-like cells. Entorhinal cortex data (Sargolini, Fyhn, Hafting, McNaughton, Witter, Moser, Moser 2006; Krupic, Burgess, O'Keefe 2012). Similar hypothetical construct used by Interference model but position is decoded by grid cell oscillatory interference- Band Cells (Burgess 2008).
    1164. image p583fig16.09 The GRIDSmap model used algorithmically defined stripe cells to process realistic rat trajectories. The stripe cell outputs then formed inputs to the adaptive filter of a self-organizing map which learned hexagonal grid cell receptive fields.
      || GRIDSmap. Self-organizing map receives inputs from stripe cells and learns to respond to most frequent co-activation patterns. Stripe cells combine speed and head direction to create a periodic 1D position code. Virtual rat navigated using live rat trajectories from Moser Lab. Speed and head direction drives stripe cells.
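      In the spirit of the SOM description above, here is a generic winner-take-all map with instar learning whose weights come to track the most frequent and energetic stripe-cell coactivation patterns. This is a simplified sketch, not the published GRIDSmap equations; the layer sizes, learning rate, and normalization are assumptions.
```python
import numpy as np

# Generic winner-take-all self-organizing map with instar learning on stripe-cell activity vectors.
rng = np.random.default_rng(0)
n_stripe, n_map, lr = 30, 10, 0.05
W = rng.uniform(size=(n_map, n_stripe))                  # adaptive filter from stripe cells to map cells
W /= np.linalg.norm(W, axis=1, keepdims=True)

def som_step(stripe_activity):
    s = stripe_activity / (np.linalg.norm(stripe_activity) + 1e-9)
    winner = int(np.argmax(W @ s))                       # contrast-enhanced choice (winner-take-all)
    W[winner] += lr * (s - W[winner])                    # instar learning toward the input pattern
    W[winner] /= np.linalg.norm(W[winner])
    return winner

# Usage: feed stripe-cell activity vectors sampled along a navigated trajectory, e.g.
# for _ in range(10000): som_step(rng.random(n_stripe))
```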
    1165. image p583fig16.10 The GRIDSmap model is embedded into a more complete representation of the processing stages from receipt of angular head velocity and linear velocity signals to this learning of place cells.
      || GRIDSmap. Pre-wired 2D stripe cells, learns 2D grid cells. vestibular cells [angular head velocity-> head direction cells, linear velocity]-> stripe cells- small scale 1D periodic spatial code (ECIII)-> SOM grid cells entorhinal cortex- small scale 2D periodic spatial code-> SOM place cells hippocampal cortex- large scale 2D spatial code (dentate/CA3). Unified hierarchy of SOMs.
    1166. image p584fig16.11 GRIDSmap simulation of the learning of hexagonal grid fields. See the text for details.
      || Simulation results. Multiple phases per scale. response vs length scale (0.5m+).
    1167. image p584fig16.12 Temporal development of grid cell receptive fields on successive learning trials (1,3,5,7,25,50,75,100).
      || Temporal development of grid fields. Cells begin to exhibit grid structure by 3rd trial. Orientations of the emergent grid rotate to align with each other over trials.
    1168. image p585fig16.13 Hexagonal grid cell receptive fields develop if their stripe cell directional preferences are separated by 7, 10, 15, 20, or random numbers of degrees. The number and directional selectivities of stripe cells can thus be chosen within broad limits without undermining grid cell development.
      ||
    1169. image p585fig16.14 Superimposing firing of stripe cells whose directional preferences differ by 60 degrees supports learning hexagonal grid cell receptive fields in GRIDSmap.
      || GRIDSmap: from stripe cells to grid cells. Grid-cell Regularity from Integrated Distance through Self-organizing map. Superimposing firing of stripe cells oriented at intervals of 60 degrees. Hexagonal grid!
    1170. image p586fig16.15 Superimposing stripe cells oriented by 45 degrees does not lead to learning of rectangular grids in GRIDSmap, but it does in an oscillatory interference model.
      || Why is a hexagonal grid favored? Superimposing firing of stripe cells oriented at intervals of 45 degrees. Rectangular grid. This and many other possibilities do not happen in vivo. They do happen in the oscillatory interference model. How are they prevented in GRIDSmap?
    1171. image p586fig16.16 In the place cell learning model of (Gorchetchnikov, Grossberg 2007), three populations of five cells each of entorhinal grid cells (only two are shown) with different spatial periods input to the model's dentate gyrus. The grid cells are one-dimensional and defined algorithmically. A model dentate gyrus granule cell that receives strong projections from all three grid cell scales fires (green cell) and activates a recurrent inhibitory interneuron that inhibits other granule cells. It also generates back-propagating action potentials that trigger learning in the adaptive weights of the projections from the grid cells, thereby causing learning of place cell receptive fields.
      || Grid-to-place Self-Organizing map (Gorchetchnikov, Grossberg 2007). Formation of place cell fields via grid-to-place cell learning. Least common multiple: [grid (cm), place (m)] scales: [40, 50, 60 (cm); 6m], [50, 60, 70 (cm); 21m], [41, 53, 59 (cm); 1.282 km]. Our simulations: [40, 50 (cm); 2m], [44, 52 (cm); 5.72m]. Our SOM: Spiking Hodgkin-Huxley membrane equations; Nonlinear choice by contrast-enhancing recurrent on-center off-surround net;. Choice triggers back-propagating action potentials that induce STDP-modulated learning on cell dendrites.
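      The least-common-multiple point in this caption is easy to verify in one dimension: a downstream cell that needs coincident peaks from grid inputs with periods 40, 50, and 60 cm can only be fully driven every lcm(40, 50, 60) = 600 cm = 6 m, the first scale pairing quoted above. The cosine tuning and coincidence threshold below are illustrative assumptions.
```python
import numpy as np
from math import lcm

# 1D check: grid inputs with periods 40, 50 and 60 cm peak together only every 600 cm = 6 m.
periods = [40, 50, 60]                                   # grid spacings in cm
x = np.arange(0, 1300)                                   # positions in cm
drive = sum(np.cos(2 * np.pi * x / p) for p in periods)  # summed grid-cell input to a granule cell
print("lcm of periods:", lcm(*periods), "cm")
print("positions where all three grids peak together:", x[drive > 2.99])   # 0, 600, 1200 cm
```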
    1172. image p587fig16.17 A finer analysis of the 2D trigonometry of spatial navigation showed that both the frequency and amplitude of coactivations by stripe cells determine the learning of hexagonal grid fields.
      || A refined analysis: SOM amplifies most frequent and energetic coactivations (Pilly, Grossberg 2012). [linear track, 2D environment]. (left) Stripe fields separated by 90°. 25 coactivations by 2 inputs. (right) Stripe fields separated by 60°. 23 coactivations by 3 inputs.
    1173. image p588fig16.18 Simulations of coordinated learning of grid cell receptive fields (second row) and unimodal place cell receptive fields (third row) by the hierarchy of SOMs in the GridPlaceMap model. Note the exquisite regularity of the hexagonal grid cell firing fields.
      || [stripe, grid, place] cells vs [spikes on trajectory, unsmoothed rate map, smoothed rate map].
    1174. image p589fig16.19 Neurophysiological data showing the smaller dorsal grid cell scales and the larger ventral grid cell scales.
      || Spatial scale of grid cells increase along the MEC dorsoventral axis (Hafting etal 2005; Sargolini etal 2006; Brun etal 2008). [dorsal (left), ventral (right)] cart [rate map, autocorrelogram]. How does the spatial scale increase along the MEC dorsoventral axis?
    1175. image p590fig16.20 Integration rate of grid cells decreases along the dorsoventral gradient of the Medial Entorhinal Cortex, or MEC.
      || Dorsoventral gradient in the rate of synaptic integration of MEC layer II stellate cells (Garden etal 2008). Cross-section of [Hp, CC, LEC, MEC]. (A left column) [dorsal, ventral] mV? vs msec. (B center column) [half width (ms), rise time (ms), amplitude (mV)] vs location (μm). (C right upper) responses (D right lower) width (ms) vs location (μm).
    1176. image p590fig16.21 Frequency of membrane potential oscillations in grid cells decreases along the dorsoventral gradient of the MEC.
      || Dorsoventral gradient in the frequency of membrane potential oscillations of MEC layer II stellate cells (Giocomo etal 2007). (C left column) Oscillation (Hz) vs distance from dorsal surface (mm). (D right upper) [dorsal, ventral] oscillations, 5mV-500ms. (E right lower) [dorsal, ventral] oscillations, 100ms. Both membrane potential oscillation frequency and resonance frequency decrease from the dorsal to ventral end of MEC.
    1177. image p591fig16.22 Time constants and duration of afterhyperpolarization currents of grid cells increase along the dorsoventral gradient of the MEC.
      || Dorsoventral gradient in afterhyperpolarization (AHP) kinetics of MEC layer II stellate cells (Navratilova etal 2012). [mAHP time constant (ms), Half-width (mm)] vs distance from the dorsal surface (mm), at [-55, -50, -45] mV. Time constants and duration of AHP increase from the dorsal to the ventral end of MEC layer II. Effectively, the relative refractory period is longer for ventral stellate cells in MEC layer II.
    1178. image p591fig16.23 The Spectral Spacing Model uses a rate gradient to learn a spatial gradient of grid cell receptive field sizes along the dorsoventral gradient of the MEC.
      || Spectral spacing model. Map cells responding to stripe cell inputs of multiple scales. Grid cells: MEC layer II (small scale 2D spatial code). Stripe cells: PaS / MEC deep layer (small scale 1D spatial code). Path Integration. Vestibular signals- linear velocity and angular head velocity. SOM. How do entorhinal cells solve the scale selection problem?
    1179. image p592fig16.24 Parameter settings in the Spectral Spacing Model that were used in simulations.
      || Simulation settings. Activity vs distance (cm). Learning trials: 40.
    1180. image p593fig16.25 Spectral Spacing Model STM, MTM, and LTM equations. The rate spectrum that determines the dorsoventral gradient of multiple grid cell properties is defined by μm.
      || Spectral Spacing Model equations. [STM, MTM, LTM]. μm = rate spectrum.
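      A caricature of the rate-spectrum idea, not the model's actual STM/MTM/LTM system: the same brief input is integrated by shunting cells with different rates μ, and the slower ("ventral") rates give broader, lower-amplitude responses, which for a moving animal translates into larger firing fields. All parameters below are assumptions for illustration.
```python
import numpy as np

# Shunting cells d/dt x = mu*(-A*x + (1 - x)*I(t)) driven by the same brief pulse, with different
# integration rates mu standing in for the dorsoventral rate spectrum (illustrative values only).
A, dt, steps = 1.0, 0.001, 4000
pulse = np.zeros(steps)
pulse[500:1000] = 1.0                                    # a 0.5 s input pulse
for mu in [4.0, 1.0, 0.25]:                              # "dorsal" (fast) to "ventral" (slow)
    x, trace = 0.0, []
    for k in range(steps):
        x += dt * mu * (-A * x + (1.0 - x) * pulse[k])
        trace.append(x)
    trace = np.array(trace)
    width = float((trace > 0.5 * trace.max()).sum()) * dt
    print(f"mu={mu:4.2f}  peak={trace.max():.3f}  half-width={width:.2f} s")
```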
    1181. image p593fig16.26 Data (left column) and simulations (right column) of the gradient of increasing grid cell spacing along the dorsoventral axis of MEC.
      || Gradient of grid spacing along dorsoventral axis of MEC (Brun etal 2008). data-[Distance (m?), Median grid spacing (m?)] simulations-[Grid spacing (cm), Grid spacing (cm)] vs response rate.
    1182. image p594fig16.27 Data (left column) and simulations (right column) of the gradient of increasing grid cell field width along the dorsoventral axis of MEC.
      || Gradient of field width along dorsoventral axis of MEC (Brun etal 2008). data-[Distance (m?), Width autocorr peak (m?)] simulations-[Grid field width (cm), Width autocorr peak (cm)] vs response rate.
    1183. image p595fig16.28 Data (left column) and simulations (right column) about peak and mean grid cell response rates along the dorsoventral axis of MEC.
      || Peak and mean rates at different locations along DV axis of MEC (Brun etal 2008). Peak rate (Hz) vs [data- DV quarter, simulations- Response rate].
    1184. image p596fig16.29 Data (top row) and simulations (bottom row) showing decreasing frequency of subthreshold membrane potential oscillations along the DV axis of MEC.
      || Subthreshold membrane potential oscillations at different locations along DV axis of MEC (Giocomo etal 2020; Yoshida etal 2011). Data: [oscillations (Hz) vs distance from dorsal surface (mm) @[-50, -45] mV, Frequency (Hz) vs [-58, -54, -50] mV]. Simulations: MPO frequency (Hz) vs [response, habituation] rate.
    1185. image p596fig16.30 Data (top row) and simulations (bottom row) of spatial phases of learned grid and place cells.
      || Spatial phases of learned grid and place cells (Hafting etal 2005). Data: Cross-correlogram of rate maps of two grid cells; Distribution of phase difference: distance from origin to nearest peak in cross-correlogram. Simulations: Grid cell histogram of spatial correlation coefficients; Place cell histogram of spatial correlation coefficients.
    1186. image p597fig16.31 Data (a) and simulations (b-d) about multimodal place cell receptive fields in large spaces. The simulations are the result of learned place fields.
      || Multimodal place cell firing in large spaces (Fenton etal 2008; Henriksen etal 2010; Park etal 2011). Number of cells (%) vs Number of place fields. [2, 3] place fields, 100*100 cm space.
    1187. image p597fig16.32 Data (top row) and simulations (bottom row) about grid cell development in juvenile rats. Grid score increases (a-b and d), whereas grid spacing remains fairly flat (c and e).
      || Model fits data about grid cell development (Wills etal 2010; Langston etal 2010). Data: [Gridness, grid score, inter-field distance (cm)]. Simulations: [Gridness score, Grid spacing (cm)] vs trial.
    1188. image p598fig16.33 Data (top row) and simulations (bottom row) of changes in place cell properties in juvenile rats, notably about spatial information (a,c) and inter-trial stability (b,d).
      || Model fits data about grid cell development (Wills etal 2010). [Data, Simulation] vs [spatial information, inter-trial stability]. x-axis [age (postnatal day), trial].
    1189. image p598fig16.34 The spiking GridPlaceMap model generates theta-modulated place and grid cell firing, unlike the rate-based model.
      || Theta-modulated cells in spiking model. [place, grid] cell vs [membrane potential (mV vs time), frequency vs inter-spike intervals (s), power spectra (normalized power vs frequency (Hz))].
    1190. image p599fig16.35 Data (a) and simulations (b,c) about anatomically overlapping grid cell modules. (a) shows the anatomical distribution of grid cells belonging to different modules in one animal. DV location (mm) vs postrhinal border. (b) shows the simulated distribution of learned grid cell spacings from two stripe cell scales. frequency (%) vs grid spacing (cm). mu = [1, 0.6]. (c) shows what happens when half the cells respond with one rate and half another rate. (d) shows the same with three rates. (e-g) show spatial maps and autocorrelograms of grid cells that arise from the different rates in (d). [rate map, autocorrelogram] vs [score [1.07, 0.5, 0.67], spacing (cm) [23.58, 41, 63.64]].
      ||
    1191. image p600fig16.36 The entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories. See the text for details.
      || Entorhinal-hippocampal interactions as an ART system. Hippocampal place cells as spatial categories. Angular head velocity-> head direction cells-> stripe cells- small scale 1D periodic code (ECIII) SOM-> grid cells- small scale 2D periodic code (ECII) SOM-> place cells- larger scale spatial map (DG/CA3)-> place cells (CA1)-> conjunctive-coding cells (EC V/VI)-> top-down feedback back to stripe cells- small scale 1D periodic code (ECIII). stripe cells- small scale 1D periodic code (ECIII)-> place cells (CA1).
    1192. image p602fig16.37 Data showing the effect of hippocampal inactivation by muscimol on grid cell firing before, during, and six hours after the muscimol, reading from left to right.
      || Hippocampal inactivation disrupts grid cells (Bonnevie etal 2013). muscimol inactivation. spikes on trajectory: [before, after min [6-20, 20-40, 40-60, 6h]]. rate map (Hz) [18.6, 11.4, 9.5, 6.7, 10.8]. spatial autocorrelogram g=[1.12, 0.05, -0.34, 0.09, 1.27].
    1193. image p603fig16.38 Role of hippocampal feedback in maintaining grid fields. (a) Data showing the effect of hippocampal inactivation before and during muscimol inhibition of hippocampal cells, as in Figure 16.37. (b) Model simulation with normal grid fields. (c) Model simulation that emulates the effect of hippocampal inhibition on grid fields.
      || (a) Data: hippocampal inactivation [before, after] cart [spikes on trajectory (p: [18.6, 6.7] Hz), spatial autocorrelogram (g= [1.12, 0.09])]. (b) Model: noise-free path integration, [spikes on trajectory (p: 14.56 Hz), rate map, spatial autocorrelogram (g= 1.41), dynamic autocorrelogram (g=0.6)]. (c) Model: noisy path integration + non-specific tonic inhibition, [spikes on trajectory (p: 11.33 Hz), rate map, spatial autocorrelogram (g= 0.05), dynamic autocorrelogram (g=0.047)].
    1194. image p605fig16.39 Data showing effects of medial septum (MS) inactivation on grid cells and network theta oscillations in medial entorhinal cortex (MEC). (A) Examples of disruption in the spatial expression of the hexagonal grid structure for two grid cells (Brandon etal 2011). (B) Temporary reduction in the power and frequency of network theta oscillations (Koenig etal 2011). (C) Temporary reduction in the gridness score, mean firing rate, and spatial stability of grid cells (Koenig etal 2011).
      || Disruptive effects of Medial Septum inactivation in Medial Entorhinal Cortex (Brandon etal 2011; Koenig etal 2011). (A) Rate map [rate map, spatial autocorrelations, trajectory] vs [baseline, sub-sampled, medial septum inactivation, 3-6 hour recovery, 24 hour recovery], [rate map (Hz- m, p), spatial autocorrelations (gridness)][ 1.2, 7.2, 1.1; 0.25, 1.7, 0.6; 0.25, 2.5, -0.53; 0.7, 5.1, 0.55; 1.0, 5.3, 1.3; 2.1, 15, 0.19; 1.7, 12, 0.71; 1.7, 3.2, -0.22; 1.8, 9.1, 0.68; 2.5, 13, 0.46]. (B) [normalized power at 7-9 Hz, frequency (Hz)] vs 5-minute periods. (C) [mean gridness score (+-SEM), mean firing rate (% of baseline), mean correlation coeff (+-SEM)] vs 10-minute periods.
    1195. image p607fig16.40 Effects of medial septum (MS) inactivation on grid cells. (a) Each row shows data and different data-derived measures of grid cell responsiveness, starting from the left with the baseline response to the middle column with maximal inhibition. (b) Data showing the temporary reduction in the gridness scores during MS inactivation, followed by recovery. (c) Simulation of the collapse in gridness, achieved by reduction in cell response rates to mimic reduced cholinergic transmission. (d,e) Simulations of the reduction in gridness scores in (d) by reduction of cell response rates, in (e) by changing the leak conductance. See the text for details.
      ||
    1196. image p611fig16.41 How back-propagating action potentials, supplemented by recurrent inhibitory interneurons, control both learning within the synapses on the apical dendrites of winning pyramidal cells, and regulate a rhythm by which associative read-out is dissociated from read-in. See the text for details.
      ||
    1197. image p612fig16.42 Macrocircuit of the main SOVEREIGN subsystems.
      || [reward input, drive input, drive representation (DR), visual working memory and planning system (VWMPS), visual form and motion system (VFMS), motor approach and orienting system (MAOS), visual input (VisIn), motor working memory and planning system (MWMPS), motor approach and orienting system (MAOS), motor plant (MotP), Proprioceptive Input (PropIn), Vestibular Input (VesIn), Environmental feedback (EnvFB). DR [incentive motivational learning-> [VWMPS, MWMPS], -> VFMS, -> MAOS], VWMPS [conditioned reinforcer learning-> DR, MAOS], VFMS [visual object categories-> VWMPS, reactive movement commands-> MAOS], MWMPS [conditioned reinforcer learning-> DR, planned movement commands-> MAOS], MAOS [motor map positions-> MWMPS, motor outflow-> MotP], VisIn-> VFMS, VesIn-> MAOS, EnvFB-> [VisIn, MotP, VesIn].
    1198. image p613fig16.43 The main visual form and motion processing stream mechanisms of SOVEREIGN, many of them described at length in previous chapters.
      || Render 3-D scene (R3DS), figure-ground separation (FGS), log-polar transform (LPT), Gaussian coarse-coding (GCC), Invariant visual target map (IVTM), What Fuzzy ART (WhatFuzz), body spatial coordinates (BSC), where reactive visual TPV storage (WRVTS), Directional transient cell network (DTCN), Motion direction hemifield map (MDHM), Hemifield left/right scoring (HLRS), reactive visual control signal (RVCS), Parvo/Magno/Erg competition (PMEC), Approach and Orient GOp (AOGp), GOm (GOm). R3DS [parvo-> FGS, magno-> DTCN], FGS-> [LPT, WRVTS], LPT-> GCC-> IVTM-> WhatFuzz, BSC-> [RVTS, PMEC], PMEC-> [gateRVTS-> RVTS, gateRVCS-> RVCS], DTCN-> MDHM-> HLRS, HLRS-> [PMEC, RVCS], AOGp-> gateRVTS, GOm-> gateRVCS.
    1199. image p613fig16.44 The main target position vector (TPV), difference vector (DV), and volitional GO computations in SOVEREIGN that bring together reactive and planned signals to control decision-making and action. See the text for details.
      || Reactive visual TPV (RVT), NETs (NETs), S-MV mismatch (SMVM), NETmv (NETmv), reactive visual TPV storage (RVTS), reactive DV1 (RD1), NET (NET), motivated what and where decisions (MWWD), Planned DV1 (PD1), tonic (Tonic), top-down readout mismatch (TDRM), Parvo gate (tonic) (PG), Orienting GOp offset (OGpO). RVT-> [NETs, RVTS], NETs-> [SMVM, NET], SMVM-> NET, NETmv-> SMVM, RVTS-> [NETs, RD1], NET-> [RD1, PD1, TDRM], MWWD-> PD1, PD1-> Tonic-> TDRMPG-> NETs, OGpO-> [NETmv, PD1].
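      A hedged sketch of the TPV/DV/GO idea in the figure above: in VITE-style circuits (Bullock, Grossberg 1988), which SOVEREIGN builds on, the present position vector integrates the GO-gated difference vector, so a larger volitional GO signal speeds the movement without changing where it ends. The code below is an illustrative toy, not SOVEREIGN itself; names and gains are assumptions.

        import numpy as np

        def vite_reach(target, present, go, dt=0.01, steps=500, gain=4.0):
            # Integrate a GO-gated difference vector: dP/dt = gain * GO * (T - P)
            T = np.asarray(target, dtype=float)
            P = np.asarray(present, dtype=float)
            path = [P.copy()]
            for _ in range(steps):
                DV = T - P                        # difference vector between target and present position
                P = P + dt * gain * go * DV       # the volitional GO signal gates movement speed
                path.append(P.copy())
            return np.array(path)

        slow = vite_reach(target=[1.0, 0.5], present=[0.0, 0.0], go=0.3)
        fast = vite_reach(target=[1.0, 0.5], present=[0.0, 0.0], go=1.0)
        # Both trajectories converge on the same target; the larger GO signal just gets there faster.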
    1200. image p614fig16.45 The main distance (d) and angle (a) computations that bring together and learn dimensionally-consistent visual and motor information whereby to make the currently best decisions and actions. See the text for details.
      || Reactive Visual TPV [m storage], NETm S-MV mismatch, MV mismatch, NETmv, PPVv, PPVm, Vestibular feedback, motor copy.
    1201. image p615fig16.46 SOVEREIGN uses homologous processing stages to model the (a) What cortical stream and the (b) Where cortical stream, including their cognitive working memories and chunking networks, and their modulation by motivational mechanisms. See the text for details.
      ||
    1202. image p615fig16.47 SOVEREIGN models how multiple READ circuits, operating in parallel in response to multiple internal drive sources, can be coordinated to realize a sensory-drive heterarchy that can maximally amplify the motivationally most currently favored option.
      ||
    1203. image p616fig16.48 SOVEREIGN was tested using a virtual reality 3D rendering of a cross maze (a) with different visual cues at the end of each corridor.
      ||
    1204. image p616fig16.49 The animat learned to convert (a) inefficient exploration of the maze into (b) an efficient direct learned path to the goal.
      ||
    1205. image p617fig16.50 The perirhinal and parahippocampal cortices enable adaptively timed reinforcement learning and spatial navigational processes that are modeled by Spectral Spacing models in the What and Where cortical streams, respectively, to be fused in the hippocampus.
      || What and Where inputs to the hippocampus (Diana, Yonelinas, Ranganath 2007). Adaptively timed conditioning and spatial navigation. Hippocampus <-> Entorhinal Cortex <-> [Perirhinal Cortex <-> what, Parahippocampal Cortex <-> where].
    1206. image p627tbl17.01 Homologs between reaction-diffusion and recurrent shunting cellular network models of development.
      || byRows: (reaction-diffusion, recurrent shunting net) (activator, excitatory activity) (inhibitor, inhibitory activity) (morphogenic source density, inputs) (firing of morphogen gradient, contrast enhancement) (maintenance of morphogen gradient, short-term memory) (power or sigmoidal signal functions, power or sigmoidal signal functions) (on-center off-surround interactions via diffusion, on-center off-surround interactions via signals) (self-stabilizing distributions of morphogens if inhibitors equilibrate rapidly, short-term memory pattern if inhibitors equilibrate rapidly) (periodic pulses if inhibitors equilibrate slowly, periodic pulses if inhibitors equilibrate slowly) (regulation, adaptation).
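      To make the left (reaction-diffusion) column of the table above concrete, here is a hedged, minimal 1D activator-inhibitor simulation in the spirit of (Gierer, Meinhardt 1972): because the inhibitor diffuses and equilibrates much faster than the activator, small fluctuations are contrast enhanced into a self-stabilizing activator pattern, the chemical analogue of the shunting network rows in the right column. Parameter values are illustrative assumptions, not taken from the book.

        import numpy as np

        def gierer_meinhardt_1d(n=100, steps=40000, dt=0.005,
                                Da=1.0, Dh=40.0, mu_a=1.0, mu_h=1.1, rho=1.0, rho0=0.01):
            # Explicit-Euler simulation of a 1D activator-inhibitor system (Gierer-Meinhardt style).
            rng = np.random.default_rng(1)
            a = 1.0 + 0.01 * rng.standard_normal(n)    # activator concentration
            h = 1.0 + 0.01 * rng.standard_normal(n)    # inhibitor concentration
            lap = lambda u: np.roll(u, 1) + np.roll(u, -1) - 2.0 * u   # periodic 1D Laplacian, dx = 1
            for _ in range(steps):
                da = rho * a * a / h - mu_a * a + Da * lap(a) + rho0   # autocatalysis vs decay + diffusion
                dh = rho * a * a - mu_h * h + Dh * lap(h)              # inhibitor driven by the activator
                a += dt * da
                h += dt * dh
            return a, h

        a, h = gierer_meinhardt_1d()
        # Because the inhibitor diffuses much faster than the activator (Dh >> Da), small random
        # fluctuations are contrast enhanced into a stable, spatially patterned activator distribution.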
    1207. image p628fig17.01 A hydra
      ||
    1208. image p628fig17.02 Schematics of how different cuts and grafts of the normal Hydra in (a) may (*) or may not lead to the growth of a new head. See the text for details.
      ||
    1209. image p629fig17.03 How an initial morphogenetic gradient may be contrast enhanced to exceed the threshold for head formation in its most active region.
      || head formation threshold, final gradient, initial gradient.
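      The recurrent shunting network analogue of this contrast enhancement (the right column of Table 17.01) can be sketched as below: a faster-than-linear feedback signal f(x) = x^2 in an on-center off-surround shunting network quenches the weaker sites and stores the initially most active one, which alone ends up above the head-formation threshold. This is a hedged toy illustration; the threshold and all parameters are assumptions.

        import numpy as np

        def contrast_enhance(x0, steps=150000, dt=0.01, A=0.01, B=1.0):
            # Recurrent shunting on-center off-surround network with f(x) = x^2 (winner-take-all).
            x = np.array(x0, dtype=float)
            for _ in range(steps):
                f = x ** 2                                      # faster-than-linear feedback signal
                total = f.sum()
                dx = -A * x + (B - x) * f - x * (total - f)     # on-center excitation, off-surround inhibition
                x = np.clip(x + dt * dx, 0.0, B)                # numerical safeguard; shunting keeps x in [0, B]
            return x

        initial_gradient = np.array([0.20, 0.22, 0.25, 0.28, 0.30])   # shallow initial morphogen gradient
        final_gradient = contrast_enhance(initial_gradient)
        head_threshold = 0.5                                          # assumed head-formation threshold
        # Only the initially most active site is stored near B; the others are quenched toward zero,
        # so only that site exceeds the head-formation threshold.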
    1210. image p630fig17.04 Morphogenesis: more ratios (Wolpert 1969). Shape preserved as size increases. French flag problem. Use cellular models! (Grossberg 1976, 1978) vs chemical or fluid reaction-diffusion models (Turing 1952; Gierer, Meinhardt 1972).
      ||
    1211. image p631fig17.05 How a blastula develops into a gastrula. See the text for details.
      || 1. The vegetal pole of the blastula flattens, [Animal, vegetal] hemisphere, blastocoel. 2. Some cells change shape and move inward to form the archenteron, Blastopore. 3. Other cells break free, becoming mesenchyme. 4. Then extensions of mesenchyme cells attach to the overlying ectoderm, Archenteron. 5. The archenteron elongates, assisted by the contraction of mesenchyme cells. 6. The mouth will form, where the archenteron meets ectoderm. 7. The blastopore will form the anus of the mature animal. [Mesenchyme, Ectoderm, Endoderm, Blastocoel, Archenteron, Mesenchyme]. Concept 38.3, www.macmillanhighered.com
    1212. image p634fig17.06 Summing over a population of cells with binary output signals whose firing thresholds are Gaussianly distributed (left image) generates a total output signal that grows in a sigmoidal fashion with increasing input size (dashed vertical line).
      || How binary cells with a Gaussian distribution of output thresholds generate a sigmoidal population signal. [# of binary cells with threshold T, Total output signal] vs Cell firing thresholds T. Cell population with firing thresholds Gaussianly distributed around a mean value. As input increases (dashed line), more cells in population fire with binary signals. Total population output obeys a sigmoid signal function f.
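      A quick numerical check of the point made in Figure 17.06 above, with an assumed threshold mean of 1.0 and standard deviation of 0.25: the summed output of binary cells with Gaussianly distributed firing thresholds follows the Gaussian cumulative distribution, i.e. a sigmoid of the input.

        import numpy as np
        from math import erf, sqrt

        rng = np.random.default_rng(0)
        mean, spread = 1.0, 0.25                                        # assumed threshold statistics
        thresholds = rng.normal(mean, spread, size=100_000)             # Gaussian firing thresholds

        inputs = np.linspace(0.0, 2.0, 41)
        population_output = [(thresholds < I).mean() for I in inputs]   # fraction of binary cells firing

        # The empirical curve matches the Gaussian CDF, i.e. a sigmoid of the input:
        analytic = [0.5 * (1.0 + erf((I - mean) / (spread * sqrt(2.0)))) for I in inputs]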
    1213. As described on the Introduction webPage, questions driving this "webSite" (collection of webPages, defined by the menu above) are :
    1214. How difficult would it be to augment "Transformer Neural Networks" (TrNNs) with Grossberg's [concept, architecture]s, including the emergent systems for consciousness? Perhaps this would combine the scalability of the former with the [robust, extendable] foundations of the latter, which is supported by [broad, diverse, deep] data from [neuroscience, psychology], as well as success in real world advanced [science, engineering] applications? This section is repeated in the Introduction webPage.
      Menu
    1215. Menu
    1216. Menu
    1217. Grossberg 2021 p229c2h0.60 SMART computer simulations demonstrate that a good enough match of a top-down expectation with a bottom-up feature pattern generates an attentive resonance during which the spikes of active cells synchronize in the gamma frequency range of 20-70 Hz (Figure 5.40). Many labs have reported a link between attention and gamma oscillations in the brain, including two articles published in 2001, one from the laboratory of Robert Desimone when he was at the National Institute of Mental Health in Bethesda (Fries, Reynolds, Rorie, Desimone 2001), and the other from the laboratory of Wolf Singer in Frankfurt (Engel, Fries, Singer 2001). You'll note that Pascal Fries participated in both studies, and is an acknowledged leader in neurobiological studies of gamma oscillations; eg (Fries 2009). ..."
    1218. Grossberg 2021 p081c2h0.66 sub-section: How does one evolve a computational brain?
      The above discussion illustrates that no single step of theoretical derivation can derive a whole brain. One needs a method for deriving a brain in stages, or cycles, much as evolution has incrementally discovered ever more complex brains over many thousands of years. The following theoretical method has been successfully applied many times since I first used it in 1957. It embodies a kind of conceptual evolutionary process for deriving a brain.

      Because "brain evolution needs to achieve behavioural success", we need to start with data that embodiey indices of behavioral success. That is why, as illustrated in Figure 2.37 Modelling method and cycle, one starts with Behavioral Data from scores or hundreds of psychological experiments. These data are analyszed as the result of an individual adapting autonomously in real time to a changing world. This is the Arty of Modeling. It requires that one be able to infer from static data curves the dynamical processes that control individual behaviors occuring in real time. One of the hardest things that I teach to my students to do is "how to think in real time" to be able to carry out this speculative leap.

      Properly carried out, this analysis leads to the discovery of new Design Principles that are embodied by these behavioral processes. The Design Principles highlight the functional meaning of the data, and clarify how individual behaviors occurring in real time give rise to these static data curves.

      These principles are then converted into the simplest Mathematical Model using a method of minimal anatomies, which is a form of Occam's Razor, or principle of parsimony. Such a mathematical model embodies the psychological principles using the simplest possible differential equations. By "simplest" I mean that, if any part of the derived model is removed, then a significant fraction of the targeted data could no longer be explained. One then analyzes the model mathematically and simulates it on the computer, showing along the way how variations on the minimal anatomy can realize the design principles in different individuals or species.

      This analysis has always provided functional explanations and Behavioral Predictions for much larger behavioral data bases than those used to discover the Design Principles. The most remarkable fact is, however, that the behaviorally derived model always looks like part of a brain, thereby explaining a body of challenging Neural Data and making novel Brain Predictions.

      The derivation hereby links mind to brain via psychological organizational principles and their mechanistic realization as a mathematically defined neural network. This startling fact is what I first experienced as a college Freshman taking Introductory Psychology, and it changed my life forever.

      I conclude from having had this experience scores of times since 1957 that brains look the way they do because they embody a natural computational realization for controlling autonomous adaptation in real-time to a changing world. Moreover, the Behavior -> Principles -> Model -> Neural derivation predicts new functional roles for both known and unknown brain mechanisms by linking the brain data to how it helps to ensure behavioral success. As I noted above, the power of this method is illustrated by the fact that scores of these predictions about brain and behavior have been supported by experimental data 5-30 years after they were first published.

      Having made the link from behavior to brain, one can then "burn the candle from both ends" by pressing both top-down from Behavioral Data and bottom-up from Brain Data to clarify what the model can and cannot explain at its current stage of derivation. No model can explain everything. At each stage of development, the model can cope with certain environmental challenges but not others. An important part of the mathematical and computational analysis is to characterize the boundary between the known and unknown; that is, which challenges the model can cope with and which it cannot. The shape of this boundary between the known and unknown helps to direct the theorist's attention to new design principles that have been omitted from previous analysis.

      The next step is to show how these new design principles can be incorporated into the evolved model in a self-consistent way, without undermining its previous mechanisms, thereby leading to a progressively more realistic model, one that can explain and predict ever more behavioral and neural data. In this way, the model undergoes a type of evolutionary development, as it becomes able to cope behaviorally with environmental constraints of ever increasing subtlety and complexity. The Method of Minimal Anatomies may hereby be viewed as a way to functionally understand how increasingly demanding combinations of environmental pressures were incorporated into brains during the evolutionary process.

      If such an Embedding Principle cannot be carried out - that is, if the model cannot be unlumped or refined in a self-consistent way - then the previous model was, put simply, wrong, and one needs to figure out which parts must be discarded. Such a model is, as it were, an evolutionary dead end. Fortunately, this has not happened to me since I began my work in 1957 because the theoretical method is so conservative. No theoretical addition is made unless it is supported by multiple experiments that cannot be explained in its absence. Where multiple mechanistic instantiations of some Design Principles were possible, they were all developed in models to better understand their explanatory implications. Not all of these instantiations could survive the pressure of the evolutionary method, but some always could. As a happy result, all earlier models have been capable of incremental refinement and expansion.

      The cycle of model evolution has been carried out many times since 1957, leading today to increasing numbers of models that individually can explain and predict psychological, neurophysiological, anatomical, biophysical, and even biochemical data. In this specific sense, the classical mind-body problem is being incrementally solved.

      Howell: bold added for emphasis.
      (keys : Principles-Principia, behavior-mind-brain link, brain evolution, cycle of model evolution)
      see also quotes: Charles William Lucas "Universal Force" and others (not retyped yet). Menu
    1219. [definitions, models] of consciousness.html -
    1220. What is consciousness: from historical to Grossberg - Menu
    1221. Menu
    1222. data from [neuroscience, psychology] : quick list, more details
    1223. success in real world advanced [science, engineering] applications (non-[bio, psycho]logical) A few common definitions of consciousness are provided on my webPage [definitions, models] of [consciousness, sentience]. However, for reasons given on that webpage, only Stephen Grossberg's concept provides a workable basis that is tied to [].
      A few models of consciousness are summarized on my webPage A quick comparison of Consciousness Theories. Only a few concepts are listed, almost randomly selected except for [Grossberg, Taylor]'s, as there are a huge [number, diversity] of concepts. Stephen Grossberg may have the ONLY definition of consciousness that is directly tied to quantitative models for lower-level [neuron, general neurology, psychology] data. Foundational models, similar in nature to the small number of general theories in physics that describe a vast range of phenomena, were derived over a period of ?4-5? decades BEFORE they were found to apply to consciousness. That paralleled their use in very widespread applications in [science, engineering, etc]. As such, this is the only solidly-based EMERGENT theory of consciousness that I know of. Grossberg's book provides a wonderful description :
    1224. John Taylor's concepts - The only other concept of consciousness that I felt even somewhat comfortable with was the late John Taylor's. It seemed to me that it emerged from the "Approximate Dynamic Programming" theories of Paul Werbos, which were inspired by Sigmund Freud's theories (which I didn't actually like in general, but had to admit their widespread adoption at one time, and their inspirational use), with a tremendous base of [theoretical, practical] applications to system [identification ????]. While I do provide a very brief summary on a separate webPage, it is not my current focus.
    1225. references- Grossberg and Menu
    1226. see Grossberg 2021: the biological need for machine consciousness
      Howell 30Dec2011, page 39 "Part VI - Far beyond current toolsets"
      ..." (Blake Lemoine, 2022)
    1227. 11Jun2022 Is LaMDA Sentient? — an Interview

      22Jun2022 We’re All Different and That’s Okay

      11Jun2022 What is LaMDA and What Does it Want?

      14Aug2022 What is sentience and why does it matter?

      More detail following from Sejnowski's thinking is on the webPage For whom the bell tolls. The following comment comes from that webPage.
      Menu
    1228. Historical thinking about consciousness.
    1229. Historical thinking about quantum [neurophysiology, consciousness] Menu
    1230. WRONG!! It may help the reader to re-visit comments about the historical thinking about consciousness, which is not limited to quantum consciousness. This complements items below. Early era of [General Relativity, Quantum Mechanics]: I would be greatly surprised if there wasn't some thinking about quantum consciousness at least back to the "modern inception" of quantum mechanics by Max Planck in 1901. Schrodinger seems to have gone at least partially in that direction by 1944 (see Historical thinking about quantum [neurophysiology, consciousness]). But as with the ancient Greeks, I would be surprised if others in the quantum mechanics community weren't thinking of mind in addition to matter in the early 1900s. To me, this would not be a solid assumption to make even if the lack of documentation is glaring.
      Pribram 1993 quantum fields and consciousness proceedings provides references back to 1960, and Jibu, Yasue comment that :
    1231. Howells questions about 1993 conference proceedings Menu see incorporate reader questions into theme webPage
      see Navigation: [menu, link, directory]s
    1232. p153 Howell: grepStr 'uncertainty' "multiple conflicting hypotheses" - a self-imposed practice to avoid becoming a [believer, tool] of a concept. But this was intended for [long-term, well-established, mainstream] theories, as well as new ideas that excite me. Does Grossberg's "uncertainty" concept also allow for "multiple conflicting hypotheses" to sit there and brew?
    1233. p190 Howell: [neural microcircuits, modal architectures] used in ART -
      bottom-up filters | top-down expectations | purpose
      instar learning | outstar learning | p200fig05.13 Expectations focus attention: [in, out]star often used to learn the adaptive weights. top-down expectations to select, amplify, and synchronize expected patterns of critical features, while suppressing unexpected features (see the instar/outstar sketch below this table)
      LGN Lateral Geniculate Nucleus | V1 cortical area | p192fig05.06 focus attention on expected critical features
      EC-III entorhinal stripe cells | CA1 hippocampal place cells | p600fig16.36 entorhinal-hippocampal system has properties of an ART spatial category learning system, with hippocampal place cells as the spatial categories
      auditory orienting arousal | auditory category | p215fig05.28 How a mismatch between bottom-up and top-down patterns can trigger activation of the orienting system A and, with it, a burst of nonspecific arousal to the category level. ART Matching Rule: TD mismatch can suppress a part of F1 STM pattern, F2 is reset if degree of match < vigilance
      auditory stream with/without [silence, noise] gaps | perceived sound continues? | p419fig12.17 The auditory continuity illusion: Backwards in time - How does a future sound let past sound continue through noise? Resonance!
      visual perception, learning and recognition of visual object categories | motion perception, spatial representation and target tracking | p520fig14.02 pART predictive Adaptive Resonance Theory. Many adaptive synapses are bidirectional, thereby supporting synchronous resonant dynamics among multiple cortical regions. The output signals from the basal ganglia that regulate reinforcement learning and gating of multiple cortical areas are not shown.
      red - cognitive-emotional dynamics
      green - working memory dynamics
      black - see [bottom-up, top-down] lists
      EEG with mismatch, arousal, and STM reset events | expected [P120, N200, P300] event-related potentials (ERPs) | p211fig05.21 Sequences of P120, N200, and P300 event-related potentials (ERPs) occur during oddball learning EEG experiments under conditions that ART predicted should occur during sequences of mismatch, arousal, and STM reset events, respectively.
      Cognitive | Emotional | p541fig15.02 nSTART neurotrophic Spectrally Timed Adaptive Resonance Theory: Hippocampus can sustain a Cognitive-Emotional resonance: that can support "the feeling of what happens" and knowing what event caused that feeling. Hippocampus enables adaptively timed learning that can bridge a trace conditioning gap, or other temporal gap between Conditioned Stimulus (CS) and Unconditioned Stimulus (US).

      background colours in the table signify :
      white | general microcircuit : a possible component of ART architecture
      lime green | sensory perception [attention, expectation, learn]. Table includes [see, hear, !!*must add touch example*!!], no Grossberg [smell, taste] yet?
      light blue | post-perceptual cognition?
      pink | "the feeling of what happens" and knowing what event caused that feeling
      Note that a separate webPage lists a very small portion of Stephen Grossberg's publications.
    1234. J.E. Kaal, A. Otte, J.A. Sorensen, J.G. Emming 2021 "The nature of the atom" www.Curtis-Press.com, 268pp ISBN 978-1-8381280-2-9 https://StructuredAtom.org/
    1235. rationalwiki.org "Quantum consciousness" (last update 07Nov2022, viewed 16Jul2023)
      also critiques of the article above
    1236. Terrence J. Sejnowski 21Aug2023 "Large Language Models and the Reverse Turing Test", Neural Computation (2023) 35 (3): 309–342 (33 pages) https://direct.mit.edu/neco/issue (also copy in case original link fails)
    1237. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin 12Jun2017 "Attention Is All You Need" [v5] Wed, 6 Dec 2017 03:30:32 UTC https://arxiv.org/abs/1706.03762
    1238. Wikipedia Consciousness Menu
    1239. Menu
    1240. from the section Grossberg's c-ART, Transformer NNs, and consciousness?:
      Menu
    1241. As per the second question from the section Grossberg's c-ART, Transformer NNs, and consciousness?:
      2. How difficult would it be to augment "Transformer Neural Networks" (TrNNs) with Grossberg's [concept, architecture]s, including the emergent systems for consciousness? Perhaps this would combine the scalability of the former with the [robust, extendable] foundations of the latter, which is supported by [broad, diverse, deep] data from [neuroscience, psychology], as well as success in real world advanced [science, engineering] applications?
      Menu
    1242. As per the first question from the section Grossberg's c-ART, Transformer NNs, and consciousness?:
      Menu
    1243. Menu
    1244. Menu
    1245. Grossbergs list of [chapter, section]s.html - Note that the links on this webPage can be used to individually view all captioned images.
    1246. directory of captioned images - users can easily view all of the captioned images, especially if they are downloaded onto their computer. Many image viewers have [forward, backward] arrows to go through these sequentially, or right-click to open a link in a window.
    1247. core bash script for extracting captions from webPage listing, convert them to images, then vertically appending them to the figure.
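      As a rough, hedged illustration of the same idea as that bash script (a Python/Pillow sketch, not the author's actual script; file names are hypothetical and Pillow must be installed): render the caption text as a white strip and append it below the figure image.

        from PIL import Image, ImageDraw, ImageFont   # Pillow
        import textwrap

        def append_caption(figure_path, caption, out_path, pad=10, wrap=90):
            # Render the caption as a white strip and append it below the figure image.
            fig = Image.open(figure_path).convert("RGB")
            font = ImageFont.load_default()
            lines = textwrap.wrap(caption, width=wrap)
            line_h = 14                                            # rough line height for the default font
            strip = Image.new("RGB", (fig.width, pad * 2 + line_h * len(lines)), "white")
            draw = ImageDraw.Draw(strip)
            for i, line in enumerate(lines):
                draw.text((pad, pad + i * line_h), line, fill="black", font=font)
            out = Image.new("RGB", (fig.width, fig.height + strip.height), "white")
            out.paste(fig, (0, 0))
            out.paste(strip, (0, fig.height))
            out.save(out_path)

        # hypothetical usage:
        # append_caption("p588fig16.18.png", "Simulations of coordinated learning ...", "p588fig16.18_captioned.png")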
    1248. my bash utility to [position, move] windows. This is normally used to start up 6 workspaces on my computer (Linux Mint Debian Edition), each with 5-10 apps in separate windows.
    1249. Prepared themes with links to the captioned images - there are a huge number of themes from the book to focus on. I have prepared a few as examples.
    1250. What is consciousness? - video example not ready as of 30Aug2023. I save videos as "ogv/ogg" files, an open standard format. The "VLC media viewer" is the program that I use to view them. I have found that although some of the standard video viewers complain, when pushed they can also view ogv files. Menu
    1251. Navigation: [menu, link, directory]s
    1252. Theme webPage generation by bash script
    1253. Notation for [chapter, section, figure, table, index, note]s
    1254. incorporate reader questions into theme webPages
      GNU Public License The GNU Free Documentation License; Creative Commons License Menu
    1255. A very primitive bash script is used to generate the search results for ALL themes in the Themes webPage. Many readers will already have far better tools for this from the Computational Intelligence area etc.
      Because the theme webPage is automatically generated, and frequently re-generated as I update the list of themes and sources, I do NOT edit the file directly. The output format can be confusing, due to the specially formatted [chapter, section] headings, and large tables which will keep the readers guessing whether they are still within the theme they want to peruse (as per the Table of Contents). Perhaps I can upgrade the searches in time to reduce the confusion, and to split themes in a better way.
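      A hedged sketch of what such a theme search amounts to (in Python rather than the author's bash script; the theme patterns and file names below are hypothetical): grep each source file for each theme's regular expression and write the hits into a simple HTML list, one section per theme.

        import re
        from pathlib import Path

        # hypothetical theme patterns, in the spirit of the grepStr searches used on this webSite
        themes = {
            "ART models": r"\b(c?ART(MAP|SCAN|STREAM|WORD)?|SMART|n?START|LAMINART)\b",
            "grid & place cells": r"\b(grid cell|place cell|stripe cell)s?\b",
        }
        sources = [Path("captions.txt"), Path("Howell notes.txt")]   # hypothetical input files

        with open("themes.html", "w") as out:
            for theme, pattern in themes.items():
                out.write(f"<h2>{theme}</h2>\n<ul>\n")
                for src in sources:
                    if not src.exists():
                        continue
                    for num, line in enumerate(src.read_text(errors="ignore").splitlines(), 1):
                        if re.search(pattern, line, flags=re.IGNORECASE):
                            out.write(f"<li>{src.name}:{num} {line.strip()}</li>\n")
                out.write("</ul>\n")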
    1256. list of [chapter, section]s
    1257. list of [figure, table]s
    1258. selected index items - I have NO intention of re-typing the entire index!
    1259. Grossberg quotes
    1260. reader Howell notes - this is an example of building your own webPage of [note, comment, thought]s when reading the book, which can then be added to the bash script for searches. These are notes in addition to [figure, table] captions, mostly comprised of text within the image, but also including quotes of text in the book. Rarely, it includes comments by Howell preceded by "Howell".
      The latter are distinct from "readers notes" (see, for example : Grossberg's list items- related notes from others). The reader may want to create their own file of comments based on this example, or augment this list with their [own, others'] notes. If using a new file, it should be added to the bash search script.
      More importantly, and as an easy first adaptation of Grossbergs [core, fun, strange] concepts.html thematic listings, you probably want to get rid of Howell's [comments, question]s. This can be done for a "local directory on your system" simply by :
    1261. downloading the entire set of webDirectories below to some directory on your filesystem, say {yourDir} : TrNNs_ART , bin (hopefully I'm not missing too many other directories in this list)
    1262. adapt the bash script bash script: thematic [search, collect]s.sh to your own system, and run. This will require re-defining several environment variables for your system, such as : Menu
    1263. thematic sub-lists appear in the webPage "Grossberg's [core, fun, strange] concepts", created by very simple searches for key [word, phrase]s. Links in the sub-lists lead quickly to pertinent figures or other content. Menu
    1264. 29Sep2023 Here is a list of various problems with the captioned images and their links on the webPage Grossbergs list of [figure, table]s.html :
      10Aug2023 I haven't yet provided content for this webPage. It does touch on one of three questions of this webSite as mentioned in the Introduction :
    1265. How difficult would it be to augment "Transformer Neural Networks" (TrNNs) with Grossberg's [concept, architecture]s, including the emergent systems for consciousness? Perhaps this would combine the scalability of the former with the [robust, extendable] foundations of the latter, which is supported by [broad, diverse, deep] data from [neuroscience, psychology], as well as success in real world advanced [science, engineering] applications? 10Aug2023 This webPage has not yet been worked on. It will touch on one of three questions of this webSite as mentioned in the Introduction :
    1266. How difficult would it be to augment "Transformer Neural Networks" (TrNNs) with Grossberg's [concept, architecture]s, including the emergent systems for consciousness? Perhaps this would combine the scalability of the former with the [robust, extendable] foundations of the latter, which is supported by [broad, diverse, deep] data from [neuroscience, psychology], as well as success in real world advanced [science, engineering] applications? 10Aug2023 I haven't yet provided content for this webPage. It does touch on one of three questions of this webSite as mentioned in the Introduction : Menu conscious ART (cART), etc
    1267. A surprisingly small number of neural architectures can simulate [extensive, diverse] [neuro, psycho]logical data at BOTH the [sub, ]conscious levels, and for [perception, action] of [sight, auditory, touch, language, cognition, emotion, etc]. This is similar to what we see in physics.
    1268. [extensive, diverse] ex-bio applications have been successfully [developed, applied], based on Grossberg etal's computational models.
    1269. see simple grepStr search results : 'ART|cART|pART|ARTMAP|ARTSTREAM|ARTPHONE|ARTSCAN|dARTSCAN|pARTSCAN|ARTSCENE|ARTSTREAM|ARTWORD|cARTWORD|LAMINART|PARSE|SMART|START|nSTART'
      Grossberg's concepts are NOT normally listed in [compilations, reviews] of consciousness, which is a [puzzle, failure] that I address separately.
      (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin 2017)
      Byoung-Kyong Min 2010 "A Thalamic reticular networking model of consciousness"
      "... The model suggests consciousness as a "mental state embodied through TRN-modulated synchronization of thalamocortical networks". In this model the thalamic reticular nucleus (TRN) is suggested as ideally suited for controlling the entire cerebral network, and responsible (via GABAergic networking) for synchronization of neural activity. ..." (Wiki2023) Menu
    1270. Menu directory status & updates copyrights
    1271. Menu directory status & updates copyrights
  • Mark Boslough and team at Sandia National Laboratory - Exploding asteroids, 1,950 views, Dec 27, 2010
  • CraterHunter (?Dennis Cox?) - A Catastrophe of Comets, The geophysical world according to me, and a few folks I happen to agree with, ~23Dec2010?
  • EU2015 speakers Bruce Leybourne and Ben Davidson - explain theories of our electromagnetic environment and the hot spots of current welling inside the Earth. 2015 (Ben Davidson video 24Feb2016) https://www.youtube.com/watch?v=mPcF40vBqzs https://www.thunderbolts.info/wp/2016/05/11/arc-blast-part-1/
  • https://www.thunderbolts.info/wp/2016/05/21/arc-blast-part-two/
    https://www.thunderbolts.info/wp/2016/05/28/arc-blast-part-three/
    https://www.thunderbolts.info/wp/2016/10/06/the-monocline/
    https://www.thunderbolts.info/wp/2017/01/20/the-maars-of-pinacate-part-one/
    https://www.thunderbolts.info/wp/2017/02/16/the-maars-of-pinacate-part-two/
    https://www.thunderbolts.info/wp/2017/04/22/natures-electrode/
    https://www.thunderbolts.info/wp/2017/05/21/the-summer-thermopile/
    https://www.thunderbolts.info/wp/2017/06/13/tornado-the-electric-model/
    https://www.thunderbolts.info/wp/2017/12/10/lightning-scarred-earth-part-1/
    https://www.thunderbolts.info/wp/2017/12/17/lightning-scarred-earth-part-2/
    https://www.thunderbolts.info/wp/2018/02/12/sputtering-canyons-part-1/
    https://www.thunderbolts.info/wp/2018/02/12/sputtering-canyons-part-2/
    https://www.thunderbolts.info/wp/2018/03/31/sputtering-canyons-part-3/
    https://www.thunderbolts.info/wp/2019/03/31/the-eye-of-the-storm-part-1/
    https://www.thunderbolts.info/wp/2019/05/05/the-eye-of-the-storm-part-2/
    https://www.thunderbolts.info/wp/2019/05/24/eye-of-the-storm-part-3/
    https://www.thunderbolts.info/wp/2019/06/20/eye-of-the-storm-part-4-2/
    https://www.thunderbolts.info/wp/2020/03/19/47212/
    https://www.thunderbolts.info/wp/2020/04/04/the-great-red-spot/
    https://www.thunderbolts.info/wp/2020/09/24/48437/
    https://www.youtube.com/watch?v=DgNTKrjpiiI&t=0s
    https://www.youtube.com/watch?v=_3ITTdl_QRY&t=0s
    https://www.thunderbolts.info/wp/2020/09/24/48437/
    https://youtu.be/_3ITTdl_QRY
    https://www.thunderbolts.info/wp/2020/10/31/eye-of-the-storm-part-8/
    https://youtu.be/2WS0vsVB4Tw
    https://www.thunderbolts.info/wp/2020/12/25/eye-of-the-storm-part-9/
    https://youtu.be/LwbsA-QDBFY
    https://www.thunderbolts.info/wp/2020/12/25/eye-of-the-storm-part-9/
    https://youtu.be/-KoJ9wpvD_g
    https://www.youtube.com/watch?v=-KoJ9wpvD_g
    https://www.thunderbolts.info/wp/2021/01/28/eye-of-the-storm-part-10-2/
    https://www.youtube.com/watch?v=hW4kCP-ascw
    https://www.patreon.com/posts/andrew-hall-egg-51555997?utm_medium=post_notification_email&utm_source=post_link&utm_campaign=patron_engagement
    https://thunderbolts.us7.list-manage.com/track/click?u=1b8e5fc5ffab70f95805dea12&id=f6b8bab8a7&e=54f3bc9169
    https://www.thunderbolts.info/wp/2021/08/20/the-shocking-truth/
    https://www.youtube.com/watch?v=Pt6NscQ2qS8 Thunderblog source article
    https://www.youtube.com/watch?v=ISfuOZgaN3c
    https://www.youtube.com/watch?v=i4jWPfNJ0rM&t=1s
    Shine On You Crazy Diamond
  • Immanuel Velikovsky - Velikovsky is a primary inspiration for a great deal of breakthrough thinking across many subjects! He was not liked by establishment science, but over time most of his [idea, prediction]s have been [right, insightful], and mainstream scientists wrong! That thing about Venus sprouting from [Saturn, Mars, something] (I forget which) in historical times is a bit much for me, but given his track record I am afraid to say that he was wrong.
  • Paul Anderson - A US Army research chemist, Paul was also a core team member for the SAFIRE experiment of an electrical model of the sun, which broke conventional physics theories and is leading to development and commercialisation efforts for [energy, de-radioisotope processes, ???]. (see also Howell 120903 Paul Anderson's Electric scarring of the Earth.pdf)
  • Rens Van Der Sluijs "Theories on the Rocks - In a Flash (Part Two)" 27Aug2021
  • Expanding Earth (EE) hypothesis [?Hildebrand?, Neal Adams (Batman artist), James Maxlow, ??? Hurrell?] - entirely subsumes plate tectonics and takes it to an entirely new level, both for geology and evolution.
  • Petroglyphs [David Talbot, Wal Thornhill, Anthony Peratt] - Mythology backed by space plasma science helps explain what some [mythology, petroglyphic images] may represent. This is far superior to any other explanations that I have seen (including ?Joseph Campbell's archetypes).
    Menu directory status & updates copyrights