#] #] *********************
#] "$d_web"'Neural nets/DevelopingMinds/210930 Oudeyer - Developmental AI [machines, children] learn better transcript.txt'
# www.BillHowell.ca 17Nov2022 initial
# view in text editor, using constant-width font (eg courier), tabWidth = 3
#48************************************************48
2021-09-30: Pierre-Yves Oudeyer, INRIA, "Developmental Artificial Intelligence: machines that learn like children and help children learn better". 55:30 duration, Technologies for fostering efficient learning and intrinsic motivation

Transcript

0:00 The time is exciting, as you said. 0:06 There was this very tiny field called developmental robotics that was started something like 20 years ago, 0:13 where a small group of people, maybe for the first time, 0:19 took seriously the research roadmap proposed by Alan Turing 50 years before, about 0:26 trying to build machines that would learn like a child. You said I was one of the pioneers, but I 0:33 have to say that you were also one of those pioneers, and Chen Yu and Yukie Nagai, who are the 0:38 co-organizers of this seminar, were also; so it's really an honor, and great for me, that you invited me 0:46 today. The time is exciting because more recently we've seen advances in 0:52 the field of machine learning, and especially deep learning, where people were not 0:58 initially trying to build machines that would learn like children, but 1:04 quite recently a form of convergence between those two fields 1:10 has happened, which I think is really opening new perspectives. This field, 1:15 I like to call it developmental artificial intelligence, and it's basically about 1:22 trying to build machines that learn like children, but also about applying some of those ideas in domains 1:29 like education, for example to help children learn better.

1:34 So let me try to present an outline of the research directions we've been working on with many colleagues 1:41 over recent years to develop this domain further. 1:47 First of all, as I just said, recently there were major advances in deep learning and especially deep reinforcement 1:54 learning, where people have been trying to learn controllers, and we've seen pretty 1:59 impressive results. A few years ago everyone heard about the AlphaGo system, 2:04 which could learn to play the game of Go and beat the best human players in the world, 2:10 which was thought to be just impossible two or three years before. And these advances were not 2:17 constrained to games like Go, but also went into the real world. For example, now you have 2:24 deep reinforcement learning systems that enable robots to walk in a very robust manner 2:29 on very rough terrain; you've got deep reinforcement learning systems 2:34 that have been used to control stratospheric 2:40 balloons to bring internet to remote populations; and you've had 2:45 deep reinforcement learning used, for example, to develop new drugs, which 2:50 can have a high impact in medicine. However, these systems, even if they are very 2:56 impressive, are very far from the kind of capabilities 3:01 that human infants have. Indeed, children are extraordinary, 3:07 not because some of them become world champions in games like Go 3:12 or chess (actually only very few of them become world champions in those games), but because nearly all of them 3:19 efficiently acquire a wide variety of everyday skills like locomotion, 3:24 building Lego structures,
riding bicycles, playing board games, mastering language, etc., and it's always evolving. And 3:32 this learning is autonomous, which means that there is never an engineer 3:38 that intervenes by opening the brain to change some parameters or give some specific reward function for a new task. 3:46 It's also developmental, because learning happens progressively, with a specific 3:51 timing and ordering. For example, the infant does not right away learn to walk on two feet: first it learns to hold 3:58 its head, later on to sit, to crawl, to stand on two feet with one arm on 4:03 the ground, and progressively to walk on two feet. So basically a fundamental question is how such developmental sequences form, 4:10 what the role of this structure is; and probably 4:15 such developmental pathways, what we can call a learning curriculum, are useful to address 4:21 a fundamental problem, which is how to guide the learning of so many things when you have so little 4:28 resources of time, of computation, and of energy.

And so basically in my team we've 4:34 been studying these questions along three dimensions. First, we've been building algorithmic models 4:40 to help understand human development better, and here we have a lot of interaction with developmental psychology and neuroscience. 4:47 Then we've been trying to extract those insights and transfer them into 4:52 machine learning, to build more robust, more autonomous, lifelong learning machines. And finally, 4:59 we've been studying how these insights can find applications in various domains such as educational technologies, for 5:06 example for personalizing sequences of exercises to maximize not only learning efficiency 5:12 but also motivation, or in the domain of automated discovery in physics or biology. It may seem remote 5:18 to you, but today I will try to illustrate these different facets, including the applications, which may 5:25 be important in the real world.

5:31 Then, taking a little bit of a step back: the work that we've been doing 5:36 has been relying, over the years, on three fundamental ideas coming from 5:43 developmental psychology. We've been really taking inspiration from them, studying them, modeling them. The first 5:49 fundamental idea is that the child is autotelic: it means that he is a curious 5:55 little scientist that's intrinsically motivated to spontaneously explore the world, to make 6:01 sense of the world, generating its own learning objectives, its own goals.
6:09 Another fundamental idea is that intelligence is embodied and develops 6:15 through self-organization of the dynamical system that's formed by the brain-body-environment interactions. 6:22 And finally, the third idea, which I will also develop, is that intelligence develops in a 6:29 social context, for solving social problems. And in this social context, 6:34 social peers are pretty key for several things. First of all, they scaffold learning: they 6:40 produce a form of socially induced curriculum learning. But also there is culture in general, and 6:47 language in particular, which become internalized and can become cognitive 6:52 tools, for example for planning and imagination.

Okay, so let's go through various 6:59 dimensions of research exploring these fundamental ideas. The first one: the child as a little scientist. This is 7:05 for example an idea championed by people like Piaget and Berlyne, but also many others. 7:11 One extraordinary property of child development is that children spend a lot of time spontaneously exploring their environment.
7:17 They are not doing it because they are given externally imposed tasks; rather, they 7:23 are driven by different forms of what psychologists call intrinsic motivation, and what we may call, in quotation marks, 7:29 "curiosity" in everyday language. During exploratory play, for example, they invent and pursue 7:37 their own problems, and such intrinsically motivated exploration 7:42 has been argued to be key for child development for several reasons: for example, for solving problems with rare 7:49 rewards, for learning world models efficiently, and for discovering open-ended skill 7:55 repertoires.

So basically we've been 8:01 trying to model these principles. As I said, psychologists proposed the idea of intrinsic 8:08 motivation, and indeed it was proposed already 8:13 more than 50 years ago, in the 40s and the 50s of the last century. Basically, psychologists hypothesized 8:19 that the brain contains circuits that push infants to be intrinsically interested in things like novelty, surprise, cognitive 8:26 dissonance, intermediate novelty, or optimal challenge, for example. But a major limit of this line of 8:33 work (it was very inspiring, but a major limit) was that these hypotheses 8:38 remained at a verbal level, and very little work was actually done until 8:44 around 20 years ago to understand the mechanisms better and to study these concepts 8:49 more experimentally. But there is a small group of people, starting 8:55 at around the same time as the beginnings of developmental robotics, that began to work on curiosity-driven 9:02 learning 20 years ago, trying to understand it in humans and trying to model it in machines. And 9:07 today it has become a relatively standard topic, even in very traditional machine 9:14 learning communities, like the people who go to the NeurIPS conference, for example: many people talk about curiosity, 9:20 including those people who want to write papers with many proofs, much mathematics, etc. 9:25 So today that's very standard. But 20 years ago, when you would come to a conference, and you were at the social 9:31 dinner, and you would say "okay, I am working on curiosity-driven learning", people would look at you with 9:37 big eyes, thinking you were the crazy scientist, you were really crazy. And this was happening 9:44 not only in AI but also in neuroscience and in psychology. But things have 9:50 changed, things have changed, and actually different communities 9:56 really took up the early work of the psychologists of the 40s and the 50s to 10:01 study those mechanisms operationally. There is a first community, 10:07 from which I have come with my colleagues, which has taken inspiration both from 10:14 developmental psychology and theoretical biology, where we developed some hypotheses that 10:21 I'm going to present today. But there are also other communities 10:27 who developed ideas related to curiosity-driven learning. For example, in machine learning you had people like Fedoroff and 10:34 John Andreae, who was probably the first to introduce novelty search in reinforcement learning, and then Schmidhuber, 10:39 and Barto and Satinder Singh and their colleagues. And then another 10:45 community that has, if I can say, evolved similar ideas, but independently, is the evolutionary computation community, 10:52 with people like Ken Stanley, Jean-Baptiste Mouret, Stéphane Doncieux and their colleagues. On our side, our focus has really been 10:59 on
modeling curiosity-driven exploration in humans, trying to understand how it 11:04 can be made to work for the acquisition of skills in the real world, and understanding how it links to 11:10 developmental self-organization. As I said, in the last 20 years a lot of 11:16 work has been done, the community has 11:22 changed its views on curiosity, and many collaborations were built up between 11:28 people of various domains. For example, I've been collaborating with Jacqueline Gottlieb, with Linda Smith, with Celeste Kidd, 11:34 and there are many others, and now there is really an interdisciplinary community in the world 11:40 working on this topic. Even neuroscience, which had been very reluctant 11:47 until around 10 years ago to work on those topics, now treats it as a 11:54 big topic: you can see, for example, that recently this made the cover 11:59 of the Nature Reviews Neuroscience journal. So things have changed.

12:04 And so our theoretical perspective has basically been to see the child as a sense-making organism, as a little 12:10 scientist that makes experiments to acquire good predictive models of the world, and, even more importantly, good 12:16 models that enable it to control the world with its actions. Most research in psychology, but 12:22 also research in machine learning, has so far focused on understanding what kinds of mechanisms enable efficient 12:28 learning if you assume you provide the data for learning, or if you provide 12:34 externally a given task to the child or to the machine. This is the vast majority of research so far. However, a very 12:41 fundamental question, often overlooked, is how an autonomous learner can decide for 12:46 itself what tasks to learn, when and in what order to learn them, and 12:52 even learn how to represent novel tasks as new discoveries are made. And so it is this kind of adaptive 12:58 metacognitive learning architecture, which I am picturing here, that we need. We've been basically trying to 13:04 understand what kinds of mechanisms we need inside to enable efficient 13:10 autotelic learning; autotelic learning is this concept that 13:15 an agent will learn by generating its own objectives, its own goals.

13:22 Okay, so there are different kinds of experiments that children can imagine and choose to pursue, different kinds of 13:27 learning situations they could choose, that correspond to different forms of intrinsically motivated exploration. 13:37 A first category, which we call knowledge-based intrinsic motivation systems, relies on basically generating 13:43 intrinsic rewards upon visiting some particular states, or experiencing a transition from a state to 13:49 another state when you do an action. Such intrinsic rewards include things like prediction errors and uncertainty, 13:56 and this can be seen as basically enabling an agent to choose to make prediction experiments. This has 14:02 often been done on relatively short time scales in the computational modeling literature and in 14:08 machine learning, and here it's important to realize there is no notion of a goal. 14:13 This family of intrinsic motivation systems has also been used primarily for 14:20 later solving efficiently problems with external but sparse 14:26 rewards, or for learning world models efficiently.
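To make this concrete, here is a minimal Python sketch (purely illustrative, not code from the talk) of such a knowledge-based intrinsic reward: a curiosity bonus equal to the prediction error of a learned forward model, with no notion of a goal anywhere. The forward_model object and its methods are assumed abstractions.

   # Minimal sketch: knowledge-based intrinsic motivation as forward-model prediction error.
   # Purely illustrative; forward_model.predict/update are assumed abstractions.
   import numpy as np

   class PredictionErrorCuriosity:
       def __init__(self, forward_model):
           self.forward_model = forward_model   # predicts next state from (state, action)

       def intrinsic_reward(self, state, action, next_state):
           predicted = self.forward_model.predict(state, action)
           error = float(np.linalg.norm(next_state - predicted))  # surprise = prediction error
           self.forward_model.update(state, action, next_state)   # keep improving the world model
           return error                                           # note: no goal anywhere here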
14:26 But then there is a second category, 14:32 which we've been calling competence-based intrinsic motivation, and which is basically at the core of 14:38 what we call autotelic agents: systems where the agent imagines its own goals 14:45 and selects the ones it wants to pursue, and in what order. Here the intrinsic reward is basically replaced 14:50 by a measure of interestingness that's applied to imagined goals, such as, for 14:56 example, the expected competence progress to reach these 15:01 generated goals. So basically the agent is deciding which goal-achievement experiment to make, and this 15:07 has been studied over longer time scales in the literature; it's been particularly used to enable agents 15:14 to develop open-ended skill repertoires. In the talk today I will actually give examples that are more on that side: 15:21 I will present research about autotelic agents.

So concretely, how does autotelic 15:28 learning work? Here is the default structure that's common to all algorithms in a class that we've been calling 15:35 IMGEPs, intrinsically motivated goal exploration processes, which is basically a framework 15:41 that we've developed to try to organize research in this domain of autotelic learning. Here is the loop 15:48 they basically all follow. First, the agent observes a context; then it samples 15:53 a goal with an internal goal generator; then it generates an appropriate sequence of actions with a goal-conditioned 16:00 policy to pursue the goal inside the environment; and then, after 16:06 taking this sequence of actions, it uses its own goal achievement function to evaluate for itself its own 16:13 performance. Sometimes it succeeds, and so it reinforces what it has been doing; but 16:18 sometimes it fails. Even when it fails, most autotelic learning 16:23 algorithms include some form of what's called hindsight learning, which basically relabels what was achieved 16:29 with the goal that could have been imagined: the one that was actually achieved and was a 16:35 success. For example, imagine you want to push a box to the left: you 16:41 try something and actually you push it to the right. You failed on the left, but then you can say "okay, if my goal had 16:46 been to push it to the right, I would have succeeded", and you can learn from this. And then, based on the data collected, all 16:52 kinds of internal models are updated. But 16:58 research on autotelic learning with such a loop poses many challenges to achieve all of this efficiently, 17:04 and so I'd like to illustrate a few aspects of how to do that efficiently, and how humans might do that.
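Written as pseudocode, the loop just described looks roughly like this (a minimal Python sketch of the IMGEP structure; every component name here, goal_generator, policy, goal_achieved and so on, is an assumed abstraction, not the speaker's implementation):

   # Minimal sketch of one IMGEP-style autotelic episode, with hindsight relabeling.
   # Illustrative only: all components are assumed abstractions.
   def autotelic_episode(env, agent):
       context = env.observe()                        # 1. observe a context
       goal = agent.goal_generator.sample(context)    # 2. sample a goal internally
       trajectory = agent.policy.rollout(env, goal)   # 3. act with a goal-conditioned policy
       outcome = trajectory.final_outcome()
       if agent.goal_achieved(goal, outcome):         # 4. self-evaluate with own achievement fn
           agent.reinforce(trajectory, goal)          #    success: reinforce what was done
       else:
           achieved = agent.describe(outcome)         # 5. hindsight: which goal WAS achieved?
           agent.reinforce(trajectory, achieved)      #    learn as if that had been the goal
       agent.update_models(trajectory)                # 6. update all internal models from the data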
17:13 First, I will discuss the importance of goal sampling strategies, and in particular how an agent can automate its 17:18 goal sampling, and how this can produce a form of self-organized learning curriculum. 17:24 So what's an interesting goal, what's an interesting learning experiment for the brain to make, 17:31 biological or artificial? Many ideas have been proposed, in fields ranging from psychology to biology to AI, 17:37 often quite independently. One idea is that an interesting experiment is one which leads to the experience of high 17:43 novelty or high prediction errors. But in the real world that's not going to work at all: 17:48 for example, this could lead a robot to spend its whole day staring through the window, making movements with its arms 17:54 and trying to predict how the colors of the cars passing by may change as a function of its movements; 17:59 there are always prediction errors, of course. But among other ideas, I will now explain 18:05 one idea that we developed with colleagues and studied a lot, inspired 18:12 initially by the work of Varela and Maturana in biology, and that's around the concept of empirical learning 18:18 progress. The idea is that the interestingness of 18:24 a learning situation, of a goal for example, is proportional to the empirical change 18:30 in goal achievement error: basically, how much you improve at achieving the goal. But this change can also be 18:38 negative, a decrease in competence, and this will enable, as we will see, driving 18:43 interest towards parts of an environment that change, or skills that are being forgotten. This idea we've been 18:49 calling the learning progress hypothesis, but in practice it also includes learning regress, so it's absolute 18:55 learning progress. Such a mechanism can actually automatically self-organize exploration 19:00 along a curriculum, and to visualize how, let's imagine a situation where you have a robot 19:05 that can explore four sensorimotor activities that are different and characterized by different learning 19:12 rates, as we see on top of the graph. Each curve here shows the evolution 19:17 of the errors in each activity if you assume the learner would focus on practicing it for a 19:23 long time. Then exploration driven by the learning 19:29 progress principle will result in avoiding activities that are too easy or too difficult to learn: it will first 19:34 focus on the activity with the fastest learning rate and eventually, when that starts to reach a plateau, switch to 19:39 the second most promising learning situation.

19:45 So let's study this idea in the context of a simulated robot environment 19:52 such as this one. Here the robot is free to choose and switch to any goal it might learn to achieve. In this 19:58 particular experiment it's given a space of goals from which it can sample, which includes a variety of things like 20:05 reaching various objects, grasping various objects, stacking some objects. But one challenge 20:11 is that we put in distractors: for example, we put objects in the environment that move 20:17 randomly, or that are too far away to be controlled by the robot, and thus there are many goals in this goal space that 20:24 are just impossible to learn, such as for example stacking uncontrollable objects. But initially the robot does not know 20:31 that there are goals that are uncontrollable, and it doesn't know which ones; and also, among the feasible goals, some 20:38 are easy, some are more difficult, and some depend on mastering other skills first to become learnable. 20:46 So basically, if the robot samples goals randomly here, it will be very inefficient at learning; there is 20:52 really a need for automatic curriculum learning, and this is realized by sampling the goals according to their 20:57 expected absolute learning progress. In practice 21:02 this poses several technical difficulties. One of them is how to estimate, track and predict the 21:08 learning progress for various goals, because you cannot observe it directly, so you need to estimate it; for 21:15 this we developed specific approaches based on particular kinds of regression models, 21:21 especially non-stationary regression models. Then another challenge is how to 21:28 find efficiently the niches of learning progress: what kinds of goals might provide further learning 21:35 progress. For this we've been using bandit algorithms that are dedicated to non-stationary problems, because indeed 21:41 a kind of learning situation or goal that provides learning progress at one point will, by construction, not provide any more learning 21:47 progress later on, once you've learned it.
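As a minimal illustration of the principle (a toy sketch, not the actual regression-plus-bandit machinery just mentioned): a non-stationary bandit over goal types, where each arm's utility is its recent absolute learning progress, estimated from a sliding window of competence measurements. The window size and the epsilon exploration rate are arbitrary illustrative choices.

   # Minimal sketch: goal-type sampling by absolute empirical learning progress.
   # Illustrative only; competence is assumed observable per episode, in [0, 1].
   import random
   from collections import deque

   class LearningProgressBandit:
       def __init__(self, goal_types, window=20, epsilon=0.1):
           self.history = {g: deque(maxlen=window) for g in goal_types}
           self.epsilon = epsilon   # residual exploration so progress niches can be (re)discovered

       def learning_progress(self, g):
           h = list(self.history[g])
           if len(h) < 4:
               return 0.0
           half = len(h) // 2
           older = sum(h[:half]) / half
           recent = sum(h[half:]) / (len(h) - half)
           return abs(recent - older)   # ABSOLUTE progress: regress (forgetting) is interesting too

       def sample_goal_type(self):
           goals = list(self.history)
           if random.random() < self.epsilon:
               return random.choice(goals)
           lp = {g: self.learning_progress(g) for g in goals}
           total = sum(lp.values())
           if total == 0.0:
               return random.choice(goals)
           r = random.uniform(0.0, total)   # sample proportionally to estimated |LP|
           for g in goals:
               r -= lp[g]
               if r <= 0.0:
                   return g
           return goals[-1]

       def update(self, g, competence):
           self.history[g].append(competence)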
But today I won't delve into those 21:54 technical details; I rather want to focus on the good properties of this sampling scheme, when you 22:00 assume you can already estimate the learning progress. On the right we actually see the emergence of a 22:06 learning curriculum: on the top you see the progressive acquisition of skills of increasing complexity, and on the 22:12 bottom you see the estimates of learning progress, which are basically proportional to the probabilities of selecting each goal type. We can 22:19 basically check that here tasks 5, 6 and 7, which are the unlearnable ones, 22:25 are indeed very little sampled. So basically it means that the robot has learned to sample the goals of 22:31 appropriate complexity, of appropriate learnability, avoiding those that are either too easy or too difficult.

22:39 Another good property of this goal sampling scheme is that it can address what people have called catastrophic 22:46 forgetting. Catastrophic forgetting happens sometimes with neural networks, when training them on a new task basically 22:53 erases what was learned about the previous task. Here we see this phenomenon: 22:58 after mastering the green type of goal, on the left, the robot focuses 23:05 on the yellow one, and this leads to a drop in performance on the green one. This is actually caught by the 23:10 estimate of absolute learning progress, which we see in the middle, where there is a new bump, a second bump, in the 23:18 green curve; and then the green task is re-practiced again, 23:24 as we see in the selection probabilities on the right, and in the end it's 23:29 reacquired, and everything ends up mastered.

So in the model I just described, the 23:35 goal space representation was handmade, but it's also possible that autotelic agents learn the goal space, for example in 23:41 the case of robots perceiving raw images. One natural approach has been to train what's called a 23:47 generative model, such as a variational autoencoder, to learn an embedding of images, and then to sample goals in that 23:54 embedding. This was for example done in the RIG work, 23:59 with uniform sampling in the prior; but also, in another work, we did this with an algorithm we called UGL, and another 24:06 team did something they called Skew-Fit, where goals are basically sampled on the frontier of the known 24:11 distribution of goals; this is basically implementing a form of novelty search when sampling 24:17 goals in the learned representation space. The advantage is that it enables much 24:22 more autonomy as compared to the systems before; but because it is doing some form of novelty search, it's not going to be 24:28 robust to environments with distractors or many objects. 24:33 So to deal with this, and to address robustness to distractors, we've been applying the 24:40 learning progress idea, for example in the GRIMGEP system, where we use a learning-progress-based bandit and where 24:45 the arms are unsupervised learned clusters in the learned embedding, for example using Gaussian 24:52 mixture models.
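As a minimal sketch of the generic recipe (illustrative, in the spirit of RIG rather than its actual code; vae.encode and latent_dim are assumed): sample imagined goals from the latent prior of a trained VAE, and measure goal achievement as a distance in the same latent space.

   # Minimal sketch: goal imagination in a learned VAE embedding (RIG-flavoured, illustrative).
   import numpy as np

   def sample_latent_goal(latent_dim):
       return np.random.randn(latent_dim)            # sample a goal from the unit-Gaussian prior

   def goal_distance(vae, observation_image, z_goal):
       z_obs = vae.encode(observation_image)         # embed the current observation (assumed API)
       return float(np.linalg.norm(z_obs - z_goal))  # latent distance doubles as achievement cost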
We've also been using beta-variational 24:58 autoencoders to learn disentangled embeddings, that is, representations separating different objects from raw pixels, which enables 25:05 further organization of the learning-progress-based exploration.

25:10 Okay. 25:17 So the learning progress hypothesis is something we initially proposed and developed as a 25:24 theoretical approach to understand how humans can explore large spaces of tasks, potentially with many distractors. 25:31 So far I've been talking to you mostly about studying 25:36 this hypothesis in the context of seeing how it enables machines to learn autonomously and efficiently in 25:44 artificial environments. But a big question 25:49 is basically whether humans actually use learning progress to organize their 25:55 curiosity-driven exploration, and of course we wanted to test that in humans; 26:02 it remained untested in humans until recently. This is exactly what we did in a 26:08 paper that's now in press in Nature Communications, joint work with Jacqueline Gottlieb 26:15 at Columbia University in New York. In this experiment we basically propose to humans four learning tasks 26:22 that they are free to explore; they are not given any objectives. 26:28 These tasks are basically about learning which food types certain 26:33 monsters, with different shapes and from different families, prefer. 26:41 There is basically one task that's very easy to learn, there are two tasks that are more difficult to learn but 26:47 still learnable, one that is really difficult to learn but learnable, and there is one task which on 26:52 purpose is made completely unlearnable, but people don't know that. There is also a control group 27:00 of people to whom we gave explicitly the objective of maximizing learning; we added this group to see whether the free 27:06 group would actually do the same.

A first result that we 27:13 obtained is that indeed a substantial part of the free exploration group actually behaved in a very similar 27:20 pattern to the group whom we asked to maximize learning. 27:27 We made many analyses; today I'm not going to enter into the details, I will focus 27:33 on one computational modeling analysis, where we've been studying which utility 27:38 functions, considering a space of many potential utility functions that include 27:43 alternative hypotheses to the learning progress hypothesis (for example, minimizing or maximizing prediction 27:50 errors), 27:56 explain human exploration best. What we found is that the utility function that 28:03 explains human exploration best actually includes, as a key component, the estimate 28:09 of absolute learning progress. So that was a very encouraging first 28:14 step to show that humans might use learning progress to monitor 28:19 their exploration. Okay, there are many things to be said about this, but 28:26 we went now through pretty fundamental research; let us go to applications, and later on I will go back to 28:31 fundamental research and again to applications.

Okay, now I'd like to look at 28:37 concrete applications that are possible from this fundamental research on autotelic learning and the learning 28:43 progress hypothesis. The first application is for machine 28:49 learning. Here the idea is to actually reuse learning-progress-based automatic curriculum
learning, especially 28:55 approaches based on Gaussian mixture models, to train very classical deep reinforcement learning 29:00 agents that are given one external task, for example learning to locomote 29:05 robustly. We basically want them to acquire policies that can locomote robustly in 29:11 various kinds of environments, with various kinds of obstacles. To foster generalization, one 29:17 approach is to use procedural content generation, so that one can generate diverse environments for training; but 29:23 then the challenge is how to control the evolution of the sampled environments, especially when you have a deep reinforcement learning student with 29:30 initially unknown capabilities, with a body that also has unknown capabilities, and very many parameters in the 29:36 procedural generation, for example controlling the shape and the spacing of obstacles, and you don't really know what's learnable, what's not 29:42 learnable, what's easy, what's difficult. So actually you can use the learning-progress-based system to organize the 29:48 curriculum, and what we've been showing is that if you use such an approach, as opposed to random sampling of 29:54 environments, which is actually often done in the deep reinforcement learning domain, then not only can you 30:00 learn policies much faster, but even if you give a lot of time 30:07 to the random curriculum, it may never end up learning policies, for example, 30:12 to get over certain categories of obstacles, as we show here; whereas with automated curriculum learning it will 30:17 actually learn very robust policies that enable a wider variety 30:22 of generalization. We've been showing this 30:28 in a very robust manner: we've been using this kind of teacher algorithm to train many different kinds of deep 30:35 reinforcement learning systems, many different algorithms, many different bodies, many different environments, 30:41 and we've actually provided a benchmark, called TeachMyAgent, with all the code, for those who want to build on 30:47 that.
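A minimal sketch of such a teacher (illustrative, in the spirit of the Gaussian-mixture learning-progress approach mentioned above, not the benchmark's code; assumes scikit-learn and an environment parameterized by a continuous vector): fit a Gaussian mixture on recently sampled environment parameters annotated with their absolute learning progress, then prefer the component with the highest mean |LP|.

   # Minimal sketch: a learning-progress-based teacher over continuous environment parameters.
   # Illustrative only; window sizes, epsilon and n_components are arbitrary choices.
   import numpy as np
   from sklearn.mixture import GaussianMixture

   class LPTeacher:
       def __init__(self, param_low, param_high, fit_every=50):
           self.low, self.high = np.array(param_low), np.array(param_high)
           self.history = []                 # rows of [env params..., |learning progress|]
           self.gmm = None
           self.fit_every = fit_every

       def sample_env_params(self):
           if self.gmm is None or np.random.rand() < 0.2:   # keep exploring parameter space
               return np.random.uniform(self.low, self.high)
           k = int(np.argmax(self.gmm.means_[:, -1]))       # component with highest mean |LP|
           params = np.random.multivariate_normal(
               self.gmm.means_[k][:-1], self.gmm.covariances_[k][:-1, :-1])
           return np.clip(params, self.low, self.high)

       def update(self, params, abs_lp):
           self.history.append(np.concatenate([params, [abs_lp]]))
           if len(self.history) % self.fit_every == 0:
               data = np.array(self.history[-500:])         # recent window: non-stationarity
               self.gmm = GaussianMixture(n_components=4).fit(data)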
But because this approach to automatic curriculum learning is very robust, and 30:53 is initially inspired by models of human curiosity-driven learning, another natural use is to personalize learning 30:58 curricula for children in the domain of educational technologies. So instead of generating a 31:04 learning curriculum for robots and machine learning algorithms, here we were able to show that a 31:09 learning-progress-based automatic curriculum can actually allow generating 31:15 personalized curricula for human learners, to maximize learning efficiency and 31:21 motivation. An example is the personalization of sequences of exercises for learning mathematics, 31:27 which we've been exploring and experimenting with in primary schools at a pretty large scale: 31:32 a few years ago with a first experiment over one thousand children, 31:38 and now it's being disseminated at large to thousands of children. 31:43 The basic idea is to track, for each student, their learning progress according to various properties of exercises, and 31:50 to incrementally propose exercises that maximize learning progress. The principle is really the same as before: 31:55 there is a space of exercises that's organized in a graph (this graph brings 32:02 a little bit more structure; it also allows bringing a little bit of expert knowledge to hot-start the system), and each node is a family of exercises with similar parameters, from which one can sample. 32:15 The system is hot-started by considering only easy nodes in the graph, and it's using learning progress to 32:20 sample exercises among the different kinds of exercises corresponding to these nodes. 32:26 Actually, to be precise, the system does not completely impose exercises: it recommends a few choices to the 32:32 child from this pool, and the child makes the final choice. Then, as the learning proceeds, the 32:38 set of nodes in the bandit evolves in the direction of high learning progress, and so on. We were basically 32:43 able to show that such automated personalized curricula allowed bringing 32:48 all children to higher levels than curricula that were handmade by a pedagogical expert, and it was especially 32:54 the case for children with either special difficulties or special 33:01 talents; these kinds of children typically don't fit well with the handmade curricula that are made by 33:06 experts. So here basically we can better address the diversity of children. 33:12 We also showed that not only does it enable better learning, but also the state of intrinsic motivation in 33:18 children who used the system is basically better than the intrinsic 33:23 motivation of children who used the curriculum made by a pedagogical expert. 33:30 And recently we've been working with a consortium of educational technology companies, which are now 33:37 integrating this approach in a large-scale educational solution, which 33:42 is called Adaptiv'Math, to be disseminated at a very large scale in France.
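A minimal sketch of this recommendation loop (illustrative; the node/graph structure and mastery threshold are hypothetical choices, not the deployed system): reuse the learning-progress idea over the currently unlocked graph nodes, recommend a few candidate exercise families, and let the child make the final pick.

   # Minimal sketch: LP-based exercise recommendation over a prerequisite graph.
   # Illustrative only; graph maps each node to its harder successor nodes.
   from collections import deque

   class ExerciseRecommender:
       def __init__(self, graph, start_nodes, window=10):
           self.graph = graph                    # node -> list of successor (harder) nodes
           self.active = set(start_nodes)        # hot start: only easy nodes at first
           self.results = {n: deque(maxlen=window) for n in graph}

       def lp(self, node):
           r = list(self.results[node])
           if len(r) < 4:
               return 0.5                        # optimistic default so new nodes get tried
           half = len(r) // 2
           return abs(sum(r[half:]) / (len(r) - half) - sum(r[:half]) / half)

       def recommend(self, k=3):
           # propose a few high-progress choices; the child makes the final pick
           return sorted(self.active, key=self.lp, reverse=True)[:k]

       def report(self, node, success):
           self.results[node].append(1.0 if success else 0.0)
           r = self.results[node]
           if len(r) >= 6 and sum(r) / len(r) > 0.8:
               self.active.update(self.graph[node])   # unlock harder successors when mastered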
33:50 Okay, and finally I'd like to talk about another very interesting and broad area of application of autotelic learning 33:55 algorithms: they can actually be used as autonomous discovery assistants for scientists who 34:01 study and explore the emergence of highly structured patterns in many complex dynamical systems, ranging from 34:06 biological organisms to chemical systems, and even artificial ones like the Game of Life. 34:12 In those so-called morphogenetic systems we are still far from grasping the range of structures that can 34:18 self-organize, and we believe that autotelic learning can help. For example, here 34:23 Jonathan Grizou, a former PhD student in our team, went to the lab of Lee Cronin, a 34:30 chemistry lab, to use autotelic learning algorithms to explore 34:35 the behavior of oil droplets, which are used by chemists to study the origins of life. 34:42 Basically, those algorithms were used to control an automated robotic factory that makes experiments automatically, 34:47 and they could show that by using the goal exploration algorithms they could 34:54 actually discover a wider diversity of phenomena than they knew before. 34:59 And recently we've also been using these approaches to explore the formation of organized structures 35:05 in a continuous Game of Life, which has been a very important testbed over the years for fundamental questions in 35:11 artificial life and theoretical biology. Recently, in a series of 35:18 papers, including one we are working on right now, we managed to show, 35:25 maybe for the first time, how it's possible to evolve 35:31 sensorimotor agency in continuous cellular automata. Maybe many of you are familiar with the gliders in the 35:38 Game of Life, which are a kind of spatially localized pattern; but those kinds of gliders do not really have 35:45 sensorimotor behavior: there is the emergence of a body, but with no organized 35:51 interaction with the environment, and no way to preserve their integrity when they are perturbed by the environment. What we 35:57 could evolve here with autotelic learning are whole creatures: initially there is only an 36:04 environment, and there emerge localized creatures, localized bodies, 36:09 which can move around and which can, for example, have sensorimotor behavior by 36:14 being robust to perturbations like obstacles, go around them, and actually generalize very strongly.

36:22 Okay, so let's go now to the second fundamental idea I 36:28 mentioned initially, an idea in particular championed by Linda Smith 36:34 and Esther Thelen, as well as many others, which is that intelligence is embodied and develops 36:40 through the self-organization of the dynamical system formed by brain-body-environment interactions. 36:45 One way that's very fundamental to see what this 36:52 approach means is trying to understand the role of physics in generating 36:57 organized walking, or locomotion. Many people have thought for a long time that locomotion 37:03 was a complicated control problem, where the brain has to decide every millisecond what's the optimal sequence 37:09 of motor signals to send for maximizing a certain cost, like walking far without falling 37:16 and without spending energy. But people like Tad McGeer, for example, said: okay, what about 37:23 building a pair of legs with no motors, with no computer, but with a geometry that is similar to human legs? And they 37:29 could show that, through the interaction between the mechanical structure and gravity, 37:34 there is self-organization of a gait that's very robust and human-like. And this is the same idea that 37:41 we've been exploring, with Olivier Ly in particular, a few years ago, when we began to build a 37:48 robot with a vertebral column, inspired by human physiology, with elastics at different 37:55 places. We could show, with only very simple controllers, with no model at all of the body and no 38:01 model of the interaction with the environment, only very basic reactive systems, that 38:06 these robots could be very robust in their balance, but also could produce, in an emergent manner, 38:13 forms of interaction with humans. For example, we discovered 38:18 in a public show that little children would take the hand of the robot, and the robot would follow: 38:24 no one programmed this, it is just a spontaneous consequence of the interaction between gravity, the elastics, 38:30 and the small forces that are applied by the human.

38:37 So basically this idea of the body providing structure can also be 38:43 combined with autotelic learning. For example, this is what we've been doing in an experiment where we've been 38:49 using a platform in which we give the robots what are called dynamic movement primitives; this is 38:56 something developed in the robotics field, initially inspired by models of certain 39:01 kinds of neuromuscular synergies observed in certain animals, and here 39:07 we've been equipping those robots with those dynamical systems to produce organized movements.
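For readers unfamiliar with them, here is one common textbook formulation of such a dynamic movement primitive (a minimal illustrative sketch, not the robot's actual controller; the forcing term would normally be learned, and all gains here are arbitrary illustrative values):

   # Minimal sketch: a discrete (point-attractor) dynamic movement primitive, 1-D.
   # A spring-damper system pulled toward goal g, shaped by a phase-gated forcing term.
   import numpy as np

   def rollout_dmp(y0, g, forcing, tau=1.0, alpha=25.0, beta=6.25, alpha_x=3.0, dt=0.01, steps=300):
       y, dy, x = float(y0), 0.0, 1.0                        # position, velocity, phase
       trajectory = []
       for _ in range(steps):
           f = forcing(x) * x * (g - y0)                     # learned shape, gated by the phase x
           ddy = (alpha * (beta * (g - y) - dy) + f) / tau   # spring-damper pulled toward goal g
           dy += ddy * dt
           y += dy * dt
           x += (-alpha_x * x / tau) * dt                    # phase decays, so f vanishes and y settles at g
           trajectory.append(y)
       return np.array(trajectory)

   # e.g. a zero forcing term gives a smooth point-to-point reaching movement:
   # traj = rollout_dmp(y0=0.0, g=1.0, forcing=lambda x: 0.0)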
39:12 Then, combined with the autotelic learning algorithm I presented before, we 39:18 were basically able to show that a robot, in an environment it 39:25 initially doesn't know, with a body it initially doesn't know, by exploring goals 39:31 ranging from trying to move the hand (which is initially an object like any other) to moving distant objects (some of 39:37 them being distractors: there are things that move around in the room, things that are too far away to be 39:43 controlled, and more complicated toys that are learnable but pretty complicated), shows the emergence, in 39:49 just a few hours, of an organized learning curriculum. Initially, after sampling a little bit of everything, 39:55 the robot discovers that it's more interesting to sample goals in terms of movements of the hand; 40:01 then progressively it learns how to control the hand, and then it samples goals about moving the two joysticks in 40:07 front of it, and one of them is especially interesting because it happens to control the electric toy in front; and then the robot samples 40:14 goals about the electric toy, and after just a few hours it ends up being 40:20 able to use the electric toy to move the ball around, as you will see in a 40:25 second, in a way that's far from trivial. Actually, if 40:32 you used a traditional deep reinforcement learning algorithm with low-level actions, so without motion primitives, and 40:38 with only an external reward for moving the ball and no autotelic learning at all, it would never work, just never work; it 40:45 would take millions of years to learn this. Okay.

40:51 Now here is another related experiment that's more grounded in child development, where basically one 40:58 studies how the exact same mechanism can participate in structuring vocal development, the exploration of the sounds 41:03 that can be produced with the vocal tract. An additional mechanism here is that the curiosity-driven learning 41:08 mechanism was also used to let the robot decide when to try to imitate the speech sounds of a caregiver model, or when to 41:15 explore other, self-segmented kinds of sounds. These models rely on a physical model of the vocal tract, of the 41:21 motor control and of the auditory system, and they also include a model of the 41:26 vocalizations of social peers. Again, experiments showed an autonomous, self-organized exploration: 41:33 within an initial self-exploration phase, the following sequence appears. The vocal learner first discovers 41:38 how to control phonation, just to produce sound; then it focuses on vocal variations of unarticulated sounds, things like 41:45 [Music]; and finally it automatically discovers and focuses on babbling with 41:50 articulated proto-syllables. And as the vocal learner becomes more proficient at producing complex sounds, 41:57 imitating the vocalizations of peers starts to provide learning progress, and now it focuses on this imitation. This 42:03 is exactly the same kind of transition that we observe in many infants, as shown by Oller and his 42:09 colleagues. It's important to note that here the self-organized trajectory is the result 42:16 of the dynamic interaction between the learning system, the intrinsic motivation system, the body and the environment; and 42:21 so if we run an experiment many times, we observe at the same time strong regularities 42:27 and also diversity in the developmental trajectories: for example, some phases and sequences of phases appear very 42:33 often, but sometimes some phases are reversed, and some simulated agents even have weird developmental trajectories, 42:40 with exactly the same mechanism and
the same parameters. And this duality 42:46 between regularities and diversity is also very typical of child development. In fact, here this can be understood because the 42:52 system is a dynamical system, where the different developmental trajectories can be seen as attractors, and the 42:59 contingencies that are encountered by the learner make it fall, with different probabilities, into 43:06 each of these attractors. I won't enter into the details, but we've 43:11 basically been doing similar studies to model the development of some aspects of tool use.

43:18 But let's go to the last fundamental idea. 43:25 This fundamental idea is that it's very important not to forget that human intelligence 43:31 develops in a social environment. First of all, it means that if we want to understand 43:37 human intelligence, or if we want to build machines with human-like intelligence, 43:42 we need to understand very deeply the family of problems 43:50 humans have evolved to solve, and the key problems humans are specifically able to solve are social problems. 43:57 It means we probably need to understand how to provide rich social environments to machines. 44:04 And more than this, the social perspective, as for example developed by Vygotsky, 44:11 shows that social environments, social peers, also provide guidance, a form 44:18 of socially induced curriculum learning, to help children develop. And finally, and that's pretty fundamental, social 44:24 communication tools, and language in particular, are internalized to become cognitive tools, for example for planning 44:30 or imagination. I'm going to illustrate a few recent works we made 44:36 to explore those ideas, and especially this last idea of language as a cognitive tool.

44:43 If we come back to the autotelic learning architectures I described so far, the spaces in which agents could sample 44:50 goals all corresponded to very concrete goals, such as producing a precise movement or a precise visual pattern 44:56 with an object. Furthermore, when using generative models to sample the goals, for example the autoencoders 45:04 I was speaking about, even if we did novelty search within those spaces, the sampling happened 45:09 within the distribution of known goals. But to power creative exploration, which 45:15 is what we want, agents would need to generate novel, creative and abstract goals that are 45:21 really out of the distribution of what they have already seen. And for doing this, children use 45:28 language. Vygotsky showed that children speak to themselves as a tool to 45:34 generate goals and to make plans to achieve those goals. Language is indeed compositional by 45:40 nature, and it can push the limits of the known towards the unknown. For example, if I know what a cat and a bus are, then I can 45:47 easily compose the two to generate a new concept, the cat-bus; and once I have this linguistic concept of a cat-bus, then 45:53 it's very easy to picture in the mind what it could look like. 45:58 And so the compositionality of language can thus be used to generate out-of-distribution goals, to imagine new goals 46:04 from known ones. Inspired by this, we recently introduced 46:10 the IMAGINE system, starting basically from the state-of-the-art language-guided deep reinforcement learning systems 46:16 that we have developed in recent years. There is a first phase, which is relatively classical, where 46:22 basically the agent is going to interact with the social peer and
learn a language-conditioned policy. More 46:29 precisely, here we have an environment that is visually very simple, but it's actually very rich from a compositional 46:35 point of view, because you have many kinds of objects that can be combined, a little bit like in Minecraft, if you know Minecraft; the difference is that here 46:42 it's visually much simpler, but the combinatoriality is quite similar. So there is a first phase of 46:48 learning that is guided by the social peer: the agent explores the environment, which is procedurally 46:54 generated, with a lot of compositional actions, and when the agent does 46:59 something that is relevant to the social peer, then the social peer provides, at the end of the behavioral trajectory 47:06 (that's important: the social peer never produces instructions or commands), 47:11 only at the end, a language description of what was done. And, very importantly, these 47:18 descriptions are used by the agent to learn several things. The first is very classical: it's going to learn a policy 47:25 that is conditioned on language goals. But then it's also going to learn an internal model 47:31 of the guidance provided by the social peer: an internal model of 47:37 what the social peer would say or think if it does a certain thing, even if the 47:44 social peer is not there anymore. So basically the things it will internalize from the social peer and learn are, first, the goal achievement 47:51 function, which later on will enable the agent to assess by itself whether a language goal has been achieved; 47:56 it will also learn a captioner, which is basically going to enable the agent to produce its own linguistic 48:01 descriptions of its own discoveries, which in turn will enable hindsight learning; and it's going to learn the 48:07 goal generator, basically enabling it to imagine new goals, and which is going to be based on the compositional structure 48:14 that is discovered in the language through the interaction. And something that's very important to 48:20 realize is that the language model is not only a model of language, it is also a model of relevance: 48:26 it's basically a model of what the social peers would find interesting; in the real world, language is conveying 48:34 this relevance knowledge.

And so now comes the most interesting 48:40 part: after the first phase, there is now an autonomous exploration phase, in which the social peer does not speak anymore. 48:46 Here the agent reuses its learned goal generator to sample its own goals, and these are not only 48:53 sentences that were already uttered by the social peer, but new sentences that use the compositionality, as you can see 49:00 on the left. For example, maybe in the past, with the social peer, the agent 49:05 grasped the red tree, or grasped the red algae, or grew the blue algae; now it's able to 49:10 imagine new goals by recombining, and saying, for example, "oh, maybe I could try to grow the red tree". 49:17 So basically it means that the goal sampling process is going also to 49:22 be guided by the relevance model that's encoded in the language model. Then it's going to use its learned goal achievement function to generate its own 49:29 feedback and learn from it, and it's going to use the learned captioner for goal relabeling 49:35 and hindsight learning.
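A minimal sketch of this kind of compositional goal imagination (an illustrative toy version, not the IMAGINE system's actual generator, which learns this structure rather than splitting strings): split heard descriptions into a verb and an object phrase, then keep only the recombinations that were never actually heard.

   # Minimal sketch: imagining out-of-distribution goals by recombining heard descriptions.
   # Illustrative toy version only.
   import itertools

   heard = ["grasp red tree", "grasp red algae", "grow blue algae"]

   def imagine_goals(heard_sentences):
       verbs = {s.split()[0] for s in heard_sentences}
       objects = {" ".join(s.split()[1:]) for s in heard_sentences}
       combos = {f"{v} {o}" for v, o in itertools.product(verbs, objects)}
       return sorted(combos - set(heard_sentences))   # keep only goals never actually heard

   print(imagine_goals(heard))
   # -> ['grasp blue algae', 'grow red algae', 'grow red tree']  : novel goals, e.g. "grow red tree"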
For example, imagine that the agent aimed to grow the red tree, but it failed; 49:41 maybe in the end it managed to grasp a new kind of object, 49:47 say the yellow algae. Then it will be able to describe linguistically what it 49:54 just did, and to learn about it. So basically now the agent explores by using 50:00 language to imagine new goals, and really learns using, fundamentally, this internalized model of 50:08 the social peer.

And so, to evaluate such mechanisms for 50:13 out-of-distribution language goal imagination, we've been 50:19 making a whole series of experiments, studying the capacity of the agent to 50:24 achieve goals, for example from a held-out test set of linguistic goals that 50:30 the social peer was not allowed to say in the first phase, so we are sure these are goals that the social peer 50:36 never taught to the agent. We could see that the agent can actually 50:42 generalize to those test language goals much 50:47 better than if it was only learning in the traditional way, from the social peer's guidance, without imagining 50:55 its own goals using the internal model of the social peer. We've actually been studying various 51:02 alternative implementations of this idea, showing that it's very robust to the particular way you do the goal 51:07 imagination, so it's a very strong idea. We've also been able to show that 51:12 it enables several kinds of generalization from the language, and we've also been 51:18 able to show that it enables the agent to explore very 51:23 efficiently in environments. And finally, we have some fun qualitative analyses, like seeing an 51:28 agent, for example, imagine the goal of growing a plant, which is a goal that was never uttered by the social peer: it's going 51:35 to first try to do it by giving food to the plant, because its policy generalizes the strategy for growing 51:40 animals that it learned with the social peer; but then the internal learned goal achievement function says it doesn't 51:46 work, and self-supervises the agent to adapt its behavior and to learn the right behavior, which is basically to 51:52 bring water.

Okay, and then finally, before I conclude, a 51:58 very short word about a project that is at its very beginning right now, with a 52:04 few colleagues, but I'm very excited about it because I think it's something we need to do. 52:11 I said that if we want to develop 52:17 agents with human-like intelligence, we need to really address frontally the social context. 52:23 In most existing deep learning and deep reinforcement learning work you've been seeing in machine learning and in AI 52:31 recently (I'm speaking about the machine learning domain, not developmental robotics, in which all 52:37 those issues are very traditional, but about deep reinforcement learning, which has contributed amazing methods), those 52:44 methods have not at all been tested in social contexts. For example, you know the Atari suite of video games, which has been 52:51 instrumental and great to propel progress in this field in recent years, but there is no social 52:57 dimension at all. So what we believe is that we need to develop rich 53:02 environments in which there are many kinds of social peers and social problems, and 53:09 we can use those environments to foster the development of social intelligence, and also to test social intelligence in 53:15 agents. And so
we have a first version of this, which we call SocialAI; it's only a draft, it will 53:21 evolve a lot, but it enables fostering things like the development of the various 53:27 aspects of theory of mind, and learning interaction protocols, which 53:34 psychologists call pragmatic frames, in the term used by Jerome Bruner. So if you want to discuss this, I'll be 53:39 happy to discuss it further after this talk, offline.

So let me 53:46 conclude. I'd like to conclude by just stating again those three fundamental ideas that I think are very key for developmental 53:52 AI and should be used to structure research further. The child is autotelic: 53:59 the child learns by generating its own learning objectives, its own goals. 54:04 Intelligence is embodied and develops through self-organization of the interaction between brain, body and 54:09 environment; this also includes the fact that, to understand intelligence, you don't only 54:16 need to develop complicated algorithmic learning models, but you need to understand how to 54:21 provide adequate bodies and adequate environments. And among the properties of adequate 54:27 environments, it's very fundamental to understand that the social context is key, 54:32 and it's not only key to develop social intelligence, but also because social 54:38 skills like language become internalized and become cognitive tools that are 54:43 fundamental even for individual thinking and development. And I've 54:49 tried also to show you today that this fundamental research can have many real-world and societally 54:55 important applications, in domains as diverse as educational technologies, 55:01 robotics, automated discovery, etc. I'll finish by just saying a major 55:08 thanks to the many PhD students, engineers and senior colleagues with whom I have been 55:14 working across the years (there are actually even many more, whose amazing work 55:21 I didn't have time to present today), because all this work is only possible 55:26 with them. So thank you very much.
# enddoc