#] #] *********************
#] "$d_web"'Neural nets/DevelopingMinds/210930 Oudeyer - Developmental AI [machines, children] learn better transcript.txt'
# www.BillHowell.ca 17Nov2022 initial
# view in text editor, using constant-width font (eg courier), tabWidth = 3
#48************************************************48
2021-09-30: Pierre-Yves Oudeyer, INRIA, "Developmental Artificial Intelligence: machines that learn like children and help children learn better". 55:30 duration, Technologies for fostering efficient learning and intrinsic motivation

Transcript

0:00 The time is exciting, as you said. 0:06 There was this very tiny field called developmental robotics that was started something like 20 years ago, 0:13 where a small group of people, maybe for the first time, 0:19 took seriously the research roadmap proposed by Alan Turing 50 years before, about 0:26 trying to build machines that would learn like a child. You said I was one of the pioneers, but I 0:33 have to say that you were also one of those pioneers, and Chen Yu and Yukie Nagai, who are the 0:38 co-organizers of this seminar, were also; so it's really an honor, and great for me, that you invited me 0:46 today. The time is exciting because more recently we've seen advances in 0:52 the field of machine learning, and especially deep learning, where people were not 0:58 initially trying to build machines that would learn like children, but 1:04 quite recently a form of convergence between those two fields 1:10 has happened, which I think is really opening new perspectives. This field, 1:15 I like to call it developmental artificial intelligence, and it's basically about 1:22 trying to build machines that learn like children, but also about applying some of those ideas in domains 1:29 like education, for example to help children learn better.

1:34 So let me try to present an outline of the research directions we've been working on with many colleagues 1:41 over recent years to develop this domain further. 1:47 First of all, as I just said, recently there were major advances in deep learning and especially deep reinforcement 1:54 learning, where people have been trying to learn controllers, and we've seen pretty 1:59 impressive results. A few years ago everyone heard about the AlphaGo system, 2:04 which could learn to play the game of Go and beat the best human players in the world, 2:10 which was thought to be just impossible two or three years before. And these advances were not 2:17 constrained to games like Go, but also went into the real world. For example, now you have 2:24 deep reinforcement learning systems that enable robots to walk in a very robust manner 2:29 on very rough terrain; you've got deep reinforcement learning systems 2:34 that have been used to control stratospheric 2:40 balloons to bring internet to remote populations; and you've had 2:45 deep reinforcement learning used, for example, to develop new drugs, which 2:50 can have a high impact in medicine. However, these systems, even if they are very 2:56 impressive, are very far from the kind of capabilities 3:01 that human infants have. Indeed, children are extraordinary, 3:07 not because some of them become world champions in games like Go 3:12 or chess (actually only very few of them become world champions in those games), but because nearly all of them 3:19 efficiently acquire a wide variety of everyday skills like locomotion, 3:24 building Lego structures,
riding bicycles, playing board games, mastering language, etc., and it's always evolving. And 3:32 this learning is autonomous, which means that there is never an engineer 3:38 that intervenes by opening the brain to change some parameters or give some specific reward function for a new task. 3:46 It's also developmental, because learning happens progressively, with a specific 3:51 timing and ordering. For example, the infant does not right away learn to walk on two feet: first it learns to hold 3:58 its head, later on to sit, to crawl, to stand on two feet with one arm on 4:03 the ground, and progressively to walk on two feet. So basically a fundamental question is how such developmental sequences form, 4:10 what the role of this structure is; and probably 4:15 such developmental pathways, what we can call a learning curriculum, are useful to address 4:21 a fundamental problem, which is how to guide the learning of so many things when you have so little 4:28 resources of time, of computation, and of energy.

And so basically in my team we've 4:34 been studying these questions along three dimensions. First, we've been building algorithmic models 4:40 to help understand human development better, and here we have a lot of interaction with developmental psychology and neuroscience. 4:47 Then we've been trying to extract those insights and transfer them into 4:52 machine learning, to build more robust, more autonomous, lifelong learning machines. And finally, 4:59 we've been studying how these insights can find applications in various domains such as educational technologies, for 5:06 example for personalizing sequences of exercises to maximize not only learning efficiency 5:12 but also motivation, or in the domain of automated discovery in physics or biology. It may seem remote 5:18 to you, but today I will try to illustrate these different facets, including the applications, which may 5:25 be important in the real world.

5:31 Then, taking a little bit of a step back: the work that we've been doing 5:36 has been relying, over the years, on three fundamental ideas coming from 5:43 developmental psychology. We've been really taking inspiration from them, studying them, modeling them. The first 5:49 fundamental idea is that the child is autotelic: it means that he is a curious 5:55 little scientist that's intrinsically motivated to spontaneously explore the world, to make 6:01 sense of the world, generating its own learning objectives, its own goals.
6:09 Another fundamental idea is that intelligence is embodied and develops 6:15 through self-organization of the dynamical system that's formed by the brain-body-environment interactions. 6:22 And finally, the third idea, which I will also develop, is that intelligence develops in a 6:29 social context, for solving social problems. And in this social context, 6:34 social peers are pretty key for several things. First of all, they scaffold learning: they 6:40 produce a form of socially induced curriculum learning. But also there is culture in general, and 6:47 language in particular, which become internalized and can become cognitive 6:52 tools, for example for planning and imagination.

Okay, so let's go through various 6:59 dimensions of research exploring these fundamental ideas. The first one: the child as a little scientist. This is 7:05 for example an idea championed by people like Piaget and Berlyne, but also many others. 7:11 One extraordinary property of child development is that children spend a lot of time spontaneously exploring their environment.
7:17 They are not doing it because they are given externally imposed tasks; rather, they 7:23 are driven by different forms of what psychologists call intrinsic motivation, and what we may call, in quotation marks, 7:29 "curiosity" in everyday language. During exploratory play, for example, they invent and pursue 7:37 their own problems, and such intrinsically motivated exploration 7:42 has been argued to be key for child development for several reasons: for example, for solving problems with rare 7:49 rewards, for learning world models efficiently, and for discovering open-ended skill 7:55 repertoires.

So basically we've been 8:01 trying to model these principles. As I said, psychologists proposed the idea of intrinsic 8:08 motivation, and indeed it was proposed already 8:13 more than 50 years ago, in the 40s and the 50s of the last century. Basically, psychologists hypothesized 8:19 that the brain contains circuits that push infants to be intrinsically interested in things like novelty, surprise, cognitive 8:26 dissonance, intermediate novelty, or optimal challenge, for example. But a major limit of this line of 8:33 work (it was very inspiring, but a major limit) was that these hypotheses 8:38 remained at a verbal level, and very little work was actually done until 8:44 around 20 years ago to understand the mechanisms better and to study these concepts 8:49 more experimentally. But there is a small group of people, starting 8:55 at around the same time as the beginnings of developmental robotics, that began to work on curiosity-driven 9:02 learning 20 years ago, trying to understand it in humans and trying to model it in machines. And 9:07 today it has become a relatively standard topic, even in very traditional machine 9:14 learning communities, like the people who go to the NeurIPS conference, for example: many people talk about curiosity, 9:20 including those people who want to write papers with many proofs, much mathematics, etc. 9:25 So today that's very standard. But 20 years ago, when you would come to a conference, and you were at the social 9:31 dinner, and you would say "okay, I am working on curiosity-driven learning", people would look at you with 9:37 big eyes, thinking you were the crazy scientist, you were really crazy. And this was happening 9:44 not only in AI but also in neuroscience and in psychology. But things have 9:50 changed, things have changed, and actually different communities 9:56 really took up the early work of the psychologists of the 40s and the 50s to 10:01 study those mechanisms operationally. There is a first community, 10:07 from which I have come with my colleagues, which has taken inspiration both from 10:14 developmental psychology and theoretical biology, where we developed some hypotheses that 10:21 I'm going to present today. But there are also other communities 10:27 who developed ideas related to curiosity-driven learning. For example, in machine learning you had people like Fedoroff and 10:34 John Andreae, who was probably the first to introduce novelty search in reinforcement learning, and then Schmidhuber, 10:39 and Barto and Satinder Singh and their colleagues. And then another 10:45 community that has, if I can say, evolved similar ideas, but independently, is the evolutionary computation community, 10:52 with people like Ken Stanley, Jean-Baptiste Mouret, Stéphane Doncieux and their colleagues. On our side, our focus has really been 10:59 on
modeling curiosity-driven exploration in humans, trying to understand how it 11:04 can be made to work for the acquisition of skills in the real world, and understanding how it links to 11:10 developmental self-organization. As I said, in the last 20 years a lot of 11:16 work has been done, the community has 11:22 changed its views on curiosity, and many collaborations were built up between 11:28 people of various domains. For example, I've been collaborating with Jacqueline Gottlieb, with Linda Smith, with Celeste Kidd, 11:34 and there are many others, and now there is really an interdisciplinary community in the world 11:40 working on this topic. Even neuroscience, which had been very reluctant 11:47 until around 10 years ago to work on those topics, now treats it as a 11:54 big topic: you can see, for example, that recently this made the cover 11:59 of the Nature Reviews Neuroscience journal. So things have changed.

12:04 And so our theoretical perspective has basically been to see the child as a sense-making organism, as a little 12:10 scientist that makes experiments to acquire good predictive models of the world, and, even more importantly, good 12:16 models that enable it to control the world with its actions. Most research in psychology, but 12:22 also research in machine learning, has so far focused on understanding what kinds of mechanisms enable efficient 12:28 learning if you assume you provide the data for learning, or if you provide 12:34 externally a given task to the child or to the machine. This is the vast majority of research so far. However, a very 12:41 fundamental question, often overlooked, is how an autonomous learner can decide for 12:46 itself what tasks to learn, when and in what order to learn them, and 12:52 even learn how to represent novel tasks as new discoveries are made. And so it is this kind of adaptive 12:58 metacognitive learning architecture, which I am picturing here, that we need. We've been basically trying to 13:04 understand what kinds of mechanisms we need inside to enable efficient 13:10 autotelic learning; autotelic learning is this concept that 13:15 an agent will learn by generating its own objectives, its own goals.

13:22 Okay, so there are different kinds of experiments that children can imagine and choose to pursue, different kinds of 13:27 learning situations they could choose, that correspond to different forms of intrinsically motivated exploration. 13:37 A first category, which we call knowledge-based intrinsic motivation systems, relies on basically generating 13:43 intrinsic rewards upon visiting some particular states, or experiencing a transition from a state to 13:49 another state when you do an action. Such intrinsic rewards include things like prediction errors and uncertainty, 13:56 and this can be seen as basically enabling an agent to choose to make prediction experiments. This has 14:02 often been done on relatively short time scales in the computational modeling literature and in 14:08 machine learning, and here it's important to realize there is no notion of a goal. 14:13 This family of intrinsic motivation systems has also been used primarily for 14:20 later solving efficiently problems with external but sparse 14:26 rewards, or for learning world models efficiently.
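To make this concrete, here is a minimal Python sketch (purely illustrative, not code from the talk) of such a knowledge-based intrinsic reward: a curiosity bonus equal to the prediction error of a learned forward model, with no notion of a goal anywhere. The forward_model object and its methods are assumed abstractions.

   # Minimal sketch: knowledge-based intrinsic motivation as forward-model prediction error.
   # Purely illustrative; forward_model.predict/update are assumed abstractions.
   import numpy as np

   class PredictionErrorCuriosity:
       def __init__(self, forward_model):
           self.forward_model = forward_model   # predicts next state from (state, action)

       def intrinsic_reward(self, state, action, next_state):
           predicted = self.forward_model.predict(state, action)
           error = float(np.linalg.norm(next_state - predicted))  # surprise = prediction error
           self.forward_model.update(state, action, next_state)   # keep improving the world model
           return error                                           # note: no goal anywhere here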
14:26 But then there is a second category, 14:32 which we've been calling competence-based intrinsic motivation, and which is basically at the core of 14:38 what we call autotelic agents: systems where the agent imagines its own goals 14:45 and selects the ones it wants to pursue, and in what order. Here the intrinsic reward is basically replaced 14:50 by a measure of interestingness that's applied to imagined goals, such as, for 14:56 example, the expected competence progress to reach these 15:01 generated goals. So basically the agent is deciding which goal-achievement experiment to make, and this 15:07 has been studied over longer time scales in the literature; it's been particularly used to enable agents 15:14 to develop open-ended skill repertoires. In the talk today I will actually give examples that are more on that side: 15:21 I will present research about autotelic agents.

So concretely, how does autotelic 15:28 learning work? Here is the default structure that's common to all algorithms in a class that we've been calling 15:35 IMGEPs, intrinsically motivated goal exploration processes, which is basically a framework 15:41 that we've developed to try to organize research in this domain of autotelic learning. Here is the loop 15:48 they basically all follow. First, the agent observes a context; then it samples 15:53 a goal with an internal goal generator; then it generates an appropriate sequence of actions with a goal-conditioned 16:00 policy to pursue the goal inside the environment; and then, after 16:06 taking this sequence of actions, it uses its own goal achievement function to evaluate for itself its own 16:13 performance. Sometimes it succeeds, and so it reinforces what it has been doing; but 16:18 sometimes it fails. Even when it fails, most autotelic learning 16:23 algorithms include some form of what's called hindsight learning, which basically relabels what was achieved 16:29 with the goal that could have been imagined: the one that was actually achieved and was a 16:35 success. For example, imagine you want to push a box to the left: you 16:41 try something and actually you push it to the right. You failed on the left, but then you can say "okay, if my goal had 16:46 been to push it to the right, I would have succeeded", and you can learn from this. And then, based on the data collected, all 16:52 kinds of internal models are updated. But 16:58 research on autotelic learning with such a loop poses many challenges to achieve all of this efficiently, 17:04 and so I'd like to illustrate a few aspects of how to do that efficiently, and how humans might do that.
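Written as pseudocode, the loop just described looks roughly like this (a minimal Python sketch of the IMGEP structure; every component name here, goal_generator, policy, goal_achieved and so on, is an assumed abstraction, not the speaker's implementation):

   # Minimal sketch of one IMGEP-style autotelic episode, with hindsight relabeling.
   # Illustrative only: all components are assumed abstractions.
   def autotelic_episode(env, agent):
       context = env.observe()                        # 1. observe a context
       goal = agent.goal_generator.sample(context)    # 2. sample a goal internally
       trajectory = agent.policy.rollout(env, goal)   # 3. act with a goal-conditioned policy
       outcome = trajectory.final_outcome()
       if agent.goal_achieved(goal, outcome):         # 4. self-evaluate with own achievement fn
           agent.reinforce(trajectory, goal)          #    success: reinforce what was done
       else:
           achieved = agent.describe(outcome)         # 5. hindsight: which goal WAS achieved?
           agent.reinforce(trajectory, achieved)      #    learn as if that had been the goal
       agent.update_models(trajectory)                # 6. update all internal models from the data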
17:13 First, I will discuss the importance of goal sampling strategies, and in particular how an agent can automate its 17:18 goal sampling, and how this can produce a form of self-organized learning curriculum. 17:24 So what's an interesting goal, what's an interesting learning experiment for the brain to make, 17:31 biological or artificial? Many ideas have been proposed, in fields ranging from psychology to biology to AI, 17:37 often quite independently. One idea is that an interesting experiment is one which leads to the experience of high 17:43 novelty or high prediction errors. But in the real world that's not going to work at all: 17:48 for example, this could lead a robot to spend its whole day staring through the window, making movements with its arms 17:54 and trying to predict how the colors of the cars passing by may change as a function of its movements; 17:59 there are always prediction errors, of course. But among other ideas, I will now explain 18:05 one idea that we developed with colleagues and studied a lot, inspired 18:12 initially by the work of Varela and Maturana in biology, and that's around the concept of empirical learning 18:18 progress. The idea is that the interestingness of 18:24 a learning situation, of a goal for example, is proportional to the empirical change 18:30 in goal achievement error: basically, how much you improve at achieving the goal. But this change can also be 18:38 negative, a decrease in competence, and this will enable, as we will see, driving 18:43 interest towards parts of an environment that change, or skills that are being forgotten. This idea we've been 18:49 calling the learning progress hypothesis, but in practice it also includes learning regress, so it's absolute 18:55 learning progress. Such a mechanism can actually automatically self-organize exploration 19:00 along a curriculum, and to visualize how, let's imagine a situation where you have a robot 19:05 that can explore four sensorimotor activities that are different and characterized by different learning 19:12 rates, as we see on top of the graph. Each curve here shows the evolution 19:17 of the errors in each activity if you assume the learner would focus on practicing it for a 19:23 long time. Then exploration driven by the learning 19:29 progress principle will result in avoiding activities that are too easy or too difficult to learn: it will first 19:34 focus on the activity with the fastest learning rate and eventually, when that starts to reach a plateau, switch to 19:39 the second most promising learning situation.

19:45 So let's study this idea in the context of a simulated robot environment 19:52 such as this one. Here the robot is free to choose and switch to any goal it might learn to achieve. In this 19:58 particular experiment it's given a space of goals from which it can sample, which includes a variety of things like 20:05 reaching various objects, grasping various objects, stacking some objects. But one challenge 20:11 is that we put in distractors: for example, we put objects in the environment that move 20:17 randomly, or that are too far away to be controlled by the robot, and thus there are many goals in this goal space that 20:24 are just impossible to learn, such as for example stacking uncontrollable objects. But initially the robot does not know 20:31 that there are goals that are uncontrollable, and it doesn't know which ones; and also, among the feasible goals, some 20:38 are easy, some are more difficult, and some depend on mastering other skills first to become learnable. 20:46 So basically, if the robot samples goals randomly here, it will be very inefficient at learning; there is 20:52 really a need for automatic curriculum learning, and this is realized by sampling the goals according to their 20:57 expected absolute learning progress. In practice 21:02 this poses several technical difficulties. One of them is how to estimate, track and predict the 21:08 learning progress for various goals, because you cannot observe it directly, so you need to estimate it; for 21:15 this we developed specific approaches based on particular kinds of regression models, 21:21 especially non-stationary regression models. Then another challenge is how to 21:28 find efficiently the niches of learning progress: what kinds of goals might provide further learning 21:35 progress. For this we've been using bandit algorithms that are dedicated to non-stationary problems, because indeed 21:41 a kind of learning situation or goal that provides learning progress at one point will, by construction, not provide any more learning 21:47 progress later on, once you've learned it.
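As a minimal illustration of the principle (a toy sketch, not the actual regression-plus-bandit machinery just mentioned): a non-stationary bandit over goal types, where each arm's utility is its recent absolute learning progress, estimated from a sliding window of competence measurements. The window size and the epsilon exploration rate are arbitrary illustrative choices.

   # Minimal sketch: goal-type sampling by absolute empirical learning progress.
   # Illustrative only; competence is assumed observable per episode, in [0, 1].
   import random
   from collections import deque

   class LearningProgressBandit:
       def __init__(self, goal_types, window=20, epsilon=0.1):
           self.history = {g: deque(maxlen=window) for g in goal_types}
           self.epsilon = epsilon   # residual exploration so progress niches can be (re)discovered

       def learning_progress(self, g):
           h = list(self.history[g])
           if len(h) < 4:
               return 0.0
           half = len(h) // 2
           older = sum(h[:half]) / half
           recent = sum(h[half:]) / (len(h) - half)
           return abs(recent - older)   # ABSOLUTE progress: regress (forgetting) is interesting too

       def sample_goal_type(self):
           goals = list(self.history)
           if random.random() < self.epsilon:
               return random.choice(goals)
           lp = {g: self.learning_progress(g) for g in goals}
           total = sum(lp.values())
           if total == 0.0:
               return random.choice(goals)
           r = random.uniform(0.0, total)   # sample proportionally to estimated |LP|
           for g in goals:
               r -= lp[g]
               if r <= 0.0:
                   return g
           return goals[-1]

       def update(self, g, competence):
           self.history[g].append(competence)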
But today I won't delve into those 21:54 technical details; I rather want to focus on the good properties of this sampling scheme, when you 22:00 assume you can already estimate the learning progress. On the right we actually see the emergence of a 22:06 learning curriculum: on the top you see the progressive acquisition of skills of increasing complexity, and on the 22:12 bottom you see the estimates of learning progress, which are basically proportional to the probabilities of selecting each goal type. We can 22:19 basically check that here tasks 5, 6 and 7, which are the unlearnable ones, 22:25 are indeed very little sampled. So basically it means that the robot has learned to sample the goals of 22:31 appropriate complexity, of appropriate learnability, avoiding those that are either too easy or too difficult.

22:39 Another good property of this goal sampling scheme is that it can address what people have called catastrophic 22:46 forgetting. Catastrophic forgetting happens sometimes with neural networks, when training them on a new task basically 22:53 erases what was learned about the previous task. Here we see this phenomenon: 22:58 after mastering the green type of goal, on the left, the robot focuses 23:05 on the yellow one, and this leads to a drop in performance on the green one. This is actually caught by the 23:10 estimate of absolute learning progress, which we see in the middle, where there is a new bump, a second bump, in the 23:18 green curve; and then the green task is re-practiced again, 23:24 as we see in the selection probabilities on the right, and in the end it's 23:29 reacquired, and everything ends up mastered.

So in the model I just described, the 23:35 goal space representation was handmade, but it's also possible that autotelic agents learn the goal space, for example in 23:41 the case of robots perceiving raw images. One natural approach has been to train what's called a 23:47 generative model, such as a variational autoencoder, to learn an embedding of images, and then to sample goals in that 23:54 embedding. This was for example done in the RIG work, 23:59 with uniform sampling in the prior; but also, in another work, we did this with an algorithm we called UGL, and another 24:06 team did something they called Skew-Fit, where goals are basically sampled on the frontier of the known 24:11 distribution of goals; this is basically implementing a form of novelty search when sampling 24:17 goals in the learned representation space. The advantage is that it enables much 24:22 more autonomy as compared to the systems before; but because it is doing some form of novelty search, it's not going to be 24:28 robust to environments with distractors or many objects. 24:33 So to deal with this, and to address robustness to distractors, we've been applying the 24:40 learning progress idea, for example in the GRIMGEP system, where we use a learning-progress-based bandit and where 24:45 the arms are unsupervised learned clusters in the learned embedding, for example using Gaussian 24:52 mixture models.
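As a minimal sketch of the generic recipe (illustrative, in the spirit of RIG rather than its actual code; vae.encode and latent_dim are assumed): sample imagined goals from the latent prior of a trained VAE, and measure goal achievement as a distance in the same latent space.

   # Minimal sketch: goal imagination in a learned VAE embedding (RIG-flavoured, illustrative).
   import numpy as np

   def sample_latent_goal(latent_dim):
       return np.random.randn(latent_dim)            # sample a goal from the unit-Gaussian prior

   def goal_distance(vae, observation_image, z_goal):
       z_obs = vae.encode(observation_image)         # embed the current observation (assumed API)
       return float(np.linalg.norm(z_obs - z_goal))  # latent distance doubles as achievement cost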
We've also been using beta-variational 24:58 autoencoders to learn disentangled embeddings, that is, representations separating different objects from raw pixels, which enables 25:05 further organization of the learning-progress-based exploration.

25:10 Okay. 25:17 So the learning progress hypothesis is something we initially proposed and developed as a 25:24 theoretical approach to understand how humans can explore large spaces of tasks, potentially with many distractors. 25:31 So far I've been talking to you mostly about studying 25:36 this hypothesis in the context of seeing how it enables machines to learn autonomously and efficiently in 25:44 artificial environments. But a big question 25:49 is basically whether humans actually use learning progress to organize their 25:55 curiosity-driven exploration, and of course we wanted to test that in humans; 26:02 it remained untested in humans until recently. This is exactly what we did in a 26:08 paper that's now in press in Nature Communications, joint work with Jacqueline Gottlieb 26:15 at Columbia University in New York. In this experiment we basically propose to humans four learning tasks 26:22 that they are free to explore; they are not given any objectives. 26:28 These tasks are basically about learning which food types certain 26:33 monsters, with different shapes and from different families, prefer. 26:41 There is basically one task that's very easy to learn, there are two tasks that are more difficult to learn but 26:47 still learnable, one that is really difficult to learn but learnable, and there is one task which on 26:52 purpose is made completely unlearnable, but people don't know that. There is also a control group 27:00 of people to whom we gave explicitly the objective of maximizing learning; we added this group to see whether the free 27:06 group would actually do the same.

A first result that we 27:13 obtained is that indeed a substantial part of the free exploration group actually behaved in a very similar 27:20 pattern to the group whom we asked to maximize learning. 27:27 We made many analyses; today I'm not going to enter into the details, I will focus 27:33 on one computational modeling analysis, where we've been studying which utility 27:38 functions, considering a space of many potential utility functions that include 27:43 alternative hypotheses to the learning progress hypothesis (for example, minimizing or maximizing prediction 27:50 errors), 27:56 explain human exploration best. What we found is that the utility function that 28:03 explains human exploration best actually includes, as a key component, the estimate 28:09 of absolute learning progress. So that was a very encouraging first 28:14 step to show that humans might use learning progress to monitor 28:19 their exploration. Okay, there are many things to be said about this, but 28:26 we went now through pretty fundamental research; let us go to applications, and later on I will go back to 28:31 fundamental research and again to applications.

Okay, now I'd like to look at 28:37 concrete applications that are possible from this fundamental research on autotelic learning and the learning 28:43 progress hypothesis. The first application is for machine 28:49 learning. Here the idea is to actually reuse learning-progress-based automatic curriculum
learning, especially 28:55 approaches based on Gaussian mixture models, to train very classical deep reinforcement learning 29:00 agents that are given one external task, for example learning to locomote 29:05 robustly. We basically want them to acquire policies that can locomote robustly in 29:11 various kinds of environments, with various kinds of obstacles. To foster generalization, one 29:17 approach is to use procedural content generation, so that one can generate diverse environments for training; but 29:23 then the challenge is how to control the evolution of the sampled environments, especially when you have a deep reinforcement learning student with 29:30 initially unknown capabilities, with a body that also has unknown capabilities, and very many parameters in the 29:36 procedural generation, for example controlling the shape and the spacing of obstacles, and you don't really know what's learnable, what's not 29:42 learnable, what's easy, what's difficult. So actually you can use the learning-progress-based system to organize the 29:48 curriculum, and what we've been showing is that if you use such an approach, as opposed to random sampling of 29:54 environments, which is actually often done in the deep reinforcement learning domain, then not only can you 30:00 learn policies much faster, but even if you give a lot of time 30:07 to the random curriculum, it may never end up learning policies, for example, 30:12 to get over certain categories of obstacles, as we show here; whereas with automated curriculum learning it will 30:17 actually learn very robust policies that enable a wider variety 30:22 of generalization. We've been showing this 30:28 in a very robust manner: we've been using this kind of teacher algorithm to train many different kinds of deep 30:35 reinforcement learning systems, many different algorithms, many different bodies, many different environments, 30:41 and we've actually provided a benchmark, called TeachMyAgent, with all the code, for those who want to build on 30:47 that.
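A minimal sketch of such a teacher (illustrative, in the spirit of the Gaussian-mixture learning-progress approach mentioned above, not the benchmark's code; assumes scikit-learn and an environment parameterized by a continuous vector): fit a Gaussian mixture on recently sampled environment parameters annotated with their absolute learning progress, then prefer the component with the highest mean |LP|.

   # Minimal sketch: a learning-progress-based teacher over continuous environment parameters.
   # Illustrative only; window sizes, epsilon and n_components are arbitrary choices.
   import numpy as np
   from sklearn.mixture import GaussianMixture

   class LPTeacher:
       def __init__(self, param_low, param_high, fit_every=50):
           self.low, self.high = np.array(param_low), np.array(param_high)
           self.history = []                 # rows of [env params..., |learning progress|]
           self.gmm = None
           self.fit_every = fit_every

       def sample_env_params(self):
           if self.gmm is None or np.random.rand() < 0.2:   # keep exploring parameter space
               return np.random.uniform(self.low, self.high)
           k = int(np.argmax(self.gmm.means_[:, -1]))       # component with highest mean |LP|
           params = np.random.multivariate_normal(
               self.gmm.means_[k][:-1], self.gmm.covariances_[k][:-1, :-1])
           return np.clip(params, self.low, self.high)

       def update(self, params, abs_lp):
           self.history.append(np.concatenate([params, [abs_lp]]))
           if len(self.history) % self.fit_every == 0:
               data = np.array(self.history[-500:])         # recent window: non-stationarity
               self.gmm = GaussianMixture(n_components=4).fit(data)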
But because this approach to automatic curriculum learning is very robust, and 30:53 is initially inspired by models of human curiosity-driven learning, another natural use is to personalize learning 30:58 curricula for children in the domain of educational technologies. So instead of generating a 31:04 learning curriculum for robots and machine learning algorithms, here we were able to show that a 31:09 learning-progress-based automatic curriculum can actually allow generating 31:15 personalized curricula for human learners, to maximize learning efficiency and 31:21 motivation. An example is the personalization of sequences of exercises for learning mathematics, 31:27 which we've been exploring and experimenting with in primary schools at a pretty large scale: 31:32 a few years ago with a first experiment over one thousand children, 31:38 and now it's being disseminated at large to thousands of children. 31:43 The basic idea is to track, for each student, their learning progress according to various properties of exercises, and 31:50 to incrementally propose exercises that maximize learning progress. The principle is really the same as before: 31:55 there is a space of exercises that's organized in a graph (this graph brings 32:02 a little bit more structure; it also allows bringing a little bit of expert knowledge to hot-start the system), and each node is a family of exercises with similar parameters, from which one can sample. 32:15 The system is hot-started by considering only easy nodes in the graph, and it's using learning progress to 32:20 sample exercises among the different kinds of exercises corresponding to these nodes. 32:26 Actually, to be precise, the system does not completely impose exercises: it recommends a few choices to the 32:32 child from this pool, and the child makes the final choice. Then, as the learning proceeds, the 32:38 set of nodes in the bandit evolves in the direction of high learning progress, and so on. We were basically 32:43 able to show that such automated personalized curricula allowed bringing 32:48 all children to higher levels than curricula that were handmade by a pedagogical expert, and it was especially 32:54 the case for children with either special difficulties or special 33:01 talents; these kinds of children typically don't fit well with the handmade curricula that are made by 33:06 experts. So here basically we can better address the diversity of children. 33:12 We also showed that not only does it enable better learning, but also the state of intrinsic motivation in 33:18 children who used the system is basically better than the intrinsic 33:23 motivation of children who used the curriculum made by a pedagogical expert. 33:30 And recently we've been working with a consortium of educational technology companies, which are now 33:37 integrating this approach in a large-scale educational solution, which 33:42 is called Adaptiv'Math, to be disseminated at a very large scale in France.
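A minimal sketch of this recommendation loop (illustrative; the node/graph structure and mastery threshold are hypothetical choices, not the deployed system): reuse the learning-progress idea over the currently unlocked graph nodes, recommend a few candidate exercise families, and let the child make the final pick.

   # Minimal sketch: LP-based exercise recommendation over a prerequisite graph.
   # Illustrative only; graph maps each node to its harder successor nodes.
   from collections import deque

   class ExerciseRecommender:
       def __init__(self, graph, start_nodes, window=10):
           self.graph = graph                    # node -> list of successor (harder) nodes
           self.active = set(start_nodes)        # hot start: only easy nodes at first
           self.results = {n: deque(maxlen=window) for n in graph}

       def lp(self, node):
           r = list(self.results[node])
           if len(r) < 4:
               return 0.5                        # optimistic default so new nodes get tried
           half = len(r) // 2
           return abs(sum(r[half:]) / (len(r) - half) - sum(r[:half]) / half)

       def recommend(self, k=3):
           # propose a few high-progress choices; the child makes the final pick
           return sorted(self.active, key=self.lp, reverse=True)[:k]

       def report(self, node, success):
           self.results[node].append(1.0 if success else 0.0)
           r = self.results[node]
           if len(r) >= 6 and sum(r) / len(r) > 0.8:
               self.active.update(self.graph[node])   # unlock harder successors when mastered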
33:50 Okay, and finally I'd like to talk about another very interesting and broad area of application of autotelic learning 33:55 algorithms: they can actually be used as autonomous discovery assistants for scientists who 34:01 study and explore the emergence of highly structured patterns in many complex dynamical systems, ranging from 34:06 biological organisms to chemical systems, and even artificial ones like the Game of Life. 34:12 In those so-called morphogenetic systems we are still far from grasping the range of structures that can 34:18 self-organize, and we believe that autotelic learning can help. For example, here 34:23 Jonathan Grizou, a former PhD student in our team, went to the lab of Lee Cronin, a 34:30 chemistry lab, to use autotelic learning algorithms to explore 34:35 the behavior of oil droplets, which are used by chemists to study the origins of life. 34:42 Basically, those algorithms were used to control an automated robotic factory that makes experiments automatically, 34:47 and they could show that by using the goal exploration algorithms they could 34:54 actually discover a wider diversity of phenomena than they knew before. 34:59 And recently we've also been using these approaches to explore the formation of organized structures 35:05 in a continuous Game of Life, which has been a very important testbed over the years for fundamental questions in 35:11 artificial life and theoretical biology. Recently, in a series of 35:18 papers, including one we are working on right now, we managed to show, 35:25 maybe for the first time, how it's possible to evolve 35:31 sensorimotor agency in continuous cellular automata. Maybe many of you are familiar with the gliders in the 35:38 Game of Life, which are a kind of spatially localized pattern; but those kinds of gliders do not really have 35:45 sensorimotor behavior: there is the emergence of a body, but with no organized 35:51 interaction with the environment, and no way to preserve their integrity when they are perturbed by the environment. What we 35:57 could evolve here with autotelic learning are whole creatures: initially there is only an 36:04 environment, and there emerge localized creatures, localized bodies, 36:09 which can move around and which can, for example, have sensorimotor behavior by 36:14 being robust to perturbations like obstacles, go around them, and actually generalize very strongly.

36:22 Okay, so let's go now to the second fundamental idea I 36:28 mentioned initially, an idea in particular championed by Linda Smith 36:34 and Esther Thelen, as well as many others, which is that intelligence is embodied and develops 36:40 through the self-organization of the dynamical system formed by brain-body-environment interactions. 36:45 One way that's very fundamental to see what this 36:52 approach means is trying to understand the role of physics in generating 36:57 organized walking, or locomotion. Many people have thought for a long time that locomotion 37:03 was a complicated control problem, where the brain has to decide every millisecond what's the optimal sequence 37:09 of motor signals to send for maximizing a certain cost, like walking far without falling 37:16 and without spending energy. But people like Tad McGeer, for example, said: okay, what about 37:23 building a pair of legs with no motors, with no computer, but with a geometry that is similar to human legs? And they 37:29 could show that, through the interaction between the mechanical structure and gravity, 37:34 there is self-organization of a gait that's very robust and human-like. And this is the same idea that 37:41 we've been exploring, with Olivier Ly in particular, a few years ago, when we began to build a 37:48 robot with a vertebral column, inspired by human physiology, with elastics at different 37:55 places. We could show, with only very simple controllers, with no model at all of the body and no 38:01 model of the interaction with the environment, only very basic reactive systems, that 38:06 these robots could be very robust in their balance, but also could produce, in an emergent manner, 38:13 forms of interaction with humans. For example, we discovered 38:18 in a public show that little children would take the hand of the robot, and the robot would follow: 38:24 no one programmed this, it is just a spontaneous consequence of the interaction between gravity, the elastics, 38:30 and the small forces that are applied by the human.

38:37 So basically this idea of the body providing structure can also be 38:43 combined with autotelic learning. For example, this is what we've been doing in an experiment where we've been 38:49 using a platform in which we give the robots what are called dynamic movement primitives; this is 38:56 something developed in the robotics field, initially inspired by models of certain 39:01 kinds of neuromuscular synergies observed in certain animals, and here 39:07 we've been equipping those robots with those dynamical systems to produce organized movements.
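For readers unfamiliar with them, here is one common textbook formulation of such a dynamic movement primitive (a minimal illustrative sketch, not the robot's actual controller; the forcing term would normally be learned, and all gains here are arbitrary illustrative values):

   # Minimal sketch: a discrete (point-attractor) dynamic movement primitive, 1-D.
   # A spring-damper system pulled toward goal g, shaped by a phase-gated forcing term.
   import numpy as np

   def rollout_dmp(y0, g, forcing, tau=1.0, alpha=25.0, beta=6.25, alpha_x=3.0, dt=0.01, steps=300):
       y, dy, x = float(y0), 0.0, 1.0                        # position, velocity, phase
       trajectory = []
       for _ in range(steps):
           f = forcing(x) * x * (g - y0)                     # learned shape, gated by the phase x
           ddy = (alpha * (beta * (g - y) - dy) + f) / tau   # spring-damper pulled toward goal g
           dy += ddy * dt
           y += dy * dt
           x += (-alpha_x * x / tau) * dt                    # phase decays, so f vanishes and y settles at g
           trajectory.append(y)
       return np.array(trajectory)

   # e.g. a zero forcing term gives a smooth point-to-point reaching movement:
   # traj = rollout_dmp(y0=0.0, g=1.0, forcing=lambda x: 0.0)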
39:12 Then, combined with the autotelic learning algorithm I presented before, we 39:18 were basically able to show that a robot, in an environment it 39:25 initially doesn't know, with a body it initially doesn't know, by exploring goals 39:31 ranging from trying to move the hand (which is initially an object like any other) to moving distant objects (some of 39:37 them being distractors: there are things that move around in the room, things that are too far away to be 39:43 controlled, and more complicated toys that are learnable but pretty complicated), shows the emergence, in 39:49 just a few hours, of an organized learning curriculum. Initially, after sampling a little bit of everything, 39:55 the robot discovers that it's more interesting to sample goals in terms of movements of the hand; 40:01 then progressively it learns how to control the hand, and then it samples goals about moving the two joysticks in 40:07 front of it, and one of them is especially interesting because it happens to control the electric toy in front; and then the robot samples 40:14 goals about the electric toy, and after just a few hours it ends up being 40:20 able to use the electric toy to move the ball around, as you will see in a 40:25 second, in a way that's far from trivial. Actually, if 40:32 you used a traditional deep reinforcement learning algorithm with low-level actions, so without motion primitives, and 40:38 with only an external reward for moving the ball and no autotelic learning at all, it would never work, just never work; it 40:45 would take millions of years to learn this. Okay.

40:51 Now here is another related experiment that's more grounded in child development, where basically one 40:58 studies how the exact same mechanism can participate in structuring vocal development, the exploration of the sounds 41:03 that can be produced with the vocal tract. An additional mechanism here is that the curiosity-driven learning 41:08 mechanism was also used to let the robot decide when to try to imitate the speech sounds of a caregiver model, or when to 41:15 explore other, self-segmented kinds of sounds. These models rely on a physical model of the vocal tract, of the 41:21 motor control and of the auditory system, and they also include a model of the 41:26 vocalizations of social peers. Again, experiments showed an autonomous, self-organized exploration: 41:33 within an initial self-exploration phase, the following sequence appears. The vocal learner first discovers 41:38 how to control phonation, just to produce sound; then it focuses on vocal variations of unarticulated sounds, things like 41:45 [Music]; and finally it automatically discovers and focuses on babbling with 41:50 articulated proto-syllables. And as the vocal learner becomes more proficient at producing complex sounds, 41:57 imitating the vocalizations of peers starts to provide learning progress, and now it focuses on this imitation. This 42:03 is exactly the same kind of transition that we observe in many infants, as shown by Oller and his 42:09 colleagues. It's important to note that here the self-organized trajectory is the result 42:16 of the dynamic interaction between the learning system, the intrinsic motivation system, the body and the environment; and 42:21 so if we run an experiment many times, we observe at the same time strong regularities 42:27 and also diversity in the developmental trajectories: for example, some phases and sequences of phases appear very 42:33 often, but sometimes some phases are reversed, and some simulated agents even have weird developmental trajectories, 42:40 with exactly the same mechanism and
the same parameters. And this duality 42:46 between regularities and diversity is also very typical of child development. In fact, here this can be understood because the 42:52 system is a dynamical system, where the different developmental trajectories can be seen as attractors, and the 42:59 contingencies that are encountered by the learner make it fall, with different probabilities, into 43:06 each of these attractors. I won't enter into the details, but we've 43:11 basically been doing similar studies to model the development of some aspects of tool use.

43:18 But let's go to the last fundamental idea. 43:25 This fundamental idea is that it's very important not to forget that human intelligence 43:31 develops in a social environment. First of all, it means that if we want to understand 43:37 human intelligence, or if we want to build machines with human-like intelligence, 43:42 we need to understand very deeply the family of problems 43:50 humans have evolved to solve, and the key problems humans are specifically able to solve are social problems. 43:57 It means we probably need to understand how to provide rich social environments to machines. 44:04 And more than this, the social perspective, as for example developed by Vygotsky, 44:11 shows that social environments, social peers, also provide guidance, a form 44:18 of socially induced curriculum learning, to help children develop. And finally, and that's pretty fundamental, social 44:24 communication tools, and language in particular, are internalized to become cognitive tools, for example for planning 44:30 or imagination. I'm going to illustrate a few recent works we made 44:36 to explore those ideas, and especially this last idea of language as a cognitive tool.

44:43 If we come back to the autotelic learning architectures I described so far, the spaces in which agents could sample 44:50 goals all corresponded to very concrete goals, such as producing a precise movement or a precise visual pattern 44:56 with an object. Furthermore, when using generative models to sample the goals, for example the autoencoders 45:04 I was speaking about, even if we did novelty search within those spaces, the sampling happened 45:09 within the distribution of known goals. But to power creative exploration, which 45:15 is what we want, agents would need to generate novel, creative and abstract goals that are 45:21 really out of the distribution of what they have already seen. And for doing this, children use 45:28 language. Vygotsky showed that children speak to themselves as a tool to 45:34 generate goals and to make plans to achieve those goals. Language is indeed compositional by 45:40 nature, and it can push the limits of the known towards the unknown. For example, if I know what a cat and a bus are, then I can 45:47 easily compose the two to generate a new concept, the cat-bus; and once I have this linguistic concept of a cat-bus, then 45:53 it's very easy to picture in the mind what it could look like. 45:58 And so the compositionality of language can thus be used to generate out-of-distribution goals, to imagine new goals 46:04 from known ones. Inspired by this, we recently introduced 46:10 the IMAGINE system, starting basically from the state-of-the-art language-guided deep reinforcement learning systems 46:16 that we have developed in recent years. There is a first phase, which is relatively classical, where 46:22 basically the agent is going to interact with the social peer and
learn a language-conditioned policy. More 46:29 precisely, here we have an environment that is visually very simple, but it's actually very rich from a compositional 46:35 point of view, because you have many kinds of objects that can be combined, a little bit like in Minecraft, if you know Minecraft; the difference is that here 46:42 it's visually much simpler, but the combinatoriality is quite similar. So there is a first phase of 46:48 learning that is guided by the social peer: the agent explores the environment, which is procedurally 46:54 generated, with a lot of compositional actions, and when the agent does 46:59 something that is relevant to the social peer, then the social peer provides, at the end of the behavioral trajectory 47:06 (that's important: the social peer never produces instructions or commands), 47:11 only at the end, a language description of what was done. And, very importantly, these 47:18 descriptions are used by the agent to learn several things. The first is very classical: it's going to learn a policy 47:25 that is conditioned on language goals. But then it's also going to learn an internal model 47:31 of the guidance provided by the social peer: an internal model of 47:37 what the social peer would say or think if it does a certain thing, even if the 47:44 social peer is not there anymore. So basically the things it will internalize from the social peer and learn are, first, the goal achievement 47:51 function, which later on will enable the agent to assess by itself whether a language goal has been achieved; 47:56 it will also learn a captioner, which is basically going to enable the agent to produce its own linguistic 48:01 descriptions of its own discoveries, which in turn will enable hindsight learning; and it's going to learn the 48:07 goal generator, basically enabling it to imagine new goals, and which is going to be based on the compositional structure 48:14 that is discovered in the language through the interaction. And something that's very important to 48:20 realize is that the language model is not only a model of language, it is also a model of relevance: 48:26 it's basically a model of what the social peers would find interesting; in the real world, language is conveying 48:34 this relevance knowledge.

And so now comes the most interesting 48:40 part: after the first phase, there is now an autonomous exploration phase, in which the social peer does not speak anymore. 48:46 Here the agent reuses its learned goal generator to sample its own goals, and these are not only 48:53 sentences that were already uttered by the social peer, but new sentences that use the compositionality, as you can see 49:00 on the left. For example, maybe in the past, with the social peer, the agent 49:05 grasped the red tree, or grasped the red algae, or grew the blue algae; now it's able to 49:10 imagine new goals by recombining, and saying, for example, "oh, maybe I could try to grow the red tree". 49:17 So basically it means that the goal sampling process is going also to 49:22 be guided by the relevance model that's encoded in the language model. Then it's going to use its learned goal achievement function to generate its own 49:29 feedback and learn from it, and it's going to use the learned captioner for goal relabeling 49:35 and hindsight learning.
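A minimal sketch of this kind of compositional goal imagination (an illustrative toy version, not the IMAGINE system's actual generator, which learns this structure rather than splitting strings): split heard descriptions into a verb and an object phrase, then keep only the recombinations that were never actually heard.

   # Minimal sketch: imagining out-of-distribution goals by recombining heard descriptions.
   # Illustrative toy version only.
   import itertools

   heard = ["grasp red tree", "grasp red algae", "grow blue algae"]

   def imagine_goals(heard_sentences):
       verbs = {s.split()[0] for s in heard_sentences}
       objects = {" ".join(s.split()[1:]) for s in heard_sentences}
       combos = {f"{v} {o}" for v, o in itertools.product(verbs, objects)}
       return sorted(combos - set(heard_sentences))   # keep only goals never actually heard

   print(imagine_goals(heard))
   # -> ['grasp blue algae', 'grow red algae', 'grow red tree']  : novel goals, e.g. "grow red tree"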
For example, imagine that the agent aimed to grow the red tree, but it failed; 49:41 maybe in the end it managed to grasp a new kind of object, 49:47 say the yellow algae. Then it will be able to describe linguistically what it 49:54 just did, and to learn about it. So basically now the agent explores by using 50:00 language to imagine new goals, and really learns using, fundamentally, this internalized model of 50:08 the social peer.

And so, to evaluate such mechanisms for 50:13 out-of-distribution language goal imagination, we've been 50:19 making a whole series of experiments, studying the capacity of the agent to 50:24 achieve goals, for example from a held-out test set of linguistic goals that 50:30 the social peer was not allowed to say in the first phase, so we are sure these are goals that the social peer 50:36 never taught to the agent. We could see that the agent can actually 50:42 generalize to those test language goals much 50:47 better than if it was only learning in the traditional way, from the social peer's guidance, without imagining 50:55 its own goals using the internal model of the social peer. We've actually been studying various 51:02 alternative implementations of this idea, showing that it's very robust to the particular way you do the goal 51:07 imagination, so it's a very strong idea. We've also been able to show that 51:12 it enables several kinds of generalization from the language, and we've also been 51:18 able to show that it enables the agent to explore very 51:23 efficiently in environments. And finally, we have some fun qualitative analyses, like seeing an 51:28 agent, for example, imagine the goal of growing a plant, which is a goal that was never uttered by the social peer: it's going 51:35 to first try to do it by giving food to the plant, because its policy generalizes the strategy for growing 51:40 animals that it learned with the social peer; but then the internal learned goal achievement function says it doesn't 51:46 work, and self-supervises the agent to adapt its behavior and to learn the right behavior, which is basically to 51:52 bring water.

Okay, and then finally, before I conclude, a 51:58 very short word about a project that is at its very beginning right now, with a 52:04 few colleagues, but I'm very excited about it because I think it's something we need to do. 52:11 I said that if we want to develop 52:17 agents with human-like intelligence, we need to really address frontally the social context. 52:23 In most existing deep learning and deep reinforcement learning work you've been seeing in machine learning and in AI 52:31 recently (I'm speaking about the machine learning domain, not developmental robotics, in which all 52:37 those issues are very traditional, but about deep reinforcement learning, which has contributed amazing methods), those 52:44 methods have not at all been tested in social contexts. For example, you know the Atari suite of video games, which has been 52:51 instrumental and great to propel progress in this field in recent years, but there is no social 52:57 dimension at all. So what we believe is that we need to develop rich 53:02 environments in which there are many kinds of social peers and social problems, and 53:09 we can use those environments to foster the development of social intelligence, and also to test social intelligence in 53:15 agents. And so
we have a first version of this, which we call SocialAI; it's only a draft, it will 53:21 evolve a lot, but it enables fostering things like the development of the various 53:27 aspects of theory of mind, and learning interaction protocols, which 53:34 psychologists call pragmatic frames, in the term used by Jerome Bruner. So if you want to discuss this, I'll be 53:39 happy to discuss it further after this talk, offline.

So let me 53:46 conclude. I'd like to conclude by just stating again those three fundamental ideas that I think are very key for developmental 53:52 AI and should be used to structure research further. The child is autotelic: 53:59 the child learns by generating its own learning objectives, its own goals. 54:04 Intelligence is embodied and develops through self-organization of the interaction between brain, body and 54:09 environment; this also includes the fact that, to understand intelligence, you don't only 54:16 need to develop complicated algorithmic learning models, but you need to understand how to 54:21 provide adequate bodies and adequate environments. And among the properties of adequate 54:27 environments, it's very fundamental to understand that the social context is key, 54:32 and it's not only key to develop social intelligence, but also because social 54:38 skills like language become internalized and become cognitive tools that are 54:43 fundamental even for individual thinking and development. And I've 54:49 tried also to show you today that this fundamental research can have many real-world and societally 54:55 important applications, in domains as diverse as educational technologies, 55:01 robotics, automated discovery, etc. I'll finish by just saying a major 55:08 thanks to the many PhD students, engineers and senior colleagues with whom I have been 55:14 working across the years (there are actually even many more, whose amazing work 55:21 I didn't have time to present today), because all this work is only possible 55:26 with them. So thank you very much.
# enddoc