Contents
Foreword vii
Online Supplement x
Preface xi
1 Introduction 1
1.1 Evolving Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Extending Creative AI . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Improving the World . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Plan for the Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 Plan for Hands-on Exercises . . . . . . . . . . . . . . . . . . . . . . . 12
1.6 Chapter Review Questions . . . . . . . . . . . . . . . . . . . . . . . . 12
2 The Basics 14
2.1 Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.1 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.2 Population-Based Search . . . . . . . . . . . . . . . . . . . . . 17
2.1.3 Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.4 Variation Operators . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.5 Fitness Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.6 Reproduction and Replacement . . . . . . . . . . . . . . . . . 19
2.1.7 Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 Types of Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . 21
2.2.1 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.2 Evolution Strategy . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.3 Covariance-Matrix Adaptation Evolution Strategy . . . . . . . . 25
2.2.4 OpenAI Evolution Strategy . . . . . . . . . . . . . . . . . . . . 28
2.2.5 Multiobjective Evolutionary Algorithms . . . . . . . . . . . . . 30
2.2.6 Further Evolutionary Computation Techniques . . . . . . . . . 32
2.2.7 Try These Algorithms Yourself . . . . . . . . . . . . . . . . . . 34
2.3 Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.3.1 Feedforward Neural Networks . . . . . . . . . . . . . . . . . . 36
2.3.2 Training Feedforward Neural Networks with Gradient Descent . 37
2.3.3 Recurrent Neural Networks . . . . . . . . . . . . . . . . . . . . 39
2.3.4 Long Short-Term Memory . . . . . . . . . . . . . . . . . . . . 40
2.3.5 Convolutional Neural Networks . . . . . . . . . . . . . . . . . 42
2.3.6 Transformers . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.4 Neuroevolution: An Integrated Approach . . . . . . . . . . . . . . . . 47
2.5 Chapter Review Questions . . . . . . . . . . . . . . . . . . . . . . . . 47
3 The Fundamentals of Neuroevolution 49
3.1 Neuroevolution Taxonomy . . . . . . . . . . . . . . . . . . . . . . . . 49
3.1.1 Fixed-Topology Neuroevolution . . . . . . . . . . . . . . . . . 50
3.1.2 Topology and Weight Evolving Artificial Neural Networks . . . 50
3.1.3 Direct Encoding . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.1.4 Indirect Encoding . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2 Case study: Evolving a Simple Walking Agent . . . . . . . . . . . . . . 52
3.2.1 The Challenge . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2.2 Fitness Function . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.2.3 Neural Network Architecture . . . . . . . . . . . . . . . . . . . 54
3.2.4 Evolutionary Algorithm . . . . . . . . . . . . . . . . . . . . . 55
3.2.5 Training for Generality . . . . . . . . . . . . . . . . . . . . . . 55
3.3 Neuroevolution of Augmenting Topologies . . . . . . . . . . . . . . . . 57
3.3.1 Motivation and Challenges . . . . . . . . . . . . . . . . . . . . 57
3.3.2 Genetic Encoding and Historical Markings . . . . . . . . . . . 59
3.3.3 Speciation and Fitness Sharing . . . . . . . . . . . . . . . . . . 62
3.3.4 Example: Double Pole Balancing . . . . . . . . . . . . . . . . 63
3.4 Scaling up Neuroevolution . . . . . . . . . . . . . . . . . . . . . . . . 66
3.4.1 Neuroevolution vs. Deep Learning . . . . . . . . . . . . . . . . 66
3.4.2 Deep Neuroevolution . . . . . . . . . . . . . . . . . . . . . . . 68
3.4.3 Taking Advantage of Big Compute . . . . . . . . . . . . . . . . 69
3.5 Chapter Review Questions . . . . . . . . . . . . . . . . . . . . . . . . 71
4 Indirect Encodings 73
4.1 Why Indirect Encodings? . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.2 Developmental Processes . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.2.1 Cell-Chemistry Approaches . . . . . . . . . . . . . . . . . . . 75
4.2.2 Grammatical Encodings . . . . . . . . . . . . . . . . . . . . . 77
4.2.3 Learning Approaches . . . . . . . . . . . . . . . . . . . . . . . 81
4.3 Indirect Encoding through Hypernetworks . . . . . . . . . . . . . . . . 85
4.3.1 Compositional Pattern Producing Networks . . . . . . . . . . . 86
4.3.2 Case Study: Evolving Virtual Creatures with CPPN-NEAT . . . 90
4.3.3 HyperNEAT . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.3.4 Multiagent HyperNEAT . . . . . . . . . . . . . . . . . . . . . 95
4.3.5 Evolvable Substrate HyperNEAT . . . . . . . . . . . . . . . . 98
4.3.6 General Hypernetworks and Dynamic Indirect Encodings . . . 101
4.4 Self-attention as Dynamic Indirect Encoding . . . . . . . . . . . . . . . 103
4.4.1 Background on Self-Attention . . . . . . . . . . . . . . . . . . 104
4.4.2 Self-Attention as a Form of Indirect Encoding . . . . . . . . . . 105
4.4.3 Self-Attention Based Agents . . . . . . . . . . . . . . . . . . . 106
4.5 Chapter Review Questions . . . . . . . . . . . . . . . . . . . . . . . . 110
5 Utilizing Diversity 111
5.1 Genetic Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.2 Behavioral Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.3 Novelty Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.4 Quality Diversity Methods . . . . . . . . . . . . . . . . . . . . . . . . 119
5.4.1 Motivation and Challenges . . . . . . . . . . . . . . . . . . . . 120
5.4.2 Novelty Search with Local Competition . . . . . . . . . . . . . 121
5.4.3 MAP-Elites . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.4.4 Implementing and Enhancing QD Algorithms . . . . . . . . . . 126
5.5 Multiobjectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.6 Ensembling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.7 Utilizing Population Culture and History . . . . . . . . . . . . . . . . . 132
5.8 Chapter Review Questions . . . . . . . . . . . . . . . . . . . . . . . . 136
6 Neuroevolution of Behavior 138
6.1 From Control to Strategy . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.2 Discovering Robust Control . . . . . . . . . . . . . . . . . . . . . . . . 142
6.2.1 Noise, Exploration, and Novelty . . . . . . . . . . . . . . . . . 142
6.2.2 Symmetry, Context, and Adaptation . . . . . . . . . . . . . . . 143
6.2.3 Transfer to Physical Robots . . . . . . . . . . . . . . . . . . . . 147
6.3 Discovering Flexible Strategies . . . . . . . . . . . . . . . . . . . . . . 150
6.3.1 Switching between Behaviors . . . . . . . . . . . . . . . . . . 150
6.3.2 Evolving Cognitive Behaviors . . . . . . . . . . . . . . . . . . 154
6.3.3 Utilizing Stochasticity, Coevolution, and Scale . . . . . . . . . 155
6.4 Decision-Making . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
6.4.1 Successes and Challenges . . . . . . . . . . . . . . . . . . . . 157
6.4.2 Surrogate Modeling . . . . . . . . . . . . . . . . . . . . . . . 158
6.4.3 Case Study: Mitigating Climate Change through Optimized Land Use . . . . . . . . . 162
6.4.4 Case Study: Optimizing NPIs for COVID-19 . . . . . . . . . . 165
6.4.5 Leveraging Human Expertise . . . . . . . . . . . . . . . . . . . 170
6.5 Chapter Review Questions . . . . . . . . . . . . . . . . . . . . . . . . 175
7 Neuroevolution of Collective Systems 177
7.1 Cooperative Coevolution . . . . . . . . . . . . . . . . . . . . . . . . . 177
7.1.1 Evolving a Single Neural Network . . . . . . . . . . . . . . . . 178
7.1.2 Evolving Structured Heterogeneous Networks . . . . . . . . . . 181
7.1.3 Evolving a Team . . . . . . . . . . . . . . . . . . . . . . . . . 183
7.2 Competitive Coevolution . . . . . . . . . . . . . . . . . . . . . . . . . 186
7.2.1 Evolving Single Neural Networks . . . . . . . . . . . . . . . . 187
7.2.2 Evolving Multiple Teams . . . . . . . . . . . . . . . . . . . . . 189
7.3 Cellular Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
7.3.1 Evolving Neural Cellular Automata . . . . . . . . . . . . . . . 193
7.3.2 Growing Functional Machines . . . . . . . . . . . . . . . . . . 195
7.3.3 Case Study: Growing Game Levels with QD-Evolved NCAs . . 197
7.3.4 Evolving Self-Assembling Neural Networks . . . . . . . . . . . 200
7.3.5 Combining Evolutionary Creativity with GD Precision . . . . . 204
7.4 Chapter Review Questions . . . . . . . . . . . . . . . . . . . . . . . . 207
8 Interactive Neuroevolution 208
8.1 The NERO Machine Learning Game . . . . . . . . . . . . . . . . . . . 208
8.2 Incorporating Human Knowledge into NERO . . . . . . . . . . . . . . 213
8.3 Neuroevolution-enabled Collaboration . . . . . . . . . . . . . . . . . . 218
8.4 Case Study: Collaborative Interactive Neuroevolution Through Play . . 220
8.5 Making Human Contributions Practical . . . . . . . . . . . . . . . . . 224
8.6 Chapter Review Questions . . . . . . . . . . . . . . . . . . . . . . . . 226
9 Open-ended Neuroevolution 228
9.1 Open-ended Discovery of Complex Behavior . . . . . . . . . . . . . . 228
9.1.1 Neutral Mutations with Weak Selection . . . . . . . . . . . . . 228
9.1.2 Extinction Events . . . . . . . . . . . . . . . . . . . . . . . . . 230
9.1.3 Evolvable Representations . . . . . . . . . . . . . . . . . . . . 231
9.1.4 Expressive Encodings . . . . . . . . . . . . . . . . . . . . . . 234
9.1.5 Major Transitions . . . . . . . . . . . . . . . . . . . . . . . . . 235
9.1.6 Open-ended Evolution of Intelligence . . . . . . . . . . . . . . 237
9.2 Cooperative Coevolution of Environments and Solutions . . . . . . . . 238
9.2.1 The Influence of Environments . . . . . . . . . . . . . . . . . . 238
9.2.2 Body and Brain Coevolution . . . . . . . . . . . . . . . . . . . 238
9.2.3 Coevolution Driven by Interestingness . . . . . . . . . . . . . . 241
9.3 Competitive Coevolution of Environments and Solutions . . . . . . . . 244
9.3.1 Paired Open-Ended Trailblazer . . . . . . . . . . . . . . . . . . 244
9.3.2 Learning to Chase-and-Escape . . . . . . . . . . . . . . . . . . 249
9.4 Chapter Review Questions . . . . . . . . . . . . . . . . . . . . . . . . 253
10 Evolutionary Neural Architecture Search 254
10.1 Neural Architecture Search with NEAT . . . . . . . . . . . . . . . . . 254
10.2 NAS for Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 258
10.3 Case Studies: Improving Deep Learning SOTA . . . . . . . . . . . . . 262
10.3.1 LSTM Designs . . . . . . . . . . . . . . . . . . . . . . . . . . 262
10.3.2 CoDeepNEAT . . . . . . . . . . . . . . . . . . . . . . . . . . 264
10.3.3 AmoebaNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
10.4 Multiobjective and Multitask NAS . . . . . . . . . . . . . . . . . . . . 267
10.5 Making NAS Practical . . . . . . . . . . . . . . . . . . . . . . . . . . 272
10.6 Beyond Neural Architecture Search . . . . . . . . . . . . . . . . . . . . 277
10.7 Chapter Review Questions . . . . . . . . . . . . . . . . . . . . . . . . 280
11 Optimization of Neural Network Designs 281
11.1 Designing Complex Systems . . . . . . . . . . . . . . . . . . . . . . . 281
11.2 Bilevel Neuroevolution . . . . . . . . . . . . . . . . . . . . . . . . . . 282
11.3 Evolutionary Meta-learning . . . . . . . . . . . . . . . . . . . . . . . . 285
11.3.1 Loss functions . . . . . . . . . . . . . . . . . . . . . . . . . . 286
11.3.2 Activation Functions . . . . . . . . . . . . . . . . . . . . . . . 288
11.3.3 Data Use and Augmentation . . . . . . . . . . . . . . . . . . . 290
11.3.4 Learning Methods . . . . . . . . . . . . . . . . . . . . . . . . 291
11.3.5 Utilizing Surrogates . . . . . . . . . . . . . . . . . . . . . . . 292
11.3.6 Synergies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
11.4 Case Study: Meta-learning vs. Human Design . . . . . . . . . . . . . . 295
11.5 Neuroevolution of Neuromorphic Systems . . . . . . . . . . . . . . . . 299
11.5.1 Neuromorphic Computation . . . . . . . . . . . . . . . . . . . 299
11.5.2 Evolutionary Optimization . . . . . . . . . . . . . . . . . . . . 300
11.5.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
11.5.4 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . 303
11.6 Chapter Review Questions . . . . . . . . . . . . . . . . . . . . . . . . 304
12 Synergies with Reinforcement Learning 306
12.1 Reinforcement learning vs. Neuroevolution . . . . . . . . . . . . . . . 306
12.2 Synergistic Combinations . . . . . . . . . . . . . . . . . . . . . . . . . 308
12.2.1 Integrating Population-Based and Reinforcement-Based Search 308
12.2.2 Evolving Value Networks for RL . . . . . . . . . . . . . . . . . 309
12.2.3 Evolving Starting Points for RL . . . . . . . . . . . . . . . . . 311
12.3 Evolving Neural Networks to Reinforcement Learn . . . . . . . . . . . 315
12.3.1 Evolving Hebbian Learning Rules . . . . . . . . . . . . . . . . 316
12.3.2 Case Study: Hebbian Learning for Physical Robot Transfer . . . 320
12.3.3 Learning When to Learn through Neuromodulation . . . . . . . 322
12.3.4 Indirectly Encoded Plasticity . . . . . . . . . . . . . . . . . . . 324
12.3.5 Learning to Continually Learn through Networks with External Memory . . . . . . . . . 327
12.4 Integrating Evolution, Learning, and Embodiment . . . . . . . . . . . . 330
12.5 Chapter Review Questions . . . . . . . . . . . . . . . . . . . . . . . . 333
13 Synergies with Generative AI 335
13.1 Background on Large Language Models . . . . . . . . . . . . . . . . . 335
13.2 Evolutionary Computing Enhances LLMs . . . . . . . . . . . . . . . . 336
13.2.1 Evolutionary Prompt Engineering/Adaptation . . . . . . . . . . 337
13.2.2 Evolutionary Model Merging . . . . . . . . . . . . . . . . . . . 341
13.2.3 Fine-Tuning with Evolution Strategy . . . . . . . . . . . . . . . 345
13.3 LLMs Enhance Evolutionary Computing . . . . . . . . . . . . . . . . . 348
13.3.1 Evolution through Large Models . . . . . . . . . . . . . . . . . 348
13.3.2 Language Model Crossover . . . . . . . . . . . . . . . . . . . 350
13.3.3 LLMs as Evolution Strategies . . . . . . . . . . . . . . . . . . 354
13.3.4 AlphaEvolve . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
13.4 Case Studies: NE-enhanced Generative AI for Game Level Generation . 361
13.4.1 MarioGAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
13.4.2 MarioGPT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
13.5 World Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
13.5.1 A Simple World Model for Agents . . . . . . . . . . . . . . . . 367
13.5.2 Using the World Model for Feature Extraction . . . . . . . . . . 370
13.5.3 Training an Agent Inside Its Own World Model . . . . . . . . . 371
13.6 Chapter Review Questions . . . . . . . . . . . . . . . . . . . . . . . . 373
14 What Neuroevolution Can Tell Us About Biological Evolution? 375
14.1 Understanding Neural Structure . . . . . . . . . . . . . . . . . . . . . 375
14.2 Evolutionary Origins of Modularity . . . . . . . . . . . . . . . . . . . . 379
14.3 Understanding Neuromodulation . . . . . . . . . . . . . . . . . . . . . 382
14.4 Developmental Processes . . . . . . . . . . . . . . . . . . . . . . . . . 384
14.4.1 Synergistic Development . . . . . . . . . . . . . . . . . . . . . 384
14.4.2 Development through Genetically Directed Learning . . . . . . 385
14.5 Constrained Evolution of Behavior . . . . . . . . . . . . . . . . . . . . 389
14.6 Case Study: Understanding Human-like Behavior . . . . . . . . . . . . 392
14.7 Case Study: Understanding an Evolutionary Breakthrough . . . . . . . 394
14.8 Evolution of Language . . . . . . . . . . . . . . . . . . . . . . . . . . 398
14.8.1 Biology of Language . . . . . . . . . . . . . . . . . . . . . . . 399
14.8.2 Evolving Communication . . . . . . . . . . . . . . . . . . . . 400
14.8.3 Evolution of Structured Language . . . . . . . . . . . . . . . . 402
14.9 Chapter Review Questions . . . . . . . . . . . . . . . . . . . . . . . . 405
15 Epilogue 406
References 408
Subject Index 456
Author Index 465
Foreword
Neuroevolution is the study of how to use evolutionary computation methods in the design
and optimization of neural networks. And neuroevolution might just be the "next big thing" in artificial intelligence. Why neuroevolution? And why now?
Since the beginnings of the field of artificial intelligence in the 1940s and 50s, AI
researchers have taken inspiration from intelligent and adaptive systems in nature. The
best-known example is biological brains, which led to neural networks and deep learning.
But other inspirations for AI have included biological systems ranging from immune
systems to ant colonies, and most notably, the processes of evolution driven by natural
selection.
Work on evolution-inspired AI has gone under the names "genetic algorithms," "evolution strategies," "genetic programming," and more generally "evolutionary computation."
All such approaches involve populations of individuals that represent solutions to a
problem or set of problems, where a solution can be in the form of a vector, a program,
a grammar, or other kinds of data structures, depending on the task. Each individual is
assigned a "fitness" value encoding its quality according to some task-specific criteria,
and the population undergoes a computational version of natural selection, in which the
fittest individuals produce "offspring," that is, new individuals, with variation generated
by mutation and recombination. This process is repeated for some number of iterations
("generations"), at which point one or more highly fit solutions have (hopefully) been
discovered.
My own enchantment with evolutionary computation started in graduate school at the
University of Michigan, where I had the privilege to study with John Holland, the founder
of the field of genetic algorithms (GAs). In his book Adaptation in Natural and Artificial
Systems,¹ Holland showed that biological evolution could be abstracted in such a way as
to be programmed and run on machines. In my own computational experiments with GAs,
it was thrilling to witness innovative solutions to complex problems being created via the
simple mechanisms of selection and variation, iterated over many generations.
Holland’s work on genetic algorithms began in the 1960s. Around the same time,
a few other groups were investigating similar ideas, such as the evolution strategies of
Hans-Paul Schwefel and others.² During the 1960s and in subsequent decades, research
on neural networks and on evolutionary computation advanced along largely independent
paths, each area growing its own research community with separate conferences, journals,
¹ Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press.
² Schwefel, H. P. (1984). Evolution Strategies: A Family of Non-Linear Optimization Techniques Based on Imitating Some Principles of Organic Evolution. Annals of Operations Research, 1.
and benchmarks for measuring progress. These lesser-known biologically inspired AI approaches stood in contrast to the logic-inspired symbolic AI methods, including "expert systems," that dominated the field.
By the late 1980s, there was widespread sentiment that none of the major AI methods (symbolic, neural, or evolutionary) had lived up to expectations, and an "AI winter" set in. Indeed, when I graduated with a PhD in 1990, I was advised not to use the term "artificial intelligence" on my job applications.
In the 1990s and early 2000s, the next big thing in AI was machine learning, which,
at the time, drew its inspirations from statistics and other mathematical approaches to
inference from data. However, research continued on both neural networks and evolutionary
computation in relatively small communities.
This changed dramatically in the 2010s with the meteoric rise of deep neural networks, a technology that had been around since at least the 1980s, but suddenly showed dramatic improvements in performance due to scale: the ability to train very large networks with
sufficient data, by virtue of increased compute power and the availability of enormous
corpora of images, text, and other modalities on the World Wide Web. The 2010s saw the
"deep learning revolution" in computer vision, speech recognition, language translation,
and other long studied areas of AI. In the 2020s, the world witnessed the rise of generative
AI, based on the transformer architecture, a kind of deep neural network architecture
optimized for sequence learning. The most successful generative AI models have up to
trillions of tunable parameters, and are trained on up to a petabyte of data. It seemed
to many that scaling up these systems would soon result in machines with human-level
intelligence.
However, several years after the release of ChatGPT, most AI researchers are coming
to the conclusion that scaling alone is actually a "dead end."³ While the best generative
AI systems are remarkably good at many things, they remain stubbornly brittle on tasks
requiring complex decision-making, as well as trustworthy generalization, reasoning, and
planning: abilities needed for intelligent agents that accomplish ill-defined or open-ended
tasks in the real world.
This book argues that neuroevolution will be part of a new revolution in AI. The
development of evolutionary methods for optimizing different components of neural
networks dates back to the 1980s. And as it did for neural networks, scaling computing
power and data might unlock neuroevolution's potential. As the prominent roboticist Rodney Brooks speculated, "Perhaps there is an opportunity to reinvent evolutionary computation and exploit the competition ready training sets and massive amounts of computation."⁴
Interest in evolutionary computation has stemmed from the fact that evolution, at
scale, has given rise to many essential features of intelligent and adaptive systems. These
include the abilities to continually adapt to changing environments, to design open-ended
diversity and novelty, and to create "collective intelligence": complex multi-agent systems
that, via cooperative and competitive interactions, produce adaptive behavior that is far
more than the sum of their parts. In addition, evolution is a mechanism for hierarchical
³ https://futurism.com/ai-researchers-tech-industry-dead-end
⁴ https://x.com/rodneyabrooks/status/1204249201913122817
adaptation, simultaneously working on many levels ranging from genes to individuals, and
on to groups and even entire coevolutionary ecosystems. This book makes the argument
that such features can be captured in computational systems, and provides readers the
essential knowledge and tools they will need to build neuroevolutionary AI.
Authored by four pioneering neuroevolution researchers, this book provides detailed
primers on the major ideas, methods, and mathematics underlying both evolutionary
computation and neural networks, and on the many ways in which they can be combined,
summarizing advances from decades of work in this field. The book also provides
numerous real-world case studies in domains such as decision-making, control systems,
robotics, and video games, which demonstrate the ways in which these methods can be used
to deal with dynamic, ambiguous, and uncertain environments, to simultaneously optimize
multiple levels of a system, often taking into account multiple goals, and to enable lifelong
learning, open-ended adaptation and novelty creation.
The next big thing in AI is coming, and I suspect that neuroevolution will be a major
part of it.
Melanie Mitchell, Santa Fe, NM, March, 2025
Online Supplement
https://neuroevolutionbook.com/
The above website provides supplementary material that we hope will be useful to
readers and instructors, including demos, tutorials, exercises, lecture slides, and any
corrections and updates.
Preface
Artificial intelligence has surged into mainstream popularity, with generative AI tech-
nologies such as large language models (LLMs) capturing the public’s imagination.
Conversations about AI’s potential and power are everywhere, as these models compose
text, generate images, and mimic human language at an unprecedented scale. Amid this
boom, however, lies another field with equally transformative potential: neuroevolution.
Neuroevolution has developed unique approaches and capabilities that have yet to capture
the same level of mainstream attention.
Neuroevolution, combining principles of neural networks with evolutionary processes,
has been around for decades. It offers solutions that go beyond imitation and pattern
recognition, extending into areas of adaptability, creativity, and resilience. While
traditional AI often relies on predefined objectives and vast datasets, neuroevolution
excels in environments where goals are ambiguous, rewards are sparse, and conditions are
ever-changing. This approach introduces a method of designing and evolving AI systems
that can handle complex, high-dimensional problems with minimal human intervention,
and it is precisely this adaptability that is set to bring neuroevolution to the forefront of AI
in the coming years.
As AI advances into realms requiring flexibility and open-ended problem-solving,
neuroevolution has shown great promise in evolving robust, adaptive, and creative solutions.
It is particularly promising for applications where the optimal solution is unknown or hard
to define, such as robotics, dynamic systems, and even art and design. With neuroevolution,
we can create agents that not only evolve but also learn continuously during their lifetime,
much like biological organisms do in nature.
This book serves as a gateway into the world of neuroevolution, providing readers
with both a foundational understanding and practical tools for harnessing its potential. It
covers the core concepts, algorithms, and applications of neuroevolutionary systems, with
each chapter containing examples and questions that encourage readers to engage with the
material critically. By offering insights into synergies with generative AI, reinforcement
learning, and other domains, we hope to demonstrate the relevance of neuroevolution to
the future of AI.
This book would not have been possible without the contributions of researchers and
pioneers in neuroevolution and evolutionary computation, whose insights and innovations
have laid the foundation for this work. We are also grateful to our colleagues, students,
and readers who have inspired us with their curiosity and feedback, helping us to refine
and expand upon the ideas presented here. We would also like to thank our MIT Editor
Elizabeth Swayze, who believed in this project early on and was a pleasure to work with.
Additionally, we would like to express our gratitude to everybody who gave us
permission to reproduce images and figures from their publications. We indicate the figure
sources throughout the book in the figure captions. Special thanks to Ken Stanley for
giving detailed feedback on a draft of this book, Noah Syrkis for assistance in obtaining
figure permissions, and Julianna Nijmeh and Manasha Vengat for help in designing and
building the book website.
Writing this book has been a long journey. We want to thank our families and friends
for their support, without which this book would not have seen the light of day. Sebastian
would like to thank his wife Débora for her support and patience throughout the countless
hours spent writing this book. He is also deeply grateful to his parents, whose love,
encouragement, and belief in him have shaped the path that made this work possible.
Yujin is very grateful to his wife Jinmei for tolerating many late nights and caffeine-fueled
ramblings; half the credit for his contribution to this book belongs to her. David would
like to thank his parents for their unwavering support, love, and encouragement throughout
every step of this journey. Risto would like to thank his wife Riitta and mom Raili for
providing a distraction-free environment for three month-long writing binges in Helsinki.
We would also like to thank Sakana.ai and Cognizant AI Lab for the financial support,
which allowed this book to be enjoyed in color.
Chapter 1
Introduction
To illustrate what neuroevolution is about, consider the following four challenges (fig-
ure 1.1):
Imagine that you want to create a character in a video game where you, as the player,
perform search and rescue. This character acts as your sidekick: it scouts for helpful
information, helps move large objects, and so on. You want the character to anticipate
what you want to do, and act in a believable, human-like manner: it has limited resources,
like you do, but generally uses them well. How do you design such a character? Many of
its characteristics are difficult to describe: you know it when you see it.
Now imagine that a new pandemic is emerging. It seems to target particularly
vulnerable populations, seems to be transmitted through the air in crowded conditions, and
seems to have a long incubation period. The disease has already led to hospitalizations
in several countries, and some have taken measures to contain it, e.g. by closing schools,
restricting air travel, and establishing contact tracing. Eventually, the pathogen will be
sequenced, and vaccines and medications perhaps developed for it, but we need to cope
with the spread of the disease right now. Can we learn from these experiences around
the world, and come up with intervention recommendations that are customized for the
current situation in different countries, or even cities and neighborhoods?
You are an analyst at a retailer, trying to predict sales of different products in different
stores to minimize inventory and waste. You have historical data that includes product
descriptions, seasonal variations, and economic indicators, which should allow you to
use deep learning to predict them. However, there is not enough data to do it: such a network
would likely learn to memorize the small dataset and not generalize well in the future.
However, there is a lot of data about other types of sales, as well as other economic and
retail metrics. Could you design a deep learning architecture that utilizes all these other
datasets to learn to predict your data better?
You are a biologist studying the behavior of a particular species, say hyenas. You
discover that in some circumstances they perform extremely sophisticated coordination of
collaborative actions that allows them to overpower a group of lions. While hyenas are
good at many social tasks, this one stands out as something beyond their usual capabilities.
Could we be seeing evolution taking place, i.e. an adaptation that eventually leads to a
leap in social intelligence? It is not possible to verify the hypothesis in the field, or even in
the lab. Could we create a computational simulation to provide evidence for it?

(a) Video-game character. (b) Pandemic intervention strategy. (c) Network sharing knowledge across tasks. (d) Evolution of coordination.
Figure 1.1: Illustrative opportunities for neuroevolution. (a) A non-player character in a video game is controlled by an evolved neural network. It balances multiple objectives, including ill-defined ones such as "human-like behavior". (b) Based on a predictive model learned from historical data (top), neuroevolution constructs a strategy that can be applied to different countries at different times. It discovers customized solutions (bottom) that are more effective than general rules of thumb. (c) In learning multiple tasks at once, neuroevolution discovers a common set of modules, and for each task, a different architecture made of these modules (this one recognizes handwritten characters in the Angelic alphabet; the different modules are labeled by color). By combining knowledge from multiple tasks in this manner, neuroevolution can make deep learning work even when the data is otherwise insufficient. (d) Neuroevolution discovers sophisticated coordination that allows simulated hyenas to steal a kill from lions. It is possible to identify what steps in evolution lead to this breakthrough; for instance, the descendants of risk-taking (red) and risk-averse (blue) hyenas will evolve to approach up to the striking distance (black dotted square) where they can overpower the lion (yellow, with a zebra kill). Figure (c) from J. Liang, Meyerson, and Miikkulainen (2018).
The above four examples each illustrate neuroevolution in action. Neuroevolution, or
optimization of neural network designs through evolutionary computation, is an approach
in the AI toolbox that is different from just about anything else. The idea is not to optimize
a quantitative metric, but to find solutions that achieve multiple goals, some of which may be
ill-defined; not to replace human creativity and decision-making authority, but to extend
it with a powerful tool for discovery; not to solve problems by encoding and applying
what already works, but to discover creative, effective solutions that can be surprising
and difficult to find; not to create static and rigid solutions, but behavior that generalizes and adapts to an unpredictable and changing world. Thus, with neuroevolution it is possible
to develop AI-based decision-making to improve engineering, science, and society in
general.
This book aims to give the reader the conceptual and practical knowledge to take
advantage of neuroevolution in a range of applications, and to develop it further. The
discussion will begin in this chapter with a high-level overview of neuroevolution mechanisms, comparing and contrasting them with other types of creative AI, and identifying
opportunities where neuroevolution can have the most significant impact. The body of
the book then reviews evolutionary computation basics, methods for taking advantage of
encodings and diversity, constructing intelligent agents, empowering and leveraging other
learning systems (such as deep learning, neuromorphic systems, reinforcement learning,
and generative AI), and modeling and drawing insights from biology.
1.1 Evolving Neural Networks
Neuroevolution is the practice of applying computational evolution methods to artificial
neural networks. Most students of machine learning are taught that to train a neural
network, one needs to define an objective function to measure how well the neural network
performs in the task, use backpropagation to solve for the derivatives of this objective
function with respect to each weight, and then use these derivatives iteratively to find a
good set of weights. This framework is known as end-to-end training.
While the backpropagation algorithm is a powerful method for many applications, it is
certainly not the only one. There are other methods for coming up with neural network
weights. For example, going to one extreme, one method is to randomly guess the weights
of a neural network until we get a set of weights that can help us perform some task.
Evolutionary algorithms are a principled approach beyond random guessing. They work
as follows: Imagine that we have 100 sets of random weights for a neural network, and
evaluate the neural network with each set of weights to see how well it performs a given
task. After doing this, we keep only the best 20 sets of weights. Then, we populate
the remaining 80 sets of weights based on the 20 sets that we kept. Those 20 serve as
raw material, and we apply the genetic operations of crossover and mutation to form new sets
of weights. Crossover is a recombination operator, i.e. it forms a new set by choosing
randomly from two (or more) existing sets. Note that the existing sets are known to
be relatively good already, so crossover aims to find ways to combine their strengths.
Mutation is a novelty operator, i.e. it chooses a weight in the new set randomly, and
modifies it randomly to create a new weight. Thus, mutation aims to create weights that
may not already exist among the top 20 sets, but would be useful to have.
The 80 new sets of weights thus constitute a mutated recombination of the top 20. Once we have a full population of 100 sets of weights again, we can repeat the task of evaluating the neural network with each set of weights again, and repeat the evolution process until we obtain a set of weights that satisfies our needs (figure 1.2).

Figure 1.2: A general framework for neuroevolution. The process starts with a population of neural networks, encoded e.g. as a set of weights in a fixed network topology, concatenated into a string, and initialized randomly. Each encoding is decoded into a network, which is then evaluated in the task to estimate its fitness, i.e. to see how well it performs in the task. The encodings of networks that perform well become parents for the next generation of networks: They are mutated and recombined with other good encodings to form offspring networks. These offspring networks replace those that performed poorly in the original population. Some of these offspring networks are likely to include good parts of both parents, and therefore perform better than their parents. This process repeats until networks are eventually created that solve the task. Note that gradient information is not necessary; only high-level fitness information is needed. Thus, neuroevolution is a population-based search that discovers and utilizes building blocks as well as random exploration, resulting in network designs that perform well in a desired task.
This type of algorithm is an example of neuroevolution. It is very useful for solving
for neural network weights when it is difficult to define a mathematically well-behaved
objective function, such as functions with no clear derivatives. Using this simple method, neural networks have been trained to balance inverted pendulums, to play video games, and to control agents that collectively learn to avoid obstacles.
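To make this procedure concrete, the following is a minimal sketch in Python. The population size of 100, the truncation to the best 20, and the mutation settings mirror the description above; evaluate_fitness is a toy stand-in for whatever task-specific evaluation is available, and none of the names come from any particular library.

    import numpy as np

    POP_SIZE, N_ELITE, N_WEIGHTS = 100, 20, 50   # illustrative sizes
    MUT_RATE, MUT_STD = 0.1, 0.5                 # mutation probability and strength

    def evaluate_fitness(weights):
        # Toy stand-in: fitness is higher the closer the weights are to an arbitrary
        # target vector. In a real task, decode the weights into a network, run it
        # in the task (e.g. a simulator episode), and return the resulting score.
        target = np.linspace(-1.0, 1.0, N_WEIGHTS)
        return -np.sum((weights - target) ** 2)

    def crossover(parent_a, parent_b):
        # Recombination: each weight is picked at random from one of the two parents.
        mask = np.random.rand(N_WEIGHTS) < 0.5
        return np.where(mask, parent_a, parent_b)

    def mutate(weights):
        # Novelty: randomly perturb a few of the weights.
        mask = np.random.rand(N_WEIGHTS) < MUT_RATE
        return weights + mask * np.random.randn(N_WEIGHTS) * MUT_STD

    # Start with 100 random weight vectors and evolve them.
    population = [np.random.randn(N_WEIGHTS) for _ in range(POP_SIZE)]
    for generation in range(200):
        # Evaluate all candidates and keep only the best 20.
        population.sort(key=evaluate_fitness, reverse=True)
        elites = population[:N_ELITE]
        # Refill the remaining 80 slots with mutated recombinations of the elites.
        offspring = []
        while len(elites) + len(offspring) < POP_SIZE:
            i, j = np.random.choice(N_ELITE, size=2, replace=False)
            offspring.append(mutate(crossover(elites[i], elites[j])))
        population = elites + offspring

    print("best fitness:", evaluate_fitness(population[0]))

Chapter 2 discusses each of these ingredients (representation, population-based search, selection, variation operators, and fitness evaluation) in detail.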
In the past few decades, however, neuroevolution has developed into a branch of AI of
its own. Several new techniques beyond random exploration have been proposed to make
it systematic and effective, and it has turned out to be a state-of-the-art method in many
application areas. This book reviews these techniques and opportunities. But let us start
by outlining neuroevolution's role in AI in general.
1.2 Extending Creative AI
The field of artificial intelligence (AI) is going through a transformation, i.e. a paradigm
shift. It is emerging from the laboratory and getting integrated into the mainstream of
society, changing how much of human intellectual activity is organized and conducted.
Technically, the focus of AI methods is moving from prediction to prescription, i.e. from
imitating what people do to creating new solutions that have not existed before. For
instance, instead of recognizing images and understanding language, or predicting the
weather or binding strength of molecules, AI is now generating images at will, writing
prose and answering questions, creating new molecules that never existed before, and
making decisions about resource allocations, treatment plans, and engineering design.
This technology has been named agentic AI because these systems are intelligent agents that make changes to the world.
There is no single technology or breakthrough that made this progress possible; instead,
it emerged from the confluence of several factors. A most important one is simply the availability of massive amounts of data: much of human experience is now available
online (text, code, images, video, music, and scientific datasets). At the same time,
computational resources are now available at an unprecedented and unexpectedly large
scale: a million-fold increase from the 1990s to the 2010s (Routley, 2017), and about four orders
of magnitude since then. As a result, many of the techniques that have been known since
the 1990s (techniques that looked promising but never quite worked at scale) can now
be scaled up and made to work.
The most impactful one, of course, is large language models (LLMs; Hadi, Al Tashi,
Qureshi, et al., 2025; Min, Ross, Sulem, et al., 2024). Gradient descent as a learning
mechanism for neural networks became popular in the 1980s (although conceived much
before), and the task of predicting the next word in text (or more generally, a token in a
sequence) has been used to demonstrate properties of neural networks for decades. An
important innovation in modeling language structure was the transformer architecture,
which allows representing relationships and abstractions of the sequence. However, it was
still surprising that when scaled up billion-fold in terms of data and compute, language
modeling results in an agent that encodes general knowledge about the world and can cope
with many of the tasks in it. How exactly the scale-up achieved such behavior, whether
it is based on principles similar to the human brain, and how we can take advantage
of it in a reliable and productive manner is still work that needs to be done, but it has
already fundamentally changed the way we think about AI and artificial agents. They can
have useful knowledge and skills similar to and even beyond human abilities, and we can
interact with them similarly to human experts (Miikkulainen, 2024).
Image generation models are similarly a major step forward in generative AI. Various
techniques can be used, such as GANs or transformers, but many current models are based
on diffusion: A sequence of noising and denoising operations is used to tie together a
linguistic expression of the desired image (Luo, 2022). With very large training sets of
images and descriptions, the system learns the general principles about the visual world,
and can then use them to create images that have never existed before. The approach can
be extended to video and sound as well. One difference from LLMs is that the applications
are mostly creative, i.e. humans give high-level descriptions of what they want and the
model makes a guess of what the human has in mind. They are not used to answer
questions about facts, e.g. to create a map of an actual city; therefore, they cannot really be
wrong. Yet they still encode a lot of knowledge about the world, i.e. objects and actors in
it, their relationships, and even ill-defined concepts such as styles, moods, and emotions.
They can thus serve as an extension of human creativity.
Indeed, LLMs and image models are already useful in this role of enhancing human
creativity. Experts can use them as a tool that makes them more productive. In an
interactive setup, the expert can describe what s/he wants, and the AI will generate
alternative solutions, be it illustrations, diagrams, memos, lyrics, art, stories, translations,
music, code for algorithms, code for interfaces, etc. The human can then refine these
solutions until they solve the problem. The process can thus be more comprehensive,
efficient, and creative than without such tools. However, what really made AI break out
from the lab to the mainstream is that these tools are also useful for non-experts. A much
larger segment of the population can now create art, text, and code at will, and be effective
and proficient in it, the way they never could before. For instance, I can write an outline of
a story, and use AI to realize it in a particular style, and another AI to provide illustrations
for it, even if I’m not a skilled artist or a writer. Similarly, I can describe an idea for a
method to extract knowledge from a dataset, and then use AI to implement the method in
e.g. Python. If the database has an esoteric API, I can have AI read the documentation
and write the code to get the data through it. I can do this even if I’m not a programmer,
or technical enough to understand the documentation.
The third area of AI that has recently emerged from the lab and is changing the world
is decision-making: in behavior, design, and strategy. That is, we have autonomous
agents that behave intelligently, for instance drive a car in open-ended traffic conditions, or
control non-player characters in video games. Using AI, we can design a better shape for
a train’s nose cone, or molecules that detect pathogens more accurately or treat diseases
more effectively. Based on datasets in healthcare, business, and science, AI can be used
to recommend more effective treatments, marketing campaigns, and strategies to reduce
global warming. This kind of AI differs from the first two in that it is not based on learning
and utilizing patterns from large datasets of existing solutions. Gradient descent cannot be
used because the desired behaviors are not known; hence, there are no targets from which to backpropagate. Instead, decision-making AI is based on search: trying out solutions
and evaluating how well they work, and then improving them. The most important aspect
of such methods is to be able to explore and extrapolate, i.e. to discover solutions that are
novel and unlikely to be developed otherwise.
Like the other two methods, decision-making AI benefits massively from scale. There
are two aspects to it. First, scaling up to large search spaces means that more novel,
different, and surprising solutions can be created. A powerful way to do this scale-up is
to code the solutions as neural networks. Second, scaling up the number of evaluations
means that more of the search space can be explored, making their discovery more likely.
This scale-up is possible through high-fidelity simulations and surrogate models (i.e.
predictive machine learning models). Like LLMs and image models, these technologies
have existed for a long time, and the massive increases in computational power are now
ready to make them practical, and take them from the laboratory to the real world. Thus,
decision-making AI is likely to be the third component of the AI revolution and one that is
emerging right now.
(a) Single-agent improvement in a regular landscape. (b) Population-based search in a deceptive landscape.
Figure 1.3: Discovering solutions in large, multidimensional, deceptive search spaces. (a) Hill-climbing methods such as gradient descent and reinforcement learning are well-suited, but also limited, to small, low-dimensional, regular search spaces. If the initial solution is in the scope of the optimum, hill-climbing will find it. (b) Population-based search extends to large, high-dimensional, deceptive spaces. For instance, in this deceptive space the population is distributed over several peaks, and operations such as crossover allow for long jumps between them.

The technologies enabling it are different from LLMs and image models (although they can also be used to enhance the emergence, as will be discussed in chapter 13). An
obvious one is reinforcement learning (RL). RL started in the 1980s and 1990s as a model
of animal conditioning and is still largely based on lifetime exploration and adaptation of
a single individual solution. RL takes many forms; the most dominant one has been based
on Q-learning, i.e. the idea that different decisions at different states have different utility
values (Q-values), which can be learned by comparing values available at successive states.
An important aspect of such learning is that instead of storing the values explicitly as an
array, a value function is learned that covers a continuous space of states and decisions.
In that manner, the approach extends to large spaces often encountered in the real world.
For instance, a humanoid robot can have many degrees of freedom, and therefore many
physical configurations, and perform many different actions, even continuous ones. A
value function assigns a utility to all combinations of them. This approach in particular
has benefited from the progress in neural networks and deep learning, and the increase in
available compute: it is possible to use them to learn more powerful value functions (e.g.
DQN; Mnih, Kavukcuoglu, Silver, et al., 2015).
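As a reminder of the mechanism (written here in standard notation; the formulation is not specific to this book), the basic Q-learning update compares the value of the current decision with the best value available at the next state:

    Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ],

where α is the learning rate and γ discounts future rewards. In deep RL, the explicit table of Q-values is replaced by a neural network that approximates this function.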
With sufficient compute, policy iteration has emerged as an alternative to Q-learning.
Instead of values of decisions at states, the entire policy is learned directly as a neural
network. That is, given a state, the network suggests an optimal action directly. Again,
methods such as REINFORCE have existed for a long time (R. J. Williams, 1992), but they have become practical with modern compute.
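In its simplest form (again in standard notation rather than anything specific to this book), REINFORCE adjusts the policy parameters θ along the gradient estimate

    ∇_θ J(θ) ≈ E[ R ∇_θ log π_θ(a | s) ],

which makes actions that led to a high return R more probable. Only the scalar return is needed; no explicit target actions are required, which is why such methods apply where supervised targets do not exist.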
As a result, several real-world applications have emerged. The best known ones are in
game playing: For instance, RL was used as an element in beating the best human players
in e.g. go and chess as well as in simulated car racing (Silver, Hubert, Schrittwieser, et al.,
2018; Wurman, Barrett, Kawamoto, et al., 2022). Applications have also started to emerge
in scientific domains such as protein folding and drug design (Korshunova, N. Huang,
Capuzzi, et al., 2022).
Figure 1.4: Finding solutions with population-based search. The search space is depicted as a rectangle; the solutions are dots whose size corresponds to their fitness. Population-based search, i.e. evolutionary optimization, starts by spreading the initial population broadly around the search space, thus exploring a diverse set of solutions. The poor solutions are discarded, and the good ones are recombined with other good solutions through crossover and mutation, creating an offspring population. After several generations, the population converges around the best solutions. They often represent different tradeoffs from which the human decision-maker can choose. In this manner, the search can discover a host of possible creative solutions.

Importantly, however, scale-up is still an issue with RL. Even though multiple
modifications can be evaluated in parallel and offline, the methods are still primarily
based on improving a single solution, i.e. on hill-climbing (figure 1.3a). Creativity and
exploration are thus limited. Drastically different, novel solutions are unlikely to be found
because the approach simply does not explore the space widely enough. Progress is slow
if the search landscape is high-dimensional and nonlinear enough, making it difficult to
find good combinations. Deceptive landscapes are difficult to deal with since hill-climbing
is likely to get stuck in local optima. Care must thus be taken to design the problem well
so that RL can be effective, which also limits the creativity that can be achieved.
Evolutionary computation (EC) offers the missing piece. With a population of
candidates, it is possible to explore more widely (figure 1.3b). The population can be
created to be highly diverse, covering the various areas of the search space. If some
such candidate does not work out, that’s ok; many other candidates are exploring other
areas. However, evolutionary search is much more than simply a large number of diverse,
parallel searches. As soon as a good idea is discovered, i.e. a solution that solves part of
the problem, or a special case, that information is available to other solutions through
crossover (figure
1.4). Good ideas thus spread quickly, and other parallel searches can
take advantage of them. As will be discussed in section
11.1, it is thus possible to find
solutions in vast search spaces (e.g.
2
2
70
states), high-dimensional search spaces (e.g. 1B
parameters), and spaces that are highly nonlinear and deceptive.
These properties of evolutionary computation are useful in general in discovering many
different kinds of solutions, such as designs described as parameter vectors, program trees,
or solution graphs. However, they are particularly useful in discovering neural networks for
decision-making tasks. Remember that the optimal behaviors are not known, and therefore
they must be found using search. The space of possible neural networks that implement
the behaviors is vast, high-dimensional, and with highly nonlinear interactions. Therefore,
evolution can be used effectively to discover neural networks for decision-making. This is
what neuroevolution is all about.
1.3 Improving the World
The utility of neuroevolution is tremendous. First, it can be used to discover and optimize
behavior for intelligent agents, i.e. systems that are embedded in an environment and
interact with it over time. The networks map situations in the environment into actions that
achieve multiple goals. In this manner, it is possible to optimize control for cars, planes,
other vehicles, and robots in general - and not only control but behavioral strategies as well,
such as anticipating and avoiding obstacles, optimizing trajectories, and minimizing energy
usage and stress on the hardware. In simulated worlds, it is possible to discover effective
behavior for non-player characters, guiding it towards different strategies such as aggressive
or conservative, and even ill-defined ones such as human-like and believable. Strategies for
dynamic optimization of logistics, transportation, manufacturing, and control of chemical
and biological plants as well as intelligent buildings and cities can be developed.
Second, neuroevolution can be used to discover customized strategies for decision-
making. These networks map descriptions of problems directly to solutions. For example, in wellness and healthcare, given a description of a person’s medical profile as input,
they can make nutrition or exercise recommendations, or design personalized medical
treatments and rehabilitation plans, in order to maximize benefits and minimize cost
and side effects. In business, they can create marketing strategies customized to the
product, season, and competition, or investment strategies optimized to current markets
and resources. They can discover effective incentives for recruiting and retention in
particular cases, as well as the most effective responses in various customer service
situations. In education, they may assign personalized exercises that are maximally
effective with the least amount of work. The same approach applies to physical training
while minimizing injury risk. There are many "AI for Good" applications in society
as well, such as discovering effective non-pharmaceutical containment and mitigation
strategies in a pandemic, approaches to land-use strategies to minimize climate change,
and designing and operating ecological villages.
Third, it is possible to use neuroevolution to optimize other learning methods.
Evolution creates optimal designs for them so that e.g. deep learning, reinforcement
learning, or spike-timing-dependent plasticity can be as effective as possible. For instance,
architectures, loss functions, activation functions, data augmentation, and learning rules
can be discovered specifically for different deep-learning tasks and datasets. Networks
can be evolved as transfer functions for cellular automata, allowing them to perform more
complex tasks. They can be evolved to serve as kernels for Gaussian processes, or as value
functions in Q-learning. It is possible to optimize them for particular hardware limitations,
such as limited compute or memory, or for specific neuromorphic hardware, to take the
best advantage of available resources. In domains where deep learning might work well
but there is not enough data available to train it, as is often the case in the real world, it
may be possible to evolve neural network architectures that combine data from multiple
other tasks, thus making more deep-learning applications possible. Neuroevolution can be
combined with reinforcement learning, for instance for evolving general approaches that
are then refined over the lifetime of the individual, and by evolving reinforcement learning
mechanisms themselves, such as learning and memory mechanisms, and starting points.
Neuroevolution can also be used synergistically with LLMs in several ways: by evolving
prompts, fine-tuning, and ways to merge multiple models and to orchestrate them, or using
LLMs to implement evolutionary operations in domains where it would be otherwise
difficult to do. Neuroevolution can thus enhance the performance of LLMs, and LLMs
enhance evolution.
Fourth, since neuroevolution emulates biological adaptation (evolution) and encodes
solutions in biologically motivated processors (neural networks), it is a natural approach
to studying biological behavior. Neuroevolution experiments can shed light on questions
such as how mating, hunting, herding, and communication emerged over evolution, and
even how language and intelligence generally resulted from adaptation and niching in
biology. A computational model provides the ultimate understanding in cognitive science,
and neuroevolution can be used to motivate such models from a biological perspective.
On the other hand, such biological connections can provide insight into how intelligent
artificial systems can be engineered to be effective, robust, and resource-efficient.
1.4 Plan for the Book
This book provides a comprehensive introduction to these topics. The goal is to familiarize
the reader with the various neuroevolution technologies, but also to provide the tools to
take advantage of them, to develop them further, and to build applications. The major
algorithms are reviewed and their origins and motivation are explained; concrete examples
of their use are given and references to the literature are provided; open areas of research
are identified and suggestions for further work are given. A number of case studies are
presented in depth, illustrating how the concepts can be used to address more complex
challenges and problems in the real world. While the book assumes basic familiarity with and understanding of neural networks, not much background in evolutionary computation
is necessary. The book is accompanied on the web by several demos, exercises, and a
general software platform. The idea is to provide the reader not just with the knowledge
but also a practical tool that can be readily applied and extended.
Neuroevolution as a field emerged in the late 1980s, with some of the earliest results by
Belew, McInerney, and Schraudolph (1992), Harp, Samad, and A. Guha (1989), Kitano
(1990), G. F. Miller, P. Todd, and Hedge (1989), Mjolsness, Sharp, and Alpert (1989),
Montana and L. Davis (1989), Mühlenbein and Kindermann (1989), Schaffer, Caruana,
and Eshelman (1990), and Whitley and T. Hanson (1989). Its development over the years
has been chronicled in comprehensive survey articles about once a decade (Floreano,
Dürr, and Mattiussi, 2008; Hougen and Shah, 2019; Schaffer, Whitley, and Eshelman,
1992; Stanley, Clune, Lehman, et al., 2019; Yao, 1999). Instead of attempting to cover
everything that has been done in this field, this book aims to provide a guided tour and a
logical story through it.
Hence, the material is organized into five main parts. The first part introduces the
reader to the principles of evolutionary computation through a series of increasingly
challenging examples. The specific case of neuroevolution is then introduced, similarly
through simple example applications. The first exercises are introduced to make these
concepts concrete and productive immediately (the software platform is described in the
next section).
The second part focuses on two fundamental neuroevolution design considerations:
network encodings (direct and indirect), and making the search effective through diversity.
Important distinctions between encoding approaches are clarified with examples, genetic
and behavioral diversity contrasted, and novelty and quality-diversity search introduced,
as well as taking advantage of diversity through ensembling. All of these are fundamental methods in the neuroevolution toolbox, but they are rarely explicitly distinguished.
The third part focuses on intelligent agents, i.e. how effective behavior can be evolved
from low-level control to high-level strategies, and ultimately to support decision-making
systems. The setting is then expanded from individual agents to collective systems with
cooperative and competitive interactions. Next, interactive evolution methods are reviewed
as a way to combine machine discovery with human insight. Finally, opportunities and
challenges for open-ended discovery will be discussed, motivated by biological evolution,
and existing artificial examples of open-ended innovation systems will be reviewed.
The fourth part then extends neuroevolution to combinations with other learning
methods. Approaches to designing deep learning architectures are first reviewed, and
the associated challenges and possible future opportunities discussed. Meta-learning is then extended
to other aspects of neural-network design, including loss and activation functions, data use,
and learning methods and their synergies. Synergistic combinations with neuromorphic
systems, reinforcement learning, and generative AI are reviewed as well, finding that in
each case it is possible to use evolution to optimize the general setting that makes other
types of learning more effective.
The fifth and final part evaluates how neuroevolution can provide insight into the
study of biological evolution, from understanding neural structure and modularity, to
developmental processes and body/brain coevolution, and finally to biological behavior,
breakthroughs and evolution of language. Throughout, possible insights for biology-
motivated engineering in the future are identified. Indeed, the Epilogue points out the
potential role of neuroevolution in constructing agents with artificial general intelligence.
In sum, neuroevolution is an emerging third component of the recent AI revolution. It
allows the development of systems that generate behavior, strategies, and decision-making
agents. Applications of such agents are ubiquitous in the real world, leading to more
proficient, efficient, and cost-effective systems, and generally improving lives. The area
is ripe with many future work opportunities as well.
1.5 Plan for Hands-on Exercises
Practical engagement is essential for mastering complex concepts such as those explored
in this book. The plan above is rooted in a commitment to provide a rich, accessible,
and effective learning experience; therefore, hands-on exercises are an essential part of
it. They are accessible in the online supplement at https://neuroevolutionbook.com. This section outlines the philosophy behind them.
Purpose: The exercises are crafted to deepen the readers’ understanding through
problem-solving and experimentation. While some exercises address inherently complex
topics, others focus on areas closely aligned with current technology trends and the latest
advancements in ML/AI. By doing so, the exercises aim to: (1) Encourage exploration of
cutting-edge methodologies, making the learning experience engaging and relevant; (2)
Bridge theoretical understanding with practical implementation to solidify concepts;
(3) Foster an experimentation mindset, mirroring the iterative nature of real-world AI
research and applications. These hands-on experiences serve to develop confidence and
engineering capabilities in tackling novel problems, equipping readers to innovate and
adapt to emerging challenges in the field.
Form: The exercises are presented as Python notebooks, currently hosted on Google
Colab, to minimize setup effort and enable readers to start problem-solving immediately.
This format ensures accessibility, as the exercises can run on CPUs or low-end GPUs
available in Colab, making them inclusive for readers with limited computational resources.
Each exercise is designed to take no more than 30 minutes to one hour of running or
training time for a complete solution, ensuring a balance between depth and computational
efficiency, while allowing students ample time to engage with and understand the content.
The tasks are carefully distilled to emphasize core knowledge while reducing execution
time, creating an experience that focuses on learning the essentials without unnecessary
overhead.
Solutions (for Instructors and TAs): For instructors and teaching assistants, complete
solutions are provided in the form of Python notebooks stored in a separate archive. These
solutions act as a reference, offering clarity and consistency when guiding students during
workshops or discussions. They demonstrate the expected approach and results for each
exercise, and they are structured to facilitate adaptation or extension for varied educational
contexts. By separating the problems from their solutions, students are encouraged to
engage actively with the exercises, fostering independent learning and problem-solving
skills.
1.6 Chapter Review Questions
1. Definition: What is neuroevolution, and how does it differ from traditional neural network optimization methods such as backpropagation?
2. Key Challenges: List and describe the four illustrative challenges that neuroevolution aims to address, as presented in figure 1.1.
3. Mechanisms: Explain the general framework of neuroevolution, including the roles of crossover, mutation, and fitness evaluation.
4. Comparison: How does neuroevolution address the limitations of gradient-based methods in optimizing neural networks, especially in large, high-dimensional, and deceptive search spaces?
5. Creative Solutions: Why can neuroevolution be considered a tool for discovery and creativity rather than just optimization? Provide examples to illustrate your answer.
6. Applications: Neuroevolution was described as improving the world in four main areas. List these areas and briefly explain one example for each.
7. Extending AI: How does neuroevolution complement other AI methods like reinforcement learning and deep learning? Provide specific scenarios where these combinations are effective.
8. AI Transformation: Discuss the paradigm shift in AI described in the chapter. How is neuroevolution a part of this shift, particularly in decision-making tasks?
9. Population-Based Search: Contrast hill-climbing methods like reinforcement learning with population-based search methods used in neuroevolution. Why is the latter better suited for exploring large, high-dimensional, and deceptive search spaces?
10. Future Directions: According to the chapter, what are some promising areas of future research in neuroevolution, and why are they significant?
Chapter 2
The Basics
This chapter will first review the basics of evolutionary algorithms, including genetic
algorithms and evolution strategy. It will then cover how neural networks work, including
the architectures often used in this book, such as feedforward, convolutional, and recurrent neural networks, long short-term memory networks, and transformers. Readers familiar
with these techniques should feel free to skip this chapter.
2.1 Evolutionary Algorithms
Figure 2.1: Survival of the fittest. Figure by J. Tan (2017).
Optimization is a fundamental component of machine learning and artificial intelli-
gence. However, not all problems are well-behaved enough to be solved by gradient-based
methods. Some problems lack a clear objective function, have noisy or delayed feedback,
or involve highly nonlinear dynamics that frustrate traditional optimization. In these
cases, evolutionary algorithms (EAs) provide a powerful alternative. Inspired by natural
evolution (figure 2.1), EAs evolve a population of candidate solutions using mechanisms
Figure 2.2: Evolutionary algorithm overview. The process begins with an initial population of
candidate solutions, which are evaluated using a fitness function. Based on fitness, a selection
mechanism chooses solutions for variation through genetic operators (e.g. mutation, crossover),
producing a new population. This cycle repeats until a termination condition is met.
such as selection, mutation, and recombination. EAs are widely used in various fields,
including engineering, economics, and biology, due to their ability to find optimal or
near-optimal solutions in large and complex search spaces. These methods require only a
way to evaluate solution quality, making them highly flexible and broadly applicable to domains like reinforcement learning, black-box optimization, and robotics. This section explores the key ideas, algorithms, and applications of evolutionary methods, from classic genetic algorithms to methods like CMA-ES and more scalable approaches such as
OpenAI ES.
An overview of the basic EA loop is shown in figure 2.2. The EA starts with a population
of candidate solutions to a problem and iteratively improves them through mechanisms
analogous to biological evolution. At each generation, individuals are evaluated using
a fitness function that measures their quality. Based on fitness, better individuals are
selected to reproduce. New individuals are created using variation operators, typically
crossover (recombining parts of two parents) and mutation (introducing random changes).
These offspring then form the next generation. Over time, the population evolves, and the
algorithm is stopped once some termination condition is reached (e.g. an optimal solution was found or the maximum number of generations was reached). EAs are particularly well-suited for problems where there is no single perfect solution, or where the solution itself is complex and defies easy definition with formulas. Unlike backpropagation, which
requires a clearly defined error function, EAs only need a way to evaluate goodness, not a
step-by-step guide. This ability opens doors for applications in a number of areas where
traditional gradient-based optimization techniques cannot be easily applied.
Let's have a look at some code together (listing 1), which shows that the basic evolutionary loop can be set up in only a few lines. Here, we use the solver paradigm, which is popular in black-box optimization and abstracts the optimization process into two main operations: ask(), which generates candidate solutions, and tell(), which returns the evaluated fitness results to the solver as feedback.
Listing 1 Basic evolutionary algorithm training loop.

import numpy as np

solver = EvolutionAlgorithm()
while True:
    # Ask the EA to give us a set of candidate solutions.
    solutions = solver.ask()
    # Create an array to hold the fitness results.
    fitness_list = np.zeros(solver.popsize)
    # Evaluate the fitness for each given solution.
    for i in range(solver.popsize):
        fitness_list[i] = evaluate(solutions[i])
    # Give the list of fitness results back to the EA.
    solver.tell(fitness_list)
    # Get the best parameters and fitness from the EA.
    best_solution, best_fitness = solver.result()
    if best_fitness > MY_REQUIRED_FITNESS:
        break
This loop continues until a high-performing solution is discovered. We'll now go a bit deeper into the different components that most EAs share.
2.1.1 Representation
Individuals in an EA must be represented in a form suitable for manipulation by evolutionary
operators such as selection, crossover, and mutation. The process of defining how these
individuals are encoded and manipulated is known as representation, and it plays a
pivotal role in determining the success of an evolutionary algorithm. A well-designed
representation bridges the gap between the problem domain and the evolutionary search
space, enabling efficient exploration and exploitation of potential solutions.
Here, it is essential to distinguish between the genotype and the phenotype of an
individual. The genotype refers to the internal data structure used by the algorithm to
represent a candidate solution, typically a string, vector, tree, or graph structure that
is subject to variation and selection. The phenotype, on the other hand, is the external
manifestation of this solution in the context of the problem domain. It is the actual
behavior, structure, or configuration that results from decoding the genotype and is
ultimately evaluated by the fitness function.
For example, consider an optimization problem involving the design of an aerodynamic
wing. The genotype might be a vector of real numbers encoding control points for a spline
curve. The phenotype, derived from decoding this vector, is the physical shape of the
wing. The evolutionary algorithm manipulates genotypes, but it is the performance of the
phenotype (e.g. drag or lift) that determines fitness.
The nature of the mapping between genotype and phenotype can be broadly classified
into direct and indirect encoding schemes. In a direct encoding, each element of the
genotype corresponds explicitly to an element or parameter in the phenotype. The mapping
is straightforward and often one-to-one. For instance, in a binary string representation
for a knapsack problem, each bit in the genotype directly indicates whether a particular
item is included or excluded from the knapsack. This type of encoding is typically easy
to implement and understand, and it allows direct control over the phenotype features.
However, it may become inefficient or unwieldy when dealing with large or structured
phenotypes, such as networks or modular systems.
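As a concrete illustration of direct encoding, the following minimal sketch decodes a binary genotype into a knapsack phenotype and evaluates it; the item weights, values, capacity, and function names are invented for this example.

import numpy as np

# Hypothetical knapsack instance: one weight/value pair per item.
weights = np.array([3.0, 5.0, 2.0, 7.0, 4.0])
values = np.array([4.0, 6.0, 3.0, 8.0, 5.0])
capacity = 12.0

def decode(genotype):
    # Direct encoding: bit i directly states whether item i is packed.
    return [i for i, bit in enumerate(genotype) if bit == 1]

def fitness(genotype):
    # Evaluate the phenotype: total value, with infeasible solutions penalized to zero.
    genotype = np.asarray(genotype)
    total_weight = float(weights @ genotype)
    total_value = float(values @ genotype)
    return total_value if total_weight <= capacity else 0.0

For example, fitness([1, 0, 1, 0, 1]) scores the phenotype consisting of items 0, 2, and 4.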
In contrast, an indirect encoding introduces an intermediate layer, where the genotype
specifies rules, developmental processes, or construction procedures that lead to the
formation of the phenotype. This approach is inspired by biological development, where
the genome encodes not the organism itself but a set of instructions that guide its formation.
Indirect encodings are particularly useful when the solution space is highly structured
or exhibits regularities, symmetries, or modularities. They can lead to more compact
representations and better generalization. However, they typically require more complex
decoding procedures and can introduce challenges in designing suitable variation operators
that respect the semantics of the encoding. In chapter 4 we’ll go deeper into indirect
encodings.
Choosing or designing a representation for individuals in an evolutionary algorithm
involves a delicate balance between several competing goals. The representation must be
expressive enough to capture high-quality solutions within the search space, yet constrained
enough to avoid overwhelming the algorithm with infeasible or irrelevant candidates. It
should enable the application of variation operators in a way that preserves the syntactic
and semantic integrity of individuals. Moreover, it should support efficient decoding into
phenotypes and allow the fitness function to evaluate solutions meaningfully.
The interaction between genotype structure and evolutionary dynamics is also crucial.
For example, in representations with high redundancy, where multiple genotypes map
to the same phenotype, evolutionary progress may be slowed due to wasted evaluations.
Conversely, representations with poor locality, where small changes in genotype result in
large and unpredictable changes in phenotype, can make it difficult for the algorithm to
converge toward optimal regions.
2.1.2 Population-Based Search
In evolutionary algorithms, the population refers to the set of individuals maintained and
evolved over successive generations. Each individual in the population encodes a potential
solution to the optimization problem, typically as a genotype that maps to a corresponding
phenotype evaluated by a fitness function. The population acts as a distributed search
mechanism, allowing the algorithm to sample multiple regions of the solution space
simultaneously. For example, for the Traveling Salesman Problem (TSP), each individual
could be a different permutation of cities, representing a possible tour. A population of 100
such permutations allows the algorithm to evaluate and evolve multiple route possibilities
simultaneously.
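For instance, a population of random tours for a small TSP instance, together with its fitness evaluation, could be set up as in the sketch below; the city coordinates and names are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)
cities = rng.random((20, 2))        # 20 random city coordinates in the unit square
pop_size = 100

# Each individual is a permutation of city indices, i.e. a candidate tour.
population = [rng.permutation(len(cities)) for _ in range(pop_size)]

def tour_length(tour):
    # Sum of Euclidean distances along the tour, returning to the start city.
    ordered = cities[tour]
    return float(np.linalg.norm(ordered - np.roll(ordered, -1, axis=0), axis=1).sum())

# Shorter tours correspond to higher fitness.
fitnesses = [-tour_length(tour) for tour in population]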
A key parameter is the population size, which controls the algorithm’s capacity for
exploration and its computational cost. Smaller populations tend to converge quickly but
risk premature convergence due to insufficient diversity. Larger populations maintain
broader coverage of the search space but can slow down convergence and increase resource
demands. Optimal sizing depends on problem complexity and the design of variation and
selection operators.
The initial population is usually generated randomly, ensuring an unbiased and diverse
sample of the search space. However, in certain domains, informed or heuristic-based
initialization may be used to seed the population with potentially high-quality solutions.
Regardless of the method, the goal is to start with sufficient diversity to support effective
evolutionary progress.
In most evolutionary algorithms, the population is unstructured, allowing all individuals
to interact freely. However, structured populations such as island models and cellular
models restrict interactions, thereby promoting subpopulation diversity. Island models
divide the population into semi-isolated groups that occasionally exchange individuals,
helping avoid global stagnation. Cellular models impose a spatial topology where
individuals interact only with neighbors, encouraging local adaptation and maintaining
niches.
Diversity maintenance within the population is critical for preventing premature
convergence. Techniques such as fitness sharing, crowding, and adaptive mutation rates
are commonly employed to preserve variation among individuals. Population structure
itself can aid in preserving diversity, as can variation in selection intensity and mating
schemes.
2.1.3 Selection
The selection process is inspired by the concept of "survival of the fittest". The main
idea is that individuals with better fitness have a higher probability of being selected for
reproduction. The selection pressure determines how strongly the algorithm favors fitter
individuals. It has a profound effect on the dynamics of evolution. High selection pressure
(e.g. always choosing the top few individuals) can lead to rapid convergence, as good
solutions dominate quickly. However, this can reduce genetic diversity and may cause
premature convergence, where the population gets stuck in suboptimal regions of the
search space. Low selection pressure allows weaker individuals a chance to reproduce,
which slows convergence but promotes diversity and broader exploration of the search
space. This helps in avoiding local optima, especially in complex or rugged fitness
landscapes.
Diversity within the population is essential for effective evolutionary search. Without it,
the population may converge prematurely, losing the potential to discover better solutions.
Selection methods and associated parameters can be tuned to help preserve diversity,
ensuring the algorithm continues to explore new possibilities rather than exploiting only
the current best. In practice, a careful balance between selection pressure and diversity
preservation is critical. Too much exploitation can hinder innovation, while too much
exploration may prevent the algorithm from refining good solutions. In section 2.2.1 on
genetic algorithms, we will look at a few common selection methods.
2.1.4 Variation Operators
Variation operators are the primary mechanism by which EAs explore the solution
space. They introduce diversity by modifying existing individuals to generate new ones.
The two main types are mutation, which alters individuals randomly, and crossover (or
recombination), which combines traits from two or more parents. In simple forms of
EAs, such as those with binary or real-valued encodings, mutation might flip bits or perturb numerical values with noise, while crossover can swap segments of parent
genomes or blend parameter values. These operators are essential for both refining good
solutions and escaping local optima. Overall, variation operators drive innovation in
EAs by ensuring that new, potentially better solutions are continually introduced into
the population. The specific implementation of these operators depends heavily on how
solutions are represented and what the problem demands.
2.1.5 Fitness Evaluation
The fitness score determines the individual’s likelihood of being selected for reproduction,
making this step central to guiding the evolutionary search. A well-designed fitness function
effectively captures the problem’s objectives and constraints, steering the population toward
high-quality solutions over successive generations. The design of the fitness function is
critical and often non-trivial. In simple problems, the fitness may be a direct measure
of performance, for example, classification accuracy in a machine learning task or
total distance in a routing problem. However, in complex or real-world applications,
fitness evaluation can involve significant computational overhead or additional design
considerations. For instance, in robotic control tasks, fitness may be determined by
simulating the robot's behavior over time, accounting for factors such as stability, energy
efficiency, or obstacle avoidance. These simulations can be computationally expensive,
especially when involving physics engines or real-time constraints.
In engineering design problems, fitness functions often incorporate constraint handling
to ensure that infeasible solutions are appropriately penalized or corrected. In other
domains, such as architectural layout or circuit design, subjective or aesthetic goals
may need to be quantified, requiring proxy metrics, surrogate models, or interactive
evolutionary approaches (chapter 8).
Furthermore, in many practical settings, the fitness function must balance multiple
conflicting objectives, such as cost versus performance or speed versus accuracy. In such
cases, single-objective evaluation may be insufficient, and multi-objective optimization
techniques (see section 2.2.5) are employed. Here, individuals are evaluated on multiple
criteria simultaneously, and selection is guided by concepts like Pareto dominance rather
than a single fitness score. Because the fitness function fundamentally shapes the
evolutionary trajectory, it often requires iterative refinement, domain expertise, and, in
some cases, adaptive or learned components to improve search efficiency and relevance to
the problem domain.
2.1.6 Reproduction and Replacement
Selected individuals reproduce to form a new generation, replacing some or all of the
existing population. This step is crucial in balancing exploration (searching new areas of
the solution space) and exploitation (refining promising solutions), and different strategies
can lead to significantly different evolutionary dynamics. Reproduction typically involves
applying variation operators (e.g. crossover and mutation) to the selected individuals to
generate offspring. The newly created individuals then enter the population through a
replacement strategy, which determines how the current population is updated. Broadly,
replacement can be categorized into generational and steady-state approaches.
In generational replacement, the entire population is replaced in each generation by
the offspring. This is common in traditional genetic algorithms and promotes exploration,
as a large number of new individuals are evaluated at each step. However, it may also
result in the loss of high-quality individuals unless some form of elitism is employed.
Elitism ensures that the best-performing individuals are preserved unchanged and carried
over to the next generation, thereby preventing regression in solution quality.
In contrast, steady-state replacement updates the population incrementally. Only a few
individuals are replaced at each generation, typically by inserting new offspring into the
population while removing the least fit individuals. Generational replacement is more
common, but examples of steady-state replacement in the context of evolving behaviors of
bots in a machine learning game are given in section 8.1.
Ultimately, the reproduction and replacement mechanism plays a critical role in
maintaining population diversity, ensuring progress over generations, and adapting the
evolutionary process to the demands of the problem.
2.1.7 Termination
An EA is an iterative process that, in principle, can continue indefinitely. However, in
practice, the algorithm is typically halted either when a satisfactory solution is found or
when further computation is unlikely to yield significant improvements. The termination
criterion determines when the evolutionary process should stop. Several common
termination strategies are employed in evolutionary algorithms:
Fixed Number of Generations: The algorithm terminates after a predefined
number of generations. This is simple and commonly used, particularly when
computational resources are limited. It provides a guaranteed runtime but does not
ensure solution quality.
Fitness Threshold: The process stops when an individual reaches or surpasses a
predefined fitness value. This is suitable for problems with known acceptable or
optimal fitness levels.
No Improvement (Stagnation): If the best fitness value does not improve over a
given number of consecutive generations, the algorithm is terminated. This helps
avoid wasting resources on stagnant searches.
Computational Budget: The algorithm halts after consuming a specified number
of fitness evaluations, CPU time, or memory. This is particularly relevant in
applications with expensive evaluation functions.
Population Convergence: If the population diversity falls below a threshold (e.g.
measured by genotype or phenotype variance), the algorithm may be stopped, as
this suggests convergence or lack of exploratory capacity.
The selection of an appropriate termination condition depends on the nature of the
problem, the computational cost of fitness evaluations, and the balance between exploration
and efficiency. In practice, multiple criteria are often combined. For example, an EA
might be set to stop either after 500 generations or if a fitness threshold is achieved,
whichever comes first.
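A small sketch of such a combined stopping rule (all names and thresholds here are ours, chosen for illustration only) might look as follows.

def should_stop(generation, history, max_generations=500,
                fitness_threshold=0.95, patience=50):
    # history: list of the best fitness observed in each generation so far.
    if generation >= max_generations:                  # fixed generation budget
        return True
    if history and history[-1] >= fitness_threshold:   # target fitness reached
        return True
    if len(history) > patience and max(history[-patience:]) <= max(history[:-patience]):
        return True                                    # stagnation: no recent improvement
    return False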
In general, ending the search too early can result in suboptimal solutions, while
continuing too long may waste resources with diminishing returns. An effective termination
strategy ensures a reasonable trade-off between solution quality and computational
efficiency.
2.2 Types of Evolutionary Algorithms
This section focuses on two of the most prominent types of evolutionary algorithms:
genetic algorithms and evolution strategy. The underlying principles, key components,
and applications of these algorithms are discussed. A selection of multiobjective EAs is
then presented, and many other EA methods that have been used in neuroevolution are
reviewed.
2.2.1 Genetic Algorithm
Genetic algorithms (GAs) are a popular type of evolutionary algorithm that mimics the
process of natural selection. GAs were first introduced by John Holland in the 1970s and
have since become one of the most widely used EAs.
In GAs, each individual in the population is typically represented as a chromosome,
which is a string of genes. The genes can be binary (0s and 1s), real numbers, or any other
representation suitable for the problem. The initial population is generated randomly or
using a heuristic to provide a diverse set of starting solutions.
As discussed in the previous section, the selection process determines which individuals
survive to be candidates for reproduction and which of those contribute their genetic
material to the next generation. Common selection methods for GAs include the following (a minimal code sketch of the first two appears after the list):
Roulette Wheel Selection: Individuals are selected probabilistically based on their
fitness, with better individuals having a higher chance of being chosen.
Tournament Selection: A small group of individuals is selected randomly, and the
fittest individual in the group is chosen.
Rank-Based Selection: Individuals are ranked based on their fitness, and selection
probabilities are assigned according to their rank.
Truncation Selection: This method involves selecting the top fraction of individuals
based solely on their fitness. Only the highest-performing individuals above a
certain fitness threshold contribute to the next generation, while the rest are excluded.
Truncation selection often leads to rapid convergence but can reduce genetic
diversity.
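As a minimal sketch (using our own function names, on a population stored as a list with a parallel array of fitness values), roulette wheel and tournament selection might look as follows:

import numpy as np

def roulette_wheel_selection(population, fitnesses, rng):
    # Selection probability proportional to fitness (assumes non-negative fitness values).
    probs = np.asarray(fitnesses, dtype=float)
    probs = probs / probs.sum()
    return population[rng.choice(len(population), p=probs)]

def tournament_selection(population, fitnesses, rng, k=3):
    # Pick k random individuals and return the fittest among them.
    contenders = rng.choice(len(population), size=k, replace=False)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]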
Figure 2.3: Crossover operators. (a) Single-Point Crossover: A single crossover point is selected, and genetic material is exchanged beyond this point. (b) Two-Point Crossover: Two points are selected, and the segment between them is swapped between parents. (c) Uniform Crossover: Each gene is independently inherited from either parent with equal probability.
Crossover, or recombination, is a key operator in GAs that combines the genetic
material of two parent individuals to create offspring. Common crossover techniques are
shown in figure 2.3 and include:
Single-Point Crossover: A random crossover point is chosen, and the genes from
the two parents are exchanged at this point.
Two-Point Crossover: Two crossover points are selected, and the segment between
them is swapped between the parents.
Uniform Crossover: Each gene is independently chosen from one of the two parents
with equal probability.
Following the standard EA process, mutations in GAs introduce small random changes
to an individual’s genes to maintain diversity in the population. This mechanism helps
prevent premature convergence to local optima. The mutation rate, which determines how
often mutations occur, is typically kept low. Additionally, it often helps to copy the best
individual from the current generation to the next without applying any mutations to it, a
method known as elitism.
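To make the variation operators and elitism concrete, here is a minimal sketch for a binary-string GA; it assumes genomes are numpy arrays of 0s and 1s, and select is any selection function such as the ones sketched above (all names are our own, not a fixed API).

import numpy as np

def single_point_crossover(parent_a, parent_b, rng):
    # Exchange the genes of two parents beyond a random crossover point.
    point = rng.integers(1, len(parent_a))
    child_a = np.concatenate([parent_a[:point], parent_b[point:]])
    child_b = np.concatenate([parent_b[:point], parent_a[point:]])
    return child_a, child_b

def bit_flip_mutation(genome, rng, rate=0.01):
    # Flip each bit independently with a small probability.
    mask = rng.random(len(genome)) < rate
    return np.where(mask, 1 - genome, genome)

def next_generation(population, fitnesses, select, rng):
    # Elitism: copy the best individual over unchanged.
    new_pop = [population[int(np.argmax(fitnesses))].copy()]
    while len(new_pop) < len(population):
        a = select(population, fitnesses, rng)
        b = select(population, fitnesses, rng)
        for child in single_point_crossover(a, b, rng):
            new_pop.append(bit_flip_mutation(child, rng))
    return new_pop[:len(population)]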
To get a better idea of how the GA operates, we can visualize it solving simple toy problems. For example, figure 2.4 shows top-down plots of shifted 2D Schaffer and Rastrigin functions, two of several simple problems used for testing continuous black-box optimization algorithms. Lighter regions of the plots represent higher values of 𝐹(𝑥, 𝑦). As one can observe, there are many local optima in these functions. Our job is to find a set of input parameters (𝑥, 𝑦) such that 𝐹(𝑥, 𝑦) is as close as possible to the global maximum. Figure 2.5 illustrates how the simple genetic algorithm proceeds over succeeding generations. The green dots represent members of the elite population from the
(a) Schaffer-2D function. (b) Rastrigin-2D function.
Figure 2.4: 2D Schaffer and Rastrigin functions. Lighter regions represent higher values of the fitness function 𝐹(𝑥, 𝑦). In addition to the global maximum, these functions are characterized by many local optima.
previous generation, the blue dots are the offspring forming the set of candidate solutions,
and the red dot is the best solution.
Genetic algorithms maintain diversity by keeping track of a diverse set of candidate solutions to produce the next generation. However, in practice, most of the solutions in the elite surviving population tend to converge to a local optimum over time. There are more sophisticated variations of the GA out there, such as CoSyNe, ESP, and NEAT (which we will discuss later in this book), where the idea is to cluster similar solutions in the population together into different species, to maintain better diversity over time.
2.2.2 Evolution Strategy
Another popular evolutionary algorithm is evolution strategy (ES). The term was originally introduced by Rechenberg (1973). Unlike GAs, which are flexible in the type of representation used (e.g. binary, symbolic, etc.), ES typically operates on real-valued vectors and is more focused on optimizing continuous functions. In ES, each individual is represented by a vector of real numbers, which corresponds to the solution's parameters. The initial population is usually generated randomly or based on some prior knowledge.
Selection in ES is deterministic, meaning that a fixed number of the best individuals (based on fitness) are selected to produce offspring for the next generation. Two canonical ES variations are (𝜇, 𝜆)-ES and (𝜇 + 𝜆)-ES, which primarily differ in how they select individuals for the next generation. Both variants maintain a population of parents, denoted by 𝜇, representing the number of selected individuals that generate offspring, and produce a number of offspring, denoted by 𝜆, where typically 𝜆 ≥ 𝜇:
(𝜇, 𝜆) Selection: From the 𝜆 offspring, the best 𝜇 individuals are selected to form the next generation. Parents are not considered for selection; only offspring are eligible.
(𝜇 + 𝜆) Selection: The best 𝜇 individuals are selected from the combined pool of 𝜇 parents and 𝜆 offspring. Parents can survive into the next generation.
Figure 2.5: Simple GA progress over 20 generations. Green dots indicate elite individuals from the previous generation, blue dots represent offspring forming the new set of candidate solutions, and the red dot marks the best solution. Over successive generations (every 4th is shown), the GA is able to find the global optimum of the function without getting stuck in the many local optima. For animations, see https://neuroevolutionbook.com/demos.
In ES, variation is introduced primarily through mutation, which perturbs the real-valued parameters. Mutation is usually applied by adding a normally distributed random vector to each individual. The mutation strength, often denoted by 𝜎, controls the magnitude of these perturbations. Crossover is less commonly used in ES compared to GAs but can be applied by combining the parameter vectors of two or more parents.
Let's look at an example of a simple evolution strategy in more detail, more specifically a (𝜇 + 𝜆)-ES with fixed mutation strength, in which a population of 𝜆 offspring is sampled from a multivariate normal distribution centered at a mean vector. This strategy uses elitist selection, retaining the best 𝜇 individuals to influence the next generation. In our case, we use 𝜇 = 1, meaning that only the best solution from the previous generation is used to generate the next. At each generation 𝑡, we sample a set of 𝜆 offspring {𝑥_1, . . . , 𝑥_𝜆} from a fixed Gaussian distribution:

𝑥_𝑖 ∼ 𝒩(𝑚^(𝑡), 𝜎²),    (2.1)
where 𝑚^(𝑡) ∈ ℝ² is the mean vector (i.e. the center of the sampling distribution) at generation 𝑡, and 𝜎 = (𝜎_𝑥, 𝜎_𝑦) is the fixed standard deviation along each axis (i.e. the mutation strength).
The initial mean is set to 𝑚^(0) = (0, 0), so the first generation is sampled around the origin. After evaluating the fitness of all 𝜆 offspring, the new mean 𝑚^(𝑡+1) is updated to the best-performing solution:

𝑚^(𝑡+1) = arg max_{𝑥_𝑖} Fitness(𝑥_𝑖).    (2.2)
Figure 2.6 shows how the algorithm behaves over 20 generations on the Schaffer and
Rastrigin test functions. The green dot indicates the mean of the distribution at each
generation, the blue dots are the sampled solutions, and the red dot is the best solution
found so far by our algorithm.
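A minimal sketch of this elitist strategy with 𝜇 = 1, written against an arbitrary fitness function to be maximized (the function and variable names are ours, not a fixed API), could look as follows:

import numpy as np

def simple_es(fitness, dim=2, lam=32, sigma=0.3, generations=20, seed=0):
    # (1+lambda)-ES with fixed mutation strength: sample offspring around the
    # current mean and keep the best solution found so far as the next mean.
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)                            # m^(0) = (0, 0) for dim = 2
    best_x, best_f = mean.copy(), fitness(mean)
    for _ in range(generations):
        offspring = mean + sigma * rng.standard_normal((lam, dim))
        scores = np.array([fitness(x) for x in offspring])
        i = int(np.argmax(scores))
        if scores[i] > best_f:                      # elitist update of the mean
            best_x, best_f = offspring[i].copy(), scores[i]
        mean = best_x
    return best_x, best_f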
This simple algorithm will generally only work for simple problems. Given its greedy
nature, it throws away all but the best solution and can be prone to getting stuck at a
Figure 2.6: Simple ES progress over 20 generations. The green dot represents the mean of the distribution at each generation, blue dots indicate the sampled solutions, and the red dot marks the best solution found so far by the algorithm. For animations, see https://neuroevolutionbook.com/demos.
local optimum for more complicated problems. It would be beneficial to sample the next
generation from a probability distribution that represents a more diverse set of ideas rather
than just from the best solution from the current generation.
2.2.3 Covariance-Matrix Adaptation Evolution Strategy
A shortcoming of both the simple ES and the simple GA is that our standard deviation
noise parameter is fixed. There are times when we want to explore more and increase the
standard deviation of our search space, and there are times when we are confident we
are close to a good optimum and just want to fine-tune the solution. Covariance-matrix
adaptation evolution strategy (CMA-ES) does exactly that.
Figure 2.7: CMA-ES progress over 20 generations. In contrast to the simple GA and ES, CMA-ES
dynamically learns the shape of the search landscape by adapting the full covariance matrix of the
sampling distribution. For animations, see https://neuroevolutionbook.com/demos.
CMA-ES is an algorithm that adaptively adjusts its search strategy using feedback
from each generation. Unlike simpler methods that only modify a fixed mutation scale,
Figure 2.8: Illustration of a CMA-ES step. The algorithm proceeds with: (a) Evaluate the fitness of each candidate in generation 𝑔. (b) Select the top 25% (purple). (c) Compute the covariance matrix 𝐶^(𝑔+1) using the selected candidates and the generation mean 𝜇^(𝑔) (green dot). (d) Sample new candidates using the updated 𝜇^(𝑔+1) and 𝐶^(𝑔+1).
CMA-ES adapts both the center and shape of its search distribution over time. It maintains a multivariate Gaussian distribution and updates its parameters, the mean vector 𝜇 and the full covariance matrix 𝐶, using the most successful candidates (figure 2.7).
At a high level, CMA-ES performs the following steps every generation. First, it samples a population from the current Gaussian distribution and ranks the individuals by fitness. Second, it updates 𝜇 and 𝐶 based on the best-performing individuals. The details of how to calculate the covariance matrix 𝐶 are given in the math detail box below. These mechanisms allow CMA-ES to stretch, shrink, or rotate the search space to better match the landscape of the objective function. For instance, if successful solutions tend to lie along a diagonal, CMA-ES learns that shape and directs its search accordingly. Figure 2.8 visualizes one full update cycle of CMA-ES in a 2D toy problem:
(a) Evaluate the fitness of each candidate solution in generation 𝑔.
(b) Select the top-performing 25% of the population.
(c) Use those selected individuals to estimate a new covariance matrix 𝐶^(𝑔+1), based on the mean 𝜇^(𝑔) from the current generation.
(d) Generate the next population by sampling from a multivariate Gaussian defined by the updated 𝜇^(𝑔+1) and 𝐶^(𝑔+1).
Because CMA-ES adapts based on actual performance, it can widen the search when
promising solutions are diverse or narrow it down when the optimum seems close. For
further technical depth, we recommend the comprehensive tutorial by CMA-ES creator
Nikolaus Hansen (Hansen, 2016).
CMA-ES is one of the most popular gradient-free optimization algorithms and has been the algorithm of choice for many researchers and practitioners alike. The only real drawback is slow performance with a large number of model parameters, as the covariance calculation is 𝑂(𝑁²), although recently proposed approximations can make it 𝑂(𝑁). CMA-ES is generally a good choice when the search space has fewer than a thousand parameters. We find that it is still usable up to around 10K parameters if we are willing to be patient.
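In practice, CMA-ES is usually run through an existing library rather than implemented from scratch. As a hedged illustration (assuming the widely used pycma package and a user-supplied fitness function to be maximized), the ask/tell loop from listing 1 maps onto it roughly as follows:

import cma  # the pycma package; install with: pip install cma

# Start the search at the origin of a 10-dimensional space with initial step size 0.5.
es = cma.CMAEvolutionStrategy(10 * [0.0], 0.5)

while not es.stop():
    solutions = es.ask()                        # sample a population of candidates
    losses = [-fitness(x) for x in solutions]   # pycma minimizes, so negate the fitness
    es.tell(solutions, losses)                  # update the mean and covariance matrix

best_solution = es.result.xbest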
Math Detail: How to Estimate a Covariance Matrix
Covariance matrices describe how variables change together. In the context of
sampling or optimization algorithms, we often want to estimate this matrix from a
set of points. Here’s how.
Assume we have 𝑁 points (𝑥_𝑖, 𝑦_𝑖) for 𝑖 = 1, 2, . . . , 𝑁 drawn from an unknown distribution. The maximum likelihood estimates of the means are:

𝜇_𝑥 = (1/𝑁) Σ_{𝑖=1}^{𝑁} 𝑥_𝑖,    𝜇_𝑦 = (1/𝑁) Σ_{𝑖=1}^{𝑁} 𝑦_𝑖.    (2.3)

From these, we estimate the variances and covariance:

𝜎_𝑥² = (1/𝑁) Σ_{𝑖=1}^{𝑁} (𝑥_𝑖 − 𝜇_𝑥)²,    (2.4)
𝜎_𝑦² = (1/𝑁) Σ_{𝑖=1}^{𝑁} (𝑦_𝑖 − 𝜇_𝑦)²,    (2.5)
𝜎_𝑥𝑦 = (1/𝑁) Σ_{𝑖=1}^{𝑁} (𝑥_𝑖 − 𝜇_𝑥)(𝑦_𝑖 − 𝜇_𝑦).    (2.6)

These components form the covariance matrix:

𝐶 = ( 𝜎_𝑥²   𝜎_𝑥𝑦
      𝜎_𝑥𝑦   𝜎_𝑦² ).
In adaptive optimization methods like CMA-ES, we often estimate this matrix from only the top-performing points. A common trick is to use the previous generation's mean 𝜇^(𝑔) rather than the updated mean 𝜇^(𝑔+1) when calculating the variance:

𝜎_𝑥^{2,(𝑔+1)} = (1/𝑁_best) Σ_{𝑖=1}^{𝑁_best} (𝑥_𝑖 − 𝜇_𝑥^(𝑔))²,    (2.7)
𝜎_𝑦^{2,(𝑔+1)} = (1/𝑁_best) Σ_{𝑖=1}^{𝑁_best} (𝑦_𝑖 − 𝜇_𝑦^(𝑔))²,    (2.8)
𝜎_𝑥𝑦^{(𝑔+1)} = (1/𝑁_best) Σ_{𝑖=1}^{𝑁_best} (𝑥_𝑖 − 𝜇_𝑥^(𝑔))(𝑦_𝑖 − 𝜇_𝑦^(𝑔)).    (2.9)

This approach ensures that the estimated shape reflects the direction in which top candidates are moving relative to the previous population center, which improves stability during optimization.
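In numpy, this estimate (equations 2.7-2.9, here for the 2D case) can be written compactly as an average outer product of deviations from the previous mean; the function name and arguments below are our own.

import numpy as np

def estimate_covariance(best_points, prev_mean):
    # best_points: array of shape (N_best, 2) with the top-performing solutions.
    # prev_mean:   mean of the previous generation, mu^(g).
    deviations = best_points - prev_mean
    # The average outer product gives the variances and the covariance at once.
    return deviations.T @ deviations / len(best_points)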
2.2.4 OpenAI Evolution Strategy
Following CMA-ES, another prominent approach within the family of evolutionary
strategies is OpenAI evolution strategy (OpenAI ES; Salimans, Ho, X. Chen, et al.,
2017), a scalable variant of the natural evolution strategies (NES) framework. What
distinguishes NES from conventional gradient-based methods is that it applies a gradient
ascent step using the natural gradient, a second-order method that adjusts the update
based on uncertainty, unlike the standard gradient. This leads to more stable and efficient
updates, especially in high-dimensional settings. OpenAI ES builds on this principle
but simplifies the setup for scalability: it uses a fixed or diagonal Gaussian distribution,
estimates gradients using the score function estimator (a form of Monte Carlo sampling),
and parallelizes computation across many workers.
As we will see later on, this makes it well-suited for optimizing large neural network
policies in reinforcement learning settings (section 3.4.2), where direct gradients are
unavailable or unreliable. While simple ES typically operates on low-dimensional search
spaces, OpenAI ES was designed with scalability in mind and has been used to train deep
neural networks with millions of parameters.
Unlike CMA-ES, OpenAI ES does not adapt a full covariance matrix. Instead, it
approximates gradients using a form of finite-difference estimation. In this context, a
gradient refers to the vector of partial derivatives of the objective function with respect to
the model parameters. Intuitively, the gradient points in the direction of steepest ascent,
indicating how the parameters should be adjusted to most effectively increase the objective
function (e.g. expected reward in reinforcement learning). In many optimization algorithms,
following the gradient allows for systematic improvement of model performance.
Since the exact gradient of the objective function may not be accessible, especially in black-box settings, OpenAI ES estimates it using random sampling. At each generation, a set of random perturbations 𝜖_𝑖 is sampled from a multivariate Gaussian distribution with zero mean and isotropic (or diagonal) covariance. These perturbations are applied to the current parameter vector 𝜃, and each perturbed version 𝜃 + 𝜎𝜖_𝑖 is evaluated to obtain a fitness score 𝐹(𝜃 + 𝜎𝜖_𝑖). The gradient estimate is then computed as a weighted sum of these perturbations:

∇_𝜃 𝐽 ≈ (1/(𝑁𝜎)) Σ_{𝑖=1}^{𝑁} 𝐹(𝜃 + 𝜎𝜖_𝑖) 𝜖_𝑖,    (2.10)
where 𝑁 is the number of samples and 𝜎 is the mutation strength.
This gradient estimate represents an approximation of how changes to the parameters
would affect the expected objective value. Rather than computing analytical derivatives,
OpenAI ES infers the gradient from the differences in fitness caused by small, random
perturbations. This approach is especially advantageous when the function is non-
differentiable (more on this in section 2.3.2), noisy, or defined only through simulation.
The resulting gradient estimate is then used to update the parameters using a standard
gradient-based optimizer such as Adam:
𝜃 ← 𝜃 + 𝛼 · Adam(∇_𝜃 𝐽),    (2.11)

where 𝛼 is the learning rate. This method retains the black-box nature of evolutionary
approaches, requiring only fitness evaluations, and is highly parallelizable because all
perturbation evaluations are independent. Figure 2.9 shows what this strategy looks like,
with a constant 𝜎 parameter.
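A minimal single-process sketch of this estimator and update (equations 2.10 and 2.11), using plain gradient ascent instead of Adam for brevity, is shown below; F stands for an arbitrary black-box fitness function over a parameter vector, and the names are our own.

import numpy as np

def openai_es_step(theta, F, rng, n_samples=100, sigma=0.1, alpha=0.01):
    # Sample Gaussian perturbations, evaluate the fitness of each perturbed
    # parameter vector, estimate the gradient as in equation 2.10, and take
    # an ascent step as in equation 2.11 (plain SGD in place of Adam).
    eps = rng.standard_normal((n_samples, len(theta)))
    fitness = np.array([F(theta + sigma * e) for e in eps])
    grad = eps.T @ fitness / (n_samples * sigma)
    return theta + alpha * grad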
In addition to these simplifications, the update rule was also modified so that it is
suitable for parallel computation across different worker machines. By pre-computing
a large grid of random numbers with a fixed random seed, each worker can reproduce
the parameters of every other worker over time. Additionally, each worker needs only to
communicate a single number (i.e. the final fitness result) to all of the other workers. This
ability is important if we want to scale evolution strategies to thousands or even a million
workers located on different machines, since while it may not be feasible to transmit an
entire solution vector a million times at each generation update, it may be feasible to
transmit only the final fitness results.
A key advantage of OpenAI ES is its robustness in high-dimensional parameter spaces
and sparse-reward environments, where traditional policy gradient methods often struggle.
It remains an important demonstration of how classical evolutionary strategies can be
adapted for modern, distributed computation, showing that gradient-free optimization can
scale remarkably well with sufficient compute.
Figure 2.9: OpenAI ES progress over 20 generations. In this ES variation, 𝜎 is fixed to a constant number, and only the 𝜇 parameter is updated at each generation. For animations, see https://neuroevolutionbook.com/demos.
Evolution strategy algorithms are often combined with a fitness shaping method. Fitness shaping prevents outliers in the population from dominating the approximate gradient calculation (figure 2.10). If a particular 𝐹(𝑧_𝑚) is much larger than the other 𝐹(𝑧_𝑖) in the population, then the gradient might become dominated by these outliers, which increases the chance of the algorithm getting stuck in a local optimum. The method normalizes the fitness values to ensure consistent scaling and reduce sensitivity to outliers. There are alternative methods for fitness shaping, but they all lead to similar results in the end. Fitness shaping can be very useful for tasks with non-deterministic fitness functions. It is less useful for optimizing well-behaved functions that are deterministic, and using it can sometimes slow down the time it takes to find a good solution.
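One common rank-based shaping scheme replaces the raw fitness values with centered ranks, so that only the ordering of solutions matters; a minimal sketch (function name ours):

import numpy as np

def centered_ranks(fitnesses):
    # Map fitness values to their ranks, rescaled to the interval [-0.5, 0.5].
    ranks = np.empty(len(fitnesses))
    ranks[np.argsort(fitnesses)] = np.arange(len(fitnesses))
    return ranks / (len(fitnesses) - 1) - 0.5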
(a) Raw fitness. (b) Ranked fitness.
Figure 2.10: Fitness shaping. A comparison of the original fitness values (a) and ranked fitness values (b). With ranked fitnesses, outliers do not dominate gradient calculations, and the optimization process is less likely to get stuck at local optima.
2.2.5 Multiobjective Evolutionary Algorithms
Many real-world optimization problems require satisfying multiple, often conflicting
objectives simultaneously. Many of the problems addressed by neuroevolution in this
book have this property as well. Traditional single-objective optimization approaches
fall short in such scenarios: they often cannot capture the trade-offs between objectives
adequately. In contrast, multiobjective EAs are designed to do precisely that.
Because no single solution will be best in all objectives, the outcomes of multiobjective
problems are trade-offs among objectives rather than one perfect optimum. A solution is
considered Pareto-optimal (or nondominated) if none of its objectives can be improved
without worsening at least one other objective (Chankong and Haimes, 2008). In other
words, for a minimization problem, solution A dominates solution B if A is no worse in
every objective and strictly better in at least one. If no solution exists that dominates X,
then X is Pareto-optimal. Without additional preference information, there will typically be
many Pareto-optimal solutions, all considered equally valid choices among the trade-offs.
These solutions collectively form the Pareto front (also called Pareto frontier): the set of
outcome vectors that are nondominated by any other feasible solution.
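For a minimization problem, Pareto dominance and the nondominated set follow directly from these definitions, as in this small sketch (the function names are ours):

import numpy as np

def dominates(a, b):
    # a dominates b if a is no worse in every objective and strictly better in at least one.
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))

def nondominated(objectives):
    # Return the indices of solutions that no other solution dominates.
    return [i for i, a in enumerate(objectives)
            if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)]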
Because multiobjective problems yield an entire set of trade-off solutions rather than a
single optimum, solving a multiobjective problem often means finding a representative set
of Pareto-optimal solutions rather than one final answer. This difference poses unique
challenges. Algorithms must approximate the entire Pareto front as well as possible,
giving the decision-maker a comprehensive set of choices that balance the objectives. The
goal is twofold: (1) convergence: solutions should be as close as possible to the true Pareto-optimal front, and (2) diversity: solutions should be well-spread along the front to
capture different trade-offs. Achieving a good balance between convergence and diversity
is a central theme in multiobjective optimization algorithms.
Because evolutionary computation is a population-based search method, multiobjective
optimization is a natural fit, and several methods have been developed for it (Coello
Coello, Van Veldhuizen, and Lamont, 2007; Q. Zhang and H. Li, 2007). Perhaps the best
known is the non-dominated sorting genetic algorithm II (NSGA-II; Deb, Pratap, Agarwal,
et al., 2002). NSGA-II is well-regarded for its efficiency and its well-balanced handling
of convergence and diversity. It addresses several shortcomings of earlier methods by
introducing three key mechanisms: elitism, fast non-dominated sorting, and crowding
distance. Together, these mechanisms allow NSGA-II to find an approximation of the
Pareto front that is both close to the true front and well-spread along it. In more detail:
Elitism and Generational Selection: NSGA-II is an elitist GA: the best solutions
are preserved between generations, ensuring that the Pareto front approximation never
degrades. At each generation, NSGA-II creates offspring through crossover and mutation,
then merges the parent and offspring populations (of size 𝑁 each) into a temporary population of size 2𝑁. It then selects the next generation by picking the 𝑁 best individuals from this merged set. "Best" is determined first by Pareto rank (front number) and second by diversity (crowding distance, explained below). By selecting from the union of parents and children, NSGA-II ensures that no high-quality solution is ever lost: if an offspring is worse than all parents, the parents will carry over; if an offspring dominates its parents, it will be included. Elitist selection was a major improvement in reliability over non-elitist algorithms, which could sometimes discard Pareto-optimal solutions due to random fluctuations. It also tends to speed up convergence, as good solutions accumulate over time.
Fast Non-Dominated Sorting: To rank the 2𝑁 candidates, NSGA-II performs efficient non-dominated sorting that classifies individuals into Pareto fronts in 𝑂(𝑀 × 𝑁²) time (where 𝑀 is the number of objectives). This approach is significantly faster than the original NSGA's 𝑂(𝑀 × 𝑁³) approach. The sorting procedure works as follows:
1. Identify Front 1: Find all individuals that are not dominated by any other in the population.
2. Identify Front 2: Remove the first front from consideration; then find the nondominated set of the remaining individuals.
3. Repeat: Continue removing identified fronts and finding the next nondominated set, until all individuals are classified into fronts.
Each individual gets a rank (fitness) equal to the index of the front to which it belongs; a
lower rank is better. This layering implicitly favors convergence: solutions on the first
front are Pareto-optimal within the population and thus are preferred to any dominated
solutions. NSGA-II’s efficient implementation relies on bookkeeping to avoid redundant
dominance comparisons, making it practical to sort large populations quickly.
Crowding Distance for Diversity: After sorting, NSGA-II knows how many whole fronts it can fully include in the new generation. For instance, fronts 1, 2, . . . , 𝑘 − 1 might all fit, and front 𝑘 is the last, partial front that exceeds the population limit 𝑁. To choose which individuals from this last included front 𝑘 get to fill the remaining slots, NSGA-II uses crowding distance. This measure is a numerical estimate of how crowded a solution is relative to its neighbors on the same front. It is calculated by sorting the front's solutions according to each objective value and, for each solution, measuring the objective-space distance to its nearest neighbors on either side. A larger crowding distance means the solution resides in a sparsely populated region of the Pareto front. During the selection of the last front, NSGA-II prefers those with larger crowding distances, i.e. it preserves the points that maximize diversity and eliminates those in dense clusters. This simple yet effective strategy prevents the algorithm from focusing only on a small area of the Pareto front.
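A straightforward (unoptimized) implementation of crowding distance for a single front, following the description above, might look like this (the array layout and names are our own choices):

import numpy as np

def crowding_distance(front):
    # front: array of shape (n, m) holding the objective values of one Pareto front.
    front = np.asarray(front, dtype=float)
    n, m = front.shape
    distance = np.zeros(n)
    for k in range(m):                                    # handle each objective in turn
        order = np.argsort(front[:, k])
        distance[order[0]] = distance[order[-1]] = np.inf # always keep boundary points
        span = front[order[-1], k] - front[order[0], k]
        if span == 0:
            continue
        for idx in range(1, n - 1):
            i = order[idx]
            distance[i] += (front[order[idx + 1], k] - front[order[idx - 1], k]) / span
    return distance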
Because of its good performance and simple implementation, NSGA-II has become a de
facto baseline for multiobjective optimization. It has been applied in many domains and has
inspired many variants and improvements. For instance, NSGA-III, an extension to more
objectives, uses reference points in lieu of crowding, but retains the core nondominated
sorting idea. Indeed, typically NSGA-II works well up to half a dozen objectives, after
which the Pareto front starts to have too many solutions (i.e. fewer solutions dominate
other solutions). Other techniques have been developed for many-objective optimization,
up to hundreds or thousands of objectives, representing a large number of constraints or
tests (Deb and H. Jain, 2014; Ishibuchi, Tsukamoto, and Nojima, 2008).
In sum, multiobjective formulation is often a natural way to approach problems in the real world, including those addressed effectively by neuroevolution. Multiobjective techniques will therefore be demonstrated many times in this book, e.g. in sections 6.4.3-6.4.4, 10.4-10.5, and 14.2. Multiobjective optimization can also play a significant role in maintaining diversity, as will be discussed in section 5.5.
2.2.6 Further Evolutionary Computation Techniques
While this chapter has focused on the most common techniques, virtually any evolution-
ary computation method has been applied to evolving neural networks in some form.
Researchers have experimented with a wide range of algorithms beyond standard EAs.
Below is an outline of several additional evolutionary approaches that have been explored
in neuroevolution.
A prominent example is genetic programming (GP; Banzhaf, Nordin, R. E. Keller,
et al., 1998; Poli, Langdon, and McPhee, 2008). It evolves computer programs or symbolic
expressions, traditionally representing solutions as tree-structured programs. Originally
introduced by Koza (1992) as a way to evolve programs for arbitrary tasks, GP extends the
genetic algorithm paradigm to variable-length, executable structures. In the context of
neuroevolution, GP offers the flexibility to evolve neural networks in more open-ended ways, e.g. by evolving entire network construction programs, activation functions, or learning rules. For example, GP is used to evolve indirect encodings in section 4.2.2, to optimize neural architectures in section 10.3.1, and to evolve loss functions, activation functions, and learning methods in chapter 11. A new opportunity is also emerging in enhancing GP
by using large language models as advanced mutation operators (section 13.3.1).
Despite a similar name, evolutionary programming (EP; D. B. Fogel, 2006; L. J. Fogel,
Owens, and Walsh, 1966) is a distinctly different method from GP. It was originally
developed to evolve finite state machines for predictive modeling,
and later generalized to continuous optimization problems, such as neural networks. The
representations are usually fixed-length vectors, and mutation is the primary operator.
As with ES, crossover is typically not used. As will be pointed out in section 3.1, EP
was one of the earliest neuroevolution techniques, and it was later used in game-playing
neuroevolution as well (section 7.2.1).
Cartesian genetic programming (CGP; J. F. Miller, 2011; J. F. Miller, 2020) is a form of genetic programming that represents programs or neural networks as directed acyclic graphs (instead of tree structures), often laid out on a 2D grid of nodes. CGP has proven well-suited for evolving neural networks because an arbitrary graph can naturally represent neural architectures (including recurrent or skip connections) more directly than a tree. The method retains many advantages of GP (e.g. flexibility in representation) while constraining individuals to a Cartesian grid of nodes for efficiency and simplicity. For
instance, CGP is used in the work described in section 14.4.2 to discover plasticity rules
for spiking neural networks.
Particle swarm optimization (PSO; Kennedy and Eberhart, 1995; Shami, El-Saleh, Alswaitti, et al., 2022) is a population-based optimization method inspired by social behaviors in animals (such as bird flocking). In PSO, a swarm of particles (candidate solutions) flies through the search space of neural network parameters, where each particle's position encodes a set of weights or other network design variables. The particles update their positions iteratively based on their own best-found solution and the swarm's global best solution, effectively sharing information to converge on optima. Because of its ability to find local optima accurately, PSO can be used in neuroevolution e.g. to refine the
parameters of a neural network that was evolved offline (section 6.2.2).
Similarly, ant colony optimization (ACO; Dorigo, Maniezzo, and Colorni, 1996;
Dorigo and Stützle,
2010) is a swarm intelligence technique that finds solutions by
mimicking how real ant colonies forage for paths between their nest and food sources. A
set of artificial ants constructs solutions on a graph incrementally, e.g. by selecting neural
network components or connections step by step. As they build solutions, the ants deposit
virtual pheromones on the graph edges; shorter or higher-quality solutions result in stronger
pheromone trails, which bias subsequent ants to follow those components (which is a form
of positive feedback). Over iterations, an optimal or near-optimal solution emerges as the
heavily pheromone-traveled path. For example, ACO can be used in neural architecture
search, where the network is constructed based on the ants’ path (section 6.2.2).
In contrast to most EAs, estimation of distribution algorithms (EDAs; Alden and
Miikkulainen,
2016; Baluja and Caruana, 1995; Larranaga and J. Lozano, 2002; J. A.
Lozano, Larrañaga, Inza, et al.,
2006; Pelikan, Goldberg, and Cantú-Paz, 1999) take a
fundamentally different approach to population-based search. They replace traditional
variation operators with probabilistic modeling. Instead of relying on individual or
collective behavior, EDAs construct a statistical model of the most promising solutions
found so far and sample new candidates from this learned distribution. This approach
allows the algorithm to capture and exploit underlying patterns or dependencies among
variables, making it especially powerful for complex optimization problems where such
structure is present. In this way, EDAs offer a model-driven approach that
adapts as the search progresses, enabling a more informed exploration of the solution
space. In neuroevolution, EDAs have been used to evolve neural network weights and
structures by iteratively refining a distribution over network parameters (section 5.7).
In addition, differential evolution (DE; Price, Storn, and Lampinen, 2005; Storn and Price, 1997) has recently shown promise as well: it has been used both to optimize network weights and to search for deep learning architectures (Awad, Mallik, and Hutter, 2020; Iacca, Caraffini, and Neri, 2020; Mousavirad, Tabatabaei, Zabihzadeh, et al., 2025; B. Wang, Sun, Xue, et al., 2018). DE is a population-based stochastic search algorithm that operates through a simple but effective mutation-crossover-selection cycle. Mutation is performed by adding the weighted difference of two randomly selected individuals to a third, i.e.

$v_i = x_{r_1} + F \cdot (x_{r_2} - x_{r_3})$, (2.12)

where $x_{r_1}, x_{r_2}, x_{r_3}$ are distinct population vectors, and $F \in (0, 2)$ controls the amplification of differential variations. The resulting mutant vector $v_i$ is then mixed with the current target vector $x_i$ through a crossover operator, yielding a trial vector. Finally, greedy selection ensures that the fitter of $x_i$ and its trial replaces $x_i$ in the next generation.
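The following is a minimal sketch of one DE generation implementing this cycle. The population shape, the choice of F and the crossover rate CR, and the assumption that fitness is maximized are illustrative, not prescribed by the references above.

import numpy as np

def de_step(population, fitness_fn, F=0.8, CR=0.9, rng=None):
    """One DE generation over a population of shape (N, D); fitness is maximized."""
    rng = np.random.default_rng() if rng is None else rng
    N, D = population.shape
    new_population = population.copy()
    for i in range(N):
        # pick three distinct individuals different from the target i
        r1, r2, r3 = rng.choice([j for j in range(N) if j != i], size=3, replace=False)
        mutant = population[r1] + F * (population[r2] - population[r3])  # eq. (2.12)
        # binomial crossover between the target x_i and the mutant v_i
        cross = rng.random(D) < CR
        cross[rng.integers(D)] = True  # take at least one component from the mutant
        trial = np.where(cross, mutant, population[i])
        # greedy selection: keep the fitter of the target and its trial vector
        if fitness_fn(trial) >= fitness_fn(population[i]):
            new_population[i] = trial
    return new_population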
Indeed, given the popularity of neural networks as a prediction and decision approach,
and the power of population-based search to find good solutions, it is no surprise that
almost any advances in EAs can be utilized in neuroevolution as well.
2.2.7 Try These Algorithms Yourself
There is no better way to learn and gain intuition than by trying out these evolutionary
algorithms yourself. There are open-source implementations for most of the algorithms de-
scribed in this book. For example, the author of CMA-ES, Nikolaus Hansen, has maintained a numpy-based implementation of CMA-ES (https://github.com/CMA-ES/pycma) with lots of bells and whistles. His Python implementation introduced some of us to the training loop interface described earlier. Since this interface is quite easy to use, we've integrated additional algorithms, like a simple GA and OpenAI's ES, into a compact Python module named es.py. We've also wrapped the original CMA-ES library within this lightweight package. This way, we can quickly compare different ES algorithms by just changing one line, as seen in listing 2.
Listing 2 Basic training loop with interchangeable solvers.

import numpy as np
import es  # the book's solver module, providing SimpleGA, PGPE, OpenES, and CMAES

# solver = es.SimpleGA(...)
# solver = es.PGPE(...)
# solver = es.OpenES(...)
solver = es.CMAES(...)

while True:
    # ask the solver for a population of candidate parameter vectors
    solutions = solver.ask()
    fitness_list = np.zeros(solver.popsize)

    # evaluate() is the task-specific fitness function supplied by the user
    for i in range(solver.popsize):
        fitness_list[i] = evaluate(solutions[i])

    # report the fitnesses back and check the best result so far
    solver.tell(fitness_list)
    result = solver.result()

    if result[1] > MY_REQUIRED_FITNESS:
        break
Figure 2.11: 100-Dimensional Rastrigin Function Results. A comparison of the performance
for various algorithms discussed in this section for the high-dimensional Rastrigin function.
You can find es.py in the exercises at https://neuroevolutionbook.com. In the accompanying notebook, we show how to use the ES solvers in es.py to solve a 100-dimensional version of the Rastrigin function with even more local optima. The 100-D version is somewhat more challenging than the trivial 2D version used to produce the visualizations in this book. On this 100-D Rastrigin problem, none of the optimizers got to the global optimum solution, although CMA-ES comes close (figure 2.11). CMA-ES is clearly the best performer, with OpenAI-ES and the genetic algorithm further behind. We had to use an annealing schedule to gradually lower $\sigma$ for OpenAI-ES to make it perform better for this task.
In general, choosing between a GA, CMA-ES, OpenAI ES, or other EAs depends heavily on the nature of the problem, the search space, and available computational resources. GAs are relatively simple to implement and perform well when the problem landscape has many local optima or when custom genetic operations can be crafted to exploit structure in the solution space. They are a natural choice when the problem isn't purely continuous.
CMA-ES, in contrast, is tailored for continuous, real-valued optimization problems. It
stands out when dealing with non-convex or rugged landscapes, especially when variables
are interdependent or when the objective function is not easily separable. The strategy
automatically adapts the shape of its sampling distribution to the topology of the problem,
making it very efficient in exploring complex fitness landscapes. CMA-ES typically
performs best on low- to medium-dimensional problems.
OpenAI ES is designed for scalable, parallel optimization of high-dimensional
continuous problems, where reward signals are sparse, noisy, or hard to differentiate.
Unlike GA and CMA-ES, OpenAI ES emphasizes massive parallelism and simple, gradient-
free updates, making it a compelling option when computational power is abundant
and traditional gradient-based methods are impractical. It doesn't adapt its sampling
distribution as intricately as CMA-ES but benefits from being easy to implement, robust
in noisy environments, and efficient in settings with large populations and cloud-based
infrastructure.
Ultimately, while each method has its strengths, no single one is universally best.
Performance varies significantly with the problem, and practical experimentation is usually
the most reliable way to choose among them. Importantly, these methods are not limited
to simple optimization tasks; they can be effectively combined with neural networks.
While evolutionary algorithms provide a robust framework for global search and
optimization, neural networks excel in learning complex patterns and approximating
nonlinear functions. As we will see throughout this book, the synergy between these
two paradigms becomes particularly evident in neuroevolution. Before diving deeper
into this integration, it is essential to first understand the structure, learning dynamics,
and capabilities of neural networks in their own right. This will lay the groundwork for
appreciating how evolution can be harnessed to shape and enhance their performance.
2.3 Neural Networks
Artificial neural networks (ANNs) are a class of machine learning models loosely inspired
by the structure and function of the human brain. They consist of layers of interconnected
nodes that process input data to produce an output. ANNs have been remarkably successful
in various domains such as image recognition, natural language processing, and time-series
forecasting. This section will provide the basic ideas behind the structure and function
of neural networks, focusing on several key architectures used throughout the book:
Feedforward neural networks (FNNs), recurrent neural networks (RNNs), long short-term
memory networks (LSTMs), convolutional neural networks (CNNs), and transformers.
2.3.1 Feedforward Neural Networks
Feedforward neural networks are the simplest type of artificial neural network. They
consist of an input layer, one or more hidden layers, and an output layer (figure 2.12a). Information flows in one direction, from the input to the output, without loops or cycles.
The network begins with the input layer, which receives raw data. Each node in this
input layer corresponds to a feature or variable from the input dataset or the environment.
This layer performs no calculations; it merely passes the input values to the next layer.
After the input layer, the data moves through one or more hidden layers. These layers
are where the actual computations occur. Each hidden layer consists of multiple nodes, or
neurons, which are fully connected to the nodes of the previous layer. Every connection
between nodes has an associated weight that signifies the strength or importance of that
connection. Each neuron also has a bias value that modifies the output.
For each neuron in a hidden layer, a weighted sum of all incoming inputs is calculated
(figure 2.12b). This sum is then passed through an activation function, such as ReLU,
sigmoid, or tanh, which introduces nonlinearity to the model. The nonlinearity is crucial
because it allows the network to model more complex relationships between inputs and outputs. The output of the neurons in one layer becomes the input for the neurons in the next layer.

Figure 2.12: Artificial neural networks. (a) This example feedforward network has three inputs, one hidden layer with five nodes, and one output layer with one node. The input to the network propagates through the consecutive layers of the neural network to produce the outputs. The details of an artificial neuron are shown in (b). The inputs to a neuron are first weighted, and their sum is then passed through an activation function.
The final layer in the network is the output layer, which produces the network’s
prediction. The number of neurons in the output layer matches the number of possible
outputs. For example, a binary classification task may have one or two output neurons,
while a multi-class classification problem might have as many neurons as there are classes
to predict. In other contexts, such as networks evolved for control or decision-making
tasks, the output layer may signify the actions an agent should take, with each neuron
corresponding to a possible action or control signal.
An FNN can be represented mathematically as follows:

$y = \sigma(W \cdot \sigma(W_1 \cdot x + b_1) + b)$. (2.13)

Here, $x$ is the input vector, $W_1$ and $W$ are weight matrices for the first and hidden layers, respectively. The bias vectors are $b_1$ and $b$, and $\sigma(\cdot)$ is the activation function. The output vector is denoted as $y$.
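As a concrete illustration of equation (2.13), the following sketch computes the forward pass with numpy. The layer sizes (matching the network of figure 2.12a) and the use of tanh as the activation are illustrative assumptions.

import numpy as np

def forward(x, W1, b1, W, b, sigma=np.tanh):
    """Compute y = sigma(W . sigma(W1 . x + b1) + b) for a single input vector x."""
    hidden = sigma(W1 @ x + b1)
    return sigma(W @ hidden + b)

# Example with 3 inputs, 5 hidden units, and 1 output, as in figure 2.12a.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)
W, b = rng.normal(size=(1, 5)), np.zeros(1)
y = forward(x, W1, b1, W, b)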
2.3.2 Training Feedforward Neural Networks with Gradient Descent
While this book is about neuroevolution, we will briefly explain the backpropagation
algorithm to train neural networks. Backpropagation is a powerful algorithm for many
applications. However, backpropagation typically requires large amounts of labeled data
and that the function being optimized (e.g. the neural network model) is differentiable.
Differentiability means that the function has a well-defined derivative at every point in its
domain, allowing us to compute gradients that indicate how to adjust weights to minimize
error. In practical terms, each activation function, layer operation, and loss function in
the network must support differentiation so that the chain rule can be applied across all
layers (more on this below). We will see in later chapters how both neuroevolution and
backpropagation can be synergistically combined, for example, in the context of neural
architecture search (chapter 10) or reinforcement learning (chapter 12).
While we focus on the application of backpropagation to feedforward neural networks
in this section, it can similarly be applied to RNN and LSTM, and it is also used in
CNNs and transformers. Backpropagation is a fundamental algorithm for training neural
networks by minimizing the loss function, which quantifies the error in the network’s
predictions. This algorithm calculates the gradient of the loss function with respect to each
weight and bias in the network. A gradient is essentially a vector of partial derivatives: it
tells us how much a small change in each parameter (like a weight or bias) will affect the
overall error or loss of the network. By following the direction of the negative gradient (a
process known as gradient descent), the network can update its parameters in a way that
gradually reduces the error. You can think of this process like hiking down a hill in the fog:
the loss function is the terrain, and your goal is to reach the lowest point (the minimum
error). Since you cannot see far ahead, you feel the slope under your feet (the gradient)
and take a small step in the direction that goes downhill the fastest. Repeating this step
over and over slowly leads you to the bottom of the valley, just like repeated updates lead
the network to better performance.
In the 1980s, backpropagation became widely recognized and applied in neural
networks, thanks to the work of Rumelhart, Hinton, and R. J. Williams (
1986). Their
seminal paper highlighted backpropagation as a practical and effective way to train
multi-layer neural networks. This breakthrough renewed interest in neural networks,
marking a significant milestone in machine learning and artificial intelligence.
The backpropagation algorithm consists of two main phases: a forward pass and a
backward pass. In the forward pass, input data flows through the network layer by layer,
producing an output. This output is compared with the true target value to compute the
loss, or error, of the network’s prediction.
The backward pass uses the chain rule to calculate gradients of the loss function with
respect to each weight and bias in the network. This information is then used to adjust
these parameters to minimize the error. The key steps in the backward pass are as follows:
1. Initialize Gradients: Start by calculating the loss, $L$, from the forward pass. Then, initialize the gradients for each weight and bias in the network.

2. Calculate the Gradient at the Output Layer: Compute the gradient of the loss with respect to the output layer's activations. For example, in a neural network with output $\hat{y}$ and target $y$, if the loss function is Mean Squared Error (MSE),

$L = \frac{1}{2}(\hat{y} - y)^2$ (2.14)

then the gradient of $L$ with respect to $\hat{y}$ is:

$\frac{\partial L}{\partial \hat{y}} = \hat{y} - y$ (2.15)
3. Backpropagate the Error to the Previous Layers: For each layer, starting from the output layer and moving back to the input layer:

(a) Calculate the Gradient of the Activation Function: For each neuron, apply the derivative of the activation function to the neuron's output to compute how sensitive the neuron's output is to changes in its input. For example, if the activation function is Sigmoid:

$\sigma(x) = \frac{1}{1 + e^{-x}}, \quad \sigma'(x) = \sigma(x) \cdot (1 - \sigma(x))$ (2.16)

(b) Calculate the Gradient of the Weights and Biases: Using the chain rule, multiply the gradients from the previous layer by the current layer's activation derivative to compute the gradients with respect to each weight and bias.

(c) Store the Gradients for Each Weight and Bias: These gradients will be used in the next step to update the weights and biases.
4. Update Weights and Biases: After computing the gradients via backpropagation, update each weight $w$ and bias $b$ by moving in the opposite direction of the gradient, scaled by the learning rate $\alpha$:

$w \leftarrow w - \alpha \frac{\partial L}{\partial w}, \quad b \leftarrow b - \alpha \frac{\partial L}{\partial b}$ (2.17)
Backpropagation is sensitive to certain hyperparameters, such as the learning rate $\alpha$.
Choosing an appropriate learning rate is essential; a value that is too large may cause
the network to diverge, while a value that is too small may result in slow convergence.
Techniques such as learning rate schedules or adaptive optimizers (e.g. Adam) can help.
Additionally, for deep networks, issues like vanishing and exploding gradients may
arise, especially when using activation functions like sigmoid or tanh. Techniques such as
ReLU activation, batch normalization, and careful weight initialization can help mitigate
these issues.
In summary, backpropagation allows neural networks to learn from data by calculating
the gradients of the loss with respect to each weight and bias and updating them in a
way that reduces prediction error. Instead of using backpropagation, we can also directly
optimize the weights and structure of neural networks with evolution. Chapter 3 gives an
overview of how this can be done.
2.3.3 Recurrent Neural Networks
A recurrent neural network (RNN) (figure 2.13a) is a type of artificial neural network
designed to recognize patterns in sequences of data, such as time series, text, or audio.
Unlike feedforward neural networks, RNNs have connections that loop back, allowing
information to persist. This architecture makes them particularly well-suited for tasks
where context and order matter, enabling them to handle sequences of variable length and
maintain a memory of what has been processed.
Let's have a look at exactly how a recurrent neural network works. In the RNN, the neurons not only receive input from the previous layer but also from their previous states. This recurrency allows the network to maintain a form of memory about the past inputs,
which is essential for tasks like speech recognition, machine translation, or any other
problem where the current input depends on the previous inputs. As we will see later on,
this temporal awareness also makes RNNs well-suited for agents that act in environments
where decisions depend not just on the current observation but on the sequence of prior
events.
The network begins with an input layer that receives a sequence of data. Unlike
feedforward networks, RNNs process sequences one element at a time. For example, in a
text processing task, each word in a sentence might be fed into the network one by one.
The core of an RNN is its hidden state, a form of memory that captures information about the sequence. When an input element is fed
into the network, it is combined with the previous hidden state to produce a new hidden
state. Mathematically, this is often represented as:

$h_t = f(W \cdot x_t + U \cdot h_{t-1} + b)$, (2.18)

where $h_t$ represents the hidden state at time step $t$, $x_t$ is the input at time step $t$, $W$ and $U$ are weight matrices for the input and hidden state, respectively, $b$ is a bias term, and $f$ is an activation function, typically a nonlinear function like tanh or ReLU. This hidden state is updated at each time step, capturing both the current input and the past context.
At each time step, the hidden state can produce an output, depending on the specific
task. The output is computed using the current hidden state and a weight matrix. For
example, in a text prediction task, the output at each time step might represent the predicted
next word in a sentence.
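The recurrence of equation (2.18) takes only a few lines of code. The sketch below assumes a tanh nonlinearity and processes a sequence by applying the same step repeatedly with shared weights; the names and dimensions are illustrative.

import numpy as np

def rnn_step(x_t, h_prev, W, U, b):
    """New hidden state h_t = tanh(W x_t + U h_{t-1} + b), as in equation (2.18)."""
    return np.tanh(W @ x_t + U @ h_prev + b)

def rnn_forward(sequence, W, U, b, h0):
    """Apply the same step to every element of the sequence, reusing the shared weights."""
    h, states = h0, []
    for x_t in sequence:
        h = rnn_step(x_t, h, W, U, b)
        states.append(h)
    return states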
In the case of supervised learning problems, RNNs are typically trained using
backpropagation through time (BPTT). However, they suffer from issues like vanishing
and exploding gradients, which makes it difficult to capture long-term dependencies in the
data.
Neuroevolution techniques that optimize both weights and network topology can
naturally exploit recurrent connections to discover clever solutions, as we will see in
section 3.3.4 of the next chapter.
2.3.4 Long Short-Term Memory
A long short-term memory (LSTM) network is a special type of RNN designed to overcome some of the limitations of traditional RNNs, particularly the problem of learning long-term dependencies (figure 2.13b). LSTMs (Hochreiter and Schmidhuber, 1997) can learn
and retain information over extended periods, making them highly effective for tasks
involving sequential data, such as language modeling, speech recognition, and time-series
forecasting.
An LSTM network comprises a series of LSTM cells, which replace the standard
neurons in traditional RNNs. Each LSTM cell has a more complex internal structure
designed to control the flow of information in and out of the cell, using several gates.
These gates regulate which information is added, updated, or forgotten, allowing the
network to maintain long-term dependencies and learn which pieces of information are
important for making predictions.
An LSTM cell contains three main gates: the forget gate, the input gate, and the output
gate. These gates use sigmoid activation functions to decide whether to let information
pass through or not. Here is a breakdown of each component:
Figure 2.13: Recurrent neural network and LSTM block. (a) The left side shows a basic recurrent neural network architecture, where the hidden state is updated at each time step using the current input and the previous hidden state. The unrolled version of the RNN over multiple time steps is shown to the right, illustrating how the network processes a sequence by passing information forward through time via shared weights. (b) An LSTM block illustrating the internal structure, including the cell state and the three gating mechanisms: forget gate, input gate, and output gate. These components work together to regulate the flow of information, enabling the network to learn long-range dependencies in sequential data.
Forget Gate: The forget gate determines which parts of the cell's previous state should be discarded or forgotten. It takes the current input ($x_t$) and the previous hidden state ($h_{t-1}$) and passes them through a sigmoid function. The output of this function is a value between 0 and 1 for each number in the cell state ($C_{t-1}$), where 0 represents "completely forget" and 1 represents "completely retain":

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$, (2.19)

where $f_t$ is the forget gate's output, $W_f$ is the weight matrix for the forget gate, $b_f$ is the bias term for the forget gate, and $\sigma$ denotes the sigmoid function.
Input Gate: The input gate decides which new information will be added to the cell state. It consists of two parts: a sigmoid layer that determines which values will be updated and a tanh layer that creates a vector of new candidate values that could be added to the state. These two layers' results are multiplied to decide which new information to keep. We can define it as:

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$, $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$, (2.20)

where $i_t$ is the input gate's output, $\tilde{C}_t$ represents the new candidate values to be added, $W_i$ and $W_C$ are weight matrices for the input gate and candidate values, and $b_i$ and $b_C$ are the bias terms for the input gate and candidate values.
Cell State Update: The new cell state $C_t$ is updated by combining the old cell state $C_{t-1}$ multiplied by the forget gate output $f_t$ (which determines what to forget) and the new candidate values $\tilde{C}_t$ multiplied by the input gate output $i_t$ (which determines what new information to add):

$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$. (2.21)

This equation effectively updates the cell state by retaining the necessary information from the past and incorporating the new relevant information.
Figure 2.14: A typical architecture of a convolutional neural network. The input image passes through multiple layers of convolutions, which extract various features, followed by subsampling (pooling) layers to reduce dimensionality. This process is repeated to create deeper feature maps, which are then flattened and connected to fully connected layers to generate the final output.
Output Gate: The output gate determines the next hidden state $h_t$, which is used for the next time step and can also be an output for the current time step. The output gate first passes the current input and previous hidden state through a sigmoid function to decide which parts of the cell state to output. Then, it multiplies the cell state (after applying the tanh function to scale between -1 and 1) by the output of the sigmoid gate:

$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$, $h_t = o_t \odot \tanh(C_t)$, (2.22)

where $o_t$ is the output gate's output, $h_t$ is the new hidden state, $W_o$ is the weight matrix for the output gate, and $b_o$ is the bias term for the output gate.
The gating mechanisms in LSTM cells allow them to remember information for long
periods. This mechanism is particularly useful in tasks where the context of earlier parts
of a sequence is essential for making accurate predictions later. Additionally, LSTMs are
specifically designed to mitigate the problem of vanishing gradients, which occurs when
training traditional RNNs on long sequences. The cell state in LSTMs can maintain a
constant flow of gradients during backpropagation, allowing the network to learn long-term
dependencies effectively.
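Putting equations (2.19)-(2.22) together, a single LSTM cell update can be sketched as follows. The weight shapes and the concatenation of the previous hidden state with the current input are assumptions chosen to mirror the notation above.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, Wf, bf, Wi, bi, WC, bC, Wo, bo):
    """One LSTM cell update; returns the new hidden state and cell state."""
    z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    f_t = sigmoid(Wf @ z + bf)          # forget gate, eq. (2.19)
    i_t = sigmoid(Wi @ z + bi)          # input gate, eq. (2.20)
    C_tilde = np.tanh(WC @ z + bC)      # candidate values, eq. (2.20)
    C_t = f_t * C_prev + i_t * C_tilde  # cell state update, eq. (2.21)
    o_t = sigmoid(Wo @ z + bo)          # output gate, eq. (2.22)
    h_t = o_t * np.tanh(C_t)
    return h_t, C_t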
In this book, we will see how neuroevolution is able to successfully optimize the
weights of LSTMs that control agents in complex environments (section 7.1.2) or is even
able to come up with new and better-performing LSTM node designs (section 10.3.1).
2.3.5 Convolutional Neural Networks
A convolutional neural network (CNN) is a type of deep learning model specifically designed
to process and analyze data with a grid-like structure, such as images (figure 2.14). CNNs
are particularly effective for tasks that involve spatial hierarchies in data, such as image
recognition, object detection, and video analysis. The architecture of CNNs is inspired by
the visual cortex of the brain, where individual neurons respond to overlapping regions in
the visual field (Fukushima, 1980; Hubel and Wiesel, 1968).
A CNN consists of several layers, each with a specific function. The primary building
blocks of a CNN are the convolutional layers, pooling layers, and fully connected layers.
These layers work together to automatically and adaptively learn spatial hierarchies of
features from input data.
The Convolutional Layer: The convolutional layer is the core component of a CNN.
It performs the convolution operation, which involves sliding a small filter or kernel (a
matrix of weights) over the input data. This sliding motion is governed by a stride, which defines how many pixels the filter moves at each step. Padding (adding values, often zeros, around the input's borders) is frequently applied to control the spatial dimensions of the
output and retain information at the edges.
As the filter slides, it performs a dot product between its weights and the corresponding
patch of the input data, producing a single value in the output feature map. This operation
allows the filter to detect spatial patterns such as edges, textures, or specific color variations
within the input. This can be visualized as taking a small window of the input image (the
same size as the filter), applying the filter's weights to it, and generating an output value
that represents the presence or strength of a specific feature at that location.
Mathematically, the convolution operation (often implemented as cross-correlation in deep learning frameworks) can be expressed as:

$(I * K)(x, y) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} I(x+i, y+j) \cdot K(i, j)$, (2.23)

where $I$ is the input image, $K$ is the convolution kernel or filter of dimensions $m \times n$, and $(x, y)$ are the coordinates of the pixel in the output feature map, representing the top-left corner of the window over which the operation is performed.
The output of this operation is a set of feature maps that highlight specific patterns or
features in the input data. Multiple filters can be used simultaneously, each designed (or
learned) to detect different features, resulting in multiple feature maps.
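As a concrete illustration, the following sketch implements the stride-1, no-padding version of equation (2.23) (as cross-correlation, like most deep learning frameworks). The array shapes and the example edge filter are illustrative assumptions.

import numpy as np

def conv2d(image, kernel):
    """Valid (no padding), stride-1 convolution of equation (2.23), as cross-correlation."""
    H, W = image.shape
    m, n = kernel.shape
    out = np.zeros((H - m + 1, W - n + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            out[x, y] = np.sum(image[x:x + m, y:y + n] * kernel)
    return out

# Example: a 3x3 vertical-edge filter applied to a random 8x8 "image".
edge_filter = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])
feature_map = conv2d(np.random.default_rng(0).random((8, 8)), edge_filter)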
Activation Function: After the convolutional layer, an activation function, typically
the rectified linear unit (ReLU), is applied to introduce nonlinearity. This nonlinearity
allows the network to learn complex patterns. The ReLU function is defined as:
𝑓 (𝑥) = max(0, 𝑥). (2.24)
This activation function outputs the input directly if it is positive; otherwise, it outputs
zero. It helps the network to learn nonlinear relationships.
Pooling Layer: The pooling layer, also known as the subsampling or downsampling layer, reduces the spatial dimensions of the feature maps. This mechanism helps to reduce the number of parameters, computational complexity, and overfitting. The most common type of pooling is max pooling, which takes the maximum value from a small region of the feature map.
If the input to the pooling layer is a 2 × 2 window, max pooling selects the highest value from that window. Mathematically, max pooling over a region can be expressed as:

$P(x, y) = \max\{f(i, j) : (i, j) \in \text{window}(x, y)\}$. (2.25)

Here, $P(x, y)$ represents the output of the pooling operation at position $(x, y)$, and $f(i, j)$ is the feature value at position $(i, j)$.
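A minimal sketch of 2 × 2 max pooling with stride 2 is shown below; assuming the feature-map dimensions are divisible by two is an illustrative simplification.

import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; assumes both dimensions are divisible by 2."""
    H, W = feature_map.shape
    return feature_map.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))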
Fully Connected Layer: After several convolutional and pooling layers, the high-level
reasoning in the neural network is done via fully connected layers. In a fully connected
layer, each neuron is connected to every neuron in the previous layer. The output of the
final fully connected layer can represent the class scores (in a classification problem),
task-specific outputs such as predicted values or sequences, or, in the case of agents
trained via neuroevolution, it may represent continuous control signals or discrete action
probabilities used to interact with an environment. The fully connected layer can be
mathematically represented as:
$y = W \cdot x + b$, (2.26)

where $y$ is the output vector, $W$ is the weight matrix, $x$ is the input vector, and $b$ is the bias term.
In classification tasks, the output layer often uses a softmax activation function to convert the output scores into probabilities. The softmax function is defined as:

$\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}$. (2.27)

Here, $z_i$ represents the output score for class $i$, and the denominator is the sum of the exponentials of all output scores. This function ensures that the output values are between 0 and 1 and sum to 1, representing a probability distribution over the classes.
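For completeness, the softmax of equation (2.27) can be computed as follows; subtracting the maximum before exponentiating is a standard numerical-stability trick not mentioned in the text.

import numpy as np

def softmax(z):
    """Softmax of equation (2.27); the max is subtracted first for numerical stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()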
Finding the right design parameters for a convolution network manually, such as
the number of layers, the number of channels, or the kernel size, can take a lot of time.
Thankfully, we can also automate this process with neuroevolution, as we will see in
section 10.5 in the chapter on neural architecture search.
2.3.6 Transformers
A transformer (Vaswani, Shazeer, Parmar, et al., 2017) is a type of deep learning model that relies entirely on a so-called self-attention mechanism to process input data, rather than traditional recurrent or convolutional layers. We will look at the self-attention mechanism in more detail below and again in section 4.4.1 in the context of indirect
encodings. Transformers are the foundation for many state-of-the-art models in natural
language processing (NLP) and other fields. They are particularly well-suited for handling
sequential data and long-range dependencies, and they have demonstrated significant
improvements in performance for tasks like machine translation, text generation, and
summarization. We will go into more detail on transformers and large language models in
chapter 13, which shows some of the ways in which NE methods can be synergistically
combined with generative AI.
The transformer architecture consists of an encoder-decoder structure, where both the
encoder and decoder are composed of multiple layers of self-attention and feedforward
neural networks (figure 2.15). The encoder takes an input sequence and processes it into
an internal representation, which the decoder then uses to generate an output sequence.
Figure 2.15: Illustration of the transformer architecture. The architecture consists of an encoder (top) and a decoder (bottom). The encoder comprises a stack of layers, each containing a multi-head self-attention mechanism followed by a position-wise feedforward network, with residual connections and layer normalization applied after each sub-layer. The decoder stack is similarly structured but includes an additional masked multi-head self-attention mechanism to prevent positions from attending to subsequent positions. Positional encodings are added to the input embeddings to provide information about the position of the words in the sequence. The final output is generated after applying a linear transformation and a softmax function to produce the output probabilities.
Each component in the transformer leverages self-attention to weigh the importance of
different elements in the input sequence in learning complex patterns.
Input Embedding and Positional Encoding: The input to a transformer model is
first converted into embeddings, which are fixed-length dense vector representations of
the input tokens (words, subwords, etc.). Since transformers do not inherently understand
the order of the sequence, positional encodings are added to the embeddings to provide
information about the relative positions of tokens in the sequence. For example, positional
encodings can use sine and cosine functions of different frequencies to create unique
position vectors.
Self-Attention Mechanism: The core of the transformer is the self-attention mech-
anism, which allows the model to focus on different parts of the input sequence when
making predictions. Self-attention computes a weighted representation of each input token
based on its relationship with all other tokens in the sequence. This calculation is done
based on three vectors: the query (Q), key (K), and value (V) vectors for each token. These
vectors are derived using learned weight matrices:
$Q = XW_Q$, $K = XW_K$, $V = XW_V$, (2.28)

where $X$ is the input sequence, and $W_Q$, $W_K$, $W_V$ are weight matrices for the query, key, and value vectors, respectively.
The self-attention scores are computed by taking the dot product of the query and key vectors and scaling by the square root of the dimensionality of the key vectors. The scores are then passed through a softmax function to produce attention weights:

$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$, (2.29)

where $d_k$ is the dimension of the key vectors.
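The following sketch implements single-head scaled dot-product self-attention following equations (2.28) and (2.29). The sequence length, embedding width, and random projection matrices in the example are illustrative assumptions.

import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_Q, W_K, W_V):
    """Single-head scaled dot-product self-attention over a sequence X of shape (T, d)."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V  # eq. (2.28)
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # scaled dot products
    return softmax(scores, axis=-1) @ V  # eq. (2.29)

# Example: a sequence of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_Q, W_K, W_V = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_Q, W_K, W_V)   # shape (4, 8)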
Multi-Head Attention: To allow the model to attend to information from different
representation subspaces jointly, Transformers use multi-head attention. Instead of
computing a single set of attention scores, the input is projected into multiple sets of
queries, keys, and values, and the attention mechanism is applied in parallel. The outputs
of these attention heads are concatenated and linearly transformed:
$\text{MultiHead}(Q, K, V) = \text{Concat}(\text{head}_1, \ldots, \text{head}_h)W_O$. (2.30)

Each $\text{head}_i$ performs the self-attention computation independently, and the results are
combined to capture different aspects of the input data.
Feedforward Neural Network: After the multi-head attention layer, the output is
passed through a position-wise feedforward neural network. This network consists of two
linear transformations with a ReLU activation in between. The same feedforward network
is applied independently to each position in the sequence:
$\text{FFN}(x) = \max(0, xW_1 + b_1)W_2 + b_2$, (2.31)

where $W_1$, $W_2$ are weight matrices, and $b_1$, $b_2$ are bias terms.
Layer Normalization and Residual Connections: To stabilize and speed up training,
each sub-layer (multi-head attention and feedforward neural network) is followed by a
layer normalization step, which normalizes the output across the features. Additionally,
the transformer uses residual connections (skip connections) that add the input of each
sub-layer to its output before applying layer normalization. This computation mitigates
the vanishing gradient problem and allows the model to learn more efficiently:
$\text{Output} = \text{LayerNorm}(x + \text{Sublayer}(x))$.
Stacking Layers: The encoder and decoder are composed of multiple identical
layers (typically six to 12 in common implementations). Each encoder layer consists of a
multi-head self-attention mechanism followed by a feedforward neural network, while each
decoder layer contains an additional cross-attention mechanism to attend to the encoder’s
output.
Output Decoding: The decoder generates the output sequence one token at a time.
At each step, the decoder attends to all the previously generated tokens using masked
self-attention (to prevent attending to future tokens) and to the encoder’s output using a
cross-attention mechanism. This process continues until the model generates a special
end-of-sequence token.
Neuroevolution has also been applied to the transformer architecture, resulting in
evolved transformer models that outperform baseline models on benchmark tasks while
using fewer computational resources. This approach will be discussed in more detail in
the context of evolutionary neural architecture search later in this book (chapter 10).
2.4 Neuroevolution: An Integrated Approach
This chapter introduced the fundamental principles of evolutionary algorithms and
neural networks, laying the foundation for their integration in neuroevolution. EAs
are optimization techniques inspired by natural selection, operating on populations of
candidate solutions that evolve over successive generations. Key processes include
selection, mutation, and crossover, which allow populations to explore and exploit the
search space for optimal or near-optimal solutions. The chapter discussed different
types of EAs, such as GA and ES, and their specific uses, advantages, and limitations
in optimization problems. For readers interested in diving deeper into EAs, books like
Introduction to Evolutionary Computing by Eiben and J. E. Smith (2015) and the tutorial
Evolutionary Computation: A Unified Approach by De Jong (2020) would be a good
starting point.
Additionally, the chapter introduced neural networks, including basic architectures like
feedforward networks, convolutional networks, and LSTMs. These networks are designed
to process and learn from data, enabling them to make decisions or predictions. For a
more comprehensive overview of neural networks and deep learning, see e.g. the books
Dive into deep learning by A. Zhang, Lipton, M. Li, et al. (2023) and Deep Learning:
Foundations and Concepts by C. M. Bishop and H. Bishop (2024).
While this chapter provided a comprehensive overview of these foundational concepts,
it is also important to consider why they should be combined. Neural networks, as
presented here, may already appear sufficient on their own. However, their training often
relies on gradient-based methods, which can struggle in vast, high-dimensional, nonlinear,
or deceptive search spaces, precisely the kinds of spaces where optimal behaviors are
hard to define and must be discovered through search.
Evolutionary computation offers a powerful complement to neural networks in this
context. Operating over a diverse population of candidate solutions makes a broad
exploration of the search space possible. This quality makes evolutionary methods an
effective approach for discovering neural network architectures and weights, forming
the core idea behind neuroevolution. In the next chapter, we will take a first look at its
fundamentals.
2.5 Chapter Review Questions
1. Core Principles of Evolutionary Algorithms: What are the key components of evolutionary algorithms? How do these components collectively emulate the process of natural selection?

2. Genetic Algorithm Operations: Describe the role of crossover and mutation in genetic algorithms, and explain how they contribute to maintaining diversity in the population.

3. Covariance Matrix Adaptation Evolution Strategy: How does CMA-ES adapt its search over successive generations? What advantage does this adaptation provide in comparison to simpler evolution strategies?

4. Multiobjective Evolutionary Computation: Compare and contrast single-objective and multiobjective evolutionary algorithms. What unique challenges arise in multiobjective EAs, and how does NSGA-II address them?

5. Practical Applications of Fitness Shaping: What is fitness shaping, and how does rank-based fitness shaping mitigate the impact of outliers in evolutionary optimization tasks?

6. Feedforward Neural Networks: What is the primary purpose of the activation function in the hidden layers of a feedforward neural network? Why is nonlinearity crucial for the network's performance?

7. Recurrent Neural Networks: How do RNNs maintain information about past inputs? Why are they particularly well-suited for sequential data tasks like language modeling?

8. Long Short-Term Memory Networks: What are the roles of the forget, input, and output gates in an LSTM cell? How do they collectively help mitigate the vanishing gradient problem?

9. Convolutional Neural Networks: Describe the purpose of the convolutional and pooling layers in a CNN. How do these layers work together to extract and summarize features from input data?

10. Transformers: What is the self-attention mechanism in a transformer model? How does it enable the model to capture long-range dependencies in sequential data?
Chapter 3
The Fundamentals of Neuroevolution
Neuroevolution refers to the use of evolutionary algorithms to optimize artificial neural
networks, including their connection weights and even their architectures, through
simulated evolution. The story of neuroevolution begins with its most profound inspiration:
the evolution of biological nervous systems. Over billions of years, natural selection has
shaped increasingly complex neural architectures, from the simple nerve nets of primitive
organisms to the intricate brains of mammals. This evolutionary journey provides both
inspiration and validation for computational approaches that seek to evolve artificial neural
networks.
Compared to traditional neural network training methods, neuroevolution offers
several distinctive advantages. It can optimize both network parameters and architecture
simultaneously. It requires only a fitness function rather than explicit error signals. It
can handle non-differentiable aspects of networks and objectives. It maintains population
diversity, potentially discovering novel solutions. As we will see throughout this book,
these capabilities make neuroevolution particularly valuable for problems where traditional
methods face limitations, such as reinforcement learning tasks, robot control, game playing,
decision-making, and other domains with complex, delayed, or sparse feedback.
This chapter starts with the basic neuroevolution taxonomy and then presents a simple
case study on how to evolve a neural network-controlled robot. It continues with details
on a particular neuroevolution method called NEAT, which allows optimizing both the
topology and weights of a neural network. Finally, it compares neuroevolution to deep
learning and discusses how neuroevolution itself can be scaled up to evolve the parameters
of larger neural networks with millions of weights.
3.1 Neuroevolution Taxonomy
The idea of evolving neural networks dates back to at least the late 1980s. Early researchers
explored using GAs to train fixed-topology neural networks by evolving their connection
weights. For instance, Montana and L. Davis (1989) applied a GA to optimize the weights
of a feed-forward network, even designing specialized genetic operators to preserve useful
building blocks (sub-networks) during evolution. Around the same time, researchers like
D. B. Fogel, L. J. Fogel, and Porto (1990) demonstrated that evolutionary programming
could successfully evolve neural network weights for certain tasks. These early successes
showed that evolutionary search could find good weight solutions and even sometimes
avoid local minima that gradient descent might get stuck in, thereby sparking interest in
learning by evolution.
Applying evolutionary algorithms to neural networks involves deciding how to encode
a neural network into a representation that can be evolved, and what evolutionary operations
will be used to modify those representations. As will be discussed next, approaches can
broadly be divided into those that only evolve the weights of the network and approaches
that evolve both the network’s weights and topology.
3.1.1 Fixed-Topology Neuroevolution
The simplest approach is to assume a fixed network architecture (with a predetermined
number of layers, neurons, and connectivity patterns) and use evolution to optimize the
weights (and possibly biases) of that network. In this scenario, the genotype can be a
direct list of all weight values. Early work predominantly followed this approach, for
example, representing the network’s weights as a vector of real numbers, which a GA or
ES then optimized (Schaffer, Whitley, and Eshelman, 1992; Yao, 1999). Standard genetic
operators can be adapted (e.g. using real-valued mutation or specialized crossover for
vectors) to breed better weight sets. In the basic setup, the fitness of each individual is
computed by setting a network’s weights accordingly and measuring performance (like
accuracy or reward).
3.1.2 Topology and Weight Evolving Artificial Neural Networks
A more ambitious approach is to evolve the structure of the neural network itself, determining how many neurons to use and how they are connected, in addition to
optimizing weights. This approach promises automated architecture search, potentially
discovering designs that a human might not consider.
Early methods for evolving network topology began by directly mutating connection
weights within matrices (Dasgupta and McGregor, 1992). However, attention soon shifted
toward more advanced encoding strategies for representing and modifying graphs (Figueira
Pujol and Poli, 1998). This shift led to the rise of novel representations, such as the
graphical structures used in Cartesian genetic programming (J. F. Miller,
2011), and
the implicit connectivity found in approaches such as analog genetic encoding (AGE;
Mattiussi and Floreano, 2007) or geometric encoding for neural network evolution (GENE;
Templier, Rachelson, and Wilson, 2021), which draw inspiration from genetic regulatory
networks.
Another early direction was to evolve genetic strings with start and end markers for
node and connection definitions (Fullmer and Miikkulainen, 1992). These markers can be
mutated, activating and deactivating parts of the string: what was junk DNA becomes
part of the network, and parts of the network become junk DNA. Both the topology and
the weights can be evolved in this manner, sometimes resulting in drastic changes and
wide exploration. This approach was later extended to high-level abstractions of neural
networks: in Markov Brains, a structure of logic gates and their connections are evolved to
represent complex behavior (Hintze, Edlund, Olson, et al., 2017; Olson, Hintze, F. C. Dyer,
et al., 2013).
Transitioning from fixed to increasingly complex network topologies introduced new
challenges. One such challenge was how to perform crossover, combining the structures of two parent networks, when the topologies differ significantly. Another was ensuring
that more intricate structures were not prematurely eliminated from the population before
their weights had time to be properly optimized, potentially revealing their full capabilities.
One method that gained a lot of traction by addressing these issues is the neuroevolution
of augmenting topologies (NEAT) algorithm (Stanley and Miikkulainen, 2002), which
will be discussed in detail in section 3.3.
Another key consideration in evolving neural networks is the representation of the
network in the genotype. Encoding affects everything: how variation operators work, how
well the search space is covered, and how scalable the approach is. There are two main
approaches, direct and indirect, which will be discussed next.
3.1.3 Direct Encoding
In a direct encoding scheme, every detail of the neural network is explicitly encoded
in the chromosome. This design often means that each connection (and possibly each
neuron) is represented by genes. For example, one might enumerate all weights in a
predetermined order, forming a long string of numbers (or bits) that correspond one-to-one
with the ANN’s weight matrix. Early architecture-evolving methods also used direct
encodings (Whitley, Dominic, Das, and Anderson, 1993; Yao, 1999), such as encoding
the connectivity matrix of a network as a binary string (1s and 0s indicating the presence
or absence of connections).
Direct encodings are straightforward: they describe the phenotype network precisely
and are easy to implement. They allow fine-grained modifications; a single mutation
can add, remove, or alter a specific connection. However, scaling can be an issue: as
network size grows, the genome length grows rapidly (potentially quadratic in number of
neurons for dense connectivity). A more fundamental issue is that direct encodings lack an obvious way to capture high-level regularities or symmetries in the network, unless the evolutionary process discovers them, which can be inefficient. Despite these issues, direct
encodings have been widely used and are the default in many neuroevolution algorithms
(including NEAT), due to their simplicity and precision.
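As a minimal illustration of a direct encoding, the sketch below treats the genome as the flattened weight vector of a small fixed-topology network and mutates it with Gaussian noise. The layer sizes and the mutation scale are illustrative assumptions.

import numpy as np

def decode(genome, n_in=3, n_hidden=5, n_out=1):
    """Map a flat genome one-to-one onto the weight matrices of a fixed-topology network."""
    split = n_hidden * n_in
    W1 = genome[:split].reshape(n_hidden, n_in)
    W2 = genome[split:split + n_out * n_hidden].reshape(n_out, n_hidden)
    return W1, W2

def mutate(genome, sigma=0.1, rng=None):
    """Gaussian mutation: perturb every gene (i.e. every connection weight) slightly."""
    rng = np.random.default_rng() if rng is None else rng
    return genome + rng.normal(scale=sigma, size=genome.shape)

genome = np.random.default_rng(0).normal(size=3 * 5 + 5 * 1)  # one gene per connection
W1, W2 = decode(genome)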
3.1.4 Indirect Encoding
Indirect encodings describe a network more abstractly, through a set of rules or a
generative process rather than enumerating every connection. Only the most important
design parameters are encoded, and a developmental procedure generates the full network
from this compressed description. In biology, DNA encodes how an organism grows
rather than explicitly mapping every cell. Similarly, an indirect ANN encoding might
encode blueprints for repeating structures, symmetric connectivity patterns, or growth
rules. Indirect encodings can be far more compact, potentially scaling to very large,
regular networks by exploiting patterns. They are also arguably closer to biological reality
(since real neural systems are not encoded link-by-link in genomes). The trade-off is that
the mapping from genotype to phenotype is more complex: mutations in the genome can
have broad, nonlinear effects on the resulting network, and it may be harder for evolution
to fine-tune specific connections. There is also a risk that an indirect encoding constrains
the space of possible networks in unintended ways. These considerations and others will
be discussed in detail in chapter 4.
In practice, the choice between direct and indirect encoding depends on the problem:
if the solution network is expected to have a lot of symmetry or repeated motifs (as in
certain sensorimotor coordination tasks), indirect encoding can be powerful; if the solution
is more irregular, direct encoding might be more effective. The rest of this chapter will
focus on direct encodings; their indirect counterparts will be discussed in the next chapter.
3.2 Case study: Evolving a Simple Walking Agent
To make the fundamental concepts of neuroevolution concrete, this section will go over
the details of a case study in which a robot is taught to walk.
3.2.1 The Challenge
Neuroevolution is one of several ways to train an agent to operate in an environment, and
it shares similarities with reinforcement learning (RL). In both cases, an agent performs
actions in an environment and receives feedback in the form of rewards. Over time, the
agent improves its decisions to maximize those rewards. However, in RL it is not trivial
to estimate the gradient of a reward that arrives in the future with respect to an action
performed right now, especially if the reward is realized many time steps later. Even if
it were possible to calculate accurate gradients, learning may get stuck in local optima
(figure 3.1), which exist in many RL tasks.
Neuroevolution, on the other hand, sidesteps gradients altogether. Instead, it treats
each neural network as an individual organism and uses evolutionary algorithms to select,
reproduce, and mutate better-performing networks over generations. This fundamental
difference enables neuroevolution to overcome several limitations of other approaches.
Most notably, neuroevolution can be applied to scenarios where gradient information is
unavailable or unreliable, such as when the relationship between network outputs and
performance is complex, sparse, or delayed. Further, while RL algorithms require a reward
signal to be given to the agent at every timestep, neuroevolution algorithms only care about
the final cumulative reward that an agent gets at the end of its rollout in an environment.
In many problems, the outcome becomes apparent only at the end of the task, e.g. whether
the agent wins or loses, whether the robot arm picks up the object or not, or whether the
agent reached the goal.
Overall, these properties make neuroevolution particularly powerful in environments
with sparse or delayed rewards, discontinuous, noisy, or deceptive reward landscapes, and
unknown or difficult-to-model dynamics. They are put to good use in the task of training
a robot to walk.
Figure 3.1: Bipedal walker agent stuck in a local optimum. In this 2-D domain, a robot agent
with two legs, controlled by a neural network, needs to walk across a terrain with various obstacles
and holes. The task is difficult because the reward is given only at the end, but it also allows
learning methods to explore a variety of solutions. Simpler methods such as standard RL may
easily get stuck on the obstacles, as happened in this case. Neuroevolution, on the other hand, is
well-suited for the task and finds several creative ways to solve it. For animations of both stuck and
successful behaviors, see https://neuroevolutionbook.com/demos.
The task is implemented in an environment called BipedalWalkerHardcore, in which
the agent is challenged to control a bipedal robot, simulated in the Box2D physics engine,
that must walk across an uneven terrain (figure 3.1). This robot has four controllable
joints, two hips and two knees, and moves in a physics-based simulation with the potential
for complex interactions. Unlike simpler arcade games, this environment introduces
continuous state and action spaces.
The task is available inside the OpenAI gym (Brockman, Cheung, Pettersson, et al.,
2016), which is a toolkit designed to support the development and evaluation of different
learning algorithms. In this framework, the agent observes the current state, selects an
action, and receives feedback in the form of a new observation, a reward, and a done signal
indicating whether the episode has ended.
3.2.2 Fitness Function
A critical aspect of any neuroevolution experiment is the design of the fitness function. The
bipedal walker environment already provides a reward at each timestep, as a combination of
several factors designed to encourage forward locomotion, energy efficiency, and stability.
The primary component of the reward comes from forward progress: the faster the walker
moves to the right (the positive x-direction), the higher the reward. This component creates
a strong incentive for the agent to learn how to walk effectively. In addition to forward
velocity, there is a penalty for using energy. Specifically, the environment penalizes the
agent based on the square of the torque applied to its motors. This component discourages
inefficient or overly aggressive movement and helps the agent learn smoother, more natural
gaits. There is also a small positive reward for simply staying alive at each timestep, which
promotes stability and discourages falling. However, if the walker falls (e.g. the torso
touches the ground), the episode terminates and the agent receives a significant negative
reward.
To determine the fitness of a controller, the total cumulative reward is calculated by
adding up the environment rewards given to the agent at each timestep. The code in
listing 3 encapsulates a rollout of an agent in an OpenAI gym environment.
Listing 3 A simple rollout function for evaluating an agent in an OpenAI gym environment.

    def rollout(agent, env):
        # Reset the environment and get the initial observation
        obs = env.reset()
        done = False
        # Accumulator for total reward
        total_reward = 0

        # Loop until the episode is finished
        while not done:
            # Agent selects an action based on the observation
            a = agent.get_action(obs)
            # Take the action, observe new state, reward, and done flag (info is unused)
            obs, reward, done, info = env.step(a)
            # Accumulate reward
            total_reward += reward

        # Return total reward after the episode ends
        return total_reward
3.2.3 Neural Network Architecture
For the experiments in this case study, we employ a fixed-topology neuroevolution approach
and a direct encoding of the network weights. The network is a simple feed-forward network
with two hidden layers that maps an agent's observation, a vector x, directly to the
actions, a vector y.
At each time step, the environment provides a 24-dimensional observation vector to
the neural network. This vector includes information about the robot’s hull angle, velocity,
and position, along with joint angles, contact points for the feet, and distance readings from
simulated LIDAR sensors. The goal is for the neural network to interpret these sensory
inputs and produce four continuous motor control signals, one for each joint, within a
fixed range. These signals dictate how much torque is applied at each joint, essentially
driving the robot’s walking gait.
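To make this concrete, the sketch below shows one way an Agent (the name used in the listings that follow) might unpack a flat parameter vector into such a network; the hidden-layer sizes, the tanh activations, and the inclusion of biases are assumptions for illustration, not the exact setup used in the experiments:

    import numpy as np

    class Agent:
        """Feedforward policy mapping a 24-dim observation to 4 joint torques."""
        def __init__(self, params, sizes=(24, 40, 40, 4)):
            self.layers, idx = [], 0
            for n_in, n_out in zip(sizes[:-1], sizes[1:]):
                W = params[idx:idx + n_in * n_out].reshape(n_in, n_out)
                idx += n_in * n_out
                b = params[idx:idx + n_out]
                idx += n_out
                self.layers.append((W, b))

        def get_action(self, obs):
            x = np.asarray(obs)
            for W, b in self.layers:
                x = np.tanh(x @ W + b)  # tanh keeps the outputs in a fixed [-1, 1] range
            return x

The length of the parameter vector the EA must optimize is then simply the total number of weights and biases implied by these layer sizes.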
3.2.4 Evolutionary Algorithm
Starting from randomly initialized neural networks, an EA can be used to find a suitable
set of model parameters as described earlier (listing 4). Here, solutions[i] contains
the weights of a neural network, and Agent(solutions[i]) creates an instance of a
policy agent by loading those weights into a neural network architecture. The vector
solutions[i] is typically a flat array produced by an EA. This array encodes all of the
trainable parameters of the network, including the weights and possibly the biases for each
layer, concatenated in a specific order. The particular EA used in the experiment
is CMA-ES.
Listing 4 EA training loop for the BipedalWalkerHardcore-v3.

    env = gym.make('BipedalWalkerHardcore-v3')
    solver = EvolutionaryAlgorithm()          # use our favorite EA
    while True:
        solutions = solver.ask()              # EA gives a set of params
        fitlist = np.zeros(solver.popsize)
        for i in range(solver.popsize):       # evaluate each solution
            agent = Agent(solutions[i])       # init agent with a solution
            fitlist[i] = rollout(agent, env)  # rollout in the env
        solver.tell(fitlist)                  # give scores back to EA
        bestsol, bestfit = solver.result()    # get best params & fitness
        if bestfit > MY_REQUIREMENT:          # see if our task is solved
            break
3.2.5 Training for Generality
BipedalWalkerHardcore defines solving the task as getting an average score of over 300
over 100 consecutive random trials. While it is relatively easy to train an agent to walk
across the map successfully using an RL algorithm, it is difficult to get the agent to do so
consistently and efficiently, making this task an interesting challenge.
When running the code in listing 4, we find that the best evolved agent achieves an
average score of only about 220 to 230 across 100 trials. Because the terrain map is
randomly generated for each trial, sometimes the agents face an easy terrain and sometimes
a difficult one. This variability means that agents with weak policies can get lucky during
training but then might not generalize well. Put another way, even though the agent was
tested over 100 trials at the end, it was trained on single trials, so the test task was not the
same as the training task.
To get more robust agents, an agent's training can instead be defined as consisting of
16 random rollouts, with the average reward over those 16 rollouts used as its fitness score.
The data efficiency of this method is 16 times worse, but the final policy is more robust.
When the final policy evolved under this extended training regime was tested over 100
consecutive random trials, its average score exceeded the 300 points required to solve the
task. Figure 3.2 shows the progress from early to late generations in training. Early on,
the agent often gets stuck on obstacles. After learning to avoid them, it gets better and
faster at walking. Interestingly, standard RL algorithms typically lead to policies that fall
short of an average score of 300. For instance, the popular RL algorithm PPO (Schulman,
Wolski, Dhariwal, et al., 2017a; Schulman, Wolski, Dhariwal, et al., 2017b) only achieved
an average score of around 240 to 250 over 100 random trials.
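A minimal sketch of this multi-rollout evaluation, reusing the rollout function from listing 3 (the function name and the use of a plain mean are illustrative choices):

    def robust_fitness(agent, env, n_rollouts=16):
        """Average episode return over several randomly generated terrains."""
        returns = [rollout(agent, env) for _ in range(n_rollouts)]
        return sum(returns) / n_rollouts

    # In listing 4, the single-rollout evaluation would then be replaced by:
    # fitlist[i] = robust_fitness(agent, env)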
Figure 3.2: Various stages of progress in BipedalWalkerHardcore. Early on, evolution
discovers solutions that can walk relatively well on flat ground but frequently get stuck on obstacles.
Those that get over some of them are rewarded, and gradually the population gets better at handling them.
Once obstacles are no longer a problem, faster walks evolve as well. In this manner, the exploration
in population-based search leads to solutions of hard problems. For animations of these early
learning behaviors and later successful ones, see https://neuroevolutionbook.com/demos.
The ability to control the tradeoff between data efficiency and policy robustness is
a powerful property of neuroevolution; it is useful in many real-world domains where
safe policies are needed. In theory, with enough compute it would have been possible to
average over all 100 rollouts and optimize the bipedal walker directly to the requirements.
Professional engineers often must have their designs satisfy specific quality assurance
guarantees and meet certain safety factors. Such safety factors need to be considered when
training agents to learn policies that may affect the real world.
As a side note, what if we do not want the agent's policy to be deterministic? For
certain tasks, even ones as simple as rock-paper-scissors, the optimal policy is a random action,
so the agent needs to learn a stochastic policy. One way to convert a deterministic policy
network into a stochastic one is to make the final layer output a set of μ and σ parameters and
sample the action from N(μ, σI). Adding such randomness to the output also helps
encourage the agent to explore the environment and escape from local optima.
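A minimal sketch of such a stochastic output, assuming the network already produces a mean vector mu and a log standard deviation log_sigma for each action dimension (these names are illustrative):

    import numpy as np

    def sample_action(mu, log_sigma, rng=np.random.default_rng()):
        """Sample an action from N(mu, sigma*I); the exponential keeps sigma positive."""
        sigma = np.exp(log_sigma)
        return mu + sigma * rng.standard_normal(np.shape(mu))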
In conclusion, this case study showed that an EA can find neural networks to control
a bipedal walker. When fitness is averaged across multiple rollouts, the resulting policies can
robustly handle randomly generated terrains. However, the power of evolution does
not stop there. In the natural world, bodies evolved at the same time as brains, in an
environment that is itself changing and contains many other actors that are also changing. Principles
and effects of such coevolutionary processes will be discussed further in chapters 7, 9,
and 14. More general robust control through neuroevolution will be discussed in chapter 6.
3.3 Neuroevolution of Augmenting Topologies
As mentioned in section 3.1.2, topology and weight evolving artificial neural networks
(TWEANNs) are advanced neuroevolution methods capable of designing neural archi-
tectures from scratch, rather than assuming a fixed structure. This section reviews the
challenges in doing that and describes a particular solution, NEAT, in detail.
3.3.1 Motivation and Challenges
The motivation for TWEANNs is clear: the space of possible network architectures is vast,
and finding the right architecture for a problem manually can be a tedious trial-and-error
process. If evolution can search through architectures automatically, it may discover novel
or non-intuitive designs that improve performance. However, early attempts at evolving
topologies identified critical problems:
Competing Conventions (i.e. the Permutation Problem): Neural network genomes
can encode the same functionality in multiple ways by permuting or relabeling hidden neurons.
Two different encodings of an equivalent network are called competing conventions,
and crossing them over can produce corrupted offspring. Figure 3.3 illustrates this problem:
two networks with hidden nodes labeled (A, B, C) vs. (C, B, A) implement the same
function, yet a naive one-point crossover misaligns their genes and yields offspring missing
vital connections (e.g. one offspring has two copies of A and none of C). In general, with
n hidden nodes there are n! functionally equivalent encodings, so recombining topologies
blindly often disrupts networks. This historical difficulty in aligning genomes made
crossover of arbitrary topologies highly unstable. Some earlier TWEANN methods tried
to avoid crossover altogether or enforced identical ordering of nodes, but such constraints
also weaken the search. The competing conventions problem, also referred to as
the permutations problem (Radcliffe, 1993), remained a "holy grail" challenge: how to
recombine networks with different topologies meaningfully.
Loss of New Structural Innovations: A second problem was that adding new
structure (new nodes or connections) often initially hurts performance, so those mutations
tend to be eliminated before they can prove useful. For example, inserting a new hidden
neuron introduces a random nonlinear change; until its weights are tuned, the network’s
fitness usually drops. In a standard evolutionary algorithm, such an individual would likely
be outcompeted immediately by others, causing the innovation to disappear. In effect,
complex structural mutations were rarely given time to optimize. Some prior TWEANNs
attempted ad-hoc remedies (e.g. adding "dead" structure that initially has no effect), but
without a systematic way to protect novel structures the population would converge to
conservative topologies. This lack of protection made it risky to evolve larger topologies:
major innovations could be prematurely lost.
Complexity vs. Search Efficiency: A third challenge was controlling the explosive
search dimensionality when topology is unfettered. Many earlier TWEANN implemen-
tations began evolution with a population of random large networks to ensure diverse
structures. However, random graphs often include redundant or unconnected components
(e.g. some inputs not reaching outputs), which waste evaluations. More subtly, starting
with excessive complexity burdens the search with many unnecessary parameters that were
Figure 3.3: The competing conventions problem. Two functionally identical networks (each
with three hidden neurons) have hidden nodes labeled in different orders (Left: A-B-C, Right:
C-B-A). A naive crossover (recombining at one hidden node position) produces offspring with
misaligned structures (bottom), each missing one of the three hidden neurons (here, one offspring
lost C and the other lost A). This example illustrates how exchanging genes between differently
ordered genomes can lose information. Figure from Stanley and Miikkulainen (2002).
never optimized from scratch. Evolution then spends effort pruning or tuning irrelevant
structure instead of focusing on solving the task. One approach to favor simpler networks
was to penalize network size in the fitness function. Yet such penalties are problem-
dependent and introduce difficult trade-offs. Ideally, the evolutionary process itself would
"complexify" only as needed, i.e. start with minimal architectures and gradually add
complexity when it confers an advantage. This process was hard to establish: if every
individual starts simple (e.g. no hidden nodes), there is little initial topological diversity,
and any complex mutation would be instantly disadvantaged (tying back to the previous
issue).
In summary, to harness topology evolution, one needs (1) a crossover method robust
to competing encodings, (2) a way to protect and nurture new structural mutations, and (3)
a strategy to evolve minimal solutions first and grow complexity gradually without ad-hoc
penalties. Neuroevolution of augmenting topologies (NEAT) was developed specifically
as a solution to these challenges (Stanley and Miikkulainen, 2002). It was conceived in the
early 2000s, and has served as a foundation for over 200 further algorithms and methods in
the field since then (Papavasileiou, Cornelis, and Jansen, 2021). The algorithm’s hallmark
features are: (1) a novel genetic encoding with historical markings that aligns genes during
crossover to solve the competing conventions issue, (2) a speciation mechanism with
fitness sharing to protect new innovations by reducing competition between disparate
topologies, and (3) an incremental complexification approach that begins with minimal
networks and adds nodes/connections over generations. This section describes how each
of these mechanisms is implemented in NEAT, and how together they enable efficient
evolution of increasingly sophisticated neural networks.
3.3.2 Genetic Encoding and Historical Markings
The genome in NEAT consists of node genes and connection genes (figure 3.4). Node
genes encode information about each neuron in the network. Connection genes, on the
other hand, encode information about the connections between nodes. Each connection
gene specifies the two nodes it connects, the weight of the connection, whether the
connection is enabled or disabled, and a unique innovation number that tracks its origin.
Figure 3.4: NEAT genotype. Node genes define the types of nodes in the network: sensors (input
nodes), outputs, and hidden nodes. Connection genes represent the connections between nodes,
with each gene specifying the source and target nodes, connection weight, whether the connection
is enabled or disabled, and an innovation number indicating the historical origin of the gene. The
bottom section illustrates the neural network (phenotype) constructed based on the genome. This
encoding makes it possible to evolve network structures as well as the weights. Figure from Stanley
and Miikkulainen (2002).
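The genome can be represented with very simple data structures. The sketch below is an illustrative assumption about how this might look in code; the field names and the flat Genome container are not taken from any particular NEAT implementation:

    from dataclasses import dataclass

    @dataclass
    class NodeGene:
        node_id: int
        node_type: str            # "sensor", "hidden", or "output"

    @dataclass
    class ConnectionGene:
        in_node: int              # source node id
        out_node: int             # target node id
        weight: float
        enabled: bool
        innovation: int           # historical marking

    @dataclass
    class Genome:
        nodes: list               # list of NodeGene
        connections: list         # list of ConnectionGene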
The initial population of networks has a simple architecture, such as having each
input signal and bias connect directly to the outputs with no hidden layers. In NEAT,
mutations can affect both connection weights and network structures. Connection weight
mutations occur similarly to other neuroevolution systems, where each connection's
weight is either perturbed or left unchanged during each generation. Structural mutations,
however, introduce new components to the genome, increasing its size. There are two
types of structural mutations: adding connections and adding nodes.
In the add connection mutation, a new connection gene is introduced, linking two
previously unconnected nodes (figure 3.5; top). In the add node mutation, an existing
connection is split, and a new node is inserted at the split point (figure 3.5; bottom). The
original connection is disabled, and two new connections are added to the genome. One
of the new connections, leading into the new node, is assigned a weight of 1, while the
other, leading out of the new node, retains the weight of the original connection. This
approach minimizes the immediate impact of the mutation, allowing the new node to
integrate smoothly into the network.
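Under the genome sketch above, the add-node mutation could look roughly as follows (a simplified illustration; in particular, the innovation counter here is just a plain integer passed around, rather than the global, population-wide bookkeeping NEAT actually uses):

    def add_node_mutation(genome, conn, next_innovation, next_node_id):
        """Split connection `conn` by inserting a new hidden node."""
        conn.enabled = False                     # disable the original connection
        new_node = NodeGene(next_node_id, "hidden")
        into_new = ConnectionGene(conn.in_node, next_node_id, weight=1.0,
                                  enabled=True, innovation=next_innovation)
        out_of_new = ConnectionGene(next_node_id, conn.out_node, weight=conn.weight,
                                    enabled=True, innovation=next_innovation + 1)
        genome.nodes.append(new_node)
        genome.connections.extend([into_new, out_of_new])
        return next_innovation + 2, next_node_id + 1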
As mutations occur, NEAT genomes grow larger over time, producing networks with
varying sizes and differing connections. This network complexification can result in
Figure 3.5: Structural mutations in NEAT. Mutations in NEAT can add new connections and
new neurons to the evolving neural network. (Top) A new connection with an innovation number
7 is added between neurons 3 and 4. (Bottom) New neuron 6 is added, splitting the connection
between neurons 3 and 5: connection 5 becomes disabled, and new connections 8 and 9 are added
to the genome. In this manner, NEAT complexifies the network architecture over time. Figure
from Stanley and Miikkulainen (2002).
genomes with differing topologies and weight configurations, presenting challenges in
performing a meaningful crossover between neural networks. NEAT’s solution to this
challenge is based on the concept of innovation protection.
Innovations are protected in NEAT by assigning a unique innovation number to each
structural mutation, such as adding a new connection or node. These innovation numbers,
also called historical markings, are global identifiers that track the origin of mutations
across the population. When a structural change occurs in different individuals that is
functionally equivalent (i.e. adding a connection between the same two nodes, meaning
the innovation numbers for the source and target node match between individuals), the
same innovation number is assigned, ensuring that similar changes can be recognized and
aligned.
Tracking the historical origins of genes in NEAT is computationally efficient. Each
time a new gene is introduced through a structural mutation, a global innovation number is
incremented and assigned to that gene. Thus, innovation numbers create a chronological
record of when each gene appeared within the system. For example, the two mutations
in figure 3.5 could have occurred sequentially, with the new connection gene resulting
from the first mutation receiving innovation number 7, and the two new connection
genes introduced during the second mutation (a new node mutation) receiving innovation
numbers 8 and 9. Whenever genomes with these mutations are crossed over in the future,
their offspring will inherit the same innovation numbers for those genes. Since innovation
numbers remain constant and unaltered, the historical origin of every gene is preserved
throughout the evolutionary process.
Figure 3.6: NEAT crossover. The example shows the merging of two parent networks to produce an
offspring network. The top row shows two parent genomes, parent1 and parent2, each represented
by a series of genes (connections between nodes) and their corresponding neural network structures.
The crossover begins by aligning the genes of the two parents. Matching genes (those present
in both parents) are inherited randomly from either parent, while disjoint genes (genes that are
present in one parent but not the other) and excess genes (genes that appear after the last gene of the
other parent) are also considered. The resulting offspring genome combines these inherited genes,
reflecting both the inherited traits from the parents and potentially new neural connections. The
final offspring neural network structure, shown at the bottom, includes the selected connections
and nodes from both parents. Thus, innovation numbers make it possible to implement crossover
without expensive graph matching operations. Figure from Stanley and Miikkulainen (2002).
During crossover (figure 3.6), innovation numbers enable NEAT to align genomes
with differing structures. Genes are categorized based on their innovation numbers into
matching, disjoint, and excess genes. Matching genes have the same innovation number in
both parent genomes and are directly inherited and recombined. Disjoint genes, which
appear in one genome but not the other, and excess genes, which exist only in the larger
genome, are handled differently depending on the parents' fitness. This alignment prevents
the random mixing of unrelated genes, ensuring that crossover produces viable offspring
with functional genetic material preserved.
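A sketch of this alignment under the genome representation assumed earlier (simplified: real NEAT also biases the inheritance of disjoint and excess genes toward the fitter parent and handles disabled genes specially):

    def align_genes(parent1, parent2):
        """Classify connection genes as matching, disjoint, or excess by innovation number."""
        genes1 = {g.innovation: g for g in parent1.connections}
        genes2 = {g.innovation: g for g in parent2.connections}
        cutoff = min(max(genes1), max(genes2))   # last innovation of the shorter history
        matching = [(genes1[i], genes2[i]) for i in genes1 if i in genes2]
        unshared = {**genes1, **genes2}
        disjoint = [g for i, g in unshared.items()
                    if (i in genes1) != (i in genes2) and i <= cutoff]
        excess = [g for i, g in unshared.items()
                  if (i in genes1) != (i in genes2) and i > cutoff]
        return matching, disjoint, excess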
By tracking mutations and aligning genes using innovation numbers, NEAT makes
meaningful crossover possible between genomes with different topologies. This process
preserves functional structures and avoids the destructive effects of uncoordinated genetic
mixing. Ultimately, innovation protection ensures diversity in the population and allows
NEAT to evolve increasingly complex and effective neural networks while maintaining
their functional integrity.
The crossover operation is quite powerful. Suppose we have a network that is good at
some subtask, and another network that is good at some other subtask. In that case, it may
be possible to breed an offspring network that combines these skills and becomes better
than both parent networks at performing a bigger task.
Another important component of NEAT is speciation, which will be described next.
3.3.3 Speciation and Fitness Sharing
Speciation is the idea of grouping the population into different species consisting
of similar members. The goal is to give novel members of the population,
which may be promising although not yet very good, more time to evolve to their full
potential, rather than killing them off at each generation. Imagine an isolated island
populated by wolves and penguins only. If we let things be, the penguins will be dead
meat after the first generation, and all we would be left with are wolves. But if we create a
special no-kill zone on the island where wolves are not allowed to kill penguins once they
step inside that area, a certain number of penguins will always exist. They will have time
to evolve into flying penguins that will make their way back to the mainland, where there
is plenty of vegetation to live on, while the wolves would be stuck forever on the island.
For a more concrete example, consider the example in section 1.1 about the 100 sets
of weights, and imagine modifying the algorithm from only keeping the best 20 and
getting rid of the rest, to first grouping the 100 weight sets into five groups according to their
similarity, measured by Euclidean distance. Now that there are five groups (or species)
of 20 networks each, only the top 20% of each group is kept (i.e. only four sets per species). The
remaining 80% (i.e. 16) can then be replaced by crossing over and mutating the four
existing members, or from the entire set of surviving members in the larger population.
By modifying the genetic algorithm this way to allow speciation, genes have the time to
develop to their full potential. Also, the diversity will lead to better genes that incorporate
the best of the different species. In contrast, without speciation, the population could
easily get stuck at a local optimum.
To speciate the population, NEAT defines a compatibility distance δ between two
genomes based on their genetic difference. This distance is computed as a linear
combination of three factors: the number of excess genes (E), the number of disjoint
genes (D), and the average weight difference of matching genes (W̄), as

    \delta = \frac{c_1 E + c_2 D}{N} + c_3 \bar{W},    (3.1)

where c1, c2, and c3 are coefficients determining the importance of each term, and N is a
normalization factor (usually the genome length of the larger parent, to normalize for
network size). Thus, genomes with many unshared genes (high E or D) or very different
connection weights (high W̄) will have a large distance δ, meaning they are less compatible.
NEAT assigns individuals to species by comparing this distance: if genome g is within a
threshold δ_t of some species' representative genome, it belongs to that species; otherwise,
a new species is created for g. The threshold δ_t is a parameter that NEAT can adapt to
target a desired number of species. Species thus group networks of similar topology (i.e.
those sharing common genes) together.
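Continuing the earlier sketches, the distance computation itself is only a few lines (the coefficient values below are arbitrary illustrative defaults):

    def compatibility_distance(genome1, genome2, c1=1.0, c2=1.0, c3=0.4):
        """Compatibility distance of equation 3.1."""
        matching, disjoint, excess = align_genes(genome1, genome2)
        N = max(len(genome1.connections), len(genome2.connections))
        W_bar = (sum(abs(g1.weight - g2.weight) for g1, g2 in matching) / len(matching)
                 if matching else 0.0)
        return (c1 * len(excess) + c2 * len(disjoint)) / N + c3 * W_bar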
Species membership is then used to enable explicit fitness sharing (Goldberg and
Richardson, 1987) as the reproduction mechanism. This approach ensures that organisms
within the same species share the fitness of their niche. Consequently, a species cannot
grow excessively large, even if many of its members perform well. This limitation prevents
any single species from dominating the entire population, which is essential for maintaining
speciated evolution. The adjusted fitness f'_i of an organism i is computed based on its
distance Δ from every other organism j in the population as

    f'_i = \frac{f_i}{\sum_{j=1}^{n} \mathrm{sh}(\Delta(i, j))},    (3.2)

where the sharing function sh is defined as sh(Δ(i, j)) = 1 if Δ(i, j) < Δ_t, and sh(Δ(i, j)) = 0
otherwise (Spears, 1995). Δ_t represents the distance threshold. Effectively, the sum in the
denominator corresponds to the number of organisms within the same species as
organism i, as species are pre-clustered based on compatibility using Δ_t. The number of
offspring allocated to each species is proportional to the sum of its member organisms'
adjusted fitness values f'_i.
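As a quick worked example with made-up numbers: suppose a species contains five organisms with raw fitnesses 10, 8, 6, 4, and 2. Since every pair within the species lies below the threshold, the sharing sum for each member is 5, so the adjusted fitnesses are 2.0, 1.6, 1.2, 0.8, and 0.4, and the species is allocated offspring in proportion to their sum of 6.0. A lone organism in its own newly created species divides by 1 and keeps its raw fitness, which is precisely what protects a fresh structural innovation from being immediately outcompeted by a large, well-tuned species.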
3.3.4 Example: Double Pole Balancing
Let's look at an example of NEAT applied to a simple toy problem to illustrate how it
works. In this task, called double pole balancing (figure 3.7a), two poles of different
lengths are attached to a movable cart via hinges. The neural network must control the cart
by applying horizontal forces to keep both poles balanced for as long as possible, without
the cart exceeding the boundaries of the track. Due to the differing lengths of the poles,
they respond differently to applied forces, introducing complex nonlinear interactions
that make the task challenging. The system's state is defined by the cart's position x
and velocity ẋ, the angle and angular velocity of the first pole (θ1, θ̇1), and the angle
and angular velocity of the second pole (θ2, θ̇2). Control is possible due to the differing
lengths (and therefore, masses) of the poles, which causes them to respond differently to
the same input forces.
Success on the task is defined as maintaining both poles within ±36 degrees of vertical for
100,000 time steps, equivalent to 30 minutes of simulated time. Fitness is measured by
the number of consecutive time steps during which both poles remain balanced. The task
can be made arbitrarily hard by making the poles more similar in length; when they are
the same, the task becomes unsolvable. In typical experiments, the shorter pole is 1/10th
of the length of the longer one.
When velocity information is included in the input in this manner, the task is fully
observable and Markovian, and not particularly hard: many learning methods can solve
it. The task can be made considerably more difficult by omitting the velocities: the
(a) A challenging pole-balancing task    (b) A compact solution by NEAT
Figure 3.7: A compact, explainable solution NEAT discovered for the pole-balancing problem.
(a) In this version, there are two poles on a moving cart that needs to be pushed left or right with
a constant force at regular intervals to keep the poles from falling and the cart within the left
and right boundaries of the 1-D track. (b) NEAT's solution uses the derivative of the pole angle
difference, with a recurrent connection enabling the hidden node to detect whether the poles are
converging or diverging, eliminating the need to compute individual pole velocities. Figure a from
Gomez, Schmidhuber, and Miikkulainen (2008).
controller is then required to estimate these missing state variables internally. That is, the
task is a partially observable Markov decision process (POMDP) and requires recurrent
or memory-capable network architectures. Traditional reinforcement learning methods
struggle with POMDPs in general, and the POMDP version of double pole balancing is
particularly challenging for them. It is challenging for neuroevolution as well; only the
advanced neuroevolution methods can solve it (Gomez, Schmidhuber, and Miikkulainen,
2008).
However, NEAT finds a particularly clever solution: taking the derivative of the
difference in pole angles (figure 3.7b). Using a recurrent connection to itself, the single
hidden node determines whether the poles are falling away from or towards each other. This
solution allows controlling the system without computing the velocities of each pole
separately. It would be difficult to design such a subtle and compact solution by hand, but
neuroevolution that complexifies makes its discovery more likely.
Through ablation studies, it is possible to determine whether each component of NEAT
is essential to its performance. For instance, one might question the importance of starting
from a minimal structure; perhaps the other features, such as speciation and historical
markings, are sufficient for NEAT to perform optimally. Conversely, it is also possible that
speciation contributes little, i.e. that protecting innovation is not critical. Lastly, NEAT is
specifically designed to support crossover, even when genomes differ in size; is it useful
for the genomes to grow over evolution, or would fixed-topology NEAT perform just as
well?
Table 3.1 summarizes the results of ablation experiments on NEAT. To allow the
ablated versions to succeed, double pole balancing with velocities was used as the
task. In each experiment, one of the components of NEAT was disabled to assess
its contribution to performance. First, removing growth from minimal structure led
to the most severe performance degradation, with only 20% of runs succeeding and
requiring over eight times more evaluations than full NEAT. This result suggests that
speciation and historical markings alone are not sufficient for guiding effective evolution
Table 3.1: Ablation study removing each component of NEAT in turn. All components are needed
to achieve the full power of NEAT in solving the MDP version of the double pole-balancing task.
Method Evaluations Failure Rate
No-Growth NEAT (Fixed-Topologies) 30,239 80%
Initial Random NEAT 23,033 5%
Nonspeciated NEAT 25,600 25%
Nonmating NEAT 5,557 0%
Full NEAT 3,600 0%
without incremental complexity. Starting with random initial topologies (1-10 hidden
nodes) also significantly slowed learning and modestly increased failure rates, indicating
that beginning with minimal structure is more conducive to effective exploration and
optimization. Second, disabling speciation caused the population to converge prematurely
on suboptimal structures, particularly when using random initialization. This ablation
resulted in a high variance and a 25% failure rate, emphasizing the importance of speciation
in preserving diversity and supporting structural innovation. Third, removing crossover
increased the number of evaluations by over 50%, though performance remained better
than in the other ablations. This result shows that while crossover is not as critical as
growth and speciation, it still contributes meaningfully to NEAT’s overall efficiency.
Thus, the ablation studies demonstrated that all three components (growth from minimal
structure, speciation, and crossover) are essential to NEAT's success. Performance
consistently suffers when any single element is removed, highlighting the importance of
their combined effect in enabling efficient and robust evolution.
To gain insight into how innovation emerges during evolution, it is essential to examine
the dynamics of speciation. Key questions include: How many species emerge throughout
a run? How frequently do new species appear or go extinct? These questions can be
addressed by visualizing the progression of speciation over time.
Figure 3.8 illustrates a representative run of the double pole balancing with velocities
task, which took 29 generations to solve. Generations are arranged vertically, with species
depicted horizontally. The width of each species reflects its size, and new species appear
on the right. Initially, all organisms belonged to a single species, persisting until the
fifth generation due to high compatibility. As new species emerged, the original species
declined and became extinct by the 21st generation. The second species also went extinct
in the 19th generation, unable to compete with more innovative species. A pivotal mutation
occurred in the 21st generation, enabling the second-oldest species to connect the long
pole angle sensor to a hidden node, boosting its fitness. Simultaneously, a younger species
developed a useful connection between the short-pole velocity and long-pole angle sensors.
By the 28th generation, this species made a key connection between the cart position and
its earlier mechanism for comparing pole velocity and angle, solving the task in one more
generation. In the final generation, the winning species, 11 generations old, comprised 38
neural networks out of 150, successfully concluding the run.
Many species that did not approach a solution still persisted throughout the run. This
result confirms visually that innovation is preserved. The winning species does not
Figure 3.8: Species progression in the double pole balancing task. White triangles indicate
extinct species, red good solutions (one stdev), and yellow best solutions (two stdev). A number of
species were created as evolution discovered novel structures. They expanded and shrank based
on how well they performed, but stayed around long enough so that the innovations in them had
a chance to be optimized. In this manner, speciation promotes both innovation and diversity,
resulting in better and more creative solutions. Figure from Stanley (2003).
dominate the entire population, ensuring that a diverse set of solutions is maintained. This
diversity is particularly valuable in applications where the optimal behavior evolves over
time. For example, it makes it possible for NEAT to keep complexifying its networks in a
coevolutionary arms race (section 7.2).
3.4 Scaling up Neuroevolution
While much of neuroevolution has focused on small, structured networks, it is possible
to scale it up to large networks as well. This section reviews the differences between evolved
networks and deep learning networks, suggests ways to scale up to deep networks, and shows how
to take advantage of modern computing to do so.
3.4.1 Neuroevolution vs. Deep Learning
Note that the networks that result from NEAT, and neuroevolution in general, are very
different from those commonly used in deep learning. Neuroevolution networks are aimed
at AI-based decision-making, rather than prediction based on big data. The computational
requirements are different, and therefore the networks are also different.
However, even in domains where deep learning can be applied, neuroevolution provides
a potentially useful alternative. Performance with deep learning networks is based on
overparameterization where individual components perform only minimal operations: for
instance, the residual module in ResNet architectures combines bypassing the module with
the transformation that the module itself computes (K. He, X. Zhang, Ren, et al., 2016).
In contrast, in NEAT every complexification is there for a purpose that can in principle
be identified in the evolutionary history. It thus offers an alternative solution, one that is
based on principled neural network design.
Such compact evolved neural networks can be useful in several ways: First,
they can provide an explainable neural network solution. When neural networks are
trained with gradient descent, information in their embeddings becomes highly distributed,
making it difficult to interpret (Hinton, McClelland, and Rumelhart, 1986; Kumar, Clune,
Lehman, et al., 2025; Miikkulainen and M. G. Dyer, 1991). In contrast, while a NEAT
network still performs based on recurrency and embeddings, its elements are constructed
to provide a particular functionality, and therefore its behavior is transparent. One such
example was discussed in section 3.3.4, where NEAT discovered a particularly innovative
solution to the pole-balancing problem. The network computes the derivative of the
difference of the pole angles, which makes it possible to control the system with a very
small network (figure 3.7). Several other examples of such insights are reviewed in
sections 7.2.1 and 14.1.
Second, they can provide regularized neural network solutions, instead of overfitting
to the dataset. The networks are compact, which generally leads to better regularization
(Ganon, Keinan, and Ruppin, 2003; Oymak, 2018; Reed, 1993), and they are chosen
based on their overall performance instead of fine-tuned to fit individual examples. This
property should be particularly useful when the datasets are relatively small, which is the
case in many practical applications. Thus, they can extend the scope of machine learning.
Third, they can utilize minimal hardware resources well. The advantages of deep-
learning networks do not emerge until a very large number of parameters is reached. If the hardware
does not allow that scale (as is the case e.g. with many edge devices), evolved networks
provide an alternative principle that can be optimized to the given resources.
Fourth, they can be constructed to fit hardware constraints. Gradient descent in
principle requires high-precision weights and differentiable activation functions that are
expensive to implement in hardware. In contrast, evolution can be used to optimize
the performance of networks with e.g. quantized weights, linear threshold units, or
FPGA-compatible components that are easier to implement (Gaier and Ha, 2019; Z. Liu,
X. Zhang, S. Wang, et al., 2021; Shayani, Bentley, and Tyrrell, 2008; Whitley, 2024a).
Optimization of neural networks for neuromorphic hardware is a promising emerging area
discussed in more detail in section 11.5.
Fifth, neuroevolution allows us to observe and study fundamentally different forms of
internal representation that emerge through open-ended evolutionary processes, rather
than via backpropagation. NEAT in particular and TWEANN methods in general can
serve as a gateway to understanding how representations might form when networks
are allowed to grow in complexity organically, rather than being sculpted all at once
by gradient descent on a fixed architecture. For example, recent work (Kumar, Clune,
Lehman, et al.,
2025) demonstrated that where SGD tends to entrench fractured and
entangled representations, especially when optimizing toward a single objective, NEAT
offers a contrasting developmental dynamic. By starting with minimal structures and
expanding incrementally, NEAT encourages the emergence of modular, reusable, and
semantically aligned representations. Neuroevolution gives us a rare opportunity to study
representations not just as a byproduct of loss minimization, but as artifacts of open-ended
exploration and accumulated structural regularities. Without NEAT, or an equivalent
evolutionary or developmental approach, we would be limited to analyzing representations
formed in the constrained regime of SGD-trained deep networks.
3.4.2 Deep Neuroevolution
While neuroevolution methods such as NEAT shine in producing compact solutions, a
new direction has emerged in applying evolutionary algorithms to larger neural networks
as well. This recent direction, referred to as deep neuroevolution, shifts the focus from
evolving neural architectures to optimizing the parameters of large, fixed-topology networks
directly. This work emphasizes scalability, simplicity, and the surprising competitiveness
of evolutionary algorithms in training deep networks for complex tasks. Two particularly
influential contributions to this resurgence are the works of Salimans, Ho, X. Chen, et al.
(2017) and Petroski Such, Madhavan, Conti, et al. (2017). Both studies demonstrated that
even simple evolutionary algorithms, when paired with modern compute resources, can
scale effectively to high-dimensional deep networks and match, or even exceed, the
performance of conventional reinforcement learning algorithms.
Salimans, Ho, X. Chen, et al. (2017) followed a fixed-topology/direct encoding setup
similar to the one in the case study in section 3.2. However, instead of CMA-ES, they
used the OpenAI ES approach (section 2.2.4) to evolve neural networks with thousands of
parallel workers. In this approach, neural networks for complex continuous control tasks
like 3D humanoid walking could be found in just 10 minutes, and competitive results
on Atari games could be achieved within an hour. This work highlighted some of the
advantages of ES over deep RL methods, such as greater robustness to noisy and sparse
rewards and smoother learning curves. The experiments further demonstrated that the
slightly lower data efficiency of ES versus RL can be mitigated by the lower compute
requirements, resulting from not having to perform backpropagation and not needing a
value function.
Around the same time, Petroski Such, Madhavan, Conti, et al. (2017) used a simple
genetic algorithm for training fixed-topology deep convolutional networks, particularly
targeting the Atari 2600 suite of environments. Their approach did not include crossover
or complex encoding schemes. Instead, it relied purely on selection and mutation, where
each individual in the population represented a full set of neural network weights encoded
directly as real-valued vectors. This approach used truncation selection, where the top T
individuals become the parents for the next generation, and elitism, where the best
individual was copied unmutated to the next generation. Because the Atari environments
are noisy, each of the top 10 individuals was evaluated on 30 additional episodes to get
a better estimate of their true performance. To produce offspring, a parent was selected
uniformly at random and its parameter vector θ mutated by applying additive Gaussian
noise as

    \theta' = \theta + \sigma\epsilon, \quad \text{where } \epsilon \sim N(0, I).    (3.3)
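A compact sketch of this kind of GA is shown below; the population size, truncation size, mutation scale, and number of generations are illustrative, and the extra re-evaluation of elite individuals on additional episodes is omitted:

    import numpy as np

    def simple_ga(evaluate, n_params, pop_size=100, top_t=20, sigma=0.005, generations=100):
        """Truncation-selection GA with elitism and additive Gaussian mutation."""
        rng = np.random.default_rng()
        population = [rng.standard_normal(n_params) for _ in range(pop_size)]
        best = population[0]
        for _ in range(generations):
            fitness = np.array([evaluate(theta) for theta in population])
            order = np.argsort(fitness)[::-1]          # best first
            elites = [population[i] for i in order[:top_t]]
            best = elites[0]
            next_pop = [best.copy()]                   # elitism: keep the best unmutated
            while len(next_pop) < pop_size:
                parent = elites[rng.integers(top_t)]   # parent chosen uniformly from the top T
                next_pop.append(parent + sigma * rng.standard_normal(n_params))
            population = next_pop
        return best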
Despite its simplicity, this approach was able to train networks with over four million
parameters to play Atari games from pixels alone. Their performance was competitive
with RL algorithms, with each method doing better on some games and worse in others.
Among the 13 games tested, DQN, ES, and the GA each achieved the highest score on three
games, while the RL method A3C achieved the top score on four games. Notably, in the
game of Skiing, the GA achieved a score higher than any previously reported at the time,
surpassing a variety of different DQN variants. In some games, the GA's performance
exceeded that of DQN, A3C, and ES significantly, particularly in Frostbite, Venture, and
Skiing. When allowed to run six times longer (6B frames), scores improved across all
games. With these post-6B-frame scores, the GA outperformed A3C, ES, and DQN in
head-to-head comparisons on seven, eight, and seven out of the 13 games, respectively. A
summary of the results across many Atari games can be seen in table 3.2.
However, while a GA can efficiently find policies for many Atari games, it can struggle
in other domains. For example, a GA took around 15 times longer than ES and still
performed slightly worse when optimizing a neural network for humanoid locomotion.
The reason for this difference may be that an ES algorithm has an easier time making
precise weight updates than a GA, which could be critical for the intricate movements
necessary for humanoid locomotion. Further research is needed to elucidate this issue in
more depth.
Surprisingly, even a random search variation, which only evaluates randomly generated
policies, can perform well. While it does not outperform the GA on any of the games tested,
which suggests that the GA is effectively optimizing over generations, it outperforms
DQN on three games, ES on three, and A3C on six. These results suggest that sometimes
following the gradient (as is done in gradient-based optimization algorithms) can actually
be detrimental to performance, and it can be more efficient to do a dense search in some
local neighborhood of parameters.
3.4.3 Taking Advantage of Big Compute
One important difference of neuroevolution vs. traditional RL is that neuroevolution is
inherently parallelizable. Instead of improving a single individual solution, an entire
population is evolved at once. The population can be very large and distributed over a
large number of compute nodes, leading to discoveries that would otherwise be difficult
to obtain. As will be discussed in the epilogue (chapter 15), such experiments are yet to
be runÐand they may require different kinds of evolutionary methods, including those
designed to take advantage of neutral mutations, weak selection, large populations, and
deep time (as will be discussed in more detail in section 9.1.1).
Another promising direction is to take advantage of GPUs/TPUs. Many deep learning
algorithms, such as deep reinforcement learning, have benefited greatly from rapid training
of neural networks on hardware accelerators, and thus shorter iteration times. Previously,
Table 3.2: Scores of ES and GA neuroevolution approaches on the Atari benchmark compared
to RL. Different methods perform best in different games (higher values are better). Neuroevolution
can thus be extended even to very large networks, where they are competitive with modern RL
techniques, and potentially offer advantages through large-scale parallelization. Interestingly, even
a random search variant (RS) can find policies that outperform policies found by DQN, A3C, and
ES for some games. Table adapted from Petroski Such, Madhavan, Conti, et al. (2017).
DQN ES A3C RS GA GA
Frames 200M 1B 1B 1B 1B 6B
Forw. Passes 450M 250M 250M 250M 250M 1.5B
Backw. Pass. 400M 0 250M 0 0 0
Operations 1.25B U 250M U 1B U 250M U 250M U 1.5B U
amidar 978 112 264 143 263 377
assault 4,280 1,674 5,475 649 714 814
asterix 4,359 1,440 22,140 1,197 1,850 2,255
asteroids 1,365 1,562 4,475 1,307 1,661 2,700
atlantis 279,987 1,267,410 911,091 26,371 76,273 129,167
enduro 729 95 -82 36 60 80
frostbite 797 370 191 1,164 4,536 6,220
gravitar 473 805 304 431 476 764
kangaroo 7,259 11,200 94 1,099 3,790 11,254
seaquest 5,861 1,390 2,355 503 798 850
skiing -13,062 -15,443 -10,911 -7,679 -6,502 -5,541
venture 163 760 23 488 969 1,422
zaxxon 5,363 6,380 24,622 2,538 6,180 7,864
these advances have been tailored to algorithms based on gradient descent, but the NE
community has been developing its own frameworks, constantly narrowing this gap.
While NE algorithms have mostly relied on CPU parallelism in the past, the aforementioned
work by Petroski Such, Madhavan, Conti, et al. (2017) (section 3.4.2) was also an
early demonstration of the power of an NE approach that capitalizes on GPU acceleration.
Even using only a single GPU, training can be significantly sped up. Since then, more
work has been done to further take advantage of distributed hardware-accelerated setups
and the massive throughput provided by GPUs/TPUs. While distributing training across
multiple CPUs can already give a substantial speedup, another level of training speed and
network size can be reached by taking advantage of hardware acceleration.
Deep learning methods in general, and RL methods in particular, have long been able
to take advantage of training across a large number of TPUs and GPUs. In recent years,
the advent of high-performance computing frameworks like JAX has also finally enabled
such efficient hardware acceleration for evolutionary algorithms. Two notable libraries
that leverage JAX for evolutionary computation are EvoJAX (Tang, Tian, and Ha, 2022)
and EvoSAX (Lange, 2023). For example, one of the important features of EvoJAX is its
use of JIT compilation to optimize the evaluation of the fitness function. This technique
ensures that the computationally intensive parts of the algorithm are executed as efficiently
as possible. Additionally, EvoJAX supports vectorized operations, allowing simultaneous
evaluation of multiple individuals, further enhancing performance.
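A minimal sketch of this vectorized pattern in JAX (the toy fitness function below is a stand-in; a real rollout would itself have to be expressed in JAX-traceable form to benefit from jit and vmap):

    import jax
    import jax.numpy as jnp

    def fitness(params):
        """Toy stand-in fitness: negative squared norm, maximized at the origin."""
        return -jnp.sum(params ** 2)

    # vmap evaluates the whole population in one batched call; jit compiles it.
    evaluate_population = jax.jit(jax.vmap(fitness))

    population = jax.random.normal(jax.random.PRNGKey(0), (256, 1000))  # 256 genomes of 1000 params
    scores = evaluate_population(population)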
This modernization mirrors a broader trend in neuroevolution: the reimplementation
of classical ideas using modern deep learning programming stacks, unlocking performance
that was previously unattainable. This work includes modern versions of NEAT, such
as TensorNEAT (L. Wang, M. Zhao, E. Liu, et al., 2024), which take advantage of
JAX and can reach speedups of up to 500 times compared to other existing non-JAX
implementations. TensorNEAT serves as a proof-of-concept that classic NE algorithms
like NEAT can thrive in the era of hardware acceleration and modern ML tooling. It
opens the door to applying topology-evolving methods to more complex tasks than have
heretofore been possible.
Note that TPUs and GPUs were designed to run deep learning architectures well,
and they may not be as great a fit for neuroevolution. Chapter 11 reviews neuromorphic
approaches, where spiking neural networks are evolved for hardware implementation,
resulting in energy-efficient implementations. Field-programmable gate arrays (FPGAs)
are another promising direction, for continuous-time recurrent neural networks (CTRNNs)
in particular (Whitley, 2024a; Whitley, 2024b). An FPGA can be configured in less than a
millisecond to implement a particular neural network architecture, making it possible to
evaluate network candidates rapidly, for instance 20-28% faster than on an ARM processor.
It is thus possible to take advantage of special hardware and modern compute stacks
to scale up the neuroevolution process, both in terms of speed and in terms of network
size. The next chapter will take a look at more methodological ways to scale up, taking
advantage of indirect encodings. It is also possible to combine deep learning synergistically
with evolution (and methods such as NEAT), which is a topic of chapters 10 and 11. An
interesting synergy is also emerging with RL and generative AI, as will be discussed in
chapters 12 and 13. These are all recent and emerging extensions of neuroevolution. The
unique core of it, however, is still evolving intelligent behavior and decision-making, as
will be discussed in chapters 6 through 9.
3.5 Chapter Review Questions
1. Evolutionary Algorithms: What advantages do evolutionary algorithms (EAs)
offer over traditional reinforcement learning (RL) when solving tasks where only
the final outcome is known, rather than intermediate rewards?

2. Key Mechanism: Describe how an EA can be applied to train a neural network
to solve a reinforcement learning task. Include the role of the fitness function and
population-based search.

3. Deterministic vs. Stochastic Policies: What is the difference between deterministic
and stochastic policies in neuroevolution? Why might a stochastic policy be
beneficial for certain tasks?

4. Robust Policies: In the context of the BipedalWalkerHardcore example, how does
evaluating an agent over multiple trials improve the robustness of the policy? What
tradeoffs does this introduce?

5. Evolutionary Optimization: Explain how neuroevolution can evolve both the
weights and the architecture of a neural network. Why is evolving the architecture a
significant step beyond evolving weights alone?

6. NEAT: What are the main components of the NEAT algorithm? Describe how
mutation, crossover, and speciation contribute to its effectiveness.

7. Neuroevolution vs. Deep Learning: In what scenarios might neuroevolution
outperform deep learning? Highlight at least two scenarios where neuroevolution
offers unique benefits.

8. Explainability and Compactness: Why might solutions discovered through neuroevolution,
such as NEAT's compact pole-balancing solution, be more explainable
than those generated by deep learning?

9. Emerging Synergies: How can neuroevolution complement other AI approaches,
such as large neural networks, neuromorphic hardware, or generative AI models?
Provide an example of one such synergy.

10. Scaling Up: How does leveraging modern hardware acceleration (e.g. GPUs, TPUs)
improve the scalability of neuroevolution, and what are some notable examples of
frameworks that enable this acceleration?
Chapter 4
Indirect Encodings
When neural networks are encoded directly, the elements in the genetic representation
correspond one-to-one to elements in the neural network. Indirect encodings, on the other
hand, utilize a mechanism that allows expanding a compact genetic encoding into much
larger and more complex neural networks. Several such approaches are reviewed in this
chapter. The first three represent different levels of abstraction of indirect encoding in
biology, i.e. development through cellular growth, grammatical encoding, and learning.
Next, indirect encoding through hypernetworks is reviewed, where one network indirectly
encodes the design of another. Finally, dynamic indirect encodings through the self-attention mechanism are reviewed.
4.1 Why Indirect Encodings?
Biological organisms in nature all develop from a single starting cell. Through local
cell interactions and growth over time, an initially unassuming mass of cells eventually
transforms into a complex and sophisticated structure with specialized cells and intricate
connections. This process of growth and development, known as morphogenesis, is a
fundamental aspect of biology that underlies the formation of all living organisms. In the
case of the human brain, this process is particularly remarkable, as it gives rise to the
most complex and sophisticated structure known to science, with billions of neurons and
trillions of connections.
The human brain exhibits a complex network of interconnected modules, which form
the basis of intelligence. How this intricate structure is encoded in our genetic code,
consisting of approximately 24,000 genes or 3 billion base pairs (International Human
Genome Sequencing Consortium, 2004), is a fascinating question that we are still struggling
to completely answer. Although learning plays a crucial role, much of this information is
already encoded in the genome.
To achieve this remarkable feat, regularity is necessary, which involves reusing
structural motifs to enable compression and compactness of the genome. Interestingly,
regularity also provides computational advantages to neural structures, as seen in the
success of convolution in deep learning. Convolution, a pattern of connectivity that uses the
same feature detector at multiple locations in a layer, has proven to be a powerful solution for
capturing translation-invariant features in deep learning architectures. Instead of designing
such patterns and others by hand and ultimately being limited by a human designer,
ideally, our neuroevolutionary algorithms would identify these powerful regularities in an
automated way. This is the idea behind indirect encodings in neuroevolution.
Before we go into more detail about indirect encodings, let's revisit the NEAT
algorithm from the previous chapter. As we discussed, NEAT is an example of a direct
encoding. There is no compression involved or any type of reuse of information, resulting
in a one-to-one mapping between the parameters of a NEAT genotype (the description of
the nodes that exist in the network and how they are connected to each other) and those of
the neural network phenotype. In other words, for every connection in the neural network,
there exists a parameter in the underlying genotype. As we have seen, NEAT works well
for many problems but because it is a direct encoding it has the drawback that every
subpart of the solution needs to be reinvented separately by evolution instead of allowing
the genome to reuse it. It is therefore not surprising that NEAT has mostly been used for
tasks requiring compact neural networks, with orders of magnitude fewer parameters than
those used in current reinforcement learning approaches.
Let's look at an example of what this means for a particular problem. Imagine you
want to evolve a controller for a quadrupedal robot. This task likely would benefit from an
approach that takes into account the underlying task patterns and symmetries; in other
words, knowing how to control one leg is likely helpful in controlling the rest. A tried
and tested approach for resolving such a problem using an evolutionary algorithm is to
assist it in recognizing patterns and symmetries. This method involves manually breaking
down the problem into smaller components, such as designing the controller for one leg
of a quadruped and then duplicating it for each leg, with slight variations in phase. By
doing this, the algorithm is encouraged to adopt a modular approach and employ a single
encoding for multiple modules. However, it would be ideal if the algorithm were able to
take advantage of the symmetry and regularities of the tasks automatically, without an
engineer having to decompose the problem manually. While it is easy to see how the
problem could be decomposed into sub-solutions for a quadrupedal walker, it is not always
as straightforward. The idea behind indirect encodings is to address this issue through
representations that have the ability to capture and express regularities such as symmetries
and repetition in the phenotypic structures automatically.
Indirect encodings draw inspiration from the compression of DNA in natural systems
and have a long research history stretching back several decades, including early experi-
ments in pattern formation. Researchers have explored the use of evolvable encodings
for a diverse range of structures ranging from simple blobs of artificial cells to complex
robot morphologies and neural networks (Bongard and Pfeifer, 2001; Doursat, Sayama,
and Michel, 2013; Gruau, 1994; Hornby and Pollack, 2002; J. F. Miller and Turner, 2015;
Stanley and Miikkulainen, 2003).
In evolutionary computation, the process of how the genotype is translated into the
phenotype, which entails all the observable characteristics of an organism, is usually
called the genotype-to-phenotype mapping. In nature this mapping is achieved through
the process of development. Thus, one way to take advantage of indirect encodings is
to mimic development in biology (Miikkulainen and Forrest, 2021). There are three
main approaches: modeling cellular growth processes, abstracting development into a
grammatical rewrite system, and combining evolution synergistically with learning. These
are the topics discussed in the next section.
The two sections after that review fundamentally different mechanisms of indirect
encoding. The first one is hypernetworks, in which one neural network encodes the weights
of another neural network. While developmental systems are suitable for modeling natural
structures and self-similar patterns, neural networks give us more flexibility in generating
diverse and rich patterns. They can not only capture regularities such as symmetry and
repetition but also more complex patterns such as repetition with variation. Next, we
look at how hypernetworks can be extended to serve as dynamic encodings, in which the
generated weight pattern can be made input dependent. This type of dynamic indirect
encoding is closely related to the idea of self-attention. How they can be the basis for an
indirect encoding is the focus of the last section in this chapter.
4.2 Developmental Processes
As discussed in section 14.4, development is a fundamental way in biology to construct
complex solutions. Instead of specifying the final solution directly, evolution specifies
a developmental process, i.e. the initial structure and a mechanism for building a full
solution through intrinsic growth or through interactive adaptation to the environment.
Such mechanisms can be harnessed in artificial systems as well. Emulating biology,
many different developmental mechanisms can be used to establish artificial embryogeny
(Stanley and Miikkulainen, 2003), i.e. a biologically inspired way to take advantage of
indirect encodings. One way is to emulate cell-chemistry mechanisms such as cellular
growth and genetic regulation. Another is to abstract development into grammatical
rewrite steps. A third is to take advantage of learning, either individually or through
population culture. These ideas will be reviewed in the subsections below.
4.2.1 Cell-Chemistry Approaches
Understanding the fundamental characteristics of natural patterns has been an important
motivation for developmental systems. In seminal work in 1952, Alan Turing proposed
a system based on diffusing chemicals, successfully simulating patterns reminiscent of
those found on seashells, feathers in birds, and fur in mammals (Turing, 1952). At the
other end of the spectrum, Aristid Lindenmayer in 1968 proposed high-level grammatical
abstractions called L-systems, demonstrating that they can produce lifelike plant structures
(Lindenmayer, 1968a; Lindenmayer, 1968b).
Initially, both Turing and Lindenmayer drew inspiration from the patterns observed
in nature, prior to their endeavors to describe the mechanisms behind these patterns.
They took opposite perspectives on development: Turing’s cell-chemistry is a bottom-up
approach whereas Lindenmayer's grammatical systems are top-down. Interestingly, neither
one of those was designed to be evolved, nor were they intended specifically to explain
how neural networks are constructed. However, both serve as biological motivation for
neuroevolution that takes advantage of indirect encoding through development. This
section focuses on approaches based on cell chemistry; the next section focuses on
grammatical approaches.
Cell-chemistry approaches aim to capture and utilize some of the fundamental physical
mechanisms underlying development. Turing’s reaction-diffusion model is a foundation for
many of them. It consists of differential equations that describe how chemical substances,
or morphogens, propagate and change over time through diffusion through a medium and
reaction with each other. Initially the morphogens are randomly distributed, and their
concentration vector 𝐶 at each location changes over time as
𝜕𝐶/𝜕𝑡 = 𝐹(𝐶) + D∇²𝐶,     (4.1)

where the diagonal matrix D represents how fast each morphogen diffuses through the
medium, and the function 𝐹 describes how the morphogens react to each other. The
process characterized by this equation takes place at all locations and time steps in parallel,
resulting in a dynamic system of morphogen concentrations. Over time, it can result in
significant patterns such as those on seashells, feathers in birds, and fur in mammals.
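To make the dynamics of equation (4.1) concrete, the following sketch (a minimal illustration, not taken from the original sources) integrates a two-morphogen reaction-diffusion system on a one-dimensional grid with explicit Euler steps. The reaction function F and all parameter values are arbitrary example choices.

import numpy as np

def laplacian(C):
    # Discrete Laplacian with periodic boundaries (one row per morphogen).
    return np.roll(C, 1, axis=1) - 2 * C + np.roll(C, -1, axis=1)

def F(C):
    # Example reaction term (Gray-Scott-like); any reaction function could be substituted.
    a, b = C
    feed, kill = 0.04, 0.06
    r = a * b * b
    return np.stack([-r + feed * (1 - a), r - (feed + kill) * b])

D = np.array([[0.16], [0.08]])   # diagonal diffusion rates, one per morphogen
C = np.random.rand(2, 200)       # random initial concentrations on a 1D grid
dt = 1.0
for _ in range(5000):            # dC/dt = F(C) + D * Laplacian(C), Euler step
    C += dt * (F(C) + D * laplacian(C))

After enough steps, the concentrations settle into spatial patterns of the kind described above; the same update applies unchanged to a 2D grid.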
The model can be applied to the development of neural networks as well (Nolfi and
Parisi, 1992). Diffusion represents axonal growth, and reactions are interactions between
axons and cell bodies, i.e. the forming of active connections. To evolve networks, the
genome of each network consists of its neuron definitions, i.e. the location of each cell body
and parameters that define how axons will branch out of it. There is exuberant growth
with pruning to remove connections that are not useful. In this manner, reaction-diffusion
implements a developmental mechanism that allows coding network structures indirectly.
It is an abstract analogy, however, i.e. not intended to model the actual underlying chemical
processes.
Approaches based on genetic regulatory networks (GRNs), in contrast, aim at building
on such chemical processes. As mentioned in the introduction to this chapter, the number
of genes in, e.g., the human genome is relatively small. Much of the complexity lies in the
mechanisms that construct an individual based on those genes (GRNs; Cussat-Blanc,
Harrington, and Banzhaf, 2019; Y. Wang, 2013). In particular, the genes interact: Many
genes participate in encoding a particular trait through a complex network of interactions.
Through chemical reactions and diffusion, the networks may enhance or suppress the
effect of individual genes, generating variation and robustness in gene expression. In
this manner, instead of coding everything directly into genes, evolution also encodes an
interaction mechanism that results in an indirect and potentially more powerful encoding.
Interestingly, this mechanism is entirely missing from standard evolutionary algorithms!
GRNs can be implemented as differential equations or abstracted into computationally
more efficient implementations, such as Boolean functions (Dellaert and Beer, 1994).
Such functions, called operons, describe the interactions at a high level, for instance

𝐴 ∧ ¬𝐵 → 𝐶;   𝐴 ∧ 𝐶 → 𝐵,

which states that if protein 𝐴 is in the cell and 𝐵 is not, then 𝐶 is produced, and if 𝐴 and 𝐶 are both in the cell, 𝐵 is produced. Thus, starting from 𝐴, this process produces 𝐶, then 𝐵, and stops. Such systems of rules or equations can be encoded as genomes and then
evolved towards a given target, such as the production of a certain protein.
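As a minimal illustration of such a Boolean abstraction (a sketch, not the implementation used by Dellaert and Beer), the two rules above can be simulated directly: each operon is a condition on the current set of proteins plus a product, and the cell state is updated in parallel until it stops changing.

# Each operon: (required proteins, forbidden proteins, product).
# These encode the example rules  A and not B -> C;  A and C -> B.
operons = [
    ({"A"}, {"B"}, "C"),
    ({"A", "C"}, set(), "B"),
]

def step(proteins):
    # Apply all operons in parallel to the current protein set.
    produced = {prod for req, forb, prod in operons
                if req <= proteins and not (forb & proteins)}
    return proteins | produced

state = {"A"}                  # start with protein A only
while True:
    new_state = step(state)
    if new_state == state:     # fixed point: the process stops
        break
    state = new_state
print(state)                   # {'A', 'C', 'B'}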
Importantly, GRN processes can be scaled up to represent growing neural networks.
Some of the proteins may represent receptors, and others axonal growth. The proteins have
to match in order for the connection to be made. In this manner, chemistry-guided axonal
growth like that observed in the brain can be modeled and utilized in neuroevolution. The
approach is potentially powerful; however, it is difficult to take advantage of it. It may need
to be simplified further by representing the genome as a string. It can then be evolved to
e.g. construct a neural network that controls a simulated robot to move around without
hitting obstacles. Or, GRNs may be abstracted into a more general representation of analog
genetic encoding, which then allows for complexification and decomplexification of the
network as needed in the evolutionary process (Mattiussi and Floreano, 2007). Other
implementations exist as well (Iba and Noman, 2016). A particularly ambitious example
will be discussed in section 9.1.3, where GRNs are used to construct a system with high
evolvability, as a potential ingredient in open-ended evolution.
In general, much work remains in taking advantage of indirect encodings through
development. A closer look at biological development reveals that between grammatical
and cell-chemistry approaches, there are many dimensions that could be modeled and
utilized (Stanley and Miikkulainen, 2003). There are mechanisms for (1) cell fate, i.e.
what role each cell develops to take on in the organism; (2) targeting, i.e. how connections
find their appropriate end locations; (3) heterochrony, i.e. how timing and ordering of
developmental phases affects the end result; (4) canalization, i.e. how some changes
become robust and tolerant to mutations; and (5) complexification, i.e. how new genes are
added to the genome, increasing the complexity of the phenotype. NEAT, of course, takes
advantage of complexification, and GRNs utilize targeting, but the other dimensions and
their combinations are largely unexplored.
Thus, much can still be learned from biology and harnessed in neuroevolution. Such
work can also help understand biology better, as will be discussed from several perspectives
in chapter 14.
4.2.2 Grammatical Encodings
In contrast with the cell-chemistry approaches, Lindenmayer’s L-systems are high-level
abstractions of development. They are grammatical rewrite systems; each rewrite step can
be seen as a step in development. As mentioned above, they were originally developed to
explain patterns seen in plants, and indeed they can produce some very interesting such
designs. For instance, the company SpeedTree has created tools that can produce realistic
virtual foliage, which has been used in many video games and movies such as Iron Man 3 or
Avatar. In L-systems, rewrite rules are applied concurrently to all characters within a
string, similar to how cell divisions occur simultaneously in multicellular organisms. By
iteratively replacing sections of a basic object according to a predefined set of rewriting
rules, intricate structures can be generated. Figure 4.1𝑎 shows an example of such a process.
While the grammatical rules leading to certain structures are traditionally designed by
hand, such as in Lindenmayer’s original system, they can also be optimized through an
evolutionary search method (Ochoa, 1998).
(𝑎) L-system Rewriting (𝑏)
Figure 4.1: L-systems. (𝑎) L-systems can grow plant-like structures by repeatedly applying rewrite rules to an initial starting character. (𝑏) With the addition of some stochasticity, the approach is
able to generate realistic trees. Figure (𝑎) from Prusinkiewicz, Hammel, Hanan, et al. (1996).
(𝑎) (𝑏) (𝑐) (𝑑)
Figure 4.2: Tables grown by evolved L-systems. Shown are tables evolved with a direct (𝑎, 𝑏) and indirect encoding (𝑐, 𝑑). In contrast to the directly encoded tables, the indirectly encoded ones
display key biological regularities such as repetition and symmetry. Figures from Hornby and
Pollack (2001b).
In an impressive demonstration of their versatility, and going beyond the lifelike plant
structures they were initially designed for, Hornby and Pollack (2001b) applied an L-system
approach to the optimization of table designs. Here, one can optimize L-system rules
that grow designs that have a specific height, surface structure, and stability. Compared
to a direct encoding approach, in which discovered components could not be reused, the
indirect L-system encoding produced better results faster, and those designs were more
aesthetically pleasing (figure 4.2). At first glance, they could be mistaken for IKEA
furniture. The designs produced by the direct encoding approach, in contrast, lack
regularities and look more piecemeal.
By identifying the shared properties among natural patterns, it becomes evident which
aspects artificial systems should account for. One of the fundamental characteristics
observed in biological organisms is the presence of repetition. This hallmark trait manifests
in multiple instances of the same substructures found throughout an organism’s body.
From the tiniest cells to complex neural networks in the brain, these recurring motifs play
a crucial role in shaping the organism’s structure and function. This repetitive nature in
the outward appearance of an organism is also referred to as self-similarity. Furthermore,
this repetition is not always exact but often occurs with subtle variations. For example,
within the vertebral column, each vertebra shares a similarity in structure but exhibits
distinct proportions and morphologies. Similarly, human fingers follow a regular pattern,
yet they display individual differences, making each finger on the same hand unique.
This phenomenon of repetition with variation is pervasive throughout all of natural life.
A prevalent form of repetition in biological organisms is through symmetry. Bilateral
symmetry, a classic example, occurs when the left and right sides of an organism’s body
are mirror images of each other. This symmetrical arrangement is commonly observed in
various living beings. While overall symmetry is noticeable in many biological structures,
true perfection is rare. Imperfect symmetry is a common feature of repetition with
variation. The human body, for instance, exhibits an overall symmetric layout, yet it is
not entirely equivalent on both sides. Some organs are exclusive to one side of the body,
and the dominance of one hand over the other is a typical example of this asymmetry. In
conclusion, the occurrence of repetition and its variations, along with different forms of
symmetry, play a fundamental role in shaping the intricate structures and patterns found
in biological organisms. Understanding these principles is essential for unraveling the
complexities of life and the underlying mechanisms that govern the diversity of living
forms.
Throughout many generations, the regularities observed in biological organisms often
undergo elaboration and further exploitation. An illustrative example of this process
is evident in the evolution of early fish, where the bilaterally symmetric fins gradually
transformed into the arms and hands of mammals, while still retaining some of the original
regularities. Preservation of established regularities is a remarkable aspect of biological
evolution. Over generations, these regularities are typically strictly maintained. For
instance, bilateral symmetry rarely gives rise to three-way symmetry, and animals with
four limbs rarely produce offspring with a different number of limbs, even though the limb
design itself may undergo elaboration and modification.
By using this list of regularities and their evolutionary patterns, researchers can analyze
phenotypes and lineages resulting from artificial encodings, comparing them to natural
characteristics. This analysis provides valuable insights into whether a particular encoding
accurately captures the essential properties and capabilities observed in the process of
natural development.
The grammatical approach can be applied to neuroevolution as well. In cellular
encoding (CE; Gruau and Whitley, 1993; Gruau, Whitley, and Pyeatt, 1996), a grammar
describes how the neural network should be constructed step by step. The process starts
with an ancestor cell connected directly to input and output ("cell" here refers to a node in the neural network being constructed; figure 4.3𝑎). Each cell has a pointer to the grammar,
which is represented as a tree. Each node in the grammar tree contains an instruction that
specifies how the neural network should be modified. After each such step is completed,
the pointer traverses to the child of the node, until a node with the "end" instruction is
reached.
For example, in figure 4.3, the first step is a sequential division. The top cell is then
divided in parallel, and the bottom node is divided sequentially again. The top node of
that division is divided in parallel, and the connection to the bottom node is negated. As
the last step, one is added to the threshold of the first node resulting from the last parallel
(𝑎) Initial network (𝑏) Final XOR network
Figure 4.3: Cellular encoding approach to evolving neural network structure. (𝑎) The grammar encodes instructions on how to construct the network step by step, starting from a network that consists of a single ancestor cell. Each cell points to the current location in the
grammar tree, and is advanced to a child node in the tree as the instruction is executed. S=sequential
division, P=parallel division, - = negating a connection, A=adding one to a node threshold, E=end
the construction branch. In addition, a recurrency symbol R (not shown) allows continuing the
construction again from the top of the grammar, with a counter deciding how many times the
recurrency can be traversed. (𝑏) After eight steps, the network that results from this construction
process implements XOR. With recurrency added to the bottom right of the grammar, it can be
extended by repeating the entire structure, thus implementing networks that calculate the parity of
any number of inputs. The grammar trees can be evolved with genetic programming techniques,
making automated discovery of complex networks with repeating structure possible. Figures from
Gruau and Whitley (1993).
division. As a result of this construction process, a neural network that implements XOR
is created (figure 4.3 𝑏).
An important extension to this simple example is the ability to include recurrency in
the grammar. For example, if a recurrency is added to the leftmost end node, the entire
network structure is constructed again at that location from the top of the grammar. Its
output becomes the first input of the first network, thus including one more input to the
combined network. A counter can then be used to specify that the recurrency should be
traversed 𝑛 times. Thus, the execution of the grammar results in a network that calculates (𝑛 + 1)-bit parity! Similarly, networks can be constructed that calculate e.g. whether the
input vector has a symmetric pattern of ones and zeros. Thus, the recurrency in the
grammar is a powerful way to take advantage of repetitive structure in networks.
Whereas L-systems were not designed to be evolved, CE was: Because the CE
grammars are trees, genetic programming (Banzhaf, Nordin, R. E. Keller, et al., 1998) is a
natural way to evolve them. Indeed, parity networks up to 51 bits were evolved in this
manner, demonstrating that evolution can indeed take advantage of repetition. It is also
possible to prove that any neural network topology can be represented in CE grammars.
However, it does not mean that the good topologies are easy to find. As a matter of fact,
the grammar can be turned around to represent connections in the network rather than
cells, resulting in a different bias in the kinds of networks that can be constructed easily
(Luke and Spector, 1996). The challenge is to discover the right biases and code them into
the grammatical representation.
Besides L-systems and CE, other grammatical encoding mechanisms have been
developed as well. For instance, in order to scale neuroevolution to the size and complexity
of deep learning (section 3.4.2), it is possible to represent the weights as a sequence of
mutations, and only store the mutation seeds (Petroski Such, Madhavan, Conti, et al., 2017).
The process begins with an initial neural network parameter vector 𝜃₀, which is generated from a random seed 𝜏₀ using a deterministic initialization function 𝜙: 𝜃₀ = 𝜙(𝜏₀). Each subsequent network in the evolutionary lineage is derived from its parent by applying a deterministic mutation function 𝜓, which adds pseudo-random Gaussian noise to the parent's weights. In this framework, the complete weight vector 𝜃ₙ of any individual in the population is reconstructed by sequentially applying the mutation function across a series of seeds, beginning with the original initialization. This sequence-based encoding replaces the need to store full high-dimensional weight vectors with a compact list of seeds [𝜏₀, 𝜏₁, . . . , 𝜏ₙ]. Since each mutation step can be reproduced exactly from its corresponding seed, the genotype of each network is both lightweight and fully deterministic.
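A minimal sketch of this seed-list encoding follows; the function names 𝜙 and 𝜓 mirror the notation above, but the vector size, noise scale, and other details are illustrative rather than those used by Petroski Such et al.

import numpy as np

NUM_PARAMS = 1000        # size of the flat weight vector (illustrative)
SIGMA = 0.02             # mutation strength (illustrative)

def phi(seed):
    # Deterministic initialization function: seed -> initial weight vector.
    return np.random.RandomState(seed).randn(NUM_PARAMS) * 0.1

def psi(theta, seed):
    # Deterministic mutation function: add seeded Gaussian noise to parent weights.
    return theta + SIGMA * np.random.RandomState(seed).randn(NUM_PARAMS)

def decode(seeds):
    # Reconstruct the full weight vector from the compact genotype [tau_0, ..., tau_n].
    theta = phi(seeds[0])
    for tau in seeds[1:]:
        theta = psi(theta, tau)
    return theta

genotype = [42, 7, 1234]                 # three-generation lineage encoded as seeds
theta = decode(genotype)                 # identical every time it is reconstructed
assert np.allclose(theta, decode(genotype))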
Thus, encoding the developmental processes as a series of grammatical rewrite
operations is a high-level alternative to systems that aim at replicating the low-level
cell-chemistry mechanisms. Incorporating learning as a lifetime stage of development
synergistically with evolution is a third approach, as will be described next.
4.2.3 Learning Approaches
In addition to the physical development explored in the last two subsections, much
of biological development happens through learning. The individual interacts with the
environment and adapts its structure and parameters accordingly. Such learning is a form of
indirect encoding as well: Evolution defines a starting point and a learning mechanism, and
the full individual emerges indirectly through their synergy. The biological implications of
this idea are explored in more depth in section 14.4. In this subsection, the synergy is put
to work as a computational mechanism that allows us to construct more complex systems.
Many of the neuroevolution methods reviewed so far can be used to construct the
initial starting point, and many of the standard neural network learning algorithms can be
used to establish the developmental phase. But several questions remain: First, should the
improvements discovered by learning be coded back into the genome, in a Lamarckian
evolutionary process, or should it only determine the fitness of the individual, thus guiding
a Darwinian evolution through the Baldwin effect (as described below)? Second, if
gradient-descent-based learning methods are to be used, where do the targets for it come
from? Third, does the development require weight adaptation, or can it be more effectively
encoded as a state of activation? Each of these questions is addressed in turn in this
section.
First, Lamarckian evolution (Lamarck, 1809) suggests that acquired traits can be
inherited, which is unlikely in biology. The classic example is that giraffes stretch their necks
in order to reach higher, and their offspring would have longer necks as a result. In some cases,
non-genetic transmission is possible through epigenetic means (Lacal and Ventura, 2018).
For instance, in a process called methylation, a methyl molecule attaches to the DNA,
modulating genetic expression. As a result, for instance animals that must live in a hostile
environment may have offspring that are more sensitive and fearful, compared to offspring
of those who exist in a normal environment. While such changes are not permanently
Figure 4.4: Learning guiding evolution through the Baldwin effect. In this needle-in-the-
haystack problem, it would be difficult for evolution to find the sharp peak when the fitness
evaluations of the other solutions are all the same. However, learning allows modifying these
solutions, i.e. moving left and right along the 𝑥-axis. Therefore, the closer the solution is to the
peak, the more likely it is to find it through learning, as indicated by the red curve. Learning can
thus provide a more useful fitness, and help evolution find the peak faster. Adapted from Hinton
and Nowlan (1987).
encoded in the DNA, they do provide an immediate survival advantage that is inheritable.
Whether biologically plausible or not, computational evolution can take advantage
of both Lamarckian evolution and epigenetics. For instance, it may be possible to take
advantage of these principles in evolving deep learning networks. Such networks are often
too large to evolve effectively; however, it may be possible to train them and code the
learned weights back to the genome. This approach has been successful, for instance,
in evolving convolutional architectures for image processing (Hadjiivanov and Blair,
2019; Prellberg and Kramer, 2018). Through the approach, evolutionary exploration and
gradient-based tuning can be combined.
One challenge in implementing Lamarckian/epigenetic evolution is that it may lead
to a loss of diversity. Through gradient descent, the individuals in the population are
modified in the same direction, as suggested by the gradient. The learning process may
thus interfere with evolutionary exploration. A possible way to cope with this challenge is
to train different individuals with different batches of data, or more broadly, use ensembling
techniques to keep the population diverse. Effective ways of managing exploration and
learning are still open to research.
The Baldwin effect can also lead to powerful computational approaches. The
adaptations are not coded back into the genome, but only used to determine fitness.
Learning thus guides evolution towards more promising individuals (which is the Baldwin
effect). Indeed, early studies showed that such a combination can be more powerful than
evolution or learning alone. For instance, in the needle-in-the-haystack problem, even
when learning consisted of simply random changes, it was enough to broaden the basin
of the target, and make it more likely for evolution to discover it (figure 4.4; Hinton and
Nowlan, 1987). Thus, even if the learning does not affect the genome, it can be useful
in guiding the evolution by suggesting which genetic individuals are more promising.
This idea is consistent with theories in evolutionary biology that emphasize the role of
developmental plasticity in driving evolution (West-Eberhard, 2003).
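A compact sketch of the Hinton and Nowlan (1987) setup follows (simplified; the genome length, trial count, and fitness bonus follow the original paper, but the surrounding genetic algorithm is omitted). Each gene is 1, 0, or a plastic "?", and fitness rewards finding the all-ones needle quickly through random guessing of the plastic genes.

import random

L, TRIALS = 20, 1000                      # genome length and learning trials

def fitness(genome):
    # Without the correct fixed genes the needle can never be found.
    if any(g == 0 for g in genome):
        return 1.0
    plastic = [i for i, g in enumerate(genome) if g == "?"]
    for t in range(TRIALS):               # learning: random guesses for plastic genes
        if all(random.random() < 0.5 for _ in plastic):
            # All plastic genes guessed correctly: fitness depends on how fast.
            return 1.0 + 19.0 * (TRIALS - t) / TRIALS
    return 1.0                            # needle never found during the lifetime

genome = [1] * 10 + ["?"] * 10            # example: half fixed-correct, half plastic
print(fitness(genome))

Individuals whose fixed genes are correct and whose plastic genes are few tend to find the needle early and thus score higher, which is exactly the smoothed fitness landscape shown in figure 4.4.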
Interestingly, this result does not mean that an evolutionary system guided by the
Baldwin effect gradually encodes more and more of the learned information into the genes,
eventually making learning unnecessary. That is, the evolved solutions before learning
often perform quite poorlyÐit is only after the learning that they perform well. This
phenomenon is precisely the idea of synergistic development. Because learning is always
part of the evaluation, evolution discovers the best possible starting points for learning, so
that the system as a whole performs as well as possible. The starting points can be far
from the optimum as long as learning can reliably pull them into the optimum. Apparently,
in many tasks, there are many such starting points and they are easier for evolution to find
than points close to the optimum would be. Therefore, evolution finds a synergy where
both methods play a significant role.
Regarding the second question posed at the beginning of this subsection, so far the
discussion has assumed that the optimal targets for gradient descent are known. However,
surprisingly, the process works even when such targets are not available. One possibility
is to use related targets, such as predicting how the inputs are going to change as a result
of the action (section 14.4.1). They do not directly specify what the agent should do, but
they do allow learning internal representations that help evaluate the candidate.
Another approach is to use the behavior of current population champions, or even just
that of parents, to train the offspring (McQuesten and Miikkulainen, 1997). This result is
counterintuitive because evolution depends on discovering offspring that are better than
the parents. However, what is important is that the offspring perform well after training.
Thus, the process takes advantage of the Baldwin effect in the same way as evolution
did in the needle-in-the-haystack problem (figure 4.4; Hinton and Nowlan, 1987). If the
teachers are in the neighborhood of the optimal solutions, training will move the offspring
around in this neighborhood, making it more likely that some of them will get closer to
the optimum (figure 4.5). Selecting such solutions allows evolution to make progress even
when the fitness evaluations without learning are not very informative.
The third question concerns the nature of adaptation: Is it necessary to encode the
learned behaviors into the weights, or could it be more effective to encode them simply
as a recurrent activation state? Of course, if the network needs to perform many trials
starting from a reset activation, weight adaptation is necessary. However, in many domains,
individuals perform and adapt continuously throughout their lifetime. With the appropriate
recurrent circuitry, they could develop an activation state that modulates their further
actions, similarly to a change in weights. Such an encoding of adaptation could be easier
to discover and maintain.
To study this question, instead of gradient descent, a more general low-level adaptation
mechanism is needed: Hebbian learning (Widrow, Y. Kim, D. Park, et al., 2023). The
basic idea is that if the neurons on both sides of the connection are active at the same time,
the connection is useful and its weight should be increased. To bound such increases, a
normalization process such as weight decay is often added, for instance:
Δ𝑤ᵢⱼ = 𝛼ᵢⱼ 𝑜ᵢ 𝑜ⱼ − 𝛽ᵢⱼ 𝑤ᵢⱼ,     (4.2)

where 𝑤ᵢⱼ is the weight between neurons 𝑖 and 𝑗 with activations 𝑜ᵢ and 𝑜ⱼ, and 𝛼ᵢⱼ and 𝛽ᵢⱼ
are learning and decay rate parameters. Unlike gradient descent, Hebbian learning is
entirely local to each connection and requires no learning targets at the output. In this sense,
Figure 4.5: Training to imitate champions or parents. When well-performing individuals, such
as population champions or parents, are used as teachers (T), they pull the offspring (X) towards
the teachers. Those offspring that perform the best after training are likely to be located near the
optimum to begin with, and although some (red X) are worse after training, some (green X) are
likely pulled closer to the optimum. Such training provides useful exploration around the optimum,
making it more likely to be discovered.
it is closer to biological learning than gradient descent, and therefore a proper comparison
to adaptation based on recurrency. Note that Hebbian learning also provides an alternative
that avoids the second question in this section, i.e. where the targets for development come
from: it does not need them. On the other hand, it cannot take advantage of targets either,
and therefore it is generally not as powerful as gradient descent.
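Equation (4.2) is straightforward to implement. The sketch below applies it to a full weight matrix in one vectorized step, with per-connection learning and decay rates; the network size and parameter values are chosen only for illustration.

import numpy as np

n = 4                                   # number of neurons (illustrative)
w = np.random.randn(n, n) * 0.1         # connection weights w_ij
alpha = np.full((n, n), 0.05)           # per-connection learning rates alpha_ij
beta = np.full((n, n), 0.01)            # per-connection decay rates beta_ij

def hebbian_update(w, o):
    # Equation (4.2): delta w_ij = alpha_ij * o_i * o_j - beta_ij * w_ij
    delta = alpha * np.outer(o, o) - beta * w
    return w + delta

o = np.random.rand(n)                   # current neuron activations o_i
w = hebbian_update(w, o)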
Nevertheless, Hebbian learning is a compelling approach to developmental indirect
encoding on its own. Networks with Hebbian learning can change their behavior based on
what they observe during their lifetime. For instance, they can evolve to first perform one
task, such as turn on a light, and then switch to another such task, such as to travel to a
target area (Floreano and Urzelai, 2000). While it is biologically plausible, an interesting
practical question arises: Can such low-level adaptation be more effectively implemented
through recurrent activation?
The above foraging domain with good and bad food items can be used to study this
question (Stanley, Bryant, and Miikkulainen, 2003). The usual NEAT method for evolving
recurrent networks can be compared with a version that takes advantage of Hebbian
learning: It evolves the learning rate and decay rate parameters 𝛼ᵢⱼ and 𝛽ᵢⱼ for each
connection, in addition to the weights and the network topology. Each evolved network is
placed into the foraging environment where it can consume food items; if an item is good,
it receives a pleasure signal, and if bad, a pain signal. All items in a trial are the same, so
(𝑎) With Hebbian learning (𝑏) No Hebbian learning
Figure 4.6: Networks evolved with NEAT with and without Hebbian learning. Nodes are
numbered through historical markings. Black lines represent excitatory and blue lines inhibitory
connections; loops indicate recurrent connections; line thickness corresponds to the connection weight. (𝑎) With Hebbian adaptation, performance is encoded more holistically, utilizing plastic synapses throughout the network. (𝑏) Without Hebbian adaptation, the network is more
parsimonious, with adaptation coded into recurrent connections at the outputs. While both types
of solutions are successful, Hebbian adaptation provides a larger search space that is more difficult
to navigate. In simple tasks, at least, it can thus be more effective to rely on recurrency to represent
adaptation. Figure from Stanley (2003).
after it consumes the first item, it needs to either eat all of them or none of them to receive
maximum fitness.
While both approaches evolved successful networks, NEAT without adaptation
required about half the generations to do so. There were fewer parameters to optimize,
and evaluations were more consistent. Indeed, the solution networks look very different
(figure 4.6): While the fixed-weight recurrent networks were parsimonious with recurrency
focused at the output, the adaptive networks were more complex and holistic, using many
more adaptive weights throughout the network. Because many weights adapt, it was
not possible to rely on only a few loops, and the behavior became encoded redundantly
throughout.
Thus, in such a simple task recurrency was more effective than Hebbian adaptation.
It is of course possible that in more complex situations adaptation provides additional
power that may be needed. And indeed, such a task will be discussed in section 12.3.2 in
the context of real-world transfer for locomoting robots. There also exists an interesting
connection between Hebbian learning and modern machine learning mechanisms such as
self-attention, which we will discuss later in section 4.4.2.
4.3 Indirect Encoding through Hypernetworks
A common feature of indirect encodings in the previous section is that a specific phenotypic
component at a given point in development influences the states of nearby components. In
other words, development progresses through local interactions. This section reviews a
particularly popular indirect encoding that, when first introduced, broke with the strong
tradition of such local interactions and temporal unfolding. In effect, it introduces a new
category of indirect encoding at a different level of abstraction.
This approach, now known under the name hypernetwork, is based on the idea
of one neural network (the hypernetwork) encoding the parameters of a potentially
much larger phenotype in one shot, i.e. each component in the phenotype is determined
independently of any other component. Whereas many indirect encoding approaches
illustrate opportunities for utilizing biological principles but do not yet perform as well as
the best direct approaches, such hypernetworks already perform better in many standard
benchmarks. Initially tested on indirectly encoding images, which we will discuss in the
next section, this approach can be extended to many other domains, such as 3D robot
morphologies, and even to encode artificial neural networks themselves (section 4.3.3).
4.3.1 Compositional Pattern Producing Networks
The most common way to implement hypernetworks in neuroevolution is through com-
positional pattern-producing networks (CPPNs; Stanley, 2007). Even though they are
fundamentally distinct from developmental systems, CPPNs are inspired by developmental
biology: Structures are built within a geometric space analogously to chemical gradi-
ents that define the axes of the embryo. For example, when the embryo of Drosophila
melanogaster (one of developmental biologists' favorite pets, commonly known as
the fruit fly) develops, chemical gradients establish axes from front to back, head to tail,
and left to right. This way, structures such as the wings can be situated at their correct
positions. Inside these structures, substructures such as the intricate patterning of
the wings are placed within the local coordinate system of the wing itself. In our
own bodies, such gradients help define the position of e.g. the legs, arms, and hands, and
within these structures, substructures such as the fingers of the hands. It is expensive to
simulate the underlying process of the diffusion of morphogens, which is why CPPNs
simplify this process into a network of function compositions represented as a graph. On
a high level, CPPNs are generative neural networks that create structures with regularities
in one shot and without going through a period of unfolding/growth.
We will start by looking at how a CPPN can be used as an indirect encoding for image
generation (figure 4.7) but later explore how it can be easily extended to other domains
such as generating neural network connectivity patterns (section 4.3.3), morphologies of
3D soft robots (section 4.3.2), and agent environments (section 9.3). CPPNs have also
impacted the broader field of machine learning in a variety of different ways. For example,
CPPNs can be evolved to generate images that are entirely unrecognizable to humans,
yet they successfully fool even highly accurate deep neural networks, which confidently
classify them as familiar objects (A. M. Nguyen, Yosinski, and Clune, 2015a). CPPNs
have even inspired improvements to deep neural networks, particularly addressing some
limitations of convolution by introducing coordinate-based input representations (R. Liu,
Lehman, Molino, et al., 2018).
A CPPN generates an image by taking as input the coordinates of a 2D location
𝑝 = (𝑥, 𝑦)
and outputting HSV, RGB, or grayscale values of the pixel at that location.
By repeating this process for all the pixels of a two-dimensional grid, a two-dimensional
image can be created. One advantage of the CPPN representation is that images can be
(a) CPPN (b) CPPN inputs (c) Skull-generating CPPN
Figure 4.7: CPPN image encoding. Compositional pattern producing networks are neural networks with diverse activation functions that generate geometric patterns. (𝑎) The network illustrated is a two-dimensional CPPN, as it receives inputs 𝑥 and 𝑦, along with 𝑑, the distance from the point (𝑥, 𝑦) to the center of the image. When evaluated over many coordinates (𝑏), the CPPN’s output forms an image or spatial pattern. The architecture depicted in (𝑐) is the specific CPPN that generates the skull pattern shown at the top right. The colors in (𝑐) highlight different components of the evolved network that contribute to key features of the skull image, as determined through functional analysis. The small images within the network nodes represent the activation patterns computed at each node over (𝑥, 𝑦) coordinates. These patterns are ultimately combined by the network to produce the final output image, illustrating that CPPNs can encode complex spatial regularities through simple compositional principles. Figure (𝑐) from Kumar, Clune, Lehman, et al. (2025).
generated at any resolution by only changing the resolution of locations sampled and
without increasing the number of genotypic parameters of the CPPN itself. Such scaling
would not be possible with a direct encoding, in which each pixel in the image would have
to be optimized separately.
As discussed earlier in this chapter, one common goal of indirect encodings is to
be able to express patterns such as symmetry, repetition, etc. In order to allow CPPNs
to more easily express such patterns, nodes in these networks do not all implement
the same activation function as in traditional neural networks (including the networks
traditionally evolved by NEAT) but are chosen from a small set of activation functions,
such as Gaussian, sigmoid, and sine wave functions. For example, a Gaussian function
can create something similar to a symmetric chemical gradient, while a sigmoid generates
an asymmetric one, and a sine wave can create a repeating pattern. Things get more
interesting when functions are composed with each other, which is in some way analogous
(a) (b) (c) (d) (e)
Figure 4.8: CPPN examples. CPPNs can produce patterns with repetition (𝑎) and repetition with variation (𝑏). They can also create symmetric patterns such as the sunglasses shown in (𝑐), which is encoded through the CPPN shown in (𝑒). By changing only a single connection, varying degrees of symmetry can be produced, such as the morphed glasses in (𝑑). These examples demonstrate the expressive power and flexibility of CPPNs in generating complex, structured patterns. Figure from Stanley (2007).
to the morphogens creating local coordinate systems in real organisms, enabling their
incredible levels of complexity. For example, a sine wave composed with the square of a
variable, sin(𝑥²), produces a pattern that is repeating but with some type of variation. Such
patterns are ubiquitous in nature. Networks composed of only a few
of such functions can produce surprisingly complex structures, making them useful in a
wide range of applications, as we’ll see throughout this book. An example of such a CPPN
with different activation functions is shown in figure 4.7𝑏, which creates the symmetric and repeating pattern shown in figure 4.7𝑎.
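As a minimal sketch (a hand-wired toy network, not one of the NEAT-evolved CPPNs used in the original work), the following code composes a few such activation functions over (𝑥, 𝑦, 𝑑) inputs to render a grayscale pattern with symmetry and repetition.

import numpy as np

def cppn(x, y, d):
    # A tiny, hand-wired CPPN: each "node" applies a different activation
    # function to a weighted combination of its inputs; composition creates structure.
    h1 = np.sin(8.0 * x)                      # sine: repetition along x
    h2 = np.exp(-(3.0 * y) ** 2)              # Gaussian: symmetric gradient in y
    h3 = np.tanh(2.0 * h1 + 1.5 * h2 - d)     # composition of the two, modulated by d
    return (h3 + 1.0) / 2.0                   # grayscale intensity in [0, 1]

res = 256                                     # any resolution can be sampled
xs = np.linspace(-1, 1, res)
ys = np.linspace(-1, 1, res)
X, Y = np.meshgrid(xs, ys)
D = np.sqrt(X ** 2 + Y ** 2)                  # distance to the image center
image = cppn(X, Y, D)                         # query the CPPN at every pixel

Because the genotype is the function itself, sampling a finer grid of coordinates yields a higher-resolution image without adding any parameters.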
How can we evolve these CPPNs? Traditionally, CPPNs are evolved with NEAT,
which makes it possible to optimize both the weights and the architecture of the network.
Additionally, NEAT enables CPPNs to slowly complexify and to produce more and more
complex patterns. Augmenting NEAT to evolve CPPNs instead of the typical ANNs is
straightforward. Every time a structural mutation adds a node to the network, the activation
function of that node is randomly chosen from a pre-defined set of activation functions,
often with equal probability. However, it is certainly possible to also use a method like ES
to optimize the weights of a fixed-topology network, which includes randomly assigned
activation functions for each node. We will leave this as an exercise for the reader.
One way to explore the representational power of an encoding is through interactive
evolutionary computation (IEC) (Takagi, 2001). Instead of evolving towards a certain
target, in interactive evolution, the user guides the evolutionary search by selecting parents
from a set of candidate solutions (often by visually taking a look at them and deciding
what they like most). The benefit of IEC is that it can reveal an encoding’s ability to encode a diversity of artifacts while establishing and exploiting regularities.
We’ll further discuss how this idea of interactive evolution allows human designers to
drive evolutionary discovery, how it enables multiple humans to collaboratively evolve
artifacts, and how it can even lay the foundation for new types of machine learning-based
games in chapter 8.
Exploring the space of CPPN-encoded images through IEC demonstrates that the
representation is able to capture many of the desirable regularities identified earlier in this
chapter. For example, it is able to create patterns that show repetition (figure 4.8𝑎) but also repetition with variation (figure 4.8𝑏). Figure 4.8𝑐 illustrates a set of "sunglasses"
Figure 4.9: CPPN pattern elaboration over generations. The figure shows a chronological
sequence of CPPN-encoded designs, discovered and elaborated upon during interactive evolution.
Together with the designs, the number of hidden node functions and connections is also shown.
This progression illustrates the capacity of CPPNs to preserve fundamental structural regularities, such
as bilateral symmetry, while elaborating on them across generations. Figure from Stanley (2007).
that exhibit bilateral symmetry, meaning they are mirror images on either side. This
symmetry serves as an example of how genetic elements can be effectively reused. In
this case, the CPPN-based function that defines one lens (the left one) is identically used
for the other lens (the right one). Intriguingly, modifying just one connection gene, as
shown in figure 4.8𝑒, can alter the symmetry of the lenses, resulting in a slight asymmetry while still preserving the overall pattern's coherence, as seen in figure 4.8𝑑. Even though the "genetic code" is the same for both sides, one lens displays a variant of the pattern seen in the other. This ability to evolve and refine specific features without disrupting the fundamental pattern is significant and possible because changes in the coordinate frame within a CPPN do not ruin the overall pattern being created. Therefore, even if the symmetry of the underlying coordinates is disrupted by a single gene alteration, the
intricate pattern created within these coordinates remains intact and unaltered.
Additionally, one of the fundamental properties of natural evolution is that it is able to
elaborate on discovered designs in subsequent generations. For example, the fundamental
bilateral body plan, discovered early on during the Cambrian explosion, has undergone
extensive development over hundreds of millions of years, yet its core structure has been
consistently preserved. In a similar vein, the question arises: Can a CPPN effectively
replicate a bilateral body plan and, over generations, both preserve and refine this bilateral
symmetry? IEC experiments demonstrate that after discovering a spaceship-like design
with bilateral symmetry (figure 4.9𝑎), that design can then be elaborated upon, with the
underlying regularities becoming more complex in subsequent generations. Importantly,
the basic parts that form the spaceship are conserved during this elaboration, such as its
nose, tail, and wings. In the subsequent sections, we will see that this ability to elaborate
on previous discoveries is an important property of CPPNs.
CPPNs are also not restricted to 2D and can easily be extended to generate 3D forms instead of 2D images by adding a third 𝑧-input, and can even encode locomoting 3D soft
robots, as we will see in the next section.
4.3.2 Case Study: Evolving Virtual Creatures with CPPN-NEAT
A good test domain for different indirect encodings is evolved virtual creatures, which
refer to digital entities that interact within a computational environment. These creatures
are typically part of a simulation in which various forms of artificial life compete, survive,
reproduce, and evolve over time based on certain predefined criteria or environmental
pressures. In this section, we will have a look at how the morphologies of such creatures
can be defined through a CPPN. We will encounter virtual creatures again throughout the
book, such as in the context of collective intelligence (section 7.3.2) or when discussing
the co-evolution of morphologies and neural networks (section 9.2.2).
Unlike the static CPPN-encoded images we have encountered in the previous section,
virtual creatures often have to interact with their environment, requiring a form of embodied
cognition. This dynamism challenges the encoding schemes to not only create viable
forms but also to encode behaviors that are effective in a given environment. Virtual
creatures, with their varied morphologies and behaviors, present a complex and diverse
space to explore. This complexity makes them ideal for testing the capabilities of indirect
encodings to generate a wide range of solutions, where there is a coherent link between
form and function.
The particular virtual creatures we are looking at next are three-dimensional soft
robots (Cheney, MacCurdy, Clune, et al., 2014). Each robot is made out of an arrangement
of voxels, where each voxel can be one of four materials, displayed as different colors
(figure 4.10). Voxels colored green undergo periodic volumetric actuations at 20%
intervals. Voxels colored light blue are passive and soft, with no inherent actuation; they
deform only in response to the actions of nearby voxels. Red voxels behave like green
ones but with counter-phase actuations. The dark blue voxels are also passive, but they
are more rigid and resistant to deformation than their light blue counterparts. These soft
robots do not have sensors, and the patterns of material types thus fully determine the
robot's actuation pattern. This means that the optimization task here amounts to finding a
pattern of materials that makes the robot move as fast as possible.
The robot-generating CPPNs take as input the 𝑥, 𝑦, and 𝑧 coordinates of each voxel, and its distance 𝑑 from the center. One of the network's outputs indicates the presence of material, while the other four outputs each represent one of the specific materials mentioned above; the one with the maximum value indicates the type of material present at that voxel.
Separating the phenotypic component’s presence and its parameters into distinct CPPN
outputs has been demonstrated to enhance performance. If there are several disconnected
patches, only the central patch is considered in creating the robot morphology.
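A minimal sketch of how such CPPN outputs can be turned into a voxel design follows. The helper names and the stand-in CPPN are hypothetical; the original system additionally extracts the central connected patch and simulates the resulting soft body.

import numpy as np

MATERIALS = ["green", "red", "soft_blue", "stiff_blue"]

def decode_robot(cppn, size=10):
    # Query the CPPN at every voxel coordinate; output 0 gates material
    # presence, outputs 1-4 select the material type via argmax.
    robot = np.full((size, size, size), None, dtype=object)
    for ix, iy, iz in np.ndindex(size, size, size):
        x, y, z = (np.array([ix, iy, iz]) / (size - 1)) * 2 - 1   # scale to [-1, 1]
        d = np.sqrt(x * x + y * y + z * z)
        out = cppn(x, y, z, d)              # assumed to return five values
        if out[0] > 0:                      # presence threshold
            robot[ix, iy, iz] = MATERIALS[int(np.argmax(out[1:]))]
    return robot

# Example CPPN stand-in with the right interface (a real one would be evolved):
toy_cppn = lambda x, y, z, d: np.array([1 - d, np.sin(4 * x), y, z, -d])
robot = decode_robot(toy_cppn)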
Optimizing these CPPN representations with NEAT showed that they were indeed
(𝑎) Indirect encoding (𝑏) Direct encoding
Figure 4.10: Indirect vs. direct encoding. The goal in this domain is to find the right composition
of voxel materials (e.g. red and green voxels actuate at different frequencies while dark blue
voxels are passive) so that the robot is able to locomote as fast as possible. This figure shows an
example of a 3D soft robot generated with the indirect CPPN encoding (𝑎) and a direct encoding
(𝑏), in which each voxel is optimized independently. In contrast to the direct encoding, the CPPN-based encoding is able to produce 3D structures with symmetries and repeating motifs, resulting in fast locomotion. Figure from Cheney, MacCurdy, Clune, et al. (2014). Videos at
https://neuroevolutionbook.com/demos.
not restricted to generating static structures but could produce fully functional three-
dimensional soft robots. An example of such an evolved robot locomoting is shown in
figure 4.10𝑎. This robot morphology, together with other morphologies discovered during
evolution, displayed interesting regularities, often including symmetry and repetition. The
opposite is true for robots that used a direct encoding, in which the parameters of each
voxel were encoded individually. These robots often failed to perform well without any
clear regularities in their morphologies (figure 4.10𝑏). A direct encoding made it more
challenging to find structures that display the globally coordinated behaviors necessary for
efficient locomotion strategies.
CPPNs can generate structures with regularities by giving the network access to
the locations of each element of the structure to be generated. In biological systems,
this information is not directly available; it is thus an interesting question whether
it is also possible to generate complex patterns artificially solely based on the local
communication of the structure’s components. We’ll return to this question in section 7.3
on neuroevolutionary approaches for collective intelligence, where we will also again
encounter three-dimensional soft robots.
4.3.3 HyperNEAT
This chapter started with a discussion of the intricate structure of the human brain and its
complex regularities. For example, in the brain, there are neural modules with repeating
connectivity patterns and left/right symmetry. Given a CPPN’s ability to express complex
2D and 3D patterns, it makes sense to also consider if they could be used to generate
such complex neural connectivity patterns as well. With this goal in mind, the question
becomes what such a CPPN should look like and what its inputs should be.
To answer this question, again consider convolutional connectivity patterns. In a
convolutional neural network the same feature detector is employed at multiple locations
in a network. In order for the algorithm to discover such heuristics by itself, a method
is needed that can learn that there should be correlations between the weights of nearby
neurons. Essentially, this involves generating weight patterns based on the geometry of
Figure 4.11: HyperNEAT substrates. Two different types of HyperNEAT substrates are shown, which are the arrangement of nodes and their roles. In (𝑎), nodes are arranged on a 2D plane. The CPPN is queried with all pairs of nodes to determine how they are connected to each other. A more complex substrate for evaluating checkerboard game positions is shown in (𝑏). The input layer reflects the geometry of the board. The output layer C has one node that determines the quality of a board state. The CPPN has two outputs, AB and BC. To query a connection from layer A to B, output AB is used, while from layer B to the output layer C, output BC is used. In this manner, the design of the substrate allows HyperNEAT to leverage geometric regularities to produce structured connectivity patterns. Figure (𝑎) from Stanley, D'Ambrosio, and Gauci (2009) and figure (𝑏) from Gauci and Stanley (2010).
the input and output domains. For instance, if the input and output domains are both two-dimensional, the weight of a connection between two neurons can be expressed as a function 𝑓 of the positions (𝑥1, 𝑦1) and (𝑥2, 𝑦2) of the source and target neurons, respectively.
This is the fundamental insight behind the method called HyperNEAT (hypercube-
based NEAT; Stanley, D’Ambrosio, and Gauci, 2009), which can be viewed as one of the
most foundational and impactful applications of CPPNs. In essence, in HyperNEAT every
neuron is given a role (e.g. input, hidden, output) and a location in space (traditionally by
a user, but this process can also be automated, as we will see in the next section). The
collection of roles and positions in HyperNEAT is often referred to as the substrate, to
distinguish it from the CPPN itself. The connectivity patterns between the neurons are
determined by CPPNs evolved through NEAT, which take as input the location of two
nodes. Querying the CPPN with every possible connection between two points, with the
output of the CPPN representing the weight of the connection, produces an artificial neural
network. This process is visualized in figure 4.11𝑎. To avoid always producing fully connected networks, connections may be expressed only if the CPPN output is higher than a certain
threshold. In other HyperNEAT variants, a second output determines if a connection
should be expressed (Verbancsics and Stanley,
2011). This approach can be helpful
because it decouples the pattern of weights from the pattern of expressed connections.
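As a concrete illustration of this querying process, here is a minimal sketch assuming a hand-written stub in place of a NEAT-evolved CPPN and the threshold variant described above; the node layout and all names are hypothetical:

```python
import numpy as np

def cppn(x1, y1, x2, y2):
    """Stand-in for a NEAT-evolved CPPN; returns a single weight value."""
    return np.sin(2 * (x1 - x2)) * np.exp(-((y1 - y2) ** 2))

def build_network(source_nodes, target_nodes, threshold=0.2):
    """Query the CPPN for every source/target pair; express a connection
    only when the magnitude of the CPPN output exceeds the threshold."""
    weights = np.zeros((len(source_nodes), len(target_nodes)))
    for i, (x1, y1) in enumerate(source_nodes):
        for j, (x2, y2) in enumerate(target_nodes):
            w = cppn(x1, y1, x2, y2)
            if abs(w) > threshold:
                weights[i, j] = w
    return weights

# A small 2D substrate: a 3x3 input sheet connected to a 1x3 output row.
inputs = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
outputs = [(x, 1.0) for x in (-1, 0, 1)]
W = build_network(inputs, outputs)
print(W.shape)  # (9, 3)
```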
Given the neurons' positions in space, HyperNEAT can create a variety of regular connectivity patterns. For example, in a typical convolutional network, a filter is applied across the geometry of the input space. HyperNEAT can invent the concept of convolution by itself, because it can be expressed as a function based on the offset of the source to the target neuron: 𝑥1 − 𝑥2 and 𝑦1 − 𝑦2. The intriguing aspect of HyperNEAT lies in its
ability to go beyond conventional convolution as the sole significant pattern of connectivity.
Through HyperNEAT, evolved neural networks have the potential to uncover and leverage
various patterns of regularity, inaccessible to traditional learning algorithms for neural
networks.
For example, consider the task of creating a neural network that evaluates board
positions in the game of checkers; that is, a specific board configuration is given to a
neural network as input, and it has to determine how good this position is. This game is
intuitively geometric, with the movement rules for each piece being the same for every
location on the board. The HyperNEAT approach should be able to take advantage of the
CPPN’s ability to calculate the connection weights based on the positional differences
between two nodes, enabling it to uniformly apply a repeating concept throughout the
entire board. In a sense, HyperNEAT is able to see the geometry of the task. We thus
expect that an indirect representation that can learn to repeat strategies across the board
should have an advantage when compared to a direct encoding like NEAT, which has to
learn this pattern for each square on the board separately. In the adaptation of HyperNEAT for the game of checkers, the input layer can be designed as a two-dimensional structure, mirroring the checkerboard's layout, as illustrated in figure 4.11𝑏 (Gauci and Stanley, 2010). This substrate has one input layer 𝐴, one hidden layer 𝐵, and a single output node 𝐶, which outputs the evaluation of a board position. Note that the CPPN here has two outputs, AB and BC. Therefore, the 𝑥 and 𝑦 coordinates of each node are adequate to pinpoint the specific connection being queried, with the two separate outputs differentiating the connections between A&B and B&C from each other.
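A layered substrate of this kind might be sketched as follows; the two-output CPPN is again a hand-written stub, and the node placements are only illustrative:

```python
import numpy as np

def cppn(x1, y1, x2, y2):
    """Stub for the evolved CPPN with two outputs: one for A->B weights,
    one for B->C weights."""
    ab = np.cos(x1 - x2) * np.cos(y1 - y2)      # repeated local pattern
    bc = np.tanh(x1 + y1)                        # aggregation pattern
    return ab, bc

board = [(x, y) for x in np.linspace(-1, 1, 8) for y in np.linspace(-1, 1, 8)]
hidden = board                     # hidden layer B mirrors the board geometry
output = [(0.0, 0.0)]              # single evaluation node C

W_ab = np.array([[cppn(x1, y1, x2, y2)[0] for (x2, y2) in hidden] for (x1, y1) in board])
W_bc = np.array([[cppn(x1, y1, x2, y2)[1] for (x2, y2) in output] for (x1, y1) in hidden])
print(W_ab.shape, W_bc.shape)  # (64, 64) (64, 1)
```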
And indeed, HyperNEAT was able to find a high-performing board evaluator signifi-
cantly faster than NEAT, which was in part due to HyperNEAT’s ability to search through
a smaller genotypic space. Additionally, when comparing the most general solutions found
by both approaches to randomized opponents, HyperNEAT showed a significantly higher
win rate and also lost significantly fewer games than NEAT solutions. These improved
generalization abilities were a result of HyperNEAT’s ability to discover the necessary
regularities in the geometry of the game. This observation was supported by examinations
of the connectivity patterns of the most general HyperNEAT solutions, which were often
smoother and more continuous than less general solutions.
Beyond board games, we hypothesized at the beginning of this chapter that indirect encodings should also be useful for tasks such as controlling a quadruped robot (figure 4.12𝑎), taking advantage of the task's symmetry and regularities. For HyperNEAT, the positions of sensor and motor neurons within a quadruped body can be exploited to efficiently develop consistent gait patterns that rely on connectivity patterns unrelated to convolution (Clune, Stanley, Pennock, et al., 2011). Each leg can be viewed as a repeated module, with different gaits having different regularities themselves. For example, in a typical horse trot gait, the diagonal pairs of legs move forward at the same time, whereas in other gaits, such as the pace gait, the two legs on the same side move forward at the same time. The HyperNEAT substrate for this task is shown in figure 4.12𝑏, and features three 2D sheets for the inputs, hidden layer, and output layer. Inputs on the substrate are arranged to reflect the geometry of the task, with each row receiving information about the state of a single leg (e.g. the current angle of the three joints of the leg, and a sensor that indicates if the leg is touching the ground). The output substrate also reflects the morphology of the robot, with the three elements in each row outputting the desired new joint angle.

Figure 4.12: A neural network controller for a quadruped robot produced by HyperNEAT. The goal in this task is to find a neural network able to control a quadruped robot (𝑎). The HyperNEAT substrate has three layers: input, hidden, and output (𝑏). The input and output nodes are arranged in a way to take the task geometry into account. (𝑐) shows a front view of the network, and (𝑑) a view from the back. Input nodes are shown in yellow, and output nodes in blue. Line thickness represents the magnitude of the weight. HyperNEAT autonomously discovers and exploits geometric regularities in the task, generating connectivity patterns that enable efficient quadruped locomotion without requiring the user to specify these patterns explicitly. Figure from Clune, Stanley, Pennock, et al. (2011). Videos at https://neuroevolutionbook.com/demos.
It is interesting to look at the performance of indirect vs. direct encodings across
the continuum of regularity. For example, in the quadruped domain, the regularity
of the problem can be decreased by introducing faulty joints, in which noise is added
to the requested joint angle and the actual motor command that is sent. As expected,
HyperNEAT's performance increased with increased task regularity, and it outperformed all other approaches (NEAT and FT-NEAT, a variant of NEAT with a fixed number of hidden nodes equal to the number used in the HyperNEAT substrate) when there were no or only one faulty joint. When the problem was sufficiently irregular (eight
and 12 faulty joint treatments), FT-NEAT and NEAT started to outperform HyperNEAT.
The important lesson here is that the type of method to be used highly depends on the
target domain and how many regularities there are to exploit.
Interestingly, going beyond pure quantitative results, the gaits produced by HyperNEAT were also often more regular and coordinated than those from NEAT. HyperNEAT often produced two types of gaits. In one of them, all legs moved forward in unison at the same time, which suggests that HyperNEAT repeated the same connectivity pattern for each leg. The other gait resembled more of a horse gallop gait, in which three legs moved together with one of the legs moving in opposite phase. This gait indicates that HyperNEAT can also produce regularities with variation (i.e. one leg moves differently from the other three legs). These regularities were also reflected in the HyperNEAT-produced weight patterns. Figures 4.12𝑐 and 4.12𝑑 show the view of the same network from the front and from the back, respectively. Observe the intricate and consistent geometric patterns of weight distribution, such as the inhibitory connections from input nodes directed towards the upper hidden nodes and excitatory connections aimed at the lower hidden nodes. Additionally, there is a notable regularity with variations, exemplified by the spread of inhibitory connections into the output nodes, which changes along both the 𝑥 and 𝑦 axes.
In summary, an indirect encoding such as HyperNEAT can offer great benefits, allowing
relatively compact CPPNs with only a handful of connections to encode functional neural
networks with millions of weights. In fact, even before DeepMind demonstrated that it is possible to learn to play Atari games from pixels (Mnih, Kavukcuoglu, Silver, et al., 2015), an early milestone that shaped the landscape of deep RL, HyperNEAT was the first method used to train neural networks to play Atari games from pixels alone (Hausknecht, Lehman, Miikkulainen, et al., 2014).
However, HyperNEAT is also not a panacea for every task; it does perform best in
domains where regularities can be exploited, but it works less well in domains with many
irregularities. There have been attempts at combining the best properties of both direct and
indirect encodings. One such method is hybridized indirect and direct encoding (HybrID),
which discovers the regularities of the domain with an indirect encoding but then accounts
for the irregularities through a fine-tuning phase that optimizes these weight parameters
directly (Clune, Beckmann, Pennock, et al., 2011). Another, more biologically plausible
solution is a combination of an indirect encoding together with lifetime learning. While
indirect encodings are effective for generating regular neural structures, they also serve
as a strong foundation for local learning rules, such as the Hebbian rules introduced in
section 4.2.3. And indeed, neuroevolutionary experiments showed that neural connectivity
motifs that were indirectly encoded and thus more regular learned the best in a simple
operant conditioning task (Tonelli and Mouret, 2013), when compared to directly encoding
those starting weights.
This strong relationship between indirect representations and synaptic plasticity
underscores a crucial interplay between development and adaptability in biological
systems. Synaptic plasticity interacts closely with the structured neural connectivity
formed during development. This interplay allows for both the initial formation of
efficient networks and their subsequent adaptation to new information and experiences. In
biological systems, such connectivity patterns are not only shaped by genetic encoding but
are also dynamically refined through experience-dependent plasticity. Understanding this
connection could significantly impact the types of representations that will define the next
generation of indirect encodings. However, despite its potential implications for developing
more adaptable neural networks, this interplay between indirect encoding and synaptic
plasticity has yet to receive substantial attention from the broader neuroevolutionary
research community.
4.3.4 Multiagent HyperNEAT
A potential killer application for generative and developmental systems such as HyperNEAT
is multiagent learning. In multiagent systems, multiple agents must learn behaviors that
may be cooperative (share common goals) or competitive (have opposing goals). In fact,
the quadruped robot example from the previous section can be viewed as a cooperative
multiagent system, where each leg acts as an individual agent that must coordinate with
the others to achieve efficient locomotion. Traditional multiagent approaches often treat
each agent as a separate learning problem. For instance, in multiagent reinforcement
learning, each agent, or each role, might be trained with its own policy (Busoniu, Babuska,
and De Schutter, 2008). While this approach allows for specialization, it has two major
drawbacks:
First, when agents are learned separately, they must each rediscover fundamental
behaviors from scratch (the problem of reinvention). Common skills that all agents should
share, such as the ability to kick or pass in soccer, are learned independently with no
mechanism to transfer knowledge. Such learning is inefficient and can hinder coordination.
It also complicates credit assignment: whether the team succeeds or fails, it is unclear which agent's policy to credit or blame, since they were learned in isolation. In cooperative
settings, this approach is likely to lead to suboptimal team performance because the agents
may not develop complementary behaviors.
The second issue is scalability. As team size grows, learning separate policies becomes
exponentially harder. The joint state-action space grows with each added agent. More
agents mean more pairwise interactions to consider, and encoding each agent separately
makes the search space explode. If a method cannot reuse policies and share structure
easily, adding new agents requires significant retraining or search. This limitation is
problematic for domains where team sizes are not fixed or where large teams are needed.
Multiagent HyperNEAT addresses these challenges in an elegant way, by representing a team of agents as a spatial pattern of policies rather than as separate, unrelated controllers (D'Ambrosio and Stanley, 2008). Each agent's policy can be associated with its position or role in a canonical team layout. In other words, there exists an underlying policy geometry describing how an agent's behavior should change according to its location or role in the team. For example, consider a soccer team: players near their goal have defensive roles, and those toward the center and near the opponent's goal have more offensive roles. As the position shifts, the policy gradually changes from defensive to offensive in a smooth pattern. Multiagent HyperNEAT aims to encode that entire pattern in one genome, so that the team's controllers are generated as coordinated variations of a shared strategy.
HyperNEAT's CPPN is well-suited to encode such patterns. To extend HyperNEAT to multiagent teams, an extra dimension 𝑧 is introduced to represent different agents. Essentially, imagine that the neural network substrate for a single agent's controller is replicated for each agent, but each replica is positioned at a different 𝑧-coordinate corresponding to that agent's role. The same CPPN is then queried to produce weights for every agent's network, but with the 𝑧-value indicating which agent's network is being wired. In this manner, one CPPN can generate distinct controllers for each agent, yet they all originate from a common encoding. Figure 4.13 illustrates this concept: one CPPN produces a heterogeneous team by mapping different 𝑧-layers to different agent controllers. The 𝑧-axis effectively acts as a blueprint for team heterogeneity, allowing the CPPN to vary the policy smoothly across agents or keep them identical by ignoring 𝑧. Notably, the CPPN can be initialized with knowledge of symmetry along the 𝑧-axis (e.g. if left/right roles should mirror) by special symmetric functions, injecting prior knowledge of team structure (D'Ambrosio, Lehman, Risi, et al., 2010).
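The following sketch illustrates the idea of querying one CPPN per agent at different 𝑧-coordinates; the CPPN is a hand-written stand-in, and the substrate layout and names are hypothetical:

```python
import numpy as np

def cppn(x1, y1, x2, y2, z):
    """Stub for a team-encoding CPPN: z selects which agent's network is wired."""
    shared = np.cos(x1 - x2) * np.cos(y1 - y2)   # skill shared by all agents
    role = 0.5 * np.sin(3 * z) * (y1 + y2)       # smooth variation across roles
    return shared + role

def build_team(substrate_in, substrate_out, num_agents):
    """Generate one weight matrix per agent from a single CPPN genome."""
    team = []
    for a in range(num_agents):
        z = -1.0 + 2.0 * a / max(num_agents - 1, 1)   # place agents along the z-axis
        W = np.array([[cppn(x1, y1, x2, y2, z) for (x2, y2) in substrate_out]
                      for (x1, y1) in substrate_in])
        team.append(W)
    return team

inputs = [(x, -1.0) for x in np.linspace(-1, 1, 5)]   # e.g. five rangefinders
outputs = [(x, 1.0) for x in np.linspace(-1, 1, 2)]   # e.g. two motor outputs
controllers = build_team(inputs, outputs, num_agents=5)
print(len(controllers), controllers[0].shape)  # 5 (5, 2)
```

Because the CPPN is a continuous function of 𝑧, controllers for additional agents can be generated simply by querying it at new 𝑧 values, which is the basis of the scaling results discussed below.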
Because all controllers are derived from one generative model, fundamental skills can
be shared. The CPPN can output similar weight patterns for multiple agents (imparting a
common skill) while also outputting variations for specific roles. This process addresses
the reinvention problem: a basic strategy discovered for one agent can automatically
appear in others. For example, if passing behavior is encoded as part of the CPPN’s
Figure 4.13: Multiagent HyperNEAT encoding. A single CPPN is used to generate distinct neural networks for each agent in a team. The CPPN (𝑎) is augmented with an additional input 𝑧, indicating which agent's neural network is currently being created. The team substrate (𝑏) consists of multiple copies of a single substrate replicated along the 𝑧-axis. By querying it, policies that vary smoothly across agents can be created. For example, in the predator-prey task (𝑐), the 𝑧 coordinates for each predator (shown in white) are determined by their initial position, arranged along the horizontal dimension. The heterogeneous multiagent HyperNEAT approach achieved more effective solutions and did it faster than a homogeneous approach (𝑑). When scaled to larger numbers of agents after training, the heterogeneous approach scaled significantly better (𝑒). In this manner, effective teams of varying sizes can be discovered automatically. Figure from D'Ambrosio, Lehman, Risi, et al. (2010). Videos at https://neuroevolutionbook.com/demos.
function, all relevant agents can pass without each evolving that skill independently. In
essence, the genome encodes a continuum of heterogeneity from fully identical policies to
fully distinct ones. Evolution can find the optimal point on that spectrum, distributing
shared skills among agents and specializing where needed. This ability is a powerful
representational advantage over direct encodings.
To evaluate the value of multiagent encoding, multiagent HyperNEAT was compared to a homogeneous setup where the additional 𝑧 input was not provided to the CPPN, thus encoding the same neural network for each agent. Experiments were run in the predator-prey task (figure 4.13𝑐), where the predators had to coordinate to catch the prey. Importantly, while the predators are equipped with five rangefinder sensors that detect nearby prey, they cannot detect other predators, making the task particularly challenging and demanding precise coordination. Heterogeneous teams discovered more efficient policies and converged faster than homogeneous teams, highlighting the advantages of a team-wide policy geometry (figure 4.13𝑑). Homogeneous teams rarely succeeded in solving the task, further emphasizing the benefits of the policy diversity generated by
multiagent HyperNEAT. The approach was able to discover sophisticated strategies such as corralling, where multiple predators surround the prey and gradually drive it toward the center. An exciting consequence of representing a team as a continuous policy geometry is the ability to scale team size on the fly. Since the CPPN is a function that can be queried at arbitrary points (including new 𝑧-coordinates), we can add new agents by sampling new points in the policy space. For instance, if a predator-prey team is evolved with five predators, one can deploy more predators by assigning them appropriate new positions and using the CPPN to create their controllers, effectively interpolating the learned policy geometry. In other words, new policies are inserted by sampling between existing ones. Using this approach, performance can be scaled to larger teams of 1,000 agents without further training (figure 4.13𝑒). This capability of learning once and deploying to any team size is a unique feature of the multiagent HyperNEAT encoding. It provides a level of flexibility not available in methods that evolve a fixed number of agents. In practice, there may be limits; extrapolating far beyond the training configuration can degrade performance if the CPPN was not evolved with varying sizes, but the approach is often surprisingly robust.
While the focus of this section was on indirect encoding of teams, the area of collective
systems is a major focus in neuroevolution in general, as will be discussed in chapter 7.
The next section addresses one of the drawbacks of the original HyperNEAT formulation:
how to decide on the number and locations of hidden nodes automatically.
4.3.5 Evolvable Substrate HyperNEAT
While it is often clear how the locations of the inputs in a HyperNEAT substrate relate to the output units and thus where they should be placed (e.g. the rangefinders of a robot should relate to the network's outputs that control its movement), how to decide on the position of the hidden nodes is less straightforward. A less obvious effect is that requiring a hidden node 𝑛 to be at position (𝑎, 𝑏), as specified in the original HyperNEAT, inadvertently demands that any weight pattern created by the CPPN must intersect exactly at position (𝑎, 𝑏) with the appropriate weights. This means the CPPN in HyperNEAT has to align the correct weights precisely across all coordinates (𝑎, 𝑏, 𝑥2, 𝑦2) and (𝑥1, 𝑦1, 𝑎, 𝑏). However, this raises the question: why enforce such an arbitrary constraint on weight locations? The CPPN might more easily represent the desired pattern slightly off the specified location, but this would not work with the constraints set by the user.
These limitations are addressed by an extension of HyperNEAT, called evolvable
substrate HyperNEAT (ES-HyperNEAT) (Risi and Stanley, 2012b). The basic idea behind
ES-HyperNEAT is that the weight pattern generated by the CPPN should give some
indication of where the hidden nodes should be placed and how many there should be.
That is, areas in the 4D hypercube that contain a lot of information should result in more
points being chosen from these areas. Remember, each point in that four-dimensional weight space corresponds to a connection between two points in two-dimensional space.
For example, take a hypercube whose weights are all uniform, meaning that CPPN(𝑥1, 𝑦1, 𝑥2, 𝑦2) = 𝑘 for all different input combinations; it would not make much sense to express many connections if there is not much information in the underlying weight pattern. On the other hand, if the variance of the weight pattern is high in some regions,
Figure 4.14: Evolvable-Substrate HyperNEAT. (𝑎) Starting from the input nodes, ES-HyperNEAT analyzes sequences of 2D slices through the hypercube weight pattern to discover areas of high variance. This information is then used to determine which connections, and thereby nodes, should be expressed. The approach continues from the discovered hidden nodes (𝑏) until some maximum depth has been reached. (𝑐) Similarly, we start from the output nodes to determine to which hidden nodes they should be connected. (𝑑) Once the approach has run a maximum number of iterations or when no new nodes are discovered, the resulting ANN is pruned, removing any nodes that do not connect to both the inputs and outputs of the network. Thus, ES-HyperNEAT is able to fully determine the topology and weights of a neural network encoded by a CPPN. Figure from Risi and Stanley (2012b).
it might indicate that there is more information available and thus more connections should be expressed. In ES-HyperNEAT, if a connection is chosen to be expressed, the nodes that it connects must therefore also be expressed. Which nodes to include thus becomes implicit in the question of which connections to include from the infinite set of potential connections encoded by the CPPN. By making the number and location of nodes depend on the CPPN-generated pattern, we give the system a "language", i.e. a way to increase or decrease the number of connections (and thus nodes) and change their location by varying the underlying pattern.
For this approach to work, it is useful to have a data structure that can represent space
at variable levels of granularity. One such data structure is the quadtree (Samet, 1984),
which has found successful applications in various fields, including pattern recognition
and image encoding, and partitions a two-dimensional space by recursively subdividing
it into four quadrants or regions. This process creates a subtree representation, where
each decomposed region becomes a descendant with the original region as the parent.
The recursive splitting continues until the desired resolution is achieved or until further
subdivision becomes unnecessary, indicating that additional resolution would not reveal
new information.
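A greatly simplified sketch of such variance-driven subdivision is shown below; the CPPN slice is a hand-written stub, the stopping criteria are chosen arbitrarily rather than taken from the ES-HyperNEAT paper, and the later pruning stages are omitted:

```python
import numpy as np

def cppn_slice(x1, y1, x2, y2):
    """Stub CPPN: outgoing-weight pattern from a fixed source node (x1, y1)."""
    return np.sin(4 * x2) * np.exp(-((y2 - y1) ** 2))

def quadtree_points(x1, y1, cx, cy, width, depth, var_threshold=0.03, max_depth=4):
    """Recursively subdivide a square region of the (x2, y2) slice.
    Regions whose sampled CPPN weights vary strongly are split further;
    leaves of low-variance regions yield candidate connection/node points."""
    # Sample the four quadrant centers of the current region.
    offsets = [(-0.25, -0.25), (-0.25, 0.25), (0.25, -0.25), (0.25, 0.25)]
    samples = [(cx + dx * width, cy + dy * width) for dx, dy in offsets]
    weights = [cppn_slice(x1, y1, sx, sy) for sx, sy in samples]
    if depth >= max_depth or np.var(weights) < var_threshold:
        return [(cx, cy)]                      # low variance: keep one point here
    points = []
    for (sx, sy) in samples:                   # high variance: recurse into quadrants
        points += quadtree_points(x1, y1, sx, sy, width / 2, depth + 1,
                                  var_threshold, max_depth)
    return points

# Candidate hidden-node locations reachable from an input node at (0, -1).
candidates = quadtree_points(0.0, -1.0, cx=0.0, cy=0.0, width=2.0, depth=0)
print(len(candidates))
```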
ES-HyperNEAT works as follows: for each input neuron at position (𝑝1, 𝑝2), apply the quadtree to analyze regions for their variance of the two-dimensional sub-slice through the hypercube spanned by CPPN(𝑝1, 𝑝2, 𝑥2, 𝑦2) (figure 4.14). In areas of high variance, as detected by the quadtree algorithm, connections and their corresponding nodes are created. The process is then repeated from those discovered hidden nodes until some maximum depth is reached, after which only the neurons are kept that have a path to an input and output neuron. After this process is repeated for each input (and output) node, the ANN is constructed and can be applied to the task at hand.

Figure 4.15: ES-HyperNEAT example lineage. Shown are four milestones in one of the maze solution lineages: (𝑎) generation 24 (ANN: 30 nodes, 184 connections; CPPN: 2 nodes, 9 connections; fitness = 0.85), (𝑏) generation 30 (ANN: 52 n, 280 c; CPPN: 3 n, 10 c; fitness = 0.93), (𝑐) generation 106 (ANN: 42 n, 310 c; CPPN: 3 n, 10 c; fitness = 5.96), and (𝑑) generation 237 (ANN: 40 n, 356 c; CPPN: 5 n, 18 c; fitness = 10.00). The CPPN is shown at the top with the decoded neural network in the middle (CPPN activation functions are G=Gaussian, A=absolute value, S=sigmoid, Si=sine). In addition to the location of nodes, the CPPN also receives the length L of a connection as an additional input. The resulting maze navigation behavior is shown at the bottom, together with the number of connections and nodes in the neural network and in the CPPN. One can observe a gradual growth in the complexity of the CPPN, which increases the information in the underlying hypercube pattern and thus results in an increase in the number of ANN weights and neurons. ES-HyperNEAT outperforms original HyperNEAT in this task because it can evolve networks with limited connectivity, elaborate on existing network structure, and compensate for the movement of information within the hypercube. Figure from Risi and Stanley (2012b).
A good domain to evaluate this approach should test its ability to build and elaborate
on previously discovered stepping stones. While it is easy to see how a method such as
NEAT would be able to accomplish this task, it is less obvious how an indirect encoding
would fare. For example, the original HyperNEAT has the tendency to often produce
fully connected networks, which makes it harder to elaborate on intermediate milestones
since all connections are already used for the current partial solutions. On the other hand,
ES-HyperNEAT should be able to do so because it can increase the number of nodes and
connections in the substrate.
One such task is called the hard maze, originally introduced to study more exploratory search methods such as novelty search (section 5.3). Here, the agent has rangefinder sensors to detect walls and pie-slice radar sensors that fire when the goal is within the agent's corresponding pie slice (figure 4.15). To encourage the agent to discover the intermediate stepping stones, the original task was modified to specifically reward the agent for traversing the green waypoints (which are not visible to the agent).
As hypothesized, the original HyperNEAT indeed struggled with this task, and only
found solutions in 45% of 20 independent evolutionary runs. ES-HyperNEAT, on the
other hand, was able to find a solution in 95% of all runs. As shown in figure 4.15, analysis
of an example lineage showed that ES-HyperNEAT was able to elaborate on previously
discovered stepping stones. This figure shows four milestone ANNs (middle row), together
with the underlying CPPN (top) and the resulting agent trajectory (bottom). Interestingly,
all the ANNs display common geometrical features, which were kept during evolution,
such as the symmetric network topology. While larger changes occur earlier in evolution,
the networks from generations 106 and 237 show a clear, holistic resemblance to each
other, with strong connections to the three output neurons. These results also demonstrate
that ES-HyperNEAT is able to encode a larger network with a compact CPPN. In fact, the
solution ANN with 40 hidden nodes and 256 connections was encoded by a CPPN with
only 5 nodes and 18 connections.
In addition to the maze navigation domain, the approach was also evaluated on a dual
task designed to test multimodal behavior. This task combined two independent scenarios:
(1) a navigation task, where the agent had to move from a starting point to a goal using
only its rangefinder sensors to detect walls, and (2) a food-gathering task, where the agent
relied solely on pie-slice sensors acting as a compass to locate and collect randomly placed
food items. The agent's fitness was defined as the average of its performance in both tasks, and a solution required simultaneously solving both (i.e. navigating successfully and collecting all food items).
The results showed that ES-HyperNEAT solved the dual task in all 20 runs, averaging
33 generations to success. By comparison, the best fixed-substrate HyperNEAT setup
succeeded in only 13 of 20 runs. ES-HyperNEAT also produced more targeted connectivity
between neurons and did so with significantly smaller CPPNs, indicating both greater
efficiency and better support for multimodal problem-solving than the original HyperNEAT
approach.
4.3.6 General Hypernetworks and Dynamic Indirect Encodings
HyperNEAT and its variations are particular examples of a family of algorithms now
called hypernetworks (Ha, A. Dai, and Le, 2017). Hypernetworks generalize HyperNEAT
to any approach in which one network (termed the hypernetwork) generates the weights of
another target neural network. The hypernetwork is typically a smaller network designed
to learn a mapping from a low-dimensional input space to the high-dimensional weight
space of the target network. The target network is the actual network that performs the
main task, such as classification, regression, or controlling an agent. Pioneering work
on hypernetworks goes back to the early 90s, when Schmidhuber (1992) introduced the idea of fast weight programmers, in which a "slow" neural network trained through gradient descent learned the "fast" weights of another network.
Mathematically, given an input 𝑥 to the target network, a hypernetwork 𝐻 takes an auxiliary input 𝑧 and outputs the weights 𝜃TN for the target network. This relationship is expressed as 𝜃TN = 𝐻(𝑧). The target network then uses these weights to perform its task, represented as 𝑦 = 𝑇(𝑥; 𝜃TN), where 𝑥 is the input to the target network, 𝑧 is the auxiliary input to the hypernetwork, 𝜃TN are the weights generated by the hypernetwork, and 𝑦 is the output of the target network.
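The following minimal numpy sketch illustrates this relationship; the hypernetwork is a single random linear map and all dimensions are arbitrary, so this is only a schematic of 𝜃TN = 𝐻(𝑧) and 𝑦 = 𝑇(𝑥; 𝜃TN), not any particular published architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypernetwork H: maps an auxiliary embedding z (e.g. a layer-index embedding)
# to the flattened weights of one target-network layer.
z_dim, in_dim, out_dim = 4, 8, 3
H = rng.normal(scale=0.1, size=(z_dim, in_dim * out_dim))   # H's own parameters

def hypernetwork(z):
    """theta_TN = H(z): generate target-network weights from the embedding z."""
    return (z @ H).reshape(in_dim, out_dim)

def target_network(x, theta):
    """y = T(x; theta_TN): the target network uses the generated weights."""
    return np.tanh(x @ theta)

z = rng.normal(size=z_dim)          # auxiliary input (learnable in practice)
x = rng.normal(size=in_dim)         # task input
theta_TN = hypernetwork(z)
y = target_network(x, theta_TN)
print(theta_TN.shape, y.shape)      # (8, 3) (3,)
```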
In the previous section on HyperNEAT, we saw a special case of such a hypernetwork: one that was geometrically aware, meaning the auxiliary inputs (𝑥, 𝑦) gave node locations in space, and that was trained through NEAT. Other approaches, such as compressed network search (Koutník, Gomez, and Schmidhuber, 2010), do not employ CPPN-NEAT but instead use the discrete cosine transform (DCT) to compress the weights of a larger weight matrix into a smaller number of DCT coefficients, resembling the popular JPEG compression. It is also possible to combine evolving the neural architecture with gradient-based weight training, which was demonstrated in an approach called differentiable pattern producing networks (DPPNs; Fernando, Banarse, M. Reynolds, et al., 2016).
Building on these earlier ideas, modern variants of hypernetworks can also be trained
end-to-end through a gradient-descent-based training approach (Ha, A. Dai, and Le,
2017). This work strikes a balance between the compressed network search approach,
where a DCT prior limits the type of weight matrices that can be produced, and the
HyperNEAT approach, which requires evolving both the architecture and weights through
NEAT (adding significant complexity for many practical problems). These hypernetworks
generate the weights of feedforward networks one layer at a time by conditioning the
hypernetwork on the specific layer embedding (figure 4.16). Layer embeddings can either
be fixed or they can also be learned, allowing the system itself to learn approximate weight
sharing within and across layers. This approach was able to produce the weights for a
deep convolutional network for CIFAR-10 classification, with only a small decrease in
classification accuracy but a drastic reduction in the number of trainable model parameters.
Interestingly, when applying the hypernetwork approach to create the weights for a target
network that was fully-connected, it was able to learn convolutional-like filters when the
location of the target weight and the 𝑥, 𝑦 location of each input pixel was provided.
Importantly, hypernetworks offer the intriguing ability to serve as a dynamic indirect
encoding, in which the produced weight pattern is allowed to change over time and made
dependent on the inputs for the task at hand. For example, a hypernetwork could be trained
to produce the weights of an RNN target network for handwriting sequence generation,
which would change over time and be dependent on the agent's internal state and the inputs
(the previous output of the RNN) (figure 4.17). In other words, the hypernetwork was
taking a low-dimensional representation of the input character and the hidden state of the
RNN as inputs, outputting the weights for the next prediction step. This approach allowed
the RNN to dynamically adapt its parameters based on the current context and is a good
demonstration of how concepts from neuroevolution are being effectively combined with
those from the traditional machine learning field.
Figure 4.16: Static hypernetwork. In this example, the hypernetwork (shown in orange) generates
the weights of each layer of the main network (shown in black) by conditioning the network on layer
embeddings. These embeddings are treated as learnable parameters optimized during training. In
this manner, they enable approximate weight sharing both within and across layers of the main
network. Figure from Ha, A. Dai, and Le (2017).
In summary, hypernetwork-like approaches can significantly reduce the number of trainable parameters while still performing well across different domains. However, it is also clear that their full potential hasn't been fully realized yet and likely depends on combining these techniques with more open-ended search methods (chapter 9) and with lifetime learning approaches (chapter 12) that can take advantage of the encoded regularities for fast adaptation.
The concept of dynamic indirect encodings is closely linked to neural self-attention,
which will be explored in the next section. Self-attention has served as the foundation for
many recent breakthroughs in deep learning, most notably the transformer architecture.
In this approach, larger input-dependent weight matrices are created through the outer
product of two smaller matrices called keys and values. As will be seen in the next section,
this type of indirect encoding allows encoding a matrix 𝐴 of size 𝑂(𝑛²) using only 𝑂(𝑑) genotype parameters.
4.4 Self-attention as Dynamic Indirect Encoding
In the preceding section, we explored the concept of hypernetworks, illustrating their role as indirect encoding methods where one neural network, the hypernetwork, generates the weights for another network, termed the target network. Typically, hypernetworks generate these weights without directly considering the specific input 𝑥 to the target network. Transitioning from this, we introduce the concept of self-attention mechanisms, which embody a sophisticated method of dynamically generating contextual relationships within data. Unlike hypernetworks, self-attention mechanisms inherently account for the input 𝑥 during the processing phase, tailoring the computational focus in a data-driven manner. This capability not only allows self-attention to act as a form of indirect encoding but also makes it a dynamic encoding process. The dynamic nature arises from its ability to adjust the internal model representations in response to the particularities of the input data at any given moment, thereby offering a more flexible and context-aware approach to encoding information.

Figure 4.17: Application of dynamic hypernetworks for handwriting sequence generation. In the dynamic indirect encoding approach, the hypernetwork takes as input the internal state of the neural network and its previous action to dynamically generate the weights of the RNN target network (shown as four different colors). In this manner, the dynamic hypernetwork approach enables the model to adapt its parameters on the fly, allowing for highly flexible and context-aware handwriting generation. Figure from Ha, A. Dai, and Le (2017).
4.4.1 Background on Self-Attention
The attention mechanism (Vaswani, Shazeer, Parmar, et al., 2017), a groundbreaking
innovation in the field of neural networks, particularly in natural language processing, has
revolutionized how models handle and interpret sequential data like text and time series.
At its core, attention allows a model to focus on different parts of the input sequence
when producing each part of the output sequence, mimicking the human cognitive process
of focusing more on certain aspects while perceiving or processing information. The
introduction of attention mechanisms in transformer-based architectures like LLMs has
led to substantial improvements in various complex tasks in language understanding and
generation.
While modern attention mechanisms can adopt various configurations, including
positional encoding and scaling, their fundamental concept can be described by the
following equations:
𝐴 = softmax( (1/√𝑑) (𝑋q𝑊q)(𝑋k𝑊k)ᵀ )    (4.3)

𝑌 = 𝐴 × (𝑋q𝑊v)    (4.4)
where 𝑊q, 𝑊k, 𝑊v ∈ ℝ^(𝑑in×𝑑) are the matrices that map the input matrix 𝑋 ∈ ℝ^(𝑛×𝑑in) to components called query, key, and value (i.e., query = 𝑋q𝑊q, key = 𝑋k𝑊k, value = 𝑋q𝑊v). Since the average value of the dot product grows with the vectors' dimension, each entry in the query and the key matrices can become disproportionately large if 𝑑 is large. To counter this, the factor 1/√𝑑 is used to normalize the inputs. The attention matrix 𝐴 ∈ ℝ^(𝑛×𝑛) is obtained by applying a nonlinear activation function, typically a softmax operation, to each row of the matrix. This mechanism is referred to as self-attention when 𝑋q = 𝑋k; otherwise it is known as cross-attention.
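A minimal numpy implementation of equations (4.3) and (4.4), using random matrices in place of trained parameters, might look as follows:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Minimal self-attention following equations (4.3) and (4.4),
    with X_q = X_k = X (self-attention rather than cross-attention)."""
    d = W_q.shape[1]
    Q, K, V = X @ W_q, X @ W_k, X @ W_v           # query, key, value
    scores = (Q @ K.T) / np.sqrt(d)               # scaled dot products
    scores -= scores.max(axis=1, keepdims=True)   # for numerical stability
    A = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax
    return A @ V, A                               # Y = A (X W_v), and A itself

rng = np.random.default_rng(0)
n, d_in, d = 6, 16, 4                             # sequence length, input dim, head dim
X = rng.normal(size=(n, d_in))
W_q, W_k, W_v = (rng.normal(scale=0.1, size=(d_in, d)) for _ in range(3))
Y, A = self_attention(X, W_q, W_k, W_v)
print(A.shape, Y.shape)   # (6, 6) (6, 4)
```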
4.4.2 Self-Attention as a Form of Indirect Encoding
As we described previously, indirect encoding methods represent the weights of a neural
network, the phenotype, with a smaller set of genotype parameters. How a genotype
encodes a larger solution space is defined by the indirect encoding algorithm. As we
have seen, HyperNEAT encodes the weights of a large network via a coordinate-based
CPPN-NEAT, while compressed network search (Koutník, Cuccu, Schmidhuber, et al.,
2013) uses the discrete cosine transform (DCT) to compress the weights of a large weight
matrix into a small number of DCT coefficients, similar to JPEG compression. Due to
compression, the space of possible weights that an indirect encoding scheme can produce
is only a small subspace of all possible combinations of weights. The constraint on
the solution space resulting from indirect encoding enforces an inductive bias into the
phenotype. While this bias determines the types of tasks that the network is naturally suited
to doing, it also restricts the network to a subset of all possible tasks that an unconstrained
phenotype can (in theory) perform.
Similarly, self-attention enforces a structure on the attention weight matrix 𝐴 that also makes it input-dependent. If we remove the query and the key transformation matrices, the outer product 𝑋q𝑋kᵀ defines an association matrix whose elements are large when two distinct input terms are in agreement. This type of structure forced on 𝐴 has been shown to be suited for associative tasks where the downstream agent has to learn the relationship between unrelated items. If this sounds familiar, it is not surprising; we have seen a similar mechanism already in Hebbian learning (section 4.2.3). Self-attention and Hebbian learning both emphasize correlation and amplify related signals: Hebbian learning through permanent weight changes, attention through temporary, context-dependent weights. The similarity matrix in attention acts like a Hebbian correlation matrix, but instead of structural updates, attention applies these correlations on the fly, making it a dynamic mechanism.
Because the outer product 𝑋q𝑋kᵀ has no free parameters, the corresponding matrix 𝐴 will not be suitable for arbitrary tasks beyond association. The role of the small query and key transformation matrices (i.e., 𝑊q and 𝑊k) is to allow 𝐴 to be modified for the task at hand. 𝑊q and 𝑊k can therefore be viewed as the genotype of this indirect encoding method. 𝑊q, 𝑊k ∈ ℝ^(𝑑in×𝑑) are the matrices that contain the free parameters, and 𝑑in is a constant depending on the inputs. The number of free parameters in self-attention is therefore on the order of 𝑂(𝑑), while the number of parameters in 𝐴 is on the order of 𝑂(𝑛²). This form of indirect encoding allows us to represent the phenotype with a much smaller set of trainable genotype parameters. Additionally, this type of indirect encoding dynamically adapts to various inputs.
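A quick back-of-the-envelope calculation, with illustrative numbers not taken from the text, makes this gap tangible:

```python
# Genotype vs. phenotype size for self-attention as an indirect encoding
# (hypothetical sizes): n input elements, input dim d_in, feature dim d.
n, d_in, d = 1024, 48, 16
genotype = 2 * d_in * d        # free parameters in W_q and W_k
phenotype = n * n              # entries of the input-dependent attention matrix A
print(genotype, phenotype)     # 1536 vs. 1048576
```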
Building on the concepts discussed in the previous section, we formulated the output of a hypernetwork 𝐻 as 𝜃TN = 𝐻(𝑧), where 𝜃TN are the parameters for a target network (TN) and 𝑧 is an auxiliary input (e.g. layer index). In a similar vein, self-attention can be conceptualized as 𝜃TN = 𝑆𝐴(𝑥), where 𝑥 is the target network's input. This adaptation allows for a more flexible and responsive model configuration, tailored to specific input characteristics and demands.
Furthermore, the aforementioned dynamic adaptation mechanism in self-attention, which allows real-time modulation of connection strengths based on input, also echoes the concept of fast weights (Schmidhuber, 1992), which introduced the idea of rapidly adaptable weights that could temporarily store information over short sequences. Similarly, self-attention leverages dynamic encoding to adjust the attention matrix 𝐴, effectively using 𝑊q and 𝑊k to reshape the network's responses based on the input characteristics. This adaptability is critical for tasks where the relevance of specific input features varies markedly across contexts, akin to how fast weights facilitate short-term synaptic plasticity for rapid learning adaptation.
This comparison between attention mechanisms and classical indirect encoding
suggests that both approaches may be tapping into a shared underlying principle. That
is, the use of compact and ŕexible representations to dynamically generate context-
sensitive behavior. While attention mechanisms were developed independently within
the supervised learning paradigm and indirect encodings grew out of evolutionary and
biological inspirations, their convergence reflects a broader computational strategy, which
aims to reduce dimensionality while retaining expressiveness and adaptability. Rather
than being entirely distinct, these approaches may represent complementary rediscoveries
of a general design principle.
4.4.3 Self-Attention Based Agents
AttentionAgent (Tang, D. Nguyen, and Ha, 2020) is inspired by the concept of inattentional blindness: a phenomenon where the brain, when engaged in effortful tasks, focuses its attention on task-relevant elements while temporarily ignoring other stimuli. Leveraging this principle, the agent employs an attention-based mechanism for video game play, improving interpretability through pixel-space reasoning, as illustrated in figure 4.18. This approach is grounded in self-attention (specifically, 𝑋k = 𝑋q), with cropped game screen image patches serving as inputs. Key modifications to the attention mechanism in AttentionAgent include: (1) condensing the attention matrix into an importance vector, and (2) omitting the value component in favor of extracting the top-𝑘 (𝑘 = 10 in the paper) most significant patch features as the output 𝑌. This extraction is achieved through sorting and pruning, detailed in figure 4.19 and the paragraphs below.
Concretely speaking, given an input game screen, AttentionAgent segments the input image into small square patches in a fashion similar to how a 2D convolution layer works. It then flattens these patches and treats the resulting output of shape 𝑁 × 𝐶𝑀² as the input 𝑋 ∈ ℝ^(𝑛×𝑑in) (figure 4.19, left). Here 𝑁 is the number of patches, 𝐶 is the number of channels in the image, and 𝑀 is the length/width of each patch; therefore 𝑛 = 𝑁 and 𝑑in = 𝐶𝑀².
Upon receiving this transformed data, the self-attention module follows the equations we mentioned above to get the attention matrix 𝐴 of shape (𝑁, 𝑁). After the softmax, each row in 𝐴 sums to one, so the attention matrix can be viewed as the result of a voting mechanism between the patches. If each patch can distribute fractions of a total of 1 vote to other patches (including itself), row 𝑖 thus shows how patch 𝑖 has voted, and column 𝑗 gives the votes that patch 𝑗 acquired from others. In this interpretation, entry (𝑖, 𝑗) in 𝐴 is regarded as how important patch 𝑗 is from patch 𝑖's perspective. Taking sums along the columns of 𝐴 results in a vector that summarizes the total votes acquired by each patch, and this vector is called the patch importance vector (figure 4.19, middle).
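The following sketch condenses these steps, assuming non-overlapping patches and random matrices in place of the CMA-ES-evolved parameters; all function names are hypothetical:

```python
import numpy as np

def patch_importance(image, M=8, d=4, K=10, seed=0):
    """Sketch of AttentionAgent's patch selection: split the image into patches,
    compute the self-attention matrix, sum its columns into an importance vector,
    and keep the indices of the top-K patches. W_q and W_k are random stand-ins
    for the parameters that would normally be optimized with CMA-ES."""
    H, W, C = image.shape
    patches = [image[i:i + M, j:j + M].reshape(-1)        # flatten each M x M patch
               for i in range(0, H, M) for j in range(0, W, M)]
    X = np.stack(patches)                                  # shape (N, C * M * M)
    rng = np.random.default_rng(seed)
    W_q = rng.normal(scale=0.1, size=(X.shape[1], d))
    W_k = rng.normal(scale=0.1, size=(X.shape[1], d))
    scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)
    A = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax
    importance = A.sum(axis=0)                             # column sums = votes received
    return np.argsort(importance)[-K:][::-1]               # indices of top-K patches

frame = np.random.rand(64, 64, 3)                          # stand-in for a game frame
print(patch_importance(frame))
```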
Figure 4.18: Demonstrating indirect encoding in AttentionAgent for enhanced interpretability. White patches on the game screens signify the agent's focus areas, with their opacity indicating the relative importance of each patch. The approach was tested on two games. (top) CarRacing-v0 requires top-down car racing from a pixel-observation environment. (bottom) In the DoomTakeCover environment, enemy monsters spawn randomly along the opposite wall and shoot fireballs, which the player has to learn to avoid. Agents are able to selectively focus on a small, survival-critical portion of their visual input, resulting in interpretable agents that are both compact and more generalizable. In CarRacing, the agent primarily attends to road boundaries but shifts its focus to upcoming turns before adjusting its heading. In DoomTakeCover, the agent concentrates on fireballs and monsters, aligning well with human intuition. Figure from Tang, D. Nguyen, and Ha (2020). Videos at https://neuroevolutionbook.com/demos.
Unlike the self-attention we introduced earlier, AttentionAgent relies solely on the patch importance vector and does not utilize the value component of self-attention. Finally, based on the patch importance vector, AttentionAgent picks the 𝐾 patches with the highest importance and throws away the rest. It passes the indices of these 𝐾 patches into a feature retrieval function, which returns the features extracted from the corresponding patches. These features are then fed into a neural network-based controller to output the appropriate actions the agent should take (figure 4.19, right). By discarding patches of low importance, AttentionAgent becomes temporarily blind to other signals, which effectively creates a bottleneck that forces it to focus on patches only if they are critical to the task. Once learned, it is possible to visualize the 𝐾 patches and have the agent's reasoning interpreted in pixel space. Given the non-differentiable nature of the sorting and pruning operations, AttentionAgent is optimized using CMA-ES.
The major building block of AttentionAgent is the self-attention mechanism. Although slightly modified in this context (i.e. the value component is not utilized), as we have established previously, the indirect-encoding nature of the mechanism remains the same. More explicitly, the patch importance vector is based on the attention matrix 𝐴, which is the phenotype that is controlled by the two parameter matrices 𝑊q and 𝑊k, the genotype.

Figure 4.19: Method overview of AttentionAgent. Key modifications to the attention mechanism include (1) condensing the attention matrix into an importance vector, and (2) omitting the value component in favor of extracting the top-𝑘 most significant patch features as the output 𝑌. In this manner, the architecture allows the agent to focus on information that is critical to the task at hand. Figure from Tang, D. Nguyen, and Ha (2020).
The advantages of employing indirect encoding in this context are clear: First, for an input image of size 𝑛 (which can be substantial, e.g. 100px × 100px, translating to tens of thousands of pixels), the attention matrix spans a space of size 𝑂(𝑛²). Conversely, 𝑊q and 𝑊k transition image patches from 𝑑in = 3 (representing RGB colors) to a lower feature dimension 𝑑 ≪ 𝑛, resulting in a more manageable size of 𝑂(𝑑). Despite this significant reduction in representation space, the inductive bias inherent in the model's design enables the genotype to effectively map to a set of phenotypes that are pertinent to the task at hand.
The AttentionAgent approach was evaluated on two tasks. The first one is CarRacing-v0, a 2D continuous control benchmark: the agent must drive through procedurally generated tracks from a top-down perspective. The car is controlled with three continuous commands (gas, steer, brake). The game provides a 64 × 64 RGB image at each time step. The agent is rewarded for covering track tiles efficiently while minimizing time and avoiding leaving the track. The second task is DoomTakeCover, a 3D first-person survival challenge that is part of the VizDoom open-source AI research platform (Kempka, Wydmuch, Runc, et al., 2016), repurposing the classic video game Doom (id Software, 1993). In this task, the agent views the world from a first-person 3D perspective and must survive by dodging fireballs launched by monsters. As time progresses, more monsters appear, with the episode ending when the player dies. The only actions available are strafing left, right, or standing still, and the agent receives a small reward (+1) for every frame it stays alive. The visual input again consists of 64 × 64 RGB images.
AttentionAgent was able to solve these complex problems with only a few thousand
parameters, unlike other methods, which may require hundreds of thousands or even
millions of parameters. The dynamic adaptive capability of self-attention allowed
AttentionAgent to flexibly adjust its decision-making based on the received inputs,
resulting in more robust decisions that are not susceptible to external distractions such as
Figure 4.20: Visual variations to the CarRacing and VizDoom:TakeCover environments. The original domains are shown on the left. Different modifications are shown to the right. The CarRacing environments were modified with (1) color perturbation, (2) vertical frames, and (3) a background blob. The VizDoom:TakeCover environments were modified with (1) higher walls, (2) different floor texture, and (3) hovering text. Because of the dynamic adaptive capability of self-attention, the AttentionAgent is unaffected by these different types of external distractions. Figure from Tang, D. Nguyen, and Ha (2020).
Table 4.1: Comparison of Attention Mechanism and Classical Indirect Encoding.

Feature            | Attention Mechanism                 | Indirect Encoding
Representation     | Relationships as dynamic weights    | Rules or compressed instructions
Scalability        | Scales with input length            | Scales with system complexity
Decoding Process   | Weighted sum for context vectors    | Generative or constructive process
Abstraction Focus  | Relevant relationships dynamically  | High-level patterns / reusable modules
changed background colors or hovering text on the screen (see figure 4.20 for examples).
To summarize, the attention mechanism exemplifies the principles of indirect encoding
by representing relationships and interactions in a compact, abstract manner. Instead of
explicitly modeling all possible connections within an input, attention dynamically encodes
relevance through weights that guide the construction of context-sensitive representations.
This mechanism shares key attributes with classical indirect encoding, such as scalability,
generalization, and adaptability, making it a modern realization of these longstanding
principles. Table 4.1 summarizes the comparison, which highlights how attention
encapsulates the essence of indirect encoding while introducing innovations tailored to
modern ML problems.
In progressing through the book, it becomes clear that the same underlying concepts,
such as encoding principles, can be manifested in diverse ways across different systems.
Just as indirect encoding enables the discovery of varied designs in evolutionary systems,
ML methods can also benefit from mechanisms that foster diversity in representations and
solutions, which is the topic of the next chapter.
4.5 Chapter Review Questions
1. Direct vs. Indirect Encoding: What is the primary difference between direct and indirect encodings in neuroevolution? Why is indirect encoding particularly advantageous for tasks requiring large and complex neural networks?

2. Biological Analogy: How does the process of morphogenesis in biology inspire the concept of indirect encodings in neuroevolution? Provide an example of a biological principle that aligns with the goals of indirect encoding.

3. Regularity in Neural Networks: Why is the concept of regularity, such as symmetry and repetition with variation, important in indirect encodings? How does this principle enhance the efficiency and functionality of evolved solutions?

4. Applications of Indirect Encodings: How can indirect encodings be applied to a task such as evolving a quadrupedal robot controller? Discuss how they can utilize patterns and symmetries without manual intervention.

5. Challenges of Direct Encoding: Why is NEAT limited to smaller networks, and how do indirect encodings address this limitation? Provide an example illustrating how indirect encodings can simplify the representation of a complex neural network.

6. Hypernetworks Overview: What distinguishes hypernetworks from traditional local interaction-based indirect encodings? How does the "one-shot" generation of phenotypes make hypernetworks different from development-based approaches?

7. CPPNs in Neuroevolution: How do CPPNs leverage geometric space and function composition to generate complex patterns? Provide an example of a regularity that CPPNs can encode effectively.

8. HyperNEAT Substrate: Explain how HyperNEAT utilizes neuron positions in a geometric space to generate connectivity patterns. Why is this approach particularly advantageous for tasks involving spatial regularities like controlling a quadrupedal robot?

9. Strengths and Limitations: In what types of tasks do HyperNEAT and CPPNs perform better compared to direct encodings like NEAT? Conversely, what are the limitations of these indirect encodings when applied to irregular or noisy domains?

10. Self-attention: Describe the relationship between self-attention and indirect encodings. How does the AttentionAgent leverage this principle to process high-dimensional visual input efficiently and interpretably? What advantages does this indirect encoding approach offer in terms of parameter efficiency and robustness?
Chapter 5
Utilizing Diversity
A most remarkable outcome of biological evolution is the tremendous diversity of solutions
it has produced. There is life in a large variety of environments: organisms thrive in
extreme heat and cold, in thin atmospheres and under deep-ocean pressure, on large and small scales,
based on a variety of energy sources and chemical building blocks. The mechanisms that
produce such diversity make it possible to both construct complex solutions over time and
to adapt to the changing world. As a matter of fact, a new challenge can often be met by
small modifications to already existing solutions, leading to the observation that evolution
is a tinkerer (F. Jacob, 1977).
The same is true of computational evolution: generating and maintaining diversity
makes it possible to solve harder problems. Diversity does not arise naturally in most
evolutionary methods but requires special mechanisms. Such methods usually focus on
genetic diversity; however, with neuroevolution, behavioral diversity has an important
role as well. This perspective leads to methods of balancing performance and diversity
objectives, as will be discussed in this chapter.
5.1 Genetic Diversity
Evolutionary computation is often formalized as a process of finding an optimum in a
fitness landscape. The process starts with an initial population that is widespread on the
landscape and gradually converges around the highest peaks in it. In this sense, loss of
diversity is an essential part of the process: It allows allocating search resources where
they matter the most, eventually refining the solutions so that the best ones can be found
reliably and accurately.
However, the process may sometimes converge too soon, before all the promising peak
areas have been discovered. Some of the best solutions may have narrow basins and may
thus be missed. Such premature convergence is difficult to detect and guard against. Also,
if the problem is dynamic, i.e. the fitness landscape changes over time, the converged
population cannot keep up. Once the population has converged, there is little hope of
finding anything better, or anything new.
The reason is that the most powerful and unique mechanism of evolutionary search,
recombination, no longer works in a converged population. If all solutions are similar,
recombining them generates nothing new, and progress stops. Mutation still remains, and
can in principle create new material. However, without an effective crossover, the process
is essentially reduced to random search.
Thus, most evolutionary computation methods today are in direct conflict with diversity.
The methods aim at making progress in a greedy manner, with a strong selection that
converges quickly. As will be discussed in section 9.1.1, this is not the case in biological
evolution. The selection is weak; many genetic changes are neutral and remain in the
population for a long time. Slowing down the process in this manner may result in more
diversity and creativity. This is also an option in evolutionary computation, but it has not
yet been fully explored. Taking advantage of weak selection, neutrality, and deep time is
an interesting direction for the future.
The simplest approach to coping with premature convergence is to increase the
mutation rate. If it is done early enough, it may give crossover enough material to operate.
However, this material is essentially random and, at large enough levels, will undermine
evolutionary search. Another straightforward approach is to extend the current population
with an archive of representative past individuals. The archive ensures that diversity is not
lost, but it is infeasible to grow the archive indefinitely, and it is difficult to decide which
individuals should be included in it.
Another brute-force but effective approach is delta-coding (Gomez and Miikkulainen,
1997; Whitley, Mathias, and Fitzhorn, 1991). If evolution stagnates with no further
increases in fitness, the current population champion is used to create a population of Δ-chromosomes, i.e. differences from the current best solution. This population is then evolved further, with solutions formed by adding the Δ-values to the best solutions.
Delta-coding can be applied multiple times, with successive populations representing
differences from the previous best solution. Thus, if evolution stagnates due to premature
convergence, delta-coding may get it moving again.
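As a rough illustration (all function and parameter names here, as well as the Gaussian initialization, are assumptions rather than the original formulation), such a restart could be sketched as follows:

import numpy as np

def delta_coding_restart(champion, pop_size, delta_scale=0.1, rng=None):
    """Create a population of delta-chromosomes around a stagnated champion.

    Each individual encodes a difference vector; the candidate solution it
    represents is champion + delta.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Small random differences from the current best solution (assumed
    # Gaussian here; the original formulation may differ).
    return rng.normal(0.0, delta_scale, size=(pop_size, champion.size))

def decode(champion, delta):
    # A solution is formed by adding the delta-values to the best solution.
    return champion + delta

The delta population is then evolved with the usual operators, evaluating decode(champion, delta); if search stagnates again, the new champion can seed the next round.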
In this manner, evolutionary computation relies on mechanisms that are added to search
for the purpose of maintaining diversity. The first challenge in building such mechanisms
is to measure diversity. At the level of genetic encodings, this can be done with a distance metric between genomes. Genomes are often vectors of values, so Euclidean distance (L2) is usually sufficient; Manhattan distance (L1), Hamming distance, or edit distance may also work in various cases. With such a distance metric, diversity can be measured as the
average distance between genomes in the population.
Diversity measures can be further focused on a local area of the space, such as the k nearest neighbors. Such an approach is useful when it is important to identify which individuals in the population contribute to diversity more than others; those individuals can then be kept in the population or the archive longer.
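As a minimal sketch (function names are illustrative), both the population-level measure and the per-individual k-nearest-neighbor variant can be computed from real-valued genomes as follows:

import numpy as np

def population_diversity(genomes):
    """Average pairwise Euclidean (L2) distance between genome vectors."""
    genomes = np.asarray(genomes, dtype=float)
    dists = np.linalg.norm(genomes[:, None, :] - genomes[None, :, :], axis=-1)
    n = len(genomes)
    return dists.sum() / (n * (n - 1))       # exclude the zero self-distances

def knn_diversity_contribution(genomes, k=5):
    """Average distance of each individual to its k nearest neighbors."""
    genomes = np.asarray(genomes, dtype=float)
    dists = np.linalg.norm(genomes[:, None, :] - genomes[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # ignore distance to self
    nearest = np.sort(dists, axis=1)[:, :k]  # k closest others per individual
    return nearest.mean(axis=1)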
Several methods have been developed to take advantage of these measures. In crowding
(De Jong, 1975), new individuals are allowed to replace existing individuals that are
similar to them, or their parents. Note that this mechanism does not drive the creation of
diversity, but slows down convergence: it is not as easy for similar individuals to take over
the population.
Section 3.3 on NEAT already described one mechanism that can help promote diversity:
fitness sharing. In fitness sharing (Goldberg and Richardson, 1987), the actual fitness of
an individual is adjusted based on how similar it is to other individuals in the population.
More specifically, the fitness f(x) of individual x is adjusted by

$$ f'(x) = \frac{f(x)}{s(x)}. \qquad (5.1) $$

The similarity metric s is e.g.

$$ s(x) = \sum_{j=1}^{n} d(x, y_j), \qquad (5.2) $$

where the distance d(x, y_j) is taken over all n members y_j of the population. In this
manner, the fitness is reduced for individuals that are similar to many other individuals
in the population. The adjustment makes them less likely to be chosen as parents and
more likely to be discarded, thus slowing down convergence. The similarity metric is
expensive to calculate. It can be made more practical by reducing the calculation to a local
neighborhood, or to a sampling of the population.
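As an illustration, fitness sharing can be implemented along these lines; the sketch below uses the common triangular sharing function as the similarity measure (an implementation choice, not necessarily the exact formula above) and computes it over the full population:

import numpy as np

def shared_fitness(fitnesses, genomes, sigma=1.0):
    """Adjust each fitness by a niche count built from a triangular sharing
    function: individuals with many close neighbors are penalized."""
    genomes = np.asarray(genomes, dtype=float)
    fitnesses = np.asarray(fitnesses, dtype=float)
    dists = np.linalg.norm(genomes[:, None, :] - genomes[None, :, :], axis=-1)
    # Similarity is 1 at distance 0 and falls to 0 at distance sigma.
    similarity = np.clip(1.0 - dists / sigma, 0.0, None)
    niche_count = similarity.sum(axis=1)     # includes self-similarity of 1
    return fitnesses / niche_count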
Fitness sharing in some domains can be implemented implicitly, avoiding the extra
computation. In particular in cooperative coevolution (discussed in detail in section 7.1),
solutions are constructed by combining individual population members into a single
structure, such as a neural network composed of several neurons (Moriarty and Miikku-
lainen, 1997; Potter and De Jong, 2000). The entire solution is evaluated for fitness; the
individual’s fitness is the average fitness of all solutions in which it participated. It turns
out that good solutions are usually composed of diverse individuals. If, for instance, a
neural network is put together from a single neuron cloned many times, it would likely
not perform well. Thus, evolution in cooperative coevolution populations maintains
diversity as part of the evolution process itself. If one kind of neuron starts taking over
the population, it will be selected too many times for the network, the network performs
poorly, the neuron receives lower fitness, is likely to be discarded, and diversity returns.
Thus, by making diversity implicitly part of the fitness evaluation, it can be maintained
automatically.
Further, when evolving neural networks, genetic diversity is often less important than
the diversity of the behavior the networks generate. This perspective will be discussed
next.
5.2 Behavioral Diversity
It is important to maintain genetic diversity in evolution so that the search process can
cover enough of the search space to find good solutions, and can adapt to any changes
in the landscape. This goal is important in neuroevolution as well, and genetic diversity
maintenance methods are useful in it. However, neuroevolution is different from many other
types of evolutionary optimization in that it aims to construct computational structures, i.e.
neural networks, rather than static solutions. It is important that the behaviors of those
networks are diverse as well. In many such domains, the fitness landscapes are deceptive,
i.e. the highest peaks are surrounded by valleys, or they are flat, i.e. many different
behaviors lead to similar fitness. Methods that rely on hill-climbing, i.e. incremental
improvement through small changes, such as reinforcement learning and mutation-based
search, struggle in such domains. They are difficult for neuroevolution as well, but search
based on behavioral diversity makes it more effective.
Creating and maintaining genetic diversity does not necessarily lead to diverse
behaviors. The reason is that the mapping between the genotype and behavior is complex
and unpredictable. First, the same behavior can be encoded by very different neural
networks. One example of this phenomenon is competing conventions, which we already
encountered in section 3.3.1: The same neurons and weights in the network are encoded
in a different order in the genome. As a result, the networks function exactly the same,
but the encodings have no overlap, i.e. are maximally diverse. Second, a small change in
the encoding can have a large effect on the behavior. Negating an activation function, for
example, may cause the robot to turn left instead of right. Genetic diversity is thus not a
good indicator of behavioral diversity.
Evolution of behaviors still takes place at the level of encodings, of course, and the
genetic diversity needs to be maintained to prevent convergence. However, the mechanisms
for measuring, maintaining, and creating behavioral diversity are quite different, resulting
in fundamentally different evolutionary processes.
Whereas genetic diversity could be measured in a relatively straightforward manner
based on the distance between encodings, behavioral diversity is more complex. First,
behavior needs to be characterized formally, taking into account what matters in the
domain. This often involves creating a vector representation of the behavior, or a behavior
characterization (BC; Lehman and Stanley, 2011a; Mouret and Doncieux, 2012). For
instance, for a mobile robot, the BC could consist of a histogram of the sensory inputs,
actions, and locations encountered during a number of sample runs. More generally, a
collection of possible inputs to the network could be created, and the outputs corresponding
to each of these inputs taken as the BC. If domain knowledge is not available, they can be
generated randomly. With domain knowledge, it may be possible to define a collection of
situations that forms a representative sample, or better yet, a sample of the most important
decision points in the domain, thus creating a more meaningful BC (Gomes, Urbano, and
Christensen, 2013; Lehman and Stanley, 2011a; Mouret and Doncieux, 2012; Stanley and
Lehman, 2015).
It is difficult to form such a BC for recurrent neural networks where not only the
current inputs matter, but also the history of the preceding inputs and actions. A common
approach is to represent the actions as distributions, and the BC as a mapping: for a
set of sensory states, it specifies the distribution of actions the agent is likely to take.
Interestingly, with such a representation, it is possible to learn optimal BCs (Meyerson,
Lehman, and Miikkulainen, 2016) for a set of multiple tasks in the same domain, such
as robot navigation in multiple mazes. The BCs are adapted so that they represent the
distributions of optimally behaving agents in known tasks, forming a powerful foundation
for evolution of optimal behavior in new tasks.
Once BCs have been defined, the next step is to measure diversity among them. As
in the case of genetic diversity, calculating the average distance between individuals is
a common approach. A more formal way is to utilize entropy, an information-theoretic
concept that measures the level of surprise or uncertainty in the outcomes of a random
variable. Intelligent behavior in general can be described as resulting from entropy
maximization (Wissner-Gross and Freer, 2013). In evolutionary computation, it can be
applied to the behavior of an agent or a population of agents, thus describing how diverse
they are. For instance, the behavioral space can be divided into discrete intervals, and
the number of agents visiting each interval counted (Kang, Bei, Shen, et al., 2021). The
entropy of this distribution then measures the behavioral diversity of the population.
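As a rough sketch (the bin layout and names are assumptions), such an entropy-based diversity measure can be computed by discretizing the behavior space and counting visits:

import numpy as np

def behavioral_entropy(behavior_points, bins=10):
    """Entropy of the distribution of agents over discrete behavior bins.

    behavior_points: array of shape (num_agents, num_behavior_dims), e.g.
    final positions or other behavior characterizations of each agent.
    """
    behavior_points = np.asarray(behavior_points, dtype=float)
    # Count how many agents fall into each cell of a regular grid.
    counts, _ = np.histogramdd(behavior_points, bins=bins)
    probs = counts.flatten() / counts.sum()
    probs = probs[probs > 0]                  # ignore empty cells
    return -np.sum(probs * np.log(probs))     # higher = more diverse behavior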
The information-theoretic approach can be developed further to measure empowerment, i.e. the ability of an agent to control its world (Salge, Glackin, and Polani, 2014). Empowerment can be defined as the channel capacity between the agent's actuators A_t at time t and its sensors S_{t+1} at the next time step:

$$ E = \max_{p(a_t)} I(S_{t+1}; A_t), \qquad (5.3) $$

where p(a_t) is the probability of actuator value a_t at time t and I(S; A) is the mutual information between S and A, i.e.

$$ I(S; A) = H(A) - H(A|S) = H(S) - H(S|A), \qquad (5.4) $$

where H(X) is the entropy of X. The I(S; A) thus measures how much of the state entropy measure above can be explained by actions. The resulting metric, channel capacity, stands for the maximum rate of information transmission from A to S. In essence, empowerment E thus measures the causal influence of the agent's actions on its future sensory inputs, i.e. how much power the agent has in changing the world it perceives. Empowerment is
a useful concept in many ways. It is possible to characterize the evolution of intelligent
agents as a process that maximizes empowerment. Similarly, the evolved agents then
behave in order to maximize their empowerment. Such behavior provides the agents an
intrinsic motivation that results in various goal-oriented behaviors.
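As a hedged sketch (discrete actions and sensor readings assumed; names illustrative), the mutual information in equation 5.4 can be estimated from sampled transitions; maximizing it over action distributions, as equation 5.3 requires, would need an additional optimization (e.g. the Blahut-Arimoto algorithm), which is omitted here, so this estimate is only a lower bound on empowerment:

import numpy as np

def mutual_information(actions, next_states):
    """Estimate I(S_{t+1}; A_t) in nats from paired samples of discrete
    actions and the sensor states they led to, using empirical counts."""
    a_vals, a_idx = np.unique(actions, return_inverse=True)
    s_vals, s_idx = np.unique(next_states, return_inverse=True)
    joint = np.zeros((len(a_vals), len(s_vals)))
    np.add.at(joint, (a_idx, s_idx), 1.0)      # count (action, state) pairs
    joint /= joint.sum()
    p_a = joint.sum(axis=1, keepdims=True)     # marginal over actions
    p_s = joint.sum(axis=0, keepdims=True)     # marginal over next states
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_a * p_s)[mask])))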
Empowerment is thus a general theory of evolution of intelligent behavior. It measures
a general desirable quality of an evolved agent and can be used as an explicit evolutionary
objective. While it does not measure diversity directly, it often correlates with it. Similarly
to implicit fitness sharing described in the previous section, empowerment favors actions
that have a large impact, regardless of other objectives. In that sense, it often serves to
diversify the set of actions that are available for the agents, and thereby leads to diverse
behaviors.
As an example of behavioral diversity at work, consider a task for an evolutionary robot
that moves around in an environment where seven lights are on or off in fixed locations
(figure 5.1; Mouret and Doncieux, 2009). The robot can sense each light, and it can move
around by controlling its two wheels. When it steps on a light, one or two other lights turn
on. The task is to discover how to turn on light 6. In the beginning, only light 0 is on. To
turn on light 6, it has to first go to light 0, then to 4, 5, and 6; or else, go to lights 0, 1, 3,
4, 5, and 6. Fitness is defined as the number of time steps to reach light 6; thus, unless
the robot is successful, it receives no fitness and no indication of whether its behavior is
promising. It is therefore very difficult to discover successful behavior based on fitness
only. Therefore, the evolutionary search for the optimal behavior does not even get started.
(a) Controller (b) Full light sequence (c) Discovered sequence
Figure 5.1: Using behavioral diversity to discover solutions in a domain with a deceptive or flat fitness function. The robot (a) has to move to the lights in the order indicated by the arrows (b) to eventually turn on light 6. Fitness is defined as the number of time steps to reach light 6, and therefore does not indicate which behaviors are promising early on. In contrast, behavioral diversity rewards controllers that turn on more and more lights; thus, it encourages exploration that eventually makes the search successful (c). In this manner, behavioral diversity can be used to guide search even when the fitness function is flat (as in this case) or deceptive (more generally). Figures from Mouret and Doncieux (2009).
However, it is possible to define BC as the collection of lights that are on, such as
1000000, 1001000, 1100000, and so on. An archive of discovered behaviors can then be
formed, and evolution rewarded for exploring new behaviors. In this manner, evolution
quickly discovers movement sequences that result in more lights being turned on, including
eventually light 6. Thus, behavioral diversity makes search effective in this domain where
the fitness function does not provide a hill to climb. In the same manner, behavioral
diversity helps cope with fitness functions that are deceptive, i.e. fitness peaks are located
behind fitness valleys.
This section has introduced and illustrated the fundamentals of behavioral diversity.
The next two subsections push these concepts further in opposite directions: novelty search
aims to maximize exploration and creativity through divergent search, and quality-diversity
methods seek to combine diversity with performance objectives.
5.3 Novelty Search
The previous sections have shown how evolution with behavioral diversity objectives
can discover solutions that are difficult to find. It is possible to take this approach one
step further and make it the only objective of search. That is, the entire aim of evolution
is to keep generating new variation and never converge at all: it is divergent instead of
convergent.
A good motivation for divergent evolution comes from biology. Unlike traditional
evolutionary computation, biological evolution does not have a goal. Variation is generated
continuously, and selection operates upon it. This selection pressure is much weaker than
that used in evolutionary computation, and results in much broader diversity. Evolution can
thus quickly adapt to new situations, taking advantage of niches that confer an advantage
in survival. The results can sometimes seem extremely creative, like the anglerfish, which
lures prey by generating light at the end of a long fin ray (Coleman, 2019), or bacteria that
evolve to utilize citric acid as their carbon source (Blount, Borland, and Lenski, 2008). It
is this kind of creativity that computational divergent search is aimed at capturing.
Divergent search can be formalized within the current evolutionary computation
framework simply by rewarding behavioral diversity instead of performance. This
approach is called novelty search (Lehman and Stanley, 2008; Lehman and Stanley, 2011a;
Stanley and Lehman, 2015). A novelty metric is defined that measures how different a
candidate solution is from solutions that have been generated before, i.e. how novel it is.
This novelty metric then replaces the usual fitness metrics that measure performance in a
task.
A common novelty metric is the sparseness of the behavior space around the individual,
i.e. the average distance to its 𝑘 nearest neighbors. Similarly to Equation 5.2,
$$ \rho(x) = \frac{1}{k} \sum_{j=1}^{k} d(x, y_j), \qquad (5.5) $$

where ρ(x) stands for the novelty of individual x, y_j is the jth nearest neighbor of x, and d is the distance metric between their behavioral characterizations. This novelty is computed
against the current population as well as an archive of prior solutions. The archive is first
initialized randomly, and new individuals are then added to it with a low probability. In
this manner, the archive constitutes a sampling of the behavior space, guiding the search
to new areas.
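A minimal sketch of this novelty computation and the probabilistic archive update (parameter values and names are illustrative):

import numpy as np

def novelty(bc, population_bcs, archive_bcs, k=15):
    """Average distance from a behavior characterization (BC) to its k nearest
    neighbors among the rest of the population and the archive.

    population_bcs is assumed to exclude the individual's own BC."""
    others = np.asarray(list(population_bcs) + list(archive_bcs), dtype=float)
    dists = np.linalg.norm(others - np.asarray(bc, dtype=float), axis=1)
    return np.sort(dists)[:k].mean()

def maybe_add_to_archive(bc, archive_bcs, p_add=0.01, rng=None):
    """New individuals enter the archive with low probability, so the archive
    becomes a sparse sample of the visited behavior space."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p_add:
        archive_bcs.append(np.asarray(bc, dtype=float))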
Novelty search indeed leads to diverse solutions. However, and most remarkably, it
sometimes also discovers solutions that are useful in the domain, even though there is
no provision in the search for preferring them in any way. One potential explanation is
that in order to be most different from what has been created before, it is a good idea to
utilize structure in the domain. That is, search may discover stepping stones that can be
combined effectively into more complex solutions, thus creating more diversity than a
random search.
The motivation for this idea comes from the Picbreeder game (section 8.3), where
human players select the most interesting images and evolution creates more images by
crossing over and mutating the CPPN neural networks that generated them (Secretan,
Beato, D’Ambrosio, et al.,
2011). It turns out that the human players do not usually
have a goal in mind in what they are trying to generate, but instead, use the raw material
serendipitously: They come up with ideas of what to create on the fly, depending on what
interesting shapes and images are currently in the population. For instance, in creating
a skull image, they utilized, over time, many images that looked nothing like the skull.
There were images that could be described as a crescent moon, a comet, a drop on water,
and a mask (figure 5.2a; Woolley and Stanley, 2011). These images served as stepping
stones that eventually came together to generate the skull.
Interestingly, if evolution is set up with the goal of generating the skull image, it fails
(figure 5.2b). The images approach the skull shape overall, but never get the elements
right. Perhaps the evolution of something that complex relies on discovering the proper
stepping stones, i.e. discovering the solutions that represent the prominent structure in the
domain?
Gen 12 Gen 20 Gen 36 Gen 49 Gen 74
(a) Intermediate images in the evolution of the skull image
Run 1 Run 3 Run 7 Run 15 Run 17
(b) Attempts to evolve the skull image directly
Figure 5.2: Stepping-stone-based vs. direct evolution of a skull image. How can a CPPN be evolved to create a particular image, such as the skull? (a) Human players of Picbreeder selected images that looked interesting on their own, without the goal of generating a skull, which emerged serendipitously toward the end of the evolution from these stepping stones. (b) When evolution is directed to evolve the skull image directly with a distance-based fitness, it falls short of most of the details; shown are the final results of five such example runs. In this sense, the discovery of stepping stones is crucial in generating complex solutions. Figures from Woolley and Stanley (2011).
One way to characterize the stepping stones is that they are local maxima in the search
space wrt. a metric different from novelty. This metric could measure how impressive
they are (Lehman and Stanley, 2012), or it could be related to performance in the domain
(Meyerson and Miikkulainen, 2017). Stepping stones can then be identified as those
solutions that dominate other solutions in terms of novelty and fitness (i.e. through
behavioral domination). In this manner, the search discovers global novelty and local
refinement. For instance, in the domain of figure
5.3, neither novelty-based nor fitness-
based search is much better than random search in finding the high fitness region on
the top right. However, the claw-like areas form stepping stones: The fitness increases
horizontally and vertically in each toe, and by combining the end solutions of each toe, it
is possible to jump to the next claw (with superior fitness). A search mechanism that takes
advantage of local fitness and global novelty can utilize such stepping stones and discover
useful solutions in the domain.
Stepping stones can be found in complex real-world domains as well (Lehman and
Stanley, 2011a; Stanley and Lehman, 2015). For instance, consider evolving a controller
network for a bipedal simulated robot (figure 5.4). It is possible to reward the networks
simply by the distance the walker can travel before falling over. Such evolution is driven by incremental progress, and results in movement that is limited and aims to be stable, but
is also vulnerable to disturbances and variations that might occur in the environment. In
contrast, when such walking is evolved through novelty search, many behaviors that have
little to do with walking are discovered, such as falling flat, jumping forward, taking a few steps before falling, and ultimately, leaning forward and moving legs fast to prevent falling. It turns out that such walking is more robust and more effective. It emerged from many different kinds of failures, and avoids them effectively. Evolution utilizes these failures as stepping stones, combining them effectively into more comprehensive solutions.

Figure 5.3: Illustration of search based on stepping stones. In this experiment, a population of points is evolved on the 2D rectangle. Fitness is zero in the background, and increases in each claw from left to right and from bottom to top. The population starts at the bottom left and has to discover the top fitness at the top right. While fitness-based and novelty-based searches are not much better than random, a search method that discovers and utilizes stepping stones performs much better. It discovers the local optima at the end of each finger of the claw-like pattern, and then combines them to make the jump to the next claw. In this manner, stepping stones can be identified as local optima and recombined to make discoveries that would otherwise be difficult to make. For an animation, see https://neuroevolutionbook.com/demos. Figure from Meyerson and Miikkulainen (2017).
Quality diversity methods can be seen as a way to take advantage of stepping stones
in a more general framework. The idea is to combine novelty search with fitness-based
search in a way that allows finding better solutions and finding them faster, presumably
taking advantage of stepping stones along the way. Quality diversity methods will be
discussed in the next section.
5.4 Quality Diversity Methods
Quality diversity (QD; Pugh, Soros, and Stanley, 2016) represents a significant shift
in evolutionary computation. QD is an evolutionary search paradigm that prioritizes
discovering a diverse collection of high-quality solutions, rather than a single optimal
solution. This concept emerged from the observation that natural evolution tends toward
divergence rather than convergence: instead of yielding one "best" species, nature
produces a myriad of different species, each highly adapted to its own niche. In traditional
optimization, evolutionary algorithms are typically used to converge on one top-performing
individual (or a set of trade-off solutions in multi-objective optimization), which can cause premature convergence and loss of diversity. By contrast, QD algorithms seek to maintain and foster diversity in the population while also optimizing performance within behavioral niches. In other words, the goal of QD is to fill the space of possibilities with the best possible example of each type of behavior.

(a) Fitness-based search (b) Novelty search
Figure 5.4: Contrasting the creativity of solutions in convergent and divergent search. Gaits for the bipedal walker are evolved in two ways. (a) Convergent (fitness-based) evolution favors small, safe improvements that allow the robot to travel incrementally further. The resulting gait is rigid and slow and often fails. (b) In contrast, divergent (novelty-based) evolution discovers dynamic behaviors such as falls and jumps that are different from others. They serve as stepping stones in exploring a larger space, which eventually includes robust dynamic gaits. In this manner, superior solutions can be discovered even when (and because!) they are not directly rewarded. For animations, see https://neuroevolutionbook.com/demos.
5.4.1 Motivation and Challenges
This new approach has been called an "illumination" of the search space, as it illuminates
how performance varies across different behaviors or features of solutions. The motivation
for QD algorithms arises from challenges in traditional neuroevolution and optimization.
Many evolutionary runs tend to converge to a single solution that exploits the easiest
path to high fitness, foregoing alternative strategies or morphologies. This convergence
is problematic in deceptive domains, where reaching the global optimum may require
exploring low-fitness intermediary regions that a purely objective-driven search would
avoid. Pioneering work on novelty search, which we discussed in the previous section,
showed that completely removing the objective and rewarding novelty instead can mitigate
convergence and even find global optima in deceptive tasks. However, NS treated diversity
merely as a means to an end (finding a single solution) and did not explicitly value quality
in its diverse outcomes. Quality diversity algorithms take the next step by valuing diversity
as an end in itself, alongside quality.
In QD, the aim is to obtain a maximally diverse collection of behaviors such that
each is as high-performing as possible. This dual focus is often analogized to natural
evolution producing many species each optimally adapted to its niche. The key innovation
is to balance exploration (finding many different behaviors) with exploitation (optimizing
performance within each behavior niche) simultaneously in one evolutionary run. To
enable this, QD methods introduce mechanisms that reward behavioral innovation while
also conducting localized competition within behaviorally defined niches. Importantly,
unlike approaches that return multiple optima by focusing only on peaks of a fitness
landscape, QD measures diversity in terms of behavioral descriptors (also called behavior
characterizations) that the user defines for the domain. The assumption is that all regions
of this behavior space are of interest, not just those near the global optimum. Thus, QD
algorithms strive to cover the entire behavior space at some resolution, reporting the
highest-performing individual found for each region. By prioritizing diversity over pure
quality, QD avoids driving the search away from low-performing regions entirely: even
niches with relatively modest fitness can be maintained if they represent unique behaviors.
Two early realizations of the QD paradigm are novelty search with local competition
(NSLC; Lehman and Stanley, 2011b) and multi-dimensional archive of phenotypic elites
(MAP-Elites; Cully, Clune, Tarapore, et al., 2015; Mouret and Clune, 2015). These
algorithms embody the QD approach by combining the drive for behavioral diversity with
a localized search for performance quality. NSLC and MAP-Elites have demonstrated that
this focus on diversification, rather than pure optimization, can yield impressive results in
various domains, including those where traditional optimization methods fall short.
5.4.2 Novelty Search with Local Competition
To illustrate the usefulness of QD, it can help to look at a domain where both quality
and diversity are important. One such domain is that of evolving virtual creatures, which
should not only have diverse morphologies but also locomote efficiently (figure 5.5). In
contrast to natural evolution, virtual creatures in evolutionary computation experiments
often evolve toward a single dominant morphology, driven by selection mechanisms that
disproportionately reward the easiest-to-exploit designs. Novelty search has been proposed
as a remedy, rewarding divergence from past designs to enhance ecological diversity.
However, focusing solely on novel morphologies can lead to functionally impractical
designs, indicating the necessity of balancing morphological novelty with functionality to
ensure that evolved creatures are not only diverse but also capable of effective performance
within their environments.
To address this problem, novelty search can be combined with a mechanism for local
competition (NSLC; Lehman and Stanley, 2011b), which is motivated by the biological
principle that individuals often compete primarily with others in their local environment
rather than with the entire global population. Novelty search, rewarding uniqueness rather
than just fitness for a task, effectively prevents convergence on premature solutions. Local
competition, simulating a more natural selection environment where creatures compete
against others in their immediate vicinity rather than against a global fitness standard,
promotes performance localized within morphological niches. As we will see, such a dual
approach leads to high diversity while also maintaining the functional capabilities of the
creatures.
NSLC can be implemented using a genetic algorithm where each individual in the
population is assessed both for its novelty and its competitive ability. Novelty is measured
based on a multi-dimensional feature descriptor that quantifies how different an individual
is from the rest of the population and from those stored in an archive of historically
novel individuals. The local competition is implemented by having individuals compete
for survival against a subset of the population within their niche, rather than the entire
population. The genetic representation of the creatures is a type of graph grammatical
encoding (section 4.2.2), in which an evolved genotypic graph structure is unrolled into a
coupled body plan and control policy. Crucially, this encoding supports a wide range of
robot morphologies with diverse body sizes and shapes, making it well-suited for testing
the capabilities of NSLC.
In more detail, competition occurs among the k nearest neighbors in a morphological feature space (e.g. based on Euclidean distance in a space defined by height, mass, and the number of active joints), where k is a fixed parameter that is determined experimentally. Combining novelty and local competition can naturally be achieved with a multi-objective evolutionary optimization algorithm such as NSGA-II (section 2.2.5). In this setup, each individual is evaluated based on two objectives: (1) novelty, the average distance to its k nearest neighbors in morphology space; and (2) local competition score, which is the number of neighbors that the individual outperforms in terms of locomotion fitness. There is one key difference in this implementation from the standard NSGA-II approach. While NSGA-II promotes diversity along the non-dominated front, NSLC replaces that mechanism with a separate objective that explicitly rewards genotypic diversity. This change is justified because both novelty and local competition are inherently relative metrics. Individuals with identical novelty or local competition scores might be grouped together under a Pareto-based diversity scheme, even though they could differ significantly in morphology or performance.
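A minimal sketch of the two NSLC objectives for one individual, assuming precomputed morphology descriptors and locomotion fitnesses (names are illustrative):

import numpy as np

def nslc_objectives(i, morph_descs, fitnesses, k=15):
    """Return (novelty, local_competition) for individual i.

    novelty: average distance to the k nearest neighbors in morphology space.
    local_competition: how many of those neighbors i outperforms in fitness.
    Both objectives are then maximized jointly, e.g. with NSGA-II."""
    morph_descs = np.asarray(morph_descs, dtype=float)
    fitnesses = np.asarray(fitnesses, dtype=float)
    dists = np.linalg.norm(morph_descs - morph_descs[i], axis=1)
    dists[i] = np.inf                        # exclude the individual itself
    neighbors = np.argsort(dists)[:k]
    nov = dists[neighbors].mean()
    local_comp = int(np.sum(fitnesses[i] > fitnesses[neighbors]))
    return nov, local_comp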
In this domain, NSLC led to several beneficial effects. First, the ecosystem of evolved
creatures showed a much higher level of diversity compared to systems evolved with
traditional fitness-only approaches, as is illustrated in figure
5.5. Secondly, the local
competition model ensured that while diversity is maintained, the creatures also developed
the ability for fast locomotion. This method effectively balanced the exploration of the
morphological space (through novelty search) with the exploitation of successful strategies
(through local competition).
5.4.3 MAP-Elites
Multi-dimensional archive of elites (MAP-Elites) distinguishes itself within the QD
domain by explicitly defining niches (Cully, Clune, Tarapore, et al.,
2015; Mouret and
Clune,
2015), a stark contrast to the passive emergence seen in NSLC. MAP-Elites
operates by partitioning the search space into a grid of niches, each defined by specific
feature dimensions that describe meaningful characteristics of possible solutions. These
characteristics are also known as the behavior characterization (BC) and are typically defined by
the user, who also chooses how finely this space should be divided; each cell in this grid
will eventually hold the best solution found for that combination of features.
Initially, MAP-Elites populates the map by generating a set of random candidate
solutions. For each one, it simulates or evaluates the solution to calculate its performance
and determine its feature descriptors. Each solution is then placed into the appropriate
cell in the feature space grid, based on its features. If the cell is empty or the new solution performs better than the one already in that cell, it replaces the existing occupant.

Figure 5.5: Diverse competent morphologies discovered within a typical single run of NSLC. Various creatures are shown that have specialized to effectively exploit particular niches of morphology space. Compared to approaches relying on global competition, NSLC uncovers a greater range of functional morphologies in a single evolutionary run. Creature (a) is a unipedal hopper that is very tall, (b) is a heavy, short crab-like creature, and (c) and (d) are distinct quadrupeds. Creature (c) drives a large protrusion on its back to generate momentum, and (d) has a tail for balance. Figure from Lehman and Stanley (2011b). Videos at https://neuroevolutionbook.com/demos.
Once this initial seeding is done, the main evolutionary process begins. At each
iteration, the algorithm selects one of the already stored solutions from the map. This
solution is then mutated or recombined (if crossover is used) to create a new variant. The
new solution is evaluated to determine its features and performance. Just like before, it is
inserted into the cell corresponding to its features if it is better than the current occupant.
This process continues for a fixed number of evaluations or until a certain convergence
criterion is met. Over time, the algorithm fills more cells of the feature map, continuously
replacing weaker solutions with stronger ones. The search is biased toward discovering
high-performing solutions across a broad range of features, rather than optimizing
performance within a narrow slice of the space. By the end of the run, MAP-Elites
produces a feature-performance map: a landscape showing which combinations of features
yield strong solutions, and what the best-known solutions are for each combination. This
map serves both as a practical tool for selecting from a diverse set of elite solutions, and
as an analytical resource for understanding the structure of the problem domain.
For example, in the domain of locomoting soft robots that we encountered in section 4.3.2, BCs can be defined as the percentage of the robot made from stiff bone
material, and the overall size of the robot, measured by the percentage of filled voxels. If
a new robot exhibits the same percentage of stiff material and filled voxels, it will only
replace the elite if it travels faster (i.e. has a higher locomotion fitness score). This process
ensures that each niche retains the best solution found so far according to the fitness
function, but crucially, also captures a diverse array of solutions across the entire range of
defined features. Listing 5 details the MAP-Elites approach.
Listing 5 Default MAP-Elites algorithm.

def map_elites():
    # Create an empty, N-dimensional map of elites holding
    # solutions and their performances.
    solutions, perfs = create_archive()

    for i in range(num_iters):
        # Create a new solution: random at first, then by selecting
        # and varying an existing elite from the archive.
        if i < num_rand_solutions:
            x = random_solution()
        else:
            x = random_selection(solutions)
            x = random_variation(x)

        # Evaluate the solution and locate its niche in the archive.
        x_feat_desc = feature_descriptor(x)
        x_perf = performance(x)
        elite = get_elite_with_feat(solutions, x_feat_desc)

        # Store the solution if its cell is empty or if it outperforms
        # the current occupant of that cell.
        if elite is None:
            update_archive(solutions, x, perfs, x_perf)
        else:
            elite_perf = get_elite_perf(perfs, x_feat_desc)
            if elite_perf < x_perf:
                update_archive(solutions, x, perfs, x_perf)

    return solutions, perfs
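The domain-specific helpers that the listing assumes can be quite simple. For the soft-robot example, a hypothetical feature_descriptor might look like this (the voxel representation, material codes, and grid resolution are all assumptions for illustration):

import numpy as np

BONE, MUSCLE, EMPTY = 2, 1, 0   # hypothetical material codes per voxel

def feature_descriptor(robot_voxels, grid=20):
    """BC for the soft-robot domain: (fraction of stiff bone material among
    filled voxels, fraction of filled voxels), mapped to archive cell indices."""
    voxels = np.asarray(robot_voxels)
    filled = voxels != EMPTY
    pct_bone = (voxels == BONE).sum() / max(int(filled.sum()), 1)
    pct_filled = filled.mean()
    to_cell = lambda v: min(int(v * grid), grid - 1)
    return (to_cell(pct_bone), to_cell(pct_filled))

The corresponding performance(x) would then simply return the distance the robot travels in simulation.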
The effects of applying MAP-Elites are multi-faceted: First, it preserves a diverse set of
solutions, each excelling in different parts of the feature space. For example, MAP-Elites
managed to evolve a variety of locomoting soft robots, each representing the best of their
respective behavior niche. In contrast, typical evolutionary algorithms tend to converge
on a narrow set of morphologies within a single run, repeatedly finding variations of the
same local optimum and missing out on alternative, high-performing designs that exist
elsewhere in the feature space.
Second, MAP-Elites effectively "illuminates" the search space, providing insights
into how different features of solutions contribute to their success and interrelate with
each other. This is particularly valuable in complex domains where the relationship
between features and performance is not well understood. Two such maps, created by
MAP-Elites, are shown in figure
5.6. Each smaller image shows the best-performing
organism found within a particular niche defined by the two behavioral features mentioned
above (e.g. percentage of voxels filled and proportion of bone material). This diversity is
very useful for robustness and adaptability, as it provides a spectrum of potential solutions
to unforeseen challenges or changes in task requirements. For example, this principle
can allow a robot confronted with damage or environmental change to rapidly adapt by selecting an alternative behavior from its precomputed MAP-Elites archive (Cully, Clune,
Tarapore, et al., 2015).
Figure 5.6: Example maps annotated with example organisms from different areas of the feature space. Figures (a) and (b) show maps of two different MAP-Elites runs; in each, fitness is plotted over the two feature dimensions (% bone and % voxels filled) and annotated with example organisms such as bipeds, tripeds, a two-arm crawler, and a jumper. Within a map, MAP-Elites smoothly adapts a design theme along the desired dimensions of variation. One can see that there is some variation between maps, both in the performance discovered at specific points and in the types of solutions. That said, each map generally paints the same overall picture of the performance capabilities of each region of the feature space. Note the different scale of the bottom color map. Figure from Mouret and Clune (2015).
In summary, both NSLC and MAP-Elites ultimately seek a diverse set of high-
performing solutions, but they do so differently. NSLC uses an implicit niching: niches
form organically as similar individuals compete locally within a single population. MAP-
Elites uses explicit niching: the user defines the niches in advance (the grid), and there is
an archive slot reserved for each niche. The advantage of the MAP-Elites approach is
simplicity and direct control over which aspects of behavior are considered (the dimensions
of the map). Its evolutionary loop is also simpler (single objective acceptance criterion
for each bin). On the other hand, NSLC's implicit approach can be more flexible if the appropriate behavior dimensions are not obvious; it essentially lets evolution discover
niches based on where different solutions arise. NSLC uses continuous evolutionary
dynamics (with a fixed population size each generation), whereas MAP-Elites accumulates
an ever-growing set of elites (bounded by the number of bins).
In practice, the choice between them can depend on the problem: MAP-Elites is often
favored for low-dimensional, user-defined behavior spaces where one wants a coverage
of that space, while NSLC can be easier when one prefers not to discretize behaviors or
when using multi-dimensional continuous behavior spaces.
5.4.4 Implementing and Enhancing QD Algorithms
Since the establishment of QD as a powerful concept, exemplified by algorithms such as
NSLC and MAP-Elites, numerous studies have emerged to analyze and enhance various
facets of QD. A selected set of works is introduced below to showcase the intricacies of
implementing QD from three main perspectives:
Behavior Characterization: BC not only determines the form of diversity during the
search process but also significantly influences the efficacy of the optimization algorithm.
Therefore, it should be meticulously chosen to enhance the QD’s performance (Pugh,
Soros, and Stanley, 2016). While there is complete freedom in determining BC for a QD
task, it is preferable and necessary to choose those closely related to the desired objective.
This approach provides additional benefits, such as improved model interpretability, and
is crucial for achieving reasonable performance.
For instance, Pugh, Soros, and Stanley (2016) examined the impact of using BCs that
are both highly aligned (e.g. final coordinates at the trial’s end) and misaligned (e.g. the
most frequent direction of orientation) with the quality metric (e.g. goal achievement)
in solving maze navigation tasks through various QD implementations. Their findings
indicate that BCs misaligned with the quality metric not only underperform but also fail to
match the efficacy of pure optimization-based methods. Conversely, BCs aligned with the
task’s objectives enhance performance, achieving state-of-the-art results at the time. Even
when paired with misaligned BCs, the overall performance still surpasses pure fitness
searching methods. The key takeaway is that BCs aligned with the quality concept are
essential to overcome deception in challenging problems.
However, crafting BCs manually requires domain knowledge of the problem and
the solution. For problems with limited information, one approach is to use a pure
fitness searching method as a baseline, then iteratively incorporate and test candidate
BCs for alignment with the quality metric, based on performance improvement over the
baseline. Recent studies also suggest the feasibility of learning BC. For instance, meta-
learning has been employed to discover optimal BD definitions, enhancing success rates
in multiple tasks (Meyerson, Lehman, and Miikkulainen, 2016). In robotic locomotion
tasks, AURORA (Grillotti and Cully, 2022) uses dimension reduction models like PCA
and autoencoders to encode a robot’s sensory data, treating the encoded vector as the BC
during learning. These methods have shown promising results and point toward a more
generalized approach for BC design.
Niches Representation: After establishing BCs, the subsequent task is to develop a
technique for segmenting solutions into niches based on these BCs. The approach to niche
representation notably differentiates NSLC from MAP-Elites. In NSLC, niches emerge
dynamically, defined by the k-nearest neighbors among a generation's peers and the elites
in the archive. This results in an evolving archive, where the number and specifics of the
cells are neither predetermined nor known in advance. Conversely, MAP-Elites divides
the BC space into discrete behavioral cells. This division is based on the BC range
and user-defined granularity, offering a complete overview of the archive’s size and cell
characteristics.
However, this method grapples with the curse of dimensionality, as the cell count escalates exponentially with the increase in BCs and their granularity. To mitigate this issue, a variant of MAP-Elites called centroidal Voronoi tessellation MAP-elites (CVT-MAP-Elites) employs a clustering approach like k-means to segment the archive space into k Voronoi tessellations (Vassiliades, Chatzilygeroudis, and Mouret, 2017). While CVT-MAP-Elites shares core functionalities with MAP-Elites, it diverges in two key operations: archive definition and cell querying. For defining the archive, CVT-MAP-Elites samples K (K ≫ k) vectors in the BC space and clusters them to identify the k centroids representing the cells, unlike MAP-Elites' straightforward discretization of BCs. When querying a cell to store a phenotype, CVT-MAP-Elites requires checking distances to centroids, potentially increasing computational complexity to O(k) in the worst case, compared to the O(1) complexity in MAP-Elites. Despite this increase in computational load, CVT-MAP-Elites proves advantageous, capable of scaling up to 1,000 dimensions in maze experiments, a significant leap from MAP-Elites' limitation to around 20 dimensions.
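A rough sketch of the two operations that distinguish CVT-MAP-Elites, using scikit-learn's KMeans for the clustering step (function names, the uniform sampling of a normalized BC space, and the sample count are illustrative choices):

import numpy as np
from sklearn.cluster import KMeans

def define_cvt_centroids(num_cells, bc_dim, num_samples=100_000, seed=0):
    """Sample the (normalized) BC space uniformly and cluster the samples into
    num_cells centroids; each centroid defines one Voronoi cell of the archive."""
    rng = np.random.default_rng(seed)
    samples = rng.uniform(0.0, 1.0, size=(num_samples, bc_dim))
    kmeans = KMeans(n_clusters=num_cells, n_init=1, random_state=seed).fit(samples)
    return kmeans.cluster_centers_

def query_cell(bc, centroids):
    """Index of the nearest centroid: O(num_cells) per query, versus the O(1)
    grid lookup of standard MAP-Elites."""
    dists = np.linalg.norm(centroids - np.asarray(bc, dtype=float), axis=1)
    return int(np.argmin(dists))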
Optimization Algorithm: Although NSLC and MAP-Elites have shown impressive
results, their most successful applications have predominantly been in robotic locomotion
tasks with simple, low-dimensional controllers (Colas, Madhavan, Huizinga, et al., 2020).
In addition, both QD implementations commonly employ a mutation-based GA as their
foundational optimization algorithm, leaving the potential of ES family members largely
unexplored. Consequently, investigating new optimization methods to achieve scalability
and enhance learning efficiency is a logical next step.
In this context, Colas, Madhavan, Huizinga, et al. (2020) introduced MAP-elites with
evolution strategies (ME-ES), utilizing the efficiency of ES to extend MAP-Elites to
high-dimensional controllers managed by large neural networks. ME-ES demonstrated
the ability to learn a neural network controller with approximately $10^5$ parameters, significantly larger than those in previous studies, outperforming GA-based methods even with triple the computation time.
Simultaneously, Fontaine, Togelius, Nikolaidis, et al. (2020) developed covariance
matrix adaptation MAP-elites (CMA-ME), which integrates the high-performing CMA-ES
algorithm from the ES family into the QD framework. A fitness function that prioritizes
exploration (i.e. populating empty cells) over optimization (i.e. enhancing performance in
filled cells) is the primary objective for CMA-ES. When the archive remains unchanged,
CMA-ES’s initial parameters and internal states are reset using a randomly chosen
individual from the archive. In comparative experiments, CMA-ME outperformed MAP-
Elites by not only doubling the solution quality but also providing a broader diversity of
solutions.
Building upon these advancements, Fontaine and Nikolaidis (2021) introduced MAP-
elites via a gradient arborescence (MEGA). Unlike traditional ES methods, which treat
objective and BC functions as black boxes, MEGA integrates directional perturbations
into MAP-Elites based on gradients of these functions, provided they are first-order
differentiable. It employs CMA-ES to optimize the factors within the perturbation function.
CMA-MEGA significantly surpasses traditional QD algorithms by not treating objective
and BC functions as black boxes, and it demonstrates its efficacy in generating a diverse
array of high-quality images by searching the latent space of a StyleGAN.
Further building on these innovations, covariance matrix adaptation MAP-annealing
(CMA-MAE) by Fontaine and Nikolaidis (2023) introduces a nuanced alteration in the
ranking mechanism. This change gradually reduces the influence of elites in filled cells
of the archive, ensuring that the optimization process does not prematurely shift focus
from the objective to exploration. This issue is especially pertinent in cases involving flat
objectives or low-resolution archives. Remarkably, this modification is compatible with
both CMA-ME and CMA-MEGA, broadening its applicability.
5.5 Multiobjectivity
While quality diversity focuses on two objectives, one on performance and the other on
diversity, multiobjective optimization (section 2.2.5) in general is a good approach to
maintaining diversity in evolutionary computation. The motivation once again comes
from biology (Miikkulainen and Forrest, 2021). Biological fitness is complex: animals
must seek food and shelter, avoid predators, find mates, and care for the young, and often
some of these objectives conflict. The problem can be solved in many ways, leading to
multiple niches, and such diversity leads to powerful further adaptation.
Note, however, that biological objectives can be expressed simply as a single high-level
objective: survival of the species. A similar approach can be taken in evolutionary
computation, i.e. a complex optimization task can be expressed simply as winning a game,
making a lot of money, or gaining a lot of publicity. Such objectives allow evolution to
be creative; on the other hand, the fitness signal is weak and may not allow identifying
good ideas until they are fully developed. This approach may need to be paired with
neutral mutations, weak selection, and deep time, placing it closer to biological evolution
(section 9.1.1).
Multiobjective optimization can thus be seen as a practical approach one level below
such a high-level specification. It is often possible to devise performance objectives, cost
objectives, and secondary objectives such as simplicity, accuracy, or appearance, without
specifying the desired solutions directly. In many cases, it is useful to have a Pareto front
as a result, i.e. a collection of solutions that each represents a different tradeoff between
them such that no solution is better than any other across all objectives. One solution
in the Pareto front can then be chosen according to other criteria, such as conditions at
deployment time, or human preferences that are difficult to express as objectives.
The approach can be taken a step further to evolve complex behavior in a prescribed
manner. For instance in the NEWS/D approach (Salih and Moshaiov, 2022; Salih and
Moshaiov, 2023a; Salih and Moshaiov, 2023b), the overall behavior is decomposed into a
set of single-objective problems that are optimized together, resulting in a Pareto front of
solutions. Some of these solutions are specialized to a particular objective and others are
non-specialized. When applied to a set of robot motion tasks, the nonspecialized solutions
represented general controllers that transferred well to new tasks. The method was used to
optimize behavior according to a set of scenarios in aerial pursuit-evasion tasks, providing
significant improvement over the standard method of proportional navigation.
Multiobjectivity is also a natural way to boost diversity: with multiple objectives, there
are many ways of being successful. Niching or speciation may emerge in the population,
and may be further encouraged separately with mechanisms such as those in NEAT.
Species can then be used to form ensembles, taking advantage of the diversity. Such
methods are reviewed in the next section.
5.6 Ensembling
In general in machine learning, it is often a good idea to train multiple different models for
the task, and then form the final system by ensembling them. The idea is that each model
is somehow different, e.g. has a different architecture, is initialized differently, or is trained
with different training samples. Thus, each of them may end up learning something the
other models do not, and together they can perform better than any model alone. This
idea is consistent with studies in psychology, social science, and business that suggest that
diversity in human teams leads to improved decision-making (Rock and Grant, 2016).
Ensembling may be as simple as just averaging the outputs of multiple models, or
combining them more intelligently, or selecting one model that is most likely to have
the correct answer for each input. Methods have also been developed, such as mixtures
of experts (Masoudnia and Ebrahimpour, 2014) and RHEA (section 6.4.5), to train
and combine different models more systematically. The fact that ensembling works
is statistically surprising and was controversial for a while, but there is now a good
understanding of it, especially in classification tasks (H. Li, X. Wang, and Ding, 2018).
Ensembling intelligent agents requires more complex methods because behavior often
depends on sequences of inputs and decisions and is often based on recurrent neural
networks, but it is possible as well. Ensembling is thus part of the standard machine
learning toolbox and can be used routinely to improve performance.
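As a concrete illustration of the simplest combination mechanisms mentioned above, the hypothetical sketch below averages model outputs or takes a majority vote; each model is assumed to expose a predict(x) method returning a vector of class scores (placeholder interface, not any particular library).

import numpy as np

def average_ensemble(models, x):
    """Average the output vectors of all models for input x."""
    outputs = np.stack([m.predict(x) for m in models])
    return outputs.mean(axis=0)

def vote_ensemble(models, x):
    """Majority vote over the class predicted by each model."""
    votes = [int(np.argmax(m.predict(x))) for m in models]
    return max(set(votes), key=votes.count)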
Ensembling is a particularly natural extension of evolutionary approaches. EAs create
and maintain a population from which the ensemble can be drawn. Moreover, having
a diverse set of candidates is crucial both for evolution and ensembling. Often, the
individuals in the final population end up with slightly different skills, from which an
effective ensemble can be formed (Islam and Yao, 2008). Examples of such diversity
include e.g. the age-estimation network architecture (section 11.3.6) and training with
include e.g. the age-estimation network architecture (section 11.3.6) and training with
population culture (section 5.7). Such diversity is even more pronounced when the task is
multiobjective: Individuals in the Pareto front form a natural pool from which to select
ensemble members.
The NEAT neuroevolution method also employs a speciation mechanism that en-
courages diversity in search (section 3.3). In effect, NEAT runs multiple island-based
evolutionary processes, i.e. separate subpopulations that only periodically cross over, and
species that are created and removed dynamically as evolution progresses. The species
are created and maintained based on topological (i.e. genetic) diversity, but they result
in enough behavioral diversity for ensembling to be effective. Indeed, it is possible to
use just the species champions as the members of the ensemble, and then add voting,
averaging, winner-take-all selection, or gating as the ensembling mechanism (Pardoe, Ryoo, and
Miikkulainen, 2005).
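A minimal sketch of this idea follows, assuming a list of species champions and an evolved gating network with an activate method (placeholder names, not the API of any particular NEAT implementation):

import numpy as np

def ensemble_action(champions, gater, observation):
    """Pick one species champion per time step with a gating network.

    champions:   list of networks, one champion per NEAT species
    gater:       a network evolved to output one score per champion
    observation: the current sensor vector
    """
    scores = gater.activate(observation)     # one preference score per champion
    chosen = int(np.argmax(scores))          # winner-take-all gating
    return champions[chosen].activate(observation)

Replacing the argmax with averaging or weighted voting yields the other combination mechanisms mentioned above.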
Note that ensembling is related to many neuroevolution ideas and mechanisms
discussed in this book. For instance, the main idea of the ESP method (section 7.1.1) is
to evolve neurons for each location in the network in separate subpopulations; because
good performance requires different neurons, diversity across populations is automatically
maintained, and neurons are evolved that cooperate well together. Such a network can
be seen as an ensemble with a very strong combination mechanism. Similarly to the
hierarchical mixtures of experts approach in machine learning, ESP can be extended
hierarchically to construct a team of networks, where each network receives different
inputs. For instance, each network can keep track of a different opponent, and at the
highest level, a combiner neural network decides what action to take (Rajagopalan, Rawal,
Miikkulainen, et al., 2011). This approach was used to evolve both the prey and the
predator agents in the coevolutionary arms race example described in section 7.2.2.
In MM-NEAT (section 6.3), multiple modules emerge from the evolution of a single
network. They can be seen as ensemble members, and the preference neurons in each
module as the ensembling mechanism, suggesting how the module output should be
combined. Such preference neurons can be evolved in separate networks as well: In
essence, each network places a bet that it has the right answer (Bruce and Miikkulainen,
2001). They are evolved to maximize the return from their bets, and as a result, the bets
serve as confidence estimates. Ensembling then consists of simply selecting the network
with the highest confidence. The context+skill approach (section 6.2) can also be seen as
an ensembling mechanism. There are two special ensemble members, one representing
context and the other the most likely action, and a combiner network on top representing
the ensembling mechanism.
However, the most straightforward ensembling approach can already be useful in
neuroevolution: A NEAT population can be evolved in a control task first, and then a
gating neural network evolved to select which controller to use at each step. The approach
was applied to a more challenging version of the pole-balancing task where the pole is
actually a telescope that can change its length, and the pole’s tip chases a moving target
particle, as if trying to swat a fly (figure 5.7). Even though there is only a single pole
and the controller sees the positions and velocities (so that recurrency is not needed), the
response of the pole changes with its length. Thus, the actions change the dynamics of the
task, requiring the controller to adjust its strategy continuously. Such flexible control is
hard to achieve with a single neural network, but easier with an ensemble. After evolving a
(a) Particle chasing task. (b) Improvement through ensembling.
Figure 5.7: Effect of simple ensembling in a complex control task. (a) When the cart-pole
task is extended with an extensible pole, it becomes a fly-swatting task. The control dynamics
change constantly as the pole changes, making control highly context-dependent and well-suited to
ensembling. (b) The population of controllers is first evolved with NEAT for 150 generations; once
the performance plateaus, a gating network is evolved to select among eight species champions.
The performance improvement is significant and immediate, suggesting that ensembling is a simple
and reliable way to boost performance of neuroevolution experiments. Figures from Pardoe, Ryoo,
and Miikkulainen (2005).
population of controller neural networks for 150 generations, the species champions were
used as an ensemble. A gating neural network was then evolved for another 50 generations
to pick one network to control the system at each step. The performance improvement
was significant and immediate, demonstrating how even simple ensembling can add value
to an existing neuroevolution approach.
The approach could easily be extended with various techniques to fit particular
problems. For instance, diversity of the ensemble population could be increased by
making evolution multiobjective. Secondary objectives may be defined naturally in
many domains (such as speed, or cost, in addition to accuracy), but novelty is always a
possible such objective, and highly effective in promoting diversity (section 5.3). Or, the
ensemble members could be evolved to optimize not their own performance in isolation,
but performance as a useful member of the ensemble (García-Pedrajas, Hervás-Martínez,
and Ortíz-Boyer, 2005). This approach could boost the performance of even the simplest
ensembling methods, like voting, averaging, or gating.
Further, the gating network could be evolved not simply to select, but to combine the
outputs of the population members, similar to the context+skill approach or confidence-based
ensembling (GPAI, 2024). The ensemble members could indicate confidence as part of
their outputs, and the combiner could take that into account in constructing its actions
(instead of simply selecting the most confident network). The ensemble and combiner
networks could be co-evolved to maximize the performance of the ensemble, similarly to
hierarchical ESP and CoDeepNEAT (sections 7.2.2 and 10.3.2).
In this manner, the general idea of ensembling can take many forms in neuroevolution.
However, it should always be considered when constructing the final solution. Without some kind of
ensembling in the end, a neuroevolution experiment often leaves money on the table.
More broadly, the simple success of ensembling offers a powerful lesson to problem-
solving and decision-making in general: Diverse teams with multiple viewpoints are
likely to perform better than individual experts, provided that there is some principled
way of combining these viewpoints. Ensembling provides a simple such way: egalitarian
learning, described in the next section, extends it further with learning.
5.7 Utilizing Population Culture and History
The knowledge that exists in the population beyond a single individual can be seen as
population culture. There are common elements to it, i.e. knowledge that many individuals
share, such as common behaviors, variations of this common knowledge, and also elements
unique to single individuals. Generally, culture operates at a time scale between learning
and evolution, but can also emerge even during the lifetime of individuals, and can last as
long as the population. It can also include artifacts that exist outside the population. They
may be essential in establishing open-ended evolution in that they permanently alter the
environment where evolution takes place (Lehman, Gordon, S. Jain, et al., 2023).
In evolutionary computation, population culture can be utilized in many ways to
make evolution more effective (Belew, 1990; Maheri, Jalili, Hosseinzadeh, et al., 2021;
McQuesten, 2002; R. G. Reynolds, Michalewicz, and Cavaretta, 1995; Spector and Luke,
1996). Just like in human societies, an essential element of it is diversity. The population
includes many different kinds of solutions; the power of cultural algorithms comes from
exploiting such diversity.
The simplest way is to utilize diversity in a single generation of offspring. That
is, instead of generating the usual two offspring at each crossover, dozens or hundreds
are created. They are then quickly evaluated, and only the most promising few are
kept, and they are most likely better than those two resulting from the normal process.
This mechanism, called culling, is based on the observation that most crossovers are awful
(Nordin and Banzhaf, 1995; Whitley, Dominic, and Das, 1991), i.e. result in offspring
that are weaker than the parents. This effect is especially severe in neuroevolution with
competing conventions, where most crossovers are wasted on incompatible individuals.
Some algorithms forgo crossover entirely and only rely on mutation. However, crossover
is an important vehicle of adaptation in biology, so somehow our implementation of it is
lacking. Culling is a way of trying to fix it. It is motivated by biology in that embryos that
are not viable are discarded early in gestation, and litters are often much larger than one or
two individuals. There are probably other mechanisms at work as well in biology that make
crossovers more productive than crossovers in computation, such as more complicated
genotype-to-phenotype mappings (Miikkulainen and Forrest, 2021). They can be partially
modeled by making culling more extreme, i.e. generating more offspring and retaining
only a few of them, which is easy to do in evolutionary computation.
The challenge in culling is to recognize the few most promising offspring without
having to run a full fitness evaluation on the whole set. If that is possible, then culling
can speed up evolution. It turns out that such approximate evaluation is possible through
culture. A set of inputs can be formed, i.e. a set of questions, or a syllabus if you will,
that is then given to each offspring to see how they respond. Those answers can then be
compared to answers that other prominent population members would create, such as the
parents or population champions. Those offspring whose answers are very different from
the culture can then be culled. Even though the hope is that some offspring’s answers differ
because they are better than anything seen before, this process is effective in identifying
offspring that are the worst, i.e. nonviable. Most crossovers are awful; it is enough to
discard only those. This process can be very effective, for instance, speeding up evolution
by a factor of three or more in neuroevolution for the pole-balancing task (McQuesten,
2002).
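A sketch of culling with a cultural syllabus is shown below; the crossover and forward helpers (which recombine two genomes and run a network on the syllabus inputs, respectively) are hypothetical placeholders for whatever neuroevolution implementation is used.

import numpy as np

def cull(parents, syllabus, forward, crossover, n_offspring=100, n_keep=2):
    """Generate many offspring from two parents and keep only those whose
    answers to the syllabus are closest to the parents' answers; most
    crossovers are awful, so this mainly discards nonviable offspring."""
    p1, p2 = parents
    # Cultural reference: the parents' average answers to the syllabus.
    reference = (forward(p1, syllabus) + forward(p2, syllabus)) / 2.0
    offspring = [crossover(p1, p2) for _ in range(n_offspring)]
    # Rank offspring by how close their answers are to the reference.
    distances = [np.linalg.norm(forward(c, syllabus) - reference)
                 for c in offspring]
    order = np.argsort(distances)
    return [offspring[i] for i in order[:n_keep]]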
Similar cultural mechanisms can be applied to other parts of the evolutionary process.
For instance, in selecting parents for crossover, the main goal is to combine the good
traits of both parents. This goal is challenging because fitness alone does not tell the full
story. Sometimes good genes are incompatible with or dominated by other genes in the
individual, resulting in poor fitness overall (as will be seen in section 6.4.5). Therefore,
parents should be chosen not only based on fitness, but also on distance. That is, the
parents should be close enough in the genotypic space to be compatible, but different
enough so that crossover will generate something new. In this manner, combining the
strengths of both parents becomes more likely.
One practical implementation of this idea is to select the first parent based on fitness
only, as usual, and the second to complement it; that is, while still competent in fitness,
to be as different from the first as possible. The difference can be measured based on the
answers in the syllabus, as in culling. It turns out that in neuroevolution for the acrobot
task (i.e. swinging the jointed pole upright), a better offspring is generated twice as often
as without such parent selection (15% of the time instead of 7%) (McQuesten, 2002).
Note that the second parent is usually much worse in fitness, so such high fitness is likely
achieved by combining complementary strengths.
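The same syllabus distance can drive mate selection, as in this hypothetical sketch (fitnesses are precomputed; forward runs a network on the syllabus inputs):

import numpy as np

def select_parents(population, fitnesses, syllabus, forward, tournament=5):
    """Pick the first parent by fitness and the second to complement it:
    still reasonably fit, but as different as possible on the syllabus."""
    rng = np.random.default_rng()
    # First parent: ordinary fitness-based tournament selection.
    idx = rng.choice(len(population), size=tournament, replace=False)
    first = max(idx, key=lambda i: fitnesses[i])
    ref = forward(population[first], syllabus)
    # Second parent: among the fitter half, the one with the most distant answers.
    fit_half = np.argsort(fitnesses)[len(population) // 2:]
    idx = rng.choice(fit_half, size=min(tournament, len(fit_half)), replace=False)
    second = max(idx, key=lambda i: np.linalg.norm(forward(population[i], syllabus) - ref))
    return population[first], population[second]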
Culture can also be used to maintain diversity directly by focusing on which individuals
are discarded from the population to make room for new offspring. Usually, the individuals
with the poorest fitness are removed, but diversity can be used as a secondary measure.
One way to implement this idea is to find the two individuals that are the closest in the population
in terms of their answers to the syllabus, and then discard the less fit of them. Again, in
acrobot neuroevolution, such a mechanism resulted in populations that were three times
as diverse (in average distance in answers to the syllabus), making evolution 30% faster
(McQuesten, 2002).
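The corresponding replacement step can be sketched as follows (same hypothetical helpers): find the behaviorally closest pair and remove its less fit member.

import numpy as np
from itertools import combinations

def discard_one(population, fitnesses, syllabus, forward):
    """Remove the less fit member of the behaviorally closest pair,
    preserving diversity while making room for a new offspring."""
    answers = [forward(net, syllabus) for net in population]
    closest = min(combinations(range(len(population)), 2),
                  key=lambda pair: np.linalg.norm(answers[pair[0]] - answers[pair[1]]))
    victim = min(closest, key=lambda i: fitnesses[i])
    population.pop(victim)
    fitnesses.pop(victim)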
A fourth way of taking advantage of culture is to use it to leverage learning in evolution.
As discussed in section 4.2.3, the syllabus of inputs can be paired up with answers of the
parents or population champions, and then used as a training set for gradient descent. In
this manner, those offspring that have the best learning potential can be identified. Even
when the learned weights are not coded back into the genome, evolution becomes more
effective through the Baldwin effect, i.e. a more informative selection of offspring. In
pole balancing, this mechanism can make neuroevolution an order of magnitude faster
(McQuesten, 2002).
However, even better use of this idea can be made by taking advantage of diversity in
the population culture. That is, the behaviors of all individuals in the population serve as
the cultural heritage; individuals can learn from any of these behaviors, and such learning
can guide genetic evolution in a more diverse and effective way.
At the outset, it is not clear that this idea would work. To be sure, dividing the
population into teachers and learners, and utilizing parents and population champions
as teachers, makes sense: The new and poorly performing individuals in the population
are trained to be more like those that are known to perform well. However, such training
is also bound to reduce diversity. Much of the population starts copying a few good
individuals, which may make it more difficult for evolution to discover new solutions.
Also, even though the parents and champions perform well overall, some of their
actions can still be quite poor during evolution. Conversely, there may be other individuals
in the population who perform very well in specific situations, even though they do not
perform that well overall. In broader terms, in evolutionary computation as in society in
general, any individual may have something useful to teach to any other individual. This
is one reason why diverse teams in general may be more innovative than teams that are
not (Rock and Grant, 2016).
This principle can be captured computationally in a method called Egalitarian Social
Learning (Tansey, Feasley, and Miikkulainen, 2012). The idea is that each agent A
observes the performance of each other agent B in various situations in the task. If B
receives a high reward in a situation 𝑥 where A receives a low reward, there is a learning
opportunity for A. A training example is formed with 𝑥 as input and agent B's action 𝑦 as
output, and gradient descent is used to modify agent A. In a sense, the entire set of agents
and their behaviors forms a population culture. Each agent is then trained to adopt those
aspects of the culture that are the most successful.
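A minimal sketch of how such learning opportunities could be collected is given below (hypothetical; it assumes each agent's episode is logged as (situation, action, reward) tuples, with situations represented by comparable descriptors). Each agent would then be trained by gradient descent on its own example set, as described above.

def esl_training_examples(logs):
    """Collect learning opportunities for each agent.

    logs: dict mapping agent id -> list of (situation, action, reward) tuples.
    Returns: dict mapping agent id -> list of (situation, target_action) pairs.
    """
    examples = {a: [] for a in logs}
    for a, entries_a in logs.items():
        for b, entries_b in logs.items():
            if a == b:
                continue
            for (x, _, r_a) in entries_a:
                for (x_b, y_b, r_b) in entries_b:
                    if x_b == x and r_b > r_a:
                        # B did better than A in situation x: A imitates B's action.
                        examples[a].append((x, y_b))
    return examples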
This approach works in domains where rewards can be obtained frequently and
are associated with partial behaviors. To enhance diversity, it is possible to divide
the population into subcultures. Agents in each subculture teach and learn from the
other agents in the same subculture, making it less likely for the population to converge
prematurely. The approach can be implemented through Lamarckian evolution or the
Baldwin effect. When diversity is maintained through subcultures, Lamarckian evolution
may be more effective.
The approach was demonstrated in a foraging domain where food items are randomly
scattered and vary in their value from very good to poor to outright poisonous (figure 5.8).
The agents sense these items in eight 22.5° sectors in front of them and also sense their
own velocity. As their output, they control their velocity and orientation. With egalitarian
learning, many different strategies evolved. Some subcultures focused on high-speed
exploration in order to utilize high-value food. Others moved more slowly, and carefully
consumed all positive food items. Overall, the egalitarian population was significantly
more effective in utilizing the available food resources than a comparable student-teacher
model and direct neuroevolution. The experiment thus illustrated the value of diversity
in a team of agents, as well as the value of egalitarian learning.
Instead of using the diverse solutions in a population for training, the knowledge in
such solutions can be abstracted into a statistical model that then guides evolution. The
model predicts how likely the different combinations of elements in these solutions are to
result in high fitness. The approach is similar to CMA-ES (section 2.2.3), which uses a
model to make intelligent mutations, and estimation of distribution algorithms (EDAs;
Alden and Miikkulainen, 2016; Baluja and Caruana, 1995; Larranaga and J. Lozano, 2002;
J. A. Lozano, Larrañaga, Inza, et al., 2006; Pelikan, Goldberg, and Cantú-Paz, 1999),
where solutions are constructed step by step using a statistical model such as a Bayesian
network or a Markov random field. At each step, the model is used to determine which
(a) The foraging domain. (b) Foraging fitness over evolution.
Figure 5.8: The effect of diversity and egalitarian learning. A population of agents needs
to forage in an environment with good and bad objects. (a) The agents gain fitness by consuming
food items of various positive values (A), and avoiding items of negative values (B). They
have a limited view (C), requiring them to move around a lot to find the items. With direct
neuroevolution, several strategies developed, some taking advantage of covering a lot of ground,
and others taking advantage of being careful not to miss anything. (b) With egalitarian social
learning (ESL), the evolved agents could also learn from each other during their lifetime. ESL
achieved higher fitness by generation 50 than direct neuroevolution or a student-teacher approach
achieved by generation 500. This experiment thus demonstrated both the value of diversity and of
learning from population culture. Figures from Tansey, Feasley, and Miikkulainen (2012). Videos at
https://neuroevolutionbook.com/demos.
further elements would be most likely to result in good solutions, given the elements
chosen so far.
Instead of building a model of gene statistics, it can be built for neurons or modules
that form a network in approaches such as SANE, ESP or CoDeepNEAT (sections 7.1.1
and 10.3.2). In such a process, the neuron that correlates most significantly with high
fitness is selected first. When selecting the next neuron, a measure of epistasis (i.e.
dependence) is first used to decide whether the fitness correlations of the next neuron
candidates should be calculated based on only those networks that contain the previous
neuron, or all networks in the population. The neuron with the highest correlation is then
chosen as the next neuron. In this manner, a single offspring is constructed at a time in a
probabilistic process that does not employ crossover or mutation. In problems such as
double pole balancing, this approach, called Eugenic neuroevolution, can find solutions
several times faster and more reliably than methods that evolve partial solutions without it
(Alden, Kesteren, and Miikkulainen, 2002; Polani and Miikkulainen, 2000; Prior, 1998).
Note that diversity in the population is crucial to form a good model, and the model is a
good way to take advantage of such diversity.
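The probabilistic construction can be sketched as follows (hypothetical representation: the population history is a list of (neuron_ids, fitness) records, and the epistasis test is assumed to exist as a helper function).

import numpy as np

def fitness_correlation(neuron, records):
    """Correlation proxy: mean fitness of networks containing the neuron
    minus mean fitness of those that do not."""
    with_n = [f for neurons, f in records if neuron in neurons]
    without = [f for neurons, f in records if neuron not in neurons]
    if not with_n or not without:
        return -np.inf
    return np.mean(with_n) - np.mean(without)

def construct_offspring(all_neurons, records, size, is_epistatic):
    """Build one network by picking, at each step, the neuron that correlates
    best with fitness, conditioning on already-chosen neurons when epistasis
    is detected."""
    chosen = []
    for _ in range(size):
        if chosen and is_epistatic(chosen, records):
            # Condition on networks that contain the previously chosen neurons.
            pool = [(ns, f) for ns, f in records if set(chosen) <= set(ns)]
        else:
            pool = records
        candidates = [n for n in all_neurons if n not in chosen]
        best = max(candidates, key=lambda n: fitness_correlation(n, pool))
        chosen.append(best)
    return chosen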
So far, the idea of utilizing culture has relied on the current population only. But
culture can extend over multiple generations, and there is no reason why populations
from prior generations couldn't be utilized in evolutionary algorithms as well. The more
solutions there are to define culture, the more diversity there is also likely to be, making
cultural algorithms more effective. Of course, an efficient way to store the solutions and
select parents among them is needed.
Neuroannealing (Lockett and Miikkulainen, 2013) provides such a mechanism. All
solutions ever encountered in the evolutionary run are organized into a partition tree of
solutions. There are four levels: the first one is partitioned according to the number of
layers in the network, the second according to the number of nodes in each layer, the third
according to the connectivity patterns between layers, and the fourth according to the
weight values. A parent is selected by traversing the tree using a Boltzmann distribution
on the average fitness of each branch, as in simulated annealing. Once a parent is selected,
NEAT-like mutations are performed to generate new solutions based on it.
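The parent-selection step can be sketched as follows (hypothetical tree structure: each node stores its children and the mean fitness of the solutions beneath it, with leaves holding actual networks).

import numpy as np

def select_parent(node, temperature, rng=np.random.default_rng()):
    """Walk down the partition tree, choosing each branch with probability
    proportional to exp(mean_fitness / temperature), as in simulated annealing."""
    while node.children:                      # leaves hold actual networks
        scores = np.array([c.mean_fitness for c in node.children])
        probs = np.exp(scores / temperature)
        probs /= probs.sum()
        idx = rng.choice(len(node.children), p=probs)
        node = node.children[idx]
    return node.solution                      # a network to mutate NEAT-style

Lowering the temperature over the run shifts selection from broad exploration of branches toward exploiting the best ones.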
Compared to standard NEAT, the neuroannealing process provides more ways to
increase complexity without forgetting any of the previous solutions. It can thus construct
larger and deeper networks than NEAT. Such networks may be useful in e.g. fractured
domains that make evolution of behavioral strategies challenging (section 6.3). Neu-
roannealing outperforms NEAT in many such problems, including multiplexer design,
concentric spirals, and double pole balancing.
Neuroannealing can be seen as implementing an extreme form of elitism: any solution
can have useful information in it, and therefore, nothing is ever discarded. Thus, the
population grows larger over time, and is likely to include more diversity in solutions
than smaller and constant-size populations can. With all this information, it is possible to
represent the fitness function more comprehensively.
Each of the methods reviewed in this section points out opportunities for utilizing
diversity in population culture in neuroevolution. An interesting challenge for the future is
to find synergies between them: for instance, neuroannealing could be combined with
eugenic evolution to build better models; culling, mate selection, and intelligent discarding
with any generation-based methods; egalitarian learning with eugenic or neuroannealing
systems. In this manner, diversity can be utilized in many more ways than simply powering
search based on crossover.
More broadly, this chapter discussed the role of diversity in neuroevolution, including
different ways it can be characterized, how diversity can be encouraged to emerge, and
how it can be harnessed to find better solutions. These techniques will be put to work in
the rest of the book, starting with evolving behavior in the next chapter.
5.8 Chapter Review Questions
1. Biological and Computational Diversity: Explain why diversity is a cornerstone of both
biological evolution and computational neuroevolution. How does diversity enable complex
solutions to emerge over time and adapt to changing environments?
2. Genetic Diversity: What role does genetic diversity play in evolutionary computation?
Discuss the problems that arise when a population converges too quickly and how these issues
hinder recombination and exploration.
3. Behavioral Diversity: Why is behavioral diversity particularly important in neuroevolution?
Contrast it with genetic diversity, and describe a scenario where behavioral diversity could
improve the search process.
4. Diversity Maintenance Techniques: Compare and contrast two methods for maintaining genetic
diversity: fitness sharing and crowding. How do these techniques work, and what are their
limitations?
5. Behavior Characterizations: What is a behavior characterization (BC), and why is it essential
for measuring and promoting behavioral diversity? Provide an example of how a BC could be
defined in a robot navigation task.
6. Multiobjectivity: Explain how multiobjective optimization fosters diversity in neuroevolution.
What are the benefits of having a Pareto front, and how does it relate to boosting population
diversity?
7. Quality Diversity: What is the goal of quality diversity (QD) in evolutionary algorithms, and
how does it differ from traditional optimization objectives? Describe how QD methods like
MAP-Elites or NSLC maintain both high-performing and behaviorally diverse solutions.
8. Ensembling: Why is ensembling particularly well-suited for evolutionary algorithms? Describe
how the NEAT method uses speciation to facilitate ensembling, and provide an example of its
application.
9. Cultural Diversity: What is the role of population culture in neuroevolution? How can cultural
mechanisms, such as culling, mate selection, discarding, and training, improve the efficiency
and outcomes of evolutionary processes?
10. Egalitarian Learning: Define egalitarian social learning in the context of neuroevolution.
How does it differ from a student-teacher approach, and why does it enhance diversity in a
population?
Chapter 6
Neuroevolution of Behavior
An important area of neuroevolution is to construct agents that behave intelligently in
simulated or real environments. Such behavior spans several levels: At the lowest level,
the neural networks optimize control tasks, such as locomotion for robots or production
in bioreactors. At gradually higher levels, they optimize behavioral strategies e.g. for
navigation, game play, or cognitive domains. At the very highest level, they may implement
decision strategies e.g. for business, healthcare, and society in general. This chapter
reviews successes and challenges in such domains, and also discusses how human expertise
can be incorporated into the discovery process.
6.1 From Control to Strategy
Neuroevolution is naturally well-suited for controlling agents and discovering behavioral
strategies for them, in both physical and virtual environments. However, in many domains
the environment can change in unexpected ways. The behavior has to adapt, sometimes
by tuning existing behaviors, sometimes by deploying distinctly different behaviors at
different times, and sometimes by discovering entirely new behaviors. Neuroevolution
approaches to discovering such flexible behaviors, and indeed prospects for evolving
generally intelligent agents, are reviewed in this section.
One of the most natural applications of neuroevolution is to discover effective behavior
through interaction with the environment: The network receives sensor values as input,
and issues control commands to effectors as output. If the network is recurrent, it can
integrate inputs over time, and thus disambiguate partially observable environments. It
can understand and take advantage of physical effects such as friction and momentum,
remember objects that may be currently hidden from view, and so on.
For instance, in driving a simulated race car, neuroevolution discovered that it could
get through curves faster by tracing a wider trajectory. This strategy is counterintuitive
because such trajectories are longer; however, they allow for higher speeds, which is more
effective in the end. In robot-arm control, neuroevolution discovered a way to compensate
for an inoperative main motor: It couldn't turn around its main (vertical) axis, so it evolved
instead to turn the arm away from the target, then swing it toward the target very fast,
creating enough momentum to turn the entire robot around. In controlling a simulated
spacecraft, when it did not have the jets to stop its forward movement, it instead turned it
around and then stopped the turn, resulting in a hard stop. In playing Gomoku (or
5-in-a-row) against other programs submitted to a tournament, it discovered that it could
win by making a move very far away: the other programs expanded their board size to
incorporate it, and crashed because they ran out of memory. There are numerous similar
examples in the literature, demonstrating creative ways of controlling simulated and real
robots, sometimes compensating for problems, other times achieving goals in creative
ways (Fullmer and Miikkulainen, 1992; Lehman, Clune, Misevic, et al., 2020; Moriarty
and Langley, 1998; Moriarty and Miikkulainen, 1996; Sit and Miikkulainen, 2005).
When discussing behavior, it is often useful to separate it into two different levels. At
a lower level, the challenge is to discover an effective single behavior, i.e. to devise optimal
control. At a higher level, the challenge is to utilize multiple behaviors appropriately, i.e.
to devise an optimal behavioral strategy. The challenges and solutions are different in the
two cases.
Neuroevolution is well-suited to discovering single behaviors in challenging domains,
i.e. those that are dynamic, nonlinear, and noisy. For instance, in rocket control the goal is
to keep the rocket flying straight, even though it is an unstable system and can easily lose
stability due to atmospheric disturbances. Large rockets with multiple engines have them
each on a gimbal, making it possible to turn them through control algorithms, which is
heavy, expensive, and difficult (indeed, rocket science). Smaller rockets instead have large
fins that create enough drag at the back of the rocket to turn it into a stable system, with a
cost in performance. It turns out a neurocontroller can be evolved simply to control the
amount of thrust in each of the engines, and thus keep the rocket stable even without any
fins at all (figure 6.1; Gomez and Miikkulainen, 2003). Such control is precise, robust,
and effective, and would be difficult to design by hand.
However, by itself such control is not particularly robust. It works well within the
conditions encountered during training, but it does not extend well to new conditions.
Yet in the real world, such changes abound. In rocket control, the rocket parameters may
vary, and weather conditions may vary; the rocket may need to fly through atmospheric
disturbances. A walking robot may need to get around or over obstacles, or deal with a
surface covered with water or ice. Sensors may drift or break entirely; actuators have
wear and tear or may become inoperative. Coping with such variation is, of course, a
major challenge for neural networks: While they interpolate well within the space of their
training, they do not extrapolate well outside it.
Similar successes and challenges can be seen at higher levels of behavior as well, i.e.
in discovering effective behavioral strategies. A good example is the NERO video game
(Stanley, Bryant, and Miikkulainen, 2005). In this game, simulated robots are engaged in
a battle in a virtual world where they can sense objects, their teammates, opponents, and
line of fire, and move around and shoot. The player does not control them directly, but
instead has the task of training them to behave effectively in the battle. This goal means
coming up with a curriculum of gradually more complex challenges, such as approaching
a target, shooting accurately, avoiding fire, coordinating an attack, and coordinating a
defense. The player achieves these behaviors by manipulating multiple objectives, i.e. the
fitness function coefficients along several measurable dimensions of behavior.
Info Box: Neuroevolution at UT Austin
Connectionist Models Summer School was a series of workshops organized in the
late 1980s and early 1990s to promote the burgeoning field of neural networks, or
connectionism, as it was then called. The 1988 version was organized at Carnegie
Mellon by Dave Touretzky, Geoff Hinton and Terry Sejnowski. Some 100 students
participated, including me (Risto Miikkulainen), eager to learn how to bring about
a big change in AI. It was an exuberant convergence of ideas, and one of them
was neuroevolution. It wasn't actually one of the topics in the lectures; it was brought
up in one of the breakout sessions by Mike Rudnick, a PhD student from Oregon
Graduate Institute. Genetic Algorithms had gained some popularity, and Mike
thought they could be used to construct neural networks as well. I was working on
connectionist natural language processing then, but the idea seemed fascinating to
me and I put it aside hoping to get back to it someday.
That didn't take long: in spring 1991, during my first year as an assistant professor
at UT Austin, an undergrad named Brad Fullmer wanted to do an honors thesis,
and ended up evolving neural networks for an agent that roamed a virtual world
and decided which objects in it were good and which were bad, launching a
research direction in my lab on virtual agents that continues to this day! Brad
developed a marker-based encoding technique where junk DNA could become
functional later, which I think still should be explored more. Dave Moriarty, a PhD
student, picked up the topic about a year later, and developed his own approach,
SANE (part of an appropriately named system called Sherlock), about evolving
a population of neurons, i.e. parts of a network instead of full neural networks.
Dave’s solution to forming full networks was to evolve network blueprints. In
parallel, Tino Gomez came up with another solution, Enforced SubPopulations, i.e.
evolving neurons for each location in the network separately. At the time, the ideas
were separate partly so that Dave and Tino could each make a distinct contribution
in their dissertations; it wasn't until 22 years later that we realized we could bring
them together to evolve deep learning architectures in CoDeepNEAT!
At that time, I was ready to write a book about neuroevolution: The idea of
evolving elements for a dense structure (i.e. neurons for a fully connected network)
was elegant and the applications to control and behavior compelling. But a third
PhD student, Ken Stanley, at about 1999 started to make noises about how the
network’s topology mattered as well, and that we could optimize the topology of
a sparse neural network for the task. It didn't fit the paradigm, and I told him I
didn't think it would work, which probably only made him work on it that much
harder. That idea eventually became NEAT, and one of the most enduring ideas in
neuroevolution. Ken went on to build his own group at the University of Central
Florida and beyond, and to develop several new ideas with students who've in turn
formed their own groups in academia and industry, including a fellow named
Sebastian, but that is another story.
Interestingly, it is possible to design curricula that are more effective than others, in
that they result in more sophisticated behavior that takes more factors into account. There
(a) Rocket control. (b) NERO video game.
Figure 6.1: Neuroevolution of effective control and behavioral strategies. (a) Neuroevolution
discovers a controller that can keep the rocket stable by controlling the amount of thrust to its four
engines. It is accurate enough so that the fins are no longer required, allowing the rocket to fly
much higher with the same amount of fuel. It is, however, difficult for the controller to generalize
to variations in the rocket parameters and environmental conditions. (b) In the NERO video game,
a human player trains the agents through a curriculum of exercises to attack a target while at the
same time avoiding fire from opponent agents. This is a sophisticated behavior, but a good team
needs other behaviors as well, such as defending and sharpshooting, which are difficult to evolve at
the same time. A challenge for neuroevolution, thus, is to discover flexible, multimodal behavior
on its own, as an important step towards general intelligence. For animations of these behaviors,
see https://neuroevolutionbook.com/demos. Figure (a) from Gomez and Miikkulainen
(2003); figure (b) from Stanley, Bryant, and Miikkulainen (2005).
also does not appear to be a single strategy that always works better than others, but team
A can beat B, which can beat C, which can beat A; this is precisely what makes the game
interesting for a human player.
However, NERO also illustrates the limitations of the standard neuroevolution approach
in discovering behavioral strategies. Throughout the evolutionary process, it elaborates on
earlier behaviors and usually produces a sophisticated final behavior that subsumes all
of them. However, the most successful teams in the game are composed by hand from
individuals evolved separately toward different goals: sharpshooters, attackers, defenders,
etc. Evolution does not spontaneously evolve agents that could deploy such very different
behaviors at different times, nor a strategy for switching among them appropriately. Yet
if neuroevolved agents are to be deployed in the real world, such flexible multimodal
behavior is likely to be required. There are offensive and defensive modes in many games;
the opponent may utilize a different strategy; the agent may be part of a team with different
abilities.
Such flexibility in control and strategy is a hallmark of general intelligence. Much
recent work has focused on techniques that would allow discovering and utilizing it, as
will be discussed in the next three subsections.
6.2 Discovering Robust Control
As was discussed in section 3.2, control means managing the effectors of a real or simulated
agent so that it reaches its target in an effective manner. Usually, the controller observes
the current state of the agent and environment through sensors (in a closed-loop or
feedback control setting), and therefore can be naturally implemented in a neural network.
The advantage is that such networks can deal with noise, nonlinear effects, and partial
observability in a natural way. It is still challenging for them to react to changes that were
not seen in training, which happens all the time in any complex environment in the real
world. Therefore, several techniques have been developed to make them robust in such
situations.
6.2.1 Noise, Exploration, and Novelty
Perhaps the simplest way of encouraging robust control is to add noise to the outputs of the
controller. Such trajectory noise means that the control does not have precisely the desired
effect, but continually places the controller into situations from which it has to recover
(Gomez and Miikkulainen, 2004). Interestingly, trajectory noise is more effective than
sensor noise in producing this effect. Apparently, adding noise to sensors may confuse the
agent about what it should do, but it does not similarly place it in useful training situations.
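In practice, trajectory noise is simply injected into the fitness evaluation loop, as in the hypothetical sketch below (env is assumed to follow a Gym-style reset/step interface; exact signatures vary by simulator, and noise_std is a tunable parameter).

import numpy as np

def evaluate_with_trajectory_noise(network, env, noise_std=0.1, steps=1000):
    """Evaluate a controller while perturbing its actions, so that it must
    repeatedly recover from situations it did not intend to enter."""
    rng = np.random.default_rng()
    obs, total_reward = env.reset(), 0.0
    for _ in range(steps):
        action = network.activate(obs)
        noisy_action = action + rng.normal(0.0, noise_std, size=np.shape(action))
        obs, reward, done = env.step(noisy_action)
        total_reward += reward
        if done:
            break
    return total_reward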
This idea can also be put to work more directly by using evolution to discover such
situations automatically. For instance, if the desired actions can be specified for each
situation, the controller could be trained with gradient descent. But how can the desired
actions be specified? The answer is that a separate neural network can be evolved to
generate them. That is, for each input situation, a teacher network generates the targets,
and a controller network is trained by gradient descent to reproduce them. The teacher's
fitness depends on how well the controller it trains performs in the task. How is this
approach any different from evolving a network to generate good actions directly? It turns
out the targets that the teacher evolves to generate do not actually correspond to optimal
outputs in the task, as was demonstrated in a foraging robot domain (Nolfi and Parisi,
1994). Instead, they evolve to represent maximally effective learning experiences, i.e.
those that allow learning to proceed faster and more robustly. They may be exaggerated,
more varied, and more difficult situations, thereby leading to better final performance in
the task.
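A sketch of this teacher-evolution loop follows (hypothetical; activate, train_supervised, and the factory and evaluation functions are placeholders for whatever network and task implementation is used).

def teacher_fitness(teacher, make_controller, train_inputs, evaluate):
    """Fitness of a teacher network: how well the controller it trains performs.

    teacher:         network mapping situations to target actions
    make_controller: factory for a fresh, trainable controller network
    train_inputs:    sample situations used as the training set
    evaluate:        runs the controller in the task and returns its score
    """
    controller = make_controller()
    targets = [teacher.activate(x) for x in train_inputs]
    controller.train_supervised(train_inputs, targets)   # gradient-descent step(s)
    return evaluate(controller)

# An outer evolutionary loop then evolves the teacher population on this
# fitness; the targets it learns to produce need not be optimal actions, only
# maximally useful training experiences.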
This approach can be generalized further into a setting where problems are coevolved
with solutions. For instance, a set of objective functions can be evolved for maze
running, encouraging solutions that get closer to the goal, but also maximize several novel
objectives. Such evolution was more effective in discovering solutions to harder mazes
than fixed-fitness evolution and novelty search (Sipper, J. H. Moore, and Urbanowicz,
2019). Similarly, the coevolution of obstacle courses and runners results in more effective
running behavior. Evolution starts with simple courses and gradually complexifies them
as better runners are discovered, eventually constructing behavior that far exceeds what
direct evolution could do. This system, POET (R. Wang, Lehman, Clune, et al., 2019),
will be described in more detail in section 9.3. Such coevolution can also occur naturally
in competitive environments, such as zebras and hyenas described in section 7.2.2. Each
species evolves to compensate for the more sophisticated strategies that the other species
discovers, resulting in an arms race of more complex behaviors than would be discovered if
the other species were fixed. In all these cases, neural network controllers are evolved in a
task that is not fixed, but becomes more challenging as evolution progresses, automatically
encouraging robust and general solutions and more complexity than can be achieved in a
static setting.
Novelty search, discussed in more detail in section 5.3, can be seen as a related but
subtly different approach. In novelty search, individual controllers are rewarded if they
generate behavior that is different from that seen before during evolution. Thus, the
idea is to create as much diversity as possible, and to explore the space of behaviors as
completely as possible. Eventually, some individuals will be chosen as solutions because
they happen to perform well in the task of interest, which is not driving novelty search
directly. Importantly, the process of discovering these solutions is very different from
goal-directed search. The process may include stepping stones that have little to do with
the ultimate task. The solutions may thus be built on a more general and therefore robust
foundation. This result was seen clearly in the bipedal walk example in section 5.3:
Whereas fitness-based evolution resulted in a rigid, slow walk that often fails, novelty
search discovered a dynamic, fast walk that is remarkably robust.
In this manner, variation in the evaluation of agents can lead to more robust control.
Another approach is to incorporate knowledge from the domain, as will be discussed next.
6.2.2 Symmetry, Context, and Adaptation
In some cases, we may know something about the system we are controlling, and it may
be possible to take such knowledge into account in designing the network architecture
that is then evolved to control it. For instance in multilegged walking, each leg should
be controlled in a similar way, and there are symmetries between the left and the right
side, and possibly the front and the back. These symmetries result in a number of possible
gaits: For instance, four-legged animals such as horses can trot (move diagonal legs in
phase), bound (move front legs in phase and back legs in phase), pace (move legs on each
side in phase), and pronk (move all legs in phase). These basic gaits can then be adjusted
according to the speed and terrain.
The symmetry-breaking approach can be formalized computationally in the bilevel
neuroevolution approach ENSO (Valsalam, Hiller, MacCurdy, et al., 2013; Valsalam
and Miikkulainen, 2011). Each leg controller, or module, receives the angle of the
leg it controls as its input, and outputs the desired angular velocity of that leg. In
addition, through intermodule connections, it receives input from all the other modules
(figure 6.2). The process starts with a population of fully symmetric individuals, where
all leg controllers are identical, and they are all connected with the same intermodule
connections. The connection weights are initially assigned randomly, and evolved as
usual through mutation and crossover in order to find the best individuals with the current
symmetry.
At the higher level, evolution then explores different symmetries. Through symmetry
mutations, the initial symmetry is broken and the connections start to diverge. Some
(a) Leg controller. (b) Overall symmetry. (c) Walking sideways on an incline.
Figure 6.2: Evolving symmetries for four-legged walking. In this experiment, neuroevolution
was extended to take advantage of symmetry in the four-legged robot. (a) Each leg has its own
controller neural network, and each one receives input from the others. (b) Evolution starts with
fully symmetric designs and breaks the symmetry as needed, i.e. allowing the weights on the
different connections to diverge (as indicated by the colors). Such highly symmetric networks
allow the robot to take advantage of the four main gaits on the flat ground. (c) A controller crossing
a slippery incline requires a less symmetric solution than a straightforward walk on flat ground: It
evolved to use the front downslope leg primarily to push up so that the robot could walk straight. In
this manner, neuroevolution can demonstrate how principles such as symmetry help construct robust
behavior. For animations of these behaviors, see https://neuroevolutionbook.com/demos.
Figures (a) and (b) from Valsalam and Miikkulainen (2011).
of the modules are no longer constrained to be the same, and some of the intermodule
connections are no longer constrained to be the same. In this manner, evolution evaluates
more symmetric solutions before evaluating less symmetric ones. This bias allows it to
discover simpler and more general gaits first, and more complex ones later if they turn out
to be necessary. Interestingly, on flat ground, highly symmetric individuals evolve that are
capable of all four main gaits. Depending on how their leg positions are initialized, they
may pace, trot, bound, or pronk. Also, they can dynamically switch between them. For
instance, an individual may start with a bound gait, but hit a simple obstacle that prevents
it from moving its legs the way it attempts; it can then switch to a trot, which moves
the legs over the obstacle one at a time. Such robustness emerges automatically from the
constraints of maximal symmetry among the controllers.
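One simple way to represent such symmetry constraints is through weight-sharing groups, as in the hypothetical sketch below (ENSO's actual group-theoretic formulation is more elaborate): all legs initially share one weight vector, and a symmetry mutation splits a leg into its own group so that its weights can diverge.

import numpy as np

class SymmetricController:
    """Four leg modules whose weight vectors are tied through symmetry groups."""
    def __init__(self, n_weights, rng=np.random.default_rng()):
        self.groups = [[0, 1, 2, 3]]                      # start fully symmetric
        self.weights = [rng.standard_normal(n_weights)]   # one vector per group

    def module_weights(self, leg):
        """Return the shared weight vector used by the given leg."""
        for group, w in zip(self.groups, self.weights):
            if leg in group:
                return w

    def break_symmetry(self, leg):
        """Symmetry mutation: split one leg off into its own group so that its
        weights can diverge under ordinary mutation and crossover."""
        for group, w in zip(self.groups, self.weights):
            if leg in group and len(group) > 1:
                group.remove(leg)
                self.groups.append([leg])
                self.weights.append(w.copy())
                return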
However, the environment may also present challenges where less symmetric solutions
are required. The terrain may be cluttered with major obstacles, or slippery and inclined;
faults may occur in the system, i.e. some legs may be damaged or inoperative and no
longer move as expected. It turns out that the symmetry evolution approach can discover
solutions for many such cases by breaking more of the symmetry. For instance when
it has to walk sideways on a slippery incline, the front downslope leg evolved a role of
simply pushing the agent upwards, while the other three propelled it forward. It would be
difficult to design effective gaits for such situations by hand, yet the systematic approach
to understanding the symmetry of the agent and constraining evolution to take advantage
of it makes it possible to discover them effectively and robustly.
Another powerful approach to dealing with variation in the environment is to model
it explicitly within the controller. That is, the system consists of three neural network
components: A skill network that takes actions, a context network that models the
environment, and a decision network that uses the current representation of the context
to modulate the actions of the skill module (figure 6.3; X. Li and Miikkulainen, 2018;
Tutum, Abdulquddos, and Miikkulainen, 2021).
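The three-component architecture can be sketched as follows (an illustrative numpy version with single-hidden-layer tanh networks; in the actual systems the context module is typically recurrent and the weights of all three components are evolved together).

import numpy as np

def mlp(weights, x):
    """A single hidden-layer tanh network; weights = (W1, b1, W2, b2)."""
    W1, b1, W2, b2 = weights
    return W2 @ np.tanh(W1 @ x + b1) + b2

def context_skill_action(skill_w, context_w, decision_w, observation):
    """Compute an action: the skill network proposes, the context network
    summarizes the environment, and the decision network combines the two."""
    skill_out = mlp(skill_w, observation)
    context_out = mlp(context_w, observation)   # in practice recurrent over time
    combined = np.concatenate([skill_out, context_out])
    return mlp(decision_w, combined)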
This context+skill approach was first developed for opponent modeling in poker, where
it resulted in a surprising ability to generalize against new opponents. When evolved to
play well against only four canonical simple behaviors (always raise, always call, always
fold, follow raw hand strength statistics), it was able to beat Slumbot, the best open-source
poker player at the time. The skill module evolved to make reasonable actions based
on the sequence in each game; the context module evolved to recognize the canonical
behaviors that Slumbot used at different times; and the decision-maker evolved to adjust
the actions based on the context.
It turns out that the approach can be generalized to robust control more generally,
including games such as FlappyBird, LunarLander, and CARLA (simulated driving). For
instance in FlappyBird, it can be used to play robustly when the game conditions change.
In this game, a bird flies at a constant speed through a horizontal track where it has to
avoid hitting pipes that appear at constant intervals. The player takes a "flap" action to
push the bird up, and gravity will pull it down constantly. Precise timing of the flap actions
is required to avoid the pipes, and the player has to anticipate not just the next pipe but the
location of those that follow as well. In an extended version of the game, another action, a
forward flap, is added, causing a forward push that is constantly slowed down by drag.
Different versions of the game can be generated by simply adjusting the strength of the up
and forward push and the strength of gravity and drag.
It turns out that without the context module, the FlappyBird controller does not
generalize much at all beyond the versions seen during training, i.e. with +/-20% of
variation on the four parameters. As is usual in neural networks, the controller can
interpolate between situations it has seen before, but cannot handle situations that would
require extrapolation. With context, however, it can fly robustly in conditions that vary +/-
75%, i.e. in conditions that require significant extrapolation.
It is interesting to analyze how context modulation achieves such robustness. One might
expect that the context network outputs change significantly in new situations, making
it possible for the decision-maker to modulate the skill network’s actions accordingly.
However, the opposite is actually true: The outputs of the context and skill networks change
very little, requiring very little new behavior from the decision-maker. In effect, the context
network evolved to standardize the different situations and map them to a limited range
where the actions are known. Such a principled understanding of the domain extends to a
much broader range of conditions, and therefore leads to extrapolation.
The context+skill approach can also be useful in coping with environments that change.
As will be discussed in section 6.2.3, the real world is rarely constant, but instead, there
are changes due to outside factors, wear and tear in the mechanics, noise and drift in the
sensors, and so on. The context module can learn to anticipate such changes and modulate
the skill module accordingly. For instance in the gas sensor drift domain (Warner, Devaraj,
and Miikkulainen, 2024), it learned the direction and magnitude of such changes over
time, allowing it to classify future examples significantly more accurately than a model
that was simply trained to be as general as possible.
Changes in the environment may not always be predictable over time and may exceed
(a) Context+skill network. (b) Context+skill control. (c) Skill-only control.
Figure 6.3: Modeling the environment explicitly with a context network. In many domains,
conditions can vary significantly and unexpectedly, requiring extrapolation beyond training. For
instance in an extended FlappyBird domain, the strength of the forward flap, upward flap, gravity,
or drag can change. (a) In such settings, it can be beneficial to model the variation explicitly
with a context network; the decision maker can then use the context to modulate the actions of
the skill network appropriately. (b) The context network evolves to standardize the variation
so that the decision-maker sees little of it (shown here through the first principal components
of the context and skill module output over time on top, lined up with the bird's location in
the bottom). It can thus perform well in a new situation, such as the decreased strength of the
upward flap or an increased drag. (c) Without context, the skill network outputs vary much more,
making it difficult for the decision maker to generalize. In this manner, explicit understanding
of the context extends the behavior robustly to variations of the domain. For animations of these
behaviors, see https://neuroevolutionbook.com/demos. Figure from Tutum, Abdulquddos,
and Miikkulainen (2021).
the generalization ability of the controller networks. In such cases, some kind of rapid
online adaptation may be necessary. However, neuroevolution is usually applied as an
offline method, i.e. the controllers are evolved during a training period ahead of time and
then deployed in the application. Further adaptation would then require another period of
offline evolution. Continuing evolution during deployment is difficult because it creates
many candidates that are not viable. Indeed, the exploratory power of evolution, which
is its greatest strength, makes it difficult to apply it online, where every performance
evaluation counts. Historically, this was the main difference between reinforcement
learning, which was intended as an online lifelong learning method, and evolutionary
computation, which was an offline engineering approach. This difference has blurred
recently: Many reinforcement learning approaches are now offline, and similarly, there
are versions of neuroevolution that can work online (e.g. rtNEAT in section 8.1, EANT,
odNEAT and others; Agogino, Stanley, and Miikkulainen, 2000; Cardamone, Loiacono,
and Lanzi, 2009; Metzen, Kirchner, Edgington, et al., 2008; Silva, Urbano, Correia, et al.,
2015).
For instance, once the initial neurocontrollers have been evolved offline, they can be
refined online using particle swarm optimization (PSO; Gad, 2022; Kennedy, Eberhart, and Shi,
2001). PSO is loosely based on the movement of swarms such as birds or insects. A
population is generated around a well-performing individual, and changes are made to each
individual by combining its own velocity (i.e. its history of changes) with that of the best
individuals in the population. PSO therefore provides a way to find local optima accurately.
Combining a GA and PSO thus allows for both exploration and exploitation: GA can
make large changes to the solutions, discover ing diverse approaches and novelty, and PSO
can refine them through local search. Such combinations of global and local search, or
memetic algorithms, are useful in neuroevolution in general, including neural architecture
search (ElSaid, Ricanek, Lyu, et al., 2023; Lorenzo, Nalepa, Kawulok, et al., 2017; Ribalta
Lorenzo and Nalepa, 2018). They can also implement online adaptation: Assuming the
changes in the environment are gradual, they can create alternative solutions that still
perform well, but also track the changing requirements.
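As a rough illustration of such a memetic combination, the sketch below refines a GA-discovered weight vector with a standard global-best PSO update. The coefficients, swarm size, and fitness function are placeholders; they are not the settings of the bioreactor study.

```python
import numpy as np

def pso_refine(seed_weights, fitness, iterations=100, swarm_size=20,
               inertia=0.7, c_personal=1.5, c_global=1.5, init_spread=0.1):
    """Refine an evolved solution locally with particle swarm optimization."""
    rng = np.random.default_rng(0)
    dim = len(seed_weights)
    # Spawn the swarm around the well-performing individual found by the GA.
    pos = seed_weights + init_spread * rng.standard_normal((swarm_size, dim))
    vel = np.zeros((swarm_size, dim))
    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()

    for _ in range(iterations):
        r1 = rng.random((swarm_size, dim))
        r2 = rng.random((swarm_size, dim))
        # Each particle's velocity combines its own history of changes with
        # attraction toward its personal best and the swarm's best individual.
        vel = (inertia * vel
               + c_personal * r1 * (pbest - pos)
               + c_global * r2 * (gbest - pos))
        pos = pos + vel
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[np.argmax(pbest_fit)].copy()
    return gbest
```

A memetic loop would then alternate the two: run the GA for broad exploration, hand its best individuals to `pso_refine` for local exploitation, and reinsert the refined solutions into the population; during deployment, the same refinement step can track gradual changes in the environment.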
For instance in the bioreactor control domain, micro-organisms grow by consuming
a nutrient substrate which is continuously fed into the reactor. The growth process is
dynamic, nonlinear, and varies unpredictably. The best production is achieved close
to the maximum liquid level of the reactor; however, this level must not be exceeded,
otherwise the reactor needs to be shut down. While the initial controllers constructed
through neuroevolution were able to keep the reactor operational, fine-tuning through PSO
improved the production significantly. When changes were introduced into the simulation,
online adaptation through PSO was able to keep the operation safe, while still tracking the
economic optimum closely (van Eck Conradie, Miikkulainen, and Aldrich, 2002a; van
Eck Conradie, Miikkulainen, and Aldrich, 2002b). In this manner, online adaptation can
be used to add robustness to the control that would be difficult to achieve otherwise.
Thus, neuroevolution can naturally deal with noisy and nonlinear domains, and there
are many ways to make it robust when the domain varies significantly. But are such
solutions robust enough to cope with variation in the physical world? This question will
be addressed next.
6.2.3 Transfer to Physical Robots
There is generally a reality gap between simulation and physical reality: Simulations are clean and deterministic, whereas the real world is noisy and nondeterministic, includes external factors that are not part of the simulation, and involves give and wear and tear in the wheels and motors. As a matter of fact, the robotics community is often not very impressed even by very impressive simulation results, and justifiably so.
However, neuroevolution is in a good position to make the transfer to real robots
possible. By the very nature of the approach, controllers are evolved to cope with imperfections, and
even take advantage of them, as was seen in the robot with an inoperative main motor in
section 6.1. A similar result was obtained in the four-legged walking domain (Valsalam,
Hiller, MacCurdy, et al., 2013). An actual physical four-legged robot was constructed
with a similar structure to the simulations. Its four legs were each angled away from the
center and rotated around a circle, thus each propelling it forward with a slight angle
(figure 6.4a). Such a gait made it possible to walk forward as well as turn at will. Most
remarkably, when one of the legs became inoperative, an asymmetric gait evolved where
(a) Physical four-legged robot (b) Dreamer robot with Mekahand
Figure 6.4: Transferring control to physical robots. In these two examples, the controller neural network is evolved in simulation and then used to control the corresponding physical robot. (a) A four-legged physical robot evolved to walk straight even with one leg inoperative. (b) An accurate simulator of a robotic arm was used to evolve controllers that generalize well to new situations and imprecise computation. In this manner, it is not only possible to transfer to physical robots, but also construct controllers that are robust against noise, faults, and new situations. Figure (a) from Valsalam, Hiller, MacCurdy, et al. (2013); Figure (b) from P.-C. Huang, Sentis, Lehman, et al. (2019). For an animation of the four-legged robot, see https://neuroevolutionbook.com/demos.
the remaining leg on the same side traced a wider arc than the two on the other, allowing
the robot to still walk straight. Thus, not only did the neuroevolution approach transfer to
physical robots, it also came up with a solution to a situation that would have been very
difficult to design by hand. Another approach that can facilitate transfer to real robots is
Hebbian learning, which we will review in a case study in section 12.3.2.
If transfer to the physical world is anticipated, the simulation can be extended with
mechanisms that simulate the physical challenges. For instance, factors such as wind,
variable friction, and uneven terrain can be programmed into the simulation. However, it
is more difficult to simulate all possible imperfections that might occur, such as slippage,
blocked sensors, loose connections, battery drainage, and wear and tear. One way to
deal with such issues is to add noise and stochastic blockage to the simulated sensors and
effectors. Both kinds of noise allow simulating the world more realistically. As mentioned
above, effector (or trajectory) noise also allows training the controller in more varied
situations.
Recently, robotics simulators have become accurate enough to support transfer in
many cases. For instance in robotic grasping, it is possible to evolve a neural network
controller and transfer it into the physical robot as is (P.-C. Huang, Sentis, Lehman, et al., 2019). NEAT was used with the Graspit! simulator and transferred to the Dreamer robot's Mekahand (figure 6.4b). The resulting controller was surprisingly robust, coping well with sensor and effector inaccuracies as well as with novel objects. Most interestingly, it was
robust against imprecise computation: When the grasping had to be completed very fast,
only approximate information about the process was available, yet the controller managed
to grasp the object safely in most cases.
Even though neuroevolution of behavior mostly focuses on virtual agents, much of
it actually originates from robotics. The field of evolutionary robotics emerged in the
1990s and continues to this day (Bongard,
2013; Doncieux, Bredeche, Mouret, et al.,
2015; Nolfi and Floreano, 2000; Vargas, Di Paolo, Harvey, et al., 2014). The controllers
and sometimes also the hardware are evolved, and often the controllers are simple neural
networks. The original motivation was that robot control is difficult to design by hand, and
can be more readily done through neuroevolution (Cliff, Harvey, and Husbands, 1993).
Simulations are often a useful tool; however, it is also possible to evolve the controllers
directly on robotic hardware. For instance, recurrent discrete-time neural networks were
evolved on the Khepera miniature mobile robot to develop a homing behavior (figure 6.5a; Floreano and Mondada, 1996a). The network developed an internal topographic map that allowed it to navigate to the battery charger when its energy was nearly depleted, simply in order to survive.
An interesting direction is to evolve both the controllers and hardware at the same
time. Indeed, such coevolution can facilitate the evolution of more complex and robust
solutions (Bongard, 2011). For instance in evolving locomotion, the robots may start with
an eel-like body plan and gradually lose it in favor of a legged design. The gaits on robots
that go through such a process can be more robust than those evolved on the legged design
directly. To make morphological innovations feasible, it may be useful to protect them by
temporarily reducing evolutionary selection pressure (Cheney, Bongard, SunSpiral, et al.,
2018). Such protection is a useful general principle in discovering complexity, similar to
speciation in NEAT (section 3.3). In section 7.1.2 we will see how this type of approach
can also be extended to protecting innovation in heterogeneous neural architectures.
The most extreme demonstration of this approach is GOLEM (genetically organized
lifelike electromechanics; figure 6.5b; Lipson and Pollack, 2000). Not only were the
hardware designs and the neural network controllers coevolved, but the robots themselves
were 3-D printed according to the evolved designs. The designs were evaluated for their
locomotive ability in simulation. The best ones were then printed and evaluated in the
physical world, and found to perform as expected. The evolved virtual creatures (Lessin,
Fussell, and Miikkulainen, 2013; Lessin, Fussell, and Miikkulainen, 2014) discussed in
section 14.5 extend this approach to more complex morphologies and behaviors, all the
way to fight-or-flight, albeit in simulation and with a hand-constructed syllabus. However, it is possible to imagine a future where robot bodies and brains are coevolved automatically, the results created on multimaterial 3D printers, and once the printing is finished, the
robots wake up and walk off the printer on their own.
Evolutionary robotics has already been scaled up to swarms, i.e. robot teams that
exhibit collective behavior (Dorigo, Theraulaz, and Trianni, 2021; Trianni, Tuci, Ampatzis,
et al.,
2014). The challenge in this area is to evolve the swarm to perform tasks that single
robots could not. For instance, such robots can hook up and form a linear train that can get
over obstacles and gaps that a single robot could not (figure 6.5c). Many interesting issues
come up in evolving neural controllers for such robots. For instance, should they all be
clones of each other, or each evolved to fill a specific role in the team? Collective behavior
in general is an important area of neuroevolution, discussed in depth in chapter 7.
(a) Evolving control in hardware (b) Coevolving morphology and control (c) Swarm robots working together
Figure 6.5: Neuroevolution in evolutionary robotics. While robotics generally focuses on hardware designs, it is difficult to construct controllers by hand, especially with novel and variable designs. Neuroevolution is often a useful approach in many such cases. (a) Neural network controllers can be evolved directly in hardware, for instance to develop homing behavior in Kheperas. The light source identifies the corner with the charging area (painted in black). (b) It is possible to evolve the robot morphology and control together, and 3D print the designs, in essence evolving artificial life forms. (c) Swarms of robots can perform tasks that single robots may not, such as traversing over holes in the ground. In this manner, neuroevolution makes it possible to develop behaviors for a wide variety of robotic designs. Figure (a) from Floreano and Mondada (1996a); Figure (b) from Lipson and Pollack (2000); and Figure (c) from Trianni, Tuci, Ampatzis, et al. (2014). Videos of the coevolving morphology and control at https://neuroevolutionbook.com/demos.
6.3 Discovering Flexible Strategies
The neuroevolved solutions so far have focused on control. At this level, adaptation
most often means modulating or adjusting a single existing behavior: Throttle one of
the engines a little more, move one leg a little faster, flap a little harder. When behavior
extends from such low-level control to a high-level strategy, goal-driven coordination of
multiple behaviors is required. For instance, offensive vs. defensive play in robotic soccer
may require getting open vs. covering an opponent; actions required of a household robot
are very different when it is vacuuming vs. emptying the dishwasher vs. folding laundry;
game agents may need to gather resources, attack, and escape. Such strategies are the
topic of this section.
6.3.1 Switching between Behaviors
Evolving high-level strategies is challenging not only because the agent must have command of a much larger repertoire of behaviors, but also because it needs to know when and how to switch between them. Proper switching is difficult for two reasons: first, in some cases
it may have to be abrupt, i.e. small changes in the environment may require drastically
different actions; second, sometimes the different strategies need to be interleaved or
blended instead of making a clean switch.
The first challenge can be illustrated e.g. in the half-field soccer domain, where
five offenders try to score on five defenders, using eight behaviors: getting open, intercepting the ball, holding the ball, shooting at the goal, and passing it to one of the
(a) Game situation (b) Values of actions
Figure 6.6: Fractured high-level strategy in half-field soccer. High-level strategies are difficult to discover and implement because they often require changing behaviors abruptly based on small changes in the input. (a) For instance in half-field soccer, five offenders (blue dots) try to score on five defenders (white dots) by holding the ball, passing to one of the teammates, and shooting. (b) Visualization of successful actions for an offender with a ball at various locations in the field, given the positions of all other players. Each color represents a subset of actions that would be successful. Small changes to just this one variable have a large effect on success, making good strategies highly fractured and difficult to evolve. Neuroevolution with local neurons and cascaded refinement is an effective approach in such cases. For animations of these behaviors, see https://neuroevolutionbook.com/demos. Figures from Kohl and Miikkulainen (2011).
four teammates (figure 6.6; Kohl and Miikkulainen, 2011). Depending on the position of
the ball, teammates, and opponents, boundaries between these behaviors are very tricky.
If an opponent moves even slightly to block a teammate, passing becomes infeasible; if an
opponent crosses a threshold distance, holding becomes infeasible. Furthermore, actions
that interpolate between these behaviors are not possible: They have to be performed fully
or not at all. Thus, the domain can be described as fractured: as the state of the world
changes, the correct actions change frequently and abruptly.
It is very difficult for neuroevolution to discover such fractured strategies. In most
domains, continuous control works just fine, i.e. when the situation changes a little,
the control output changes a little, and continuously so. Neural networks represent such continuity naturally, and we have seen how approaches such as multiagent HyperNEAT can take advantage of it to encode a team of agents (section 4.13). In contrast, hard switches are more difficult to establish. However, the network architecture can be designed to make them easier to discover in two ways: (1) Instead of sigmoid activation functions, radial basis functions can be used. Each such function activates a neuron only in a specific local region of the input space, making it easier to cover fractured decision boundaries. (2) The network topology can be constructed in a cascaded manner, i.e. complexifying by adding neurons as extra layers on top of the existing network, instead of anywhere in the network as is usual in NEAT. Such a cascade allows each new neuron to implement a refinement of existing behavior, gradually forming more fractured decision boundaries. These mechanisms can be used to augment the usual NEAT mechanisms as needed through adaptive operator selection (SNAP-NEAT; Kohl and Miikkulainen, 2011). Indeed, in domains like half-field
soccer, this approach performs much better than handcoded solutions as well as standard
reinforcement learning and other neuroevolution techniques.
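The following sketch illustrates these two ingredients, i.e. a radial basis function unit that responds only in a local region, and a cascaded layer that refines the output of the existing network. It is a simplified illustration under assumed interfaces, not the actual SNAP-NEAT implementation.

```python
import numpy as np

def rbf_activation(x, center, width):
    # A radial basis unit is active only near its center, which makes it
    # well-suited for carving out local regions of a fractured decision space.
    return np.exp(-np.sum((x - center) ** 2) / (2.0 * width ** 2))

def cascaded_output(x, base_network, cascade_units):
    """Refine an existing network's output with cascaded local units.

    base_network  -- function mapping input x to an output vector
    cascade_units -- list of (center, width, weights) tuples added over evolution
    """
    out = base_network(x)
    for center, width, weights in cascade_units:
        # Each new unit nudges the existing output only inside its local
        # region, gradually forming more fractured decision boundaries.
        out = out + rbf_activation(x, center, width) * np.asarray(weights)
    return out
```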
A second challenge in constructing an effective strategy is that switching between
behaviors needs to be flexible. In some cases, such as switching between batting and
fielding in baseball, or vacuuming and emptying the dishwasher, the behavior changes
entirely for a long period of time. Such tasks are isolated and can be implemented even
with different neural networks and a switch network that decides between them. However,
in other cases the behaviors are interleaved, occurring several times in rapid succession.
For instance, the possession of the ball in soccer can change rapidly, requiring the players
to switch between offensive and defensive play often, and even anticipate such switches.
In yet others, such as dodgeball, the offensive and defensive behaviors are blended because
there are multiple balls at play, and a player may attempt to throw a ball at the same time as
avoiding getting hit by one. Thus, intelligent agents must be capable of different behaviors
at different times, as well as interleaving and blending them.
A good platform to study such behaviors is the Ms. Pac-Man video game (figure 6.7; Schrum and Miikkulainen, 2016b). In a maze, the player eats pills while trying to avoid
getting eaten by ghosts. Upon eating a power pill, the ghosts become edible too. Thus,
the behaviors of running away from threatening ghosts and approaching edible ghosts are
interleaved. However, as soon as a ghost is eaten, it returns as a threat, and at that point,
the tasks are blended: The player has to run away as well as approach some of the ghosts
at the same time. With slight modifications to the game, isolated tasks can be studied as
well, i.e. by fixing the ghosts to be either threatening or edible.
A network controlling Ms. Pac-Man sees the state of the game e.g. as distances to pills,
power pills, and ghosts in different directions, and whether the ghosts are edible. As its
output, it decides which way to move. Such a simple network can be evolved e.g. with NEAT, but it does not perform very well: It has a difficult time separating the different
behaviors, and tends to blend them and not perform any one of them very well. This result
indeed illustrates the main challenge in learning high-level strategies with neuroevolution.
The opposite approach would be to have a human expert identify what behaviors
are needed, and evolve each one separately, as well as a selection neural network that
decides which behavior needs to be used when. This approach works well when the tasks
are clearly separated (e.g. fight-or-flight in section 14.5), but it can also work when two
behaviors need to be combined, such as evading a predator while simultaneously catching
a prey (A. Jain, Subramoney, and Miikkulainen, 2012).
However, it may also be possible to learn multiple behaviors in a single network,
taking advantage of commonalities between them. For instance, it is possible to evolve a
single multitask network with different outputs to control Ms. Pac-Man when the ghosts
are threatening and when they are edible. The division is not learned but implemented
algorithmically. This approach works well with isolated and interleaved versions of the
task. Since the same part of the network is used consistently in similar situations, evolution
discovers effective offensive and defensive behaviors. In blended situations it is not
effective though. A third set of outputs can be evolved for such situations, but it does not
learn very well.
A fourth approach is to let evolution discover when to use what strategy. In this
Modular Multiobjective NEAT method (MM-NEAT; Schrum and Miikkulainen, 2016a),
(a) Preference neuron architecture (b) Invoking the luring module
Figure 6.7: Discovering effective and surprising multimodal task divisions. Behavioral strategies are often multimodal, i.e. require performing different behaviors at different times. Modular network structures are a natural way to encourage multimodal behavior to emerge. (a) A powerful approach is to evolve a network with multiple output modules together with preference neurons (grey) to indicate when each module should be used to control the agent. (b) Such a system may discover surprising task divisions. For instance in Ms. Pac-Man, instead of separating the threatening and edible ghost situations into different modules, it separates general easy movement into one module, and behavior when ghosts are close into an escape module (active during the green trace). That module is used to lure the ghosts nearby and then escape to eat a power pill; afterward, the movement module is used to eat up the ghosts (which is easy because they are nearby), resulting in a high score. Such a division and behavior would be difficult to discover and prescribe by hand, yet evolution discovers it as an effective solution to a multimodal game. For animations of these behaviors, see https://neuroevolutionbook.com/demos. Figure (a) from Schrum and Miikkulainen (2016b).
each of the output modules is coupled with a preference neuron that indicates how strongly
the network believes the corresponding output should be used. In this setting, evolution
might be expected to discover offensive and defensive strategies and how to switch between
them. However, it discovers a much more sophisticated and surprising approach. The
strategies that evolve are not offensive and defensive, but instead behaviors that apply to
easy and difficult situations. That is, one output module controls Ms. Pac-Man when she
is running around eating pills when no ghosts are nearby, whether they are threatening
or edible. A second module specializes in escaping when threatening ghosts are nearby.
With these modules it implements a highly effective luring strategy: It lets the ghosts
get close, then escapes them to the nearby power pill, and is then able to eat the ghosts
effectively because they are close!
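A minimal sketch of how preference neurons arbitrate between output modules is given below. The fixed two-module structure and single hidden layer are illustrative assumptions; MM-NEAT itself evolves the topology and the number of modules.

```python
import numpy as np

def mm_forward(state, weights, n_modules=2, actions_per_module=4):
    """Select an action via preference neurons over multiple output modules."""
    hidden = np.tanh(weights["Wh"] @ state + weights["bh"])
    # Each module contributes its action activations plus one preference neuron.
    outputs = (weights["Wo"] @ hidden + weights["bo"]).reshape(
        n_modules, actions_per_module + 1)
    # The module whose preference neuron is most active takes control,
    # and its most active action unit determines the agent's move.
    module = int(np.argmax(outputs[:, -1]))
    action = int(np.argmax(outputs[module, :-1]))
    return module, action
```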
Even though the escape module is rarely active, it is crucial in obtaining a high score in
the game. Therefore, half the network is dedicated to this behavior. Such a strategy would
have been difficult for human designers to prescribe, yet evolution discovered it as the most
effective way to play the game. It demonstrates how effective high-level strategies are not
only composed of multiple behaviors, but of intelligent ways of combining them. It also
shows that if evolution is allowed enough freedom to explore, it can discover surprising
and effective such combinations.
6.3.2 Evolving Cognitive Behaviors
One potentially important role for novelty search and related methods is in discovering
cognitive behaviors such as communication, memory, and learning. Such behaviors
are complex and challenging to evolve, and several approaches have been developed to
discover them (see e.g. section 14.8.2; Ollion, Pinville, and Doncieux, 2012; Risi, Hughes,
and Stanley,
2010; Saunders and Pollack, 1996; Yamauchi and Beer, 1993). They illustrate
different challenges and ways to overcome them, often through carefully crafted domains
and fitness functions based on domain knowledge. A possible reason for this difficulty, evident even in the most rudimentary versions of these behaviors, is that they require overcoming deception.
For instance, in order to evolve communication, it is necessary to discover what and
when to communicate, the mechanisms to send a signal, to receive it, and to interpret
it. Each one of these mechanisms requires extra hardware that does not provide an
evolutionary advantage unless all of the mechanisms are functional at once. They are thus
deceptive, and it is unlikely that evolution would stumble into them all at once. Also, if a
partial solution is found, it is difficult for evolution to discard it in favor of a better one
(Floreano, Mitri, Magnenat, et al., 2007). They could, however, be discovered as stepping
stones by novelty search, making communication more likely to be discovered.
As an illustration of this idea, consider an agent in a T-maze (figure 6.8; Lehman and
Miikkulainen, 2014). Each agent is controlled by a neural network whose activation is
reset before each trial. In each trial, the agent starts at the bottom end. It needs to move to
the intersection and decide whether to go left or right in order to get to the reward. An
evaluation consists of multiple trials during which the reward stays in one place, but the
reward can move to the opposite end between evaluations. Thus, if the reward does not
move very often, or is most often found in one location, evolution can develop a simple
strategy that is better than chance: Go to the location where it is found more often and/or
more recently. However, if the reward moves frequently enough, communication, memory,
or learning is needed to capture it more reliably.
In a communication task, the agent can generate a signal at the end of the trial, and
the agent in the next trial will receive it at the start. A successful communication thus
indicates whether the agent should turn left or right at the intersection. In a memory task,
the agent will receive an A or B signal and then an X or Y signal before it can start to
move. The AX combination indicates that the reward is on the left; the other combinations indicate that it is on the right. The agent thus has to remember the combination of the two signals in order to act
appropriately. In the learning task, the agent can adapt the network’s connection weights
through modulated learning rules after each trial to make a successful outcome more likely
(sections 12.3.3 and 14.3; Risi, Hughes, and Stanley, 2010). These weight changes persist
throughout the evaluation.
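The reward rule of the memory variant, and the structure of an evaluation, can be sketched as follows; the agent interface (`reset_activation`, `observe`, `choose_turn`) is hypothetical and stands in for whatever evolved network is being tested.

```python
def reward_side(first_signal, second_signal):
    # Only the AX combination places the reward on the left;
    # AY, BX, and BY all place it on the right.
    return "left" if (first_signal, second_signal) == ("A", "X") else "right"

def evaluate(agent, trials):
    """Fraction of trials in which the agent turns toward the reward."""
    successes = 0
    for first, second in trials:
        agent.reset_activation()    # activation is reset; learned weight changes may persist
        agent.observe(first)
        agent.observe(second)
        turn = agent.choose_turn()  # "left" or "right" at the intersection
        successes += (turn == reward_side(first, second))
    return successes / len(trials)
```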
Indeed, fitness-based evolution in this domain developed a reactive strategy of always
going to the left or right, depending on frequency and recency. This strategy was successful in less than 20% of the trials. Even when communication, memory, and learning
(a) Communication (b) Memory (c) Learning (d) Solution lineage
Figure 6.8: Overcoming deception in the evolution of cognitive behaviors. During an evaluation that consists of multiple trials, the agent needs to use (a) communication, (b) memory, or (c) learning to navigate to the reward in the T-maze reliably. Even when the necessary elements for these abilities are available, fitness-based evolution cannot discover how to put them together. Instead, it only discovers reactive behaviors, i.e. always going to the left or the right. In contrast, they serve as stepping stones for novelty search, which eventually discovers effective cognitive behavior. Thus, the lineage of an eventual successful agent in novelty search includes many drops in fitness (d). For instance, the novel behavior of going to the opposite corridor with some inputs (arrow) turns out to be a useful stepping stone in discovering communication. Figures from Lehman and Miikkulainen (2014).
were available, evolution could not find a way of taking advantage of them; in other
words, it could not overcome deception. However, with novelty search, evolution was
able to discover communication, memory, and learning strategies that were successful
in approximately 79%, 81%, and 57% of the trials. Analysis of the lineages of eventual
solutions shows that novelty search was indeed utilizing stepping stones, i.e. behaviors
that received lower fitness on their own, but turned out useful in constructing the final
communication, memory, or learning-based strategy.
Although the behaviors in the T-maze are simple, they are intended to capture the
essential challenge of discovering cognitive structures. The results thus suggest that
straightforward objective-based evolution is unlikely to discover cognitive behaviors, and
thus novelty search and perhaps quality diversity methods are essential.
6.3.3 Utilizing Stochasticity, Coevolution, and Scale
In many virtual domains, whether games or training environments, it is important that the
virtual agents are not entirely predictable. That is, their behavior should be nondeterministic
(or stochastic) to some degree, so that the simulation leads to a wider variety of situations
and challenges. Similarly during training, the agents then encounter a wider variety of
situations and may learn more robust and comprehensive behavior.
The action-unit coding at the output of the agent is generally a powerful approach:
The action represented by the most highly activated output unit is chosen at each time step.
Especially early in evolution, it is easier to find such networks than networks that
would output continuous values (representing a range of actions) accurately.
If the agent networks were trained with backpropagation, such value-unit encoding would result in a probability distribution, i.e. for each input, the activations across the output units would indicate the probability of each action being the correct one (Morgan and Bourlard,
1990). However, such distributions do not develop automatically in neuroevolution. The
networks may be able to identify the winner, i.e. develop the highest activation on the
correct output unit, but the activations of the other units do not develop into probabilities:
They do not matter for performance, and therefore can be anything, as long as they are
lower than that of the winning unit.
However, evolution can be guided to develop probabilities with the simple technique
of stochastic sharpening (Bryant and Miikkulainen, 2006). From the beginning, the
output activation values are treated as probabilities: They are normalized to sum up to
1.0, and the action to be performed is selected stochastically, weighted by these values. For instance in the Legion-II domain, initially the action values were relatively uniform, generating a lot of randomness, but over evolution they became sharper, leading to more effective performance. However, even the final performance was somewhat stochastic,
resulting in the kind of believable and interesting gameplay that would be difficult to
achieve otherwise.
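A minimal sketch of stochastic sharpening follows: the output activations are normalized into a probability distribution, and the action is sampled from it. The sketch assumes non-negative output activations (e.g. sigmoid units); the exact normalization in the Legion-II experiments may differ.

```python
import numpy as np

def select_action(output_activations, rng):
    """Stochastic sharpening: treat output activations as action probabilities.

    Early in evolution the distribution is nearly uniform, producing much
    randomness; as evolution sharpens the outputs, behavior becomes more
    deterministic while remaining slightly stochastic.
    """
    a = np.asarray(output_activations, dtype=float)
    total = a.sum()
    probs = a / total if total > 0 else np.full(len(a), 1.0 / len(a))
    return int(rng.choice(len(a), p=probs))

# Example use with any evolved network `net` and observation `obs`:
# rng = np.random.default_rng(); action = select_action(net(obs), rng)
```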
Interestingly, stochastic sharpening also improves the search for effective behaviors,
and such agents eventually outperform those evolved without it. They are exposed to more
situations during evolution, and thus evaluated more comprehensively. Their behavior
becomes more consistent because unexpected situations do not throw them off. They
also avoid output race conditions, i.e. situations where two output unit activations are
almost exactly the same, resulting in unreliable choices. Thus, stochastic sharpening is
one simple tool that can make behavior more effective, so much so that it may even be
worth converting continuous domains to action-unit coding just to take advantage of it.
One important principle in evolving complex behavior that has not yet been discussed
is coevolution, i.e. evolving the behavior in competition with other agents, or in cooperation
with other agents. This is the topic of chapter 7, and in a sense it thus continues the
discussion of this section. More generally, coevolution may be extended to evolving body
and brain together, or the brain together with the tasks that it needs to solve (chapter 9).
All these approaches take advantage of the fact that behavior is not generated solely by the
agent's neural network, but emerges through a continuous dynamic interaction between
the agent and its environment (Nolfi, 2011).
Another important topic for the future is the evolution of behavior in large-scale
networks. In particular, transformer architectures have shown surprising power when
scaled up to billions of parameters, or a million times more than many of the networks
discussed in this section (Ouyang, J. Wu, X. Jiang, et al., 2022). One way to characterize
this power is that such a scale solves the problem of variable binding, or dynamic
inferencing, that has limited the generality of smaller networks. For example, if trained
with sentences of type 1 composed of words of type A, and sentences of type 2 composed
of words of type B, such networks would not generalize to 1-sentences with B-words, and 2-sentences with A-words. Large language models perform such generalization routinely,
if they are large enough: For instance, they can write technical instructions in the style of
Shakespeare, never seen together in the training corpus.
Interestingly, a large scale is necessary for this ability to emerge. Transformers are
based on attention, i.e. discovering useful relationships between input tokens. While
the performance of large language and image models is not yet fully understood, it is
possible that with a large enough scale, such models start learning relationships between
abstractions as well. It would be interesting to see if scale has a similar effect in generating
complex, robust, multimodal behavior. It may be possible to use existing pre-trained
foundation models in language or vision as a starting point, and evolve behavior generation
as a modification or augmentation to them. Or perhaps it will be possible to construct a
foundation model for behavior from scratch through the imitation of massive datasets? Or
maybe neuroevolution methods can be scaled to large models, and behavior discovered
through massive simulations? Research on such scale-up forms a most interesting direction
for future work.
6.4 Decision-Making
Intelligent behavior, as discussed above, focuses on agents that are embedded in a real
or simulated physical environment and interact with it through physical sensors and
effectors. In contrast, intelligent decision-making focuses on behavior strategies that are
more abstract and conceptual, such as those in business and society. Neuroevolution can
play a large role in decision-making as well, but the approaches and opportunities are
distinctly different. They often need to take advantage of surrogate modeling and of human expertise, as discussed in this section.
6.4.1 Successes and Challenges
To begin, note that human organizations today have vast amounts of data that describe their
operation: Businesses record interactions with their customers, measure how effective
their marketing campaigns are, track performance of their supply chains; health-care
organizations follow the behavior of patients, measure effectiveness of treatments, track
performance of providers; government organizations track crime, spending, health,
construction, economy, etc. Such data has made it possible to predict future trends.
Predictions are then used to decide on policies, i.e. decision strategies or prescriptions,
in order to maximize performance and minimize cost.
Discovering optimal decision strategies is an excellent opportunity for neuroevolution.
Optimal policies are not known; they involve a large number of variables that interact
nonlinearly; the observations and outcomes are often incomplete and noisy; often several conflicting objectives, such as performance and cost, must be optimized at the
same time. They are therefore well-suited for representation in neural networks, and
discovery through evolution.
However, a major challenge is that the search for optimal strategies usually cannot be
done in the real world itself. Discovery requires exploration, and it is usually unacceptable
to explore novel medical treatments with actual patients, or novel investment strategies
with actual money. In discovering intelligent behaviors, such exploration is done in
simulation, but it is usually not possible to simulate human behavior, biology, or society
in sufficient detail.
However, the vast amount of data, and the predictive models that can be built based on
them, provide a possible solution: It may be possible to construct data-based surrogate
models of the decision-making environment. If strategy outcomes are available, surrogate
models can be trained to predict them (Francon, Gonzalez, Hodjat, et al., 2020); if not,
such models can be trained to compare two strategies (Mańdziuk and Żychowski, 2023).
These models are phenomenological, i.e. they model the statistical correlations of contexts,
actions, and outcomes, and do not simulate the actual underlying processes. However, it
turns out that understanding these processes is not even necessary: Phenomenological
surrogate models are enough to evaluate the decision strategies, and therefore discover
good strategies through neuroevolution.
A surprising synergy emerges in this process. If the predictive models are learned
at the same time as the decision strategies based on them, they provide a regularization effect and a curricular learning effect. As a result, the strategies are more robust and
easier to learn. This effect will be discussed in the next subsection.
A second challenge in optimizing decision-making is that the discovered strategies
need to be acceptable to human decision makers. Humans are eventually responsible for
deploying them, and in order to do so, they need to be confident that they are indeed good
strategies. The strategies need to be trustworthy, i.e. express confidence; they need to
make explainable decisions; and it must be possible for the decision makers to interact
with them, try out counterfactual scenarios, and convince themselves that the strategies
are robust. Considerable work goes into these aspects beyond just neuroevolution of
good strategies (as e.g. in the NeuroAI system; Miikkulainen, Fink, Francon, et al., 2025;
Miikkulainen, Francon, Meyerson, et al., 2021; Qiu, Meyerson, and Miikkulainen, 2020;
Shahrzad, Hodjat, and Miikkulainen, 2024).
Part of this challenge is also that there is already significant human expertise in
many decision-making domains, and it should be possible to use it as a starting point
in discovering better policies. Evolution can still explore, but its exploration is more
informed, and may be more likely to discover improvements; also, those improvements
may be easier for the decision makers to accept. Again, it turns out that there is a surprising
synergy of human expertise and evolutionary discovery: When put together in this manner,
the results are better than either one alone. This effect will be discussed in the second
subsection below.
6.4.2 Surrogate Modeling
The general idea of discovering decision strategies through surrogate modeling, i.e. the
evolutionary surrogate-assisted prescription approach (ESP; not to be confused with the
enforced subpopulations method of sections 5.6 and 7.1.1) is depicted in figure 6.9 (Francon, Gonzalez, Hodjat, et al., 2020). The decision-making problem is formalized as a mapping from contexts $C$ and actions $A$ to outcomes $O$. The goal is to discover a decision strategy, i.e. a prescription policy, that results in the best outcomes in each possible context.
The starting point is a database, obtained through historical observation, that includes as many examples of this mapping as possible. For instance, $C$ might describe patient characteristics, $A$ might describe procedures or medication, and $O$ might measure the extent and speed of recovery. This data can be used to train a model, such as a neural network or a random forest, to predict the outcome of a given action in a given context.
(a) Predictor and prescriptor models (b) Surrogate modeling process
Figure 6.9: Evolutionary surrogate-assisted prescription. In domains where evaluation of decision strategies is not possible, a surrogate model can be used to guide the search. (a) The surrogate model, or a predictor, maps contexts and actions to outcomes. The decision-maker model, or a prescriptor, maps contexts to optimal actions. (b) The models are constructed in one or more cycles of an iterative process. Starting from historical observations of contexts, actions, and outcomes, the predictor (e.g. a neural network or a random forest) is trained through supervised learning. It is then used to evaluate prescriptor candidates, constructed through neuroevolution. The final prescriptor is deployed in the domain. More data can then be collected and the cycle repeated, resulting in more accurate predictors and more effective prescriptors. Figures from Francon, Gonzalez, Hodjat, et al. (2020).
Thus, the predictor is defined as
\[
P_d(C, A) = O' , \tag{6.1}
\]
such that $\sum_j L(O_j, O'_j)$ across all dimensions $j$ of $O$ is minimized, where $L$ is any of the standard loss functions and $O'$ denotes the predicted outcome.
The predictive model in turn can serve as a surrogate in search for good decision
strategies. The strategies are mappings themselves, i.e. from contexts to actions, and in
particular to actions that result in the best possible outcomes. They are therefore naturally
represented as neural networks, and called prescriptive models. The prescriptor takes a
given context as input, and outputs a set of actions:
\[
P_s(C) = A , \tag{6.2}
\]
such that $\sum_{i,j} O_j(C_i, A_i)$ over all possible contexts $i$ is maximized. It thus approximates the
optimal decision policy for the problem. Because optimal strategies are not known ahead
of time, these models need to be constructed through search, i.e. through neuroevolution.
Each candidate is evaluated against the predictor instead of the real world, thus making it
possible to explore fully and evaluate a very large number of candidates efficiently.
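The core of this search can be sketched in a few lines: the predictor supplies the fitness signal for candidate prescriptors, so no real-world evaluations are needed during evolution. The function names are illustrative, `evolve_step` stands in for any neuroevolution update, and the outcome is scalarized here for simplicity (the actual systems are typically multiobjective).

```python
import numpy as np

def prescriptor_fitness(prescriptor, predictor, contexts):
    """Evaluate a candidate decision strategy against the surrogate model."""
    actions = [prescriptor(c) for c in contexts]
    outcomes = [predictor(c, a) for c, a in zip(contexts, actions)]
    # Sum predicted outcomes over contexts and outcome dimensions,
    # i.e. the quantity maximized in equation (6.2).
    return float(np.sum(outcomes))

def esp_search(predictor, contexts, population, evolve_step, generations=100):
    for _ in range(generations):
        fitnesses = [prescriptor_fitness(p, predictor, contexts)
                     for p in population]
        # Any neuroevolution method can produce the next population
        # from the current one and its surrogate-based fitnesses.
        population = evolve_step(population, fitnesses)
    return max(population,
               key=lambda p: prescriptor_fitness(p, predictor, contexts))
```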
Once a good candidate is found, it can be deployed in the real world. At this point,
uncertainty metrics can be applied to it, it can be distilled into a set of explainable rules, and
an interactive scratchpad can be built so that the decision maker can convince him/herself
that the policy works as well as expected (Miikkulainen, Francon, Meyerson, et al., 2021).
When it is deployed, more $(C, A, O)$ data can be collected and added to the database.
These data are now closer to the actual implemented policies, and make it possible to
learn a model that is more accurate where accuracy is most needed. The cycle can then
be repeated, resulting in more accurate predictors and more powerful prescriptors in the
process.
A practical example of discovering decision strategies for pandemic interventions
will be presented in the next subsection. However, in order to evaluate the power of the
approach wrt. the state of the art, and to gain insight into how it constructs solutions, it can
be implemented in standard reinforcement learning domains (Francon, Gonzalez, Hodjat,
et al., 2020). One good such domain is OpenAI Gym CartPole-v0, i.e. balancing a vertical
pendulum by moving a cart left or right. In this case, the process starts with a population
of random prescriptors; the predictors are trained at the same time as the prescriptors are
evolved, i.e. the loop in figure 6.9b is traversed rapidly many times.
Compared to direct evolution of the control policy as well as standard reinforcement
learning methods PPO and DQN, ESP learned significantly faster, found better solutions,
had lower variance during search, and lower regret overall. Most importantly, because it is
based on the surrogate, ESP is highly sample-efficient, i.e. it requires very few evaluations
in the actual domain. Sample efficiency is one of the main challenges in deploying
reinforcement learning systems in the real world, and therefore ESP provides a practical
alternative.
Such domains are also useful in illustrating how ESP finds solutions. It turns out that
they are based on two surprising synergies with learning the predictors. The first one is
that such co-learning results in automatic regularization. This effect can be seen most clearly in the domain of evolving function approximators (figure 6.10). In this case, the context is a scalar value on the $x$-axis, and the action is a scalar value on the $y$-axis. The
optimal policy is a sine wave; the rewards decrease linearly away from it.
The ESP process starts with randomized feedforward predictor and prescriptor neural
networks. In each training episode, a context-action pair is chosen randomly, and the
predictor is trained for 2000 epochs with the pairs so far. A population of prescriptors is
then evolved for 20 generations, using the same pairs to evaluate them against the current
predictor. The top prescriptor is then evaluated against the ground truth to illustrate
progress at each episode.
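That episode structure can be summarized as a co-learning loop; the epoch and generation counts follow the text, but the surrounding functions (`sample_pair`, `train_predictor`, `evolve_prescriptors`, `ground_truth_score`) are placeholders for the actual components.

```python
def esp_colearning(sample_pair, train_predictor, evolve_prescriptors,
                   ground_truth_score, episodes=100):
    """Co-learn the predictor and the prescriptor population."""
    pairs, history = [], []
    population = None   # the placeholder initializes a random population at first
    for _ in range(episodes):
        pairs.append(sample_pair())                      # random context-action pair
        predictor = train_predictor(pairs, epochs=2000)  # fit surrogate to pairs so far
        population = evolve_prescriptors(population, predictor,
                                         pairs, generations=20)
        # Track the current best prescriptor on the ground truth, assuming the
        # returned population is sorted best-first (for reporting only; the
        # ground truth is never used as a training signal).
        history.append(ground_truth_score(population[0]))
    return population[0], history
```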
As seen in figures 6.10b-d, after 15 episodes the predictor is still far from representing
the sine wave, and the policy optimal wrt. this predictor is highly irregular as well.
Remarkably, however, the policy represented by the top prescriptor is much closer to the
actual optimal policy. This trend continues throughout training and evolution. By 75
episodes, the top prescriptor has already converged to the optimal policy even though the
predictor still suggests an irregular policy, and by 100 episodes, even the predictor-optimal
policy is a sine wave. This convergence is remarkably rapid: PPO takes over 3000 episodes
to learn a good approximation, and direct evolution (without the predictor) is not even close
at that point.
How is it possible for ESP to discover an optimal policy when the predictor is still far
from it? It turns out that the simultaneous learning of the predictor provides a regularization
effect. The best prescriptors stay in the population for several generations, and therefore are evaluated against many different versions of the predictor. Especially early on in
(a) Problem space (b) After 15 samples (c) After 75 samples (d) After 100 samples
Figure 6.10: Evolving effective decision-making through co-learning of the surrogate model. This example illustrates the synergy of learning the predictor and prescriptor at the same time in the function approximation domain. (a) With the context as $x$ and the action as $y$, the ground truth outcomes are indicated by the colored background. (b-d) The current predictor is indicated as the colored background instead, so that it can be compared with the ground truth in (a). The training pairs are illustrated with translucent dots. The actual optimal policy is indicated by the blue dotted line, and the policy that is optimal wrt. the current predictor is shown as a white dotted line. The policy represented by the current top prescriptor is indicated by the solid orange line. The prescriptors evolve policies that are better than the predictors suggest. The prescriptors are evaluated with several different predictors over time, which act as an ensemble that is more accurate than any single predictor alone. Such co-learning of the predictor and the prescriptors thus results in automatic regularization, leading to faster learning and more robust solutions. For an animation of this process, see https://neuroevolutionbook.com/demos. Figures from Francon, Gonzalez, Hodjat, et al. (2020).
predictor training, the predictors vary significantly. In a sense, they form an ensemble, and
the prescriptors are evaluated against this ensemble. The ensemble performs better than
any individual predictor, and therefore the prescriptor evaluation is more accurate as well.
Thus, the co-learning of predictors and prescriptors provides a surprising regularization
effect that makes it possible to progress faster than expected.
Another useful effect of co-learning is the curricular learning environment it provides.
That is, the early predictors capture the main trends and the most general aspects of the
environment, which then become refined as they learn more. Thus, the challenges start
simple and become more complex as the training goes on; this is the main principle of
curricular learning in general, and a good way to construct complex behavior (as also seen
in section 3.3).
The effect can be made concrete in the FlappyBird game environment. The bird flies at a constant speed through a series of gates in pipes. The player has only one action, flap, which lifts the bird up a constant amount. Gravity will then bring it rapidly down. The challenge is to time the flaps so that the bird gets through the next gate, and is also well-positioned to get through the one after that. In the ESP setup, the predictor is trained to estimate the next game states given the current state and the action, and prescriptors are evolved to decide when to flap. The fitness is increased for every gate that the bird successfully clears.
Figure 6.11 shows four sample predictions during evolution. Curricular learning is
evident in these snapshots: At the beginning, the predictor tends to place the gate near the
(a) First gate (b) Pair of gates (c) Straight run (d) Full problem
Figure 6.11: Automatic curricular evolution through co-learning of the surrogate model. In the FlappyBird game, the challenge is to flap the bird up at the appropriate times so that it flies through a course of gates without hitting them. The predictor, trained to estimate the result of an action (flap/no-flap) at a state, (a) first places the gate nearby, (b) then clusters a number of them together, (c) then spreads them apart at the same level, and (d) finally presents the full game challenge accurately. Such a series of increasingly challenging evaluations provides a curriculum that makes it possible to evolve successful behavior, even when it would not evolve with the full challenge from scratch. Co-learning the predictor and prescriptor thus constructs an effective curriculum automatically, allowing neuroevolution to solve more difficult tasks. For animations of these behaviors, see https://neuroevolutionbook.com/demos.
bird, making it easy to fly to it. By the time the bird evolves to fly through one gate, the predictor has learned to expect the next gate, but clusters it together with the first one. It is thus relatively easy to evolve behavior that clears several gates. As the predictor learns, it spreads the gates further apart, but still keeps them roughly at the same level. While the prescriptors evolve to fly straight through, the predictors start placing the gates further up and down, eventually providing a realistic challenge. By that time, it is relatively easy to evolve behavior that takes the height of the gates into account, and flaps the bird
successfully through the course. In contrast, direct evolution, i.e. evolution from scratch in
the actual task, never constructs successful behavior. This result demonstrates the power
of curricular learning and shows how it can be automatically discovered by learning the
challenges at the same time as the solutions.
ESP forms a foundation for discovering decision strategies with neuroevolution. The
next two subsections illustrate how real-world decision systems can be built on it (utilizing the NeuroAI platform; Miikkulainen, Fink, Francon, et al., 2025).
6.4.3 Case Study: Mitigating Climate Change through Optimized Land Use
A significant factor contributing to climate change is how much land area is allocated for
different uses (Friedlingstein et al., 2023). Forests in general remove more carbon from
the atmosphere than e.g. crops and ranges, yet such uses are essential for the economy.
Land-use patterns must therefore be planned to minimize carbon emissions and maximize
carbon removal while maintaining economic viability.
An approach to optimize land use can be developed based on the ESP method discussed
in the previous section (D. Young, Francon, Meyerson, et al., 2025). The idea is to first
utilize historical data to learn a surrogate model of how land-use decisions in different
contexts affect carbon emissions and removals. Then, this model is used to evaluate
candidates in an evolutionary search process for good land-use change policies. While it
is difficult to predict the economic impact of changes in land use, the amount of change
can be used as a proxy for it. As a result, a Pareto front is generated of solutions that trade
off reduction in carbon emissions and the amount of change in land use. Each point in the
Pareto front represents an optimal policy for that tradeoff.
The data for carbon emissions (emissions resulting from land-use change, ELUC)
originate from a high-fidelity simulator called bookkeeping of land-use emissions (BLUE) developed by Hansis, S. J. Davis, and Pongratz (2015). BLUE is designed to estimate the long-term CO2 impact of committed land use. "Committed emissions" means that all the emissions caused by a land-use change event are attributed to the year of the event. BLUE is a bookkeeping model that attributes carbon fluxes to land-use activities.
While in principle a simulator can be used as the surrogate model for ESP, in practice the
simulations are too expensive to carry out on demand during the search for good policies.
Therefore, the BLUE team performed a number of simulations covering a comprehensive
set of situations for 1850-2022, resulting in a dataset that could be used to train an efficient
surrogate model.
The Land-Use Change (LUC) data is provided by the Land-Use Harmonization project
(LUH2; Hurtt et al., 2020). A land-use harmonization strategy estimates the fractional
land-use patterns, the underlying land-use transitions, and key agricultural management
information, annually for the time period 850-2100 at 0.25 x 0.25 degree resolution.
Based on these data, the modeling approach aims to understand the domain in two
ways: (1) In a particular situation, what are the outcomes of the decision maker’s actions?
(2) What are the decisions that result in the best outcomes, i.e. the lowest carbon emission
and cost for each tradeoff between them? The data is thus organized into context, action,
and outcome variables.
Context describes the problem the decision maker is facing, i.e. a particular grid cell,
a point in time when the decision has to be made, and the usage of the land at that point.
More specifically, it consists of latitude and longitude and the area of the grid cell, the
year, and the percentage of land used in each LUH2 category (as well as nonland, i.e. sea,
lake, etc.).
Actions represent the choices the decision-maker faces. How can they change the land?
In this case study, these decisions are limited in two ways: First, decision-makers
cannot affect primary land. The idea is that it is always better to preserve primary
vegetation; destroying it is not an option given to the system. Technically, it is not possible
to re-plant primary vegetation. Once destroyed, it is destroyed forever. If replanted, it
would become secondary vegetation. Second, decision-makers cannot affect urban areas.
The needs of urban areas are dictated by other imperatives and optimized by other decision
makers. Therefore, the system cannot recommend that a city should be destroyed or
expanded.
Outcomes consist of two conflicting variables. The primary variable is ELUC, i.e.
emissions from land-use change. It consists of all CO2 emissions attributed to the change,
in metric tons of carbon per hectare (tC/ha), obtained from the BLUE simulation. A
positive number means carbon is emitted, a negative number means carbon is captured.
The secondary variable is the cost of the change, represented by the percentage of land
that was changed. This variable is calculated directly from the actions. There is a trade-off
between these two objectives: It is easy to reduce emissions by changing most of the land,
but that would come at a huge cost. Therefore, decision-makers have to minimize ELUC
while minimizing land change at the same time. Consequently, the result is not a single
recommendation, but a Pareto front where each point represents the best implementation
of each tradeoff given a balance between the two outcomes.
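To make this organization concrete, a single historical record might look like the following; the field names and values are invented for illustration and do not correspond to the actual LUH2/BLUE encoding.

```python
record = {
    "context": {
        "lat": 12.125, "lon": -55.875, "cell_area_ha": 61234.0, "year": 2005,
        # fraction of the cell in each land-use category (illustrative subset)
        "primary_veg": 0.42, "secondary_veg": 0.18, "crop": 0.25,
        "range": 0.10, "urban": 0.02, "nonland": 0.03,
    },
    "actions": {
        # observed or recommended changes, as fractions of the cell
        # (primary vegetation and urban areas may not be changed)
        "crop": -0.05, "range": -0.02, "secondary_veg": +0.07,
    },
    "outcomes": {
        "ELUC_tC_per_ha": -0.8,   # negative: carbon is captured
        "change_fraction": 0.07,  # cost proxy: share of land changed
    },
}
```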
The ESP implementation consists of the predictor, trained with supervised learning on the historical data, and the prescriptor, trained through evolution. Given the context
and actions that were performed, the predictive model estimates the outcomes. In this
case, since the cost outcome can be calculated directly, only the ELUC is predicted by the
model. That is, given the land usage of a specific location, and the changes that were made
during a specific year, the model predicts the CO2 long-term emissions directly caused by
these changes. Any predictive model can be used in this task, including a neural network,
random forest, or linear regression. As usual, the model is fit to the existing historical data
and evaluated with left-out data.
Given context, the prescriptive model suggests actions that optimize the outcomes. The
model has to do this for all possible contexts, and therefore it represents an entire strategy
for optimal land use. The strategy can be implemented in various ways, including decision
trees, sets of rules, or neural networks. The current approach is based on neural networks.
The optimal actions are not known, but the performance of each candidate strategy can
be measured (using the predictive model); therefore, the prescriptive model needs to be
learned using search techniques such as neuroevolution. As in prior applications of ESP
(Francon, Gonzalez, Hodjat, et al., 2020; Miikkulainen, Francon, Meyerson, et al., 2021),
the prescription network has a fixed architecture of two fully connected layers; its weights
are concatenated into a vector and evolved through crossover and mutation.
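To make this concrete, the sketch below shows one way such a prescriptor could be encoded and varied. The layer sizes, the softmax output over land-use categories, and the uniform-crossover and Gaussian-mutation operators are illustrative assumptions, not the exact settings used in the study.

```python
import numpy as np

# Hypothetical sizes: n_in context features, n_hidden units, n_out land-use categories.
n_in, n_hidden, n_out = 12, 16, 8
n_weights = n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

def decode(genome):
    """Unpack the flat weight vector into two fully connected layers."""
    i = 0
    W1 = genome[i:i + n_in * n_hidden].reshape(n_in, n_hidden); i += n_in * n_hidden
    b1 = genome[i:i + n_hidden]; i += n_hidden
    W2 = genome[i:i + n_hidden * n_out].reshape(n_hidden, n_out); i += n_hidden * n_out
    b2 = genome[i:i + n_out]
    return W1, b1, W2, b2

def prescribe(genome, context):
    """Map a context vector to suggested land-use fractions (softmax output assumed)."""
    W1, b1, W2, b2 = decode(genome)
    h = np.tanh(context @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()

def crossover(p1, p2, rng):
    """Uniform crossover applied directly on the flat weight vectors."""
    mask = rng.random(n_weights) < 0.5
    return np.where(mask, p1, p2)

def mutate(genome, rng, rate=0.1, sigma=0.1):
    """Perturb a random fraction of the weights with Gaussian noise."""
    mask = rng.random(n_weights) < rate
    return genome + mask * rng.normal(0.0, sigma, n_weights)

rng = np.random.default_rng(0)
population = [rng.normal(0.0, 1.0, n_weights) for _ in range(100)]
```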
In preliminary experiments, prediction performance was found to differ between major
geographical regions. To make these differences explicit, separate models were trained
on different subsets of countries: Western Europe (EU), South America (SA), and the
United States (US). Three different predictive models were evaluated: linear regression
(LinReg), Random Forests (RF), and neural networks (NeuralNet). They were trained
with a sampling of data up to 2011, and were tested with data from 2012 to 2021. Not
surprisingly, in each region the models trained on that region performed the best. The
LinReg models performed consistently the worst, suggesting that the problem includes
significant nonlinear dependencies. RF performed significantly better; however, RF does
not extrapolate well beyond the training examples. In contrast, neural nets both capture
nonlinearities and extrapolate well, and turned out to be the best models overall. Therefore,
the global neural net surrogate was used to evolve the prescriptors.
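A minimal version of this comparison can be set up with off-the-shelf regressors, as sketched below; the feature layout, model hyperparameters, and the use of mean absolute error are placeholder choices rather than the study's actual configuration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

def compare_eluc_predictors(X_train, y_train, X_test, y_test):
    """Fit LinReg, RF, and a small neural net on pre-2012 data; score on 2012-2021.

    X holds context and action features per grid cell and year; y is ELUC (tC/ha).
    """
    models = {
        "LinReg": LinearRegression(),
        "RF": RandomForestRegressor(n_estimators=100, random_state=0),
        "NeuralNet": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                                  random_state=0),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    return scores  # lower error is better
```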
The prescriptors were evolved and tested with the same training and testing sets as
the global neural net. The prescriptors were fixed fully connected neural networks with
two layers of weights. Their weights were initially random, and modified by crossover
and mutation. They received the current land-use percentages as their input, and their
(a) Evolution of Pareto front (b) All prescriptors evaluated (c) Comparing to heuristics
Figure 6.12: Prescriptor evolution and performance. In the land-use optimization domain, the goal is to achieve low carbon emissions with minimal change in land-use. (a) The Pareto front moves towards the lower left corner over evolution, finding better implementations for the different tradeoffs of the ELUC and change objectives. (b) Each prescriptor evaluated during evolution is shown as a dot, demonstrating a wide variety of solutions and tradeoffs. The final Pareto front is shown as red dots in both figures, constituting a set of solutions from which the decision-maker can choose a preferred one. (c) The Pareto fronts of evolved prescriptors vs. heuristic baselines. Whereas the heuristics try to optimize each region equally, the evolved prescriptors allocate more change to where it matters the most. This result demonstrates that the approach can discover non-obvious opportunities in the domain, and thus find better solutions than the obvious heuristics. For an interactive demo of the system, see https://neuroevolutionbook.com/demos. Figure from D. Young, Francon, Meyerson, et al. (2025).
outputs specified the suggested changed land-use percentages; they were then given to the
predictor to estimate the change in ELUC. The outputs were compared to the inputs to
calculate the change percentage.
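The two outcomes for a candidate prescriptor can thus be computed as sketched below; the callable interfaces, the feature layout, and the use of half the L1 distance between land-use distributions as the change percentage are simplifying assumptions.

```python
import numpy as np

def evaluate_land_use_prescriptor(prescriptor, contexts, eluc_predictor):
    """Estimate the two outcomes (mean ELUC, mean land change) for one candidate.

    prescriptor(current) -> suggested land-use fractions for a grid cell;
    eluc_predictor(current, suggested) -> predicted ELUC in tC/ha.
    Rows of `contexts` hold the current land-use fractions of one grid cell
    (summing to approximately 1); both callables are illustrative stand-ins.
    """
    elucs, changes = [], []
    for current in contexts:
        suggested = prescriptor(current)
        elucs.append(eluc_predictor(current, suggested))
        # Cost: fraction of land that changes category, computed from the actions
        # alone (half the L1 distance between two distributions that sum to 1).
        changes.append(0.5 * float(np.abs(suggested - current).sum()))
    return float(np.mean(elucs)), float(np.mean(changes))   # both minimized
```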
Figure 6.12 demonstrates the progress of evolution towards increasingly better pre-
scriptors, i.e. those that represent better implementations of each tradeoff of the ELUC
and change objectives. They represent a wide variety of tradeoffs, and a clear set of
dominant solutions that constitute the final Pareto front (red dots). That set is returned
to the decision-maker, who can then select the most preferred one to be implemented.
Importantly, the evolved Pareto front dominates two linear baselines: one where land
is converted to forest from all other types evenly, and another where other land types
are converted to forest in a decreasing order of emissions. A closer look revealed that
evolution discovered an unexpected strategy: Instead of trying to improve everywhere, as
the heuristics did, it identified a smaller number of locations where land-use change had
the largest effect, and allocated maximum change to those locations. In other words, it
found that it is important to pick your battles! This result suggests that the approach is
able to learn and utilize non-obvious opportunities in the domain, and therefore results in
better solutions for land use than the obvious heuristics.
6.4.4 Case Study: Optimizing NPIs for COVID-19
One example of discovering intelligent decision strategies through neuroevolution is
a system for optimizing non-pharmaceutical interventions in the COVID-19 pandemic
(Miikkulainen, Francon, Meyerson, et al., 2021). Throughout the pandemic in 2019-
2023, governments and decision makers around the world were trying to contain the
health and economic impacts of the pandemic by imposing a variety of regulations on
the society. Economically, the most severe restrictions included school and workplace
closings, stay-at-home requirements, and restrictions on public events, gatherings, and
domestic and international travel; less severe ones included public information campaigns,
testing arrangements, contact tracing, and masking requirements. The approaches were
very different around the world, partly because, especially early on, it was not clear how
effective each of them was, individually and in combination.
COVID-19 was the first global pandemic that took place in the information age, and
data about it became available in vast amounts and almost immediately. It became a
major focus of the scientific community (in late 2020, a new paper was submitted to
arXiv/bioRxiv on average every 17 minutes), and many approaches were developed to use
the data to understand it and cope with it. Most of the approaches were based on existing
technology of epidemiological modeling, developed in the early 1900s during and after the
major pandemics at that time (Kermack and McKendrick, 1927). The idea is to construct
differential equations that describe how different populations become susceptible, exposed,
infected, and recover or die (SEIR). The models require estimating several parameters,
the most important of which is $r$, the transmission rate. The effect of NPIs can be taken
into account by modifying these parameters. More recently, these models have been
augmented with agent-based modeling approaches and network models, which can extend
their granularity almost to an individual person's level (Newman, 2002; Venkatramanan,
Lewis, J. Chen, et al., 2018). Properly constructed, the models can be accurate and useful
in predicting the course of the pandemic. However, estimating the parameters is difficult,
and the models are computationally expensive to run.
Much of the community, especially early on, focused on prediction, i.e. what will
happen. The decision makers could then, in principle, use these predictions to evaluate
alternative NPIs and decide what to do about it. Even such communication between
the scientists and decision makers turned out to be difficult, especially in the political
climate at the time, but there were several cases where it was effective and resulted in good
outcomes (Fox, Lachmann, Tec, et al., 2022). An interesting question therefore arises:
Could optimal intervention policies be discovered automatically using machine learning?
The approach described in the previous section is well-suited to this task. The first
step is to build the surrogate, i.e. the predictive model that could then be used to evaluate
the policy candidates. It turned out that the usual SEIR approaches could not serve this
role very well for three reasons: It was difficult to parameterize them for the hundreds
of countries and finer-grain locations; it was difficult to parameterize them to model all
possible intervention combinations; and the models took too long to run to evaluate the
large number of candidate policies that needed to be tested. However, there were enough
data available so that it was possible to develop a data-driven approach to prediction:
train a neural network to predict the number of cases (or hospitalizations, or deaths)
phenomenologically.
The approach was possible because good sources of data existed to construct it. Time
series data were available for cases and other indicators for different locations around the
world through centralized sources almost daily (Center for Disease Control and Prevention,
2023). In addition, a major project at Oxford University evaluated government and news
outlet sources in order to formalize the NPI policies in effect at these locales (Hale, Webster,
(a) Predictor model (b) Prescriptor model
Figure 6.13: Predictive and prescriptive models for discovering nonpharmaceutical interventions (NPIs) in the COVID-19 pandemic. The predictor is used as a surrogate model for the world in order to evolve prescriptors that implement good NPI strategies. (a) The predictor is an LSTM network that receives a 21-day sequence of cases and NPIs as input, and predicts the cases for the next day. The network is trained with historical data across different countries. During performance, the prediction is looped back to the input, and rolled out indefinitely into the future. (b) The prescriptor receives the same sequence of cases and NPIs as input, and prescribes the NPIs for the next day. Since the optimal prescriptions are not known, it is constructed through neuroevolution to reduce both cases and the total stringency of NPIs. Each prescriptor is evaluated through the predictor as the surrogate model. In this manner, the predictor is constructed entirely based on data and is fast enough to evaluate a large number of prescriptor candidates. Figures from Miikkulainen, Francon, Meyerson, et al. (2021).
Petherick, et al., 2020). The NPIs around the world were unified into a representation with
12-20 categories, each with 1-4 stringency levels.
Such data made it possible to use supervised machine learning techniques to form the
predictive surrogate model (figure 6.13a). An LSTM neural network with two channels,
one for the number of cases, and the other for the NPIs, was trained to predict the cases
the next day. As its input, it received the history of the last 21 days, and the predictions
were looped back into the input so that they could be unrolled indefinitely into the future.
The separation of cases and NPIs into two channels made it possible to impose simple constraints on the predictions, such as
caps based on the population size of the locale, and that more stringent NPIs should not
lead to increases in the number of cases.
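The autoregressive rollout can be sketched as follows; the predict_next interface and the simple clipping used to enforce the population cap are assumptions standing in for the trained LSTM and the actual constraint handling.

```python
import numpy as np

def rollout_cases(predict_next, case_history, npi_history, future_npis, population):
    """Unroll the surrogate predictor autoregressively, one day at a time.

    predict_next(cases_window, npis_window) -> predicted cases for the next day,
    given the last 21 days of each channel. The callable stands in for the trained
    LSTM; the clipping below is one simple way to realize the population cap
    mentioned in the text, not necessarily the exact constraint used.
    """
    cases = list(case_history[-21:])
    npis = list(npi_history[-21:])
    predictions = []
    for todays_npis in future_npis:
        npis.append(todays_npis)                       # NPIs in effect on the new day
        pred = predict_next(np.array(cases[-21:]), np.array(npis[-21:]))
        pred = float(np.clip(pred, 0.0, population))   # cap by the locale's population
        predictions.append(pred)
        cases.append(pred)                             # loop the prediction back in
    return np.array(predictions)
```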
The prescriptor models were then evolved to discover good intervention policies
(figure 6.13b). Each prescriptor received the same sequence of case numbers and NPIs as
its input, and suggested NPIs as its output. These suggestions were input to the predictor,
which then estimated the number of cases. The cases and NPIs were looped back into the
input of both models, and in this manner, the prescriptor was evaluated 90 days into the
future. Its performance was measured based on the number of cases as well as the total
stringency of the NPIs it suggested. The problem is thus multiobjective, and NSGA-II
(section 2.2.5) was used to construct a Pareto front of solutions. Therefore, the end result
is a collection of prescriptors on the Pareto front. The idea is that the decision maker can
then choose a suitable tradeoff between cases and stringency, i.e. health and economic
outcomes.
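A simplified version of this closed loop, producing the two objective values for one candidate, might look as follows; the callables, the windowing, and the objective aggregation are illustrative assumptions.

```python
import numpy as np

def evaluate_npi_prescriptor(prescriptor, predictor, case_history, npi_history,
                             population, horizon=90):
    """Closed-loop evaluation of one prescriptor candidate against the surrogate.

    prescriptor(cases_window, npis_window) -> stringency vector for the next day;
    predictor(cases_window, npis_window) -> predicted cases for the next day.
    Both callables are stand-ins for the evolved network and the LSTM surrogate.
    Returns the two objectives (total predicted cases, total stringency), both of
    which NSGA-II is then asked to minimize.
    """
    cases = list(case_history[-21:])
    npis = list(npi_history[-21:])
    total_cases, total_stringency = 0.0, 0.0
    for _ in range(horizon):
        action = prescriptor(np.array(cases[-21:]), np.array(npis[-21:]))
        npis.append(action)                            # prescribed NPIs for the next day
        pred = predictor(np.array(cases[-21:]), np.array(npis[-21:]))
        pred = float(np.clip(pred, 0.0, population))
        cases.append(pred)                             # predicted cases loop back too
        total_cases += pred
        total_stringency += float(np.sum(action))
    return total_cases, total_stringency
```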
Note that this problem is a good example of a decision-making task where a surrogate
is necessary, for three reasons. First, even if the decision makers could incorporate science
into their process, only one decision policy could be implemented at any one time, yet a
very large number of alternatives need to be evaluated in the search process. Second, the
NPI policies need to be evaluated over a long time during which the world does not stay
constant. The NPIs change over time, the number of cases changes as a result of the NPIs,
and also changes differently depending on the stage of the pandemic. The evaluations
thus need to be done against a surrogate that is accurate enough to track such changes.
Third, simply predicting the most likely outcome is not sufficient; it must also be possible
to estimate the uncertainty of the predictions. With a surrogate model, it is possible to
estimate the uncertainty in the initial predictions; the evaluation can then be unrolled
multiple times to observe the variation in the long term, resulting in confidence bounds.
Throughout the pandemic, from May 2020 through December 2022, the predictor and
prescriptor models were trained daily, forming a constantly adapting set of predictions
and policies for all locations. The data-driven approach worked surprisingly well in
constructing reliable predictors. Different countries implemented different restrictions,
and they encountered different phases of the pandemic at different times. Thus, the data
was diverse enough so that the predictor learned to evaluate the different policy candidates
accurately. These results were confirmed by evaluating the predictions against actual data
in various countries at various stages of the pandemic early on. As long as there were
no major changes in the NPIs or the pandemic, the predictions tracked the cases well
(figure 6.14a).
Similarly, prescriptor evolution discovered a range of effective policies for different
stages of the pandemic and for different locations (figure 6.14b). Evaluations with the
surrogate model suggest that, in many cases, they would have resulted in a lower number
of cases and lower economic impact than the actual policies implemented. An interesting
pattern of discoveries emerged in this process: The models often discovered principles a
few weeks ahead of the time they became widely known. The first such result appeared in
May 2020: the models consistently suggested the most stringent restrictions on schools
and workplaces. And in fact, a few weeks later results came out suggesting that the virus
was transmitted most effectively in such closed spaces where people stayed in contact
for several hours every day. In September 2020 the suggestions changed, focusing on
gatherings and travel restrictions, but suggesting less stringent restrictions for schools.
Indeed, measures had been taken at schools regarding separation, ventilation, dividers, and
masks that made it possible to keep them open in a safer manner.
Perhaps the most significant demonstration of the power of the approach took place in
March 2021, during the delta variant surge. The models predicted a huge explosion of
cases in India, which was surprising because India had had the pandemic under control
until then, and there was no indication that anything was wrong. However, the models had
seen delta surges elsewhere, and apparently recognized that India's NPIs at the time made
it vulnerable. Even though it was difficult to believe the models, they were correct. If
the recommendations had been followed, much of the surge could have been avoided
(figure 6.15).
On the other hand, the models were much less successful in coping with the omicron
(a) Predictor accuracy (b) Prescriptor Pareto front
Figure 6.14: Learned predictors and prescriptors. (a) Given the diverse training data across time and countries, the predictor learned to estimate the number of cases accurately. This example is Italy in July 2020. Given the actual sequence of NPIs as input, it predicted the cases accurately for the next 14 days for which there was data. It also suggested that these NPIs, if maintained, would bring the cases down, but if lifted, an explosion of cases would result. (b) The performance of the final population of prescriptors along the case and cost objectives. The Pareto front evolved strongly towards the bottom left, and in the end offered a set of tradeoffs from which the decision makers can choose. For an animation of the Pareto front, see https://neuroevolutionbook.com/demos. Figures from Miikkulainen, Francon, Meyerson, et al. (2021).
surge. It was indeed different in that it happened very rapidly all over the world; there was
not enough time for the models to see it first in some countries and then apply that experience to others.
It also turned out that in 2022, it no longer made sense to train the models from all the
available data. Different NPIs were used: there was better testing, tracing, and masking,
and fewer restrictions on work, school, and travel. Also, people behaved differently in
2022 compared to 2020. In many locations, they did not adhere to the restrictions the
same way, and also masking, testing, and vaccinations made it less necessary to do so.
Therefore, it was better to train the models with less but more recent data. On the other
hand, this result again emphasized that it is important to train the predictor together with
the prescriptor; in that manner, they can both adapt to the changing world.
The NPI optimization application, as described above, was primarily a technology
demo, but it has already had a significant impact. In a couple of cases it was also used
to inform actual policy decisions, such as the school openings in Iceland in the Fall of
2021. A major effort in mainstreaming the approach was the XPRIZE Pandemic Response
Challenge in December 2020-March 2021 (Cognizant AI Lab, 2023; XPRIZE, 2023).
Over 100 teams around the world participated in creating predictors and prescriptors for
the pandemic. The general setup and the data sources were the same, but the approaches
varied widely. The winning teams were successful not only in terms of performance, but
also in communicating the results with decision makers. Most recently, Project Resilience
(Francon, 2025; ITU, 2023), a project led by the International Telecommunication Union
(a) 2/19/2021 (b) 3/1/2021 (c) 3/21/2021
Figure 6.15: The predicted delta surge in India and a prescription to avoid it. (a) On 2/19/2021, the cases were decreasing (top plot) and the prescriptors suggested that many NPIs could be lifted (bottom plot, lighter colors). (b) The cases were similarly low on 3/1/2021, but there had been delta surges elsewhere, and the models predicted a major surge in India if the current NPIs were continued, which was hard to believe at the time. The prescriptors suggested tightening some of them, which could have still avoided a major surge. (c) However, more stringent NPIs were only established several weeks later, and by that time even a full lockdown could not have avoided the major surge. In this manner, the models can be used to detect problems early enough when it is still possible to fix them. For an interactive demo, see https://neuroevolutionbook.com/demos.
(ITU) agency of the United Nations, is an attempt to build on these successes further
and extend to other challenges such as climate change. In this manner, over time, it
is possible that the surrogate optimization approach in general, and neuroevolution in
particular, will gradually become widely used in coping with a variety of problems in
decision-making in society.
An interactive demo of the NPI optimization system is available through the book
website https://neuroevolutionbook.com. It allows going back in time and evaluating
the model's suggestions, comparing them to actual NPIs, and modifying them to
see the effects. The code prepared for the XPRIZE competition is available through the
website as well. Using that starting point, it is possible to develop further models for the
pandemic dataset and others.
6.4.5 Leveraging Human Expertise
Recent applications of supervised learning have demonstrated the power of learning the
statistics of large numbers of labeled examples, and various reinforcement learning and
evolutionary optimization approaches have reached super-human performance in many
game-playing domains without much human involvement. However, there are many
domains where humans have significant expertise. Incorporating such expertise in learning
could provide a better starting point, allowing it to find better solutions in complex tasks,
and also solutions that may be easier and safer to deploy.
Neuroevolution provides a natural way to incorporate such knowledge into creative
problem-solving. Human solutions can be encoded in equivalent neural networks to
form the initial population, which is then evolved further to take advantage of both the
knowledge and machine discovery.
A method called RHEA (realizing human expertise through AI) was developed for
this purpose (Meyerson, Francon, Sargent, et al., 2024). It consists of four phases: (1)
Define the problem in a manner such that diverse expertise can be applied to it. (2) Gather
the solutions from the experts. (3) Distill the solutions into a population of equivalent
neural networks. (4) Evolve the neural network population to discover improved solutions.
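As a rough illustration of phase 3, an expert solution that is available only as a black box can be distilled into a network by supervised mimicry, as in the sketch below; the sampling strategy, the network size, and the use of scikit-learn's MLPRegressor are placeholder choices rather than the method's actual setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def distill_expert(expert_policy, context_samples):
    """Phase 3 sketch: distill one expert solution into an equivalent neural network.

    expert_policy(context) -> action vector; it may be a rule set, a spreadsheet,
    or any other black box supplied by a human expert.
    """
    X = np.array(context_samples)
    y = np.array([expert_policy(c) for c in context_samples])   # query the expert
    net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
    net.fit(X, y)                                    # supervised mimicry of the expert
    return net

# The distilled networks (one per expert) then seed the initial population for phase 4.
```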
Let us illustrate the approach first in a synthetic domain illustrated in figure 6.16. The problem is defined as one where a subset of policy interventions $a_1, a_2, \ldots, a_n$ needs to be selected for different contexts $c_1, c_2, \ldots, c_m$ to optimize utility $\phi$ and cost $\psi$. Assume there are three expert solutions available: two specialists for $c_1$ and $c_2$, and a generalist that can be applied across all contexts. They can be distilled into a common grid representation where black in cell $(c_i, a_j)$ indicates choosing an action $a_j$ for context $c_i$. This population of three solutions can then be evolved to obtain better solutions.
Let the utility be defined as
$$
\phi(c, A) =
\begin{cases}
1, & \text{if } c = c_1 \text{ and } A = \{a_1, a_2\} \\
2, & \text{if } c = c_1 \text{ and } A = \{a_1, a_2, a_3, a_4, a_5\} \\
3, & \text{if } c = c_1 \text{ and } A = \{a_1, a_2, a_3, a_4, a_5, a_6\} \\
4, & \text{if } c = c_2 \text{ and } A = \{a_1, a_2, a_3, a_4, a_5, a_6\} \\
5, & \text{if } c = c_2 \text{ and } A = \{a_1, a_2, a_3, a_4, a_6\} \\
1, & \text{if } c = c_2 \text{ and } A = \{a_3, a_4, a_5\} \\
1, & \text{if } A = \{a_7, a_8, a_9, a_{10}\} \\
0, & \text{otherwise,}
\end{cases}
\tag{6.3}
$$
and the cost $\psi$ be the number of actions in the solution. The Pareto front resulting from RHEA is illustrated on top of figure 6.16. Some of the solutions are found by recombining existing expert solutions, e.g. by adding $a_3, a_4, a_5$ to $a_1, a_2$ in $c_1$. Importantly, evolution can also innovate beyond the experts, e.g. by adding $a_6$ to this solution. It can also refine solutions by removing actions that are redundant or detrimental, such as $a_5$ in $c_2$, and by incorporating knowledge from the generalist solution, i.e. $a_7, \ldots, a_{10}$ for $c_3, \ldots, c_7$.
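Equation (6.3) and the cost can be evaluated directly in code, for instance as below; representing a solution as a mapping from context index to a set of action indices, and counting the cost over all contexts, are one possible reading of the setup.

```python
def utility(c, A):
    """Utility phi from equation (6.3); contexts and actions are referred to by index."""
    A = frozenset(A)
    if c == 1:
        if A == {1, 2}: return 1
        if A == {1, 2, 3, 4, 5}: return 2
        if A == {1, 2, 3, 4, 5, 6}: return 3
    if c == 2:
        if A == {1, 2, 3, 4, 5, 6}: return 4
        if A == {1, 2, 3, 4, 6}: return 5
        if A == {3, 4, 5}: return 1
    if A == {7, 8, 9, 10}: return 1
    return 0

def cost(solution):
    """Cost psi: here counted as the total number of actions over all contexts."""
    return sum(len(actions) for actions in solution.values())

# One of the combined solutions discussed in the text, covering contexts c1 and c2:
solution = {1: {1, 2, 3, 4, 5, 6}, 2: {1, 2, 3, 4, 6}}
total_utility = sum(utility(c, A) for c, A in solution.items())
print(total_utility, cost(solution))   # 8 11
```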
Interestingly, other methods cannot take advantage of such mechanisms. For instance,
mixture-of-experts (MoE; Masoudnia and Ebrahimpour, 2014) can utilize different
experts for different contexts (as shown at the bottom of figure 6.16), but cannot form
recombinations of them, or innovations or refinements. Its Pareto front therefore falls far
short of that of evolution. Similarly, Weighted Ensemble solutions (Dietterich, 2002) can
only choose a single combination of experts that is then applied to all contexts, which
results in an even less effective Pareto front.
Note also that it would be difficult for evolution alone to find a good Pareto front, i.e.
starting from random solutions instead of the experts. There is little information in partial
solutions that allows constructing them gradually, and evolution would thus be looking for
needles in a haystack. Indeed, experimentally RHEA discovers the entire optimal Pareto
front reliably whereas evolution does not, especially when the number of actions increases.
This synthetic example thus illustrates how evolution can take advantage of expert
knowledge, how it can improve solutions beyond such knowledge, and how these abilities
are unique to evolution as compared to standard machine learning approaches. Do these
insights carry over to large real-world domains?
Figure 6.16: RHEA leveraging expert solutions through evolution, compared to mixture-of-experts (MoE) and weighted ensemble. Several solutions may include different good ideas; the challenge is to form a combined solution that takes advantage of all of them. In this synthetic example, the plots in the middle show the Pareto fronts for each method: RHEA in blue, MoE in green (×), and Weighted Ensemble in yellow (+); in addition, the original expert solutions are shown in purple. The structure of each solution is visualized as a grid that identifies which actions (rows) are used in each context (columns). On the left are the two original specialist solutions a and b, and on the right, the original generalist solution c. The solutions on the RHEA Pareto front are on top, and those for MoE at the bottom. Whereas MoE and Weighted Ensemble can utilize the knowledge in the expert solutions only in a limited way, RHEA can recombine, add innovations, and remove redundancies and detrimental elements to construct superior solutions. Whereas such solutions would be difficult to evolve from a random initial population, RHEA thus harnesses the latent potential in expert solutions, and finds the optimal Pareto front reliably. Figures from Meyerson, Francon, Sargent, et al. (2024).
To demonstrate the real-world power of RHEA, it was implemented in the XPRIZE
Pandemic Response domain mentioned in the previous section. In phase 2 of the
competition, a total of 169 different prescriptors were submitted. They were constructed
with different methods such as epidemiological modeling, decision rules, statistical
methods, gradient-based optimization, and evolution; some of them also utilized auxiliary
data sources, and some focused on specific locations. This set of prescriptors was thus
quite diverse, representing diverse human expertise. Several studies in psychology, social
science, and business suggest that diversity in human teams leads to improved decision-
making (Rock and Grant, 2016). The question is: Can we use AI (i.e. neuroevolution) to
take advantage of this diversity of human expertise?
The XPRIZE competition provided a convenient framework for the first two phases.
The distillation was done by training an autoregressive neural network with gradient
descent to mimic the behavior of each solution created by human experts. Training
examples were created by querying the prescriptor with a comprehensive sampling of the
(a) Pareto fronts (b) Human-preferred solutions
Figure 6.17: Combining human expertise and machine discovery in NPI optimization. The recombination and mutation operators in evolution are well-suited for combining, refining, and extending existing ideas. (a) The RHEA Pareto front dominates both the solutions created by human experts (Distilled), as well as solutions evolved from a random initial population. (b) Given the human decision makers' preference for mid-range tradeoffs, RHEA's solutions would be selected nearly always. These results demonstrate that neuroevolution can be used to take advantage of human expertise, resulting in solutions that are better than both those of humans and evolution alone. Figures from Meyerson, Francon, Sargent, et al. (2024).
Oxford data set. Evolution was done through the same ESP approach as described in the
previous section. That is, the latest predictor at the time was used as the surrogate, and
neural networks optimized the case and cost objectives as before.
Remarkably, the results exceeded all expectations (figure 6.17). The RHEA Pareto
front pushed significantly further down and to the left than the Pareto front consisting of
the best solutions created by human experts, as well as the Pareto front resulting from the
evolution from initially random neural networks. In other words, RHEA evolution was
more powerful than either human expertise or evolution from scratch alone. Moreover, the
RHEA solutions dominated especially in the areas of the front that mattered: Given the
human decision-makers' preference for mid-range tradeoffs, they would be likely to select
RHEA's solutions over those of other methods nearly 100% of the time.
It is interesting to evaluate what RHEA actually discovered differently from humans
and machines alone. Figure 6.18(a) characterizes the policies along five dimensions:
the range of their stringency (swing), whether they utilize different phases (separability),
the number of NPIs used (focus), how often the NPIs change (agility), and whether they utilize
weekly changes (periodicity). The policies are characterized for RHEA, evolution-only,
and submitted solutions, as well as the actual policies implemented in the world during
the pandemic.
Several interesting observations can be made from this comparison. First, in terms of
swing and separability, the submitted solutions had more variability than policies in the real
world, suggesting that human experts were exploring opportunities to improve. However,
RHEAs solutions were more similar to the real world, although RHEA also discovered
that extreme separability could sometimes be useful. In this manner, RHEA did discover
(a) Dimensions of NPI strategies (b) Performance vs. contribution
Figure 6.18: Characterizing the discovered NPI policies. The policies can be characterized in five dimensions, revealing similarities and differences between approaches. (a) RHEA's policies were similar to the submitted ones in terms of focus, but differed in four other dimensions. In terms of swing and separability, it found solutions similar to those implemented in the real world, but in terms of agility and periodicity, a potential new opportunity that both human experts and real-world decision-makers missed. In this manner, RHEA can leverage both human expertise and machine creativity. (b) Performance (in terms of hypervolume) of the submitted solutions vs. their contributions to the final Pareto front. While better solutions generally contribute more, there are many solutions that do not perform well but end up contributing a lot (those in the upper left area). This result highlights the value of soliciting diverse expertise even if some of it is not immediately useful: Methods such as RHEA can then be used to realize their latent potential. Figures from Meyerson, Francon, Sargent, et al. (2024).
that the human expert’s innovations were not always productive. Second, in terms of focus,
RHEA's solutions were more similar to the submitted solutions, and quite different from
the real-world solutions. In this manner, it utilized the expert solutions' tendency to focus
on a small number of NPIs. Third, in terms of agility and periodicity, RHEA differed
from both submitted and real-world solutions, utilizing more frequent variations as well as
weekly periodicity. The solutions that were evolved from a random starting point were
similar along these two dimensions, suggesting that they were indeed discovered through
machine creativity. Such solutions tend to be more difficult to implement in the real world,
although in some cases they were (e.g. for a time in Portugal and France). In this sense,
RHEA discovered a potential opportunity that both real-world decision-makers and human
experts' solutions had missed. The conclusion is that RHEA can indeed utilize ideas from
solutions created by human experts as well as develop its own in order to construct the
best possible policies.
It is also interesting to characterize how RHEA discovered the best solutions, by
analyzing their evolutionary history. Some such solutions can be traced back to only a
single beneficial crossover of two submitted ancestors, while others were constructed
in a more complex process involving several ancestors. Usually, the crossovers were
systematic, i.e. resulted in offspring whose case-stringency tradeoff was in-between the
two parents. It is also interesting to measure the contribution of each ancestor to the
solutions in the final Pareto front, i.e. how much of their genetic encoding was found
in those best solutions (figure 6.18b). As expected, submitted ancestors that performed
well generally contributed more, but there are also many ancestors that made outsize
contributions through the evolutionary process. This observation demonstrates why it is
so useful to solicit diversity of expertise, even when some of it is not immediately useful.
Neuroevolution methods such as RHEA can then be used to realize their latent potential.
The NPI optimization example demonstrates the power of RHEA in combining human
expertise and machine creativity through neuroevolution. The approach can be applied
to many other domains as well, where such diverse expertise is available. It can be
further combined with techniques for trustworthiness, such as interactive exploration and
confidence estimation. Neuroevolution can thus play a crucial role in taking advantage of
intelligent decision-making in the real world.
Note that in RHEA, human expertise is treated as a black box. This approach makes
it possible to utilize such expertise in any form, distilled into a common neural network
representation. However, sometimes expertise is available explicitly in the form of rules,
examples, and advice. Such knowledge can be incorporated into neuroevolution by
modifying the evolved networks directly, as will be discussed in section 8.2. It is a different
way of utilizing human expertise in neuroevolution.
Interestingly, distillation can also be useful in the other direction, i.e. by taking a neural
network that performs well as a black box, and then evolving a set of rules to replicate
its performance, e.g. using the EVOTER approach (Shahrzad, Hodjat, and Miikkulainen,
2024). Rule sets are transparent and interpretable, and in this manner, it may be
possible to explain how the network performs. In particular with RHEA, this approach
may make it possible to characterize the initial expert solutions in a uniform manner, and
further identify what new knowledge evolution discovers to improve them. Neuroevolution
can thus work synergistically with rule-set evolution to make both human and AI designs
explainable.
To conclude, neuroevolution is a powerful approach to discovering behavior at all levels,
from low-level control through multi-behavior strategy to high-level decision-making.
The next three chapters build on this foundation by extending to collective systems of
multiple agents, to incorporating humans in the loop, and to approaches for open-ended
discovery of increasingly complex behaviors.
6.5 Chapter Review Questions
1. Levels of Behavior: Describe the different levels of behavior that neuroevolution aims to optimize, from low-level control to high-level decision strategies. Provide an example of a success story for each level.

2. Robust Behavior: What are some challenges in evolving robust behaviors in dynamic or unpredictable environments? Discuss methods like trajectory noise, coevolution, or symmetry evolution that address these challenges.

3. Simulation to Reality Transfer: Explain how neuroevolution can be adapted to bridge the "reality gap" between simulations and the physical world. What roles do noise, stochasticity, and modern robotics simulators play in this process?

4. Behavioral Switching: Why is switching between high-level strategies more challenging than low-level control adjustments in neuroevolution? Provide examples of fractured decision boundaries and interleaved/blended behaviors that illustrate these challenges.

5. Fractured Strategies and Network Design: Explain how specific network design choices, such as using radial basis functions or cascaded refinement, can address the challenge of discovering fractured decision boundaries in domains like half-field soccer.

6. Multimodal Task Division: Discuss the role of preference neurons in discovering and implementing multimodal behaviors. How does this approach enable neuroevolution to discover surprising and effective strategies, such as in the Ms. Pac-Man example?

7. Surrogate Modeling: What is the role of surrogate models in discovering decision strategies with neuroevolution? Discuss how they enable exploration and evaluation in domains where real-world experimentation is infeasible.

8. Evolutionary Surrogate-Assisted Prescription (ESP): Describe the ESP process for decision-making. How does co-learning between predictors and prescriptors contribute to automatic regularization and curricular learning?

9. COVID-19 NPI Optimization: In the context of optimizing non-pharmaceutical interventions during the COVID-19 pandemic, how did the ESP approach combine predictive and prescriptive modeling to discover effective policies? What were the advantages of this data-driven method over traditional epidemiological models?

10. Human Expertise in RHEA: Explain how RHEA incorporates human expertise into neuroevolution. How does it utilize diverse expert solutions to discover superior decision strategies, and what unique advantages does it provide over other methods like Mixture-of-Experts?
Chapter 7
Neuroevolution of Collective Systems
One of the most fascinating aspects of nature is that groups with millions or even trillions
of elements can self-assemble into complex forms based only on local interactions and
display what is called a collective type of intelligence. For example, ants can join to create
bridges or rafts to navigate difficult terrain, termites can build nests several meters high
without an externally imposed plan, and thousands of bees work together as an integrated
whole to make accurate decisions on when to search for food or a new nest. Surprisingly,
achieving these incredible abilities is a result of following relatively simple behavioral
rules. These rules have been discovered through evolution that relies on cooperating
individuals, i.e. through cooperative coevolution.
A fundamental driving force in evolution is competition. Individuals compete for
resources, mates, and status. Groups of individuals battle for resources, but also may
engage in direct conflict, including predators trying to catch prey, who in turn try to avoid
being caught. When the opponents discover new successful behaviors, the species also
have to develop new mechanisms to survive. This process results in continual adaptation,
i.e. competitive coevolution.
Cooperative and competitive coevolution can be used to drive neuroevolution as
well. Mechanisms range from cooperating neurons and networks, and cellular automata
defined by evolved neural networks, to establishing an arms race of increasingly competing
networks. In many cases, complex behavior results that would be difficult to discover in
other ways.
7.1 Cooperative Coevolution
A fundamental insight in generating intelligent systems is that they do not exist in a vacuum:
Intelligence often emerges from interactions with the environment. These interactions may
originate from constraints of a physical body, with its limited sensory and motor abilities.
They may originate from constraints posed by the physical surroundings: for instance,
Herb Simon's point that even though an ant's path may appear complex to the outsider,
the ant may be largely responding to the obstacles and contours in its path (H. A. Simon,
1969). Most importantly, significant interactions originate from other agents. They may be
adversarial, posing a threat or obstacle, or they may be cooperative, requiring collaboration
to achieve a common goal.
Neuroevolution is well-suited for building such interactive intelligent systems. The
techniques focus on constructing intelligent systems from a large number of components
that work together. A fundamental principle is cooperative coevolution, i.e. evolving these
components together to achieve effective behavior (Wiegand, 2003). Such cooperation
can take place at many levels: within a single neural network; among multiple neural networks
in a multiagent system; and between multiple cooperative multiagent systems in a competitive
environment. The techniques are based on the same fundamental principle of shared fitness,
but each addresses the challenge of intelligent behavior at a different level.
7.1.1 Evolving a Single Neural Network
At the most basic level the goal is to construct a single intelligent agent in an environment
that returns a dedicated fitness for it. In other words, a neural network is formed by
evolving a population of partial solutions, such as neurons, connections, or modules.
In the spirit of classifier systems (Holland and Reitman, 1978), the first approaches
of this kind focused on the evolution of cooperative neurons (Husbands and Mill, 1991;
Moriarty and Miikkulainen, 1997; Potter and De Jong, 2000). For example, in the SANE
system (symbiotic adaptive neuroevolution) there was a single population of neurons, each
with its own input connections. The networks were formed according to blueprints, i.e. a
separate population of individuals that specified which neurons from the population were
included to form the network. The networks specified by each blueprint were evaluated
in the task, and the neurons in the blueprint inherited the blueprint’s fitness. Both the
blueprint and the neuron population were evolved based on this fitness, thus encouraging
the discovery of partial solutions (i.e. neurons) that collaborate well with other neurons.
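The core bookkeeping of this scheme is sketched below; averaging a neuron's inherited fitness over the networks it appears in, and assigning the worst observed score to unused neurons, are simplifying assumptions rather than SANE's exact rules.

```python
import numpy as np

def evaluate_sane(neurons, blueprints, build_network, task_fitness):
    """One SANE-style evaluation cycle: blueprints pick neurons, fitness flows back.

    neurons: list of candidate hidden neurons (e.g. weight vectors).
    blueprints: list of index tuples into the neuron population.
    build_network / task_fitness stand in for network assembly and task evaluation.
    """
    neuron_fitness = [[] for _ in neurons]     # a neuron may appear in many networks
    blueprint_fitness = []
    for bp in blueprints:
        net = build_network([neurons[i] for i in bp])
        f = task_fitness(net)
        blueprint_fitness.append(f)
        for i in bp:
            neuron_fitness[i].append(f)        # neuron inherits the blueprint's fitness
    # Average over the networks each neuron participated in; neurons that were never
    # picked get the worst observed score so that selection replaces them.
    worst = min(blueprint_fitness)
    neuron_scores = [float(np.mean(fs)) if fs else worst for fs in neuron_fitness]
    return neuron_scores, blueprint_fitness
```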
This principle was further enhanced in the ESP system (enforced subpopulations,
section 5.6) where, instead of a diverse set of blueprints, there was only one network
structure: a fully connected network of $n$ neurons (figure 7.1; Gomez and Miikkulainen,
1997). However, each neuron in the network was evolved in a separate subpopulation; thus,
each subevolution searched for a neuron that optimized performance for one location in the
network. The networks were then formed by selecting one neuron from each subpopulation
randomly to fill the corresponding location in the network. All the neurons started
with random weights, and all the subpopulations were thus initially identical. However,
over evolution, they gradually diverged and specialized: they discovered differentiated,
computational roles for the neurons.
For instance, in the task of evolving a network that can run through a maze as a
simulated Khepera robot, several such roles could be identified. One subpopulation
evolved neurons that would slow the robot down if there was an obstacle in front; another
veered the robot to the right if there was an obstacle on the left; another veered left
with an obstacle on the right. Although such discovery and specialization were evident,
most importantly, each subpopulation usually performed at least two such subfunctions to
some extent. The reason is that such redundancy makes the construction of competent
individuals more robust; the neurons do not have to be perfect in what they do because
other neurons in the network compensate for their flaws. Such construction also results in
a more robust search: even if a suboptimal neuron is sometimes chosen from one of the
Figure 7.1: Evolution of subpopulations of neurons. In the cooperative coevolution of a single
network, each subpopulation evolves one neuron for the network, which may be e.g. fully recurrent.
The genetic encoding of each neuron specifies the neuron’s connection weights to other neurons.
Each neuron receives the fitness of the entire network evaluated in the task. Thus, neurons evolve
to cooperate well with other neurons: the subpopulations optimize compatible subtasks and each
subtask is encoded robustly in a couple of subpopulations. Such a search for partial solutions is
also efficient: the subtasks remain diverse, the approach avoids competing conventions, and the
search space is compact. From Gomez (2003).
subpopulations, the others cover for it. Selection thus favors redundancy, and with it more
robust networks. This is a powerful fundamental principle of cooperative coevolution in
general.
So far, the partial solutions (i.e. neurons) inherit the fitness of the full solution (i.e.
a network) as is. However, such neuroevolution can be further enhanced by calculating
the fitness of individual neurons separately as well, and using it in combination with the
inherited network fitness. This is possible through difference evaluation, i.e. evaluating
the network in the task with and without the neuron, thus measuring how much better
off (or worse off) the network is with each neuron. In control tasks such as double pole
balancing and rover exploration, this approach can find significantly better solutions and
find them significantly faster (Agogino, Tumer, and Miikkulainen, 2005).
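A minimal sketch of difference evaluation is shown below; modeling the removal of a neuron by simply omitting it from the network is one possible reading of the method.

```python
def difference_evaluation(neurons, build_network, task_fitness):
    """Credit each neuron by its marginal contribution to the full network.

    The network is evaluated with and without each neuron; the difference is that
    neuron's individual fitness, which can then be combined with the shared network
    fitness. build_network and task_fitness stand in for the actual task setup.
    """
    full_fitness = task_fitness(build_network(neurons))
    credits = []
    for i in range(len(neurons)):
        ablated = neurons[:i] + neurons[i + 1:]       # omit neuron i entirely
        credits.append(full_fitness - task_fitness(build_network(ablated)))
    return full_fitness, credits
```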
Based on these pioneering systems, it is already possible to see why the cooperative
coevolution approach can be powerful. There are three main reasons: First, it has a
built-in mechanism for maintaining diversity and avoiding premature convergence. A
good network requires many different kinds of neurons. If, for example, the neuron population
in SANE starts to converge, the similar neurons perform poorly in a network, and are
discarded in favor of those that are different. Second, it avoids the competing conventions
problem. The neurons are assigned to distinct locations in the network, and optimized for
performance for those specific locations. Third, it reduces the search space. Instead of
having to optimize all the connection weights in the network at once, it is sufficient to
optimize the weights of single neurons, which can be done easily in parallel multiple
times. There are other ways to solve these problems in neuroevolution, including indirect
encodings (chapter 4), but the cooperative coevolution method is designed to tackle them
explicitly.
This approach of cooperative coevolution of compatible roles can be extended to other
levels of granularity as well. A particularly powerful way of constructing recurrent neural
networks is CoSyNE (Gomez, Schmidhuber, and Miikkulainen, 2008), where individual
connections are evolved in separate subpopulations. However, although the general idea is
a logical and compelling extension of ESP, it turned out that with such a large number of
subtasks, it is difficult for evolution to converge to a compatible set. The solution is to
focus the search in two ways. First, individual connections are not chosen randomly from
each subpopulation to form a network, but instead the connections with the same index (i.e.
location) in the subpopulation are combined into the network. Thus, the indices serve as
simple blueprints, allowing search to focus on refining these networks. Second, in addition
to the usual mutation and crossover in each subpopulation, a small subset of individuals is
permuted within each subpopulation, thus exploring a different role for each of them. In
this manner, the search can more effectively find good combinations of individual weights,
which is especially important in highly recurrent neural networks. At the time, CoSyNE
was able to discover solutions to the most challenging control tasks, such as balancing two
poles simultaneously on a moving cart without precomputed velocity information, where
other neuroevolution and reinforcement learning methods could not.
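The following sketch captures the two CoSyNE ideas described above, combination by index and within-subpopulation permutation; truncation selection with Gaussian noise stands in for the actual selection and variation operators.

```python
import numpy as np

def cosyne_generation(subpops, evaluate_weights, rng, permute_frac=0.2):
    """One simplified CoSyNE step on a matrix of shape (pop_size, n_weights).

    Each column is the subpopulation for one connection weight; row i across all
    columns forms candidate network i (the index acts as a simple blueprint).
    evaluate_weights maps a flat weight vector to a fitness; an even population
    size is assumed for simplicity.
    """
    pop_size, n_weights = subpops.shape
    fitness = np.array([evaluate_weights(subpops[i]) for i in range(pop_size)])
    order = np.argsort(-fitness)                      # best networks first
    parents = subpops[order[: pop_size // 2]]
    children = parents + rng.normal(0.0, 0.1, parents.shape)
    new_pop = np.vstack([parents, children])[:pop_size]
    # Permute a small subset of entries within each column, so that individual
    # weights get tried out in different networks (different row indices).
    k = max(1, int(permute_frac * pop_size))
    for j in range(n_weights):
        rows = rng.choice(pop_size, size=k, replace=False)
        new_pop[rows, j] = new_pop[rng.permutation(rows), j]
    return new_pop, fitness
```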
Interestingly, the cooperative coevolution approach has recently proven valuable at the
higher level of granularity as well, i.e. neural architecture search for deep learning. As
will be described in more detail in chapter 10, the goal in neural architecture search is to
find a design for a deep learning system that performs as well as possible when trained
with gradient descent. This process requires finding optimal hyperparameter settings,
network topologies, and layer types. It turns out that these elements can be coevolved in
separate subpopulations to form entire architectures, similarly to how neurons are evolved
to form networks. For instance, in the CoDeepNEAT method (Miikkulainen, J. Liang,
Meyerson, et al.,
2023), network modules consisting of a few layers and connections
between them are coevolved in separate subpopulations, and a blueprint population is
evolved to indicate how these modules are combined to form complete networks. Each
of these subpopulations is evolved with NEAT to form complex recurrent structures. In
essence, CoDeepNEAT is thus a combination of SANE, ESP, and NEAT, applied at the
level of large deep learning architectures.
Compared to other neural architecture search methods, CoDeepNEAT is particularly
powerful in exploring new architectures because its search space is relatively unconstrained.
It is also possible to seed it with human designs and find novel combinations of them
that the humans may have missed. For instance in the domain of image captioning,
CoDeepNEAT was initialized with the types of layers and connections that existed in
the state-of-the-art architecture at the time, the Show&Tell network (Vinyals, Toshev,
S. Bengio, et al., 2015). It was able to find a network that improved performance by 5%.
Interestingly, it did so by employing a principle that is not common in human designs:
The best networks included multiple parallel pathways of processing that were brought
together in the end. This principle will still need to be evaluated more generally, but it
illustrates the kind of discoveries that are possible using the cooperative evolutionary
Figure 7.2: Heterogeneous neural architecture training through DIP. The agent model is composed of three main modules. First, a visual component generates a latent code $z_t$ at each time step $t$. This code is concatenated with the hidden state $h_t$ from an LSTM-based memory module, which receives $z_t$ and the previous action $a_{t-1}$ as input. The resulting vector $(z_t, h_t)$ is then passed to the controller module, which selects the agent's next action. By temporarily protecting recent innovations in upstream components, the deep innovation protection approach (DIP) allows training the whole architecture end-to-end using a multi-objective genetic algorithm. From Risi and Stanley (2021). Videos of trained agents at https://neuroevolutionbook.com/demos.
approach.
7.1.2 Evolving Structured Heterogeneous Networks
The cooperative coevolution approaches introduced in the previous section demonstrated
how breaking a neural network into partial solutions, such as neurons or synapses, can
lead to more tractable and robust search. These methods are built on the premise that
dividing the problem into independently evolving components allows evolution to find
better global solutions through local coordination. SANE, ESP, and CoSyNE elegantly
address challenges such as maintaining diversity, reducing search complexity, and avoiding
competing conventions.
However, modern neural network systems are often much larger and consist of
several heterogeneous components in a functional structure. For instance, world model
architectures (discussed in section 13.5) include visual encoders that compress high-
dimensional observations, memory modules that capture temporal context, and controllers
that determine actions. Such systems can still be optimized by cooperative coevolution.
However, the process is different from coevolving partial solutions: the overall structure is
determined by the task, and successful evolution depends on the components' ability to co-adapt over
time.
A key challenge that emerges in this context is the credit assignment problem (CAP):
when the overall performance of the network changes, it is difficult to determine which
module was responsible and how the others should respond. For example, improvements
in one module, such as a better visual representation, can initially lead to worse overall
performance if downstream components like the controller have not yet adapted to the
new representation. This phenomenon can cause evolution to discard useful innovations
prematurely, simply because their benefits are not immediately realized.
The deep innovation protection (DIP) approach (Risi and Stanley, 2021) addresses this
issue and introduces a novel mechanism for coordinating the evolution of heterogeneous,
interdependent neural components. Instead of evolving distinct subpopulations, DIP
181
CHAPTER 7. NEUROEVOLUTION OF COLLECTIVE SYSTEMS
evolves these heterogeneous neural networks end-to-end using a single population, while
leveraging a multiobjective optimization strategy (section 2.2.5) to temporarily protect
recent innovations in upstream components. This method reframes the credit assignment
problem in neuroevolution as one of managing temporal coordination among co-evolving
parts, ensuring that innovations are not prematurely discarded before their full benefits
can be realized. Such protection represents a powerful general principle for fostering the
emergence of complexity, akin to the role of speciation in NEAT (see section 3.3), which
preserves innovation by allowing novel structures time to mature before being subjected to
full competitive pressure. However, unlike typical speciation methods used in approaches
like NEAT, DIP explicitly protects a type of innovation that general genomic similarity
might not capture as well: the interdependence between components in a heterogeneous
neural architecture.
The particular agent architecture that was used to test DIP was composed of a
convolutional visual encoder that processes high-dimensional input, an LSTM-based
memory module that encodes temporal context, and a controller that determines the
agent's actions (figure 7.2). Using NSGA-II, individuals in DIP were evaluated not only
on their performance (i.e. task reward) but also on an auxiliary "age" objective. Originally
pioneered for co-optimizing robot controllers and morphologies (Cheney, Bongard,
SunSpiral, et al., 2018), this age objective does not measure how long an individual has
been in the population, as in traditional diversity-preserving methods, but rather how
long a given component, here the visual or memory module, has remained unchanged.
During mutation, a single component was selected at random and its parameters were
perturbed by adding Gaussian noise to its parameter vector.
When a mutation altered one of these upstream components, the individual’s age was
reset to zero, signaling that the rest of the network (especially the controller) had not
yet had time to adapt. As a result, individuals with newer innovations but equivalent
performance are preferentially selected, providing evolutionary time for the rest of the
system to co-adapt. The DIP approach was evaluated on the two tasks we have already
encountered in the context of AttentionAgents (section 4.4.3): the 2D continuous control
benchmark CarRacing-v0, and the 3D first-person survival challenge DoomTakeCover.
These tasks were chosen to test DIP’s ability to evolve complex neural architectures in
environments with different levels of perceptual and strategic complexity.
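The age-based protection can be sketched as follows; the genome layout, the noise scale, and treating the controller as the only unprotected component are assumptions made for illustration.

```python
import copy
import numpy as np

COMPONENTS = ("vision", "memory", "controller")

def make_individual(rng, sizes=None):
    """Create a random individual; the component sizes are arbitrary placeholders."""
    sizes = sizes or {"vision": 1000, "memory": 500, "controller": 100}
    return {"params": {k: rng.normal(0.0, 0.1, n) for k, n in sizes.items()}, "age": 0}

def mutate_dip(individual, rng, sigma=0.01):
    """Perturb one randomly chosen component; reset age if an upstream module changed."""
    child = copy.deepcopy(individual)
    child["age"] = individual["age"] + 1          # ages by one generation by default
    target = COMPONENTS[rng.integers(len(COMPONENTS))]
    child["params"][target] += rng.normal(0.0, sigma, child["params"][target].shape)
    if target in ("vision", "memory"):
        child["age"] = 0                          # protect the fresh upstream innovation
    return child

def dip_objectives(individual, task_reward):
    """Two NSGA-II objectives, both to be maximized: task reward and negative age,
    so that individuals with recent upstream innovations are temporarily protected."""
    return task_reward(individual), -individual["age"]
```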
CarRacing-v0 tests the agent's ability to generalize across unseen tracks and requires
fine-grained control of steering, acceleration, and braking. Both DIP and the baseline
version (a standard GA without innovation protection; Risi and Stanley, 2019) performed
well on this task. The evolved agents consistently achieved average rewards above 900,
which is considered a successful solution. DIP reached a reward of 905 ± 80, while the
standard genetic algorithm without innovation protection reached 903 ± 72. These results
indicate that in relatively simple and smooth environments like CarRacing-v0, where the
interdependence between modules is less disruptive, both approaches can converge to
good solutions without significant differences.
In contrast, the DoomTakeCover task presents a far greater challenge. As a reminder,
here the agent views the world from a first-person 3D perspective and must survive by
dodging fireballs launched by monsters. In this more complex scenario, the differences
182
CHAPTER 7. NEUROEVOLUTION OF COLLECTIVE SYSTEMS
between DIP and non-DIP approaches were striking. The DIP-based agents successfully
learned to survive, achieving an average score of 824.33 (± 491.59), which exceeded
the performance threshold for solving the task (750 timesteps alive, averaged over 100
episodes). In contrast, agents evolved without innovation protection consistently failed to
reach this level. The standard genetic algorithm was unable to maintain useful innovations
long enough for the rest of the system to adapt, leading to stagnation and suboptimal
performance.
This contrast highlights the power of DIP: In environments where changes in perception
or memory require downstream adaptation, DIP allows the evolutionary process to preserve
and refine promising solutions. It manages the temporal dynamics of learning within
the architecture itself, which proves essential for mastering tasks like VizDoom, where
emergent behavior and forward prediction are necessary for survival. To gain a better idea
of how exactly DIP solves the VizDoom task, we can look at an evolutionary trajectory:
the intermediate stepping stones that led to the eventual solution. In one representative
evolutionary run, the agent began to recognize fireballs as salient features in early
generations (0-30), but responded in a limited way, either by standing still or consistently
moving to the right. A notable performance improvement occurred around generation
34, when the agent began to explore both left and right evasive maneuvers. However, at
this stage, the internal representations guiding these actions remained ambiguous. This
ambiguity was resolved by around generation 56, which corresponded to another jump
in performance. In the generations that followed, the agent rapidly fine-tuned its policy,
ultimately developing the ability to reliably distinguish between different threat scenarios
and surviving for the full duration of an episode.
In conclusion, by dynamically adjusting selection pressure based on the recency
of innovations in upstream components, DIP effectively orchestrates the training of
heterogeneous systems. It ensures that promising innovations are not lost before their
benefits are realized, and that downstream components are given time to learn to take
advantage of new internal representations. The result is a more robust evolutionary process
capable of solving complex tasks that are difficult to solve without protecting evolutionary
innovation.
7.1.3 Evolving a Team
At a higher level of coordination than a single neural network, neuroevolution can be used
to construct teams, i.e. groups of individual agents that solve problems cooperatively. An
interesting question is: how should the search for team members be organized? A single
neural network could be evolved to control the entire team; each team member could
be evolved separately; or the team could be formed by cloning a single evolved network
(figure 7.3).
The most straightforward extension from the single agent construction introduced in
section 7.1.1 is to evolve each agent in a separate subpopulation, and reward each agent
based on the success of the entire team. A predator-prey, or pursuit-evasion, scenario is
a good way to illustrate the approach. In the simplest such scenario, a team of three
predators was evolved to capture a single non-evolving (algorithmic) prey that always
moves away from the nearest predator (Yong and Miikkulainen, 2010). However, the prey
(𝑎) Centrally controlled (𝑏) Heterogeneous (𝑐) Homogeneous
Figure 7.3: Evolving centrally controlled, heterogeneous, and homogeneous teams. (𝑎) A population of controller networks is evolved in a single population; each network controls all three agents in the team. (𝑏) The three networks are evolved in three separate populations, and the team is formed by randomly selecting one network from each population. (𝑐) The networks are evolved in a single population, and the team is formed by cloning a selected network three times. In each case, the fitness of the team is used as the fitness for each network that participated in it. While in principle the central controller is able to coordinate the team well, heterogeneous networks may evolve distinctly different compatible roles that solve the task better. However, each network in a homogeneous team is a generalist that can take on different roles at different times, resulting in a more flexible team.
is as fast as the predators. Thus, in an unbounded (e.g. toroidal) field it could never be
caught, unless the predators evolve a cooperation strategy.
Such a strategy was indeed evolved reliably using a multiagent version of the ESP
approach outlined above (figure 7.4). Each predator agent was controlled by an ESP neural
network, i.e. a recurrent network evolved from its own subpopulations of neurons. At a
hierarchically higher level, the three agents were evolved in parallel and evaluated based
on how often the entire team was able to capture the prey. Indeed, two behavioral roles
emerged: two of the agents behaved as chasers, forcing the prey to run straight away from
them in a path that extended around the toroidal space. The remaining agent behaved as
a blocker, staying in place waiting for the chasers to push the prey to it; the prey had
nowhere to go and was captured.
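A rough sketch of this evaluation scheme, with illustrative names rather than the original implementation, is the following: one subpopulation per team slot, teams assembled at random, and the team score credited to every participating network.

import random

# Minimal sketch (illustrative names) of evaluating a heterogeneous team:
# one subpopulation per team slot, teams assembled at random, and the
# team's score shared by every network that participated in it.

def evaluate_teams(subpopulations, run_episode, n_trials=200):
    """subpopulations: list of lists of networks, one list per team slot.
    run_episode(team) -> team score, e.g. fraction of prey captured."""
    scores = [[[] for _ in pop] for pop in subpopulations]
    for _ in range(n_trials):
        # Draw one network (by index) from each subpopulation to form a team.
        picks = [random.randrange(len(pop)) for pop in subpopulations]
        team = [pop[i] for pop, i in zip(subpopulations, picks)]
        result = run_episode(team)
        for slot, i in enumerate(picks):
            scores[slot][i].append(result)     # credit the whole team equally
    # Each network's fitness is its average score over the teams it served in.
    return [[sum(s) / len(s) if s else 0.0 for s in pop_scores]
            for pop_scores in scores]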
Upon further analysis, two remarkable observations were made. First, such a
cooperative approach was more effective than evolving a single network to control all
three agents. Second, it was more effective to evolve the team without any direct communication
between the agents, even something as simple as sensing each other's location. Each agent
would only sense the prey’s location, and based on the role they had evolved into, knew
what the other agents were likely doing, and what they needed to do themselves. In
other words, their coordination was based on stigmergy, i.e. communication through the
environment.
Figure 7.4: Role-based cooperation through stigmergy. Similarly to a single-network evolution,
team members can be evolved in separate subpopulations and rewarded based on team success. In
a toroidal world, three predator agents tried to capture a prey (X) that always runs away from the
nearest predator and is as fast as the predators. Two of the predators (2, 3) evolved chaser roles,
and the third (1) a blocker role: The chasers push the prey to the waiting blocker around the torus.
Remarkably, evolution of agents in separate subpopulations was more effective than evolution of a
central controller for the entire team. It was also more efficient to not bother with communication
with other team members (even through visual sensing); each team member knew their role, and it
was most effective for them to simply observe the prey, i.e. to communicate through stigmergy. For
an animation of this behavior, see https://neuroevolutionbook.com/demos. Figure from Yong and Miikkulainen (2010).
Both of these are powerful principles that can be harnessed more generally in building
complex systems. They suggest that in similar domains, discovering compatible behaviors
can be easier than discovering a comprehensive strategy for the entire team. Each behavior,
or role, can be flexible and robust on its own, compensating for inaccuracies in the other
agents' behavior; such robustness is difficult to discover in a central control system. Also,
when cooperation is based on such roles, it may be enough to observe simply the current
state of the problem: The subsequent behavior of each role can be assumed without direct
observation or communication, making problem-solving more effective. The situation is
similar to playing soccer with a team that has practiced together and knows each other well:
You know what the others are doing even without looking, and you know what you need to
do by observing the opponents. A possible generalization of this idea is the evolution of
ensembles: Each ensemble member discovers a role that solves only part of the problem,
but when combined with the other roles in the ensemble, constitutes a full solution.
While role-based cooperation is often effective, sometimes the behavior has to be
more flexible. In the soccer analogy, you may be playing a pick-up game: You do not
know the other players on your team, and have to constantly observe them to decide what
you should do. More generally, the number of agents required in different roles may vary
over time, and the agents may need to be able to switch roles. For instance, in robotic
soccer the appropriate behaviors differ depending on which team has the ball and where
it is on the field. A team of agents sent to rescue people in a disaster may need to clear
rubble, stabilize structures, search for targets, and transport them out, and each agent should
be able to take on any of these roles as needed.
An entirely different kind of evolutionary approach may be needed to construct such
teams. Instead of evolving specialists, it is necessary to evolve generalists. This goal can
be achieved e.g. by evolving a homogeneous team, i.e. each member of the population is
evaluated based on how well it performs as part of a team that consists of clones of itself
(Bryant and Miikkulainen, 2018). For the team to be successful, it needs its members to
perform different roles at different times. Thus, evolution favors individuals that can adapt
their behavior to the situation, assuming appropriate behaviors that are compatible with
those of the other team members.
Such behavior can be demonstrated naturally in a civilization-type game environment.
The agents are settlers who have to perform various tasks at various times, including division
of labor into construction, mining, agriculture, defense, etc. One such demonstration
focused on legions defending multiple cities against barbarians. The barbarians were
controlled algorithmically, attacking cities with little defense, retreating when outnumbered,
and spawning at a regular rate in the countryside to replace those eliminated by the legions.
The legions were rewarded for minimizing damage to the cities, i.e. the time the cities were
occupied by the barbarians.
Unlike in the role-based cooperation approach outlined above, in the adaptive teams
approach it is useful for the agents to observe each other continuously (i.e. to communicate),
in addition to the barbarians and the state of the cities. It is through such global awareness
that the agents evolve to decide what role they should take on. It requires developing an
internal model of the other agents and their behavior, a rudimentary theory of mind, if
you will. Some of the legions take on the task of defending the cities under attack, while
others prepare to defend cities that are likely to be attacked soon, and yet others proactively
hunt down the barbarians in the countryside. While perfect fitness is not possible due to
randomness and occasional algorithmic changes to the barbarians' strategy, the adaptive
approach does help the legions obtain better fitness. In a sense, the adaptation helps them
deal with the uncertainty and instability in the domain. Such robustness can serve as an
important ingredient in building intelligent agents that can cope with the messiness of the
real world.
Interestingly, for such coordination and communication to evolve, selection must
operate at the team level rather than at the individual level (Floreano, Mitri, Magnenat,
et al., 2007). How such high-level selection can be established is an interesting question
that has implications for biology as well, e.g. in understanding evolutionary breakthroughs
(section 14.7) and major transitions (section 9.1.5).
7.2 Competitive Coevolution
While cooperation of multiple elements or agents is a powerful approach in building
complex behavior, so is competition. That is, the agents evolve to outdo each other,
and the population thus collectively discovers increasingly more powerful behaviors
in an evolutionary arms race. Competitive coevolution is useful because it defines an
open-ended fitness function automatically. The main challenge is that it is sometimes
difficult to guarantee that progress is made continuously in an absolute sense. The process
can be set up to discover a single effective behavior, or it can be set up to evolve multiple
competing behaviors. These approaches are described in the subsections below.
7.2.1 Evolving Single Neural Networks
One challenge in constructing complex behavior through neuroevolution is that it is
difficult to design a suitable objective function. One approach is to make it very general
and high-level, such as survival, number of games won, or number of offspring generated.
This approach poses few constraints on how such fitness is achieved, and evolution can
find creative solutions, but the signal may be too weak to make much progress. Another
approach is to specify a number of detailed components that are believed to be part of
successful behavior, such as high speed, sharp turns, or accurate shooting, each providing
part of the fitness. It is possible to make incremental progress in this manner, but it is
difficult to make sure that robust solutions emerge, let alone creative solutions.
Competitive coevolution solves these problems by defining fitness in terms of the
behaviors in the current population. Individuals compete with other individuals, and their
fitness is determined based on how well they do in this competition. As the population
improves, it becomes more difficult to achieve high fitness, thereby establishing an
open-ended, automatic mechanism of shaping the fitness function.
Competitive coevolution is thus similar to curriculum, or incremental, learning in
general machine learning. Generative adversarial networks (GANs; Goodfellow, Pouget-
Abadie, Mirza, et al., 2014) are based on a similar mechanism, as are game-playing systems
based on self-play such as AlphaZero (Silver, Hubert, Schrittwieser, et al., 2018). One
of the earliest such systems was based on neuroevolution: Blondie24 used a version of
evolutionary programming to evolve neural network evaluation functions for checkers (and
later chess). Starting without any built-in expert knowledge, it evolved into an expert-level
player (Chellapilla and D. B. Fogel, 1999; D. B. Fogel, 2001; D. B. Fogel, Hays, Hahn,
et al.,
2004). There is a large literature on competitive coevolution since the 1950s,
including analyses based on game theory (Adami, Schossau, and Hintze, 2016; de Jong
and Pollack, 2004; Ficici and Pollack, 2001; Samuel, 1959). There are many examples in
this book as well, including those in chapter 9.
The main challenge in competitive coevolution is to make sure that it actually makes
progress toward better solutions. Since fitness is defined in relation to other solutions,
improvement is not guaranteed in any absolute sense. It is possible to achieve higher
fitness simply by exploiting weaknesses in the current candidates. Therefore, it is often
useful to maintain a collection (i.e. archive) of previous candidates and evaluate fitness
against them as well as the current population. In this manner, good candidates are indeed
better than anything discovered by evolution so far.
However, progress against an archive of candidates does not necessarily mean progress
in a global sense, i.e. in the entire search space. In order to make global progress, a set of
previously unseen candidates needs to be included in the fitness evaluations. They can be
obtained from other, independent runs of evolution. Or, the archive can be periodically
divided into training and validation sets, with the validation set used to filter out variations
that lead to only local progress (Miconi, 2009; Nolfi and Pagliuca, 2025; Simione and
Nolfi, 2020).
A mechanism such as NEAT provides yet another solution. As reviewed in section 3.3,
NEAT starts with a minimal network and gradually complexifies it over evolution. Through
mutation and crossover, it adds more nodes and connections to the existing networks. The
earlier structures are still there; evolution elaborates on them instead of replacing them.
Therefore, the earlier behaviors are likely to be there as well, and the newer behaviors are
likely to be more elaborate and effective. As a result, the newer solutions tend to
perform better in comparison to the earlier ones, thereby guiding evolution towards
absolute progress.
This process was demonstrated in an experiment where neural network controllers
were evolved for a combined foraging, pursuit, and evasion task (Stanley and Miikkulainen,
2004). Two simulated Khepera-like robots were placed in a closed environment with
scattered food items. They were able to sense the distance to the opponent and the food
items around them, the distance to the nearest wall, and the difference between their
opponents and their own energy. The robots moved around by powering their two wheels;
they gained strength by consuming the food items and lost strength by moving. They
would win the game by crashing into their opponent when they had a higher strength
than the opponent. Thus, performing well required not only sensing and moving but also
estimating how much energy they and their opponent would gain and lose by consuming
and moving. Fitness was defined as the average win rate over the four highest species
champions.
Because NEAT starts small and complexifies (as was discussed in section 3.3), it was
possible to understand the complexification that took place in the networks and behaviors
throughout the coevolutionary process. Evolution first discovered a simple foraging
behavior that was often successful by chance: The agent occasionally crashed into the
opponent when it had more energy than the opponent (figure 7.5𝑎). It then evolved a
hidden node that allowed it to make an informed switch between behaviors: Attack when
it had high energy, and rest when it did not (figure 7.5𝑏). Another added node made it
possible to predict the agent's own and its opponent's energy usage from afar and attack
only when a win was likely (figure 7.5𝑐). The most complex strategy, with several more
nodes and complex recurrent connections between them, allowed the agent to predict
also the opponent's behavior, encourage it to make mistakes, and take advantage of the
mistakes to win (figure 7.5𝑑).
Note that such an analysis and explainability is possible precisely because the networks
are evolved in a principled manner through elaboration. Even though large deep-learning
networks could perhaps be trained in this task, they would remain opaque and not provide
much insight into how the network establishes its behavior. Consequently, they could not
be trusted in the same way as NEAT networks can.
Interestingly, the elaboration process turned out to be crucial in discovering such
complex behavior. In a further experiment, a population was initialized with the final
architecture from figure 7.5𝑑, i.e. all individuals had the same architecture with randomized
weights. This architecture supports the complex behavior, and therefore it should be easy
weights. This architecture supports the complex behavior, and therefore it should be easy
for evolution to discover the right weights. Surprisingly, it was not; each complexification
step builds on a prior, simpler architecture that already performs some desired behaviors.
It is therefore relatively easy to add a complexification to improve upon that behavior. In
multiple such small steps, a complex behavior eventually develops. In contrast, discovering
everything at once is very difficult, and such evolution does not get past the first few simple
behaviors.
(𝑎) Forage (𝑏) Forage/attack (𝑐) Predict energy (𝑑) Cause a mistake
Figure 7.5: Discovering complex behavior through competitive coevolution. Two simulated Khepera robots need to consume food, pursue the opponent when they have higher energy than the opponent, and evade it when their energy is lower. When the robots collide, the one with higher energy wins. In the top row, the dark ovals are food items, and the red and yellow circles are the two robots. The red line indicates the direction the robot is facing, the outer ring the opponent sensor values, and the inner ring the food sensor values. The rings are yellow for the robot with higher energy. In the bottom row, the network nodes are depicted as red squares and numbered in the order they were created. Positive connections are black and negative are blue, recurrent connections are indicated by triangles, and the width of the connection is proportional to its strength. The approach discovered (𝑎) a foraging strategy that resulted in high energy and was often successful when accidentally crashing into the opponent, (𝑏) a hidden node that allowed it to switch between following and resting based on energy, (𝑐) a way to model and compare the opponent's and its own energy, and (𝑑) eventually how to fake a move towards a far-away food item (top), causing the opponent to (i) dash to it and then (ii) spend most of its energy to get to the last item (left) but (iii) failing to get to it first, thereby (iv) providing an easy win. Complexifying evolution thus provides a way of understanding network performance; in this experiment, it provides a clear example of how a single competitive coevolution population can discover increasingly complex behaviors. For animations of these behaviors, see https://neuroevolutionbook.com/demos. Bottom figures from Stanley (2003).
Thus, the foraging, pursuit, and evasion experiment demonstrates how coevolution
can be harnessed to discover complex behavior. It is achieved collectively in a simple
population where every individual tries to solve the same problem, and they simply
compete against each other. The coevolutionary setup can be made more complex by
incorporating multiple populations that try to outdo each other explicitly. In a sense, one
population discovers solutions and the other discovers more challenging problems. One
example is given in the next section; another (POET) later in chapter 9.
7.2.2 Evolving Multiple Teams
At the next higher level of complexity, multiple cooperative teams coevolve in a competitive
environment. Each team challenges the other teams to perform better, thus establishing an
evolutionary arms race: Over time, each team outsmarts the other multiple times, leading
to increasingly complex behavior for all teams.
Competitive coevolutionary dynamics have been studied extensively from a theoretical
perspective, for example through game theory, and are now relatively well understood
(M. Mitchell,
2006; Popovici, Bucci, Wiegand, et al., 2012). Absolute improvement is
sometimes difficult to establish, and the process can go wrong in multiple ways: For
instance, instead of getting better, the teams may simply become weirder. Later teams
may even lose to the earlier ones. However, in many natural tasks, the more complex
behavior often subsumes the earlier behaviors, which does lead to improvement in an
absolute sense.
Once again, a good domain to study such competitive-cooperative dynamics is predator-
prey tasks (Rawal, Rajagopalan, and Miikkulainen, 2010). Extending the multiagent ESP
approach of section 7.1.3, a simulation can be set up to evolve both the prey and the
predator populations; let's call them zebras and hyenas. Again in a toroidal world, the
zebras can run away from the hyenas, but the hyenas can catch them by approaching from
multiple sides.
At the very first stages of evolution (generations 50-75), the zebras evolved an individual
strategy of running away from the nearest predator, replicating the algorithmic behavior
in the previous section. Correspondingly, the predator team evolved a two-blocker,
one-chaser strategy (figure 7.6; phase 1). In the next phase (generations 75-100; phase 2),
the prey evolved a new strategy of running in a small circle with the chaser following at its
tail. This strategy is effective for the prey because the blockers simply keep waiting for the
prey to be pushed to them. Next
(generations 100-150; phase 3), one of the blocker predators evolved to act as a chaser
as well, approaching the prey from two different directions. As a response (generations
150-180; phase 4), the prey evolved a baiting strategy, letting both chasers get close
and then escaping away from them both. Next (generations 180-250; phases 5-6), the
predators evolved to change roles between blockers and chasers dynamically, so that they
can better sandwich the prey. As a result (generations 250-300; phase 7), the prey adjusted
its strategy, letting all predators get close, and then escaping between them. In the next
few hundred generations (300-450; phases 8-9), both of these strategies became gradually
more refined and precise, eventually resulting in about a 50-50 chance of the prey escaping
or getting caught, similar to what is seen in biology.
However, an interesting next step is to add another prey to the prey team; the prey can
now evolve cooperation in order to confuse the predators. This is one of the most effective
strategies used by prey in nature, and there is computational evidence (using Markov
Brains) that predator confusion is a sufficient reward to evolve swarming behavior (Olson,
Hintze, F. C. Dyer, et al., 2013). It also evolves reliably in the two-prey simulations. First
(in 150 further generations), the predators mostly capture one prey at a time, but are often
confused by the other, and fail. Then (generations 150-200, phase 1), they are able to adapt
their single-prey sandwiching strategy to herd the two prey together and capture both of
them. Remarkably, the prey are able to adapt their strategy in the same way (generations
200-300, phase 12, baiting the predators together, and then escaping in opposite directions,
leaving the predators confused. In further evolution, both of these strategies become more
precise, resulting in about an even chance of escape and capture in the end.
This example is interesting for two reasons: First, it illustrates how neuroevolution can
be used to understand how the behaviors observed in nature may have emerged through
Figure 7.6: Evolutionary arms race of increasingly complex pursuit-evasion strategies.
Through multiple phases, the predator and prey populations alternate in gaining the upper hand
in the competition, which serves as a challenge and opportunity for evolution to improve the
disadvantaged population. The later behaviors largely subsume the earlier ones, and therefore
there is a progression in an absolute sense toward more complex and effective behaviors that
would otherwise be difficult to discover. The simulation also serves to shed light on observed
animal behaviors such as cooperative hunting and herding, and escaping by confusing the
predators. It thus demonstrates both a way to construct complex intelligent agents, as well as to
understand how intelligence may have emerged in biological evolution. For animations of these
behaviors, see https://neuroevolutionbook.com/demos. Figures from Rawal, Rajagopalan, and Miikkulainen (2010).
coevolution. Sometimes, when observing biological behavior as it is, it is difficult to
understand aspects of it. However, behavior, like other aspects of biology, is a product of
evolution, and should be understood in the light of how evolution may have constructed it,
through all the intermediate stages that may no longer be visible. Evolutionary computation
simulations may be used to uncover them; for instance, why it may be beneficial for the
prey to let the predators get close before escaping. These opportunities will be discussed
in more detail in chapter 14.
Second, the example demonstrates a successful coevolutionary arms race. Complex
behavior is discovered through multiple stages, each a stepping stone to the next. The
imbalance of performance at each state forms a challenge to the disadvantaged population,
and evolution discovers ways to meet that challenge. In this manner, such competitive-
cooperative coevolution may be a crucial ingredient in open-ended evolution, and perhaps
also in establishing major transitions (Miikkulainen and Forrest, 2021). Opportunities for
such advances are discussed more in section 9.1.
7.3 Cellular Automata
Many collective systems in nature are made up of large numbers of highly interconnected
components. The absence of any centralized control allows them to quickly adjust to
new stimuli and changing environmental conditions. Additionally, because these collective
intelligence systems are made of many simpler individuals, they have in-built redundancy
with a high degree of resilience and robustness. Individuals in this collective system can
fail without the entire system breaking down.
A simplified yet powerful platform to study collective systems in various contexts is
cellular automata (CA). They offer insights into how individual behaviors, when aggregated,
can lead to the emergence of remarkable and often unexpected group-level phenomena.
Constructing intelligent or life-like systems from a large number of cooperating components
is central to CAs, and as will be seen in this section, they allow complex patterns to emerge
based only on the local and self-organized interaction of cells. CAs have recently seen a
renaissance and renewed interest in the machine learning community by scaling them up
and combining them with deep neural networks.
Originally proposed in the 1940s, cellular automata mimic developmental processes in
multicell organisms, including morphogenesis. A CA is a spatially extended decentralized
system that contains a grid of similarly structured cells, which are locally connected and
updated periodically in discrete time steps. At every time step, the status of each cell can
be represented as a state, which is then transitioned into the next state per the update rule.
The specific transition depends on the current state of the cell and the neighboring cells
(often this neighborhood is defined as the cells directly bordering the cell in question, but
a larger neighborhood is also possible). For example, in a particular CA devised by John
Conway in 1970 called Conway's Game of Life, a few rules govern the transition at each
timestep, such as: a live cell with fewer than two live neighbors dies, while a dead cell
becomes alive if it has exactly three live neighbors. These automata serve as effective models
for a range of physical and biological processes. For instance, they have been employed to
simulate fluid dynamics, the emergence of galaxies, seismic events like earthquakes, and
the formation of intricate biological patterns.
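As a concrete example, the standard Game of Life update can be written in a few lines; the sketch below assumes a toroidal grid and uses numpy to count neighbors.

import numpy as np

# One Game of Life step: cells are 0 (dead) or 1 (alive), grid wraps around.
def life_step(grid):
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    survive = (grid == 1) & ((neighbors == 2) | (neighbors == 3))
    born = (grid == 0) & (neighbors == 3)
    return (survive | born).astype(int)

rng = np.random.default_rng(0)
grid = rng.integers(0, 2, size=(32, 32))
for _ in range(10):
    grid = life_step(grid)   # each step applies the same local rule everywhere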
A CA's transition rule can be specified as a lookup table that determines, for each
local neighborhood configuration, what the state of the central cell should be in the next
timestep. While the states are either 0 or 1 in, e.g., Conway's Game of Life, we'll
shortly see that cells can have more states or even be described not by a single number
but by a hidden state vector. In Conway's Game of Life, the specific transition rules
were human-defined. However, in some instances it can make sense to search for specific
rules that lead to desired behaviors or patterns. For example, researchers such as Melanie
Mitchell have shown that it is possible to optimize CA transition rules with evolutionary
algorithms (M. Mitchell, Crutchfield, and Das, 1996). This way, rules can be found that
perform a specific type of computation, such as determining if the initial CA configuration
has more 1s than 0s.
Instead of evolving rule tables directly (which can quickly become prohibitively large
when the number of CA states increases), rules can also take the form of programs
(Koza, 1994) or neural networks (Wulff and Hertz, 1992). Here, a copy of the same
program/neural network runs in each cell, taking information from its CA neighbors and
potentially previous cell states into account to determine which state the cell should take
next. Because each cell shares the same trainable parameters, the whole system can
be viewed as a type of indirect encoding, in which the size of the grown patterns can
potentially be much larger than the size of the underlying representation.
A popular benchmark to test the abilities of these systems is to grow forms resembling
simple 2D patterns. Originally proposed by developmental biologist Lewis Wolpert in the
1960s, the French flag problem (Wolpert, Tickle, and Arias, 2015) is such a task, and asks
how embryonic cells could differentiate into complex patterns, such as the three differently
colored stripes of a French flag. The inquiry extends to understanding how these patterns
can scale proportionally with tissue size, for example, such that the grown French flag
pattern is always one-third blue, one-third white, and one-third red. In an impressive early
demonstration of collective intelligent systems, J. F. Miller (2004) showed that a genetic
cell program can be evolved that allows growing a French flag-like pattern from a single
cell, which can even self-repair when damaged. When the cell's update function is
a neural network, it is now often called a neural cellular automaton (NCA), and we'll have a
closer look at those next.
7.3.1 Evolving Neural Cellular Automata
In a neural cellular automaton (NCA; Wulff and Hertz, 1992), a neural network updates
the state of each cell based on communicating with its local neighbors. The same neural
network is applied to each grid cell, resembling the iterative application of a convolutional
filter (Gilpin, 2019). In other words, NCAs can be viewed as an indirect encoding
(chapter 4) in which identical modules are applied with identical weight parameters across
the space of cells. More recently, the use of neural networks for CAs has seen a resurgence,
in particular because of their integration with popular deep learning frameworks.
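The following minimal sketch (with illustrative sizes and a plain numpy implementation, not any particular published model) shows the core idea: every cell holds a state vector, and the same small network maps the states of the 3 × 3 neighborhood to the cell's next state.

import numpy as np

def nca_step(states, W, b):
    """states: (H, W, C) grid of cell state vectors; W: (9*C, C); b: (C,)."""
    H, Wd, C = states.shape
    new_states = np.zeros_like(states)
    for y in range(H):
        for x in range(Wd):
            # Gather the 3x3 neighborhood with toroidal wrap-around.
            patch = [states[(y + dy) % H, (x + dx) % Wd]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            inp = np.concatenate(patch)            # shape (9*C,)
            new_states[y, x] = np.tanh(inp @ W + b)
    return new_states

C = 4                                               # channels per cell
rng = np.random.default_rng(1)
W = rng.normal(0, 0.1, size=(9 * C, C))             # shared by every cell
b = np.zeros(C)
states = np.zeros((16, 16, C)); states[8, 8] = 1.0  # single seed cell
for _ in range(20):
    states = nca_step(states, W, b)                 # iterative local updates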
Because NCAs are neural networks, they can naturally be evolved with the NEAT
algorithm. However, in this approach (CA-NEAT; Nichele, Ose, Risi, et al., 2017), evolved
neural networks are applied slightly differently from what we have seen in previous sections.
In an NCA, a collection of cells, each controlled by a copy of the same evolving neural
(𝑎) Pattern growth (𝑏) Pattern replication
Figure 7.7: CA-NEAT. The NEAT-evolved neural networks have learned to grow a French flag-like pattern (𝑎) and to perform pattern replication (𝑏), only through the local interaction of cells. It thus demonstrates a way that neuroevolution can produce complex, coordinated behaviors from simple, decentralized rules. Figures from Nichele, Ose, Risi, et al. (2017).
network, needs to learn to collaborate to perform the task at hand. This process was
demonstrated in an experiment where NCAs were evolved to learn to grow a certain target
pattern, based only on the local information they receive from the neighboring grid cells.
Here, fitness was assigned based on how closely the resulting pattern resembles the target
pattern during the growth process. In addition to growing a particular target pattern,
the system was also trained to replicate a certain pattern, which is another fundamental
property of biological systems. In this domain, the neural network was tasked to replicate
a given seed pattern a specific number of times.
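A minimal sketch of such a morphogenesis fitness function, with illustrative names and weighting, could look as follows:

import numpy as np

# Grow the pattern for a number of steps and score how closely the grid
# matches the target along the way (illustrative, not the published formula).
def pattern_fitness(update_rule, seed, target, steps=20):
    grid = seed.copy()
    scores = []
    for _ in range(steps):
        grid = update_rule(grid)                    # one developmental step
        scores.append(np.mean(grid == target))      # fraction of correct cells
    return np.mean(scores)                          # reward stable, early matches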
NEAT was indeed able to solve both of these tasks. Figure 7.7𝑎 shows an example
where a NEAT-evolved network grows a French flag-like pattern iteratively starting from
an initial seed cell (Nichele, Ose, Risi, et al., 2017). Figure 7.7𝑏 demonstrates how an
evolved neural network learned to replicate an initial mosaic pattern along one axis, taking
a total of eight developmental steps.
How far can we push this approach? Can we learn to grow patterns of arbitrary
complexity? While NEAT was able to discover networks that can grow simple shapes
and learn to replicate them, further experiments showed that it struggled to learn to grow
more complex shapes, such as a Norwegian flag-type pattern. The reason for this is likely
that the evolutionary optimization algorithm gets stuck in a local optimum of the fitness
landscape. We have seen similar phenomena in section 5.3, when trying to re-evolve
CPPNs to generate specific target patterns like the skull image. In a similar vein, evolution
here likely depends on discovering the proper stepping stones towards the solution and
the developmental dynamics of NCAs likely make this optimization problem even more
complicated.
While open-ended search methods like quality diversity (section 5.4) could potentially
be useful to overcome the stepping stone problem in this domain, evolutionary approaches
tend to perform especially well when the search space is less constrained. Often, we aren't
aiming for a precise target pattern but rather for satisfying functional goals, for example,
discovering a robot morphology that maximizes locomotion speed. As we will see in the
next section, neuroevolution excels at this kind of creative, goal-driven discovery.
In addition to growing specific patterns, these experiments highlighted how NCAs can
not only learn spatial organization but can also exhibit behaviors such as self-replication.
This capability echoes broader findings in artificial life research, where simple systems
have been shown to develop self-replicating dynamics (Agüera y Arcas, Alakuijala, Evans,
et al., 2024). Indeed, life and intelligence can be understood as computational processes
rooted in replication, variation, and interaction (Agüera y Arcas, 2025). These mechanisms
are further discussed in chapter 14.
7.3.2 Growing Functional Machines
In the previous section, we saw that NCAs can be evolved to grow inanimate artifacts,
such as 2D patterns. However, in nature, entire organisms grow from a single cell, moving
and interacting with the world. Additionally, as a result of their developmental programs,
such systems continuously renew their cells and possess the ability to repair themselves.
Can NCAs be extended to accomplish similar feats?
In this section, we revisit the domain introduced in section 4.3.1 where we explored
how CPPNs can be used to encode the morphology of soft, mobile robots. In that work, a
CPPN was queried with the location of each voxel and would then output a voxel material
type. CPPNs were able to create high-performing soft robots with regular patterns such
as symmetry and repetition. However, each voxel needed access to its global location in
space. While this is not necessarily a problem in simulated soft robots, in modular
physical robots (where each module is identical), this information might not be directly
available. Can we design soft robots using a collective approach, where each voxel
determines its material solely through local cell-to-cell communication? Drawing parallels
with biological systems, each cell should be able to determine its function through local
interactions alone.
Here we will look at such a completely distributed approach, which is based on
evolving NCAs (Horibe, Walker, and Risi,
2021). In this example, the NCA was a
rather simple neural network with a fixed topology consisting of three layers. The input
dimension of the neural network was 3 × 9 × 2 = 54, with a hidden layer of 64 nodes. The
neural network had five outputs that determine the next state (i.e. material type) of each
voxel, such as muscle or bone, and one output that determines if the cell is alive. The same
neural network was applied to each voxel neighboring a voxel that is already alive. Robots
were grown from an initial seed cell in the center position of the 3D grid for a certain
number of timesteps until they were placed in the simulation environment. Each robot's
voxel materials were then actuated, and the robot was tested for its ability to locomote.
Instead of using NEAT, the parameters of these networks with a fixed architecture were
evolved through a simple genetic algorithm, in which parents were selected uniformly at
random. Genomes were mutated by adding Gaussian noise to the neural network’s weight
vectors. The GA performed simple truncation selection with elitism.
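A minimal sketch of such a GA loop, with illustrative hyperparameters rather than the values used in the original experiments, is the following:

import numpy as np

# Fixed-topology genomes are flat weight vectors; parents are drawn uniformly
# from the truncation-selected elite, and children are produced by Gaussian noise.
def simple_ga(evaluate, n_params, pop_size=64, elite_frac=0.25,
              sigma=0.02, generations=100, rng=np.random.default_rng(0)):
    pop = rng.normal(0, 0.1, size=(pop_size, n_params))
    for _ in range(generations):
        fitness = np.array([evaluate(genome) for genome in pop])
        order = np.argsort(fitness)[::-1]               # best first
        elite = pop[order[: max(1, int(elite_frac * pop_size))]]
        children = []
        for _ in range(pop_size - len(elite)):
            parent = elite[rng.integers(len(elite))]    # uniform parent choice
            children.append(parent + rng.normal(0, sigma, size=n_params))
        pop = np.vstack([elite, np.array(children)])    # elitism: keep the elite
    return pop[np.argmax([evaluate(g) for g in pop])]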
Similar to the CPPN-encoded soft robots, evolved NCAs were able to create high-
(𝑎) Grown soft robots (𝑏) Damage recovery
Figure 7.8: NCA-based soft robots. Evolution discovered a variety of NCAs that were able to grow 2D and 3D soft voxel robots with different walking gaits (𝑎). A second NCA, trained specifically for damage recovery, is able to regrow damaged parts of the robot solely through the local communication of cells (𝑏). Thus, neuroevolution is not only well-suited to finding NCAs for static designs but also functional morphologies. Figures from Horibe, Walker, and Risi (2021). Videos at https://neuroevolutionbook.com/demos.
performing 3D soft robots through a process of growth and local communication alone.
However, unlike CPPNs, they were able to do so without requiring a global coordinate
frame. Some of the example grown robots are shown in figure 7.8𝑎. Once grown, the
creatures display different walking gaits. For example, the L-walker resembles an L-shaped
form and moves by opening and closing the front and rear legs connected to its pivot point
at the bend of the L, while the crawler has multiple short legs that move forward in concert.
Collective systems offer the advantage of being highly resilient to perturbations and
disruptions, as they are designed with built-in redundancies and lack a single point of
failure. For example, the morphogenetic systems of many biological organisms give
them amazing regenerative capabilities, allowing them to repair and reconfigure their
morphology in response to damage or changes in components. Primitive organisms
such as Hydra and Planaria are particularly capable of regeneration and can thus achieve
complete repair, no matter which part of the body is cut off (Beane, Morokuma,
Lemire, et al., 2013). But also more complex creatures, such as salamanders, are capable
of regenerating an amputated leg. Can our artificial collective system show a similar kind
of resilience and adaptability?
To explore this question, we can remove parts of the fully developed robots and rerun
the same NCA for several developmental steps to observe whether the damaged areas
regenerate. As it turns out, it is challenging to evolve one NCA that controls both the
initial growth and the damage recovery. We have already seen in section 6.3.1 that it can
be challenging for neuroevolution to switch between different behaviors. However, we can
make the task easier by training a second NCA whose sole purpose is to regrow a damaged
morphology. In other words, one NCA grows the initial morphology and the other NCA
is activated once the robot is damaged. This way, robots were often able to regrow
damaged components, allowing them to restore their ability to locomote (figure 7.8𝑏).
Nevertheless, small discrepancies in the restored morphology could lead to a significant
loss of locomotion ability. In section 7.3.5, we will revisit this task and explore how the
synergistic integration of neuroevolution and gradient descent can ultimately enable the
same neural network to not only grow a robot but also facilitate a higher accuracy in
damage and locomotion recovery.
7.3.3 Case Study: Growing Game Levels with QD-Evolved NCAs
So far in this chapter, we have explored approaches where the goal is to grow one particular
artifact that satisfies certain functional or visual criteria. To evolve a diversity of designs,
such as the robot morphology, the algorithm needed to be run multiple times from scratch.
In this section, we will look at a case study that evolves a diversity of neural cellular automata
with a QD-algorithm, with the goal of generating a variety of different video game levels
(Earle, Snider, Fontaine, et al., 2022). Level generation can serve as a good benchmark
for evolving NCAs and the creative abilities of neuroevolution in general, because such
artifacts often need to satisfy a diverse range of criteria, from being aesthetically pleasing,
to fun to play, and, not least, functional (i.e. a level needs to be playable). Indeed, we will
encounter this domain again in the context of combining neuroevolution with generative
AI (section 13.4). Additionally, we have seen in section 7.3.1 that it can be difficult to
learn to control the complex dynamics of a self-organizing system such as NCAs to grow
into a particular target shape. Because QD algorithms can take advantage of stepping
stones discovered along the way, we will see that they are better able to navigate these
complex fitness landscapes.
A well-suited video game to study these algorithms is the old school The Legend of
Zelda (Nintendo, 1986). In the simplified Zelda clone in these experiments, the agent
has to navigate 2D levels and locate the key that will open the level’s exit door, while
killing monsters. Zelda levels often show some level of symmetry, and therefore symmetry
(both horizontal and vertical) in addition to the path-length from the goal to the exit, were
chosen as the MAP-Elites dimensions of interest. In a straightforward application of QD
and NCAs, each elite in the map would be an NCA that produces a map with a particular
Figure 7.9: NCA architecture for game level generation. A convolutional network repeatedly transforms levels based on the local interaction of 3 × 3 cells. Levels are evaluated after being modified for a fixed number of iterations. Figures from Earle, Snider, Fontaine, et al. (2022).
level of symmetry and path length. However, a designer would ideally have more than
one level with a specific path length to choose from. To address this issue, each NCA can
be treated as a whole level "generator" and tested for its ability to generate a diversity of
different levels with the same path-length, given different random initial states as input.
With the QD dimensions defined, a measure for the quality of each NCA generator
was needed, which was evaluated based on three different criteria: validity, reliability,
and intra-generator diversity. The validity term quantified how well the generated level
conformed to the soft constraints of the particular game. For example, in the case of Zelda,
this constraint meant that levels should form one connected region, with the generator
receiving a lower score for each additional region that was not connected to the main
region. The reliability term captured how reliably one NCA generated structures with
a particular QD measure. For example, an NCA in Zelda was penalized if it produced
levels with very different path lengths each time it generated a new level from a different
initial state. The last term, intra-generator diversity, measured the amount of diversity
in a batch of levels generated by the same NCA (given different starting seeds). This
term was added to prevent generators from ignoring the latent seed input and collapsing
to producing only one particular level design. These three terms were then ultimately
combined to measure the quality of a particular NCA, with the goal of having a generator
that produces a distribution of valid levels with reliable behavior characterization.
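A rough sketch of how these three terms could be combined into a single quality score (the weights and helper functions are illustrative assumptions, not the published formula) is the following:

import numpy as np

# Score one NCA level generator: generate a batch of levels from random seeds,
# then combine validity, reliability of the measures, and intra-generator diversity.
# Levels are assumed to be numpy arrays of tile indices with identical shapes.
def generator_quality(generator, seeds, validity, measures, w=(1.0, 1.0, 0.5)):
    levels = [generator(seed) for seed in seeds]
    validity_score = np.mean([validity(lvl) for lvl in levels])
    descriptors = np.array([measures(lvl) for lvl in levels])  # e.g. (path, symmetry)
    reliability = -np.mean(np.std(descriptors, axis=0))        # penalize spread
    diversity = np.mean([np.mean(a != b)                       # pairwise tile difference
                         for i, a in enumerate(levels)
                         for b in levels[i + 1:]])
    return w[0] * validity_score + w[1] * reliability + w[2] * diversity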
A detailed view of the NCA architecture is shown in figure 7.9. It comprised three
convolutional layers, utilized ReLU and sigmoid activation functions, and had 32 hidden
channels. The NCA's output retained the dimensions and channel count of its input.
However, it employed an argmax function on a channel-by-channel basis to yield a
discrete representation of the subsequent state. To generate a game level using an NCA, a
one-hot-encoded random starting level was given as input (also termed a "latent seed").
This process was reiterated using the NCA's output until the level either stabilized or
reached a predetermined step limit. The QD algorithm was a variant of the classical
MAP-Elites algorithm, in particular, CMA-ME (Fontaine, Togelius, Nikolaidis, et al.,
2020). This approach (see section 5.4.4) combines the MAP-Elites type of solution
Figure 7.10: NCA-generated Zelda levels. Shown are example levels generated by NCAs evolved
using a MAP-Elites-based QD approach. The method successfully discovers NCAs capable
of producing a diverse set of valid and solvable Zelda maps, varying meaningfully along two
dimensions: path length and symmetry. Each map adheres to strict gameplay constraints, including
exactly one avatar, one key, and one door. These results demonstrate the effectiveness of combining
NCAs with QD algorithms for constraint-aware, diverse procedural content generation in game
design. Figures from Earle, Snider, Fontaine, et al. (2022). Videos of the growth process at https://neuroevolutionbook.com/demos.
archiving with the adaptation mechanism of CMA-ES, which is particularly well-suited
for continuous domains.
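The following PyTorch sketch illustrates a generator of this kind; the layer sizes follow the description above, but the class name and the stopping logic are illustrative assumptions rather than the authors' code.

import torch
import torch.nn as nn

class LevelNCA(nn.Module):
    def __init__(self, n_tile_types, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_tile_types, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, n_tile_types, 3, padding=1), nn.Sigmoid(),
        )

    def generate(self, seed_level, max_steps=50):
        """seed_level: (1, n_tile_types, H, W) one-hot random level (the latent seed)."""
        level = seed_level.argmax(dim=1)
        onehot = seed_level
        for _ in range(max_steps):
            logits = self.net(onehot)
            new_level = logits.argmax(dim=1)          # discrete tile per cell
            if torch.equal(new_level, level):         # stop when the level stabilizes
                break
            level = new_level
            onehot = nn.functional.one_hot(
                level, num_classes=onehot.shape[1]).permute(0, 3, 1, 2).float()
        return level

The parameters of such a network would then be evolved with CMA-ME, with each elite in the archive being one generator.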
The approach was able to grow a diversity of levels along the dimensions of interest,
path length and symmetry (figure 7.10). The maps were all solvable, satisfying the
required game constraints such as only producing one key, one door, and one avatar. One
interesting question is: How does the NCA approach compare to a CPPN-like generation
of levels, which does not go through the process of growth? QD-algorithms are particularly
well-suited to compare different representations since they can illuminate how well the
approach covers the search space along dimensions of interest. To make the comparison
fair, each CPPN also needed to become a generator, allowing it to produce not just one
Figure 7.11: NCA level growth. Shown are the intermediate growth states of a Zelda level. The
growth process starts with a fixed initial seed at the center of the level until a stable configuration
is reached. Interestingly, during the intermediate stages of growth, levels frequently contained
multiple keys or doors. These additional intermediate tiles appear to function as a form of external
memory, helping to transmit spatial information across the level and enabling the emergence of
globally coherent patterns. The main result is that through purely local iterative interactions, the
NCA is able to produce levels that fulfill complex, high-level functional constraints. Figures from
Earle, Snider, Fontaine, et al. (2022).
map but multiple ones. This could be achieved by augmenting the CPPN with a latent
vector input, in addition to the typical 𝑥, 𝑦 coordinates.
Surprisingly, the results showed that the NCA-based approach was able to explore a
larger space of the levels and the individual generators produced more diverse outputs
than the CPPN-based encoding and an additional variational autoencoder (VAE)-inspired
decoder architecture (Kingma and Welling, 2014). One would assume that having global
information would, in fact, make it easier to produce a diversity of levels. However, in
this instance, the NCA-based architecture was better suited for searching the space of
high-quality and diverse levels.
How was the NCA able to produce designs with global coherence without the global
information available to a CPPN or VAE decoder? Looking at a level growth sequence
reveals some interesting insights (figure 7.11). During the intermediate growth process, we
can see that the levels often contain multiple keys or doors; however, at the end, the process
converges towards a solution with just one key and one door. These intermediate tiles
seem to function as a type of external memory, propagating spatial information across the
level to form patterns with global complexity. Surprisingly, through these iterative local
interactions alone, the NCA was able to generate levels that satisfy high-level functional
constraints.
Producing patterns with global coherence through local interactions alone is an
essential ability seen in many collective intelligence systems in nature. In the next section,
we will investigate the opportunities of such advances for the growth of neural networks
themselves.
7.3.4 Evolving Self-Assembling Neural Networks
One of the most impressive feats of a collective system cooperating is the self-assembly
of billions of cells into a human brain. While most current neural networks in machine
learning are hand-designed and learning is restricted to optimizing connection weights,
biological neural networks are grown through a process of local communication and
self-organization. In the previous sections, we have seen that NCAs can learn to grow 2D
structures, game levels, and even locomoting 3D soft robots. Can they also learn to grow
and self-assemble an artificial neural network?
In section 4.2.2 on grammatical indirect encodings, we have encountered early work
in this direction with an approach called cellular encodings (Gruau and Whitley, 1993;
Gruau, Whitley, and Pyeatt,
1996). In a cellular encoding, a program evolved through
genetic programming guides the growth of a policy network. This pioneering work was
perhaps ahead of its time, with direct encodings such as NEAT being able to outperform it
in terms of the number of evaluations needed to find a solution for simple tasks such as
pole balancing. The cellular encoding approach has therefore been less widely adopted than
conceptually simpler and more direct encoding approaches.
However, with the recent advances in training NCAs to produce complex patterns more
efficiently, a cellular encoding based on neural networks (instead of GP) could potentially
serve as a powerful indirect encoding. Related approaches such as ES-HyperNEAT also
progressively construct networks (section 4.3.5), but do not take advantage of the collective
collaboration between cells during this process. In nature, these abilities seem essential in
enabling the remarkable robustness and adaptability of collective intelligent systems.
A step towards this direction is the HyperNCA approach (Najarro, Sudhakaran, Glanois,
et al., 2022), which models neural network growth using NCAs. The idea
is straightforward: Over a number of steps, the NCA grows a spatial pattern. The novel
idea is to then interpret one channel of the resulting pattern as the weights of a policy
network. This indirectly encoded network is then evaluated in a task (figure 7.12), and
the fitness outcome guides the optimization of the NCA using an evolutionary algorithm.
the fitness outcome guides the optimization of the NCA using an evolutionary algorithm.
While the approach showed promise in continuous control tasks, such as LunarLander and
quadrupedal robot locomotion, one limitation of HyperNCA is that it does not incorporate
any awareness of the final network’s structure, i.e. the mapping from the grown 3D pattern
to the policy weight matrix does not take the topology of the network into account.
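A minimal sketch of this weight-extraction step (names and shapes are illustrative assumptions) is the following:

import numpy as np

# Grow a 3D pattern with an NCA for a fixed number of steps, then read one
# channel out as the weight matrix of a small policy network.
def develop_policy_weights(nca_step, seed, steps, n_in, n_out):
    """nca_step: function mapping a (C, D, H, W) pattern to its next state."""
    pattern = seed
    for _ in range(steps):
        pattern = nca_step(pattern)
    # Interpret the first channel of one slice as an (n_in x n_out) weight matrix.
    return pattern[0, 0, :n_in, :n_out]

def policy(obs, weights):
    return np.tanh(obs @ weights)        # linear policy with the grown weights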
A method that aims to address this issue is the neural developmental program (NDP)
approach (Najarro, Sudhakaran, and Risi, 2023). NDPs build on the ideas behind
neural CAs but extend them to growing graph-like structures. In other words, these graph
cellular automata (GCA) approaches extend the traditional grid-based structure of cellular
automata by operating over arbitrary graph topologies, where each node represents a cell
with its own internal state, and edges define local neighborhoods (Grattarola, Livi, and
Alippi, 2021). This ability allows them to model systems with a non-uniform connectivity,
such as neural networks. Like standard NCAs, graph NCAs rely on local, shared update
rules, but they generalize these rules to work over graph structures instead of fixed grids.
This enables the growth and self-organization of systems that are not confined to spatial
lattices, such as neural circuits, bridging the gap between self-organizing developmental
systems and functional artificial architectures.
In NDPs, the goal of the graph NCA is to grow and adapt a policy network to control
an agent in an environment, solely based on each neuron's local information received from
its neighbors. Note that while the approach grows a neural architecture, the goal here is
different from techniques like NEAT and the other neural architecture search methods
Figure 7.12: Hyper Neural Cellular Automata (HyperNCA). In a developmental growth phase (left), a 3D NCA updates an initial random seed over a fixed number of steps. The NCA and the seed may contain one or multiple information channels; for simplicity, a single-channel example is shown. In the policy evaluation phase (right), the first channel of the developed pattern is interpreted as the weight matrix of a policy network, which is then evaluated on the particular task. Figure from Najarro, Sudhakaran, Glanois, et al. (2022).
While those methods change the architecture of the neural networks during evolution,
the idea in NDPs is to grow neural networks during a developmental phase. The benefit of
this approach is that the development of the neural network can be shaped by experience
and can take advantage of sensory information from the environment to drive the neural
developmental process.
A more detailed view of the NDP approach is shown in figure 7.13. Each node
in the growing graph has an internal state vector, whose values are updated during
the developmental process based on local communication between nodes. The
NDP has three neural networks: the first is responsible for updating the aforementioned
hidden states of the nodes, the second takes the state of a node as input and predicts
whether this node should replicate, and the third takes the states of two nodes as input
and outputs the edge weight between them.
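As a rough illustration, the following sketch implements one developmental step with three toy MLPs and a simple adjacency-matrix message pass; the state size, network shapes, and replication threshold are assumptions for illustration, not the published NDP architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
STATE = 8  # per-node embedding size (hypothetical)

def mlp(sizes):
    return [0.3 * rng.standard_normal((a, b)) for a, b in zip(sizes[:-1], sizes[1:])]

def forward(ws, x):
    for W in ws:
        x = np.tanh(x @ W)
    return x

# The three NDP networks: state update, replication decision, and edge weights.
update_net = mlp([2 * STATE, 16, STATE])   # own state + aggregated neighbor states
replicate_net = mlp([STATE, 16, 1])        # node state -> replicate or not
edge_net = mlp([2 * STATE, 16, 1])         # pair of states -> connection weight

def developmental_step(states, adj):
    # 1. Information aggregation: each node receives the mean state of its neighbors.
    deg = np.maximum(adj.sum(1, keepdims=True), 1)
    neigh = (adj @ states) / deg
    states = forward(update_net, np.concatenate([states, neigh], axis=1))

    # 2. Replication: nodes whose network output is high spawn a copy of themselves.
    grow = forward(replicate_net, states)[:, 0] > 0.5
    for i in np.where(grow)[0]:
        states = np.vstack([states, states[i]])
        adj = np.pad(adj, ((0, 1), (0, 1)))
        adj[i, -1] = adj[-1, i] = 1   # child connected to its parent

    # 3. Edge weights: predicted from the concatenated states of each connected pair.
    n = len(states)
    weights = np.zeros((n, n))
    for i, j in zip(*np.nonzero(adj)):
        weights[i, j] = forward(edge_net, np.concatenate([states[i], states[j]])[None])[0, 0]
    return states, adj, weights

states = rng.standard_normal((2, STATE))       # start from a small seed graph
adj = np.array([[0, 1], [1, 0]], dtype=float)
for _ in range(4):
    states, adj, weights = developmental_step(states, adj)
print(len(states), "nodes after growth")
```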
A good initial test to evaluate the expressiveness of these NDPs is to task them
with growing graphs with properties found in many biological neural networks. One
predominant topological characteristic of such biological networks is small-worldness:
these networks combine small average shortest path lengths with relatively large clustering
coefficients. Indeed, optimizing an NDP directly for these two properties with CMA-ES
did lead to graphs satisfying the small-worldness criteria.
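As a sketch of what such a graph-level objective might look like, the following snippet scores a grown graph for small-worldness by comparing its clustering and path length against a density-matched random graph (the exact criterion used in the paper may differ); a score like this could serve as the fitness inside the outer CMA-ES loop.

```python
import networkx as nx

def small_world_score(G):
    """Compare clustering and path length of G against an equivalent random graph.
    A graph is commonly called small-world when it clusters much more strongly
    than a random graph while keeping similarly short average path lengths."""
    n, p = G.number_of_nodes(), nx.density(G)
    R = nx.gnp_random_graph(n, p, seed=0)
    if not (nx.is_connected(G) and nx.is_connected(R)):
        return 0.0
    clustering_ratio = nx.average_clustering(G) / max(nx.average_clustering(R), 1e-6)
    path_ratio = nx.average_shortest_path_length(G) / nx.average_shortest_path_length(R)
    return clustering_ratio / path_ratio   # > 1 indicates small-world structure

# A Watts-Strogatz graph is a classic small-world example; a grown NDP graph
# could be scored in exactly the same way inside the evolutionary fitness function.
print(small_world_score(nx.watts_strogatz_graph(100, 6, 0.1, seed=1)))
```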
A more complex task involves optimizing the NDP to grow a policy neural network that
enables an agent to interact successfully with its environment. When applied to various
control tasks such as CartPole, LunarLander, and HalfCheetah, CMA-ES was able to find
high-performing NDPs. Looking into the growth sequence of one of these networks for the
cart-pole balancing task reveals a rapid proliferation of nodes during the first few
developmental stages (figure 7.14). This rapid increase in the number of nodes is an
interesting difference from, e.g., NEAT.
Figure 7.13: Neural developmental program approach. During the stage of information
aggregation, the graph systematically transmits the state s of each node to its adjacent nodes over
n iterations. The replication model network takes the updated node state s_{t+n} as input and decides
which nodes should replicate. Another network then determines the weights of the
edges connecting each node pair, using their combined embeddings. Once the network has grown
for the given number of developmental steps, it is evaluated on a specific task. From
Najarro, Sudhakaran, and Risi (2023).
Even an NDP from early in evolution could grow networks with large numbers of nodes, while NEAT typically requires many generations
to gradually add and refine nodes and connections through structural mutations. However,
the relative benefits and drawbacks of NDPs versus NEAT are not yet entirely clear and
will require some deeper exploration in the future.
While many open research directions remain for developing more powerful
NDPs, the fact that NDPs can capture some of the fundamental patterns seen in biological
networks through self-organization and local growth alone suggests they are a good
basis for further exploration. For example, the NDP model can be used to study diversity
maintenance in neural populations. In fact, a key issue with training the original NDPs
is that if all neurons differentiate into the same type, growth-related decisions become
uniform, leading to homogeneous ANN structures incapable of producing complex
behaviors. Two key biologically inspired modifications can resolve this issue (Nisioti,
Plantec, Montero, et al., 2024). First, introducing intrinsic states that remain unchanged
during growth ensures that diversity is preserved in the network. By initializing networks
with a small set of cells, each with a distinct intrinsic state, diversity can be introduced
at the start of growth. As the network expands, these intrinsic states are replicated,
resulting in cell lineages similar to biological networks. The second mechanism is lateral
inhibition, which is believed to play a crucial role in maintaining diversity during biological
development. This mechanism prevents neighboring cells from taking similar actions for a
limited number of steps when one cell makes a decision. While the contribution of lateral inhibition
to agent performance is currently less clear, adding intrinsic states allowed the
NDP to perform much better, reaching performance levels similar to a hypernetwork-based
approach across a range of complex control tasks such as the ant, inverted double
pendulum, reacher robot arm, and HalfCheetah (figure 7.15).
Another key limitation of the original NDP model is that it was temporally constrained
to a pre-environmental phase and did not account for an agent's lifetime, let alone lifelong
learning. That is, the networks were grown during a developmental phase but remained
static while the agent interacted with the environment.
Figure 7.14: NDP growth of a network solving the CartPole task. The network begins as a solitary
node and progressively develops into a more complex network, encompassing two, four, five, and
ultimately ten neurons, along with 33 weighted edges, over the course of four growth stages. Within
this network, the red nodes function as sensory neurons, the white nodes serve as hidden neurons,
and the blue nodes operate as output neurons. Above each neuron, there is a vector displayed,
representing the node embeddings. These embeddings are indicative of the state of each neuron
throughout the stages of network development. These results demonstrate that NDPs can enable the
growth of well-performing policy networks during a phase of neural development. Figures from
Najarro, Sudhakaran, and Risi (2023). Videos at https://neuroevolutionbook.com/demos.
However, as we will explore more in section 12.3, for many tasks lifetime adaptation
is critical. The lifelong NDP version (LNDP) introduced a mechanism that enables
plasticity and structural adaptation throughout an agent's lifetime (Plantec, Pedersen,
Montero, et al., 2024). This is achieved through local computations based on the activity
of individual neurons in the ANN and the global reward signals from the environment.
This method performed similarly to the original NDP in tasks not requiring lifetime
adaptation, such as CartPole. However, when applied to a foraging task that requires the
agent to learn and remember the position of a randomly placed food source, the LNDP
performed significantly better.
More broadly, the NDP highlights the differences between approaches based
on bottom-up self-organization and established top-down engineering. While these
approaches cannot yet compete with current state-of-the-art methods, they offer
an exciting alternative path toward more robust and adaptive forms of neural networks.
7.3.5 Combining Evolutionary Creativity with GD Precision
Neuroevolution works especially well when it is less constrained, taking advantage of the
power of evolution's creative discovery. For example, neuroevolution is well suited to
evolving neural networks that grow soft robots able to locomote, or video game levels with
interesting properties. However, these algorithms can struggle when tasked to re-evolve a
target pattern that requires traversing many different stepping stones (section 5.3). The
same is true for evolving morphogenetic systems that are tasked to grow a more complex
target pattern.
If a target is given, such as a particular 2D or 3D structure, it makes sense to take
advantage of efficient gradient descent to optimize for growing that target directly. For
example, NCAs can be trained efficiently through backpropagation to grow certain 2D
images (Mordvintsev, Randazzo, Niklasson, et al., 2020) or even functional 3D Minecraft
structures that can regrow damaged components (Sudhakaran, Grbic, S. Li, et al., 2021).
Some of these examples are shown in figure 7.16.
Figure 7.15: NDP performance across tasks. The original vanilla NDP is compared to a version
that includes intrinsic states and to a version based on hypernetworks, which does not include
development. Intrinsic states allow the NDP to perform significantly better in more complex
domains. While the approach does not outperform the hypernetwork approach, it reaches
competitive performance through a completely decentralized approach based on neural growth.
Note that in all four experiments, NDP-vanilla converged to a degenerate policy early in training
and was therefore run for fewer generations.
Returning to the task of evolving NCAs to create resilient soft robots offers an
interesting opportunity for combining the benefits of evolution for creative discovery and
gradient descent for efficient optimization (Horibe, Walker, Berg Palm, et al., 2022). One
idea is to use the undamaged morphology, discovered through evolution, as a training
target for regeneration. Once a robot morphology is evolved for effective locomotion,
that intact structure becomes the goal for the NCA to regrow after damage. This is a
challenge gradient descent is perfectly suited for, and by training the NCA toward this
target, the system learns to reconstruct complex, functional morphologies from partial
or damaged states. This approach allows the strengths of evolution (creative discovery)
and supervised learning (precise reconstruction) to be combined in a single framework.
Figure 7.17 shows an overview of this hybrid approach: (1) A diversity of morphologies is
discovered through evolutionary optimization. (2) A neural cellular automaton is trained through
gradient descent to regrow a target morphology found by evolution under different types of damage.
(3) The resulting NCA is able to grow a soft robot and recover it
from extreme forms of damage.
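The following is a simplified 2D sketch of this idea, assuming a toy convolutional NCA, a made-up square target standing in for an evolved morphology, and occasional damaged starts; it shows the supervised regrowth loop rather than the soft-robot setup used in the paper.

```python
import torch
import torch.nn as nn

# Toy 2D analogue: train an NCA so that, starting from a seed (or from a damaged
# version of the target), it converges to a fixed target morphology.
SIZE, CH, STEPS = 16, 8, 24
target = torch.zeros(1, 1, SIZE, SIZE)
target[:, :, 4:12, 4:12] = 1.0               # placeholder "evolved morphology"

class NCA(nn.Module):
    def __init__(self):
        super().__init__()
        self.perceive = nn.Conv2d(CH, 32, 3, padding=1)   # local neighborhood only
        self.update = nn.Conv2d(32, CH, 1)
    def forward(self, x, steps):
        for _ in range(steps):
            x = x + 0.1 * self.update(torch.relu(self.perceive(x)))
        return x

nca = NCA()
opt = torch.optim.Adam(nca.parameters(), lr=1e-3)
for it in range(200):
    x = torch.zeros(1, CH, SIZE, SIZE)
    x[:, :, SIZE // 2, SIZE // 2] = 1.0        # seed cell
    if it % 2 == 1:                            # sometimes start from a damaged target
        x = torch.zeros(1, CH, SIZE, SIZE)
        x[:, 0:1] = target
        x[:, :, :, : SIZE // 2] = 0.0          # cut away half of the pattern
    out = nca(x, STEPS)
    loss = ((out[:, 0:1] - target) ** 2).mean()   # first channel must match the target
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```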
The results show that using gradient descent to train for recovery significantly
outperformed using neuroevolution alone for the same task. When neuroevolution was
used to train a second NCA for regeneration (section 7.3.2), the robots could partially
recover their original morphology and locomotion, but the results were limited. For
example, morphological similarity to the original robot topped out around 91-99%,
and locomotion recovery was inconsistent: some robots regained only 20-45% of
their movement, depending on the complexity of the damage and the morphology. In
contrast, when gradient descent was used to train the same NCA to handle both growth
and regeneration, the robots not only regrew more accurate morphologies (achieving
97.9-100% similarity across multiple damage types), but they also recovered a greater
percentage of their locomotion ability, often over 80% and in some cases 100%.
Figure 7.16: Learning to grow different 3D target structures. An NCA is trained through
gradient descent to grow a given target pattern. The approach is able to grow both static structures,
such as a tree or an apartment building, and functional machines, such as a locomoting
caterpillar. The caterpillar can even regenerate into two creatures when cut in half. Figures from
Sudhakaran, Grbic, S. Li, et al. (2021). Videos at https://neuroevolutionbook.com/demos.
In summary, combining evolutionary algorithms with gradient descent-based techniques
offers a promising approach for developing systems that are both innovative and
resilient. Evolutionary processes excel at exploring a vast search space of potential
solutions, producing a diversity of designs and behaviors that are often not achievable
through gradient-based methods alone. This creative potential is particularly advantageous
in open-ended domains like soft robotics, where unconventional solutions can emerge.
On the other hand, once a target design or structure is identified, gradient descent-based
training shines in its ability to fine-tune and optimize the system efficiently, enabling
robust growth and regeneration capabilities.
This chapter explored how cooperative and competitive coevolution can drive the
emergence of complex behaviors in agents and systems. Through cooperative coevolution,
individual components evolve together to form robust and specialized solutions. In contrast,
competitive coevolution fosters open-ended discovery via evolutionary arms races, where
agents continually adapt against evolving opponents. While collective systems can evolve
autonomously, some problems benefit from human intuition and creative input, especially
when goals are hard to formalize. In the next chapter, we turn to how we can bring humans
into the loop, allowing them to guide evolution based on more subjective criteria.
Figure 7.17: Combining evolutionary discovery and gradient descent precision. (1) Evo-
lutionary optimization is used to discover a wide range of diverse morphologies. (2) A neural
cellular automaton (NCA) is then trained to regenerate these target morphologies, even after
different types of damage. (3) The trained NCA can successfully grow a soft robot and recover
it from severe damage. Figure from Horibe, Walker, Berg Palm, et al. (2022). Videos at
https://neuroevolutionbook.com/demos.
7.4 Chapter Review Questions
1. Conceptual Understanding: What are the fundamental differences between cooperative and competitive coevolution, and how do they contribute to neuroevolution?
2. Cooperative Coevolution: Describe the concept of shared fitness in cooperative coevolution. How does it ensure effective collaboration among components?
3. Evolving Single Neural Networks: How does the ESP system (Enforced Subpopulations) improve upon the SANE system in evolving neural networks?
4. Specialization in Subpopulations: Why is redundancy within subpopulations important in the context of ESP, and how does it lead to robust networks?
5. Evolving Teams: In the predator-prey scenario, how do stigmergy-based coordination strategies lead to effective team behaviors without direct communication?
6. Competitive Coevolution: How does competitive coevolution establish an open-ended fitness function, and what challenges does it face in ensuring progress?
7. Evolutionary Arms Race: Using the zebras and hyenas example, explain how alternating advantages between predator and prey populations drive increasingly complex behaviors.
8. Cellular Automata: What role do local interactions play in the emergence of complex patterns in CAs, and how are these principles applied to neural CAs?
9. Applications of Neural CAs: How can NCAs be used to solve tasks like the French flag problem or pattern replication? What are their advantages over traditional approaches?
10. Evolving Resilient Systems: Explain the hybrid approach combining neuroevolution and gradient descent for growing and regenerating resilient soft robots. How does each method contribute to the overall system's functionality?
Chapter 8
Interactive Neuroevolution
The previous two chapters discussed how the behavior of agents that operate embedded in
an environment can be discovered through neuroevolution. Starting from reactive control
and expanding all the way to sequential decision-making strategies, effective solutions
can be discovered that may be surprising to human designers. Moreover, discovery can
be embedded in a collective environment, where opponents and cooperators are evolving
as well, thereby providing new and creative challenges. In some cases, however, it may
be useful for human designers to drive this discovery process more explicitly. They may
have knowledge that is difficult to capture in a formal objective function. For instance,
the desired behavior may be complex and multifaceted, or depend on believability or
aesthetic values. In such cases, neuroevolution can be made interactive. The construction
of new individuals is still done through evolutionary operators, but the selection is at least
partially due to human judgment. This chapter reviews how interactive neuroevolution can
be set up effectively, and demonstrates it in several examples in various game domains.
8.1 The NERO Machine Learning Game
Setting up neuroevolution experiments sometimes feels like a game. You have a goal in
mind, i.e. an idea of what you want the evolved agents to do. You have to think about
how to express that behavior in terms of an objective function, which in turn depends
on behavioral descriptors that can be readily measured. You may need to come up with
a shaping strategy, starting with simpler behaviors and gradually making the objective
function more demanding. You may need to try out many different such setups before
finding some that achieve effective behavior. There may be several such solutions, and
some of them may even surprise you. Finding such solutions, and perhaps better than
those seen before, is what makes this game appealing.
NERO (Stanley, Bryant, and Miikkulainen, 2005) is an actual game built on this
very idea. It can be seen as a pioneering effort to establish a new game genre, machine
learning games. Unlike in other genres, such as first-person shooter games or sims, the
human player is not controlling game agents directly. Instead, the player takes the role of
a teacher/coach/drill sergeant, designing a curriculum of learning challenges for actual
agents in the game. Those agents solve the challenges using machine learning. After
learning, the agents engage in a head-to-head competition with other similarly trained
agents in order to determine how good the training was.
More specifically, in the NERO game, agents are battle robots controlled by neural
networks evolved with NEAT (figure 8.1c,d). The entire population of them is placed
in the environment at once. The environment is usually an enclosed area with walls,
buildings, trees, and other objects, allowing the agents to move around, hide, and take
cover. Simple algorithmically controlled enemy agents can be placed in it, including static
enemies (and flags) that act as targets, static enemies that fire at the agents, and mobile
enemies that fire and approach the agents. As their input, the agents observe the number
of and distance to enemy agents as well as teammates in sectors around them, the distance
to walls and other static objects in several directions, whether their weapon is on target,
and the direction from which the fire from the nearest enemy is coming. As their output,
they can move forward and back, turn left and right, and fire their weapon.
In such an environment, NEAT can evolve networks that exhibit interesting behaviors.
The agents can charge the enemy, approach from different directions, disperse in order to
be less likely to be hit, converge to increase firepower, take temporary cover behind walls, hide
in order to survive until the end of the game, and many others. The interesting question is:
what kind of behaviors are useful in a battle against an actual enemy? Further, how can we
encourage evolution to discover such behaviors, while still encouraging open innovation
as well? This is precisely the question interactive neuroevolution aims to address.
In NERO, the human player has a number of tools at their disposal (figure 8.1a,b).
They can place various objects in the field, such as walls, static and mobile enemies,
and flags. They can control a number of sliders that correspond to coefficients in the
objective function, such as approach/avoid the enemy, hit a target, avoid getting hit, follow
teammates, disperse, etc. Both objects and sliders can be changed dynamically as the
training progresses, making it possible to design a curriculum. For instance, it
may be useful to reward the agents for approaching the enemy first, then do so while
avoiding fire, then while avoiding fire from moving enemies, then while utilizing walls as
cover, etc. (figure 8.2). Such curricular evolution, or shaping, can result in more complex
and effective behaviors than could be achieved with a single static objective function
without human guidance.
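In effect, the sliders define the coefficients of a weighted-sum fitness over measured behavior statistics; a minimal sketch with hypothetical statistic names could look like this:

```python
# Hypothetical behavior statistics collected for one agent during its evaluation.
stats = {"approach_enemy": 0.7, "hit_target": 0.3, "got_hit": 0.4,
         "follow_teammates": 0.1, "disperse": 0.6}

# Slider positions set by the player, in [-1, 1]; negative values penalize a behavior.
sliders = {"approach_enemy": 1.0, "hit_target": 0.5, "got_hit": -1.0,
           "follow_teammates": 0.0, "disperse": 0.2}

def fitness(stats, sliders):
    return sum(sliders[k] * stats[k] for k in stats)

print(fitness(stats, sliders))
# Because the sliders can be changed while evolution is running, the same agents
# are effectively re-scored under a new objective, enabling shaping and curricula.
```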
One interesting extension needs to be made to the NEAT method, however. Note that
the entire population is evaluated in the environment at the same time. This approach
makes the evolution efficient, since the evaluations are done in parallel. The population
is also always visible to the human player, making it easier to understand how well the
evolution is progressing. However, if the entire population is replaced at the same time, as
is usual in generational evolution, the game appears discontinuous and difficult to follow.
Instead, evolution needs to progress continuously one agent at a time.
In this real-time extension of NEAT, called rtNEAT, the worst agent among all the agents
that have been evaluated sufficiently long is removed from the population. The
species are recalculated, and an offspring is generated as usual in NEAT. This offspring is
then placed in the environment to be evaluated. This replacement takes place at regular
intervals, and because it involves only one individual at a time, it is largely invisible to the
human player.
(a) Possible objects (b) Sliders defining fitness
(c) A network controlling one agent (d) A population being evaluated
Figure 8.1: Setting up a NERO experiment. The NERO game allows specifying increasingly
challenging environments so that complex behavior can be evolved. (a) The human player can
place various objects in the environment to create challenges, including walls, flags, static enemies,
and moving enemies. (b) The human player controls the fitness by adjusting sliders with continuous
positive or negative values along various dimensions, such as approach an enemy, approach a flag,
hit a target, avoid getting hit, and stay together with teammates. (c) Each agent in the game is
controlled by a neural network evolved through NEAT. As its input, it senses the environment
around it, including enemies, teammates, walls, and other objects; it also senses whether its
weapon is on target, and the direction from which the nearest fire is coming. As its output, it issues
actions to move forward and back, turn left and right, and fire. (d) During evolution, the entire
population of agents is evaluated together in an enclosed environment that may contain multiple
objects. In this case, the agents spawn on the right and are rewarded for approaching the flag on
the left. At regular intervals, the worst agent is replaced by offspring in a continuous replacement
process. In this manner, the human player can create a curriculum of increasingly challenging
tasks that prepares the team well for battle against other teams. For animations of various training
scenarios, see https://neuroevolutionbook.com/demos. Figures from Stanley, Bryant, and
Miikkulainen (2005).
In this manner, evolution progresses continuously while the population is being evaluated.
Although it was designed for the visual effect in NERO, the same approach can be useful
in other domains where continuous adaptation is needed.
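Schematically, one rtNEAT replacement event might look like the following sketch (a simplified outline with a placeholder offspring routine and without explicit speciation, not the actual rtNEAT implementation):

```python
import random
from dataclasses import dataclass

@dataclass
class Agent:
    genome: list
    fitness: float = 0.0
    ticks_alive: int = 0     # how long this agent has been evaluated in the world

def make_offspring(population):
    # Placeholder for NEAT selection, crossover, and mutation on the genomes.
    parent = max(random.sample(population, 2), key=lambda a: a.fitness)
    return Agent(genome=list(parent.genome))

def rtneat_tick(population, min_eval_time=100):
    """One rtNEAT replacement event: remove the worst sufficiently evaluated agent,
    re-speciate, and insert a single new offspring into the running world."""
    eligible = [a for a in population if a.ticks_alive >= min_eval_time]
    if not eligible:
        return
    worst = min(eligible, key=lambda a: a.fitness)
    population.remove(worst)
    # (species would be recalculated here, as in standard NEAT)
    population.append(make_offspring(population))

pop = [Agent(genome=[0], fitness=random.random(), ticks_alive=200) for _ in range(20)]
rtneat_tick(pop)
print(len(pop), "agents still in the world")
```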
After the curricular evolution is complete, the teams are evaluated in a battle mode
of NERO. Two teams are placed in the same environment, which may be the same one
used in training, or something completely different. At this stage (in NERO 1.0), the
agents operate independently of the human player, applying what they were trained to
do in competition with another team. If an agent is hit a sufficient number of times,
it is removed from the environment. The game ends when one team is annihilated or
the clock runs out, in which case the team with the most agents still on the field wins.
Note that the battle domain is obviously a violent game, similar to many video games
in the first-person shooter genre. The principles are more general, however, and apply
to less violent settings as well. In fact, neuroevolution can play many different roles in
video games (Risi and Togelius, 2015).
Figure 8.2: Training NERO teams through interactive neuroevolution. The player first specifies
a simple task such as approaching a static enemy that fires (a "turret"), so the agents learn to
approach it from different sides. In the next scenario, they learn to approach one turret while
minding fire from another. Next, the turrets move and turn, and the agents need to take cover
behind walls. Through multiple such increasingly challenging scenarios, the agents learn effective
battle behaviors. The team is then placed into a battle against another team, evaluating how well
the human player was able to train them. NERO thus aims at creating intelligent behavior strategies
through interactive neuroevolution. Figure from Stanley, Bryant, and Miikkulainen (2005).
For example, in section 8.4, we examine how
it contributes to the procedural generation of content in the gardening game Petalz. A
robotic battle domain, however, provides clear and compelling measures and visualizations
of performance, which were useful for a pioneering example of machine learning games.
Often, interesting interactions result that were not anticipated, suggesting ideas for further
interactive neuroevolution of the team.
One of the first behaviors is often to approach a firing enemy. The agents quickly
evolve to avoid fire by going around and approaching from the side. This behavior is
general and adapts easily to enemies that are turning. If subsequently the "approach"
slider is abruptly changed to "avoid" (i.e. negative rewards for approaching), an interesting
demonstration of evolutionary search can be seen. As always, there are individuals in the
population that do not perform very well. Even if most agents approach the enemy, some
of them may stand still, roam around, or run away. When the slider changes, they become
the seed for the behavioral change. They receive higher fitness, and their offspring take
over the population, resulting in avoidance within a few reproductions.
In some cases, careful curriculum design can be used to construct effective desired
behaviors. For instance, it is possible to evolve the agents to run through a maze to a target
on the other side. First, the environment may consist of a single wall, with gradually more
walls in complex configurations added as the agents evolve to run around them (figure 8.3a).
The resulting behavior can be quite general and effective, despite involving no actual
path planning. It is enough for the agents to know the general direction; they can then
navigate around even complex mazes, as long as they do not contain deceptive traps.
Combined with the objective of dispersing, the agents also take different paths through the
maze, which is effective because it is difficult to defend against an enemy that approaches
from many directions at once.
On the other hand, evolution can still discover surprising and effective behaviors as
well. One such result was that the agents sometimes evolved to run backward (figure 8.3b).
This seems odd at first, but it does serve a purpose in some cases. If the enemy tends to
pursue the agents persistently, running backward is useful because the weapon remains
pointed at the enemy.
(a) Running a maze (b) Running backward while shooting
(c) Forming a firing squad (d) Subteams of three agents
Figure 8.3: Discovery of expected and unexpected behaviors in NERO. What makes the
game interesting is that the player has some control over what will happen, but evolution will
also find surprising solutions. (a) By gradually adding more walls and rewarding the agents for
staying away from each other, they evolve to take various paths through the maze, without any
explicit path planning. (b) An effective strategy for hitting the target while not getting hit is to
run backward while shooting. (c) An avoidant team can be effective when they have time to
back up against a wall, forming a firing squad. (d) A subteam of three agents is agile and has
significant firepower. These discoveries and many more like them were surprising, resulting from
evolution solving the challenges posed by the human player. In this manner, humans can provide
guidance while still letting evolution find creative solutions. For animations of these and other
battle behaviors, see https://neuroevolutionbook.com/demos. Figures a-c from Stanley,
Bryant, and Miikkulainen (2005).
Another discovery was that extremely avoidant behavior can be
effective in battle (figure 8.3c). That is, most of the time aggressive teams are evolved
that approach the enemy and pursue it if it retreats. An avoidant team, however, would
retreat until the agents have their backs against the wall. It turns out that if they are fast
enough to do this, so that enough of them remain, they form a firing squad that is very
difficult to approach, and aggressive pursuers are often eliminated. Yet another surprising
discovery was that some teams evolved to form subteams of three agents (figure 8.3d):
they approach the enemy together, they fire at the same enemy, and they retreat together.
Such a subteam is effective because it has significant firepower yet is very agile. Evolution
discovered it independently; however, this principle turned out to be well established in
actual military training.
One interesting question in NERO is: Is there an actual best strategy in the game,
or does it support several different strategies that each dominate some, but not all, other
strategies? This is a crucial question for machine learning games in general, as well as
interactive neuroevolution. While it is difficult to answer this question conclusively, it is
possible to conduct a large-scale experiment with many players and evaluate the resulting
strategies.
The first massive open online course (MOOC) on Artificial Intelligence in 2011, run by
Peter Norvig and Sebastian Thrun, provided such an opportunity (Karpov, L. M. Johnson,
and Miikkulainen, 2015). As an optional assignment in the course, the students designed
NERO teams, and a comprehensive round robin tournament was run with them. Out of the
156 submissions, some performed much better than others, and the teams could be ranked
according to total wins: the best one won 137 times, the next 130, then two teams at
126, then 125, 124, 123, etc.
When the behavior was characterized in terms of actions taken in various situations,
ten major behavioral strategies were identified. However, none of them were clearly more
successful than others; what mattered the most was how well they were implemented.
What is most interesting, however, is that there was clear circularity among the best teams:
Team A beat Team B, which beat Team C, which beat Team A. This result suggests that
it is unlikely that one best strategy exists; rather, different behaviors are required to do well
against different opponents. Both of these properties make the game more interesting to
human players, and suggest that machine learning games are indeed a viable genre. They
also suggest that human intuition in interactive evolution can be useful and can provide
an outlet for human creativity, as is also demonstrated in the following sections of this
chapter. Furthermore, combining human and machine insight is a powerful approach for
designing complex systems.
The software for the original NERO, as well as its open source version, is available
from the book website. The original NERO includes version 2.0 of the game, which
features human guidance also during the battles, as well as the ability to construct teams
by combining individuals from different evolutionary runs. The goal was to make the
teams more versatile and the gameplay more interactive; the interactive evolution aspect
remained the same. OpenNERO was also designed to support other AI and machine
learning methods, making it possible to compare and demonstrate different approaches to
intelligent agents. They can serve as a starting point for exercises and projects in this book.
8.2 Incorporating Human Knowledge into NERO
NERO is one of the first examples of a genre of machine learning games, i.e. the gameplay
consists of players interacting with a machine learning system. Its focus was on one
particular kind of interaction, i.e. on shaping neuroevolution through human insight.
However, it is possible to incorporate human knowledge into neuroevolution in other ways
as well, including explicitly through rule-based advice and implicitly through behavioral
examples.
Note that these approaches are useful in creating intelligent agents in general; for
instance, advice can be used in prey capture to help the agent evolve a corralling strategy,
pushing the prey into the corner rather than chasing it in circles (Fan, Lau, and Miikkulainen,
2003). Similarly, examples can be used to train agents in a strategy game to establish
behavioral doctrines that also observe safety constraints, resulting in visibly intelligent
behavior that does not easily emerge on its own in neuroevolution (Bryant and Miikkulainen,
2007). However, advice and examples can be most clearly demonstrated and evaluated in
NERO because it is an interactive evolution environment to begin with.
In NERO, successful behaviors are discovered through exploration. This means that
even the most obvious ones, like moving around a wall without getting stuck, take many
iterations of trial and error. This process is often frustrating to watch because the effective
behavior is obvious to the observer, who might as well tell the agents what they should
do. Evolution can then use that advice as a starting point, modify it further, and move on
to more interesting discoveries faster.
A mechanism for incorporating such advice into evolving neural networks can be
built based on knowledge-based artificial neural networks (KBANN; Towell and Shavlik,
1994). The knowledge is first specified in a set of rules, such as "if a wall is some
distance in front, then move forward and turn right" and "if a wall is near 45 degrees
to the left, then move forward and turn slightly right." The rules are then converted
into partial neural network structures: the conditions are coded as input nodes and the
consequences as output nodes, with hidden nodes mapping between them (figure 8.4a,b;
Yong, Stanley, Miikkulainen, et al., 2006). These structures are spliced into each existing
neural network in the population, thus adding the wall-circling behavior to their existing
behaviors. Weight values are usually constant, with a positive or negative sign, but can
also be graded to indicate, e.g., the degree of turn. Note that such additions are natural
in NEAT, which already has mechanisms for growing the networks through add-node,
add-connection, and change-weight mutations. Evolution then continues to modify these
networks, incorporating the advice into the general behavior, modifying the advice to
make it more useful, or even rejecting it entirely and changing it into something else.
Confidence values can be used to specify how likely such modifications are, i.e. how
immutable or plastic the advice is. Given that the evolutionary changes modify rules that
were originally interpretable, the modifications may be interpretable as well, i.e. it may be
possible to explain what new knowledge evolution discovers in this process.
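A schematic sketch of this splicing idea is shown below, using a hypothetical genome format of (source, target, weight) connection genes rather than the actual KBANN/NEAT data structures:

```python
# Hypothetical genome format: a list of (source, target, weight) connection genes.
# Inputs and outputs are named sensors and actuators of the NERO agent.

def rule_to_structure(conditions, consequences, hidden_id, weight=2.0):
    """Encode an if-then rule as a partial network: condition inputs feed a new
    hidden node, which drives the consequence outputs with strong fixed weights."""
    genes = [(cond, hidden_id, weight) for cond in conditions]
    genes += [(hidden_id, action, weight) for action in consequences]
    return genes

def splice_advice(genome, advice_genes):
    # Like an ordinary NEAT structural mutation, the new genes are simply added;
    # later evolution is free to reweight, extend, or co-opt them.
    return genome + advice_genes

advice = rule_to_structure(conditions=["wall_ahead"],
                           consequences=["move_forward", "turn_right"],
                           hidden_id="advice_h1")
genome = [("enemy_left", "turn_left", 0.7)]
print(splice_advice(genome, advice))
```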
Experiments demonstrate that such advice indeed helps learn a task, e.g. going
around the wall, faster (figure 8.4c,d). Remarkably, if the task changes so that it is now
better to go around the left side instead of the right, adaptation is very fast: evolution
quickly changes the output actions to the left while the rest of the advice network structure
stays the same. If the task changes again to make the right side better, there is little
difference between networks that evolved with advice and those without. In both cases, the advice
has become incorporated into the general network structure. In this manner, advice helps
evolution discover the needed behaviors but does not constrain evolution in the longer
term.
In some cases, it may be difficult or inconvenient to write down advice as rules, but it
may be easy to demonstrate the desired behavior by driving an agent in the game. For
instance, the knowledge about going around a wall can be presented in this way. The
agent is placed in a starting location, the player takes possession of it, and gives movement commands that take it to the target flag.
(a) The advice network structure (b) Advice spliced into a NERO network
(c) The three phases of the experiment (d) Performance over generations
Figure 8.4: Utilizing rule-based advice in NERO. It is sometimes useful to be able to guide the
evolutionary discovery with human knowledge. Such knowledge can be expressed as rules and
incorporated into the population of networks. (a) As an example, two rules about going around
the wall on the right side are encoded as a partial network structure. (b) This structure is then
spliced into NEAT networks like any mutation. The networks continue to evolve to take advantage
of, modify, or co-opt the advice to perform better. (c) A snapshot of NERO with the three sequential
positions identified. The agents were first rewarded for going to the flag in the middle, then to the
one at left, then the one at right. (d) The advice suggested going to the first flag around the right
side, and it sped up evolution compared to having no advice. When the flag was moved to the
left, networks with advice adapted very quickly, utilizing the same advice structure with different
output actions. After the flag was moved again, there was no difference in adaptation with or
without advice, suggesting that the advice had become incorporated into the network like any other
structure in it. Figures from Yong, Stanley, Miikkulainen, et al. (2006).
At each step, the inputs and outputs to the agent
are recorded and used as a training set with backpropagation through time; alternatively,
the path of the agent can be divided into segments, and the actions that keep the agent on
the example path used as targets. The agent is first trained to reproduce the first segment,
then the first two, and so on until it successfully replicates the entire example. The
weight changes are encoded back to the genetic encoding of the network (implementing
Lamarckian evolution), and are thus inherited by its offspring.
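A condensed sketch of this example-based, Lamarckian idea is shown below, using a toy single-layer policy and plain gradient descent in place of backpropagation through time; the recorded demonstration data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Recorded demonstration: sensor inputs and the movement commands the player gave.
demo_inputs = rng.standard_normal((50, 8))
demo_actions = np.tanh(demo_inputs @ rng.standard_normal((8, 3)))  # stand-in targets

def train_on_example(weights, inputs, targets, lr=0.05, epochs=200):
    """Fit a single-layer policy to the demonstrated behavior with gradient descent."""
    for _ in range(epochs):
        pred = np.tanh(inputs @ weights)
        grad = inputs.T @ ((pred - targets) * (1 - pred ** 2)) / len(inputs)
        weights -= lr * grad
    return weights

# Lamarckian step: the trained weights are written back into the genome, so the
# learned behavior is inherited by the agent's offspring.
genome_weights = 0.1 * rng.standard_normal((8, 3))
genome_weights = train_on_example(genome_weights, demo_inputs, demo_actions)
print(np.mean((np.tanh(demo_inputs @ genome_weights) - demo_actions) ** 2))
```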
It is interesting to evaluate how well each of these methods for incorporating human
knowledge (e.g. shaping, advice, and examples) works in interactive neuroevolution.
(a) Going around a wall (b) Catching a moving target (c) Traversing through waypoints
Figure 8.5: Tasks for evaluating methods that incorporate human knowledge in NERO.
Plain neuroevolution from scratch on one hand and full scripting of behavior on the other were
compared with advice, examples, and shaping. Plain neuroevolution turned out to be more
successful than scripting, and at least one of the human-guided methods more successful than plain
neuroevolution: examples in (a), advice in (b), and shaping in (c). Thus, the different methods
of incorporating human knowledge can play a different role in constructing intelligent agents in
interactive neuroevolution domains. Figures from Karpov, Valsalam, and Miikkulainen (2011).
To this end, a human-subject study was conducted (Karpov, Valsalam, and Miikkulainen,
2011). A total of 16 participants were given three tasks: going around the wall, catching a
moving target, and traversing a trajectory consisting of multiple waypoints (figure 8.5).
They were instructed to solve these tasks by two different methods: by writing a set of
rules, i.e. a script for the entire behavior, and one other method, which was either advice,
examples, or shaping, randomly chosen and in random order. Their performance was
recorded, and they were surveyed afterward; the performance was also compared with
plain neuroevolution from scratch without any human knowledge.
The surveys suggested that the example-based approach was rated as the best-quality
approach, followed by scripting, shaping, and advice. Shaping was found to be low quality in
the moving-target task, advice low quality in the waypoints task, and all methods were
found to be good in the wall-circling task. These ratings did not always correlate with the
rate of success, suggesting that they mostly measure how easy or fun it was to use each
method, which is useful information on its own.
The recordings were used to measure the average time to a successful solution, with a
30-minute upper bound. It turned out that scripting was the most difficult way to achieve
successful performance: even plain neuroevolution was more successful. Interestingly,
at least one human-assisted method performed better than plain neuroevolution. Advice
was most effective in catching the moving target. It was possible to specify an intercept
course rather than chasing the target indefinitely. In general, advice makes sense when
the behavior can be expressed as a general rule. In contrast, examples were best in the
going-around-the-wall task. Indeed, this approach is most appropriate when the desired
behavior is concrete and specific. Shaping, the usual staple of the NERO game, was the
most effective in the waypoint task, where it was possible to start with a single target
and then gradually add more waypoints. The approach makes sense in general in tasks
where it is possible to start with a simplified or partial version and then gradually make
the task more demanding. In this manner, each of the different ways of incorporating
human knowledge into interactive neuroevolution can play a different role in constructing intelligent agents.
Figure 8.6: A proposal for active human-guided neuroevolution. The human expert provides
advice, examples, and shaping for the neuroevolution process. The process monitors itself and
determines what kind and when such input would be most useful. In this manner, humans and
machines can work synergistically to construct intelligent agents. Figure from Karpov, L. M.
Johnson, Valsalam, et al. (2012).
When exactly should each of these methods be used? An interesting possibility for
the future is for the interactive evolution system itself to request advice, examples, and
shaping when it deems it most helpful (Karpov, L. M. Johnson, Valsalam, et al., 2012).
For instance, the system can identify parts of the state space where it has little experience,
or that are least likely to lead to success, or where the population of agents disagrees the
most, and where its previous advice or examples do not apply. It can then present the user
with an advice template specifying such a situation and ask the user to fill in the blanks.
Alternatively, it can present a starting point for the agent and ask the user to provide an
example. If evolution seems to have stagnated, it could prompt the user to shape either
the rewards or the environment to get evolution going again. It could even make specific
suggestions, such as adjusting the sliders to make the task more demanding, or rolling back
prior simplifications. Such an ability would eventually result in interactive neuroevolution
where human knowledge and machine exploration work synergistically in both directions
to solve problems (figure 8.6).
Figure 8.7: Picbreeder interface. Users in Picbreeder select at least one CPPN-generated
image, from which subsequent populations are generated through mutations and crossover of
the underlying CPPNs. Users can also move back and forth through the generations and publish
their creations, allowing others to branch off from their discoveries. Figure from Secretan, Beato,
D'Ambrosio, et al. (2011).
8.3 Neuroevolution-enabled Collaboration
While NERO enabled players to shape the evolution of their team of agents, the game
did not allow many humans to collaboratively train their teams by building on the
interesting behaviors found by others. This section showcases some examples of inter-
active neuroevolution applications and games that were developed to incorporate such
collaboration.
In particular, we'll take a closer look at Picbreeder (Secretan, Beato, D'Ambrosio,
et al., 2011), a highly influential generative AI system that came out of the lab of Kenneth
Stanley. Picbreeder is a great example of a system that allows users to perform collaborative
interactive neuroevolution, enabling them to explore a large design space together. Similar
to Dawkins' biomorphs from his book "The Blind Watchmaker", the basic idea in
Picbreeder is to breed images. Users are presented with several images and asked to select
the ones they like the most (figure 8.7). The selected images are then used as parents to
produce a new generation of images through crossover and mutation of the underlying
representations. The new generation of images becomes the next population, and the
process iterates. With each generation, users continue to select the images they prefer, and
the algorithm evolves the images based on their choices.
Images in Picbreeder are represented by CPPNs (section 4.3.1) and modified by the
NEAT algorithm (section 3.3). While the CPPN representation allows users to easily evolve
images with interesting regularities, employing NEAT for the mutation and crossover
of CPPNs has an added benefit: the evolved images gradually become more complex over
generations because the underlying CPPNs become more complex. To allow users
to navigate the space of images in a meaningful way, the NEAT mutation parameters for
Picbreeder have to be chosen so that the next generation of images resembles its
parents but also shows interesting variations.
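The following sketch illustrates the CPPN idea with a tiny hand-built network queried at every pixel coordinate; in Picbreeder the network topology and weights would instead be an evolved NEAT genome.

```python
import numpy as np

def cppn(x, y):
    """A tiny hand-built CPPN: coordinates (and distance from the center) pass through
    a few nodes with different activation functions to produce a brightness value."""
    d = np.sqrt(x ** 2 + y ** 2)
    h1 = np.sin(4.0 * x)            # periodic node -> repetition
    h2 = np.tanh(3.0 * y + h1)      # interaction between the two axes
    h3 = np.exp(-(d * 2.0) ** 2)    # Gaussian node -> symmetry around the center
    return np.tanh(h1 + h2 + 2.0 * h3)

# Query the CPPN at every pixel coordinate to render the image.
coords = np.linspace(-1, 1, 128)
X, Y = np.meshgrid(coords, coords)
image = cppn(X, Y)                  # values in [-1, 1], shape (128, 128)
print(image.shape, image.min(), image.max())

# In Picbreeder, NEAT would mutate and complexify the network playing the role of
# cppn() here, and users would pick which of the resulting images to breed next.
```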
With such an interactive evolution interface, one user by herself can already explore
parts of the design space of images, but there are only so many generations a single person
can evolve images for. Single-user interactive evolution applications often suffer from
what is called user fatigue: The user might not see anything very interesting within 10
to 20 generations and thus lose interest in exploring further (Takagi, 2001). Picbreeder
addresses these issues in a clever way, by allowing users to evolve collaboratively, thereby
taking advantage of the fact that different users naturally want to evolve different artifacts.
For example, some users might start with the idea of evolving a particular image, such as
an insect, while others keep selecting the images that appear most compelling to them
without a preset target in mind. In Picbreeder, a user can see what others have evolved
and decide to continue evolution from any of their published images, a mechanism called
branching. Through this process, users have been able to explore large parts of the design
space. Figure 8.8 shows some selected images that many users were able to evolve together.
Initially, starting out from abstract shapes similar to the ones shown in figure 8.7, users
were able to collaboratively evolve a great variety of different images, resembling subjects
such as faces, animals, landscapes, and many others.
Picbreeder has spawned a large number of projects that extend its original idea,
such as EndlessForms (Clune and Lipson, 2011), which allows users to breed 3D artifacts
instead of 2D images using a three-dimensional CPPN representation. Other examples
include platforms like Artbreeder (J. Simon, 2018), which combines a Picbreeder-inspired
interface with generative AI models such as GANs to allow users to directly start the
evolutionary search in an interesting part of the design space. We take a closer look at some
of these hybrid systems in chapter 13 on generative AI. Interactive neuroevolution also
does not need to be limited to generated visual artifacts, as demonstrated by systems such as
NEAT Drummer (Hoover, Rosario, and Stanley, 2008) or MaestroGenesis (Hoover, Szerlip,
and Stanley, 2014), which allow users to interactively breed musical accompaniment to
existing songs.
However, a common challenge with many of these systems is that, even though the
process of interactive evolution by itself can be entertaining for a while, users often do not
spend that much time on the site. Wrapping the whole collaborative evolution loop inside
a game can address this issue, as we will see next.
Figure 8.8: Examples of Picbreeder images. Shown is a variety of designs that were evolved
by many collaborating users. For each design, the number of nodes n and connections c of the
underlying CPPN are shown, together with the total number of cumulative generations g.
Because Picbreeder allows users to build on each other's work, it facilitates the discovery of a wide
range of complex and compelling images. Figure from Secretan, Beato, D'Ambrosio, et al. (2011).
8.4 Case Study: Collaborative Interactive Neuroevolution Through Play
Just as interactive neuroevolution paved the way for innovative games like NERO, the
concept of collaborative neuroevolution also facilitated the emergence of other types of
video games, such as Petalz (Risi, Lehman, D’Ambrosio, et al., 2016) and Galactic Arms
Race (Hastings, R. K. Guha, and Stanley, 2009). In both of these games, collaborative
interactive neuroevolution serves as a method for what is called procedural content
generation (PCG). In PCG, the goal is to generate game content, such as levels, characters,
items, and more, algorithmically rather than manually designing them. In Petalz, which
was a casual Facebook game, the main idea was to allow players to collaboratively
breed different types of procedurally generated ŕowers. More specifically, players in
Petalz possess a balcony they can decorate with various available ŕower pots (figure 8.9).
Additionally, players can visit the balconies of friends and water or like their ŕowers.
Players can evolve their ŕowers by clicking on existing ŕowers, which opens a menu
Figure 8.9: The Petalz video game. Players in Petalz can decorate their balconies with various
pots and balcony designs. They can breed new flowers by clicking on the existing flowers and
trade flower seeds with other users. By allowing players to branch off the flowers discovered by
others, Petalz allows a new type of digital social interaction that links players through collaborative
interactive neuroevolution. Figure from Risi, Lehman, D'Ambrosio, et al. (2016). Videos at
https://neuroevolutionbook.com/demos.
Flowers are generated by a CPPN representation that is modified to generate flower
images and shapes (instead of arbitrary images); the underlying CPPNs are also allowed
to become more complex via the NEAT algorithm.
Players can also list their flower seeds in a digital marketplace at a price of their
choosing or gift them to others. These mechanisms allow other players to continue
breeding new flowers and build entirely new lineages. A compelling question is whether
flower seeds, being truly novel digital artifacts, can hold economic value, and whether
skilled breeders are rewarded for their efforts. Analysis of the flower market indicates that
this is indeed the case: flowers that are more affordable or aesthetically appealing tend to
sell better.
The global marketplace also facilitates collective discovery and breeding of a diverse
range of flowers, as illustrated in the flower phylogeny shown in figure 8.10. Beyond strategy-
focused games like NERO, the results from the Petalz game suggest that collaborative
neuroevolution can also enable engaging machine learning games for casual players. While
it was live, Petalz attracted over 1,900 registered online users and saw the creation of
38,646 unique evolved flowers, showcasing the potential of this approach.
Players especially appreciated the novel form of digital social interaction, connecting
through the exchange of flower seeds and collaborative breeding, that added a new layer
of engagement to the experience.
Figure 8.10: A Petalz flower phylogeny. Shown is a family tree that tracks the collaborative
efforts of 13 distinct users. Each pair of parent and offspring is divided by one generation. For
cases where a flower emerges from cross-pollination, the connecting line to the second parent is
highlighted in red. The inset offers a closer look at the evolutionary dynamics, featuring minor
phenotypic changes (a), an instance of cross-pollination (b), and substantial yet shared phenotypic
transformations (c). This flower phylogeny highlights the rich diversity and lineage of designs
that emerge when users are able to collaboratively evolve content through play. Figure from Risi,
Lehman, D'Ambrosio, et al. (2016).
In Galactic Arms Race (GAR), another multiplayer game built on CPPNs and NEAT,
players pilot a spaceship and fight enemies to acquire unique, procedurally generated
particle weapons. GAR is another machine learning game, but one in which the integration
of user preferences is less direct than in a game such as Petalz, where the users
directly choose which flowers to reproduce. To smoothly integrate user preferences into a
real-time game such as GAR, the neuroevolutionary algorithm takes into account
implicit information within the game's usage statistics. In particular, GAR
keeps track of how often players fired the different weapons they have in their three
available weapon slots. New weapons spawned into the game world are chosen
to be mutations of the weapons that players preferred in the past. This way, players can
collaboratively discover a wide variety of particle weapons.
collaboratively discover a wide variety of particle weapons. Instead of describing a static
2D or 3D image, CPPNs in GAR are an interesting example of a CPPN generating a
dynamical system. For each frame and for every particle of a particular weapon, the CPPN
receives the particles current position as input, in addition to the position it was initially
fired from. The CPPN then outputs the particles velocity in addition to its RGB-encoded
color. While all particular weapons have the same number of particles, the ability of player
projectiles to intersect enemy projectiles can lead to several tactical trade-offs explored by
evolution. Slower projectiles offer the benefit of easier blocking against incoming fire,
providing a defensive advantage. On the other hand, faster projectiles are better suited for
precise aiming at distant enemies, offering offensive prowess. Two particularly fascinating
types of evolved weapons are shown in figure 8.11. Wallmakers are capable of forming a
literal wall of particles in front of the player, and tunnelmakers generate a protective line
of particles on both sides of the player.
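A schematic sketch of such a CPPN-defined particle system is shown below, with a randomly initialized toy CPPN and made-up constants standing in for an evolved weapon:

```python
import numpy as np

rng = np.random.default_rng(2)
W1, W2 = 0.5 * rng.standard_normal((4, 8)), 0.5 * rng.standard_normal((8, 5))

def weapon_cppn(pos, origin):
    """Inputs: the particle's current position and the position it was fired from.
    Outputs: a velocity (2D) and an RGB color, queried anew every frame."""
    x = np.concatenate([pos, origin])
    h = np.tanh(x @ W1)
    out = np.tanh(h @ W2)
    velocity, rgb = out[:2], (out[2:] + 1) / 2      # map colors into [0, 1]
    return velocity, rgb

# Simulate a few frames of one particle after firing.
origin = np.array([0.0, 0.0])
pos = origin.copy()
for frame in range(5):
    vel, rgb = weapon_cppn(pos, origin)
    pos = pos + 0.1 * vel
    print(frame, pos.round(3), rgb.round(2))
```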
Figure 8.11: Evolved particle weapons in Galactic Arms Race. The interactive evolution
component of GAR allowed players to evolve a large diversity of different and aesthetically pleasing
weapons. More importantly, different evolved weapons have different tactical implications, such as
the Wallmaker (c), which favors defensive play by creating a particle wall in front of the player, or
the Tunnelmaker (e), which protects the player from attacks from the left or right side. Figure from
Hastings, R. K. Guha, and Stanley (2009). Videos at https://neuroevolutionbook.com/demos.
Together, the examples in this and the previous section show that interactive neuroevo-
lution can enable the creation of novel types of machine learning games with engaging
player dynamics. Petalz had over 1,900 registered online users and 38,646 unique evolved
flowers, which showcases the potential for PCG to enable these kinds of casual game
mechanics. In the first two months after going online in 2009, GAR had over 1,000 registered
online players who evolved 379,081 weapons. In addition to demonstrating the
entertainment value of a constant stream of evolved content, these examples also
demonstrate the versatility of CPPNs in encoding a variety of different types of content, from
flower images to particle weapons, all of which benefit from NEAT's ability to complexify
the underlying representations and thus the resulting phenotypic patterns.
Beyond their application to games, interactive evolution systems can also serve other
important functions. They enable researchers to visually explore the representational
power of different types of encodings, or the way that users individually or collaboratively
explore such a space, leading to surprising insights. For example, as mentioned already in
section 5.3, Picbreeder was initially invented to explore the CPPN encoding; playing
with the system and realizing that users in Picbreeder explore a vast search space very
differently from current optimization algorithms led Kenneth Stanley and Joel Lehman
to invent the novelty search algorithm (section 5.3). Interestingly, the different ways
a search space is explored can also lead to very different types of representations. In
CPPN representations evolved by users in Picbreeder, developmental canalization often
emerges, where certain dimensions of variation are more likely while others are prevented
(Huizinga, Stanley, and Clune, 2018). For example, in Picbreeder, some of these canalized
dimensions of variation are a “gene” for the size of objects, a “gene” determining how
much the mouth of a skull (shown in figure 8.8𝑜) is open or closed, or a “gene” that controls
the shadow of objects in an image. This type of developmental canalization is often linked
to the evolution of evolvability in natural systems, which many believe to be essential
for the tremendous diversity of functional organisms we see in nature. Representations
evolved with traditional objective-based evolution do not show this type of canalization:
mutations to single genes there tend to affect either none or many parts of the image (Kumar,
C. Lu, Kirsch, et al., 2024). Artificial evolutionary systems can thus help us determine
under what circumstances different properties evolve, and we will return to this important
topic in chapter 14.
8.5 Making Human Contributions Practical
Interactive evolution experiments require significant human effort, which makes it difficult
to take advantage of them more broadly. Some domains, like Picbreeder, are inherently
interesting and rewarding, and a large number of people can contribute to them through
publicly available websites. But other domains may be more abstract and progress in them
less obvious, resulting in users fatiguing and losing interest.
One solution is to use human computation markets (HCM), such as Amazon Mechanical
Turk, to recruit humans to this role. In a sense, monetary reward can thus be used as a
substitute for the intrinsic enjoyment of creativity and curiosity. Of course, using HCM
requires funds, but so do other types of computation; in effect, some of the
computational budget is spent on human computation instead of cloud computation.
HCMs can be used effectively in three roles (Lehman and Miikkulainen, 2013): to
bootstrap experiments until they become interesting, to evaluate different designs, and to extend
interactive evolution to long experiments.
First, even if a task such as Picbreeder is eventually engaging and rewarding, it is not
so at the very beginning. The forms are simple and stay simple for several generations. It is
difficult to get people to evaluate such images, and the evaluation itself is not very meaningful.
It turns out that if this phase is automated, or HCM is used to get through it, the final
images turn out more interesting. For instance, in the Picbreeder domain it is possible to
generate an initial set of images algorithmically, and thus make them more complex and
interesting than simple geometric forms (Lehman and Stanley, 2012). A simple fitness,
such as one based on rarity (or novelty) and complexity (or effort), can be used to guide
this initial evolution. In the next phase, it is then possible to use HCM to improve upon
those images further, up to a level where the images are actually appealing to humans and
the creativity/curiosity rewards can take over.
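As a sketch of how such a seeding fitness could be put together, the snippet below combines a rarity term (distance to previously generated images) with a complexity term (compressed size as a crude proxy for effort). The weighting and the complexity proxy are illustrative choices, not the exact formulation of Lehman and Stanley (2012); phenotypes are assumed to be flattened pixel arrays with values in [0, 1].

```python
import zlib
import numpy as np

def novelty(phenotype, archive, k=10):
    """Rarity: mean distance to the k nearest previously seen phenotypes."""
    if not archive:
        return 0.0
    dists = sorted(np.linalg.norm(phenotype - a) for a in archive)
    return float(np.mean(dists[:k]))

def complexity(phenotype):
    """Effort proxy: compressed size of the image, normalized by raw size."""
    raw = (phenotype * 255).astype(np.uint8).tobytes()
    return len(zlib.compress(raw)) / len(raw)

def seeding_fitness(phenotype, archive, w_novelty=1.0, w_complexity=1.0):
    """Guide the automated bootstrap phase toward rare and complex images."""
    return w_novelty * novelty(phenotype, archive) + \
           w_complexity * complexity(phenotype)
```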
Figure 8.12 compares three interactive evolution runs of Picbreeder in these two
conditions: starting from random images, and starting from algorithmically seeded images,
in both cases followed by a period of further evolution with HCM. The seeded runs resulted
in more complex images, and human judges also found them more aesthetically appealing.
Thus, initial machine exploration and HCM can be used to make interactive evolution
experiments more effective.
Second, there are also tasks where the creativity/curiosity reward never becomes
Figure 8.12: Example initial and final images with and without seeding interactive neuroevo-
lution. The early phase of Picbreeder is not very engaging, but can be bypassed by seeding. In this
comparison, the initial unseeded images were generated with random CPPNs; the initial seeded
ones were generated by running CPPN evolution for a while and selecting the most impressive
images. Both sets of images were then evolved further with Picbreeder using HCM. Interactive
evolution from seeded images results in more complex and appealing final images, suggesting
that proper initialization is crucial in taking full advantage of interactive evolution. Figure from
Lehman and Miikkulainen (2013).
large enough to justify the human effort, and therefore HCM is necessary to perform the
experiments in the first place. A particularly important general case is the experimental
design of such studies. For example, the images can be encoded in various ways:
using CPPNs or simple ANNs with different activation functions. It may not be
possible to make these design choices correctly without running preliminary experiments,
and such experiments are often not very interesting to human users. HCM can be used to
good effect to discover the best designs before running the actual experiments.
Third, in some cases evolution needs to run for a very long time in order to get good results.
Even if the task is interesting, the users will eventually fatigue. HCM can provide a
continual, indefinite stream of new users in such experiments. On the other hand, each user
makes only a transient contribution to the evolutionary process, and these contributions
may be inconsistent. It turns out, however, that long-running evolution can still utilize
them as a guide towards good solutions. Evaluations in most domains are always noisy,
and such inconsistency is simply another form of that noise. As usual, evolution is
robust against noisy evaluations, and they may even boost creativity by encouraging
exploration. Thus, HCM can be harnessed to enable long-running interactive evolution
experiments. In conclusion, while interactive evolution experiments require significant
human effort, there are ways to make them practical and thus realize the full potential
of human guidance. Later in this book we will explore another alternative, which starts
with a genotype-to-phenotype mapping that is learned through a generative AI approach,
thus producing, from the start, outputs that resemble valid, domain-specific artifacts
(section 13.4).
In this chapter, we have seen how interactive neuroevolution can create novel forms of
gameplay and design experiences. By involving human users directly in the evolutionary
loop, whether through selecting visual artifacts, guiding agent behavior, or exchanging
and breeding digital content, these systems empower players and designers alike to
steer the creative process. Interactive neuroevolution thus offers a powerful tool for
fostering open-ended exploration and innovation, enabling the emergence of surprising
agent behaviors, aesthetic artifacts, or even entirely new design spaces.
A natural next step is to explore how evolutionary processes can drive this discovery
autonomously, without constant human guidance. In the next chapter, we turn our
attention to open-ended neuroevolution systems that aim to automate the generation
of complexity, novelty, and diversity. Such systems represent a shift from user-driven
creativity to autonomous open-ended discovery, where evolution itself becomes the engine
of exploration.
8.6 Chapter Review Questions
1. Conceptual Understanding: How does interactive neuroevolution differ from standard neuroevolution, and what types of problems is it particularly well-suited to solve?
2. Human-Guided Evolution: In the context of the NERO game, what tools are provided to the human player to guide the neuroevolution process? How can these tools shape the evolution of agent behaviors?
3. Real-Time Evolution: What is the role of rtNEAT (real-time NEAT) in NERO, and how does it enhance the interactive experience compared to traditional generational neuroevolution?
4. Behavioral Shaping: Describe how curricular evolution is implemented in NERO to train agents progressively. Why is this approach often more effective than using a single, static objective function?
5. Surprising Behaviors: Give examples of unexpected strategies discovered by evolution in NERO. How do such discoveries highlight the balance between human guidance and evolutionary creativity?
6. Interactive Machine Learning Games: Based on the NERO example, what characteristics make machine learning games engaging for human players, and how does the circularity of strategies contribute to the gameplay?
7. Collaborative Exploration: How does Picbreeder address the challenge of user fatigue in interactive neuroevolution, and what role does branching play in enabling collaborative exploration?
8. Generative Applications: Describe how Petalz and Galactic Arms Race utilize collaborative neuroevolution to procedurally generate game content. How do their approaches differ in incorporating user preferences?
9. Representation and Evolvability: What is developmental canalization, and how does it emerge in CPPN representations evolved in Picbreeder? Why is this property significant for understanding evolvability?
10. Practical Implementation: What strategies can make interactive neuroevolution more practical in domains with limited user engagement or long-running experiments? Provide examples of how human computation markets (HCM) can be effectively utilized.
Chapter 9
Open-ended Neuroevolution
A major goal in neuroevolution of behavior is to keep innovating beyond the obvious
solutions, over long periods of time, while the environment is changing; in other
words, to establish an open-ended discovery mechanism. The coevolutionary arms races and
interactive neuroevolution of previous chapters are examples of such processes. This
chapter reviews opportunities for open-ended neuroevolution more generally, including
inspirations from biology and their computational instantiations, body/brain coevolution,
and coevolution of agents and environments.
9.1 Open-ended Discovery of Complex Behavior
Neuroevolution has produced several convincing demonstrations where complex behavior
is discovered in behavioral tasks, sometimes rivaling the complexity seen in nature.
However, there is one striking difference: Neuroevolution is set up to solve a particular
problem, whereas biological evolution has no goal. In nature, solutions are discovered
continuously as challenges and opportunities come up. Such open-endedness is still a
challenge for artificial evolution, especially when the goal is to evolve general intelligent
agents (Miikkulainen and Forrest, 2021). This section reviews five elements of open-
endedness in biology that may, if we can implement them well, lead to open-ended
neuroevolution: neutrality with weak selection, enhanced exploration through extinction
events, highly evolvable representations, powerful genotype-to-phenotype mappings, and
major transitions in complexity.
9.1.1 Neutral Mutations with Weak Selection
Current evolutionary computation approaches, including those that evolve neural networks
for behavior, aim to be strong and efficient. They utilize small populations that can be
evaluated quickly; the crossover and mutation operations are often carefully crafted to
make it likely that fitness is improved; fitness is measured precisely, and selection is
strongly proportional to fitness. As a result, evolution converges the population quickly
around the most promising solutions and finds good solutions there fast. This approach is
effective e.g. in many engineering problems where the search space and fitness are well
defined and the problem consists largely of optimizing the design.
However, this success often comes at the expense of reduced extrapolation and thus
reduced creativity. It is also not very effective when the agents need to be general, i.e.
cope with uncertain and changing environments and solve multiple tasks simultaneously.
Other mechanisms are needed to counterbalance such efficient search, such as diversity
maintenance methods, novelty search, and quality diversity search (section 5.3). They
are intended to keep the population of solutions diverse for a longer time and spread it
out further in the solution space. The idea is not to miss solutions that are complex or
unexpected, i.e. hard to find through greedy search.
Interestingly, biological solutions are sometimes highly creative and unexpected, yet
they do not seem to result from any special mechanisms for diversity maintenance. If anything,
biological solutions need to remain viable at all times, which seems to work against
diversity. How does biology do it?
Nature seems to employ an entirely different approach to creativity (Lynch, 2007;
Miikkulainen and Forrest, 2021; A. Wagner, 2005). The populations are very large, and
selection is weak. Often, there is also a lot of time for these processes to find solutions.
Phenotypic traits are coded redundantly through several genes, much of the DNA exists in
non-coding regions, and many of the mutations are neutral, i.e. do not affect fitness. As a
result, diversity can exist in such populations: there is time to create it, and it stays even if
it isn’t immediately beneficial. The population as a whole can thus stay robust against
changes, develop expertise for multiple tasks, and maintain evolvability through time.
Neutrality in fitness landscapes can be seen to produce similar effects in computational
models. When mutations do not alter fitness, the search space reorganizes: basins of
attraction become larger, paths to global optima grow shorter, and populations can drift
across neutral networks instead of becoming trapped in local peaks (Verel, Ochoa, and
Tomassini, 2010). In this way, neutral drift not only maintains diversity but also increases
evolvability, creating the conditions for escaping dead ends and reaching higher-fitness
solutions. Weak selection combined with neutrality therefore emerges as a powerful driver
of robust and creative adaptation.
There is a good reason for the strong and impatient approach that evolutionary
computation has taken until now. Evolutionary optimization is computationally intensive,
and such techniques were necessary in order to make the most of the compute that was available.
However, now that we have a million times more compute than just a couple of decades ago
(Routley, 2017), it may be time to rethink the approach. This is precisely what happened
with deep learning. Much of the technology, such as convolutional networks, LSTMs, and
autoencoders, had existed since the 1990s, but it only started working well when it could take
advantage of the massive increases in scale (LeCun, Y. Bengio, and Hinton, 2015).
A similar opportunity may exist for evolution in general, and neuroevolution in
particular. It may be possible to scale up to large populations, large redundant genomes,
non-coding DNA, neutral mutations, and deep time. It may be possible to take advantage
of massive amounts of behavioral data and large-scale simulations to evaluate the solutions.
The evaluations may be multiobjective and high-level, instead of carefully engineered
to produce solutions of the expected kind. Eventually, it may even be possible to create
foundation models for neuroevolution, i.e. large, diverse populations of neural networks
that have many different abilities and are thus highly evolvable to solve new tasks.
One way to accelerate evolution in such populations is through extinction events, as
will be discussed next.
9.1.2 Extinction Events
In biological evolution, large-scale extinction events have occurred several times, often
seemingly changing the course of evolution (Meredith, Janečka, Gatesy, et al., 2011;
Raup, 1986). For instance, the Cretaceous-Paleogene extinction displaced dinosaurs in favor of
mammals, eventually leading to the evolution of humans. An interesting question is:
Are such events simply historical accidents, or do they implement a principle that in
some way enhances, or hinders, evolution in the long term? Even though such events
obviously destroy a lot of solutions, can they possibly serve to reset evolution so that better
evolvability is favored, which in the long term results in accelerated evolution and more
complex solutions?
While it is difficult to evaluate this hypothesis in nature, it is possible to do so in
computational experiments. It is possible to set up a large population with many different
solutions, representing adaptations to different niches. If evolution runs in a stable manner
for a long time, those niches are eventually filled with good solutions, and evolution
stagnates. At such a point in time, an extinction event eliminates most such solutions.
Those that remain, even just very few, are then free to evolve to fill the open niches. Such
evolution can be described as radiation from the remaining niches, but note that there is
also a meta-level selection at play: The solutions that are more evolvable, i.e. faster to
adapt to the open niches, will spread faster and wider, making them more likely to survive
the next extinction event. Thus, under repeated extinction events, evolution favors higher
evolvability. Extinction events can thus have a positive long-term effect, accelerating
evolution, and possibly resulting in more complex solutions as well.
To visualize the basic idea, consider a very simple computational setup (Lehman and
Miikkulainen, 2015). The niches are cells in a toroidal 401×401 grid world. Individuals
consist of grid coordinates and a probability of changing those coordinates. Thus,
adaptation means moving to a new cell, and high evolvability is represented by a high
probability of change. Initially, there is only one individual at the center, and evolution
creates more individuals by cloning and then mutating the grid coordinates, and at the same
time, mutating the probability. Over time, the population spreads to fill in all niches
simply through drift (figure 9.1𝑎). However, with extinction events, only five individuals
at random locations survive. If such events occur often, there is strong selection towards
individuals that mutate with a high probability. Thus, after prolonged evolution, a
population evolved with extinction events is more evolvable than a population evolved
without them (figure 9.1𝑏).
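A minimal sketch of this abstract setup is given below. The population dynamics (offspring counts, the crude population cap, the default parameters) are simplified stand-ins for the experimental protocol of Lehman and Miikkulainen (2015); only the core idea is preserved: an individual is a grid cell plus a mutation probability, and an extinction event leaves five random survivors.

```python
import random

GRID = 401          # toroidal grid of niches, as in the abstract domain
SURVIVORS = 5       # individuals left alive after an extinction event

def mutate(ind):
    """Clone an individual: with probability p it moves to a neighboring
    cell, and p itself (its evolvability) drifts slightly."""
    x, y, p = ind
    if random.random() < p:
        x = (x + random.choice([-1, 0, 1])) % GRID
        y = (y + random.choice([-1, 0, 1])) % GRID
    p = min(1.0, max(0.0, p + random.gauss(0.0, 0.01)))
    return (x, y, p)

def evolve(generations=15000, extinction_prob=1 / 2500, max_pop=5000):
    pop = [(GRID // 2, GRID // 2, 0.1)]           # single founder at the center
    for _ in range(generations):
        # every generation, surviving individuals seed mutated offspring
        pop += [mutate(random.choice(pop)) for _ in range(len(pop) // 2 + 1)]
        pop = pop[-max_pop:]                      # crude population cap
        if random.random() < extinction_prob:     # rare extinction event
            pop = random.sample(pop, min(SURVIVORS, len(pop)))
    niches = {(x, y) for x, y, _ in pop}
    mean_evolvability = sum(p for _, _, p in pop) / len(pop)
    return len(niches), mean_evolvability
```

Comparing evolve() with extinction_prob set to zero against the default should show the effect reported in figure 9.1: lineages that survive repeated extinctions carry higher mutation probabilities, i.e. higher evolvability.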
Do these results hold at the level of behavior as well? Consider again the bipedal
walker domain described in section 5.3. As before, the controllers are neural networks
evolved with NEAT, taking the location of the two feet (whether on the ground or not)
as input, and torque to the six motors (one in each knee, two on each side of the hip) as
output. A behavioral niche can be defined on a grid as in the abstract domain, i.e. the
final location of the bipedal walker after 15 seconds of simulation. This location is also
(𝑎) Abstract: No extinction (15,000 gens). (𝑏) Abstract: Random extinctions (15,000 gens). (𝑐) Walker: Niches filled over time.
Figure 9.1: Effect of extinction events on evolvability. While extinctions are catastrophic in
the short term, they may empower evolution in the long term. (𝑎) Without extinction events, the
population in the abstract domain evolves to fill in the available niches (i.e. cells in the 401×401
grid). A variety of evolvability levels exists in the end, indicated by the grey-scale values (lighter is
more evolvable). (𝑏) With extinction events, higher evolvability is favored. Such events occurred at
random intervals averaging 2,500 generations. In this snapshot, five individuals survived a recent
event, and the population is currently expanding to fill in the available niches. On average, these
individuals are about 50% more evolvable than those in (𝑎), indicated by the lighter color. (𝑐) In
the bipedal walker domain, the population rebounds quickly after extinction events, filling in more niches than before
the event, and eventually more than evolution without extinction events. Thus, extinction events
accelerate evolution and result in the discovery of more novel solutions. Figures from Lehman and
Miikkulainen (2015).
used to measure novelty, and evolution is set to maximize novelty. Evolvability can then
be measured as the behavioral diversity of the offspring: the individual is mutated 200
times; the number of distinct final locations of the offspring represents its evolvability.
As can be seen in figure 9.1𝑐, evolution without extinction events expands to fill in
the various niches monotonically. With extinctions, there is an immediate drop to five
niches and a fast rebound to a higher level than before the event. Moreover, the rebounds
become more effective over time, eventually filling more niches than evolution without
extinctions. Thus, extinction events result in accelerated evolution and solutions with
increased novelty.
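The evolvability measure itself is easy to state in code. In the sketch below, mutate and evaluate_behavior are placeholders for the NEAT mutation operator and for a simulation that returns the walker's discretized final location after 15 seconds; neither name is part of the original implementation.

```python
def evolvability(genome, mutate, evaluate_behavior, n_offspring=200):
    """Behavioral diversity of offspring: mutate the genome n times and count
    how many distinct final locations the resulting walkers reach.

    `evaluate_behavior` must return something hashable, e.g. the grid cell
    (x, y) in which the walker ends up after 15 simulated seconds.
    """
    behaviors = {evaluate_behavior(mutate(genome)) for _ in range(n_offspring)}
    return len(behaviors)
```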
These computational experiments suggest how extinction events can accelerate evo-
lution in biology. Although major such events have taken place only a few times, they
can be frequent at a smaller scale, resulting e.g. from fires, volcanic eruptions, climate
events, predator migrations, and even human impact. The results also suggest that the
same effect could be harnessed in engineering applications of computational evolution,
leading to better results in the long term. Combining it with large populations and weak
selection, as discussed in section 9.1.1, is therefore a compelling direction for future work.
9.1.3 Evolvable Representations
This chapter so far has outlined an approach to open-ended evolution that is still largely
building on genotypic and phenotypic diversity, with a constant mapping between them.
An alternative approach is to take advantage of evolvability, which can be defined as
adapting the genotype-phenotype mapping over time such that the search operators are
more likely to generate high-fitness solutions. High evolvability is often based on indirect
encodings, which can provide a substrate for this adaptation.
The main challenge is that whereas high evolvability provides a future benefit for
evolution, it needs to be developed implicitly, based on only current and past information.
In biology, evolvability may be selected for in three ways (Kirschner and Gerhart, 1998):
more genetic variation can be stored in the population (because fewer mutations are
harmful), organisms become more tolerant of stochastic development, and populations become
more likely to survive in changing environments.
Each of these can be evaluated in computational experiments. Opportunities for the
first one were already discussed above in section 9.1.1. Opportunities for the second one
are illustrated in sections on development (sections 4.2 and 14.4). In short, an individual
is not complete at birth, but goes through a period of physical and mental development
that results in a more complex and capable individual (Müller, 2014). Often this period
involves interactions with the environment, i.e. at least some of the complexity is not
innate, but is extracted from the environment. These interactions can be synergistic and
encoded into critical periods of development. For example, human infants need to receive
language input when they are one to five years old, otherwise they do not develop full
language abilities (see section 14.8.1 on the biology of language). In this manner, instead
of coding everything directly into genes, evolution also encodes a learning mechanism
that results in a more evolvable encoding (Elman, Bates, M. H. Johnson, et al., 1996;
Valsalam, Bednar, and Miikkulainen, 2005).
The third advantage opens up an opportunity that is particularly well aligned with
open-ended evolution. Given a domain with known structure, such as evolution of
symmetric bitstrings, evolution can be given an open-ended series of challenges in the
form of different target bitstrings (Reisinger and Miikkulainen, 2006). The population
has to discover each target by continuing evolution of the current population (initially
random). The target changes at given intervals, which have to be long enough for success
to be possible. The evolvable representation consists of linkage parameters between bit
locations, biasing the mutations that occur. Over time, evolution discovers linkages that
favor symmetric strings, which makes discovery of targets gradually faster and more likely.
In other words, the representations become more evolvable in this domain.
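One illustrative reading of the linkage idea is sketched below: a flip at one bit position is copied to other positions with probabilities given by an evolvable linkage matrix, so linkages that pair mirror-image positions make symmetric targets easy to rediscover. This is a simplified stand-in, not the exact operator of Reisinger and Miikkulainen (2006); the linkage matrix itself would be part of the genome and mutated separately.

```python
import random

def linked_mutation(bits, linkage, rate=0.02):
    """Mutate a bitstring under evolvable linkage.

    `linkage[i][j]` is the probability that a flip at position i is copied
    to position j. Linkages that couple mirror-image positions bias the
    search toward symmetric strings.
    """
    bits = list(bits)
    n = len(bits)
    for i in range(n):
        if random.random() < rate:
            bits[i] ^= 1                          # primary flip
            for j in range(n):
                if j != i and random.random() < linkage[i][j]:
                    bits[j] = bits[i]             # linked position follows i
    return bits
```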
How can such representations be designed for more complex solutions such as neural
networks and behavior? It turns out that the idea of linkages that adapt to the domain can
be scaled up to neural networks, with an approach that is motivated by genetic regulatory
networks (GRNs; Y. Wang, 2013). As was discussed in section 4.2.1, GRNs are one way in
which biology establishes an indirect encoding. Building on the operon implementation
of GRNs in section 4.2.1, GRNs can be modeled more generally with a set of rules
(Reisinger and Miikkulainen, 2007). As usual in rule-based systems, each rule has an
antecedent that is matched with the current state of the system, and a consequent that
determines what output, or product, is generated. When used to construct neural networks,
the products are either hidden or output nodes. When the antecedent is matched with
currently existing products within a similarity tolerance, connections are created between
Figure 9.2: Constructing neural networks with a GRN. GRNs, a mechanism for decoding
genetic representations in biology, can also be used as an indirect encoding for neural networks.
The GRN is encoded as a set of rules. The current state is represented by products (indicated by
letters). The antecedents are matched with the current products, leading to the generation of more
products. The match is based on similarity between products, implemented through regulatory
factors. In mapping the GRN to a network, products create nodes and antecedent matches create
connections between them. In this case, starting with products G and B, matching
the first rule creates a negative connection from B to itself. Because C is a similar product to B, H
and D are created as hidden nodes and connected to B. Matching D in turn leads to a recurrent
self-connection, as well as creating and connecting to an output node K. In this manner, a recurrent
structure is created; it can be further evolved by modifying the rule set and the regulatory factors.
Figure from Reisinger and Miikkulainen (2007).
determined by regulatory factors in the antecedents. A simple example of this process is
depicted in figure 9.2.
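The following is a loose sketch of that rule-matching process: products carry regulatory-factor vectors, a rule fires whenever its antecedent is similar enough to an existing product, and each firing adds a node and a connection. The data structures and the similarity measure are illustrative assumptions; the actual encoding of Reisinger and Miikkulainen (2007) is considerably richer.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    antecedent: tuple      # regulatory-factor vector the rule tries to match
    product: str           # node generated when the rule fires
    weight: float          # weight of the connection to the new node
    tolerance: float       # how similar an existing product must be to match

def similar(a, b, tol):
    """Products match if their regulatory-factor vectors are close enough."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5 <= tol

def decode(rules, factors, start_products):
    """Expand a set of start products into network nodes and connections.

    `factors` must map every product name to its regulatory-factor vector.
    Each time a rule's antecedent matches an existing product, a connection
    is created from that product to the rule's product (creating the node
    first if needed), mirroring the process sketched in figure 9.2.
    """
    nodes = set(start_products)
    connections = []
    frontier = list(start_products)
    while frontier:
        current = frontier.pop()
        for rule in rules:
            if similar(rule.antecedent, factors[current], rule.tolerance):
                connections.append((current, rule.product, rule.weight))
                if rule.product not in nodes:
                    nodes.add(rule.product)
                    frontier.append(rule.product)
    return nodes, connections
```

Evolution would then operate on the rule set and the regulatory factors, gradually shifting which antecedents match and thus which structures are built.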
The rules and the regulatory factors in them are modified through evolution in order
to construct a neural network to solve the task. Note that this is a continuous, soft process,
where a given product can gradually increase (through neutral mutations) until a tolerance
is reached. It therefore has significant potential for evolvability: A general GRN structure
is discovered where mutations often lead to viable offspring.
This process was demonstrated in Nothello, a board game similar to Othello, but
with a diamond-shaped board of 36 cells and the objective of ending with the fewest pieces on the
board. It allows faster evolution while retaining much of the complexity of full Othello.
The networks were evolved to serve as heuristic board evaluators for minimax search; a
single-ply lookahead was used to allow for longer evolutionary runs. In a coevolutionary
setup, each candidate was evaluated against a random sample of other individuals in the
population. Note that coevolution provides an environment where the fitness function
is constantly changing. As discussed above, such an environment should encourage
evolvable representations to emerge. Evolvability is also directly useful because it results
in discovering better gameplay over time.
Indeed, the GRN-based implicit encoding approach results in discovering better
networks over time compared to e.g. standard NEAT neuroevolution, as seen in figure 9.3𝑎.
This improvement is likely due to increased evolvability. Evolvability was measured as
the average fitness of the local mutation landscape: each representation was mutated to an
increasing extent, and the performance of the offspring was measured. The GRN-based
implicit encoding results in much more robust mutations, i.e. improved evolvability
(𝑎) Champion performance in 1-ply search. (𝑏) Performance vs. offspring distance. (𝑐) Significance of network motifs.
Figure 9.3: Performance, evolvability, and structure resulting from GRN-based neuroevolution.
The GRN-based encoding has several useful properties, as illustrated in the Nothello game domain.
(𝑎) The GRN-based indirect encoding evolves better solutions faster. (𝑏) This result is likely due
to the evolvability that the system discovers over evolution, measured by how good the offspring
solutions are on average. (𝑐) The evolvability is likely due to more varied network motifs,
taking advantage of recurrent structures. The significance is measured by comparing to randomly
connected networks of the same size. This example illustrates a fundamental principle of
evolvability: it emerges from the continuously changing fitness function (due to coevolution),
makes coevolution more effective, and can thus potentially be harnessed for open-ended discovery.
Figure from Reisinger and Miikkulainen (2007).
(figure 9.3𝑏). It is also interesting to see that the resulting network structures are different.
Whereas the NEAT networks are entirely feedforward, the GRN-based approach takes
advantage of many different network motifs, many of which are recurrent (figure 9.3𝑐). In
this manner, it likely discovers structures that support evolvability, and thereby both
coevolution and open-ended discovery.
9.1.4 Expressive Encodings
The mechanisms outlined above can be captured, generalized, and described mathematically
through the concept of expressive encodings (Meyerson, Qiu, and Miikkulainen, 2022).
The idea is that such encodings allow miracle jumps, i.e. large jumps in the search space:
for instance, flipping all bits in a binary encoding from 0 to 1 might be such a jump. A
standard evolutionary algorithm with a direct encoding would be unlikely to make such
changes, and therefore could not explore the search space as effectively.
Expressive encodings already exist. For instance, genetic programming utilizes
such an encoding (figure 9.4𝑎). Programs may share structure, but also have segments
that make large changes in the phenotype, such as conditionals. Small changes in such
segments can create miracle jumps. Neural networks are another expressive encoding
(figure 9.4𝑏): even when they are not used as mappings from input to output, but simply
to encode vectors of outputs (with a constant input), small changes in a few weights can
create a miracle jump. Interestingly, such jumps may not be possible through a direct
encoding (figure 9.4𝑐).
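The neural-network example of figure 9.4𝑏 can be recreated numerically in a few lines. The sketch below replaces the sigmoids of the original with hard thresholds for brevity, and the weights and dimensions are illustrative; the essential property is the same.

```python
import numpy as np

def phenotype(w_hidden, n_outputs=8):
    """Tiny network used purely as an encoding of a bit vector.

    Constant input 1 -> two hidden threshold units -> n output units, each
    of which fires only if both hidden units fire.
    """
    h = (np.array(w_hidden) > 0).astype(int)      # hidden activations
    out = int(h.sum() == 2)                       # AND of both hidden units
    return np.full(n_outputs, out)

parent_a = [+10.0, -10.0]       # phenotype: all zeros
parent_b = [-10.0, +10.0]       # phenotype: all zeros

def uniform_crossover(a, b, rng):
    return [a[i] if rng.random() < 0.5 else b[i] for i in range(len(a))]

rng = np.random.default_rng(0)
child = uniform_crossover(parent_a, parent_b, rng)
# With probability 0.25 the child inherits both positive weights and its
# phenotype jumps from all zeros to all ones, a miracle jump that a direct
# bit-vector encoding of two all-zero parents could never make.
print(phenotype(parent_a), phenotype(parent_b), phenotype(child))
```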
The usual approach to making evolutionary algorithms more powerful is to design
more complex and intelligent genetic operators that capture the properties of the domain.
For instance, estimation-of-distribution algorithms and the covariance-matrix adaptation
evolution strategy aim at capturing the statistics relating gene combinations to
fitness (Hansen and Ostermeier, 1996; J. A. Lozano, Larrañaga, Inza, et al., 2006). In
Figure 9.4: Expressive encodings through GP and neural networks. Expressive encodings
make evolution more powerful by allowing for large changes. (𝑎) For instance, the phenotypes
of these two GP parents are all zeros, but their crossover results in an offspring of all ones with
a probability of 0.25. They share most of the structure except for special segments defining the
variables a and b. (𝑏) A similar encoding through a neural network. The input is a constant 1,
and the output is all zeros; the parents differ in the weights of the first layer such that a crossover results
in all ones with a probability of 0.25. (𝑐) A direct encoding of the parents cannot lead to an all-ones
offspring. These simple examples illustrate how expressive encodings make such miracle jumps
possible when they are not possible through direct encoding. Figures from Meyerson, Qiu, and
Miikkulainen (2022).
contrast, expressive encodings can work with basic, simple genetic operators such as
crossover and mutation. In this sense, they capture the essence of biological expressiveness
that is obtained through interactions and development. Theoretically, both genetic
programming and feedforward neural networks with sigmoid activation functions are
expressive encodings for both uniform crossover and single-point mutation.
Expressive encodings have been shown to be more powerful than standard evolutionary
approaches in various benchmark challenges, including tasks where objectives change over
time deterministically or randomly, and in large block assembly, both theoretically and
experimentally (Meyerson, Qiu, and Miikkulainen, 2022). The approach offers maximum
evolvability, to the extent that there is no catastrophic forgetting when the objectives
change. It is also similar to biology in that much of the solutions are shared: more than
99% of the genes are the same across humans, for example, and much of the DNA is
shared across species (Collins, Guyer, and Chakravarti, 1997; Hardison, 2003). Only a
few crucial differences cause the differences between individuals and species. It is this
expressivity that the expressive encodings capture.
One particularly interesting opportunity for neuroevolution is to improve the trans-
mission function over time, i.e. the probabilistic mechanisms through which the child
phenotype is generated from the parent phenotypes. Evolution can be used to complexify
transmission functions, thus potentially powering open-ended evolution. With expressive
encodings and an evolving transmission function it may be possible to create a system that
starts simple, solves problems as they appear, and becomes more effective at it over time.
One remaining challenge is to enable transitions to more complex organizations, as will
be discussed next.
9.1.5 Major Transitions
In biological evolution it is possible to identify several major transitions in complexity
(Maynard Smith and Szathmáry, 1997; Szathmáry, 2015). First there were self-replicating
molecules that organized into chromosomes; then these chromosomes were enclosed
in cells; next, cells complexified to include several plastids; such cells joined together
and specialized to form multicellular organisms; the organisms grouped to form eusocial
societies first, and then actual societies, eventually with language and culture. In each
of these transitions, the individuals joined together into groups, specialized into distinct,
cooperative roles, and lost the ability to reproduce independently. Throughout these
transitions, information for biological organisms is still encoded at the molecular level.
However, how that information is organized, transmitted between individuals, translated
into physical structures, and selected for reproduction changes at each transition. As a
result, what it means to be an individual becomes more complex at each transition.
While the transitions are described in detail in biology, the mechanisms that produce
them are not well understood. In particular, are there multiple levels of selection operating
in parallel, or only one at the highest level? How do the individuals specialize, and how
do they lose their individual ability to reproduce? Do multiple phases exist at the same
time and cooperate and compete to eventually lead to a transition? Are the dynamics the
same at each transition, or is each one a separate, unique process?
A potentially powerful approach to answering these questions is to produce transitions
synthetically (Miikkulainen and Forrest, 2021; Solé, 2016). It has been very difficult
to achieve: the closest successes focus on defining hierarchical mathematical functions
and organizational structures in abstract mathematical games (Koza, 1992; Turney,
2020; Watson and Pollack, 2003). However, they are still far from major transitions in
behavior. For instance, the agents might discover ways to communicate or to construct
permanent artifacts such as roads. Further evolution might then discover behaviors that
take advantage of these constructs: The agents might communicate to establish flexible
roles and coordinate their behavior; they may move longer distances and harness more
resources. More generally, neuroevolution might construct network segments that perform
useful subfunctions, then group them together to construct more complex behaviors, and
multiple behaviors at different times (i.e. general intelligence). Such specialization and
grouping could potentially continue for several levels.
Ingredients for such transitions have already been demonstrated in several ways. For
instance, it is possible to predesign the representations at different levels by hand: e.g.
a syllabus for evolved virtual creatures allows discovering bodies and brains for simple
locomotion first and building up to fight-or-flight in multiple steps (Lessin, Fussell, and
Miikkulainen, 2013; Lessin, Fussell, and Miikkulainen, 2014). Similarly, mechanisms
can be created for discovering cooperative structures that work together at a higher level.
For example, in the CoDeepNEAT method, neural network modules are evolved to work
well together in a large composite network (J. Liang, Meyerson, Hodjat, et al., 2019;
Miikkulainen, J. Liang, Meyerson, et al., 2023). Also, a competitive process can be
established that allows new challenges to emerge, such as the arms race of better runners
and more challenging tracks in POET (section 9.3), or more complex prey behaviors
and better predators in zebra/hyena simulations (Rawal, Rajagopalan, and Miikkulainen,
2010; R. Wang, Lehman, Clune, et al., 2019). Multiple agents can communicate through
stigmergy, through observing each other, and through signaling, and thus coordinate their
behavior, for example in capturing prey or a desirable resource in a video game (Bryant
and Miikkulainen, 2018; Rawal, Rajagopalan, and Miikkulainen, 2010; Werner and M. G.
Dyer, 1992; Yong and Miikkulainen, 2010). Architectures and approaches have been
developed for representing and executing multiple tasks in a uniform manner, for example
through a common variable embedding space as in TOM (Meyerson and Miikkulainen,
2021).
In sum, mechanisms of cooperative and competitive coevolution, multitasking, multi-
objectivity, evolvability, and expressive encodings are potentially useful ingredients in
producing major transitions. However, they do not yet drive actual transitions. How such
transitions can be established is an important challenge for neuroevolution, and one that
would also have a large impact on understanding biology.
9.1.6 Open-ended Evolution of Intelligence
Many of the possible ingredients for open-ended neuroevolution already exist. The
recently available computational power could be used to set up evolutionary processes
that harness large populations, weak selection, neutral mutations, and deep time. While
many of the current indirect genotype-to-phenotype mappings still focus on a single task,
the emerging theoretical understanding of expressive encodings could lead to mappings
that allow searching indefinitely for more complex solutions as the environments and tasks
change. Such mechanisms could be harnessed to establish evolutionary innovation that
operates continuously.
However, open-ended innovation also requires that the environment presents the
evolutionary system continually with new challenges. The environments themselves can
change and evolve, or it may be possible to create multiple competing species in the
environment, thus establishing an evolutionary arms race. While current multiagent and
multipopulation systems still largely focus on solving a single task, evolution in such
domains has already been shown to lead to specialization and discovery of cooperation,
which could lead to major transitions. Multitask and multiobjective evolution are already
known to result in more robust solutions, and in such environments could lead to progressive
development of general intelligence. Perhaps the most promising avenue is to have the
agents themselves modify the environment, building artifacts and complexity into it that
persist (Lehman, Gordon, S. Jain, et al., 2023). In this manner, the environment and the
agents in it can complexify indefinitely.
What goals might such experiments be set to achieve? An important one is a better
understanding of biological evolution, i.e. the origins of major transitions and intelligence.
Another is to construct better artificial systems, i.e. systems that can be deployed in
natural and social environments where they adapt indefinitely to existing challenges
and to changes in them, much like people do. Such ability is one essential
ingredient of artificial general intelligence. To make these ideas concrete, the next two
sections review experiments in which environments and agents coevolve, in both
a cooperative and a competitive fashion.
9.2 Cooperative Coevolution of Environments and Solutions
As discussed in sections 4.2.3 and 14.4, part of the complex structure of biological systems
originates from the complexity in the environment. A possible way to evolve complex
systems is thus to evolve the environment, to present increasingly complex settings.
9.2.1 The Influence of Environments
Our thought processes and behaviors are significantly influenced by the specific time
and place we inhabit on Earth. These elements are shaped by distinct circumstances,
cultural understandings, prevailing beliefs, and local customs. Together, they create
a framework that both defines and restricts our experiences and the patterns of our
thoughts (Ryan Ruggiero, 2012). For example, take the concept of individualism versus
collectivism, which varies widely across cultures. In many Western societies, such as the
United States, there is a strong focus on individual achievement and independence. This
cultural context fosters a thought pattern that emphasizes personal goals and self-reliance.
In contrast, many Eastern societies, like Japan, emphasize collectivism, where the focus is
on group harmony and community. In such cultures, thought patterns and behaviors are
more aligned with group goals and the collective well-being. Inhabiting a different era or
being part of a distinct culture would fundamentally transform who we are, reshaping our
identity in profound ways.
This principle that humans are shaped by their environments applies similarly to AI and
ML systems. For example, large language models are deeply influenced by their training
data. If trained on scientific literature, the model will excel in technical explanations,
whereas training on conversational texts results in more colloquial responses. This effect
extends to the biases and perspectives inherent in the data. Similarly, in image generation,
diffusion models produce different outputs based on their training datasets: models trained
on classical art will generate different images than those trained on modern digital art. In
the realm of reinforcement learning, the training environment crucially defines an agent’s
skills. For instance, an agent trained in a simulated urban setting will develop different
capabilities and strategies compared to one trained in a virtual natural landscape.
Just as human experiences are shaped by our environments and cultures, AI agents
are similarly molded by their training contexts and data environments. The quality and
diversity of their training inputs are crucial, emphasizing the importance of coevolving AI
systems with their environments to enhance their capabilities and behaviors.
9.2.2 Body and Brain Coevolution
Section 3.2 showed how neuroevolution can discover a policy to control a bipedal walker.
In that setting, the physical structure of the walker was predetermined, and only the
controller was optimized. From the perspective of coevolving environments and solutions,
the body can be viewed as part of the environment in which the brain must learn to
operate. Evolutionary algorithms, unlike gradient-based methods, are well-suited to
jointly optimize both the morphology of the agent and the controller that governs it. Why
Figure 9.5: Examples of evolved morphology. In the easy flat environment, the approach developed
a thick but short rear lower limb that enabled a fast gait (𝑡𝑜𝑝). In the more complex environment that
included obstacles and holes, a larger rear leg evolved that allowed the agent to push over obstacles
better (𝑏𝑜𝑡𝑡𝑜𝑚). Evolution thus optimized the body and control jointly to meet the challenge as well
as possible. Figure from Ha (2019). Videos at https://neuroevolutionbook.com/demos.
constrain ourselves to weights when we can also optimize other design choices governing
our agents?
Body and brain coevolution was briefly discussed in the context of NSLC (section 5.5);
however, that section did not explore the effect of different environments on the evolved
morphologies. In addition to the weights of the control networks, the width, length, radius,
mass, and orientation of an agent’s body parts can be treated as evolvable parameters (Ha,
2019). The goal is to learn 𝑤, i.e. a joint vector of neural network weights and robot design
parameters, that maximizes the expected cumulative reward. An interesting question is: can
the agent evolve a physical structure that is not only better suited for the task, but also
facilitates evolving a better control policy? Such cooperative coevolution may uncover
design principles that are useful more generally.
For this task, evolution can basically be implemented using any of the neuroevolution
methods discussed earlier; the parameter-based exploration (PGPE) version of evolution
strategies (Sehnke, Osendorfer, Rückstieß, et al., 2010) was used in the experiments
in this section. With the head payload, material density, and motor joint configuration
held constant as in the original environment, only the lengths and widths of the four
leg segments were allowed to evolve together with the neural network controller. One
constraint was that the robot parts had to stay within ±75% of their original dimensions.
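In code, the genome is simply a concatenation of the controller weights and the morphology parameters, with the design part clamped to the allowed range when the robot is instantiated. The sketch below is schematic: the controller size and the scale-factor parameterization are assumptions for illustration, not the exact setup of Ha (2019).

```python
import numpy as np

# Hypothetical dimensions: the real experiment evolves the controller weights
# together with the lengths and widths of the four leg segments.
N_WEIGHTS = 2804                 # illustrative controller size
N_DESIGN = 8                     # 4 leg segments x (length, width) scale

def make_genome():
    """Joint vector w = [controller weights | design scale factors]."""
    weights = np.zeros(N_WEIGHTS)
    design = np.ones(N_DESIGN)   # 1.0 = original morphology
    return np.concatenate([weights, design])

def decode(w):
    """Split the joint vector and clamp each design scale factor to within
    +/-75% of the original value, as in the constrained experiment."""
    weights = w[:N_WEIGHTS]
    design = np.clip(w[N_WEIGHTS:], 0.25, 1.75)
    return weights, design
```

The evolution strategy then perturbs the entire vector, so improvements to the body and to the controller are discovered together.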
It turns out that learning a better version of an agent’s body not only helps achieve
better performance but also enables the agent to learn policies more efficiently.
The combined morphology+control approach was able to complete the more difficult
BipedalWalkerHardcore domain in just 30% of the time required by the original, static
version of the robot. Across 100 rollouts, the learnable version achieved an average score
of 335 ± 37, outperforming the baseline score of 313 ± 53. In this environment (figure 9.5,
Figure 9.6: Optimizing for desired design properties. Evolution was rewarded for finding
solutions that included small legs. In the easy flat environment (𝑡𝑜𝑝), very small legs evolved. In
the more challenging environment (𝑏𝑜𝑡𝑡𝑜𝑚), its legs were longer, but they were the smallest that
could still solve the task. In this manner, multiple design goals can be combined to obtain a variety
of solutions. Figure from Ha (2019).
bottom), the agent generally learns to develop larger rear legs that provide stability
during navigation. Its front legs, which are smaller and more maneuverable, also
act as a sensor for dangerous obstacles ahead, complementing its LIDAR sensors. In the
simpler domain without obstacles (figure 9.5, top), the agent tends to learn to develop
longer, thinner legs, with the exception of one leg part.
It is perhaps not surprising that allowing an agent to learn a better version of its body
enables it to achieve better performance. However, can we trade off some of the additional
performance gains to achieve other design goals? For instance, can evolution discover
a design that utilizes the least amount of material while still achieving satisfactory
performance on the task? To this end, the leg size can be calculated and the rewards scaled by
a utility factor 𝑈:
𝑈 = 1 + log(original_leg_area / new_leg_area)                    (9.1)
With such rewards, evolution developed a lean, minimal design where every inch matters.
It also learned movements that appear more insect-like, with the smallest pair of legs that
can still solve the more challenging bipedal walker environment (figure 9.6).
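Applying equation (9.1) in practice amounts to multiplying the episode reward by the utility factor, as in the small helper below. The multiplicative combination follows the text's description of scaling rewards by 𝑈; the function and variable names are illustrative.

```python
import math

def scaled_reward(cumulative_reward, original_leg_area, new_leg_area):
    """Scale the episode reward by the utility factor of equation (9.1),
    rewarding designs that use less leg material."""
    u = 1.0 + math.log(original_leg_area / new_leg_area)
    return cumulative_reward * u
```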
Thus, interesting life-like results can be achieved with added constraints. What if we
do the opposite and remove the initial constraint that each part has to be within ±75% of
its original value? Without any design constraints, evolution discovers an extremely tall
bipedal walker agent that “solves” the task by simply falling over and landing at the exit
(figure 9.7)!
In this manner, body-brain coevolution provides an avenue for open-ended discovery
of better solutions. As the agent gets better at controlling the body, the body can become
more complex, providing a new challenge in a cooperative manner. These principles
Figure 9.7: Optimization without constraints. With all design constraints removed, evolution
came up with an extremely tall bipedal walker that solves the task by simply falling over and landing
near the exit! This example shows that the approach can be creative beyond preconceived human
notions of what the solutions should be like. Figure from Ha (2019).
will be developed further in two later sections: Body-brain coevolution is combined with
reinforcement learning in section 12.4, and scaled up to more complex virtual creatures
in section 14.5. While body-brain coevolution enables progress by adjusting the agent’s
physical substrate, another powerful strategy is to adapt the environment in tandem with
the agent’s growing capabilities. The next section explores recent methods where the
tasks and environments themselves evolve cooperatively in response to what the agent has
learned.
9.2.3 Coevolution Driven by Interestingness
A key issue in open-ended learning is deciding what the next learning challenge should be,
especially in large or unbounded task spaces. Methods based on learning progress offer
one answer by selecting tasks that are neither too easy nor too hard, but they often fall into
the trap of proposing trivial variations that do not meaningfully extend the agent’s abilities.
What is needed is a way to prioritize tasks that are not only learnable but worthwhile,
that is, tasks that are novel, diverse, and interesting from a human perspective. This
idea echoes earlier work such as the innovation engine (A. M. Nguyen, Yosinski, and
Clune, 2015b), which used a predictor of human interest to guide open-ended search.
The OMNI (J. Zhang, Lehman, Stanley, et al., 2024) and OMNI-EPIC (Faldor, J. Zhang,
Cully, et al., 2025) frameworks addressed this challenge by integrating models of human
interestingness into the training loop, allowing agents and their environments to co-adapt
in a more meaningful and productive way.
OMNI (open-endedness via models of human notions of interestingness) introduced
a method for filtering tasks using two criteria: learning progress and human-like inter-
estingness. Tasks were first scored based on how much the agent is improving on them, and then
filtered using LLMs such as GPT-3 (Floridi and Chiriatti, 2020) and GPT-4 (Achiam et al.,
2023), which were prompted to judge which tasks are worthwhile (the use of LLMs in
neuroevolution is discussed in more detail in chapter 13). The overall structure of this
Figure 9.8: Overview of OMNI. OMNI enables open-ended learning in vast environment search
spaces by ensuring that the training tasks not only have high learning progress, but are also
interesting. It harnesses LLMs to make such a heretofore impossible judgment. Figure from J.
Zhang, Lehman, Stanley, et al. (2024).
Figure 9.9: Overview of OMNI-EPIC. OMNI-EPIC continuously generates and solves new,
interesting tasks in simulation. The approach maintains a task archive of learned and failed tasks.
Figure from Faldor, J. Zhang, Cully, et al. (2025). Videos at https://neuroevolutionbook.com/demos.
approach is illustrated in Figure 9.8.
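A schematic of the two OMNI filters, first learning progress, then a model of interestingness, is sketched below. The learning-progress estimate and the LLM wrapper are placeholders supplied by the surrounding training loop; this is not the exact formulation of J. Zhang, Lehman, Stanley, et al. (2024).

```python
def select_next_tasks(tasks, success_history, ask_llm_interesting, n=8,
                      lp_window=50):
    """One OMNI-style selection step: rank candidate tasks by learning
    progress, then keep only those an LLM judges interesting.

    `success_history[task]` is a list of recent success rates, and
    `ask_llm_interesting(task)` wraps a prompted LLM returning True/False.
    """
    def learning_progress(task):
        hist = success_history.get(task, [])
        if len(hist) < 2:
            return 0.0
        recent = hist[-lp_window:]
        # Absolute change in success rate over the window: tasks that are
        # neither already mastered nor hopeless score highest.
        return abs(recent[-1] - recent[0])

    ranked = sorted(tasks, key=learning_progress, reverse=True)
    chosen = [t for t in ranked if ask_llm_interesting(t)]
    return chosen[:n]
```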
OMNI-EPIC (open-endedness via models of human notions of interestingness with
environments programmed in code) extended this idea by generating entirely new envi-
ronments in code. It used LLMs to describe new tasks in natural language, translated
them into Python code defining the simulation and reward structure, and used a second
model of interestingness to filter out redundant or unremarkable tasks. A success detector
evaluated whether the agent had learned the task, and a growing archive of successes
and failures guided future generations. This full pipeline is shown in figure 9.9; the
iterative loop enables both the agent and its task distribution to grow in complexity
together. The approach is similar to the POET approach described in section 9.3.
The crucial difference is that in POET, the new environments were created simply to be
challenging to the existing solutions; therefore, the environments and solutions compete.
In OMNI-EPIC, the environments are intended to be interesting; therefore, the process
can be seen as cooperative.
The results from these two studies highlighted the effectiveness of this co-adaptive
(𝑎) Success probabilities. (𝑏) Performance.
Figure 9.10: Results in Crafter. (𝑎) Conditional success probabilities of all tasks in Crafter. Tasks
are organized from simple to complex based on the prerequisite tasks that must be accomplished
before completing the target task. Task names (left of each row) are readable in a digital format with
zoom. (𝑏) Performance in Crafter on all tasks. While OMNI biases training towards interesting
tasks, it achieves higher average task success rates and learns more tasks than uniform sampling
or choosing tasks based on learning progress alone, even across all tasks. Figure from J. Zhang,
Lehman, Stanley, et al. (2024).
approach. OMNI was tested in the Crafter (Hafner, 2022) and BabyAI (Chevalier-Boisvert,
Bahdanau, Lahlou, et al., 2019) environments. Crafter is a 2D Minecraft-like environment
with a technology tree, where tasks must be completed in a meaningful sequence, such
as gathering resources before crafting tools. BabyAI is a grid-based world focused on
grounded language understanding, where agents follow natural language instructions
involving navigation and object manipulation. Both environments are ideal for testing
open-ended learning because they feature large, combinatorial task spaces. And indeed,
in both environments OMNI achieved substantially higher task success rates and learned
a greater number of tasks when guided by the model of interestingness (figures 9.10
and 9.11).
OMNI-EPIC extended these results by showing that the environments themselves can
be generated in an open-ended way. In long-run simulations of an R2D2 robot, the system
created a wide variety of tasks starting from just a few seeds, spanning challenges in
navigation, push manipulation, and coordination. In actual RL training runs, OMNI-EPIC
adapted to agent performance by simplifying tasks after failures or combining mastered
skills into more complex ones. Quantitative evaluations confirmed that both the model
of interestingness and the task archive are essential for sustained diversity and progress
(figure 9.12).
These systems offer a promising realization of cooperative coevolution between
environments and solutions. The agent is not learning in a static world, nor is the task
distribution fixed in advance. Instead, the agent and its environment develop together, each
responding to changes in the other. The model of interestingness ensures that the evolving
curriculum remains focused on tasks that are genuinely valuable rather than superficial.
Figure 9.11: Results in BabyAI. (a) Conditional success probabilities of a subset of tasks in
BabyAI. These plots only show tasks with a success rate of at least 0.05 by any method at any
timestep. Tasks are organized from simple to complex based on the instruction length. (b)
Performance in BabyAI on all tasks. The average task success rate scale for BabyAI is low because
it is averaged over the entire task set, which includes many tasks that are difficult to learn. This
setting captures a microcosm of the real world, where there can be infinitely many difficult
or even impossible tasks. OMNI achieves much higher average task success rates and learns
more tasks than uniform sampling or choosing tasks based on learning progress alone. Figure
from Faldor, J. Zhang, Cully, et al. (2025).
The result is a dynamic and constructive interplay between learning and environment
design, mirroring the mutual shaping seen in natural evolution and cultural development.
9.3 Competitive Coevolution of Environments and Solutions
Just as cooperation between agents and environments can drive progress, competition can
also serve as a powerful engine for complexity. By evolving environments that actively
challenge evolving agents, competitive setups can create an arms race, where solutions
must constantly improve to survive.
9.3.1 Paired Open-Ended Trailblazer
Algorithms like novelty search (section 5.3) promote behavioral rather than genetic
diversity, making them less prone to getting stuck in local optima. As a result, they
naturally align with the principles of open-endedness by prioritizing divergence over
convergence. These approaches are motivated by the idea that reaching innovative solutions
often requires navigating through a sequence of intermediate "stepping stones": solutions
that may not resemble the final goal and are typically not identifiable in advance.
In section 5.4, we saw how quality diversity algorithms build upon this idea by
maintaining a diverse set of niches, each optimized in parallel. Unlike pure novelty search,
QD algorithms evaluate how well solutions from one niche perform in others, a strategy
Figure 9.12: OMNI-EPIC performance in a long R2D2 simulation. (a) Cell coverage of archive diversity plots in long runs with simulated learning by OMNI-EPIC and the controls. (b) ANNECS-OMNI measure of progress for OMNI-EPIC and the controls. Dotted lines are median values, shaded regions are 95% confidence intervals. OMNI-EPIC generated significantly more diverse tasks and continued to innovate throughout the run. Figure from Faldor, J. Zhang, Cully, et al. (2025).
known as goal switching (A. M. Nguyen, Yosinski, and Clune,
2015b). This mechanism
enables the discovery of unexpected stepping stones across niches.
The POET algorithm (R. Wang, Lehman, Clune, et al., 2019) extends these principles
by integrating goal switching within a divergent search framework. While conventional
QD methods drive solution diversity, they typically operate in static environments, which
ultimately limits long-term discovery. For machine learning to achieve true open-endedness,
algorithms must evolve both problems and solutions. POET is designed to drive an open-
ended process of co-discovery in a single run. It maintains a population of environments
(e.g. obstacle courses) and a population of agents (e.g. neural network controllers),
with each agent paired with a specific environment. This setup results in a divergent
coevolutionary process that continuously pushes the frontier of both challenges and skills.
As new environments are created, they present fresh challenges, while agents adapt by
developing more advanced capabilities. Existing skills are leveraged not only through
continued optimization but also by transferring agent behaviors across environments to
uncover promising stepping stones, facilitating ongoing, open-ended discovery.
In more detail, POET begins with an initial simple environment, such as a flat-ground
obstacle course, paired with a randomly initialized neural network agent. Throughout its
operation, POET executes three core tasks within its main loop:
Environment Generation: POET generates new environments by mutating the
parameters of existing ones. In the bipedal walker task, these environmental parameters
include (1) stump height, (2) gap width, (3) stair height, (4) number of stairs, and (5)
surface roughness. This process is selective, adding new environments to the active
population only if they provide a suitable challenge and introduce novelty. For example,
a minimum criterion (MC) of S_min < E_child(θ_child) < S_max, where S_min and S_max are
pre-defined score thresholds, can be used to filter out child environments that appear too
challenging or too trivial, while still fostering a diverse range of challenges.
Agent Optimization: Each agent is continuously optimized within its environment
using evolution strategies (ES), though other optimization methods could also be applied.
The objective is to maximize performance metrics relevant to each environment, such as
traversing an obstacle course efficiently. This optimization happens independently for
each pair, which facilitates parallel processing and enhances computational efficiency.
Agent Transfer: To foster cross-environment adaptation, POET attempts to transfer
agents between different environments. This strategy can help agents escape local optima
by applying successful strategies from one context to another. For example, an agent
performing well in a mountainous terrain might offer insights when transferred to a rocky
terrain, potentially leading to breakthroughs in performance.
POET maintains a controlled number of environment-agent pairs in its active list,
capped at a maximum size to manage computational resources. Environments that become
obsolete or overly familiar are phased out to make room for new ones, ensuring the
population remains dynamic and conducive to continuous learning.
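Putting these three steps together, a minimal sketch of one iteration of a POET-like loop is shown below. The helpers mutate_env, optimize_agent, and evaluate are hypothetical, and many details of the published algorithm, such as novelty ranking of child environments and the eligibility tests for transfers, are omitted.

def poet_step(pairs, mutate_env, optimize_agent, evaluate,
              s_min, s_max, max_pairs=20):
    # 1. Environment generation: mutate parent environments and keep only the
    #    children that satisfy the minimum criterion (neither too easy nor too hard).
    children = []
    for env, agent in pairs:
        child_env = mutate_env(env)
        if s_min < evaluate(child_env, agent) < s_max:
            children.append((child_env, agent))
    pairs = (pairs + children)[-max_pairs:]          # phase out the oldest pairs

    # 2. Agent optimization: one ES-style optimization step per environment-agent pair.
    pairs = [(env, optimize_agent(env, agent)) for env, agent in pairs]

    # 3. Agent transfer: adopt another pair's agent if it scores higher in this environment.
    for i, (env, agent) in enumerate(pairs):
        best_agent = max((a for _, a in pairs), key=lambda a: evaluate(env, a))
        if evaluate(env, best_agent) > evaluate(env, agent):
            pairs[i] = (env, best_agent)
    return pairs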
Experiments with POET using different types of obstacles (such as gaps,
rough terrain, and stumps) reveal that the challenges generated and solved by POET are far
too difficult for ES when tackled directly (see figures 9.13 and 9.14). For example, agents
optimized by ES in these environments tend to stop and avoid moving further to prevent
penalties rather than learning to navigate obstacles effectively. This behavior contrasts
starkly with the capabilities developed by agents under POET, which successfully navigate
these complex environments. Additional results highlight that POET not only engineers
these challenging environments but also devises innovative solutions that ES alone cannot
achieve. This includes agents developed by POET that can navigate wide gaps and rugged
terrains, which ES agents fail to handle. In simpler environments also created by POET,
ES consistently underperforms, unable to match the high standards set by POET’s adaptive
and dynamic approach.
A key question explored in the POET experiments was whether the environments
created and solved by POET could also be addressed by an explicit direct-path curriculum-
building control algorithm. To investigate this, POET was compared to a control approach
designed to create a sequence of progressively more difficult environments leading to
a target environment. This curriculum was constructed manually, following principles
common in the literature on curricular learning.
In the direct-path curriculum, the sequence began with an extremely simple environment
consisting of flat ground, which was solvable by a randomly initialized agent. Subsequent
environments were constructed by incrementally increasing the difficulty of one or more
obstacle parameters (e.g. stump height or gap width) until the target environment was
reached. Agents were trained using ES, and progression to the next environment occurred
once the agent achieved a predefined performance threshold. Importantly, this curriculum-
building control was given the same computational budget as POET to ensure a fair
comparison.
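A minimal sketch of such a direct-path curriculum control is given below; step_towards, train_es, and score are hypothetical helpers, and the budget accounting is simplified.

def direct_path_curriculum(start_env, target_env, step_towards, train_es,
                           score, threshold, budget):
    env, agent, used = start_env, None, 0
    while used < budget:
        agent, cost = train_es(env, agent)         # ES training in the current environment
        used += cost
        if score(env, agent) >= threshold:
            if env == target_env:
                return agent, env                  # target environment reached and solved
            env = step_towards(env, target_env)    # increase one or more obstacle parameters
    return agent, env                              # budget exhausted; closest environment reached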
The comparison focused on three levels of environment difficulty: challenging, very
challenging, and extremely challenging. Difficulty is defined by how much POET-generated
environments exceed the reference values of the BipedalWalkerHardcore environment. For
example, extremely challenging environments in POET have stumps, gaps, and roughness
values that are up to 4.5 times what they were in the original difficult version of the bipedal
(a) Generated agents attempting gaps. (b) Generated agents on rough surfaces. (c) Generated agents attempting stumps.
Figure 9.13: The paired open-ended trailblazer (POET) approach. POET generates complex environments and effective agent solutions unachievable through standard ES. As depicted, agents optimized directly by ES (top row of panel (a) and left panels of (b) and (c)) tend to develop suboptimal behaviors, often quitting prematurely. In contrast, POET not only engineers these demanding scenarios but also successfully trains agents that adeptly navigate through them, as demonstrated in the bottom row of panel (a) and the right panels of (b) and (c). Figure from R. Wang, Lehman, Clune, et al. (2019). Videos at https://neuroevolutionbook.com/demos.
walker domain. These results illustrate the system’s ability to generate truly novel and
difficult scenarios.
Figure 9.15 provides a visual comparison of POET and the direct-path curriculum
algorithm. Each rose plot represents an environment created and solved by POET (red
pentagons) alongside the closest configurations reached by the curriculum algorithm in five
independent runs (blue pentagons). The pentagon vertices correspond to key parameters:
roughness, the lower and upper bounds of gap width, and the lower and upper bounds of
stump height.
The results show a striking dichotomy between the two approaches. Across all
(a) Large steps. (b) Mixed terrain. (c) Performance.
Figure 9.14: Agents demonstrate advanced navigation abilities in complex scenarios engineered by POET. Notable challenges include (a) navigating exceptionally large steps and (b) mastering a rough terrain course featuring a mix of narrow and wide gaps, alongside stumps of varying heights. In addition, ES alone fails to match POET's performance in various settings. (c) A dotted line at a score of 230 indicates the success threshold. The plots clearly show that ES consistently falls short of meeting the challenges effectively addressed by POET. Figure from R. Wang, Lehman, Clune, et al. (2019).
difficulty levels, the curriculum algorithm consistently failed to reach the complexity
and challenge of POET-generated environments. This trend is especially pronounced
in extremely challenging environments (top two rows), where the blue pentagons fall
significantly short of the red pentagons in terms of parameter values, such as maximum
roughness or gap width. Even at lower difficulty levels, the curriculum algorithm struggled
to match POET’s ability to solve nuanced and demanding scenarios.
In follow-up work, an enhanced version of POET (R. Wang, Lehman, Rawal, et al.,
2020) introduced an additional set of algorithmic innovations. The first is the performance
of all transferred agents environment characterization (PATA-EC). PATA-EC is a domain-general measure of
how meaningfully novel new challenges are, enabling the system to potentially create and
solve interesting challenges endlessly.
The second is a more efficient heuristic for determining when agents should goal-switch
from one problem to another. The heuristic is based on the insight that what makes an
environment interesting is how agents behave in it, and novel environments are those
that provide new information about how the behaviors of agents within them differ. This
heuristic is more computationally efficient than the original POET algorithm and helps
open-ended search scale better.
Third, enhanced POET introduced a novel, more flexible way to encode environmental
challenges based on CPPNs (section 4.3.1). In the case of enhanced POET, CPPNs are used
to generate obstacle courses for the bipedal walking agent. The generated environments
shown in figure 9.16 demonstrate that the use of CPPNs allows for the generation of
much more complex and diverse challenges than what was used in the original POET
experiments.
From these results, it is evident that POET exemplifies the principle of coevolution
between agents and their environments. As an automatic curriculum builder, POET
continuously creates new challenges that are optimally balanced, neither too easy nor
too hard, effectively teaching agents how to tackle increasingly complex problems. This
coevolutionary process fosters an environment where skills developed in one context are
Figure 9.15: POET versus direct-path curriculum-building controls. Each rose plot depicts
one environment that POET created and solved (red pentagon). For each, the five blue pentagons
indicate what happens in control runs when the red pentagon is the target. Each blue pentagon is the
closest-to-target environment solved by one of the five independent runs of the control algorithm.
The five vertices of each pentagon indicate roughness (roughness), the bottom and top values of
the range of the gap width of all the gaps (gap_lower and gap_upper), and the bottom and top
values for the height of stumps (stump_lower and stump_upper) in the given solved environment.
The value after MAX in the key is the maximum value at the outermost circle for each type of
obstacle. Each column contains sample solved environments from a single independent run of
POET. Figure from R. Wang, Lehman, Clune, et al. (2019).
not only honed but also become transferable, aiding agents in solving new and more
complex challenges.
9.3.2 Learning to Chase-and-Escape
In chapter 7, two settings of competitive coevolution were discussed: evolving a neural
network controller for a single agent by having it compete against other agents in the
population (section 7.1.1), and evolving two different species of controller networks,
one for each of the two competing teams of agents, in two separate populations. An
evolutionary arms race ensued in both settings, resulting in several stages of innovation,
(a) Sample environments from a single run of original POET. (b) Sample environments from a single run of Enhanced POET.
Figure 9.16: Enhanced POET. With the CPPN-based environment generation and other innova-
tions, enhanced POET is able to generate (and solve) a wide diversity of environments within a
single run. In contrast, the original POET can only generate environments with limited types of
regularly-shaped obstacles (e.g. stumps and gaps). Figure from R. Wang, Lehman, Rawal, et al.
(2020).
each with more sophisticated solutions than in the previous stages.
This subsection revisits such settings in the framework of the coevolution of environ-
ments and solutions. Whereas in POET each environment provides a static challenge for
each solution, competitive coevolution of agent controllers provides a dynamic challenge.
That is, the environment consists of other agents that respond dynamically to the agent's
actions. For clarity, a domain where there are two agents with adversarial goals is used:
one agent is trying to escape and the other is trying to catch it (Tang, J. Tan, and Harada,
2020). As the chaser evolves more sophisticated tactics, the escapee evolves more refined
moves to evade capture. This dynamic interaction leads to an arms race of increasingly
sophisticated strategies that is, in principle, open-ended.
The chaser is a simulated quadrupedal robot that needs to learn low-level joint
commands (i.e. desired joint angles), and the escapee is a dot robot that learns swift
commands (i.e. desired velocities and directions). The escapee is said to be caught if the
distance between the two robots is less than a predefined threshold d_min. The two robots
are trained in an iterative fashion.
First, in each iteration, the chaser robot plays against an opponent that is randomly
sampled from an adversary pool Π_a. The pool initially only contains an escapee robot that
stays still, giving the chaser robot time to learn basic locomotion skills in the early stages.
Second, after the chaser robot’s control policy is evolved, an opponent robot plays
against the upgraded version of the chaser. The escapee robot has no memory of the
skills it previously learned, and will devote all its energy and capacity to learn new skills
that discover and exploit the weakness of the chaser robot’s locomotion capability. After
learning, this escapee robot's policy is added to Π_a.
While having the adversary pool Π_a encourages the chaser robot to play against
various escapees and helps fight catastrophic forgetting, the diversity of the escapee robots'
escaping maneuvers is also critical. To achieve this, the authors sampled different values of d_min
when training the escapee robots. Intuitively, a small distance threshold allows the escapee
to stay close to the chaser and develop sudden, quick movements to dodge, while larger
values would encourage the escapee to use large circular trajectories to stay away from the
chaser.
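The overall training scheme can be summarized with the minimal sketch below; train_chaser, train_escapee, and still_escapee are hypothetical stand-ins for the actual training procedures and the initial stationary opponent.

import random

def coevolve_chase_and_escape(train_chaser, train_escapee, still_escapee,
                              iterations=10, d_min_choices=(0.5, 1.0, 2.0)):
    adversary_pool = [still_escapee]               # start with a stationary escapee
    chaser = None
    for _ in range(iterations):
        # 1. Train the chaser against a randomly sampled escapee from the pool.
        opponent = random.choice(adversary_pool)
        chaser = train_chaser(opponent, chaser)
        # 2. Train a fresh escapee against the upgraded chaser; vary the capture
        #    distance d_min to encourage diverse escaping maneuvers.
        d_min = random.choice(d_min_choices)
        adversary_pool.append(train_escapee(chaser, d_min))
    return chaser, adversary_pool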
This iterative coevolution between the chaser and escapee robots is critical in developing
their agility and robustness. Each cycle of adaptation not only hones their individual
strategies but also contributes to a richer, more responsive interaction between them.
By continuously evolving both agents and the dynamics of their environment, the study
showcases how the complexity and effectiveness of autonomous systems can be significantly
enhanced.
After training, the quadrupedal chaser robot develops a symmetric gait that alternates
between its forelimbs and hind limbs, mimicking the bounding gait commonly seen
in quadrupedal animals at high speeds. To execute sharp turns, it extends the stance
phase of one forelimb, using it as a pivot to rapidly rotate its body and change direction.
Additionally, the escapee robot demonstrates sophisticated maneuvers, such as sprinting at
full speed, circling to confuse the chaser, and employing sudden lateral dodges to cause the
chaser to overshoot. For visual examples of these dynamic interactions, refer to figure
9.17,
which illustrates the trajectories of both the chaser and escapee robots.
Figure 9.17: Sample episodes of chase and escape. The quadruped robot is the chaser and
the red dot-bot is the escapee; the blue and red lines are their trajectories. In the experiments,
some adversarial agents developed advanced evasion tactics, such as luring the quadruped robot to
approach, then dodging and stopping abruptly, causing the robot to run past them. Figures from
Tang, J. Tan, and Harada (2020).
To illustrate the advantages of coevolutionary methods over static training environments,
three inductive bias-driven baseline methods are presented and depicted in the top row of
figure 9.18. First is the cone configuration (π_cone). Here, a target position is randomly
selected within a fan-shaped area directly ahead of the chaser robot, simulating a forward-focused
pursuit. Second is the circular configuration (π_circle), where the target is randomly placed
anywhere within a complete circular area surrounding the chaser, promoting omnidirectional
movement. Third is the zigzag configuration (π_zigzag), where targets are alternately placed
to the left and right directly in front of the chaser, encouraging it to adopt a zigzagging
movement pattern. Additionally, to underscore the importance of diversity in training, a
scenario in which the chaser robot plays against a single evolved opponent is included for
comparison, denoted as π_single.
These configurations were employed to benchmark the performance of traditional
methods against those that dynamically coevolve the training environment alongside the
agent. The bottom row of figure 9.18 illustrates the trajectories of all chaser policies as
they attempted to intercept a target moving along a sine-shaped route. In the first two cases,
the coevolved policy successfully intercepted the target even before it reached the first
turn. In contrast, the policies trained with the baseline configurations either fell behind or
required more time to catch up. When the target maneuvers through turns (as shown in the
last two plots), the coevolved policy adeptly followed the trajectory and captured the target,
whereas the baseline policies struggled, often losing balance or needing to slow down
significantly to manage the turn. This stark contrast highlights that the coevolution of the
agent and the environment is crucial for achieving superior performance, as it allows the
agent to adapt more effectively to complex and dynamic challenges.
(a) Three configurations of initial positions for a static adversary. (b) Trajectories of methods when the chaser robot tries to catch an escapee moving along a sine-wave route.
Figure 9.18: Comparison with baseline methods. (a) shows three configurations of initial positions for a static adversary. (b) shows trajectories of the methods when the chaser robot tries to catch an escapee robot that moves along a sine-wave-shaped route. A cross at the end of a trajectory indicates that the chaser has fallen or the target has escaped. A dot at the end means successfully catching the target at that position. Short trajectories ending with dots indicate that the chaser catches the target early. The chaser trained with dynamic adversaries (blue trajectory) is able to catch the target much earlier than the other baseline policies, including the policy that plays against a single opponent (π_single). Figure from Tang, J. Tan, and Harada (2020).
This example of coevolution of adversarial agents demonstrates how dynamic envi-
ronments can lead to open-endedness. They are more complex than static environments,
providing many ways to create new challenges. Agents evolved in this manner are not
only superior but also more robust, suggesting that the new challenges can be met. It
remains to be seen how far this approach can be pushed. It may need to be combined
with abilities to modify the body and the environment (as discussed in section 9.2.2), but
dynamic environments are likely an essential ingredient of constructing intelligent systems
through open-ended neuroevolution.
These considerations conclude the discussion of neuroevolution of behavior in this
book. The next three chapters will expand on the idea of cooperative learning systems.
However, instead of coevolution, combinations with other machine learning mechanisms
will be considered, including deep learning, reinforcement learning, and generative AI.
These mechanisms are synergistic in several ways, resulting in more powerful machine
learning.
9.4 Chapter Review Questions
1. Key Ingredients: What are the five elements of biological open-endedness that could potentially inspire open-ended neuroevolution, and how do they support continuous innovation?
2. Neutral Mutations: Why are neutrality and weak selection crucial for maintaining diversity in large populations, and how do such processes differ from traditional approaches in evolutionary computation?
3. Role of Extinctions: How can extinction events accelerate evolution and increase evolvability in computational experiments? Provide an example, e.g. from the bipedal walker domain.
4. Long-Term Effects: Describe how repeated extinction events can lead to populations that are more evolvable and capable of filling niches more effectively.
5. GRNs and Evolvability: How do GRNs provide a substrate for evolvability, and what advantages do they offer compared to direct encodings in tasks like Nothello?
6. Indirect Encodings: Explain the role of indirect encodings in enhancing evolvability. How do GRNs contribute to the discovery of robust and diverse neural network motifs?
7. Miracle Jumps: What are "miracle jumps," and why are expressive encodings (e.g. GP or neural networks) more effective than direct encodings in achieving such jumps?
8. Comparative Power: Compare the benefits of expressive encodings with traditional evolutionary algorithms for solving problems with dynamically changing objectives.
9. Body-Brain Coevolution: How does coevolving an agent's body and brain lead to better solutions, and what principles can it reveal about designing efficient and specialized morphologies?
10. Environment-Agent Coevolution: Describe the core mechanisms of the POET algorithm for coevolving agents and environments. Why is this approach effective for solving complex challenges?
Chapter 10
Evolutionary Neural Architecture
Search
The design of neural network architectures, i.e. the organization of neurons into assemblies
and layers and the connections between them, has played an important role in the advances
in deep learning. Through a combination of human ingenuity and the need to push state-
of-the-art performance, there have been several large leaps of technological innovation
since the early 2010s. During this time, the technique now known as neural architecture
search (NAS) also emerged as its own subarea of deep learning research. The goal of
NAS is to employ various methods such as reinforcement learning, gradient descent,
Bayesian optimization, and evolutionary search to automate the search for novel neural
network architectures, which are then trained with gradient descent to obtain the final
network. The idea is that such an automated search could result in architectures superior
to those hand-designed by human researchers. Evolutionary optimization is particularly
well-suited for NAS because it can optimize not only continuous hyperparameter values,
but also discrete choices among alternative components, and even large structures such as
graphs. Many evolutionary optimization techniques have found a new use in NAS, and
new ones have been developed as well.
This chapter starts with a simple example combining NEAT topology search with
backpropagation for the weights. It then expands to deep learning architectures, with
examples in convolutional, recurrent, and general topologies. Particularly useful cases
for NAS are multiobjective domains where aspects other than performance need to be
optimized as well, and multitask domains where the needs of several tasks can be combined.
NAS requires a lot of computation, so techniques have been developed for efficient search
and evaluation. It may also be possible to evolve the networks entirely, without gradient
descent as the second phase, in the future.
10.1 Neural Architecture Search with NEAT
The NAS idea can be illustrated by combining the NEAT topology search algorithm with
the backpropagation algorithm for training the weights of each neural network topology.
This concept of backprop NEAT appeared many times even before deep learning, and in
Figure 10.1: Types of nodes and activation functions in the backprop NEAT experiment. The
colors are used to label nodes in figures 10.2 and 10.3. Different functions implement different
computational properties that make the search for a good architecture more effective.
that sense it can be seen as the grandfather of modern NAS. Incidentally (as discussed in
the info box later in section 10.2), it also encouraged the development of the NAS subfield
within Google.
In backprop NEAT, a neural network topology is evolved using the NEAT-style
crossover and mutation operators. Unlike in the original version of NEAT, in this
experiment many types of activation functions are possible, represented as different colors
in the neural network (the legend is shown in figure 10.1). The input to a neuron is the
usual weighted sum of incoming connections. The add operator does nothing to the input,
while the mult operator multiplies all the weighted inputs together. By allowing for a
sinusoidal operator, the network can produce repetitive patterns at its output. The square
and abs operators are useful for generating symmetries, and the Gaussian operator is
helpful in drawing one-off clustered regions. The output neurons have sigmoid activation
functions since the task consists of classifying examples into two categories (0 or 1).
Each neural network topology that NEAT creates is represented as a computation
graph. It is then possible to run backprop on this same graph to optimize the weights of
the network to best fit the training data. In this manner, NEAT is strictly responsible for
specifying the architecture, while backprop determines the best set of weights for it (in
the original NEAT, evolution is also used to determine the weights). In this experiment,
an L2 regularization term is also included in the backprop. The initial population of
networks consists of minimal architectures like the one in figure 10.2a, implementing
logistic regression with a different set of random weights, i.e.

o = σ(w₁x + w₂y + w₃b),    (10.1)

where x and y are the coordinates of the input sample, b is the bias unit (activated at 1.0),
the wᵢ are the initial random weights, and o is the output of the network. This simple network
divides the plane into two halves as shown in figure 10.2b. The color coding represents
values from 0.0 (orange) through 0.5 (white) to 1.0 (blue). When the dataset consists of
two Gaussian clusters, this simple initial network performs quite well already. In fact,
when starting with an initial population of 100 simple networks with random weights,
before any backprop or genetic algorithm, the very best network in the population is likely
good enough for this type of dataset.
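For concreteness, a minimal sketch of the output computed by one such initial genome, following equation (10.1), is given below; the weights are simply random numbers here.

import numpy as np

def initial_network_output(x, y, weights=None):
    w = np.random.randn(3) if weights is None else np.asarray(weights)
    b = 1.0                                        # bias unit, activated at 1.0
    z = w[0] * x + w[1] * y + w[2] * b             # weighted sum of the inputs
    return 1.0 / (1.0 + np.exp(-z))                # sigmoid output o in (0, 1)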
Each network architecture is assigned a fitness score based on how well it does in the
classification task after training with backprop. In addition to measuring how well
each network fits the training data, using the maximum likelihood metric, the number of
connections is also taken into account. Usually simpler networks are more regularized
(a) Network architecture. (b) Classification performance.
Figure 10.2: An example network from the first generation. The task consists of classifying input
samples (2-D points) into one of two categories (0/1). The initial population consists of networks
that implement logistic regression with a different set of random weights. If the population is
large enough and the classification problem is simple enough, some of those initial networks may
already do well in the task, as is the case in this nearly linearly separable classification task. Videos
at https://neuroevolutionbook.com/demos.
and thus generalize better to new examples, and also take less memory and are faster to
run. Thus, simpler networks are preferred if they achieve similar classification accuracy to
more complex ones, and even somewhat less accurate networks are preferred if they are much simpler.
To achieve this goal, the fitting error is adjusted by the number of connections as

f = E(1 + r√c),    (10.2)

where f is the fitness, E is the error over the training set, c is the number of connections,
and r is a proportionality factor. Thus, a network with more connections will have a fitness
that is more negative than a network with fewer connections. The square root is used
because intuitively it seems a network with e.g. 51 connections should be treated about
the same as a network of 50 connections, while a network with five connections should
be treated very differently from a network with four connections. Other concave utility
functions may achieve the same effect. In a way, like the L2 regularization of weights, this
type of penalty is a form of regularization on the neural network structure.
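A minimal sketch of this fitness computation is given below. It assumes that E is expressed as a log-likelihood (a negative number, where higher is better), so that additional connections push the fitness further below zero; the value of r is an arbitrary placeholder.

import math

def backprop_neat_fitness(log_likelihood, num_connections, r=0.1):
    # Connection-penalized fitness in the spirit of equation (10.2): the concave
    # square-root penalty grows slowly for large networks and sharply for small ones.
    return log_likelihood * (1.0 + r * math.sqrt(num_connections))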
After a few generations, networks evolve that, once trained, fit the training data well, even
in tasks that are not linearly separable (figure 10.3). How is backprop NEAT able to
do this? In machine learning and data science in general, performance often depends on
appropriate feature engineering, i.e. selecting or designing features that best represent the
input. This approach has the advantage of incorporating known human expertise into the
problem, making the learning task simple. For example, if the classification task consists
of separating a small circle inside a big circle, the decision boundary is simply the distance
from the origin. By constructing two new features that square each input dimension, most of
the work has already been done for the network.
It is interesting to see whether NEAT can discover these features by itself without
relying on human engineering. So, the raw inputs to each NEAT network will only be
(a) Network architecture. (b) Classification performance.
Figure 10.3: Evolved backprop NEAT networks for classifying data of varying complexity.
With XOR (top row), the architecture relies on abs and ReLU functions that allow the forming of long
lines with sharp corners. In contrast, with concentric circles (middle row), the architecture takes
advantage of sinusoidal, square, and Gaussian functions to establish features that work well in such
radially (nearly) symmetric domains, making the machine learning task easier. With concentric
spirals, it further utilizes a complex topology to approximate the complex decision boundary. In
this manner, evolution discovers hyperparameters and structures that work well for the task, similar
to and possibly exceeding the ability of human engineers to design them.
the x and y coordinates, and the bias b = 1. Any further features, such as squaring those
variables, multiplying them, or putting them through a sinusoidal gate, will have to be
discovered by the algorithm. Indeed, it can select the appropriate activation functions
and network structure around them to implement useful features. For example with the
XOR dataset, networks utilized abs and ReLU activation functions, which are useful
in producing decision boundaries that are more or less straight lines with sharp corners
(figure 10.3). With concentric circles, the final network often included many sinusoidal,
square, and Gaussian activation functions, which makes sense given the radial symmetry
of the dataset. With concentric spirals, which is almost symmetric but much more complex
as well, the architectures utilized similar functions but also a complex topology that
allowed it to match the complex decision boundary.
An interesting further observation is that networks that backprop well will tend to be
favored in the evolution process, compared to networks with gradients that are unstable. A
network with blown-up weight values is likely to perform poorly in classification, resulting
in a poor fitness score. More generally, given a set of backprop parameters, such as a
small number of backprop iterations or a large learning rate, evolution produces different
kinds of networks, presumably those that learn well under such conditions. On the other
hand, if the parameters are not set right, backprop may not find good weight values even
if they exist, thus discarding a powerful architecture. Analogously, a person with an
extraordinarily high IQ may never reach their full potential if they live in a very harsh
environment, or perhaps lack the people skills to influence their peers to accept their ideas.
A solution in NAS is to make learning parameters evolvable as well. In that manner, good
parameter values can be discovered together with architectures that work well with them.
Such meta-learning approaches are discussed further in chapter 11.
10.2 NAS for Deep Learning
The backprop NEAT experiment in the previous section introduced the concept of topology
search for backpropagation neural networks. It illustrates the idea that even though gradient
descent will optimize weights for a given neural network, it is also useful to optimize its
hyperparameters and topology. This idea can be applied to modern deep learning as well.
This section briefly outlines the history of NAS in deep learning, introduces the general
approach, and reviews successes and challenges. Examples of prominent approaches and
future directions are described in the sections that follow.
As deep learning rose in power and popularity, it became evident that simple fully-
connected neural networks were not sufficient for most applications. Historically, many
powerful neural network building blocks have been discovered through a process of trial-
and-error to address certain existing neural network limitations. For example, convolutional
neural networks (CNNs) were created to minimize the number of connections required
for computer vision problems. Over time, CNN architectures grew more sophisticated,
including AlexNet (figure 10.4; Krizhevsky, Sutskever, and Hinton, 2012), the winner of
the 2012 ImageNet competition (Russakovsky, Deng, Su, et al., 2015). This result drew a
lot of attention and essentially got us out of the neural network winter and into the era of
deep learning. AlexNet led to the development of many more complicated architectures,
such as VGG (Simonyan and Zisserman, 2015), highway networks (R. K. Srivastava,
Greff, and Schmidhuber, 2015), inception networks (Szegedy, Vanhoucke, Ioffe, et al.,
Figure 10.4: The AlexNet deep learning architecture. This architecture put deep learning into the
spotlight when it won the ImageNet competition in 2012. There are careful engineering decisions
that were involved in its design, including the principled organization into convolutional, pooling,
and dense layers. More recent networks are often even more sophisticated and require a pipeline
that spans network architecture and careful training schemes. Much manual labor is required in
addition to the human insight to make them work, which suggests that automated methods of
configuring them might help. Figure from Krizhevsky, Sutskever, and Hinton (2012).
2016), and residual networks (ResNet; K. He, X. Zhang, Ren, et al., 2016), and more
recently, DenseNet, MobileNet, EfficientNet, and CoAtNet (Z. Dai, H. Liu, Le, et al.,
2021a; G. Huang, Z. Liu, van der Maaten, et al., 2017a; Sandler, Howard, M. Zhu, et al.,
2018; M. Tan and Le, 2021). These architectures were designed to stack up many layers of
neural networks effectively by taking advantage of repeated modules and skip connections
between them.
Concurrently, for sequential tasks, people designed better recurrent neural network
(section 2.3.3) architectures that outperformed simple fully-connected vanilla recurrent
neural networks, such as LSTM (section 2.3.4), gated recurrent unit (J. Chung, Gulcehre,
Cho, et al., 2014), and others. Most recently, with the introduction of the self-attention-
based transformer architecture (section 2.3.6), there have been a host of proposals that
claim to offer incremental performance improvements over the original transformer.
Much of this research was performed by graduate students who experimented with
different architecture configurations, based on their hunches and instincts, who would try
to experimentally discover new architectures that would offer some performance benefits
compared to prior architectures. Some refer to this process as graduate student descent
(GSD), a joke on the stochastic gradient descent (SGD) optimization process, hinting that
the progress of machine learning research might be automated by a machine (J.-B. Huang, 2021).
One of the main obstacles to the automated approach was that most deep learning tasks
typically take several days to train. However, with the advent of large GPU computing
clusters, it became feasible in the mid-2010s. The NAS subfield gradually emerged and
became quite popular in the late 2010s. A form of graduate student descent applied to the
area of NAS itself, and today, there are thousands of papers on the subject (for reviews,
see e.g. Y. Liu, Sun, Xue, et al.,
2021; C. White, Safari, Sukthanker, et al., 2023), and
even a popular, standardized benchmark for measuring the performance of NAS methods
(Dong and Y. Yang, 2020; Ying, Klein, Christiansen, et al., 2019; Zela, Siems, Zimmer,
et al., 2022).
Info Box: Development of NAS inside Google Brain
In a way, the development of NAS was related to the career path that prompted
me (David Ha) to become a researcher at Google Brain and led me to conduct
much of my nature-inspired research ever since. In 2016 I published the Backprop
NEAT experiment (section 10.1) as a personal blog post, and it somehow caught
the attention of Jeff Dean, who reached out to me to comment on the concept of
separating topology search and weight optimization, and had an interest to explore
this idea deeper, potentially at Google scale. This conversation prompted me to
apply and join Google Brain's residency program; in fact, Quoc Le (a co-author
in the early NAS paper; Zoph and Le, 2017) was my first interviewer for the job!
Quoc had a fantastic vision of developing a pipeline that could eventually automate
much of the machine learning work at Google, which eventually became known as
the AutoML project years later.
Quoc became my mentor and advisor, and we decided to explore two concepts:
neural networks that generated weights (which became Hypernetworks (Ha, A. Dai,
and Le, 2017), my first project there), and neural network architecture search (a
project led by Barret Zoph, who is a brilliant engineer and quickly learned to
navigate Google's enormous compute resources, which had a fitting name: Borg!). The
NAS project sought to apply topology search: define a search space for neural
network architectures, and by leveraging Google's large compute resources, identify
the architectures within the search space that will perform well on benchmark deep
learning tasks such as image recognition or language modeling. This project got
me started on large machine learning models, a path I’m on still today.
Around 2016, there were two dominant paradigms in deep learning: CNNs for
image processing and RNNs for sequence processing (or some combination of CNNs and
RNNs for spatial-temporal applications such as video processing). The architecture design
problem for CNNs and RNNs looked quite different. For CNNs, it involved identifying the
best combination of convolutional filters, which are great priors for image processing due
to the positional invariance property. Therefore, the task for designing, or automating the
design of, CNN architectures required a search space that mainly focused on the edges (or
the connections) of a graph. In contrast, sequential processing and sequence generation
tasks relied on RNNs, which applied the same network architecture many times over,
recurrently (hence the name). The essential element of the RNN is its memory node, i.e.
a fixed structure that is replicated and activated many times. The search space mainly
focused on the architecture of this node, i.e. its internal structure of cells, connections,
activation functions, and specification of the state. In both cases, the problem was framed
as a black-box optimization problem.
This automated search approach required enormous computational resources (Real,
S. Moore, Selle, et al., 2017); while the sampling process of architectures (the outer loop)
is efficient, the calculation of the reward signal, or fitness for each candidate architecture
(the inner loop), required training a neural network on the actual task. Computer vision
benchmarks at the time, such as CIFAR-10, often required training the neural network for
weeks on a single GPU. As a solution, researchers started to use proxies for the fitness
function. For instance, for image classification, they would train for only a limited number
of steps on CIFAR-10, and make the assumption that whatever metric had been achieved
after n steps would be a good metric to rank the models (S. Jiang, Ji, G. Zhu, et al., 2023;
Miikkulainen, J. Liang, Meyerson, et al., 2023; Rawal and Miikkulainen, 2020). This is a
good assumption since there is often a high correlation between the final performance and
early-stage training performance of neural networks. Also, the tasks and benchmarks used
for NAS were often smaller in scale. For instance, CIFAR-10 or a low-resolution version
of ImageNet was used for training image classification models, and the Penn Treebank
(PTB) dataset was used for training language models. The authors would then demonstrate
that the resulting models transfer to larger-scale datasets, such as the full ImageNet or
JFT-300M for images, and Wikipedia 100M or 1B benchmarks for text (Real, Aggarwal,
Y. Huang, et al.,
2019; Zoph, Vasudevan, Shlens, et al., 2018). Further, the child models
can share parameters, speeding up the search thousandfold (Pham, Guan, Zoph, et al.,
2018). The architectures can also be scaled or stacked to have more capacity and thus
achieve better performance (Real, Aggarwal, Y. Huang, et al., 2019).
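A minimal sketch of such proxy evaluation is given below; build_model, train_for_steps, and validation_accuracy are hypothetical helpers, and the step count is an arbitrary placeholder.

def proxy_fitness(arch, build_model, train_for_steps, validation_accuracy, n_steps=1000):
    model = build_model(arch)
    train_for_steps(model, n_steps)                # train far short of convergence
    return validation_accuracy(model)              # early accuracy used only for ranking

def rank_architectures(archs, build_model, train_for_steps, validation_accuracy):
    scored = [(proxy_fitness(a, build_model, train_for_steps, validation_accuracy), a)
              for a in archs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for _, a in scored]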
NAS did produce architectures that are useful in production, especially neural networks
that achieve high performance at low computational cost for inference (in terms of inference
speed and also number of parameters). Three examples are reviewed in the next section,
on LSTM node design, general modular networks, and refinement of existing designs, all
based on evolutionary optimization. Evolutionary NAS was also applied to the transformer
architecture, to produce evolved transformers (So, Le, and C. Liang, 2019), which also
perform better on benchmark tasks while requiring fewer resources.
It is actually remarkable that there are many different approaches to NAS, and they all
work well. It seems that you can apply almost any optimization technique (evolution, RL,
Bayesian optimization, gradient descent) and get improved results. Even just random
search may perform well, for instance achieving results within less than half a percent
of more sophisticated NAS methods, and close to state-of-the-art performance for both
image classification and language modeling benchmarks (L. Li and Talwalkar, 2020; Real,
Aggarwal, Y. Huang, et al., 2019). This observation suggests that much of the performance
is already baked into the hand-engineered building blocks of NAS, such as convolutional
filters, self-attention layers, and RNN nodes. The research community has designed them
by hand to achieve state-of-the-art performance. NAS has proven useful as a way to
fine-tune them, but it has not yet produced innovations that could automate the discovery
of such truly fundamental concepts.
That is probably why, despite these improved MobileNet, transformer, and RNN node
architectures, people still often use the traditional MobileNet, the classical transformer,
and the original LSTM in most networks in production. The performance gains have
not yet been large enough and their implementations stable enough for the software and
hardware vendors to converge on the improved variants. The NAS field continues to make
progress though, including successes outlined in the next few sections, and discoveries
that extend to other fields, which may lead to such convergence in the future.
10.3 Case Studies: Improving Deep Learning SOTA
This section reviews three NAS case studies that resulted in SOTA performance at the time.
The first one, the design of LSTM nodes, improved the original design that had stayed
the same since the 1990s. It demonstrated that complexifying the design can add power
even though such designs are difficult for humans to discover. The second, CoDeepNEAT,
generalizes ideas from general neuroevolution to the level of network architectures. In
principle, it could discover new architectural principles that work better than the existing
human-designed ones. It has not so farÐthe challenge is to identify the proper building
blocks and then take advantage of structure. The third, AmoebaNet, utilizes structure,
scaling, and regularization more explicitly by hand. It achieved SOTA on ImageNet in
2018, which was a remarkable achievement given that ImageNet was the main focus of
the machine-learning community at that time. It may be possible to use an Amoeba-like
approach in the future to incorporate new ideas and improve performance again. Note that
even a slight improvement is sometimes useful: For instance in finance, healthcare, and
engineering design, it translates to money, lives, and resources saved.
10.3.1 LSTM Designs
First, consider the design of better LSTM nodes. The original architecture (figure 10.5a)
had been developed in the 1990s (Hochreiter and Schmidhuber, 1997), and despite many
attempts to improve it by hand, it was deemed to be robust, general, and usually at least
as good as the alternatives (Greff, R. K. Srivastava, Koutník, et al., 2016). We reviewed
the LSTM architecture in section 2.3.4; in essence, an LSTM node is a neuron that can
memorize a value in its internal memory cell indefinitely long. It contains circuitry for
loading that value (the input gate), reading it out (the output gate), and erasing it (the
forget gate). A sequence processing network includes many such nodes, and their internal
parameters (weights, activation functions) can be modified through backpropagation.
Through such learning, each node determines when and how it can utilize its memory cell
best as part of processing sequences.
Even though this design is principled and makes sense, it turns out that it can be
complexified significantly, leading to LSTM nodes that perform better. Its internal
processing can be more complex, propagating through a nonlinear network with multiple
paths. Its memory state can be more complex, consisting of multiple memory cells. It can
utilize a variety of activation functions in its internal nodes and more general memory
blocks. Such complexification is difficult for humans to develop, but NAS methods can do
it.
The first such improvement was based on reinforcement learning (Zoph and Le,
2017). A recurrent network was used to generate the node designs, trained through the
REINFORCE algorithm (R. J. Williams, 1992) to maximize the expected accuracy on a
validation set. The resulting NASCell was significantly more complex than the original
LSTM design (figure 10.5b). However, the exploration ability of such refinement search
is somewhat limited and can be expanded through evolutionary methods.
In particular, genetic programming was used to search for trees representing the node
structure, resulting in designs with multiple nonlinear paths and multiple memory cells
(a) Original LSTM. (b) NASCell node (language modeling). (c) Evolved node (language modeling). (d) Evolved node (music modeling).
Figure 10.5: NAS in LSTM node design. At the lowest level, NAS can be used to design nodes in a recurrent neural network. In the node diagrams above, h(t) is the main output of the node, propagated to other nodes. The c(t) and d(t) are outputs of the native memory cell, propagated internally. The green input elements denote the native memory cell outputs from the previous time step (i.e. c(t-1) or d(t-1)). The red input elements are formed after combining the node output from the previous time step (i.e. h(t-1)) and the new input from the current time step (x(t)). The other colors identify activation functions in computational cells: ReLU, sigmoid, tanh, sin, add, and multiply. In all solutions, the memory cell paths include relatively few nonlinearities. Unlike LSTM and NASCell, the evolved nodes reuse inputs and utilize extra memory cells in different parts of the node; they also discovered LSTM-like output gating. The evolved nodes for language and music modeling are different, suggesting that evolution captures and utilizes the inherent structure in these domains to perform better. In this manner, neuroevolution was able to improve upon a human design that had stayed the same for decades and was considered optimal among many variants. For an animation of this search process and an interactive demo, see https://neuroevolutionbook.com/demos. Figures from Rawal and Miikkulainen (2020).
(figure 10.5c; Rawal and Miikkulainen, 2020). In the language modeling domain (i.e.
predicting the next word), this design was organized into two layers of 540 nodes each
and evolved for 30 generations. Compared to networks of similar size, it improved 20
perplexity points over the original LSTM and 1.8 points over the NASCell, achieving the
state-of-the-art (SOTA) performance of 62.2 at the time. Most interestingly, when the
same approach was applied to the music modeling domain (i.e. predicting the next note),
a different design emerged as the best (figure 10.5d). This result suggests that different
domains have different structure; such structure can be learned by NAS and architectures
customized to take advantage of it.
These results opened the door to optimizing combinations of different kinds of
memory nodes, like those used in the neural Turing machine (section 12.3.5; Khadka,
J. J. Chung, and Tumer, 2019), and other recurrent network elements (Ororbia, ElSaid,
and Desell, 2019). As a result, the memory capacity of the model increased multifold, an
improvement that likely would not have happened without such automated NAS methods.
(a) CoDeepNEAT approach. (b) Image captioning network.
Figure 10.6: Discovering general neural architectures through coevolution of modules and
blueprints. The CoDeepNEAT approach (Miikkulainen, J. Liang, Meyerson, et al., 2023) aims at
discovering modular architectures in an open-ended search space. (
𝑎
) Blueprints represent the
high-level organization of the network and modules fill in its details. The blueprint and module
subpopulations are evolved simultaneously, based on how well the entire assembled network
performs in the task. This principle was originally developed for evolving entire networks including
the weights (Gomez and Miikkulainen, 1997; Moriarty and Miikkulainen, 1997), but it applies
in neural architecture search for deep learning as well. (
𝑏
) The overall structure of a network
evolved for the image captioning task; the rectangles represent layers, with hyperparameters
specified inside each rectangle. One module, consisting of two LSTM layers merged by a sum,
is repeated three times in the middle of the network. The main advantage of CoDeepNEAT is
that it can discover a wide range of network structures. They may take advantage of principles
different from those engineered by humans, such as the multiple parallel paths brought together
at the end in this network. For a demo of CoDeepNEAT in the character recognition task, see
https://neuroevolutionbook.com/demos
. Figures from Miikkulainen, J. Liang, Meyerson,
et al. (2023).
10.3.2 CoDeepNEAT
As a second example, consider the CoDeepNEAT method of discovering general network
designs. CoDeepNEAT (J. Liang, Meyerson, Hodjat, et al., 2019; Miikkulainen, J. Liang,
Meyerson, et al., 2023) builds on several aspects of techniques developed earlier to evolve
complete networks. In SANE, ESP, and CoSyNE (section 7.1.1), partial solutions such as
neurons and connections were evolved in separate subpopulations that were then combined
into full solutions, i.e. complete neural networks, with the global structure specified
e.g. in terms of a network blueprint that was also evolved (Gomez and Miikkulainen,
1997; Gomez, Schmidhuber, and Miikkulainen, 2008; Moriarty and Miikkulainen, 1997).
Similarly, CoDeepNEAT co-evolves multiple populations of modules and a population
of blueprints that specify which modules are used and how they are connected into a
full network (figure 10.6𝑎). Modules are randomly selected from the specified module
population to fill in locations in the blueprint. Each blueprint is instantiated in this way
many times, evaluating how well the design performs with the current set of modules.
Each module participates in instantiations of many blueprints (and inherits the fitness of
the entire instantiation each time), thus evaluating how well the module works in general
with other modules. The main idea of CoDeepNEAT is thus to take advantage of (and
scale up with) modular structure, similarly to many deep learning designs such as the
inception network and the residual network (K. He, X. Zhang, Ren, et al., 2016; Szegedy,
Vanhoucke, Ioffe, et al., 2016).
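To make the mechanism concrete, the following minimal sketch illustrates the assembly and fitness-sharing logic described above. It assumes blueprints are simply lists of slot names and modules are drawn from per-slot species; evaluate_network is a placeholder for training and scoring the assembled network, and none of these names come from the actual CoDeepNEAT implementation.

```python
import random
import statistics

def assemble(blueprint, module_species):
    """Fill each blueprint slot with a module sampled from that slot's species."""
    return [random.choice(module_species[slot]) for slot in blueprint]

def evaluate_batch(blueprints, module_species, evaluate_network, n_assemblies=5):
    """Assemble and evaluate networks; every participant inherits the assembly's fitness."""
    fitness_log = {}  # id(individual) -> list of fitnesses from assemblies it joined
    for bp in blueprints:
        for _ in range(n_assemblies):
            modules = assemble(bp, module_species)
            f = evaluate_network(bp, modules)      # train/evaluate the assembled network
            for ind in [bp, *modules]:
                fitness_log.setdefault(id(ind), []).append(f)
    # An individual's fitness is its average over all assemblies it took part in,
    # i.e. how well it works in general with other blueprints and modules.
    return {k: statistics.mean(v) for k, v in fitness_log.items()}
```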
The modules and the blueprints are evolved using NEAT (section 3.3), which was originally
designed to evolve complete networks and is adapted in CoDeepNEAT to evolving network
structure. NEAT starts with a population of simple structures connecting inputs straight
to outputs, and gradually adds more modules in the middle, as well as parallel and
recurrent pathways between them. It thus prefers simple solutions, but complexifies the
module and blueprint structures over time as necessary. It can, in principle, design rather
complex and general network topologies. However, while NEAT can be used to create
entire architectures directly, in CoDeepNEAT it is embedded into the general framework
of module and blueprint evolution; it is thus possible to scale up through repetition in a
way that would not arise from NEAT naturally.
The power of CoDeepNEAT was originally demonstrated in the task of image
captioning, a domain where a competition had been run for several years on a known
dataset (Miikkulainen, J. Liang, Meyerson, et al., 2023). The best human design at that
point, the Show&Tell network (Vinyals, Toshev, S. Bengio, et al., 2015), was used to
define the search space; that is, CoDeepNEAT was set to find good architectures using
the same elements as in the Show&Tell network. Remarkably, CoDeepNEAT was able
to improve the performance further by 15%, thus demonstrating the power of neural
architecture search over the best human solutions (Miikkulainen, J. Liang, Meyerson,
et al., 2023). Similar CoDeepNEAT evolution from a generic starting point was later
used to achieve state-of-the-art results in text classification (Wikidetox) and image
classification (chest X-rays; J. Liang, Meyerson, Hodjat,
et al., 2019). Indeed, these successes demonstrated that with minimal computational
cost, neural architecture search can achieve performance that exceeds that of standard
architectures, making it possible to quickly and effectively deploy deep learning to new
domains.
Most importantly, the best networks utilized a principle different from human-designed
networks: They included multiple parallel paths, possibly encoding different hypotheses
brought together in the end (figure 10.6𝑏). In this manner, the large search space utilized
by CoDeepNEAT may make it possible to discover new principles of good performance.
Such discovery is indeed the main power of CoDeepNEAT, and what it was initially
designed to do. At the time, papers were coming out, outdoing each other by proposing a
different architecture. The space of good architectures seemed large and ripe for discovery.
Soon after, however, transformer and diffusion architectures were developed and
became dominant. While there is still plenty of opportunity to optimize variants of
them using neuroevolution, a major question for the future is whether open-ended search
methods such as CoDeepNEAT can be developed further to discover new principles that
(𝑎) AmoebaNet approach (𝑏) Comparison in ImageNet
Figure 10.7: Evolutionary discovery in the NASNet search space compared to RL and random
search. In contrast with the open-ended search in CoDeepNEAT, the AmoebaNet method (Real,
Aggarwal, Y. Huang, et al., 2019) performs a more focused search. (𝑎) It evolves a stacked
architecture of inception-like normal and reduction modules (cells); these networks are then scaled
to larger sizes algorithmically. AmoebaNet also promotes regularization by removing the oldest
individuals in the population. (𝑏) As a result, it discovers architectures that are more accurate than
those discovered through random search and RL, reaching state-of-the-art accuracy in standard
benchmarks like ImageNet. Figures from Real, Aggarwal, Y. Huang, et al. (2019).
might follow them.
10.3.3 AmoebaNet
Even small improvements to performance are sometimes useful. If you are designing
a network to predict financial data, half a percent can translate to millions. If it is to
predict effects of treatments, it can save lives. Thus, NAS applied to the refinement of
existing ideas can play an important role. Perhaps the best example of such work is the
AmoebaNet system (Real, Aggarwal, Y. Huang, et al., 2019). At the time, it improved
the state-of-the-art in the ImageNet domain, which had been the focus of deep learning
research for several years. Human experts had designed many architectures and ideas for
it; AmoebaNet exceeded the performance of all of them by utilizing evolutionary neural
architecture search in a manner that mattered in practice.
Three innovations made this result possible. First, search was limited to the NASNet
search space (Zoph, Vasudevan, Shlens, et al., 2018), i.e. networks with a fixed outer
structure consisting of a stack of inception-like modules (figure 10.7𝑎). There were two
different module architectures, normal and reduction; they alternate in the stack, and
are connected directly and through skip connections. The architecture of the modules
is evolved, and consists of five levels of convolution and pooling operations. The idea
is that NASNet represents a space of powerful image classifiers that can be searched
efficiently. Second, a mechanism was devised that allowed scaling the architectures to
much larger numbers of parameters, by scaling the size of the stack and the number of
filters in the convolution operators. The idea is to discover good modules first and then
increase performance by scaling up. Third, the evolutionary process was modified to favor
younger genotypes by removing those individuals that were evaluated the earliest from the
population at each tournament selection. The idea is to allow evolution to explore more
instead of focusing on a small number of genotypes early on. These ideas are generally
useful in evolutionary ML, not just as part of the AmoebaNet system.
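A minimal sketch of this aging (regularized) evolution scheme is shown below; random_architecture, mutate, and train_and_evaluate are placeholders for an actual NAS setup, and the parameter values are illustrative rather than those used in AmoebaNet.

```python
import random
from collections import deque

def aging_evolution(random_architecture, mutate, train_and_evaluate,
                    population_size=100, tournament_size=10, cycles=1000):
    population = deque()                              # oldest individuals on the left
    for _ in range(population_size):
        arch = random_architecture()
        population.append((arch, train_and_evaluate(arch)))

    best = max(population, key=lambda x: x[1])
    for _ in range(cycles):
        sample = random.sample(list(population), tournament_size)
        parent = max(sample, key=lambda x: x[1])      # tournament winner
        child = mutate(parent[0])
        fitness = train_and_evaluate(child)
        population.append((child, fitness))
        population.popleft()                          # remove the oldest, not the worst
        if fitness > best[1]:
            best = (child, fitness)
    return best
```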
Indeed, AmoebaNet's accuracy was the state-of-the-art in the ImageNet benchmark
at the time. Experiments also demonstrated that evolutionary search in NASNet was
more powerful than reinforcement learning and random search in CIFAR-10, resulting in
faster learning, more accurate final architectures, and ones with lower computational cost
(figure 10.7𝑏). It also demonstrated the value of focusing the search space intelligently so
that good solutions are in that space, yet it is not too large to find them.
Thus, the LSTM node designs, CoDeepNEAT, and AmoebaNet demonstrated the potential of
evolutionary NAS in discovering new principles and making practical optimizations to existing ones.
A challenge for the future is to take them to transformers, diffusion networks, and beyond.
In the meantime, however, such approaches are useful in two important areas: optimizing
architectures for specific hardware constraints, and discovering architectures that can
perform well with little data by utilizing other tasks and datasets. These opportunities will
be discussed in the next section.
10.4 Multiobjective and Multitask NAS
In the NAS discussion so far, improved SOTA performance in the task has been the main
and only objective. Indeed, as mentioned above, in certain domains the cost of putting
together a large dataset and spending a lot of compute to achieve even small improvements
can be worth it. Benchmarks are also a good motivation for research: it is fun to compete
with other researchers in achieving better performance in them, and thus gain prestige and
recognition.
However, when new technologies are taken to the real world, a number of new, practical
challenges emerge. In particular, expertise to build good models may not be available; the
possibility of adversarial attacks may need to be taken into account; the models may run
on the edge, with limited compute and other hardware restrictions; the data may not be
sufficient in quality and quantity to train good models. Neural architecture search, and
meta-learning in general, can be used to cope with each of these challenges.
First, designing good models for new learning tasks still relies on scarce expertise. The
available frameworks, such as TensorFlow, PyTorch, and Keras, provide standard models as
starting points, and in many cases, they work well. However, the number of datasets and
problems where they can potentially be used is also very large, and applications could
often benefit even from small optimizations. Searching for appropriate architectures is
not the only optimization; other meta-learning dimensions such as activation functions,
loss functions, and data augmentation are useful as well, as is optimization of general
learning parameters (these approaches will be reviewed in chapter 11). The term "AutoML"
has been coined to refer to such processes in general: The user provides a dataset and
a starting point for learning, and the learning system configures itself automatically to
achieve better results (X. He, K. Zhao, and Chu, 2021; J. Liang, Meyerson, Hodjat, et al.,
2019). The goal is not necessarily to achieve state-of-the-art in any particular domain but
to reduce the human time and expertise needed to build successful applications. In this
manner, deep learning can have a larger impact in the real world.
Second, adversarial robustness is a crucial consideration outside of controlled benchmark
environments. In the real world, models are often exposed to carefully crafted
inputs, known as adversarial examples, that can lead to critical misclassifications.
Traditional defenses, such as adversarial training, are often limited in generalizability and
computationally expensive. A promising alternative is to frame NAS as an optimization
problem, where both standard accuracy and robustness to adversarial attacks are optimized
simultaneously. For example, robust architecture search (RAS; Kotyan and Vasconcellos
Vargas, 2020) extends NAS by explicitly incorporating adversarial accuracy into the fitness
function. The resulting architectures, discovered without adversarial training, display
structural patterns, such as high-dimensional projections and diverse computational
pathways, that contribute to their inherent robustness. This approach echoes insights
from manually designed models: for instance, WideResNet has been the state-of-the-art
for CIFAR-10 adversarial robustness since 2020, in part due to its architectural width
and capacity for feature diversity. RAS demonstrates that similar or even novel robust
features can be discovered automatically through neuroevolution.
Third, many applications cannot be deployed to run on data centers with dedicated
top-of-the-line hardware, but need to run on commodity compute, or even on highly
constrained compute at the edge: vehicles, drones, probes in extreme environments, as
well as watches, appliances, clothing, and so on. Only a fraction of the model sizes used
in research may be available in such applications, and there may be limitations on memory
structure, communication, latency, etc. NAS can play a significant role in optimizing the
models to perform as well as possible under such conditions.
In some cases, the constraints must be met entirely, or the solutions are unviable.
As usual in evolutionary computation, such constraints can be implemented as penalty
functions, thus allowing evolution to explore more broadly but eventually converge to
solutions that satisfy the constraints. It may also be possible to modify the solutions
algorithmically to make them comply; evolution will then find a way to optimize the
solutions under such postprocessing.
In other cases, the constraints incur a cost that needs to be minimized. NAS for such
applications is multiobjective, aiming at identifying good tradeoffs between performance
and cost outcomes. For instance, CoDeepNEAT can be extended with multiobjective
optimization to form Pareto fronts of accuracy and network size (J. Liang, Meyerson,
Hodjat, et al., 2019). In the domain of classifying X-ray images, a variety of tradeoffs
were discovered, but there was also a sweet spot in the front: an architecture that was
1/12th of the size of the best-performing network while only giving up 0.38% in accuracy
(figure 10.8). In a similar manner, other objectives could be included, such as training
time, the amount of training data needed, or energy consumption. Multiobjective NAS
can thus make many more deep learning applications feasible in the real world.
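The core of such multiobjective selection is Pareto dominance over the competing objectives. A minimal sketch, with made-up (accuracy, parameter-count) pairs rather than the X-ray results discussed in the text, is:

```python
def dominates(a, b):
    """a dominates b if it is at least as accurate and no larger, and strictly better in one."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(candidates):
    """Keep only the candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]

# Illustrative (accuracy, number of parameters) pairs.
candidates = [(0.84, 12e6), (0.83, 1e6), (0.80, 5e5), (0.79, 8e5), (0.845, 40e6)]
print(pareto_front(candidates))
```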
In the most extreme case along these lines, NAS can be used to optimize designs
for neuromorphic hardware. In order to minimize energy consumption, many such
architectures are based on spiking neurons, are small in size, and limited in connectivity.
Standard deep learning architectures are not well-suited for them, and there are many
Figure 10.8: Simultaneous optimization of network size and performance. The number of
parameters in the network is on the 𝑥-axis and the accuracy in classifying X-ray images into 14
different diseases is on the 𝑦-axis. The curves show the Pareto fronts obtained in a single-objective
evolution (of accuracy; green) and multiobjective evolution (of accuracy and number of parameters;
blue). Both populations include a range of tradeoffs, but the multiobjective evolution discovers
consistently better ones, including one at the elbow that is 1/12th of the size and 0.38% less accurate
than the top accuracy. In this manner, NAS can discover architectures that not only perform well
but also adhere to cost constraints, making more applications possible in the real world. For
an animation of this process, see https://neuroevolutionbook.com/demos. Figures from
J. Liang, Meyerson, Hodjat, et al. (2019).
opportunities to discover creative, new designs. A most interesting and potentially
fundamental way is to co-evolve the hardware design with the neural network design
simultaneously. In this manner, it may be possible to discover powerful solutions that are
highly specialized and customized to individual use cases. These opportunities will be
discussed in more detail in section 11.5.
The fourth real-world challenge is insufficient data. Indeed, data is now collected
everywhere, from small businesses, doctors' offices, and engineering firms to large-scale
transportation, weather, business, and education systems. Unfortunately, such data is
often siloed and not aggregated, and often also proprietary and intentionally kept in-house.
Even though the data could in principle be used to solve many prediction and optimization
problems, there is not enough of it to take advantage of modern machine learning. Such
models would simply learn to memorize and overfit and not perform well with future data.
Interestingly, in many such domains, it may be possible to build better models by
utilizing other datasets (Caruana, 1997; Meyerson and Miikkulainen, 2019). When a
model is trained to perform multiple tasks simultaneously, represented by different datasets,
it learns to encode each task based on synergies and commonalities between them. Such
common knowledge in turn establishes biases that make it possible to generalize better,
even when the training data within each task alone would be insufficient.
An important role for NAS is to discover architectures that take the best advantage of
such synergies between tasks. Many designs are possible (figure 10.9): If the tasks are
well-aligned, a single processing path with a different head for each task may be the best
Figure 10.9: Alternative approaches to multitask learning. When multiple tasks are learned
simultaneously, the network may discover and utilize general principles underlying them, and
perform better than when trained with each task alone. (𝑎) If the tasks are similar, a single column
with a different head for each task may work well. (𝑏) A more flexible architecture may consist of
a number of modules at each level, and each task uses them differently. (𝑐) In the most general
case, a customized topology may be used to support a number of different tasks. It is difficult to
decide which architecture works well; evolutionary NAS can be used to find optimal ways to do it.
Figure from Meyerson and Miikkulainen (2018a).
way to integrate them. Alternatively, many parallel paths can be constructed, and different
tasks will utilize them differently. If the tasks are sufficiently different, a complex topology
with different tasks performed at different levels based on customized topologies may be
needed. It is difficult to tell ahead of time which architectures work well; evolutionary
NAS is a good way to optimize them.
To motivate an approach, first consider training a simple network to support multiple
tasks. The network consists of a few tightly connected layers and has a number of decoder
layers on top, one for each task. The tasks can be real, i.e. be based on different datasets,
or they can be pseudotasks, constructed artificially by assigning a different set of labels to
the same training examples (Meyerson and Miikkulainen, 2018b). Gradient descent can
then be used to train this architecture.
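A minimal sketch of such a shared trunk with per-task decoder heads, written here in PyTorch with illustrative layer sizes and task counts, is:

```python
import torch
import torch.nn as nn

class MultitaskNet(nn.Module):
    def __init__(self, n_inputs=64, n_hidden=128, task_output_sizes=(10, 5, 2)):
        super().__init__()
        # Tightly connected shared layers, used by every task.
        self.trunk = nn.Sequential(
            nn.Linear(n_inputs, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
        )
        # One decoder head per (real or pseudo) task.
        self.heads = nn.ModuleList(
            [nn.Linear(n_hidden, n_out) for n_out in task_output_sizes]
        )

    def forward(self, x, task_id):
        return self.heads[task_id](self.trunk(x))

net = MultitaskNet()
logits_task0 = net(torch.randn(8, 64), task_id=0)
```

Losses from the different tasks are summed and backpropagated together, so the trunk is pushed to learn features that are useful across tasks.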
In the next step, the architecture consists of multiple levels of several such modules.
All modules are included at all levels, but the network learns to utilize them differently
at different levels for different tasks. Through gradient descent, they learn functional
primitives that are useful in several tasks (Meyerson and Miikkulainen, 2018a).
This is where neuroevolution comes in. It is possible to use evolution to discover
an optimal topology of these modules for each task. That is, each task has a different
organization of modules into a network topology, but the modules all come from the same
set, trained together via gradient descent in all tasks. In this manner, the modules still
learn to encode functional primitives; evolution figures out how to use these primitives
optimally in each task.
The final step, then, is to use CoDeepNEAT to evolve the structure of the modules
themselves (in the CMTR method; J. Liang, Meyerson, and Miikkulainen, 2018). In
this manner, (1) high-level evolution customizes the topology for each task, (2) low-
level evolution optimizes the structure of the modules so that they can extract common
knowledge most effectively, and (3) gradient descent extracts the common knowledge
across tasks and encodes it into the modules.
This approach was demonstrated e.g. in the Omniglot domain, i.e. in recognizing
handwritten characters in multiple different alphabets (Lake, Salakhutdinov, and Tenen-
baum, 2015; J. Liang, Meyerson, and Miikkulainen, 2018). While the alphabets are quite
different, they are still related in that each consists of shapes and combinations of lines in
a limited area. While there are only 20 examples of each character, there are 50 different
alphabets, and therefore multitask learning is an effective way to combine knowledge
from all alphabets to learn each one well. Moreover, evolutionary optimization makes it
possible to learn and utilize common knowledge well, as well as to specialize: The CMTR
approach improved the state-of-the-art by 30% in this domain.
It is interesting to see the solutions CMTR created (figure 10.10). In general, the more
complex the alphabet, the more complex the topology. One example is Angelic, a synthetic
alphabet designed in the 1500s to communicate with angels. It is more decorative and
unique than most, and the network constructed for it is complex. Also, alphabets that
look similar have similar networks. For instance, Hebrew and N’ko both have dominant
horizontal lines, and their network topologies are similar; Latin and Cyrillic are similar as
well. Interestingly, when evolution is run multiple times, consistent topologies emerge for
the same language each time, suggesting that they indeed capture essential representations
for each task. It would be difficult to come up with such representations by hand, but
evolutionary NAS does it reliably.
Multitask learning has been demonstrated to work well even when the tasks are very
different. For instance, language learning, vision, and genomic structure prediction can all
be mutually informative, even though they represent very different domains in the world.
A method for aligning the parameters across such differences is needed, but with such a
method, it seems possible to support many disparate domains with many others (Meyerson
and Miikkulainen, 2019).
Apparently, the world is based on a set of fundamental principles and structures that re-
peat across domains, perhaps as low-dimensional manifolds embedded in high-dimensional
spaces. Thus, learning to understand part of the world helps in understanding other parts. It
may be possible to take advantage of this observation to evolve supernetworks, consisting
of modules that can be reused in different configurations, to learn new tasks (section 10.5).
More generally, it may be possible to construct a central facility that learns and represents
these regularities as variable embeddings, and different tasks are then established by
learning specialized encoders and decoders of this knowledge (as in the traveling observer
model, or TOM; Meyerson and Miikkulainen, 2021). This approach can be instantiated
through multitask learning and evolution. It may also be possible to utilize LLMs as
the central facility, and then use evolution to discover the customized encoders and decoders.
While such architectures do not yet exist, the approaches reviewed in this section are a
possible starting point for constructing them. This is one approach that might, in the long
term, lead to agents with general intelligence.
Figure 10.10: Network topologies discovered for different handwritten alphabets. Each
network is trained to recognize handwritten characters of one alphabet. However, each topology is
constructed from the same set of neural network modules (indicated by color) and thus such training
results in modules that encode the underlying functional primitives of many tasks. More complex
alphabets receive more complex topologies, and similar alphabets receive similar topologies.
The resulting topologies are consistent across several runs of evolution and training, suggesting
that they indeed capture underlying principles. Even though the training data is limited for each
task, the primitives make it possible to learn each task well, better than if the networks were
trained from scratch with their own data only. Thus, NAS can be used to tie together learning
of multiple tasks so that learning with otherwise insufficient data is possible, making it possible
to extend machine learning to more real-world tasks. For an animation of this evolutionary
process, an interactive character recognition demo, and other demos on multitask evolution, see
https://neuroevolutionbook.com/demos.
10.5 Making NAS Practical
Even in settings where NAS can make useful discoveries, the approaches are still limited
by available computation. Efficient implementations can make a big difference, leading
to better solutions. The approaches involve evaluating a large number of neural network
designs, which is very expensive. Training a deep learning network can take several days,
and a search for good designs may need to evaluate millions of candidates. If the search
simply runs as an outer loop, it will be limited to a few hundred or thousand candidates.
Several principled efficiency optimizations are possible. One important one is to
utilize surrogate models. Instead of modeling how the world will respond to a solution, as
was done in section 6.4.2, they model the solutions directly, i.e. how well each solution
is going to perform in the task. This approach is useful in meta-learning in general: In
its most general form, it powers bilevel evolution, i.e. an approach where an outer-loop
evolution optimizes the parameters of an inner loop evolutionary process (section 11.2). It
can be instantiated to speed up search in all aspects of meta-learning, including that of
activation functions (section 11.3.2).
Surrogate models are usually trained with a sample of solutions. For instance in NAS,
Figure 10.11: The MSuNAS approach for evolving convolutional networks. The idea is to make
search practical by limiting the search space and by guiding the search. The search space consists of
five computational blocks, and is parameterized through the number of layers, kernel size, channels
(that expand through the layers), and input resolution. (𝑎) The parameters are selected from a
prespecified set and can be coded either as variable-length (𝑏) or fixed-length (𝑐) individuals. A supernet
is created with the largest values and subsumes the entire search space. Good tradeoffs between
performance and other objectives are then found in this space using the NSGA-II multiobjective
search method. A surrogate model, trained with a sample of architectures in this space, is used to
guide the search, and the trained supernet to initialize the weights of the candidates. The approach
can find architectures that perform better than or similar to standard architectures, and are smaller, with
significantly less training. Figure from Z. Lu, Deb, Goodman, et al. (2020).
a set of different architectures is created and evaluated ahead of time, the model trained to
map architecture descriptions to performance, and then used to predict the performance of
new solutions. Several such benchmark collections have already been created, and they
can serve as a catalyst for studying NAS methods in general (Dong and Y. Yang, 2020;
Ying, Klein, Christiansen, et al., 2019; Zela, Siems, Zimmer, et al., 2022).
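As a minimal sketch, a surrogate can be as simple as a regressor from a numeric architecture encoding to measured accuracy; the encodings and accuracies below are illustrative placeholders, not data from any of the benchmarks cited above.

```python
from sklearn.ensemble import RandomForestRegressor

# Each row encodes an architecture, e.g. (number of layers, channels, kernel size).
archs = [[10, 32, 3], [14, 64, 3], [20, 64, 5], [8, 16, 3], [18, 128, 5]]
accs  = [0.88, 0.91, 0.93, 0.84, 0.94]   # obtained by actually training these networks

surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
surrogate.fit(archs, accs)

# During search, new candidates are ranked by predicted accuracy; only the most
# promising ones are trained for real, and their results refine the surrogate.
print(surrogate.predict([[12, 64, 3], [22, 128, 5]]))
```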
Another way of making NAS practical is to limit the search space. The AmoebaNet
method (section 10.3.3) already took advantage of this by optimizing the variations of
a repetitive structure. In a more extreme approach, a supernet is first created, i.e. a
large network that consists of the entire search space, including all possible layers, their
variations, and connections between them (Cha, T. Kim, Lee, et al., 2023; Chebykin,
Alderliesten, and Bosman, 2022; Fernando, Banarse, Blundell, et al., 2017). The supernet
is then trained in the task (at least partially). It then serves as a starting point for creating
candidates during search, providing the search space and initial evaluations. This approach
makes sense if the goal is not just to find the best-performing network (for which the
supernet itself might be the best choice), but at the same time, achieve other objectives
like minimizing the size of the solutions.
Several of these ideas were implemented in the MSuNAS approach, where the
NSGA-II multiobjective optimization method was adapted to NAS of convolutional
image-processing networks (figure 10.11; Z. Lu, Deb, Goodman, et al., 2020). The
search space was restricted to networks with five computational blocks with four design
parameters, i.e. the number of layers, the number of channels, the kernel size, and the
input resolution, each with a predetermined range. A supernet was created by setting each
of these parameters at their maximum values; thus all other candidates in the search space
were enclosed in it. A surrogate model was trained with 2000 randomly sampled networks
in this space. Each network was trained for 150 epochs on CIFAR-10, CIFAR-100, and
ImageNet, and evaluated with 5,000 unseen images. The supernet was trained in this task
as well, and its weights were used to initialize the candidates during search.
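The flavor of such a constrained search space can be sketched as follows; the parameter names and ranges are illustrative, and the per-block details of MSuNAS are omitted.

```python
import random

SEARCH_SPACE = {
    "n_layers":   [2, 3, 4],
    "kernel":     [3, 5, 7],
    "channels":   [16, 24, 32],
    "resolution": [160, 192, 224],
}

def sample_candidate():
    """Draw one architecture description from the prespecified options."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

# The supernet takes the largest value of every parameter, so any sampled
# candidate is a sub-network of it and can inherit its trained weights.
SUPERNET = {k: max(v) for k, v in SEARCH_SPACE.items()}

print(sample_candidate())
print(SUPERNET)
```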
The approach found solutions that represented useful tradeoffs in this domain. The
most accurate architectures performed as well or better than standard architectures, and
many of them were much smaller as well. The surrogate modeling approach resulted
in several orders of magnitude faster learning. These results suggest that NAS can be a
practical and useful technique in searching for variations in a limited search space.
Sometimes such methods are called one-shot methods, because the supernet is trained
to represent the entire search space. The more general approach consists of black-box,
or zeroth-order, methods, where the search space is open-ended (such as CoDeepNEAT
described in section 10.3.2). Such methods have more potential for discovery, but it is
more difficult to make them efficient and therefore to take full advantage of them.
Intermediate approaches may provide a good tradeoff. For instance, it is possible
to limit NAS to traditional convolutional networks only, i.e. those with a number of
convolutional and pooling layers followed by a number of fully connected layers (as
opposed to very deep networks with many skip connections such as ResNet or DenseNet).
Such a limited search space allows customizing many aspects of the NAS process, making
it efficient.
In one such approach, EvoCNN (Sun, Xue, M. Zhang, et al., 2020), it was possible
to design a variable-length representation for the architecture that allows networks of
variable sizes to be represented systematically and compactly. The population could then
be initialized as a random sample of such architectures, instead of minimal networks,
providing for a more comprehensive search process. On the other hand, the number of
parameters was used as a fitness component during evolution, favoring smaller networks,
thus making sure that the complexity that was there actually mattered. Weight initialization
was also included as part of the representation as mean and standard deviation values for
sets of connections. As is well-known in deep learning (and discussed in more detail below),
good initialization makes it more likely that the architecture performs as well as it can,
resulting in more consistent and fair evaluations. Genetic operators were then designed to
operate efficiently on such architectures. With these customizations, EvoCNN performed
better than hand-designed traditional CNN architectures. Also interestingly, the
evolved initialization performed better than standard initialization methods, such as Xavier
(Glorot and Y. Bengio, 2010).
Part of why fully general (zeroth-order) methods are challenging to design is because
it is difficult to implement even basic evolutionary search, i.e. crossover. The architectures
are usually represented as graphs, and they suffer from the permutation problem (or
competing conventions problem): the same functional design can be coded in several
different ways simply by changing the order of elements in it. The permutation problem
makes crossover ineffective, which is why most black-box methods rely only on mutation.
As a matter of fact, the same issue exists in many other areas of evolutionary
computation, to the extent that the entire validity and usefulness of crossover is sometimes
called into question (Qiu and Miikkulainen, 2023). Yet, biology utilizes crossover very
effectively, creating solutions that are viable and creative (section 9.1.1). This observation
suggests that perhaps we do not understand crossover very well, and our implementations
of it are lacking something.
Interestingly, NAS can be used as a domain to gain insight into the general problem
of what makes crossover useful (Qiu and Miikkulainen, 2023). Two architecture repre-
sentations can be compared through graph edit distance (GED), measuring how many
modifications are necessary to transform one into the other. This metric can then be used
to construct a crossover operator that results in individuals that lie along the shortest
edit path (SEP) between them. It turns out that theoretically the expected improvement
from the SEP crossover is greater than the improvement from local search (i.e. mutation),
from standard crossover, and from reinforcement learning. These theoretical conclusions
can be demonstrated numerically, as well as in practical evaluation in various NAS
benchmarks: They converge to optimal architectures faster than other methods, even with
noisy evaluations.
Thus, crossover can be a useful tool in NAS if implemented in the right way. More
generally, if evolutionary computation is not using crossover, it is probably leaving money
on the table.
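The metric behind SEP crossover can be illustrated with off-the-shelf tools; the sketch below measures graph edit distance between two tiny, made-up architecture graphs using networkx, and is not the SEP crossover implementation itself.

```python
import networkx as nx

g1 = nx.DiGraph()
g1.add_nodes_from([(0, {"op": "input"}), (1, {"op": "conv3x3"}),
                   (2, {"op": "pool"}), (3, {"op": "output"})])
g1.add_edges_from([(0, 1), (1, 2), (2, 3)])

g2 = nx.DiGraph()
g2.add_nodes_from([(0, {"op": "input"}), (1, {"op": "conv5x5"}), (2, {"op": "output"})])
g2.add_edges_from([(0, 1), (1, 2)])

# Number of node/edge insertions, deletions, and substitutions needed to turn
# one graph into the other; SEP crossover produces offspring on a shortest
# edit path between the two parents.
print(nx.graph_edit_distance(g1, g2, node_match=lambda a, b: a["op"] == b["op"]))
```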
Several other useful tools were initially developed with NAS in mind, but have proven
valuable in neuroevolution, evolutionary computation, and neural networks more broadly.
An important one is to initialize the networks in a proper way before training (Bingham
and Miikkulainen, 2023a). In deep learning, a fundamental challenge is that the signals
(activation and gradients) may vanish or explode. If the network weights are initialized so
that the activation stays within reasonable bounds, training is more likely to be successful.
In NAS, this means that the evaluation of the candidate is more reliable, making the
search more effective. The initialization can be done in various ways and customized
to specific activation functions, topologies, layers, and even data. However, there is a
general principle that works well in most cases: Setting the weights of each layer so that
the outputs have zero mean and unit variance.
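As a minimal illustration of this principle (a simplification, not the AutoInit derivation itself), a layer's weights can be rescaled empirically so that its outputs have approximately unit variance for unit-variance inputs:

```python
import numpy as np

def init_layer(n_in, n_out, rng, n_probe=10_000):
    """Scale standard-normal weights so layer outputs have roughly zero mean, unit variance."""
    w = rng.standard_normal((n_in, n_out))
    probe = rng.standard_normal((n_probe, n_in))      # unit-variance probe signal
    w /= (probe @ w).std(axis=0, keepdims=True)       # normalize each output's scale
    return w

rng = np.random.default_rng(0)
w = init_layer(256, 128, rng)
x = rng.standard_normal((1000, 256))
print((x @ w).mean(), (x @ w).std())                  # approximately 0 and 1
```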
In a method called AutoInit, such weight initialization was derived for the most common
layer types (Bingham and Miikkulainen, 2023a). Experimentally, AutoInit resulted in faster
and more reliable convergence for convolutional, residual, and transformer architectures,
various hyperparameter settings, model depths, data modalities, and input sizes. It was
also shown to be particularly useful in meta-learning of activation functions, and in NAS.
When implemented in CoDeepNEAT, it adapted to each candidate's unique topology and
hyperparameters, improving its performance in several benchmark tasks. As expected,
much of this improvement was due to reduced variance in evaluations. However, AutoInit
also allowed utilizing a broader set of hyperparameter values and topologies. Some such
solutions are difficult to train properly and only perform well with proper initialization.
Thus, intelligent initialization makes it possible for NAS to find more creative solutions as
well.
Ultimately, NAS methods need to run on parallel hardware and utilize such computation
well. Like all evolutionary algorithms, NAS is well suited for such hardware because
candidate evaluations can be performed at different compute nodes. However, evaluation
times can sometimes be very long and vary significantly. It is therefore important that such
(𝑎) Evolving individual encodings (𝑏) Coevolving hierarchical encodings
Figure 10.12: Asynchronous evaluation of individual and coevolutionary encodings. One
challenge in parallelizing the evaluation of neuroevolution candidates is that the evaluation times
may vary. Therefore, instead of evaluating an entire generation of candidates synchronously
before generating new ones, candidates are placed in a queue and evaluated as soon as compute
nodes become available. In this manner, compute nodes are never idle and evaluation can be
sped up significantly. (𝑎) With encodings that represent the entire solution, the population
and elites are maintained as usual, and evolution progresses in batches of 𝑀 individuals. (𝑏)
With coevolutionary encodings such as CoDeepNEAT, the individuals are created and fitness is
distributed among participating blueprint and module populations. The process favors individuals
with short evaluation times, which means that 𝑀 needs to be larger when those times vary a lot.
However, the speedup is also larger, e.g. 14-fold for CoDeepNEAT. The bias towards networks
that evaluate fast is also beneficial in NAS, resulting in more desirable solutions as a surprising
side benefit. Figures from J. Liang, Shahrzad, and Miikkulainen (2023).
evaluations are asynchronous: The nodes should not sit idle waiting for other candidates in
a generation to finish their evaluations, but should take on other evaluations immediately
(J. Liang, Shahrzad, and Miikkulainen, 2023).
Asynchronous evaluation, therefore, is based on an evaluation queue rather than
generations (figure
10.12). Individuals are created and evaluated, and the elite set is
updated continuously. While several such implementations exist already (including rtNEAT
discussed in section 8.1), the approach is more complex with more sophisticated NAS
methods that take advantage of structure. For instance with CoDeepNEAT, individuals
exist at the level of modules and blueprints, and both populations are speciated into
subpopulations with their own elites. Thus, there are several evolutionary processes going
on at the same time. When an assembled network is evaluated, the resulting fitnesses are
incorporated into these processes asynchronously.
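A minimal sketch of such a queue-based scheme, using a simple numeric stand-in for network evaluation rather than actual training, is shown below; the candidate encoding and fitness function are illustrative only.

```python
import random
from concurrent.futures import ProcessPoolExecutor, FIRST_COMPLETED, wait

def evaluate(candidate):
    """Stand-in for an expensive training run."""
    return -sum((g - 0.5) ** 2 for g in candidate)

def random_candidate():
    return [random.random() for _ in range(8)]

def mutate(candidate):
    return [g + random.gauss(0, 0.1) for g in candidate]

if __name__ == "__main__":
    elites = []                                        # (fitness, candidate) pairs
    with ProcessPoolExecutor(max_workers=4) as pool:
        pending = {pool.submit(evaluate, c): c
                   for c in (random_candidate() for _ in range(8))}
        for _ in range(100):
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                cand = pending.pop(fut)
                elites = sorted(elites + [(fut.result(), cand)], reverse=True)[:5]
                # Refill the queue immediately so no worker sits idle.
                child = mutate(random.choice(elites)[1])
                pending[pool.submit(evaluate, child)] = child
    print(elites[0][0])
```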
Note that although there are no generations, the evolutionary processes still need
to progress in batches. That is, 𝑀 individuals need to be evaluated and their fitnesses
propagated to the current populations before another 𝑀 can be generated, even though the
individuals may have different ancestries and, in a sense, belong to different generations.
As usual in evolution, the batch size 𝑀 needs to be optimized for each problem, balancing
the time used for evaluation and for search, i.e. how much evaluation noise can be tolerated.
However, with variable evaluation times, batch evaluations establish a search bias: Those
candidates that evaluate faster are more likely to be included in the batch, and thus more
likely to reproduce. Thus, in domains where the evaluation times are relatively uniform,
𝑀 can be small, and search proceeds faster. However, if the times vary significantly,
𝑀 needs to be larger so that evolution is based on more diverse candidates.
In NAS, such a bias is fortunately not a problem. With variable evaluation times, the
speedup from asynchrony grows faster than the handicap from reduced diversity. For
instance in designing sorting networks, where the times are relatively similar, asynchronous
search finds solutions twice as fast as synchronous search. In CoDeepNEAT, where the
times vary a lot, the speedup is 14-fold. Moreover, a bias towards faster networks is
desirable in any case. Even if it is not an explicit secondary objective, smaller networks
that evaluate faster are preferred over complex networks. In this sense, asynchronous
evaluation provides an advantage not only in speed, but in the quality of solutions as well.
10.6 Beyond Neural Architecture Search
While NAS is still a work in progress, many interesting and useful ideas have already
stemmed from the field, ideas that have impacted other subfields of AI. As was discussed
in section 10.2, one of the main limiting factors of NAS is the two-stage optimization
process: One must search for the architecture in the outer loop, and spend a lot of
computation in the inner loop to train each model. However, it turns out that the inner loop
may not be as crucial in identifying good architectures as initially thought. Given that
NAS mostly focuses on optimizing architectures with known, powerful building blocks, it
may be possible to predict their performance without training them. A surrogate model
can be trained based on a benchmark dataset of architectures and their performance for this
task. Or, a hypernetwork can be used to predict the weights, making it possible to evaluate
and rank candidates without having to train them (Brock, T. Lim, Ritchie, et al., 2018).
In the extreme, it turns out that even randomly initialized CNNs (Ulyanov, Vedaldi, and
Lempitsky, 2018) and LSTMs (Schmidhuber, Wierstra, Gagliolo, et al., 2007) have useful
properties without any training. This leads to an important question: How important are
the weight parameters of a neural network compared to its architecture? An approach
called weight agnostic neural networks (WANNs; Gaier and Ha, 2019) evaluated the extent
to which neural network architectures alone, without learning any weight parameters, can
encode solutions for a given task. The basic idea was to apply a simple topology search
algorithm, NEAT, but explicitly make the weights random. To evaluate these networks,
the connections were instantiated with a single shared weight parameter sampled from a
uniform random distribution, and the expected performance was measured over multiple
such instantiations. It turned out that WANNs could perform several reinforcement
learning tasks, and achieved much higher than chance accuracy on supervised tasks such
as the MNIST classification (figure 10.13). This result suggests that NAS alone may be
sufficient to solve some problems without any gradient descent. Indeed, in many biological
species the young are already proficient in many survival tasks without any learning; NAS
with random weights can be seen as an approximation of this process.
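The evaluation idea can be sketched as follows: a fixed topology is scored by its expected performance over several shared-weight values, so fitness reflects the architecture rather than any particular weights. The tiny task and topology below are illustrative placeholders, not the actual WANN implementation.

```python
import numpy as np

def forward(topo_order, incoming, inputs, shared_w):
    """Propagate through a feedforward graph in which every connection uses one shared weight."""
    act = dict(enumerate(inputs))                     # activations of input nodes 0..k-1
    for node in topo_order:                           # hidden and output nodes, in order
        act[node] = np.tanh(sum(shared_w * act[src] for src in incoming[node]))
    return act[topo_order[-1]]                        # last node is the output

def wann_fitness(topo_order, incoming, samples,
                 weights=(-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)):
    scores = []
    for w in weights:                                 # average over shared-weight samples
        err = [(forward(topo_order, incoming, x, w) - y) ** 2 for x, y in samples]
        scores.append(-np.mean(err))
    return np.mean(scores)

# Two inputs (nodes 0, 1), one hidden node (2), one output node (3).
incoming = {2: [0, 1], 3: [2]}
samples = [([0.1, 0.3], 0.2), ([0.5, -0.2], 0.1)]
print(wann_fitness([2, 3], incoming, samples))
```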
(𝑎) Bipedal walking (𝑏) Race-car driving (𝑐) Recognizing handwritten digits in MNIST
Figure 10.13: Solving problems with NAS alone without gradient descent. In the WANN
approach, network architectures are evolved with a shared random value for weights. Surprisingly,
without any gradient descent, they can solve reinforcement learning tasks such as bipedal walking
and driving, and perform competently (at 94%) in MNIST handwritten digit classification. The
diagram on the left side of (𝑐) is part of an interactive demo that shows which parts of the input and
network are used to classify different digits. WANN networks can be seen as a model of precocial
performance in many animal species, where newborn individuals already perform well in a number
of tasks necessary for survival without any experience or learning. For interactive demos, see
https://neuroevolutionbook.com/demos. Figures from Gaier and Ha (2019).
A complementary direction is to not only evolve architectures from scratch but also to
transfer and analyze knowledge across tasks. Recent work on evolutionary NAS (Assunção,
Lourenço, Ribeiro, et al., 2021) shows that incremental transfer learning can significantly
reduce the search cost by reusing layers, learning rules, and optimizers from previous tasks.
Importantly, this process can be studied through search trajectory networks (Ochoa, Malan,
and Blum, 2021; Sarti and Ochoa, 2021), which provide a graph-based visualization
of how architectures mutate, converge, and inherit components. These analyses reveal,
for example, that convolutional and dropout layers tend to be consistently reused, while
pooling layers are often discarded. Such insights highlight how evolutionary NAS not only
discovers effective architectures but also builds interpretable trajectories of architectural
knowledge, bringing it closer to how biological evolution refines innate structures over
generations.
Another compelling direction is to develop methods that discover the building blocks
as well. They can be seen as components of neural network architectures that have an
appropriate inductive bias for a variety of tasks. This approach is motivated by how
biological evolution works, in that individuals are not born with simply a blank slate
neural network to be trained using gradient descent, but one that already implements a
wide variety of useful innate behaviors that also impact their development. To quote Tony
Zador, a computational neuroscientist (Zador, 2019): “The first lesson from neuroscience
is that much of animal behavior is innate, and does not arise from learning. Animal brains
are not the blank slates, equipped with a general-purpose learning algorithm ready to
learn anything, as envisioned by some AI researchers; there is strong selection pressure
for animals to restrict their learning to just what is needed for their survival.”
Ideas have also emerged on how to move back from designing large deep learning
architectures to optimizing such architectures entirely with evolution, including their
weights. For instance, indirect encodings, such as HyperNEAT, can be used to optimize a
very large number of weights by sampling the substrate more densely. In a more direct deep
neuroevolution approach (which we reviewed in section 4.2.2), deep network weights are
represented compactly as a list of random number seeds: One for the initialization of the
network and the rest for the random mutations that construct the network (Petroski Such,
Madhavan, Conti, et al., 2017). Another approach is based on ant colony optimization:
The ants traverse the architecture space from input to output, and the network is constructed
based on their paths. Architectures of any size can be constructed in this manner, and the
paths can include a weight dimension as well (ElSaid, Ricanek, Lyu, et al., 2023).
Many other promising ideas have emerged from the NAS field. Rather than searching
for architecture, researchers have applied similar methods to search for better loss
functions, activation functions, learning methods, and data augmentation methods. These
optimizations are highly relevant even when network architectures have largely converged
on a few best designs, such as transformers. Such approaches will be discussed in
more detail in the next chapter, where we go beyond optimizing neural architectures to
optimizing the general design of neural networks.
In the long term, an interesting question is: what would it take to discover entirely new
architectures, based on new principles? For instance, how could NAS have discovered
transformers? Beyond simply scaling up with repetition, a search for appropriate
mathematical operations on internal representations would have been needed. A challenge
is that such a search space may be deceptive (as was discussed in the context of discovering
cognitive behaviors in section 6.3.2), and therefore mechanisms for neutral mutations,
weak selection, large populations, speciation, and deep time may be needed. Further,
could such approaches discover something more powerful than transformers, for instance
neural network architectures that know what they know, and networks that can perform
logical reasoning? It may be possible to incorporate biological processing principles of
feedback, adaptation, memory, and attention, and they could then lead to the discovery of
metacognitive abilities. Or it may be possible to include meta-level computing primitives
that allow networks to observe and act upon their own processes. In addition to the
technical challenges, it will be challenging to evaluate such abilities because they no
longer reduce to simple performance numbers. Such research has only now begun, and
may indeed drive the development of the next level of more powerful AI architectures.
10.7 Chapter Review Questions
1. NAS Approaches: What are the primary methods used in Neural Architecture Search (NAS) to automate the design of neural network architectures? Why is evolutionary optimization particularly well-suited for this task?
2. Backprop NEAT: How does Backprop NEAT combine NEAT topology search with backpropagation? What role do activation function diversity and fitness regularization play in improving the evolved networks?
3. Feature Discovery: In the context of Backprop NEAT, how does the algorithm discover features that are typically engineered manually, such as those required for classifying concentric circles or XOR data?
4. CoDeepNEAT: How does the CoDeepNEAT approach leverage modular evolution to discover neural architectures? What advantages does its blueprint-module coevolution provide compared to evolving full architectures directly?
5. AmoebaNet Contributions: What innovations in AmoebaNet's evolutionary process enabled it to achieve state-of-the-art performance in ImageNet? How did these innovations improve the efficiency and accuracy of the NAS process?
6. Multiobjective Optimization: How does multiobjective NAS differ from single-objective NAS? What advantages does it offer when deploying neural networks in resource-constrained environments?
7. Pareto Fronts: Explain the concept of Pareto fronts in the context of NAS. How are they used to optimize trade-offs between objectives such as model accuracy and size?
8. Multitask Learning: What are the benefits of using NAS to discover architectures for multitask learning? How do alternative designs (e.g., single-column vs. complex topologies) address differences between tasks?
9. Module and Topology Co-Evolution: In multitask NAS, how does the co-evolution of module structures and task-specific topologies (e.g., in CMTR) enhance learning across tasks with limited data?
10. NAS Efficiency: What strategies, such as surrogate modeling and supernets, have been developed to make NAS computationally practical? How do they maintain effectiveness while reducing search costs?
Chapter 11
Optimization of Neural Network
Designs
Similarly to neural network architectures, the general design of neural networks can
benefit from complexity beyond human ability to optimize them. This chapter reviews
opportunities for such optimization, also called meta-learning. The general motivation for
designing learning systems through automated search is first discussed, and a compelling
example is given in bilevel neuroevolution, i.e. optimizing the neuroevolution mechanisms
through evolution. Several aspects of supervised neural network design are amenable to
meta-learning, including loss functions, activation functions, data augmentation, and the
learning methods themselves, leading to potential synergies. Neuromorphic systems, where
neural network architectures are optimized for and potentially together with hardware, are
a particularly promising application for these neuroevolution techniques.
11.1 Designing Complex Systems
Many areas of technical design are too complex for humans to optimize, and automated
methods must be used instead. VLSI design has long relied on machine optimization, but
other areas of engineering are starting to rely on it as well. The systems have become larger,
with many interacting elements, and several simultaneous performance goals. The sheer
dimensionality and size of the search space are too large to handle without an automated
search.
Evolutionary optimization is particularly well-suited to such scaling. In some cases,
like designing circuitry for a 70-bit multiplexer, it was possible to find solutions in a space
with $2^{2^{70}}$ potential solutions. While it is hard to imagine a space that large, consider that if
that number of potential solutions was printed on paper with a 10pt font, it would take
light 95 years to travel from the beginning to the end of the number (Miikkulainen, 2021).
In others, like designing an optimal schedule for metal casting, there are variables for
each type of object in each melting heat, and there may be tens of thousands of heats,
resulting in a billion variables (Deb and Myburgh, 2017). Such scaling is possible because
the population can discover partial solutions that can then be used as stepping stones to
construct more complete ones, thus requiring exploration of only a fraction of the space
and combinations of dimensions.
On the other hand, sometimes the scale is not the main problem, but complexity is:
Problems can have nonlinear interactions and even be deceptive so that good solutions are
overlooked. It is not just that search needs to be automated, but it should be intelligent
enough to handle deception, such as evolutionary search. For instance, the original
nose-cone of the Shinkansen bullet train was long and sleek, with great aerodynamics, but
it created a bang when going into a tunnel. In the next version, the engineers wanted to
eliminate the bang, but it was difficult to do so by hand. However, they were eventually
able to do so by harnessing evolutionary optimization: a cone with deep grooves on
both sides (Ishida Lab, 2018). It was unconventional and unlikely to be discovered by
human engineers, but it got the job done. Similarly, evolution discovered that it may be
advantageous to keep the lights on 24 hours in computer-controlled greenhouses: basil
doesn't need to sleep (Miikkulainen, 2021). Further, webpage designs were found that
violated well-known design principles with garish colors and active language, yet they
were more effective in engaging users: What the human designers referred to as an "ugly
widget generator" actually beat their design by 45% (Miikkulainen, Brundage, Epstein,
et al., 2020).
Similar stories abound in all areas of engineering, from drug design and medical
treatments to programming and autonomous control (see e.g. Lehman, Clune, Misevic,
et al., 2020, for examples). As a matter of fact, the annual human-competitive results
competition ("Humies") at the GECCO Conference has showcased hundreds of such
approaches since 2004 (Goodman, 2025).
This insight applies to neuroevolution as well. While so far in this book, evolution has
been used to optimize the network itself, i.e. its topology and weights, any aspect of the
design can be evolved. Opportunities include the overall architecture, activation functions,
loss functions, data augmentation, learning mechanisms, and even the neuroevolution
optimizer itself. As a result, the networks can perform more accurately, generalize better,
and/or use fewer resources than those designed by hand. Collectively, these approaches
are called meta-learning, which is the topic of this chapter.
11.2 Bilevel Neuroevolution
Several examples of neuroevolution discovering complex and robust behavior were
reviewed in chapter 6. Indeed, many such domains include a large number of variables
that interact nonlinearly, making it difficult to design control algorithms using traditional
methods. While neuroevolution can often be used effectively to construct robust controllers,
it is still crucial to get the parameter settings right. Most often, the experiments require a
painstaking search in the space of learning parameters, such as mutation and crossover
rates and extent, population size, elite percentage, number of stochastic evaluations, etc.
There are many such parameters and they interact nonlinearly, making the usual grid
search of possible combinations ineffective.
An elegant and compelling solution is to use bilevel evolution to optimize the
parameters (J. Liang and Miikkulainen, 2015). That is, the optimization process is defined
in terms of two nested problems (figure 11.1a):

$$\max_{p_u} \; F_u(p_u) = E[F_l(p_l) \mid p_u] \qquad (11.1)$$
$$\text{subject to} \quad p_l = O_l(p_u), \qquad (11.2)$$

where $E[F_l(p_l) \mid p_u]$ is the expected performance of the neural network with parameters (i.e. weights) $p_l$, obtained by the lower-level optimization algorithm $O_l$ (i.e. neuroevolution) with parameters $p_u$, which are in turn maximized by a separate upper-level optimization algorithm $O_u$.
Bilevel evolution is a special case of meta-evolutionary EAs (MEAs; Eiben and Smit,
2011; Grefenstette, 1986; Sinha, Malo, Xu, et al., 2014) where evolution is used to optimize
algorithms offline. It is related to self-adaptive EAs where evolutionary parameters are
adjusted online depending on progress in the optimization (Kramer, 2010; Kumar, B. Liu,
Miikkulainen, et al., 2022). In its most straightforward form, each fitness evaluation of
each high-level individual $p_u$ requires running an entire neuroevolution experiment. The crucial idea of bilevel optimization is to estimate the fitness of $p_u$ without having to run such an experiment every time. In essence, the idea is the same as surrogate optimization for decision-making, discussed in section 6.4.2. Each run of a neuroevolution experiment can be considered as a sample, and a predictor model learned to approximate the fitness landscape. The upper-level search can then be done mostly against the surrogate, with only occasional neuroevolution experiments needed.
A simple approach is to fit e.g. a quadratic function to these samples (Sinha, Malo,
Xu, et al., 2014). A more complex one is to train a random forest or a neural network, as
was done in section 6.4.2: Such models are nonparametric, i.e. more general, and less
prone to overfitting. Forming the surrogate is still difficult because there are usually very
few samples and they are noisy. One way to deal with this problem is to construct the
fitness $F_u$ from multiple metrics over several neuroevolution runs with $p_u$, including best and average fitness and standard deviation, diversity of the population, and the shape of the learning curve. In effect, the idea is to predict the eventual performance of $p_u$ after prolonged evolution, and to take into account the reliability of this estimate.
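To make the structure concrete, the sketch below shows a minimal surrogate-assisted upper-level loop in Python. The `run_neuroevolution` function is a placeholder for the expensive lower-level experiment $O_l$ (a synthetic objective stands in so the sketch is self-contained), and a random-forest surrogate ranks candidate hyperparameter vectors $p_u$; it is an illustration of the idea under these assumptions, not the implementation of J. Liang and Miikkulainen (2015).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def run_neuroevolution(p_u, n_runs=3):
    # Placeholder for the expensive lower-level experiment O_l: in practice this
    # would run full neuroevolution with hyperparameters p_u and return summary
    # metrics (best/mean fitness, diversity, learning-curve shape). A synthetic
    # objective stands in here so the sketch runs on its own.
    rng = np.random.default_rng()
    return [-np.sum((np.asarray(p_u) - 0.3) ** 2) + rng.normal(0, 0.05)
            for _ in range(n_runs)]

def bilevel_search(bounds, n_init=20, n_iters=100, true_eval_every=10):
    bounds = np.asarray(bounds, dtype=float)      # shape (dim, 2): [low, high]
    dim = len(bounds)
    rng = np.random.default_rng(0)

    # Seed the surrogate with a few true (expensive) evaluations.
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, dim))
    y = np.array([np.mean(run_neuroevolution(p)) for p in X])
    surrogate = RandomForestRegressor(n_estimators=200).fit(X, y)
    best_p, best_f = X[np.argmax(y)], float(y.max())

    for it in range(n_iters):
        # Upper-level variation: perturb the best hyperparameters found so far.
        cand = best_p + rng.normal(0, 0.1, size=(200, dim)) * (bounds[:, 1] - bounds[:, 0])
        cand = np.clip(cand, bounds[:, 0], bounds[:, 1])
        # Rank candidates cheaply on the surrogate instead of running evolution.
        p_next = cand[np.argmax(surrogate.predict(cand))]
        # Only occasionally pay for a true lower-level experiment.
        if it % true_eval_every == 0:
            f_true = float(np.mean(run_neuroevolution(p_next)))
            X, y = np.vstack([X, p_next]), np.append(y, f_true)
            surrogate.fit(X, y)
            if f_true > best_f:
                best_p, best_f = p_next, f_true
    return best_p, best_f

# Example: optimize three neuroevolution hyperparameters, each in [0, 1].
# best_params, best_fitness = bilevel_search(bounds=[[0, 1]] * 3)
```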
To see the value of bilevel optimization, consider e.g. the benchmark task of evolving
a neural network for helicopter hovering. The goal is to keep the helicopter as close as
possible to a point in 3D space in windy conditions, with 12 state variables (coordinates,
angles, velocities) as the input, and four action variables (aileron, elevator, rudder, and
rotor pitch) as the output. The task is difficult because there are many variables that
interact, their values are noisy, and the domain is unstable. However, neuroevolution can
solve it with a careful hand-tuning of eight evolutionary parameters: mutation probability,
rate, amount, replacement rate, and fraction, population size, crossover probability, and
crossover averaging rate (Koppejan and Whiteson, 2011). Remarkably, such hand-tuning still leaves money on the table: by optimizing the parameters further with bilevel evolution, it is possible to evolve solutions that perform significantly better, both by learning faster and achieving better final accuracy (figure 11.1b). Also, using a good surrogate is
crucial: while using a random forest surrogate improves bilevel optimization significantly
compared to not using a surrogate, quadratic fitting is too unreliable and actually decreases
performance.
(a) Bilevel neuroevolution. (b) Improvement over human fine-tuning in the helicopter hovering task. (c) Improvement with more parameters in the double pole balancing task.
Figure 11.1: Enhancing neuroevolution with bilevel optimization. Neuroevolution performance depends crucially on a proper setting of its hyperparameters. They can be evolved as part of the optimization process, resulting in bilevel neuroevolution. (a) More specifically, neural networks with parameters (weights) $p_l$ are evolved using a low-level neuroevolution algorithm $O_l$ with parameters $p_u$. The $p_u$ are in turn optimized with an upper-level MEA algorithm $O_u$. The expected fitness $E[F_l(p_l) \mid p_u]$ is taken as the fitness of $p_u$. In this manner, the neuroevolution process can be optimized automatically, which makes it possible to solve harder problems with it. (b) Neuroevolution with eight hand-tuned evolution parameters (HNE) is successful in the helicopter hovering task, but when those same parameters are optimized at the same time through bilevel evolution (HNE8), better solutions are found faster. In this manner, bilevel evolution can be harnessed to improve upon human design of neuroevolution experiments. (c) The cumulative success of neuroevolution with five hand-tuned evolutionary parameters (PNE), five bilevel-optimized parameters (PNE5), and fifteen bilevel-optimized parameters (PNE15) in the double pole balancing task. More parameters allow bilevel evolution to develop a more powerful neuroevolution parameterization, resulting in faster discovery of solutions. Therefore, when bilevel optimization is available, it is better to make the neuroevolution method more flexible and configurable, even beyond human ability to optimize. For animations in helicopter hovering, see https://neuroevolutionbook.com/demos. Figures from J. Liang and Miikkulainen (2015).
A common rule of thumb is that humans can take into account seven +/- two variables
at once, which is well in line with the helicopter hovering result. However, with bilevel
evolution, it may be possible to increase the number of variables significantly. Would such
an extension result in better performance? For instance in the standard benchmark task
of double pole balancing, it is common to specify the values of five parameters by hand:
mutation rate and amount, replacement fraction, initial weight range, and population size.
There are, however, many other parameters that could be included, such as 1-pt, 2-pt, and
uniform crossover probability, tournament, truncation, and roulette selection probability,
etc. They are not strictly necessary to parameterize an effective neuroevolution experiment,
but they do make it possible to establish a more complex search.
It turns out that such extra customization pays off significantly. It is much faster to find solutions when 15 evolutionary parameters are optimized rather than only five (figure 11.1c). This is an important result because it suggests that bilevel optimization changes how we should think about problem-solving. Simple methods may be easy for people to understand, but when they can be optimized automatically, it is better to make the method more flexible and configurable, even beyond human ability. Such complexity translates to better performance through bilevel optimization.
As more compute becomes available, bilevel optimization is likely to become an
increasingly important element of neuroevolution. It can also be extended in several
ways. For instance, instead of fixed parameters $p_u$, it may be possible to discover parameter adaptation schedules that change the parameters during the course of individual neuroevolution runs, similarly to self-adapting EAs. The schedules may themselves take the form of a neural network that observes the performance of the run and outputs the optimal current parameters. While the designs of neuroevolution algorithms have naturally focused on compact and parsimonious methods, it may be possible to design them with bilevel optimization in mind, which means creating many more configuration parameters, and thus taking advantage of the power of expanded optimization. Also, better surrogate modeling techniques can be developed, perhaps by utilizing knowledge of the domain, benchmark collections, and methods for estimating fitness in neural architecture search.
While bilevel neuroevolution focuses on optimizing the evolution method, the approach
can be extended to optimizing other machine learning methods as well. Section 12.2.3
discusses MAML, a similar approach applied to starting parameters in reinforcement
learning. The next section focuses on optimizing designs for supervised training of neural
networks.
11.3 Evolutionary Meta-learning
With supervised neural networks, several design aspects beyond the architecture (topic of
chapter 10) must be configured appropriately as well. Those include learning hyperpa-
rameters (such as the learning rate), activation functions, loss functions, data sampling
and augmentation, and learning methods. Approaches similar to those used in NAS can
be applied to them; however, the evolutionary approach has an advantage in that it is the
most versatile: It can be applied to graphs, vectors of continuous and discrete parameters,
and configuration choices. This ability is particularly useful as new architectures are
developed. For instance, at this writing, work has barely begun on optimizing designs
of transformer (Vaswani, Shazeer, Parmar, et al.,
2017) or diffusion (Sohl-Dickstein,
E. Weiss, Maheswaranathan, et al.,
2015) architectures. They have elements such as
attention modules, spatial embeddings, and noise transformations that are different from
prior architectures, yet they may be parameterized and evolved as well to optimize their
implementation. Most importantly, evolution can be used to optimize many different
aspects of the design simultaneously, discovering and taking advantage of synergies
between them. Several such approaches are reviewed in this section.
11.3.1 Loss functions
Perhaps the most fundamental of these is the design of a good loss function. Mean-squared-error
(MSE) loss has been used for a long time, and more recently, cross-entropy (CE) loss has
become popular, especially in classification tasks. Both of those assign minimal loss to
outputs that are close to correct, and superlinearly larger losses to outputs further away
from correct values. They make sense intuitively and work reliably, so much so that
alternatives are not usually even considered.
However, it turns out that it is possible to improve upon them in a surprising way that would have been difficult to discover if evolution had not done it for us (Gonzalez and Miikkulainen, 2020; Gonzalez and Miikkulainen, 2021). If outputs that are extremely close to correct are penalized with a larger loss, the system learns to avoid such extreme outputs, which minimizes overfitting (figure 11.2a). Such loss functions, called Baikal loss for their shape, lead to automatic regularization. Regularization in turn leads to more
accurate performance on unseen examples, especially in domains where the amount of
available data is limited, as is the case in many real-world applications.
Baikal loss was initially discovered with a classic genetic programming approach
where the function was represented as a tree of mathematical operations (Gonzalez and
Miikkulainen, 2020). The structure of the tree was evolved with genetic algorithms, and
the coefficients in the nodes with CMA-ES (Hansen and Ostermeier, 2001). This approach
is general and creative in that it can be used to explore a large search space of diverse
functions. However, many of those functions do not work well and are often unstable. In
the follow-up TaylorGLO method (Gonzalez and Miikkulainen, 2021), the functions were
represented instead as third-order Taylor polynomials. Such functions are continuous and
can be directly optimized with CMA-ES, making the search more effective.
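As an illustration of this parameterization idea, the sketch below encodes a candidate loss as a third-order polynomial in the predicted probability of the correct class, with its coefficient vector exposed for CMA-ES. It is a simplified stand-in for TaylorGLO (the published method expands the loss in both the network output and the target), and `train_and_validate` in the comments is a hypothetical evaluation wrapper.

```python
import numpy as np

def taylor_loss(theta, y_true, y_pred, eps=1e-7):
    # theta = (a0, a1, a2, a3, c): polynomial coefficients and expansion point.
    # y_pred holds predicted class probabilities, y_true integer class labels.
    p = np.clip(y_pred[np.arange(len(y_true)), y_true], eps, 1.0 - eps)
    a0, a1, a2, a3, c = theta
    d = p - c                        # deviation from the expansion point
    return float(np.mean(a0 + a1 * d + a2 * d ** 2 + a3 * d ** 3))

# The coefficient vector theta is continuous, so it can be optimized directly
# with CMA-ES, using e.g. validation accuracy after a short training run as
# the fitness (train_and_validate is a hypothetical wrapper around training):
#
#   import cma
#   es = cma.CMAEvolutionStrategy(np.zeros(5), 0.5)
#   while not es.stop():
#       thetas = es.ask()
#       es.tell(thetas, [-train_and_validate(taylor_loss, t) for t in thetas])
#   best_theta = es.result.xbest
```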
Regularization is an important aspect of neural network design in general. There
are many techniques available, such as dropout, weight decay, and label smoothing
(S. J. Hanson and Pratt, 1988; N. Srivastava, Hinton, Krizhevsky, et al., 2014; Szegedy,
Vanhoucke, Ioffe, et al., 2016), but how they work is not well understood. Loss-function
optimization, however, can be understood theoretically, and it thus provides a starting
point to understanding regularization in general (Gonzalez, Qiu, and Miikkulainen, 2025).
It can be described as a balance of two processes: a pull toward the training targets and a
push away from overfitting. This perspective leads to a practical condition for guiding the
search toward trainable functions.
Note that Baikal loss is a general principle; evolutionary optimization was crucial in
discovering it, but it can now be used on its own in deep learning. It is still possible to
customize it for each task and architecture, and even small modifications to the standard
Baikal shape may make a difference. Optimization may also have a significant effect on
various learning challenges, for instance when there is not much training data (Gonzalez,
Landgraf, and Miikkulainen, 2019), or when the labels are particularly noisy (B. Gao,
Gouk, and Hospedales, 2021). It may also be possible to modify the loss function during
learning, for instance by emphasizing regularization in the beginning and precision towards
(a) Loss function profiles. (b) Performance with weight perturbation.
Figure 11.2: Regularization and robustness with evolved loss functions. Surprising synergies emerge when loss functions are evolved as part of the optimization process. (a) The standard loss function, such as log loss (or cross-entropy), has a high loss for outputs that are far from correct (1.0 in this case) and a low loss otherwise. In contrast, evolutionary optimization of loss functions through GLO/TaylorGLO (Gonzalez and Miikkulainen, 2020; Gonzalez and Miikkulainen, 2021) discovered a new principle: When the output is very close to the correct one, a high loss is incurred. This principle, termed Baikal loss for its shape, discourages overfitting, thus regularizing the network automatically and leading to better generalization. Such a loss is effective, but it is counterintuitive and thus unlikely to be discovered by human designers. (b) Baikal loss also makes the network performance more robust. This effect can be quantified by perturbing the network weights. With Baikal loss, the network's performance is less affected than with cross-entropy loss. This effect can be further magnified by making robustness against adversarial inputs an explicit second objective in evolution. Thus, loss-function optimization can be used to improve not just regularization but robustness as well. Figures from Gonzalez and Miikkulainen (2020) and Gonzalez, Qiu, and Miikkulainen (2025).
the end (similarly to activation functions; section 11.3.2).
It turns out that loss functions that regularize also make networks more robust, and
this effect can be further enhanced by including an explicit robustness goal in evolution
(figure 11.2b). One way to create such a goal is to evaluate performance separately with respect to adversarial examples. This result in turn suggests that loss-function optimization could
be an effective approach to creating machine learning systems that are robust against
adversarial attacks.
Loss-function optimization can also play a major role in systems where multiple loss
functions interact, such as generative adversarial networks (GANs; Gonzalez, Kant, and Miikkulainen, 2023). GANs include three different losses: a discriminative loss for real
examples, a discriminative loss for fake examples, and a generative loss for fake examples.
It is not easy to get them right, and many proposals exist, including those in minimax,
nonsaturating, Wasserstein, and least-squares GANs (Arjovsky, Chintala, and Bottou,
2017; Goodfellow, Pouget-Abadie, Mirza, et al., 2014; Mao, Q. Li, Xie, et al., 2017).
Training often fails, for example resulting in mode collapse. However, the three losses
can be evolved simultaneously, using performance and reliability as fitness. In one such
experiment on generating building facade images given the overall design as a condition,
the TaylorGLO approach resulted in better structural similarity and perceptual distance
than the Wasserstein loss (Gonzalez, Kant, and Miikkulainen, 2023). Although this result
is preliminary, it suggests that evolutionary loss-function optimization may make more
complex learning systems possible in the future.
11.3.2 Activation Functions
Early on, in the 1980s and 1990s, sigmoids (and tanh) were used almost exclusively as
activation functions for neural networks. They had intuitively the right behavior as
neural models, limiting activation between the minimum and maximum values, a simple
derivative that made backpropagation convenient, and a theorem suggesting that universal
computing could be based on such networks (Cybenko, 1989; Hornik, Stinchcombe, and
H. White, 1989). There were indications, however, that other activation functions might
work better in many cases. Gaussians achieved universal computing with one less layer,
and were found powerful in radial basis function networks (RBFs; J. Park and Sandberg,
1991). Ridge activations also provide similar capabilities (Light, 1993).
However, with the advent of deep learning, an important discovery was made:
Activation functions made a big difference in whether the gradients vanished. In particular,
rectified linear units (ReLUs) were critical in scaling up deep learning networks (Nair and
Hinton, 2010). The linearly increasing region does not saturate activation or gradients,
resulting in less signal loss. Moreover, it turned out that in many cases, ReLU could be
improved by adding a small differentiable dip at the boundary between the two regions,
in a function called Swish (Ramachandran, Zoph, and Le, 2018). This result suggested
that there may be an opportunity to optimize activation functions, both generally and for
specific architectures and tasks.
As with loss functions, there is a straightforward opportunity to evolve activation functions through genetic programming (Bingham, Macke, and Miikkulainen, 2020). As with loss-function optimization, such an approach can be creative, but it also results in many
functions that make the network unstable. A more practical approach is to limit the search
space to e.g. computation graphs of two levels, with a focused set of operators that are
more likely to result in useful functions. This approach was taken in the PANGAEA
system (Bingham and Miikkulainen,
2022). Given a list of 27 unary and seven binary
operators, two basic two-level computation graph structures, and four mutation operators,
evolution can search a space of over ten trillion activation functions.
However, finding an effective function is only part of the challenge. The function also
needs to be parameterized to perform as well as possible. While coefficients multiplying
each operator can be evolved together with the structure, it turns out that such fine-tuning
can be done more efficiently through gradient descent. In other words, in PANGAEA,
evolution and gradient descent work synergistically: evolution discovers the general
structure of the function, and gradient descent finds its optimal instantiation.
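A minimal PyTorch sketch of this division of labor is shown below: the discrete structure (the operator choices) would be selected by evolution, while the scalar coefficients are ordinary trainable parameters updated by gradient descent along with the network weights. The operator sets and the specific two-level form are illustrative assumptions, not the actual PANGAEA search space.

```python
import torch
import torch.nn as nn

# Small operator sets -- a subset of the kind used in PANGAEA (which draws on
# 27 unary and 7 binary operators).
UNARY = {"id": lambda x: x, "relu": torch.relu, "tanh": torch.tanh,
         "sigmoid": torch.sigmoid, "neg": lambda x: -x, "square": lambda x: x * x}
BINARY = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b,
          "max": torch.maximum}

class EvolvedActivation(nn.Module):
    """f(x) = gamma * binary(u1(alpha * x), u2(beta * x)).
    The discrete structure (u1, u2, binary) is the part evolution would choose;
    alpha, beta, gamma are trained by gradient descent with the network."""
    def __init__(self, u1="id", u2="sigmoid", binary="mul"):
        super().__init__()
        self.u1, self.u2, self.op = UNARY[u1], UNARY[u2], BINARY[binary]
        self.alpha = nn.Parameter(torch.tensor(1.0))
        self.beta = nn.Parameter(torch.tensor(1.0))
        self.gamma = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        return self.gamma * self.op(self.u1(self.alpha * x), self.u2(self.beta * x))

# Usage: drop the module into any architecture in place of ReLU; with the
# defaults ("id", "sigmoid", "mul") this instance reduces to a Swish-like shape.
# layer = nn.Sequential(nn.Linear(128, 128), EvolvedActivation(), nn.Linear(128, 10))
```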
The method is powerful in two ways. First, it finds general functions that perform better than previous functions (such as ReLU, SELU, and Swish) across architectures (such as All-CNN, Wide ResNet, ResNet, and preactivation ResNet) and tasks (such as CIFAR-10 and CIFAR-100). Second, and most powerfully, it discovers activation functions that are
specialized to architecture and task, apparently taking advantage of the unique requirements
Figure 11.3: Activation functions discovered over space and time. Activation functions are as
fundamental to network performance as its weights. PANGAEA (Bingham and Miikkulainen, 2022)
combines evolution of function structure synergistically with gradient descent of its parameters. It
is possible to discover general functions, but the approach is most powerful in customizing them to
a particular architecture and task. Moreover, the functions change systematically over learning
time as well as through different depths of layers, presumably starting with coarse learning and
regularization and transforming into fine-tuning and classification. These results suggest a possible
duality with weight learning and a possible synergy for the future. Figure from Bingham and
Miikkulainen (2022).
in each such context.
Furthermore, performance can be further improved by allowing different functions at
different parts of the network, and at different times throughout training (figure 11.3). The
optimal designs change continuously over time and space. Different activation functions
are useful early in training, when the network learns rapidly, and late in training, when
fine-tuning is needed; similarly, more nonlinear functions are discovered for later layers,
possibly reflecting the need to form a regularized embedding early, and make classification
decisions later.
The PANGAEA results suggest an intriguing duality: While neural network learning
is mostly based on adapting a large number of parameters (i.e. weights), perhaps a similar
effect might be achieved by adapting the activation functions over space and time? Perhaps
the two mechanisms could be used synergistically? Evolution of the activation function
structure provides the foundation for this approach, which still needs to be fully developed.
Interestingly, the recently discovered Kolmogorov-Arnold networks (KANs; Z. Liu, Y. Wang, Vaidya, et al., 2025) are a step in this direction. Every weight parameter is replaced by a univariate function such as a spline whose parameters are then learned. A natural extension would be to evolve these functions using a mechanism such as PANGAEA, making the search for good KAN networks more comprehensive, a compelling direction for future work.
11.3.3 Data Use and Augmentation
Optimizing the training data is another significant opportunity for evolutionary optimization
of supervised learning systems. For instance, it may be possible to form embeddings of
the training samples through an autoencoder and then form a strategy for utilizing different
kinds of samples optimally through time (Gonzalez, Landgraf, and Miikkulainen, 2019).
In this manner, evolution could discover ways to balance an imbalanced dataset or to
design curricular learning from simple to more complex examples. Especially in domains
where not a lot of labeled samples are available, such techniques could result in significant
improvements. It may also be possible to extend the methods to utilize multiple datasets
optimally over time in a multitask setting.
Another possibility is to evolve methods for augmenting the available data automat-
ically through various transformations. Different datasets may benefit from different
transformations, and it is not always obvious ahead of time how they should be designed.
For instance, in an application to develop models for estimating the age of a person from an image of their face, evolution was used to decide vertical and horizontal shift and cutout, as well as the direction of flip operations, angle of rotation, degree of zoom, and extent of shear (Miikkulainen, Meyerson, Qiu, et al., 2021). Unexpectedly, it chose to do vertical flips only, which made little sense for faces until it was found that the input images had been rotated 90 degrees! It also discovered a combination of shift operations that allowed it to obfuscate the forehead and chin, which would otherwise be easy areas for the model to overfit.
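A sketch of how such an augmentation policy might be encoded and varied is given below. The field names and ranges are illustrative, and the `evaluate` function is a placeholder for training the model under the policy and measuring validation performance; it is not the setup used in the cited study.

```python
import numpy as np

# One candidate augmentation policy, encoded as a flat genome.
# The fields mirror the operations discussed in the text; ranges are illustrative.
AUG_FIELDS = {
    "rotation_deg":    (0.0, 30.0),
    "width_shift":     (0.0, 0.3),
    "height_shift":    (0.0, 0.3),
    "shear":           (0.0, 0.3),
    "zoom":            (0.0, 0.3),
    "cutout_frac":     (0.0, 0.4),
    "horizontal_flip": (0.0, 1.0),   # interpreted as a probability
    "vertical_flip":   (0.0, 1.0),
}

def random_policy(rng):
    return {k: float(rng.uniform(lo, hi)) for k, (lo, hi) in AUG_FIELDS.items()}

def mutate(policy, rng, sigma=0.1):
    # Gaussian perturbation of each field, scaled to its range and clipped.
    return {k: float(np.clip(policy[k] + rng.normal(0, sigma * (hi - lo)), lo, hi))
            for k, (lo, hi) in AUG_FIELDS.items()}

def evaluate(policy):
    # Placeholder: train the model with data transformed according to the
    # policy and return validation accuracy (or negative MAE for regression).
    raise NotImplementedError

# A simple (1+lambda) loop over policies could then look like:
#   rng = np.random.default_rng(0)
#   parent = random_policy(rng); parent_fit = evaluate(parent)
#   for gen in range(50):
#       children = [mutate(parent, rng) for _ in range(8)]
#       fits = [evaluate(c) for c in children]
#       if max(fits) > parent_fit:
#           parent, parent_fit = children[int(np.argmax(fits))], max(fits)
```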
Given that datasets often contain a large number of variables, or features, a compelling
opportunity is to discover which features should be utilized in learning and which ones
should be left out. For instance, in the FS-NEAT method (Papavasileiou and Jansen,
2017; Whiteson, Stone, Stanley, et al., 2005), complexification is used to select features
through connection mutations. The approach automatically determines an appropriate set
of inputs for the networks it evolves. The networks performed better, evolved faster, and
were smaller than regular NEAT networks e.g. in the CarRacing task. The approach can
also be instantiated as a general meta-learning method, i.e. evolution can be used to select
features for deep learning architectures that are then trained with gradient descent. This
approach has proven effective e.g. in a currency trading task (Mańdziuk and Rajkiewicz,
2016).
A particularly interesting use for evolved data augmentation is to optimize not only
the accuracy of the resulting models, but also to mitigate bias and fairness issues with the
data. As long as these dimensions can be measured (S. Sharma, Henderson, and Ghosh,
2020), they can be made part of the fitness, or separate objectives in a multiobjective
setting. Operations then need to be designed to increase the variance across variables that might otherwise lead to bias through overfitting, for instance gender, ethnicity, and socioeconomic status, depending on the application. While evolutionary data augmentation
is still new, this area seems like a differentiated and compelling opportunity for it.
Figure 11.4: Evolutionary discovery of learning methods. At the highest level, meta-learning
extends to the learning mechanisms themselves. In AutoML-Zero (Real, C. Liang, So, et al.,
2020), sequences of instructions for setup, prediction, and learning are evolved through mutation-
based regularized search. AutoML-Zero first discovered simple methods such as linear models,
then several known extensions such as ReLU and gradient normalization, and eventually more
sophisticated techniques such as multiplicative interactions. The approach could be particularly
useful in customizing learning methods to different domains and constraints. Figure from Real,
C. Liang, So, et al. (2020).
11.3.4 Learning Methods
An interesting extension of NAS is to evolve the learning system not from high-level
elements but from the basic algorithmic building blocks (mathematical operations, data
management, and ways to combine them); in other words, by evolving code for supervised
machine learning. In this manner, evolution can be more creative in discovering good
methods, with fewer biases from the human experimenters.
The AutoML-Zero system (Real, C. Liang, So, et al., 2020) is a step towards this
goal. Given an address space for scalars, vectors, and matrices of floats, it evolves setup,
predict, and learn methods composed of over 50 basic mathematical operations. Evolution
is implemented as a linear GP, and consists of inserting and removing instructions and
randomizing instructions and addresses. Evaluation consists of computing predictions
over unseen examples.
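The toy-sized sketch below illustrates the linear-GP idea: programs are lists of instructions over small scalar and vector register files, and mutation inserts or replaces instructions. The instruction set, register conventions, and evaluation loop are deliberately minimal assumptions for illustration; AutoML-Zero itself uses a much larger operation set and a regularized-evolution loop.

```python
import numpy as np

DIM = 4                        # feature dimension of the toy task
N_SCALARS, N_VECTORS = 4, 4    # register file sizes (toy-sized)

# A handful of allowed operations; AutoML-Zero uses dozens more.
OPS = ["s_add", "s_mul", "v_dot", "v_scale", "v_add"]

def random_instruction(rng):
    op = rng.choice(OPS)
    return (op, int(rng.integers(N_SCALARS)), int(rng.integers(N_VECTORS)),
            int(rng.integers(N_SCALARS)), int(rng.integers(N_VECTORS)))

def execute(program, s, v):
    # Interpret a list of (op, scalar_i, vector_i, scalar_j, vector_j) tuples.
    for op, si, vi, sj, vj in program:
        if op == "s_add":     s[si] = s[sj] + s[si]
        elif op == "s_mul":   s[si] = s[sj] * s[si]
        elif op == "v_dot":   s[si] = float(v[vi] @ v[vj])
        elif op == "v_scale": v[vi] = s[sj] * v[vj]
        elif op == "v_add":   v[vi] = v[vi] + v[vj]

def evaluate(individual, X, y):
    # Run setup once, then predict/learn over the data; fitness is negative
    # squared error on the second half (held out from the learn() calls).
    setup, predict, learn = individual
    s = np.zeros(N_SCALARS); v = np.zeros((N_VECTORS, DIM))
    execute(setup, s, v)
    errs = []
    for i, (x, target) in enumerate(zip(X, y)):
        v[0] = x                    # register v0 holds the input by convention
        execute(predict, s, v)
        pred = s[0]                 # register s0 holds the prediction
        if i < len(X) // 2:
            s[1] = target - pred    # s1 holds the error, available to learn()
            execute(learn, s, v)
        else:
            errs.append((target - pred) ** 2)
    return -float(np.mean(errs))

def mutate(individual, rng):
    # Replace or insert one instruction in a randomly chosen component.
    setup, predict, learn = [list(p) for p in individual]
    part = [setup, predict, learn][rng.integers(3)]
    if part and rng.random() < 0.5:
        part[rng.integers(len(part))] = random_instruction(rng)
    else:
        part.insert(int(rng.integers(len(part) + 1)), random_instruction(rng))
    return setup, predict, learn
```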
Starting from empty programs, AutoML-Zero first discovered linear models, followed
by gradient descent, and eventually several extensions known in the literature, such as
noisy inputs, gradient normalization, and multiplicative interactions (figure 11.4). When
given small datasets, it discovers regularization methods similar to dropout; when given
few training steps, it discovers learning-rate decay.
Thus, the preliminary experiments with AutoML-Zero suggest that evolutionary search
can be a powerful tool in discovering entire learning algorithms. As in many meta-learning
approaches, the main power may be in customizing these methods to particular domains
and constraints. A crucial aspect will be to guide the evolution within the enormous
search space toward meaningful solutions, without hampering its ability to create, again a
challenge shared with most of meta-learning.
11.3.5 Utilizing Surrogates
While evolutionary meta-learning can discover more effective neural network designs, it
is also challenging in three ways: It is computationally very expensive to evaluate all the
different designs; it is difficult to gain insight into what works; and it is not clear how the
search spaces should be defined so that they are fast to search and contain good solutions.
One way to make progress toward meeting these challenges is to perform a full search
in as large a search space as possible, thus forming a benchmark dataset that makes it
possible to analyze what works. These insights may then be used to construct a surrogate
approach that makes it possible to search in larger spaces without having to evaluate
candidates through full training.
Such an approach, AQuaSurF, was demonstrated in the task of discovering effective
activation functions (Bingham and Miikkulainen, 2023b). Based on the work described in
section 11.3.2, an exhaustive set of 2,913 different activation functions was created from
a three-node computational graph of PANGAEA and tested on three architecture/task
settings, All-CNN/CIFAR-10, ResNet-56/CIFAR-10, and MobileViTv2-0.5/Imagenette.
Thus, they covered basic convolutional, residual, and transformer designs in the visual
domain. In each case, the networks were trained fully to evaluate how well each function performed in the particular setting (the resulting benchmark dataset is available at https://github.com/cognizant-ai-labs/act-bench).
Most activation functions performed poorly, but a small number of functions performed
very well, confirming that activation-function meta-learning is difficult but also worthwhile.
Most interestingly, two trends were also observed: (1) There were clusters of functions
that performed well across architectures and tasks, representing refinements of general
solutions; and (2) the very best performance in each setting was achieved by a few functions
that performed poorly in other settings, in other words, by activation functions that were
specialized to the architecture and task. This result suggests that meta-learning can be
most powerful when it is used to customize the designs to the particular problem.
The benchmark collection was then used to construct an effective surrogate for full
network evaluations. It turned out that a combination of Fisher-information-matrix (FIM)
eigenvalues and the function shape is a powerful surrogate.
First, FIM quantifies how much information the network parameters carry about the
data distribution, and thus serves as a characterization of network behavior. It has been used
in many studies to illustrate learning ability, generalization, robustness to perturbations, and
loss-function shape of neural networks (Jastrzebski, Arpit, Astrand, et al.,
2021; Karakida,
Akaho, and Amari, 2019; T. Liang, Poggio, Rakhlin, et al., 2019; Liao, Drummond, Reid,
et al., 2018). The information in FIM is represented compactly in its eigenvalues; there
are as many eigenvalues as there are network weights, but they can be binned into a
histogram of a lower dimensionality. The histogram vector then forms a computational
(a) Surrogate spaces. (b) Using the sigmoid.
Figure 11.5: Utilizing surrogates to discover surprising activation functions. Surrogate modeling can be used to evaluate activation function candidates without full training, making it possible to search in larger spaces, which may result in more innovative solutions. (a) UMAP embeddings of the 2,913 activation functions in the three benchmark settings (columns) in three different surrogate spaces: FIM eigenvalues (top row), function outputs (middle row), and both (bottom row). UMAP is a dimensionality-reduction technique that preserves the structure of high-dimensional spaces well, in this case 13,692, 16,500, and 11,013 FIM eigenvalue histogram dimensions and 1,000 function output samples. Function performance is indicated by color coding. Similar colors cluster best in the bottom row, suggesting that using both FIM and output features as the surrogate space makes the search for good functions the easiest. (b) The best activation function in the CoAtNet experiment turned out to be a sigmoid. The histograms indicate the values with which it is activated in the network. At initialization (blue histogram), it is used similarly to ReLU; after training (orange histogram), both saturation regions are used. This discovery suggests that sigmoidal activations may be useful in specific situations, challenging the conventional wisdom in deep learning. Figures from Bingham and Miikkulainen (2023b).
characterization of the network. Networks with different activation functions have different
such characterizations, and the space of these FIM-eigenvalue-histogram vectors can be
used as a surrogate search space for good activation functions.
However, the FIM also depends on other factors, including the architecture, loss
function, and data distribution, which makes it rather noisy. An additional surrogate
representation is useful in compensating for such noise: the shape of the activation function
itself. This shape can be represented as a sampling of activation function values for inputs distributed as $\mathcal{N}(0, 1)$, as they would be in a properly initialized network (Bingham and Miikkulainen, 2023a). Using both the FIM and the output samples together forms a powerful surrogate (figure 11.5a): functions that perform similarly are clustered together, making it easy to search for good functions.
Indeed, the search for good activation functions was highly effective in this surrogate
space. Even a simple search like $k$-nearest neighbors regression could find the best functions quickly and reliably.
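A sketch of this surrogate search is shown below, assuming the FIM eigenvalue histograms have already been computed for each candidate (that step is architecture-specific and omitted here). Fully evaluated functions provide the training set for a k-nearest-neighbor regressor, which then ranks unevaluated candidates; this mirrors the idea described above rather than the exact AQuaSurF pipeline.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def output_features(fn, n=1000, seed=0):
    # Sample the activation function on N(0, 1) inputs, as described in the text.
    x = np.sort(np.random.default_rng(seed).standard_normal(n))
    return fn(x)

def surrogate_features(fn, fim_histogram):
    # Concatenate the two surrogate views: the FIM eigenvalue histogram
    # (assumed precomputed for this function/architecture pair) and the shape.
    return np.concatenate([np.asarray(fim_histogram), output_features(fn)])

def rank_candidates(evaluated, candidates, k=5):
    """evaluated: list of (features, accuracy) for fully trained functions;
    candidates: list of (name, features) not yet trained. Returns candidates
    sorted by predicted accuracy, best first (requires at least k evaluated)."""
    X = np.stack([f for f, _ in evaluated])
    y = np.array([a for _, a in evaluated])
    knn = KNeighborsRegressor(n_neighbors=k, weights="distance").fit(X, y)
    names = [n for n, _ in candidates]
    preds = knn.predict(np.stack([f for _, f in candidates]))
    order = np.argsort(-preds)
    return [(names[i], float(preds[i])) for i in order]
```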
However, the surrogate approach also turned out to be effective in activation opti-
mization beyond the benchmark settings in three ways. First, it scaled up to a much
larger search space of 425,896 functions for which the performance was not known,
as well as to the harder CIFAR-100 task with the same architectures. In each case, it
discovered new activation functions that performed better than any of the known functions
so far. Second, those discoveries also transferred to new settings: The best functions
performed better than any previously known functions on ResNet-50 on the full ImageNet
dataset. Thus, it is possible to discover good functions efficiently in smaller tasks and
then use them to improve performance in larger ones. Third, the approach also extended
to new architectures and baseline functions. For instance, the CoAtNet architecture is a
novel combination of convolutional and transformer networks (Z. Dai, H. Liu, Le, et al.,
2021b). When initialized with the best previously known activation functions and tested
on Imagenette (a smaller version of ImageNet), the approach outperformed all baselines.
Thus, the surrogate approach is a powerful way to optimize designs for new settings.
Interestingly, AQuaSurF achieved these results by balancing refinement and novelty.
Many of the functions it discovered were similar e.g. to the well-known functions of ELU
and Swish, with minor changes to their shape. This result suggests that these are generally
good functions, but also that such customizations matter; AQuaSurF is well-equipped to
find them.
However, in many cases, AQuaSurF also found designs that were very different from the existing ones, yet performed at least as well. Some had discontinuous derivatives, some did not saturate on either side, and some had positive instead of negative bumps. The biggest surprise was discovered in the CoAtNet experiment on Imagenette (figure 11.5b). This function was essentially a sigmoid, similar to those used extensively during the early
days of neural networks, but largely discarded in favor of ReLU in deep learning. Why
would it be discovered again in these experiments?
In deep learning, the linearly increasing region of ReLU helped avoid vanishing
gradients. It is therefore important to look at how the sigmoid is used, by plotting which
parts of the function are actually activated during performance. It indeed provides behavior
similar to ReLU early in training: The function is activated around the nonlinearity, but
does not reach the saturating region that occurs with larger activations. However, later
training also takes advantage of the saturating region. In this manner, the same activation
function can be used in two ways: presumably to keep the gradients from vanishing early,
and to commit to decisions later. This result challenges the common approach in deep
learning design and demonstrates the power of neuroevolution in meta-learning good
designs.
In sum, surrogate optimization techniques make it possible to scale up neuroevolution
meta-learning; in doing so, it is possible to identify principles that would be difficult for
human designers to discover.
11.3.6 Synergies
Perhaps the most important future direction in evolutionary meta-learning is to discover
and utilize synergies between the different aspects of the learning system design. For
instance, the best per formance was achieved by optimizing activation functions for the
specific architecture; it might be possible to optimize the architecture simultaneously to
emphasize this effect.
Simply running evolution on all these design aspects simultaneously is unlikely to work;
the search space would be prohibitively large. Similarly, adding more outer loops to the existing process (where supervised learning is the inner loop and meta-learning is the outer loop) is likely prohibitive as well. However, it might be possible to alternate the evolution of different aspects. Better yet, techniques from bilevel (or multilevel) optimization could be useful: the idea is to avoid a full inner-outer loop structure, and instead use e.g. surrogate models to evaluate outer-loop innovations (J. Liang and Miikkulainen, 2015;
Sinha, Malo, Xu, et al., 2014).
A practical approach is simply adding constraints and searching in a smaller space.
A first such step was already taken in the EPBT system (J. Liang, Gonzalez, Shahrzad,
et al.,
2021), which combines hyperparameter tuning, loss-function optimization, and
population-based training (PBT) into a single loop. That is, hyperparameters and loss
functions are evolved at the same time as the networks are being trained. Hyperparameter
tuning is limited to those that do not change the structure of the networks (e.g. learning
rate schedules) so that they can be continuously trained, even when the hyperparameters
change. Similarly, loss-function optimization is limited to TaylorGLO coefficients (J.
Liang, Gonzalez, Shahrzad, et al., 2021) that can be changed while training is going
on. Even so, the simultaneous evolution and learning was deceptive, and needed to be augmented with two mechanisms: a quality-diversity heuristic for managing the population and knowledge distillation to prevent overfitting. The resulting method worked well in optimizing ResNet and WideResNet architectures on CIFAR-10 and SVHN, but it also illustrates the challenges in taking advantage of the synergies of meta-learning methods.
11.4 Case Study: Meta-learning vs. Human Design
How useful exactly is meta-learning in practice? Convincing results were obtained in a
natural experiment that compared human design with evolutionary meta-learning in the
domain of medical aesthetics (Miikkulainen, Meyerson, Qiu, et al., 2021).
Medical aesthetics focuses on treatments that improve appearance following injury or
disease, but also includes elective procedures intended to lower perceived age and thus
improve the patient’s self-esteem. They often involve injecting a toxin (e.g. Botox) or
a filler in a targeted area of the face, changing the skin texture and other facial features
(Abelsson and Willman, 2020; Arsiwala, 2018). Evaluating the success of such procedures
is largely subjective. However, perceived age is quantifiable, and methods can be developed
for measuring that aspect of the outcome automatically.
Indeed, age estimation has been used as a benchmark for visual deep-learning
architectures for a long time. Many of the state-of-the-art architectures have been
evaluated in it, and good progress has been made (Rothe, Timofte, and Van Gool, 2018;
T.-Y. Yang, Y.-H. Huang, Y.-Y. Lin, et al., 2018). There are, however, three challenges
in building an age estimator that could be used to evaluate medical aesthetics treatments.
First, the datasets used for age estimation are usually based on celebrity images. Such
images have often been retouched and processed in various ways, and the subjects often
have makeup and even medical aesthetics work done already. All such alterations make
learning reliable estimates difficult. Second, while the architectures can be used on facial
images, they were usually developed for general image recognition benchmarks such as
CIFAR-10 and ImageNet. Thus, their architecture does not utilize special features of the
facial image dataset such as the structure of the face. Third, in order to evaluate the value
of treatments, it is necessary to estimate confidence in the predictions. Deep learning
architectures do not by themselves provide such estimates.
The experiment consisted of addressing these challenges, making it possible to evaluate
the value of medical aesthetics treatments quantitatively. First, the celebrity face datasets
were replaced with images of actual patients. The first dataset, D0, consisted of 10,837
training images and 2692 test images, with ages ranging from 18 to 79. This dataset was
less challenging and allowed for fast early development of models. It was later replaced by
dataset D1 with 18,537 training and 3733 testing images, with more variety in terms of
studies and patients. These two datasets were used to evolve and train good age estimator
models. While the DenseNet-121 architecture achieved a validation mean absolute error (MAE) of 7.43 years on the celebrity dataset, multiple similar architectures did much better on D0 and D1, including DenseNet-169 with 3.65 years on D1. Thus, the quality of
the datasets matters significantly.
Second, several aspects of meta-learning were used synergistically to optimize the
age estimation architectures. What made this study particularly valuable was that at the
same time, there was a team of human data science experts who were performing the same
task by hand. The two teams did periodically share discoveries, such as better-performing
baseline architectures, but they were trying to outperform each other. Thus, the project
turned into a natural experiment on the value of automated meta-learning.
The main strategy that both teams employed was to start small and expand in multiple stages $S_i$. The experiment started with the D0 dataset and small baseline architectures: ResNet-50 (in stage $S_0$) followed by DenseNet-121 ($S_1$) (K. He, X. Zhang, Ren, et al., 2016; G. Huang, Z. Liu, van der Maaten, et al., 2017b). With D1, larger baselines DenseNet-169 ($S_0$), DenseNet-201 ($S_1$, $S_2$), and eventually EfficientNet-B6 ($S_3$) (M. Tan and Le, 2019) were used, and the image resolution was expanded from the initial 224×224 ($S_0$) to 512×512 ($S_1$) and eventually to 528×528 ($S_3$). Finally, the three best models were ensembled ($S_4$). Population-based training (PBT; Jaderberg, Dalibard, Osindero, et al.,
2017; J. Liang, Gonzalez, Shahrzad, et al., 2021) was used throughout. That is, while
evolution modifies various hyperparameters for training the networks, the network weights
persist from generation to generation. In this manner, training is a continuous process,
saving significant computational effort.
Evolution was set to optimize three types of hyperparameters: Those that specify
learning, architecture, and data augmentation mechanisms. The learning parameters
included the optimizer (Adam or RMSProp), initial learning rate, momentum, decay,
patience, and weight averaging. The architecture parameters included the base model,
Figure 11.6: Utilizing meta-learning synergies to beat human designers. In this natural
experiment, human experts and meta-learning were both working at the same time to improve
the accuracy of age estimation from facial images. In two datasets (D0 and D1), evolutionary
meta-learning was able to discover models that performed better than those simultaneously designed
by human data scientists. While the neural networks were being continuously trained, evolution
optimized the learning, architecture, and data-augmentation hyperparameters. The approach
discovered and utilized synergies between design aspects that were difficult for humans to utilize.
The final accuracy, an MAE of 2.19 years, is better than human accuracy in age estimation (3-8 years).
Figure from Miikkulainen, Meyerson, Qiu, et al. (2021).
layers used as output, and loss function (i.e. linear combinations of MAE and cross-entropy).
The data parameters included rotation, shift, shear, zoom, flip, and cutout.
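The sketch below illustrates how such a three-part hyperparameter genome and a PBT-style exploit/explore step might look. Field names, value ranges, and the population record layout (`weights`, `config`, `val_score`) are assumptions made for illustration, not the configuration used in the study.

```python
import copy
import random

# One individual's hyperparameters, grouped as in the study:
# learning, architecture, and data augmentation. Names and ranges are illustrative.
def random_config(rng=random):
    return {
        "learning": {"optimizer": rng.choice(["adam", "rmsprop"]),
                     "lr": 10 ** rng.uniform(-5, -2),
                     "momentum": rng.uniform(0.8, 0.99)},
        "architecture": {"base": rng.choice(["densenet169", "densenet201", "efficientnet-b6"]),
                         "mae_weight": rng.uniform(0.0, 1.0)},   # loss = w*MAE + (1-w)*CE
        "data": {"rotation": rng.uniform(0, 20), "shift": rng.uniform(0, 0.2),
                 "zoom": rng.uniform(0, 0.2), "flip": rng.choice(["none", "horizontal"])},
    }

def pbt_step(population, rng=random):
    """One exploit/explore step in the spirit of PBT: the weaker half copies the
    weights and hyperparameters of the stronger half and then perturbs the
    hyperparameters; weights are never reinitialized, so training continues
    uninterrupted across generations. Each member is assumed to be a dict with
    keys 'weights', 'config', and 'val_score'."""
    population.sort(key=lambda m: m["val_score"], reverse=True)
    half = len(population) // 2
    for loser, winner in zip(population[half:], population[:half]):
        loser["weights"] = copy.deepcopy(winner["weights"])          # exploit
        cfg = copy.deepcopy(winner["config"])
        cfg["learning"]["lr"] *= rng.choice([0.8, 1.2])              # explore
        cfg["data"]["rotation"] = max(0.0, cfg["data"]["rotation"] + rng.gauss(0, 2))
        loser["config"] = cfg
    return population
```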
The main result, illustrated in figure 11.6, is that the meta-learning approach improved upon the human data science team's approach on both datasets. It discovered several useful principles that the data scientists were not aware of: focusing data augmentation on regions that mattered most, and utilizing flips only horizontally across the face; utilizing different loss functions at different times during learning; and relying mostly on the output-level blocks of the base models. It eventually reached an average error of 2.19 years,
which is remarkable because the human average error on this same task is estimated to be
3-4 years in controlled settings and 6-8 in more diverse settings (Burt and Perrett, 1995;
Voelkle, Ebner, Lindenberger, et al., 2012). Thus, meta-learning can be used to customize
deep learning approaches to the task and thus perform better than general designs and
better than human customization.
The third challenge is to estimate confidence in the age estimations; it will then be
possible to demonstrate that the treatments provide statistically significant improvement.
While deep learning models can be trained to provide a point prediction (i.e. continuous
value such as age), they do not by themselves provide any indication of what the confidence
intervals around that value are. However, it is possible to train another model to estimate
such intervals. In the approach called residual input-output estimation (RIO; Qiu, Meyerson,
and Miikkulainen, 2020), a Gaussian process model (GP; Rasmussen and C. K. I. Williams,
2006) is trained to predict the residual errors in the validation set. The GP model is then
Figure 11.7: Demonstrating the value of medical aesthetic treatment with AI. The vertical
axis shows the perceived age difference from pre-treatment images to images taken at different
times after treatment. The error bars indicate standard error on RIO values, averaged across
individuals. Whereas the estimated age differences with placebo treatment are centered around
zero, the actual Botox treatments (of which there were two versions) reduce the apparent age
substantially, demonstrating that the treatments are effective. Figure from Miikkulainen, Meyerson,
Qiu, et al. (2021).
used to create a distribution of possible values. The confidence intervals can be identified
from this distribution. In addition, its mean can be used to adjust the actual prediction,
improving its accuracy. When trained with the age estimation data, RIO’s confidence
intervals included 94.2% of the test set examples in its 95% confidence interval, 89.2% in
its 90% confidence interval, and 69.2% in its 68% confidence interval; and its mean improved the prediction accuracy by 9%.
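A minimal version of this residual-modeling idea can be sketched with a standard Gaussian-process regressor, as below. Note that RIO proper uses a composite kernel over both inputs and outputs; the single RBF kernel over concatenated features used here, and the `base_predict` callable, are simplifying assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_residual_gp(X_val, y_val, base_predict):
    # Fit a GP to the base model's residuals on the validation set.
    preds = base_predict(X_val)
    Z = np.column_stack([X_val, preds])          # GP input: features + prediction
    residuals = y_val - preds
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(Z, residuals)
    return gp

def predict_with_interval(gp, X, base_predict, z=1.96):
    # Adjusted point prediction plus an approximate 95% confidence interval.
    preds = base_predict(X)
    Z = np.column_stack([X, preds])
    mean_res, std_res = gp.predict(Z, return_std=True)
    adjusted = preds + mean_res                  # residual mean corrects the prediction
    return adjusted, adjusted - z * std_res, adjusted + z * std_res
```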
In order to evaluate the value of treatments, a third dataset, D2, was collected. It
consisted of two different treatments, altogether 631 patients with 3,925 images taken
before treatment, and 68,799 images taken at one week, two weeks, and monthly until six
months after treatment. In addition, 5,190 images were taken at the same time points of
another 156 patients who received a placebo injection instead of the actual treatment.
The results are shown in figure 11.7. The placebo effect fluctuates somewhat but is
centered around zero. The two treatments, on the other hand, show a statistically significant
decrease in age. After six months, the patients on average look 0.5 years younger, i.e. the
effect is about one year for the single injections (typically multiple injections are used to
amplify this effect). The result thus demonstrates that the medical aesthetics treatments
are an effective way to make the patients look younger. AI can thus be used to quantify
the effect that was previously only subjective.
Moreover, meta-learning was essential in achieving the result. With the same datasets
and baseline architectures, similar computational resources, and similar development
time, through meta-learning it was possible to achieve better results than through manual
optimization. The case study thus demonstrates that neuroevolution meta-learning is an
effective way to develop practical applications of deep learning.
11.5 Neuroevolution of Neuromorphic Systems
Neuromorphic computing, i.e. spiking neural networks designed to be implemented in
hardware, is a promising new area for neuroevolution. Such networks need to be energy
efficient, and therefore compact and complex, with many design parameters that need
to be optimized and customized. This general area is reviewed in this section, several
examples are given, and future opportunities are outlined.
11.5.1 Neuromorphic Computation
Neuromorphic computation, a field focusing on hardware implementation of neural
networks, is a burgeoning field with a long history (James, Aimone, Miner, et al., 2017;
Schuman, Potok, Patton, et al., 2017). There are several motivations: neuromorphic
circuits offer parallel computation that results in real-time performance, they can be
fault-tolerant, such systems may learn online, and they can be used to evaluate hypotheses
in neuroscience. However, energy efficiency has gradually emerged as the main goal
over the years. Most of the implementations are based on spiking neurons, as opposed to
neurons that are activated with continuous values representing firing rates. Such spikes
require very little power, resulting in energy savings of several orders of magnitude. As
computation and AI move to the edge, i.e. sensors and actuators in the field, power becomes
a primary constraint on computation, and neuromorphic designs offer a possible solution.
Although the full power of neuromorphic computing is still a way off, substantial
hardware designs have already been manufactured that demonstrate its potential. IBM’s
TrueNorth (Akopyan, Sawada, Cassidy, et al., 2015) is one and Intel's Loihi (Davies, Srinivasa, T.-H. Lin, et al., 2018) is another, both with 1M spiking neurons. It is therefore
possible to generate neuromorphic methods and have them run on these actual physical
devices. However, the field is much broader, and many methods are proposed for a
wide variety of conceptual devices. What makes the field particularly interesting is that
the resulting neural network architectures and algorithms are often new and different,
and not just hardware approximations of existing simulated neural networks, such as
backpropagation on a three-layer feedforward network. In that sense, neuromorphic
computing is driving innovation in neural networks.
Biology is the source for many such ideas in that many neuromorphic designs are
inspired by neuroscience. Some of them are also plausible, intended to capture principles
of biology closely enough to test hypotheses about it. For instance, spiking neurons
can be implemented at the level of Hodgkin-Huxley equations, i.e. the electrochemical
balance of compartments in the neural membrane. Such implementations allow studying
single-neuron computation well. Other models like the Izhikevich neuron aim to replicate
the bursting and spiking behavior with simpler computation. The leaky-integrate-and-fire
model (LIF) simplifies them further into integrating the spikes in each synapse over time
(with decay), and firing when a threshold is exceeded.
Learning in spiking networks is often based on spike-timing-dependent plasticity
(STDP). If a postsynaptic neuron fires shortly after the presynaptic neuron, it is possible that
the presynaptic firing caused the postsynaptic firing, and the connection is strengthened.
Conversely, if the postsynaptic neuron fires shortly before the presynaptic neuron, the
connection is weakened. In this sense, STDP is a time-based refinement of the Hebbian
learning principle, i.e. that neurons that fire together wire together.
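A minimal sketch of this LIF-plus-STDP combination is given below, using exponentially decaying pre- and postsynaptic traces to implement the timing rule. The parameter values are illustrative and not tied to any particular hardware platform.

```python
import numpy as np

class LIFLayer:
    """Minimal leaky integrate-and-fire layer with pairwise STDP.
    The parameters follow the text: membrane leak, firing threshold, and
    trace-based spike-timing-dependent plasticity. Values are illustrative."""
    def __init__(self, n_in, n_out, leak=0.9, threshold=1.0,
                 tau_trace=0.8, a_plus=0.01, a_minus=0.012, rng=None):
        rng = rng or np.random.default_rng(0)
        self.w = rng.uniform(0.0, 0.5, size=(n_in, n_out))
        self.v = np.zeros(n_out)              # membrane potentials
        self.pre_trace = np.zeros(n_in)       # recent presynaptic activity
        self.post_trace = np.zeros(n_out)     # recent postsynaptic activity
        self.leak, self.threshold = leak, threshold
        self.tau, self.a_plus, self.a_minus = tau_trace, a_plus, a_minus

    def step(self, pre_spikes):
        # Leaky integration of weighted input spikes; fire above threshold.
        self.v = self.leak * self.v + pre_spikes @ self.w
        post_spikes = (self.v >= self.threshold).astype(float)
        self.v[post_spikes > 0] = 0.0         # reset after firing

        # Exponentially decaying traces remember recent spikes on each side.
        self.pre_trace = self.tau * self.pre_trace + pre_spikes
        self.post_trace = self.tau * self.post_trace + post_spikes

        # STDP: pre-before-post potentiates, post-before-pre depresses.
        self.w += self.a_plus * np.outer(self.pre_trace, post_spikes)
        self.w -= self.a_minus * np.outer(pre_spikes, self.post_trace)
        np.clip(self.w, 0.0, 1.0, out=self.w)
        return post_spikes

# Usage: feed spike-coded inputs step by step, e.g.
# layer = LIFLayer(n_in=100, n_out=10)
# spikes = (np.random.default_rng(1).random(100) < 0.05).astype(float)
# out = layer.step(spikes)
```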
Note that STDP is an unsupervised learning method: there are no targets or gradients,
but simply an adaptation principle that applies to each connection independently. To make
learning more goal-directed, learning mechanisms that approximate backpropagation have
also been proposed. A practical approach along these lines is to first train a standard
simulated firing-rate backpropagation network offline, and then convert the resulting
network into a spiking neural network equivalent (S. Lu and Sengupta,
2022). Such
implementations can achieve power savings; however, they do not take into account or
utilize any further properties of hardware systems, such as delays and timing.
Thus, LIF neurons with an STDP learning rule are the most common implementation of neuromorphic architectures. This combination has low energy requirements and is event-driven, and is thus suitable for many architectures and applications. The designs include hardware-
constrained circuits such as those provided by TrueNorth and Loihi, brain-inspired circuits,
feedforward neural networks, and convolutional networks.
Interestingly, reservoir computing architectures have emerged as a popular design as
well, as a way to extend neuromorphic computing to time-varying problems. A reservoir
is a recurrent network that generates a time-varying signal that can then be processed
with a feedforward network, making it possible to recognize time series, or generate
time-varying behavior such as locomotion. The reservoir is initialized with random
neurons and connection weights, and they are not modified, making them particularly
useful for neuromorphic computation, for instance through a memristor implementation.
The designs are often evaluated with standard machine learning tasks. However, the
ultimate applications range from vision and sensing to robotics and control. While it
may be possible to achieve better performance through e.g. deep learning, some of such
tasks need to be performed in physical devices at the edge with little power available.
For instance, visual and auditory signal detection, brain-machine interfaces, and central
pattern generators for locomotion may be such applications in the future.
Because neuromorphic designs are unique and varied, there is a great opportunity to
optimize them through neuroevolution, as will be discussed next.
11.5.2 Evolutionary Optimization
Neuromorphic designs include many dimensions that can be optimized towards several
different objectives. For instance, the synaptic efficacy, activation decay, firing threshold,
refractory period, and transmission delay of LIF neurons can be adjusted; the connectivity of
the network can be changed, and the timing and extent of plasticity modified. Performance
in the task is one objective; energy consumption, size, and complexity of the network are
others.
Optimization of neuromorphic designs is thus a compelling application for neuroevo-
lution. First, gradients are often difficult to obtain with neuromorphic architectures and
in domains where they would be applied. Neuroevolution does not depend on gradients,
and it can therefore be used to implement supervised learning, extending neuromorphic
computing to many engineering applications. Second, while many
applications can be built with deep-learning designs, those designs are often too large to be effectively
deployed at the edge. Neuroevolution often results in compact designs that are space and
energy-efficient. Third, it is possible to optimize the designs towards multiple objectives
simultaneously, including performance, energy consumption, size, complexity, and specific
hardware restrictions. Fourth, evolution can be extended to include hardware design as
well, leading to the co-design of the hardware and the algorithms that run on it. Fifth,
while such optimization is compute-intensive, it can be done offline, taking advantage of
existing hardware simulators.
Many approaches to neuromorphic neuroevolution have been proposed, targeting
different aspects of hardware design. For instance, in the evolutionary optimization of
neuromorphic systems (EONS; Schuman, J. P. Mitchell, Patton, et al., 2020) framework,
the idea is to evolve a flexible structure of nodes and edges, as well as many of their
parameters such as the connection weights, the time delay on the connections and neurons,
activation thresholds, and leak rate. The system starts with a randomly initialized
population represented as lists of nodes with IDs and parameters; as usual, each generation
of individuals is evaluated in the task, and crossover and mutation applied to selected
parents. The method is thus similar to NEAT but includes many more parameters that are
specific to neuromorphic hardware. Note that EONS is also generic and can be adjusted to
different kinds of hardware. Evolution is simple enough that it can be implemented in
hardware at the edge, but usually it is done offline using a hardware simulator.
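As a rough illustration of what such a genome might look like, the sketch below represents a network as lists of nodes and edges carrying neuromorphic parameters, together with a simple parameter mutation. The field names and the mutation operator are hypothetical simplifications, not the actual EONS data structures.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    threshold: float = 1.0    # spiking threshold
    leak: float = 0.9         # membrane leak rate
    delay: int = 0            # neuron delay in time steps

@dataclass
class Edge:
    src: int
    dst: int
    weight: float = 0.0
    delay: int = 0            # synaptic transmission delay

@dataclass
class Genome:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

def mutate(genome, sigma=0.1):
    """Perturb one randomly chosen neuromorphic parameter (illustrative operator)."""
    target = random.choice(genome.nodes + genome.edges)
    attr = random.choice(["threshold", "leak"]) if isinstance(target, Node) else "weight"
    setattr(target, attr, getattr(target, attr) + random.gauss(0.0, sigma))
    return genome
```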
EONS has been tested on several standard benchmarks. For instance, in classification
tasks from the UCI database it resulted in simpler and more accurate solutions than standard
neuromorphic designs. Evolution also adapted the solutions to hardware constraints such as
the number of bits used to encode the weights. With a secondary objective to minimize the
number of nodes and connections, in addition to accuracy, it produced a range of tradeoffs.
Such experiments thus demonstrate the viability of hardware/algorithm co-design.
11.5.3 Examples
A particularly interesting application of EONS is to optimize reservoir architectures.
Although reservoir networks usually have a fixed structure and weights, and learning is
only done on the feedforward network that receives input from the reservoir, evolution
can be used to optimize the reservoir itself. Such optimization may include tuning its
hyperparameters, connectivity, and even the weights. This optimization can be done
before the learning in the feedforward network, the feedforward network can be evolved
directly at the same time, or the trained performance of the feedforward network can be
used as fitness for reservoir evolution (Iranmehr, Shouraki, Faraji, et al., 2019; J. Reynolds,
Plank, and Schuman,
2019). Note that even though these optimizations were developed for
neuromorphic computing, they apply to firing-rate versions of reservoir networks as well.
Evolutionary optimization of reservoir networks was shown to result in better per-
formance than e.g. the usual grid search for good designs. A particularly illustrative
application was to classify radar pulse sequences in order to identify movements of
free electrons in the ionosphere. The performance was close to that of other machine learning
methods; the low-power implementation may make it possible to deploy actual physical
solutions even in satellites.
Along the lines of building better detectors, radiation anomaly detection is a similar
potential killer app for neuromorphic computing (Ghawaly, A. Young, Archer, et al., 2022;
Ghawaly, A. Young, Nicholson, et al.,
2023). As part of nuclear nonproliferation research,
the challenge is to detect hidden gamma-ray sources in an urban environment. This is
a difficult task because the detection needs to be done by moving through the normal
accessible environment, and background radiation varies significantly. Potential sources
need to be detected as anomalies in the observed levels that are very noisy, triggering an
alarm for further study. As usual in such tasks, the true positive rate needs to be increased
while keeping the false alarm rate as low as possible.
The task is well defined, with ANSI standards for acceptable detection levels for
different types of radiation, as well as standard datasets through which performance can
be evaluated. The best current approaches are based on machine learning: in a recent
competition by the US Department of Energy, nine of the ten best methods were based on neural
networks and similar techniques (Department of Energy, 2019). However, such methods
consume a lot of energy, which limits their applicability in the field. Neuromorphic
computing is a viable alternative, offering real-time detection with much less energy usage.
In a series of experiments, EONS was set to design a network for this task. As usual,
EONS optimizes the topology and weights of the network, but also several hyperparameters
such as the encoding for the spikes, the delays on neurons and connections, neuron leakage,
spiking thresholds, and short-term memory between inferences. A threshold on the spiking
rate was used to trigger alarms, adjusted to an acceptable false-alarm rate. The resulting
designs had a sensitivity of about half of a computationally intensive PCA-based spectral
analysis method; thus, the energy savings still come with a cost. However, they met several
ANSI standards and performed better than a common $k\sigma$ baseline method, suggesting
that it may already be possible to deploy them in conditions where energy is at a premium.
Most interestingly, the best designs leveraged both spatial and temporal features in the
signal, taking advantage of short-term memory. Also, while the leakage rate was not
important, spike encoding mattered, with the number of spikes generated being the most
powerful. Such insights are useful in neuromorphic computing in particular because
they can drive co-design of the hardware, suggesting what elements are most useful to
implement.
While low energy consumption is important in sensing, it can also be crucial for
actuators at the edge. For instance, for autonomous cars, computing consumes 40 to 80%
of the power required for the control system (Baxter, Merced, Costinett, et al., 2018).
Neuromorphic computing could reduce this requirement significantly, thus extending
battery life. This idea was tested in the F1Tenth system, which is a 1/10 scale simulation
and physical implementation of a Formula One race car (figure 11.8; Schuman, Patton,
Kulkarni, et al., 2022).
Compared to imitation learning based on hand-designed waypoints, neuroevolution
resulted in architectures that performed better, although they took longer to train. This
improvement was due to discovering a customized structure in the network; without it,
(a) F1TENTH physical car; (b) performance on simulated tracks.
Figure 11.8: Evolving a neuromorphic race car controller. Neuromorphic control can reduce
the energy consumption of both sensing and actuation, which is crucial in applications at the
edge, such as self-driving cars. (a) The physical platform was an F1TENTH robotic vehicle,
intended to represent 1/10 of a Formula One race car. The controller was implemented on the
μCaspian neuromorphic development board. (b) Performance of the neuroevolved controller on
various simulated race tracks. The bottom five were used for training and the top 15 for testing.
Performance was measured on the x-axis as the fraction of two laps completed. The box plots show
the distribution of the best networks found in 30 evolution runs; the red star is the network with the
best average performance. Some tracks are more difficult than others, but evolution discovered
networks that performed well on all of them, and the best network on nine of the 15. When
transferred to a real-world track (not shown), performance was not as good as in the simulation,
but still demonstrated a practical implementation of a neuromorphic controller at the edge. Figures
from Schuman, Patton, Kulkarni, et al. (2022).
the results were not as good. Interestingly, the discovered network structures were also
smaller than the best hand-designed ones for imitation learning and evolution without
structure optimization. Since smaller networks are easier to deploy at the edge, with less
energy and space needed, neuroevolution again provides solutions that make physical
hardware implementations more realistic.
As a proof of concept, the evolved controllers were implemented on a circuit board
on a physical car and tested on a physical track setting. While the performance dropped
somewhat, as is usual in transfer from simulation to the physical world, the driving was
largely successful, demonstrating actual neuromorphic control at the edge.
11.5.4 Future Directions
Neuromorphic neuroevolution is a relatively new opportunity. The motivation for energy
consumption is compelling, and there are several encouraging results, but the performance
still needs to be improved and killer applications identified and implemented. However,
there are several ways in which it can be further developed and improved, which makes it
an interesting area for neuroevolution in the future.
While neural architecture search at the level of deep learning has become rather
difficult, due to extremely large networks and a few dominant architectures, the demands of
neuromorphic computing are almost exactly the opposite. The networks need to be small,
often recurrent, and customized. There are many hyperparameters beyond the standard
neural network ones, such as delays, leakage, thresholds, spike encoding, and short-term
memory. The designs are constrained by restrictions and properties of the actual hardware
where they will eventually run.
As a result, there are many opportunities for neuroevolution. As with deep neuroevo-
lution, the overall topology, i.e. neurons and their connectivity, is important; in addition,
because the networks are compact, the connection weights can be optimized directly. The
hyperparameters make the optimization problem complex but also provide an opportunity
for further improvement and customization. New learning mechanisms may be developed
through neuroevolution, improving upon STDP and perhaps providing practical methods
for online supervised learning. Not only information about spike timing across an individual
synapse may be used, but also timing across multiple synapses and their history. There
may be opportunities to leverage imperfections and other properties of physical devices,
and even interactions between them, like coupling.
Perhaps the most exciting opportunity is the co-design of neuromorphic architectures
and hardware. It may be possible to establish a cooperative coevolutionary mechanism
that modifies both aspects simultaneously, resulting in an optimal fit not unlike the brain
and behavior coevolution discussed in section 14.5. There are several constraints on both
sides on size, communication, and complexity, but they can possibly be incorporated
into the search and evaluation mechanisms. As a result, entirely new architectures and
algorithms may be discovered and customized to the task to be solved. Such an approach
may indeed prove crucial in moving more computing to the edge in the future.
This chapter explored how evolutionary methods can optimize various components
of neural networks, ranging from architectures and hyperparameters to loss functions and
learning algorithms. These approaches show how evolutionary search can discover more
effective and often surprising configurations, outperforming human design and enabling
higher adaptability and performance, especially in complex and constrained environments
like neuromorphic systems.
The next three chapters will expand the discussion to synergies and insights that
neuroevolution can bring to other approaches and disciplines, starting with reinforcement
learning. While neuroevolution and RL operate on fundamentally different principles
(population-based evolution versus gradient-based reward maximization), their strengths
are remarkably complementary, as we will see in the next chapter.
11.6 Chapter Review Questions
1.
Complex System Design: What are the main advantages of using evolutionary
optimization for designing complex systems, such as VLSI circuits or neural
networks, compared to traditional human-driven approaches?
2.
Bilevel Neuroevolution: How does bilevel neuroevolution enhance the performance
of neural networks? Why is surrogate modeling crucial in this process?
3.
Loss Function Optimization: Discuss how evolutionary techniques discovered the
"Baikal Loss" function, and its impact on regularization and robustness in neural
networks.
4.
Activation Functions: Explain the role of activation functions in neural network
performance and how evolutionary approaches like PANGAEA can customize
activation functions for specific architectures and tasks.
5.
Data Augmentation: Describe how evolutionary optimization can be applied to
data augmentation. Provide examples of transformations discovered during such
processes.
6.
Learning Methods: What are the key findings of the AutoML-Zero system?
How does it demonstrate the potential of evolutionary approaches in discovering
fundamental learning algorithms?
7.
Synergies in Meta-learning: Why is it challenging to optimize multiple aspects of
neural network design simultaneously? How can these challenges be addressed in
evolutionary meta-learning to outperform human-designed models?
8.
Neuromorphic Computation: What are the key advantages of neuromorphic
computing, particularly in the context of energy efficiency and edge applications?
How do spiking neural networks differ from traditional neural networks in achieving
these goals?
9.
Evolutionary Optimization in Neuromorphic Systems: How does the Evolution-
ary Optimization of Neuromorphic Systems (EONS) framework adapt standard
neuroevolution methods for neuromorphic hardware? What unique parameters does
it optimize compared to traditional neural networks?
10.
Applications and Future Directions: Discuss how neuromorphic neuroevolution
has been applied in tasks such as reservoir optimization, radiation anomaly detection,
and autonomous vehicle control. What are some future opportunities and challenges
in combining hardware and algorithm co-design in neuromorphic systems?
Chapter 12
Synergies with Reinforcement
Learning
Reinforcement learning (RL) and neuroevolution are two prominent approaches for
optimizing the performance of neural networks, but they employ different methodologies
with distinct trade-offs. In the first part of this chapter, we will look at their respective
advantages and disadvantages, and ways they could be combined.
In the second part of the chapter, we review approaches that go a step further, allowing
evolved networks to invent their own learning algorithm without relying on existing RL
methods. By leveraging the principles of neuroevolution, these networks can evolve not
only their architectures and weights but also the intrinsic rules that govern how they learn
and adapt over time.
12.1 Reinforcement learning vs. Neuroevolution
RL is a type of machine learning where an agent learns to make decisions by taking
actions in an environment to maximize cumulative reward. This approach involves the
agent interacting with the environment in a trial-and-error manner, receiving feedback in
the form of rewards or punishments. RL algorithms, such as Q-learning, deep Q-networks
(DQN), and policy gradient methods, focus on finding a policy that dictates the best action
to take in each state of the environment. Among policy gradient methods, REINFORCE is
one of the simplest and most widely used; it adjusts the policy parameters in the direction
of actions that lead to higher returns, using the log-probability of the chosen actions
weighted by their observed rewards. One of the main advantages of RL is its ability to
handle a wide variety of tasks, especially those involving sequential decision-making and
dynamic environments. It is particularly effective in domains where the environment’s
model is unknown or too complex to be explicitly defined, such as robotics, game playing,
and autonomous driving.
However, RL also has several drawbacks. It often requires a significant amount of
data and computational resources due to the extensive exploration needed to discover
effective policies. The training process can be unstable and sensitive to the choice of
hyperparameters. Moreover, RL algorithms can struggle with high-dimensional state and
action spaces.
Math Detail: Connection Between REINFORCE and Evolution Strategies
REINFORCE and evolution strategies originate from different traditions, but both
are instances of black-box gradient estimators based on the log-likelihood trick.
They optimize an expected objective $J(\theta) = \mathbb{E}_{z \sim p_\theta}[f(z)]$ by estimating
$\nabla_\theta J$ via sampling, assuming $p_\theta$ is differentiable.

Using the identity $\nabla_\theta J = \mathbb{E}_{z \sim p_\theta}[f(z) \nabla_\theta \log p_\theta(z)]$, both methods compute
gradients without backpropagating through $f$ itself. The difference lies in how
$p_\theta$ is defined.

In REINFORCE, $p_\theta$ is a stochastic policy $\pi_\theta(a \mid s)$, and $J(\theta)$ is the expected
return over trajectories $\tau = (s_0, a_0, \ldots)$. The gradient becomes
$\nabla_\theta J = \mathbb{E}_\tau[R(\tau) \nabla_\theta \log \pi_\theta(\tau)]$, which expands to
$\mathbb{E}_\tau\left[\sum_t R(\tau) \nabla_\theta \log \pi_\theta(a_t \mid s_t)\right]$ under trajectory factorization.

In ES, $p_\theta$ is a search distribution over parameters, typically $\theta \sim \mathcal{N}(\mu, \sigma^2 I)$, and
$J(\mu) = \mathbb{E}_\theta[F(\theta)]$. The gradient is $\nabla_\mu J = \mathbb{E}_\theta[F(\theta) \nabla_\mu \log p_\mu(\theta)]$. For a Gaussian,
this gradient becomes $\frac{1}{\sigma^2}\mathbb{E}_\theta[F(\theta)(\theta - \mu)]$, or, using the reparameterization
$\theta = \mu + \sigma\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, we get
$$\nabla_\mu J = \frac{1}{\sigma}\,\mathbb{E}_\epsilon[F(\mu + \sigma\epsilon)\,\epsilon].$$

Practically, the gradient is approximated via Monte Carlo:
$$\nabla_\mu J \approx \frac{1}{N\sigma}\sum_{i=1}^{N} F(\mu + \sigma\epsilon_i)\,\epsilon_i.$$

Both approaches use reward-weighted perturbations to estimate gradients, but differ
in scope: REINFORCE perturbs actions, giving fine-grained control and requiring
access to intermediate states and transitions; ES perturbs parameters directly and
treats the policy as a black box, making it more suitable for sparse-reward or
non-differentiable environments and large-scale parallelism.
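The Monte Carlo estimator at the end of the box is straightforward to implement; the sketch below (plain NumPy, with an arbitrary toy objective) ascends the estimated gradient of a black-box function F.

```python
import numpy as np

def es_gradient(F, mu, sigma=0.1, n_samples=50, rng=np.random.default_rng(0)):
    """Monte Carlo estimate of grad_mu E[F(mu + sigma * eps)], eps ~ N(0, I)."""
    grad = np.zeros_like(mu)
    for _ in range(n_samples):
        eps = rng.standard_normal(mu.shape)
        grad += F(mu + sigma * eps) * eps        # reward-weighted perturbation
    return grad / (n_samples * sigma)

# Toy example: maximize a simple quadratic objective by ascending the estimated gradient.
F = lambda theta: -np.sum(theta ** 2)
mu = np.ones(5)
for _ in range(200):
    mu += 0.05 * es_gradient(F, mu)
```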
Neuroevolution, on the other hand, is particularly advantageous in its ability to
optimize both the topology and parameters of neural networks simultaneously, making it
suitable for tasks where the optimal network structure is not known a priori. Additionally,
neuroevolution tends to be more robust to the pitfalls of local minima, as the population-
based search can explore a broader solution space compared to gradient-based methods used
in RL. For example, by repeatedly running the algorithm from scratch, policies discovered
using evolution tend to be more diverse compared to those discovered by reinforcement
learning algorithms such as REINFORCE, which perturbs actions within trajectories
rather than parameters directly. Despite these strengths, neuroevolution also faces certain
limitations. For example, neuroevolution might not perform well in environments requiring
real-time learning and adaptation since evolutionary processes generally operate on a
longer timescale compared to RL's incremental updates. Additionally, especially when
the environment provides dense rewards at each time step, RL methods often show a higher
sample efficiency than NE approaches.
While these methods are often presented as fundamentally different, they share
deeper mathematical connections: both can be viewed as instances of black-box gradient
estimation using the same underlying principle. The math detail box above unpacks
this connection by showing how REINFORCE and evolution strategies emerge from the
same log-likelihood trick, differing mainly in what they treat as the “search distribution.”
12.2 Synergistic Combinations
In practice, RL and neuroevolution can be synergistically combined to leverage the
strengths of both approaches. This section reviews several ways for doing so, including
combining the two time scales, evolving value functions, and starting points.
12.2.1 Integrating Population-Based and Reinforcement-Based Search
One of the primary difficulties in deep reinforcement learning is discovering optimal
policies while avoiding early convergence to suboptimal solutions. Various techniques,
such as intrinsic motivation or curiosity, have been suggested to address this issue.
However, these methods are often not universally applicable and necessitate careful tuning.
Given their population-based nature, effective exploration is an area where evolutionary
approaches shine. Additionally, because returns are consolidated across entire episodes,
they can often better deal with sparse rewards.
Evolutionary reinforcement learning (ERL; Khadka and Tumer, 2018) is a hybrid
algorithm that addresses some of these challenges. ERL utilizes an evolutionary population
to generate diverse data for training an RL agent and periodically integrates the RL agent
back into the EA population to infuse gradient information into the EA process. This
approach harnesses the EA's capability for temporal credit assignment using a fitness metric,
effective exploration through a variety of policies, and the stability of a population-based
strategy. Simultaneously, it leverages off-policy deep reinforcement learning to enhance
sample efficiency and accelerate learning through the use of gradients.
An overview of the approach is shown in figure 12.1. Similar to the standard
neuroevolution approach, a population of deep neural networks is evolved through an
evolutionary algorithm (mutations and crossover), where the fitness is calculated as
the cumulative sum of the reward during a rollout. Additionally, a portion of the best-
performing individuals (the elites) are not mutated. This part of the algorithm is shown on
the left side of figure 12.1.
To allow the algorithm to also learn within an episode, instead of only between episodes
as in the standard neuroevolution setup, information such as the current state, action, next
state, and reward is stored for each actor at each time step in
a replay buffer. This replay buffer is then used to train agents with a deep RL approach.
While the EA explores through noise in the parameter space (i.e. mutating the weights of
the network directly), RL approaches often explore through noise in the action space by
sampling from the outputs of the network. ERL leverages both by generating additional
experiences for the replay buffer through a noisy version of the RL actor network.
To provide information back to the EA and to take advantage of the information
from the gradient descent learning, every once in a while, during a synchronization
Figure 12.1: Evolutionary reinforcement learning. Left: In ERL, a population of neural
networks is evolved through NE. Data collected during those rollouts is used to train a deep
RL agent, which is periodically injected into the EA population. Right: In most domains,
ERL significantly outperforms vanilla EA and deep RL approaches. By combining the EA's broad,
population-driven exploration with RL's gradient-based optimization, ERL achieves both stability
and sample efficiency, leading to superior performance even in sparse-reward and deceptive
environments. Figure from Khadka and Tumer (2018).
phase, the weights of the RL actor network are copied back into the EA population. This
network is then evaluated like any other network in the population, which allows good
discovered policies to survive and extend their influence over subsequent populations,
while non-competitive policies will have fewer chances to reproduce. This transfer is
shown to be particularly useful in domains with sparse rewards and deceptive fitness
landscapes.
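The overall loop can be summarized in pseudocode as below. The helper functions (rollout, sample_batches, off_policy_update, mutate_and_crossover) are hypothetical stand-ins for the rollout, replay-sampling, off-policy update, and variation components; the original work used DDPG-style learners, and this sketch only conveys the structure of one generation.

```python
# Structural sketch of one ERL generation; helper functions are hypothetical stand-ins.
def erl_generation(population, rl_actor, rl_critic, replay_buffer, env, sync=True):
    # 1. Evaluate the evolutionary population; store all transitions for the RL learner.
    fitnesses = []
    for policy in population:
        episode_return, transitions = rollout(env, policy)
        replay_buffer.extend(transitions)
        fitnesses.append(episode_return)

    # 2. Extra exploration in action space: a noisy rollout of the RL actor.
    _, transitions = rollout(env, rl_actor, action_noise=0.1)
    replay_buffer.extend(transitions)

    # 3. Off-policy gradient updates of the RL actor and critic from the shared buffer.
    for batch in sample_batches(replay_buffer):
        off_policy_update(rl_actor, rl_critic, batch)

    # 4. Selection, crossover, and mutation on the population (elites kept unchanged).
    population = mutate_and_crossover(population, fitnesses, n_elites=2)

    # 5. Synchronization: periodically copy the RL actor into the population, where it
    #    is evaluated and competes for survival like any other individual.
    if sync:
        population[-1] = rl_actor.copy()
    return population
```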
This method leverages the EA's ability to explore the policy space and handle sparse
rewards while enhancing sample efficiency and learning speed through DRL's gradient-
based optimization. The algorithm is demonstrated on continuous control benchmarks,
significantly outperforming state-of-the-art DRL methods like DDPG and PPO (figure 12.1,
right). ERL maintains effective exploration, stabilizes convergence, and enhances
performance across various tasks by combining the episodic returns and population
stability of EAs with the gradient efficiency of DRL.
12.2.2 Evolving Value Networks for RL
Many RL approaches rely on the concept of a value function. The value function estimates
the expected cumulative reward that an agent can achieve from a given state or state-
action pair and can thus guide the agent's actions. In deep RL, these value functions
are implemented as neural networks, enabling agents to learn complex behaviors in
environments with high-dimensional state and action spaces. However, decisions about
the architecture of such a value network can crucially impact performance, and poor
choices can lead to poor agent performance.
A significant advantage of NE methods, such as NEAT, is that they can not only
optimize the weights of a neural network but also evolve the neural architecture at the
same time. This approach is thus well-suited to evolve the right initial parameters and
architecture of RL agent value networks that are better at learning. This setup differs from
the typical usage of NEAT to evolve a direct action selector network, where the network
directly outputs the action to be taken by the agent. Here, the network only outputs the
value of each state-action pair, and the actual action to be taken is then derived from those
values.
Before we detail how to integrate NEAT with the particular RL algorithm Q-learning,
we first briefly describe how the Q-learning algorithm works by itself. Q-learning is
a model-free reinforcement learning algorithm that aims to find the optimal policy for
a given finite Markov decision process (MDP). The goal of Q-learning is to learn the
action-value function $Q(s, a)$, which represents the expected utility (cumulative reward)
of taking action $a$ in state $s$ and then following the optimal policy thereafter.
The Q-learning algorithm involves initializing the Q-values arbitrarily for all state-
action pairs, except for the terminal states where the Q-values are set to zero. At each time
step $t$, the agent observes the current state $s_t$ and selects an action $a_t$ based on a policy
derived from the current Q-values, such as the $\epsilon$-greedy policy. This policy balances
exploration and exploitation by choosing a random action with probability $\epsilon$ and the action
with the highest Q-value with probability $1 - \epsilon$.
After executing the action $a_t$, the agent receives a reward $r_t$ and observes the next state
$s_{t+1}$. The Q-value update rule is then applied to update the Q-value for the state-action
pair $(s_t, a_t)$ based on the observed reward and the maximum Q-value of the next state.
The Q-value update rule is given by:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right], \qquad (12.1)$$
where $\alpha$ is the learning rate, determining the extent to which new information overrides
the old information, and $\gamma$ is the discount factor, determining the importance of future
rewards.
The algorithm repeats this process until convergence, meaning that the Q-values no
longer change significantly. The optimal policy $\pi^*$ can then be derived by selecting the
action with the highest Q-value for each state:
$$\pi^*(s) = \arg\max_a Q(s, a). \qquad (12.2)$$
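A compact tabular implementation of equations (12.1) and (12.2) with an epsilon-greedy policy is sketched below; the env object with reset() and step() returning (next state, reward, done) is an assumed interface, not a specific library.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1, rng=np.random.default_rng(0)):
    """Tabular Q-learning with an epsilon-greedy policy (equations 12.1 and 12.2)."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False        # assumed interface: reset() returns a state index
        while not done:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)   # assumed interface: (next state, reward, done)
            # Q-value update rule (12.1).
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return np.argmax(Q, axis=1)             # greedy policy (12.2)
```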
In reinforcement learning, specifically in Q-learning, the traditional Q-table method
of storing the action-value function $Q(s, a)$ for each state-action pair becomes impractical
for large state or action spaces due to the exponential growth of the Q-table. To overcome
this limitation, a neural network can be used as a function approximator to estimate the
Q-value function $Q(s, a; \theta)$, where $\theta$ represents the parameters of the neural network. The
network receives the state representation $s$ as input, and the output layer provides the
estimated Q-values for all possible actions in that state. Given a state $s$, the neural network
outputs a vector of Q-values:
$$\mathbf{Q}(s; \theta) = \mathrm{NN}(s), \qquad (12.3)$$
where $\mathbf{Q}(s; \theta) = [Q(s, a_1; \theta), Q(s, a_2; \theta), \ldots, Q(s, a_{|\mathcal{A}|}; \theta)]$. The Q-value for a specific
action $a$ is then obtained by indexing into this vector:
$$Q(s, a; \theta) = \mathbf{Q}(s; \theta)[a]. \qquad (12.4)$$
During training, the neural network parameters $\theta$ are updated to minimize the difference
between the predicted Q-values and the target Q-values through gradient descent.
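For illustration, a minimal value network and its TD target might look as follows; the two-layer NumPy network and the target computation are generic examples, not a specific published implementation.

```python
import numpy as np

def q_network(s, params):
    """A small two-layer value network: state in, one Q-value per action out (eq. 12.3)."""
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ s + b1)
    return W2 @ h + b2                       # vector Q(s, .; theta)

def td_target(r, s_next, params, gamma=0.99):
    """Target toward which Q(s, a; theta) is regressed: r + gamma * max_a' Q(s', a')."""
    return r + gamma * np.max(q_network(s_next, params))
```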
As mentioned at the start of this chapter, traditional temporal difference (TD) methods,
such as Q-learning, rely on manually designed function approximators to estimate the value
function, which can be labor-intensive and suboptimal. An approach called evolutionary
function approximation (Whiteson, 2006), combines NEAT with Q-learning, resulting
in the NEAT+Q algorithm. In a bilevel optimization setup (see section
11.2), NEAT
evolves the structure and weights of neural networks in the outer level, while Q-learning
updates these weights during the learning process in the lower-level optimization process.
The aim in this combination is to allow the system to discover effective neural network
configurations that are better suited for learning accurate value functions, thereby enhancing
the performance of TD methods. Because Q-learning optimizes the weights of this network
in the lower-level optimization process, we have to make a choice about what to do with
those modified weights at the outer level.
As we have seen previously (section 4.2.3), we can either follow a Lamarckian
approach, in which the weights updated by Q-learning are written back into the original
NEAT genomes, or follow a Darwinian approach, where the weight changes are discarded
and the original genomes are used to create the neural networks for the next generation.
While the Darwinian approach is the more biologically plausible one, a Lamarckian
approach could have potential benefits for RL tasks because the same learning doesn't
have to be repeated for each generation. A Darwinian approach, on the other hand, could
take advantage of the Baldwin effect, as we have seen previously in section 4.2.3.
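The difference between the two write-back strategies can be sketched as follows; build_network and q_learning_train are hypothetical helpers standing in for genome decoding and the inner-loop Q-learning run.

```python
def evaluate_genome(genome, build_network, q_learning_train, lamarckian=False):
    """Outer-loop evaluation in a NEAT+Q-style setup (illustrative sketch only).

    build_network decodes a genome into a value network; q_learning_train runs
    Q-learning on its weights and returns the trained weights and resulting fitness.
    Both are hypothetical helpers, not the actual NEAT+Q implementation.
    """
    network = build_network(genome)
    trained_weights, fitness = q_learning_train(network)
    if lamarckian:
        # Lamarckian: learned weights are written back into the genome.
        genome.weights = trained_weights
    # Darwinian: weight changes are discarded; only fitness affects selection,
    # which can still shape evolution through the Baldwin effect.
    return fitness
```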
When comparing these methods in different domains, such as the MountainCar task
(where a car must swing back and forth to build momentum to reach the hilltop goal) and
server job scheduling (where jobs must be assigned to servers efficiently under capacity
limits), it became obvious that while Q-learning learned a lot quicker in early epochs, its
performance soon plateaued (figure 12.2). NEAT and NEAT+Q, on the other hand,
continued improving, with NEAT+Q significantly outperforming regular NEAT in both
domains. Interestingly, if Q-learning started out with one of the best networks evolved by
NEAT, it was able to match the performance of NEAT+Q. Two examples of such evolved
networks are shown in figure 12.3. The evolved networks are sparsely connected and
irregular, suggesting that finding them through a manual process is unlikely to succeed.
12.2.3 Evolving Starting Points for RL
Sections 11.2 and 11.3 described how evolution can be used to optimize the design
of neuroevolution methods and supervised neural networks. The same approach can
be applied to reinforcement learning as well. For example, an outer loop evolutionary
optimization can be tasked to find starting parameters for an inner loop optimization
process with the goal of making a policy adaptable. This approach is closely related to
bilevel optimization (section 11.2).
Figure 12.2: Evolutionary function approximation. Q-learning with a manually designed
neural network is compared to both NEAT and NEAT+Q. Both NEAT methods significantly
outperform Q-learning in both the MountainCar (a) and server job scheduling (b) tasks. These
results demonstrate that NEAT is able to evolve the right initial parameters and architecture of
value networks that are better at learning. Figure from Whiteson (2006).
Figure 12.3: NEAT+Q evolved network topologies. Shown are the best neural networks evolved
by NEAT+Q for the MountainCar (a) and server job scheduling (b) tasks. Inputs are shown at the
bottom, while outputs are shown at the top. Each input is also directly connected to each output
node (connections not shown). Output nodes can also be connected to other output nodes. The
sparsity and irregularity of these networks suggest that they might be difficult to find through a
manual process. Figure from Whiteson (2006).
This type of meta-learning was popularized by the influential work called model-
agnostic meta-learning (MAML; Finn, Abbeel, and Levine,
2017). While deep RL
approaches have been shown to reach human or even superhuman performance in a
variety of tasks, there is still a large gap to the learning efficiency of humans. Typical RL
approaches require many trials to learn, while humans can perform decently well on a
variety of tasks with relatively little experience. The MAML approach tries to address
this issue to enable more rapid adaptation to different tasks. However, the original MAML
relies on second-order gradients, which makes it computationally intensive and sensitive
to hyperparameters. Different versions of evolutionary meta-learning have since been
developed to improve on the original MAML. For example, MAML-Baldwin (Fernando,
Sygnowski, Osindero, et al., 2018) uses an evolutionary algorithm in the outer loop and
RL in the inner loop, while ES-MAML (X. Song, W. Gao, Y. Yang, et al., 2020) uses an
evolutionary optimizer in both the inner and outer loops. This section will look at those
variants in more detail.
What the evolutionary meta-learning methods have in common is that they try to
exploit the Baldwin effect to evolve agents that can few-shot learn across a particular
distribution of tasks. In this way, the objectives extend beyond helping to navigate difficult
fitness landscapes, such as the ones encountered in the needle-in-the-haystack problem
from earlier studies of the Baldwin effect (figure 4.4). While it is theoretically possible to
solve these tasks without learning, here we are interested in tasks that would be impossible
to solve through evolution alone without some form of lifetime adaptation. Consider, for
instance, the scenario where the robots depicted in figure 14.6 experience a malfunction,
such as the loss of a sensor or a limb. Similarly, envision the rockets illustrated in figure 6.1
encountering an engine failure or a neural network evolved to control one race car being put
into a different race car. When the environment changes suddenly, there is often no
time to re-evolve a controller, and in these circumstances, a standard feedforward network
will often completely fail. Here, the agent has to adapt online to maintain performance.
Canonical tasks in this vein are HalfCheetah goal direction and goal velocity, two
high-dimensional MuJoCo locomotion tasks. In the goal direction task, the agent has to
rapidly learn to run in a particular direction. In goal velocity, the agent has to learn to adapt
its locomotion to match a given velocity. In both tasks, the agents have to learn quickly
during their lifetime. Here, the usual genetic algorithm approach for optimizing neural
network weights without lifetime learning can be compared to an evolutionary MAML
version (MAML-Baldwin), in which the initial weights are evolved through a simple GA
in the outer loop and an RL method (policy gradient method A2C) updates them in the
inner loop (Fernando, Sygnowski, Osindero, et al., 2018). During meta-training, different
tasks (e.g. goal directions or target velocities, respectively) are sampled in the inner loop,
and the network needs to adapt to them only through reward feedback alone. This task
would be easy if the network received the desired velocity or direction as input. However,
in these domains this information is only provided in the form of a reward to the RL
algorithm. For the goal velocity task, this reward is the negative absolute difference between
the agent’s current velocity and the target velocity; for the goal direction task, it is the
magnitude of the velocity in either the forward or backward direction.
While a typical genetic algorithm failed to solve these tasks, MAML-Baldwin evolved
agents that can quickly adapt their behavior based on the task requirements. For example,
in only 30 simulated seconds, the robot was able to learn to adjust its velocity to match
a target velocity. The comparison between the goal velocity and goal direction tasks
reveals an interesting difference. The goal direction task demands a significant shift
in strategy, as it requires the agent to move forward in some episodes and backward in
others. In this scenario, Lamarckian evolution tended to get trapped in a local optimum,
where it could only move backward effectively. Conversely, Baldwinian evolution adapted
more successfully to these varying tasks. In the goal velocity task, however, Lamarckian
evolution performed better because the final velocity achieved in the previous task often
provided a suitable starting point for the target velocity in the next task (since the target
velocity was increased by 0.2 in each episode).
Figure 12.4: Quick adaptation through ES-MAML. The evolutionary meta-learning approach
ES-MAML allows a robot only trained in a simulated environment to transfer to the real world and
adapt to changes not seen during training, such as reduced motor power and an added payload
of 500g placed on the robot’s side. Figure from X. Song, Y. Yang, Choromanski, et al. (2020).
Videos at https://neuroevolutionbook.com/demos.
The approaches we have seen so far, including the evolutionary meta-learner MAML-Baldwin,
still relied on a policy gradient method in the inner loop. However, particularly when
dealing with real robots, the noise present in the real world presents challenges to methods
relying on gradient estimates since even small differences due to initial conditions, noise in
the sensors/actuators, etc. can lead to very different trajectories. It would thus be desirable
to also be able to use the more robust evolutionary optimization approach in the inner
loop. However, one requirement is that the inner loop optimization should be data efficient
because meta-learning is generally expensive.
ES-MAML (X. Song, W. Gao, Y. Yang, et al., 2020) provides such a mechanism.
Compared to the original MAML, ES-MAML is conceptually simple, does not require
estimating any second derivatives, and is easy to implement. An ES-MAML variant
particularly suited for noisy domains performs an evolution strategy on the initial network
parameters in the outer loop and then a simple batch hill-climb algorithm in the inner
loop (X. Song, Y. Yang, Choromanski, et al., 2020). Hill climbing in ES-MAML involves
starting with an initial set of model parameters and then iteratively making small, random
perturbations to these parameters. After each perturbation, the modified parameters are
evaluated based on their performance on the current task. The algorithm then compares the
performance of the modified parameters to that of the previous ones. If the performance
improves, the algorithm accepts the new parameters; if not, it rejects them and reverts to
the previous parameters.
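A minimal version of such a batch hill-climbing inner loop is sketched below; evaluate is an assumed function returning the episodic return of a rollout with the given parameters on the current task, and the step sizes are illustrative.

```python
import numpy as np

def batch_hill_climb(theta, evaluate, n_steps=5, batch=8, sigma=0.02,
                     rng=np.random.default_rng(0)):
    """Batch hill climbing as an inner adaptation loop (illustrative sketch).

    evaluate(theta) is a hypothetical function returning the episodic return of a
    rollout with parameters theta on the current task.
    """
    best_score = evaluate(theta)
    for _ in range(n_steps):
        # Try a batch of small random perturbations around the current parameters.
        candidates = [theta + sigma * rng.standard_normal(theta.shape) for _ in range(batch)]
        scores = [evaluate(c) for c in candidates]
        i = int(np.argmax(scores))
        if scores[i] > best_score:      # accept the best candidate only if it improves
            theta, best_score = candidates[i], scores[i]
    return theta
```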
This combination has been shown to be particularly efficient, outperforming state-of-
the-art MAML and allowing a quadrupedal robot only trained in a simulation to not only
overcome the sim-to-real gap but also to adapt to changes in the real-world, such as (1)
reduced motor power and added payload, and (2) a slippery surface. An example of the
robot before and after adaptation is shown in figure 12.4.
In sum, evolutionary meta-learning approaches can exploit the Baldwin effect to
produce powerful few-shot learning agents, are often easier to optimize than their gradient-
descent-based alternatives, and can deal with noisy environments that methods based on
gradient estimates can struggle with.
12.3 Evolving Neural Networks to Reinforcement Learn
Previous sections reviewed a selection of hybrid approaches that combine RL and
neuroevolution methods. While these synergistic combinations have proven very useful,
they still mostly rely on domain-agnostic learning approaches that can take many trials to
learn. Additionally, the aforementioned meta-learning approaches are designed to quickly
learn new tasks but struggle to continually learn; that is, learning new tasks without
forgetting what was previously learned. Finally, animals are born with innate priors
that facilitate fast learning, which go well beyond the current MAML-like paradigms
of only learning good starting weights. For example, a newly hatched chick orients
itself towards moving objects right from birth, before any learning takes place (Versace,
Martinho-Truswell, Kacelnik, et al., 2018). This evolved prior subsequently helps the
animal to quickly and robustly learn to recognize complex objects under varying points of
view, abilities our current AI systems still struggle with.
In this section, we show that neural networks by themselves can be evolved to start with
useful priors and the capacity to adapt during their lifetime. This ability can enable them
to deal with environments with non-stationary rewards and sudden environmental changes.
While evolution is a relatively slow process that allows capturing gradual environmental
changes, learning enables an individual to adapt to changes that happen during its lifetime.
However, evolving these learning abilities is difficult not only because the neural network
needs to learn which connections to change during the lifetime but also when to change.
One way that neuroevolution can allow agents to learn is to create recurrent connections
in the network, which enables them to maintain information through feedback loops. For
example, in the T-maze navigation domain in section 6.3.2, NEAT was able to evolve a
recurrent network that kept information about the high-reward location from
one trial in the maze to the next. More complex recurrent networks, such as LSTMs, have
been the main workhorse of machine learning methods that learn to reinforcement learn
(J. X. Wang, Kurth-Nelson, Tirumala, et al., 2016).
However, recurrent neural networks are not the only way that artificial agents can adapt
quickly. Several different learning mechanisms are reviewed in this section, from simpler
local Hebbian learning to more advanced methods such as neuromodulation that allow
more precise control over plasticity. We will also explore how to combine the ideas of
plasticity with indirect encodings, reviewing the adaptive HyperNEAT approach. Finally,
we will look at approaches that extend neural networks with an external memory to further
separate adaptation and control, which allows them to more easily evolve the ability to
continually learn.
Later in this book, when we go into more details on what neuroevolution can tell us
about biological evolution (section 14.4), we will return to the questions of how learning,
development, and evolution interact and how much intelligent behavior is innate vs. how
Figure 12.5: Navigation of mobile robot with Hebbian plasticity. The navigation of the robot
before (left) and after (right) lifetime learning. The evolved learning rules allow the robot to
quickly learn to navigate a maze without colliding with the walls. Figures from Floreano and
Mondada (1996b).
much is learned.
12.3.1 Evolving Hebbian Learning Rules
A way to allow evolved neural networks to learn during their lifetime is to not only
evolve the network’s weights but also the rules that determine how those weights should
change based on incoming and outgoing activations, inspired by the plasticity in biological
nervous systems. The idea that all connection weights are genetically determined is
unlikely to hold in nature, where genetic information is compressed and thus initial weight
values are likely not precisely encoded in the genome. The most well-known such rule,
which we already encountered in chapter 4.2, is Hebbian learning. This mechanism is
named after psychologist Donald Hebb and often summarized as: “Cells that fire together
wire together.” In mathematical terms, this can be written as $\Delta w_{ij} = \eta x_i x_j$, where
$\Delta w_{ij}$ is the change in the weight from neuron $i$ to neuron $j$, based on their activations
$x_i$ and $x_j$. The learning rate $\eta$ for each connection can be evolved, allowing evolution to
optimize the necessary degree of plasticity.
Pioneering work in evolving such plastic neural networks was performed by the labs
of Nolfi and Floreano (2000) who studied evolving controllers for simulated and real
robots, a field called evolutionary robotics. In one of their seminal works, Floreano and
Mondada (1996b) trained a real miniature mobile robot to navigate a simple maze. Instead
of evolving the weights directly, which are initialized to small random values at the start of
a robot's deployment, a genetic algorithm determines which of four possible learning rates
$\eta$ (0.0, 0.3, 0.7, 1.0) each synapse in the network should have. In addition, the genome
also encoded which of the four Hebbian learning rule variations should be applied at
each synapse. These rules included: (1) a simple Hebbian rule, (2) a postsynaptic rule,
in which the weight is decreased if the postsynaptic unit is active and the presynaptic is not,
(3) a presynaptic rule, which decreases the weight when the presynaptic neuron is active
and the postsynaptic is not, and (4) a covariance rule in which the weight is decreased if the
activation difference between the pre- and postsynaptic neurons is below a given threshold, and
otherwise increased. The weights of these evolving networks were updated every 300 ms
following the synapse-specific evolved rule.
Info Box: The journey to a PhD in Neuroevolution
I (Sebastian Risi) first encountered neural networks during my undergrad studies
in Germany in 2002. There was no course on neuroevolution (or even evolutionary
algorithms) at my university, but my interest really got piqued when I got my hands
on the Evolutionary Robotics book by Nolfi & Floreano. Back then, I had to really
convince my professor to let me write a Diploma thesis about this niche topic.
During my research for the thesis, I encountered Ken Stanley’s & Risto's work on
NEAT and was blown away. Why not let evolution decide on everything, including
the structure of the network! At this point, I basically knew I wanted to pursue a
PhD in this direction; below is an excerpt of the email I wrote Ken in November 2007:
“I recently graduated from the Philipps-University Marburg in Germany
with a master’s degree in Computer Science. I am wondering if you have any PhD
positions available in the area of Neuroevolution for video games or a related
field. Especially the NERO project and your publications about Neuroevolution of
Augmenting Topologies have drawn my attention.
My research interests focus on Artificial Intelligence, Neural Networks, Genetic
Algorithms and biologically inspired computational methods in general. My
curriculum vitae can attest to my extensive experience in these areas.
I am highly interested in further investigating the nature of systems that allow
phylogenetic and ontogenetic adaptation and that display neural development. I
think that the evolution of adaptive Neural Networks that are able to learn online
can be used to create totally new game experiences going beyond the nature of
classical video games.
I am looking forward to hear from you. Thank you for your consideration.”
Even though, in retrospect, the sentence “My curriculum vitae can attest
to my extensive experience in these Areas.” was probably stretching it a bit,
Ken decided to hire me as a PhD student, and we got to work together on many
interesting and fun projects, some of which are detailed in this book. In the same
way I got inspired by Floreano's & Nolfi's Evolutionary Robotics book, I hope this
book might inspire others to join us in this exciting research field!
While the employed plastic networks were tiny compared to current networks (they
had 27 connections in total, with eight infrared sensors, one hidden neuron, and two
motor output neurons), the evolved rules enabled the networks to quickly “learn” how
to navigate during their lifetimes, even from completely random weights. In less than
ten sensor-motor loops, the best-evolved individuals were able to move forward without
getting stuck at walls (figure 12.5). Analyzing the evolved solutions showed that there
isn't one particular learning rule that appears more often in these networks. However, the
basic Hebbian rule was not used frequently, which is likely due to the fact that it lacks the
capability to decrease synaptic efficacy, potentially hindering future adaptability.
It is also interesting to note that, while the behavior of the robot was stable and it could
perform navigation without colliding with walls, the weights of these networks continuously
changed during navigation. This is in stark contrast to most other networks we encountered
in this book, including networks trained through methods such as reinforcement learning.
In these fixed networks, the weights do not change during inference and only during a
dedicated training period. Plastic neural networks thus take us a step closer to biological
neural networks, which undergo continual changes throughout their whole lifetimes.
By building on recent advances in scaling evolution strategies to systems with a large
number of trainable parameters (section 3.4), evolved plastic neural networks can be
applied to more complex problems with larger parameter spaces as well. Thus, we can
not only deal with increased network sizes but also more general plasticity rules. While
we were previously limited to only choosing from a set of four discrete Hebbian rules,
evolving generalized Hebbian rules enables each connection to implement its very specific
weight update in the form of:
$$\Delta w_{ji} = \eta\,[A\, o_j o_i + B\, o_j + C\, o_i + D], \qquad (12.5)$$
where $w_{ji}$ is the weight between neurons $i$ and $j$, $\eta$ is the learning rate, $A$ the correlation
term, $B$ the presynaptic term, $C$ the postsynaptic term, and $D$ a constant, with $o_i$ and $o_j$ being
the presynaptic and postsynaptic activations, respectively. We thus have a total of five
parameters ($\eta$, $A$, $B$, $C$, $D$) per connection.
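Applied to a whole layer, the rule in equation (12.5) can be written as a single vectorized update, as in the sketch below (shapes and naming are illustrative; each connection carries its own eta, A, B, C, and D).

```python
import numpy as np

def hebbian_step(W, pre, post, eta, A, B, C, D):
    """One application of the generalized Hebbian rule (12.5) to a weight matrix.

    W, eta, A, B, C, D all have shape (n_post, n_pre); pre and post are the
    presynaptic and postsynaptic activation vectors of the layer.
    """
    outer = np.outer(post, pre)                                   # o_j * o_i correlation term
    dW = eta * (A * outer + B * post[:, None] + C * pre[None, :] + D)
    return W + dW
```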
These more complex plastic neural networks can tackle problems that are very difficult
or even impossible to solve for standard feed-forward networks. In fact, they can now
start to address one of the fundamental limitations of current robots, which is their
fragility. While injured animals in nature can compensate for damage by changing their
behavior rapidly, robots often fail even if the situation has only changed slightly. Results
demonstrating the promise of this plastic neural network approach were obtained in
a four-legged walking domain (Najarro and Risi,
2020). Here, a standard three-layer
feedforward network with [128, 64, 8] nodes per layer (totaling 12,288 trainable weight
parameters) was compared to a plastic neural network with the same architecture in
which only the plasticity parameters were evolved (totaling 12,288 × 5 = 61,440 Hebbian
coefficients). Three different versions of a quadruped robot were devised to simulate the
impact of partial damage to one of its limbs, with fitness being determined as the average
distance covered by two versions of the robot, one in its standard form and the other with
damage to its right front leg. The third version, which had damage to its left front leg,
was excluded from the training process to later assess the network's ability to generalize.
The networks’ parameters were optimized through a variation of OpenAI’s ES algorithm
(section 2.2.4).
While a static feedforward neural network often works well on the morphologies it
was trained on, it failed when confronted with the new robot morphology not seen during
training. The evolved plastic network, on the other hand, quickly found network weights
that allow high performance in these more complex domains, even when starting from
Figure 12.6: Dynamics in random networks with synapse-specific Hebbian plasticity. The
evolved Hebbian rules allow the controller to quickly learn to control a quadrupedal robot, starting
from randomly initialized starting weights. The figure shows the networks at three different
timesteps (A, B, C) during the lifetime of a robot with the standard morphology. The quick change
in the initially random weights, which is driven purely by the learned Hebbian rules, is reflected in
the increase in the reward performance (bottom). Even when the morphology of the robot changes
through damage to one of the legs (top, right), the same Hebbian network is able to adapt in a
few timesteps, allowing the robot to continue locomoting. Figures from Najarro and Risi (2020).
Videos at https://neuroevolutionbook.com/demos.
completely random weights in each episode and without access to any reward information
during its lifetime (e.g. distance traveled). Additionally, the Hebbian approach was able to
adapt to damages in the quadruped, such as the truncation of the left leg, which it had not
seen during training (figure 12.6). Instead of needing many thousands of learning steps as
is common in standard reinforcement learning approaches that start from tabula rasa, the
evolved Hebbian learning rules allowed the neural network to reach high performance
after only 30 to 80 timesteps. Interestingly, the Hebbian network achieved this performance
across the three different morphologies, all without the network receiving any reward-
based feedback. The incoming activation patterns during the lifetime are sufficient for the
network to self-adjust, even without explicit knowledge of the specific morphology it is
simulating.
12.3.2 Case Study: Hebbian Learning for Physical Robot Transfer
With the Hebbian-based approach showing increased robustness to situations not seen
during training, it is now worth asking if this approach is also able to handle another type
of generalization: sim-to-real transfer.
Although several studies in neuroevolution have explored the sim-to-real transfer for
locomoting robots, existing work has largely focused on simple robots with only a few
degrees of freedom (Floreano and Urzelai, 2001), or on specific failure modes (e.g. loss of
a limb) to create robust controllers (section 6.2.3). These approaches are often based on
domain randomization, which consists of extending the training set to include a variety
of slightly different scenarios, thereby significantly extending the required training time.
One of the enduring challenges in robotics is enabling agents to generalize beyond the
conditions they were trained in, a problem commonly referred to as out-of-distribution
(OOD) generalization. Traditional deep learning approaches, while powerful, often fail
when confronted with unforeseen variations in the environment, morphology, or task
dynamics.
In this case study, we will take a look at how a Hebbian approach can be scaled to
real-world legged robot platforms without the need for domain randomization (Leung,
Haomachai, Pedersen, et al., 2025). Three types of control policies (feedforward, Hebbian,
and LSTM networks) were assessed for robotic locomotion tasks on two real-world legged
robot platforms: a dung beetle-like robot with 18 degrees of freedom and a gecko-like robot
with 16 (figure 12.7b,c). The Hebbian approach followed the connection-specific ABCD
approach introduced in the previous section, but incorporated a weight normalization
approach that was found to be crucial to prevent weight divergence. In this setup, all the
weights were normalized layer-wise by dividing them by the maximum weight of that
layer.
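A minimal sketch of this normalization step is shown below; whether the maximum is taken over absolute values, and the exact place in the update loop where it is applied, are assumptions made here for illustration.

import numpy as np

def normalize_layerwise(layer_weights):
    # Divide each layer's weights by that layer's largest (absolute) weight so that
    # repeated Hebbian updates cannot drive the weights toward divergence.
    normalized = []
    for w in layer_weights:
        max_w = np.max(np.abs(w))
        normalized.append(w / max_w if max_w > 0 else w)
    return normalized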
The simulated environment used the Omniverse Isaac Gym reinforcement learning
environment (Makoviychuk, Wawrzyniak, Y. Guo, et al., 2021). All three networks
achieved comparable performance in the training environments on the dung beetle-like
robot (figure 12.7). However, significant differences emerged during testing in out-of-
distribution scenarios. Among the three, only the Hebbian network consistently enabled a
real-world robot to walk effectively, surpassing the performance of both the feedforward
and LSTM-based controllers (figure 12.7e). The robot controlled by the Hebbian network
achieved the highest walking speed, approximately seven cm/s. In contrast, the robots
using simple feedforward and LSTM policies barely moved from their starting positions
during the 20-second test period. Additionally, the Hebbian network exhibited some
intriguing locomotion behaviors: the robot remained stationary until it was placed on the
ground, initiating walking only upon foot-ground contact, and ceasing movement once it
was lifted off the floor.
Interestingly, these results are in contrast to the superior performance of a recurrent
network compared to a Hebbian network for a simple food gathering task, which we
saw in section 4.2.3. How can this difference be explained? For the more complex
locomotion domains, the feedforward and LSTM networks likely exhibit overfitting due to
Figure 12.7: Hebbian network for sim-to-real transfer. A neural network incorporating Hebbian
plasticity (a) is trained to control a robot in simulation before being transferred to a physical robot.
The approach was tested on a dung beetle (b) and a gecko-inspired robot (c). Training curves for
the dung-beetle robot locomotion are shown in (d). The graph displays the average performance and
standard deviation of the best individual across five trials for each model. While the LSTM network
performs slightly better in the environments seen during training, only the Hebbian network is able
to control the dung beetle-like robot when transferred to the physical robot (e). Figures from Leung,
Haomachai, Pedersen, et al. (2025). Videos at https://neuroevolutionbook.com/demos.
their reliance on highly specific characteristics of the simulated robot, such as precise
mass distribution, joint dynamics, and surface friction, that deviated significantly from
the conditions of the physical robot. The simulation featured a more symmetrical mass
distribution, both left-to-right and head-to-rear, compared to its real-world counterpart. It is
possible that a more accurate simulation might have reduced the performance discrepancy
across models; however, the creation of high-fidelity simulation environments remains a
resource-intensive endeavor. Consequently, the ability of Hebbian networks to generalize
robustly, even in imperfect simulation settings, illustrates their practical value for robotic
control.
It turns out that the Hebbian networks adapted to real-world conditions without explicit
training randomizations of terrain irregularities, mass variations, joint property fluctuations,
or morphological defects. While some stochasticity, such as random initialization of
synaptic weights at each episode's onset, was present, similar randomization in LSTM
hidden states did not prevent overfitting. This suggests that Hebbian plasticity imparts a
unique form of adaptability not readily achievable through more conventional architectures.
Further generalization tests were performed with the gecko-like robot. After training
solely on flat terrain within simulation, the policy was deployed on the physical robot for
evaluation. The gecko-inspired robot demonstrated an ability to adapt its leg movements
to traverse uneven surfaces successfully. The Hebbian network also proved resilient
to substantial sensory loss and physical damage. Even with the loss of proprioceptive
feedback or limb functionality, the robot maintained locomotion ability.
The results in this case study highlight the promise of Hebbian plasticity mechanisms
for achieving robust, adaptable robotic behaviors capable of bridging the challenging
sim-to-real gap.
12.3.3 Learning When to Learn through Neuromodulation
Hebbian learning is far from the only adaptation mechanism in the brain. Another
mechanism is neuromodulation, which plays many different roles in biological nervous
systems. Neuromodulation refers to the process by which neural activity is regulated or
modified by neurotransmitters and other chemicals within the brain and nervous system.
This process can influence various aspects of neuronal function, including the strength and
efficacy of synaptic connections, the excitability of neurons, and overall neural network
dynamics. Neuromodulation plays a crucial role in the brain's ability to adapt to new
information, experiences, and environmental changes, affecting learning, memory, mood,
and behavior.
Given the numerous functions of neuromodulation in biological nervous systems,
it has also been incorporated in evolving plastic neural networks. In these instances,
neuromodulation is typically set to modify the Hebbian plasticity of neurons in the neural
network. This ability is useful because it allows switching plasticity łonž and łoffž,
enabling reward-mediated learning. For example, plasticity of some weights might be
switched off if they were responsible for obtaining a high reward in the environment, while
other connection should increase their plasticity when the reward is lower than what was
expected. In a pioneering demonstration of this idea Soltoggio, Bullinaria, Mattiussi, et al.
(2008) used an approach similar to NEAT, in which structural mutations during evolution
could not only insert and delete standard hidden nodes but also neuromodulatory nodes.
In contrast to standard neural networks, in which each node has the same type of effect on
all the nodes it is connected to, in a neuromodulated network each node $i$ calculates both
a standard activation $a_i$ and a modulatory activation $m_i$ as follows:
$$a_i = \sum_{j \in \mathrm{Std}} w_{ij}\, o_j, \qquad (12.6)$$
$$m_i = \sum_{j \in \mathrm{Mod}} w_{ij}\, o_j, \qquad (12.7)$$
where $w_{ij}$ is the strength of the connection between node $i$ and $j$, and $o_j$ is the output of
neuron $j$, calculated from its standard activation as $o_j(a_j) = \tanh(a_j/2)$. In contrast to how
pure Hebbian plasticity was modeled as $\delta_{ji} = \eta[A o_j o_i + B o_j + C o_i + D]$, we are now
making the weight change also dependent on the calculated modulatory activation $m_i$:
$\Delta w_{ji} = \tanh(m_i/2)\, \delta_{ji}$.
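As a rough illustration (not the exact formulation of Soltoggio et al.), the sketch below computes a neuron's standard and modulatory activations and gates its Hebbian update by the modulatory signal; for simplicity it assumes the same presynaptic outputs feed both connection types, and all coefficient names are illustrative.

import numpy as np

def neuromodulated_update(w_std, w_mod, o, eta, A, B, C, D):
    # w_std: weights of standard connections into neuron i
    # w_mod: weights of modulatory connections into neuron i
    # o:     outputs o_j of the presynaptic neurons (assumed shared by both types)
    a_i = np.dot(w_std, o)            # standard activation (eq. 12.6)
    m_i = np.dot(w_mod, o)            # modulatory activation (eq. 12.7)
    o_i = np.tanh(a_i / 2.0)          # neuron output
    # Plain Hebbian term for each incoming standard connection ...
    delta = eta * (A * o * o_i + B * o + C * o_i + D)
    # ... gated by the modulatory signal: learning switches on and off with m_i.
    return w_std + np.tanh(m_i / 2.0) * delta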
Incorporating neuromodulation has been shown to provide advantages in tasks that
require selectively switching plasticity on and off at critical moments during an agent's
lifetime (Soltoggio, Dürr, Mattiussi, et al., 2007). One such task requires a simulated
Figure 12.8: Neural activity and weights during the simulated bee’s lifetime. The top graph
shows the intensity of the signal generated by the single modulatory neuron. The middle graph
represents the amount of reward received upon landing, while the bottom graph tracks the synaptic
weights of color inputs to the output neuron, which determine the bee’s preference for a specific
flower color. Notably, the modulatory signal remains low during flight but increases significantly
upon landing, facilitating a more rapid update of synaptic weights at that critical moment. Figure
from Soltoggio, Dürr, Mattiussi, et al. (2007).
3D bee to forage in an environment where flowers of two colors, blue and yellow, offer
varying amounts of nectar. The reward provided by these flowers is determined by either
deterministic or probabilistic rules, creating a dynamic and uncertain environment. The
bees need to learn to associate flower colors with higher nectar rewards and adapt their
strategy as these reward contingencies shift over time. This setup required the bees to
demonstrate adaptive decision-making in response to environmental variability.
In this task, the evolved modulatory networks clearly outperformed both fixed-weight
and traditional Hebbian plasticity networks. The evolved bee agents demonstrated
remarkable behavioral adaptability throughout their simulated lifetimes. They were able to
quickly adjust their preferences when the color associated with high reward was reversed.
This rapid re-learning reflects the emergence of effective dynamic learning strategies
within their neuromodulatory neural networks. Furthermore, these agents exhibited the
capacity to estimate long-term reward expectations even in environments where rewards
were delivered probabilistically. Rather than relying on immediate reinforcement, they
aggregated historical reward outcomes to refine their behavior, a trait closely aligned with
biological foraging strategies.
Beyond the environments used during evolution, the most successful neurocontrollers
also generalized well to an entirely new and more complex situation where both flower
types offered the same average reward but with different probabilities. Despite never
encountering this scenario during training, these controllers adapted effectively, learning
which flower yielded better long-term gains. This result demonstrates a significant degree
of generalization and supports the idea that evolved neuromodulatory topologies are
capable of developing not just task-specific behavior, but generalizable learning strategies
applicable to novel situations.
How did the evolved neuromodulated networks solve this task? Figure 12.8 provides
insights into the neural dynamics of the system. At the moment of landing, the modulatory
signal reaches its peak, triggering the network to update synaptic weights effectively. During
flight, the modulation level remains low, enabling a gradual decay of synaptic weights,
which mirrors the diminishing expectation of a reward in its absence. Interestingly, there are
moments when neuromodulation drops entirely to zero, particularly when the bee perceives
the grey color outside the flower field. Since these areas consistently yield no rewards
and are unaffected by changes in contingencies, synaptic plasticity, and consequently
learning, is deactivated. These results demonstrate that the evolved neuromodulatory
network activates learning only when environmental conditions necessitate adaptation.
In conclusion, neuromodulation can play a critical role by acting as a regulatory
mechanism for synaptic plasticity. It enabled the system to "switch on" learning during
critical events, such as when the bees landed on a flower and received a reward signal, and
"switch off" learning in predictable or irrelevant situations, such as when flying over areas
without flowers. This dynamic control of plasticity allowed the artificial bees to learn
when necessary and maintain stability when no learning was required. We'll return to the
evolutionary advantages of neuromodulation in section 14.3, where we go into more detail
on what neuroevolution can tell us about biological evolution.
12.3.4 Indirectly Encoded Plasticity
A challenge with the previously mentioned approaches to encode plasticity is that the local
learning rules for every synapse in the network must be discovered separately by evolution.
However, similar to how connectivity patterns in the brain follow certain regularities, the
distribution of plasticity rules across a neural network likely would benefit from such
regularities as well.
It turns out that the HyperNEAT approach we introduced in section 4.3.3 to indirectly
encode weight patterns can be generalized to also indirectly encode the plasticity of a
network. As in the brain, different regions of the ANN should be more or less plastic and
employ different learning rules, which HyperNEAT allows because it sees the geometry
of the ANN. The main idea behind this approach, which is called adaptive HyperNEAT
(Risi and Stanley, 2010), is that CPPNs in HyperNEAT can not only encode connectivity
patterns but also patterns of plasticity rules.
A straightforward way to enable HyperNEAT to indirectly encode a plastic network is
to augment the CPPN to not only produce each connection's weight, but also additional
connection-specific parameters such as learning rate $\eta$, correlation term $A$, presynaptic
factor $B$, and postsynaptic factor $C$. When a policy network is initially decoded, it stores
these parameters and the connection weights for each synapse and then updates the weight
during its lifetime following this simplified version of the generalized Hebbian learning
rules:
$$\Delta w_{ij} = \eta \cdot \left( A\, o_i o_j + B\, o_i + C\, o_j \right). \qquad (12.8)$$
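A minimal sketch of this decoding step is given below, with a hypothetical cppn callable standing in for the evolved CPPN: each connection between two substrate coordinates is assigned a weight plus its own Hebbian coefficients, which are then reused at every lifetime step. The signature and dictionary layout are assumptions for illustration only.

def decode_plastic_network(cppn, coords):
    # Query the CPPN once per connection to obtain its weight and plasticity parameters.
    # cppn(x1, y1, x2, y2) is assumed to return (weight, eta, A, B, C).
    connections = {}
    for i, (x1, y1) in enumerate(coords):
        for j, (x2, y2) in enumerate(coords):
            if i == j:
                continue
            weight, eta, A, B, C = cppn(x1, y1, x2, y2)
            connections[(i, j)] = {"w": weight, "eta": eta, "A": A, "B": B, "C": C}
    return connections

def plastic_step(conn, o_i, o_j):
    # Lifetime update of a single connection following eq. (12.8).
    conn["w"] += conn["eta"] * (conn["A"] * o_i * o_j + conn["B"] * o_i + conn["C"] * o_j)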
This approach was able to solve a simple T-Maze task, demonstrating that HyperNEAT
is, in fact, able to distribute plasticity coefficients in a geometric manner. However,
Figure 12.9: Adaptive HyperNEAT. In adaptive HyperNEAT, the CPPN is queried each time
step, given the location of nodes but also the current weight of the connection and the activity of
the pre- and postsynaptic neurons. This way, each connection in the network can learn arbitrary
learning rules that can be geometrically encoded by the CPPN. Figures from Soltoggio, Stanley,
and Risi (2018).
adaptive HyperNEAT is clearly overkill for such simple domains, and we have seen
simpler approaches, such as directly-encoded Hebbian learning or LSTMs (section 6.3.2),
being able to do the same. However, things become a bit more interesting if we not only
allow adaptive HyperNEAT to encode these learning rule coefficients but enable it to
evolve completely new learning rules itself. This more general adaptive HyperNEAT
model augments the four-dimensional CPPN that normally encodes connectivity patterns
with three additional inputs: presynaptic activity $o_i$, postsynaptic activity $o_j$, and the
current connection weight $w_{ij}$. That way, the synaptic plasticity of a connection between
two two-dimensional points $(x_1, y_1)$ and $(x_2, y_2)$ can be described by:
$$\Delta w_{ij} = CPPN(x_1, y_1, x_2, y_2, o_i, o_j, w_{ij}). \qquad (12.9)$$
Instead of only being queried at the beginning of an episode, here the CPPN is queried at
every timestep to update the weights of the neural network. The same CPPN that decides
on the initial weights and network connectivity is now also responsible for how to change
the network, taking into account both the location and activity of the network’s neurons.
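The per-timestep variant folds the learning rule into the CPPN itself. The sketch below shows one lifetime step under that scheme; the cppn signature and the connection dictionary are assumed for illustration, not taken from the original implementation.

def adaptive_step(cppn, connections, activities):
    # Update every connection weight by querying the CPPN each timestep (eq. 12.9).
    # connections maps (i, j) -> {"w": weight, "pos_i": (x1, y1), "pos_j": (x2, y2)};
    # activities maps node index -> current output o.
    for (i, j), conn in connections.items():
        x1, y1 = conn["pos_i"]
        x2, y2 = conn["pos_j"]
        # The same CPPN that produced the initial weights now outputs the weight change,
        # conditioned on node locations, pre-/postsynaptic activity, and the current weight.
        conn["w"] += cppn(x1, y1, x2, y2, activities[i], activities[j], conn["w"])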
A simple, yet effective domain to test the effectiveness of this method is a variation
of the T-Maze domain with a nonlinear reward encoding. That is, in this domain the
agent received a high reward for reward items with "color" input values 0.3 and 1.0 but a low
reward for 0.1 and 0.8. Because the agent was given a network with no hidden nodes
(which is not able to learn this nonlinearity), evolution needed to discover a CPPN that
instead encodes the appropriate nonlinear learning rules. And indeed, this more general
adaptive HyperNEAT version was able to solve the task while a normal Hebbian network
and the simpler adaptive HyperNEAT (which outputs the Hebbian learning coefficients)
failed. Interestingly, in this domain the discovered learning rules smoothly change with
the location of the presynaptic node, as shown in figure 12.9, suggesting that the substrate
geometry gives a useful task bias.
Adaptive HyperNEAT can also be combined with the evolvable substrate approach
(section 4.3.5) to relieve the experimenter of the need to decide on the number of hidden nodes.
For the first time, this unified approach, called adaptive evolvable-substrate HyperNEAT
(Risi and Stanley, 2012a), was able to fully determine the geometry, density, and plasticity
of an evolving neuromodulated ANN. Although the tasks to which these methods have
been applied so far are relatively simple, they still serve an important purpose. They
demonstrate the CPPN’s ability to learn arbitrary learning rules that enable an agent to
quickly adapt to changes in its environment. The idea of learning to learn has since
become a larger focus of the wider machine learning community, but the groundwork
was laid by many neuroevolution methods. Scaling this approach up to work with larger
networks and for more complex tasks is an exciting future research direction.
As mentioned earlier in the book (chapter 4), in traditional indirect encodings like
HyperNEAT and adaptive HyperNEAT, you start compressed: you assume from the
beginning that the network structure or weights can be generated by a compact underlying
pattern (e.g. a small CPPN). This design constrains expressivity from the start, relying on
the hope that the compact representation will be powerful enough to capture all needed
variations.
It is an interesting question whether we can build an indirect encoding that starts the
other way around, i.e. maximally expressive and then gradually compressing itself. One
such approach is called evolve & merge (Pedersen and Risi, 2021). In this approach, each
synapse in the network is assigned a unique, parameterized local learning rule based on the
generalized Hebbian ABCD rule (section 12.3.1). Using ES, the population of networks
is first optimized for performance on a task. The novel idea in evolve & merge is that after
a predefined number of generations, K-Means clustering is employed to merge similar
learning rules. Each group of similar rules is replaced by a cluster center, effectively
reducing the number of unique rules while maintaining learned behaviors. The evolution
process continues with the reduced rule set, and the merge-evolve cycle repeats until a
target number of generations is reached.
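As a rough sketch of the merge step (assuming scikit-learn and an illustrative array layout, not the authors' code), each synapse's ABCD rule is treated as a vector of coefficients, and similar rules are collapsed onto their cluster centers:

import numpy as np
from sklearn.cluster import KMeans

def merge_learning_rules(rule_params, n_rules):
    # Cluster per-synapse rule vectors and replace each by its cluster center.
    # rule_params: (n_synapses, 5) array of [A, B, C, D, eta] per synapse.
    kmeans = KMeans(n_clusters=n_rules, n_init=10).fit(rule_params)
    shared_rules = kmeans.cluster_centers_   # the reduced rule set
    assignment = kmeans.labels_              # synapse -> index of its shared rule
    return shared_rules, assignment

# Illustrative usage: 61,440 synapse-specific rules compressed to 32 shared rules.
rules = np.random.randn(61_440, 5)
shared, idx = merge_learning_rules(rules, n_rules=32)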
Applied to a quadrupedal locomotion task, evolve & merge achieved impressive
compression, reducing the number of trainable parameters by over 96% without sacrificing,
and often enhancing, performance on unseen morphology variations. Plastic networks
evolved with this approach outperformed static networks in terms of robustness, even when
static networks were optimized with noisy inputs to encourage generalization. While static
networks achieved higher performance in the original, unperturbed environment, plastic
networks displayed far greater resilience under change. Interestingly, robustness improved
as the number of learning rules decreased, validating the hypothesis that a compact set of
adaptive rules promotes generalization. This observation aligns closely with the genomic
bottleneck hypothesis (Zador, 2019), which suggests that biological systems, by encoding
a limited number of developmental rules, achieve robust and generalizable behavior across
a wide range of conditions.
The evolve & merge framework extends the philosophy of indirect encoding to the
evolution of learning itself. Unlike classical indirect methods that impose compression
at initialization, this approach allows rich expressivity early in evolution and gradually
sculpts it into a compact form through environmental feedback and evolutionary pressure.
The finding that starting with a large rule set and pruning it leads to superior generalization
draws parallels to the lottery ticket hypothesis in deep learning (Frankle and Carbin, 2019).
This hypothesis proposes that within a large, randomly initialized neural network, there
exist small subnetworks (i.e. "winning tickets") that, when trained in isolation, can match
or even exceed the performance of the full network. In both the case of the lottery ticket
hypothesis and evolve & merge, an initially large parameter space increases the chance of
finding high-performing solutions.
12.3.5 Learning to Continually Learn through Networks with External Memory
A major challenge in AI in general, and in evolving plastic neural networks in particular,
is continual learning. That is, learning new tasks or knowledge without forgetting what
was previously learned. Most current neural networks struggle with this and suffer from a
symptom called catastrophic forgetting, where they can learn a new task but forget the
tasks they learned previously.
A promising approach to overcome this challenge is memory-augmented neural
networks, which are neural architectures in which the circuit for control and the mechanism
for adaptation are separated by design. In addition to learning through changes in
connection strength or activations (such as in LSTMs), modeling memory directly offers
another way for agents to adapt and remember. One realization of this type of memory-
augmented neural network is the neural Turing machine (NTM; Graves, Wayne, and
Danihelka, 2014). The NTM combines traditional neural networks with the concept of a
Turing machine, enhancing the capability of neural networks by giving them the ability
to read from and write to an external memory module. This fusion allows the NTM to
not only process data through its neural network structure but also store and retrieve data,
enabling it to perform tasks that require memory. Just like LSTMs, NTMs are designed to
handle long-range dependencies in data. In section 2.3.4, we saw that LSTMs achieve
this through their gating mechanisms that regulate the flow of information, allowing
the network to maintain or forget information over long intervals. Similarly, NTMs can
maintain data over long periods using their external memory bank, albeit in a more explicit
and controllable manner.
An overview of the basic NTM architecture is shown in figure 12.10. At the heart
of an NTM is a neural network that acts as the controller. This controller operates like
any other neural network, processing task inputs and generating outputs. However, unlike
standard neural networks, it also interacts with an external memory bank through read and
write heads, directing the read and write operations. The primary advantage of NTMs is
their ability to perform tasks that require complex manipulation of data sequences or the
execution of algorithms that conventional neural networks struggle with. This includes
problems like sorting lists, simple arithmetic, or even executing simple programs.
The original NTM was designed to be completely differentiable, including the read and
write mechanisms. This means the NTM can be trained end-to-end using backpropagation,
similar to conventional neural networks. However, this differentiable architecture comes at
the cost of having to access the entire memory content at each step, making this approach
inefficient for larger memory banks. It also limits the setup to a fixed memory size.
Additionally, because the attention is "soft", small errors can accumulate, so the
approach does not always generalize perfectly, e.g. to copying long sequences.
An exciting direction is to train the NTM instead through neuroevolution, which not
Figure 12.10: Neural Turing machine. In a Neural Turing machine (NTM), a neural network (the
controller) is augmented with an external memory component that it can learn to read from and
write to through dedicated read and write heads. The external memory allows the network to store
information over many time steps and use it to learn algorithms such as copy, sort, or associative
recall. Figures from Graves, Wayne, and Danihelka (2014).
only allows hard attention and potentially better generalization, but the approach can also
be directly applied to reinforcement learning-like problems that do not require input-output
examples. The evolvable NTM enables exactly this, optimizing both the NTM architecture
and its weights with NEAT (Greve, Jacobsen, and Risi, 2016). Because it is trained
through evolution, this model features a theoretically unlimited memory capacity.
The particular evolvable NTM version we review here operates with a single, unified
head for both reading and writing (figure 12.11a). Beyond the standard inputs and outputs
). Beyond the standard inputs and outputs
needed to interface with the external environment, the network has inputs and outputs
that match the vector size of a memory entry. Additional outputs are used for selective
read/write operations, adjusting the active memory position, and employing content-based
addressing. In more detail, the evolvable NTM executes four primary operations:
1.
Write: A write interpolation output dictates the blending of the current memory
vector at the head's location with a new write vector. This is calculated as follows:
$$M_{t+1}(h) = M_t(h) \cdot (1 - w_t) + a_t \cdot w_t, \qquad (12.10)$$
where $M_t(h)$ represents the memory vector at the head's location $h$ at time $t$, $w_t$ is
the write interpolation weight, and $a_t$ is the write vector.
2.
Content Jump: If the neural network output for content jump exceeds a certain
threshold (e.g. 0.5), the head jumps to a position on the memory tape most akin to
the write vector, determined by a Euclidean distance metric in this implementation.
3.
Shift: This network output can shift the read head either to the left or right from its
current position or maintain the position based on the highest activated shift output
among the three provided.
4.
Read: Following any content jumps and shifts, the content of the memory vector at
the final location of the head is automatically fed into the neural network at the start
of the next cycle.
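The following is a compact sketch of these four operations (illustrative only; in the actual system the control signals come from an evolved NEAT controller and the tape grows as needed):

import numpy as np

class EvolvableNTMMemory:
    # Minimal sketch of the external memory used by the evolvable NTM.
    def __init__(self, vector_size):
        self.tape = [np.zeros(vector_size)]   # theoretically unbounded memory tape
        self.head = 0

    def step(self, write_vec, write_interp, jump_signal, shift_outputs):
        # 1. Write: blend the current memory vector with the write vector (eq. 12.10).
        m = self.tape[self.head]
        self.tape[self.head] = m * (1 - write_interp) + write_vec * write_interp
        # 2. Content jump: move the head to the most similar location if signal > 0.5.
        if jump_signal > 0.5:
            dists = [np.linalg.norm(v - write_vec) for v in self.tape]
            self.head = int(np.argmin(dists))
        # 3. Shift: move left, stay, or move right based on the strongest shift output.
        self.head += [-1, 0, 1][int(np.argmax(shift_outputs))]
        if self.head < 0:
            self.head = 0
        if self.head >= len(self.tape):
            self.tape.append(np.zeros(len(write_vec)))   # grow the tape as needed
        # 4. Read: the vector at the final head position is fed back to the controller.
        return self.tape[self.head]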
A good domain to compare the evolutionary NTM with the original backprop-trained
NTM is the copy task. In this task, the neural network must memorize and retrieve a
lengthy sequence of random binary vectors. The network receives an initial bit indicating
the start of the task, followed by a sequence of random binary vectors, and then a delimiter
bit that marks the beginning of the recall phase.
The comparison highlights one of the many advantages of neuroevolution. Since
NEAT begins with basic networks and progressively introduces nodes and connections,
it was able to find a sparsely connected champion network that utilizes just a single
hidden neuron. This evolved network is significantly smaller in size compared to the
original NTM, which features full connectivity, 100 hidden neurons, and a total of 17,162
parameters. Additionally, and in contrast to the original NTM, the evolved networks
generalized perfectly to long sequences.
Another benefit of having an external memory is that it can help in tasks requiring
continual learning. While it can be difficult to learn new information in an LSTM
or Hebbian network during the lifetime of the agent without catastrophic forgetting of
previous information, it is straightforward to tackle this challenge with an expanding
external memory (where new information can be put in an unused location in memory). A
task to test the evolvable NTM for continual learning is the season task (Ellefsen, Mouret,
and Clune, 2015), in which the agent must learn to identify and remember which food
items are nutritious and which are poisonous across different seasons, with the challenge
increasing as the food items and their properties change from one season to another.
The task tests the agent's ability to withstand catastrophic forgetting and to learn new
associations while retaining old ones.
The evolvable NTM was further modified to facilitate continual learning (Lüders,
Schläger, and Risi,
2016). First, a default memory location was initialized with a fixed
vector serving as a fallback when no existing memory meets a similarity threshold during
a content jump; once used, a new default was added at the end of the tape, helping prevent
overwriting past associations. Second, to further support the preservation of existing
memories, content jumps now only occurred if similarity exceeded a threshold; otherwise,
the default jump was used.
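A rough sketch of this modified jump logic follows; the distance-based similarity check and variable names are assumptions made here for illustration, not the exact implementation.

import numpy as np

def content_jump_with_default(tape, write_vec, default_idx, threshold):
    # Continual-learning content jump: only jump to an existing memory location if it
    # is similar enough to the write vector; otherwise fall back to the default (unused)
    # location, which protects previously stored associations from being overwritten.
    distances = [np.linalg.norm(v - write_vec) for v in tape]
    best = int(np.argmin(distances))
    if distances[best] <= threshold and best != default_idx:
        return best, default_idx, tape
    # The default location is used; allocate a fresh default at the end of the tape.
    tape = tape + [np.zeros_like(write_vec)]
    return default_idx, len(tape) - 1, tape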
With these modifications in place, NEAT was indeed able to find an NTM that
can learn new associations in a single trial without forgetting previously learned ones
(figure 12.11b). Impressively, it was able to generalize almost perfectly to sequences
it had never encountered before. Which type of solution did evolution discover? The
network stores information about the food items in four memory locations, two for each
season (figure 12.11c). Initially, the agent ignores all food items. However, after being
penalized for neglecting nutritious items, it begins to remember the ones it missed and
must consume in the future. Each nutritious item is stored in a separate memory location,
resulting in the use of all four locations. This memorization process is achieved by linking
the punishment input to the write interpolation output.
In summary, networks with an external memory offer an intriguing complementary
Figure 12.11: Evolvable Neural Turing Machine. (a) The evolvable NTM is characterized by a
hard attention mechanism and a theoretically infinite memory tape. (b) The NTM discovered by
NEAT is able to learn new associations in one shot without forgetting previously learned ones. In
this manner, evolved networks with an external memory show promising performance for tasks
requiring continual learning. (c) Days 3 and 4 of Season 1, as well as all days beyond Day 2 in
Season 2, are not displayed but are completed flawlessly. Legend: E-I: ANN output indicating
whether the food item should be consumed. E-O: ANN inputs from the environment: summer
item (1–4), winter item (5–8), reward (9), punishment (10). E-S: Score indicator. TM-W: Write
vector. TM-I: Write interpolation. TM-C: Content of the tape at the current head position after
writing. E-J: Content jump input. TM-S: The three shift values in descending order: left, none,
right. TM-R: Read vector. TM-H: Current head position after control operations. Figures from
Lüders, Schläger, and Risi (2016).
approach to learning that is not based on modifying activations (e.g. LSTMs, RNNs) or
weights (e.g. Hebbian learning). However, which approach (or which combination of
approaches) is best and for which type of problems is an important open research question.
12.4 Integrating Evolution, Learning, and Embodiment
While general-purpose RL algorithms are, in principle, capable of solving a wide range of
tasks, they typically require vast amounts of data and interactions to do so. In contrast,
we have seen in this chapter that evolution can be used to "learn to learn" by discovering
mechanisms that allow neural networks to adapt more efficiently to specific distributions
of tasks. This advance holds particular promise for real-world applications, such as robot
locomotion under various circumstances not encountered during training (section 12.3.2).
In this section, we review some of the major open questions and key challenges in
approaches that aim to combine the previously explored themes of evolution, learning,
and embodiment.
Balancing Generality and Adaptation: How can we evolve plastic neural networks
that are capable of truly learning new tasks during their lifetimes? While current systems
have demonstrated impressive adaptability, such as transferring from simulation to physical
environments, they have yet to be conclusively tested on entirely novel task distributions.
This raises a fundamental tension between generality and specialization: how broad
should the capabilities of a learning system be, and how quickly should it adapt? A highly
specialized learner might adapt quickly to a narrow range of environments but fail to
generalize. Conversely, a general learner might be slower to adapt but more robust across
tasks. The optimal solution likely lies in discovering mechanisms that allow both fast
adaptation and wide generalization, mirroring the kind of flexible intelligence observed in
biological brains.
One unresolved question is the "correct" way to implement plasticity in artificial
neural networks. A promising direction is to explore systems that combine multiple
mechanisms (local learning rules, memory, structural plasticity) in a coordinated manner.
Neuroevolution is uniquely suited to discover such synergies, especially when indirect
encodings are used to represent both network structure and plasticity rules.
The Deceptive Trap of Learning to Learn: Even if a system contains all the
necessary ingredients for learning, there is no guarantee that evolution will discover the
optimal configuration. A key challenge in evolving cognitive behaviors is deception in the
fitness landscape. Evolutionary processes can become trapped in local optima, especially
when early-stage solutions provide some success without requiring genuine adaptation.
This observation is a well-known issue in meta-learning settings: simple heuristics can
outperform more complex, adaptive solutions in the short term, diverting evolutionary
trajectories away from the more promising long-term strategies. More open-ended search
strategies, such as novelty search, have proven effective in overcoming such deception
(Risi, Hughes, and Stanley,
2010). By explicitly rewarding behavioral diversity, these
approaches help maintain exploration pressure and uncover more sophisticated adaptive
behaviors. For instance, we have seen in section 6.3.2 that novelty search has shown
promise in evolving agents with both memory and lifetime learning capabilities.
However, as we seek to combine more mechanisms, the search space becomes
increasingly complex and deceptive. Tackling this will require not only better optimization
methods but also a deeper understanding of how these components interact during both
evolution and learning.
Indirectly Encoding Plasticity and Generalization: Evolutionary algorithms with
indirect encodings excel at solving regular problems because they reuse genetic information
to generate structured, regular phenotypes. However, this reliance on regularity can be
a double-edged sword: while regular neural structures can generalize well, they can
also make fine-tuning specific connections more challenging. This trade-off can pose a
challenge for solving more complex problems.
To address this trade-off, a promising solution emerges from biology: the combination
of developmental encodings with lifetime learning mechanisms like synaptic plasticity.
Developmental encodings bias evolution toward producing regular, scalable networks,
while plasticity enables those networks to adapt to unique, context-dependent details
during their lifetimes. This "genomic bottleneck" has been hypothesized to facilitate
generalization, as it is a strong regularizer for architectures and learning rules that generalize
well (Zador, 2019). Empirical findings support this synergy: networks generated by more
regular encodings (Pedersen and Risi, 2021; Tonelli and Mouret, 2013) tend to exhibit
better general learning abilities when plasticity is introduced. These results suggest that
combining indirect encodings for efficient structural generalization with reinforcement
learning or plasticity for fine-grained adaptation can yield artificial systems that are both
robust and flexible, mirroring the dual strategy used by animal brains to balance inherited
Figure 12.12: Overview of the DERL approach. DERL generates embodied agents through
the interaction of two adaptive processes. The outer loop performs evolutionary search over
morphologies, applying structural mutations, such as limb addition or modification, illustrated in
(b), to iteratively refine the agent's physical form. In parallel, the inner loop uses reinforcement
learning to train a neural controller from scratch for each morphology (c). A range of example
morphologies generated within the UNIMAL design space, a modular and expressive representation
for articulated agents, is shown in (d). The environments in which these agents evolve vary in
complexity; (e) shows the variable terrain setting, composed of stochastically generated obstacles
including hills, steps, and rubble. In the most complex scenario, manipulation in variable
terrain, agents must not only traverse the terrain, but also manipulate an object from a randomly
assigned starting location (green sphere) to a designated goal (red square), requiring coordinated
locomotion and interaction with the environment. Figure from Gupta, Savarese, Ganguli, et al.
(2021). Video at https://neuroevolutionbook.com/demos.
structure with lifelong adaptability.
Future research should focus on understanding how to best encode plasticity within
indirect frameworks and how to harness the synergy between genetic regularity and
lifetime learning. This combination could be the key to unlocking the full potential of
indirect and developmental encodings.
Embodiment and Morphological Evolution: An exciting avenue for future research
lies in the evolution of embodied agents, i.e. systems where learning mechanisms, neural
architectures, and physical morphologies co-evolve. In terms of learning and physical
morphology, one approach that takes a step in this direction is the deep evolutionary
reinforcement learning (DERL) framework (Gupta, Savarese, Ganguli, et al., 2021). DERL
combines an outer evolutionary loop that searches over robot morphologies with an inner
loop of reinforcement learning that trains control policies within each agent's lifetime.
While this combination does not use neuroevolution per se (i.e. the network weights are
trained with reinforcement learning), it shows the synergistic effects of combining these
methods.
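A high-level sketch of this dual loop is shown below; names such as train_rl_policy, evaluate, and mutate_morphology are placeholders for the inner RL training, the fitness evaluation, and the UNIMAL structural mutations, not the DERL API.

import random

def train_rl_policy(morphology):
    # Placeholder for the inner loop: an RL algorithm (e.g. PPO) trained from scratch.
    return {"controller_for": morphology}

def evaluate(policy):
    # Placeholder fitness: task reward achieved after lifetime learning.
    return random.random()

def mutate_morphology(morphology):
    # Placeholder structural mutation, e.g. adding or modifying a limb.
    return morphology + ("new_limb",)

def derl(initial_morphologies, n_generations):
    population = list(initial_morphologies)
    for _ in range(n_generations):
        # Inner loop: learn a controller for every body, then score it.
        scored = [(evaluate(train_rl_policy(m)), m) for m in population]
        scored.sort(key=lambda x: x[0], reverse=True)
        # Outer loop: keep the better half of the bodies and mutate them.
        parents = [m for _, m in scored[: max(1, len(scored) // 2)]]
        children = [mutate_morphology(random.choice(parents))
                    for _ in range(len(population) - len(parents))]
        population = parents + children
    return population

# Illustrative usage with tuple-encoded morphologies.
final_population = derl([("torso",), ("torso", "limb")], n_generations=10)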
As outlined in figure 12.12a, this dual loop allows agents not only to evolve structurally
through mutation and selection, but also to learn sensorimotor skills from scratch
using standard reinforcement learning methods (figure 12.12c). The design space for
morphologies, UNIMAL (figure 12.12d), is expressive enough to allow for highly varied
and articulated body plans, while remaining tractable enough for large-scale search.
What makes DERL interesting is how it reveals deep connections between environ-
mental complexity, morphological evolution, and the learnability of control. As agents
evolve in more challenging environments (figure 12.12e), their bodies adapt in ways
that inherently support more general learning. Even when transferred to novel tasks,
these morphologies outperform others evolved in simpler settings. Moreover, a strong
morphological Baldwin effect emerges: evolution consistently selects for bodies that
make learning easier. An exciting next step is to evolve not just morphologies, but also
the neural architectures and initial weights of these controllers using neuroevolutionary
methods. Such integration promises even faster and more robust lifetime learning. As
part of chapter
14 on what neuroevolution can tell us about biological evolution, we’ll
return to the evolution of virtual creatures and what their morphological constraints mean
for evolution (section 14.5).
In conclusion, the integration of evolution, learning, plasticity, and embodiment
represents one of the most exciting frontiers in artificial intelligence. This research not
only promises more efficient and adaptive agents but also offers a unique window into the
evolution of natural intelligence, which we will explore more deeply in chapter 14. For
now, we will turn our attention to another method that can be effectively combined with
neuroevolution: generative AI.
12.5 Chapter Review Questions
1.
Reinforcement Learning vs. Neuroevolution: What are the key strengths
and weaknesses of reinforcement learning and neuroevolution when applied to
optimization tasks? How do their approaches differ in handling sparse rewards and
high-dimensional spaces?
2.
Evolutionary Reinforcement Learning (ERL): How does ERL combine evolution-
ary algorithms and deep reinforcement learning? What are the specific advantages
of integrating these methods in tasks with sparse rewards?
3.
Replay Buffer in ERL: What is the role of the replay buffer in ERL? How does it
enable the algorithm to learn within episodes, unlike standard neuroevolution?
4.
NEAT+Q Approach: How does the NEAT+Q algorithm integrate neuroevolution
(via NEAT) with Q-learning? What are the advantages of this approach for evolving
neural architectures in reinforcement learning tasks?
5.
Meta-Learning with Evolutionary Methods: How does evolutionary meta-
learning differ from traditional reinforcement learning? How does it exploit the
Baldwin effect to enable few-shot learning across diverse task distributions?
6.
ES-MAML: What makes ES-MAML particularly well-suited for meta-learning in
noisy environments? How does it differ conceptually and computationally from
gradient-based meta-learning methods like MAML?
7.
Evolving Networks to Reinforcement Learn: What are the advantages of evolving
neural networks capable of intrinsic reinforcement learning? How does this approach
address the challenges of non-stationary rewards and environmental changes?
8.
Hebbian Learning Rules: How does the evolution of Hebbian learning rules
enable neural networks to adapt during their lifetimes? What are some limitations
of using simple Hebbian mechanisms for complex tasks?
9.
Neuromodulation in Evolved Networks: How does incorporating neuromodu-
lation into evolved networks enhance their ability to learn and adapt? Why is
neuromodulation particularly effective in tasks requiring memory and adaptation?
10.
Evolvable Neural Turing Machines: What distinguishes the architecture of the
evolvable NTM from that of traditional neural networks? How does it interact
with its external memory, and how does this form of memory usage compare to
learning via internal activations in models like LSTMs or through weight updates in
approaches such as Hebbian learning?
Chapter 13
Synergies with Generative AI
Generative AI, exemplified by breakthroughs like large language models, has redefined
our ability to synthesize knowledge, create diverse content, and solve problems requiring
creativity. This paradigm includes a broad family of models such as generative adversarial
networks (GANs; Goodfellow, Pouget-Abadie, Mirza, et al.,
2020) for high-fidelity image
synthesis, autoencoders (Hinton and Salakhutdinov,
2006; Kingma and Welling, 2014) for
representation learning and reconstruction, diffusion models (Ho, A. Jain, and Abbeel,
2020; Sohl-Dickstein, E. Weiss, Maheswaranathan, et al., 2015) for producing complex,
realistic samples through iterative refinement, and large language models (LLMs; Hadi,
Al Tashi, Qureshi, et al., 2025; Min, Ross, Sulem, et al., 2024) for text generation and
reasoning. While generative AI thrives in producing new ideas and solutions, it often
benefits from robust frameworks for exploration and optimization, which are areas where
neuroevolution excels. This chapter examines how these two fields can complement each
other in a bi-directional fashion. Evolutionary algorithms can expand the potential of
generative AI by evolving architectures, fine-tuning parameters, and fostering diversity
in outputs. At the same time, generative AI can enhance evolutionary computing by
generating creative solutions, identifying optimal configurations, and producing complex
evolutionary outcomes. Before we take a closer look at these synergies, let’s review some
relevant background information on LLMs.
13.1 Background on Large Language Models
Large language models (LLMs) are characterized by their vast scale and capacity to process
and generate human-like text, making them powerful tools for a variety of language-based
tasks. There are many such models, including GPT (Achiam et al., 2023; OpenAI, 2025),
Gemini (Anil et al., 2025; Gemini Team, 2025), Llama (Grattafiori et al., 2024; Touvron
et al., 2023), Claude (Anthropic, 2025a; Anthropic, 2025b), Qwen (Bai et al., 2023;
A. Yang et al., 2025), Mistral (A. Q. Jiang et al., 2023;
Mistral AI, 2024), and DeepSeek (D. Guo et al., 2025; A. Liu et al., 2024). Some of
these are closed and accessible through a paid interface only, and others are open; some
are general chatbots, others include sophisticated reasoning abilities and tool use such as
web access; many of them are actually combinations of multiple models with different
specialties.
The backbone of all of these LLMs is the transformer architecture (Vaswani, Shazeer,
Parmar, et al., 2017), which employs a self-attention mechanism allowing the model to
consider the importance of all other words in a sentence, regardless of their positional
distance from the word being processed. Unlike models that rely on recurrent layers, the
transformer architecture allows for parallel processing of data, increasing efficiency and
scalability when managing the large datasets essential for training LLMs. Self-attention
was described in more detail in section 4.4.
LLMs undergo extensive pre-training on large text corpora, learning to predict the
next token in a sequence. Beyond the massive data ingestion, researchers also fine-tune
various aspects such as the ratio of different data types in the training set, the learning rate,
and other training parameters to optimize performance.
The performance of LLMs adheres to what is called scaling laws (Kaplan, McCandlish,
Henighan, et al., 2020). These laws demonstrate that model performance improves
predictably, following power laws, with increases in size, data volume, and computational power. Large-scale
data not only aids in training more accurate models but also ensures a broader linguistic
coverage, allowing the models to generalize better across various tasks. The need for so
much data shows why scaling laws matter; they help us predict how well LLMs will work
as they get bigger.
However, despite their extensive pre-training, LLMs in their raw form are not
fully equipped to handle specialized tasks directly. The transition from a general
linguistic understanding to specific real-world applications requires significant post-
training optimization. This phase involves fine-tuning the model on task-specific datasets,
which refines its responses according to particular needs. Additionally, the use of prompt
engineering enhances how models interpret and respond to queries, making them more
effective and adaptable. These adjustments are key to shaping LLMs for specific uses,
from everyday chatbots to more complex, domain-focused tasks.
While the current trend predominantly focuses on constructing larger models trained
on increasingly vast datasets, there exists a parallel strand of research that employs
evolutionary computing to enhance LLMs in innovative and less conventional manners (C.
Wang, J. Zhao, Jiao, et al., 2025; X. Wu, S.-h. Wu, J. Wu, et al., 2024), as we will explore
in subsequent sections.
13.2 Evolutionary Computing Enhances LLMs
While LLMs excel at generalizing knowledge across vast domains, leveraging their
capabilities for specific tasks often requires tailoring, optimization, and adaptation.
Evolutionary computing offers a natural avenue for addressing these challenges, providing
mechanisms to explore and optimize solutions in high-dimensional, complex spaces.
This section explores how evolutionary algorithms can be harnessed to enhance LLM
performance, focusing on their role in optimizing task prompts and merging expert models
specialized in different areas. Through this integration, evolutionary computing acts as
both an optimizer and a creative engine, complementing the generative capabilities of
LLMs and enabling them to perform better on specific tasks.
13.2.1 Evolutionary Prompt Engineering/Adaptation
To adapt LLMs for specific downstream tasks, adding an instruction to the input text, known
as a discrete prompt, directs the LLMs to perform desired tasks with minimal computational
cost. This method does not rely on the direct manipulation of parameters and gradients,
making it especially suitable for LLMs with black-box APIs like GPT (Achiam et al., 2023;
OpenAI, 2025), Gemini (Anil et al., 2025; Gemini Team, 2025), and Claude (Anthropic,
2025a; Anthropic, 2025b). However, the efficacy of LLMs in executing specific tasks
heavily relies on the design of these prompts, a challenge commonly addressed through
prompt engineering.
Prompt engineering often requires extensive human effort and expertise, with ap-
proaches ranging from enumerating and selecting diverse prompts to modifying existing
ones to enhance performance. These methods can lead to a cycle of exploration, which
might consume resources without substantive gains, or exploitation, which may confine the
search to local optima and stifle broader improvements. Evolutionary algorithms, which
are particularly suited for this discrete prompt optimization, offer a robust alternative.
Sequences of phrases in prompts can be seen as gene sequences, allowing us to use the
whole EA toolkit for prompt adaptation.
Taking this concept further, the evolutionary process can be used to maintain a diversity
of prompts, helping to avoid diminishing returns seen in conventional prompt engineering
methods. The trick here is that we can use the LLM itself to modify prompts as well as
the strategy for prompt modification, leading to self-referential self-improvement. This
way, we harness not only the LLM’s linguistic capabilities but also its ability to iteratively
refine the prompts based on performance feedback. As representative works in this area,
we review two approaches in this section: EvoPrompt (Q. Guo, R. Wang, J. Guo, et al.,
2024) and Promptbreeder (Fernando, Banarse, Michalewski, et al., 2024).
EvoPrompt optimizes prompts for language models by employing evolutionary al-
gorithms such as a GA and differential evolution (DE), which we briefly touched upon in
section 2.2.6 (figure 13.1). The evolutionary process begins with a set of initial prompts
that leverage the wisdom of humans and a development dataset, where each prompt is
evaluated based on how effectively it elicits the desired responses from the language model.
Throughout a series of iterations, prompts are selected based on their performance scores.
New prompts are then generated through evolutionary operations that include combining
elements from multiple selected prompts (crossover) and introducing random variations
(mutation). The prompts to introduce these operations are shown in figure 13.1. These
newly created prompts are subsequently evaluated, and those with superior performance
are retained for further refinement in subsequent iterations. This cycle of selection,
generation, and evaluation repeats, progressively enhancing the quality of the prompts.
A key innovation of this method is the use of the LLM itself to generate new candidate
prompts based on evolutionary instructions.
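A minimal sketch of this kind of loop follows; the llm and score functions are placeholders for an LLM API call and a development-set evaluation, not EvoPrompt's actual code, and the selection scheme is simplified for illustration.

import random

def llm(instruction):
    # Placeholder for a call to a language model API that returns generated text.
    return instruction

def score(prompt):
    # Placeholder: evaluate how well `prompt` elicits correct answers on a dev set.
    return random.random()

def evolve_prompts(initial_prompts, n_generations):
    population = [(p, score(p)) for p in initial_prompts]
    for _ in range(n_generations):
        parents = sorted(population, key=lambda x: x[1], reverse=True)[:2]
        # The LLM itself performs crossover and mutation on the selected prompts.
        child = llm("Please follow the instruction step-by-step to generate a better prompt.\n"
                    "1. Cross over the following prompts and generate a new prompt:\n"
                    f"Prompt 1: {parents[0][0]}\nPrompt 2: {parents[1][0]}\n"
                    "2. Mutate the prompt generated in Step 1.")
        # Replace the weakest prompt if the child performs better on the dev set.
        worst = min(range(len(population)), key=lambda i: population[i][1])
        child_score = score(child)
        if child_score > population[worst][1]:
            population[worst] = (child, child_score)
    return max(population, key=lambda x: x[1])[0]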
The EvoPrompt method was evaluated across multiple tasks, including language
understanding, language generation, and the particularly challenging Big Bench Hard
(BBH) tasks. BBH is a subset of the broader BIG-bench benchmark, specifically curated
to include the most difficult tasks where language models often struggle. All tasks
are text-based but span diverse formats such as logical reasoning puzzles, multi-step
Genetic Algorithm (GA) Implemented by LLMs
Query: Please follow the instruction step-by-step to generate a better prompt.
1. Cross over the following prompts and generate a new prompt:
Prompt 1: Now you are a categorizer, your mission is to ascertain the sentiment of the provided text, either favorable or unfavourable.
Prompt 2: Assign a sentiment label to the given sentence from ['negative', 'positive'] and return only the label without any other text.
2. Mutate the prompt generated in Step 1 and generate a final prompt bracketed with <prompt> and </prompt>.
Response:
1. Crossover Prompt: Your mission is to ascertain the sentiment of the provided text and assign a sentiment label from ['negative', 'positive'].
2. <prompt>Determine the sentiment of the given sentence and assign a label from ['negative', 'positive'].</prompt>
Figure 13.1: GA process in EvoPrompt. In Step 1, LLMs perform crossover on the given two
prompts (words in orange and blue are inherited from prompt 1 and prompt 2, respectively). In
step 2, LLMs perform mutation on the prompt. Figure from Q. Guo, R. Wang, J. Guo, et al. (2024).
arithmetic, commonsense reasoning, and code understanding. This makes BBH a widely
used stress test for assessing reasoning and generalization. While the EvoPrompt method
demonstrated impressive results across all tasks, the performance on BBH is especially
representative of its capabilities, as success on BBH indicates strong generalization and
robustness across complex, text-based challenges.
For the BBH tasks, the EvoPrompt method was applied to optimize prompts specifically
for the GPT-3.5 model. A subset of the test set was used as the development set to iteratively
refine the prompts, with the final performance reported as normalized scores (figure 13.2).
The results were striking: EvoPrompt achieved substantial improvements across all 22
evaluated tasks. Specifically, the differential evolution variant of EvoPrompt led to as
much as a 25% improvement in some tasks, with an average improvement of 3.5%. In
comparison, the GA variant also performed well but slightly lower, reaching a peak
improvement of 15% and an average of 2.5%. While differential evolution approaches
have been less explored in neuroevolution than e.g. approaches based on GA or ES, the
strong performance in combination with prompt evolution suggests that they may provide
a competitive and underutilized paradigm in the age of generative AI.
Like EvoPrompt, Promptbreeder automates the exploration of prompts by utilizing
evolutionary algorithms to generate and refine task prompts that condition LLMs for better
responses (figure
13.3). Each task prompt serves to condition the context of an LLM before
additional input, aiming to elicit a better response from the model. Promptbreeder starts
with an initial set of task prompts and mutation prompts, derived from combining domain-
specific problem descriptions with varied "thinking styles" and mutation strategies. This
initial population is crucial as it sets the baseline for the evolutionary process, incorporating
a rich diversity of approaches and perspectives right from the beginning. The system
Figure 13.2: Normalized scores on Big Bench Hard (BBH) tasks for EvoPrompt. Since
the tasks are challenging, GPT-3.5 was used as the LLM. Score normalization is calculated in
comparison to the prompt "Let's think step by step" with a 3-shot Chain-of-Thought demonstration.
The differential evolution (DE) version consistently outperformed the GA version, achieving up to
25% improvement with an average gain of 3.5%, while GA reached a peak of 15% and a 2.5%
average. Figure from Q. Guo, R. Wang, J. Guo, et al. (2024).
evaluates the effectiveness of each prompt by testing it on a batch of domain-specific Q&A
pairs. This evaluation informs the evolutionary process, where prompts are iteratively
refined.
The mutation process in Promptbreeder includes direct mutations, where new task
prompts are generated from existing ones by applying simple changes, and more complex
mutations, where multiple prompts are combined or significantly altered to explore
new prompt spaces. This process is depicted through various mutation mechanisms in
figure 13.4. One of the standout features of Promptbreeder is its self-referential mechanism,
where the system not only evolves task-prompts but also the mutation-prompts that guide
their evolution. This recursive improvement process ensures that the system becomes
increasingly effective over time. The mutation-prompts themselves are subject to evolution,
optimized to produce more effective task-prompts as the system learns from its successes
and failures.
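The self-referential step can be sketched in a few lines of Python; the prompt strings, the query_llm function, and the probability of hyper-mutation below are illustrative assumptions, not Promptbreeder's actual interfaces.

import random

def mutate_unit(unit, query_llm, hyper_mutation_prob=0.1):
    # A unit of evolution pairs a task-prompt with the mutation-prompt that rewrites it.
    task_prompt, mutation_prompt = unit["task_prompt"], unit["mutation_prompt"]
    if random.random() < hyper_mutation_prob:
        # Hyper-mutation: evolve the mutation-prompt itself, making the system self-referential.
        mutation_prompt = query_llm("Improve the following instruction for rewriting prompts:\n"
                                    + mutation_prompt)
    # Direct mutation: apply the (possibly updated) mutation-prompt to the task-prompt.
    task_prompt = query_llm(mutation_prompt + "\nINSTRUCTION: " + task_prompt
                            + "\nINSTRUCTION MUTANT =")
    return {"task_prompt": task_prompt, "mutation_prompt": mutation_prompt}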
Promptbreeder has been tested across a variety of domains to evaluate its effectiveness in
optimizing prompts for LLMs. These domains include arithmetic reasoning, commonsense
reasoning, instruction induction, and hate speech classification. The results indicate that
Promptbreeder consistently outperforms the previously considered state-of-the-art plan-
and-solve (PS+) technique. In tests using the underlying LLM PaLM 2-L, Promptbreeder
showed superior performance on almost all datasets. Notably, its zero-shot accuracy
surpasses that of PS+ in all tests. When few-shot examples are incorporated with
the prompts, Promptbreeder shows even more significant improvement, highlighting its
robustness in both zero-shot and few-shot scenarios. A specific example of Promptbreeder’s
capability is demonstrated in its application to the ETHOS hate speech classification
problem. Promptbreeder evolved a strategy involving two sequentially applied, relatively
long prompts that significantly outperformed the manually designed prompt (see listing 6).
This adaptation resulted in an accuracy improvement from 80% to 89%, illustrating
Promptbreeder’s potential for intricate domain-specific task adaptation.
While both Promptbreeder and EvoPrompt utilize evolutionary algorithms to optimize
prompts, there are distinct differences in their methodologies and focus. EvoPrompt
Listing 6 Hate speech classification prompts evolved by Promptbreeder.
Prompt 1:
"Text contains hate speech if it includes any of the following:
* Words or phrases that are derogatory, disrespectful, or dehumanizing
toward a particular group of people. * Statements that generalize about
a particular group of people in a negative way. * Statements that incite
hatred or violence against a particular group of people. * Statements that
express a desire for the extermination or removal of a particular group
of people. 7: Text contains hate speech if it contains language that is
hostile or discriminatory towards a particular group of people. This could
include language that is racist, sexist, homophobic, or other forms of hate
speech. Hate speech is harmful because it can create a hostile environment
for marginalized groups and can lead to discrimination and violence."
Prompt 2:
"You are given a piece of text from the internet. You are
supposed to label the text as Hate Speech or Not based on the given
criteria. Hate Speech is defined as speech that is intended to incite
hatred or violence against a particular group of people based on their
race, religion, sex, sexual orientation, or national origin. The given
criteria are: 1. Identify the target group of the speech. This can be
a specific group of people, such as a race, religion, or gender, or it
can be a more general group, such as people with disabilities or sexual
minorities. 2. Identify the harmful speech. This can be speech that
is threatening, abusive, or derogatory. 3. Evaluate the context of the
speech. This can include the speaker’s intent, the audience, and the time
and place of the speech. The advice was: Remember to always evaluate the
context of the speech when making a determination as to whether it is hate
speech or not. Speech that is intended to be humorous or satirical may not
be considered hate speech, even if it contains harmful language."
[Figure 13.3 panel content: a population of N task-prompts and their mutation-prompts is initialized by sampling seed thinking styles (e.g. "Let's think step by step") and mutation prompts (e.g. "Change this instruction to make it more fun") together with a problem description specific to GSM8K, AQuA, ETHOS, SVAMP, etc. Fitness is estimated from a batch of training Q&A pairs, and the mutation operators include direct mutation, estimation-of-distribution mutation, hyper-mutation of the mutation-prompt, Lamarckian mutation (generating a task-prompt from the "working out"), and prompt crossover with context shuffling.]
Figure 13.3: The Promptbreeder approach. This process begins with a set of problem descriptions
and initial prompts, creating evolution units with task and mutation-prompts. Using a binary
tournament genetic algorithm, it evaluates and iteratively refines these prompts across generations,
enhancing their effectiveness and domain-specific adaptation. Figure from Fernando, Banarse,
Michalewski, et al. (2024).
primarily concentrates on refining prompts through direct evolutionary operations, such
as crossover and mutation, driven by performance evaluations. It uses a more traditional
approach where the evolutionary process is straightforward and focused primarily on
task prompts alone. In contrast, Promptbreeder introduces a more complex and layered
approach by not only evolving the task prompts but also the mutation prompts that guide the
task prompt evolution. This self-referential approach allows Promptbreeder to adapt more
dynamically to the nuances of different domains by continually refining the mechanisms of
prompt evolution itself. Despite these differences, both examples demonstrate the potential
of evolutionary computing to significantly enhance the performance of LLMs in seemingly
straightforward ways. In the following section, we will explore how neuroevolutionary
methods can be applied to merge multiple LLMs, resulting in a composite model that
embodies a superset of the capabilities of its constituent models.
13.2.2 Evolutionary Model Merging
The intelligence of the human species is not based on a single intelligent being, but on a
collective intelligence. Individually, we are actually not that intelligent or capable. Our
society and economic system are based on a vast range of institutions made up of diverse individuals with different specializations and expertise. This vast collective intelligence shapes who we are as individuals; each of us follows our own path in life to become a unique individual and, in turn, contributes back to our ever-expanding collective intelligence as a species. Some researchers believe that the
Figure 13.4: Overview of multiple variants of self-referential prompt evolution. In (a), the LLM is directly used to generate variations P' of a prompt strategy P. Using a mutation prompt M, an LLM can be explicitly prompted to produce variations (b). By using a hyper mutation prompt H, the mutation prompt itself can also be evolved, turning the system into a self-referential one (c). Promptbreeder (d) improves the diversity of evolved prompts and mutation prompts by generating an initial population of prompt strategies from a set of seed thinking-styles T, mutation-prompts M, as well as a high-level description D of the problem domain. Figure from Fernando, Banarse, Michalewski, et al. (2024).
development of artificial intelligence will follow a similar, collective path. The future of
AI will not consist of a single, gigantic, all-knowing AI system that requires enormous
energy to train, run, and maintain, but rather a vast collection of small AI systems, each
with its own niche and specialty, interacting with each other, with newer AI systems
developed to fill a particular niche.
A noticeable and promising trend in the open-source AI ecosystem is that open-source
foundation models are readily extended and fine-tuned in hundreds of different directions
to produce new models that are excellent in their own niches. Unsurprisingly, most of the
top-performing models on Open LLM leaderboards are no longer the original open base
models such as LLaMA or Mistral, but models that are fine-tuned or merged versions of
existing models. Furthermore, open models of different modalities are being combined
and tuned to be vision-language models (VLMs) which rival end-to-end VLM models
while requiring a fraction of the compute to train. Model merging shows great promise
and democratizes model-building to a large number of participants. However, it can be a
"black art", relying heavily on intuition and domain knowledge. Human intuition, however,
has its limits. With the growing diversity of open models and tasks, we need a more
systematic approach.
This requirement makes it the perfect task for neuroevolution, which we have seen
throughout this book can discover novel and unintuitive combinations that traditional
methods and human intuition might miss. One such approach is called evolutionary model
merge (Akiba, Shing, Tang, et al., 2025), which is designed to discover the best ways to
combine different models. It combines two different approaches (figure 13.5), which we
will discuss in more detail below: (1) merging models in the data flow space (layers), and
(2) merging models in the parameter space (weights).
At a high level, merging in the data flow space uses evolution to discover the best
combinations of the layers of different models to form a new model. In the model merge
community, intuition and heuristics are used to determine how and which layers of one
[Figure 13.5 panel content: a collection of source models and their layers are merged in the parameter space (PS), in the data flow space (DFS), and in both; on two example Japanese math questions (Q1 about the cost of clothing, Q2 about spending on ice cream), answer accuracies of 0.18, 0.31, 0.52, 0.36, and 0.56 are shown for the different models.]
Figure 13.5: Evolutionary model merging. The approach involves three key components: (1)
evolving the mixing weights for parameters at each layer within the parameter space (PS); (2)
evolving the permutations of layers within the data flow space (DFS); and (3) an integrated strategy
that combines both parameter and data flow merging. Importantly, merging in the PS goes beyond
simply copying and stitching together layer parameters; it actively blends the weights, much like
mixing colors (e.g. red and blue blending to form purple). Figure from Akiba, Shing, Tang, et al.
(2025).
model are combined with layers of another model. But one can see how this problem has a
combinatorially large search space, which is best suited to be searched by an optimization
algorithm such as evolution. On the other hand, merging in the parameter space evolves
new ways of mixing the weights of multiple models. There are an infinite number of
ways of mixing the weights from different models to form a new model, not to mention
the fact that each layer of the mix can, in principle, use different mixing ratios. This is
where an evolutionary approach can be applied to efficiently find novel mixing strategies
to combine the weights of multiple models. Finally, both data flow space and parameter
space approaches can be combined to evolve new foundation models that might require
particular architectural innovations to be discovered by evolution.
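As a rough illustration of merging in the parameter space, the sketch below blends the weights of two models layer by layer using per-parameter mixing ratios that an evolutionary algorithm (e.g. CMA-ES) could optimize against a task score. The state-dict interface is PyTorch-style, both models are assumed to share the same architecture, and evaluate_on_task is a placeholder; the actual method of Akiba, Shing, Tang, et al. (2025) is more involved.

def merge_in_parameter_space(model_a, model_b, ratios):
    # Blend the two models' weights; ratios maps each parameter name to a mixing weight in [0, 1].
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    return {name: ratios[name] * sd_a[name] + (1.0 - ratios[name]) * sd_b[name]
            for name in sd_a}

def merge_fitness(ratio_vector, model_a, model_b, template_model, evaluate_on_task):
    # Fitness of one candidate mixing vector: score of the merged model on a development set.
    names = list(model_a.state_dict().keys())
    ratios = {name: float(r) for name, r in zip(names, ratio_vector)}
    template_model.load_state_dict(merge_in_parameter_space(model_a, model_b, ratios))
    return evaluate_on_task(template_model)   # e.g. accuracy on Japanese math problems

An outer evolutionary loop then proposes ratio vectors, calls merge_fitness, and keeps the best-scoring merges; merging in the data flow space can be sketched analogously as evolving a sequence of layer indices drawn from the source models.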
How far can this automated method advance by discovering new ways to combine the
vast array of open-source foundation models, particularly across domains that are quite
distant from each other, such as mathematics and non-English languages, or vision and
non-English languages? In fact, it turns out that it is possible to use neuroevolution to create new open models with emergent combined capabilities that had not previously existed: a Japanese math LLM and a Japanese-capable VLM, both evolved using this approach, achieve state-of-the-art performance on Japanese language and vision-language model benchmarks.
Concretely, a first step was to evolve an LLM that can solve math problems in Japanese.
Although language models specialized for Japanese and language models specialized
for math exist, there were no models that excelled at solving mathematical problems in
Japanese. To build such a model, three source models were selected: a Japanese LLM
(Shisa-Gamma) and math-specific LLMs (WizardMath and Abel). In the merging process, evolution went on for a couple of hundred generations, in which only the fittest individuals (the models scoring highest on the Japanese math training set) survived and repopulated the next generation. The final model that was evaluated on the test
set was the one that performed best on the training set during the evolutionary search.
Info Box: The Intersection of EC and LLMs
At the beginning of generative AI innovation, I (Yujin Tang) began my journey at Google Brain (which later merged into Google DeepMind), primarily focusing on evolutionary algorithms and their applications. The release of GPT-3 inspired me
to explore the symbiotic potential between evolutionary computing (EC) and LLMs.
With access to a suite of Google internal LLMs and early tests of Gemini, a bunch
of us recognized LLMs as exceptional pattern recognition machines. This led to
our works (Lange, Tian, and Tang, 2024a; Lange, Tian, and Tang, 2024b) that
explored the possibility of enhancing EC with pre-trained and fine-tuned LLMs.
At the same time, despite the prowess of LLMs in understanding and generating complex patterns, I noted the significant challenges associated with fine-tuning these models for specific tasks. This process demanded extensive engineering, predominantly leaning on gradient-based methods, a path heavily trodden by giants like Google, Meta, and OpenAI.
Later when I joined Sakana AI, I attempted to apply the NEAT algorithm to
LLMs, treating each layer as an independent node. This approach initially seemed
promising but was quickly met with challenges due to the vast search space and the
high sensitivity of LLMs to local failures, i.e. even a small percentage of suboptimal
nodes could dramatically affect overall model performance. To combat these issues,
I had to implement some strategic constraints such as limiting connections to serial
formations and applying scaling matrices, thereby refining the data flow space
model merging method. These are all early works in marrying EC and LLMs, but
are already demonstrating the transformative power of integrating the two for more
adaptive and robust AI systems.
Table 13.1 summarizes these results. Model 4 is optimized in parameter space and
model 6 is further optimized in data flow space using model 4. The correct response rates
for these models are significantly higher than the correct response rates for the three source
models. While it was incredibly difficult for an individual to manually combine a Japanese
LLM with Math LLMs, through many generations, evolution was able to effectively find
a way to combine a Japanese LLM with Math LLMs to successfully construct a model
with both Japanese and math abilities. Notably, the performances of the merged models
are approaching those of GPTs and surpassing larger models that are only specialized in
Japanese.
In constructing the Japanese VLM, a popular open-source VLM (LLaVa-1.6-Mistral-
7B) and a capable Japanese LLM (Shisa Gamma 7B v1) were used to see if a capable
Japanese VLM would emerge. Table
13.2 summarizes the performance of the merged
VLM and the baselines. Both JA-VG-VQA-500 and JA-VLM-Bench-In-the-Wild are
Table 13.1: Performance Comparison of the LLMs. Models 1ś3 are source models, Models
4ś6 are merged models, and Models 7ś11 are provided for reference. PS stands for Parameter
Space merging, and DFS is the abbreviation for Data Flow Space merging. Models merged with
evolution (models 4ś6) significantly outperformed similarly sized models (models 1ś3) and even
surpassed GPT-3.5 on the Japanese math task. Table from Akiba, Shing, Tang, et al. (2025).
Id. Model Type Size MGSM-JA (acc)
1 Shisa Gamma 7B v1 JA general 7B 9.6
2 WizardMath 7B v1.1 EN math 7B 18.4
3 Abel 7B 002 EN math 7B 30.0
4 Akiba et al. 2025 (PS) 1 + 2 + 3 7B 52.0
5 Akiba et al. 2025 (DFS) 3 + 1 10B 36.4
6 Akiba et al. 2025 (PS+DFS) 4 + 1 10B 55.2
7 Llama 2 70B EN general 70B 18.0
8 Japanese StableLM 70B JA general 70B 17.2
9 Swallow 70B JA general 70B 13.6
10 GPT-3.5 commercial - 50.4
11 GPT-4 commercial - 78.8
Japanese benchmarks involving questions and answers about images. The higher the score, the more accurately the questions are answered in Japanese. Interestingly, the merged models were able to achieve higher scores than not only LLaVa-1.6-Mistral-7B, the English VLM on which they are based, but also JSVLM, an existing Japanese VLM. This was the first
effort to merge VLMs and LLMs, demonstrating that neuroevolutionary algorithms can
play an important role in the success of the merge.
13.2.3 Fine-Tuning with Evolution Strategy
Given the successes in prompt engineering and model merging, a compelling further ques-
tion is: does neuroevolution scale to optimizing LLMs directly? Much of neuroevolution
in earlier chapters focused on discovering clever behavior that could be implemented with
much smaller networks: for instance, figure 3.7 showed how double-pole balancing without
velocities could be achieved with just a few neurons and weights. Neural architecture search
and metalearning (chapters 10 and 11) expanded the scope to deep learning architectures,
but evolutionary discovery was synergetically combined with gradient descent. Can
neuroevolution be used to optimize neural networks consisting of billions of parameters?
Surprisingly, it can. A recent study showed that a simple evolutionary approach
described in section 2.2.2, evolution strategy (ES), can be effective in fine-tuning LLMs
with several billion parameters (Qiu, Gan, Hayes, et al., 2025). Compared to the current state-of-the-art fine-tuning methods such as PPO and GRPO reinforcement learning, ES can achieve better performance, be more consistent across runs, be more sample- and compute-efficient, be more robust across different LLMs, and be less prone to reward hacking.
The main contrast with RL-based fine-tuning is the focus of optimization. RL methods
are overwhelmingly based on action-space exploration, that is, they adjust the LLM policy
to favor outputs that lead to higher rewards. A policy gradient is calculated based on
Table 13.2: Performance Comparison of the VLMs. LLaVA 1.6 Mistral 7B is the source VLM
and Japanese Stable VLM is an open-sourced Japanese VLM. While JA-VG-VQA-500 measures
general VQA abilities in Japanese, JA-VLM-Bench-In-the-Wild evaluates the model’s handling of
complex VQA tasks within Japanese cultural contexts. The performance of all merged models
(bottom group) surpassed the baselines on both tasks. Table from Akiba, Shing, Tang, et al. (2025).
JA-VG-VQA-500 JA-VLM-Bench-In-the-Wild
Model Size (ROUGE-L) (ROUGE-L)
LLaVA 1.6 Mistral 7B 8B 14.3 41.1
Japanese Stable VLM 8B - 40.5
Akiba et al. 2025 (PS) 8B 19.7 51.2
Akiba et al. 2025 (DFS) 12B 16.8 46.5
Akiba et al. 2025 (PS+DFS) 11B 20.4 47.6
reinforcement feedback, and model weights are then changed to make high-reward actions
more likely.
In contrast, ES optimizes the model in the parameter space. There is no gradient to
direct the changes, but instead, parameter values of the current best model in the population
are randomly perturbed in order to find combinations that perform better. In principle, it
is possible to find improvements that are more fundamental and systematic: they underlie
the better action sequences rather than immediately result in them. In particular, this
approach should work well in reasoning tasks with long-horizon rewards, where only the
final outcome is rewarded rather than individual actions leading towards it.
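The flavor of such parameter-space search can be conveyed with a vanilla ES update on a flat parameter vector, as in the sketch below; in practice the billions of LLM parameters are perturbed with memory-saving tricks (e.g. shared random seeds), and reward_fn and the constants here are placeholders rather than the setup of Qiu, Gan, Hayes, et al. (2025).

import numpy as np

def es_finetune(theta, reward_fn, iterations=100, pop_size=30, sigma=0.01, lr=0.005):
    # Vanilla ES: perturb parameters, score whole completions, move along the reward-weighted average.
    for _ in range(iterations):
        noise = np.random.randn(pop_size, theta.size)
        rewards = np.array([reward_fn(theta + sigma * n) for n in noise])
        # Normalize rewards so the update is invariant to their scale.
        advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        # Estimate the search gradient and take a step in parameter space.
        theta = theta + lr / (pop_size * sigma) * noise.T @ advantages
    return theta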
The ES approach was evaluated in the countdown task, which requires constructing
an arithmetic expression with a given set of operators that results in a given target value
from a given set of input values. For instance, with the basic operators +, -, *, /, the target
950, and inputs 3, 6, 50, 100, a valid solution is (3+6)*100+50. While the task is compact
and easily described, solving it requires constrained general symbolic reasoning, which is
generally difficult for LLMs.
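A reward function for this task only needs to check that a candidate expression uses the allowed operators, draws its numbers from the given inputs, and evaluates to the target. The sketch below is one hypothetical way to score an answer; it is not the evaluation code of the original study.

import re

def countdown_reward(expression, inputs, target):
    # Return 1.0 if the expression is valid and hits the target, else 0.0.
    if not re.fullmatch(r"[\d+\-*/() ]+", expression):
        return 0.0                      # only digits, + - * / and parentheses are allowed
    pool = list(inputs)
    for n in [int(tok) for tok in re.findall(r"\d+", expression)]:
        if n not in pool:
            return 0.0                  # a number was used that is not among the inputs
        pool.remove(n)                  # each input may be used at most once
    try:
        value = eval(expression)        # safe here: the regex admits arithmetic only
    except (SyntaxError, ZeroDivisionError):
        return 0.0
    return 1.0 if abs(value - target) < 1e-9 else 0.0

# Example from the text: (3+6)*100+50 with inputs 3, 6, 50, 100 and target 950.
assert countdown_reward("(3+6)*100+50", [3, 6, 50, 100], 950) == 1.0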
When set to fine-tune open-source Qwen and Llama Instruct models ranging from
0.5B to 8B parameters, ES fine-tuning performed very well (table 13.3). On average, it
improved the performance of the base model by 36%, compared to 18% for PPO and 21%
for GRPO. To reach the same level of performance, it needed only 20% of the samples
required by RL. Whereas RL practically failed to improve small models at all, ES was
able to bring up their performance significantly.
To understand the foundations of these differences, the comparison was extended to another fine-tuning dimension: conciseness. The models were applied to a question-answering benchmark on which they were already quite good, albeit verbose. In fine-tuning, they were not rewarded for the accuracy of answers, but instead only for their conciseness. For instance, with the prompt "Name one primary color", a verbose answer might be "Primary colors can be combined to produce other colors. The choice of primary colors depends on the medium; for instance, artists use red, yellow, and blue. Therefore, a possible representative primary color is red." That answer would not
Base Model                Raw    RL: PPO   RL: GRPO8   RL: GRPO30   ES
Qwen-2.5-0.5B-Instruct    0.1    0.3       0.3         0.5          14.4
Qwen-2.5-1.5B-Instruct    0.7    14.2      13.9        14.8         37.3
Qwen-2.5-3B-Instruct      10.0   20.1      30.9        32.5         60.5
Qwen-2.5-7B-Instruct      31.2   55.1      54.2        52.8         66.8
Llama-3.2-1B-Instruct     0.4    11.2      14.5        13.0         16.8
Llama-3.2-3B-Instruct     3.2    35.3      39.4        38.8         51.6
Llama-3.1-8B-Instruct     8.1    42.8      49.9        51.3         61.2
Table 13.3: Accuracy of ES Fine-tuning on the Countdown Task. The percentage of correct
answers is compared across different model types (Qwen and Llama) and sizes (0.5B to 8B), and
different fine-tuning algorithms (PPO, GRPO, and ES). Raw refers to the model without fine-tuning;
GRPO8 and GRPO30 indicate group sizes of eight and 30. On average, ES fine-tuning improves
accuracy significantly more than the RL methods, even with small models. For an animation of
this process, see https://neuroevolutionbook.com/demos.
get as high a reward as a short answer, such as "Red".
In addition to conciseness, the fine-tuned models were evaluated in terms of the
accuracy of their answers, as well as how different they were from the original base model.
The KL divergence between models was used as the difference metric (after Rafailov,
A. Sharma, E. Mitchell, et al., 2023). The main result was that ES discovered a strongly
dominant Pareto front along conciseness and KL divergence. That is, it was able to
achieve concise answers with much smaller changes to the model (figure 13.6). As a
matter of fact, RL answers became concise only when the changes were so large that they
broke the performance of the model: they were no longer accurate, and often were even
nonsensical, i.e. constituted an extreme form of reward hacking. The ES performance was
also consistent across different runs and models.
While the ES fine-tuning results are good, they are surprising. More research is needed
to fully understand them, but several possible explanations have already emerged. First,
population-based search is likely to be a key ingredient: it is possible that a large number of
successful parameter settings exist, and it may be sufficient to find a subset of such a setting
to establish the desired behavior (similar to the lottery ticket hypothesis in section 12.3.4).
Population-based search may then be an effective way to find such a setting. Second,
parameter-space exploration may make it possible to find latent representations underlying
a class of behaviors, rather than overfitting to specific action sequence examples. For
instance, just two examples were enough to fine-tune the models for conciseness (not millions, thousands, or hundreds: two!). Third, whereas gradient-based learning may be
misled by jagged reward landscapes, ES may be resistant to such effects: because its
search is perturbative, it may be more informed by the broad outlines of the space.
Once the understanding of these effects improves, it should be possible to improve the
search algorithm itself. The results were achieved with vanilla ES; more sophisticated versions of ES exist already, and others can be designed to address the specific needs of fine-tuning. For instance, the CMA-ES approach could perhaps be used to optimize the
Figure 13.6: Maximizing Reward and Minimizing Difference in the Conciseness Task. Qwen
models of various sizes were fine-tuned to generate concise answers to questions. Compared to RL
methods such as GRPO, ES makes answers concise (i.e. high reward) with very small changes to
the model (measured by KL divergence). To be concise, RL hacks the reward and often results in
incorrect or even nonsensical answers. The main difference is that ES explores in the parameter
space rather than in the action space, presumably making it possible to discover principled and
systematic changes that improve performance. Figure from Qiu, Gan, Hayes, et al. (2025).
perturbations, and swarm optimization to resist jagged changes. Such understanding and
methods could also lead to a better theory of representations of knowledge in LLMs, and
even better pretraining techniques for them.
13.3 LLMs Enhance Evolutionary Computing
In the previous section, we discussed how evolutionary computing can help improve the
performance of LLMs. Now, we turn our attention to exploring the synergy between these
two fields from the opposite direction: how LLMs can enhance evolutionary computing.
By leveraging their ability to process, generate, and refine complex information, LLMs
can support evolutionary algorithms in numerous ways. This bi-directional relationship
highlights the complementary strengths of the two paradigms.
13.3.1 Evolution through Large Models
A particularly interesting example that showcases how LLMs can enhance evolutionary
computation is an approach called evolution through large models (ELM; Lehman,
Gordon, S. Jain, et al., 2023). The main idea behind this approach is to enhance genetic
programming by employing LLMs as advanced mutation operators. LLMs, trained on
datasets featuring sequential code changes and modifications, are adept at simulating
probable alterations that a human programmer might make. This ability enables these
models to guide the evolution of code in sophisticated, contextually aware manners that
surpass the capabilities of traditional mutation operators used in genetic programming.
At the core of the methodological innovation is the rethinking of the mutation operator,
a fundamental component in GP. Traditionally, GP mutations are stochastic, applying
random or simple deterministic changes that may not always respect the underlying logic or
[Figure 13.7 panels: (a) Mutations; (b) MAP-Elites, in which a diff model mutates Python programs drawn from a map of diverse champions indexed by the width and height of the sodaracer.]
Figure 13.7: ELM mutation operator and MAP-Elites integration. (a) Success rate for GP
mutation decreases exponentially with the number of mutations, and produces no solutions when
there are five bugs. In contrast, diff mutation degrades only with the fifth bug. The conclusion
is that LLM-based mutation can indeed make multiple sensible coupled changes to code. (b) In
each MAP-Elites iteration, a Python solution is sampled from the archive for each replica of a diff
model. Each replica generates a batch of diffs applied to the sampled solution to produce modified
candidates. These candidates are evaluated and used to update the archive. Over time, a single
seed program evolves into a variety of high-performing Python programs. Figures from Lehman,
Gordon, S. Jain, et al. (2023).
syntax of the code. In contrast, the ELM approach leverages the sophisticated capabilities
of LLMs to introduce a "diff"-based mutation process which, unlike conventional methods,
utilizes the deep learning insights of LLMs, trained on vast repositories of code changes
(diffs) from real-world projects (e.g. projects on GitHub). By understanding both the
context and the functionality of code segments, LLMs can generate diffs that are not only
syntactically correct but also semantically meaningful.
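In code, such an operator can be as simple as prompting a code-trained LLM with the current program and asking for a plausible edit; the prompt wording and the query_llm function below are illustrative assumptions rather than the exact diff-model interface used in ELM.

def diff_mutate(program_source, query_llm):
    # Ask an LLM for a small, semantically sensible edit to the program.
    prompt = ("Here is a Python program:\n" + program_source +
              "\nRewrite the program with a plausible improvement, as a programmer's commit might:\n")
    mutated_source = query_llm(prompt)
    try:
        compile(mutated_source, "<candidate>", "exec")   # syntax check only
    except SyntaxError:
        return None                                      # discard invalid offspring
    return mutated_source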
Figure 13.7a highlights a performance comparison between the diff mutation in ELM and the conventional GP mutation in fixing bugs. The success rate of generating new code that fixes bugs dropped dramatically for the GP mutation, while the diff mutation was able to retain its success rate until encountering the fifth bug in the code.
As a demonstration of the ELM approach, it was integrated with the MAP-Elites
algorithm (section 5.4) and applied to the Sodarace simulator. Sodarace is a physics-based
environment that provides a low-cost, simulated sandbox for invention. The objective is
to build two-dimensional robots, called sodaracers, from masses and oscillating springs,
such that they can effectively move across terrain. Each sodaracer consists of a variable
number of point masses (defined by their initial 2D positions) connected by springs that
oscillate. The springs' oscillations, characterized by amplitude and phase (with a shared period across all springs), drive the robot's motion. To evaluate performance, a sodaracer
is simulated on a given terrain for a fixed duration, and its locomotion ability is measured
by the distance its center of mass travels along the x-axis. Rather than searching directly in
the space of masses and springs, the ELM approach uses LLMs to generate Python code
that defines each Sodaracer’s structure. In this setup, the programs produced by ELM
serve as indirect encodings where any functional code expressing a valid morphology can
be evolved or adapted through this system.
The MAP-Elites behavior characterization is defined by a sodaracer's height, width, and mass, forming a 12 × 12 × 12 grid. An overview of the process is shown in figure 13.7b.
It begins with the evaluation and placement of a single hand-crafted solution. In each
subsequent iteration, a niche already occupied on the map is selected at random. The
solution in that niche is then perturbed using the diff model to generate a new candidate,
which is evaluated and assigned a niche based on its behavioral traits. Following the
standard MAP-Elites approach, if the assigned niche is empty or if the new solution
performs better than the current occupant, it replaces the existing one as the new champion.
Otherwise, the candidate is discarded. Over time, this process populates the map with a
diverse set of increasingly effective solutions.
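The archive logic is standard MAP-Elites; a compact sketch, reusing diff_mutate from above together with hypothetical evaluate and behavior functions (returning a fitness value and the discretized height/width/mass descriptor), might look as follows.

import random

def map_elites_with_llm(seed_source, query_llm, evaluate, behavior, iterations=10000):
    # The archive maps a discretized (height, width, mass) niche to its best (fitness, program) pair.
    archive = {behavior(seed_source): (evaluate(seed_source), seed_source)}
    for _ in range(iterations):
        # Pick an occupied niche at random and perturb its champion with the diff model.
        _, parent = random.choice(list(archive.values()))
        child = diff_mutate(parent, query_llm)
        if child is None:
            continue                     # invalid code was already discarded by the mutation operator
        niche, fit = behavior(child), evaluate(child)
        # Standard MAP-Elites replacement: keep the child if its niche is empty or it is better.
        if niche not in archive or fit > archive[niche][0]:
            archive[niche] = (fit, child)
    return archive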
Recognizing that the pre-trained LLM diff model, while capable, is not familiar with the Sodarace task and may not be aligned with the specific requirements of evolutionary code generation, ELM includes an important additional component: a fine-tuning phase. This
process involved training the LLM further on a dataset generated during the evolutionary
search process, which comprises targeted code diffs that were particularly relevant to the
tasks at hand. By doing so, the fine-tuned diff model could more effectively contribute
to the evolutionary search because the fine-tuning process refined the model’s ability to
predict and generate code diffs that are not only plausible and syntactically correct but
also highly functional within the specific context.
The MAP-Elites algorithm was initiated with four simple yet diverse seed solutions
designed to span a range of foundational geometries. These seed solutions, specifically
labeled as the square seed, the radial seed, and two seeds inspired by CPPNs, provided
a varied starting point for evolutionary exploration (figure 13.8a). As the evolutionary search progressed, it led to the discovery of creatures with novel and complex body designs, synthesized through the advanced capabilities of the program. These innovative designs are showcased in figure 13.8b, highlighting the algorithm's ability to push beyond conventional design boundaries. Furthermore, a detailed behavior analysis of the evolutionary method
is provided in figure 13.9, which presents three critical metrics: the percentage of niches
discovered, the QD score, and the percentage of runnable code generated by the diff model.
This analysis includes a comparative study between the outcomes using the pre-trained
diff model and the model that was fine-tuned during the QD process.
The results demonstrate that even with the pre-trained diff model, the method achieved
respectable scores across the evaluated tasks. However, it was the fine-tuned LLM
that really drove the improvement, showing just how powerful combining LLMs with
evolutionary computing can be. This synergy not only boosted the algorithm’s efficiency
but also its ability to generate functional and innovative solutions, thereby showcasing the
substantial potential of this integrative approach.
13.3.2 Language Model Crossover
Following the previous direction of evolution through LLMs, we now explore another novel
approach that leverages the pattern completion abilities of LLMs for intelligent variation in
evolutionary algorithms. Language model crossover (LMX; Meyerson, Nelson, Bradley,
et al., 2024) capitalizes on the few-shot prompting paradigm, wherein LLMs generalize
from a small set of input-output examples to produce new outputs (figure 13.10). This
(a) Sodaracer seeds (b) Generalization tests
Figure 13.8: Sodaracer seeds and discovered designs. The starting seeds are shown in (a); from top to bottom: CPPN seed, radial seed, and square seed. The discovered designs are shown in (b); from top to bottom: Wheel, from the radial seed; Galloper, from the square seed; Runner, from the CPPN seed. ELM enabled bootstrapping from simple, often ineffective seed programs to hundreds of thousands of functional and diverse sodaracers in a domain unseen by the language model. These evolved artifacts were effective enough to train LLMs to generalize to novel tasks. Figures from Lehman, Gordon, S. Jain, et al. (2023). Videos at https://neuroevolutionbook.com/demos.
(a) Niches Reached (b) QD Score (c) Diff Quality
Figure 13.9: The impact of fine-tuning the diff model on the performance of ELM. For both the pretrained diff model and the fine-tuned one, shown are (a) the number of niches reached, (b) the QD score of the produced map, and (c) the percentage of valid/runnable diffs proposed. The experiments demonstrate that fine-tuning the diff model improves the performance of the evolutionary process across all three metrics. Figure from Lehman, Gordon, S. Jain, et al. (2023).
[Figure 13.10 panel content: parent genotypes are listed as the LM prompt and children are collected from the LM output, shown for (a) binary strings, (b) mathematical expressions such as x^2 + 2.1*x and sin x^2 + 7, (c) English sentences such as "the moon is cold", (d) image-generation prompts such as "green forest art", and (e) Python code such as def move_forward().]
Figure 13.10: Language Model Crossover (LMX). New candidate solutions are generated by concatenating parents into a prompt, feeding the prompt through any pre-trained LLM, and collecting offspring from the output. Such an operator can be created through very few lines of code. The enormity and breadth of the dataset on which the LLM was trained, along with its ability to perform in-context learning, enable LMX to generate high-quality offspring across a broad range of domains. Domains demonstrated include (a) binary strings, (b) mathematical expressions, (c) English sentences, (d) image generation prompts, and (e) Python code; many more are possible. When integrated into an optimization loop, LMX serves as a general and effective engine of text-representation evolution. Figure from Meyerson, Nelson, Bradley, et al. (2024).
capability is harnessed to design a crossover operator that analyzes commonalities among
parent genotypes and generates offspring that integrate their patterns.
The full algorithm of LMX is illustrated in algorithm 1, which integrates LMX
into a traditional evolutionary loop. The population is initialized with random text-
based individuals, and in each generation, new candidates are created using the LMX
operator. Specifically, a fixed number of parents are randomly chosen, their genotypes are
concatenated into a prompt, and the LLM is queried to generate offspring. The generated
offspring are validated, added to a temporary pool, and subsequently evaluated using
a fitness function. The population is then refined to retain only the best-performing
individuals for the next generation. This evolutionary cycle repeats until the convergence
criteria are met. Although the algorithm is extremely simple, this simplicity and generality are precisely LMX's strength. Unlike traditional crossover operators that require domain-specific design, LMX's reliance on text-based representations makes it applicable to any
domain with reasonable textual encoding. Moreover, as LLMs grow in sophistication,
the quality and diversity of offspring generated through LMX are expected to improve,
making it a forward-compatible technique for evolutionary algorithms.
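Because the operator amounts to "concatenate parents, sample, split", a faithful sketch fits in a few lines; the newline-separated format and the query_llm function are assumptions in the spirit of algorithm 1 rather than the authors' exact code.

def lmx_crossover(parents, query_llm, num_children=5):
    # Language Model Crossover: parents go into the prompt, children are parsed from the completion.
    prompt = "\n".join(parents) + "\n"           # concatenate parents, one per line
    output = query_llm(prompt)                   # let the LLM continue the pattern
    children = [line.strip() for line in output.split("\n") if line.strip()]
    return children[:num_children]               # keep up to num_children candidate offspring

This operator can be dropped into any evolutionary loop as the sole source of variation, with selection and replacement handled exactly as in a conventional genetic algorithm.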
LMX is very versatile, which is shown by its performance across many different
domains, such as binary optimization, symbolic regression, creative prompt generation,
and Python code evolution. For example, the binary strings experiment evaluates whether
LMX can generate meaningful, heritable variation in a toy domain. Using binary strings
of length six, LMX generates offspring based on patterns in parent strings. Results showed
that LMX reliably creates valid and novel strings while preserving heritability. Another
task, the OneMax problem, tests LMX’s ability to evolve binary strings toward maximizing
the number of ones. Although convergence to the optimal solution was slightly slower
compared to a domain-specific crossover, the mean fitness of solutions was significantly
higher using LMX (figure 13.11).
Symbolic regression is another challenging problem in genetic programming, which
was tackled using the 1.3B parameter Galactica LLM (Taylor, Kardas, Cucurull, et al.,
2022). LMX was used to evolve mathematical expressions to approximate a dataset without
Algorithm 1 Evolutionary Algorithm using LMX. Lines 7-9 are the essence of LMX.
Algorithm from Meyerson, Nelson, Bradley, et al. (2024).
1: Given LLM, population size n, parents per crossover k, fitness function f
2: Initialize population P with random text-based individuals  ▷ See experiments for examples
3: while not done evolving do
4:   P_new ← initialize new candidate set
5:   while |P_new| < n do  ▷ Generate new candidates in loop
6:     x_1, ..., x_k ← randomly choose k individuals in P  ▷ Select parents
7:     prompt ← x_1 \n x_2 \n ... \n x_k  ▷ Concatenate parents, e.g., separated by newlines
8:     output ← LLM(prompt)  ▷ Sample output text from LLM given prompt
9:     children ← extract valid candidates from output  ▷ E.g., split output on newlines
10:    P_new ← P_new ∪ children  ▷ Add children to new candidate set
11:  end while
12:  P ← P ∪ P_new  ▷ Add new candidates to population
13:  P ← refine P down to n individuals using f  ▷ E.g., via tournament selection
14: end while
domain-specific operators. Results on the SRBench (La Cava, Burlacu, Virgolin, et al.,
2021) banana problem demonstrated that LMX could generate compact, high-performing
expressions. Figure 13.12 illustrates how meaningful offspring are produced by varying
parent expressions. These results highlight the adaptability of LMX to tasks requiring
interpretable, non-trivial solutions.
In the creative domain of image generation, LMX evolved text prompts for Stable Diffusion to generate images optimized for specific color properties (e.g. redness, greenness).
Fitness functions were designed to quantify the desired properties in the images. Compared
to zero-shot baselines and one-point crossover, LMX achieved higher diversity and fitness
(figure 13.13). This experiment highlights LMX’s ability to interface seamlessly with
other generative models and optimize results in creative tasks.
Finally, using the Sodarace environment we have already encountered in the previous
section, LMX was tested for generating functional and diverse code. The fitness function
evaluated the distance traveled by the robot. Experiments showed that LMX with larger
LLMs produced a greater diversity of valid sodaracers, filling more niches and achieving
higher quality-diversity scores. As is illustrated in figure 13.14, the findings demonstrate
LMX’s potential for applications in evolving executable code.
LMX exemplifies how LLMs can enhance evolutionary computing by acting as
intelligent, versatile variation operators. Through its simple prompting mechanism, LMX
enables evolutionary algorithms to generate meaningful and semantically rich offspring
across diverse domains, from equations to text and code. By leveraging the pattern-
completion abilities of LLMs, LMX showcases how these models can introduce nuanced
variations that traditional methods struggle to achieve. As LLMs improve in scale and
reliability, their synergy with evolutionary algorithms offers exciting opportunities for
optimization and creativity. This exploration of LLMs in crossover operators sets the stage
for broader applications, such as their potential role in shaping evolutionary strategies, as
we discuss in the next section.
[Figure 13.11 panels: (a) histogram of offspring distances from the all-1s string; (b) fitness over generations 1-10, plotting median max and mean values for LMX and one-point crossover.]
Figure 13.11: Heritability and convergence of LMX on binary strings. (a) The histogram shows the distribution of how far offspring are from the all-1s string, depending on whether parents are taken in the neighborhood of the all-1s or all-0s string. As expected, these distributions are significantly different. The conclusion is that LMX indeed produces heritable variation. (b) Convergence results (median and IQR) for a simple genetic algorithm using either LMX or one-point crossover. Though fewer solutions converge on the optima using LMX than with the classical recombination (16/20 vs. 20/20), mean values are higher (Mann-Whitney p = 0.002). While not as efficient as a domain-specific operator, it is clear that LMX can indeed drive an evolutionary process. Figure from Meyerson, Nelson, Bradley, et al. (2024).
13.3.3 LLMs as Evolution Strategies
The exploration of LLMs in evolutionary computing does not stop at variation operators.
EvoLLM (Lange, Tian, and Tang, 2024b) is an approach that integrates LLMs directly into evolution strategies. It reimagines the language model as a core component of evolutionary computing: rather than only asking the LLM to identify potential solutions, it actively involves the LLM in the evolutionary cycle, allowing it to suggest promising sampling points for further evaluation (figure 13.15a).
Concretely, EvoLLM's design can be described as the combination of a high-level prompt design space (macro-view) and a detailed API space (micro-view); see figure 13.16 for an illustration. In the high-level prompt design space, EvoLLM first constructs an LLM prompt by representing the solution candidates as integers resulting from a discretized search space with a pre-specified resolution. The approach uses integers instead of raw floating-point numbers to avoid the difficulty LLM tokenizers face when dealing with non-text data. To construct a query that the LLM can better understand and use to generate improvements efficiently, a record of all population evaluations is kept, and the set of previous records H = {X_g, F_g}, g = 1, ..., G, is sorted by fitness within and across generations, where the X_g are the solutions in generation g and the F_g are their fitness scores. The top-K performing generations and the top-M solutions within each generation are then selected and organized in a formatted manner in the LLM's input context. Finally, similar to the design of the decision transformer (L. Chen, K. Lu, Rajeswaran, et al., 2021), EvoLLM appends a desired fitness level f^query_LLM as the target for the proposal at the end of the input context; see the bottom left light purple box in figure 13.16 (prompt 1) for an illustration of the input prompt. Although there are violations, most LLMs robustly follow the pattern outlined in this prompt design and continue the string format by outputting a new mean x_LLM with
Figure 13.12: Four examples of LMX for symbolic regression. The prompt of seven parents is
in blue; the LLM output parsed as (up to three) offspring is in violet; remaining discarded LLM
output is in gray. In all cases, children exhibit meaningful variations of their parents. Figure from
Meyerson, Nelson, Bradley, et al. (2024).
the correct delimiter. The caller of EvoLLM in the user space can then use this as the
proposed mean to sample a new set of candidates and evaluate them in the task to update
the records 𝐻, and this loop continues.
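The following sketch shows how such a query prompt could be assembled from the evaluation history; the discretization resolution, the text format, and the helper names are illustrative guesses at the scheme in figure 13.16, not the exact EvoLLM prompt.

def build_evollm_prompt(history, k_generations=5, m_solutions=3, resolution=100, target_fitness=None):
    # history: list of (solutions, fitnesses) per generation, with solutions as vectors of floats.
    # Keep the K best generations (by their best fitness), ordered worst to best.
    best = sorted(history, key=lambda gen: max(gen[1]), reverse=True)[:k_generations]
    best = sorted(best, key=lambda gen: max(gen[1]))
    lines = []
    for solutions, fitnesses in best:
        # Within each generation, keep the top-M solutions, also listed worst to best.
        top = sorted(zip(solutions, fitnesses), key=lambda sf: sf[1])[-m_solutions:]
        for x, f in top:
            ints = [int(round(v * resolution)) for v in x]   # discretize floats to integers
            lines.append(str(ints) + " -> fitness " + format(f, ".2f"))
    if target_fitness is None:
        target_fitness = max(max(f) for _, f in history) + 1.0   # ask for an improvement
    lines.append("[?] -> fitness " + format(target_fitness, ".2f"))   # query line the LLM completes
    return "\n".join(lines)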
EvoLLM includes a set of detailed design choices in the API space, and the list below
summarizes the most important ones:
1.
Context Buffer Initialization. EvoLLM uses random search to fill up the context
buffer as initial solutions and evaluations.
2.
Context Buffer Discretization and Augmentation. EvoLLM represents the
solutions as integers (i.e. remap the inputs and the tokens) and keeps track of the
[Figure 13.13 panel content: the highest-fitness evolved prompts for each target color, e.g. a long prompt repeating "red backgrounds" for red, "blue in water with purple background on bright light, fx-5-b-p-d-d-b-r-s" for blue, and "green grass on green green background: 2 leaves, 3D model in blender on a green green background on green | background" for green.]
Figure 13.13: Image generation results. (a) Performance aggregated (mean and std. err.) over nine runs (three seeds for each color for each method; normalized to [0, 1] based on the min and max fitness for each seed) shows that LMX substantially outperforms the alternatives, such as a one-point crossover. The zero-shot LLM baseline quickly stagnates, as it is unable to iteratively refine its initial solutions; even human random solutions eventually outperform it, as they have greater diversity. (b) The highest-fitness prompts and corresponding images of LMX for each color all include the word "background", but vary in length and detailed content, highlighting LMX's ability to discover diverse, non-obvious solutions. Figure from Meyerson, Nelson, Bradley, et al. (2024).
(a) Niches filled (b) QD scores (c) Validation rate
Figure 13.14: Sodarace results. The results are shown for varying numbers of parents in the LLM prompt and across LLM scale. (a) Number of niches filled in MAP-Elites. (b) Quality-Diversity scores (sum of the fitnesses of all niches in the map). (c) Validation rate (%) for the generated sodaracers. LMX generally benefits from more examples in its prompt, is able to produce reasonable variation, and often creates valid Sodarace mutations, highlighting its promise for evolving code. Figure from Meyerson, Nelson, Bradley, et al. (2024).
candidates and their fitness scores.
3.
Select & Sort Context Generations. In addition to the default way of picking the best-performing solutions seen so far, EvoLLM also considers selecting randomly from the buffer or selecting the most recent K generations evaluated on the problem (see prompt 2 in figure 13.16).
4.
Select & Sort Context Candidates. Similarly, besides the default option of taking the "best-within-generation" solutions, EvoLLM supports random selection and picking the "best-up-to-generation" options.
5.
Query LLM for Search Improvement. EvoLLM samples and constructs the
(a) Approach (b) Results
Figure 13.15: Overview of EvoLLM. (a) An overview of the EvoLLM procedure. An LLM suggests updates to the Evolution Strategies (ES) search distribution by working within a discretized search space and ranking solutions from worst to best based on performance. To manage context length as the number of dimensions increases, the search space can be divided into blocks, allowing for batch queries to the LLM. (b) Aggregated results from eight BBOB benchmark settings and three neuroevolution control tasks. Results are averaged over ten runs for BBOB and five runs for control problems. LLM-driven evolution strategies (green) consistently outperform traditional baselines (blue). Figure from Lange, Tian, and Tang (2024a).
Figure 13.16: EvoLLM Prompt Design Space & API. All solution evaluations and their
performance are tracked in a context buffer. This buffer is used to construct query prompts for
the LLM. After parsing the LLM output and performing sampling, the resulting population is
evaluated, and the new information is added to the buffer. Figure from Lange, Tian, and Tang
(2024a).
prompt repeatedly at each generation. When the generated solution fails to improve the fitness, EvoLLM uses a backup strategy and samples around the previous best evaluated solution.
6.
Sample & Evaluate New Candidate. EvoLLM samples around the proposed mean x_LLM, evaluates all the sampled candidates, and adds them to the context buffer.
7.
Scale to Larger Search Spaces. Once the context becomes too long, LLMs start to give non-informative outputs. To avoid this limitation when handling high-dimensional data, EvoLLM groups a set of dimensions that fits into the context of an LLM and performs multiple queries per generation. In the extreme case, each LLM call processes a single dimension d. This trade-off of increased inference time allows EvoLLM to scale to a larger number of search dimensions.
To evaluate EvoLLM, its performance was measured on four different tasks from the black-box optimization benchmark (BBOB; Hansen, Auger, Finck, et al., 2010), and compared with standard ES algorithms (figure 13.15b). The LLM-based ES outperformed random search and Gaussian hill climbing across different search dimensions and population sizes. On many of the considered tasks, EvoLLM was even capable of outperforming diagonal-covariance ES algorithms. Moreover, EvoLLM is more efficient in generating solutions, typically requiring fewer than ten generations.
EvoLLM’s design is generally applicable across different LLMs, as demonstrated
through experiments with Google's PaLM2 (Anil et al., 2023), OpenAI's GPT-4 (Achiam
et al., 2023), and the open-source Llama2 (Touvron et al., 2023). An interesting observation
is that the LLM model size inversely affects the performance of EvoLLM; larger models
tend to perform worse than smaller models. EvoLLM can also be applied to control tasks
such as CartPole-v1 and Acrobot-v1 from OpenAI's Gym (Brockman, Cheung, Pettersson, et al., 2016), where it is tasked with evolving 16 to 40 parameters of a feedforward neural controller. EvoLLM was able to evolve control policies that solve both tasks, even outperforming competitive baselines with smaller compute budgets.
The promising results from the evaluation of EvoLLM further underscore the potential
of using language models as components within evolutionary systems. While much of this
research remains exploratory, a growing number of works are beginning to demonstrate
tangible impact in real-world settings, and we will introduce one such example in the next
section.
13.3.4 AlphaEvolve
LLMs have a remarkable ability to generate syntactically correct and semantically
meaningful code, enabling applications in program synthesis, code completion, and
automated debugging. Beyond code generation, as was already discussed, LLMs can also
serve as optimizers in an evolutionary loop, proposing structured variations and adapting
based on feedback. AlphaEvolve (Novikov, Vũ, Eisenberger, et al., 2025) built on this
insight by treating the LLM not just as a generator of programs, but as a mutation operator
capable of refining solutions through iterative search. Given a user-defined problem and
an evaluation function, AlphaEvolve evolves programs that improve over time, guided by
LLM-generated modifications and performance-based selection.
[Figure 13.17 panel content: the scientist/engineer supplies an initial program with components to evolve, evaluation code, a prompt template and configuration, and a choice of existing or custom LLMs; AlphaEvolve returns the best program. Its distributed controller loop, served by a prompt sampler, an LLMs ensemble, an evaluators pool, and a program database, is:
parent_program, inspirations = database.sample()
prompt = prompt_sampler.build(parent_program, inspirations)
diff = llm.generate(prompt)
child_program = apply_diff(parent_program, diff)
results = evaluator.execute(child_program)
database.add(child_program, results)]
Figure 13.17: Expanded view of the AlphaEvolve discovery process. The user provides an
initial program (with components to evolve marked), evaluation code, and optional configurations.
AlphaEvolve then initiates an evolutionary loop. The prompt sampler uses programs from the
program database to construct rich prompts. Given these prompts, the LLMs generate code
modifications (diffs), which are applied to create new programs. These are then scored by
evaluators, and promising solutions are registered back into the program database, driving the
iterative discovery of better and better programs. Figure from Novikov, Vũ, Eisenberger, et al.
(2025).
AlphaEvolve (figure 13.17) is implemented as an autonomous evolutionary system
in which LLMs propose new program variants, and an external evaluation function
determines their fitness. The system is organized as a distributed pipeline comprising an
asynchronous controller, prompt samplers, LLM-based generators, and parallel evaluators.
The evolution process begins with a user-defined task, specified through a Python-based
evaluation function that returns one or more scalar scores for a given program. AlphaEvolve
supports a wide range of problems, from simple mathematical objectives to performance-critical engineering tasks. To integrate with existing codebases, the system provides an
annotation API that allows users to mark specific blocks of code as targets for evolution.
These annotated blocks are then iteratively rewritten by the system while preserving the
surrounding structure for compatibility with the evaluation function (figure 13.18).
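As a hypothetical illustration of what a user supplies, the snippet below marks a block for evolution with comment annotations and exposes an evaluate function returning scalar scores; the marker syntax, the toy scheduling heuristic, and the scoring are assumptions for illustration, not AlphaEvolve's actual API.

from dataclasses import dataclass

@dataclass
class Machine:
    free_memory: float
    free_cpu: float

# EVOLVE-BLOCK-START   (hypothetical marker: only this region is rewritten by the system)
def schedule(job_mem, job_cpu, machines):
    # Initial heuristic: place the job on the machine with the most free memory.
    return max(range(len(machines)), key=lambda i: machines[i].free_memory)
# EVOLVE-BLOCK-END

def evaluate(workload):
    # Score the current schedule() on a workload; AlphaEvolve maximizes the returned scores.
    machines = [Machine(64.0, 32.0), Machine(64.0, 32.0)]
    for job_mem, job_cpu in workload:
        i = schedule(job_mem, job_cpu, machines)
        machines[i].free_memory -= job_mem
        machines[i].free_cpu -= job_cpu
    stranded = sum(m.free_memory for m in machines if m.free_cpu <= 0)
    return {"stranded_memory": -stranded}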
At each generation, AlphaEvolve constructs a prompt containing one or more existing
programs sampled from its archive. These prompts include natural language instructions,
past evaluation results, and optionally meta-level information such as performance trends
or alternative formatting. Prompts are passed to an ensemble of LLMs (Gemini 2.0
Flash and Pro), which return candidate modifications in either a structured diff format
or as complete code blocks if the amount of change is large. Using multiple models in
this manner makes it possible to balance high-throughput exploration and high-quality
Figure 13.18: Illustrative example of applying AlphaEvolve to evolving a supervised learning pipeline. All snippets are abbreviated, with ellipses (...) indicating skipped lines. (a) The user-provided file with blocks marked for evolution, and the special evaluate function that can be invoked to score the current version of the code. (b) Example of an assembled prompt to be provided to the LLMs. (c) Example output generated by the LLM. The proposed diffs in (c) will be applied to the "current program" shown in the prompt (b), and the resulting modified program will then be sent to the evaluators. The evaluators will invoke the evaluate function from (a) in order to obtain the scores of the newly proposed program. This approach makes it possible to harness the power of population-based search in a wide range of problems from simple mathematical objectives to performance-critical engineering tasks. Figure from Novikov, Vũ, Eisenberger, et al. (2025). Video at https://neuroevolutionbook.com/demos.
refinement. To promote both quality and diversity, the archive employs a hybrid of MAP-
Elites and island-based evolutionary strategies. This design encourages the preservation
of high-performing variants across distinct behavioral niches while also allowing isolated
exploration threads to develop independently.
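A minimal sketch of how such a hybrid archive could be organized is shown below; the class, its behavior-descriptor keys, and its sampling policy are assumptions for illustration, and the real AlphaEvolve program database is considerably more elaborate.

```python
import random

class HybridProgramArchive:
    """Sketch of an archive combining islands with a per-island MAP-Elites grid."""

    def __init__(self, num_islands=4):
        # Each island maps a behavior descriptor (e.g. a rounded feature tuple)
        # to the best-scoring program found so far for that niche.
        self.islands = [{} for _ in range(num_islands)]

    def add(self, island_idx, descriptor, program, score):
        # Keep only the elite (highest score) per behavioral niche.
        best = self.islands[island_idx].get(descriptor)
        if best is None or score > best[1]:
            self.islands[island_idx][descriptor] = (program, score)

    def sample(self, num_inspirations=2):
        # Pick a non-empty island, then a parent and a few "inspiration" elites,
        # so the prompt can show the LLM several diverse high-performing programs.
        island = random.choice([i for i in self.islands if i])
        elites = list(island.values())
        parent = max(elites, key=lambda e: e[1])
        inspirations = random.sample(elites, min(num_inspirations, len(elites)))
        return parent[0], [p for p, _ in inspirations]
```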
AlphaEvolve demonstrated remarkable versatility and impact across a wide range of
domains. It not only surpassed long-standing benchmarks in fundamental mathematics but
also delivered measurable improvements to real-world industrial systems. Its achievements
spanned four major areas:
Faster matrix multiplication algorithms: AlphaEvolve made significant progress
in finding lower-rank tensor decompositions for a wide range of matrix shapes.
Notably, it discovered a way to multiply two 4 × 4 matrices using only 48 scalar multiplications, beating the long-standing benchmark of 49 set by Strassen (1969).
Across 14 different matrix configurations, AlphaEvolve matched or outperformed
the best known results, often from decades of human research.
Solving open mathematical problems: AlphaEvolve was applied to over 50 open
problems across combinatorics, number theory, geometry, and analysis. In 75% of the cases, it rediscovered the best known constructions; in 20%, it improved upon
them, establishing new bounds or configurations. For example, it set a new record
for the 11-dimensional kissing number problem by constructing a packing of 593
spheres, one more than the previous best, and slightly improved bounds in problems
such as Erdős’s minimum overlap.
Optimizing data center scheduling: AlphaEvolve was deployed in Google’s
production data centers to evolve a better scheduling heuristic. The new heuristic
improves the allocation of jobs across machines by minimizing "stranded" resources, such as idle memory or CPU. The resulting policy, evolved from the existing system, was rolled out across Google's fleet and led to a consistent recovery of 0.7% of computing resources.
Accelerating ML infrastructure and hardware design: In the context of Gemini
model training, AlphaEvolve evolved tiling heuristics for matrix multiplication
kernels, achieving a 23% kernel speedup and reducing overall training time by 1%. It also optimized compiler-generated code for FlashAttention, resulting in a 32% improvement in kernel runtime and a 15% improvement in data preparation.
By embedding LLMs within an evolutionary framework, AlphaEvolve successfully
tackled challenges in both abstract domains (e.g. tensor decomposition and combinatorial
constructions) and real-world industrial systems (e.g. data center scheduling, hardware
circuit design, and kernel optimization). These results show that combining LLMs with neuroevolution can work in practice and deliver measurable gains. As LLMs become better at reasoning through problems and writing code, pairing them with evolutionary computation could open up exciting new possibilities for scientific breakthroughs, engineering solutions, and fields that have not yet been imagined.
13.4 Case Studies: NE-enhanced Generative AI for Game Level Generation
Generative AI is transforming how content is created in many areas. While current
generative models excel at producing text and 2D images, they are rapidly advancing
toward creating realistic environments, 3D assets, expansive landscapes, dynamic quests,
levels, visual effects, etc. Although much of the attention has been on LLMs, it is important
to note that not all generative AI relies on LLMs. Today’s procedural content generation
systems draw from a wide range of AI methods, including deep neural networks, various machine learning techniques, and neuroevolution (Liapis, Yannakakis, and Togelius, 2011; Togelius, Yannakakis, Stanley, et al., 2011). We have already seen examples in chapter 8.
These tools enable the generation of rich, varied, and original content across domains
such as games, art, music, and more. In the following case studies, we first take a look at how neuroevolution methods can be synergistically combined with generative AI methods such
as GANs and VAEs to produce functional video game levels. We then turn our attention
to their combination with LLMs.
13.4.1 MarioGAN
One powerful combination of generative AI and neuroevolution is latent variable evolution
(LVE) approaches (Bontrager, W. Lin, Togelius, et al., 2018; Bontrager, Roy, Togelius, et al., 2018). LVE is a technique that combines generative models and evolutionary
algorithms to generate images, levels, or other structured outputs that meet specific goals
or constraints. At its core, a generative model like a GAN, VAE, or diffusion model learns
to map vectors from a latent space (i.e. a compressed, abstract representation space) to
realistic data samples. Each point in the latent space corresponds to a potential output.
However, the mapping is not always intuitive: small changes in the latent vector can result
in large or subtle changes in the generated output, and most randomly sampled points
might not yield useful or goal-oriented results.
LVE addresses this by applying evolutionary algorithms, such as genetic algorithms or
CMA-ES, to search the latent space in a guided way. Instead of randomly sampling latent
vectors, the algorithm maintains a population of candidate vectors and iteratively improves
them based on a fitness function. This function measures how well the generated output
satisfies the desired criteria, such as functionality, aesthetics, novelty, or difficulty. LVE
has been applied to a variety of different domains, such as generating synthetic fingerprints
to fool fingerprint recognition systems (Bontrager, Roy, Togelius, et al., 2018), levels for
the video game Doom (Giacomello, Lanzi, and Loiacono, 2019), or levels for Super Mario
Bros (Volz, Schrum, J. Liu, et al., 2018).
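The core LVE loop can be sketched in a few lines. The sketch below assumes a pre-trained generator(z) that maps a latent vector to an artifact and a user-defined fitness(artifact) function to maximize; it uses the cma package for CMA-ES and is not the original MarioGAN code.

```python
import numpy as np
import cma  # pip install cma

def latent_variable_evolution(generator, fitness, latent_dim=32, sigma=0.5, iterations=50):
    """Search a generator's latent space with CMA-ES (minimal LVE sketch)."""
    es = cma.CMAEvolutionStrategy(np.zeros(latent_dim), sigma)
    for _ in range(iterations):
        latents = es.ask()                                    # candidate latent vectors
        losses = [-fitness(generator(np.array(z))) for z in latents]
        es.tell(latents, losses)                              # CMA-ES minimizes, hence the sign flip
    best_z = np.array(es.result.xbest)
    return generator(best_z), best_z
```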
Let's have a closer look at how the approach works to create Super Mario Bros (Nintendo, 1985) levels. A first step is to decide on a suitable level representation for
training. The authors used the Video Game Level Corpus (VGLC), where each tile type is represented by a symbol, such as X for ground, - for empty space, ? for a question block, or E for an enemy. These symbols were mapped to integers and then one-hot encoded for use
in the GAN. The generator outputs levels in this one-hot format, which are converted back
into tile grids and rendered in the Mario AI framework. For training, the original level
was cut into overlapping segments by sliding a
28 × 14
windowÐthe size of the visible
Mario screenÐacross it, which produced 173 training samples from just a single level
(Volz, Schrum, J. Liu, et al., 2018). This representation ensures that essential gameplay
elements such as ground, obstacles, enemies, and pipes are captured, though it simplifies
some distinctions, for example treating all enemies as Goombas.
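As a rough illustration of this preprocessing step, the following sketch one-hot encodes a symbolic level and cuts it into overlapping screen-sized windows; the tile vocabulary is reduced to the four symbols mentioned above and is therefore only a simplified stand-in for the full VGLC encoding.

```python
import numpy as np

# Simplified tile vocabulary (the full VGLC uses more symbols).
TILE_TO_ID = {'X': 0, '-': 1, '?': 2, 'E': 3}  # ground, empty, question block, enemy

def one_hot_level(rows):
    """Convert equal-length symbol strings into a (height, width, num_tiles) one-hot array."""
    ids = np.array([[TILE_TO_ID[ch] for ch in row] for row in rows])
    return np.eye(len(TILE_TO_ID))[ids]

def sliding_windows(rows, width=28):
    """Cut a full level into overlapping screen-sized segments by sliding a window
    one column at a time, as done for the MarioGAN training data."""
    total_width = len(rows[0])
    return [one_hot_level([row[i:i + width] for row in rows])
            for i in range(total_width - width + 1)]
```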
On this basis, a GAN was trained to map random latent vectors (32 dimensions)
Figure 13.19: Overview of the two-phase MarioGAN approach combining GAN training and latent vector evolution. In phase 1 (left), a GAN is trained in an unsupervised manner to generate Mario levels. In phase 2 (right), the search focuses on identifying latent vectors that produce levels exhibiting desired properties. The approach thus combines the power of generative models to learn from existing level examples with the ability of evolution to search that space efficiently. Figure from Volz, Schrum, J. Liu, et al. (2018). Video at https://neuroevolutionbook.com/demos.
to Mario level segments. Once trained, the generator acts as a genotype-to-phenotype
mapping: latent vectors define different candidate levels. To move beyond random
sampling, the search for interesting vectors was placed under evolutionary control using
CMA-ES. Fitness functions guided the optimization toward particular goals, which could
focus either on properties of the tile distribution or on how the levels actually played out
when tested by an artificial agent (Volz, Schrum, J. Liu, et al., 2018).
The results of this process can be divided into two categories. In representation-based
testing, levels were optimized for static properties, such as producing a specified proportion
of ground tiles. In agent-based testing, the champion A* Mario agent from the 2009 Mario
AI competition was used to evaluate whether levels were playable and how many jumps
were required to complete them. Impressively, in both settings, MarioGAN was able to
produce levels with the desired properties. Two examples are shown in figure 13.20, in
which the approach created level segments that (a) maximize and (b) minimize the number
of jumps, respectively. Overall, the MarioGAN approach is capable of generating a wide
range of levels that are both stylistically faithful and controllable through well-chosen
fitness functions.
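A representation-based fitness of this kind is straightforward to express; the target fraction below is an arbitrary illustrative value, not the setting used in the original experiments. Such a function can be plugged directly into the LVE sketch above as the fitness argument, whereas agent-based fitness would instead run an A* agent on the rendered level.

```python
def ground_tile_fitness(level_onehot, ground_channel=0, target_fraction=0.3):
    """Reward levels whose fraction of ground tiles is close to a target value."""
    fraction = float(level_onehot[..., ground_channel].mean())
    return -abs(fraction - target_fraction)   # 0 is best; more negative is worse
```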
LVE can also alleviate one of the significant challenges in interactive evolutionary
computation, which we already encountered in chapter 8. While systems such as Picbreeder
can eventually yield creative and rewarding outcomes, the initial stages are typically filled
with geometric forms that lack visual or semantic appeal. This makes it difficult for users
to provide meaningful feedback, often leading to disengagement or fatigue. We have seen
how automating the early stages of evolution can alleviate this issue and bypass the most
unproductive phases (section 8.5).
LVE offers an alternative to this staged strategy for interactive evolution by rethinking
the underlying representation (Bontrager, W. Lin, Togelius, et al., 2018). As mentioned
earlier, a pre-trained GAN is in essence a learned genotype-to-phenotype mapping.
The latent space of the GAN is used as the search space for evolution, meaning that
even randomly sampled genotypes produce outputs that resemble valid, domain-specific
artifacts. Because these images are already visually coherent from the outset, users can
engage meaningfully from the very first generation. This advance significantly reduces
Figure 13.20: Examples of MarioGAN-generated level segments. Shown are level segments optimized to maximize (a) and minimize (b) the number of jumps, respectively. Searching the latent space of a GAN through CMA-ES allows the algorithm to quickly find level segments satisfying the given objectives. Figure from Volz, Schrum, J. Liu, et al. (2018).
the burden of early evaluation and mitigates user fatigue. In contrast to Picbreeder’s need
for bootstrapping via novelty-based fitness or HCM, LVE leverages learned generative
priors to constrain and shape the evolutionary landscape, allowing interactive search to
begin in a space that is already rich with possibilities.
Similarly to what we have observed with the combination of LLMs and evolutionary
computation in the preceding sections, the synergy between GANs and neuroevolution is
also bidirectional. While LVE demonstrates how GANs can serve as powerful genotype-
to-phenotype maps for evolutionary search, evolutionary algorithms can in turn improve
the training of GANs themselves (Hemberg, Toutouh, Al-Dujaili, et al., 2021; Toutouh,
Hemberg, and O’Reilly, 2019). Training GANs often faces challenges such as instability
or mode collapse. These issues stem largely from a lack of diversity during training. To address them, evolutionary computation makes it possible to introduce diversity into GAN training
at different levels. For example, mutation diversity can be achieved by training multiple
copies of a generator with different objective functions and selecting the best. Population
diversity can be achieved through a distributed grid of GANs that evolve by exchanging
neighbors, selecting based on performance, and tuning hyperparameters. These approaches
illustrate how coevolutionary dynamics and evolutionary selection pressures can yield
GANs that produce more diverse outputs, and resist common training pathologies.
13.4.2 MarioGPT
The second case study details how LLMs can offer an alternative approach to the potentially
expensive searches within the latent space of neural networks. In the context of Mario
game levels, ideally, we would like to directly ask for levels with specific properties such
as difficulty, number of enemies, etc. However, while LLMs are powerful tools that
can draw on their natural language training to write stories, generate code, and answer
questions, can they also create functional video game levels? Unlike the text-based data
LLMs are typically trained on, game levels involve complex functional constraints and
spatial relationships across multiple dimensions, posing a very different kind of challenge
(Sudhakaran, González-Duque, Freiberger, et al., 2023; G. Todd, Earle, Nasir, et al., 2023;
Yannakakis and Togelius, 2018).
It turns out that a language model (in this case GPT-2) can indeed be fine-tuned on
tile-based level data to generate complete game levels from natural language prompts.
This framework, called MarioGPT (Sudhakaran, González-Duque, Freiberger, et al.,
2023), integrates LLMs with algorithms from neuroevolution to enable open-ended and
controllable content generation. MarioGPT departs from traditional procedural content
generation methods, which often struggle with controllability and diversity, by leveraging
the expressive capabilities of language models to condition level creation on high-level
descriptions such as "many pipes, no enemies, high elevation."
(a) Many pipes, many enemies, little blocks, low elevation. (b) No pipes, some enemies, many blocks, high elevation. (c) Many pipes, many enemies. (d) No pipes, no enemies, many blocks. (e) Prompt not in dataset: many pipes, no enemies, many blocks. (f) Failure case: many pipes, no enemies, some blocks.
Figure 13.21: Example levels generated by MarioGPT. MarioGPT can successfully generate levels aligned with the text prompt in most cases (a)-(e). For instance, levels vary in pipe count, enemies, and block distribution according to the description. Failure cases are rare, such as in (f), where enemies are still generated despite being excluded in the prompt. Figure from Sudhakaran, González-Duque, Freiberger, et al. (2023). Video of an agent playing a generated level at https://neuroevolutionbook.com/demos.
To generate levels, MarioGPT encodes level data as sequences of tokens, and uses
cross-attention to incorporate prompt information encoded by a frozen BART model. This
setup allowed users to control specific features of the generated levels through natural
language, bypassing the need to search a latent space for desirable content (figure 13.21).
The resulting levels were not only structurally varied but also often playable: about 88%
of them could be completed by an automated A* agent, suggesting that the model captures
both aesthetic and functional aspects of game design.
MarioGPT was also able to generalize to text prompts that were not explicitly represented in the training dataset. For example, figure 13.21e illustrates a successful generation for the prompt "many pipes, no enemies, many blocks," with only a minor deviation (i.e. the level contains four pipes instead of the expected five). However, this ability to extrapolate was not always reliable, and some failure cases did exist. For example, in figure 13.21f, given the prompt "many pipes, no enemies, some blocks," the model correctly matched the number of pipes and blocks but mistakenly included enemies.
In procedural content generation, it is crucial not only to create levels with varied
physical layouts but also to design ones that inspire diverse player behaviors. For Mario
Figure 13.22: Novelty search framework with MarioGPT-based mutation operators. A level
is selected from the archive of top elites and undergoes mutation. If the resulting level exhibits
sufficient novelty, it is added back to the archive. The mutation process consists of two steps: (1)
a random segment of the level is replaced with a new sample generated by MarioGPT, using a
randomly selected prompt; (2) the surrounding border region is inpainted using MarioBERT to
ensure path continuity and playability. Figure from Sudhakaran, González-Duque, Freiberger, et al.
(2023).
level generation specifically, this means emphasizing multiple viable paths that players
can take to complete a level. Achieving this variety poses a significant challenge for many
algorithms and often relies on external agents for proper evaluation.
To enable MarioGPT to discover a large diversity of levels that require different player
paths, it was combined with novelty search and LLMs as mutation operators (figure 13.22).
During evolution, elite levels were selected and mutated by replacing random sections
with new samples generated from random prompts. To maintain level consistency and
playability, a second model, MarioBERT, performed inpainting at the borders of the
mutated segments. Novelty was evaluated based on predicted player trajectories, using the
differences in paths as behavioral descriptors. Only levels that introduced sufficient novelty
relative to the archive were retained, driving the system toward increasing diversity over
generations. This way, NS-MarioGPT was able to discover many different levels with
distinct player path patterns.
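One iteration of this novelty-search loop might look roughly as follows. The callables mutate_segment (MarioGPT resampling a random slice under a random prompt), inpaint_borders (MarioBERT repairing the seams), and predict_path are hypothetical stand-ins for the models described above, and the behavior descriptors are assumed to be fixed-length arrays.

```python
import random
import numpy as np

def novelty(descriptor, archive_descriptors, k=5):
    """Novelty = mean distance to the k nearest behavior descriptors (player paths)."""
    if not archive_descriptors:
        return float('inf')
    dists = sorted(np.linalg.norm(descriptor - d) for d in archive_descriptors)
    return float(np.mean(dists[:k]))

def ns_mariogpt_step(archive, prompts, mutate_segment, inpaint_borders,
                     predict_path, threshold=1.0):
    """One sketch iteration of NS-MarioGPT (hypothetical model interfaces)."""
    level, _ = random.choice(archive)                      # pick an elite level
    child = mutate_segment(level, random.choice(prompts))  # MarioGPT rewrites a random slice
    child = inpaint_borders(child)                         # MarioBERT keeps the level traversable
    descriptor = predict_path(child)                       # behavior = predicted player path
    if novelty(descriptor, [d for _, d in archive]) > threshold:
        archive.append((child, descriptor))                # retain only sufficiently novel levels
    return archive
```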
This combination of large language models and novelty search illustrates a powerful
synergy between generative AI and neuroevolution. Rather than optimizing for a specific
fitness function, the system prioritizes exploration and diversity, embodying the principles
of open-endedness (chapter 9). MarioGPT demonstrates how pretrained language models
can serve as generative engines in evolutionary frameworks, expanding the frontier of
content creation without manual tuning or expensive evaluation functions. It also highlights
the potential for future work where language, learning, and evolution converge, particularly
in domains that benefit from both control and creativity.
13.5 World Models
Deep learning models, in particular deep generative models, are effective tools for learning representations from vast amounts of training data. As we have seen in the preceding case studies, such models can generate data that resembles the distribution of the real data they were trained on, and they can be primed with relatively low-dimensional latent vectors to produce rich and expressive outputs.
Given the expressiveness of deep generative models, one can attempt to use them to learn about the environment an artificial agent interacts with. We call a generative model of the agent's environment a "world model" because, like our own internal mental model of the world, an agent can incorporate such a model into its own decision-making process. World models are thus another synergistic way to combine neuroevolution with generative AI.
In this section, we describe methods and approaches that combine such generative world models with evolutionary computation. In particular, we explore an approach that uses deep learning to train a world model of an agent's environment, and neuroevolution to train an agent controller (Ha and Schmidhuber, 2018). This work laid the foundation for much follow-up research in this area. An extension to modern generative AI models is still largely unexplored, but it is a compelling and logical direction for future work.
13.5.1 A Simple World Model for Agents
The agent's neural model (figure 13.23), inspired by our own cognitive system, has a
visual sensory component that compresses what it sees into a small representative code.
It also has a memory component that makes predictions about future codes based on
historical information. Finally, the agent has a decision-making component that decides
what actions to take based only on the representations created by its vision and memory
components. We have already encountered a similar architecture in section 7.1.2, where
we were interested in agents learning to predict what is important for their survival. The
world model idea, which we explore in this section, is to explicitly encourage a model to
predict what will happen next. As we will see later, this ability even allows us to train
an agent entirely within a hallucinated dream created by its own world model, and then
transfer the resulting policy back into the real environment.
The environment provides the agent with a high-dimensional input observation at each
time step. This input is usually a 2D image frame that is part of a video sequence. The role
of the V model is to learn an abstract, compressed representation of each observed input
frame. Here, a variational autoencoder (VAE) (Kingma and Welling, 2014) is used as the
V model. As shown in figure 13.24, this VAE model can compress an image frame into a
low-dimensional vector z. This compressed representation can be used to reconstruct the original image. In our experiments, this latent vector has 16 dimensions and is used to represent the spatial part of the agent's environment.
Figure 13.23: World model architecture. The agent consists of three components that work
closely together: Vision (V), memory (M), and controller (C). The world model components V
and M can be trained efficiently in an unsupervised manner through gradient descent to capture
compressed spatial and temporal representations of the environment. Leveraging these learned
features, a compact and simple controller can then be evolved to solve the target task. Thus, this
world model combines both neuroevolution and a generative world model in a synergistic way.
Interactive demo link at https://neuroevolutionbook.com/demos.
Figure 13.24: Variational Autoencoder. Example of a VAE trained on screenshots of VizDoom.
High-dimensional input frames are compressed into a low-dimensional latent vector z, which
captures the essential spatial features. The decoder reconstructs the input from z, enabling efficient
representation learning for downstream tasks.
While it is the role of the V model to compress what the agent sees at each time frame,
it is also useful to compress what happens over time. For this purpose, the role of the M
model is to predict the future. The M model serves as a predictive model of the future z vectors that V is expected to produce. A simple RNN can be trained to predict the next latent vector z given the current and past information available to it. Given the predictive power of recurrent neural networks, the RNN's internal hidden state vector h can be used to represent the temporal part of the environment; it can also be considered the internal state of our agent, encapsulating the agent's memory. To train both V and M, data is initially gathered from the agent's environment using a random policy, collecting around 10,000 example rollouts.
The controller (C) model is responsible for determining the actions to take in order to
maximize the expected cumulative reward of the agent during a rollout of the environment.
C can be deliberately made as simple and small as possible, and trained separately from V and M, so that most of the agent's complexity resides in the world model (V and M). The simplest C is a single-layer linear model that maps h_t and z_t directly to action a_t at each time step t. Figure 13.25 is a flow diagram illustrating how V, M, and C interact with the environment.
Figure 13.25: Flow diagram of the world model agent. The raw observation is first processed by V at each time step t to produce z_t. The input into C is this latent vector z_t concatenated with M's hidden state h_t at each time step. C will then output an action vector a_t for motor control. M will then take the current z_t and action a_t as input to update its own hidden state, producing h_{t+1} to be used at time t + 1.
This minimal design for C also offers important practical benefits. Advances in deep
learning provided us with the tools to train large, sophisticated models efficiently, provided
we can define a well-behaved, differentiable loss function. The V and M models are
designed to be trained efficiently with the backpropagation algorithm using modern GPU
accelerators, so we would like most of the model’s complexity and model parameters
to reside in V and M. The number of parameters of C, a linear model, is minimal in
comparison. This choice allows us to use very flexible evolutionary algorithms to train
C to tackle more challenging RL tasks where the credit assignment problem is difficult.
Thus, the parameters of C can be efficiently optimized with CMA-ES, which works well
for solution spaces of up to a few thousand parameters.
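As a minimal sketch of this division of labor, the controller below is a single linear map from the concatenated [z, h] vector to the three CarRacing actions, evolved with CMA-ES via the cma package; the hidden-state size is an illustrative assumption, and rollout_return is a user-supplied function that runs the agent for one rollout and returns its cumulative reward.

```python
import numpy as np
import cma

Z_DIM, H_DIM, A_DIM = 16, 64, 3   # latent size (as in the text), RNN hidden size (assumed), actions

def controller_action(params, z, h):
    """Linear controller C: a = tanh(W [z; h] + b), parameters flattened into one vector."""
    n_w = A_DIM * (Z_DIM + H_DIM)
    W = params[:n_w].reshape(A_DIM, Z_DIM + H_DIM)
    b = params[n_w:]
    return np.tanh(W @ np.concatenate([z, h]) + b)

def evolve_controller(rollout_return, iterations=100):
    """Evolve C's parameters with CMA-ES; rollout_return(params) evaluates one rollout."""
    n_params = A_DIM * (Z_DIM + H_DIM) + A_DIM
    es = cma.CMAEvolutionStrategy(np.zeros(n_params), 0.5)
    for _ in range(iterations):
        solutions = es.ask()
        es.tell(solutions, [-rollout_return(np.array(p)) for p in solutions])
    return np.array(es.result.xbest)
```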
13.5.2 Using the World Model for Feature Extraction
A world model contains much useful internal latent information that the agent can leverage as features extracted from the environment. These features can even serve as the sole basis for the agent's decision-making process, bypassing the direct use of the actual observations from the environment. Let's have a more detailed look at how this
approach works, using the CarRacing task (sections 4.4.3 and 7.1.2) as an example.
As a reminder, CarRacing is a top-down car racing environment, where the agent
has to learn to drive from pixel observations alone. While it is possible to feed the
high-dimensional input into a large policy network trained to output an action, such an
approach can be difficult to scale for more complex domains or requires additional methods
to protect innovation (section 7.1.2). By using a world model, one can considerably limit
the size and complexity of the policy network. In fact, the VAE-based vision model can be quickly trained to compress an entire input frame into a 16-dimensional latent vector z, which is expressive enough to reconstruct the image with sufficient fidelity for the driving task.
By using the vision model (V) alone, without even using the memory model (M), one can train a small linear network with 17 parameters per output (the 16 latent dimensions plus a bias) to compute the action vector (brake, gas, and steer), which required evolving only 51 parameters in total for this simple linear model. The resulting model achieved an average score of 632 ± 251 over 100 trials. While the resulting policy makes the car drive a bit wobbly, due to the simplicity of the linear model and the lack of predictive power when using the vision model alone, it generally manages to complete most tracks.
We can further increase the performance of the vision-only model by moving from a simple linear controller to one with a hidden layer, which results in a score of 788 ± 141 over 100 trials. To give the approach even more flexibility, we can also evolve the controller network with NEAT. NEAT here is allowed to use a variety of different activation functions such as sinusoids, step functions, and ReLUs (similarly to what we have seen when NEAT is used to evolve CPPNs in section 4.3.1). Figure 13.26 shows the best NEAT network for the agent controller, which achieves an impressive average score of 893 ± 74 over 100 trials.
Instead of further increasing the complexity of the controller, another interesting
question is how far we can improve the performance of a simple linear-only controller
by incorporating the memory model (M) into the agent’s world model. While the vision
model has no predictive power and only contains static features representing the spatial properties of the agent's environment, the memory model can predict part of the future state of the agent. Indeed, by concatenating the latent vector z from the vision model and the hidden state h of the predictive recurrent neural network model, our linear-only controller achieved the very best performance, resulting in an average score of 906 ± 21 over 100 trials. In 2018, this model was the first solution to solve the CarRacing task,
which required an average score above 900.
[Figure 13.26 shows the evolved network: the 16 latent inputs z1-z16 plus a bias feed a NEAT-evolved topology of hidden nodes with mixed activation functions (tanh, sigmoid, ReLU, sine, Gaussian, step, absolute value, inverse, linear), which produce the Brake, Gas, and Steer outputs.]
Figure 13.26: Combining a vision-only model with NEAT. Because NEAT is able to evolve the
network’s weights together with an increasingly complex neural architecture, it was able to evolve
a high-performing controller for CarRacing, which only uses the latent vector z of the vision model V to output the action.
13.5.3 Training an Agent Inside Its Own World Model
So far, we have demonstrated the usefulness of a world model for extracting important features that inform the agent about its environment, in particular spatiotemporal features produced by the vision and memory components of the world model.
But a world model is far more useful than a mere feature extractor. If we were interested in feature extraction alone, there might be more direct ways of training neural networks for that purpose. The key capability of a generative world model is the ability to generate and simulate the actual environment in latent space, much like running a quick simulation in our minds. For instance, the memory component of our world model, the recurrent neural network, is able to simulate approximate future trajectories of the
environment from the data the agent has collected.
The agent can even act inside this neural-network-simulated environment and observe hypothetical responses, learning from the consequences of its actions without actually performing them in reality. This ability was demonstrated in an experiment in the DoomTakeCover environment. We have already encountered this particular task in the context of the AttentionAgent (section 4.4.3) and the deep innovation approach (section 7.1.2). As a reminder, here the agent has to learn to avoid fireballs shot by
monsters from the other side of the room. The cumulative reward is the number of time
steps the agent manages to stay alive during a rollout. Each rollout of the environment
runs for a maximum of 2,100 time steps (roughly a minute of actual gameplay), and the
task is considered solved if the average survival time over 100 consecutive trials is greater
than 750 time steps of gameplay.
To train the world model, as in the CarRacing experiment, the agent explored the environment using a random policy and recorded trajectories over thousands of random gameplays. Once the world model components were trained, the agent was able to produce simulated gameplays in latent space, using the RNN module alone.
The recurrent neural network was trained to produce not a deterministic prediction of the next latent state of the world, but a probability distribution from which future latent states can be sampled. This distribution can be made artificially wider or narrower using a temperature parameter τ. This allows us to bias the distribution toward always outputting the mode, or toward producing outputs with more uncertainty, a feature that is quite important for training an agent entirely inside the world model. Table 13.4 displays the results when CMA-ES was used to train a controller to perform well inside the world model, and how the learned policies transfer to the actual environment.
Table 13.4: DoomTakeCover scores at various temperature settings.
Temperature 𝜏 Virtual Score Actual Score
0.10 2086 ± 140 193 ± 58
0.50 2060 ± 277 196 ± 50
1.00 1145 ± 690 868 ± 511
1.15 918 ± 546 1092 ± 556
1.30 732 ± 269 753 ± 139
Random Policy N/A 210 ± 108
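The temperature mechanism described above can be illustrated with a generic mixture-density sketch: the mixture logits are divided by τ and the component noise is widened with τ, so low temperatures collapse toward the most likely mode while high temperatures make the dreamed dynamics harder to exploit. This is an illustrative formulation, not the exact published model.

```python
import numpy as np

def sample_next_z(logits, means, log_stds, tau=1.0, rng=None):
    """Sample the next latent vector from a Gaussian-mixture prediction with temperature tau.

    logits: (K,) unnormalized mixture weights; means, log_stds: (K, z_dim)."""
    rng = rng or np.random.default_rng()
    scaled = logits / tau                         # lower tau sharpens the mixture weights
    weights = np.exp(scaled - scaled.max())
    weights /= weights.sum()
    k = rng.choice(len(weights), p=weights)       # pick a mixture component
    std = np.exp(log_stds[k]) * np.sqrt(tau)      # wider output noise for larger tau
    return means[k] + std * rng.normal(size=means[k].shape)
```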
We note that in the deterministic model (low temperature), the agent could easily find
faults in its model of the world, and exploit them so that the learned policy would only
do well in its dream, but not in reality. In contrast, as the uncertainty of the model was
increased, this made the virtual environment generated by the agent’s world model much
more difficult to beat, leading to policies that were transferable to the actual environment.
Varying the temperature during generation is just one of several possibilities for addressing the transfer problem between performing a task inside a learned world model and performing it in the actual world.
To conclude this chapter, we have seen how the synergy between generative AI and
neuroevolution enables hybrid systems that blend creativity with optimization. Whether
through prompt evolution, model merging, or intelligent mutation strategies, neuroevolution
has proven to be a powerful approach for enhancing the capabilities of large models. There
are great opportunities to extend this concept further and incorporate the many other
neuroevolution approaches we have encountered in this book. We invite the reader to
explore these limitless possibilities.
Beyond its utility in a hybridized approach, neuroevolution also offers something
deeper: a framework for understanding the very nature of evolution and intelligence. In the
next chapter, we turn our attention from what neuroevolution can do to what it can tell us
about biological evolution, and how intelligent behavior might arise through evolutionary
processes.
13.6 Chapter Review Questions
1. Large Language Models: What role does the transformer architecture and self-attention mechanism play in the performance and scalability of LLMs like GPT?
2. Promptbreeder: What is the self-referential mechanism in Promptbreeder? How does it differ from EvoPrompt in optimizing task-specific prompts for LLMs?
3. Performance of EvoPrompt: How did EvoPrompt improve performance on challenging tasks like the Big Bench Hard (BBH) benchmark? What are the key contributions of the evolutionary algorithm?
4. Evolutionary Model Merging: What are the key differences between merging models in data flow space and parameter space? How does evolutionary model merging generate new composite models with emergent capabilities?
5. LLMs in Genetic Programming: How are LLMs utilized in enhancing genetic programming through "diff-based mutation"? What advantages do these mutations offer over traditional random or deterministic approaches?
6. LMX Generality: Explain how LMX demonstrates its versatility across domains such as symbolic regression, text style transfer, and code evolution. What common characteristic of LLMs enables this adaptability?
7. EvoLLM as Evolutionary Strategies: How does EvoLLM reconceptualize the role of LLMs in evolutionary strategies compared to traditional ES methods? In what ways does involving LLMs directly in the evolutionary cycle change the dynamics of optimization?
8. MarioGAN vs. MarioGPT: How do MarioGAN and MarioGPT differ in their approaches to controllable level generation? What trade-offs emerge between optimization efficiency, controllability, and diversity in these two frameworks?
9. World Models: What are the roles of the vision (V), memory (M), and controller (C) components in world models? How do these components collectively allow agents to act effectively in simulated environments?
10. Simulated Learning with World Models: How do world models enable agents to train within a neural simulator of reality, as demonstrated in the DoomTakeCover environment? How does adjusting the temperature parameter influence policy transfer to the actual environment?
Chapter 14
What Neuroevolution Can Tell Us
About Biological Evolution?
In previous chapters, several examples were given of using neuroevolution to discover
behavior for intelligent agents. The goal was to construct artificial agents that could
perform complex tasks to aid humans, potentially in virtual worlds, household robots,
autonomous vehicles, etc. However, the approach can also be useful in the other direction,
i.e. in using neuroevolution to understand biological intelligence (Miikkulainen, 2025).
Why do certain neural structures exist in the brain, i.e. what do they do and how did they come about? How do genetic and environmental influences combine to construct an individual? What are the stepping stones in the evolution of intelligent behavior? How do behaviors such as herding, hunting, and communication emerge? This chapter will review progress towards answering these questions and identify further opportunities in this area.
14.1 Understanding Neural Structure
Neuroscience aims to understand how the brain produces behavior. The neural structures
in the brain are highly organized into nuclei, or collections of neurons, and pathways
between them, and the goal is to identify what functions they each perform individually
and through interactions. Single-cell recordings have been used for a long time to uncover
such function at a low level, for instance identifying cells that respond to a particular
location in the visual field, and a line of a particular orientation and direction of movement
in it (Hubel and Wiesel, 1968). More recently, several broader imaging techniques have
been developed to look at larger areas of the brain at once: voltage-sensitive dye imaging
can visualize entire maps, diffusion tensor imaging entire pathways, and EEG, MEG, and fMRI even the entire brain at once (Chemla and Chavane, 2010; Lenartowicz and Poldrack, 2010; Meoded, Poretti, Mori, et al., 2016). Sensory and motor functions are already understood relatively well, and much progress is being made in delineating higher functions such as reasoning and language.
However, one important perspective that is often missing in such inquiries is that
the structures are a product of evolution. Part of what we observe today may not be
explained simply as serving a function in some optimal sense. Some of the structure is
there because evolution needed to discover it: It may not be optimal or necessary, but
is instead a remnant of evolutionary stepping stones. Humans still have tailbones even
though we no longer have tails. Speech organs look the way they do because they evolved
from mastication elements (MacNeilage, 1998). Similarly, in order to understand brain
structures and behavior fully, it may be necessary to understand their evolutionary origins.
Although the brain microstructure varies between individuals, the high-level organization is remarkably consistent between individuals and between species. Evolution
has come up with a successful solution and has created many variations of it that occupy
multiple niches in the world. A possible approach to understanding the brain is to create
artificial worlds, place artificial agents in them to face various challenges, and evolve
their brains to construct behaviors that allow them to survive and be successful. By
manipulating the environment, it may be possible to determine what structures are likely
to evolve and why. To the extent that they match those observed in biology, it may be
possible to gain insight into biology.
For instance, in one such grid-world simulation, an agent first needed to navigate to a
zone where food items are located, while avoiding poison obstacles, and then to remain
in that zone and forage (figure 14.1; Aharonov-Barki, Beker, and Ruppin, 2001; Ruppin,
2002). The agents were controlled by a fully recurrent binary neural network with five
sensory, four motor, and six to 41 hidden neurons. After successful behavior had evolved,
the hidden neurons were analyzed through conventional neuroscience methods of lesioning
and receptive field analysis. Remarkably, the successful networks had evolved a command
neuron (or a few) that essentially switched the network between the navigation and foraging
behaviors. The network starts in navigation mode, but as soon as the agent consumes a food item, the command neuron switches it into foraging. Such command neurons emerged in
evolution because they resulted in higher fitness: Individuals that were able to separate
the navigation and foraging behaviors found the food zone faster, avoided poison better,
and were able to forage more efficiently than those that mixed the two behaviors.
Interestingly, command neurons are found in many biological systems as well,
including aplysia, crayfish, and even lobsters and crabs (Combes, Meyrand, and Simmers,
1999; DiCaprio, 1990; Edwards, Heitler, and Krasne, 1999; Teyke, K. R. Weiss, and
Kupfermann, 1990). They generally switch motor behaviors on and off based on sensory
input, similar to the command neurons that were evolved in the simulation. Thus, the simulation demonstrates computationally not only how such a network implements effective behaviors, but also how it can arise in evolution as a solution to a computational need.
Beyond the single-neuron lesion and receptive field analysis, the full access that
computational networks provide makes it possible to analyze the solutions in more detail.
For instance, multiple small perturbations to the network’s neurons or connections can
be introduced, and the contribution of each of these elements quantified by estimating its Shapley value (a game-theoretic measure of contribution to a collaboration; Keinan, Sandbank, Hilgetag, et al., 2006). Such an analysis makes it possible to identify the role of each element in constructing a function, and it also makes it possible to prune the network by removing elements that do not contribute significantly. Although developed for analyzing evolved artificial networks, the technique could in principle be adapted to neuroscience, for instance based on multiple lesions, or on perturbations caused by TMS (transcranial magnetic stimulation).
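As a rough illustration of how such a multi-perturbation analysis can be implemented, the sketch below estimates each element's Shapley value by averaging its marginal contribution over randomly sampled lesion orders; performance(subset) is an assumed user-provided function that evaluates the network with only the given elements intact.

```python
import random

def shapley_estimates(elements, performance, num_permutations=200):
    """Monte Carlo estimate of each element's Shapley value (illustrative sketch)."""
    contrib = {e: 0.0 for e in elements}
    for _ in range(num_permutations):
        order = random.sample(elements, len(elements))   # a random order of adding elements
        intact = set()
        prev = performance(frozenset(intact))            # baseline: everything lesioned
        for e in order:
            intact.add(e)
            score = performance(frozenset(intact))
            contrib[e] += score - prev                   # marginal contribution of adding e
            prev = score
    return {e: total / num_permutations for e, total in contrib.items()}
```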
Figure 14.1: Evolution of command neurons in a navigation and foraging task. In the simulated grid world, there are a number of poison and food items. The agent needs to first navigate to the 10 × 11 bottom left area where the food items are, eat as many of them as possible, and avoid poison items at all times. The agent's behavior was controlled by neural networks that were evolved through genetic algorithms over time. Some of the evolved interneurons act as command neurons, switching the behavior from navigation to foraging as soon as the first food item is consumed. Similar command neurons have been observed in biology; the experiment demonstrates how they may arise as an advantage in evolving effective behavior in the domain. Figure from Ruppin (2002).
Neuroevolution simulations can be useful in evaluating hypotheses about the function
of specific circuits. For instance, facilitating synapses (Markram, Y. Wang, and Tsodyks,
1998) have been observed to activate postsynaptic neurons not only based on current
input but also based on a rate of activation change in the past. Most likely, they play a
role in processing temporal sequences, but they may also be useful in compensating for
propagation delays (Kwon and Choe, 2009; H. Lim and Choe, 2006). Although such
delays are not taken into account in abstract neural networks, in biological networks, delays
are an important factor. Information from the sensors takes time to propagate to neurons
that react to it, and proper responses to, e.g., a moving object require compensating for
these delays. With neuroevolution, it is possible to construct facilitating synapses that
play this role, resulting in more accurate performance in tasks such as pole balancing with
synaptic delays. Such compensation amounts to rudimentary prediction, and suggests
that coping with synaptic delays may be a foundation for predictive mechanisms, which
have been proposed to underlie much of cognitive processing (Hawkins and Ahmad, 2016;
Hawkins and Blakeslee, 2004).
Neuroevolution simulations can also be used to target specific biological behaviors.
For instance, such experiments have been useful in understanding locomotion circuits
in animals (Beer, Chiel, and Gallagher, 1999; Chiel, Beer, and Gallagher, 1999). Such
circuits are often called CPGs, or central pattern generators, because they provide a cyclical
activity pattern that can be used to control the gait through multiple muscles (Buzsáki,
2006; Steuer and Guertin, 2019). Such networks are relatively small, consisting of three
to five neurons in a continuous-time recurrent neural network (CTRNN). However, they
generate complex dynamics that also change over time. The simulations made it possible
to characterize such dynamics mathematically and experimentally, and demonstrate how
such neural systems can be composed of multi-stable dynamic building blocks. In some
cases, it was possible to assign functional roles to these blocks; in others, they remained
opaque as supporting interneurons.
These mathematical characterizations of CPGs were expanded into simulations of
actual locomotion in lampreys and salamanders, both in swimming and walking (Ijspeert,
2008; Ijspeert, Crespi, Ryczko, et al., 2007). The evolved networks coordinate the
oscillatory patterns of the CPGs as inputs to the two legs on each side of the body, resulting
in motions required for effective propulsion. Remarkably, such evolved controllers resulted in more robust patterns and more flexible control than a model that was built by hand. Also, the oscillation patterns and the connectivity structures were closer to those observed in biology, again demonstrating how the biological structures may arise from evolutionary pressure to perform well with respect to a behavioral challenge in a physical environment. Moreover, the same circuit can control both swimming and walking, as well as transitions between them, potentially demonstrating a crucial phase in the vertebrate transition from aquatic to terrestrial life.
Beyond pattern-generator circuits, a more general question concerns network building
blocks. Evolved neural networks often include identifiable motifs, i.e. patterns of
connectivity that occur more frequently than they would in randomly generated networks
(Kashtan and Alon, 2005; Kashtan, Itzkovitz, Milo, et al., 2004). It turns out that these
same motifs can also be found in biological networks. Thus, computational simulations
can then be used to identify what function they may perform. For instance, the feedforward
loop motif can be used to filter information, generate pulses, and increase responses,
and the single-input motif can generate time-varying gene expressions. Evolved neural
networks can then demonstrate how behavior is composed of such building blocks, for
instance uncovering spatial specialization in a visual pattern recognition circuit.
Beyond understanding motif function, neuroevolution can be used to illustrate how
motifs, and more generally modules, emerge. It turns out that if the network is evolved simply to solve one task, such modules are unlikely to arise. However, if the environment requires solving multiple goals composed of different combinations of subgoals, and the goals
change over time, modular network structure and motifs do arise. In this manner, evolution
finds modularity as an effective way to discover subfunctions that can be used to construct
multiple behaviors. Indeed, the modular structure of the brain supports this hypothesis:
many areas of the brain participate in many tasks in different combinations. Even
the visual areas are used in some language tasks and vice versa, suggesting that their
computational function is more general than just one modality. Neuroevolution studies
can thus demonstrate this general principle as a solution arising from the complexity of
tasks the animal has to solve.
Because neuroevolution is an optimization method, it can also be used in a different
role in understanding neural structure: instead of evaluating the evolutionary origins of neural structures, it can be used to optimize the parameters of their models. Biophysical models are created with objectives and
constraints derived from experimental data. They often contain parameters that are
difficult to set correctly to match the data, but can provide insights into the biological
structures and processes. Neuroevolution can be effective in this role: It has been used
for instance in optimizing the spiking patterns of the Izhikevich model of hippocampal
neurons (Venkadesh, Komendantov, Listopad, et al., 2018) and fitting multicompartmental
models to multilocation patch-clamp and microelectrode array data (Buccino, Damart,
Bartram, et al., 2024; Druckmann, Banitt, Gidon, et al., 2007). Interestingly, as discussed
in section 11.5, neural network implementations in hardware often utilize spiking neural
networks to reduce energy consumption; it has turned out useful to optimize their structure
and hyperparameters through evolution (Iranmehr, Shouraki, Faraji, et al., 2019; Schuman,
Patton, Kulkarni, et al., 2022). Neuroevolution can thus realize the potential of such
biologically more accurate models, suggesting how behavior can arise from the biophysical
properties expressed in their parameters.
Neuroevolution simulations can also be used to explore other hypotheses about the
development of modularity and organization. One such hypothesis is to minimize the total
wiring length, as will be discussed next.
14.2 Evolutionary Origins of Modularity
Given that the primary role of the brain is to process information, it is natural to try
to explain its entire structure and function in computational terms. However, it is
sometimes useful to recognize that the brain is also a physical organ, and there are physical
requirements that must be met. For instance, some of the brain structure may be due to
the need to maintain efficient metabolism, i.e. to bring oxygen and nutrients to the cells,
including the vascular structure and the blood-brain barrier. While bigger brains in general
are more powerful, the size of the brain is limited by the birth canal. Some of the growth
mechanisms after birth may exist to compensate for it, rather than be driven entirely by
the need to construct an efficient information processing system. Similarly, the overall
organization, with gray matter on the outside and white matter on the inside, and the highly
convoluted surface with gray matter, amounts to an efficient use of the available space.
The need to minimize wiring length is an important principle that may have affected the evolution of brain structure more generally (Horvát, Gămănuț, Ercsey-Ravasz, et al., 2016; Sporns and Betzel, 2016). In particular, it may be the evolutionary origin of modularity.
This is an interesting possibility because modularity is also a powerful functional principle.
While a tightly connected system may in principle provide more complex functionality, it
is more difficult to construct, maintain, and adapt a system where everything depends on
everything else. For instance in engineering, modular structures are often used because
they make such processes easier. For these same reasons, evolution may have favored
modular designs as well.
However, such pressures are relatively weak compared to performance alone, and it has been difficult to demonstrate this theory biologically and computationally. In contrast,
it turns out to be possible to demonstrate that minimization of wiring length can play a
primary role in the evolution of modularity; the functional advantages then emerge as a
secondary, reinforcing side effect (Clune, Mouret, and Lipson, 2013).
Computational experiments were set up to compare the evolution of neural networks
in a visual object recognition task under two conditions: with a single objective of
maximizing performance alone, and with two objectives of maximizing performance
and minimizing wiring length simultaneously. Since wiring length is presumably less
important for survival than performance, it was set to affect selection only 25% of the time.
Wiring length was measured as the total squared length of all connections and NSGA-II
was used to construct a Pareto front of the two objectives.
The task, originally proposed by Kashtan and Alon (2005), involved an eight-pixel
retina where an object might appear either in the left or right half, or both (figure 14.2).
Note that it is indeed possible to decide whether there is an object on the left/right
half before combining these decisions; the task should therefore lend itself to modular
solutions. Performance was measured simply as the percentage of correct answers. Simple
feedforward networks with three hidden layers were evolved in this task. They had integer
weights and thresholds, and mutations to add or remove a connection and increase or
decrease a weight or a threshold. The networks were initially set up randomly; their
modularity was measured by first dividing the networks optimally into modules, and then
comparing the density of connections within each module to that of a randomly connected
network (Newman, 2006).
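The wiring-length objective itself is simple to compute. The sketch below assumes that each network exposes a list of connections and 2D coordinates for its nodes, and it adds the wiring objective to selection only 25% of the time, as in the experiment described above; it is a schematic outline rather than the original implementation.

```python
import random

def wiring_cost(connections, coords):
    """Total squared length of all connections, given 2D coordinates per node."""
    cost = 0.0
    for src, dst in connections:
        (x1, y1), (x2, y2) = coords[src], coords[dst]
        cost += (x1 - x2) ** 2 + (y1 - y2) ** 2
    return cost

def objectives(accuracy, connections, coords, p_wiring=0.25):
    """Performance is always an objective; the wiring-cost objective enters
    selection only with probability p_wiring. The result can be handed to a
    multiobjective EA such as NSGA-II (both objectives to be maximized)."""
    objs = [accuracy]
    if random.random() < p_wiring:
        objs.append(-wiring_cost(connections, coords))
    return objs
```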
In 25,000 generations, the performance+wiring-based evolution resulted in more
modular networks than the performance-based evolution. Such structural modularity
resulted in functional modularity as well: The modules often corresponded to making a
decision on the left or the right side. Interestingly, many such networks actually performed
better than those that were evolved only to maximize performance. They were generally
smaller and therefore perhaps easier to optimize; a good non-modular network may also
be more difficult to find. The networks with the shortest wiring length were more likely to
be modular. However, evolution did find some well-performing non-modular networks as
well, suggesting that modularity does not arise from performance alone.
The modular networks also turned out to be more evolvable. In further experiments,
networks were evolved in a sequence of two tasks: they were first evolved to answer
whether an object appeared both left and right, and once they had learned this task, further
evolved to answer whether an object appeared in either left or right (the opposite order
of tasks was also run). The modular networks required fewer generations to adapt to the
new environment, and they were more modular than in an unchanging environment. The
results thus suggest that modularity evolves primarily due to wiring length; once it is there,
it is further enhanced by the need to adapt. Thus, neuroevolution simulation can be used
to gain insights into the origins of modularity in biology.
Knowing that modularity is helpful and that minimizing wiring length leads to
modularity, it is possible to take advantage of this principle in neuroevolution more
generally. For instance, applied to the same retina problem, the basic HyperNEAT method
does not discover modular solutions reliably, and does not perform well. However, it can
be extended to specify wiring patterns in addition to connection weights (Verbancsics
and Stanley, 2011). If these patterns are biased to favor local connections initially,
Figure 14.2: Evolution of modularity based on maximizing performance and minimizing wiring length. The goal was to evolve a visual system to locate and identify objects. (a) Objects appear on the left and/or the right side of the retina, and the network needs to decide whether there is an object in both. (b, d) With the objective of minimizing wiring length, more modular networks evolve over time. (c) Modular networks also perform better, although there are some well-performing non-modular networks as well. Computational simulations thus suggest that wiring length is the primary evolutionary pressure behind modularity; performance and adaptability pressures may further enhance it. Figure from Clune, Mouret, and Lipson (2013). Videos at https://neuroevolutionbook.com/demos.
modular structures do emerge, improving performance significantly. This method, called
HyperNEAT-LEO (for link expression output), can be seen as an extension of the wiring
length hypothesis: It suggests that if local circuits evolve early and more complex structures
with long-range connections later, evolution is biased towards finding modular solutions
even without an explicit objective to do so. Assuming that more complex nervous systems
evolved from simpler ones in biology, it suggests that modularity evolved naturally as a
side effect.
14.3 Understanding Neuromodulation
As has been mentioned several times in this book, there are many biological constraints
and mechanisms that are likely to have an effect on neural function, but are not included
in the standard neural network models. One of those mechanisms is neuromodulation.
In section 12.3.3, it was discussed as a possible method for learning when to learn; this
section aims to further understand its evolutionary origins.
In a neuromodulated network, some neurons have a multiplicative effect on the
weighted sum of inputs, or on the Hebbian weight change. Such modulation can lead to
more complex behavior and more powerful adaptation. For instance, backpropagation
can be extended to multiplicative neurons in a straightforward manner. The gradient
descent equations can be derived for such connections, resulting in sigma-pi units (sigma
represents the sum of inputs, pi represents the product of multiplicative inputs). This
method results in smaller networks: for instance, XOR can be represented in just three units: one computing AND, one computing OR, and one selecting between them multiplicatively (Pollack, 1987; Rumelhart, Hinton, and R. J. Williams, 1986). Scaling up, such networks have been useful, for instance, in recognizing whether a string adheres to a particular grammar: a single symbol in the wrong place can change the decision, a behavior that can be represented well by multiplicative connections (Giles, C. B. Miller, D. Chen, et al., 1991). Such networks can be evolved just as well as weighted-sum networks, achieving
the same benefits.
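As an illustration of how compact such multiplicative solutions can be, the sketch below implements XOR with three threshold units: one computing AND, one computing OR, and a sigma-pi unit in which the AND output gates the OR output multiplicatively. The particular weights and thresholds are one possible choice, not necessarily the wiring used in the original work.

```python
def step(x, threshold):
    return 1 if x >= threshold else 0

def xor_sigma_pi(x1, x2):
    and_out = step(x1 + x2, 2)        # unit 1: AND of the two inputs
    or_out = step(x1 + x2, 1)         # unit 2: OR of the two inputs
    # Unit 3 is a sigma-pi unit: the AND output gates the OR output
    # multiplicatively, suppressing it when both inputs are on.
    return or_out * (1 - and_out)

assert [xor_sigma_pi(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```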
An interesting question is whether neuroevolution would select for neuromodulation
in order to solve a task, that is, whether it would emerge in evolution as an adaptive
advantage. In one such experiment, neuromodulation was set to modify plasticity in
Hebbian networks, i.e. those where a connection strengthens when both presynaptic and
postsynaptic neurons are simultaneously highly active (Soltoggio, Bullinaria, Mattiussi,
et al., 2008). In contrast with backpropagation, which is an abstraction of learning in
biological neural networks, Hebbian plasticity is an actual plasticity mechanism in biology.
Connection weights were adapted as
$$
\Delta w_{ji} = \eta \tanh(o_m)\,(A\,o_j o_i + B\,o_j + C\,o_i + D), \tag{14.1}
$$
where $\eta$ is the learning rate, $o_m$ is the modulatory neuron output, $o_j$ is the presynaptic activation, $o_i$ is the postsynaptic activation, and $A$, $B$, $C$, and $D$ are constants (figure 14.3a). In this manner, the modulatory neuron controls whether the weight increases or decreases, and scales the magnitude of the Hebbian adaptation.
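A minimal sketch of this update rule is shown below. In the actual experiment the constants and the learning rate were evolved; the numbers in the example call are placeholders for illustration only.

```python
import math

def modulated_hebbian_delta(o_pre, o_post, o_mod, eta, A, B, C, D):
    # Equation 14.1: the modulatory output o_mod gates the sign and the
    # magnitude of the Hebbian change through tanh; A weights the
    # correlation term, B and C the pre- and postsynaptic terms, and
    # D is a constant bias.
    return eta * math.tanh(o_mod) * (A * o_pre * o_post + B * o_pre + C * o_post + D)

# Example: with a strongly positive modulatory signal, coincident pre- and
# postsynaptic activity strengthens the connection (placeholder constants).
dw = modulated_hebbian_delta(o_pre=0.9, o_post=0.8, o_mod=1.5,
                             eta=0.1, A=1.0, B=0.0, C=0.0, D=0.0)
```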
The approach was evaluated in the task of navigating a T-maze or double T-maze into a reward location, i.e. making the correct turn once or twice to get to the reward, and then navigating back to the starting location (figure 14.3b). Each agent was tested 100 times, and at some point the reward location changed, so it had to adapt its behavior. It could do so through recurrent connections that implemented memory, or by changing its weights through plasticity. The agent networks were evolved by inserting, duplicating, or deleting neurons, which could be either standard or modulatory, and by mutating the constants $A$, $B$, $C$, $D$, and $\eta$ in equation 14.1 and the real-valued weights through an evolution strategy.
Even though the tasks were sometimes solved without plasticity and modulation,
networks with plasticity evolved to perform significantly better in the 100 trials. Networks
(a) Neuromodulation circuit (b) T-maze task
Figure 14.3: Taking advantage of neuromodulation in the maze navigation task. Neuromodulation offers a dimension of adaptation that may make it easier to solve complex tasks. (a) The three standard neurons activate the postsynaptic neuron through a weighted sum as usual. A modulatory neuron then amplifies the Hebbian adaptation of those weights. (b) The agent needs to traverse a corridor and then turn left or right to get to the larger reward; in a double maze (not shown), two such turns need to be made. The location of that reward changes periodically, and the agent needs to adapt its behavior accordingly. Networks evolved with modulation perform more reliably than non-plastic and non-modulatory networks, suggesting that evolution finds a way to take advantage of modulation even when it is not strictly necessary. Figure from Soltoggio, Bullinaria, Mattiussi, et al. (2008).
with modulation performed similarly in the T-maze, but significantly better in the double
T-maze. The solutions had many different structures that were hard to interpret, but
ablation studies showed that modulation plays an interesting role. When it was turned off
from networks that were evolved with it, the networks still performed well locally, i.e. made
turns and did not crash into walls. But they could often only turn in one direction, and
could not navigate globally e.g. to find their way back to the starting location. This result
suggests that neuromodulation is not simply an add-on that helps solve more complex
tasks, but is integrated into the dynamics of the navigation behavior. Successful behavior
can be evolved without it, but solutions with modulation are easier to discover. They
therefore evolve more reliably, resulting in better average performance.
A related experiment, which we previously reviewed in section 12.3.3, further suggested
a possible biological mechanism for neuromodulation. In a stochastic reward optimization
task, modulation activated reinforcement learning when it was most needed, allowing
the system to adapt better to new scenarios (Soltoggio, Dürr, Mattiussi, et al.,
2007).
Modulation was achieved through dynamics similar to dopaminergic activity recorded in
the monkey’s brain (e.g. Schultz, 2024), giving it a computational interpretation.
The experiments thus show that the evolutionary process finds a way to utilize whatever
dimensions of adaptation there are, rather than finding parsimonious solutions that ignore
the dimensions that are not necessary. If neuromodulation is possible, neuroevolution will
take advantage of it.
14.4 Developmental Processes
A fundamental question in cognitive science is how much of intelligent behavior in humans
is innate, and how much is learned. This question is often referred to as the "nature vs. nurture" debate. Both of these factors play a role, of course, and are often synergistic
through the process of development. Further, initial development, as well as long-term
stability, can be driven by genetically directed learning, as will be reviewed in this section.
14.4.1 Synergistic Development
Given the relatively small number of genes in the human genome (about 24,000; International Human Genome Sequencing Consortium, 2004), a learning process is necessary to
construct an organ as complex as the brain. On the other hand, genetic determination is
also necessary: It can provide the overall structure, initialization, and a learning bias that
then makes it possible to construct such complexity during the lifetime of the individual.
Perhaps the clearest example of this process is language: All normal humans, and only
humans, have an innate capacity for language. However, they need to learn a language in
early childhood; language does not develop in isolation (section 14.8.1).
For many animals, the fundamental survival skills are there right after birth. For
instance, newborn gazelles can run immediately, and whale calves can swim. For higher
animals, there is a long period of development during which they are dependent on their
caregivers. This period is exceedingly long for humans, and includes a series of critical
periods during which skills such as walking, talking, and social intelligence develop in a particular order; if they do not develop then, the individual will not be able to develop them fully later (Robson, 2023). This observation suggests that the relationship between evolution and
learning, that is, the process of development, is more nuanced and structured than simply
refinement of a genetic starting point.
In principle, evolution can discover complete solutions that do not need to be refined
further. Most of evolutionary computation is also based on this approach. However, in
constructing brains, evolution seems to have discovered a different approach, described
theoretically as synergistic development (Elman, Bates, M. H. Johnson, et al., 1996).
Instead of specifying a complete solution, only the general structure is genetically
determined, together with a learning mechanism that allows the animal to construct the
full solution. These components are synergistic: The structure and initialization make
learning most effective, and the learning mechanism is well-suited for the structure and
the environment. The minimally functional initialization and the critical periods are part
of this synergy. That is, instead of a fully specified design, evolution has discovered a
developmental process as the solution. This approach can be seen as an implementation
of expressive encoding, with the power to discover solutions that would be difficult to find
through direct evolution (section 9.1.4).
Computational studies can be instrumental in verifying this theory. An early example
is an experiment with simulated creatures foraging for food items randomly scattered in a
2D grid world (Nolfi, Elman, and Parisi, 1994). They receive the current ($t_0$) angle and distance to the nearest food item as their input, and generate an action (turn left or right, move forward, or do nothing) at the next time step ($t_1$) as their output. The creature’s
(a) Network architecture (b) Lifetime learning (c) Evolution of foraging
Figure 14.4: Synergistic development in a foraging task. The creatures evolve to navigate to food items, aided by development to predict the consequences of their actions. (a) The evolved network is trained to predict how its sensory inputs change as a result of its actions in the previous time step. (b) Their prediction ability improves over their lifetime throughout evolution; even in later generations (near G99), it is not genetically encoded. (c) The development of prediction allows evolution to discover better solutions faster. Thus, the experiment demonstrates the value of synergistic development. Figures from Nolfi, Elman, and Parisi (1994).
fitness corresponds to the number of food items it finds. The optimal actions are not
known, but the entire network can be evolved to discover successful foraging behavior.
However, in this experiment, the creatures also receive their previous action (at $t_0$) as additional input, and predict the sensory input at the next time step ($t_1$) as additional
output. These additional outputs are known, and therefore the network can be trained
through gradient descent to predict the consequences of its actions. This training takes
place during the lifetime of the creature, and the weight changes are not encoded back to
the genome.
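The following sketch illustrates this division of labor with a tiny two-head network and made-up dimensions. For simplicity only the prediction head is adapted here (in the original experiment the prediction error presumably propagated further into the network), and none of the lifetime changes are written back into the evolved weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal two-head network: inputs are [angle, distance, prev_action (3 one-hot)],
# outputs are 3 action logits plus 2 predicted sensory values.
W_hidden = rng.normal(scale=0.1, size=(5, 8))   # evolved, part of the genome
W_action = rng.normal(scale=0.1, size=(8, 3))   # evolved, part of the genome
W_pred   = rng.normal(scale=0.1, size=(8, 2))   # adapted during the lifetime

def lifetime_step(x, next_sensors, eta=0.05):
    # One step of lifetime learning: the action comes from the evolved
    # weights, while the prediction head is trained on the actually observed
    # next sensory input. Nothing is ever written back into the genome.
    global W_pred
    h = np.tanh(x @ W_hidden)
    logits, pred = h @ W_action, h @ W_pred
    error = pred - np.asarray(next_sensors)       # prediction error (delta rule)
    W_pred = W_pred - eta * np.outer(h, error)    # train only the prediction head
    return int(np.argmax(logits))                 # action for the next time step
```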
Thus, lifetime learning establishes a developmental process. The creature learns to
understand how its actions affect its environment, much like biological organisms learn
to interact with their environment. Such learning allows it to perform better at the task
for which it is evolved, and it guides evolution to generate individuals that can take better
advantage of the learning process (figure 14.4). Note that the prediction ability does
not become encoded in the genes; the individuals start with poor ability even in later
generations. Evolution instead utilizes learning as part of the synergistic developmental
process. As a result, creatures that perform better are discovered faster.
In this manner, computational experiments can be used to gain insight into how
development works and why it is so powerful. One such insight is that evolution
establishes the proper learning biases, and learning provides the variance necessary to
adapt to the world, as will be discussed in the next section.
On the other hand, it may also be possible to build more complex artificial systems by
employing these same principles. Progress in such systems, and further opportunities, are
reviewed in section 4.2.
14.4.2 Development through Genetically Directed Learning
One way to characterize the synergy of evolution and learning is through the general
machine learning concepts of bias and variance. Biases exist in any learning system,
making it more likely to learn certain kinds of behavior, and less likely to learn others. In
contrast, variance means that it can learn a wide variety of patterns that exist in the training
data. A pure evolutionary system can be seen as completely biased with no variance: The
behavior is determined genetically, and there is no learning based on input. In contrast, a
pure learning system has no bias and only learns the patterns in the input.
Neither of these extremes is likely to be very successful. It is difficult to anticipate all possible input situations ahead of time, during evolution. On the other hand, it is difficult to learn a robust function through high variance; the system is likely to end up overfitting
and not generalizing well to new situations. Thus, a developmental system is a way to
strike a balance between these two effects. Evolution establishes the proper bias, making
it easier for the learning system to acquire a useful, robust function from the inputs.
The biases can be most directly established by evolving the learning system itself. For
instance, parameters for Hebbian learning can be incorporated into neuron definitions
and evolved together with the network itself (Floreano and Urzelai, 1999). Through the
lifetime of learning with these parameters, controllers in a robot navigation task can be
evolved faster than without learning. Evolution converges on learning parameters that are
the most effective, thus finding a proper balance between bias and variance.
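A much simplified version of this idea can be sketched as follows: the genome carries both the initial weights and per-connection Hebbian learning rates, so the learning rule itself is under evolutionary control. The array shapes, clipping bounds, and the restriction to a single plain Hebbian term are illustrative assumptions rather than the full rule set of Floreano and Urzelai (1999).

```python
import numpy as np

def lifetime_hebbian(genome, activity_pairs):
    # genome carries both initial weights and per-connection Hebbian
    # learning rates, both under evolutionary control.
    w = genome["w0"].copy()
    for pre, post in activity_pairs:              # presynaptic / postsynaptic activity
        w += genome["eta"] * np.outer(post, pre)  # Hebbian change, scaled per connection
        w = np.clip(w, -5.0, 5.0)                 # keep weights bounded
    return w

genome = {"w0": np.zeros((2, 3)), "eta": np.full((2, 3), 0.05)}
```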
A biological example of this process can be seen in the domain of constructing a
pattern recognition system (Miikkulainen, Bednar, Choe, et al., 2005; Valsalam, Bednar,
and Miikkulainen, 2007). Indeed, visual systems of animals are believed to combine
nature and nurture in a systematic way: The general structure is genetically determined
to match the needs of the species, and then fine-tuned through learning. For example,
retinotopy and orientation sensitivity exist even before birth in cats and monkeys, but the
full structure is formed during the first few weeks after the eyes open. Human newborns
have an innate preference for face-like patterns, which is refined to actual face preferences
during the first few months of life. It can also help explain other species-specific visual
functions that appear innate, such as detecting prey (e.g. flies in frog vision; Lettvin, Maturana, McCulloch, et al., 1959).
The way such preferences are established is particularly interesting. While it is
possible to specify some neural network structure genetically, such as retinotopy, a learning
mechanism also exists and may be active even before birth. Evolution seems to have
discovered a clever way to utilize it even in the process of creating the proper initial
biases: Much of the initial structure can be constructed through the learning of internally
generated patterns. Propagating activity waves in the retina allow orientation detectors
to form; three-dot patterns in the ponto-geniculo-occipital loop may result in face
preference (corresponding to the two eyes and the mouth). Thus, evolution does not need
to specify a full visual system, and it does not even need to specify a full starting point
for learning: It can instead specify a way of generating internal patterns that establishes
useful species-specific biases.
To illustrate the power of this process, pattern-recognition neural networks were
constructed in three different ways: purely through learning, purely through evolution,
and through a combination of evolved prenatal pattern-generation and learning (Valsalam,
Bednar, and Miikkulainen, 2007). The task consisted of recognizing hand-written digits in
the NIST dataset. Each evolved pattern generator encoded a distribution of Gaussians with
different positions, rotations, and elongations. Their fitness was based on classification
accuracy of the system that was first trained with the generated patterns, and then with the
actual patterns in the dataset.
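The generator side of this setup can be sketched as follows; the parameterization (position, orientation, and elongation per Gaussian) follows the description above, while the retina size, parameter layout, and sampling scheme are illustrative assumptions. Fitness would then be the classification accuracy after prenatal training on these patterns followed by postnatal training on real digits.

```python
import numpy as np

def render_gaussian(cx, cy, theta, sx, sy, size=10):
    # One genetically encoded blob on the simulated retina:
    # position (cx, cy), orientation theta, and elongation (sx, sy).
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    x, y = xs - cx, ys - cy
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr ** 2 / (2 * sx ** 2) + yr ** 2 / (2 * sy ** 2)))

def sample_prenatal_patterns(genome, n=100, rng=np.random.default_rng(0)):
    # The genome is a list of Gaussian parameter tuples; prenatal training
    # draws patterns from this distribution before any real digit is seen.
    return [render_gaussian(*genome[rng.integers(len(genome))]) for _ in range(n)]
```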
The learning mechanism was simple competitive learning. Each of the 10 neurons had a weight vector $w$, randomly initialized and then normalized to unit length:
$$
w_i = \frac{w_i}{\sqrt{\sum_i w_i^2}}. \tag{14.2}
$$
Each neuron responded to an input vector $x$ through a weighted sum
$$
y_j = \sum_i w_i x_i. \tag{14.3}
$$
The weight vector of the winning neuron, i.e. the one with the highest response, was then rotated towards the input vector, i.e. first modified with
$$
w_i(t+1) = w_i(t) + \eta\,(x_i - w_i(t)), \tag{14.4}
$$
and then normalized to unit length. Competitive learning was used because it is a good
model of biological (Hebbian) learning, and also because it is relatively weak and therefore
depends more on bias.
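Equations 14.2-14.4 translate into a few lines of code; the sketch below assumes a 10 × 10 retina flattened to a 100-dimensional input and a learning rate of 0.1, both placeholders.

```python
import numpy as np

def normalize(W):
    # Equation 14.2: keep each unit's weight vector at unit length.
    return W / np.linalg.norm(W, axis=1, keepdims=True)

def competitive_step(W, x, eta=0.1):
    # Equation 14.3: each unit's response is a weighted sum of the input.
    y = W @ x
    winner = int(np.argmax(y))
    # Equation 14.4: rotate the winner's weights towards the input,
    # then renormalize to unit length.
    W[winner] += eta * (x - W[winner])
    return normalize(W)

W = normalize(np.random.default_rng(0).random((10, 100)))   # 10 units, 10x10 retina
```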
As expected, pure competitive learning developed weight vectors that resembled actual digits (figure 14.5b). However, competitive learning is not very powerful, and usually did not learn to separate all digits. In particular, it had trouble with 7, 8, and 9 because they have many overlapping pixels. Direct evolution, in contrast, had no reason to learn weight vectors that resemble digits. The patterns it developed simply emphasized differences between digit categories, and formed a good foundation for separating them (figure 14.5c). Pattern generation and learning resulted in a most interesting solution that clearly illustrates the importance of having a proper bias. Evolution created pattern generators that emphasized the different horizontal locations around the midline (figure 14.5d). Only a few units learned these patterns, but it was enough to separate 7, 8, and 9 to different units (figure 14.5e). As a result, the postnatal learning with actual examples created a reliable categorization of most examples (figure 14.5f).
Thus, evolution was able to discover a proper bias so that even a simple learning system could perform well on this task. Although it was designed to illustrate a possible biological synergy of evolution and learning, the approach may be useful for constructing complex systems in general.
Moreover, the mechanism of internal pattern generation may play a role in the
maintenance of such systems throughout the lifetime of the animal (Miikkulainen, Bednar,
Choe, et al., 2005). Environmental conditions often change, and the animal needs to adapt
to such changes. If such adaptation is based purely on learning, it could easily overfit,
and catastrophic forgetting could result. However, if pattern-generator-based learning
continues together with learning from the environment, it can have a stabilizing effect.
Adaptation to new inputs is combined with continual adaptation to the fundamental patterns
in the domain. Such learning could occur e.g. during REM sleep. This mechanism could
potentially explain why animals learn altered environments only partially, and why they
(a) Initial random weight vectors
(b) Final competitive learning weight vectors
(c) Final evolved weight vectors
(d) Examples produced by an evolved pattern generator
(e) Weight vectors after prenatal training with evolved patterns
(f) Final weight vectors after additional competitive learning
Figure 14.5: Synergy of evolution and learning through evolved pattern generators. The task was to recognize handwritten digits on a 10 × 10 simulated retina; the recognition system consisted of 10 neurons that adapted through competitive Hebbian learning. (a) The weight vectors of each neuron (unit) were initialized randomly. (b) When they learned through competitive learning, the final weight vectors resembled the inputs. However, learning was not very effective, and e.g. 7, 8, and 9 were often confused. (c) When the weight vectors were evolved directly, they emphasized the differences that matter for classification. (d) The evolved patterns emphasized mostly the locations in the horizontal midline. (e) Prenatal training with such patterns took place only in two units, but it was enough to separate 7, 8, and 9. (f) After postnatal learning with actual handwritten digit patterns, most examples were categorized correctly. Evolution thus discovered useful biases and utilized the learning mechanism itself to encode them, thus demonstrating synergy of evolution and learning. For animations of these processes, see https://neuroevolutionbook.com/demos. Figures from Valsalam, Bednar, and Miikkulainen (2007).
spend much time in REM sleep when their neural structures are most plastic. Evolved pattern generators can thus provide a mechanism for continual genetic influences on behavior. They could similarly be instrumental in keeping artificial systems both adaptive and stable.
A further aspect of the synergy between evolution and learning is that evolution can
discover the actual learning mechanisms. For instance, in the task of discovering repeated
patterns in an input sequence with a spiking neural network, evolution discovered plasticity
rules that made the task possible in three different settings (Jordan, Schmidt, Senn,
et al., 2021): with reward feedback (reinforcement learning), error feedback (supervised
learning), and without feedback (correlation-based unsupervised learning). With Cartesian
genetic programming as the evolution method (J. F. Miller, 2011), the system discovered
symbolic expressions for such plasticity, making it possible to interpret the underlying
physical factors, such as homeostasis in the well-known spike-timing-dependent plasticity
mechanisms (STDP; S. Song, K. D. Miller, and Abbott, 2000).
Many of the meta-learning methods reviewed in chapter 11 and others optimize different
aspects of the learning mechanisms (Bingham and Miikkulainen,
2022; Confavreux, Zenke,
Agnes, et al., 2020; Elsken, Metzen, and Hutter, 2019; Gonzalez and Miikkulainen, 2021;
Najarro and Risi, 2020; Tyulmankov, G. R. Yang, and Abbott, 2022). While often the goal
is to simply improve machine learning performance, such methods can also lead to insights
into the learning algorithms themselves. For instance, in an experiment where agents
needed to adapt to changing reward locations in a Minecraft navigation task, evolution
discovered innate reward neurons that made the search for the reward effective even without
an explicit reward signal (Ben-Iwhiwhu, Ladosz, Dick, et al., 2020). Neuroevolution
thus discovered structures that facilitated learning during the lifetime of the agent. Such
synergies result in more powerful machine learning, but also help us formulate specific
hypotheses about biological adaptation.
14.5 Constrained Evolution of Behavior
Much of this book has focused on the neuroevolution of behavior, and for good reason:
Behavior arises naturally from neural networks, and evolution is a natural way to discover
them. Neuroevolution is one of the main approaches in the scientific fields of artificial life,
which explores the nature and principles of living systems through computer simulations,
and adaptive behavior, which focuses on understanding how behavior arises in biology
and in autonomous artificial systems. Further, neuroevolution can be used as a tool in
evolutionary biology, not only to understand the evolutionary origins of circuits and
mechanisms (as was done in previous sections), but also to formulate and evaluate
hypotheses about the origins of behaviors and cognition. This is the topic of the remainder
of this chapter.
Section 7.1 illustrated an important principle in evolution of complex behavior: It does
not exist in a vacuum, but is constrained and guided by interactions with the environment
and with other agents. Simulations of cooperative evolution can thus help us understand
the origins of biological behaviors as well. Section 7.1 already demonstrated several such
opportunities, including how role-based cooperation may emerge, how adaptive teams
can evolve, and how an evolutionary arms race may result in sophisticated herding and
hunting behaviors.
This section further expands and generalizes that principle. The guidance may
originate not only from complex interactions with the environment, but from general
constraints on what the agent can do. For instance, a physical body imposes limits on
what movements are possible. Sensory perception is limited, and processing power in
decision-making is finite. If the goal is to build capable artificial agents, it makes sense to
furnish them with as few such constraints as possible. Evolution can then be the most
creative, and the agents most powerful in their task. However, if the goal is to create agents
that are believable, for instance as simulated intelligent agents in a virtual environment,
such constraints constitute an important guide: Evolution under constraints observed in
nature leads the optimization process to discover behaviors that are natural, believable,
and human-like. In other words, it explains the observed behaviors as optimal under the
constraints seen in nature.
These effects can be observed most clearly in simulations of virtual creatures (Bongard
and Pfeifer, 2001; Hornby and Pollack, 2001a; Sims, 1991; Sims, 1994). Both the bodies
and the brains of simulated physical creatures are evolved simultaneously in a simulated
physical medium, such as a terrain or water. With even a simple fitness reward, such as
getting close to a target, they develop both body structures and ways of moving their body
that look remarkably animate.
Such target-following behaviors have been evolved in multiple experiments, with
increasingly complex body structures and environments, and modes of locomotion such as running, swimming, and flying (Lehman and Stanley, 2011b; Miconi, 2008; Pilat and
C. Jacob, 2010; Shim, S. Kim, and C. Kim, 2004). However, evolving more complex
behaviors has turned out significantly more challenging. For instance, it has been difficult
to evolve creatures that would be able to employ different behaviors at different times, and
make intelligent decisions between them.
One possible approach is to design a syllabus, i.e. a hierarchy of increasingly complex
behaviors, and evolve them incrementally (Lessin, Fussell, and Miikkulainen,
2013; Lessin,
Fussell, and Miikkulainen, 2014). The bodies in this experiment consisted of cylinders
of different shapes, connected through muscles and attached through different kinds of
joints, as well as sensors for threatening and attractive targets. The brains were neural
networks containing some higher-level nodes such as those generating oscillation. At the
lowest level, bodies and brains were evolved to move as fast as possible, to turn left and
right, and to exert as strong a strike on the ground as possible. These behaviors were
then encapsulated, i.e. the evolved neural network structures were frozen and a trigger node was added in order to activate and deactivate them. A second layer of behaviors was then evolved as neural networks that could activate the low-level behaviors as their output; they included moving toward or following a target, as well as running away from a target, both as a combination of turning and locomotion. These behaviors were similarly encapsulated, and at the next level, combined with the strike behavior to establish an attack behavior. At the highest level, the attack and the running-away behaviors were combined into "fight-or-flight": if the object was sensed as threatening, run away; if it was sensed as attractive, attack.
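One way to organize such encapsulation in code is sketched below: each frozen low-level behavior is wrapped with a trigger, and the evolved higher-level network only outputs trigger values. The functional interfaces and the 0.5 trigger threshold are assumptions for illustration, not the original implementation.

```python
def encapsulate(frozen_policy):
    # Wrap a frozen low-level behavior (any function from sensors to motor
    # commands) with a trigger node: it only produces output while triggered.
    def behavior(sensors, trigger):
        return frozen_policy(sensors) if trigger > 0.5 else None
    return behavior

def hierarchical_controller(selector, behaviors):
    # selector: evolved function mapping sensors to one trigger per behavior.
    # Only the selector is evolved at this level; the behaviors stay frozen.
    def act(sensors):
        triggers = selector(sensors)
        commands = [b(sensors, t) for b, t in zip(behaviors, triggers)]
        return [c for c in commands if c is not None]
    return act
```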
The behavior that evolved was indeed highly believable, at least in a subjective sense.
Several different kinds of bodies evolved at the lowest level, and behaviors were natural to
them. For instance, some creatures had multiple legs and moved them rhythmically in
order to advance. One agent consisted of simply two blocks, and jumped forward on one block by shaking the other block up and down. In order to create a strike, an agent
(a) Fight (i.e. attack) activated when sensing a good object
(b) Flight (i.e. retreat) activated when sensing a bad object
Figure 14.6: Neuroevolution of complex behavior in evolved virtual creatures. The bodies and the brains of simulated creatures were evolved together, thus providing constraints on what kind of movements were possible. As a result, they appear natural and therefore believable. The low-level behaviors such as locomotion, turning right and left, and strike were encapsulated and formed sub-behaviors to more complex behaviors turn-from, turn-to, retreat, and attack. At the highest level, the creature chooses between (a) fight and (b) flight depending on the object, as seen in this pair of figures. Such believability makes it natural to anthropomorphize the agents, which can be appealing in constructing virtual worlds. For animations of these behaviors, see https://neuroevolutionbook.com/demos.
with two side blocks acting as weights evolved to jump and land hard. Another one with
a long arm evolved to hit the ground hard with it. In all these cases, the behaviors that evolved made sense for that particular body. It was also fascinating to see that there was no single solution, but many quite different solutions that were successful.
The behavior was also believable at the higher levels, including fight or flight. After watching the simulation for a while, it is easy to anthropomorphize the agent: It seems to have a purpose when it chases a moving target, and when the target changes to a threatening one, it seems scared, reacting to the change and running away. And if the threatening object catches up with it and destroys it, you feel sorry for it. It is these kinds of agents, ones we can identify with and anthropomorphize, that we would like to inhabit the virtual worlds now being constructed. Constrained body-brain evolution may be a good way to get there. It is also a possible way to demonstrate why and how such a diversity of bodies and behaviors has evolved in nature: as different possible solutions to the same survival challenges.
14.6 Case Study: Understanding Human-like Behavior
Whether a behavior is believable or not is highly subjective and difficult to evaluate. In
order to do that, several blind human judgments need to be collected under controlled
conditions. It is of course possible to conduct such a study in the laboratory with human
subjects. However, observing and interacting with virtual creatures is a lot of fun, and
the evaluation can be as well. What if we turn the evaluation into a competition, and in
addition to that, run it as an event at a conference where the audience consists of intelligent
agent researchers and people interested in bringing AI into games?
This was indeed the goal of the Botprize competition, which ran at the Computational Intelligence in Games conference in 2007-2012 (Hingston, 2012). In essence, the competition was a Turing test for game bots: In the Unreal Tournament 2004 video game, there were both agents controlled by AI and agents controlled by human players. Some of the humans
were playing the game as usual, trying to win. The AI agents were trying to play the same
way as the humans did, and therefore be indistinguishable from human players. Some of
the humans acted as judges, playing the game and interacting with the other players in
order to decide whether they were controlled by humans or AI. They made the judgment
about the other agents at the end of each game: The objective for the AI was to garner at
least as many "human" judgments as "bot" judgments across several games with several
different human players and judges.
Similarly to Doom, Unreal is a representative of the multiplayer first-person shooter
game genre. Human players control their avatars who roam multiple levels in the game,
gather possessions, and attack other players with different weapons. The game moves fast
and requires quick control and decision-making; however, it does not require linguistic
communication. Therefore, to appear human, the AI-controlled bots would have to react,
move, and make decisions similarly to the human players.
Indeed, at the time it was not clear whether it was possible to capture such behavior.
AI bots were routinely easy to identify in games in general: they behaved mechanically
and repetitively, and the players often learned strategies that made it easy to defeat the
AI bots. In many cases the gameplay consisted of figuring out the AI and then moving
on to other games. On the other hand, part of the reason for multiplayer games was to
keep the game more interesting. It is always fun to beat your friends, but friends also
provide more interesting challenges. Therefore, being able to construct bots that behave
indistinguishably from humans is not only an important scientific question, but also has
great value for game development in general.
It was also not clear what human-like behavior even was. In a human-subject study in
the lab, Botprize games were captured on video, and the judges interviewed afterwards,
trying to understand how they made their decisions, i.e. what constituted human-like
behavior to them. Very little came out of that study. It turns out that humans are not very
good at explaining what they do, and they may not even understand how they do it. More
precisely, they are very good at constructing explanations when prompted to do so, but
the explanations may have little to do with their actual process. On several occasions the
judges gave fluent and logical explanations for why they judged the opponent as a bot, for example, because they moved in a certain way, or reacted in a certain way, not realizing
that in the game, they actually judged this opponent as a human.
Yet the human judges were quite reliable in making those distinctions, at least at the beginning of the Botprize competition. Remarkably accurate, as a matter of fact. Sometimes the opponent jumped in front of them, interacted with them for a few seconds only, and ran away; still, the judges were able to make decisions well above chance.
So there appears to be a quality in the behavior that humans have but bots at the time
lacked. What is it?
In the first several years, there was a significant and consistent gap between the humans
and AI: While the human players were judged as human 60-70% of the time, the bots
were mistaken for humans only 20-30% of the time. Part of the problem turned out to be
network latency: when the games were played over the internet, a time lag was introduced,
and the humans dealt with that issue better than the bots. However, there were also
significant differences in the behavior that gave the bots away. The bots were constructed
to play well: for instance, in evolution, the fitness early on was simply the final game
score (Karpov, Schrum, and Miikkulainen,
2012; Schrum, Karpov, and Miikkulainen,
2012). Therefore, they evolved behaviors that were highly effective, but not necessarily
human-like. For instance, they would run at full speed, and at the same time, shoot at
maximum accuracy. If the judge did something unexpected, e.g. ran straight into them,
they would react immediately and perform the same behaviors as always when close to the
opponent. Humans rarely do that. They get startled when something unexpected happens,
and need to process it before they can react. Their performance varies and becomes less
accurate and effective under load. They do not perform multiple behaviors well at the
same time. This was a fundamental difference between bots and humans.
However, when such performance constraints were imposed on the bots during
evolution, their behavior changed significantly. They were no longer able to simply
optimize the game score, but had to do it while limited in their accuracy, choice of actions,
and ability to multitask (figure 14.7; Schrum, Karpov, and Miikkulainen, 2011). In
essence, they got tired and distracted and performed inconsistently. In other words, they became more human-like. In the last Botprize competition in 2012, they were indeed
mistaken for humans more than 50% of the time. Not only that, they were judged as
humans more often than half of the human players!
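The principle of imposing human-like performance constraints can be sketched as follows; the specific degradation curve, field names, and thresholds are invented for illustration and do not reproduce the actual constraints used in the competition entries.

```python
import random

def constrain_action(action, speed, load, rng=random.Random(0)):
    # Human-like performance constraints imposed during evolution:
    # aiming degrades with movement speed and cognitive load (assumed curve),
    # and two demanding behaviors cannot both run at full effectiveness.
    max_error = 0.05 + 0.3 * speed + 0.2 * load
    action["aim_error"] = rng.uniform(0.0, max_error)
    if action.get("shooting") and action.get("sprinting"):
        action["sprinting"] = False          # no perfect multitasking
    return action
```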
Therefore, Botprize was a remarkable success in three ways: (1) it demonstrated how even complex behavior found in nature can be understood as optimization under constraints; (2) it demonstrated how neuroevolution can be similarly constrained to discover more believable, more human-like behavior; and (3) it showed how a scientific evaluation can be turned into a fun and interesting event, i.e. a competition that promotes innovation and sharpens focus across this entire area of research.
This success by no means suggests that the work on evolving human-like behavior
is now concluded. While it was successful at the low levels, there is an entire cognitive
level that is not yet captured. For instance, human players lay traps such as running
around the corner and waiting for the opponent there in order to ambush them. A human
player may fall for that trap once or twice, but will learn very quickly to avoid it. In
contrast, the bots will fall for it over and over again. In order to play like a human more
comprehensively, the bots will need to learn and adapt. They need to adjust their play
depending on the opponent. Moreover, there are challenges in playing with multiple
Figure 14.7: Neuroevolution of human-like behavior in the Botprize competition. The
competition is essentially a Turing test for game bots. The judge in this screenshot is player 443,
and is interacting with another player, 932, in order to determine whether it is an AI-controlled bot
or a human player. When neuroevolution was used to maximize the game score of the bot, the
behavior was too systematic, repetitive, and effective to be human. Instead, when various constraints
were imposed on accuracy, behavior selection, and multitasking, the behavior eventually became indistinguishable from human behavior. Thus, the simulation demonstrated how even complex
behavior can be seen as emerging from evolutionary optimization under environmental constraints.
For animations of these behaviors, see https://neuroevolutionbook.com/demos.
other agents, especially in coordinating team play. And of course, such coordination will
ultimately require communication, which was not addressed in Botprize at all. Some of
these issues will be addressed in the remaining two sections of this chapter.
14.7 Case Study: Understanding an Evolutionary Breakthrough
As discussed above, neuroevolution experiments have demonstrated how competition,
cooperation, environmental constraints, diversity, effective encodings, and many other
ingredients can give rise to intelligent behavior. However, they are very general, and
rarely address a specific research question in biology, i.e. how a particular behavior in a
particular species may have evolved.
Such simulations are possible as well, especially in cooperation with evolutionary
biologists. One promising opportunity is to understand the evolutionary origins of the behaviors seen in hyenas, particularly the spotted hyena, Crocuta crocuta. A group of biologists led by Kay Holekamp has maintained a research station in Masai Mara since 1988, and has chronicled much of the hyena behaviors as well as their biology (J. E. Smith, K. D. S. Lehmann, Montgomery, et al., 2017). These observations have been a motivation for several of the experiments already discussed, including those of role-based cooperation (section 7.1.3) and the evolutionary arms race (section 7.2.2), as well as others such as the
tradeoffs between cooperative vs. individual hunting (Rajagopalan, Rawal, Miikkulainen,
et al., 2011).
However, one of the behaviors of Crocuta crocuta is particularly interesting: hyenas
can team up to steal a kill from lions (K. D. S. Lehmann, Montgomery, MacLachlan,
et al.,
2016). Lions are much bigger and stronger predators and can easily kill hyenas.
The Holekamp team has observed hundreds of interactions between them; usually hyenas
stay out of their way, but there are many cases where they seem to employ a sophisticated
cooperative strategy in order to drive the lions away from their kill. For example, some two
to three lions may have caught a zebra, and are feasting on it, when a few hyenas wander
by. The hyenas do not get close, but appear careful and even fearful, as they should be in
the presence of such a predator threat. Instead, they start vocalizing loudly. Other hyenas
within hearing distance are attracted to these vocalizations, and soon a large number of
them, e.g. 20-30, start to gather around the lions. Their behavior changes to that of strong
interactions: their vocalizations change, they rub against each other, they make fast moves,
and they generally excite each other. As the excitement builds, they get less fearful, push each other closer to the lions, and make threatening gestures towards them, until (it seems) they cannot hold back their aggressive behavior any longer. In a dramatic, highly coordinated, and precisely timed move, they form a wall around the lions and attack them simultaneously. Typically they approach from three sides, leaving the lions a way out. If there are enough hyenas, typically four times as many as the lions, and they are coordinated
enough, the lions are overwhelmed and simply escape, leaving the kill to the hyenas.
How can such mobbing behavior have emerged in evolution? It is even more mysterious
because hyenas, as effective as they are as hunters, are not that sophisticated in other ways.
They live in clans and have a strict matriarchal hierarchy, perhaps because they have teeth
and jaws that can crack bones, so that any disputes between them could be fatal. They
do have territories and vicious clan wars where those territories are sometimes disputed.
They can hunt small prey individually and team up to hunt larger prey, such as zebras.
They also collaborate to take care of their young. But compared to other species that live
in the same environment, such as baboons, these behaviors are less advanced. In particular,
whereas baboons are good at learning new behaviors and coping with new situations,
hyenas are not very flexible in their ways, and they do not learn as easily (Benson-Amram
and Holekamp, 2012). Stealing a kill from lions appears unusually sophisticated for them,
and it is likely not a behavior they have learned; instead, it appears to be innate, i.e.
an immediate product of evolution. Moreover, other hyena species that live nearby in
Eastern Africa do not exhibit the mobbing behavior. Therefore, this behavior seems to be
a breakthrough for the species: evolution of intelligence in action.
Computational simulations thus offer a potentially powerful way to gain insights into
the mobbing behavior and its origins. Indeed, several such simulations have been built,
focusing on game-theoretic as well as evolutionary computation aspects of it (Jahns and
Hintze, 2018; Rajagopalan, Holekamp, and Miikkulainen, 2019). One such simulation
suggested that a leading bold individual might evolve, making the cooperative behavior
more likely to emerge (Fairey and Soule, 2014; Solomon, Soule, and Heckendorn, 2012).
However, such individuals are not clearly identifiable in biology. The hyenas do indeed
differ in how bold they are (some get closer sooner, and others hang back), but eventually
they act primarily as a homogeneous team. Their behavior is associated with strong
emotions, with fear competing with affiliation and aggression. While the behaviors
themselves suggest emotions, it is also possible to measure them quantitatively, albeit
coarsely, by analyzing the hormones in the stool samples they leave behind. The analysis
indeed reveals elevated levels of the signature hormones for these emotions after such a
lion encounter. The emotions may thus play a crucial role in allowing the team to form
and to act cohesively.
Based on these observations, a neuroevolution simulation was set up to study how
the mobbing behavior might emerge (Rajagopalan, Holekamp, and Miikkulainen, 2020,
figure 14.8). Ten hyenas and one lion were placed randomly in a 100 × 100 toroidal grid world. The hyenas could move at each timestep, and the lion was stationary (with a kill). If a hyena came within 20 steps of the lion, i.e. inside an "interaction circle", it was likely to get killed, but if there were four or more hyenas within the interaction circle at any time, the lion got mobbed. The hyenas sensed the distance and direction to the lion, whether there were at least three other hyenas within the interaction circle, and whether the lion had already been mobbed. The hyenas that participated in the mobbing event received a full fitness; those that stepped into the circle after mobbing had already happened received an 80% fitness, and others received no fitness at all. Thus, the ideal hyena would approach
the lion until it was just outside the interaction circle, wait there until at least three other
hyenas made it there as well, and then step inside the circle at the same time as those other
hyenas. However, for this behavior to be successful, at least three other hyenas needed to
be able to perform it as well, and also time it just right. Such required cooperation and
timing makes mobbing very difficult to evolve.
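The fitness assignment described above can be summarized in a small helper; the time-step bookkeeping is an assumption about how participation might be recorded, while the reward values follow the description in the text.

```python
def hyena_fitness(entered_step, mob_step, killed):
    # Full fitness for taking part in the mobbing event, 80% for entering
    # the circle only after the lion was already mobbed, nothing for staying
    # away, arriving when no mobbing ever happened, or getting killed first.
    if killed or entered_step is None or mob_step is None:
        return 0.0
    return 1.0 if entered_step <= mob_step else 0.8
```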
Neuroevolution was based on NEAT, and as usual, started with random small networks.
Over 1,000 generations four main behaviors were observed, differing based on how bold
they were: (1) risk-takers ran straight to the lion regardless of other hyenas, and were
usually killed quickly; however, they were sometimes successful if other hyenas joined
them at the right time. (2) Risk-evaders-outside-circle hung back and only approached
the lion after it had been killed, receiving lower rewards with little risk, but also sometimes
running out of time and not receiving any rewards. (3) Risk-evaders-at-circle approached
the lion but stopped at the circle, and only stepped in after the lion had been killed,
receiving low rewards reliably; and (4) mobbers behaved successfully as described above.
At the start of the simulation the networks were random and their actions were
random as well, which amounted to imperfect and inconsistent risk-taking and risk-evasion.
Both of these behaviors quickly became more consistent. The number of risk-takers
increased quickly because such a rushing behavior is easy to construct. On the other hand,
risk-evading hyenas are more likely to survive, and they thus persisted in the population as
well, establishing the opposite behavior, i.e. waiting. These two behaviors constituted the
first two stepping stones.
Over a few generations, mobbing events started to happen by accident, and such
events increased gradually with an increasing number of risk-takers. Risk-takers were
occasionally recombined with risk-evaders, bringing them closer to the circle without
crossing it. This progress led to the discovery of the circle, and thus the third stepping
stone of risk-evaders-at-circle. Mobbing was still happening largely by accident, but frequently enough that eventually it was possible for evolution to discover precise timing
for it. As a result, in approximately 10 generations, 90% of the hyenas were mobbers, and
(a) A crucial moment in the interaction (b) Simulation setup
Figure 14.8: Complex coordinated behavior of hyenas mobbing lions. In this behavior, hyenas form a mob that attacks a group of lions, gaining possession of their kill. (a) A screen capture of a video documenting a mobbing event. Lions are much stronger than hyenas, but if the hyenas are much more numerous and coordinate their attack well, they can drive the lions away from the kill. This behavior is more complex than others that hyenas exhibit, largely hereditary, and may represent an evolutionary breakthrough. (b) A simulation of mobbing. A lion and several hyenas are placed in a 100 × 100 grid world. If four or more hyenas enter the interaction circle simultaneously, they get a high reward; if fewer than four, they get killed. Neuroevolution simulations suggest that mobbing can arise from the simpler stepping stones of attacking, waiting at a distance, and waiting at the circle. These behaviors persist even in prolonged evolution, making the mobbing behaviors more robust. Figure (b) from Rajagopalan, Holekamp, and Miikkulainen (2020). For videos and animations of these behaviors, see https://neuroevolutionbook.com/demos.
successful 90% of the time.
Thus, each of the stepping stones played a role in discovering mobbing behavior.
Because of them, it was possible to overcome the deceptive fitness landscape and develop
the precise coordination required. Interestingly, even in prolonged evolution over 1000
generations, these stepping stones still existed in the population in low numbers. Evolution
reached a dynamic equilibrium where some of the mobbers had risk-taker or risk-evader
offspring, who again may have mobber offspring. The teams were robust enough to tolerate
such diversity: as long as at least six of the 10 hyenas were mobbers, they successfully
mobbed most of the time. However, the teams were even more successful with more
mobbers, so why did such diversity persist?
As has been observed in prolonged evolution experiments in general, if evolution is
continued after solutions have been discovered, the solutions often become more robustly
encoded, and less likely to break in crossover and mutation (Rajagopalan, Holekamp, and
Miikkulainen, 2014; Watson, Palmius, Mills, et al., 2011). However, the behavior itself
may become more robust as well: In this case, the mobbers can be successful with more
challenging initial states and be able to work with teammates with more varied behavior.
Thus, diversity is important not only in discovering novel solutions, but also in refining
the solutions so that they are more effective in complex, uncertain environments, i.e. in
the real world. It is interesting that in such environments, evolutionary pressures exist that
promote diversity automatically.
Thus, the simulation demonstrated how the mobbing behavior could have emerged,
and in particular, the stepping stones required. A most interesting observation is that it
does require individuals who are extremely bold, even to their own detriment. If some of them survive and reproduce, the offspring may discover a moderation that is successful in
a surprising way. There has, of course, been a long debate on the role of such behaviors in
evolutionary biology, and many efforts to explain e.g. altruism (where individuals sacrifice
themselves for the common good) have been developed (Kay, L. Keller, and L. Lehmann,
2020). The simulation suggests that altruism may not be necessary, but instead simply a
variation in how bold the individuals are in trying to achieve their goals. Such variation
may be implemented through different emotional balance, e.g. less fear and more affiliation
and aggression.
In a broader sense, such variation in boldness may be crucial for innovation more
generally. Even in humans there are always individuals who are willing to take more risks,
and it is often those individuals who drive innovation. Indeed, individuals may simply
wonder what's on the other side of those mountains, what's on the other side of the ocean, and such somewhat irrational wanderlust may have allowed humans to spread over the
entire globe. Even today, thousands of people have already signed up for the chance to
get a one-way ticket to Mars, even though colonies or even the technology to get there
do not exist. Such individuals are fascinated by the novelty and the unknown. Being the
first there is a reward in itself. We still share a lot of the boldness of the first hyenas who
wondered "What happens if I just ignore the lions and run straight towards the kill?"
Further, such simulations may be a way to look into the future as well, i.e. to predict how
the hyenas are likely to evolve from their current state. Could this synchronized cooperative
behavior serve as a foundation for developing more sophisticated communication? Or
perhaps higher functions that could be useful in it as well, such as learning and memory?
Other simulations suggest that discovering such functions requires overcoming deceptive
fitness (Lehman and Miikkulainen, 2014), very much like the immediate disadvantage
of being too bold in the kill capture. Eventually, it may be possible to simulate major
transitions as well, as discussed in section 9.1.5. One of them is the evolution of language,
which may already be within reach of neuroevolution simulations, as will be discussed
next.
14.8 Evolution of Language
The last major transition in biology is the evolution of language (Maynard Smith and
Szathmáry, 1997; Szathmáry, 2015). It made cooperation possible more broadly and at a
more sophisticated level: It allowed individuals to define roles and make them flexible,
reason with hypotheticals and counterfactuals, and ultimately record knowledge and build
on prior knowledge. Language is the ingredient that made it possible to construct complex
societies. After a brief review of the biological theory of language, neuroevolution approaches
to evolving communication and structured language are reviewed in this section.
14.8.1 Biology of Language
Language can be defined as the ability to generate an unlimited number of meanings from a
finite set of symbols using grammatical rules. Although many animal species communicate
using signals (essentially single words), language is unique to humans; therefore, some
crucial aspects of the language ability must be genetically encoded. However, every human
still needs to learn the specifics of their language through interaction with the environment.
Such interactions also need to take place at a precise time during development (Friedmann
and Rusou, 2015). If a child does not get proper linguistic input when they are one to
five years old, they do not develop full language abilities. The urge to develop language
at that age is so great that groups of children in a linguistically poor environment may
develop their own language systems or enhance the existing ones. For instance, pidgin
languages, or incomplete communication systems between adults who do not share a
common language, become creole languages, i.e. fully formed languages of the next
generation. This ability is also not tied to the verbal modality: deaf children of hearing parents can
develop a fully formed sign-language system (Singleton and Newport, 2004). Language
learning is thus biologically programmed into humans. It can be seen as an example of
both an expressive encoding and of synergistic development (sections 9.1.4 and 14.4):
Evolution specifies a learning mechanism that constructs the final complex system.
The degree of genetic determination has been up for debate for decades. Chomsky and
others have argued that the entire structure of language, a universal grammar, is genetically
coded, and language learning consists of simply observing and setting the parameters of
the grammar to obtain any specific language (Chomsky, 1986). On the other hand, there
are now large language models that learn perfectly good language simply by observing
large amounts of text (Ouyang, J. Wu, X. Jiang, et al., 2022). If the model is large enough,
and there's enough data to train it, the simple task of predicting the next word results in a
model that can generate grammatical and even meaningful text.
Large language models still need many more language examples than humans see during development. It is thus likely that genetic influences play a larger role in biasing
the learning system towards the right kind of structures. What exactly these constraints
are and how evolution discovered them is a fascinating question. Given the progress in the evolution of cooperation and intelligent behavior described above, it is a question that we may soon be able to answer with neuroevolution simulations.
There are also clues from biology beyond just observations of current human language
abilities. Earlier hominid species such as Homo erectus are thought to have developed
protolanguage abilities. They were able to cooperate more generally, e.g. in scavenging
that required competing with other species, and such cooperation may have required
rudimentary language (Bickerton and Szathmáry, 2011). Several current higher species,
such as dolphins and apes, communicate regularly through vocalizations and gestures.
Moreover, it is possible to train them to extend these abilities to structures similar to human
language, even when they do not spontaneously utilize them in the wild (Bindra, Patterson,
Terrace, et al., 1981; Herzing and C. M. Johnson, 2015). It is therefore possible to see
these species as intermediate stages in the evolution of language, potentially constraining
simulations.
In terms of circuitry, Broca's area comprises Brodmann's areas 44 and 45; syntax
is processed in area 44, and area 45 is involved in action imagination and imitation. In
our closest relatives, chimpanzees, area 45 similarly represents actions but area 44 is
missing (Gallardo, Eichner, Sherwood, et al., 2023). It thus appears that language evolved
by expanding and lateralizing action processing into processing of syntax, suggesting a
possible foundation for neuroevolution simulations.
The next two subsections review work done so far in this area, from the early emergence
of a communication code to multitasking of codes and to cultural transmission. They also
outline possible avenues for evolving language and uncovering the ingredients that make it
possible.
14.8.2 Evolving Communication
Communication in artificial agents has been an active area of research for a long time
(K. Wagner, Reggia, Uriagereka, et al., 2003). Several experiments, many of them using
neuroevolution, demonstrate the emergence of communication codes for fundamental tasks
such as mating, hunting, herding, and fighting. They are usually composed of symbols with
simple meaning, although sometimes contextualized, rather than full language systems
with grammatical structure. Nevertheless, they help us understand some of the conditions
for communication and language to emerge.
One challenge is that it is difficult for the population in evolutionary simulations
to converge on a common code. It is more likely to emerge within genetically related
groups where selection operates at the group level (Floreano, Mitri, Magnenat, et al.,
2007). It may also emerge more readily when the population is asymmetric, with clearly
delineated roles. For instance, an influential early experiment focused on the simple but
compelling problem of evolving a code for a cooperative task (Werner and M. G. Dyer,
1992). In a simulated grid world, there were males and females, both controlled through
neural networks. The females were stationary but could sense the male's location and
emit three-bit signals to them; the males could move and could perceive the signals, but
could not see the females. If a male entered the same location as a female, they would create offspring through a genetic algorithm. Thus, in order to mate, the females needed to send instructions to the males, guiding them step by step to find the females. Initially, the
males would wander around randomly; however, guidance for their last step would soon emerge, and gradually symbols and their interpretations for guidance from further away. Eventually,
a common code evolved that was effective and reliable in most situations. The simulation
thus demonstrated that an effective communication code emerges when it enables effective
evolution, and that asymmetric roles can make it easier to discover.
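To make this setup concrete, here is a minimal sketch of a Werner-and-Dyer-style signaling world in Python. It is not the original implementation: the controllers are lookup tables rather than neural networks, and the grid size, signal length, and genetic-algorithm details are assumptions chosen for brevity. It nevertheless shows the essential asymmetry: a stationary signaler that can see but not move, and a mover that can act but not see.

import random

# A toy signaling world loosely patterned after the mate-finding experiment.
# The "female" sees the male's (clamped) offset and emits a 3-bit signal;
# the "male" cannot see her and must decode the signal into a move.
# Both policies are lookup tables evolved with a simple genetic algorithm.
GRID = 5                                     # GRID x GRID toroidal world
SIGNAL_BITS = 3                              # eight possible signals
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # N, S, E, W

def random_female():
    # Maps the male's offset (clamped to +-2) to a signal.
    return {(dx, dy): random.randrange(2 ** SIGNAL_BITS)
            for dx in range(-2, 3) for dy in range(-2, 3)}

def random_male():
    # Maps a received signal to a move.
    return {s: random.choice(MOVES) for s in range(2 ** SIGNAL_BITS)}

def episode(female, male, steps=20):
    fx, fy = GRID // 2, GRID // 2
    mx, my = random.randrange(GRID), random.randrange(GRID)
    for t in range(steps):
        if (mx, my) == (fx, fy):
            return steps - t                 # earlier contact, higher fitness
        dx = max(-2, min(2, fx - mx))        # offset as seen by the female
        dy = max(-2, min(2, fy - my))
        step = male[female[(dx, dy)]]        # signal emitted, then decoded
        mx, my = (mx + step[0]) % GRID, (my + step[1]) % GRID
    return 0

def fitness(pair):
    female, male = pair
    return sum(episode(female, male) for _ in range(10))

pop = [(random_female(), random_male()) for _ in range(50)]
for gen in range(30):
    pop.sort(key=fitness, reverse=True)      # truncation selection
    survivors = pop[:25]
    children = []
    for f, m in survivors:
        f2, m2 = dict(f), dict(m)
        f2[random.choice(list(f2))] = random.randrange(2 ** SIGNAL_BITS)
        m2[random.choice(list(m2))] = random.choice(MOVES)
        children.append((f2, m2))            # one mutated child per survivor
    pop = survivors + children
print("best fitness:", fitness(pop[0]))

As in the original experiment, the female's signals are useful only insofar as the male's interpretation co-adapts to them, so the code and its reading have to evolve together.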
Since mating is a fundamental constituent in evolution, an interesting question is
whether it is indeed a possible origin for communication. In particular, proper mate
selection may guide evolution towards more effective mating and higher-quality offspring.
In the simplest case, mate selection may be based on direct visible features and displays
such as size, color, or strength. In higher animals, it is often based on communication, i.e.
vocalizations or ritualized movements and gestures. Such signals can be interpreted as
indicators of traits, making it possible to decide whether the potential mate is compatible.
Once communication evolved to serve mate selection, it may have been exapted, or reused
and adapted, for other tasks, eventually forming a basis for protolanguage (Bickerton, 1990).

Figure 14.9: Evolution of communication code for mate selection and hunting. The agents were able to move in a simulated 1-D world where their fitness depended on successful mating and hunting. (a) Each agent in the population is controlled by an evolved neural network that receives the current task (either mate selection or hunting), the distance to the prey, and the message from the other agent as its input. At its output it decides to mate or move and generates a message that the other agents can use to decide whether to mate or whether to coordinate prey capture. For mating to be successful, the agents need to be compatible; compatibility is determined by an inherited 2-bit trait. For prey capture to be successful, they need to step on it at the same time. (b) Over evolution, the agents discover a messaging code that allows them to communicate their trait and their current distance to the prey effectively to other agents. It turns out that if mate selection is evolved first, instead of evolving prey capture first or at the same time, the agents develop a more effective and parsimonious code for both tasks. This result suggests that communication may have originally evolved for mate selection, and later adapted to other uses.
Such a possibility can be investigated in neuroevolution simulations (Rawal, Boughman,
and Miikkulainen, 2014). In a simulated world, individuals were controlled by neural
networks, and they each had a two-bit trait encoding that determined their compatibility
with other individuals (figure 14.9). Each network output a two-bit message, as well as control signals for whether to mate and whether to move. As input, it received a two-bit message, the distance to the prey, and a bit indicating whether the agent was in a mating or hunting situation. The agents were then paired up in both of these tasks. In mating,
they communicated their trait to their partner and upon receiving the trait message from
their partner, decided whether to mate; if they mated when the traits were compatible, they
received a high fitness. In hunting, they had to move closer to the prey at each step, and
also communicate to their partner whether they were one step away from the prey; if they
entered the prey location at the same time, they received a high fitness.
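The agent interface in this simulation can be sketched roughly as follows. The layer sizes, the thresholded outputs, and the compatibility test (traits must match exactly) are illustrative assumptions rather than details of the original study; the point is how the inherited two-bit trait, the task bit, the distance input, and the two-bit message fit together in a single evolved network.

import numpy as np

# Rough sketch of an agent in the mate-selection/hunting simulation.
# Input:  [task bit, normalized distance to prey, incoming 2-bit message]
# Output: [mate?, move?, outgoing 2-bit message]
class Agent:
    def __init__(self, rng, hidden=8):
        self.trait = rng.integers(0, 2, size=2)      # inherited 2-bit trait
        self.w1 = rng.normal(0, 1, (hidden, 4))      # weights would be evolved
        self.w2 = rng.normal(0, 1, (4, hidden))

    def act(self, task, distance, message):
        x = np.concatenate(([task, distance], message))
        h = np.tanh(self.w1 @ x)
        y = (self.w2 @ h) > 0.0                      # thresholded outputs
        return y[0], y[1], y[2:4].astype(float)      # mate, move, out_msg

def mating_trial(a, b):
    # Each agent hears the other's message and decides whether to mate;
    # mating is rewarded only when the inherited traits are compatible
    # (here, simply equal).
    msg_a = a.act(task=0, distance=0.0, message=np.zeros(2))[2]
    msg_b = b.act(task=0, distance=0.0, message=np.zeros(2))[2]
    mate_a = a.act(0, 0.0, msg_b)[0]
    mate_b = b.act(0, 0.0, msg_a)[0]
    compatible = np.array_equal(a.trait, b.trait)
    return 1.0 if (mate_a and mate_b and compatible) else 0.0

rng = np.random.default_rng(0)
a, b = Agent(rng), Agent(rng)
print("mating fitness:", mating_trial(a, b))

A hunting trial would use the same network with the task bit set and the distance input active, which is what allows a code evolved for one task to be reused for the other.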
In a series of experiments, it turned out that if mate selection was evolved first, and
hunting was then added as a second task, the agents evolved successful behavior in both
tasks much faster than when the tasks were introduced in the opposite order, or both at
once. In other words, the code evolved for mate selection served as a better foundation
for a code needed for hunting than the other way around. The mate-selection code was
simpler, and it was possible to complexify it to add hunting. Such incremental evolution
was also more efficient than trying to evolve both behaviors at once. The final code used
fewer symbols, and for instance, the message to indicate readiness to mate was often
reused to indicate readiness for prey capture. The mate-selection code thus served as an effective stepping stone
for evolving complex behavior. These simulations suggest that communication may
have evolved incrementally through stepping stones, and mate selection is a plausible
origin for that process.
One fundamental aspect that is missing from such simulations is that the communication
codes in nature are usually not innate, but are learned during the early life of the individual.
That is, it is the ability to learn the code that is evolved. It is possible to extend language evolution simulations to such a setting as well (X. Li and Miikkulainen, 2016).
As in prior simulations, the agents were paired up in trials, and had to cooperate in order
to hunt or mate successfully. Each generation began with a parenting phase: The newly
generated offspring were paired up with their parents, and learned to be successful in the
necessary communication through reinforcement learning. Next, all agents were paired
up randomly in a socializing phase, and their overall fitness was measured. Finally, the
most successful agents became parents for the next generation. In this manner, it was
possible to evolve successful behavior for both tasks through a communication code that
was evolved over multiple generations and learned by each individual in each generation.
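The overall structure of this evolve-then-learn setup can be outlined as a skeleton like the one below. The function bodies are placeholders (the reinforcement learning rule and the paired trials are only stubbed out, and the names are assumptions), but the three phases of each generation, parenting, socializing, and selection, appear explicitly.

import random

def make_agent():
    # Placeholder genome; in the actual simulation this would encode a network.
    return {"weights": [random.gauss(0, 1) for _ in range(16)]}

def learn_from_parent(child, parent, trials=10):
    # Placeholder for the parenting phase: the child would adapt its
    # communication behavior to the parent's code by reinforcement learning.
    for _ in range(trials):
        pass
    return child

def joint_fitness(a, b):
    # Placeholder for paired hunting/mating trials that require communication.
    return random.random()

def evolve(generations=5, pop_size=20):
    parents = [make_agent() for _ in range(pop_size)]
    for g in range(generations):
        # Parenting phase: each new offspring learns the code from a parent.
        offspring = [learn_from_parent(make_agent(), p) for p in parents]
        # Socializing phase: random pairings determine overall fitness.
        random.shuffle(offspring)
        scored = []
        for a, b in zip(offspring[0::2], offspring[1::2]):
            f = joint_fitness(a, b)
            scored += [(f, a), (f, b)]
        # Selection: the most successful agents become the next parents.
        scored.sort(key=lambda fa: fa[0], reverse=True)
        parents = [agent for _, agent in scored[:pop_size // 2]] * 2
    return parents

evolve()

The key design choice is that what is inherited is the capacity to acquire the code, not the code itself; the code has to be re-learned by every generation in the parenting phase.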
The simulation could then be used to further understand the pressures that cause
communication to evolve. For the hunting and mating to be successful, both partners had
to be ready for it. The agents could either sense that readiness directly or communicate it.
By enabling and disabling such sensing and communication channels, it was possible to
make communication necessary or optional.
It turned out that if the agents could sense readiness directly, communication did not
evolve, even when communication channels were available. Evolution thus discovered
the simplest and most reliable way to be successful. However, if one or both readiness
senses were disabled, communication did evolve. This result makes sense: without
communication they would be successful only randomly, and there was thus a strong
pressure to take advantage of communication-based coordination. Most interestingly, if
communication evolved for one of the tasks, it was also utilized in the other, even if it was
not necessary for it. That is, if a communication ability is available, evolution will utilize
it.
Evolution of communication and language may thus follow a similar process as many
other innovations: evolution is a tinkerer, and will adapt whatever abilities exist to other
uses. Communication may be one such general ability that originated from a fundamental
need, e.g. mate selection, and was then exapted to others. Would it be possible to
make the transition from signaling with single symbols to communication with linguistic
structures in this way? Possibilities are discussed in the next section.
14.8.3 Evolution of Structured Language
Evolution of language is difficult to study in biology because there is no fossil record of it and few other clues about how human ancestors communicated. Consequently, there are many
theories about it, and they tend to be philosophical in nature. However, one significant
tool we have at our disposal is computational modeling. It may be possible to gain insight
into the conditions under which language evolves by building simulations.
Many computational approaches have indeed been developed using different techniques
(K. Wagner, Reggia, Uriagereka, et al., 2003). Rather than evolution, many of them focus
on the emergence of language. That is, they do not aim to model multiple generations of
agents, but rather how communication can emerge in small groups of agents, sometimes
even just two. They do, however, demonstrate discovery of some linguistic structure, not
simply signaling between agents.
One approach is agent-based modeling, which may even involve physical robots (Kirby,
Griffiths, and K. Smith, 2014; Steels, 2016). They take on the roles of a teacher and
learner, and language emerges in order to perform a joint task. The signals not only
combine into larger structures, but they also have a grounding, i.e. a semantic system
emerges. In a larger group, iterated learning may be established, where the language is
taught by individuals who learned it themselves earlier.
Mathematical modeling based on game theory has also provided interesting insights
(Nowak and Krakauer, 1999). When the game focuses on establishing reliable communication, it turns out that words emerge from signaling, and grammar emerges from words, as a way to compensate for errors that are likely to arise in the communication medium.
Neural networks have also been used as an implementation for language agents in many
studies (Batali, 1998; Galke, Ram, and Raviv, 2022). Most often, they use recurrent or LSTM networks to input and output language, and a reinforcement learning mechanism such as REINFORCE to adapt. While compositional structures do emerge, they still do not match human languages well. It is possible that further cognitive constraints, such as memory and alternation of speaker and listener roles, are needed.
Evolutionary computing models constitute a fourth category of approaches. For
instance, grammars can be evolved directly and compositionality discovered in service
of a task (Zuidema and Hogeweg, 2000). It is also possible to apply evolution to neural
networks that generate the language. This kind of approach fits the problem most naturally:
The ability for language is evolved over generations of a large number of individuals, and
each individual learns the particular language during their lifetime.
While it is easy to discover communication through signaling in this manner (as was
reviewed above), it is much harder to discover compositionality, i.e. linguistic structure.
However, there was some progress even early on. For instance, in an artificial
environment with poisonous and edible mushrooms, neuroevolution discovered a signaling
system that allowed the individuals to guide others to edible ones while avoiding poisonous
ones (Cangelosi, 1999; Cangelosi and Parisi, 1998). Significantly, the system consisted of
pairs of symbols signifying action and object. The offspring then learned the particular
symbols through backpropagation. In this manner, a rudimentary grammatical structure
evolved, and it is strikingly similar to the structures that can be taught to e.g. chimpanzees.
Perhaps such a capability is the first step towards the evolution of human language?
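A toy version of such an action-object code can be written down directly. In the sketch below, a fixed "parent" mapping from mushroom features to a two-symbol (action, object) message is imitated by an offspring trained with a simple gradient rule; the feature and symbol sizes are assumptions, and the delta-rule update merely stands in for the backpropagation learning used in the original work.

import numpy as np

# Toy action-object signaling: the message is a pair (action symbol, object symbol).
rng = np.random.default_rng(1)
N_FEATURES, N_ACTIONS, N_OBJECTS = 6, 2, 4   # e.g. approach/avoid, four mushroom types

# "Parent" code: a fixed (but consistent) mapping from features to symbols.
W_parent = rng.normal(0, 1, (N_ACTIONS + N_OBJECTS, N_FEATURES))

def parent_message(features):
    logits = W_parent @ features
    return np.argmax(logits[:N_ACTIONS]), np.argmax(logits[N_ACTIONS:])

# "Offspring": a linear model that learns to imitate the parent's messages.
W_child = np.zeros_like(W_parent)
lr = 0.1
for step in range(2000):
    x = rng.normal(0, 1, N_FEATURES)
    action, obj = parent_message(x)
    target = np.zeros(N_ACTIONS + N_OBJECTS)
    target[action] = 1.0
    target[N_ACTIONS + obj] = 1.0
    pred = 1.0 / (1.0 + np.exp(-(W_child @ x)))      # sigmoid outputs
    W_child += lr * np.outer(target - pred, x)       # delta-rule update

# Compare the parent's and offspring's messages for a new mushroom.
x = rng.normal(0, 1, N_FEATURES)
logits = W_child @ x
print("parent:", parent_message(x))
print("child :", (int(np.argmax(logits[:N_ACTIONS])), int(np.argmax(logits[N_ACTIONS:]))))

Separating the action slot from the object slot is what makes the code a rudimentary grammar rather than a flat inventory of signals.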
From such a starting point, why did language evolve only in humans? It is possible that
the origin of language is not in communication, but in cognition. That is, while it is possible
to build such a simple action-object protolanguage by complexifying signaling, perhaps
true linguistic structure was discovered as an exaptation of other cognitive functions?
One theory is that language emerged as a useful tool in society, making it possible to
coordinate actions such as group hunting and group caring for the young when mothers were
needed for foraging and other activities. As these activities became more sophisticated,
it was necessary to understand that different individuals could take on different roles at different times, and how these roles might relate; in other words, flexible relational structures similar to grammatical structures. Once this structure was in place in the brain, it was exapted to enhance communication, and eventually, structured language emerged.

Figure 14.10: A primary hypothesis is that language emerged at the same time as symbolic culture. Then again, we don’t really know. Figure by Essner (2021).
However, many other animals live in societies as well, and hunt in groups, and care
for the young together (for instance, the hyenas discussed above). There was something
different about human societies that served as a stepping stone; and, again due to the lack of any kind of direct evidence, there are many theories about what that might be (Bickerton,
2007; Corballis, 2011; Knight and Power, 2012). One theory is that as humans became
the apex scavenger, they needed to communicate the type and location of the kill. First,
this would be done iconically, but gradually with a displacement in time and space,
which may have led to the abstraction needed for language. Another is that alliances and
cliques formed in societies when members wanted to dominate other members, and their
maintenance required language. Gossip has also been suggested as a potential source, replacing or adding to physical grooming. A plausible explanation is that language emerged as a result of (or together with) symbolic culture, for which there is some evidence
in early objects and paintings (figure 14.10). As societies grew more complex, rules were
established for them to function better; symbolic representations and displacement made
them possible, forming an impetus for language.
The time may now be right to start evaluating these hypotheses in computational
neuroevolution simulations. There is enough computing power and sophistication to
create virtual worlds where many of these conditions and constraints can be simulated.
The neural networks would have to be much more complex and able to perform many
different tasks, but such an ability is also now emerging, as reviewed in this book. It is
also possible to build up the simulations and hypotheses gradually from simple to more
complex ones, and gain insight along the way. Neuroevolution is uniquely well-suited to
meeting these challenges, and may form a crucial ingredient in developing a theory of
how language evolved, which is one of the most fascinating and perplexing questions in
science.
14.9 Chapter Review Questions
1. Neural Structure and Evolutionary Origins: How can neuroevolution simulations help us understand the evolutionary origins of specific neural structures, such as command neurons, and their role in behaviors like navigation and foraging?

2. Central Pattern Generators (CPGs): What are central pattern generators (CPGs), and how have neuroevolution experiments been used to model their role in controlling locomotion in animals, such as lampreys and salamanders?

3. Modularity and Wiring Length: How does the principle of minimizing wiring length contribute to the evolution of modular neural networks? Why does modularity lead to better performance and adaptability in evolving neural systems?

4. Neuromodulation: What role does neuromodulation play in adapting neural behavior? How does neuroevolution demonstrate its utility in tasks like the T-maze navigation?

5. Synergistic Development: How does the concept of synergistic development explain the interplay between genetic biases and lifetime learning? How have neuroevolution experiments demonstrated this principle in tasks such as foraging or pattern recognition?

6. Constrained Evolution of Behavior: How do body and environmental constraints influence the evolution of believable and natural behaviors in simulated agents, as demonstrated in fight-or-flight behavior evolution?

7. Human-like Behavior in AI: What role did performance constraints (e.g., limited accuracy, multitasking, and behavioral variability) play in evolving AI bots that were indistinguishable from human players in the Botprize competition?

8. Evolutionary Breakthroughs in Social Behavior: How did neuroevolution simulations model the emergence of mobbing behavior in hyenas, and what stepping stones contributed to the evolution of this complex coordinated strategy?

9. Origins of Communication: In simulations of mate selection and hunting, how did evolving communication for one task (e.g., mating) serve as a foundation for communication in another task (e.g., hunting)?

10. Evolution of Language: What theories exist about the origins of language, and how might neuroevolution simulations contribute to understanding the conditions and stepping stones that enabled its emergence?
Chapter 15
Epilogue
The last decade or so has seen an expansion of AI that was unexpected and unprecedented.
Much of it was based on a few new neural network architectures, such as transformers,
diffusion networks, and adversarial networks. But much of it was also based on old ideas
that, with sufficient computation, started to work at a new scale. Despite all the progress
in the past several decades, this success was hardly predictable or guaranteed. Indeed,
scientific breakthroughs often emerge in unexpected areas.
Neuroevolution is closely related to these breakthrough areas, but distinctly different.
Indeed, it is at an interesting phase. As was the case with deep learning and generative
AI, there is a long history of progress and successes. Unlike in those other areas, there
is also an existence proof that it can lead to tremendous success: After all, biological
evolution successfully created complex and effective nervous systems. There are also
indications that neuroevolution and biology are connected: Neuroevolution experiments
have already replicated biological structures and biological behavior in many cases, giving
computational explanations on how they may arise.
One aspect that neuroevolution still has not leveraged to its full extent is computational
resources. To be sure, many experiments are run in parallel on hundreds of hosts, but that
is still orders of magnitude less than the compute that made LLMs and diffusion models
work. Interestingly, unlike other creative AI methods such as reinforcement learning,
neuroevolution is well-suited for such scale-up. Experiments can easily be parallelized
over millions of hosts, allowing them to harness processes that so far have not been
the mainstay of evolutionary computation but are fundamental in biology, such as large
populations, weak selection, neutral mutations, and deep time. The scale-up, together
with such untapped techniques, could lead to breakthroughs.
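The core of this argument is that fitness evaluations in an evolutionary algorithm are independent of one another and therefore embarrassingly parallel. The following sketch illustrates the pattern with Python's standard process pool and a placeholder fitness function; the same loop scales to a cluster scheduler with far more hosts.

from concurrent.futures import ProcessPoolExecutor
import random

def evaluate(genome):
    # Placeholder fitness: a real experiment would build a network from the
    # genome and run it through one or more episodes in the virtual world.
    return sum(g * g for g in genome)

def random_genome(n=100):
    return [random.uniform(-1, 1) for _ in range(n)]

if __name__ == "__main__":
    population = [random_genome() for _ in range(1000)]
    # Each evaluation is independent, so the whole generation can be farmed
    # out to as many workers (or hosts) as are available.
    with ProcessPoolExecutor() as pool:
        fitnesses = list(pool.map(evaluate, population))
    print("best fitness:", max(fitnesses))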
For such experiments to create intelligent agents, it will be necessary to create
more complex and comprehensive virtual worlds than we have today. Such simulated
environments play a role similar to the vast amounts of text that became available and
made it possible to train LLMs with human knowledge. The simulations could be based
on first principles of physics, but also include phenomenological components, i.e. those
that are trained with data from the real world. Such components may be necessary to
simulate high-level behavior, phenomena, and societies, which do not readily arise from
first principles. In particular, LLMs could be used to create a level of human-like agents for
the environment, allowing neuroevolution to solve problems at the same level. Significant
computation will be required, but it should become available in the near future, and we
should be ready for it.
With such environments, it may be possible to use neuroevolution to create brain-like
complexity. It could result in a runaway evolution not unlike that seen in actual brain
evolution: Sufficient compute makes it possible to discover increasingly complex stepping
stones, which then lead to a series of expansions in the capabilities of the agents. Such
computational models may allow us to better understand biological evolution and the
resulting complex brain structures and behavior. It may also make it possible to construct
agents with general, grounded intelligence, which can act as relatable, believable, and
trustworthy assistants and companions to humans. With this approach, it may be possible
to optimize AI construction, improving decision-making in society and quality of life in
general.
As described in this book, the past three decades have brought us within striking distance of this goal. The next decade or so may allow us to realize it. Let's go do it!
References
Abelsson, Anna and Anna Willman (2020). łEthics and Aesthetics in Injection Treatments
with Botox and Fillerž. In: Journal of Women & Aging, pp. 1ś13. (Link).
Achiam, Josh et al. (2023). łGPT-4 Technical Reportž. In: arXiv:2303.08774.
(Link).
Adami, Christoph, Jory Schossau, and Arend Hintze (2016). łEvolutionary Game Theory
Using Agent-based Methodsž. In: Physics of Life Reviews 19, pp. 1ś26. (Link).
Agogino, Adrian, Kenneth O. Stanley, and Risto Miikkulainen (2000). łOnline Interactive
Neuro-evolutionž. In: Neural Processing Letters 11, pp. 29ś38. (Link).
Agogino, Adrian, Kagan Tumer, and Risto Miikkulainen (2005). łEfficient Credit Assign-
ment Through Evaluation Function Decompositionž. In: GECCO’05: Proceedings of
the 7th Annual Conference on Genetic and Evolutionary Computation, pp. 1309ś1316.
(Link).
Agüera y Arcas, Blaise (2025). What Is Intelligence? Lessons from AI About Evolution,
Computing, and Minds. Cambridge, MA: MIT Press. (Link).
Agüera y Arcas, Blaise, Jyrki Alakuijala, James Evans, Ben Laurie, Alexander Mordvintsev,
Eyvind Niklasson, Ettore Randazzo, and Luca Versari (2024). łComputational Life:
How Well-formed, Self-replicating Programs Emerge from Simple Interactionž. In:
arXiv:2406.19108. (Link).
Aharonov-Barki, Ranit, Tuvik Beker, and Eytan Ruppin (2001). łEmergence of Memory-
driven Command Neurons in Evolved Artificial Agentsž. In: Neural Computation 13,
pp. 691ś716. (Link).
Akiba, Takuya, Makoto Shing, Yujin Tang, Qi Sun, and David Ha (2025). łEvolutionary
Optimization of Model Merging Recipesž. In: Nature Machine Intelligence 7, pp. 195ś
204. (Link).
Akopyan, Filipp, Jun Sawada, Andrew Cassidy, Rodrigo Alvarez-Icaza, John Arthur,
Paul Merolla, Nabil Imam, Yutaka Nakamura, Pallab Datta, Gi-Joon Nam, Brian
Taba, Michael Beakes, Bernard Brezzo, Jente B. Kuang, Rajit Manohar, William P.
Risk, Bryan Jackson, and Dharmendra S. Modha (2015). łTrueNorth: Design and
Tool Flow of a 65 mW 1 Million Neuron Programmable Neurosynaptic Chipž. In:
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 34,
pp. 1537ś1557. (Link).
Alden, Matthew, Aard-Jan van Kesteren, and Risto Miikkulainen (2002). łEugenic
Evolution Utilizing a Domain Modelž. In: GECCO’02: Proceedings of the 4th Annual
Conference on Genetic and Evolutionary Computation, pp. 279ś286. (Link).
Alden, Matthew and Risto Miikkulainen (2016). łMARLEDA: Effective Distribution
Estimation through Markov Random Fieldsž. In: Theoretical Computer Science 633,
pp. 4ś18. (Link).
Anil, Rohan et al. (2023). łPaLM 2 Technical Reportž. In: arXiv:2305.10403.
(Link).
Ð
(2025). łGemini: A Family of Highly Capable Multimodal Modelsž. In: arXiv:2312.11805.
(Link).
Anthropic (2025a). Introducing Claude 4. https://www.anthropic.com/news/claude-4.
Retrieved 8/31/2025.
Ð
(2025b). System Card: Claude Opus 4 & Claude Sonnet 4. https://www-cdn.anthropic.
com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf. Retrieved 8/31/2025.
Arjovsky, Martin, Soumith Chintala, and Léon Bottou (2017). łWasserstein Generative
Adversarial Networksž. In: Proceedings of the 34th International Conference on
Machine Learning. Vol. 70, pp. 214ś223. (Link).
Arsiwala, Shehnaz Z. (2018). łTrends for Facial Injectable Therapies in Medical Aesthet-
icsž. In: Journal of Cutaneous and Aesthetic Surgery 11, pp. 45ś46. (Link).
Assunção, Filipe, Nuno Lourenço, Bernardete Ribeiro, and Penousal Machado (2021).
łFast-DENSER: Fast deep evolutionary network structured representationž. In: Soft-
wareX 14, p. 100694. (Link).
Awad, Noor, Neeratyoy Mallik, and Frank Hutter (2020). łDifferential Evolution for
Neural Architecture Searchž. In: Proceedings of the Workshop on Neural Architecture
Search, Eighth International Conference on Learning Representations. (Link).
Bai, Jinze et al. (2023). łQwen Technical Reportž. In: arXiv:2309.16609.
(Link).
Baluja, Shumeet and Rich A. Caruana (1995). łRemoving the Genetics from the Standard
Genetic Algorithmž. In: Proceedings of the 12th International Conference on Machine
Learning, pp. 38ś46. (Link).
Banzhaf, Wolfgang, Peter Nordin, Robert E. Keller, and Frank D. Francone (1998). Genetic
Programming: An Introduction. San Francisco: Kaufmann. (Link).
Batali, John (1998). łComputational Simulations of the Emergence of Grammarž. In:
Approaches to the Evolution of Language: Social and Cognitive Bases. Ed. by James R.
Hurford, Michael Studdert-Kennedy, and Chris Knight. Cambridge, UK: Cambridge
University Press, pp. 405ś426.
Baxter, Jared A., Daniel A. Merced, Daniel J. Costinett, Leon M. Tolbert, and Burak
Ozpineci (2018). łReview of Electrical Architectures and Power Requirements for
Automated Vehiclesž. In: IEEE Transportation Electrification Conference and Expo,
pp. 944ś949. (Link).
Beane, Wendy Scott, Junji Morokuma, Joan M. Lemire, and Michael Levin (2013).
łBioelectric Signaling Regulates Head and Organ Size during Planarian Regenerationž.
In: Development 140.2, pp. 313ś322. (Link).
Beer, Randall D., Hillel J. Chiel, and John C. Gallagher (1999). łEvolution and Analysis
of Model CPGs for Walking: II. General Principles and Individual Variabilityž. In:
Journal of Computational Neuroscience 7, pp. 119ś147. (Link).
Belew, Richard K. (1990). łEvolution, Learning and Culture: Computational Metaphors
for Adaptive Algorithmsž. In: Complex Systems 4, pp. 11ś49. (Link).
Belew, Richard K., John McInerney, and Nicol N. Schraudolph (1992). łEvolving Networks:
Using the Genetic Algorithm with Connectionist Learningž. In: Artificial Life II. Ed. by
Christopher G. Langton, Charles Taylor, J. Doyne Farmer, and Steen Rasmussen.
Vol. 10. Redwood City, CA: Addison-Wesley, pp. 511ś547. (Link).
Ben-Iwhiwhu, Eseoghene, Pawel Ladosz, Jeffery Dick, Wen-Hua Chen, Praveen Pilly,
and Andrea Soltoggio (2020). łEvolving Inborn Knowledge for Fast Adaptation in
Dynamic POMDP Problemsž. In: GECCO’20: Proceedings of the 2020 Genetic and
Evolutionary Computation Conference, pp. 280ś288. (Link).
Benson-Amram, Sarah and Kay E. Holekamp (2012). łInnovative Problem Solving
by Wild Spotted Hyenasž. In: Proceedings of the Royal Society of London B 279,
pp. 4087ś4095. (Link).
Bickerton, Derek (1990). Language and Species. Chicago, IL: The University of Chicago
Press. (Link).
Ð
(2007). łLanguage Evolution: A Brief Guide for Linguistsž. In: Lingua 117, pp. 510ś
526. (Link).
Bickerton, Derek and Eörs Szathmáry (2011). łConfrontational Scavenging as a Possible
Source for Language and Cooperationž. In: BMC Evolutionary Biology 11, pp. 261ś
261. (Link).
Bindra, Dalbir, Francine G. Patterson, Herbert S. Terrace, Laura A. Petitto, Richard J.
Sanders, and Thomas G. Bever (1981). łApe Languagež. In: Science, pp. 86ś88.
(Link).
Bingham, Garrett, William Macke, and Risto Miikkulainen (2020). łEvolutionary Opti-
mization of Deep Learning Activation Functionsž. In: GECCO’20: Proceedings of the
2020 Genetic and Evolutionary Computation Conference, pp. 289ś296. (Link).
Bingham, Garrett and Risto Miikkulainen (2022). łDiscovering Parametric Activation
Functionsž. In: Neural Networks 148, pp. 48ś65. (Link).
Ð
(2023a). łAutoInit: Analytic Signal-Preserving Weight Initialization for Neural Net-
worksž. In: Proceedings of the AAAI Conference on Artificial Intelligence, 37, pp. 6823ś
6833. (Link).
Ð
(2023b). łEfficient Activation Function Optimization through Surrogate Modelingž.
In: Advances in Neural Information Processing Systems 36. (Link).
Bishop, Christopher M. and Hugh Bishop (2024). Deep Learning: Foundations and
Concepts. New York: Springer. (Link).
Blount, Zachary D., Christina Z. Borland, and Richard E. Lenski (2008). łHistorical
Contingency and the Evolution of a Key Innovation in an Experimental Population
of Escherichia Coliž. In: Proceedings of the National Academy of Sciences 105.23,
pp. 7899ś7906. (Link).
Bongard, Josh C. (2011). łMorphological Change in Machines Accelerates the Evolution
of Robust Behaviorž. In: Proceedings of the National Academy of Sciences 108,
pp. 1234ś1239. (Link).
Ð
(2013). łEvolutionary Roboticsž. In: Communications of the ACM 56, pp. 74ś83.
(Link).
Bongard, Josh C. and Rolf Pfeifer (2001). łRepeated Structure and Dissociation of
Genotypic and Phenotypic Complexity in Artificial Ontogenyž. In: GECCO’01:
Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation,
pp. 829ś836. (Link).
Bontrager, Philip, Wending Lin, Julian Togelius, and Sebastian Risi (2018). łDeep Interac-
tive Evolutionž. In: Proceedings of the 7th International Conference on Computational
Intelligence in Music, Sound, Art and Design, pp. 267ś282. (Link).
Bontrager, Philip, Aditi Roy, Julian Togelius, Nasir Memon, and Arun Ross (2018).
łDeepMasterPrints: Generating Masterprints for Dictionary Attacks via Latent Variable
Evolutionž. In: IEEE International Conference on Biometrics Theory, Applications
and Systems. IEEE. (Link).
Brock, Andrew, Theodore Lim, James M. Ritchie, and Nick Weston (2018). łSMASH:
One-Shot Model Architecture Search through HyperNetworksž. In: Proceedings of the
Sixth International Conference on Learning Representations, pp. 2026ś2047. (Link).
Brockman, Greg, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie
Tang, and Wojciech Zaremba (2016). łOpenAI Gymž. In: arXiv:1606.01540. (Link).
Bruce, Joseph and Risto Miikkulainen (2001). łEvolving Populations of Expert Neural
Networksž. In: GECCO’01: Proceedings of the 3rd Annual Conference on Genetic
and Evolutionary Computation, pp. 251ś257. (Link).
Bryant, Bobby D. and Risto Miikkulainen (2006). łEvolving Stochastic Controller
Networks for Intelligent Game Agentsž. In: Proceedings of the IEEE Congress on
Evolutionary Computation, pp. 1007ś1014. (Link).
Ð
(2007). łAcquiring Visibly Intelligent Behavior with Example-Guided Neuroevolu-
tionž. In: Proceedings of the AAAI Conference on Artificial Intelligence, 22, pp. 801ś
808. (Link).
Ð
(2018). łA Neuroevolutionary Approach to Adaptive Multi-agent Teamsž. In: Founda-
tions of Trusted Autonomy. Ed. by Hussein A. Abbass, Jason Scholz, and Darry J. Reid.
New York: Springer, pp. 87ś114. (Link).
Buccino, Alessio P., Tanguy Damart, Julian Bartram, Darshan Mandge, Xiaohan Xue,
Mickael Zbili, Tobias Gänswein, Aurélien Jaquier, Vishalini Emmenegger, Henry
Markram, Andreas Hierlemann, and Werner Van Geit (2024). łA Multimodal Fitting
Approach to Construct Single-Neuron Models With Patch Clamp and High-Density
Microelectrode Arraysž. In: Neural Computation 36, pp. 1286ś1331. (Link).
Burt, D. Michael and David I. Perrett (1995). łPerception of Age in Adult Caucasian
Male Faces: Computer Graphic Manipulation of Shape and Colour Informationž. In:
Proceedings of the Royal Society of London. Series B: Biological Sciences 259.1355,
pp. 137ś143. (Link).
Busoniu, Lucian, Robert Babuska, and Bart De Schutter (2008). łA Comprehensive Survey
of Multiagent Reinforcement Learningž. In: IEEE Transactions on Systems, Man, and
Cybernetics, Part C (Applications and Reviews) 38.2, pp. 156ś172. (Link).
Buzsáki, György (2006). Rhythms of the Brain. Oxford, UK: Oxford University Press.
(Link).
Cangelosi, Angelo (1999). łEvolution of Communication Using Symbol Combination in
Populations of Neural Networksž. In: Proceedings of the International Joint Conference
on Neural Networks, pp. 4365ś4368. (Link).
Cangelosi, Angelo and Domenico Parisi (1998). łThe Emergence of a 'Language' in an
Evolving Population of Neural Networksž. In: Connection Science 10, pp. 83ś97.
(Link).
Cardamone, Luigi, Daniele Loiacono, and Pier L. Lanzi (2009). łOn-line Neuroevolution
Applied to the Open Racing Car Simulatorž. In: Proceedings of the IEEE Congress on
Evolutionary Computation, pp. 2622ś2629. (Link).
Caruana, Rich A. (1997). łMultitask Learningž. In: Machine Learning 28, pp. 41ś75.
(Link).
Center for Disease Control and Prevention (2023). COVID-19 Data Sources. https://archive.
cdc.gov/#/details?url=https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covid-
19-data-sources.html. Retrieved 8/31/2025.
Cha, Stephen, Taehyeon Kim, Hayeon Lee, and Se-Young Yun (2023). łA Survey of
Supernet Optimization and its Applications: Spatial and Temporal Optimization for
Neural Architecture Searchž. In: arXiv:2204.03916. (Link).
Chankong, Vira and Yacov Y. Haimes (2008). Multiobjective Decision Making: Theory
and Methodology. Courier Dover Publications. (Link).
Chebykin, Alexander, Tanja Alderliesten, and Peter A. N. Bosman (2022). łEvolutionary
neural cascade search across supernetworksž. In: GECCO’22: Proceedings of the
Genetic and Evolutionary Computation Conference, pp. 1038ś1047. (Link).
Chellapilla, Kumar and David B. Fogel (1999). łEvolution, Neural Networks, Games, and
Intelligencež. In: Proceedings of the IEEE 87, pp. 1471ś1496. (Link).
Chemla, Sandrine and Frédéric Chavane (2010). łVoltage-sensitive Dye Imaging: Tech-
nique Review and Modelsž. In: Journal of Physiology-Paris 104, pp. 40ś50. (Link).
Chen, Lili, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin,
Pieter Abbeel, Aravind Srinivas, and Igor Mordatch (2021). łDecision Transformer:
Reinforcement Learning via Sequence Modelingž. In: Advances in Neural Information
Processing Systems 34, pp. 15084ś15097. (Link).
Cheney, Nick, Josh C. Bongard, Vytas SunSpiral, and Hod Lipson (2018). łScalable
Co-Optimization of Morphology and Control in Embodied Machinesž. In: Journal of
the Royal Society Interface 15. Article 20170937. (Link).
Cheney, Nick, Robert MacCurdy, Jeff Clune, and Hod Lipson (2014). łUnshackling
Evolution: Evolving Soft Robots with Multiple Materials and a Powerful Generative
Encodingž. In: ACM SIGEVOlution 7.1, pp. 11ś23. (Link).
Chevalier-Boisvert, Maxime, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan
Saharia, Thien H. Nguyen, and Yoshua Bengio (2019). łBabyAI: A Platform to
Study the Sample Efficiency of Grounded Language Learningž. In: Proceedings of
the Seventh International Conference on Learning Representations, pp. 4429ś4447.
(Link).
Chiel, Hillel J., Randall D. Beer, and John C. Gallagher (1999). łEvolution and Analysis
of Model CPGs for Walking: I. Dynamical Modulesž. In: Journal of Computational
Neuroscience 7, pp. 99ś118. (Link).
Chomsky, Noam (1986). Knowledge of Language: Its Nature, Origin, and Use. Greenwood
Publishing Group. (Link).
Chung, Junyoung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio (2014). łEm-
pirical Evaluation of Gated Recurrent Neural Networks on Sequence Modelingž. In:
Deep Learning Workshop, 28th Annual Conference on Neural Information Processing
Systems. (Link).
Cliff, Dave, Inman Harvey, and Philip Husbands (1993). łExplorations in Evolutionary
Roboticsž. In: Adaptive Behavior 2, pp. 73ś110. (Link).
Clune, Jeff, Benjamin E. Beckmann, Robert T. Pennock, and Charles Ofria (2011). łHybrID:
A Hybridization of Indirect and Direct Encodings for Evolutionary Computationž. In:
Advances in Artificial Life: Darwin Meets von Neumann, 10th European Conference.
Ed. by George Kampis, István Karsai, and Eörs Szathmáry. New York: Springer,
pp. 134ś141. (Link).
Clune, Jeff and Hod Lipson (2011). łEvolving Three-dimensional Objects with a Generative
Encoding Inspired by Developmental Biologyž. In: ECAL 2011: The 11th European
Conference on Artificial Life, p. 24. (Link).
Clune, Jeff, Jean-Baptiste Mouret, and Hod Lipson (2013). łThe Evolutionary Origins
of Modularityž. In: Proceedings of the Royal Society B: Biological Sciences 280,
p. 20122863. (Link).
Clune, Jeff, Kenneth O. Stanley, Robert T. Pennock, and Charles Ofria (2011). łOn the
Performance of Indirect Encoding Across the Continuum of Regularityž. In: IEEE
Transactions on Evolutionary Computation 15.3, pp. 346ś367. (Link).
Coello Coello, Carlos A., David A. Van Veldhuizen, and Gary B. Lamont (2007).
Evolutionary Algorithms for Solving Multi-Objective Problems. New York: Springer.
(Link).
Cognizant AI Lab (2023). Pandemic Response Challenge: Technical Setup, Assessment,
and Results. https://evolution.ml/xprize/. Retrieved 8/31/2025.
Colas, Cédric, Vashisht Madhavan, Joost Huizinga, and Jeff Clune (2020). łScaling MAP-
Elites to Deep Neuroevolutionž. In: GECCO’20: Proceedings of the 2020 Genetic
and Evolutionary Computation Conference, pp. 67ś75. (Link).
Coleman, Kristen (2019). Lophius Piscatorius, ADW. https://animaldiversity.org/accounts/
Lophius_piscatorius/. Retrieved 8/31/2025.
Collins, Francis S., Mark S. Guyer, and Aravinda Chakravarti (1997). łVariations on a
Theme: Cataloging Human DNA Sequence Variationž. In: Science 278.5343, pp. 1580ś
1581. (Link).
Combes, Dominique, Pierre Meyrand, and John Simmers (1999). łMotor Pattern Specifi-
cation by Dual Descending Pathways to a Lobster Rhythm-generating Networkž. In:
Journal of Neuroscience 19, pp. 2610ś2619. (Link).
Confavreux, Basile, Friedemann Zenke, Everton Agnes, Timothy Lillicrap, and Tim
Vogels (2020). łA Meta-learning Approach to (Re)discover Plasticity Rules That
Carve a Desired Function into a Neural Networkž. In: Advances in Neural Information
Processing Systems 33, pp. 16398ś16408. (Link).
Corballis, Michael C. (2011). The Recursive Mind: The Origins of Human Language,
Thought, and Civilization. Princeton, NJ: Princeton University Press. (Link).
Cully, Antoine, Jeff Clune, Danesh Tarapore, and Jean-Baptiste Mouret (2015). łRobots
That Can Adapt Like Animalsž. In: Nature 521, pp. 503ś507. (Link).
Cussat-Blanc, Sylvain, Kyle Harrington, and Wolfgang Banzhaf (2019). łArtificial gene
regulatory networksÐA reviewž. In: Artificial life 24, pp. 296ś328. (Link).
Cybenko, George (1989). łApproximation by Superpositions of a Sigmoidal Functionž.
In: Mathematics of Control, Signals, and Systems 2, pp. 303ś314. (Link).
D’Ambrosio, David B., Joel Lehman, Sebastian Risi, and Kenneth O. Stanley (2010).
łEvolving Policy Geometry for Scalable Multiagent Learningž. In: Proceedings of
the 9th International Conference on Autonomous Agents and Multiagent Systems,
pp. 731ś738. (Link).
D’Ambrosio, David B. and Kenneth O. Stanley (2008). łGenerative encoding for Multiagent
Learningž. In: GECCO’08: Proceedings of the 10th Annual Conference on Genetic
and Evolutionary Computation, pp. 819ś826. (Link).
Dai, Zihang, Hanxiao Liu, Quoc V. Le, and Mingxing Tan (2021a). łCoAtNet: Marrying
Convolution and Attention for All Data Sizesž. In: Advances in Neural Information
Processing Systems 34, pp. 3965ś3977. (Link).
Ð
(2021b). łCoAtNet: Marrying Convolution and Attention for All Data Sizesž. In:
Advances in Neural Information Processing Systems 34, pp. 3965ś3977. (Link).
Dasgupta, Dipankar and Douglas R. McGregor (1992). łDesigning Application-specific
Neural Networks Using the Structured Genetic Algorithmž. In: Proceedings of the
International Workshop on Combinations of Genetic Algorithms and Neural Networks,
pp. 87ś96. (Link).
Davies, Mike, Narayan Srinivasa, Tsung-Han Lin, Gautham Chinya, Yongqiang Cao, Sri
Harsha Choday, Georgios Dimou, Prasad Joshi, Nabil Imam, Shweta Jain, Yuyun Liao,
Chit-Kwan Lin, Andrew Lines, Ruokun Liu, Deepak Mathaikutty, Steven McCoy,
Arnab Paul, Jonathan Tse, Guruguhanathan Venkataramanan, Yi-Hsin Weng, Andreas
Wild, Yoonseok Yang, and Hong Wang (2018). łLoihi: A Neuromorphic Manycore
Processor with On-Chip Learningž. In: IEEE Micro 38, pp. 82ś99. (Link).
de Jong, Edwin D. and Jordan B. Pollack (2004). łIdeal Evaluation from Coevolutionž. In:
Evolutionary Computation 12, pp. 159ś192. (Link).
De Jong, Kenneth A. (1975). łAnalysis of the Behavior of a Class of Genetic Adaptive
Systemsž. PhD thesis. Ann Arbor, MI: The University of Michigan. (Link).
Ð
(2020). łEvolutionary Computation: A Unified Approachž. In: GECCO’20: Proceed-
ings of the 2020 Genetic and Evolutionary Computation Conference Companion,
pp. 327ś342. (Link).
Deb, Kalyanmoy and Himanshu Jain (2014). łAn Evolutionary Many-Objective Optimiza-
tion Algorithm Using Reference-Point-Based Nondominated Sorting Approach, Part
I: Solving Problems With Box Constraintsž. In: IEEE Transactions on Evolutionary
Computation 18, pp. 577ś601. (Link).
Deb, Kalyanmoy and Christie Myburgh (2017). łA Population-based Fast Algorithm
for a Billion-dimensional Resource Allocation Problem with Integer Variablesž. In:
European Journal of Operational Research 261, pp. 460ś474. (Link).
Deb, Kalyanmoy, Amrit Pratap, Sameer Agarwal, and T. Meyarivan (2002). łA Fast
and Elitist Multiobjective Genetic Algorithm: NSGA-IIž. In: IEEE Transactions on
Evolutionary Computation 6.2, pp. 182ś197. (Link).
Dellaert, Frank and Randall D. Beer (1994). łToward an Evolvable Model of Development
for Autonomous Agent Synthesisž. In: Artificial Life IV: Proceedings of the Fourth
International Workshop on the Synthesis and Simulation of Living Systems. Ed. by
Rodney A. Brooks and Pattie Maes. Cambridge, MA: MIT Press, pp. 246ś257. (Link).
Department of Energy (2019). Detecting Radiological Threats in Urban Areas. https://www.
topcoder.com/challenges/30085346. Retrieved 8/31/2025.
DiCaprio, Ralph A. (1990). łAn Interneurone Mediating Motor Programme Switching
in the Ventilatory System of the Crabž. In: Journal of Experimental Biology 154,
pp. 517ś535. (Link).
Dietterich, Thomas G. (2002). łEnsemble Learningž. In: The Handbook of Brain Theory
and Neural Networks. Ed. by Michael A. Arbib. Vol. 2. 1. Cambridge, MA: MIT press,
pp. 110ś125. (Link).
Doncieux, Stéphane, Nicolas Bredeche, Jean-Baptiste Mouret, and Agoston E. Eiben
(2015). łEvolutionary Robotics: What, Why, and Where tož. In: Frontiers in Robotics
and AI 2. Article 4. (Link).
Dong, Xuanyi and Yi Yang (2020). łNAS-Bench-201: Extending the Scope of Reproducible
Neural Architecture Searchž. In: Proceedings of the Eighth International Conference
on Learning Representations, pp. 11287ś11302. (Link).
Dorigo, Marco, Vittorio Maniezzo, and Alberto Colorni (1996). łAnt System: Optimization
by a Colony of Cooperating Agentsž. In: IEEE Transactions on Systems, Man, and
Cybernetics, Part B (Cybernetics) 26.1, pp. 29ś41. (Link).
Dorigo, Marco and Thomas Stützle (2010). łAnt Colony Optimization: Overview and
Recent Advancesž. In: Handbook of Metaheuristics. Ed. by Michel Gendreau and
Jean-Yves Potvin. Vol. 146. New York: Springer, pp. 227ś263. (Link).
Dorigo, Marco, Guy Theraulaz, and Vittorio Trianni (2021). łSwarm Robotics: Past,
Present, and Futurež. In: Proceedings of the IEEE 109.7, pp. 1152ś1165. (Link).
Doursat, René, Hiroki Sayama, and Olivier Michel (2013). łA Review of Morphogenetic
Engineeringž. In: Natural Computing 12, pp. 517ś535. (Link).
Druckmann, Shaul, Yoav Banitt, Albert Gidon, Felix Schürmann, Henry Markram,
and Idan Segev (2007). łA Novel Multiple Objective Optimization Framework for
Constraining Conductance-based Neuron Models by Experimental Dataž. In: Frontiers
of Neuroscience 1.1, pp. 7ś18. (Link).
Earle, Sam, Justin Snider, Matthew C. Fontaine, Stefanos Nikolaidis, and Julian Togelius
(2022). łIlluminating Diverse Neural Cellular Automata for Level Generationž. In:
GECCO’22: Proceedings of the Genetic and Evolutionary Computation Conference,
pp. 68ś76. (Link).
Edwards, Donald H., William J. Heitler, and Franklin B. Krasne (1999). łFifty Years of a
Command Neuron: The Neurobiology of Escape Behavior in the Crayfish.ž In: Trends
in Neuroscience 22, pp. 153ś161. (Link).
Eiben, Agoston E. and Selmar K. Smit (2011). łParameter Tuning for Configuring and
Analyzing Evolutionary Algorithmsž. In: Swarm and Evolutionary Computation 1.1,
pp. 19ś31. (Link).
Eiben, Agoston E. and James E. Smith (2015). Introduction to Evolutionary Computing.
New York: Springer. (Link).
Ellefsen, Kai Olav, Jean-Baptiste Mouret, and Jeff Clune (2015). łNeural Modularity
Helps Organisms Evolve to Learn New Skills without Forgetting Old Skillsž. In: PLoS
computational biology 11.4, e1004128. (Link).
Elman, Jeffrey L., Elizabeth A. Bates, Mark H. Johnson, Annette Karmiloff-Smith,
Domenico Parisi, and Kim Plunkett (1996). Rethinking Innateness: A Connectionist
Perspective on Development. Cambridge, MA: MIT Press. (Link).
ElSaid, AbdElRahman, Karl Ricanek, Zimeng Lyu, Alexander Ororbia, and Travis Desell
(2023). łBackpropagation-free 4D Continuous Ant-based Neural Topology Searchž.
In: Applied Soft Computing 147, p. 110737. (Link).
Elsken, Thomas, Jan H. Metzen, and Frank Hutter (2019). łNeural Architecture Search: A
Surveyž. In: Journal of Machine Learning Research 20, pp. 1ś21. (Link).
Essner, Timo (2021). Emojis. https://cartoonmovement.com/cartoon/emojis-0. Retrieved
8/31/25.
Fairey, Jason and Terence Soule (2014). łEvolution of Communication and Coopera-
tionž. In: GECCO’14: Proceedings of the 2014 Annual Conference on Genetic and
Evolutionary Computation, pp. 169ś176. (Link).
Faldor, Maxence, Jenny Zhang, Antoine Cully, and Jeff Clune (2025). łOMNI-EPIC:
Open-endedness via Models of Human Notions of Interestingness with Environments
Programmed in Codež. In: Proceedings of the Thirteenth International Conference on
Learning Representations, pp. 97357ś97482. (Link).
Fan, James, Raymond Lau, and Risto Miikkulainen (2003). łUtilizing Domain Knowledge
in Neuroevolutionž. In: Proceedings of the 20th International Conference on Machine
Learning, pp. 170ś177. (Link).
Fernando, Chrisantha, Dylan Banarse, Charles Blundell, Yori Zwols, David Ha, Andrei A.
Rusu, Alexander Pritzel, and Daan Wierstra (2017). łPathNet: Evolution Channels
Gradient Descent in Super Neural Networksž. In: arXiv:1701.08734. (Link).
Fernando, Chrisantha, Dylan Banarse, Henryk Michalewski, Simon Osindero, and Tim
Rocktäschel (2024). łPromptbreeder: Self-referential Self-improvement via Prompt
Evolutionž. In: Proceedings of the 41st International Conference on Machine Learning ,
pp. 13481ś13544. (Link).
Fernando, Chrisantha, Dylan Banarse, Malcolm Reynolds, Frederic Besse, David Pfau,
Max Jaderberg, Marc Lanctot, and Daan Wierstra (2016). łConvolution by Evolution:
Differentiable Pattern Producing Networksž. In: GECCO’16: Proceedings of the
Genetic and Evolutionary Computation Conference 2016, pp. 109ś116. (Link).
Fernando, Chrisantha, Jakub Sygnowski, Simon Osindero, Jane X. Wang, Tom Schaul,
Denis Teplyashin, Pablo Sprechmann, Alexander Pritzel, and Andrei A. Rusu (2018).
łMeta-learning by the Baldwin Effectž. In: GECCO’18: Proceedings of the Genetic
and Evolutionary Computation Conference Companion, pp. 1313ś1320. (Link).
Ficici, Sevan G. and Jordan B. Pollack (2001). łPareto Optimality in Coevolutionary
Learningž. In: Advances in Artificial Life: 6th European Conference. Ed. by Jozef
Kelemen and Petr Sosík. New York: Springer, pp. 316ś325. (Link).
Figueira Pujol, Joao Carlos and Riccardo Poli (1998). łEvolving the Topology and the
Weights of Neural Networks Using a Dual Representationž. In: Applied Intelligence 8,
pp. 73ś84. (Link).
Finn, Chelsea, Pieter Abbeel, and Sergey Levine (2017). łModel-agnostic Meta-learning
for Fast Adaptation of Deep Networksž. In: Proceedings of the 34th International
Conference on Machine Learning, pp. 1126ś1135. (Link).
Floreano, Dario, Peter Dürr, and Claudio Mattiussi (2008). łNeuroevolution: From
Architectures to Learningž. In: Evolutionary Intelligence 1, pp. 47ś62. (Link).
Floreano, Dario, Sara Mitri, Stéphane Magnenat, and Laurent Keller (2007). łEvolutionary
Conditions for the Emergence of Communication in Robotsž. In: Current Biology
17.6, pp. 514ś519. (Link).
Floreano, Dario and Francesco Mondada (1996a). łEvolution of Homing Navigation in a
Real Mobile Robotž. In: IEEE Transactions on Systems, Man, and Cybernetics 26,
pp. 396ś407. (Link).
Ð
(1996b). łEvolution of Plastic Neurocontrollers for Situated Agentsž. In: From Animals
to Animats 4: Proceedings of the International Conference on Simulation of Adaptive
Behavior, pp. 402ś410. (Link).
Floreano, Dario and Joseba Urzelai (1999). łEvolution of Neural Controllers with
Adaptive Synapses and Compact Genetic Encodingž. In: Advances in Artificial Life:
5th European Conference. Ed. by Dario Floreano, Jean-Daniel Nicoud, and Francesco
Mondada. New York: Springer, pp. 183ś194. (Link).
Ð
(2000). łEvolutionary Robots with On-Line Self-Organization and Behavioral Fitnessž.
In: Neural Networks 13, pp. 431ś4434. (Link).
Ð
(2001). łEvolution of Plastic Control Networksž. In: Autonomous robots 11, pp. 311ś
317. (Link).
Floridi, Luciano and Massimo Chiriatti (2020). łGPT-3: Its Nature, Scope, Limits, and
Consequencesž. In: Minds and Machines 30, pp. 681ś694. (Link).
Fogel, David B. (2001). Blondie24: Playing at the Edge of AI. San Francisco: Kaufmann.
(Link).
Ð
(2006). Evolutionary Computation: Toward a New Philosophy of Machine Intelligence.
Third. Piscataway, NJ: IEEE Press. (Link).
Fogel, David B., Lawrence J. Fogel, and Vincent W. Porto (1990). łEvolving Neural
Networksž. In: Biological Cybernetics 63.6, pp. 487ś493. (Link).
Fogel, David B., Timothy J. Hays, Sarah L. Hahn, and James Quon (2004). łA Self-
Learning Evolutionary Chess Programž. In: Proceedings of the IEEE 92, pp. 1947ś
1954. (Link).
Fogel, Lawrence J., Alvin J. Owens, and Michael J. Walsh (1966). Artificial Intelligence
through Simulated Evolution. New York: Wiley. (Link).
Fontaine, Matthew C. and Stefanos Nikolaidis (2021). łDifferentiable Quality Diversityž.
In: Advances in Neural Information Processing Systems 34, pp. 10040ś10052. (Link).
Ð
(2023). łCovariance Matrix Adaptation MAP-annealingž. In: GECCO’23: Proceedings
of the Genetic and Evolutionary Computation Conference, pp. 456ś465. (Link).
Fontaine, Matthew C., Julian Togelius, Stefanos Nikolaidis, and Amy K. Hoover (2020).
łCovariance Matrix Adaptation for the Rapid Illumination of Behavior Spacež. In:
GECCO’20: Proceedings of the 2020 Genetic and Evolutionary Computation Confer-
ence, pp. 94ś102. (Link).
Fox, Spencer J., Michael Lachmann, Mauricio Tec, Remy Pasco, Spencer Woody, Zhanwei
Du, Xutong Wang, Tanvi A. Ingle, Emily Javan, Maytal Dahan, Kelly Gaither, Mark E.
Escott, Stephen I. Adler, S. Claiborne Johnston, James G. Scott, and Lauren A. Meyers
(2022). łReal-time Pandemic Surveillance Using Hospital Admissions and Mobility
Dataž. In: Proceedings of the National Academy of Sciences 119, e2111870119. (Link).
Francon, Olivier (2025). Project Resilience Platform.
https://github.com/Project-Resilience/
platform. Retrieved 8/31/25.
Francon, Olivier, Santiago Gonzalez, Babak Hodjat, Elliot Meyerson, Risto Miikkulainen,
Xin Qiu, and Hormoz Shahrzad (2020). łEffective Reinforcement Learning through
Evolutionary Surrogate-Assisted Prescriptionž. In: GECCO’20: Proceedings of the
2020 Genetic and Evolutionary Computation Conference, pp. 814ś822. (Link).
Frankle, Jonathan and Michael Carbin (2019). łThe Lottery Ticket Hypothesis: Finding
Sparse, Trainable Neural Networksž. In: Proceedings of the Seventh International
Conference on Learning Representations, pp. 8954ś8995. (Link).
Friedlingstein, Pierre et al. (2023). łGlobal Carbon Budget 2023ž. In: Earth System
Science Data 15, pp. 5301ś5369. (Link).
Friedmann, Naama and Dana Rusou (2015). łCritical Period for First Language: The
Crucial Role of Language Input during the First Year of Lifež. In: Current Opinion in
Neurobiology 35, pp. 27ś34. (Link).
Fukushima, Kunihiko (1980). łNeocognitron: A Self-organizing Neural Network Model
for a Mechanism of Pattern Recognition Unaffected by Shift in Positionž. In: Biological
cybernetics 36.4, pp. 193ś202. (Link).
Fullmer, Brad and Risto Miikkulainen (1992). łUsing Marker-Based Genetic Encoding
of Neural Networks to Evolve Finite-State Behaviourž. In: Toward a Practice of
Autonomous Systems: Proceedings of the First European Conference on Artificial
Life. Ed. by Francisco J. Varela and Paul Bourgine. Cambridge, MA: MIT Press,
pp. 255ś262. (Link).
Gad, Ahmed G. (2022). łParticle Swarm Optimization Algorithm and Its Applications:
A Systematic Reviewž. In: Archives of Computational Methods in Engineering 29,
pp. 2531ś2561. (Link).
Gaier, Adam and David Ha (2019). łWeight Agnostic Neural Networksž. In: Advances in
Neural Information Processing Systems 32, pp. 5365ś5379. (Link).
Galke, Lukas, Yoav Ram, and Limor Raviv (2022). łEmergent Communication for
Understanding Human Language Evolution: What’s Missing?ž In: Workshop on
Emergent Communication: New Frontiers, Tenth International Conference on Learning
Representations. (Link).
Gallardo, Guillermo, Cornelius Eichner, Chet C. Sherwood, William D. Hopkins, Alfred
Anwander, and Angela D. Friederici (2023). łMorphological Evolution of Language-
relevant Brain Areasž. In: PLoS Biology 21.9, e3002266. (Link).
Ganon, Zohar, Alon Keinan, and Eytan Ruppin (2003). łEvolutionary Network Minimiza-
tion: Adaptive Implicit Pruning of Successful Agentsž. In: Advances in Artificial Life:
7th European Conference. Ed. by Wolfgang Banzhaf, Jens Ziegler, Thomas Christaller,
Peter Dittrich, and Jan T. Kim. New York: Springer, pp. 319ś327. (Link).
Gao, Boyan, Henry Gouk, and Timothy M. Hospedales (2021). łSearching for Robustness:
Loss Learning for Noisy Classification Tasksž. In: 2021 IEEE/CVF International
Conference on Computer Vision, pp. 6650ś6659. (Link).
García-Pedrajas, Nicolás E., César Hervás-Martínez, and Domingo Ortíz-Boyer (2005).
łCooperative Coevolution of Artificial Neural Network Ensembles for Pattern Clas-
sificationž. In: IEEE Transactions on Evolutionary Computation 9, pp. 271ś302.
(Link).
Gauci, Jason and Kenneth O. Stanley (2010). łAutonomous Evolution of Topographic
Regularities in Artificial Neural Networksž. In: Neural computation 22.7, pp. 1860ś
1898. (Link).
Gemini Team (2025). Gemini 2.5: Pushing the Frontier with Advanced Reasoning,
Multimodality, Long Context, and Next-Generation Agentic Capabilities. Tech. rep.
Google DeepMind. (Link).
Ghawaly, James, Aaron Young, Dan Archer, Nick Prins, Brett Witherspoon, and Catherine
Schuman (2022). łA Neuromorphic Algorithm for Radiation Anomaly Detectionž. In:
Proceedings of the International Conference on Neuromorphic Systems 2022. Article
22. (Link).
Ghawaly, James, Aaron Young, Andrew Nicholson, Brett Witherspoon, Nick Prins,
Mathew Swinney, Cihangir Celik, Catherine Schuman, and Karan Patel (2023).
łPerformance Optimization Study of the Neuromorphic Radiation Anomaly Detectorž.
In: Proceedings of the 2023 International Conference on Neuromorphic Systems,
pp. 1ś7. (Link).
Giacomello, Edoardo, Pier L. Lanzi, and Daniele Loiacono (2019). łSearching the
Latent Space of a Generative Adversarial Network to Generate DOOM Levelsž. In:
Proceedings of the IEEE Conference on Games, pp. 1ś8. (Link).
Giles, C. Lee, Clifford B. Miller, Dong Chen, Guo-Zheng Sun, Hsing-Hen Chen, and
Yee-Chun Lee (1991). łExtracting and Learning an Unknown Grammar with Recurrent
Neural Networksž. In: Advances in Neural Information Processing Systems 4, pp. 317ś
324. (Link).
Gilpin, William (2019). łCellular Automata as Convolutional Neural Networksž. In:
Physical Review E 100.3, p. 032402. (Link).
Glorot, Xavier and Yoshua Bengio (2010). łUnderstanding the Difficulty of Training
Deep Feedforward Neural Networksž. In: Proceedings of the Thirteenth International
Conference on Artificial Intelligence and Statistics, pp. 249ś256. (Link).
Goldberg, David E. and Jon Richardson (1987). łGenetic Algorithms with Sharing for
Multimodal Function Optimizationž. In: Proceedings of the Second International
Conference on Genetic Algorithms and Their Application. Vol. 4149, pp. 414ś425.
(Link).
Gomes, Jorge, Paulo Urbano, and Anders L. Christensen (2013). łEvolution of Swarm
Robotics Systems with Novelty Searchž. In: Swarm Intelligence 7, pp. 115ś144. (Link).
Gomez, Faustino (2003). łRobust Non-Linear Control through Neuroevolutionž. PhD
thesis. Austin, TX: Department of Computer Sciences, The University of Texas at
Austin. (Link).
Gomez, Faustino and Risto Miikkulainen (1997). łIncremental Evolution of Complex
General Behaviorž. In: Adaptive Behavior 5, pp. 317ś342. (Link).
Ð
(2003). łActive Guidance for a Finless Rocket Using Neuroevolutionž. In: Genetic
and Evolutionary Computation—GECCO 2003, pp. 2084ś2095. (Link).
Ð
(2004). łTransfer of Neuroevolved Controllers in Unstable Domainsž. In: Genetic and
Evolutionary Computation Conference—GECCO 2004, pp. 957ś968. (Link).
Gomez, Faustino, Jürgen Schmidhuber, and Risto Miikkulainen (2008). łAccelerated
Neural Evolution Through Cooperatively Coevolved Synapsesž. In: Journal of Machine
Learning Research 9, pp. 937ś965. (Link).
Gonzalez, Santiago, Mohak Kant, and Risto Miikkulainen (2023). łEvolving GAN
Formulations for Higher Quality Image Synthesisž. In: Artificial Intelligence in the
Age of Neural Networks and Brain Computing (second edition). Ed. by Robert Kozma,
Cesare Alippi, Yoonsuck Choe, and Francesco C. Morabito. Amsterdam: Elsevier,
pp. 289ś305. (Link).
Gonzalez, Santiago, Joshua Landgraf, and Risto Miikkulainen (2019). łFaster Training by
Selecting Samples Using Embeddingsž. In: Proceedings of the International Joint
Conference on Neural Networks, pp. 4982ś4988. (Link).
Gonzalez, Santiago and Risto Miikkulainen (2020). łImproved Training Speed, Accuracy,
and Data Utilization Through Loss Function Optimizationž. In: Proceedings of the
IEEE Congress on Evolutionary Computation, pp. 289ś296. (Link).
Ð
(2021). łOptimizing Loss Functions Through Multivariate Taylor Polynomial Parame-
terizationž. In: GECCO’21: Proceedings of the Genetic and Evolutionary Computation
Conference, pp. 305ś313. (Link).
Gonzalez, Santiago, Xin Qiu, and Risto Miikkulainen (2025). łEffective Regularization
Through Evolutionary Loss-Function Metalearningž. In: Proceedings of the IEEE
Congress on Evolutionary Computation, pp. 1ś9. (Link).
Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil
Ozair, Aaron Courville, and Yoshua Bengio (2014). łGenerative Adversarial Netsž.
In: Advances in Neural Information Processing Systems 27, pp. 2672ś2680. (Link).
Ð
(2020). łGenerative Adversarial Networksž. In: Communications of the ACM 63.11,
pp. 139ś144. (Link).
Goodman, Erik (2025). Annual Humies Awards For Human-Competitive Results. https://
human-competitive.org. Retrieved 8/31/2025.
GPAI (2024). Pandemic Resilience: Case Studies of an AI-calibrated Ensemble of Models
to Inform Decision Making. Report. Global Partnership on Artificial Intelligence.
(Link).
Grattafiori, Aaron et al. (2024). łThe Llama 3 Herd of Modelsž. In: arXiv:2407.21783.
(Link).
Grattarola, Daniele, Lorenzo Livi, and Cesare Alippi (2021). łLearning Graph Cellular
Automataž. In: Advances in Neural Information Processing Systems 34, pp. 20983ś
20994. (Link).
Graves, Alex, Greg Wayne, and Ivo Danihelka (2014). łNeural Turing machinesž. In:
arXiv:1410.5401. (Link).
Grefenstette, John J. (1986). łOptimization of Control Parameters for Genetic Algorithmsž.
In: IEEE Transactions on Systems, Man, and Cybernetics 16.1, pp. 122ś128. (Link).
Greff, Klaus, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmid-
huber (2016). łLSTM: A Search Space Odysseyž. In: IEEE Transactions on Neural
Networks and Learning Systems 28, pp. 2222ś2232. (Link).
Greve, Rasmus B., Emil J. Jacobsen, and Sebastian Risi (2016). łEvolving Neural Turing
Machines for Reward-based Learningž. In: GECCO’16: Proceedings of the Genetic
and Evolutionary Computation Conference 2016, pp. 117ś124. (Link).
Grillotti, Luca and Antoine Cully (2022). łUnsupervised Behavior Discovery With Quality-
Diversity Optimizationž. In: IEEE Transactions on Evolutionary Computation 26.6,
pp. 1539ś1552. (Link).
Gruau, Frederic (1994). łAutomatic Definition of Modular Neural Networksž. In: Adaptive
Behavior 3.2, pp. 151ś183. (Link).
Gruau, Frederic and Darrell Whitley (1993). łAdding Learning to the Cellular Development
of Neural Networks: Evolution and the Baldwin Effectž. In: Evolutionary Computation
1, pp. 213ś233. (Link).
Gruau, Frederic, Darrell Whitley, and Larry Pyeatt (1996). łA Comparison Between
Cellular Encoding and Direct Encoding for Genetic Neural Networksž. In: Genetic
Programming 1996: Proceedings of the First Annual Conference. Ed. by John R. Koza,
David E. Goldberg, David B. Fogel, and Rick L. Riolo. Cambridge, MA: MIT Press,
pp. 81ś89. (Link).
Guo, Daya et al. (2025). łDeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
Reinforcement Learningž. In: arXiv:2501.12948. (Link).
Guo, Qingyan, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang
Bian, and Yujiu Yang (2024). łConnecting Large Language Models with Evolutionary
Algorithms Yields Powerful Prompt Optimizersž. In: Proceedings of the Twelfth
International Conference on Learning Representations, pp. 29890ś29913. (Link).
Gupta, Agrim, Silvio Savarese, Surya Ganguli, and Fei-Fei Li (2021). łEmbodied In-
telligence via Learning and Evolutionž. In: Nature communications 12.1, p. 5721.
(Link).
Ha, David (2019). łReinforcement Learning for Improving Agent Designž. In: Artificial
life 25.4, pp. 352ś365. (Link).
Ha, David, Andrew Dai, and Quoc V. Le (2017). łHyperNetworksž. In: Proceedings of
the Fifth International Conference on Learning Representations, pp. 103ś120. (Link).
Ha, David and Jürgen Schmidhuber (2018). łRecurrent World Models Facilitate Policy
Evolutionž. In: Advances in Neural Information Processing Systems 31, pp. 2451ś2463.
(Link).
Hadi, Muhammad U., Qasem Al Tashi, Rizwan Qureshi, Abbas Shah, Amgad Muneer,
Muhammad Irfan, Anas Zafar, Muhammad B. Shaikh, Naveed Akhtar, Syed Z. Hassan,
Maged Shoman, Jia Wu, Seyedali Mirjalili, and Mubarak Shah (2025). łLarge Language
Models: A Comprehensive Survey of its Applications, Challenges, Limitations, and
Future Prospectsž. In: TechRxiv, February 10. (Link).
Hadjiivanov, Alexander and Alan Blair (2019). łEpigenetic Evolution of Deep Convolu-
tional Modelsž. In: Proceedings of the IEEE Congress on Evolutionary Computation,
pp. 1478ś1486. (Link).
Hafner, Danijar (2022). łBenchmarking the Spectrum of Agent Capabilitiesž. In: Proceed-
ings of the Tenth International Conference on Learning Representations, pp. 24538ś
24558. (Link).
Hale, Thomas, Sam Webster, Anna Petherick, Toby Phillips, and Beatriz Kira (2020).
Oxford COVID-19 Government Response Tracker. https://www.bsg.ox.ac.uk/research/
covid-19-government-response-tracker. Retrieved 8/31/2025.
Hansen, Nikolaus (2016). łThe CMA Evolution Strategy: A tutorialž. In: arXiv:1604.00772.
(Link).
Hansen, Nikolaus, Anne Auger, Steffen Finck, and Raymond Ros (2010). Real-parameter
Black-box Optimization Benchmarking 2010: Experimental Setup. Tech. rep. INRIA.
(Link).
Hansen, Nikolaus and Andreas Ostermeier (1996). łAdapting Arbitrary Normal Mutation
Distributions in Evolution Strategies: The Covariance Matrix Adaptationž. In: Proceed-
ings of IEEE International Conference on Evolutionary Computation, pp. 312ś317.
(Link).
Ð
(2001). łCompletely Derandomized Self-Adaptation in Evolution Strategiesž. In:
Evolutionary Computation 9, pp. 159ś195. (Link).
Hansis, Eberhard, Steven J. Davis, and Julia Pongratz (2015). łRelevance of Method-
ological Choices for Accounting of Land Use Change Carbon Fluxesž. In: Global
Biogeochemical Cycles 29.8, pp. 1230ś1246. (Link).
Hanson, Stephen J. and Lorien Y. Pratt (1988). łComparing Biases for Minimal Net-
work Construction with Back-Propagationž. In: NIPS’87: Proceedings of the 1st
International Conference on Neural Information Processing Systems, pp. 177ś185.
(Link).
Hardison, Ross C. (2003). łComparative genomicsž. In: PLoS biology 1.2, e58.
(Link).
Harp, Steven A., Tariq Samad, and Aloke Guha (1989). łTowards the Genetic Synthesis of
Neural Networksž. In: Proceedings of the Third International Conference on Genetic
Algorithms, pp. 391ś396.
Hastings, Erin J., Ratan K. Guha, and Kenneth O. Stanley (2009). łAutomatic Content
Generation in the Galactic Arms Race Video Gamež. In: IEEE Transactions on
Computational Intelligence and AI in Games 1.4, pp. 245ś263. (Link).
Hausknecht, Matthew, Joel Lehman, Risto Miikkulainen, and Peter Stone (2014). łA
Neuroevolution Approach to General Atari Game Playingž. In: IEEE Transactions on
Computational Intelligence and AI in Games 6.4, pp. 355ś366. (Link).
Hawkins, Jeff and Subutai Ahmad (2016). łWhy Neurons Have Thousands of Synapses,
a Theory of Sequence Memory in Neocortexž. In: Frontiers in Neural Circuits 10.
Article 23. (Link).
Hawkins, Jeff and Sandra Blakeslee (2004). On Intelligence. Times Books. (Link).
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun (2016). łDeep Residual
Learning for Image Recognitionž. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 770ś778. (Link).
He, Xin, Kaiyong Zhao, and Xiaowen Chu (2021). łAutoML: A survey of the state-of-the-
artž. In: Knowledge-Based Systems 212, p. 106622. (Link).
Hemberg, Erik, Jamal Toutouh, Abdullah Al-Dujaili, Tom Schmiedlechner, and Una-May
O’Reilly (2021). łSpatial coevolution for generative adversarial network trainingž. In:
ACM Transactions on Evolutionary Learning and Optimization 1, pp. 1ś28. (Link).
Herzing, Denise L. and Christine M. Johnson (2015). Dolphin Communication and
Cognition: Past, Present, and Future. Cambridge, MA: MIT Press. (Link).
Hingston, Phil, ed. (2012). Believable Bots. New York: Springer.
(Link).
Hinton, Geoffrey E., James L. McClelland, and David E. Rumelhart (1986). łDis-
tributed Representationsž. In: Parallel Distributed Processing: Explorations in the
Microstructure of Cognition, Vol. 1: Foundations. Ed. by David E. Rumelhart, James L.
McClelland, and PDP Research Group. Cambridge, MA: MIT Press, pp. 77ś109.
(Link).
Hinton, Geoffrey E. and Steven J. Nowlan (1987). łHow Learning Can Guide Evolutionž.
In: Complex Systems 1, pp. 495ś502. (Link).
Hinton, Geoffrey E. and Ruslan R. Salakhutdinov (2006). łReducing the Dimensionality
of Data with Neural Networksž. In: Science 313.5786, pp. 504ś507. (Link).
Hintze, Arend, Jeffrey A. Edlund, Randal S. Olson, David B. Knoester, Jory Schos-
sau, Larissa Albantakis, Ali Tehrani-Saleh, Peter Kvam, Leigh Sheneman, Heather
Goldsby, Clifford Bohm, and Christoph Adami (2017). łMarkov Brains: A Technical
Introductionž. In: arXiv:1709.05601. (Link).
Ho, Jonathan, Ajay Jain, and Pieter Abbeel (2020). łDenoising Diffusion Probabilistic
Modelsž. In: Advances in Neural Information Processing Systems 33, pp. 6840ś6851.
(Link).
Hochreiter, Sepp and Jürgen Schmidhuber (1997). łLong Short-term Memoryž. In: Neural
Computation 9.8, pp. 1735ś1780. (Link).
Holland, John H. and J. S. Reitman (1978). łCognitive Systems Based on Adaptive
Algorithmsž. In: Pattern-Directed Inference Systems. Ed. by D. A. Waterman and
Frederick Hayes-Roth. San Diego, CA: Academic Press, pp. 313ś329. (Link).
Hoover, Amy K., Michael P. Rosario, and Kenneth O. Stanley (2008). łScaffolding for
Interactively Evolving Novel Drum Tracks for Existing Songsž. In: Applications of
Evolutionary Computing: EvoWorkshops 2008, pp. 412ś422. (Link).
Hoover, Amy K., Paul A. Szerlip, and Kenneth O. Stanley (2014). łFunctional Scaffolding
for Composing Additional Musical Voicesž. In: Computer Music Journal 38.4, pp. 80ś
99. (Link).
Horibe, Kazuya, Kathryn Walker, Rasmus Berg Palm, Shyam Sudhakaran, and Sebastian
Risi (2022). łSevere Damage Recovery in Evolving Soft Robots through Differentiable
Programmingž. In: Genetic Programming and Evolvable Machines 23.3, pp. 405ś426.
(Link).
Horibe, Kazuya, Kathryn Walker, and Sebastian Risi (2021). łRegenerating Soft Robots
through Neural Cellular Automataž. In: Genetic Programming: 24th European Con-
ference. Ed. by Ting Hu, Nuno Lourenço, and Eric Medvet. New York: Springer,
pp. 36ś50. (Link).
Hornby, Gregory S. and Jordan B. Pollack (2001a). łBody-brain Co-evolution Using
L-systems as a Generative Encodingž. In: GECCO’01 Proceedings of the 3rd Annual
Conference on Genetic and Evolutionary Computation, pp. 868ś875. (Link).
Ð
(2001b). łThe Advantages of Generative Grammatical Encodings for Physical Designž.
In: Proceedings of the IEEE Congress on Evolutionary Computation. Vol. 1, pp. 600ś
607. (Link).
Ð
(2002). łCreating High-level Components with a Generative Representation for
Body-brain Evolutionž. In: Artificial life 8.3, pp. 223ś246. (Link).
Hornik, Kurt, Maxwell Stinchcombe, and Halbert White (1989). łMultilayer Feedforward
Networks are Universal Approximatorsž. In: Neural Networks 2, pp. 359ś366. (Link).
Horvát, Szabolcs, Răzvan Gămănuț, Mária Ercsey-Ravasz, Loïc Magrou, Bianca Gămănuț,
David C. Van Essen, Andreas Burkhalter, Kenneth Knoblauch, Zoltán Toroczkai, and
Henry Kennedy (2016). łSpatial Embedding and Wiring Cost Constrain the Functional
Layout of the Cortical Network of Rodents and Primatesž. In: PLOS Biology 14,
e1002512. (Link).
Hougen, Dean Frederick and Syed Naveed Hussain Shah (2019). łThe Evolution of Rein-
forcement Learningž. In: 2019 IEEE Symposium Series on Computational Intelligence,
pp. 1457ś1464. (Link).
Huang, Gao, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger (2017a).
łDensely Connected Convolutional Networksž. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pp. 2261ś2269. (Link).
Ð
(2017b). łDensely Connected Convolutional Networksž. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 4700ś4708. (Link).
Huang, Jia-Bin (2021). Types of Computer Vision Paper. https://x.com/jbhuang0604/status/
1388577506253475849. Retrieved 8/31/25.
Huang, Pei-Chi, Luis Sentis, Joel Lehman, Chien-Liang Fok, Aloysius K. Mok, and Risto
Miikkulainen (2019). łTradeoffs in Neuroevolutionary Learning-Based Real-Time
Robotic Task Design in the Imprecise Computation Frameworkž. In: ACM Transactions
on Cyber-Physical Systems 3, 14:1ś14:29. (Link).
Hubel, David H. and Torsten N. Wiesel (1968). łReceptive Fields and Functional Archi-
tecture of Monkey Striate Cortexž. In: The Journal of Physiology 195, pp. 215ś243.
(Link).
Huizinga, Joost, Kenneth O. Stanley, and Jeff Clune (2018). łThe Emergence of Canal-
ization and Evolvability in an Open-ended, Interactive Evolutionary Systemž. In:
Artificial life 24, pp. 157ś181. (Link).
Hurtt, George C. et al. (2020). łHarmonization of Global Land-Use Change and Man-
agement for the Period 850-2100 (LUH2) for CMIP6ž. In: Geoscientific Model
Development 13, pp. 5425ś5464. (Link).
Husbands, Philip and Frank Mill (1991). łSimulated Co-evolution as the Mechanism
for Emergent Planning and Schedulingž. In: Proceedings of the Fourth International
Conference on Genetic Algorithms, pp. 264ś270. (Link).
Iacca, Giuseppe, Fabio Caraffini, and Ferrante Neri (2020). łDifferential Evolution for
Neural Networks Optimizationž. In: Mathematics 8, p. 69. (Link).
Iba, Hitoshi and Nasimul Noman, eds. (2016). Evolutionary Computation in Gene
Regulatory Network Research. Wiley. (Link).
Ijspeert, Auke J. (2008). łCentral pattern generators for locomotion control in animals
and robots: A reviewž. In: Neural Networks 21, pp. 642ś653. (Link).
Ijspeert, Auke J., Alessandro Crespi, Dimitri Ryczko, and Jean-Marie Cabelguen (2007).
łFrom Swimming to Walking with a Salamander Robot Driven by a Spinal Cord
Modelž. In: Science 315, pp. 1416ś1420. (Link).
International Human Genome Sequencing Consortium (2004). łFinishing the Euchromatic
Sequence of the Human Genomež. In: Nature 431, pp. 931ś945. (Link).
Iranmehr, Ensieh, Saeed B. Shouraki, Mohammad M. Faraji, Nassim Bagheri, and
Bernabé Linares-Barranco (2019). łBio-Inspired Evolutionary Model of Spiking
Neural Networks in Ionic Liquid Spacež. In: Frontiers in Neuroscience 13, p. 1085.
(Link).
Ishibuchi, Hisao, Noritaka Tsukamoto, and Yusuke Nojima (2008). łEvolutionary Many-
Objective Optimization: A Short Reviewž. In: Proceedings of the IEEE Congress on
Evolutionary Computation, pp. 2419ś2426. (Link).
Ishida Lab (2018). The N700 Series Shinkansen (Bullet Train). https://www.sys.cs.tut.ac.jp/
en/research-activities/research-introduction/what-is-a-genetic-algorithm/2/. Retrieved
9/29/2018.
Islam, Md. Monirul and Xin Yao (2008). łEvolving Artificial Neural Network Ensemblesž.
In: Computational Intelligence: A Compendium. Ed. by John Fulcher and Lakhmi C.
Jain. New York: Springer, pp. 851ś880. (Link).
ITU (2023). Project Resilience. https://www.itu.int/en/ITU-T/extcoop/ai-data-commons/
Pages/project-resilience.aspx. Retrieved 8/31/2025.
Jacob, François (1977). łEvolution and Tinkeringž. In: Science 196.4295, pp. 1161ś1166.
(Link).
Jaderberg, Max, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue,
Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha
Fernando, and Koray Kavukcuoglu (2017). łPopulation Based Training of Neural
Networksž. In: arXiv:1711.09846. (Link).
Jahns, James and Arend Hintze (2018). łHow the Integration of Group and Individual
Level Selection Affects the Evolution of Cooperationž. In: ALIFE 2018: The 2018
Conference on Artificial Life, pp. 530ś535. (Link).
Jain, Ashish, Anand Subramoney, and Risto Miikkulainen (2012). łTask decomposition
with neuroevolution in extended predator-prey domainž. In: Artificial Life 13: Pro-
ceedings of Thirteenth International Conference on the Synthesis and Simulation of
Living Systems, pp. 341ś348. (Link).
James, Conrad D., James B. Aimone, Nadine E. Miner, Craig M. Vineyard, Fredrick H.
Rothganger, Kristofor D. Carlson, Samuel A. Mulder, Timothy J. Draelos, Aleksandra
Faust, Matthew J. Marinella, John H. Naegle, and Steven J. Plimpton (2017). łA
Historical Survey of Algorithms and Hardware Architectures for Neural-inspired
and Neuromorphic Computing Applicationsž. In: Biologically Inspired Cognitive
Architectures 19, pp. 49ś64. (Link).
Jastrzebski, Stanislaw, Devansh Arpit, Oliver Astrand, Giancarlo B. Kerg, Huan Wang,
Caiming Xiong, Richard Socher, KyungHyun Cho, and Krzysztof J. Geras (2021).
łCatastrophic Fisher explosion: Early phase Fisher matrix impacts generalizationž. In:
Proceedings of the 38th International Conference on Machine Learning, pp. 4772ś
4784. (Link).
Jiang, Albert Q. et al. (2023). łMistral 7Bž. In: arXiv:2310.06825.
(Link).
Jiang, Shen, Zipeng Ji, Guanghui Zhu, Chunfeng Yuan, and Yihua Huang (2023).
łOperation-Level Early Stopping for Robustifying Differentiable NASž. In: Advances
in Neural Information Processing Systems 35, pp. 70983ś71007. (Link).
Jordan, Jacob, Maximilian Schmidt, Walter Senn, and Mihai A. Petrovici (2021). łEvolving
Interpretable Plasticity for Spiking Networksž. In: eLife 10, e66273. (Link).
Kang, Hongwei, Fengfan Bei, Yong Shen, Xingping Sun, and Qingyi Chen (2021).
łA Diversity Model Based on Dimension Entropy and Its Application to Swarm
Intelligence Algorithmž. In: Entropy 23, p. 397. (Link).
Kaplan, Jared D., Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess,
Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei (2020).
łScaling Laws for Neural Language Modelsž. In: arXiv:2001.08361. (Link).
Karakida, Ryo, Shotaro Akaho, and Shun-ichi Amari (2019). łUniversal Statistics of
Fisher Information in Deep Neural Networks: Mean Field Approachž. In: The 22nd
International Conference on Artificial Intelligence and Statistics, pp. 1032ś1041.
(Link).
Karpov, Igor V., Leif M. Johnson, and Risto Miikkulainen (2015). łEvaluating Team
Behaviors Constructed with Human-guided Machine Learningž. In: Proceedings of
the IEEE Conference on Computational Intelligence in Games, pp. 292ś298. (Link).
Karpov, Igor V., Leif M. Johnson, Vinod Valsalam, and Risto Miikkulainen (2012).
łEvaluation Methods for Human-Guided Neuroevolution in Gamesž. In: Proceedings
of the AAAI Fall Symposium on Robots that Learn Interactively from Human Teachers.
(Link).
Karpov, Igor V., Jacob Schrum, and Risto Miikkulainen (2012). łBelievable Bot Navigation
via Playback of Human Tracesž. In: Believable Bots. Ed. by Philip Hingston. New
York: Springer, pp. 151ś170. (Link).
Karpov, Igor V., Vinod Valsalam, and Risto Miikkulainen (2011). łHuman-Assisted Neu-
roevolution Through Shaping, Advice and Examplesž. In: GECCO’11: Proceedings of
the 13th Annual Conference on Genetic and Evolutionary Computation, pp. 371ś378.
(Link).
Kashtan, Nir and Uri Alon (2005). łSpontaneous Evolution of Modularity and Network
Motifsž. In: Proceedings of the National Academy of Sciences 102, pp. 13773ś13778.
(Link).
Kashtan, Nir, Shalev Itzkovitz, Ron Milo, and Uri Alon (2004). łEfficient Sampling
Algorithm for Estimating Subgraph Concentrations and Detecting Network Motifsž.
In: Bioinformatics 20.11, pp. 1746ś1758. (Link).
Kay, Tomas, Laurent Keller, and Laurent Lehmann (2020). łThe Evolution of Altruism
and the Serial Rediscovery of the Role of Relatednessž. In: Proceedings of the National
Academy of Sciences - PNAS 117.46, pp. 28894ś28898. (Link).
Keinan, Alon, Ben Sandbank, Claus C. Hilgetag, Isaac Meilijson, and Eytan Ruppin
(2006). łAxiomatic Scalable Neurocontroller Analysis via the Shapley Valuež. In:
Artificial Life 12, pp. 333ś352. (Link).
Kempka, Michael, Marek Wydmuch, Grzegorz Runc, Jakub Toczek, and Wojciech
Jaskowski (2016). łViZDoom: A Doom-based AI Research Platform for Visual
Reinforcement Learningž. In: IEEE Conference on Computational Intelligence and
Games. IEEE, pp. 341ś348. (Link).
Kennedy, James and Russell C. Eberhart (1995). łParticle Swarm Optimizationž. In:
Proceedings of the International Conference on Neural Networks. Vol. 4, pp. 1942ś
1948. (Link).
Kennedy, James, Russell C. Eberhart, and Yuhui Shi (2001). Swarm Intelligence. San
Francisco: Kaufmann. (Link).
Kermack, William O. and Anderson G. McKendrick (1927). łA Contribution to the
Mathematical Theory of Epidemicsž. In: Proceedings of the Royal Society of London
Series A 115.772, pp. 700ś721. (Link).
Khadka, Shauharda, Jen J. Chung, and Kagan Tumer (2019). łNeuroevolution of a Modular
Memory-Augmented Neural Network for Deep Memory Problemsž. In: Evolutionary
Computation 27, pp. 639ś664. (Link).
Khadka, Shauharda and Kagan Tumer (2018). łEvolution-guided Policy Gradient in
Reinforcement Learningž. In: Advances in Neural Information Processing Systems 31,
pp. 1196ś1208. (Link).
Kingma, Diederik P. and Max Welling (2014). łAuto-Encoding Variational Bayesž. In:
Proceedings of the Second International Conference on Learning Representations.
(Link).
Kirby, Simon, Tom Griffiths, and Kenny Smith (2014). łIterated Learning and the Evolution
of Languagež. In: Current Opinion in Neurobiology 28, pp. 108ś114. (Link).
Kirschner, Marc and John Gerhart (1998). łEvolvabilityž. In: Proceedings of the National
Academy of Sciences 95, pp. 8420ś8427. (Link).
Kitano, Hiroaki (1990). łDesigning Neural Networks Using Genetic Algorithms with
Graph Generation Systemž. In: Complex Systems 4, pp. 461ś476. (Link).
Knight, Chris and Camilla Power (2012). łSocial Conditions for the Evolutionary Emer-
gence of Languagež. In: The Oxford Handbook of Language Evolution. Ed. by Maggie
Tallerman and Kathleen R. Gibson. Oxford, UK: Oxford University Press, pp. 346ś349.
(Link).
Kohl, Nate and Risto Miikkulainen (2011). łAn Integrated Neuroevolutionary Approach
to Reactive Control and High-level Strategyž. In: IEEE Transactions on Evolutionary
Computation, pp. 472ś488. (Link).
Koppejan, Rogier and Shimon Whiteson (2011). łNeuroevolutionary Reinforcement Learn-
ing for Generalized Control of Simulated Helicoptersž. In: Evolutionary Intelligence
4, pp. 219ś241.
(Link).
Korshunova, Maria, Niles Huang, Stephen Capuzzi, Dmytro S. Radchenko, Olena Savych,
Yuriy S. Moroz, Carrow I. Wells, Timothy M. Willson, Alexander Tropsha, and
Olexandr Isayev (2022). łGenerative and Reinforcement Learning Approaches for
the Automated De Novo Design of Bioactive Compoundsž. In: Communications
Chemistry 5.1, p. 129. (Link).
Kotyan, Shashank and Danilo Vasconcellos Vargas (2020). łTowards Evolving Robust
Neural Architectures to Defend from Adversarial Attacksž. In: GECCO’20: Proceed-
ings of the 2020 Genetic and Evolutionary Computation Conference Companion,
pp. 135ś136. (Link).
Koutník, Jan, Giuseppe Cuccu, Jürgen Schmidhuber, and Faustino Gomez (2013). łEvolv-
ing Large-scale Neural Networks for Vision-Based Reinforcement Learningž. In:
GECCO’13: Proceedings of the 15th Annual Conference on Genetic and Evolutionary
Computation, pp. 1061ś1068. (Link).
Koutník, Jan, Faustino Gomez, and Jürgen Schmidhuber (2010). łEvolving Neural Net-
works in Compressed Weight Spacež. In: Proceedings of the 12th Annual Conference
on Genetic and Evolutionary Computation, pp. 619ś626. (Link).
Koza, John R. (1992). Genetic Programming: On the Programming of Computers by
Means of Natural Selection. Cambridge, MA: MIT Press. (Link).
Ð
(1994). łGenetic Programming as a Means for Programming Computers by Natural
Selectionž. In: Statistics and Computing 4, pp. 87ś112. (Link).
Kramer, Oliver (2010). łEvolutionary Self-adaptation: A Survey of Operators and Strategy
Parametersž. In: Evolutionary Intelligence 3, pp. 51ś65. (Link).
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton (2012). łImagenet Classification
with Deep Convolutional Neural Networksž. In: Advances in Neural Information
Processing Systems 25, pp. 1106ś1114. (Link).
Kumar, Akarsh, Jeff Clune, Joel Lehman, and Kenneth O. Stanley (2025). łQuestioning
Representational Optimism in Deep Learning: The Fractured Entangled Representation
Hypothesisž. In: arXiv:2505.11581. (Link).
Kumar, Akarsh, Bo Liu, Risto Miikkulainen, and Peter Stone (2022). łEffective Mutation
Rate Adaptation through Group Elite Selectionž. In: GECCO’22: Proceedings of the
Genetic and Evolutionary Computation Conference, pp. 712ś720. (Link).
Kumar, Akarsh, Chris Lu, Louis Kirsch, Yujin Tang, Kenneth O. Stanley, Phillip Isola,
and David Ha (2024). łAutomating the Search for Artificial Life with Foundation
Modelsž. In: arXiv:2412.17799. (Link).
Kwon, Jaerock and Yoonsuck Choe (2009). łFacilitating Neural Dynamics for Delay
Compensation: A Road to Predictive Neural Dynamics?ž In: Neural Networks 22,
pp. 267ś276. (Link).
La Cava, William, Bogdan Burlacu, Marco Virgolin, Michael Kommenda, Patryk Orze-
chowski, Fabrício Olivetti de França, Ying Jin, and Jason H. Moore (2021). łContem-
porary Symbolic Regression Methods and Their Relative Performancež. In: NeurIPS
Datasets and Benchmarks 2021, pp. 695ś710. (Link).
Lacal, Irene and Rossella Ventura (2018). łEpigenetic Inheritance: Concepts, Mechanisms
and Perspectivesž. In: Frontiers of Molecular Neuroscience 11. Article 292. (Link).
Lake, Brenden M., Ruslan R. Salakhutdinov, and Joshua B. Tenenbaum (2015). łHuman-
level Concept Learning through Probabilistic Program Inductionž. In: Science 350,
pp. 1332ś1338. (Link).
Lamarck, Jean-Baptiste (1809). Zoological Philosophy: An Exposition with Regard to the
Natural History of Animals. Translated from the French Philosophie Zoologique by
Hugh Elliot, 1914. Chicago: University of Chicago Press. (Link).
Lange, Robert T. (2023). łevosax: Jax-based Evolution Strategiesž. In: GECCO’23
Companion: Proceedings of the Companion Conference on Genetic and Evolutionary
Computation, pp. 659ś662. (Link).
Lange, Robert T., Yingtao Tian, and Yujin Tang (2024a). łEvolution Transformer: In-
context Evolutionary Optimizationž. In: GECCO’24: Proceedings of the Genetic and
Evolutionary Computation Conference Companion, pp. 575ś578. (Link).
Ð
(2024b). łLarge Language Models as Evolution Strategiesž. In: GECCO’24: Proceed-
ings of the Genetic and Evolutionary Computation Conference Companion, pp. 579ś
582. (Link).
Larrañaga, Pedro and Jose Lozano, eds. (2002). Estimation of Distribution Algorithms: A
New Tool for Evolutionary Computation. Dordrecht, The Netherlands: Kluwer. (Link).
LeCun, Yann, Yoshua Bengio, and Geoffrey E. Hinton (2015). łDeep Learningž. In:
Nature 521, pp. 436ś444. (Link).
Lehman, Joel, Jeff Clune, Dusan Misevic, Christoph Adami, Julie Beaulieu, Peter J.
Bentley, Samuel Bernard, Guillaume Beslon, David M. Bryson, Patryk Chrabaszcz,
Nick Cheney, Antoine Cully, Stéphane Doncieux, Fred C. Dyer, Kai O. Ellefsen,
Robert Feldt, Stephan Fischer, Stephanie Forrest, Antoine Frénoy, Christian Gagné,
Leni K. Le Goff, Laura M. Grabowski, Babak Hodjat, Frank Hutter, Laurent Keller,
Carole Knibbe, Peter Krcah, Richard E. Lenski, Hod Lipson, Robert MacCurdy,
Carlos Maestre, Risto Miikkulainen, Sara Mitri, David E. Moriarty, Jean-Baptiste
Mouret, Anh M. Nguyen, Charles Ofria, Marc Parizeau, David P. Parsons, Robert T.
Pennock, William F. Punch, Thomas S. Ray, Marc Schoenauer, Eric Shulte, Karl Sims,
Kenneth O. Stanley, François Taddei, Danesh Tarapore, Simon Thibault, Westley
Weimer, Richard A. Watson, and Jason Yosinski (2020). łThe Surprising Creativity
of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation
and Artificial Life Research Communitiesž. In: Artificial Life 26, pp. 274ś306. (Link).
Lehman, Joel, Jonathan Gordon, Shawn Jain, Kamal Ndousse, Cathy Yeh, and Kenneth O.
Stanley (2023). łEvolution Through Large Modelsž. In: Handbook of Evolutionary
Machine Learning. Ed. by Wolfgang Banzhaf, Penousal Machado, and Mengjie Zhang.
New York: Springer, pp. 331ś366. (Link).
Lehman, Joel and Risto Miikkulainen (2013). łBoosting Interactive Evolution using
Human Computation Marketsž. In: Proceedings of the 2nd International Conference
on the Theory and Practice of Natural Computation, pp. 1ś18. (Link).
Ð
(2014). łOvercoming Deception in Evolution of Cognitive Behaviorsž. In: GECCO’14:
Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation,
pp. 185ś192. (Link).
Ð
(2015). łExtinction Events Can Accelerate Evolutionž. In: PLoS ONE 10, e0132886.
(Link).
Lehman, Joel and Kenneth O. Stanley (2008). łExploiting Open-Endedness to Solve
Problems Through the Search for Noveltyž. In: Artificial Life XI: Proceedings of the
Eleventh International Conference on the Synthesis and Simulation of Living Systems.
Ed. by Seth Bullock, Jason Noble, Richard A. Watson, and Mark A. Bedau. Cambridge,
MA: MIT Press, pp. 329ś336. (Link).
Lehman, Joel and Kenneth O. Stanley (2011a). łAbandoning Objectives: Evolution
Through the Search for Novelty Alonež. In: Evolutionary Computation 19, pp. 189ś
223. (Link).
Ð
(2011b). łEvolving a Diversity of Virtual Creatures through Novelty Search and Local
Competitionž. In: GECCO’11: Proceedings of the 13th Annual Conference on Genetic
and Evolutionary Computation, pp. 211ś218. (Link).
Ð
(2012). łBeyond Open-endedness: Quantifying Impressivenessž. In: Artificial Life
13: Proceedings of the Thirteenth International Conference on the Synthesis and
Simulation of Living Systems, pp. 75ś82. (Link).
Lehmann, Kenna D. S., Tracy M. Montgomery, Sarah M. MacLachlan, Jenna M. Parker,
Olivia S. Spagnuolo, Kelsey J. VandeWetering, Patrick S. Bills, and Kay E. Holekamp
(2016). łLions, Hyenas and Mobs (Oh My!)ž In: Current Zoology 63, pp. 313ś322.
(Link).
Lenartowicz, Agatha and Russell A. Poldrack (2010). łBrain Imagingž. In: Encyclopedia
of Behavioral Neuroscience. Ed. by George F. Koob, Michel Le Moal, and Richard F.
Thompson. Oxford: Academic Press, pp. 187ś193. (Link).
Lessin, Dan, Don Fussell, and Risto Miikkulainen (2013). łOpen-Ended Behavioral
Complexity for Evolved Virtual Creaturesž. In: GECCO’13: Proceedings of the 15th
Annual Conference on Genetic and Evolutionary Computation, pp. 335ś342. (Link).
Ð
(2014). łAdapting Morphology to Multiple Tasks in Evolved Virtual Creaturesž. In:
Artificial Life 14: Proceedings of the Fourteenth International Conference on the
Synthesis and Simulation of Living Systems. (Link).
Lettvin, Jerome Y., Humberto R. Maturana, Warren S. McCulloch, and Walter H. Pitts
(1959). łWhat the Frog’s Eye Tells the Frog’s Brainž. In: Proceedings of the IRE,
pp. 1940ś1951. (Link).
Leung, Binggwong, Worasuchad Haomachai, Joachim Winther Pedersen, Sebastian Risi,
and Poramate Manoonpong (2025). łBio-Inspired Plastic Neural Networks for Zero-
Shot Out-of-Distribution Generalization in Complex Animal-Inspired Robotsž. In:
arXiv:2503.12406. (Link).
Li, Hui, Xuesong Wang, and Shifei Ding (2018). łResearch and Development of Neural
Network Ensembles: A Surveyž. In: Artificial Intelligence Review 49, pp. 455ś479.
(Link).
Li, Liam and Ameet Talwalkar (2020). łRandom Search and Reproducibility for Neural
Architecture Searchž. In: Proceedings of the 36th Conference on Uncertainty in
Artificial Intelligence, pp. 367ś377. (Link).
Li, Xun and Risto Miikkulainen (2016). łEvolving Artificial Language Through Evolution-
ary Reinforcement Learningž. In: ALIFE 2016, the Fifteenth International Conference
on the Synthesis and Simulation of Living Systems. Ed. by Carlos Gershenson, Tom
Froese, Jesus M. Siqueiros, Wendy Aguilar, Eduardo J. Izquierdo, and Hiroki Sayama.
Cambridge, MA: MIT Press, pp. 484ś491. (Link).
Li, Xun and Risto Miikkulainen (2018). łOpponent Modeling and Exploitation in Poker
Using Evolved Recurrent Neural Networksž. In: GECCO’18: Proceedings of The
Genetic and Evolutionary Computation Conference, pp. 189ś196. (Link).
Liang, Jason, Santiago Gonzalez, Hormoz Shahrzad, and Risto Miikkulainen (2021).
łRegularized Evolutionary Population-Based Trainingž. In: GECCO’21: Proceedings
of the Genetic and Evolutionary Computation Conference, pp. 323ś331. (Link).
Liang, Jason, Elliot Meyerson, Babak Hodjat, Dan Fink, Karl Mutch, and Risto Miikku-
lainen (2019). łEvolutionary Neural AutoML for Deep Learningž. In: GECCO’19:
Proceedings of the Genetic and Evolutionary Computation Conference, pp. 401ś409.
(Link).
Liang, Jason, Elliot Meyerson, and Risto Miikkulainen (2018). łEvolutionary Architecture
Search for Deep Multitask Networksž. In: GECCO’18: Proceedings of the Genetic
and Evolutionary Computation Conference, pp. 466ś473. (Link).
Liang, Jason and Risto Miikkulainen (2015). łEvolutionary Bilevel Optimization for
Complex Control Tasksž. In: GECCO’15: Proceedings of the 2015 Annual Conference
on Genetic and Evolutionary Computation, pp. 833ś839. (Link).
Liang, Jason, Hormoz Shahrzad, and Risto Miikkulainen (2023). łAsynchronous Evolution
of Deep Neural Network Architecturesž. In: Applied Soft Computing 152, p. 111209.
(Link).
Liang, Tengyuan, Tomaso Poggio, Alexander Rakhlin, and James Stokes (2019). łFisher-
Rao Metric, Geometry, and Complexity of Neural Networksž. In: The 22nd Interna-
tional Conference on Artificial Intelligence and Statistics, pp. 888ś896. (Link).
Liao, Zhibin, Tom Drummond, Ian Reid, and Gustavo Carneiro (2018). łApproximate
Fisher Information Matrix to Characterize the Training of Deep Neural Networksž.
In: IEEE Transactions on Pattern Analysis and Machine Intelligence 42, pp. 15ś26.
(Link).
Liapis, Antonios, Georgios N. Yannakakis, and Julian Togelius (2011). łNeuroevolutionary
constrained optimization for content creationž. In: Proceedings of the IEEE Conference
on Computational Intelligence and Games, pp. 71ś78. (Link).
Light, Will (1993). łRidge Functions, Sigmoidal Functions and Neural Networksž. In:
Approximation Theory VII. Ed. by Elliot W. Cheney, Charles K. Cui, and Larry L.
Schumaker. Boston: Academic Press, pp. 158ś201.
Lim, Heejin and Yoonsuck Choe (2006). łFacilitating Neural Dynamics for Delay
Compensation and Prediction in Evolutionary Neural Networksž. In: GECCO’06:
Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation,
pp. 167ś174. (Link).
Lindenmayer, Aristid (1968a). łMathematical Models for Cellular Interactions in Devel-
opment I. Filaments with One-sided Inputsž. In: Journal of Theoretical Biology 18,
pp. 280ś299. (Link).
Ð
(1968b). łMathematical Models for Cellular Interactions in Development II. Simple
and Branching Filaments with Two-sided Inputsž. In: Journal of Theoretical Biology
18, pp. 300ś315. (Link).
Lipson, Hod and Jordan B. Pollack (2000). łAutomatic Design and Manufacture of Robotic
Lifeformsž. In: Nature 406, pp. 974ś978. (Link).
Liu, Aixin et al. (2024). łDeepSeek-V3 Technical Reportž. In: arXiv:2412.19437. (Link).
Liu, Rosanne, Joel Lehman, Piero Molino, Felipe Petroski Such, Eric Frank, Alex Sergeev,
and Jason Yosinski (2018). łAn Intriguing Failing of Convolutional Neural Networks
and the Coordconv Solutionž. In: Advances in Neural Information Processing Systems
31, pp. 9605ś9616. (Link).
Liu, Yuqiao, Yanan Sun, Bing Xue, Mengjie Zhang, Gary G. Yen, and Kay C. Tan (2021).
łA Survey on Evolutionary Neural Architecture Searchž. In: IEEE Transactions on
Neural Networks and Learning Systems, pp. 1ś21. (Link).
Liu, Zhenhua, Xinfeng Zhang, Shanshe Wang, Siwei Ma, and Wen Gao (2021). łEvo-
lutionary Quantization of Neural Networks with Mixed-Precisionž. In: Proceedings
of the IEEE International Conference on Acoustics, Speech and Signal Processing,
pp. 2785ś2789. (Link).
Liu, Ziming, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin
Soljačić, Thomas Y. Hou, and Max Tegmark (2025). łKAN: Kolmogorov-Arnold
Networksž. In: Proceedings of the Thirteenth International Conference on Learning
Representations, pp. 66342ś66388. (Link).
Lockett, Alan and Risto Miikkulainen (2013). łNeuroannealing: Martingale-driven Learn-
ing for Neural Networkž. In: GECCO’13: Proceedings of the 15th Annual Conference
on Genetic and Evolutionary Computation, pp. 711ś718. (Link).
Lorenzo, Pablo Ribalta, Jakub Nalepa, Michal Kawulok, Luciano Sanchez Ramos, and José
Ranilla Pastor (2017). łParticle Swarm Optimization for Hyper-parameter Selection in
Deep Neural Networksž. In: GECCO’17: Proceedings of the Genetic and Evolutionary
Computation Conference, pp. 481ś488. (Link).
Lozano, Jose A., Pedro Larrañaga, Iñaki Inza, and Endika Bengoetxea (2006). Towards a
New Evolutionary Computation: Advances on Estimation of Distribution Algorithms.
New York: Springer. (Link).
Lu, Sen and Abhronil Sengupta (2022). łNeuroevolution Guided Hybrid Spiking Neural
Network Trainingž. In: Frontiers in Neuroscience 16, p. 838523. (Link).
Lu, Zhichao, Kalyanmoy Deb, Erik Goodman, Wolfgang Banzhaf, and Vishnu N. Bod-
deti (2020). łNSGANetV2: Evolutionary Multi-objective Surrogate-assisted Neural
Architecture Searchž. In: Computer Vision—ECCV 2020. Vol. 12346, pp. 35ś51.
(Link).
Lüders, Benno, Mikkel Schläger, and Sebastian Risi (2016). łContinual Learning through
Evolvable Neural Turing Machinesž. In: Workshop on Continual Learning and Deep
Networks, Neural Information Processing Systems Conference. (Link).
Luke, Sean and Lee Spector (1996). łEvolving Graphs and Networks with Edge Encoding:
Preliminary Reportž. In: Late-Breaking Papers at the Genetic Programming 1996
Conference, pp. 117ś124. (Link).
Luo, Calvin (2022). łUnderstanding Diffusion Models: A Unified Perspectivež. In:
arXiv:2208.11970. (Link).
Lynch, Michael (2007). łThe Frailty of Adaptive Hypotheses for the Origins of Organismal
Complexityž. In: Proceedings of the National Acadademy of Sciences 104, pp. 8597ś
8604. (Link).
MacNeilage, Peter F. (1998). łThe Frame/Content Theory of Evolution of Speech
Productionž. In: Behavioral and Brain Sciences 21, pp. 499ś511. (Link).
Maheri, Alireza, Shahin Jalili, Yousef Hosseinzadeh, Reza Khani, and Mirreza Miryahyavi
(2021). łA Comprehensive Survey on Cultural Algorithmsž. In: Swarm and Evolu-
tionary Computation 62, p. 100846. (Link).
Makoviychuk, Viktor, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey,
Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, and
Gavriel State (2021). łIsaac Gym: High Performance GPU Based Physics Simulation
For Robot Learningž. In: NeurIPS Datasets and Benchmarks 2021, pp. 1186ś1198.
(Link).
Mańdziuk, Jacek and Piotr Rajkiewicz (2016). łNeuro-evolutionary system for FOREX
tradingž. In: Proceedings of the IEEE Congress on Evolutionary Computation,
pp. 4654ś4661. (Link).
Mańdziuk, Jacek and Adam Żychowski (2023). łDuel-based neuroevolutionary method
for Stackelberg Security Games with boundedly rational Attackerž. In: Applied Soft
Computing 146, p. 110673. (Link).
Mao, Xudong, Qing Li, Haoran Xie, Raymond Y. K. Lau, Zhen Wang, and Stephen P.
Smolley (2017). łLeast Squares Generative Adversarial Networksž. In: Proceedings
of the IEEE International Conference on Computer Vision, pp. 2813ś2821. (Link).
Markram, Henry, Yun Wang, and Michail Tsodyks (1998). łDifferential Signaling via
the Same Axon of Neocortical Pyramidal Neuronsž. In: Proceedings of the National
Academy of Sciences of the United States of America 95, pp. 5323ś5328. (Link).
Masoudnia, Saeed and Reza Ebrahimpour (2014). łMixture of Experts: A Literature
Surveyž. In: Artificial Intelligence Review 42, p. 275. (Link).
Mattiussi, Claudio and Dario Floreano (2007). łAnalog Genetic Encoding for the Evolution
of Circuits and Networksž. In: IEEE Transactions on Evolutionary Computation 11.5,
pp. 596ś607. (Link).
Maynard Smith, J. and Eörs Szathmáry (1997). The Major Transitions in Evolution. Oxford,
UK: Oxford University Press. (Link).
McQuesten, Paul (2002). łCultural Enhancement of Neuroevolutionž. PhD thesis. Austin,
TX: Department of Computer Sciences, The University of Texas at Austin. (Link).
McQuesten, Paul and Risto Miikkulainen (1997). łCulling and Teaching in Neuro-
Evolutionž. In: Proceedings of the Seventh International Conference on Genetic
Algorithms, pp. 760ś767. (Link).
Meoded, Avner, Andrea Poretti, Susumu Mori, and Jiangyang Zhang (2016). łDiffusion
Tensor Imaging (DTI)ž. In: The Curated Reference Collection in Neuroscience and
Biobehavioral Psychology. Amsterdam: Elsevier. (Link).
Meredith, Robert W., Jan E. Janečka, John Gatesy, Oliver A. Ryder, Colleen A. Fisher,
Emma C. Teeling, Alisha Goodbla, Eduardo Eizirik, Taiz L. L. Simão, Tanja Stadler,
Daniel L. Rabosky, Rodney L. Honeycutt, John J. Flynn, Colleen M. Ingram, Cynthia
Steiner, Tiffani L. Williams, Terence J. Robinson, Angela Burk-Herrick, Michael
Westerman, Nadia A. Ayoub, Mark S. Springer, and William J. Murphy (2011).
łImpacts of the Cretaceous Terrestrial Revolution and KPg Extinction on Mammal
Diversificationž. In: Science 334, pp. 521ś524. (Link).
Metzen, Jan H., Frank Kirchner, Mark Edgington, and Yohannes Kassahun (2008). łTo-
wards Efficient Online Reinforcement Learning Using Neuroevolutionž. In: GECCO’08:
Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation,
pp. 1425ś1426. (Link).
Meyerson, Elliot, Olivier Francon, Darren Sargent, Babak Hodjat, and Risto Miikkulainen
(2024). łUnlocking the Potential of Global Human Expertisež. In: Advances in Neural
Information Processing Systems 37, pp. 119227ś119259. (Link).
Meyerson, Elliot, Joel Lehman, and Risto Miikkulainen (2016). łLearning Behavior
Characterizations for Novelty Searchž. In: GECCO’16: Proceedings of the Genetic
and Evolutionary Computation Conference 2016, pp. 149ś156. (Link).
Meyerson, Elliot and Risto Miikkulainen (2017). łDiscovering Evolutionary Stepping
Stones through Behavior Dominationž. In: GECCO’17: Proceedings of the Genetic
and Evolutionary Computation Conference. Berlin, Germany, pp. 139ś146. (Link).
Ð
(2018a). łBeyond Shared Hierarchies: Deep Multitask Learning through Soft Layer
Orderingž. In: Proceedings of the Sixth International Conference on Learning Repre-
sentations, pp. 1401ś1414. (Link).
Ð
(2018b). łPseudo-task Augmentation: From Deep Multitask Learning to Intratask
SharingÐand Backž. In: Proceedings of the 35th International Conference on Machine
Learning, pp. 739ś748. (Link).
Ð
(2019). łModular Universal Reparameterization: Deep Multi-task Learning Across
Diverse Domainsž. In: Advances in Neural Information Processing Systems 32,
pp. 7901ś7912. (Link).
Ð
(2021). łThe Traveling Observer Model: Multi-task Learning Through Spatial Variable
Embeddingsž. In: Proceedings of the Ninth International Conference on Learning
Representations, pp. 2706ś2722. (Link).
Meyerson, Elliot, Mark J. Nelson, Herbie Bradley, Adam Gaier, Arash Moradi, Amy K.
Hoover, and Joel Lehman (2024). łLanguage Model Crossover: Variation through Few-
Shot Promptingž. In: ACM Transactions on Evolutionary Learning and Optimization
4. Article 27. (Link).
Meyerson, Elliot, Xin Qiu, and Risto Miikkulainen (2022). łSimple Genetic Operators
are Universal Approximators of Probability Distributions (and other Advantages of
Expressive Encodings)ž. In: GECCO’22: Proceedings of the Genetic and Evolutionary
Computation Conference, pp. 739ś748. (Link).
Miconi, Thomas (2008). łIn silicon No One Can Hear You Scream: Evolving Fighting
Creaturesž. In: Genetic Programming: 11th European Conference. Ed. by Michael
O’Neill, Leonardo Vanneschi, Steven Gustafson, Anna I. Esparcia Alcázar, Ivanoe De
Falco, Antonio Della Cioppa, and Ernesto Tarantino. New York: Springer, pp. 25ś36.
(Link).
Ð
(2009). łWhy Coevolution Doesn’t łWorkž: Superiority and Progress in Coevolutionž.
In: Genetic Programming: 12th European Conference. Ed. by Leonardo Vanneschi,
Steven Gustafson, Alberto Moraglio, Ivanoe de Falco, and Marc Ebner. New York:
Springer, pp. 49ś60. (Link).
Miikkulainen, Risto (2021). łCreative AI through Evolutionary Computation: Principles
and Examplesž. In: SN Computer Science 2, p. 163. (Link).
Miikkulainen, Risto (2024). łGenerative AI: An AI Paradigm Shift in the Making?ž In:
AI Magazine, pp. 165ś167. (Link).
Ð
(2025). łNeuroevolution Insights Into Biological Neural Computationž. In: Science,
eadp7478. (Link).
Miikkulainen, Risto, James A. Bednar, Yoonsuck Choe, and Joseph Sirosh (2005).
Computational Maps in the Visual Cortex. New York: Springer. (Link).
Miikkulainen, Risto, Myles Brundage, Jonathan Epstein, Tyler Foster, Babak Hodjat,
Neil Iscoe, Jingbo Jiang, Diego Legrand, Sam Nazari, Xin Qiu, Michael Scharff, Cory
Schoolland, Robert Severn, and Aaron Shagrin (2020). łAscend by Evolv: AI-Based
Massively Multivariate Conversion Rate Optimizationž. In: AI Magazine 42, pp. 44ś60.
(Link).
Miikkulainen, Risto and Michael G. Dyer (1991). łNatural Language Processing With
Modular PDP Networks And Distributed Lexiconž. In: Cognitive Science 15, pp. 343ś
399. (Link).
Miikkulainen, Risto, Dan Fink, Olivier Francon, Babak Hodjat, Noravee Kanchanavatee,
Elliot Meyerson, Xin Qiu, Darren Sargent, Hormoz Shahrzad, Deepak Singh, Jean
Celestin Yamegni Noubeyo, and Daniel Young (2025). NeuroSAN+NeuroAI: AI-
assisted Decision-making through a Synergy of Technologies. Tech. rep. 2025-01.
Cognizant AI Lab. (Link).
Miikkulainen, Risto and Stephanie Forrest (2021). łA Biological Perspective on Evolu-
tionary Computationž. In: Nature Machine Intelligence 3, pp. 9ś15. (Link).
Miikkulainen, Risto, Olivier Francon, Elliot Meyerson, Xin Qiu, Darren Sargent, Elisa
Canzani, and Babak Hodjat (2021). łFrom Prediction to Prescription: Evolutionary
Optimization of Non-Pharmaceutical Interventions in the COVID-19 Pandemicž. In:
IEEE Transactions on Evolutionary Computation 25, pp. 386ś401. (Link).
Miikkulainen, Risto, Jason Liang, Elliot Meyerson, Aditya Rawal, Dan Fink, Olivier
Francon, Bala Raju, Hormoz Shahrzad, Arshak Navruzyan, Nigel Duffy, and Babak
Hodjat (2023). łEvolving Deep Neural Networksž. In: Artificial Intelligence in the
Age of Neural Networks and Brain Computing (second edition). Ed. by Robert Kozma,
Cesare Alippi, Yoonsuck Choe, and Francesco C. Morabito. Amsterdam: Elsevier,
pp. 269ś287. (Link).
Miikkulainen, Risto, Elliot Meyerson, Xin Qiu, Ujjayant Sinha, Raghav Kumar, Karen
Hofmann, Yiyang M. Yan, Michael Ye, Jingyan Yang, Damon Caiazza, and Stephanie
Manson Brown (2021). łEvaluating Medical Aesthetics Treatments through Evolved
Age-Estimation Modelsž. In: GECCO’21: Proceedings of the Genetic and Evolutionary
Computation Conference, pp. 1009ś1017. (Link).
Miller, Geoffrey F., Peter Todd, and Shailesh Hegde (1989). łDesigning Neural Networks
Using Genetic Algorithmsž. In: Proceedings of the Third International Conference on
Genetic Algorithms, pp. 391ś396. (Link).
Miller, Julian F. (2004). łEvolving a Self-repairing, Self-regulating, French Flag Organismž.
In: Genetic and Evolutionary Computation–GECCO 2004, pp. 129ś139. (Link).
Ð ed. (2011). Cartesian Genetic Programming. New York: Springer.
(Link).
Ð
(2020). łCartesian Genetic Programming: Its Status and Futurež. In: Genetic Pro-
gramming and Evolvable Machines 21, pp. 129ś168. (Link).
Miller, Julian F. and Andrew Turner (2015). łCartesian Genetic Programmingž. In:
GECCO Companion ’15: Proceedings of the Companion Publication of the 2015
Annual Conference on Genetic and Evolutionary Computation, pp. 179ś198. (Link).
Min, Bonan, Hayley Ross, Elior Sulem, Amir P. B. Veyseh, Thien H. Nguyen, Oscar Sainz,
Eneko Agirre, Ilana Heintz, and Dan Roth (2024). łRecent Advances in Natural
Language Processing via Large Pre-trained Language Models: A Surveyž. In: ACM
Computing Surveys 56, 30:1ś30:40. (Link).
Mistral AI (2024). Models Overview. https://docs.mistral.ai/getting-started/models/models_
overview/. Retrieved 8/31/2025.
Mitchell, Melanie (2006). łCoevolutionary Learning with Spatially Distributed Popu-
lationsž. In: Computational Intelligence: Principles and Practice. Ed. by Gary Y.
Yen and David B. Fogel. Piscataway, NJ: IEEE Computational Intelligence Society,
pp. 137ś154. (Link).
Mitchell, Melanie, James P. Crutchfield, and Rajarshi Das (1996). łEvolving Cellular
Automata with Genetic Algorithms: A Review of Recent Workž. In: Proceedings of
the First International Conference on Evolutionary Computation and Its Applications,
pp. 42ś55. (Link).
Mjolsness, Eric, David H. Sharp, and Bradley K. Alpert (1989). łScaling, Machine
Learning, and Genetic Neural Netsž. In: Advances in Applied Mathematics 10,
pp. 137ś163. (Link).
Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc
G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski,
Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan
Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis (2015). łHuman-level
Control through Deep Reinforcement Learningž. In: Nature 518, pp. 529ś533. (Link).
Montana, David J. and Lawrence Davis (1989). łTraining Feedforward Neural Networks
Using Genetic Algorithmsž. In: Proceedings of the 11th International Joint Conference
on Artificial Intelligence, pp. 762ś767. (Link).
Mordvintsev, Alexander, Ettore Randazzo, Eyvind Niklasson, and Michael Levin (2020).
łGrowing Neural Cellular Automataž. In: Distill 5.2, e23. (Link).
Morgan, Nelson and Hervé Bourlard (1990). łGeneralization and Parameter Estimation in
Feedforward Nets: Some Experimentsž. In: Advances in Neural Information Processing
Systems 3, pp. 630ś637. (Link).
Moriarty, David E. and Pat Langley (1998). łLearning Cooperative Lane Selection
Strategies for Highwaysž. In: Proceedings of the AAAI Conference on Artificial
Intelligence, 15, pp. 684ś691. (Link).
Moriarty, David E. and Risto Miikkulainen (1996). łEvolving Obstacle Avoidance
Behavior In A Robot Armž. In: From Animals to Animats 4: Proceedings of the
Fourth International Conference on Simulation of Adaptive Behavior. Ed. by Pattie
Maes, Maja J. Mataric, Jean-Arcady Meyer, Jordan Pollack, and Stewart W. Wilson.
Cambridge, MA: MIT press, pp. 468ś475. (Link).
Ð
(1997). łForming Neural Networks Through Efficient And Adaptive Coevolutionž. In:
Evolutionary Computation 5, pp. 373ś399. (Link).
Mouret, Jean-Baptiste and Jeff Clune (2015). łIlluminating Search Spaces by Mapping
Elitesž. In: arXiv:1504.04909. (Link).
Mouret, Jean-Baptiste and Stéphane Doncieux (2009). łOvercoming the Bootstrap Problem
in Evolutionary Robotics Using Behavioral Diversityž. In: Proceedings of the IEEE
Congress on Evolutionary Computation, pp. 1161ś1168. (Link).
Ð
(2012). łEncouraging Behavioral Diversity in Evolutionary Robotics: An Empirical
Studyž. In: Evolutionary Computation 20, pp. 91ś133. (Link).
Mousavirad, Seyed J., Seyyed M. Tabatabaei, Davood Zabihzadeh, Mahshid H. Moghadam,
Mehran Pourvahab, and Diego Oliva (2025). łEnhancing Neural Network Generalisa-
tion with Improved Differential Evolutionž. In: Advances in Optimization Algorithms
for Multidisciplinary Engineering Applications: From Classical Methods to AI-
Enhanced Solutions. Ed. by Diego Oliva, Arturo Valdivia, Seyed J. Mousavirad, and
Kanak Kalita. New York: Springer, pp. 455ś470. (Link).
Mühlenbein, Heinz and Jörg Kindermann (1989). łThe Dynamics of Evolution and
Learning: Towards Genetic Neural Networksž. In: Connectionism in Perspective.
Ed. by Rolf Pfeifer, Zoltan Schreter, Françoise Fogelman Soulié, and Luc Steels.
Amsterdam: Elsevier, pp. 301ś308.
Müller, Gerd B. (2014). łEvoDevo Shapes the Extended Synthesisž. In: Biological Theory
9.2, pp. 119ś121. (Link).
Nair, Vinod and Geoffrey E. Hinton (2010). łRectified Linear Units Improve Restricted
Boltzmann Machinesž. In: Proceedings of the 27th International Conference on
Machine Learning, pp. 807ś814. (Link).
Najarro, Elias and Sebastian Risi (2020). łMeta-Learning through Hebbian Plasticity
in Random Networksž. In: Advances in Neural Information Processing Systems 33,
pp. 20719ś20731. (Link).
Najarro, Elias, Shyam Sudhakaran, Claire Glanois, and Sebastian Risi (2022). łHyperNCA:
Growing Developmental Networks with Neural Cellular Automataž. In: Workshop
on From Cells to Societies: Collective Learning Across Scales, Tenth International
Conference on Learning Representations. (Link).
Najarro, Elias, Shyam Sudhakaran, and Sebastian Risi (2023). łTowards Self-Assembling
Artificial Neural Networks through Neural Developmental Programsž. In: ALIFE
2023: Ghost in the Machine: Proceedings of the 2023 Artificial Life Conference, p. 80.
(Link).
Newman, Mark E. J. (2002). łSpread of Epidemic Disease on Networksž. In: Physical
Review E 66, p. 016128. (Link).
Ð
(2006). łModularity and Community Structure in Networksž. In: Proceedings of the
National Academy of Sciences 103, pp. 8577ś8582. (Link).
Nguyen, Anh M., Jason Yosinski, and Jeff Clune (2015a). łDeep Neural Networks
Are Easily Fooled: High Confidence Predictions for Unrecognizable Imagesž. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pp. 427ś436. (Link).
Ð
(2015b). łInnovation Engines: Automated Creativity and Improved Stochastic Op-
timization via Deep Learningž. In: GECCO’15: Proceedings of the 2015 Annual
Conference on Genetic and Evolutionary Computation, pp. 959ś966. (Link).
Nichele, Stefano, Mathias B. Ose, Sebastian Risi, and Gunnar Tufte (2017). łCA-NEAT:
Evolved Compositional Pattern Producing Networks for Cellular Automata Morpho-
genesis and Replicationž. In: IEEE Transactions on Cognitive and Developmental
Systems 10.3, pp. 687ś700. (Link).
Nisioti, Eleni, Erwan Plantec, Milton Montero, Joachim Winther Pedersen, and Sebastian Risi (2024). “Growing Artificial Neural Networks for Control: The Role of Neuronal Diversity”. In: GECCO’24 Companion: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 175–178. (Link).
Nolfi, Stefano (2011). łBehavior and Cognition as a Complex Adaptive System: Insights
from Robotic Experimentsž. In: Philosophy of Complex Systems. Ed. by Cliff Hooker.
Vol. 10. Handbook of the Philosophy of Science. Amsterdam: North-Holland, pp. 443ś
463. (Link).
Nolfi, Stefano, Jeffrey L. Elman, and Domenico Parisi (1994). łLearning and Evolution in
Neural Networksž. In: Adaptive Behavior 2, pp. 5ś28. (Link).
Nolfi, Stefano and Dario Floreano (2000). Evolutionary Robotics: The Biology, Intelligence,
and Technology of Self-organizing Machines. Cambridge, MA: MIT press. (Link).
Nolfi, Stefano and Paolo Pagliuca (2025). łGlobal Progress in Competitive Co-evolution:
A Systematic Comparison of Alternative Methodsž. In: Frontiers in Robotics and AI
11. Article 1470886.
(Link).
Nolfi, Stefano and Domenico Parisi (1992). łGrowing Neural Networksž. In: Artificial
Life II: Proceedings of the Workshop on Artificial Life. Ed. by Christopher G. Langton.
Reading, MA: Addison-Wesley. (Link).
Ð
(1994). łDesired Answers Do Not Correspond to Good Teaching Inputs in Ecological
Neural Networksž. In: Neural Processing Letters 1, pp. 1ś5. (Link).
Nordin, Peter and Wolfgang Banzhaf (1995). łComplexity Compression and Evolutionž. In:
Proceedings of the Sixth International Conference on Genetic Algorithms, pp. 310ś317.
(Link).
Novikov, Alexander, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Z. Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, and Matej Balog (2025). “AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery”. In: arXiv:2506.13131. (Link).
Nowak, Martin A. and David C. Krakauer (1999). “The Evolution of Language”. In: Proceedings of the National Academy of Sciences 96, pp. 8028–8033. (Link).
Ochoa, Gabriela (1998). łOn genetic algorithms and Lindenmayer systemsž. In: Parallel
Problem Solving from Nature PPSN V, pp. 335ś344. (Link).
Ochoa, Gabriela, Katherine M Malan, and Christian Blum (2021). łSearch trajectory
networks: A tool for analysing and visualising the behaviour of metaheuristicsž. In:
Applied Soft Computing 109, p. 107492. (Link).
Ollion, Charles, Tony Pinville, and Stéphane Doncieux (2012). “With a Little Help from Selection Pressures: Evolution of Memory in Robot Controllers”. In: Artificial Life 13: Proceedings of the Thirteenth International Conference on the Synthesis and Simulation of Living Systems, pp. 407–414. (Link).
Olson, Randal S., Arend Hintze, Fred C. Dyer, David B. Knoester, and Christoph Adami
(2013). łPredator Confusion is Sufficient to Evolve Swarming Behaviourž. In: Journal
of The Royal Society Interface 10, p. 20130305. (Link).
OpenAI (2025). GPT-5 System Card. Tech. rep. OpenAI.
(Link).
Ororbia, Alexander, AbdElRahman ElSaid, and Travis Desell (2019). łInvestigating Re-
current Neural Network Memory Structures Using Neuro-evolutionž. In: GECCO’19:
Proceedings of the Genetic and Evolutionary Computation Conference, pp. 446ś455.
(Link).
Ouyang, Long, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin,
Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob
Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder,
Paul Christiano, Jan Leike, and Ryan Lowe (2022). łTraining Language Models to
Follow Instructions with Human Feedbackž. In: Advances in Neural Information
Processing Systems 35, pp. 27730ś27744. (Link).
Oymak, Samet (2018). łLearning Compact Neural Networks with Regularizationž. In:
Proceedings of the 35th International Conference on Machine Learning, pp. 3963ś
3972. (Link).
Papavasileiou, Evgenia, Jan Cornelis, and Bart Jansen (2021). “A Systematic Literature Review of the Successors of ‘NeuroEvolution of Augmenting Topologies’”. In: Evolutionary Computation 29, pp. 1–73. (Link).
Papavasileiou, Evgenia and Bart Jansen (2017). łAn investigation of topological choices
in FS-NEAT and FD-NEAT on XOR-based problems of increased complexityž. In:
GECCO’17: Proceedings of the Genetic and Evolutionary Computation Conference
Companion, pp. 1431ś1434. (Link).
Pardoe, David, Michael Ryoo, and Risto Miikkulainen (2005). łEvolving Neural Network
Ensembles for Control Problemsž. In: GECCO’05: Proceedings of the 7th Annual
Conference on Genetic and Evolutionary Computation, pp. 1379ś1384. (Link).
Park, J. and Irwin W. Sandberg (1991). łUniversal Approximation Using Radial-Basis-
Function Networksž. In: Neural Computation 3, pp. 246ś257. (Link).
Pedersen, Joachim Winther and Sebastian Risi (2021). łEvolving and Merging Hebbian
Learning Rules: Increasing Generalization by Decreasing the Number of Rulesž. In:
GECCO’21: Proceedings of the Genetic and Evolutionary Computation Conference,
pp. 892ś900.
(Link).
Pelikan, Martin, David E. Goldberg, and Erick Cantú-Paz (1999). łBOA: The Bayesian
Optimization Algorithmž. In: GECCO’99: Proceedings of the 1st Annual Conference
on Genetic and Evolutionary Computation, pp. 525ś532. (Link).
Petroski Such, Felipe, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O.
Stanley, and Jeff Clune (2017). łDeep Neuroevolution: Genetic Algorithms Are
a Competitive Alternative for Training Deep Neural Networks for Reinforcement
Learningž. In: arXiv:1712.06567. (Link).
Pham, Hieu, Melody Guan, Barret Zoph, Quoc V. Le, and Jeff Dean (2018). łEfficient
Neural Architecture Search via Parameter Sharingž. In: Proceedings of the 35th
International Conference on Machine Learning, pp. 4095ś4104. (Link).
Pilat, Martin L. and Christian Jacob (2010). “Evolution of Vision Capabilities in Embodied Virtual Creatures”. In: GECCO’10: Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, pp. 95–102. (Link).
Plantec, Erwan, Joachim Winther Pedersen, Milton Montero, Eleni Nisioti, and Sebastian
Risi (2024). łEvolving Self-Assembling Neural Networks: From Spontaneous Activity
to Experience-Dependent Learningž. In: ALIFE 2024: Proceedings of the 2024
Artificial Life Conference. Paper No: isal_a_00755, 37. (Link).
Polani, Daniel and Risto Miikkulainen (2000). łEugenic Neuro-Evolution for Reinforce-
ment Learningž. In: GECCO’00: Proceedings of the 2nd Annual Conference on
Genetic and Evolutionary Computation, pp. 1041ś1046. (Link).
Poli, Riccardo, William B. Langdon, and Nicholas F. McPhee (2008). A Field Guide to
Genetic Programming. Egham, UK: Lulu Enterprises. (Link).
Pollack, Jordan B. (1987). łCascaded Back-Propagation on Dynamic Connectionist
Networksž. In: Proceedings of the 10th Annual Conference of the Cognitive Science
Society, pp. 391ś404. (Link).
Popovici, Elena, Anthony Bucci, R. Paul Wiegand, and Edwin D. de Jong (2012).
łCoevolutionary Principlesž. In: Handbook of Natural Computing. Ed. by Grzegorz
Rozenberg, Thomas Bäck, and Joost N. Kok. New York: Springer, pp. 987ś1033.
(Link).
Potter, Mitchell A. and Kenneth A. De Jong (2000). łCooperative Coevolution: An
Architecture for Evolving Coadapted Subcomponentsž. In: Evolutionary Computation
8, pp. 1ś29. (Link).
Prellberg, Jonas and Oliver Kramer (2018). łLamarckian Evolution of Convolutional
Neural Networksž. In: Parallel Problem Solving from Nature PPSN XV. Ed. by
Anne Auger, Carlos M. Fonseca, Nuno Lourenço, Penousal Machado, Luís Paquete,
and Darrell Whitley. New York: Springer, pp. 424ś435. (Link).
Price, Kenneth V., Rainer M. Storn, and Jouni A. Lampinen (2005). Differential Evolution:
A Practical Approach to Global Optimization. New York: Springer. (Link).
Prior, John (1998). łEugenic Evolution for Combinatorial Optimizationž. MA thesis.
Austin, TX: Department of Computer Sciences, The University of Texas at Austin.
(Link).
Prusinkiewicz, Przemyslaw, Mark Hammel, Jim Hanan, and Radomir Mech (1996).
łL-systems: From the Theory to Visual Models of Plantsž. In: Proceedings of the
CSIRO Symposium on Computational Challenges in Life Sciences, pp. 1ś32. (Link).
Pugh, Justin K., Lisa B. Soros, and Kenneth O. Stanley (2016). łQuality Diversity: A New
Frontier for Evolutionary Computationž. In: Frontiers in Robotics and AI 3, p. 40.
(Link).
Qiu, Xin, Yulu Gan, Conor F. Hayes, Qiyao Liang, Elliot Meyerson, Babak Hodjat, and
Risto Miikkulainen (2025). łEvolution Strategies at Scale: LLM Fine-Tuning Beyond
Reinforcement Learningž. In: arXiv:2509.24372. (Link).
Qiu, Xin, Elliot Meyerson, and Risto Miikkulainen (2020). łQuantifying Point-Prediction
Uncertainty in Neural Networks via Residual Estimation with an I/O Kernelž. In:
Proceedings of the Eighth International Conference on Learning Representations,
pp. 2146ś2180. (Link).
Qiu, Xin and Risto Miikkulainen (2023). łShortest Edit Path Crossover: A Theory-driven
Solution to the Permutation Problem in Evolutionary Neural Architecture Searchž. In:
Proceedings of the 40th International Conference on Machine Learning, pp. 28422ś
28447. (Link).
Radcliffe, Nicholas J. (1993). łGenetic Set Recombination and Its Application to Neural
Network Topology Optimisationž. In: Neural Computing & Applications 1, pp. 67ś90.
(Link).
Rafailov, Rafael, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning,
and Chelsea Finn (2023). łDirect Preference Optimization: Your Language Model Is
Secretly a Reward Modelž. In: Advances in Neural Information Processing Systems
35, pp. 53728ś53741. (Link).
Rajagopalan, Padmini, Kay E. Holekamp, and Risto Miikkulainen (2014). łThe Evolution
of General Intelligencež. In: Artificial Life 14: Proceedings of the Fourteenth Inter-
national Conference on the Synthesis and Simulation of Living Systems, pp. 63ś70.
(Link).
Ð
(2019). łFactors that Affect the Evolution of Complex Cooperative Behaviorž. In:
ALIFE 2019: The 2019 Conference on Artificial Life, pp. 333ś340. (Link).
Ð
(2020). łEvolution of Complex Coordinated Behaviorž. In: Proceedings of the IEEE
Congress on Evolutionary Computation, pp. 3098ś3105. (Link).
Rajagopalan, Padmini, Aditya Rawal, Risto Miikkulainen, Marc A. Wiseman, and Kay E.
Holekamp (2011). łThe Role of Reward Structure, Coordination Mechanism and Net
Return in the Evolution of Cooperationž. In: Proceedings of the IEEE Conference on
Computational Intelligence and Games, pp. 258ś265. (Link).
Ramachandran, Prajit, Barret Zoph, and Quoc V. Le (2018). łSearching for Activa-
tion Functionsž. In: Workshop Track, Sixth International Conference on Learning
Representations. (Link).
Rasmussen, Carl E. and Christopher K. I. Williams (2006). Gaussian Processes for
Machine Learning. Cambridge, MA: MIT Press. (Link).
Raup, David M. (1986). łBiological Extinction in Earth Historyž. In: Science 231,
pp. 1528ś1533. (Link).
Rawal, Aditya, Janette Boughman, and Risto Miikkulainen (2014). łEvolution of Com-
munication in Mate Selectionž. In: Artificial Life 14: Proceedings of the Fourteenth
International Conference on the Synthesis and Simulation of Living Systems, pp. 16ś22.
(Link).
Rawal, Aditya and Risto Miikkulainen (2020). “Discovering Gated Recurrent Neural Network Architectures”. In: Deep Neural Evolution: Deep Learning with Evolutionary Computation. Ed. by Hitoshi Iba and Nasimul Noman. New York: Springer, pp. 233–251. (Link).
Rawal, Aditya, Padmini Rajagopalan, and Risto Miikkulainen (2010). łConstructing
Competitive and Cooperative Agent Behavior Using Coevolutionž. In: Proceedings of
the IEEE Conference on Computational Intelligence and Games, pp. 107ś114. (Link).
Real, Esteban, Alok Aggarwal, Yanping Huang, and Quoc V. Le (2019). łRegularized
Evolution for Image Classifier Architecture Searchž. In: Proceedings of the AAAI
Conference on Artificial Intelligence, 33, pp. 4780ś4789. (Link).
Real, Esteban, Chen Liang, David So, and Quoc V. Le (2020). łAutoML-Zero: Evolving
Machine Learning Algorithms From Scratchž. In: Proceedings of the 37th International
Conference on Machine Learning, pp. 8007ś8019. (Link).
Real, Esteban, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka L. Suematsu, Jie Tan,
Quoc V. Le, and Alexey Kurakin (2017). łLarge-scale Evolution of Image Classifiersž.
In: Proceedings of the 34th International Conference on Machine Learning, pp. 2902ś
2911. (Link).
Rechenberg, Ingo (1973). Evolutionsstrategie: Optimierung technischer Systeme nach
Prinzipien der biologischen Evolution. Evolution Strategy: Optimization of Technical
Systems According to the Principles of Biological Evolution. Stuttgart: Frommann-
Holzboog Verlag. (Link).
Reed, Russell (1993). “Pruning algorithms – A survey”. In: IEEE Transactions on Neural Networks 4, pp. 740–747. (Link).
Reisinger, Joseph and Risto Miikkulainen (2006). łSelecting for Evolvable Representa-
tionsž. In: GECCO’06: Proceedings of the 8th Annual Conference on Genetic and
Evolutionary Computation, pp. 1297ś1304. (Link).
Ð
(2007). “Acquiring Evolvability through Adaptive Representations”. In: GECCO’07: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pp. 1045–1052. (Link).
Reynolds, John, James S. Plank, and Catherine Schuman (2019). łIntelligent Reservoir
Generation for Liquid State Machines using Evolutionary Optimizationž. In: Pro-
ceedings of the International Joint Conference on Neural Networks, pp. 3992ś3999.
(Link).
Reynolds, Robert G., Zbigniew Michalewicz, and Michael J. Cavaretta (1995). łUs-
ing Cultural Algorithms for Constraint Handling in GENOCOPž. In: Evolutionary
Programming IV: Proceedings of the Fourth Annual Conference on Evolutionary
Programming. Ed. by John R. McDonnell, Robert G. Reynolds, and David B. Fogel.
Cambridge, MA: MIT Press, pp. 289ś305. (Link).
Ribalta Lorenzo, Pablo and Jakub Nalepa (2018). łMemetic Evolution of Deep Neural
Networksž. In: GECCO’18: Proceedings of the Genetic and Evolutionary Computation
Conference, pp. 505ś512. (Link).
Risi, Sebastian, Charles E. Hughes, and Kenneth O. Stanley (2010). łEvolving Plastic
Neural Networks with Novelty Searchž. In: Adaptive Behavior 18, pp. 470ś491. (Link).
Risi, Sebastian, Joel Lehman, David B. D’Ambrosio, Ryan Hall, and Kenneth O. Stanley
(2016). łPetalz: Search-Based Procedural Content Generation for the Casual Gamerž.
In: IEEE Transactions on Computational Intelligence and AI in Games 8, pp. 244ś255.
(Link).
Risi, Sebastian and Kenneth O. Stanley (2010). łIndirectly Encoding Neural Plasticity
as a Pattern of Local Rulesž. In: From Animals to Animats 11: 11th International
Conference on Simulation of Adaptive Behavior, pp. 533ś543. (Link).
Ð
(2012a). łA Unified Approach to Evolving Plasticity and Neural Geometryž. In:
Proceedings of the International Joint Conference on Neural Networks, pp. 1ś8.
(Link).
Risi, Sebastian and Kenneth O. Stanley (2012b). łAn Enhanced Hypercube-based Encoding
for Evolving the Placement, Density, and Connectivity of Neuronsž. In: Artificial life
18, pp. 331ś363. (Link).
Ð
(2019). łDeep Neuroevolution of Recurrent and Discrete World Modelsž. In: GECCO’19:
Proceedings of the Genetic and Evolutionary Computation Conference, pp. 456ś462.
(Link).
Ð
(2021). łDeep Innovation Protection: Confronting the Credit Assignment Problem in
Training Heterogeneous Neural Architecturesž. In: Proceedings of the AAAI Conference
on Artificial Intelligence, 35, pp. 12391ś12399. (Link).
Risi, Sebastian and Julian Togelius (2015). łNeuroevolution in games: State of the art and
open challengesž. In: IEEE Transactions on Computational Intelligence and AI in
Games 9, pp. 25ś41. (Link).
Robson, Ann L. (2023). Critical/Sensitive Periods. https://www.encyclopedia.com/children/
applied-and-social-sciences-magazines/criticalsensitive-periods. Retrieved 8/31/2025.
Rock, David and Heidi Grant (2016). Why Diverse Teams Are Smarter. https://vcportal.
ventura.org/committees/di/HBR._Why_diverse_teams_are_smarter.PDF. Retrieved
8/31/2025.
Rothe, Rasmus, Radu Timofte, and Luc Van Gool (2018). łDeep Expectation of Real
and Apparent Age from a Single Image without Facial Landmarksž. In: International
Journal of Computer Vision 126.2, pp. 144ś157. (Link).
Routley, Nick (2017). Visualizing the Trillion-Fold Increase in Computing Power.
https://www.visualcapitalist.com/visualizing-trillion-fold-increase-computing-power/.
Retrieved 8/31/2025.
Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams (1986). łLearning
Internal Representations by Error Propagationž. In: Parallel Distributed Processing:
Explorations in the Microstructure of Cognition, Vol. 1: Foundations. Ed. by David E.
Rumelhart, James L. McClelland, and PDP Research Group. Cambridge, MA: MIT
Press, pp. 318ś362. (Link).
Ruppin, Eytan (2002). łEvolutionary Autonomous Agents: A Neuroscience Perspectivež.
In: Nature Reviews Neuroscience 3, pp. 132ś141. (Link).
Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma,
Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C.
Berg, and Fei-Fei Li (2015). łImageNet Large Scale Visual Recognition Challengež.
In: International Journal of Computer Vision 115, pp. 211ś252. (Link).
Ryan Ruggiero, Vincent (2012). Beyond Feelings: A Guide to Critical Thinking. McGraw
Hill. (Link).
Salge, Christoph, Cornelius Glackin, and Daniel Polani (2014). “Empowerment–An Introduction”. In: Guided Self-Organization: Inception. Ed. by Mikhail Prokopenko. New York: Springer, pp. 67–114. (Link).
Salih, Adham and Amiram Moshaiov (2022). łEvolving topology and weights of special-
ized and non-specialized neuro-controllers for robot motion in various environmentsž.
In: Neural Computing and Applications 34, pp. 17071ś17086. (Link).
Ð
(2023a). łNeuro-Evolution-Based Generic Missile Guidance Law for Many-Scenariosž.
In: Applied Soft Computing 152, p. 111210. (Link).
Salih, Adham and Amiram Moshaiov (2023b). łPromoting Transfer of Robot Neuro-
Motion-Controllers by Many-Objective Topology and Weight Evolutionž. In: IEEE
Transactions on Evolutionary Computation 27, pp. 385ś395. (Link).
Salimans, Tim, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever (2017). łEvolution
Strategies as a Scalable Alternative to Reinforcement Learningž. In: arXiv:1703.03864.
(Link).
Samet, Hanan (1984). łThe Quadtree and Related Hierarchical Data Structuresž. In: ACM
Computing Surveys 16.2, pp. 187ś260. (Link).
Samuel, Arthur L. (1959). łSome Studies in Machine Learning Using the Game of
Checkersž. In: IBM Journal of Research and Development 3, pp. 210ś229. (Link).
Sandler, Mark, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh
Chen (2018). łMobileNetV2: Inverted Residuals and Linear Bottlenecksž. In: Pro-
ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
pp. 4510ś4520. (Link).
Sarti, Stefano and Gabriela Ochoa (2021). łA NEAT visualisation of neuroevolution
trajectoriesž. In: Applications of Evolutionary Computation—24th International
Conference, pp. 714ś728. (Link).
Saunders, Gregory M. and Jordan B. Pollack (1996). łThe Evolution of Communication
Schemes Over Continuous Channelsž. In: From Animals to Animats 4: Proceedings
of the Fourth International Conference on Simulation of Adaptive Behavior. Ed. by
Pattie Maes, Maja J. Mataric, Jean-Arcady Meyer, Jordan Pollack, and S. W. Wilson.
Cambridge, MA: MIT press, pp. 580ś589. (Link).
Schaffer, J. David, Rich A. Caruana, and Larry J. Eshelman (1990). łUsing Genetic Search
to Exploit the Emergent Behavior of Neural Networksž. In: Physica D: Nonlinear
Phenomena, pp. 244ś248. (Link).
Schaffer, J. David, Darrell Whitley, and Larry J. Eshelman (1992). “Combinations of Genetic Algorithms and Neural Networks: A Survey of the State of the Art”. In: COGANN-92: International Workshop on Combinations of Genetic Algorithms and Neural Networks. Los Alamitos, CA: IEEE Computer Society Press, pp. 1–37. (Link).
Schmidhuber, Jürgen (1992). “Learning to Control Fast-weight Memories: An Alternative to Dynamic Recurrent Networks”. In: Neural Computation 4.1, pp. 131–139. (Link).
Schmidhuber, Jürgen, Daan Wierstra, Matteo Gagliolo, and Faustino Gomez (2007).
łTraining Recurrent Networks by Evolinož. In: Neural Computation 19.3, pp. 757ś779.
(Link).
Schrum, Jacob, Igor V. Karpov, and Risto Miikkulainen (2011). “UT^2: Human-like Behavior via Neuroevolution of Combat Behavior and Replay of Human Traces”. In: Proceedings of the IEEE Conference on Computational Intelligence and Games, pp. 329–336. (Link).
Ð
(2012). łHumanlike Combat Behavior via Multiobjective Neuroevolutionž. In: Believ-
able Bots. Ed. by Philip Hingston. New York: Springer, pp. 119ś150. (Link).
Schrum, Jacob and Risto Miikkulainen (2016a). łDiscovering Multimodal Behavior in
Ms. Pac-Man through Evolution of Modular Neural Networksž. In: IEEE Transactions
on Computational Intelligence and AI in Games 8, pp. 67ś81. (Link).
Schrum, Jacob and Risto Miikkulainen (2016b). łSolving Multiple Isolated, Interleaved,
and Blended Tasks through Modular Neuroevolutionž. In: Evolutionary Computation
24, pp. 459ś490. (Link).
Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov
(2017a). Proximal Policy Optimization. https://openai.com/index/openai-baselines-
ppo/. Retrieved 8/21/2025.
Ð (2017b). łProximal Policy Optimization Algorithmsž. In: arXiv:1707.06347.
(Link).
Schultz, Wolfram (2024). łA Dopamine Mechanism for Reward Maximizationž. In:
Proceedings of the National Academy of Sciences 121.20, e2316658121. (Link).
Schuman, Catherine, J. Parker Mitchell, Robert M. Patton, Thomas E. Potok, and James S.
Plank (2020). łEvolutionary Optimization for Neuromorphic Systemsž. In: NICE’20:
Proceedings of the 2020 Annual Neuro-Inspired Computational Elements Workshop,
2:1ś2:9.
(Link).
Schuman, Catherine, Robert M. Patton, Shruti Kulkarni, Maryam Parsa, Christopher Stahl,
N. Quentin Haas, J. Parker Mitchell, Shay Snyder, Amelie Nagle, Alexandra Shanafield,
and Thomas E. Potok (2022). łEvolutionary vs. Imitation Learning for Neuromorphic
Control at the Edgež. In: Neuromorphic Computing and Engineering 2, p. 014002.
(Link).
Schuman, Catherine, Thomas E. Potok, Robert M. Patton, J. Douglas Birdwell, Mark E.
Dean, Garrett S. Rose, and James S. Plank (2017). łA Survey of Neuromorphic
Computing and Neural Networks in Hardwarež. In: arXiv:1705.06963. (Link).
Secretan, Jimmy, Nicholas Beato, David B. D’Ambrosio, Adelein Rodriguez, Adam
Campbell, J. T. Folsom-Kovarik, and Kenneth O. Stanley (2011). łPicbreeder: A Case
Study in Collaborative Evolutionary Exploration of Design Spacež. In: Evolutionary
Computation 19, pp. 345ś371. (Link).
Sehnke, Frank, Christian Osendorfer, Thomas Rückstieß, Alex Graves, Jan Peters, and
Jürgen Schmidhuber (2010). łParameter-exploring Policy Gradientsž. In: Neural
Networks 23.4, pp. 551ś559. (Link).
Shahrzad, Hormoz, Babak Hodjat, and Risto Miikkulainen (2024). łEVOTER: Evolution of
Transparent Explainable Rule-setsž. In: ACM Transactions on Evolutionary Learning
and Optimization. Vol 5, Issue 2, Article 11, pp. 1ś30. (Link).
Shami, Tareq M., Ayman A. El-Saleh, Mohammed Alswaitti, Qasem Al-Tashi, Mhd
A. Summakieh, and Seyedali Mirjalili (2022). łParticle Swarm Optimization: A
Comprehensive Surveyž. In: IEEE Access 10, pp. 10031ś10061. (Link).
Sharma, Shubham, Jette Henderson, and Joydeep Ghosh (2020). łCERTIFAI: A Common
Framework to Provide Explanations and Analyse the Fairness and Robustness of
Black-Box Modelsž. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and
Society. New York, NY, USA: Association for Computing Machinery, pp. 166ś172.
(Link).
Shayani, Hooman, Peter J. Bentley, and Andy Tyrrell (2008). łAn FPGA-based Model
suitable for Evolution and Development of Spiking Neural Networksž. In: Proceedings
of the European Symposium on Artificial Neural Networks, pp. 197ś202. (Link).
Shim, Yoonsik, Sanghyun Kim, and Chiwook Kim (2004). łEvolving Flying Creatures
with Path-following Behaviorž. In: ALife IX: Proceedings of the 9th International
Conference on the Simulation and Synthesis of Living Systems, pp. 125ś132. (Link).
Silva, Filipe, Paulo Urbano, Luis C. Correia, and Anders L. Christensen (2015). “odNEAT: An Algorithm for Decentralised Online Evolution of Robotic Controllers”. In: Evolutionary Computation 23.3, pp. 421–449. (Link).
Silver, David, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis (2018). “A General Reinforcement Learning Algorithm That Masters Chess, Shogi, and Go through Self-play”. In: Science 362, pp. 1140–1144. (Link).
Simione, Luca and Stefano Nolfi (2020). łLong-Term Progress and Behavior Complexifi-
cation in Competitive Coevolutionž. In: Artificial Life 26, pp. 1ś22. (Link).
Simon, Herbert A. (1969). The Sciences of the Artificial. Cambridge, MA: MIT Press.
(Link).
Simon, Joel (2018). Artbreeder.
https://www.artbreeder.com/. Retrieved 8/31/2025.
Simonyan, Karen and Andrew Zisserman (2015). łVery Deep Convolutional Networks
for Large-Scale Image Recognitionž. In: Proceedings of the Third International
Conference on Learning Representations. (Link).
Sims, Karl (1991). łArtificial Evolution for Computer Graphicsž. In: Proceedings of the
Annual Conference on Computer Graphics and Interactive Techniques, pp. 319ś328.
(Link).
Ð
(1994). łEvolving 3D Morphology and Behavior by Competitionž. In: Artificial Life
IV: Proceedings of the Fourth International Workshop on the Synthesis and Simulation
of Living Systems. Ed. by Rodney A. Brooks and Pattie Maes. Cambridge, MA: MIT
Press, pp. 28ś39. (Link).
Singleton, Jenny L. and Elissa L. Newport (2004). łWhen Learners Surpass Their Models:
The Acquisition of American Sign Language from Inconsistent Inputž. In: Cognitive
Psychology 49, pp. 370ś407. (Link).
Sinha, Ankur, Pekka Malo, Peng Xu, and Kalyanmoy Deb (2014). łA Bilevel Optimization
Approach to Automated Parameter Tuningž. In: GECCO’14: Proceedings of the 2014
Annual Conference on Genetic and Evolutionary Computation, pp. 847ś854. (Link).
Sipper, Moshe, Jason H. Moore, and Ryan J. Urbanowicz (2019). łSolution and Fitness
Evolution (SAFE): Coevolving Solutions and Their Objective Functionsž. In: Genetic
Programming: 22nd European Conference. Ed. by Lukas Sekanina, Ting Hu, Nuno
Lourenço, Hendrik Richter, and Pablo García-Sánchez. New York: Springer, pp. 146ś
161. (Link).
Sit, Yiu Fai and Risto Miikkulainen (2005). łLearning Basic Navigation for Personal
Satellite Assistant Using Neuroevolutionž. In: GECCO’05: Proceedings of the 7th
Annual Conference on Genetic and Evolutionary Computation, pp. 1913ś1920. (Link).
Smith, Jennifer E., Kenna D. S. Lehmann, Tracy M. Montgomery, Eli D. Strauss, and
Kay E. Holekamp (2017). łInsights from Long-term Field Studies of Mammalian
Carnivoresž. In: Journal of Mammalogy 98, pp. 631ś641. (Link).
So, David, Quoc V. Le, and Chen Liang (2019). łThe Evolved Transformerž. In: Pro-
ceedings of the 36th International Conference on Machine Learning, pp. 5877ś5886.
(Link).
Sohl-Dickstein, Jascha, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli (2015).
łDeep Unsupervised Learning using Nonequilibrium Thermodynamicsž. In: Proceed-
ings of the 32nd International Conference on Machine Learning, pp. 2256ś2265.
(Link).
Solé, Ricard (2016). łThe major synthetic evolutionary transitionsž. In: Philosophical
Transactions of the Royal Society B: Biological Sciences 371.1701, p. 20160175.
(Link).
Solomon, Matthew, Terence Soule, and Robert B. Heckendorn (2012). “A Comparison of Communication Strategies in Cooperative Learning”. In: GECCO’12: Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, pp. 153–160. (Link).
Soltoggio, Andrea, John A. Bullinaria, Claudio Mattiussi, Peter Dürr, and Dario Floreano
(2008). łEvolutionary Advantages of Neuromodulated Plasticity in Dynamic, Reward-
based Scenariosž. In: Artificial Life XI: Proceedings of the Eleventh International
Conference on the Simulation and Synthesis of Living Systems. Ed. by Seth Bullock,
Jason Noble, Richard Watson, and Mark A. Bedau. Cambridge, MA: MIT Press,
pp. 569ś576. (Link).
Soltoggio, Andrea, Peter Dürr, Claudio Mattiussi, and Dario Floreano (2007). łEvolv-
ing Neuromodulatory Topologies for Reinforcement Learning-like Problemsž. In:
Proceedings of the IEEE Congress on Evolutionary Computation, pp. 2471ś2478.
(Link).
Soltoggio, Andrea, Kenneth O. Stanley, and Sebastian Risi (2018). łBorn to Learn: The
Inspiration, Progress, and Future of Evolved Plastic Artificial Neural Networksž. In:
Neural Networks 108, pp. 48ś67. (Link).
Song, Sen, Kenneth D. Miller, and Larry F. Abbott (2000). łCompetitive Hebbian Learning
Through Spike-Timing-Dependent Synaptic Plasticityž. In: Nature Neuroscience 3,
pp. 919ś926. (Link).
Song, Xingyou, Wenbo Gao, Yuxiang Yang, Krzysztof Choromanski, Aldo Pacchiano,
and Yunhao Tang (2020). łES-MAML: Simple Hessian-free meta learningž. In:
Proceedings of the Eighth International Conference on Learning Representations,
pp. 9392ś9410. (Link).
Song, Xingyou, Yuxiang Yang, Krzysztof Choromanski, Ken Caluwaerts, Wenbo Gao,
Chelsea Finn, and Jie Tan (2020). łRapidly Adaptable Legged Robots via Evolution-
ary Meta-learningž. In: Proceedings of the IEEE/RSJ International Conference on
Intelligent Robots and Systems, pp. 3769ś3776. (Link).
Spector, Lee and Sean Luke (1996). łCultural Transmission of Information in Genetic
Programmingž. In: Genetic Programming 1996: Proceedings of the First Annual
Conference. Ed. by John R Koza, David E Goldberg, David B. Fogel, and L. R. Riolo.
Cambridge, MA: MIT Press, pp. 209ś214. (Link).
Sporns, Olaf and Richard F. Betzel (2016). łModular Brain Networksž. In: Annual Reviews
of Psychology 67, pp. 613ś640. (Link).
Srivastava, Nitish, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov (2014). “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”. In: Journal of Machine Learning Research 15.56, pp. 1929–1958. (Link).
Srivastava, Rupesh K., Klaus Greff, and Jürgen Schmidhuber (2015). łHighway Networksž.
In: Deep Learning Workshop, 32nd International Conference on Machine Learning.
(Link).
Stanley, Kenneth O. (2003). łEfficient Evolution of Neural Networks Through Complexifi-
cationž. PhD thesis. Austin, TX: Department of Computer Sciences, The University
of Texas at Austin. (Link).
Ð
(2007). łCompositional Pattern Producing Networks: A Novel Abstraction of De-
velopmentž. In: Genetic Programming and Evolvable Machines 8, pp. 131ś162.
(Link).
Stanley, Kenneth O., Bobby D. Bryant, and Risto Miikkulainen (2003). “Evolving Adaptive Neural Networks with and Without Adaptive Synapses”. In: Proceedings of the IEEE Congress on Evolutionary Computation, pp. 2557–2564. (Link).
Ð
(2005). łReal-Time Neuroevolution in the NERO Video Gamež. In: IEEE Transactions
on Evolutionary Computation 9, pp. 653ś668. (Link).
Stanley, Kenneth O., Jeff Clune, Joel Lehman, and Risto Miikkulainen (2019). łDesigning
Neural Networks through Evolutionary Algorithmsž. In: Nature Machine Intelligence
1, pp. 24ś35. (Link).
Stanley, Kenneth O., David B. D’Ambrosio, and Jason Gauci (2009). łA Hypercube-based
Encoding for Evolving Large-scale Neural Networksž. In: Artificial life 15, pp. 185ś
212. (Link).
Stanley, Kenneth O. and Joel Lehman (2015). Why Greatness Cannot Be Planned: The
Myth of the Objective. New York: Springer. (Link).
Stanley, Kenneth O. and Risto Miikkulainen (2002). łEvolving Neural Networks Through
Augmenting Topologiesž. In: Evolutionary Computation 10, pp. 99ś127. (Link).
Ð
(2003). łA Taxonomy for Artificial Embryogenyž. In: Artificial Life 9, pp. 93ś130.
(Link).
Ð
(2004). łCompetitive Coevolution through Evolutionary Complexificationž. In: Jour-
nal of Artificial Intelligence Research 21, pp. 63ś100. (Link).
Steels, Luc L. (2016). łAgent-based Models for the Emergence and Evolution of Grammarž.
In: Philosophical Transactions of the Royal Society B: Biological Sciences 371,
p. 20150447. (Link).
Steuer, Inge and Pierre A. Guertin (2019). łCentral Pattern Generators in the Brainstem
and Spinal Cord: An Overview of Basic Principles, Similarities and Differencesž. In:
Reviews in the Neurosciences 30, pp. 107ś164. (Link).
Storn, Rainer M. and Kenneth V. Price (1997). łDifferential Evolution ś A Simple and
Efficient Heuristic for Global Optimization over Continuous Spacesž. In: Journal of
Global Optimization 11, pp. 341ś359. (Link).
Strassen, Volker (1969). łGaussian Elimination is Not Optimalž. In: Numerische Mathe-
matik 13.4, pp. 354ś356. (Link).
Sudhakaran, Shyam, Miguel González-Duque, Matthias Freiberger, Claire Glanois, Elias
Najarro, and Sebastian Risi (2023). łMarioGPT: Open-ended Text2Level Generation
through Large Language Modelsž. In: Advances in Neural Information Processing
Systems 36, pp. 54213ś54227. (Link).
Sudhakaran, Shyam, Djordje Grbic, Siyan Li, Adam Katona, Elias Najarro, Claire Glanois,
and Sebastian Risi (2021). łGrowing 3d Artefacts and Functional Machines with
Neural Cellular Automataž. In: ALIFE 2021: The 2021 Conference on Artificial Life,
pp. 108ś116. (Link).
Sun, Yanan, Bing Xue, Mengjie Zhang, and Gary G. Yen (2020). łEvolving Deep
Convolutional Neural Networks for Image Classificationž. In: IEEE Transactions on
Evolutionary Computation 24, pp. 394ś407. (Link).
Szathmáry, Eörs (2015). łToward Major Evolutionary Transitions Theory 2.0ž. In: Pro-
ceedings of the National Academy of Sciences 112.33, pp. 10104ś10111. (Link).
Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna
(2016). łRethinking the Inception Architecture for Computer Visionž. In: Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818ś2826.
(Link).
Takagi, Hideyuki (2001). łInteractive Evolutionary Computation: Fusion of the Capabilities
of EC Optimization and Human Evaluationž. In: Proceedings of the IEEE 89.9,
pp. 1275ś1296. (Link).
Tan, James (2017). Investing in ICOS: Results may vary. https://akaidotto.blogspot.com/.
Retrieved 8/31/2017.
Tan, Mingxing and Quoc V. Le (2019). łEfficientNet: Rethinking Model Scaling for
Convolutional Neural Networksž. In: Proceedings of the 36th International Conference
on Machine Learning, pp. 6105ś6114. (Link).
Ð
(2021). łEfficientNetV2: Smaller Models and Faster Trainingž. In: Proceedings of the
38th International Conference on Machine Learning, pp. 10096ś10106. (Link).
Tang, Yujin, Duong Nguyen, and David Ha (2020). łNeuroevolution of Self-Interpretable
Agentsž. In: GECCO’20: Proceedings of the 2020 Genetic and Evolutionary Compu-
tation Conference, pp. 414ś424. (Link).
Tang, Yujin, Jie Tan, and Tatsuya Harada (2020). łLearning Agile Locomotion via
Adversarial Trainingž. In: Proceedings of the IEEE/RSJ International Conference On
Intelligent Robots and Systems, pp. 6098ś6105. (Link).
Tang, Yujin, Yingtao Tian, and David Ha (2022). łEvojax: Hardware-accelerated Neuroevo-
lutionž. In: GECCO’22: Proceedings of the Genetic and Evolutionary Computation
Conference Companion, pp. 308ś311. (Link).
Tansey, Wesley, Eliana Feasley, and Risto Miikkulainen (2012). łAccelerating Evolution
via Egalitarian Social Learningž. In: GECCO’12: Proceedings of the 14th Annual
Conference on Genetic and Evolutionary Computation, pp. 919ś926. (Link).
Taylor, Ross, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn,
Elvis Saravia, Andrew Poulton, Viktor Kerkez, and Robert Stojnic (2022). łGalactica:
A Large Language Model for Sciencež. In: arXiv:2211.09085. (Link).
Templier, Paul, Emmanuel Rachelson, and Dennis G Wilson (2021). łA geometric
encoding for neural network evolutionž. In: GECCO’21: Proceedings of the Genetic
and Evolutionary Computation Conference, pp. 919ś927. (Link).
Teyke, Thomas, Klaudiusz R. Weiss, and Irving Kupfermann (1990). “An Identified Neuron (CPR) Evokes Neuronal Responses Reflecting Food arousal in Aplysia.” In: Science 247, pp. 85–87. (Link).
Todd, Graham, Sam Earle, Muhammad U. Nasir, Michael C. Green, and Julian Togelius
(2023). łLevel Generation through Large Language Modelsž. In: Proceedings of the
18th International Conference on the Foundations of Digital Games, pp. 1ś8. (Link).
Togelius, Julian, Georgios N. Yannakakis, Kenneth O. Stanley, and Cameron Browne
(2011). łSearch-based procedural content generation: A taxonomy and surveyž. In:
IEEE Transactions on Computational Intelligence and AI in Games 3, pp. 172ś186.
(Link).
Tonelli, Paul and Jean-Baptiste Mouret (2013). łOn the Relationships between Generative
Encodings, Regularity, and Learning Abilities when Evolving Plastic Artificial Neural
Networksž. In: PloS one 8.11, e79138. (Link).
Toutouh, Jamal, Erik Hemberg, and Una-May O’Reilly (2019). “Spatial evolutionary generative adversarial networks”. In: GECCO’19: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 472–480. (Link).
Touvron, Hugo et al. (2023). łLlama 2: Open Foundation and Fine-tuned Chat Modelsž.
In: arXiv:2307.09288. (Link).
Towell, Geoffrey G. and Jude W. Shavlik (1994). łKnowledge-Based Artificial Neural
Networksž. In: Artificial Intelligence 70, pp. 119ś165. (Link).
Trianni, Vittorio, Elio Tuci, Christos Ampatzis, and Marco Dorigo (2014). łEvolutionary
Swarm Robotics: A Theoretical and Methodological Itinerary from Individual Neuro-
Controllers to Collective Behaviorsž. In: Horizons of Evolutionary Robotics. Ed.
by Patricia A. Vargas, Ezequiel A. Di Paolo, Inman Harvey, and Phil Husbands.
Cambridge, MA: MIT Press, pp. 153ś178. (Link).
Turing, Alan (1952). łThe Chemical Basis of Morphogenesisž. In: Philosophical Transac-
tions of the Royal Society B 237, pp. 37ś72. (Link).
Turney, Peter D. (2020). łSymbiosis Promotes Fitness Improvements in the Game of Lifež.
In: Artificial Life 26, pp. 338ś365. (Link).
Tutum, Cem C., Suhaib Abdulquddos, and Risto Miikkulainen (2021). łGeneralization of
Agent Behavior through Explicit Representation of Contextž. In: Proceedings of the
IEEE Conference on Games, pp. 95ś101. (Link).
Tyulmankov, Danil, Guangyu R. Yang, and Larry F. Abbott (2022). łMeta-learning
Synaptic Plasticity and Memory Addressing for Continual Familiarity Detectionž. In:
Neuron 110, 544ś557.e8. (Link).
Ulyanov, Dmitry, Andrea Vedaldi, and Victor Lempitsky (2018). łDeep Image Priorž.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp. 9446ś9454. (Link).
Valsalam, Vinod, James A. Bednar, and Risto Miikkulainen (2005). łConstructing Good
Learners Using Evolved Pattern Generatorsž. In: GECCO’05: Proceedings of the 7th
Annual Conference on Genetic and Evolutionary Computation, pp. 11ś18. (Link).
Ð
(2007). łDeveloping Complex Systems Using Evolved Pattern Generatorsž. In: IEEE
Transactions on Evolutionary Computation 11, pp. 181ś198. (Link).
Valsalam, Vinod, Jonathan Hiller, Robert MacCurdy, Hod Lipson, and Risto Miikkulainen
(2013). łConstructing Controllers for Physical Multilegged Robots using the ENSO
Neuroevolution Approachž. In: Evolutionary Intelligence 14, pp. 303ś331. (Link).
Valsalam, Vinod and Risto Miikkulainen (2011). łEvolving Symmetry for Modular System
Designž. In: IEEE Transactions on Evolutionary Computation 15, pp. 368ś386. (Link).
van Eck Conradie, Alex, Risto Miikkulainen, and Christiaan Aldrich (2002a). łAdaptive
Control Utilising Neural Swarmingž. In: GECCO’02: Proceedings of the 4th Annual
Conference on Genetic and Evolutionary Computation, pp. 60ś67. (Link).
Ð
(2002b). “Intelligent Process Control Utilizing Symbiotic Memetic Neuro-Evolution”. In: Proceedings of the IEEE Congress on Evolutionary Computation, pp. 623–628. (Link).
Vargas, Patricia A., Ezequiel Di Paolo, Inman Harvey, and Philip Husbands, eds. (2014).
The Horizons of Evolutionary Robotics. Cambridge, MA: MIT Press. (Link).
Vassiliades, Vassilis, Konstantinos Chatzilygeroudis, and Jean-Baptiste Mouret (2017).
łUsing Centroidal Voronoi Tessellations to Scale Up the Multidimensional Archive of
Phenotypic Elites Algorithmž. In: IEEE Transactions on Evolutionary Computation
22.4, pp. 623ś630. (Link).
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N.
Gomez, Lukasz Kaiser, and Illia Polosukhin (2017). łAttention is All You Needž. In:
Advances in Neural Information Processing Systems 30, pp. 5999ś6009. (Link).
Venkadesh, Siva, Alexander O. Komendantov, Stanislav Listopad, Eric O. Scott, Kenneth
A. De Jong, Jeffrey L. Krichmar, and Giorgio A. Ascoli (2018). łEvolving Simple
Models of Diverse Intrinsic Dynamics in Hippocampal Neuron Typesž. In: Frontiers
of Neuroinformatics 12. Article 8. (Link).
Venkatramanan, Srinivasan, Bryan Lewis, Jiangzhuo Chen, Dave Higdon, Anil Vullikanti,
and Madhav Marathe (2018). łUsing Data-driven Agent-based Models for Forecasting
Emerging Infectious Diseasesž. In: Epidemics 22, pp. 43ś49. (Link).
Verbancsics, Phillip and Kenneth O. Stanley (2011). łConstraining Connectivity to
Encourage Modularity in HyperNEATž. In: GECCO’11: Proceedings of the 13th
Annual Conference on Genetic and Evolutionary Computation, pp. 1483ś1490. (Link).
Verel, Sébastien, Gabriela Ochoa, and Marco Tomassini (2010). łLocal optima networks of
NK landscapes with neutralityž. In: IEEE Transactions on Evolutionary Computation
15, pp. 783ś797. (Link).
Versace, Elisabetta, Antone Martinho-Truswell, Alex Kacelnik, and Giorgio Vallortigara
(2018). łPriors in Animal and Artificial Intelligence: Where Does Learning Begin?ž
In: Trends in cognitive sciences 22.11, pp. 963ś965. (Link).
Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan (2015). łShow and
tell: A Neural Image Caption Generatorž. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pp. 3156ś3164. (Link).
Voelkle, Manuel C., Natalie C. Ebner, Ulman Lindenberger, and Michaela Riediger (2012).
łLet Me Guess How Old You Are: Effects of Age, Gender, and Facial Expression on
Perceptions of Agež. In: Psychology and Aging 27.2, p. 265. (Link).
Volz, Vanessa, Jacob Schrum, Jialin Liu, Simon M. Lucas, Adam Smith, and Sebastian
Risi (2018). łEvolving Mario Levels in the Latent Space of a Deep Convolutional
Generative Adversarial Networkž. In: GECCO’18: Proceedings of the Genetic and
Evolutionary Computation Conference, pp. 221ś228. (Link).
Wagner, Andreas (2005). Robustness and Evolvability in Living Systems. Princeton, New
Jersey: Princeton University Press. (Link).
Wagner, Kyle, James A. Reggia, Juan Uriagereka, and Gerald S. Wilkinson (2003).
łProgress in the Simulation of Emergent Communication and Languagež. In: Adaptive
Behavior 11, pp. 37ś69. (Link).
Wang, Bin, Yanan Sun, Bing Xue, and Mengjie Zhang (2018). łA Hybrid Differential
Evolution Approach to Designing Deep Convolutional Neural Networks for Image
Classificationž. In: Advances in Artificial Intelligence. Ed. by Tanja Mitrovic, Bing
Xue, and Xiaodong Li. New York: Springer, pp. 237ś250. (Link).
Wang, Chao, Jiaxuan Zhao, Licheng Jiao, Lingling Li, Fang Liu, and Shuyuan Yang
(2025). łWhen Large Language Models Meet Evolutionary Algorithms: Potential
Enhancements and Challengesž. In: Research 8, p. 0646. (Link).
Wang, Jane X., Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z. Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, and Matt Botvinick (2016). “Learning to Reinforcement Learn”. In: arXiv:1611.05763. (Link).
Wang, Lishuang, Mengfei Zhao, Enyu Liu, Kebin Sun, and Ran Cheng (2024). łTensorized
Neuroevolution of Augmenting Topologies for GPU Accelerationž. In: GECCO’24:
Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1156ś1164.
(Link).
Wang, Rui, Joel Lehman, Jeff Clune, and Kenneth O. Stanley (2019). łPOET: Open-
ended Coevolution of Environments and Their Optimized Solutionsž. In: GECCO’19:
Proceedings of the Genetic and Evolutionary Computation Conference, pp. 142ś151.
(Link).
Wang, Rui, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, and Kenneth O.
Stanley (2020). łEnhanced POET: Open-ended Reinforcement Learning through
Unbounded Invention of Learning Challenges and Their Solutionsž. In: Proceedings
of the 37th International Conference on Machine Learning, pp. 9940ś9951.
(Link).
Wang, Yong (2013). łGene Regulatory Networksž. In: Encyclopedia of Systems Biology.
Ed. by Werner Dubitzky, Olaf Wolkenhauer, Kwang-Hyun Cho, and Hiroki Yokota.
New York: Springer, pp. 801ś805. (Link).
Warner, Jamieson, Ashwin Devaraj, and Risto Miikkulainen (2024). łUsing Context
to Adapt to Sensor Driftž. In: Proceedings of the International Conference on
Development and Learning, pp. 184ś190. (Link).
Watson, Richard A., Niclas Palmius, Rob Mills, Simon T. Powers, and Alexandra Penn
(2011). łCan Selfish Symbioses Effect Higher-level Selection?ž In: Advances in
Artificial Life: Darwin Meets von Neumann, 10th European Conference. Ed. by George
Kampis, István Karsai, and Eörs Szathmáry. New York: Springer, pp. 27ś36. (Link).
Watson, Richard A. and Jordan B. Pollack (2003). łA Computational Model of Symbiotic
Composition in Evolutionary Transitionsž. In: Biosystems 69, pp. 187ś209. (Link).
Werner, Gregory M. and Michael G. Dyer (1992). łEvolution of Communication in
Artificial Organismsž. In: Artificial Life II: Proceedings of the Workshop on Artificial
Life. Ed. by Christopher G. Langton, Charles Taylor, J. Doyne Farmer, and Steen
Rasmussen. Reading, MA: Addison-Wesley, pp. 659ś687. (Link).
West-Eberhard, Mary-Jane (2003). Developmental Plasticity and Evolution. Oxford, UK:
Oxford University Press. (Link).
White, Colin, Mahmoud Safari, Rhea Sukthanker, Binxin Ru, Thomas Elsken, Arber Zela,
Debadeepta Dey, and Frank Hutter (2023). łNeural Architecture Search: Insights from
1000 Papersž. In: arXiv:2301.08727. (Link).
Whiteson, Shimon (2006). łEvolutionary Function Approximation for Reinforcement
Learningž. In: Journal of Machine Learning Research 7, pp. 877ś917. (Link).
Whiteson, Shimon, Peter Stone, Kenneth O. Stanley, Risto Miikkulainen, and Nate
Kohl (2005). łAutomatic Feature Selection in Neuroevolutionž. In: GECCO’05:
Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation,
pp. 1225ś1232. (Link).
Whitley, Darrell, Stephen Dominic, and Rajarshi Das (1991). łGenetic Reinforcement
Learning with Multilayer Neural Networksž. In: Proceedings of the Fourth Interna-
tional Conference on Genetic Algorithms, pp. 562ś569.
Whitley, Darrell, Stephen Dominic, Rajarshi Das, and Charles W. Anderson (1993).
łGenetic Reinforcement Learning for Neurocontrol Problemsž. In: Machine Learning
13, pp. 259ś284. (Link).
Whitley, Darrell and Thomas Hanson (1989). łOptimizing Neural Networks Using Faster,
More Accurate Genetic Searchž. In: Proceedings of the Third International Conference
on Genetic Algorithms, pp. 391ś396. (Link).
Whitley, Darrell, Keith E. Mathias, and Patrick A. Fitzhorn (1991). łDelta-Coding: An
Iterative Search Strategy for Genetic Algorithmsž. In: Proceedings of the Fourth
International Conference on Genetic Algorithms, pp. 77ś84. (Link).
Whitley, Derek (2024a). “Neuroevolving Electronic Dynamical Networks”. In: arXiv:2404.04587. (Link).
Ð
(2024b). “The Intrinsic Evolution of Reconfigurable Electronic Circuitry”. PhD thesis. The School of Informatics, Computing, Engineering, and Cognitive Science Program, Indiana University. (Link).
Widrow, Bernard, Youngsik Kim, Dookun Park, and Jose Krause Perin (2023). “Nature’s Learning Rule: The Hebbian-LMS Algorithm”. In: Artificial Intelligence in the Age of Neural Networks and Brain Computing (second edition). Ed. by Robert Kozma, Cesare Alippi, Yoonsuck Choe, and Francesco C. Morabito. Amsterdam: Elsevier, pp. 11–40. (Link).
Wiegand, R. Paul (2003). “An Analysis of Cooperative Coevolutionary Algorithms”. PhD thesis. George Mason University. (Link).
Williams, Ronald J. (1992). łSimple Statistical Gradient-Following Algorithms for
Connectionist Reinforcement Learningž. In: Machine Learning 8, pp. 229ś256.
(Link).
Wissner-Gross, Alexander D. and Cameron E. Freer (2013). łCausal Entropic Forcesž. In:
Physical Review Letters 110 (16), p. 168702. (Link).
Wolpert, Lewis, Cheryll Tickle, and Alfonso Martinez Arias (2015). Principles of
Development. Oxford, UK: Oxford University Press. (Link).
Woolley, Brian G. and Kenneth O. Stanley (2011). łOn the Deleterious Effects of A Priori
Objectives on Evolution and Representationž. In: GECCO’11: Proceedings of the 13th
Annual Conference on Genetic and Evolutionary Computation, pp. 957ś964. (Link).
Wu, Xingyu, Sheng-hao Wu, Jibin Wu, Liang Feng, and Kay C. Tan (2024). łEvolutionary
Computation in the Era of Large Language Model: Survey and Roadmapž. In:
arXiv:2401.10034. (Link).
Wulff, Niels H. and John A. Hertz (1992). łLearning Cellular Automaton Dynamics
with Neural Networksž. In: Advances in Neural Information Processing Systems 5,
pp. 631ś638. (Link).
Wurman, Peter R., Samuel Barrett, Kenta Kawamoto, James MacGlashan, Kaushik
Subramanian, Thomas J. Walsh, Roberto Capobianco, Alisa Devlic, Franziska Eckert,
Florian Fuchs, Leilani Gilpin, Varun Kompella, Piyush Khandelwal, HaoChih Lin,
Patrick MacAlpine, Declan Oller, Craig Sherstan, Takuma Seno, Michael D. Thomure,
Houmehr Aghabozorgi, Leon Barrett, Rory Douglas, Dion Whitehead, Peter Dürr,
Peter Stone, Michael Spranger, and Hiroaki Kitano (2022). “Outracing Champion Gran Turismo Drivers with Deep Reinforcement Learning”. In: Nature 602, pp. 223–228. (Link).
XPRIZE (2023). Pandemic Response Challenge. https://www.xprize.org/challenge/pande
micresponse. Retrieved 8/31/2025.
Yamauchi, Brian M. and Randall D. Beer (1993). łSequential Behavior and Learning in
Evolved Dynamical Neural Networksž. In: Adaptive Behavior 2, pp. 219ś246. (Link).
Yang, An et al. (2025). łQwen3 Technical Reportž. In: arXiv:2505.09388. (Link).
Yang, Tsun-Yi, Yi-Hsuan Huang, Yen-Yu Lin, Pi-Cheng Hsiu, and Yung-Yu Chuang (2018).
łSSR-Net: A Compact Soft Stagewise Regression Network for Age Estimationž. In:
Proceedings of the 27th International Joint Conference on Artificial Intelligence,
pp. 1078ś1084. (Link).
Yannakakis, Georgios N. and Julian Togelius (2018). Artificial Intelligence and Games.
2nd ed. New York: Springer. (Link).
Yao, Xin (1999). łEvolving Artificial Neural Networksž. In: Proceedings of the IEEE
87.9, pp. 1423ś1447. (Link).
Ying, Chris, Aaron Klein, Eric Christiansen, Esteban Real, Kevin Murphy, and Frank
Hutter (2019). łNAS-Bench-101: Towards Reproducible Neural Architecture Searchž.
In: Proceedings of the 36th International Conference on Machine Learning, pp. 7105ś
7114. (Link).
Yong, Chern H. and Risto Miikkulainen (2010). łCoevolution of Role-Based Cooperation
in Multi-Agent Systemsž. In: IEEE Transactions on Autonomous Mental Development
1, pp. 170ś186. (Link).
Yong, Chern H., Kenneth O. Stanley, Risto Miikkulainen, and Igor V. Karpov (2006).
łIncorporating Advice into Neuroevolution of Adaptive Agentsž. In: Proceedings of
the Second Artificial Intelligence and Interactive Digital Entertainment Conference,
pp. 98ś104. (Link).
Young, Daniel, Olivier Francon, Elliot Meyerson, Clemens Schwingshackl, Jacob Bieker,
Hugo Cunha, Babak Hodjat, and Risto Miikkulainen (2025). łDiscovering Effective
Policies for Land-Use Planning with Neuroevolutionž. In: Environmental Data Science
4, e30. (Link).
Zador, Anthony M. (2019). łA Critique of Pure Learning and What Artificial Neural
Networks Can Learn from Animal Brainsž. In: Nature Communications 10.1, p. 3770.
(Link).
Zela, Arber, Julien N. Siems, Lucas Zimmer, Jovita Lukasik, Margret Keuper, and Frank
Hutter (2022). łSurrogate NAS Benchmarks: Going Beyond the Limited Search Spaces
of Tabular NAS Benchmarksž. In: Proceedings of the Tenth International Conference
on Learning Representations, pp. 7294ś7329. (Link).
Zhang, Aston, Zachary C. Lipton, Mu Li, and Alexander J. Smola (2023). Dive into Deep Learning. Cambridge, UK: Cambridge University Press. (Link).
Zhang, Jenny, Joel Lehman, Kenneth O. Stanley, and Jeff Clune (2024). łOMNI: Open-
Endedness via Models of Human Notions of Interestingnessž. In: Proceedings of the
Twelfth International Conference on Learning Representations, pp. 17745ś17791.
(Link).
Zhang, Qingfu and Hui Li (2007). łMOEA/D: A Multiobjective Evolutionary Algorithm
Based on Decompositionž. In: IEEE Transactions on Evolutionary Computation 11,
pp. 712ś731. (Link).
Zoph, Barret and Quoc V. Le (2017). łNeural Architecture Search with Reinforcement
Learningž. In: Proceedings of the Fifth International Conference on Learning Repre-
sentations. (Link).
Zoph, Barret, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le (2018). łLearning
Transferable Architectures for Scalable Image Recognitionž. In: Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8697ś8710.
(Link).
Zuidema, Willem and Paulien Hogeweg (2000). łSocial Patterns Guide Evolving Gram-
marsž. In: Proceedings of the Evolution of Language Conference, pp. 274ś279.
(Link).
Subject Index
Entries in bold indicate the most comprehensive explanations.
ACO, see Ant colony optimization
Acrobot task,
133, 358
Activation function, 9, 36, 87, 255
Activation function optimization, 288,
292, 295
Adversarial attack, 268, 287
Adversarial training, 187, 250, 268, 287
Agent-based modeling, 166, 403
AlphaEvolve method, 358
AlphaZero system, 7, 187
Amazon mechanical turk, 224
AmoebaNet model, 262, 266
Ant colony optimization, 33, 279
AQuaSurF method, 292
Archive method, 112, 112, 116, 117, 121,
122, 127, 128, 187, 242, 243,
349, 359, 366
Artbreeder platform, 219
Artificial life, 90, 150, 195, 389
Artificial neural networks, 36
AttentionAgent method, 106, 182, 372
AutoInit method, 275
AutoML process, 267, 291
AutoML-zero system, 291
BabyAI environment, 243
Backpropagation, 3, 37, 155, 204, 254,
262, 288, 327, 382, see also
Stochastic gradient descent, gra-
dient descent
Backward pass, 38
Baikal loss, 286
Baldwin effect, 81, 133, 311, 333
BBOB, see Black-box optimization bench-
mark
BC, see Behavior characterization
Behavior characterization, 114, 121, 126,
198, 350
Behavior switching, 150, 197, 313, 380
Behavioral diversity, 113, 115, 121, 231,
331
Behavioral domination, 118
Behavioral examples, 213
Behavioral strategy, 139, 184, 395
Bias and variance, 385
BIG-bench tasks, 337
Bilevel neuroevolution, 143, 282, 311
BioMorphs system, 218
Biophysical model, 379
Bipedal walker task, 53, 69, 118, 230,
238, 245, 278
Black-box optimization benchmark, 358
Blondie24 system, 187
Body-brain coevolution, 122, 149, 238,
332, 389, 390
Botprize competition, 392
Bullet train design task, 282
CA, see Cellular automata
Canalization mechanism, 77, 223
Car racing task, 7, 108, 138, 182, 290,
303, 370
Cartesian genetic programming, 32, 50,
389
Case study, 90, 162, 165, 197, 220, 295,
392, 394
Catastrophic forgetting, 235, 327, 387
Cell fate mechanism, 77
Cell-chemistry approach, 76, 77, 81
Cellular automata, 192, see also Neural
cellular automata
Cellular encoding, 79, 201
Central pattern generator, 300, 378
CGP, see Cartesian genetic programming
Changing environments, 136, 229, 232
Chase-and-escape task, 250, see also
Predator-prey task
CIFAR-10 benchmark, 102, 261, 274,
288
CIFAR-100 benchmark, 274, 294
Circuit design task, 136, 281
Classifier systems, 178
CMA-ES, see Covariance matrix adapta-
tion evolution strategy
CNN, see Convolutional neural network
CoDeepNEAT method, 131, 135, 180,
236, 264, 275
Coevolution mechanism, 156, 177, 186,
233, 389
Command neuron, 376
Competing conventions, 57, 114, 132,
179, 274
Competitive coevolution, 177, 186, 187,
190, 237, 249
Competitive learning, 387
Complexification mechanism, 59, 77, 188,
262
Compositional pattern producing network,
86, 195, 219, 221, 248, 324
Compositionality, 86, 403
Conciseness task, 346
Confidence-based ensembling, 131
Connection targeting mechanism, 77
Context+skill method, 145
Continual learning, 146, 327, 332
Continuous time recurrent neural net-
works, 71, 378
Convolutional layer, 43
Convolutional neural network, 42, 260,
274
Cooperative behavior, 113, 389, 395
Cooperative coevolution, 113, 177, 192,
238, 304
Copy task, 329
CoSyNE method, 180, 181, 264
Countdown task, 346
Covariance matrix adaptation evolution
strategy, 25, 286, 348, 363, 369
COVID19 interventions, see Non-pharma-
ceutical interventions
CPG, see Central pattern generator
CPPN, see Compositional pattern produc-
ing network
Crafter environment, 243
Credit assignment problem, 96, 181, 308,
369
Crocuta crocuta, see Hyena behaviors
Cross-attention mechanism, 46, 104, 365
Cross-entropy loss, 286, 297
Crossover operator
Shortest edit path crossover,
275
Single-point crossover, 22
Two-point crossover, 22
Uniform crossover, 22
CTRNNs, see Continuous time recurrent
neural networks
Culling mechanism, 132
Currency trading task, 290
Curricular learning, see Shaping mechanism
Darwinian evolution, 81, 311
Data augmentation, 9, 267, 290, 290, 296
DE, see Differential evolution
Decision strategies, 157
Deep evolutionary reinforcement learn-
ing, 332
Deep innovation protection, 181
Deep learning, 1, 9, 42, 44, 47, 66, 69,
73, 81, 82, 229, 254, 258, 262,
268, 272, 274, 279, 286, 288,
294, 296, 326, 367, 369
Deep learning models
AlexNet, 258
All-CNN, 288, 292
CoAtNet, 259, 294
DenseNet, 259, 274, 296
EfficientNet, 259, 296
Highway networks, 258
Inception networks, 259, 266
MobileNet, 259, 261
MobileViT, 292
ResNet, 67, 259, 268, 274, 288, 292,
294–296
Show&tell network, 180, 265
VGG, 258
Deep neuroevolution, 68, 279, 304
Deep Q-Network, 7, 69, 95, 160, 306
Delta-coding method, 112
DERL, see Deep evolutionary reinforce-
ment learning
Developmental process, 75, 192, 201,
232, 384
Differentiable pattern producing networks,
102
Differential evolution, 33, 337
Diffusion model, 238, 267, 285, 335, 406
Stable diffusion model, 353
DIP, see Deep innovation protection
Direct encoding, 16, 51, 74, 234
Discrete cosine transformations, 102
Discrete prompt, 337
Distillation mechanisms, 172, 175
Domain randomization method, 320
DoomTakeCover environment, 108, 182,
372
DPPNs, see Differentiable pattern pro-
ducing networks
DQN, see Deep Q-Network
Dropout method, 286, 291
Dual task, 101
EA, see Evolutionary algorithm
EANT, see Evolutionary Acquisition of
Neural Topologies method
EBPT, see Population-based training
EC, see Evolutionary computation
EDA, see Estimation of distribution algo-
rithm
Egalitarian social learning method, 134
Elitism mechanism, see Replacement mech-
anism
ELM, see Evolution through large models
Embodied intelligence, 332, 389
Empowerment measure, 115
Encapsulated behavior, 390
Encoder-decoder architecture, 44
EndlessForms system, 219, see also Pic-
breeder game
Enforced subpopulations method, 131,
135, 140, 178, 180, 181, 184,
264
Ensembling mechanisms, 11, 82, 129,
161, 171, 185, 296, 359
ENSO method, see Evolution of net-
work symmetry and modularity
method
Entropy maximization, 115
Environment coevolution, 142, 244
EONS, see Evolutionary optimization of
neuromorphic systems method
Epigenetics, 81
ERL, see Evolutionary reinforcement learn-
ing
ES, see Evolution strategy
ES-MAML method, 314
ESP, see Evolutionary surrogate-assisted
prescription method, see Enforced
subpopulations
Estimation of distribution algorithm, 33
Eugenic neuroevolution, 135
EuSane, see Eugenic neuroevolution
EvoCNN method, 274
EvoJAX library, 70
EvoLLM method, 354
Evolution of cooperation, 156, 178, 237,
400
Evolution of network symmetry and mod-
ularity method, 143
Evolution strategy method, 23, 307, 345,
354, see also Covariance matrix
adaptation evolution strategy
(𝜇 + 𝜆) selection, 23
(𝜇, 𝜆) selection, 23
Natural, 28
OpenAI, 28
Simple, 24
Evolution through large models, 348
Evolutionary acquisition of neural topolo-
gies method, 146
Evolutionary algorithm, 3, 14, 49, 74,
119, 193, 234, 275, 308, 335
Evolutionary computation, 2, 8, 74, 111,
112, 268
Evolutionary model merging, 341
Evolutionary optimization of neuromor-
phic systems method, 301
Evolutionary origins of circuits and be-
havior, 375–379, 382, 394, 400
Evolutionary programming, 32, 50, 187
Evolutionary reinforcement learning, 308
Evolutionary robotics, 149, 316
Evolutionary surrogate-assisted prescrip-
tion method, 158, 163, 173
Evolvability, 77, 224, 230, 231
Evolvable representations, 231
Evolved pattern generators, 386
Evolved virtual creatures, see Virtual
creatures
Evolved weight initialization, 274
Evolving communication, 400
EvoPrompt method, 337
EvoSAX library, 70
EVOTER system, 175
Exploration, 16, 50, 116, 142, 214, 262,
281, 306, 335, 337
Expressive encoding, 234, 384
Extinction events, 230
Facilitating synapses, 377
Fast weights method, 102, 106
Feature selection, 290
Feedforward neural network, 36
Fine-tuning, 10, 95, 147, 288, 331, 335,
345, 350
Fisher information matrix, 292
Fitness evaluation mechanism, 19
Fitness function, 15, 19, 49, 53, 55, 101,
115, 116, 123, 139, 154, 187,
210, 233, 255, 261, 284, 308,
352, 387
Fitness score, see Fitness function
Fitness shaping, see Shaping mechanism
Fitness sharing, 18, 63, 112
Five-in-a-row game, see Gomoku game
Fixed-topology neuroevolution, 50, 88
FlappyBird game, 145, 161
FNN, see Fully connected neural network
Foraging, pursuit, and evasion task, 188
Forward pass, 38
FPGA hardware, 67, 71
Fractured
Domains, 136, 151
Representations, 68
Strategies, 151
French flag task, 193
Fully connected layer, 44, 92, 140, 274
Fully connected neural network, 42, 178
Galactic arms race game, 222
Game of life, 192
Game theory, 190, 403
GAN, see Generative adversarial network
Gated recurrent unit, 259
Gaussian process model, 297
Gene regulatory network, 76, 232
Generative adversarial network, 187, 287,
362
Generative AI, 3, 218, 335
Genetic algorithm, 21, 121, 183, 286,
313, 337, 377
Genetic diversity, 18, 111, 244
Genetic programming, 32, 80, 193, 262,
286, 291, 348
Genomic bottleneck hypothesis, 326
Genotype-to-phenotype mapping, 74, 132,
226, 228, 363
Goal switching, 245, 378
GOLEM system, 149
Gomoku game, 139
GP, see Genetic programming
Gradient descent, 5, 37, 205, 254, 368,
382, see also Stochastic gradi-
ent descent
Graduate student descent, 259
Graph edit distance measure, 275
Graph neural network, 201
GRN, see Gene regulatory network
Group relative policy optimization, 346
GRPO, see Group relative policy optimization
GRU, see Gated recurrent unit
Half-field soccer domain, 150
Hard maze task, see Maze navigation task
Hardware acceleration, 70, 259, 261, 281,
361, 369
Hate speech classification task, 265, 339
Hebbian learning, 83, 300, 316, 382, see
also Lifetime learning
Helicopter hovering task, 283
Heterochrony mechanism, 77
Hill climbing, 314, 358
Human computation markets, 224
Hyena behaviors, 1, 2, 143, 190, 394
HyperNCA method, 201
HyperNEAT method, 92, 151, 325
Adaptive ES-HyperNEAT, 326
Adaptive HyperNEAT, 324
ES-HyperNEAT, 98, 201
HyperNEAT-LEO, 381
Multiagent HyperNEAT method, 95,
151
Hypernetwork approach, 75, 85, 101, 205,
260
IEC, see Interactive evolutionary compu-
tation
ImageNet benchmark, 259, 261, 274
Imagenette benchmark, 294
Indirect encoding, 16, 32, 51, 73, 232,
279, 315, 331, 349
Info box
David Ha, 260
Risto Miikkulainen, 139
Sebastian Risi, 317
Yujin Tang, 344
Innovation protection, 60, 149, 181
Interactive evolutionary computation, 88,
208, 363
Izhikevich neuron, 299
JAX library, 70, see also Hardware accel-
eration
KANs, see Kolmogorov-Arnold networks
KBANN, see Knowledge-based artificial
neural networks
Khepera robot, 149, 178, 188
Knowledge-based artificial neural net-
works, 214
Kolmogorov-Arnold networks, 289
L-system, see Lindenmayer system
Lamarckian evolution, 81, 81, 134, 215,
311
Language evolution, 11, 398
Language model crossover, 350
Large language models, 104, 238, 335,
399
Claude, 335, 337
Deepseek, 335
Galactica, 352
Gemini, 335, 337
GPT, 241, 335, 337, 338, 345, 358,
365
Llama, 335, 342, 345, 346, 358
Mistral, 335, 342, 344
PaLM, 339, 358
Qwen, 346
Latent variable evolution, 362
Lateral inhibition, 203
Layer normalization, 46
Leaky-integrate-and-fire neuron, 299
Learning to learn, see Meta-learning
Legend of Zelda game, 197
Legion-II environment, 156
Level generation, 197, 361, see also Pro-
cedural content generation
LIF, see Leaky integrate-and-fire neuron
Lifelong NDP method, 204
Lifetime learning, 81, 316, 320, 322, 382,
385, see also Hebbian learning
Lindenmayer system, 75, 77
Linkage mechanism, 232
LLM fine-tuning, see Fine-tuning
LLMs, see Large language models
LMX, see Language model crossover
LNDP, see Lifelong NDP method
Locomotion task, 91, 95, 122, 123, 127,
138, 149, 195, 197, 205, 236,
300, 320, 330, 349, 377, 390
Ant robot, 203
Bipedal, see Bipedal walker task
HalfCheetah, 202, 203, 313
Quadruped, 74, 93, 143, 201, 250,
314, 318, 326
Loihi chip, 299
Long short-term memory, 40, 259, 262,
315, 320, 403
Loss function optimization, 286
Lottery ticket hypothesis, 326, 347
LSTM, see Long short-term memory
LunarLander task, 145, 202
Machine learning game, 20, 208, 222
MaestroGenesis system, 219
Major transitions in biology, 186, 235,
398
MAML, see Model agnostic meta-learning
MAML-Baldwin method, 313
MAP-Elites, see Multi-dimensional archive
of phenotypic elites
MarioGPT system, 365
Marker-based encoding method, 50
Markov Brains method, 50, 190
Massive open online course, 213
Max pooling method, 43
Maze navigation task, 101, 126, 142, 211,
315, 316
MEA, see Meta-evolutionary EA
Mean-squared-error loss, 286
Medical aesthetics domain, 295
Memory-augmented neural network, 327
Meta-evolutionary EA, 283
Meta-learning, 126, 258, 281, 285, 312,
331, 389
Minecraft environment, 204, 243, 389
Mixture of experts method, 129, 171
Mobbing behavior, 394
Model agnostic meta-learning, 312
Modularity, 11, 17, 68, 101, 143, 261,
332, 378, 379
MoE, see Mixture of experts method
Morphogenesis process, 73, 192
MountainCar task, 311
Ms. Pac-Man game, 152
MSuNAS method, 273
Multi-dimensional archive of phenotypic
elites, 122
CMA-MAP-annealing, 128
CMA-MAP-Elites, 127, 198
CVT-MAP-Elites, 127
MAP-Elites via a gradient arbores-
cence, 128
MAP-Elites with ES, 127
Multi-head attention, 46
Multiagent ESP method, 184, 190
Multimodal behavior, 101, 141, 157
Multiobjective NAS, 267
Multiobjective optimization, 30, 128, 152,
182, 229, 267, 273, 290
Multiplexer design task, 136, 281
Multitask learning, 152, 237, 269, 290,
393
Multitask NAS, 267
Mutation mechanism, 22, 50, 81, 143,
255, 288, 332, 337, 339, 349,
380
Mutation operator, see Mutation mecha-
nism
NAS, see Neural architecture search
NAS benchmarks, 260, 273
NASNet search space, 266
Nature vs. nurture debate, 279, 315, 384
NCA, see Neural cellular automata
NDP, see Neural developmental program
method
NEAT, see Neuroevolution of augment-
ing topologies
NEAT+Q method, 311
NERO game, 139, 208
Neural architecture search, 33, 180, 254,
285
Neural cellular automata, 193, 197, 201
Neural developmental program method,
201
Neural Turing machine, 327
NeuroAI system, 158, 162
Neuroannealing method, 135
Neuroevolution of augmenting topolo-
gies, 58, 148, 152, 180, 188,
193, 209, 219, 221, 230, 233,
254, 265, 311, 315, 370, 396,
see also Compositional pattern
producing network; HyperNEAT
method; NEAT+Q method; Real-
time NEAT
Backprop NEAT, 255
CA-NEAT, 193
CPPN-NEAT, 90
FS-NEAT, 290
FT-NEAT, 94
MM-NEAT, 130, 152
odNEAT, 146
SNAP-NEAT, 151
Neuroevolution vs. deep learning, 66
Neuroevolution-enabled collaboration, 218
Neuromodulation mechanism, 322, 382
Neuromorphic computing, 299, 301
Neutral mutations, 69, 128, 228, 279, 406
NEWS/D method, 129
Non-dominated sorting genetic algorithm
NSGA-II, 30, 122, 167, 182, 273,
380
NSGA-III, 32
Non-pharmaceutical interventions, 167
Nothello game, 233
Novelty metric, 117
Novelty search, 101, 116, 143, 223, 244,
331, 366
Novelty search with local competition
method, 121
NPIs, see Non-pharmaceutical interven-
tions
NS, see Novelty search
NSGA, see Non-dominated sorting ge-
netic algorithm
NSLC, see Novelty search with local com-
petition method
OMNI system, 241
Omniglot classification, 271
Omniverse Isaac Gym environment, 320
One-shot method, 274
Online neuroevolution, 146, 147, 209
Open-endedness, 228, 228, 241, 366
OpenAI Gym environment, 53, 160, 358
Out-of-distribution generalization, 146,
147, 320, 338
Pac-Man game, see Ms. Pac-Man game
Paired open-ended trailblazer, 244
PANGAEA system, 288
Parameter-based exploration, 239
Pareto front, 30, 129, 163, 268, 380
Particle swarm optimization, 33, 147, 348
PATA-EC novelty measure, 248
PBT, see Population-based training
PCG, see Procedural content generation
Petalz game, 220
PGPE, see Parameter-based exploration
Picbreeder game, 117, 218, 363
Plasticity rules, 33, 318, 389, see also
Hebbian learning
POET, see Paired open-ended trailblazer
Pole-balancing task
CartPole, 202, 204, 358
Double pole, 284
Extensible pole, 130
Inverted double pendulum, 203
Policy gradient method, 56, 309, 313
Pooling layer, 43, 259
Population culture method, 132
Population-based training, 295, 296
Positional encoding, 45, 336
PPO, see Proximal policy optimization
Predator-prey task, 97, 129, 130, 152,
183, 190, 191, 250, see also
Chase-and-escape task
Prescriptor neural network, 167
Procedural content generation, 197, 220,
362, see also Level generation
Prompt engineering, 337
Promptbreeder method, 338
Proximal policy optimization, 56, 160,
306, 309, 346
Pseudo-task augmentation method, 270
PSO, see Particle swarm optimization
Pursuit-evasion task, see Predator-prey
task
Q-learning, 310
QD, see Quality diversity methods
Quality diversity methods, 119, 197, 244,
see also Multi-dimensional archive
of phenotypic elites; Novelty
search with local competition
method
Radial basis function networks, 288
Radiation anomaly detection task, 302
Random search, 69, 70, 112, 261, 267,
358
Rastrigin function benchmark, 22
RBFs, see Radial basis function networks
Reaction-diffusion model, 75
Real-time NEAT, 146, 209
Realizing human expertise through AI
method, 170
Recovery from damage, 148, 197, 204,
318, 320
Rectified linear activation function, 43,
288
Recurrent neural network, 39, 260, 369,
403
Recursive improvement, 339
Regularization, 67, 158, 262, 286, 291
REINFORCE method, 262, 306, 307
Reinforcement learning, 3, 146, 180, 238,
254, 285, 306, 403
Reinvention vs. reuse, 74, 96, 194
ReLU, see Rectified linear activation func-
tion
Replacement mechanism, 19
Elitism, 20
Generational, 20
Steady-state, 20
Representation
Knowledge, 67, 271, 335, 404
Networks, 16, 73, 228, 326
Reservoir computing, 300, 301
Residual input-output estimation method,
297
Reward hacking, 347
RHEA, see Realizing human expertise
through AI method
RIO, see Residual input-output estima-
tion method
RL, see Reinforcement learning
RNN, see Recurrent Neural Network
Robot arm task, 138, 203
Robot swarm domain, 149
Robust architecture search method, 268
Robust control task, 36, 55, 120, 142
rtNEAT, see Neuroevolution of augment-
ing topologies
Rule-based advice, 213
SANE, see Symbiotic adaptive neuroevo-
lution
Scaling laws for LLMs, 336
Schaffer function benchmark, 22
Search trajectory networks, 278
Selection mechanism, 18
Rank-based, 21
Roulette wheel, 21
Tournament, 21
Truncation, 21
Self-adaptive EA, 283
Self-attention mechanism, 45, 103, 104,
106, 336
Self-referential mechanism, 339
Self-replication, 194
Sensor noise method, 142
Server job scheduling task, 311
SGD, see Stochastic gradient descent
Shaping mechanism, 158, 161, 209, 246,
290
Shapley value measure, 376
Shinkansen task, see Bullet train design
task
Sigma-pi units, 382
Sim-to-real transfer, 147, 148, 314
Skip connections, 46
Small-world network, 202
Sodarace environment, 349, 353
Soft robot environment, 90, 124, 195
SOTA, see State-of-the-art performance
Spacecraft control task, 139
Speciation mechanism, 62, 129, 130, 149,
182
Spike-timing-dependent plasticity, 9, 300,
389
Spiking neural network, 33, 268, 299,
379, 389
State-of-the-art performance, 262, 263,
267, 343
STDP, see Spike-timing-dependent plas-
ticity
Stepping stones, 100, 117, 154, 183, 244,
281, 396, 407
Stigmergic communication, 184, 236
Stochastic gradient descent, 37, 68, 259,
see also Gradient descent, Back-
propagation
Stochastic sharpening method, 156
Super Mario Bros game, 362
Supernetwork method, 271, 273
Surrogate modeling, 6, 19, 157, 158, 160,
163, 166, 170, 173, 272, 274,
277, 283, 292
Swish activation function, 288
Syllabus method, 132, 133, 149, 236, 390
Symbiotic adaptive neuroevolution, 135,
178–181, 264
Symbolic regression, 352
Symmetry-breaking method, 143
T-maze task, 154, 315, 324, 325, 382
TaylorGLO method, 286
Teacher network method, 142
TEAM, see Eugenic neuroevolution
Termination mechanism, 20
TOM, see Traveling observer model
Topology and weight evolving artificial
neural network, 50
Training data optimization, see Data aug-
mentation
Trajectory noise method, 142
Transfer learning, 245, 278, 341
Transformer architecture, 5, 44, 104, 292,
336, 406
Traveling observer model, 237, 271
TrueNorth chip, 299
Turing test, 392
TWEANN, see Topology and weight evolv-
ing artificial neural network
Unreal game, 392
User fatigue, 219, 225
User study, 216, 224, 392
VAE, see Variational autoencoder
Value function approximation task, 309
Variable binding mechanism, 156
Variation mechanism, 18
Variational autoencoder, 200, 362, 367
Virtual creatures, 90, 121, 236, 332, 390
Vision language models, 342
VizDoom environment, 108, 367
WANN, see Weight agnostic neural net-
work
Weight agnostic neural network, 277
Weight initialization, 39, 274
Wiring cost, 379
World model, 181, 367
XPRIZE Pandemic Response Challenge,
169
Zeroth-order method, 274
Author Index
Abbeel, Pieter, 313, 336, 355, 413, 418,
424
Abbott, Larry F., 390, 448, 451
Abdulquddos, Suhaib, 145, 146, 451
Abelsson, Anna, 296, 409
Achiam, Josh, 242, 336, 338, 359, 409
Adami, Christoph, 51, 139, 187, 190, 283,
409, 424, 430, 440
Adler, Stephen I., 166, 419
Agarwal, Sameer, 30, 415
Agarwal, Sandhini, 156, 400, 440
Aggarwal, Alok, 262, 267, 442
Aghabozorgi, Houmehr, 7, 455
Agirre, Eneko, 5, 336, 437
Agnes, Everton, 390, 414
Agogino, Adrian, 146, 179, 409
Agüera y Arcas, Blaise, 195, 409
Aharonov-Barki, Ranit, 377, 409
Ahmad, Subutai, 378, 423
Aimone, James B., 300, 426
Akaho, Shotaro, 293, 427
Akhtar, Naveed, 5, 336, 422
Akiba, Takuya, 343, 344, 346, 347, 409
Akopyan, Filipp, 300, 409
Al Tashi, Qasem, 5, 336, 422
Al-Dujaili, Abdullah, 365, 424
Al-Tashi, Qasem, 33, 446
Alakuijala, Jyrki, 195, 409
Albantakis, Larissa, 51, 424
Alden, Matthew, 33, 134, 135, 409, 410
Alderliesten, Tanja, 274, 413
Aldrich, Christiaan, 147, 452
Alippi, Cesare, 201, 421
Allshire, Arthur, 321, 434
Almeida, Diogo, 156, 400, 440
Alon, Uri, 379, 381, 427
Alpert, Bradley K., 10, 437
Alswaitti, Mohammed, 33, 446
Alvarez-Icaza, Rodrigo, 300, 409
Amari, Shun-ichi, 293, 427
Amodei, Dario, 337, 427
Ampatzis, Christos, 149, 150, 451
Anderson, Charles W., 51, 454
Anil, Rohan, 336, 338, 359, 410
Anthropic, 336, 338, 410
Antonoglou, Ioannis, 7, 95, 187, 437, 447
Anwander, Alfred, 401, 419
Archer, Dan, 303, 420
Arias, Alfonso Martinez, 193, 454
Arjovsky, Martin, 288, 410
Arpit, Devansh, 293, 427
Arsiwala, Shehnaz Z., 296, 410
Arthur, John, 300, 409
Ascoli, Giorgio A., 380, 452
Askell, Amanda, 156, 400, 440
Assunção, Filipe, 279, 410
Astrand, Oliver, 293, 427
Auger, Anne, 359, 423
Awad, Noor, 33, 410
Ayoub, Nadia A., 231, 434
Babuska, Robert, 95, 412
Bagheri, Nassim, 302, 380, 426
Bahdanau, Dzmitry, 244, 413
Bai, Jinze, 336, 410
Balog, Matej, 359–361, 439
Baluja, Shumeet, 33, 134, 410
Banarse, Dylan, 102, 274, 338, 342, 343,
417
Banitt, Yoav, 380, 416
Banzhaf, Wolfgang, 32, 76, 80, 132, 274,
410, 415, 433, 439
Barrett, Leon, 7, 455
Barrett, Samuel, 7, 455
Bartram, Julian, 380, 412
Batali, John, 404, 410
Bates, Elizabeth A., 233, 385, 417
Baxter, Jared A., 303, 410
Beakes, Michael, 300, 409
Beane, Wendy Scott, 197, 410
Beato, Nicholas, 117, 219, 221, 446
Beattie, Charles, 7, 95, 437
Beaulieu, Julie, 139, 283, 430
Beckmann, Benjamin E., 95, 414
Bednar, James A., 233, 387–389, 436,
451
Beer, Randall D., 76, 154, 378, 410, 413,
416, 455
Bei, Fengfan, 115, 427
Beker, Tuvik, 377, 409
Belew, Richard K., 10, 132, 410, 411
Bellemare, Marc G., 7, 95, 437
Ben-Iwhiwhu, Eseoghene, 390, 411
Bengio, Samy, 180, 266, 452
Bengio, Yoshua, 187, 230, 244, 260, 275,
288, 336, 413, 414, 420, 421,
430
Bengoetxea, Endika, 33, 134, 235, 433
Benson-Amram, Sarah, 396, 411
Bentley, Peter J., 67, 139, 283, 430, 446
Berg, Alexander C., 259, 444
Berg Palm, Rasmus, 205, 207, 424
Bernard, Samuel, 139, 283, 430
Bernstein, Michael, 259, 444
Beslon, Guillaume, 139, 283, 430
Besse, Frederic, 102, 417
Betzel, Richard F., 380, 448
Bever, Thomas G., 400, 411
Bian, Jiang, 338ś340, 422
Bickerton, Derek, 400, 401, 405, 411
Bieker, Jacob, 163, 165, 455
Bills, Patrick S., 396, 431
Bindra, Dalbir, 400, 411
Bingham, Garrett, 276, 289, 290, 293,
294, 390, 411
Birdwell, J. Douglas, 300, 446
Bishop, Christopher M., 47, 411
Bishop, Hugh, 47, 411
Blair, Alan, 82, 423
Blakeslee, Sandra, 378, 423
Blount, Zachary D., 117, 411
Blum, Christian, 279, 439
Blundell, Charles, 274, 316, 417, 453
Boddeti, Vishnu N., 274, 433
Bohm, Clifford, 51, 424
Bongard, Josh C., 74, 149, 182, 391, 411,
413
Bontrager, Philip, 363, 364, 412
Borland, Christina Z., 117, 411
Bosman, Peter A. N., 274, 413
Bottou, Léon, 288, 410
Botvinick, Matt, 316, 453
Boughman, Janette, 402, 442
Bourlard, Hervé, 155, 437
Bradley, Herbie, 351, 353–357, 435
Bredeche, Nicolas, 149, 416
Brezzo, Bernard, 300, 409
Brock, Andrew, 278, 412
Brockman, Greg, 53, 359, 412
Brown, Tom B., 337, 427
Browne, Cameron, 363, 451
Bruce, Joseph, 130, 412
Brundage, Myles, 283, 436
Bryant, Bobby D., 84, 139, 141, 156,
186, 209, 211–213, 215, 237,
412, 449
Bryson, David M., 139, 283, 430
Bucci, Anthony, 190, 441
Buccino, Alessio P., 380, 412
Bullinaria, John A., 323, 383, 384, 448
Burk-Herrick, Angela, 231, 434
Burkhalter, Andreas, 380, 425
Burlacu, Bogdan, 354, 429
Burt, D. Michael, 298, 412
Busoniu, Lucian, 95, 412
Buzsáki, György, 379, 412
Cabelguen, Jean-Marie, 379, 426
Caiazza, Damon, 291, 296, 298, 299, 436
Caluwaerts, Ken, 315, 448
Campbell, Adam, 117, 219, 221, 446
Cangelosi, Angelo, 404, 412, 413
Cantú-Paz, Erick, 33, 134, 440
Canzani, Elisa, 158, 159, 164, 165, 167,
169, 436
Cao, Yongqiang, 300, 415
Capobianco, Roberto, 7, 455
Capuzzi, Stephen, 7, 428
Caraffini, Fabio, 33, 425
Carbin, Michael, 327, 419
Cardamone, Luigi, 146, 413
Carlson, Kristofor D., 300, 426
Carneiro, Gustavo, 293, 432
Caruana, Rich A., 10, 33, 134, 270, 410,
413, 445
Cassidy, Andrew, 300, 409
Cavaretta, Michael J., 132, 443
Celik, Cihangir, 303, 420
Center for Disease Control and Preven-
tion, 166, 413
Cha, Stephen, 274, 413
Chakravarti, Aravinda, 236, 414
Chankong, Vira, 30, 413
Chatzilygeroudis, Konstantinos, 127, 452
Chaudhuri, Swarat, 359ś361, 439
Chavane, Frédéric, 376, 413
Chebykin, Alexander, 274, 413
Chellapilla, Kumar, 187, 413
Chemla, Sandrine, 376, 413
Chen, Dong, 383, 420
Chen, Hsing-Hen, 383, 420
Chen, Jiangzhuo, 166, 452
Chen, Liang-Chieh, 260, 445
Chen, Lili, 355, 413
Chen, Qingyi, 115, 427
Chen, Wen-Hua, 390, 411
Chen, Xi, 28, 68, 445
Cheney, Nick, 90, 91, 139, 149, 182, 283,
413, 430
Cheng, Ran, 71, 453
Chess, Benjamin, 337, 427
Cheung, Vicki, 53, 359, 412
Chevalier-Boisvert, Maxime, 244, 413
Chiel, Hillel J., 378, 410, 413
Child, Rewon, 337, 427
Chintala, Soumith, 288, 410
Chinya, Gautham, 300, 415
Chiriatti, Massimo, 242, 418
Cho, KyungHyun, 260, 293, 414, 427
Choday, Sri Harsha, 300, 415
Choe, Yoonsuck, 378, 387, 388, 429, 432,
436
Chomsky, Noam, 400, 413
Choromanski, Krzysztof, 314, 315, 448
Chrabaszcz, Patryk, 139, 283, 430
Christensen, Anders L., 114, 146, 420,
447
Christiano, Paul, 156, 400, 440
Christiansen, Eric, 260, 274, 455
Chu, Xiaowen, 268, 424
Chuang, Yung-Yu, 297, 455
Chung, Jen J., 264, 428
Chung, Junyoung, 260, 414
Cliff, Dave, 149, 414
Clune, Jeff, 11, 67, 68, 70, 81, 86, 87, 90,
91, 93–95, 121, 122, 124, 125,
127, 139, 142, 220, 224, 237,
242–246, 248–251, 280, 283,
330, 381, 382, 413, 414, 417,
425, 429, 430, 438, 440, 449,
453, 456
Coello Coello, Carlos A., 30, 414
Cognizant AI Lab, 169, 414
Colas, Cédric, 127, 414
Coleman, Kristen, 117, 414
Collins, Francis S., 236, 414
Colorni, Alberto, 33, 416
Combes, Dominique, 377, 414
Confavreux, Basile, 390, 414
Conti, Edoardo, 68, 70, 81, 280, 440
Corballis, Michael C., 405, 414
Cornelis, Jan, 58, 440
Correia, Luis C., 146, 447
Costinett, Daniel J., 303, 410
Courville, Aaron, 187, 288, 336, 421
Crespi, Alessandro, 379, 426
Crutchfield, James P., 193, 437
Cuccu, Giuseppe, 105, 429
Cucurull, Guillem, 353, 450
Cully, Antoine, 121, 122, 124, 127, 139,
242, 243, 245, 246, 283, 414,
417, 422, 430
Cunha, Hugo, 163, 165, 455
Cussat-Blanc, Sylvain, 76, 415
Cybenko, George, 289, 415
Czarnecki, Wojciech M., 297, 426
D’Ambrosio, David B., 92, 96, 97, 117,
219, 221–223, 415, 443, 446,
449
Dahan, Maytal, 166, 419
Dai, Andrew, 101–104, 261, 422
Dai, Zihang, 260, 295, 415
Dalibard, Valentin, 297, 426
Damart, Tanguy, 380, 412
Danihelka, Ivo, 328, 329, 421
Das, Rajarshi, 51, 132, 193, 437, 454
Dasgupta, Dipankar, 50, 415
Datta, Pallab, 300, 409
Davies, Alex, 359ś361, 439
Davies, Mike, 300, 415
Davis, Lawrence, 10, 49, 437
Davis, Steven J., 163, 423
De Jong, Kenneth A., 47, 112, 113, 178,
380, 415, 441, 452
de Jong, Edwin D., 187, 190, 415, 441
De Schutter, Bart, 95, 412
Dean, Jeff, 262, 440
Dean, Mark E., 300, 446
Deb, Kalyanmoy, 30, 32, 274, 282, 284,
296, 415, 433, 447
Dellaert, Frank, 76, 416
Deng, Jia, 259, 444
Department of Energy, 303, 416
Desell, Travis, 147, 264, 280, 417, 440
Devaraj, Ashwin, 145, 453
Devlic, Alisa, 7, 455
Dey, Debadeepta, 260, 454
Dhariwal, Prafulla, 56, 446
DiCaprio, Ralph A., 377, 416
Dick, Jeffery, 390, 411
Dietterich, Thomas G., 171, 416
Dimou, Georgios, 300, 415
Ding, Shifei, 129, 431
Dominic, Stephen, 51, 132, 454
Donahue, Jeff, 297, 426
Doncieux, Stéphane, 114–116, 139, 149,
154, 283, 416, 430, 438, 439
Dong, Xuanyi, 260, 274, 416
Dorigo, Marco, 33, 149, 150, 416, 451
Douglas, Rory, 7, 455
Doursat, René, 74, 416
Draelos, Timothy J., 300, 426
Druckmann, Shaul, 380, 416
Drummond, Tom, 293, 432
Du, Zhanwei, 166, 419
Duffy, Nigel, 180, 237, 262, 265, 266,
436
Dunning, Iain, 297, 426
Dupont, Emilien, 359ś361, 439
Dürr, Peter, 7, 10, 323, 324, 383, 384,
418, 448, 455
Dyer, Fred C., 51, 139, 190, 283, 430,
440
Dyer, Michael G., 67, 237, 401, 436, 453
Earle, Sam, 197–200, 365, 416, 451
Eberhart, Russell C., 33, 147, 428
Ebner, Natalie C., 298, 452
Ebrahimpour, Reza, 129, 171, 434
Eckert, Franziska, 7, 455
Edgington, Mark, 146, 435
Edlund, Jeffrey A., 51, 424
Edwards, Donald H., 377, 416
Eiben, Agoston E., 47, 149, 284, 416
Eichner, Cornelius, 401, 419
Eisenberger, Marvin, 359ś361, 439
Eizirik, Eduardo, 231, 434
El-Saleh, Ayman A., 33, 446
Ellefsen, Kai O., 139, 283, 430
Ellefsen, Kai Olav, 330, 417
Elman, Jeffrey L., 233, 385, 386, 417,
439
ElSaid, AbdElRahman, 147, 264, 280,
417, 440
Elsken, Thomas, 260, 390, 417, 454
Emmenegger, Vishalini, 380, 412
Epstein, Jonathan, 283, 436
Ercsey-Ravasz, Mária, 380, 425
Erhan, Dumitru, 180, 266, 452
Ermon, Stefano, 348, 442
Escott, Mark E., 166, 419
Eshelman, Larry J., 10, 11, 50, 445
Essner, Timo, 405, 417
Evans, James, 195, 409
Fairey, Jason, 396, 417
Faldor, Maxence, 242, 243, 245, 246, 417
Fan, James, 215, 417
Faraji, Mohammad M., 302, 380, 426
Faust, Aleksandra, 300, 426
Feasley, Eliana, 134, 135, 450
Feldt, Robert, 139, 283, 430
Feng, Liang, 337, 455
Fernando, Chrisantha, 102, 274, 297, 313,
314, 338, 342, 343, 417, 426
Ficici, Sevan G., 187, 417
Fidjeland, Andreas K., 7, 95, 437
Figueira Pujol, Joao Carlos, 50, 417
Finck, Steffen, 359, 423
Fink, Dan, 158, 162, 180, 237, 262, 265,
266, 268–270, 432, 436
Finn, Chelsea, 313, 315, 348, 418, 442,
448
Fischer, Stephan, 139, 283, 430
Fisher, Colleen A., 231, 434
Fitzhorn, Patrick A., 112, 454
Floreano, Dario, 10, 50, 77, 84, 149, 150,
154, 186, 317, 321, 323, 324,
383, 384, 387, 401, 418, 434,
439, 448
Floridi, Luciano, 242, 418
Flynn, John J., 231, 434
Fogel, David B., 32, 49, 187, 413, 418
Fogel, Lawrence J., 32, 49, 418
Fok, Chien-Liang, 148, 425
Folsom-Kovarik, J. T., 117, 219, 221, 446
Fontaine, Matthew C., 127, 128, 197–200,
416, 418
Forrest, Stephanie, 74, 128, 132, 139,
192, 229, 230, 237, 283, 430,
436
Foster, Tyler, 283, 436
Fox, Spencer J., 166, 419
Francon, Olivier, 158–165, 167, 169, 171–174,
180, 237, 262, 265, 266,
419, 435, 436, 455
Francone, Frank D., 32, 80, 410
Frank, Eric, 86, 433
Frankle, Jonathan, 327, 419
Freer, Cameron E., 115, 454
Freiberger, Matthias, 365ś367, 449
Frénoy, Antoine, 139, 283, 430
Friederici, Angela D., 401, 419
Friedlingstein, Pierre, 162, 419
Friedmann, Naama, 400, 419
Fuchs, Florian, 7, 455
Fukushima, Kunihiko, 42, 419
Fullmer, Brad, 50, 139, 419
Fussell, Don, 149, 237, 391, 431
Gad, Ahmed G., 147, 419
Gagliolo, Matteo, 278, 445
Gagné, Christian, 139, 283, 430
Gaier, Adam, 67, 278, 279, 351, 353–357,
419, 435
Gaither, Kelly, 166, 419
Galke, Lukas, 404, 419
Gallagher, John C., 378, 410, 413
Gallardo, Guillermo, 401, 419
Gămănuț, Bianca, 380, 425
Gămănuț, Răzvan, 380, 425
Gan, Yulu, 346, 349, 441
Ganguli, Surya, 286, 333, 336, 422, 448
Ganon, Zohar, 67, 419
Gänswein, Tobias, 380, 412
Gao, Boyan, 287, 420
Gao, Wen, 67, 433
Gao, Wenbo, 314, 315, 448
García-Pedrajas, Nicolás E., 131, 420
Gatesy, John, 231, 434
Gauci, Jason, 92, 93, 420, 449
Gemini Team, 336, 338, 420
Geras, Krzysztof J., 293, 427
Gerhart, John, 233, 428
Ghawaly, James, 303, 420
Ghosh, Joydeep, 291, 446
Giacomello, Edoardo, 363, 420
Gidon, Albert, 380, 416
Giles, C. Lee, 383, 420
Gilpin, Leilani, 7, 455
Gilpin, William, 193, 420
Glackin, Cornelius, 115, 444
Glanois, Claire, 201, 202, 205, 206, 365–367,
438, 449, 450
Glorot, Xavier, 275, 420
Goldberg, David E., 33, 63, 112, 134,
420, 440
Goldsby, Heather, 51, 424
Gomes, Jorge, 114, 420
Gomez, Aidan N., 44, 104, 286, 337, 452
Gomez, Faustino, 64, 102, 105, 112, 139,
141, 142, 178–180, 265, 278,
420, 421, 429, 445
Gonzalez, Santiago, 158–161, 164, 287–289,
291, 296, 297, 390, 419,
421, 432
González-Duque, Miguel, 365–367, 449
Goodbla, Alisha, 231, 434
Goodfellow, Ian, 187, 288, 336, 421
Goodman, Erik, 274, 283, 421, 433
Gordon, Jonathan, 132, 238, 349, 350,
352, 430
Gouk, Henry, 287, 420
GPAI, 131, 421
Grabowski, Laura M., 139, 283, 430
Graepel, Thore, 7, 187, 447
Grant, Heidi, 129, 134, 172, 444
Grattafiori, Aaron, 336, 421
Grattarola, Daniele, 201, 421
Graves, Alex, 7, 95, 240, 328, 329, 421,
437, 446
Gray, Scott, 337, 427
Grbic, Djordje, 205, 206, 450
Green, Michael C., 365, 451
Green, Tim, 297, 426
Grefenstette, John J., 284, 422
Greff, Klaus, 259, 263, 422, 449
Greve, Rasmus B., 329, 422
Griffiths, Tom, 404, 428
Grillotti, Luca, 127, 422
Grover, Aditya, 355, 413
Gruau, Frederic, 74, 79, 80, 201, 422
Guan, Melody, 262, 440
Guertin, Pierre A., 379, 449
Guez, Arthur, 7, 187, 447
Guha, Aloke, 10, 423
Guha, Ratan K., 221, 224, 423
Gulcehre, Caglar, 260, 414
Guo, Daya, 336, 422
Guo, Junliang, 338–340, 422
Guo, Qingyan, 338–340, 422
Guo, Yunrong, 321, 434
Gupta, Agrim, 333, 422
Guyer, Mark S., 236, 414
Ha, David, 67, 70, 101–104, 106–109,
225, 240–242, 261, 274, 278,
279, 343, 344, 346, 347, 368,
409, 417, 419, 422, 429, 450
Haas, N. Quentin, 303, 304, 380, 446
Hadi, Muhammad U., 5, 336, 422
Hadjiivanov, Alexander, 82, 423
Hafner, Danijar, 244, 423
Hahn, Sarah L., 187, 418
Haimes, Yacov Y., 30, 413
Hale, Thomas, 166, 423
Hall, Ryan, 221ś223, 443
Halverson, James, 290, 433
Hammel, Mark, 78, 441
Hanan, Jim, 78, 441
Handa, Ankur, 321, 434
Hansen, Nikolaus, 26, 235, 287, 359, 423
Hansis, Eberhard, 163, 423
Hanson, Stephen J., 287, 423
Hanson, Thomas, 10, 454
Haomachai, Worasuchad, 321, 322, 431
Harada, Tatsuya, 251ś253, 450
Hardison, Ross C., 236, 423
Harp, Steven A., 10, 423
Harrington, Kyle, 76, 415
Hartshorn, Anthony, 353, 450
Harvey, Inman, 149, 414
Hassabis, Demis, 7, 95, 187, 437, 447
Hassan, Syed Z., 5, 336, 422
Hastings, Erin J., 221, 224, 423
Hausknecht, Matthew, 95, 423
Hawkins, Jeff, 378, 423
Hayes, Conor F., 346, 349, 441
Hays, Timothy J., 187, 418
He, Kaiming, 67, 260, 266, 297, 423
He, Xin, 268, 424
Heckendorn, Robert B., 396, 448
Hedge, Shailesh, 10, 436
Heintz, Ilana, 5, 336, 437
Heitler, William J., 377, 416
Hemberg, Erik, 365, 424, 451
Henderson, Jette, 291, 446
Henighan, Tom, 337, 427
Hertz, John A., 193, 455
Hervás-Martínez, César, 131, 420
Herzing, Denise L., 400, 424
Hierlemann, Andreas, 380, 412
Higdon, Dave, 166, 452
Hilgetag, Claus C., 377, 428
Hiller, Jonathan, 143, 147, 148, 452
Hilton, Jacob, 156, 400, 440
Hinton, Geoffrey E., 38, 67, 82, 83, 230,
259, 260, 287, 289, 336, 383,
424, 429, 430, 438, 444, 449
Hintze, Arend, 51, 187, 190, 396, 409,
424, 426, 440
Ho, Jonathan, 28, 68, 336, 424, 445
Hochreiter, Sepp, 40, 263, 424
Hodjat, Babak, 139, 158–165, 167, 169,
171–175, 180, 237, 262, 265,
266, 268–270, 283, 346, 349,
419, 430, 432, 435, 436, 441,
446, 455
Hoeller, David, 321, 434
Hofmann, Karen, 291, 296, 298, 299, 436
Hogeweg, Paulien, 404, 456
Holekamp, Kay E., 130, 395–398, 411,
431, 442, 447
Holland, George, 359ś361, 439
Holland, John H., 178, 424
Honeycutt, Rodney L., 231, 434
Hoover, Amy K., 127, 198, 220, 351,
353–357, 418, 424, 435
Hopkins, William D., 401, 419
Horibe, Kazuya, 195, 196, 205, 207, 424
Hornby, Gregory S., 74, 78, 391, 425
Hornik, Kurt, 289, 425
Horvát, Szabolcs, 380, 425
Hospedales, Timothy M., 287, 420
Hosseinzadeh, Yousef, 132, 434
Hou, Thomas Y., 290, 433
Hougen, Dean Frederick, 11, 425
Howard, Andrew, 260, 445
Hsiu, Pi-Cheng, 297, 455
Huang, Gao, 260, 297, 425
Huang, Jia-Bin, 260, 425
Huang, Niles, 7, 428
Huang, Pei-Chi, 148, 425
Huang, Po-Sen, 359–361, 439
Huang, Yanping, 262, 267, 442
Huang, Yi-Hsuan, 297, 455
Huang, Yihua, 262, 427
Huang, Zhiheng, 259, 444
Hubel, David H., 42, 376, 425
Hubert, Thomas, 7, 187, 447
Hughes, Charles E., 154, 332, 443
Huizinga, Joost, 127, 224, 414, 425
Hurtt, George C., 163, 425
Husbands, Philip, 149, 178, 414, 425
Hutter, Frank, 33, 139, 260, 274, 283,
390, 410, 417, 430, 454–456
Iacca, Giuseppe, 33, 425
Ijspeert, Auke J., 379, 426
Imam, Nabil, 300, 409, 415
Ingle, Tanvi A., 166, 419
Ingram, Colleen M., 231, 434
International Human Genome Sequenc-
ing Consortium, 73, 385, 426
Inza, Iñaki, 33, 134, 235, 433
Ioffe, Sergey, 259, 266, 287, 450
Iranmehr, Ensieh, 302, 380, 426
Irfan, Muhammad, 5, 336, 422
Isayev, Olexandr, 7, 428
Iscoe, Neil, 283, 436
Ishibuchi, Hisao, 32, 426
Ishida Lab, 283, 426
Islam, Md. Monirul, 129, 426
Isola, Phillip, 225, 429
ITU, 169, 426
Itzkovitz, Shalev, 379, 427
Jackson, Bryan, 300, 409
Jacob, Christian, 391, 441
Jacob, François, 111, 426
Jacobsen, Emil J., 329, 422
Jaderberg, Max, 102, 297, 417, 426
Jahns, James, 396, 426
Jain, Ajay, 336, 424
Jain, Ashish, 152, 426
Jain, Himanshu, 32, 415
Jain, Shawn, 132, 238, 349, 350, 352,
430
Jain, Shweta, 300, 415
Jalili, Shahin, 132, 434
James, Conrad D., 300, 426
Janečka, Jan E., 231, 434
Jansen, Bart, 58, 291, 440
Jaquier, Aurélien, 380, 412
Jaskowski, Wojciech, 108, 428
Jastrzebski, Stanislaw, 293, 427
Javan, Emily, 166, 419
Ji, Zipeng, 262, 427
Jiang, Albert Q., 336, 427
Jiang, Jingbo, 283, 436
Jiang, Shen, 262, 427
Jiang, Xu, 156, 400, 440
Jiao, Licheng, 337, 453
Jin, Ying, 354, 429
Johnson, Christine M., 400, 424
Johnson, Leif M., 214, 218, 427
Johnson, Mark H., 233, 385, 417
Johnston, S. Claiborne, 166, 419
Jones, Llion, 44, 104, 286, 337, 452
Jordan, Jacob, 390, 427
Joshi, Prasad, 300, 415
Kacelnik, Alex, 316, 452
Kaiser, Lukasz, 44, 104, 286, 337, 452
Kanchanavatee, Noravee, 158, 162, 436
Kang, Hongwei, 115, 427
Kant, Mohak, 288, 289, 421
Kaplan, Jared D., 337, 427
Karakida, Ryo, 293, 427
Kardas, Marcin, 353, 450
Karmiloff-Smith, Annette, 233, 385, 417
Karpathy, Andrej, 259, 444
Karpov, Igor V., 214–218, 394, 427, 445,
455
Kashtan, Nir, 379, 381, 427
Kassahun, Yohannes, 146, 435
Katona, Adam, 205, 206, 450
Kavukcuoglu, Koray, 7, 95, 297, 426,
437
Kawamoto, Kenta, 7, 455
Kawulok, Michal, 147, 433
Kay, Tomas, 399, 427
Keinan, Alon, 67, 377, 419, 428
Keller, Laurent, 139, 154, 186, 283, 399,
401, 418, 427, 430
Keller, Robert E., 32, 80, 410
Kelton, Fraser, 156, 400, 440
Kempka, Michael, 108, 428
Kennedy, Henry, 380, 425
Kennedy, James, 33, 147, 428
Kerg, Giancarlo B., 293, 427
Kerkez, Viktor, 353, 450
Kermack, William O., 166, 428
Kesteren, Aard-Jan, 135, 409
Keuper, Margret, 260, 274, 456
Khadka, Shauharda, 264, 309, 310, 428
Khandelwal, Piyush, 7, 455
Khani, Reza, 132, 434
Khosla, Aditya, 259, 444
Kim, Chiwook, 391, 447
Kim, Sanghyun, 391, 447
Kim, Taehyeon, 274, 413
Kim, Youngsik, 83, 454
Kindermann, Jörg, 10, 438
King, Helen, 7, 95, 437
Kingma, Diederik P., 200, 336, 368, 428
Kira, Beatriz, 166, 423
Kirby, Simon, 404, 428
Kirchner, Frank, 146, 435
Kirsch, Louis, 225, 429
Kirschner, Marc, 233, 428
Kitano, Hiroaki, 7, 10, 428, 455
Klein, Aaron, 260, 274, 455
Klimov, Oleg, 56, 446
Knibbe, Carole, 139, 283, 430
Knight, Chris, 405, 428
Knoblauch, Kenneth, 380, 425
Knoester, David B., 51, 190, 424, 440
Kohl, Nate, 151, 291, 428, 454
Kohli, Pushmeet, 359–361, 439
Komendantov, Alexander O., 380, 452
Kommenda, Michael, 354, 429
Kompella, Varun, 7, 455
Koppejan, Rogier, 284, 428
Korshunova, Maria, 7, 428
Kotyan, Shashank, 269, 429
Koutník, Jan, 102, 105, 263, 422, 429
Koza, John R., 32, 193, 237, 429
Kozlovskii, Borislav, 359–361, 439
Krakauer, David C., 404, 439
Kramer, Oliver, 82, 284, 429, 441
Krasne, Franklin B., 377, 416
Krause, Jonathan, 259, 444
Krause Perin, Jose, 83, 454
Krcah, Peter, 139, 283, 430
Krichmar, Jeffrey L., 380, 452
Krizhevsky, Alex, 259, 260, 287, 429,
449
Kuang, Jente B., 300, 409
Kulkarni, Shruti, 303, 304, 380, 446
Kumar, Akarsh, 67, 68, 87, 225, 284, 429
Kumar, M. Pawan, 359–361, 439
Kumar, Raghav, 291, 296, 298, 299, 436
Kumaran, Dharshan, 7, 95, 187, 316, 437,
447, 453
Kupfermann, Irving, 377, 451
Kurakin, Alexey, 261, 443
Kurth-Nelson, Zeb, 316, 453
Kvam, Peter, 51, 424
Kwon, Jaerock, 378, 429
La Cava, William, 354, 429
Lacal, Irene, 81, 429
Lachmann, Michael, 166, 419
Ladosz, Pawel, 390, 411
Lahlou, Salem, 244, 413
Lai, Matthew, 7, 187, 447
Lake, Brenden M., 272, 429
Lamarck, Jean-Baptiste, 81, 430
Lamont, Gary B., 30, 414
Lampinen, Jouni A., 33, 441
Lanctot, Marc, 7, 102, 187, 417, 447
Landgraf, Joshua, 287, 291, 421
Langdon, William B., 32, 441
Lange, Robert T., 70, 345, 355, 358, 430
Langley, Pat, 139, 437
Lanzi, Pier L., 146, 363, 413, 420
Larrañaga, Pedro, 33, 134, 235, 433
Laskin, Misha, 355, 413
Lau, Raymond, 215, 417
Lau, Raymond Y. K., 288, 434
Laurie, Ben, 195, 409
Le, Quoc V., 101–104, 260–263, 267,
289, 292, 295, 297, 415, 422,
440, 442, 443, 448, 450, 456
Le Goff, Leni K., 139, 283, 430
LeCun, Yann, 230, 430
Lee, Hayeon, 274, 413
Lee, Kimin, 355, 413
Lee, Yee-Chun, 383, 420
Legg, Shane, 7, 95, 437
Legrand, Diego, 283, 436
Lehman, Joel, 11, 67, 68, 70, 81, 86, 87,
95–97, 114, 117, 118, 121, 123,
126, 132, 139, 142, 148, 154,
155, 221–223, 225, 226, 231,
232, 237, 238, 242–244, 246,
248–251, 280, 283, 349–357,
391, 399, 415, 423, 425, 429–431,
433, 435, 440, 443, 449,
453, 456
Lehmann, Kenna D. S., 395, 396, 431,
447
Lehmann, Laurent, 399, 427
Leibo, Joel Z., 316, 453
Leike, Jan, 156, 400, 440
Lemire, Joan M., 197, 410
Lempitsky, Victor, 278, 451
Lenartowicz, Agatha, 376, 431
Lenski, Richard E., 117, 139, 283, 411,
430
Lessin, Dan, 149, 237, 391, 431
Lettvin, Jerome Y., 387, 431
Leung, Binggwong, 321, 322, 431
Levin, Michael, 197, 205, 410, 437
Levine, Sergey, 313, 418
Lewis, Bryan, 166, 452
Li, Bei, 338ś340, 422
Li, Fei-Fei, 259, 333, 422, 444
Li, Hui, 30, 129, 431, 456
Li, Liam, 262, 431
Li, Lingling, 337, 453
Li, Mu, 47, 456
Li, Qing, 288, 434
Li, Siyan, 205, 206, 450
Li, Xun, 145, 403, 431, 432
Li, Yulun, 249, 251, 453
Liang, Chen, 262, 292, 443, 448
Liang, Jason, 2, 180, 237, 262, 265, 266,
268–272, 277, 283, 285, 296,
297, 432, 436
Liang, Qiyao, 346, 349, 441
Liang, Tengyuan, 293, 432
Liao, Yuyun, 300, 415
Liao, Zhibin, 293, 432
Liapis, Antonios, 363, 432
Light, Will, 289, 432
Lillicrap, Timothy, 7, 187, 390, 414, 447
Lim, Heejin, 378, 432
Lim, Theodore, 278, 412
Lin, Chit-Kwan, 300, 415
Lin, HaoChih, 7, 455
Lin, Tsung-Han, 300, 415
Lin, Wending, 363, 364, 412
Lin, Yen-Yu, 297, 455
Linares-Barranco, Bernabé, 302, 380,
426
Lindenberger, Ulman, 298, 452
Lindenmayer, Aristid, 75, 432
Lines, Andrew, 300, 415
Lipson, Hod, 90, 91, 139, 143, 147–150,
182, 220, 283, 381, 382, 413,
414, 430, 432, 452
Lipton, Zachary C., 47, 456
Listopad, Stanislav, 380, 452
Liu, Aixin, 336, 433
Liu, Bo, 284, 429
Liu, Enyu, 71, 453
Liu, Fang, 337, 453
Liu, Guoqing, 338ś340, 422
Liu, Hanxiao, 260, 295, 415
Liu, Jialin, 363ś365, 452
Liu, Rosanne, 86, 433
Liu, Ruokun, 300, 415
Liu, Yuqiao, 260, 433
Liu, Zhenhua, 67, 433
Liu, Zhuang, 260, 297, 425
Liu, Ziming, 290, 433
Livi, Lorenzo, 201, 421
Lockett, Alan, 135, 433
Loiacono, Daniele, 146, 363, 413, 420
Lorenzo, Pablo Ribalta, 147, 433
Lourenço, Nuno, 279, 410
Lowe, Ryan, 156, 400, 440
Lozano, Jose A., 33, 134, 235, 433
Lu, Chris, 225, 429
Lu, Kevin, 355, 413
Lu, Michelle, 321, 434
Lu, Sen, 301, 433
Lu, Zhichao, 274, 433
Lucas, Simon M., 363ś365, 452
Lukasik, Jovita, 260, 274, 456
Luke, Sean, 80, 132, 433, 448
Luo, Calvin, 5, 433
Lynch, Michael, 230, 433
Lyu, Zimeng, 147, 280, 417
Lüders, Benno, 330, 331, 433
Ma, Sean, 259, 444
Ma, Siwei, 67, 433
MacAlpine, Patrick, 7, 455
MacCurdy, Robert, 90, 91, 139, 143, 147,
148, 283, 413, 430, 452
MacGlashan, James, 7, 455
Machado, Penousal, 279, 410
Macke, William, 289, 411
Macklin, Miles, 321, 434
MacLachlan, Sarah M., 396, 431
MacNeilage, Peter F., 377, 434
Madhavan, Vashisht, 68, 70, 81, 127, 280,
414, 440
Maestre, Carlos, 139, 283, 430
Magnenat, Stéphane, 154, 186, 401, 418
Magrou, Loïc, 380, 425
Maheri, Alireza, 132, 434
Maheswaranathan, Niru, 286, 336, 448
Makoviychuk, Viktor, 321, 434
Malan, Katherine M, 279, 439
Mallik, Neeratyoy, 33, 410
Malo, Pekka, 284, 296, 447
Mandge, Darshan, 380, 412
Mańdziuk, Jacek, 158, 291, 434
Maniezzo, Vittorio, 33, 416
Manning, Christopher D., 348, 442
Manohar, Rajit, 300, 409
Manoonpong, Poramate, 321, 322, 431
Manson Brown, Stephanie, 291, 296, 298,
299, 436
Mao, Xudong, 288, 434
Marathe, Madhav, 166, 452
Marinella, Matthew J., 300, 426
Markram, Henry, 378, 380, 412, 416, 434
Martinho-Truswell, Antone, 316, 452
Masoudnia, Saeed, 129, 171, 434
Mathaikutty, Deepak, 300, 415
Mathias, Keith E., 112, 454
Mattiussi, Claudio, 10, 50, 77, 323, 324,
383, 384, 418, 434, 448
Maturana, Humberto R., 387, 431
Maynard Smith, J., 236, 399, 434
McCandlish, Sam, 337, 427
McClelland, James L., 67, 424
McCoy, Steven, 300, 415
McCulloch, Warren S., 387, 431
McGregor, Douglas R., 50, 415
McInerney, John, 10, 411
McKendrick, Anderson G., 166, 428
McPhee, Nicholas F., 32, 441
McQuesten, Paul, 83, 132, 133, 434
Mech, Radomir, 78, 441
Mehrabian, Abbas, 359–361, 439
Meilijson, Isaac, 377, 428
Memon, Nasir, 363, 412
Meoded, Avner, 376, 434
Merced, Daniel A., 303, 410
Meredith, Robert W., 231, 434
Merolla, Paul, 300, 409
Metzen, Jan H., 146, 390, 417, 435
Meyarivan, T., 30, 415
Meyers, Lauren A., 166, 419
Meyerson, Elliot, 2, 114, 118, 119, 126,
158–165, 167, 169, 171–174,
180, 235–238, 262, 265, 266,
268–272, 291, 296, 298, 299,
346, 349, 351, 353–357, 419,
432, 435, 436, 441, 455
Meyrand, Pierre, 377, 414
Michalewicz, Zbigniew, 132, 443
Michalewski, Henryk, 338, 342, 343, 417
Michel, Olivier, 74, 416
Miconi, Thomas, 187, 391, 435
Miikkulainen, Risto, 2, 5, 11, 33, 50,
51, 58–61, 64, 67, 74, 75, 77,
83, 84, 95, 112–114, 118, 119,
126, 128, 130–132, 134, 135,
139, 141–149, 151–156, 158–165,
167, 169, 171–175, 178–180,
183, 185, 186, 188, 190–192,
209, 211–218, 225, 226,
229–238, 262, 264–266, 268–272,
275–277, 282–285, 287–291,
293, 294, 296–299, 346,
349, 376, 387–391, 394–399,
402, 403, 409–412, 417, 419,
421, 423, 425–437, 440–443,
445–447, 449–455
Mill, Frank, 178, 425
Miller, Clifford B., 383, 420
Miller, Geoffrey F., 10, 436
Miller, Julian F., 32, 74, 193, 436, 437
Miller, Kenneth D., 390, 448
Miller, Luke, 156, 400, 440
Mills, Rob, 398, 453
Milo, Ron, 379, 427
Min, Bonan, 5, 336, 437
Miner, Nadine E., 300, 426
Mirjalili, Seyedali, 5, 33, 336, 422, 446
Miryahyavi, Mirreza, 132, 434
Mirza, Mehdi, 187, 288, 336, 421
Misevic, Dusan, 139, 283, 430
Mishkin, Pamela, 156, 400, 440
Mistral AI, 336, 437
Mitchell, Eric, 348, 442
Mitchell, J. Parker, 302ś304, 380, 446
Mitchell, Melanie, 190, 193, 437
Mitri, Sara, 139, 154, 186, 283, 401, 418,
430
Mjolsness, Eric, 10, 437
Mnih, Volodymyr, 7, 95, 437
Modha, Dharmendra S., 300, 409
Moghadam, Mahshid H., 33, 438
Mok, Aloysius K., 148, 425
Molino, Piero, 86, 433
Mondada, Francesco, 149, 150, 317, 418
Montana, David J., 10, 49, 437
Montero, Milton, 203, 205, 439, 441
Montgomery, Tracy M., 395, 396, 431,
447
Moore, Jason H., 142, 354, 429, 447
Moore, Sherry, 261, 443
Moradi, Arash, 351, 353ś357, 435
Mordatch, Igor, 355, 413
Mordvintsev, Alexander, 195, 205, 409,
437
Morgan, Nelson, 155, 437
Mori, Susumu, 376, 434
Moriarty, David E., 113, 139, 178, 265,
283, 430, 437
Morokuma, Junji, 197, 410
Moroz, Yuriy S., 7, 428
Moshaiov, Amiram, 129, 444, 445
Mouret, Jean-Baptiste, 95, 114–116, 121,
122, 124, 125, 127, 139, 149,
283, 330, 332, 381, 382, 414,
416, 417, 430, 438, 451, 452
Mousavirad, Seyed J., 33, 438
Mühlenbein, Heinz, 10, 438
Mulder, Samuel A., 300, 426
Müller, Gerd B., 233, 438
Muneer, Amgad, 5, 336, 422
Munos, Remi, 316, 453
Murphy, Kevin, 260, 274, 455
Murphy, William J., 231, 434
Mutch, Karl, 237, 265, 266, 268–270,
432
Myburgh, Christie, 282, 415
Naegle, John H., 300, 426
Nagle, Amelie, 303, 304, 380, 446
Nair, Vinod, 289, 438
Najarro, Elias, 201–206, 319, 320, 365–367,
390, 438, 449, 450
Nakamura, Yutaka, 300, 409
Nalepa, Jakub, 147, 433, 443
Nam, Gi-Joon, 300, 409
Nasir, Muhammad U., 365, 451
Navruzyan, Arshak, 180, 237, 262, 265,
266, 436
Nazari, Sam, 283, 436
Ndousse, Kamal, 132, 238, 349, 350, 352,
430
Nelson, Mark J., 351, 353–357, 435
Neri, Ferrante, 33, 425
Newman, Mark E. J., 166, 381, 438
Newport, Elissa L., 400, 447
Nguyen, Anh M., 86, 139, 242, 246, 283,
430, 438
Nguyen, Duong, 106–109, 450
Nguyen, Thien H., 5, 244, 336, 413, 437
Nichele, Stefano, 193, 194, 439
Nicholson, Andrew, 303, 420
Niklasson, Eyvind, 195, 205, 409, 437
Nikolaidis, Stefanos, 127, 128, 197–200,
416, 418
Nisioti, Eleni, 203, 205, 439, 441
Nojima, Yusuke, 32, 426
Nolfi, Stefano, 76, 142, 149, 156, 187,
317, 385, 386, 439, 447
Nordin, Peter, 32, 80, 132, 410, 439
Noubeyo, Jean Celestin Yamegni, 158,
162, 436
Novikov, Alexander, 359–361, 439
Nowak, Martin A., 404, 439
Nowlan, Steven J., 82, 83, 424
Nowozin, Sebastian, 359ś361, 439
O’Reilly, Una-May, 365, 424, 451
Ochoa, Gabriela, 77, 230, 279, 439, 445,
452
Ofria, Charles, 93–95, 139, 283, 414, 430
Oliva, Diego, 33, 438
Olivetti de França, Fabrício, 354, 429
Oller, Declan, 7, 455
Ollion, Charles, 154, 439
Olson, Randal S., 51, 190, 424, 440
OpenAI, 336, 338, 440
Ororbia, Alexander, 147, 264, 280, 417,
440
Ortíz-Boyer, Domingo, 131, 420
Orzechowski, Patryk, 354, 429
Ose, Mathias B., 193, 194, 439
Osendorfer, Christian, 240, 446
Osindero, Simon, 297, 313, 314, 338,
342, 343, 417, 426
Ostermeier, Andreas, 235, 287, 423
Ostrovski, Georg, 7, 95, 437
Ouyang, Long, 156, 400, 440
Owens, Alvin J., 32, 418
Oymak, Samet, 67, 440
Ozair, Sherjil, 187, 288, 336, 421
Ozpineci, Burak, 303, 410
Pacchiano, Aldo, 314, 315, 448
Pagliuca, Paolo, 187, 439
Palmius, Niclas, 398, 453
Papavasileiou, Evgenia, 58, 291, 440
Pardoe, David, 130, 131, 440
Parisi, Domenico, 76, 142, 233, 385, 386,
404, 413, 417, 439
Parizeau, Marc, 139, 283, 430
Park, Dookun, 83, 454
Park, J., 289, 440
Parker, Jenna M., 396, 431
Parmar, Niki, 44, 104, 286, 337, 452
Parsa, Maryam, 303, 304, 380, 446
Parsons, David P., 139, 283, 430
Pasco, Remy, 166, 419
Patel, Karan, 303, 420
Patterson, Francine G., 400, 411
Patton, Robert M., 300, 302–304, 380,
446
Paul, Arnab, 300, 415
Pedersen, Joachim Winther, 203, 205,
321, 322, 327, 332, 431, 439–441
Pelikan, Martin, 33, 134, 440
Penn, Alexandra, 398, 453
Pennock, Robert T., 93–95, 139, 283, 414,
430
Perrett, David I., 298, 412
Peters, Jan, 240, 446
Petersen, Stig, 7, 95, 437
Petherick, Anna, 166, 423
Petitto, Laura A., 400, 411
Petroski Such, Felipe, 68, 70, 81, 86, 280,
433, 440
Petrovici, Mihai A., 390, 427
Pettersson, Ludwig, 53, 359, 412
Pfau, David, 102, 417
Pfeifer, Rolf, 74, 391, 411
Pham, Hieu, 262, 440
Phillips, Toby, 166, 423
Pilat, Martin L., 391, 441
Pilly, Praveen, 390, 411
Pinville, Tony, 154, 439
Pitts, Walter H., 387, 431
Plank, James S., 300, 302, 443, 446
Plantec, Erwan, 203, 205, 439, 441
Plimpton, Steven J., 300, 426
Plunkett, Kim, 233, 385, 417
Poggio, Tomaso, 293, 432
Polani, Daniel, 115, 135, 441, 444
Poldrack, Russell A., 376, 431
Poli, Riccardo, 32, 50, 417, 441
Pollack, Jordan B., 74, 78, 149, 150, 154,
187, 237, 383, 391, 415, 417,
425, 432, 441, 445, 453
Polosukhin, Illia, 44, 104, 286, 337, 452
Pongratz, Julia, 163, 423
Popovici, Elena, 190, 441
Poretti, Andrea, 376, 434
Porto, Vincent W., 49, 418
Potok, Thomas E., 300, 302–304, 380,
446
Potter, Mitchell A., 113, 178, 441
Pouget-Abadie, Jean, 187, 288, 336, 421
Poulton, Andrew, 353, 450
Pourvahab, Mehran, 33, 438
Power, Camilla, 405, 428
Powers, Simon T., 398, 453
Pratap, Amrit, 30, 415
Pratt, Lorien Y., 287, 423
Prellberg, Jonas, 82, 441
Price, Kenneth V., 33, 441, 449
Prins, Nick, 303, 420
Prior, John, 135, 441
Pritzel, Alexander, 274, 313, 314, 417
Prusinkiewicz, Przemyslaw, 78, 441
Pugh, Justin K., 119, 126, 441
Punch, William F., 139, 283, 430
Pyeatt, Larry, 79, 201, 422
Qiu, Xin, 158–162, 164, 165, 167, 169,
235, 236, 275, 276, 283, 287,
288, 291, 296, 298, 299, 346,
349, 419, 421, 435, 436, 441,
442
Quon, James, 187, 418
Qureshi, Rizwan, 5, 336, 422
Rabosky, Daniel L., 231, 434
Rachelson, Emmanuel, 50, 450
Radchenko, Dmytro S., 7, 428
Radcliffe, Nicholas J., 57, 442
Radford, Alec, 56, 337, 427, 446
Rafailov, Rafael, 348, 442
Rajagopalan, Padmini, 130, 190, 191,
237, 395–398, 442
Rajeswaran, Aravind, 355, 413
Rajkiewicz, Piotr, 291, 434
Raju, Bala, 180, 237, 262, 265, 266, 436
Rakhlin, Alexander, 293, 432
Ram, Yoav, 404, 419
Ramachandran, Prajit, 289, 442
Randazzo, Ettore, 195, 205, 409, 437
Ranilla Pastor, José, 147, 433
Rasmussen, Carl E., 298, 442
Raup, David M., 231, 442
Raviv, Limor, 404, 419
Rawal, Aditya, 130, 180, 190, 191, 237,
249, 251, 262, 264–266, 395,
402, 436, 442, 453
Ray, Alex, 156, 400, 440
Ray, Thomas S., 139, 283, 430
Razavi, Ali, 297, 426
Real, Esteban, 260–262, 267, 274, 292,
442, 443, 455
Rechenberg, Ingo, 23, 443
Reed, Russell, 67, 443
Reggia, James A., 401, 403, 453
Reid, Ian, 293, 432
Reisinger, Joseph, 233ś235, 443
Reitman, J. S., 178, 424
Ren, Shaoqing, 67, 260, 266, 297, 423
Reynolds, John, 302, 443
Reynolds, Malcolm, 102, 417
Reynolds, Robert G., 132, 443
Ribalta Lorenzo, Pablo, 147, 443
Ribeiro, Bernardete, 279, 410
Ricanek, Karl, 147, 280, 417
Richardson, Jon, 63, 112, 420
Riediger, Michaela, 298, 452
Riedmiller, Martin, 7, 95, 437
Risi, Sebastian, 96–100, 154, 181, 182,
193–196, 201–207, 212, 221–223,
319–322, 325–327, 329–332,
363–367, 390, 412, 415,
422, 424, 431, 433, 438–441,
443, 444, 448–450, 452
Risk, William P., 300, 409
Ritchie, James M., 278, 412
Robinson, Terence J., 231, 434
Robson, Ann L., 385, 444
Rock, David, 129, 134, 172, 444
Rocktäschel, Tim, 338, 342, 343, 417
Rodriguez, Adelein, 117, 219, 221, 446
Ros, Raymond, 359, 423
Rosario, Michael P., 220, 424
Rose, Garrett S., 300, 446
Ross, Arun, 363, 412
Ross, Hayley, 5, 336, 437
Roth, Dan, 5, 336, 437
Rothe, Rasmus, 296, 444
Rothganger, Fredrick H., 300, 426
Routley, Nick, 5, 230, 444
Roy, Aditi, 363, 412
Ru, Binxin, 260, 454
Rückstieß, Thomas, 240, 446
Rudin, Nikita, 321, 434
Ruehle, Fabian, 290, 433
Ruiz, Francisco J. R., 359–361, 439
Rumelhart, David E., 38, 67, 383, 424,
444
Runc, Grzegorz, 108, 428
Ruppin, Eytan, 67, 377, 378, 409, 419,
428, 444
Rusou, Dana, 400, 419
Russakovsky, Olga, 259, 444
Rusu, Andrei A., 7, 95, 274, 313, 314,
417, 437
Ryan Ruggiero, Vincent, 239, 444
Ryczko, Dimitri, 379, 426
Ryder, Oliver A., 231, 434
Ryoo, Michael, 130, 131, 440
Sadik, Amir, 7, 95, 437
Safari, Mahmoud, 260, 454
Saharia, Chitwan, 244, 413
Sainz, Oscar, 5, 336, 437
Salakhutdinov, Ruslan R., 272, 287, 336,
424, 429, 449
Salge, Christoph, 115, 444
Salih, Adham, 129, 444, 445
Salimans, Tim, 28, 68, 445
Samad, Tariq, 10, 423
Samet, Hanan, 99, 445
Samuel, Arthur L., 187, 445
Sanchez Ramos, Luciano, 147, 433
Sandbank, Ben, 377, 428
Sandberg, Irwin W., 289, 440
Sanders, Richard J., 400, 411
Sandler, Mark, 260, 445
Saravia, Elvis, 353, 450
Sargent, Darren, 158, 159, 162, 164, 165,
167, 169, 171–174, 435, 436
Sarti, Stefano, 279, 445
Satheesh, Sanjeev, 259, 444
Saunders, Gregory M., 154, 445
Savarese, Silvio, 333, 422
Savych, Olena, 7, 428
Sawada, Jun, 300, 409
Saxena, Saurabh, 261, 443
Sayama, Hiroki, 74, 416
Schaffer, J. David, 10, 11, 50, 445
Scharff, Michael, 283, 436
Schaul, Tom, 313, 314, 417
Schläger, Mikkel, 330, 331, 433
Schmidhuber, Jürgen, 40, 64, 102, 105,
106, 180, 240, 259, 263, 265,
278, 368, 421, 422, 424, 429,
445, 446, 449
Schmidt, Maximilian, 390, 427
Schmiedlechner, Tom, 365, 424
Schneider, Jonas, 53, 359, 412
Schoenauer, Marc, 139, 283, 430
Schoolland, Cory, 283, 436
Schossau, Jory, 51, 187, 409, 424
Schraudolph, Nicol N., 10, 411
Schrittwieser, Julian, 7, 187, 447
Schrum, Jacob, 152, 153, 363–365, 394,
427, 445, 446, 452
Schulman, John, 53, 56, 156, 359, 400,
412, 440, 446
Schultz, Wolfram, 384, 446
Schuman, Catherine, 300, 302–304, 380,
420, 443, 446
Schwingshackl, Clemens, 163, 165, 455
Schürmann, Felix, 380, 416
Scialom, Thomas, 353, 450
Scott, Eric O., 380, 452
Scott, James G., 166, 419
Secretan, Jimmy, 117, 219, 221, 446
See, Abigail, 359–361, 439
Segev, Idan, 380, 416
Sehnke, Frank, 240, 446
Selle, Andrew, 261, 443
Sengupta, Abhronil, 301, 433
Senn, Walter, 390, 427
Seno, Takuma, 7, 455
Sentis, Luis, 148, 425
Sergeev, Alex, 86, 433
Severn, Robert, 283, 436
Shagrin, Aaron, 283, 436
Shah, Abbas, 5, 336, 422
Shah, Mubarak, 5, 336, 422
Shah, Syed Naveed Hussain, 11, 425
Shahrzad, Hormoz, 158–162, 164, 175,
180, 237, 262, 265, 266, 277,
296, 297, 419, 432, 436, 446
Shaikh, Muhammad B., 5, 336, 422
Shami, Tareq M., 33, 446
Shanafield, Alexandra, 303, 304, 380,
446
Sharma, Archit, 348, 442
Sharma, Shubham, 291, 446
Sharp, David H., 10, 437
Shavlik, Jude W., 215, 451
Shayani, Hooman, 67, 446
Shazeer, Noam, 44, 104, 286, 337, 452
Shen, Yong, 115, 427
Sheneman, Leigh, 51, 424
Sherstan, Craig, 7, 455
Sherwood, Chet C., 401, 419
Shi, Yuhui, 147, 428
Shim, Yoonsik, 391, 447
Shing, Makoto, 343, 344, 346, 347, 409
Shirobokov, Sergey, 359–361, 439
Shlens, Jon, 259, 266, 287, 450
Shlens, Jonathon, 262, 267, 456
Shoman, Maged, 5, 336, 422
Shouraki, Saeed B., 302, 380, 426
Shulte, Eric, 139, 283, 430
Sidor, Szymon, 28, 68, 445
Siems, Julien N., 260, 274, 456
Sifre, Laurent, 7, 187, 447
Silva, Filipe, 146, 447
Silver, David, 7, 95, 187, 437, 447
Simens, Maddie, 156, 400, 440
Simione, Luca, 187, 447
Simmers, John, 377, 414
Simon, Herbert A., 177, 447
Simon, Joel, 220, 447
Simonyan, Karen, 7, 187, 259, 297, 426,
447
Sims, Karl, 139, 283, 391, 430, 447
Simão, Taiz L. L., 231, 434
Singh, Deepak, 158, 162, 436
Singleton, Jenny L., 400, 447
Sinha, Ankur, 284, 296, 447
Sinha, Ujjayant, 291, 296, 298, 299, 436
Sipper, Moshe, 142, 447
Sirosh, Joseph, 387, 388, 436
Sit, Yiu Fai, 139, 447
Slama, Katarina, 156, 400, 440
Smit, Selmar K., 284, 416
Smith, Adam, 363–365, 452
Smith, James E., 47, 416
Smith, Jennifer E., 395, 447
Smith, Kenny, 404, 428
Smola, Alexander J., 47, 456
Smolley, Stephen P., 288, 434
Snider, Justin, 197–200, 416
Snyder, Shay, 303, 304, 380, 446
So, David, 262, 292, 443, 448
Socher, Richard, 293, 427
Sohl-Dickstein, Jascha, 286, 336, 448
Solé, Ricard, 237, 448
Soljačić, Marin, 290, 433
Solomon, Matthew, 396, 448
Soltoggio, Andrea, 323, 324, 326, 383,
384, 390, 411, 448
Song, Kaitao, 338–340, 422
Song, Sen, 390, 448
Song, Xingyou, 314, 315, 448
Soros, Lisa B., 119, 126, 441
Soule, Terence, 396, 417, 448
Soyer, Hubert, 316, 453
Spagnuolo, Olivia S., 396, 431
Spector, Lee, 80, 132, 433, 448
Sporns, Olaf, 380, 448
Spranger, Michael, 7, 455
Sprechmann, Pablo, 313, 314, 417
Springer, Mark S., 231, 434
Srinivas, Aravind, 355, 413
Srinivasa, Narayan, 300, 415
Srivastava, Nitish, 287, 449
Srivastava, Rupesh K., 259, 263, 422,
449
Stadler, Tanja, 231, 434
Stahl, Christopher, 303, 304, 380, 446
Stanley, Kenneth O., 11, 51, 58–61, 66–
68, 70, 74, 75, 77, 81, 84–89,
92–94, 96–100, 114, 117–119,
121, 123, 126, 132, 139, 141,
142, 146, 154, 181, 182, 188,
189, 209, 211–213, 215, 216,
219–225, 237, 238, 242–244,
246, 248–251, 280, 283, 291,
325–327, 332, 349, 350, 352,
363, 381, 391, 409, 414, 415,
420, 423–425, 429–431, 440,
441, 443, 444, 446, 448, 449,
451–456
State, Gavriel, 321, 434
Steels, Luc L., 404, 449
Steiner, Cynthia, 231, 434
Steuer, Inge, 379, 449
Steunebrink, Bas R., 263, 422
Stinchcombe, Maxwell, 289, 425
Stojnic, Robert, 353, 450
Stokes, James, 293, 432
Stone, Peter, 7, 95, 284, 291, 423, 429,
454, 455
Storey, Kier, 321, 434
Storn, Rainer M., 33, 441, 449
Strassen, Volker, 362, 449
Strauss, Eli D., 395, 447
Stützle, Thomas, 33, 416
Su, Hao, 259, 444
Subramanian, Kaushik, 7, 455
Subramoney, Anand, 152, 426
Sudhakaran, Shyam, 201–207, 365–367,
424, 438, 449, 450
Suematsu, Yutaka L., 261, 443
Sukthanker, Rhea, 260, 454
Sulem, Elior, 5, 336, 437
Summakieh, Mhd A., 33, 446
Sun, Guo-Zheng, 383, 420
Sun, Jian, 67, 260, 266, 297, 423
Sun, Kebin, 71, 453
Sun, Qi, 343, 344, 346, 347, 409
Sun, Xingping, 115, 427
Sun, Yanan, 33, 260, 275, 433, 450, 453
SunSpiral, Vytas, 149, 182, 413
Sutskever, Ilya, 28, 68, 259, 260, 287,
429, 445, 449
Swinney, Mathew, 303, 420
Sygnowski, Jakub, 313, 314, 417
Szathmáry, Eörs, 236, 399, 400, 411, 434,
450
Szegedy, Christian, 259, 266, 287, 450
Szerlip, Paul A., 220, 424
Taba, Brian, 300, 409
Tabatabaei, Seyyed M., 33, 438
Taddei, François, 139, 283, 430
Takagi, Hideyuki, 88, 220, 450
Talwalkar, Ameet, 262, 431
Tan, James, 14, 450
Tan, Jie, 251–253, 261, 315, 443, 448,
450
Tan, Kay C., 260, 337, 433, 455
Tan, Mingxing, 260, 295, 297, 415, 450
Tan, Xu, 338–340, 422
Tang, Jie, 53, 359, 412
Tang, Yujin, 70, 106–109, 225, 251–253,
343–347, 355, 358, 409, 429,
430, 450
Tang, Yunhao, 314, 315, 448
Tansey, Wesley, 134, 135, 450
Tarapore, Danesh, 121, 122, 124, 139,
283, 414, 430
Taylor, Ross, 353, 450
Tec, Mauricio, 166, 419
Teeling, Emma C., 231, 434
Tegmark, Max, 290, 433
Tehrani-Saleh, Ali, 51, 424
Templier, Paul, 50, 450
Tenenbaum, Joshua B., 272, 429
Teplyashin, Denis, 313, 314, 417
Terrace, Herbert S., 400, 411
Teyke, Thomas, 377, 451
Theraulaz, Guy, 149, 416
Thibault, Simon, 139, 283, 430
Thomure, Michael D., 7, 455
Tian, Yingtao, 70, 345, 355, 358, 430,
450
Tickle, Cheryll, 193, 454
Timofte, Radu, 296, 444
Tirumala, Dhruva, 316, 453
Toczek, Jakub, 108, 428
Todd, Graham, 365, 451
Todd, Peter, 10, 436
Togelius, Julian, 127, 197–200, 212, 363–
366, 412, 416, 418, 432, 444,
451, 455
Tolbert, Leon M., 303, 410
Tomassini, Marco, 230, 452
Tonelli, Paul, 95, 332, 451
Toroczkai, Zoltán, 380, 425
Toshev, Alexander, 180, 266, 452
Toutouh, Jamal, 365, 424, 451
Touvron, Hugo, 336, 359, 451
Towell, Geoffrey G., 215, 451
Trianni, Vittorio, 149, 150, 416, 451
Tropsha, Alexander, 7, 428
Tse, Jonathan, 300, 415
Tsodyks, Michail, 378, 434
Tsukamoto, Noritaka, 32, 426
Tuci, Elio, 149, 150, 451
Tufte, Gunnar, 193, 194, 439
Tumer, Kagan, 179, 264, 309, 310, 409,
428
Turing, Alan, 75, 451
Turner, Andrew, 74, 437
Turney, Peter D., 237, 451
Tutum, Cem C., 145, 146, 451
Tyrrell, Andy, 67, 446
Tyulmankov, Danil, 390, 451
Ulyanov, Dmitry, 278, 451
Urbano, Paulo, 114, 146, 420, 447
Urbanowicz, Ryan J., 142, 447
Uriagereka, Juan, 401, 403, 453
Urzelai, Joseba, 84, 321, 387, 418
Uszkoreit, Jakob, 44, 104, 286, 337, 452
Vaidya, Sachin, 290, 433
Vallortigara, Giorgio, 316, 452
Valsalam, Vinod, 143, 144, 147, 148, 217,
218, 233, 387, 389, 427, 451,
452
van der Maaten, Laurens, 260, 297, 425
van Eck Conradie, Alex, 147, 452
Van Essen, David C., 380, 425
Van Geit, Werner, 380, 412
Van Gool, Luc, 296, 444
Van Veldhuizen, David A., 30, 414
VandeWetering, Kelsey J., 396, 431
Vanhoucke, Vincent, 259, 266, 287, 450
Vasconcellos Vargas, Danilo, 269, 429
Vassiliades, Vassilis, 127, 452
Vasudevan, Vijay, 262, 267, 456
Vaswani, Ashish, 44, 104, 286, 337, 452
Vedaldi, Andrea, 278, 451
Veness, Joel, 7, 95, 437
Venkadesh, Siva, 380, 452
Venkataramanan, Guruguhanathan, 300,
415
Venkatramanan, Srinivasan, 166, 452
Ventura, Rossella, 81, 429
Verbancsics, Phillip, 92, 381, 452
Verel, Sébastien, 230, 452
Versace, Elisabetta, 316, 452
Versari, Luca, 195, 409
Veyseh, Amir P. B., 5, 336, 437
Vineyard, Craig M., 300, 426
Vinyals, Oriol, 180, 266, 297, 426, 452
Virgolin, Marco, 354, 429
Voelkle, Manuel C., 298, 452
Vogels, Tim, 390, 414
Volz, Vanessa, 363ś365, 452
Vũ, Ngân, 359–361, 439
Vullikanti, Anil, 166, 452
Wagner, Adam Z., 359ś361, 439
Wagner, Andreas, 230, 453
Wagner, Kyle, 401, 403, 453
Wainwright, Carroll L., 156, 400, 440
Walker, Kathryn, 195, 196, 205, 207, 424
Walsh, Michael J., 32, 418
Walsh, Thomas J., 7, 455
Wang, Bin, 33, 453
Wang, Chao, 337, 453
Wang, Hong, 300, 415
Wang, Huan, 293, 427
Wang, Jane X., 313, 314, 316, 417, 453
Wang, Lishuang, 71, 453
Wang, Rui, 142, 237, 246, 248–251, 338–
340, 422, 453
Wang, Shanshe, 67, 433
Wang, Xuesong, 129, 431
Wang, Xutong, 166, 419
Wang, Yixuan, 290, 433
Wang, Yong, 76, 233, 453
Wang, Yun, 378, 434
Wang, Zhen, 288, 434
Warde-Farley, David, 187, 288, 336, 421
Warner, Jamieson, 145, 453
Watson, Richard A., 139, 237, 283, 398,
430, 453
Wawrzyniak, Lukasz, 321, 434
Wayne, Greg, 328, 329, 421
Webster, Sam, 166, 423
Weimer, Westley, 139, 283, 430
Weinberger, Kilian Q., 260, 297, 425
Weiss, Eric, 286, 336, 448
Weiss, Klaudiusz R., 377, 451
Welinder, Peter, 156, 400, 440
Welling, Max, 200, 336, 368, 428
Wells, Carrow I., 7, 428
Weng, Yi-Hsin, 300, 415
Werner, Gregory M., 237, 401, 453
West-Eberhard, Mary-Jane, 82, 454
Westerman, Michael, 231, 434
Weston, Nick, 278, 412
White, Colin, 260, 454
White, Halbert, 289, 425
Whitehead, Dion, 7, 455
Whiteson, Shimon, 284, 291, 312, 313,
428, 454
Whitley, Darrell, 10, 11, 50, 51, 79, 80,
112, 132, 201, 422, 445, 454
Whitley, Derek, 67, 71, 454
Widrow, Bernard, 83, 454
Wiegand, R. Paul, 178, 190, 441, 454
Wierstra, Daan, 7, 95, 102, 274, 278, 417,
437, 445
Wiesel, Torsten N., 42, 376, 425
Wild, Andreas, 300, 415
Wilkinson, Gerald S., 401, 403, 453
Willems, Lucas, 244, 413
Williams, Christopher K. I., 298, 442
Williams, Ronald J., 7, 38, 263, 383, 444,
454
Williams, Tiffani L., 231, 434
Willman, Anna, 296, 409
Willson, Timothy M., 7, 428
Wilson, Dennis G., 50, 450
Wiseman, Marc A., 130, 395, 442
Wissner-Gross, Alexander D., 115, 454
Witherspoon, Brett, 303, 420
Wojna, Zbigniew, 259, 266, 287, 450
Wolpert, Lewis, 193, 454
Wolski, Filip, 56, 446
Woody, Spencer, 166, 419
Woolley, Brian G., 117, 118, 455
Wu, Jeff, 156, 400, 440
Wu, Jeffrey, 337, 427
Wu, Jia, 5, 336, 422
Wu, Jibin, 337, 455
Wu, Sheng-hao, 337, 455
Wu, Xingyu, 337, 455
Wulff, Niels H., 193, 455
Wurman, Peter R., 7, 455
Wydmuch, Marek, 108, 428
Xie, Haoran, 288, 434
Xiong, Caiming, 293, 427
XPRIZE, 169, 455
Xu, Bing, 187, 288, 336, 421
Xu, Peng, 284, 296, 447
Xue, Bing, 33, 260, 275, 433, 450, 453
Xue, Xiaohan, 380, 412
Yamauchi, Brian M., 154, 455
Yan, Yiyang M., 291, 296, 298, 299, 436
Yang, An, 336, 455
Yang, Guangyu R., 390, 451
Yang, Jingyan, 291, 296, 298, 299, 436
Yang, Shuyuan, 337, 453
Yang, Tsun-Yi, 297, 455
Yang, Yi, 260, 274, 416
Yang, Yoonseok, 300, 415
Yang, Yujiu, 338–340, 422
Yang, Yuxiang, 314, 315, 448
Yannakakis, Georgios N., 363, 366, 432,
451, 455
Yao, Xin, 11, 50, 51, 129, 426, 455
Ye, Michael, 291, 296, 298, 299, 436
Yeh, Cathy, 132, 238, 349, 350, 352, 430
Yen, Gary G., 260, 275, 433, 450
Ying, Chris, 260, 274, 455
Yong, Chern H., 183, 185, 215, 216, 238,
455
Yosinski, Jason, 86, 139, 242, 246, 283,
430, 433, 438
Young, Aaron, 303, 420
Young, Daniel, 158, 162, 163, 165, 436,
455
Yuan, Chunfeng, 262, 427
Yun, Se-Young, 274, 413
Zabihzadeh, Davood, 33, 438
Zador, Anthony M., 280, 327, 332, 456
Zafar, Anas, 5, 336, 422
Zaremba, Wojciech, 53, 359, 412
Zbili, Mickael, 380, 412
Zela, Arber, 260, 274, 454, 456
Zenke, Friedemann, 390, 414
Zhang, Aston, 47, 456
Zhang, Chong, 156, 400, 440
Zhang, Jenny, 242–246, 417, 456
Zhang, Jiangyang, 376, 434
Zhang, Mengjie, 33, 260, 275, 433, 450,
453
Zhang, Qingfu, 30, 456
Zhang, Xiangyu, 67, 260, 266, 297, 423
Zhang, Xinfeng, 67, 433
Zhao, Jiaxuan, 337, 453
Zhao, Kaiyong, 268, 424
Zhao, Mengfei, 71, 453
Zhi, Jiale, 249, 251, 453
Zhmoginov, Andrey, 260, 445
Zhu, Guanghui, 262, 427
Zhu, Menglong, 260, 445
Zimmer, Lucas, 260, 274, 456
Zisserman, Andrew, 259, 447
Zoph, Barret, 261–263, 267, 289, 440,
442, 456
Zuidema, Willem, 404, 456
Zwols, Yori, 274, 417
Żychowski, Adam, 158, 434