Embodying the Future: Modeling Visually Guided Planning as Prospective Mental Simulation

Jeremy Gordon. Embodying the Future: Modeling Visually Guided Planning as Prospective Mental Simulation. Ph.D. dissertation. Advisors: John Chuang, Coye Cheshire, Steven Piantadosi, Giovanni Pezzulo. University of California, Berkeley. 2023.


What would it feel like to run outside, right now, and attempt a somersault on the first surface you find? Taking seriously an invitation like this to imagine a (perhaps unlikely) future prompts the activation of evolutionary machinery in the mind and body that took millions of years to emerge. The ability to answer this question depends upon a surprisingly complex model of yourself, the ecology you inhabit, and how you interact within it. Throughout the first several seconds of engaging with this prompt, you have likely tapped memories of a wide spatial vicinity around you, and the types of surfaces likely to be encountered. The season, today’s weather, the existence of a lawn, and the schedule of a sprinkler system are all brought to bear to produce simulations of the kinds of experience that might result, and the sorts of choices that might be required. Enactive and embodied views of cognition tell us that these simulated experiences are not unitary but parallel and overlapping (A. Clark, 2015), are unbound to typical temporal sequences and durations (Arnold, Iaria, & Ekstrom, 2016), and do not live in your head as local neural traces but leverage your entire body (Thompson, 2010) as you consider the sensory and motor consequences of an activity you may not have performed in quite a while, and certainly never in this particular context. These simulations are constructed under the guidance of past experiences (how fast do you typically run?), counterfactual contingencies (what if the lawn is too crowded?), and knowledge of complex causal relations and social dynamics (how would a passerby respond? what would a colleague think?).

Embodied, prospective simulations of this sort are not a rare occurrence prompted primarily by thought experiments in a thesis. They are ubiquitous in our daily experience. Our cognitive ability to predict how events may unfold in time, how the world will respond to our actions, and how our body will respond to the world, is likely one of the central, and perhaps unique (Bulley, Henry, & Suddendorf, 2016), competencies that allow members of our species to solve problems we’ve never before encountered, and construct, share, and run experiments on candidate actions and plans before committing to enact them.

This work focuses predominantly on the domain of spatial navigation. As a modeling target, navigation enjoys a long history in computer science, as well as the planning literature, and appears as one of four categories included in Buckner and Carroll’s ontology of self-projection (Buckner & Carroll, 2007). Given its natural low-dimensional spatial state space, navigation is especially well-suited to behavioral study.

Psychologists have gathered a long list of effects outlining key features of the way we learn about and model the world. Tendencies to infer causal relations (Kushnir & Gopnik, 2005), to perceive patterns (even when none exists, e.g., Fyfe, Williams, Mason, & Pickup, 2008), to develop an intuitive understanding of physical systems (Krist, Fieberg, & Wilkening, 1993), and to categorize and hierarchically arrange novel stimuli and task environments (Collins & Frank, 2013) are all suggestive of the processes by which such a model is learned and employed. However, far less is known about the dynamics by which we navigate this internal model, its constraints and topology, or the mechanisms by which simulations traverse large spatial and temporal distances to most effectively instruct the action-oriented needs of the present. In short, we know little about the kinematics of mental simulation. I aim to pursue, then, the following claim: by studying behavior and sensory exploration during navigation, we may begin to characterize the human solution to planning in an uncertain world, and take steps towards a kinematics of prospective mental simulation, that is, the dynamic process by which individuals embody and interact with multiple possible futures.

This work is guided by research questions that I categorize into three areas of inquiry: 1) embodied planning, 2) visual salience and simulation kinematics in navigation planning, and 3) prospection in joint planning.

Embodied planning. Questions in this area aim to explore the limits of classical approaches, and suggest avenues and opportunities to develop a more nuanced and naturalistic theory of planning in the real world. I will argue in Chapter 3 that this need to accommodate embodied dynamics, in decision-making as well as less explicitly choice-centered natural activities, requires us, as a scientific community, to develop new models better able to capture the richness of the mental processes that anticipate and drive action in the world. The reviewed work displays a range of approaches to bring embodied intelligence into communication with the classical literature on decision-making and planning. In particular, I’ll develop the concept of the affordance landscape, with example applications in various naturalistic (and fundamentally embodied) tasks such as playing team sports, climbing, and crossing a river. These theoretical arguments and methodologies set a foundation for the next research area, in which a mode of embodiment particularly relevant to visual creatures like ourselves is explored.

Visual salience and simulation kinematics in navigation planning. This area captures the primary empirical questions of this dissertation, and motivates the use of an agent model designed to iteratively learn the sort of affordance landscapes discussed in Chapter 3. I’ll tackle questions about attractors of visual attention, and temporal sampling dynamics, through a variety of behavioral analyses. Key findings in Chapters 4 and 5 include: the predictive relationship between planning-time biometrics and navigation-time decisions, the geometric and map-geometry attributes most predictive of attention, as well as various hierarchical aspects of the sequential gaze patterns seen during planning. I also report behavioral differences between top and bottom performers in this task, which raise plausible explanations for how, and through what mechanisms, individuals differ in their spatial planning abilities.

Prospection in joint planning. In the third area of inquiry I’ll look beyond the individual spatial navigation paradigm to explore the case of multiple collaborators who must, through a variety of techniques, converge on a shared strategy to jointly solve the task at hand. I propose that some of these techniques implicate a planning process driven by the simulation of (shared) counterfactual future trajectories. However, I also acknowledge the significant additional complexity that comes from introducing social interaction, even in relatively simple problem-solving paradigms, and even when verbal communication is limited. I report findings from a two-person collaborative task that suggest a history-dependence of strategy selection when task dynamics changed. Fairness (defined as the balance of effort) was found to be lower in imbalanced task environments, as well as during the more difficult blocks requiring greater partner monitoring. Finally, I find a persistent bias towards spatial separation (which likely helps avoid interaction and enhances decoupling between collaborators), even among dyads using strict color-based strategies, highlighting that non-trivial hybrid approaches which mix conventions can be productively adopted even in non-verbal collaborative settings.

This investigation of embodied prospection builds on a diverse multidisciplinary literature spanning work on mental simulation and episodic foresight in psychology, predictive processing in philosophy of mind, hippocampal preplay and internally generated sequences in neuroscience, embodied and enactive cognitive science, as well as a number of biologically inspired computational models of planning.

Contributions. Building on this multidisciplinary foundation, I hope to make three contributions:

First is a conceptual extension made by projecting the body outwards into its likely future, and seeing this projection as a first-class representation of the self within which simulations are continuously run. My claim is that this projection is the interface within which future-determining actions are enacted, and is therefore, in an important sense, cognitively inseparable from the individual. Furthermore, the sensorimotor patterns generated in these simulations are not fundamentally different from the inferential (and simulative) process of perception in the present, and therefore are equally constitutive of our moment-to-moment experience.

Second, following Guest and Martin (2021), I propose an instantiation of this theoretical framing within a computational agent given control of a sensory apparatus similar to ours, and tasked with the same types of navigational challenges posed to the human participants in my studies. I argue that models like this one can help to achieve a unified conceptual framing of future-oriented dynamics that links low-level predictive processing (which may operate over short temporal durations) with the kinds of dramatically extended episodic simulations typical when studying mental time travel. While a diversity of mechanisms is entailed in such a unified framework, I believe these are best seen as two ends of a continuum with shared function and, more than likely, algorithmic and process-level correspondence as well.

Third, I share a series of empirical findings relating to visually guided planning and execution within spatial navigation problems. Some results are consistent with the more speculative ideas raised earlier in this section, and others highlight the remaining gaps in our understanding of the tremendously complex processes underlying visual search, prospection, and planning.

By pursuing the larger project within which this work sits, we can work towards a more nuanced understanding of the dynamics by which individuals generate and test expectations for the future. If successful, this project should help deepen our understanding of the way these prospective cognitive processes are embedded in a body, and a complex motor system, that grounds them in the world around us. If so, we may better appreciate the aspects of human intelligence that are truly unique, and defined by a complexity that is still far out of reach of replication in artificial systems. As such, perhaps our computational abilities will be better leveraged towards the design of tools personalized to individual preferences and idiosyncrasies, or even self-acknowledged oversights. Such systems may be able to externalize more legible beliefs, risks, and hypothetical futures, and by making these available to us, help enable more effective communication and collaboration. Finally, by better understanding how individuals project themselves into uncertain futures, and reason about long-term consequences, we might prepare ourselves, as a society, to more effectively face the challenges still ahead.


Last updated: August 9, 2023