GP for Continuous Control: Teacher or Learner? The Case of Simulated Modular Soft Robots
XX Genetic Programming Theory & Practice (GPTP), held in East Lansing (USA)
Links and material:
We consider the problem of optimizing a controller for agents whose observation and action spaces are continuous, i.e., where the controller is a multivariate real function f: R^n → R^m. We use genetic programming (GP) for solving this optimization problem. Namely, we employ a multi-tree-based GP variant, where a candidate solution is an array of m trees, each encoding a univariate function of the agent observation. We compare this form of optimization against the more common one where the controller is a multi-layer perceptron, with a predefined topology, whose weights are optimized through (neuro)evolution (NE). Moreover, we consider an evolutionary algorithm, GraphEA, that directly evolves graphs, each having n input nodes and m output nodes. We apply these three approaches to the case of simulated modular soft robots, where a robot is an aggregation of identical soft modules, each employing a controller that processes the local observation and produces the local action. We find that, in our scenario, multi-tree-based GP is competitive with NE and tends to produce different behaviors. We then experimentally investigate the possibility of optimizing a controller using another, pre-optimized one, as teacher, i.e., we realize a form of offline imitation learning. We consider all the teacher-learner pairs resulting from the three evolutionary algorithms and find that NE is a better learner than GP and GraphEA. However, controllers obtained through offline imitation learning are far less effective than those obtained through direct evolution. We hypothesize that this gap in effectiveness may be explained by the possibility, given by direct evolution, of exploring during the simulations a larger portion of the observation-action space.