Playing Regex Golf with Genetic Programming

Type:

Conf

Authors:

Alberto Bartoli, Andrea De Lorenzo, Eric Medvet, Fabiano Tarlao

In:

ACM Genetic and Evolutionary Computation Conference (GECCO), held in Vancouver (Canada)

Year:

2014

Links and material:

Abstract #

Regex golf has recently emerged as a specific kind of code golf, i.e., unstructured and informal programming competitions aimed at writing the shortest code solving a particular problem. A problem in regex golf consists in writing the shortest regular expression which matches all the strings in a given list and does not match any of the strings in another given list. The regular expression is expected to follow the syntax of a specified programming language, e.g., Javascript or PHP.In this paper, we propose a regex golf player internally based on Genetic Programming. We generate a population of candidate regular expressions represented as trees and evolve such population based on a multi-objective fitness which minimizes the errors and the length of the regular expression.We assess experimentally our player on a popular regex golf challenge consisting of 16 problems and compare our results against those of a recently proposed algorithm—the only one we are aware of. Our player obtains scores which improve over the baseline and are highly competitive also with respect to human players. The time for generating a solution is usually in the order of tens minutes, which is arguably comparable to the time required by human players.