[Toss-devel] commit: Maximaxing and generation of heuristics
Status: Beta
Brought to you by:
lukaszkaiser
From: Lukasz S. <luk...@gm...> - 2010-06-29 14:44:09
|
I'm committing the maximaxing update to the MCTS algorithm, and heuristic derivation from payoff expressions. The current default setting, so that we can celebrate the development for a while, is to use the maximaxed heuristics _alone_ when suggesting a move. See Game.default_params, I'd like to have a modified version of TNF which I call FF-TNF, as specified below, but I find the definitions in FormulaOps difficult to understand... I cite below the most relevant source comments. (* Effect that heuristics have on the MCTS algorithm. *) type mcts_heur_effect = | LocalHeur of float (* each tree node only considers the heuristic of its state, the parameter is the influence of the heuristic on the tree traversal, there is no influence on the actual choice *) | MaximaxMixed of float * float (* a node stores a heuristic maximaxed from the leaf states of the subtree, [MaximaxMixed (trav, select)] has [trav] the influence on the tree traversal, [select] the influence on the actual choice *) | MaximaxSelect of float (* a node stores a heuristic maximaxed from the leaf states of the subtree, the parameter is the influence on the tree traversal, the actual choice is based on the heuristic alone and not the Monte-Carlo payoff estimates *) (* Parameters of the Upper Confidence Bounds-based Monte Carlo Tree Search. Cooperative (competitive) mode means that of actions with equal value for a given player, the one with highest (lowest) sum of values is chosen. *) type uct_params = { cUCT : float ; (* coefficient of the confidence bound component *) constK : float ; (* smoothening *) steps : int ; (* tree updates per move *) horizon : int option ; (* limit on the playout length *) maximax : mcts_heur_effect ; (* maximaxed vs local heuristic *) cooperative : bool ; (* cooperative vs competitive *) } [...] let default_params = { cUCT = 1.0 ; constK = 1.0 ; steps = 200 ; horizon = None ; maximax = MaximaxSelect 1.0 (* MaximaxMixed (1.0, 0.02) *) ; cooperative = false ; } [...] (* Generate a heuristic from a payoff. Currently, universal quantifiers (in positive positions and existential in negative positions) are treated opaquely (like literals). Input: a set of relations F whose instances can be altered duirng a game; payoff of the game for a player. Return a homomorphic image of the payoff which transforms Char(Phi) into H(Phi). H(Phi) = Alg(Fluent-First Type Normal Form of Phi wrt. relations F) Algorithm Alg(Phi): 1: Reduce Phi to DNF treating quantified subformulas opaquely. 2: Sum the results for disjuncts, where results are defined as: 3: Each disjunct is a conjunction of literals and quantified formulas. Split the conjuncts into existential quantifications {Ex(V1,E1), ..., Ex(Vn,En)} and others (G). 4: Result: Char(G) * (1 + (1+|V1|) * Sum(V1,Alg(E1))+...+(1+|Vn|) * Sum(Vn,Alg(En))) TODO: compare with alternative algorithm that maximizes over disjuncts instead of summing (step 2 of Alg). TODO: Replace Sum with BoundedSum once it is implemented. TODO: consider optimizing by not distributing blindly into DNF but by factoring "Alg(a /\ b) ==> Char(a) * Alg(b)", where "a" is a conjunction of literals and "b" doesn't contain literals, if it helps! A formula is in FF-TNF wrt. relations F if, and only if, it is a boolean combination of formulas tau of the form: (an atom over F) or (a formula in TNF that does not contain rels from F) or Ex (xs, boolean combination of tau's) or All (xs, boolean combination of tau's). Two invariants must additionally hold: in Ex (x, BC (tau's)) and All (x, BC (tau's)) it must hold that the free variables of *each* of the tau's contain x and x must appear in at least one atom over F. FIXME: FF-TNF is not implemented yet, do the transformation by hand. FIXME: Currently, second order variables are not allowed. *) open Formula let rec heuristic_of_payoff expr = [...] |