[Toss-devel] commit: Maximaxing and generation of heuristics

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

I'm committing the maximaxing update to the MCTS algorithm, and
heuristic derivation from payoff expressions. The current default
setting, so that we can celebrate the development for a while, is to
use the maximaxed heuristics _alone_ when suggesting a move. See
Game.default_params,

I'd like to have a modified version of TNF which I call FF-TNF, as
specified below, but I find the definitions in FormulaOps difficult to
understand...

I cite below the most relevant source comments.

(* Effect that heuristics have on the MCTS algorithm. *)
type mcts_heur_effect =
  | LocalHeur of float
      (* each tree node only considers the heuristic of its state,
         the parameter is the influence of the heuristic on the tree
         traversal, there is no influence on the actual choice *)
  | MaximaxMixed of float * float
      (* a node stores a heuristic maximaxed from the leaf states of
         the subtree, [MaximaxMixed (trav, select)] has [trav]
         the influence on the tree traversal, [select] the influence
         on the actual choice *)
  | MaximaxSelect of float
      (* a node stores a heuristic maximaxed from the leaf states of
         the subtree, the parameter is the influence on the tree
         traversal, the actual choice is based on the heuristic alone
         and not the Monte-Carlo payoff estimates *)

(* Parameters of the Upper Confidence Bounds-based Monte Carlo Tree
   Search. Cooperative (competitive) mode means that of actions with
   equal value for a given player, the one with highest (lowest) sum
   of values is chosen. *)
type uct_params = {
  cUCT : float ;
  (* coefficient of the confidence bound component *)
  constK : float ;                   (* smoothening *)
  steps : int ;                      (* tree updates per move *)
  horizon : int option ;             (* limit on the playout length *)
  maximax : mcts_heur_effect ;       (* maximaxed vs local heuristic *)
  cooperative : bool ;               (* cooperative vs competitive *)
}

[...]

let default_params = {
  cUCT = 1.0 ;
  constK = 1.0 ;
  steps = 200 ;
  horizon = None ;
  maximax = MaximaxSelect 1.0 (* MaximaxMixed (1.0, 0.02) *) ;
  cooperative = false ;
}

[...]

(* Generate a heuristic from a payoff. Currently, universal
   quantifiers (in positive positions and existential in negative
   positions) are treated opaquely (like literals).

   Input: a set of relations F whose instances can be altered duirng a
   game; payoff of the game for a player.

   Return a homomorphic image of the payoff which transforms Char(Phi)
   into H(Phi).

   H(Phi) = Alg(Fluent-First Type Normal Form of Phi wrt. relations F)

   Algorithm Alg(Phi):
   1: Reduce Phi to DNF treating quantified subformulas opaquely.
   2: Sum the results for disjuncts, where results are defined as:
   3: Each disjunct is a conjunction of literals and quantified
   formulas. Split the conjuncts into existential quantifications {Ex(V1,E1),
   ..., Ex(Vn,En)} and others (G).
   4: Result:
   Char(G) * (1 + (1+|V1|) * Sum(V1,Alg(E1))+...+(1+|Vn|) * Sum(Vn,Alg(En)))

   TODO: compare with alternative algorithm that maximizes over
   disjuncts instead of summing (step 2 of Alg).

   TODO: Replace Sum with BoundedSum once it is implemented.

   TODO: consider optimizing by not distributing blindly into DNF but
   by factoring "Alg(a /\ b) ==> Char(a) * Alg(b)", where "a" is a
   conjunction of literals and "b" doesn't contain literals, if it
   helps!

   A formula is in FF-TNF wrt. relations F if, and only if, it is a
   boolean combination of formulas tau of the form: (an atom over F)
   or (a formula in TNF that does not contain rels from F) or Ex (xs,
   boolean combination of tau's) or All (xs, boolean combination of
   tau's). Two invariants must additionally hold: in Ex (x, BC
   (tau's)) and All (x, BC (tau's)) it must hold that the free
   variables of *each* of the tau's contain x and x must appear in at
   least one atom over F.

   FIXME: FF-TNF is not implemented yet, do the transformation by
   hand.

   FIXME: Currently, second order variables are not allowed.
*)
open Formula

let rec heuristic_of_payoff expr =
[...]