From: Lukasz Stafiniak <lukstafi@gm...> - 2010-06-30 20:12:52

I still have to have a closer look if it works as I expect. I haven't even printed the heuristics yet. Anyway, it would be great if you could write down interesting positions, preferably with a single best move, to add to tests. You seem to be really working with games, can you recall some positions?

On Wed, Jun 30, 2010 at 5:02 PM, Lukasz Kaiser <lukaszkaiser@...> wrote:
>
> Ok, I get it. But somehow it is quite a lot faster now than before,
> or perhaps that's just my feeling.

It is faster than I expected, I have to debug it.

> Anyway - I played on 600 iterations
> and of course it plays miserably. Can we adjust the parameters and
> the heuristics in such a way that it will play more or less ok?

You can play with parameters, perhaps you'd like to pass some parameters from the python side? I'll first finish and test heuristic generation, then I'll have a look at the search trees.
From: Lukasz Kaiser <lukaszkaiser@gm...> - 2010-06-30 15:03:15

> The algorithm for building the UCT is the same, only that the
> heuristics change during iterations. So the shape of the tree should
> be rather similar to how you remember it from debugging earlier (and
> is random - it is a Monte Carlo method).

Ok, I get it. But somehow it is quite a lot faster now than before, or perhaps that's just my feeling. Anyway - I played on 600 iterations and of course it plays miserably. Can we adjust the parameters and the heuristics in such a way that it will play more or less ok? Can we perhaps try a handwritten heuristic? I'm talking about Breakthrough, but I'll try Gomoku soon - I guess your generated heuristic should help there much more, but the parameters are too strongly fixed now I guess.

Lukasz
From: Lukasz Stafiniak <lukstafi@gm...> - 2010-06-29 23:07:26

On Wed, Jun 30, 2010 at 12:45 AM, Lukasz Kaiser <lukaszkaiser@...> wrote:
>
> I tried to have a look at it, but I still do not get one thing: with maximaxed,
> what is the influence of the number of iterations? If I wanted to get sth.
> similar to alpha-beta with depth 2 (or other), and preferably fast, i.e. not
> playing to the end of each play, how should I set the parameters?

The algorithm for building the UCT is the same, only that the heuristics change during iterations. So the shape of the tree should be rather similar to how you remember it from debugging earlier (and is random - it is a Monte Carlo method).
From: Lukasz Kaiser <lukaszkaiser@gm...> - 2010-06-29 22:45:29

> I'm committing the maximaxing update to the MCTS algorithm, and
> heuristic derivation from payoff expressions. The current default
> setting, so that we can celebrate the development for a while, is to
> use the maximaxed heuristics _alone_ when suggesting a move. See
> Game.default_params,

I tried to have a look at it, but I still do not get one thing: with maximaxed, what is the influence of the number of iterations? If I wanted to get sth. similar to alpha-beta with depth 2 (or other), and preferably fast, i.e. not playing to the end of each play, how should I set the parameters? Anyway, I'll try to get a better and fresh look at it tomorrow.

Best, Lukasz
From: Lukasz Stafiniak <lukstafi@gm...> - 2010-06-29 21:39:31

The optimized version is technically a little bit nontrivial, actually! An optimized version, instead of multiplying by Char(Gi) inside recursive calls Alg(Ei), replaces Sum(Vi, Alg(Ei)) by Sum(Vi, Gi, Alg(Ei)). For this to work, distribute "Sum"s from existential quantification over "Plus"es from DNF disjunctions. So, I need to push "Sum" and pull "Char" so that they meet. But this way "Char" disappears and "Sum" is bounded to be computed only when needed.

On Tue, Jun 29, 2010 at 10:35 PM, Lukasz Kaiser <lukaszkaiser@...> wrote:
>> I will try to get to it and change sum to include the guard as well.
>
> Done and committed - both the performance bug and guarded sum
> are now done. You can try to optimize your stuff to use the guards
> in the sum, for now I only put "true" in it to make it compile somehow.
>
> Best!
>
> Lukasz
From: Lukasz Kaiser <lukaszkaiser@gm...> - 2010-06-29 20:36:16

> I will try to get to it and change sum to include the guard as well.

Done and committed - both the performance bug and guarded sum are now done. You can try to optimize your stuff to use the guards in the sum, for now I only put "true" in it to make it compile somehow.

Best!

Lukasz
From: Lukasz Stafiniak <lukstafi@gm...> - 2010-06-29 16:03:08

On Tue, Jun 29, 2010 at 5:21 PM, Lukasz Kaiser <lukaszkaiser@...> wrote:
>> I'd like to have a modified version of TNF which I call FFTNF, as
>> specified below, but I find the definitions in FormulaOps difficult to
>> understand...
>
> I am not really sure what you mean by FTNF - can every formula
> be converted to that form? What is the suggested algorithm, what more
> than TNF is expected? Let us stick to an example: say a formula
>   ex x all y (R(x, y) or Q(y))
> with F = { Q }. The first quantified 'x' does not appear in an atom from F,
> but also I do not really see how to convert it. What would you do with
> such a formula, or is it already in FTNF? If it is, can you give an example
> of a formula which is in TNF, but not in FTNF? That would be good!

I'm sorry I messed up the formalization! I'll fix it soon and write it down as a pdf page. Every formula has a FTNF form. Intuitions by examples: the above formula is already in FTNF form. Example of a formula that isn't:

  ex x ex y (R(x, y) and ex z (R(y, z) and Q(z)))

its FTNF is:

  ex y ex z (R(y, z) and Q(z) and ex x R(x, y))
From: Lukasz Kaiser <lukaszkaiser@gm...> - 2010-06-29 15:21:41

> I'm committing the maximaxing update to the MCTS algorithm, and
> heuristic derivation from payoff expressions. The current default
> setting, so that we can celebrate the development for a while, is to
> use the maximaxed heuristics _alone_ when suggesting a move. See
> Game.default_params,

Nice, thanks! (I will try to test and compare it to the previous in the evening.)

> I'd like to have a modified version of TNF which I call FFTNF, as
> specified below, but I find the definitions in FormulaOps difficult to
> understand...

I am not really sure what you mean by FTNF - can every formula be converted to that form? What is the suggested algorithm, what more than TNF is expected? Let us stick to an example: say a formula

  ex x all y (R(x, y) or Q(y))

with F = { Q }. The first quantified 'x' does not appear in an atom from F, but also I do not really see how to convert it. What would you do with such a formula, or is it already in FTNF? If it is, can you give an example of a formula which is in TNF, but not in FTNF? That would be good!

Thanks again :).

Lukasz
From: Lukasz Stafiniak <lukstafi@gm...> - 2010-06-29 14:44:09

I'm committing the maximaxing update to the MCTS algorithm, and heuristic derivation from payoff expressions. The current default setting, so that we can celebrate the development for a while, is to use the maximaxed heuristics _alone_ when suggesting a move. See Game.default_params.

I'd like to have a modified version of TNF which I call FFTNF, as specified below, but I find the definitions in FormulaOps difficult to understand... I cite below the most relevant source comments.

(* Effect that heuristics have on the MCTS algorithm. *)
type mcts_heur_effect =
  | LocalHeur of float
    (* each tree node only considers the heuristic of its state, the
       parameter is the influence of the heuristic on the tree traversal,
       there is no influence on the actual choice *)
  | MaximaxMixed of float * float
    (* a node stores a heuristic maximaxed from the leaf states of the
       subtree, [MaximaxMixed (trav, select)] has [trav] the influence on
       the tree traversal, [select] the influence on the actual choice *)
  | MaximaxSelect of float
    (* a node stores a heuristic maximaxed from the leaf states of the
       subtree, the parameter is the influence on the tree traversal, the
       actual choice is based on the heuristic alone and not the
       Monte-Carlo payoff estimates *)

(* Parameters of the Upper Confidence Bounds-based Monte Carlo Tree
   Search. Cooperative (competitive) mode means that of actions with
   equal value for a given player, the one with highest (lowest) sum of
   values is chosen. *)
type uct_params = {
  cUCT : float ;         (* coefficient of the confidence bound component *)
  constK : float ;       (* smoothening *)
  steps : int ;          (* tree updates per move *)
  horizon : int option ; (* limit on the playout length *)
  maximax : mcts_heur_effect ; (* maximaxed vs local heuristic *)
  cooperative : bool ;   (* cooperative vs competitive *)
}

[...]

let default_params = {
  cUCT = 1.0 ;
  constK = 1.0 ;
  steps = 200 ;
  horizon = None ;
  maximax = MaximaxSelect 1.0 (* MaximaxMixed (1.0, 0.02) *) ;
  cooperative = false ;
}

[...]

(* Generate a heuristic from a payoff.

   Currently, universal quantifiers (in positive positions and
   existential in negative positions) are treated opaquely (like
   literals).

   Input: a set of relations F whose instances can be altered during a
   game; payoff of the game for a player.

   Return a homomorphic image of the payoff which transforms Char(Phi)
   into H(Phi).

   H(Phi) = Alg(Fluent-First Type Normal Form of Phi wrt. relations F)

   Algorithm Alg(Phi):
   1: Reduce Phi to DNF treating quantified subformulas opaquely.
   2: Sum the results for disjuncts, where results are defined as:
   3: Each disjunct is a conjunction of literals and quantified formulas.
      Split the conjuncts into existential quantifications
      {Ex(V1,E1), ..., Ex(Vn,En)} and others (G).
   4: Result:
      Char(G) * (1 + (1+V1) * Sum(V1,Alg(E1)) + ... + (1+Vn) * Sum(Vn,Alg(En)))

   TODO: compare with alternative algorithm that maximizes over
   disjuncts instead of summing (step 2 of Alg).
   TODO: Replace Sum with BoundedSum once it is implemented.
   TODO: consider optimizing by not distributing blindly into DNF but by
   factoring "Alg(a /\ b) ==> Char(a) * Alg(b)", where "a" is a
   conjunction of literals and "b" doesn't contain literals, if it helps!

   A formula is in FFTNF wrt. relations F if, and only if, it is a
   boolean combination of formulas tau of the form: (an atom over F), or
   (a formula in TNF that does not contain rels from F), or
   Ex (xs, boolean combination of tau's), or
   All (xs, boolean combination of tau's).
   Two invariants must additionally hold: in Ex (x, BC (tau's)) and
   All (x, BC (tau's)) it must hold that the free variables of *each* of
   the tau's contain x, and x must appear in at least one atom over F.

   FIXME: FFTNF is not implemented yet, do the transformation by hand.
   FIXME: Currently, second order variables are not allowed. *)
open Formula

let rec heuristic_of_payoff expr = [...]
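To make the role of these parameters concrete, here is a Python sketch of a UCB1-style child selection that mixes in a heuristic term. The exact selection formula is not shown in the cited comments, so the way constK smoothens the visit counts and the way the heuristic weight enters are assumptions for illustration only:

```python
import math

def uct_value(child, parent_visits, c_uct, const_k, heur_weight):
    """UCB1-style score for one child during tree traversal (a sketch).

    c_uct weighs the exploration term (cUCT above), const_k smoothens
    the visit counts (constK above), and heur_weight is the assumed
    influence of the (possibly maximaxed) heuristic, as in LocalHeur /
    MaximaxMixed.
    """
    exploit = child["payoff_sum"] / (child["visits"] + const_k)
    explore = c_uct * math.sqrt(
        math.log(parent_visits + 1.0) / (child["visits"] + const_k))
    return exploit + explore + heur_weight * child["heuristic"]

def select_child(children, c_uct=1.0, const_k=1.0, heur_weight=1.0):
    # Pick the child maximizing the mixed UCB score.
    parent_visits = sum(c["visits"] for c in children)
    return max(children,
               key=lambda c: uct_value(c, parent_visits,
                                       c_uct, const_k, heur_weight))
```

With const_k > 0 the score stays finite for unvisited children, but the large exploration term still sends the search to them first.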
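A toy rendition of Alg, just to make the arithmetic of step 4 concrete. This is a Python sketch over a finite domain with an invented formula encoding (the real code works on Formula values and real_expr), and it drops the (1+Vi) factors of step 4 for readability:

```python
def alg(phi, domain, assign):
    """Toy version of Alg for formulas pre-split into DNF.

    phi is a list of disjuncts; each disjunct is a pair
    (guard, exists) where guard(assign) -> bool plays the role of
    Char(G) and exists is a list of (var, body) pairs standing for
    the existential quantifications Ex(Vi, Ei).
    Per-disjunct result (step 4, without the (1+Vi) factors):
    Char(G) * (1 + sum_i Sum(Vi, Alg(Ei))).
    """
    total = 0.0
    for guard, exists in phi:
        if not guard(assign):   # Char(G) = 0: the disjunct contributes 0
            continue
        inner = 1.0
        for var, body in exists:
            # Sum(Vi, Alg(Ei)): sum Alg of the body over all instances
            inner += sum(alg(body, domain, {**assign, var: d})
                         for d in domain)
        total += inner          # step 2: sum the results for disjuncts
    return total
```

For a payoff like "ex x P(x)" over a 5-element domain with two elements satisfying P, this yields 1 + 2 = 3: the heuristic grows with the number of witnesses instead of being a bare 0/1 characteristic value, which is the point of the construction.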
From: Lukasz Kaiser <lukaszkaiser@gm...> - 2010-06-29 14:04:44

> But perhaps it is enough to just optimize multiplication computation
> for the zero case? BoundedSum is better in principle because it allows
> for whatever tricks logic has up its sleeve, but the multiplication
> optimization would help the translation "Char (And [a; b]) ==>
> Times(Char [a], Char [b])" not deteriorate performance.

Well - in general I'm not sure, but I think it would be a good idea to change the type definition already, so that we can use it later. And it might be helpful when we have more variables as well.

> I consider it not my department but I could work on it if necessary.

I'm not sure that would be good - I added a lot of hacks, so it could be quite a waste of time for you to look all of these through. I hope I will just correct this bug soon, and there seem to be no other really grave ones at present. One day we might think about rewriting the whole Assignments and Solver stuff, but it's no use to do it now. I will try to get to it and change sum to include the guard as well.

Lukasz
From: Lukasz Stafiniak <lukstafi@gm...> - 2010-06-29 09:52:40

On Sun, Jun 27, 2010 at 7:05 PM, Lukasz Kaiser <lukaszkaiser@...> wrote:
>> Could you change the Sum construct into a BoundedSum (or GuardedSum) construct,
>>
>>   BoundedSum of fo_var * formula * real_expr
>>
>> Both are expressible in the other:
>>
>>   BoundedSum (v, guard, expr) = Sum (v, Times (Char guard, expr))
>>   Sum (v, expr) = BoundedSum (v, And [], expr)
>>
>> but I believe BoundedSum will result in considerably more efficient
>> calculations of heuristics, because [expr] will not be computed for
>> instances of [v] for which [Char guard = 0].
>> (I promised to commit in the afternoon, but I'll finish generating
>> heuristics and commit tonight I guess...)
>
> You are right in general, but at present the evaluation is so bad,
> that it will not help at all.

But perhaps it is enough to just optimize multiplication computation for the zero case? BoundedSum is better in principle because it allows for whatever tricks logic has up its sleeve, but the multiplication optimization would help the translation "Char (And [a; b]) ==> Times(Char [a], Char [b])" not deteriorate performance.

> There is a TODO in Solver.ml to correct
> that - so that evaluating RealExpr uses the context. But this is some
> larger work, I suggest we stick to what we have for now and get back
> to that later (also important: negation is handled wrong at present,
> because there is some bug for 1-element structures; correcting this
> gives 2x speedup, but I never find enough time to get to it; I hope
> next week will be a bit better, but full real_expr rewrite is too much).

I consider it not my department but I could work on it if necessary.
From: Lukasz Kaiser <lukaszkaiser@gm...> - 2010-06-27 17:06:10

> Could you change the Sum construct into a BoundedSum (or GuardedSum) construct,
>
>   BoundedSum of fo_var * formula * real_expr
>
> Both are expressible in the other:
>
>   BoundedSum (v, guard, expr) = Sum (v, Times (Char guard, expr))
>   Sum (v, expr) = BoundedSum (v, And [], expr)
>
> but I believe BoundedSum will result in considerably more efficient
> calculations of heuristics, because [expr] will not be computed for
> instances of [v] for which [Char guard = 0].
> (I promised to commit in the afternoon, but I'll finish generating
> heuristics and commit tonight I guess...)

You are right in general, but at present the evaluation is so bad, that it will not help at all. There is a TODO in Solver.ml to correct that - so that evaluating RealExpr uses the context. But this is some larger work, I suggest we stick to what we have for now and get back to that later (also important: negation is handled wrong at present, because there is some bug for 1-element structures; correcting this gives 2x speedup, but I never find enough time to get to it; I hope next week will be a bit better, but full real_expr rewrite is too much).

Lukasz
From: Lukasz Stafiniak <lukstafi@gm...> - 2010-06-27 13:58:43

Hi Łukasz,

Could you change the Sum construct into a BoundedSum (or GuardedSum) construct,

  BoundedSum of fo_var * formula * real_expr

Both are expressible in the other:

  BoundedSum (v, guard, expr) = Sum (v, Times (Char guard, expr))
  Sum (v, expr) = BoundedSum (v, And [], expr)

but I believe BoundedSum will result in considerably more efficient calculations of heuristics, because [expr] will not be computed for instances of [v] for which [Char guard = 0]. (I promised to commit in the afternoon, but I'll finish generating heuristics and commit tonight I guess...)
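The efficiency argument can be seen in a few lines of Python (a sketch, not the OCaml real_expr evaluator; the call counter is only there to demonstrate that the guarded form skips the expensive branch):

```python
calls = {"expr": 0}

def expr(v):
    calls["expr"] += 1          # stands for an expensive Alg(...) evaluation
    return float(v * v)

def char(guard, v):
    # Char: 1.0 if the guard holds for this instance, else 0.0
    return 1.0 if guard(v) else 0.0

def plain_sum(domain, guard):
    # Sum (v, Times (Char guard, expr)): expr runs for every instance,
    # even those that are then multiplied by 0
    return sum(char(guard, v) * expr(v) for v in domain)

def bounded_sum(domain, guard):
    # BoundedSum (v, guard, expr): expr runs only where the guard holds
    return sum(expr(v) for v in domain if guard(v))
```

Both return the same value; over a 100-element domain with a guard that holds for 3 instances, the plain form evaluates expr 100 times and the bounded form 3 times.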
From: Lukasz Kaiser <lukaszkaiser@gm...> - 2010-06-20 10:18:44

> I've had an idea for a refined selection for the "best option": among
> the player's options having the same value, select the one which has
> the highest / lowest sum of values over players. Then I decided it doesn't help me
> with anything, but perhaps it is a choice worth having?

I think it is worth it. The notion of maximizing your own payoff, but among equal ones then minimizing the others' payoffs is, I think, known under the name of secure Nash equilibrium. Sure it's good to have.

Lukasz
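The refined selection rule can be sketched in a few lines of Python (the payoff-vector representation is invented for the example; in the actual code this corresponds to the cooperative flag of uct_params):

```python
def best_move(moves, player, cooperative=False):
    """Pick a move maximizing `player`'s payoff; among ties, pick the
    one with the highest (cooperative) or lowest (competitive) sum of
    all players' payoffs. `moves` maps a move name to a payoff vector,
    one entry per player."""
    sign = 1.0 if cooperative else -1.0
    # Tuple comparison: own payoff decides first, the signed sum of
    # everyone's payoffs breaks ties.
    return max(moves,
               key=lambda m: (moves[m][player], sign * sum(moves[m])))
```

For instance, with two moves giving player 0 the same payoff, the competitive mode prefers the one that leaves the opponent worse off, and the cooperative mode the one that leaves everyone better off.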
From: Lukasz Stafiniak <lukstafi@gm...> - 2010-06-20 01:01:37

I do need this distinction, for better maximaxing. I will add it defaulting to "compete".

On Sun, Jun 20, 2010 at 12:35 AM, Lukasz Stafiniak <lukstafi@...> wrote:
> I've had an idea for a refined selection for the "best option": among
> the player's options having the same value, select the one which has
> the highest / lowest sum of values over players. Then I decided it doesn't help me
> with anything, but perhaps it is a choice worth having?
From: Lukasz Stafiniak <lukstafi@gm...> - 2010-06-19 23:26:53

"Correlated-Q Learning": http://www.hpl.hp.com/conferences/icml2003/papers/231.pdf

Without further ado:

---------- Forwarded message ----------
From: Michael Littman <michael.littman@...>
Date: Sat, Jun 19, 2010 at 9:13 PM
Subject: Re: [rllist] extension of minimax to more than two players
To: rllist@...

Hi,

Once you leave the two-player regime, the notion of zero-sum isn't really that useful anymore. One application of Q-learning in the general-sum setting is Hu and Wellman's so-called Nash-Q algorithm. Unlike minimax-Q in the zero-sum setting, however, the algorithm does not come with any kind of useful guarantee. Nonetheless, we've seen it (and related work by Greenwald et al. on Correlated-Q) do very reasonable things in simple games. The problem of what constitutes reasonable behavior in general-sum games is still actively being attacked by the community. There's a bunch of nice papers that grapple with it (Shoham, Zinkevich, Bowling, Tenenholtz/Brafman to name a few). If you have a sense of what you think you want your agent to do, perhaps there are already relevant algorithms out there. Enrique Munoz de Cote had a nice webpage with links, but I'm not sure where to find it.

Michael Littman (or Prof. Littman or Dr. Littman... not Mr. Littman)

--
You received this message because you are subscribed to the "Reinforcement Learning Mailing List" group. To post to this group, send email to rllist@... To unsubscribe from this group, send email to rllist-unsubscribe@... For more options, visit this group at http://groups.google.com/group/rllist?hl=en
From: Lukasz Stafiniak <lukstafi@gm...> - 2010-06-19 22:35:30

I've had an idea for a refined selection for the "best option": among the player's options having the same value, select the one which has the highest / lowest sum of values. Then I decided it doesn't help me with anything, but perhaps it is a choice worth having? 