## toss-devel — Toss develpment discussion

You can subscribe to this list here.

 2004 2005 2009 2010 2011 2012 2013 2014 2015 2016 2017 Jan Feb Mar Apr May Jun Jul Aug (1) Sep Oct Nov Dec Jan (1) Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar (1) Apr May Jun Jul Aug (17) Sep (34) Oct Nov Dec Jan Feb (100) Mar (122) Apr (5) May Jun (17) Jul (36) Aug (9) Sep (111) Oct (92) Nov (76) Dec (26) Jan (3) Feb (35) Mar (36) Apr (10) May (9) Jun (2) Jul (3) Aug (2) Sep Oct (7) Nov (12) Dec Jan (19) Feb (1) Mar (4) Apr (1) May (6) Jun (69) Jul (21) Aug (12) Sep (14) Oct (1) Nov (3) Dec Jan (6) Feb (1) Mar (6) Apr (3) May (6) Jun (1) Jul Aug Sep Oct (2) Nov (3) Dec Jan Feb Mar (6) Apr May Jun Jul Aug Sep Oct Nov Dec Jan (4) Feb Mar Apr May Jun Jul Aug Sep Oct Nov (1) Dec (3) Jan (6) Feb (1) Mar (3) Apr (1) May (3) Jun (1) Jul (1) Aug (3) Sep (2) Oct (1) Nov (1) Dec (1) Jan Feb (2) Mar (1) Apr May Jun Jul (2) Aug Sep Oct Nov Dec
S M T W T F S

1

2

3

4

5

6

7

8
(4)
9
(2)
10
(8)
11
(28)
12
(8)
13

14
(2)
15
(8)
16
(10)
17
(14)
18
(5)
19
(1)
20

21

22

23

24
(1)
25

26
(4)
27
(18)
28
(8)
29

30

31
(1)

Showing results of 122

1 2 3 .. 5 > >> (Page 1 of 5)
 Re: [Toss-devel] Payoff-derived heuristics From: Lukasz Stafiniak - 2010-03-31 10:10:17 On Fri, Mar 26, 2010 at 2:39 PM, Lukasz Stafiniak wrote: > I have an idea for heuristics that is better than SoftChar that we > discussed [...] > > The algorithm is roughly: reduce to DNF; start from a literal > modifiable by game rules and expand the connected component element by > element as in the example; take the maximum (or sum) over the DNF > disjuncts (or an average between the maximum and the sum). > > Note that the heuristic is only as good as the payoff formulation. The standard > > ex x (W(x) and all y not C(x,y)) > > would not work. It is not because of "all", the fenced board formulation > > ex x (W(x) and BF(x)) > > doesn't work either. Deciding equisatisfiability and searching for the > most informative formulation might be a separate project within Toss. > I know how to compute an approximation to the expanded forms of payoffs. To compute an expanded form of a formula: (1) play totally random games... (2) untill the formula is true (throw out games which ended with the formula being false) (3) compute frequent substructures of the found structures (4) return a disjunction of large substructures that cover the structures found in (2) 
 Re: [Toss-devel] Release plans From: Lukasz Kaiser - 2010-03-28 15:43:42 > I'll update the definition in examples, but I'll first fix board generation. Ok, but have a look at efficiency as well. If it starts being much slower, I could finish rewriting the get_real function in Solver ml (it is a bit faster for single Char expressions, but I'm not sure if this influences anything). Lukasz 
 Re: [Toss-devel] Release plans From: Lukasz Stafiniak - 2010-03-28 11:10:17 I'll update the definition in examples, but I'll first fix board generation. On Sun, Mar 28, 2010 at 12:26 PM, Lukasz Stafiniak wrote: > > No, it was just a debugging artifact. The only problem is that the old > definition of breakthrough does not have zero-sum payoffs so does not > weigh the proximity of loss properly. > 
 Re: [Toss-devel] Release plans From: Lukasz Stafiniak - 2010-03-28 10:26:26 On Sun, Mar 28, 2010 at 11:20 AM, Lukasz Stafiniak wrote: > On Sun, Mar 28, 2010 at 3:30 AM, Lukasz Kaiser wrote: >>> Thanks. I'll look during the day. I'll also fix another thing -- >>> chessboard-like element names for generated games. >> >> Yes - take a look at it (I checked 3 times, with 100 iters, >> and always a very bad move is chosen - suspicious!). > > Nice test case, it doesn't need that many iterations. I'm on track of > some bug (it doesn't return the move of the selected UCT subtree in > the failing situation.) No, it was just a debugging artifact. The only problem is that the old definition of breakthrough does not have zero-sum payoffs so does not weigh the proximity of loss properly. 
 Re: [Toss-devel] Release plans From: Lukasz Stafiniak - 2010-03-28 09:20:26 On Sun, Mar 28, 2010 at 3:30 AM, Lukasz Kaiser wrote: >> Thanks. I'll look during the day. I'll also fix another thing -- >> chessboard-like element names for generated games. > > Yes - take a look at it (I checked 3 times, with 100 iters, > and always a very bad move is chosen - suspicious!). Nice test case, it doesn't need that many iterations. I'm on track of some bug (it doesn't return the move of the selected UCT subtree in the failing situation.) > I'm not sure what you mean with chessboard-like element > names - I think that we will have to rethink the whole story > about element names anyway (this is done in too many > places, but not there where it should, i.e. not in the interface). I think that resolving these issues is up to you. I'd suggest removing the named_structure type and putting names into the structure record (whether optional or obligatory). (I mean that instead of generating "e12" I should generate "B1" etc., and also reverse the direction of arrows -- because currently "e82" is actually "B1".) 
 Re: [Toss-devel] Release plans From: Lukasz Kaiser - 2010-03-28 01:38:40 > Thanks. I'll look during the day. I'll also fix another thing -- > chessboard-like element names for generated games. Yes - take a look at it (I checked 3 times, with 100 iters, and always a very bad move is chosen - suspicious!). I'm not sure what you mean with chessboard-like element names - I think that we will have to rethink the whole story about element names anyway (this is done in too many places, but not there where it should, i.e. not in the interface). So I'm not that much worried about names for now - but check this thing with the move tomorrow, it really looks suspicious. Lukasz 
 Re: [Toss-devel] Release plans From: Lukasz Stafiniak - 2010-03-28 01:15:23 Thanks. I'll look during the day. I'll also fix another thing -- chessboard-like element names for generated games. On Sun, Mar 28, 2010 at 1:50 AM, Lukasz Kaiser wrote: > > I do not expect it to make good first move or make deep insight. > This, I hope, we can get later with better heuristic. But with some > reasonable number of iterations it should prevent moves which lose > in one or two - that's at least my intuition. Attached is a state of > breakthrough. It is quite bad for white, but long not lost. The hint, at > 400 iterations (quite a lot!) was to move B1 (first row, second column) > forward. This is one of only two moves which lead to fast defeat. This > seems strange. I just want to check that our heuristic work the right > way, and - for example - that we discount appropriately both players. > I hope our current discount does not make the player try to lose fast! > (If winning, it should be preferred fast, but losing should be prolonged). > Easy stuff, but we have to check this in the coming days to be sure > (the example state is attached). 
 Re: [Toss-devel] Release plans From: Lukasz Kaiser - 2010-03-28 00:50:45 Attachments: bth_bad.toss > I just don't see how it can play well without heuristics with a > reasonable (for Toss) number of iterations, and I don't think I could > be wrong twice in a row today ;-) I do not expect it to make good first move or make deep insight. This, I hope, we can get later with better heuristic. But with some reasonable number of iterations it should prevent moves which lose in one or two - that's at least my intuition. Attached is a state of breakthrough. It is quite bad for white, but long not lost. The hint, at 400 iterations (quite a lot!) was to move B1 (first row, second column) forward. This is one of only two moves which lead to fast defeat. This seems strange. I just want to check that our heuristic work the right way, and - for example - that we discount appropriately both players. I hope our current discount does not make the player try to lose fast! (If winning, it should be preferred fast, but losing should be prolonged). Easy stuff, but we have to check this in the coming days to be sure (the example state is attached). > Let's leave it for the next release. Maximax on the heuristic with > counting derived from an expanded form of the winning condition will > make for a smart play. And it will be relatively easy to implement. > The difficult thing is getting an expanded form of a compact form > automatically, we will have to cheat and provide the expanded form > payoffs for breakthrough example in the next release. Fortunately the > natural formulation of payoffs for gomoku is already expanded. Yes - this is something for the next release. For improving this one, I will just check yhr play with discounting a little bit more. Lukasz 
 [Toss-devel] Release plans From: Lukasz Stafiniak - 2010-03-28 00:39:53 On Sun, Mar 28, 2010 at 1:16 AM, Lukasz Kaiser wrote: > > For now, I see new problem > with breakthrough - looks like our current way of playing really likes > to get beaten (it moves to get beaten very often, without reason). > I guess we should look at that - does not really make for a good play. > I just don't see how it can play well without heuristics with a reasonable (for Toss) number of iterations, and I don't think I could be wrong twice in a row today ;-) Let's leave it for the next release. Maximax on the heuristic with counting derived from an expanded form of the winning condition will make for a smart play. And it will be relatively easy to implement. The difficult thing is getting an expanded form of a compact form automatically, we will have to cheat and provide the expanded form payoffs for breakthrough example in the next release. Fortunately the natural formulation of payoffs for gomoku is already expanded. 
 Re: [Toss-devel] Payoff-derived heuristics From: Lukasz Kaiser - 2010-03-27 23:37:56 > Not into Arena, there's the syntax already in the requests: > > EVAL LOC MOVES [ {heuristics for loc 0}; {heuristics for loc 1}; ... ] > other parameters > > (I'm changing the syntax to the above as I've realized the current svn > syntax for heuristics is ambiguous.) > > It's not for me as I'm happy with OCaml (need write automated tests > anyway). But for others. Ok, so I just have to send these together with other data at "Hint". No problem. Well - there is a problem: this will never be saved as it is not on the server. But I can add it and at first always set it to the value of the payoff. I think the interface really helps with debugging and basic experimenting - even though I see now (sitting at home, where I did not install the alpha release of Lucid Lynx) that it is not very good with qt < 4.6. Luckily Lucid will come soon :). Lukasz P.S. Are you still working on something? I'm playing against gridlock with our new time-discounted engine and I'm wondering if there is anything more we want to add in a very short term. 
 Re: [Toss-devel] Payoff-derived heuristics From: Lukasz Stafiniak - 2010-03-27 23:12:55 On Sat, Mar 27, 2010 at 11:50 PM, Lukasz Kaiser wrote: >> Perhaps you could add fields for heuristics under the payoff fields in >> the interface? Even when we'll derive heuristics from payoffs, the >> user might have better ideas. > > Yes. I can do it now if you intend to use it in the short term, > but for this you probably need to add a field in location type > in Arena ml. Do you want to do that now, or shall we stick > to what we have for a few more days of testing first? > Not into Arena, there's the syntax already in the requests: EVAL LOC MOVES [ {heuristics for loc 0}; {heuristics for loc 1}; ... ] other parameters (I'm changing the syntax to the above as I've realized the current svn syntax for heuristics is ambiguous.) It's not for me as I'm happy with OCaml (need write automated tests anyway). But for others. 
 Re: [Toss-devel] Payoff-derived heuristics From: Lukasz Kaiser - 2010-03-27 22:50:33 > Perhaps you could add fields for heuristics under the payoff fields in > the interface? Even when we'll derive heuristics from payoffs, the > user might have better ideas. Yes. I can do it now if you intend to use it in the short term, but for this you probably need to add a field in location type in Arena ml. Do you want to do that now, or shall we stick to what we have for a few more days of testing first? Lukasz 
 Re: [Toss-devel] Payoff-derived heuristics From: Lukasz Stafiniak - 2010-03-27 22:46:00 Perhaps you could add fields for heuristics under the payoff fields in the interface? Even when we'll derive heuristics from payoffs, the user might have better ideas. 
 Re: [Toss-devel] Payoff discounting From: Lukasz Stafiniak - 2010-03-27 22:41:37 On Sat, Mar 27, 2010 at 11:25 PM, Lukasz Kaiser wrote: > >> I mean, this form of discounting is obviously better than exponential >> discounting. Hutter in his "Universal AI" has a discussion on >> horizons. The reasons for our horizon are practical and human-like: >> intelligently greedy (high resolution for the short term) and >> intelligently persistent (low resolution for time distinctions in the >> long term). > > I'm not really sure about the role of horizons for us: in reachability > games you normally want them and this was very obvious for our > entanglement tests (this is true reachability). [I'm sorry for the confusion, I've used "(manipulating the) horizon" as meaning "discounting" in my above paragraph, I was speaking about our discounting (not our fixed horizon). Horizons and discounting are equivalent in some sense under some forms but differ in practice.] 
 Re: [Toss-devel] Payoff discounting From: Lukasz Kaiser - 2010-03-27 22:25:29 > Wow. It works. I can confirm (even though it needs 7, not 6 iterations for me in the tic tac toe example, but that's not a problem for now). > I mean, this form of discounting is obviously better than exponential > discounting. Hutter in his "Universal AI" has a discussion on > horizons. The reasons for our horizon are practical and human-like: > intelligently greedy (high resolution for the short term) and > intelligently persistent (low resolution for time distinctions in the > long term). I'm not really sure about the role of horizons for us: in reachability games you normally want them and this was very obvious for our entanglement tests (this is true reachability). Have a look at a hint run of entanglement now - it is uncomparable to what was before! But besides their existence, we might have to just experiment with what is the best one. Anyway - I like what we have now a lot! Will start a hint run of gomoku or breakthrough to get a feeling how these are played now. Lukasz 
 Re: [Toss-devel] Payoff discounting From: Lukasz Stafiniak - 2010-03-27 22:04:18 On Sat, Mar 27, 2010 at 10:56 PM, Lukasz Kaiser wrote: >> I'll start with that, but what discounting regime do you prefer? Wait, >> you've mentioned linear discounting above (I'd prefer >> 0.5+0.5/play_length), I agree that it's better than exponential. > > Yes - I think we should just start with something simple and > have a look how it plays. This will guide us when we do a more > thought out design with heuristics for the longer term later. > I mean, this form of discounting is obviously better than exponential discounting. Hutter in his "Universal AI" has a discussion on horizons. The reasons for our horizon are practical and human-like: intelligently greedy (high resolution for the short term) and intelligently persistent (low resolution for time distinctions in the long term). 
 Re: [Toss-devel] Minimax ("maximax") propagation of heuristics From: Lukasz Kaiser - 2010-03-27 21:59:03 > I'd like to implement now "minimax" (defacto maximax as games don't > need to be zero-sum) propagation of the heuristic. It is a very simple > idea: > > (1) change heuristics to be tables (like payoffs) instead of single > values (needed anyway for simultaneous moves we'll add in the "distant > future") > > (2) when updating the playout score of a node, also update its > heuristic table by replacing it with the max (w.r.t. the player that > owns the node) of the heuristics of child nodes > > Note that heuristics can in some cases "wildly oscillate", because > when a player changes its heuristic table due to a slight increase in > its value, the heuristic value for the parent my become totally > different. I think it's not a bug! I think this is a very good idea. For now it should work very well because our basic heuristics are just winning, so that's great. And for later it will allow us to see how heuristics influence play, so we will be able to plan how to do it best long-term. Nice idea! Lukasz 
 Re: [Toss-devel] Payoff discounting From: Lukasz Stafiniak - 2010-03-27 21:57:38 On Sat, Mar 27, 2010 at 10:28 PM, Lukasz Stafiniak wrote: >> You cannot help bad luck - but if you discount by time (as the paragraph >> above says) you will not need any luck any more - the right move will >> be taken regardless, even if all are lost. Or am I mistaken? >> > You managed to convince me that payoff discounting is better than > playing with horizons :-) > Wow. It works. 
 Re: [Toss-devel] Payoff discounting From: Lukasz Kaiser - 2010-03-27 21:56:55 > I'll start with that, but what discounting regime do you prefer? Wait, > you've mentioned linear discounting above (I'd prefer > 0.5+0.5/play_length), I agree that it's better than exponential. Yes - I think we should just start with something simple and have a look how it plays. This will guide us when we do a more thought out design with heuristics for the longer term later. Lukasz 
 [Toss-devel] Minimax ("maximax") propagation of heuristics From: Lukasz Stafiniak - 2010-03-27 21:34:36 I'd like to implement now "minimax" (defacto maximax as games don't need to be zero-sum) propagation of the heuristic. It is a very simple idea: (1) change heuristics to be tables (like payoffs) instead of single values (needed anyway for simultaneous moves we'll add in the "distant future") (2) when updating the playout score of a node, also update its heuristic table by replacing it with the max (w.r.t. the player that owns the node) of the heuristics of child nodes Note that heuristics can in some cases "wildly oscillate", because when a player changes its heuristic table due to a slight increase in its value, the heuristic value for the parent my become totally different. I think it's not a bug! P.S. I'll add discounting first and commit, and then I'll add this in two commits. 
 [Toss-devel] Payoff discounting From: Lukasz Stafiniak - 2010-03-27 21:29:04 [I keep the discussion for further reference.] On Sat, Mar 27, 2010 at 10:18 PM, Lukasz Kaiser wrote: >> We have the ply numbers in a playout so we can go for non-exponential >> horizon "discounting", e.g. stop the play with prob $\gamma / n$ where >> $n$ is the ply number. > > Well, but in this case it should also be easy to just discount by length. > Say e.g. to multiply payoff by (1 + 1/play_length). This is very basic, > but would still prefer fast wins to longer ones significantly, and we get > a very big difference between one-step and 2,3 or 4-step wins. > >> Sorry for the delay. I've added heuristics for the terminal nodes but >> it doesn't help against bad luck (basically I've . Have a look at what >> happens in the 13th play (now "P"-0 is on the left, "Q"-1 on the >> right; 20 total iterations) -- *all* playouts were won by "P": > > You cannot help bad luck - but if you discount by time (as the paragraph > above says) you will not need any luck any more - the right move will > be taken regardless, even if all are lost. Or am I mistaken? > > Lukasz > You managed to convince me that payoff discounting is better than playing with horizons :-) I'll start with that, but what discounting regime do you prefer? Wait, you've mentioned linear discounting above (I'd prefer 0.5+0.5/play_length), I agree that it's better than exponential. 
 Re: [Toss-devel] Payoff-derived heuristics From: Lukasz Kaiser - 2010-03-27 21:18:39 > We have the ply numbers in a playout so we can go for non-exponential > horizon "discounting", e.g. stop the play with prob $\gamma / n$ where > $n$ is the ply number. Well, but in this case it should also be easy to just discount by length. Say e.g. to multiply payoff by (1 + 1/play_length). This is very basic, but would still prefer fast wins to longer ones significantly, and we get a very big difference between one-step and 2,3 or 4-step wins. > Sorry for the delay. I've added heuristics for the terminal nodes but > it doesn't help against bad luck (basically I've . Have a look at what > happens in the 13th play (now "P"-0 is on the left, "Q"-1 on the > right; 20 total iterations) -- *all* playouts were won by "P": You cannot help bad luck - but if you discount by time (as the paragraph above says) you will not need any luck any more - the right move will be taken regardless, even if all are lost. Or am I mistaken? Lukasz 
 Re: [Toss-devel] Payoff-derived heuristics From: Lukasz Stafiniak - 2010-03-27 21:07:24 On Sat, Mar 27, 2010 at 7:48 PM, Lukasz Kaiser wrote: > > Yes, we can try with probabilistic horizon (I guess you mean by > it a stopping probability - say with 1% the game stops now and > everyone gets 0 - yes?). In fact we could use the payoff formulas > to evaluate such stopped positions as well, or add a special field > for heuristic formulas and evaluate those in the stopped positions > (but this maybe goes too far for short-term changes, it's rather an > approval of our previous heuristics discussion). I have just one problem > with stopping probabilities - they multiply. It will be ok to prefer very > short wins from very long, but to allow long plays it will have to be > small, so I'm not sure if it will be enough to distinguish one-step win > from a win in, say, 5 steps, with any reasonable probability. So let's > try this, but do not dismiss the discounting option totally as well. I see your point. Anyway having parameters hurts generality. Perhaps we should count how many playouts were interrupted and adjust the (fixed or "probabilistic") horizon? We have the ply numbers in a playout so we can go for non-exponential horizon "discounting", e.g. stop the play with prob $\gamma / n$ where $n$ is the ply number. > > Ok, that's an important fix! When you're done I will make some > tests again to try and determine the best hard-coded parameters > for game ml - so that we get some nice plays :). > Sorry for the delay. I've added heuristics for the terminal nodes but it doesn't help against bad luck (basically I've . Have a look at what happens in the 13th play (now "P"-0 is on the left, "Q"-1 on the right; 20 total iterations) -- *all* playouts were won by "P": toss: updated tree: Values: 0 nobs=20 exact=1.00 UCB=1.38; 1 nobs=20 exact=-1.00 UCB=-0.62; heur=0.00; scores 0:20.00 1:-20.00; Node: Q.P .P. ... game ended in: Q.P .P. PQ. + |Values: 0 nobs=3 exact=1.00 UCB=1.58; 1 nobs=3 exact=-1.00 UCB=-0.42; heur=0.00; scores 0:3.00 1:-3.00; Node: [...] |Values: 0 nobs=3 exact=1.00 UCB=1.58; 1 nobs=3 exact=-1.00 UCB=-0.42; heur=0.00; scores 0:3.00 1:-3.00; Node: |Q.P |.P. |Q.. |game ended in: |Q.P |PPP |QQ. |+ ||Values: heur=0.00; scores; Tip: ||QPP ||.P. ||Q.. ||/ ||Values: 0 nobs=3 exact=1.00 UCB=1.58; 1 nobs=3 exact=-1.00 UCB=-0.42; heur=0.00; scores 0:3.00 1:-3.00; Node: ||Q.P ||PP. ||Q.. ||game ended in: ||QPP ||PPQ ||QPQ ||+ |||Values: 0 nobs=1 exact=1.00 UCB=1.61; 1 nobs=1 exact=-1.00 UCB=-0.39; heur=0.00; scores 0:1.00 1:-1.00; Node: |||QQP |||PP. |||Q.. |||game ended in: |||QQP |||PPP |||Q.. |||+ ||||Values: 0 nobs=1 exact=1.00 UCB=1.97; 1 nobs=1 exact=-1.00 UCB=-0.03; heur=1.00; scores 0:1.00 1:-1.00; Leaf: ||||QQP ||||PPP ||||Q.. ||||game ended in: ||||QQP ||||PPP ||||Q.. ||||/ ||||Values: heur=0.00; scores; Tip: ||||QQP ||||PP. ||||QP. ||||/ ||||Values: heur=0.00; scores; Tip: ||||QQP ||||PP. ||||Q.P ||||/ |||Values: 0 nobs=1 exact=1.00 UCB=1.61; 1 nobs=1 exact=-1.00 UCB=-0.39; heur=0.00; scores 0:1.00 1:-1.00; Leaf: |||Q.P |||PPQ |||Q.. |||game ended in: |||QPP |||PPQ |||QPQ |||/ |||Values: heur=0.00; scores; Tip: |||Q.P |||PP. |||QQ. |||/ |||Values: heur=0.00; scores; Tip: |||Q.P |||PP. |||Q.Q |||/ ||Values: heur=0.00; scores; Tip: ||Q.P ||.PP ||Q.. ||/ ||Values: heur=0.00; scores; Tip: ||Q.P ||.P. ||QP. ||/ ||Values: heur=0.00; scores; Tip: ||Q.P ||.P. ||Q.P ||/ 
 Re: [Toss-devel] Payoff-derived heuristics From: Lukasz Kaiser - 2010-03-27 18:48:36 > Yes, at enough trials this happens to me. You can see everything in > the UCT tree. I can solve it by time discounting and I think that > probabilistic horizon is a better form of it than payoff discounting > (and is simple to implement). Yes, we can try with probabilistic horizon (I guess you mean by it a stopping probability - say with 1% the game stops now and everyone gets 0 - yes?). In fact we could use the payoff formulas to evaluate such stopped positions as well, or add a special field for heuristic formulas and evaluate those in the stopped positions (but this maybe goes too far for short-term changes, it's rather an approval of our previous heuristics discussion). I have just one problem with stopping probabilities - they multiply. It will be ok to prefer very short wins from very long, but to allow long plays it will have to be small, so I'm not sure if it will be enough to distinguish one-step win from a win in, say, 5 steps, with any reasonable probability. So let's try this, but do not dismiss the discounting option totally as well. > I thought: because the UCB1-TUNED default parameters are not > fine-tuned for this example (small number of iterations and one > prominent best option). But I've just realized there's a bug -- > heuristic is always reported as 0 while it should by default be the > payoff. I'm going to fix it and it should fix the problem. Ok, that's an important fix! When you're done I will make some tests again to try and determine the best hard-coded parameters for game ml - so that we get some nice plays :). Best! Lukasz 
 Re: [Toss-devel] Payoff-derived heuristics From: Lukasz Stafiniak - 2010-03-27 16:52:53 On Sat, Mar 27, 2010 at 4:38 PM, Lukasz Kaiser wrote: > > I have no problem with the file, but it still suggests moves > which do not win. Again: in this situation there are 2 winning > moves (skip or move upper-left cop) and 2 moves which are > not directly winning. When I test it, it still takes one of the bad > moves about 30% of the time, or something like this. Just open > it in the GUI and click "Hint" a few times - you will see. Why? Yes, at enough trials this happens to me. You can see everything in the UCT tree. I can solve it by time discounting and I think that probabilistic horizon is a better form of it than payoff discounting (and is simple to implement). > > Attached is another example, a tic-tac-toe situation in which only > one move blocks the direct win of the opponent. If I understand > how our algorithm works, it should create a UCT tree with 6 leaves > in the first iteration and test one of them. Each of these 6 positions > except for one directly loses (the opponent wins, and the heuristic > for the opponent should see this, so this is the only possible outcome > of the playout). The good position may lose or win, but in fact it will > always take more steps, so it should *always* be considered better > than the other 5. So: even at 6 iterations it should always choose > the right move. But it does not: just click a bit at 10 iterations and > you will see that it sometimes suggests something stupid. Why? > I thought: because the UCB1-TUNED default parameters are not fine-tuned for this example (small number of iterations and one prominent best option). But I've just realized there's a bug -- heuristic is always reported as 0 while it should by default be the payoff. I'm going to fix it and it should fix the problem. If you're interested, here are the relevant two subtrees of a failure case (Q-"1" is on the left, P-"0" on the right): |Values: 1 nobs=2 exact=-1.00 UCB=-0.40; 0 nobs=2 exact=1.00 UCB=1.60; heur=0.00; act if 1=-1.00; act if 0=1.00; Node: |Q.P |.P. |Q.. |game ended in: |QQP |.PP |Q.P |+ ||Values: heur=0.00; Tip: ||QPP ||.P. ||Q.. ||/ ||Values: heur=0.00; Tip: ||Q.P ||PP. ||Q.. ||/ ||Values: 1 nobs=2 exact=-1.00 UCB=-0.40; 0 nobs=2 exact=1.00 UCB=1.60; heur=0.00; act if 1=-1.00; act if 0=1.00; Node: ||Q.P ||.PP ||Q.. ||game ended in: ||Q.P ||PPP ||Q.Q ||+ |||Values: heur=0.00; Tip: |||QQP |||.PP |||Q.. |||/ |||Values: heur=0.00; Tip: |||Q.P |||QPP |||Q.. |||/ |||Values: heur=0.00; Tip: |||Q.P |||.PP |||QQ. |||/ |||Values: 1 nobs=1 exact=-1.00 UCB=-0.39; 0 nobs=1 exact=1.00 UCB=1.61; heur=0.00; act if 1=-1.00; act if 0=1.00; Leaf: |||Q.P |||.PP |||Q.Q |||game ended in: |||Q.P |||PPP |||Q.Q |||/ ||Values: heur=0.00; Tip: ||Q.P ||.P. ||QP. ||/ ||Values: heur=0.00; Tip: ||Q.P ||.P. ||Q.P ||/ P got lucky and won two games despite being blocked. While below Q still scores some wins when P errs to play the winning position. |Values: 1 nobs=11 exact=-0.45 UCB=-0.01; 0 nobs=11 exact=0.45 UCB=0.90; heur=0.00; act if 1=-0.45; act if 0=0.45; Node: |Q.P |QP. |... |game ended in: |QQP |QP. |P.P |+ ||Values: 1 nobs=2 exact=0.00 UCB=0.60; 0 nobs=2 exact=0.00 UCB=0.60; heur=0.00; act if 1=0.00; act if 0=0.00; Node: ||QPP ||QP. ||... ||game ended in: ||QPP ||QP. ||PQ. ||+ |||Values: heur=0.00; Tip: |||QPP |||QPQ |||... |||/ |||Values: 1 nobs=1 exact=1.00 UCB=1.61; 0 nobs=1 exact=-1.00 UCB=-0.39; heur=0.00; act if 1=1.00; act if 0=-1.00; Terminal: |||QPP |||QP. |||Q.. |||/ |||Values: 1 nobs=1 exact=-1.00 UCB=-0.39; 0 nobs=1 exact=1.00 UCB=1.61; heur=0.00; act if 1=-1.00; act if 0=1.00; Leaf: |||QPP |||QP. |||.Q. |||game ended in: |||QPP |||QP. |||PQ. |||/ |||Values: heur=0.00; Tip: |||QPP |||QP. |||..Q |||/ ||Values: 1 nobs=1 exact=1.00 UCB=1.61; 0 nobs=1 exact=-1.00 UCB=-0.39; heur=0.00; act if 1=1.00; act if 0=-1.00; Node: ||Q.P ||QPP ||... ||game ended in: ||Q.P ||QPP ||Q.. ||+ |||Values: heur=0.00; Tip: |||QQP |||QPP |||... |||/ |||Values: 1 nobs=1 exact=1.00 UCB=1.61; 0 nobs=1 exact=-1.00 UCB=-0.39; heur=0.00; act if 1=1.00; act if 0=-1.00; Leaf: |||Q.P |||QPP |||Q.. |||game ended in: |||Q.P |||QPP |||Q.. |||/ |||Values: heur=0.00; Tip: |||Q.P |||QPP |||.Q. |||/ |||Values: heur=0.00; Tip: |||Q.P |||QPP |||..Q |||/ ||Values: 1 nobs=6 exact=-1.00 UCB=-0.48; 0 nobs=6 exact=1.00 UCB=1.52; heur=0.00; act if 1=-1.00; act if 0=1.00; Terminal: ||Q.P ||QP. ||P.. ||/ ||Values: heur=0.00; Tip: ||Q.P ||QP. ||.P. ||/ ||Values: 1 nobs=2 exact=0.00 UCB=0.60; 0 nobs=2 exact=0.00 UCB=0.60; heur=0.00; act if 1=0.00; act if 0=0.00; Node: ||Q.P ||QP. ||..P ||game ended in: ||Q.P ||QP. ||Q.P ||+ |||Values: heur=0.00; Tip: |||QQP |||QP. |||..P |||/ |||Values: heur=0.00; Tip: |||Q.P |||QPQ |||..P |||/ |||Values: 1 nobs=1 exact=1.00 UCB=1.61; 0 nobs=1 exact=-1.00 UCB=-0.39; heur=0.00; act if 1=1.00; act if 0=-1.00; Leaf: |||Q.P |||QP. |||Q.P |||game ended in: |||Q.P |||QP. |||Q.P |||/ |||Values: heur=0.00; Tip: |||Q.P |||QP. |||.QP |||/ 

Showing results of 122

1 2 3 .. 5 > >> (Page 1 of 5)