I need to write down the problems I faced while working on this project. These problems might be common, so logging them, and of course their solutions, could help others and certainly future me. I have to say that most of the problems are not about Haskell itself but about how things should be programmed.
Space of moves
Working on the very first release I was already conscious that the chosen way of creating the space of moves was not perfect. It was too general, so some moves were represented by more than one entry in the space of moves. The reason was that the structure representing the space of moves was built in the most straightforward way. To explain this I need to introduce some definitions.
Each move consists of passes. Each pass is represented by an edge in the graph, and the graph represents the space of the game.
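These definitions can be written down as types. A minimal sketch -- the names are illustrative, not the actual definitions from the source code:

```haskell
-- Illustrative types only; the real definitions live in FooField.hs / FooMove.
type Vertex = (Int, Int)          -- a point on the pitch grid
type Pass   = (Vertex, Vertex)    -- one pass = one edge of the graph
type Move   = [Pass]              -- a move = a sequence of passes in one turn
```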
So the space of available moves was created as follows:
- Select all possible passes.
- For each pass, repeat step 1.
This works, of course, but it effectively makes the graph directed, although the rules of the game say the opposite.
I was not sure how many moves were duplicated. In any case the game was slow, so I had to speed it up, and my first thought was to improve this part of the solution.
So I decided to change the way the space of moves is created. I needed to give a unique value to each edge. These values should be easy to create, and comparing them should be as fast as comparing integers. The easiest solution was to define an edge as a pair of vertices, but then I would always need to keep the pair in a canonical order. Each vertex already consists of a pair of integers, so comparing 2 edges would take up to 4 integer comparisons plus sorting 2 pairs. Instead, I decided to mark each edge with a unique integer -- see the source code FooField.hs if you want to see more.
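The actual numbering scheme lives in FooField.hs; the sketch below only illustrates the idea, assuming a pitch of known width and at most 8 pass directions per vertex (both are assumptions of mine, not the real code):

```haskell
-- Sketch of marking each edge with a unique Int.  The board width and the
-- direction encoding are assumptions; the real scheme is in FooField.hs.

type Vertex = (Int, Int)

boardWidth :: Int
boardWidth = 9   -- assumed pitch width

-- Index a vertex in row-major order.
vertexId :: Vertex -> Int
vertexId (x, y) = y * boardWidth + x

-- Each vertex has at most 8 neighbours; encode the direction as 0..7.
directionId :: Vertex -> Vertex -> Int
directionId (x1, y1) (x2, y2) =
  case (x2 - x1, y2 - y1) of
    ( 1,  0) -> 0
    ( 1,  1) -> 1
    ( 0,  1) -> 2
    (-1,  1) -> 3
    (-1,  0) -> 4
    (-1, -1) -> 5
    ( 0, -1) -> 6
    ( 1, -1) -> 7
    _        -> error "vertices are not adjacent"

-- Unique edge number, independent of the order of the endpoints:
-- always number from the endpoint with the smaller vertex id.
markEdge :: Vertex -> Vertex -> Int
markEdge v w
  | vertexId v <= vertexId w = vertexId v * 8 + directionId v w
  | otherwise                = markEdge w v
```

Comparing two such marks is a single Int comparison, instead of up to 4 integer comparisons and 2 pair sorts.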
The algorithm for creating the space of moves was then clear:
- Select all possible next passes.
- For every move not finished so far:
  - check its edges against the edges of already checked moves;
  - if they already exist, ignore the move;
  - otherwise add its edges to the checked moves and keep the move.
- Repeat from step 1.
The point was to select a proper structure to keep the edges of already processed moves. For this I've used a Map, as it gives quite good O(log n) average look-up and insert times. The edges of each move are also kept in such a structure, as they have to be sorted (otherwise it would not be possible to compare them).
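The pruning step can be sketched as follows. The real implementation uses a Map; this sketch uses a Set of edge sets, which gives the same O(log n) look-up and insert, and all names here are illustrative:

```haskell
import qualified Data.Set as Set

-- An edge is identified by its unique Int mark; a move is represented by
-- the set of edges it uses (a Set keeps them sorted, so two moves can be
-- compared quickly).
type EdgeSet = Set.Set Int

-- Keep every move whose edge set has not been seen yet.
-- 'seen' accumulates the edge sets of already accepted moves.
pruneMoves :: [EdgeSet] -> [EdgeSet]
pruneMoves = go Set.empty
  where
    go _    []     = []
    go seen (m:ms)
      | m `Set.member` seen = go seen ms                 -- duplicate: ignore
      | otherwise           = m : go (Set.insert m seen) ms
```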
The solution looks less functional than the first one. Maybe, but the results were interesting. I've copied some data from the profiling output of GHC.
These runs were done under Windows XP, Intel Core Duo 1.8GHz, 1GB RAM.
Straightforward run, computer vs. computer
total time  = 24.28 secs (1214 ticks @ 20 ms)
total alloc = 7,656,287,464 bytes (excludes profiling overheads)

COST CENTRE    MODULE     %time  %alloc
rmEdge         FooField    72.3    67.9
nextPass       FooMove      7.3    19.2
mapMove        FooMove      6.3    10.8
vertices       FooField     3.4     0.0
court          Court        3.0     0.0
display        Court        2.6     0.0
worstMove      Foo          2.1     0.9
active         FooField     1.6     0.0
Improved run, computer vs. computer
total time  = 5.86 secs (293 ticks @ 20 ms)
total alloc = 1,587,837,028 bytes (excludes profiling overheads)

COST CENTRE    MODULE     %time  %alloc
pruneMoves     FooMove     26.3    48.9
rmEdge         FooField    20.1    24.9
court          Court       14.3     0.0
display        Court       12.3     0.0
nextPass       FooMove      6.5    12.4
markEdge       FooField     5.8     5.2
nNextMove      FooMove      3.1     1.9
worstMove      Foo          2.4     0.7
isNextPass     FooMove      2.0     0.0
nextAllPasses  FooMove      1.7     2.9
active         FooField     1.7     0.0
mapMove        FooMove      1.0     1.8
vertices       FooField     1.0     0.0
The improvement was clearly big: more than 4 times. I noticed, though, that in the improved run considerable costs were located in functions not involved in the playing algorithms. To make these functions relatively less expensive I redid the test using stronger algorithms -- stronger in terms of the number of calculations, of course. I think now is the time to say a word about the algorithms.
A few words about playing algorithms
Playing algorithms can be categorized by the depth to which they calculate future moves -- the so-called plies. The simplest ones check only their own possible moves -- that is one ply. I will call them A algorithms. Stronger ones also consider the opponent's possible answers to their own moves -- so they check two plies. I'll call them AB algorithms. And so on.
The algorithms already available in version 0.1 are the following:
- GoAhead -- an A algorithm that selects the move finishing closest to the opponent's goal.
- GoAheadWatchOpponent -- an AHalfB algorithm. The only improvement over the previous one is that it omits moves after which the opponent can score a goal. It has to check the opponent's moves for that, but does so in a very limited way.
- GoBackWatchOpponent -- an AHalfB algorithm that differs from the previous one in move selection. It selects the move that finishes closest to its own goal while the opponent still cannot score a goal in the next move.
- PreventingOpponent -- a real AB algorithm that selects the move for which the opponent's best answer is the worst.
- Watch2Ahead -- an ABA algorithm using the same move selection rule as PreventingOpponent.
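This A/AB/ABA family is fixed-depth minimax: depth 1 gives an A algorithm, depth 2 an AB one, and so on. A generic sketch, with toy stand-ins for the real move generator and evaluation function:

```haskell
-- Generic fixed-depth minimax over plies, sketching the A/AB/ABA family:
-- depth 1 is an A algorithm, depth 2 an AB one, and so on.
-- 'moves' and 'value' are toy stand-ins, not the real game functions.

type Position = Int               -- stand-in for the real game state

moves :: Position -> [Position]   -- assumed move generator
moves p = [p + 1, p + 2, p + 3]

value :: Position -> Int          -- assumed static evaluation
value = id

minimax :: Int -> Bool -> Position -> Int
minimax 0     _          p = value p
minimax depth maximizing p
  | null ms    = value p
  | maximizing = maximum [minimax (depth - 1) False q | q <- ms]
  | otherwise  = minimum [minimax (depth - 1) True  q | q <- ms]
  where ms = moves p
```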
Back to the profiling results
The previously described profiling results came from running AB vs. AB algorithms. As mentioned, this algorithm was so fast in the improved run that, as a side effect, big cost centres came from non-algorithmic functions. So I then ran AB vs. ABA. The results are as follows.
Straightforward run, AB vs. ABA
total time  = 1745.90 secs (87295 ticks @ 20 ms)
total alloc = 911,905,516,228 bytes (excludes profiling overheads)

COST CENTRE      MODULE     %time  %alloc
rmEdge           FooField    76.7    73.5
nextPass         FooMove      8.9    17.3
mapMove          FooMove      5.0     7.5
vertices         FooField     3.7     0.0
bestMoveWrapped  Foo          2.2     0.6
active           FooField     1.8     0.0
Improved run, AB vs. ABA
total time  = 73.22 secs (3661 ticks @ 20 ms)
total alloc = 26,586,680,132 bytes (excludes profiling overheads)

COST CENTRE      MODULE     %time  %alloc
pruneMoves       FooMove     36.2    49.8
rmEdge           FooField    31.0    25.6
nextPass         FooMove      9.9    11.8
markEdge         FooField     7.8     5.3
nNextMove        FooMove      2.9     1.8
nextAllPasses    FooMove      2.3     2.8
vertices         FooField     1.8     0.0
active           FooField     1.5     0.0
bestMoveWrapped  Foo          1.5     0.5
mapMove          FooMove      1.4     1.4
isNextPass       FooMove      1.0     0.0
The improvement is huge: 23 times. Of course it is not possible to compare the results directly, because the changed way of creating the space of moves affects which moves are selected. Usually there is more than one move that can be selected as the best one (no other move being better), and the order of moves in the data structure affects the final game. Nevertheless the figures reflect the observer's impressions quite well: the improved solution was much faster.
The reason for such a difference is clear: the number of examined moves. In the straightforward solution the maximum number of moves examined in one turn was about 9*10^6, whereas in the improved solution it was about 5*10^5 -- about 18 times less.
The number of examined moves increases exponentially. In the ABA algorithm the most expensive move looked something like this:
- About 50 available own moves.
- About 5000 available opponent's answers.
- About 500000 available own moves in the next turn.
Hence it is clear that even the improved solution doesn't allow running an (AB)^2 algorithm on the already mentioned hardware.
Have a look at the cost centres. I was curious why the rmEdge function generates 30% of the costs. I had a look at it and realized that it does 2 updates on the graph structure. The graph was defined as an Array of vertices, each holding the list of its successors. Updates on an Array cost O(n). I then checked how it works with a Map, where an update costs only O(log n). The results were as follows.
Improved Move, MapGraph, AB vs. ABA
total time  = 67.70 secs (3385 ticks @ 20 ms)
total alloc = 22,673,220,940 bytes (excludes profiling overheads)

COST CENTRE      MODULE     %time  %alloc
pruneMoves       FooMove     37.1    58.4
rmEdge           FooField    16.5    11.2
nextPass         FooMove     10.0    13.8
markEdge         FooField     9.8     6.2
active           FooField     7.1     0.9
vertices         FooField     5.7     0.7
nNextMove        FooMove      2.5     2.1
nextAllPasses    FooMove      2.3     3.2
display          Court        2.1     0.0
bestMoveWrapped  Foo          2.0     0.6
isNextPass       FooMove      1.5     0.0
mapMove          FooMove      1.3     1.7
The improvement is about 8%. Not much, therefore I ran a second test to confirm it.
It is noticeable by now that the rmEdge cost decreased to 16.5%, but on the other hand the functions 'active' and 'vertices' became more expensive. This is the result of look-ups on the Map, which cost O(log n) against the Array's O(1).
Straightforward Move, MapGraph, AB vs. ABA
total time  = 1160.12 secs (58006 ticks @ 20 ms)
total alloc = 484,411,720,452 bytes (excludes profiling overheads)

COST CENTRE      MODULE     %time  %alloc
rmEdge           FooField    43.8    44.7
vertices         FooField    17.5     2.7
active           FooField    14.0     4.4
nextPass         FooMove     12.4    30.8
mapMove          FooMove      7.5    14.0
bestMoveWrapped  Foo          3.0     1.1
worstMove        Foo          1.0     1.3
This test's result shows an improvement of about 50% thanks to the Map data structure.
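The difference between the two representations can be sketched as below: rmEdge does the two updates mentioned earlier, removing the edge from the successor lists of both its endpoints. The function names and the graph layout are illustrative, not the actual FooField.hs code:

```haskell
import Data.Array (Array, listArray, (!), (//))
import qualified Data.Map as Map

type Vertex = Int

-- Array-backed graph: (//) copies the whole array, so each of the two
-- successor-list updates in rmEdge costs O(n).
type ArrGraph = Array Vertex [Vertex]

rmEdgeArr :: Vertex -> Vertex -> ArrGraph -> ArrGraph
rmEdgeArr v w g =
  g // [ (v, filter (/= w) (g ! v))
       , (w, filter (/= v) (g ! w)) ]

-- Map-backed graph: each adjust costs only O(log n), at the price of
-- O(log n) look-ups where the Array gives O(1).
type MapGraph = Map.Map Vertex [Vertex]

rmEdgeMap :: Vertex -> Vertex -> MapGraph -> MapGraph
rmEdgeMap v w =
  Map.adjust (filter (/= w)) v . Map.adjust (filter (/= v)) w
```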
Still open possibilities
There is always a possibility to do things better. I'll note down some ideas -- hopefully I'll find time to put them in place, or perhaps somebody else would like to join me and have fun working on this entertainment -- I'd appreciate that.
Duplicate move recognition
The way moves are recognized as duplicates can be improved. For now, each unfinished move is checked against the others after each pass, separately. Only some moves can be duplicates, though. After the n-th pass, only a move that contains a cycle can be a duplicate. This is because duplication happens when the same move is unfolded 'from the other end'. Moreover, the cycle has to be closed by the last pass; otherwise the duplicate would already have been recognized and pruned in a previous pass validation. Hence only moves whose n-th pass closes a cycle have to be validated against duplication in the n-th pass validation.
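The cycle test itself is cheap: the last pass closes a cycle exactly when its endpoint hits a vertex the move has already visited. A sketch, with illustrative types and names:

```haskell
type Vertex = (Int, Int)
type Pass   = (Vertex, Vertex)

-- Cheap precondition for the expensive duplicate check: a new pass can
-- only create a duplicate if it closes a cycle, i.e. its endpoint is a
-- vertex already touched by one of the move's earlier passes.
closesCycle :: [Pass] -> Pass -> Bool
closesCycle done (_, to) = any touches done
  where touches (a, b) = a == to || b == to
```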
Of course, the alpha-beta algorithm can be applied. This will make the solution even less 'functional', as the algorithm relies on previously checked moves and accumulates knowledge that has to be passed to the next validation step -- never mind.
Predictive sorting of moves
Alpha-beta pruning is more effective if better moves are evaluated first. A forecasting algorithm would be necessary here.
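A sketch of how the two ideas combine: alpha-beta pruning with moves pre-sorted by a cheap static evaluation, so the likely better moves are tried first. The game functions are toy stand-ins, not the real ones:

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Alpha-beta pruning with predicted move ordering.  Moves that look
-- better to the static evaluation are searched first, which is exactly
-- what makes the cut-offs effective.

type Position = Int               -- stand-in for the real game state

moves :: Position -> [Position]   -- assumed move generator
moves p = if p >= 6 then [] else [p + 1, p + 2, p + 3]

value :: Position -> Int          -- assumed static evaluation
value = id

alphaBeta :: Int -> Int -> Int -> Bool -> Position -> Int
alphaBeta 0     _ _ _          p = value p
alphaBeta depth a b maximizing p
  | null ms    = value p
  | maximizing = goMax a (sortOn (Down . value) ms)   -- best-looking first
  | otherwise  = goMin b (sortOn value ms)
  where
    ms = moves p
    goMax a' []       = a'
    goMax a' (q : qs)
      | a'' >= b      = a''                           -- beta cut-off
      | otherwise     = goMax a'' qs
      where a'' = max a' (alphaBeta (depth - 1) a' b False q)
    goMin b' []       = b'
    goMin b' (q : qs)
      | b'' <= a      = b''                           -- alpha cut-off
      | otherwise     = goMin b'' qs
      where b'' = min b' (alphaBeta (depth - 1) a b' True q)
```

With good ordering the cut-offs fire early and whole subtrees are never expanded, which is what would make deeper searches such as (AB)^2 affordable.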