Re: [Rails-devel] Implementation details for revenue calculation

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

This (again) a specific mail on revenue calculation, that addresses Alex 
directly, but I prefer to keep the discussion on the develop list to allow 
archiving. I used the Rails archive throughout the work on my implementation 
and dug out several valuable thoughts (a special thanks goes to John 
Tamplin's contributions, his pseudo code form one of the mails is pretty 
similar to a core part).

I hope that not too many complain, but in fact nobody is forced to read it to 
the end ;-)

Alex,
I have a first phase two prototype working. In fact it was not a too difficult 
task, as I only create a multigraph with new edges, that define the new 
routes.

Then I define "EdgeTravelSets", which define edges that are mutually 
exclusive. This is tracked by the calculator using reference counters, as each 
edge might be excluded by several others.

The first results are promising so far and I believe I found the cause for the 
difference in number of evaluations compared to your implementation. 
The main issue is that on a two phase graph the selection of neighbors is 
different to the original one (by definition). 

The effect was that the search for the 4-train set (8, 5+5, 4D, 6E) took 
between 30 - 140 seconds (compared to 70 seconds without).
At first glance it was amazing that the run time did strongly vary and the 
fact that the best  route is found remarkebly late in the process sheds more 
light, why you believe that my revenue predicitor is of the magic type.

I still sort the neighbors by value, but the ordering of the (potentially 
multiple) routes to them is random, as they are retrieved from a HashSet.
The ordering of routes is important for the revenue predictor to work well. 
Finding a good solution quick is important.

Without phase two optimization, the choice to visit the most valuable 
(neighboring) vertex is clearly the optimal approach, as each neigbor is one 
track piece away. Thus the "costs" of each neighbor is identical.

If you have done a phase two optimization, the most valuable vertex can be 
quite far away, thus blocking several track pieces at the same time and 
ruling out a lot of other connections. Thus for phase two the cost of each 
neighboring vertex vary substantially. Accordingly the cost has to be 
considered: The number of edges excluded must be an additional important 
criteria to select the routes. 

Take an example:
Is it better to run for a nearby 20 vertex (which excludes one other route) or 
the far away 50 vertex (which excludes three other routes and potentially 
bypasses the 10 vertex). There is no clear answer, but our preliminary 
results and my experiences with 18xx routes, implies that it is usually 
better to go for nearby stations first and save track first.

Thus two approaches are possible:
* Use Nb of edges excluded as a first criteria, value as a second only. (Thus 
minimize costs first, then maximize revenue).
In that case you would run to the 20 first.

* Or combine the two and use something like cost-modified value approach. For 
example choose the ratio between value and number of routes excluded 
(including the route selected).
In that case you would run to the 50 first. (20/2 = 10, 50/4 = 12.5)

I have implemented the ordering based on the first idea (cost-based) and the 
time is now stable an 24 seconds run for the 4 train set, thus running time 
decreased by factor 3.

Number of evaluations one pass (70 seconds)
480961 evaluations, 2935133 predictions and 8828871 edges travelled.

Number of evalutaions two pass with cost ordering (24 seconds)
475533 evaluations, 2910152 predictions and 2841394 edges travelled

Number of evaluations two pass with value ordering (81 seconds)
1421981 evaluations, 7302480 predictions and 8297888 edges travelled

It seems that edge travelling is the time limiting factor in my approach, but 
that is not surprising as evaluation and prediction is pretty cheap given the 
continuous update of the run values.

Stefan

To compare the validity of my approach, this are my figures about vertices and 
edges:

In scenario A the phase two graph consists of 23 vertices and 57 edges/routes.
There are several routes that overlap with 9 other routes (J3.1 -> J9.1,  
K4.1 -> J3.1,J9.1 -> M2.1,K4.1 -> M2.1).

In scenario B the numbers are 20 vertices and 151 edges/routes.
The route that overlaps with most others is E8->D5 (over 19 hexes) that shares 
track with 111 other routes.