I am going to talk about Markov systems (models, chains, decision processes, etc.), so you should know what they are before reading this.
The transition_matrix class is a matrix class derived with double as its data type. You can use it to hold the statistics that feed a reward function. A neural net, or other software, can help build and work with these matrices. They are fast and can propagate many states at once.
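As a rough sketch of what that could look like in C++ (the matrix base class and its interface here are my own assumptions for illustration, not the actual code):

    #include <cstddef>
    #include <vector>

    // Assumed generic matrix base class (hypothetical, for illustration only).
    template <typename T>
    class matrix {
    public:
        matrix(std::size_t rows, std::size_t cols)
            : rows_(rows), cols_(cols), data_(rows * cols, T{}) {}
        T& operator()(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }
        const T& operator()(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }
        std::size_t rows() const { return rows_; }
        std::size_t cols() const { return cols_; }
    private:
        std::size_t rows_, cols_;
        std::vector<T> data_;
    };

    // A transition matrix is a square matrix of doubles whose rows are
    // probability distributions over the next state (each row sums to 1).
    class transition_matrix : public matrix<double> {
    public:
        explicit transition_matrix(std::size_t n) : matrix<double>(n, n) {}

        // Propagate a distribution over states one step forward:
        // next[j] = sum over i of current[i] * P(i, j).
        std::vector<double> propagate(const std::vector<double>& current) const {
            std::vector<double> next(cols(), 0.0);
            for (std::size_t i = 0; i < rows(); ++i)
                for (std::size_t j = 0; j < cols(); ++j)
                    next[j] += current[i] * (*this)(i, j);
            return next;
        }
    };

The propagate step is what makes the matrix form fast: one pass over the matrix advances the probability of every state at once.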
You can make an MDP (Markov decision process) with operators on an environment (see the sketch after this list):
S : set of states (states are not provided in the code yet, but they can be kept simple)
A : set of actions (our operators, such as the transition_matrix class)
O : set of observations (some data structure, such as strings or matrices)
T : set of conditional transition probabilities (numbers)
Omega : conditional observation probabilities (numbers)
R : A x S -> R, the reward function (something that operates on the above data structures)
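A sketch of how the tuple above could be grouped into one structure; every type choice here is an assumption for illustration:

    #include <cstddef>
    #include <functional>
    #include <string>
    #include <vector>

    // Hypothetical grouping of the S/A/O/T/Omega/R tuple above.
    struct pomdp {
        std::vector<int>         states;        // S: states kept simple (plain ints)
        std::vector<std::string> actions;       // A: labels for our operators
        std::vector<std::string> observations;  // O: some data structure, here strings
        // T: conditional transition probabilities P(s' | s, a)
        std::function<double(int /*s*/, std::size_t /*a*/, int /*s_next*/)> T;
        // Omega: conditional observation probabilities P(o | s', a)
        std::function<double(std::size_t /*o*/, int /*s_next*/, std::size_t /*a*/)> Omega;
        // R: A x S -> R, the reward function
        std::function<double(std::size_t /*a*/, int /*s*/)> R;
    };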
The key point is to not use observations unless you have a partially observable Markov model (a POMDP).
The states and actions are the inputs to the reward function. A mapping from states to actions is called a policy.
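A minimal sketch of a policy and how it feeds the reward function, with hypothetical type names:

    #include <cstddef>
    #include <functional>
    #include <map>
    #include <vector>

    using state  = int;
    using action = std::size_t;
    using policy = std::map<state, action>;  // which action to take in each state

    // Total reward collected by following a fixed policy over a trace of
    // visited states, given a reward function R(a, s) as in the tuple above.
    double evaluate(const policy& pi,
                    const std::vector<state>& trace,
                    const std::function<double(action, state)>& R) {
        double total = 0.0;
        for (state s : trace)
            total += R(pi.at(s), s);
        return total;
    }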
You can plug your own reward functions into the software, built on matrices or plain numbers.
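For instance, a user-supplied reward function could be a functor backed by a table of numbers or by a single constant; this is a sketch under those assumptions, not the software's actual interface:

    #include <cstddef>
    #include <vector>

    // Sketch: a reward function backed by a table of numbers, where
    // rewards[a][s] is the reward for taking action a in state s.
    struct table_reward {
        std::vector<std::vector<double>> rewards;
        double operator()(std::size_t a, std::size_t s) const {
            return rewards[a][s];
        }
    };

    // A constant reward is just a number wrapped the same way.
    struct constant_reward {
        double value;
        double operator()(std::size_t, std::size_t) const { return value; }
    };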
You can use functors for actions acting on the world. For states there are numbers, vectors of numbers, matrices, and so on.
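A small sketch of such a functor acting on a world state held as a vector of numbers (the action and its name are hypothetical):

    #include <vector>

    // Sketch: an action as a functor that mutates the world state in place.
    struct scale_action {
        double factor;
        void operator()(std::vector<double>& state) const {
            for (double& x : state) x *= factor;
        }
    };

    // Usage:
    //   std::vector<double> s{1.0, 2.0, 3.0};
    //   scale_action half{0.5};
    //   half(s);  // s is now {0.5, 1.0, 1.5}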
Reward functions are the key to decision making when optimizing your system. There is plenty more information about them out there on the net.