From: Jan H. M. <jh...@in...> - 2012-08-02 06:27:23
Hello Ralph,

thanks for your interest in the MMLF. Your application sounds quite
interesting.

I would not consider this behaviour a bug; the semantics of self.reward
are as follows: the agent obtains reward via giveReward(reward) and
accumulates it in the self.reward attribute. self.reward is initially 0
and is reset to 0 whenever AgentBase.getAction() or
AgentBase.nextEpisodeStarted() is called (i.e. when the agent has to
choose an action or when an episode terminates). The reason for these
semantics is that in some settings the agent might receive several
rewards before it can act/learn the next time.

To get the behaviour you want, you can simply make the following changes:

* In td_agent.py, replace "self.reward += reward" in TDAgent.giveReward()
  by "self.reward = reward".
* In agent_base.py, replace the statements "self.reward = 0" in
  AgentBase.__init__, AgentBase.getAction, and AgentBase.nextEpisodeStarted
  by "self.reward = None".

This should do the job. You then have to make sure that the _train
method is never called while self.reward is None, since it expects a
numeric reward and would otherwise crash.

Best regards,
Jan
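To make these changes and the guard concrete, here is a rough,
self-contained sketch. Only the method names giveReward, getAction,
nextEpisodeStarted, and _train are taken from the discussion above; the
stub bodies and the helper _chooseAction are placeholders for
illustration, not the actual MMLF source.

# agent_base.py
class AgentBase(object):

    def __init__(self):
        # was: self.reward = 0
        self.reward = None

    def getAction(self):
        action = self._chooseAction()
        # was: self.reward = 0
        self.reward = None
        return action

    def nextEpisodeStarted(self):
        # was: self.reward = 0
        self.reward = None

    def _chooseAction(self):
        # placeholder only; the real MMLF code selects an action here
        return None

# td_agent.py
class TDAgent(AgentBase):

    def giveReward(self, reward):
        # was: self.reward += reward  (accumulated several rewards per step)
        self.reward = reward

    def getAction(self):
        # Guard: only call _train when a numeric reward has been received;
        # _train expects a numeric reward and would crash on None.
        if self.reward is not None:
            self._train(self.reward)
        return AgentBase.getAction(self)

    def _train(self, reward):
        # placeholder for the actual TD update
        pass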
On 02.08.2012 01:32, Yuan, Jiangchuan wrote:
>
> Dear MMLF-support team,
>
> We are trying to design some reinforcement learning models for our
> equity trading algorithms. We have been doing some tests with MMLF, and
> it seems pretty convenient so far.
>
> However, we noticed that a None reward set in the environment seems to
> be automatically transformed into reward = 0 in the agent. More
> specifically, in the evaluateAction(self, actionObject) function we
> sometimes return the resultDict with the reward set to None. However,
> if I set a breakpoint within getAction(self) in td_agent.py and check
> the value of self.reward, it turns out that self.reward has been
> transformed into 0 automatically.
>
> The reason we need to set the reward to None is that our action space
> depends on the state. Since evaluateAction may sometimes pass us an
> actionObject that is outside of the state space, we have to reject
> these steps and do not wish the (state, action) pairs to appear in the
> eligibility trace.
>
> If possible, can you help me check this?
>
> Thanks a lot,
>
> Ralph
>
> Jiangchuan Yuan | Linear Quantitative Research | Electronic Client
> Solutions | Global Equities | J.P. Morgan | 383 Madison Avenue, New
> York, NY, 10179 | T: +1 (212) 622-5624 | jia...@jp... | jpmorgan.com

--
Dipl. Inf. Jan Hendrik Metzen
Universität Bremen
FB 3 - Mathematik und Informatik
AG Robotik
Robert-Hooke-Straße 5
28359 Bremen, Germany

Visiting address in building Unicom 1:
Mary-Somerville-Str. 9
28359 Bremen, Germany

Phone: +49 (0)421 178 45-4123
Fax:   +49 (0)421 178 45-4150
E-Mail: jh...@in...
Homepage: http://www.informatik.uni-bremen.de/~jhm/
Further information: http://www.informatik.uni-bremen.de/robotik