From: Jan H. M. <jh...@in...> - 2012-08-02 06:27:23
Hello Ralph,

thanks for your interest in the MMLF. Your application sounds quite
interesting.

I would not consider this behaviour a bug; the semantics of self.reward
are as follows: the agent obtains reward via giveReward(reward) and
accumulates it in the self.reward attribute. self.reward is initially 0
and is reset to 0 whenever AgentBase.getAction() or
AgentBase.nextEpisodeStarted() is called (i.e. when the agent has to
choose an action or when an episode terminates). The reason for these
semantics is that in some settings the agent might receive several
rewards before it can act/learn the next time.

To get the behaviour you want, you can simply make the following changes:

* In td_agent.py, replace "self.reward += reward" in TDAgent.giveReward()
  by "self.reward = reward".
* In agent_base.py, replace the statements "self.reward = 0" in
  AgentBase.__init__, AgentBase.getAction, and AgentBase.nextEpisodeStarted
  by "self.reward = None".

This should do the job. You then have to make sure that the _train
method is never called while self.reward is None, since it expects a
numeric reward and would otherwise crash.

Best regards,
Jan
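To make these changes and the guard concrete, here is a rough,
self-contained sketch. Only the method names giveReward, getAction,
nextEpisodeStarted, and _train are taken from the discussion above; the
stub bodies and the helper _chooseAction are placeholders for
illustration, not the actual MMLF source.

# agent_base.py
class AgentBase(object):

    def __init__(self):
        # was: self.reward = 0
        self.reward = None

    def getAction(self):
        action = self._chooseAction()
        # was: self.reward = 0
        self.reward = None
        return action

    def nextEpisodeStarted(self):
        # was: self.reward = 0
        self.reward = None

    def _chooseAction(self):
        # placeholder only; the real MMLF code selects an action here
        return None

# td_agent.py
class TDAgent(AgentBase):

    def giveReward(self, reward):
        # was: self.reward += reward  (accumulated several rewards per step)
        self.reward = reward

    def getAction(self):
        # Guard: only call _train when a numeric reward has been received;
        # _train expects a numeric reward and would crash on None.
        if self.reward is not None:
            self._train(self.reward)
        return AgentBase.getAction(self)

    def _train(self, reward):
        # placeholder for the actual TD update
        pass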
On 02.08.2012 01:32, Yuan, Jiangchuan wrote:
>
> Dear MMLF-support team,
>
> We are trying to design some reinforcement learning models for our
> equity trading algorithms. We have been doing some tests with MMLF, and
> it seems pretty convenient so far.
>
> However, we noticed that a None reward set in the environment seems to
> be automatically transformed into reward = 0 in the agent. More
> specifically, in the evaluateAction(self, actionObject) function we
> sometimes return the resultDict with the reward set to None. However,
> if I set a breakpoint within getAction(self) in td_agent.py and check
> the value of self.reward, it turns out that self.reward has been
> transformed into 0 automatically.
>
> The reason we need to set the reward to None is that our action space
> depends on the state. Since evaluateAction may sometimes pass us an
> actionObject that is outside of the state space, we have to reject
> these steps and do not wish the (state, action) pairs to appear in the
> eligibility trace.
>
> If possible, can you help me check this?
>
> Thanks a lot,
>
> Ralph
>
> Jiangchuan Yuan | Linear Quantitative Research | Electronic Client
> Solutions | Global Equities | J.P. Morgan | 383 Madison Avenue, New
> York, NY, 10179 | T: +1 (212) 622-5624 | jia...@jp... | jpmorgan.com

--
Dipl. Inf. Jan Hendrik Metzen
Universität Bremen
FB 3 - Mathematik und Informatik
AG Robotik
Robert-Hooke-Straße 5
28359 Bremen, Germany

Visiting address in building Unicom 1:
Mary-Somerville-Str. 9
28359 Bremen, Germany

Phone: +49 (0)421 178 45-4123
Fax:   +49 (0)421 178 45-4150
E-Mail: jh...@in...
Homepage: http://www.informatik.uni-bremen.de/~jhm/
Further information: http://www.informatik.uni-bremen.de/robotik