From: David D. <da...@br...> - 2005-11-14 23:31:21
Hi, I am Dave DeCaprio from the Broad Institute at MIT and I am investigating using CRFs for finding genes in newly sequenced genomes. Gene finding has typically been done using HMMs, but I think CRFs may provide an advantage because of their ability to incorporate more diverse data sources.

I'm wondering how to implement some of the constraints I have using your framework. My FSM is not fully connected, and I know the structure. I want to learn some transition probabilities from the training data, but some transitions will be disallowed completely. For example, there is no way to go from the "INTRON" state to the "INTERGENIC" state. If I create a new subclass of Model that returns only the Edges that I want to include, does that accomplish what I want, or does that just make the transitions without edges independent?

Another thought I had was to create a special feature for the disallowed transitions and fix their weight to some negative INT_MAX. For example, somehow fix the edge weight of the yprev = "INTRON", y = "INTERGENIC" transition. This seems like it would work, but I think it would require mods to the toolkit.

Finally, certain state transitions require special patterns in the input data. For example, the transition from the "INTERGENIC" state to the "CODING" state requires that the input sequence have "ATG" as the first 3 bases of the coding sequence. Using the fixed weighting approach I described above, I could define a feature that fires "if yPrev = INTERGENIC and y = CODING and (x0 != 'A' or x1 != 'T' or x2 != 'G')".

Any help on how to model an arbitrary constrained FSM would be great.

Thanks,
Dave
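The fixed-weight idea in the message above can be illustrated with a small, self-contained sketch. This is plain Python and not the toolkit's API: the ALLOWED edge set, the transition_score and viterbi functions, and the emit callback are all hypothetical names chosen for illustration, and the three-state structure is a toy stand-in for a real gene model. Disallowed edges, and INTERGENIC -> CODING edges that do not start at "ATG", get a score of negative infinity, so no decoded path can use them.

```python
NEG_INF = float("-inf")

STATES = ["INTERGENIC", "CODING", "INTRON"]

# Hypothetical FSM structure (an assumption, not a real gene model).
# INTRON -> INTERGENIC is deliberately absent.
ALLOWED = {
    ("INTERGENIC", "INTERGENIC"),
    ("INTERGENIC", "CODING"),
    ("CODING", "CODING"),
    ("CODING", "INTRON"),
    ("CODING", "INTERGENIC"),
    ("INTRON", "INTRON"),
    ("INTRON", "CODING"),
}


def transition_score(prev, cur, x, i, learned):
    """Score of the edge prev -> cur, where position i is labelled cur."""
    if (prev, cur) not in ALLOWED:
        return NEG_INF  # structurally impossible edge
    if (prev, cur) == ("INTERGENIC", "CODING") and x[i:i + 3] != "ATG":
        return NEG_INF  # coding sequence must begin with a start codon
    return learned.get((prev, cur), 0.0)  # learned weight, 0.0 if unseen


def viterbi(x, learned, emit):
    """Best state path for a non-empty sequence x.

    emit(state, x, i) is a caller-supplied (hypothetical) per-position score.
    """
    best = [{s: emit(s, x, 0) for s in STATES}]
    back = []
    for i in range(1, len(x)):
        col, ptr = {}, {}
        for cur in STATES:
            score, arg = max(
                (best[-1][p] + transition_score(p, cur, x, i, learned), p)
                for p in STATES
            )
            col[cur] = score + emit(cur, x, i)
            ptr[cur] = arg
        best.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: best[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Because the constraint is applied inside the score function, the same masking would carry over to the forward-backward sums used in training: a term with score negative infinity contributes exp(-inf) = 0 to the partition function.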
From: Sunita S. <su...@it...> - 2005-11-16 20:13:06
David DeCaprio wrote:

> Hi, I am Dave DeCaprio from the Broad Institute at MIT and I am
> investigating using CRFs for finding genes in newly sequenced genomes.
> Gene finding has typically been done using HMMs, but I think CRFs
> may provide an advantage because of their ability to incorporate more
> diverse data sources.
>
> I'm wondering how to implement some of the constraints I have using
> your framework. My FSM is not fully connected, and I know the
> structure. I want to learn some transition probabilities from the
> training data, but some transitions will be disallowed completely.
> For example, there is no way to go from the "INTRON" state to the
> "INTERGENIC" state. If I create a new subclass of Model that returns
> only the Edges that I want to include, does that accomplish what I
> want, or does that just make the transitions without edges independent?

The model only controls which parameters are trained; it does not explicitly disallow transitions. For that you will need to use constraints, which are available only via SegmentCRF at the moment. I am planning to check in a version soon that provides the same functionality for the CRF class.

> Another thought I had was to create a special feature for the
> disallowed transitions and fix their weight to some negative INT_MAX.
> For example, somehow fix the edge weight of the yprev = "INTRON",
> y="INTERGENIC" state. This seems like it would work, but I think
> would require mods to the toolkit.

Yes, that should work, but it is not ideal.

> Finally, certain state transitions require special patterns in the
> input data. For example, the transition from the "Intergenic" state
> to the "Coding" state requires that the input sequence have "ATG" as
> the first 3 bases of the coding sequence. Using the fixed weighting
> approach I described above I could define: "if yPrev = INTERGENIC and
> y=CODING and (x0 != 'A' or x1 !='T' or x2 !='G')".
>
> Any help on how to model an arbitrary constrained FSM would be great.

Assuming you implement features of this type, our implementation of constraints (in SegmentCRF, and soon in CRF) has a mode that allows only those transitions and states for which at least one feature fires. All other transitions and states are explicitly set to -infinity. This should take care of all the constraints you want, but of course you will have to implement your special kind of transition features yourself.

I will drop you a note when I provide constraints in CRF.

> Thanks,
> Dave
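The "at least one feature fired" mode described in the reply can be sketched roughly as follows. This is an illustration in plain Python under stated assumptions, not the SegmentCRF code: the function names (make_edge_feature, start_codon_feature, constrained_score) and the feature signature f(prev, cur, x, i) are hypothetical. Note that the feature here fires when "ATG" is present, the positive form of the condition in the quoted question, so that a missing start codon leaves the transition with no fired feature.

```python
NEG_INF = float("-inf")


def make_edge_feature(a, b):
    """Indicator feature that fires on the transition a -> b (hypothetical helper)."""
    def feature(prev, cur, x, i):
        return 1.0 if (prev, cur) == (a, b) else 0.0
    return feature


def start_codon_feature(prev, cur, x, i):
    """Fires for INTERGENIC -> CODING only when x[i:i+3] is the start codon ATG."""
    if (prev, cur) == ("INTERGENIC", "CODING") and x[i:i + 3] == "ATG":
        return 1.0
    return 0.0


def constrained_score(features, weights, prev, cur, x, i):
    """Weighted sum of fired features, or -inf if no feature fired at all."""
    total = 0.0
    any_fired = False
    for feature, weight in zip(features, weights):
        value = feature(prev, cur, x, i)
        if value != 0.0:
            any_fired = True
            total += weight * value
    return total if any_fired else NEG_INF
```

One consequence of this design is that a fired feature with a learned weight of 0.0 still yields a finite score of 0.0, so the transition stays allowed; only transitions with no fired feature are ruled out.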