From: SourceForge.net <no...@so...> - 2005-06-28 09:05:41
Feature Requests item #1053692, was opened at 2004-10-25 14:41
Message generated for change (Comment added) made by mihmax
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=520350&aid=1053692&group_id=68187

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: 1.4.6
Status: Open
Resolution: Accepted
Priority: 6
Submitted By: Maxym Mykhalchuk (mihmax)
Assigned to: Maxym Mykhalchuk (mihmax)
Summary: [1.4.6] Option: Change Segmenting: Sentences vs. Paragraphs

Initial Comment:
--- Dierk Seeburg wrote:
> Hi,
> What amount of effort would it take to have OmegaT segment by
> sentence?
> And maybe create a preference setting for sentence-level or
> paragraph-level segmentation?
> Cheerio,
> Dierk
>

Well, I don't know, to tell the truth. Will take a look at it after
the 1.4.4 release.

ciao
Maxym

----------------------------------------------------------------------

>Comment By: Maxym Mykhalchuk (mihmax)
Date: 2005-06-28 11:05

Message:
Logged In: YES
user_id=488500

===
-- Marc Prior wrote
Ideally, the user should be able to define under what
circumstances segmentation should occur, i.e.:

* after what punctuation marks [. ? !], but perhaps also [:]
* whether a space should be allowed before the punctuation mark
* what non-letter characters should be allowed before the
  punctuation mark (e.g.: a sentence may well end in a number,
  but the period is used in German after numbers to indicate
  ordinals; no segmentation algorithm will be able to detect
  the difference reliably)
* ditto after the punctuation mark
* what abbreviations should be recognized
* etc.
===

----------------------------------------------------------------------

Comment By: Maxym Mykhalchuk (mihmax)
Date: 2005-06-27 19:37

Message:
Logged In: YES
user_id=488500

Partially implemented in 1.4.6 Beta 2

----------------------------------------------------------------------

Comment By: Maxym Mykhalchuk (mihmax)
Date: 2005-06-26 13:00

Message:
Logged In: YES
user_id=488500

I'll try to implement this RFE in 1.4.6...

----------------------------------------------------------------------

Comment By: Maxym Mykhalchuk (mihmax)
Date: 2005-05-08 15:49

Message:
Logged In: YES
user_id=488500

This issue (at least the simple YES/NO sentence segmenter)
will be implemented in 1.5 (the next release after 1.4.5).

----------------------------------------------------------------------

Comment By: Maxym Mykhalchuk (mihmax)
Date: 2005-05-08 15:46

Message:
Logged In: YES
user_id=488500

One of the possible solutions was proposed in
http://sourceforge.net/tracker/index.php?func=detail&aid=1056849&group_id=68187&atid=520350
by Jean-Christophe

===(extract from that issue, changed a bit)===
A customisable list of segment markers could be used during
parsing to create segments that the parsing filter would not
split otherwise. It could be modified from the OmegaT UI itself
(and taken into account by re-opening the project).
======

Also JC proposed
======
Another list could be used for strings (e.g. abbreviations)
that should never be considered as segment markers and would
work the same way.
======

----------------------------------------------------------------------

Comment By: Maxym Mykhalchuk (mihmax)
Date: 2004-10-25 14:43

Message:
Logged In: YES
user_id=488500

P.S. Note that macros to break text into sentences already
exist; this RFE is about integrating this functionality into
OmegaT itself.

----------------------------------------------------------------------
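
For illustration only, here is a minimal Python sketch of the kind of
user-configurable segmentation discussed in the comments above: a list
of break marks (as in Marc's first bullet and JC's list of segment
markers), a list of strings that must never end a segment (JC's
abbreviation list), and a switch for the German ordinal case. It is not
OmegaT code. The names SegmentationRules, segment, break_marks,
exceptions and no_break_after_number are hypothetical, and the default
values are assumptions. The other options Marc lists (for example,
whether a space is allowed before the mark) are not modelled.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SegmentationRules:
    # Hypothetical settings; names and defaults are assumptions, not OmegaT options.
    break_marks: str = ".?!"  # punctuation that may end a segment
    exceptions: set = field(
        default_factory=lambda: {"e.g.", "i.e.", "etc.", "Dr."}
    )  # strings that are never treated as segment markers
    no_break_after_number: bool = False  # e.g. German ordinals such as "3."


def segment(text: str, rules: SegmentationRules) -> list:
    """Split text into sentence-level segments according to the rules."""
    # A candidate break is one break mark followed by whitespace.
    pattern = re.compile("[" + re.escape(rules.break_marks) + r"]\s+")
    segments = []
    start = 0
    for match in pattern.finditer(text):
        end = match.start() + 1  # position just after the break mark
        tokens = text[start:end].split()
        last = tokens[-1] if tokens else ""
        if last in rules.exceptions:
            continue  # listed abbreviation: do not break here
        if rules.no_break_after_number and last[:-1].isdigit():
            continue  # number followed by a mark: assume an ordinal
        segments.append(text[start:end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)  # whatever follows the last break
    return segments


if __name__ == "__main__":
    print(segment("See Dr. Smith. He is in room 3. Really? Yes!", SegmentationRules()))
    german = SegmentationRules(no_break_after_number=True)
    print(segment("Am 3. Mai kam er. Gut.", german))
```

As Marc notes, no rule can reliably tell an ordinal from a sentence
that ends in a number, so the no_break_after_number switch simply moves
the error from one case to the other; leaving the choice to the user is
the point of the sketch.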