From: Nathan W. <sun...@gm...> - 2012-03-15 03:23:18
Actually, now that I've been looking more at the code, it looks like both of these tests are failing (line 40 in KhmerWordRepeatRuleTest.java):

    // incorrect sentences:
    //assertEquals(1, rule.match(langTool.getAnalyzedSentence("នេះហើយហើយនោះ។")).length);
    assertEquals(1, rule.match(langTool.getAnalyzedSentence("ខ្ញុំនិងនិងគាត់។")).length);

So that is a failure at a more basic level: the rule can't even see that two tokens are the same.

A friend of mine wrote this rule for me, so I am somewhat at a loss as to why it is broken now. Could anyone take a look?

Thanks,
Nathan

On Thu, Mar 15, 2012 at 9:33 AM, Nathan Wells <sun...@gm...> wrote:
> I'm not getting a failure on Khmer in Java 6:
>
> [junit] Running org.languagetool.rules.km.KhmerWordRepeatRuleTest
> [junit] Testsuite: org.languagetool.rules.km.KhmerWordRepeatRuleTest
> [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.141 sec
> [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.141 sec
>
> But I do get a failure on French:
> [junit] Running tests for French...
> [junit] ------------- ---------------- ---------------
> [junit] Testcase: testGrammarRulesFromXML(org.languagetool.rules.patterns.PatternRuleTest): FAILED
> [junit] French: Did not expect error in: L’Église catholique (Rule: EGLISE:[l, [’´'??‘], (?-i)église, adventiste|anglicane|copte|catholique|calviniste|épiscopalienne|jacobite|lutherienne|méthodiste|néo-?apostolique|orthodoxe|pentocôstique|presbytérienne|protestante|réformée]:L’Église)
> [junit] junit.framework.AssertionFailedError: French: Did not expect error in: L’Église catholique (Rule: EGLISE:[l, [’´'??‘], (?-i)église, adventiste|anglicane|copte|catholique|calviniste|épiscopalienne|jacobite|lutherienne|méthodiste|néo-?apostolique|orthodoxe|pentocôstique|presbytérienne|protestante|réformée]:L’Église)
>
> But in Java 7 I get an error with Khmer, just like you did, relating to this test:
>
>     //correct
>     assertEquals(0, rule.match(langTool.getAnalyzedSentence("គាត់ហើយ ហើយខ្ញុំ។")).length);
>
> This test fails even though the sentence is correct.
>
> I believe it has to do with how we specify or detect a "real" space. Previously we used \u0020 (the Unicode escape for an ordinary space) in KhmerWordRepeatRule.java, line 44:
>
>   public boolean ignore(final AnalyzedSentence text,
>       final AnalyzedTokenReadings[] tokensWithWhiteSpace, final int position) {
>     // Don't mark an error for cases like:
>     // LEN Rewrite for Khmer: ignore real space separating 2 repeated words
>     final int origPos = text.getOriginalPosition(position); // LEN get orig pos of current token
>     if (position >= 1 && "\u0020".equals(tokensWithWhiteSpace[origPos - 1].getToken())) {
>       return true;
>     }
>     return false;
>   }
>
> But it seems this no longer works with Java 7. I am not sure why, though.
>
> Can anyone help?
>
> The rule should allow a repeated word if there is a real space between the two words (Khmer does not put spaces between words, so if there is a real space, it is OK to repeat the word).
> Example in English: "Hello hello" (would be correct).
>
> Thanks,
> Nathan
>
>
> On Thu, Mar 15, 2012 at 6:12 AM, Daniel Naber <lis...@da...> wrote:
>
>> Hi,
>>
>> I just noticed that KhmerWordRepeatRuleTest fails when using Java 7. Can
>> someone reproduce that? Maybe there are changes in Unicode handling that
>> cause the failure? If anybody could check this out it would be nice.
>>
>> Regards
>> Daniel
>>
>> --
>> http://www.danielnaber.de
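A minimal, standalone Java sketch of the behaviour the thread describes, for anyone who wants to reproduce it outside LanguageTool. It is not LanguageTool's API: a plain List<String> stands in for tokensWithWhiteSpace, and the names KhmerRepeatSketch, findRepeats, isSeparator, ignore and codePoints are hypothetical. It assumes the Khmer text has already been split into words with whitespace kept as separate tokens, and it treats ZERO WIDTH SPACE (U+200B) as a separator that does not excuse a repeat; both are assumptions, not facts about the real tokenizer. One deliberate difference from the quoted ignore(): that method guards on position but reads index origPos - 1, while the sketch guards the index it actually reads. The codePoints() helper prints what a "space" token really contains, which is one way to compare Java 6 and Java 7 tokenization. The sketch does not explain the Java 7 failure itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the word-repeat check; not LanguageTool code.
final class KhmerRepeatSketch {

    // The "real" space the rule looks for: U+0020 SPACE.
    static final String REAL_SPACE = "\u0020";

    // Assumption: a separator token is non-empty and consists only of Java
    // whitespace or ZERO WIDTH SPACE (U+200B).
    static boolean isSeparator(String tok) {
        if (tok.isEmpty()) return false;
        for (int i = 0; i < tok.length(); ) {
            int cp = tok.codePointAt(i);
            if (!Character.isWhitespace(cp) && cp != 0x200B) return false;
            i += Character.charCount(cp);
        }
        return true;
    }

    // Same idea as the quoted ignore(): accept the repeat when the token directly
    // before the second word is exactly one U+0020 space. The bounds check is on
    // the index that is dereferenced (pos - 1).
    static boolean ignore(List<String> tokens, int pos) {
        return pos >= 1 && REAL_SPACE.equals(tokens.get(pos - 1));
    }

    // Returns the indices of word tokens that repeat the previous word token.
    static List<Integer> findRepeats(List<String> tokens) {
        List<Integer> hits = new ArrayList<Integer>();
        String prev = null;
        for (int i = 0; i < tokens.size(); i++) {
            String tok = tokens.get(i);
            if (isSeparator(tok)) continue; // separators are not words
            if (tok.equals(prev) && !ignore(tokens, i)) hits.add(i);
            prev = tok;
        }
        return hits;
    }

    // Debug helper: lists the code points of a token, e.g. "U+0020".
    static String codePoints(String tok) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tok.length(); ) {
            int cp = tok.codePointAt(i);
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Pre-split tokens for the two test sentences from the thread.
        List<String> wrong = Arrays.asList("ខ្ញុំ", "និង", "និង", "គាត់", "។");
        List<String> right = Arrays.asList("គាត់", "ហើយ", " ", "ហើយ", "ខ្ញុំ", "។");
        System.out.println(findRepeats(wrong)); // [2]
        System.out.println(findRepeats(right)); // []
        System.out.println(codePoints(" "));    // U+0020
    }
}
```

Running main prints [2], [] and U+0020. In the real rule, logging the code points of tokensWithWhiteSpace[origPos - 1].getToken() inside ignore() under both Java versions would be one way to see whether the token before the second word is still a plain U+0020.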