Thread: [cunei-commits] SF.net SVN: cunei:[2] src/cunei
Status: Beta
Brought to you by: aaronbphillips
From: <aar...@us...> - 2009-06-05 16:24:33
Revision: 2
          http://cunei.svn.sourceforge.net/cunei/?rev=2&view=rev
Author:   aaronbphillips
Date:     2009-06-05 16:24:26 +0000 (Fri, 05 Jun 2009)

Log Message:
-----------
Added InputPhrase class which extends the Phrase class to provide coverage information. The InputPhrase will be used by the ConfusionNode, Similarity, and Translation classes as a unified mechanism for tracking the input.

Modified Paths:
--------------
    src/cunei/alignment/PhraseAlignment.java
    src/cunei/cli/ScoreLanguageModel.java
    src/cunei/confusion/DocumentConfusionReader.java
    src/cunei/corpus/Context.java
    src/cunei/corpus/Corpus.java
    src/cunei/corpus/CorpusIndexBuilder.java
    src/cunei/corpus/MultiFileCorpusReader.java
    src/cunei/decode/HypothesisBuilder.java
    src/cunei/document/Phrase.java
    src/cunei/lexicon/Lexicons.java
    src/cunei/lm/BackoffLanguageModel.java
    src/cunei/processors/SentenceEliminator.java
    src/cunei/translate/Match.java
    src/cunei/translate/PhraseModel.java
    src/cunei/translate/Similarity.java
    src/cunei/translate/Translator.java
    src/cunei/type/TypeSequence.java

Added Paths:
-----------
    src/cunei/confusion/InputPhrase.java

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
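A minimal sketch of the kind of coverage tracking described in revision 2, assuming coverage is simply the set of input positions a phrase spans, held in a `java.util.BitSet`. `CoveredSpan`, `overlaps`, and `union` are hypothetical names for illustration, not the InputPhrase API (which revision 3 removed).

```java
import java.util.BitSet;

// Hypothetical sketch: an input span that records which input positions it covers.
final class CoveredSpan {
    final int start; // first covered input position (inclusive)
    final int end;   // one past the last covered position (exclusive)
    final BitSet coverage = new BitSet();

    CoveredSpan(int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("bad span: " + start + ".." + end);
        }
        this.start = start;
        this.end = end;
        coverage.set(start, end); // marks positions start..end-1
    }

    // Spans that share any input position cannot both be used in one translation.
    boolean overlaps(CoveredSpan other) {
        return coverage.intersects(other.coverage);
    }

    // Combined coverage of several spans, e.g. to see which input is still uncovered.
    static BitSet union(CoveredSpan... spans) {
        BitSet all = new BitSet();
        for (CoveredSpan s : spans) {
            all.or(s.coverage);
        }
        return all;
    }
}
```

Using a `BitSet` keeps the overlap test to a word-wise AND, which matters when many candidate spans are compared.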
From: <aar...@us...> - 2009-06-05 22:36:42
Revision: 3
          http://cunei.svn.sourceforge.net/cunei/?rev=3&view=rev
Author:   aaronbphillips
Date:     2009-06-05 22:36:40 +0000 (Fri, 05 Jun 2009)

Log Message:
-----------
Removed InputPhrase and instead passed around input with a ConfusionPath. It may be worthwhile to revisit this decision later. Gaps are now being created throughout the Translator, but they are always unconstrained (target phrase is null). Still need to verify this works with the Decoder and apply constraints to the gaps.

Modified Paths:
--------------
    launch/Decode.launch
    src/cunei/confusion/ConfusionPath.java
    src/cunei/decode/ChartHypothesisBuilder.java
    src/cunei/document/Phrase.java
    src/cunei/translate/Match.java
    src/cunei/translate/Similarity.java
    src/cunei/translate/Translation.java
    src/cunei/type/TypeSequence.java

Removed Paths:
-------------
    src/cunei/confusion/InputPhrase.java

Property Changed:
----------------
    /
    data/
From: <aar...@us...> - 2009-06-06 03:50:08
Revision: 4
          http://cunei.svn.sourceforge.net/cunei/?rev=4&view=rev
Author:   aaronbphillips
Date:     2009-06-06 03:50:07 +0000 (Sat, 06 Jun 2009)

Log Message:
-----------
Fixed several bugs in gapped matching. Sentences seem to decode without errors now, but without any target constraints (and likely poor feature weights) I am seeing a *lot* of gaps being filled with an epsilon translation.

Modified Paths:
--------------
    src/cunei/alignment/PhraseAlignment.java
    src/cunei/decode/ChartHypothesisBuilder.java
    src/cunei/decode/Decoder.java
    src/cunei/document/Phrase.java
    src/cunei/lattice/ScoredSet.java
    src/cunei/lm/Ngram.java
    src/cunei/processors/ReverseSourceNumbers.java
    src/cunei/translate/Hypothesis.java
    src/cunei/translate/Match.java
    src/cunei/translate/Similarity.java
    src/cunei/translate/Translation.java
    src/cunei/translate/Translator.java
    src/cunei/type/TypeSequence.java
From: <aar...@us...> - 2009-06-10 02:41:02
Revision: 5
          http://cunei.svn.sourceforge.net/cunei/?rev=5&view=rev
Author:   aaronbphillips
Date:     2009-06-10 00:33:13 +0000 (Wed, 10 Jun 2009)

Log Message:
-----------
Changed TypesOfTypes so it throws an IllegalArgumentException instead of a generic RuntimeException when the type name is unknown. Modified XMLConfusionReader so that it does not die if extra type information is specified.

Modified Paths:
--------------
    src/cunei/confusion/XMLConfusionReader.java
    src/cunei/type/TypesOfTypes.java
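A sketch of why the more specific exception helps: a reader can catch `IllegalArgumentException` for an unknown type name and skip that entry without also swallowing unrelated `RuntimeException`s. `TypeRegistry`, `lookup`, and `lookupOrNull` are hypothetical stand-ins, not the actual TypesOfTypes or XMLConfusionReader interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a type-name registry: maps type names to integer ids.
final class TypeRegistry {
    private final Map<String, Integer> ids = new HashMap<>();

    void register(String name) {
        ids.putIfAbsent(name, ids.size());
    }

    int lookup(String name) {
        Integer id = ids.get(name);
        if (id == null) {
            // A specific unchecked exception lets callers handle exactly this case.
            throw new IllegalArgumentException("Unknown type name: " + name);
        }
        return id;
    }

    // Reader-side pattern: tolerate extra, unknown type information instead of dying.
    Integer lookupOrNull(String name) {
        try {
            return lookup(name);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```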
From: <aar...@us...> - 2009-06-17 20:47:54
Revision: 24
          http://cunei.svn.sourceforge.net/cunei/?rev=24&view=rev
Author:   aaronbphillips
Date:     2009-06-17 20:47:52 +0000 (Wed, 17 Jun 2009)

Log Message:
-----------
BROKEN: Replaced Corpus with MultilingualCorpus. Re-indexing will be required. Code currently compiles but there are still some issues with indexing.

Modified Paths:
--------------
    src/cunei/cli/Decode.java
    src/cunei/cli/EstimateLexiconAlignment.java
    src/cunei/cli/EstimateSentenceRatios.java
    src/cunei/cli/Evaluate.java
    src/cunei/cli/IndexCorpus.java
    src/cunei/cli/IndexPanliteCorpus.java
    src/cunei/cli/Optimize.java
    src/cunei/cli/ProcessPanliteCorpus.java
    src/cunei/cli/Translate.java
    src/cunei/cli/Unknown.java
    src/cunei/corpus/Context.java
    src/cunei/corpus/CorpusInformation.java
    src/cunei/corpus/CorpusSerializer.java
    src/cunei/corpus/MonolingualCorpus.java
    src/cunei/corpus/Origins.java
    src/cunei/optimize/Optimizer.java
    src/cunei/translate/Hypothesis.java
    src/cunei/translate/Match.java
    src/cunei/translate/PhraseModel.java
    src/cunei/translate/Similarity.java
    src/cunei/translate/Translator.java
    src/cunei/ui/AlignmentWindow.java
    src/cunei/ui/Workbench.java

Added Paths:
-----------
    src/cunei/corpus/MultilingualCorpus.java

Removed Paths:
-------------
    src/cunei/corpus/Corpus.java
    src/cunei/corpus/MultilingualCorpus.java
From: <ral...@us...> - 2009-06-18 20:06:35
Revision: 32
          http://cunei.svn.sourceforge.net/cunei/?rev=32&view=rev
Author:   ralfbrown
Date:     2009-06-18 20:06:28 +0000 (Thu, 18 Jun 2009)

Log Message:
-----------
INCOMPLETE: start of code to look in monolingual corpus for a specified context and accumulate the words occurring in that context.

Added Paths:
-----------
    src/cunei/synonym/
    src/cunei/synonym/CorpusSynonymFinder.java
From: <aar...@us...> - 2009-06-19 22:14:36
Revision: 42
          http://cunei.svn.sourceforge.net/cunei/?rev=42&view=rev
Author:   aaronbphillips
Date:     2009-06-19 22:14:10 +0000 (Fri, 19 Jun 2009)

Log Message:
-----------
Added capability to load Berkeley Aligner's output (both hard and soft alignments). Alignment files are autodetected based on whether the standard GIZA++ header is present. EstimateLexiconAlignment has a new option to skip alignment building (for example, to preserve the Berkeley Aligner's posterior probabilities). Also, the alignment building process was altered to make the previous alignment a prior and then re-normalize the matrix after it is updated with the lexical probabilities. This appears (visually) to give good results with the Berkeley Aligner's soft alignment output.

Modified Paths:
--------------
    src/cunei/alignment/AlignmentArray.java
    src/cunei/cli/EstimateLexiconAlignment.java
    src/cunei/corpus/MultiFileCorpusReader.java
    src/cunei/corpus/MultilingualCorpus.java
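A sketch of the two mechanisms described in revision 42, under stated assumptions. For autodetection, it assumes the "standard GIZA++ header" is the `# Sentence pair (N) source length ... target length ... alignment score : ...` comment line that GIZA++ writes before each pair in its A3 output. For the update, it assumes the previous alignment is multiplied cell by cell with the lexical probabilities and that re-normalization is done per source row; the log does not say which axis is normalized. `AlignmentSketch` and its methods are hypothetical names.

```java
// Hypothetical helpers illustrating format detection and the prior-based update.
final class AlignmentSketch {
    // True if the line looks like a GIZA++ A3 sentence-pair header.
    static boolean looksLikeGiza(String firstLine) {
        return firstLine != null && firstLine.startsWith("# Sentence pair");
    }

    // Treat the previous alignment as a prior, multiply in lexical probabilities,
    // then renormalize each row so it sums to 1 (row-wise is an assumption).
    static double[][] update(double[][] prior, double[][] lexical) {
        double[][] result = new double[prior.length][];
        for (int i = 0; i < prior.length; i++) {
            result[i] = new double[prior[i].length];
            double sum = 0.0;
            for (int j = 0; j < prior[i].length; j++) {
                result[i][j] = prior[i][j] * lexical[i][j];
                sum += result[i][j];
            }
            if (sum > 0.0) { // leave all-zero rows untouched
                for (int j = 0; j < result[i].length; j++) {
                    result[i][j] /= sum;
                }
            }
        }
        return result;
    }
}
```

Because the prior enters multiplicatively, cells the earlier alignment (for example, soft posteriors) considered impossible stay at zero regardless of the lexical score.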
From: <ral...@us...> - 2009-06-22 18:18:38
Revision: 47
          http://cunei.svn.sourceforge.net/cunei/?rev=47&view=rev
Author:   ralfbrown
Date:     2009-06-22 18:17:30 +0000 (Mon, 22 Jun 2009)

Log Message:
-----------
UNTESTED: added ConfusionNetwork.getSourceContexts and added a variant of CorpusSynonymFinder.findSynonyms which takes collections of left and right contexts.

Modified Paths:
--------------
    src/cunei/confusion/ConfusionNetwork.java
    src/cunei/confusion/ConfusionNode.java
    src/cunei/synonym/CorpusSynonymFinder.java
    src/cunei/translate/SuggestedTranslation.java
From: <ral...@us...> - 2009-06-22 20:51:13
Revision: 50
          http://cunei.svn.sourceforge.net/cunei/?rev=50&view=rev
Author:   ralfbrown
Date:     2009-06-22 20:51:06 +0000 (Mon, 22 Jun 2009)

Log Message:
-----------
UNTESTED: added class CorpusSynonym (which extends TypeSequence) to keep the count and position of a proposed substitution together with its actual value. Implemented ConfusionNetwork.insertCorpusSynonyms.

Modified Paths:
--------------
    src/cunei/confusion/ConfusionNetwork.java
    src/cunei/synonym/CorpusSynonymFinder.java

Added Paths:
-----------
    src/cunei/synonym/CorpusSynonym.java
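A sketch of the data-holder pattern revision 50 describes: a subclass of a token sequence that also carries a count and an input position. `TokenSequence` and `SynonymCandidate` are hypothetical stand-ins for TypeSequence and CorpusSynonym, and the field names are assumptions.

```java
// Hypothetical stand-in for TypeSequence: an immutable sequence of tokens.
class TokenSequence {
    protected final String[] tokens;

    TokenSequence(String... tokens) {
        this.tokens = tokens.clone(); // defensive copy keeps the sequence immutable
    }

    int length() {
        return tokens.length;
    }

    @Override
    public String toString() {
        return String.join(" ", tokens);
    }
}

// A proposed substitution that keeps its corpus count and target input position
// together with the substituted tokens themselves.
final class SynonymCandidate extends TokenSequence {
    final int count;    // how often the candidate was seen in the matching context
    final int position; // input position where the substitution would be inserted

    SynonymCandidate(int count, int position, String... tokens) {
        super(tokens);
        this.count = count;
        this.position = position;
    }
}
```

Extending the sequence type means a candidate can be passed anywhere a plain sequence is expected while still carrying its bookkeeping.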
From: <ral...@us...> - 2009-06-23 15:04:32
Revision: 53
          http://cunei.svn.sourceforge.net/cunei/?rev=53&view=rev
Author:   ralfbrown
Date:     2009-06-23 15:04:27 +0000 (Tue, 23 Jun 2009)

Log Message:
-----------
UNTESTED: switched CorpusSynonymFinder from linear scan to log-n check for validating that a left-context match has a corresponding right-context match. Added config parameters CorpusSynonym.Lookup.{MinFreq,MaxSynonyms}.

Modified Paths:
--------------
    src/cunei/confusion/ConfusionNetwork.java
    src/cunei/synonym/CorpusSynonymFinder.java

Added Paths:
-----------
    src/cunei/synonym/ExampleComparator.java
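One way to get the log-n check described above, assuming right-context matches can be represented as (sentence, offset) pairs: sort them once with a comparator, then binary-search for the exact position where the right context must begin. `Occurrence`, `ContextMatcher`, and the gap arithmetic are illustrative assumptions; the real ExampleComparator ordering may differ.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical corpus occurrence: sentence id plus token offset.
final class Occurrence {
    final int sentence;
    final int offset;

    Occurrence(int sentence, int offset) {
        this.sentence = sentence;
        this.offset = offset;
    }
}

final class ContextMatcher {
    // Orders occurrences by sentence, then by offset.
    static final Comparator<Occurrence> ORDER =
        Comparator.<Occurrence>comparingInt(o -> o.sentence).thenComparingInt(o -> o.offset);

    private final Occurrence[] rightMatches;

    ContextMatcher(Occurrence[] rightMatches) {
        this.rightMatches = rightMatches.clone();
        Arrays.sort(this.rightMatches, ORDER); // sort once, O(n log n)
    }

    // afterLeft is the position just past a left-context match; the match is valid
    // only if a right-context match starts exactly `gap` tokens later in the same
    // sentence. Each check is O(log n) instead of a linear scan.
    boolean hasRightContext(Occurrence afterLeft, int gap) {
        Occurrence wanted = new Occurrence(afterLeft.sentence, afterLeft.offset + gap);
        return Arrays.binarySearch(rightMatches, wanted, ORDER) >= 0;
    }
}
```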
From: <aar...@us...> - 2009-06-24 03:19:59
Revision: 63
          http://cunei.svn.sourceforge.net/cunei/?rev=63&view=rev
Author:   aaronbphillips
Date:     2009-06-24 03:19:52 +0000 (Wed, 24 Jun 2009)

Log Message:
-----------
Initial preparation for Annotations. New classes Annotation and DependentAnnotation that are stored in the Phrase. Moved CorpusIndex to SequenceIndex and added an AnnotationIndex class. Cleaned up recent changes to corpus reading code, verified it worked, and removed MonolingualCorpusReader (which was no longer needed). Added .settings directory with formatting preferences for Eclipse. Moving forward, source code should be formatted automatically by Eclipse according to these specs.

Modified Paths:
--------------
    launch/Decode.launch
    launch/Estimate Lexicon Alignment.launch
    launch/Index Corpus.launch
    launch/Translate.launch
    src/cunei/corpus/Context.java
    src/cunei/corpus/MonolingualCorpus.java
    src/cunei/corpus/MultiFileCorpusReader.java
    src/cunei/corpus/MultilingualCorpus.java
    src/cunei/corpus/SentencePair.java
    src/cunei/document/Phrase.java
    src/cunei/translate/Translator.java
    src/cunei/type/AnnotationType.java

Added Paths:
-----------
    .settings/
    .settings/org.eclipse.jdt.core.prefs
    .settings/org.eclipse.jdt.ui.prefs
    launch/Process Panlite Corpus.launch
    launch/Workbench.launch
    src/cunei/corpus/AnnotationIndex.java
    src/cunei/corpus/SequenceIndex.java
    src/cunei/corpus/SequenceIndexBuilder.java
    src/cunei/type/Annotation.java
    src/cunei/type/DependentAnnotation.java

Removed Paths:
-------------
    src/cunei/corpus/CorpusIndex.java
    src/cunei/corpus/CorpusIndexBuilder.java
    src/cunei/corpus/MonolingualCorpusReader.java
    src/cunei/corpus/StandoffCorpusReader.java
From: <aar...@us...> - 2009-06-30 02:42:04
Revision: 90
          http://cunei.svn.sourceforge.net/cunei/?rev=90&view=rev
Author:   aaronbphillips
Date:     2009-06-30 02:41:57 +0000 (Tue, 30 Jun 2009)

Log Message:
-----------
Added a few comments about recent changes. Nothing of huge concern that needs to be fixed immediately (let's get things working first), but just a couple of notes to remind us later of things to clean up.

Modified Paths:
--------------
    src/cunei/synonym/CorpusSynonym.java
    src/cunei/synonym/CorpusSynonymSerializer.java
    src/cunei/type/TypeSequence.java
From: <ral...@us...> - 2009-06-30 20:17:16
Revision: 96
          http://cunei.svn.sourceforge.net/cunei/?rev=96&view=rev
Author:   ralfbrown
Date:     2009-06-30 20:17:13 +0000 (Tue, 30 Jun 2009)

Log Message:
-----------
Fixed several exceptions caused by the empty Rare+OOV sequence index.

Modified Paths:
--------------
    src/cunei/bits/UnsignedHash.java
    src/cunei/corpus/MonolingualCorpus.java
    src/cunei/translate/Translator.java
From: <aar...@us...> - 2009-07-05 20:24:28
Revision: 111
          http://cunei.svn.sourceforge.net/cunei/?rev=111&view=rev
Author:   aaronbphillips
Date:     2009-07-05 20:24:26 +0000 (Sun, 05 Jul 2009)

Log Message:
-----------
Fixed several problems with loading and saving annotations.

Modified Paths:
--------------
    src/cunei/confusion/XMLConfusionReader.java
    src/cunei/corpus/AnnotationIndex.java
    src/cunei/corpus/MonolingualCorpus.java
    src/cunei/corpus/MultiFileCorpusReader.java
    src/cunei/document/Phrase.java
    src/cunei/translate/Translation.java
    src/cunei/type/AnnotationStates.java
From: <ral...@us...> - 2009-07-06 23:02:25
Revision: 118
          http://cunei.svn.sourceforge.net/cunei/?rev=118&view=rev
Author:   ralfbrown
Date:     2009-07-06 23:02:21 +0000 (Mon, 06 Jul 2009)

Log Message:
-----------
Untested: renamed RARE_OOV to REPLACEMENT. Added suggested code for handling virtual indexes by simply having multiple pointers to them in MonolingualCorpus.sequence_indexes.

Modified Paths:
--------------
    src/cunei/corpus/MonolingualCorpus.java
    src/cunei/corpus/PanliteCorpusWriter.java
    src/cunei/document/SimpleDocumentWriter.java
    src/cunei/synonym/CorpusSynonymBuilder.java
    src/cunei/synonym/CorpusSynonymFinder.java
    src/cunei/type/SequenceType.java
    src/cunei/type/TypesOfTypes.java
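A sketch of the virtual-index idea, assuming indexes are kept in an array addressed by sequence-type ordinal: a virtual type simply gets a second pointer to an existing index object, so lookups need no special-casing and no extra storage. `SeqType`, `SeqIndex`, and `CorpusIndexes` are hypothetical names, and which types would actually be aliased is not stated in the log.

```java
// Hypothetical sequence types; SURFACE_ALIAS has no index of its own.
enum SeqType { SURFACE, LEMMA, SURFACE_ALIAS }

// Hypothetical stand-in for a sequence index.
final class SeqIndex {
    final String name;

    SeqIndex(String name) {
        this.name = name;
    }
}

final class CorpusIndexes {
    private final SeqIndex[] sequenceIndexes = new SeqIndex[SeqType.values().length];

    CorpusIndexes() {
        SeqIndex surface = new SeqIndex("surface");
        sequenceIndexes[SeqType.SURFACE.ordinal()] = surface;
        sequenceIndexes[SeqType.LEMMA.ordinal()] = new SeqIndex("lemma");
        // "Virtual" index: a second slot pointing at an existing index object.
        sequenceIndexes[SeqType.SURFACE_ALIAS.ordinal()] = surface;
    }

    SeqIndex get(SeqType type) {
        return sequenceIndexes[type.ordinal()];
    }
}
```

One consequence of aliasing is that any code iterating over all slots (for example, when saving indexes) must avoid processing the shared object twice.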
From: <aar...@us...> - 2009-07-07 02:45:22
Revision: 120
          http://cunei.svn.sourceforge.net/cunei/?rev=120&view=rev
Author:   aaronbphillips
Date:     2009-07-07 02:45:19 +0000 (Tue, 07 Jul 2009)

Log Message:
-----------
Added ContextModel in Match. Fixed loading of BitSet that represents which SequenceTypes are present for each Example. Fixed bugs in setting right sub-example. No further known issues in matching code.

Modified Paths:
--------------
    src/cunei/confusion/ConfusionPath.java
    src/cunei/corpus/Example.java
    src/cunei/corpus/MonolingualCorpus.java
    src/cunei/corpus/MultilingualCorpus.java
    src/cunei/translate/Match.java
    src/cunei/translate/MatchPath.java
    src/cunei/translate/Matcher.java
From: <aar...@us...> - 2009-07-07 18:18:32
Revision: 128
          http://cunei.svn.sourceforge.net/cunei/?rev=128&view=rev
Author:   aaronbphillips
Date:     2009-07-07 18:18:31 +0000 (Tue, 07 Jul 2009)

Log Message:
-----------
Replaced HashSet with LinkedHashSet in order to guarantee consistency in sampling code.

Modified Paths:
--------------
    src/cunei/corpus/Example.java
    src/cunei/translate/MatchPath.java
    src/cunei/translate/Matcher.java
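A short illustration of the consistency issue, assuming the sampling code draws from a copy of the set using a seeded `Random`. A `HashSet` iterates in an order determined by hash codes and table capacity (and objects relying on the default identity `hashCode` can hash differently from run to run), whereas a `LinkedHashSet` always iterates in insertion order, so the same seed picks the same elements. `DeterministicSampler` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class DeterministicSampler {
    // Samples k distinct elements without replacement. The result is reproducible
    // only if the set's iteration order is, as it is for LinkedHashSet.
    static <T> List<T> sample(Set<T> candidates, int k, long seed) {
        List<T> ordered = new ArrayList<>(candidates);
        Random random = new Random(seed);
        List<T> picked = new ArrayList<>();
        for (int i = 0; i < k && !ordered.isEmpty(); i++) {
            picked.add(ordered.remove(random.nextInt(ordered.size())));
        }
        return picked;
    }
}
```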
From: <aar...@us...> - 2009-07-07 22:07:11
Revision: 136
          http://cunei.svn.sourceforge.net/cunei/?rev=136&view=rev
Author:   aaronbphillips
Date:     2009-07-07 22:07:07 +0000 (Tue, 07 Jul 2009)

Log Message:
-----------
Made the start-of-sentence, end-of-sentence, and unknown tokens in the language model configuration parameters.

Modified Paths:
--------------
    src/cunei/cli/ScoreLanguageModel.java
    src/cunei/decode/HypothesisBuilder.java
    src/cunei/lm/BackoffLanguageModel.java
    src/cunei/lm/LanguageModel.java
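A sketch of reading the three special tokens from configuration, assuming a `java.util.Properties` source. The key names are hypothetical, and the defaults `<s>`, `</s>`, and `<unk>` are the conventional ARPA-format tokens rather than values taken from Cunei.

```java
import java.util.Properties;

// Hypothetical holder for the language model's special tokens.
final class LmTokens {
    final String startOfSentence;
    final String endOfSentence;
    final String unknown;

    LmTokens(Properties config) {
        // Key names are illustrative; defaults follow common ARPA conventions.
        startOfSentence = config.getProperty("LanguageModel.StartOfSentence", "<s>");
        endOfSentence = config.getProperty("LanguageModel.EndOfSentence", "</s>");
        unknown = config.getProperty("LanguageModel.Unknown", "<unk>");
    }

    // Wraps a tokenized sentence with the configured boundary markers before scoring.
    String[] wrap(String[] words) {
        String[] out = new String[words.length + 2];
        out[0] = startOfSentence;
        System.arraycopy(words, 0, out, 1, words.length);
        out[out.length - 1] = endOfSentence;
        return out;
    }
}
```

Making these configurable matters because a language model trained with different marker spellings would otherwise treat every boundary as an unknown word.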
From: <aar...@us...> - 2009-07-08 12:54:48
Revision: 139
          http://cunei.svn.sourceforge.net/cunei/?rev=139&view=rev
Author:   aaronbphillips
Date:     2009-07-08 12:43:26 +0000 (Wed, 08 Jul 2009)

Log Message:
-----------
Added corpus loading back into Optimizer so it is not re-loaded every iteration.

Modified Paths:
--------------
    src/cunei/cli/Translate.java
    src/cunei/optimize/Optimizer.java
From: <ral...@us...> - 2009-07-08 22:35:34
Revision: 145
          http://cunei.svn.sourceforge.net/cunei/?rev=145&view=rev
Author:   ralfbrown
Date:     2009-07-08 22:35:31 +0000 (Wed, 08 Jul 2009)

Log Message:
-----------
Changed one of the hashmaps in Context/DocumentContext to a simple array for speed and added synchronization for thread-safety. Skip nodes in the confusion networks which already have a translation (such as one inserted by the passthrough annotator).

Modified Paths:
--------------
    src/cunei/corpus/Context.java
    src/cunei/corpus/DocumentContext.java
    src/cunei/synonym/CorpusSynonymBuilder.java
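A sketch of the hashmap-to-array change, assuming the keys are dense non-negative integer ids: an `int[]` avoids hashing and boxing, and `synchronized` methods guard both the increments and the occasional resize. `ContextCounts` is a hypothetical class, not the actual Context/DocumentContext code.

```java
import java.util.Arrays;

// Hypothetical counter keyed by a dense integer type id.
final class ContextCounts {
    private int[] counts;

    ContextCounts(int initialCapacity) {
        counts = new int[Math.max(1, initialCapacity)];
    }

    // Synchronized so concurrent threads neither lose increments nor race the resize.
    synchronized void increment(int typeId) {
        if (typeId >= counts.length) {
            counts = Arrays.copyOf(counts, Math.max(typeId + 1, counts.length * 2));
        }
        counts[typeId]++;
    }

    synchronized int get(int typeId) {
        return typeId < counts.length ? counts[typeId] : 0;
    }
}
```

The array only pays off when ids are small and densely packed; sparse ids would waste memory and a map would be the better fit.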
From: <ral...@us...> - 2009-07-10 16:12:47
Revision: 151
          http://cunei.svn.sourceforge.net/cunei/?rev=151&view=rev
Author:   ralfbrown
Date:     2009-07-10 16:12:46 +0000 (Fri, 10 Jul 2009)

Log Message:
-----------
Added ConfusionPath.getSourcePhraseReversed because the synonym finder wasn't getting contexts > 1 word as a result of seeing the words in the context in the wrong order. (Should check if this affects loadFuzzyMatches as well.) Added feature Synonym.Weights.AverageContext to allow us to give more weight to candidates with more context.

Modified Paths:
--------------
    src/cunei/confusion/ConfusionNetwork.java
    src/cunei/confusion/ConfusionPath.java
    src/cunei/synonym/CorpusSynonym.java
    src/cunei/synonym/CorpusSynonymBuilder.java
    src/cunei/synonym/CorpusSynonymFinder.java
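A small illustration of the ordering problem, assuming the left context is gathered by walking leftward from the start of the phrase: the words come out nearest-first, so any context longer than one word is backwards until it is flipped. `LeftContext` and its methods are hypothetical; they only mirror the idea behind getSourcePhraseReversed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LeftContext {
    // Walking leftward from `start` collects up to `width` words nearest-first,
    // i.e. in reverse reading order.
    static List<String> collectBackward(String[] sentence, int start, int width) {
        List<String> words = new ArrayList<>();
        for (int i = start - 1; i >= 0 && words.size() < width; i--) {
            words.add(sentence[i]);
        }
        return words;
    }

    // Reversing restores reading order, so multi-word contexts can match the corpus.
    static List<String> inReadingOrder(String[] sentence, int start, int width) {
        List<String> words = collectBackward(sentence, start, width);
        Collections.reverse(words);
        return words;
    }
}
```

A single-word context reads the same in either direction, which is why only contexts longer than one word were affected.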
From: <aar...@us...> - 2009-07-12 05:51:17
Revision: 156
          http://cunei.svn.sourceforge.net/cunei/?rev=156&view=rev
Author:   aaronbphillips
Date:     2009-07-12 05:51:11 +0000 (Sun, 12 Jul 2009)

Log Message:
-----------
Still trying to get annotations working. No known errors, but this commit is a temporary place-holder, and I would recommend waiting until the next version (or two) before updating.

Modified Paths:
--------------
    src/cunei/confusion/ConfusionNetwork.java
    src/cunei/confusion/ConfusionPath.java
    src/cunei/confusion/XMLConfusionReader.java
    src/cunei/confusion/XMLConfusionWriter.java
    src/cunei/corpus/AnnotationIndex.java
    src/cunei/corpus/MonolingualCorpus.java
    src/cunei/corpus/MultiFileCorpusReader.java
    src/cunei/corpus/MultiFileCorpusWriter.java
    src/cunei/corpus/MultilingualCorpus.java
    src/cunei/decode/HypothesisBuilder.java
    src/cunei/decode/XMLDecoderWriter.java
    src/cunei/document/Phrase.java
    src/cunei/document/SimpleDocumentReader.java
    src/cunei/document/SimpleDocumentWriter.java
    src/cunei/lattice/Lattice.java
    src/cunei/lattice/MergedPrunedSet.java
    src/cunei/synonym/CorpusSynonymBuilder.java
    src/cunei/translate/Hypothesis.java
    src/cunei/translate/Match.java
    src/cunei/translate/PhraseModel.java
    src/cunei/translate/Similarity.java
    src/cunei/translate/Translation.java
    src/cunei/type/Annotation.java
    src/cunei/type/AnnotationState.java
    src/cunei/type/AnnotationType.java

Added Paths:
-----------
    src/cunei/type/AnnotationLattice.java
    src/cunei/util/SingleLinkedList.java

Removed Paths:
-------------
    src/cunei/confusion/ChangeLog
    src/cunei/type/AnnotationStates.java
    src/cunei/util/SingleLinkedList.java
From: <ral...@us...> - 2009-07-13 15:47:39
Revision: 162
          http://cunei.svn.sourceforge.net/cunei/?rev=162&view=rev
Author:   ralfbrown
Date:     2009-07-13 15:47:32 +0000 (Mon, 13 Jul 2009)

Log Message:
-----------
Split Context computation in preparation for doing the actual computation at indexing time and just loading the results at runtime. Added MonolingualCorpus.getContext since there is now a Context associated with each corpus. Was not passing proper document context when looking up context matches in the bilingual corpus.

Modified Paths:
--------------
    src/cunei/cli/ShowSynonyms.java
    src/cunei/cli/Translate.java
    src/cunei/confusion/ConfusionNetwork.java
    src/cunei/corpus/Context.java
    src/cunei/corpus/DocumentContext.java
    src/cunei/corpus/MultilingualCorpus.java
    src/cunei/document/Phrase.java
    src/cunei/document/SimpleDocumentReader.java
    src/cunei/synonym/CorpusSynonymBuilder.java
    src/cunei/synonym/CorpusSynonymFinder.java
    src/cunei/translate/Matcher.java
    src/cunei/translate/Translator.java
From: <aar...@us...> - 2009-07-13 16:14:31
Revision: 165
          http://cunei.svn.sourceforge.net/cunei/?rev=165&view=rev
Author:   aaronbphillips
Date:     2009-07-13 16:14:23 +0000 (Mon, 13 Jul 2009)

Log Message:
-----------
Added copyright headers to files that were missing them.

Modified Paths:
--------------
    src/cunei/corpus/DocumentContext.java
    src/cunei/corpus/PositionLocator.java
    src/cunei/lexicon/Lexicons.java
    src/cunei/lm/LanguageModels.java
    src/cunei/processors/Canonicalizer.java
    src/cunei/sort/Dedupable.java
    src/cunei/sort/Remapable.java
    src/cunei/sort/Sorter.java
    src/cunei/util/Tokenizer.java
From: <ral...@us...> - 2009-07-17 15:07:21
Revision: 191
          http://cunei.svn.sourceforge.net/cunei/?rev=191&view=rev
Author:   ralfbrown
Date:     2009-07-17 15:07:20 +0000 (Fri, 17 Jul 2009)

Log Message:
-----------
Included UnicodeSequence this time.

Modified Paths:
--------------
    src/cunei/confusion/ConfusionPath.java
    src/cunei/synonym/CorpusSynonym.java
    src/cunei/synonym/CorpusSynonymBuilder.java
    src/cunei/type/TypeSequence.java

Added Paths:
-----------
    src/cunei/util/UnicodeSequence.java