Improve matching. The current implementation lacks
state-of-the-art technology that can provide better matching.
The edit distance dynamic programming algorithm is non-optimized.
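For reference, one common way to trim an edit distance dynamic programming routine is to keep only two rows of the table instead of the full matrix. The sketch below shows that idea for plain Levenshtein distance. The class and method names are hypothetical, and this is not taken from the OmegaT source; it only illustrates the kind of optimization the request might have in mind.

```java
// Sketch only: hypothetical names, not OmegaT code.
final class EditDistanceSketch {

    /** Levenshtein distance between two strings, keeping only two DP rows. */
    static int distance(String a, String b) {
        // Make b the shorter string so the rows are as small as possible.
        if (a.length() < b.length()) {
            String tmp = a;
            a = b;
            b = tmp;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from an empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or keep
            }
            // Reuse the two arrays instead of allocating a full matrix.
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }
}
```

Memory use is proportional to the shorter string rather than to the product of both lengths. The running time is unchanged.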
Logged In: YES
Improved matching will be added in an upcoming omegat
Logged In: YES
All the RFEs that you filed in October already have code available in OmegaT
except for this one. So if you could focus your energy here it would really be
We have identified a number of issues, related to how Java tokenizes strings,
if you want to be kept up to date don't hesitate to ask questions where the
things are discussed.
Also, I don't want to be picky about English, not being a native speaker, but
when you write "algorithm is non-optimized", do you mean it has been
consciously made so, or do you rather mean "algorithm is not optimized"? I
I'll do what I can, as time permits.
It may be the case that Java string tokenizing affects
possible use for matching, but a lack of in-depth knowledge
is also a factor affecting OmegaT development in this
"Optimized" is the past tense of the transitive verb
"Optimize". You are confused.
Rather than a lack of knowledge, it is a lack of time that plagues "us".
As for tokenization, it affects how languages without spaces between words
are handled by Java, Japanese for ex. As far as I am concerned that is one of
the most important issues facing dev right now.
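To make the tokenization point concrete, here is a minimal sketch that assumes the tokenizing in question is done with something like `java.text.BreakIterator` (the thread does not say which API is involved). The class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch only: hypothetical names, not OmegaT code.
final class TokenizeSketch {

    /** Splits text at the word boundaries reported for the given locale. */
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // Keep only pieces that begin with a letter or digit,
            // which drops the whitespace and punctuation runs.
            if (Character.isLetterOrDigit(piece.codePointAt(0))) {
                tokens.add(piece);
            }
        }
        return tokens;
    }
}
```

For space-separated text, `words("Improve matching now", Locale.ENGLISH)` yields three tokens. For Japanese, where the boundaries fall depends on the rules shipped with the Java runtime, so the tokens may not line up with the units a translator would consider meaningful.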
Do we match on the token or on the token substrings (character by character),
etc.? The previous version of the match engine did not use tokenization and
the strings were matched character by character, whatever the position of
the characters in the strings and regardless of the overall semantic value of
sign families (kanji should have more weight than kana for ex) etc.
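As an illustration of giving kanji more weight than kana, the sketch below is a character-level edit distance in which edits involving Han characters cost more. The weights (2.0 and 1.0) and all names are assumptions made up for the example, not values used by any match engine.

```java
// Sketch only: hypothetical names and weights, not OmegaT code.
final class WeightedDistanceSketch {
    static final double HAN_WEIGHT = 2.0;     // assumed cost of editing a kanji
    static final double DEFAULT_WEIGHT = 1.0; // assumed cost for kana and everything else

    static double weight(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN
                ? HAN_WEIGHT : DEFAULT_WEIGHT;
    }

    /** Edit distance over code points where each edit costs the weight of the character involved. */
    static double distance(String s, String t) {
        int[] a = s.codePoints().toArray();
        int[] b = t.codePoints().toArray();
        double[][] d = new double[a.length + 1][b.length + 1];
        for (int i = 1; i <= a.length; i++) {
            d[i][0] = d[i - 1][0] + weight(a[i - 1]);
        }
        for (int j = 1; j <= b.length; j++) {
            d[0][j] = d[0][j - 1] + weight(b[j - 1]);
        }
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= b.length; j++) {
                // A substitution costs as much as the heavier of the two characters.
                double sub = a[i - 1] == b[j - 1]
                        ? 0.0 : Math.max(weight(a[i - 1]), weight(b[j - 1]));
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + weight(a[i - 1]),  // delete from s
                                 d[i][j - 1] + weight(b[j - 1])), // insert from t
                        d[i - 1][j - 1] + sub);
            }
        }
        return d[a.length][b.length];
    }
}
```

With these weights, a segment that differs from another by one kanji is scored as further away than one that differs by one kana. Whether that reflects the semantic difference well enough is exactly the kind of thing that would need testing on real translation memories.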
I'm willing to test anything that improves on the current situation if you have
As for "non-optimized" and "not optimized" I think there is a major difference
between the two and wondered why you had chosen "non-optimized".
Obviously, the problem is only with certain languages that
cannot be tokenized in the simple word-like manner of
European languages or others.
Without looking further into it, I would guess that a more
evolved approach is needed that does not just rely on a
single methodology to process content. Perhaps a hybrid
approach that takes into account the features of a
particular language. The problem is basically one of scope
(i.e. character vs. word). The development on OmegaT is not
being considerate of certain issues as it moves forward;
this shows a lack of knowledge.
Standard software development practices require regression
testing to show that new versions do not break working
functionality. Experienced developers know this and work
with it. OmegaT does not follow good development practices.
In regard to "non-optimized", "non-" is a standard prefix
that can be added to many words in English, even to create
those that are not listed in dictionaries. Perhaps its
nature is more colloquial. Nonetheless, it is used. Just do
a search for it.
Some people even use "nonoptimized", which to me is
somewhat, if not completely, incorrect.
Well, obviously that is the case. Thank you for reminding me what I just
wrote. As for the more evolved approach, this is in fact what my remarks are
about. Since you think the current implementation is non-optimized, I am
suggesting that this item in particular could be on your priority list.
As for omegat dev not being considerate, well, it is quite the opposite. The
current behavior, although not producing results as good as I expected, is way
better than matching a string character by character as it was implemented
Since omegat dev currently is not able to naturally parse such languages,
there is a problem here that can't easily be solved. One thing that could be
investigated, for ex, is to further sub-tokenize the Java tokens by "semantic"
means with the help of a dictionary. Obviously that leads to language
modularization and maybe memory-intensive loading.
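A rough sketch of what dictionary-based sub-tokenizing could look like: each token is split by greedy longest match against a word list, falling back to single characters where nothing matches. The names are hypothetical and the in-memory `Set` stands in for whatever per-language dictionary would actually be loaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch only: hypothetical names, not OmegaT code.
final class SubTokenizerSketch {

    /** Splits one token by greedy longest match against a word list. */
    static List<String> split(String token, Set<String> dictionary) {
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (pos < token.length()) {
            int end = token.length();
            // Shrink the candidate until it is a dictionary word or a single char.
            // Note: this works on Java chars, so it ignores surrogate pairs.
            while (end > pos + 1 && !dictionary.contains(token.substring(pos, end))) {
                end--;
            }
            parts.add(token.substring(pos, end));
            pos = end;
        }
        return parts;
    }
}
```

With a dictionary containing "translation" and "memory", the token "translationmemory" comes back as those two words. The whole word list sits in memory here, which is the loading cost mentioned above, and greedy matching can pick the wrong split when words overlap.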
Or, as was suggested in my preceding comment (thank you for considering it
a valid approach, btw), by using a "hybrid" approach that does further
matching within the unmatched tokens. That could be used for any language
(esp. ones that add a number of suffixes, like German or French/Russian etc)
to match simple plurals or verb conjugations etc.
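One possible reading of the "hybrid" idea, as a sketch: match whole tokens first, then give partial credit to the leftover tokens based on how much of a common prefix they share, which is a crude stand-in for handling suffixes. The names and the scoring formula are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: hypothetical names and scoring, not OmegaT code.
final class HybridMatchSketch {

    /** Share of the longer token covered by the common prefix, in [0, 1]. */
    static double prefixSimilarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        if (max == 0) {
            return 1.0;
        }
        int min = Math.min(a.length(), b.length());
        int n = 0;
        while (n < min && a.charAt(n) == b.charAt(n)) {
            n++;
        }
        return (double) n / max;
    }

    /** Average score per source token: exact matches first, then partial ones. */
    static double score(List<String> source, List<String> candidate) {
        if (source.isEmpty()) {
            return candidate.isEmpty() ? 1.0 : 0.0;
        }
        List<String> unmatched = new ArrayList<>(candidate);
        List<String> leftovers = new ArrayList<>();
        double total = 0.0;
        // Pass 1: whole-token matches.
        for (String tok : source) {
            if (unmatched.remove(tok)) {
                total += 1.0;
            } else {
                leftovers.add(tok);
            }
        }
        // Pass 2: partial credit within the tokens that did not match.
        for (String tok : leftovers) {
            double best = 0.0;
            int bestIdx = -1;
            for (int i = 0; i < unmatched.size(); i++) {
                double s = prefixSimilarity(tok, unmatched.get(i));
                if (s > best) {
                    best = s;
                    bestIdx = i;
                }
            }
            if (bestIdx >= 0) {
                unmatched.remove(bestIdx); // each candidate token is used once
            }
            total += best;
        }
        return total / source.size();
    }
}
```

Comparing the tokens of "open the file" with those of "open the files" gives full credit for the first two tokens and 4/5 for the third, instead of treating "file" and "files" as unrelated. The score is normalized by the source length, so it is not symmetric, and a prefix measure will not help with languages that inflect at the start of a word.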
If you are really serious about improving the matching engine, you will need
to look further into it at some time. Hopefully your professional approach will
produce something that will be of value to the omegat code.
If you need further information about how natural languages modify their
structures and how to deal with that in omegat there is a whole team of
linguists on the other side of your fence.
This forum is for omega t+ feature requests, not lengthy
discussions of other issues. Please post your messages on
the omega t+ development list, Google group, or other
I am suggesting approaches to you that you may not have considered. I am not
You know that the dev list is currently non-functional; it is merely a place
where you put memos on what you think you'll do someday, and as far as
Google is concerned I am not allowed to "discuss" there.
You may be unclear about what "add a comment" means. It is the string right
above the window in which I am currently typing.
You are free to ignore my comments regarding your feature requests. If you
think my comments are out of place here, feel free to ignore them
I am actually giving you hints here so that you don't have to keep lurking on
omegat's dev list for fresh ideas on how to deal with issues that you don't
seem to understand very well.
I don't really care what you think and view anything you
say with suspicion.
I am not interested in helping OmegaT. If what I do on
omega t+ or elsewhere does help in some way, then that is
just a side effect of the open source method. I have no
intention of directly aiding those who work against me.
Your words lack credibility and you cannot be trusted.
Logged In: YES
Improved functionality will be added towards OmegaT+ 1.0. This is a very fundamental issue that is of high importance. Thus it requires serious consideration of design issues, e.g. usefulness, efficiency, optimization, and so forth.
More comments at a later date.