Hi,
I've started running some tests.
Attached is a patch that works only on file names.
It has some problems:
- File names are very short, so it is difficult to detect
the language correctly. Quite often the text matches another,
similar language.
I tried both the full path name and the file name only,
and there is not much difference.
Anyway, many similar languages are mapped to the same
encoding, so a false match should not change the result.
- There are a lot of languages. I had to remove some
(Esperanto, Catalan and 2 others) to get rid of most of the
wrong matches I had (the complete list is attached).
- I mapped only some languages to encodings (based on
Mozilla's View -> Character Coding menu).
- If a language is not mapped or does not match at all, I
use the default platform encoding. I think a user-defined
encoding would be better.
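
To make the mapping and fallback idea concrete, here is a rough sketch. It is not the code from the patch: the class name LanguageEncodings, the method encodingFor, the language names and the table entries are all made up for illustration. It looks up the detected language, and if there is no mapping (or nothing was detected) it uses a user-supplied encoding, falling back to the platform default only when none is given.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class LanguageEncodings {
    // Hypothetical language -> charset table. The entries are examples only,
    // not the mapping used in the patch.
    private static final Map<String, String> ENCODINGS = new HashMap<>();
    static {
        ENCODINGS.put("english", "ISO-8859-1");
        ENCODINGS.put("italian", "ISO-8859-1");
        ENCODINGS.put("spanish", "ISO-8859-1");
        ENCODINGS.put("polish", "ISO-8859-2");
        ENCODINGS.put("russian", "windows-1251");
        ENCODINGS.put("greek", "ISO-8859-7");
    }

    /**
     * Returns the charset mapped to the detected language. If the language
     * is null, unmapped, or its charset is not supported by this JVM, the
     * caller's fallback is returned; if the fallback is null as well, the
     * platform default is used.
     */
    public static Charset encodingFor(String language, Charset fallback) {
        if (language != null) {
            String name = ENCODINGS.get(language.toLowerCase(Locale.ROOT));
            if (name != null && Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        }
        return fallback != null ? fallback : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // Mapped language, no user preference given.
        System.out.println(encodingFor("russian", null));
        // Unmapped language: the user-defined encoding is used.
        System.out.println(encodingFor("klingon", Charset.forName("UTF-8")));
    }
}
```

Because several similar languages share one table value, a false match between, say, Italian and Spanish still gives the same encoding.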
To run it, you have to add the jar to the classpath and extract
LM.jar under the xnap dir.
Let me know how it works for you.
Bye
patch and other files