Please use https://code.google.com/p/language-detection/ for language detection, since it is better than the language detection in Tika. The Tika community is also considering using this project for its language detection. It supports more languages and has better statistical models, and it also ships versions of those models built specifically for short texts. See also https://issues.apache.org/jira/browse/TIKA-369
Another interesting resource: http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html
copied to https://github.com/languagetool-org/languagetool/issues/139