I'm not sure at which stage the encoding issues occur, but by the time features (such as words) are written to the feature files, the text is corrupted: many characters show up as question marks. To reproduce, run webseer on an Asian-language site and look at the features output. Most of it is unreadable.
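For reference, here is a minimal sketch of one common way this symptom arises. It is not confirmed to be what webseer does: text gets encoded with a charset that cannot represent the characters, and each unencodable character is silently replaced with "?". The sample string and variable names are made up for illustration and are not taken from webseer.

```python
# Hypothetical sample word; not taken from webseer's feature files.
word = "日本語"

# Encoding to a charset that cannot represent these characters, with
# replacement enabled, substitutes "?" for each unencodable character.
lossy = word.encode("ascii", errors="replace").decode("ascii")

# Encoding and decoding explicitly as UTF-8 round-trips the text intact.
intact = word.encode("utf-8").decode("utf-8")

print(lossy)
print(intact)
```

If the feature files look like the first printed line, checking the charset used at each read and write step (page fetch, parsing, feature output) would be a reasonable place to start.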