[Fontforge-users] how to subset a complex font without inadvertently breaking it?

Hi

I have a large number of fonts (all downloaded from "copyleft" sources 
such as Google Fonts, SIL, etc.) and I wish to use FontForge's Python 
scripting to generate specific Unicode subsets of these fonts, for 
WOFF / @font-face purposes. I have seen a number of scripts purporting 
to do this, for example:

https://code.google.com/p/googlefontdirectory/source/browse/tools/subset/?r=2aa228199858450a8b1cae4830dba7fe48257693

but nevertheless I am worried about inadvertently "breaking" a font.

For example, suppose I remove (e.g. via font.cut) a selected glyph g 
for which g.unicode returns -1 (i.e. "not a Unicode encoded glyph"), or 
for which g.unicode falls outside the target Unicode subset. How can I 
be *absolutely* sure that the removed glyph was *not* required for the 
proper rendering of the codepoints that do fall within the target 
subset? I'm thinking particularly of Arabic, or of Indic scripts such 
as Devanagari, where (it is my understanding that) complex glyph 
substitutions may occur during rendering. If I've inadvertently removed 
a glyph that one of those substitutions needs, the substitution will 
fail and the subset will not render properly.
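
To make the worry concrete, here is a minimal sketch of the kind of 
"closure" I imagine a safe subsetter would need to compute before 
cutting anything. It is plain Python and does not call FontForge at 
all: the function name glyph_closure, the three rule dictionaries and 
the glyph names are all made up for illustration. My assumption is 
that such rules could be collected from something like 
glyph.getPosSub("*") and glyph.references, but I have not verified 
that, and the sketch deliberately ignores contextual lookups and state 
machines, which is exactly the part I am unsure about.

```python
# Hypothetical rule format; these dictionaries are NOT a FontForge API.

def glyph_closure(wanted, simple_subs, ligatures, components):
    """Return every glyph name that the `wanted` glyphs may need.

    simple_subs: {glyph: [glyphs it may be replaced by]}
    ligatures:   {ligature glyph: [component glyphs it is formed from]}
    components:  {composite glyph: [glyphs it references for outlines]}
    """
    keep = set(wanted)
    changed = True
    while changed:  # repeat until no new glyph is pulled in
        changed = False
        for name in list(keep):
            # Anything a kept glyph can turn into, or is drawn from, is needed.
            for target in simple_subs.get(name, []) + components.get(name, []):
                if target not in keep:
                    keep.add(target)
                    changed = True
        for lig, parts in ligatures.items():
            # A ligature can only appear if all of its components survive.
            if lig not in keep and all(p in keep for p in parts):
                keep.add(lig)
                changed = True
    return keep


# Toy Devanagari-like data with invented glyph names.
simple_subs = {"ka": ["ka.half"]}
ligatures = {"k_ssa": ["ka", "virama", "ssa"]}
components = {"k_ssa": ["ka.half", "ssa"]}

keep = glyph_closure({"ka", "virama", "ssa", "ra"},
                     simple_subs, ligatures, components)
print(sorted(keep))
```

In this toy case the unencoded glyphs ka.half and k_ssa end up in the 
keep set even though neither has a codepoint, which is precisely the 
sort of glyph a naive "cut everything where g.unicode is -1" script 
would destroy.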

Perhaps (for all I know) FontForge is clever enough to interpret all 
the sfnt tables in a complex font, work out exactly when it is safe to 
remove a glyph, and refuse to remove it when it is not. I would be very 
impressed if that were the case, because I understand that in some 
cases the substitutions are encoded as finite state machines rather 
than relatively simple table look-ups.

Perhaps my newbie-ness is showing, in which case I apologise. Otherwise, 
any ideas, peeps?

Regards, Aaron.