Re: [Fontforge-users] how to subset a complex font without inadvertently breaking it?


On Fri, 26 Apr 2013, Aaron Turner wrote:
> which g.unicode falls outside of the target Unicode subset then how can
> I be *absolutely* sure that the glyph so removed was *not* required for
> the proper rendering of the codepoints falling within the target subset?

The only simple way to be absolutely certain of this is to not subset the
font in the first place.

It should be possible to start with a list of code points, find all
substitution rules that could ever be activated by those code points, from
that extract a list of glyphs, and apply the operation recursively to see
which glyphs those could lead to, and so on.  When the set of glyphs
doesn't change through an iteration, you know any remaining glyphs NOT in
that set can be safely removed.  FontForge doesn't do this automatically
and I think it would be difficult to implement in FontForge scripts
(whether native or Python) because of limited access to the relevant data.
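
As a rough sketch of the iterate-until-nothing-changes idea described above,
here is a small Python version.  It assumes the substitution rules have
already been extracted from the font by some other means and flattened into
(trigger glyphs, output glyphs) pairs; that representation, the function
names, and the glyph names are all invented for illustration and are not
FontForge's API or real OpenType lookup semantics.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical simplified rule: (glyphs that must all be available,
# glyphs the rule can then produce).  Real GSUB lookups carry more
# structure (contexts, features, scripts) that is ignored here.
Rule = Tuple[FrozenSet[str], FrozenSet[str]]


def rule(triggers: Iterable[str], outputs: Iterable[str]) -> Rule:
    """Build a rule from two iterables of glyph names."""
    return (frozenset(triggers), frozenset(outputs))


def glyph_closure(seed: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Return every glyph reachable from the seed glyphs via the rules.

    Repeats full passes over the rules until a pass adds nothing new.
    """
    keep = set(seed)
    changed = True
    while changed:
        changed = False
        for triggers, outputs in rules:
            # A rule can fire only if all of its trigger glyphs are kept.
            if triggers <= keep and not outputs <= keep:
                keep |= outputs
                changed = True
    return keep


# Made-up example rules, not taken from any real font.
EXAMPLE_RULES = [
    rule(["f", "i"], ["f_i"]),            # ligature
    rule(["f_i"], ["f_i.alt"]),           # alternate of the ligature
    rule(["ka", "virama"], ["ka.half"]),  # needs glyphs outside the seed
]

if __name__ == "__main__":
    kept = glyph_closure({"f", "i"}, EXAMPLE_RULES)
    print(sorted(kept))
```

Any glyph not in the returned set would be a candidate for removal.  The
seed set is meant to be the glyphs mapped directly from the target code
points.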

I think an easier way to implement this approach would be to write a
separate utility that would analyse the font and find candidate glyphs to
remove; FontForge could then remove the glyphs.  This approach would
necessarily err on the side of caution - it might say "keep" for some
glyphs that would actually have been safe to remove - but that shouldn't
be a real problem.  An
example would be if there is a glyph that can only be introduced in
response to a sequence of glyphs that can never occur because of other
substitutions done earlier; if the individual glyphs in the trigger
sequence are all keepers, the algorithm might say to keep the result glyph
as well even though it can never actually occur because the *sequence*
can't.
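
To make that over-caution concrete, here is a toy Python illustration.  It
assumes a hypothetical font with two ordered lookups, where the first always
replaces "a" with "b", so the "a c" ligature in the second can never fire.
The data layout, the function names, and the left-to-right matching are
simplifications invented for this sketch; real OpenType shaping is more
involved, and the brute-force simulation only checks sequences up to a
fixed length, so it is a sanity check rather than a proof.

```python
from itertools import product

# Hypothetical toy font: lookups are applied in order; each lookup is a
# list of (input glyph sequence, output glyph sequence) pairs.
LOOKUPS = [
    [(("a",), ("b",))],        # lookup 0: single substitution a -> b
    [(("a", "c"), ("a_c",))],  # lookup 1: ligature a c -> a_c
]


def conservative_keep(seed):
    """Fixed-point closure that ignores lookup order and sequences."""
    keep = set(seed)
    changed = True
    while changed:
        changed = False
        for lookup in LOOKUPS:
            for inputs, outputs in lookup:
                if set(inputs) <= keep and not set(outputs) <= keep:
                    keep |= set(outputs)
                    changed = True
    return keep


def apply_lookup(glyphs, lookup):
    """Apply one lookup left to right over a glyph list."""
    out = []
    i = 0
    while i < len(glyphs):
        for inputs, outputs in lookup:
            if tuple(glyphs[i:i + len(inputs)]) == inputs:
                out.extend(outputs)
                i += len(inputs)
                break
        else:
            # No rule matched at this position; copy the glyph through.
            out.append(glyphs[i])
            i += 1
    return out


def reachable_by_simulation(seed, max_len=3):
    """Glyphs seen at any stage when shaping every short seed sequence."""
    seen = set(seed)
    for n in range(1, max_len + 1):
        for text in product(sorted(seed), repeat=n):
            glyphs = list(text)
            for lookup in LOOKUPS:
                glyphs = apply_lookup(glyphs, lookup)
                seen.update(glyphs)
    return seen


if __name__ == "__main__":
    seed = {"a", "c"}
    print(sorted(conservative_keep(seed) - reachable_by_simulation(seed)))
```

In this toy case the only glyph the conservative pass keeps unnecessarily is
the ligature, which costs a little file size but breaks nothing.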

Analysing it deeply enough to prove not only that every needed glyph is
kept, but also that no glyph is kept *unless* it is needed, seems hard.
I doubt that it is actually impossible, because
of the finite limits on the number of rules that can activate per glyph in
the input and the lengths of sequences that can trigger rules.  It seems
like the entire thing could be turned into a single large finite state
machine from which all possible output (or intermediate) glyphs could be
found.  Not trivial, though, and probably not worthwhile.
-- 
Matthew Skala
ms...@an...                 People before principles.
http://ansuz.sooke.bc.ca/