libTERE - Browse Files at SourceForge.net

Name	Modified	Size	Downloads / Week
libTERE-0.0.17.tar.gz	2015-05-21	172.5 kB	1
README	2015-05-21	14.1 kB	0
libTERE-0.0.16.tar.gz	2015-02-25	172.0 kB	0
libTERE-0.0.12.tar.gz	2014-08-07	171.7 kB	0
libTERE-0.0.10.tar.gz	2014-02-07	171.5 kB	0
libTERE-0.0.9.tar.gz	2014-02-07	170.7 kB	0
libTERE-0.0.8.tar.gz	2014-02-06	163.8 kB	0
libTERE-0.0.7.tar.gz	2013-05-16	160.0 kB	0
libTERE-0.0.6.tar.gz	2013-03-18	114.4 kB	0
libTERE-0.0.3.tar.gz	2012-12-12	103.8 kB	0
Totals: 10 Items		1.4 MB	1

Overview:

TERE == TExt REassembler

libTERE is a portable C99 implementation of a text reassembler. Its purpose
is to put back together complex formatted text which has been broken down into
smaller pieces when it was written to an output file. For instance, in Inkscape
a single editable piece of text may have parts with different sizes, colors,
fonts, subscripts, superscripts, and so forth. Each region with a different
format is written to an EMF file as a separate text object. When these are
read back into the program they are in the correct positions, but the logical
relations between them are lost. In particular, the original is not recreated, so
the assembly cannot be edited. To resolve this issue libTERE examines the
sequence and properties of text objects, and to the extent possible, re-creates
the original complex editable object. This result is stored in a data structure
containing 1 or more paragraphs, each of which contains 1 or more lines, each of
which contains 1 or more of the original text objects. Barring a bug somewhere,
no information is lost when text is processed through libTERE. The worst case
should be that it comes out as the same unrelated pieces that went in. The best case is that it comes
out fully reassembled and editable.
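
The paragraph / line / text object nesting described above can be sketched
in C roughly as follows. This is only an illustration: every type and
function name here (ex_text_obj, ex_line, ex_paragraph, ex_layout,
ex_count_objects) is hypothetical and is not taken from text_reassemble.h.
The counting function illustrates the "nothing is lost" property: the
number of text objects reachable from the result should equal the number
that went in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; NOT the ones in text_reassemble.h. */
typedef struct {
    const char *utf8;      /* text of one input fragment                */
    double      x, y;      /* position of the fragment                  */
    double      font_size; /* one format attribute, others omitted      */
} ex_text_obj;

typedef struct {
    const ex_text_obj *objs;   /* 1 or more original text objects */
    size_t             n_objs;
} ex_line;

typedef struct {
    const ex_line *lines;      /* 1 or more lines */
    size_t         n_lines;
} ex_paragraph;

typedef struct {
    const ex_paragraph *paras; /* 1 or more paragraphs */
    size_t              n_paras;
} ex_layout;

/* Count every text object reachable from the layout. */
size_t ex_count_objects(const ex_layout *lay)
{
    size_t total = 0;
    assert(lay != NULL);
    for (size_t p = 0; p < lay->n_paras; p++) {
        for (size_t l = 0; l < lay->paras[p].n_lines; l++) {
            total += lay->paras[p].lines[l].n_objs;
        }
    }
    return total;
}
```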

libTERE supports L->R and R->L languages (tested with English and Hebrew).
T->B languages may be supported in future, but have not been tested.

libTERE is distributed under the GPL 2 license.

Current version 0.0.17 2015-05-21.

----------------------------------------------------------------
Building the test program.

With full debugging:
gcc -Wall -DTEST -DDBG_TR_PARA -DDBG_TR_INPUT -I. -I/usr/include/freetype2 -o text_reassemble text_reassemble.c uemf_utf.c -lfreetype -lfontconfig -lm

Without debugging:
gcc -Wall -DTEST -I. -I/usr/include/freetype2 -o text_reassemble text_reassemble.c uemf_utf.c -lfreetype -lfontconfig -lm

Compiling to object files:
(Note: if libUEMF is also present on the system then do not compile the uemf_utf.c from libTERE, use the one from
libUEMF instead.)
gcc -c -Wall -I. -I/usr/include/freetype2 text_reassemble.c uemf_utf.c

----------------------------------------------------------------

Known bugs and limitations:

1. If the first sentence of a paragraph is indented by a method that omits
the leading spaces, then that sentence will not be grouped with the rest of the
paragraph.

2. Only English and Hebrew have been tested. Other L->R and R->L languages should work too.
Top to bottom languages like Chinese have not been tested and are not expected to group properly.

3. Requires Fontconfig and Freetype2.

4. Narrow fonts are poorly supported - because current Fontconfig implementations
return font metrics for these that are not a good match for the font. Also these
fonts are generally not present on Linux systems.

5. TERE depends to a large extent on the text objects in the input file being in
logical order. So if a series of left justified lines which would otherwise be grouped
are placed into the file in arbitrary order, they will not be grouped as expected, and
may not be grouped at all.

6. Reassembly of formatted math formulas generally works to the extent numerators or denominators
are grouped into single lines.

7. A font size change of >2x prevents text from being grouped. This is advantageous in the context
of math formulas, as it keeps Summation and Integral operators from merging in where they should not,
but it will break up some (overly) creatively formatted text.

8. libTERE implements font substitution, so it will try to work around missing fonts using those
currently on the system. However, results are definitely better if all of the fonts used in
the source material are available to the reassembler, since even a close substitution tends to
have glyphs with slightly different sizes.
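
The font size rule in item 7 can be sketched as a simple ratio test. This
is an assumption-laden illustration, not the library's code: the name
ex_sizes_groupable is hypothetical, and the README does not say how a
ratio of exactly 2x is treated (this sketch allows it).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of libTERE.
   Returns true when the larger font size is at most 2x the smaller one,
   i.e. when the size change alone would not block grouping. */
bool ex_sizes_groupable(double size_a, double size_b)
{
    assert(size_a > 0.0 && size_b > 0.0);
    double larger  = (size_a > size_b) ? size_a : size_b;
    double smaller = (size_a > size_b) ? size_b : size_a;
    return larger <= 2.0 * smaller;
}
```

With this test a 12 pt base and an 8 pt subscript may group, while a 36 pt
Summation sign next to 12 pt text may not.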

----------------------------------------------------------------
Files in this distribution:

COPYING GPL 2 license.

bug_revdir.txt
bug_revdir.dump.svg
Test cases for LR and RL text that is actually drawn the wrong way around.

convert_reademf_text.sh
Script using the extract program (from drm_tools) that converts the output
of reademf (from libUEMF) to input for the test program.

convert_readwmf_text.sh
Script using the extract program (from drm_tools) that converts the output
of readwmf (from libUEMF) to input for the test program.

Doxyfile Doxygen configuration file.

formatted_text_en_test.svg
formatted_text_en_test.emf
formatted_text_en_test.txt
formatted_text_en_test.dump.svg
English. Source document, intermediate EMF file, test input, and result
file for TERE test program (full debugging output compiled in). Note that
the fonts Arial and Times New Roman must be present on the system,
or font substitution will occur and the results will not match
exactly.

formatted_text_en_test_bkg2.txt
formatted_text_en_test_bkg2.dump.svg
Variant of formatted_text_en_test.txt with background set to mode 2
(underwrite each assembled line) and debugging turned off. Text
decoration is also tested.

formatted_text_he_test.svg
formatted_text_he_test.emf
formatted_text_he_test.txt
formatted_text_he_test.dump.svg
Hebrew. Source document, intermediate EMF file, test input, and result
file for TERE test program (compiled with -DTEST). Note that
the fonts Arial and Times New Roman must be present on the system,
or font substitution will occur and the results will not match
exactly.

ft_example.c
Small test program for examining font information using
fontconfig.
This does not have Doxygen comments.
Usage: ./ft_example arial

generated.c Code produced by make_ucd_mn_table.c which tests whether
a Unicode value is of type Mn (Mark, nonspacing) or not.

kerning_tests_en.svg
kerning_tests_en.emf
kerning_tests_en.txt
kerning_tests_en.dump.svg
English. Source document, intermediate EMF file, test input, and result
file for TERE test program (compiled with -DTEST). Note that
the fonts Arial and Times New Roman must be present on the system,
or font substitution will occur and the results will not match
exactly.

kerning_tests_he.svg
kerning_tests_he.emf
kerning_tests_he.txt
kerning_tests_he.dump.svg
Hebrew. Source document, intermediate EMF file, test input, and result
file for TERE test program (compiled with -DTEST). Note that
the fonts Arial and Ezra SIL SR must be present on the system,
or font substitution will occur and the results will not match
exactly.

make_ucd_mn_table.c
Source code for the make_ucd_mn_table utility. It is used to
generate the lookup table in text_reassemble.c for Mn
(Mark, nonspacing) from the Unicode source files. This information
is needed to calculate text widths when nonspacing glyphs are
encountered, as this information is not generally available through
Freetype. The output is also shown in generated.c.

missing_spaces.svg
missing_spaces.emf
missing_spaces.txt
missing_spaces.dump.svg
Tests for reconstructing text emitted without spaces. Long x kerns
are replaced with 1 or 2 spaces.

mnlist.txt Table of all Mark, nonspacing Unicode values at the time of this release.

README
This file

test_examples.sh
Script that runs text_reassemble (compiled with -DTEST) on the examples
provided and compares the results. Result is pass (identical) or fail (any
difference). Note that the test system must have every font named in the
test files installed or it will fail - even a very close font substitution will
change positions slightly. These fonts are: Arial, Ezra SIL, Ezra SIL SR, and
Times New Roman. The first and last should be installed on any Windows system
and are part of "Microsoft core fonts", and the Ezra fonts are from Sil International,
currently at URL http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=silhebrunic2
Use like: ./test_examples.sh (normal run)
Or like: ./test_examples.sh anything (run in valgrind, output to vg_test_examples.log)

text_reassemble.c
text_reassemble.h
Source code for libTERE and the text_reassemble test program.
These have Doxygen comments.
Usage: ./text_reassemble input.txt

uemf_utf.c
uemf_utf.h
Source code for some text utilities.
These routines are exact duplicates from libUEMF.
If shared libraries are built for both libUEMF and libTERE (as Inkscape
does), leave these two files out of libTERE and build them into libUEMF.
These have Doxygen comments.

vg_fc.supp
Valgrind suppression file, for test_examples.sh.
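
The Mn (Mark, nonspacing) test described under generated.c and
make_ucd_mn_table.c above can be sketched as a binary search over sorted
code point ranges. This is not the generated table: the names are
hypothetical and the ranges are a tiny excerpt chosen for illustration
(combining diacritical marks and some Hebrew points). The second function
shows why the test matters for width calculations: nonspacing marks are
skipped when counting glyphs that advance the pen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t first, last; } ex_range;

/* Tiny illustrative excerpt, sorted by code point. NOT the full Mn table. */
static const ex_range ex_mn_ranges[] = {
    { 0x0300, 0x036F },  /* Combining Diacritical Marks          */
    { 0x0591, 0x05BD },  /* Hebrew accents and points            */
    { 0x05BF, 0x05BF },  /* Hebrew point rafe                    */
    { 0x05C1, 0x05C2 },  /* Hebrew points shin dot and sin dot   */
};

/* Return 1 if cp falls in one of the ranges above, else 0. */
int ex_is_mn(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof ex_mn_ranges / sizeof ex_mn_ranges[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < ex_mn_ranges[mid].first)      hi = mid;
        else if (cp > ex_mn_ranges[mid].last)  lo = mid + 1;
        else                                   return 1;
    }
    return 0;
}

/* Count the code points that contribute an advance (everything not Mn). */
size_t ex_count_spacing(const uint32_t *cps, size_t n)
{
    size_t count = 0;
    assert(cps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!ex_is_mn(cps[i])) count++;
    }
    return count;
}
```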

----------------------------------------------------------------
Acknowledgements.

Many thanks to Aharon Varady for supplying the Hebrew test samples.

----------------------------------------------------------------
Revision history:

0.0.17 2015-05-21
Modified ft_example to provide more information.
Fixed bug in text_reassemble: conversion to SVG was placing text decorations
onto the line without spaces between underline, overline, etc.

0.0.16 2015-02-25
Modified TR_layout_2_svg() to change to POSIX locale for numeric values
and then return to previous value. Needed because floats in SVG must be
"12.34", not "12,34".

0.0.12 2014-07-24
Fixed problem when units_per_EM was not 2048. This caused problems with Chinese characters,
which usually have 256 for that value.
Corrected documentation, missing_text -> missing_spaces, vf_fc -> vg_fc.

0.0.11 2014-03-24
Fixed - removed one line that had no effect.

0.0.10 2014-02-07
Fixed bugs in ftinfo_load_fontname, failure status was supposed to be negative
value, but was positive. Rearranged code in this function somewhat to make it
clearer. Catch possible memory leaks on error conditions.

Add valgrind mode to test_examples.sh.

Add valgrind suppression file for FontConfig's issues.

0.0.9 2014-02-06
Fixed bug in convert_reademf_text.sh: reademf changed fOptions output from decimal to
hex, so RL text wasn't being properly detected.

Added code to replace some long kerns in x with spaces. Useful for reconstructing
spaces in text which is emitted without them. Added missing_spaces tests for this.

0.0.8 2014-02-06
Added script to convert from WMF to input for test program.

Set a few values explicitly on clear/initialize (which should not have
mattered in an actual run).

Expanded upstream test so that it also rejects LR text drawn RL and vice versa.
This can happen if the input's text direction is corrupt or just wrong. With this
change these do not assemble and so the glyphs stay in the same place. Previously
they did assemble, and the SVG viewer would draw them in the indicated (wrong) direction.
Added bug test files for this case and added it to test_examples.sh.

0.0.7 2013-05-14
Added support for R->L languages, tested with Hebrew. (Thanks to
Aharon Varady for providing some test files!) Ambiguous RTL and LTR
combinations (like logical order {RTL, LTR} with physical positions {L,R})
do not assemble.

Added support for Mark, nonspacing glyphs. The glyphs with this
property are indicated in a table. This information
was not being returned by Freetype, which was resulting in incorrect
width calculations.

Added support for font failover, so that it now searches down through fonts
for a glyph for a character if none is present in the primary font.

Worked around bizarre gcc optimization bug, where (*a <= b) was
testing false when doubles *a and b had exactly the same value.
(This was due to excess double bits being kept in one case, and discarded
on store to 64 bits of memory in the other.)

Added "const" in functions, where possible.

Expanded Text Decorations to support CSS3.

0.0.6 2013-02-12 Added options for background color. Modes are:
0 no background
1 each input text fragment is underwritten with background color.
2 each assembled line is underwritten with background color.
3 entire assembly is underwritten with background color.
Previously mode 0 was the only output possible.

Added text decorations. (Underline, strike-through, etc.) Not very many
SVG implementations handle these properly, but Opera does.

0.0.5 2013-02-19 Changed type of text color from uint32_t to a struct
to eliminate endian problems.

0.0.4 2013-01-24 Added overlap restriction for successive text when
building a line, so that only well structured lines are assembled.
Grossly misformatted text read in, for instance, with a word written
over the text at the front of a line, should not now be assembled
into a single line, as it was previously.

Slightly modified calculations of asc/dsc so that for bounding box
it uses actual values for text, but for calculating offset as a function
of text alignment it uses a standard set of characters "fFyg|`^". Previously
there were some instances where the text specific asc/dsc were different
enough from the "font" one that the text might move slightly.

Modified convert_reademf_text.sh to accept new output syntax
of reademf from libUEMF 0.1.0.

0.0.3 2012-12-12 First release.
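
The locale handling described in the 0.0.16 entry above follows a common
C pattern: save the current LC_NUMERIC setting, switch to the POSIX
locale while formatting numbers, then restore the saved setting. The
sketch below shows that pattern with standard library calls only. It is
not the code in TR_layout_2_svg(); the function name ex_format_posix and
the "%.2f" format are assumptions made for the example.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libTERE.
   Formats value with '.' as the decimal separator whatever the caller's
   LC_NUMERIC is. Returns the result of snprintf(). */
int ex_format_posix(char *buf, size_t size, double value)
{
    assert(buf != NULL && size > 0);

    /* The string returned by setlocale() may be overwritten by a later
       call, so keep a private copy of the current setting. */
    const char *current = setlocale(LC_NUMERIC, NULL);
    char *saved = current ? malloc(strlen(current) + 1) : NULL;
    if (saved) strcpy(saved, current);

    setlocale(LC_NUMERIC, "POSIX");
    int n = snprintf(buf, size, "%.2f", value);

    /* Restore the caller's locale. If the copy could not be made the
       locale is left as POSIX. */
    if (saved) {
        setlocale(LC_NUMERIC, saved);
        free(saved);
    }
    return n;
}
```

Note that setlocale() changes process-wide state, so this pattern is not
safe if other threads format or parse numbers at the same time.
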
----------------------------------------------------------------
Feedback etc.

Please send comments and patches to David Mathog at mathog@caltech.edu.

Source: README, updated 2015-05-21

libTERE Files

libTERE is a portable text reassembler.
