In the Fischer Furniture and PIASC applications, some
words are being split across multiple elements, broken
into individual characters, and/or losing characters.
(Note: These words were being formed correctly before a
similar problem was fixed for Docuforce.)
The following from the PRODUCER box (in the PIASC
application) should be "PRODUCER":
<TEXT x="21" y="748"><FONT face="Helvetica-Bold"
style="font-size:6pt">PRD</FONT></TEXT>
<TEXT x="47" y="748"><FONT face="Helvetica-Bold"
style="font-size:6pt">ER</FONT></TEXT>
The following from the PRODUCER box should be "323-728-0483":
<TEXT x="114" y="737"><FONT face="Courier-Bold"
style="font-size:10pt">3</FONT></TEXT>
<TEXT x="126" y="737"><FONT face="Courier-Bold"
style="font-size:10pt">3</FONT></TEXT>
<TEXT x="150" y="737"><FONT face="Courier-Bold"
style="font-size:10pt">8</FONT></TEXT>
<TEXT x="162" y="737"><FONT face="Courier-Bold"
style="font-size:10pt">0</FONT></TEXT>
<TEXT x="174" y="737"><FONT face="Courier-Bold"
style="font-size:10pt">83</FONT></TEXT>
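As a diagnostic sketch (not part of pdfToXml): Courier is a fixed-pitch font whose glyphs advance 0.6 em, so at 10pt each character occupies a 6pt slot. Assuming the first fragment sits in slot 0, the x positions above can be mapped back to character slots to show where characters were dropped. The function name `slot_layout` and the `?` placeholder are hypothetical.

```python
def slot_layout(fragments, font_size, advance_ratio=0.6, missing="?"):
    """Rebuild a fixed-pitch line from (x, text) fragments on one baseline.

    Assumes a monospaced font (advance = advance_ratio * font_size) and
    that the leftmost fragment starts in slot 0. Slots that no fragment
    covers are filled with the `missing` marker.
    """
    advance = font_size * advance_ratio
    ordered = sorted(fragments)
    origin = ordered[0][0]
    slots = {}
    for x, text in ordered:
        start = round((x - origin) / advance)
        for offset, ch in enumerate(text):
            slots[start + offset] = ch
    return "".join(slots.get(i, missing) for i in range(max(slots) + 1))


# The five fragments reported above (y="737", Courier-Bold, 10pt).
phone_fragments = [(114, "3"), (126, "3"), (150, "8"), (162, "0"), (174, "83")]
print(slot_layout(phone_fragments, font_size=10))  # 3?3???8?0?83
```

The result lines up slot for slot with the expected 323-728-0483, which suggests the surviving characters are positioned correctly and the others are dropped rather than misplaced.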
The following from the bottom of the page should
be "ACORD 125 (7/98)":
<UNENCLOSED x="27.0" y="41.0"><TEXT x="27" y="41"><FONT
face="Helvetica-Bold" style="font-size:8pt">C</FONT></TEXT></UNENCLOSED>
<UNENCLOSED x="32.0" y="41.0"><TEXT x="32" y="41"><FONT
face="Helvetica-Bold" style="font-size:8pt">O</FONT></TEXT></UNENCLOSED>
<UNENCLOSED x="61.0" y="41.0"><TEXT x="61" y="41"><FONT
face="Helvetica-Bold" style="font-size:8pt">5</FONT></TEXT></UNENCLOSED>
<UNENCLOSED x="75.0" y="41.0"><TEXT x="75" y="41"><FONT
face="Helvetica-Bold" style="font-size:8pt">/</FONT></TEXT></UNENCLOSED>
<UNENCLOSED x="77.0" y="41.0"><TEXT x="77" y="41"><FONT
face="Helvetica-Bold" style="font-size:8pt">9</FONT></TEXT></UNENCLOSED>
<UNENCLOSED x="81.0" y="41.0"><TEXT x="81" y="41"><FONT
face="Helvetica-Bold" style="font-size:8pt">8</FONT></TEXT></UNENCLOSED>
Logged In: YES
user_id=335059
Damn. I must have broken something. I will take a look at
this.
Logged In: YES
user_id=722301
Pay careful attention to page 2 of the ACORD 125 form as
corrections are made to fix the word splitting, since the number
of occurrences of incorrectly split and incorrectly combined
words is much higher on the second page.
Logged In: YES
user_id=335059
I have fixed some of this problem, but the unenclosed data is
still being handled incorrectly; in fact, the unenclosed data is
not being combined at all.
Logged In: YES
user_id=335059
OK. Take a look at the results of this now. The problem of
unenclosed text not being combined has been largely solved
(as in, it does combine now :-p). I looked at the problems that
you describe on page 2 of the ACORD 125 form and found that
some of the data that looks like it is not combined correctly
is in fact text that is aligned vertically, not horizontally.
Currently I make no attempt to combine words correctly when they
run vertically. An example of this is on page 2, the words
"Commercial General Liability". These words are aligned
vertically and come out a bit jumbled, but since they are not
user data at all I didn't think that it mattered. Please correct
me if I am wrong on this matter.
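A minimal sketch of how a vertically aligned run might be recognized so that it can be skipped or handled separately, assuming each fragment is reduced to its (x, y) origin as in the TEXT elements. The helper name `is_vertical_run` and the 1pt tolerance are hypothetical; this is not what pdfToXml currently does.

```python
def is_vertical_run(origins, tolerance=1.0):
    """Return True when fragment origins stack vertically.

    `origins` is a list of (x, y) pairs. A run counts as vertical when
    all x values agree within `tolerance` points while the y values
    spread by more than `tolerance`.
    """
    if len(origins) < 2:
        return False
    xs = [x for x, _ in origins]
    ys = [y for _, y in origins]
    return (max(xs) - min(xs) <= tolerance) and (max(ys) - min(ys) > tolerance)


# Made-up coordinates for illustration only.
print(is_vertical_run([(300, 500), (300, 492), (300, 484)]))  # stacked glyphs
print(is_vertical_run([(21, 748), (47, 748)]))                # same baseline
```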
If there are still problems related to these bugs, can you post
some more examples? If there are no more problems, can
you please move this bug to fixed?
Logged In: YES
user_id=722301
The combining of the unenclosed text is looking much better.
The PIASC application is now parsable. I also agree with your
analysis of page 2; I was looking for examples and picked
that one too quickly.
One small issue with PIASC is that the text in the
unenclosed fields is sometimes combined too tightly. In the
following example there should be a space in between
GENERAL and INFORMATION. If necessary I can deal with
this in the parsing as long as it only occurs in static label
fields.
<UNENCLOSED x="21.0" y="268.0">
<TEXT x="21" y="268">
<FONT face="Helvetica-Bold"
style="font-size:8pt">GENERALINFORMATION</FONT>
</TEXT>
</UNENCLOSED>
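A minimal sketch of gap-based joining that would keep the space in a case like this one, assuming the extractor knows each fragment's left edge and rendered width. The `Fragment` type, the `join_fragments` name, the 0.2 em threshold, and the coordinates in the example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: float      # left edge in points
    width: float  # rendered width in points (assumed to be available)
    text: str


def join_fragments(fragments, font_size, space_threshold=0.2):
    """Join fragments on one baseline, inserting a space at wide gaps.

    A space is emitted when the gap between the end of one fragment and
    the start of the next exceeds space_threshold * font_size.
    """
    if not fragments:
        return ""
    ordered = sorted(fragments, key=lambda f: f.x)
    parts = [ordered[0].text]
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur.x - (prev.x + prev.width)
        if gap > space_threshold * font_size:
            parts.append(" ")
        parts.append(cur.text)
    return "".join(parts)


# Made-up positions: a 2.5pt gap at 8pt exceeds the 1.6pt threshold.
label = [Fragment(21, 40, "GENERAL"), Fragment(63.5, 62, "INFORMATION")]
print(join_fragments(label, font_size=8))
```

The threshold would need tuning per font, since proportional fonts such as Helvetica-Bold have varying glyph widths and kerning.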
Logged In: YES
user_id=722301
Another example of words being incorrectly combined can be
found in the General Information section of the Array
Biopharma application.
INCORRECT:
<BOX x1="18" y1="258" x2="280" y2="245">
<TEXT x="21" y="252">
<FONT face="Arial"
style="font-size:6pt;font-style:Bold">EXPLAIN ALL &quot;YES&quot;
RESPONSESYESNOEXPLAIN ALL &quot;YES&quot;
RESPONSESYESNO</FONT>
</TEXT>
</BOX>
WHAT IT SHOULD BE:
<BOX x1="18" y1="258" x2="280" y2="245">
<TEXT x="21" y="252">
<FONT face="Arial"
style="font-size:6pt;font-style:Bold">EXPLAIN ALL &quot;YES&quot;
RESPONSES</FONT>
</TEXT>
</BOX>
<BOX x1="18" y1="258" x2="280" y2="233"/>
<BOX x1="18" y1="258" x2="280" y2="221"/>
<BOX x1="18" y1="258" x2="280" y2="209"/>
<BOX x1="18" y1="258" x2="280" y2="197"/>
<BOX x1="18" y1="258" x2="280" y2="185"/>
<BOX x1="18" y1="258" x2="280" y2="173"/>
<BOX x1="18" y1="258" x2="592" y2="125"/>
<BOX x1="18" y1="258" x2="592" y2="65"/>
<BOX x1="18" y1="258" x2="592" y2="41"/>
<BOX x1="281" y1="258" x2="294" y2="245">
<TEXT x="282" y="252">
<FONT face="Arial"
style="font-size:6pt;font-style:Bold">YES</FONT>
</TEXT>
</BOX>
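One way to express the expected result: the boxes here are nested (many share the corner x1=18, y1=258), so a text origin can fall inside several of them. A minimal sketch, assuming each text should be assigned to the smallest box that contains its origin. The helper names are hypothetical, and this is not necessarily how pdfToXml assigns text to boxes.

```python
def box_area(box):
    """Area of a box given as (x1, y1, x2, y2), in either corner order."""
    x1, y1, x2, y2 = box
    return abs(x2 - x1) * abs(y1 - y2)


def contains(box, x, y):
    """True when point (x, y) lies inside or on the edge of the box."""
    x1, y1, x2, y2 = box
    return min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2)


def smallest_enclosing_box(boxes, x, y):
    """Pick the smallest-area box containing (x, y), or None if none does."""
    candidates = [b for b in boxes if contains(b, x, y)]
    return min(candidates, key=box_area) if candidates else None


# A subset of the boxes listed above.
boxes = [
    (18, 258, 280, 245),
    (18, 258, 280, 233),
    (18, 258, 592, 125),
    (281, 258, 294, 245),
]
print(smallest_enclosing_box(boxes, 21, 252))   # (18, 258, 280, 245)
print(smallest_enclosing_box(boxes, 282, 252))  # (281, 258, 294, 245)
```

Under this rule the label at x="21" and the YES at x="282" land in separate boxes, which matches the expected output above.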
Logged In: YES
user_id=722301
I have run 20 applications containing ACORD 125 (7/98)
through "pdfToXml" and the "data parser," where the "data
parser" tries to recognize pages 1 and 2 of this specific
application. Here are statistics from that sample set:
3 work 100%
3 work almost 100% but exhibit minor bug 721941
1 doesn't work at all due to bug 725697
13 work for most of page 1 and none of page 2 due to bugs
725645 and/or 706939 (this bug)