#6 Some words are not being built correctly

Status: open
Labels: Documents (10)
Priority: 5
Updated: 2003-04-29
Created: 2003-03-20
Creator: Tad Woods
Private: No

In the Fischer Furniture and PIASC applications, some
words are being split across multiple elements, broken
into individual characters, and/or losing characters.
(Note: These words were being formed correctly before
a similar problem was fixed for Docuforce.)

The following from the PRODUCER box (in the PIASC
application) should be "PRODUCER":

<TEXT x="21" y="748"><FONT face="Helvetica-Bold" style="font-size:6pt">PRD</FONT></TEXT>
<TEXT x="47" y="748"><FONT face="Helvetica-Bold" style="font-size:6pt">ER</FONT></TEXT>

The following from the PRODUCER box should be
"323-728-0483":

<TEXT x="114" y="737"><FONT face="Courier-Bold" style="font-size:10pt">3</FONT></TEXT>
<TEXT x="126" y="737"><FONT face="Courier-Bold" style="font-size:10pt">3</FONT></TEXT>
<TEXT x="150" y="737"><FONT face="Courier-Bold" style="font-size:10pt">8</FONT></TEXT>
<TEXT x="162" y="737"><FONT face="Courier-Bold" style="font-size:10pt">0</FONT></TEXT>
<TEXT x="174" y="737"><FONT face="Courier-Bold" style="font-size:10pt">83</FONT></TEXT>
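
The phone-number fragments above can be checked against a fixed-width grid to see which characters were dropped. The following is only a diagnostic sketch, not how pdfToXml works: it assumes the x values are in points, that Courier glyphs advance 0.6 em (6pt at 10pt), and that the first fragment holds the first character of the string. The function name `layout_on_grid` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Fragments copied from the report; the <ROOT> wrapper is added only so the
# sample parses as a single XML document.
SAMPLE = """<ROOT>
<TEXT x="114" y="737"><FONT face="Courier-Bold" style="font-size:10pt">3</FONT></TEXT>
<TEXT x="126" y="737"><FONT face="Courier-Bold" style="font-size:10pt">3</FONT></TEXT>
<TEXT x="150" y="737"><FONT face="Courier-Bold" style="font-size:10pt">8</FONT></TEXT>
<TEXT x="162" y="737"><FONT face="Courier-Bold" style="font-size:10pt">0</FONT></TEXT>
<TEXT x="174" y="737"><FONT face="Courier-Bold" style="font-size:10pt">83</FONT></TEXT>
</ROOT>"""

EXPECTED = "323-728-0483"
FONT_SIZE_PT = 10
ADVANCE_PT = 0.6 * FONT_SIZE_PT  # assumption: Courier advance width is 0.6 em


def layout_on_grid(xml_text, advance, length):
    """Place each fragment's characters into fixed-width slots; '?' marks a gap."""
    fragments = [
        (float(t.get("x")), t.find("FONT").text)
        for t in ET.fromstring(xml_text).iter("TEXT")
    ]
    origin = min(x for x, _ in fragments)  # assumed position of character 0
    slots = ["?"] * length
    for x, chars in fragments:
        start = round((x - origin) / advance)
        for offset, ch in enumerate(chars):
            slots[start + offset] = ch
    return "".join(slots)


recovered = layout_on_grid(SAMPLE, ADVANCE_PT, len(EXPECTED))
missing = [i for i, ch in enumerate(recovered) if ch == "?"]
print(recovered)  # 3?3???8?0?83
print(missing)    # [1, 3, 4, 5, 7, 9]
```

Under those assumptions every surviving character lands on the slot it occupies in "323-728-0483", which suggests the positions are right and the remaining six characters were dropped rather than misplaced.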

The following from the bottom of the page should
be "ACORD 125 (7/98)":

<UNENCLOSED x="27.0" y="41.0"><TEXT x="27" y="41"><FONT face="Helvetica-Bold" style="font-size:8pt">C</FONT></TEXT></UNENCLOSED>
<UNENCLOSED x="32.0" y="41.0"><TEXT x="32" y="41"><FONT face="Helvetica-Bold" style="font-size:8pt">O</FONT></TEXT></UNENCLOSED>
<UNENCLOSED x="61.0" y="41.0"><TEXT x="61" y="41"><FONT face="Helvetica-Bold" style="font-size:8pt">5</FONT></TEXT></UNENCLOSED>
<UNENCLOSED x="75.0" y="41.0"><TEXT x="75" y="41"><FONT face="Helvetica-Bold" style="font-size:8pt">/</FONT></TEXT></UNENCLOSED>
<UNENCLOSED x="77.0" y="41.0"><TEXT x="77" y="41"><FONT face="Helvetica-Bold" style="font-size:8pt">9</FONT></TEXT></UNENCLOSED>
<UNENCLOSED x="81.0" y="41.0"><TEXT x="81" y="41"><FONT face="Helvetica-Bold" style="font-size:8pt">8</FONT></TEXT></UNENCLOSED>

Discussion

  • Douglas Sellers

    Douglas Sellers - 2003-03-25

    Logged In: YES
    user_id=335059

    Damn. I must have broken something. I will take a look at
    this.

     
  • Douglas Sellers

    Douglas Sellers - 2003-03-25
    • priority: 5 --> 3
    • assigned_to: nobody --> douglasjsellers
     
  • Tad Woods

    Tad Woods - 2003-03-28

    Logged In: YES
    user_id=722301

    Pay careful attention to page 2 of the ACORD 125 form as
    corrections are made to fix the word splitting; the number
    of occurrences of incorrectly split and incorrectly combined
    words is much higher on the second page.

     
  • Douglas Sellers

    Douglas Sellers - 2003-04-09

    Logged In: YES
    user_id=335059

    I have fixed some of this problem, but the unenclosed data is
    still being handled incorrectly: it is not being combined
    at all.

     
  • Douglas Sellers

    Douglas Sellers - 2003-04-21

    Logged In: YES
    user_id=335059

    O.k. Take a look at the results of this now. Certainly the
    non-combining of the unenclosed text problem has been largely
    solved (as in it does it now :-p). I looked at the problems that
    you describe on page 2 of the ACORD 125 form and found that
    some of the data that looks like it is not combined correctly
    is in fact text that is aligned vertically, not horizontally.
    Currently I make no attempt to correctly combine words
    vertically. An example of this is on page 2, the words
    "Commercial General Liability". These words are aligned
    vertically and come out a bit jumbled, but since they are not
    user data at all I didn't think that it mattered. Please correct
    me if I am wrong on this matter.

    If there are still problems related to these bugs, can you post
    some more examples? If there are no more problems, can
    you please move this bug to fixed?
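
The horizontal-versus-vertical distinction described above can be illustrated with a small check on fragment origins. This is a rough sketch, not the converter's actual logic: the `orientation` helper and its 1pt tolerance are assumptions, the footer coordinates are the ones quoted earlier in this ticket, and the vertical label coordinates are hypothetical.

```python
def orientation(points, tolerance=1.0):
    """Classify a run of (x, y) fragment origins as horizontal, vertical, or mixed."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    same_y = max(ys) - min(ys) <= tolerance  # fragments share a baseline
    same_x = max(xs) - min(xs) <= tolerance  # fragments are stacked in a column
    if same_y and not same_x:
        return "horizontal"
    if same_x and not same_y:
        return "vertical"
    return "mixed"


# Origins of the "ACORD 125 (7/98)" footer fragments quoted in this ticket.
footer = [(27, 41), (32, 41), (61, 41), (75, 41), (77, 41), (81, 41)]

# Hypothetical origins for a vertically set label such as the one on page 2.
vertical_label = [(300, 500), (300, 492), (300, 484)]

print(orientation(footer))          # horizontal
print(orientation(vertical_label))  # vertical
```

A run classified as vertical could then be skipped or handled separately instead of being fed to the horizontal word combiner.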

     
  • Tad Woods

    Tad Woods - 2003-04-21
    • priority: 3 --> 5
     
  • Tad Woods

    Tad Woods - 2003-04-21

    Logged In: YES
    user_id=722301

    The combining of the unenclosed text is looking much better.
    The PIASC application is now parsable. I also agree with your
    analysis of page 2; I was looking for examples and picked
    that one too quickly.

    One small issue with PIASC is that the text in the
    unenclosed fields is sometimes combined too tightly. In the
    following example there should be a space between
    GENERAL and INFORMATION. If necessary I can deal with
    this in the parsing as long as it only occurs in static label
    fields.

    <UNENCLOSED x="21.0" y="268.0">
    <TEXT x="21" y="268">
    <FONT face="Helvetica-Bold" style="font-size:8pt">GENERALINFORMATION</FONT>
    </TEXT>
    </UNENCLOSED>
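
One way to keep a word break like the one missing in GENERALINFORMATION is to insert a space whenever the horizontal gap between two fragments exceeds some fraction of the font size. The sketch below assumes each fragment's width is known, which the XML in this ticket does not show; the `join_fragments` helper, the 0.25 ratio, and the sample coordinates are all hypothetical.

```python
def join_fragments(fragments, font_size, space_ratio=0.25):
    """Join (x, width, text) fragments on one baseline, adding a space for wide gaps."""
    ordered = sorted(fragments)  # left to right by x
    pieces = [ordered[0][2]]
    for (prev_x, prev_w, _), (x, _, text) in zip(ordered, ordered[1:]):
        gap = x - (prev_x + prev_w)
        if gap > space_ratio * font_size:
            pieces.append(" ")  # gap is wide enough to be a word break
        pieces.append(text)
    return "".join(pieces)


# Hypothetical 8pt fragments: a 3pt gap separates the two words.
label = [(21, 36, "GENERAL"), (60, 55, "INFORMATION")]
print(join_fragments(label, font_size=8))  # GENERAL INFORMATION

# Hypothetical fragments of a single word: no gap, so no space is added.
word = [(21, 20, "GENE"), (41, 16, "RAL")]
print(join_fragments(word, font_size=8))   # GENERAL
```

The threshold would need tuning per font, since proportional fonts such as Helvetica have widely varying glyph widths.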

     
  • Tad Woods

    Tad Woods - 2003-04-24
    • priority: 5 --> 4
     
  • Tad Woods

    Tad Woods - 2003-04-29

    Logged In: YES
    user_id=722301

    Another example of words being incorrectly combined can be
    found in the General Information section of the Array
    Biopharma application.

    INCORRECT:
    <BOX x1="18" y1="258" x2="280" y2="245">
    <TEXT x="21" y="252">
    <FONT face="Arial" style="font-size:6pt;font-style:Bold">EXPLAIN ALL "YES" RESPONSESYESNOEXPLAIN ALL "YES" RESPONSESYESNO</FONT>
    </TEXT>
    </BOX>

    WHAT IT SHOULD BE:

    <BOX x1="18" y1="258" x2="280" y2="245">
    <TEXT x="21" y="252">
    <FONT face="Arial" style="font-size:6pt;font-style:Bold">EXPLAIN ALL "YES" RESPONSES</FONT>
    </TEXT>
    </BOX>
    <BOX x1="18" y1="258" x2="280" y2="233"/>
    <BOX x1="18" y1="258" x2="280" y2="221"/>
    <BOX x1="18" y1="258" x2="280" y2="209"/>
    <BOX x1="18" y1="258" x2="280" y2="197"/>
    <BOX x1="18" y1="258" x2="280" y2="185"/>
    <BOX x1="18" y1="258" x2="280" y2="173"/>
    <BOX x1="18" y1="258" x2="592" y2="125"/>
    <BOX x1="18" y1="258" x2="592" y2="65"/>
    <BOX x1="18" y1="258" x2="592" y2="41"/>
    <BOX x1="281" y1="258" x2="294" y2="245">
    <TEXT x="282" y="252">
    <FONT face="Arial" style="font-size:6pt;font-style:Bold">YES</FONT>
    </TEXT>
    </BOX>
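
The expected output above places each piece of text in the tightest box around it. A minimal sketch of that idea, assuming a fragment belongs to the smallest-area box containing its origin (an illustration only, not necessarily how pdfToXml assigns text), using a subset of the box coordinates from this example:

```python
def contains(box, x, y):
    """True if point (x, y) lies inside box (x1, y1, x2, y2), corners in any order."""
    x1, y1, x2, y2 = box
    return min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2)


def area(box):
    x1, y1, x2, y2 = box
    return abs(x2 - x1) * abs(y2 - y1)


def smallest_enclosing_box(boxes, x, y):
    """Return the smallest-area box containing (x, y), or None if there is none."""
    candidates = [b for b in boxes if contains(b, x, y)]
    return min(candidates, key=area) if candidates else None


# A subset of the BOX elements from the expected output above.
BOXES = [
    (18, 258, 280, 245),
    (18, 258, 280, 233),
    (18, 258, 592, 125),
    (281, 258, 294, 245),
]

print(smallest_enclosing_box(BOXES, 21, 252))   # (18, 258, 280, 245)
print(smallest_enclosing_box(BOXES, 282, 252))  # (281, 258, 294, 245)
```

With this rule the "YES" fragment at x=282 lands in its own narrow box rather than being appended to the "EXPLAIN ALL" label.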

     
  • Tad Woods

    Tad Woods - 2003-04-29
    • priority: 4 --> 5
     
  • Tad Woods

    Tad Woods - 2003-04-30

    Logged In: YES
    user_id=722301

    I have run 20 applications containing ACORD 125 (7/98)
    through "pdfToXml" and the "data parser," where the "data
    parser" tries to recognize pages 1 and 2 of this specific
    application. Here are statistics from that sample set:

    3 work 100%
    3 work almost 100% but exhibit minor bug 721941
    1 doesn't work at all due to bug 725697
    13 work for most of page 1 and none of page 2 due to bugs
    725645 and/or 706939 (this bug)
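
As a quick check on the figures above, the four counts can be tallied and expressed as shares of the sample. The counts are copied from the comment; the short labels are paraphrases.

```python
# Counts copied from the comment above; labels are shortened paraphrases.
results = {
    "work 100%": 3,
    "almost 100% (bug 721941)": 3,
    "fails entirely (bug 725697)": 1,
    "page 1 mostly, page 2 not (bugs 725645 / 706939)": 13,
}

total = sum(results.values())
shares = {label: 100 * count / total for label, count in results.items()}

print(total)  # 20
for label, pct in shares.items():
    print(f"{pct:5.1f}%  {label}")
```

The counts add up to the 20 applications tested, and 13 of them (65%) are blocked on page 2.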

     
