
#181 Leading spaces in text objects discarded in output files

Milestone: fig2dev
Status: pending
Owner: nobody
Labels: None
Updated: 2024-12-31
Created: 2024-12-31
Creator: Jeff Stuart
Private: No

We have noticed that when using a newer version of fig2dev (3.2.8b), text objects that have leading spaces have those spaces removed in the output files. If I run the same file through an older version (3.2.6a), the spaces are preserved. I've been looking through the change logs and haven't found an obvious change that might have led to this, though I suspect perhaps the changes in early 2020 related to text conversion (textconvert.c)?

Is this a known change in behavior? Removing the leading spaces and moving the text object over "fixes" the issue but that is not feasible due to the number of legacy files that have this issue. Can you provide any other workaround?

Discussion

  • tkl

    tkl - 2024-12-31

    Good catch. The error was introduced with commit [41b9bb] from Jan 2020, hence it should have been present since fig2dev version 3.2.8. The bug was introduced when changing line 1336 in read.c from

    n = sscanf(buf, "%*d%d%d%d%d%d%lf%lf%d%lf%lf%d%d%[^\n]",
    

    to

    n = sscanf(*line, "%*d%d%d%d%d%d%lf%lf%d%lf%lf%d%d %n",
    

    Note the space at the end of the format string, which consumes all space following the last digit in the text specification line.
    The fix is to patch the sources:

    diff --git a/fig2dev/read.c b/fig2dev/read.c
    index eab5c0e..5094495 100644
    --- a/fig2dev/read.c
    +++ b/fig2dev/read.c
    @@ -1647,7 +1647,7 @@ read_textobject(FILE *fp, char **restrict line, size_t *line_len, int *line_no)
            t->comments = NULL;
            t->next = NULL;
    
    -       n = sscanf(*line, "%*d%d%d%d%d%d%lf%lf%d%lf%lf%d%d %n",
    +       n = sscanf(*line, "%*d%d%d%d%d%d%lf%lf%d%lf%lf%d%d%n",
                            &t->type, &t->color, &t->depth, &t->pen, &t->font,
                            &t->size, &t->angle, &t->flags, &t->height, &t->length,
                            &t->base_x, &t->base_y, &i);
    @@ -1656,7 +1656,7 @@ read_textobject(FILE *fp, char **restrict line, size_t *line_len, int *line_no)
                    free(t);
                    return NULL;
            }
    -       start = *line + i;
    +       start = *line + i + 1;
            end = find_end(start, v30_flag);
    
            if (end) {
    

    A workaround would be to replace the first space character in the string with, e.g., a non-breaking space (U+00A0):

    sed '/^4/ s/\(4 \([[:graph:]]* \)\{12\}\) /\1 /' spaces.fig  >nbs.fig
    # The character after \1 is a non-breaking space.
    # Or, with GNU sed (the command above also uses GNU sed features):
    sed '/^4/ s/\(4 \([[:graph:]]* \)\{12\}\) /\1\xc2\xa0/' spaces.fig > nbs.fig
    
     

    Related

    Commit: [41b9bb]

  • Jeff Stuart

    Jeff Stuart - 2024-12-31

    Thanks for the prompt response. I spent a good amount of time staring at the sscanf line but I'm not familiar enough with that function to recognize the problem.

    I found that I need to use just \xa0 instead of \xc2\xa0 for a nbsp in my use case, I'm guessing our files aren't UTF-8 encoded or something? If I use \xc2\xa0 I end up with a "Â" in the output files in addition to the nbsp.
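    The stray "Â" is what one would expect if the two UTF-8 bytes 0xC2 0xA0 are read as two separate Latin-1 characters. A minimal sketch in C, assuming the file really is Latin-1 (the helper name latin1_to_utf8 is hypothetical and not part of fig2dev):

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of fig2dev: re-encode Latin-1 bytes as UTF-8.
 * Every Latin-1 byte is one character, so bytes >= 0x80 become two UTF-8 bytes.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 if the output buffer is too small.
 */
size_t latin1_to_utf8(const unsigned char *in, size_t in_len,
                      unsigned char *out, size_t out_size)
{
    size_t o = 0;

    for (size_t i = 0; i < in_len; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* ASCII: copied unchanged, keep one byte free for the NUL */
            if (o + 1 >= out_size)
                return (size_t)-1;
            out[o++] = c;
        } else {
            /* U+0080..U+00FF: two-byte UTF-8 sequence */
            if (o + 2 >= out_size)
                return (size_t)-1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    if (o >= out_size)
        return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

    Feeding the two bytes 0xC2 0xA0 through this function yields 0xC3 0x82 0xC2 0xA0, i.e. "Â" (U+00C2) followed by a non-breaking space (U+00A0), which matches the extra character seen in the output.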

    Anyways, with that minor change to your sed command my issue is fixed, thanks again!

     

    Last edit: Jeff Stuart 2024-12-31
  • tkl

    tkl - 2024-12-31

    In the first sscanf line, the %[^\n] format specification reads everything that is not a new-line. In the second line, %n stores the number of characters read so far through the next pointer argument. The catch is that a space in the format string matches any amount of white space, so this space also gobbled up the leading spaces of the string. In the sscanf line in the diff output, %n immediately follows the preceding %d, hence all space is preserved, including the one that separates the last number from the string. Therefore the + 1 a few lines further down.
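    The effect of the trailing space can be reproduced in isolation. The following is a minimal sketch, not the fig2dev code: it uses a shortened format with only two integers, and the function names are hypothetical.

```c
#include <stdio.h>

/*
 * Simplified stand-in for the buggy format: the space before %n matches
 * any amount of white space, so the offset points past all leading spaces.
 * Returns the offset stored by %n, or -1 if the two integers do not parse.
 */
int offset_with_space(const char *line)
{
    int a, b, n = -1;

    if (sscanf(line, "%d%d %n", &a, &b, &n) != 2)
        return -1;
    return n;
}

/*
 * Simplified stand-in for the fixed format: %n directly follows the last %d,
 * so the offset points at the separator right after the last digit.
 */
int offset_without_space(const char *line)
{
    int a, b, n = -1;

    if (sscanf(line, "%d%d%n", &a, &b, &n) != 2)
        return -1;
    return n;
}
```

    For the input "10 20   text" (one separating space plus two leading spaces), offset_with_space returns 8, so the string would start at "text". offset_without_space returns 5, and adding 1 for the separator gives "  text" with both leading spaces intact.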

    If only \xa0 is needed, your files seem to be latin-1 (or latin-something) encoded. Xfig later than 3.2.9 stores all files utf-8 encoded. Therefore, another option is to first update the fig files with a modern xfig, xfig -update f1.fig f2.fig .., and then use the above sed command, which expects utf-8. By the way, [[:graph:]] could be replaced by [-0-9.].
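    For readers who prefer not to depend on sed, the same substitution can be sketched in C. This is an illustration under assumptions, not code from fig2dev: the function name protect_leading_space and its utf8 flag are hypothetical, and the line is assumed to be a text object line with the object code 4 followed by twelve space-separated numbers, as the sed pattern expects.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of fig2dev: copy one line of a fig file to
 * out. If the line is a text object ("4 " plus twelve space-terminated
 * fields) whose string starts with a plain space, that first space is
 * replaced by a non-breaking space: 0xC2 0xA0 if utf8 is non-zero,
 * the single byte 0xA0 (Latin-1) otherwise.
 * Returns 1 if a replacement was made, 0 if the line was copied unchanged,
 * and -1 if out is too small.
 */
int protect_leading_space(const char *line, int utf8, char *out, size_t out_size)
{
    const char *nbsp = utf8 ? "\xc2\xa0" : "\xa0";
    size_t nbsp_len = utf8 ? 2 : 1;
    size_t len = strlen(line);
    const char *p = line;
    int fields = 0;

    if (strncmp(line, "4 ", 2) == 0) {
        /* Skip the twelve numeric fields, each terminated by one space. */
        p = line + 2;
        while (fields < 12) {
            const char *sp = strchr(p, ' ');
            if (sp == NULL)
                break;
            p = sp + 1;
            fields++;
        }
    }

    /* Not a text object, or the string has no leading space: plain copy. */
    if (fields != 12 || *p != ' ') {
        if (len + 1 > out_size)
            return -1;
        memcpy(out, line, len + 1);
        return 0;
    }

    size_t prefix = (size_t)(p - line);   /* bytes before the leading space */
    size_t tail = len - prefix - 1;       /* bytes after the leading space */

    if (prefix + nbsp_len + tail + 1 > out_size)
        return -1;
    memcpy(out, line, prefix);
    memcpy(out + prefix, nbsp, nbsp_len);
    memcpy(out + prefix + nbsp_len, p + 1, tail + 1); /* includes the NUL */
    return 1;
}
```

    As with the sed command, only the first leading space is replaced. That appears to be sufficient because the non-breaking space is not white space for sscanf in the default C locale, so the remaining spaces behind it are no longer skipped.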

     
  • tkl

    tkl - 2024-12-31
    • status: open --> pending
     
  • tkl

    tkl - 2024-12-31

    Fixed with commit [76276a].

     

    Related

    Commit: [76276a]

