#56 support for large files

Status: open
Owner: nobody
Labels: Feature (36)
Priority: 5
Updated: 2006-06-06
Created: 2006-06-06
Creator: Dan Nelson
Private: No

Currently, joe's internals use "long"s for file
offsets, which prevents files over 2 GB from being
loaded correctly on 32-bit platforms. This patch
converts them to off_t, and also adjusts any printf or
scanf specifiers to use %lld instead of %ld. This
isn't strictly the right way to do it, but an off_t
happens to be the same size as a long long when large
files are enabled on all the systems I tested on: AIX
(32&64-bit), Cygwin, FreeBSD, Linux, Solaris
(32&64-bit), and Tru64.

You may think "why in the world would he want to edit a
file larger than 2 GB", but they do exist, and with a
fast enough RAID array, editing them is very usable. You can
also use joe as a poor-man's disk editor if you switch
to hex mode and use the "filename,offset,length" syntax
to edit small parts of a large disk.
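
As a rough illustration of the "filename,offset,length" form mentioned above, the C sketch below splits such a specification into its three parts. It is not joe's actual parser: struct file_range and parse_file_range are hypothetical names, and accepting hex or octal numbers (strtoll with base 0) is an assumption made for the disk-editing use case.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for a parsed "name,offset,length" spec. */
struct file_range {
    char name[256];
    long long offset;
    long long length;
};

/* Split spec at its last two commas. Returns 0 on success, -1 on error. */
int parse_file_range(const char *spec, struct file_range *out)
{
    const char *last = strrchr(spec, ',');
    if (!last || last == spec)
        return -1;

    /* Walk backwards from the last comma to find the one before it. */
    const char *first = last - 1;
    while (first > spec && *first != ',')
        first--;
    if (*first != ',' || first == spec)
        return -1;                      /* missing comma or empty name */

    size_t name_len = (size_t)(first - spec);
    if (name_len >= sizeof out->name)
        return -1;

    char *end;
    errno = 0;
    long long offset = strtoll(first + 1, &end, 0);
    if (errno || end == first + 1 || end != last)
        return -1;                      /* offset missing or malformed */

    long long length = strtoll(last + 1, &end, 0);
    if (errno || end == last + 1 || *end != '\0')
        return -1;                      /* length missing or malformed */

    if (offset < 0 || length < 0)
        return -1;

    memcpy(out->name, spec, name_len);
    out->name[name_len] = '\0';
    out->offset = offset;
    out->length = length;
    return 0;
}
```

Using long long here keeps offsets past 2 GB representable even where long is 32 bits wide.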

Discussion

  • Dan Nelson

    Dan Nelson - 2006-06-06
     
  • Joe Allen

    Joe Allen - 2006-06-07

    I've been meaning to at least switch to unsigned long, but I
    wanted to review all of the comparison math. There could be
    bugs if there is any code like: if (a - b < 0) (which is
    wrong anyway, but I wanted to check).
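
To illustrate the comparison concern, here is a minimal C sketch (not taken from joe's source; both function names are made up for the example). With unsigned operands the subtraction wraps around rather than going negative, so a test written as (a - b < 0) can never be true, while comparing the operands directly behaves as intended.

```c
#include <stdbool.h>

/* With unsigned operands the subtraction wraps instead of going negative,
 * so the result of a - b is never less than zero. */
unsigned long wrapped_difference(unsigned long a, unsigned long b)
{
    return a - b;
}

/* Comparing the operands directly avoids the subtraction entirely and
 * works the same way for signed and unsigned types. */
bool offset_before(unsigned long a, unsigned long b)
{
    return a < b;
}
```

Calling wrapped_difference(1, 2) yields the largest unsigned long value, not -1, which is why every such comparison would need review before a switch to an unsigned type.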

    The other problem is "%lld": I don't know how backward
    compatible this is. If old systems cannot handle it, we
    need two format strings and #ifs.

    Another large file enhancement is this: JOE uses a swap
    file, so when you open a large file it gets copied to the
    swap file. This is slow if for no other reason than that it
    causes the drive head to move all over the place. It would
    be far better to use a copy-on-write scheme and reload from
    the original file as needed. The main issue with this is
    how to write any changes back to the original file: write to
    a new file, or try to be smarter...

    Another issue is that JOE needs to know how many lines are
    in the file, so it has to scan it at least once. This is
    more difficult to fix.

    I have a friend who works for a company whose product
    analyzes large log files. He claims that find in JOE is
    actually faster than grep, even though JOE currently copies
    the file. Anyway, he has been after me for better large
    file support as well. He also wants folding (or at least
    showing only the lines which match a pattern).

     
  • Joe Allen

    Joe Allen - 2006-06-07

    Anyway, I've taken the patch.

     
  • Dan Nelson

    Dan Nelson - 2006-06-07
     
  • Dan Nelson

    Dan Nelson - 2006-06-07

    Yes, older systems could be an issue. The gnome people use
    a fancy NEON_FORMAT autoconf macro that runs through a bunch
    of formats looking for compiler warnings, but I think just
    checking SIZEOF_* should suffice.

    How about the attached patch? I swapped the int and long
    checks because on short-file systems off_t is usually a long
    anyway. It lets me build without warnings on Linux, AIX,
    and Solaris when I configure with "CC=gcc CFLAGS=-Wformat
    ./configure --disable-largefile".
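
A minimal sketch of what selecting the format from SIZEOF_* checks could look like is shown below. It is not the attached patch. OFF_T_FMT, off_print_t and print_offset are hypothetical names, and the fallback SIZEOF_* definitions are assumptions that stand in for the values autoconf would normally place in config.h. The long check comes before the int check, as described above.

```c
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Assumed fallbacks; a real build would get these from config.h. */
#ifndef SIZEOF_LONG
# if LONG_MAX == 2147483647L
#  define SIZEOF_LONG 4
# else
#  define SIZEOF_LONG 8
# endif
#endif
#ifndef SIZEOF_INT
# define SIZEOF_INT 4
#endif
#ifndef SIZEOF_OFF_T
# define SIZEOF_OFF_T 8
#endif

/* Pick a format string and a matching argument type for off_t. */
#if SIZEOF_OFF_T == SIZEOF_LONG
# define OFF_T_FMT "%ld"
typedef long off_print_t;
#elif SIZEOF_OFF_T == SIZEOF_INT
# define OFF_T_FMT "%d"
typedef int off_print_t;
#else
# define OFF_T_FMT "%lld"
typedef long long off_print_t;
#endif

/* The cast keeps the argument type in step with the chosen format. */
int print_offset(char *buf, size_t size, off_t offset)
{
    return snprintf(buf, size, "Offset " OFF_T_FMT, (off_print_t)offset);
}
```

Because the format is glued into the string literal at compile time, the resulting string differs between platforms, which is the source of the gettext problem raised next.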

    Bah. I just noticed that will break gettext for the status
    line, since the msgstr can change for each platform. I
    wonder how other programs print off_t's with gettext.
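
One common way around this, sketched here under assumptions rather than taken from any particular program, is to format the number into a temporary buffer first and pass it to the translated message as a string, so the msgid stays identical on every platform. The _() macro below is a stand-in that returns its argument so the example builds without libintl, and status_offset is a hypothetical name.

```c
#include <stdio.h>
#include <sys/types.h>

/* Stand-in for gettext's _() so the sketch builds without libintl. */
#define _(msgid) (msgid)

/* Build a status message whose translatable part never mentions a
 * platform-dependent integer format. */
int status_offset(char *out, size_t size, off_t offset)
{
    char num[32];

    /* The only format that touches the integer is not translated. */
    snprintf(num, sizeof num, "%lld", (long long)offset);

    /* The translatable string only ever contains %s. */
    return snprintf(out, size, _("Offset %s"), num);
}
```

The cast to long long assumes the compiler supports that type, which is the same backward-compatibility question raised earlier in the thread.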

    A COW swap file would definitely speed up our usual use-case
    (joe a humungous file, jump to a specific line number, edit
    a record, save), and shouldn't actually be too difficult to
    implement, if I understand vfile.c right. If you add two
    fields to struct vpage recording the original file offset
    and whether the page has ever been dirtied, you can point
    back into the original file for pages that haven't been
    written to. Once it's been touched you'll have to write it
    to the swapfile as usual.
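
The two-field idea could be sketched roughly as follows. This is a simplified, hypothetical structure, not joe's real struct vpage (whose other fields are omitted), and all names are invented for the example. A page that has never been dirtied is read from its recorded offset in the original file, and a dirtied page is read from the swap file.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical, cut-down page descriptor. */
struct cow_page {
    off_t swap_addr;   /* where the page lives in the swap file */
    off_t orig_addr;   /* where the page came from in the original file */
    bool  dirtied;     /* has the page ever been modified? */
};

enum page_source { FROM_ORIGINAL, FROM_SWAP };

struct page_location {
    enum page_source source;
    off_t offset;
};

/* Decide where the current contents of a page must be read from. */
struct page_location page_locate(const struct cow_page *p)
{
    struct page_location loc;

    if (p->dirtied) {
        loc.source = FROM_SWAP;
        loc.offset = p->swap_addr;
    } else {
        loc.source = FROM_ORIGINAL;
        loc.offset = p->orig_addr;
    }
    return loc;
}

/* Called on the first write: from now on the page lives in the swap file. */
void page_mark_dirty(struct cow_page *p)
{
    p->dirtied = true;
}
```

This leaves open how changes get written back to the original file, which is the harder question raised earlier in the thread.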

     
  • Dan Nelson

    Dan Nelson - 2007-02-09

    It looks like I may have missed an int->off_t conversion somewhere; directly joeing a large file on Linux ends up with a joe.tmp.* file a little under 2gb in size, and jumping to the end of the file and hitting "^K space" says "Offset 2147483646(0x7ffffffe)"

    (joe 3.5)

     
  • Dan Nelson

    Dan Nelson - 2007-02-14

    more long -> off_t conversions

     
  • Dan Nelson

    Dan Nelson - 2007-02-14

    Ok, I missed a whole bunch of long->off_t conversions :) My original patch just fixed enough variables to allow partial-file viewing past the 2gb mark in a large file.

    Basically any variable dealing with P->byte, H->seg, or VPAGE->addr needs to be an off_t. Sun Studio's lint came in handy here, since it has a warning for storing large integer types in smaller variables.
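
As an illustration of the narrowing problem that lint flags, the C sketch below checks whether a 64-bit offset fits in a 32-bit signed variable before storing it. The function names are hypothetical, and fixed-width types are used only so the example behaves the same on 32-bit and 64-bit hosts.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a 64-bit file offset can be stored in a 32-bit signed
 * variable without losing information. */
bool fits_in_int32(int64_t offset)
{
    return offset >= INT32_MIN && offset <= INT32_MAX;
}

/* Narrow an offset, clamping to the 32-bit range instead of silently
 * truncating it. */
int32_t narrow_offset(int64_t offset)
{
    if (offset > INT32_MAX)
        return INT32_MAX;
    if (offset < INT32_MIN)
        return INT32_MIN;
    return (int32_t)offset;
}
```

An offset of 3000000000 fails the check, which is the situation any leftover long variable would hit on a 32-bit platform.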

    I did leave row, column, and line variables as longs, though, so joe will still have problems with a 3gb file of spaces or newlines.

    File Added: largefiles3.diff

     
