#1517 line numbers limited to about 100,000-150,000 due to 32-bit memory

Bug
closed-invalid
None
5
2013-08-17
2013-08-15
No

please make scite/scintilla 32-bit+64-bit.
this involves avoiding int or long for array indexes in 64-bit builds and using uint64_t or int64_t instead, keeping in mind that pointers can be 32-bit or 64-bit depending on the target platform, and working with both WIN64 and LP64

http://jesusnjim.com/programming/common-compiler-defines.html

I often work with 100,000+ line html and sometimes with millions-line SQL. I have 64GB of RAM (not that I want it to be wasted, I often open about 100 files). 32-bit alone is just NOT doing it.

I need 32-bit for my old xp boxes, but I need 64-bit definitely for my new boxes.

please find a way to reduce memory overhead if possible and still make it snappy for such large files. thanks.

you can create a large test file by copy and paste with doubling.
on my dev box, it's easy to generate 100k lines simply by doing

          dir/s/b \*>testfile

then copy and paste that. it will give an out of memory error probably.
I work with stuff like that a lot.
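
The "copy and paste with doubling" step can also be scripted. This is only a minimal sketch: `makeTestText` is a hypothetical helper name, not part of SciTE or Scintilla, and it builds the text in memory rather than writing a file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: starts from one line and doubles the text
// (like select-all, copy, paste) until it has at least minLines lines.
std::string makeTestText(const std::string& seedLine, std::size_t minLines) {
    assert(!seedLine.empty());
    std::string text = seedLine + "\n";
    std::size_t lines = 1;
    while (lines < minLines) {
        const std::string copy = text;  // copy first to avoid self-append aliasing
        text += copy;
        lines *= 2;
    }
    return text;
}
```

Because the line count doubles each pass, asking for 100,000 lines takes 17 passes and yields 131,072 lines.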

Discussion

  • Jim Michaels

    Jim Michaels - 2013-08-15

    the wiki yanked out my underscores on those #defines.

     
  • Neil Hodgson

    Neil Hodgson - 2013-08-16
    • status: open --> open-invalid
    • assigned_to: Neil Hodgson
     
  • Neil Hodgson

    Neil Hodgson - 2013-08-16

    Scintilla and SciTE can be built as 64-bit executables. The Windows executable distribution is currently 32-bit because many Windows installations are 32-bit. I don't want the extra work of distributing 64-bit executables.

    However, 64-bit builds of Scintilla are still limited to 2GB documents. Making all indexes 64-bit will double the size of ancillary structures such as line start offsets which will waste memory needlessly for most users.
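
The trade-off described above can be put into numbers. The sketch below assumes, purely for illustration, that line starts are stored as one index per line in a flat array; Scintilla's actual data structure may differ, and `lineStartBytes` and `maxDocumentBytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not Scintilla's API): bytes needed to store one
// start offset per line when each offset has type Index.
template <typename Index>
std::uint64_t lineStartBytes(std::uint64_t lineCount) {
    assert(lineCount <= std::numeric_limits<std::uint64_t>::max() / sizeof(Index));
    return lineCount * sizeof(Index);
}

// Largest offset, in bytes, that a signed index of type Index can represent.
template <typename Index>
std::uint64_t maxDocumentBytes() {
    return static_cast<std::uint64_t>(std::numeric_limits<Index>::max());
}
```

Under that assumption, a million-line document needs 4,000,000 bytes of offsets with `std::int32_t` and 8,000,000 bytes with `std::int64_t`, while `maxDocumentBytes<std::int32_t>()` is 2,147,483,647, which matches the 2GB limit mentioned above.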

     
  • Jim Michaels

    Jim Michaels - 2013-08-16

    sounds like small thinking. I and many other people (especially database folk) have to edit and process large files once in a while.
    I don't mind a little more memory use in order to get the capability of handling files over 2GiB. I have had some once in a while that were critical to something, and I could not edit them. usually they were database files, like MaxMind's GEOIP database, which requires editing or processing to make usable.

    when I make my programs, I try to make them work on as many platforms as possible as well as possible. I know that some of my programs tend to use a fair amount of memory. so I make provision for it. it took a little learning. but I got it down. any help you need with NSIS installer targeting 64-bit, or writing code for 64-bit, let me know and I will be glad to lend assistance.

    right now, the main editor I am using is based on scintilla or scite, and I am about to run into the upper memory limitation just editing my HTML files (the editor I use uses lots of memory, for 5MiB file uses about 129MiB of memory before it simply closes)

    if your code is fixed, then their code is fixed too.

    indexes into strings and vectors should not really be int64_t or uint64_t or long long, since those do not size themselves based on the platform. size_t which is made for indexes and sizes, does.

    vectors are a funny case for this. there is a special way to declare a size_t (I think it's size_t or was it something else, memory fuzzy now, I can look it up, it's in the documentation) so that it works for a

          std::vector<std::string>
    

    for instance. see http://www.cplusplus.com/forum/beginner/15959/
    size_type and size_t are always the right size for the job. compiles on any platform. to printf these, the standard conversion is %zu (older Microsoft runtimes use %Iu instead)
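
For reference, the member type being recalled above is `std::vector<std::string>::size_type`. A minimal sketch of indexing and printing with it follows; `totalLength` and `describe` are hypothetical helper names, and `%zu` is the standard C99/C++11 conversion for `size_t` (the `I` prefix is a Microsoft extension).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: sums the lengths of all lines, indexing the
// vector with its own size_type.
std::size_t totalLength(const std::vector<std::string>& lines) {
    std::size_t total = 0;
    for (std::vector<std::string>::size_type i = 0; i < lines.size(); ++i) {
        total += lines[i].size();
    }
    return total;
}

// Hypothetical helper: formats the counts using the "%zu" conversion.
std::string describe(const std::vector<std::string>& lines) {
    char buf[64];
    const int n = std::snprintf(buf, sizeof buf, "%zu lines, %zu chars",
                                lines.size(), totalLength(lines));
    assert(n >= 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf);
}
```

The same source compiles unchanged for 32-bit and 64-bit targets because both types follow the platform's address width.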

    the problem with using uint64_t always as an array index on 32-bit and 64-bit platforms is that it causes a compile error or the program to crash (forget which). you have to size the type appropriately depending on the platform. that's where

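         #include <stdint.h>
         /* pick an index type as wide as a pointer on the target platform */
         #if defined(_WIN64)||defined(__LP64__)
             typedef uint64_t arrayIdxType;   /* 64-bit Windows or LP64 Unix */
         #else
             typedef uint32_t arrayIdxType;   /* everything else, assumed 32-bit */
         #endif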
    

    come in (but for plain arrays).
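
As a cross-check on the typedef above, here is a sketch that compares it with `std::size_t`, which the language already sizes per platform. `matchesSizeT` and `sumArray` are hypothetical names, and the equality is only expected on targets where the two macros describe the platform correctly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same platform-dependent typedef as in the post above.
#if defined(_WIN64) || defined(__LP64__)
typedef std::uint64_t arrayIdxType;
#else
typedef std::uint32_t arrayIdxType;
#endif

// True when the hand-rolled index type is as wide as std::size_t.
bool matchesSizeT() {
    return sizeof(arrayIdxType) == sizeof(std::size_t);
}

// Hypothetical helper: sums a plain array, indexing with std::size_t
// so that no preprocessor test is needed.
long long sumArray(const int* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    long long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

Where the check holds, plain `std::size_t` gives the same width without the `#if`, which is the point made earlier about indexes sizing themselves to the platform.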

     
  • Neil Hodgson

    Neil Hodgson - 2013-08-17
    • status: open-invalid --> closed-invalid
     
  • Neil Hodgson

    Neil Hodgson - 2013-08-17

    It's important to define the scope of a project and not put effort into features that are outside that scope. An editing component for huge files would be useful to many but that's not Scintilla. Generic code that works well for smaller files will often have problems on huge files and the appropriate solutions will differ depending on the types and sizes of huge files.

    Since 64-bit builds are supported, I'm closing this issue as 'invalid'.

     
