Scintilla / Bugs / #2019 Buffer over-read in MMIXAL lexer

Neil Hodgson - 2018-06-21

status: open --> open-accepted
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2018-06-21

Yes, that looks like a bug to me.

Since I don't use MMIXAL and thus am unlikely to spot regressions, I won't change this myself.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Johannes Sasongko - 2018-06-22

I don't use it either, only found this due to GCC warning while building SciTE:

./../lexers/LexMMIXAL.cxx: In function ‘ColouriseMMIXALDoc’: lto1: warning: ‘__builtin_memmove’ reading 100 bytes from a region of size 99 [-Wstringop-overflow=]
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Vadim Zeitlin - 2019-07-10

I don't use MMIXAL neither, but this is a potential security vulnerability and really needs to be fixed. The best way to do it would be to just replace the entire loop/memmove with a single ++s as noted by the original reporter.

This is a trivial change, please consider applying it just to stop people reporting it over and over again. TIA!

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Neil Hodgson - 2019-07-10
  
  The important issue here is: have you tested the change? Which test case files did you use? Did the files exercise this code and the potential overflow case and were the results correct?
  
  Its all too easy to make a warning go away without actually doing the work required to ensure that the change is correct.
  
  I will not accept untested changes.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Vadim Zeitlin - 2019-07-10
    
    The important issue here is: have you tested the change?
    
    No, I didn't. To the best of my knowledge, there are no unit tests in Scintilla and I have no idea how to test this code interactively (I don't even know what is it supposed to do).
    
    Its all too easy to make a warning go away without actually doing the work required to ensure that the change is correct.
    
    Very true and I enthusiastically agree with this position in general. However in this particular case, the change is localized in a scope exactly 3 lines long, the code intention is as perfectly clear as the bug in it and if you're willing to look at it, you can see that replacing it with ++s is a really obviously correct thing to do. This is why I believe you might want to make an exception here.
    
    I will not accept untested changes.
    
    This is again a perfectly laudable point of view, however in this particular case you're knowingly leaving a potential security vulnerability and at best a DoS vector (assuming automatic lexer selection, any program using Scintilla can be made to crash by feeding it a file for which this lexer is used) in the library, which is, IMHO, not a very responsible thing to do.
    
    Assuming this code can be executed at all (no, I didn't check this neither), literally any change to it can't make things worse than they're right now. If you don't trust making the change proposed above (second chunk of Zufu Liu's patch) and don't want to test this lexer yourself, please at least consider replacing i != sizeof(s) condition in the loop with i != sizeof(s)-1. I don't think anybody can reasonably argue that this change, fixing a manifest out of bounds access, can somehow make things worse than they are.
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Neil Hodgson - 2019-07-10
      
      The unit tests for Scintilla are located in the test directory. Test cases for particular lexers are located in the examples directory and run by lexTests.py. There are not yet any test examples for MMIXAL.
      
      Changes may appear completely obvious but stil be wrong. Its likely that the implementer of this lexer considered the code in question obvious.
      
      Your potential security vulnerability doesn't make sense. Where is the DoS possibility? The code in question reads one char after the valid range and copies it into allocated memory. The string is still NUL-terminated before this char so is safe to read in WordList::InList.
      
      any program using Scintilla can be made to crash by feeding it a file for which this lexer is used
      
      I really want to see this.
      
      As said before: test the change before asking for it to be committed.
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
      - Vadim Zeitlin - 2019-07-10
        
        Your potential security vulnerability doesn't make sense. Where is the DoS possibility? The code in question reads one char after the valid range
        
        And do you seriously think that reading beyond the allocated stack region is a normal and safe thing to do?
        
        and copies it into allocated memory. The string is still NUL-terminated before this char so is safe to read in WordList::InList.
        
        Perhaos you're not looking at the right code? This fragment:
        
        char s[100]; sc.GetCurrent(s, sizeof(s)); if (*s == ':') { // ignore base prefix for match for (size_t i = 0; i != sizeof(s); ++i) { *(s+i) = *(s+i+1); } }
        
        reads 101 bytes from a buffer containing 100 bytes. This has nothing to do with being NUL-terminated, the code doesn't care about NULs in any way whatsoever (maybe it should, but I don't really know and it's immaterial here).
        
        This is, of course, not guaranteed to crash, but this is undefined behaviour and it can perfectly well result in a crash if the stack frames happen to be arranged just so. I don't have a dog in this fight, but I do wonder how can you possibly argue that this is not a problem?
        
        Again, I'm not interested in "winning" this argument (there is nothing to win in it for me anyhow), but I just have no idea how can you believe that adding -1 here can make the matters worse than a manifest, staring in your face undefined behaviour that is there right now.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Neil Hodgson - 2019-07-10
        
        And do you seriously think that reading beyond the allocated stack region is a normal and safe thing to do?
        
        It is something that is of concern but not the degree of sky-falling-down that you are invoking.
        
        Perhaos you're not looking at the right code? This fragment:
        reads 101 bytes from a buffer containing 100 bytes.
        
        No it doesn't. It reads and write 100 bytes. The last value of 'i' is 99. One of those bytes is from outside the buffer.
        
        I mentioned the NUL byte termination as that determines what effect that extra byte will have. For example, can its value be determined from later behaviour and thus leak information about surrounding data such as stack addresses.
        
        Again, I'm not interested in "winning" this argument (there is nothing to win in it for me anyhow),
        
        And yet you continue to argue instead of doing the right thing and TESTING THE PATCH.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Vadim Zeitlin - 2019-07-10
        
        It is something that is of concern but not the degree of sky-falling-down that you are invoking.
        
        Would you really want anybody to quote you as saying that crashes in applications using Scintilla are "of concern but nothing really serious"? If so, you're a braver man than me, I wouldn't dare writing something like this publicly.
        
        No it doesn't. It reads and write 100 bytes. The last value of 'i' is 99. One of those bytes is from outside the buffer.
        
        You're totally correct, it reads 100 bytes and not 101. It just so happens that these 100 bytes are bytes with indices from 1 to 101 in a buffer containing 100 bytes. This totally changes what I wrote and means that there is no problem, thanks a lot for your valuable correction.
        
        And yet you continue to argue instead of doing the right thing and TESTING THE PATCH.
        
        Neil, I have absolutely no idea about how to test it. I don't have any MMIX examples lying around (shame on me, I know), and I don't have any idea about what is SCE_MMIXAL_REF and how would I trigger this code even if I had it. I would estimate that it's roughly a million times simpler for you to test it than for me, yet you're arguing as much as I do instead of doing this.
        
        I'm a total bystander here, I'm just following up a bug report we received and hoped to wait for your resolution of it to merge your changes in our version and avoid gratuitous divergences with upstream in our version. I'm very sorry for wasting your time and trying to make Scintilla a bit better and I promise not to do it again.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Neil Hodgson - 2019-07-10
        
        Would you really want anybody to quote you as saying that crashes in applications using Scintilla
        
        Please demonstrate a crash. My current belief is that this code will never trigger a crash but I could be wrong.
        
        This totally changes what I wrote and means that there is no problem, thanks a lot for your valuable correction.
        
        Your snarkiness does not help resolve this. Accuracy is important when discussing bugs and your statement made me think that I had misunderstood the behaviour (if its reading 101 bytes, is it writing before the buffer?) which then required more checking.
        
        I try to maintain quality standards when modifying Scintilla and part of that is ensuring that code is understood by contributors and changes are tested before being committed. Having the contributor test the change means it is less likely to cause more problems. Having test cases makes it easier for me to review the change and can also be used by others wanting to check the history of that module particularly if regressions occur.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Vadim Zeitlin - 2019-07-11
        
        Please demonstrate a crash.
        
        Sorry, no, I'm not prepared to invest possibly significant additional time for a very uncertain outcome (would you consider the problem serious if it only crashes on some ARM architecture when using specific gcc options? I have an inkling of an answer...), especially when I see absolutely no advantage to doing it rather than, you know, actually just spending 30 seconds on fixing the bug.
        
        My current belief is that this code will never trigger a crash but I could be wrong.
        
        Yet you are certain enough that you're right to refuse to make the trivial change. to ensure that the crash doesn't happen. This looks like a very misplaced confidence to me, but what do I know.
        
        Your snarkiness does not help resolve this.
        
        Please excuse me, I regret having written this if your comment was in good faith. It was difficult for me to see it in this light, however, as the salient fact is quite obviously that the code reads 101st byte from 100-sized buffer, not that it reads exactly 100 bytes. Yet you're technically correct and I was wrong and I should have been more precise, sorry. Just as an aside, maybe not yelling at people asking them to test things they're unable to do would help to avoid giving them the impression that you're not exactly overflowing with goodwill in their direction in the future and avoid such epidermic reactions.
        
        I try to maintain quality standards
        
        We all do. Yet there is a (usually not so fine) line between reasonable demands and being stubborn for stubbornness sake. We just clearly have very different opinions about where does this case fall relative to this line and I'm afraid we'll just have to agree to disagree about it.
        
        Sorry again for starting this discussion, I sincerely regret wasting all this time and I'm unsubscribing from this bug to avoid any temptation to continue doing this in the future.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Pavel Kovalenko - 2019-07-10

maybe it should be better to just replace raw loop by memmove(s, s + 1, sizeof(s) - 1) ? But just shift pointer is definitelly better.

Last edit: Pavel Kovalenko 2019-07-10

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Zufu Liu - 2019-07-10

Patch based on Vadim's ++s idea.

Removed unused sc.GetCurrent() in SCE_MMIXAL_NUMBER block.

LexMMIXAL.diff

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2019-07-10

Removed unused sc.GetCurrent() in SCE_MMIXAL_NUMBER block.

Yet again, an extra change that is not needed to fix the bug. Please stop doing this.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Removed the unneeded change.

The code behaviors same (skip first :- the base prefix ) , you can test the code with online compliers.
https://www.tutorialspoint.com/compile_cpp_online.php
https://www.jdoodle.com/online-compiler-c++

#include <stdio.h>
#include <string.h>

void test1() {
    char s[100];
    strcpy(s, ":abcdefgh");
    if (*s == ':') {    // ignore base prefix for match
        for (size_t i = 0; i != sizeof(s); ++i) {
            *(s+i) = *(s+i+1);
        }
    }
    printf("%s s=[%s]\n", __func__, s);
}

void test2() {
    char s0[100];
    strcpy(s0, ":abcdefgh");
    const char *s = s0;
    if (*s == ':') { // ignore base prefix for match
        ++s;
    }
    printf("%s s=[%s]\n", __func__, s);
}

int main() {
    test1();
    test2();
    return 0;
}

which should output

test1 s=[abcdefgh]
test2 s=[abcdefgh]

LexMMIXAL-2.diff

Neil Hodgson - 2020-05-01

status: open-accepted --> open-fixed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2020-05-01

Committed fix with test case as [6d0ce3].

Related

Commit: [6d0ce3]

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2020-06-02

status: open-fixed --> closed-fixed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Neil Hodgson - 2020-06-02

Committed fix with test case as [6d0ce3].

Related

Commit: [6d0ce3]

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Buffer over-read in MMIXAL lexer

Group

Searches

Help

#2019 Buffer over-read in MMIXAL lexer

Discussion

Related

Related