#178 rkhunter generates "bogus" grep warnings

main
open
None
5
2022-10-14
2022-10-07
No

I originally reported this on the Ubuntu bug tracker, but I thought I'd copy it over here as well for better visibility.

https://bugs.launchpad.net/ubuntu/+source/rkhunter/+bug/1989799

When setting up a corosync/pacemaker cluster, I recently encountered a new bug or flaw in rkhunter's do_dev_whitelist_check() function (called as part of the local_host test).

I have multiple files located under /dev/shm/qb-/qb- which are generated by the cluster. I normally have these whitelisted under ALLOWDEVFILE to avoid false positives. However, when the rkhunter cron job ran on these servers, some messages to stderr were generated, causing an email notification (which will repeat daily as long as these files remain).

grep: (standard input): binary file matches

After tracing the problem back to the file type check in the do_dev_whitelist_check() function, I noticed two potential issues.

1) The grep command assumes the output of the 'file' command is text (which sounds reasonable), but in my particular case, the file was being (mis)detected as a "Matlab v4 mat-file". The details it gave include non-ASCII characters, which made grep treat the text as binary and issue the warning. When I run the 'file' command on the file in question from my interactive shell, it shows this (notice it automatically escapes the special character):

Matlab v4 mat-file (little endian) \235U, numeric, rows 3503345872, columns 12

When the same command is run from a shell script under /bin/sh, it returns the raw character itself rather than the escaped form (maybe a locale/encoding difference?), thus triggering the binary file detection of grep.

2) The whitelisting of files through ALLOWDEVFILE occurs after the file type check, which seems a little backwards. If I choose to whitelist a file, it should mean "ignore it completely". Even with the way the logic is currently written, it runs the test and simply ignores the results, which ultimately produces the same outcome as whitelisting the file before the check. The only difference is that whitelisted files are only reported if rkhunter thinks they are dangerous.

With all this being said, I see two possible solutions:

1) Move the whitelist check before the file type detection code
2) Add the --text option to the egrep command in do_dev_whitelist_check() so it always treats the file detection results as text

The former still has the issue where someone does not whitelist a file and the 'file' command returns non-ASCII characters in the result. However, I feel that implementing both #1 and #2 is the best course of action. The changes are pretty trivial, so I've provided a patch for each case. You should be able to apply both in order.

patch < rkhunter_option1.patch
patch < rkhunter_option2.patch
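
As a rough illustration of option 2, assuming GNU grep (where -a and --text are equivalent): the sample string below is made up for the demonstration and is not taken from rkhunter. Whether plain grep reports "binary file matches" for it depends on the locale and grep version, but with -a the input is always handled as text.

```shell
# Hypothetical sample: a 'file'-style description containing two raw bytes
# (octal 312 and 030), written with printf so the bytes are literal.
sample() {
    printf 'Matlab v4 mat-file (little endian) \312\030, numeric\n'
}

# -a (--text) makes grep treat the input as text even when it contains bytes
# that are not valid in the current locale; -c counts the matching lines.
sample | grep -a -c 'Matlab'
```

Because the search is for a fixed set of known strings, forcing text mode should not change which lines match; it only suppresses the binary-input handling.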

2 Attachments

Discussion

  • John Horne

    John Horne - 2022-10-13

    From the CHANGELOG for version 1.4.0:

    Ensure that the ALLOWDEVFILE, ALLOWHIDDENFILE and ALLOWHIDDENDIR
    options re-evaluate their whitelisting lists to ensure that any
    wildcard entries are the most recent. (A time window previously
    existed which meant that the list was processed, but new files
    could be created before the test was run. As such they were reported
    as false-positive warnings, when they should have been whitelisted.)

    The whitelisting was previously only done before the test, but, as said above, it had to be repeated in the test in order to avoid false-positives.

     
  • John Horne

    John Horne - 2022-10-13

    Okay, I'm having trouble replicating this.
    The output from the 'file' command you ran on the command-line should be the same as received by rkhunter. It shouldn't contain any binary/control characters unless the '-r' option is used.
    So likewise if you run 'file <your matlab file> | grep abc' then the 'grep' command should not be seeing anything causing it to think it is binary input.
    Could you run 'rkhunter --enable filesystem --debug'. This should create a file named '/tmp/rkhunter...'. Could you then send that to me. It should show what the 'file' command is sending to grep. Thanks.

     
  • Justin Pasher

    Justin Pasher - 2022-10-13

    Okay, I did a little more testing and narrowed down when it's happening, and it's a bash vs dash echo thing (Ubuntu uses dash for /bin/sh). The 'file' command is actually returning regular backslash-escaped text. I was able to create a test file that triggers the false detection that you can use (it should be 22 bytes).

    $ echo -ne '\x0\x0\x0\x0\xd0\xd0\xd0\xd0\x6\x0\x0\x0\x0\x0\x0\x0\x10\x0\x0\x0\xca\x18' >testfile
    $ file testfile
    testfile: Matlab v4 mat-file (little endian) \312\030, numeric, rows 3503345872, columns 6
    

    Put this file under /dev/shm/ and try running rkhunter to see if it triggers the warning for you.
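
The 'echo -ne' form above relies on bash's echo. A possible shell-independent way to produce the same 22 bytes is printf with octal escapes, which POSIX specifies for the format string. This is only a sketch: the make_testfile helper and the temporary path are invented for the example.

```shell
# Write the same 22 bytes using printf octal escapes, so the result does not
# depend on which shell's 'echo' is in use.
make_testfile() {
    printf '\000\000\000\000\320\320\320\320\006\000\000\000\000\000\000\000\020\000\000\000\312\030' > "$1"
}

tmpfile=$(mktemp)        # scratch path chosen for this example
make_testfile "$tmpfile"
wc -c < "$tmpfile"       # byte count of the generated file
rm -f "$tmpfile"
```

To reproduce the report, the helper could be pointed at a path under /dev/shm/ instead of a temporary file.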

    dash's built-in 'echo' command will evaluate the \312\030 sequence and print out the raw bytes instead of leaving it as-is (the output is then passed to awk, cut, and eventually egrep). According to the dash man page, it always evaluates backslash-escaped sequences, including unknown ones that will "elicit undefined behaviour". bash 'echo' does not interpret backslash-escaped sequences by default. You can see the results like this (code adapted from the do_dev_whitelist_check() function):

    $ /bin/bash -c "echo 'Test values: \312\030' | awk -F':' '{ print \$NF }' | cut -c2- | hexdump -C"
    
    00000000  5c 33 31 32 5c 30 33 30  0a  |\312\030.|
    00000009
    
    $ /bin/dash -c "echo 'Test values: \312\030' | awk -F':' '{ print \$NF }' | cut -c2- | hexdump -C"
    
    00000000  ca 18 0a  |...|
    00000003
    

    With all that said, it seems like the easiest way to work around the dash/bash difference is to make egrep treat all input as text, since we are searching for a specific text string anyway.
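
Another way to sidestep the echo difference, shown here only as a sketch, is to print the captured string with printf '%s\n', which does not interpret backslashes in its arguments under either shell. FTYPE_SAMPLE and print_literal are made-up names for this example, and this is not what the attached patches do.

```shell
# Made-up variable holding the description text quoted above.
FTYPE_SAMPLE='Matlab v4 mat-file (little endian) \312\030, numeric'

# printf '%s\n' emits its argument byte for byte, so the backslash sequences
# stay as literal text under bash and dash alike.
print_literal() {
    printf '%s\n' "$1"
}

print_literal "$FTYPE_SAMPLE" | grep -c 'Matlab'
```

With the escapes left as plain ASCII text, grep has no raw bytes to trip over, whichever shell runs the script.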

    Regarding the second part about evaluating ALLOWDEVFILES each time, I guess I'm not following the logic. The patch just says "if the file in RKHTMPVAR is in ALLOWDEVFILES, just return instead of running the file command on it". The current code reverses that (i.e. always run the file command, then check the whitelist). In both cases, the file is added to FOUNDFILES only if the file type doesn't match a "known" allowed type and it's not in the whitelist. If the file is listed in the whitelist, why run the file command?
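
The ordering argued for above could be sketched roughly as follows. This is a hypothetical illustration, not rkhunter's actual code: the function names, the WHITELIST variable and the "whitelisted" message are invented, and it uses exact path matching rather than the wildcard handling mentioned in the CHANGELOG.

```shell
# Invented stand-in for the expanded ALLOWDEVFILE list (space-separated).
WHITELIST="/dev/shm/testfile /dev/shm/other"

is_whitelisted() {
    # Succeeds if path $1 appears in the space-separated list $2.
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

check_dev_file() {
    # Consult the whitelist first, so 'file' is never run on allowed paths.
    if is_whitelisted "$1" "$WHITELIST"; then
        echo "whitelisted"
        return 0
    fi
    # Only non-whitelisted paths reach the type detection.
    file -b "$1"
}

check_dev_file /dev/shm/testfile
```

Re-evaluating the list inside the test, as the CHANGELOG entry describes, would still fit this shape as long as WHITELIST is refreshed before the lookup.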

     
  • John Horne

    John Horne - 2022-10-14

    Thanks for that, and all the testing.

    Would you try the following please:

    file testfile | tr '[:cntrl:]' ' ' | grep abc

    I just want to see whether the 030 (control-X) is the problem, i.e. whether changing it to a space makes grep work. (The second part of the 'tr' command is a single space in quotes, but it may not display too well above.)

    Using Fedora 36 and your testfile, rkhunter/grep showed no problem. It gave a warning about the file, but no grep problem. The file command shows it as a Matlab file.

    Using Ubuntu 20, the file command shows the testfile as 'data'.
    Using Ubuntu 22, the file command shows the testfile as a Matlab file (the same as in your reply), but the grep command has no problems with it. (So, 'file testfile | grep abc' shows no error.)

    I agree with you that for whitelisted files, there is little point in continuing with the check and in particular running pipelines as used with the file command. I shall look into it.

     
  • John Horne

    John Horne - 2022-10-14
    • assigned_to: John Horne
     
  • Justin Pasher

    Justin Pasher - 2022-10-14

    The command you provided works fine without warnings (it even works fine without the tr). Keep in mind that it's the 'echo' command in dash that is causing the escape sequences to be converted to their actual ASCII characters. If you chain together the 'file' and 'grep' commands, I wouldn't expect any warnings about binary files to display, since 'file' produces "clean" output. However, rkhunter captures the results in FTYPE, then echoes it out to grep because of the special case for MACOSX.

    If I run it through dash in this somewhat convoluted way, the echo command still evaluates the remaining escaped character (\312).

    $ /bin/dash -c "FTYPE=\$(file testfile | tr '[:cntrl:]' ' '); echo \$FTYPE | grep Matlab"
    grep: (standard input): binary file matches
    

    If I use sed to replace the \312, grep doesn't think it's a binary file. I don't know specifically which bytes make grep consider its input binary (maybe it only looks at 8-bit ASCII), but if you wanted to go that route, you could use something like this on the 'echo' command:

    $ /bin/dash -c "FTYPE=\$(file testfile); echo \$FTYPE | tr -c '[:print:]' ' ' | grep Matlab"
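
A small variant of the tr approach, as a sketch: adding \n to the complemented set keeps the trailing newline, which the command above would also turn into a space. The sample bytes are written with printf rather than taken from real 'file' output, the sanitize name is invented, and LC_ALL=C is an assumption made so that tr classifies bytes the same way everywhere.

```shell
sanitize() {
    # Replace every byte that is neither printable nor a newline with a space.
    LC_ALL=C tr -c '[:print:]\n' ' '
}

printf 'Matlab \312\030, numeric\n' | sanitize | grep -c 'Matlab'
```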
    

    I personally think it's easier to just make grep treat the string as text, since you are comparing to a specific set of known strings. I know that technically the problem exists because of the mis-detection of the file type, but I doubt it's the last time something like this will happen, which is why I think it's easier to implement the fail-safe in rkhunter.

    Just for reference, here are some detection results on the file.

    Debian 10 (file 5.35) - data
    Debian 11 (file 5.39) - data
    Ubuntu 20.04 (file 5.38) - data
    Ubuntu 22.04 (file 5.41) - Matlab v4 mat-file (little endian) ...

    The Matlab detection was changed in this (somewhat ironically titled) commit (released under 5.41):

    https://github.com/file/file/commit/8436304b9b6b827ef98e836edf736a0b94b26636

     
  • John Horne

    John Horne - 2022-10-14

    Ah! Oh sorry, I thought the problem came from the grep command used in the 'file' pipeline.
    Okay, so it's the 'echo' command used later on with grep.

    Could you try the development version (1.4.7) of RKH? One of the first changes was to detect the shell better, and switch to bash if possible. I suspect the problem will then disappear.
    You can obtain the development version from:
    https://sourceforge.net/p/rkhunter/rkh_code/ci/develop/tree/
    Click on 'Download snapshot' towards the top right. Unzip the downloaded file, and then run the installer. If you don't want to mess up your current setup, then you could run this as a standalone installation - see the README file for details.

    I'll still make a change to add the grep '-a' option, and I want to keep the file pipelines similar so will add the 'tr...cntrl' bit in as well. I won't do those until you have had a try of the dev version if that is okay. (If I do them right now, then we wouldn't know if changing the shell solves it on its own.)

     
  • Justin Pasher

    Justin Pasher - 2022-10-14

    I tried running "rkhunter --enable filesystem" using the latest snapshot on Ubuntu 22.04, and I am not seeing the grep warning anymore (it did still give me the suspicious files in /dev warning, as expected for a default config).

    Looking at the log file, I do see this line, so it looks like it switched shells properly.

    [14:19:35] Info: Environment shell is /bin/bash; rkhunter is using bash
    [14:19:35] Info: Unknown shell changed from /usr/bin/dash to bash
    

    Somewhat interesting is the log entry for the (mis)detection of the Matlab file. The log file has this:

    /dev/shm/testfile: Matlab v4 mat-file (little endian) \312^X, numeric, rows 3503345872, columns 6
    

    The CTRL+X is what vim shows for a literal \030 character in the file (compared to both of the raw ASCII characters being shown in the log for 1.4.6 when running under dash).

    So, yes, the switch to bash from dash seems to work properly under Ubuntu 22.04. I wonder how the 'echo' command in other shells handles escape codes by default...

     
