Menu

#21 Bug in formatit.ph line 2926

open
nobody
None
5
2013-07-29
2013-04-19
BrettLee
No

Reporting what appears to be in closed Bug #3194128. Attaching file with full command output.

collectl-3.6.7-1.el5
Lustre 1.8.9
2.6.18-348.1.1.el5
CentOS release 5.8 (Final)

# collectl -d4 -c1 -sl --lustopts B
waiting for 1 second sample...
>>> 1366411948.003 <<<
LLITE:0 dirty_pages_misses 15022494 samples [regs]
LLITE:0 read_bytes 99044 samples [bytes] 2097152 2097152 207710322688
LLITE:0 write_bytes 58698 samples [bytes] 0 1048576 61532049408
LLITE:0 open 47208 samples [regs]
LLITE:0 close 47209 samples [regs]
LLITE:0 setattr 46786 samples [regs]
LLITE:0 getattr 178 samples [regs]
LLITE_RPC:0:0 read write
Argument "write" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 9.
Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 9.
LLITE_RPC:0:1 pages per rpc rpcs % cum % | rpcs % cum %
Argument "per" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 10.
Argument "cum" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 10.

Discussion

  • BrettLee

    BrettLee - 2013-04-19

    Full output from debug command

     
  • BrettLee

    BrettLee - 2013-04-21

    Hi Mark,

    As others have said, you've really produced a fantastic tool here. Lots and lots of uses for it - especially enjoying using it with Lustre.

    BTW, I should clarify that this report applies *ONLY* when running the command/options on a Lustre client. When using the same options and running on an OSS, it works as expected.

    Thanks again, Brett

     
  • Mark Seger

    Mark Seger - 2013-04-25

    hmm, looking at the code I see collectl looks for a line that looks like LLITE_RPC:x:y data...

    it expects to see numbers in the 1st and 5th fields of the data portion, noting the indexing starts at 0. from what I can see above the the entroes for 0:0 and 0:1 don't even look like each other and I guess I have to ask if the people who bring us lustre have changed the format of that data. do you have a way to verify with an earlier version?

    I've always feared what could happen if the stats format change and this could be the tip of the iceberg OR maybe I'm misinterpreting what I'm seeing

    -mark

     
  • BrettLee

    BrettLee - 2013-05-04

    Hi Mark,

    (seemingly unable to add a new comment, so posted at the end of this comment -Brett)

    Sorry for the delay. Yes, was able to see the same sort of output using an earlier version of Lustre. In this case:

    collectl-3.6.7-1.el5
    Lustre 1.8.8
    2.6.18-308.4.1.el5
    CentOS release 5.8 (Final)

    # cat /proc/fs/lustre/version
    lustre: 1.8.8
    kernel: patchless_client
    build: jenkins-wc1-gbc88c4c-PRISTINE-2.6.18-308.4.1.el5
    # uname -r
    2.6.18-308.4.1.el5
    # cat /etc/redhat-release
    CentOS release 5.8 (Final)
    # collectl -d4 -c1 -sl --lustopts B
    waiting for 1 second sample...
    >>> 1367649183.003 <<<
    LLITE:0 dirty_pages_misses 882184 samples [regs]
    LLITE:0 write_bytes 3450 samples [bytes] 0 1048576 3613409280
    LLITE:0 open 509 samples [regs]
    LLITE:0 close 509 samples [regs]
    LLITE:0 setattr 506 samples [regs]
    LLITE:0 getattr 22 samples [regs]
    LLITE_RPC:0:0 read write
    Argument "write" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 9.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 9.
    LLITE_RPC:0:1 pages per rpc rpcs % cum % | rpcs % cum %
    Argument "per" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 10.
    Argument "cum" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 10.
    LLITE_RPC:0:2 1: 0 0 0 | 3 0 0
    LLITE_RPC:0:3 2: 0 0 0 | 1 0 0
    LLITE_RPC:0:4 4: 0 0 0 | 0 0 0
    LLITE_RPC:0:5 8: 0 0 0 | 0 0 0
    LLITE_RPC:0:6 16: 0 0 0 | 0 0 0
    LLITE_RPC:0:7 32: 0 0 0 | 0 0 0
    LLITE_RPC:0:8 64: 0 0 0 | 0 0 0
    LLITE_RPC:0:9 128: 0 0 0 | 0 0 0
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 18.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 18.
    LLITE_RPC:0:10 256: 0 0 0 | 2226 99 100
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 19.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 19.
    LLITE_RPC:1:0 read write
    Argument "write" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 9.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 9.
    LLITE_RPC:1:1 pages per rpc rpcs % cum % | rpcs % cum %
    Argument "per" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 10.
    Argument "cum" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 10.
    LLITE_RPC:1:2 1: 0 0 0 | 2 0 0
    LLITE_RPC:1:3 2: 0 0 0 | 1 0 0
    LLITE_RPC:1:4 4: 0 0 0 | 0 0 0
    LLITE_RPC:1:5 8: 0 0 0 | 0 0 0
    LLITE_RPC:1:6 16: 0 0 0 | 0 0 0
    LLITE_RPC:1:7 32: 0 0 0 | 0 0 0
    LLITE_RPC:1:8 64: 0 0 0 | 0 0 0
    LLITE_RPC:1:9 128: 0 0 0 | 0 0 0
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 18.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 18.
    LLITE_RPC:1:10 256: 0 0 0 | 1120 99 100
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 19.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 19.
    LLITE_RPC:2:0 read write
    Argument "write" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 9.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 9.
    LLITE_RPC:2:1 pages per rpc rpcs % cum % | rpcs % cum %
    Argument "per" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 10.
    Argument "cum" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 10.
    LLITE_RPC:2:2 1: 0 0 0 | 0 0 0
    LLITE_RPC:3:0 read write
    Argument "write" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 9.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 9.
    LLITE_RPC:3:1 pages per rpc rpcs % cum % | rpcs % cum %
    Argument "per" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 10.
    Argument "cum" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 10.
    LLITE_RPC:3:2 1: 0 0 0 | 0 0 0
    LLITE_RPC:3:3 2: 0 0 0 | 0 0 0
    LLITE_RPC:3:4 4: 0 0 0 | 0 0 0
    LLITE_RPC:3:5 8: 0 0 0 | 0 0 0
    LLITE_RPC:3:6 16: 0 0 0 | 0 0 0
    LLITE_RPC:3:7 32: 0 0 0 | 0 0 0
    LLITE_RPC:3:8 64: 0 0 0 | 0 0 0
    LLITE_RPC:3:9 128: 0 0 0 | 0 0 0
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 18.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 18.
    LLITE_RPC:3:10 256: 0 0 0 | 100 100 100
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 19.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 19.
    >>> 1367649184.003 <<<
    LLITE:0 dirty_pages_misses 882184 samples [regs]
    LLITE:0 write_bytes 3450 samples [bytes] 0 1048576 3613409280
    LLITE:0 open 509 samples [regs]
    LLITE:0 close 509 samples [regs]
    LLITE:0 setattr 506 samples [regs]
    LLITE:0 getattr 22 samples [regs]
    LLITE_RPC:0:0 read write
    Argument "write" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 9.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 9.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 9.
    LLITE_RPC:0:1 pages per rpc rpcs % cum % | rpcs % cum %
    Argument "per" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 10.
    Argument "cum" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 10.
    LLITE_RPC:0:2 1: 0 0 0 | 3 0 0
    LLITE_RPC:0:3 2: 0 0 0 | 1 0 0
    LLITE_RPC:0:4 4: 0 0 0 | 0 0 0
    LLITE_RPC:0:5 8: 0 0 0 | 0 0 0
    LLITE_RPC:0:6 16: 0 0 0 | 0 0 0
    LLITE_RPC:0:7 32: 0 0 0 | 0 0 0
    LLITE_RPC:0:8 64: 0 0 0 | 0 0 0
    LLITE_RPC:0:9 128: 0 0 0 | 0 0 0
    LLITE_RPC:0:10 256: 0 0 0 | 2226 99 100
    LLITE_RPC:1:0 read write
    Argument "write" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 9.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 9.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 9.
    LLITE_RPC:1:1 pages per rpc rpcs % cum % | rpcs % cum %
    Argument "per" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 10.
    Argument "cum" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 10.
    LLITE_RPC:1:2 1: 0 0 0 | 2 0 0
    LLITE_RPC:1:3 2: 0 0 0 | 1 0 0
    LLITE_RPC:1:4 4: 0 0 0 | 0 0 0
    LLITE_RPC:1:5 8: 0 0 0 | 0 0 0
    LLITE_RPC:1:6 16: 0 0 0 | 0 0 0
    LLITE_RPC:1:7 32: 0 0 0 | 0 0 0
    LLITE_RPC:1:8 64: 0 0 0 | 0 0 0
    LLITE_RPC:1:9 128: 0 0 0 | 0 0 0
    LLITE_RPC:1:10 256: 0 0 0 | 1120 99 100
    LLITE_RPC:2:0 read write
    Argument "write" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 9.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 9.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 9.
    LLITE_RPC:2:1 pages per rpc rpcs % cum % | rpcs % cum %
    Argument "per" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 10.
    Argument "cum" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 10.
    LLITE_RPC:2:2 1: 0 0 0 | 0 0 0
    LLITE_RPC:3:0 read write
    Argument "write" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 9.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 9.
    Use of uninitialized value in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 9.
    LLITE_RPC:3:1 pages per rpc rpcs % cum % | rpcs % cum %
    Argument "per" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2926, <PROC> line 10.
    Argument "cum" isn't numeric in subtraction (-) at /usr/share/collectl/formatit.ph line 2927, <PROC> line 10.
    LLITE_RPC:3:2 1: 0 0 0 | 0 0 0
    LLITE_RPC:3:3 2: 0 0 0 | 0 0 0
    LLITE_RPC:3:4 4: 0 0 0 | 0 0 0
    LLITE_RPC:3:5 8: 0 0 0 | 0 0 0
    LLITE_RPC:3:6 16: 0 0 0 | 0 0 0
    LLITE_RPC:3:7 32: 0 0 0 | 0 0 0
    LLITE_RPC:3:8 64: 0 0 0 | 0 0 0
    LLITE_RPC:3:9 128: 0 0 0 | 0 0 0
    LLITE_RPC:3:10 256: 0 0 0 | 100 100 100

    # LUSTRE CLIENT SUMMARY (/sec): RPC-BUFFERS (pages)
    #RdK Rds 1P 2P 4P 8P 16P 32P 64P 128P 256P WrtK Wrts 1P 2P 4P 8P 16P 32P 64P 128P 256P
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    >>> 1367649185.002 <<<

    ======= New Comment =======

    Seems like something had changed. Skipping a couple more lines and parsing the rpc_stats file a little further has resolved the error spew and seems to provide accurate data. A diff is attached - SF seems to not like the format in this text field.

     

    Last edit: BrettLee 2013-06-28
  • Mark Seger

    Mark Seger - 2013-07-29

    I see your patch but think I'd also like to see what the file containing the rpc stats look like. From your patch it looks like you're skipping the first 10 fields instead of 8. Personally I'm sorry I ever wrote that routine that way, trying to be as efficient as I could but I probably would have been better off looking for the start of the data I wanted vs ignoring the first-n arbitrary lines and hoping for the best.

    right now my fear is if I make the change you're suggesting, I may end up breaking things for earlier versions of lustre. while this would certainly be a hack, and perhaps you could try it out as I don't have access to a box running lustre any more, what if right before you call getproc(2...) for rpc_stats you do something like:

    if ($cfsVersion lt '1.8.8')
    {
    $ignore = 8;
    $quit = 11;
    }
    else
    {
    $ignore = 10;
    $quit = 19;
    }

    and call getProc using those 2 variables instead of hardcoding them as you did in your patch? if you can try that out and it works, I can get it into the next release.

    -mark