#30 Cannot index / search one file

Status: New
Owner: nobody
Labels: None
Priority: Medium
Type: Defect
Updated: 2013-03-14
Created: 2013-03-13
Creator: Anonymous
Private: No

Originally created by: bserg...@gmail.com

What steps will reproduce the problem?
1. Index the attached file with cindex
2. Search for a pattern inside it
3. No hits

What is the expected output? What do you see instead?

repro$
repro$ cindex -reset
repro$ cindex badfile
2013/03/13 18:51:52 index /tmp/repro/badfile
2013/03/13 18:51:52 flush index
2013/03/13 18:51:52 merge 0 files + mem
2013/03/13 18:51:52 0 data bytes, 92 index bytes
2013/03/13 18:51:52 done
repro$ cindex -list
/tmp/repro/badfile
repro$
repro$
repro$
repro$ grep main badfile
libc.so.6        __libc_start_main
torch             main
torch              realmain(int, char**)
libglib-2.0....          g_main_context_iteration
libglib-2.0....           g_main_context_prepare
libglib-2.0....            g_main_context_dispatch
#85 0x00000032f5c38f0e in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#87 0x00000032f5c3ca3a in g_main_context_iteration () from /lib64/libglib-2.0.so.0
#93 0x000000000040e74d in realmain(int, char**) ()
#94 0x000000000040e933 in main ()
repro$
repro$ csearch main  <= no results here !!
repro$
repro$ grep threads badfile
============ All threads ==========
============ All threads ==========
repro$
repro$ csearch threads <= no results here !!
repro$

csearch cannot find text that is present in a file I have indexed with cindex.

What version of the product are you using? On what operating system?

I'm using the Linux binaries that are available on the Download page.
I tried to compile Go / codesearch myself but couldn't make it work (my Go
install might be funky).

Please provide any additional information below.

It looks like the problem happens at indexing time.

1 Attachments

Discussion

  • Anonymous

    Anonymous - 2013-03-13

    Originally posted by: bserg...@gmail.com

    Also, I've been using codesearch as part of a webapp at work that does forensic analysis of crashes (by letting us search through backtraces), and it's amazing :)

    I'm kinda stuck right now because I cannot index some files and I'm thinking about using a different indexer / search system, but really codesearch is all I need so if someone can figure out what the problem is that would be awesome.

    Thanks !!

     
  • Anonymous

    Anonymous - 2013-03-13

    Originally posted by: bserg...@gmail.com

    Also, one line in the file is extremely long: 2245 characters. Maybe the indexer reads line by line and has a hardcoded limit on the number of characters in a single line?

     
  • Anonymous

    Anonymous - 2013-03-13

    Originally posted by: manpreet...@gmail.com

    Try indexing with -verbose and -logskip flags to see if the file is getting skipped.

    The arbitrary limits are in the source so you can always hand edit and tweak them. I have a version at

    http://github.com/junkblocker/codesearch

    which I made specifically to add such options.

     
  • Anonymous

    Anonymous - 2013-03-13

    Originally posted by: bserg...@gmail.com

    Thanks for the tip. Indeed, I removed those long lines and now everything works fine. I see that your copy of the code has a -maxlinelen option, which should be what I need. Now I have to figure out how to build a Go program ...

     
  • Anonymous

    Anonymous - 2013-03-13

    Originally posted by: bserg...@gmail.com

    Alright, I figured it out, thanks.

    repro$ awk '{print length($0)}' badfile | sort -n | tail
    972
    1001
    1043
    1071
    1456
    1529
    1724
    1792
    2259
    2328

    and index/write.go has this constant, which the two longest lines above (2259 and 2328) exceed:
        maxLineLen      = 2000

     
  • Anonymous

    Anonymous - 2013-03-13

    Originally posted by: bserg...@gmail.com

    Whoever can, feel free to close the issue.

     
  • Anonymous

    Anonymous - 2013-03-14

    Originally posted by: rsc@golang.org

    I'm going to leave this open until I can get something like -logskip into
    the mainline codesearch branch.

     
  • Anonymous

    Anonymous - 2013-03-14

    Originally posted by: bserg...@gmail.com

    I don't know how far you guys want to go with this, but having two command-line options to set maxLineLen and maxFileSize would also help.

    The default behavior could be to print a message like the following (probably with better phrasing and different option names) when a file gets skipped.

    => /tmp/foo wasn't indexed (maxLine too long) / try to reindex with cindex -maxLineLen 3000

    => /tmp/foo wasn't indexed (file too big) / try to reindex with cindex -maxFileSize 1M

     
