Originally created by: bserg...@gmail.com
What steps will reproduce the problem?
1. Index the attached file with cindex
2. Search for a pattern inside it
3. No hits
What is the expected output? What do you see instead?
repro$
repro$ cindex -reset
repro$ cindex badfile
2013/03/13 18:51:52 index /tmp/repro/badfile
2013/03/13 18:51:52 flush index
2013/03/13 18:51:52 merge 0 files + mem
2013/03/13 18:51:52 0 data bytes, 92 index bytes
2013/03/13 18:51:52 done
repro$ cindex -list
/tmp/repro/badfile
repro$
repro$
repro$
repro$ grep main badfile
libc.so.6 __libc_start_main
torch main
torch realmain(int, char**)
libglib-2.0.... g_main_context_iteration
libglib-2.0.... g_main_context_prepare
libglib-2.0.... g_main_context_dispatch
#85 0x00000032f5c38f0e in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#87 0x00000032f5c3ca3a in g_main_context_iteration () from /lib64/libglib-2.0.so.0
#93 0x000000000040e74d in realmain(int, char**) ()
#94 0x000000000040e933 in main ()
repro$
repro$ csearch main <= no results here !!
repro$
repro$ grep threads badfile
============ All threads ==========
============ All threads ==========
repro$
repro$ csearch threads <= no results here !!
repro$
I cannot find (with csearch) text that is in a file I have indexed (cindex)
What version of the product are you using? On what operating system?
I'm using the Linux binaries that are available on the Download page.
I tried to compile Go and codesearch myself but couldn't make it work (my Go
install might be broken).
Please provide any additional information below.
It looks like the problem happens at indexing time.
Originally posted by: bserg...@gmail.com
Also, I've been using codesearch as part of a webapp at work that does forensic analysis of crashes (by letting us search through backtraces), and it's amazing :)
I'm kinda stuck right now because I cannot index some files and I'm thinking about using a different indexer / search system, but really codesearch is all I need so if someone can figure out what the problem is that would be awesome.
Thanks !!
Originally posted by: bserg...@gmail.com
Also, I have one line that is crazy long: 2245 characters. Maybe the problem is that the indexer reads line by line and has a hardcoded limit on the number of characters in a single line?
Originally posted by: manpreet...@gmail.com
Try indexing with -verbose and -logskip flags to see if the file is getting skipped.
The arbitrary limits are in the source, so you can always hand-edit and tweak them. I have a fork at
http://github.com/junkblocker/codesearch
which I made specifically to add such options.
Originally posted by: bserg...@gmail.com
Thanks for the tip. Indeed, I removed those long lines and now everything works fine. I see that your copy of the code has a -maxlinelen option, which should be what I need. Now I have to figure out how to build a Go program ...
Originally posted by: bserg...@gmail.com
Alright, I figured it out, thanks.
repro$ awk '{print length($0)}' badfile | sort -n | tail
972
1001
1043
1071
1456
1529
1724
1792
2259
2328
and in index/write.go there's a
maxLineLen = 2000
so the two longest lines (2259 and 2328 characters) are over the limit.
Originally posted by: bserg...@gmail.com
Whoever can, feel free to close the issue.
Originally posted by: rsc@golang.org
I'm going to leave this open until I can get something like -logskip into
the mainline codesearch branch.
Originally posted by: bserg...@gmail.com
I don't know how far you guys should go with that, but having two command-line options to set maxLineLen and maxFileSize would also help.
The default behavior could be to print a message like the following (probably with better phrasing and different option names) whenever a file gets skipped:
=> /tmp/foo wasn't indexed (maxLine too long) / try to reindex with cindex -maxLineLen 3000
=> /tmp/foo wasn't indexed (file too big) / try to reindex with cindex -maxFileSize 1M
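
As a rough illustration of the suggestion above, here is a sketch of how such a skip message might be produced. Everything in it is hypothetical: the `-maxLineLen` and `-maxFileSize` flags are the names proposed in this comment and not existing cindex options, `skipReason` is an invented helper, and the default file size limit is a placeholder. Only the 2000 default comes from the thread.

```go
package main

import (
	"bytes"
	"flag"
	"fmt"
)

// skipReason explains why data would be skipped under the given limits and
// suggests a flag value that would let it through. It returns "" if the data
// fits. Hypothetical helper; it is not part of codesearch.
func skipReason(data []byte, maxLineLen int, maxFileSize int64) string {
	if int64(len(data)) > maxFileSize {
		return fmt.Sprintf("file too big; try to reindex with cindex -maxFileSize %d", len(data))
	}
	// Find the longest line so the suggested value covers the whole file.
	longest := 0
	for _, line := range bytes.Split(data, []byte("\n")) {
		if len(line) > longest {
			longest = len(line)
		}
	}
	if longest > maxLineLen {
		return fmt.Sprintf("line too long; try to reindex with cindex -maxLineLen %d", longest)
	}
	return ""
}

func main() {
	// Hypothetical flags, named after the proposal in this thread.
	maxLineLen := flag.Int("maxLineLen", 2000, "skip files containing longer lines (hypothetical)")
	maxFileSize := flag.Int64("maxFileSize", 1<<30, "skip larger files (hypothetical, placeholder default)")
	flag.Parse()

	// Synthetic stand-in for a file with one 2328-byte line.
	data := bytes.Repeat([]byte("x"), 2328)
	if why := skipReason(data, *maxLineLen, *maxFileSize); why != "" {
		fmt.Printf("/tmp/foo wasn't indexed (%s)\n", why)
	}
}
```

Reporting the longest line, instead of the first one over the limit, means a single reindex with the suggested value should be enough.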