Re: [Dev-C++] Choose a word from a .txt file
Open Source C & C++ IDE for Windows
Brought to you by:
claplace
|
From: Per W. <pw...@ia...> - 2008-06-03 12:12:31
|
The question here isn't the number of source lines to perform the job.
Why? Because you don't need any source lines at all if you just use normal
unix-available tools - you can get an answer directly on the command line.
But this is a C/C++ mailling list, since it is about Dev-C++. So the
question should then be: How to actually perform the job (instead
of letting existing command-line tools do it) in C/C++ source code.
If you have a nail: should you use a hammer or a sledge hammer? Or
maybe you should blast it into the pnak with dynamite? The Hilti nail-guns
goes the explosive way, but with the disadvantage that it doesn't leave
much for a DIY guy. Calling a magic reg-exp library doesn't tell someone
what is happening.
Reg-exp is cool, but regular expressions are meant to perform pattern
matching. You don't have much need for any pattern matching in this case -
and more importantly, you are ignoring how to solve the problem since you
let someone else solve it for you.
Next step: You compalin about my tree algorithm.I did suggest both a
linear approach and a tree algorithm, mentioning the tree algorithm for
large sorted files. If you have one billion words, the text file may be
100 billion bytes large. Guess how fast your reg-exp scan of the full file
would be. Guess what would happen when a tree algorithm cuts it in two for
each iterative step? A naive implementation would solve for one billion
words in max 30 tests. A slightly better approach (switching from tree to
linear when you have narrowed it down to a block of maybe a couple of kB
of data - matched to n * block size for file system or memory subsystem.
Now, assume 20MB/s for the regexp solution, and 0.1s/scan for the tree
search. On average, the regexp would then need 25 000 seconds on average
(twice in worst-case), while the tree search would take <= 3 seconds
worst-case. Hence, the simplest code doesn't always win - you must know
what your requirements are before you can select a good (or almost always
a "good enough") algorithm.
Anyway - please compare runtime and size of application binary for a
compiled regexp solution, as compared with the following:
void scan(char* my_word) {
char tmp_match[MAX_WORD_LEN+10];
char linebuf[MAX_LINE_LEN],*p1,*p2;
int match_len,found = 0;
FILE *f;
match_len = strlen(my_word);
strcpy(tmp_match,my_word);
strcpy(tmp_match+match_len++," ");
f = fopen(fname,"rt");
if (f) {
while (fgets(linebuf,sizeof(linebuf),f)) {
if (!strncmp(linebuf,tmp_match,match_len)) {
p1 = linebuf + match_len;
while (*p1 == ' ') p1++;
p2 = strchr(p1,'\n');
if (p2) *p2 = '\0';
printf("Answer for %s: %s\n",my_word,p1);
found = 1;
break;
}
}
fclose(f);
if (!found) printf("No match found for %s\n",my_word);
} else {
printf("Failed opening text file\n");
}
}
Note: A real implementation should call setbuffer() or setvbuf() or
alternatively use ioctl() or similar and specify one-pass read-only scan
without seek. But that isn't part of a search algorithm.
/pwm
On Tue, 3 Jun 2008, "Münt, Bernd" wrote:
> > No regular expressions if he is looking for an exact match. Regular
> > expressions are there to look for patterns. Normal
> > strcmp()/strncmp() is
> > faster when looking for exact matches.
>
> No not at all. You explained an algorithms to look in the middle, then depending on less or greater in the lower half or in the upper half etc. That needs a lot of lookups if the file is huge.
>
> A regex with the multiline option can find the answer with one line.
> /^WordToFind\s*?(.*)/m
> In the caption(1) you will find the answer, if there is an answer.
>
> Regards, BM
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Dev-cpp-users mailing list
> Dev...@li...
> TO UNSUBSCRIBE: http://www23.brinkster.com/noicys/devcpp/ub.htm
> https://lists.sourceforge.net/lists/listinfo/dev-cpp-users
>
|