#41 Problem when substituting certain patterns

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-08-25
Created: 2010-10-16
Creator: Ewald Arnold
Private: No

I think I found two flaws in the text parser while using the HTML parser.

First, in change_token(), head remains at the previous token position after the replacement. This is not a real problem, because the new word has been accepted and head is moved again during the next word check; it just wastes a bit of time :-)

@@ -196,10 +196,13 @@
int TextParser::change_token(const char * word)
{
if (word) {
+// fprintf(stderr, "chg1: %s <:> %s\n", word, line[actual]+head);
char * r = mystrdup(line[actual] + head);
strcpy(line[actual] + token, word);
strcat(line[actual], r);
+ token += strlen(word);
head = token;
+// fprintf(stderr, "chg2: %s <:> %s\n", word, line[actual]+head);
free(r);
return 1;
}
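
To illustrate the intent of this hunk outside the parser, here is a minimal sketch. It assumes a simplified model in which the line is a std::string and token/head are plain indices; MiniParser is a hypothetical stand-in, not the real TextParser class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the parser state (assumption, not hunspell code).
struct MiniParser {
    std::string line;       // current line being checked
    std::size_t token = 0;  // start of the current word
    std::size_t head = 0;   // first position after the current word

    // Replace line[token, head) with `word`, then move both cursors to the
    // first character after the inserted word, as the patched code does.
    void change_token(const std::string& word) {
        line.replace(token, head - token, word);
        token += word.size();
        head = token;
    }
};
```

With line = "<entry>data2c</entry>", token = 7 and head = 11, calling change_token("-data") leaves both cursors at index 12, so the next word check starts at "2c</entry>" instead of rescanning the word that was just accepted.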

Second, under rare conditions alloc_token() requests too many characters when it checks for URL characters: head remains at the last checked position.

</row>

<row>
<entry>data2c</entry>

0: Data
1: -data
2: dato

After accepting "-data", it highlights "c</entr" instead of "c".

The fix:

Index: textparser.cxx

--- textparser.cxx (revision 20544)
+++ textparser.cxx (working copy)
@@ -259,8 +262,14 @@

int TextParser::get_url(int token_pos, int * head)
{
- for (int i = *head; urlline[i] && *(line[actual]+i); i++, (*head)++);
- return checkurl ? 0 : urlline[token_pos];
+ int head_tmp = *head;
+ for (int i = *head; urlline[i] && *(line[actual]+i); i++, (*head)++);
+
+ if(checkurl)
+ return 0;
+
+ *head = head_tmp;
+ return urlline[token_pos];
}
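
The save-and-restore idea in this hunk can be sketched on its own. The version below is a simplified model under stated assumptions: url_mask plays the role of urlline, the line is a std::string, and get_url_sketch is a hypothetical name, not the real member function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the patched get_url() (assumption, not hunspell code).
// url_mask[i] is true where position i of `line` was marked as a URL character.
// Returns true when the token at token_pos was marked as part of a URL.
bool get_url_sketch(const std::string& line, const std::vector<bool>& url_mask,
                    bool checkurl, std::size_t token_pos, std::size_t& head) {
    const std::size_t head_saved = head;  // remember where the scan started

    // Advance head over the run of marked characters, as the original loop does.
    while (head < line.size() && head < url_mask.size() && url_mask[head]) {
        ++head;
    }

    if (checkurl) {
        return false;  // same as "return 0" in the patch; head stays advanced
    }

    head = head_saved;  // undo the scan so the caller does not overshoot
    return token_pos < url_mask.size() && url_mask[token_pos];
}
```

The design point is that the scan over head becomes a side effect only on the checkurl path; on the other path the caller sees head exactly where it left it.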

I am not sure about the background, but URLs and email addresses should probably just be excluded from processing.

Discussion