#41 Problem when substituting certain patterns

open
nobody
None
5
2014-08-25
2010-10-16
No

I think I found two flaws in the text parser when I was using the html parser.

First, in change_token(), head is left at the previous token position instead of just after the replacement word. This is not a real problem: the new word has already been accepted, so head simply moves past it on the next word check. It just wastes a bit of time :-)

@@ -196,10 +196,13 @@
int TextParser::change_token(const char * word)
{
if (word) {
+// fprintf(stderr, "chg1: %s <:> %s\n", word, line[actual]+head);
char * r = mystrdup(line[actual] + head);
strcpy(line[actual] + token, word);
strcat(line[actual], r);
+ token += strlen(word);
head = token;
+// fprintf(stderr, "chg2: %s <:> %s\n", word, line[actual]+head);
free(r);
return 1;
}
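
The effect of the added `token += strlen(word)` can be sketched in isolation. This is only an illustration, not the real TextParser: `MiniParser` and its members are hypothetical stand-ins, and it assumes head points just past the current token when the replacement happens.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the parser state; NOT the real TextParser.
struct MiniParser {
    std::string line;   // current line
    std::size_t token;  // start of the current token
    std::size_t head;   // scan position, assumed to be just past the token

    // Replace line[token, head) with word, then move token and head
    // past the replacement so the next scan does not revisit it.
    void change_token(const std::string& word) {
        assert(token <= head && head <= line.size());
        line.replace(token, head - token, word);
        token += word.size();  // the step added by the patch
        head = token;
    }
};
```

With the line `<entry>data2c</entry>`, token at 7 and head at 11, replacing the token with `-data` leaves head at 12, directly in front of `2c</entry>`, so the accepted word is not scanned a second time.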

Second, under rare conditions alloc_token() requests too many characters when it checks for URL characters, because head remains at the last checked position.

</row>

<row>
<entry>data2c</entry>

0: Data
1: -data
2: dato

After accepting "-data", it highlights "c</entr" instead of "c".

The fix:

Index: textparser.cxx

--- textparser.cxx (revision 20544)
+++ textparser.cxx (working copy)
@@ -259,8 +262,14 @@

int TextParser::get_url(int token_pos, int * head)
{
- for (int i = *head; urlline[i] && *(line[actual]+i); i++, (*head)++);
- return checkurl ? 0 : urlline[token_pos];
+ int head_tmp = *head;
+ for (int i = *head; urlline[i] && *(line[actual]+i); i++, (*head)++);
+
+ if(checkurl)
+ return 0;
+
+ *head = head_tmp;
+ return urlline[token_pos];
}
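
The save-and-restore idea in this patch can also be shown as a standalone function. This is a rough sketch under assumptions: `get_url_sketch`, `url_mask` and `check_url` are hypothetical names mirroring get_url(), urlline and checkurl, and which characters the real parser flags as URL characters is not something I know.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical free-function version of the patched get_url().
// url_mask[i] is true when line[i] was flagged as a URL character.
bool get_url_sketch(const std::string& line,
                    const std::vector<bool>& url_mask,
                    std::size_t token_pos,
                    std::size_t& head,
                    bool check_url) {
    const std::size_t saved_head = head;

    // Advance head over the run of flagged characters, as the original loop does.
    while (head < line.size() && head < url_mask.size() && url_mask[head]) {
        ++head;
    }

    if (check_url) {
        return false;  // URLs are to be checked: keep the advanced head
    }

    head = saved_head;  // the fix: do not leave head at the last checked position
    return token_pos < url_mask.size() && url_mask[token_pos];
}
```

When check_url is false, the caller gets head back exactly where it was, so a following token such as "c" is not extended over the flagged characters after it.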

I am not sure about the background, but URLs and email addresses should probably just be excluded from processing.
