Hi there, is it possible to get the position where a subject ?? pattern match failed?
I'm experimenting with an idea.
The trick I'm trying is to use characters instead of integers for token IDs.
As each token is recognized, I append its ID (a char) to a string. The result is a bit like a DNA sequence: a pattern of tokens.
In addition, I keep a list of token records, which hold the typical info: value, line and col.
So when string scanning on my input is complete, I have something like:
Each char represents a token type. I call this my "tokenpattern" string.
The actual tokens, with additional info, are stored in a list:
[tokenlist]
tokens[1] (type=NEWLINE, value="", line=1, col=1)
tokens[2] (type=NEWLINE, value="", line=2, col=1)
tokens[3] (type=DROPCAP, value="", line=3, col=1)
etc...
The index in the "tokenpattern" string maps to the token's position in the tokenlist above.
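A minimal sketch of that bookkeeping, in Python rather than Unicon, assuming hypothetical one-char codes (n for NEWLINE, D for DROPCAP, w for WORD) and a hypothetical TokenStream class that appends the type char and the full record in the same step, so position i in the pattern string always names tokens[i]:

```python
from dataclasses import dataclass

# Hypothetical one-char codes per token type; the post does not say which
# chars it actually uses.
TYPE_CHAR = {"NEWLINE": "n", "DROPCAP": "D", "WORD": "w"}


@dataclass
class Token:
    type: str
    value: str
    line: int
    col: int


class TokenStream:
    """Keeps the "tokenpattern" string and the token list in lockstep."""

    def __init__(self) -> None:
        self._chars: list[str] = []
        self.tokens: list[Token] = []

    def add(self, type_: str, value: str, line: int, col: int) -> None:
        # The type char and the full record are appended at the same index.
        self._chars.append(TYPE_CHAR[type_])
        self.tokens.append(Token(type_, value, line, col))

    @property
    def pattern(self) -> str:
        return "".join(self._chars)

    def token_at(self, index: int) -> Token:
        # Python indexes from 0; Icon/Unicon strings and lists index from 1,
        # but either way pattern[i] and tokens[i] describe the same token.
        return self.tokens[index]


stream = TokenStream()
stream.add("NEWLINE", "", 1, 1)
stream.add("NEWLINE", "", 2, 1)
stream.add("DROPCAP", "", 3, 1)
print(stream.pattern)  # nnD
print(stream.token_at(2).line)  # 3
```

Because both sequences only ever grow together, their lengths stay equal, which is the invariant the index mapping relies on.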
What I'm trying to do next is define valid syntax as SNOBOL patterns, with my "tokenpattern" string as the subject.
What I'm hoping is that if a pattern fails, I can get the position in the "tokenpattern" string where it failed. Since that index also identifies the token's index in the tokenList, I can look up the token record at tokenList[index] to get the token's line and col in the source input.
Is this a bit too far-fetched?
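I'm not sure whether the ?? operator itself exposes a failure position. One technique that doesn't depend on it is the "furthest failure" rule used by many backtracking and PEG-style parsers: whenever an expected token char fails to match, record its index, and if the overall match fails, report the largest index recorded. Below is a sketch of that in Python, using a tiny hand-written matcher (lit, seq, many) in place of SNOBOL patterns. The matcher, the grammar, the type chars and the token data are all hypothetical, and unlike SNOBOL patterns this matcher does not backtrack into repetitions:

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical token-type chars (the post does not say which chars it uses).
TYPE_CHAR = {"NEWLINE": "n", "DROPCAP": "D", "WORD": "w"}

# A matcher takes (subject, pos) and returns the end position, or None on failure.
Matcher = Callable[[str, int], Optional[int]]


@dataclass
class Token:
    type: str
    line: int
    col: int


class FurthestFailure:
    """Remembers the rightmost subject index at which any literal failed."""

    def __init__(self) -> None:
        self.pos = -1

    def note(self, pos: int) -> None:
        self.pos = max(self.pos, pos)


ff = FurthestFailure()


def lit(ch: str) -> Matcher:
    def m(s: str, i: int) -> Optional[int]:
        if i < len(s) and s[i] == ch:
            return i + 1
        ff.note(i)  # this expectation failed at index i
        return None
    return m


def seq(*subs: Matcher) -> Matcher:
    def m(s: str, i: int) -> Optional[int]:
        for sub in subs:
            r = sub(s, i)
            if r is None:
                return None
            i = r
        return i
    return m


def many(sub: Matcher) -> Matcher:
    # Zero or more repetitions; the final failed attempt is still noted by
    # lit(), which is why only the furthest failure is meaningful.
    def m(s: str, i: int) -> Optional[int]:
        while True:
            r = sub(s, i)
            if r is None or r == i:
                return i
            i = r
    return m


# Hypothetical grammar: blank lines, then a DROPCAP followed by one or more WORDs.
n, D, w = lit("n"), lit("D"), lit("w")
document = seq(many(n), D, w, many(w))


def first_error(subject: str) -> Optional[int]:
    """None if `document` matches the whole subject, else a 0-based failure index."""
    ff.pos = -1
    end = document(subject, 0)
    if end == len(subject):
        return None
    if end is not None:
        ff.note(end)  # only a prefix matched; the next token is unexpected
    return ff.pos


def report(subject: str, tokens: list[Token]) -> str:
    idx = first_error(subject)
    if idx is None:
        return "ok"
    if idx >= len(tokens):
        return "unexpected end of input"
    t = tokens[idx]  # same index in the pattern string and the token list
    return f"syntax error at line {t.line}, col {t.col} (unexpected {t.type})"


# Hypothetical input: the WORD on line 3 comes before the DROPCAP.
tokens = [Token("NEWLINE", 1, 1), Token("NEWLINE", 2, 1), Token("WORD", 3, 1),
          Token("DROPCAP", 3, 5), Token("WORD", 3, 6)]
subject = "".join(TYPE_CHAR[t.type] for t in tokens)  # "nnwDw"
print(report(subject, tokens))  # syntax error at line 3, col 1 (unexpected WORD)
```

The furthest position is reported rather than the most recent one because repetitions such as many(n) always end with a failed attempt, which gets recorded even when the overall match goes on to succeed. Since the failing index is shared by the pattern string and the token list, tokens[idx] gives the line and col directly.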