Thank you very much for all the hard work you have put into this project. I had used your original Program# and this is a great improvement.
I have ported this project to vb.net and have been working with it fairly extensively over the last week or so, and I have found a few bugs.
In the file ApplySubstitutions.cs the subs ProcessChange and Substitute are broken for example:
The word 'Favorite' becomes 'Favorite orite' because the sub is not matching on whole words and replacing the 'Fav' part with 'Favorite'.
The Thatstar, Topicstar and InputStar indexes appear to be backwards.
I believe that the first * in a pattern, reading from left to right, should be index 1, the second index 2, and so on.
Thanks Vauntie for the positive message and well done on finding these bugs. I appreciate you taking the time to report them to me.
It's late here in the UK, so I'll take a look and fix them in the morning. I'll probably do a quick point release later in the day, once I've added some unit tests to check that the regex and ApplySubstitutions are working properly and that the *Star collections are the right way around.
Watch this space!
I've finally found time to take a look at these two issues. In the end they are not bugs but the situations you describe require further explanation:
"In the file ApplySubstitutions.cs the subs ProcessChange and Substitute are broken for example:
The word 'Favorite' becomes 'Favorite orite' because the sub is not matching on whole words and replacing the 'Fav' part with 'Favorite'."
I've looked into this and found that the code is working correctly. As the example "substitutions" file that I include with the code shows, some substitutions (especially shortened words such as "RU" and "UR") are specified as " RU " and " UR " (notice the use of spaces here). Your problem would be solved if you matched against " FAV " instead of "FAV". The rule of thumb here is to include white-space IF the value you are matching contains a common pattern of characters (such as "FAV", "UR" or "RU").
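To make the difference concrete, here is a minimal sketch in Python (not the project's own C# code). It assumes substitutions are applied as plain substring replacements on upper-cased input; the function name `substitute` and the sample sentences are made up for illustration.

```python
def substitute(text: str, find: str, replacement: str) -> str:
    """Plain substring replacement (an assumed model of the behaviour under discussion)."""
    return text.replace(find, replacement)

# Matching on the bare key also hits "FAV" inside "FAVORITE".
bare = substitute("MY FAVORITE COLOR", "FAV", "FAVORITE")

# Matching on the space-padded key leaves "FAVORITE" alone...
padded_untouched = substitute("MY FAVORITE COLOR", " FAV ", " FAVORITE ")

# ...and still expands the stand-alone abbreviation.
padded_expanded = substitute("MY FAV COLOR", " FAV ", " FAVORITE ")

print(bare)              # MY FAVORITEORITE COLOR
print(padded_untouched)  # MY FAVORITE COLOR
print(padded_expanded)   # MY FAVORITE COLOR
```

The padded key only matches when the abbreviation has a space on both sides.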
"The Thatstar, Topicstar and InputStar indexes appear to be backwards. "
I just looked, and the answer is no, for the following reason. The processing of the "path" is recursive, so as the result is passed back up the node tree, the later matches are encountered first (as they are deeper) and are added first. This is intentional, because the AIML standard says that the most "recent" match needs to be at position 0 in the array.
Perhaps an example will help.
Imagine the sentence "Test first match Test second match" that is matched by the category "TEST * TEST *". The AIML standard says that the InputStar should look like this: ["second match", "first match"], because the "second match" was matched more recently than the "first match". As the nodes process recursively, the first node to finish its work is the one dealing with "second match", which is why it is added at position 0 in the InputStar.
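The ordering can be reproduced with a toy recursive matcher. This is only a sketch in Python under simplifying assumptions (a flat list of pattern words rather than a node tree, and only the * wildcard); the function `match` is hypothetical and is not the project's code.

```python
from typing import List, Optional

def match(pattern: List[str], words: List[str]) -> Optional[List[str]]:
    """Return the wildcard captures, deepest (latest) match first, or None if no match."""
    if not pattern:
        return [] if not words else None
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # A wildcard must consume at least one word; try the shortest capture first.
        for n in range(1, len(words) + 1):
            stars = match(rest, words[n:])
            if stars is not None:
                # The recursive call has already collected the later wildcards,
                # so this (earlier) capture ends up after them in the list.
                stars.append(" ".join(words[:n]))
                return stars
        return None
    if words and words[0].upper() == head:
        return match(rest, words[1:])
    return None

stars = match("TEST * TEST *".split(), "Test first match Test second match".split())
print(stars)  # ['second match', 'first match']
```

Because each capture is recorded while the recursion unwinds, the deepest wildcard lands at position 0.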
I've just re-read my explanation - it seems as clear as mud, but it definitely is working as it should - I've just checked and double checked.
Hope this clears things up.
I understand your interpretation of the spec as I have read it myself. I will just give you an example of what happens.
Here is a sample from Default.xml:
<category><pattern>* HAS *</pattern><template>Where did <set name="he"><person/></set> get <set name="it"><person><star index="2"/></person></set>?</template></category>
<category><pattern>* GAVE *</pattern><template>Did <person><star index="2"/></person> keep it? <think> <set name="it"><person><star index="2"/></person></set> <set name="he"><person/></set> </think></template></category>
You: Jim has a computer
Bot: Where did a computer get Jim?
You: Jim gave bill his computer
Bot: Did Jim keep it?
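The first exchange can be traced with a small Python sketch. It assumes that a star with no index attribute means index 1, and it leaves out the pronoun swapping done by the person tag; the helpers `star` and `render_has` are hypothetical stand-ins for the template engine.

```python
from typing import List

def star(stars: List[str], index: int = 1) -> str:
    """Look up a 1-based star index in the list of wildcard captures."""
    return stars[index - 1]

def render_has(stars: List[str]) -> str:
    """Simplified rendering of the '* HAS *' template (person transformation omitted)."""
    return f"Where did {star(stars)} get {star(stars, 2)}?"

# "Jim has a computer" matched against "* HAS *"
left_to_right = ["Jim", "a computer"]       # first * is index 1
most_recent_first = ["a computer", "Jim"]   # last * is index 1

print(render_has(left_to_right))      # Where did Jim get a computer?
print(render_has(most_recent_first))  # Where did a computer get Jim?
```

Only the left-to-right ordering gives the reply the template author evidently intended.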
Regarding the replacements,
If the bot is matching on spaces, you would have to put 3 variations of each match, i.e. "FAV", " FAV" and "FAV ", in order to match the start and end of sentences as well. I have verified this, and indeed I do have to.
A correction to my last post regarding the replacements: it would only take 2 variations per replacement, "FAV " and " FAV", to match most sentences, because a bare "FAV" would match all occurrences.
Another problem arises though,
for example, if I want to do a replacement for "CU" as in "see you later":
If I do a substitution like "CU" "BYE", this will break a word like calCUlator.
If I do the substitution like " CU" and "CU ", it will not be substituted in a sentence consisting of the single word "CU".
Hope you understand,
Vauntie / Robert
As always, thanks for the detailed replies.
With regard to the *Star indexes: I've just re-read the specification and it definitely seems vague. Furthermore, your example convinces me that my interpretation is wrong. As a result, it'll be fixed in the next (upcoming) release. Thanks for pointing it out, and keep this stuff coming.
Replacements were (and are) a headache. Again the specification is vague, describing them as "heuristics applied to an input that attempt to retain information in the input that would otherwise be lost during the sentence-splitting or pattern-fitting normalizations". I completely understand your point of view and agree with you 100%. My "rule of thumb" would be that an AIML author should be able to just specify "CU" and have it work as if it were " CU", "CU " and "CU" as a single word but not work in a word in which it is a constituent (like your "calCUlator"). Unfortunately, ALICE's own substitution files (the ones I supply with the code) include patterns that include spaces. It looks like I'm going to have to brush up on my regex.
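One possible shape for that rule of thumb, sketched in Python rather than the project's C#: wrap the key in lookarounds so it only matches when it is not adjacent to another word character. The function name `substitute_whole_word` is hypothetical, and stripping the padding spaces from existing entries is an assumption about how space-padded keys could be reconciled with this approach, not something the specification says.

```python
import re

def substitute_whole_word(text: str, find: str, replacement: str) -> str:
    """Replace `find` only where it stands alone as a word (or run of words)."""
    # Assumption: padding spaces in existing entries such as " RU " are only
    # there to force whole-word matching, so they can be stripped here.
    key = find.strip()
    value = replacement.strip()
    pattern = re.compile(r"(?<!\w)" + re.escape(key) + r"(?!\w)", re.IGNORECASE)
    # A function is used as the replacement so backslashes in `value` stay literal.
    return pattern.sub(lambda m: value, text)

print(substitute_whole_word("CU", "CU", "BYE"))                  # BYE
print(substitute_whole_word("CU LATER", "CU", "BYE"))            # BYE LATER
print(substitute_whole_word("OK CU", "CU", "BYE"))               # OK BYE
print(substitute_whole_word("MY calCUlator", "CU", "BYE"))       # MY calCUlator
print(substitute_whole_word("RU THERE", " RU ", " ARE YOU "))    # ARE YOU THERE
```

The lookarounds `(?<!\w)` and `(?!\w)` are used instead of `\b` so that keys beginning or ending with punctuation would still be handled sensibly.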