I've just completed a one-week evaluation of Pyke, and after discussions with the client (a division of an Australian government department) on Friday, we have agreed that Pyke will be able to scratch several of our remaining itches and that we will continue working with it.
There are a small number of potential enhancements to Pyke that we would like to see, which would make it a better fit for our existing Python development environment. I'm not just asking for these enhancements to materialise out of thin air - I am prepared to work on these myself, but I would appreciate Bruce's advice and comments before I dive into anything too deeply:
1) Support for asynchronous ask_modules, and tighter Django integration.
We've already discussed this on the Help Forum, and I have implemented a proof-of-concept prototype. I would like to put together a generic framework for asynchronous ask_modules, and a specific implementation for use with Django. (This is another advantage to implementing ask_modules as class instances rather than modules BTW - you can use inheritance!)
2) Catchall reviews for questions
Support for catchall reviews - a review message to use when the answer satisfies the match validation, but does not match any of the other review messages - would be desirable. This would require a minor extension to the kqb syntax, and I am happy to work on this as well.
3) Docstrings for krb and kqb files (and we may as well add them to kfb files while we're at it), and an associated sphinx extension.
We need a way to map rules in knowledge base files (maintained by developers and technical staff) to plain-English descriptions of the rules that will be readable by the highly non-technical expert users who will be specifying and/or approving the rules. We currently do this via docstrings in the validation methods that implement the rules. It would be great if we could embed similar documentation in our knowledge base files, and build it out to html or pdf format with the rest of our sphinx documentation build.
This might take a bit more work, requiring as it does diving into the internals of Sphinx as well as Pyke, but it would make a big difference to us.
I would appreciate your comments and feedback on the above plan-of-action/wish-list.
I share interest in those as well. (I just posted a reply to your comments from last week at https://sourceforge.net/projects/pyke/forums/forum/744446/topic/3945480 .) I'm even using Django. Unfortunately, I am not actively working on this but would be happy to dig in a few hours here and there to help push this forward.
Oops, wrong link - I meant: https://sourceforge.net/projects/pyke/forums/forum/744447/topic/3935475
I have limited time today, so would like to just provide a quick "first thoughts" response for now.
1. Asynchronous questions: The more I think about this, the more it seems that the right answer here is to add an interface to retrieve and reload the cache that each question already has (where the question stores answers for combinations of parameters already asked). The cache for each question is simply a dictionary mapping the parameter values (as a tuple) to the answer.
I envision three levels of retrieve/reload calls: one on the individual question (class question in question_base.py). This one would simply return/set its cache. One for the whole question base (class question_base in question_base.py). This one would return a dict mapping the question names to the caches (so two-level dict). And finally, one for the whole knowledge engine (class engine in knowledge_engine.py). This would return a dict mapping question base names to each question base dict - so this would be a three-level dict. Then, assuming that you grab the cache for all questions at the engine level, you would store a new answer simply by assigning it in that dict, indexed by question base name, question name and parameter tuple, before reloading the cache.
This would also allow Pyke users to pickle their answers and replay them later (possibly changing some of them first).
2. Catchall reviews for questions: sounds like a good idea! But if this ends up just being help text for the question, it might be better to store it more explicitly as help text (i.e., what the heck does this question mean?).
3. Docstrings: sounds interesting. I'm not familiar enough with sphinx to give much advice on that end. I assume though that you're interested in the sphinx autodoc feature here? On the Pyke side, it should be easy to add an optional (indented) string literal right after the rule name. For a docstring corresponding to Python's module level docstrings, I'm not sure if you'd want one docstring for the whole .krb file, or separate docstrings for forward-chaining rules, backward-chaining rules, and plans. The latter have the "fc_extras", "bc_extras" and "plan_extras" sections already, though these are at the end of the .krb file. The former could simply be placed at the top of the .krb file.
The next issue is where do you want these stored, and how would they be retrieved? Storing them as docstrings in the generated .py files would be one option. Each rule becomes a Python function by the same name as the rule name; so it would be easy to add a docstring there. And there are up to 3 generated .py files for each .krb file, so if you wanted to add module level docstrings, this would argue for the separate fc/bc/plan docstrings (one for each generated .py file).
The other option would be to store the docstrings in the Pyke engine when Pyke is started and initialized. Then there would have to be some kind of interface added to retrieve these. I have no idea how sphinx would tap into this information though. If you go this route, it would argue for a single .krb level docstring (corresponding to the rule_base class in rule_base.py) and might also argue for an additional level of docstring for each goal name (lower level than .krb file, but higher level than individual rules since several rules pertain to the same goal). This intermediate level docstring would correspond to the rule_list class in rule_base.py. The individual rule docstrings would correspond to the rule class in fc_rule.py which is the base class for both forward-chaining and backward-chaining rules. I'd have to think about how the syntax for the intermediate "goal" level docstring would look. I've wondered in the past if there should be one place to place definition information about each goal, but this has seemed to lead to more questions than answers…
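The three-level retrieve/reload idea from item 1 could look roughly like the sketch below. The classes are stand-ins rather than Pyke's real question, question_base and engine classes, and the method names get_cache and set_cache are hypothetical.

```python
import pickle


class Question:
    """Stand-in question: caches answers keyed by a tuple of parameter values."""

    def __init__(self):
        self.cache = {}

    def get_cache(self):
        return dict(self.cache)

    def set_cache(self, cache):
        self.cache = dict(cache)


class QuestionBase:
    """Stand-in question base: maps question names to questions."""

    def __init__(self, questions):
        self.questions = questions

    def get_cache(self):
        # Two-level dict: question name -> parameter tuple -> answer
        return {name: q.get_cache() for name, q in self.questions.items()}

    def set_cache(self, caches):
        for name, cache in caches.items():
            self.questions[name].set_cache(cache)


class Engine:
    """Stand-in engine: maps question base names to question bases."""

    def __init__(self, question_bases):
        self.question_bases = question_bases

    def get_cache(self):
        # Three-level dict: question base -> question -> parameters -> answer
        return {name: qb.get_cache() for name, qb in self.question_bases.items()}

    def set_cache(self, caches):
        for name, qb_caches in caches.items():
            self.question_bases[name].set_cache(qb_caches)


engine = Engine({"pets": QuestionBase({"pet_name": Question()})})

# Retrieve everything, record one new answer, pickle it, then reload it.
cache = engine.get_cache()
cache["pets"]["pet_name"][("dog",)] = "Fred"
saved = pickle.dumps(cache)
engine.set_cache(pickle.loads(saved))
restored = engine.question_bases["pets"].questions["pet_name"].cache
```

Because the whole structure is plain dicts and tuples, it pickles without any special handling, which is what would make the "replay later" use case cheap.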
Maybe this will help to get the discussion going.
As far as the Django integration, I remember a question about whether Pyke supports threads that I think was in relation to Django. Pyke wasn't designed for threads, so if you're using a multi-threading version of django, that might also be an issue for django integration. Don't know how important this is to the django folks… I may be able to dig up my reply to this (can't seem to find it on SF) if anybody is interested…
Time's up, got to go now!
Thanks for your comments, they have given me food for thought.
1) I was concerned that the existing ask interface wasn't particularly well suited to building an asynchronous module, but hadn't had the chance to dig deep enough into the question implementation to think about alternative interfaces. (The lack of access to both the engine and the underlying question object forces a number of clumsy workarounds.) What you describe sounds much more useful and appropriate - I will have a trawl of the code and rethink my simplistic first approach.
2) Catchall review messages only really make sense for string type questions - there is already a facility in the match syntax for responses that don't match a particular regular expression. But there is still room for catchall expressions. A fairly contrived example:
What's the name of your pet $animal?
$name = string("Name can only contain letters and must start with a capital letter" /*/)
/^Fred$/ ! What a coincidence! My $animal's name is Fred too!
else ! What a dull name!
(Here using "else" as a keyword to indicate the catchall. An asterisk might be a better choice of syntax.)
3) Yes, I'm interested in the autodoc feature, and no, I don't have any experience writing sphinx extensions either! Copying the docstrings to the compiled python files makes sense - probably the *.qbc files for questions too, although they're not really user-readable enough to really justify it. Once in the python code, they can be retrieved at runtime using __doc__. I can't really comment too much further on this until I've looked more into writing a sphinx extension, but your comments are appreciated.
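As a rough illustration of the "retrieve at runtime using __doc__" idea, the sketch below builds a stand-in for a generated module and collects its docstrings. The module name family_bc, the rule name child_parent and its parameters are made up for the example; this is not what the Pyke compiler currently emits.

```python
import inspect
import types

# Stand-in for a .py module generated from a .krb file (hypothetical content).
generated = types.ModuleType("family_bc")
generated.__doc__ = "Backward-chaining rules about family relations."


def child_parent(rule, arg_patterns, arg_context):
    """A person's parent is anyone recorded as having that person as a child."""


generated.child_parent = child_parent


def collect_docs(module):
    """Return the module docstring and a dict of rule name -> docstring."""
    rule_docs = {}
    for name, obj in vars(module).items():
        if inspect.isfunction(obj) and obj.__doc__:
            rule_docs[name] = inspect.getdoc(obj)
    return inspect.getdoc(module), rule_docs


module_doc, rule_docs = collect_docs(generated)
```

A sphinx extension could presumably do something similar after importing the compiled files, but I haven't checked how that would fit with autodoc.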
As for thread safety - I assume each knowledge engine instance is independent of other instances? As long as each thread/session has its own engine instance, things would be OK, wouldn't they?
2) Looking at this a second time, it seems that what you are looking for could be broken down into two different additional capabilities. The way the current review mechanism works is to output all matching reviews.
One part of your capability is that you don't want the "else" review when some previous review matches. Perhaps an "else" keyword could be used to separate different groups of reviews, where all reviews within a group are shown (the current behavior), but once there is a match in one group, subsequent groups are skipped. This would prevent the "else" review from being shown when there is another matching review, by placing the other, more specific, reviews in a different group.
The other part of your capability is to add a "match anything" match. Perhaps this could be done by simply omitting the match (a line starting with !). Then, by including this "match anything" review in a group, that review would always be shown (e.g., "Thank you for your answer"), along with more specific reviews.
Then you could realize your default review by a combination of these two additions: a marker (e.g., "else") to form a new review group, followed by a match-anything review that stands alone in the second group. This ends up looking almost exactly like what you show, but with the final "! What a dull name!" on a separate line.
It's been a while since I did this question stuff, but the reviews should be shown in the same order as in the .kqb file.
3) The *.qbc and *.fbc files are simply pickles. It seems like you'd have to put the docstrings on the fact and question objects inside the engine to have a docstring capability here. I don't know whether these files could be unpickled by another program (sphinx) without bizarre things happening? It seems like that should work… So maybe sphinx could be modified to unpickle these files to find the docstrings there?
4) Thread safety. I found my previous answer to the threading question and posted it to the FAQ list for future reference. The upshot for multiple engine objects, each in its own thread, is that I think that this should work, but don't think that the Pyke compiler is thread safe. There could also be a problem if you create an engine, change a .krb file, then create a second engine without re-creating the first one. But I haven't tried any of this. If you try this, please share your results with the rest of us!
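The review-group idea from item 2) can be sketched in a few lines of Python. This only models the selection logic, not the kqb parser: each group is a list of (pattern, message) pairs, None stands for the proposed "match anything" review, and the use of re.search is an assumption about how matches are applied.

```python
import re


def select_reviews(answer, groups):
    """Return every matching review from the first group that has a match."""
    for group in groups:
        matched = [
            message
            for pattern, message in group
            if pattern is None or re.search(pattern, answer)
        ]
        if matched:
            # Later groups are skipped once one group has produced a match.
            return matched
    return []


groups = [
    # First group: the specific reviews.
    [(r"^Fred$", "What a coincidence! My dog's name is Fred too!")],
    # Second group (after "else"): a lone match-anything review.
    [(None, "What a dull name!")],
]

fred_reviews = select_reviews("Fred", groups)
rex_reviews = select_reviews("Rex", groups)
```

With a single group, this reduces to the current behavior of showing all matching reviews in file order.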
2) A simpler approach that would probably suffice would be to allow a "not" operator in the review matches. i.e. Use the review if the answer does NOT match the pattern/range/whatever. Your suggestion is obviously more flexible though.
3) Ah pickles! I didn't recognise them. We're at least simplifying the work that the proposed sphinx extension would have to do.
4) Excellent - that's good enough for me. I'd only be deploying compiled knowledge bases to the server in any case.
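For item 4), one way to give each thread its own engine is thread-local storage. The sketch below uses a stub class in place of a real knowledge engine, and get_engine is a hypothetical helper; as noted above, nobody has actually tried this with Pyke yet.

```python
import threading


class StubEngine:
    """Stand-in for a knowledge engine; not Pyke's real engine class."""


_local = threading.local()


def get_engine():
    """Return this thread's engine, creating it on first use."""
    engine = getattr(_local, "engine", None)
    if engine is None:
        engine = _local.engine = StubEngine()
    return engine


results = []


def worker():
    # Two calls in the same thread should return the same engine instance.
    results.append((get_engine(), get_engine()))


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```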
I've been thinking in more detail about 1).
The get/restore cache methods you mention help a lot, but there's still something missing. As I see it we need to either:
a) Modify the interface to the ask module so that the underlying question object and the parameter bindings are passed to the ask_* methods/functions. In principle we could also remove all the other arguments as well, because they can be extracted from these. This is a pretty radical change to the interface and raises compatibility issues. (It also requires registering the "question" object with the "user_question" object, but that's trivial.)
b) Add an interface to the engine that takes a fully substituted question text and the question type and returns the knowledge base name, the question name, and the parameter bindings. This obviously raises ambiguity issues (e.g. what do we do if we have two string questions "What is the name of your pet $animal?" and "What is the $property of your pet dog?").
It seems to me that a) is clearly the cleaner approach, so how to deal with the compatibility issues? We either break backward compatibility, rewrite the tty and wx ask modules, and force all users to rewrite their custom ask modules on upgrade, or we support multiple ask module interfaces. Supporting multiple interfaces could be achieved either by registering a new-interface ask module in a different way (e.g. engine.advanced_ask_module = xxx instead of engine.ask_module = xxx - check for the existence of an "advanced" ask module first, then fall back to the current behaviour) or by calling a get_interface_version() method/function on the ask module and assuming the old interface if the function doesn't exist. We could also deprecate the old interface and remove it in a later release.
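To make the get_interface_version() option concrete, here is a minimal sketch. The Question, OldAsk and NewAsk classes are stand-ins, and the ask_string signatures are deliberately simplified compared to the real ask modules.

```python
from string import Template


class Question:
    """Stand-in question holding a name and a text template."""

    def __init__(self, name, text):
        self.name = name
        self.text = text


class OldAsk:
    """Old-style ask module: only sees the fully substituted text."""

    def ask_string(self, question_text):
        return "old:" + question_text


class NewAsk:
    """New-style ask module: gets the question and the parameter bindings."""

    def get_interface_version(self):
        return 2

    def ask_string(self, question, format_params):
        return "new:%s:%s" % (question.name, sorted(format_params.items()))


def interface_version(ask_module):
    """Assume the old interface (version 1) if the getter is missing."""
    getter = getattr(ask_module, "get_interface_version", None)
    return getter() if callable(getter) else 1


def ask_string(ask_module, question, format_params):
    if interface_version(ask_module) >= 2:
        return ask_module.ask_string(question, format_params)
    # Old interface: substitute the parameters before calling.
    text = Template(question.text).substitute(format_params)
    return ask_module.ask_string(text)


q = Question("pet_name", "What's the name of your pet $animal?")
old_result = ask_string(OldAsk(), q, {"animal": "dog"})
new_result = ask_string(NewAsk(), q, {"animal": "dog"})
```

The same dispatch point is where a deprecation warning for the old interface could be raised later.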
As far as I'm concerned, all the various options above are technically equivalent (except where I've stated otherwise) and all more or less as difficult/easy as each other to implement. I'm therefore looking for Bruce's feedback for his preferences as project maintainer.
I agree that option a is preferred. We should be able to nix the other parameters to ask and just pass the question and the format_params. (I let this sit overnight and can't come up with any better option.)
I guess we'd need additional methods on user_question then to return what had been passed as parameters to the ask functions. Say: prepare_question and prepare_review each taking the format_params (there is already a prepare_arg2). You can lift the code for these from the user_question.ask method. And then I'd add these methods to question too as just pass-throughs to user_question so that the ask methods/functions just get the question and don't have to know about user_questions.
At this point, the only thing left in the user_question.ask method is the ask_fn lookup. Since user_question.ask is only called by question.lookup, I'd move the ask_fn lookup to question.lookup. This removes the need for user_question to link back to the question_base (which was only there to support the get_ask_module call). Then the user_question.ask method can be deleted and the user_question should not require any link back to question or question_base.
I'm also wondering, while you're at it changing the ask* modules, if you should change the ask* modules to use classes so that others can subclass them? Haven't really looked at them to see if subclassing would be useful. (But I would still support registering either a module or class-based object to make migration from module-based ask functions easier.)
Finally, I would lean towards just breaking existing code here. I can't imagine anybody having more than one custom ask module, so this shouldn't be that big of a change for anybody.
I think that all in all, this ends up being a much better design than the original.
One last thing, in terms of the question cache retrieve and restore operation and how this works with your desired ask implementation. I am now leaning away from your ask methods changing the cache prior to the restore operation (to leave the retrieved cache as a black-box for future madness). Rather go ahead and simply restore the cache, and then tell the question to add the new answer. There is already a get_ke method on the engine object that can be used to look up the question object. All that remains is to add a method on question to add an answer (move the code out of question.lookup). This would take the format_params and the new answer as parameters. This new method should also be called by question.lookup to store a new answer to keep the cache update code all in one place.
Doing this would require changing the key to the cache to something like tuple(sorted(format_params.iteritems())) to ensure that the parameter order is always the same.
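A small sketch of the order-independent cache key and the single add-answer code path follows. The Question class is a stand-in, add_answer is a hypothetical name, and the sketch uses items() (Python 3) where the text above uses iteritems() (Python 2).

```python
def cache_key(format_params):
    """Build a key that does not depend on the dict's insertion order."""
    return tuple(sorted(format_params.items()))


class Question:
    """Stand-in question with one code path for storing answers."""

    def __init__(self, ask_fn):
        self.ask_fn = ask_fn
        self.cache = {}

    def add_answer(self, format_params, answer):
        self.cache[cache_key(format_params)] = answer

    def lookup(self, format_params):
        key = cache_key(format_params)
        if key not in self.cache:
            # A new answer is stored through add_answer, never directly.
            self.add_answer(format_params, self.ask_fn(self, format_params))
        return self.cache[key]


calls = []


def fake_ask(question, format_params):
    calls.append(dict(format_params))
    return "Fred"


question = Question(fake_ask)
first = question.lookup({"animal": "dog", "colour": "brown"})
second = question.lookup({"colour": "brown", "animal": "dog"})

# An asynchronous front end could also inject an answer from outside.
question.add_answer({"animal": "cat"}, "Tom")
third = question.lookup({"animal": "cat"})
```

Note that sorting the items only requires the parameter names to be comparable, since the names are unique and the values are never compared.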
Hopefully I haven't gotten too detailed here. It sounds like a lot of changes, but they are all small changes… :-)
Not at all, that all sounds great. Thanks heaps, I'll dive straight in.
I'll publish a Mercurial repository when it's ready.
And seeing as I'll be rewriting ask_tty.py anyway, I guess I'll fix the bug I reported there while I'm at it (ID 3105666).
OK Bruce (or anybody else who's interested), take a look at:
I've got some other work I need to get back to now. I'll hopefully have time to look at this some more in a couple of weeks.
Yeah, we've got our Thanksgiving holiday this week, so I've been pretty busy. Good to know that I have some time to take a look.
OK Paul, I've pushed your changes to the pyke repository along with a few things that I had done. So do a pull before you get started again…
The above client is gearing up to revisit the work we did on pyke in 2010, as discussed above.
Bruce - are you still around and maintaining pyke? An official release hasn't been made since the changes discussed in this thread were merged, and there's a "hello world" thread on google groups that has gone unanswered since December 2012.
If you have moved on, any chance you could add me as an admin to the project on sourceforge?