Re: [Emacs-vr-mode-devel] reimplementing vr-mode (long)
From: Patrik J. <gri...@us...> - 2004-07-25 18:27:45
Hi Jan,

Wow, reimplementing VR mode, that's impressive! I fully agree that there are things in it that should be redone, but it works for me and will largely be made obsolete by VoiceCoder anyway, so I haven't been motivated to do anything.

VR mode should work with every Emacs earlier than v21. There were strange issues with version 21 that I could not track down (Eric also noticed some of this and suspected an Emacs bug), and since I didn't need any features from 21 I just continued using 20. I haven't even tried it lately, so I don't know if those issues are still there.

In any case, I had no idea XEmacs did not have overlays, that's too bad! I think it would be simpler in any case to just use change events instead of overlays, but that's how it was written.
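Just to make that concrete: the idea is to let Emacs report every edit through the standard change hooks instead of tracking regions with overlays. A rough sketch (the my-vr- names are made up, this is not the actual vr.el code):

  ;; Sketch only: report each buffer edit through `after-change-functions'
  ;; instead of tracking modified regions with overlays.
  (defun my-vr-send-change (buffer pos old-len new-text)
    "Placeholder: ship one edit to the speech server (here it just logs it)."
    (message "change in %s at %d: replaced %d chars with %S"
             buffer pos old-len new-text))

  (defun my-vr-after-change (beg end old-len)
    "Run after every modification.
  BEG and END delimit the newly inserted text; OLD-LEN is the length
  of the text it replaced."
    (my-vr-send-change (buffer-name) beg old-len
                       (buffer-substring beg end)))

  (defun my-vr-track-changes ()
    "Start reporting edits in the current buffer."
    (add-hook 'after-change-functions 'my-vr-after-change nil t))

As far as I remember XEmacs has `after-change-functions' as well, so this part at least should be portable.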
I've commented on some specifics further down in the message.

Regards,
/Patrik

At 05:15 PM 7/23/2004 -0700, Jan Rychter wrote:
>Hi,
>
>Having recently decided that I really need dictation in my XEmacs, I
>have taken a close look at VR-mode and started hacking on it. I have
>spent about a week on it, and it's time to write about my experiences
>and bounce some ideas around.
>
>First of all, many thanks to Barry Jaspan. Without his code, I would not
>have been able to get started with this work.
>
>I started by just trying to run VR-mode under XEmacs. That would not
>work for a number of reasons. The code has really been written with GNU
>Emacs in mind -- it uses overlays, which XEmacs does not have (instead
>providing extents). It also seems to be quite old and works only with
>older versions of Emacs.
>
>The next logical step was to try to modify the code and adapt it to
>XEmacs. But the more I looked at the code, the more I realized that it
>is very complex and that I simply cannot understand it. So, I decided to
>start from scratch on the XEmacs side and promptly proceeded to rip out
>all the code which I could not understand. Needless to say, that has
>left me with very little code :-)

Yeah, it's a mess... ;-)

>In the process of rebuilding the code I have learned quite a bit about
>how it works, also diving into VR.exe along the way. I have discovered
>many things which were rather puzzling to me -- for example, in spite of
>the efforts to keep all changes synchronized after every buffer
>modification, a complete resynchronization is done at the beginning of
>each recognition.

This is not how it should work. A complete resynchronization should only happen if the buffers lose sync. Are you actually seeing this happen? In that case I am very puzzled. If you are just inferring it from the code, I believe you missed something. It certainly does not resynchronize every time for me.

>The first changes to VR.exe were rather simple -- since my goal was to
>dictate into Emacs/XEmacs running anywhere (not necessarily on the same
>Windows desktop), I had to change the way dictation was activated. In
>principle, I wanted to dictate "into" the VR.exe window. The advantage
>of these changes is that it let me get rid of the "hook.DLL" dependency.

I'm confused. VR mode does provide for the possibility of dictating into instances of Emacs running on other computers, as long as you have a window (through an X server or terminal window) that Dragon can connect to. Do you really mean you want to dictate into an Emacs that you can't see? Or am I not understanding what you mean?

>As I was getting a better understanding of the code, I was also trying
>to simplify it. The first thing to go away on the VR.exe side were
>multiple Clients. The original code tried to maintain a separate Client
>object for each Emacs frame, with each client object maintaining a list
>of buffers. This doesn't really reflect very well how Emacs manages
>frames and buffers, and introduces a lot of complexity. So, I have
>reworked the code to use only a single client.

The code does not (I think...) use a separate client object for each Emacs frame; the clients are used to separate instances of Emacs. For example, if I'm running one Emacs on my own Windows machine and another Emacs on a remote machine, and both are connected to VR mode, they will be represented by two different client objects. Maybe you could use a single client object (I haven't looked at the code recently, so I can't remember if it would work), but I think to Dragon, each client object represents the different windows for which dictation is enabled.

>After that, I got the synchronization in a single buffer to work. It
>actually wasn't as difficult as I had feared. I am now at a point where
>I can dictate into an XEmacs buffer, use the select and say
>functionality, and use "scratch that". It works for simple text buffers,
>also with auto-fill enabled. Of course, you can also do edits on the
>XEmacs side and mix them with dictation freely. There are still some
>bugs remaining, but the basic functionality is there.
>
>Then, I proceeded to implement dictation in multiple buffers. This
>started getting tricky: I had to be very careful about which buffers are
>active and which corresponding custom dictation objects are
>activated. And then, I finally hit a real problem. What I wanted to
>achieve was flawless integration. I wanted to be able to dictate into
>Emacs at any time, not just into "voice-activated" buffers. Basically,
>whenever I can see the point on the screen, I should be able to dictate.

This is also possible with VR mode, as long as natural text is enabled. There are some cases where it does not necessarily make sense to enable VR mode on a buffer, for example if the buffer contains some strange structures or has a lot of customized key mappings that will confuse VR mode to the point that it's not useful. In those cases, it's usually simpler to just let natural text send the keystrokes to Emacs directly. If one really wanted to, it would be a simple change to make to VR mode as well.

>The problem that appears is that if you simply start dictating into a
>new buffer, VR.exe still thinks you're dictating into the old buffer and
>it requests the contents of the old buffer for resynchronization. There
>is no easy way around that, because the new custom dictation object for
>the new buffer doesn't exist yet, and even if we create it at this
>moment, it is already too late -- dictation has started and DNS thinks
>we are dictating into the old custom dictation object.

There is a hook you can use that tells you when point has been moved to another buffer. This is how VR mode tells VR.exe that it should now enable another buffer for dictation.
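I don't remember off the top of my head exactly which hook vr.el uses for this, but the idea is roughly the following: after each command, check whether the current buffer has changed and, if so, tell VR.exe to switch its dictation target. The my-vr- names below are made up, not the real code:

  ;; Sketch only: notice when point ends up in a different buffer and
  ;; tell the speech server to switch its dictation target.
  (defvar my-vr-last-buffer nil
    "The buffer that was current after the previous command.")

  (defun my-vr-activate-buffer (name)
    "Placeholder: tell VR.exe to enable dictation in buffer NAME."
    (message "dictation target is now %s" name))

  (defun my-vr-maybe-switch-buffer ()
    "Run from `post-command-hook'; report when the current buffer changes."
    (unless (eq (current-buffer) my-vr-last-buffer)
      (setq my-vr-last-buffer (current-buffer))
      (my-vr-activate-buffer (buffer-name))))

  (add-hook 'post-command-hook 'my-vr-maybe-switch-buffer)

With something like this in place, the dictation object for the new buffer can be created or enabled before dictation starts there, rather than after.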
>Of course, there are some ways of artificially working around that --
>but I started thinking: why do we bother with storing the complete
>contents of several buffers on the DNS side? It takes a lot of effort,
>and a lot of communication, and is rather wasteful of resources. I also
>have serious doubts whether DNS actually uses these contents for
>anything -- I suspect only the immediate context is used by Dragon for
>things like capitalization, spacing and punctuation, and the rest is
>never looked at. So, I think the next step for me will be to rework
>VR.exe to just use a single custom dictation object, whose contents
>would be synchronized to whatever is currently needed. I also think that
>I will not bother with synchronizing entire buffers. Instead, I will
>only synchronize a small area -- closely related to what is actually
>visible on the user's screen. Comments and suggestions are appreciated.

This has been discussed before, I think, and I'm pretty sure this is how VoiceCoder does it. Though I'm not sure the contents are never used: I definitely have the impression that if I dictate some strange word, it has a higher probability of being correctly recognized if it's already visible somewhere in the window. But sending the visible area should take care of that as well. The only drawback I can imagine is that most transmissions will become longer, which may impact responsiveness over a slow network. I'm also not sure how the "scratch that" functionality works in those instances, but I'm pretty sure it can be done.

>Another thing which got me thinking was that there are two ways of
>interfacing to Dragon NaturallySpeaking: you can use the Dragon native
>API or SAPI. From what I can see, the Dragon native API is slightly
>easier to use, but also assumes a lot about what you're trying to do
>(for example, it automatically manages spacing for you, and there is no
>way to turn it off), while SAPI is more complex, but also more
>flexible. An example of this flexibility is the "Phrase Hypothesis"
>method, which is supposed to provide several hypotheses for a given
>utterance. This method might be very useful if we know a bit about the
>context, which we often do in Emacs. However, I have no idea how well
>Dragon supports SAPI and whether all methods are actually implemented. I
>also don't know what the strategic direction for Dragon
>NaturallySpeaking is -- which API will continue to be supported in the
>future.
>
>And finally (this e-mail is already long enough) -- does anybody know if
>there is a way to implement things like the QuickCorrect menu, adding
>words to the dictionary, or word pronunciation training without the
>Windows desktop? It seems to me that the Dragon API assumes that the
>user is going to sit in front of a Windows desktop. You can request a
>dialog box to be shown, but you can't simply access the functionality
>yourself. But perhaps I simply haven't found a way to do it yet.

Well, assuming that you are going to sit in front of the Windows machine seems to be a pretty safe assumption, given that Dragon only runs on Windows machines? NatLink has functionality for this, but I haven't played with it, so I don't know exactly what it can do.

I hope some of this was useful information... ;-)
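P.S. One more concrete note on syncing only the visible area: on the Emacs side, grabbing just what the selected window currently shows is cheap. A sketch (the function name is made up; a real protocol would also have to send the buffer offsets so corrections and "scratch that" can be mapped back into the buffer):

  ;; Sketch only: return the portion of the buffer that is visible in the
  ;; selected window, which is roughly the context worth sending to VR.exe.
  (defun my-vr-visible-context ()
    "Return (START END TEXT) for the visible part of the selected window."
    (let ((start (window-start))
          ;; Pass t so `window-end' recomputes instead of using a cached value.
          (end (window-end nil t)))
      (list start end (buffer-substring start end))))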