Re: [Emacs-vr-mode-devel] reimplementing vr-mode (long)
From: Jan R. <ja...@ry...> - 2004-07-26 04:08:53
Hello again,
Thanks for the feedback. I was beginning to be a little bit concerned
that I was the only user of this software.
> Wow, reimplementing VR mode, that's impressive! I fully agree that
> there are things in it that should be redone, but it works for me and
> will largely be made obsolete by VoiceCoder anyway, so I haven't been
> motivated to do anything.
I was actually wondering about the relationship between VoiceCoder and
VR-mode. From what I could understand, the goals for the VoiceCoder
project are different -- it is supposed to support programming, instead
of continuous text dictation. But perhaps I'm wrong?
[...]
> In any case, I had no idea xEmacs did not have overlays, that's too bad!
> I think it would be simpler in any case to just use change events
> instead of overlays, but that's how it was written.
XEmacs has extents instead of overlays. That is a different API to what
amounts to basically the same thing. I have the impression that XEmacs
extents are a little more powerful than overlays.
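Incidentally, the change-event approach suggested above is also the one
that ports most easily: both GNU Emacs and XEmacs provide the
`after-change-functions' hook, so tracking edits that way would sidestep
the overlay/extent split entirely. A minimal sketch (hypothetical names,
not the actual vr.el code):

    ;; Record every buffer change so it can later be reported to VR.exe.
    ;; Purely illustrative; the real bookkeeping is more involved.
    (defvar my-vr-pending-changes nil
      "List of (START END OLD-LEN) changes not yet sent to VR.exe.")

    (defun my-vr-after-change (start end old-len)
      "Queue a change reported by `after-change-functions'."
      (push (list start end old-len) my-vr-pending-changes))

    ;; Installed buffer-locally in every buffer we want to track:
    ;; (add-hook 'after-change-functions #'my-vr-after-change nil t)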
> >In the process of rebuilding the code I have learned quite a bit about
> >how it works, also diving into VR.exe along the way. I have discovered
> >many things which were rather puzzling to me -- for example, in spite of
> >the efforts to keep all changes synchronized after every buffer
> >modification, a complete resynchronization is done at the beginning of
> >each recognition.
>
> This is not how it should work. A complete resynchronization should
> only happen if the buffers lose sync. Are you actually seeing this
> happen? In that case I am very puzzled. If you are just inferring from
> the code, I believe you missed something. It certainly does not
> resynchronize every time for me.
Sorry about that. I confused several issues here. Indeed, it doesn't
resynchronize every time.
> >The first changes to VR.exe were rather simple -- since my goal was to
> >dictate into Emacs/XEmacs running anywhere (not necessarily on the same
> >Windows desktop), I had to change the way dictation was activated. In
> >principle, I wanted to dictate "into" the VR.exe window. The advantage
> >of these changes is that they let me get rid of the "hook.DLL" dependency.
>
> I'm confused. VR mode does provide for the possibility of dictating
> into instances of Emacs running on other computers, as long as you have
> a window (through an X server or terminal window) that Dragon can
> connect to. Do you really mean you want to dictate into an Emacs that
> you can't see? Or am I not understanding what you mean?
Well, try to look at this differently: I do see my Emacs window, but I
would prefer not to see the Windows desktop that NaturallySpeaking runs
on. I don't normally use Windows, in fact NaturallySpeaking has been the
primary application which forced me to start using it. What I do is I
start Windows in a VMware emulator on my normal Linux desktop. The
VMware window might or might not be visible -- and in any case, this is
not where I work.
This means that I had to do away with the whole activation code that was
present in VR-mode, because it simply doesn't apply in my case. I want
to dictate into the VR-mode window on the Windows desktop -- in fact,
that is going to be the only window on my Windows desktop.
> >As I was getting a better understanding of the code, I was also trying
> >to simplify it. The first thing to go away on the VR.exe side were
> >multiple Clients. The original code tried to maintain a separate Client
> >object for each Emacs frame, with each client object maintaining a list
> >of buffers. This doesn't really reflect very well how Emacs manages
> >frames and buffers, and introduces a lot of complexity. So, I have
> >reworked the code to use only a single client.
>
> the clients do not (I think...) use a separate client object for each
> Emacs frame, the clients are used to separate instances of Emacs. For
> example, if I'm running one Emacs on my own Windows machine, and another
> Emacs on a remote machine and both are connected to VR mode, they will
> be represented by two different client objects. Maybe you could use a
> single client object (I haven't looked at the code recently, so I can't
> remember if it would work), but I think to Dragon, each client object
> represents the different windows for which dictation is enabled.
Thanks for the explanation, I think I understand better now. However, I
don't really see how this functionality could be useful if one runs only
one instance of Emacs...
> >After that, I got the synchronization in a single buffer to work. It
> >actually wasn't as difficult as I had feared. I am now at a point where
> >I can dictate into an XEmacs buffer, use the select and say
> >functionality, and use "scratch that". It works for simple text buffers,
> >also with auto-fill enabled. Of course, you can also do edits on the
> >XEmacs side and mix them with dictation freely. There are still some
> >bugs remaining, but the basic functionality is there.
> >
> >Then, I proceeded to implement dictation in multiple buffers. This
> >started getting tricky: I had to be very careful about which buffers
> >are active and which corresponding custom dictation objects get
> >activated. And then, I finally hit a real problem. What I wanted to
> >achieve was flawless integration. I wanted to be able to dictate into
> >Emacs at any time, not just into "voice-activated" buffers. Basically,
> >whenever I can see the point on the screen, I should be able to dictate.
>
> This is also possible with VR mode, as long as natural text is enabled.
> There are some cases where it does not necessarily make sense to enable
> VR mode on a buffer, for example if the buffer contains some strange
> structures or has a lot of customized key mappings that will confuse VR
> mode to the point that it's not useful. In those cases, it's usually
> simpler to just let natural text send the keystrokes to Emacs directly.
> If one really wanted to, it would be a simple change to make to VR mode
> as well.
Well, "natural text" wasn't really an option for me, since my Emacs runs
under a different operating system.
[...]
> >Of course, there are some ways of artificially working around that --
> >but I started thinking: why do we bother with storing the complete
> >contents of several buffers on the DNS side? It takes a lot of effort,
> >and a lot of communication, and is rather wasteful of resources. I also
> >have serious doubts whether DNS actually uses these contents for anything
> >-- I suspect only the immediate context is used by Dragon for things
> >like capitalization, spacing and punctuation, and the rest is never
> >looked at. So, I think the next step for me will be to rework VR.exe to
> >just use a single custom dictation object, whose contents would be
> >synchronized to whatever is currently needed. I also think that I will
> >not bother with synchronizing entire buffers. Instead, I will only
> >synchronize a small area -- closely related to what is actually visible
> >on the user's screen. Comments and suggestions are appreciated.
>
> this has been discussed before, I think, and I'm pretty sure this is how
> VoiceCoder does it. Though I'm not sure the contents are not used, I
> definitely have the impression that if I dictate some strange word it
> has a higher probability of being correctly recognized if it's already
> visible somewhere in the window. But sending the visible area should
> take care of that as well. The only drawback I can imagine is that most
> transmissions will become longer which may impact responsiveness over a
> slow network. I'm also not sure how "scratch that"-functionality works
> in those instances, but I'm pretty sure it can be done.
I decided to just go ahead and try this approach. For the moment, I have
reworked the code so that it uses just a single custom dictation
object. That object gets updated with whatever is in my active XEmacs
window at the beginning of each recognition. Ticks are being used for
synchronization if the buffer name hasn't changed, otherwise the entire
buffer gets sent.
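The tick check itself is tiny. Roughly the following (hypothetical
names, and simplified -- the real code also has to hand the text over to
VR.exe and track the corresponding offsets):

    (defvar my-vr-last-buffer nil)
    (defvar my-vr-last-tick nil)

    (defun my-vr-recognition-payload ()
      "Return text for the single dictation object, or nil if up to date."
      (let* ((win (selected-window))
             (buf (window-buffer win))
             (name (buffer-name buf))
             (tick (buffer-modified-tick buf)))
        (with-current-buffer buf
          (cond
           ;; Different buffer than last time: resend everything.
           ((not (equal name my-vr-last-buffer))
            (setq my-vr-last-buffer name
                  my-vr-last-tick tick)
            (buffer-substring (point-min) (point-max)))
           ;; Same buffer, but modified since the last recognition:
           ;; resend only what is visible in the window.
           ((not (eq tick my-vr-last-tick))
            (setq my-vr-last-tick tick)
            (buffer-substring (window-start win) (window-end win)))
           ;; Nothing has changed, so the dictation object is still valid.
           (t nil)))))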
The result works extremely well -- in fact, it already achieves about
80% of what I was hoping for. I don't have to "voice activate" any
buffers, I can simply start dictating anywhere. I can also switch into
any buffer at any time and say "select something" and "scratch that" and
have it work. Auto-fill and LaTeX double quotes work just fine.
The only problem that I can see remaining for simple dictation is that
Dragon insists on its own spacing rules. I did not find a way to disable
automatic spacing in the Dragon API. This is a little bit annoying at
times: for example, if you open a quote (") in LaTeX, Emacs will
transform it into a double left quote (``), and if you subsequently start
dictating right after the quote, DNS will insist on inserting a space
there.
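If the automatic spacing really cannot be switched off through the API,
it could probably be papered over for this particular case on the Emacs
side. Something along these lines (completely untested, hypothetical
names; `latex-mode' may need adjusting for AUCTeX) could strip the
unwanted space whenever dictated text lands right after a TeX opening
quote:

    (defun my-vr-fix-tex-quote-space (start end old-len)
      "Delete a space inserted directly after a TeX opening quote."
      (when (and (eq major-mode 'latex-mode)
                 (zerop old-len)           ; pure insertion, nothing deleted
                 (> end start)
                 (>= (- start 2) (point-min))
                 (string= (buffer-substring (- start 2) (1+ start)) "`` "))
        (delete-region start (1+ start))))

    ;; Installed buffer-locally, e.g.:
    ;; (add-hook 'after-change-functions #'my-vr-fix-tex-quote-space nil t)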
I would also very much like to find a way to use the QuickCorrect menu
remotely, and a way to make sure that my corrections make it into what
DNS knows about my speech. For the moment, I'm afraid DNS cannot learn a
lot from my corrections.
> >Another thing which got me thinking was that there are two ways of
> >interfacing to Dragon NaturallySpeaking: you can use the Dragon native
> >API or SAPI. From what I can see, the Dragon native API is slightly
> >easier to use, but also assumes a lot about what you're trying to do
> >(for example, it automatically manages spacing for you, and there is no
> >way to turn it off), while SAPI is more complex, but also more
> >flexible. An example of this flexibility is the "Phrase Hypothesis"
> >method, which is supposed to provide several hypotheses for a given
> >utterance. This method might be very useful if we know a bit about the
> >context, which we often do in Emacs. However, I have no idea how well
> >Dragon supports SAPI and whether all methods are actually implemented. I
> >also don't know what the strategic direction for Dragon
> >NaturallySpeaking is -- which API will continue to be supported in the
> >future.
> >
> >And finally (this e-mail is already long enough) -- does anybody know if
> >there is a way to implement things like the QuickCorrect menu, adding
> >words to the dictionary, or word pronunciation training without the
> >Windows desktop? It seems to me that the Dragon API assumes that the
> >user is going to sit in front of a Windows desktop. You can request a
> >dialog box to be shown, but you can't simply access the functionality
> >yourself. But perhaps I simply haven't found a way to do it yet.
>
> well, assuming that you are going to sit in front of the Windows machine
> seems to be a pretty safe assumption given that Dragon only runs on
> Windows machines? NatLink has functionality for this, but I haven't
> played with it so I don't know exactly what it can do.
VMware, win4lin, WINE and other emulators undermine this assumption
quite a bit. Also, I have been assuming that Dragon NaturallySpeaking is
also used in applications that do not display a Windows desktop, such as
automated transcription or other server-side applications.
> I hope some of this was useful information... ;-)
Definitely!
thanks,
--Jan