Re: [Hamlib-developer] Proposed patch for tranceive hanging


On 11-Mar-02 Stephane Fillod wrote:
> On Sun, Mar 10, 2002, Chuck Hemker wrote:
>>
> 
> RIG_EPROTO is the only one to fit. However, it's rather pointless to the
> user. When having a collision, we should retry rs->retry times, and fail
> with an error code. I was thinking of something like RIG_ERESTART, telling
> that there was a temporary protocol issue. Do you have a better name in
> mind?

I was thinking of something as simple as RIG_COLLISION; however, several times
in the past I may have erred on the side of too many error codes rather than
too few.  Or, if you're expecting that in most cases it will have retried and
will only return the error if the retries failed, maybe something like COMMBUSY
or BUSBUSY (like CIVBUSY but more generic).  Also, most of the "protocol
errors" I've run across in the past have been caused by either software errors
or version mismatches rather than temporary overload or hardware issues.

> And yes, if you'd like to work on this, you're more than welcome!
> I don't have time to work on the backend, even though I have a clear idea
> of what needs to be done.

I'll come up with a patch which adds error checking to icom_transaction and
icom_decode along with comments on possible causes.  Then we can discuss
error codes for each of the cases.

>> 3. Just a note to anyone who does want to try to implement resend on
>>    error/collisions:
>> 
>>    It would be nice if there was a way to turn it off.
>> 
>>    A collision could easily be caused by someone turning the knob while the
>>    software is trying to do something to the radio.  In that case, my
>>    software would prefer to know that the knob was turned rather than the
>>    command was done correctly so it doesn't lose any knob turns.  I
>>    understand that many other pieces of software would prefer their
>>    commands to work and don't want to have to worry about handling the
>>    errors.
> 
> To my mind, if a collision occurs, there's no way to give priority to
> transceive event simply because TX and RX are tied together, meaning
> that data on the CI-V bus is jammed. The controller knows its command
> failed (readback mismatch), chances are big the transceive event is lost,
> i.e. the rig won't resend it. TBC.

1. According to the PDF I have (Note 1, sections 5-4 and 1-6), a radio that
   sees a collision:
    a. waits till the bus is idle
    b. sends the jammer code (FC) 5 times
    c. waits till the bus is idle
    d. retransmits the frame
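To make the radio-side steps a-d concrete, here is a rough C sketch.  The bus interface (`struct civ_bus_ops` with `wait_idle`/`write` callbacks) and the function name are hypothetical, made up for illustration, and not part of hamlib or of Icom's firmware; only the FC jammer code, the count of 5, and the order of the steps come from the CI-V document.

```c
#include <stddef.h>

#define CIV_JAMMER      0xFC  /* jammer code from the CI-V document */
#define CIV_JAMMER_REPS 5     /* jammer bytes sent after a collision */

/* Hypothetical bus interface for this sketch; not a hamlib API. */
struct civ_bus_ops {
    void (*wait_idle)(void *ctx);                               /* block until bus idle */
    void (*write)(void *ctx, const unsigned char *buf, size_t len);
    void *ctx;
};

/* Radio-side collision recovery, following steps a-d above. */
void civ_recover_collision(const struct civ_bus_ops *bus,
                           const unsigned char *frame, size_t frame_len)
{
    static const unsigned char jam[CIV_JAMMER_REPS] = {
        CIV_JAMMER, CIV_JAMMER, CIV_JAMMER, CIV_JAMMER, CIV_JAMMER
    };

    bus->wait_idle(bus->ctx);                 /* a. wait until the bus is idle */
    bus->write(bus->ctx, jam, sizeof jam);    /* b. send FC five times */
    bus->wait_idle(bus->ctx);                 /* c. wait until the bus is idle */
    bus->write(bus->ctx, frame, frame_len);   /* d. retransmit the frame */
}
```

Keeping the two idle waits separate matters: the retransmitted frame only goes out after the jam has cleared, which is also when a transceive frame from the other radio may reappear.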
2. If you have a collision between a set frame and a transceive frame, and you
   did automatic recovery, you could end up with a situation like this:
    a. call set
    b. set gets a collision
    c. transceive frame received (if the radio resends it correctly)
    d. transceive callback called
    e. set is retried and works
    f. set returns
   If the software really wanted the transceive event to take priority, it
   would have to realize this happened and set the radio to the value given by
   the transceive event.  And if the transceive frame did get lost, there is no
   way the software could figure out what the knob change was, because the set
   was done afterwards.

   Whereas if the set is allowed to fail on a collision, it would be:
    a. call set
    b. set gets a collision
    c. set returns collision error
    d. transceive frame received (if the radio resends it correctly)
    e. transceive callback called
   Then the software could either process the transceive frame, or call get to
   figure out what state the radio is in.

   Now, I'm not saying that retrying shouldn't be supported, because most
   software out there will probably just prefer the set to succeed.  But if
   you're trying to handle the knob being turned while doing sets based on the
   previous values of the same vfo, automatic retries make it more difficult.
   I'm just saying it would be nice to make it optional.
   (Always retrying on get commands might be OK.)
   (The other issue I could see is software that is supposed to be doing
    something else that is semi time-critical getting tied up in hamlib for
    several seconds retrying a command.)
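One way the "optional retry" idea could look, as a minimal sketch: gets always retry up to the limit, while sets retry only if the application asked for it, and otherwise return the collision error at once so the application can go process the transceive event.  The error codes, `struct retry_policy`, and the function name are all hypothetical; `max_retries` plays the role of `rs->retry` mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes for this sketch; not hamlib's actual values. */
enum { SK_OK = 0, SK_ECOLLISION = -1 };

/* One attempt of a CI-V transaction (send, check echo, read reply). */
typedef int (*civ_attempt_fn)(void *ctx);

struct retry_policy {
    int  max_retries;   /* like rs->retry */
    bool retry_sets;    /* false: a set fails fast on collision */
};

/* Run one command under the policy; gets are always retried. */
int civ_run_with_policy(const struct retry_policy *p, bool is_set,
                        civ_attempt_fn attempt, void *ctx)
{
    int retries = (is_set && !p->retry_sets) ? 0 : p->max_retries;
    int rc = SK_ECOLLISION;

    for (int i = 0; i <= retries; i++) {
        rc = attempt(ctx);
        if (rc != SK_ECOLLISION)
            break;      /* success, or an error that retrying won't fix */
    }
    return rc;
}
```

With `retry_sets = false`, the set case follows the second sequence above (set returns a collision error, then the transceive callback runs); with `retry_sets = true` it follows the first.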

  Quick description of my software:
   Because satellites move so quickly, the frequency at the radio is offset
   from the frequency at the satellite by the Doppler shift (which keeps
   changing as the satellite's position changes).  Several of the satellites
   have transponders on them which repeat everything (SSB and CW) in their
   passband (~100 kHz) on another band.  So, what I was trying to do was
   to allow the user to select the transponder, and then tune through the
   transponder using the knob on the radio; when they hear something that
   they want to listen to, they stop tuning and the software starts updating the
   receive frequency based on the Doppler changes so it tracks the QSO.  I've
   also used this to adjust the radio when listening to beacons and other
   fixed (at the satellite) frequency signals when either the frequency at the
   satellite is not quite the same as published, or the calculations are
   slightly off.
   I will also be adding support for setting the transmitter (either same radio,
   different vfo, or different radio) to the matching uplink frequency (possibly
   with minor tuning support for correcting errors).
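The tracking described above boils down to two conversions.  When the user stops tuning, convert the radio frequency back to the frequency at the satellite and hold that constant; on each update, convert it forward again with the current range rate.  This sketch uses the first-order Doppler formula f_received = f_sent * (1 - v/c), with v the range rate (positive when the satellite is moving away).  The function names are made up for illustration, and the range rate would come from the tracking program.

```c
#define SPEED_OF_LIGHT 299792458.0  /* m/s */

/* First-order Doppler factor; range_rate in m/s, positive = receding. */
static double doppler_factor(double range_rate)
{
    return 1.0 - range_rate / SPEED_OF_LIGHT;
}

/* Downlink: frequency heard at the radio for a signal sent at sat_hz. */
double downlink_radio_hz(double sat_hz, double range_rate)
{
    return sat_hz * doppler_factor(range_rate);
}

/* Inverse of the above: which satellite frequency the radio is tuned to.
 * Used once, when the user stops tuning, to "lock" onto the signal. */
double downlink_sat_hz(double radio_hz, double range_rate)
{
    return radio_hz / doppler_factor(range_rate);
}

/* Uplink: transmit frequency that arrives at the satellite as sat_hz. */
double uplink_radio_hz(double sat_hz, double range_rate)
{
    return sat_hz / doppler_factor(range_rate);
}
```

Usage: at lock time `sat = downlink_sat_hz(radio_now, v_now)`, then each cycle set the radio to `downlink_radio_hz(sat, v)`.  For scale, a 145.9 MHz downlink receding at 5 km/s is heard about 2.43 kHz low.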

   I got this idea from InstantTune, a DOS TSR (with source available)
   that talks to InstantTrack (a DOS tracking program without source):
    http://www.amsat.org/amsat/ftpsoft.html#pc-it 
     InstantTune

   From talks InstantTune's author has given at some of the AMSAT Symposiums,
   there are lots of limitations with radios while trying to do this, such as:
   a. can't change the transmit frequency while transmitting
   b. can't change the receive vfo while transmitting
   c. radio blanks receive audio while adjusting transmit frequency (because
      it has to select transmit vfo, set frequency, select receive vfo)
   If I run across any that could be fixed in hamlib, I'll let you know.

>>     Also, I have implemented a semi-atomic update that does a get, checks to
>>     see if the value is the same as the old value, and if it is, then it
>>     does a set followed by another get.  This helps catch any knob changes
>>     that it might miss due to delays and such.
> 
> Stupid idea: could we have a collision callback? This way the application
> can be notified when a transceive event may have been lost.

  Not sure how much it would help.  First, hopefully the radio will resend it.
  Second, recovery would be a mess (you would have to do a get_freq and
  get_mode on every radio/vfo on the bus) and/or try to figure out if you set
  something you shouldn't have.

  What my "atomic" update checks against:
  a. lost transceive events
  b. transceive events that got processed after the set was sent.
     (Right now my radio_server program (the hamlib calling program) and my
      doppler control program are separate, sometimes on different machines,
      and talk over a TCP connection, so transceive events and set events
      sometimes pass each other on the network.)
  c. on non-transceive radios: knob changes that happened after the last poll
     cycle.
  It could still miss things that happened between the get and the set, which
  is why it isn't really atomic, but it's much closer than before.
  (It's irritating to get something tuned in just right, only to have the
   software set the radio off frequency, which is what was happening before.)
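A sketch of the get / compare / set / get update in C, against a hypothetical `struct radio_ops` (not hamlib's `RIG` API; a real version would call the library's get/set frequency functions).  Frequencies are kept as whole Hz so the "unchanged?" test is an exact comparison.  As noted above, a knob turn between the first get and the set is still missed.

```c
/* Hypothetical radio interface for this sketch (not hamlib's RIG API).
 * Frequencies are whole Hz so the "unchanged?" test is an exact compare. */
struct radio_ops {
    int (*get_freq)(void *ctx, long long *hz);
    int (*set_freq)(void *ctx, long long hz);
    void *ctx;
};

enum { UPD_OK = 0, UPD_CHANGED = 1, UPD_ERROR = -1 };

/* Set new_hz only if the radio is still at *last_hz.
 * On return, *last_hz holds the value last read from the radio. */
int semi_atomic_set_freq(const struct radio_ops *r, long long *last_hz,
                         long long new_hz)
{
    long long now;

    /* 1. get: has the radio moved since we last looked? */
    if (r->get_freq(r->ctx, &now) != 0)
        return UPD_ERROR;
    if (now != *last_hz) {
        /* knob turned, or a transceive event was lost or crossed in
         * flight: adopt the radio's value instead of overwriting it */
        *last_hz = now;
        return UPD_CHANGED;
    }
    /* 2. set the new value */
    if (r->set_freq(r->ctx, new_hz) != 0)
        return UPD_ERROR;
    /* 3. get again so *last_hz reflects what the radio actually has */
    if (r->get_freq(r->ctx, &now) != 0)
        return UPD_ERROR;
    *last_hz = now;
    return UPD_OK;
}
```

On `UPD_CHANGED` the caller would treat the radio's value as the user's new choice (for example, re-lock the satellite frequency) rather than pushing its own value again.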

>> 4. The old read_icom_frame didn't worry about the max rxbuffer length.
>>    This is a good thing to check just in case the radio/ci-v bus decided to
>>    send lots of garbage.  Since this new version does check, I looked at a
>>    bunch of the callers.
>>    Most of them call it with a buffer of size 16; however, icom_decode
>>    calls it with a buffer size of 32.  This should probably either be
>>    passed as an argument to read_icom_frame, be a #define'ed constant, or
>>    at least be consistent.
>>    For now, I set it to 16 in read_icom_frame.  This will cause problems if
>>    icom_decode_frame gets a frame between 17 and 32 characters (not seen in
>>    my testing).
> 
> a #define'ed constant or an argument to read_icom_frame would be a must.
> Do you have a patch already?

No.  I didn't know what you wanted.  If you want me to, I'll come up with one.
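Roughly what I'd have in mind, combining both suggestions: a shared `#define` for callers to size their buffers, and the buffer length passed in as an argument.  `MAXFRAMELEN`, `struct byte_src` (a stand-in for the serial port), and `read_icom_frame_sketch` are hypothetical names for this sketch, not the current hamlib code.  It reads up to and including the CI-V end-of-message byte (FD), and if the frame is too long it keeps reading to the FD so the next read starts on a frame boundary.

```c
#include <stddef.h>

#define MAXFRAMELEN 32     /* hypothetical shared limit for all callers */
#define CIV_EOM     0xFD   /* CI-V end-of-message byte */

/* Hypothetical byte source standing in for the serial port. */
struct byte_src {
    const unsigned char *data;
    size_t len, pos;
};

static int src_getc(struct byte_src *s)
{
    return s->pos < s->len ? s->data[s->pos++] : -1;   /* -1 = timeout/EOF */
}

/* Read one frame (through 0xFD) into buf.  Returns the frame length,
 * -1 on timeout/EOF, or -2 if the frame did not fit in buf (the rest of
 * the oversized frame is consumed and discarded). */
int read_icom_frame_sketch(struct byte_src *s, unsigned char *buf, size_t buflen)
{
    size_t n = 0;
    int c;

    while ((c = src_getc(s)) >= 0) {
        if (n < buflen)
            buf[n] = (unsigned char)c;   /* never write past the buffer */
        n++;
        if (c == CIV_EOM)
            return n <= buflen ? (int)n : -2;
    }
    return -1;
}
```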

> Now, talking about the ci-v bus, it's not that simple, I agree with you.
> Just imagine the following cases:
> * the controller remote controls the rig (basic we're doing already, but
>         in ideal case where the following don't happen too much)
> * transceive frames received now and then, while doing something else
> * one controller and several rigs on the same bus (same serial port)
> * multithreaded application (-> serialize Hamlib calls, message
>         interleaving and multiple "open" cmds would be a nightmare)
> * more than one controller on the bus (-> ignore the noise)

(I did some testing by accident the other day when I opened the same radio more
 than once.  There was a fight between icom_transaction and icom_decode because
 they were using different hold_decode flags.  I believe multiple radios on the
 same ci-v bus MAY work now as long as transceive is OFF in hamlib.  But it
 definitely will NOT work if transceive is ON in hamlib.)

One way to handle most of this would be:
 Try to separate the rig from the ci-v bus.
 When you opened a rig, it would check to see if the ci-v bus was already opened
 (see if the serial port name is the same??) and if so, it would use that ci-v
 bus.  Otherwise it would open the new ci-v bus.

 hold_decode would be a property of the ci-v bus, not the rig.

 icom_transaction would be similar to now.

 icom_decode would get a frame, and if it was a transceive frame it would go
 through each rig on that ci-v bus, check whether the ci-v address in the frame
 matched that rig, and if so call its callback.
 Ignore everything else unless you wanted to add support for ci-v sniffer
 software.
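A rough sketch of that bus/rig split.  Every struct and function name here is hypothetical, the bus lookup is by serial port name as suggested, and it assumes transceive frames are broadcast with destination address 00 and the sending radio's address in the source byte (FE FE 00 src cmd ... FD).  Opening the actual serial port is left as a comment.

```c
#include <stddef.h>
#include <string.h>

#define MAX_BUSES        4
#define MAX_RIGS_PER_BUS 8

struct civ_bus;

/* Hypothetical per-rig state. */
struct civ_rig {
    unsigned char civ_addr;   /* e.g. 0x48 */
    void (*on_transceive)(struct civ_rig *rig, const unsigned char *frame, size_t len);
    struct civ_bus *bus;
};

/* Hypothetical per-bus state: one per serial port. */
struct civ_bus {
    char port[64];
    int hold_decode;          /* belongs to the bus, not the rig */
    struct civ_rig *rigs[MAX_RIGS_PER_BUS];
    size_t nrigs;
};

static struct civ_bus buses[MAX_BUSES];
static size_t nbuses;

/* Attach a rig to the bus for `port`, creating the bus on first use. */
struct civ_bus *civ_bus_attach(const char *port, struct civ_rig *rig)
{
    struct civ_bus *b = NULL;

    for (size_t i = 0; i < nbuses; i++)
        if (strcmp(buses[i].port, port) == 0)
            b = &buses[i];
    if (b == NULL) {
        if (nbuses == MAX_BUSES)
            return NULL;
        b = &buses[nbuses++];
        strncpy(b->port, port, sizeof b->port - 1);
        /* a real version would open the serial port here */
    }
    if (b->nrigs == MAX_RIGS_PER_BUS)
        return NULL;
    b->rigs[b->nrigs++] = rig;
    rig->bus = b;
    return b;
}

/* Hand a transceive frame to the rig whose address matches the source
 * byte; ignore everything else (no sniffer support). */
void civ_bus_dispatch(struct civ_bus *b, const unsigned char *frame, size_t len)
{
    if (len < 5 || frame[2] != 0x00)
        return;               /* not a broadcast transceive frame */
    for (size_t i = 0; i < b->nrigs; i++) {
        struct civ_rig *r = b->rigs[i];
        if (r->civ_addr == frame[3] && r->on_transceive != NULL)
            r->on_transceive(r, frame, len);
    }
}
```

Because `hold_decode` lives in `struct civ_bus`, opening the same radio twice (or two radios on one port) shares a single flag, which is what the accidental test above was missing.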

The other option would be to switch to a more network-programming-style
structure: a poll/get_next_event/callback type structure.  icom_decode would
handle all incoming (and possibly echo) frames, and transactions would be handled
by sending a frame, after which either hamlib or the user would have to
poll/get_next_event/wait for a callback to get the response.  This would require
either a separate thread, an API change, or some odd code.  And I don't see this
as needed unless someone is afraid that the transaction processing time (send,
radio processing, response) is going to be too long to stay inside hamlib the
whole time, and/or the radio processing time is long enough that you want to
allow someone to send a command to a different radio on the bus in the meantime.
(Any radios with ethernet ports yet?)

(Maybe I'll do some testing of this one of these days:
 a. slow laptop
 b. put radio on old serial terminal server port (not enough serial ports on
    laptop to talk to both tnc & radio)
 c. ci-v bus at 1200 bps.
and I'll see if it's slow enough to cause problems.  I'll have to get the
terminal server from a friend.)

(I did just realize that we need to be prepared for a transceive frame while in
icom_transaction, even with no collisions:
 a. send command
 b. receive transceive frame from same or different radio
 c. receive response to command
)
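A sketch of how the receive side of a transaction could cope with step b.  After the command is sent (and its echo consumed), keep reading frames: anything addressed to 00 is treated as a transceive frame and handed off, and only a frame from the rig to the controller counts as the reply.  The reader/sink callbacks, the function name, and the `max_frames` bound are hypothetical; it also assumes transceive frames use destination 00.

```c
#include <stddef.h>

#define CIV_BROADCAST 0x00   /* assumed destination of transceive frames */

/* Hypothetical frame reader: returns frame length, or -1 on timeout. */
typedef int (*frame_reader)(void *ctx, unsigned char *buf, size_t buflen);
/* Hypothetical sink for transceive frames seen mid-transaction. */
typedef void (*trn_sink)(void *ctx, const unsigned char *frame, size_t len);

/* Read frames until the reply from rig_addr to ctrl_addr arrives.
 * Returns the reply length, or -1 on timeout / too many stray frames. */
int civ_wait_reply(frame_reader rd, void *rd_ctx, trn_sink sink, void *sink_ctx,
                   unsigned char ctrl_addr, unsigned char rig_addr,
                   unsigned char *reply, size_t replylen, int max_frames)
{
    for (int i = 0; i < max_frames; i++) {
        int n = rd(rd_ctx, reply, replylen);
        if (n < 0)
            return -1;                          /* timeout */
        if (n < 5)
            continue;                           /* runt/garbage frame */
        if (reply[2] == CIV_BROADCAST) {
            sink(sink_ctx, reply, (size_t)n);   /* b. transceive frame */
            continue;
        }
        if (reply[2] == ctrl_addr && reply[3] == rig_addr)
            return n;                           /* c. our response */
        /* anything else: another controller's traffic, ignore */
    }
    return -1;
}
```

The sink could queue the frame for icom_decode to process after the transaction, rather than calling the user's callback while the bus is still held.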

Don't know what you want to do about multithreaded apps:
big hamlib lock?
ci-v bus lock?
use hold_decode as the ci-v bus lock?
tell people to only call hamlib from one thread?
create a separate hamlib thread to do the work?
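For the "ci-v bus lock" and "use hold_decode as the lock" options together, a minimal sketch with a POSIX mutex per bus: every send/receive transaction runs with the lock held and `hold_decode` set, so threads (or rigs sharing one port) can't interleave frames.  `struct locked_bus` and the function names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical bus with a lock: one transaction at a time per CI-V bus. */
struct locked_bus {
    pthread_mutex_t lock;
    int hold_decode;    /* set while a transaction owns the bus */
};

typedef int (*txn_body)(struct locked_bus *bus, void *arg);

int locked_bus_init(struct locked_bus *bus)
{
    bus->hold_decode = 0;
    return pthread_mutex_init(&bus->lock, NULL);
}

/* Run one transaction while holding the bus lock. */
int locked_bus_transaction(struct locked_bus *bus, txn_body body, void *arg)
{
    int rc;

    pthread_mutex_lock(&bus->lock);
    bus->hold_decode = 1;       /* icom_decode should leave the port alone */
    rc = body(bus, arg);
    bus->hold_decode = 0;
    pthread_mutex_unlock(&bus->lock);
    return rc;
}
```

A decode path running in another thread would also need to take the same lock (or check `hold_decode` under it) before reading from the port.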

> PS: BTW, you must know the excellent Ekki's site at the following URL:

Yes, and I mentioned the other day:

Note 1:
   Old Icom CI-V docs at: http://www.chasque.net/franky/ci5.pdf
 These:
  a. describe older radios like the R7000
  b. describe some things that Ekki's site doesn't, like collisions/jamming
