I'm trying to write an application that can compile video recordings out of remote VNC/RDP sessions. I have found one other relevant topic in this discussion forum, which briefly discussed the same issue. But the suggested solutions don't really work in my case.
Anyway, I started experimenting with guacd using telnet, sending some client instructions to see how the protocol works in practice. However, I could not complete the handshake phase.
After sending the first client instruction "select".
The server sends the following reply but closes the connection immediately.
I was running guacd in foreground and the following were the log entries:
guacd: INFO: Guacamole proxy daemon (guacd) version 0.7.0
guacd: INFO: Unable to bind socket to host ::1, port 4822: Address family not supported by protocol
guacd: INFO: Successfully bound socket to host 127.0.0.1, port 4822
guacd: INFO: Listening on host 127.0.0.1, port 4822
guacd: INFO: Protocol "vnc" selected
guacd: ERROR: Error reading "size": Invalid argument: Non-numeric character in element length
From the guacamole user guide, it appears that the server sends the "args" instruction and waits for the client to respond. But apparently in my case, it just terminates the connection. Any ideas on how to debug this issue? Your help is greatly appreciated.
It's closing the connection because of the newline character sent by telnet after the first instruction. It's expecting a numeric character (the first digit of the length of the first element of the next instruction), and a newline character in this position is invalid.
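To make that concrete, here is a minimal Ruby sketch of building an instruction in code instead of typing it into telnet. It assumes the element format described in the protocol reference: each element is written as LENGTH.VALUE, elements are separated by commas, and the instruction ends with a semicolon. The name encode_instruction is made up for this example and is not part of guacd.

```ruby
# Hypothetical helper: encode one Guacamole instruction.
# Each element is written as "<length>.<value>", elements are joined
# with commas, and the instruction ends with a semicolon and nothing else.
def encode_instruction(opcode, *args)
  [opcode, *args].map { |e| "#{e.to_s.length}.#{e}" }.join(",") + ";"
end

select = encode_instruction("select", "vnc")
print select # print, unlike puts, does not append a newline
```

When sending this over a socket, something like socket.write(select) should avoid the stray newline, since write sends exactly the bytes given.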
Thanks a lot for the reply. That was exactly the issue.
I am now able to send and receive instructions from guacd. I now have a bunch of partial images of the remote host, but have not yet figured out a way to assemble them into complete images.
The general approach that I have in my mind is as follows. Kindly point out if there are any mistakes.
After the initial client handshake, the server keeps sending PNG data containing partial images of the remote host, along with occasional sync messages that the client must respond to. My idea is to assemble the partial images into a full image whenever a new partial image arrives. Another thread would periodically (every 200 ms) save the assembled image to disk. After all the full images are saved, I intend to use ffmpeg to convert the images, along with their timestamps, into a video.
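For the receiving side of this loop, a rough Ruby sketch of splitting the incoming data into instructions and answering sync might look like the following. The names parse_instructions and sync_replies are hypothetical. The sketch assumes the buffer is a UTF-8 string (element lengths count characters) and that a sync is answered by echoing it back with the same timestamp, as I understand the protocol reference.

```ruby
# Hypothetical parser: pull complete instructions out of a receive buffer.
# Returns [instructions, remainder], where each instruction is an array of
# strings (opcode first) and remainder is any trailing partial data.
def parse_instructions(buffer)
  instructions = []
  elements = []
  pos = 0
  consumed = 0
  loop do
    dot = buffer.index(".", pos)
    break if dot.nil? # length prefix not fully received yet
    digits = buffer[pos...dot]
    unless digits.match?(/\A\d+\z/)
      raise ArgumentError, "non-numeric element length #{digits.inspect}"
    end
    length = digits.to_i
    value_end = dot + 1 + length
    break if value_end >= buffer.length # value or terminator not here yet
    elements << buffer[dot + 1, length]
    terminator = buffer[value_end]
    pos = value_end + 1
    case terminator
    when ","
      next
    when ";"
      instructions << elements
      elements = []
      consumed = pos
    else
      raise ArgumentError, "unexpected terminator #{terminator.inspect}"
    end
  end
  [instructions, buffer[consumed..-1]]
end

# Build the replies owed to the server: each sync is echoed back
# with the timestamp it carried.
def sync_replies(instructions)
  instructions.select { |ins| ins.first == "sync" }
              .map { |ins| "4.sync,#{ins[1].length}.#{ins[1]};" }
end
```

The remainder returned by parse_instructions would be prepended to the next chunk read from the socket, so an instruction split across two reads is still parsed once it is complete.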
I have a couple of questions regarding this approach.
Thanks a lot for your time.
The main flaw I see is that you're only handling the png instruction, and in that case you're only handling the PNG data within it. There are quite a few other instructions: http://guac-dev.org/doc/gug/protocol-reference.html
You'll need to implement at least the drawing instructions. You can check the protocol implementations provided by Guacamole so far to see which instructions are most important (grep for "guac_protocol_send_"), but for the most basic support you'll need "png", "size", and "copy", and you'll have to handle all arguments, including the layer indices.
Other than that, I think this approach is worth a shot.
As for properly handling the PNGs, check the user's guide regarding the png instruction: http://guac-dev.org/doc/gug/protocol-reference.html#png-instruction
Each png instruction provides a compositing operation, destination layer and coordinates, and the PNG data itself. You need to handle all the arguments of this instruction to determine where and how the PNG is being drawn. It could easily be going to an off-screen buffer (common in RDP and SSH).
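As an illustration of keeping all the arguments rather than only the image data, here is a small Ruby sketch. The PngDraw struct and parse_png are hypothetical names, and the argument order (mask, layer, x, y, base64 data) is my reading of the protocol reference, so please check it against the guacd version in use. Decoding the PNG bytes into pixels is left out.

```ruby
# Hypothetical value object for one decoded png instruction.
PngDraw = Struct.new(:mask, :layer, :x, :y, :data) do
  # Negative indices are off-screen buffers; 0 and above are visible layers.
  def offscreen?
    layer < 0
  end
end

# Assumed argument order (verify against the protocol reference):
# png, mask, layer, x, y, base64-encoded PNG data.
def parse_png(instruction)
  opcode, mask, layer, x, y, data = instruction
  raise ArgumentError, "not a png instruction" unless opcode == "png"
  PngDraw.new(Integer(mask, 10), Integer(layer, 10),
              Integer(x, 10), Integer(y, 10),
              data.unpack1("m")) # "m" decodes base64 into raw bytes
end
```

A recorder would then draw the decoded image at (x, y) on whichever layer the instruction names, and only composite it into the saved frame if that layer is visible.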
Thanks for the feedback. My goal is to record both RDP and VNC sessions, but for now I have been concentrating on recording VNC sessions on Linux. The session recorder operates as a read-only client that understands (minimal parts of) the Guacamole protocol. The approach stated above worked fairly well, and I am able to create mp4 videos too, without implementing the size and copy instructions. However, I haven't seen guacd send a copy instruction yet. So I'm assuming it is something I might encounter when I start connecting to RDP sessions?
Thanks for all the help.
Good to hear that it's working so far. If you are successful, this is the sort of thing we'd like to adopt upstream.
You will definitely encounter the copy instruction for RDP (as well as rect and cfill and a few others). It's also extensively used in SSH.
I'm somewhat surprised you haven't encountered it with VNC, but that may just be how your VNC server behaves. I usually see copies used in VNC when windows are moved. Some VNC servers claim to be smart enough to detect scrolling and send that as a copy, but I have yet to see that actually happen.
I have begun implementing additional instructions for RDP hosts as well. From preliminary observation, it appears that guacd sends copy and cursor instructions in addition to png and sync. I have implemented a small prototype which can now record both RDP and VNC streams as video recordings. I still have some issues with frame rates and ffmpeg, but that is outside the scope of the current discussion.
However, I still have one question. During the course of recording sessions, I sometimes encounter a copy instruction with the layer value "0". According to the documentation, a negative layer value indicates that the layer is invisible and is to be stored in a buffer/cache for future use. So whenever guacd sends a copy instruction with a negative layer value, I can use the corresponding PNG data stored in the buffer/cache. But how do I deal with a copy instruction when the layer value is 0? In my current implementation, I only store PNGs with negative layer values. I have been ignoring the copy instruction with a 0 layer value, and it hasn't had any adverse effect so far. But I would like to understand this a bit more so that the client can handle all the instructions that the server sends.
About contributing to the project, I'll be glad to provide the source code once it gets into a reasonable shape. I'm developing this application in ruby, and I plan to package it into a ruby gem.
You definitely cannot ignore a copy instruction involving layer 0. Non-negative values refer to visible layers, with 0 being the default (root) layer. Layer 0 is the only layer guaranteed to be present.
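One way to treat layer 0 and the negative buffers uniformly is to keep every layer in the same store and let copy work between any two of them. Below is a simplified Ruby sketch. LayerStore and apply_copy are hypothetical names, all layers share one fixed size here (real layers are sized by the size instruction), the channel mask is ignored, and the copy argument order is my reading of the protocol reference.

```ruby
# Hypothetical layer store: every index, negative or not, gets its own
# pixel grid. Index 0 is the default visible layer and always exists.
class LayerStore
  def initialize(width, height)
    @width = width
    @height = height
    @layers = { 0 => blank }
  end

  # Layers other than 0 are created the first time they are referenced.
  def layer(index)
    @layers[index] ||= blank
  end

  # Copy a rectangle between layers as a plain overwrite (mask ignored).
  # Pixels falling outside the grid are skipped.
  def copy(src, sx, sy, w, h, dst, dx, dy)
    # Snapshot first so overlapping copies within one layer stay correct.
    block = Array.new(h) do |row|
      Array.new(w) { |col| pixel(src, sx + col, sy + row) }
    end
    block.each_with_index do |line, row|
      line.each_with_index do |px, col|
        set_pixel(dst, dx + col, dy + row, px) unless px.nil?
      end
    end
  end

  def pixel(index, x, y)
    return nil unless in_bounds?(x, y)
    layer(index)[y][x]
  end

  def set_pixel(index, x, y, value)
    layer(index)[y][x] = value if in_bounds?(x, y)
  end

  private

  def blank
    Array.new(@height) { Array.new(@width, 0) }
  end

  def in_bounds?(x, y)
    x >= 0 && y >= 0 && x < @width && y < @height
  end
end

# Assumed argument order: copy, srclayer, srcx, srcy, srcwidth, srcheight,
# mask, dstlayer, dstx, dsty.
def apply_copy(store, instruction)
  _op, src, sx, sy, w, h, _mask, dst, dx, dy = instruction
  store.copy(*[src, sx, sy, w, h, dst, dx, dy].map { |v| Integer(v, 10) })
end
```

With this shape, a copy whose source or destination is layer 0 needs no special case: it reads from or writes to the visible grid exactly as it would for a buffer.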
I have sorted out the copy instruction with layer 0, and RDP recordings work fine on Windows 2012 (and presumably Windows 8). However, the application also needs to support slightly older Windows versions, like Windows Server 2008. When I started testing with a Windows 2008 machine, three more instructions came up: rect, cfill, and transfer.
According to the protocol reference provided in the documentation, the rect instruction comprises mask, layer, x, y, width and height: six arguments in total. But the following capture shows only five arguments with the rect instruction.
Also, could you shed some light on how to implement these instructions? The user guide indicates that the rectangular area needs to be filled with the color specified in the cfill instruction. But since the layer is negative, they are stored in an off-screen buffer. When would they actually be written to layer 0? I have found subsequent png instructions with the same negative layer (-5 in this case). Should there be two different off-screen buffers for rect and png instructions? I'm a little confused on this aspect.
I am now able to create recordings of remote sessions on Windows Server 2008, 2003 and 2012. On Linux, I have tested with X11VNC. In the coming days, I will conduct more tests with different VNC servers on Windows, Linux and OSX. The current implementation is perhaps at a prototype level, with minimal exception handling. I haven't been able to dedicate a lot of time to this, as I'm occupied with other stuff, but I shall dedicate more time in the coming few weeks. In a few weeks' time, I shall also publish the source code on SourceForge or code.google.com.
As I mentioned earlier, the programming language I'm using is Ruby and as of now, my plan is to package the code into a Ruby gem. The current implementation handles the following instructions:
select, connect, args, png, copy, cursor, cfill, transfer, rect.
It however does not have video or audio support at this point in time.
Here's a short video recording capturing the remote session. The remote session itself was created using the Guacamole web application.
Wow - looks like a good start.
How is access to the protocol data given to your session recorder?
I will update the version and modify it to wait until the sync instruction is encountered.
Are you referring to authentication involved in providing access to the data? I currently have none. I shall add it in due course. What do you think is the right way to authenticate?
I wasn't referring to authentication specifically. I was wondering how you had set up your recorder to observe Guacamole protocol instructions as they are sent to another connected client.
It was tricky to access the remote session data on Windows (using RDP). I had to create a new remote session and shadow the original remote session as Windows limits the number of concurrent sessions to 2 and does not let two remote logins share the same remote session.
For VNC on Linux, it was easy to share the same remote session.
I'm trying to achieve a similar result.
Have you written a client application that reads the guacd instructions for a server session and saves the result as a video file?
If so, could you open-source the application code and share it with the community?