[Clockwork-developers] More on security
From: Joel L. <jo...@lo...> - 2003-01-26 20:36:57
Shawn McMahon and I had a detailed discussion on security on Thursday. While I don't have a handle on everything involved yet, I do think we made some headway. Here are the details of what we discussed. (Shawn may have to correct me if I misstate anything.)

When an administrator sets up a schedule (or an "instance," to borrow from AutoSys's terminology), one of the things that will be done is the creation of a public/private key pair for the schedule. The scheduling server will keep the private key (secured by appropriate filesystem permissions), and all the clients (both GUIs and the agent software that runs on the managed nodes to make them valid targets of jobs) will get the public key. Shawn explained to me that the two keys are complementary, meaning that you can either encrypt a message with the public key and decrypt it with the private key, or you can encrypt with the private key and decrypt with the public key.

First, how to make GUI clients (and command-line client programs used by users) secure. As I mentioned before, I don't want to have to store user and password lists inside Clockwork, and would like to at least give the administrator the option of authenticating against something else. But this means that the password from the GUI (or client program, rather, since it might not be a GUI) will have to be sent to the server. We can't do a challenge-response kind of thing, since if we're going to pass the password off to some other backend we need to actually know what it is. So the communications channel needs to be encrypted so we can safely pass the user's password along. (I suppose the administrator will need to trust that we're not secretly recording all the passwords. If he has doubts, he can just read the source.)

When the client program and the server talk, they use public key encryption to secretly exchange a "session key" that will be used to encrypt the rest of their conversation.
I suppose they could just use their existing key pair to encrypt the entire conversation, but Shawn said that's way slower than "secret key" encryption methods, so most people use the slow public key encryption to securely exchange a secret key, which they then use to encrypt the rest of their conversation.

Now that the client and server have an encrypted communications channel, the client can prompt the user for his username/password and send it to the server without fear that it will be snooped or replayed. Some client programs (for example, a command-line program to send an event or to display job history) may be more useful if they don't ask for a password interactively, so that the administrator can, for instance, set up a web page that runs those commands. In that case, we can make it so that the client programs can take the username and password from the command line or from a file. This will make things flexible enough so that the administrator can do whatever he wants with the client programs.

While we don't want to force the administrator to create a list of Clockwork-only usernames and passwords, some people might want to do just that. To make things easy for them, we could make one of our authentication backends check against an internal database. Berkeley DB provides an easy way to encrypt a database, too. Other backends I have in mind are PAM (for the UNIX folks) and calling an external program (for those who want to use neither PAM nor the internal list). That should satisfy just about everyone.

The other part of security is the communication between the server and the agents (the piece of Clockwork that runs on every machine that is able to run jobs). Obviously, the agent must be able to tell that commands are coming from the real scheduling server, or this would be a huge root exploit. So when the scheduling server sends commands to the agents, it will encrypt them using the schedule's private key.
If the agent can decrypt the message using the schedule's public key, then it knows it's a valid message. I suppose we could leave this communication unencrypted and just include a digital signature, but it seems awfully easy to make the whole conversation encrypted, and then there are no issues with someone snooping secret data (although it's probably not very important data).

That's it for the server-to-agent connections, but there will also be agent-to-server connections. Just like in AutoSys, if a job is going to run for hours, we probably don't want to hold the TCP connection open that long just so that the agent can say at the end what the job's exit code was. So the agent will have to initiate a connection back to the server (or a server, since there may be multiple servers) to provide the exit code. In this scenario, the server needs to be able to make sure the system that just connected to it is a valid agent. So the agent will encrypt the message with the schedule's public key (which all agents have) and the server will decrypt it with the schedule's private key.

For added security, we can allow the administrator to set up a list of agent IP addresses/networks. The incoming connection can be checked against this list, too. For even more security, the incoming connection can be checked against the IP address of the system where the job ran. There's little reason that anyone other than "hosta" should be reporting on the final status of a job that ran on "hosta." But this level of security might be difficult to manage, since most hosts have several IP addresses, and the one the server connected to in order to start the job might not be the one that all the traffic out of the system goes across.

What I haven't addressed here:

(1) Authorization. We've checked to make sure the users are who they say they are, but how do we know who should be allowed to do what?
I was thinking of a three-level access model: administrators can change things, operators can only stop/start/hold jobs, and guests can look but not modify anything. We may need to apply these to something more granular than the entire schedule, since it may be useful to have a set of users who can only start/stop the backup jobs, or can only modify a certain set of jobs.

(2) How multiple servers work with authentication. My examples above talk about multiple agents and one server, but we know there will be multiple servers. My first thought is for all servers to have a copy of the private key. I think this would still be secure, and would allow everything to continue to work as I described.

(3) Ease of maintenance. It needs to be easy for an administrator to change a system's IP address. It's okay if we have this stored in our databases somewhere, as long as we provide an easy way to change it when a host's IP changes. Honestly, though, I'd prefer that we not be on the list of things an administrator needs to do when he changes IP addresses. Maybe that level of security could be optional. Also, the administrator needs to be able to change the schedule's key pair if it's compromised. It would be nice if this could be done without stopping the servers (outages are bad, right?). If we make it difficult to change the key pair, the administrator will be less likely to do it.

Comments, anyone?

-- Joel
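
To make item (1) a bit more concrete, here is a minimal sketch of how the three-level model might be checked per job group rather than per schedule. Everything in it is an assumption made for illustration: the names Role, REQUIRED_ROLE, role_for and is_allowed, the action names, the "*" wildcard for a schedule-wide grant, and the rule that the most specific grant wins. None of it is existing Clockwork code, and it is written in Python only for brevity.

```python
from enum import IntEnum


class Role(IntEnum):
    """The three levels from item (1); a higher value means more privilege."""
    GUEST = 1          # can look but not modify anything
    OPERATOR = 2       # can also stop/start/hold jobs
    ADMINISTRATOR = 3  # can also change things


# Minimum role needed for each action (hypothetical action names).
REQUIRED_ROLE = {
    "view": Role.GUEST,
    "start": Role.OPERATOR,
    "stop": Role.OPERATOR,
    "hold": Role.OPERATOR,
    "modify": Role.ADMINISTRATOR,
}


def role_for(grants, user, job_group):
    """Return the user's role for a job group, or None if there is no grant.

    grants maps (user, job_group) to a Role. A job_group of "*" is a
    schedule-wide grant. Assumption: a group-specific grant takes
    precedence over the schedule-wide one.
    """
    specific = grants.get((user, job_group))
    if specific is not None:
        return specific
    return grants.get((user, "*"))


def is_allowed(grants, user, job_group, action):
    """True if the user's role for this job group covers the action."""
    required = REQUIRED_ROLE.get(action)
    if required is None:
        return False  # unknown actions are denied
    role = role_for(grants, user, job_group)
    return role is not None and role >= required


# Example: bob may operate the backup jobs but only look at everything else.
grants = {
    ("alice", "*"): Role.ADMINISTRATOR,
    ("bob", "backup"): Role.OPERATOR,
    ("bob", "*"): Role.GUEST,
}
print(is_allowed(grants, "bob", "backup", "stop"))   # True
print(is_allowed(grants, "bob", "payroll", "stop"))  # False
```

Ordering the roles as integers keeps the check to a single comparison, but it only works while each level strictly includes the one below it. If we ever want a role that can modify a set of jobs without being able to start or stop them, a per-action permission set would fit better than a ladder.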