Following my last rant about configuration management, I've had a closer look at Salt, both for my personal use and for a client who would like to deploy something that is not Puppet.
Salt has some very good ideas, for instance:
-
Configuration is just data in YAML format, which makes it really easy to whack up stuff — and also to generate it using other means;
-
There's a lot of pluggability: runners, renderers, states, modules, returners, outputters, all of which can be replaced with custom implementations;
-
It builds on remote execution. In fact, in the docs, they talk first about remote execution and then about configuration management. Not only does this make a lot of sense, it also gives the admin the additional benefit of having quick, remote access to all hosts in the infrastructure. I like that as it means I don't also have to deploy a parallel-execution solution.
-
Salt supports both a push and a pull model for configuration management. In fact, the push model is part of its remote execution base and it's implemented through persistent connections from the clients to the master. This neatly solves all problems relating to firewalling and NAT, and it also means that the master always has a pretty good understanding of who's around. It feels good to me. And there is no NIH going on with respect to pulling: there is no daemon, you have to use cron. Good.
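To illustrate the "generate it using other means" point, here is a minimal sketch that builds state data as plain Python dictionaries and serialises it. The layout (`pkg.installed`, `service.running` with a `require`) follows the style shown in Salt's documentation; the helper names and the assumption that a service is named after its package are mine. I emit JSON because Python's standard library has no YAML writer; YAML 1.2 is designed as a superset of JSON, but whether a given YAML parser accepts the output is worth checking.

```python
import json


def package_with_service(name):
    """Build state data for one package and its service as a plain dict.

    Assumption for illustration: the service has the same name as the package.
    """
    return {
        name: {
            "pkg.installed": [],
            "service.running": [{"require": [{"pkg": name}]}],
        }
    }


def render_states(names):
    """Merge the per-package dicts and serialise them as JSON text."""
    states = {}
    for name in names:
        states.update(package_with_service(name))
    return json.dumps(states, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render_states(["apache", "ntp"]))
```

The point is only that nothing here needs a template language: the configuration is ordinary data until the moment it is written out.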
But there are also a couple of downsides to Salt:
-
The default templating engine is Jinja. While that may be an excellent choice for applications in inherently insecure environments, such as the Web, it is quite painful in the context of Salt: no multi-line logic, whitespace control is a nightmare, and a completely unfamiliar set of conditionals and filters, rather than access to e.g. Python. This is only the default, and Mako is an alternative, but Jinja seems like a poor choice of default, and that does not instill confidence.
-
I don't think that the method of targeting hosts that Salt uses is scalable. I don't want to intersperse my configuration with if-then-else statements referencing hostnames (what would that mean if a hostname changed or a new host was added?). Instead, I want the configuration to be fully parametrised and assemble the parameters from a node database, ideally a hierarchical one. I have a solution for that which works with Salt, called reclass.
-
I don't think much of "pillars", which are Salt's way to provide random data (parameters) to nodes, again because data needs to be targeted at hosts, rather than defined for a host. This is also solved with reclass.
-
Logging and error reporting in Salt are abysmal, inconsistent, and useless. This is offset a little bit by the ability to use pdb, but it's not ideal.
-
Exceptions are not properly handled, the file descriptors aren't closed when daemonising, children are not reaped as they should be, and the exit code will almost always be zero, even on failure.
-
State requirements are not enforced where they are required, but only come into play when a higher-order function is called.
-
The code reinvents module loading and does that in very complex ways, while putting tight limitations on what's possible. State functions and modules cannot be namespaced, for instance, nor can they easily make use of each other. There is also no way to quickly define macros without resorting to the templating language (and thus a completely different syntax and paradigm), and the concept of having modular, self-contained states (e.g. one for sudo, one for SSH, etc.) with all their dependencies in one hierarchy is not really part of the design. Also, it's not trivial for a state definition to export an interface for other states to use.
-
While we're on the topic of code and design, there's a lot of duplicate stuff in need of refactoring (e.g. saltutil._sync), and it seems that this even causes a lot of crypto work to be done over and over again. It also doesn't help that salt-call, which is used to execute commands on the clients (e.g. pull configuration), works completely independently of the daemon process running on each client (although the same code base is being used).
-
And why aren't state and execution modules implemented as classes? Instead, the module is itself treated as an entity, and a home-cooked "virtual" system is used to determine which platforms support which modules. Object-oriented programming principles give you all of that, and more.
-
Fortunately, the Salt authors didn't cave in and write the entire communication layer themselves. Instead, they employed ZeroMQ, but it's still quite common for a client to become unreachable (permanently), e.g. due to an IP address change, or networking problems. What's worse is that the master does not keep track of who's listening.
-
While we're on the subject of standing on the shoulders of giants, Python's msgpack apparently cannot handle Unicode. Yay.
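To make the point about parametrised configuration and a hierarchical node database more concrete, here is a rough sketch of the idea: parameters are defined for a node by merging class defaults with node-level overrides, and no hostname ever appears in the configuration logic. This is not reclass's actual data format or code; the inventory layout, class names and function names are invented for illustration.

```python
import copy


def deep_merge(base, override):
    """Return a new dict in which override wins and nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical inventory: classes carry defaults, nodes list classes plus overrides.
CLASSES = {
    "base": {
        "parents": [],
        "parameters": {"ntp": {"servers": ["pool.example.org"]}, "ssh": {"port": 22}},
    },
    "webserver": {"parents": ["base"], "parameters": {"http": {"port": 80}}},
}
NODES = {
    "web1.example.org": {"classes": ["webserver"], "parameters": {"ssh": {"port": 2222}}},
}


def class_parameters(name, classes):
    """Resolve one class: parents are merged first, then the class's own parameters."""
    params = {}
    for parent in classes[name]["parents"]:
        params = deep_merge(params, class_parameters(parent, classes))
    return deep_merge(params, classes[name]["parameters"])


def node_parameters(hostname, nodes=NODES, classes=CLASSES):
    """Assemble the full parameter set for one node; node-level values win."""
    node = nodes[hostname]
    params = {}
    for cls in node["classes"]:
        params = deep_merge(params, class_parameters(cls, classes))
    return deep_merge(params, node["parameters"])


if __name__ == "__main__":
    print(node_parameters("web1.example.org"))
```

Renaming a host or adding a new one then only touches the node database; the states consume whatever parameters they are handed.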
Those are the big issues. There are many small issues too, but those won't be around for too long as the project is moving along quickly and the community is vibrant. This is surely an important point that speaks for Salt.
However, the above issues seem to hint at design choices that might well turn out to stand in the way later.
Following a day of frustration, I now feel the overpowering urge to write my own configuration management system, because of course I feel that I could do it better than everyone else. Does this sound familiar to you?
Let's just say — hypothetically — that I would, then I'd want to reuse as much existing functionality as possible. For instance, I'd want the entire remote execution framework to be independent from any configuration management implemented on top.
So what does this mean? What would such a remote execution framework need? Here are some thoughts:
-
I'd want to keep it the way of Salt, i.e. the clients maintain persistent connections to the master, and the master regularly pings the clients, for housekeeping. And the clients know to expect such pings at regular intervals, so they can reset themselves in case they don't hear from the master. Or scream.
-
Of course, authentication and encryption need to be part of this. Ideally, key and roster management are already available in the tool that's being reused. I don't want to have to have a new PKI for this…
-
The protocol would probably be something like XML-RPC, with an extensible list of modules on the client to do the work. Data would be standardised to JSON format.
-
Asynchronous execution would be a plus. I am even tempted to say that it should be all built asynchronously, even though I don't really know a use case for this.
-
The clients should be able to feed back information to the master, where data can be accumulated, allowing for cross-node configuration. This could be implemented using master-side polls to keep the protocol easier.
Doesn't this sound like a Unix botnet to you? ;)
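To make a few of these wishes less abstract, here is a minimal sketch of the client side only: an extensible registry of functions, JSON for the data, and a check for whether the master's pings have stopped arriving. Everything here is hypothetical (the `Agent` class, the message fields `function` and `args`, the thresholds), and transport, authentication and encryption are left out entirely. Time is passed in explicitly so the logic can be exercised without a clock; a real client would presumably use something like `time.monotonic()`.

```python
import json


class Agent:
    """Hypothetical client: dispatches JSON requests and watches for master pings."""

    def __init__(self, ping_interval, started_at, missed_allowed=3):
        self.ping_interval = ping_interval
        self.missed_allowed = missed_allowed
        self.last_ping = started_at  # treat start-up as the first sign of life
        self.modules = {"ping": self._ping}

    def register(self, name, func):
        """Add a callable under a name; this is the extensible module list."""
        self.modules[name] = func

    def _ping(self):
        return "pong"

    def handle(self, raw, now):
        """Decode one JSON request, run the named function, return a JSON reply."""
        try:
            request = json.loads(raw)
            name = request["function"]
            func = self.modules[name]
            if name == "ping":
                self.last_ping = now
            reply = {"ok": True, "result": func(*request.get("args", []))}
        except Exception as exc:  # report failures to the master instead of dying
            reply = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
        return json.dumps(reply)

    def master_is_silent(self, now):
        """True once more than missed_allowed ping intervals passed without a ping."""
        return now - self.last_ping > self.ping_interval * self.missed_allowed


if __name__ == "__main__":
    agent = Agent(ping_interval=10, started_at=0)
    agent.register("add", lambda a, b: a + b)
    print(agent.handle('{"function": "add", "args": [2, 3]}', now=1))
    print(agent.master_is_silent(now=100))
```

What the client does when `master_is_silent` turns true (reconnect, reset, or scream) is deliberately left open.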
I could imagine whacking this up with a bit of Python, some shell glue, socat and SSH: the server would have an authorized_keys file with forced commands connecting the client to the server process via sockets.
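As a rough idea of what such a forced command could look like, the following sketch formats one authorized_keys line per client key. The `command="..."` option and the `no-*` restrictions are standard OpenSSH authorized_keys options, and `UNIX-CONNECT` is a socat address type; the socket path, the key material and the function name are placeholders I made up. The path check is deliberately strict so that no quoting inside `command="..."` is needed.

```python
import re

# Conservative whitelist: absolute path without spaces, quotes or shell metacharacters.
SAFE_PATH = re.compile(r"/[A-Za-z0-9_./-]+")

RESTRICTIONS = (
    "no-port-forwarding",
    "no-X11-forwarding",
    "no-agent-forwarding",
    "no-pty",
)


def authorized_keys_entry(public_key, socket_path):
    """Return one authorized_keys line that pins a key to a socat forced command.

    public_key is the usual "type base64 [comment]" text; socket_path is the
    (hypothetical) Unix socket on which the server process listens.
    """
    if not SAFE_PATH.fullmatch(socket_path):
        raise ValueError(f"refusing unusual socket path: {socket_path!r}")
    parts = public_key.split()
    if len(parts) < 2:
        raise ValueError("public key must contain at least a type and key data")
    forced = f"socat STDIO UNIX-CONNECT:{socket_path}"
    options = ",".join((f'command="{forced}"',) + RESTRICTIONS)
    return f"{options} {' '.join(parts)}"


if __name__ == "__main__":
    # Placeholder key material, not a real key.
    print(authorized_keys_entry("ssh-ed25519 AAAAC3placeholder client1", "/run/cm/master.sock"))
```

Whatever the client sends over its SSH session would then end up on the server process's socket, and nothing else on the account would be reachable with that key.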
Or I could imagine using Twisted for that.
But I would prefer if something like this already existed. Anyone?
Comments are broken on my blog, and I cannot be bothered to work on them. If you have any input, please write to me. I will (eventually) condense all feedback into a new article.
NP: Mouse on Mars: Parastrophics

