Archive for December, 2011
syslog-ng and the journal
There’s an ongoing project to create a new logging subsystem for Linux, called the journal, by Lennart Poettering of PulseAudio & systemd fame. It is implemented as a core component of systemd, and thus has a good chance of being integrated into all distributions that carry systemd: Fedora, openSUSE, and probably others.
The vision and design is described in a paper here.
The reactions to the idea were mixed: there are some good features behind the idea, however it changes a couple of fundamental UNIX traditions. See article and comments here.
Since syslog-ng is also in the logging sphere, the logical question arises: how does this new project affect syslog-ng in the long run?
The short answer to that question is that it’ll probably help syslog-ng, but please read on.
Journald now & future
Right now journald is a very limited syslog implementation that only focuses on local logging: it collects local syslog messages, converts them into name-value pairs, then adds some trusted ones (like the pid, uid and gid) and writes these into journal files for storage.
The idea of working with structured messages in journald is currently limited because of the required application changes: only traditional syslog fields are available, the application message is stored within a single field. The vision here is to add structured logging to applications.
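To make the conversion described above more concrete, here is a minimal Python sketch of turning a traditional syslog line into name-value pairs and attaching trusted fields. It is only an illustration, not journald's actual code: the function name to_record, the regular expression and the field names are assumptions chosen for this example, and the trusted values are simply passed in by the caller rather than obtained from the kernel.

```python
import re

# Matches a BSD-syslog style line: "<PRI>Mmm dd hh:mm:ss host tag[pid]: message"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def to_record(line, trusted):
    """Convert one syslog line into a flat dict of name-value pairs.

    `trusted` holds values the collector determined itself (hypothetical
    input here); they are stored with a leading underscore so they cannot
    be confused with fields claimed by the application.
    """
    m = SYSLOG_RE.match(line)
    if m is None:
        raise ValueError("not a recognizable syslog line")
    pri = int(m.group("pri"))
    record = {
        "PRIORITY": str(pri % 8),          # severity is the low 3 bits
        "SYSLOG_FACILITY": str(pri // 8),  # facility is the remainder
        "SYSLOG_TIMESTAMP": m.group("timestamp"),
        "HOSTNAME": m.group("host"),
        "SYSLOG_IDENTIFIER": m.group("tag"),
        # The whole application message stays in a single field.
        "MESSAGE": m.group("message"),
    }
    if m.group("pid") is not None:
        record["SYSLOG_PID"] = m.group("pid")  # claimed by the sender
    for name, value in trusted.items():
        record["_" + name.upper()] = str(value)
    return record


line = "<38>Dec 12 10:15:02 myhost sshd[1234]: Accepted password for root"
rec = to_record(line, {"pid": 1234, "uid": 0, "gid": 0})
```

Note that the application's text still ends up in one MESSAGE field: without changes to the application there is nothing more structured to extract.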
Other sources of local logs are to be integrated too: stuff like login/logout records (wtmp), audit logs and firmware (ACPI) logs.
The file format is interesting, although it is the source of most of the negative feelings: it is a binary format. It is undocumented (for now), with a library in the works to read & write the files. The problem most people see is that in emergency situations the file may become corrupt, and information crucial to diagnosing the problem that caused the corruption in the first place can be lost.
As far as I understand, if applications were to support the journal, they’d have to write records to the journal through the API without involving journald itself. This means that the file structure must support inter-process synchronization, or that each application would log to different files to avoid that (using UUIDs for instance).
Network transport of the journal doesn’t exist yet. The vision seems to be that the journal is structured on the disk so that journals from multiple hosts can be merged simply by copying them to the same host. Ideas such as rsync or NFS mounts were floated as solutions to the transport problem. This is modeled a bit like a git repository, which is great for storing source code, but may not be as great for log storage.
Further processing of logs is not in the cards, i.e. journald would only take and store logs. In order to support higher level processing, applications would need to be modified and external tools to process logs would be needed.
As it seems, journald leans towards using a distributed model: each host has its own journals, and whenever you want to look up records, you go back to the source. It tries to address security concerns via cryptographic means (like using a chained HMAC for stored log messages).
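To show what a chained HMAC over stored log messages means in general, here is a small Python sketch. It is a generic illustration of the technique, not the journal's actual scheme: the function names, the all-zero starting value and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
import hmac


def chain_tags(key, messages):
    """Return one HMAC tag per message, each covering the previous tag."""
    tags = []
    prev = b"\x00" * 32  # assumed fixed starting value for the chain
    for msg in messages:
        # Feeding the previous tag into the next HMAC links the entries,
        # so changing or removing one entry invalidates all later tags.
        prev = hmac.new(key, prev + msg, hashlib.sha256).digest()
        tags.append(prev)
    return tags


def verify_chain(key, messages, tags):
    """Recompute the chain and compare it with the stored tags."""
    expected = chain_tags(key, messages)
    return len(expected) == len(tags) and all(
        hmac.compare_digest(a, b) for a, b in zip(expected, tags)
    )


key = b"example-secret-key"
logs = [b"service started", b"user logged in", b"service stopped"]
tags = chain_tags(key, logs)

intact = verify_chain(key, logs, tags)
tampered = verify_chain(key, [logs[0], b"nothing happened", logs[2]], tags)
truncated = verify_chain(key, [logs[0], logs[2]], [tags[0], tags[2]])
```

The check only detects modification. It does not bring back deleted messages, and anyone who obtains the key on the host can recompute the whole chain.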
Comparing to syslog-ng
syslog-ng has become much more than a syslogd in recent times. It supports structured messages similarly to journald (name-value pairs), can even extract such values from unstructured messages (db-parser), and can store stuff into much more than text files: MongoDB keeps the structure and provides indexed access, but output as JSON objects into simple text files is possible too.
With syslog-ng (unlike journald), the user is in control: she can influence the storage policy, use whatever files, databases she pleases to store data. This is flexibility on one hand, but can be a problem if one wants to create tools that universally work with logs out-of-the-box.
Some users store messages in /var/log, others in /logs, yet others use MongoDB for their log storage. Writing a GUI application that works with log data out-of-the-box without having to specify where these are stored is next to impossible. The syslog model is to use specialized tools for each of these storage mechanisms and home-grown scripts to do site specific processing. The reason is simple: use-cases are so different (from mail logs to financial transactions) and log data so voluminous (in the 100TB scale) that one size doesn’t fit all.
syslog-ng is more like an infrastructure for the actual, potential per-site log processing needs, while journald is a complete system for a more limited use-case, which may just be enough for a number of users.
The security model of traditional syslog is to get logs off the potentially vulnerable system, leaving as small a window for potential modification as possible. Certainly sending out a hash value periodically is possible with the journal too, however the actual log messages could be lost if the source system is compromised. This way syslog leans toward a centralized system, which itself can be distributed by using trusted intermediaries that store log data as needed, but _independently_ of the source host.
Currently the only feature that journald has over syslog-ng is ‘trusted fields’, i.e. the fact that journald can determine the actual uid, gid, pid… values, making it more difficult to forge messages on the local host. Although these are handy, I didn’t see the need for these in practice in the past 13 years I’ve been maintaining syslog-ng, even though I knew about the possibility to get this information from the kernel. Anyway, adding this to syslog-ng is not difficult, probably an hour or two, and I may just do that to demonstrate. rsyslog has rushed to implement these probably for the very same reason: http://blog.gerhards.net/2011/11/trusted-properties-in-rsyslog.html
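As an illustration of how such values can come from the kernel rather than from the message itself, here is a short Python sketch using the Linux SO_PEERCRED socket option on a connected Unix socket. This is an assumption about one possible mechanism, not a description of how journald, rsyslog or syslog-ng do it; a daemon reading a datagram socket such as /dev/log would more likely use SO_PASSCRED with SCM_CREDENTIALS ancillary data. The helper name peer_credentials is made up for this example, and the code is Linux-only.

```python
import os
import socket
import struct

# struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid;
UCRED_FORMAT = "iII"


def peer_credentials(sock):
    """Ask the kernel who is on the other end of a connected Unix socket."""
    size = struct.calcsize(UCRED_FORMAT)
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    pid, uid, gid = struct.unpack(UCRED_FORMAT, raw)
    return {"pid": pid, "uid": uid, "gid": gid}


# A socketpair stands in for a client connecting to a log daemon; both
# ends belong to this process, so the kernel reports our own identity.
receiver, sender = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
creds = peer_credentials(receiver)
receiver.close()
sender.close()
```

The point is that the sender has no way to influence these values through the message content, which is what makes them harder to forge than a pid written into the log line.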
The other planned log sources are partially supported by syslog-ng too, for instance BSD process accounting logs are supported directly (http://bazsi.blogspot.com/2010/07/syslog-ng-and-process-accounting.html) and similar support can also be added for the others, if not already integrated with syslog.
What Next?
I think that journald can become great for computers where logging is not the primary function. If the user has never changed her syslog.conf file, journald would provide much more for the user out-of-the-box than the current syslog does. These are:
- proper logging during the boot process
- integration with the service manager, easing troubleshooting for failed services (saving stdout and stderr)
- GUI application for ad-hoc checking of logs
- the ability to programmatically query logs without having to care about site-specific policies (how log files are organized for instance)
For those cases where logging is important, mandated by regulations or operations for a heterogeneous enterprise system, journald will probably not be enough. Not enough even if the whole vision is accomplished. As I see it, these features will not be adequate in journald:
- off-system storage for a long period (1-5 years is mandated by various regulations)
- on-line log collection for getting the message off the potentially vulnerable system as fast as possible (being late at most a minute is acceptable, but hourly syncs are not)
- performance for on-line collection and storage (I’ve seen requirements for handling up-to 250k msg/sec)
- interoperability with syslog: not just receiving and storing but also preprocessing, normalization & classification.
- existing standards
- open for on-line integration with home-grown processing tools and SIEMs
These are the ways syslog-ng can benefit from journald:
- structured logging in applications: if these actually emerge, syslog-ng will be there to support them too
- syslog-ng as an application: currently syslog-ng is used as a system component, replacing syslogd, which drives some features that may not match the primary vision of syslog-ng itself. If local logging were taken care of by journald, syslog-ng could focus on what it does best: collecting, preprocessing, normalizing & classifying logs, including the ones in journald.
Journald can mean that Linux boxes would probably be installed without a full-blown syslogd by default.
As long as interoperability with a syslog application is a goal for journald (and I fail to see why it wouldn’t be), syslog-ng can happily coexist with the journal and can itself leverage all the benefits that journald brings. Journald will only replace syslog and syslog-ng in a limited use-case, which is not a primary focus for syslog-ng.
Currently, syslog-ng is not the default choice for the majority of distributions, which means that right now one needs to explicitly install syslog-ng over the default. This will not change much with the introduction of the journal, except for the fact that the current default is more direct competition for syslog-ng. If journald takes over that role, the playing field would be leveled somewhat.
