Archive for November, 2010
A regular reader of this blog may already have heard about patterndb, a collection of syslog-ng db-parser() rules that will make syslog-ng the center of the universe.
OK, I was joking. patterndb is a collection of log samples that makes it possible to do more with your logs than merely processing them. For example, if you are interested in login failure events, searching for them in the plain log files you usually find on a central log server these days can be daunting. Every application logs this event differently, and simply grepping for it is infeasible except in really trivial cases.
This is where patterndb comes in: it contains patterns for a lot of different applications to recognize these (and other) events and turn them into unified messages that are easy to recognize and thus to search for.
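To make the idea of a unified message concrete, here is a minimal sketch in Python. It is not how patterndb is implemented: real patterndb rules are XML and use syslog-ng's own pattern parsers, while this sketch uses regular expressions. The program name "exampleapp", its message format, and the field names in the result are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical per-application patterns. Real patterndb rules are XML files
# and do not use regular expressions; this only illustrates the idea.
PATTERNS = [
    ("sshd", re.compile(
        r"Failed password for (?P<user>\S+) from (?P<src>\S+) port \d+")),
    # "exampleapp" and its message format are made up for this sketch.
    ("exampleapp", re.compile(
        r"login denied: user=(?P<user>\S+) ip=(?P<src>\S+)")),
]


def normalize(program, message):
    """Return a unified login-failure event, or None if nothing matches."""
    for prog, pattern in PATTERNS:
        if prog != program:
            continue
        match = pattern.search(message)
        if match:
            return {
                "event": "login_failure",
                "program": program,
                "user": match.group("user"),
                "source": match.group("src"),
            }
    return None
```

With something like this in place, a search for login failures only has to look for the "login_failure" event name, regardless of which application produced the original line.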
Until now, the patterndb project aimed at creating its own dictionary & taxonomy to categorize events, simply because there was no such thing when we started.
However, times change, and the “Common Event Expression” (or CEE for short) project seems to have made some progress. That project aims at defining a generic dictionary for event fields, which would come in quite handy in patterndb land: instead of doing a lot of work to define our own, we can use their results. Since this project is backed by the US government, there is a good chance that it will be adopted by the industry.
So patterndb is now moving towards using the CEE results, and we have even converted our previous patterns over the past few weeks.
Please read Peter Czanik’s blog post on the topic for more information.
After two alpha releases and one beta, I’ve decided to declare syslog-ng OSE 3.2 stable, and thus I’ve released 3.2.1, the first in the 3.2.x series. This version has the largest list of features ever since the syslog-ng project was born, so make sure you check out all the goodies.
The key features were already covered on this blog in earlier posts; the most important ones, in my opinion, are:
- licensing change to LGPL/GPL combo, without the need to sign a contributory agreement
- log message correlation support in db-parser
- automatic pattern generation for existing log files with pdbtool patternize
- template functions
407 files changed, 50305 insertions(+), 36487 deletions(-)
Considering that OSE 3.2 has 70k lines (more specifically 70163) in its git repository, this is quite a lot.
The list of changes is summarised in the changelog here:
The updated documentation is also available here:
If you are a regular reader of this blog, you’ll probably know that syslog-ng is now entering the log message processing scene with its db-parser functionality. In order to improve our pattern coverage, Peter has started a log sample collection initiative. Please help him with good quality samples so our login/logout coverage becomes significantly better.
Here’s the blog post where he describes what he’d need in order to proceed:
Thanks for helping him.
You probably know that during the 3.2 development series a lot of functionality has been added to db-parser() (aka patterndb). All of this functionality was upward compatible with the old XML file format, so at first I decided not to change the patterndb version number, and it remained at v3.
However, Robert, our documentation maintainer, convinced me in a talk to bump it to v4. I’ve now added checks to syslog-ng v3.1 and v3.2 to actually verify the format version. v3.1 will only accept v3 formatted files and complain otherwise. v3.2 will accept both v3 and v4.
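The acceptance rule above can be sketched in a few lines of Python. This is only an illustration of the rule, not the check that went into syslog-ng: the function name and the table are hypothetical, and the sketch assumes the format version is stored in a `version` attribute on the root element of the patterndb XML file.

```python
import xml.etree.ElementTree as ET

# Which patterndb format versions each syslog-ng release series accepts,
# as described in the text. The table and names are illustrative only.
ACCEPTED_FORMATS = {
    "3.1": {"3"},
    "3.2": {"3", "4"},
}


def accepts(release_series, patterndb_xml):
    """Return True if the given release series would accept this file.

    Assumes the format version is in the root element's "version" attribute.
    """
    root = ET.fromstring(patterndb_xml)
    return root.get("version") in ACCEPTED_FORMATS[release_series]
```

For example, `accepts("3.1", "<patterndb version='4'/>")` returns False under this rule, while the same file passes for "3.2".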
I also added an XML schema to cover this format.
You’ll find these checks in v3.1.4 (too bad I’ve just released v3.1.3) and in v3.2.1.
Although I was not posting on this blog, I was working on syslog-ng multi-thread support in the last couple of weeks. Most of the preparation was done during the Netfilter Workshop (I know, it wasn’t netfilter related), and I’ve since used every possible occasion to work on the code instead of writing about it.
I’ve decided that instead of using a per-connection thread model, I’d like to use something that’d keep the number of threads close to the number of CPU cores to avoid bad cache effects and context-switch overhead. Since syslog-ng may serve thousands of clients, the per-connection thread model would have meant that we’d have thousands of threads, so I gave up on that thought.
In my current architecture a single thread watches the file descriptors for events and a set of worker threads performs the work. Since most of the fds are idle most of the time, this definitely uses fewer threads, and if I’m smart enough the same thread will be dispatched for the same client, which means that our cache will be hot by the time the second round of events comes in.
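The shape of this architecture can be sketched in Python. This is a toy model under stated assumptions, not the syslog-ng design: the function name `dispatch` is hypothetical, the "same thread for the same client" rule is approximated by `fd % n_workers`, and for simplicity the dispatcher thread reads the data itself and hands only the payload to a worker.

```python
import queue
import selectors
import socket
import threading


def dispatch(socks, n_workers=2):
    """Watch all sockets in one thread and hand payloads to worker threads.

    Returns {fd: [(worker_index, payload), ...]} once every peer has closed.
    """
    queues = [queue.Queue() for _ in range(n_workers)]
    handled = {}
    lock = threading.Lock()

    def worker(index):
        while True:
            item = queues[index].get()
            if item is None:          # shutdown marker
                break
            fd, payload = item
            with lock:                # "the work": just record the payload
                handled.setdefault(fd, []).append((index, payload))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()

    sel = selectors.DefaultSelector()
    for s in socks:
        sel.register(s, selectors.EVENT_READ)
    open_count = len(socks)
    while open_count:
        for key, _events in sel.select():
            data = key.fileobj.recv(4096)
            if not data:              # peer closed the connection
                sel.unregister(key.fileobj)
                open_count -= 1
                continue
            # The same fd always maps to the same worker (cache affinity).
            queues[key.fd % n_workers].put((key.fd, data))
    sel.close()

    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return handled
```

The number of threads here is fixed by `n_workers` and does not grow with the number of clients, which is the point of the design described above.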
Also, since the model of GLib’s main loop is inherently slow, I’ve decided to switch away from it. GLib uses a linked list of GSource objects, which are iterated _twice_ every poll iteration, once for the prepare() phase, and once for check(). In case we’re polling 1000 fds, that’s a loop over 2000 items, which, if done thousands of times a second, poses a serious overhead.
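The cost described above can be modelled with a toy loop in Python. This is not GLib code: `ToySource` and `poll_iteration` are hypothetical names, and the sketch only mirrors the two passes (prepare, then check) described in the text, counting how many callbacks a single iteration makes.

```python
class ToySource:
    """Stand-in for an event source with prepare() and check() callbacks."""

    def __init__(self):
        self.calls = 0

    def prepare(self):
        self.calls += 1
        return False   # not ready before polling

    def check(self):
        self.calls += 1
        return False   # nothing happened on this source


def poll_iteration(sources):
    """Run one loop iteration and return the number of callbacks made."""
    before = sum(s.calls for s in sources)
    for s in sources:      # first pass over every source
        s.prepare()
    # A real main loop would call poll() at this point.
    for s in sources:      # second pass over every source
        s.check()
    return sum(s.calls for s in sources) - before
```

With 1000 sources, one iteration makes 2000 callbacks even if every source is idle, so the per-iteration cost grows with the number of watched fds rather than with the number of active ones.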
Getting away from GLib is not easy, since a lot of logic is implemented in those prepare/check callbacks, but it was an aim worth pursuing anyway.
Also, as an added bonus I wanted to use faster kernel interfaces instead of poll. Linux has epoll, FreeBSD has a kqueue based implementation, Solaris has /dev/poll, these all advertise themselves as much more performant than the traditional interfaces.
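As an illustration of hiding these platform-specific interfaces behind one API, Python's standard `selectors` module picks the most efficient implementation available on the running platform (epoll, kqueue, /dev/poll, poll or select). The sketch below only shows that idea in Python; it says nothing about how syslog-ng or any C library does it, and the helper names are hypothetical.

```python
import selectors
import socket

# Map the selector class names from the standard library to the kernel
# interface each one wraps.
INTERFACES = {
    "EpollSelector": "epoll",
    "KqueueSelector": "kqueue",
    "DevpollSelector": "/dev/poll",
    "PollSelector": "poll",
    "SelectSelector": "select",
}


def default_interface():
    """Name of the kernel interface DefaultSelector uses on this platform."""
    return INTERFACES.get(selectors.DefaultSelector.__name__, "unknown")


def wait_readable(sock, timeout=1.0):
    """Return True if `sock` becomes readable within `timeout` seconds."""
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        return bool(sel.select(timeout))
```

The calling code stays the same on every platform; only the value returned by `default_interface()` differs.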
I was both looking at alternative main loop libraries and also thinking about rolling my own. Here are the ones I’ve considered:
- C++ libraries: I ruled these out immediately.
- libevent: probably the most widespread availability, but I didn’t like the API too much.
- libev: has a libevent compatible API and a lower level one. This API was better, but reading through the CVS history, I didn’t like the stance of the primary author towards things like AIX support and the like.
- ivykis: has the best API, quite Linux kernel-like, lightweight enough for me to confidently navigate in the code or to modify it. Has nice thread integration, with worker thread pools and cross-thread calls. Not really available anywhere, so syslog-ng will probably have to carry a copy.
Right now, ivykis is the winner, but I don’t know about its real portability (it uses the __thread keyword, for example, which may not be available everywhere). So this choice may still change.
I quickly gave up the idea of rolling my own similar implementation; doing this correctly for all the various kernel interfaces would be too much work.
Even with a single client and one destination without using threads, I could measure about 10% (55k msg/sec -> 60k msg/sec) performance increase on my development laptop. This is certainly promising.