Bazsi's blog

Guarding Your Business

Archive for September, 2010

syslog-ng now supports the syslog.conf file format

Thursday, September 30, 2010 @ 12:09 PM Author: Balázs Scheidler

People complained that syslog-ng is not a drop-in syslogd replacement and you have to learn a new configuration file format. Although I really think that syslog-ng’s way is superior to the old syslog.conf style, it is true that for someone not familiar with syslog-ng, the syntax of the configuration is something that needs to be learnt.

Courtesy of Jonathan W. Marks, who contributed a syslogd -> syslog-ng translator back in the day, I’ve committed an SCL plugin that can be used to convert a syslog.conf file to a syslog-ng configuration without any manual hassle. SIGHUP and the like immediately work well, as it is the syslog-ng config parser which drives the conversion.

It is not meant to be used for anything more than migration, but for that it can be handy.

You still have to create a syslog-ng configuration file, but it is static and looks like this:

@version: 3.2
@include "scl.conf"

syslogconf()

So now users have a choice which one they prefer.
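To give a feel for what such a conversion involves, here is a minimal Python sketch that turns one simple syslog.conf line into syslog-ng statements. It is not the SCL plugin's code: the function name, the generated object names (f_0, d_0) and the source name s_local are all made up for illustration, and only the plain "facility.priority /path" form is handled.

```python
# Priority names in increasing severity, as used by classic syslogd.
LEVELS = ["debug", "info", "notice", "warning", "err", "crit", "alert", "emerg"]


def translate_line(line, index, source="s_local"):
    """Translate one 'facility.priority /path' line into syslog-ng statements.

    Hypothetical helper: selector lists, negation, remote hosts and the
    other syslog.conf features are deliberately left out.
    """
    line = line.split("#", 1)[0].strip()  # drop comments and blank lines
    if not line:
        return None
    selector, action = line.split(None, 1)
    facility, priority = selector.split(".", 1)
    path = action.lstrip("-")  # a leading '-' only disables syncing in syslogd
    if not path.startswith("/"):
        raise ValueError("only file actions are handled in this sketch")

    conditions = []
    if facility != "*":
        conditions.append("facility(%s)" % facility)
    if priority != "*":
        if priority not in LEVELS:
            raise ValueError("unknown priority: %s" % priority)
        # syslogd semantics: the named priority and everything more severe.
        if priority == "emerg":
            conditions.append("level(emerg)")
        else:
            conditions.append("level(%s..emerg)" % priority)

    statements = []
    filter_ref = ""
    if conditions:
        statements.append("filter f_%d { %s; };" % (index, " and ".join(conditions)))
        filter_ref = " filter(f_%d);" % index
    statements.append('destination d_%d { file("%s"); };' % (index, path))
    statements.append(
        "log { source(%s);%s destination(d_%d); };" % (source, filter_ref, index)
    )
    return statements


if __name__ == "__main__":
    for statement in translate_line("mail.info  -/var/log/mail.log", 0):
        print(statement)
```

The point of the real plugin is that you never have to look at output like this; the sketch only shows the kind of mapping that happens behind syslogconf().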

patternize is in the mainline

Wednesday, September 29, 2010 @ 01:09 PM Author: Balázs Scheidler

Just a quick post to let you know that I’ve integrated gyp’s patternize patches, so if you check out the latest greatest revision from git, patternize will be included.

For those who don’t know what patternize is about, it is an implementation of Risto Vaarandi’s SLCT algorithm by my colleague Péter Gyöngyösi.

I’ve also fixed a couple of memory leaks and decreased memory usage a lot, so you might want to try this, if you were experimenting with the older version.

For those who don’t know what SLCT is about: it takes a logfile and automatically generates patterns that cover the contents of the log file. Some manual labour is still needed to process its output, but I’d say patternize does about 80% of the job. To use it:

$ pdbtool patternize /var/log/messages

There are a few configuration knobs, but it’s usually as simple as that.
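For the curious, the core idea of SLCT can be sketched in a few lines of Python. This is a simplified illustration and not the patternize code: the function name and the support parameter are assumptions, words are split on whitespace only, and a literal "*" in the input would be confused with the wildcard.

```python
from collections import Counter


def slct_patterns(lines, support):
    """Return {pattern: count} for clusters seen at least `support` times."""
    tokenized = [line.split() for line in lines]

    # Pass 1: count how often each word occurs at each position.
    word_counts = Counter()
    for words in tokenized:
        for pos, word in enumerate(words):
            word_counts[(pos, word)] += 1

    # Pass 2: keep frequent words, replace the rest with a wildcard,
    # and count how many lines produce each resulting candidate.
    candidates = Counter()
    for words in tokenized:
        candidate = tuple(
            word if word_counts[(pos, word)] >= support else "*"
            for pos, word in enumerate(words)
        )
        # A line without any frequent word does not form a candidate.
        if any(word != "*" for word in candidate):
            candidates[candidate] += 1

    # Only candidates that are frequent themselves become patterns.
    return {" ".join(c): n for c, n in candidates.items() if n >= support}


if __name__ == "__main__":
    sample = [
        "sshd: Accepted password for alice",
        "sshd: Accepted password for bob",
        "sshd: Accepted password for carol",
        "cron: job started",
    ]
    print(slct_patterns(sample, support=2))
```

The wildcard positions are the ones that still need the manual labour mentioned above: someone has to decide which parser and which name fits each of them.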

Syslog-ng correlation

Wednesday, September 29, 2010 @ 09:09 AM Author: Balázs Scheidler

I think we’ve reached an important milestone with syslog-ng: log message correlation was added to db-parser(). As you probably know, dbparser and its sister project patterndb are able to transform unstructured syslog messages into a normalized format: the human readable string content becomes a set of name-value pairs. The problem is that in a lot of cases messages miss one or two details that would really be needed to understand them, and this information usually comes in a followup message.

For example: one message in postfix logs contains the sender address, while the recipient information comes in the next message. It is easy to see that in most cases you want the information in sender,recipient pairs. Another example is sshd, where the authentication failure comes in one message and the exact reason for the failure comes in the next.

Currently what you can do with syslog-ng is to put the separate messages into two SQL tables and join them at query time. This gets ugly quite fast: increased storage needs, the hassle of managing two tables instead of one, not to mention the increased time needed to query the database. Sometimes the sole reason for creating SQL tables in this case is to perform the correlation, otherwise you’d be happier with a CSV file.

And that’s what became possible now with the latest git commit of syslog-ng 3.2. The idea is simple: when a patterndb rule matches, you can tell syslog-ng to remember that message by adding it to a correlation state. This state is identified with information extracted from the message, making it a unique session identifier. When the next line comes in, you can reference the information stored earlier.

Basically the correlation state is a list of log messages associated with a session id. To add a new message to this state, you need a store rule:

<rule id="…">
<patterns>
<pattern>foo session: @STRING:sessionid@, param: @STRING:param@</pattern>
</patterns>
<store id="$sessionid" timeout="60"/>
</rule>

The id attribute of the store element specifies a template containing any syslog-ng name-value pairs, probably extracted from the current message itself.

When the final information comes in you can use the join attribute of the values tag:

<rule id="…">
<patterns>
<pattern>bar session: @STRING:sessionid@</pattern>
</patterns>
<values join="$sessionid">
<value name="param">${param}@1</value>
</values>
</rule>

Here the join attribute specifies the session to look up (which must match in the two messages), and if there’s a match, all messages stored in the correlation state become available when evaluating the name-value pairs associated with the current message.

The key here is the new syntax in the template string: "@1" appended to a name-value pair reference. After the "@" character, you can reference a message in the correlation state by specifying the index backward from the current message. This way @0 is the current message, @1 is the one prior to the current one, @2 is before that and so on.
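The store/join mechanics can be modelled with a small Python sketch. It only illustrates the idea and is not how db-parser() is implemented: the class and method names are invented, messages are plain dicts, time is passed in explicitly, and both the timeout refresh on every store and the empty string for a missing or expired entry are assumptions.

```python
class CorrelationState:
    """Toy model of a correlation state: session id -> list of messages."""

    def __init__(self):
        self._sessions = {}  # session id -> (expiry time, [messages])

    def _live(self, session_id, now):
        # Return the stored messages, dropping the session if it timed out.
        entry = self._sessions.get(session_id)
        if entry is None or entry[0] <= now:
            self._sessions.pop(session_id, None)
            return []
        return entry[1]

    def store(self, session_id, message, timeout, now):
        # Corresponds to <store id="..." timeout="..."/> in a rule.
        messages = self._live(session_id, now) + [message]
        self._sessions[session_id] = (now + timeout, messages)

    def lookup(self, session_id, current, name, index, now):
        # Corresponds to ${name}@index under <values join="...">:
        # index 0 is the current message, 1 the one stored before it, etc.
        history = self._live(session_id, now) + [current]
        if index >= len(history):
            return ""
        return history[-1 - index].get(name, "")


if __name__ == "__main__":
    state = CorrelationState()
    # "foo session: 42, param: x" matched the store rule at t=0.
    state.store("42", {"sessionid": "42", "param": "x"}, timeout=60, now=0)
    # "bar session: 42" matched the join rule at t=10.
    current = {"sessionid": "42"}
    print(state.lookup("42", current, "param", 1, now=10))
```

Note that the session id has to be computed the same way by both rules, which is why both patterns above extract sessionid.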

There are more complex ways to use/query the contents of the correlation state, but those will appear in a followup post. Stay tuned!

syslog-ng Open Source Edition roadmap updated

Monday, September 13, 2010 @ 12:09 PM Author: Balázs Scheidler

I’ve updated the syslog-ng OSE roadmap on the BalaBit website. You can find it here: http://www.balabit.com/network-security/syslog-ng/opensource-logging-system/features/roadmap. The Premium Edition roadmap also received an update some weeks back; check this page if you are interested in that.

Introducing template functions

Sunday, September 12, 2010 @ 02:09 PM Author: Balázs Scheidler

The 3.2 git tree received yet another feature I still wanted to shove into the 3.2 final: template functions. A template function is a transformation: it is able to modify the way macros or name-value pairs are expanded.

For example, let’s assume that you have a log file where you want to write CSV formatted data, which you could do with the following template:

destination d_csv { file("/var/log/data.csv" template("$DATE,$HOST,$PROGRAM,$PID,$MSG\n")); };

This works fine until you discover that the CSV format can be broken if one of the expanded values contains a ‘,’ which is normally used to separate values. The CSV format solves this by quoting the values or by escaping the offending characters, e.g. you’d have to write something like this:

destination d_csv { file("/var/log/data.csv" template('"$DATE","$HOST","$PROGRAM","$PID","$MSG"\n')); };

This would solve the comma problem, but introduces another: the quote character can also be present in the macros expanded above, and if that happens, it will also spoil the nice CSV format. Lucky for us, the authors of syslog-ng have had a little hack for this problem for a long time now: template-escape:

destination d_csv { file("/var/log/data.csv" template('"$DATE","$HOST","$PROGRAM","$PID","$MSG"\n') template-escape(yes)); };

This causes syslog-ng to automatically escape the apostrophe, the quote character and the backslash whenever they are encountered in the expanded string. But please note that these are only escaped in the expanded macro values and not in the literal portion of the template. However, what happens if you don’t like the CSV format or you have different escaping rules? Or perhaps you only want to increase the size of your output file when the quotes are absolutely necessary? You could perhaps use rewrite rules, but that has some drawbacks: it increases the size of the log message and it is inconvenient to use. Until now, those were the only options. With a template function, however, the solution is easy:

destination d_welf { file("/var/log/data.welf" template("time=$(escape-welf $DATE) host=$(escape-welf $HOST) program=$(escape-welf $PROGRAM) pid=$(escape-welf $PID) msg=$(escape-welf $MSG)\n")); };

Of course it is possible to add new template functions via the new plugin mechanism. Right now it is more of a framework and a syntax embedded in templates, and only the “echo” function is available, but I expect growth in this area.
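To make the idea concrete, here is a rough Python model of a template expander with a pluggable function registry. It is only a sketch and not the syslog-ng implementation: the registry, the decorator and the regular expression are invented for illustration, nested $(...) calls are not handled, and the quoting rule used for escape-welf here is a guess at something plausible rather than the real function's behaviour.

```python
import re

FUNCTIONS = {}  # template function name -> callable taking a list of arguments


def template_function(name):
    """Register a function under the name used in $(name ...)."""
    def register(func):
        FUNCTIONS[name] = func
        return func
    return register


@template_function("echo")
def echo(args):
    return " ".join(args)


@template_function("escape-welf")
def escape_welf(args):
    # Assumed rule: quote only when needed, backslash-escape quotes inside.
    value = " ".join(args)
    if re.search(r'[\s"=]', value):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return value


TOKEN = re.compile(
    r"\$\((?P<func>[\w-]+)(?P<args>[^)]*)\)"  # $(function arg arg ...)
    r"|\$\{(?P<braced>[^}]+)\}"               # ${NAME}
    r"|\$(?P<bare>[A-Za-z_][A-Za-z0-9_]*)"    # $NAME
)


def expand(template, message):
    """Expand macros and function calls; literal text is copied untouched."""
    def replace(match):
        if match.group("func"):
            args = [expand(arg, message) for arg in match.group("args").split()]
            return FUNCTIONS[match.group("func")](args)
        return message.get(match.group("braced") or match.group("bare"), "")
    # re.sub never rescans its own output, so expanded values that happen
    # to contain '$' are not expanded a second time.
    return TOKEN.sub(replace, template)


if __name__ == "__main__":
    msg = {"HOST": "web1", "MSG": 'login "failed" for bob'}
    print(expand("host=$(escape-welf $HOST) msg=$(escape-welf $MSG)", msg))
```

The function only ever sees the expanded values, so the literal parts of the template (the "host=" and "msg=" prefixes here) are never escaped, and values that need no quoting are written as they are.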

In case you are interested, this patch in git implements template functions and this one adds plugin support on top of it. Adding a new function is simple, just look at the file named builtin-tmpl-func.c in the source tree.

patterndb homepage

Tuesday, September 7, 2010 @ 10:09 AM Author: Balázs Scheidler

Since our new website with a wiki engine has launched (finally), I started to write the patterndb project homepage, which you can find at http://www.balabit.com/wiki/patterndb. Apart from a set of links, there’s also a new article describing how to deploy the patterndb rules in a syslog-ng installation. Hopefully it’ll make experimentation easier.

patterndb classification

Friday, September 3, 2010 @ 09:09 PM Author: Balázs Scheidler

As you probably know one goal for patterndb is to implement message classification.

That is, in addition to extracting information from log messages, it also associates a “class” with each message, later available in the “${.classifier.class}” value.

Right now, syslog-ng doesn’t really care what this string is. But the XML schema validating the patterndb file lists the following four classes (taken from the logcheck project):

violation     – security violation
security      – other security events
system        – system information
unknown       – no rule matches

On one hand, the tagging functionality (i.e. the ability to also associate tags with each message) is superior to classes.

On the other hand, all tags are equivalent: if a message has 5 tags, syslog-ng currently only provides functions to _filter_ based on them, but not to use them as a macro.

So for example it is possible to do:

destination d_class_files { file("/var/log/messages.${.classifier.class}.log"); };

But this is difficult to do with tags (except by using filters and different destinations), as there’s no such functionality. Another problem is that tags and classes are completely independent: in order to filter on the class of the message, one would have to use a match() filter like this:

filter f_class { match("violation" value(".classifier.class")); };

My conclusion is that classes are better when used in templates, tags are better when filtering. The two should be merged somehow.

So I’m thinking on how to move forward. Here are the alternatives I’m considering:

1) the class of the message is always a tag in some generated format (e.g. if a message has class XXX, then a tag named “.class.XXX” would be automatically associated with the message).

This is somewhat cumbersome.

2) the class of the message is created as a tag as well, with the same name as the class.

e.g. we’d have a tag named “violation”, but that’d preclude the use of the “violation” name as an ordinary tag.

3) drop the class stuff and implement a macro trick that makes it possible to use tags in macro context

One way to do this:

destination d_class_files { file("/var/log/messages.$(expand-tag-name violation security system unknown).log"); };

The “expand-tag-name” macro function would try to look for the tags listed as parameters, and if the message matches it’d expand to the tagname.

This is not intuitive and if someone wants to use such an expansion in a lot of templates, it is also irritating and difficult to get right.
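The lookup that option 3 would need is tiny. Here is a Python sketch of it, under the assumptions that the first listed tag wins when several match and that nothing matching yields an empty string; expand-tag-name is only a proposal at this point, so none of this describes existing syslog-ng behaviour.

```python
def expand_tag_name(message_tags, candidates):
    """Return the first candidate tag that is set on the message, else ''."""
    for name in candidates:
        if name in message_tags:
            return name
    return ""


if __name__ == "__main__":
    classes = ["violation", "security", "system", "unknown"]
    tags = {"ssh", "security", "login"}
    # Builds the same kind of file name as the destination above.
    print("/var/log/messages.%s.log" % expand_tag_name(tags, classes))
```

The sketch also shows the weak spot: the list of candidate names has to be repeated, in the same order, in every template that wants the class.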

I’m leaning towards option number 2 above.

On an independent matter, the set of classes may need some thought. As I said the original list is borrowed from logcheck, but I think it probably needs to be expanded. Last time I got patterns for DNS queries, and although I could shove them into “system”, right now I feel that the point of classification is to categorize events by “importance”, in a similar spirit to syslog severity, but one that works even if the application developer uses a bogus severity when sending syslog messages.

So one email, two questions, feedback appreciated.
Thanks.

multi-threading coming in syslog-ng OSE 3.3

Wednesday, September 1, 2010 @ 10:09 AM Author: Balázs Scheidler

As posted on the mailing list already, I’m planning to turn syslog-ng into a fully multi-threaded application in order to improve performance on multi-core systems. Since I don’t want to start destabilizing 3.2 (I’d rather dub it stable soon), this will become part of OSE 3.3.

During my holidays I’ve worked a little on this to get some actual numbers on how much it improves performance. To that end I’ve added the bare bones of threading to all input/output state machines, which were previously all running in the same “main” thread.

The code is _very_ experimental and it wouldn’t work with more than a single client, but it already shows that with a single client and a single destination file, it gives us about a 50% performance boost. On my development laptop with X and every desktop shinyness running, it increased the performance of syslog-ng from about 65k/sec to 100k/sec with flow control disabled and 90k/sec with flow control enabled. (The extra 10k was probably dropped without flow control because the output thread was probably slower writing out the stuff to disk, but I’m not sure.)
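The difference between the two flow control settings can be illustrated with a small Python model of one input thread feeding one output thread through a bounded queue. This is a structural sketch only: the function name and queue size are made up, it says nothing about the actual syslog-ng code, and Python threads will not reproduce the performance numbers above.

```python
import queue
import threading


def run_pipeline(messages, queue_size, flow_control):
    """Pass messages from a reader thread to a writer thread.

    Returns (written, dropped). With flow control the reader blocks when
    the queue is full; without it, messages that do not fit are dropped.
    """
    pending = queue.Queue(maxsize=queue_size)
    written = []
    dropped = 0
    done = object()  # sentinel telling the writer to stop

    def writer():
        while True:
            item = pending.get()
            if item is done:
                return
            written.append(item)  # stands in for writing to the file

    def reader():
        nonlocal dropped
        for msg in messages:
            if flow_control:
                pending.put(msg)  # waits until the writer catches up
            else:
                try:
                    pending.put_nowait(msg)
                except queue.Full:
                    dropped += 1  # the writer was too slow, message lost
        pending.put(done)  # always blocking, so the writer does terminate

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return written, dropped


if __name__ == "__main__":
    msgs = list(range(10000))
    for flow_control in (True, False):
        written, dropped = run_pipeline(msgs, queue_size=10, flow_control=flow_control)
        print(flow_control, len(written), dropped)
```

With flow control enabled every message arrives but the reader is slowed down to the writer's pace; without it the reader runs at full speed and the difference shows up as dropped messages, which would be consistent with the guess about the extra 10k above.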

If anyone wants to see what I’ve done so far, here is the commit:

http://git.balabit.hu/?p=bazsi/syslog-ng-3.2.git;a=shortlog;h=multi-thread