Archive for August, 2010
Holiday
I’ve spent the last 1.5 weeks on holiday at Lake Balaton with my family. That’s why I was missing from the syslog-ng mailing list and from this blog. I’ll try to work through my backlog in the coming days.
pdbtool test improvements
I have added some more functionality to “pdbtool test”, which I needed while working on the official syslog-ng patterndb patterns. It can now process several pdb files in a single invocation, and it can also validate the patterndb XML files against the official schema.
This is the shell command I’ve used:
$ pdbtool test --validate `find . -name '*.pdb'`
If you compiled the alpha2 release, this is only one patch on top of that, so it should be simple. You can check out the patch here.
syslog-ng & open core, why we think it is different
As you may have seen in my last post, syslog-ng was quoted as an example of the “open core” development model, which has problems in the eyes of some Open Source purists. I tried to do my homework, understand their point of view and see what we can do about it. If you are interested, you can read these articles:
- Dave Neary’s Rotten to the Open Core
- Simon Phipps’s Open Core is Bad For You
- Henrik Ingo’s So if I don’t call myself ‘open source vendor’, then everything is fine? (yes)
People who know me either personally or through the syslog-ng mailing list probably know that I by no means want to publish something substandard or cripple the syslog-ng open source functionality in any way. I wanted to start with this statement because, as I see it, there’s only a small and subtle difference between an “open core” and a “true open source” project. I’d like to explain why I feel that syslog-ng is still a “true open source” project.
The most to-the-point explanation of why “open core” is bad that I have seen so far was: “The question this boils down to is: Does the Open Source version exist because it’s a nice selling point or does it exist because someone believes in the values of Free Software?” [link]
I’d say that both are true for BalaBit: we do believe in Free Software and we do publish a lot of it, outside syslog-ng as well (our TProxy work was integrated in kernel 2.6.28, various libdbi/dbi-drivers patches, iptables-utils, jailer, restrict and a couple of others, read this page for more). Some of these are more successful, others less so, but I guess the same applies to other free software hackers too.
So why do I think there’s only a subtle difference between “open core” and “true open source” development? Because on the factual side, they look _very_ similar. Some projects quoted as “true open source” (Apache, rsyslog, JBoss, Tomcat), and probably others, support commercial development on top of them: their “core” license is compatible with commercial licenses (either LGPL or something else). The difference seems to be the very existence of a commercial offering from the same author.
So while syslog-ng 3.2 will have the same licensing structure as rsyslog, some still say that syslog-ng is not a true open source project. Again, the root cause for that is the existence of the syslog-ng Premium Edition package. I was trying to understand the logic behind this stance and came up with the following list of reasons (not necessarily applying to syslog-ng, but more on that later):
- with “open core” development, products usually fork the Open Source part of the project for their internal development and the flow of bugfixes and features on the open source portion happens in a closed way as a code dump
- the development is driven by the commercial edition, changes are trickling to the OSE version with a delay
- because of this setup, the community has no say in the features or in the way those features are developed
- because of this setup & without extra effort the open core part can quickly deteriorate
- the neglect of the open source portion can go as far as making it completely unusable (hence the word “crippleware”)
- this process may not be visible right from the start, Open Core development may look like a true open source at first, but this can happen with time, effectively pushing users to pay money when the OSE version deteriorates and the replacement of a core component of a system would be more expensive
- when a competing open source solution is contributed, such a contribution is not accepted into the upstream project, because that would compete with the commercial edition.
The way I see it, the line between “open core” / “true open source” is very thin and also difficult to judge for an outsider. These are the reasons why I think that although syslog-ng does have a commercial edition, it is still an open source project.
Code base fork
Right now the code bases for the OSE and PE editions are in different repositories. However, one reason for the licensing change in 3.2 is to eliminate this and use the same core in both the PE and the OSE editions.
It must also be added that the current setup was created because of community needs: in the early days, only a single, internal repository was used and the OSE edition was exported as a nightly snapshot tarball. As the lack of a real version control system was a problem for the community, we arrived at the current setup: two git trees, with patches synchronized between them. This is of course much more work, but we volunteered to do that.
Development drive
Right now, the development on syslog-ng happens in three parallel teams. They each create original work. These are:
- the syslog-ng Store Box team,
- the syslog-ng Premium Edition team and
- the Open Source team.
The method used to synchronize work is as follows:
- bugfixes are done in the code base the bug report is coming from
- bugfixes in parallel branches are synced as the first task of every maintenance release
This means it is true that OSE lags behind on PE-related features, but this is symmetric: features developed in OSE also take time to propagate into the Premium Edition.
One noteworthy thing is that there are several people inside BalaBit who contribute their free time to syslog-ng development; you can find their git trees on git.balabit.hu.
And the features in the OSE edition are not worthless: they are really important for the future of syslog-ng. Some examples:
- db-parser to parse log messages (in 3.0)
- change from syslog-only implementation to a syslog-independent one (nvtable in 3.1)
- the new plugin architecture (in 3.2)
- syslog-ng configuration library (in 3.2)
- first support for non-syslog messages (3.2)
- and so on.
Community involvement in development decisions
Well, this is a tough question. syslog-ng may not attract that many contributors, or I myself suck at this, but it has nothing to do with the current PE/OSE division. Even when I asked directly, I received only a little feedback, and this was no different in the first 8 years (1998-2006) of syslog-ng, when there was no commercial fork. In fact, the community seems to be more active these days.
But anyway, since I myself develop the OSE version and it is out in the open (as the git history and my blog posts prove), the community would have a chance, at least in the features that BalaBit itself has chosen to implement. Also, small feature requests on the mailing list are quite rapidly implemented (the last thing I remember from the last couple of weeks is the “condition()” option to rewrite expressions).
With the current architecture/license change, I plan to go back to contributors who didn’t want to sign the CLA and ask whether they’d be interested in contributing their code as a plugin, or perhaps directly into syslog-ng.
Open Core deterioration
I myself worked very hard for this not to happen, and if you look at the mailing list archive, a lot of that work was actually done in my free time.
Since the project/release overhead of a fix is smaller in Open Source development, it quite often happened that a fix was first made available as a git commit in the OSE branch. It was then integrated, tested and released in the Premium Edition as well, with a slight delay.
Forcing users to use the Premium Edition
I have always said, and I believe, that the Open Source Edition is a full-featured syslog implementation. I’d say nothing proves my word better than the fact that it has been proposed as, and has become, the default syslog implementation of various distributions. The existence of the Premium branch even led to further features in the OSE, which may or may not have happened without it.
It is perfectly acceptable for us, that someone will stick to the Open Source edition. We work hard to update syslog-ng packages in distributions to its latest version, even though that competes with our commercial offering.
So I wouldn’t say we force them, neither on the feature side nor on the quality side of things. The Premium Edition offers the following added value compared to the Open Source Edition:
- support for a wide variety of Operating System and CPU combinations (about 27 right now, but about 10 new combinations are in the pipeline for 4.0)
- longer support cycles
- support services
- maintenance & bugfixes
- … and additional features.
Right now, our feeling is that the additional features are a selling point and without them, we’d be earning less money. With less money, we could allocate less time for syslog-ng. And I’m sure that’d affect syslog-ng, both the Open Source and Premium Editions, negatively.
Competing features
One more complaint against the “Open Core” business model is that there’s a conflict of interest for the maintainer of the Open Source portion whenever the community contributes a feature that is present in the commercial edition.
The way I see it, and with the advent of the new plugin-based architecture, this is not necessarily a problem, since a plugin doesn’t have to be distributed with the main distribution tarball. In fact I expect several plugins that wouldn’t be added to the distribution. For example gyp’s Twitter destination is a nice hack, but I probably wouldn’t include it. Also, a Linux distribution is in a position where a plugin/functionality can easily reach many more users than the syslog-ng packages that we ourselves publish.
But anyway, once a feature is in the scope of syslog-ng (and being a feature in the commercial edition certainly proves that point) and a working implementation is contributed on the syslog-ng mailing list, then I hereby state that we are willing to include it in the main syslog-ng distribution. Of course technical and quality issues still apply, and this holds only if there’s enough support from the community (e.g. on the syslog-ng mailing list) and if at least the plugin names are different from those in the commercial edition (this is purely a technical requirement to keep maintenance feasible and user confusion low).
I feel that this topic is one of the strongest arguments against the “freeness” of syslog-ng, but with the above statement we are at the same level as any other “true open source” project: you need to take our word for it. I hope that with this post and with our last 12 years of open source work, we have earned this trust. And if we don’t keep this promise, we can be blamed.
But until that happens, please don’t judge syslog-ng’s freeness.
Conclusion
I agree that trying to balance between the Open Source and Premium Editions of syslog-ng is a difficult task, one that requires a lot of work too. We may have made some mistakes, but our intention is clear: OSE is an independent entity, standing on its own feet.
I hope this blog post clears up some of the confusion around this.
PS: Thanks for reading this far.
LWN: syslog-ng rotten to the (Open) Core?
This was first posted as a comment under an article on lwn.net, but I thought it was important enough to post it here for others not reading lwn. Please go ahead and read the original article which is about the “Open Core” business model and its problems from the Free Software community point of view.
A commenter thought that syslog-ng was an example of a project that only exists as a marketing tool for the company’s commercial offering. Anyway, here’s my post:
First of all, I want to make it clear that I’m biased on the syslog-ng case, but still wanted to express my opinion here. I’m biased as I’m the primary author of syslog-ng.
I think syslog-ng is a completely different case from the one described by Neary. The GPL version is not crippleware, it was never published for marketing purposes only and for the majority of syslog-ng’s existence only the Open Source stuff existed. The Premium Edition is only about 3 years old and syslog-ng started in 1998.
We never removed features from the OSE version, the Premium Edition only included _additional_ features, and a lot of those are already available in the OSE.
Some examples:
* TLS support (became available in 3.0, almost 2 years ago)
* SQL destination (became available in 2.1, 2.5 years ago)
* performance improvements (3.0)
* etc.
In the other direction, we usually receive bugfixes, and it is for a purely technical reason that we used to require copyright assignment: I wanted to keep the two branches as close as possible (failing to do so is reason #1 why Open Core products become crippleware fast). _And_ since we have invested heavily in automatic testing and our customers report bugs directly to us, we fix way more bugs in the OSE version than the community does.
But anyway, I didn’t think that the dual license model was so problematic at the time we made this decision 3 years ago. Our efforts have never been “Rotten to the Open Core”. If you don’t believe that, check out the git repository or read the mailing list archive and see it yourself.
And this whole mess is the past, OSE 3.2 has been relicensed, and it is true that we’re going to publish non-free plugins, but anyone else is welcome to join and do the same.
syslog-ng 3.2alpha2 released
I’ve just uploaded syslog-ng 3.2alpha2 to the release directory. The last alpha release didn’t compile on all supported platforms and the automatic test-suite was disabled, because it only worked if syslog-ng got installed first.
These obstacles have been overcome and together with some fixes and a couple of new features, 3.2alpha2 is now available. I’ve also forward ported all bugfixes from syslog-ng 3.1.2.
For those who are starting to experiment with the 3.2 branch, here’s the list of new features compared to 3.1. For those who tried 3.2alpha1, the list of changes compared to 3.2alpha1 is at the end of this post.
Since the documentation of syslog-ng is not yet up-to-date with the new features introduced, I’ve tried to also include URLs for the best known descriptions. The references may not be 100% accurate, but should give anyone interested an idea how to start experimenting.
Also, please note that although this is an alpha release, the bulk of the changes are in the configuration parser, so once your configuration has been parsed properly and syslog-ng starts up, almost unchanged code processes it. This means that this release should be good enough to start playing with. And feedback about what kind of syslog-ng.conf parsing errors you encounter on real-life configuration files is more than welcome.
Code quality and functionality wise, this could be a beta release. I only expect “procedural” changes, like cleaning up the plugin names, which wouldn’t be nice to do in a beta release (though not unheard of).
New features in 3.2:
- Plugins: the new architecture replaces the old monolithic one, all syslog-ng functionality is loaded from external plugins when needed. It is possible to write plugins to extend syslog-ng functionality in the following areas: sources, destinations, filter expression, parsers, rewrite ops, message format.
http://bazsi.blogs.balabit.com/2010/07/syslog-ng-contributions-redefined.html
- The framework for a “syslog-ng configuration library” (aka SCL): a collection of configuration snippets installed along with syslog-ng, simplifying the authoring of syslog-ng configuration files.
http://bazsi.blogs.balabit.com/2010/07/syslog-ng-contributions-redefined.html
- pdbtool match is now able to read a file containing syslog messages and apply patterndb and a filter expression on the contents.
- pdbtool test is now able to perform pattern testing automatically based on the supplied example log message.
- Persistent state containing the current file position for file sources is now continuously updated during runtime, instead of being updated only at exit, which makes it much more reliable in case syslog-ng doesn’t terminate normally.
- Better syntax error reporting in the configuration file.
- Support for reusable configuration snippets, similar to macros with parameters, named “blocks”.
- Added a confgen plugin that includes the output of a program into the configuration file, making it possible to generate configuration file snippets dynamically.
- Support for BSD-style process accounting logs via the pacct() source driver defined by SCL and the underlying pacctformat plugin.
- Support for explicit COMMITs in the SQL driver, which speeds up the SQL INSERT rate significantly if flush_lines() is non-zero.
- It is now possible to supply a filter to rewrite expressions and only apply the rewrite rule in case the filter matches.
- It is now possible to use multiple parser expressions in a single parser object, similar to rewrite rules.
- Added support for using the include statement from anywhere in the configuration file, instead of only at top-level. Also introduced syslog-ng “global values” that can be defined and then substituted anywhere in the configuration file.
- Default configuration file supplied as part of SCL.
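To give a feel for why explicit COMMITs help the SQL INSERT rate, here is a minimal sketch in Python using the standard sqlite3 module. It is not syslog-ng code: the insert_logs() helper, the logs table and the way flush_lines is interpreted (one COMMIT per flush_lines rows, and one COMMIT per row when it is zero) are assumptions made purely for illustration.

```python
import sqlite3


def insert_logs(conn, messages, flush_lines):
    """Insert messages, issuing one COMMIT per `flush_lines` rows.

    Hypothetical helper: `flush_lines` only mimics the idea of the
    flush_lines() option, it is not the real syslog-ng setting.
    Returns the number of COMMITs issued.
    """
    batch_size = max(flush_lines, 1)  # assumption: 0 means commit every row
    commits = 0
    pending = 0
    for msg in messages:
        conn.execute("INSERT INTO logs (message) VALUES (?)", (msg,))
        pending += 1
        if pending >= batch_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Flush the last, partially filled batch.
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (message TEXT)")
conn.commit()

messages = [f"message {i}" for i in range(1000)]
commits = insert_logs(conn, messages, flush_lines=100)
rows = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
print(rows, "rows inserted with", commits, "commits")
```

With a batch size of 100 the same thousand rows need far fewer transactions than committing each row separately, which is where the speedup would come from on a real database server.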
Incompatible changes:
- syslog-ng traditionally expected an optional hostname field even when a syslog message is received on a local transport (e.g. /dev/log). However no UNIX version is known to include this field. This caused problems when the application creating the log message has a space in its program name field. This behaviour has been changed for the unix-stream/unix-dgram/pipe drivers if the config version is 3.2 and can be restored by using an explicit ‘expect-hostname’ flag for the specific source.
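The ambiguity described above can be sketched in a few lines of Python. This is a deliberately simplified heuristic, not the actual syslog-ng parser: the parse_local_header() function is hypothetical, the timestamp is left out, and the only thing it shares with the real option is the name expect_hostname.

```python
def parse_local_header(header, expect_hostname):
    """Split the text before ': ' into (host, program).

    Simplified for illustration: when a hostname is expected and the
    header contains a space, the first word is taken as the hostname.
    """
    if expect_hostname and " " in header:
        host, program = header.split(" ", 1)
        return host, program
    return None, header


# A local message whose program name contains a space.
line = "my app[123]: service started"
header, message = line.split(": ", 1)

# Hostname expected: "my" is mistaken for the host.
print(parse_local_header(header, expect_hostname=True))
# No hostname expected: the whole header is the program name.
print(parse_local_header(header, expect_hostname=False))
```

Under these assumptions the first call splits the program name in two, which is the kind of problem the new default avoids on local transports.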
Changes since 3.2alpha1:
- Now compiles on all platforms and the unit/functional tests also run. (tested: AIX, HP-UX, Solaris, FreeBSD, Linux, Tru64)
- Fixed pdbtool match --debug-pattern output for ESTRING parsers.
- Fixed a possible memory leak in the lexer, which would accumulate in case of repeated SIGHUPs.
- Fixed Solaris STREAMS device support.
- Forward ported all bugfixes from syslog-ng OSE 3.0 & 3.1
- Disable process accounting module by default as it doesn’t compile on non-Linux platforms.
- Added “pdbtool match --file” option to read and parse an existing logfile.
- Added “pdbtool test” to check the log samples in the patterndb file.
- Added “dont-create-tables” flag for the SQL destination to inhibit automatic table creation.
- Added “condition()” support for rewrite expressions, which makes it possible to skip rewrite rules that do not match a filter expression.
- Added “--module-path” command line option to control where modules are loaded from.
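The idea behind “condition()” on rewrite expressions, as listed above, is roughly the following. This is a sketch in Python, not syslog-ng configuration syntax: rewrite_if(), is_sshd() and the dictionary-based message are all hypothetical stand-ins for a rewrite rule, a filter expression and a log message.

```python
import re


def rewrite_if(message, condition, field, pattern, replacement):
    """Return a copy of `message` with `field` rewritten, but only when
    `condition(message)` is true; otherwise return the message unchanged."""
    if not condition(message):
        return message  # filter did not match, rule is skipped
    updated = dict(message)
    updated[field] = re.sub(pattern, replacement, updated[field])
    return updated


def is_sshd(message):
    # Hypothetical filter: match messages from the "sshd" program.
    return message.get("PROGRAM") == "sshd"


sshd_msg = {"PROGRAM": "sshd", "MSG": "Failed password for root"}
cron_msg = {"PROGRAM": "cron", "MSG": "Failed password for root"}

print(rewrite_if(sshd_msg, is_sshd, "MSG", r"root", "<user>"))
print(rewrite_if(cron_msg, is_sshd, "MSG", r"root", "<user>"))
```

Only the message that passes the filter is rewritten; the other one goes through untouched.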
Happy logging!
syslog-ng name-value pair naming
I have been giving a lot of thought recently to the topic of naming name-value pairs in syslog-ng. Until now the only documented rule states, somewhat vaguely, that whenever you use a parser you should choose a name that has at least one dot in it, and this dot must not be the initial character. This means that names like MSG or .SDATA.meta.sequenceId are reserved for syslog-ng, and APACHE.CLIENT_IP is reserved for users.
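Expressed as code, the rule above is a two-part check. The following Python sketch only restates the documented rule; the is_user_name() helper is hypothetical and nothing like it is shipped with syslog-ng.

```python
def is_user_name(name):
    """Return True if `name` belongs to the user namespace.

    Rule as documented: a user-chosen name contains at least one dot,
    and the name does not start with a dot.
    """
    return "." in name and not name.startswith(".")


for name in ("MSG", ".SDATA.meta.sequenceId", "APACHE.CLIENT_IP"):
    owner = "user" if is_user_name(name) else "syslog-ng"
    print(f"{name}: reserved for {owner}")
```

Names without any dot and names with a leading dot both fall on the syslog-ng side, which is why the question of a leading dot matters for anything syslog-ng generates itself.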
However things became more complex with syslog-ng OSE 3.2. Let’s see what sources generate name-value pairs:
- traditional macros (e.g. $DATE); these are not name-value pairs per-se, but behave much like them, except that they are read-only
- syslog message fields (e.g. $MSG) if the message is coming from a syslog source
- filters whenever the ‘store-matches’ flag is set and the regexp contains groups
- rewrite rules, whenever the rewrite rule specifies a thus far unknown name-value pair, e.g. set("something" value("name-value.pair"));
- and of course parsers when you tell syslog-ng to parse an input as a CSV, or use db-parser together with the patterns produced by the patterndb project
The latest stuff generating name-value pairs is the support for process accounting logs; in this case even the syslog related fields are missing and only things like “pacct.ac_comm” (to contain the program name) are defined.
So I was thinking about whether it should be “pacct.ac_comm” or “.pacct.ac_comm”. With the quoted rule it should be simple: it is generated by syslog-ng itself, thus it should be in the syslog-ng namespace and should start with a dot. However, in the era of syslog-ng plugins, what does syslog-ng consist of at all?
First, I wanted to use “pacct.ac_comm” (i.e. without a leading dot), because I liked this name better. I tried to explain to myself why it would not violate the rule above. The explanation I came up with was: I’m going to “register” names such as this in the patterndb SCHEMAS.txt file. With this (not yet published) explanation, I committed a patch to convert the pacctformat plugin to use a dotless prefix.
Next, I figured that while process accounting does create name-value pairs without going through patternization, nothing ensures that these name-value pairs would be directly usable when trying to analyse the logs. The patterndb concept uses tags and schemas to convert the incoming unstructured data into a consistent structure. However, pacct may not completely match what the user needs. And in the future, when SNMP traps or SQL table polling are supported, this is going to be even more true: these name-value pairs may need a conversion from the SNMP/pacct structure to the structure described by the patterndb schema in order to handle these message sources consistently with regular syslog (and to make it easy to correlate them).
So in the end, I committed another patch, this time going back to “.pacct” as a prefix and leaving the original naming rule intact. The “pacct” prefix is left for users: they may want the same information in a “pacct” schema, but that may come from data not directly tied to process accounting (e.g. from syslog messages).
So this post is about doing nothing with regards to the naming policy, but I thought it’d be important to shed some light on what happens behind the scenes. Giving such decisions enough thought and coming up with a long-term plan makes our lives much easier in the future.
This post may be a bit more involved than the others, but feel free to ask me to elaborate, if you are interested.
syslog-ng & distributions
syslog-ng 1.6.x and 2.0.x versions lived quite long. A lot of distributions used these versions and never upgraded to the newer ones.
This has changed recently: Peter Czanik has been busy helping maintainers get to the latest versions.
Already available in the latest release:
- openSUSE
- FreeBSD ports
- Mandriva
- Gentoo portage
- OpenBSD ports
In development branches:
- Debian
- Ubuntu
- Fedora
These all carry 3.1.1, which is quite recent (and a successful release too). There are some fixes accumulated in the git tree though, so I hope to get 3.1.2 out of the door soon.
