I am currently thinking about creating a very basic shipper for log files, but I wonder whether it really makes sense. I am especially concerned about whether good tools already exist. Being lazy, I thought I would ask for some wisdom from those in the know before investing more time in searching for solutions and weighing their quality.
I have read more than once that logstash is far too heavy for a simple shipper, and I have also heard that rsyslog is sometimes a bit heavy (albeit much lighter) for the purpose. I think with reasonable effort we could create a tool that
Rsyslog provides many to-the-point error messages for config file and operational problems. These help immensely when troubleshooting issues. Unfortunately, many users never see them. The prime reason is that most distros never log syslog.* messages, so they are simply thrown away and remain invisible to the user. While we have been trying to make distros change their defaults, this has not been very successful.
Thanks to the new, improved CI workflow, we no longer need to do a manual final check of pull requests. I have used the new system for roughly two weeks now without any problems. Consequently, I have just removed the master-candidate branch from our git (with a backup "just in case" currently remaining in the adiscon git repository).
Roughly one and a half years ago we at the rsyslog project started to get serious with CI, at that time with travis only. Kudos to Thomas D. "whissi" for suggesting this and helping us to set up the initial system. In aid of CI, we have changed to a purely Pull Request (PR) driven development model, and have had great success with that.
We have been using json-c for quite a while now and have had good success with it. However, recent problem reports and analysis indicate that we need to replace it in the future. Don't get me wrong: json-c is a solid piece of software, but we most probably use it much more intensely than the json-c developers ever anticipated. That's probably the actual root cause why we need to switch.
The initial version of liblognorm v2 is almost ready. It offers many new features, like custom data types, a much easier rule description language, and potentially even greater performance (we have not yet verified this). As some of you know, I have worked very hard on liblognorm during the past weeks. I have now reached a very important milestone and will switch the git master branch to use the new version. If things go smoothly enough, the initial release of liblognorm v2 will go along with the next rsyslog release. Daily builds will have it very soon.
The liblognorm "rest" parser was introduced some time ago to handle cases where someone just wants to parse part of a message and keep all the "rest of it" in another field. I never was a big fan of this type of parser, but I accepted it because so many people asked. Practice, however, showed that my concerns were right: the "rest" parser has a very broad match, and those who used it often got very surprising results.
There is one big problem in research for better logging methods: no good logging sample repositories exist. Well, not even bad ones... I am currently doing some preliminary steps towards a new, better log normalization system. Among others, it will contain a structure analyzer which will remove much of the manual burden of creating normalization rules. But, guess what: while the project looks very promising, the lack of log samples is a really big problem! To solve that problem, I have set up a public log ingestor that you can simply send logs to.
I am currently working on log normalization as well as improvements for rsyslog's imfile. Among the things that regularly come up on the rsyslog mailing list is support for multi-line logs and Java stack traces in general.
I would like to see what I can do to improve processing of these. To do so, I need a set of samples of such logs. I am therefore looking for people who would like to contribute log records for my research.
There was a lengthy mailing list discussion in November and December of 2014 about whether or not to avoid git merge entries. There was also an intermingled discussion on QA and CI. The idea was to trim the git history and make sure tests are run as quickly as possible. As a result of that discussion, I added more automated testbench runs, which also required a new branch, master-candidate, which is used as a staging area to run the tests, and from which changes are (manually) migrated to master when all testbench runs are OK.
If you use rsyslog's devel packages on your system, you will receive errors soon. Be sure to read the complete posting to avoid trouble!
As part of rsyslog's new release schedule and version naming, devel releases will no longer be named according to the "normal" numbering scheme. This also means that the previous "devel" branches will disappear, as the git master branch is now the always-current devel version.
With today's release of rsyslog 8.6.0, we start a new release schedule and versioning scheme. In a nutshell, we will be doing stable releases every six weeks now, and devel releases will be distributed via git exclusively.
For historical reasons, rsyslog offers a number of command line options which are actually configuration settings. These date back to the days of the original syslogd, where the conf file was just a routing table and "all" other configuration was done via the command line. Some of them (e.g. -r to enable listening to the standard UDP port) were already removed quite a while ago. Now, we are very serious about removing the rest of them.
Historically, the rsyslog source tree contains a lot of seldom-used and exotic modules. Some of them don't even work at the moment. I kept them inside the tree so that they could serve as a sample for folks trying similar things. However, there has been discussion on the rsyslog mailing list that all of this clutters up rsyslog and makes it a bit hard to understand which modules are well maintained, which are not, and which actually do not work or just serve an exotic border case.
A new rsyslog v8-stable is coming up soon. It will not just be the next iteration of 8.2; instead, it will be a new feature release based on the current 8.3 devel. So be prepared to welcome 8.4. Frequent followers may wonder why 8.4 is ready. Originally, we planned to release it after the summer break. The reason is simple: it's ready to come up, albeit with a little less functionality than originally anticipated.
Wouldn't it be great if we had an interactive tool that permitted novices to build complex rsyslog configurations interactively? Without any need to understand the inner workings or even the terminology? Indeed, that would not only be great, but in our opinion it would also remove a lot of the pressure that we have on rsyslog's documentation.
I am happy to report that I have finally finished the 8.2.0 rsyslog release and it is on its way to announcement, package build and so on. While v8 had basically been finished since before last Christmas, we had a couple of issues, mostly nits, holding up the release. This is probably a lesson that we need to accept some nits instead of holding a release for so long.
I have worked hard on liblogging-stdlog, which aims at becoming the new enhanced syslog() API call. The library is thread- and signal-safe and offers support for multiple log drivers, just like log4j does.
I have written a small presentation on what has changed in the rsyslog v8 engine. It takes a developer's perspective, but is most probably also of interest for administrators who would like to understand why the v8 engine scales out much better for slow outputs like ElasticSearch or databases.
For developers, it also contains the basic know-how needed to successfully (and without pain!) upgrade a pre-v8 output plugin to v8.
Rsyslog is an enterprise-class project. Among others, this means we need to provide different versions for quite a while (enterprises don't like to update every 6 months, and don't do so for good reasons).
Currently, there is a very valuable discussion going on on the rsyslog mailing list about how we can attract more contributors and how moving things to github can help with this. I was writing a longer reply, and then it occurred to me that it probably is better to blog about this topic, as it may be of future interest to have the current thinking (relatively) easily accessible.
Liblognorm is a fast, samples-based normalization library. Its brand-new version 1.0.0 will be released today. It is a major improvement over previous versions, but unfortunately we needed to change the API. So some notes are due.
Liblognorm has been evolving for several years and was initially meant to be used primarily with the Mitre CEE effort. Consequently, the initial version of liblognorm (0.x) uses the libee CEE support library in its API.
My co-worker Andre had a little time and extended the rsyslog impstats analyzer to support generating graphs. IMHO this gives you fantastic insight into how the system operates. While I know that some folks already push this data to their internal health monitoring system, the beauty of the online rsyslog impstats analyzer is that you do not need to install anything -- a log file with stats is all you need to get you going. Let's look at a quick sample. This is a page returned by the analyzer's check phase:
Based on recent discussions on the rsyslog mailing list, we have started building an online tool to "analyze" impstats logs. So far, it detects only "obvious" problems, but it probably is a good starting point for beginners. We plan to extend this tool.
This is just a blog post on where to find samples of converting modules to the v8 output module interface. Additional information will be upcoming within the next days. Stay tuned. Please bear in mind that the v8 output module interface is not stable at this time. It will very likely change within the next weeks. Right now, the rsyslog v8 compatibility doc has some information on the new interface.
Do you remember last October? The rsyslog mailing list got very busy with things like output load balancing, global variables, and a lot of technical details of the rsyslog engine. I don't intend to reproduce all of this here. The interested reader may simply review the rsyslog mailing list archives, starting at October 2013. Most of these discussions focused on the evolution of the rsyslog engine.
WARNING - WORK IN PROGRESS. This is probably inconsistent, not thought out, maybe even wrong. Use this information with care. I wanted to share the complexity of executing a rsyslog ruleset in non-SIMD mode, which means messages are processed one after another. This posting is kind of a think tank and may also later be useful for others to understand the design and design decisions. Note, however, that I do not guarantee that anything will be implemented as described here. As you probably know, rsyslog supports transactions and, among other things, does this to gain speed from it.
As was discussed at great length on the rsyslog mailing list in October 2013, global variables as implemented in 7.5.4 and 7.5.5 were not a valid solution and have been removed from rsyslog. At least in this post I do not want to summarize all of this, so for the details please read the mailing list archive. Bottom line: we need some new facility to handle global state, and this will be done via state variables. This posting contains a proposal which is meant as a basis for discussion.
Rsyslog is heavily threaded to fully utilize modern multi-core processors. However, the imudp module has so far worked on a single thread. We always considered this appropriate and no problem, because the module basically pulls data off the OS receive buffers and injects them into rsyslog's internal queues. However, some folks expressed the desire to have multiple receiver threads, and there were also some reports that imudp ran close to 100% CPU in some installations. So starting with 7.5.5, imudp itself supports multiple receiver threads.
As regular readers of my blog know, we are moving towards preferring enterprise needs over low-end system needs in rsyslog. This is part of the changes in the logging world induced by systemd journal (the full story can be found here). Many of the main queue and ruleset queue default parameters were a compromise, and much more in favor of low-end systems than enterprises. Most importantly, the queue sizes were very small, chosen in an attempt to save virtual memory space.
As most of you know, rsyslog permits pulling multiple lines from a text file and combining these into a single message. This is done with the imfile module. Up until version 7.5.3, this led to a message which always had the LF characters embedded. That usually posed little problem when the same rsyslog instance wrote the message immediately to another file or database, but caused trouble with a number of other actions.
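To make the embedded-LF issue concrete, here is a minimal Python sketch. It is not rsyslog code: it assumes, purely for illustration, that a line starting with whitespace continues the previous message, and the helper names `combine_multiline` and `escape_lf` as well as the `\n` escape text and the sample log lines are hypothetical choices.

```python
def combine_multiline(lines):
    """Group physical lines into logical messages.

    Assumption for this sketch: a line starting with whitespace
    continues the previous message (as in a Java stack trace).
    """
    messages = []
    for line in lines:
        line = line.rstrip("\n")
        if messages and line[:1].isspace():
            # Continuation line: join it with an embedded LF.
            messages[-1] += "\n" + line
        else:
            messages.append(line)
    return messages


def escape_lf(message, replacement="\\n"):
    """Replace embedded LF characters so the message fits on one line."""
    return message.replace("\n", replacement)


# Made-up sample input: one multi-line record followed by a single-line one.
sample = [
    "ERROR something failed\n",
    "    at com.example.Foo.bar(Foo.java:42)\n",
    "    at com.example.Main.main(Main.java:7)\n",
    "INFO recovered\n",
]
combined = combine_multiline(sample)
escaped = [escape_lf(m) for m in combined]
```

The combined message still carries raw LF characters, which is harmless when it is written straight to a file but can break line-oriented consumers; the escaped form keeps each message on a single line.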
From time to time, someone asks why rsyslog disk-assisted queues keep one file open until shutdown. So it probably is time to elaborate a bit on it. Let's start with explaining what can be seen: if a disk-assisted queue is configured, rsyslog will normally use the in-memory queue. It will open a disk queue only if there is good reason to do so (because it severely hurts performance). The prime reason to go to disk is when the in-memory queue's configured size has been exhausted. Once this happens, rsyslog begins to create spool files, numbered consecutively.
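The spill-over behavior described here can be pictured with a toy model in Python. This is only a sketch under stated assumptions and not how rsyslog implements its queues: the class name `DiskAssistedQueue`, the `spool.NNNNNNNN` file naming, and the `mem_limit` and `msgs_per_file` parameters are all made up for illustration.

```python
import os
import tempfile
from collections import deque


class DiskAssistedQueue:
    """Toy model: an in-memory queue that spills to numbered spool files."""

    def __init__(self, spool_dir, mem_limit=3, msgs_per_file=2):
        self.spool_dir = spool_dir
        self.mem_limit = mem_limit
        self.msgs_per_file = msgs_per_file
        self.memory = deque()
        self.spool_files = []   # paths of spool files, oldest first
        self._current = None    # handle of the spool file being written
        self._in_current = 0
        self._next_number = 1

    def enqueue(self, msg):
        # Stay in memory while there is room and nothing is spooled yet.
        if not self.spool_files and len(self.memory) < self.mem_limit:
            self.memory.append(msg)
            return "memory"
        # Memory exhausted: write to the current spool file, rolling over
        # to a new, consecutively numbered file when it is full.
        if self._current is None or self._in_current >= self.msgs_per_file:
            self._roll_over()
        self._current.write(msg + "\n")
        self._current.flush()
        self._in_current += 1
        return "disk"

    def _roll_over(self):
        if self._current is not None:
            self._current.close()
        path = os.path.join(self.spool_dir, f"spool.{self._next_number:08d}")
        self._next_number += 1
        self._current = open(path, "w", encoding="utf-8")
        self.spool_files.append(path)
        self._in_current = 0

    def close(self):
        # The file currently being written stays open until shutdown.
        if self._current is not None:
            self._current.close()
            self._current = None


with tempfile.TemporaryDirectory() as tmp:
    q = DiskAssistedQueue(tmp, mem_limit=3, msgs_per_file=2)
    placements = [q.enqueue(f"msg {i}") for i in range(8)]
    spool_names = [os.path.basename(p) for p in q.spool_files]
    q.close()
```

In this model the first three messages stay in memory and the remaining five go to three consecutively numbered spool files, with the newest file held open until `close()` is called.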
Thanks to the recent IBM contribution of a partial rsyslog 5.8.6 port to AIX, we have come across that platform again. Unfortunately, some license issues prevent me from merging the IBM contribution into the current rsyslog release (7.4+). I tried to work with IBM to resolve these issues, but it just occurred to me that actually doing the port ourselves is probably easier than wrangling with license issues.
The ASL 2.0 topic boiled up again due to a much-appreciated IBM contribution to make rsyslog 5.8.6 work on AIX. Unfortunately, this contribution was done under GPLv3+. I tried to work with IBM to have it released under ASL 2.0, but their legal department is of the opinion that this is not possible. This resulted in some restrictions, which can be found in the git branches' README file.
I just wanted to let everyone know that I will be joining the Guardtime technical advisory board. The board's prime mission is to make sure that Guardtime implements technology that users need. While implementing the rsyslog log signature interface and its first signature provider, I worked very closely and productively with the Guardtime folks.
When Fedora updated to rsyslog 7.4.0 in Fedora 19, they changed the default way they obtain local log messages. In all previous releases, imuxsock, the traditional Unix/Linux input, was used. That also works in conjunction with journal, at least as far as traditional syslog is involved. This is because journal provides these messages via the log socket it passes to rsyslog. Unfortunately, journal hides information if structured content is available.
Rsyslog has both "main" message queues and action queues. [Actually, "main message queues" are queues created for a ruleset; "main message" is an old-time term that was preserved even though it is no longer accurate.] By default, both queues are set to one worker maximum. The reason is that this is sufficient for many systems and it cannot lead to message reordering. If multiple workers are concurrently active, messages will obviously be reordered, as the order now, among others, depends on thread scheduling order. So for now let's assume that you want to utilize a multi-core machine.
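The reordering effect can be illustrated with a small deterministic simulation in Python. This is a sketch only, not rsyslog's scheduler: the function name `completion_order` and the per-message processing costs are hypothetical, and real reordering depends on thread scheduling rather than on fixed costs.

```python
import heapq


def completion_order(costs, workers):
    """Simulate `workers` workers pulling messages from a FIFO queue.

    costs[i] is the (hypothetical) processing time of message i.
    Returns the message indices in the order they finish.
    """
    # Each heap entry is (time the worker becomes free, worker id).
    free_at = [(0, w) for w in range(workers)]
    heapq.heapify(free_at)
    finished = []
    for msg, cost in enumerate(costs):
        # The next message goes to whichever worker is free first.
        start, worker = heapq.heappop(free_at)
        done = start + cost
        finished.append((done, msg))
        heapq.heappush(free_at, (done, worker))
    # Sort by finish time; ties keep the dequeue order.
    return [msg for _, msg in sorted(finished)]


costs = [5, 1, 1, 1]  # the first message is slow to process
one_worker = completion_order(costs, workers=1)
two_workers = completion_order(costs, workers=2)
```

With one worker the messages finish in their original order; with two workers the slow first message is overtaken by the three fast ones that the second worker handles in the meantime.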
I thought I would share some news on what I have been busy with and intend to be in the future. In the past days, I have added more config options to librelp, which now supports GnuTLS compression methods and provides the ability to set the Diffie-Hellman key strength (number of bits) and, for experts, to set the GnuTLS priorities, which select the cipher methods and other important aspects of TLS handling. This is done now, and I also added rsyslog facilities to use these new features.
With the interest in privacy concerns currently having a "PRISM-induced high", I wanted to elaborate a little bit on what rsyslog's Guardtime signature provider actually transmits to the signature authority. This is a condensed post of what the provider does, highlighting the main points. If you are really concerned, remember that everything is open source.
Some folks asked what a (rsyslog) signature provider is. In essence, it is just a piece of software written to a specific interface. There are few functional requirements for signature providers. Most obviously, we expect that it will provide some useful way to sign logs, which means it should be able to provide an integrity proof of either a complete log file or even some log records some time after the logs are recorded. There is no "official signature provider spec" available. As usual in open source, the provider interface is well defined inside its actual source header file.
After the successful and important release of 7.4 stable, we are working hard on further improving rsyslog. Today, we release rsyslog 7.5.0, which opens up the new development branch. With that branch, we'll further focus on security. Right with the first release, we provide a much-demanded feature: TLS support for the Reliable Event Logging Protocol (RELP).
A topic that comes up on the rsyslog mailing list or support forum very often is that folks do not know exactly which values are contained in which fields (or properties, as they are called in rsyslog, e.g. TAG, MSG and so on). So I thought I would write a quick blog post on how to find that out. I admit, I do this mostly to save me some time typing and to have it at hand for reference in the future. This is such a common case that rsyslog contains a template which will output all fields.
While almost everyone (including me) is talking about the PRISM system, it may be worth stepping back a little and looking at one of the underlying problems. Looking closely, PRISM drew its data from just a couple of large sources, including Google and Facebook. But, again, it's very few sources.