Triggering Diagnostic Logging on Exception

I want to give a shout out to @mipsytipsy for all the nice words on Twitter. There's been some great discussions and it's lovely to see.

TL;DR

When an error appears to be completely random and infrequent, it can be very hard to track down the cause. Storing diagnostic information in a ring buffer can provide relevant information efficiently, and can be triggered to write to logs only on exception, or on a user provided signal.

The Problem

Targeted diagnostic logging is great when you know what you're targeting, and when you have attributes that you can map. However some situations, the bug may appear completely random, and it may only show up in production. In this situation, the only way to turn on diagnostic logging is to turn it on across the board and hope that the exception will show up.

When the exception occurs and is logged, you have three pieces of information. You have the exception itself. You have the stack trace associated with the exception. And you have the context: any markers, mdc properties, and arguments that you passed into the logging statement. None of that tells you what the code was doing before it got to the exception.

Ring Buffer Logging

If you could go back in time and turn on diagnostic logging for that flow immediately before the exception, that would be the ideal scenario. Then you could have a complete idea of what the code was doing without impacting the rest of the system.

You could probably do this with a circular buffers where you only kept the last 100 debug and trace logging statements, and flushed them to an appender if you got a logger.error statement. And then I thought… surely I can't be the first person to think this. And it turns out I wasn't.

Another non-stop monitoring feature of general utility is the use of circular history buffers for fast intermediate data collection. The use of circular history buffers is not a new concept. They are invented seemingly independently by almost every group of programmers with extensive experience debugging real-time programs. – Stewart, D. A., & Gentleman, W. M. (n.d.). Non-stop monitoring and debugging on shared-memory multiprocessors.

It turns out ring buffer logging in Java has been known for at least twenty years. Brian Marick has wrote about it in 2000 at the PLoP conference, Using Ring Buffer Logging to Help Find Bugs. This is a pure gem of an article, with questions like "How do you make it likely that a user will be able to provide useful information after a failure, without inconveniencing her if a failure doesn’t happen?" and "How do you know what information to log?" tied to patterns.

I looked for other libraries that had implemented this. The python library logbook likewise implements a FingersCrossedHandler, as does foolscap, an RPC/RMI protocol in Python. Ring buffers seem to be well known in programs where logging space is at a premium, as described in Logging in multi-threaded applications efficiently with ring buffer.

The oldest reference I could find to ring buffer logging was Hugh Blair-Smith talking about the Apollo computer in :

Another function served by these auxiliary erasable memories was called the "Coroner," which recorded every instruction executed, with its inputs and results, writing over the oldest such record when it filled up. When a program "crashed," you could punch out a full record of what it was doing in most of its last second and analyze the problem at your ease. I have often wished that PCs offered such an advanced feature. – Eldon Hall's Journey to the Moon, Chapter 13, page 155

So let's do that.

Implementation with TurboFilters

The Logback implementation turns out to be pretty straightforward with turbofilters, since this is where all logging events come in. A logging statement will come in, and if it is recordable then it is turned into a LoggingEvent and added to the ring buffer. On trigger, that ring buffer is flushed, and the events are sent to the appenders configured on the triggering logger.

logger.debug( "debug stuff");
logger.info( "info stuff");
logger.error( "error stuff");

yields the following (note that debug happens after info):

[INFO ] c.example.Test - info stuff
[DEBUG] c.example.Test - debug stuff
[ERROR] c.example.Test - error stuff

This is simple enough, but we can do more complex logic by moving the logic out of Logback, and putting it in the hands of the application. By moving the ring buffer into a marker factory, we can set up several ring buffers and have diagnostic information dumped to the appenders per flow:

RingBufferMarkerFactory factory1 = new RingBufferMarkerFactory(10);
RingBufferMarkerFactory factory2 = new RingBufferMarkerFactory(10);
Marker record1 = factory1.createRecordMarker();
Marker trigger1 = factory1.createTriggerMarker();
Marker record2 = factory2.createRecordMarker();
Marker trigger2 = factory2.createTriggerMarker();
Logger logger = loggerFactory.getLogger("com.example.Test");
logger.debug(record1, "debug one with 1st ringbuffer");
logger.debug(record2, "debug two with 2nd ringbuffer");
logger.debug(record2, "debug three with 2nd ringbuffer");
logger.debug(record2, "debug four with 2nd ringbuffer");
logger.error(trigger2, "Dump messages recorded with record2");

yields the out of order logging for statements that go through different flows:

[DEBUG] c.example.Test - debug two with 2nd ringbuffer
[DEBUG] c.example.Test - debug three with 2nd ringbuffer
[DEBUG] c.example.Test - debug four with 2nd ringbuffer
[ERROR] c.example.Test - Dump messages recorded with record2
[DEBUG] c.example.Test - debug one with 1st ringbuffer
[ERROR] c.example.Test - Dump messages recorded with record1

Or, we can go further than that. If you prefer things in one very wide structured event then you can use MarkerEventRingBufferTurboFilter and have it written out as JSON:

{
"@timestamp": "2019-07-30T09:55:36.986-07:00",
"@version": "1",
"message": "Dump all the messages",
"logger_name": "com.example.Test",
"thread_name": "main",
"level": "ERROR",
"level_value": 40000,
"tags": [
"TS_DUMP_MARKER"
],
"diagnosticEvents": [
{
"@timestamp": "2019-07-30T09:55:36.978-07:00",
"@version": "1",
"message": "debug one",
"logger_name": "com.example.Test",
"thread_name": "main",
"level": "DEBUG",
"level_value": 10000,
"tags": [
"TS_RECORD_MARKER"
]
},
{
"@timestamp": "2019-07-30T09:55:36.978-07:00",
"@version": "1",
"message": "debug two",
"logger_name": "com.example.Test",
"thread_name": "main",
"level": "DEBUG",
"level_value": 10000,
"tags": [
"TS_RECORD_MARKER"
]
},
{
"@timestamp": "2019-07-30T09:55:36.978-07:00",
"@version": "1",
"message": "debug three",
"logger_name": "com.example.Test",
"thread_name": "main",
"level": "DEBUG",
"level_value": 10000,
"tags": [
"TS_RECORD_MARKER"
]
},
{
"@timestamp": "2019-07-30T09:55:36.978-07:00",
"@version": "1",
"message": "debug four",
"logger_name": "com.example.Test",
"thread_name": "main",
"level": "DEBUG",
"level_value": 10000,
"tags": [
"TS_RECORD_MARKER"
]
}
]
}

The code's available in Github under the logback-ringbuffer and logback-ringbuffer-event modules, and there's documentation.

I hope this helps, and I'd love to know about any additional implementations of ring buffer logging in libraries.

Comments