The Case For Logging

This blog post is written in response to Sebastian Daschner's blog post The Case Against Logging, which I saw from Markus's tweet.

Although my views are in general opposed to Mr Daschner's post, this is not intended to be a "hit piece" and there are several places where I completely agree. Often times there are things I think or have been working on which I don't consciously realize until I come across a different point of view, and I hope any criticisms I have are taken in good faith.

The one thing that all enterprise applications have in common, regardless of their framework, language, or communication protocols is that they produce logs, logs  that are  allegedly human-readable lines of strings that aim to help us to debug, trace, or otherwise inspect what’s going on. The question is, how useful are these logs and do they help us solve problems that actually happen. In this post, I will explore this question, highlight some of the challenges that come with logging, and explore alternative solutions.

This is great. I agree completely that unstructured human readable logs suck.

My view is that where there are shortcomings, logging can be improved dramatically to make it more generally useful and more responsive, rather than moving away from logging entirely.

Performance

Performance is certainly the biggest one. If you talk to a JVM performance expert, they will tell you that how you log can have one of the biggest, if not the biggest, negative impact on your application’s performance. If you really want your application to perform poorly, you should add a lot of logging statements in your main use cases. This logging should engage in creating and concatenating a lot of strings. And no logging framework is without several layers of buffering. The biggest performance issue for applications is the cost of a Java heap allocation, and logging usually allocates disproportionately when compared to typical business code. It’s not just allocation costs as high allocation rates ultimately will hyper-activate the garbage collector. This in turn will result in high CPU utilization and increased frequency of tail latency. It’s quite interesting to have a look at such scenarios in production, in which an application allegedly utilizes the CPU a lot, which in fact turns out to be caused by the garbage collector because of excessive allocation.

This section breaks down into two concerns: string concatenation, and memory allocation of temporary objects. Let's address string concatenation first.

String concatenation is a real concern, but – crucially – it is a user's choice to do that. Logging frameworks don't require string concatenation. Both Logback and Log4j2 allow for parameterized logging using a template. It's not the logging allocating disproportionately – it's your application's input into that logging.

More to the point, logging frameworks are well aware of the issues involved in string concatenation and recommend parameterized logging and guard statements around logging statements.

The following two lines will yield the exact same output. However, in case of a disabled logging statement, the second variant will outperform the first variant by a factor of at least 30.

Logback - Architecture

logger.debug("The new entry is "+entry+".");
logger.debug("The new entry is {}.", entry);

So if the problem is that developers are using the tools wrong, then how do we fix it? Fortunately, we have tools for that.

There are tools to validate SLF4J API calls in Findbugs and Errorprone, and tools to automatically rewrite logging statements to parameterize logging statements and migrate to SLF4J. Unfortunately, I didn't see any openrewrite tools to add guarding isLoggingDebug conditional blocks around debugging statements, but it seems doable.

Memory allocation is another topic that comes up when discussing logging, but it's unclear how much of this is still relevant – When you dig into the problems with memory allocation, most of the discussion is around the 2012 - 2013 point, at a time when JVM garbage collection had been stalled for quite a while. Benchmarking with a newer JDK showed significant improvements in garbage collection, even when logging at a TRACE level, and reports of Shenandoah GC are very positive. I've been poking around at the JVM memory allocation to the point where I made an audio soundtrack out of garbage collection events. While it's technically possible to spam logging to the point where you have memory allocation issues, I've only ever seen it where the logging framework collides with multiple threads, i.e. repeated calls of MDC.getCopyOfContextMap() when copying MDC values across asynchronous boundaries in a custom executor.

Disk I/O is another factor to consider. Writing and flushing a lot of data to disk will impact the performance of all applications running on the same hardware. Even worse, log files that reside in network storage impact the throughput even more, since the write operation hits the operating system I/O twice, with file system and network calls involved. Logging makes these hardware devices which are shared between all applications part of the critical path. We often see this as a “noisy neighbor”.

Logging directly to a disk will eventually cause I/O overhead. However, this doesn't really get at the sheer amount of logging you need to do to cause I/O overhead. You have to log lots – really lots – to cause I/O overhead. For Logback, a single-threaded file appender on my laptop with immediateFlush=false will max out a core and logs to SSD at 184.32 MiB per second – that's 56 GiB in five minutes. A GCP regional persistent disk (very much inferior to laptop SSD) tops out at 240MiB/s.

To be clear, background IO traffic has caused issues, sometimes in very odd ways.

There are two possible solutions here. The first solution is to ensure that if you are logging to disk, high IO logging happens on a separate or high-performing disk drive (e.g., SSD, tmpfs), or is sent over a network channel. High volume logging directly to a file over network storage is not a good fit and should be avoided.

Another solution is to log over the network, so that it's not the host's problem. In Logback, it's common to asynchronously log to a UDP appender to Logstash via syslog or udp, and from there it'll be aggregated and centralized.

Dependencies

The number of dependencies that logging frameworks bring along, directly or transitively, creates a few potential issues, as well. Not only do dependencies and their transitive baggage inflate the application’s size and build time. The more dependencies and moving parts we have, the higher the changes that there are version conflicts, bugs, or security issues, which not least the recent Log4Shell vulnerability has shown once again. In general, the less dependencies we include the better.

There are two discussions here: the versioning and security issues involved in Log4J, and the dependency issues involved multiple logging libraries over time.

Leaving aside the specific JNDI lookup vulnerability, Log4Shell was very hard to track for security professionals because there was no good way to track what version of the logging framework was being used. However, this is not a concern specific to logging – all software in the Java ecosystem suffers from lack of reliable packaging manifests and version number ambiguity. Tools like dependencycheck can identify problems in projects at the time of packaging, but there is very little tooling to help identify version numbers in already packaged artifacts.

The logging mess was an era of mass confusion, best described by this reddit comment:

If you are a diligent application author with good control over your dependencies, you probably try and keep that tree of dependencies small, and actively ensure that only slf4j-api and a single logging implementation (which you choose) is on classpath. If you are sloppier, or (more likely) have less control over what you import (e.g. a 3rd party vendor has given you a badly designed library which is the only way their USB-attached widget can be interfaced), you might find you've got both your own choices (slf4j-api and logback) on classpath, plus log4j 1x which the vendor still depends on. In this case, you might exclude the transitive log4j 1x dependency, and insert an slf4j bridge so that the library can write to what it believes is log4j 1x, but get routed to slf4j.

I would add that identifying and excluding transitive dependencies is a pain in the neck, but this is a problem with older unmaintained libraries in general.

Log Formats

Log formats and which one to choose are another issue. Which information should be include (I dare you to have a look at the Log4j layouts), how should we structure our logging messages, which information should be logged at which log level, and so on. On the one hand, logs are produced to be human-readable but the volume of data the logs lead to creates the necessity to use more tooling such as log aggregators and search engines. Unfortunately, human-readable formats tend to be expensive for machines to read which leads to the conclusion that logs are generally neither really human nor machine readable. In this cases, it makes more sense to consider a binary or a more machine-friendly format. Some applications do log lines of JSON, but the question remains, is this really any better? The log format will impact performance not only with regards to how much is added to each line, but also how many string concatenations, method invocations, and reflection lookups have to be performed.

Unequivocally yes: structured logging using a JSON framework is just better on multiple levels. JSON has a great advantage in that the structure and grammar of JSON is very concise – this makes it possible to optimize parsing and querying, and it can be faster than many binary formats. Jackson serialization, the de-facto standard of JSON in Java, is heavily optimized and under continuous scrutiny and development. Seriously, structured logging is the best.

Also, you don't need to choose between machine readable or human logs, and having one doesn't preclude the other; it's entirely possible to have both, by adding multiple appenders. Machines are good at reading JSON, so we feed them JSON. Humans are good at reading line oriented logs, so we feed them logfmt semi-structured data. We do this by using event specific custom field:

import static net.logstash.logback.argument.StructuredArguments.*;
logger.info("log message {}", keyValue("name", "value"));

or using a structured logging framework like (disclaimer: self promotion) Echopraxia:

import com.tersesystems.echopraxia.*;
logger.info("log message {}", fb -> fb.keyValue("name", "value"));

In the example above, a line oriented encoder will log the message as log message name=value and a JSON encoder will log {"name": "value", "message": "log message name=value"}.

In development, I start a new project so it logs lines to console, and JSON to a file. The entire logback.xml looks like this:

<configuration>
  <appender name="ASYNCJSON" class="net.logstash.logback.appender.LoggingEventAsyncDisruptorAppender">
    <appender class="ch.qos.logback.core.FileAppender">
      <file>application.ndjson</file>  
      <encoder class="net.logstash.logback.encoder.LogstashEncoder" />
    </appender>
  </appender>

  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%date{H:mm:ss.SSS} %highlight(%-5level) [%thread]: %message%n%ex</pattern>
    </encoder>
  </appender>

  <root level="INFO">
    <appender-ref ref="CONSOLE"/>
    <appender-ref ref="ASYNCJSON"/>
  </root>
</configuration>

That's really all there is to it.

Log Levels

Log levels are another topic that I haven’t seen being used reasonably out there in real-world projects, which I think is not the projects' fault. Log levels might sound nice in theory, that you observe different levels of detail as to what information is currently interesting, but in practice that doesn’t really work well. Once some certain condition happened that you would have liked to debug using log statements, it’s likely that the log level wasn’t detailed enough, since debug and trace logs are usually not available in production. After all, keeping detailed log levels on in production that result in many statements being written to disk will hurt your performance.

Some implementations support the ability to dynamically change the log level at runtime. The only issue is that by the time you realize you need more information, it’s likely too late to capture what is needed.

This is a very real and very central concern: debug and trace (diagnostic) logs are typically disabled in production, for fear of impacting performance.

First, it's important to note that many developers do use diagnostic logging in production, and report great success doing so.

Second, the key phrase involved is that statements written to disk (and by implication, sent across network, indexed, etc) impact performance. Writing debug and trace statements to a bounded memory buffer is cheap. Brian Marick wrote about this in Ring Buffer Logging, but the technique has been around since the Apollo missions to the moon. When an error occurs, you can investigate the buffer and attach a relevant backtrace of diagnostic data to the error.

I've implemented this a couple of times. Using a straight ring buffer proved to be very awkward and manual, so the second time, I wrote Blacklite – it's an SQLite appender with some pragmas to use a memory mapped file and optimize for writes – and mounted it on tmpfs – this gave me a way to query and export diagnostic logging statements in a stable format without having to touch disk, at a speed comparable to ring-buffer logging.

Then, you set up Logback to filter out diagnostic logging from operational logging:

<configuration>

  <appender name="BLACKLITE" class="com.tersesystems.blacklite.logback.BlackliteAppender">
    <filter class="ch.qos.logback.classic.filter.LevelFilter">
      <level>DEBUG</level>
      <onMatch>ACCEPT</onMatch>
      <onMismatch>DENY</onMismatch>
    </filter>
    <file>/mnt/tmpfs/myapp/diagnostic.db</file>
    <batchInsertSize>1</batchInsertSize>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>

  <appender name="ASYNCJSON" class="net.logstash.logback.appender.LoggingEventAsyncDisruptorAppender">
    <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
      <level>INFO</level>
    </filter>
    <appender class="ch.qos.logback.core.FileAppender">
      <file>application.ndjson</file>  
      <encoder class="net.logstash.logback.encoder.LogstashEncoder" />
    </appender>
  </appender>

  <logger name="com.example.myapp" level="DEBUG">

  <root level="INFO">   
    <appender-ref ref="BLACKLITE"/>
    <appender-ref ref="ASYNCJSON"/>
  </root>
</configuration>

From there, you can integrate with your web application – I added some code to find log entries using JDBC and then made the exception handler attach debug and trace logs as Sentry breadcrumbs when reporting errors to Sentry.

Using SQLite also has the advantage that I can poke around – If I want to find something quickly, I can query using SQL statements using DB Browser and leverage the JSON support, including the new JSON pointer syntax.

I talked at Scalacon more in depth about debugging and observing code, and there's slides and video.

Choosing a proper log level, and then of course which statements should be logged in the code using which level, is thus always a trade-off. Often this task is left to the developers to decide without any architectural guidance and this further degrades the usefulness of the logs.

I think that choosing the appropriate log level is covered Stack Overflow and How To Do Logging In Java. I do think that INFO statements are used far too liberally, and DEBUG not enough.

I do think that logging levels have their place (especially for third party libraries), but I think it's a problem that there isn't a more obvious way to change log levels on the fly. The code itself for Logback is fairly simple, and I've played with tying logging levels to JMX components, but all the tools for changing log levels are tucked away like in Spring Boot Actuator.

Amounts of Data

Logs can produce huge amounts of data that is written to log files and over time can result in large files that are cumbersome to handle and expensive to parse. Parsing log formats in general bears quite an avoidable overhead. Especially if we collect all logs in another stack such as ELK, these tools then need to parse all these formats just again, which makes one question if the approach was a sensible one to begin with.

Yeah, this is a problem. By far the largest cost in logging is in centralized log management. Running an ELK stack on AWS is typically at least $1200, and even the cheapest plan costing $433 a month for 300GB.

The bulk of the work in centralized logging goes into parsing, indexing, and storing JSON, and I agree completely that this is avoidable overhead. Unfortunately, I don't think we're going to see much progress in general without a change in the observability pipeline itself, because the inertia of the system is so great. I suspect that you could ship SQLite files directly to S3 and provide a notebook based API and that would be far more useful than what we have now, but that's another topic for another day.

Debugging

Using logging for debugging in production does not only have a huge negative performance impact but also might not even be of much help. Information that isn’t available at a configured log level won’t help you to reproduce a specific bug and setting a log level that is too low, especially for third-party framework code and libraries, typically results in an overwhelming number of lines being logged per user request.

Yes, this is a concern. Diagnostic logging is not a binary on or off solution. There's a wonderful paper that categorizes logging statement problems as where-to-log, what-to-log, how-to-log, and whether-to-log. The whether-to-log problem is defined as dynamically adjusting the degree of logging in response to the runtime requirements. For example, if a suspected anomaly is detected, the logging platform can enable more detailed logging, and if the system is acting normal, it minimizes the number of logs to lessen the overhead.

I am still experimenting with this, but what I've come up with in Echopraxia is a two tier system – in addition to checking the log level, a logging statement must satisfy a condition before it will be logged. This doesn't solve the problem of what happens when you don't control the logging output – a third party framework logging at debug level will still use straight SLF4J and in that case you have to tie conditions through Logback filters – but it does go a long way to making DEBUG and TRACE statements into a more nuanced tool you can use in production.

This kind of system is already in place at Microsoft, where a paper quotes an engineer: “What’s really great for us is to be able to real-time turn on and off what log stuff you’re collecting at a pretty granular level. And to be able to get the performance back when it is turned off is a big thing for us.”

Conditions are just simple functions which take a level and a context, which provides access to the contents of the logging statement. They can be applied individually to statements, or can be added transparently as filters on a logger.

final Condition condition = new Condition() {
  @Override
  public boolean test(Level level, LoggingContext context) {
    ctx.findThrowable()
       .filter(e -> "test message".equals(e.getMessage()))
       .isPresent();
  }
};
logger.error(condition, "Error message", new RuntimeException("test message"));

In a cloud native solution, you can tie conditions to a feature flag system like LaunchDarkly, so you can enable targeted diagnostic logging, but there are more interesting possibilities. For example, you can log more aggressively for a set period of time on an alert. Or you can tie conditions to a scripting language, and change those scripts on the fly.

Dynamic debugging is in itself a big topic, but I talk about more in this blog post blog post and there's a Docker compose example – you can ignore the litestream stuff, it's there to demo how you can automate diagnostic dumps from remote cloud instances.

More generally, there's the "won’t help you to reproduce a specific bug" angle. My argument is that the reason why debugging is not great at reproduction is because of the original sin of toString – an unstructured dump of internal state that's implicit in logging frameworks, attached to Object and provided with a useless implementation. I think that if we can effectively and accurately represent the state of an object in structured logging (and make toString a compile error), we can get much better at reproducing bugs.

This is the other part of (self promotion again) Echopraxia, because it has custom field builders that encourage a domain oriented representation of an object:

public class PersonFieldBuilder implements FieldBuilder {
  // Renders a `Person` as an object field.
  public Field person(String fieldName, Person p) {
    return keyValue(fieldName, personValue(p));
  }

  public Value<?> personValue(Person p) {
    if (p == null) {
      return Value.nullValue();
    }
    Field name = string("name", p.name());
    Field age = number("age", p.age());
    // optional returns either an object value or null value, keyValue is untyped
    Field father = keyValue("father", Value.optional(p.getFather().map(this::personValue)));
    Field mother = keyValue("mother", Value.optional(p.getMother().map(this::personValue)));
    Field interests = array("interests", p.interests());
    return Value.object(name, age, father, mother, interests);
  }
}

which then lets you log more complex information:

Person user = ...
Logger<PersonFieldBuilder> personLogger = LoggerFactory.getLogger().withFieldBuilder(PersonFieldBuilder.class);
personLogger.info("Person {}", fb -> fb.person("user", user));

The field builder arguments exist in a function, and the function will only execute if the logger meets both the given logging level, and any conditions attached to the logger.

Once you have the internal state available as JSON, you can then query for outliers, diff JSON blocks against each other, and in general treat logging like a persistent debugger, rather than as a fancy printf. I get super excited talking about this, but again this is another topic for another day.

Journaling

Logging statements that aim to record business-motivated information in order to create an audit train is akin to a poor man’s version of journaling. This activity is better accomplished by using a journaling solution or event sourcing. If the business requirements demand a journaling, audit log, or alike, it should be treated as such and made explicit. Using event sourcing or proper journaling technology such as Chronicle Queue persists the messages more efficiently, with lower footprint, lower latency, and higher throughput.

Yes, hard agree. Business event logging, journaling, and message-oriented events should be handled specifically using a different channel than logging, and should go to a log-processed storage solution. Not only do business events require different semantics for auditing, but they are typically also strongly typed with a well defined schema, whereas logging is infamously loose. Because business events have a tight schema, even if you're using JSON you can leverage jsoniter and generate code specifically for serializing and deserializing those events.

Tracing

Business- or technically-motivated tracing should also be made explicit in the application and implemented using a fit-for-purpose solution, such as OpenTracing or another journaling solution. (Mis-)using logging in order to trace method invocations and arguments has the same drawbacks as using logging for debugging.

Theoretically this is sound, but both OpenTracing and its successor, OpenTelemetry, are tools for distributed tracing across services rather than local method tracing, leaning heavily on instrumentation of frameworks to inject appropriate context. They have no understanding of business context, and their ability to customize a particular span with information is limited to key/value pairs. Frameworks for processing distributed tracing data often default to a "trace view" and do not give the ability to query on fine grained details of traces and spans. And even worse, tracing doesn't know about any of the existing logging MDC context interactions.

It is possible to get OpenTracing span data to dump to an SLF4J reporter, and I hope that more can be done here.

More generally, method level tracing of entry and exit, appropriately covered by isTraceEnabled() guards, costs around 2 nanoseconds. The big issue in logging is that providing caller info through Logback is still expensive, which unfortunately means that caller information has to be dug out some other way, either through Aspects if you own the code, or using an agent to instrument third party code.

Monitoring

Instead of using logging to monitor, one should use a proper monitoring solution, which usually includes emitting metrics in an endpoint. For example, publishing metrics using the Prometheus format via HTTP, and then scraping those metrics at the receiving end. All major enterprise frameworks support solutions out of the box, and even custom-built implementations are usually a better choice for realizing monitoring than appending individual metrics to the log.

In general, I agree. I think that DropWizard Metrics or Prometheus Client are good options. There is a caveat to metrics, which is that they aggregate datapoints and then send an aggregate for monitoring, which means that some information is lost in transmission – a proposed alternative is structured events, which is a single JSON blob containing a record of a processed operation. This led to a frank exchange of views on the merits of sending events versus metrics. I think both have their place.

System.out

The straightforward way is to avoid any of the major logging frameworks and to go with System.out and System.err in the situations mentioned above. As for errors and exceptions, we should log only unexpected errors that require developers to investigate and resolve them — that distinction is important and doesn’t include any user-provoked errors. If you leave it to these situations, your application’s performance won’t be unnecessarily impacted.

Using System.out is a great way to start development on small projects, but in my experience it doesn't scale up once an application is complex enough that a team is maintaining it.

The first, most practical problem, is that Standard.out is a public stream, and one that programs can reasonably expect to use. This means that you can't use System.out.println in a situation where you do not control the logging output – if you're writing a library or framework code in which some other programmer may be using your code, then that programmer can't use System.out because you're already using it for logging. Additionally, System.out is not only public, but it is also mutable – programs can call System.setOut or System.setErr to redirect the output streams at any time.

The second problem, closely related to the first one, is that some other libraries already use System.out and System.err for logging. Notably, debugging the JSSE stack will result in logging to System.out, making a mess of your own logging with its own format. In the past, I've had to deal with JSSE output by calling System.setOut to redirect the stream, but that only works if I can manage other logging streams appropriately.

The third problem is that System.out ignores other logging. Like it or not, many if not most libraries and frameworks depend on SLF4J as an underlying dependency. From the data, 4 out of 5 projects use SLF4J based on a survey on Github, and SLF4J is the second most used library after JUnit. A recent paper measuring the adoption of logging frameworks showed that for small (20 to 100 classes), 72.5% of projects used a logging library, up to 97% for very large projects. In sufficiently large software, logging is an inevitability, and using System.out alongside an existing logging system only adds more confusion.

The fourth problem is that System.out.println lacks basic features available in logging. It does not have a timestamp, a way to indicate the source of the logging, or even a way to encapsulate output containing newlines. There's no MDC-like feature capturing contextual information. There's no way to integrate error handling using exceptions, so integrating with Sentry is out. There's no way to add any features under the hood because it's a direct reference rather than an interface, so in the best case, it's a rewrite.

The fifth problem is that using System.out.println is not a solution to any of the performance criticisms of logging. Rather than solving the problem of string concatenation, using println encourages more concatenation, and also makes available the System.out.format option which is even more expensive.

The final problem is related to PrintStream having a synchronized println method. A synchronized method will only ever allow one caller to access the method at once, causing other callers to block until the method has completed. Blocking can have a severe impact on CPU bound fork/join worker pools, meaning that using System.out.println is not compatible with asynchronous frameworks like Vert.x and Akka. You can see this very clearly if you start a flight recording, as you'll see lots of "salmon" showing monitor blocked events in the threads view of Java Mission Control.

More …