Tracing With Logback and Honeycomb

I've written a logback appender that uses the Honeycomb Event API. It's pretty great!

When To Use It

You should use Honeycomb for observability, i.e. the "unknown unknowns." The advantages of using Honeycomb is that it gives you an easy way to throw your logs at a wall and see what looks interesting. The entire team can look at the same logs, pick things out, and see what's going on.

In addition, with Honeycomb you can trace your operations very easily: this gives you a way to see your code flow, and also do some lightweight profiling to find which bits of your code take the most time.

When Not To Use It

I don't recommend using this as your primary means of logging in production: I think writing JSON to Flume/Kinesis Firehose/Aggregator and then sending aggregated data through a lambda is more scalable approach.

When you're debugging a problem on your laptop and you already know the details of what's going on, then Honeycomb is less useful. If you're looking at debug logs from an underlying library, then it's less useful. You can see the raw logs as a file just as easy as in Honeycomb, and the actual data is minimal. Honeycomb excels when you have structured data that you can sort and filter on, and unfortunately the underlying library debug output tends to come in free-form text format, so you're still generally better off using ripgrep or dumping JSON into PostgreSQL and investigating it there.

There is another drawback to using the Event API, which is that you can't do dynamic sampling the same way that you would do with honeytail. I think this is okay, given that you can implement your own dynamic sampling according to your own needs, and use a logback filter to manage it for you.

How To Add It

You'll want logback-honeycomb-appender and logback-honeycomb-playws from the terse-logback bintray page, and then from there the documentation shows how to configure the appender.

From there, you can sign up to use the Honeycomb Community Plan, deploy an instance to a staged server, and then hit it with Gatling.

Tracing with Honeycomb

One of the more interesting bits of the honeycomb API is that you can do tracing just by adding a few extra JSON fields to your logs. All you need is trace.span_id, trace.parent_id, trace.trace_id, service_name, duration_ms, and name, and you'll be able to see data in the Honeycomb span view.

The way you do this in Logback is through com.tersesystems.logback.honeycomb.SpanInfo.Builder, which puts these together, and then convert through the SpanMarkerFactory, which turns them into an aggregated LogstashMarker:

// Create a root span.
SpanInfo.Builder builder = SpanInfo.builder().setServiceName("play_hello_world");
SpanInfo rootSpan = builder.setRootSpan("index").buildNow();

// Create a child span...
SpanInfo childInfo = rootSpan.childBuilder().setSpanName("doSomething").buildNow();
doSomething(childInfo);

// Create marker factory
SpanMarkerFactory smf = new SpanMarkerFactory()

// Close off the child span and log the event (you can add extra markers here for more context)
logger.info(smf.create(childInfo), "")

// Close off the root span.
logger.info(smf.create(rootSpan), "")

At the end of it, you get this:

tracing

This tells you exactly how long your code took, and it shows you all of the information in a tracing based hierarchical format and in a straight flat file / JSON format, with few lines of code.

Why This Is Better Than Opentracing / OpenCensus

This is something you can't get from the Opentracing API – you can send opentracing information to SLF4J, but you're still using the OpenTracing API to do it. The Opentracing API requires you to activate spans, pass them between threads, render error messages using semantic conventions. This means writing things like:

public void error(String message, Exception e) {
    Scope scope = tracer.scopeManager().active();
    if (scope != null) {
        Span span = scope.span();
        Tags.ERROR.set(span, true);
        span.log(Map.of(Fields.EVENT, "error", Fields.ERROR_OBJECT, e, Fields.MESSAGE, e.getMessage()));
    }
}

I'll admit I'm biased, but this just looks fiddly and easy to get wrong. I'd expect something like span.error(message, e) and the details to be done under the hood. Yes, you don't have to pass the scope around between methods because it's in a ThreadLocal, but you can just as easily stick SpanInfo in a ThreadLocal, and you'll have to pass span information explicitly in any case when you start using asynchronous CompletableStage style APIs.

Opentracing also has larger performance requirements than Honeycomb. To see tracing information, you'll need something like Jaeger All-in-One Docker Image, and then you'll run Jaeger agents in Kubernetes sidecars. It's a decent solution for distributed microservices in production – for single-app development it's like using a mallet to squash a fly.

Next Steps

Next steps, I'm going to add span information to logback-bytebuddy so I can instrument that code and have it show up in Honeycomb from config. I've already got configuration driven instrumentation, so it's probably just about keeping a stack of spans around and then you'll be able to see Java innards in a trace.

I'm still working on the series of logging API blog posts, but you can start seeing the bones of it already in the SpanInfo code already: most of it is about marshalling data before turning it into markers and sending it through SLF4J under the hood. For the SpanInfo code, just using Google AutoValue and Google AutoService takes care of the boilerplate work.

Comments