Tracing With Logback and Honeycomb

I've written a logback appender that uses the Honeycomb Event API. It's pretty great!

I don't recommend using this as your primary means of logging in production: I think writing JSON to Flume/Kinesis Firehose/Aggregator and then sending aggregated data through a lambda is a better approach. But for development, it's great. Using the Honeycomb Community Plan and a logback appender gives you an easy way to throw your logs at a wall and see what looks interesting.

You'll want logback-honeycomb-appender and logback-honeycomb-playws from the terse-logback bintray page, and the documentation shows how to configure the appender.

Running a local laptop against Honeycomb isn't great when you're debugging a targeted problem, as you can read the raw logs from a file just as easily as in Honeycomb, and the actual data is minimal. Honeycomb excels when you have structured data that you can sort and filter on, and unfortunately the underlying library debug output tends to come as free-form text, so you're still generally better off using ripgrep or dumping JSON into PostgreSQL and investigating it there.

There is another drawback to using the Event API, which is that you can't do dynamic sampling the same way that you would with honeytail. I think this is okay, given that you can implement your own dynamic sampling according to your own needs, and use a logback filter to manage it for you.
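Here's a minimal sketch of what that could look like, assuming a plain logback Filter attached to the Honeycomb appender through a <filter> element. The class name, the sample rate, and the keep-everything-at-WARN-and-above rule are my own choices, not anything terse-logback or Honeycomb prescribes.

import java.util.concurrent.ThreadLocalRandom;

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.filter.Filter;
import ch.qos.logback.core.spi.FilterReply;

// Head-based sampling: always keep WARN and above, keep a fraction of the rest.
public class SamplingFilter extends Filter<ILoggingEvent> {
    private double sampleRate = 0.01;

    // Configurable from logback.xml as <sampleRate>0.01</sampleRate>.
    public void setSampleRate(double sampleRate) {
        this.sampleRate = sampleRate;
    }

    @Override
    public FilterReply decide(ILoggingEvent event) {
        if (event.getLevel().isGreaterOrEqual(Level.WARN)) {
            return FilterReply.NEUTRAL; // never drop warnings or errors
        }
        return ThreadLocalRandom.current().nextDouble() < sampleRate
                ? FilterReply.NEUTRAL
                : FilterReply.DENY; // drop the event before it reaches the appender
    }
}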

However, if you are running a one-off server in staging or don't have a clue what you're looking at, then Honeycomb is great. The entire team can look at the same logs, pick things out, and see what's going on. In addition, with Honeycomb you can trace your operations very easily.

One of the more interesting bits of the Honeycomb API is that you can do tracing just by adding a few extra JSON fields to your logs. From the manual tracing documentation, all you need is trace.span_id, trace.parent_id, trace.trace_id, service_name, duration_ms, and name, and you'll be able to see data in the Honeycomb span view.
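Just to make the shape concrete, here's a rough sketch of the fields a single span event carries (the values are made up, and this isn't literally how the appender assembles the payload):

// One span, expressed as the handful of fields Honeycomb needs to build a trace.
// The IDs here are placeholder strings for illustration only.
Map<String, Object> spanEvent = Map.of(
    "service_name", "play_hello_world",
    "name", "index",                 // what this span did
    "trace.trace_id", "trace-1",     // shared by every span in the trace
    "trace.span_id", "span-2",       // unique per span
    "trace.parent_id", "span-1",     // omitted on the root span
    "duration_ms", 12.5
);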

The way you do this in Logback is through the Marker API.

// Create a root span.
SpanInfo.Builder builder = SpanInfo.builder().setServiceName("play_hello_world");
SpanInfo rootSpan = builder.setRootSpan("index").buildNow();

// Create a child span...
SpanInfo childInfo = rootSpan.childBuilder().setSpanName("doSomething").buildNow();
doSomething(childInfo);

// Create a marker factory
SpanMarkerFactory smf = new SpanMarkerFactory();

// Close off the child span and log the event (you can add extra markers here for more context)
logger.info(smf.create(childInfo), "");

// Close off the root span.
logger.info(smf.create(rootSpan), "");

At the end of it, you get this:

(Tracing screenshot)

This is great, because it tells you exactly how long your code took, and it shows you all of the information both in a trace-based hierarchical format and in a straight flat file / JSON format, with only a few lines of code.

This is something you can't get from the OpenTracing API – you can send OpenTracing information to SLF4J, but you're still using the OpenTracing API to do it. The OpenTracing API requires you to activate spans, pass them between threads, and render error messages using semantic conventions. This means writing things like:

public void error(String message, Exception e) {
    Scope scope = tracer.scopeManager().active();
    if (scope != null) {
        Span span = scope.span();
        Tags.ERROR.set(span, true);
        span.log(Map.of(Fields.EVENT, "error", Fields.ERROR_OBJECT, e, Fields.MESSAGE, e.getMessage()));
    }
}

I'll admit I'm biased, but this just looks fiddly and easy to get wrong. I'd expect something like span.error(message, e), with the details handled under the hood.
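For comparison, here's the kind of wrapper I have in mind; ErrorSpan is a made-up name, not part of any library, and it just hides the semantic-convention plumbing from the call site:

import java.util.Map;

import io.opentracing.Span;
import io.opentracing.log.Fields;
import io.opentracing.tag.Tags;

// A thin wrapper so call sites can say span.error(message, e) and forget the conventions.
public final class ErrorSpan {
    private final Span span;

    public ErrorSpan(Span span) {
        this.span = span;
    }

    public void error(String message, Exception e) {
        Tags.ERROR.set(span, true);
        span.log(Map.of(
            Fields.EVENT, "error",
            Fields.ERROR_OBJECT, e,
            Fields.MESSAGE, message));
    }
}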

OpenTracing also has heavier infrastructure requirements than Honeycomb. To see tracing information, you'll need something like the Jaeger all-in-one Docker image, and eventually you'll be running Jaeger agents in Kubernetes sidecars. It's a decent solution for distributed microservices in production – for single-app development it's like using a mallet to squash a fly.

Next steps: I'm going to add span information to logback-bytebuddy so I can instrument that code and have it show up in Honeycomb from config. I've already got [configuration-driven instrumentation](https://github.com/tersesystems/terse-logback/blob/master/logback-bytebuddy/src/test/resources/application.conf), so it's probably just a matter of keeping a stack of spans in a ThreadLocal, and then you'll be able to see Java innards in a trace.
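As a very rough sketch of that idea (the SpanStack name and the enter/exit shape are mine; only the SpanInfo calls come from the example above), it could be as simple as:

import java.util.ArrayDeque;
import java.util.Deque;

// Keep the current chain of spans per thread: instrumented code pushes on entry
// and pops on exit, and the popped SpanInfo gets logged through the marker factory.
public class SpanStack {
    private static final ThreadLocal<Deque<SpanInfo>> STACK =
        ThreadLocal.withInitial(ArrayDeque::new);

    public static void enter(String name) {
        Deque<SpanInfo> stack = STACK.get();
        SpanInfo parent = stack.peek();
        SpanInfo span = (parent == null)
            ? SpanInfo.builder().setServiceName("play_hello_world").setRootSpan(name).buildNow()
            : parent.childBuilder().setSpanName(name).buildNow();
        stack.push(span);
    }

    public static SpanInfo exit() {
        return STACK.get().pop(); // the caller logs this with smf.create(...)
    }
}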

I'm still working on the series of logging API blog posts, but you can already see the bones of it in the SpanInfo code: most of it is about marshalling data before turning it into markers and sending it through SLF4J under the hood. For the SpanInfo code, using Google AutoValue and Google AutoService takes care of the boilerplate work.
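To give a flavour of what that buys you (this is not the real SpanInfo source, just the general AutoValue pattern): you declare abstract accessors and an abstract builder, the annotation processor generates the constructor, equals/hashCode/toString, and builder plumbing, and AutoService handles the META-INF/services registration for the pluggable pieces.

import com.google.auto.value.AutoValue;

// Illustrative only: an AutoValue-style value class with a builder.
@AutoValue
public abstract class SpanInfoSketch {
    public abstract String serviceName();
    public abstract String spanName();

    public static Builder builder() {
        return new AutoValue_SpanInfoSketch.Builder();
    }

    @AutoValue.Builder
    public abstract static class Builder {
        public abstract Builder setServiceName(String serviceName);
        public abstract Builder setSpanName(String spanName);
        public abstract SpanInfoSketch build();
    }
}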
