How to Log an Exception

I know I said I'd work on logging APIs this week… but I'm still cleaning up some code before it's ready to present. So this is an interlude post.

The question today is the deceptively simple "how do I log an exception", which turns out to be interesting on multiple levels.

Canonically, the way you log an exception in SLF4J is to log a domain specific message, and include the exception as the last parameter. You can include parameters and use SFL4J templates, so having multiple arguments is fine.

try {
  parseXML();
} catch (Exception e) {
    // documentId is a StructuredArgument for structured logging, e is used for stacktrace
    logger.error("Parsing xml document failed!", kv("documentId", documentId), e);
}

However, this isn't the end of the story. When you render the exception in an appender, you have to decide how to render the stacktrace. The stacktrace is typically voluable, and will cover most of your console and your file in stack trace elements.

org.xml.sax.SAXParseException: The element type "img" must be terminated by the matching end-tag "</img>".
	at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
	at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
	at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
  ...

Typically, you don't actually care about most of the stacktrace in this case. There's nothing you can do about anything in the Xerces stack, and it's not relevant. You care about the message in the exception. This is especially relevant in appenders where real estate (or disk space) is at a minimum: for example, the exception may not be written to the console at all.

So, if you want to see the message written as:

Parsing xml document my-photo-album failed, underlying cause was [The element type "img" must be terminated by the matching end-tag "</img>".]

Then you should call e.getMessage() and append only the message.

logger.error("Parsing xml document {} failed, underlying cause was [{}]", documentId, e.getMessage(), e);

I think this is common practice. I've only had two objections to the practice of adding e.getMessage when logging. The first objection was that the exception could come from an unknown source, and so the runtime cost of calling e.getMessage was unknown and could result in 20 MB of /dev/random data, if it didn't risk crashing, halting, blocking, the program altogether.

This is not a theoretical problem! I've personally seen harmless code block entire applications because they called object.hashCode() and obj.equals only to discover that java.net.URL makes blocking network calls on hashCode and equals.

I think you have to trust that getMessage returns a message and is not pathological – if you don't trust an exception to be safe, then the only information you can extract from it is that it exists and has a certain type: you can safely call e.getClass().getName() on it.

The other objection was that calling e.getMessage was duplicating information that was already in the stacktrace. I think this is a confusion: a message is domain specific diagnostic information for the user. A stacktrace consists of the message and stacktrace elements that are not domain-specific. Printing out the message is relevant even when the stack trace is not: it's a summary.

However, this does point the way to a deficiency in using e.getMessage(): it doesn't print out all the diagnostic information available to the stacktrace. Specifically, it doesn't deal with chained exceptions, which may have several exceptions nested deep. Using e.getMessage() only prints out the top level – this is especially annoying when a useless java.util.concurrent.CompletionException hides the real exception underneath it.

What I'd like to see is something that runs through all the messages and sticks them together, without having to wade through stack trace elements. So if I write up something like this:

class Runner {
	private final Logger logger = LoggerFactory.getLogger(this.getClass());

	public void doError() {
		Exception ex = generateException();
		logger.error("I am an error", ex);
	}

	private Exception generateException() {
		Random rnd = new java.util.Random();
		int i = rnd.nextInt(10) + 1;
		return nestException(null, i);
	}

	private Exception nestException(Exception ex, int i) {
		if (i > 0) {
			Exception nested = new RuntimeException("Level " + i, ex);
			return nestException(nested, i - 1);
		}
		return ex;
	}
}

Then I should get a message like this:

I am an error cause=[Level 1 > Level 2 > Level 3 > Level 4 > Level 5 > Level 6]

We can make everyone happy by rendering the message using a custom converter. This means that the cause happens automatically, on every exception, and by deferring rendering to another thread and managing the thread pool, we can also reduce the impact of blocking operations.

The converter looks like this:

public class ExceptionMessageConverter extends ThrowableHandlingConverter {

    @Override
    public String convert(ILoggingEvent event) {
        Integer whitespace = getLeadingWhitespace();
        if (whitespace < 0) {
            addWarn("Cannot render whitespace less than 0!");
            whitespace = 0;
        }

        Integer depth = getDepth();
        if (depth < 1) {
            addWarn("Cannot render depth less than 1!");
            depth = 1;
        }
        String prefix = getPrefix();
        String sep = getSeparator();
        String suffix = getSuffix();
        IThrowableProxy ex = event.getThrowableProxy();
        if (ex == null) {
            return "";
        }
        return processException(ex, whitespace, depth, prefix, sep, suffix);
    }

	// ...

    protected String processException(IThrowableProxy throwableProxy,
                                      Integer whitespace,
                                      Integer depth,
                                      String prefix, String sep, String suffix) {
        String ws = String.join("", Collections.nCopies(whitespace, " "));
        StringBuilder b = new StringBuilder(ws + prefix);
        IThrowableProxy ex = throwableProxy;
        for (int i = 0; i < depth; i++) {
            b.append(ex.getMessage());
            ex = ex.getCause();
            if (ex == null || i + 1 == depth) break;
            b.append(sep);
        }
        b.append(suffix);
        return b.toString();
    }
}

and the implementation looks like this.

"%message%exmessage{1, 10, cause=[}"

This covers getMessage, which is a strength, and it also means that rendering to console can be strictly line based, which means pasting things into Slack is that much easier.

However, this only covers getMessage. This doesn't cover anything else. If the exception contains vital information but doesn't put it in getMessage, then you're essentially blind to it. This is especially annoying when it comes to the AWS Java API, because AWS exceptions are extremely rich in useful domain informationserviceException.isClockSkewException(), serviceException.isThrottlingException(), and serviceException.requestId() – but rely on the caller to know to extract and present this information.

So, if you want to render an AWS exception then you still have to call kv structured arguments to expose that information in a structured format.

This leads to an interesting set of problems. We want to extract relevant information from the exception and this is only available per exception type. This can be demonstrated with java.sql.SQLException:

public class Thrower {
    private static final Logger logger = LoggerFactory.getLogger(Thrower.class);

    public static void main(String[] progArgs) {
        try {
            doSomethingWithSQL();
        } catch (BatchUpdateException e) {
            Object[] args = ExceptionMapper.mapException(e);
            logger.error("domain specific message" + ExceptionMapper.holder(args), args);
        } catch (SQLException e) {
            Object[] args = ExceptionMapper.mapException(e);
            logger.error("domain specific message" + ExceptionMapper.holder(args), args);
        }
    }

    static void doSomethingWithSQL() throws SQLException {
        throw new BatchUpdateException();
    }
}

class ExceptionMapper {
    public static String holder(Object[] args) {
        if (args == null || args.length == 0) {
            return "";
        }
        return " params=[" + String.join(", ", Collections.nCopies(args.length - 1, "{}")) + "]";
    }

    public static Object[] mapException(BatchUpdateException e) {
        return new Object[]{
                kv("errorCode", e.getErrorCode()),
                kv("sqlState", e.getSQLState()),
                kv("updateCounts", e.getUpdateCounts()),
                e
        };
    }

    public static Object[] mapException(SQLException e) {
        return new Object[]{
                kv("errorCode", e.getErrorCode()),
                kv("sqlState", e.getSQLState()),
                e
        };
    }
}

This gives us a better result:

domain specific message params=[errorCode=0, sqlState=null, updateCounts=null] cause=[java.sql.BatchUpdateException]

This isn't a complete solution, because it has the same problem as using e.getMessage() in that it still only renders the attributes of the top level exception. If there's an SQLException halfway through the stack, you won't see the error code or the SQL state. In order to do this properly, you'll have to run the exception mapper through the complete chain and potentially namespace the key value pairs into the relevant exception context if the keys conflict.

Also, the multiple catch blocks for BatchUpdateException and SQLException are awkward. They are not disjoints, so you can't use a union, and they must be defined in order. I'd much rather do this with Scala style pattern matching, or at least a Map<Class, ExceptionMapper> that could find the most specific exception.

Finally, this is a manual process. It has to be added as code to every logging statement, and can't be added to third party code.

The correct place to put this would be a custom converter, but I don't feel like getting into it now, and the tricky bit would be populating the mapping, which is domain specific. Still, I think this points out where exception logging could be made more useful.

One thing that popped up in this is that logstash-logback-encoder doesn't provide a way to render an exception as structured data. This means that it's not so easy to parse and extract message information from stacktraces. Theres no jq style slicing and dicing of JSON logs. At best, you have sed, awk, and regex.

There are libraries that do render exceptions as JSON: logback-steno represents exceptions as JSON, as does log4j2-logstash-layout.

Logback steno renders stacktraces as a multiline string inside of a JSON field, which I'm not so concerned about. Most of the tools that allow you to do something with stacktraces, i.e. the many companies devoted to error tracking and the IDEs like IntelliJ IDEA, all work by parsing and recognizing the stacktrace in the multiline text format. The log4j2-logstash-layout library does render stacktrace information as structured data, but I don't know of anything that would consume it. As such, I don't think there's any advantage in representing a stack trace in JSON.

More …