Why I Wrote A Logging Library
I've released Blindsight 1.0.0. Blindsight is a play on "observability through logging" and is named after Blindsight by Peter Watts.
Blindsight is a logging library written in Scala that wraps SLF4J to add useful features that solve several outstanding problems with logging:
- Rendering structured logs in multiple formats through a format-independent AST and DSL.
- Expressing domain specific objects as arguments through type classes.
- Resolving operation-specific loggers through logger resolvers.
- Building up complex logging statements through fluent logging.
- Enforcing user supplied type constraints through semantic logging.
- Minimal-overhead tracing and causality tracking through flow logging.
- Providing thread-safe context to logs through context aware logging.
- Time-based and targeted diagnostic logging through conditional logging.
Why?
The answer's superficially simple enough – I've been talking about writing one for a full year now. I got a talk about logging, tracing, and observability accepted at Scaladays 2020, and had some scribbled notes about abusing apply methods and implicits for bricolage APIs. And… well. Like everyone else, I have more free time on my hands now. I have looked through every logging library I can find. There are many excellent ideas and clever implementations, but each one stands alone. Blindsight is an attempt to synthesize these ideas into a coherent whole.
The larger answer to why I've gone to a sanity-questioning amount of effort digging this deep into logging is that I expect a return on my investment. I'd much rather write code than fix bugs. For the kinds of work I do, a debugger is not practical, so I typically don't use one. Logging is the most sustainable and scalable way I know to fix bugs, especially in production. Until there are reliable post-mortem tools, or we start using deterministic event-driven state with user replay and journaling, diagnostic logging is what we've got.
Every programmer already knows this. As Brian Kernighan wrote in Unix for Beginners in 1979, "The most effective debugging tool is still careful thought, coupled with judiciously placed print statements." Every single developer has added diagnostic logging in the form of printlns to find a bug, and the real crime is that we remove them instead of keeping and treasuring them. Every single println
typed by a programmer is a prayer to the logging gods.
This blog post is also in part a developer log and an attempt to talk about why each feature is important. For each feature, I include the desired use case, the background behind it, and references to how other libraries tackle the problem. I'm not concerned with benchmarks or configuration here – only features. I hope this sparks a conversation about what logging could be, and what we can do to make better tooling for fixing bugs and increasing the observability of code.
Structured DSL
I wanted to be able to pass in ad-hoc structured data as an argument or a marker as an abstract syntax tree, and have that rendered in multiple formats, i.e. JSON for the ELK stack, and logfmt for the console and unit testing. The end result was a DSL for structured data – this will look familiar to you if you've seen json4s.
When SLF4J was written, logging involved writing unstructured log lines to text files. Modern day systems assume structured logging, implying the existence of a structure to log. SLF4J does not provide or require a structure in its arguments. It takes a string as a message, and the arguments to the message can be literally anything.
Here's an example:
case class Lotto(
  id: Long,
  winningNumbers: List[Int],
  winners: List[Winner],
  drawDate: Option[java.util.Date]
)

val lotto = ...

logger.info("message {}", bobj(
  "lotto" ->
    ("lotto-id" -> lotto.id) ~
      ("winning-numbers" -> lotto.winningNumbers) ~
      ("draw-date" -> lotto.drawDate.map(_.toString)) ~
      ("winners" -> lotto.winners.map(w => w.asBObject))
))
Multiple representations and no dependencies mean that we have to defer the encoding decision, because we want to stay as close to SLF4J as possible. This means we can't represent data as a Jackson JsonNode, because that pulls in a requirement on a specific JSON implementation. We can't directly use logstash-logback-encoder's StructuredArguments, because that ties us to Logback specifically. Instead, we should tie the AST representation to something suitable for the framework as a distinct dependency. This lets us map the AST to StructuredArguments in a blindsight-logstash module.
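As a rough sketch of the kind of mapping a blindsight-logstash module can do – the AST types here are simplified stand-ins, not Blindsight's actual classes – each top-level field becomes a logstash key-value argument:

import net.logstash.logback.argument.{StructuredArgument, StructuredArguments}
import scala.jdk.CollectionConverters._

// Simplified stand-in AST, for illustration only.
sealed trait BValue
final case class BString(s: String) extends BValue
final case class BNumber(n: Long) extends BValue
final case class BObject(fields: List[(String, BValue)]) extends BValue

// Collapse the AST into plain Java values that the encoder knows how to render.
def toJava(value: BValue): AnyRef = value match {
  case BString(s)  => s
  case BNumber(n)  => Long.box(n)
  case BObject(fs) => fs.map { case (k, v) => k -> toJava(v) }.toMap.asJava
}

// Each top-level field becomes a logstash key-value StructuredArgument.
def toArguments(obj: BObject): Seq[StructuredArgument] =
  obj.fields.map { case (k, v) => StructuredArguments.kv(k, toJava(v)) }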
There's another reason to provide a structured representation early: it shouldn't be up to the logging framework to figure this out.
Logging frameworks (Logback and Log4J 2 at least) don't have a centralized pipeline for processing an event into structured data once, and then passing that structured data to the appenders. Instead, it's direct – an event is matched with several appenders, and each appender has its own encoder, which has a reference to the original logging event.
The encoder is there to encode data into a stream of bytes. An encoder works best when it has a simple type, like a Date or a Long, and has to make a choice about how to represent it. You can do that in a logback.xml file, no problem.
If you have to write a custom component in your logging implementation so that you can serialize your domain objects correctly, then the representation has to be derived multiple times, once for each encoder or custom converter. Domain objects contain ambiguous data that can be difficult to represent as a set of XML tags in a logback.xml file. Instead, you get additional complexity through introspection or annotations.
Providing a structure on the front end – before it gets to SLF4J – is a much cleaner option. It also lets you start thinking about adding a field dictionary to standardize your terms.
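A field dictionary can start out as nothing more than shared constants – a hypothetical sketch:

// Hypothetical sketch: standard field names defined once, so "user_id" never
// drifts into "userId" or "uid" across the codebase.
object Fields {
  val UserId        = "user_id"
  val CorrelationId = "correlation_id"
}

val userId = "28" // example value
logger.info("request {}", bobj(Fields.UserId -> userId))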
References
Bunyan is the OG structured logging framework. It started off with so many great ideas at once, and still has all the right ones. Bunyan has the concept of log record fields built in from the get-go.
Structlog4j has the right approach here, and passes in raw key value pairs, and provides the formatters as JSON, text and YAML in different libraries.
Json-log-domain has a similar high-level mapping over tags using a DomainMarker, with support for Logback. The specific tags are created from YAML using a code generator.
serilog is fairly clever about how it handles destructuring. There's a list of best practices that shows serilog's timer and contextual logging as well.
logstage uses string interpolation to represent structured data in place.
Log4J 2 seems like it should have this in StructuredDataMessage, but it's actually a syslog-specific data structure that must conform to RFC 5424 rules, and there's no DSL or builder for StructuredDataMessage. The next closest thing is MapMessage, which has explicit format specifiers of "XML", "JSON" or "JAVA". I could always be missing something, but there does not seem to be a format-independent – or even multiple-format – structured message type in Log4J 2.
Slime uses a series of implicits together with a TypeEncoder to turn scala maps and seqs into an AST.
ruby-cabin is explicit about wanting separation of data and view: "Using printf or similar logging methods is bad because you are merging your data with your view of that data. I want to be able to log in a structured way and have the log output know how that should be formatted. Maybe for humans, maybe for computers (as JSON), maybe as some other format. Maybe you want to log to a csv file because that's easy to load into Excel for analysis, but you don't want to change all your applications log methods?" It looks like ruby-cabin is no longer maintained, and there's a good article on using ougai and Lograge instead.
Type Safety
I wanted to be able to ensure that all arguments have an explicit representation in logging, through type classes. If there's no valid representation, the code should not compile.
I wanted this because SLF4J arguments have no type safety. You can pass in anything as an argument, and it's up to the logging implementation to figure out how to render it in a given encoder. There's no way to require that the arguments passed in are of a particular type, or specify that they have specific values. From the perspective of the logging framework, accepting any object means more work and ambiguity. From the perspective of the client calling the SLF4J API – especially libraries that can't control what logging framework is being used – passing in a raw domain object means risk. It's dangerous to pass in a raw object, because it's a blank check. The underlying logging library can call toString, which can do anything. The toString method on an array will print its identity, such as [I@4488aabb. If you call toString on IPAddress, it will do a DNS lookup. If you call toString on an object with sensitive information, it can potentially expose it in the logs.
Imagine if the SLF4J API took an explicit Argument instead of taking an Object:
Argument arg = createArgumentForFoo(foo);
logger.debug("some message {}", arg);
This means that if you passed in something that couldn't be turned into an Argument, it wouldn't compile:

// does not compile because there is no Argument for notGood
NoRepresentation notGood = new NoRepresentation();
logger.debug("some message {}", notGood);
Blindsight breaks down the domain of a logging statement into Markers, Message, and Argument. Blindsight leverages the type class pattern using Scala implicits to make these easier to work with: there are corresponding ToMarkers, ToMessage, and ToArgument type classes, respectively.
If you want to log a Lotto instance as an Argument, you should have a ToArgument[Lotto] in implicit scope. This can be done very easily by using the DSL to turn it into a BObject, and setting up the argument mapping from there.
case class Lotto(
  id: Long,
  winningNumbers: List[Int],
  winners: List[Winner],
  drawDate: Option[java.util.Date]
) {
  lazy val asBObject: BObject =
    "lotto" ->
      ("lotto-id" -> id) ~
        ("winning-numbers" -> winningNumbers) ~
        ("draw-date" -> drawDate.map(_.toString)) ~
        ("winners" -> winners.map(w => w.asBObject))
}

object Lotto {
  implicit val toArgument: ToArgument[Lotto] = ToArgument { lotto =>
    Argument(lotto.asBObject)
  }
}
val winners = List(
  Winner(23, List(2, 45, 34, 23, 3, 5)),
  Winner(54, List(52, 3, 12, 11, 18, 22))
)
val lotto = Lotto(5, List(2, 45, 34, 23, 7, 5, 3), winners, None)
logger.info("winning lotto ticket = {}", lotto)
As you can see, type classes pair very well with the DSL.
References
Structlog4j handles this with the IToLog interface. It is optional, though.
Scala-structlog handles this with the concept of Tags, and disallows arguments entirely.
Maple requires a LoggingSchema to log, and creates a typed logger with the given schema.
Slime has the concept of Valuable and Keyable.
Bunyan has the concept of serializers that define how an object is logged, with serializer functions for well-defined log field names – but they are explicit, and you can pass in raw objects, which could be unsafe.
Fluent API
I wanted to expose a fluent API for Blindsight.
SLF4J 1.x does not let you build up a statement, and does not have a fluent API. SLF4J 2.0 does have fluent logging, but it's still in alpha.
Here's the example:
fluentLogger.info
  .marker(someMarker)
  .message("some message with argument {}")
  .argument(argument)
  .message("exception {}")
  .cause(exception)
  .log()
Fluent logging is structured around the logging statement components, and is call-by-name. I went back and forth about adding call-by-name semantics to the other APIs, but ultimately decided against it. Call-by-name still creates many short-lived objects that can cause memory churn in large enough quantities, and it seems best to leverage conditional logging with when rather than make call-by-name the default.
Because I have a tendency to forget to add the placeholder {}, there's also an option that appends placeholders to the message, logWithPlaceholders:
fluentLogger.info
  .marker(someMarker)
  .message("some message with no placeholder")
  .argument(argument)
  .logWithPlaceholders()
Pretty straightforward.
References
Fluent builder style was popularized in 2005 by Martin Fowler, the same year that SLF4J was released. With the fluent builder style, the emphasis is more on building up state and behavior in a logger instance.
Fluent Loggers are relatively popular. Flogger, slf4j-fluent, logmachine, fluent-logging, godaddy-logger, and Log4J 2's LogBuilder are all examples.
Semantic API
I wanted to be able to set up a logger so that it specifically takes only a certain type. This is called a semantic API or a "strongly typed" API. This is very useful for handling operational and analytical events.
To do this, Blindsight has a Statement that is a composition of Markers, Message, and Arguments. There is a corresponding type class, ToStatement, which converts a domain object into a statement, and a type parameter on the logger that limits the logger's input.
Here's the example:
implicit val myEventToStatement: ToStatement[MyEvent] = ???
val eventLogger: SemanticLogger[MyEvent] = logger.semantic[MyEvent]
val myEvent = new MyEvent()
eventLogger.info(myEvent)
This is distinct from passing in a message and arguments – the event may be decomposed into markers, message, and arguments under the hood, but the logger is single-purpose.
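For illustration, a ToStatement instance might decompose an event like this – MyEvent's fields are made up, and I'm assuming a withArguments builder alongside the withMessage shown in the refined example below:

final case class MyEvent(name: String, payloadId: Long)

implicit val myEventToStatement: ToStatement[MyEvent] = ToStatement { event =>
  // Decompose the event into message and arguments under the hood.
  Statement()
    .withMessage("event {} with payload {}")
    .withArguments(Arguments(event.name, event.payloadId))
}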
Note the use of logger.semantic[MyEvent] to explicitly specify the type as a parameter. Leveraging Scala's type system means that we can also use refinement types. Between the refined library and a semantic logger, I can tell the compiler I only want to log URLs:
implicit def refinedStringToStatement[R]: ToStatement[Refined[String, R]] = ToStatement { str =>
  Statement().withMessage(str.value)
}
val urlLogger = logger.semantic[String Refined Url]
urlLogger.info(refineMV[Url]("http://google.com"))
This is checked at compile time.
References
The term "semantic logging" originates from the Event Tracing for Windows framework. In ETW, all events must extend from EventSource
. Log4J 2 also has a similar feature passing in a Message, but the implementations are dependent on the output format, not the domain.
Maple uses a logger with a parameterized type, but it's indirect – you're going through a LoggingSchema and would then require a single element.
Talaan and aQute have strongly typed logging but do it by defining a trait with specific methods, and do not depend on generics.
Both serilog and LogMachine have a message template format. I don't think these qualify as strongly typed because there is no type information in the template, i.e. printf style. Instead, serilog has message templates and logmachine has message formatting. I have not touched the Message functionality, but there is an opportunity in there to have richer template support.
I believe that SLF4J doesn't have generics because they were just too bleeding-edge at the time. Generics were added in Java 1.5 in 2004. SLF4J came out in 2005, and had to work with JDK 1.4 code. The closest analogue is the EventLogger, which uses an EventData object – Log4J 2 uses StructuredDataMessage but is otherwise identical.
Contextual Logging
I wanted Blindsight to be able to build up contextual logging, so that a statement could be enriched with additional context that wasn't explicitly in the statement.
In SLF4J, the way to do this is through markers. You can do far more with markers than just filter or trigger specific log messages – markers are an essential escape hatch in SLF4J that let you do everything from adding extra logic processing to playing music.
Here's the example:
val entryLogger = logger.withMarker(MarkerFactory.getMarker("ENTRY"))
entryLogger.trace("entry: entering method")
There are times that I want to associate information with a logger without having to explicitly log it. I wanted the ability to say "every time I write a logging statement, I want the http session id" and just have that happen. With every API.
SLF4J has an odd relationship with markers. Although markers are a contextual concern, they must be specifically applied in predicates. This means you have to apply them twice:
if (logger.isDebugEnabled(myMarker)) {
  logger.debug(myMarker, "some message");
}
This is prone to failure, because it's very easy to forget to pass in the marker twice. What we really want is something like:
Logger loggerWithMarker = LoggerFactory.getLogger(getClass()).withMarker(myMarker);
if (loggerWithMarker.isDebugEnabled()) {
  loggerWithMarker.debug("some message");
}
This is actually what we do under the hood, with a bit more logic for merging the logger's state markers with any marker passed in explicitly on the statement.
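In plain SLF4J terms, the merge is conceptually something like this rough sketch – Blindsight's actual Markers type does more bookkeeping:

import org.slf4j.{Marker, MarkerFactory}

// Fold the logger's state markers and the statement's marker into one
// aggregate marker, using SLF4J's marker references.
def merged(stateMarkers: Seq[Marker], statementMarker: Option[Marker]): Marker = {
  val container = MarkerFactory.getDetachedMarker("merged")
  (stateMarkers ++ statementMarker).foreach(container.add)
  container
}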
SLF4J does come with a Mapped Diagnostic Context, but MDC relies on thread-local storage, meaning that you will get the wrong MDC when code crosses an asynchronous boundary (a callback running on a different Executor), and it forces a reliance on bare strings. There are workarounds, but they require custom dispatchers, code, and configuration. Using a marker is far more flexible than MDC, and transferring a logger between threads is much more reliable. Using a marker also means that we don't have to pass around an implicit context between methods – instead, the logger itself can be passed as an implicit parameter if needed.
References
Bunyan did this first with log.child, to explicitly log context:
var bunyan = require('bunyan');
var log = bunyan.createLogger({name: 'myapp'});

function Wuzzle(options) {
  this.log = options.log.child({widget_type: 'wuzzle'});
  this.log.info('creating a wuzzle')
}
Wuzzle.prototype.woos = function () {
  this.log.warn('This wuzzle is woosey.')
}

log.info('start');
var wuzzle = new Wuzzle({log: log});
wuzzle.woos();
log.info('done');
Many of the fluent APIs allow you to build up statements, but they do so at the statement level, rather than the logger level. That is, they build up:
logger.info.withMarker(foo).log("hello")
Whereas what I want is:
logger.withMarker(foo).info("hello");
The second requirement is that the context is internal to the logger, and there are no extra implicit arguments. This is in contrast to several other Scala libraries that provide contextual logging, but do so through an implicit context.
scala-logging does this with LoggerTakingImplicit using a CanLog type class:
case class CorrelationId(value: String)

implicit case object CanLogCorrelationId extends CanLog[CorrelationId] {
  override def logMessage(originalMsg: String, a: CorrelationId): String = s"${a.value} $originalMsg"
}

implicit val correlationId = CorrelationId("ID")
val logger = Logger.takingImplicit[CorrelationId]("test")
logger.info("Test") // takes implicit correlationId and logs "ID Test"
One interesting thing about CanLog is that it includes an afterLog method that can be used to clean up after MDC.
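The MDC cleanup pattern looks roughly like this:

import org.slf4j.MDC

implicit case object CanLogCorrelationId extends CanLog[CorrelationId] {
  override def logMessage(originalMsg: String, a: CorrelationId): String = {
    MDC.put("correlationId", a.value) // set context before the statement renders
    originalMsg
  }
  override def afterLog(a: CorrelationId): Unit =
    MDC.remove("correlationId") // clean up so the value doesn't leak across statements
}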
Logoon does provide contextual logging, but does it by passing through an implicit LogContext as an argument:
def startTransaction(user: String, id: Int)(implicit context: LogContext): Unit =
  info("start transaction $id - $user")
Play Framework likewise has an implicit MarkerContext which also means you don't have to explicitly pass in the marker as an argument.
def methodInOtherExecutionContext()(implicit mc: MarkerContext): Result = {
  logger.debug("index: ") // same as above
  Ok("testing")
}
Structlog4J adds contextual logging, but it applies globally through a mandatory context lambda.
odin has the concept of contextual effects, but does so in a strict FP context.
serilog has an enrichment facility that looks broadly similar. There is a very good article on structured logging and context. The latest version even has selective enrichment to add markers on condition or at a specific log level.
logstage has a contextual logger feature:
val contextLogger: IzLogger = logger("key" -> "value")
Which is pretty much dead on.
One drawback of having the markers be immutable in Blindsight is that the context is not call-by-name at the point of logging. This means that you can't pull metrics automatically into a logging statement, as per ruby-cabin metrics or ougai's before-logging hooks. There is a way of doing this using custom converters with logback-sigar but it is specific to the encoder, not the statement.
Conditional Logging
I wanted to set up conditional logging on both the logger and the statement. This is vital, because we want to precisely limit and scope diagnostic logging to where it's most useful, without spamming the rest of the system.
SLF4J does conditional logging through isInfoEnabled() and isInfoEnabled(marker), using the marker for custom conditions – but this happens on the backend and is implementation-specific. Rather than doing that, we want a predicate evaluation that only logs when the predicate is true.
Here's how you do it with a logger:
def loggerCondition: Boolean = ???
val loggerOnCondition = logger.onCondition(loggerCondition)
loggerOnCondition.info("Only logs when condition is true")
Conditional logging can also be statement dependent:
val statementCondition: Boolean = ???
logger.info.when(statementCondition) { info =>
  info("log")
}
And you can combine them together:
val loggerOnCondition = logger.onCondition(loggerCondition)
loggerOnCondition.info.when(statementCondition) { info =>
  info("Only logs when condition is true")
}
Conditional logging is very useful in conjunction with tracer-bullet logging, where you set a flag which will cause the logging event to be rendered regardless of the stated level, using a turbofilter.
val loggerOnCondition = logger.onCondition(loggerCondition).withMarker(AlwaysLogMarker)
loggerOnCondition.trace("This will log even if the logger level is ERROR")
Conditional logging gives you access to do all sorts of cool things beyond simply adding a guard to diagnostic logging and tracing.
You can control logging in a particular component by time, or by duration. You can turn on debug logging in a component for five minutes after an exception occurs. You can turn off error logging at night when you know there’s scheduled maintenance.
You can control logging by feature flag. Depending on how your feature flag system is set up, this can also give you the ability to log by an HTTP session or specific user.
You can even control logging by monitoring memory pressure: if you’re concerned about expensive logging statements, you can throttle logging so that it only logs when the system has resources to spare.
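To make a couple of these concrete, here's a sketch of a time-boxed condition and a memory-pressure condition, built on onCondition with scala.concurrent.duration.Deadline and the JMX MemoryMXBean – the threshold is made up:

import java.lang.management.ManagementFactory
import scala.concurrent.duration._

// Log only for the five minutes after the deadline is created; onCondition is
// call-by-name, so the predicate is re-evaluated on every statement.
val deadline = 5.minutes.fromNow
val timedLogger = logger.onCondition(deadline.hasTimeLeft())

// Log only when the heap has headroom (getMax can be -1 if undefined).
def heapHasHeadroom: Boolean = {
  val heap = ManagementFactory.getMemoryMXBean.getHeapMemoryUsage
  heap.getMax > 0 && heap.getUsed.toDouble / heap.getMax < 0.9
}
val throttledLogger = logger.onCondition(heapHasHeadroom)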
One thing that consistently shows up everywhere is the idea of being able to change log levels in production without restarting the application. Conditional logging is a partial solution to this, not a complete one: it only works when specifically enabled and configured from code. This means it doesn't work with library code written directly against SLF4J. There is no SLF4J API for modifying log levels in production, and I've only done it using Logback.
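For reference, here's what that looks like with Logback – you have to reach under the SLF4J facade and cast to the implementation (the logger name here is a placeholder):

import ch.qos.logback.classic.{Level, LoggerContext}
import org.slf4j.LoggerFactory

// Works only when Logback is the bound SLF4J implementation.
val context = LoggerFactory.getILoggerFactory.asInstanceOf[LoggerContext]
context.getLogger("com.example.MyService").setLevel(Level.DEBUG)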
References
Terse-logback exploits markers to pass through the SLF4J predicates, and sets up a "turbo marker" which can interact with the filter. This can be used to do conditional logic on the backend:
String userId = "28";
ApplicationContext applicationContext = new ApplicationContext(userId);
UserMarkerFactory userMarkerFactory = new UserMarkerFactory();
userMarkerFactory.addUserId(userId);
UserMarker userMarker = userMarkerFactory.create(applicationContext);
logger.info(userMarker, "Hello world, I am info and log for everyone");
logger.debug(userMarker, "Hello world, I am debug and only log for user 28");
I wrote this code originally, and while I'm proud of it, it still ties a whole bunch of business logic into the logging framework. Ideally, you don't want your logging implementation involved in complex business logic. You want it to be dumb. You want the logging implementation to make two decisions:
- Given the logging level, marker and MDC, do I create a logging event?
- Does this logging event pass the filters for this appender?
So being able to do it on the front end is much, much simpler.
The libraries that do conditional logging tend to do so in response to system constraints, such as Swrve's rate limited logger:
// A class literal is needed here, since getClass() is not available in a static context.
private static final Logger logger = LoggerFactory.getLogger(MyClass.class);
private static final RateLimitedLog rateLimitedLog = RateLimitedLog
    .withRateLimit(logger)
    .maxRate(5).every(Duration.ofSeconds(10))
    .build();
Likewise, Flogger and slf4j-fluent have the concept of timed logging through atMostEvery and every – however, this is hardcoded with an internal representation, does not let the application decide when to log, and only exposes method-specific conditionals.
serilog has the concept of a filtering DSL that provides the event as input to the predicate, but it happens in logger configuration, not per statement:
Log.Logger = new LoggerConfiguration()
    .Filter.ByExcluding("Name = 'World'")
    .WriteTo.LiterateConsole()
    .CreateLogger();
This is similar functionality to Logback's Janino-based GEventEvaluator, but has some additional type safety built in. It does bring up the possibility of per-statement conditional logging based on Blindsight AST evaluation, which would be a fun project.
beautiful_logger uses very fancy method handling that can fully inline any calls to the Logger API without decrementing your inlining budget. However, those conditionals are tied to a logger being globally enabled or disabled, and it does not expose a condition for logging.
zerolog provides an alternative logging facade to SLF4J with similar intent to beautiful_logger – minimizing the cost of logging by using a fluent API – and has an admirably informative FAQ. It does not talk about the memory churn of using a lambda as an argument, though.
Odin has conditional logging, although from a functional perspective.
Flow Control
I wanted to add trace entry and exit methods, and have logging tell me the arguments and return values of a method automatically. This is known as flow control, causality tracking, or tracing.
SLF4J does not know if you’re entering, exiting, or erroring from a method, and cannot render the arguments from a method automatically. There is an SLF4J extension called XLogger which lets you add entry and exit methods, but that’s something you have to do by hand – it's not done automatically.
I wanted something seamless, like:

public int foo(int arg1, int arg2) {
  return arg1 + arg2;
}

int result = foo(1, 2); // entry, arguments, and result logged automatically
Here's how Blindsight does it, which is pretty close:
private implicit def flowBehavior[B: ToArgument]: FlowBehavior[B] = new SimpleFlowBehavior

def flowMethod(arg1: Int, arg2: Int): Int = logger.flow.trace {
  arg1 + arg2
}
With the output of:
e.f.SimpleFlow - arg1=1,arg2=2 => 3 at example.flow.SimpleFlow.flowMethod(SimpleFlow.scala:40)
I can change the logging behavior without touching the code. Because I can swap out the FlowBehavior, I can change what logging statements come out. This means I can send the appropriate manual tracing to do something fancy like Honeycomb Tracing.
Flow logging can also be used as a timer operation, by adding a start time on entry and comparing it against the exit time.
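The timing itself is trivial – here's a rough sketch of the idea, independent of the actual FlowBehavior API:

// Hypothetical sketch: time a block and log the elapsed duration. In Blindsight
// this logic would live in a custom FlowBehavior so call sites stay unchanged.
def timed[A](name: String)(block: => A): A = {
  val start = System.nanoTime()
  try block
  finally {
    val elapsedMillis = (System.nanoTime() - start) / 1e6
    logger.trace(s"$name took $elapsedMillis ms")
  }
}

val sum = timed("flowMethod")(1 + 2)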
Note that this is distinct from opentracing in that it's tied directly to the method it's declared in. This follows the convention I've seen in tracing statements, where they're really only used for entry and exit from a method. I looked at implementing an implicit span, much like the Rust tracing library, but it seemed like overkill.
References
terse-logback has bytecode instrumentation that automatically adds entry/exit logging statements to third-party code through configuration. This can do useful things like show the flow of JSSE certificate validation.
jackplay is a JVM agent that has a run-time HTTP server that lets you add tracing to third-party methods dynamically. However, it doesn't go through SLF4J and writes to its own log file.
Byteman seems similar to jackplay in that it's a jvm agent that can add rules into the system, potentially adding SLF4J statements. There is a programmer's guide and it is on github, but the closest thing seems to be a Wildfly Logging Extension.
youdebug is a JDI-level tool that can evaluate expressions and print them through Groovy scripts. This looks like it hasn't been touched in a while, but being able to attach a debugger to a process in production without restarting it, and then evaluating arguments and results from a script, would neatly short-circuit manual tracing.
stackparam is a JVM agent using JVMTI to add method parameters to Java stack traces. Leveraging the JVMTI can be more powerful when decoupled from an IDE debugger and the user-driven input cycle – for example, jvmti-tools provides rich NullPointerExceptions on JDK 1.8, essentially backporting JDK-8218628, and can trace JVM events.
jcabi-aspects does this using AspectJ annotations. Spring AOP also appears to use the same method although I haven't tried it myself.
Opentracing, despite the name, is more oriented towards tracing services with regards to latency, and has less of a focus on tracking state transitions and computation in a single process.
Foolscap, a network debugger, has causality tracing, which it ascribes to Causeway, a debugger written in E. Causeway is fascinating and worth a blog post in itself, but that's a discussion for another time.
Eliot has the concept of wrapping code blocks in actions, which result in messages, and refers back to Foolscap. There's a nice article.
Treelog creates an in-memory tree from a series of computations, but does not tie into a logging framework and applies an explicit DescribedComputation effect – good for FP work if you're using tagless final, not good if you want to apply logging without changing your return types.
Précepte is "an opinionated purely functional & lazy API stacking some useful monads to help you observe the execution of your runtime effects by instrumenting your code with a managed state propagated all along your business workflow." It is written in FP Scala, and has the concept of contextualized logs and an execution graph.
I am surprised that tracing appears to be more about inferring, speculating, or reconstructing execution flow, rather than adding disabled "entry/exit" instrumentation "everywhere" by default and then sampling by metrics driven hotspots or through tracer bullets. I would have thought this would fall out naturally from chaos engineering or fault injection, but it seems Not To Be A Thing.
Compile Time Source Information
I wanted to have the source code line, file, and argument information available without runtime costs, similar to Python's __name__, C++'s __LINE__, or Ruby's __FILE__.
Being able to track a logging statement back to a source is tremendously useful, especially when the message itself is not enough to determine what the underlying cause is.
Conversely, many programmers have found themselves utterly flummoxed by a logging statement that appears to come from nowhere and cannot be cross referenced.
It is critical that source code information be available at compile time, as gathering caller data at runtime can be very expensive, and some logging frameworks, such as slf4j-simple, do not provide any caller data at all.
This is provided automatically through Li Haoyi's sourcecode. This is an external dependency on another Scala library, but I think it's worth it.
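As a sketch of the mechanism: sourcecode fills in implicits at each call site at compile time, so a logging method can pick up location information without runtime reflection (traceEntry here is a made-up method):

import sourcecode.{Enclosing, File, Line}

// The compiler supplies these implicits at each call site at compile time.
def traceEntry()(implicit line: Line, file: File, enclosing: Enclosing): Unit =
  println(s"entering ${enclosing.value} at ${file.value}:${line.value}")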
References
POS (short for POSition) uses Scala macros to provide compile-time information in logging, but it depends directly on Logback and does not have argument information.
stackparam allows programmatic value access of caller information, produced natively and is reported to be high performing.
Flogger has the concept of a LogSite, but it must be provided explicitly. Likewise, Log4J 2 has a withLocation method in the LogBuilder that asks for the stack trace entry at a fixed index.
Logback-bytebuddy adds source code information to logging statements by inspecting JSR-45 debug data in class files.
Extensible APIs
I wanted people to be able to put together their own domain specific logging APIs using Blindsight's classes. I have no way of knowing how successful this is until someone else tries it, but I'll go over it briefly so that you can get a feel.
The basic unit of Blindsight is the LoggerComponent, which takes a Predicate and a Method. You define a logger by passing in the kind of Predicate you want and the kind of Method you want.
trait LoggerComponent[P, M] {
  type Predicate <: P
  type Method <: M
  type Self
}
There's some infrastructure to get the logger set up:
class MyLogger(underlying: org.slf4j.Logger, markers: Markers)
    extends MyLogger.Base[MyMethod](underlying, markers) {
  override protected def newInstance(
      underlying: org.slf4j.Logger,
      markerState: Markers
  ): Self = new MyLogger(underlying, markerState)
  override protected def newMethod(level: Level): MyMethod =
    new MyMethod.Impl(level, this)
  override def onCondition(test: => Boolean): MyLogger =
    new MyLogger.Conditional(test, this)
}
And then you can go on to implement MyMethod, which can take special implicits and whatnot:
trait MyMethod {
  def apply[M: ToMessage](
      message: M
  )(implicit line: Line, file: File, enclosing: Enclosing, specialContext: MyContext): Unit
  // ...
}
The hook to the underlying SLF4J logger is run through the ParameterList:
protected val parameterList: ParameterList = logger.parameterList(level)
which is in turn broken down by logger level, for example Info:
class Info(logger: org.slf4j.Logger) extends ParameterList(Level.INFO, logger) {
  override def executePredicate(): Boolean = logger.isInfoEnabled
  override def executePredicate(marker: Marker): Boolean = logger.isInfoEnabled(marker)
  override def message(msg: String): Unit = logger.info(msg)
  // ... rest of the message calls
  override def markerMessage(marker: Marker, msg: String): Unit = logger.info(marker, msg)
  // ... rest of the marker message calls
}
This means you only have to provide the method and pass the level to the ParameterList. It'll do the rest.
Providing API options leaves the door open for additional functionality. In particular, the conditional and contextual logging APIs work best when paired with a domain-specific API – rather than provide a rate-limiting API or a time-based scheduling API for logging, it's better to expose a low-level boolean condition and let end users provide something at a higher level. For example, you can reimplement flogger's every(100) or atMostEvery(1, SECONDS) in your own API, or require that every logger method take an implicit context.
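For instance, an every(100)-style sampler is a few lines on top of onCondition – a sketch, with a hypothetical everyN helper:

import java.util.concurrent.atomic.AtomicLong

// Log only every nth statement; onCondition re-evaluates the predicate each time.
def everyN(n: Long): () => Boolean = {
  val counter = new AtomicLong()
  () => counter.getAndIncrement() % n == 0
}

val sampled = everyN(100)
val sampledLogger = logger.onCondition(sampled())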
Note that this is distinct from passing in domain-specific markers or arguments, or providing a logging schema. Logging APIs can add specific behavior, such as filtering or transforming an AST, or passing logging statements through metrics.
References
I don't know of any other APIs that have the specific intent of additional behavior, but there are some libraries that do things that Blindsight deliberately doesn't do.
Godaddy-logger has a number of interesting features that don't really fit with Blindsight, like a collection filter, and custom mappers that can be placed directly on a logger – Blindsight assumes that type class instances are defined in companion objects or relies on the implicits import tax.
Flogger is aware of this, with context-specific grammar:

logger.at(WARNING).whenSystem().isLowOnMemory().log("");

This appears to be only in the Javadoc, and it's not clear how you'd extend Flogger from the documentation.
Conclusion
Blindsight is the logging library I always wanted, and I'm looking forward to leveraging it in all my future projects. It does all the things I need, and makes logging safer, easier, and more precise. Contextual and conditional logging provide more powerful loggers, the APIs provide options, and type classes and the DSL provide the connective tissue for structured logging.
I hope this sparks a conversation about what logging could be, and what we can do to make better tooling for fixing bugs and increasing the observability of code. Thank you for your time.