<p><strong>TL;DR</strong> Echopraxia is a structured logging API for Java and Scala. I've released Echopraxia 3.0, which has a number of new features, most notably more control over presentation logic, custom typed attributes, better exception handling, and removing hardcoded dependencies.</p>
<p>You can check it out at <a href="https://github.com/tersesystems/echopraxia/">https://github.com/tersesystems/echopraxia/</a> or check out the new <a href="https://tersesystems.github.io/echopraxia/">documentation site</a>.</p>
<p>This is going to be a development log going into technical details, explaining the why behind the how.</p>
<h2 id="presentation-logic">Presentation Logic</h2>
<p>Echopraxia's API is built around structured logging input. In practical terms, that means when you write:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"{} logged in"</span><span class="o">,</span> <span class="n">fb</span> <span class="o">-></span> <span class="n">fb</span><span class="o">.</span><span class="na">user</span><span class="o">(</span><span class="s">"user"</span><span class="o">,</span> <span class="n">thisUser</span><span class="o">));</span>
</code></pre></div></div>
<p>Then you expect to see a line oriented output that looks something like:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>INFO user={<some human readable data here>} logged in
</code></pre></div></div>
<p>And you expect that the JSON output will be:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="s2">"level"</span><span class="p">:</span><span class="w"> </span><span class="s2">"INFO"</span><span class="p">,</span><span class="w">
</span><span class="s2">"message"</span><span class="p">:</span><span class="w"> </span><span class="s2">"user={<some human readable data here>} logged in"</span><span class="p">,</span><span class="w">
</span><span class="s2">"user"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="err"><some</span><span class="w"> </span><span class="err">machine</span><span class="w"> </span><span class="err">readable</span><span class="w"> </span><span class="err">data</span><span class="w"> </span><span class="err">here></span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>The problem: the user may see irrelevant junk inside of <code class="language-plaintext highlighter-rouge">user={...}</code> when they really only care about <code class="language-plaintext highlighter-rouge">id</code> and <code class="language-plaintext highlighter-rouge">role</code>. The machine-readable side may also want JSON in a particular format – Elasticsearch wants stable field names and stable mappings, and has trouble understanding deeply nested objects and arrays. Or there may be additional data that isn't explicitly called out in the field: <a href="https://paulfrazee.medium.com/pauls-notes-on-how-json-ld-works-965732ea559d">JSON-LD</a> includes a <code class="language-plaintext highlighter-rouge">@type</code> field that is used in typed values, e.g. a timestamp may have a type of <code class="language-plaintext highlighter-rouge">http://www.w3.org/2001/XMLSchema#dateTime</code>, but that isn't relevant to the human.</p>
<p>The paradox here is that although structured logging involves packaging arguments into a structured format, the presentation of that data is very different between machine-readable format and "ergonomic" human-readable format. While logfmt is a recognizable and compact format, it's still machine based – it does not care about what is most relevant for a human to see. Meanwhile, from the end user's perspective, there's a loss of utility in rendering structured data: they used to be able to control the presentation exactly with <code class="language-plaintext highlighter-rouge">toString</code>, and now they can't.</p>
<p>This issue compounds when we start getting into complex objects and arrays. When rendering an AST, batches of paginated data, or encrypted data, there's an issue of presentation. Should the user see the entire AST, or only the relevant bits? Does the user care about the contents of the batch, or just that it's the right length? Should the user see the unencrypted data, or should it be filtered or invisible?</p>
<h3 id="presentation-hints">Presentation Hints</h3>
<p>The solution in 3.0 is to add typed attributes to fields. These attributes are used to add extra metadata to a field, so that a formatter has more to work with than just the name and value. Then we can add some <a href="https://tersesystems.github.io/echopraxia/3.0.0/usage/fieldbuilder/#field-presentation">presentation hints</a> to specialize fields so that a formatter can decide how to render this field in particular, and extend the <code class="language-plaintext highlighter-rouge">Field</code> type to <code class="language-plaintext highlighter-rouge">PresentationField</code> with some extra methods to provide those hints.</p>
<p>For example, one of the hints is <a href="https://tersesystems.github.io/echopraxia/3.0.0/usage/fieldbuilder/#ascardinal">asCardinal</a>, which renders a field as a <a href="https://en.wikipedia.org/wiki/Cardinal_number">cardinal number</a> in a line oriented format. This is most useful for very long strings and arrays:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">log</span><span class="o">.</span><span class="na">debug</span><span class="o">(</span><span class="s">"{}"</span><span class="o">,</span> <span class="n">fb</span> <span class="o">-></span> <span class="n">fb</span><span class="o">.</span><span class="na">array</span><span class="o">(</span><span class="s">"elements"</span><span class="o">,</span> <span class="mi">1</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">3</span><span class="o">).</span><span class="na">asCardinal</span><span class="o">());</span>
</code></pre></div></div>
<p>renders as:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>elements=|3|
</code></pre></div></div>
<p>Other useful presentation hints are <a href="https://tersesystems.github.io/echopraxia/3.0.0/usage/fieldbuilder/#aselided">asElided</a> which will "skip" a field so it doesn't show in the <code class="language-plaintext highlighter-rouge">toString</code> formatter, and <a href="https://tersesystems.github.io/echopraxia/3.0.0/usage/fieldbuilder/#abbreviateafter">abbreviateAfter</a> which truncates a string or array after a number of elements.</p>
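<p>As a sketch of how these hints combine in a single statement (the <code class="language-plaintext highlighter-rouge">sessionToken</code> string is hypothetical, and the exact rendering depends on your formatter):</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>log.debug("{} {}", fb -> fb.list(
    fb.string("sessionToken", sessionToken).asElided(), // hidden in line-oriented output
    fb.array("batch", 1, 2, 3, 4, 5, 6, 7).abbreviateAfter(5) // truncated after five elements
));
</code></pre></div></div>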
<h3 id="structured-format">Structured Format</h3>
<p>While it's nice to be able to customize values, there will be cases where the string we want a human to see is not the string we want the machine to see. Take the case of a duration:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">log</span><span class="o">.</span><span class="na">debug</span><span class="o">(</span><span class="s">"{}"</span><span class="o">,</span> <span class="n">fb</span> <span class="o">-></span> <span class="n">fb</span><span class="o">.</span><span class="na">duration</span><span class="o">(</span><span class="s">"duration"</span><span class="o">,</span> <span class="n">Duration</span><span class="o">.</span><span class="na">ofDays</span><span class="o">(</span><span class="mi">1</span><span class="o">)));</span>
</code></pre></div></div>
<p>We want to render this in a human readable format:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"1 day"
</code></pre></div></div>
<p>But we want to see the <a href="https://en.wikipedia.org/wiki/ISO_8601#Durations">ISO duration</a> format in JSON:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{"duration": "PT24H"}
</code></pre></div></div>
<p>Simply rendering a <code class="language-plaintext highlighter-rouge">Value.string</code> won't work here, and overriding <code class="language-plaintext highlighter-rouge">toString</code> won't be enough. Instead, we have to provide both human and machine values. We can do that by passing a string with the human value, and using <code class="language-plaintext highlighter-rouge">withStructuredFormat</code> for the machine value:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <span class="nc">MyFieldBuilder</span> <span class="kd">extends</span> <span class="n">PresentationFieldBuilder</span> <span class="o">{</span>
<span class="kd">public</span> <span class="n">PresentationField</span> <span class="nf">duration</span><span class="o">(</span><span class="n">String</span> <span class="n">name</span><span class="o">,</span> <span class="n">Duration</span> <span class="n">duration</span><span class="o">)</span> <span class="o">{</span>
<span class="n">Field</span> <span class="n">structuredField</span> <span class="o">=</span> <span class="n">string</span><span class="o">(</span><span class="n">name</span><span class="o">,</span> <span class="n">duration</span><span class="o">.</span><span class="na">toString</span><span class="o">());</span>
<span class="k">return</span> <span class="nf">string</span><span class="o">(</span><span class="n">name</span><span class="o">,</span> <span class="n">duration</span><span class="o">.</span><span class="na">toDays</span><span class="o">()</span> <span class="o">+</span> <span class="s">" day"</span><span class="o">)</span>
<span class="o">.</span><span class="na">asValueOnly</span><span class="o">()</span>
<span class="o">.</span><span class="na">withStructuredFormat</span><span class="o">(</span><span class="k">new</span> <span class="n">SimpleFieldVisitor</span><span class="o">()</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="nd">@NotNull</span> <span class="n">Field</span> <span class="nf">visitString</span><span class="o">(</span><span class="nd">@NotNull</span> <span class="n">Value</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">stringValue</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">structuredField</span><span class="o">;</span>
<span class="o">}</span>
<span class="o">});</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>This <code class="language-plaintext highlighter-rouge">withStructuredFormat</code> method adds an attribute that takes a <code class="language-plaintext highlighter-rouge">FieldVisitor</code> interface, following the <a href="https://en.wikipedia.org/wiki/Visitor_pattern">visitor pattern</a>. Here, we only care about swapping out the string value, so <code class="language-plaintext highlighter-rouge">visitString</code> is all that's required.</p>
<p>This also covers the case where we want to render extra information or do some transformation for the machine, so we could add <code class="language-plaintext highlighter-rouge">@type</code> information for JSON-LD:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"{}"</span><span class="o">,</span> <span class="n">fb</span> <span class="o">-></span> <span class="n">fb</span><span class="o">.</span><span class="na">instant</span><span class="o">(</span><span class="s">"startTime"</span><span class="o">,</span> <span class="n">Instant</span><span class="o">.</span><span class="na">ofEpochMillis</span><span class="o">(</span><span class="mi">0</span><span class="o">)));</span>
</code></pre></div></div>
<p>in text format:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>startTime=1970-01-01T00:00:00Z
</code></pre></div></div>
<p>and in JSON:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="s2">"startTime"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="s2">"@type"</span><span class="p">:</span><span class="s2">"http://www.w3.org/2001/XMLSchema#dateTime"</span><span class="p">,</span><span class="w">
</span><span class="s2">"@value"</span><span class="p">:</span><span class="s2">"1970-01-01T00:00:00Z"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>and we can even cover this for the array case:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"{}"</span><span class="o">,</span> <span class="n">fb</span> <span class="o">-></span> <span class="n">fb</span><span class="o">.</span><span class="na">instantArray</span><span class="o">(</span><span class="s">"instantArray"</span><span class="o">,</span> <span class="n">List</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">Instant</span><span class="o">.</span><span class="na">ofEpochMillis</span><span class="o">(</span><span class="mi">0</span><span class="o">))));</span>
</code></pre></div></div>
<p>produces:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="s2">"instantArray"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="s2">"@type"</span><span class="p">:</span><span class="s2">"http://www.w3.org/2001/XMLSchema#dateTime"</span><span class="p">,</span><span class="s2">"@value"</span><span class="p">:</span><span class="s2">"1970-01-01T00:00:00Z"</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>And here's the implementation:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <span class="nc">InstantFieldBuilder</span> <span class="kd">implements</span> <span class="n">PresentationFieldBuilder</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">FieldVisitor</span> <span class="n">instantVisitor</span> <span class="o">=</span> <span class="k">new</span> <span class="n">InstantFieldVisitor</span><span class="o">();</span>
<span class="kd">public</span> <span class="n">PresentationField</span> <span class="nf">instant</span><span class="o">(</span><span class="n">String</span> <span class="n">name</span><span class="o">,</span> <span class="n">Instant</span> <span class="n">instant</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="nf">string</span><span class="o">(</span><span class="n">name</span><span class="o">,</span> <span class="n">instant</span><span class="o">.</span><span class="na">toString</span><span class="o">()).</span><span class="na">withStructuredFormat</span><span class="o">(</span><span class="n">instantVisitor</span><span class="o">);</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="n">PresentationField</span> <span class="nf">instantArray</span><span class="o">(</span><span class="n">String</span> <span class="n">name</span><span class="o">,</span> <span class="n">List</span><span class="o"><</span><span class="n">Instant</span><span class="o">></span> <span class="n">instants</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">fb</span><span class="o">.</span><span class="na">array</span><span class="o">(</span><span class="n">name</span><span class="o">,</span> <span class="n">Value</span><span class="o">.</span><span class="na">array</span><span class="o">(</span><span class="n">i</span> <span class="o">-></span> <span class="n">Value</span><span class="o">.</span><span class="na">string</span><span class="o">(</span><span class="n">i</span><span class="o">.</span><span class="na">toString</span><span class="o">()),</span> <span class="n">instants</span><span class="o">))</span>
<span class="o">.</span><span class="na">withStructuredFormat</span><span class="o">(</span><span class="n">instantVisitor</span><span class="o">);</span>
<span class="o">}</span>
<span class="kd">class</span> <span class="nc">InstantFieldVisitor</span> <span class="kd">extends</span> <span class="n">SimpleFieldVisitor</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="nd">@NotNull</span> <span class="n">Field</span> <span class="nf">visitString</span><span class="o">(</span><span class="nd">@NotNull</span> <span class="n">Value</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">stringValue</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="nf">typedInstant</span><span class="o">(</span><span class="n">name</span><span class="o">,</span> <span class="n">stringValue</span><span class="o">);</span>
<span class="o">}</span>
<span class="n">PresentationField</span> <span class="nf">typedInstant</span><span class="o">(</span><span class="n">String</span> <span class="n">name</span><span class="o">,</span> <span class="n">Value</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">v</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="nf">object</span><span class="o">(</span><span class="n">name</span><span class="o">,</span> <span class="n">typedInstantValue</span><span class="o">(</span><span class="n">v</span><span class="o">));</span>
<span class="o">}</span>
<span class="n">Value</span><span class="o">.</span><span class="na">ObjectValue</span> <span class="nf">typedInstantValue</span><span class="o">(</span><span class="n">Value</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">v</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">Value</span><span class="o">.</span><span class="na">object</span><span class="o">(</span>
<span class="n">string</span><span class="o">(</span><span class="s">"@type"</span><span class="o">,</span> <span class="s">"http://www.w3.org/2001/XMLSchema#dateTime"</span><span class="o">),</span> <span class="n">keyValue</span><span class="o">(</span><span class="s">"@value"</span><span class="o">,</span> <span class="n">v</span><span class="o">));</span>
<span class="o">}</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="nd">@NotNull</span> <span class="n">ArrayVisitor</span> <span class="nf">visitArray</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">InstantArrayVisitor</span><span class="o">();</span>
<span class="o">}</span>
<span class="kd">class</span> <span class="nc">InstantArrayVisitor</span> <span class="kd">extends</span> <span class="n">SimpleArrayVisitor</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">visitStringElement</span><span class="o">(</span><span class="n">Value</span><span class="o">.</span><span class="na">StringValue</span> <span class="n">stringValue</span><span class="o">)</span> <span class="o">{</span>
<span class="k">this</span><span class="o">.</span><span class="na">elements</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="n">typedInstantValue</span><span class="o">(</span><span class="n">stringValue</span><span class="o">));</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<h3 id="field-creation">Field Creation</h3>
<p>This covers the cases I can think of, but to make it really work, it has to be extensible so that users can add their own custom methods, e.g. an <code class="language-plaintext highlighter-rouge">asDecrypted()</code> hint that controls how an encrypted value is rendered. So instead of returning a <code class="language-plaintext highlighter-rouge">PresentationField</code>, we need to work with the field as a generic user-defined type extending <code class="language-plaintext highlighter-rouge">Field</code>.</p>
<p>Field creation in Echopraxia comes down to a factory method:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Field</span> <span class="n">field</span> <span class="o">=</span> <span class="n">Field</span><span class="o">.</span><span class="na">keyValue</span><span class="o">(</span><span class="n">name</span><span class="o">,</span> <span class="n">value</span><span class="o">);</span>
</code></pre></div></div>
<p>This has to be changed so that it takes a <code class="language-plaintext highlighter-rouge">Class<T></code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PresentationField field = Field.keyValue(name, value, PresentationField.class);
</code></pre></div></div>
<p>This also means that users can modify field creation in general, so it can be extended with metrics, validation, caching, etc.</p>
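<p>As a rough sketch of what this enables, assuming nothing beyond the three-argument factory method above (the <code class="language-plaintext highlighter-rouge">fieldCounter</code> metric is hypothetical), a field builder could funnel all creation through one place:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.concurrent.atomic.AtomicLong;

public class MetricsFieldBuilder implements PresentationFieldBuilder {
  // hypothetical metric: counts every field this builder creates
  private static final AtomicLong fieldCounter = new AtomicLong();

  public PresentationField keyValue(String name, Value<?> value) {
    fieldCounter.incrementAndGet();
    return Field.keyValue(name, value, PresentationField.class);
  }
}
</code></pre></div></div>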
<h2 id="field-builders">Field Builders</h2>
<p>The other change to Echopraxia is the removal of <code class="language-plaintext highlighter-rouge">FieldBuilder</code> as the upper bound on the loggers' type parameter.</p>
<p>Before, you could do the following using <code class="language-plaintext highlighter-rouge">Logger<?></code> and it would act like it was <code class="language-plaintext highlighter-rouge">Logger<FieldBuilder></code>:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Logger</span><span class="o"><?></span> <span class="n">logger</span> <span class="o">=</span> <span class="n">LoggerFactory</span><span class="o">.</span><span class="na">getLogger</span><span class="o">(</span><span class="n">getClass</span><span class="o">());</span>
</code></pre></div></div>
<p>This no longer works in 3.0 and the default is <code class="language-plaintext highlighter-rouge">PresentationFieldBuilder</code>, not <code class="language-plaintext highlighter-rouge">FieldBuilder</code>:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Logger</span><span class="o"><</span><span class="n">PresentationFieldBuilder</span><span class="o">></span> <span class="n">logger</span> <span class="o">=</span> <span class="n">LoggerFactory</span><span class="o">.</span><span class="na">getLogger</span><span class="o">(</span><span class="n">getClass</span><span class="o">());</span>
</code></pre></div></div>
<p>If you still want to use <code class="language-plaintext highlighter-rouge">FieldBuilder</code> or your own custom instance, then you can now pass a field builder as an argument (instead of having to call <code class="language-plaintext highlighter-rouge">withFieldBuilder</code>):</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Logger</span><span class="o"><</span><span class="n">MyFieldBuilder</span><span class="o">></span> <span class="n">logger</span> <span class="o">=</span> <span class="n">LoggerFactory</span><span class="o">.</span><span class="na">getLogger</span><span class="o">(</span><span class="n">getClass</span><span class="o">(),</span> <span class="n">MyFieldBuilder</span><span class="o">.</span><span class="na">instance</span><span class="o">());</span>
</code></pre></div></div>
<p>There were two justifications initially for using <code class="language-plaintext highlighter-rouge">Logger<FB extends FieldBuilder></code>: minimizing verbosity and providing a minimal set of functionality for building and extending fields. The first justification is weak, and the second is offset by the assumptions that <code class="language-plaintext highlighter-rouge">FieldBuilder</code> makes for the user.</p>
<h3 id="minimizing-verbosity">Minimizing Verbosity</h3>
<p>At the time that Echopraxia was first sketched out, JDK 1.8 was much more popular. This is no longer the case – JDK 11 is long in the tooth now, and JDK 17 is the standard. The language has evolved, and now has type inference.</p>
<p>This means that if you're deriving loggers in a method, you'll use <code class="language-plaintext highlighter-rouge">var</code>:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <span class="nc">Foo</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">doStuff</span><span class="o">(</span><span class="n">Instant</span> <span class="n">startTime</span><span class="o">)</span> <span class="o">{</span>
<span class="n">var</span> <span class="n">log</span> <span class="o">=</span> <span class="n">logger</span><span class="o">.</span><span class="na">withFields</span><span class="o">(</span><span class="n">fb</span> <span class="o">-></span> <span class="n">fb</span><span class="o">.</span><span class="na">instant</span><span class="o">(</span><span class="s">"startTime"</span><span class="o">,</span> <span class="n">startTime</span><span class="o">))</span>
<span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"doStuff: make things happen"</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>And if you're defining a static final logger, you're not going to be bothered because <code class="language-plaintext highlighter-rouge">private static final</code> is already an up front cost:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <span class="nc">Foo</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">Logger</span><span class="o"><</span><span class="n">PresentationFieldBuilder</span><span class="o">></span> <span class="n">logger</span> <span class="o">=</span>
<span class="n">LoggerFactory</span><span class="o">.</span><span class="na">getLogger</span><span class="o">(</span><span class="n">Foo</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
<span class="o">}</span>
</code></pre></div></div>
<p>So this is a moot point.</p>
<h3 id="hidden-assumptions">Hidden Assumptions</h3>
<p>The other problem with <code class="language-plaintext highlighter-rouge"><FB extends FieldBuilder></code> is that the <code class="language-plaintext highlighter-rouge">FieldBuilder</code> interface makes too many assumptions about what the statement writer should know, instead of putting that power in the hands of the developer writing the field builder.</p>
<p>Let's bring it down to a single statement:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"{}"</span><span class="o">,</span> <span class="n">fb</span> <span class="o">-></span> <span class="n">fb</span><span class="o">.</span><span class="na">string</span><span class="o">(</span><span class="s">"operation"</span><span class="o">,</span> <span class="s">"add"</span><span class="o">));</span>
</code></pre></div></div>
<p>I'm still happy with the requirement of an <code class="language-plaintext highlighter-rouge">fb</code> handle for constructing arguments here. Ideally, I'd like a magic static import function for the handle:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">log</span><span class="o">.</span><span class="n">info</span><span class="o">(</span><span class="s">"{}"</span><span class="o">,</span> <span class="k">import</span> <span class="nn">_</span> <span class="o">-></span> <span class="n">string</span><span class="o">(</span><span class="s">"operation"</span><span class="o">,</span> <span class="s">"add"</span><span class="o">));</span>
</code></pre></div></div>
<p>Or some kind of magic tuple:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">log</span><span class="o">.</span><span class="n">info</span><span class="o">(</span><span class="s">"{} {}"</span><span class="o">,</span> <span class="s">"startTime"</span> <span class="o">-></span> <span class="n">startTime</span><span class="o">,</span> <span class="s">"endTime"</span> <span class="o">-></span> <span class="n">endTime</span><span class="o">);</span>
</code></pre></div></div>
<p>But for what Java is, adding an <code class="language-plaintext highlighter-rouge">fb.</code> prefix to everything is fine.</p>
<p>Everything after the <code class="language-plaintext highlighter-rouge">fb.</code> is not fine, because it makes three different assumptions.</p>
<h4 id="exposing-field">Exposing Field</h4>
<p>The first assumption <code class="language-plaintext highlighter-rouge">FieldBuilder</code> makes is to expose <code class="language-plaintext highlighter-rouge">Field</code> as the return type instead of <code class="language-plaintext highlighter-rouge"><F extends Field></code>.</p>
<p>I've already gone over the problem with hardcoding <code class="language-plaintext highlighter-rouge">Field</code>, but it's worth noting because this is baked into the Logger itself. There's no way a developer can swap that out using <code class="language-plaintext highlighter-rouge">withFieldBuilder</code> – it's part of the public API.</p>
<h4 id="exposing-primitives">Exposing Primitives</h4>
<p>The second assumption that <code class="language-plaintext highlighter-rouge">FieldBuilder</code> makes is to expose the infoset primitives (<code class="language-plaintext highlighter-rouge">string</code>, <code class="language-plaintext highlighter-rouge">boolean</code>, <code class="language-plaintext highlighter-rouge">number</code>) and the complex infoset types (<code class="language-plaintext highlighter-rouge">array</code>, <code class="language-plaintext highlighter-rouge">object</code>) as part of the API, <em>and</em> it also exposes <code class="language-plaintext highlighter-rouge">keyValue</code> and <code class="language-plaintext highlighter-rouge">value</code> for the underlying <code class="language-plaintext highlighter-rouge">Value</code> objects.</p>
<p>This is a problem on multiple levels. It puts the power of definition in the statement, rather than in the field builder. More importantly, it ties the hands of the developer because what's being passed in is a representation, rather than the object to represent.</p>
<p>Simply put, there's no such thing as a string. A string is a textual representation of something meaningful in the domain: a name, an address, a debug representation of a syntax tree. A boolean is a representation of a feature flag, and so on.</p>
<p>So rather than letting the user pass in a string:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"{}"</span><span class="o">,</span> <span class="n">fb</span> <span class="o">-></span> <span class="n">fb</span><span class="o">.</span><span class="na">string</span><span class="o">(</span><span class="s">"operation"</span><span class="o">,</span> <span class="s">"add"</span><span class="o">));</span>
</code></pre></div></div>
<p>The developer could require the user to categorize the data as "program flow":</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>log.info("{}", fb -> fb.flow("operation", "add"));
</code></pre></div></div>
<p>or even better, expose a DSL:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>log.info("{}", fb -> fb.flow.operation("add"));
</code></pre></div></div>
<p>The point here is not that this is an ideal DSL, but that the developer should be able to decide how permissive or restrictive the field builder API is. Specifying that the logger extends <code class="language-plaintext highlighter-rouge">FieldBuilder</code> is removing that choice.</p>
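<p>As a sketch of that DSL, assuming only the standard <code class="language-plaintext highlighter-rouge">string</code> method (the <code class="language-plaintext highlighter-rouge">Flow</code> inner class is hypothetical):</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public class MyFieldBuilder implements PresentationFieldBuilder {
  // fb.flow.operation("add") resolves through this inner class
  public final Flow flow = new Flow();

  public class Flow {
    public Field operation(String op) {
      // delegates to the enclosing builder's string() method
      return string("operation", op);
    }
  }
}
</code></pre></div></div>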
<h3 id="exposing-names">Exposing Names</h3>
<p>The third assumption is that users should provide field names – that is, <code class="language-plaintext highlighter-rouge">fb.kv("name", "value")</code> is reasonable.</p>
<p>This seems like a reasonable assumption, especially when you're trying to log several instances of the same type:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">log</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"{} {}"</span><span class="o">,</span> <span class="n">fb</span> <span class="o">-></span> <span class="n">fb</span><span class="o">.</span><span class="na">list</span><span class="o">(</span>
<span class="n">fb</span><span class="o">.</span><span class="na">instant</span><span class="o">(</span><span class="s">"startTime"</span><span class="o">,</span> <span class="n">startTime</span><span class="o">),</span>
<span class="n">fb</span><span class="o">.</span><span class="na">instant</span><span class="o">(</span><span class="s">"endTime"</span><span class="o">,</span> <span class="n">endTime</span><span class="o">)</span>
<span class="o">));</span>
</code></pre></div></div>
<p>However, there are downsides to defining names directly in statements, especially when using centralized logging.</p>
<p>The first issue is that you may have to sanitize or validate the input name depending on your centralized logging. For example, Elasticsearch does not support <a href="https://www.elastic.co/blog/introducing-the-de_dot-filter">field names containing a . (dot) character</a>, so you must either convert invalid field names or reject them outright.</p>
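<p>One mitigation is to sanitize names in the field builder itself rather than trusting each statement; a sketch, again assuming the three-argument factory method from earlier:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public class SanitizingFieldBuilder implements PresentationFieldBuilder {
  public PresentationField keyValue(String name, Value<?> value) {
    // Elasticsearch rejects dotted field names, so replace dots up front
    String safeName = name.replace('.', '_');
    return Field.keyValue(safeName, value, PresentationField.class);
  }
}
</code></pre></div></div>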
<p>A broader issue is that field names are not scoped by the logger name. Centralized logging does not know that in the <code class="language-plaintext highlighter-rouge">FooLogger</code> a field name may be a string, but in the <code class="language-plaintext highlighter-rouge">BarLogger</code>, the same field name will be a number.</p>
<p>This can cause issues in centralized logging – Elasticsearch will attempt to define a schema based on dynamic mapping, meaning that if two log statements in the same index have the same field name but different types, e.g. <code class="language-plaintext highlighter-rouge">"error": 404</code> vs <code class="language-plaintext highlighter-rouge">"error": "not found"</code>, then Elasticsearch will throw a <code class="language-plaintext highlighter-rouge">mapper_parsing_exception</code> and may reject log statements if you do not have <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-malformed.html">ignore_malformed</a> turned on.</p>
<p>Even if you turn <code class="language-plaintext highlighter-rouge">ignore_malformed</code> on or have different mappings, a change in a mapping across indexes will be enough to stop Elasticsearch from querying correctly. Elasticsearch will also flatten field names, which can cause more confusion, as conflicts only surface when objects share both a field name and a property: two fields that are both called <code class="language-plaintext highlighter-rouge">error</code> and are both objects may work fine, then fail when an optional <code class="language-plaintext highlighter-rouge">code</code> property is added to one of them.</p>
<p>Likewise, field names are not automatically scoped by context. You may have collision cases where two different fields have the same name in the same statement:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">logger</span>
<span class="o">.</span><span class="na">withFields</span><span class="o">(</span><span class="n">fb</span> <span class="o">-></span> <span class="n">fb</span><span class="o">.</span><span class="na">keyValue</span><span class="o">(</span><span class="s">"user_id"</span><span class="o">,</span> <span class="n">userId</span><span class="o">))</span>
<span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"{}"</span><span class="o">,</span> <span class="n">fb</span> <span class="o">-></span> <span class="n">fb</span><span class="o">.</span><span class="na">keyValue</span><span class="o">(</span><span class="s">"user_id"</span><span class="o">,</span> <span class="n">otherUserId</span><span class="o">));</span>
</code></pre></div></div>
<p>This will produce a statement that has two <code class="language-plaintext highlighter-rouge">user_id</code> fields with two different values – which is technically valid JSON, but may not be what centralized logging expects. Ideally, the backend should deal with this transparently rather than leaving it to the user.</p>
<p>In short, what the user adds as a field name should be more what you'd call a 'guideline' than an actual rule, and the field builder should provide defaults if the user doesn't provide one.</p>
<p>For example, if you specify an <code class="language-plaintext highlighter-rouge">Address</code> without a name, even if you're just overriding <code class="language-plaintext highlighter-rouge">keyValue</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>log.info("{}", fb -> fb.keyValue(address));
</code></pre></div></div>
<p>Then the field builder can use <code class="language-plaintext highlighter-rouge">address</code> as the default field name, or even query the <code class="language-plaintext highlighter-rouge">Address</code> to figure out if it's <code class="language-plaintext highlighter-rouge">home</code> or <code class="language-plaintext highlighter-rouge">work</code>.</p>
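<p>A sketch of that, assuming a simple <code class="language-plaintext highlighter-rouge">Address</code> class (its accessors and the <code class="language-plaintext highlighter-rouge">addressValue</code> helper are hypothetical):</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public class AddressFieldBuilder implements PresentationFieldBuilder {
  public PresentationField keyValue(Address address) {
    // use "address" as the default, or query the object for a better name
    String name = address.isWork() ? "work" : "address";
    return keyValue(name, addressValue(address));
  }

  // hypothetical helper turning an Address into a structured value
  private Value.ObjectValue addressValue(Address address) {
    return Value.object(
        string("street", address.street()),
        string("city", address.city()));
  }
}
</code></pre></div></div>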
<h2 id="exception-handling">Exception Handling</h2>
<p>The internals of Echopraxia's error handling have also been improved.</p>
<p>One assumed rule about logging is that it shouldn't break the application if logging fails. This has never been true, and SLF4J makes no such promises:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">class</span> <span class="nc">MyObject</span> <span class="o">{</span>
<span class="kd">public</span> <span class="n">String</span> <span class="nf">toString</span><span class="o">()</span> <span class="o">{</span>
<span class="k">throw</span> <span class="k">new</span> <span class="nf">Exception</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="n">slf4jLogger</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"{}"</span><span class="o">,</span> <span class="k">new</span> <span class="n">MyObject</span><span class="o">());</span>
</code></pre></div></div>
<p>This will throw an exception that will blow through an appender in Logback, because it calls toString. It's up to the application to wrap arguments in a <a href="https://stackoverflow.com/questions/11295764/slf4j-without-tostring">custom converter</a>.</p>
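<p>A sketch of such a wrapper, guarding the <code class="language-plaintext highlighter-rouge">toString</code> call so that a misbehaving argument can't take down the appender:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public final class SafeArg {
  private final Object delegate;

  public SafeArg(Object delegate) {
    this.delegate = delegate;
  }

  @Override
  public String toString() {
    try {
      return String.valueOf(delegate);
    } catch (RuntimeException e) {
      // render a placeholder instead of blowing through the appender
      return "<toString failed: " + e.getClass().getName() + ">";
    }
  }
}

// usage: slf4jLogger.info("{}", new SafeArg(new MyObject()));
</code></pre></div></div>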
<p>The problem in Echopraxia is that it can be astonishingly hard to figure out what values are null.</p>
<p>For example, in the AWS Java API, you'll get an <a href="https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/S3Object.html">S3Object</a>:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">class</span> <span class="nc">MyFieldBuilder</span> <span class="kd">extends</span> <span class="n">PresentationFieldBuilder</span> <span class="o">{</span>
<span class="kd">public</span> <span class="n">Value</span><span class="o"><?></span> <span class="n">s3ObjectValue</span><span class="o">(</span><span class="n">S3Object</span> <span class="n">s3Object</span><span class="o">)</span> <span class="o">{</span>
<span class="n">String</span> <span class="n">key</span> <span class="o">=</span> <span class="n">s3Object</span><span class="o">.</span><span class="na">getKey</span><span class="o">();</span>
<span class="kt">int</span> <span class="n">taggingCount</span> <span class="o">=</span> <span class="n">s3Object</span><span class="o">.</span><span class="na">getTaggingCount</span><span class="o">().</span><span class="na">toInt</span><span class="o">();</span>
<span class="k">return</span> <span class="n">Value</span><span class="o">.</span><span class="na">object</span><span class="o">(</span><span class="n">keyValue</span><span class="o">(</span><span class="s">"key"</span><span class="o">,</span> <span class="n">key</span><span class="o">),</span> <span class="n">keyValue</span><span class="o">(</span><span class="s">"taggingCount"</span><span class="o">,</span> <span class="n">taggingCount</span><span class="o">));</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>This will fail with a <code class="language-plaintext highlighter-rouge">NullPointerException</code>, because <code class="language-plaintext highlighter-rouge">getTaggingCount()</code> returns a null <code class="language-plaintext highlighter-rouge">Integer</code>.</p>
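<p>A null-safe sketch of the same builder checks the boxed <code class="language-plaintext highlighter-rouge">Integer</code> before use; this assumes <code class="language-plaintext highlighter-rouge">Value.nullValue()</code> and <code class="language-plaintext highlighter-rouge">Value.number()</code> factory methods:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public Value<?> s3ObjectValue(S3Object s3Object) {
  String key = s3Object.getKey(); // may be null
  Integer taggingCount = s3Object.getTaggingCount(); // may be null
  return Value.object(
      string("key", key == null ? "" : key),
      keyValue("taggingCount",
          taggingCount == null ? Value.nullValue() : Value.number(taggingCount)));
}
</code></pre></div></div>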
<p>Echopraxia makes a best effort and does <a href="https://tersesystems.github.io/echopraxia/3.0.0/usage/fieldbuilder/#exception-handling">point out bad ideas</a>, but now there's an actual exception handler for when things go south. The default implementation writes <code class="language-plaintext highlighter-rouge">e.printStackTrace()</code> to STDERR and does not log the statement, but you can replace that implementation with something that writes to Sentry or writes a "best effort" log statement out.</p>
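<p>For example, here is a sketch of a handler that reports to Sentry instead, assuming the handler interface exposes a single method taking the throwable (see the exception handling docs for the actual registration mechanism):</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>public class SentryExceptionHandler implements ExceptionHandler {
  @Override
  public void handleException(Throwable e) {
    // report to Sentry rather than printing a stack trace to STDERR
    Sentry.captureException(e);
  }
}
</code></pre></div></div>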
<h2 id="housekeeping">Housekeeping</h2>
<p>One housekeeping change is moving the internals of <code class="language-plaintext highlighter-rouge">Logger</code> and logging support out of the <code class="language-plaintext highlighter-rouge">api</code> package into an <code class="language-plaintext highlighter-rouge">spi</code> package.</p>
<p>There are two categories of user: the developers who put together the loggers, the field builders, and the framework itself, and the end users who just want log statements and maybe some custom conditions. The code needs to reflect that.</p>
<p>My codified knowledge of <a href="https://stackoverflow.com/questions/2954372/difference-between-spi-and-api">SPI vs API</a> is spotty, but the general rule I applied was "does the person writing a logging statement need to know about this?" Things like <code class="language-plaintext highlighter-rouge">Condition</code>, <code class="language-plaintext highlighter-rouge">LoggingContext</code>, and <code class="language-plaintext highlighter-rouge">Level</code> all qualify as API. More internal things like <code class="language-plaintext highlighter-rouge">CoreLogger</code>, <code class="language-plaintext highlighter-rouge">Filter</code>, and <code class="language-plaintext highlighter-rouge">DefaultMethodsSupport</code> do not.</p>
<h2 id="compile-only-dependencies">Compile-Only Dependencies</h2>
<p>Finally, 3.0 removes the transitive dependencies on the Logback and Log4J 2 framework implementations, so they will need to be explicitly declared as dependencies at <a href="https://tersesystems.github.io/echopraxia/3.0.0/installation/">installation</a>.</p>
<p>This is mostly because SLF4J 2.0.x and SLF4J 1.7.x do not mix. If you have code that uses SLF4J 2.0.x, it will see Logback 1.2.x and pointedly ignore it with a <a href="https://www.slf4j.org/codes.html#ignoredBindings">warning message</a>.</p>
<p>This didn't use to matter, because logstash-logback-encoder used SLF4J 1.7.x, but as of <a href="https://github.com/logfellow/logstash-logback-encoder/releases/tag/logstash-logback-encoder-7.4">7.4</a>, support for Logback 1.2 is <a href="https://github.com/logfellow/logstash-logback-encoder/pull/970">dropped</a>. Echopraxia makes no guarantees of backwards or forwards compatibility between versions: it is best effort at this point. If it gets to the point where Logback 1.2 and 1.4 are simply not reconcilable, I'll create different adapters for them, so there'll be <code class="language-plaintext highlighter-rouge">logstash1_2</code> and <code class="language-plaintext highlighter-rouge">logstash1_4</code> backends.</p>
<p>Likewise, while Log4J 2 doesn't depend on SLF4J 2, the <a href="https://www.cisa.gov/news-events/news/apache-log4j-vulnerability-guidance">Log4J vulnerability warnings</a> and the general attention given to analyzing <em>exactly</em> which version of Log4J 2 your framework depends on mean that the safest thing to do is to not specify any version at all.</p>
<h2 id="summary">Summary</h2>
<p>I hope this gives a good overview of the design decisions and thinking going into this release. I'm really happy with Echopraxia as a whole, and I keep being surprised at how much fun I have both writing it and finding new things I can do with it.</p>TL;DR Echopraxia is a structured logging API for Java and Scala. I've released Echopraxia 3.0 which has a number of new features, most notably more control over presentation logic, custom typed attributes, better exception handling, and removing hardcoded dependencies. You can check it out at https://github.com/tersesystems/echopraxia/ or check out the new documentation site. This is going to be a development log going into technical details, explaining the why behind the how. Presentation Logic Echopraxia's API is built around structured logging input. In practical terms, that means when you write: log.info("{} logged in", fb -> fb.user("user", thisUser)); Then you expect to see a line oriented output that looks something like: INFO user={<some human readable data here>} logged in And you expect that the JSON output will be: { "level": "INFO", "message": "user={<some human readable data here>} logged in", "user": { <some machine readable data here> } } The problem: the user may see irrelevant junk inside of user={...} and really only cares about id and role. The machine-readable data also may want JSON in a particular format – Elasticsearch wants stable field names, stable mappings, and has trouble understanding deeply nested objects and arrays. Or there may be additional data that isn't explicitly called out in the field JSON-LD includes a @type field that is used in typed values, i.e. a timestamp may have a type of http://www.w3.org/2001/XMLSchema#dateTime, but that isn't relevant to the human. The paradox here is that although structured logging involves packaging arguments into a structured format, the presentation of that data is very different between machine-readable format and "ergonomic" human-readable format. While logfmt is a recognizable and compact format, it's still machine based – it does not care about what is most relevant for a human to see. Meanwhile, from the end user's perspective, there's a loss of utility in rendering structured data: they used to be able to control the presentation exactly with toString, and now they can't. This issue compounds when we start getting into complex objects and arrays. When rendering an AST, batches of paginated data, or encrypted data, there's an issue of presentation. Should the user see the entire AST, or only the relevant bits? Does the user care about the contents of the batch, or just that it's the right length? Should the user see the unencrypted data, or should it be filtered or invisible? Presentation Hints The solution in 3.0 is to add typed attributes to fields. These attributes are used to add extra metadata to a field, so that a formatter has more to work with than just the name and value. Then we can add some presentation hints to specialize fields so that a formatter can decide how to render this field in particular, and extend the Field type to PresentationField with some extra methods to provide those hints. For example, one of the hints is asCardinal, which renders a field as a cardinal number in a line oriented format. 
This is most useful for very long strings and arrays: log.debug("{}", fb -> fb.array("elements", 1, 2, 3).asCardinal()); renders as: elements=|3| Other useful presentation hints are asElided which will "skip" a field so it doesn't show in the toString formatter, and abbreviateAfter which truncates a string or array after a number of elements. Structured Format While it's nice to be able to customize values, there will be cases where the string we want a human to see is not the string we want the machine to see. Take the case of a duration: log.debug("{}", fb -> fb.duration("duration", Duration.ofDays(1))); We want to render this in a human readable format: "1 day" But we want to see the ISO duration format in JSON: {"duration": "PT24H"} Simply rendering a Value.string won't work here, and overriding toString won't be enough. Instead, we have to provide both human and machine values. We can do that by passing a string with the human value, and using withStructuredFormat for the machine value: public class MyFieldBuilder extends PresentationFieldBuilder { public PresentationField duration(String name, Duration duration) { Field structuredField = string(name, duration.toString()); return string(name, duration.toDays() + " day") .asValueOnly() .withStructuredFormat(new SimpleFieldVisitor() { @Override public @NotNull Field visitString(@NotNull Value<String> stringValue) { return structuredField; } }); } } This withStructuredFormat method adds an attribute that takes a FieldVisitor interface, following the visitor pattern. Here, we only care about swapping out the string value, so visitString is all that's required. This also covers the case where we want to render extra information or do some transformation for the machine, so we could add @type information for JSON-LD: log.info("{}", fb -> fb.instant("startTime", Instant.ofEpochMillis(0))); in text format; startTime=1970-01-01T00:00:00Z and in JSON: { "startTime": { "@type":"http://www.w3.org/2001/XMLSchema#dateTime", "@value":"1970-01-01T00:00:00Z" } } and we can even cover this for the array case: log.info("{}", fb -> fb.instantArray("instantArray", List.of(Instant.ofEpochMillis(0)))); produces: { "instantArray": [ {"@type":"http://www.w3.org/2001/XMLSchema#dateTime","@value":"1970-01-01T00:00:00Z"} ] } And here's the implementation: public class InstantFieldBuilder implements PresentationFieldBuilder { private static final FieldVisitor instantVisitor = new InstantFieldVisitor(); public PresentationField instant(String name, Instant instant) { return string(name, instant.toString()).withStructuredFormat(instantVisitor); } public PresentationField instantArray(String name, List<Instant> instants) { return fb.array(name, Value.array(i -> Value.string(i.toString()), instants)) .withStructuredFormat(instantVisitor); } class InstantFieldVisitor extends SimpleFieldVisitor { @Override public @NotNull Field visitString(@NotNull Value<String> stringValue) { return typedInstant(name, stringValue); } PresentationField typedInstant(String name, Value<String> v) { return object(name, typedInstantValue(v)); } Value.ObjectValue typedInstantValue(Value<String> v) { return Value.object( string("@type", "http://www.w3.org/2001/XMLSchema#dateTime"), keyValue("@value", v)); } @Override public @NotNull ArrayVisitor visitArray() { return new InstantArrayVisitor(); } class InstantArrayVisitor extends SimpleArrayVisitor { @Override public void visitStringElement(Value.StringValue stringValue) { this.elements.add(typedInstantValue(stringValue)); } } } } Field 
Creation This covers the cases I can think about, but make it really work it has to be extensible so users can add their own custom methods, i.e asDecrypted() to strip decryption from a value. So instead of returning a PresentationField, we should be need to work with a field as a generic user defined type extending Field. Field creation in Echopraxia comes down to a factory method: Field field = Field.keyValue(name, value); This has to be changed so that it takes a Class<T>: PresentationField field = Field.keyValue(name, value, PresentationField.class); This also means that users can modify field creation in general, so it can be extended with metrics, validation, caching, etc. Field Builders The other change to Echopraxia is the removal of FieldBuilder as a lower bound on the loggers. Before, you could do the following using Logger<?> and it would act like it was Logger<FieldBuilder>: Logger<?> logger = LoggerFactory.getLogger(getClass()); This no longer works in 3.0 and the default is PresentationFieldBuilder, not FieldBuilder: Logger<PresentationFieldBuilder> logger = LoggerFactory.getLogger(getClass()); If you still want to use FieldBuilder or your own custom instance then you can now pass a field builder as argument (instead of having to calling withFieldBuilder): Logger<MyFieldBuilder> logger = LoggerFactory.getLogger(getClass(), MyFieldBuilder.instance()); There were two justifications initially for using Logger<FB extends FieldBuilder>: minimizing verbosity and providing a minimal set of functionality for building and extending fields. The first justification is weak, and the second is offset by the assumptions that FieldBuilder makes for the user. Minimizing Verbosity At the time that Echopraxia was first sketched out, JDK 1.8 was much more popular. This is no longer the case – JDK 11 is long in the tooth now, and JDK 17 is the standard. The language has evolved, and now has type inference. This means that if you're concatenating loggers in a method, you'll use var: public class Foo { public void doStuff(Instant startTime) { var log = logger.withFields(fb -> fb.instant("startTime", startTime)) log.info("doStuff: make things happen"); } } And if you're defining a static final logger, you're not going to be bothered because private static final is already an up front cost: public class Foo { private static final Logger<PresentationFieldBuilder> logger = LoggerFactory.getLogger(Foo.class); } So this is a moot point. Hidden Assumptions The other problem with <FB extends FieldBuilder> is that the FieldBuilder interface makes too many assumptions about what the statement writer should know, instead of putting that power in the hands of the developer writing the field builder. Let's bring it down to a single statement: log.info("{}", fb -> fb.string("operation", "add")); I'm still happy with the requirement of an fb handle for constructing arguments here. Ideally, I'd like a magic static import function for the handle: log.info("{}", import _ -> string("operation", "add")); Or some kind of magic tuple: log.info("{} {}", "startTime" -> startTime, "endTime" -> endTime); But for what Java is, adding an fb. prefix to everything is fine. Everything after the fb. is not fine, because it makes three different assumptions. Exposing Field The first assumption the FieldBuilder makes to expose Field as the return type instead of <F extends Field>. I've already gone over the problem with hardcoding Field, but it's worth noting because this is baked into the Logger itself. 
There's no way a developer can swap that out using withFieldBuilder – it's part of the public API. Exposing Primitives The second assumption that FieldBuilder makes is to expose the infoset primitives (string, boolean, number), and the infoset complex (array, object) as part of the API, and it also exposes keyValue and value for the underlying Value objects. This is a problem on multiple levels. It puts the power of definition in the statement, rather than in the field builder. More importantly, it ties the hands of the developer because what's being passed in is a representation, rather than the object to represent. Simply put, there's no such thing as a string. A string is a textual representation of something meaningful in the domain: a name, an address, a debug representation of a syntax tree. A boolean is a representation of a feature flag, and so on. So rather than letting the user pass in a string: log.info("{}", fb -> fb.string("operation", "add")); The developer could require the user to categorize the data as "program flow": log.info("{}", fb -> fb.flow("operation", "add")); or even better, expose a DSL: log.info("{}", fb -> fb.flow.operation("add")); The point here is not that this is an ideal DSL, but that the developer should be able to decide how permissive or restrictive the field builder API is. Specifying that the logger extends FieldBuilder is removing that choice. Exposing Names The third assumption is that users should provide field names – that is, fb.kv("name", "value") is reasonable. This seems like a reasonable assumption, especially when you're trying to log several instances of the same type: log.info("{} {}", fb -> fb.list( fb.instant("startTime", startTime), fb.instant("endTime", endTime) )); However, there are downsides to defining names directly in statements, especially when using centralized logging. The first issue is that you may have to sanitize or validate the input name depending on your centralized logging. For example, ElasticSearch does not support field names containing a . (dot) character, so if you do not convert or reject invalid field names. A broader issue is that field names are not scoped by the logger name. Centralized logging does not know that in the FooLogger a field name may be a string, but in the BarLogger, the same field name will be a number. This can cause issues in centralized logging – ElasticSearch will attempt to define a schema based on dynamic mapping, meaning that if two log statements in the same index have the same field name but different types, i.e. "error": 404 vs "error": "not found" then Elasticsearch will render mapper_parsing_exception and may reject log statements if you do not have ignore_malformed turned on. Even if you turn ignore_malformed on or have different mappings, a change in a mapping across indexes will be enough to stop ElasticSearch from querying correctly. ElasticSearch will also flatten field names, which can cause more confusion as conflicts will only come when objects have both the same field name and property, i.e. they are both called error and are objects that work fine, but fail when an optional code property is added. Likewise, field names are not automatically scoped by context. 
<h3 id="exposing-names">Exposing Names</h3>
<p>The third assumption is that users should provide field names – that is, that <code class="language-plaintext highlighter-rouge">fb.kv("name", "value")</code> is reasonable. This seems like a fair assumption, especially when you're trying to log several instances of the same type:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>log.info("{} {}", fb -> fb.list(
  fb.instant("startTime", startTime),
  fb.instant("endTime", endTime)
));
</code></pre></div></div>
<p>However, there are downsides to defining names directly in statements, especially when using centralized logging. The first issue is that you may have to sanitize or validate the input name, depending on your centralized logging. For example, ElasticSearch does not support field names containing a <code class="language-plaintext highlighter-rouge">.</code> (dot) character, so you must convert or reject invalid field names.</p>
<p>A broader issue is that field names are not scoped by the logger name. Centralized logging does not know that in the <code class="language-plaintext highlighter-rouge">FooLogger</code> a field name may be a string, while in the <code class="language-plaintext highlighter-rouge">BarLogger</code> the same field name will be a number. This can cause issues in centralized logging – ElasticSearch will attempt to define a schema based on dynamic mapping, meaning that if two log statements in the same index have the same field name but different types, e.g. <code class="language-plaintext highlighter-rouge">"error": 404</code> vs <code class="language-plaintext highlighter-rouge">"error": "not found"</code>, then ElasticSearch will throw a <code class="language-plaintext highlighter-rouge">mapper_parsing_exception</code> and may reject log statements if you do not have <code class="language-plaintext highlighter-rouge">ignore_malformed</code> turned on. Even if you turn <code class="language-plaintext highlighter-rouge">ignore_malformed</code> on or have different mappings, a change in a mapping across indexes will be enough to stop ElasticSearch from querying correctly. ElasticSearch will also flatten field names, which can cause more confusion, as conflicts only surface when objects have both the same field name and the same property – two fields that are both called <code class="language-plaintext highlighter-rouge">error</code> and are both objects will work fine, but fail when an optional <code class="language-plaintext highlighter-rouge">code</code> property is added to one of them.</p>
<p>Likewise, field names are not automatically scoped by context. You may have collision cases where two different fields have the same name in the same statement:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>logger
  .withFields(fb -> fb.keyValue("user_id", userId))
  .info("{}", fb -> fb.keyValue("user_id", otherUserId));
</code></pre></div></div>
<p>This will produce a statement that has two <code class="language-plaintext highlighter-rouge">user_id</code> fields with two different values – which is technically valid JSON, but may not be what centralized logging expects. The backend should be able to deal with this transparently.</p>
<p>In short, what the user adds as a field name should be more what you'd call a "guideline" than an actual rule, and the field builder should provide defaults if the user doesn't provide one. For example, you could allow an <code class="language-plaintext highlighter-rouge">Address</code> to be specified without a name, just by overriding <code class="language-plaintext highlighter-rouge">keyValue</code>:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>log.info("{}", fb -> fb.keyValue(address));
</code></pre></div></div>
<p>Then the field builder can use <code class="language-plaintext highlighter-rouge">address</code> as the default field name, or even query the <code class="language-plaintext highlighter-rouge">Address</code> to figure out if it's home or work.</p>
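<p>A hedged sketch of such an overload – the <code class="language-plaintext highlighter-rouge">Address</code> type and its accessors are assumptions made up for illustration:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// An overload that supplies a default name when the user doesn't give one.
// Address, isHome(), street() and city() are hypothetical.
public class MyFieldBuilder implements FieldBuilder {
  public Field keyValue(Address address) {
    // query the object itself to pick a more meaningful default name
    String name = address.isHome() ? "home_address" : "address";
    return keyValue(name, addressValue(address));
  }

  private Value<?> addressValue(Address address) {
    return Value.object(
      keyValue("street", address.street()),
      keyValue("city", address.city())
    );
  }
}
</code></pre></div></div>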
<h2 id="exception-handling">Exception Handling</h2>
<p>The internals of Echopraxia's error handling have also been improved. One assumed rule about logging is that it shouldn't break the application if logging fails. This has never been true, and SLF4J makes no such promises:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class MyObject {
  public String toString() {
    throw new RuntimeException();
  }
}

slf4jLogger.info("{}", new MyObject());
</code></pre></div></div>
<p>This will throw an exception that will blow through an appender in Logback, because the appender calls <code class="language-plaintext highlighter-rouge">toString</code>. It's up to the application to wrap arguments in a custom converter.</p>
<p>The problem in Echopraxia is that it can be astonishingly hard to figure out what values are null. For example, in the AWS Java API, you'll get an <code class="language-plaintext highlighter-rouge">S3Object</code>:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class MyFieldBuilder extends PresentationFieldBuilder {
  public Value<?> s3ObjectValue(S3Object s3Object) {
    String key = s3Object.getKey();
    int taggingCount = s3Object.getTaggingCount(); // unboxes a null Integer
    return Value.object(keyValue("key", key), keyValue("taggingCount", taggingCount));
  }
}
</code></pre></div></div>
<p>This will fail with a <code class="language-plaintext highlighter-rouge">NullPointerException</code>, because <code class="language-plaintext highlighter-rouge">getTaggingCount()</code> returns a null <code class="language-plaintext highlighter-rouge">Integer</code>. Echopraxia makes a best effort to point out bad ideas like this, but now there's an actual exception handler for when things go south. The default implementation writes <code class="language-plaintext highlighter-rouge">e.printStackTrace()</code> to STDERR and does not log the statement, but you can replace that implementation with something that writes to Sentry, or writes a "best effort" log statement out.</p>
<h2 id="housekeeping">Housekeeping</h2>
<p>Another housekeeping arrangement is moving the internals of <code class="language-plaintext highlighter-rouge">Logger</code> and logging support out of the <code class="language-plaintext highlighter-rouge">api</code> package into an <code class="language-plaintext highlighter-rouge">spi</code> package. There are two categories of user: the developer who puts together the loggers, the field builders, and the actual framework itself; and the end users who just want the log statements and maybe some custom conditions. The code needs to reflect that. My codified knowledge of SPI vs API is spotty, but the general rule I applied was "does the person writing a logging statement need to know about this" – things like <code class="language-plaintext highlighter-rouge">Condition</code>, <code class="language-plaintext highlighter-rouge">LoggingContext</code>, and <code class="language-plaintext highlighter-rouge">Level</code> all qualify as API. More internal things like <code class="language-plaintext highlighter-rouge">CoreLogger</code>, <code class="language-plaintext highlighter-rouge">Filter</code>, and <code class="language-plaintext highlighter-rouge">DefaultMethodsSupport</code> do not.</p>
<h2 id="compile-only-dependencies">Compile-Only Dependencies</h2>
<p>Finally, 3.0 removes the transitive dependencies on the Logback and Log4J 2 framework implementations, so they will need to be explicitly defined as dependencies at installation. This is mostly because SLF4J 2.0.x and SLF4J 1.7.x do not mix: if you have code that uses SLF4J 2.0.x, it will see Logback 1.2.x and pointedly ignore it with a warning message. This didn't use to matter, because logstash-logback-encoder used SLF4J 1.7.x, but as of 7.4, support for Logback 1.2 is dropped.</p>
<p>Echopraxia makes no guarantees of backwards or forwards compatibility between versions: it is best effort at this point. If it gets to the point where Logback 1.2 and 1.4 are simply not reconcilable, I'll create different adapters for them, so there'll be <code class="language-plaintext highlighter-rouge">logstash1_2</code> and <code class="language-plaintext highlighter-rouge">logstash1_4</code> backends. Likewise, while Log4J 2 doesn't depend on SLF4J 2, the Log4J vulnerability warnings and the general attention given to analyzing exactly which version of Log4J 2 your framework depends on mean that the safest thing to do is to not specify any version at all.</p>
<h2 id="summary">Summary</h2>
<p>I hope this gives a good overview of the design decisions and thinking that went into this release. I'm really happy with Echopraxia as a whole, and I keep being surprised at how much fun I have both writing it and finding new things I can do with it.</p>Bootstrapping Boxes Into Tailscale With 1Password2023-05-25T19:31:39-07:002023-05-25T19:31:39-07:00https://tersesystems.com/blog/2023/05/25/bootstrapping-boxes-into-tailscale-with-1password<p>This is a follow-on from <a href="https://tersesystems.com/blog/2023/05/03/disposable-cloud-environments-with-virtualbox-and-tailscale/">Disposable Cloud Environments With Vagrant and Tailscale</a>. The summary is that I've worked out how to get new boxes up and integrated with Tailscale with a small bootstrap Ansible playbook and some 1Password integration.</p>
<p>This is going to be short and direct, with the goal of showing how to repeat this and never have to think about how to manage secrets. Credit to <a href="https://github.com/kaushikchandrashekar/developer-vagrant/tree/master">kaushikchandrashekar/developer-vagrant</a> for shortcutting much of this process with a github project showing Vagrant leveraging Ansible Galaxy.</p>
<p>The source code is available at <a href="https://github.com/wsargent/vagrant-tailscale-example">https://github.com/wsargent/vagrant-tailscale-example</a>.</p>
<h2 id="the-problem">The Problem</h2>
<p>The previous post used Vagrant's inline script to set up Tailscale and everything else:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Vagrant</span><span class="p">.</span><span class="nf">configure</span><span class="p">(</span><span class="s2">"2"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">config</span><span class="o">|</span>
<span class="n">config</span><span class="p">.</span><span class="nf">env</span><span class="p">.</span><span class="nf">enable</span>
<span class="c1"># vm parameters</span>
<span class="n">config</span><span class="p">.</span><span class="nf">vm</span><span class="p">.</span><span class="nf">provision</span> <span class="s2">"tailscale-install"</span><span class="p">,</span> <span class="ss">type: </span><span class="s2">"shell"</span> <span class="k">do</span> <span class="o">|</span><span class="n">s</span><span class="o">|</span>
<span class="n">s</span><span class="p">.</span><span class="nf">inline</span> <span class="o">=</span> <span class="s2">"curl -fsSL https://tailscale.com/install.sh | sh"</span>
<span class="k">end</span>
<span class="n">config</span><span class="p">.</span><span class="nf">vm</span><span class="p">.</span><span class="nf">provision</span> <span class="s2">"tailscale-up"</span><span class="p">,</span> <span class="ss">type: </span><span class="s2">"shell"</span> <span class="k">do</span> <span class="o">|</span><span class="n">s</span><span class="o">|</span>
<span class="n">s</span><span class="p">.</span><span class="nf">inline</span> <span class="o">=</span> <span class="s2">"tailscale up --ssh --operator=vagrant --authkey </span><span class="si">#{</span><span class="no">ENV</span><span class="p">[</span><span class="s1">'TAILSCALE_AUTHKEY'</span><span class="p">]</span><span class="si">}</span><span class="s2">"</span>
<span class="k">end</span>
<span class="c1"># ...yet more script...</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Inline scripts in Vagrant aren't great. Every Vagrantfile is different, and the failure behavior is unpredictable. In addition, doing the work of <code class="language-plaintext highlighter-rouge">op run -- vagrant up</code> to integrate with <a href="https://developer.1password.com/docs/cli/secrets-environment-variables#export-environment-variables">1Password CLI</a> was awkward.</p>
<p>Let's take the opposite approach. Bootstrap Vagrant into a Tailscale host with as little manual work as possible, then provision using Ansible through Tailscale, bypassing Vagrant.</p>
<h2 id="vagrant-with-ansible">Vagrant with Ansible</h2>
<p>The first step was to use the <a href="https://developer.hashicorp.com/vagrant/docs/provisioning/ansible">Ansible Provisioner</a> in Vagrant, and put as few things into the Vagrant file as possible.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Vagrant</span><span class="p">.</span><span class="nf">configure</span><span class="p">(</span><span class="s2">"2"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">config</span><span class="o">|</span>
<span class="n">config</span><span class="p">.</span><span class="nf">vm</span><span class="p">.</span><span class="nf">box</span> <span class="o">=</span> <span class="s2">"ubuntu/jammy64"</span>
<span class="n">config</span><span class="p">.</span><span class="nf">vm</span><span class="p">.</span><span class="nf">hostname</span> <span class="o">=</span> <span class="s2">"vagrant-docker"</span>
<span class="n">config</span><span class="p">.</span><span class="nf">vm</span><span class="p">.</span><span class="nf">provision</span> <span class="ss">:ansible</span> <span class="k">do</span> <span class="o">|</span><span class="n">ansible</span><span class="o">|</span>
<span class="n">ansible</span><span class="p">.</span><span class="nf">compatibility_mode</span> <span class="o">=</span> <span class="s2">"2.0"</span>
<span class="n">ansible</span><span class="p">.</span><span class="nf">playbook</span> <span class="o">=</span> <span class="s2">"playbook.yml"</span>
<span class="n">ansible</span><span class="p">.</span><span class="nf">galaxy_role_file</span> <span class="o">=</span> <span class="s2">"requirements.yml"</span>
<span class="n">ansible</span><span class="p">.</span><span class="nf">galaxy_roles_path</span> <span class="o">=</span> <span class="s2">"/etc/ansible/roles"</span>
<span class="n">ansible</span><span class="p">.</span><span class="nf">galaxy_command</span> <span class="o">=</span> <span class="s2">"sudo ansible-galaxy install --role-file=%{role_file} --roles-path=%{roles_path} --force"</span>
<span class="k">end</span>
<span class="n">config</span><span class="p">.</span><span class="nf">trigger</span><span class="p">.</span><span class="nf">before</span> <span class="ss">:destroy</span> <span class="k">do</span> <span class="o">|</span><span class="n">trigger</span><span class="o">|</span>
<span class="n">trigger</span><span class="p">.</span><span class="nf">run_remote</span> <span class="o">=</span> <span class="p">{</span><span class="ss">inline: </span><span class="s2">"tailscale logout"</span><span class="p">}</span>
<span class="n">trigger</span><span class="p">.</span><span class="nf">on_error</span> <span class="o">=</span> <span class="ss">:continue</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Python package management is slightly terrifying, so I opted for <code class="language-plaintext highlighter-rouge">apt</code> to install Ansible:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sudo </span>add-apt-repository <span class="nt">--yes</span> <span class="nt">--update</span> ppa:ansible/ansible
<span class="nv">$ </span><span class="nb">sudo </span>apt install ansible
</code></pre></div></div>
<h2 id="using-ansible-galaxy">Using Ansible Galaxy</h2>
<p>There are two Ansible packages I needed to get Tailscale set up: <code class="language-plaintext highlighter-rouge">artis3n.tailscale</code> and <code class="language-plaintext highlighter-rouge">community.general.onepassword</code>.</p>
<p>These can be set up in <code class="language-plaintext highlighter-rouge">requirements.yml</code>:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">roles</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">artis3n.tailscale</span>
<span class="na">collections</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">community.general</span>
</code></pre></div></div>
<p>It's easiest to install these pre-emptively rather than have them pop up in the middle of the install:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ansible-galaxy install artis3n.tailscale
$ ansible-galaxy collection install community.general
</code></pre></div></div>
<h2 id="running-the-playbook">Running the Playbook</h2>
<p>After the packages are installed, the only thing needed in the playbook is to set up the <code class="language-plaintext highlighter-rouge">tailscale</code> role and look up the authkey from 1Password using <a href="https://docs.ansible.com/ansible/latest/collections/community/general/onepassword_info_module.html#ansible-collections-community-general-onepassword-info-module"><code class="language-plaintext highlighter-rouge">community.general.onepassword</code></a>:</p>
<div class="language-yml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="pi">-</span> <span class="na">hosts</span><span class="pi">:</span> <span class="s">all</span>
  <span class="na">become</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">tasks</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">install-tailscale</span>
      <span class="na">import_role</span><span class="pi">:</span>
        <span class="na">name</span><span class="pi">:</span> <span class="s">artis3n.tailscale</span>
      <span class="na">vars</span><span class="pi">:</span>
        <span class="na">tailscale_authkey</span><span class="pi">:</span> <span class="s2">"</span><span class="s">{{</span><span class="nv"> </span><span class="s">lookup('community.general.onepassword',</span><span class="nv"> </span><span class="s">'vagrant-tailscale',</span><span class="nv"> </span><span class="s">field='credential',</span><span class="nv"> </span><span class="s">vault='will-connect-vault')</span><span class="nv"> </span><span class="s">}}"</span>
        <span class="na">tailscale_args</span><span class="pi">:</span> <span class="s2">"</span><span class="s">--ssh"</span>
</code></pre></div></div>
<p>I did have to go into Tailscale and change the <a href="https://tailscale.com/kb/1193/tailscale-ssh/#configure-tailscale-ssh-with-check-mode">SSH Check mode</a> from <code class="language-plaintext highlighter-rouge">action: check</code> to <code class="language-plaintext highlighter-rouge">action: accept</code> so it didn't keep asking me to click on URLs.</p>
<p>The only thing I need to do is make sure I'm signed into 1Password, and after that the host will pop up:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ eval $(op signin) # sign into 1p CLI
$ vagrant up
# ...tailscale status shows new host on tailnet...
</code></pre></div></div>
<h2 id="integrating-tailscale-with-ansible">Integrating Tailscale with Ansible</h2>
<p>From there, it's now a question of how to install software other than Tailscale on the box. Ansible can do it, but first Ansible has to know about it.</p>
<p>The first thing to do is set up dynamic inventory with Tailscale using the <a href="https://github.com/freeformz/ansible#tailscale-inventory-plugin">Tailscale Inventory Plugin</a>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>ansible-galaxy collection install freeformz.ansible
</code></pre></div></div>
<p>And then the <a href="https://docs.ansible.com/ansible/latest/reference_appendices/config.html">configuration</a> in <code class="language-plaintext highlighter-rouge">ansible.cfg</code>:</p>
<div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[inventory]</span>
<span class="py">enable_plugins</span> <span class="p">=</span> <span class="s">freeformz.ansible.tailscale</span>
<span class="nn">[defaults]</span>
<span class="py">inventory</span> <span class="p">=</span> <span class="s">$HOME/tailscale.yaml</span>
<span class="py">remote_user</span> <span class="p">=</span> <span class="s">vagrant</span>
<span class="py">host_key_checking</span> <span class="p">=</span> <span class="s">False</span>
<span class="nn">[ssh_connection]</span>
<span class="py">pipelining</span><span class="p">=</span><span class="s">true</span>
<span class="py">retries</span><span class="p">=</span><span class="s">10</span>
</code></pre></div></div>
<p>And now the Vagrant boxes can get software installed the same way that any other host would. For example, to install Docker:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>ansible-playbook playbooks/docker.yml
</code></pre></div></div>
<p>Where <code class="language-plaintext highlighter-rouge">docker.yml</code> contains:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Configure with Docker</span>
  <span class="na">hosts</span><span class="pi">:</span> <span class="s">vagrant-docker</span>
  <span class="na">become</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">tasks</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">apt-update</span>
      <span class="na">apt</span><span class="pi">:</span>
        <span class="na">update_cache</span><span class="pi">:</span> <span class="s">yes</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">install-docker</span>
      <span class="na">import_role</span><span class="pi">:</span>
        <span class="na">name</span><span class="pi">:</span> <span class="s">geerlingguy.docker</span>
      <span class="na">vars</span><span class="pi">:</span>
        <span class="na">docker_edition</span><span class="pi">:</span> <span class="s1">'</span><span class="s">ce'</span>
        <span class="na">docker_package</span><span class="pi">:</span> <span class="s2">"</span><span class="s">docker-"</span>
        <span class="na">docker_package_state</span><span class="pi">:</span> <span class="s">present</span>
        <span class="na">docker_install_compose</span><span class="pi">:</span> <span class="no">true</span>
        <span class="na">docker_users</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="s">vagrant</span>
</code></pre></div></div>
<h2 id="next-steps">Next Steps</h2>
<p>I am aware of <a href="https://docs.ansible.com/ansible/latest/network/getting_started/first_inventory.html#protecting-sensitive-variables-with-ansible-vault">Ansible Vault</a> but want to use 1Password in part because the next step is to extend it to <a href="https://developer.1password.com/docs/connect/">1Password Connect</a> for use with <a href="https://github.com/1Password/onepassword-operator">Kubernetes</a> and <a href="https://github.com/1Password/terraform-provider-onepassword">Terraform</a>. With any luck, this should make working with API keys and tokens much easier – copy and paste them into 1Password and be done. And then, just maybe, never think about secrets management again.</p>
Disposable Cloud Environments With Vagrant and Tailscale2023-05-03T14:10:21-07:002023-05-03T14:10:21-07:00https://tersesystems.com/blog/2023/05/03/disposable-cloud-environments-with-virtualbox-and-tailscale<p>There's a lot in this blog post, so I'll summarize it first, and then tell you a horrible joke that got me a content warning from Slack (but it won't make sense unless you're a functional programming nerd).</p>
<ul>
<li><em>Goal #1</em>: I want to build out an ELK cluster and Do Science To it.</li>
<li><em>Goal #2</em>: I want to not start from scratch or figure out how to undo things when I screw up.</li>
<li><em>Goal #3</em>: I want to be able to keep working on it from different computers.</li>
<li><em>Goal #4</em>: I want to work through <a href="https://github.com/kelseyhightower/kubernetes-the-hard-way">Kubernetes the Hard Way</a> and set up a cloud environment.</li>
</ul>
<p>The solution to #1 is containerization. Run <a href="https://docs.docker.com/compose/compose-file/">Docker Compose</a> and set up a <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-compose-file">multi-node ElasticSearch cluster</a>.</p>
<p>The solution to #2 is virtualization. Create a virtual machine using <a href="https://www.vagrantup.com/">Vagrant</a>, install Docker on it, then do #1. Now I can take VM snapshots before config changes and rollback if I screwed up, and I can dispose of the boxes when I'm done.</p>
<p>The solution to #3 is to build out a homelab server (incredibly cheap at $391). Install Ubuntu, then #2.</p>
<p>The solution to #4 is to throw more memory at the homelab server, then install <a href="https://kind.sigs.k8s.io/">Kind</a> after #2. Because of #2, I can now also mess with Terraform state and get away with it.</p>
<p><em>Pause to build and install everything…</em></p>
<p><em>Problem</em>: I want to see Kibana from my laptop browser. <a href="https://docs.docker.com/compose/">Docker Compose</a> forwards everything to localhost, and then the VM also requires <a href="https://developer.hashicorp.com/vagrant/docs/networking">networking magic</a> to expose it to the host. I have a box containing a box, containing a box, and I don't want to have to port forward all the things.</p>
<p><em>Solution</em>: Install <a href="https://tailscale.com/">Tailscale</a> on the VM, exposing it as a host on the network (tailnet in Tailscale parlance).</p>
<p><em>Problem</em>: Kubernetes is an orchestration layer, so now there are many boxes and port forwarding is impossible.</p>
<p><em>Solution</em>: Set up Tailscale as a subnet router inside the VM, using <a href="https://raesene.github.io/blog/2022/06/11/escaping-the-nested-doll-with-tailscale/">Escaping the Nested Doll with Tailscale</a> as a guide. Now I have infinite hosts on the network, and if I want a different configuration I can roll back to a base k8s state, or even set them up side by side.</p>
<p>Now I'm going to tell you the horrible joke.</p>
<p>Imagine that we're defining processes as code, so a physical server is a container for processes:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// http://devserver:80 from nginx -p 80 (the trivial case)
</span><span class="k">val</span> <span class="n">devServer</span><span class="k">:</span> <span class="kt">Server</span> <span class="o">=</span> <span class="nc">Server</span><span class="o">(</span><span class="nc">Process</span><span class="o">(</span><span class="s">"nginx"</span><span class="o">,</span> <span class="mi">80</span><span class="o">))</span>
</code></pre></div></div>
<p>We can describe <code class="language-plaintext highlighter-rouge">Docker</code> and <code class="language-plaintext highlighter-rouge">VirtualMachine</code> the same way:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// docker run -p 80:80 nginx
</span><span class="k">val</span> <span class="n">docker</span><span class="k">:</span> <span class="kt">Docker</span> <span class="o">=</span> <span class="nc">Docker</span><span class="o">(</span><span class="nc">Process</span><span class="o">(</span><span class="s">"nginx"</span><span class="o">,</span> <span class="mi">80</span><span class="o">),</span> <span class="nc">PortForward</span><span class="o">(</span><span class="n">guest</span> <span class="k">=</span> <span class="mi">80</span><span class="o">,</span> <span class="n">host</span> <span class="k">=</span> <span class="mi">80</span><span class="o">))</span>
<span class="c1">// config.vm.provision "shell", inline: "nginx -p 80"
// config.vm.network "nginx-port", guest: 80, host: 80
</span><span class="k">val</span> <span class="n">standardVM</span><span class="k">:</span> <span class="kt">VirtualMachine</span> <span class="o">=</span>
<span class="nc">VirtualMachine</span><span class="o">(</span><span class="nc">Process</span><span class="o">(</span><span class="s">"nginx"</span><span class="o">,</span> <span class="mi">80</span><span class="o">),</span> <span class="nc">PortForward</span><span class="o">(</span><span class="n">guest</span> <span class="k">=</span> <span class="mi">80</span><span class="o">,</span> <span class="n">host</span> <span class="k">=</span> <span class="mi">80</span><span class="o">))</span>
</code></pre></div></div>
<p>And we can build this up by putting Docker inside of a VM:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// config.vm.provision "shell", inline: "docker run -p 80:80 nginx"
// config.vm.network "nginx-port", guest: 80, host: 80
</span><span class="k">val</span> <span class="n">vagrantDocker</span><span class="k">:</span> <span class="kt">VirtualMachine</span> <span class="o">=</span>
  <span class="nc">VirtualMachine</span><span class="o">(</span><span class="n">docker</span><span class="o">,</span> <span class="nc">PortForward</span><span class="o">(</span><span class="n">guest</span> <span class="k">=</span> <span class="mi">80</span><span class="o">,</span> <span class="n">host</span> <span class="k">=</span> <span class="mi">80</span><span class="o">))</span>
</code></pre></div></div>
<p>And see how Kubernetes is a bit different:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// podIP: 10.244.0.6:80, 10.244.0.7:80
</span><span class="k">val</span> <span class="n">kubernetes</span><span class="k">:</span> <span class="kt">Kubernetes</span> <span class="o">=</span> <span class="nc">Kubernetes</span><span class="o">(</span>
<span class="nc">Set</span><span class="o">(</span>
<span class="nc">Pod</span><span class="o">(</span><span class="nc">Process</span><span class="o">(</span><span class="s">"nginx"</span><span class="o">,</span> <span class="mi">80</span><span class="o">)),</span>
<span class="nc">Pod</span><span class="o">(</span><span class="nc">Process</span><span class="o">(</span><span class="s">"nginx"</span><span class="o">,</span> <span class="mi">80</span><span class="o">))</span>
<span class="o">)</span>
<span class="o">)</span>
<span class="c1">// Port mapping breaks down when we have multiple pods on a single VM :-(
</span><span class="k">val</span> <span class="n">vagrantKubernetes</span><span class="k">:</span> <span class="kt">VirtualMachine</span> <span class="o">=</span>
<span class="nc">VirtualMachine</span><span class="o">(</span><span class="n">kubernetes</span><span class="o">,</span> <span class="o">???)</span>
</code></pre></div></div>
<p>From this, we can infer that <code class="language-plaintext highlighter-rouge">Server</code>, <code class="language-plaintext highlighter-rouge">Docker</code> and <code class="language-plaintext highlighter-rouge">Pod</code> are all <code class="language-plaintext highlighter-rouge">Container</code> types:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">trait</span> <span class="nc">Container</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span>
<span class="nc">trait</span> <span class="nc">Server</span> <span class="k">extends</span> <span class="nc">Container</span><span class="o">[</span><span class="kt">Set</span><span class="o">[</span><span class="kt">Process</span><span class="o">]]</span>
<span class="k">trait</span> <span class="nc">Docker</span> <span class="k">extends</span> <span class="nc">Container</span><span class="o">[</span><span class="kt">Set</span><span class="o">[</span><span class="kt">Process</span><span class="o">]]</span> <span class="k">with</span> <span class="nc">Process</span>
<span class="k">trait</span> <span class="nc">Pod</span> <span class="k">extends</span> <span class="nc">Container</span><span class="o">[</span><span class="kt">Set</span><span class="o">[</span><span class="kt">Process</span><span class="o">]]</span> <span class="k">with</span> <span class="nc">Process</span>
</code></pre></div></div>
<p>And that <code class="language-plaintext highlighter-rouge">VirtualMachine</code> and <code class="language-plaintext highlighter-rouge">Kubernetes</code> are also instances of <code class="language-plaintext highlighter-rouge">Container</code>:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">trait</span> <span class="nc">VirtualMachine</span> <span class="k">extends</span> <span class="nc">Container</span><span class="o">[</span><span class="kt">Server</span><span class="o">]</span>
<span class="k">trait</span> <span class="nc">Kubernetes</span> <span class="k">extends</span> <span class="nc">Container</span><span class="o">[</span><span class="kt">Set</span><span class="o">[</span><span class="kt">Pod</span><span class="o">]]</span> <span class="k">with</span> <span class="nc">Process</span>
</code></pre></div></div>
<p>And Tailscale creates a <code class="language-plaintext highlighter-rouge">Server</code> from <code class="language-plaintext highlighter-rouge">VirtualMachine</code>:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="n">vagrantDocker</span> <span class="k">=</span> <span class="nc">VirtualMachine</span><span class="o">(</span><span class="n">docker</span><span class="o">,</span> <span class="n">portMapping</span><span class="o">))</span>
<span class="c1">// http://vagrant-docker:80 on the tailnet
</span><span class="k">val</span> <span class="n">exposedNginxHost</span><span class="k">:</span> <span class="kt">Server</span> <span class="o">=</span> <span class="nc">Tailscale</span><span class="o">(</span><span class="n">vagrantDocker</span><span class="o">)</span>
</code></pre></div></div>
<p>But if <code class="language-plaintext highlighter-rouge">VirtualMachine</code> is a <code class="language-plaintext highlighter-rouge">Container[Container[Set[Process]]]</code> and <code class="language-plaintext highlighter-rouge">Server</code> is a <code class="language-plaintext highlighter-rouge">Container[Set[Process]]</code>:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="nc">Tailscale</span><span class="k">:</span> <span class="kt">Container</span><span class="o">[</span><span class="kt">Container</span><span class="o">[</span><span class="kt">Set</span><span class="o">[</span><span class="kt">Process</span><span class="o">]]]</span> <span class="k">=></span> <span class="nc">Container</span><span class="o">[</span><span class="kt">Set</span><span class="o">[</span><span class="kt">Process</span><span class="o">]]</span>
</code></pre></div></div>
<p><strong>Tailscale is <code class="language-plaintext highlighter-rouge">flatMap</code> for the Containerization monad.</strong></p>
<p>Still here? Let's dig into the details and I'll show all the setup steps.</p>
<h2 id="putting-the-machine-together">Putting The Machine Together</h2>
<p>Developing on my laptop has issues.</p>
<p>I installed <a href="https://elementary.io/">elementary 5</a> on my laptop a while ago. It's based on Ubuntu 18.04, and I've started running into "version 'GLIBC_2.28' not found" errors more and more as it gets further behind. There's no way to upgrade Elementary 5 – the upgrade path is to <a href="https://elementaryos.stackexchange.com/questions/28297/how-to-upgrade-from-elementary-os-5-to-6">reinstall the operating system from scratch</a>. And… well, it has all kinds of cruft on it from various docker/k8s/cluster management tools. It works fine as a laptop, but as a development environment it's not great. And trying to use Windows with WSL2 was even worse.</p>
<p>The easiest thing to do – obviously – is to take a week off work, put together a cheap headless machine as a homelab server, stick it in the basement, move everything to that box, and then connect the laptop remotely.</p>
<p>I went with the <a href="https://arstechnica.com/gadgets/2023/04/ars-technica-system-guide-four-pc-builds-for-spring-2023/">Ars Technica System Guide</a> base specs, with a couple of changes: I added 64GB of memory, and I picked out an AMD Ryzen 5 5600X instead of the 5600G. (This was a mistake – the 5600X doesn't have an integrated GPU, leading to a frantic moment trying to figure out why the BIOS wouldn't come up on the HDMI port.) After <a href="https://www.gigabyte.com/Motherboard/B450M-DS3H-WIFI-rev-10-11-12-13/support#support-dl-bios">upgrading the BIOS</a>, staring at the <a href="https://www.gigabyte.com/Motherboard/B450M-DS3H-WIFI-rev-10-11-12-13/support#support-manual">manual for pins</a>, and enabling <a href="https://virtualbill.wordpress.com/2020/06/23/enabling-amd-ryzen-virtualization-functions/">virtualization by turning on SVM Mode</a>, it was finally ready for a minimal Ubuntu install, using <a href="https://learn.microsoft.com/en-us/azure/virtual-machines/linux/use-remote-desktop?tabs=azure-cli#install-and-configure-a-remote-desktop-server">xrdp</a> and <a href="https://remmina.org/how-to-install-remmina/#snap">Remmina</a> to connect remotely.</p>
<p>I named it <code class="language-plaintext highlighter-rouge">devserver</code>.</p>
<h2 id="using-tailscale-for-server-in-the-basement">Using Tailscale for "Server in the Basement"</h2>
<p>The first thing to do was to install <a href="https://tailscale.com/download/">Tailscale</a> on absolutely everything and enable every single feature, especially <a href="https://tailscale.com/kb/1081/magicdns/">DNS</a>.</p>
<p>Tailscale is good at the core use case, but does have some client-side issues. For WSL, it won't recognize the tailscale client in the main Windows app, so you have to run <code class="language-plaintext highlighter-rouge">tailscaled</code> explicitly and distinguish it from the Windows host:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>nohup tailscaled &
<span class="nb">sudo </span>tailscale up <span class="nt">--hostname</span> windows-wsl
</code></pre></div></div>
<p>With the laptop, I also sometimes had to do <code class="language-plaintext highlighter-rouge">tailscale down/up</code> or <code class="language-plaintext highlighter-rouge">--reset</code> in order to get the mappings to resolve correctly.</p>
<p>There are a couple of things to be aware of when setting up Tailscale for a server. The first one is <a href="https://tailscale.com/kb/1028/key-expiry/">disabling key expiry</a> for the server, since it's going to be hanging around for a while. The second is that Tailscale provides its own <a href="https://tailscale.com/kb/1193/tailscale-ssh/">SSH</a>, which requires its own parameters:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>tailscale up <span class="nt">--ssh</span> <span class="nt">--operator</span><span class="o">=</span><span class="nv">$USER</span>
</code></pre></div></div>
<p>Once SSH was up, it was time to futz with configuration files. I like to use Visual Studio Code with <a href="https://code.visualstudio.com/docs/remote/ssh">SSH remote development</a>, which comes with a secret command line tool for connecting to any host on the tailnet:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>code <span class="nt">--folder-uri</span> <span class="s2">"vscode-remote://ssh-remote+devserver/home/wsargent/"</span>
</code></pre></div></div>
<p>There are also some utilities that Tailscale provides for ad-hoc port forwarding. For example, I can run <code class="language-plaintext highlighter-rouge">jekyll serve</code> on <code class="language-plaintext highlighter-rouge">devserver</code> and it will start a server on port 4000 – I can see how that looks on my phone by using <code class="language-plaintext highlighter-rouge">tailscale serve</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>tailscale serve tcp:4000 tcp://localhost:4000
</code></pre></div></div>
<p>And then I can go to <code class="language-plaintext highlighter-rouge">http://devserver:4000</code> on my phone and see how the blog post looks from there. The <a href="https://login.tailscale.com/admin/services">services page</a> on Tailscale shows a list of ports open on all the machines, so it's easy to see what services are active and how to get to them.</p>
<h2 id="adding-tailscale-to-vagrant">Adding Tailscale to Vagrant</h2>
<p>Adding Tailscale to Vagrant is straightforward. Generate an <a href="https://tailscale.com/kb/1085/auth-keys/">authentication key</a>, make it reusable, and save it into <a href="https://developer.1password.com/">1Password</a> for provisioning. 1Password has a <a href="https://developer.1password.com/docs/cli/">CLI</a> that's very useful in <a href="https://developer.1password.com/docs/cli/secret-references">managing secrets</a> – this blog post is already too long, but there's an <a href="https://github.com/1Password/solutions">example repository</a> that shows how to provision secrets.</p>
<p>I started off with <a href="https://www.virtualbox.org/">Virtualbox</a>, but have been experimenting with <a href="https://ubuntu.com/server/docs/virtualization-libvirt">libvirt</a>. To use libvirt, add the <a href="https://vagrant-libvirt.github.io/vagrant-libvirt/#installation">vagrant-libvirt</a> plugin:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt install libvert-dev
<span class="nb">sudo </span>apt-get purge vagrant-libvirt
<span class="nb">sudo </span>apt-mark hold vagrant-libvirt
<span class="nb">sudo </span>apt-get update
<span class="nb">sudo </span>apt-get install <span class="nt">-y</span> qemu libvirt-daemon-system ebtables libguestfs-tools vagrant ruby-fog-libvirt
</code></pre></div></div>
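<p>With the plugin's dependencies in place, the provider can be selected per invocation (or via the <code class="language-plaintext highlighter-rouge">VAGRANT_DEFAULT_PROVIDER</code> environment variable):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># pick libvirt explicitly for this run
vagrant up --provider=libvirt
</code></pre></div></div>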
<p>Using <a href="https://github.com/gosuri/vagrant-env">vagrant-env</a> plugin, you can then set up Tailscale on startup and shutdown:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Vagrant</span><span class="p">.</span><span class="nf">configure</span><span class="p">(</span><span class="s2">"2"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">config</span><span class="o">|</span>
<span class="n">config</span><span class="p">.</span><span class="nf">env</span><span class="p">.</span><span class="nf">enable</span>
<span class="n">config</span><span class="p">.</span><span class="nf">vm</span><span class="p">.</span><span class="nf">box</span> <span class="o">=</span> <span class="no">ENV</span><span class="p">[</span><span class="s1">'VM_BOX'</span><span class="p">]</span>
<span class="n">config</span><span class="p">.</span><span class="nf">vm</span><span class="p">.</span><span class="nf">hostname</span> <span class="o">=</span> <span class="no">ENV</span><span class="p">[</span><span class="s1">'VM_HOSTNAME'</span><span class="p">]</span>
<span class="n">config</span><span class="p">.</span><span class="nf">vm</span><span class="p">.</span><span class="nf">provider</span> <span class="no">ENV</span><span class="p">[</span><span class="s1">'VM_ENGINE'</span><span class="p">]</span> <span class="k">do</span> <span class="o">|</span><span class="n">v</span><span class="o">|</span>
<span class="n">v</span><span class="p">.</span><span class="nf">name</span> <span class="o">=</span> <span class="no">ENV</span><span class="p">[</span><span class="s1">'VM_HOSTNAME'</span><span class="p">]</span>
<span class="n">v</span><span class="p">.</span><span class="nf">memory</span> <span class="o">=</span> <span class="no">ENV</span><span class="p">[</span><span class="s1">'VM_MEMORY'</span><span class="p">]</span>
<span class="n">v</span><span class="p">.</span><span class="nf">cpus</span> <span class="o">=</span> <span class="no">ENV</span><span class="p">[</span><span class="s1">'VM_CPUS'</span><span class="p">]</span>
<span class="k">end</span>
<span class="n">config</span><span class="p">.</span><span class="nf">vm</span><span class="p">.</span><span class="nf">provision</span> <span class="s2">"tailscale-install"</span><span class="p">,</span> <span class="ss">type: </span><span class="s2">"shell"</span> <span class="k">do</span> <span class="o">|</span><span class="n">s</span><span class="o">|</span>
<span class="n">s</span><span class="p">.</span><span class="nf">inline</span> <span class="o">=</span> <span class="s2">"curl -fsSL https://tailscale.com/install.sh | sh"</span>
<span class="k">end</span>
<span class="n">config</span><span class="p">.</span><span class="nf">vm</span><span class="p">.</span><span class="nf">provision</span> <span class="s2">"tailscale-up"</span><span class="p">,</span> <span class="ss">type: </span><span class="s2">"shell"</span> <span class="k">do</span> <span class="o">|</span><span class="n">s</span><span class="o">|</span>
<span class="n">s</span><span class="p">.</span><span class="nf">inline</span> <span class="o">=</span> <span class="s2">"tailscale up --ssh --operator=vagrant --authkey </span><span class="si">#{</span><span class="no">ENV</span><span class="p">[</span><span class="s1">'TAILSCALE_AUTHKEY'</span><span class="p">]</span><span class="si">}</span><span class="s2">"</span>
<span class="k">end</span>
<span class="n">config</span><span class="p">.</span><span class="nf">trigger</span><span class="p">.</span><span class="nf">before</span> <span class="ss">:destroy</span> <span class="k">do</span> <span class="o">|</span><span class="n">trigger</span><span class="o">|</span>
<span class="n">trigger</span><span class="p">.</span><span class="nf">run_remote</span> <span class="o">=</span> <span class="p">{</span><span class="ss">inline: </span><span class="s2">"tailscale logout"</span><span class="p">}</span>
<span class="n">trigger</span><span class="p">.</span><span class="nf">on_error</span> <span class="o">=</span> <span class="ss">:continue</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>To start it, I run <code class="language-plaintext highlighter-rouge">vagrant up</code>. To stop it, I run <code class="language-plaintext highlighter-rouge">vagrant halt</code>. When I'm done experimenting with the environment, I destroy it with <code class="language-plaintext highlighter-rouge">vagrant destroy</code>, and it removes itself from the tailnet automatically.</p>
<h2 id="adding-docker-to-vagrant">Adding Docker to Vagrant</h2>
<p>Now we have to add Docker to Vagrant:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Vagrant</span><span class="p">.</span><span class="nf">configure</span><span class="p">(</span><span class="s2">"2"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">config</span><span class="o">|</span>
<span class="c1"># ...</span>
<span class="n">config</span><span class="p">.</span><span class="nf">vm</span><span class="p">.</span><span class="nf">provision</span> <span class="s2">"docker-install"</span><span class="p">,</span> <span class="ss">type: </span><span class="s2">"shell"</span> <span class="k">do</span> <span class="o">|</span><span class="n">s</span><span class="o">|</span>
<span class="n">s</span><span class="p">.</span><span class="nf">inline</span> <span class="o">=</span> <span class="o"><<-</span><span class="no">SCRIPT</span><span class="sh">
curl -fsSL https://get.docker.com -o get-docker.sh &&
sudo sh get-docker.sh &&
sudo adduser vagrant docker
</span><span class="no">SCRIPT</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Now we can have some fun – we'll run two docker compose instances side by side without port conflicts. Check out <a href="https://github.com/docker/awesome-compose">awesome-compose</a> so it shows up on the <code class="language-plaintext highlighter-rouge">/vagrant</code> mount:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>vagrant ssh
<span class="nv">$ </span><span class="nb">cd</span> /vagrant/awesome-compose/nginx-golang
<span class="nv">$ </span>docker compose up
</code></pre></div></div>
<p>And then again, only this time we have <code class="language-plaintext highlighter-rouge">vagrant-nginx-nodejs-redis</code> as the hostname:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>vagrant ssh
<span class="nv">$ </span><span class="nb">cd</span> /vagrant/awesome-compose/nginx-nodejs-redis
<span class="nv">$ </span>docker compose up
</code></pre></div></div>
<p>Now we've got two nginx instances, both running on port 80 – but they just appear as different hosts.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>curl vagrant-nginx-nodejs-redis
<span class="nv">$ </span>web1: Number of visits is: 1
</code></pre></div></div>
<p>and</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>curl vagrant-nginx-golang
<span class="c">## .</span>
<span class="c">## ## ## ==</span>
<span class="c">## ## ## ## ## ===</span>
/<span class="s2">"""""""""""""""""</span><span class="se">\_</span><span class="s2">__/ ===
{ / ===-
</span><span class="se">\_</span><span class="s2">_____ O __/
</span><span class="se">\ </span><span class="s2"> </span><span class="se">\ </span><span class="s2"> __/
</span><span class="se">\_</span><span class="s2">___</span><span class="se">\_</span><span class="s2">______/
Hello from Docker!
</span></code></pre></div></div>
<p>I also have a <code class="language-plaintext highlighter-rouge">vagrant-docker</code> box that I use for ad-hoc installations. From the laptop, I can install only the docker CLI and set DOCKER_HOST to point at the box:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt install docker-ce-cli
<span class="nb">export </span><span class="nv">DOCKER_HOST</span><span class="o">=</span>ssh://vagrant@vagrant-docker
ssh vagrant@vagrant-docker <span class="c"># or tailscale ssh vagrant@vagrant-docker</span>
docker ps <span class="c"># will work after ssh succeeded!</span>
</code></pre></div></div>
<p>And now I can run various services directly and hit them at <code class="language-plaintext highlighter-rouge">http://vagrant-docker:3000</code>.</p>
<h2 id="disposable-cloud-environments">Disposable Cloud Environments</h2>
<p>The limitation of using Docker Compose is that you're still referencing the Vagrant box, and picking out a service by port. If you have a more complex environment, you'll probably have several databases, a key/value store, several microservices and so on. Really, you'd like to be able to spin up Kubernetes inside a Vagrant Box and access all the pods through Tailscale automatically.</p>
<p>Setting up Kubernetes itself is surprisingly simple in a Vagrantfile. For example, setting up <a href="https://kind.sigs.k8s.io/">Kind</a> is as simple as a <code class="language-plaintext highlighter-rouge">vagrant up</code>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Vagrant</span><span class="p">.</span><span class="nf">configure</span><span class="p">(</span><span class="s2">"2"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">config</span><span class="o">|</span>
<span class="c1"># ...install Docker</span>
<span class="n">config</span><span class="p">.</span><span class="nf">vm</span><span class="p">.</span><span class="nf">provision</span> <span class="s2">"kind"</span><span class="p">,</span> <span class="ss">type: </span><span class="s2">"shell"</span> <span class="k">do</span> <span class="o">|</span><span class="n">s</span><span class="o">|</span>
<span class="n">s</span><span class="p">.</span><span class="nf">inline</span> <span class="o">=</span> <span class="o"><<-</span><span class="no">SCRIPT</span><span class="sh">
curl -Lo ./kind "https://kind.sigs.k8s.io/dl/v0.18.0/kind-$(uname)-amd64"
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
</span><span class="no">SCRIPT</span>
<span class="k">end</span>
<span class="n">config</span><span class="p">.</span><span class="nf">vm</span><span class="p">.</span><span class="nf">provision</span> <span class="s2">"kubectl"</span><span class="p">,</span> <span class="ss">type: </span><span class="s2">"shell"</span> <span class="k">do</span> <span class="o">|</span><span class="n">s</span><span class="o">|</span>
<span class="n">s</span><span class="p">.</span><span class="nf">inline</span> <span class="o">=</span> <span class="s2">"sudo snap install kubectl --classic"</span>
<span class="k">end</span>
<span class="n">config</span><span class="p">.</span><span class="nf">vm</span><span class="p">.</span><span class="nf">provision</span> <span class="s2">"kubectl-completion"</span><span class="p">,</span> <span class="ss">type: </span><span class="s2">"shell"</span> <span class="k">do</span> <span class="o">|</span><span class="n">s</span><span class="o">|</span>
<span class="n">s</span><span class="p">.</span><span class="nf">inline</span> <span class="o">=</span> <span class="s1">'echo "source <(kubectl completion bash)" >> ~/.bashrc'</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Setting up Tailscale in Kubernetes is… not so simple. This took some trial and error, and I leaned heavily on <a href="https://raesene.github.io/blog/2022/06/11/escaping-the-nested-doll-with-tailscale/">Escaping the Nested Doll with Tailscale </a> when going through this.</p>
<p>I like to start fresh so I immediately wipe the cluster to clean everything:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>kind delete cluster
<span class="nv">$ </span>kind create cluster
</code></pre></div></div>
<p>Tailscale's <a href="https://tailscale.com/kb/1185/kubernetes/#subnet-router">Kubernetes subnet routing</a> section is a bit confusing and out of date, so I used the <a href="https://github.com/tailscale/tailscale/blob/main/docs/k8s/README.md">README.md</a> in <a href="https://github.com/tailscale/tailscale/tree/main/docs/k8s">https://github.com/tailscale/tailscale/tree/main/docs/k8s</a>, which is slightly different.</p>
<p>First we need to <a href="https://github.com/tailscale/tailscale/blob/main/docs/k8s/README.md#setup">set up</a> with an auth key and write it as a k8s <a href="https://kubernetes.io/docs/concepts/configuration/secret/#service-account-token-secrets">service account token secret</a> and <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#manually-create-a-long-lived-api-token-for-a-serviceaccount">pass it through</a>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>kubectl apply <span class="nt">-f</span> - <span class="o"><<</span><span class="no">EOF</span><span class="sh"> apiVersion: v1
kind: Secret
metadata:
name: tailscale-auth
stringData:
TS_AUTHKEY: <your-auth-key>
</span><span class="no">EOF
</span></code></pre></div></div>
<p>Then we check out the GitHub project and cd into the directory with the Makefile:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>git clone https://github.com/tailscale/tailscale
<span class="nv">$ </span><span class="nb">cd </span>tailscale/docs/k8s
</code></pre></div></div>
<p>And execute <code class="language-plaintext highlighter-rouge">make rbac</code> (installing <code class="language-plaintext highlighter-rouge">make</code> first if the box doesn't have it):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sudo </span>apt install make
<span class="nv">$ </span><span class="nb">export </span><span class="nv">SA_NAME</span><span class="o">=</span>tailscale
<span class="nv">$ </span><span class="nb">export </span><span class="nv">TS_KUBE_SECRET</span><span class="o">=</span>tailscale-auth
<span class="nv">$ </span>make rbac
</code></pre></div></div>
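<p>If that applied cleanly, the RBAC objects should be visible – a sketch, assuming the <code class="language-plaintext highlighter-rouge">tailscale</code> service account name exported above:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ kubectl get serviceaccount,role,rolebinding | grep tailscale
</code></pre></div></div>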
<p>Next, we want to set up a <a href="https://github.com/tailscale/tailscale/blob/main/docs/k8s/README.md#subnet-router">subnet router</a>. We need the pod and service IP ranges to advertise. We can set up an nginx instance from <a href="https://kubernetes.io/docs/tasks/run-application/run-stateless-application-deployment/">the k8s docs</a>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>kubectl apply <span class="nt">-f</span> https://k8s.io/examples/application/deployment.yaml
</code></pre></div></div>
<p>From there, we can see the IP address of the pods:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>kubectl get pods <span class="nt">-o</span> wide
NAME                                READY   STATUS    RESTARTS   AGE     IP           NODE                 NOMINATED NODE   READINESS GATES
nginx-deployment-85996f8dbd-6clrn   1/1     Running   0          5h34m   10.244.0.6   kind-control-plane   <none>           <none>
nginx-deployment-85996f8dbd-jwk4q   1/1     Running   0          5h34m   10.244.0.5   kind-control-plane   <none>           <none>
</code></pre></div></div>
<p>and the service:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   5h35m
</code></pre></div></div>
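<p>These ranges vary by cluster, so rather than hardcoding them you can fish the actual values out of the apiserver and controller-manager flags – a sketch that assumes a kubeadm-style cluster like kind:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ kubectl cluster-info dump | grep -m 1 service-cluster-ip-range
$ kubectl cluster-info dump | grep -m 1 cluster-cidr
</code></pre></div></div>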
<p>Then we set the <code class="language-plaintext highlighter-rouge">TS_ROUTES</code> and call <code class="language-plaintext highlighter-rouge">make subnet-router</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ SERVICE_CIDR</span><span class="o">=</span>10.96.0.0/16
<span class="nv">$ POD_CIDR</span><span class="o">=</span>10.244.0.0/15
<span class="nv">$ </span><span class="nb">export </span><span class="nv">TS_ROUTES</span><span class="o">=</span><span class="nv">$SERVICE_CIDR</span>,<span class="nv">$POD_CIDR</span>
<span class="nv">$ </span>make subnet-router
pod/subnet-router created
</code></pre></div></div>
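<p>Before approving anything, it's worth checking that the pod came up and authenticated – the logs should show it joining the tailnet:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ kubectl get pod subnet-router
$ kubectl logs subnet-router
</code></pre></div></div>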
<p>And finally I can see the subnet router defined in Tailscale:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ tailscale ping subnet-router
pong from subnet-router (100.96.237.125) via DERP(sfo) in 26ms
</code></pre></div></div>
<p>Next, we need to go to the <a href="https://login.tailscale.com/admin/machines">machines page</a>, where the subnet-router machine will have a little alert next to it saying "Unapproved subnet routes!" Go to the "Edit routes settings" menu option and click Approve All.</p>
<p>And now the pods are accessible via IP address through Tailscale and I can see them from my Windows machine and my iPhone:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>C:\Users\wsargent>curl 10.244.0.6
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
</code></pre></div></div>
<p>Note that the pods do not show up in <code class="language-plaintext highlighter-rouge">tailscale status</code>, and they are not accessible by pod name – only IP address. There is a way to hook Tailscale into the k8s DNS, but I haven't dug into it. I'll abstract this into a Vagrantfile eventually, but this is a good place to stop.</p>
<h2 id="further-work">Further Work</h2>
<p>This approach works out of the box, but could use some optimization.</p>
<p>I need to set up a Docker registry to cache images, and an apt cache so that initializing Vagrant boxes doesn't go over the network. It might make sense to have them run on devserver specifically so they don't have to rely on specific vagrant instances being up (and I can't wipe them out by accident).</p>
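<p>The registry half is a one-liner with the stock <code class="language-plaintext highlighter-rouge">registry:2</code> image running as a pull-through cache – a sketch, with the port and container name picked arbitrarily:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ docker run -d --name registry-mirror -p 5000:5000 \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
    registry:2
</code></pre></div></div>
<p>Each box's Docker daemon would then point at it with <code class="language-plaintext highlighter-rouge">"registry-mirrors": ["http://devserver:5000"]</code> in <code class="language-plaintext highlighter-rouge">/etc/docker/daemon.json</code>.</p>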
<p>I could also make a Vagrant basebox with Tailscale and Docker and remove that bit from initialization.</p>
Dynamic Logging With Ammonite and JBang2023-03-12T17:29:33-07:002023-03-12T17:29:33-07:00https://tersesystems.com/blog/2023/03/12/dynamic-logging-with-ammonite-script<p>It was fun putting together a <a href="https://tersesystems.com/blog/2022/04/09/dynamic-debug-logging-and-echopraxia-improvements/">proof of concept</a>, so I've been trying to see what the smallest possible dynamic logging demo can be.</p>
<p><a href="https://ammonite.io/">Ammonite</a> and <a href="https://www.jbang.dev/documentation/guide/latest/index.html">JBang</a> are tools that internalize the work of dependency management and build options so that you can run a script without going through a build tool. This is really useful if you want to just download and play with something, because you can just copy and paste text into a file and be done.</p>
<p>Here's the <a href="https://github.com/tersesystems/smallest-dynamic-logging-example">github repo</a>.</p>
<h2 id="ammonite">Ammonite</h2>
<p>First, the <a href="https://ammonite.io/">Ammonite</a> script; you can run it with <code class="language-plaintext highlighter-rouge">amm script.sc</code>:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="nn">$ivy.</span><span class="o">{</span>
<span class="n">`com.tersesystems.echopraxia.plusscala::logger:1.1.2`</span><span class="o">,</span>
<span class="n">`com.tersesystems.echopraxia:scripting:2.2.4`</span><span class="o">,</span>
<span class="n">`com.tersesystems.echopraxia:logstash:2.2.4`</span><span class="o">,</span>
<span class="n">`com.tersesystems.logback:logback-classic:1.2.0`</span><span class="o">,</span>
<span class="n">`com.lihaoyi::os-lib:0.9.1`</span>
<span class="o">}</span>
<span class="k">import</span> <span class="nn">com.tersesystems.echopraxia.plusscala._</span>
<span class="k">import</span> <span class="nn">com.tersesystems.echopraxia.plusscala.api._</span>
<span class="k">import</span> <span class="nn">com.tersesystems.echopraxia.scripting._</span>
<span class="k">import</span> <span class="nn">com.tersesystems.logback.classic.ChangeLogLevel</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">ScriptService</span><span class="o">(</span><span class="n">dir</span><span class="k">:</span> <span class="kt">os.Path</span><span class="o">)</span> <span class="o">{</span>
<span class="k">private</span> <span class="k">val</span> <span class="n">sws</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">ScriptWatchService</span><span class="o">(</span><span class="n">dir</span><span class="o">.</span><span class="n">toNIO</span><span class="o">);</span>
<span class="k">def</span> <span class="n">condition</span><span class="o">(</span><span class="n">path</span><span class="k">:</span> <span class="kt">os.Path</span><span class="o">)</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">val</span> <span class="n">scriptHandle</span> <span class="k">=</span> <span class="n">sws</span><span class="o">.</span><span class="n">watchScript</span><span class="o">(</span><span class="n">path</span><span class="o">.</span><span class="n">toNIO</span><span class="o">,</span> <span class="k">_</span><span class="o">.</span><span class="n">printStackTrace</span><span class="o">)</span>
<span class="nc">ScriptCondition</span><span class="o">.</span><span class="n">create</span><span class="o">(</span><span class="n">scriptHandle</span><span class="o">).</span><span class="n">asScala</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="k">object</span> <span class="nc">TweakFlow</span> <span class="o">{</span>
<span class="k">val</span> <span class="n">default</span> <span class="k">=</span> <span class="s">"""
|library echopraxia {
| function evaluate: (string level, dict ctx) ->
| let {
| find_string: ctx[:find_string];
| }
| find_string("$.foo") == "bar";
|}
"""</span><span class="o">.</span><span class="n">stripMargin</span>
<span class="o">}</span>
<span class="nd">@main</span>
<span class="k">def</span> <span class="n">main</span><span class="o">()</span> <span class="k">=</span> <span class="o">{</span>
<span class="c1">// No logback.xml, we're doing it live
</span> <span class="k">val</span> <span class="n">changer</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">ChangeLogLevel</span>
<span class="n">changer</span><span class="o">.</span><span class="n">changeLogLevel</span><span class="o">(</span><span class="s">"ROOT"</span><span class="o">,</span> <span class="s">"INFO"</span><span class="o">)</span>
<span class="k">val</span> <span class="n">logger</span> <span class="k">=</span> <span class="nc">LoggerFactory</span><span class="o">.</span><span class="n">getLogger</span>
<span class="n">changer</span><span class="o">.</span><span class="n">changeLogLevel</span><span class="o">(</span><span class="n">logger</span><span class="o">.</span><span class="n">name</span><span class="o">,</span> <span class="s">"DEBUG"</span><span class="o">)</span>
<span class="c1">// Ensure a script exists and is watched
</span> <span class="k">val</span> <span class="n">dir</span> <span class="k">=</span> <span class="n">os</span><span class="o">.</span><span class="n">pwd</span>
<span class="k">val</span> <span class="n">service</span> <span class="k">=</span> <span class="nc">ScriptService</span><span class="o">(</span><span class="n">dir</span><span class="o">)</span>
<span class="k">val</span> <span class="n">tweakflowFile</span> <span class="k">=</span> <span class="n">dir</span> <span class="o">/</span> <span class="s">"tweakflow.tf"</span>
<span class="k">if</span> <span class="o">(!</span> <span class="n">os</span><span class="o">.</span><span class="n">isFile</span><span class="o">(</span><span class="n">tweakflowFile</span><span class="o">))</span> <span class="o">{</span>
<span class="n">os</span><span class="o">.</span><span class="n">write</span><span class="o">(</span><span class="n">tweakflowFile</span><span class="o">,</span> <span class="nc">TweakFlow</span><span class="o">.</span><span class="n">default</span><span class="o">)</span>
<span class="o">}</span>
<span class="c1">// now we're sure the file exists, set up a condition and run in a loop.
</span> <span class="k">val</span> <span class="n">condition</span> <span class="k">=</span> <span class="n">service</span><span class="o">.</span><span class="n">condition</span><span class="o">(</span><span class="n">tweakflowFile</span><span class="o">)</span>
<span class="k">while</span> <span class="o">(</span><span class="kc">true</span><span class="o">)</span> <span class="o">{</span>
<span class="k">try</span> <span class="o">{</span>
<span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="o">(</span><span class="n">condition</span><span class="o">,</span> <span class="s">"{}"</span><span class="o">,</span> <span class="n">fb</span> <span class="k">=></span> <span class="n">fb</span><span class="o">.</span><span class="n">keyValue</span><span class="o">(</span><span class="s">"foo"</span> <span class="o">-></span> <span class="s">"bar"</span><span class="o">));</span>
<span class="o">}</span> <span class="k">finally</span> <span class="o">{</span>
<span class="nc">Thread</span><span class="o">.</span><span class="n">sleep</span><span class="o">(</span><span class="mi">2000L</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
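<p>The point of all this is that the condition is live: while the loop is printing debug lines, editing <code class="language-plaintext highlighter-rouge">tweakflow.tf</code> takes effect within a couple of iterations. A sketch using <code class="language-plaintext highlighter-rouge">sed</code>, though any editor works:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ amm script.sc &
$ sed -i 's/== "bar"/== "baz"/' tweakflow.tf    # debug lines stop
$ sed -i 's/== "baz"/== "bar"/' tweakflow.tf    # and start again
</code></pre></div></div>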
<h2 id="jbang">JBang</h2>
<p>And now <a href="https://www.jbang.dev/documentation/guide/latest/index.html">JBang</a>. Be sure you have something in <code class="language-plaintext highlighter-rouge">JAVA_HOME</code> or else it will <em>automatically download and install a JDK itself</em>. I usually point it at JDK 17 using the <a href="https://www.jbang.dev/documentation/guide/latest/javaversions.html#managing-jdks">jbang jdk</a> feature:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>jbang jdk install 17 `sdk home java 17.0.4.1-tem`
</code></pre></div></div>
<p>Here's the script: it needs JDK 15 or above, because it uses text blocks (which were preview-only in JDK 13 and 14):</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">///usr/bin/env jbang "$0" "$@" ; exit $?</span>
<span class="c1">//DEPS com.tersesystems.echopraxia:logger:2.2.4</span>
<span class="c1">//DEPS com.tersesystems.echopraxia:logstash:2.2.4</span>
<span class="c1">//DEPS com.tersesystems.echopraxia:scripting:2.2.4</span>
<span class="c1">//DEPS com.tersesystems.logback:logback-classic:1.2.0</span>
<span class="kn">import</span> <span class="nn">com.tersesystems.echopraxia.*</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">com.tersesystems.echopraxia.api.*</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">com.tersesystems.echopraxia.scripting.*</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">com.tersesystems.logback.classic.ChangeLogLevel</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.nio.*</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">java.nio.file.*</span><span class="o">;</span>
<span class="kd">public</span> <span class="kd">class</span> <span class="nc">Script</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">Logger</span><span class="o"><?></span> <span class="n">logger</span> <span class="o">=</span> <span class="n">LoggerFactory</span><span class="o">.</span><span class="na">getLogger</span><span class="o">(</span><span class="n">Script</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
<span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">String</span> <span class="n">defaultScript</span> <span class="o">=</span> <span class="s">"""
import * as std from "</span><span class="n">std</span><span class="s">";
alias std.strings as str;
library echopraxia {
function evaluate: (string level, dict ctx) ->
let {
find_string: ctx[:find_string];
}
str.lower_case(find_string("</span><span class="err">$</span><span class="o">.</span><span class="na">foo</span><span class="s">")) == "</span><span class="n">bar</span><span class="s">";
}
"""</span><span class="o">;</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="o">(</span><span class="n">String</span><span class="o">...</span> <span class="n">args</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">java</span><span class="o">.</span><span class="na">io</span><span class="o">.</span><span class="na">IOException</span> <span class="o">{</span>
<span class="n">ChangeLogLevel</span> <span class="n">changer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ChangeLogLevel</span><span class="o">();</span>
<span class="n">changer</span><span class="o">.</span><span class="na">changeLogLevel</span><span class="o">(</span><span class="s">"ROOT"</span><span class="o">,</span> <span class="s">"INFO"</span><span class="o">);</span>
<span class="n">changer</span><span class="o">.</span><span class="na">changeLogLevel</span><span class="o">(</span><span class="n">logger</span><span class="o">.</span><span class="na">getName</span><span class="o">(),</span> <span class="s">"DEBUG"</span><span class="o">);</span>
<span class="n">Path</span> <span class="n">watchedDir</span> <span class="o">=</span> <span class="n">Paths</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="s">"."</span><span class="o">);</span>
<span class="n">ScriptWatchService</span> <span class="n">watchService</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ScriptWatchService</span><span class="o">(</span><span class="n">watchedDir</span><span class="o">);</span>
<span class="n">Path</span> <span class="n">filePath</span> <span class="o">=</span> <span class="n">watchedDir</span><span class="o">.</span><span class="na">resolve</span><span class="o">(</span><span class="s">"tweakflow.tf"</span><span class="o">);</span>
<span class="k">if</span> <span class="o">(!</span> <span class="n">Files</span><span class="o">.</span><span class="na">exists</span><span class="o">(</span><span class="n">filePath</span><span class="o">))</span> <span class="o">{</span>
<span class="n">Files</span><span class="o">.</span><span class="na">writeString</span><span class="o">(</span><span class="n">filePath</span><span class="o">,</span> <span class="n">defaultScript</span><span class="o">);</span>
<span class="o">}</span>
<span class="n">ScriptHandle</span> <span class="n">watchedHandle</span> <span class="o">=</span> <span class="n">watchService</span><span class="o">.</span><span class="na">watchScript</span><span class="o">(</span><span class="n">filePath</span><span class="o">,</span> <span class="n">e</span> <span class="o">-></span> <span class="n">logger</span><span class="o">.</span><span class="na">error</span><span class="o">(</span><span class="s">"Script compilation error"</span><span class="o">,</span> <span class="n">e</span><span class="o">));</span>
<span class="n">Condition</span> <span class="n">condition</span> <span class="o">=</span> <span class="n">ScriptCondition</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="n">watchedHandle</span><span class="o">);</span>
<span class="n">logger</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="n">condition</span><span class="o">,</span> <span class="s">"{}"</span><span class="o">,</span> <span class="n">fb</span> <span class="o">-></span> <span class="n">fb</span><span class="o">.</span><span class="na">string</span><span class="o">(</span><span class="s">"foo"</span><span class="o">,</span> <span class="s">"BAR"</span><span class="o">));</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
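<p>Running it is the usual JBang invocation, assuming the file is saved as <code class="language-plaintext highlighter-rouge">Script.java</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ jbang Script.java
</code></pre></div></div>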
<p>And that's it!</p>
Blindsight vs Echopraxia2023-03-12T15:39:50-07:002023-03-12T15:39:50-07:00https://tersesystems.com/blog/2023/03/12/blindsight-vs-echopraxia<p>Very small blog post as I am noodling and working on a new release of <a href="https://github.com/tersesystems/blindsight">Blindsight</a>.</p>
<p>I've said before that <a href="https://github.com/tersesystems/blindsight">Blindsight</a> "supports" structured logging. <a href="https://github.com/tersesystems/echopraxia">Echopraxia</a> "requires" structured logging. It occurs to me that this is really a bit backwards.</p>
<p>Structured logging typically talks about the output of logging: mapping whatever data you have in the logging event into JSON. But this doesn't talk about the inputs – how you provide the JSON with something to chew on. More accurately, we should call most structured logging "structured output" because the output is structured even when most of the input isn't.</p>
<p>So what is a structured input? A structured input is a reliable key/value pair, where the key is typically a string.</p>
<p>For a long time in SLF4J, MDC was the only way to reliably establish a key/value pair in SLF4J, but it wasn't complete because it could only take a string as the value. Then <a href="https://github.com/logfellow/logstash-logback-encoder">logstash-logback-encoder</a> added <a href="https://github.com/logfellow/logstash-logback-encoder#event-specific-custom-fields">event specific custom fields</a>, but the value was still <code class="language-plaintext highlighter-rouge">java.lang.Object</code> and it is not a consistent structure – for example, you can't specify a <code class="language-plaintext highlighter-rouge">StructuredArgument</code> as the value of another <code class="language-plaintext highlighter-rouge">StructuredArgument</code>, and building up a complex semi-structured object is not possible.</p>
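<p>For instance, with raw MDC everything has to be flattened to strings up front – a minimal SLF4J sketch:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import org.slf4j.MDC;

public class MdcExample {
  public static void main(String[] args) {
    MDC.put("requestId", "abc-123");        // fine: the value is already a String
    MDC.put("userId", String.valueOf(42));  // everything else must be stringified first
    // MDC.put("user", new Object());       // won't compile: MDC values can only be Strings
  }
}
</code></pre></div></div>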
<p><a href="https://github.com/tersesystems/blindsight">Blindsight</a> gives you the option of providing structured input using an <code class="language-plaintext highlighter-rouge">Argument</code> with <a href="https://tersesystems.github.io/blindsight/usage/dsl.html">DSL</a>, and does have a consistent structure. But Blindsight doesn't require that of you. You can mix and match structured and unstructured input, and it's fine:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="nn">com.tersesystems.blindsight.DSL._</span>
<span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="o">(</span><span class="s">"unstructured = {} structured = {}"</span><span class="o">,</span> <span class="s">"string"</span><span class="o">,</span> <span class="n">bobj</span><span class="o">(</span><span class="s">"instant"</span> <span class="o">-></span> <span class="nc">Instant</span><span class="o">.</span><span class="n">now</span><span class="o">))</span>
</code></pre></div></div>
<p><a href="https://github.com/tersesystems/echopraxia">Echopraxia</a> requires <em>all input</em> to have structure, by converting input into <code class="language-plaintext highlighter-rouge">Field</code> instances through a <code class="language-plaintext highlighter-rouge">FieldBuilder</code> and instead of varadic arguments, there's a <code class="language-plaintext highlighter-rouge">FieldBuilder => FieldBuilderResult</code> function. For the <a href="https://github.com/tersesystems/echopraxia-plusscala">Scala API</a>, it looks like this:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="o">(</span><span class="s">"{}"</span><span class="o">,</span> <span class="k">_</span><span class="o">.</span><span class="n">keyValue</span><span class="o">(</span><span class="s">"foo"</span> <span class="o">-></span> <span class="s">"bar"</span><span class="o">))</span>
</code></pre></div></div>
<p>So why require structured input?</p>
<p>The big answer is that structured input is valuable for <a href="https://tersesystems.com/blog/2020/01/22/developing-in-production/">developing in the large</a>. By and large, structured formats are the norm in any kind of service: Protobuf, Avro, Parquet, HTTP parameters, and so on. Being able to carry structure over and through into logging adds coherence and allows logging-specific serialization of complex objects.</p>
<p>The more detailed answer is that once you can rely on structured input, you can query and filter your log events vastly more effectively. You can also choose how to render fields in the event, not just in JSON but also for line oriented encoders. For example, you can say <code class="language-plaintext highlighter-rouge">%fields{$.request_id}</code> and render only <code class="language-plaintext highlighter-rouge">request_id</code> value using a custom converter with a pattern encoder in Logback:</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><configuration></span>
<span class="nt"><conversionRule</span> <span class="na">conversionWord=</span><span class="s">"fields"</span> <span class="na">converterClass=</span><span class="s">"com.tersesystems.echopraxia.logstash.FieldConverter"</span><span class="nt">/></span>
<span class="nt"><appender</span> <span class="na">name=</span><span class="s">"STDOUT"</span> <span class="na">class=</span><span class="s">"ch.qos.logback.core.ConsoleAppender"</span><span class="nt">></span>
<span class="nt"><encoder></span>
<span class="nt"><pattern></span>
%-4relative [%thread] %-5level [%fields{$.request_id}] %logger - %msg%n
<span class="nt"></pattern></span>
<span class="nt"></encoder></span>
<span class="nt"></appender></span>
<span class="nt"><root</span> <span class="na">level=</span><span class="s">"DEBUG"</span><span class="nt">></span>
<span class="nt"><appender-ref</span> <span class="na">ref=</span><span class="s">"STDOUT"</span><span class="nt">/></span>
<span class="nt"></root></span>
<span class="nt"></configuration></span>
</code></pre></div></div>
<p>Can you do this with unstructured input, or with a mix of structured and unstructured input? Sort of. Imagine your input is a list of random <code class="language-plaintext highlighter-rouge">Object</code> with the only real guarantee that <code class="language-plaintext highlighter-rouge">toString</code> will return a <code class="language-plaintext highlighter-rouge">String</code>. If you are given an object that contains an array, how do you query and filter on that component? You have to explicitly cast to the type, and then query on it. This is very difficult to do from inside a Logback filter, which is usually kept apart from the domain classes.</p>
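<p>To make that concrete, here's roughly what that looks like on the Logback side – a sketch, not code from either library, with a hypothetical <code class="language-plaintext highlighter-rouge">tags</code> key:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.filter.Filter;
import ch.qos.logback.core.spi.FilterReply;
import java.util.Map;

// Filtering on unstructured arguments means instanceof checks and casts everywhere.
public class CastingFilter extends Filter<ILoggingEvent> {
  @Override
  public FilterReply decide(ILoggingEvent event) {
    Object[] args = event.getArgumentArray();
    if (args == null) {
      return FilterReply.NEUTRAL;
    }
    for (Object arg : args) {
      if (arg instanceof Map) {
        // "tags" is hypothetical -- we only discover the shape at runtime
        Object tags = ((Map<?, ?>) arg).get("tags");
        if (tags instanceof Object[] && ((Object[]) tags).length > 0) {
          return FilterReply.ACCEPT;
        }
      }
    }
    return FilterReply.NEUTRAL;
  }
}
</code></pre></div></div>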
<p>So my argument is that it's a trade off. Blindsight takes a permissive approach and requires type safety, but does not require structure. Echopraxia takes a stricter approach and requires both type safety and structure, with more control over output.</p>
Ad-hoc structured log analysis with SQLite and DuckDB2023-03-04T09:01:54-08:002023-03-04T09:01:54-08:00https://tersesystems.com/blog/2023/03/04/ad-hoc-structured-log-analysis-with-sqlite-and-duckdb<p>Structured logging and databases are a natural match – there's easily consumed structured data on one side, and tools for querying and presenting data on the other. I've written a bit about <a href="https://tersesystems.com/blog/2020/11/26/queryable-logging-with-blacklite/">querying structured logging with SQLite</a> and the power of data science when it comes to logging by using <a href="https://tersesystems.com/blog/2019/09/28/applying-data-science-to-logs-for-developer-observability/">Apache Spark</a>. Using SQL has a number of advantages over using JSON processing tools or log viewers, such as the ability to progressively build up views while filtering or querying, better timestamp support, and the ability to do aggregate query logic.</p>
<p>But structured logging isn't what most databases are used to. The de-facto standard for structured logging is newline-delimited JSON (NDJSON), and there is only a loose concept of a "schema" – structured logging can have high cardinality, and there's usually only a few guaranteed common fields such as <code class="language-plaintext highlighter-rouge">timestamp</code>, <code class="language-plaintext highlighter-rouge">level</code> and <code class="language-plaintext highlighter-rouge">logger_name</code>. Getting an actual schema so that you can get NDJSON into a database is still a somewhat manual process compared to CSV. Spark is great at NDJSON dataframes, but Spark is a heavyweight solution that we can't just install on a host. What we really want is an in-process "no dependencies" database that understands NDJSON.</p>
<p><strong>TL;DR</strong>: With NDJSON support, slurping structured logs into a "no dependencies" database like SQLite or DuckDB is easier than ever.</p>
<h2 id="sqlite-lines">sqlite-lines</h2>
<p>Alex Garcia released <a href="https://github.com/asg017/sqlite-lines">sqlite-lines</a> in June specifically to read NDJSON.</p>
<p>Using sqlite3 can be more convenient than using <code class="language-plaintext highlighter-rouge">jq</code> or other JSON processing command line tools for digging around in logs. Adding the sqlite-lines extension is as simple as getting the loadable library:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>wget https://github.com/asg017/sqlite-lines/releases/download/v0.1.1/lines0.so
<span class="nv">$ </span>sqlite3
sqlite> .load ./lines0
</code></pre></div></div>
<p>Processing can be done with the <code class="language-plaintext highlighter-rouge">lines_read</code> function which provides a table with a column <code class="language-plaintext highlighter-rouge">line</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sqlite> <span class="k">select </span>line from lines_read<span class="o">(</span><span class="s1">'application.json'</span><span class="o">)</span> limit 1<span class="p">;</span>
</code></pre></div></div>
<p>This will produce JSON output like:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="s2">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"FtYkeclkrh8dHaINzdiAAA"</span><span class="p">,</span><span class="w">
</span><span class="s2">"relative_ns"</span><span class="p">:</span><span class="w"> </span><span class="mi">-295200</span><span class="p">,</span><span class="w">
</span><span class="s2">"tse_ms"</span><span class="p">:</span><span class="w"> </span><span class="mi">1645548338725</span><span class="p">,</span><span class="w">
</span><span class="s2">"start_ms"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w">
</span><span class="s2">"@timestamp"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2022-02-22T16:45:38.725Z"</span><span class="p">,</span><span class="w">
</span><span class="s2">"@version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1"</span><span class="p">,</span><span class="w">
</span><span class="s2">"message"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Database [logging] initialized"</span><span class="p">,</span><span class="w">
</span><span class="s2">"logger_name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"play.api.db.DefaultDBApi"</span><span class="p">,</span><span class="w">
</span><span class="s2">"thread_name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"play-dev-mode-akka.actor.default-dispatcher-7"</span><span class="p">,</span><span class="w">
</span><span class="s2">"level"</span><span class="p">:</span><span class="w"> </span><span class="s2">"INFO"</span><span class="p">,</span><span class="w">
</span><span class="s2">"level_value"</span><span class="p">:</span><span class="w"> </span><span class="mi">20000</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>We don't want to call <code class="language-plaintext highlighter-rouge">lines_read</code> all the time, so we'll import into a local table:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">create</span> <span class="k">table</span> <span class="n">logs</span> <span class="k">as</span> <span class="k">select</span> <span class="n">line</span> <span class="k">from</span> <span class="n">lines_read</span><span class="p">(</span><span class="s1">'application.json'</span><span class="p">);</span>
</code></pre></div></div>
<p>Combined with the <a href="https://www.sqlite.org/json1.html#jptr">jpointer operators</a> added in <code class="language-plaintext highlighter-rouge">3.38.0</code>, we can filter using JSONPath:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">select</span> <span class="n">line</span> <span class="k">from</span> <span class="n">logs</span> <span class="k">where</span> <span class="n">line</span><span class="o">->></span><span class="s1">'$.level'</span> <span class="o">=</span> <span class="s1">'ERROR'</span><span class="p">;</span>
</code></pre></div></div>
<p>This produces a JSON result that contains a giant stacktrace, and I only want the message. What makes SQLite so effective as a query tool is that it's very easy to progressively stack views to get only the data I want:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sqlite> create view shortlogs as
<span class="k">select </span>line-><span class="s1">'$.@timestamp'</span> as timestamp, line-><span class="s1">'$.message'</span> as message, line->><span class="s1">'$.level'</span> as level
from logs<span class="p">;</span>
sqlite> <span class="k">select</span> <span class="k">*</span> from shortlogs where level <span class="o">=</span> <span class="s1">'ERROR'</span><span class="p">;</span>
<span class="s2">"2022-02-22T16:45:50.900Z"</span>|<span class="s2">"Internal server error for (GET}) [/flaky}]"</span>|ERROR
</code></pre></div></div>
<p>Saving the table and exporting it to your local desktop is also very simple, and gives you the option of using a database GUI like <a href="https://sqlitebrowser.org/">DB Browser for SQLite</a>.</p>
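<p>For example (a one-line sketch – the target filename is arbitrary), <code class="language-plaintext highlighter-rouge">vacuum into</code> writes a compact copy of the whole database to a new file that you can copy down and open locally:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vacuum into 'logs-export.db';
</code></pre></div></div>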
<p>Interestingly, <code class="language-plaintext highlighter-rouge">sqlite-lines</code> can be used with <a href="https://datasette.io/">Datasette</a> with <code class="language-plaintext highlighter-rouge">datasette data.db --load-extension ./lines_nofs0</code> which would provide a web application UI for SQLite, but I haven't tried this.</p>
<h2 id="duckdb">DuckDB</h2>
<p>SQLite does have some disadvantages: it processes rows sequentially, so aggregate or analytical questions like "what are the 10 most common user agent strings" can take a while on large datasets. <a href="https://duckdb.org/">DuckDB</a> is like <a href="https://www.sqlite.org/index.html">SQLite</a>, but focused on analytics – it processes entire columns at once, rather than a row at a time.</p>
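<p>To make that concrete, here's the shape of query in question against the SQLite <code class="language-plaintext highlighter-rouge">logs</code> table from earlier (a sketch counting the most requested URIs, assuming a <code class="language-plaintext highlighter-rouge">request.uri</code> field in the JSON) – SQLite has to parse every row's JSON to answer it:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>select line->>'$.request.uri' as uri, count(*) as hits
from logs
group by uri
order by hits desc
limit 10;
</code></pre></div></div>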
<p>I hadn't used DuckDB extensively, as it used to require a schema to be defined before import. That's no longer a problem: as of <a href="https://duckdb.org/2023/02/13/announcing-duckdb-070.html">DuckDB 0.7.0</a>, DuckDB can read <a href="http://ndjson.org/">NDJSON</a> files and infer a schema from the values. There's a <a href="https://duckdb.org/2023/03/03/json.html">blog post</a> with examples – let's try it out on logs and see what happens.</p>
<p>Releases are available on <a href="https://github.com/duckdb/duckdb/releases">Github</a>. Installation is a single binary zip file:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>wget https://github.com/duckdb/duckdb/releases/download/v0.7.1/duckdb_cli-linux-amd64.zip
<span class="nv">$ </span>unzip duckdb_cli-linux-amd64.zip
</code></pre></div></div>
<p>We can import the JSON into a DuckDB table and save on the repeated processing, using <code class="language-plaintext highlighter-rouge">read_ndjson_auto</code>, which lets DuckDB parallelize better. The blog post says "DuckDB can also detect a few different DATE/TIMESTAMP formats within JSON strings, as well as TIME and UUID" – while it did detect the UUIDs, it did not recognize "@timestamp" as <a href="https://www.rfc-editor.org/rfc/rfc3339">RFC 3339</a>. Not a huge deal, as we can use the <code class="language-plaintext highlighter-rouge">REPLACE</code> <a href="https://duckdb.org/docs/sql/query_syntax/select">clause</a> to manually cast it to a timestamp on import:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">logs</span> <span class="k">AS</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">REPLACE</span> <span class="k">CAST</span><span class="p">(</span><span class="nv">"@timestamp"</span> <span class="k">AS</span> <span class="k">TIMESTAMP</span><span class="p">)</span> <span class="k">as</span> <span class="nv">"@timestamp"</span>
<span class="k">FROM</span> <span class="n">read_ndjson_auto</span><span class="p">(</span><span class="s1">'application.json'</span><span class="p">);</span>
</code></pre></div></div>
<p>And because DuckDB infers schema, when we run a <a href="https://duckdb.org/docs/guides/meta/describe.html"><code class="language-plaintext highlighter-rouge">describe logs</code></a> it shows us all the JSON attributes as columns!</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>│ id │ VARCHAR
│ relative_ns │ BIGINT
│ tse_ms │ UBIGINT
│ start_ms │ UBIGINT
│ @timestamp │ TIMESTAMP
│ @version │ BIGINT
│ message │ VARCHAR
│ logger_name │ VARCHAR
│ thread_name │ VARCHAR
│ level │ VARCHAR
│ level_value │ UBIGINT
│ correlation_id │ BIGINT
│ stack_hash │ VARCHAR
│ name │ VARCHAR
│ trace.span_id │ UUID
│ trace.parent_id │ INTEGER
│ trace.trace_id │ UUID
│ service_name │ VARCHAR
│ duration_ms │ UBIGINT
│ request.method │ VARCHAR
│ request.uri │ VARCHAR
│ response.status │ UBIGINT
│ exception │ STRUCT("name" VARCHAR, properties STRUCT(message VARCHAR))[]
│ stack_trace │ VARCHAR
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">exception</code> column is especially interesting, as DuckDB was able to infer <code class="language-plaintext highlighter-rouge">name</code> and <code class="language-plaintext highlighter-rouge">properties</code> inside it. The column is a list of structs, and DuckDB lists are 1-based, so the query to match on a specific exception message is <code class="language-plaintext highlighter-rouge">exception[1].properties.message</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>D select exception[1].properties.message from logs;
Execution exception[[IllegalStateException: Who could have foreseen this?]]
</code></pre></div></div>
<p>DuckDB's analytical focus means we can run back-of-the-envelope queries to surface hidden patterns in logs, making use of DuckDB's <a href="https://duckdb.org/docs/sql/aggregates">aggregate functions</a>.</p>
<p>We can start off with the average response time:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">select</span> <span class="k">avg</span><span class="p">(</span><span class="n">duration_ms</span><span class="p">)</span> <span class="k">from</span> <span class="n">logs</span><span class="p">;</span>
</code></pre></div></div>
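<p>Grouping works just as well – here's a sketch using the columns that <code class="language-plaintext highlighter-rouge">describe logs</code> showed above:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT "request.method", "response.status",
       COUNT(*) AS requests,
       AVG(duration_ms) AS avg_ms
FROM logs
GROUP BY 1, 2
ORDER BY requests DESC;
</code></pre></div></div>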
<p>We can get a better breakdown using a 24-hour window, to see if the average response is slower when there's more load:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="k">AVG</span><span class="p">(</span><span class="nv">"duration_ms"</span><span class="p">)</span> <span class="n">OVER</span> <span class="p">(</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="nv">"timestamp"</span> <span class="k">ASC</span>
<span class="n">RANGE</span> <span class="k">BETWEEN</span> <span class="n">INTERVAL</span> <span class="mi">12</span> <span class="n">HOURS</span> <span class="n">PRECEDING</span>
<span class="k">AND</span> <span class="n">INTERVAL</span> <span class="mi">12</span> <span class="n">HOURS</span> <span class="n">FOLLOWING</span><span class="p">)</span>
<span class="k">FROM</span> <span class="n">shortlogs</span><span class="p">;</span>
</code></pre></div></div>
<p>It's even possible to do <a href="https://duckdb.org/docs/sql/window_functions#box-and-whisker-queries">box and whisker queries</a>:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="k">timestamp</span><span class="p">,</span>
<span class="k">MIN</span><span class="p">(</span><span class="nv">"duration_ms"</span><span class="p">)</span> <span class="n">OVER</span> <span class="k">day</span> <span class="k">AS</span> <span class="nv">"Min"</span><span class="p">,</span>
<span class="n">QUANTILE_CONT</span><span class="p">(</span><span class="nv">"duration_ms"</span><span class="p">,</span> <span class="p">[</span><span class="mi">0</span><span class="p">.</span><span class="mi">25</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">5</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">75</span><span class="p">])</span> <span class="n">OVER</span> <span class="k">day</span> <span class="k">AS</span> <span class="nv">"IQR"</span><span class="p">,</span>
<span class="k">MAX</span><span class="p">(</span><span class="nv">"duration_ms"</span><span class="p">)</span> <span class="n">OVER</span> <span class="k">day</span> <span class="k">AS</span> <span class="nv">"Max"</span><span class="p">,</span>
<span class="k">FROM</span> <span class="n">shortlogs</span>
<span class="n">WINDOW</span> <span class="k">day</span> <span class="k">AS</span> <span class="p">(</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="nv">"timestamp"</span> <span class="k">ASC</span>
<span class="n">RANGE</span> <span class="k">BETWEEN</span> <span class="n">INTERVAL</span> <span class="mi">12</span> <span class="n">HOURS</span> <span class="n">PRECEDING</span>
<span class="k">AND</span> <span class="n">INTERVAL</span> <span class="mi">12</span> <span class="n">HOURS</span> <span class="n">FOLLOWING</span><span class="p">)</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span>
</code></pre></div></div>
<p>While DuckDB has advantages over SQLite, its storage format is <a href="https://duckdb.org/internals/storage">not stable</a> – newer versions of DuckDB cannot read old database files and vice versa. This has an immediate impact, as <a href="https://duckdb.org/docs/guides/data_viewers/tad">Tad</a> is not capable of loading newer DuckDB files.</p>
<p>In addition, support for the STRUCT data structure in Parquet is <a href="https://stackoverflow.com/questions/60227123/read-write-parquet-with-struct-column-type">iffy</a>. <a href="https://duckdb.org/docs/guides/sql_editors/dbeaver">DBeaver</a> is capable of loading the database, but will not render the exception field, instead throwing <code class="language-plaintext highlighter-rouge">SQL Error: Unsupported result column type STRUCT("name" VARCHAR, properties STRUCT(message VARCHAR))[]</code>. Tad <em>does</em> support Parquet's struct format, but is a viewer only, so that's not very useful.</p>
<p>Best advice: use DuckDB only for computation and analytics, and use <code class="language-plaintext highlighter-rouge">EXPORT DATABASE</code> to snapshot work, writing data out as JSON or to an <a href="https://duckdb.org/docs/sql/statements/attach">attached SQLite database</a> if you want long-term portable storage.</p>
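<p>As a sketch (directory and file names are illustrative; the SQLite attach needs DuckDB's <code class="language-plaintext highlighter-rouge">sqlite</code> extension, and only scalar columns will survive the trip):</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- snapshot schema plus data (CSV by default) into a directory
EXPORT DATABASE 'logs_export';

-- or copy the scalar columns into a portable SQLite file
-- (needs the sqlite extension: INSTALL sqlite; LOAD sqlite;)
ATTACH 'logs.sqlite' AS portable (TYPE SQLITE);
CREATE TABLE portable.logs AS
  SELECT "@timestamp", level, logger_name, message, duration_ms
  FROM logs;
</code></pre></div></div>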
<h2 id="conclusion">Conclusion</h2>
<p>If you want to quickly dig into structured logs, consider using SQLite or DuckDB over trying to process the JSON by hand – they take zero time to install and are vastly more powerful than anything you can do using <code class="language-plaintext highlighter-rouge">jq</code> or a log viewer.</p>Dynamic Logback and Migrating Logback-Showcase2023-01-04T07:59:15-08:002023-01-04T07:59:15-08:00https://tersesystems.com/blog/2023/01/04/dynamic-logback-and-migrating-logback-showcase<p>I did a couple of small projects over the holidays while drinking <a href="https://thenovicechefblog.com/coquito/">coquito</a>, so here they are.</p>
<p><strong>TL;DR</strong> I made a very smol dynamic logging project, and migrated <a href="https://github.com/tersesystems/terse-logback-showcase">terse-logback-showcase</a> from Heroku to <a href="https://fly.io/">fly.io</a>. It's now <a href="https://terse-logback-showcase.fly.dev/">https://terse-logback-showcase.fly.dev/</a> and <em>it has pictures of cats</em>.</p>
<h2 id="dynamic-logback">Dynamic Logback</h2>
<p>The first is a "simplest possible dynamic logging" project called <a href="https://github.com/wsargent/dynamic-logback">dynamic-logback</a>. This is a project that sets up Logback, and then periodically refreshes log levels from a file. The functionality is in <a href="https://github.com/wsargent/dynamic-logback/blob/main/src/main/scala/DynamicLevel.scala">one file</a> and is less than 100 lines of code. (After writing this, I did a search on "dynamic logback" on Github and discovered <a href="https://github.com/syamantm/dynamic-logback">https://github.com/syamantm/dynamic-logback</a> which is a more complete example.)</p>
<p>Anyway, the point is that dynamic logging is easy! You don't need a lot of infrastructure or to set up a database, you can just set up a timer task and be done with it.</p>
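<p>Here's a minimal sketch of that approach (in the spirit of the project, not a copy of it – the <code class="language-plaintext highlighter-rouge">levels.properties</code> filename and refresh period are illustrative). Call <code class="language-plaintext highlighter-rouge">LevelRefresher.start("levels.properties")</code> at startup and edit the file to change levels:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.io.FileInputStream
import java.util.{Properties, Timer, TimerTask}
import scala.jdk.CollectionConverters._

import org.slf4j.LoggerFactory
import ch.qos.logback.classic.{Level, LoggerContext}

object LevelRefresher {
  private val timer = new Timer("level-refresh", true) // daemon thread

  // Re-read "logger.name=LEVEL" pairs from the file every periodMs.
  def start(path: String, periodMs: Long = 5000L): Unit =
    timer.schedule(new TimerTask { def run(): Unit = refresh(path) }, 0L, periodMs)

  private def refresh(path: String): Unit = {
    val props = new Properties()
    val in = new FileInputStream(path)
    try props.load(in) finally in.close()
    // Logback levels can be set directly -- no configuration reload needed.
    val context = LoggerFactory.getILoggerFactory.asInstanceOf[LoggerContext]
    props.asScala.foreach { case (name, level) =>
      context.getLogger(name).setLevel(Level.toLevel(level))
    }
  }
}
</code></pre></div></div>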
<p>There is a common assumption that changing log levels requires a total configuration refresh. This is not helped by documentation that mentions only <a href="https://logback.qos.ch/manual/configuration.html#autoScan">autoscan</a> as an option, and encourages setting levels directly in <code class="language-plaintext highlighter-rouge">logback.xml</code>. The reality is that reloading only applies to Logback appenders and their supporting components. Log levels can be queried and modified without any heavy lifting. In SQL terms, appenders and filters are the <code class="language-plaintext highlighter-rouge">DDL</code> statements, while querying and changing log levels are the <code class="language-plaintext highlighter-rouge">DQL</code> and <code class="language-plaintext highlighter-rouge">DML</code> statements.</p>
<p>To abstract it from Log4J/Logback APIs, you could add a <code class="language-plaintext highlighter-rouge">LogLevelQuery</code>/<code class="language-plaintext highlighter-rouge">LogLevelResult</code> interaction for querying log levels, and a <code class="language-plaintext highlighter-rouge">LogLevelCommand</code>/<code class="language-plaintext highlighter-rouge">LogLevelEvent</code> for modifying log levels, and that would allow for a CQRS style API. That's probably for another project though.</p>
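<p>As a purely hypothetical sketch of those shapes (again, this isn't built anywhere yet):</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>final case class LogLevelQuery(loggerName: String)
final case class LogLevelResult(loggerName: String, level: Option[String])

final case class LogLevelCommand(loggerName: String, newLevel: String)
final case class LogLevelEvent(loggerName: String, previous: Option[String], current: String)

// the query side is the "DQL", the command side the "DML"
trait LogLevelService {
  def query(q: LogLevelQuery): LogLevelResult
  def execute(c: LogLevelCommand): LogLevelEvent
}
</code></pre></div></div>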
<h2 id="migrating-terse-logback-showcase-from-heroku-to-flyio">Migrating terse-logback-showcase from Heroku to fly.io</h2>
<p>The showcase application used to be on Heroku, but they closed down their free tier. Luckily, <a href="https://fly.io/">fly.io</a> has an option for deploying <a href="https://fly.io/docs/languages-and-frameworks/dockerfile/">docker containers</a>.</p>
<p>The application runs on <a href="https://www.playframework.com/">Play Framework</a>, which is JVM based and has built-in docker deployment via <a href="https://sbt-native-packager.readthedocs.io/en/stable/">sbt-native-packager</a>, so the only thing I needed to do for a <code class="language-plaintext highlighter-rouge">Dockerfile</code> was run <code class="language-plaintext highlighter-rouge">sbt docker:publishLocal</code>. Play itself takes up barely any memory, but the JVM is used to having a bunch of memory available – the <a href="https://fly.io/docs/about/pricing/#free-allowances">free tier</a> is only 256 MB, so it was time to get creative.</p>
<p>I found that the <a href="https://hub.docker.com/_/ibm-semeru-runtimes"><code class="language-plaintext highlighter-rouge">ibm-semeru-runtimes:open-17-jre-focal</code></a> image with <code class="language-plaintext highlighter-rouge">-XX:MaxRAM=70m</code> was enough to get the JVM started, based on a tip from the <a href="https://community.fly.io/t/deployment-of-java-spring-api-using-dockerfile/6708">community forums</a>.</p>
<pre><code class="language-sbt">dockerBaseImage := "ibm-semeru-runtimes:open-17-jre-focal"
Universal / javaOptions += "-J-XX:MaxRAM=70m"
</code></pre>
<p>Then, I had to add the explicit <code class="language-plaintext highlighter-rouge">add-opens</code> required by JDK 17 as <code class="language-plaintext highlighter-rouge">--illegal-access=permit</code> is gone. I don't think there's <a href="https://stackoverflow.com/questions/68867895/in-java-17-how-do-i-avoid-resorting-to-add-opens">any way to avoid this</a> for now.</p>
<pre><code class="language-sbt">Universal / javaOptions ++= Seq(
"-J--add-opens=java.base/java.lang=ALL-UNNAMED",
"-J--add-opens=java.base/sun.security.ssl=ALL-UNNAMED",
"-J--add-opens=java.base/sun.security.util=ALL-UNNAMED"
)
</code></pre>
<p>I had to make sure the internal SQLite database used by <a href="https://github.com/tersesystems/blacklite/">Blacklite</a> was writable:</p>
<pre><code class="language-sbt">dockerChmodType := DockerChmodType.UserGroupWriteExecute
</code></pre>
<p>And I had to disable the PID file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Universal / javaOptions += "-Dpidfile.path=/dev/null"
</code></pre></div></div>
<p>You can see <a href="https://github.com/tersesystems/terse-logback-showcase/blob/master/build.sbt#L25">build.sbt</a> for the actual implementation.</p>
<p>For some reason the <code class="language-plaintext highlighter-rouge">logs</code> directory wouldn't be created by <code class="language-plaintext highlighter-rouge">sbt docker:publishLocal</code>, so I added it to the <code class="language-plaintext highlighter-rouge">dist</code> directory for <a href="https://www.playframework.com/documentation/2.8.x/Deploying#Including-additional-files-in-your-distribution">deployment</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir -p dist/logs
touch dist/logs/.README
</code></pre></div></div>
<p>I didn't have to set up HTTPS or anything special for <a href="https://github.com/tersesystems/terse-logback-showcase/blob/master/fly.toml">fly.toml</a>. I did have to set up some fly secrets:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fly secrets set PLAY_APP_SECRET=<secret>
fly secrets set SENTRY_DSN=$SENTRY_DSN
fly secrets set HONEYCOMB_API_KEY=$HONEYCOMB_API_KEY
</code></pre></div></div>
<p>and then it was just a case of creating docker images and running <code class="language-plaintext highlighter-rouge">fly deploy</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sbt clean stage docker:publishLocal
cd target/docker/stage
fly deploy --app terse-logback-showcase
</code></pre></div></div>
<p>And now <a href="https://terse-logback-showcase.fly.dev/">https://terse-logback-showcase.fly.dev/</a> is up! It has pictures of cats.</p>Using Scalafix to Refactor Logging2022-11-18T15:20:02-08:002022-11-18T15:20:02-08:00https://tersesystems.com/blog/2022/11/18/echopraxia-scalafix<p>Problem: you want to use <a href="https://github.com/tersesystems/echopraxia-plusscala">Echopraxia</a> structured logging in your Scala application, but you already have an existing body of logging statements.
Solution: Get <a href="https://scalacenter.github.io/scalafix">scalafix</a> to rewrite the logging statements for you!</p>
<p>For Echopraxia, logging statements are based around a <a href="https://github.com/tersesystems/echopraxia-plusscala#field-builder">field builder API</a>. Scala has string interpolation, so most of the time logging statements don't have string concatenation. Instead, most logging statements look like this:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="o">(</span><span class="n">s</span><span class="s">"thing=$thing"</span><span class="o">)</span>
</code></pre></div></div>
<p>What we want is to break <code class="language-plaintext highlighter-rouge">thing</code> out into the field builder so it's not using an implicit <code class="language-plaintext highlighter-rouge">toString</code> call, and can be seen as a unique field in JSON:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="o">(</span><span class="s">"thing={}"</span><span class="o">,</span> <span class="n">fb</span> <span class="k">=></span> <span class="n">fb</span><span class="o">.</span><span class="n">value</span><span class="o">(</span><span class="s">"thing"</span><span class="o">,</span> <span class="n">thing</span><span class="o">))</span>
</code></pre></div></div>
<p>or, for multiple arguments:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="o">(</span><span class="s">"some field {} another field {}"</span><span class="o">,</span> <span class="n">fb</span> <span class="k">=></span> <span class="n">fb</span><span class="o">.</span><span class="n">list</span><span class="o">(</span>
<span class="n">fb</span><span class="o">.</span><span class="n">keyValue</span><span class="o">(</span><span class="s">"text"</span><span class="o">,</span> <span class="n">text</span><span class="o">)</span>
<span class="n">fb</span><span class="o">.</span><span class="n">keyValue</span><span class="o">(</span><span class="s">"number"</span><span class="o">,</span> <span class="n">number</span><span class="o">)</span>
<span class="o">))</span>
</code></pre></div></div>
<p>And we want to be able to recover from the case where exceptions are swallowed:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">logger</span><span class="o">.</span><span class="n">error</span><span class="o">(</span><span class="n">s</span><span class="s">"exception=$e"</span><span class="o">)</span> <span class="c1">// very bad, will swallow stack trace
</span></code></pre></div></div>
<p>and render it appropriately, but <em>only</em> for exceptions using <code class="language-plaintext highlighter-rouge">fb.exception</code> instead of <code class="language-plaintext highlighter-rouge">fb.value</code>:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">logger</span><span class="o">.</span><span class="n">error</span><span class="o">(</span><span class="n">s</span><span class="s">"exception={}"</span><span class="o">,</span> <span class="k">_</span><span class="o">.</span><span class="n">exception</span><span class="o">(</span><span class="n">e</span><span class="o">))</span> <span class="c1">// will render exception and stack trace!
</span></code></pre></div></div>
<p>So, this is not a complex refactoring, but it is more complex than IntelliJ IDEA can do out of the box. This is where Scalafix comes in. Scalafix is a refactoring and linting tool that understands how Scala code is structured semantically, using <a href="https://scalameta.org/docs/semanticdb/guide.html">SemanticDB</a>. The SemanticDB support exposes the abstract syntax tree in a Scala program so that it can be recognized and manipulated generally. Previously, the AST was available in Scala macros, but not outside of compilation – you could use Scala macros to autogenerate code, but you couldn't use them to rewrite existing code. As a result of integrating SemanticDB, Scalafix is capable of managing semantic rules like <a href="https://scalacenter.github.io/scalafix/docs/rules/ExplicitResultTypes.html">adding type annotations for explicit result types</a>.</p>
<p>I've been interested in Scalafix for a while, but mostly as an end user, and I hadn't thought about writing Scalafix rules myself. After going through it, I recommend everyone learn how to write Scalafix rules, because they can save you so much time and boilerplate, and are really pretty easy to write.</p>
<p>So how does a Scalafix semantic rule work?</p>
<p>The short version is that Scalafix has an input, and an output. The input is a <a href="https://github.com/scalacenter/scalafix/blob/main/scalafix-core/src/main/scala/scalafix/v1/SemanticDocument.scala">SemanticDocument</a> that contains a tree made up of all the <a href="https://scalameta.org/docs/semanticdb/specification.html">stuff</a> that makes up a program. And for the output, there's a <code class="language-plaintext highlighter-rouge">Patch</code> class that returns… strings.</p>
<p>Seriously, that's <a href="https://scalacenter.github.io/scalafix/docs/developers/patch.html">all there is</a>. You can remove tokens, but if you're patching things into the program, you're adding chunks of text. Initially, I thought that this was very limited, especially after being exposed to the Scala 3 macro <a href="https://docs.scala-lang.org/scala3/guides/macros/macros.html">program as data</a> model, but for refactoring it removes a number of headaches.</p>
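<p>For instance, a patch that adds an import and rewrites a matched call site could look like this (illustrative only – <code class="language-plaintext highlighter-rouge">term</code> stands in for whatever tree node the rule matched):</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import scalafix.v1._
import scala.meta._

// Patches are just textual edits; they compose with `+`.
def examplePatch(term: Term): Patch =
  Patch.addGlobalImport(importer"com.tersesystems.echopraxia.plusscala._") +
    Patch.replaceTree(term, """logger.info("thing={}", fb => fb.value("thing", thing))""")
</code></pre></div></div>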
<p>I recommend going through the <a href="https://scalacenter.github.io/scalafix/docs/developers/tutorial.html">tutorial</a>, but here I'll walk through how I built up the <a href="https://github.com/tersesystems/echopraxia-scalafix#echopraxiarewritetostructured">EchopraxiaRewriteToStructured</a> rule, starting from scratch. The complete source code is <a href="https://github.com/tersesystems/echopraxia-scalafix/blob/main/rules/src/main/scala/fix/EchopraxiaRewriteToStructured.scala">here</a>.</p>
<p>The first thing that needs doing is finding the logger statement. The basic unit of Scalafix is pattern matching, so we can start by printing out some likely programs and seeing what tree nodes look likely.</p>
<p>There's a web-based tool, <a href="https://astexplorer.net/">AST Explorer</a>, which lets you paste programs in, but I prefer printing it out inline as I'm refining the pattern matching, using <code class="language-plaintext highlighter-rouge">foo.structure</code> (you can also reverse it with <code class="language-plaintext highlighter-rouge">foo.syntax</code>):</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">EchopraxiaRewriteToStructured</span> <span class="k">extends</span> <span class="nc">SemanticRule</span><span class="o">(</span><span class="s">"EchopraxiaRewriteToStructured"</span><span class="o">)</span> <span class="o">{</span>
<span class="k">override</span> <span class="k">def</span> <span class="n">fix</span><span class="o">(</span><span class="k">implicit</span> <span class="n">doc</span><span class="k">:</span> <span class="kt">SemanticDocument</span><span class="o">)</span><span class="k">:</span> <span class="kt">Patch</span> <span class="o">=</span> <span class="o">{</span>
<span class="n">doc</span><span class="o">.</span><span class="n">tree</span><span class="o">.</span><span class="n">collect</span> <span class="o">{</span>
<span class="k">case</span> <span class="n">el</span> <span class="k">=></span>
<span class="n">println</span><span class="o">(</span><span class="s">"${el.structure}"</span><span class="o">)</span> <span class="c1">// prints out structure of tree node
</span> <span class="nc">Patch</span><span class="o">.</span><span class="n">empty</span>
<span class="o">}.</span><span class="n">asPatch</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>From this, we can determine that the statement <code class="language-plaintext highlighter-rouge">logger.debug(s"foo")</code> is represented as:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">Apply</span><span class="o">(</span>
<span class="n">fun</span> <span class="k">=</span> <span class="nc">Select</span><span class="o">(</span><span class="n">qual</span> <span class="k">=</span> <span class="nc">Name</span><span class="o">(</span><span class="s">"logger"</span><span class="o">),</span> <span class="n">name</span> <span class="k">=</span> <span class="nc">Name</span><span class="o">(</span><span class="s">"info"</span><span class="o">)),</span>
<span class="n">args</span> <span class="k">=</span> <span class="nc">List</span><span class="o">(</span><span class="nc">Interpolate</span><span class="o">(</span><span class="n">name</span> <span class="k">=</span> <span class="s">"s"</span><span class="o">,</span> <span class="n">parts</span> <span class="k">=</span> <span class="nc">List</span><span class="o">(</span><span class="s">"foo"</span><span class="o">)))</span>
<span class="o">)</span>
</code></pre></div></div>
<p>This gets at the name, but we also want to check that we're not just latching on to anything called <code class="language-plaintext highlighter-rouge">logger</code> – it also has to be of type <code class="language-plaintext highlighter-rouge">com.tersesystems.echopraxia.plusscala.Logger</code>.</p>
<p>To do this, we have to get the qualifier's symbol information out, and then pattern match on the signature. We can do this in Scalafix by calling <code class="language-plaintext highlighter-rouge">qual.symbol</code> to get the symbol out, and then pulling the <a href="https://scalacenter.github.io/scalafix/docs/developers/symbol-information.html">SymbolInformation</a> to get at the signature. Once we have the signature, we can use <a href="https://scalacenter.github.io/scalafix/docs/developers/symbol-matcher.html">SymbolMatcher</a> to check the <code class="language-plaintext highlighter-rouge">Logger</code> symbol against the <code class="language-plaintext highlighter-rouge">TypeRef</code>.</p>
<p>Long story short, it looks like this:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">EchopraxiaRewriteToStructured</span> <span class="k">extends</span> <span class="nc">SemanticRule</span><span class="o">(</span><span class="s">"EchopraxiaRewriteToStructured"</span><span class="o">)</span> <span class="o">{</span>
<span class="k">val</span> <span class="n">loggerClass</span> <span class="k">=</span> <span class="s">"com.tersesystems.echopraxia.plusscala.Logger"</span>
<span class="k">override</span> <span class="k">def</span> <span class="n">fix</span><span class="o">(</span><span class="k">implicit</span> <span class="n">doc</span><span class="k">:</span> <span class="kt">SemanticDocument</span><span class="o">)</span><span class="k">:</span> <span class="kt">Patch</span> <span class="o">=</span> <span class="o">{</span>
<span class="n">doc</span><span class="o">.</span><span class="n">tree</span><span class="o">.</span><span class="n">collect</span> <span class="o">{</span>
<span class="k">case</span> <span class="n">logger</span> <span class="k">@</span> <span class="nc">Term</span><span class="o">.</span><span class="nc">Apply</span><span class="o">(</span>
<span class="nc">Term</span><span class="o">.</span><span class="nc">Select</span><span class="o">(</span><span class="n">loggerName</span><span class="o">,</span> <span class="n">methodName</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="nc">Term</span><span class="o">.</span><span class="nc">Interpolate</span><span class="o">(</span><span class="nc">Term</span><span class="o">.</span><span class="nc">Name</span><span class="o">(</span><span class="s">"s"</span><span class="o">),</span> <span class="n">parts</span><span class="o">,</span> <span class="n">args</span><span class="o">))</span>
<span class="o">)</span> <span class="k">if</span> <span class="n">matchesType</span><span class="o">(</span><span class="n">loggerName</span><span class="o">)</span> <span class="k">=></span>
<span class="nc">Patch</span><span class="o">.</span><span class="n">empty</span>
<span class="o">}.</span><span class="n">asPatch</span>
<span class="o">}</span>
<span class="k">private</span> <span class="k">def</span> <span class="n">matchesType</span><span class="o">(</span>
<span class="n">qual</span><span class="k">:</span> <span class="kt">Term</span>
<span class="o">)(</span><span class="k">implicit</span> <span class="n">doc</span><span class="k">:</span> <span class="kt">SemanticDocument</span><span class="o">)</span><span class="k">:</span> <span class="kt">Boolean</span> <span class="o">=</span> <span class="o">{</span>
<span class="k">val</span> <span class="n">loggerSymbolMatcher</span> <span class="k">=</span> <span class="nc">SymbolMatcher</span><span class="o">.</span><span class="n">normalized</span><span class="o">(</span><span class="n">loggerClass</span><span class="o">)</span>
<span class="k">val</span> <span class="n">info</span><span class="k">:</span> <span class="kt">SymbolInformation</span> <span class="o">=</span> <span class="n">qual</span><span class="o">.</span><span class="n">symbol</span><span class="o">.</span><span class="n">info</span><span class="o">.</span><span class="n">get</span>
<span class="n">info</span><span class="o">.</span><span class="n">signature</span> <span class="k">match</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">MethodSignature</span><span class="o">(</span><span class="k">_</span><span class="o">,</span> <span class="k">_</span><span class="o">,</span> <span class="nc">TypeRef</span><span class="o">(</span><span class="k">_</span><span class="o">,</span> <span class="n">symbol</span><span class="o">,</span> <span class="k">_</span><span class="o">))</span> <span class="k">=></span>
<span class="n">loggerSymbolMatcher</span><span class="o">.</span><span class="n">matches</span><span class="o">(</span><span class="n">symbol</span><span class="o">)</span>
<span class="k">case</span> <span class="n">other</span> <span class="k">=></span>
<span class="kc">false</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>Now that we have a relevant logging statement, it's time to rewrite it. We can do this using <code class="language-plaintext highlighter-rouge">Patch.replaceTree</code>, which will replace the <code class="language-plaintext highlighter-rouge">args</code> inside the <code class="language-plaintext highlighter-rouge">Apply</code> node.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">Patch</span><span class="o">.</span><span class="n">replaceTree</span><span class="o">(</span><span class="n">logger</span><span class="o">,</span> <span class="n">rewrite</span><span class="o">(</span><span class="n">loggerName</span><span class="o">,</span> <span class="n">methodName</span><span class="o">,</span> <span class="n">parts</span><span class="o">,</span> <span class="n">args</span><span class="o">))</span>
</code></pre></div></div>
<p>Rewriting the code is… a string. The <code class="language-plaintext highlighter-rouge">parts</code> are always <code class="language-plaintext highlighter-rouge">Lit.String</code>, so calling <code class="language-plaintext highlighter-rouge">lit.value.toString</code> and sticking "{}" in between is the easiest way to parameterize them. Then, it's time to serve up the rewritten logging statement as <code class="language-plaintext highlighter-rouge">s"""$loggerTerm.$methodTerm("$template", fb => $body)"""</code>, and account for some edge cases:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">EchopraxiaRewriteToStructured</span> <span class="k">extends</span> <span class="nc">SemanticRule</span><span class="o">(</span><span class="s">"EchopraxiaRewriteToStructured"</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// ...
</span> <span class="k">private</span> <span class="k">def</span> <span class="n">rewrite</span><span class="o">(</span>
<span class="n">loggerTerm</span><span class="k">:</span> <span class="kt">Term</span><span class="o">,</span>
<span class="n">methodTerm</span><span class="k">:</span> <span class="kt">Term</span><span class="o">,</span>
<span class="n">parts</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">Lit</span><span class="o">],</span>
<span class="n">args</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">Term</span><span class="o">]</span>
<span class="o">)(</span><span class="k">implicit</span> <span class="n">doc</span><span class="k">:</span> <span class="kt">SemanticDocument</span><span class="o">)</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="o">{</span>
<span class="k">if</span> <span class="o">(</span><span class="n">args</span><span class="o">.</span><span class="n">isEmpty</span><span class="o">)</span> <span class="o">{</span>
<span class="k">val</span> <span class="n">template</span> <span class="k">=</span> <span class="n">parts</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">value</span><span class="o">.</span><span class="n">toString</span><span class="o">).</span><span class="n">mkString</span><span class="o">(</span><span class="s">"{}"</span><span class="o">)</span>
<span class="n">s</span><span class="s">"""$loggerTerm.$methodTerm("$template")"""</span>
<span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
<span class="k">val</span> <span class="n">template</span> <span class="k">=</span> <span class="n">parts</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">value</span><span class="o">.</span><span class="n">toString</span><span class="o">).</span><span class="n">mkString</span><span class="o">(</span><span class="s">"{}"</span><span class="o">)</span>
<span class="k">val</span> <span class="n">values</span> <span class="k">=</span> <span class="n">args</span><span class="o">.</span><span class="n">map</span> <span class="o">{</span>
<span class="k">case</span> <span class="n">arg</span><span class="k">:</span> <span class="kt">Term.Name</span> <span class="o">=></span>
<span class="k">if</span> <span class="o">(</span><span class="n">isThrowable</span><span class="o">(</span><span class="n">arg</span><span class="o">.</span><span class="n">symbol</span><span class="o">.</span><span class="n">info</span><span class="o">.</span><span class="n">get</span><span class="o">.</span><span class="n">signature</span><span class="o">))</span> <span class="o">{</span>
<span class="n">s</span><span class="s">"""fb.exception($arg)"""</span>
<span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
<span class="n">s</span><span class="s">"""fb.$fieldBuilderMethod("$arg", $arg)"""</span>
<span class="o">}</span>
<span class="k">case</span> <span class="n">other</span> <span class="k">=></span>
<span class="c1">// XXX I don't think this is possible?
</span> <span class="n">s</span><span class="s">"""fb.$fieldBuilderMethod("$other", $other)"""</span>
<span class="o">}</span>
<span class="k">val</span> <span class="n">body</span> <span class="k">=</span>
<span class="k">if</span> <span class="o">(</span><span class="n">values</span><span class="o">.</span><span class="n">size</span> <span class="o">==</span> <span class="mi">1</span><span class="o">)</span> <span class="n">values</span><span class="o">.</span><span class="n">head</span>
<span class="k">else</span> <span class="n">s</span><span class="s">"""fb.list(${values.mkString(", ")})"""</span>
<span class="n">s</span><span class="s">"""$loggerTerm.$methodTerm("$template", fb => $body)"""</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>So hang on a sec… how do we know an argument is a throwable?</p>
<p>This is where it gets really interesting, because this is where we start running into the limits of Scalafix. Scalafix can look at the structure of a type, but does not expose the <a href="https://scalacenter.github.io/scalafix/docs/developers/semantic-type.html#test-for-subtyping">subtyping information</a> of a type. This is a problem, because exceptions rely heavily on subtyping to work.</p>
<p>However, there is a hack that we can try. From poking at the <a href="https://github.com/scalacenter/scalafix/issues/531">issues</a>, we can try Java runtime reflection to load the class, and see if it's assignable from <code class="language-plaintext highlighter-rouge">Throwable</code>. I don't love the manual hacking on the symbol to kludge it into a fully qualified class name, but it'll work.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">EchopraxiaRewriteToStructured</span> <span class="k">extends</span> <span class="nc">SemanticRule</span><span class="o">(</span><span class="s">"EchopraxiaRewriteToStructured"</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// ...
</span> <span class="k">def</span> <span class="n">isThrowable</span><span class="o">(</span><span class="n">signature</span><span class="k">:</span> <span class="kt">Signature</span><span class="o">)</span><span class="k">:</span> <span class="kt">Boolean</span> <span class="o">=</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">toFqn</span><span class="o">(</span><span class="n">symbol</span><span class="k">:</span> <span class="kt">Symbol</span><span class="o">)</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="n">symbol</span><span class="o">.</span><span class="n">value</span>
<span class="o">.</span><span class="n">replaceAll</span><span class="o">(</span><span class="s">"/"</span><span class="o">,</span> <span class="s">"."</span><span class="o">)</span>
<span class="o">.</span><span class="n">replaceAll</span><span class="o">(</span><span class="s">"\\.$"</span><span class="o">,</span> <span class="s">"\\$"</span><span class="o">)</span>
<span class="o">.</span><span class="n">stripSuffix</span><span class="o">(</span><span class="s">"#"</span><span class="o">)</span>
<span class="o">.</span><span class="n">stripPrefix</span><span class="o">(</span><span class="s">"_root_."</span><span class="o">)</span>
<span class="n">signature</span> <span class="k">match</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">ValueSignature</span><span class="o">(</span><span class="nc">TypeRef</span><span class="o">(</span><span class="k">_</span><span class="o">,</span> <span class="n">symbol</span><span class="o">,</span> <span class="k">_</span><span class="o">))</span> <span class="k">=></span>
<span class="k">val</span> <span class="n">cl</span> <span class="k">=</span> <span class="k">this</span><span class="o">.</span><span class="n">getClass</span><span class="o">.</span><span class="n">getClassLoader</span>
<span class="k">try</span> <span class="o">{</span>
<span class="n">classOf</span><span class="o">[</span><span class="kt">Throwable</span><span class="o">].</span><span class="n">isAssignableFrom</span><span class="o">(</span><span class="n">cl</span><span class="o">.</span><span class="n">loadClass</span><span class="o">(</span><span class="n">toFqn</span><span class="o">(</span><span class="n">symbol</span><span class="o">)))</span>
<span class="o">}</span> <span class="k">catch</span> <span class="o">{</span>
<span class="k">case</span> <span class="n">e</span><span class="k">:</span> <span class="kt">Exception</span> <span class="o">=></span>
<span class="kc">false</span>
<span class="o">}</span>
<span class="k">case</span> <span class="k">_</span> <span class="k">=></span>
<span class="kc">false</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>Finally, let's add some configuration so that we can account for <a href="https://github.com/tersesystems/echopraxia-plusscala#custom-logger">custom loggers</a> and <a href="https://github.com/tersesystems/echopraxia-plusscala#field-builder">custom field builder methods</a>. This is <a href="https://scalacenter.github.io/scalafix/docs/developers/tutorial.html#use-withconfiguration-to-make-a-rule-configurable">very simple</a>: plop down a <code class="language-plaintext highlighter-rouge">Config</code> and a <code class="language-plaintext highlighter-rouge">withConfiguration</code> method and we're pretty much done:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="nn">metaconfig.</span><span class="o">{</span><span class="nc">ConfDecoder</span><span class="o">,</span> <span class="nc">Configured</span><span class="o">}</span>
<span class="k">import</span> <span class="nn">metaconfig.generic.Surface</span>
<span class="c1">// ...
</span>
<span class="k">class</span> <span class="nc">EchopraxiaRewriteToStructured</span><span class="o">(</span>
<span class="n">config</span><span class="k">:</span> <span class="kt">EchopraxiaRewriteToStructured.Config</span>
<span class="o">)</span> <span class="k">extends</span> <span class="nc">SemanticRule</span><span class="o">(</span><span class="s">"EchopraxiaRewriteToStructured"</span><span class="o">)</span> <span class="o">{</span>
<span class="k">private</span> <span class="k">val</span> <span class="n">loggerClass</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="n">config</span><span class="o">.</span><span class="n">loggerClass</span>
<span class="k">private</span> <span class="k">val</span> <span class="n">fieldBuilderMethod</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="n">config</span><span class="o">.</span><span class="n">fieldBuilderMethod</span>
<span class="k">def</span> <span class="k">this</span><span class="o">()</span> <span class="k">=</span> <span class="k">this</span><span class="o">(</span><span class="nc">EchopraxiaRewriteToStructured</span><span class="o">.</span><span class="nc">Config</span><span class="o">())</span>
<span class="k">override</span> <span class="k">def</span> <span class="n">withConfiguration</span><span class="o">(</span><span class="n">config</span><span class="k">:</span> <span class="kt">Configuration</span><span class="o">)</span><span class="k">:</span> <span class="kt">Configured</span><span class="o">[</span><span class="kt">Rule</span><span class="o">]</span> <span class="k">=</span>
<span class="n">config</span><span class="o">.</span><span class="n">conf</span>
<span class="o">.</span><span class="n">getOrElse</span><span class="o">(</span><span class="s">"EchopraxiaRewriteToStructured"</span><span class="o">)(</span><span class="k">this</span><span class="o">.</span><span class="n">config</span><span class="o">)</span>
<span class="o">.</span><span class="n">map</span> <span class="o">{</span> <span class="n">newConfig</span> <span class="k">=></span> <span class="k">new</span> <span class="nc">EchopraxiaRewriteToStructured</span><span class="o">(</span><span class="n">newConfig</span><span class="o">)</span> <span class="o">}</span>
<span class="c1">// ...
</span><span class="o">}</span>
<span class="k">object</span> <span class="nc">EchopraxiaRewriteToStructured</span> <span class="o">{</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">Config</span><span class="o">(</span>
<span class="n">loggerClass</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="s">"com.tersesystems.echopraxia.plusscala.Logger"</span><span class="o">,</span>
<span class="n">fieldBuilderMethod</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="s">"value"</span>
<span class="o">)</span>
<span class="k">object</span> <span class="nc">Config</span> <span class="o">{</span>
<span class="k">val</span> <span class="n">default</span> <span class="k">=</span> <span class="nc">Config</span><span class="o">()</span>
<span class="k">implicit</span> <span class="k">val</span> <span class="n">surface</span><span class="k">:</span> <span class="kt">Surface</span><span class="o">[</span><span class="kt">Config</span><span class="o">]</span> <span class="k">=</span>
<span class="n">metaconfig</span><span class="o">.</span><span class="n">generic</span><span class="o">.</span><span class="n">deriveSurface</span><span class="o">[</span><span class="kt">Config</span><span class="o">]</span>
<span class="k">implicit</span> <span class="k">val</span> <span class="n">decoder</span><span class="k">:</span> <span class="kt">ConfDecoder</span><span class="o">[</span><span class="kt">Config</span><span class="o">]</span> <span class="k">=</span>
<span class="n">metaconfig</span><span class="o">.</span><span class="n">generic</span><span class="o">.</span><span class="n">deriveDecoder</span><span class="o">(</span><span class="n">default</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
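<p>With this in place, the rule can be enabled and tuned from <code class="language-plaintext highlighter-rouge">.scalafix.conf</code>. As a sketch – the logger class and field builder method below are hypothetical stand-ins for your own:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rules = [
  EchopraxiaRewriteToStructured
]

EchopraxiaRewriteToStructured.loggerClass = "com.example.MyLogger"
EchopraxiaRewriteToStructured.fieldBuilderMethod = "keyValue"
</code></pre></div></div>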
<p>And that's it! I hope this shows how simple and straightforward refactoring in Scalafix can be. For extra credit, I also have <a href="https://github.com/tersesystems/echopraxia-scalafix#echopraxiawrapmethodwithlogger">EchopraxiaWrapMethodWithLogger</a> that will wrap a method in a <a href="https://github.com/tersesystems/echopraxia-plusscala#trace-and-flow-loggers">flow or trace logger</a>.</p>Latency and Throughput With Logback2022-10-16T19:53:50-07:002022-10-16T19:53:50-07:00https://tersesystems.com/blog/2022/10/16/latency-and-throughput-with-logback<p>I've been working with Logback for a while now, and one of the things that stands out is how people will talk about "fast" or "performant" logging, with the theory that picking the right encoder or the right appender will make things work. It's not wrong, but it's not exactly right either.</p>
<p>So, this blog post discusses latency and throughput in Logback, along with some fun non-obvious things that can cause production issues if you're not careful. And it has pictures!</p>
<h2 id="latency">Latency</h2>
<p>Latency is defined as the amount of time required to complete a single operation.</p>
<p>Latency is a surprisingly slippery concept, because as soon as you start aggregating latency times, you can end up with visualizations that can <a href="https://igor.io/latency/">omit or obscure parts of the picture</a>. Latency can be reported as averages, percentiles, histograms (useful for "long tail" latency), or heatmaps.</p>
<p>Because we're talking about conceptual latency here, we'll talk about the "average" latency between making a logging statement and that statement being logged.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@BenchmarkMode</span><span class="o">(</span><span class="nc">Array</span><span class="o">(</span><span class="nc">Mode</span><span class="o">.</span><span class="nc">AverageTime</span><span class="o">))</span>
<span class="nd">@OutputTimeUnit</span><span class="o">(</span><span class="nc">TimeUnit</span><span class="o">.</span><span class="nc">NANOSECONDS</span><span class="o">)</span>
<span class="k">class</span> <span class="nc">SLF4JBenchmark</span> <span class="o">{</span>
<span class="k">import</span> <span class="nn">SLF4JBenchmark._</span>
<span class="nd">@Benchmark</span>
<span class="k">def</span> <span class="n">boundedDebugWithTemplate</span><span class="o">()</span><span class="k">:</span> <span class="kt">Unit</span> <span class="o">=</span>
<span class="k">if</span> <span class="o">(</span><span class="n">logger</span><span class="o">.</span><span class="n">isDebugEnabled</span><span class="o">)</span> <span class="o">{</span>
<span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="o">(</span><span class="s">"hello world, {}"</span><span class="o">,</span> <span class="n">longAdder</span><span class="o">.</span><span class="n">incrementAndGet</span><span class="o">())</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>And using an encoder and appender like this:</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><configuration></span>
<span class="nt"><appender</span> <span class="na">name=</span><span class="s">"FILE"</span> <span class="na">class=</span><span class="s">"ch.qos.logback.core.FileAppender"</span><span class="nt">></span>
<span class="nt"><file></span>testFile.log<span class="nt"></file></span>
<span class="nt"><append></span>false<span class="nt"></append></span>
<span class="nt"><immediateFlush></span>false<span class="nt"></immediateFlush></span>
<span class="nt"><encoder></span>
<span class="nt"><pattern></span>%-4relative [%thread] %-5level %logger{35} - %msg%n<span class="nt"></pattern></span>
<span class="nt"></encoder></span>
<span class="nt"></appender></span>
<span class="nt"></configuration></span>
</code></pre></div></div>
<p>Say that <code class="language-plaintext highlighter-rouge">boundedDebugWithTemplate</code> takes roughly 871 nanoseconds as measured by JMH. We can visualize this as a straight line, from the time of logging to the time that bytes were appended to a file.</p>
<p><img src="/images/latency/first.png" alt="first" /></p>
<p>But logging is made up of several operations. For example, if we swap out the file appender for a no-op appender that does nothing but create the logging event and a message based off the template, we can see that the same operation takes only 33 nanoseconds. If we set the logger to INFO level, we can see the <code class="language-plaintext highlighter-rouge">isDebugEnabled</code> call takes only 1.6 nanoseconds. So in reality, what we're looking at is more like this:</p>
<p><img src="/images/latency/second.png" alt="second" /></p>
<p>Because the FileAppender is blocking and Logback runs everything in the calling thread, this means that turning on debugging in an operation will add ~871 ns to every call.</p>
<p><img src="/images/latency/third.png" alt="third" /></p>
<p>This also compounds for <em>every</em> blocking appender. The initial costs of putting together the logging event happen once, but if you have a STDOUT appender, a file appender, and a network appender, they all encode the logging event using distinct encoders, and render sequentially on the same thread.</p>
<p><img src="/images/latency/fourth.png" alt="fourth" /></p>
<p>In practical terms – the more appenders you add, the slower your code gets when you log.</p>
<p>It's important to note at this point how tiny a latency of 871 nanoseconds is – for comparison, instantiating any Java object costs around 20 nanoseconds. For most operations, logging is not the bottleneck compared to the costs of the operation itself – unnecessary database queries, blocking on network calls, and lack of caching are still the low hanging fruit.</p>
<p>However, it is still a cost. Moreover, looking at the average latency doesn't tell you about the outliers – the <a href="https://brooker.co.za/blog/2021/04/19/latency.html">"long tail" of latency</a>. If an operation blocks in any way, then that cost will be passed on to the application. And blocking can happen in the most insidious of ways.</p>
<p>The obvious source of blocking is when a logging event or message includes a blocking call. For example, calling <code class="language-plaintext highlighter-rouge">UUID.randomUUID()</code> blocks because of the internal <a href="https://braveo.blogspot.com/2013/05/uuidrandomuuid-is-slow.html">lock</a>, or calling <code class="language-plaintext highlighter-rouge">toString()</code> on a collection that contains <code class="language-plaintext highlighter-rouge">java.net.URL</code> objects, causing hundreds of <a href="https://michaelscharf.blogspot.com/2006/11/javaneturlequals-and-hashcode-make.html">DNS resolutions</a>. This can block an HTTP request for multiple seconds, and it won't be immediately obvious from looking at the logs.</p>
<p><img src="/images/latency/fourth-block1.png" alt="fourth1" /></p>
<p>Blocking is not solely an input problem, though – it can also come from Logback itself.</p>
<p>Blocking in Logback can come from appenders. Anything extending <a href="https://logback.qos.ch/manual/appenders.html#AppenderBase">AppenderBase</a> uses a <code class="language-plaintext highlighter-rouge">synchronized</code> lock that ensures only one thread is appending. While it looks like blocking in appenders is a small consistent cost, this is not always the case. For example, a rolling file appender can block on rollover. <a href="https://jira.qos.ch/browse/LOGBACK-267">LOGBACK-267</a> means that if you use <a href="https://logback.qos.ch/manual/appenders.html#FixedWindowRollingPolicy">FixedWindowRollingPolicy</a> and enable compression by specifying a <code class="language-plaintext highlighter-rouge">.gz</code> suffix, then compressing multi-gigabyte files can <a href="https://medium.com/groupon-eng/debugging-tuning-logback-bottleneck-in-a-high-throughput-java-application-5161dd43cc6d">stall the appender</a>, blocking all logging for 55 to 69 seconds. The underlying cause is that <code class="language-plaintext highlighter-rouge">FixedWindowRollingPolicy.java</code> calls <a href="https://github.com/qos-ch/logback/blob/master/logback-core/src/main/java/ch/qos/logback/core/rolling/FixedWindowRollingPolicy.java#L154"><code class="language-plaintext highlighter-rouge">compressor.compress</code></a>, as opposed to <code class="language-plaintext highlighter-rouge">TimeBasedRollingPolicy.java</code> which uses <a href="https://github.com/qos-ch/logback/blob/master/logback-core/src/main/java/ch/qos/logback/core/rolling/TimeBasedRollingPolicy.java#L178"><code class="language-plaintext highlighter-rouge">compressor.asyncCompress</code></a>.</p>
<p><img src="/images/latency/fourth-block2.png" alt="fourth2" /></p>
<p>You might think the problem of blocking can be easily avoided, but it's not quite that simple. Blocking can happen at the kernel, even when <a href="https://www.evanjones.ca/jvm-mmap-pause.html">writing to memory mapped files</a>, as the operating system manages writes. This causes <a href="https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic">issues</a>. Filesystem blocking can occur even in software RAID or a network backed VFS. In short, when files were created this made <a href="https://danluu.com/deconstruct-files/">lots of people angry</a>, and was <a href="https://rachelbythebay.com/w/2020/08/11/files/">widely regarded as a bad move</a>. I suspect that the <a href="https://github.com/logfellow/logstash-logback-encoder#tcp-appenders">TCP appenders</a> and TCP network stack work differently, but then the assumption is that <a href="https://queue.acm.org/detail.cfm?id=2655736">the network is reliable</a>.</p>
<h2 id="asynchronous-logging">Asynchronous Logging</h2>
<p>There is a way to avoid unanticipated blocking: we can log asynchronously. Asynchronous logging is a trend, with <a href="https://aws.amazon.com/blogs/developer/asynchronous-logging-corretto-17/">asynchronous GC logging in Corretto 17</a> coming out for the JDK itself.</p>
<p>There are several ways to implement asynchronous logging. <a href="https://github.com/tersesystems/echopraxia">Echopraxia</a> can address it at invocation with an <a href="https://github.com/tersesystems/echopraxia#asynchronous-logging">asynchronous logger</a>, deferring argument construction and condition evaluations and allowing <a href="https://github.com/tersesystems/echopraxia#managing-caller-info">caller information for free</a>, at the cost of a more complex method interface. Alternatively, asynchronous logging can be implemented in an appender, although this does mean that argument and <code class="language-plaintext highlighter-rouge">LoggingEvent</code> construction happen on the calling thread.</p>
<p>Logback does have an out-of-the-box async appender, but the <code class="language-plaintext highlighter-rouge">LoggingEventAsyncDisruptorAppender</code> from <a href="https://github.com/logfellow/logstash-logback-encoder#async-appenders">logstash-logback-encoder</a> is much richer feature-wise: by default it drops <em>all</em> events when full, it can warn when full, and it offers more customization of ring buffer size and behavior. From a performance perspective I'd say it's a wash for most people – note that the <a href="https://logback.qos.ch/performance.html">logback performance page</a> discusses <em>throughput</em>, so it's not an apples-to-apples comparison.</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><appender</span> <span class="na">name=</span><span class="s">"async"</span> <span class="na">class=</span><span class="s">"net.logstash.logback.appender.LoggingEventAsyncDisruptorAppender"</span><span class="nt">></span>
<span class="nt"><appender</span> <span class="na">class=</span><span class="s">"ch.qos.logback.core.rolling.RollingFileAppender"</span><span class="nt">></span>
...
<span class="nt"></appender></span>
<span class="nt"></appender></span>
</code></pre></div></div>
<p>An async appender will accept a <code class="language-plaintext highlighter-rouge">LoggingEvent</code>, and will write it to an in-memory ring buffer that a dedicated thread drains to the enclosed appenders.</p>
<p>On average, the mean latency for a disruptor is ~50 nanoseconds, up to a <a href="https://github.com/wsargent/slf4j-benchmark#debug-enabled-with-async-appender">worst case scenario of 420 ns</a> when the queue is fully loaded. This means that the rendering thread only incurs the latency cost of 33 ns (eval + logback event) + 50 ns (enqueuing), but does not incur the latency cost of appending to file. An asynchronous boundary exists between the thread running the operation, and the thread that picks up the logger and writes to the appenders.</p>
<p><img src="/images/latency/fifth.png" alt="fifth" /></p>
<p>Using multiple threads enables logging to be concurrent, running alongside operations without interfering with them. There is a difference between concurrency and parallelism: if there's only one core available, then the two threads may run interleaved, and there may be a small delay in writing the logs. If there are multiple cores available though, then typically the thread will be writing logs in parallel.</p>
<p>There are some special cases / catches to asynchronous logging.</p>
<p>The first catch is not adding a <a href="https://logback.qos.ch/manual/configuration.html#stopContext">shutdown hook</a>; you need to let the ring buffer <a href="https://github.com/logfellow/logstash-logback-encoder#graceful-shutdown">shut down gracefully</a>, and if Logback shuts down immediately you will miss events that could be critical.</p>
<p>The second catch is using unnecessary async appenders, each wrapping a single appender. This can be a waste of threads; you only need one to create an asynchronous boundary. If you do not anticipate significant load and your appenders are fast, my recommendation is to define a single async appender at the root, before you do anything else.</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><configuration></span>
<span class="nt"><shutdownHook</span> <span class="na">class=</span><span class="s">"ch.qos.logback.core.hook.DelayingShutdownHook"</span><span class="nt">></span>
<span class="nt"><delay></span>150<span class="nt"></delay></span>
<span class="nt"></shutdownHook></span>
<span class="nt"><root</span> <span class="na">level=</span><span class="s">"all"</span><span class="nt">></span>
<span class="nt"><appender</span> <span class="na">class=</span><span class="s">"net.logstash.logback.appender.LoggingEventAsyncDisruptorAppender"</span><span class="nt">></span>
<span class="nt"><appender</span> <span class="na">name=</span><span class="s">"FILE"</span><span class="nt">></span>...<span class="nt"></appender></span>
<span class="nt"><appender</span> <span class="na">name=</span><span class="s">"STDOUT"</span><span class="nt">></span>...<span class="nt"></appender></span>
<span class="nt"><appender</span> <span class="na">name=</span><span class="s">"TCP"</span><span class="nt">></span>...<span class="nt"></appender></span>
<span class="nt"></appender></span>
<span class="nt"></root></span>
<span class="nt"></configuration></span>
</code></pre></div></div>
<p>The third catch is what happens to asynchronous logging when there is significant load. Ring buffers can fill up when the underlying appenders are slow and do not drain the buffer fast enough, and a full ring buffer can result in <a href="https://github.com/logfellow/logstash-logback-encoder#ringbuffer-full">dropped events</a>.</p>
<p>Therefore, if you do have an appender that's awkward (and you can't fix it), you should configure a distinct appender for it and configure it so it doesn't jam up the others.</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><configuration></span>
<span class="nt"><root</span> <span class="na">level=</span><span class="s">"all"</span><span class="nt">></span>
<span class="nt"><appender</span> <span class="na">class=</span><span class="s">"net.logstash.logback.appender.LoggingEventAsyncDisruptorAppender"</span><span class="nt">></span>
<span class="nt"><appender</span> <span class="na">name=</span><span class="s">"FILE"</span><span class="nt">></span>...<span class="nt"></appender></span>
<span class="nt"><appender</span> <span class="na">name=</span><span class="s">"STDOUT"</span><span class="nt">></span>...<span class="nt"></appender></span>
<span class="nt"><appender</span> <span class="na">name=</span><span class="s">"TCP"</span><span class="nt">></span>...<span class="nt"></appender></span>
<span class="nt"></appender></span>
<span class="nt"><appender</span> <span class="na">class=</span><span class="s">"net.logstash.logback.appender.LoggingEventAsyncDisruptorAppender"</span><span class="nt">></span>
<span class="nt"><ringBufferSize></span>[some large multiple of 2]<span class="nt"></ringBufferSize></span>
<span class="nt"><appender</span> <span class="na">class=</span><span class="s">"RollingFileAppender"</span><span class="nt">></span>
<span class="c"><!-- trigger LOGBACK-267 --></span>
<span class="nt"><rollingPolicy</span> <span class="na">class=</span><span class="s">"FixedWindowRollingPolicy"</span><span class="nt">></span>
<span class="nt"><fileNamePattern></span>backup%i.log.gz<span class="nt"></fileNamePattern></span>
...
<span class="nt"></rollingPolicy></span>
<span class="nt"><triggeringPolicy></span>
<span class="nt"><maxFileSize></span>4GB<span class="nt"></maxFileSize></span>
<span class="nt"></triggeringPolicy></span>
<span class="nt"><encoder></span>...<span class="nt"></encoder></span>
<span class="nt"></appender></span>
<span class="nt"></appender></span>
<span class="nt"></root></span>
<span class="nt"></configuration></span>
</code></pre></div></div>
<p>You may lose some events if it spills over, but that's better than stalling your application.</p>
<p>You can also add an <a href="https://github.com/logfellow/logstash-logback-encoder#appender-listeners">appender listener</a> to notify you of any dropped messages. The <code class="language-plaintext highlighter-rouge">FailureSummaryLoggingAppenderListener</code> implementation will log a summary of any dropped messages, but it does have the drawback that the listener logs the summary to the same appender that is dropping messages – so the summary itself can be lost. You are better off writing your own implementation of the <a href="https://github.com/logfellow/logstash-logback-encoder/blob/main/src/main/java/net/logstash/logback/appender/listener/AppenderListener.java">interface</a>, and using it to send dropped-message counts to your metrics or error reporting system in a scheduled runnable, using the <code class="language-plaintext highlighter-rouge">ScheduledExecutorService</code> from Logback's <code class="language-plaintext highlighter-rouge">Context</code>.</p>
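<p>Here's a rough sketch of the shape such a listener could take. The method names come from my reading of the <code class="language-plaintext highlighter-rouge">AppenderListener</code> interface linked above, so double-check them against the source, and treat the reporting side as a stub:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.Appender;
import net.logstash.logback.appender.listener.AppenderListener;

public class DroppedEventReporter implements AppenderListener<ILoggingEvent> {
  private final LongAdder dropped = new LongAdder();

  @Override
  public void appenderStarted(Appender<ILoggingEvent> appender) {
    // Report out-of-band on Logback's scheduled executor service, so the
    // summary does not depend on the appender that is dropping events.
    appender.getContext().getScheduledExecutorService().scheduleAtFixedRate(
      () -> report(dropped.sumThenReset()), 1, 1, TimeUnit.MINUTES);
  }

  @Override
  public void eventAppendFailed(Appender<ILoggingEvent> appender, ILoggingEvent event, Throwable reason) {
    // Append failures are reported here; an event dropped from a full
    // ring buffer shows up as a failed append.
    dropped.increment();
  }

  private void report(long count) {
    if (count > 0) {
      // send the count to your metrics or error reporting system here
    }
  }
}
</code></pre></div></div>

<p>Per the appender listeners documentation linked above, the listener is then registered on the async appender with a <code class="language-plaintext highlighter-rouge"><listener class="..."/></code> element in the Logback configuration.</p>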
<h2 id="throughput">Throughput</h2>
<p>The throughput of an application is how many operations it can process over a period of time. This is typically the metric we care about for batch operations and things that happen in the background.</p>
<p>Throughput is a tricky quantity to measure, because doing operations in bulk improves throughput, but can cause applications to seem unresponsive, and vice versa. For example, writing to STDOUT is "fast" because of I/O buffering, but writing STDOUT to a terminal is <a href="https://medium.com/spencerweekly/console-output-overhead-why-is-writing-to-stdout-so-slow-b0cc7c88704c">slow</a>, because users expect immediate feedback.</p>
<p>Let's see what raw disk throughput looks like on my laptop.</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">class</span> <span class="nc">Main</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">org</span><span class="o">.</span><span class="na">slf4j</span><span class="o">.</span><span class="na">Logger</span> <span class="n">logger</span> <span class="o">=</span> <span class="n">org</span><span class="o">.</span><span class="na">slf4j</span><span class="o">.</span><span class="na">LoggerFactory</span><span class="o">.</span><span class="na">getLogger</span><span class="o">(</span><span class="n">Main</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="o">(</span><span class="n">String</span><span class="o">[]</span> <span class="n">args</span><span class="o">)</span> <span class="o">{</span>
<span class="n">Timer</span> <span class="n">timer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Timer</span><span class="o">();</span>
<span class="n">timer</span><span class="o">.</span><span class="na">schedule</span><span class="o">(</span><span class="k">new</span> <span class="n">TimerTask</span><span class="o">()</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">run</span><span class="o">()</span> <span class="o">{</span>
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">"Exiting"</span><span class="o">);</span>
<span class="n">System</span><span class="o">.</span><span class="na">exit</span><span class="o">(</span><span class="mi">0</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">},</span> <span class="n">TimeUnit</span><span class="o">.</span><span class="na">MINUTES</span><span class="o">.</span><span class="na">toMillis</span><span class="o">(</span><span class="mi">1</span><span class="o">));</span>
<span class="kt">int</span> <span class="n">threads</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">threads</span><span class="o">;</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span>
<span class="kd">final</span> <span class="n">String</span> <span class="n">name</span> <span class="o">=</span> <span class="s">"logger-"</span> <span class="o">+</span> <span class="n">i</span><span class="o">;</span>
<span class="n">Thread</span> <span class="n">t</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Thread</span><span class="o">(</span><span class="n">name</span><span class="o">)</span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">run</span><span class="o">()</span> <span class="o">{</span>
<span class="k">while</span> <span class="o">(</span><span class="kc">true</span><span class="o">)</span> <span class="o">{</span>
<span class="n">logger</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"Hello world!"</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">};</span>
<span class="n">t</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>The multi-file runs use the <a href="https://logback.qos.ch/manual/appenders.html#SiftingAppender">sifting appender</a> together with the <a href="https://dzone.com/articles/siftingappender-logging">thread name based discriminator</a>, which is sketched below after the configuration.</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><configuration</span> <span class="na">debug=</span><span class="s">"true"</span><span class="nt">></span>
<span class="nt"><appender</span> <span class="na">name=</span><span class="s">"FILE"</span> <span class="na">class=</span><span class="s">"ch.qos.logback.core.FileAppender"</span><span class="nt">></span>
<span class="nt"><file></span>application.log<span class="nt"></file></span>
<span class="nt"><append></span>false<span class="nt"></append></span>
<span class="nt"><immediateFlush></span>false<span class="nt"></immediateFlush></span>
<span class="nt"><encoder></span>
<span class="nt"><pattern></span>%-4relative [%thread] %-5level %logger{35} - %msg%n<span class="nt"></pattern></span>
<span class="nt"></encoder></span>
<span class="nt"></appender></span>
<span class="nt"><appender</span> <span class="na">name=</span><span class="s">"SIFT"</span> <span class="na">class=</span><span class="s">"ch.qos.logback.classic.sift.SiftingAppender"</span><span class="nt">></span>
<span class="nt"><discriminator</span> <span class="na">class=</span><span class="s">"org.example.ThreadNameBasedDiscriminator"</span><span class="nt">/></span>
<span class="nt"><sift></span>
<span class="nt"><appender</span> <span class="na">name=</span><span class="s">"FILE-${threadName}"</span> <span class="na">class=</span><span class="s">"ch.qos.logback.core.FileAppender"</span><span class="nt">></span>
<span class="nt"><file></span>${threadName}.log<span class="nt"></file></span>
<span class="nt"><append></span>false<span class="nt"></append></span>
<span class="nt"><immediateFlush></span>false<span class="nt"></immediateFlush></span>
<span class="nt"><encoder></span>
<span class="nt"><pattern></span>%-4relative [%thread] %-5level %logger{35} - %msg%n<span class="nt"></pattern></span>
<span class="nt"></encoder></span>
<span class="nt"></appender></span>
<span class="nt"></sift></span>
<span class="nt"></appender></span>
<span class="nt"><root</span> <span class="na">level=</span><span class="s">"INFO"</span><span class="nt">></span>
<span class="nt"><appender-ref</span> <span class="na">ref=</span><span class="s">"FILE"</span><span class="nt">/></span>
<span class="nt"></root></span>
<span class="nt"></configuration></span>
</code></pre></div></div>
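<p>The discriminator itself is tiny. A minimal sketch along the lines of the linked article, using Logback's <code class="language-plaintext highlighter-rouge">AbstractDiscriminator</code>:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package org.example;

import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.sift.AbstractDiscriminator;

// Exposes the current thread's name under the "threadName" key, which the
// sifting appender substitutes into ${threadName} above.
public class ThreadNameBasedDiscriminator extends AbstractDiscriminator<ILoggingEvent> {
  @Override
  public String getDiscriminatingValue(ILoggingEvent event) {
    return event.getThreadName();
  }

  @Override
  public String getKey() {
    return "threadName";
  }
}
</code></pre></div></div>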
<p>Run it for 1 minute, 1 thread, one FILE appender (100% core usage) = 5.5GB</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-rw-rw-r-- 1 wsargent wsargent 5815185001 Oct 16 14:13 application.log
</code></pre></div></div>
<p><img src="/images/throughput/1file1thread.png" alt="1file1thread" /></p>
<p>There are use cases where you may want to log this much! If you want to enable total debug output on your local filesystem and nothing else is involved in file IO, then yeah, why not? This is the premise behind <a href="https://tersesystems.com/blog/2020/11/26/queryable-logging-with-blacklite/">diagnostic logging in Blacklite</a>, where you keep a rolling buffer of debug events in SQLite that you can dip into for extra context when an error occurs.</p>
<p>However, in most circumstances, your issue will be <em>too much</em> throughput rather than <em>too little</em>. I've already written about the costs involved in indexing and storing logs in <a href="https://tersesystems.com/blog/2019/06/03/application-logging-in-java-part-6/">Logging Costs</a> so I won't go over them again – suffice to say that your devops team will not be happy at these numbers if you are planning to log at INFO or above and send it to a centralized logging environment.</p>
<p>Instead, I want to look more at the bottom line number. Is there any way we can make this faster?</p>
<p>First, let's just up the number of input threads logging, just to confirm that the bottleneck is on the backend.</p>
<p>1 minute, 4 threads, one FILE appender = 4.4GB</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-rw-rw-r-- 1 wsargent wsargent 4777409363 Oct 16 16:37 application.log
</code></pre></div></div>
<p>Huh. Throughput goes <em>down</em> when we log from multiple threads. This is likely due to the <a href="https://mechanical-sympathy.blogspot.com/2011/09/single-writer-principle.html">single writer principle</a>.</p>
<p>But maybe it's the encoder that's slowing things down before the data reaches the filesystem. One tip from <a href="https://corecursive.com/frontiers-of-performance-with-daniel-lemire/#io-and-file-processing-performance">this podcast</a>: Daniel Lemire says that CPU bottlenecks can come before IO bottlenecks, even though that's not popular orthodoxy. Using htop, it looked like the process was maxing out a single core, so let's work from there.</p>
<p>If we run several threads writing to several different files, we might avoid the bottleneck on a single core. Let's see what happens when we use the sifting appender to multiplex multiple threads to multiple files.</p>
<p>1 minute, 2 threads, sifting appender (90% core usage on two threads) = 3.3 GB:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-rw-rw-r-- 1 wsargent wsargent 1804883163 Oct 16 14:16 logger-0.log
-rw-rw-r-- 1 wsargent wsargent 1614980252 Oct 16 14:16 logger-1.log
</code></pre></div></div>
<p><img src="/images/throughput/1file2threads.png" alt="1file2threads" /></p>
<p>1 minute, 4 threads, sifting appender = 3.5 GB</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-rw-rw-r-- 1 wsargent wsargent 977745435 Oct 16 14:20 logger-0.log
-rw-rw-r-- 1 wsargent wsargent 914416956 Oct 16 14:20 logger-1.log
-rw-rw-r-- 1 wsargent wsargent 954032694 Oct 16 14:20 logger-2.log
-rw-rw-r-- 1 wsargent wsargent 968975679 Oct 16 14:20 logger-3.log
</code></pre></div></div>
<p><img src="/images/throughput/1file4threads.png" alt="1file4threads" /></p>
<p>Okay, so it's not that the encoder is the bottleneck. Instead, it appears to be the disk, and switching contexts between threads doesn't help the throughput. Let's write to <code class="language-plaintext highlighter-rouge">tmpfs</code> instead, using the default <a href="https://www.cyberciti.biz/tips/what-is-devshm-and-its-practical-usage.html"><code class="language-plaintext highlighter-rouge">/dev/shm</code></a>.</p>
<p>1 minute, 1 thread, one FILE appender <code class="language-plaintext highlighter-rouge">/dev/shm/application.log</code> = 8.5 GB:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-rw-rw-r-- 1 wsargent wsargent 9027620209 Oct 16 17:27 application.log
</code></pre></div></div>
<p><img src="/images/throughput/1filefsync.png" alt="1filefsync" /></p>
<p>Ah-ha! That boosted throughput by more than 50%!</p>
<p>Now let's see what happens if we add more threads, just to check.</p>
<p>1 minute, 2 threads, one FILE appender <code class="language-plaintext highlighter-rouge">/dev/shm/application.log</code> = 8.0 GB:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-rw-rw-r-- 1 wsargent wsargent 8563431350 Oct 16 18:06 application.log
</code></pre></div></div>
<p>Yep, that again reduces the throughput.</p>
<p>The message to take away from this is that if you want to maximize throughput, it's not a question of picking the right logging framework, or the right appender – you need to look at your bottlenecks.</p>
<h2 id="summary">Summary</h2>
<ul>
<li>Use conditional guards and avoid creating objects or calling methods that may block (which may include <code class="language-plaintext highlighter-rouge">toString</code>).</li>
<li>Use <code class="language-plaintext highlighter-rouge">logstash-logback-encoder</code>, preferably with a listener so you can tell if the queue fills up.</li>
<li>Be careful of edge cases with compression and rollover.</li>
<li>Usually the concern is <em>too much</em> throughput rather than <em>too little</em>.</li>
<li>If you really need it, use <code class="language-plaintext highlighter-rouge">/dev/shm</code> or <a href="https://github.com/tersesystems/blacklite/">Blacklite</a> for your log storage.</li>
</ul>I've been working with Logback for a while now, and one of the things that stands out is how people will talk about "fast" or "performant" logging, with the theory that picking the right encoder or the right appender will make things work. It's not wrong, but it's not exactly right either. So, this blog post discusses latency and throughput in Logback, along with some fun non-obvious things that can cause production issues if you're not careful. And it has pictures! Latency Latency is defined as the amount of time required to complete a single operation. Latency is a surprisingly slippery concept, because as soon as you start aggregating latency times, you can end up with visualizations that can omit or obscure parts of the picture. Latency can be defined as averages, percentiles, histograms (useful for "long tail" latency), or heatmaps. Because we're talking about conceptual latency here, we'll talk about the "average" latency between a logging statement, and a statement being logged. @BenchmarkMode(Array(Mode.AverageTime)) @OutputTimeUnit(TimeUnit.NANOSECONDS) class SLF4JBenchmark { import SLF4JBenchmark._ @Benchmark def boundedDebugWithTemplate(): Unit = if (logger.isDebugEnabled) { logger.debug("hello world, {}", longAdder.incrementAndGet()) } } And using an encoder and appender like this: <configuration> <appender name="FILE" class="ch.qos.logback.core.FileAppender"> <file>testFile.log</file> <append>false</append> <immediateFlush>false</immediateFlush> <encoder> <pattern>%-4relative [%thread] %-5level %logger{35} - %msg%n</pattern> </encoder> </appender> </configuration> Say that boundedDebugWithTemplate takes roughly 871 nanoseconds as measured by JMH. We can visualize this as a straight line, from the time of logging to the time that bytes were appended to a file. But logging is made up of several operations. For example, if we swap out the file appender for an no-op appender that does nothing but create the logging event and a message based off the template, we can see that the same operation takes only 33 nanoseconds. If we set the logger to INFO level, we can see the isLoggingDebug call takes only 1.6 nanoseconds. So in reality, what we're looking at is more like this: Because the FileAppender is blocking and Logback runs everything in the calling thread, this means that turning on debugging in an operation will add ~871 ns to every call. This also compounds for every blocking appender. The initial costs of putting together the logging event happen once, but if you have a STDOUT appender, a file appender, and a network appender, they all encode the logging event using distinct encoders, and render sequentially on the same thread. In practical terms – the more appenders you add, the slower your code gets when you log. It's important to note at this point how tiny a latency of 871 nanoseconds is – for comparison, instantiating any Java object costs around 20 nanoseconds. For most operations, logging is not the bottleneck compared to the costs of the operation itself – unnecessary database queries, blocking on network calls, and lack of caching are still the low hanging fruit. However, it is still a cost. Moreover, looking at the average latency doesn't tell you about the outliers – the "long tail" of latency. If an operation blocks in any way, then that cost will be passed on to the application. And blocking can happen in the most insidious of ways. The obvious source of blocking is when a logging event or message includes a blocking call. 
For example, calling UUID.randomUUID() blocks because of the internal lock, or calling toString() on a collection that contains java.net.URL objects, causing hundreds of DNS resolutions. This can block an HTTP request for multiple seconds, and it won't be immediately obvious from looking at the logs. But blocking is not solely an input problem though – blocking can come from Logback itself. Blocking in Logback can come from appenders. Anything extending AppenderBase uses a synchronized lock that ensures only one thread is appending. While it looks like blocking in appenders is a small consistent cost, this is not always the case. For example, a rolling file appender can block on rollover. LOGBACK-267 means that if you use FixedWindowRollingPolicy and enable compression by specifying a .gz suffix, then compressing multi-gigabyte files can stall the appender, blocking all logging for 55 to 69 seconds. The underlying cause is that FixedWindowRollingPolicy.java calls compressor.compress, as opposed to TimeBasedRollingPolicy.java which uses compressor.asyncCompress. You might think the problem of blocking can be easily avoided, but it's not quite that simple. Blocking can happen at the kernel, even when writing to memory mapped files, as the operating system manages writes. This causes issues. Filesystem blocking can occur even in software RAID or a network backed VFS. In short, when files were created this made lots of people angry, and was widely regarded as a bad move. I suspect that the TCP appenders and TCP network stack work differently, but then the assumption is that the network is reliable. Asynchronous Logging There is a way to avoid unanticipated blocking: we can log asynchronously. Asynchronous logging is a trend, with asynchronous GC logging in Corretto 17 coming out for the JDK itself. There are several ways to implement asynchronous logging. Echopraxia can address it at invocation with an asynchronous logger, deferring argument construction and condition evaluations and allowing caller information for free, at the cost of a more complex method interface. Alternatively, asynchronous logging can be implemented in an appender, although this does mean that argument and LoggingEvent construction happen on the calling thread. Logback does have an out of the box async appender, but the LoggingEventAsyncDisruptorAppender from logstash-logback-encoder is much richer from a feature-based perspective; it by default drops all events when full, can warn when full, and has more customization available on ringbuffer size and behavior. From a performance perspective I'd say it's a wash for most people – note that the logback performance page discusses throughput, so it's not an apples to apples comparison. <appender name="async" class="net.logstash.logback.appender.LoggingEventAsyncDisruptorAppender"> <appender class="ch.qos.logback.core.rolling.RollingFileAppender"> ... </appender> </appender> An async appender will accept LoggerEvent, and will write to an in-memory ring buffer that is used by a dedicated thread to write to the enclosed appenders. On average, the mean latency for a disruptor is ~50 nanoseconds, up to a worst case scenario of 420 ns when the queue is fully loaded. This means that the rendering thread only incurs the latency cost of 33 ns (eval + logback event) + 50 ns (enqueuing), but does not incur the latency cost of appending to file. An asynchronous boundary exists between the thread running the operation, and the thread that picks up the logger and writes to the appenders. 
Using multiple threads enables logging to be concurrent, running alongside operations without interfering with them. There is a difference between concurrency and parallelism: if there's only one core available, then the two threads may run interleaved, and there may be a small delay in writing the logs. If there are multiple cores available though, then typically the thread will be writing logs in parallel. There are some special cases / catches to asynchronous logging. The first catch is not adding a shutdown hook; you need to let the ring buffer gracefully shutdown, and if Logback shuts down immediately you will miss events that could be critical. The second catch is to use unnecessary async appenders, each wrapping a single appender. This can be a waste of threads; you only need one to create an asynchronous boundary. If you do not anticipate significant load and your appenders are fast, my recommendation is to define the appender at the root, before you do anything else. <configuration> <shutdownHook class="ch.qos.logback.core.hook.DelayingShutdownHook"> <delay>150</delay> </shutdownHook> <root level="all"> <appender class="net.logstash.logback.appender.LoggingEventAsyncDisruptorAppender"> <appender name="FILE">...</appender> <appender name="STDOUT">...</appender> <appender name="TCP">...</appender> </appender> </root> </configuration> The third catch is what happens to asynchronous logging when there is significant load. Ring buffers can fill up when the underlying appenders are slow and do not drain the buffer fast enough, and a full ring buffer can result in dropped events. Therefore, if you do have an appender that's awkward (and you can't fix it), you should configure a distinct appender for it and configure it so it doesn't jam up the others. <configuration> <root level="all"> <appender class="net.logstash.logback.appender.LoggingEventAsyncDisruptorAppender"> <appender name="FILE">...</appender> <appender name="STDOUT">...</appender> <appender name="TCP">...</appender> </appender> <appender class="net.logstash.logback.appender.LoggingEventAsyncDisruptorAppender"> <ringBufferSize>[some large multiple of 2]</ringBufferSize> <appender class="RollingFileAppender"> <!-- trigger LOGBACK-267 --> <rollingPolicy class="FixedWindowRollingPolicy"> <fileNamePattern>backup%i.log.gz</fileNamePattern> ... </rollingPolicy> <triggeringPolicy> <maxFileSize>4GB</maxFileSize> </triggeringPolicy> <encoder>...</encoder> </appender> </appender> </root> </configuration> You may lose some events if it spills over, but that's better than stalling your application. You can also add an appender listener to notify you of any dropped messages. The FailureSummaryLoggingAppenderListener implementation will log a summary of any dropped messages, but it does have the drawback that the listener logs the summary to the same appender that is dropping messages – so the summary itself can be lost. You are better off writing your own implementation from the interface, and using it to send to your metrics or error reporting system in a scheduled runnable using the ScheduledExecutionService from Logback's Context. Throughput The throughput of an application is how many operations it can process over a period of time. This is typically the metric we care about for batch operations and things that happen in the background. Throughput is a tricky quantity to measure, because doing operations in bulk improves throughput, but can cause applications to seem unresponsive, and vice versa. 
For example, writing to STDOUT is "fast" because of I/O buffering, but writing STDOUT to a terminal is slow, because users expect immediate feedback. Let's see what raw disk throughput looks like on my laptop. public class Main { private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(Main.class); public static void main(String[] args) { Timer timer = new Timer(); timer.schedule(new TimerTask() { @Override public void run() { System.out.println("Exiting"); System.exit(0); } }, TimeUnit.MINUTES.toMillis(1)); int threads = 1 for (int i = 0; i < threads; i++) { final String name = "logger-" + i; Thread t = new Thread(name) { @Override public void run() { while (true) { logger.info("Hello world!"); } } }; t.start(); } } } Using the sifting appender together with the thread name based discriminator. <configuration debug="true"> <appender name="FILE" class="ch.qos.logback.core.FileAppender"> <file>application.log</file> <append>false</append> <immediateFlush>false</immediateFlush> <encoder> <pattern>%-4relative [%thread] %-5level %logger{35} - %msg%n</pattern> </encoder> </appender> <appender name="SIFT" class="ch.qos.logback.classic.sift.SiftingAppender"> <discriminator class="org.example.ThreadNameBasedDiscriminator"/> <sift> <appender name="FILE-${threadName}" class="ch.qos.logback.core.FileAppender"> <file>${threadName}.log</file> <append>false</append> <immediateFlush>false</immediateFlush> <encoder> <pattern>%-4relative [%thread] %-5level %logger{35} - %msg%n</pattern> </encoder> </appender> </sift> </appender> <root level="INFO"> <appender-ref ref="FILE"/> </root> </configuration> Run it for 1 minute, 1 thread, one FILE appender (100% core usage) = 5.5GB -rw-rw-r-- 1 wsargent wsargent 5815185001 Oct 16 14:13 application.log There are use cases where you may want to log this much! If you want to enable total debug output on your local filesystem and nothing else is involved in file IO, then yeah, why not? This is the premise behind diagnostic logging in Blacklite, where you keep a rolling buffer of debug events in SQLite that you can dip into for extra context when an error occurs. However, in most circumstances, your issue will be too much throughput rather than too little. I've already written about the costs involved in indexing and storing logs in Logging Costs so I won't go over them again – suffice to say that your devops team will not be happy at these numbers if you are planning to log at INFO or above and send it to a centralized logging environment. Instead, I want to look more at the bottom line number. Is there any way we can make this faster? First, let's just up the number of input threads logging, just to confirm that the bottleneck is on the backend. 1 minute, 4 threads, one FILE appender = 4.4GB -rw-rw-r-- 1 wsargent wsargent 4777409363 Oct 16 16:37 application.log Huh. Throughput goes down when we log from multiple threads. This is likely due to single writer principle. But maybe it's the encoder that's slowing things down before it reaches the filesystem. One tip from this podcast where Daniel Lemure says that CPU bottlenecks can come before IO bottlenecks, even though it's not popular orthodoxy. Using htop, it looked like it was maxing out a single core in the process, so let's work from there. If we ran several threads to several different files, then we can avoid the bottleneck on a single core. Let's see what happens when we use the sifting appender to multiplex multiple threads to multiple files. 
<p>1 minute, 2 threads, sifting appender (90% core usage on two threads) = 3.3 GB:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-rw-rw-r-- 1 wsargent wsargent 1804883163 Oct 16 14:16 logger-0.log
-rw-rw-r-- 1 wsargent wsargent 1614980252 Oct 16 14:16 logger-1.log
</code></pre></div></div>
<p>1 minute, 4 threads, sifting appender = 3.5 GB:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-rw-rw-r-- 1 wsargent wsargent 977745435 Oct 16 14:20 logger-0.log
-rw-rw-r-- 1 wsargent wsargent 914416956 Oct 16 14:20 logger-1.log
-rw-rw-r-- 1 wsargent wsargent 954032694 Oct 16 14:20 logger-2.log
-rw-rw-r-- 1 wsargent wsargent 968975679 Oct 16 14:20 logger-3.log
</code></pre></div></div>
<p>Okay, so it's not the encoder that's the bottleneck. Instead, it appears to be the disk, and switching contexts between threads doesn't help the throughput. Let's write to tmpfs instead, using the default <code class="language-plaintext highlighter-rouge">/dev/shm</code>.</p>
<p>1 minute, 1 thread, one FILE appender writing to /dev/shm/application.log = 8.5 GB:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-rw-rw-r-- 1 wsargent wsargent 9027620209 Oct 16 17:27 application.log
</code></pre></div></div>
<p>Ah-ha! That almost doubled the throughput! Now let's see what happens if we add more threads, just to check.</p>
<p>1 minute, 2 threads, one FILE appender writing to /dev/shm/application.log = 8.0 GB:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-rw-rw-r-- 1 wsargent wsargent 8563431350 Oct 16 18:06 application.log
</code></pre></div></div>
<p>Yep, that again reduces the throughput. The message to take away from this is that if you want to maximize throughput, it's not a question of picking the right logging framework or the right appender – you need to look at your bottlenecks.</p>
<h2 id="summary">Summary</h2>
<ul>
  <li>Use conditional guards, and avoid creating objects or calling methods that may block (which may include <code class="language-plaintext highlighter-rouge">toString</code>).</li>
  <li>Use logstash-logback-encoder, preferably with a listener so you can tell if the queue fills up.</li>
  <li>Be careful of edge cases with compression and rollover.</li>
  <li>Usually the concern is too much throughput rather than too little.</li>
  <li>If you really need it, use <code class="language-plaintext highlighter-rouge">/dev/shm</code> or Blacklite for your log storage.</li>
</ul>
Adding Echopraxia to Akka2022-10-02T14:45:34-07:002022-10-02T14:45:34-07:00https://tersesystems.com/blog/2022/10/02/adding-echopraxia-to-akka<p><strong>TL;DR</strong> I released <a href="https://github.com/tersesystems/echopraxia-plusakka">echopraxia-plusakka</a>, a library that integrates Echopraxia with Akka's component system, which also resulted in adding a "direct" API to <a href="https://github.com/tersesystems/echopraxia">echopraxia</a> based off SLF4J markers.</p>
<hr />
<p>It's been a minute. After releasing the <a href="https://github.com/tersesystems/echopraxia-plusscala">Scala API</a> for Echopraxia and <a href="https://tersesystems.com/blog/2022/06/12/what-scala-adds-to-a-logging-api/">writing it up</a>, I've been working my way up the chain and trying to exploit/break the API with progressively more demanding use cases.</p>
<p><a href="https://github.com/akka/akka">Akka</a> has been a personal favorite testing ground of mine. Akka is deeply concurrent, and as such using a debugger is nearly pointless – even if you add breakpoints, you'll trip over timeouts if you take too long to return a message. As such there's really only two reliable ways to debug and observe Akka code. Unit tests… and <a href="https://blog.softwaremill.com/akka-streams-pitfalls-to-avoid-part-2-f93e60746c58">logging</a>.</p>
<p>So. The task I set myself was to add structured logging to Akka. I already had an advantage in that I'm familiar with Akka internals, and in the end it was fairly straightforward with only a couple of surprises.</p>
<h2 id="akka-logging">Akka Logging</h2>
<p>Akka's logging depends on an underlying <a href="https://doc.akka.io/api/akka/2.6/akka/event/LoggingAdapter.html"><code class="language-plaintext highlighter-rouge">LoggingAdapter</code></a> which goes through an <a href="https://doc.akka.io/docs/akka/current/typed/logging.html#event-bus">event bus</a> to <a href="https://doc.akka.io/api/akka/2.6/akka/event/slf4j/index.html">akka-slf4j</a>.</p>
<p>The first obstacle to adding structured logging is that the <code class="language-plaintext highlighter-rouge">MarkerLoggingAdapter</code> serializes arguments into a String before publishing it to the event bus, using <a href="https://github.com/akka/akka/blob/v2.6.20/akka-actor/src/main/scala/akka/event/Logging.scala#L1769">formatN</a> to convert arguments.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class MarkerLoggingAdapter extends BusLogging {
def error(marker: LogMarker, cause: Throwable, template: String, arg1: Any): Unit =
if (isErrorEnabled(marker))
bus.publish(Error(cause, logSource, logClass,
format1(template, arg1), // returns `String`
mdc, marker))
}
</code></pre></div></div>
<p>Because <code class="language-plaintext highlighter-rouge">MarkerLoggingAdapter</code> converts arguments to String eagerly, any arguments that pass through Akka's logging will be flattened and can't be retrieved later – there is no <code class="language-plaintext highlighter-rouge">Error(msg, arg1)</code> passed through to the event bus and then to <a href="https://github.com/akka/akka/blob/v2.6.20/akka-slf4j/src/main/scala/akka/event/slf4j/Slf4jLogger.scala#L56"><code class="language-plaintext highlighter-rouge">Slf4jLogger</code></a>.</p>
<p>It is still possible to pass through structured data though! Because the <code class="language-plaintext highlighter-rouge">MarkerLoggingAdapter</code> passes the <code class="language-plaintext highlighter-rouge">LogMarker</code> through untouched, using <code class="language-plaintext highlighter-rouge">Slf4jLogMarker</code> will pass an <code class="language-plaintext highlighter-rouge">org.slf4j.Marker</code> along to Logback, and we can piggyback information on the way.</p>
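<p>For example, something like this should carry structured fields through the event bus (a sketch, assuming Akka 2.6's <code class="language-plaintext highlighter-rouge">Slf4jLogMarker</code> and <code class="language-plaintext highlighter-rouge">Logging.withMarker</code>, plus logstash-logback-encoder's <code class="language-plaintext highlighter-rouge">Markers.append</code>):</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import akka.actor.Actor
import akka.event.Logging
import akka.event.slf4j.Slf4jLogMarker
import net.logstash.logback.marker.Markers

class MyActor extends Actor {
  // MarkerLoggingAdapter accepts a LogMarker on each call
  private val log = Logging.withMarker(this)

  override def receive: Receive = {
    case msg =>
      // the wrapped org.slf4j.Marker survives the eager String formatting
      // that flattens the message arguments
      log.info(Slf4jLogMarker(Markers.append("messageType", msg.getClass.getSimpleName)),
        "received {}", msg)
  }
}
</code></pre></div></div>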
<p>This led me to think about using Echopraxia directly against SLF4J.</p>
<h2 id="direct-api">Direct API</h2>
<p>Echopraxia does allow you to log using an <code class="language-plaintext highlighter-rouge">org.slf4j.Logger</code> directly for simple cases. For example, arguments work fine:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">FieldBuilder</span> <span class="n">fb</span> <span class="o">=</span> <span class="n">FieldBuilder</span><span class="o">.</span><span class="na">instance</span><span class="o">();</span>
<span class="n">org</span><span class="o">.</span><span class="na">slf4j</span><span class="o">.</span><span class="na">Logger</span> <span class="n">slf4jLogger</span> <span class="o">=</span> <span class="n">org</span><span class="o">.</span><span class="na">slf4j</span><span class="o">.</span><span class="na">LoggerFactory</span><span class="o">.</span><span class="na">getLogger</span><span class="o">(</span><span class="s">"com.example.Main"</span><span class="o">);</span>
<span class="n">slf4jLogger</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="s">"SLF4J message {}"</span><span class="o">,</span> <span class="n">fb</span><span class="o">.</span><span class="na">string</span><span class="o">(</span><span class="s">"foo"</span><span class="o">,</span> <span class="s">"bar"</span><span class="o">));</span>
</code></pre></div></div>
<p>However, exceptions in SLF4J get "eaten" if they have a template placeholder, so if you want to keep the exception, you need to pass it in twice:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">slf4jLogger</span><span class="o">.</span><span class="na">error</span><span class="o">(</span><span class="s">"SLF4J exception {}"</span><span class="o">,</span> <span class="n">fb</span><span class="o">.</span><span class="na">exception</span><span class="o">(</span><span class="n">e</span><span class="o">),</span> <span class="n">e</span><span class="o">);</span>
</code></pre></div></div>
<p>However, <a href="https://github.com/tersesystems/echopraxia#conditions">conditions</a> and <a href="https://github.com/tersesystems/echopraxia#context">context fields</a> do not exist in the SLF4J API. If we want to use SLF4J, then it's time to <a href="https://tersesystems.com/blog/2019/05/18/application-logging-in-java-part-4/">fake it with markers</a>.</p>
<p>I added some <a href="https://github.com/tersesystems/echopraxia#direct-logback--slf4j-api">direct API</a> features to Echopraxia. Using the direct API, context fields can be represented by <code class="language-plaintext highlighter-rouge">FieldMarker</code>, and conditions by <code class="language-plaintext highlighter-rouge">ConditionMarker</code>. This passes information through to the backend appropriately.</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">com.tersesystems.echopraxia.logback.*</span><span class="o">;</span>
<span class="kn">import</span> <span class="nn">net.logstash.logback.marker.Markers</span><span class="o">;</span>
<span class="n">FieldBuilder</span> <span class="n">fb</span> <span class="o">=</span> <span class="n">FieldBuilder</span><span class="o">.</span><span class="na">instance</span><span class="o">();</span>
<span class="n">FieldMarker</span> <span class="n">fields</span> <span class="o">=</span> <span class="n">FieldMarker</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span>
<span class="n">fb</span><span class="o">.</span><span class="na">list</span><span class="o">(</span>
<span class="n">fb</span><span class="o">.</span><span class="na">string</span><span class="o">(</span><span class="s">"sessionId"</span><span class="o">,</span> <span class="s">"value"</span><span class="o">),</span>
<span class="n">fb</span><span class="o">.</span><span class="na">number</span><span class="o">(</span><span class="s">"correlationId"</span><span class="o">,</span> <span class="mi">1</span><span class="o">)</span>
<span class="o">)</span>
<span class="o">);</span>
<span class="n">ConditionMarker</span> <span class="n">conditionMarker</span> <span class="o">=</span> <span class="n">ConditionMarker</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span>
<span class="n">Condition</span><span class="o">.</span><span class="na">stringMatch</span><span class="o">(</span><span class="s">"sessionId"</span><span class="o">,</span> <span class="n">s</span> <span class="o">-></span> <span class="n">s</span><span class="o">.</span><span class="na">raw</span><span class="o">().</span><span class="na">equals</span><span class="o">(</span><span class="s">"value"</span><span class="o">)))</span>
<span class="o">);</span>
<span class="n">logger</span><span class="o">.</span><span class="na">info</span><span class="o">(</span><span class="n">Markers</span><span class="o">.</span><span class="na">aggregate</span><span class="o">(</span><span class="n">fieldMarker</span><span class="o">,</span> <span class="n">conditionMarker</span><span class="o">),</span> <span class="s">"condition and marker"</span><span class="o">);</span>
</code></pre></div></div>
<p>This is only half the story though – the condition still needs to be evaluated, and because that doesn't go through an Echopraxia logger, that means adding a <code class="language-plaintext highlighter-rouge">ConditionTurboFilter</code>:</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><configuration></span>
<span class="nt"><turboFilter</span> <span class="na">class=</span><span class="s">"com.tersesystems.echopraxia.logback.ConditionTurboFilter"</span><span class="nt">/></span>
<span class="nt"></configuration></span>
</code></pre></div></div>
<p>And then also when rendering JSON, we need to swap out the <code class="language-plaintext highlighter-rouge">FieldMarker</code> with actual <a href="https://github.com/logfellow/logstash-logback-encoder#event-specific-custom-fields">event specific custom fields</a> that <code class="language-plaintext highlighter-rouge">logstash-logback-encoder</code> will recognize, using <code class="language-plaintext highlighter-rouge">LogstashFieldAppender</code>.</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><configuration></span>
<span class="c"><!-- ... --></span>
<span class="nt"><root</span> <span class="na">level=</span><span class="s">"INFO"</span><span class="nt">></span>
<span class="c"><!-- replaces fields with logstash markers and structured arguments --></span>
<span class="nt"><appender</span> <span class="na">class=</span><span class="s">"com.tersesystems.echopraxia.logstash.LogstashFieldAppender"</span><span class="nt">></span>
<span class="nt"><appender</span> <span class="na">class=</span><span class="s">"ch.qos.logback.core.FileAppender"</span><span class="nt">></span>
<span class="nt"><file></span>application.log<span class="nt"></file></span>
<span class="nt"><encoder</span> <span class="na">class=</span><span class="s">"net.logstash.logback.encoder.LogstashEncoder"</span><span class="nt">/></span>
<span class="nt"></appender></span>
<span class="nt"></appender></span>
<span class="nt"></root></span>
<span class="nt"></configuration></span>
</code></pre></div></div>
<p>This… is a hack, and I don't love it. The problem here is that there is no central pipeline for creating and manipulating Logback's <code class="language-plaintext highlighter-rouge">LoggingEvent</code>. The turbo filter API will only let you return <code class="language-plaintext highlighter-rouge">FilterReply</code> and the actual creation of a <code class="language-plaintext highlighter-rouge">LoggingEvent</code> happens internally. So… if you want to tweak the logging event, you have to have an appender transform it, then pass it through to the appender's children. This is the approach used in <a href="https://tersesystems.com/blog/2019/05/27/application-logging-in-java-part-5/">composite appenders</a>.</p>
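<p>For reference, the transform-and-delegate pattern looks roughly like this (a sketch, using Logback's <code class="language-plaintext highlighter-rouge">AppenderAttachableImpl</code> to manage the children):</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import ch.qos.logback.classic.spi.ILoggingEvent
import ch.qos.logback.core.spi.AppenderAttachableImpl
import ch.qos.logback.core.{Appender, UnsynchronizedAppenderBase}

// Transform the logging event, then fan it out to the child appenders.
abstract class TransformingAppender extends UnsynchronizedAppenderBase[ILoggingEvent] {
  private val aai = new AppenderAttachableImpl[ILoggingEvent]

  // subclasses decide how to rewrite the event
  protected def transform(event: ILoggingEvent): ILoggingEvent

  // called for each nested appender element (appender children are legal,
  // as noted above)
  def addAppender(appender: Appender[ILoggingEvent]): Unit =
    aai.addAppender(appender)

  override protected def append(event: ILoggingEvent): Unit =
    aai.appendLoopOnAppenders(transform(event))
}
</code></pre></div></div>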
<p>This is complicated by Logback not officially supporting <code class="language-plaintext highlighter-rouge">appender-ref</code> for appenders themselves. You can add <code class="language-plaintext highlighter-rouge">appender-ref</code> from the root:</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><configuration></span>
<span class="c"><!-- ... --></span>
<span class="nt"><root</span> <span class="na">level=</span><span class="s">"DEBUG"</span><span class="nt">></span>
<span class="nt"><appender-ref</span> <span class="na">ref=</span><span class="s">"FILE"</span> <span class="nt">/></span>
<span class="nt"></root></span>
<span class="nt"></configuration></span>
</code></pre></div></div>
<p>but even though it's perfectly legal to have appender children, to add <code class="language-plaintext highlighter-rouge">appender-ref</code> on appenders, you need to explicitly loosen the <code class="language-plaintext highlighter-rouge">AppenderRefAction</code> to match (which can cause complaints):</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><configuration></span>
<span class="c"><!-- loosen the rule on appender refs so appenders can reference them --></span>
<span class="nt"><newRule</span> <span class="na">pattern=</span><span class="s">"*/appender/appender-ref"</span>
<span class="na">actionClass=</span><span class="s">"ch.qos.logback.core.joran.action.AppenderRefAction"</span><span class="nt">/></span>
<span class="c"><!-- ... --></span>
<span class="nt"><appender</span> <span class="na">name=</span><span class="s">"CONSOLE_AND_FILE"</span> <span class="na">class=</span><span class="s">"com.tersesystems.logback.CompositeAppender"</span><span class="nt">></span>
<span class="nt"><appender-ref</span> <span class="na">ref=</span><span class="s">"CONSOLE"</span><span class="nt">/></span>
<span class="nt"><appender-ref</span> <span class="na">ref=</span><span class="s">"FILE"</span><span class="nt">/></span>
<span class="nt"></appender></span>
<span class="nt"><root</span> <span class="na">level=</span><span class="s">"DEBUG"</span><span class="nt">></span>
<span class="nt"><appender-ref</span> <span class="na">ref=</span><span class="s">"CONSOLE_AND_FILE"</span> <span class="nt">/></span>
<span class="nt"></root></span>
<span class="nt"></configuration></span>
</code></pre></div></div>
<p>On a tangent, because Logback runs through appenders in sequence in the same thread, it's possible for synchronous appenders to block asynchronous appenders:</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><configuration></span>
<span class="nt"><root</span> <span class="na">level=</span><span class="s">"INFO"</span><span class="nt">></span>
<span class="c"><!-- this runs first in the executing thread --></span>
<span class="nt"><appender</span> <span class="na">class=</span><span class="s">"ch.qos.logback.core.FileAppender"</span><span class="nt">></span>
<span class="c"><!-- ... --></span>
<span class="nt"></appender></span>
<span class="c"><!-- only gets the event after first appender... --></span>
<span class="nt"><appender</span> <span class="na">class=</span><span class="s">"net.logstash.logback.appender.LoggingEventAsyncDisruptorAppender"</span><span class="nt">></span>
<span class="c"><!-- ... --></span>
<span class="nt"></appender></span>
<span class="nt"></root></span>
<span class="nt"></configuration></span>
</code></pre></div></div>
<p>As such, either you have multiple async appenders, or you wrap all the IO appenders inside a disruptor so you only have the overhead of one thread. This means that appenders can really serve three different roles: managing concurrency, event transformation, and IO sinks with encoders.</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><configuration></span>
<span class="nt"><root</span> <span class="na">level=</span><span class="s">"INFO"</span><span class="nt">></span>
<span class="c"><!-- immediately move off the rendering thread... --></span>
<span class="nt"><appender</span> <span class="na">class=</span><span class="s">"net.logstash.logback.appender.LoggingEventAsyncDisruptorAppender"</span><span class="nt">></span>
<span class="c"><!-- ...transform event in pipeline... --></span>
<span class="nt"><appender</span> <span class="na">class=</span><span class="s">"com.tersesystems.echopraxia.logstash.LogstashFieldAppender"</span><span class="nt">></span>
<span class="c"><!-- ...render to IO/Network/STDOUT --></span>
<span class="nt"><appender</span> <span class="na">class=</span><span class="s">"ch.qos.logback.core.ConsoleAppender"</span><span class="nt">></span>
<span class="nt"><encoder></span>
<span class="nt"><pattern></span>[%-5level] %logger{15} - message%n%xException{10}<span class="nt"></pattern></span>
<span class="nt"></encoder></span>
<span class="nt"></appender></span>
<span class="nt"><appender</span> <span class="na">class=</span><span class="s">"ch.qos.logback.core.FileAppender"</span><span class="nt">></span>
<span class="nt"><file></span>application.log<span class="nt"></file></span>
<span class="nt"><encoder</span> <span class="na">class=</span><span class="s">"net.logstash.logback.encoder.LogstashEncoder"</span><span class="nt">/></span>
<span class="nt"></appender></span>
<span class="nt"></appender></span>
<span class="nt"></appender></span>
<span class="nt"></root></span>
<span class="nt"></configuration></span>
</code></pre></div></div>
<p>Anyhoo.</p>
<p>Adding the direct API means that there is a fallback position, but I found that it was still very fiddly. <a href="https://github.com/tersesystems/echopraxia#filters">Filters</a> and other features that depend on composing loggers are not available in SLF4J. Aggregating multiple markers is awkward, even leveraging implicit conversion.</p>
<p>The second option was to sidestep the <code class="language-plaintext highlighter-rouge">LoggingAdapter</code> altogether and extend Akka's models with a structured logging API. There are two models in Akka: actors and streams, and they each have their own approach.</p>
<h2 id="field-builders">Field Builders</h2>
<p>The first goal was to provide values for Akka components. The plan was to create structured output that would correspond to the <code class="language-plaintext highlighter-rouge">toString</code> debug output. But Akka components such as <code class="language-plaintext highlighter-rouge">ActorSystem</code> and <code class="language-plaintext highlighter-rouge">ActorPath</code> make heavy use of internal APIs that are only accessible under the <code class="language-plaintext highlighter-rouge">akka</code> package. Solution: define the package as <code class="language-plaintext highlighter-rouge">akka.echopraxia</code> to open up the API.</p>
<p>First, a pure trait so you can provide your mapping:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="nn">akka.echopraxia.actor</span>
<span class="k">trait</span> <span class="nc">AkkaFieldBuilder</span> <span class="k">extends</span> <span class="nc">FieldBuilder</span> <span class="o">{</span>
<span class="k">implicit</span> <span class="k">def</span> <span class="n">byteStringToValue</span><span class="k">:</span> <span class="kt">ToValue</span><span class="o">[</span><span class="kt">ByteString</span><span class="o">]</span>
<span class="k">implicit</span> <span class="k">def</span> <span class="n">addressToValue</span><span class="k">:</span> <span class="kt">ToValue</span><span class="o">[</span><span class="kt">akka.actor.Address</span><span class="o">]</span>
<span class="k">implicit</span> <span class="k">def</span> <span class="n">actorRefToValue</span><span class="k">:</span> <span class="kt">ToValue</span><span class="o">[</span><span class="kt">akka.actor.ActorRef</span><span class="o">]</span>
<span class="k">implicit</span> <span class="k">def</span> <span class="n">actorPathToValue</span><span class="k">:</span> <span class="kt">ToValue</span><span class="o">[</span><span class="kt">akka.actor.ActorPath</span><span class="o">]</span>
<span class="k">implicit</span> <span class="k">def</span> <span class="n">actorSystemToValue</span><span class="k">:</span> <span class="kt">ToValue</span><span class="o">[</span><span class="kt">akka.actor.ActorSystem</span><span class="o">]</span>
<span class="c1">// ...
</span><span class="o">}</span>
</code></pre></div></div>
<p>and then some default implementations:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">trait</span> <span class="nc">DefaultAkkaFieldBuilder</span> <span class="k">extends</span> <span class="nc">AkkaFieldBuilder</span> <span class="o">{</span>
<span class="k">override</span> <span class="k">implicit</span> <span class="k">val</span> <span class="n">byteStringToValue</span><span class="k">:</span> <span class="kt">ToValue</span><span class="o">[</span><span class="kt">ByteString</span><span class="o">]</span> <span class="k">=</span> <span class="n">bs</span> <span class="k">=></span> <span class="nc">ToObjectValue</span><span class="o">(</span>
<span class="n">keyValue</span><span class="o">(</span><span class="s">"length"</span> <span class="o">-></span> <span class="n">bs</span><span class="o">.</span><span class="n">length</span><span class="o">),</span>
<span class="n">keyValue</span><span class="o">(</span><span class="s">"utf8String"</span> <span class="o">-></span> <span class="n">bs</span><span class="o">.</span><span class="n">utf8String</span><span class="o">)</span>
<span class="o">)</span>
<span class="k">override</span> <span class="k">implicit</span> <span class="k">val</span> <span class="n">addressToValue</span><span class="k">:</span> <span class="kt">ToValue</span><span class="o">[</span><span class="kt">akka.actor.Address</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span> <span class="n">address</span> <span class="k">=></span>
<span class="nc">ToValue</span><span class="o">(</span><span class="n">address</span><span class="o">.</span><span class="n">toString</span><span class="o">)</span>
<span class="o">}</span>
<span class="k">override</span> <span class="k">implicit</span> <span class="k">val</span> <span class="n">actorRefToValue</span><span class="k">:</span> <span class="kt">ToValue</span><span class="o">[</span><span class="kt">akka.actor.ActorRef</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span> <span class="n">actorRef</span> <span class="k">=></span>
<span class="nc">ToValue</span><span class="o">(</span><span class="n">actorRef</span><span class="o">.</span><span class="n">path</span><span class="o">)</span>
<span class="o">}</span>
<span class="k">override</span> <span class="k">implicit</span> <span class="k">val</span> <span class="n">actorPathToValue</span><span class="k">:</span> <span class="kt">ToValue</span><span class="o">[</span><span class="kt">akka.actor.ActorPath</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span> <span class="n">actorPath</span> <span class="k">=></span>
<span class="nc">ToObjectValue</span><span class="o">(</span>
<span class="n">keyValue</span><span class="o">(</span><span class="s">"address"</span> <span class="o">-></span> <span class="n">actorPath</span><span class="o">.</span><span class="n">address</span><span class="o">),</span>
<span class="n">keyValue</span><span class="o">(</span><span class="s">"name"</span> <span class="o">-></span> <span class="n">actorPath</span><span class="o">.</span><span class="n">name</span><span class="o">),</span>
<span class="n">keyValue</span><span class="o">(</span><span class="s">"uid"</span> <span class="o">-></span> <span class="n">actorPath</span><span class="o">.</span><span class="n">uid</span><span class="o">)</span>
<span class="o">)</span>
<span class="o">}</span>
<span class="k">override</span> <span class="k">implicit</span> <span class="k">def</span> <span class="n">actorSystemToValue</span><span class="k">:</span> <span class="kt">ToValue</span><span class="o">[</span><span class="kt">akka.actor.ActorSystem</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span> <span class="n">actorSystem</span> <span class="k">=></span>
<span class="nc">ToObjectValue</span><span class="o">(</span>
<span class="n">keyValue</span><span class="o">(</span><span class="s">"system"</span> <span class="o">-></span> <span class="n">actorSystem</span><span class="o">.</span><span class="n">name</span><span class="o">),</span>
<span class="n">keyValue</span><span class="o">(</span><span class="s">"startTime"</span> <span class="o">-></span> <span class="n">actorSystem</span><span class="o">.</span><span class="n">startTime</span><span class="o">),</span>
<span class="o">)</span>
<span class="o">}</span>
<span class="c1">// ...
</span><span class="o">}</span>
</code></pre></div></div>
<p>So far, so good.</p>
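<p>A quick usage sketch (the <code class="language-plaintext highlighter-rouge">LoggerFactory</code> import path is assumed from the Scala API, and we're inside an actor so <code class="language-plaintext highlighter-rouge">self</code> is in scope):</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import akka.actor.Actor
import akka.echopraxia.actor.DefaultAkkaFieldBuilder
import com.tersesystems.echopraxia.plusscala.LoggerFactory

class Greeter extends Actor {
  // Akka types like ActorRef resolve through the implicit ToValue
  // instances above, rather than falling back to toString
  private val log = LoggerFactory.getLogger.withFieldBuilder(DefaultAkkaFieldBuilder)

  override def receive: Receive = {
    case msg => log.info("{} received message", fb => fb.keyValue("self" -> self))
  }
}
</code></pre></div></div>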
<h2 id="akka-actors">Akka Actors</h2>
<p>There are two Scala APIs for Akka actors, typed actors and "classic" untyped actors. The <a href="https://doc.akka.io/docs/akka/current/typed/logging.html">logging in Akka Typed</a>, and the <a href="https://doc.akka.io/docs/akka/current/logging.html">logging in Akka Classic</a> are a little different, but they both provide additional context in the form of MDC.</p>
<p>Next: create an echopraxia equivalent to <code class="language-plaintext highlighter-rouge">ActorLogging</code>. This is a bit complicated, because an echopraxia logger needs a field builder, and that means that an actor has to be able to provide it. That's okay – we can define the field builder requirement by adding <code class="language-plaintext highlighter-rouge">AkkaFieldBuilderProvider</code> and <code class="language-plaintext highlighter-rouge">DefaultAkkaFieldBuilderProvider</code>:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">trait</span> <span class="nc">AkkaFieldBuilderProvider</span> <span class="o">{</span>
<span class="k">type</span> <span class="kt">FieldBuilderType</span> <span class="k"><:</span> <span class="kt">AkkaFieldBuilder</span>
<span class="k">protected</span> <span class="k">def</span> <span class="n">fieldBuilder</span><span class="k">:</span> <span class="kt">FieldBuilderType</span>
<span class="o">}</span>
<span class="k">trait</span> <span class="nc">DefaultAkkaFieldBuilderProvider</span> <span class="k">extends</span> <span class="nc">AkkaFieldBuilderProvider</span> <span class="o">{</span>
<span class="k">override</span> <span class="k">type</span> <span class="kt">FieldBuilderType</span> <span class="o">=</span> <span class="nc">DefaultAkkaFieldBuilder</span><span class="o">.</span><span class="k">type</span>
<span class="kt">override</span> <span class="kt">protected</span> <span class="kt">def</span> <span class="kt">fieldBuilder:</span> <span class="kt">FieldBuilderType</span> <span class="o">=</span> <span class="nc">DefaultAkkaFieldBuilder</span>
<span class="o">}</span>
</code></pre></div></div>
<p>and then use a <a href="https://docs.scala-lang.org/tour/self-types.html">self type</a> to create a logger using the given field builder:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="nn">akka.echopraxia.actor</span>
<span class="k">trait</span> <span class="nc">ActorLogging</span> <span class="o">{</span>
<span class="k">this:</span> <span class="kt">Actor</span> <span class="kt">with</span> <span class="kt">AkkaFieldBuilderProvider</span> <span class="o">=></span>
<span class="k">protected</span> <span class="k">val</span> <span class="n">log</span><span class="k">:</span> <span class="kt">Logger</span><span class="o">[</span><span class="kt">FieldBuilderType</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span>
<span class="o">}</span>
</code></pre></div></div>
<p>And that opens the door to actors with an Echopraxia logger:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">trait</span> <span class="nc">LoggingActor</span> <span class="k">extends</span> <span class="nc">Actor</span> <span class="k">with</span> <span class="nc">ActorLogging</span> <span class="k">with</span> <span class="nc">DefaultAkkaFieldBuilderProvider</span>
<span class="k">class</span> <span class="nc">MyActor</span> <span class="k">extends</span> <span class="nc">LoggingActor</span> <span class="o">{</span>
<span class="k">override</span> <span class="k">def</span> <span class="n">preRestart</span><span class="o">(</span><span class="n">reason</span><span class="k">:</span> <span class="kt">Throwable</span><span class="o">,</span> <span class="n">message</span><span class="k">:</span> <span class="kt">Option</span><span class="o">[</span><span class="kt">Any</span><span class="o">])</span><span class="k">:</span> <span class="kt">Unit</span> <span class="o">=</span> <span class="o">{</span>
<span class="n">log</span><span class="o">.</span><span class="n">error</span><span class="o">(</span><span class="s">"Restarting due to [{}] when processing [{}]"</span><span class="o">,</span> <span class="n">fb</span> <span class="k">=></span> <span class="n">fb</span><span class="o">.</span><span class="n">list</span><span class="o">(</span>
<span class="n">fb</span><span class="o">.</span><span class="n">exception</span><span class="o">(</span><span class="n">reason</span><span class="o">),</span>
<span class="n">fb</span><span class="o">.</span><span class="n">string</span><span class="o">(</span><span class="s">"message"</span> <span class="o">-></span> <span class="n">message</span><span class="o">.</span><span class="n">toString</span><span class="o">),</span>
<span class="n">fb</span><span class="o">.</span><span class="n">keyValue</span><span class="o">(</span><span class="s">"self"</span> <span class="o">-></span> <span class="n">self</span><span class="o">.</span><span class="n">path</span><span class="o">)</span>
<span class="o">))</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>This is pretty straightforward and almost transparent.</p>
<p>Next, we need an implicit <code class="language-plaintext highlighter-rouge">EchopraxiaLoggingAdapter</code> for our version of <code class="language-plaintext highlighter-rouge">LoggingReceive</code>:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">trait</span> <span class="nc">EchopraxiaLoggingAdapter</span><span class="o">[</span><span class="kt">FB</span><span class="o">]</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">core</span><span class="k">:</span> <span class="kt">CoreLogger</span>
<span class="k">def</span> <span class="n">fieldBuilder</span><span class="k">:</span> <span class="kt">FB</span>
<span class="o">}</span>
<span class="k">object</span> <span class="nc">EchopraxiaLoggingAdapter</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">apply</span><span class="o">[</span><span class="kt">FB</span><span class="o">](</span><span class="n">logger</span><span class="k">:</span> <span class="kt">Logger</span><span class="o">[</span><span class="kt">FB</span><span class="o">])</span><span class="k">:</span> <span class="kt">EchopraxiaLoggingAdapter</span><span class="o">[</span><span class="kt">FB</span><span class="o">]</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">EchopraxiaLoggingAdapter</span><span class="o">[</span><span class="kt">FB</span><span class="o">]</span> <span class="o">{</span>
<span class="k">override</span> <span class="k">def</span> <span class="n">core</span><span class="k">:</span> <span class="kt">CoreLogger</span> <span class="o">=</span> <span class="n">logger</span><span class="o">.</span><span class="n">core</span>
<span class="k">override</span> <span class="k">def</span> <span class="n">fieldBuilder</span><span class="k">:</span> <span class="kt">FB</span> <span class="o">=</span> <span class="n">logger</span><span class="o">.</span><span class="n">fieldBuilder</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>And we're good to go. <code class="language-plaintext highlighter-rouge">LoggingReceive</code> will take an <code class="language-plaintext highlighter-rouge">Any</code> because it's untyped, so that gets rendered with a standard <code class="language-plaintext highlighter-rouge">toString</code>.</p>
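<p>Wiring that up looks something like this (a sketch: <code class="language-plaintext highlighter-rouge">LoggingReceive</code> here is the <code class="language-plaintext highlighter-rouge">akka.echopraxia.actor</code> variant, which I'm assuming mirrors the block syntax of Akka's own <code class="language-plaintext highlighter-rouge">LoggingReceive</code>):</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class MyActor extends LoggingActor {
  // the implicit adapter hands LoggingReceive the actor's core logger
  // and field builder
  private implicit val adapter: EchopraxiaLoggingAdapter[FieldBuilderType] =
    EchopraxiaLoggingAdapter(log)

  override def receive: Receive = LoggingReceive {
    case msg => sender() ! msg
  }
}
</code></pre></div></div>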
<h2 id="akka-typed">Akka Typed</h2>
<p>For Akka Typed, it's pretty much the same thing. We'll need an <code class="language-plaintext highlighter-rouge">AkkaTypedFieldBuilder</code> similar to the <code class="language-plaintext highlighter-rouge">AkkaFieldBuilder</code>, but working with typed <code class="language-plaintext highlighter-rouge">Behavior[T]</code> means that we can require the message passed through to have a <code class="language-plaintext highlighter-rouge">fieldBuilder.ToValue</code> defined on it, which is easiest to do on the logger itself through implicits rather than through <code class="language-plaintext highlighter-rouge">Behaviors.logMessages</code>.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="n">logger</span> <span class="k">=</span> <span class="nc">LoggerFactory</span><span class="o">.</span><span class="n">getLogger</span><span class="o">.</span><span class="n">withFieldBuilder</span><span class="o">(</span><span class="nc">MyFieldBuilder</span><span class="o">).</span><span class="n">withActorContext</span><span class="o">(</span><span class="n">context</span><span class="o">)</span>
<span class="c1">// Then log SayHello messages
</span><span class="n">logger</span><span class="o">.</span><span class="n">debugMessages</span><span class="o">[</span><span class="kt">SayHello</span><span class="o">]</span> <span class="o">{</span>
<span class="nc">Behaviors</span><span class="o">.</span><span class="n">receiveMessage</span> <span class="o">{</span> <span class="n">message</span> <span class="k">=></span>
<span class="k">val</span> <span class="n">replyTo</span> <span class="k">=</span> <span class="n">context</span><span class="o">.</span><span class="n">spawn</span><span class="o">(</span><span class="nc">GreeterBot</span><span class="o">(</span><span class="n">max</span> <span class="k">=</span> <span class="mi">3</span><span class="o">),</span> <span class="n">message</span><span class="o">.</span><span class="n">name</span><span class="o">)</span>
<span class="n">greeter</span> <span class="o">!</span> <span class="nc">Greeter</span><span class="o">.</span><span class="nc">Greet</span><span class="o">(</span><span class="n">message</span><span class="o">.</span><span class="n">name</span><span class="o">,</span> <span class="n">replyTo</span><span class="o">)</span>
<span class="nc">Behaviors</span><span class="o">.</span><span class="n">same</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>Okay, but how does that work?</p>
<p>Well, there's a <code class="language-plaintext highlighter-rouge">Behaviors.intercept</code> method that does what we want. This isn't in the documentation, but it is part of the <a href="https://github.com/akka/akka/blob/v2.6.20/akka-actor-typed/src/main/scala/akka/actor/typed/scaladsl/Behaviors.scala#L159">public API</a>, so we can use it to install a <code class="language-plaintext highlighter-rouge">LogMessagesInterceptor</code>:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">object</span> <span class="nc">Implicits</span> <span class="o">{</span>
<span class="k">implicit</span> <span class="k">class</span> <span class="nc">AkkaLoggerOps</span><span class="o">[</span><span class="kt">FB</span> <span class="k"><:</span> <span class="kt">AkkaTypedFieldBuilder</span><span class="o">](</span><span class="n">logger</span><span class="k">:</span> <span class="kt">Logger</span><span class="o">[</span><span class="kt">FB</span><span class="o">])</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">debugMessages</span><span class="o">[</span><span class="kt">T:</span> <span class="kt">ToValue</span> <span class="kt">:</span> <span class="kt">ClassTag</span><span class="o">](</span><span class="n">behavior</span><span class="k">:</span> <span class="kt">Behavior</span><span class="o">[</span><span class="kt">T</span><span class="o">])</span><span class="k">:</span> <span class="kt">Behavior</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="k">=</span>
<span class="nc">Behaviors</span><span class="o">.</span><span class="n">intercept</span><span class="o">(()</span> <span class="k">=></span> <span class="k">new</span> <span class="nc">LogMessagesInterceptor</span><span class="o">[</span><span class="kt">T</span><span class="o">](</span><span class="nc">Level</span><span class="o">.</span><span class="nc">DEBUG</span><span class="o">,</span> <span class="n">logger</span><span class="o">))(</span><span class="n">behavior</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="k">class</span> <span class="nc">LogMessagesInterceptor</span><span class="o">[</span><span class="kt">T:</span> <span class="kt">ToValue</span> <span class="kt">:</span> <span class="kt">ClassTag</span><span class="o">](</span><span class="k">val</span> <span class="n">level</span><span class="k">:</span> <span class="kt">Level</span><span class="o">,</span> <span class="n">logger</span><span class="k">:</span> <span class="kt">Logger</span><span class="o">[</span><span class="kt">FB</span><span class="o">])</span> <span class="k">extends</span> <span class="nc">BehaviorInterceptor</span><span class="o">[</span><span class="kt">T</span>, <span class="kt">T</span><span class="o">]</span> <span class="o">{</span>
<span class="k">import</span> <span class="nn">LogMessagesInterceptor._</span>
<span class="k">override</span> <span class="k">def</span> <span class="n">aroundReceive</span><span class="o">(</span><span class="n">ctx</span><span class="k">:</span> <span class="kt">TypedActorContext</span><span class="o">[</span><span class="kt">T</span><span class="o">],</span> <span class="n">msg</span><span class="k">:</span> <span class="kt">T</span><span class="o">,</span> <span class="n">target</span><span class="k">:</span> <span class="kt">ReceiveTarget</span><span class="o">[</span><span class="kt">T</span><span class="o">])</span><span class="k">:</span> <span class="kt">Behavior</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
<span class="n">log</span><span class="o">(</span><span class="nc">LogMessageTemplate</span><span class="o">,</span> <span class="n">fb</span> <span class="k">=></span> <span class="o">{</span>
<span class="k">import</span> <span class="nn">fb._</span>
<span class="n">list</span><span class="o">(</span>
<span class="n">value</span><span class="o">(</span><span class="s">"self"</span> <span class="o">-></span> <span class="n">ctx</span><span class="o">.</span><span class="n">asScala</span><span class="o">.</span><span class="n">self</span><span class="o">),</span>
<span class="n">value</span><span class="o">(</span><span class="s">"message"</span> <span class="o">-></span> <span class="n">msg</span><span class="o">)</span>
<span class="o">)</span>
<span class="o">})</span>
<span class="n">target</span><span class="o">(</span><span class="n">ctx</span><span class="o">,</span> <span class="n">msg</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>If it doesn't have a <code class="language-plaintext highlighter-rouge">fieldBuilder.ToValue</code>, it doesn't compile, and all output uses the given field builder instead of a global unstructured <code class="language-plaintext highlighter-rouge">toString</code>. Finally, at long last.</p>
<h2 id="akka-streams">Akka Streams</h2>
<p>Integrating Echopraxia with Akka Streams is a bit different, as it involves type enrichment on the <code class="language-plaintext highlighter-rouge">SourceOps</code> and <code class="language-plaintext highlighter-rouge">FlowOps</code> methods (and their <code class="language-plaintext highlighter-rouge">Context</code> equivalents). This follows from the <a href="https://doc.akka.io/docs/akka/current/stream/stream-customize.html#extending-flow-operators-with-custom-operators">custom operator</a> suggestions given in the docs:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">trait</span> <span class="nc">Implicits</span> <span class="o">{</span>
<span class="k">implicit</span> <span class="k">class</span> <span class="nc">SourceLogging</span><span class="o">[</span><span class="kt">Out</span>, <span class="kt">Mat</span><span class="o">](</span><span class="n">s</span><span class="k">:</span> <span class="kt">Source</span><span class="o">[</span><span class="kt">Out</span>, <span class="kt">Mat</span><span class="o">])</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">elog</span><span class="o">[</span><span class="kt">FB</span> <span class="k"><:</span> <span class="kt">AkkaStreamFieldBuilder</span><span class="o">](</span><span class="k">implicit</span> <span class="n">log</span><span class="k">:</span> <span class="kt">EchopraxiaLoggingAdapter</span><span class="o">[</span><span class="kt">FB</span><span class="o">])</span><span class="k">:</span> <span class="kt">SourceLoggingStage</span><span class="o">[</span><span class="kt">FB</span>, <span class="kt">Out</span>, <span class="kt">Mat</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">new</span> <span class="nc">SourceLoggingStage</span><span class="o">(</span><span class="n">s</span><span class="o">,</span> <span class="n">log</span><span class="o">.</span><span class="n">core</span><span class="o">,</span> <span class="n">log</span><span class="o">.</span><span class="n">fieldBuilder</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="k">implicit</span> <span class="k">class</span> <span class="nc">SourceWithContextLogging</span><span class="o">[</span><span class="kt">Out</span>, <span class="kt">Ctx</span>, <span class="kt">Mat</span><span class="o">](</span><span class="n">s</span><span class="k">:</span> <span class="kt">SourceWithContext</span><span class="o">[</span><span class="kt">Out</span>, <span class="kt">Ctx</span>, <span class="kt">Mat</span><span class="o">])</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">elog</span><span class="o">[</span><span class="kt">FB</span> <span class="k"><:</span> <span class="kt">AkkaStreamFieldBuilder</span><span class="o">](</span><span class="k">implicit</span> <span class="n">log</span><span class="k">:</span> <span class="kt">EchopraxiaLoggingAdapter</span><span class="o">[</span><span class="kt">FB</span><span class="o">])</span><span class="k">:</span> <span class="kt">SourceWithContextLoggingStage</span><span class="o">[</span><span class="kt">FB</span>, <span class="kt">Out</span>, <span class="kt">Ctx</span>, <span class="kt">Mat</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">new</span> <span class="nc">SourceWithContextLoggingStage</span><span class="o">(</span><span class="n">s</span><span class="o">,</span> <span class="n">log</span><span class="o">.</span><span class="n">core</span><span class="o">,</span> <span class="n">log</span><span class="o">.</span><span class="n">fieldBuilder</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="k">implicit</span> <span class="k">class</span> <span class="nc">FlowLogging</span><span class="o">[</span><span class="kt">In</span>, <span class="kt">Out</span>, <span class="kt">Mat</span><span class="o">](</span><span class="n">f</span><span class="k">:</span> <span class="kt">Flow</span><span class="o">[</span><span class="kt">In</span>, <span class="kt">Out</span>, <span class="kt">Mat</span><span class="o">])</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">elog</span><span class="o">[</span><span class="kt">FB</span> <span class="k"><:</span> <span class="kt">AkkaStreamFieldBuilder</span><span class="o">](</span><span class="k">implicit</span> <span class="n">log</span><span class="k">:</span> <span class="kt">EchopraxiaLoggingAdapter</span><span class="o">[</span><span class="kt">FB</span><span class="o">])</span><span class="k">:</span> <span class="kt">FlowLoggingStage</span><span class="o">[</span><span class="kt">FB</span>, <span class="kt">In</span>, <span class="kt">Out</span>, <span class="kt">Mat</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">new</span> <span class="nc">FlowLoggingStage</span><span class="o">(</span><span class="n">f</span><span class="o">,</span> <span class="n">log</span><span class="o">.</span><span class="n">core</span><span class="o">,</span> <span class="n">log</span><span class="o">.</span><span class="n">fieldBuilder</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="k">implicit</span> <span class="k">class</span> <span class="nc">FlowWithContextLogging</span><span class="o">[</span><span class="kt">In</span>, <span class="kt">Out</span>, <span class="kt">Ctx</span>, <span class="kt">Mat</span><span class="o">](</span><span class="n">flow</span><span class="k">:</span> <span class="kt">FlowWithContext</span><span class="o">[</span><span class="kt">In</span>, <span class="kt">Ctx</span>, <span class="kt">Out</span>, <span class="kt">Ctx</span>, <span class="kt">Mat</span><span class="o">])</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">elog</span><span class="o">[</span><span class="kt">FB</span> <span class="k"><:</span> <span class="kt">AkkaStreamFieldBuilder</span><span class="o">](</span><span class="k">implicit</span> <span class="n">log</span><span class="k">:</span> <span class="kt">EchopraxiaLoggingAdapter</span><span class="o">[</span><span class="kt">FB</span><span class="o">])</span><span class="k">:</span> <span class="kt">FlowWithContextLoggingStage</span><span class="o">[</span><span class="kt">FB</span>, <span class="kt">In</span>, <span class="kt">Out</span>, <span class="kt">Ctx</span>, <span class="kt">Mat</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">new</span> <span class="nc">FlowWithContextLoggingStage</span><span class="o">(</span><span class="n">flow</span><span class="o">,</span> <span class="n">log</span><span class="o">.</span><span class="n">core</span><span class="o">,</span> <span class="n">log</span><span class="o">.</span><span class="n">fieldBuilder</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>Because the <code class="language-plaintext highlighter-rouge">SourceLoggingStage</code> is not a <code class="language-plaintext highlighter-rouge">SourceOps</code> or <code class="language-plaintext highlighter-rouge">FlowOps</code> itself, it does require an <code class="language-plaintext highlighter-rouge">elog.info("name")</code> to close the loop. This takes out the implicit <code class="language-plaintext highlighter-rouge">LoggingOptions</code> (which I really don't like) and allows for <code class="language-plaintext highlighter-rouge">elog.withCondition</code> and <code class="language-plaintext highlighter-rouge">elog.withFields</code> similar to the <code class="language-plaintext highlighter-rouge">Logger</code> API.</p>
<p>Each logging stage class breaks down into a call to <code class="language-plaintext highlighter-rouge">EchopraxiaLog</code>, which has structured logging for the graph stage and exposes the graph stage operation as <code class="language-plaintext highlighter-rouge">operationKey</code>.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">final</span> <span class="k">case</span> <span class="k">class</span> <span class="nc">EchopraxiaLog</span><span class="o">[</span><span class="kt">FB</span> <span class="k"><:</span> <span class="kt">AkkaStreamFieldBuilder</span>, <span class="kt">T</span><span class="o">](</span><span class="n">extract</span><span class="k">:</span> <span class="o">(</span><span class="kt">FB</span><span class="o">,</span> <span class="kt">T</span><span class="o">)</span> <span class="k">=></span> <span class="nc">Field</span><span class="o">)</span>
<span class="k">extends</span> <span class="nc">SimpleLinearGraphStage</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="o">{</span>
<span class="k">override</span> <span class="k">def</span> <span class="n">createLogic</span><span class="o">(</span><span class="n">inheritedAttributes</span><span class="k">:</span> <span class="kt">Attributes</span><span class="o">)</span><span class="k">:</span> <span class="kt">GraphStageLogic</span> <span class="o">=</span>
<span class="k">new</span> <span class="nc">GraphStageLogic</span><span class="o">(</span><span class="n">shape</span><span class="o">)</span> <span class="k">with</span> <span class="nc">OutHandler</span> <span class="k">with</span> <span class="nc">InHandler</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">decider</span><span class="k">:</span> <span class="kt">Decider</span> <span class="o">=</span> <span class="n">inheritedAttributes</span><span class="o">.</span><span class="n">mandatoryAttribute</span><span class="o">[</span><span class="kt">SupervisionStrategy</span><span class="o">].</span><span class="n">decider</span>
<span class="k">override</span> <span class="k">def</span> <span class="n">onPush</span><span class="o">()</span><span class="k">:</span> <span class="kt">Unit</span> <span class="o">=</span> <span class="o">{</span>
<span class="k">try</span> <span class="o">{</span>
<span class="k">val</span> <span class="n">elem</span> <span class="k">=</span> <span class="n">grab</span><span class="o">(</span><span class="n">in</span><span class="o">)</span>
<span class="n">log</span><span class="o">.</span><span class="n">log</span><span class="o">(</span><span class="n">level</span><span class="o">,</span> <span class="s">"[{}]: {} {}"</span><span class="o">,</span> <span class="o">(</span><span class="n">fb</span><span class="k">:</span> <span class="kt">FB</span><span class="o">)</span> <span class="k">=></span> <span class="n">fb</span><span class="o">.</span><span class="n">list</span><span class="o">(</span>
<span class="n">fb</span><span class="o">.</span><span class="n">string</span><span class="o">(</span><span class="n">nameKey</span><span class="o">,</span> <span class="n">name</span><span class="o">),</span>
<span class="n">fb</span><span class="o">.</span><span class="n">string</span><span class="o">(</span><span class="n">operationKey</span><span class="o">,</span> <span class="s">"push"</span><span class="o">),</span>
<span class="n">extract</span><span class="o">(</span><span class="n">fb</span><span class="o">,</span> <span class="n">elem</span><span class="o">)</span>
<span class="o">),</span> <span class="n">fieldBuilder</span><span class="o">)</span>
<span class="n">push</span><span class="o">(</span><span class="n">out</span><span class="o">,</span> <span class="n">elem</span><span class="o">)</span>
<span class="o">}</span> <span class="k">catch</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">NonFatal</span><span class="o">(</span><span class="n">ex</span><span class="o">)</span> <span class="k">=></span>
<span class="n">decider</span><span class="o">(</span><span class="n">ex</span><span class="o">)</span> <span class="k">match</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">Supervision</span><span class="o">.</span><span class="nc">Stop</span> <span class="k">=></span> <span class="n">failStage</span><span class="o">(</span><span class="n">ex</span><span class="o">)</span>
<span class="k">case</span> <span class="k">_</span> <span class="k">=></span> <span class="n">pull</span><span class="o">(</span><span class="n">in</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="c1">// ...
</span> <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">extract</code> method means that you can decide how you want to render the field, using <code class="language-plaintext highlighter-rouge">fb.value</code> or <code class="language-plaintext highlighter-rouge">keyValue</code>:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">val</span> <span class="n">s</span> <span class="k">=</span> <span class="nc">Source</span><span class="o">(</span><span class="mi">1</span> <span class="n">to</span> <span class="mi">4</span><span class="o">)</span>
<span class="o">.</span><span class="n">elog</span>
<span class="o">.</span><span class="n">withCondition</span><span class="o">(</span><span class="n">condition</span><span class="o">)</span>
<span class="o">.</span><span class="n">withFields</span><span class="o">(</span><span class="n">fb</span> <span class="k">=></span> <span class="n">fb</span><span class="o">.</span><span class="n">keyValue</span><span class="o">(</span><span class="s">"foo"</span><span class="o">,</span> <span class="s">"bar"</span><span class="o">))</span>
<span class="o">.</span><span class="n">info</span><span class="o">(</span><span class="s">"before"</span><span class="o">,</span> <span class="o">(</span><span class="n">fb</span><span class="o">,</span> <span class="n">el</span><span class="o">)</span> <span class="k">=></span> <span class="n">fb</span><span class="o">.</span><span class="n">keyValue</span><span class="o">(</span><span class="s">"elem"</span><span class="o">,</span> <span class="n">el</span><span class="o">))</span>
</code></pre></div></div>
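<p>Since the stage logs through the fixed <code class="language-plaintext highlighter-rouge">"[{}]: {} {}"</code> template shown above, the choice shows up in the third placeholder. A minimal sketch, with the line-oriented output in comments as I'd expect it to render:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// with fb.keyValue("elem", el) a line renders as: [before]: push elem=1
// with fb.value("elem", el) it renders as:        [before]: push 1
// the JSON output contains "elem": 1 either way
val s = Source(1 to 4)
  .elog
  .info("before", (fb, el) => fb.value("elem", el))
</code></pre></div></div>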
<p>One of my hopes was that I could represent a <code class="language-plaintext highlighter-rouge">Flow</code> as a tree through structured logging. From what I can tell, this isn't possible: streams are "traversable," and a traversal builder doesn't have a complete picture of the flow before it's built – I can render the inlets and outlets, but that's not the same thing. It's the same problem with functions – referential transparency means you have to use something like <a href="https://typelevel.org/blog/2013/10/18/treelog.html">treelog</a> to describe a computation.</p>
<h2 id="unexpected-wins">Unexpected Wins</h2>
<p>It's really nice to see how the early bets have paid off. One unexpected win was being able to resolve <code class="language-plaintext highlighter-rouge">ByteString</code> as a structured <code class="language-plaintext highlighter-rouge">utf8</code> string instead of an array of bytes. The default implementation contains both the byte length and the UTF-8 string:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">override</span> <span class="k">implicit</span> <span class="k">val</span> <span class="n">byteStringToValue</span><span class="k">:</span> <span class="kt">ToValue</span><span class="o">[</span><span class="kt">ByteString</span><span class="o">]</span> <span class="k">=</span> <span class="n">bs</span> <span class="k">=></span> <span class="nc">ToObjectValue</span><span class="o">(</span>
<span class="n">keyValue</span><span class="o">(</span><span class="s">"length"</span> <span class="o">-></span> <span class="n">bs</span><span class="o">.</span><span class="n">length</span><span class="o">),</span>
<span class="n">keyValue</span><span class="o">(</span><span class="s">"utf8String"</span> <span class="o">-></span> <span class="n">bs</span><span class="o">.</span><span class="n">utf8String</span><span class="o">)</span>
<span class="o">)</span>
</code></pre></div></div>
<p>Being able to render <code class="language-plaintext highlighter-rouge">ByteString</code> as either binary-focused or text-focused output, depending on which field builder is passed in, is great.</p>
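<p>A binary-focused builder is just another implementation of the same implicit. Here's a minimal sketch – <code class="language-plaintext highlighter-rouge">HexAkkaFieldBuilder</code> is a hypothetical name, and the hex rendering is illustrative:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>trait HexAkkaFieldBuilder extends DefaultAkkaFieldBuilder {
  // render the raw bytes as hex pairs instead of decoding to UTF-8
  override implicit val byteStringToValue: ToValue[ByteString] = bs =>
    ToValue(bs.toArray.map(b => f"$b%02x").mkString(" "))
}

object HexAkkaFieldBuilder extends HexAkkaFieldBuilder
</code></pre></div></div>
<p>Swapping this builder in changes every <code class="language-plaintext highlighter-rouge">ByteString</code> field in the output without touching the logging statements themselves.</p>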
<p>Another win is the <a href="https://github.com/tersesystems/echopraxia-plusscala#automatic-type-class-derivation">automatic type class derivation</a> – since messages are typically case classes, it's trivial to add mappings in field builders for them.</p>
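<p>For example – a sketch of semi-automatic derivation based on the plusscala generic module, where <code class="language-plaintext highlighter-rouge">SayHello</code> and the builder names are illustrative:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import com.tersesystems.echopraxia.plusscala.generic._

final case class SayHello(name: String)

// gen derives the ToValue mapping from the case class fields
trait MyFieldBuilder extends DefaultAkkaFieldBuilder with SemiAutoDerivation {
  implicit lazy val sayHelloToValue: ToValue[SayHello] = gen[SayHello]
}

object MyFieldBuilder extends MyFieldBuilder
</code></pre></div></div>
<p>With that in scope, any logger requiring a <code class="language-plaintext highlighter-rouge">ToValue[SayHello]</code> – such as the typed <code class="language-plaintext highlighter-rouge">debugMessages</code> interceptor above – compiles without a hand-written mapping.</p>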