TLDR; The author argues that free-form logging is quite useless/expensive to use. They also argue that structured logging is less effective than tracing b/c of mainly the difficulty of inferring timelines and causality.


I find the arguments very plausible.

In fact I very rarely use logs produced by several services b/c most of the times they just confuse me. The only time that I heavily use logs is troubleshooting a single service and looking at its stdout (or kubectl log.)

However I have very little experience w/ tracing (I’ve used it in my hobby projects but, obviously, they never represent the reality of complex distributed systems.)

Have you got real world experience w/ tracing in larger systems? Care to share your take on the topic?

  • bahmanm@lemmy.mlOP
    link
    fedilink
    English
    arrow-up
    2
    ·
    2 年前

    Thanks for sharing your insights.


    Thinking out loud here…

    In my experience with traditional logging and distributed systems, timestamps and request IDs do store the information required to partially reconstruct a timeline:

    • In the case of a linear (single branch) timeline you can always “query” by a request ID and order by the timestamps and that’s pretty much what tracing will do too.
    • Things, however, get complicated when you’ve a timeline w/ multiple branches.
      For example, consider the following relatively simple diagram.
      Reconstructing the causality and join/fork relations between the executions nodes is almost impossible using traditional logs whereas a tracing solution will turn this into a nice visual w/ all the spans and sub-spans.

    That said, logs do shine when things go wrong; when you start your investigation by using a stacktrace in the logs as a clue. That (stacktrace) is something that I’m not sure a tracing solution will be able to provide.


    they should complement each other

    Yes! You nailed it 💯

    Logs are indispensable for troubleshooting (and potentially nothing else) while tracers are great for, well, tracing the data/request throughout the system and analyse the mutations.