But The Data Will Survive • The Applied Go Weekly Newsletter 2024-12-22
Your weekly source of Go news, tips, and projects
But The Data Will Survive
Hi,
welcome back! This is the first holiday-season issue with reduced content. As announced in the previous issue, the newsletter is taking a sort of winter break: I am not stopping the issues, but I am sending them in a stripped-down form. From January 12th onwards, the newsletter will be back to full size and glory :-)
Things come to an end, things begin, and things go on
Let me start this issue by drawing a silly line from the year's end to something else that ends, too. The Go Time podcast has released its 340th and final episode on Changelog.com, due to Changelog's decision to sunset most of their podcasts (specifically, Go Time, JS Party, Ship It, and Practical AI), leaving only The Changelog, Founders Talk, and the Changelog Master Feed.
The good news: the Go Time team does not give up. Two of its ex-hosts, Kris Brandow and Ian Wester-Lopshire, are already hosting a new Go podcast together with Dylan Bourque and Matthew Sanabria. Their Fallthrough podcast started on December 16th. The first episode, "Falling Through: A New Perspective", introduces the podcast and lays out its future plans.
While we are at it, the Cup o' Go podcast released a new episode, in which Jonathan Hall and Josh Bleecher Snyder inspect the Go 1.24 changes and discuss news around golang.org/x/net, sync.Map, testing, x/exp/xiter, and more.
On clear objectives and closed interfaces
Chris Siebenstein sees a lack of clarity in union type proposals. He asks that Go union type proposals start with their objectives: too often, such proposals dive right into syntax or semantics without discussing the pain points they are meant to solve. This matters all the more because closed interfaces already come quite close to union types.
Spotlight: File formats for Go
Data shall survive the app(s) that created it. What sounds like a no-brainer is often neglected. Truckloads of proprietary, patented, undocumented, or no longer maintained file formats have accumulated since the first computer wrote data to a magnetic tape (or maybe it was even just a punched card). I am sure many people still have some old Word documents around, from Word versions that are long gone. When I switched from Windows to Mac in 2008, I was concerned about having Windows-only files around that I would not be able to open on the Mac. (I can't remember how I solved that problem, but I definitely refused to have Windows on dual-boot.)
When designing a Go app that writes long-lived data to a storage medium, gophers should think carefully about the file format to use.
File formats made "for eternity" should have a few fundamental properties:
- A "text" file format that is human-readable in the simplest of all text file viewers
- A structure that is simple enough to infer just by inspecting the file
- Strict Unicode encoding (UTF-8, UTF-16, ...) or, for data durability extremists (and English content), pure ASCII encoding
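The encoding property is easy to verify from Go. Here is a minimal sketch using the standard library's `unicode/utf8` package; `checkEncoding` is a hypothetical helper name, and it covers only the UTF-8 and ASCII cases (UTF-16 would need a different check).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// checkEncoding reports whether data is valid UTF-8 and, in addition,
// whether it consists of 7-bit ASCII bytes only.
func checkEncoding(data []byte) (validUTF8, pureASCII bool) {
	validUTF8 = utf8.Valid(data)
	pureASCII = true
	for _, b := range data {
		if b >= utf8.RuneSelf { // RuneSelf (0x80) is the first non-ASCII byte value
			pureASCII = false
			break
		}
	}
	return validUTF8, pureASCII
}

func main() {
	for _, s := range [][]byte{
		[]byte("plain ASCII"),
		[]byte("Dachauer Straße"),
		{0xff, 0xfe, 'h', 'i'}, // not valid UTF-8
	} {
		u, a := checkEncoding(s)
		fmt.Printf("%q: UTF-8=%v ASCII=%v\n", s, u, a)
	}
}
```

Such a check could run in a test or right before an app writes its data file, so that invalid bytes never reach long-lived storage.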
Apps, on the other hand, often handle data that isn't designed to be readable when stored: N-dimensional slices, slices of structs, trees, trees of structs, trees of slices of structs with slice fields, directed acyclic graphs, cyclic graphs... the list goes on and on.
The easiest option for storing such data is to write it to a binary (serialization) format like gob, Parquet, or a SQLite file. Although these formats are usually well documented, they are hard to work with using standard tools that (specifically on Unix) are designed for text files.
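To see the difference, this small sketch serializes the same made-up `Entry` struct once with `encoding/gob` and once with `encoding/json`. The gob bytes hold type information and values in a binary layout that `cat` or `grep` cannot meaningfully display, while the JSON output is plain text.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
)

// Entry is a hypothetical record type used for illustration.
type Entry struct {
	Title string
	Year  int
}

// encodeBoth serializes e as gob (binary) and as JSON (text).
func encodeBoth(e Entry) (gobData, jsonData []byte, err error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(e); err != nil {
		return nil, nil, err
	}
	jsonData, err = json.Marshal(e)
	if err != nil {
		return nil, nil, err
	}
	return buf.Bytes(), jsonData, nil
}

func main() {
	g, j, err := encodeBoth(Entry{Title: "Go Time", Year: 2024})
	if err != nil {
		panic(err)
	}
	fmt.Printf("gob:  %q\n", g) // binary bytes, shown with Go escapes
	fmt.Printf("json: %s\n", j) // {"Title":"Go Time","Year":2024}
}
```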
So if you write an app that produces long-lived data, don't go right away for the fanciest data format that happens to be en vogue. Strive for simplicity and standards.
Here are a few basic file formats to start with.
Plain text for minimal structure
Using plain, "unstructured" text may not sound like a good idea. Yet plain-text files do have a basic structure: the line. If your app's data can be separated into (short enough) chunks that have a simple internal structure (or none at all), write those chunks out line by line. The result is human-readable, and most Unix tools (such as grep or awk) can readily search through or otherwise process that file.
Example: The SubRip file format stores closed captions for videos in a simple, line-oriented format, where each data chunk consists of a numeric counter, a time range, the subtitle text itself, and a blank line, like so:
```
1
00:02:16,612 --> 00:02:19,376
Senator, we're making
our final approach into Coruscant.

2
00:02:19,482 --> 00:02:21,609
Very good, Lieutenant.
```
Simple markup
If you need a little more structure, consider adding simple markup to your text file. Think "Markdown": a special character, or a special sequence of characters, determines the start or the end of a block of data, and the type of data the block represents.
Example: The format of good old Unix man pages.
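Man pages are written in the roff typesetting language, where a line starting with a dot holds a macro, such as `.TH` (title) or `.SH` (section heading). The following deliberately simplified sketch only splits such a source into sections at `.SH` lines and keeps the arguments of other macros as text; it does not handle roff comments, escapes, or font changes, so it is not a roff parser. The `sections` helper is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// sections splits roff man page source into sections at ".SH" lines.
// It returns the headings in order and the text lines for each heading.
func sections(src string) (order []string, body map[string][]string) {
	body = make(map[string][]string)
	current := ""
	for _, line := range strings.Split(src, "\n") {
		switch {
		case line == "":
			// Skip empty lines.
		case strings.HasPrefix(line, ".SH "):
			current = strings.Trim(strings.TrimPrefix(line, ".SH "), `"`)
			order = append(order, current)
		case strings.HasPrefix(line, "."):
			// Other macros (.TH, .B, .PP, ...): keep only their arguments, if any.
			if _, args, ok := strings.Cut(line, " "); ok && current != "" {
				body[current] = append(body[current], args)
			}
		case current != "":
			body[current] = append(body[current], line)
		}
	}
	return order, body
}

func main() {
	src := `.TH HELLO 1
.SH NAME
hello \- print a greeting
.SH SYNOPSIS
.B hello
`
	order, body := sections(src)
	for _, h := range order {
		fmt.Printf("%s: %q\n", h, body[h])
	}
}
```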
CSV for tabular data
Tabular data & CSV = a match made in heaven?
While CSV is a popular format for tabular data, it has severe drawbacks. First, the field separator (typically a comma, a semicolon, or a tab character) can occur in field data, too. To avoid ambiguity, such fields get surrounded by quotes, but then quotes inside field data must be escaped. Second, CSV has no data types; everything is a string. There aren't even agreed-upon serialization formats for values like dates and times or floating-point numbers.
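Go's `encoding/csv` package applies the quoting rules for you, as this small sketch shows: a field that contains the separator or a quote gets wrapped in quotes, and inner quotes are doubled. Reading the line back restores the original strings, but only strings; the type information is gone. The helper names and the sample record are made up.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strings"
)

// encodeRecord writes a single CSV record, letting encoding/csv apply
// the quoting rules (quotes around the field, doubled inner quotes).
func encodeRecord(record []string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf) // w.Comma could be set to ';' or '\t'
	if err := w.Write(record); err != nil {
		return "", err
	}
	w.Flush()
	return buf.String(), w.Error()
}

// decodeRecord parses one CSV line back into its fields.
// Every field comes back as a string: CSV carries no type information.
func decodeRecord(line string) ([]string, error) {
	return csv.NewReader(strings.NewReader(line)).Read()
}

func main() {
	line, err := encodeRecord([]string{"42", `Say "hi", then leave`, "3.14"})
	if err != nil {
		panic(err)
	}
	fmt.Print(line) // 42,"Say ""hi"", then leave",3.14

	fields, err := decodeRecord(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", fields) // ["42" "Say \"hi\", then leave" "3.14"]
}
```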
If you want to save tabular data, consider using JSON Lines instead.
JSON Lines for better tabular data
JSON Lines is "one JSON document per line". In other words, each line of a JSON Lines document is a valid JSON value. Typically, you would use the same kind of object or array in each line, but JSON Lines permits any JSON value. To read or write JSON Lines, you can use any standard JSON library, such as encoding/json.
Example: log files
JSON and YAML for hierarchical data
If the data doesn't fit into lines or tables, you probably have hierarchical data, a.k.a. trees. JSON is a good choice, as it is quite popular; so is YAML, which is a superset of JSON.
Examples:
- Configuration files
- More configuration files
- Did I mention configuration files?
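Nested Go structs map naturally onto JSON's tree structure. Here is a minimal sketch of a configuration file handled with `encoding/json`; the `Config` fields are hypothetical. (YAML would require a third-party package, as Go's standard library has no YAML support.)

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Config is a hypothetical, nested configuration structure.
type Config struct {
	Name   string `json:"name"`
	Backup struct {
		Target   string   `json:"target"`
		Schedule string   `json:"schedule"`
		Paths    []string `json:"paths"`
	} `json:"backup"`
}

// encodeConfig renders the config as an indented, human-readable JSON tree.
func encodeConfig(cfg Config) ([]byte, error) {
	return json.MarshalIndent(cfg, "", "  ")
}

// decodeConfig parses the JSON tree back into the nested structs.
// Unknown fields are rejected to catch typos in hand-edited files.
func decodeConfig(data []byte) (Config, error) {
	var cfg Config
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&cfg)
	return cfg, err
}

func main() {
	var cfg Config
	cfg.Name = "nas"
	cfg.Backup.Target = "s3"
	cfg.Backup.Schedule = "daily"
	cfg.Backup.Paths = []string{"/volume1/photos", "/volume1/docs"}

	out, err := encodeConfig(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))

	back, err := decodeConfig(out)
	if err != nil {
		panic(err)
	}
	fmt.Println(back.Backup.Paths[1])
}
```

Rejecting unknown fields is a design choice: it makes hand edits safer, at the cost of breaking older app versions when newer ones add fields.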
Directed graphs?
I'll conclude this Spotlight with the next level of "data format escalation": non-hierarchical structures, specifically, data structures that can be represented as nodes with one-way connections between them and no cycles: a directed acyclic graph, or DAG.
Finding human-readable file formats for DAGs turns out to be difficult. For most use cases, you'd probably want to reach for a graph database right away.
Perplexity suggested using Graphviz's Dot file format. Although the Dot format was designed for the visual representation of graphs, it's perfectly feasible to ignore the parts of the language that deal with visualization. Go's pprof tool can write profile graphs to a Dot file using the -dot option, albeit for visualization rather than storage.
Two other formats that come to mind are YAML and CUE. Both languages focus on tree-like data structures but allow data to cross-reference data elsewhere in the tree structure.
However, to be honest, nothing prevents you from inventing your own DAG language, using JSON, YAML, or CUE as the basis:
```cue
graph: {
  nodes: ["A", "B", "C", "D"]
  edges: [
    {from: "A", to: "B"},
    {from: "A", to: "C"},
    {from: "B", to: "D"},
    {from: "C", to: "D"}
  ]
}
```
Bottom line: keep things simple
The gist of this whole exercise in clear-text data formats: if future users of your data, be it in five or fifty years, should be able to read the data your apps write today, make the format (a) human-readable and (b) easy to reverse-engineer, in case the related tools are no longer available.
Completely unrelated to Go
Blub studies
Hillel Wayne, advocate of formal methods for software development (but that's only an aside that has nothing to do with the following), drew my attention to an essay about "blub studies", a type of learning that goes deep into the details of your everyday tooling, like your programming language, databases, CLI tool flags, etc. This seems to go against the usual canon of "learn the fundamentals!", but it surely deserves space in every developer's learning budget. I can definitely relate to this POV, as I am currently struggling with Taskfile, Restic, ssh, cron jobs, and Synology's particular flavor of Linux to set up a regular backup from my NAS to Hetzner S3. Wish me luck and new learnings.
A terminal app (no pun intended)
On a more personal note, I am currently testing Waveterm, an Electron-based terminal app that includes a surprising amount of Go (surprising for an Electron app, that is). For a couple of days now, I have enjoyed stable and smooth terminal sessions, and I am playing around with the widget feature and the AI integration. Ironically, I discovered Waveterm while waiting for Mitchell Hashimoto's ghostty to be released. Despite having founded and run a Go-based company for years, Mitchell opted for Zig to implement ghostty. As he explained in a Changelog podcast episode, he was looking for a GC-less language, but Rust didn't quite bring the fun to development.
Happy coding! ʕ◔ϖ◔ʔ
Questions or feedback? Drop me a line. I'd love to hear from you.
Best from Munich, Christoph
Not a subscriber yet?
If you read this newsletter issue online, or if someone forwarded the newsletter to you, subscribe for regular updates to get every new issue earlier than the online version, and more reliably than through occasional forwarding.
Find the subscription form at the end of this page.
How I can help
If you're looking for more useful content around Go, here are some ways I can help you become a better Gopher (or a Gopher at all):
On AppliedGo.net, I blog about Go projects, algorithms and data structures in Go, and other fun stuff.
Or visit the AppliedGo.com blog and learn about language specifics, Go updates, and programming-related stuff.
My AppliedGo YouTube channel hosts quick tip and crash course videos that help you get more productive and creative with Go.
Enroll in my Go course for developers that stands out for its intense use of animated graphics for explaining abstract concepts in an intuitive way. Numerous short and concise lectures allow you to schedule your learning flow as you like.
Christoph Berger IT Products and Services
Dachauer Straße 29
Bergkirchen
Germany