
The Applied Go Weekly Newsletter

Mechanical Sympathy • The Applied Go Weekly Newsletter 2025-11-30

Your weekly source of Go news, tips, and projects

Mechanical Sympathy

Hi,

There are two things in Go that probably don't grab a newcomer's immediate attention: the mechanics of heap allocation, and data races. The first is the garbage collector's domain until performance tuning becomes important. The second may go unnoticed until the first few attempts at writing concurrent code start producing strange results.

In either case, it's good to have an idea of what's going on behind the scenes. There is a term for this: mechanical sympathy. And that's what the two featured articles help you acquire.

–Christoph

Featured articles

Why Your Go Code Is Slower Than It Should Be: A Deep Dive Into Heap Allocations

Go's garbage collection is a huge win for development speed while still producing fairly fast apps. (I hear you shouting, Rustaceans: "but not blazingly fast!" Yes, that domain is left to languages with manual, or, in Rust's case, smartly compiler-aided, memory management.) But sometimes, performance can be improved with small changes. Cristian Curteanu collected some typical patterns.

A million ways to die from a data race in Go

Well, not quite a million, but five are surely reason enough to hone your data race avoidance skills.

Spotlight: Show some sympathy (for your hardware)

(Yes, I finally found some time to write one!)

Many programming languages try to be as "high-level" as possible by shielding the developer from the details of the hardware and the inner workings of compiler and runtime. What a shame! Users of these languages gain some ease of coding at the price of needlessly slow apps.

What if a language laid only a thin abstraction over the hardware it is supposed to run on? And what if developers knew enough about the hardware they write code for?

Why you should have (or develop) mechanical sympathy

In an ideal world, algorithms would always execute without any friction: no CPU or RAM limits, no I/O delays. Reality, however, confronts the developer with hard and soft limits as well as more or less subtle hardware peculiarities. Mechanical sympathy means knowing about these factors and using them to your code's advantage.

The term "mechanical sympathy" originates from Jackie Stewart, a racing driver, who said:

You don’t have to be an engineer to be a racing driver, but you do have to have mechanical sympathy.

– Jackie Stewart

What he meant is that if you know your car, you become a better driver. A better driver can drive their car faster, which, after all, is the point of car racing, right? The same applies to software development: know your hardware architecture (the CPU, RAM, I/O, and so forth) and you can become a better developer. A better developer writes software that's more efficient, in terms of both speed and memory consumption. Did you notice that hardware has become tremendously faster over the last decades while software feels as sluggish as ever? Performance optimization has been neglected for too long.

A common meme in our industry is “avoid pre-optimization.” I used to preach this myself, and now I’m very sorry that I did.

– Dave Farley

I want to extend the notion of mechanical sympathy from hardware to software mechanics. Compilers and runtimes can contain complex algorithms that are worth developing some mechanical sympathy for. Think of:

  • memory allocation and garbage collection
  • concurrency as a "mechanical" system

to name only two. My point is: Certain kinds of software abstraction layers can expose hardware-like behavior on their own.

Mechanical sympathy for hardware

Now let's look at some features of modern CPUs that can affect your code in unforeseen ways but that you can also leverage to make your code perform better.

Local caches

From a software point of view, RAM is linear: available memory is a long row of memory cells numbered from 0 to 2^x − 1 (where x is the width of the address bus in bits).

In reality, RAM access is hierarchical: a series of caches sits between RAM and a CPU core, each level smaller but faster than the one before it. (The levels are called "L1", "L2", and "L3"; L1 and L2 are typically local to a CPU core, while L3 is often shared among cores.) When the core fetches data from RAM, the data remains cached as long as available cache space permits, and future requests for the same data become much faster.

Algorithms can be designed to take advantage of the caches by operating on amounts of data that fit into a local CPU cache—ideally into the L1 cache that sits closest to the CPU and is the fastest but also the smallest one in the cache hierarchy.
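
A classic way to see this is traversal order. Here's a quick, illustrative sketch of mine (timings depend on your machine): it sums a matrix twice, once row-wise, which walks memory sequentially and uses every fetched cache line fully, and once column-wise, which jumps between rows and wastes most of each cache line:

package main

import (
  "fmt"
  "time"
)

const n = 2048

func main() {
  // A 2048x2048 int64 matrix: 32 MB, far larger than any local cache.
  m := make([][]int64, n)
  for i := range m {
    m[i] = make([]int64, n)
  }

  var sum int64

  // Row-wise: consecutive accesses hit the same cache line.
  start := time.Now()
  for i := 0; i < n; i++ {
    for j := 0; j < n; j++ {
      sum += m[i][j]
    }
  }
  fmt.Println("row-wise:   ", time.Since(start))

  // Column-wise: each access lands in a different row, far away
  // in memory, so almost every read misses the cache.
  start = time.Now()
  for j := 0; j < n; j++ {
    for i := 0; i < n; i++ {
      sum += m[i][j]
    }
  }
  fmt.Println("column-wise:", time.Since(start), sum)
}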

On the flip side, algorithms that are unaware of the cache architecture can become slower than necessary, even to an extent that cancels out the performance gains from concurrent work.

How so? Imagine a concurrent app with goroutines running on multiple cores at once. Goroutine a on core A reads a variable and updates it. Now the variable's current value sits in core A's L1 cache. Then, goroutine b on core B wants to read and update the same variable. To make this possible, the processor must first invalidate the value in A's L1 cache, move the new value back to main RAM, then let B fetch and process the value and store it in B's L1 cache. Now imagine if goroutine a wants to update the value again; the slow cache invalidation process repeats.
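
This effect is easy to provoke. In the following sketch (illustrative only; the 64-byte cache-line size is an assumption about the target CPU, and timings vary by machine), two goroutines increment two independent counters. When the counters share a cache line, the cores invalidate each other's caches on every update; padding the counters onto separate lines removes the contention:

package main

import (
  "fmt"
  "sync"
  "sync/atomic"
  "time"
)

const iterations = 10_000_000

// sameLine: both counters likely share one 64-byte cache line.
type sameLine struct {
  a, b atomic.Int64
}

// padded: the padding pushes b onto a different cache line.
type padded struct {
  a atomic.Int64
  _ [56]byte // assumes 64-byte cache lines
  b atomic.Int64
}

// run increments two counters from two goroutines and times it.
func run(incA, incB func()) time.Duration {
  start := time.Now()
  var wg sync.WaitGroup
  wg.Add(2)
  go func() {
    defer wg.Done()
    for i := 0; i < iterations; i++ {
      incA()
    }
  }()
  go func() {
    defer wg.Done()
    for i := 0; i < iterations; i++ {
      incB()
    }
  }()
  wg.Wait()
  return time.Since(start)
}

func main() {
  var s sameLine
  var p padded
  fmt.Println("same cache line:", run(func() { s.a.Add(1) }, func() { s.b.Add(1) }))
  fmt.Println("separate lines: ", run(func() { p.a.Add(1) }, func() { p.b.Add(1) }))
}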

Structure alignment

Terms like "32-bit processor" or "64-bit architecture" refer to the processor's word size: the unit of data it reads from RAM in a single access. A 64-bit processor reads data in chunks of 64 bits, or eight bytes, at once.

But what about smaller data, such as an int16 or a boolean? The CPU still must fetch a full 64-bit chunk of data to read a 16-bit value from RAM.

For this reason, data must be aligned to 64-bit boundaries.

Consider this struct:

var s struct {
  a int8
  b int64
  c bool
}

If it were stored in memory as one contiguous data block, field b would spread across two 64-bit memory cells:

0 8             64
+-+-------------+
|a|b            |
+-+-------------+
|b|c|###########|
+-+-------------+

(The hash signs ### denote unused space within a memory cell.)

To fetch field b, the CPU would have to read both memory cells, extract the two fragments of b, and combine them into one.

As this would be highly inefficient, the compiler maps the struct to memory so that no value spans two memory cells:

0 8             64
+-+-------------+
|a|#############|
+-+-------------+
|b              |
+-+-------------+
|c|#############|
+-+-------------+

Evidently, our particular struct wastes quite some space by storing two 8-bit values in two 64-bit memory cells.

Luckily, as Gophers, we are empowered to change this. The compiler maps struct fields in the sequence given in the source code, so by rearranging the struct a bit, we can have the compiler map the two short fields, a and c, to the same memory cell:

var s struct {
  b int64
  a int8
  c bool
}

The compiler happily arranges a single cell for a and c, thus reducing the required memory by one third!

0 8             64
+-+-------------+
|b              |
+-+-------------+
|a|c|###########|
+-+-------------+

This might not sound like much, but for large slices of structs, it can make a difference.
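
You can verify the layouts yourself with unsafe.Sizeof, which reports a type's size including padding. A quick sketch (the sizes assume a 64-bit platform):

package main

import (
  "fmt"
  "unsafe"
)

type unoptimized struct {
  a int8
  b int64
  c bool
}

type optimized struct {
  b int64
  a int8
  c bool
}

func main() {
  // The compiler pads a out to a full 8-byte word so that b
  // starts on an aligned boundary, and pads c as well:
  fmt.Println(unsafe.Sizeof(unoptimized{})) // 24
  // With the wide field first, a and c share one word:
  fmt.Println(unsafe.Sizeof(optimized{})) // 16
}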

These were just two examples of using knowledge about hardware behavior to improve efficiency. Next, I'll transpose this principle to software mechanics.

Mechanical sympathy for Go

Programming languages, as simple as some of them may seem, are in fact quite complex systems, composed of a compiler and runtime libraries that directly influence how code performs. Languages often build their data types around specific paradigms.

Two examples of this in Go are slices and goroutine scheduling.

Slices

Every Go newcomer quickly learns that slices are quite special: while other languages treat dynamic arrays as an opaque concept that "just works" (somehow), Go makes the internal structure explicit. Slices are views into underlying fixed-size arrays, and once a Go developer learns how the mechanics of slicing, copying, and appending work, they can design algorithms that perform better than those that have to rely on an impenetrable dynamic-array abstraction.

   Slice header
+-----------------+
| len | cap | ptr |
+--------------|--+
  +------------+ 
  |               Array
+-V---------------------------------+
|   |   |   |   |   |   |   |   |   |
+-----------------------------------+
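
Here's a quick illustration of these header fields in action: reslicing creates a new header over the same array, and append reuses that array as long as the capacity suffices:

package main

import "fmt"

func main() {
  arr := [8]int{1, 2, 3, 4, 5, 6, 7, 8}

  s := arr[2:5] // a view into arr: len 3, cap 6, no copying
  fmt.Println(len(s), cap(s)) // 3 6

  s[0] = 99 // writes through to the underlying array
  fmt.Println(arr[2]) // 99

  t := append(s, 42) // len < cap: append reuses the same array
  fmt.Println(arr[5], t[3]) // 42 42
}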

Algorithms that operate on a known (maximum) amount of data can pre-allocate the underlying array, either by explicitly allocating an array and slicing it or by calling make with a capacity argument: s := make([]int, 100, 10000). Without the need to resize the underlying array, allocations are reduced to a minimum. Memory-intensive operations become faster, and the garbage collector has much less to do.
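
A minimal sketch of the pattern (collect and its parameters are made up for this example):

package main

import "fmt"

// collect appends n values to a slice. With preallocation, append
// always finds sufficient capacity and never reallocates the
// backing array.
func collect(n int, preallocate bool) []int {
  var s []int
  if preallocate {
    s = make([]int, 0, n) // one allocation up front
  }
  for i := 0; i < n; i++ {
    s = append(s, i)
  }
  return s
}

func main() {
  s := collect(10_000, true)
  fmt.Println(len(s), cap(s)) // 10000 10000: the array never moved
}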

Another aspect of slices worth mentioning: as contiguous blocks of data, they are much more cache-friendly than pointer-based structures like linked lists, whose elements are allocated individually on the heap and can therefore end up anywhere in RAM. So if you're about to implement an algorithm that seems to require connecting individual data elements through pointers, see if you can find a slice-based implementation instead that takes advantage of cache locality.

The Go scheduler

You probably have come across demo apps that spawn bazillions of goroutines to demonstrate the amazingly low overhead per goroutine. Yet, knowing and adapting to the way goroutines are scheduled onto system threads can pay off.

Only one system thread can run on a CPU core at any given time, and swapping out a thread to run another one is rather expensive. That's why the Go inventors designed goroutines to be multiplexed onto a small pool of threads, making scheduling them dirt cheap.

The scheduler isn't exactly a simple algorithm, so I attempted to make the logic more accessible by comparing the scheduler to a restaurant.

The most important part of designing concurrent work is to know whether the goroutines you plan to spawn are I/O-bound or CPU-bound.

  • I/O-bound goroutines are frequently blocked by slow data exchanges with storage or network connections. You can spawn a lot more of them than the number of cores in your target system, as they will mostly be in a blocked state, leaving room for OS activity or other goroutines to run.
  • CPU-bound goroutines max out the CPU by doing calculations on data that's readily available in RAM. Here, you'd want to schedule no more goroutines to run simultaneously than there are CPU cores available; any goroutine above that number only generates additional scheduling overhead without any further speed-up (see the sketch after this list).
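
A common way to cap CPU-bound concurrency is a worker pool sized to the core count. A minimal sketch (the squaring stands in for any CPU-heavy computation):

package main

import (
  "fmt"
  "runtime"
  "sync"
)

func main() {
  jobs := make(chan int)
  results := make(chan int)

  // One worker per core: more would only add scheduling overhead
  // for purely CPU-bound work.
  workers := runtime.NumCPU()
  var wg sync.WaitGroup
  for w := 0; w < workers; w++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      for n := range jobs {
        results <- n * n // stand-in for a CPU-heavy computation
      }
    }()
  }

  // Feed the jobs, then close the channels in order.
  go func() {
    for i := 0; i < 100; i++ {
      jobs <- i
    }
    close(jobs)
    wg.Wait()
    close(results)
  }()

  sum := 0
  for r := range results {
    sum += r
  }
  fmt.Println(sum) // 328350
}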

As concurrency is inherently difficult (despite Go making it more accessible through concurrency primitives that are easy to reason about, namely goroutines and channels), always keep an eye on how concurrent flows unfold in your app: monitor CPU and scheduling profiles to spot contention, memory use, and goroutine leaks, and adjust your concurrent flows accordingly.

Conclusion: To write good software, learn hardware

As a universal rule, if you want to get better in a particular domain of life, lean out of the window, peek over the fence, and explore the domain adjacent to yours. It is always interesting and sometimes revealing when you take on a new perspective.

Exploring modern hardware is fascinating, and if modern CPU architecture feels overwhelming, start with programming a microcontroller.

Whatever you decide to explore outside the domain you want to get better in, it'll pay off in many ways.

Further reading

Within the boundaries of a Spotlight, I can barely scratch the surface of this topic. No worries! There's a lot of great stuff about mechanical sympathy just one click away:

About cache misses and concurrent access to data

Slow down your code with goroutines

The Go scheduler explained with chefs, pans, and stoves

Are goroutines threads?

About mechanical sympathy

Mechanical Sympathy: Understanding the Hardware Makes You a Better Developer

Hardware-Aware Coding: CPU Architecture Concepts Every Developer Should Know

Computers can be understood

CPU Cache (Wikipedia)

Mechanical sympathy and Go

Mechanical Sympathy in Go

Writing Go Code with Mechanical Sympathy: Optimizing for Hardware Efficiency

Quote of the Week: On implementing ideas

I'd rather build and fail than spend years debating in GitHub comments whether it's a good idea.

–

More articles, videos, talks

When Should a Spring Boot Dev Actually Switch?

What could make a Java Spring Boot developer want to switch to Go or Rust? Mohammedouchkhi asked /r/golang and distilled the answers into this article.

Go proposal: Goroutine metrics

Another accepted proposal, explained by Anton Zhiyanov: runtime/metrics will report goroutine states, and goroutine and thread counts.

Gist of Go: Concurrency testing

Kernighan's Law says that debugging is twice as hard as coding. And since concurrent programming is already hard, what about debugging concurrent code? Well, better arm your code with good tests to avoid debugging as much as possible.

Projects

Libraries

GitHub - KromDaniel/regengo: a compile-time finite state machine generator for regular expressions

While typical regular expression libraries do most of the work at runtime, regengo moves the bulk of the work to compile time by generating a state machine from the regexp. (Compare to modernc.org/rec.)

GitHub - sandrolain/httpcache: A Transport for http.Client that will cache responses according to the HTTP RFC

For httpcache users: Version 1.4.0 adds a few response-handling improvements for even better RFC 9111 compliance. A heads-up: this is also the final v1 version. The developer is focusing on v2, because modernizing the library requires breaking changes.

Tools and applications

GitHub - 1rhino2/go-memory-visualizer: Real-time Go struct memory layout visualization and optimization for VS Code. Analyze padding, alignment, and cache performance with one-click field reordering.

When every byte in memory counts, having immediate feedback on the memory layout of a structure can come in handy.

GitHub - MadAppGang/dingo: A meta-language for Go that adds Result types, error propagation (?), and pattern matching while maintaining 100% Go ecosystem compatibility

Is Go having its TypeScript moment?

Supertonic (TTS) - a Hugging Face Space by Supertone

Make your Go apps talk to the user. Supertonic is a fast text-to-speech (TTS) engine that comes with a Go SDK and runs on the local machine or even inside a browser, for maximum privacy (and independence from network connections).

Completely unrelated to Go

There's always going to be a way to not code error handling

Imagine a language where error handling is the happy path.

A ChatGPT prompt equals about 5.1 seconds of Netflix

What would you rather do tonight: watch Netflix for 1.5 hours, or run 500 to 1,000 ChatGPT queries?

(Spoiler: From an energy standpoint, it's the same.)

Happy coding! ʕ◔ϖ◔ʔ

Questions or feedback? Drop me a line. I'd love to hear from you.

Best from Munich, Christoph

Not a subscriber yet?

If you're reading this newsletter issue online, or if someone forwarded it to you, subscribe to get every new issue earlier than the online version, and more reliably than the occasional forward.

Find the subscription form at the end of this page.

How I can help

If you're looking for more useful content around Go, here are some ways I can help you become a better Gopher (or a Gopher at all):

On AppliedGo.net, I blog about Go projects, algorithms and data structures in Go, and other fun stuff.

Or visit the AppliedGo.com blog and learn about language specifics, Go updates, and programming-related stuff. 

My AppliedGo YouTube channel hosts quick tip and crash course videos that help you get more productive and creative with Go.

Enroll in my Go course for developers, which stands out for its intensive use of animated graphics to explain abstract concepts intuitively. Numerous short and concise lectures let you schedule your learning flow as you like.

Check it out.


Christoph Berger IT Products and Services
Dachauer Straße 29
Bergkirchen
Germany
