Go Caching Best Practices
Daniel Hayes
Full-Stack Engineer · Leapcell

Caching is indispensable for accelerating API applications, so if high performance is required, it has to be planned for at the design stage.
When planning caching during the design phase, the most important task is to estimate how much memory it will need.
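As a rough, illustrative estimate: caching one million user profiles at about 500 bytes each takes roughly 500 MB of raw data per instance, before accounting for Go's map overhead and GC headroom. An early estimate like this tells you whether local memory is even an option, or whether a distributed cache is required.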
First, we need to clarify exactly what data needs to be cached.
In applications with continuously growing user bases, it is not feasible to cache all the data that is used.
This is because the application's local memory is limited by the physical resources of a single machine. If data is cached without restriction, it will eventually lead to OOM (Out of Memory), causing the application to be forcibly terminated.
If a distributed cache is used, the high hardware cost also forces us to make trade-offs.
If physical resources were unlimited, it would naturally be best to store everything in the fastest physical devices.
But real-world business scenarios do not allow this, so we need to classify data into hot and cold data, and even appropriately archive and compress cold data, storing it on more economical media.
Analyzing which data can be stored in local memory is the first step in implementing effective local caching.
Balancing Stateful and Stateless Applications
Once data is stored locally in the application, the application is no longer stateless in a distributed system.
Take a web backend application as an example: suppose we deploy 10 Pods. If one Pod handles a request and caches the result locally, then the next time the same request is forwarded to a different Pod, that cached data will not be available there.
There are three solutions:
- Use a distributed cache such as Redis
- Forward the same request to the same Pod
- Cache the same data in every Pod
The first method needs no further explanation; this essentially makes storage centralized.
The second method requires specific identifying information, such as the user's uid, to implement the forwarding logic, so its applicability depends on the scenario.
The third method consumes more storage space: compared with the second method, every Pod keeps its own copy of the data. Although it still cannot be considered completely stateless, the probability of cache penetration is lower than with the second approach, because even if the gateway fails to forward a request to the Pod that holds the data, any other Pod can still process it normally.
There is no silver bullet; choose the method based on actual scenarios. However, the further the cache is from the application, the longer it takes to access.
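As a rough sketch of the second method, the gateway (or the service itself) can hash the user's uid to a fixed Pod index; the hash choice and Pod count here are illustrative assumptions, not a specific gateway's API.

```go
package main

import (
    "fmt"
    "hash/fnv"
)

// pickPod maps a uid to a fixed Pod index so that the same user's requests
// always land on the same Pod (and therefore the same local cache).
func pickPod(uid string, numPods int) int {
    h := fnv.New32a()
    h.Write([]byte(uid))
    return int(h.Sum32()) % numPods
}

func main() {
    // Requests for uid "42" will always be routed to the same one of 10 Pods.
    fmt.Println(pickPod("42", 10))
}
```

In production you would typically use consistent hashing instead of a plain modulo, so that changing the number of Pods only remaps a fraction of users.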
The open-source Go messaging server Goim, for example, also maximizes CPU cache hits through memory alignment.
When the CPU performs calculations, it first looks for the required data in L1, then L2, and then L3 cache. If the data is not found in any of these caches, it has to fetch the data from main memory. The farther the data is, the longer the computation takes.
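As a side note on alignment, field ordering changes how much padding a struct carries, and smaller, well-packed structs fit more entries per CPU cache line. The layouts below are illustrative and are not taken from Goim:

```go
package main

import (
    "fmt"
    "unsafe"
)

// Poorly ordered fields force the compiler to insert padding between them.
type padded struct {
    a bool  // 1 byte, then 7 bytes of padding before b
    b int64 // 8 bytes
    c bool  // 1 byte, then 7 bytes of tail padding
}

// Grouping fields by size removes most of the padding.
type packed struct {
    b int64 // 8 bytes
    a bool  // 1 byte
    c bool  // 1 byte, then 6 bytes of tail padding
}

func main() {
    fmt.Println(unsafe.Sizeof(padded{})) // typically 24 on 64-bit platforms
    fmt.Println(unsafe.Sizeof(packed{})) // typically 16 on 64-bit platforms
}
```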
Eviction Policy
If strict memory size control is required for the cache, you can use the LRU (Least Recently Used) policy to manage memory. Let's look at a Go implementation of an LRU cache.
LRU Cache
LRU cache is suitable for scenarios where you need to control the cache size and automatically evict less frequently used items.
For example, if you only want to store 128 key-value pairs, the LRU cache will keep adding new entries until the limit is reached. Whenever a cached item is accessed or a new value is added, its key is moved to the front, preventing it from being evicted.
https://github.com/hashicorp/golang-lru is a Go implementation of an LRU cache.
Let’s look at an example test to see how LRU is used:
```go
func TestLRU(t *testing.T) {
    // Create an LRU cache that holds at most 128 entries.
    l, _ := lru.New(128)
    for i := 0; i < 256; i++ {
        l.Add(i, i+1)
    }

    // Value has not been evicted
    value, ok := l.Get(200)
    assert.Equal(t, true, ok)
    assert.Equal(t, 201, value.(int))

    // Value has already been evicted
    value, ok = l.Get(1)
    assert.Equal(t, false, ok)
    assert.Equal(t, nil, value)
}
```
As you can see, the key 200 has not been evicted, so it can still be accessed.
However, key 1 has exceeded the cache size limit of 128, so it has already been evicted and cannot be retrieved anymore.
This is useful when the amount of data you want to store is too large to keep entirely in memory; the most recently used data is always moved to the front, increasing the cache hit rate.
The internal implementation of the open-source package uses a doubly linked list to maintain all cached elements. Every time `Add` is called, if the key already exists, its entry is moved to the front of the list.
```go
func (l *LruList[K, V]) move(e, at *Entry[K, V]) {
    if e == at {
        return
    }
    e.prev.next = e.next
    e.next.prev = e.prev

    e.prev = at
    e.next = at.next
    e.prev.next = e
    e.next.prev = e
}
```
If the key does not exist, it is inserted using the `insert` method.
```go
func (l *LruList[K, V]) insert(e, at *Entry[K, V]) *Entry[K, V] {
    e.prev = at
    e.next = at.next
    e.prev.next = e
    e.next.prev = e
    e.list = l
    l.len++
    return e
}
```
If the size of the cache has been exceeded, the element at the end of the list, which is the oldest and least recently used, will be removed.
```go
func (c *LRU[K, V]) removeOldest() {
    if ent := c.evictList.Back(); ent != nil {
        c.removeElement(ent)
    }
}

func (c *LRU[K, V]) removeElement(e *internal.Entry[K, V]) {
    c.evictList.Remove(e)
    delete(c.items, e.Key)
    // Callback after deleting the key
    if c.onEvict != nil {
        c.onEvict(e.Key, e.Value)
    }
}

func (l *LruList[K, V]) Remove(e *Entry[K, V]) V {
    e.prev.next = e.next
    e.next.prev = e.prev
    // Prevent memory leaks: set the pointers to nil
    e.next = nil
    e.prev = nil
    e.list = nil
    l.len--

    return e.Value
}
```
Cache Updates
Timely cache updates in distributed systems can reduce data inconsistency issues.
Different methods are suitable for different scenarios.
There are various situations when fetching cached data. For example, for a popular ranking list, which is unrelated to users, we need to maintain this data in the local cache of every Pod. When a write or update occurs, all Pods must be notified to update their cache.
If the data is specific to each user, it is preferable to handle the request in a fixed Pod, and route requests to the same Pod using a user identifier (uid). This avoids storing multiple copies across different Pods and reduces memory consumption.
Most of the time, we want our applications to be stateless, so this part of the cached data is stored in Redis.
There are three main distributed cache update strategies: cache-aside (bypass), write-through, and write-back.
Cache-Aside (Bypass) Strategy
The cache-aside (bypass) strategy is the one we use most frequently. When updating data, we first delete the cache, then write to the database. The next time the data is read and the cache is found missing, it will be retrieved from the database and the cache will be refreshed.
This strategy can lead to inconsistencies when read QPS is extremely high: after the cache is deleted but before the database is updated, a read request may come in, miss the cache, and reload the old value from the database into the cache, so subsequent reads will keep returning the stale value.
Although the probability of this actually happening is low, you need to carefully evaluate the scenario. If such inconsistencies are catastrophic for your system, then this strategy cannot be used.
If this situation is acceptable but you still want to minimize the inconsistency window, you can set a cache expiration time: even when no write occurs, the cached entry will eventually expire and be refreshed with fresh data on the next read.
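Here is a minimal sketch of the cache-aside flow described above, using a tiny `KV` interface instead of a concrete Redis client; `writeDB` and `loadFromDB` are hypothetical stand-ins for the real data layer, and the TTL is an arbitrary example.

```go
package cacheaside

import (
    "errors"
    "time"
)

var ErrCacheMiss = errors.New("cache miss")

// KV is a minimal cache abstraction (a Redis client would fit behind it).
type KV interface {
    Get(key string) (string, error) // returns ErrCacheMiss when absent
    Set(key, value string, ttl time.Duration) error
    Del(key string) error
}

// Update follows the order described above: delete the cached copy first,
// then write the database.
func Update(c KV, key, value string) error {
    if err := c.Del(key); err != nil {
        return err
    }
    return writeDB(key, value)
}

// Read serves from the cache; on a miss it loads from the database and
// refreshes the cache with a TTL so stale entries eventually expire.
func Read(c KV, key string) (string, error) {
    if v, err := c.Get(key); err == nil {
        return v, nil
    }
    v, err := loadFromDB(key)
    if err != nil {
        return "", err
    }
    _ = c.Set(key, v, 10*time.Minute) // best-effort refresh
    return v, nil
}

// Hypothetical database helpers.
func writeDB(key, value string) error       { return nil }
func loadFromDB(key string) (string, error) { return "value-for-" + key, nil }
```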
Write-Through and Write-Back Strategies
Write-through and write-back strategies both update the cache first, then write to the database—the difference lies in whether the update is done individually or in batches.
A significant drawback of these strategies is that data loss can occur easily. Although Redis supports persistence strategies such as writing back to disk, for applications with high QPS, losing even one second of data due to a server crash can be a huge amount. Therefore, you must make a decision based on your business and actual scenario.
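For contrast, here is a minimal sketch of the two write paths, using hypothetical in-memory stand-ins for the cache and the database rather than any specific client library; the point is only the ordering and the batching, not the storage itself.

```go
package writestrategy

import (
    "sync"
    "time"
)

var (
    mu    sync.Mutex
    cache = map[string]string{} // stand-in for Redis or a local cache
    db    = map[string]string{} // stand-in for the database
)

func cacheSet(k, v string) { mu.Lock(); cache[k] = v; mu.Unlock() }
func dbWrite(k, v string)  { mu.Lock(); db[k] = v; mu.Unlock() }

// Write-through: every write updates the cache and is persisted immediately.
func writeThrough(k, v string) {
    cacheSet(k, v)
    dbWrite(k, v)
}

// Write-back: writes update the cache and are queued; a background goroutine
// flushes them to the database periodically. Anything still in the queue is
// lost if the process crashes, which is the data-loss risk described above.
type writeBack struct {
    pending chan [2]string
}

func newWriteBack(buf int) *writeBack {
    w := &writeBack{pending: make(chan [2]string, buf)}
    go w.flushLoop(time.Second)
    return w
}

func (w *writeBack) Write(k, v string) {
    cacheSet(k, v)
    w.pending <- [2]string{k, v}
}

func (w *writeBack) flushLoop(interval time.Duration) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()
    batch := make([][2]string, 0, 128)
    for {
        select {
        case item := <-w.pending:
            batch = append(batch, item)
        case <-ticker.C:
            // Flush everything accumulated since the last tick.
            for _, kv := range batch {
                dbWrite(kv[0], kv[1])
            }
            batch = batch[:0]
        }
    }
}
```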
If Redis still cannot meet your performance requirements, you need to store cached content directly in application variables (local cache), so user requests are served directly from memory without network requests.
Below, we will discuss strategies for updating local caches in distributed scenarios.
Active Notification Update (Similar to Cache-Aside Strategy)
In distributed systems, you can use ETCD broadcasts to quickly propagate cache updates without waiting for the next query to reload the data.
However, this approach has a problem. For example, at time T1, a cache update notification is sent, but the downstream services have not yet finished updating. At time T2 = T1 + 1s, another cache update signal is sent, while the update at T1 is still not complete.
This may cause the newer value at T2 to be overwritten by the older value from T1 due to differences in update speed.
This can be addressed by adding a monotonically increasing version number. When the T2 version of data takes effect, T1’s version can no longer update the cache, thus avoiding overwriting the new value with the old one.
With active notifications, you can specify the relevant key to update only specific cached items, avoiding the high load caused by updating all cached data at once.
This update strategy is similar to the cache-aside strategy, except that you’re updating local cache instead of distributed cache.
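A minimal sketch of the version check, assuming a hypothetical update struct delivered by the broadcast (for example via an etcd watch) and a publisher that assigns a monotonically increasing version per key:

```go
package notify

import "sync"

// update is the assumed shape of a broadcast cache-update notification.
type update struct {
    Key     string
    Value   string
    Version int64 // monotonically increasing, assigned by the publisher
}

type versionedCache struct {
    mu       sync.RWMutex
    values   map[string]string
    versions map[string]int64
}

func newVersionedCache() *versionedCache {
    return &versionedCache{
        values:   make(map[string]string),
        versions: make(map[string]int64),
    }
}

// Apply ignores any notification older than what the cache already holds,
// so a slow T1 update can no longer overwrite a newer T2 value.
func (c *versionedCache) Apply(u update) bool {
    c.mu.Lock()
    defer c.mu.Unlock()
    if u.Version <= c.versions[u.Key] {
        return false // stale notification, drop it
    }
    c.values[u.Key] = u.Value
    c.versions[u.Key] = u.Version
    return true
}

func (c *versionedCache) Get(key string) (string, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    v, ok := c.values[key]
    return v, ok
}
```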
Waiting for Cache Expiry
This approach is suitable when strict data consistency is not required. For local caches, if you want to propagate updates to all Pods, the maintenance strategy becomes more complex.
You can use Go’s open-source package https://github.com/patrickmn/go-cache to handle cache expiration in memory without implementing your own logic.
Let's see how go-cache implements local caching:
Go Cache
https://github.com/patrickmn/go-cache is an open-source local caching package for Go.
Internally, it stores data in a map.
```go
type Cache struct {
    *cache
}

type cache struct {
    defaultExpiration time.Duration
    items             map[string]Item
    mu                sync.RWMutex
    onEvicted         func(string, interface{})
    janitor           *janitor
}
```
The `items` field stores all the relevant data. Each time you call `Set` or `Get`, it operates on the `items` map.
The `janitor` periodically deletes expired keys at a specified interval.
```go
func (j *janitor) Run(c *cache) {
    ticker := time.NewTicker(j.Interval)
    for {
        select {
        case <-ticker.C:
            c.DeleteExpired()
        case <-j.stop:
            ticker.Stop()
            return
        }
    }
}
```
It uses a `time.Ticker` to fire periodic signals, calling the `DeleteExpired` method to remove expired keys.
```go
func (c *cache) DeleteExpired() {
    // Key-value pairs to be evicted
    var evictedItems []keyAndValue
    now := time.Now().UnixNano()
    c.mu.Lock()
    // Find and delete expired keys
    for k, v := range c.items {
        if v.Expiration > 0 && now > v.Expiration {
            ov, evicted := c.delete(k)
            if evicted {
                evictedItems = append(evictedItems, keyAndValue{k, ov})
            }
        }
    }
    c.mu.Unlock()
    // Callback after eviction, if any
    for _, v := range evictedItems {
        c.onEvicted(v.key, v.value)
    }
}
```
From the code, we can see that cache expiration relies on periodic eviction.
So what happens if we try to retrieve a key that has expired but hasn’t yet been deleted?
When fetching data, the cache will also check whether the key has expired.
```go
func (c *cache) Get(k string) (interface{}, bool) {
    c.mu.RLock()
    // Return directly if the key is not found
    item, found := c.items[k]
    if !found {
        c.mu.RUnlock()
        return nil, false
    }
    // If the item has expired, return nil and wait for periodic deletion
    if item.Expiration > 0 {
        if time.Now().UnixNano() > item.Expiration {
            c.mu.RUnlock()
            return nil, false
        }
    }
    c.mu.RUnlock()
    return item.Object, true
}
```
You can see that every time a value is retrieved, expiration is checked, ensuring that expired key-value pairs are not returned.
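For reference, here is a minimal usage sketch of go-cache; the key, value, and durations are arbitrary examples.

```go
package main

import (
    "fmt"
    "time"

    "github.com/patrickmn/go-cache"
)

func main() {
    // Default expiration of 5 minutes; the janitor purges expired items every 10 minutes.
    c := cache.New(5*time.Minute, 10*time.Minute)

    // Store a value with the default expiration.
    c.Set("greeting", "hello", cache.DefaultExpiration)

    // Read it back; ok is false once the item has expired or was never set.
    if v, ok := c.Get("greeting"); ok {
        fmt.Println(v.(string))
    }
}
```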
Cache Warming
How to preload data at startup, whether to wait for initialization to complete before serving traffic, whether to allow segmented loading, and whether concurrent loading will put pressure on middleware: these are all questions to consider when warming the cache at startup.
If waiting for all initialization to finish before preloading makes startup too slow or resource-heavy, you can run initialization and preloading in parallel. However, you must ensure that key components (such as database connections and network services) are already available, to avoid hitting unavailable resources during preloading.
If requests arrive before loading is complete, there needs to be an appropriate fallback strategy to ensure normal responses.
The advantage of segmented loading is that it can reduce initialization time through concurrency, but concurrent preloading—while improving efficiency—also puts pressure on middleware (such as cache servers, databases, etc.).
During coding, it is necessary to assess the system’s concurrent handling capability and set a reasonable concurrency limit. Applying rate limiting mechanisms can help ease concurrent pressure and prevent middleware overload.
In Go, you can also use channels to limit concurrency.
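Below is a minimal sketch of that pattern, using a buffered channel as a semaphore during warm-up; the concurrency limit of 8, the keys, and the loadFromDB helper are illustrative assumptions.

```go
package main

import (
    "fmt"
    "sync"
)

func loadFromDB(key string) string {
    // Stand-in for the real origin lookup (database, RPC, etc.).
    return "value-for-" + key
}

func warmUp(keys []string, cache *sync.Map) {
    sem := make(chan struct{}, 8) // at most 8 concurrent origin requests
    var wg sync.WaitGroup

    for _, key := range keys {
        wg.Add(1)
        sem <- struct{}{} // acquire a slot; blocks when 8 loads are in flight
        go func(k string) {
            defer wg.Done()
            defer func() { <-sem }() // release the slot
            cache.Store(k, loadFromDB(k))
        }(key)
    }
    wg.Wait()
}

func main() {
    var cache sync.Map
    warmUp([]string{"a", "b", "c"}, &cache)
    if v, ok := cache.Load("a"); ok {
        fmt.Println(v)
    }
}
```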
Cache warming plays an important role in real production scenarios. During deployment, the local cache of the application will disappear after restart. In the case of rolling updates, there will be at least one Pod that needs to fetch data from the origin. When the QPS is extremely high, the peak QPS for that single Pod may overwhelm the database, causing a cascading failure (avalanche effect).
There are two ways to handle this situation: one is to avoid version upgrades during peak traffic periods, instead scheduling them during low-traffic times—this is easy to identify from monitoring dashboards.
The other way is to preload data when starting up and only provide services after the loading is complete. However, this can lengthen startup times if a rollback is needed due to a faulty release, making rapid rollback more difficult.
Both approaches have their pros and cons. In actual scenarios, you should choose according to your specific needs. The most important thing is to minimize reliance on special cases: the more dependencies you have during release, the more likely problems are to occur.
We are Leapcell, your top choice for hosting Go projects.
Leapcell is the Next-Gen Serverless Platform for Web Hosting, Async Tasks, and Redis:
Multi-Language Support
- Develop with Node.js, Python, Go, or Rust.
Deploy unlimited projects for free
- pay only for usage — no requests, no charges.
Unbeatable Cost Efficiency
- Pay-as-you-go with no idle charges.
- Example: $25 supports 6.94M requests at a 60ms average response time.
Streamlined Developer Experience
- Intuitive UI for effortless setup.
- Fully automated CI/CD pipelines and GitOps integration.
- Real-time metrics and logging for actionable insights.
Effortless Scalability and High Performance
- Auto-scaling to handle high concurrency with ease.
- Zero operational overhead — just focus on building.
Explore more in the Documentation!
Follow us on X: @LeapcellHQ