Go 1.21
This release is arguably the most feature-packed since the one that introduced generics: several experimental generics-era packages matured and moved into the standard library (slog, slices, maps, cmp), and you can now provide reasons for context cancellation.
Initialisation Order Changes
This is definitely going to break some programs in production. The first step of the new behaviour is sorting the packages; even though gofmt already sorts import statements, it also respects import groupings, so there might still be implicit dependencies tied to your older initialisation order.
This reminds me of a production incident where running gofmt triggered an outage: one global helper package set the default timezone for the entire codebase using `os.Setenv("TZ", "Asia/Kolkata")`, and a few other packages set default variables, like when evening starts (6PM), in their init functions. gofmt sorted the packages mentioned in the import statements, and because of this a number of packages started declaring that evening starts at 12:30PM, and the outage began. Interestingly, it was a self-healing outage, i.e. it fixed itself automatically by 6PM.
So inspecting all the init functions and modifying any package-scoped or globally-scoped behaviour is a must before upgrading to Go 1.21. Adding gochecknoinits is also probably a good idea to avoid such issues in the future.
New Package Initialisation Algorithm
Loop Var Experiment
Currently it's only an experiment, but if it gets released, this might be the first change that comes really close to breaking the compatibility promise.
In short, this changes the current behaviour, where a loop variable is scoped to the entire loop, so that the variable is instead scoped per iteration. In the example below, the printing behaviour of previous versions is hard to predict, but it could simply print the last element 3 times rather than printing each string in the slice, because the variable is per-loop scoped rather than per-iteration.
Example - https://go.dev/doc/faq#closures_and_goroutines
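To make this concrete, here is my own illustration of the classic closure-capture pattern (the `collect` helper is not from the FAQ). Under per-loop scoping every closure shares one variable; under per-iteration scoping each closure gets its own copy:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collect launches one goroutine per element and records what each closure
// observes. Before the loopvar change, every closure captures the same `v`.
func collect(values []string) []string {
	var (
		mu  sync.Mutex
		got []string
		wg  sync.WaitGroup
	)
	for _, v := range values {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			got = append(got, v)
			mu.Unlock()
		}()
	}
	wg.Wait()
	sort.Strings(got) // goroutine completion order is nondeterministic
	return got
}

func main() {
	// With per-iteration scoping (the experiment): [a b c].
	// With per-loop scoping: unpredictable, commonly [c c c].
	fmt.Println(collect([]string{"a", "b", "c"}))
}
```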
`panic(nil)` Behaviour
The common recovery handling logic depends on the panic value being non-nil. So if a panic is raised with a nil value, the panic gets recovered, but the handling logic won't execute because of the nil check, i.e. `if r := recover(); r != nil`; so in production, even if a panic happened, you might never get to know about it because it is being silently handled. From Go 1.21, `panic(nil)` is converted into a panic with a `*runtime.PanicNilError` value, so the recover check sees a non-nil value. If your error reporting mechanism blows up after the change, there is no need to worry: a silent panic is now getting noticed.
I wonder if anyone is using this behaviour to panic silently, i.e. to forcefully stop the execution of a goroutine.
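A small sketch of the recovery pattern in question (`run` is my own illustration, not from the release notes):

```go
package main

import "fmt"

// run executes f, recovering any panic with the common `r != nil` check.
// Before Go 1.21, panic(nil) was swallowed here silently (recover returns
// nil, so the handler never fires); from Go 1.21 the runtime substitutes a
// *runtime.PanicNilError, so the handler finally runs.
func run(f func()) (report string) {
	defer func() {
		if r := recover(); r != nil {
			report = fmt.Sprintf("recovered: %v", r)
		}
	}()
	f()
	return "ok"
}

func main() {
	fmt.Println(run(func() {}))              // ok
	fmt.Println(run(func() { panic("boom") })) // recovered: boom
	// Before 1.21 this prints an empty string (the panic vanished);
	// on 1.21+ it reports the new *runtime.PanicNilError.
	fmt.Println(run(func() { panic(nil) }))
}
```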
slog package
Finally, slog got promoted from x/exp/slog to the standard library. Context variants of the logging functions were the major missing feature in logrus and zap, so I was very happy with their introduction.
Another thing I found really useful was the `LogValuer` interface; using it I was able to convert complex types like errors into a group of `slog.Attr`, which is better represented in the final logs.
I hope a future errors package will start implementing this natively. The current error wrapping concatenates all the error strings with a space rather than a newline, but the LogValuer interface provides some control over this by unwrapping manually and selectively building the error string that gets printed in the structured logs.
slices package
I could now remove a lot of internal helper functions in favour of `slices.Contains`, `slices.Compare` (when treating slices like tuples), and `slices.Equal`.
Although a couple of things are slightly disappointing:
there is no `slices.Chunk`, which is one of my most used utility functions while working with batch APIs.
another is that the element constraint is `comparable`; I wish it also supported a higher-level one, i.e. `interface{ comparable | interface{ Compare(E, E) int } }`. Currently this is not possible because comparable can't be combined in a union with other interfaces, and because of this one has to redefine the more complicated `*Func` versions of the helpers.
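Until the standard library grows one, a `Chunk` helper might look like this (a sketch; the name and signature are my own, not part of the Go 1.21 slices package):

```go
package main

import "fmt"

// Chunk splits s into consecutive sub-slices of at most size elements,
// which is handy for batch APIs with a maximum request size.
func Chunk[S ~[]E, E any](s S, size int) []S {
	if size <= 0 {
		panic("chunk size must be positive")
	}
	chunks := make([]S, 0, (len(s)+size-1)/size)
	for len(s) > 0 {
		n := min(size, len(s)) // the min builtin is itself new in Go 1.21
		// Full slice expression caps capacity so appends to a chunk
		// can't scribble over the next one.
		chunks = append(chunks, s[:n:n])
		s = s[n:]
	}
	return chunks
}

func main() {
	fmt.Println(Chunk([]int{1, 2, 3, 4, 5}, 2)) // [[1 2] [3 4] [5]]
}
```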
Reasons in Context Cancellation
`WithDeadlineCause` and `WithTimeoutCause` are really excellent utilities. The error stacks for timeouts are always pretty confusing: does the deadline-exceeded error mean the MySQL query didn't finish within its 500ms budget, or that the request reached its overall deadline of 10 seconds? Debugging a number of outages gets delayed due to this correlation issue.
context.WithoutCancel
Background operations triggered by a function that needs to return to its caller sometimes have to be detached from the parent context's cancellation, while still needing information about who triggered the operation; observability is one good use case.
But I believe running background operations without a parent scope is itself an anti-pattern, because who gets to stop them? This and `context.AfterFunc` would require more research into proper valid use cases; I feel these two are more suitable for a very few core I/O libraries than for anything else.
This also reminds me of a connection-churn outage. Both the sql and redis libraries discard connections on context cancellation. So if any component in the request lifecycle starts taking more time, it is probable that the number of requests timing out while triggering the redis operation increases, which in turn increases the connection churn. So if one component degrades, it cascades into connection churn on redis and eventually degrades redis itself. The core issue is that the database libraries loan the connection to the caller from the connection pool rather than fully owning it, i.e. cleaning up the connection even if the caller is no longer interested in the request running on it; ideally, if the caller has gone away but had already initiated an operation on a connection, the connection pool could reset the connection rather than destroying it completely.
One hacky solution to such a connection-churn problem is to avoid passing the parent's cancellation trigger to the database operations; an even better hacky solution is to override the context timeout, which still leads to connection churn, but much, much less.
ErrUnsupported
Now there is a standard error for signalling that an operation is unsupported; integrating it into http/grpc error translation would make it even more powerful to work with.
GC Metrics
This release also adds metrics that really help in building an internal platform that can observe and automatically tune the GOGC and GOMEMLIMIT values. GOMEMLIMIT is a really well-thought-out feature, i.e. with a single knob you get a way to leverage spare memory for improving compute; but the default GOGC value might make GOMEMLIMIT ineffective if your live heap is very small compared to GOMEMLIMIT. Having these metrics helps in figuring out the right GOGC value to improve compute for a microservice. Standardising also means an easier way to build observability around this, e.g. standard Grafana dashboards.
These are some of the items I had some perspective on and earlier experience with; for the exhaustive list, see https://go.dev/doc/go1.21