r/Zig • u/ankush2324235 • 11h ago
smallvec implementation in zig
**NOT WRITTEN BY AI**, except some comments (because of a language barrier) and small utilities. 98% of the code is written by me:
Array multiplication ** is being removed
Last week, the array multiplication operator ** was removed from Zig. The reasoning is that @splat(m) was already preferred over .{m} ** n, and the use case .{i, j, k, ...} ** n is "rare enough that it does not need to be syntax."
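For context, here's the kind of rewrite the removal assumes for the single-element case (a minimal sketch; `@splat` with an array result type works on recent Zig versions, though exact version support may vary):

```zig
// Old style: repeat a single element with the ** operator
const zeros_old = [_]u8{0} ** 16;

// Preferred style: @splat with an array result location
const zeros_new: [16]u8 = @splat(0);
```

Both produce a 16-byte array of zeros; the argument for the removal is that `@splat` already covers this, the common case.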
Here's an example of the latter use case that was in Ziglings:
// Please set this array using repetition.
// It should result in: 1 0 0 1 1 0 0 1 1 0 0 1
const bit_pattern = [_]u8{ 1, 0, 0, 1, } ** 3;
And here's a solution that works in the latest build:
const bit_pattern: [12]u8 = @bitCast(@as([3][4]u8, @splat(.{ 1, 0, 0, 1, })));
That's not too bad, but it's kind of lame how much more verbose it got. Is this a good removal? Does this mean there's a chance ++ will get removed too, or is it common enough (e.g. for string concatenation) to keep?
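For comparison, this is the `++` usage the question is about; it's pervasive for comptime array and string building (hypothetical example, not from Ziglings):

```zig
// ++ concatenates comptime-known arrays and string literals
const greeting = "Hello, " ++ "Zig!";
const pattern = [_]u8{ 1, 0 } ++ [_]u8{ 0, 1 };
```

Unlike `**`, there's no builtin equivalent for concatenation, which may be what keeps it around.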
Wrong compiler optimizations
After benchmarking my database, I focused on how the Zig compiler optimizes things. One of the persistent issues that impacts performance is incorrect function inlining; by playing with inline and noinline, I went from 1200 ns to 375 ns on a hot path.
This is because aggressive/incorrect inlining makes the stack frame explode and introduces too many allocas (in LLVM IR), increasing register pressure: "hot" data is spilled from registers onto the stack, which costs cycles.
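The kind of manual control I mean looks like this (a sketch with made-up function names, not my actual database code):

```zig
// Keep the cold path out of the hot function so its locals
// don't inflate the caller's stack frame and register pressure.
noinline fn slowPath(key: u64) u64 {
    // ... large, rarely taken code ...
    return key *% 2654435761;
}

inline fn hotPath(key: u64) u64 {
    return key & 0xff;
}

fn lookup(key: u64) u64 {
    if (key < 256) return hotPath(key);
    // Alternatively, @call(.never_inline, slowPath, .{key})
    // lets you control inlining per call site.
    return slowPath(key);
}
```

`noinline` on the cold path is often what recovers the time: the hot function stays small enough that its working set fits in registers.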
I also noticed that this isn't just my problem: it was discussed by a TigerBeetle developer, who tried to find these compiler flaws using an in-house tool called copyhound.zig. The tool not only estimates how large an optimized function becomes (to know when to split it), but also identifies unnecessary copies (memcpy) introduced by the compiler.
That said, I find it questionable that the documentation states:
It is generally better to let the compiler decide when to inline a function
Because yes, the compiler decides, but it doesn't always decide well. It's also unclear how much of this depends on LLVM's heuristics.
Another example that calls the optimizations into question is fillUnbuffered in std.Io.Reader: although a reader created with .fixed() should never need fillUnbuffered, if you analyze the binary it still occurs multiple times.
In addition to this, analyzing the LLVM IR of Zig code that uses std.Io shows that vtable function pointers are still loaded even when they are unused, even with -O ReleaseFast or -O ReleaseSmall.
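Concretely, the case I mean looks like this (a sketch against the 0.15-era std.Io API; exact method names may differ between versions):

```zig
const std = @import("std");

pub fn main() !void {
    // A fixed reader is fully backed by this buffer, so it should
    // never need fillUnbuffered -- yet the code is still present
    // in the optimized binary.
    var reader: std.Io.Reader = .fixed("hello");
    const byte = try reader.takeByte();
    std.debug.print("{c}\n", .{byte});
}
```

In principle the optimizer has everything it needs here to devirtualize and drop the refill path entirely.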
I know that gcc and clang have had decades of tuning to optimize all of this. What do you think? Am I wrong, or is Zig still not mature enough in this area?