Why bufio.Writer and not direct os.File writes?

Part of the Building the Log Package series: how I thought through the log package design before writing any code.

When I was working through the store object, the whole idea was clear enough: when bytes are given, append; when a position is given, read. Pretty simple interface.

But then I started thinking about what those operations actually do underneath. Every append, every read: they're making syscalls. And every syscall costs, because each one is a switch from user space to kernel space.

The problem with writing one byte at a time

Every write to os.File is a syscall. Every syscall crosses from user space to kernel space. And that crossing has a real cost: time, CPU state that has to be saved and restored, and cache disruption (covered in detail in The cost of a syscall).

Picture what happens if you write 1 byte at a time:

without buffer:
"h" → syscall → kernel → disk
"e" → syscall → kernel → disk
"l" → syscall → kernel → disk
... 1000 syscalls for 1000 bytes

The fix is a buffer. Accumulate data in memory first (imagine filling a bucket) then flush to the file in one go. One syscall instead of thousands.

with buffer:
"h","e","l","l","o"... accumulate in memory
[full buffer] → ONE syscall → kernel → disk

That's the whole logic.
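
To make the difference concrete, here is a minimal sketch that counts calls instead of measuring time. The countingWriter type and the two helper functions are hypothetical names made up for this example; countingWriter stands in for os.File, and each call to its Write method corresponds to what would be one write syscall on a real file.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter is a hypothetical stand-in for os.File: it discards the data
// and counts how many times Write is called. On a real file, each of those
// calls would be one write syscall.
type countingWriter struct {
	calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return len(p), nil
}

// writeByteAtATime writes n single bytes to w, one Write call per byte.
func writeByteAtATime(w io.Writer, n int) error {
	for i := 0; i < n; i++ {
		if _, err := w.Write([]byte{'h'}); err != nil {
			return err
		}
	}
	return nil
}

// unbufferedCalls returns how many underlying writes n one-byte writes cause
// when they go straight to the destination.
func unbufferedCalls(n int) int {
	cw := &countingWriter{}
	_ = writeByteAtATime(cw, n) // countingWriter never returns an error
	return cw.calls
}

// bufferedCalls does the same through a bufio.Writer and flushes at the end.
func bufferedCalls(n int) int {
	cw := &countingWriter{}
	bw := bufio.NewWriter(cw) // default 4096-byte buffer
	_ = writeByteAtATime(bw, n)
	_ = bw.Flush()
	return cw.calls
}

func main() {
	fmt.Println("unbuffered:", unbufferedCalls(1000))
	fmt.Println("buffered:", bufferedCalls(1000))
}
```

With 1000 one-byte writes, the unbuffered path reaches the destination 1000 times and the buffered path reaches it once, on the final Flush.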


What bufio.Writer actually is

bufio.Writer is a struct in Go's standard library that wraps any io.Writer (in this case an os.File) and holds an internal byte slice as the buffer.

// Simplified from the bufio source; the real struct also carries an err field.
type Writer struct {
    buf  []byte    // the in-memory buffer
    n    int       // how many bytes are currently in the buffer
    wr   io.Writer // the underlying writer (your os.File)
}

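When you call .Write(bytes) on a bufio.Writer, it copies the bytes into buf. Only when an incoming write no longer fits does it flush, meaning it calls the real os.File.Write() once and drains the whole buffer in one syscall. (One exception: a write larger than the free space that arrives while the buffer is empty skips the copy and goes straight to the file.)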
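
A small sketch of that behavior, assuming a deliberately tiny 16-byte buffer so the numbers stay readable. The bytes.Buffer here is a hypothetical stand-in for the file, and fillDemo is a name invented for the example.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// fillDemo pushes single bytes through a 16-byte bufio.Writer and reports how
// many bytes have reached the destination after 16 writes, after 17 writes,
// and after an explicit Flush.
func fillDemo() (int, int, int) {
	var dst bytes.Buffer // hypothetical stand-in for the os.File
	bw := bufio.NewWriterSize(&dst, 16)

	// Errors are ignored because writes to a bytes.Buffer do not fail.
	for i := 0; i < 16; i++ {
		bw.Write([]byte{'x'})
	}
	after16 := dst.Len() // buffer is exactly full, nothing written through yet

	bw.Write([]byte{'x'}) // does not fit: the 16 buffered bytes are flushed first
	after17 := dst.Len()

	bw.Flush() // pushes the one remaining byte
	afterFlush := dst.Len()

	return after16, after17, afterFlush
}

func main() {
	a, b, c := fillDemo()
	fmt.Println(a, b, c) // 0 16 17
}
```

The flush is triggered by the write that does not fit, not by the write that fills the last free byte, which is why nothing has reached the destination after exactly 16 bytes.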


About the buffer size

The default is 4096 bytes (4 KB). That lines up with the typical OS page size on many systems. You can override it with bufio.NewWriterSize(file, yourSize).

How do you choose the buffer size? It's a tradeoff:

larger buffer → fewer syscalls    → better throughput
             → more memory held   → higher latency before flush
             → data in memory longer → crash = data loss

smaller buffer → more syscalls    → worse throughput
              → less memory       → flushes sooner → more durable

For a commit log where durability matters, you lean smaller or call Flush() explicitly after every append. In the book, Travis's store buffers writes in Append and calls Flush() before reads and on Close; calling Flush() after every append is a stricter, deliberate durability decision.
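
The tradeoff can be put into numbers with a rough sketch. It assumes single-byte writes and uses a hypothetical countingWriter in place of the file; measure is also a made-up helper. It reports how many underlying writes happen and the most bytes that ever sat in the buffer, which is the amount a process crash could lose.

```go
package main

import (
	"bufio"
	"fmt"
)

// countingWriter is a hypothetical stand-in for os.File that counts Write calls.
type countingWriter struct {
	calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return len(p), nil
}

// measure writes total single bytes through a buffer of bufSize bytes. It
// returns the number of underlying writes (including the final Flush) and the
// largest number of bytes ever held in the buffer.
func measure(total, bufSize int) (int, int) {
	cw := &countingWriter{}
	bw := bufio.NewWriterSize(cw, bufSize)
	peak := 0
	for i := 0; i < total; i++ {
		bw.Write([]byte{'x'}) // countingWriter never fails, so the error is ignored
		if b := bw.Buffered(); b > peak {
			peak = b
		}
	}
	bw.Flush()
	return cw.calls, peak
}

func main() {
	for _, size := range []int{64, 1024, 4096} {
		calls, peak := measure(4096, size)
		fmt.Printf("buffer=%d underlying writes=%d peak buffered=%d\n", size, calls, peak)
	}
}
```

For 4096 bytes of input, a 64-byte buffer produces 64 underlying writes with at most 64 bytes at risk, while a 4096-byte buffer produces a single write with all 4096 bytes at risk until the flush.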


The thing that's easy to miss

The buffer lives in your process memory. It has not hit the OS yet, let alone disk.

So if your program crashes after writing to the buffer but before flushing, that data is gone. This is the argument for having the store's Append method call Flush() explicitly after every write, rather than only when the buffer fills naturally.

your Append call
  → write to bufio buffer  (memory, fast)
  → Flush()                → syscall → kernel buffer
  → (optionally) fsync     → kernel buffer → actual disk

Flush() gets bytes to the kernel. fsync gets bytes to actual disk. These are two different guarantees.

The book doesn't call fsync on every write; that would be too expensive for throughput. Instead it relies on Flush() and accepts that an OS crash or power loss could lose data that has reached the kernel but not yet the disk. That's a conscious tradeoff. For a log system, you need to understand which level of durability you're promising before you ship it.
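
The two guarantees can be observed from Go. This is a sketch under a few assumptions: it writes to a temporary file (the "flush-demo-*" name pattern is arbitrary), sizesAroundFlush is a made-up helper, and it uses os.File's Sync method, which is how Go exposes fsync.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// sizesAroundFlush writes 5 bytes through a bufio.Writer and returns the file
// size the OS reports before and after Flush. It finishes with Sync to ask
// the kernel to persist the data.
func sizesAroundFlush() (int64, int64, error) {
	f, err := os.CreateTemp("", "flush-demo-*") // arbitrary temp file pattern
	if err != nil {
		return 0, 0, err
	}
	defer os.Remove(f.Name()) // deferred first, so it runs after Close
	defer f.Close()

	bw := bufio.NewWriter(f)
	if _, err := bw.WriteString("hello"); err != nil {
		return 0, 0, err
	}

	fi, err := f.Stat()
	if err != nil {
		return 0, 0, err
	}
	before := fi.Size() // the bytes are still only in process memory

	if err := bw.Flush(); err != nil { // process memory -> kernel
		return 0, 0, err
	}
	fi, err = f.Stat()
	if err != nil {
		return 0, 0, err
	}
	after := fi.Size() // the kernel has the bytes now

	if err := f.Sync(); err != nil { // kernel -> storage device (fsync)
		return 0, 0, err
	}
	return before, after, nil
}

func main() {
	before, after, err := sizesAroundFlush()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(before, after)
}
```

Before Flush the OS reports a size of 0 because it has never seen the bytes; after Flush it reports 5. Nothing visible changes after Sync, which is exactly why that step is easy to forget: its effect only shows up after a power loss or kernel crash.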


Why os.File alone isn't enough

os.File implements io.Writer directly, and every .Write() call is a syscall. There's no buffering. bufio.Writer wraps it and adds the buffer layer in between.

your code → bufio.Writer (buffer in memory) → os.File → syscall → kernel → disk

The store struct holds both:

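type store struct {
    *os.File               // embedded, so Stat() and ReadAt() are available on the store
    buf      *bufio.Writer // buffered writer wrapping the same file
    size     uint64        // running size of the store in bytes
}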

The store embeds *os.File directly, so you can still call things like Stat() on it to get the file size. But all writes go through buf, the bufio.Writer. Two ways into the same file, two different jobs: one for metadata and reading, one for buffered writing. The catch is that a read through the file only sees what has already been flushed out of buf.
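
Here is one way the two paths might fit together. It is a minimal sketch, not the book's implementation: newStore, readBytes, and roundTrip are hypothetical names, there is no locking and no length prefix, this Append flushes on every call, and the file is assumed to be empty or positioned at its end so that writes land after existing data.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// store mirrors the struct above; everything else here is a hypothetical sketch.
type store struct {
	*os.File
	buf  *bufio.Writer
	size uint64
}

// newStore wraps f and picks up its current size through Stat.
func newStore(f *os.File) (*store, error) {
	fi, err := f.Stat()
	if err != nil {
		return nil, err
	}
	return &store{File: f, buf: bufio.NewWriter(f), size: uint64(fi.Size())}, nil
}

// Append writes p through the buffer, flushes it to the kernel, and returns
// the position at which p starts.
func (s *store) Append(p []byte) (uint64, error) {
	pos := s.size
	n, err := s.buf.Write(p)
	if err != nil {
		return 0, err
	}
	if err := s.buf.Flush(); err != nil {
		return 0, err
	}
	s.size += uint64(n)
	return pos, nil
}

// readBytes returns n bytes starting at pos, read through the embedded file.
// It flushes first so the read cannot miss bytes still sitting in buf.
func (s *store) readBytes(pos uint64, n int) ([]byte, error) {
	if err := s.buf.Flush(); err != nil {
		return nil, err
	}
	b := make([]byte, n)
	if _, err := s.File.ReadAt(b, int64(pos)); err != nil {
		return nil, err
	}
	return b, nil
}

// roundTrip appends two records to a temp file and reads the second one back.
func roundTrip() (uint64, string, error) {
	f, err := os.CreateTemp("", "store-demo-*") // arbitrary temp file pattern
	if err != nil {
		return 0, "", err
	}
	defer os.Remove(f.Name()) // deferred first, so it runs after Close
	defer f.Close()

	s, err := newStore(f)
	if err != nil {
		return 0, "", err
	}
	if _, err := s.Append([]byte("hello")); err != nil {
		return 0, "", err
	}
	pos, err := s.Append([]byte("world"))
	if err != nil {
		return 0, "", err
	}
	b, err := s.readBytes(pos, 5)
	if err != nil {
		return 0, "", err
	}
	return pos, string(b), nil
}

func main() {
	pos, got, err := roundTrip()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(pos, got) // 5 world
}
```

Writes only ever touch buf, and reads only ever touch the embedded file. The Flush inside readBytes is redundant while Append flushes every time, but it keeps the read path correct if that per-append flush is later removed for throughput.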