Writing network servers that can handle thousands, or even hundreds of thousands, of concurrent connections is a classic challenge, especially under high load. Go makes this remarkably doable with its built-in concurrency model and runtime-integrated I/O multiplexing. Even though developers write seemingly blocking calls like `conn.Read()` or `ln.Accept()`, Go handles them efficiently behind the scenes using non-blocking I/O with epoll on Linux and kqueue on BSD-based systems like macOS. This article dives into how this works, with code examples, low-level insight, and practical guidance.
High-Load Environment Realities
Before we get into the runtime and syscalls, let’s frame the challenges:
- High connection counts: servers handling tens or hundreds of thousands of concurrent clients.
- Resource constraints: file descriptors, goroutine stacks, context switching, and memory/CPU pressure.
- Blocking pitfalls: if each connection blocks in a system call, traditional thread-per-connection models become untenable at scale.
Go addresses these through lightweight goroutines and a smart runtime overlaid on efficient multi-descriptor I/O readiness mechanisms.
The Internals: Goroutines, Scheduler, and Netpoller
Goroutines & M:N Scheduler
Go’s runtime uses lightweight goroutines that start with just a few KB of stack and can easily scale to very high counts. These goroutines are multiplexed onto a limited number of OS threads via the M:N scheduler, with GOMAXPROCS determining how many logical processors (P’s) are available.
Blocking I/O in Go is Illusory
From your code’s perspective, operations like `conn.Read()` appear blocking. But internally, Go uses readiness-based polling, via epoll on Linux and kqueue on BSD/macOS, to manage I/O. If a socket isn’t ready, the goroutine is parked, the FD is registered with the netpoller, and the OS thread is freed to run other work. Once the FD is ready, the poller wakes the goroutine and reschedules it.
Netpoller: Bridging OS and Runtime
Go’s netpoller, implemented in files like `runtime/netpoll_epoll.go` and `runtime/netpoll_kqueue.go`, wraps epoll/kqueue and interfaces with the scheduler. Each file descriptor has a `pollDesc` used to coordinate parking and rescheduling. The runtime polls for readiness events and wakes the corresponding goroutines as needed.
Epoll (Linux) vs. Kqueue (BSD/macOS)
| Feature | epoll (Linux) | kqueue (BSD/macOS) |
| --- | --- | --- |
| Mechanism | `epoll_ctl` + `epoll_wait` | `kqueue()` + `kevent()` |
| Usage | Register FDs with `epoll_ctl`, then wait for ready FDs efficiently with `epoll_wait` | Register and wait in the same `kevent()` call; supports events beyond sockets (e.g. timers, signals) |
| Performance | Scales well to large numbers of FDs; low overhead | Comparably efficient; richer event types and flexible filters |
These mechanisms are integrated with Go’s runtime via netpoller, abstracting them away from developers.
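For intuition about what the netpoller does on your behalf, here is a Linux-only sketch of the register-and-wait cycle using the raw `syscall` wrappers. This is illustrative only, not how application code should be written, and not the runtime's actual implementation:

```go
//go:build linux

package main

import (
	"fmt"
	"syscall"
)

var nready int // number of ready FDs observed by the wait below

func main() {
	// A non-blocking listening socket, as net.Listen sets up internally.
	fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, 0)
	if err != nil {
		panic(err)
	}
	defer syscall.Close(fd)
	syscall.SetNonblock(fd, true)
	// Port 0 lets the kernel pick a free port.
	if err := syscall.Bind(fd, &syscall.SockaddrInet4{Port: 0}); err != nil {
		panic(err)
	}
	syscall.Listen(fd, 128)

	// Register the FD for read-readiness, as the netpoller does on open.
	epfd, err := syscall.EpollCreate1(0)
	if err != nil {
		panic(err)
	}
	defer syscall.Close(epfd)
	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(fd)}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, fd, &ev); err != nil {
		panic(err)
	}

	// Wait (up to 100 ms) for readiness events on any registered FD.
	events := make([]syscall.EpollEvent, 8)
	nready, err = syscall.EpollWait(epfd, events, 100)
	if err != nil {
		panic(err)
	}
	fmt.Println("ready FDs:", nready) // nobody connected, so the wait times out
}
```

The runtime performs this same register/wait dance for every socket, but maps each readiness event back to a parked goroutine instead of returning it to user code.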
Writing a High-Performance TCP Echo Server in Go
Here’s a simple example that leverages all of this under the hood:
How it works under the hood:
- `net.Listen`: creates a non-blocking listener socket via `syscall.SetNonblock`, wraps it in a `netFD`, and registers it with the netpoller.
- `Accept`: appears blocking, but if there’s no pending connection, the goroutine is parked and woken only when the listener FD is ready.
- `ReadString` (on a `bufio.Reader` wrapping the connection): similarly, if data isn’t available, Go parks the goroutine and frees the OS thread. When the FD is ready, the poller wakes and reschedules the goroutine.
Net result: the server efficiently manages a massive number of simultaneous connections using only a handful of OS threads.
Scaling to 10K+ Connections and Beyond
As seen in practical benchmarks (e.g., goperf.dev’s “Managing 10K+ Concurrent Connections in Go”), handling such huge scales requires more than just goroutines—it demands careful resource and runtime awareness:
- Monitor `ulimit -n` for file descriptor limits.
- Minimize per-connection memory usage and garbage allocations.
- Tune GOMAXPROCS and orchestrate scheduling properly to reduce contention.
- Use pools (`sync.Pool`) for reusing buffers to reduce GC pressure.
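As a concrete example of the last point, buffer reuse with `sync.Pool` might look like the following minimal sketch (the `process` helper and 4 KB buffer size are illustrative choices, not from the article):

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable 4 KB buffers so hot paths don't
// allocate a fresh buffer on every read, reducing GC pressure.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 4096) },
}

// process copies data through a pooled buffer; the copy stands in
// for a conn.Read into the buffer.
func process(data []byte) int {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf) // return the buffer for the next caller
	return copy(buf, data)
}

func main() {
	fmt.Println(process([]byte("hello"))) // prints 5
}
```

Under load, `Get` usually returns a recently freed buffer instead of allocating, so steady-state allocation per connection drops toward zero.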
The combined power of Go’s runtime netpoller and lightweight goroutines makes 10K+ concurrent connections practical—but only when system and application design are tuned accordingly.
Alternatives & Trade-Offs
Some developers opt for custom event-loop libraries like gnet, written in pure Go and directly invoking epoll/kqueue syscalls for ultra-low latency and memory usage in performance-critical environments. But while gnet can outperform the standard net package in some scenarios, it sacrifices generality and ease of use.
Forums like Reddit also note that although Go uses epoll under the hood, fine-grained user control over non-blocking behavior is limited, prompting interest in libraries such as CloudWeGo’s netpoll.
Summary of Flow (Under the Hood)
- Listener setup: `net.Listen` → `netFD` → non-blocking socket → registered with the netpoller.
- Accept loop: the accepting goroutine parks until a new connection is ready → the poller wakes it → `Accept` returns.
- Connection handling: `conn.Read()` → if no data, the goroutine parks; the FD is polled.
- Event occurrence: epoll/kqueue reports readiness → the poller maps the event to a goroutine → the goroutine is made runnable.
- Scheduler involvement: the awakened goroutine is picked up by an available OS thread (via a P) → resumes `Read` and processes the data.
This design avoids thread-per-connection models, minimizing resource usage while maximizing concurrency.
Conclusion
Go’s standard `net` package makes high-concurrency network programming remarkably easy by abstracting away the complexities of non-blocking I/O. Whether on Linux’s epoll or BSD/macOS’s kqueue, the runtime netpoller seamlessly integrates these readiness-based OS features with Go’s lightweight goroutines and M:N scheduler.
Developers simply write intuitive, sequential-style code (`net.Listen`, `Accept`, `conn.Read()`) and Go handles parking, readiness tracking, event polling, and scheduling behind the scenes. Behind those simple calls lies a finely tuned system: non-blocking sockets, netpoll registration, goroutine parking/unparking, and efficient OS thread reuse.
To build truly scalable servers:
- Raise system limits (e.g., file descriptors).
- Tune GOMAXPROCS for your workload and CPU count.
- Minimize allocations and GC impact with buffer reuse.
- Monitor for goroutine leaks and ensure goroutine lifecycles are bounded.
- Consider alternatives like `gnet` only when you need to squeeze extra performance beyond the standard library’s capabilities.
In essence, Go empowers developers to focus on application logic, not I/O wiring, while still achieving the robust scalability required for modern high-load servers. Whether you’re serving 10,000 or 100,000 concurrent connections, Go’s runtime and the net package provide the foundation—if you tune wisely.