Which is more secure, Capsicum or seccomp?

Architecturally, Capsicum's capability model is considered more robust for containing a compromise. Seccomp restricts which syscalls a process may make, but a process that is still allowed to call, say, openat() can reach any file its user ID and namespaces permit. Once a process enters capability mode, Capsicum enforces the principle of least authority (POLA) by construction: the process can reach only the resources explicitly delegated to it as capabilities (file descriptors carrying specific rights). This makes post-compromise damage limitation far more effective.

Why isn't Capsicum as widely used as seccomp if it's better?

The primary reason is ecosystem momentum. Linux's massive adoption means seccomp, despite its limitations, gets integrated into critical infrastructure like containers (Docker, Kubernetes) and browsers (Chrome, Firefox). Porting an application to Capsicum requires more invasive code changes, because every resource the process will need must be opened or delegated before it enters capability mode. A seccomp filter, by contrast, can often be wrapped around existing code with little restructuring. Network effects favor the established solution.

Can Capsicum and seccomp be used together?

Not in mainline kernels: Capsicum is a FreeBSD feature and seccomp a Linux one (a Linux port of Capsicum was prototyped but never merged upstream). However, the concepts can complement each other in a design. For instance, a Linux application could use seccomp-bpf to restrict syscalls and namespaces to isolate resources, approximating a capability-like model. On FreeBSD, Capsicum is often the primary sandboxing mechanism, sometimes combined with jails.

Is seccomp-bpf the same as traditional seccomp?

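No. Traditional 'strict mode' seccomp is extremely limited, allowing only the read(), write(), exit(), and sigreturn() syscalls; any other syscall, including exit_group(), kills the process. Seccomp-BPF (Berkeley Packet Filter) is the powerful, flexible evolution. It attaches a small classic-BPF program that decides on each syscall based on its number and the raw values of its arguments. The program cannot dereference pointer arguments, so it can see open()'s flags but not the path string. When people discuss 'seccomp' today, they almost always mean seccomp-bpf.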
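
As a concrete illustration of argument filtering, here is a minimal C sketch, assuming Linux on x86_64 or aarch64 (little-endian, with socket() as a direct syscall). The names `install_socket_filter`, `demo_socket_filter` and `SKETCH_AUDIT_ARCH` are hypothetical. The filter first checks the syscall ABI, then allows everything except socket() calls whose first argument (the address family) is not AF_UNIX; those fail with EPERM. The check runs in a forked child because an installed filter can never be removed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical macro: the audit arch this sketch expects. */
#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch assumes x86_64 or aarch64"
#endif

/* Hypothetical helper: socket() only for AF_UNIX; every other syscall allowed. */
static int install_socket_filter(void)
{
    struct sock_filter filter[] = {
        /* Kill the caller if the syscall ABI is not the expected one. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* Anything that is not socket() jumps straight to ALLOW. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socket, 0, 3),
        /* Low 32 bits of args[0] (little-endian layout): the address family. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AF_UNIX, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    assert(sizeof filter / sizeof filter[0] <= BPF_MAXINSNS);
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Hypothetical demo: returns 0 if AF_UNIX sockets still work and AF_INET is
 * refused with EPERM; other values describe what went wrong. */
int demo_socket_filter(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* no_new_privs lets an unprivileged process install a filter. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 || install_socket_filter() != 0)
            _exit(2);
        if (socket(AF_UNIX, SOCK_STREAM, 0) < 0)
            _exit(3);
        int inet_fd = socket(AF_INET, SOCK_STREAM, 0);
        _exit(inet_fd < 0 && errno == EPERM ? 0 : 4);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A caller simply checks that `demo_socket_filter()` returns 0. The x32 ABI on x86_64 is not handled here: its syscall numbers carry `__X32_SYSCALL_BIT` under the same audit arch, so a production filter would reject them explicitly.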

What is the single biggest conceptual difference between the two?

The core difference is the security model. Seccomp operates on the *system call* as the unit of control ("Is this process allowed to call open()?"). Capsicum operates on the *object* or *resource* as the unit of control ("Does this process have a capability to this specific file descriptor?"). This shifts focus from the action to the authority, enabling finer-grained and more meaningful isolation.

The Great Sandboxing Duel: Why Capsicum's Capability Model Beats Seccomp's Syscall Filtering

A Tale of Two Philosophies: From Syscall Jails to Capability Havens

The quest to confine untrusted code within an operating system is as old as multi-user computing itself. In the modern era, this battle has crystallized around two distinct approaches exemplified by FreeBSD's Capsicum and Linux's seccomp. To understand their clash, one must look beyond API calls and filter rules to the foundational security models they embody.

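Seccomp (Secure Computing Mode) emerged from the pragmatic need to limit the damage a compromised process could do. Its evolution from a brutally simple "strict mode" (allowing only 4 syscalls) to the programmable powerhouse of seccomp-bpf mirrors Linux's own growth. A filter can be written as a denylist (block named syscalls, allow the rest) or as an allowlist (deny by default, permit a named set), and serious deployments such as browser sandboxes and Docker's default profile use allowlists. Either way, the unit of policy is the syscall: a process permitted to call openat() keeps whatever ambient authority its user ID and namespaces grant over the filesystem. Administrators and developers craft intricate BPF programs to permit or block specific syscalls and inspect their arguments. This model is powerful and flexible, but it is inherently reactive, a constant arms race against loopholes in the filter rules. A newly added syscall such as clone3() or openat2() can reach the same functionality as a filtered one while passing its arguments in a structure the filter cannot read.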
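
A default-deny filter is structurally simple, as this minimal C sketch suggests (Linux on x86_64 or aarch64 assumed; `install_allowlist_filter`, `demo_allowlist_filter` and `SKETCH_AUDIT_ARCH` are hypothetical names). Only read(), write(), exit() and exit_group() are allowed, and every other syscall returns EPERM. Even so, the policy is phrased entirely in syscalls: the permitted write() works on any descriptor the process happens to hold.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical macro: the audit arch this sketch expects. */
#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch assumes x86_64 or aarch64"
#endif

/* Hypothetical helper: default-deny filter with a four-syscall allowlist. */
static int install_allowlist_filter(void)
{
    struct sock_filter filter[] = {
        /* Kill the caller if the syscall ABI is not the expected one. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* Each match jumps forward to the final ALLOW. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_read, 4, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 3, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit, 2, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit_group, 1, 0),
        /* Default: deny everything not listed. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    assert(sizeof filter / sizeof filter[0] <= BPF_MAXINSNS);
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Hypothetical demo: returns 0 if an unlisted syscall (getppid) is denied by
 * the filter while a listed one (write) still reaches the kernel. */
int demo_allowlist_filter(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 || install_allowlist_filter() != 0)
            _exit(2);
        /* getppid() is not on the list: the filter answers EPERM. */
        long ppid = syscall(SYS_getppid);
        int denied = (ppid == -1 && errno == EPERM);
        /* write() is on the list: the kernel itself rejects the bad fd (EBADF). */
        ssize_t n = write(-1, "x", 1);
        int allowed = (n == -1 && errno == EBADF);
        _exit(denied && allowed ? 0 : 3);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Real allowlists are far longer, since the C library, the dynamic loader and the memory allocator issue many syscalls of their own; keeping that list accurate across library and kernel upgrades is the maintenance cost of this approach.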

Capsicum, born from academic research at the University of Cambridge and integrated into FreeBSD, takes a radically different approach. It's a capability-based security model. When a process enters capability mode with cap_enter(), its access to global namespaces such as the filesystem is cut off: an open() on an absolute path, for example, fails with ECAPMODE. It can interact with the world only through capabilities, unforgeable tokens of authority that take the form of file descriptors, either opened before entering capability mode or received over a UNIX-domain socket. This enforces a Principle of Least Authority (POLA) by construction. The process isn't thinking about which syscalls it can't make; it simply has no way to name resources it wasn't explicitly given.
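
Here is a minimal C sketch of that flow, assuming FreeBSD, where <sys/capsicum.h>, cap_enter() and ECAPMODE are available. The function `demo_capability_mode` is hypothetical; on other platforms it compiles to a stub that fails with ENOSYS. A child process opens a directory descriptor, enters capability mode, and then finds that an absolute-path open() is refused while openat() relative to the descriptor it already holds still succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#if defined(__FreeBSD__)
#include <sys/capsicum.h>
#endif

/* Hypothetical demo: returns 0 if, inside capability mode, dir/name is
 * unreachable by absolute path but reachable through a pre-opened
 * descriptor for dir; 1 on any other outcome; -1 on setup failure. */
int demo_capability_mode(const char *dir, const char *name)
{
    assert(dir != NULL && name != NULL);
#if defined(__FreeBSD__)
    char abs_path[1024];
    if (snprintf(abs_path, sizeof abs_path, "%s/%s", dir, name) >= (int)sizeof abs_path)
        return -1;
    /* The capability: a directory descriptor acquired BEFORE cap_enter(). */
    int dir_fd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dir_fd < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(dir_fd);
        return -1;
    }
    if (pid == 0) {
        if (cap_enter() != 0)
            _exit(2);
        /* Global namespace: absolute lookups are refused with ECAPMODE. */
        int fd = open(abs_path, O_RDONLY);
        if (fd >= 0 || errno != ECAPMODE)
            _exit(3);
        /* Delegated authority: a lookup relative to dir_fd still works. */
        fd = openat(dir_fd, name, O_RDONLY);
        _exit(fd >= 0 ? 0 : 4);
    }
    close(dir_fd);
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 0 : 1;
#else
    (void)dir;
    (void)name;
    errno = ENOSYS; /* Capsicum is not available on this platform */
    return -1;
#endif
}
```

On a stock FreeBSD system, `demo_capability_mode("/etc", "passwd")` should return 0. The directory descriptor is the delegated authority: lookups relative to it work, while the same file named by absolute path is unreachable.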

The Real-World Stress Test: Browsers, Databases, and Containers

How do these models fare under fire? The most public proving ground is the web browser.

Google Chrome and Mozilla Firefox employ seccomp-bpf extensively on Linux, layered with namespaces, to sandbox renderer processes, network services, and media decoders. Their filter policies are large, painstakingly maintained lists of allowed syscalls per process type. A vulnerability that lets a sandboxed process execute a forbidden syscall, or misuse an allowed one, can break containment. The model's complexity is its Achilles' heel.

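On FreeBSD, a browser like Chromium can be built around Capsicum instead; the original Capsicum research included a Capsicum port of Chromium's sandbox. In such a design, the renderer process enters capability mode as soon as it launches. It holds a handful of capabilities: a shared memory segment or socket for communicating with the browser process, perhaps a socket for network access (if needed), and a tightly restricted file descriptor for cache storage. It has no way to name /etc, /dev, or the user's home directory. Even if fully compromised, it can reach far fewer resources. There is no per-application syscall filter to get wrong: syscalls that would touch a global namespace fail with ECAPMODE, and every operation on a descriptor is checked against the rights attached to it. Those rights can be narrowed further with cap_rights_limit(), for example to read-only access.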
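
The rights narrowing mentioned above can be sketched in C as well, again assuming FreeBSD and using a hypothetical `demo_read_only_capability` function (an ENOSYS stub elsewhere). Rights are enforced on the descriptor whether or not the process is in capability mode, so no fork is needed: the sketch limits a scratch file's descriptor to CAP_READ and CAP_FSTAT, checks that write() then fails with ENOTCAPABLE, and confirms that the rights cannot be widened again.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#if defined(__FreeBSD__)
#include <sys/capsicum.h>
#endif

/* Hypothetical demo: returns 0 if a descriptor limited to CAP_READ|CAP_FSTAT
 * refuses writes and refuses to regain CAP_WRITE; 1 on any other outcome;
 * -1 on setup failure (with errno == ENOSYS off FreeBSD). */
int demo_read_only_capability(const char *scratch_dir)
{
    assert(scratch_dir != NULL);
#if defined(__FreeBSD__)
    char path[1024];
    if (snprintf(path, sizeof path, "%s/capsicum-sketch-XXXXXX", scratch_dir) >= (int)sizeof path)
        return -1;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the open descriptor keeps the empty file alive */

    cap_rights_t rights;
    cap_rights_init(&rights, CAP_READ, CAP_FSTAT);
    if (cap_rights_limit(fd, &rights) != 0) {
        close(fd);
        return -1;
    }
    int result = 0;
    /* write() needs CAP_WRITE, which this descriptor no longer carries. */
    if (write(fd, "x", 1) != -1 || errno != ENOTCAPABLE)
        result = 1;
    /* read() is still permitted; the file is empty, so it returns 0. */
    char c;
    if (read(fd, &c, 1) != 0)
        result = 1;
    /* Rights can only shrink: asking for CAP_WRITE back fails. */
    cap_rights_t wider;
    cap_rights_init(&wider, CAP_READ, CAP_WRITE, CAP_FSTAT);
    if (cap_rights_limit(fd, &wider) != -1 || errno != ENOTCAPABLE)
        result = 1;
    close(fd);
    return result;
#else
    (void)scratch_dir;
    errno = ENOSYS; /* Capsicum is not available on this platform */
    return -1;
#endif
}
```

Because rights travel with the descriptor, a process that receives this descriptor over a UNIX-domain socket inherits the same read-only limit.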

In the container world, seccomp profiles serve as a defense-in-depth layer alongside namespaces and cgroups: Docker applies a default profile to every container unless told otherwise, and Kubernetes can apply the container runtime's default profile (RuntimeDefault) or a custom one. Docker's default profile is an allowlist that, among other things, denies keyctl() and refuses clone() calls that would create new namespaces unless the container holds CAP_SYS_ADMIN. However, crafting correct, application-specific profiles is notoriously difficult, often leading to over-permissive rules that weaken security. A Capsicum-inspired model for containers would instead launch the containerized application directly into capability mode with a carefully chosen set of delegated rights, a vision some next-generation container runtimes are exploring.

Beyond the Binary: Convergence and the Future of Isolation

Declaring a single "winner" is simplistic. The landscape is converging. Linux developers recognize the limitations of pure syscall filtering.

Landlock, a Linux Security Module (LSM) merged in Linux 5.13, is a direct move towards a capability-like model. It allows even unprivileged processes to restrict themselves to a subset of the filesystem hierarchy, a concept much closer to delegating a capability to a directory than to filtering the open() syscall. While not as comprehensive as Capsicum, it signals a philosophical shift.

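Meanwhile, Google's Sandbox2 (part of the open-source Sandboxed API project) employs a multi-layered strategy. It uses seccomp-bpf as one layer but combines it with Linux namespaces, giving the sandboxed process a mount namespace that contains only the files and directories the policy explicitly maps in, which behaves more like object-level delegation than syscall filtering. gVisor takes a different hybrid route, placing a user-space kernel between the application and the host and running that kernel under its own tight seccomp filter. These hybrid approaches acknowledge that the future of sandboxing isn't a choice between Capsicum OR seccomp, but a synthesis of the best ideas from both: the granular, object-centric authority of capabilities, with the deployable, fine-tuned control of syscall filters.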

For developers and architects today, the choice is often dictated by platform. On Linux, mastering seccomp-bpf and layering it with namespaces is essential. On FreeBSD, understanding Capsicum and its synergy with Jails offers a uniquely powerful isolation toolkit. For those designing new security-critical systems from scratch, however, studying Capsicum's capability model is no longer an academic exercise—it's a blueprint for building inherently more contained and resilient software, regardless of the underlying kernel primitives available.