Revolutionizing Nixpkgs by using readable modularity

An analysis of nixpkgs, current packaging practices and decentralization of packaging

This post follows from personal frustrations encountered when trying to package software for Nix. It is not aimed at discrediting the enormous amount of effort and contributions the ecosystem has put into Nixpkgs. This piece will also not be discussing details of the Nix evaluator itself. This is a purely conceptual (but practical) discussion within the Nix language about packaging, modularity and readability.

A deep dive in Nixpkgs #

A few months ago, I started reading the low-level documentation and implementation of the nixpkgs repository. One of the main challenges was simply locating the source files of the expressions I encountered; a good grepping tool proved vital. Below I list some of the interesting observations I made while exploring the source, along with some related issues and PRs.

Cross-compilation bootstrapping #

Reading about stdenv bootstrapping, splicing and cross-compilation was very confusing. It is not clear at first glance what the exact data flow is that leads to stdenv.mkDerivation as we so dearly know it. Bootstrapping for different build and run platforms is implemented in a sliding window fashion. The idea works well in theory but the current implementation seems to be missing (some) clarity. The manual also states:

Conversely if one wishes to cross compile “faster”, with a “Canadian Cross” bootstrapping stage where build != host != target, more bootstrapping stages are needed

The theory allows for an "easy" way to provide cross-compiled builds for many different combinations of build and run platforms, but the currently implemented method effectively restricts the combinations that are possible.

Package scopes and overlays #

Nixpkgs employs a number of methods to group packages into subsets (or scopes). Uniformity is a highly desired quality in this context. It is especially important that contributors have a clear picture of how they can make use of a scope.

Another important quality of these scopes is the ability to allow overrides or extensions. There are multiple ways to do this. Let's have a look at some methods that are based on the calculation of a fixpoint.
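
For concreteness, here is the basic pattern that the three variants below build on: an attribute set defined as the fixpoint of a function of its own final result. This is a minimal sketch; fix mirrors what nixpkgs provides as lib.fix.

let
  fix = f: let x = f x; in x;
in fix (self: {
  greeting = "hello";
  # `message` refers to the *final* value of `greeting`, whatever it ends up being.
  message = "${self.greeting}, world";
})
# => { greeting = "hello"; message = "hello, world"; }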

"Overlay" extensions #

[Figure: data flow of an overlay extension]

Overlays allow an extension to change everything. That means it can override any attribute introduced by the base set or by an earlier extension, and it can depend on the final value of any attribute, including values that are only supplied by later extensions.

This turns out to be a very useful and elegant way of composing different attributes within a fixpoint. Because an extension can override anything that was applied earlier, it is a "transparent" extension: a base set and a base set with one or more extensions behave exactly the same with respect to their output and composability.
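
As a minimal sketch of this transparency, the hand-rolled extend below mirrors the behaviour of nixpkgs' lib.extends: an overlay receives both the final result and the previous layer, and its override is visible to earlier definitions through the fixpoint.

let
  fix = f: let x = f x; in x;
  # An overlay sees the final result (`final`) and the previous layer (`prev`).
  extend = base: overlay: final:
    let prev = base final;
    in prev // overlay final prev;

  base = final: {
    greeting = "hello";
    message = "${final.greeting}, world";
  };
  overlay = final: prev: {
    greeting = prev.greeting + "!";   # override based on the previous value
  };
in (fix (extend base overlay)).message
# => "hello!, world" — the override is also picked up by `message` from the base layer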

"Add-update" extensions #

[Figure: data flow of an add-update extension]

Add-update extensions allow an extension to add new attributes to the set and to update attributes that already exist, but the updated value is only visible to that extension and to later stages; earlier stages keep the value they originally saw.

Note that this type of extension does not allow changes to an earlier extension or even the base attribute set. This makes it an interesting way to guarantee that previous stages are not affected by future extensions. I am not aware of projects that make use of this extension type.

This type of extension is not "transparent", in the sense that an attribute's data dependencies can differ between its definition in an extension and its effective meaning in the final output. Replacing an attribute in a later stage cannot change the behaviour of earlier extensions that depend on that attribute, even when those extensions come after the stage where the attribute was first added to the set.
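
One way to model this semantics is a simple left fold in which each extension only ever sees the set built so far; this is my own sketch, not code taken from nixpkgs. Since stages cannot depend on future extensions, no fixpoint across stages is needed here.

let
  # Each extension maps the set built so far to new/updated attributes.
  addUpdate = base: exts: builtins.foldl' (acc: ext: acc // ext acc) base exts;

  base = { greeting = "hello"; message = "hello, world"; };
  ext1 = prev: { greeting = prev.greeting + "!"; };   # update based on the old value
in addUpdate base [ ext1 ]
# => { greeting = "hello!"; message = "hello, world"; }
#    `message`, produced by the earlier stage, is unaffected by the update.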

"Append-only" extensions #

[Figure: data flow of an append-only extension]

Append-only extensions allow an extension to add new attributes to the set, but never to replace or modify an attribute that already exists. Like overlays, an extension may still depend on the final value of any attribute, including attributes added by later extensions.

Note that this type of extension makes it impossible to modify an attribute based on its old value. There are no data dependencies allowed in the fixpoint between attributes with the same name.
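
A minimal sketch of this variant, again my own modelling: extensions may read the final fixpoint, but an assertion rejects any attempt to redefine an existing name.

let
  fix = f: let x = f x; in x;
  appendOnly = base: exts: fix (final:
    builtins.foldl'
      (acc: ext:
        let new = ext final;
        in assert builtins.intersectAttrs acc new == { }; acc // new)
      (base final)
      exts);

  base = final: { greeting = "hello"; shout = final.loud; };   # depends on a future attribute
  ext1 = final: { loud = final.greeting + "!"; };
in (appendOnly base [ ext1 ]).shout
# => "hello!" — new attributes may be added and referenced, existing ones never change.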

Conclusion #

The overlay type is the current widely-accepted solution for applying extensions to a fixpoint calculation. It is very elegant and simple to define but it may cause more changes than initially expected due to its transparent nature.

The "add-update" extension type is more restrictive than an overlay and may be useful in scenarios like bootstrapping stdenv. It guarantees that each successive stage cannot interfere with previous stages, making it more straightforward to reason about the progression. Another interesting property of this extension type is that stages cannot depend on future extensions, making it easier to reason about the data dependencies when bootstrapping something like stdenv. Both overlay and append-only extensions allow this future-dependency.

The "append-only" extension type is mainly listed for illustrative purposes rather than for its usefulness. I have not (yet) been able to come up with a use case that isn't already best fulfilled by the overlay or add-update extension type.

Package/derivation definitions #

The current methods to define a package/derivation using one or more language technologies are heavily based on the builder used in mkDerivation (i.e. stdenv). This is certainly an interesting and useful builder, but it also forces packages to follow the stages defined by this builder. From the source code it is clear that cross-compilation was not part of this builder initially and was only "patched" on at a later time.

Setup hooks are another sneaky, hidden mechanism: they are necessary for successful builds, but they are otherwise hard to reason about and they break any composition done in Nix by escaping into bash.

As with scopes, it is important to have clear and uniform override semantics across different technologies; a fixpoint with overlays seems quite reasonable to use for a package/derivation definition.

A call for innovation #

The monolithic nature of Nixpkgs makes it hard to maintain at scale, and new packages added to the repository are inevitably slow to propagate to the different channels. A more decentralized approach is necessary to allow for easier composition and faster innovation. Concepts like semantic versioning are widely accepted as a good way to move a software project forward. This could be an interesting direction to research.

I have been experimenting with a different approach to decentralized package composition that is more advanced and useful than just adding an overlay to the global namespace of a complex fixpoint. Consider it a rewrite of the fundamentals that learns from Nixpkgs and encourages developers to innovate more freely in a modular framework.

I'm very happy to introduce corelib (for lack of a better name) as a practical experiment in providing a non-monolithic solution for sharing Nix recipes and software builds. There are a number of interesting differences from Nixpkgs, some challenges to get it adopted practically, and some open questions that require discussion from the community.

Scopes as the default #

corelib starts from the idea of putting everything in a dedicated scope. This is especially good for modularity and increases the predictability of the final build recipe of a package. Multiple scopes are combined and evaluated into derivations or similar outputs. A scope always knows its own dependencies and can access the expressions declared in those dependencies. Consider the following:

example = mkPackageSet {
  packages = self: {
    one = import ./one.nix;
    two = import ./two.nix;
  };
  lib = lib: {
    add1 = n: n + 1;
  };
  dependencies = {};
};

This is a package set or scope that contains two packages (named one and two) and declares a pure function add1. Note that both the packages and the pure functions are declared in terms of a fixpoint; this maximizes the flexibility of a package set definition. This simple example does not have any dependencies, hence it cannot access any package or lib function that is not defined in the set itself.

Because pure functions are quite a bit simpler to reason about, I'll first give an example where a dependency is used for access to additional pure functions.

stdLib = mkPackageSet {
  lib = lib: {
    helper = a: b: a + b;
    add = a: b: lib.self.helper a b;
  };
};

example = mkPackageSet {
  lib = lib: {
    plus5 = n: lib.std.add 5 n;
  };
  dependencies = {
    std = stdLib;
  };
};

There are a number of things to discuss about this example. There is a standard library package set that only exports pure functions. This standard library can make use of its own functions by referring to the self scope in lib. The example package set then uses one of the functions from this stdLib dependency by referring to it by the name that was assigned in the dependency list (i.e. std).

This is a good example of the modularity introduced by corelib. A package set assigns a name to each dependency, making it accessible under that name. Because attribute names in a set must be unique, dependencies automatically receive unique names. The contract between package sets consists of the specific attributes they present in their scopes: as long as a scope knows which attributes a dependency will provide, it can give that dependency a fixed name and safely refer to those attributes under the dependency's scope.
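
To illustrate that the chosen name is purely local to the consuming scope, the (hypothetical) set below binds the same stdLib under a different name; everything else stays identical.

# Hypothetical: the same stdLib from above, bound under a different local name.
example2 = mkPackageSet {
  lib = lib: {
    plus10 = n: lib.mystd.add 10 n;
  };
  dependencies = {
    mystd = stdLib;   # local alias; only the attribute contract matters
  };
};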

The same idea of exposing expressions under scope attributes applies to packages as well: each scope contains a fixpoint through the self scope, and all dependencies are accessible under their assigned names.

Cross-compilation #

Packages are considerably more tricky to compose than pure functions, because of cross-compilation. The term "packages" is not actually a good term here (it's open for improvement); it really means "any Nix expression that depends on a build and host/run (and target) platform". This includes packages, but also functions like mkDerivation that may already depend on the build platform triple.

Consider the following theoretical idea for the definition of a package:

# Minimal set of utilities.
core:
mkPackage {
  /* The package recipe function -> returns derivation

    This can be anything that builds from given dependencies,
    mkDerivation is used purely for the sake of example and familiarity for
    current nixpkgs users.
  */
  function = { mkDerivation, libressl, ... }: mkDerivation {
    buildInputs = [ libressl.onHost ];
    # ...
  };
  # This package has a dependency, defaults have to be specified.
  dep-defaults = { pkgs, lib, ... }@args: {
    inherit (pkgs.ssl) libressl;
    inherit (pkgs.stdenv) mkDerivation;
  };
}

There is even more to unpack here. First of all, mkPackage allows for some QoL functionality regarding override semantics. This is something I have not worked on yet (currently it would just pass on the attribute set given to it), and it is very much an open question. The design makes a distinction between the recipe that builds a derivation from dependencies and the actual dependencies that should be used. This makes packaging a lot more robust against ecosystem changes: there is a global contract for how a package should behave, and it is explicit and clear in the way it should be defined. Even if the evaluation of the package changes, its recipe can always be reused given the proper dependencies/dependency format. This only makes sense, of course, if these expressions are publicly accessible.
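
As a sketch of what such override functionality could look like, the recipe/dependency split would allow re-evaluating the same recipe against different dependencies. This is purely hypothetical; overrideDeps is an assumed helper, not something corelib provides today.

# Hypothetical: swap the default libressl for openssl without touching the recipe.
myPackage.overrideDeps ({ pkgs, ... }: {
  libressl = pkgs.ssl.openssl;   # the recipe still receives it under the name `libressl`
})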

The scoping happens in dep-defaults. When the package/derivation is evaluated, its dependencies are first collected through dep-defaults; these dependencies are given directly to function and its result is returned. Hence dep-defaults is also able to pass default feature flags on to function. Note that the same scoping interface is presented as for pure functions: a self scope that introduces a fixpoint over the package set, and an explicitly named scope for every dependency of the package set this package is included in.

Cross-compilation is injected through the use of "explicit" splicing. Every package presented in the dependency scopes is automatically spliced through the attributes onBuild, onHost and onTarget. This forces the recipe to make an explicit choice about which cross-compiled variant of a dependency it should use during its build. If the package concerns a target platform (i.e. it emits code), it defines targetPlatform as a non-null attribute in function's return value, and the six attributes onBuildForBuild, onBuildForHost, onBuildForTarget, onHostForHost, onHostForTarget and onTargetForTarget are provided instead. This encodes the explicit contract that, if the target platform does not matter, each recipe must produce exactly the same result for a given buildPlatform and hostPlatform regardless of the targetPlatform passed to dep-defaults and function. This makes sense and is an improvement over the current implementation in Nixpkgs.
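
For instance, a recipe that needs both a build-time tool and a runtime library has to pick the splice explicitly. The fragment below is hypothetical (pkg-config is an assumed dependency), but it follows the interface described above.

function = { mkDerivation, pkg-config, libressl, ... }: mkDerivation {
  nativeBuildInputs = [ pkg-config.onBuild ];   # runs on the build platform
  buildInputs = [ libressl.onHost ];            # ends up linked into the host output
  # ...
};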

Bootstrapping a package set for a given platform triple can then be expressed very elegantly:

# Bootstrap for a single platform (x86_64-linux)
pkgs = bootstrap (self: {
  final = {
    triple = {
      buildPlatform = "x86_64-linux";
      hostPlatform = "x86_64-linux";
      targetPlatform = "x86_64-linux";
    };
    adjacent = {
      pkgsBuildBuild = self.final;
      pkgsBuildHost = self.final;
      pkgsBuildTarget = self.final;
      pkgsHostHost = self.final;
      pkgsHostTarget = self.final;
      pkgsTargetTarget = self.final;
    };
  };
}) packageSets;

Given an attribute set of package scopes (packageSets), the different stages are defined as a function of the required platform triples. The different host-target platform combinations are simply expressed in terms of a "fake" fixpoint self. For a cross-compiled build that compiles on platform A so that the compiled code runs on platform B, there would be three attribute sets defined in the "fixpoint". The adjacent package sets must be defined correctly in terms of these three attribute names taken from self; the rest happens automatically.

This approach is considerably more readable, elegant and expressive for bootstrapping a cross-compiled package set, and it does not prevent evaluating packages for a "Canadian Cross" build.

CallPackage replacement #

The dep-defaults mechanism makes it difficult to imitate the convenience of callPackage. My current implementation therefore makes it easy to evaluate the same package format in the context of the package scope without exporting it to the public expressions. An example is given next.

# Minimal set of utilities.
core:
mkPackage {
  function = { mkDerivation, autoCall, ... }: let
    # The same package format can be evaluated in the same context given to `dep-defaults`.
    helper-package = autoCall ./helper-package.nix;
  in mkDerivation {
    # ...
  };
  # This package has a dependency, defaults have to be specified.
  dep-defaults = { pkgs, lib, autoCall, ... }@args: {
    inherit (pkgs.stdenv) mkDerivation;
    inherit autoCall;
  };
}

The idea is that autoCall evaluates the same package format within the context of the package set that is currently being evaluated. This context consists of the self and dependency scopes presented to dep-defaults. While the private nature of this "hidden" package makes it impossible to provide explicitly spliced versions, it may still be useful, as it corresponds to a forced dependency equivalent to onHostForTarget.

Open questions #

With the introduced framework, it may be easier to make nixpkgs modular and decentralized. I deliberately did not specify how these package scope expressions should be collected into one Nix evaluation, because there are numerous ways (some even still upcoming) to do this. Flakes, pinned inputs and the Ekala project are amongst the possibilities.

Further questions regarding the project remain:

Credits #

Most of my inspiration comes from some excellent work in PR #227327 and PR #273815. Credits to the wonderful contributors adding ideas and suggestions to these discussions.

Finally, thanks to all the readers who will hopefully be motivated to give me feedback and work on a better Nix experience for the future. Actively suggesting improvements is the best way to help this prototype grow into a practical and proven project. The craziest ideas are most welcome, because these sometimes have the best chance of actually improving the UX of the ecosystem.