- How to create an archive manually.
  - How to create a simple package to install a Go binary using the `dpkg` command.
  - What is the format of the archive. How to check its content.
  - How to create the same package using standard Unix tools.
  - Case study: How to create a package in Go.
- What happens when you install a package using `dpkg`.
  - What the database contains.
  - How files are copied to the host.
  - What changes in the database.
  - How to check that the package has been installed.
  - Case study: How to install a package like `dpkg` in Go.
- What happens when you install a package using `apt`.
  - How the command `apt` knows where to search for packages.
  - What is the format of a repository.
  - What the command `apt update` does.
  - How `apt` uses `dpkg` under the hood.
  - Case study: How to install a package like `apt` in Go.
A Linux package is a bundle of files that your package manager knows how to unpack on your system. Installing packages is something you do regularly, and I suggest we look under the hood to understand the steps between the creation and the installation of a Linux package.

I assume you have already installed many Linux packages. A basic understanding of the C and C++ languages is required, and familiarity with the Go language will help you follow the case studies.
Table of Contents
- How to create an archive manually.
  - What you need to know about the Debian package format, the `dpkg` command, the DEB822 format.
  - The command `dpkg --build`.
  - The implementation in Go.
- What happens when you install a package using `dpkg`.
  - What you need to know about conffiles, the Dpkg database.
  - The command `dpkg -i`.
  - The implementation in Go.
- What happens when you install a package using `apt`.
  - What you need to know about `apt`, `apt-get`, `aptitude`, configuration files, configuration options, source lists, repositories, diffs, `/var/cache/apt/`, `/var/lib/apt/`, cache files.
  - The commands `apt update`, `apt list`, and `apt install`.
  - The implementation in Go.
The repositories `dpkg` and `apt` contain more than 100,000 lines of code.
When trying to explain how code works, there is a tough balance to strike between showing the code untouched and simplifying it at the risk of denaturing it. In this post, I decided to use both approaches. I present the original code slightly annotated, removing only debug messages and the support of command flags not covered in this article. I also present a minimal, richly commented rewrite of these programs in Go. Overall, that represents a lot of code, but as developers, we are used to skimming over large codebases, and I hope you will find your way.
In addition, there are many asides to explain some Dpkg and Apt features that you can safely skip if you are already familiar with the tools.
Please remember that if you find the post too long to read, just imagine how long it was to write it 😁. Happy reading!
How to create a package manually
Linux packages are commonly available as `.deb` and `.rpm` files.

- The `.deb` files are meant for distributions of Linux that derive from Debian (Ubuntu, Linux Mint, etc.).
- The `.rpm` files are used primarily by distributions that derive from Red Hat (Fedora, CentOS, RHEL).

There are two main families of Linux distributions, Red Hat and Debian, and each one has its own package format: `.rpm` for the Red Hat Package Manager and `.deb` for Debian.
Both package formats have a lot in common and we will only discuss Debian packages in this document. The following table summarizes the main differences between the archive files.
|  | .rpm | .deb |
|---|---|---|
| Archive Format | Uses the cpio command and file format | Uses the ar command and file format |
| Package Manager | rpm (1997, written in C) | dpkg (1993, written in C) |
| Frontend Package Manager | yum (2011, written in Python) | apt (1999, written in C++) |
| Database | /var/lib/rpm | /var/lib/dpkg |
| Database Format | Berkeley DB files | DEB 822 flat files |
A package is a collection of files to distribute applications or libraries via the Debian package management system. The aim of packaging is to allow the automation of installing, upgrading, configuring, and removing computer programs in a consistent manner.
A `.deb` file is an `ar` archive. The `ar` command is an ancestor of the common `tar` command and was already present in the first Unix version in 1971! Now, this command is (mostly) only used by Debian packages. This archive contains 3 files:
- `debian-binary`: A text file containing `2.0\n`. This states the version of the deb file format. For 2.0, all other lines get ignored.
- `data.tar.gz`: A `tar` archive containing all files that will be installed with their destination paths.

  ```
  ./
  ./sbin/
  ./sbin/parted
  ./usr/
  ./usr/share/
  ./usr/share/man/
  ./usr/share/man/man8/
  ./usr/share/man/man8/parted.8.gz
  ./usr/share/doc/
  ./usr/share/doc/parted/
  ./usr/share/doc/parted/README.Debian
  ./usr/share/doc/parted/copyright
  ./usr/share/doc/parted/changelog.Debian.gz
  ./usr/share/doc/parted/changelog.gz
  ```

- `control.tar.gz`: A `tar` archive containing various files useful for the `dpkg` command to do its job: metadata about the package (`control`) including the list of required dependencies, the md5 sums of every data file to check integrity (`md5sums`), and also maintainer scripts (ex: `postinst` for post-installation, `prerm` for pre-removal, etc.), which are executables that must be run when installing or removing a package.

  ```
  control
  md5sums
  postinst
  prerm
  ```
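To make this structure concrete, here is a minimal Go sketch that lists the members of a `.deb` file, similar to `ar -t`. It relies on the third-party `github.com/blakesmith/ar` module, the same one used in the case studies later in this article:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"github.com/blakesmith/ar"
)

func main() {
	// Usage: go run list.go hello.deb
	f, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// A .deb file is a plain ar archive: iterate over its members.
	reader := ar.NewReader(f)
	for {
		hdr, err := reader.Next()
		if err == io.EOF {
			break // No more members
		}
		if err != nil {
			log.Fatal(err)
		}
		// Typically: debian-binary, control.tar.gz, data.tar.gz
		fmt.Printf("%s (%d bytes)\n", hdr.Name, hdr.Size)
	}
}
```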
Further documentation:
- 5 reasons why a Debian package is more than a simple file archive, Raphaël Hertzog
- Debian New Maintainers’ Guide, the official procedure to create a package the “Debian way”.
You can also learn more about Debian packages by installing a Debian package 😀 (the PDF is also available online):
```
$ apt install packaging-tutorial
# Check /usr/share/doc/packaging-tutorial/packaging-tutorial.pdf
```
dpkg
The project Dpkg started in 1994, at the same time the Debian package format was created, and thus the command `dpkg` works only with `.deb` binary archives. You must provide the archive as the command does not know how to retrieve it by itself. The command manages a database stored under `/var/lib/dpkg` to keep note of everything that is installed on the server, which is essential to determine what to clean when you remove a package.

Note that the command `dpkg --build` redirects to the command `dpkg-deb --build` and the command `dpkg --list` redirects to the command `dpkg-query --list`. The code of these commands is present in the same repository in `./dpkg-deb/` and `./src/querycmd.c` respectively.
- Official Repository: https://git.dpkg.org/cgit/dpkg/dpkg.git
- GitHub Mirror: https://github.com/guillemj/dpkg
To illustrate this post, we will use the Hello World example present in the Go by example tutorial.
```
$ cat > hello.go << HERE
package main

import "fmt"

func main() {
    fmt.Println("hello world")
}
HERE
$ go run hello.go
hello world
$ env GOOS=linux GOARCH=amd64 go build hello.go # Make sure to build for Linux
$ ls
hello hello.go
$ chmod +x hello
$ ./hello
hello world
```
Our goal is to package this binary. The most popular solution to build a Debian package for a Go program is the utility `dh-golang`, but as we want to use the most basic commands to get as close as possible to the process, we will use the standard `dpkg` command, even if that means not building a world-class Debian package.
Prerequisites
To test the packages we are going to build and install, we will use a Debian VM in order to keep your system safe. We will use Vagrant to create this server. Make sure Vagrant is installed on your system by following the installation procedure for your operating system.
There is a companion GitHub repository julien-sobczak/linux-packages-under-the-hood to this blog post. This repository is optional for this article. It mostly contains a `Vagrantfile` to start the virtual machine, the files to create various Debian versions of the package `hello`, and also the Go code that reimplements minimal versions of the `dpkg` and `apt` commands. You will find more information in the `README.md` file of this repository.
Then:
```
$ mkdir sandbox
$ cd sandbox
$ vagrant init
$ cat > Vagrantfile <<EOF
# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|
  config.vm.box = "debian/buster64"
end
EOF
$ vagrant up
# wait a few minutes
$ vagrant ssh
vagrant$ uname -a
Linux buster 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64 GNU/Linux
```

When using Vagrant, the directory containing your `Vagrantfile` is accessible inside the virtual machine under `/vagrant`. We will use it to copy our `hello` binary program:

```
$ ls
Vagrantfile
$ cp /path/to/hello .
$ vagrant ssh
vagrant$ cd /vagrant
vagrant$ ls
hello Vagrantfile
```
All commands whose prompt starts with `vagrant#` must be run inside the virtual machine. Otherwise, run the commands from your host.
We are ready to create a Debian package for our Hello program.
```
vagrant# cd /vagrant/
vagrant# mkdir -p ./debian/usr/bin
vagrant# cp hello ./debian/usr/bin/
vagrant# mkdir -p ./debian/DEBIAN
vagrant# cat > ./debian/DEBIAN/control <<EOF
Package: hello
Version: 1.1-1
Section: base
Priority: optional
Architecture: amd64
Maintainer: Julien Sobczak
Description: Say Hello
EOF
vagrant# cat > ./debian/DEBIAN/preinst <<EOF
#!/bin/sh
echo "preinst says hello";
EOF
vagrant# cat > ./debian/DEBIAN/postinst <<EOF
#!/bin/sh
echo "postinst says hello";
EOF
vagrant# tree /vagrant/debian/
|-- DEBIAN
|   |-- control
|   |-- preinst
|   `-- postinst
`-- usr
    `-- bin
        `-- hello
```
1. The first version of our package `hello` contains only the binary `hello` built previously and a DEB822 file `control` with the package metadata.
2. We also append basic maintainer scripts that display a message in the console so that we will know when the installation process runs them.
The DEB822 format can be seen as an ancestor of YAML or JSON. Here is an example showing the three supported types of fields:

```
FieldSimple: simple value
FieldFolded: very long value
 continuing on the next line starting with a space.
FieldMultiline:
 /usr/bin/cmd1
 /usr/bin/cmd2
```

The format is used by the file `control` but also by some files in the `dpkg` database such as `/var/lib/dpkg/status`. This format is also used by the command `apt`, which will be covered later.
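As an illustration, here is a minimal Go sketch of a DEB822 parser handling simple and continuation fields. It is only a toy version; the case studies below rely on the `github.com/julien-sobczak/deb822` module instead:

```go
package main

import (
	"fmt"
	"strings"
)

// parseDeb822 parses a single DEB822 paragraph into a map of fields.
// Lines starting with a space (or tab) continue the previous field.
func parseDeb822(paragraph string) map[string]string {
	fields := make(map[string]string)
	var last string
	for _, line := range strings.Split(paragraph, "\n") {
		if strings.TrimSpace(line) == "" {
			continue
		}
		if strings.HasPrefix(line, " ") || strings.HasPrefix(line, "\t") {
			// Folded or multiline field: append to the previous field.
			fields[last] += "\n" + line
			continue
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue // Not a valid field line; ignore in this toy parser.
		}
		last = parts[0]
		fields[last] = strings.TrimSpace(parts[1])
	}
	return fields
}

func main() {
	control := `Package: hello
Version: 1.1-1
Description: Say Hello
 A toy package used throughout this article.`

	fields := parseDeb822(control)
	fmt.Println(fields["Package"], fields["Version"])
	fmt.Println(fields["Description"])
}
```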
Further documentation: Check the man page for additional information.
dpkg --build
We will use the command `dpkg --build` to build our package:

```
$ apt install fakeroot # install the fakeroot command
$ fakeroot dpkg --build debian hello_1.1-1_amd64.deb
```

1. This command builds a Debian package, which, as outlined before, consists in building an `ar` archive containing two `tar` archives: the content of our directory `DEBIAN/` in `control.tar.gz` and the other files in `data.tar.gz`. We use the `fakeroot` command to make sure files inside the archive are created with the user `root`.

We can also reproduce what this command does using standard Bash commands:

```
$ apt install binutils # install the ar command
$ apt install fakeroot # install the fakeroot command
$ echo 2.0 > debian-binary
$ cd debian && tar czf ../data.tar.gz [a-z]* && cd ..
$ cd debian/DEBIAN/ && tar czf ../../control.tar.gz * && cd ../..
$ fakeroot ar r hello_1.1-1_amd64.deb debian-binary control.tar.gz data.tar.gz
ar: creating hello_1.1-1_amd64.deb
```

1. The package will fail most linter checks. Indeed, we ignored many of the best practices that higher-level commands ensure, but we will still be able to install this package on our server.
Now is the time to look at the code. Dpkg is written in C, and the function executed by the command `dpkg --build` is the function `do_build` in `./dpkg-deb/build.c`.
```c
int
do_build(const char *const *argv)
{
  struct compress_params control_compress_params;
  struct tar_pack_options tar_options;
  struct dpkg_error err;
  struct dpkg_ar *ar;
  const char *dir, *dest;
  char *ctrldir;
  char *debar;
  char *tfbuf;
  int gzfd;

  /* Decode our arguments. */
  dir = *argv++;
  dest = *argv++;

  debar = gen_dest_pathname(dir, dest);
  ctrldir = str_fmt("%s/%s", dir, "DEBIAN");

  /* Now that we have verified everything it is time to actually
   * build something. Let's start by making the ar-wrapper. */
  ar = dpkg_ar_create(debar, 0644);

  /* Create a temporary file to store the control data in. */
  tfbuf = path_make_temp_template("dpkg-deb");
  gzfd = mkstemp(tfbuf);
  free(tfbuf);

  /* Select the compressor to use for our control archive. */
  control_compress_params.type = COMPRESSOR_TYPE_GZIP;
  control_compress_params.strategy = COMPRESSOR_STRATEGY_NONE;
  control_compress_params.level = -1;

  /* Fork a tar to package the control-section of the package. */
  tar_options.mode = "u+rw,go=rX";
  tar_options.root_owner_group = true;
  tarball_pack(ctrldir, control_treewalk_feed, &tar_options,
               &control_compress_params, gzfd);

  free(ctrldir);

  /* We have our first file for the ar-archive. Write a header for it
   * to the package and insert it. */
  const char deb_magic[] = "2.0\n";
  char adminmember[16 + 1];

  sprintf(adminmember, "%s%s", "control.tar",
          compressor_get_extension(control_compress_params.type));

  dpkg_ar_put_magic(ar);
  dpkg_ar_member_put_mem(ar, "debian-binary", deb_magic, strlen(deb_magic));
  dpkg_ar_member_put_file(ar, adminmember, gzfd, -1);

  close(gzfd);

  /* Control is done, now we need to archive the data. */

  /* Start by creating a new temporary file. */
  tfbuf = path_make_temp_template("dpkg-deb");
  gzfd = mkstemp(tfbuf);
  free(tfbuf);

  /* Pack the directory into a tarball, feeding files from the callback. */
  tar_options.mode = NULL;
  tar_options.root_owner_group = opt_root_owner_group;
  tarball_pack(dir, file_treewalk_feed, &tar_options, &compress_params, gzfd);

  /* Okay, we have data.tar as well now, add it to the ar wrapper. */
  char datamember[16 + 1];

  sprintf(datamember, "%s%s", "data.tar",
          compressor_get_extension(compress_params.type));

  dpkg_ar_member_put_file(ar, datamember, gzfd, -1);

  close(gzfd);

  if (fsync(ar->fd))
    ohshite(_("unable to sync file '%s'"), ar->name);

  dpkg_ar_close(ar);

  free(debar);

  return 0;
}
```
1. The variable `dir` is the local directory containing the package files to build. The variable `dest` is the optional filename for the final package file and `debar` is the final name as determined by the function `gen_dest_pathname`, which determines a default name if the argument is missing.
2. The function `dpkg_ar_create` creates the archive file named after the variable `debar`.
3. The function `dpkg_ar_put_magic` writes the magic number `!<arch>\n` telling Linux the file is of type `ar`.
4. The function `dpkg_ar_member_put_mem` appends the file `debian-binary` with the content of the variable `deb_magic`.
5. The function `dpkg_ar_member_put_file` appends the file `control.tar` with the content of a temporary file.
6. Same as above for `data.tar`.
7. The function `dpkg_ar_close` is part of the housecleaning logic and simply closes the file descriptor.
Case Study
What follows is a minimal rewrite of this code in Go. The full code is available on GitHub in the repository julien-sobczak/linux-packages-under-the-hood.
```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"path/filepath"
	"strings"

	"github.com/blakesmith/ar"
)

func main() {
	// This program expects two arguments:
	// - The directory containing the resources to package in the archive.
	// - The name of the output .deb file
	if len(os.Args) < 3 {
		log.Fatalf("Missing 'directory' and/or 'dest' arguments.")
	}

	directory := os.Args[1]
	dest := os.Args[2]

	// Create the Debian archive file
	fdeb, _ := os.Create(dest)
	defer fdeb.Close()

	// A Debian package is an archive using the AR format.
	// We use an external Go module to create the archive
	// as the standard library does not support it but supports
	// the tar format that will be used for the control and data files.
	writer := ar.NewWriter(fdeb)
	writer.WriteGlobalHeader()

	// A Debian package contains 3 files that must be
	// added in a precise order.
	// We use two utility functions that will be defined later:
	// - arPutFile is a wrapper around the library to add an entry.
	// - tarballPack creates a tarball using the Go library.

	// Append debian-binary
	arPutFile(writer, "debian-binary", []byte("2.0\n"))

	// Append control.tar
	controlDir := filepath.Join(directory, "DEBIAN")
	controlTarball := tarballPack(controlDir, nil)
	arPutFile(writer, "control.tar", controlTarball)

	// Append data.tar
	dataDir := directory
	dataTarball := tarballPack(dataDir, func(path string) bool {
		// Filter DEBIAN/ files
		return strings.HasPrefix(path, controlDir)
	})
	arPutFile(writer, "data.tar", dataTarball)
}

// arPutFile adds a new entry in an AR archive.
func arPutFile(w *ar.Writer, name string, body []byte) {
	hdr := &ar.Header{
		Name: name,
		Mode: 0600,
		Uid:  0,
		Gid:  0,
		Size: int64(len(body)),
	}
	w.WriteHeader(hdr)
	w.Write(body)
}

// tarballPack traverses a local directory to add all files under it
// into a tarball.
func tarballPack(directory string, filter func(string) bool) []byte {
	var bufdata bytes.Buffer
	twdata := tar.NewWriter(&bufdata)
	filepath.Walk(
		directory,
		func(path string, info os.FileInfo, errParent error) error {
			if info.IsDir() {
				return nil
			}
			if filter != nil && filter(path) {
				return nil
			}
			sep := fmt.Sprintf("%c", filepath.Separator)
			name := strings.TrimPrefix(strings.TrimPrefix(path, directory), sep)
			hdr := &tar.Header{
				Name: name,
				Uid:  0, // root
				Gid:  0, // root
				Mode: 0650,
				Size: info.Size(),
			}
			twdata.WriteHeader(hdr)
			content, _ := ioutil.ReadFile(path)
			twdata.Write(content)
			return nil
		})
	twdata.Close()

	return bufdata.Bytes()
}
```
To run the code:
$ go run main.go hello hello.deb
To inspect the resulting archive `hello.deb`, we can use the command `dpkg -c` to view the data files or use the command `ar` to view the real content of the archive:

```
vagrant# dpkg -c /vagrant/hello.deb
-rw-r-x--- 0/0 2034781 1970-01-01 00:00 usr/bin/hello
```

```
vagrant# ar -tf /vagrant/hello.deb
debian-binary
control.tar
data.tar
vagrant# ar -xf /vagrant/hello.deb data.tar
vagrant# tar -tf data.tar
usr/bin/hello
```

🎉 We have finished with the format `.deb`. This completes the first part of this article. We created a Debian package from scratch! Now, we will inspect the installation process.
What happens when you install a package using dpkg
The command to install a Debian binary package file is `dpkg -i myarchive.deb`, and it will be the subject of this second part.
dpkg -i
Let’s run the command on our Debian archive:
```
vagrant# dpkg -i /vagrant/hello.deb
Selecting previously unselected package hello.
(Reading database ... 32264 files and directories currently installed.)
Preparing to unpack /vagrant/hello.deb ...
preinst says hello
Unpacking hello (1.1-1) ...
Setting up hello (1.1-1) ...
postinst says hello

vagrant# hello
hello world
```

The command does a lot of interesting things, and its code is larger than that of the previous `build` command. The man page details the installation steps and we will present the main code for every one of them.

The entry point for the installation of a package is the function `archivefiles`, and more specifically the function `process_archive`:
```c
int
archivefiles(const char *const *argv)
{
  int i;

  modstatdb_open(msdbrw_readonly);

  for (i = 0; argv[i]; i++) {
    process_archive(argv[i]);
  }

  process_queue();

  trigproc_run_deferred();
  modstatdb_shutdown();

  return 0;
}
```

1. The main function iterates over all packages to install and delegates to the function `process_archive` for the unpacking.
2. The function `process_queue` configures all packages that have been unpacked in the previous step. We will explain the differences between these two steps.
Let’s go!
- Extract the control files of the new package.
```c
void process_archive(const char *filename) {
  …
  cidir = get_control_dir(cidir);
  pid = subproc_fork();
  if (pid == 0) {
    cidirrest[-1] = '\0';
    execlp("dpkg-deb", "dpkg-deb", "--control", filename, cidir, NULL);
    ohshite(_("unable to execute %s (%s)"),
            _("package control information extraction"), BACKEND);
  }
  subproc_reap(pid, "dpkg-deb --control", 0);
  …
}
```

1. Create a temporary directory (commonly `/var/lib/dpkg/tmp.ci/`).
2. Run the command `dpkg-deb --control` to extract the `DEBIAN/` directory into it.
Then, the code parses the `control` file to initialize the struct `pkginfo`, which is the main structure to represent a package. (You can check the const `fieldinfos` in `parse.c` to find the mapping between the file and the struct.) Here is a minimal version of this structure with the most important fields annotated:

```c
/**
 * Node describing an architecture package instance.
 *
 * This structure holds state information.
 */
struct pkginfo {
  struct pkgset *set;

  enum pkgwant want;
  /** The error flag bitmask. */
  enum pkgeflag eflag;
  enum pkgstatus status;
  enum pkgpriority priority;

  struct pkgbin installed;
  struct pkgbin available;

  struct fsys_namenode_list *files;
  bool files_list_valid;

  /* The status has changed, it needs to be logged. */
  bool status_dirty;
};
```
1. The enum `want` determines the expected action for this package, like `PKG_WANT_INSTALL` for installation, or `PKG_WANT_PURGE` for the removal of the package and its configuration files.
2. The `eflag` is initialized if the parser finds an error in the control file (ex: missing field), and also later during the installation process.
3. The `installed` and `available` fields contain most of the information present in the `control` files concerning a possibly installed version of the package and the new version to install.
4. Some fields like `files` are initialized later by other functions like `db-fsys-files.c#ensure_packagefiles_available`, which reads the file `/var/lib/dpkg/info/hello.list` to populate this field.
5. The `status_dirty` flag is set when the current status of the package changes, for example from `PKG_STAT_UNPACKED` to `PKG_STAT_INSTALLED`.

And now, the function responsible for creating this struct:

```c
void process_archive(const char *filename) {
  struct pkginfo *pkg;
  …
  parsedb(cidir, parsedb_flags, &pkg);
  …
}
```

1. The function `parsedb` simply reads a file in the Debian RFC822 format, the format we used to write the `control` file.
- If another version of the same package was installed before the new installation, execute the `prerm` script of the old package.

```c
void process_archive(const char *filename) {
  …
  oldversionstatus = pkg->status;

  if (oldversionstatus == PKG_STAT_INSTALLED) {
    pkg_set_eflags(pkg, PKG_EFLAG_REINSTREQ);
    pkg_set_status(pkg, PKG_STAT_HALFCONFIGURED);
    modstatdb_note(pkg);
    if (dpkg_version_compare(&pkg->available.version,
                             &pkg->installed.version) >= 0)
      /* Upgrade or reinstall. */
      maintscript_fallback(pkg, PRERMFILE, "pre-removal", cidir, cidirrest,
                           "upgrade", "failed-upgrade");
    else /* Downgrade => no fallback */
      maintscript_installed(pkg, PRERMFILE, "pre-removal",
                            "upgrade",
                            versiondescribe(&pkg->available.version,
                                            vdew_nonambig),
                            NULL);
    pkg_set_status(pkg, PKG_STAT_UNPACKED);
    oldversionstatus = PKG_STAT_UNPACKED;
    modstatdb_note(pkg);
  }
  …
}
```
1. The status read during parsing is reused to determine if the package is already installed.
2. Update the package status to keep track of the fact that the package has been partially installed. The status will be changed several times during the installation. The function `modstatdb_note` persists the new state to disk.
3. `maintscript_fallback` and `maintscript_installed` delegate to `maintscript_exec`, defined in the same file `src/script.c`. This function runs the script in a forked process and aborts if the return code is greater than 0. Differences between the various calls are explained in the next step.

- Run the `preinst` script, if provided by the package.
```c
void process_archive(const char *filename) {
  …
  if (pkg->status == PKG_STAT_NOTINSTALLED) {
    pkg->installed.version = pkg->available.version;
    pkg->installed.multiarch = pkg->available.multiarch;
  }
  pkg_set_status(pkg, PKG_STAT_HALFINSTALLED);
  modstatdb_note(pkg);
  if (oldversionstatus == PKG_STAT_NOTINSTALLED) {
    maintscript_new(pkg, PREINSTFILE, "pre-installation", cidir, cidirrest,
                    "install", NULL);
  } else if (oldversionstatus == PKG_STAT_CONFIGFILES) {
    maintscript_new(pkg, PREINSTFILE, "pre-installation", cidir, cidirrest,
                    "install",
                    versiondescribe(&pkg->installed.version, vdew_nonambig),
                    versiondescribe(&pkg->available.version, vdew_nonambig),
                    NULL);
  } else {
    maintscript_new(pkg, PREINSTFILE, "pre-installation", cidir, cidirrest,
                    "upgrade",
                    versiondescribe(&pkg->installed.version, vdew_nonambig),
                    versiondescribe(&pkg->available.version, vdew_nonambig),
                    NULL);
  }
  …
}
```

1. The function `maintscript_new` is a variadic function whose latest arguments are passed to the maintainer script to provide context. For example, the `preinst` maintainer script can be called using one of these formats: `preinst install`, `preinst install <old-version>`, or `preinst upgrade <old-version>`. This allows the package developer to take different actions based on the current state of the package (a minimal sketch of running such a script follows).
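To make this concrete, here is a minimal Go sketch of running a maintainer script with positional arguments such as `install` or `upgrade <old-version>`. It is a simplification along the lines of the case study below; dpkg itself forks and executes the script directly instead of going through `sh`:

```go
package main

import (
	"fmt"
	"os/exec"
)

// runMaintainerScript executes a maintainer script (e.g. the copy stored
// under /var/lib/dpkg/info/hello.preinst) with positional arguments that
// give it context, like "install" or "upgrade 1.0-1".
func runMaintainerScript(scriptPath string, args ...string) error {
	cmd := exec.Command("/bin/sh", append([]string{scriptPath}, args...)...)
	out, err := cmd.CombinedOutput()
	fmt.Print(string(out))
	return err
}

func main() {
	// Hypothetical calls mirroring the formats described above.
	runMaintainerScript("/var/lib/dpkg/info/hello.preinst", "install")
	runMaintainerScript("/var/lib/dpkg/info/hello.preinst", "upgrade", "1.0-1")
}
```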
- Unpack the new files, and at the same time back up the old files, so that if something goes wrong, they can be restored.
This step is similar to running the command `dpkg --unpack`. The unpacking process is simple to understand: extract every file present in `data.tar` to its destination path. But things are not so simple, as outlined by this comment:
/* * Now we unpack the archive, backing things up as we go. * For each file, we check to see if it already exists. * There are several possibilities: * * + We are trying to install a non-directory ... * - It doesn't exist. In this case we simply extract it. * - It is a plain file, device, symlink, &c. We do an ‘atomic * overwrite’ using link() and rename(), but leave a backup copy. * Later, when we delete the backup, we remove it from any other * packages' lists. * - It is a directory. In this case it depends on whether we're * trying to install a symlink or something else. * = If we're not trying to install a symlink we move the directory * aside and extract the node. Later, when we recursively remove * the backed-up directory, we remove it from any other packages' * lists. * = If we are trying to install a symlink we do nothing - ie, * dpkg will never replace a directory tree with a symlink. This * is to avoid embarrassing effects such as replacing a directory * tree with a link to a link to the original directory tree. * + We are trying to install a directory ... * - It doesn't exist. We create it with the appropriate modes. * - It exists as a directory or a symlink to one. We do nothing. * - It is a plain file or a symlink (other than to a directory). * We move it aside and create the directory. Later, when we * delete the backup, we remove it from any other packages' lists. * * Install non-dir Install symlink Install dir * Exists not X X X * File/node/symlink LXR LXR BXR * Directory BXR - - * * X: extract file/node/link/directory * LX: atomic overwrite leaving backup * B: ordinary backup * R: later remove from other packages' lists * -: do nothing * * After we've done this we go through the remaining things in the * lists of packages we're trying to remove (including the old * version of the current package). This happens in reverse order, * so that we process files before the directories (or symlinks-to- * directories) containing them. * * + If the thing is a conffile then we leave it alone for the purge * operation. * + Otherwise, there are several possibilities too: * - The listed thing does not exist. We ignore it. * - The listed thing is a directory or a symlink to a directory. * We delete it only if it isn't listed in any other package. * - The listed thing is not a directory, but was part of the package * that was upgraded, we check to make sure the files aren't the * same ones from the old package by checking dev/inode * - The listed thing is not a directory or a symlink to one (ie, * it's a plain file, device, pipe, &c, or a symlink to one, or a * dangling symlink). We delete it. * * The removed packages' list becomes empty (of course, the new * version of the package we're installing will have a new list, * which replaces the old version's list). * * If at any stage we remove a file from a package's list, and the * package isn't one we're already processing, and the package's * list becomes empty as a result, we ‘vanish’ the package. This * means that we run its postrm with the ‘disappear’ argument, and * put the package in the ‘not-installed’ state. If it had any * conffiles, their hashes and ownership will have been transferred * already, so we just ignore those and forget about them from the * point of view of the disappearing package. * * NOTE THAT THE OLD POSTRM IS RUN AFTER THE NEW PREINST, since the * files get replaced ‘as we go’. */
We still haven’t talked about conffiles. When upgrading a package, you want the package manager to overwrite the previous version of the files, except for configuration files. You don’t want to lose your customizations, do you?

A Debian archive can therefore include a file `conffiles` in the `DEBIAN/` directory to list a subset of the files present in the `data.tar` archive. These “conffiles” are files that must be managed specially to take care of preserving user changes.

Conffiles explain the difference between the commands `dpkg --remove` and `dpkg --purge`. (The first command ignores conffiles while the second removes them completely.)

Version 2.1-1 of our package `hello` ships a different implementation written in Python, which reads a configuration file `/etc/hello/settings.conf`, also present in the package. This conffile is referenced in `DEBIAN/conffiles`.
If we try to create this configuration file manually before installing this new version:
```
vagrant# mkdir /etc/hello
vagrant# echo "Language: English" > /etc/hello/settings.conf

vagrant# dpkg -i /vagrant/hello/hello_2.1-1_amd64.deb
Selecting previously unselected package hello.
(Reading database ... 25063 files and directories currently installed.)
Preparing to unpack .../hello/hello_2.1-1_amd64.deb ...
preinst says hello
Unpacking hello (2.1-1) ...
Setting up hello (2.1-1) ...

Configuration file '/etc/hello/settings.conf'
 ==> File on system created by you or by a script.
 ==> File also in package provided by package maintainer.
   What would you like to do about it ?  Your options are:
    Y or I  : install the package maintainers version
    N or O  : keep your currently-installed version
      D     : show the differences between the versions
      Z     : start a shell to examine the situation
 The default action is to keep your current version.
*** settings.conf (Y/I/N/O/D/Z) [default=N] ? Y
Installing new version of config file /etc/hello/settings.conf ...
postinst says hello

vagrant# cat /etc/hello/settings.conf
Language: French
```

The package manager detects the conflict by keeping a checksum of the last installed version of every conffile (files named `md5sums` in the database) and asks the user what to do about it. Options exist to avoid the prompt and the default is, of course, to preserve existing conffiles.
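As an illustration of this detection, here is a minimal Go sketch (not dpkg's actual code) that compares the MD5 checksum of an installed conffile with the checksum recorded at the previous installation to decide whether the user modified it:

```go
package main

import (
	"crypto/md5"
	"fmt"
	"os"
)

// userModifiedConffile reports whether the conffile on disk differs from
// the checksum recorded when the previous version of the package was
// installed (as stored in /var/lib/dpkg/info/<package>.md5sums).
func userModifiedConffile(path string, recordedMD5 string) (bool, error) {
	content, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	currentMD5 := fmt.Sprintf("%x", md5.Sum(content))
	return currentMD5 != recordedMD5, nil
}

func main() {
	// Hypothetical values for the example.
	modified, err := userModifiedConffile("/etc/hello/settings.conf",
		"9a7f5ea42679e8a42c33a14699b465af")
	if err != nil {
		fmt.Println("conffile missing or unreadable:", err)
		return
	}
	if modified {
		fmt.Println("Conffile was edited by the user: ask what to do.")
	} else {
		fmt.Println("Conffile untouched: safe to install the new version.")
	}
}
```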
The unpacking runs the command `dpkg-deb --fsys-tarfile` to extract the content of `data.tar`. The command sends each file to a pipe created in the same function `process_archive` and delegates to the function `tarobject` defined in `archives.c`, which implements all the rules presented in the previous comment. The code is rather obvious but too long to include in this article.

We can mention that the backup process consists in extracting files with a special extension like `.dpkg-tmp`, `.dpkg-old`, and `.dpkg-new`. Files are renamed to their definitive name if no problem occurs, except for conffiles, which must wait until the last installation step to be renamed.
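Here is a minimal Go sketch of this idea (a simplification of dpkg's atomic-overwrite logic, with hypothetical paths): extract the new file next to the old one and only rename it to its final name once the write succeeded:

```go
package main

import (
	"log"
	"os"
)

// installFile writes the new content to a temporary ".dpkg-new" file and
// atomically renames it over the destination once the write succeeded.
// Conffiles keep their ".dpkg-new" suffix until the configure step.
func installFile(dest string, content []byte, isConffile bool) error {
	tmp := dest + ".dpkg-new"
	if err := os.WriteFile(tmp, content, 0755); err != nil {
		return err
	}
	if isConffile {
		// Leave the file as <dest>.dpkg-new; it will be renamed
		// during the configuration step (possibly after a prompt).
		return nil
	}
	// rename(2) is atomic on the same filesystem.
	return os.Rename(tmp, dest)
}

func main() {
	// Hypothetical usage.
	if err := installFile("/usr/bin/hello", []byte("binary content"), false); err != nil {
		log.Fatal(err)
	}
}
```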
- If another version of the same package was installed before the new installation, execute the `postrm` script of the old package. Note that this script is executed after the `preinst` script of the new package, because new files are written at the same time old files are removed.

The execution code of the maintainer script `postrm` is similar to the previous scripts.
What is more interesting is what happens at the end of the unpacking step. Indeed, the Dpkg database is updated to reflect the changes.
Dpkg maintains a database under `/var/lib/dpkg`, which regroups various files including the following:
| file | description |
|---|---|
| `/var/lib/dpkg/status` | A DEB822 file containing the status information for all packages (i.e., the current state of each package and the fields in their control file). |
| `/var/lib/dpkg/status-old` | The last backup of the `/var/lib/dpkg/status` file. |
| `/var/lib/dpkg/available` | The list of packages available for installation or upgrade from external origins, only if you are using `dselect` as your package manager frontend (instead of `apt` or `aptitude`). (not described in this article) |
| `/var/lib/dpkg/diversions` | The list of diversions used by `dpkg` and set by `dpkg-divert` to force a package file to be installed elsewhere. (not described in this article) |
| `/var/lib/dpkg/statoverride` | The stats used by `dpkg` and set by `dpkg-statoverride` to change the default ownership and mode of the package files. (not described in this article) |
In addition, for every installed package, Dpkg keeps a list of additional files:
| file | description |
|---|---|
| `/var/lib/dpkg/info/<package_name>.list` | The list of files and directories installed by the package (the `data.tar` listing). |
| `/var/lib/dpkg/info/<package_name>.md5sums` | The list of MD5 hash values for files installed by the package. Used for example to detect if a conffile had been edited by the user. |
| `/var/lib/dpkg/info/<package_name>.conffiles` | The list of configuration files. Same as the `conffiles` file under `DEBIAN/`. |
| `/var/lib/dpkg/info/<package_name>.{preinst, postinst, prerm, postrm}` | Copies of the maintainer scripts present in the package under `DEBIAN/`. |
| `/var/lib/dpkg/info/<package_name>.config` | Debconf-generated configuration files used only by a minority of packages. (not described in this article) |
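For example, here is a small Go sketch (a simplification of what `dpkg -L <package>` does, assuming a simple single-architecture package name) that prints the files installed by a package by reading its `.list` file from the database:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
)

// listInstalledFiles mimics a very small part of `dpkg -L <package>`:
// it reads the file list recorded in the dpkg database.
func listInstalledFiles(pkg string) ([]string, error) {
	path := filepath.Join("/var/lib/dpkg/info", pkg+".list")
	content, err := os.ReadFile(path)
	if err != nil {
		return nil, err // Package not installed (or no file list)
	}
	var files []string
	for _, line := range strings.Split(string(content), "\n") {
		if strings.TrimSpace(line) != "" {
			files = append(files, line)
		}
	}
	return files, nil
}

func main() {
	files, err := listInstalledFiles("hello")
	if err != nil {
		log.Fatal(err)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```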
Here are the different functions called to update the different files in the database:
```c
void process_archive(const char *filename) {
  …

  /* OK, now we can write the updated files-in-this package list,
   * since we've done away (hopefully) with all the old junk. */
  write_filelist_except(pkg, &pkg->available, newfiles_queue.head, 0);

  /* We also install the new maintainer scripts, and any other
   * cruft that may have come along with the package. First
   * we go through the existing scripts replacing or removing
   * them as appropriate; then we go through the new scripts
   * (any that are left) and install them. */
  pkg_infodb_update(pkg, cidir, cidirrest);

  /* We store now the checksums dynamically computed while unpacking. */
  write_filehash_except(pkg, &pkg->available, newfiles_queue.head, 0);

  /* Right, the package we've unpacked is now in a reasonable state.
   * The only thing that we have left to do with it is remove
   * backup files, and we can leave the user to fix that if and when
   * it happens (we leave the reinstall required flag, of course). */
  pkg_set_status(pkg, PKG_STAT_UNPACKED);
  modstatdb_note(pkg);

  ...
}
```

1. Edit the file `/var/lib/dpkg/info/hello.list`.
2. Copy all files under `DEBIAN/` into `/var/lib/dpkg/info/` by prefixing them with the package name `hello.`.
3. Edit the file `/var/lib/dpkg/info/hello.md5sums`.
4. Update the field `Status` in `/var/lib/dpkg/status` for the package `hello` to set the value `install ok unpacked`.
We are getting close to the end of the function `process_archive`. The last instruction is `enqueue_package(pkg)`. This function simply pushes a new package waiting to be configured onto a queue. Since the `dpkg` command can be executed with several packages to install, the queue ensures all packages have been unpacked before proceeding to their final configuration.

We are now back to the `archivefiles` function:
```c
int
archivefiles(const char *const *argv)
{
  int i;

  modstatdb_open(msdbrw_readonly);

  for (i = 0; argv[i]; i++) {
    process_archive(argv[i]);
  }

  process_queue();

  trigproc_run_deferred();
  modstatdb_shutdown();

  return 0;
}
```

1. We are here.
What follows is the data structure representing the queue:
static struct pkg_queue queue = { .head = NULL, .tail = NULL, .length = 0 };
/* * During the packages queue processing, the algorithm for deciding what to * configure first is as follows: * * Loop through all packages doing a ‘try 1’ until we've been round and * nothing has been done, then do ‘try 2’, and subsequent ones likewise. * The incrementing of ‘dependtry’ is done by process_queue(). * * Try 1: * Are all dependencies of this package done? If so, do it. * Are any of the dependencies missing or the wrong version? * If so, abort (unless --force-depends, in which case defer). * Will we need to configure a package we weren't given as an * argument? If so, abort ─ except if --force-configure-any, * in which case we add the package to the argument list. * If none of the above, defer the package. * * Try 2: * Find a cycle and break it (see above). * Do as for try 1. * * Try 3: * Start processing triggers if necessary. * Do as for try 2. * * Try 4: * Same as for try 3, but check trigger cycles even when deferring * processing due to unsatisfiable dependencies. * * Try 5 (only if --force-depends-version): * Same as for try 2, but don't mind version number in dependencies. * * Try 6 (only if --force-depends): * Do anyway. */enum dependtry { DEPEND_TRY_NORMAL = 1, DEPEND_TRY_CYCLES = 2, DEPEND_TRY_TRIGGERS = 3, DEPEND_TRY_TRIGGERS_CYCLES = 4, DEPEND_TRY_FORCE_DEPENDS_VERSION = 5, DEPEND_TRY_FORCE_DEPENDS = 6, DEPEND_TRY_LAST,};enum dependtry dependtry = DEPEND_TRY_NORMAL;int sincenothing = 0;
1. The global variable containing the packages to configure.
2. These variables control the algorithm that decides which package must be configured first, which must be postponed, and when to abort the installation completely (see the sketch below).
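To make the retry logic more tangible, here is a heavily simplified Go sketch (hypothetical types, not dpkg's algorithm verbatim): configure packages whose dependencies are already configured, requeue the others, and escalate the "try" level when a whole pass makes no progress:

```go
package main

import "fmt"

// Pkg is a hypothetical package with its dependencies.
type Pkg struct {
	Name string
	Deps []string
}

// configureAll drains the queue, deferring packages whose dependencies
// are not configured yet. When a full pass makes no progress, the "try"
// level is escalated (dpkg eventually breaks cycles or gives up).
func configureAll(queue []Pkg) {
	configured := map[string]bool{}
	try, sinceNothing := 1, 0
	for len(queue) > 0 {
		pkg := queue[0]
		queue = queue[1:]

		ready := true
		for _, dep := range pkg.Deps {
			if !configured[dep] {
				ready = false
				break
			}
		}
		if ready || try >= 6 { // try 6: --force-depends, "do anyway"
			fmt.Println("Setting up", pkg.Name, "...")
			configured[pkg.Name] = true
			sinceNothing = 0
			continue
		}
		// Defer: push the package back and note the lack of progress.
		queue = append(queue, pkg)
		sinceNothing++
		if sinceNothing > len(queue) {
			try++ // a full pass without progress: escalate
			sinceNothing = 0
		}
	}
}

func main() {
	configureAll([]Pkg{
		{Name: "hello", Deps: []string{"libc6"}},
		{Name: "libc6"},
	})
}
```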
Finally, the logic to empty the queue present in the function process_queue
:
void process_queue(void) { struct pkginfo *volatile pkg; volatile enum action action_todo;
while (!pkg_queue_is_empty(&queue)) { pkg = pkg_queue_pop(&queue);
ensure_package_clientdata(pkg); pkg->clientdata->enqueued = false;
action_todo = cipaction->arg_int;
if (sincenothing++ > queue.length * 3 + 2) { /* Make sure that even if we have exceeded the queue since not having * made any progress, we are not getting stuck trying to progress by * trigger processing, w/o jumping into the next dependtry. */ dependtry++; sincenothing = 0; if (dependtry >= DEPEND_TRY_LAST) internerr("exceeded dependtry %d (sincenothing=%d; queue.length=%d)", dependtry, sincenothing, queue.length); } else if (sincenothing > queue.length * 2 + 2) { if (dependtry >= DEPEND_TRY_TRIGGERS && progress_bytrigproc && progress_bytrigproc->trigpend_head) { enqueue_package(pkg); pkg = progress_bytrigproc; progress_bytrigproc = NULL; action_todo = act_configure; } else { dependtry++; sincenothing = 0; if (dependtry >= DEPEND_TRY_LAST) internerr("exceeded dependtry %d (sincenothing=%d, queue.length=%d)", dependtry, sincenothing, queue.length); } }
debug(dbg_general, "process queue pkg %s queue.len %d progress %d, try %d", pkg_name(pkg, pnaw_always), queue.length, sincenothing, dependtry);
deferred_configure(pkg); }
if (queue.length) internerr("finished package processing with non-empty queue length %d", queue.length);}
1. The function `deferred_configure` is the main function doing the configuration and is the subject of the next step.
- Configure the package.
- Unpack the conffiles, and at the same time back up the old conffiles, so that they can be restored if something goes wrong.
- Run the `postinst` script, if provided by the package.
The last step uses the same code as the command `dpkg --configure`, which may be used to reconfigure a package that had already been unpacked.
The configuration step is implemented by the function `deferred_configure`, which focuses on a single package to configure. If the configuration cannot proceed, the package may be enqueued to be reprocessed later. Here is a simplified version:
/** * Process the deferred configure package. * * @param pkg The package to act on. */voiddeferred_configure(struct pkginfo *pkg){ struct varbuf aemsgs = VARBUF_INIT; struct conffile *conff; struct pkginfo *otherpkg; enum dep_check ok;
ok = dependencies_ok(pkg, NULL, &aemsgs); if (ok == DEP_CHECK_DEFER) { varbuf_destroy(&aemsgs); ensure_package_clientdata(pkg); pkg->clientdata->istobe = PKG_ISTOBE_INSTALLNEW; enqueue_package(pkg); return; }
/* * At this point removal from the queue is confirmed. This * represents irreversible progress wrt trigger cycles. Only * packages in PKG_STAT_UNPACKED are automatically added to the * configuration queue, and during configuration and trigger * processing new packages can't enter into unpacked. */ sincenothing = 0;
printf(_("Setting up %s (%s) ...\n"), pkg_name(pkg, pnaw_nonambig), versiondescribe(&pkg->installed.version, vdew_nonambig)); log_action("configure", pkg, &pkg->installed);
if (pkg->status == PKG_STAT_UNPACKED) { /* On entry, the ‘new’ version of each conffile has been * unpacked as ‘*.dpkg-new’, and the ‘installed’ version is * as-yet untouched in ‘*’. The hash of the ‘old distributed’ * version is in the conffiles data for the package. If * ‘*.dpkg-new’ no longer exists we assume that we've * already processed this one. */ for (conff = pkg->installed.conffiles; conff; conff = conff->next) { deferred_configure_conffile(pkg, conff); }
pkg_set_status(pkg, PKG_STAT_HALFCONFIGURED); modstatdb_note(pkg); }
maintscript_postinst(pkg, "configure", dpkg_version_is_informative(&pkg->configversion) ? versiondescribe(&pkg->configversion, vdew_nonambig) : "", NULL);
pkg_reset_eflags(pkg); post_postinst_tasks(pkg, PKG_STAT_INSTALLED);}
1. In case of a missing dependency, the installation will abort only at this step, after the unpacking of the package files.
2. The function `deferred_configure_conffile` renames the conffiles still ending with the suffix `.dpkg-new` created during the unpacking. This function also shows the confirmation prompt.
3. Run the `postinst` maintainer script.
4. Change the status to `PKG_STAT_INSTALLED` and force the update in the `status` database file.
The installation of our package is now completed. We can check that the package has been installed by running the `hello` command:

```
vagrant# hello
hello world!
```

Or by using the command `dpkg` to get the status of the package:

```
vagrant# dpkg -s hello
Package: hello
Status: install ok unpacked
Priority: optional
Section: base
Maintainer: Julien Sobczak
Architecture: amd64
Version: 1.1-1
Description: Say Hello
```
Case Study
What follows is a minimal rewrite in Go of the code covered in this second part. The full code is available on GitHub in the repository julien-sobczak/linux-packages-under-the-hood.
But first, let’s remove the package or we will not be able to test our program:
```
# dpkg -r hello
(Reading database ... 26963 files and directories currently installed.)
Removing hello (1.1-1) ...

# hello
bash: /usr/bin/hello: No such file or directory
```
Here is the code:
package main
import ( "archive/tar" "bytes" "fmt" "io" "log" "os" "os/exec" "path/filepath" "strings"
"github.com/blakesmith/ar" "github.com/julien-sobczak/deb822")
func main() { // This program expects one or more package files to install. if len(os.Args) < 2 { log.Fatalf("Missing package archive(s)") }
// Read the DPKG database db, _ := loadDatabase()
// Unpack and configure the archive(s) for _, archivePath := range os.Args[1:] { processArchive(db, archivePath) }
// For simplicity reasons, we don't manage a queue to defer // the configuration of packages like in the official code.}
//// Dpkg Database//
type Database struct { // File /var/lib/dpkg/status Status deb822.Document // Packages under /var/lib/dpkg/info/ Packages []*PackageInfo}
type PackageInfo struct { Paragraph deb822.Paragraph // Extracted section in /var/lib/dpkg/status
// info Files []string // File <name>.list Conffiles []string // File <name>.conffiles MaintainerScripts map[string]string // File <name>.{preinst,prerm,...}
Status string // Current status (as present in `Paragraph`) StatusDirty bool // True to ask for sync}
func (p *PackageInfo) Name() string { // Extract the package name from its section in /var/lib/dpkg/status return p.Paragraph.Value("Package")}
func (p *PackageInfo) Version() string { // Extract the package version from its section in /var/lib/dpkg/status return p.Paragraph.Value("Version")}
// isConffile determines if a file must be processed as a conffile.func (p *PackageInfo) isConffile(path string) bool { for _, conffile := range p.Conffiles { if path == conffile { return true } } return false}
// InfoPath returns the path of a file under /var/lib/dpkg/info/.
// Ex: "list" => /var/lib/dpkg/info/hello.list
func (p *PackageInfo) InfoPath(filename string) string {
	return filepath.Join("/var/lib/dpkg/info", p.Name()+"."+filename)
}
// We now add a method to change the package status// and make sure the section in the status file is updated too.// This method will be used several times at the different steps// of the installation process.
func (p *PackageInfo) SetStatus(new string) { p.Status = new p.StatusDirty = true // Override in DEB 822 document used to write the status file old := p.Paragraph.Values["Status"] parts := strings.Split(old, " ") newStatus := fmt.Sprintf("%s %s %s", parts[0], parts[1], new) p.Paragraph.Values["Status"] = newStatus}
// Now, we are ready to read the database directory to initialize the structs.
func loadDatabase() (*Database, error) { // Load the status file f, _ := os.Open("/var/lib/dpkg/status") parser, _ := deb822.NewParser(f) status, _ := parser.Parse()
// Read the info directory var packages []*PackageInfo for _, statusParagraph := range status.Paragraphs { statusField := statusParagraph.Value("Status") // install ok installed statusValues := strings.Split(statusField, " ")
pkg := PackageInfo{ Paragraph: statusParagraph, MaintainerScripts: make(map[string]string), Status: statusValues[2], StatusDirty: false, }
// Read the configuration files pkg.Files, _ = ReadLines(pkg.InfoPath("list")) pkg.Conffiles, _ = ReadLines(pkg.InfoPath("conffiles"))
// Read the maintainer scripts maintainerScripts := []string{"preinst", "postinst", "prerm", "postrm"} for _, script := range maintainerScripts { scriptPath := pkg.InfoPath(script) if _, err := os.Stat(scriptPath); !os.IsNotExist(err) { content, err := os.ReadFile(scriptPath) if err != nil { return nil, err } pkg.MaintainerScripts[script] = string(content) } } packages = append(packages, &pkg) }
// We have read everything that interest us and are ready // to populate the Database struct.
return &Database{ Status: status, Packages: packages, }, nil}
// Now we are ready to process an archive to install.
func processArchive(db *Database, archivePath string) error {
// Read the Debian archive file f, err := os.Open(archivePath) if err != nil { return err } defer f.Close() reader := ar.NewReader(f)
// Skip debian-binary reader.Next()
// control.tar reader.Next() var bufControl bytes.Buffer io.Copy(&bufControl, reader)
pkg, err := parseControl(db, bufControl) if err != nil { return err }
// Add the new package in the database db.Packages = append(db.Packages, pkg) db.Sync()
// data.tar reader.Next() var bufData bytes.Buffer io.Copy(&bufData, reader)
fmt.Printf("Preparing to unpack %s ...\n", filepath.Base(archivePath))
if err := pkg.Unpack(bufData); err != nil { return err } if err := pkg.Configure(); err != nil { return err }
db.Sync()
return nil}
// parseControl processes the control.tar archive.func parseControl(db *Database, buf bytes.Buffer) (*PackageInfo, error) {
// The control.tar archive contains the most important files // we need to install the package. // We need to extract metadata from the control file, determine // if the package contains conffiles and maintainer scripts.
pkg := PackageInfo{ MaintainerScripts: make(map[string]string), Status: "not-installed", StatusDirty: true, }
tr := tar.NewReader(&buf)
for { hdr, err := tr.Next() if err == io.EOF { break // End of archive } if err != nil { return nil, err }
// Read the file content var buf bytes.Buffer if _, err := io.Copy(&buf, tr); err != nil { return nil, err }
switch filepath.Base(hdr.Name) { case "control": parser, _ := deb822.NewParser(strings.NewReader(buf.String())) document, _ := parser.Parse() controlParagraph := document.Paragraphs[0]
// Copy control fields and add the Status field in second position pkg.Paragraph = deb822.Paragraph{ Values: make(map[string]string), }
// Make sure the field "Package' comes first, then "Status", // then remaining fields. pkg.Paragraph.Order = append( pkg.Paragraph.Order, "Package", "Status") pkg.Paragraph.Values["Package"] = controlParagraph.Value("Package") pkg.Paragraph.Values["Status"] = "install ok non-installed" for _, field := range controlParagraph.Order { if field == "Package" { continue } pkg.Paragraph.Order = append(pkg.Paragraph.Order, field) pkg.Paragraph.Values[field] = controlParagraph.Value(field) } case "conffiles": pkg.Conffiles = SplitLines(buf.String()) case "prerm": fallthrough case "preinst": fallthrough case "postinst": fallthrough case "postrm": pkg.MaintainerScripts[filepath.Base(hdr.Name)] = buf.String() } }
return &pkg, nil}
// Unpack processes the data.tar archive.func (p *PackageInfo) Unpack(buf bytes.Buffer) error {
// The unpacking process consists in extracting all files // in data.tar to their final destination, except for conffiles, // which are copied with a special extension that will be removed // in the configure step.
if err := p.runMaintainerScript("preinst"); err != nil { return err }
fmt.Printf("Unpacking %s (%s) ...\n", p.Name(), p.Version())
tr := tar.NewReader(&buf) for { hdr, err := tr.Next() if err == io.EOF { break // End of archive } if err != nil { return err }
var buf bytes.Buffer if _, err := io.Copy(&buf, tr); err != nil { return err }
switch hdr.Typeflag { case tar.TypeReg: dest := hdr.Name if strings.HasPrefix(dest, "./") { // ./usr/bin/hello => /usr/bin/hello dest = dest[1:] } if !strings.HasPrefix(dest, "/") { // usr/bin/hello => /usr/bin/hello dest = "/" + dest }
tmpdest := dest if p.isConffile(tmpdest) { // Extract using the extension .dpkg-new tmpdest += ".dpkg-new" }
if err := os.MkdirAll(filepath.Dir(tmpdest), 0755); err != nil { log.Fatalf("Failed to unpack directory %s: %v", tmpdest, err) }
content := buf.Bytes() if err := os.WriteFile(tmpdest, content, 0755); err != nil { log.Fatalf("Failed to unpack file %s: %v", tmpdest, err) }
p.Files = append(p.Files, dest) } }
p.SetStatus("unpacked") p.Sync()
return nil}
// Configure processes the conffiles.func (p *PackageInfo) Configure() error {
// The configure process consists in renaming the conffiles // unpacked at the previous step. // // We ignore some implementation concerns like checking if a conffile // has been updated using the last known checksum.
fmt.Printf("Setting up %s (%s) ...\n", p.Name(), p.Version())
// Rename conffiles for _, conffile := range p.Conffiles { os.Rename(conffile+".dpkg-new", conffile) } p.SetStatus("half-configured") p.Sync()
// Run maintainer script if err := p.runMaintainerScript("postinst"); err != nil { return err } p.SetStatus("installed") p.Sync()
return nil}
func (p *PackageInfo) runMaintainerScript(name string) error {
	// The control.tar file can contain scripts to be run at
	// specific moments. This function uses the standard Go library
	// to run the `sh` command with a maintainer script as an argument.
if _, ok := p.MaintainerScripts[name]; !ok { // Nothing to run return nil }
out, err := exec.Command("/bin/sh", p.InfoPath(name)).Output() if err != nil { return err } fmt.Print(string(out))
return nil}
// We have covered the different steps of the installation process.// We still need to write the code to sync the database.
func (d *Database) Sync() error { newStatus := deb822.Document{ Paragraphs: []deb822.Paragraph{}, }
// Sync the /var/lib/dpkg/info directory for _, pkg := range d.Packages { newStatus.Paragraphs = append(newStatus.Paragraphs, pkg.Paragraph)
if pkg.StatusDirty { if err := pkg.Sync(); err != nil { return err } } }
// Make a new version of /var/lib/dpkg/status os.Rename("/var/lib/dpkg/status", "/var/lib/dpkg/status-old") formatter := deb822.NewFormatter() formatter.SetFoldedFields("Description") formatter.SetMultilineFields("Conffiles") if err := os.WriteFile("/var/lib/dpkg/status", []byte(formatter.Format(newStatus)), 0644); err != nil { return err }
return nil}
func (p *PackageInfo) Sync() error { // This function synchronizes the files under /var/lib/dpkg/info // for a single package.
// Write <package>.list if err := os.WriteFile(p.InfoPath("list"), []byte(MergeLines(p.Files)), 0644); err != nil { return err }
// Write <package>.conffiles if err := os.WriteFile(p.InfoPath("conffiles"), []byte(MergeLines(p.Conffiles)), 0644); err != nil { return err }
// Write <package>.{preinst,prerm,postinst,postrm} for name, content := range p.MaintainerScripts { err := os.WriteFile(p.InfoPath(name), []byte(content), 0755) if err != nil { return err } }
p.StatusDirty = false return nil}
/* Utility functions */
func ReadLines(path string) ([]string, error) { if _, err := os.Stat(path); !os.IsNotExist(err) { content, err := os.ReadFile(path) if err != nil { return nil, err } return SplitLines(string(content)), nil } return nil, nil}
func SplitLines(content string) []string { var lines []string for _, line := range strings.Split(string(content), "\n") { if strings.TrimSpace(line) == "" { continue } lines = append(lines, line) } return lines}
func MergeLines(lines []string) string { return strings.Join(lines, "\n") + "\n"}
Let’s test the new command:
```
$ go build -o dpkg main.go
$ vagrant destroy -f # Recreate the VM
$ vagrant up         # to force a fresh installation.
vagrant$ sudo su
vagrant# /vagrant/dpkg /vagrant/hello.deb
Preparing to unpack hello.deb ...
preinst says hello
Unpacking hello (1.1-1) ...
Setting up hello (1.1-1) ...
postinst says hello

vagrant# hello
hello world

vagrant# dpkg -s hello
Package: hello
Status: install ok installed
Priority: optional
Section: base
Maintainer: Julien Sobczak
Architecture: amd64
Version: 1.1-1
Description: Say Hello
```

Our package has been correctly installed. The standard `dpkg` command recognizes it and can be used to remove the package like any other installed package:

```
vagrant# dpkg -r hello
(Reading database ... 25063 files and directories currently installed.)
Removing hello (1.1-1) ...
prerm says hello
postrm says hello

vagrant# hello
bash: /usr/bin/hello: No such file or directory
```

🎉 We have finished with the command `dpkg`. We succeeded in creating a package manually and installing it using a basic Go program. We have a better understanding of how `dpkg` works and what information is available in its database. Now, we will have a look at the package manager frontend `apt` to understand how these programs work together to install a package.
What happens when you install a package using apt
The main reason to use `apt` is its dependency management support. This command understands that in order to install a given package, other packages may need to be installed too, and `apt` can download and install them. In practice, `dpkg` is called a package manager and `apt` is called a frontend package manager.

apt, apt-get, aptitude
APT is a vast project started in 1997, organized around a core library. The command `apt-get` was the first frontend developed within the project, and `apt` is the second command provided by APT, which overcomes some design mistakes of `apt-get`: for example, `apt` refuses to install dependencies that were not installed beforehand during an upgrade. Under the hood, both tools are built on top of the core library and are thus very close.

External projects like `aptitude` were developed later to support new features like auto-removing packages when they are no longer required, but most of these features are now available in `apt` too.

The most widespread command remains `apt`, and it is the one that we will use in this section.

- Apt (`apt` and `apt-get`) Official Repository: https://salsa.debian.org/apt-team/apt
) Official Repository: https://salsa.debian.org/apt-team/apt - Apt GitHub Mirror: https://github.com/Debian/apt
- Aptitude Official Repository: https://salsa.debian.org/apt-team/aptitude
Further documentation: apt-get, aptitude, … pick the right Debian package manager for you, Raphaël Hertzog
APT makes software available to the user by doing the dirty work of downloading all the required packages and installing them using `dpkg` in the correct order to respect the dependencies. The scope of APT is larger than Dpkg's and its behavior is highly configurable.
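To illustrate what "in the correct order" means, here is a small Go sketch (hypothetical data, not APT's actual resolver) that computes an installation order with a depth-first topological sort over the Depends relationships:

```go
package main

import "fmt"

// installOrder returns the packages in an order where every dependency
// comes before the packages that need it (a depth-first topological sort).
// Real APT does much more: versions, conflicts, candidates, pinning, ...
func installOrder(deps map[string][]string, targets []string) []string {
	var order []string
	visited := map[string]bool{}
	var visit func(string)
	visit = func(pkg string) {
		if visited[pkg] {
			return
		}
		visited[pkg] = true
		for _, dep := range deps[pkg] {
			visit(dep)
		}
		order = append(order, pkg)
	}
	for _, t := range targets {
		visit(t)
	}
	return order
}

func main() {
	// Hypothetical dependency graph.
	deps := map[string][]string{
		"hello":   {"python3"},
		"python3": {"libc6"},
	}
	fmt.Println(installOrder(deps, []string{"hello"}))
	// Output: [libc6 python3 hello]
}
```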
APT configuration resides under `/etc/apt/`, which contains the following files:
-
apt.conf
andapt.conf.d/
: The main configuration files where hundred of options are available (more about them soon). The commandapt-config dump
can be used to view all available options with their default values:Terminal window $ apt-config dump...Dir "/";Dir::State "var/lib/apt";Dir::State::status "/var/lib/dpkg/status";Dir::Cache "var/cache/apt";Dir::Etc "etc/apt";Dir::Etc::sourcelist "sources.list";Dir::Etc::sourceparts "sources.list.d";Dir::Etc::main "apt.conf";Dir::Etc::parts "apt.conf.d";Dir::Etc::preferences "preferences";Dir::Etc::preferencesparts "preferences.d";Dir::Etc::trusted "trusted.gpg";Dir::Etc::trustedparts "trusted.gpg.d";... -
sources.list
andsources.list.d/
: lists of repositories (more about them soon). Here are the default repositories on my Debian server:Terminal window $ cat /etc/apt/sources.listdeb http://deb.debian.org/debian buster maindeb-src http://deb.debian.org/debian buster maindeb http://security.debian.org/debian-security buster-security maindeb-src http://security.debian.org/debian-security buster-security maindeb http://deb.debian.org/debian buster-updates maindeb-src http://deb.debian.org/debian buster-updates maindeb http://deb.debian.org/debian buster-backports maindeb-src http://deb.debian.org/debian buster-backports main -
preferences
andpreferences.d/
: APT pinning is the only available preference. By default, when multiple repositories are configured, a package can exist in several of them and APT applies logic to decide which one must be installed. Pinning allows you to change this logic (called a policy) for some packages. The commandapt-cache policy [pkg]
can be used to view the global policy when called without argument:

$ apt-cache policy
Package files:
 100 /var/lib/dpkg/status
     release a=now
 500 http://security.debian.org/debian-security buster-security/main amd64 Packages
     release o=Debian,a=testing-security,n=buster-security,l=Debian-Security,c=main,b=amd64
     origin security.debian.org
 500 http://deb.debian.org/debian buster/main amd64 Packages
     release o=Debian,a=testing,n=buster,l=Debian,c=main,b=amd64
     origin deb.debian.org

You can create preferences files to favor a specific repository for a given package or to prevent this package from being upgraded. (not covered in this article)
-
trusted.gpg
andtrusted.gpg.d/
: keys for secure authentication of packages (known as “Secure APT” and used in Debian since 2005). The commandapt-key
can be used to show the keys, and to add or remove a key. APT uses public-key (asymmetric) cryptography using GPG:Terminal window $ ls -1 /etc/apt/trusted.gpg.d/debian-archive-buster-automatic.gpgdebian-archive-buster-security-automatic.gpgdebian-archive-buster-stable.gpgdebian-archive-stretch-automatic.gpgdebian-archive-stretch-security-automatic.gpgdebian-archive-stretch-stable.gpgWhen installing a package, APT retrieves the package from an external repository and the
Release
file, which is the entry file to findPackages
index files, may have be altered (which means checking the MD5 sums inside these index files is useless if we can’t guarantee that theRelease
file is safe against a man-in-the-middle attack). This is the goal of secure APT. Concretely, secure APT always downloads aRelease.gpg
file if existing before downloading aRelease
file. (NB: The fileInRelease
had now merged the intent of these two deprecated files.) Using cryptography, APT can be sure that the file is safe and can trust the MD5 sums present inside it to check other files likePackages
files. Otherwise, APT will complain with the following message you have probably encountered before:Terminal window # When adding a new repository in `/etc/apt/sources.list.d/`:W: GPG error: http://ftp.us.debian.org testing Release:The following signatures couldn't be verifiedbecause the public key is not available:NO_PUBKEY 010908312D230C5F# When installing a new package from this repository:WARNING: The following packages cannot be authenticated!libglib-perl libgtk2-perlInstall these packages without verification [y/N]? -
auth.conf
andauth.conf.d/
: APT configuration and repositories list must be accessible to any user on the system but some repositories may require login information to connect, which are stored in these restrictive files. For example, instead of specifying the user/passwordapt:debian
in the source list file directly (deb https://apt:[email protected]/debian buster main
), you can create an entry inauth.conf
:machine example.orglogin aptpassword debian(not covered in this article)
-
listchanges.conf
andlistchanges.conf.d
: Only used by the commandapt-listchanges
to show what has been changed in a new version of a Debian package, as compared to the version currently installed on the system. It does this by extracting the relevant entries from both theNEWS.Debian
andchangelog[.Debian]
files, usually found in/usr/share/doc/_package_
in Debian package archives. (not covered in this article)
In practice, .d
directories are preferred so that the configuration can be split into several files. The single files may not even exist on your machine and are often deprecated.
Further documentation: APT configuration, Secure APT.
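To make the idea of secure APT more concrete, here is a minimal Go sketch (not APT's code) that checks a downloaded Release file against its detached Release.gpg signature by shelling out to gpgv, using one of the keyrings shipped under /etc/apt/trusted.gpg.d/. The keyring path and file names are only examples.

```go
package main

import (
	"fmt"
	"os/exec"
)

// verifyRelease returns an error if the detached signature cannot be
// verified with the given Debian archive keyring. APT itself uses gpgv
// through its gpgv acquire method; this sketch just calls the binary.
func verifyRelease(releasePath, signaturePath string) error {
	cmd := exec.Command("gpgv",
		"--keyring", "/etc/apt/trusted.gpg.d/debian-archive-buster-stable.gpg",
		signaturePath, releasePath)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("signature check failed: %v\n%s", err, out)
	}
	return nil
}

func main() {
	if err := verifyRelease("Release", "Release.gpg"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("Release file is authentic, its checksums can be trusted")
}
```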
Now is the time to start looking at the code again. APT is written in C++. The entry point for any APT command is the file cmdline/apt.cc
which contains a function GetCommands()
that maps each command with a function defined in the directory apt-private/
, which delegates to other functions in the main APT lib present in the directory apt-pkg/
(i.e., cmdline/ -> apt-private/ -> apt-pkg/):
static std::vector<aptDispatchWithHelp> GetCommands()			/*{{{*/
{
   return {
      {"list", &DoList, _("list packages based on package names")},
      {"update", &DoUpdate, _("update list of available packages")},
      {"install", &DoInstall, _("install packages")},
      // ...
      {nullptr, nullptr, nullptr}
   };
}
Before invoking the command function, APT simply initializes a few classes like pkgSystem
to set the default configuration options.
Unlike Dpkg, APT is highly configurable using the files /etc/apt/apt.conf
and /etc/apt/apt.conf.d/
. The format is similar to some Linux tools like bind
or dhcp
.
vagrant$ cat /etc/apt/apt.conf.d/*
APT
{
  NeverAutoRemove
  {
        "^firmware-linux.*";
        "^linux-firmware$";
        "^linux-image-[a-z0-9]*$";
        "^linux-image-[a-z0-9]*-[a-z0-9]*$";
  };
};
DPkg::Pre-Install-Pkgs { "/usr/bin/apt-listchanges --apt || test $? -lt 10"; };
...
The configuration file is organized as a tree of functional groups. For instance, APT::Get::Assume-Yes
is an option within the APT
tool group, for the Get
tool. A new scope can be opened with curly braces, like this:
APT {
  Get {
    Assume-Yes "true";
    Fix-Broken "true";
  };
};
You can retrieve the full list of options using the command apt-config
:
vagrant# apt-config dump
APT "";
APT::Architecture "amd64";
APT::Build-Essential "";
APT::Build-Essential:: "build-essential";
APT::Install-Recommends "1";
APT::Install-Suggests "0";
APT::Sandbox "";
APT::Sandbox::User "_apt";
… hundreds of other options ...
Inside the code, the configuration is accessible using the class Configuration
(defined in apt-pkg/contrib/configuration.h
):
#include <apt-pkg/configuration.h>
Configuration *_config = new Configuration;
// Example with a boolean option
if (_config->FindB("pkgCacheFile::Generate", true) == false) {}

// Example with an integer option
int const Limit = _config->FindI("Acquire::QueueHost::Limit", DEFAULT_HOST_LIMIT);
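In the Go rewrite, the same lookup-with-default pattern can be mimicked with a flat map of fully qualified option names. Here is a minimal sketch (the type and method names are mine, not APT's):

```go
package main

import (
	"fmt"
	"strconv"
)

// Configuration is a simplified stand-in for APT's Configuration class:
// options are stored flat under their fully qualified name.
type Configuration struct {
	options map[string]string
}

// FindB returns the option as a boolean, or the default when unset.
func (c *Configuration) FindB(name string, def bool) bool {
	if v, ok := c.options[name]; ok {
		if b, err := strconv.ParseBool(v); err == nil {
			return b
		}
	}
	return def
}

// FindI returns the option as an integer, or the default when unset.
func (c *Configuration) FindI(name string, def int) int {
	if v, ok := c.options[name]; ok {
		if i, err := strconv.Atoi(v); err == nil {
			return i
		}
	}
	return def
}

func main() {
	config := &Configuration{options: map[string]string{
		"APT::Get::Assume-Yes": "true",
		"quiet":                "0",
	}}
	fmt.Println(config.FindB("APT::Get::Assume-Yes", false)) // true
	fmt.Println(config.FindI("Acquire::QueueHost::Limit", 10)) // 10 (default applies)
}
```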
Further documentation: man page
apt update
Here is the entry point when running the command apt update
:
bool DoUpdate(CommandLine &CmdL){ CacheFile Cache;
// Covered in step 1 // Get the source list if (Cache.BuildSourceList() == false) return false; pkgSourceList *List = Cache.GetSourceList();
// Covered in step 2 // do the work AcqTextStatus Stat(std::cout, ScreenWidth,_config->FindI("quiet",0)); ListUpdate(Stat, *List);
// Covered in step 3 // Rebuild the cache. pkgCacheFile::RemoveCaches(); if (Cache.BuildCaches(false) == false) return false;
// Covered in step 4 // show basic stats (if the user whishes) if (_config->FindB("APT::Cmd::Show-Update-Stats", false) == true) { int upgradable = 0; if (Cache.Open(false) == false) return false; for (pkgCache::PkgIterator I = Cache->PkgBegin(); I.end() != true; ++I) { pkgDepCache::StateCache &state = Cache[I]; if (I->CurrentVer != 0 && state.Upgradable() && state.CandidateVer != NULL) upgradable++; } const char *msg = P_( "%i package can be upgraded. Run 'apt list --upgradable' to see it.\n", "%i packages can be upgraded. Run 'apt list --upgradable' to see them.\n", upgradable); if (upgradable == 0) c1out << _("All packages are up to date.") << std::endl; else ioprintf(c1out, msg, upgradable); }
return true;}
The command is divided into four steps that we will cover separately:
- Read the
sources.list
andsources.list.d/*
files.
// Get the source list
if (Cache.BuildSourceList() == false)
   return false;
pkgSourceList *List = Cache.GetSourceList();
Apt downloads packages from one or more software repositories, which are often remote servers. The precise list of repositories is determined by the file /etc/apt/sources.list
and the ones inside /etc/apt/sources.list.d
. Two formats are supported: one source per line (the widespread one-line style) or multiline stanzas defining one or more sources per stanza (the newer deb822 style).
Example using the old format:
deb http://us.archive.ubuntu.com/ubuntu focal main restricteddeb http://security.ubuntu.com/ubuntu focal-security main restricteddeb http://us.archive.ubuntu.com/ubuntu focal-updates main restricted
Example using the new format:
Types: debURIs: http://us.archive.ubuntu.com/ubuntuSuites: focal focal-updatesComponents: main restricted
Types: debURIs: http://security.ubuntu.com/ubuntuSuites: focal-securityComponents: main restricted
We will ignore the new DEB 822 format in this article.
Further documentation: man 5 sources.list
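To give an idea of what a Go equivalent of this step could look like, here is a minimal sketch that parses the one-line format only (options between brackets such as [arch=amd64], comments after the fields, and the deb822 format are all ignored):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Source represents a single one-line-style entry of sources.list.
type Source struct {
	Type       string   // "deb" or "deb-src"
	URI        string   // root of the repository
	Suite      string   // distribution, e.g. "buster" or "stable"
	Components []string // e.g. "main", "contrib", "non-free"
}

// parseSourceList reads a file in the one-line format, skipping
// comments and blank lines.
func parseSourceList(path string) ([]Source, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var sources []Source
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 4 {
			return nil, fmt.Errorf("invalid source: %q", line)
		}
		sources = append(sources, Source{
			Type:       fields[0],
			URI:        fields[1],
			Suite:      fields[2],
			Components: fields[3:],
		})
	}
	return sources, scanner.Err()
}

func main() {
	sources, err := parseSourceList("/etc/apt/sources.list")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, s := range sources {
		fmt.Printf("%s %s %s %v\n", s.Type, s.URI, s.Suite, s.Components)
	}
}
```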
The class pkgSourceList
represents the list of configured sources and is defined like this:
class pkgSourceList{ public:
typedef std::vector<metaIndex *>::const_iterator const_iterator;
protected:
std::vector<metaIndex *> SrcList;
public:
void Reset(); bool ReadMainList(); bool Read(std::string const &File);
// List accessors inline const_iterator begin() const {return SrcList.begin();}; inline const_iterator end() const {return SrcList.end();}; inline unsigned int size() const {return SrcList.size();}; inline bool empty() const {return SrcList.empty();};
bool FindIndex(pkgCache::PkgFileIterator File, pkgIndexFile *&Found) const; bool GetIndexes(pkgAcquire *Owner, bool GetAll=false) const;
pkgSourceList(); virtual ~pkgSourceList();};
The list is initialized by the method BuildSourceList()
:
bool pkgCacheFile::BuildSourceList(OpProgress * /*Progress*/){ std::unique_ptr<pkgSourceList> SrcList; SrcList.reset(new pkgSourceList()); if (SrcList->ReadMainList() == false) return _error->Error(_("The list of sources could not be read.")); this->SrcList = SrcList.release(); return true;}
The method ReadMainList()
is used to read the sources.list files:
bool pkgSourceList::ReadMainList(){ Reset(); string Main = _config->FindFile("Dir::Etc::sourcelist", "sources.list"); string Parts = _config->FindDir("Dir::Etc::sourceparts", "sources.list.d");
_error->PushToStack(); if (RealFileExists(Main) == true) ReadAppend(Main); if (DirectoryExists(Parts) == true) ReadSourceDir(Parts);
auto good = _error->PendingError() == false; _error->MergeWithStack(); return good;}
- 1
- The
Read*
methods parse the source files. We omit the parsing code for brevity, but both parsers push a new instance of debReleaseIndex
in theSrcList
.
- Fetch index files from each repository (
InRelease
,Packages
, …).
// do the work
AcqTextStatus Stat(std::cout, ScreenWidth, _config->FindI("quiet", 0));
ListUpdate(Stat, *List);
- 1
AcqTextStatus
is used to report the progress of file downloads.
A repository is a set of Debian binary or source packages organized in a special directory tree along with various additional files (checksums, signatures, translations, …). APT downloads some of these files to install a package on your system.
Ex: deb https://deb.debian.org/debian stable main contrib non-free
deb
is used for binary packages,deb-src
for source packages.https://deb.debian.org/debian
specifies the root of the repository.stable
is the distribution, which is commonly a suite (stable
,oldstable
,testing
,unstable
), which is an alias for a Debian codename (wheezy
,jessie
,stretch
), which is based on Toy Story characters.main contrib non-free
are the three component types and indicate the licensing terms of the software they contain.
Here is a preview of files tree for this repository:
https://deb.debian.org/debian└── dists/ |── Debian9.13/ |── Debian10.9/ | ├── ChangeLog | ├── InRelease # Same as Release + Release.gpg | | # (recommended to have only 1 file to download) | ├── Release # Lists the index files for this distribution | | # with their checkums | ├── Release.gpg | ├── contrib/ | ├── main/ | │ └── binary-all/ | │ | |── Packages.gz | │ | |── Packages.xz # Several compression formats are accepted. | | | | # xz compression is required. | │ | |── Release # Basic metadata about this directory. | | | | # Not comparable with the main Release file. | │ |── binary-amd64/ | │ |── ... | │ |── content-all.gz # Index containing the list | | |── content-amd64.gz # of all files in package archives | │ |── content-arm64.gz # and their corresponding package archive. | │ |── ... | | |── i18n/ # Translations of Packages files | | └── source/ # We ignore source packags in this article | │ |── Release | │ |── Sources.gz | │ |── Sources.xz | └── non-free/ |── bullseye/ # Future Debian 11 |── buster/ # Symlink to Debian10.9 |── stable/ # Symlink to buster |── stretch/ # Symlink to Debian9.13 └── testing/ # Symlink to bullseye
And now the explanations.
The root directory contains a directory dists/
which in turn has a directory for each release and suite, the latter usually symlinks to the former. Each release subdirectory contains a signed Release
file and a directory for each component. Inside these are directories for the different architectures, named binary-<arch>
and sources
. And in these are files Packages
and Sources
that are text files (in DEB 822 format and often compressed) containing the metadata of available packages.
Example of a Packages
file:
# 57849 binary packages declarations like this:Package: wgetVersion: 1.20.1-1.1Installed-Size: 3257Maintainer: Noël Köthe <[email protected]>Architecture: amd64Depends: libc6 (>= 2.28), libgnutls30 (>= 3.6.6), libidn2-0 (>= 0.6), libnettle6, libpcre2-8-0 (>= 10.32), libpsl5 (>= 0.16.0), libuuid1 (>= 2.16), zlib1g (>= 1:1.1.4)Recommends: ca-certificatesConflicts: wget-sslDescription: retrieves files from the webMulti-Arch: foreignHomepage: https://www.gnu.org/software/wget/Description-md5: 63a4a740bcd9e8e94bf661e4f1806e02Tag: implemented-in::c, interface::commandline, network::client, protocol::ftp, protocol::http, protocol::ssl, role::program, suite::gnu, use::downloading, works-with::fileSection: webPriority: standardFilename: pool/main/w/wget/wget_1.20.1-1.1_amd64.debSize: 901956MD5sum: a7e3faa711503bd9500650de8fc9835eSHA256: 3821cee0d331cf75ee79daff716f9d320f758f9dff3eaa6d6cf12bae9ef14306
Package: libwget0Source: wget2Version: 1.99.1-2Installed-Size: 387Maintainer: Noël Köthe <[email protected]>Architecture: amd64Depends: libassuan0 (>= 2.0.1), libbrotli1 (>= 0.6.0), libbz2-1.0, libc6 (>= 2.27), libgnutls30 (>= 3.5.10), libgpg-error0 (>= 1.14), libgpgme11 (>= 1.1.2), libidn2-0 (>= 0.6), liblzma5 (>= 5.1.1alpha+20120614), libnghttp2-14 (>= 1.3.0), libpcre2-8-0 (>= 10.31), libpsl5 (>= 0.16.0), zlib1g (>= 1:1.1.4)Description: Download library for files and recursive websitesHomepage: https://gitlab.com/gnuwget/wget2Description-md5: 3cb4ed03cbc78579a7e509e41156a73fTag: role::shared-libSection: libsPriority: optionalFilename: pool/main/w/wget2/libwget0_1.99.1-2_amd64.debSize: 146028MD5sum: 944b2824ee264e1b0cc0f91c1a86e6e2SHA256: 3bf97e4852e76dba5bf2261f4a949a445edda646d09d7d1175dccfdf77bdbc3f
Example of a Sources
file:
# 28489 source packages declarations like this:Package: wgetBinary: wget, wget-udebVersion: 1.20.1-1.1Maintainer: Noël Köthe <[email protected]>Build-Depends: debhelper (>> 11.0.0), pkg-config, gettext, texinfo, libidn2-0-dev, uuid-dev, libpsl-dev, libpcre2-dev, libgnutls28-dev (>= 3.3.15-5), automake, libssl-dev (>= 0.9.8k), zlib1g-dev, dh-strip-nondeterminismArchitecture: anyStandards-Version: 4.3.0Format: 3.0 (quilt)Files: 7a84dd8efb09001dcb9af1576b35992c 2092 wget_1.20.1-1.1.dsc f6ebe9c7b375fc9832fb1b2028271fb7 4392853 wget_1.20.1.orig.tar.gz e0ed66f143f4d81dd0f27a8f01a9c5c8 60872 wget_1.20.1-1.1.debian.tar.xzChecksums-Sha256: b19...261 2092 wget_1.20.1-1.1.dsc b78...1b3 4392853 wget_1.20.1.orig.tar.gz 7ee...01e 60872 wget_1.20.1-1.1.debian.tar.xzHomepage: https://www.gnu.org/software/wget/Package-List: wget deb web standard arch=any wget-udeb udeb debian-installer optional arch=anyDirectory: pool/main/w/wgetPriority: sourceSection: web
But still no .deb
packages… We need to move to another directory at the root of the repository to find them:
https://deb.debian.org/debian└── dists/ |── contrib/ |── main/ | |── 0/ | |── 1/ | |── ... | |── 9/ | |── a/ | |── ... | |── w/ | |── .... | └── wget/ | |── ... | |── z/ | |── liba/ | |── ... | |── libw/ | | |── wget_1.21-1+b1_amd64.deb | | |── wget_1.21-1.debian.tar.xz | | |── wget_1.21-1.dsc | | |── wget_1.21-1_arm64.deb | | |── wget_1.21.orig.tar.gz | | └── wget_1.21.orig.tar.gz.asc | |── ... | └── libz/ └── non-free/
The directory pool/
has a directory for all the components, and in these are directories named 0
, …, 9
, a
, … z
, liba
, … , libz
. And in these are directories named after the software packages they contain, and these directories finally contain the actual packages, i.e the .deb
files.
Notes:
- The “single letter” directories are just a trick to avoid having too many entries in a single directory, something many systems traditionally have performance problems with.
- The
pool/
directory avoid file duplication as binary and source packages are stored only once even if used by many releases underdists/
. Packages
andSources
index files are control files using a similar format as used in the first part of this article when creating our Debian archive package, with a special fieldFile
andDirectory
respectively, to link to thepool/
directory.Release
is an index file in the DEB822 format but containing only a single document whose field names refers to the repository —Origin
,Suite
,Codename
,Architectures
(plural),Components
— and whose fieldMD5Sum
contains the checksums for all files in this repository.
Further documentation: Debian Repository and the more complete Repository Format
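Before going back to the code, here is a small Go sketch (not part of APT) showing how the index file URLs can be derived from a source entry, following the dists/ layout we just described. The .xz extension is hardcoded for the example, whereas APT negotiates the compression format at runtime:

```go
package main

import "fmt"

// indexURLs derives the URLs that apt update needs to fetch for one
// binary ("deb") source entry.
func indexURLs(uri, suite string, components []string, arch string) []string {
	urls := []string{
		fmt.Sprintf("%s/dists/%s/InRelease", uri, suite),
	}
	for _, component := range components {
		urls = append(urls, fmt.Sprintf("%s/dists/%s/%s/binary-%s/Packages.xz",
			uri, suite, component, arch))
	}
	return urls
}

func main() {
	for _, url := range indexURLs("https://deb.debian.org/debian", "buster",
		[]string{"main", "contrib"}, "amd64") {
		fmt.Println(url)
	}
	// Output:
	// https://deb.debian.org/debian/dists/buster/InRelease
	// https://deb.debian.org/debian/dists/buster/main/binary-amd64/Packages.xz
	// https://deb.debian.org/debian/dists/buster/contrib/binary-amd64/Packages.xz
}
```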
Here is the function ListUpdate
that actively downloads index files from the repositories:
bool ListUpdate(pkgAcquireStatus &Stat, pkgSourceList &List, int PulseInterval)
{
   pkgAcquire Fetcher(&Stat);
   if (Fetcher.GetLock(_config->FindDir("Dir::State::Lists")) == false)
      return false;

   // Populate it with the source selection
   if (List.GetIndexes(&Fetcher) == false)
      return false;

   return AcquireUpdate(Fetcher, PulseInterval, true);
}
- 1
- The class
pkgAcquire
is the main component of the Acquire subsystem. APT is responsible to retrieve the packages from various sources, mainly remote repositories through HTTP and the Acquire system is responsible to fetch allItem
required by APT in the most efficient way. It uses for example a pool of workers to speed up the downloading and is able to test for diffs files before downloading full index files. - 2
- Most APT commands tries to acquire a lock to prevent two processes using the lib APT to run at the same time. The lock file is
/var/lib/apt/lists/lock
but other lock files exists for example to update the APT cache. - 3
- The method
GetIndexes()
creates new items to downloadInRelease
files using the Acquire system. - 4
- The function
AcquireUpdate()
collects the results from theFetcher
and updates the cache.
Packages
files (and also some other index files present in a Debian repository) can be relatively large. For example, the compressed Packages.xz
file for the architecture amd64
and the component main
of the stable Debian repository weighs 8 MB. These files are typically retrieved when you run the command apt update
and APT provides a solution to this problem.
Indeed, a Debian repository can contain diff files (whose content is similar to the output of the command diff
) alongside the standard files like Packages
:
https://deb.debian.org/debian
└── dists/bullseye/main/binary-amd64
    |── Packages.xz              7.8M
    └── Packages.diff/
        |── ...                  # The Debian official repository keeps ~30 days of diff files.
        |── 2021-04-12-1400.57.gz   33
        |── 2021-04-13-0200.48.gz 7.8K
        |── 2021-04-13-1402.06.gz  637
        |── 2021-04-13-2000.50.gz  660
        |── 2021-04-14-0200.40.gz 2.7K
        |── 2021-04-14-2000.54.gz 5.0K
        |── 2021-04-15-0200.39.gz 3.8K
        └── 2021-04-15-1400.39.gz  220
The apt
command will try to retrieve these files and apply successive diffs on top of its local index file.
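Here is a conceptual Go sketch of that client-side logic, under the assumption that the Packages.diff/Index file has already been parsed into a history of patches (for each patch, the checksum of the Packages file before the patch is applied). applyEdScript is a hypothetical helper, since the patches are ed-style scripts and applying them is left out of the sketch:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
)

// Patch describes one entry of the diff history.
type Patch struct {
	FromSHA256 string // checksum of the index before applying this patch
	Name       string // e.g. "2021-04-14-0200.40"
}

// applyEdScript would parse the ed-style diff and rewrite packagesPath.
func applyEdScript(packagesPath, patchPath string) error {
	return fmt.Errorf("applying %s to %s: not implemented in this sketch", patchPath, packagesPath)
}

// upgradeWithDiffs applies, in order, every patch more recent than the
// local Packages file. If the local checksum is not found in the history,
// the caller must fall back to downloading the full index.
func upgradeWithDiffs(packagesPath string, history []Patch) error {
	data, err := os.ReadFile(packagesPath)
	if err != nil {
		return err
	}
	local := fmt.Sprintf("%x", sha256.Sum256(data))

	start := -1
	for i, p := range history {
		if p.FromSHA256 == local {
			start = i
			break
		}
	}
	if start == -1 {
		return fmt.Errorf("local index too old, download the full Packages file")
	}
	for _, p := range history[start:] {
		if err := applyEdScript(packagesPath, p.Name+".gz"); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	history := []Patch{{FromSHA256: "3b1...", Name: "2021-04-15-1400.39"}}
	if err := upgradeWithDiffs("Packages", history); err != nil {
		fmt.Println(err)
	}
}
```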
- Read the package lists and build the dependency tree.
// Rebuild the cache.
pkgCacheFile::RemoveCaches();
if (Cache.BuildCaches(false) == false)
   return false;
/var/cache/apt/
This directory stores the latest version of the APT cache, used to speed up the execution of most commands:
$ tree /var/cache/apt/
|-- archives                              # Storage area for downloaded files
|   |-- lock                              # Prevent two APT processes to update the cache simultaneously
|   |-- partial/                          # Storage area for files in transit
|   |-- apt-transport-https_2.0.5_all.deb # Debian downloaded archives
|   |-- ...                               # are kept for a configurable
|   |-- tree_1.8.0-1_amd64.deb            # retention.
|   `-- ...
|-- pkgcache.bin                          # Binary files loaded directly in C++
|                                         # using the mmap() system call.
`-- srcpkgcache.bin                       # Contains the local index files
                                          # and the archives file lists.
                                          # Those are low-level files used
                                          # for performance optimizations.
The APT Cache files under this directory (except the lock
file) can be safely deleted using the command apt clean
to reclaim disk space:
$ sudo apt clean --dry-run
Del /var/cache/apt/archives/* /var/cache/apt/archives/partial/*
Del /var/lib/apt/lists/partial/*
Del /var/cache/apt/pkgcache.bin /var/cache/apt/srcpkgcache.bin
APT is highly configurable and there are several options to clean the cache regularly, for example after every package installation.
/var/lib/apt/
This directory stores the current state of APT, that is, which packages have been installed, the latest version of the index files retrieved when updating the cache, etc.
$ tree /var/lib/apt/.|-- daily_lock # Used by the Systemd apt-daily.timer for housekeeping tasks.| # Runs /usr/lib/apt/apt.systemd.daily which clean the cache,| # update the repositories, create backups of extended_states...| # Not covered in this article.|-- extended_states # Extension to /var/lib/dpkg/status to store which| # packages were installed manually or automatically| # (i.e., as a dependency of another packages).| # Useful to support autoremove of useless packages.|-- listchanges.db # Used by the command apt-listchanges| # Not covered in this article.|-- lists # Local version of index files retrieved| | # from repositories in sources.list| |-- deb.debian.org_debian_dists_buster-backports_InRelease| |-- deb.debian.org_debian_dists_buster-updates_InRelease| |-- deb.debian.org_debian_dists_buster_InRelease| |-- deb.debian.org_debian_dists_buster_main_binary-amd64_Packages| |-- deb.debian.org_debian_dists_buster_main_binary-amd64_Packages.diff_Index| |-- deb.debian.org_debian_dists_buster_main_i18n_Translation-en| |-- deb.debian.org_debian_dists_buster_main_i18n_Translation-en.diff_Index| |-- deb.debian.org_debian_dists_buster_main_source_Sources| |-- deb.debian.org_debian_dists_buster_main_source_Sources.diff_Index| |-- lock # Same as /var/lib/dpkg/lock.| | # Prevent two processes to use the lib APT at the same time| `-- partial/ # Storage area for index files in transit|-- mirrors # Used when using repository mirrors.| | # Not covered in this article.| `-- partial`-- periodic # Empty files whose timestamps are updated | # by the Systemd apt-daily.timer | # to determine the last execution date. | # Not covered in this article. |-- download-upgradeable-stamp |-- unattended-upgrades-stamp |-- update-stamp `-- upgrade-stamp
This directory doesn’t have to be edited like /etc/apt/
and doesn’t have to be cleaned like /var/cache/apt/
. It can be safely ignored by the Apt user but we will still have to talk about it in this article.
The method pkgCacheFile::BuildCaches()
calls the method BuildSourceList()
we covered in the previous step, and then delegates to the method pkgCacheGenerator::MakeStatusCache()
for the effective cache initialization:
bool pkgCacheGenerator::MakeStatusCache(pkgSourceList &List,OpProgress *Progress, MMap **OutMap,pkgCache **OutCache, bool){ std::vector<pkgIndexFile *> Files; if (_system->AddStatusFiles(Files) == false) return false;
// Decide if we can write to the files.. string const CacheFileName = _config->FindFile("Dir::Cache::pkgcache"); string const SrcCacheFileName = _config->FindFile("Dir::Cache::srcpkgcache");
if (Progress != NULL) Progress->OverallProgress(0,1,1,_("Reading package lists"));
bool pkgcache_fine = false; bool srcpkgcache_fine = false;
FileFd CacheFile;
if (CheckValidity(CacheFile, CacheFileName, List, Files.begin(), Files.end()) == true)
{
   pkgcache_fine = true;
   srcpkgcache_fine = true;
}
FileFd SrcCacheFile; if (pkgcache_fine == false) { if (CheckValidity(SrcCacheFile, SrcCacheFileName, List, Files.end(), Files.end()) == true) { srcpkgcache_fine = true; } }
if (srcpkgcache_fine == true && pkgcache_fine == true) { if (Progress != NULL) Progress->OverallProgress(1,1,1,_("Reading package lists")); return true; }
bool Writeable = false; if (srcpkgcache_fine == false || pkgcache_fine == false) { if (CacheFileName.empty() == false) Writeable = access(flNotFile(CacheFileName).c_str(),W_OK) == 0; else if (SrcCacheFileName.empty() == false) Writeable = access(flNotFile(SrcCacheFileName).c_str(),W_OK) == 0; }
// At this point we know we need to construct something, so get storage ready std::unique_ptr<DynamicMMap> Map(CreateDynamicMMap(NULL, 0));
std::unique_ptr<pkgCacheGenerator> Gen{nullptr}; map_filesize_t CurrentSize = 0; map_filesize_t TotalSize = 0;
if (srcpkgcache_fine == true && pkgcache_fine == false) { if (loadBackMMapFromFile(Gen, Map, Progress, SrcCacheFile) == false) return false; srcpkgcache_fine = true; TotalSize += ComputeSize(NULL, Files.begin(), Files.end()); } else if (srcpkgcache_fine == false) { Gen.reset(new pkgCacheGenerator(Map.get(),Progress)); if (Gen->Start() == false) return false;
TotalSize += ComputeSize(&List, Files.begin(),Files.end()); if (BuildCache(*Gen, Progress, CurrentSize, TotalSize, &List, Files.end(),Files.end()) == false) return false;
if (Writeable == true && SrcCacheFileName.empty() == false) if (writeBackMMapToFile(Gen.get(), Map.get(), SrcCacheFileName) == false) return false; }
if (pkgcache_fine == false) { if (BuildCache(*Gen, Progress, CurrentSize, TotalSize, NULL, Files.begin(), Files.end()) == false) return false;
if (Writeable == true && CacheFileName.empty() == false) if (writeBackMMapToFile(Gen.get(), Map.get(), CacheFileName) == false) return false; }
if (OutMap != nullptr) *OutMap = Map.release();
return true;}
- 1
- The cache is stored in
/var/cache/apt/pkgcache.bin
and/var/cache/apt/srcpkgcache.bin
. These are binary files that are loaded in memory.
- The method
CheckValidity
loads each cache file in memory and checks that they are up-to-date, by verifying that every required index files for every source exists. - 3
- If both cache files are correct, we can return immediately. Otherwise, we need to rebuild the ones that are not valid from scratch.
The APT Cache files are two binary files /var/cache/apt/pkgcache.bin
and /var/cache/apt/srcpkgcache.bin
.
Basically, these cache files contains all index files (InRelease
, Packages
, Sources
, and Translations
) retrieved from the APT repositories present in the list of sources (/etc/apt/source.list
and /etc/apt/source.list.d/
). The only difference between these two files is that the file pkgcache.bin
appends also the content of /var/lib/dpkg/status
.
Therefore, every time a new index file is retrieved by APT or when the Dpkg status file changes, the APT cache must be updated too.
The format of the cache files is optimized for the sole usage of APT: the main motivations are to speed up the loading of the cache in memory and to reduce memory usage. Therefore, the cache uses a binary format, which means you cannot read the files using your text editor. For example, Header
is the first struct copied and starts like this:
struct Header
{
   // Signature information
   unsigned long Signature;   // 0x98FE76DC
   short MajorVersion;        // 0
   short MinorVersion;        // 2
   ...
};
Field names are logically omitted and only values (sometimes converted to enums like the status string installed
that becomes 6
in the binary file) are appended in successive order as confirmed by the command xxd
which dumps a file in hexadecimal:
$ xxd /var/cache/apt/pkgcache.bin | head -100000000: dc76 fe98 1000 0000 a802 1c2c 4038 5818 .v.........,@8X.## long = 4 bytes, short = 2 bytes# amd64 = little endian## dc --------+# 76 ------+ |# fe ----+ | 10 ---+ 00 ---+# 98 --+ | | | 00 -+ | 00 -+ |# | | | | | | | |# Signature: 98FE76DC Minor: 0010 = 2 Major: 0000 = 0
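For illustration, a small Go program (not part of APT) can decode the first bytes of the cache using encoding/binary with little-endian byte order. The struct below mirrors only the first fields of the Header struct shown above; the rest of the header is ignored:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

// header mirrors the first fields of APT's cache Header struct:
// a 4-byte signature followed by two 2-byte version numbers.
type header struct {
	Signature    uint32
	MajorVersion int16
	MinorVersion int16
}

func main() {
	f, err := os.Open("/var/cache/apt/pkgcache.bin")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	var h header
	// amd64 is little-endian, like the xxd dump above.
	if err := binary.Read(f, binary.LittleEndian, &h); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// On the file dumped above, this prints the signature 98FE76DC.
	fmt.Printf("Signature: %X Major: %d Minor: %d\n",
		h.Signature, h.MajorVersion, h.MinorVersion)
}
```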
When APT is launched, these two files are loaded in memory using the mmap()
system call and the rest of the code interacts with an instance of the class pkgCache
and another of the class pkgDepCache
. In fact, pkgDepCache
wraps pkgCache
to add state informations about the packages on the system so that pkgCache
is mostly read-only.
The code to initialize these instances is not covered in the article. Check the files apt-pkg/pkgcache.h
, apt-pkg/cachefile.h
and apt-pkg/pkgcachegen.h
if you are curious.
Further Documentation: APT Cache File Format
We will not go deeper into the APT Cache code. We have already inspected the structure of the different index files (InRelease
, Packages
, …) and we know that APT commands use pkgCacheFile.GetPkgCache()
and pkgCacheFile.GetDepCache()
to retrieve information from the cache.
What follows are annotated definitions to give you an idea of the kind of information present in the APT Cache:
class pkgCache{ public:
struct Header; // The size and count of each following properties // required to jump to the index in the binary format.
struct Group; // Packages with the same name form a group, so we have // a simple way to access a package built // for different architectures. // Groups are also used to iterate over all binaries // produced by a source package. struct Package; // A single package with all the available versions // and the possible installed version. struct ReleaseFile; // Release index file. struct PackageFile; // Packages index file. struct Version; // A single version of a package with the list of // dependencies and the list of files in this package. struct Description; // Translation of a single version of a package struct DependencyData; // Information for a single dependency // (the version, the type, ...)
// Iterators class GrpIterator; class PkgIterator; class VerIterator; class DescIterator; class DepIterator; class RlsFileIterator; class PkgFileIterator;
class Namespace;
public:
// Pointers to the arrays of items Header *HeaderP; Group *GrpP; Package *PkgP; DescFile *DescFileP; ReleaseFile *RlsFileP; // All Release files used to build the cache PackageFile *PkgFileP; // All Packages files used to build the cache Version *VerP; Description *DescP; DependencyData *DepDataP;
// Accessors GrpIterator FindGrp(APT::StringView Name); PkgIterator FindPkg(APT::StringView Name);
inline GrpIterator GrpBegin(); inline GrpIterator GrpEnd(); inline PkgIterator PkgBegin(); inline PkgIterator PkgEnd(); inline PkgFileIterator FileBegin(); inline PkgFileIterator FileEnd(); inline RlsFileIterator RlsFileBegin(); inline RlsFileIterator RlsFileEnd();};
struct pkgCache::Package{ /** \brief Architecture of the package */ map_stringitem_t Arch; /** \brief List of versions sorted from highest version to lowest version */ map_pointer<Version> VersionList; /** \brief index to the installed version */ map_pointer<Version> CurrentVer; /** \brief index of the group this package belongs to */ map_pointer<pkgCache::Group> Group;
/** \brief List of all dependencies on this package */ map_pointer<Dependency> RevDepends; /** \brief List of all "packages" this package provide */ map_pointer<Provides> ProvidesList;
// Install/Remove/Purge etc /** \brief state that the user wishes the package to be in */ map_number_t SelectedState; // What /** \brief installation state of the package */ map_number_t InstState; // Flags /** \brief indicates if the package is installed */ map_number_t CurrentState; // State};
struct pkgCache::ReleaseFile{ /** \brief physical disk file that this ReleaseFile represents */ map_stringitem_t FileName; map_stringitem_t Archive; map_stringitem_t Codename; map_stringitem_t Version; map_stringitem_t Origin; map_stringitem_t Label; /** \brief The site the index file was fetched from */ map_stringitem_t Site;};
struct pkgCache::PackageFile{ /** \brief physical disk file that this PackageFile represents */ map_stringitem_t FileName; /** \brief the release information to keep record of which version belongs to which release e.g. for pinning. */ map_pointer<ReleaseFile> Release;
map_stringitem_t Component; map_stringitem_t Architecture;};
struct pkgCache::Version{ /** \brief complete version string */ map_stringitem_t VerStr; /** \brief section this version is filled in */ map_stringitem_t Section; /** \brief source package name this version comes from Always contains the name, even if it is the same as the binary name */ map_stringitem_t SourcePkgName; /** \brief source version this version comes from Always contains the version string, even if it is the same as the binary version */ map_stringitem_t SourceVerStr;
/** \brief references all the PackageFile's that this version came from
FileList can be used to determine what distribution(s) the Version applies to. If FileList is 0 then this is a blank version. The structure should also have a 0 in all other fields excluding pkgCache::Version::VerStr and Possibly pkgCache::Version::NextVer. */ map_pointer<VerFile> FileList; /** \brief base of the dependency list */ map_pointer<Dependency> DependsList; /** \brief links to the owning package
This allows reverse dependencies to determine the package */ map_pointer<Package> ParentPkg; /** \brief list of pkgCache::Provides */ map_pointer<Provides> ProvidesList;};
struct pkgCache::DependencyData{ /** \brief string of the version the dependency is applied against */ map_stringitem_t Version; /** \brief index of the package this depends applies to
The generator will - if the package does not already exist - create a blank (no version records) package. */ map_pointer<pkgCache::Package> Package;
/** \brief Dependency type - Depends, Recommends, Conflicts, etc */ map_number_t Type; /** \brief comparison operator specified on the depends line
If the high bit is set then it is a logical OR with the previous record. */ map_flags_t CompareOp;};
// Other structs are omitted for brievity.
Here is the definition of the class pkgDepCache
:
class pkgDepCache{ public:
enum ModeList {ModeDelete = 0, ModeKeep = 1, ModeInstall = 2, ModeGarbage = 3};
struct StateCache { // text versions of the two version fields const char *CandVersion; const char *CurVersion;
// Pointer to the candidate install version. Version *CandidateVer;
// Pointer to the install version. Version *InstallVer;
// Various tree indicators signed char Status; // -1,0,1,2 unsigned char Mode; // ModeList
// Various test members for the current status of the package inline bool Keep() const {return Mode == ModeKeep;}; inline bool Upgrade() const {return Status > 0 && Mode == ModeInstall;}; inline bool Upgradable() const {return Status >= 1 && CandidateVer != NULL;}; inline bool Downgrade() const {return Status < 0 && Mode == ModeInstall;}; inline bool Held() const {return Status != 0 && Keep();}; // ... };
protected:
// State information pkgCache *Cache; StateCache *PkgState;
public:
// Accessors inline StateCache &operator [](PkgIterator const &I) {return PkgState[I->ID];}; inline StateCache &operator [](PkgIterator const &I) const {return PkgState[I->ID];};
// read persistent states bool readStateFile(OpProgress * const prog); bool writeStateFile(OpProgress * const prog, bool const InstalledOnly=true);
bool Init(OpProgress * const Prog); // Generate all state information void Update(OpProgress * const Prog = 0);
pkgDepCache(pkgCache * const Cache,Policy * const Plcy = 0); virtual ~pkgDepCache();};
- Display statistics about package upgrades.
This last step simply traverses the cache to extract the relevant information.
// show basic stats (if the user whishes)if (_config->FindB("APT::Cmd::Show-Update-Stats", false) == true){ int upgradable = 0; if (Cache.Open(false) == false) return false; for (pkgCache::PkgIterator I = Cache->PkgBegin(); I.end() != true; ++I) { pkgDepCache::StateCache &state = Cache[I]; if (I->CurrentVer != 0 && state.Upgradable() && state.CandidateVer != NULL) upgradable++; } const char *msg = P_( "%i package can be upgraded. Run 'apt list --upgradable' to see it.\n", "%i packages can be upgraded. Run 'apt list --upgradable' to see them.\n", upgradable); if (upgradable == 0) c1out << _("All packages are up to date.") << std::endl; else ioprintf(c1out, msg, upgradable);}
- 1
- The operator
[]
is overloaded inpkgDepCache
to returnPkgState[I->ID]
, which is a structStateCache
containing the current installed and candidate versions. - 2
- The method
Upgradable()
reads the state to determine if a new candidate version is available and increments a counter. - 3
- The macro
P_
is defined by #define P_(msg,plural,n) (n == 1 ? msg : plural)
.
That’s all for the command apt update
. We will now cover other APT commands, reusing the knowledge we built about the APT cache.
apt list
Here is the code of the command apt list
. This version omits optional arguments that are used to filter the list of results.
bool DoList(CommandLine &Cmd){ pkgCacheFile CacheFile; pkgCache * const Cache = CacheFile.GetPkgCache(); pkgRecords records(CacheFile);
std::string format = "${color:highlight}${Package}" + "${color:neutral}/${Origin} ${Version} " + "${Architecture}${ }${apt:Status}";
std::list<pkgCache::VerIterator> bag;
GetVersionSet(CacheFile, &bag); std::map<std::string, std::string> output_map; for (std::list<pkgCache::VerIterator>::iterator V = bag.begin(); V != bag.end(); ++V) { std::stringstream outs; ListSingleVersion(CacheFile, records, V, outs, format); output_map.insert(std::make_pair<std::string, std::string>( V.ParentPkg().FullName(), outs.str())); }
// output the map std::map<std::string, std::string>::const_iterator K; for (K = output_map.begin(); K != output_map.end(); ++K) std::cout << (*K).second << std::endl;
return true;}
- 1
- The function
CacheFile.GetPkgCache()
delegates to the methodBuildCaches()
we covered in the previous section aboutapt update
. This method is responsible for building the APT cache.
- Concrete values will be replaced in the function
ListSingleVersion
by replacing${Package}
,${Origin}
, … by their real values. - 3
- The real implementation uses the type
LocalitySortedVersionSet
which is a list ordering packages based on their names in theTranslation
files of the user locale.
Like for the apt update
command, the code is simply using the information present in the APT cache. In this case, it happens in the function GetVersionSet
:
bool GetVersionSet(pkgCacheFile &CacheFile, std::list<pkgCache::VerIterator> versions){ pkgCache * const Cache = CacheFile.GetPkgCache(); pkgDepCache * const DepCache = CacheFile.GetDepCache();
bool const insertCurrentVer = _config->FindB("APT::Cmd::Installed", false); bool const insertUpgradable = _config->FindB("APT::Cmd::Upgradable", false);
for (pkgCache::PkgIterator P = Cache->PkgBegin(); P.end() == false; ++P) { pkgDepCache::StateCache &state = (*DepCache)[P]; if (insertCurrentVer == true) { if (P->CurrentVer != 0) versions->insert(P.CurrentVer()); } else if (insertUpgradable == true) { if (P.CurrentVer() && state.Upgradable()) versions->insert(CacheFile.GetPolicy()->GetCandidateVer(P)); } else { versions->insert(P.VersionList()); } } if (progress != NULL) progress->Done(); return true;}
- 1
- The command
apt list --installed
searches for installed packages. - 2
- The command
apt list --upgradable
searches for installed packages that can be upgraded. - 3
- The command
apt list --all-versions
searches for all packages in the APT cache.
The packages are then formatted in the function ListSingleVersion()
:
void ListSingleVersion(pkgCacheFile &CacheFile, pkgRecords &records, /*{{{*/ pkgCache::VerIterator const &V, std::ostream &out, std::string const &format){ pkgCache::PkgIterator const P = V.ParentPkg(); pkgDepCache * const DepCache = CacheFile.GetDepCache(); pkgDepCache::StateCache const &state = (*DepCache)[P];
std::string output = format;
output = SubstVar(output, "${db::Status-Abbrev}", GetFlagsStr(CacheFile, P)); output = SubstVar(output, "${Package}", P.Name()); std::string const ArchStr = GetArchitecture(CacheFile, P); output = SubstVar(output, "${Architecture}", ArchStr); std::string const InstalledVerStr = GetInstalledVersion(CacheFile, P); output = SubstVar(output, "${installed:Version}", InstalledVerStr); std::string const CandidateVerStr = GetCandidateVersion(CacheFile, P); output = SubstVar(output, "${candidate:Version}", CandidateVerStr); std::string const VersionStr = GetVersion(CacheFile, V); output = SubstVar(output, "${Version}", VersionStr); output = SubstVar(output, "${Origin}", GetArchiveSuite(CacheFile, V));
std::string StatusStr = ""; if (P->CurrentVer != 0) { if (P.CurrentVer() == V) { if (state.Upgradable() && state.CandidateVer != NULL) strprintf(StatusStr, _("[installed,upgradable to: %s]"), CandidateVerStr.c_str()); else if (V.Downloadable() == false) StatusStr = _("[installed,local]"); else if(V.Automatic() == true && state.Garbage == true) StatusStr = _("[installed,auto-removable]"); else if ((state.Flags & pkgCache::Flag::Auto) == pkgCache::Flag::Auto) StatusStr = _("[installed,automatic]"); else StatusStr = _("[installed]"); } else if (state.CandidateVer == V && state.Upgradable()) strprintf(StatusStr, _("[upgradable from: %s]"), InstalledVerStr.c_str()); } else if (V.ParentPkg()->CurrentState == pkgCache::State::ConfigFiles) StatusStr = _("[residual-config]"); output = SubstVar(output, "${apt:Status}", StatusStr); output = SubstVar(output, "${color:highlight}", _config->Find("APT::Color::Highlight", "")); output = SubstVar(output, "${color:neutral}", _config->Find("APT::Color::Neutral", "")); output = SubstVar(output, "${Description}", GetShortDescription(CacheFile, records, P)); output = SubstVar(output, "${LongDescription}", GetLongDescription(CacheFile, records, P)); output = SubstVar(output, "${ }${ }", "${ }"); output = SubstVar(output, "${ }\n", "\n"); output = SubstVar(output, "${ }", " ");
out << output;}
- 1
- The function ignores which fields are present in the output format and thus will try to replace all of them. If a field is missing, the replacement will do nothing.
- 2
- The code uses the state information present in
pkgDepCache
to determine if the package is installed, or upgradable, and so on. - 3
- The code ensures no remaining braces are left.
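The placeholder-substitution approach is easy to reproduce in Go. Here is a minimal sketch of the same idea (the function and variable names are mine, not APT's):

```go
package main

import (
	"fmt"
	"strings"
)

// substVars is a simplified Go version of the SubstVar calls above:
// every known placeholder is replaced, whether or not it actually
// appears in the format string.
func substVars(format string, values map[string]string) string {
	out := format
	for placeholder, value := range values {
		out = strings.ReplaceAll(out, placeholder, value)
	}
	return out
}

func main() {
	format := "${Package}/${Origin} ${Version} ${Architecture} ${apt:Status}"
	line := substVars(format, map[string]string{
		"${Package}":      "wget",
		"${Origin}":       "buster",
		"${Version}":      "1.20.1-1.1",
		"${Architecture}": "amd64",
		"${apt:Status}":   "[installed]",
	})
	fmt.Println(line) // wget/buster 1.20.1-1.1 amd64 [installed]
}
```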
We will close the APT section by covering the most useful command.
apt install
The entry point is the function DoInstall()
which is called by various commands: install
, reinstall
, remove
, purge
, … The code will be simplified to keep only the installation usage.
bool DoInstall(CommandLine &CmdL){ CacheFile Cache;
// Covered in step 1 if (Cache.OpenForInstall() == false) return false;
std::set<pkgCache::VerIterator> verset;
// Covered in step 2 if (!DoCacheManipulationFromCommandLine(CmdL, Cache, verset)) { return false; }
// Covered in step 3 /* Print out a list of packages that are going to be installed extra to what the user asked */ if (Cache->InstCount() != verset.size()) std::list<pkgCache::PkgIterator> extras; for (pkgCache::PkgIterator I = Cache->PkgBegin(); I.end() != true; ++I) { if ((*Cache)[Pkg].Install() == false) continue; pkgCache::VerIterator const Cand = (*Cache)[Pkg].CandidateVerIter(*Cache); if (verset->find(Cand) == verset->end()) { extra.insert(Pkg); } } ShowList(_("The following additional packages will be installed:"), extras);
/* Print out a list of suggested and recommended packages */ { std::list<std::string> Recommends, Suggests, SingleRecommends, SingleSuggests; for (auto const &Pkg: pkgCache::PkgIterator(*Cache)) { /* Just look at the ones we want to install */ if ((*Cache)[Pkg].Install() == false) continue;
// get the recommends/suggests for the candidate ver pkgCache::VerIterator CV = (*Cache)[Pkg].CandidateVerIter(*Cache); for (pkgCache::DepIterator D = CV.DependsList(); D.end() == false; ) { pkgCache::DepIterator Start; pkgCache::DepIterator End; D.GlobOr(Start, End); // advances D if (Start->Type != pkgCache::Dep::Recommends && Start->Type != pkgCache::Dep::Suggests) continue;
std::string target; for (pkgCache::DepIterator I = Start; I != D; ++I) { if (target.empty() == false) target.append(" | "); target.append(I.TargetPkg().FullName(true)); } std::list<std::string> &Type = Start->Type == pkgCache::Dep::Recommends ? Recommends : Suggests; if (std::find(Type.begin(), Type.end(), target) != Type.end()) continue; Type.push_back(target); }
} ShowList(_("Suggested packages:"), Suggests); ShowList(_("Recommended packages:"), Recommends); }
bool result;
// Covered in step 4 result = InstallPackages(Cache, false);
return result;}
- 1
- The package problem resolver is launched during step 2 and can add new packages to install to satisfy dependencies. Therefore, the number of packages to install can be different from the number of packages specified in the command line.
- Load the APT cache
The first step is without surprise to load the APT Cache using the method pkgCacheFile::Open()
which reuses methods we have already discussed before.
bool pkgCacheFile::Open(OpProgress *Progress)
{
   if (BuildCaches(Progress) == false)
      return false;

   if (BuildPolicy(Progress) == false)
      return false;

   if (BuildDepCache(Progress) == false)
      return false;

   if (Progress != NULL)
      Progress->Done();
   if (_error->PendingError() == true)
      return false;

   return true;
}
- Determine the packages to install
Installing a package can also mean uninstalling some other packages. Maybe the new version of a package stops using a dependency that was used only by this package, and APT will try to autoremove it. The code is therefore a little more complicated.
For this step, we ignore most of these problems and focus on the installation of new packages with only new dependencies to install. The code is adapted accordingly.
For every package to install, the code will update the state in pkgDepCache
using the function Cache->GetDepCache()->SetCandidateVersion()
and Cache.MarkInstall()
. After that, the code executes the pkgProblemResolver
. The goal is to fix broken packages, that is packages with missing or conflicting dependencies if the installation continues. The code is huge with more than 1000 lines of code. To give you an idea of the kind of constraints the resolver must satisfy, here are the relevant fields for a common package:
Package: nginx-core
Description: nginx web/proxy server (standard version)
Version: 1.18.0-6+b1
Architecture: amd64
Replaces: nginx-full (<< 1.18.0-1)
Depends: libnginx-mod-http-geoip (= 1.18.0-6+b1), nginx-common (= 1.18.0-6), iproute2, libc6 (>= 2.28), libcrypt1 (>= 1:4.1.0), libpcre3, libssl1.1 (>= 1.1.1), zlib1g (>= 1:1.1.4)
Suggests: nginx-doc (= 1.18.0-6)
Conflicts: nginx-extras, nginx-light
Breaks: nginx (<< 1.4.5-1), nginx-full (<< 1.18.0-1)
The code documentation recognizes that the code has become complex and very sophisticated over time. Moreover, the resolver may not even be able to fix all broken packages. Packages may be missing and conflicts may still exist. Check the function pkgProblemResolver::ResolveInternal()
defined in apt-pkg/algorithms.cc
for more details.
bool DoCacheManipulationFromCommandLine(CommandLine &CmdL, CacheFile &Cache, std::set<APT::VersionSet> &verset){ std::unique_ptr<pkgProblemResolver> Fix(nullptr); Fix.reset(new pkgProblemResolver(Cache));
for (const char **I = CmdL.FileList + 1; *I != 0; ++I) { pkgCache::GrpIterator Grp = Cache.GetPkgCache()->FindGrp(pkg); verset.insert(Grp.FindPreferredPkg()) }
TryToInstall InstallAction(Cache, Fix.get());
for (unsigned short i = 0; order[i] != 0; ++i) { InstallAction = std::for_each(verset.begin(), verset.end(), InstallAction); }
// Call the scored problem resolver OpTextProgress Progress(*_config); bool resolver_fail = Fix->Resolve(true, &Progress);
if (resolver_fail == false) return false;
return true;}
- 1
- Add one to
CmdL.FileList
to skip theinstall
command name. - 2
- Mark this package version to be installed.
- 3
- Ensure the resolver fixed the broken packages before continuing the installation.
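To get a feel for why the real resolver needs so much code, here is a crude Go sketch of the idea behind marking a package and its dependencies for installation. It ignores versions, Conflicts, Breaks, and "a | b" alternatives, which is exactly what the APT resolver has to handle on top of this:

```go
package main

import "fmt"

// Package is a drastically simplified view of a cache entry: only a name,
// an installed flag, and the names listed in its Depends field.
type Package struct {
	Name      string
	Installed bool
	Depends   []string
}

// markInstall recursively marks a package and its missing dependencies
// for installation.
func markInstall(name string, cache map[string]*Package, toInstall map[string]bool) error {
	pkg, ok := cache[name]
	if !ok {
		return fmt.Errorf("package %q not found in the cache", name)
	}
	if pkg.Installed || toInstall[name] {
		return nil
	}
	toInstall[name] = true
	for _, dep := range pkg.Depends {
		if err := markInstall(dep, cache, toInstall); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	cache := map[string]*Package{
		"nginx-core":   {Name: "nginx-core", Depends: []string{"nginx-common", "libc6"}},
		"nginx-common": {Name: "nginx-common", Depends: []string{"libc6"}},
		"libc6":        {Name: "libc6", Installed: true},
	}
	toInstall := map[string]bool{}
	if err := markInstall("nginx-core", cache, toInstall); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(toInstall) // map[nginx-common:true nginx-core:true]
}
```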
- Ask confirmation for additional packages to install
This step simply iterates over the packages to install and inspects the calculated dependency list to keep packages present in the fields Recommends
and Suggests
. The “recommended” dependencies are the most important and considerably improve the functionality offered by the package (these recommended packages are now installed by default by APT).
Here is an example of a package with recommended and suggested packages:
...Package: ngraph-gtkVersion: 6.09.01-1Maintainer: Hiroyuki Ito <[email protected]>Architecture: amd64Depends: libc6 (>= 2.4), libngraph0 (>= 6.07.02)Recommends: ngraph-gtk-addins, ngraph-gtk-docSuggests: fonts-liberationDescription: create scientific 2-dimensional graphs...
Note that dependencies of a package can also have recommended and suggested packages, and so on. Therefore, the final list displayed to the user is often pretty long:
vagrant# apt install ngraph-gtkReading package lists... DoneBuilding dependency tree... DoneReading state information... DoneThe following additional packages will be installed: adwaita-icon-theme at-spi2-core dbus-user-session dconf-gsettings-backend dconf-service fontconfig fontconfig-config fonts-dejavu-core glib-networking glib-networking-common glib-networking-services gsettings-desktop-schemas gtk-update-icon-cache hicolor-icon-theme libatk-bridge2.0-0 libatk1.0-0 libatk1.0-data libatspi2.0-0 libavahi-client3 libavahi-common-data libavahi-common3 libcairo-gobject2 libcairo2 libcolord2 libcups2 libdatrie1 libdconf1 libdeflate0 libepoxy0 libfontconfig1 libfribidi0 libgdk-pixbuf-2.0-0 libgdk-pixbuf-xlib-2.0-0 libgdk-pixbuf2.0-0 libgdk-pixbuf2.0-bin libgdk-pixbuf2.0-common libgraphite2-3 libgsl25 libgslcblas0 libgtk-3-0 libgtk-3-bin libgtk-3-common libgtksourceview-4-0 libgtksourceview-4-common libharfbuzz0b libjbig0 libjpeg62-turbo libjson-glib-1.0-0 libjson-glib-1.0-common liblcms2-2 libngraph0 libpango-1.0-0 libpangocairo-1.0-0 libpangoft2-1.0-0 libpixman-1-0 libproxy1v5 librest-0.7-0 librsvg2-2 librsvg2-common libsoup-gnome2.4-1 libsoup2.4-1 libthai-data libthai0 libtiff5 libwayland-client0 libwayland-cursor0 libwayland-egl1 libwebp6 libx11-6 libx11-data libxau6 libxcb-render0 libxcb-shm0 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxdmcp6 libxext6 libxfixes3 libxi6 libxinerama1 libxkbcommon0 libxrandr2 libxrender1 libxtst6 ngraph-gtk-addins ngraph-gtk-addins-base ngraph-gtk-doc shared-mime-info x11-common xkb-dataSuggested packages: colord cups-common gsl-ref-psdoc | gsl-doc-pdf | gsl-doc-info | gsl-ref-html gvfs liblcms2-utils fonts-liberation librsvg2-binThe following NEW packages will be installed: adwaita-icon-theme at-spi2-core dbus-user-session dconf-gsettings-backend dconf-service fontconfig fontconfig-config fonts-dejavu-core glib-networking glib-networking-common glib-networking-services gsettings-desktop-schemas gtk-update-icon-cache hicolor-icon-theme libatk-bridge2.0-0 libatk1.0-0 libatk1.0-data libatspi2.0-0 libavahi-client3 libavahi-common-data libavahi-common3 libcairo-gobject2 libcairo2 libcolord2 libcups2 libdatrie1 libdconf1 libdeflate0 libepoxy0 libfontconfig1 libfribidi0 libgdk-pixbuf-2.0-0 libgdk-pixbuf-xlib-2.0-0 libgdk-pixbuf2.0-0 libgdk-pixbuf2.0-bin libgdk-pixbuf2.0-common libgraphite2-3 libgsl25 libgslcblas0 libgtk-3-0 libgtk-3-bin libgtk-3-common libgtksourceview-4-0 libgtksourceview-4-common libharfbuzz0b libjbig0 libjpeg62-turbo libjson-glib-1.0-0 libjson-glib-1.0-common liblcms2-2 libngraph0 libpango-1.0-0 libpangocairo-1.0-0 libpangoft2-1.0-0 libpixman-1-0 libproxy1v5 librest-0.7-0 librsvg2-2 librsvg2-common libsoup-gnome2.4-1 libsoup2.4-1 libthai-data libthai0 libtiff5 libwayland-client0 libwayland-cursor0 libwayland-egl1 libwebp6 libx11-6 libx11-data libxau6 libxcb-render0 libxcb-shm0 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxdmcp6 libxext6 libxfixes3 libxi6 libxinerama1 libxkbcommon0 libxrandr2 libxrender1 libxtst6 ngraph-gtk ngraph-gtk-addins ngraph-gtk-addins-base ngraph-gtk-doc shared-mime-info x11-common xkb-data0 upgraded, 93 newly installed, 0 to remove and 11 not upgraded.Need to get 38.5 MB of archives.After this operation, 137 MB of additional disk space will be used.Do you want to continue? [Y/n]
We can confirm from the previous output that recommended packages are indeed installed by default.
- Proceed to the installation
The last step is managed by the function InstallPackages
:
bool InstallPackages(CacheFile &Cache, bool ShwKept, bool Ask){ // Create the download object aptAcquireWithTextStatus Fetcher; if (Fetcher.GetLock(_config->FindDir("Dir::Cache::Archives")) == false) return false;
// Read the source list if (Cache.BuildSourceList() == false) return false; pkgSourceList * const List = Cache.GetSourceList();
// Create the text record parser pkgRecords Recs(Cache); if (_error->PendingError() == true) return false;
// Create the package manager and prepare to download std::unique_ptr<pkgPackageManager> PM(_system->CreatePM(Cache)); if (PM->GetArchives(&Fetcher, List, &Recs) == false || _error->PendingError() == true) return false;
auto const FetchBytes = Fetcher.FetchNeeded(); auto const FetchPBytes = Fetcher.PartialPresent();
// Size delta ioprintf(c1out,_("After this operation, %sB of additional disk space " + "will be used.\n"), SizeToStr(Cache->UsrSize()).c_str());
if (_error->PendingError() == true) return false;
// Prompt to continue if (Ask == true || Fail == true) { if (_config->FindI("quiet", 0) < 2 && _config->FindB("APT::Get::Assume-Yes", false) == false) { if (YnPrompt(_("Do you want to continue?")) == false) { cout << _("Abort.") << std::endl; exit(1); } } }
// Run it bool Failed = false; while (1) { bool Transient = false; if (AcquireRun(Fetcher, 0, &Failed, &Transient) == false) return false;
if (Failed == true && _config->FindB("APT::Get::Fix-Missing",false) == false) return _error->Error(_("Unable to fetch some archives, " + "maybe run apt-get update or try with --fix-missing?"));
auto const progress = APT::Progress::PackageManagerProgressFactory(); _system->UnLockInner(); pkgPackageManager::OrderResult const Res = PM->DoInstall(progress); delete progress;
if (Res == pkgPackageManager::Failed || _error->PendingError() == true) return false; if (Res == pkgPackageManager::Completed) break;
_system->LockInner();
Fetcher.Shutdown(); if (PM->GetArchives(&Fetcher, List, &Recs) == false) return false;
Failed = false; }
std::set<std::string> const disappearedPkgs = PM->GetDisappearedPackages(); if (disappearedPkgs.empty() == false) { ShowList(c1out, P_("The following package disappeared from your system as\n" "all files have been overwritten by other packages:", "The following packages disappeared from your system as\n" "all files have been overwritten by other packages:", disappearedPkgs.size()), disappearedPkgs, [](std::string const &Pkg) { return Pkg.empty() == false; }, [](std::string const &Pkg) { return Pkg; }, [](std::string const &) { return std::string(); }); cout << _("Note: This is done automatically and on purpose by dpkg.") << std::endl; }
return true;}
- 1
- APT acquires a lock using the
fcntl()
system call which is used to manipulate file descriptors. When called using the flagF_SETLK
, the call returns -1 if the lock is already held by another process. - 2
- APT supports multiple package managers but the default is the
dpkg
command. APT uses the classdebSystem
and the associatedpkgDPkgPM
to interact with thedpkg
command. - 3
- The Acquire subsystem is reused to download the archives. Internally, the code keeps for every item to retrieve two fields
FileSize
andPartialSize
, which are the size of the object to fetch and how much was already fetched. The methodsFetcher.FetchNeeded()
andFetcher.FetchPartial()
iterate over the items to determine the total values.
- APT asks for confirmation before proceeding to the installation, except if you use options like
apt -y install
. - 5
- Unlock Dpkg lock
/var/lib/dpkg/lock
to make sure thedpkg
command can use it. - 6
- The package manager reads the
/var/lib/dpkg/status
to find out the packages that disappeared because all of their files have been overwritten by other packages.
The installation logic is implemented by the class `pkgDPkgPM`:
class pkgDPkgPM : public pkgPackageManager{ protected:
// progress reporting struct DpkgState { const char *state; // the dpkg state (e.g. "unpack") const char *str; // the human readable translation of the state };
// the dpkg states that the pkg will run through, the string is // the package, the vector contains the dpkg states that the package // will go through std::map<std::string,std::vector<struct DpkgState> > PackageOps; // the dpkg states that are already done; the string is the package // the int is the state that is already done (e.g. a package that is // going to be install is already in state "half-installed") std::map<std::string,unsigned int> PackageOpsDone;
// progress reporting unsigned int PackagesDone; unsigned int PackagesTotal;
public: struct Item { enum Ops {Install, Configure, Remove, Purge, ConfigurePending, TriggersPending, RemovePending, PurgePending } Op; std::string File; PkgIterator Pkg; Item(Ops Op,PkgIterator Pkg,std::string File = "") : Op(Op), File(File), Pkg(Pkg) {}; Item() {}; }; protected: std::vector<Item> List;
virtual bool Install(PkgIterator Pkg,std::string File) override; virtual bool Configure(PkgIterator Pkg) override; virtual bool Remove(PkgIterator Pkg,bool Purge = false) override;
virtual bool Go(APT::Progress::PackageManager *progress) override;
virtual void Reset() override;
public:
explicit pkgDPkgPM(pkgDepCache *Cache); virtual ~pkgDPkgPM();
APT_HIDDEN static bool ExpandPendingCalls(std::vector<Item> &List, pkgDepCache &Cache);};
1. The package manager keeps a list of actions to perform.
2. The method `Install` simply appends a new item of type `Install` to `List`.
3. The method `Go` reads the list of actions and executes them.
The only remaining code is the `dpkg` command execution:
bool pkgDPkgPM::Go(APT::Progress::PackageManager *progress){ ...
// Generate the base argument list for dpkg std::vector<const char *> Args = { "dpkg" };
// this loop runs once per dpkg operation vector<Item>::const_iterator I = List.cbegin(); while (I != List.end()) {
int fd[2]; if (pipe(fd) != 0) return _error->Errno("pipe","Failed to create IPC pipe to dpkg");
ADDARGC("--status-fd"); char status_fd_buf[20]; snprintf(status_fd_buf,sizeof(status_fd_buf),"%i", fd[1]); ADDARG(status_fd_buf); unsigned long const Op = I->Op;
switch (I->Op) { // Skip other operations
case Item::Install: ADDARGC("--unpack"); ADDARGC("--auto-deconfigure"); break; }
// Write the file or package name if (I->Op == Item::Install) { if (I->File[0] != '/') return _error->Error("Internal Error, " "Pathname to install is not absolute '%s'", I->File.c_str()); Args.push_back(I->File.c_str()); }
pid_t Child = ExecFork(fd[1]); if (Child == 0) { // This is the child close(fd[0]); // close the read end of the pipe
debSystem::DpkgChrootDirectory();
if (chdir(_config->FindDir("DPkg::Run-Directory","/").c_str()) != 0) _exit(100);
execvp(Args[0], (char**) &Args[0]); cerr << "Could not exec dpkg!" << endl; _exit(100); }
// we read from dpkg here int const _dpkgin = fd[0]; close(fd[1]); // close the write end of the pipe
// the result of the waitpid call int Status = 0; int res; bool waitpid_failure = false; bool dpkg_finished = false; do { if (dpkg_finished == false) { if ((res = waitpid(Child, &Status, WNOHANG)) == Child) dpkg_finished = true; else if (res < 0) { // error handling, waitpid returned -1 if (errno == EINTR) continue; waitpid_failure = true; break; } } if (dpkg_finished) break;
} while (true);
if (waitpid_failure == true) { strprintf(d->dpkg_error, "Sub-process %s couldn't be waited for.", Args[0]); _error->Error("%s", d->dpkg_error.c_str()); break; }
... }}
1. The code is a classic example of Linux systems programming: it uses the system calls `fork()`, `exec()`, and `wait()` to delegate the work to the `dpkg` command (a minimal Go sketch of the same pattern follows).
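Here is a minimal Go sketch of that pattern: create a pipe, spawn `dpkg` as a child process, and read its machine-readable progress through the descriptor passed with `--status-fd`. It is a rough analogue of the C++ code above, not a faithful port; the archive path comes from the command line.

```go
// A rough Go analogue of pkgDPkgPM::Go(): delegate the installation to dpkg
// and read its progress on a dedicated file descriptor.
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: unpack <archive.deb>")
		os.Exit(1)
	}

	// The pipe plays the role of the fd[2] pair created before ExecFork().
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}

	// ExtraFiles[0] becomes file descriptor 3 in the child process,
	// which is why we pass "--status-fd 3" to dpkg.
	cmd := exec.Command("dpkg", "--status-fd", "3",
		"--unpack", "--auto-deconfigure", os.Args[1])
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	cmd.ExtraFiles = []*os.File{w}

	if err := cmd.Start(); err != nil {
		fmt.Println("Could not exec dpkg!", err)
		os.Exit(100)
	}
	w.Close() // the parent only reads; the child keeps its own copy open

	// dpkg reports progress lines such as "status: <package>: unpacked".
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		fmt.Println("dpkg reported:", scanner.Text())
	}

	// Equivalent of the waitpid() loop: collect the exit status.
	if err := cmd.Wait(); err != nil {
		fmt.Println("dpkg failed:", err)
		os.Exit(1)
	}
}
```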
After the `dpkg` command has run, the APT cache still has to be updated, since the state of some packages has changed. There is nothing really new there, and we can stop our inspection of the APT code.
Case Study
Like for the other parts, we will write a minimal version of the command `apt install` in Go. We will not bother with a cache and will simply read the Debian repositories every time.
To test our program, we need a basic package so that we can focus on the core logic of the APT installation process without having to support advanced features. We will use a new version of our `hello` package (the code is available in the companion GitHub repository):
vagrant# tree /vagrant/hello/3.1-1/
3.1-1/
|-- DEBIAN
|   `-- control
`-- usr
    `-- bin
        `-- hello
vagrant# cat /vagrant/hello/3.1-1/DEBIAN/control
Package: hello
Version: 3.1-1
Section: base
Priority: optional
Architecture: amd64
Maintainer: Julien Sobczak
Description: Say Hello
Depends: cowsay
vagrant# cat /vagrant/hello/3.1-1/usr/bin/hello
#!/bin/bash
echo "hello world" | /usr/games/cowsay
1. Declare a required dependency that is available in the standard Debian repositories (the `Depends` field supports a richer syntax; see the examples after this list).
2. Use the binary installed by this dependency.
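As a side note, the `Depends` field accepts a richer syntax than the single dependency used here. A few illustrative forms (hypothetical examples, not taken from the hello package) that a parser has to deal with:

```
Depends: cowsay
Depends: libc6 (>= 2.15), adduser
Depends: gpgv | gpgv2 | gpgv1
```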
To build the new package:
vagrant# dpkg --build 3.1-1 hello_3.1-1_amd64.deb
1. We use the `dpkg` command, but we could also have used the Go version we created in the first part.
Example of installation using APT:
vagrant# apt install /vagrant/hello/hello_3.1-1_amd64.deb
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  cowsay
Suggested packages:
  filters cowsay-off
The following NEW packages will be installed:
  cowsay hello
0 upgraded, 2 newly installed, 0 to remove and 11 not upgraded.
After this operation, 94.2 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 /vagrant/hello/hello_3.1-1_amd64.deb hello amd64 3.1-1 [20.7 kB]
Get:2 http://deb.debian.org/debian bullseye/main amd64 cowsay all 3.03+dfsg2-8 [21.4 kB]
Fetched 21.4 kB in 0s (66.6 kB/s)
Selecting previously unselected package cowsay.
(Reading database ... 34384 files and directories currently installed.)
Preparing to unpack .../cowsay_3.03+dfsg2-8_all.deb ...
Unpacking cowsay (3.03+dfsg2-8) ...
Selecting previously unselected package hello.
Preparing to unpack .../hello/hello_3.1-1_amd64.deb ...
preinst says hello
Unpacking hello (3.1-1) ...
Setting up cowsay (3.03+dfsg2-8) ...
Setting up hello (3.1-1) ...
postinst says hello
Processing triggers for man-db (2.9.4-2) ...
vagrant# hello
 _____________
< hello world >
 -------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
The challenge is to install the same package using a basic Go program. We will reuse the `dpkg` version we wrote in Go.
Here is the code:
package main
import ( "bufio" "bytes" "crypto/md5" "crypto/sha256" "fmt" "io" "io/ioutil" "net/http" "os" "os/exec" "path/filepath" "regexp" "strings" "sync"
"github.com/julien-sobczak/deb822" "github.com/ulikunitz/xz" "golang.org/x/crypto/openpgp" "golang.org/x/crypto/openpgp/clearsign")
// The command `apt install` requires more code// than our previous implementation of the command `dpkg`.// We will introduce the different components successively.
///////////////////////////////////////////////////////////
//// The Acquire subsystem//
// Apt accepts package names and needs to retrieve their archives// from repositories, commonly using HTTP.// The pkgAcquire struct downloads the various required files// using a pool of worker to process each item to download.// Like the real implementation, this system is not a generic downloader// but contains some Apt logic.
type pkgAcquire struct { // The downloaded items are used to populate the Apt cache cacheFile *CacheFile
// The items still not finished. pendingJobs int jobs chan Item results chan error // Workers are run in goroutines and push new items. jobsMutex sync.Mutex}
// There are different types of files to retrieve from an Apt repository:// - `InRelease`: the metadata about the repository.// - `Packages`: the list of packages present in the repository.// - `.deb` files: the archives to install using `dpkg`.// Each item is accessible from a URI, must be stored locally, and requires// some post-processing like checking the integrity of the files to prevent// MITM attacks.
type Item interface { // DownloadURI returns the URI to retrieve the item. DownloadURI() string
// DestFile returns the path where the file // represented by the URI must be written. DestFile(uri string) string
// Done is called when the file has been downloaded. // This function updates the cache with the retrieved item // and can trigger new downloads. Done(c *CacheFile, a *pkgAcquire) error}
// We will detail each type after the implementation of pkgAcquire.
// NewPkgAcquire initializes the Acquire system.func NewPkgAcquire(c *CacheFile) *pkgAcquire { a := &pkgAcquire{ cacheFile: c, pendingJobs: 0, jobs: make(chan Item, 1000), results: make(chan error, 1000), }
// Start the workers responsible to process the items in `jobs`. for w := 1; w <= 2; w++ { go a.worker(w, a.jobs, a.results) }
return a}
// Add is used to request the downloading of a new item.// New items are simply sent to the `jobs` channel.func (a *pkgAcquire) Add(item Item) { // The function is called from different goroutines. // We use a lock to prevent data inconsistencies. a.jobsMutex.Lock() a.jobs <- item a.pendingJobs++ a.jobsMutex.Unlock()}
// A worker simply reads from the `jobs` channel and uses the different// methods defined by `Item` to know what to do.
func (a *pkgAcquire) worker(id int, jobs <-chan Item, results chan<- error) { for item := range jobs { results <- a.downloadItem(item) }}
func (a *pkgAcquire) downloadItem(item Item) error { uri := item.DownloadURI() dest := item.DestFile(uri)
// Download the file resp, err := http.Get(uri) if err != nil { fmt.Printf("Err: %v\n\t%s\n", item, err) return err } defer resp.Body.Close()
// Create the local file os.MkdirAll(filepath.Dir(dest), 0755)
out, err := os.Create(dest) if err != nil { return err } defer out.Close()
// Copy the body to the local file io.Copy(out, resp.Body)
fmt.Printf("Get: %v\n", item)
return item.Done(a.cacheFile, a)}
// There is one remaining method to cover.// The Acquire system will try to download items in parallel// but the code often needs to block until all items have been downloaded// to continue. The next function is used to wait.
/** * Run downloads all items that have been added to this * download process. * * This method will block until the download completes. */func (a *pkgAcquire) Run() error { var errors []string var err error
for { // Exit when there are no more remaining jobs a.jobsMutex.Lock() if a.pendingJobs == 0 { a.jobsMutex.Unlock() break } a.jobsMutex.Unlock()
// Search for errors in the results err = <-a.results if err != nil { errors = append(errors, err.Error()) }
a.jobsMutex.Lock() a.pendingJobs-- a.jobsMutex.Unlock() }
if len(errors) > 0 { return fmt.Errorf(strings.Join(errors, "\n")) }
return nil}
// That's all for the Acquire system. What remains is the implementation// of the various types of Item.
///////////////////////////////////////////////////////////
/* * The first kind of `Item` we have to download are `InRelease` files. * These files contain metadata about other index files (ex: `Packages`) * present in the same repository and are used to check the integrity * of these files. */
type MetaIndexItem struct { // InRelease/Release files // The Debian source pointing to this repository. // The source contains fields required to determine the target URI. source *pkgSource}
func NewMetaIndexItem(source *pkgSource) *MetaIndexItem { return &MetaIndexItem{ source: source, }}
func (i *MetaIndexItem) DownloadURI() string { // Ex: http://deb.debian.org/debian/dists/buster/InRelease return i.source.URI + "/dists/" + i.source.Dist + "/InRelease"}
func (i *MetaIndexItem) DestFile(uri string) string { // Ex: /var/lib/apt/lists/deb.debian.org_debian_dists_buster_InRelease s := i.source return "/var/lib/apt/lists/" + fmt.Sprintf("%s.%s_InRelease", s.EscapedURI(), s.Dist)}
func (i *MetaIndexItem) Done(c *CacheFile, acq *pkgAcquire) error { s := i.source
filePath := i.DestFile(s.URI)
// 1. Check the file integrity
// Apt loads all GPG keys under /etc/apt/trusted.gpg.d/. // Here, for simplicity, we load only the single key we really need: // /etc/apt/trusted.gpg.d/debian-archive-buster-stable.gpg publicKey := fmt.Sprintf( "/etc/apt/trusted.gpg.d/debian-archive-%s-stable.gpg", s.Dist) decodedContent, err := gpgDecode(filePath, publicKey) if err != nil { return fmt.Errorf("the following signature couldn't be verified %s\n%v", filePath, err) }
// 2. Parse the content to extract metadata like the checksums // for other files to download parser, err := deb822.NewParser(strings.NewReader(string(decodedContent))) if err != nil { return fmt.Errorf("malformed Release file: %v", err) } doc, err := parser.Parse() if err != nil { return fmt.Errorf("malformed Release file: %v", err) }
// Extract values s.doc = doc.Paragraphs[0] s.Codename = s.doc.Value("Codename") // Ex: buster s.Suite = s.doc.Value("Suite") // Ex: stable s.Origin = s.doc.Value("Origin") // Ex: Debian s.Label = s.doc.Value("Label") // Ex: Debian s.Entries = make(map[string]string) for _, entry := range strings.Split(s.doc.Value("MD5Sum"), "\n") { // Ex: 0233ae8f041ca0f1aa5a7f395d326e80 57365 contrib/Contents-all.gz fields := regexp.MustCompile(`\s+`).Split(entry, -1) relativePath := strings.TrimSpace(fields[2]) md5sum := fields[0] s.Entries[relativePath] = md5sum }
// 3. Download the `Packages` files acq.Add(NewIndexItem(s, "main", "amd64")) // The real code download other Packages files in addition // like the ones for the `contrib` and `non-free` components.
return nil}func (i MetaIndexItem) String() string { // Ex: https://packages.grafana.com/oss/deb stable InRelease return fmt.Sprintf("%s stable InRelease", i.source.URI)}
///////////////////////////////////////////////////////////
/* * The second kind of Item we have to download are index files * (Packages and Sources files). * In this implementation, we are ignoring Sources index files. * Packages index files list the Debian control files (DEBIAN/control) * with a few additional fields for every .deb package available. */
type IndexItem struct { // `Packages`/`Sources` files source *pkgSource component string // Ex: main, free or non-free architecture string // Ex: amd64
}
func NewIndexItem(source *pkgSource, component string, architecture string) *IndexItem { return &IndexItem{ source: source, component: component, architecture: architecture, }}
func (i *IndexItem) DownloadURI() string { // Ex: http://deb.debian.org/debian/dists/buster/main/binary-all/Packages.xz return i.source.URI + "/dists/" + i.source.Dist + "/" + i.component + "/binary-" + i.architecture + "/Packages.xz"}
func (i *IndexItem) DestFile(uri string) string { // Ex: /var/lib/apt/lists/ // deb.debian.org_debian_dists_buster_main_binary-amd64_Packages.xz s := i.source return "/var/lib/apt/lists/" + fmt.Sprintf("%s.%s_%s_binary-%s_Packages.xz", s.EscapedURI(), s.Dist, i.component, i.architecture)}
func (i *IndexItem) Done(c *CacheFile, a *pkgAcquire) error { s := i.source path := i.DestFile(s.URI)
// 1. Read the file file, err := os.Open(path) if err != nil { return fmt.Errorf("missing file: %v", err) } defer file.Close()
b, err := ioutil.ReadAll(file) if err != nil { return fmt.Errorf("unable to open file %s: %v", path, err) }
// 2. Check integrity hash := md5.New() if _, err := io.Copy(hash, bytes.NewReader(b)); err != nil { return fmt.Errorf("unable to determine MD5 sum: %s", err) } md5sum := fmt.Sprintf("%x", hash.Sum(nil)) md5sumRef := s.Entries[i.EntryName()] if md5sum != md5sumRef { return fmt.Errorf("found MD5 mismatch: %v != %v", md5sum, md5sumRef) }
// 3. Extract content r, err := xz.NewReader(bytes.NewReader(b)) if err != nil { return fmt.Errorf("unable to open xz file: %v", err) } content, err := io.ReadAll(r) if err != nil { return fmt.Errorf("unable to read index file content: %v", err) }
// 4. Parse content parser, err := deb822.NewParser(strings.NewReader(string(content))) if err != nil { return fmt.Errorf("malformed index file: %v", err) } doc, err := parser.Parse() if err != nil { return fmt.Errorf("malformed index file: %v", err) }
// 5. Add the package into the Apt cache. for _, paragraph := range doc.Paragraphs { c.AddPackage(&Package{ doc: paragraph, source: s, }) }
return nil}
// EntryName returns the key in MD5Sum for this file in the Release file.func (i IndexItem) EntryName() string { // Ex: main/binary-amd64/Packages.xz return fmt.Sprintf("%s/binary-%s/Packages.xz", i.component, i.architecture)}
func (i IndexItem) String() string { // Ex: https://packages.grafana.com/oss/deb stable/main amd64 Packages return fmt.Sprintf("%s stable/%s %s Packages", i.source.URI, i.component, i.architecture)}
///////////////////////////////////////////////////////////
/* * The last kind of Item we have to download are .deb archives that will * be passed to the dpkg command to proceed to the installation. * These files are downloaded under /var/cache/apt/archives/. */
type PackageItem struct { // `.deb` files // The package metadata associated with the archive to download. pkg *Package}
func NewPackageItem(pkg *Package) *PackageItem { return &PackageItem{ pkg: pkg, }}
func (i *PackageItem) DownloadURI() string { // Ex: http://deb.debian.org/debian/pool/main/r/rsync/rsync_3.2.3_amd64.deb return i.pkg.source.URI + "/" + i.pkg.doc.Value("Filename")}
func (i *PackageItem) DestFile(uri string) string { // Ex: /var/cache/apt/archives/rsync_3.2.3-4_amd64.deb pkg := i.pkg pkg.cacheFilepath = "/var/cache/apt/archives/" + filepath.Base(uri) return pkg.cacheFilepath}
func (i *PackageItem) Done(c *CacheFile, a *pkgAcquire) error { // 1. Check file integrity f, err := os.Open(i.pkg.cacheFilepath) if err != nil { return err } defer f.Close()
h := sha256.New() if _, err := io.Copy(h, f); err != nil { return err }
indexChecksum := i.pkg.doc.Value("SHA256") effectiveChecksum := fmt.Sprintf("%x", h.Sum(nil))
if indexChecksum != effectiveChecksum { return fmt.Errorf("invalid checksum for %s", i.pkg.cacheFilepath) }
// 2. Nothing more to do. // The archive will be processed later when delegating to the `dpkg` command.
return nil}
func (i PackageItem) String() string { // Ex: https://grafana.com/oss/deb stable/main amd64 grafana amd64 7.5.5 pkg := i.pkg return fmt.Sprintf("%s stable/main %s %s %s", pkg.source.URI, pkg.Name(), pkg.Architecture(), pkg.Version())}
///////////////////////////////////////////////////////////
//// The Apt Cache//
// We try to use the same naming as the real implementation,// using similar structs that contain only the main fields.
// CacheFile is the high-level component for the Apt cache.type CacheFile struct { cache *pkgCache depCache *pkgDepCache sources []*pkgSource}
// pkgCache contains all known packages// (found in Dpkg database and in repositories)type pkgCache struct { packages map[string]*Package // The key is the package name}
// pkgDepCache contains the state information for every package// (installed, to install, upgradable, ...).type pkgDepCache struct { states map[string]*StateCache // The ordered list of packages waiting to be installed. order []string}
// pkgSource represents a single line in a source.list file.type pkgSource struct { doc deb822.Paragraph // `Release` file content
// parsed from the sources.list file Type string URI string Dist string
// parsed from the Packages file Codename string Suite string Origin string Label string Entries map[string]string // Checksums of all repository files}
// EscapedURI returns a name based on the URI that can be used in filename.// Indeed, most retrieved files are stored under /var/lib/apt/// and are named after their source.func (s *pkgSource) EscapedURI() string { return strings.ReplaceAll(strings.TrimPrefix(s.URI, "http://"), "/", "_")}
// The core of the Apt cache is the list of packages.
// Package is a Debian package.type Package struct { // The metadata as present in `Packages` or `status` file doc deb822.Paragraph // The source where this package is coming from. // Can be undefined for already installed packages. source *pkgSource
// The path under /var/cache/apt/packages. // Initialized after the download of the package. cacheFilepath string}
// We expose a few additional methods to extract attributes// from the underlying DEB822 document.
func (p *Package) Name() string { return p.doc.Value("Package")}
func (p *Package) Version() string { return p.doc.Value("Version")}
func (p *Package) Architecture() string { return p.doc.Value("Architecture")}
func (p *Package) Depends() []Dependency { return ParseDependencies(p.doc.Value("Depends"))}
func (p *Package) Suggests() []Dependency { return ParseDependencies(p.doc.Value("Suggests"))}
type Dependency struct { Name string Version string Relation string}
func ParseDependencies(values string) []Dependency { // Ex: "adduser, gpgv | gpgv2 | gpgv1, libapt-pkg5.0 (>= 1.7.0~alpha3~)" depsValues := strings.TrimSpace(values) if depsValues == "" { return nil }
var deps []Dependency for _, value := range strings.Split(depsValues, ", ") { deps = append(deps, ParseDependency(value)) } return deps}
func ParseDependency(value string) Dependency { // Example of syntax: // "adduser", "gpgv | gpgv2", "libc6 (>= 2.15)", // "python3:any (>= 3.5~)", "foo [i386]", "perl:any", "perlapi-5.28.0"
var dep Dependency
r := regexp.MustCompile(`^(?P<name>[\w\.-]+)(?:[:]\w+)?` + `(?: [(](?P<relation>(?:>>|>=|=|<=|<<)) ` + `(?P<version>\S+)[)])?(?: [|].*)?$`) res := r.FindStringSubmatch(value) names := r.SubexpNames() for i, _ := range res { switch names[i] { case "name": dep.Name = res[i] case "relation": dep.Relation = res[i] case "version": dep.Version = res[i] } } return dep}
// That's all for the different structures relating to the Apt cache.
///////////////////////////////////////////////////////////
// Now, we need to initialize the three main components.// The first step is thus to create the array containing all known packages.// This array will be populated in the successive steps.
func (c *CacheFile) BuildCaches() { c.cache = &pkgCache{ packages: make(map[string]*Package), }}
// The second step is to read the lists of sources to find the `Packages` files// containing the list of available packages.// So, we need a function to parse these local source files.
// ParseSourceFile parses a single source file.// It only supports the common multi-line format,// and not the most recent DEB822 format.func ParseSourceFile(content string) []*pkgSource { var results []*pkgSource
scanner := bufio.NewScanner(strings.NewReader(content)) // Read line by line for scanner.Scan() { line := scanner.Text() if strings.TrimSpace(line) == "" { // Ignore blank lines continue } if strings.HasPrefix(line, "#") { // Ignore comments continue } parts := strings.Split(line, " ") // Basic parser (ignore some options or unused attributes) source := &pkgSource{ Type: parts[0], URI: parts[1], Dist: parts[2], } results = append(results, source) }
return results}
// BuildSourceList parses every source file.func (c *CacheFile) BuildSourceList() { var sources []*pkgSource
// Read /etc/apt/sources.list mainPath := "/etc/apt/sources.list" if _, err := os.Stat(mainPath); !os.IsNotExist(err) { content, err := ioutil.ReadFile(mainPath) if err != nil { fmt.Printf("E: Unable to read source file\n\t%s\n", err) os.Exit(1) } sources = append(sources, ParseSourceFile(string(content))...) }
// Read /etc/apt/sources.list.d/ dirPath := "/etc/apt/sources.list.d/" if _, err := os.Stat(dirPath); !os.IsNotExist(err) { files, err := ioutil.ReadDir(dirPath) if err != nil { fmt.Printf("E: Unable to read source dir\n\t%s\n", err) os.Exit(1) } for _, file := range files { filePath := filepath.Join(dirPath, file.Name()) content, err := ioutil.ReadFile(filePath) if err != nil { fmt.Printf("E: Unable to read source file\n\t%s\n", err) os.Exit(1) } sources = append(sources, ParseSourceFile(string(content))...) } } c.sources = sources}
// The last step is to read the Dpkg database// to determine the packages already installed.// Therefore, we need a function to parse the status file.
func ParseStatus() (*deb822.Document, error) { f, err := os.Open("/var/lib/dpkg/status") if err != nil { return nil, err } parser, err := deb822.NewParser(f) if err != nil { return nil, err } statusContent, err := parser.Parse() if err != nil { return nil, err } return &statusContent, nil}
func (c *CacheFile) BuildDepCache() { states := make(map[string]*StateCache)
// Read /var/lib/dpkg/status status, err := ParseStatus() if err != nil { fmt.Printf("E: The package lists or status file could not be parsed.") os.Exit(1) }
// Add state for packages already installed for _, pkg := range status.Paragraphs { // The status file also contains packages // that were partially installed or removed. if !strings.Contains(pkg.Value("Status"), "installed") { continue } state, ok := states[pkg.Value("Package")] if !ok { state = &StateCache{} states[pkg.Value("Package")] = state } state.CurrentVersion = pkg.Value("Version") }
c.depCache = &pkgDepCache{ states: states, }}
///////////////////////////////////////////////////////////
// We now have the three functions required to initialize the Apt cache.// We will hide them behind a simple method.
func (c *CacheFile) Open() { // Initialize the Acquire system to download file from repositories acq := NewPkgAcquire(c)
// Initialize the cache structure if c.sources == nil { c.BuildCaches() c.BuildSourceList() c.BuildDepCache() }
// Download items from repositories for _, source := range c.sources { if source.Type == "deb-src" { continue // We are interested only in binary packages } acq.Add(NewMetaIndexItem(source)) }
// Wait for all items to be downloaded to return err := acq.Run() if err != nil { fmt.Printf("E: Unable to fetch resources\n\t%s\n", err) os.Exit(1) }}
// As we have implemented before, the cache content is populated// from the `Done()` methods of the different types of `Item`.// We need to expose additional methods to easily add or retrieve// these packages and their state.
func (c *CacheFile) AddPackage(p *Package) { c.cache.packages[p.Name()] = p}
func (c *CacheFile) GetPackage(name string) *Package { if p, ok := c.cache.packages[name]; ok { return p } return nil}
func (c *CacheFile) GetPackages() []*Package { values := make([]*Package, 0, len(c.cache.packages)) for _, v := range c.cache.packages { values = append(values, v) } return values}
func (c *CacheFile) GetState(pkg *Package) *StateCache { var state *StateCache state, ok := c.depCache.states[pkg.Name()] if !ok { // Only the state of installed packages is present. // We defer the initialization for other packages until // the first access. state = &StateCache{ CandidateVersion: pkg.Version(), flagInstall: false, } c.depCache.states[pkg.Name()] = state } return state}
///////////////////////////////////////////////////////////
// We are almost done with the Apt cache.// We have discussed several times about the state we keep about each package// without explaining what it means.
type StateCache struct { // The version that can be installed determined using sources. CandidateVersion string // The version currently installed determined using the Dpkg database. CurrentVersion string // A flag to determine if the package is marked for installation. flagInstall bool}
func (s *StateCache) Upgradable() bool { return s.CurrentVersion != "" && s.CandidateVersion != "" && s.CurrentVersion != s.CandidateVersion}
func (s *StateCache) Install() bool { return s.flagInstall}
func (s *StateCache) Installed() bool { return s.CurrentVersion != ""}
// When installing a package, we must make sure its dependencies// are already installed or we need to install them first.// The logic is rather complicated as many things can go wrong// with dependency management like conflicts between two packages.// For this article, we will use a very basic approach.// We ignore versions completely and install each missing dependency// without checking if it breaks other packages. This is another// reason why you must not run this code on your host directly :).
func (c *CacheFile) MarkForInstallation(pkgName string) { pkg := c.GetPackage(pkgName) if pkg == nil { fmt.Printf("E: Unable to locate package %s\n", pkgName) os.Exit(1) }
state := c.GetState(pkg) if state.Installed() || state.Install() { // Already installed or marked for installation return }
// Make sure to mark the package before checking its dependencies // to prevent infinite cycles state.CandidateVersion = pkg.Version() state.flagInstall = true
// Mark dependencies recursively for _, dep := range pkg.Depends() { c.MarkForInstallation(dep.Name) }
// Add dependencies first in the installation sequence order c.depCache.order = append(c.depCache.order, pkgName)}
// We end this section with a utility method to report the total// number of packages that will be installed.// This number commonly differs from the number of requested packages,// as dependencies must be installed too. We will use this method to notify// the user that more packages will be installed than the ones// passed as arguments.
func (c *CacheFile) InstCount() int { count := 0 for _, state := range c.depCache.states { if state.Install() { count++ } } return count}
///////////////////////////////////////////////////////////
//// Main//
// We have everything we need to implement the command `apt install`.// We will integrate everything we have covered so far.
func main() { var pkgNames []string // The command `apt install` can be called without any package to install. if len(os.Args) > 1 { pkgNames = append(pkgNames, os.Args[1:]...) }
// Load the Cache cache := &CacheFile{} cache.Open()
// Search for the packages to install pkgs := make(map[string]*Package) for _, pkgName := range pkgNames { // The command `apt install` also supports `.deb` file. // We ignore this for simplicity to avoid // duplicating code from the previous parts of this blog post. // Check https://github.com/julien-sobczak/linux-packages-under-the-hood // for a more complete implementation.
cache.MarkForInstallation(pkgName) pkgs[pkgName] = cache.GetPackage(pkgName) }
// Print out the list of additional packages to install if cache.InstCount() != len(pkgNames) { var extras []string for _, pkg := range cache.GetPackages() { state := cache.GetState(pkg) if !state.Install() { continue } if _, ok := pkgs[pkg.Name()]; !ok { extras = append(extras, pkg.Name()) } } fmt.Printf( "The following additional packages will be installed:\n\t%s\n", strings.Join(extras, " ")) }
// Print out the list of suggested packages var suggests []string for _, pkg := range cache.GetPackages() { state := cache.GetState(pkg)
// Just look at the ones we want to install if !state.Install() { continue }
// Get the suggestions for the candidate version for _, dependency := range pkg.Suggests() { suggests = append(suggests, dependency.Name) } } if len(suggests) > 0 { fmt.Printf("Suggested packages:\n\t%s\n", strings.Join(suggests, " ")) }
err := InstallPackages(cache) if err != nil { fmt.Printf("E: %s\n", err) os.Exit(1) }}
func InstallPackages(cache *CacheFile) error { acq := NewPkgAcquire(cache)
// 1. Download package archives for _, pkgName := range cache.depCache.order { pkg := cache.GetPackage(pkgName) acq.Add(NewPackageItem(pkg)) } err := acq.Run() if err != nil { return err }
// 2. Run the command `dpkg -i` to install them var archives []string for _, pkgName := range cache.depCache.order { pkg := cache.GetPackage(pkgName) archives = append(archives, pkg.cacheFilepath) }
// We delegate to the dpkg command to avoid repeating the previous code // but the complete source code in this repository reuses the same code. // Check https://github.com/julien-sobczak/linux-packages-under-the-hood args := append([]string{"-i"}, archives...) out, err := exec.Command("dpkg", args...).Output() if err != nil { return err } fmt.Print(string(out))
return nil}
///////////////////////////////////////////////////////////
// Helpers
// gpgDecode checks the GPG signature of a clearsigned document and// returns the content.func gpgDecode(filename string, publicKey string) ([]byte, error) { // Open gpg clearsigned document r, err := os.Open(filename) if err != nil { return nil, fmt.Errorf("error opening clearsigned document: %s", err) } defer r.Close()
// Read the content data, err := ioutil.ReadAll(r) if err != nil { return nil, err }
// Decode the content b, _ := clearsign.Decode(data) if b == nil { return nil, fmt.Errorf("not PGP signed") }
// Open the public key to validate the signature rk, err := os.Open(publicKey) if err != nil { return nil, fmt.Errorf("error opening public key: %s", err) } defer rk.Close() keyring, err := openpgp.ReadKeyRing(rk) // binary if err != nil { return nil, fmt.Errorf("failed to parse public key: %v", err) }
// Check the signature using the public key _, err = openpgp.CheckDetachedSignature(keyring, bytes.NewBuffer(b.Bytes), b.ArmoredSignature.Body) if err != nil { return nil, err }
return b.Plaintext, nil}
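Assuming the program is saved in a directory of its own inside the VM, it can be built and run like any Go program (a hypothetical invocation; file and module names are up to you):

```
vagrant# go mod tidy   # fetch the deb822, xz, and openpgp dependencies
vagrant# go build -o apt-go .
vagrant# ./apt-go hello
```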
🎉 We are done with the command `apt`, and with this article! We created a Debian archive using a basic Go program, and we installed the package using our Go versions of `dpkg` and `apt`.
“One” Last Word
Linux packages are just archives containing files to extract onto another system. The problem sounds trivial, but the devil is in the details.
In this article, we have glimpsed some of the challenges that a package manager must address. Packages depend on other packages, which means the package manager must face one of the most difficult problems in computing: dependency management. Despite that, Dpkg and Apt remain approachable programs.
We wrote basic versions from scratch using only a few hundred lines of Go code. The biggest difference is that the commands `dpkg` and `apt` are interactive and try hard to avoid relying on the user to fix problems, which explains why the two programs together represent approximately 100,000 lines of C and C++ code.
If you are managing a large pool of servers, like a datacenter, reimplementing your own package manager can be worthwhile. For example, you could centralize all the local databases to ensure that all machines share the same state, or take corrective actions like excluding a server from the pool when an upgrade ends in a bad state. Google provides a great example: they decided to implement their own package management system. “Any package change is guaranteed to succeed, or the machine is rolled back completely to the previous state. If the rollback fails, the machine is sent through our repairs process for reinstallation and potential hardware replacement. This approach allows us to eliminate much of the complexity of the package states.”1 The decision was surely not an easy one, but the benefits are obvious.
Implementing a package manager from scratch can be intimidating, but as we have seen in this article, the reality is not so bad, especially since many of the features Apt supports are not useful when managing a large number of homogeneous machines in an automated way.
Footnotes
1. Building Secure and Reliable Systems, O’Reilly, Chapter 9 - Design for Recovery, Footnote 18 ↩