This article looks at the mount namespace and is the third in the Linux Namespace series. In the first article, I introduced the seven most commonly used namespaces, laying the groundwork for the hands-on work that started in the user namespaces article. My goal is to build up some fundamental knowledge of how the underpinnings of Linux containers work. If you're interested in how Linux controls the resources on a system, check out the CGroup series I wrote earlier. Hopefully, by the time you're done with the namespaces hands-on work, I can tie CGroups and namespaces together in a meaningful way, completing the picture for you.
For now, however, this article examines the mount namespace and how it can help you get closer to understanding the isolation that Linux containers bring to sysadmins and, by extension, platforms like OpenShift and Kubernetes.
The mount namespace
The mount namespace doesn't behave as you might expect after creating a new user namespace. By default, if you were to create a new mount namespace with unshare -m, your view of the system would remain largely unchanged and unconfined. That's because whenever you create a new mount namespace, a copy of the mount points from the parent namespace is created in the new mount namespace. That means that any action taken on files inside a poorly configured mount namespace will impact the host.
Some setup steps for mount namespaces
So what use is the mount namespace then? To help demonstrate this, I use an Alpine Linux tarball.
In summary, download it, untar it, and move it into a new directory, giving the top-level directory permissions for an unprivileged user:
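A minimal sketch of these steps follows. It assumes the tarball has already been downloaded and is gzip-compressed; the function name prepare_rootfs and its three arguments are hypothetical, chosen for illustration only.

```shell
#!/bin/sh
# Sketch: unpack a root filesystem tarball into a directory and hand the
# whole tree to an unprivileged user. Run as root when the owner is
# another account. The function name and arguments are illustrative.
prepare_rootfs() {
    tarball=$1   # e.g. an Alpine minirootfs tarball downloaded beforehand
    dest=$2      # e.g. a directory named fakeroot
    owner=$3     # e.g. container-user
    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" || return 1
    chown -R "$owner" "$dest"
}

# Only act when the script is called with all three arguments.
if [ "$#" -eq 3 ]; then
    prepare_rootfs "$@"
fi
```

Saved as a script, it could be called with the tarball path, the target directory, and the account name, in that order.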
The fakeroot directory needs to be owned by the user container-user because once you create a new user namespace, the root user in the new namespace will be mapped to the container-user outside of the namespace. This means that a process inside of the new namespace will think that it has the capabilities required to modify its files. Without the ownership change, however, the host's file system permissions would still prevent the container-user account from changing the Alpine files from the tarball (which have root as the owner).
So what happens if you simply start a new mount namespace?
Now that you're inside the new namespace, you might not expect to see any of the original mount points from the host. However, this isn't the case:
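One way to check this is to compare what the calling shell and a new namespace each see. The sketch below assumes unshare from util-linux and a kernel with unprivileged user namespaces enabled; the variable names are illustrative.

```shell
#!/bin/sh
# Sketch: print the mount namespace ID and the number of visible mounts,
# first from the calling shell and then from a new user+mount namespace.
probe='echo "$(readlink /proc/self/ns/mnt) $(wc -l < /proc/self/mounts) mounts"'

host_view=$(sh -c "$probe")
# Fall back to a message if the namespace cannot be created here.
ns_view=$(unshare -Urm sh -c "$probe" 2>/dev/null) || ns_view="namespace unavailable"

echo "host:      $host_view"
echo "namespace: $ns_view"
```

On a typical system the two namespace IDs differ while the mount counts are the same, since the new namespace starts with a copy of the parent's mount points.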
The reason for this is that systemd defaults to recursively sharing the mount points with all new namespaces. If you mounted a tmpfs filesystem somewhere, for example /mnt, inside the new mount namespace, can the host see it?
The host, however, doesn't see this:
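The comparison can be sketched as follows, assuming util-linux (unshare and findmnt), unprivileged user namespaces, and a host where nothing is mounted on /mnt beforehand.

```shell
#!/bin/sh
# Sketch: mount a tmpfs on /mnt inside a throwaway user+mount namespace,
# then look at /mnt again from the calling shell (the "host" side).
inside=$(unshare -Urm sh -c \
    'mount -t tmpfs tmpfs /mnt && findmnt -n -o FSTYPE -M /mnt' 2>/dev/null) \
    || inside="namespace unavailable"
# findmnt exits non-zero when /mnt is not a mount point.
outside=$(findmnt -n -o FSTYPE -M /mnt 2>/dev/null) || outside="not a mount point"

echo "inside the namespace: /mnt is ${inside}"
echo "on the host:          /mnt is ${outside}"
```

The namespace exits as soon as the inner shell finishes, so the tmpfs does not need to be cleaned up afterwards.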
So at the very least, you know that the mount namespace is functioning correctly. This is a good time to take a small detour to discuss the propagation of mount points. I'm briefly summarizing but if you are interested in a greater understanding, have a look at Michael Kerrisk's LWN article as well as the man page for the mount namespace. I don't normally rely so much on the man pages as I often find that they're not easily digestible. However, in this case, they are full of examples and in (mostly) plain English.
Theory of mountpoints
Mounts propagate by default because of a feature in the kernel called the shared subtree. This allows every mount point to have its own propagation type associated with it. This metadata determines whether new mounts under a given path are propagated to other mount points. The example given in the man page is that of an optical disk. If your optical disk automatically mounted under /cdrom, the contents would only be visible in other namespaces if the appropriate propagation type is set.
Peer groups and mount states
The kernel documentation says that a 'peer group is defined as a group of vfsmounts that propagate events to each other.' Events are things such as mounting a network share or unmounting an optical device. Why is this important, you ask? Well, when it comes to the mount namespace, peer groups are often the deciding factor as to whether or not a mount is visible and can be interacted with. A mount state determines whether a member in a peer group can receive the event. According to the same kernel documentation, there are five mount states:
- shared - A mount that belongs to a peer group. Any changes that occur will propagate through all members of the peer group.
- slave - One-way propagation. The master mount point will propagate events to a slave, but the master will not see any actions the slave takes.
- shared and slave - Indicates that the mount point has a master, but it also has its own peer group. The master will not be notified of changes to a mount point, but any peer group members downstream will.
- private - Does not receive or forward any propagation events.
- unbindable - Does not receive or forward any propagation events and cannot be bind mounted.
It's important to note that the mount point state is per mount point. This means that if you have / and /boot, for example, you'd have to separately apply the desired state to each mount point.
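The states listed above can be read back from the kernel: each line of /proc/self/mountinfo carries optional tags such as shared:N, master:N, and unbindable between the mount options and a lone "-" separator, and a mount with none of those tags is private. The sketch below, with the hypothetical function name propagation_states, prints one state per mount point.

```shell
#!/bin/sh
# Sketch: derive the propagation state of each mount from mountinfo.
# Field 5 is the mount point; optional tags start at field 7 and end
# at the "-" separator. The function name is illustrative.
propagation_states() {
    awk '
    function add(s, t) { return (s == "" ? t : s "+" t) }
    {
        state = ""
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/)         state = add(state, "shared")
            else if ($i ~ /^master:/)    state = add(state, "slave")
            else if ($i == "unbindable") state = add(state, "unbindable")
        }
        if (state == "") state = "private"
        print $5, state
    }' "${1:-/proc/self/mountinfo}"
}

# Show the first few mounts of the current namespace, if available.
if [ -r /proc/self/mountinfo ]; then
    propagation_states | head -n 5
fi
```

To change a state, mount(8) offers --make-shared, --make-slave, --make-private, and --make-unbindable, each applied to one mount point at a time (for example, mount --make-private /boot).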
In case you're wondering about containers, most container engines use private mount states when mounting a volume inside a container. Don't worry too much about this for now. I just want to provide some context. If you want to try some specific mounting scenarios, look at the man pages as the examples are quite good.
Creating our mount namespace
If you're using a programming language like Go or C, you could use the raw kernel system calls to create the appropriate environment for your new namespace(s). However, since the intent behind this is to help you understand how to interact with a container that already exists, you'll have to do some bash trickery to get your new mount namespace into the desired state.
First, create the new mount namespace as a regular user:
Once you're inside the namespace, look at the findmnt output for the mapper device, which contains the root file system (for brevity, I removed most of the mount options from the output):
There is only one mount point that has the root device mapper. This is important because one of the things you have to do is bind the mapper device into the Alpine directory:
This is because you're using a utility called pivot_root to perform a chroot-like action. pivot_root takes two arguments: new_root and old_root (sometimes referred to as put_old). pivot_root moves the root file system of the current process to the directory put_old and makes new_root the new root file system.
IMPORTANT: A note about chroot. chroot is often thought of as having extra security benefits. To some extent, this is true, as it takes a more significant amount of expertise to break free of it. A carefully constructed chroot can be very secure. However, chroot does not modify or restrict Linux capabilities, which I touched on in the previous namespace article. Nor does it limit system calls to the kernel. This means that a sufficiently skilled aggressor could potentially escape a chroot that has not been well thought through. The mount and user namespaces help to solve this problem.
If you use pivot_root without the bind mount, the command responds with:
To switch to the Alpine root filesystem, first make a directory for old_root and then pivot into the intended (Alpine) root filesystem. Since the Alpine Linux root filesystem doesn't have symlinks for /bin and /sbin, you'll have to add those to your path and then, finally, unmount the old_root:
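The sequence described above, including the earlier bind mount, can be sketched as one function. This is an illustration under assumptions: it is meant to run inside a user and mount namespace (for example, one entered with unshare -Urm), the names run and pivot_into and the ROOTFS and DRY_RUN variables are hypothetical, and by default it only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: pivot into a new root filesystem from inside a namespace.
# With DRY_RUN=1 (the default here) each command is printed, not run.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

pivot_into() {
    rootfs=$1
    run mount --bind "$rootfs" "$rootfs"   # new_root must be a mount point
    run cd "$rootfs"
    run mkdir -p old_root                  # put_old: where the old root goes
    run pivot_root . old_root
    if [ "$DRY_RUN" = 1 ]; then
        echo 'PATH=/bin:/sbin:$PATH'
    else
        PATH=/bin:/sbin:$PATH              # Alpine binaries live here
    fi
    run umount -l /old_root                # lazily detach the host root
}

# ROOTFS is a placeholder for the directory holding the Alpine files.
pivot_into "${ROOTFS:-$PWD/fakeroot}"
```

Printing first makes it easy to review the order of operations; setting DRY_RUN=0 inside the namespace runs them for real.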
You now have a nice environment where the user and mount namespaces work together to provide a layer of isolation from the host. You no longer have access to binaries on the host. Try issuing the findmnt command that you used before:
You can also look at the root filesystem or attempt to see what's mounted:
Interestingly, there is no proc filesystem mounted by default. Try to mount it:
Because proc is a special type of mount related to the PID namespace, you can't mount it even though you're in your own mount namespace. This goes back to the capability inheritance that I discussed earlier. I'll pick up this discussion in the next article when I cover the PID namespace. However, as a reminder about inheritance, have a look at the diagram below:
Start by booting up Alpine (see these instructions on how to do that)
When Alpine is up and running, do the initial setup.
# setup-alpine
Ensure the 'community' repository is enabled in /etc/apk/repositories. Edit the file using vi, and uncomment the line with community at the end.
# vi /etc/apk/repositories
Update the local copies of the repositories.
# apk update
Run the setup-xorg-base script to install the xorg base packages and to replace mdev with udev. We can also install xfce4 and the selected packages while here.
This might take a few minutes depending on your network speed.
# setup-xorg-base xfce4 xfce4-terminal lightdm-gtk-greeter xfce4-screensaver dbus-x11 sudo
Video packages
You will most likely want to install a package suitable for your video chipset and input devices. Otherwise, X will resort to the rather slow and cumbersome VESA standard driver.
To see available video driver packages run:
$ apk search xf86-video
For example, if you have a SiS video chipset, install 'xf86-video-sis'; for an Intel video chipset, install 'xf86-video-intel'.
# apk add xf86-video-sis
and / or
# apk add xf86-input-synaptics
Use xf86-video-modesetting for Qemu/KVM guests.
Use xf86-video-vmware and xf86-video-vboxvideo for VMware and VirtualBox guests, respectively.
Use xf86-video-fbdev for Hyper-V guests.
Use xf86-video-geode for Alix1D.
Input packages
To search for xf86-input driver packages run:
$ apk search xf86-input
A good choice to start with is:
# apk add xf86-input-mouse xf86-input-keyboard
kbd may help if the Numlock service is added but does not start, or if 'setleds not found' during the boot sequence:
# apk add kbd
On most systems, xorg should be able to autodetect all devices. However you can still configure xorg-server by hand by launching:
# Xorg -configure
This will result in `/root/xorg.conf.new`. You can modify this file to fit your needs.
(When finished modifying and testing the above configuration file, move it to `/etc/X11/xorg.conf` for normal usage.)
Keyboard Layout
If you use a layout different than 'us', you need to:
# apk add setxkbmap
$ setxkbmap <%a language layout from /usr/share/X11/xkb/rules/xorg.lst%>
In order to make it persistent add this section to /etc/X11/xorg.conf:
Section "InputClass"
    Identifier "Keyboard Default"
    MatchIsKeyboard "yes"
    Option "XkbLayout" "<%a language layout from /usr/share/X11/xkb/rules/xorg.lst%>"
EndSection
Another way to change the keymap when logging into X is to use ~/.xinitrc. The following example loads a British keymap; simply add this line to the beginning of the file:
setxkbmap gb &
Note that you will need the 'setxkbmap' package for this to work! In addition, if you need to create the ~/.xinitrc file, add a second line like:
exec startxfce4
Create a normal user account.
# adduser -g 'Natanael Copa' ncopa
Optionally, give that user sudo permissions in /etc/sudoers. When doing so, it is important to use the command:
visudo
This ensures that only one user is changing the file at any given time. The editor that visudo opens (vi by default) has two modes: Command mode and Insert mode. To edit the file, use the arrows to navigate to the appropriate line and enter Insert mode by pressing the 'i' key. To save and exit, enter Command mode by pressing the 'Esc' key, then ':w' + 'enter' to save, and finally ':q' + 'enter' to quit.
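As an alternative to editing the main file by hand, a rule can be generated and checked before it takes effect. The sketch below is an illustration only: the function name sudoers_entry is hypothetical, and the drop-in approach assumes the installed /etc/sudoers includes the /etc/sudoers.d directory.

```shell
#!/bin/sh
# Sketch: build a full-access sudoers rule for one user.
# Names with unexpected characters are rejected so that no extra
# sudoers syntax can end up in the generated line.
sudoers_entry() {
    case $1 in
        ''|*[!a-z0-9_-]*)
            echo "invalid user name: $1" >&2
            return 1
            ;;
    esac
    printf '%s ALL=(ALL) ALL\n' "$1"
}

# Print the rule for the example account used in this guide.
sudoers_entry ncopa
```

As root, the output could be redirected to a file such as /etc/sudoers.d/ncopa and then validated with visudo -c -f /etc/sudoers.d/ncopa, which checks the syntax without opening an editor.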
You may want to add the home directories to the lbu captures:
# lbu include /home
Depending on your setup procedure dbus probably isn't running at this point, which will lead to issues like missing icons and keyboard shortcuts.
Start dbus first.
# rc-service dbus start
You will likely also want dbus to start on boot.
# rc-update add dbus
Start lightdm and log in with your new user.
# rc-service lightdm start
Once you have verified that it actually works, you can make lightdm start up at boot:
# rc-update add lightdm
If you're missing icons on menus and bars install a theme:
# apk add faenza-icon-theme
In order to allow the user to shut down or reboot the machine, either elogind and polkit-elogind, or polkit and consolekit2, need to be installed:
# apk add elogind polkit-elogind
Or:
# apk add polkit consolekit2
For browsing of network shares within XFCE that works seamlessly with file associations, you can install gvfs-fuse and the gvfs packages for the network protocols you use. For instance, for SMB:
# apk add gvfs-fuse gvfs-smb
Presently (3.11), the OpenRC script for fuse is a separate package. However, it may be sufficient for GVfs to initiate the fuse kernel module:
# apk add fuse-openrc
Then you can manually start the fuse service (you'll need to restart any XFCE sessions already in progress -- you can log them out and log in again):
# rc-service fuse start
You can set the fuse service to start up automatically at boot:
# rc-update add fuse
To enable automatic mounting of USB drives, install these packages:
# apk add thunar-volman udisks2
Also, make sure that mounting is enabled in
Thunar>Edit>Preferences>Advanced>Volume Management>Configure>Storage>Removable Storage
The packages below are optional, depending on what USB media you intend to mount:
- ntfs-3g: NTFS support
- gvfs-mtp: media players and mobile devices that use MTP
- gvfs-gphoto2: digital cameras and mobile devices that use PTP
- gvfs-afc: Apple mobile devices
If you are unable to log in, check the display manager's log (for example, /var/log/lxdm.log if you use lxdm); there may be output there from X to indicate failed modules, etc.
If your mouse / keyboard is not responding, try to install xf86-input-evdev (the log will show it if the driver is missing). Or you can try to disable hotplug.
If your Xorg server segfaults in kvm/qemu, then add nomodeset as a boot option when booting up.
If you are unable to log in, or you see an error 'Failed to execute login command', you should check ~/.xinitrc (if you're using .xinitrc) with your preferred text editor (vi, nano, etc) and ensure that it is set to boot into xfce. To do this, the 'exec' line (usually the last line in the file) should read 'exec startxfce4'. If ~/.xinitrc does not exist, create it and add the exec line. This command will do it:
$ echo 'exec startxfce4' >> ~/.xinitrc
Compositor
If you log in to xfce once, log out, and then log in again, and your panel and windows disappear or start flickering, this is because xfce writes a default config file with the compositor enabled, but does not enable it during your first login. Clear out the ~/.config/xfce directory, and log in as 'first time' again, as the default vblank setting for the compositor is likely incorrect. Open the window manager tweaks and dconf editor (or use xfconf-query) before you log out. Turn the compositor off in the window manager tweaks ui. If you have a recent enough xfce (4.14), there is a ui in window manager tweaks to set the syncing mode, and you can try different values, such as vblank, xpresent, and glx, while turning the compositor on and off, until you find one that works. Or, from dconf editor, you can set xfwm4 /general/vblank_mode, which you will find is set to 'auto' by default, and then turn the compositor on again. This can also be accomplished from the command line using:
xfconf-query -c xfwm4 -p /general/vblank_mode -s mode
where mode is vblank, glx or xpresent.
You have to use xfconf-query from within the xfce terminal session, or at least with the xfce settings daemon started.
- Install X-Window in Alpine Linux Joachim Nilsson 2017