* Drop default NetworkManager configuration
NetworkManager will automatically connect using the global defaults.
Also Supervisor today will create a profiles once the user configures
the network explicitly.
* Create system-connection directory
* Add AArch64/ARM64 EFI boot support (for QEMU and some boards)
* Allow GRUB to load cmdline.txt-like
* Enable qcow2/vmdk disk images
Co-authored-by: Stefan Agner <stefan@agner.ch>
* updated generic_raw_uart to latest version which comes with dualcopro
support for the HmIP-RFUSB usb rf-sticks by eQ3/ELV.
* remove 99-hmip-rfusb.rules to keep a HmIP-RFUSB device free from being
occupied by the cp210x driver but use the new generic_raw_uart support
instead allowing for advanced dualcopro support for HomeMatic/BidCos-RF
and homematicIP support in parallel.
* Add systemd-journal-remote to the image
This allows to access journald's log from within Supervisor and expose
more system logs to users.
* Allow to access systemd-journal-gatewayd from Supervisor
Create a systemd-journal-gatewayd.socket service using a Unix socket and
bind mount it into the Supervisor container. This allows to query
systemd-journald from Supervisor directly.
Currently the hassos-apparmor.service wants the
hassos-supervisor.service and vice-versa. This is unnecessary and leads
to activation of hassos-supervisor.service when reload/restart
hassos-apparmor.service (Supervisor is doing that on startup).
Make hassos-apparmor.service independent and add dependency as well as
ordering from hassos-supervisor.service side.
* Avoid duplicate log entries
So far the hassos-supervisor.service starts the hassos-supervisor script
which in turn attaches to the Supervisor container. This causes stdout
and stderr to be forwarded to the service unit, which in turn logs it in
the journal.
However, Docker too logs all stdout/stderr to the journal through the
systemd-journald log driver.
Do not attach to the Supervisor container to avoid logging the
Supervisor twice.
Note that this no longer forwards signals to the container. However, the
hassos-supervisor.service uses the ExecStop= setting to make sure the
container gets gracefully stopped.
* Use image and container name as syslog identifier
By default Docker users the container id as syslog identifier. This
leads to log messages which cannot easily be attributed to a particular
container (since the container id is a random hex string).
Use the image and container name as syslog identifier.
Note that the Docker journald log driver still stores the container id
as a separate field (CONTAINER_ID), in case the particular instance need
to be tracked.
Since we start the HomeAssistant shell directly on tty the service
responsible for starting did not restart the shell on exit. Remove the
RemainAfterExit flag to make sure that the shell restarts on exit.
When using quotes currently, they are not passed to HA due to $*. This
doesn't allow to use some commands properly, e.g. snapshot restore with
a passwort with spaces:
```
ha snapshot restore c31f3c93 --password "test test"
...
time="2021-05-26T11:24:19+02:00" level=fatal msg="Error while executing rootCmd: accepts 1 arg(s), received 2"
```
Properly pass all arguments using $@ in quotes.
* Recreate Supervisor container on OS upgrade/downgrade
When the operating system gets upgraded or downgraded the Supervisor
start script might start the Supervisor slightly differently (e.g. with
RT scheduling support). If the container has already been created, a OS
upgrade or downgrade won't recreate the Supervisor container.
* Move startup script version file to /mnt/data
* Start ha-cli on tty1 instead of a getty
Instead of starting a getty start the ha-cli directly. This will show
the banner right on startup with the important information such as IP
address of the instance or the URL to reach it.
* Use default shell as root shell instead of HA CLI
Instead of using the ha-cli.sh script as login shell use the regular
shell. Amongst other things, this allows to run VS Code devcontainers
remotely via SSH or using scp. The HA CLI is still available using the
`ha` command.
* Enable systemd-time-wait-sync.service by default
Enable the systemd-time-wait-sync.service by default. This allows to use
the time-sync.target which allows to make sure services only get started
once the time is synchronized.
* Make sure time is synchronized when starting hassos-supervisor.service
Use the time-sync.target to make sure that the Supervisor gets stsarted
after the time has been synchronized.
* Set timeout for systemd-time-wait-sync.service
Don't delay startup forever in case time synchronization doesn't work.
This allows to boot the system even without Internet connection.
The latest version of OS Agent sets haos.wipe=1 as kernel argument to
trigger a device wipe. Let systemd pickup this kernel command line
argument and start haos-wipe.service.
This rather complex architecture allows to add other triggers in the
future, e.g. a button read in the boot loader.
* Add --cpu-rt-runtime to allow Docker allocate real-time CPU time (#1235)
* Enable Supervisor's CPU bandwith allocation feature (#1235)
Since we have CONFIG_RT_GROUP_SCHED enabled in the Home Assistant OS
kernel the Supervisor needs to enable CPU bandwith allocation for
Add-Ons which need real-time scheduling. Set the appropriate environment
variable.
It seems that Busybox shell (ash) cannot calculate the disk size
properly probably due to integer overflow. Use jq to calculate the last
usable LBA which seems to be able to handle large integers.
There are incident reports on the internet where poeple report that
fsck.(v)fat actually leads to problems rather file system fixes. Around
the time when Home Assistant OS added fsck.fat for the boot partition,
reports of empty boot partitions or file with weired filenames started
to appear. This could be caused by fsck.fat.
Disable fsck on the boot partition.
Use udev rules to set the CPU online. For memory, we let the kernel
bring memory online automatically. This is preferred as udev rule
processing might be delayed in a low memory situation, see:
https://lwn.net/Articles/668944/
Partition handling for disks with 4k sectors broke partition resizing
when using MBR disk label. It seems that sfdisk doesn't calculate the
last LBA for diks with MBR label. Calculate the last usable LBA ourselfs
in the MBR case.
The calculation whether to resize the partition only works with disks
with 512 byte sector size. Use values provided by sfdisk exclusively to
make sure comparing the same sector size.
Furthermore, it seems that sgdisk does not like sfdisk's backup GPT
placement:
$ sgdisk -e /dev/zram1
Warning! Secondary partition table overlaps the last partition by 250 blocks!
Today it seems sfdisk can handle GPT quite well. Use sfdisk for all
operations in hassos-expand.
The supervisor container requires the "hassio-supervisor" AppArmor
profile. Make sure our AppArmor service hassos-apparmor is a dependency
of the hassos-supervisor.service.
* Use systemd-growfs instead of resize2fs (#1106)
Since systemd 236 systemd has a built-in file system growing mechanism.
The mechanism relies on the kernels online file system resize
capabilities instead of the external resize2fs utility. Online resizing
is supposedly much faster since the kernel takes care of things.
This also makes sure that external file systems get resized which
previously have not been taken care of.
* Drop HA OS specific file system resizing
Since we have systemd-growfs in place now we can drop our file system
resizing code.
* Make sure /dev/disk/by-label/hassos-data is present after resizing
Note: systemd will retry mnt-data.mount later, so at least in theory
this shouldn't really matter. However, the journal has a lot of churn
due to that reordering.
When we write the update to the boot partiton, there is nothing which
makes sure that data is written to disk. This leaves a rather large
window (probably around 30s) where a machine reset/poweroff can lead
to a corrupted boot partition. Use the sync mount option to minimize the
corruption window.
Note that sync is not ideal for flash drives normally. But since we
write very little and typically only on OS update to the boot partition,
this shouldn't be a problem.
* Avoid waiting for external drive unnecessarily
Even though the condition to start hassos-data.service is not met (the
file /mnt/overlay/data-move is not there by default), it seems that
systemd waits for the dependencies for hassos-data.service. Don't
Require or Wants any dependencies which might not be present by
default.
* Use systemd to wait for partition using partlabel device
* Use sfdisk which allows to wipe filesystem signatures
Even though we zap the partition table using sgdisk, the file system
superblock (which contains the file system label) does survive. This
can cause problems when trying to reuse a disk previously already
labeled using hassos-data: It might take precendence on next boot
over the existing data partition on the eMMC.
Make sure to clean all file system signatures using sfdisk.
* Make the datactl command more robust
Validate target disk (partition) size to avoid a copy attempt which will
fail. If e2image operation fails, make sure the leftover copy is not
regonized as data partition.
* Fix hassos-data service device unit dependencies
In case the data partition is missing avoid using the Docker command.
The Docker command triggers a socket activation, which in turn makes
systemd wait for the data partition. This blocks entry into the shell
forever.
Just enter the shell in case data partition is not mounted.
* Rewrite datactl command
Prepare the target partition as part of the datactl command. Rely on
partlabel for the target disk since we are always using GPT on the
target disk. Use systemd and partlabel mechanism to wait and find
the target data disk. Keep using the file system label to identify
the source disk.
Also use e2image instead of raw dd to move data. This should
speed up the processes significantly.
* Fix corner case when reusing same disk again
* Remove busybox Linux module support
Since systemd relies on the upstream Linux kernel module handling
utility "kmod" the busybox implementations are not required. Already
today the official "kmod" utility takes precedence:
haos # ls -la /usr/sbin/*mod*
lrwxrwxrwx 1 root root 11 Nov 11 11:32 /usr/sbin/depmod -> ../bin/kmod
lrwxrwxrwx 1 root root 11 Nov 11 11:32 /usr/sbin/insmod -> ../bin/kmod
lrwxrwxrwx 1 root root 11 Nov 11 11:32 /usr/sbin/lsmod -> ../bin/kmod
lrwxrwxrwx 1 root root 11 Nov 11 11:32 /usr/sbin/modinfo -> ../bin/kmod
lrwxrwxrwx 1 root root 11 Nov 11 11:32 /usr/sbin/modprobe -> ../bin/kmod
lrwxrwxrwx 1 root root 11 Nov 11 11:32 /usr/sbin/rmmod -> ../bin/kmod
* Move modprobe configuration alsa-base.conf to correct location
The official modprobe package from kmod checks three locations:
/etc/modprobe.d/, /lib/modprobe.d/ and /run/modprobe.d/. Since usr-move
/lib is a symlink to /usr/lib, the correct location for distribution
provided modprobe files is /usr/lib/modprobe.d.
* Use /run as default location for lock files for U-Boot tools
While there is a command line parameter to set the lock file explicitly,
there are other tools invoking fw_setenv (in particular rauc) which do
not set the lock file. Using /run by default makes fw_setenv use the
correct lock file in all situations.
* Don't explicitly set lock file location
Since we patch U-Boot tools to use /run by default setting it explicitly
is unnecessary.
* Update buildroot-patches for 2020.11-rc1 buildroot
* Update buildroot to 2020.11-rc1
Signed-off-by: Stefan Agner <stefan@agner.ch>
* Don't rely on sfdisk --list-free output
The --list-free (-F) argument does not allow machine readable mode. And
it seems that the output format changes over time (different spacing,
using size postfixes instead of raw blocks).
Use sfdisk json output and calculate free partition space ourselfs. This
works for 2.35 and 2.36 and is more robust since we rely on output which
is meant for scripts to parse.
* Migrate defconfigs for Buildroot 2020.11-rc1
In particular, rename BR2_TARGET_UBOOT_BOOT_SCRIPT(_SOURCE) to
BR2_PACKAGE_HOST_UBOOT_TOOLS_BOOT_SCRIPT(_SOURCE).
* Rebase/remove systemd patches for systemd 246
* Drop apparmor/libapparmor from buildroot-external
* hassos-persists: use /run as directory for lockfiles
The U-Boot tools use /var/lock by default which is not created any more
by systemd by default (it is under tmpfiles legacy.conf, which we no
longer install).
* Disable systemd-update-done.service
The service is not suited for pure read-only systems. In particular the
service needs to be able to write a file in /etc and /var. Remove the
service. Note: This is a static service and cannot be removed using
systemd-preset.
* Disable apparmor.service for now
The service loads all default profiles. Some might actually cause
problems. E.g. the profile for ping seems not to match our setup for
/etc/resolv.conf:
[85503.634653] audit: type=1400 audit(1605286002.684:236): apparmor="DENIED" operation="open" profile="ping" name="/run/resolv.conf" pid=27585 comm="ping" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Drop AVAHI and use systemd-resolved to announce hostname via mDNS
and LLMNR. Also continue to offer the _workstation._tcp.local service
since it is used by the CoreDNS mDNS plug-in.
In case a container image is corrupted `docker inspect` might fail:
# docker inspect --format='{{.Id}}' "${SUPERVISOR_IMAGE}"
Error response from daemon: readlink /mnt/data/docker/overlay2: invalid argument
In that same state the `docker images` command still shows the images.
Since `docker inspect` returns an error SUPERVISOR_IMAGE_ID will be empty
and a simple `docker pull` will be attempted. That does not suffice to
recover from a corrupted container image.
Use `docker images` to get the image ids and make sure to delete all
image ids found by that command.
Also don't use RuntimeDirectory since it deletes the runtime directory
between the service start attempts which defeats the purpose.
* Simplify self healing capabilities of Supervisor service
Instead of relying on time based information on how long the container
has been running use a startup marker file to infer if the last startup
has been successful.
* Update buildroot-external/rootfs-overlay/usr/sbin/hassos-supervisor
Co-authored-by: Pascal Vizeli <pascal.vizeli@syshack.ch>
Co-authored-by: Pascal Vizeli <pascal.vizeli@syshack.ch>
* automatically fsck to repair partitions
* add fsck.fat so rpi boot partition can be repaired
* Use Wants= instead of Requires=
Co-authored-by: Pascal Vizeli <pascal.vizeli@syshack.ch>
* add dosfstools to all images
* run hassos-data and hassos-expand after fsck
Co-authored-by: Pascal Vizeli <pascal.vizeli@syshack.ch>
The Docker socket path is /run/docker.sock. Also only one path can be
used per property. This fixes the supervisor service, which currently
refuses to start due to missing Docker socket.