330 Commits

Author SHA1 Message Date
Stefan Agner
399997e83c Set umask on swapfile creation (#2436)
Make sure the swapfile is only readable by the owner.
2023-03-28 18:18:58 +02:00
Stefan Agner
1edb5c8c9e Limit systemd-journald log size to 500MB (#2226) (#2435) 2023-03-28 18:18:27 +02:00
Stefan Agner
4744a2f123 Fix swapfile creation for all memory sizes (#2427)
* Fix swapfile creation for all memory sizes

In certain situations awk prints the swapfile size in scientific
notation. The script can't deal with that, in which case swapfile
creation fails.

Use awk's int() to convert the number to an integer.

Since pages are 4 KiB, also make sure the swap size is aligned to 4 KiB blocks.

* Add info message
2023-03-27 09:34:38 +02:00
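The size calculation in this commit can be sketched in Python as follows. It assumes a hypothetical `fraction` of total memory as the target size and rounds down to whole 4 KiB pages; the real script does this in awk, and `awk_print` only mimics awk's default output rule (non-integral numbers are printed through OFMT, `"%.6g"` by default), which is what produces scientific notation for values of a million or more.

```python
PAGE_SIZE = 4096  # swap is managed in 4 KiB pages


def awk_print(value: float) -> str:
    """Mimic how awk prints a number: integral values print as integers,
    everything else goes through OFMT, which defaults to "%.6g"."""
    if value == int(value):
        return str(int(value))
    return "%.6g" % value


def swapfile_size(mem_total_bytes: int, fraction: float) -> int:
    """Return a swapfile size in bytes, truncated to an integer (like awk's
    int()) and rounded down to a whole number of 4 KiB pages.

    `fraction` is a hypothetical parameter for this sketch; the actual
    sizing rule of the script is not part of this commit message.
    """
    size = int(mem_total_bytes * fraction)
    return size - size % PAGE_SIZE


if __name__ == "__main__":
    # A non-integral value with seven integer digits: awk would print it
    # in scientific notation, which the script could not parse.
    print(awk_print(1234567.8))  # 1.23457e+06
    print(swapfile_size(8 * 1024**3, 0.33))
```

Truncating with int() before aligning keeps every intermediate value an integer, so nothing downstream ever sees a float.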
Stefan Agner
a8f6f7aa43 Don't kill ssh connection on OOM (#2424)
By default systemd kills the service which causes an OOM. That makes
sense for a typical service; however, for SSH we don't want this
behavior: the connection should continue, and only the command which caused
the OOM should be killed.
2023-03-23 21:45:57 +01:00
Stefan Agner
75dcb932f8 Use zswap instead of swap in zram (#2420)
* Use zswap instead of swap in zram

This requires a swap file which will get generated automatically on
startup.

* Fix file size and free disk space comparison

* Set zswap factor to 33%

* Set vm.swappiness to 1

Decrease swapping to a minimum. This is also recommended for database
workloads by the MariaDB documentation. In practice it causes the fewest
writes to disk under memory pressure, while still making
swap available when needed.
2023-03-22 11:08:05 +01:00
Stefan Agner
787fc22f83 Avoid moving data to same device (#2412)
* Avoid moving data to same device

When a data disk move is triggered while the target disk is already in
use as the data disk, the script currently renames that only data disk,
rendering the system unusable.

Don't continue if source and destination happen to be the same device.

* On failure rename to hassos-data-fail

The label hassos-data-failed is too long.
2023-03-15 22:47:31 +01:00
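Both checks from this commit can be sketched in Python. This assumes the data partition is formatted as ext4, whose volume label holds at most 16 bytes ("hassos-data-failed" is 18, "hassos-data-fail" is exactly 16), and that device names have already been resolved to whole disks; `check_move` and `label_fits` are hypothetical helpers, not the script's actual code.

```python
EXT4_LABEL_MAX_BYTES = 16  # ext4 volume labels hold at most 16 bytes

FAIL_LABEL = "hassos-data-fail"


def label_fits(label: str) -> bool:
    """Return True if `label` fits into an ext4 volume label."""
    return len(label.encode("utf-8")) <= EXT4_LABEL_MAX_BYTES


def check_move(source_disk: str, target_disk: str) -> None:
    """Refuse a data disk move whose source and target are the same disk.

    Disk names (e.g. "sda") are assumed to be resolved to the underlying
    whole-disk device already.
    """
    if source_disk == target_disk:
        raise ValueError(f"{target_disk} already holds the data partition, not moving")


if __name__ == "__main__":
    print(label_fits("hassos-data-failed"))  # False: 18 bytes
    print(label_fits(FAIL_LABEL))            # True: 16 bytes
```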
Stefan Agner
5200096c4e Deactivate any external data disk device on first boot (#2390) (#2410)
* Deactivate any external data disk device on first boot (#2390)

* Use lsblk to determine the underlying device file

Comparing major numbers is not reliable, e.g. virtio disks have the same
major number despite being different devices. Use lsblk to find the
underlying device, and compare the device name instead.
2023-03-15 14:16:11 +01:00
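A Python sketch of comparing devices by their underlying disk name rather than by major number. It assumes the text output of `lsblk -rno NAME,PKNAME` (raw format, no header, device name followed by its parent kernel name, which is empty for whole disks); the device names in the example are made up for illustration.

```python
def parse_lsblk(output: str) -> dict[str, str]:
    """Map each device name to its parent kernel name ("" for whole disks).

    Expects `lsblk -rno NAME,PKNAME` output: one device per line,
    space-separated columns, PKNAME empty for whole disks.
    """
    parents = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        parents[fields[0]] = fields[1] if len(fields) > 1 else ""
    return parents


def disk_of(name: str, parents: dict[str, str]) -> str:
    """Follow PKNAME links until reaching a device without a parent."""
    while parents.get(name):
        name = parents[name]
    return name


def same_disk(a: str, b: str, parents: dict[str, str]) -> bool:
    """True if both devices live on the same underlying disk."""
    return disk_of(a, parents) == disk_of(b, parents)


if __name__ == "__main__":
    # Hypothetical output for two virtio disks; vda1 and vdb1 would share
    # a major number, but their parent disk names differ.
    sample = "vda\nvda1 vda\nvda8 vda\nvdb\nvdb1 vdb\n"
    parents = parse_lsblk(sample)
    print(same_disk("vda1", "vdb1", parents))  # False
```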
Stefan Agner
ca6bccbfa9 Use new containerd.sock location of Docker 23.0 (#2382) 2023-03-03 18:07:29 +01:00
Stefan Agner
a69f94803b Increase net.core.optmem_max for OTBR (#2375)
The OTBR install scripts by default increase the net.core.optmem_max
ancillary buffer size to 64KiB to allow for a larger number of multicast
groups. Arch Linux also recommends this size for high-speed network
links.
2023-03-02 00:06:42 +01:00
Stefan Agner
b8a00ecbfa Symlink firmware update directory to Supervisor writeable location (#2225) 2023-03-01 00:36:22 +01:00
Stefan Agner
9bd101431e Revert bridge support (#2345)
The bridge support is not complete and causes issues in Supervisor.
Supervisor first needs proper support for it before we can deploy it in
Operating System.

See also: https://github.com/home-assistant/supervisor/pull/4133
2023-02-22 12:08:13 +01:00
xonestonex
6d8faa90a7 WiFi Access Point / HotSpot management in NetworkManager (#2304)
* Enable wpa_supplicant access point functionality, to allow NetworkManager to manage WiFi interfaces as HotSpots or access points.

* Add an exception, to allow NetworkManager to manage bridge interfaces whose name starts with 'bridge'.

* Update buildroot-external/rootfs-overlay/etc/NetworkManager/NetworkManager.conf

Co-authored-by: Stefan Agner <stefan@agner.ch>
2023-01-23 14:00:04 +01:00
Stefan Agner
970bec6ac3 Use same class B network for Docker as Supervisor (#2246) (#2259)
Use a subnet for the Docker default bridge in the same class B network
that Supervisor is using. This avoids conflicts with more than one class B network.
2022-12-10 23:58:42 +01:00
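The constraint described here can be checked with Python's standard `ipaddress` module: both subnets must fall into the same /16 (class B sized) network without overlapping each other. The addresses below are example values chosen for illustration, not necessarily the ones configured in Home Assistant OS.

```python
import ipaddress


def same_class_b(a: str, b: str) -> bool:
    """True if subnets `a` and `b` share one /16 network and do not overlap."""
    net_a = ipaddress.ip_network(a)
    net_b = ipaddress.ip_network(b)
    return (
        net_a.supernet(new_prefix=16) == net_b.supernet(new_prefix=16)
        and not net_a.overlaps(net_b)
    )


if __name__ == "__main__":
    # Example values: a Supervisor-style /23 and a small Docker bridge /23.
    print(same_class_b("172.30.32.0/23", "172.30.232.0/23"))  # True
    print(same_class_b("172.30.32.0/23", "172.17.0.0/16"))    # False
```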
Stefan Agner
eaeac710eb Enable experimental APIs for Bluetooth daemon (#2251)
Access to the experimental advertisement monitor API requires
experimental mode. This enables the experimental D-Bus API
by default.

See also: https://github.com/hbldh/bleak/pull/884
2022-11-29 11:31:59 +01:00
Stefan Agner
ce16ee5d49 Decrease network size of Docker default bridge (#2246) 2022-11-25 13:24:45 +01:00
Mike Degatano
0404657e74 Connectivity check interval to 10 minutes (#2127) 2022-09-12 22:16:17 +02:00
Stefan Agner
41b452ff48 Fix Docker key.json corruption check (#2125)
* Fix Docker key.json corruption check

Since /etc/docker does not get bind mounted anymore (see #2116),
key.json from the overlay partition is used directly.

* Use -e flag for jq to get useful exit code
2022-09-12 22:16:00 +02:00
Stefan Agner
a6445af712 Remove image name from journald tag/identifier (#2118)
The image name is stored in a separate field IMAGE_NAME as well. This
allows using the container name (e.g. `hassio_supervisor`) to get logs
of all Supervisors independent of the image name (which differs for
every version).
2022-09-08 12:20:02 +02:00
Stefan Agner
66c15adbbf Move Docker configuration to daemon.json (#2116)
This is more readable than passing arguments to the daemon directly. It
also shortens the ExecStart command significantly, which is stored in
every log entry in systemd-journald.
2022-09-07 19:13:47 +02:00
Stefan Agner
cf11b5a745 Try using old image name of the Supervisor image (#2111) (#2113)
* Try using old image name of the Supervisor image

* Tag the old image with the new name so recreation works
2022-09-06 20:28:24 +02:00
Stefan Agner
b1df44421b Bump commit interval to 30s (#2103)
A higher file system commit interval can help to decrease the amount of
writes. In tests, commit intervals higher than 30s do not seem to help
much in practice. Settle on 30s for now.
2022-09-02 15:23:38 +02:00
Stefan Agner
303f63c222 Add access to containerd for Supervisor (#2102)
Add direct access to Docker's containerd instance by passing in its gRPC
socket. This can be useful to talk to the containerd gRPC API directly,
which exposes more information than the Docker API (e.g. OOM kill
events).
2022-09-02 14:49:25 +02:00
Stefan Agner
ed554f2a39 Check free disk space before starting Docker (#2097)
It seems that Docker can fail to start if there is no space left on the
device. Try to free up some space in that case by asking journald to
limit its size to 256MiB.

This should work for any storage larger than ~2.5GiB (as the journal's
maximum size is 10% of the disk size). It should still leave enough
logs to diagnose problems if necessary.

Note: We could also limit the size of the journal in the first place, but
that isn't sustainable: once that space is used up, we run into the
same problem again.

By only asking journalctl to free up space when necessary, we kind of (mis)use
the journal as a way to "reserve" some space which we can free up at boot
if necessary.
2022-08-31 23:04:23 +02:00
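The arithmetic behind the ~2.5GiB figure, as a Python sketch: vacuuming the journal down to 256MiB can only free space once the journal's 10% share of the disk exceeds 256MiB, which happens above 2.5GiB. The free-space threshold that triggers the vacuum is not given in this commit and is left out; `vacuum_journal` is a hypothetical wrapper around the real `journalctl --vacuum-size=` option.

```python
import subprocess

MiB = 1024**2
GiB = 1024**3
JOURNAL_TARGET = 256 * MiB  # size the journal is asked to shrink to


def journal_max_use(fs_size: int) -> int:
    """Journal size limit as described above: 10% of the file system size."""
    return fs_size // 10


def reclaimable(fs_size: int) -> int:
    """Upper bound on the space freed by vacuuming a full journal to 256 MiB."""
    return max(0, journal_max_use(fs_size) - JOURNAL_TARGET)


def vacuum_journal() -> None:
    """Remove archived journal files until they use at most 256 MiB."""
    subprocess.run(["journalctl", "--vacuum-size=256M"], check=True)


if __name__ == "__main__":
    for size in (2.5 * GiB, 10 * GiB, 32 * GiB):
        print(f"{size / GiB:5.1f} GiB disk: up to "
              f"{reclaimable(int(size)) / MiB:.0f} MiB reclaimable")
```

The numbers are upper bounds: a journal that has not yet grown to its limit frees correspondingly less.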
Stefan Agner
cd5e42341d Start dropbear earlier (#2083)
This can be helpful when debugging HAOS issues. Dropbear is only started
for users who actually enabled it by configuring an SSH key, so this
change won't have an effect for most people.
2022-08-25 00:09:13 +02:00
Stefan Agner
ea5acb0950 Fix delaying systemd-timesyncd start correctly (#2082)
Unfortunately, orderings like Before= cannot be overridden by vendor
settings. This is mentioned in "Example 2. Overriding vendor settings"
on https://www.freedesktop.org/software/systemd/man/systemd.unit.html.

Correctly fix ordering by overriding the entire unit.
2022-08-24 23:02:09 +02:00
Stefan Agner
f8c8198bb9 Fix delaying systemd-timesyncd start (#2069)
* Fix delaying systemd-timesyncd

Setting WantedBy=time-sync.target in a service.d config file does not
clear previous assignments of WantedBy. This caused the services to still
be pulled in by the sysinit.target, causing an ordering cycle and the
system to not start essential services.

* Remove sysinit.target from Before ordering
2022-08-18 15:51:07 +02:00
Stefan Agner
7a693bed46 Delay systemd-timesyncd start after network is deemed online (#2068)
With commit 2d3119ef22 ("Delay Supervisor start until time has been
sychronized (#1360)") systemd-time-wait-sync.service got enabled, which
waits until systemd-timesyncd synchronizes time with an NTP server.

By default systemd-timesyncd.service and systemd-time-wait-sync.service
are pulled in by sysinit.target. This starts the services before full
network connectivity is established. The first synchronization fails and
systemd-timesyncd only retries after a rate limit mechanism times out.
This causes a delay of 30s during startup. While systemd-timesyncd has
a mechanism to (re)try time synchronization when network becomes
online, it seems that those only work properly when systemd-networkd
is used, see also https://github.com/systemd/systemd/issues/24298.

Simply reordering systemd-timesyncd.service after network-online.target
does not work as it causes circular dependencies (NetworkManager itself
depends ultimately on the sysinit.target).

With this change, the services are only pulled in by time-sync.target.
That allows ordering the service after network-online.target. With that,
the first synchronization succeeds.

This mechanism also works when an NTP server is provided through DHCP.
In that case, the systemd-timesyncd service is started by the dispatcher
script /usr/lib/NetworkManager/dispatcher.d/10-ntp before systemd
even considers starting the service. Tests show that the default
fallback NTP server is not contacted, only the DHCP-provided one.
2022-08-17 18:51:35 +02:00
Pascal Vizeli
05778a2d32 Support IPv6 NAT (#2051)
* Support IPv6 NAT

* Add experimental

* Enable IPv6 NAT in kernel configuration

Co-authored-by: Stefan Agner <stefan@agner.ch>
2022-08-12 17:43:49 +02:00
Stefan Agner
7729db1e11 Synchronize network time quicker on bootup (#2057)
Currently systemd-timesyncd tries to connect to the NTP server quite
early at boot-up. At this time the network connection has not been
established yet. This causes resolving the NTP server to fail and
a rate limit kicks in which makes systemd-timesyncd wait for 30s until
the next attempt.

Lowering the retry interval to 10s makes systemd-timesyncd connect
shortly after.

Note: The rate limit is 10 attempts per 10s. Because the attempts are
exhausted immediately, lowering the connection retry interval below 10s
adds no benefit.

See also: https://github.com/systemd/systemd/issues/24298
2022-08-12 17:43:26 +02:00
Stefan Agner
2d8ec0c8ee Use dbus-broker as default D-Bus broker (#2053)
* Bump buildroot

* buildroot 99b62b8bd3...97287bbebf (3):
  > package/dbus-broker: bump to release 32
  > package/dbus-broker: new package
  > Merge pull request #3 from home-assistant/2022.02.x-haos-cgroup-v2

* Use dbus-broker as default D-Bus broker

The dbus-broker (Linux D-Bus Message Broker) aims to be a high
performance and reliable D-Bus broker which can be used as a drop-in
replacement for the reference D-Bus broker implementation. In tests it
showed significantly better performance, especially when routing BLE
messages.

* Allow dbus-broker to start early

For the HAOS device wipe feature we need haos-agent.service and
udisks2.service early. Both require a working D-Bus broker.
The options PrivateTmp and PrivateDevices add additional After=
orderings which prevent dbus-broker from being started early.

* Fix D-Bus dependency

D-Bus services should just depend on dbus.socket.
2022-08-10 17:01:02 +02:00
Stefan Agner
5d0a61fafc Set lower OOM Score for Supervisor (#2050)
* Set lower OOM Score for Supervisor

* Adjust OOM for Docker daemon
2022-08-10 13:56:45 +02:00
Stefan Agner
4d9b604c04 Use Control Group v2 (#1329)
* Disable real-time scheduling

It seems that Linux's cgroup v2 currently does not support RT scheduling.

* Remove Supervisor RT support flag

With CGroups v2 we can no longer support CPU resource allocation for
realtime scheduling.

* Bump OS Agent to 1.3.0 for CGroups v2 support
2022-08-09 11:29:12 +02:00
Joakim Sørensen
4da0ad7da2 Fix ghcr URL (#2014)
Co-authored-by: Franck Nijhof <git@frenck.dev>
2022-07-09 23:24:57 +02:00
Stefan Agner
5932f1212e Increase Supervisor start rate limit (#2010)
A faster restart policy is unlikely to help. Increasing the limit makes
it less likely to run into cloud service rate limits (e.g. container
registry).
2022-07-08 22:35:52 +02:00
Stefan Agner
0139030404 Use GitHub Container Registry for Supervisor (#2005) (#2009)
* Use GitHub Container Registry (#2005)

* Tag with ghcr.io prefix
2022-07-08 16:33:04 +02:00
Joakim Sørensen
11df4745e7 Use checkonline instead of version for connectivity check (#1991) 2022-06-27 16:30:05 +02:00
Stefan Agner
26bca2666d Remove key.json file if it appears to be corrupted (#1706) (#1988)
* Remove key.json file if it appears to be corrupted (#1706)

* Check with jq if key.json is parsable
2022-06-25 09:30:20 -07:00
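The same check expressed as a minimal Python sketch: treat key.json as corrupted if it cannot be parsed as JSON, and remove it. The function name and return convention are hypothetical; the actual change uses jq in a shell script. An empty file, a common result of corruption, also counts as invalid here because `json.load` raises on empty input.

```python
import json
import os


def remove_if_corrupted(path: str) -> bool:
    """Delete `path` if it exists but does not parse as JSON.

    Returns True if the file was removed.
    """
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)
    except FileNotFoundError:
        return False  # nothing to check
    except (json.JSONDecodeError, UnicodeDecodeError):
        os.remove(path)  # unparsable: drop the file
        return True
    return False
```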
Stefan Agner
720f604f98 Increase maximum socket receive and send buffer size (#1964) (#1968)
Some applications try to increase the buffers for performance reasons. The
QUIC Go implementation, for instance, tries to request a 2048 KiB buffer
size.

The kernel default depends on the sk_buff size (which is architecture
dependent), but it is independent of memory size and typically around 200 KiB
(see [1]).

Other network tuning guides suggest 16 MiB for 1 Gbit Ethernet, and
changing both the default and the maximum buffer size (see [2]). This
change conservatively increases the maximum buffer size to 4 MiB.

[1]: https://elixir.bootlin.com/linux/v5.15.45/source/include/net/sock.h#L2742
[2]: https://nateware.com/2013/04/06/linux-network-tuning-for-2013/
2022-06-08 16:17:49 +02:00
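A Python sketch of what an application such as the QUIC Go implementation runs into, based on the behavior documented in socket(7): a `SO_RCVBUF` request is capped at `net.core.rmem_max`, and the kernel stores (and reports) twice that value for bookkeeping overhead. `effective_rcvbuf` is a simplified model that ignores the kernel's minimum buffer size; `probe_rcvbuf` asks the running kernel directly and is Linux-specific.

```python
import socket

KiB = 1024
MiB = 1024 * KiB


def effective_rcvbuf(requested: int, rmem_max: int) -> int:
    """Model of Linux SO_RCVBUF handling for large requests: the value is
    capped at net.core.rmem_max and then doubled."""
    return 2 * min(requested, rmem_max)


def read_rmem_max(path: str = "/proc/sys/net/core/rmem_max") -> int:
    """Read the current maximum receive buffer size (Linux only)."""
    with open(path) as f:
        return int(f.read())


def probe_rcvbuf(requested: int) -> int:
    """Request a receive buffer on a UDP socket and return what the kernel granted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

With a maximum of around 200 KiB, a 2048 KiB request is silently reduced; with a 4 MiB maximum, it is granted in full.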
Stefan Agner
0c9cfde907 Make NTP dispatch script executable (#1798) (#1933)
* Make NTP dispatch script executable (#1798)

* Address shellcheck issues
2022-05-19 21:47:27 +02:00
Stefan Agner
d284a1dee7 Small Yellow improvements (#1875)
* Enable additional LED triggers

* Improve Yellow device tree

Fix soundcard name and use BTN_1 as key code.

* Add input-event-daemon configuration

Add minimal input-event-daemon configuration to avoid the default
configuration taking effect. This minimal configuration triggers
the USB configuration import on button press.
2022-04-27 22:51:55 +02:00
Stefan Agner
a9c2eedc71 Handle long-press system keys only (#1874)
Only handle long-press system power, reboot and sleep keys.
2022-04-27 20:33:39 +02:00
Stefan Agner
4310cfe916 Enable file system check for FAT boot partition (#1857) 2022-04-20 14:06:56 +02:00
Stefan Agner
d9ec603164 Enable IPv6 forwarding by default (#1832)
Enable IPv6 forwarding by default, which is useful to run an IPv6-based
OpenThread Border Router.

Currently Docker enables IPv4 forwarding by default. Enabling IPv6
support in Docker would enable IPv6 routing as well, but we are not ready
to enable IPv6 support for Docker at this point.

Enabling IPv6 forwarding should be harmless as there are no IPv6
addresses configured internally and Home Assistant OS is not typically
dual-homed. In cases where it is dual-homed (e.g. VPN), routing is
often used and firewalling is set up as part of that add-on.
2022-04-07 13:24:13 +02:00
Stefan Agner
99be958c4f Drop NetworkManager default config (#1813)
* Drop default NetworkManager configuration

NetworkManager will automatically connect using the global defaults.
Also, Supervisor today creates a profile once the user configures
the network explicitly.

* Create system-connections directory
2022-03-25 08:53:30 +01:00
redgryphon
f62fee2ff7 Add support for NTP configuration via DHCP (fixes #689) (#1798)
* Add support for NTP configuration via DHCP.

* The default fallback NTP pool is Cloudflare's
2022-03-21 00:40:43 +01:00
Stefan Agner
f509e9ce5d Shutdown HA CLI properly (#1768)
Drop IgnoreOnIsolate to make sure the service is stopped during
shutdown.
2022-02-25 19:17:57 +01:00
Stefan Agner
d1cc7394b5 Use GRUB bootloader for all UEFI platforms (#1762)
* Use GRUB bootloader for all UEFI platforms
* Introduce and use file_env command
* Compress squashfs for aarch64 as well
2022-02-24 13:42:17 +01:00
Mark Dietzer
0f4016c180 Add support for AArch64/ARM64 EFI architecture (#1757)
* Add AArch64/ARM64 EFI boot support (for QEMU and some boards)
* Allow GRUB to load cmdline.txt-like
* Enable qcow2/vmdk disk images

Co-authored-by: Stefan Agner <stefan@agner.ch>
2022-02-23 10:42:02 +01:00
Jens Maus
c1bd178021 Integrate dual HomeMatic+HomeMatic IP support for HmIP-RFUSB (#1683)
* Update generic_raw_uart to the latest version, which comes with dualcopro
support for the HmIP-RFUSB USB RF sticks by eQ3/ELV.

* Remove 99-hmip-rfusb.rules to keep an HmIP-RFUSB device from being
claimed by the cp210x driver, and use the new generic_raw_uart support
instead, allowing advanced dualcopro support for HomeMatic/BidCos-RF
and homematicIP in parallel.
2022-01-26 13:53:16 +01:00
Stefan Agner
5fd943c936 Expose systemd-journal-gatewayd to Supervisor (#1627)
* Add systemd-journal-remote to the image

This allows accessing journald's logs from within Supervisor and exposing
more system logs to users.

* Allow to access systemd-journal-gatewayd from Supervisor

Create a systemd-journal-gatewayd.socket unit using a Unix socket and
bind mount it into the Supervisor container. This allows querying
systemd-journald from Supervisor directly.
2021-11-04 15:38:35 +01:00